Fossil group origins. XI. The dependence of galaxy orbits on the magnitude gap

We aim to study how the orbits of galaxies in clusters depend on the prominence of the corresponding central galaxies. We divided our data set of $\sim$ 100 clusters and groups into four samples based on their magnitude gap between the two brightest members, $\Delta m_{12}$. We then stacked all the systems in each sample, in order to create four stacked clusters, and derive the mass and velocity anisotropy profiles for the four groups of clusters using the MAMPOSSt procedure. Once the mass profile is known, we also obtain the (non parametric) velocity anisotropy profile via the inversion of the Jeans equation. In systems with the largest $\Delta m_{12}$, galaxy orbits are prevalently radial, except near the centre, where orbits are isotropic (or tangential when also the central galaxies are considered in the analysis). In the other three samples with smaller $\Delta m_{12}$, galaxy orbits are isotropic or only mildly radial. Our study supports the results of numerical simulations that identify radial orbits of galaxies as the cause of an increasing $\Delta m_{12}$ in groups.


Introduction
The term "fossil group" (FG) was first introduced by Ponman et al. (1994) for an apparently isolated elliptical galaxy surrounded by an X-ray halo, with a X-ray luminosity typical of a group of galaxies. Ponman et al. (1994) made the hypothesis that FGs could be the fossil relics of old groups of galaxies, in which the L * galaxies (where L * is the characteristic magnitude of the cluster luminosity function) have had enough time to merge with the central one (BCG). Follow-up investigations have identified companion galaxies to the FG BCG (Jones et al. 2003), and established the currently adopted definition of a FG. For a galaxy system to be classified a FG, it must have an X-ray luminosity L X ≥ 10 42 h −2 50 erg s −1 , and a magnitude gap, ∆m 12 ≥ 2, in the r−band, between the BCG and the second brightest group member within 0.5 r 200 from the BCG. With this definition, even some clusters can enter the FG class (Cypriano et al. 2006;Zarattini et al. 2014).
Fossil groups are found to be transitional objects both in numerical simulations (von Benda-Beckmann et al. 2008;Kundert et al. 2017) and in observations (Aguerri et al. 2018). If FGs are created by merging of L * galaxies onto the central BCG, a mechanism is needed to enhance the central merger rate of galaxies in FGs relative to other galaxy systems. In the standard cosmological model, groups and clusters of galaxies form hierarchically by merging of dark matter (DM) ha-los. The survival time of a subhalo accreted by a larger one, depends on its orbit. The merger timescale with the central halo is shorter for L * galaxies on radial orbits than for galaxies on tangential orbits (see eq. 4.2 of Lacey & Cole 1993). Subhalos on more radial orbits are more easily destroyed, and the disrupted material is accreted onto the central halo (e.g. Wetzel 2011;Contini et al. 2018). Using TreeSPH simulations, Sommer-Larsen et al. (2005) were the first to point out that galaxies in FGs are located on more radial orbits than those in non-fossil systems. A different orbital distribution of galaxies in fossil and non-fossil systems could then naturally explain the increased growth of the central galaxy in FGs at the expense of disrupted satellites approaching on radial orbits. Moreover, D'Onghia et al. (2005) claimed that the infall of L * galaxies along filaments with small impact parameters is required to explain the existence of FGs in numerical simulations. Testing this scenario requires determining the orbits of FG galaxies.
Orbits of galaxies in non-fossil systems have been determined observationally through the use of the Jeans equation (Binney & Tremaine 1987) that relates the mass profile of an observed spherically-symmetric system, M(r), to the radial component of the velocity dispersion profile, σ r (r), the number density profile of the tracer, ν(r), and the velocity anisotropy profile, A&A proofs: manuscript no. main where σ θ , and σ φ , are the two tangential components of the velocity dispersion, assumed to be identical. The velocity anisotropy profile describes the relative content in kinetic energy of galaxy orbits along the tangential and radial components. For purely radial (resp. tangential) orbits β = 1 (resp. β = −∞), while β = 0 correspond to isotropic orbits. In lieu of β, a widely used parameter to describe the velocity anisotropy is σ r /σ θ ≡ (1 − β) −1/2 (e.g. Biviano & Poggianti 2009). For purely radial (resp. tangential) orbits σ r /σ θ = +∞ (resp. σ r /σ θ = 0), while σ r /σ θ = 1 corresponds to isotropic orbits. Several studies found passive/red/early-type galaxies in lowredshift clusters to follow nearly isotropic orbits, whereas star-forming/blue/late-type galaxies follow more radially elongated orbits (Mahdavi et al. 1999;Hwang & Lee 2008;Munari et al. 2014;Mamon et al. 2019). However, this trend is not universal, since Aguerri et al. (2017) found that early-type galaxies have more radially elongated orbits than late-type galaxies in Abell 85. At intermediate redshifts, up to z ∼ 1, all cluster galaxies follow a trend of increasingly radial orbits with increasing distance from the cluster centre (Biviano & Poggianti 2009;Biviano et al. 2013Biviano et al. , 2016aCapasso et al. 2019a), independent of their colour/spectral type.
Previous determinations of β(r) have been obtained for clusters of galaxies with a sufficiently rich spectroscopic data set, either individually (e.g. Biviano et al. 2013) or as stacks of several clusters (e.g. Biviano & Poggianti 2009). It is interesting to determine β(r) for fossil systems, to learn more about their formation process, that numerical simulations suggest to be related to the orbital shape of their galaxies . Unfortunately, a suitable data set for fossil systems that would allow a precise determination of their β(r) does not exist at present. We therefore selected a data set of 97 clusters and groups for which we measured the magnitude gap between the two brightest members, ∆m 12 , independently of whether these systems are classified as fossil or not. By stacking these systems in four bins of ∆m 12 we can study the dependence of the orbits of their galaxies on ∆m 12 . This is the aim of this work.
A substantial part of the data set we use in this paper comes from the "Fossil Group Origins" (FOGO) project, presented in Aguerri et al. (2011). The detailed study of the sample was presented in Zarattini et al. (2014) and, within the same project, we also published a study of on the properties of central galaxies in FGs (Méndez-Abreu et al. 2012), their X-ray versus optical properties , the dependence on the magnitude gap of the luminosity functions (LFs, Zarattini et al. 2015) and substructures (Zarattini et al. 2016). The X-ray scaling relations of FGs were presented in Kundert et al. (2017), the stellar population in FG central galaxies are analysed in Corsini et al. (2018), and the velocity segregation of galaxies is studied in Zarattini et al. (2019).
The structure of this paper is the following. We describe the samples in Sect. 2, and the methods used in our analysis in Sect. 3. We present our results in Sect. 4, and provide our conclusions in Sect. 5.
Throughout this paper, as in the rest of the FOGO papers, we adopt the following cosmology, H 0 = 70 km s −1 Mpc −1 , Ω Λ = 0.7, and Ω M = 0.3.

Samples
In this paper we use the same data set already used in Zarattini et al. (2015) and Zarattini et al. (2019), that comes from the merging of two different data sets. The first data set (S1 hereafter) comprises 34 FG candidates proposed by Santos et al. (2007) and already analysed by the FOGO team (Aguerri et al. 2011;Zarattini et al. 2014). The spectroscopy of S1 is ≥ 70% (resp. ≥ 50%) complete down to m r = 17 (resp. m r = 18). We removed 12 systems with less than 10 spectroscopic members each. We also removed another system because its membership assignment is uncertain (FGS15; see Zarattini et al. 2014). We were left with 21 systems with z < 0.25 and with a total of 1065 spectroscopic members. We refer the reader to Zarattini et al. (2014) for more details on S1 and the membership selection.
For each of the 21 S1 systems we computed ∆m 12 (see Table 1 in Zarattini et al. 2014). Since S1 only contains FG candidates, systems in S1 have a high mean ∆m 12 ≃ 1.5, with only four systems with ∆m 12 < 0.5. To determine whether the orbits of galaxies depend on their system ∆m 12 , we need to consider another data set (that we call S2) that includes systems spanning a wider range of ∆m 12 . We used the data set of Aguerri et al. (2007) that contain all the 88 z < 0.1 clusters in the catalogues of Zwicky et al. (1961), Abell et al. (1989), Voges et al. (1999), andBöhringer et al. (2000), that are also observed in the Sloan Digital Sky Survey Data Release 4 (SDSS-DR4, Adelman-McCarthy et al. 2006). The spectroscopic completeness of the S2 sample is ≥ 85% (resp. ≥ 60%) down to m r = 17 (resp. m r = 18). Of the 88 available clusters, we selected only those 76 with spectroscopically confirmed ∆m 12 . The total number of spectroscopic members in the S2 data set is 4338.
As the goal of this work is to study the dependence of β(r) on ∆m 12 , we divided our 97 S1+S2 galaxy systems into 4 samples in bins of ∆m 12 , chosen to ensure at least 20 systems in each bin. The four samples contain 31 systems with ∆m 12 ≤ 0.5, 23 with 0.5 < ∆m 12 ≤ 1.0, 23 with 1.0 < ∆m 12 ≤ 1.5, and 20 with ∆m 12 > 1.5 (see Table 1). The properties of the systems in the four samples are given in Tables A.1

Stacking the clusters
The number of spectroscopic members in any of the clusters is too small for allowing a robust individual cluster determination of β(r), with the exception of Abell 85, already  (2): number of clusters in each sample. Column (3): number of member galaxies. Column (4): fraction of members within r 200 . Columns (5, 6, 7): weighted averages of cluster redshift, r 200 , and v 200 resp. (using the number of member galaxies in each cluster as a weight) and their 1 σ uncertainties. Column (8): best-fit scale radius of the galaxies number density profile (in units of r 200 ) and its 1 σ uncertainty.
analysed by Aguerri et al. (2017). To improve statistics, we built stacks of clusters in each ∆m 12 sample. In our stacking procedure we follow several previous dynamical studies (e.g. van der Marel et al. 2000;Rines et al. 2003;Katgert et al. 2004;Biviano et al. 2016b). The procedure relies on the assumption that different clusters have similar mass profiles, differing only for the normalisation. Such an assumption is justified by the existence of a universal mass profile for cosmological halos (Navarro et al. 1997) and on the fact that the concentration of halo mass profiles is very mildly mass-dependent (e.g. Biviano et al. 2017). Following Munari et al. (2013), we compute the virial radius 1 r 200 = σ v /(6.67 H z ), where σ v is the line-of-sight restframe velocity dispersion and H z is the Hubble constant at redshift z. Only for one system, FGS28, we estimate its virial radius from its X-ray luminosity, L X , since it does not contain enough members in its central region for a reliable σ v estimate (see note in Table A.4 for details). We also compute the virial velocity v 200 = 10 H z r 200 . We stack the individual clusters by scaling the cluster-centric galaxy distances R to the virial radius, R/r 200 , and the line-of-sight, rest-frame velocities, v rf ≡ c (z − <z>)/(1 + <z>), where c is the speed of light and <z> is the cluster mean redshift, to the virial velocity, v rf /v 200 .
The properties of the four stacks are computed as the weighted averages of the properties of the clusters in each sample, using the number of member galaxies as weights, and are presented in Table 1. The four stacks have very similar mean redshifts, while the mean r 200 values are marginally different among each others. The projected phase-space distributions of galaxies in the four samples is shown in Fig. 1.

MAMPOSSt
When the mass profile of a cluster is derived from the kinematics of its galaxies (as in this work) using the Jeans equation (e.g. van der Marel 1994), the solutions for M(r) and β(r) are degenerate with respect to the usually adopted osbervables, the projected number-density and velocity-dispersion profiles (e.g. Walker et al. 2009). The MAMPOSSt technique of Mamon et al. (2013) has been shown to partially break this degeneracy. It estimates M(r) and β(r) in a parametrised form, by performing a maximum likelihood fit to the full distribution of galaxies in the projected phase space.
In our analysis we consider three models for M(r): 1 We define virial radius r 200 the radius of a sphere with mass overdensity 200 times the critical density at the cluster redshift.
the Navarro, Frenk, and White profile (NFW, Navarro et al. 1997), where M 200 ≡ 100 H 2 z r 3 200 /G, and H z is the Hubble constant at the system redshift, c = r 200 /r −2 is the concentration of M(r), and r −2 is the scale radius, defined as the radius where the NFW profile has a logarithmic slope of -2 (Navarro et al. 2004).
-The Einasto profile (Einasto 1965;Navarro et al. 2004), where P(a, x) represents the regularised incomplete gamma function, and where we fix m = 5, which represents well cluster-size halos in numerical simulations (Mamon et al. 2010). -The Burkert profile (Burkert 1995;Biviano et al. 2013), where r B is the scale radius of the model. All these models have two free parameters, r −2 and r 200 . However, the four stacks on which we run MAMPOSSt, have the observables already defined in virial units, R/r 200 and v rf /v 200 (see Sect. 3.1), so r 200 is no longer a free parameter. In Sect. 4 we show that if we allow r 200 as a free parameter in the MAMPOSSt analysis, the best-fit values are consistent with the mean values reported in Table 1, and the likelihood of the MAMPOSSt best-fit does not improve significantly with respect to keeping r 200 fixed. We consider five different models for β(r): -"C": constant anisotropy with radius, β = β c .
-"T": from Tiret et al. (2007), where β ∞ is the anisotropy value at large radii. -"O": from Biviano et al. (2013), Article number, page 3 of 12 A&A proofs: manuscript no. main -"ML": from Mamon & Łokas (2005), where r β is the anisotropy radius. -"OM": from Osipkov (1979) and Merritt (1985), All these β(r) models have one free parameter each (β C , β ∞ , or r β ). We run MAMPOSSt in the so-called Split mode (Mamon et al. 2013), that is we use an external maximumlikelihood analysis to determine the value of the scale radius of the galaxies number density profile, r ν . We fit the radial distributions of the galaxies in each stack with NFW models (in projection), taking into account the correction for sample incompleteness as in Zarattini et al. (2019). The best fit values for r ν are given in Table 1. The ∆m 12 > 1.5 sample has a slightly more concentrated distribution of galaxies than the other three samples.

Inversion of the Jeans equation
While MAMPOSSt is able to constrain M(r) and β(r), the constraints are specific to the set of models that are considered (see the previous Sect.). There is a vast literature on the modelisation of cluster M(r) (e.g. Ludlow et al. 2013;Pratt et al. 2019, and references therein). On the other hand, less is known from numerical simulations and observations about the shape of β(r) in galaxy systems, and a large variance among different systems has been suggested (see Fig. 1 in Mamon et al. 2013). Our choice of models for MAMPOSSt could therefore be adequate to describe M(r), but not perhaps to describe β(r). To confirm that our β(r) modelisation is not too restrictive we use the M(r) determined by the MAMPOSSt analysis, to directly invert the Jeans equation and derive β(r) in an (almost) non-parametric way. For this, we follow the method of Binney & Tremaine (1987) in the implementations of Solanes & Salvador-Sole (1990) and Dejonghe & Merritt (1992).
Our procedure is the following. We fix M(r) to the MAM-POSSt solution. The two observables we need to consider are the number density and velocity dispersion profiles. We apply the LOWESS technique (see, e.g., Gebhardt et al. 1994) to smooth these profiles. The number density profile is then de-projected numerically, (using Abel's equation, see Binney & Tremaine 1987). Since the equations to be solved contain integrals up to infinity, we extrapolate the profiles to a large-enough radiuswe find 30 Mpc to be sufficient for our results to be stable. The extrapolations are performed as in Biviano et al. (2013).
Uncertainties in the β(r) profiles are estimated by performing the Jeans inversion on 100 bootstrap resamplings of the original datasets.

MAMPOSSt
We apply MAMPOSSt to the four samples of Table 1, limiting each data set to the region 0.05 Mpc ≤ R ≤ r 200 , since the inner region, R < 0.05 Mpc, is dominated by the BCG, where our parametrisation of M(r) may not work because the total mass is where N data is the sample size, and N pars is the number of free parameters used in the model andL is the MAMPOSSt-derived likelihood. The BIC values obtained using two free parameters (r −2 , and the β(r) parameter) are systematically lower than the BIC values obtained using three free parameters (i.e. adding r 200 as a free parameter) for all combinations of M(r) and β(r) models. This means there is no statistical advantage of adding r 200 as a free parameter in our analysis, presumably because the stack sample observables are already in normalised units with respect to r 200 and v 200 . Anyway, we checked that the best-fit r 200 values obtained by MAMPOSSt in the 3-free parameters runs, are consistent with the weighted mean values of r 200 resulting from the cluster stacking procedure (listed in Table 1; see also Sect. 3.1).
The main results of the MAMPOSSt analysis are given in Table 2. We list the best-fit values of r −2 and the values of β(r) at two characteristic radii (r 200 /4 and r 200 ). These values are listed for the combination of M(r) and β(r) models that give the minimum BIC values for each of the four samples. The minimum-BIC solutions for the three smaller ∆m 12 samples are obtained using the Einasto mass profile and the constant anisotropy profile. On the other hand, for the ∆m 12 > 1.5 sample the minimum-BIC solution is obtained using the Burkert mass model and the T anisotropy model. The listed uncertainties are marginalised errors obtained from a Markov chain Monte Carlo (MCMC) analysis.
The ∆m 12 > 1.5 sample differs from the other three not only for the different minimum-BIC models, but also for the larger S. Zarattini et al.: The dependence of galaxy orbits on the magnitude gap  (2) value of β(r 200 ), and for the smaller value of r −2 . The smaller r −2 implies a higher concentration of the mass distribution, as already found for the galaxy distribution in Sect. 3.2. A higher mass concentration for systems with large magnitude gaps is predicted by cosmological numerical simulations (Ragagnin et al. 2019). However, the smaller r −2 we find for ∆m 12 > 1.5 systems probably compensates for the fact that the Burkert mass model is cored at the centre, unlike the Einasto model. In fact, when we force the NFW M(r) model to all the samples, the r −2 values of the four samples are not much different (see the second set of results in Table 1). On the other hand, the β(r 200 value of the ∆m 12 > 1.5 sample is larger than the corresponding values of the the other three samples, independently of the M(r) model. Table 2 are the weighted average MAMPOSSt results of all models combinations, using the MAMPOSSt likelihoodsL as weights. The quoted errors on the parameters are the weighted variance. Also for this set of results, it is confirmed that the ∆m 12 > 1.5 sample has a higher β(r 200 ) value compared to the other three samples, although the difference is less significant than for the minimum-BIC, and the minimum-BIC NFW sets of results.

The third set of results shown in
In Fig. 2 we display the four samples velocity anisotropy profiles, σ r /σ θ , corresponding to the first and third sets of results of Table 2. We do not show the velocity anisotropy models obtained by forcing the NFW M(r) for the sake of clarity of the plot -anyway, they are quite similar to those of the minimum-BIC models. The velocity anisotropy of the ∆m 12 > 1.5 sample is increasing with radius, indicating more radial orbits in the outer regions than the other three samples.
The differences we found are larger than 1σ, but smaller than 3σ. There is therefore only tentative evidence for the presence of more radial orbits in systems with large magnitude gap. A larger data set is needed to provide a more solid statistical basis to our result, and eventually to extend it to a sample of pure fossil systems.

Jeans equation inversion
Given the best-fit MAMPOSSt M(r) models and the observables, namely the galaxies number-density and velocity-dispersion profiles, we now perform the inversion of the Jeans equation to determine β(r) in a non-parametric form. This procedure allows us to free the determination of β(r) from the constraints imposed by the choice of models used in the MAMPOSSt analysis. Also in this case we limit the analysis to the region 0.05 Mpc ≤ R ≤ r 200 .
The results of the Jeans inversion analysis are shown in Fig. 3. The results are similar to those obtained with MAM-POSSt. The marginal differences (always within 20%) between the σ r /σ θ profiles obtained by the two methods can be attributed to the limited number of β(r) models considered in MAMPOSSt.
There appears to be a trend of increasing β(r) at large radii with increasing ∆m 12 . This is confirmed by the values of β(r 200 ) reported in Table 2.
Near the centre, the situation is less clear. To better investigate the inner region, we repeat the Jeans inversion analysis by A&A proofs: manuscript no. main Fig. 3. Red solid curves and orange shadings: velocity anisotropy profile σ r /σ θ and 1 σ confidence regions (estimated from 100 bootstrap resampling) for the four samples, obtained from the Jeans equation inversion using the minimum-BIC MAMPOSSt M(r) (see Table 2). The dashed red curves indicate the solutions obtained including galaxies in the central < 0.05 Mpc regions. For comparison, the grey shadings reproduce the 1 σ confidence regions of the MAMPOSSt solutions shown in Fig. 2. Top-left panel: systems with ∆m 12 ≤ 0.5. Top-right panel: systems with 0.5 < ∆m 12 ≤ 1.0. Bottom-left panel: systems with 1.0 < ∆m 12 ≤ 1.5. Bottom-right panel: systems with ∆m 12 > 1.5. extending the analysed region to 0.0 ≤ R ≤ r 200 . The results are shown in Fig. 3 (dashed lines). Including the galaxies near the centre leads to decreasing velocity anisotropy near the centre in all samples, but the decrease is stronger in the systems with higher ∆m 12 . A possible explanation for this behaviour resides in the velocity segregation of BCGs that is stronger for systems with higher ∆m 12 , as found by Zarattini et al. (2019). Dynamical friction can decrease the velocities of galaxies but also make their orbits more isotropic if not tangential.

Systematics
Here we examine possible systematics affecting our result.
In low-z clusters, late-type/blue galaxies are observed to have more radially anisotropic orbits than early-type/red galaxies Munari et al. 2013;Mamon et al. 2019, e.g.). If the ∆m 12 > 1.5 systems contain a a larger fraction of late-type/blue galaxies compared to the systems that compose the other three samples, this could explain the higher values of β at large radii.
Lacking detailed studies of galaxy populations as a function of ∆m 12 in the literature, we here determine the galaxies g − r colour distribution in the four stacks. These are shown in Fig. 4. It can be seen that systems in the ∆m 12 > 1.5 bin do not show a larger amount of late-type/blue galaxies. If anything, they contain more red galaxies, mostly because of the deep spectroscopic follow-up of the S1 data set, that contributes most systems in the ∆m 12 > 1.5 bin. The S1 data set includes many dwarf galaxies, up to three magnitudes fainter than SDSS spectroscopy. Dwarf galaxies in clusters/groups are mostly early-type (e.g. Jerjen & Tammann 1997;Lisker et al. 2013), and therefore occupy the red tail of the g − r distribution.
Different β(r) have also been reported for galaxies of different stellar masses (Annunziatella et al. 2016) or luminosity . To check if a different magnitude distri- Fig. 4. g − r colour distribution of the galaxies in the four samples. Dotted black histogram and grey shading: ∆m 12 ≤ 0.5. Dashed red histogram and orange shading: 0.5 < ∆m 12 ≤ 1.0. Dash-dotted violet histogram and pink shading: 1.0 < ∆m 12 ≤ 1.5. Solid green histogram and turquoise shading: ∆m 12 > 1.5. bution could be at the origin of the different β(r) seen for the ∆m 12 > 1.5 stack, we repeat the MAMPOSSt analysis using only galaxies with r ≤ 17.77. This is the magnitude limit of the SDSS spectroscopy, and effectively exclude the tail of red dwarf galaxies from the two highest-∆m 12 samples, while leaving almost unchanged the other two samples. We find that the results for the β(r) of the four stacks do not change significantly when applying the magnitude cut r ≤ 17.77, with β(r 200 ) changing by < 5%.
The number of members is very different in different systems of our data set. To check that our result is not driven by a few very rich systems, we removed from the ∆m 12 > 1.5 sample the three richest clusters, FGS03, FGS27 and FGS30 that together contain almost 1/3 of all members in their sample (see Table A.4). We performed the full analysis on the remaining sample of 17 systems. The resulting σ r /σ θ profile is very similar to the original one based on all 20 systems (see Fig. 5).
The S1 and S2 data sets cover different z ranges, their median z are 0.16 and 0.07, respectively. This difference reflects in an increase of the <z> of the four samples with increasing ∆m 12 , since the higher-∆m 12 systems are mostly from the S1 data set. To check for a z-dependence of β(r) we divided our ∆m 12 > 1.5 sample in two subsamples, of ten clusters at z < 0.12 and another ten at z > 0.12. After performing the dynamical analysis separately on the two subsamples, we found no significant difference in their β(r), but the uncertainties are large due to the small size of the subsamples, so this test cannot be considered very significant.
There is independent support against the hypothesis that the <z> difference across the four samples can be the reason for the observed difference in β(r). The <z> range across the four samples correspond to 0.7 Gyr in cosmic time. This is only 25% of the dynamical time for a typical system of galaxies at z ∼ 0.1 (Sarazin 1986), and it is unlikely that galaxies could modify their orbits in such a short time. Moreover, there is no observational evidence for orbital evolution of cluster galaxies across the much larger cosmic time span from z = 1.32 to z = 0.26 (corresponding to 6 Gyr, Capasso et al. 2019b).

Discussion and conclusions
We analysed a data set of 97 galaxy clusters and groups to study the dependence of β(r), and therefore of the orbital distribution of galaxies, on ∆m 12 . We split our data set in four samples of different ∆m 12 . We then stacked together the systems in each of the four samples. We then run MAMPOSSt to derive the mass and the (parametric) anisotropy profiles of the four samples. Finally, with the mass profiles obtained from MAMPOSSt, we perform the inversion of the Jeans equation, allowing us to determine β(r) in a model-independent way.
We find that β(r) shows a steeper dependence on r for the systems with ∆m 12 > 1.5 than for the other three samples with smaller ∆m 12 . The orbits of galaxies in the ∆m 12 > 1.5 stack are more radial (with marginal significance, at more than 1σ level) at large radii (r 0.8 r 200 ) than in systems with smaller ∆m 12 . In the central regions the orbits of galaxies are nearly isotropic in all stacks, or even tangential at radii < 0.05 Mpc in the ∆m 12 > 1.5 stack. The tangential orbits found in the very central regions of these systems are related to the observed velocity segregation (Zarattini et al. 2019) in the same region, and can be interpreted as an effect of dynamical friction slowing down galaxies that approach their cluster centre.
Dynamical friction is thought to be more efficient for galaxies on radial orbits (Lacey & Cole 1993). As galaxies lose their kinetic energy due to dynamical friction, they can more easily merge with the central galaxy, and this is the process suggested by Sommer-Larsen et al. (2005) for the formation of large magnitude gaps in galaxy systems.
In Zarattini et al. (2015) we studied the dependence of the luminosity function (LF) on the magnitude gap. We found that systems with ∆m 12 > 1.5 are missing not only L * , but also dwarf galaxies. In fact, the faint end of their LFs is clearly flatter than that of ∆m 12 < 1.5 systems. Moreover, Adami et al. (2009) found that dwarf galaxies in Coma are located in radial orbits even in the central region of the cluster. Thus, we suggest that also the lack of dwarf galaxies in FGs could be linked to radial orbits. Unfortunately, deeper data are required to study the orbits of dwarf galaxies in FGs, studies that could be done in the near future with new wide-field spectroscopic facilities (e.g. the WEAVE spectrograph).
Galaxy systems are thought to evolve from an initial phase of rapid collapse characterised by isotropisation of galaxy orbits, to a phase of slow accretion characterised by radial orbits (Lapi & Cavaliere 2011). Major mergers operate in the same way as the initial phase of rapid collapse, introducing dynamical entropy in the system, and transferring angular momentum from clusters colliding off-axis to galaxies, leading to orbit isotropisation. The fact that ∆m 12 > 1.5 systems have more galaxies on radial orbits than smaller-∆m 12 systems therefore suggests a difference in the time since last major merger, and this supports the conclusions of Kundert et al. (2017) based on cosmological simulations.
However, although D' Onghia et al. (2005) found that radial orbits are required for the formation of FGs (see their Eq. 3), there is no clear evidence of correlation between the type of orbits and the magnitude gap in numerical simulations. Our observational confirmation of the presence of radial orbits in systems with large magnitude gaps could be a boost for the analysis of future simulations in order to explain such a difference.
In summary, our study is a first observational confirmation that galaxies in systems with large ∆m 12 have more radially elongated orbits than galaxies in systems with small ∆m 12 . Our samples contain a mix of pure FGs and normal systems. A substantial increase of the spectroscopic data set for FGs is needed to check if our results also apply for these systems as a separated class.