The ultra-diffuse galaxy NGC 1052-DF2 with MUSE: II. The population of DF2: stars, clusters and planetary nebulae

NGC 1052-DF2, an ultra diffuse galaxy (UDG), has been the subject of intense debate. Its alleged absence of dark matter, and the brightness and number excess of its globular clusters (GCs) at an initially assumed distance of 20 Mpc, suggested a new formation channel for UDGs. We present the first systematic spectroscopic analysis of both the stellar body and the GCs (six of which were previously known, and one newly confirmed member) of this galaxy using MUSE@VLT. Even though NGC 1052-DF2 does not show any spatially extended emission lines, we report the discovery of three planetary nebulae (PNe). We conduct full spectral fitting on the UDG and the stacked spectra of all GCs. The UDG's stellar population is old, 8.9$\pm$1.5 Gyr, and metal-poor, with [M/H] = $-$1.07$\pm$0.12 and little or no $\alpha$-enrichment. The stacked spectrum of all GCs indicates a similar age of 8.9$\pm$1.8 Gyr, but lower metallicity, with [M/H] = $-$1.63$\pm$0.09, and similarly low $\alpha$-enrichment. There is no evidence for a variation of age and metallicity in the GC population with the available spectra. The significantly more metal-rich stellar body with respect to its associated GCs, the age of the population, its metallicity and its $\alpha$-enrichment are all in line with other dwarf galaxies. NGC 1052-DF2 thus falls on the same empirical mass-metallicity relation as other dwarfs, for the full distance range assumed in the literature. We find that both debated distance estimates (13 and 20 Mpc) are similarly likely, given the three discovered PNe.


Introduction
Ultra diffuse galaxies (UDGs) are a particular type of low-surface-brightness galaxy, defined as having central surface brightnesses of $\mu_{g,0} > 24$ mag arcsec$^{-2}$ and sizes of $R_{\rm eff} > 1.5$ kpc (van Dokkum et al. 2015). Galaxies with such properties have been known for several decades (Sandage & Binggeli 1984; Impey et al. 1988; Dalcanton et al. 1997; Conselice et al. 2003), but their particularly high abundance in galaxy clusters has drawn attention in the last few years (e.g. van Dokkum et al. 2015; Koda et al. 2015; Mihos et al. 2015; Muñoz et al. 2015; van der Burg et al. 2016). UDGs are now also routinely identified in groups and in the field (Román & Trujillo 2017; van der Burg et al. 2017; Shi et al. 2017; Müller et al. 2018). To explain their high abundance in over-dense regions, such as the Coma cluster, van Dokkum et al. (2015) proposed that UDGs may be hosted by massive, Milky Way (MW)-like dark matter (DM) halos that could protect them from environmental effects. One UDG in particular, DF44, was measured to have a stellar velocity dispersion consistent with a $10^{12}$ M$_\odot$ halo (van Dokkum et al. 2016). In addition, the empirical linear relation observed between the mass of the globular cluster (GC) system and the halo mass (Blakeslee et al. 1997; Peng et al. 2004; Harris et al. 2017) allows one to use the GC system mass to assess the DM content of UDGs. The high number of GCs around DF44 (∼100) would support the hypothesis that it is hosted by a very massive DM halo, along with a few other UDGs with a GC excess. But most UDGs have GC systems typical of dwarf galaxy DM halos (Beasley & Trujillo 2016; Amorisco et al. 2018; Lim et al. 2018). This is in line with a stacked weak-lensing study performed by Sifón et al. (2018), showing that not all UDGs can have halo masses similar to those estimated for DF44. Formation scenarios need to explain how galaxies with similar masses and morphologies may be hosted in a broad variety of DM halo masses.
Di Cintio et al. (2017) suggested the possibility that internal processes (i.e. gas outflows associated with feedback) can, under some circumstances, kinematically heat the distribution of stars and form very extended systems similar to UDGs. An early cessation of star formation at z ∼ 2 would render their stellar masses, and associated surface brightness, low (cf. Yozin & Bekki 2015). At different quenching times, such a scenario results in UDG-like galaxies with low metallicities ($-1.8 \lesssim$ [Fe/H] $\lesssim -1.0$) and a range of ages (Chan et al. 2018). This is supported by photometric (Pandya et al. 2018) and spectroscopic observations (Kadowaki et al. 2017; Gu et al. 2018; Ferré-Mateu et al. 2018; Ruiz-Lara et al. 2018) of UDGs. The general consensus is that UDGs have stellar populations that are typically old (> 9 Gyr) and metal-poor ([M/H] ∼ −0.5 to −1.5). Moreover, these studies found that the UDGs' stellar masses and stellar metallicities fall on the empirical relation found for dwarf galaxies (Kirby et al. 2013). They conclude that UDGs are most likely the result of both internal processes, such as bursty star formation histories (SFH) or high-spin halos (Amorisco & Loeb 2016; Rong et al. 2017), and environmental effects such as tidal disruption (Collins et al. 2013; Yozin & Bekki 2015). One may note that UDGs may also form in tidal debris (see e.g. Kroupa 2012; Duc et al. 2014; Bennet et al. 2018).
To reconcile both the discovery of UDGs with exceptional characteristics such as DF44 and the average properties of typical UDGs, several different formation channels need to be invoked. Most stellar population studies have targeted "ordinary" UDGs, with typical dwarf galaxy DM haloes. While such galaxies may be well represented in current hydrodynamical simulations (see e.g. Chan et al. 2018), an open question is how more extreme cases (for instance UDGs with an extremely high, or low, halo mass for their stellar mass) have formed.
Of particular recent interest is the UDG NGC 1052-DF2 (hereafter DF2), which may have a special formation channel. Using the velocities of 10 GCs associated with DF2, van Dokkum et al. (2018b) claimed a low total mass that is consistent with the stellar mass only (however see Martin et al. 2018; Famaey et al. 2018; Laporte et al. 2018, for a re-analysis). Hypotheses put forward by van Dokkum et al. (2018b) suggest that DF2 may have been formed by gas ejected by tides following a merger or quasar winds from the massive elliptical NGC 1052, whose projected distance is only 14′, or ∼80 kpc at 20 Mpc distance.
A second striking feature of this galaxy is its GC system. DF2 has 12 confirmed GCs (Emsellem et al., submitted), an unusually large population when compared to normal dwarf galaxies (Lim et al. 2018;Amorisco et al. 2018). This is at odds with the DM deficiency, as explained above. These GCs are also very luminous: their absolute magnitudes are similar to those of the most massive Milky Way GCs at an assumed distance of 20 Mpc (van Dokkum et al. 2018b). Trujillo et al. (2018) advocated for a closer distance of 13 Mpc, and showed that DF2 and its GCs would then fall on the same empirical relation as other UDGs. The exact distance of DF2 is still debated (van Dokkum et al. 2018c). The new GC candidates associated with DF2 from Trujillo et al. (2018), if confirmed, would move the peak of the GC luminosity function towards fainter magnitudes and alleviate the issue of 'too bright' GCs. This would further increase the discrepancy between the DM halo mass estimated through GC kinematics and that from the GC abundance. In a companion paper (Emsellem et al., submitted, hereafter Paper I), we have indeed confirmed one new candidate GC from Trujillo et al. (2018).
In this series of two papers, we study DF2 with MUSE observations taken at the VLT. Thanks to the field of view of this integral field spectrograph, we are able to simultaneously probe the stellar body of the UDG and seven bright associated GCs, for the first time. While Paper I focuses on the kinematics of the UDG, this paper presents a stellar population analysis of this galaxy and its associated GCs.
In Section 2 we present the data reduction, sky removal and extraction of spectra. We estimate the age and metallicity of the stellar body and the GCs in Section 3. We report the discovery of three planetary nebulae in Section 4. We discuss the origin and the distance of the UDG and its association with the surrounding GCs in Section 5. The conclusions are in Section 6.

Data
The observations, reduction and flux extraction procedures are described in detail in Paper I. In the following we summarize the main points.

Observation & Reduction
MUSE observations of NGC 1052-DF2 were conducted via two ESO-DDT programs (2101.B-5008(A) and 2101.B-5053(A), PI: Emsellem) between July and November 2018, amounting to a total of ∼5.1 h on-target integration time. We obtained 28 individual exposures with slight dithers and rotations to account for systematics due to the slicers. We deliberately offset the MUSE field by ∼8″ with respect to the centre of the galaxy (cf. Fig. 1) to include an area where the surface brightness of the UDG is several magnitudes fainter than in the centre, which is used for the sky removal.
The OBs were all reduced using the latest MUSE esorex pipeline recipes (2.4.2). The reduction follows the standard steps. As the object is very faint and standard sky subtraction was not sufficient to recover a good quality signal, the full sky subtraction was done with the principal component analysis-based software Zurich Atmosphere Purge (ZAP, Soto et al. 2016). The principal components, or eigenspectra, are derived from the outermost regions of the MUSE object cube, where the sky is most dominant: the sky region is defined by excluding the bright sources and an ellipse centered on the UDG (see Paper I). In the following, we use as fiducial datacube the output of the ZAP procedure with an ellipse of circularized radius 30″, 45 eigenvalues and 50 spectral bins for the continuum filter. We discuss the effect of these parameters on the results in Section 3.2. The final data set, rendered in a mock HST broadband color image, using the same filters as in van Dokkum et al. (2018b), is shown in Fig. 1.

Extraction of spectra
The detection of the sources is described in Paper I.
We first create a spatial mask, presented in Paper I, to remove the background and foreground objects surrounding the UDG. We extract the spectrum of the UDG by summing each channel of the masked cube with a spatial weight corresponding to the flux of the UDG in the HST F814W image.
The GC and PN spectra are extracted with a Gaussian weight function to provide a S/N-optimized extraction. The full width at half-maximum is set to ∼0.8″ to approximately match the point spread function. The background is measured locally with identical apertures in eight nearby locations that do not overlap with identified sources. In each channel we obtain the source flux by subtracting the median of the sky apertures from the weighted sum of the source spectrum. The dominant source of uncertainty is taken from the scatter in the sky spectrum values. The relative velocities of the GCs are small (see Paper I), thus we do not correct for them when stacking. Contrary to van Dokkum et al. (2018a), we do not weight each GC by its SNR. This would provide us with the highest reachable SNR, but the brightest source, GC73 (see Fig. 1), would dominate the stack.
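The extraction described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the paper: the function names, the cube layout `(n_channels, ny, nx)` and the choice of normalising the weights to unit sum are assumptions. For a pixel scale of 0.2″ (an assumption about the instrument mode), a 0.8″ FWHM corresponds to `fwhm_pix = 4`.

```python
import numpy as np

def gaussian_weights(shape, x0, y0, fwhm_pix):
    """2D Gaussian weight map centred on (x0, y0), normalised to unit sum."""
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    yy, xx = np.indices(shape)
    w = np.exp(-0.5 * ((xx - x0) ** 2 + (yy - y0) ** 2) / sigma ** 2)
    return w / w.sum()

def extract_spectrum(cube, src_xy, sky_xy_list, fwhm_pix):
    """Weighted source spectrum minus the per-channel median of sky apertures.

    cube: array of shape (n_channels, ny, nx).
    Returns (flux, err), where err is the scatter between the sky apertures.
    """
    ny, nx = cube.shape[1:]

    def weighted_sum(xy):
        w = gaussian_weights((ny, nx), xy[0], xy[1], fwhm_pix)
        # Contract the two spatial axes of the cube with the weight map.
        return np.tensordot(cube, w, axes=([1, 2], [0, 1]))

    source = weighted_sum(src_xy)
    sky = np.array([weighted_sum(xy) for xy in sky_xy_list])
    flux = source - np.median(sky, axis=0)
    err = np.std(sky, axis=0)
    return flux, err
```

In practice the sky positions would be the eight nearby source-free locations mentioned in the text; the sketch accepts any number of them.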
To estimate the physical spread in the different parameters (age, [Fe/H], α-enrichment) in the GC population we also create 100 bootstrapped spectra. These are new spectra constructed by adding together seven GC spectra that are randomly picked from the sample with replacement.
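A minimal sketch of this bootstrap, assuming the individual GC spectra are already on a common wavelength grid and stored as rows of an array (the function name and the fixed seed are illustrative choices):

```python
import numpy as np

def bootstrap_stacks(spectra, n_boot=100, seed=0):
    """Build n_boot stacked spectra by resampling the input spectra.

    Each stack is the sum of len(spectra) spectra drawn at random
    with replacement from the sample.
    """
    spectra = np.asarray(spectra, dtype=float)
    n_src = spectra.shape[0]
    rng = np.random.default_rng(seed)
    # One row of source indices per bootstrap realisation.
    idx = rng.integers(0, n_src, size=(n_boot, n_src))
    return spectra[idx].sum(axis=1)
```

Each row of the returned array can then be fitted in the same way as the real stack, and the scatter of the fitted parameters gives a handle on the cluster-to-cluster variation.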

DF2's stellar populations: stellar body and GCs
We show in Fig. 2 the spectrum obtained for the UDG and the stack of all GCs. We note strong Balmer and calcium triplet (CaT) absorption lines, plus shallower absorption lines such as Mg and Fe. We do not detect any emission lines. This is consistent with the non-detection of atomic gas, which implies a stringent upper limit on the gas fraction of DF2 (below 2%, see Chowdhury 2019). Around the Hα line, we estimate a signal-to-noise ratio (SNR) of 62 pix$^{-1}$ for DF2 and 72 pix$^{-1}$ for the stack of all GCs (see Paper I).

Fitting procedure
We use the fitting routine pPXF (Cappellari & Emsellem 2004;Cappellari 2017) combined with the eMILES library (Vazdekis et al. 2016). The details of the fitting procedure are given in Paper I. In the following we summarize the main points of the procedure.
As template spectra we use the eMILES single stellar populations (SSPs) with a Kroupa (2001) initial mass function (IMF) and the Padova 2000 (Girardi et al. 2000) isochrones, which were shown to perform well in the expected regime of old and low-metallicity stellar populations (Conroy et al. 2009). The original range of metallicity values being rather sparse (only seven metallicities covering [Fe/H] from −2.32 to 0.22 with logarithmic spacing), we linearly interpolate for sixteen more metallicity values, between [Fe/H] = −2.32 and −0.71, following Kuntschner et al. (2010).
To avoid being biased by the flux calibration differences between our MUSE data and the eMILES library, we make use of multiplicative polynomials during the fit (Cappellari 2017). For the study we chose to allow for a 12-degree Legendre multiplicative polynomial and the impact of changing the degree is discussed in Sect. 3.2.

Ages and metallicities - Fitting method
We estimate the stellar population parameters by fitting single stellar populations (SSPs) to our spectra. The parameter uncertainties are derived from fitting 100 new spectra, constructed by adding the randomly shuffled residuals to the best fit. The best SSP fits for the UDG and GC stack are shown in Fig. 2. It should be noted that the first CaT line is masked during the fit (as in van Dokkum et al. 2018b), because it is located in a region affected by sky residuals. It is nonetheless well recovered by the fits, which independently shows that the sky subtraction did not affect these lines.
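The residual-shuffling step used for the uncertainties can be sketched as below. This only builds the resampled spectra; refitting each of them (e.g. with pPXF) and taking the scatter of the resulting ages and metallicities is left to the caller. The function name and seed are illustrative assumptions.

```python
import numpy as np

def residual_shuffle_realizations(spectrum, best_fit, n_real=100, seed=0):
    """Create n_real mock spectra: best fit plus randomly shuffled residuals."""
    spectrum = np.asarray(spectrum, dtype=float)
    best_fit = np.asarray(best_fit, dtype=float)
    residuals = spectrum - best_fit
    rng = np.random.default_rng(seed)
    # Each realisation permutes the residuals across wavelength pixels.
    return np.array([best_fit + rng.permutation(residuals)
                     for _ in range(n_real)])
```

Shuffling preserves the amplitude distribution of the residuals but not any wavelength-correlated structure, which is a known limitation of this kind of resampling.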
The location of the UDG and the GCs in the age-metallicity plane is shown in Fig. 3 with the estimation of age and metallicity for the stack of GCs from van Dokkum et al. (2018a), along with their 1σ error bars. For the UDG we find a best fitting age of 8.9 ± 1.5 Gyr and metallicity [M/H] = −1.07 ± 0.12. For the full stack of GCs, we find a best fitting age of 8.9 ± 1.4 Gyr and metallicity of [M/H] = −1.63 ± 0.09. The parameters of the best fits are not sensitive to a change of the degree of the multiplicative polynomial between 11 and 15, nor to a change of parameters in the ZAP procedure (masked radius of 30″ or 36″, number of eigenvalues of 30, 45 or 50, and continuum filter window size of 30 and 50 Å).
It should be noted that our method does not consider the detailed continuum shape to derive the parameters, because of the use of multiplicative polynomials. In order to check the consistency of our estimates with the broad-band colors, we compute the AB magnitude color of the eMILES templates in F606W−F814W. The colors of the two best-fit templates, 0.40 mag for the UDG and 0.35 mag for the GC stack, agree with the colors computed by van Dokkum et al. (2018a): 0.37 ± 0.05 mag and 0.35 ± 0.02 mag for the UDG and the GC stack, respectively.
The age and metallicity estimated for the GC stack are consistent within 1σ for the age and 2σ for the metallicity with the values obtained by van Dokkum et al. (2018a): an age of $9.3^{+1.3}_{-1.2}$ Gyr and [Fe/H] = −1.35 ± 0.12. We obtain a lower metallicity for our GC stack, but it should be noted that we do not use the same stellar libraries for the fits, and the IMF they assume is not described. Furthermore, the spectral region studied in van Dokkum et al. (2018a) extends further into the blue compared to our MUSE data, where different spectral diagnostics contribute to the fit. Even though these differences may drive systematic shifts between the parameters measured in different studies, we note that in this work we make a direct comparison between the GCs and the stellar body from a single data set, with similar SNR and assumptions. Even though the exact age and metallicity may be affected by different systematics, the relative differences between GCs and stellar body are significant and robust.
To quantify the spread in age and metallicity inside the GC population, we apply our method to the bootstrapped spectra. The median of the 100 realisations has the same age and metallicity as the GC stack. The error bar shows the propagation of the measurement error and the dispersion of the results in the bootstrap sample. We obtain an age of 8.9 ± 2.1 Gyr and a metallicity of −1.63 ± 0.11. These uncertainties are of the same order of magnitude as the error on the parameter estimates themselves, meaning that there is no significant evidence for a spread in properties between the individual clusters.
Finally, we used pPXF on three radial sectors of the UDG: inside 0.5 $R_e$, between 0.5 and 1 $R_e$ and between 1 and 1.5 $R_e$, where $R_e$ is the effective radius of DF2 (see Paper I). The best fits all have the same age, but the central sector has a higher metallicity estimate: [Fe/H] = −1.07 ± 0.12, compared to [Fe/H] = −1.19 ± 0.12 and −1.19 ± 0.14 for the two outer sectors, respectively. We thus note that the higher metallicity found in the center of DF2 is not significant and that the metallicity gradient in DF2 is consistent with being flat.
Fensch, van der Burg, Jeřábková, Emsellem, et al.: NGC 1052-DF2: II. The population of DF2
Fig. 3. Location of the best fit and 1σ error bars in the age-metallicity plane for the UDG and the GC stack. The result of the study by van Dokkum et al. (2018a) is shown in black. In orange is shown the location of the median age and metallicity of the GC bootstrap sample. Note that the orange error bar is both a measure of the error of the fit and an estimate of the physical parameter spread intrinsic to the GC sample.

Ages and metallicities - Spectral indices
We use a complementary method to estimate ages and metallicities based on the measurement of spectral line indices. In the following we work in the standardized Lick/IDS system (Worthey et al. 1994), and we list several key diagnostics in terms of age and metallicity in Table 1. Two diagnostics are shown in Fig. 4. The over-plotted grids of theoretical Lick indices of SSPs, based on the MILES spectral library, are obtained from Thomas et al. (2010).
To study the α-enrichment of the GCs and the UDG, we plot in the left panel of Fig. 4 Mg b, as a probe of the α elements, and $\langle$Fe$\rangle$ (the average of Fe λ5270 and Fe λ5335, following Evstigneeva et al. 2007). The α-enrichment of the GC stack and the UDG are not well constrained due to the small separation of iso-[α/Fe] lines in the metal-poor regime. Still, the diagnostics indicate a slight α-element enrichment: [α/Fe] from 0 to 0.15 for the UDG and from 0 to 0.3 for the stack of GCs. This latter value is consistent with the value derived by van Dokkum et al.
The right panel of Fig. 4 shows the age-sensitive index Hβ versus [MgFe] $= [\mathrm{Mg}\,b \times (0.72\,\mathrm{Fe}\,\lambda5270 + 0.28\,\mathrm{Fe}\,\lambda5335)]^{1/2}$ (which probes the total metallicity, following Evstigneeva et al. 2007). The Lick indices suggest a slightly higher metallicity than our full spectral fitting method indicates. However, they confirm the trend given by the first method that DF2 and the stack of all GCs have similar ages but that the UDG has a higher metallicity.
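For reference, the two composite indices used in Fig. 4 can be computed from the individual Lick indices as in the short sketch below. The function names are illustrative, and the input values in the usage note are placeholders rather than measurements from Table 1.

```python
import math

def mean_fe(fe5270, fe5335):
    """<Fe>: average of the Fe5270 and Fe5335 Lick indices (in Angstrom)."""
    return 0.5 * (fe5270 + fe5335)

def mgfe(mgb, fe5270, fe5335):
    """[MgFe] = sqrt(Mg b * (0.72 Fe5270 + 0.28 Fe5335)), in Angstrom."""
    return math.sqrt(mgb * (0.72 * fe5270 + 0.28 * fe5335))
```

For example, `mgfe(2.0, 2.0, 2.0)` evaluates the composite index for three equal placeholder index values of 2 Å.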
Even though measurements of Lick indices from individual GC spectra are imprecise given the noise in our spectra, we indicate the cluster-to-cluster variation by showing a distribution of bootstrapped stacks. The scatter of these realizations is represented by the second error quoted in Table 1. We note that the bootstrap uncertainties are similar to the formal statistical uncertainties (first errors quoted). Since the bootstrap error, by construction, also includes the statistical uncertainty on the measurement, the fact that the two are of similar magnitude confirms that there is no significant spread in the properties of the individual clusters.
The two different methods indicate that DF2's stellar population has the same old age as the GCs, around 9 Gyr, and is significantly more metal-rich, by around 0.5 dex.

DF2's planetary nebulae
The spectra of the three detected PNe are shown in Fig. 5. Their kinematic association with the stellar body of the UDG is confirmed in Paper I. We see strong emission from the [O iii] doublet and Hα lines. However, Hβ and [N ii] are not detected for any of the PNe, which prevents us from computing their intrinsic extinction or metallicity. The measurement of their apparent 5007 Å magnitude is given in Table 2. It is defined as:

$$m_{5007} = -2.5 \log_{10} F_{5007} - 13.74,$$

where $F_{5007}$ is the integrated flux in the second [O iii] line in erg s$^{-1}$ cm$^{-2}$. We check our flux calibration by comparing the fluxes of our GCs with those measured by Trujillo et al. (2018) from HST observations. We find that the fluxes from the MUSE cube are brighter by 0.064 ± 0.079 mag. We neglect this calibration difference and use the flux calibration from MUSE.
We assume a foreground extinction of 0.076 ± 0.006 mag, corresponding to the mean of the extinction values computed for the line of sight of NGC 1052 (Schlegel et al. 1998; Schlafly & Finkbeiner 2011). The uncertainty includes a propagation of the uncertainties on the foreground extinction, the flux calibration and the flux measurement. The latter is obtained by re-noising the spectrum with Gaussian noise with a dispersion measured in the continuum red-ward of the [O iii] line. Hβ is not detected in any of the PNe and the SNR prevents us from inferring a meaningful lower limit on the extinction. Thus, we did not correct for internal extinction.

van Dokkum et al. (2018b) argued that DF2 was DM-deficient and a very different system from other galaxies, in particular from UDGs that were routinely shown to be hosted by dwarf to MW sized DM halos (see Introduction). In the following we discuss whether DF2 also stands out in terms of its stellar populations.
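The conversion from line flux to apparent and absolute magnitude can be sketched as follows. The zero point of −13.74 is the conventional one for $m_{5007}$ and is assumed here, as is the simple subtraction of the 0.076 mag foreground extinction adopted in this section; with these assumptions the absolute magnitudes of Table 2 are recovered to within rounding.

```python
import math

def m5007(flux_cgs):
    """Apparent [O III] 5007 magnitude from a flux in erg/s/cm^2.

    Uses the conventional zero point of -13.74 (assumed here).
    """
    return -2.5 * math.log10(flux_cgs) - 13.74

def distance_modulus(d_mpc):
    """Distance modulus for a distance given in Mpc."""
    return 5.0 * math.log10(d_mpc * 1.0e6) - 5.0

def absolute_mag(m_app, d_mpc, extinction=0.076):
    """Extinction-corrected absolute magnitude at distance d_mpc."""
    return m_app - extinction - distance_modulus(d_mpc)
```

For instance, `absolute_mag(28.4, 13.0)` and `absolute_mag(28.4, 20.0)` give the two PN1 entries of Table 2 for the two debated distances.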

In terms of stellar populations
In Section 3.2, we estimated the age, the metallicity and the α element enrichment of DF2 and its GCs.
We found that the stellar population of DF2 is old, around 9 Gyr. It should be noted that our age estimate should be taken as a lower limit, as blue horizontal branch stars could bias our age estimate towards lower ages (Schiavon 2007; Conroy et al. 2018).

Table 2.
       m_5007          M_5007 at 13 Mpc    M_5007 at 20 Mpc
PN1    28.4 ± 0.05     -2.24 ± 0.05        -3.18 ± 0.05
PN2    29.32 ± 0.14    -1.32 ± 0.14        -2.26 ± 0.14
PN3    29.91 ± 0.16    -0.73 ± 0.16        -1.67 ± 0.16

To study the metallicity of DF2, we show in Fig. 6 the location of the UDG in the mass-metallicity plane along with data from previous studies of quiescent UDGs. We indicate two different stellar masses for DF2: the one inferred for a distance of 20 Mpc ($2-3 \times 10^{8}$ M$_\odot$; van Dokkum et al. 2018b) and the other for a distance of 13 Mpc ($(6 \pm 3) \times 10^{7}$ M$_\odot$; Trujillo et al. 2018). We see that DF2 has a similar metallicity to the other UDGs previously studied, and falls on the empirical relation for dwarf galaxies from Kirby et al. (2013) for both mass estimates. We note that our data provide us with a much tighter metallicity estimate than most of the ones available in the literature.
This stellar mass-metallicity relation is interpreted as an effect of self-enrichment. The more massive a galaxy, the fewer metals are lost to galactic winds launched by star-formation feedback (Kirby et al. 2013). The mass-metallicity relation may also result from the galaxy-wide stellar IMF becoming systematically top-lighter with decreasing baryonic mass or star formation rate, as shown to be the case using the IGIMF theory (Köppen et al. 2007; Recchi et al. 2015).
Fig. 6. Mass-metallicity plane, with data from previous studies of quiescent UDGs (Gu et al. 2018; Ferré-Mateu et al. 2018; Ruiz-Lara et al. 2018; Pandya et al. 2018). We show the location of DF2 for two mass estimates, corresponding to the two distance estimates of 13 and 20 Mpc (see text). The empirical mass-metallicity relations for low-mass and high-mass systems (from Kirby et al. 2013 and Gallazzi et al. 2005, respectively) are shown in gray and orange.
If one assumes that DF2 is DM-deficient (van Dokkum et al. 2018b), one would then expect DF2 to be an outlier of the relation, with a lower metallicity than galaxies with the same stellar mass, which typically have a halo mass of $10^{10}$ M$_\odot$ (see e.g. Read et al. 2017). However we see that, for the assumed distance of 20 Mpc, corresponding to the DM-deficiency hypothesis, DF2 lies within the scatter of the relation. Even though the scatter of the relation is quite large (∼1 dex), DF2 has a higher metallicity than DF44 ([Fe/H] = −1.3 ± 0.4, see Gu et al. 2018), which has a similar stellar mass and an over-massive DM halo (∼$10^{12}$ M$_\odot$, see van Dokkum et al. 2016).
A first possibility could be that DF2 had a larger stellar mass than today and gradually lost part of it due to stripping. This stripping would not modify the metallicity of DF2, but move its location in this plot horizontally towards lower stellar mass and thus closer to the relation. This process, which could explain the location of some dwarfs above the stellar mass-metallicity relation (see the case of Antlia2, Torrealba et al. 2018), could also move a metal-deficient UDG closer to the relation. Furthermore, Trujillo et al. (2018) note a significant brightening of DF2 in the Northern region in ultra-deep g-band Gemini data, which might be a trace of a past stripping event. We note that the stripping of the stars only begins when most of the DM mass is already lost (around 90%, see e.g. Peñarrubia et al. 2008). Stellar mass stripping could then fit in the hypothesis of a DM-deficient galaxy. However, such a stripping scenario should also affect the GC system, which should be stripped, or at least heated kinematically (Smith et al. 2013). This does not seem consistent with either the number (see next subsection) or the low velocity dispersion of the GCs associated with DF2.
A second possibility is that the gas of DF2 was already enriched in metals. This could be the case if DF2 was formed through tidally stripped material. We discuss this possibility in detail in Sect. 5.3.
Overall, DF2 shows a stellar population typical of quiescent UDGs. Its location in the mass-metallicity plane, which is very similar to that of dSphs, is not what one would expect for a DM-free galaxy. This could be a hint to the origin of this galaxy.

In terms of GC systems
In Fig. 3 we saw that the metallicity of the GCs surrounding DF2 is significantly lower than that of the UDG, by around 0.5 dex. Lotz et al. (2004) found that field stars in 45 local dEs are typically 0.1-0.2 mag redder than their GCs, which they interpreted as a legacy of different star formation events and/or different metallicities. This color mismatch seems to be lower for UDGs (less than 0.05 mag for DF17 and DF2; van Dokkum et al. 2018a). In the case of DF2, we can show that this is driven by the stellar body being more metal-rich than the GCs. This is typical for dwarf galaxies of similar masses, including the Fornax dSph, which has an excess of GCs (see e.g. Cole et al. 2012; Larsen et al. 2014).
In the left panel of Fig. 4 we see that the α-enrichment of the GCs is between [α/Fe] = 0 and 0.3. These are also typical values for GCs in dwarfs, whose GCs are known to be less α-enriched than those in more massive galaxies (Sharina et al. 2010). Thus the stellar populations of GCs around DF2 do not seem to deviate from previously known systems.
DF2 seems to have a rather high specific frequency of GCs compared to other UDGs ($S_N$ above 11, see van Dokkum et al. 2018a). Studies have shown that the $S_N$ of UDGs varies dramatically from galaxy to galaxy and is on average higher than in dwarf galaxies (Amorisco et al. 2018; Lim et al. 2018).
Moreover, we note that the fraction of light that is in GCs for DF2 is similar to that of other UDGs (such as DF 17, see van Dokkum et al. 2015;Peng & Lim 2016). The only feature by which the GC system of DF2 differs from other GC systems, and which remains unexplained, is that the peak magnitude of the GC luminosity function is unusually high if one assumes a 20 Mpc distance (van Dokkum et al. 2018a). We note that Trujillo et al. (2018) found that the GC luminosity function of DF2 is standard, if located at a distance of 13 Mpc.

In terms of PNe
This is the first time that PNe have been discovered around a UDG, thanks to the use of an integral field unit (IFU) spectrograph with good spatial resolution. One may compare our number of detections with an estimate of the expected number of PNe for such a system.
The total number of PNe per bolometric luminosity of the host galaxy is parametrized as $\alpha = N_{\rm PN}/L_{\rm bol}$. We define $\alpha_{2.5}$ as the number of PNe in the brightest 2.5 mag of the PNLF per bolometric luminosity. While stellar evolutionary models still have difficulties in reproducing the constancy of the PNLF bright cut-off in galaxies of different morphology (see e.g. Marigo et al. 2004), the study of the luminosity-specific PN numbers (the α parameter) in external galaxies (Buzzoni et al. 2006) provides a way of estimating the expected number of PNe in a galaxy. A typical α for metal-poor populations is ∼$3 \times 10^{-7}$ PN per $L_{\rm bol}/L_\odot$. The three detected PNe are probably in the brightest 2.5 mag of the PNLF, and, using the standard PNLF, $\alpha_{2.5} \approx \alpha/10$. So if $L_{\rm bol} \approx 6 \times 10^{6}$ to $10^{8}$ L$_\odot$ for DF2, then our 3 PNe imply α ∼ $3 \times 10^{-8}$ to $5 \times 10^{-7}$, in reasonable agreement with expectations for a metal-poor stellar population (Buzzoni et al. 2006). Note that our field of view does not cover all the outskirts of DF2, where other PNe may be found. Thus DF2 does not seem to have a different PNe formation rate than other systems.

What is the distance to DF2?
The distance of DF2 is subject of a yet unsettled debate. Indeed, as noted in van Dokkum et al. (2018b), a shorter distance would give a smaller stellar mass and increase the DM mass needed to recover the velocity dispersion measured.
van Dokkum et al. (2018b) computed a distance of 19.0 ± 1.7 Mpc from the surface brightness fluctuations (SBF) of the stellar body of DF2 and adopted a nominal distance of 20 Mpc. This distance was confirmed by an independent team, using the same technique (Blakeslee & Cantiello 2018). Trujillo et al. (2018) claim that the calibration used by van Dokkum et al. (2018b) is only valid for colors redder than that of DF2, and that the extrapolation to bluer colors is not trivial. They use five different redshift-independent methods to compute the distance of DF2 which all give consistent result of ∼13 Mpc. For such a distance, the measured velocity dispersion cannot be achieved without a significant DM content. van Dokkum et al. (2018c) demonstrated that the tip of the red giant branch (TRGB) stars may be blended in the HST images. By using a megamaser-TRGB-SBF distance ladder they find a new estimate of the distance of 18.7±1.7 Mpc, which is consistent with their first distance estimate.
Another reliable distance estimator at these distances is the bright abrupt cut-off of the PNLF, whose absolute magnitude is almost independent of galaxy type, at around M = −4.51 mag (see Ciardullo 2012, for a recent review). However, a trend towards a fainter cut-off magnitude in low metallicity galaxies is expected from theoretical models (Dopita et al. 1992;Schönberner et al. 2010), which is confirmed by observations (see e.g. Ciardullo 2012). Unfortunately, low-metallicity objects are usually not very massive and do not have enough PNe to sample well the PNLF. Hence, the metallicity dependence of M is hard to probe at the low metallicity end. In particular, the Dopita et al. (1992) theoretical relation was not confirmed at metallicities lower than that of the SMC. The cut-off magnitude of the low metallicity SMC and NGC 55 are estimated to be around M = −4.10 (see review by Ciardullo 2012). The SNR does not allow us to detect the [NII] line in the spectra of the PNe for a direct metallicity estimate of the PNe. If one extrapolates the Dopita et al. (1992) relation to the stellar metallicity of 1/10th solar for DF2, derived in Section 3.2, one would expect a cut-off magnitude of M = −3.67 for the PNLF of DF2.
Our IFU observations allowed us to find three PNe, which is a first for UDGs. To quantify how much these three PNe constrain the distance, we perform a maximum likelihood estimation (MLE), using the PNLF from Ciardullo et al. (1989). For a cut-off magnitude M*, the number of PNe with absolute magnitude M follows $N(M) \propto e^{0.307 M}\,\left[1 - e^{3(M^{*} - M)}\right]$. The likelihood function L is built from this distribution, with µ the distance modulus, m i the apparent magnitude of each PN, and m l the completeness limit. We set as completeness limit an [O iii] emission line peaking at three times the local rms measured for the PNe. This gives m l = 30.64 mag. We minimize −ln(L) by varying µ and M*. We define the 1-σ and 3-σ error ranges as the ranges of parameters for which ∆ ln(L) < 0.5 and 4.5, respectively. In Fig. 7 we show the result of the MLE. For M* = −4.51, the distance that maximizes the likelihood is 19.0 Mpc, and the 1-σ and 3-σ upper limits are respectively 33.3 and 38.1 Mpc. These values decrease for fainter M*. In particular, for M* = −3.67, which is the value expected from Dopita et al. (1992) for 1/10th solar metallicity, the distance that maximizes the likelihood is 12.9 Mpc, and the 1-σ and 3-σ upper limits are respectively 22.6 and 25.9 Mpc.
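The procedure can be sketched numerically as follows. This is a minimal illustration and not the code used for Fig. 7: it assumes the standard normalised likelihood for a magnitude-limited sample, L = Π N(m i − µ) / ∫ N(M) dM with the integral running from M* to m l − µ, and the three apparent magnitudes in `M_OBS` are hypothetical placeholders rather than the measured values. All function names are introduced here for illustration.

```python
import numpy as np

A_SLOPE = 0.307  # exponential slope of the Ciardullo et al. (1989) PNLF
M_LIM = 30.64    # completeness limit quoted in the text (mag)
# HYPOTHETICAL apparent magnitudes, for illustration only (not the measured PNe).
M_OBS = np.array([27.6, 28.2, 28.9])

def pnlf(M, M_star):
    """Un-normalised PNLF; zero at and brighter than the cut-off M_star."""
    M = np.asarray(M, dtype=float)
    x = np.maximum(M - M_star, 0.0)  # magnitudes fainter than the cut-off
    return np.exp(A_SLOPE * M) * (1.0 - np.exp(-3.0 * x))

def log_likelihood(mu, M_star, m_obs=M_OBS, m_lim=M_LIM, n_grid=4001):
    """ln L for distance modulus mu under the ASSUMED normalised form."""
    M_abs = m_obs - mu    # absolute magnitudes implied by mu
    M_faint = m_lim - mu  # completeness limit in absolute magnitude
    if np.any(M_abs <= M_star) or M_faint <= M_star:
        return -np.inf    # a PN brighter than the cut-off is not allowed
    grid = np.linspace(M_star, M_faint, n_grid)
    y = pnlf(grid, M_star)
    norm = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid))  # trapezoidal rule
    return float(np.sum(np.log(pnlf(M_abs, M_star))) - len(M_abs) * np.log(norm))

def best_mu(M_star, mu_grid=np.arange(28.0, 33.0, 0.01)):
    """Distance modulus maximising ln L on a regular grid."""
    lnL = np.array([log_likelihood(mu, M_star) for mu in mu_grid])
    return float(mu_grid[np.argmax(lnL)])

def modulus_to_mpc(mu):
    """Convert a distance modulus to a distance in Mpc."""
    return 10.0 ** ((mu - 25.0) / 5.0)

if __name__ == "__main__":
    for M_star in (-4.51, -3.67):
        mu = best_mu(M_star)
        print(f"M* = {M_star:+.2f}: best mu = {mu:.2f}, d = {modulus_to_mpc(mu):.1f} Mpc")
```

Under this assumed form the likelihood depends on µ and M* only through their sum, so making the cut-off fainter by ∆M* lowers the best-fit distance modulus by the same amount. This matches the values quoted above: for ∆M* = 0.84 mag the distances scale by 10^(−0.84/5) ≈ 0.68, which takes 19.0, 33.3 and 38.1 Mpc to 12.9, 22.6 and 25.9 Mpc.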
Thus, neither of the two earlier distance estimates is significantly more likely than the other, given the three discovered PNe. Given the off-centered field of view that we chose, we may have missed a brighter PN in the South-West part of DF2. To place a strong constraint on the distance to DF2, the brightest PN would need to be 1.5 mag brighter than PN1, a magnitude at which a 20 Mpc distance would be ruled out at 3-σ.

What is the origin of DF2?
From a kinematic study of 10 GCs surrounding DF2, van Dokkum et al. (2018b) inferred a low (projected) velocity dispersion, which they interpreted as DF2 'lacking' DM. This claim has been heavily scrutinized (Trujillo et al. 2018;Famaey et al. 2018;Laporte et al. 2018;Kroupa et al. 2018) and is revisited in Paper I. If confirmed by other independent tracers, this lack of DM calls for an additional formation channel, to explain the existence of both DM-deficient and DM-dominated UDGs (such as DF44, van Dokkum et al. 2016).
As a first hypothesis, van Dokkum et al. (2018b) propose that the claimed 'lack' of dark matter in DF2 may be explained if DF2 is a tidal dwarf galaxy (TDG), i.e. a galaxy formed from material that was expelled from a massive host galaxy during a galactic interaction (see review by Duc & Mirabel 1999). The proximity of the massive galaxy NGC 1052 and the peculiar radial velocity of DF2 (+293 km s⁻¹ if at 20 Mpc) would support this hypothesis. Moreover, galaxies with typical morphological parameters of UDGs were observed to be still connected by a stellar stream to a massive host (Bennet et al. 2018). Unfortunately, no measurement of the stellar populations of those systems has been performed so far.
Because of this particular mode of formation, TDGs are indeed expected to be free of dark matter (Bournaud & Duc 2006; Wetzstein et al. 2007; Lelli et al. 2015) and almost devoid of stars from their host (Boquien et al. 2010). Furthermore, old TDGs fall into the category of UDGs, with their low central surface brightness and large effective radii (Duc et al. 2014). Interestingly, we note that the cluster formation efficiency of TDGs, that is, the fraction of star formation that occurs in bound clusters, is seen to be very high (50%) compared to other systems (Fensch et al., subm.). Last but not least, they inherit the metal enrichment of their more massive host. All observed tidal dwarf galaxies, which have stellar ages of typically less than 1 Gyr, deviate from the luminosity-metallicity relation: they have a significantly higher metallicity than other dwarfs of similar luminosity, typically around half solar, independent of their mass (Duc et al. 2000; Weilbacher et al. 2003). They are thus outliers of the stellar mass-metallicity relation.
Given the age of the stellar population of DF2, if it is a TDG the interaction must have happened at around z = 2, when the metal enrichment of the gas in the outskirts of the host galaxy could still have been quite low (see e.g. Jones et al. 2013). Unfortunately, there is not much data on old TDGs, as their low surface brightness makes them difficult to study (but see Duc et al. 2014), unless some or all of the Milky Way and Andromeda satellite galaxies are very old TDGs (Metz & Kroupa 2007; Pawlowski et al. 2011; Yang et al. 2014). In Sect. 5.1.1, we noted that DF2 is on the stellar mass-metallicity relation, unlike the young TDGs. If the level of pre-enrichment is between 0.001 and 0.01 Z⊙, it is possible that the TDGs would not reach the mass-metallicity relation of young TDGs after many Gyrs (Recchi et al. 2015). Under the DM-deficiency hypothesis, a small pre-enrichment of DF2 could then explain the location of DF2 in the stellar mass-metallicity diagram (see discussion in Sect. 5.1.1). Moreover, we note that most GCs in 'normal' dwarf galaxies with spectroscopic metallicity measurements are very metal-poor (e.g. [Fe/H] ∼ −2 dex for GCs in the Fornax dSph, see de Boer & Fraser 2016). A coeval formation of the UDG and its clusters in pre-enriched gas ejected from a massive galaxy could explain how the GCs of DF2 have been enriched to [Fe/H] ∼ −1.6 dex. Thus the metallicity of DF2 and its GCs could be consistent with the TDG origin hypothesis.
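The metallicities in this discussion are quoted both as fractions of the solar value and as logarithmic abundances. To compare them on a single scale, the sketch below uses the approximation [M/H] ≈ log10(Z/Z⊙) and, as a further simplifying assumption that is only reasonable for low α-enrichment, treats [Fe/H] and [M/H] interchangeably. The function names are introduced here for illustration.

```python
import math

def z_to_mh(z_over_zsun):
    """Approximate [M/H] from a metallicity expressed in solar units."""
    return math.log10(z_over_zsun)

def mh_to_z(mh):
    """Approximate metallicity in solar units from [M/H]."""
    return 10.0 ** mh

# Values quoted in the text, expressed on the logarithmic scale.
for label, z in [("pre-enrichment, lower bound", 0.001),
                 ("pre-enrichment, upper bound", 0.01),
                 ("DF2 stellar body (1/10th solar)", 0.1),
                 ("young TDGs (half solar)", 0.5)]:
    print(f"{label:32s} Z = {z:5.3f} Zsun -> [M/H] = {z_to_mh(z):+.2f}")

# GC metallicity quoted as [Fe/H] ~ -1.6, expressed in solar units.
print(f"GCs of DF2: [Fe/H] = -1.6 -> Z = {mh_to_z(-1.6):.3f} Zsun")
```

On this scale the assumed pre-enrichment range corresponds to [M/H] between −3 and −2, below the value of about −1.6 quoted for the GCs of DF2 and far below the roughly −0.3 typical of young TDGs.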

Conclusions
We present the first simultaneous analysis of the stellar population of a UDG and its surrounding globular clusters.
We fit SSPs to the starlight component of the stellar body and the stack of all GCs using the empirical stellar library eMILES with the fitting routine pPXF.
We find that the UDG's stellar populations are consistent with an old age, 8.9 ± 1.5 Gyr, a low metallicity, [M/H] = −1.19 ± 0.11, and little to no α-enrichment, i.e. they formed over a timescale longer than 1 Gyr. The GC spectra are consistent with the same age, 8.9 ± 1.4 Gyr, but have a lower metallicity than DF2 ([Fe/H] = −1.55 ± 0.09). This result is consistent with the Lick indices diagnostics and the broadband colors of DF2 and its clusters.
The stellar mass and metallicity of the UDG fall on the empirical relation found for old dwarf galaxies. In particular, DF2 has a metallicity comparable to that of DF44, which has the same stellar mass but was shown to have a MW-like DM halo. This relation is a consequence of the self-enrichment of galaxies and thus depends on the total mass of the galaxy. Under the DM-deficiency hypothesis, one would then expect DF2 to have a lower metallicity than galaxies with similar stellar mass. We note that stellar mass loss due to stripping could move a metal-deficient galaxy back onto the relation, but this would affect its GC system, which does not seem to be the case for DF2. Another hypothesis would be that DF2 has a tidal origin and was formed from gas pre-enriched in metals.
We also report the discovery of the first three PNe in a UDG. That number is consistent with the number of PNe in other galaxies with similar luminosity and metallicities. We find that distance estimates of 13 to 20 Mpc are similarly likely, given the three discovered PNe.