Validation of standardized data formats and tools for ground-level particle-based gamma-ray observatories

.


Introduction
In preparation for the upcoming Cherenkov Telescope Array (CTA), the ground-based γ-ray astronomy community has made a joint effort to define standardized data formats and develop community-sourced tools aimed to facilitate access to the data by a wide audience.This requires the identification of a data-processing stage in which standardization between different instruments is possible.The primary source of background for any γ-ray observatory are events of hadronic origin usually referred to as cosmic rays.After reconstructing the events that A&A 667, A36 (2022) by the FITS Working Group in the Office of Guest Investigators Program (OGIP) at NASA1 and expands them to address the specific needs of the γ-ray community.The availability of such an open data format will not only help to prepare the operation of CTA as an open observatory, but also simplify the process for existing observatories and experiments to possibly publish and archive their data in an openly documented and maintained data format.
With similar motivation, a variety of open-source analysis tools has been developed.This signals a transition in a field that up until recently, and with the notable exception of the Fermi Large Area Telescope (LAT; Wood et al. 2017) or the INTEGRAL analysis tools2 , for instance, relied heavily on independent proprietary software developed for a specific observatory.These new open-source tools can be broadly classified into two classes.Some packages, such as the Multi-Mission Maximum Likelihood (3ML; Vianello et al. 2015), aim to bridge the gaps between different instruments by providing a common framework in which their respective proprietary tools interface, allowing joint, multiwavelength studies to be carried out.On the other hand, packages such as Gammapy (Deil et al. 2017b) and ctools (Knödlseder et al. 2016) aim to replace the existing frameworks altogether, and offer a single tool with which to carry out the analysis of data from multiple γ-ray observatories, individually or jointly.The latter requires GADF-conforming inputs, so that data from different observatories can be correctly read and analyzed by the same software.
There has been a number of studies that validated and highlighted the potential of the shared format and tools, focusing on either a single instrument (Mohrmann et al. 2019) or on multiinstrument analysis (Nigro et al. 2019).However, these efforts have largely been focused on Imaging Atmospheric Cherenkov Telescopes (IACTs), excluding the other type of ground-based γray instrument: particle detector arrays such as the High-Altitude Water Cherenkov (HAWC) observatory.While initially the focus of the GADF and shared tools was on IACTs, given that CTA will be an array of such telescopes, the standard is in practice mostly compatible with the data from any other type of γ-ray instrument.
Because of the complementary nature of IACTs and particle detector arrays, including both in the conception and development of such tools can be very beneficial.Particle detector arrays continuously survey large fractions of the sky, but can do so with relatively low angular resolution (Abeysekara et al. 2019).IACTs, on the other hand, have to be pointed to the region of interest, and are limited by weather and dark time, but can achieve higher angular resolution.IACTs can achieve good performance at low energies, below 1 TeV, while particle detector arrays are able to reach higher energies, of above 100 TeV.Multi-instrument analysis thus becomes necessary to cover the entire TeV range.A common data format and analysis tools allow the combination of data from IACT and particle detector arrays without the need for proprietary analysis software.This is relevant for both the current wide-field particle detector arrays, such as HAWC and the Large High Altitude Air Shower Observatory (LHAASO; Aharonian et al. 2021), and for future arrays such as the Southern Wide-Field Gamma-ray Observatory (SWGO; Hinton 2021).
In this paper, we present the first full production of HAWC event lists and IRFs that follows the GADF specification.We analyze it using Gammapy to reproduce a selection of published HAWC results.To do this, we start by building a background model that takes the produced event lists as input.Thereafter, we check the consistency of low-level data products such as the number of events and maps by comparing them with published examples.Furthermore, we reproduce three published HAWC results, each for a different source class by using Gammapy.Last, as a proof of concept, we perform a joint fit using data of the Crab nebula from six different γ-ray observatories using Gammapy.

HAWC observatory
The High Altitude Water Cherenkov (HAWC) γ-ray observatory is situated on the flanks of the Sierra Negra at 18 • 59 41 N, 97 • 18 30.6W in Mexico.It detects cosmic rays and γ-rays in the energy range from a few hundreds of GeV to more than a hundred TeV with a wide field of view (FoV) of ∼ 2 sr.HAWC has been fully operational with 300 Water Cherenkov Detectors (WCDs) since March 2015.In each such WCD of 4.5 m height and 7.3 m diameter, four photomultiplier tubes (PMTs) are submerged in 200 m 3 of purified water.The modular structure of HAWC WCDs allows optically isolating the detected Cherenkov light (300-500 nm) signal produced by the secondary particles such as e ± , γ, and µ ± , while traveling through the water volume.It also facilitates the identification of the local variations in the observed lateral distribution of detected showers, which in turn greatly helps performing γ-hadron separation.
The standard HAWC analysis procedure begins with the production of the instrument response functions (IRFs), which characterize the performance of the instrument.For this, air shower and detector simulations are generated using CORSIKA (Heck et al. 1998) and a dedicated package based on GEANT4 (v4.10.00,Agostinelli et al. 2003) named HAWCSim, respectively.These simulations are ran through the reconstruction procedure to obtain the two histograms that describe the detector response: the angular resolution and energy dispersion, the latter not normalized so that it also contains the effective area information.These quantities are usually stored in a ROOT (Brun & Rademakers 1997) file.The reconstructed data are first binned depending on the fraction of the available PMTs triggered by the air shower, a quantity referred to here as f Hit .This results in a total of nine f Hit bins, referred to with integer numbers between 1 and 9, as described in Abeysekara et al. (2017b).The value of f Hit is only weakly correlated with energy.In order to estimate the energy on an event-by-event basis, more advanced algorithms have been developed.The ground-parameter (GP) algorithm is based on the charge density deposited at the ground by the shower.The neural network (NN) algorithm estimates energies with an artificial neural network that takes as input several quantities computed during the event reconstruction.A detailed overview of both algorithms can be found in Abeysekara et al. (2019).All results shown in this paper correspond to energies estimated using the GP method, but that is only for simplicity, as it is also possible to use the NN estimator results instead.
Energy bins are usually defined beforehand, with 12 reconstructed energy bins, each spanning a quarter decade in log 10 (E/TeV).Energy bins are labeled alphabetically with increasing energy, as shown in Table 1.The combination of both binning schemes leads to a total of 108 2D f Hit /energy bins (Abeysekara et al. 2019), identified by the combination of the f Hit number and the energy letter.For each bin, the γhadron separation cuts are optimized independently and applied Notes.The first two bins (a and b) are not used in the analysis as the estimate is highly biased.
to the reconstructed data.A detailed description of the variables used for γ-hadron separation can be found in Abeysekara et al. (2017b).DL3 products are currently not produced during the HAWC standard analysis procedure, and instead, the events are selected for γ-likeness and directly binned into a HEALPix (Górski et al. 2005) full-sky map.In this step, a pointing correction, usually referred to as alignment, is applied to the reconstructed data as described in Abeysekara et al. (2017b).During the same mapmaking procedure, a background and exposure map is computed as well (Abeysekara et al. 2017b).These maps and the detector response file are typically used within 3ML (Vianello et al. 2015) via the HAWC-specific plugin hawc_hal (Vianello et al. 2018) to carry out γ-ray source analysis.

Gammapy
Gammapy is a community-developed Python package for γ-ray astronomy.It is built on the scientific Python standard packages Numpy, Scipy, and Astropy and implements data reduction and analysis methods for γ-ray astronomy.It will also be used as the base package for the science tools for the future CTA.Gammapy has already been successfully used and validated for analysis of IACT data from H.E.S.S. (Mohrmann et al. 2019) and has also been used to perform joint analyses of multiple IACTs with Fermi-LAT (Nigro et al. 2019).The standard analysis workflow of Gammapy begins at the level of selecting the DL3 data and time intervals based on Good Time Intervals (GTIs).In the next step, selected events are binned into multidimensional sky maps, such as the World Coordinate System (WCS) or HEALPix with an additional energy axis.The corresponding instrument response, including the residual hadronic background, is projected onto the same but possibly spatially coarser sky map.The binned data are bundled into a dataset, and together with a parametric model description, they can be used to model the data in a binned likelihood fit.Multiple datasets can be combined, and by sharing the same source model, they can be used to handle multiple event types or data from different instruments in a joint-likelihood fit.To fully support the analysis of HAWC data, we made one contribution to Gammapy.The HAWC point-spread function (PSF) is computed as a function of reconstructed energy, as opposed to true energy, which is typical for IACTs.For this reason, we implemented the possibility to exchange the order of the application of the PSF and energy dispersion matrix (see Sect. 4.3).All of the other features are already compatible with standard analysis workflows used for ground-based wide-field instruments.This includes combined spectral and morphological modeling of γ-ray sources, computation of test-statistic (TS) maps, and estimation of flux points and light curves.All the results shown in this paper were produced using Gammapy version 0.18.2.

HAWC data and IRFs in the GADF
At the DL3 level, the GDAF defines mandatory header keywords and columns, containing the basic information necessary for γ-ray data analysis, as well as optional entries that can be useful for specific instruments or observing strategies.The storage and distribution of γ-ray data as event lists together with some parameterization of the IRFs has been shown to be extremely successful by the Fermi-LAT observatory 3 .This format was extended to IACTs by the GADF initiative, and is extended in this work to particle detector arrays such as the HAWC Observatory.

Event lists
The DL3 stage refers to lists of reconstructed events that have been identified as γ-ray-like.Right after reconstruction, HAWC events are stored in event lists that mostly contain background events of a hadronic nature.The first step toward DL3 event lists is thus to bin them as described in Sect. 2 and apply the corresponding γ-hadron separation criteria in each bin to select γ-ray-like events.An additional alignment correction is applied to the direction of each event (see Abeysekara et al. 2017b), and the coordinates are transformed into the J2000 epoch.For each of the events, five basic quantities are required by the GADF: an event identification number, the two sky direction coordinates, the estimated energy, and the arrival time (Deil et al. 2017a).Event time-stamps are stored in GPS seconds after midnight January 6, 1980 4 , with the reference time provided in the FITS file header.Additionally, any other instrument-specific column can be added, such as the fraction of available triggered PMTs or the core location of the shower in the detector coordinates.Table 2 shows an example of such an event list.An integer indicating to which of the 108 2D bins each event belongs to is added as a column.This allows storing events from different bins in the same file without any loss of information.

GTIs and exposure calculation
The GTIs are defined as the time intervals during which the detector is stable and taking data.They are stored as a separate table within the same FITS file as the associated event list.They are used to compute the exposure time, which is crucial for measuring, for example, the γ-ray flux of astrophysical sources.
HAWC raw data are stored in files that span 125 s of datataking.After the data are reconstructed, these intervals are checked for stability (Abeysekara et al. 2019); their duration becomes the minimum unit of time that can be described as Notes.The real precision of the numbers has been reduced for formatting convenience.The Bin ID information is stored as an integer (e.g., bin 7f would correspond to number 77), with the bin name shown here only for convenience.good.The GADF requires the GTI table to have two columns, one with the start of the interval (TSTART), and one with the end (TSTOP).For the DL3 production presented here, these time stamps are obtained from the reconstructed data files themselves.This is done by selecting the first and last event in a file before applying any γ-hadron separation or binning.Currently, the effect of trigger dead time, which is expected to be in the order of a few percent, is not taken into account when analyzing HAWC data.From these time intervals, the exposure map is constructed by considering which part of the sky is above the maximum zenith angle observation threshold as seen from the observatory during each interval.Because of the continuous observations performed by HAWC, it is often useful to describe exposure in terms of a source transit above the detector, which is defined in Abeysekara et al. (2017a).The green curve in Fig. 1 shows the number of transits during which HAWC was stable and taking data between June 2015 and June 2019 as a function of right ascension (RA).Detector downtime can be caused by a variety of factors, ranging from hardware issues to meteorological conditions such as electric storms.As a result, these interruptions are not necessarily distributed uniformly over time; technical maintenance, for instance, is more likely during particular times of the day.This leads to the fluctuations in the exposure as a function of hour angle, or equivalently, RA, that is shown as the green curve in Fig. 1.These fluctuations are on the 3% level and are usually neglected in long-term source analysis.One of the advantages of the production of GTIs together with event lists is that this effect becomes easy to characterize and correct for.
The different data ranges defined by the GTIs can still be ranked by detector stability criteria, and those ranked lower are iteratively removed, effectively shaving time off of the green curve in Fig. 1 until it becomes flat.The result is shown by the curve labeled "Corrected exposure" in Fig. 1.This allows accurately neglecting the RA dependence of the live time while still keeping a total data efficiency of more than 90%.

Instrument response
The IRFs describe the combined detection abilities and precision of an instrument data-taking and reconstruction procedure.Independent of the actual detection principle, the response of a γ-ray instrument can be described by a few key properties.The angular resolution of the experiment is the reconstruction accuracy of the direction of the incident γ-ray, and is described by the point-spread function (PS F).The energy dispersion (E disp ) is the reconstruction accuracy of the energy of each event.The detection probability of a γ-ray is given by the effective area (A eff ).Finally, the expected residual hadronic background by misclassified events (N B ) is described by the background model (see Sect. 5).
The current version of the GADF neglects the correlation between PS F, E disp , and A eff and considers them independent.This can be described by the following combined instrument response R: where x and E represent the reconstructed position and energy, while x and E are the corresponding unknown true quantities.The assumption of independence is mostly sufficient for the current generation of instruments, including HAWC.However, it will be readdressed for CTA and likely GADF in the future.As mentioned in Sect.3, the HAWC PSF is currently provided in reconstructed energy, which introduces a dependence on the assumed spectral index of the modeled source.However, the data format also allows defining PSF in true energy as well, which allows the spectral reweighting of the PSF during model evaluation.Using this assumption, predicted counts N Pred can be computed as where N B is the expected residual hadronic background, t life is the live time, R is the combined instrument response as defined in Eq. ( 1

Derivation of the background model
Background estimation in HAWC analysis is typically done using the so-called direct integration method (Abdo et al. 2012).This method deals with the expected dipole cosmic-ray anisotropy by splitting the data into time intervals (usually 2 h) and estimating the background in each of these intervals, which are then added up.This requires the input files to be provided chronologically sorted and is typically done in the same process as the γ-hadron separation and map-making.However, the production of γ-like event lists simplifies this process by allowing the use of the entire dataset at once with the slightly modified method described below.This has the advantage of significantly reducing the required computing time, given that the input events are already selected as γ-ray-like, as well as providing additional flexibility and modularity to the process.Removing the need for small sequential time intervals also leads to improved statistics at the highest energies.
At a given sidereal time, every day, the region of sky above the observatory is the same.The events in the event lists were selected using the GTIs described in Sect.4.2, and split into 720 bins of sidereal time, τ.The duration of the bins is thus chosen to be 2 min of sidereal time, during which the sky above the observatory moves by 0.5 Å.This very fine binning is helpful to account for the dipole anisotropy.In each of these sidereal time bins, a sky map in local coordinates was filled using the selected events for each of the 2D analysis bins introduced in Sect.2, which we refer to as B τ (θ, φ), where θ and φ are the zenith and azimuth angles, respectively.We define a mask to exclude a band of ± 4 • around the Galactic plane, as well as other known bright γ-ray sources, as detailed in Table 3.We computed the mask in local coordinates for each of the defined sidereal time intervals, M τ (θ, φ).
From these ingredients, we can construct the background model.First, we mask and add the maps in sidereal time to build a time-independent masked spatial template, (3) In order to correct for the presence of the mask, we integrate the mask in sidereal time to compute weights quantifying the fraction of a sidereal day that each spatial pixel spends inside of Notes.The center of the region is given in Galactic coordinates.
the mask, We can now recover the unmasked spatial templates, An example of this process is shown in Fig. 2.These spatial templates represent the time-independent spatial distribution of events in the HAWC sky for each of the 2D analysis bins.To account for temporal fluctuations in the event rate, we compute the event rate outside of the exclusion mask for the maps in sidereal time bins, and the event rate outside of the exclusion mask in the spatial template, We can now combine the time-independent spatial template B(θ, φ) for each analysis bin with the time-dependent rate as  This results in B(θ, φ, τ), the background map in local coordinates for each sidereal time interval, which takes into account the fluctuations in the event rate.Finally, in order to construct the desired map in sky coordinates, we project each of the local coordinate maps corresponding to a sidereal time τ into the corresponding sky coordinates and stack them together into one full-sky map.This process yields one such map per analysis bin.The different energy bins can be bundled together into groups of the same f Hit bin, which results in a three-dimensional sky map that includes an energy axis for each of the f Hit bins.

Background normalization
To validate the background model, we split the full-sky counts map into 192 tiles of equal solid angle, applying the same mask as was used for the background model creation.For the 148 tiles that are at least partially contained in the HAWC FoV, we then compared the background model to the observed counts outside of the mask, where no bright γ-ray sources are expected.To do this, we defined a background normalization parameter that multiplies the background model, and performed a fit. Figure 3 shows the histogram of the resulting best-fit background normalization for the tiles.The normalization distributions are centered around 1, as is expected for a well-normalized background model.

Full-sky significance map
Because particle detector arrays continuously survey large fractions of the sky, producing full-sky maps is critical for the science and diagnosis of the data products.An example of this are full-sky significance maps, which can also be used to identify new sources in the instrument FoV.Such a map has been shown repeatedly by the HAWC Collaboration, for example, in Albert et al. (2020).Using the background map produced as described in Sect.5.1 for 1311 transits and the corresponding count map produced with the event list, we can compute the significance map using Gammapy.We used this map to confirm the background model because we expect the significance to have no hotspots above 5σ and to be normally distributed outside of the mask described in Table 3.The general approach to this is again to divide the HEALPix-based all-sky data into smaller patches using tangential WCS projections, compute the maps, and reproject back to a HEALPix pixelization.One of the resulting maps, for f Hit bin 4, is shown in Fig. 4. A histogram of the masked significance for all the other bins is shown in Fig. 5.As expected, there are no regions in the map with a significance above 5σ.The significance histograms for most bins follow a Gaussian distribution with a mean value of roughly zero and a width of unity, as expected from random fluctuations.The broader distribution in bin 1 is due to the imperfect characterization of the cosmic-ray anisotropy, which is most relevant in bins in which the background rate is higher, that is, low f Hit bins.Additionally, all the pixels with a significance above 5σ in f Hit bins 1 and 3 are located close to the edge of the HAWC FoV, indicating that they are likely the result of an edge effect of the map.The deviation from Gaussian behavior in bins 8 and 9 is explained by the relatively low number of events in these bins.Any source that is not covered by the mask is expected to contribute to the distributions shown in Fig. 5.However, following the construction of the mask, these sources would be faint, meaning that their individual contribution to each f Hit bin is unlikely to cross the 5σ threshold.

Comparison of data products
In order to ensure that the event selection and alignment were performed correctly, we can compare the number of events classified into each bin.Because event lists were not previously produced in HAWC, we do this by comparing the number of counts in the standard maps to the number of events in the lists for the region defined by a radius of 3 • around the Crab nebula.We make this comparison prior to the exposure flattening described in Sect.4.2 in order to compare the same number of data.We expect the event lists to contain slightly more events than the maps because a few percent of the total events is rejected during the standard map-making process due to criteria on the gaps between the time-stamp of events required by the exposure calculation.This is no longer necessary when the exposure is computed by using GTIs, as described in Sect.4.2.This effect is larger for bins with more events, such as low f Hit bins. Figure 6 shows that the number of counts agrees well for all bins.The total difference is about the expected 1%.
In addition to the total number of events, it is important to also ensure that their spatial distribution follows the expectations.Figure 1 in Abeysekara et al. (2019) shows the excess map of the region around the Crab nebula above 1 TeV for 837.2 days.We reproduced this excess map for the same data range and present both maps together with the residual between them in Fig. 7.It is clear that the maps are strikingly similar.

Validation
To validate the production of the event lists and IRFs, we chose three sources representing three different analysis use cases: the Crab nebula (eHWC J0534+220) as the standard candle and Galactic point source, the extended source eHWC J1907+063, and the extragalactic variable source Mrk 421.The map is masked using the mask described in Table 3.For the two steady source analyses, we used the framework described previously, with the events and IRFs described in Sect. 4 and the background model described in Sect. 5.For the special case of Mrk 421, the background was estimated locally, as detailed in Sect.7.3.Despite this difference, the workflow was very similar in all three cases.Events were selected from a 8 • × 8 • region in the sky around the source position.For each of the f Hit bins described in Sect.2, a three-dimensional map was produced, with two spatial axes and a reconstructed energy axis, binned as also described in Sect. 2. The background map was interpolated to that same geometry, and so were the IRFs described in Sect. 4. As an additional check and because it is also possible within Gammapy, the analyses in Sects.7.1 and 7.2 were also carried out using the existing HAWC counts and direct integration background maps.
The data and IRFs were bundled into a Gammapy dataset (see Sect. 3), one for each f Hit bin.Then, the relevant model was attached to the datasets, and a joint-likelihood fit was performed Fig. 6.Comparison of the number of events in a region of 3 • radius around the Crab nebula in the standard HAWC map and in the event lists for each of the analysis bins.The selected 2D bins shown here are those that are used in Sect.7.1 and follow the selection procedure described in Abeysekara et al. (2019).to all nine datasets together.The only difference in the case of Mrk 421 is that this same procedure was carried out for each of the selected time intervals to build the light curve.
We do not expect to exactly reproduce the reference best-fit values for several reasons.First, some of the validation analyses shown here make use of a different background model than the reference (see Sect. 5).Second, as mentioned in Sect.4.2, the exposure for the reference analyses is assumed to be flat.This introduces an error in the flux of up to a few percent that is not present in the validation analysis.Finally, the data reduction process described in Sect. 3 includes the projection into a sky-map and interpolation of the IRFs to a coordinate grid centered on the source position.This is not done in the reference analysis, which uses the IRF value that corresponds to the nearest declination node, spaced by 5 • .This can result in differences for the best-fit parameters, especially for sources located between declination nodes for which the IRFs evolve rapidly in the spatial dimension.All error values shown throughout this section are statistical only.

Point source: Crab nebula
As one of the brightest sources in the γ-ray sky, the Crab nebula is used as the standard candle for calibration and reference analysis.Due to its declination, it transits over the HAWC sky A36, page 7 of 12 A&A 667, A36 (2022) passing very close to the zenith.HAWC is able to detect (with a significance of roughly 5σ) the Crab nebula every day, that is, in the span of one transit.
We fit a point source and assumed a log-parabolic shape of the spectrum, where E 0 is the only fixed parameter with a value of 7 TeV.We compared against Abeysekara et al. (2019).Figure 8 shows the spectrum obtained with Gammapy and the exported data, compared against the reference analysis.The two results agree excellently.
The resulting best-fit parameters are shown in Table 4 as "From events", together with those from Abeysekara et al. (2019).Additionally, we repeated the exercise, but instead of using the exported data, we used the same standard HAWC counts and background map as were used in Abeysekara et al. (2019).The results of this fit are also shown in Table 4 as "From map".
From events 2.39 ± 0.04 2.79 ± 0.02 0.113 ± 0.007 Reference 2.35 ± 0.04 2.79 ± 0.02 0.10 ± 0.01 From map 2.35 ± 0.05 2.79 ± 0.02 0.12 ± 0.01 Notes.The fit result obtained using the DL3 products is given in the row labeled "From events".The fit result obtained using the standard HAWC map products is given in the "From map" row.The values in the "reference" column are taken from Abeysekara et al. (2019).

Extended source: eHWC J1907+063
Abeysekara et al. ( 2020) reported the detection by HAWC of several sources emitting above 56 and 100 TeV.One of those detected above 100 TeV is eHWC J1907+063.It is found in the vicinity of MGRO J1908+063.The 1σ extension of the emission is reported to be 0.67 • over the entire energy range with a Gaussian assumption.The best-fit spectrum is modeled as a logparabola (see Eq. ( 9)), with the pivot energy E 0 fixed at 10 TeV.We fit a combined spatial and spectral model made with the same assumptions as described above.Both components were fitted at the same time, including the source extension and position.The best-fit parameters are presented in Table 5. Figure 9 shows the spectrum of eHWC J1907+063 compared against the reference analysis.The agreement is clearly excellent.Figure 10 shows the resulting best-fit spatial model compared to the result in Abeysekara et al. (2020).When the errors detailed in Table 5 are taken into account, the agreement is very good.Additionally, we repeated the exercise, but instead of using the DL3 products, we used the same standard HAWC counts and background map as were used in Abeysekara et al. (2020).The results of this fit are also shown in Table 5 as "From map".

Time domain: Mrk 421
Markarian (Mrk) 421 is a BL Lac object that has been extensively observed in the γ-ray band (Albert et al. 2022).Its emission is known to be variable on timescales of hours or less.Abeysekara et al. (2017a)   Notes.The position in Abeysekara et al. (2020) is determined above 56 TeV, which leads to higher statistical errors due to the lower number of events.The fit result obtained using the DL3 products is given in the row labeled "From events".The fit result obtained using the standard HAWC map products is given in the "From map" row.The values in the "reference" column are taken from Abeysekara et al. (2020).described in Abeysekara et al. (2019) were implemented, which means that the energy of individual events could not be obtained.This leads to a different data selection and binning based only on f Hit , like the one described in Abeysekara et al. (2017b).Consequently, there is no such thing as an energy dispersion matrix for each of the f Hit bins.In order to deal with this, we introduced an assumed energy axis with a single bin for each f Hit bin dataset.This workaround allowed us to use the Gammapy framework in the same way as the previous two cases.The data selection and time binning were performed in a similar way as in Abeysekara et al. (2017a).Each event was associated with a sidereal day, starting at midnight local sidereal time at the HAWC site.The current HAWC detector stability criteria for data selection were applied, noting that these are slightly stricter than those used by Abeysekara et al. (2017a).For each of the sidereal days, the fraction of a Mrk 421 transit that is included in the data was computed by integrating the curve shown in Fig. 1 of Abeysekara et al. (2017a).Sidereal days for which this fraction is lower than 0.5 were removed from the selection.The result is a total of 463 transits, slightly fewer than the 471 included in Abeysekara et al. (2017a) due to the stricter data selection cuts.For each of these transits, the background was estimated locally.This was done by masking the expected source location and computing the number of counts outside the mask in overlapping declination bands, which takes into account the varying instrument response with declination.Because Mrk 421 is seen by HAWC as an isolated point source, this approximation suffices.Finally, counts and background maps were bundled with the PSF and effective area, the latter corrected for the transit fraction.
A point source spatial model was used with the position fixed to (166.11 • , 38.21 • ) in equatorial coordinates, as was done in Abeysekara et al. (2017a) and Lauer (2017).The spectrum of Mrk 421 was modeled by a power law with normalization φ 0 at E 0 =1 TeV, photon index Γ, and an exponential cut-off E C , The best-fit spectrum was first obtained for the entire data range.
The resulting parameters are presented in Table 6 together with those reported in Abeysekara et al. (2017a).In order to ensure a stable fit, a minimum value for E C = 0.1 TeV was imposed because of the high correlation between Γ and E C .When the values of E C = 5 TeV and Γ = 2.2 were fixed, the normalization was set free and was fit for each of the transits.The resulting spectra were integrated above 2 TeV to match the result of Abeysekara et al. (2017a).Figure 11   obtained with Gammapy together with the light curve from the reference analysis (Abeysekara et al. 2017a).The agreement is good, given the differences in data selection.The overall trend is clearly reproduced and the majority of points are compatible within errors.Figure 12 shows the distribution of the differences between the reference light-curve points and those obtained with Gammapy as a fraction of the statistical error.The large majority of values clearly lies within the 1σ region; the total is contained in the 2σ region.Nigro et al. (2019) presented the first fully reproducible measurement of the Crab nebula spectrum using public data from many different instruments.The analysis was carried out in Gammapy, and emphasizes the power of a shared and open analysis tool.Similar to Nigro et al. (2019), the goal of this study is not to reach any scientific conclusion regarding the Crab nebula.For this reason, we selected a small range of HAWC data, spanning only one  month in time, and included it in the joint fit from Nigro et al. (2019).This was easily done due to the fully reproducible nature of that work.A log-parabola (see Eq. ( 9)) spectral shape with fixed E 0 = 1 TeV was assumed for all the instruments.Performing the individual instrument data reductions and joint fits was straightforward after the data and IRFs were stored according to the GADF. Figure 13 shows the result of this joint fit.The spectrum of the Crab nebula might not be best described by the same spectral shape in all the different energy ranges, which could lead to differences in the best-fit parameters from the different experiments.However, this choice was made for simplicity, as the goal was not to reach any scientific conclusion regarding the Crab nebula, but rather show a proof of concept for this multi-instrument analysis.The joint fit shown in Fig. 13 is indeed not noteworthy for the resulting spectral shape, but for the fact that a multi-instrument fit was performed using data from six different γ-ray instruments, including one satellite (Fermi-LAT), four IACTs, and one particle detector array (HAWC) natively within the same tool.

Proof of concept: Joint fit
The HAWC event lists and IRFs used in this section have been made publicly available on the HAWC Observatory A36, page 10 of 12 website5 .This data release is the first to include HAWC data at the event list level.The data being public makes the result shown in Fig. 13 fully reproducible, as an extension of Nigro et al. (2019).

Conclusions and outlook
We have presented the first full production of HAWC data and IRFs that follows the community-shared specifications of the GADF.Data in this format allow reusing existing high-level analysis tools such as Gammapy for the analysis of HAWC data.We validated this approach by reproducing several published HAWC results and found excellent agreement.We additionally reproduced the analysis using the maps that are typically produced by the HAWC Collaboration directly into Gammapy, which also yielded a very good agreement.
This cross-check does not only validate the analysis tools, it also provides a valuable cross-check of the corresponding HAWC results.The published results have been reproduced with high precision with a different analysis tool, which is a powerful, non-trivial check.
The lifetime of observatories is finite, and one of the concerns at the end of the operation is to ensure that the archival data are available and easy to use both for future studies and to reproduce previous results.In this regard, having data in a format that is shared across the community and that can be analyzed with a general-use tool is a key advantage.The evolution of the GADF will be driven by the requirements imposed by current and future observatories, which will require data and IRFs to be described in increasingly realistic and complex ways.This will directly benefit the current generation of instruments, which will be able to ensure that their legacy data are properly used and interpreted.
The joint Crab nebula fit presented in Sect.8 highlights the potential of this approach to perform multi-instrument analyses, spanning energy ranges much wider than those of a single instrument.This in turn can lead to synergies, bringing the IACT and particle detector array communities together.Future and current detectors, such as SWGO and LHAASO, will operate at the same time as CTA, and thus would benefit most from the ability to share data and analysis tools.A shared analysis tool translates into a much larger developer and user base than any of the other collaboration-specific tools individually.This increases the complexity of features that can be implemented and maintained, benefiting all the instruments involved.
The work presented here is a proof of concept of what a particle detector array data analysis chain would look like when the shared format and tools are used.The very few limitations encountered arise because the initial development was led by the IACT community.However, these are minimal, and furthermore, expected to be resolved by future improvements, for example, with the expansion of the GADF standard for sky maps, which are tremendously useful given the high event rates recorded by particle detector arrays.This development should be made taking existing standards into account when possible, and would allow data products to be efficiently distributed in map form as well.
The GADF and science tools are constantly evolving to meet the needs of the community.Future particle detector arrays, such as SWGO, will be able to and should partake in this effort by ensuring that the format remains compatible with this detector class, while taking advantage of all the benefits it has to offer.

Fig. 1 .
Fig. 1.Number of transits during which the detector was stable as a function of RA.
) and Φ(p , E ) the flux of the source model.One set of each IRF is produced per analysis bin because they are independent, resulting in a value of N Pred per analysis bin.More details of the HAWC IRFs in the GADF can be found in Olivera-Nieto et al. (2021).A36, page 4 of 12 A. Albert et al.: Validation of standardized data formats and tools for particle detector arrays

Fig. 2 .
Fig. 2. Local coordinates view of the different quantities relevant to the background model construction.Left: masked spatial template for bin 1c, BM (θ, φ).Middle: mask weights, W M (θ, φ), quantifying the fraction of the total time that a pixel is masked.Right: weighted (unmasked) spatial template for bin 1c, B(θ, φ).

Fig. 3 .
Fig. 3. Best-fit results for the background normalization of the tiles for each of the nine f Hit bins.

Fig. 4 .
Fig.4.Full-sky significance map for f Hit bin 4 as computed with Gammapy.The map is masked using the mask described in Table3.

Fig. 5 .
Fig. 5. Distribution of the significance values outside of the mask for all the f Hit bins (green).A Gaussian function is fit to each of the histograms (dashed black line), and the best-fit mean and width are given in each panel.

Fig. 7 .Fig. 8 .
Fig. 7. Comparison of excess counts maps of the Crab nebula region.Left: crab excess counts map above 1 TeV as computed from the standard HAWC pipeline.Middle: Crab excess counts map above 1 TeV as computed from the DL3 data products.Right: residual map resulting from subtracting the reference map from the map derived from the DL3 data products.

Fig. 9 .
Fig. 9. Best-fit spectrum of eHWC J1907+063 obtained with Gammapy compared with Abeysekara et al. (2020).The bottom panel shows the comparison between the flux points computed in this work and those reported in the reference.

Fig. 10 .
Fig. 10.Spatial model of eHWC J1907+063 as obtained with Gammapy.The green star and circle represent the best-fit position and the 68% containment region, respectively.The blue cross and circle are the reference values from Abeysekara et al. (2020) for each quantity.

Fig. 12 .
Fig. 12. Distribution of the difference between the reference light-curve values (φ ref ) and those obtained with Gammapy (φ) as a fraction of the combined error ∆φ T = ∆φ 2 ref + ∆φ 2 .A Gaussian fit is indicated with the dashed black line.

Fig. 13 .
Fig. 13.Crab nebula spectral energy distribution for individual instrument fits and from the joint fit.Single-instrument results are represented with dashed lines, and the fit of all the datasets together, labeled as joint, is represented as a thick solid red line.The joint fit result fromNigro et al. (2019) is represented with a dotted black line.

Table 1 .
Albert et al.:Validation of standardized data formats and tools for particle detector arrays Definition of the reconstructed energy bins.Each bin spans one quarter decade.

Table 2 .
Simplified entries of an event list.

Table 4 .
Maximum likelihood fit results for the Crab nebula.
presented the HAWC light curve of Mrk 421, spanning over 17 months between November 2014 and April 2016.This work was carried out before the energy estimator techniques A36, page 8 of 12

Table 6 .
Maximum likelihood fit results for Mrk 421.