ExoCross: a general program for generating spectra from molecular line lists

ExoCross is a Fortran code for generating spectra (emission, absorption) and thermodynamic properties (partition function, specific heat, etc.) from molecular line lists. Input is taken in several formats, including ExoMol and HITRAN formats. ExoCross is efficiently parallelized and also shows a high degree of vectorization. It can work with several line profiles such as Doppler, Lorentzian and Voigt and supports several broadening schemes. Voigt profiles are handled by several methods allowing fast and accurate simulations. Two of these methods are new. ExoCross is also capable of working with the recently proposed method of super-lines. It supports calculations of lifetimes, cooling functions, specific heats and other properties. ExoCross can be used to convert between different formats, such as HITRAN, ExoMol and Phoenix. It is capable of simulating non-LTE spectra using a simple two-temperature approach. Different electronic, vibronic or vibrational bands can be simulated separately using an efficient filtering scheme based on the quantum numbers.


Introduction
We present a Fortran 2003 program, ExoCross, to compute spectra as well as spectral properties of molecules using line lists. ExoCross is specifically developed to work with huge molecular line lists such as those generated as part of our ExoMol project (Tennyson & Yurchenko 2012) or similar endeavours (Rey et al. 2016). ExoCross takes such line lists as input and returns pressure- and temperature-dependent cross sections as well as a variety of other derived molecular properties which depend on the underlying spectroscopic data. These include state-dependent lifetimes, temperature-dependent cooling functions, and thermodynamic properties such as partition functions and specific heats.
The main challenge when working with hot line lists for polyatomic molecules is their extremely large size. For example, several line lists generated as part of the ExoMol project contain in excess of 10 billion transitions (Sousa-Silva et al. 2015; Underwood et al. 2016a; Yurchenko et al. 2017a; Owens et al. 2017; Pavlyuchko et al. 2015; Al-Refaie et al. 2015b,a). The size of these datasets makes them impractical for direct use in line-by-line applications. We note that simply ignoring the billions of often very weak lines does not give reliable results (Yurchenko et al. 2017a). While there are a number of approaches to this problem, such as the use of k-coefficients (see, for example, Showman et al. (2009); Amundsen et al. (2014); Malik et al. (2017); Min (2017)), the most practical approach which does not involve making significant approximations is to produce cross sections for a set of predefined conditions. These cross sections are then easier to handle in, for example, radiative transfer codes than the original line lists, as they can be stored on far fewer grid points than there are lines. However, handling these large line lists requires care and, in particular, the generation of cross sections on an appropriate temperature-, pressure- and frequency/wavelength-dependent grid is data intensive and can become computationally highly demanding. ExoCross provides a computational solution to this problem; it has been extensively optimised to process huge datasets, including the introduction of an efficient algorithm for generating large numbers of Voigt profiles, which is discussed below. ExoCross is optimized to provide high throughput via efficient parallelization and vectorization. This is especially important when working with line lists containing tens of billions of lines.
⋆ Corresponding author: Sergei N. Yurchenko; e-mail: s.yurchenko@ucl.ac.uk
At different stages of development ExoCross was used to generate spectra by Underwood et al. (2016b,a); McKemmish et al. (2016); Wong et al. (2017); Yurchenko et al. (2017a); Owens et al. (2017); Prajapat et al. (2017); Yurchenko et al. (2018) and Rutkowski et al. (2018).
ExoCross is designed to generate molecular cross sections (absorption or emission) on a grid for a set of temperatures and pressures using different line profiles (e.g. Doppler, Voigt, etc.) under local thermodynamic equilibrium (LTE) as well as non-LTE conditions (Darby-Lewis et al. 2018). Other useful functionality includes computing lifetimes, stick spectra, partition functions, cooling functions, and specific heats. The HITRAN molecular spectroscopic database (Gordon et al. 2017) is a widely-used compilation aimed at radiative transport studies of the Earth's atmosphere. ExoCross is capable of working with HITRAN line lists (.par) as well as super-lines (Rey et al. 2016; Yurchenko et al. 2017a). It can be easily extended to accept other formats.
As part of this implementation, we have developed two new algorithms to perform convolution integrals needed for the Voigt line profile. The first algorithm is based on the Gauss-Hermite quadratures and is developed specifically to guarantee conservation of the Voigt line area. The second algorithm is based on exploiting the similarity of the Voigt profile at large distances from the line centre to compute the opacities quickly.
There are a number of other similar programs available which are designed to work with line lists. These include the HITRAN interface HAPI (Kochanov et al. 2016), SpectraPlot.com (Goldenstein et al. 2017) and SPECTRA (Tennyson et al. 1993). However, all of these programs would struggle to handle the huge line lists required for models of atmospheres at elevated temperatures. ExoCross is designed to be flexible; it takes input in both ExoMol (Tennyson et al. 2013) and HITRAN (Rothman et al. 2005) formats. Data can be returned in a variety of formats: ExoMol, HITRAN and Phoenix (Jack et al. 2009), where Phoenix is a full non-LTE atmospheric transfer code accounting for depth-dependent abundances (cloud formation, element diffusion, etc.) using the line-by-line approach. Thus, as a subsidiary function, the code can be used to interconvert between ExoMol and HITRAN formats.
A&A proofs: manuscript no. exocross_2003_arxiv
The paper is organised as follows. The main functionality of ExoCross is presented in Section 2. The line profiles implemented in ExoCross are discussed in Section 3. Section 4 presents the ExoCross calculation steps. The data formats are described in Section 5. Section 7 offers some conclusions. The ExoCross manual, provided as part of the supplementary data as well as in the GitHub and CCPForge repositories, gives full working details of the program, so the description below is restricted to outlines and examples.

Intensities and partition function
An absorption line intensity $I_{fi}$ (cm/molecule), also known as the absorption coefficient, is given by
$$I_{fi} = \frac{g^{\rm tot}_f A_{fi}}{8\pi c\,\tilde{\nu}_{fi}^2}\,\frac{e^{-c_2\tilde{E}_i/T}\left(1 - e^{-c_2\tilde{\nu}_{fi}/T}\right)}{Q(T)}, \qquad (1)$$
where $A_{fi}$ is the Einstein-A coefficient (s$^{-1}$), $\tilde{\nu}_{fi}$ is the transition wavenumber (cm$^{-1}$), $c$ is the speed of light (cm s$^{-1}$), and $Q(T)$ is the partition function defined as a sum over states,
$$Q(T) = \sum_n g^{\rm tot}_n\, e^{-c_2\tilde{E}_n/T}. \qquad (2)$$
Here $g^{\rm tot}_n$ is the total degeneracy given by $g^{\rm tot}_n = g^{\rm ns}_n (2J_n + 1)$, $J_n$ is the corresponding total angular momentum, $g^{\rm ns}_n$ is the nuclear-spin statistical weight factor, $c_2 = hc/k_B$ is the second radiation constant (cm K), $\tilde{E}_i = E_i/hc$ is the energy term value (cm$^{-1}$), and $T$ is the temperature in K.
The emissivity (erg s$^{-1}$ molecule$^{-1}$ sr$^{-1}$) is given by
$$\epsilon_{fi} = \frac{g^{\rm tot}_f A_{fi}\, hc\,\tilde{\nu}_{fi}}{4\pi\, Q(T)}\, e^{-c_2\tilde{E}_f/T}. \qquad (3)$$
Note that the isotopic abundance is not included in the definition of the line intensities (absorption or emission) in Eqs. (1) and (3). This is different from the HITRAN convention, where the absorption coefficients of an isotopologue contain the corresponding natural (terrestrial) isotopic abundances, see https://www.cfa.harvard.edu/hitran/molecules.html. For applications where the isotopic abundance is required, the intensities in Eqs. (1) and (3) can be scaled by an abundance factor specified in the input.
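As an illustration of how the partition function and the absorption intensity fit together, the following minimal Python sketch evaluates $Q(T)$ as a Boltzmann-weighted sum over states and then the intensity of a single line. It is not taken from ExoCross: the function names, the three-level system and the Einstein coefficient are hypothetical, and the intensity formula is the standard LTE expression in terms of $A_{fi}$, $g^{\rm tot}_f$, $\tilde{E}_i$, $\tilde{\nu}_{fi}$ and $Q(T)$.

```python
import math

C2 = 1.438776877         # second radiation constant hc/k_B (cm K)
C_LIGHT = 2.99792458e10  # speed of light (cm/s)


def partition_function(states, T):
    """Q(T) as a sum over states; `states` holds (term value in cm^-1, g_tot)."""
    return sum(g * math.exp(-C2 * energy / T) for energy, g in states)


def absorption_intensity(A_fi, g_f, E_i, nu_fi, T, Q):
    """Absorption line intensity (cm/molecule) of one transition f <- i."""
    boltzmann = math.exp(-C2 * E_i / T)            # lower-state Boltzmann factor
    stimulated = 1.0 - math.exp(-C2 * nu_fi / T)   # stimulated-emission correction
    prefactor = g_f * A_fi / (8.0 * math.pi * C_LIGHT * nu_fi**2)
    return prefactor * boltzmann * stimulated / Q


# Hypothetical three-level system: (term value in cm^-1, total degeneracy).
toy_states = [(0.0, 1), (100.0, 3), (1000.0, 5)]
T = 300.0
Q = partition_function(toy_states, T)
I_fi = absorption_intensity(A_fi=1.0, g_f=5, E_i=100.0, nu_fi=900.0, T=T, Q=Q)
print(f"Q({T:.0f} K) = {Q:.4f}, I_fi = {I_fi:.3e} cm/molecule")
```

An isotopic abundance factor, where needed, would simply multiply the returned intensity.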

Radiative lifetime
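The radiative lifetime (s) of a state $f$ can be computed as
$$\tau_f = \frac{1}{\sum_i A_{fi}}, \qquad (4)$$
where the sum runs over all lower states $i$ to which the state $f$ can decay. See an example of the lifetimes in Fig. 1 computed from the 10to10 line list for CH$_4$. Examples of ExoMol lifetimes and cooling functions can be found in Tennyson et al. (2016a); Melnikov et al. (2016) and Mizus et al. (2017).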
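The lifetime calculation amounts to accumulating the Einstein coefficients of all transitions that leave a given upper state and inverting the total. The sketch below assumes transitions are available as (upper index, lower index, $A_{fi}$) records, as in a '.trans' file; the function name and the three toy transitions are hypothetical.

```python
from collections import defaultdict


def radiative_lifetimes(transitions):
    """Return {upper-state index: lifetime in s} from (n_f, n_i, A_fi) records."""
    total_rate = defaultdict(float)
    for n_f, _n_i, a_fi in transitions:
        total_rate[n_f] += a_fi          # total spontaneous decay rate of state f
    # States that never appear as an upper state have no radiative decay channel
    # in the list and are simply absent from the result.
    return {n_f: 1.0 / rate for n_f, rate in total_rate.items() if rate > 0.0}


# Hypothetical transitions: (upper state, lower state, Einstein A in s^-1).
toy_transitions = [(3, 1, 2.0), (3, 2, 3.0), (2, 1, 0.5)]
lifetimes = radiative_lifetimes(toy_transitions)
print(lifetimes)
```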

Cooling function
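The emissivity (erg s$^{-1}$ sr$^{-1}$ molecule$^{-1}$) can be used to produce the cooling function $W(T)$ as the total energy emitted by a molecule (Neale et al. 1996):
$$W(T) = \sum_{f,i} \epsilon_{fi} = \frac{1}{4\pi\, Q(T)} \sum_{f,i} g^{\rm tot}_f A_{fi}\, hc\,\tilde{\nu}_{fi}\, e^{-c_2\tilde{E}_f/T}. \qquad (5)$$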

Stick spectra
A stick spectrum is a list of frequencies and line intensities, accompanied by the full description (quantum numbers) of the upper and lower states. When plotted, each line is represented by a 'stick' with the intensity given by its height, see Table 1, where an extract from an output file containing an absorption stick spectrum of KCl (Barton et al. 2014) is shown. A stick spectrum of CaO is shown in Fig. 2.

Cross sections
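A cross section $\sigma_{fi}(\tilde{\nu})$ from a single line $f \leftarrow i$ is related to the corresponding integrated absorption coefficient $I_{fi}$ as
$$I_{fi} = \int_{-\infty}^{\infty} \sigma_{fi}(\tilde{\nu})\, d\tilde{\nu}, \qquad (6)$$
where $\tilde{\nu}$ is the wavenumber. By introducing a line profile $f_{\tilde{\nu}_{fi}}(\tilde{\nu})$ the cross section (cm$^2$/molecule) can be defined as
$$\sigma_{fi}(\tilde{\nu}) = I_{fi}\, f_{\tilde{\nu}_{fi}}(\tilde{\nu}), \qquad (7)$$
where $f_{\tilde{\nu}_{fi}}(\tilde{\nu})$ is an integrable function with the area normalized to unity:
$$\int_{-\infty}^{\infty} f_{\tilde{\nu}_{fi}}(\tilde{\nu})\, d\tilde{\nu} = 1. \qquad (8)$$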

Grids
By default ExoCross uses an equidistant grid, defined by the wavenumber or wavelength range $[\tilde{\nu}_A, \tilde{\nu}_B]$ and the number of grid points $N_{\rm points}$. The latter includes both the first and last bounds. The grid bin size is defined by
$$\Delta\tilde{\nu} = \frac{\tilde{\nu}_B - \tilde{\nu}_A}{N_{\rm points} - 1}. \qquad (9)$$
The number of intervals is then $N_{\rm points} - 1$. Usually the number of points is an odd number in order to make $\Delta\tilde{\nu}$ a 'round' value. Non-equidistant wavenumber grids can be generated either as grids of constant resolving power $R = \tilde{\nu}/\Delta\tilde{\nu}$ or as equidistant wavelength grids.
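The two grid types can be sketched in a few lines of Python. The function names are hypothetical, and the constant-resolving-power grid is built with the common recurrence $\tilde{\nu}_{k+1} = \tilde{\nu}_k (1 + 1/R)$, which is an assumption about the construction rather than a description of the ExoCross implementation.

```python
def equidistant_grid(nu_a, nu_b, n_points):
    """Equidistant grid including both bounds; returns (grid, bin size)."""
    d_nu = (nu_b - nu_a) / (n_points - 1)      # N_points - 1 intervals
    return [nu_a + k * d_nu for k in range(n_points)], d_nu


def constant_resolving_power_grid(nu_a, nu_b, resolving_power):
    """Grid with nu / d_nu = R at every point (requires nu_a > 0)."""
    step = 1.0 + 1.0 / resolving_power
    grid = [nu_a]
    while grid[-1] * step <= nu_b:
        grid.append(grid[-1] * step)
    return grid


# An odd number of points gives a 'round' bin size: 10001 points on 0-100 cm^-1.
grid, d_nu = equidistant_grid(0.0, 100.0, 10001)
r_grid = constant_resolving_power_grid(1000.0, 2000.0, 1000.0)
print(d_nu, len(r_grid))
```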

Partition function and specific heat
The partition function $Q(T)$ is given by Eq. (2). The evaluation of $Q(T)$ requires the energy term values $\tilde{E}_i$ and degeneracies $g^{\rm tot}$, which are usually included in molecular line lists. As part of the intensity calculations, the partition function must be either evaluated using these quantities, or directly provided as part of the input. These values can be, e.g., taken from the .pf files provided as part of the ExoMol database or as part of the TIPS program provided by HITRAN (Gamache et al. 2017). The direct input option is recommended, as the ExoMol or HITRAN partition functions are often more accurate: they contain additional, higher energy contributions which become important, particularly at elevated temperatures.
The molar specific heat (J K$^{-1}$ mol$^{-1}$) is given by
$$C_p(T) = R\left[\frac{Q''}{Q} - \left(\frac{Q'}{Q}\right)^2\right] + \frac{5R}{2},$$
where $R$ is the gas constant and the 1st and 2nd moments $Q'$ and $Q''$ are
$$Q' = T\frac{dQ}{dT}, \qquad Q'' = T^2\frac{d^2Q}{dT^2} + 2Q'.$$
These latter moments can also be requested from ExoCross. An example of $C_p(T)$ of CH$_4$ generated using the 10to10 line list is shown in Fig. 4.
It is often instructive to plot individual contributions to the partition function from different $J$ states, defined as
$$Q_J(T) = \sum_{n:\,J_n = J} g^{\rm tot}_n\, e^{-c_2\tilde{E}_n/T}.$$
This is useful to assess the convergence of the line list with respect to $J$ and thus to estimate the maximum temperature $T_{\rm max}$ the line list is applicable to. Figure 5 shows such individual $Q_J(T)$ contributions for the UYT2 line list for SO$_3$ (Underwood et al. 2016a).
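A short Python sketch of the specific-heat evaluation follows. It uses the explicit sum-over-states form of the moments, $Q' = \sum_n g_n x_n e^{-x_n}$ and $Q'' = \sum_n g_n x_n^2 e^{-x_n}$ with $x_n = c_2\tilde{E}_n/T$, which follows from differentiating the Boltzmann sum for $Q(T)$. The function names are hypothetical, and the $5R/2$ term assumes an ideal gas; a two-level system is used only because its internal heat capacity is known in closed form.

```python
import math

C2 = 1.438776877      # second radiation constant (cm K)
R_GAS = 8.314462618   # molar gas constant (J K^-1 mol^-1)


def partition_moments(states, T):
    """Return (Q, Q', Q'') for states given as (term value in cm^-1, g_tot)."""
    q0 = q1 = q2 = 0.0
    for energy, g in states:
        x = C2 * energy / T
        weight = g * math.exp(-x)
        q0 += weight
        q1 += x * weight          # first moment, equal to T dQ/dT
        q2 += x * x * weight      # second moment
    return q0, q1, q2


def specific_heat(states, T):
    """Molar C_p (J K^-1 mol^-1): internal part plus 5R/2 (ideal-gas assumption)."""
    q0, q1, q2 = partition_moments(states, T)
    return R_GAS * (q2 / q0 - (q1 / q0) ** 2) + 2.5 * R_GAS


# Hypothetical two-level system with c2*E/T = 2 at the chosen temperature.
T = 500.0
two_level = [(0.0, 1), (2.0 * T / C2, 1)]
cp = specific_heat(two_level, T)
print(f"C_p = {cp:.4f} J/K/mol")
```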

Intensity thresholds
An intensity threshold can be used to speed up the cross-section calculation or to reduce the output in stick-spectra type calculations; this is done by simply specifying a constant intensity threshold value in cm/molecule in the input file. Constant intensity cut-offs are, however, known to cause problems at long wavelengths, where the density of lines is small and each line, even a weak one, can be important. A more sophisticated method is to use HITRAN's dynamic intensity cut-off (Rothman et al. 2013), defined as
$$I_{\rm cut}(\tilde{\nu}) = \begin{cases} I_{\rm crit}\,\dfrac{\tilde{\nu}}{\tilde{\nu}_{\rm crit}}\tanh\left(\dfrac{c_2\tilde{\nu}}{2T}\right), & \tilde{\nu} \le \tilde{\nu}_{\rm crit},\\[2mm] I_{\rm crit}, & \tilde{\nu} > \tilde{\nu}_{\rm crit},\end{cases}$$
where the HITRAN values for $\tilde{\nu}_{\rm crit}$ and $I_{\rm crit}$ are 2000 cm$^{-1}$ and $10^{-29}$ cm/molecule, respectively. These values are also the defaults in ExoCross but can be changed in the input.
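The dynamic cut-off is straightforward to express in code. The sketch below assumes the usual HITRAN form, in which the threshold is constant above $\tilde{\nu}_{\rm crit}$ and is scaled by $(\tilde{\nu}/\tilde{\nu}_{\rm crit})\tanh(c_2\tilde{\nu}/2T)$ below it; the function name and signature are hypothetical.

```python
import math

C2 = 1.438776877  # second radiation constant (cm K)


def dynamic_cutoff(nu, T, nu_crit=2000.0, i_crit=1e-29):
    """Wavenumber-dependent intensity threshold (cm/molecule)."""
    if nu > nu_crit:
        return i_crit
    # Below nu_crit the threshold is relaxed so that weak long-wavelength
    # lines, which may still matter, are retained.
    return i_crit * (nu / nu_crit) * math.tanh(C2 * nu / (2.0 * T))


for nu in (10.0, 100.0, 1000.0, 3000.0):
    print(nu, dynamic_cutoff(nu, T=296.0))
```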

HITRAN
ExoCross can be used to work with line lists in the native HITRAN .par format, for which almost all of its functionality is available. It can also be used to convert from ExoMol to HITRAN format (see Section 5.2).

Phoenix
ExoCross has the facility to output data in Phoenix format (Jack et al. 2009). In order to speed up the line-by-line calculations, Phoenix's atomic and molecular line lists have a compact structure, where all required properties (line positions, oscillator strengths, lower state energies and broadening parameters) are stored as 4- and 2-byte integers. For the wavelength (µm, 4-byte integers) this is defined as: The oscillator strength $gf_{fi}$ for a $f \leftarrow i$ transition, energy term values $\tilde{E}$, and broadening parameters $\gamma$ and $n$ are mapped onto 2-byte integers according to where $p$ is one of these properties. The integers $i_\lambda$, $i_\gamma$ and $i_n$ are then written as unformatted records with direct access, each containing data for 65536 lines (block-size). For molecules the broadening parameters include the reference Voigt line widths due to H$_2$ ($\gamma_{\rm H_2}$) and He ($\gamma_{\rm He}$) and the corresponding temperature exponents $n_{\rm H_2}$ and $n_{\rm He}$ (see below). It should be noted that Phoenix uses the so-called astrophysics convention for the nuclear statistical weights, which are related to the physics convention (adopted by ExoMol and HITRAN) as follows:
$$g^{\rm ns-astro}_i = \frac{g^{\rm ns-phys}_i}{\sum_j g^{\rm ns-phys}_j},$$
where $i$ counts different nuclear statistics. For example, in the case of water (H$_2{}^{16}$O), the nuclear statistical weight factors $g^{\rm ns}_i$ in the physics convention are 1 (para) and 3 (ortho), thus $g^{\rm ns}_i$ in the astrophysics convention are 1/4 (para) and 3/4 (ortho). Since Phoenix's partition functions are directly affected by the astrophysics convention, in order to be consistent, the ExoMol $gf_{fi}$ values have to be scaled by the factor 1/4 for water, or by $\left(\sum_i g^{\rm ns-phys}_i\right)^{-1}$ in general.

Treating non-local thermodynamic equilibrium (non-LTE)
ExoCross provides a simple approach to treating non-LTE environments by differentiating between the rotational and vibrational (vibronic) temperatures when calculating intensities or partition functions (or other $T$-dependent properties). To this end we approximate the total energy as a sum of the vibrational (or vibronic) and rotational contributions:
$$\tilde{E}_{v,J,k} = \tilde{E}^{\rm vib}_v + \tilde{E}^{\rm rot}_{v;J,k},$$
where $v$ and $k$ are generic vibrational (vibronic) and rotational quantum numbers, respectively. The pure vibronic contributions are taken as the corresponding energy values at $J = 0$ (integer spin), $J = 1/2$ (non-integer spin) or the lowest $J$ allowed by the symmetry of the electronic term and the parity, corresponding to the lowest states (usually '+' or 'e'). The rotational contribution is then simply given by
$$\tilde{E}^{\rm rot}_{v;J,k} = \tilde{E}_{v,J,k} - \tilde{E}^{\rm vib}_v.$$
We also assume that the rotational and vibrational modes are each in (Boltzmann) LTE at their own temperature and that the non-LTE population of a given state (used in intensity and/or partition function calculations) is given by
$$F_{J,v,k}(T_{\rm vib}, T_{\rm rot}) = e^{-c_2\tilde{E}^{\rm vib}_v/T_{\rm vib}}\, e^{-c_2\tilde{E}^{\rm rot}_{v;J,k}/T_{\rm rot}}.$$
For this representation it is important to have all the vibrational and rotational quantum numbers defined in the line list, or at least for the states accessed by non-LTE calculations.
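The two-temperature weighting can be illustrated with a few lines of Python. The sketch assumes the population factor is the product of a vibrational Boltzmann factor at $T_{\rm vib}$ and a rotational one at $T_{\rm rot}$, with the rotational energy obtained by subtracting the vibrational (band origin) term value from the total; the function name and the numerical values are hypothetical.

```python
import math

C2 = 1.438776877  # second radiation constant (cm K)


def nonlte_weight(e_total, e_vib, t_vib, t_rot):
    """Two-temperature Boltzmann factor for a state with term value e_total.

    e_vib is the term value (cm^-1) of the lowest rotational level of the
    same vibrational (vibronic) state.
    """
    e_rot = e_total - e_vib
    return math.exp(-C2 * e_vib / t_vib) * math.exp(-C2 * e_rot / t_rot)


# Hypothetical state: 1000 cm^-1 of vibrational and 500 cm^-1 of rotational energy.
w_lte = nonlte_weight(1500.0, 1000.0, t_vib=500.0, t_rot=500.0)
w_hot_vib = nonlte_weight(1500.0, 1000.0, t_vib=2000.0, t_rot=500.0)
print(w_lte, w_hot_vib)
```

With $T_{\rm vib} = T_{\rm rot}$ the weight reduces to the ordinary LTE Boltzmann factor, which is a convenient consistency check.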

Line profiles
The line broadening is important for practical applications. While temperature effects are commonly modelled by a Doppler line profile, pressure broadening is more complicated. For very high pressure regimes Lorentzian profiles can be used, while for moderate pressures Voigt profiles are generally used (see, for example, Schreier (2017)).

Standard line profiles and sampling method
The most commonly used line profiles in ExoCross include Gaussian, Doppler, Voigt and Lorentzian. The general Gaussian line profile is given by (Hill et al. 2013b)
$$f^{\rm G}_{\tilde{\nu}_{fi},\alpha_{\rm G}}(\tilde{\nu}) = \sqrt{\frac{\ln 2}{\pi}}\,\frac{1}{\alpha_{\rm G}}\exp\left(-\frac{(\tilde{\nu} - \tilde{\nu}_{fi})^2\ln 2}{\alpha_{\rm G}^2}\right), \qquad (15)$$
where $\tilde{\nu}_{fi}$ is the line centre position and $\alpha_{\rm G}$ is the Gaussian half-width at half-maximum (HWHM). The Gaussian line profile is useful to model generic spectra represented by lines with constant HWHM. The Gaussian line profile can also be used to model microturbulence broadening by choosing $\alpha_{\rm G}$ appropriately.
The Doppler line profile $f^{\rm D}_{\tilde{\nu}_{fi},\alpha_{\rm D}}(\tilde{\nu})$ is based on the Gaussian shape defined in Eq. (15), where the Doppler HWHM is given by
$$\alpha_{\rm D} = \sqrt{\frac{2 k_B T\ln 2}{m}}\,\frac{\tilde{\nu}_{fi}}{c} \qquad (16)$$
at temperature $T$ for a molecule of mass $m$. The Lorentzian profile is given by
$$f^{\rm L}_{\tilde{\nu}_{fi},\gamma_{\rm L}}(\tilde{\nu}) = \frac{1}{\pi}\,\frac{\gamma_{\rm L}}{(\tilde{\nu} - \tilde{\nu}_{fi})^2 + \gamma_{\rm L}^2}, \qquad (17)$$
where $\gamma_{\rm L}$ is the Lorentzian line width (HWHM), given most commonly by
$$\gamma_{\rm L} = \gamma_0\left(\frac{T_0}{T}\right)^{n_{\rm L}}\frac{P}{P_0}. \qquad (18)$$
Here $T_0$ and $P_0$ are the reference temperature and pressure, respectively, and $\gamma_0$ and $n_{\rm L}$ are the broadening parameters for a given broadener: the reference HWHM and the temperature exponent, respectively.
The Voigt profile is a convolution of the Doppler and Lorentzian profiles:
$$f^{\rm V}_{\tilde{\nu}_{fi},\alpha_{\rm D},\gamma_{\rm L}}(\tilde{\nu}) = \frac{\gamma_{\rm L}}{\pi^{3/2}}\int_{-\infty}^{\infty}\frac{e^{-\nu'^2}}{\left(\tilde{\nu} - \tilde{\nu}_{fi} - \alpha_{\rm D}\nu'/\sqrt{\ln 2}\right)^2 + \gamma_{\rm L}^2}\, d\nu', \qquad (19)$$
where we introduced a unitless variable $\nu$ given by
$$\nu = \frac{\sqrt{\ln 2}}{\alpha_{\rm D}}\left(\tilde{\nu} - \tilde{\nu}_{fi}\right). \qquad (20)$$
The Lorentzian line width $\gamma_{\rm L}$ strongly depends on the molecule and is usually also state-dependent. The corresponding values must be given in the input, including the specification of the broadeners and their mixing ratios. Each calculation can handle only one combination of broadeners.
Additionally, a simple box-type line profile given by
$$f^{\rm Box}_{\tilde{\nu}_{fi}}(\tilde{\nu}) = \begin{cases} 1/\Delta\tilde{\nu}, & |\tilde{\nu} - \tilde{\nu}_{fi}| \le \Delta\tilde{\nu}/2,\\ 0, & \text{otherwise},\end{cases}$$
where $\Delta\tilde{\nu}$ is the width of the box, is available. The individual contribution from each line to the cross section at a given frequency grid point $k$ is evaluated by sampling the corresponding line profile (see Eq. (7)) as given by
$$\sigma^k_{fi} = I_{fi}\, f_{\tilde{\nu}_{fi}}(\tilde{\nu}_k),$$
which will often be referred to as the sampling method. This method has the disadvantage of underestimating the opacity when too coarse a grid is used, which can lead to lines being partially or completely left out. This is a typical problem at long wavelengths, where the lines are narrow and far from each other; it is usually tackled either by re-normalizing the line area, see, for example, Sharp & Burrows (2007), or by using random sampling (Lupu et al. 2016). Below we explore a different, more rigorous alternative.
In practical applications the cross sections are computed on a grid of frequencies (wavenumbers) $\tilde{\nu}_i$. When the grid is not sufficiently dense, the line profiles lose their normalisation. This is usually not a problem, at least for most room-temperature applications. However, at high $T$, when billions of lines are used, this leakage can lead to a significant loss of opacity. In order to prevent this effect, Hill et al. (2013b) suggested using an intensity averaged over a given frequency bin, where the corresponding cross section is integrated analytically. This method, originally presented for the Gaussian (Doppler) line profile, is extended here to the Lorentzian and Voigt profiles.

Binned Gaussian profile with analytical integrals
An averaged (integrated) cross section over a bin $[\tilde{\nu}_k - \Delta\tilde{\nu}/2 \ldots \tilde{\nu}_k + \Delta\tilde{\nu}/2]$ from a line $f \leftarrow i$ is given by
$$\bar{\sigma}^k_{fi} = \frac{I_{fi}}{2\Delta\tilde{\nu}}\left[{\rm erf}(x^+_{k,fi}) - {\rm erf}(x^-_{k,fi})\right],$$
where erf is the error function and
$$x^\pm_{k,fi} = \frac{\sqrt{\ln 2}}{\alpha_{\rm G}}\left(\tilde{\nu}_k - \tilde{\nu}_{fi} \pm \frac{\Delta\tilde{\nu}}{2}\right)$$
are the scaled limits of the wavenumber bin centred on $\tilde{\nu}_k$ relative to the line centre $\tilde{\nu}_{fi}$, and $I_{fi}$ is the line intensity in units of cm$^{-1}$/(molecule cm$^{-2}$) from Eq. (1). Here we take advantage of the fact that an analytical solution exists for the integral of the Gaussian function,
$$\int e^{-x^2}\, dx = \frac{\sqrt{\pi}}{2}\,{\rm erf}(x) + C,$$
where $C$ is an integration constant. The total cross section at the frequency bin $k$ is given by a sum over all contributions from individual lines $fi$,
$$\bar{\sigma}^k = \sum_{fi}\bar{\sigma}^k_{fi},$$
and can be interpreted as an average value of the cross sections from a given frequency bin $k$. The advantage of this approach is that, by definition, it always gives exact integrated cross sections independent of the number of grid points used or the integration interval. Therefore it is recommended for applications where accurate integrated cross sections or absorption coefficients on coarse grids are required. However, it is known that averaged cross sections, especially on coarse grids, can lead to huge errors in the integrated flux. Therefore, for radiative transfer applications, the direct sampling methods are more accurate and should be used instead.
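The difference between sampling and bin-averaging is easy to demonstrate numerically. The sketch below places a narrow Gaussian line between the points of a deliberately coarse grid: the sampled profile misses the line almost entirely, while the bin-averaged one (a difference of error functions at the scaled bin limits) recovers the full line area. The function names and the line parameters are hypothetical.

```python
import math

SQRT_LN2 = math.sqrt(math.log(2.0))


def gaussian_profile(nu, nu0, alpha):
    """Area-normalised Gaussian with HWHM alpha, sampled at nu."""
    return SQRT_LN2 / (math.sqrt(math.pi) * alpha) * math.exp(-(SQRT_LN2 * (nu - nu0) / alpha) ** 2)


def binned_gaussian(nu_k, d_nu, nu0, alpha, intensity=1.0):
    """Cross section averaged analytically over the bin of width d_nu centred at nu_k."""
    x_plus = SQRT_LN2 / alpha * (nu_k - nu0 + 0.5 * d_nu)
    x_minus = SQRT_LN2 / alpha * (nu_k - nu0 - 0.5 * d_nu)
    return intensity / (2.0 * d_nu) * (math.erf(x_plus) - math.erf(x_minus))


# Coarse 1 cm^-1 grid and a narrow line (HWHM 0.05 cm^-1) lying between grid points.
d_nu = 1.0
grid = [k * d_nu for k in range(21)]
nu0, alpha = 10.3, 0.05

area_sampled = sum(gaussian_profile(nu, nu0, alpha) for nu in grid) * d_nu
area_binned = sum(binned_gaussian(nu, d_nu, nu0, alpha) for nu in grid) * d_nu
print(f"sampled area = {area_sampled:.3e}, binned area = {area_binned:.6f}")
```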

Binned Lorentzian profile with analytical integrals
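Here we apply the same idea of an analytical integral to the Lorentzian line profile:
$$\bar{\sigma}^k_{fi} = \frac{I_{fi}}{\pi\,\Delta\tilde{\nu}}\left[\arctan(y^+_{k,fi}) - \arctan(y^-_{k,fi})\right], \qquad (28)$$
where
$$y^\pm_{k,fi} = \frac{\tilde{\nu}_k - \tilde{\nu}_{fi} \pm \Delta\tilde{\nu}/2}{\gamma_{\rm L}}. \qquad (29)$$
Here the following integral was used:
$$\int \frac{dx}{1 + x^2} = \arctan(x) + C,$$
where $C$ is an integration constant. Again, the integration within each bin is done analytically, which guarantees no loss of accuracy for any number of points.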

Binned Voigt profile with analytical integrals
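The two line profiles (Gaussian and Lorentzian) can be combined to produce a similar formulation for the Voigt profile, where we use the idea of Gauss-Hermite quadratures as, for example, used in Humlíček's algorithm (Humlicek 1979). The Voigt convolution integral in Eq. (19) can be written using these quadratures as follows:
$$f^{\rm V}_{\tilde{\nu}_{fi},\alpha_{\rm D},\gamma_{\rm L}}(\tilde{\nu}) \approx \frac{\gamma_{\rm L}}{\pi^{3/2}}\sum_{m=1}^{N_{\rm G-H}}\frac{w^{\rm G-H}_m}{\left(\tilde{\nu} - \tilde{\nu}_{fi} - \alpha_{\rm D}\nu_m/\sqrt{\ln 2}\right)^2 + \gamma_{\rm L}^2},$$
where $\nu_m$ and $w^{\rm G-H}_m$ are the Gauss-Hermite quadrature points and weights, respectively ($m = 1 \ldots N_{\rm G-H}$), and $\nu$ is related to $\tilde{\nu}$ via Eq. (20). Each term is a Lorentzian centred at the shifted position $\tilde{\nu}_{fi} + \alpha_{\rm D}\nu_m/\sqrt{\ln 2}$. In this form the computation of the Voigt profile can also be generalised to produce the area-conserved integrals using Eq. (28):
$$\bar{\sigma}^k_{fi} = \frac{I_{fi}}{\pi^{3/2}\,\Delta\tilde{\nu}}\sum_{m=1}^{N_{\rm G-H}} w^{\rm G-H}_m\left[\arctan(y^+_{k,fi,m}) - \arctan(y^-_{k,fi,m})\right], \qquad (31)$$
where $y^\pm_{k,fi,m} = \left(\tilde{\nu}_k - \tilde{\nu}_{fi} - \alpha_{\rm D}\nu_m/\sqrt{\ln 2} \pm \Delta\tilde{\nu}/2\right)/\gamma_{\rm L}$ is $y^\pm_{k,fi}$ of Eq. (29) evaluated at the shifted line centre. We usually take $N_{\rm G-H} = 30$ Gauss-Hermite points. This approach does not appear to have been taken previously.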
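The idea can be sketched with NumPy, whose `numpy.polynomial.hermite.hermgauss` routine returns Gauss-Hermite nodes and weights for the weight function $e^{-x^2}$. The sketch treats the Voigt profile as a weighted sum of Lorentzians whose centres are shifted by the Doppler-scaled quadrature nodes, and integrates each Lorentzian analytically over every bin via arctan differences. The function name, the explicit normalisation $\pi^{-3/2}$ (which follows from the weights summing to $\sqrt{\pi}$) and the line parameters are assumptions made for this illustration, not the ExoCross source.

```python
import numpy as np


def binned_voigt(nu_grid, d_nu, nu0, alpha_d, gamma_l, intensity=1.0, n_gh=30):
    """Bin-averaged Voigt cross section on nu_grid via Gauss-Hermite quadrature."""
    nu_grid = np.asarray(nu_grid, dtype=float)
    nodes, weights = np.polynomial.hermite.hermgauss(n_gh)
    # Each node defines a Lorentzian whose centre is shifted by the Doppler width.
    centres = nu0 + alpha_d * nodes / np.sqrt(np.log(2.0))
    offset = nu_grid[:, None] - centres[None, :]
    y_plus = (offset + 0.5 * d_nu) / gamma_l
    y_minus = (offset - 0.5 * d_nu) / gamma_l
    # Analytical bin integral of every shifted Lorentzian, summed with the weights.
    per_bin = (np.arctan(y_plus) - np.arctan(y_minus)) @ weights
    return intensity * per_bin / (np.pi ** 1.5 * d_nu)


# Hypothetical line at 0 cm^-1 on a coarse 0.5 cm^-1 grid spanning +/- 1000 cm^-1.
d_nu = 0.5
grid = np.arange(-2000, 2001) * d_nu
sigma = binned_voigt(grid, d_nu, nu0=0.0, alpha_d=0.05, gamma_l=0.1)
area = sigma.sum() * d_nu
print(f"integrated area = {area:.6f}")
```

The integrated area differs from unity only by the Lorentzian wings that fall outside the grid, regardless of how coarse the grid is.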

Vectorized Voigt approximation
Evaluation of the Voigt line profile is generally one of the biggest bottlenecks in opacity calculations. Here we present a new approximate cross section algorithm for the Voigt line profile, which leads to efficient vectorization and thus fast calculations. Our approach is based on the observation that the shape of the wings of the Voigt profile (> 4 cm$^{-1}$ from the line centre), at least for Humlíček's algorithm, is relatively constant over a large variation of $\tilde{\nu}$, as Lorentzian broadening is generally the largest contributor. For example, Figure 6 shows how the wings of the Voigt profile centred at $\tilde{\nu}_{fi} = 1$ differ from the wings of other Voigt profiles centred at all other $\tilde{\nu}_{fi}$ across the entire wavenumber range from 0 to 30 000 cm$^{-1}$ (computed using Humlíček's algorithm). As expected, the error grows as the Doppler HWHM (Eq. 16) increases with transition wavenumber. However, it never exceeds 1% even for the lowest pressure. One of the most interesting observations is that at $10^{0}$ bar the relative error is almost the same as the mostly Doppler profile error at $10^{-20}$ bar. With higher pressures this error falls significantly to lower than $10^{-2}$%, and lower temperatures reduce it by orders of magnitude. It is only around the line centre, which we estimate to be within 4 cm$^{-1}$, that the variation of the line shape of the Voigt profile is important.
Based on this observation, the Voigt profile $f^{\rm V-V}_{\tilde{\nu}_{fi},\alpha_{\rm D},\gamma_{\rm L}}(\tilde{\nu})$ can be split into two parts as follows (omitting the indices $\alpha_{\rm D}$, $\gamma_{\rm L}$ for simplicity):
$$f^{\rm V-V}_{\tilde{\nu}_{fi}}(\tilde{\nu}) = \begin{cases} f^{\rm V}_{\tilde{\nu}_{fi}}(\tilde{\nu}), & |\tilde{\nu} - \tilde{\nu}_{fi}| \le 4\ {\rm cm}^{-1},\\ \beta_{\tilde{\nu}_{fi}}\, f^{\rm ref}(\tilde{\nu} - \tilde{\nu}_{fi}), & |\tilde{\nu} - \tilde{\nu}_{fi}| > 4\ {\rm cm}^{-1}.\end{cases} \qquad (32)$$
Here $\beta_{\tilde{\nu}_{fi}}$ is a parameter that is used to prevent discontinuities at $\tilde{\nu} = \tilde{\nu}_{fi} \pm 4$ cm$^{-1}$ when switching between the two profiles and is given by: This parameter is included for completeness and is generally set to $\beta = 1$ for a performance boost, as the discontinuities are not visible at most scales for a single transition and are invisible once the whole spectrum is considered. For a given set of pressure broadening parameters $\gamma_{\rm L}$ we pre-compute a set of points defining the wings $f^{\rm ref}_{\alpha_{\rm D},\gamma_{\rm L}}$ and then simply select the relevant set. Therefore the only place where a real Voigt calculation needs to be done is around the centre. Additionally, (if used) $\beta_{\tilde{\nu}_{fi}}$ needs to be calculated at the boundary, which completes the evaluation of the given profile.
The algorithm is based on the Humlíček (1979) approximation for the Voigt profile in Eq. (19), which is the main method used by ExoCross. The Humlíček algorithm is called only for the regions within 4 cm$^{-1}$ from the line centre. Using the conventional Lorentz cutoff of 25 cm$^{-1}$, this means that only up to 8% of the calculation is computationally demanding, giving a theoretical speed up of 12.5 times. This is illustrated in Figure 7, which shows the speed up using our Vectorized Voigt algorithm when applied to the region of 0.0-300 cm$^{-1}$ of the BT2 water line list (Barber et al. 2006) at $T = 1900$ K and $P = 1$ bar. The speed up $S$ for $N$ points used to bin the wavenumber grid is defined as
$$S = \frac{T^N_0}{T^N_{\rm V-V}},$$
where $T^N_0$ is the time required for a standard Humlíček computation on a wavenumber grid $N$ and $T^N_{\rm V-V}$ is the time required using the Vectorized Voigt method. The speed up converges to a maximum value of about 11 times compared to the standard Humlíček calculation, close to the predicted maximum speed up.
This procedure is also efficiently vectorized. Firstly, for the inner part (top of Eq. (32)), which is symmetric, only one half is computed. The other half is then merely looped through backwards and applied to the grid, requiring only a multiplication by the absorption coefficient (emissivity) and an addition to the opacity grid. The second vectorization occurs when dealing with the second part of Eq. (32). Here again, only a multiplication by the intensity and an addition to the opacity grid are required. These two loops are vectorized through the Fused-Multiply-Add (FMA) instruction. Figure 8 presents an illustration, where both the Vectorized and standard (Humlíček) Voigt methods were used to generate cross sections of water from the BT2 line list for $T = 1900$ K and $P = 1$ bar. The new algorithm captures all features, with the total opacity for the range shown differing by only $10^{-6}$ cm$^2$ molecule$^{-1}$.
Lastly, for a full opacity calculation between 0.0 and 30 000 cm$^{-1}$, Table 2 shows that using no intensity threshold with the Vectorized Voigt method is almost 3.5 times faster than the full Humlíček method with thresholding at $10^{-30}$ cm molecule$^{-1}$. Comparing like for like, the Vectorized Voigt is around 10 to 12 times faster than the standard Humlíček method.
Future development of the algorithm will look into automatically tuning the distance from the line centre depending on the temperature and pressure parameters given.
Fig. 8. Comparison of BT2 water cross sections (Barber et al. 2006) between the standard Humlíček and the Vectorized Voigt method with $T = 1900$ K. Bottom plot: percentage difference between the Humlíček and Vectorized Voigt method. The calculations used no intensity threshold and a wavenumber bin of 0.1 cm$^{-1}$.

Binned Vectorized Voigt with the line area preserved
Considering the importance of preserving the integrated cross section in many applications, we also provide an alternative version of the Vectorized Voigt, based on re-normalization of the line area. During the precomputation stage of the Vectorized Voigt method, the total sum for all points ($\Sigma_{\alpha_{\rm D},\gamma_{\rm L}}$) that lie at $|\tilde{\nu} - \tilde{\nu}_{fi}| > 4$ cm$^{-1}$ is computed and stored alongside the reference Voigt profile. When computing the Vectorized Voigt for a transition, the central Humlíček region is evaluated into a temporary array and its sum is added to $\Sigma_{\alpha_{\rm D},\gamma_{\rm L}}$. The scaled absolute intensity $\tilde{I}_{fi}$ is then computed as: Both the temporary Humlíček array and the reference Voigt profile are applied to the opacity grid with the scaled intensity $\tilde{I}_{fi}$. Whilst not a proper treatment of area conservation as that given by Eq. (31), it serves as a reasonable approximation and, as shown in Table 3, gives good results within 1% of the total summed absolute intensity even for large wavenumber bins. To our knowledge, this method has not been reported before.
Table 2. Time taken (s) for differing methods and intensity thresholds (cm molecule$^{-1}$) to compute opacities using the 500 million transitions of the BT2 water line list (Barber et al. 2006) between 0 and 30 000 cm$^{-1}$ with a wavenumber binning of 0.

Broadening parameters
The Voigt profile, as a convolution of Doppler and Lorentzian profiles, requires definition of the corresponding line widths (HWHM), $\alpha_{\rm D}$ (see Eq. (16)) and $\gamma_{\rm L}$, given by Eq. (18). The Doppler parameter $\alpha_{\rm D}(T)$ is easy to deal with. It does not depend on the molecular states, only on the line position, and can always be computed on the fly. The Lorentzian (Voigt) parameters $\gamma_0(P_0, T_0)$ and $n_{\rm L}$, however, are very different for different molecules. Besides, they show a pronounced dependence on the state quantum numbers, with the rotational ($J$) state dependence being the strongest. The two-file format of the ExoMol database requires a special structure for the broadening parameters. Instead of using the conventional line-by-line approach employed by spectroscopic databases such as HITRAN (Gordon et al. 2017) or GEISA (Jacquinet-Husson et al. 2016), where the pressure broadening is specified for each transition, ExoMol's broadening parameters are stored in separate files with the extension .broad. This structure is justified for most applications, as the same parameters are usually used for a large number of different transitions. The latter is either due to the absence of broadening information on all the lines or due to the weak dependence of these parameters on the states. This structure was recently implemented for a number of molecules including H$_2$O, CH$_4$ and HCN (Barton et al. 2017; Yurchenko et al. 2017b). Table 4 shows an extract from the .broad file for CS as an example. Each line in .broad has the following structure: type (a0, a1, ...), $\gamma_0(P_0, T_0)$, $n_{\rm L}$ and the quantum numbers defined by the type.
Currently ExoCross supports the following three broadening schemes, constant, a0 and a1, depending on the rigorous quantum numbers $J'$ and $J''$. The simplest case is when $\gamma_0(P_0, T_0)$ and $n_{\rm L}$ are constant and the .broad data is not required. The a0 type corresponds to the $J$-dependence only. In this case the 4th column in the .broad file contains the $J$ values. The $J$ quantum number is a mandatory quantity in the ExoMol format (column 4 in .states) and is therefore relatively straightforward to handle. A similar scenario (a1) is when the broadening depends on the upper $J'$ (column 5 in .broad) and lower $J''$ (column 4) rotational quantum numbers. All other broadening schemes involve dependence on some non-rigorous quantum numbers ('labels'), such as the vibrational quantum numbers or the rotational $K$. The non-rigorous quantum numbers and their positions in the .states file are molecule dependent and thus need to be specified. This information can be found in ExoMol's .def (API) file. The current version of ExoCross supports rigorous quantum numbers only and therefore does not require interfacing with the ExoMol database.

Mixtures of broadeners
We consider different broadeners to be independent and their effects additive. Thus the total value of $\gamma_{\rm L}$ is a weighted sum of the contributions $\gamma_{{\rm L},i}$ from each broadener, as given by
$$\gamma_{\rm L} = \sum_i \rho_i\,\gamma_{0,i}\left(\frac{T_0}{T}\right)^{n_{{\rm L},i}}\frac{P}{P_0},$$
where $\rho_i$ is the fractional abundance of the $i$th broadener. Here we used the fact that the cross sections from each line are additive and thus the line profile can be represented as a weighted average of lines broadened by different species.
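A minimal sketch of the mixture rule is given below: each broadener contributes its reference HWHM, scaled by the usual temperature and pressure dependence, weighted by its fraction in the mixture. The function name, the reference conditions ($T_0 = 296$ K, $P_0 = 1$ bar) used as defaults, and the broadening parameters of the H$_2$/He mixture are illustrative assumptions.

```python
def lorentz_hwhm(broadeners, T, P, T0=296.0, P0=1.0):
    """Total Lorentzian HWHM (cm^-1) for a mixture of independent broadeners.

    broadeners: iterable of (fraction, gamma_0 in cm^-1 at (P0, T0), exponent n).
    """
    return sum(
        fraction * gamma0 * (T0 / T) ** n * (P / P0)
        for fraction, gamma0, n in broadeners
    )


# Hypothetical 85% H2 / 15% He mixture with made-up broadening parameters.
mixture = [(0.85, 0.07, 0.5), (0.15, 0.04, 0.4)]
gamma_ref = lorentz_hwhm(mixture, T=296.0, P=1.0)
gamma_hot = lorentz_hwhm(mixture, T=1184.0, P=1.0)
print(gamma_ref, gamma_hot)
```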

Off-set
Even though, at least in principle, a line profile has infinite spread, in practical calculations a frequency (or wavelength) cutoff must be applied to limit the calculation region to around the line centre only. Not only does this influence the computation time and the accuracy of cross sections, but it is also assumed in some applications as a point of convention. For example, water cross sections are conventionally taken to have a 25 cm$^{-1}$ cutoff, with far-wing contributions outside this region assumed to form part of the so-called water continuum (Shine et al. 2012). The default cut-off value in ExoCross is 25 cm$^{-1}$; a different value can be specified in the input file.

Super-lines
The super-line approach is an efficient method for describing a molecular broadened continuum, originally proposed by Rey et al. (2016) and recently studied in detail by Yurchenko et al. (2017a). The super-lines are constructed as temperature-dependent intensity histograms as follows (see also the detailed discussion by Rey et al. (2016)). We divide the wavenumber range $[\tilde{\nu}_A, \tilde{\nu}_B]$ into $N$ frequency bins, each centred around a grid point $\tilde{\nu}_k$. For each $\tilde{\nu}_k$ the total absorption intensity $I_k(T)$ is computed as a sum of absorption line intensities $I_{fi}$, as in Eq. (1), from all $f \leftarrow i$ transitions falling into the wavenumber bin $[\tilde{\nu}_k - \Delta\tilde{\nu}_k/2 \ldots \tilde{\nu}_k + \Delta\tilde{\nu}_k/2]$ at the given temperature $T$. Each grid point $\tilde{\nu}_k$ forms a super-line of an artificial transition with an effective absorption intensity $I_k(T)$. The super-line lists are given in a two-column format $\{\tilde{\nu}_k, I_k(T)\}$ with pre-computed intensities $I_k$, in the same format as used to store ExoMol cross-sections. The filename have The histograms in ExoCross can be produced as cross sections using the Bin-option in the input file (see Manual), which is basically just a sum of all intensities within a given bin $k$. Once the histograms are computed (in the standard cross section two-column format), they can be treated as normal line lists. In this case the .states file is not needed, as all the information has already been included in the line position and intensity. Moreover, since the state-specific information is completely lost from the line characteristics, state-dependent line profiles cannot be used for temperature/pressure broadening. Doppler line profiles require no information on the upper/lower states and are not restricted. However, for the Voigt pressure broadening parameters, which usually depend (at least) on $J$, only constant values of $\gamma_0$ and $n_{\rm L}$ (see Eq. (18)) can be used in conjunction with super-lines.
For this reason the super-lines are recommended for description of featureless continuum produced from the weaker lines only. The stronger lines should be treated as usual, line-by-line.
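The histogram construction described in this section can be sketched in a few lines of Python. This is an illustration only, assuming equal-width bins and line intensities already evaluated at the temperature of interest; the function name and the sample data are hypothetical.

```python
def build_super_lines(lines, nu_min, nu_max, n_bins):
    """Sum line intensities into n_bins equal-width wavenumber bins.

    lines: iterable of (wavenumber, intensity) pairs at a fixed temperature.
    Returns a list of (bin_centre, summed_intensity) pairs, i.e. the
    two-column super-line format.
    """
    width = (nu_max - nu_min) / n_bins
    totals = [0.0] * n_bins
    for nu, intensity in lines:
        if nu_min <= nu < nu_max:
            # Guard against rounding placing a line in a non-existent bin.
            k = min(int((nu - nu_min) / width), n_bins - 1)
            totals[k] += intensity
    return [(nu_min + (k + 0.5) * width, totals[k]) for k in range(n_bins)]

# Made-up lines: (wavenumber in cm^-1, intensity in arbitrary units).
lines = [(0.2, 1.0), (0.7, 2.0), (1.5, 4.0), (3.9, 8.0), (4.0, 16.0)]
super_lines = build_super_lines(lines, nu_min=0.0, nu_max=4.0, n_bins=4)
```

Here the first two lines merge into one super-line at 0.5 cm$^{-1}$, and the line at 4.0 cm$^{-1}$ falls outside the half-open range and is dropped. The result no longer carries any state labels, which is why only state-independent broadening can be applied to it.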

User-defined profiles
New line profiles (see, for example, Tennyson et al. (2014)) can easily be implemented in ExoCross by the user. A detailed description is provided in the manual. The HITRAN option in ExoCross can be used as an example.

Calculation protocol
The steps of a typical ExoCross calculation are shown in Fig. 9.

Data formats
ExoCross currently takes input in either ExoMol or HITRAN format. It can provide output in these formats and in the format used by the Phoenix radiative transport code (Jack et al. 2009). These formats are discussed in turn below.

ExoMol format
A line list is defined as a catalogue of transition frequencies and intensities (Tennyson & Yurchenko 2012). In the basic ExoMol format (Hill et al. 2013b), adopted by ExoCross, a line list has a compact structure consisting of two files: 'States' and 'Transitions'; an example for the NOname line list for $^{14}$N$^{16}$O (Wong et al. 2017) is given in Tables 5 and 6. The 'States' (.states) file contains energy term values supplemented by the running number $n$, the total degeneracy $g^{\rm tot}_n$ and the rotational quantum number $J_n$ (all obligatory fields), followed by other quantum numbers and labels (both rigorous and non-rigorous), lifetimes and Landé g-factors. For example, for a generic open-shell diatomic molecule the quantum numbers include υ, Λ, parity (±), Σ, Ω and the electronic state label (e.g. X2Sigma+). The 'Transitions' (.trans) file contains three obligatory columns: the upper and lower state indexes $n_f$ and $n_i$, which are running numbers from the 'States' file, and the Einstein coefficient $A_{fi}$. For convenience, it sometimes also provides the wavenumber $\tilde{\nu}_{fi}$ as column 4. A line list in the ExoMol format can be used to simulate absorption or emission spectra for any temperature in a general way.
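To make the two-file structure concrete, the following Python sketch reads the obligatory columns of a States and a Transitions file and evaluates an absorption intensity at a given temperature. The sample data are made up (they are not NOname values), the function names are hypothetical, and the intensity formula is the standard LTE expression $I_{fi} = \frac{g^{\rm tot}_f A_{fi}}{8\pi c\,\tilde{\nu}_{fi}^2}\, \frac{e^{-c_2 E_i/T}\,(1 - e^{-c_2 \tilde{\nu}_{fi}/T})}{Q(T)}$, which is assumed here to be the quantity the text calls $I_{fi}$.

```python
import math

C2 = 1.438776877      # second radiation constant hc/k_B, in cm K
C_CM = 2.99792458e10  # speed of light, in cm/s

# Made-up example data: n, energy (cm^-1), g_tot, J  /  n_f, n_i, A_fi (s^-1).
STATES_TEXT = """
1     0.000000  2  0.5
2    10.000000  4  1.5
3  2000.000000  2  0.5
"""
TRANS_TEXT = """
3  1  1.0000E+01
3  2  5.0000E+00
"""

def read_states(text):
    """Map state index n -> (energy in cm^-1, total degeneracy, J)."""
    states = {}
    for row in text.strip().splitlines():
        cols = row.split()
        states[int(cols[0])] = (float(cols[1]), int(cols[2]), float(cols[3]))
    return states

def read_trans(text):
    """Yield (n_f, n_i, A_fi) from the three obligatory columns."""
    for row in text.strip().splitlines():
        cols = row.split()
        yield int(cols[0]), int(cols[1]), float(cols[2])

def partition_function(states, temp):
    """Q(T) as a direct sum over all states in the States file."""
    return sum(g * math.exp(-C2 * e / temp) for e, g, _ in states.values())

def absorption_intensity(states, n_f, n_i, a_fi, temp, q):
    """LTE absorption line intensity in cm/molecule."""
    e_f, g_f, _ = states[n_f]
    e_i, _, _ = states[n_i]
    nu = e_f - e_i  # transition wavenumber from the two term values
    boltz = math.exp(-C2 * e_i / temp) * (1.0 - math.exp(-C2 * nu / temp))
    return g_f * a_fi / (8.0 * math.pi * C_CM * nu ** 2) * boltz / q

states = read_states(STATES_TEXT)
temp = 296.0
q = partition_function(states, temp)
lines = []
for n_f, n_i, a_fi in read_trans(TRANS_TEXT):
    nu = states[n_f][0] - states[n_i][0]
    lines.append((nu, absorption_intensity(states, n_f, n_i, a_fi, temp, q)))
```

Because the wavenumber is recovered from the term values, the optional fourth column of the Transitions file is not needed, and the same two files serve any temperature simply by changing `temp`.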

HITRAN
The current "HITRAN format" is fully specified in Table 1 of the 2004 edition of HITRAN (Rothman et al. 2005). This format, which is also used for the current release of the related high-temperature database HITEMP (Rothman et al. 2010), has been implemented here.
Although the HITRAN format is widely adopted as a de facto standard, we advise some caution before adopting it. The format is rather verbose and can become extremely unwieldy as a means of representing large line lists. It is highly tuned towards Earth-atmosphere applications (e.g. in its choice of pressure-broadening parameters and temperature ranges) and is therefore rather inflexible for other applications. The HITRAN team have recognised these issues and have introduced their own web-based interface, HAPI (Kochanov et al. 2016), to act as a front end and to perform data compression. The database itself has moved to an online version which provides much more flexibility than the 2004 format (Hill et al. 2013a).

Improving data processing
Both the cross-section and intensity steps (see Fig. 9) are OpenMP parallelized. Users can specify the number of processors requested in the input; otherwise it is set to 1 (no parallelization). In order to make reading and processing data from the .trans file more efficient, ExoCross reads transitions in chunks of N lines rather than line by line. The number N is either specified in the input file or, by default, estimated from the memory available on the system (see below on memory handling). 'Caching' these records in RAM allows both the transition filtering and the computation of line profiles to be parallelized. Each thread is given its own copy of the opacity grid so that it can work independently, without atomic operations or mutex locks. The total opacity grid is obtained at the end of the program run by combining the opacity arrays of all threads.
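The chunk-and-private-grid pattern can be sketched in Python as follows. This is only an illustration of the data layout, not the ExoCross implementation (which is Fortran with OpenMP): the function names are hypothetical, each line is simply added to the nearest lower grid point instead of being broadened, and Python threads do not deliver the speed-up that OpenMP threads do.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def chunks(iterable, n):
    """Yield successive lists of at most n records (the cached chunk)."""
    it = iter(iterable)
    while True:
        block = list(islice(it, n))
        if not block:
            return
        yield block

def accumulate(private, block, step):
    """Worker: add a slice of cached lines to this worker's private grid."""
    for nu, intensity in block:
        k = math.floor(nu / step)
        if 0 <= k < len(private):
            private[k] += intensity

def process(lines, n_chunk, n_threads, n_grid, step):
    # One private grid per worker, so no locks or atomics are needed.
    grids = [[0.0] * n_grid for _ in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for block in chunks(lines, n_chunk):
            # Split the cached chunk into one slice per worker.
            slices = [block[t::n_threads] for t in range(n_threads)]
            # list() waits until every slice of this chunk is done.
            list(pool.map(accumulate, grids, slices, [step] * n_threads))
    # Final reduction: combine the private grids into the total grid.
    return [sum(column) for column in zip(*grids)]

# Made-up input: three unit-intensity lines in each of ten 1 cm^-1 bins.
lines = [(k + 0.5, 1.0) for k in range(10)] * 3
total = process(lines, n_chunk=7, n_threads=3, n_grid=10, step=1.0)
```

The price of avoiding synchronisation is memory: the grid is held once per thread, which is one reason the chunk size and the number of threads have to be balanced against the available RAM.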

Filters
ExoCross allows the selection of specific bands/states when computing intensities using the 'filter' option. The filters are based on the column numbers containing the corresponding quantum labels of the upper and lower states. For example, the vibrational quantum number in the NOname line list is given in column 10 (see Table 5). This column can be used to generate absorption cross sections of NO for the overtone band υ = 5, i.e. for transitions between υ′ = 5 and υ″ = 0, by referring in the input to the corresponding values from column 10 (see Manual for details). Another typical example is the generation of cross sections for specific electronic bands; see Fig. 10, where an overview of three electronic absorption bands of NaH (X-X, A-X and A-A) is shown (Rivlin et al. 2015).
The filter feature will work even if not all states are assigned. According to the ExoMol convention, the string NaN (with any combination of upper and lower case) is used for missing quantum labels. Thus 'NaN' is in this case effectively treated by the ExoCross filter as a quantum label.
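A column-based filter of this kind can be sketched in Python as below. The five-column States layout, the function names and the data are hypothetical and chosen only to keep the example short; in particular, the label sits in column 5 here rather than in column 10 as in the NOname line list. Labels are compared as strings, with every spelling of NaN mapped to one canonical label.

```python
def label(state_row, column):
    """Return the quantum label in the given 1-based column of a States row.
    Any upper/lower-case spelling of NaN is normalised to 'NaN'."""
    value = state_row.split()[column - 1]
    return "NaN" if value.lower() == "nan" else value

def filter_transitions(states, trans, column, upper, lower):
    """Keep transitions whose upper and lower states carry the given labels."""
    for n_f, n_i, a_fi in trans:
        if label(states[n_f], column) == upper and label(states[n_i], column) == lower:
            yield n_f, n_i, a_fi

# Made-up States rows: n, energy, g_tot, J, vibrational label.
states = {
    1: "1    0.0  2  0.5  0",
    2: "2 9000.0  2  0.5  5",
    3: "3 9100.0  4  1.5  NaN",
    4: "4 9200.0  4  1.5  nan",
}
trans = [(2, 1, 1.0), (3, 1, 2.0), (4, 1, 3.0), (4, 3, 4.0)]

overtone = list(filter_transitions(states, trans, 5, upper="5", lower="0"))
unassigned = list(filter_transitions(states, trans, 5, upper="NaN", lower="0"))
```

Treating NaN as an ordinary label means that unassigned states can be selected, or excluded, in exactly the same way as assigned ones.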

Units
The default units of ExoCross are listed in Table 7. Microns (µm) can optionally be used for wavelength as an alternative to wavenumber (the default). Pressure does not have designated units; it is assumed to have the same units as the parameter $P_0$ that defines the broadening parameter $\gamma$, see Eq. (18).
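For reference, the conversion between the two spectral units is $\lambda\,[\mu{\rm m}] = 10^4 / \tilde{\nu}\,[{\rm cm}^{-1}]$. A minimal Python sketch follows; the function names are hypothetical and not part of ExoCross.

```python
def wavenumber_to_micron(nu_cm):
    """Convert a wavenumber in cm^-1 to a wavelength in microns."""
    return 1.0e4 / nu_cm

def micron_to_wavenumber(lam_um):
    """Convert a wavelength in microns to a wavenumber in cm^-1."""
    return 1.0e4 / lam_um

lam = wavenumber_to_micron(1000.0)  # 1000 cm^-1 corresponds to 10 microns
nu = micron_to_wavenumber(2.0)      # 2 microns corresponds to 5000 cm^-1
```

Since the relation is reciprocal, a grid that is uniform in wavenumber is not uniform in wavelength, and vice versa.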

Memory handling
The program records and controls the memory used by all processors. For proper control, the user is requested to specify the memory available on the machine in Gb or Mb. This number is used, for example, to estimate the number of transitions from the .trans file that are processed simultaneously. At the end of the program a memory usage report is given.

Conclusion
We present a new Fortran program, ExoCross, to compute different spectroscopic properties of molecules using spectral line lists. The program has been actively used by ExoMol to generate absorption cross sections using the ExoMol line lists available at www.exomol.com. In order to work with the huge size of some line lists, ExoCross is optimized for efficient use of parallelism and vectorization. Our new Voigt algorithm (Vectorized Voigt) is designed to be fast and accurate.
The program can easily be extended by users with their own profiles or other functionality.
In the future we plan to add the production of k-coefficients to ExoCross, to integrate the API via the ExoMol .def file, to read the partition function from an ExoMol .pf file, and to implement a non-LTE model that does not require the definition of non-rigorous quantum numbers (see Section 2.11).