A hybrid line list for CH$_4$ and hot methane continuum

Molecular line lists are important for modelling absorption and emission processes in the atmospheres of different astronomical objects, such as cool stars and exoplanets. To be applicable at high temperatures, line lists for molecules like methane must contain billions of transitions, which makes their direct (line-by-line) application in radiative transfer calculations impracticable. Here we suggest a new, hybrid line list format to mitigate this problem, based on the idea of a temperature-dependent absorption continuum. The line list is partitioned into a large set of relatively weak lines and a small set of important, stronger lines. The weaker lines are then either used to construct a temperature-dependent (but pressure-independent) set of intensity cross sections or blended into a greatly reduced set of super-lines. The strong lines are kept in the form of temperature-independent Einstein A coefficients. A line list for methane is constructed as a combination of 17 million strong absorption lines, selected relative to the reference absorption spectra, and a background methane continuum in two temperature-dependent forms, cross sections and super-lines. This approach significantly eases the use of large high-temperature line lists, as the computationally expensive calculation of pressure-dependent profiles only needs to be performed for a relatively small number of lines. Both the line list and the cross sections were generated using a new 34 billion methane line list (34to10), which extends the 10to10 line list to higher temperatures (up to 2000 K). The new hybrid scheme can be applied to any large line list containing billions of transitions. We recommend using super-lines generated on a high-resolution grid based on a constant resolving power (R = 1,000,000) to model the molecular continuum as a more flexible alternative to the temperature-dependent cross sections.


Introduction
Methane is one of the key absorbers in the atmospheres of exoplanets and cool stars. Due to the large number of relatively strong lines (up to several billion) at high temperatures, the calculation of cross sections becomes extremely computationally expensive. The contribution of each line to the total absorption must be taken into account by summing the individual cross sections, usually computed using Voigt profiles, on a grid of wavelengths. To make radiative transfer calculations using these line lists more tractable, the line lists are usually converted into pre-computed tables of temperature- and pressure-dependent cross sections, or k-coefficients, for specific atmospheric conditions (temperature, pressure, broadeners) (Amundsen et al. 2014; Malik et al. 2017). Subsequent radiative transfer calculations interpolate in these tables. However, the calculation of these cross sections and k-coefficients still requires the contributions of all lines to be summed, if only once for each atmospheric condition. Both pre-tabulated cross sections and k-coefficients are less flexible than a line-by-line approach, but computationally more efficient.
As part of the ExoMol project (Tennyson & Yurchenko 2012) we have produced an extensive line list for methane ($^{12}$CH$_4$), called 10to10 (Tennyson & Yurchenko 2012), containing almost 10 billion transitions. The line list was constructed to describe the opacity of methane for temperatures up to 1500 K. The 10to10 line list has been shown to 2000 K. All other computational components (potential energy and dipole moment surfaces, basis sets etc.) are the same as in .
In order to mitigate the difficulty of using such an extremely large line list, we propose dividing it into two subsets, responsible for strong and weak absorptions. The first question is how to define and separate 'strong' and 'weak' transitions. Due to the large dynamic variation of the methane intensities, a single intensity threshold would not be optimal because of the following factors: (i) In regions of very strong bands many lines with moderate intensities are barely visible, while weak lines which lie between the main bands can be relatively important. (ii) The definition of 'strong' and 'weak' must be temperature dependent, as 'hot' bands, which are weak at low temperatures due to the Boltzmann factor, become stronger with increasing population of excited lower states at higher temperatures. (iii) At the same time the intensities of the fundamentals and overtones decrease with temperature due to the decrease of their relative population (e.g. due to the larger partition function). (iv) Finally, even relatively weak lines at longer wavelengths are very sensitive to pressure variations due to the relatively lower density of lines there. It is therefore necessary to take these factors into account when defining the intensity partitioning thresholds.
To aid building a 'strong'/'weak' partitioning, we introduced a reference CH$_4$ opacity $\alpha_{\rm ref}(\tilde{\nu})$ based on two temperatures, $T_1 = 300$ K and $T_2 = 2000$ K, and two pressures, $P_1 = 0$ bar and $P_2 = 50$ bar, on a wavenumber grid of $\Delta\tilde{\nu} = 1$ cm$^{-1}$ ($\tilde{\nu} = 0 \ldots 12000$ cm$^{-1}$) by choosing the maximum cross section value among these four at each wavenumber grid point $k$:

$$\alpha_{\rm ref}(\tilde{\nu}_k) = \max\left[\alpha(T_1,P_1;\tilde{\nu}_k),\ \alpha(T_1,P_2;\tilde{\nu}_k),\ \alpha(T_2,P_1;\tilde{\nu}_k),\ \alpha(T_2,P_2;\tilde{\nu}_k)\right]. \qquad (1)$$

The reference average intensities (cm/molecule) can then be defined as:

$$\bar{I}_{\tilde{\nu}_k} = \alpha_{\rm ref}(\tilde{\nu}_k)\,\Delta\tilde{\nu}. \qquad (2)$$

Figure 1 shows the reference cross section curve used here for the 34to10 line list. We then define the 'strong'/'weak' partitioning using two criteria, one dynamic and one static. Static: All lines stronger than the threshold $I_{\rm thr}$ are automatically taken into the 'strong' section (e.g. $I_{\rm thr} = 10^{-25}$ cm/molecule). Dynamic: The line $\tilde{\nu}_{fi}$ from the wavenumber bin $k$ ($\tilde{\nu}_{fi} \in [\tilde{\nu}_k - 0.5$ cm$^{-1}$, $\tilde{\nu}_k + 0.5$ cm$^{-1}]$) is 'strong' if all four reference absorption intensities are stronger than the reference (average) intensity $\bar{I}_{\tilde{\nu}_k}$ by some scaling factor $C_{\rm scale}$ (e.g. stronger than $10^{-5} \times \bar{I}_{\tilde{\nu}_k}$). The scaling factor $C_{\rm scale}$ is made wavenumber dependent using an exponential form (Eq. 3), also shown in Fig. 2.

This scaling is necessary to take into account the varying density of lines in different spectroscopic regions, which matters for the accurate description of the line profiles: the smaller number of lines at the longer wavelengths means the cross sections are more sensitive to the shape of the profiles as well as to the sampling of the grid points. At the shorter wavelengths the spectrum is smoothed out by the large number of overlapping lines and is therefore less sensitive to these factors. With this expression we thus assume a quasi-exponential increase of the density of lines vs wavenumber, or, colloquially, a quasi-exponential decrease of their importance.
Figure 3 and Figure 4 illustrate how these partitioning criteria affect the absorption cross sections and the size of the strong and weak line partitions, respectively, using a constant scale factor $C_{\rm scale}$ for simplicity. For example, the combination ($C_{\rm scale} = 10^{-2}$, $I_{\rm thr} = 10^{-23}$ cm/molecule), with $C_{\rm scale}$ constant, leads to 262,470 lines. Using the scale factor $C_{\rm scale} = 10^{-5}$ increases the number of strong lines by an order of magnitude. For example, for the partitioning ($10^{-5}$, $10^{-21}$) we obtain 125 million strong lines. The dynamic partitioning defined by Eq. 3 in combination with $I_{\rm thr} = 10^{-23}$ cm/molecule is also shown in Fig. 4 as a large triangle. This partitioning is also our preferred choice, used in the following discussions as well as to construct the hybrid line list presented in this work. It results in 17 million selected lines (16,776,857) as part of the strong section, out of the original $3.4 \times 10^{10}$. This is a significant reduction and should ease line-by-line calculations significantly. The remaining lines are converted into temperature-dependent histograms (super-lines) and/or cross sections to form our methane quasi-continuum, which is described below. By comparison, the HITRAN 2012 (Rothman et al. 2013) database contains 336,830 $^{12}$CH$_4$ transitions.
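The two partitioning criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the scale factor is taken as a constant (the exponential form of Eq. 3 is not reproduced), and the choice of which reference intensities enter the static and dynamic tests is our reading of the criteria rather than the actual implementation.

```python
import math

def reference_intensities(cross_sections, dnu=1.0):
    """Pointwise maximum over the reference cross sections (one list per
    (T, P) pair), multiplied by the bin width to give cm/molecule."""
    return [max(column) * dnu for column in zip(*cross_sections)]

def is_strong(nu, intensities, ref_int, c_scale, i_thr=1e-23, nu0=0.0, dnu=1.0):
    """Apply the static and dynamic criteria to a single line.

    `intensities` holds the line intensity (cm/molecule) at each reference
    condition; using max() for the static test and min() for the dynamic
    test is an assumption of this sketch.
    """
    # Index of the bin [nu_k - dnu/2, nu_k + dnu/2) that contains the line.
    k = int(math.floor((nu - nu0) / dnu + 0.5))
    static = max(intensities) > i_thr
    dynamic = min(intensities) > c_scale * ref_int[k]
    return static or dynamic

# Toy reference opacities on a two-point grid (values are made up).
ref = reference_intensities([[1e-20, 1e-24], [2e-20, 5e-25]])
print(is_strong(1.1, [1e-25], ref, c_scale=1e-2))
```

A line that fails the static threshold can still be kept if it is strong relative to its local reference intensity, which is what lets weak lines between the main bands survive the cut.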

Quasi-continuum from the Doppler line profile
The main difficulty associated with modelling cross sections (i.e. dressing lines with appropriate absorption profiles) is the pressure effect, requiring line shapes to be described using Lorentzian profiles (high pressure), Voigt profiles (moderate to high pressure) or even more sophisticated profiles. The Doppler profile (zero pressure), however, is much simpler: it is fast to compute, with a simple parametrisation of the line width (mass and frequency dependent only), and no dependence on the transition quantum numbers, mixing ratios of broadeners etc. (Amundsen et al. 2014).
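To make the simplicity of the Doppler parametrisation concrete, the following minimal sketch evaluates the standard Doppler half-width at half-maximum and the corresponding area-normalised Gaussian profile. The function names are hypothetical, and the molecular mass of 16.0313 u for $^{12}$CH$_4$ is supplied here as an assumption rather than taken from the text.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
AMU = 1.66053906660e-27  # atomic mass unit, kg
C_M = 2.99792458e8       # speed of light, m/s

def doppler_hwhm(nu, temp, mass_amu=16.0313):
    """Doppler HWHM (same units as nu) for a line at wavenumber nu."""
    return nu * math.sqrt(2.0 * math.log(2.0) * K_B * temp / (mass_amu * AMU)) / C_M

def doppler_profile(nu, nu0, hwhm):
    """Gaussian profile centred at nu0, normalised to unit area."""
    x = (nu - nu0) / hwhm
    return math.sqrt(math.log(2.0) / math.pi) / hwhm * math.exp(-math.log(2.0) * x * x)

# Half-width of a methane line at 1000 cm^-1 and 2000 K.
print(doppler_hwhm(1000.0, 2000.0))
```

Under these assumptions the half-width at 1000 cm$^{-1}$ and 2000 K is close to 0.004 cm$^{-1}$ and scales linearly with wavenumber, so the lines become narrower than a 0.01 cm$^{-1}$ grid spacing towards long wavelengths.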
We will assume that the 'weak'-lines quasi-continuum forms a nearly featureless background that is not very sensitive to the variation of pressure (at least for moderate pressures). This means the exact shape of the lines that form this quasi-continuum is relatively unimportant and can be modelled using a pressure-independent, temperature-dependent profile. Basically, our assumption is that any realistic line profile is applicable as long as it preserves the area, i.e. the frequency-integrated cross section, of each line. In order to illustrate this approach, we show in Figure 5 the quasi-continuum cross section from the 'weak' lines. The cross section was computed at 2000 K using the ExoCross code as described by Hill et al. (2013) for our selected partitioning using a Doppler line profile.

Fig. 5. Upper display: Methane continuum at 2000 K, $P = 10$ bar (blue) and the total absorption (red). The lower display shows the relative differences of the $P = 0$ and $P = 10$ bar continuum cross sections for the three wavenumber grids of $\Delta\tilde{\nu} = 0.01$ cm$^{-1}$ (red), 0.1 cm$^{-1}$ (blue) and 1 cm$^{-1}$ (grey).
In order to benchmark the zero-pressure Doppler-based model of the continuum absorption we also computed the corresponding cross sections using the Voigt line profile at $P = 10$ bar, $T = 2000$ K. We use the simple ExoMol pressure-broadening diet of Barton et al. (2017) to describe the Voigt broadening of CH$_4$ lines by 100 % H$_2$. The $J$ dependence of the pressure-broadened half-width, $\gamma$, is similar to that used by Amundsen et al. (2014), and the temperature-dependence exponent, $n$, is assumed to be a constant. The broadening model is provided as part of the supplementary material to this paper. A grid spacing of $\Delta\tilde{\nu} = 0.01$ cm$^{-1}$ was chosen. Figure 5 (bottom display) also shows the relative difference between the Doppler-based continuum ($P = 0$) and the realistic $P = 10$ bar continuum (Voigt) on three grids of 0.01, 0.1 and 1 cm$^{-1}$ at $T = 2000$ K. On the 0.01 cm$^{-1}$ grid the error fluctuates within 2-8 %. Here the relative difference of cross sections is defined as

$$\frac{\Delta\alpha}{\alpha} = \frac{\alpha_{P=0} - \alpha_{P}}{\alpha^{\rm Tot}_{P}}, \qquad (4)$$

where $\alpha_{P=0}$, $\alpha_{P}$ and $\alpha^{\rm Tot}_{P}$ are the $P = 0$ (Doppler) continuum, the $P \neq 0$ continuum (Voigt) and the $P \neq 0$ total cross section, respectively. The largest error is for the long wavelength region, characterized by the weakest intensities and the lowest density of lines. In this region the Doppler-broadened lines become increasingly narrow, which makes the cross section very sensitive to the grid sampling used. The best agreement is in the spectral regions with large cross sections and at short wavelengths, where the density is highest. Using coarser grids of 0.1 or 1 cm$^{-1}$ drops the fluctuations to within 4 and 1.5 %, respectively. The total integrated difference should be zero by definition since the area of the Voigt profile is conserved (subject to the numerical error). However, we note that unless the background lines are optically thin the resulting integrated flux will not be conserved.

Fig. 6. Comparison of the $P = 0$ and $P = 10$ bar cross sections for 300 K (left) and 2000 K (right): black (total $P = 0$), blue (continuum $P = 0$) and red (continuum $P = 10$). The middle panels are a blow up of the continuum, also for $P = 0$ and $P = 10$ bar, which are almost indistinguishable on the upper panels. The lower row shows the relative difference between the $P = 0$ and $P = 10$ bar continuum cross sections as defined in Eq. (4). The integrated area of the relative difference is 0.06 % over the region 6700-6750 cm$^{-1}$. A wavenumber grid of $\Delta\tilde{\nu} = 0.01$ cm$^{-1}$ was used.
A more detailed example of the $P = 0$ and $P = 10$ bar cross sections for the region 6000-7000 cm$^{-1}$ is shown in Fig. 6 for 300 K (left) and 2000 K (right). Even on the very small scale (see the zoom-in in the middle panels of this figure) the $P = 0$ and $P = 10$ bar continuum cross sections are almost identical: the difference between the two continuum curves ($P = 0$ and $P = 10$ bar) is barely visible. The bottom panels of Fig. 6 show absolute relative differences $|\Delta\alpha(\tilde{\nu})|/\alpha(\tilde{\nu})$ between these two cross sections. For our partitioning a 1-2 % accuracy (measured as the relative difference between these two profiles) is achieved for this region. In fact, the difference is not systematic, therefore the integrated effect should be even smaller. For example, integration of the relative difference $\Delta\alpha(\tilde{\nu})$ for $T = 2000$ K in the region 6700-6800 cm$^{-1}$ gives an error of only 0.004 % using the grid spacing of $\Delta\tilde{\nu} = 0.01$ cm$^{-1}$. The fluctuations for $T = 300$ K between the high and zero pressure cases are slightly higher, but still within approximately 1-2 %. The integrated relative difference in this case is about 0.06 % (6700-6750 cm$^{-1}$, see Fig. 6).
The corresponding line shapes are very different at these temperatures and pressures. The total P = 0 and P = 10 bar cross sections have very different profiles as also illustrated in Fig. 7. However the difference between continuum curves is negligible (see also Fig. 6).

Super-line approach
In this section we consider temperature-dependent lists of super-lines (Rey et al. 2016), which present a more flexible alternative to the Doppler-broadened continuum in terms of the line-profile modelling. The super-lines are constructed as temperature-dependent intensity histograms as follows (see also the detailed instructions in Rey et al. 2016). The wavenumber range $[\tilde{\nu}_A, \tilde{\nu}_B]$ is divided into $N$ frequency bins, each centered around a grid point $\tilde{\nu}_k$. Here we assume a general case of non-equidistant grids with variable widths $\Delta\tilde{\nu}_k$. For each $\tilde{\nu}_k$ the total absorption intensity $I_k(T)$ is computed as a sum of absorption line intensities $I_{if}$ from all $i \to f$ transitions falling into the wavenumber bin $[\tilde{\nu}_k - \Delta\tilde{\nu}_k/2 \ldots \tilde{\nu}_k + \Delta\tilde{\nu}_k/2]$ at the given temperature $T$:

$$I_{if}(T) = \frac{g_{\rm ns}(2J'+1)\,A_{if}}{8\pi c\,\tilde{\nu}_{if}^{2}}\,\frac{e^{-c_2\tilde{E}''/T}\left(1 - e^{-c_2\tilde{\nu}_{if}/T}\right)}{Q(T)}.$$

Here $A_{if}$ is the Einstein A coefficient (s$^{-1}$), $c$ is the speed of light (cm s$^{-1}$), $Q(T)$ is the partition function, $\tilde{E}''$ is the lower state term value (cm$^{-1}$), $c_2$ is the second radiation constant (K cm), $g_{\rm ns}$ is the nuclear statistical weight, $J'$ is the rotational angular momentum quantum number of the upper state and $I_{if}$ is the line intensity or absorption coefficient (cm$^2$/molecule cm$^{-1}$). Each grid point $\tilde{\nu}_k$ is then treated as a line position of an artificial transition (super-line) with an effective absorption intensity $I_k(T)$. The 'super'-line lists are then formed as catalogues of these artificial transitions $\{\tilde{\nu}_k, I_k(T)\}$ with pre-computed intensities $I_k$. This can be compared to the temperature-independent ExoMol-type $\{\tilde{\nu}_{if}, A_{if}\}$ or temperature-dependent HITRAN-type $\{\tilde{\nu}_{if}, I_{if}(T)\}$ line lists.

As in the case of the conventional line lists, the super-lines can be used in line-by-line modelling of absorption cross sections, which significantly reduces the computational costs. Indeed, each super-line can be dressed with the corresponding line profile to generate actual cross sections for the corresponding $T$ and any given pressure broadening, provided that these line profiles depend only on the line positions and temperature, and not on (for example) the quantum numbers. In fact, the main disadvantage of the histograms is that they lose any information on the upper and lower states, including the quantum numbers. This information is important when dealing with pressure-dependent line profiles, which often show strong variation with quantum numbers, particularly $J$. One can still assume, however, that the continuum is nearly featureless and thus not very sensitive to the dependence of the line profiles on the quantum numbers of the upper or lower states.
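The construction of a super-line list can be sketched as follows. This is a minimal illustration, not the ExoCross implementation: it assumes an equidistant grid for brevity (the text allows variable bin widths), uses the standard ExoMol expression for the line intensity in terms of the Einstein A coefficient, and the function names and the tuple layout of `lines` are hypothetical.

```python
import math

C2 = 1.438776877      # second radiation constant, K cm
C_CM = 2.99792458e10  # speed of light, cm/s

def line_intensity(a_if, nu, e_low, g_ns, j_up, temp, q):
    """Absorption line intensity (cm/molecule) from the Einstein A coefficient."""
    boltz = math.exp(-C2 * e_low / temp) * (1.0 - math.exp(-C2 * nu / temp))
    return g_ns * (2 * j_up + 1) * a_if / (8.0 * math.pi * C_CM * nu ** 2) * boltz / q

def super_lines(lines, temp, q, nu_a, dnu, n_bins):
    """Sum line intensities into bins centred at nu_a + k * dnu.

    `lines` is an iterable of (nu, a_if, e_low, g_ns, j_up) tuples.
    Returns a list of (grid point, summed intensity) pairs.
    """
    hist = [0.0] * n_bins
    for nu, a_if, e_low, g_ns, j_up in lines:
        k = int(math.floor((nu - nu_a) / dnu + 0.5))
        if 0 <= k < n_bins:  # lines outside the grid are ignored
            hist[k] += line_intensity(a_if, nu, e_low, g_ns, j_up, temp, q)
    return [(nu_a + k * dnu, hist[k]) for k in range(n_bins)]
```

Because each line contributes its full intensity to exactly one bin, the summed intensity of the super-lines equals that of the original lines inside the grid; only the line position within the bin and the state labels are lost.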
In order to illustrate the applicability of this approximation, in Figure 8 we show the error of the methane continuum at $T = 2000$ K and $P = 10$ bar as the difference between two cross sections: (i) obtained using the $J$-dependent Voigt-profile model by Barton et al. (2017) and (ii) obtained using constant Voigt parameters, relative to the total methane cross sections at these values of $T$ and $P$. The error is within 0.05 % for most of the frequency range and not larger than 0.1 %. Another artifact of the histogram method (apart from the limited profile description) is the error of the line position within a bin. Therefore, the smaller the bin, the better the accuracy of the super-line list.
The important advantage of histograms is that they are very robust and efficient for computing cross sections: the number of super-lines is defined by the density of the wavenumber grid and is therefore much smaller (at least for methane) than the number of original lines. For example, with the 0.01 cm$^{-1}$ grid spacing, the size of a histogram at a given $T$ is only 1,200,000 grid points (super-lines) for our line list coverage (< 12,000 cm$^{-1}$), which is much smaller than the original 34 billion lines. Even for a more sophisticated four-grid model suggested by Tennyson et al. (2016) ($\Delta\tilde{\nu} = 10^{-5}$ cm$^{-1}$ for 10-100 cm$^{-1}$, $10^{-4}$ cm$^{-1}$ for 100-1,000 cm$^{-1}$, 0.001 cm$^{-1}$ for 1,000-10,000 cm$^{-1}$ and 0.01 cm$^{-1}$ for > 10,000 cm$^{-1}$) we obtain 28,200,000 super-lines, which also should not be a problem for practical line-by-line applications. Since the long wavelength region is always more demanding in terms of accuracy, such dynamic grids are more accurate. In the following we also propose another dynamic grid based on a constant resolving power, $R$.

Fig. 8. Relative error from using $J$-independent line broadening to describe the methane continuum at high temperature ($T = 2000$ K) and pressure ($P = 10$ bar) as the difference between two cross sections ($J$-dependent a0 model vs $J$-independent model) relative to the total cross sections. The wavenumber grid of $\Delta\tilde{\nu} = 0.1$ cm$^{-1}$ is used.
In order to benchmark the super-line approach we have computed three sets of histograms for $T = 2000$ K representing the continuum of methane (i.e. from the 'weak' lines only) using the following grid models: histogram I has a constant grid spacing of 0.01 cm$^{-1}$ (1,200,000 points); histogram II consists of the four sub-grids proposed by Tennyson et al. (2016) (28 million points); histogram III is constructed to have a constant resolving power $R$ of 1,000,000 (7,090,081 points). The constant $R$-grid can be defined to have variable grid spacings as given by

$$\Delta\tilde{\nu}_k = \frac{\tilde{\nu}_k}{R}.$$

Thus the wavenumber grid point $\tilde{\nu}_k$ ($k = 0 \ldots N(R)$) is given by:

$$\tilde{\nu}_k = \tilde{\nu}_A\,a^{k},$$

where $a = (R + 1)/R$ and $\tilde{\nu}_A = \tilde{\nu}_0$ is the left-most wavenumber grid point (cm$^{-1}$). The total number of bins, $N(R)$, is given by:

$$N(R) = \frac{\log(\tilde{\nu}_B/\tilde{\nu}_A)}{\log a},$$

where $\tilde{\nu}_B = \tilde{\nu}_N$ is the right-most grid point and $N(R) + 1$ is the total number of the grid points.

Histograms I, II and III were used to generate the continuum cross sections of CH$_4$ at $P = 10$ bar. Here we assumed the Voigt profile with constant parameters ($\gamma_0 = 0.051$ cm$^{-1}$, $n = 0.44$, $T_0 = 298$ K and $P_0 = 1$ bar) and used the grid with $\Delta\tilde{\nu} = 0.01$ cm$^{-1}$. These cross sections were then compared to the corresponding continuum cross sections ($T = 2000$ K, $P = 10$ bar) computed line-by-line directly from the 34to10 line list. All histogram models show very similar, almost identical deviations, well below 0.1 % for most of the range. Figure 9 illustrates the relative errors obtained for the $R$ = 1,000,000 histogram model.

Now we turn to the case of pure Doppler broadening ($P = 0$ bar, $T = 2000$ K, grid spacing $\Delta\tilde{\nu} = 0.01$ cm$^{-1}$), where the lines are sharper and narrower, such that the line width may become comparable to or even smaller than the grid spacing. Figure 10 illustrates the errors for the same three histogram models. The uniform histogram I of 0.01 cm$^{-1}$ (1,200,000 points) exhibits the largest errors, reaching about 4-5 % in the low frequency region.
Clearly, $\Delta\tilde{\nu} = 0.01$ cm$^{-1}$ is too coarse for the super-line approach to describe the low frequency range in the zero pressure case; we therefore recommend using grids with more points (lines) in the region below 1000 cm$^{-1}$. For the denser histograms II and III the error drops to <0.2-0.5 %. Histogram III (resolving power $R$ = 1,000,000) shows a more even error distribution.

Fig. 9. Relative errors using the histogram model $R$ = 1,000,000 to describe the methane continuum at $T = 2000$ K and $P = 10$ bar as the difference with the 34to10 cross sections (Voigt model) relative to the total 34to10 cross sections. The wavenumber grid of $\Delta\tilde{\nu} = 0.01$ cm$^{-1}$ is used.

Fig. 10. Relative error from the histogram model for three different grids to describe the methane continuum at $T = 2000$ K and $P = 0$ bar as the difference with the 34to10 cross sections (pure Doppler model) relative to the total 34to10 cross sections at $P = 0$ bar. The wavenumber grid of $\Delta\tilde{\nu} = 0.01$ cm$^{-1}$ is used.
A similar comparison for $T = 300$ K showed even better agreement, with errors about an order of magnitude smaller than those found for $T = 2000$ K. Using a coarser grid to simulate cross sections (e.g. $\Delta\tilde{\nu} = 0.1$ cm$^{-1}$) also drops the errors by an order of magnitude.
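The constant resolving power grid is straightforward to generate. The sketch below assumes a geometric grid $\tilde{\nu}_k = \tilde{\nu}_A a^k$ with $a = (R+1)/R$; the function names are hypothetical, and the left edge of 10 cm$^{-1}$ used in the example is our assumption (a geometric grid cannot start at zero), chosen because it reproduces the quoted number of grid points.

```python
import math

def n_bins(nu_a, nu_b, resolving_power):
    """Number of bins N(R) of a constant-R grid between nu_a and nu_b."""
    # log1p keeps full precision for log(1 + 1/R) when R is large.
    return int(math.floor(math.log(nu_b / nu_a) / math.log1p(1.0 / resolving_power)))

def grid_point(k, nu_a, resolving_power):
    """k-th grid point of the geometric grid, nu_a * a**k with a = (R + 1) / R."""
    return nu_a * ((resolving_power + 1.0) / resolving_power) ** k

n = n_bins(10.0, 12000.0, 1.0e6)
print(n + 1)  # number of grid points, N(R) + 1
```

With these assumed limits the grid has 7,090,080 bins, i.e. 7,090,081 points, and the spacing grows from $10^{-5}$ cm$^{-1}$ at 10 cm$^{-1}$ to 0.012 cm$^{-1}$ at 12,000 cm$^{-1}$, following the linear growth of the Doppler width with wavenumber.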
For super-lines it is obviously important that the underlying grid spacing is not too large compared to the line width. This is illustrated in Fig. 11, which shows the $P = 0$, $T = 2000$ K continuum cross sections modelled using the $R$ = 100,000 histogram with the Doppler profile. It is clear that the Doppler line width is smaller than the separation between the super-lines, which leads to strong oscillations. In fact, the same histogram performs well in the case of the much broader lines when modelling $P = 10$ bar (Fig. 8).

In order to estimate the impact of the errors in the continuum models in actual atmospheric radiative transfer and retrieval calculations, we have calculated the transmission $\mathcal{T}$ and the relative error in the transmission $\Delta\mathcal{T}/\mathcal{T}_{\rm corr}$ from the continuum models, where

$$\mathcal{T} = e^{-\alpha(\tilde{\nu})\,u},$$

$\alpha(\tilde{\nu})$ is the total cross section, $u$ is the column amount and $\mathcal{T}_{\rm corr}$ is the correct transmission calculated from the direct line-by-line evaluation of the 34to10 line list using the a0 Voigt model (Barton et al. 2017). We show the transmissions and errors in Figure 12, obtained using both continuum models with column amounts ranging from $10^{19}$ to $10^{24}$ molecule/cm$^2$ at $T = 2000$ K, and $P = 0$ and $P = 10$ bar. The histogram model performs extremely well for the high pressure case (lower display), with errors within 1 %, significantly better than the Doppler-grid model (upper display). The errors in the histogram model at zero pressure are higher due to the very narrow lines at small wavenumbers (middle display), but should be acceptable for most applications (within 5 %). If higher accuracy is required, the histogram resolution should be increased.
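The way a cross section error propagates into the transmission can be illustrated with a short sketch. The function names are hypothetical, and the sign convention of the error (model minus reference) is an assumption of this example.

```python
import math

def transmission(alpha, column):
    """Transmission exp(-alpha * u) for cross section alpha (cm^2/molecule)
    and column amount u (molecule/cm^2)."""
    return math.exp(-alpha * column)

def relative_transmission_error(alpha_model, alpha_ref, column):
    """(T_model - T_ref) / T_ref for a given column amount."""
    t_ref = transmission(alpha_ref, column)
    return (transmission(alpha_model, column) - t_ref) / t_ref

# A 1 % overestimate of the cross section at unit optical depth (made-up values).
print(relative_transmission_error(1.01e-21, 1.0e-21, 1.0e21))
```

The relative error equals $e^{-\Delta\alpha\,u} - 1$, so a fixed error in the cross section matters more as the column amount grows, which is why a range of column amounts is needed to assess the continuum models.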

Hybrid line list and temperature dependent continuum cross sections
Our partitioning of the total 34,170,582,862 lines in our new 34to10 line list leads to 16,776,857 strong and 34,153,806,005 weak lines. The latter were used to generate (i) temperature-dependent continuum cross sections (Doppler-broadened) and (ii) temperature-dependent histograms of super-lines for the following set of temperatures: 296 K, 400 K, 500 K, 600 K, 700 K, 800 K, 900 K, 1000 K, 1100 K, 1200 K, 1300 K, 1400 K, 1500 K, 1600 K, 1700 K, 1800 K, 1900 K, and 2000 K. A wavenumber grid with constant $R$ = 1,000,000 consisting of 7,090,081 points (super-lines) was adopted for the total range of 0-12000 cm$^{-1}$. The remaining 16,776,857 strong lines together with the .states file containing 8,194,057 energies form a line list in the standard ExoMol format. The super-lines are stored in a two-column format with the wavenumbers (cm$^{-1}$) and absorption coefficients (cm/molecule), which is the same as the format used for the ExoMol cross sections. Thus the histogram format does not require any information on the upper/lower states, temperature, partition function, or statistical weights; only the line profile specifications are needed. The line broadening can only depend on the wavenumber. The hybrid line list is given as supplementary material to this paper via the CDS database http://cdsarc.u-strasbg.fr and can also be found on the ExoMol website www.exomol.com. We also include the Voigt model used in the simulations of cross sections.
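A rough sketch of how such a two-column super-line file might be used is given below. Several simplifications are assumed: the exact file layout (whitespace-separated columns) is a guess, a pure Lorentzian is substituted for the Voigt profile to keep the example dependency-free, and the half-width is scaled with the usual $\gamma = \gamma_0 (T_0/T)^n (P/P_0)$ form using the constant parameters quoted earlier. All function names are hypothetical.

```python
import math

def lorentz_hwhm(temp, pressure, gamma0=0.051, n=0.44, t0=298.0, p0=1.0):
    """Pressure-broadened HWHM (cm^-1): gamma0 * (T0 / T)**n * (P / P0)."""
    return gamma0 * (t0 / temp) ** n * (pressure / p0)

def parse_super_lines(text):
    """Read (wavenumber, intensity) pairs from two-column text."""
    rows = []
    for row in text.splitlines():
        parts = row.split()
        if len(parts) == 2:  # skip blank or malformed rows
            rows.append((float(parts[0]), float(parts[1])))
    return rows

def cross_section(nu, super_lines, gamma):
    """Cross section (cm^2/molecule) at nu from Lorentzian-dressed super-lines."""
    return sum(i_k * gamma / math.pi / ((nu - nu_k) ** 2 + gamma ** 2)
               for nu_k, i_k in super_lines)

gamma = lorentz_hwhm(2000.0, 10.0)
print(gamma)
```

For $T = 2000$ K and $P = 10$ bar this gives a half-width of about 0.22 cm$^{-1}$, far wider than the super-line spacing of the $R$ = 1,000,000 grid, which is consistent with the small histogram errors reported for the high pressure case. A production calculation would sum only over super-lines within a cut-off distance of each grid point rather than over the whole list.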

Conclusion
We have extended our previous 10to10 methane line list to higher temperatures, the result of which is a new line list containing 34 billion transitions. Line lists of this size are impractical to work with as the calculation of cross sections becomes extremely computationally expensive. We have therefore explored the idea of partitioning this line list into a relatively small subset of strong lines, which are retained and fully treated in any cross section calculation, augmented by a temperature-dependent quasi-continuum which represents the contribution of the remaining lines. A key assumption is that this quasi-continuum is essentially featureless and not very sensitive to the variation of the pressure-broadened line shape. The strong lines are selected such that they retain the flexibility required to describe the variation of the shape of the methane absorption with pressure. Two $P$-independent models were tested to represent the continuum built from the weak lines: Doppler-broadened cross sections and super-lines. For the Doppler-broadened scheme, the assumption is that the methane continuum does not strongly depend on pressure and can be modelled using the pressure-independent line profiles (Doppler). The error of this approach on dense grids (0.01 cm$^{-1}$) ranges from within 8 % for long wavelengths down to within 3 % above 1 µm. The coarser grid of 0.1 cm$^{-1}$ gives errors within 2 %.

Fig. 12. Transmissions computed using the Doppler (upper display) and histogram ($R$ = 1,000,000, middle and lower displays) continuum models with relative errors for the column amounts $10^{19}$, $10^{20}$, $10^{21}$, $10^{22}$, $10^{23}$, and $10^{24}$ molecule/cm$^2$ at $T = 2000$ K, $P = 0$ and $P = 10$ bar. The upper part of each display shows the total transmission obtained from both the strong and weak lines at this temperature, while the lower part shows the relative error compared to the direct line-by-line evaluation from the 34to10 line list using the a0 Voigt model (Barton et al. 2017). Errors in regions with very small transmissions (< $10^{-4}$) are removed as the medium is optically thick.
The super-lines approach is more flexible as it allows the continuum to depend on pressure. The variation with pressure, however, should not depend on the upper or lower states, only on the line position. For this model we also introduced the dynamic grid representation with a constant resolution, with the grid spacing changing as a function of the wavenumber to keep $R = \tilde{\nu}/\Delta\tilde{\nu}$ the same. Each grid point in this histogram model, containing the total absorption within the $\Delta\tilde{\nu}$ bin, is then used as a super-line. We find that the super-lines built as histograms on a high-resolution adaptive grid are more accurate for absorption modelling, and they are therefore put forward as the ExoMol standard. The typical errors even for dense grids are within 1 %. With our selected partitioning we retain 17 million strong lines for our strong line list and computed a set of histograms containing 7,090,081 points (super-lines) for a set of 18 temperatures using a dynamic wavenumber grid with a resolution of $R$ = 1,000,000. The strong lines are given as an ExoMol line list while the continuum histograms are presented using the ExoMol format developed for cross sections. We recommend using this hybrid line list based on the super-line approach for line-by-line atmospheric modelling of methane absorption. For low pressures and long wavelengths, the resolution might need to be increased to higher than 1,000,000 due to the very narrow Doppler-broadened lines and their low density in this region.
The integrated errors of cross sections over an extended frequency range (significantly larger than the linewidth) are found to be vanishingly small if the line profiles used preserve the area (subject to numerical accuracy). That is, for optically thin atmospheres both continuum models will guarantee the exact answer for the integrated opacities. We have also shown that even in the case of realistic, not optically thin media, the super-line approach leads to very small errors in the transmission.