Helioseismological determination of the subsurface spatial spectrum of solar convection: Demonstration using numerical simulations

Understanding convection is important in stellar physics, for example as an input in stellar evolution models. Helioseismic estimates of convective flow amplitudes in deeper regions of the solar interior disagree by orders of magnitude among themselves and with simulations. We aim to assess the validity of an existing upper limit of solar convective flow amplitudes at a depth of 0.96 solar radii obtained using time-distance helioseismology and several simplifying assumptions. We generated synthetic observations for convective flow fields from a magnetohydrodynamic simulation (MURaM) using travel-time sensitivity functions and a noise model. We compared the estimates of the flow with the actual values. For the scales of interest ($\ell<100$), we find that the current procedure for obtaining an upper limit gives the correct order of magnitude of the flow for the given flow fields. We also show that this estimate is not an upper limit in a strict sense because it underestimates the flow amplitude at the largest scales by a factor of about two because the scale dependence of the signal-to-noise ratio has to be taken into account. After correcting for this and after taking the dependence of the measurements on direction in Fourier space into account, we show that the obtained estimate is indeed an upper limit. We conclude that time-distance helioseismology is able to correctly estimate the order of magnitude (or an upper limit) of solar convective flows in the deeper interior when the vertical correlation function of the different flow components is known and the scale dependence of the signal-to-noise ratio is taken into account. We suggest that future work should include information from different target depths to better separate the effect of near-surface flows from those at greater depths. The measurements are sensitive to all three flow directions, which should be taken into account.


Introduction
Helioseismic inferences of convective motions in the solar interior yield consistent results in near-surface layers using different techniques (e.g., De Rosa et al. 2000;Braun & Lindsey 2003;Hindman et al. 2004;Gizon & Birch 2004;Langfellner et al. 2014Langfellner et al. , 2015. One important statistical property of convective motions is their spatial power spectrum, which quantifies the scale-dependent amplitude of the velocities. Inferences of the power spectrum of convective motions in deeper regions of the solar interior have led to conclusions that differ by more than an order of magnitude (Hanasoge et al. 2012Greer et al. 2015). Understanding these differences and obtaining reliable estimates of the spectrum of convective motions in the Sun is important for several processes in stellar interiors, such as the maintenance of differential rotation, and for understanding the role of convection in the emergence of active regions. This is even more urgent because convective velocities from numerical simulations of solar convection exhibit largescale convective motions that are far too strong, even compared to surface observations (Gizon & Birch 2012;Lord et al. 2014). This last issue is often referred to as the convective conundrum.
To date, the only methods that have been used to constrain the spectrum of convective motions in the deeper solar interior are time-distance helioseismology (Hanasoge et al. 2010(Hanasoge et al. , 2012 and ring-diagram analysis (Greer et al. 2015). As helioseismic techniques, both infer solar interior flows from surface observations of stochastically excited sound waves and surface-gravity waves, which propagate through the interior. To do so, timedistance helioseismology (Duvall et al. 1993) uses travel times of waves that travel different distances and hence probe different depths. In a ring-diagram analysis (Hill 1988), observations of the local power spectrum of the surface oscillations are used and the Doppler-shifted frequency of the seismic waves is measured by way of a background flow field. Gizon & Birch (2005) and Gizon et al. (2010) reviewed these local helioseismic techniques, and Hanasoge et al. (2016) reviewed their application to probing interior convection. In addition, global helioseismology can be used to infer properties of time-dependent flows such as convection (e.g., Roth & Stix 2003;Woodard 2016;Mani & Hanasoge 2020). This method relies on global properties of the oscillations such as mode frequencies or amplitudes. Time-dependent flows result in mode coupling, which can be measured in global observations and be used to reveal flow properties.
For the case of time-distance helioseismology, Hanasoge et al. (2010Hanasoge et al. ( , 2012 estimated an upper limit of convective motions based on several simplifying assumptions, notably that the signal-to-noise ratio of the helioseismic measurements is independent of the horizontal spatial scale and that the signal in travel-time measurements is predominantly due to the flow in one direction at or near a targeted depth. We here therefore aim to assess the validity of the method suggested by HDS2012 to infer an upper limit on convective flow amplitudes in the solar interior. To do this, we generated a number of realizations of synthetic travel-time maps using background flow fields from existing numerical magnetohydrodynamic (MHD) simulations of quiet-Sun convection (Lord et al. 2014). As we know the input flow, we can assess the performance of the resulting estimate and the underlying assumptions. To generate the synthetic observations, we employed a forward model of the signal and a noise model. In the forward model, the signal in the travel-time maps is computed using the first Born approximation (Böning et al. 2016), which takes singlescattering perturbations to the wave field into account (Gizon & Birch 2002). We obtain realizations of the stochastic noise from a Gaussian noise model in Fourier space (Gizon & Birch 2004;Fournier et al. 2014), which is based on the assumption of a large number of random wave sources homogeneously distributed in space and time. These are well-tested methods used in the community (e.g., Böning et al. 2017;Gizon et al. 2020).

Methods
We consider the problem of estimating subsurface convective flow amplitudes from simulations in Cartesian geometry. We assume that the statistical properties of the flow do not depend on location within a horizontal layer.
Clearly, the use of Cartesian geometry is a simplification, especially when the interest is on larger horizontal scales. The main effects of this simplification are that (i) the resolution in Fourier space is reduced compared to covering the entire sphere, and (ii) effects of the spherical geometry such as rotation and curvature on the flow are not modeled. These are important shortcomings that should be addressed in future studies. However, given the order-of-magnitude difference between HDS2012 and Greer et al. (2015), our main interest in this work is to verify whether one of the proposed methods (HDS2012) is working at all, and potentially, to give clues on the cause of the differences.
In addition, the essential problem that convective flow amplitudes are larger in simulations than in observations is also present for Cartesian simulations of convective flows (Lord et al. 2014, see the comparison to surface observations in Fig. 5). In Cartesian geometry, more realistic simulations of convection are available than in spherical geometry because radiative transfer can be better included and smaller turbulent scales can be resolved (Kupka & Muthsam 2017).
In this section, we first summarize the methods that we used to model the method of HDS2012 and to create synthetic data. We then summarize the assumptions that were made by HDS2012 to obtain an upper limit on the spectrum of convection to be able to study the validity of these assumptions in the following sections.

Convective flow fields from MURaM
As input for generating synthetic observations, we used 66 consecutive snapshots of MHD simulations of solar convection. These simulations were obtained by Lord (2014) using the MURaM code (Vögler et al. 2005) with the vertical magnetic flux fixed to zero. The details for these simulations are given in Lord (2014, see the case "zero net flux" in Tables 2.1 and 5.1, as well as Sections 2 and 5) 1 . Results for the purely hydrodynamic case of this simulation setup were reported in Lord et al. (2014). After the hydrodynamic simulations had reached a statistically steady state, Lord (2014) initialized the magnetic field with a root mean square magnetic field strength of 0.5 G while keeping the horizontally averaged vertical magnetic flux at zero. The dynamo-related magnetic field amplification resulted in a photospheric root mean square magnetic field strength of 140 G and a photospheric mean unsigned vertical magnetic flux of 37.2 G. Lord (2014) showed snapshots of the vertical velocity u z of the hydrodynamic case in Figures 2.2 and 2.3 and horizontally averaged density, temperature, and pressure profiles in Figure 2.5, which are close to a standard solar model (Christensen-Dalsgaard et al. 1996). Snapshots and horizontal averages of background quantities are very similar for the magnetic case (see Lord 2014). We show a snapshot of the horizontal velocity u x in Figure C.1.
We used 66 consecutive snapshots that were archived by Lord (2014). They were extracted at a cadence of 3.8 hours of solar time, and rebinned to 512 × 512 × 384 grid points with a resolution of 384 × 384 × 128 km 3 in a Cartesian box sized 196 × 196 × 50 Mm 3 in the (x, y, z) directions, where z points upward and L = 196 Mm is the horizontal extent of the box. The original spatial resolution had twice as many grid points in all directions. The reduction in resolution only changes the smaller scales in the simulation, while in this study, we are interested in the very largest scales, which are relatively unaffected by the reduction in resolution. The simulation extended to a depth of z = −48.5 Mm (r = 0.93 R , or about 16 cumulative pressure scale heights from the surface). This study is targeted at flows around r = 0.96 R , which is about 2.2 local pressure scale heights from the bottom boundary.
The helioseismic data in HDS2012 were averaged over time periods of one day. The data are thus sensitive to temporal averages of convective motions. We here mimicked this temporal averaging by subdividing the simulated cubes into 11 chunks of 6 × 3.8 h = 22.8 h and by averaging the flow field over the six snapshots for each chunk. This is further discussed in Sections 2.4.3 and 4.

Forward model in Cartesian geometry
As the simulated flow fields are given in Cartesian geometry and are horizontally periodic, we adopt a Cartesian geometry for the rest of this paper. The Cartesian box was centered at the equator. In order to convert from Cartesian coordinates (x, y, z) into spherical coordinates (r, θ, φ) and vice versa, where r is the distance from solar center, θ is the colatitude and φ is the longitude, we used the relations x = φR , y = (π/2 − θ)R , z = r − R . To facilitate comparison with HDS2012, we continue to refer to the z coordinate by the corresponding value for r.
In Cartesian geometry, the convolution theorem can be used, see Appendix A, to write the relation between travel times τ and flows u = (u x , u y , u z ), which is mediated by the sensitivity func-tion K = (K x , K y , K z ) in the Fourier domain, where we mark the horizontal Fourier transform only by the use of the horizontal Fourier variable or wave vector, k, see Appendix A, and we used = kR to convert from horizontal wave number k = |k| into harmonic degree. In Eq.
(1), we have summed over all vertical grid points z and horizontal grid points x , we multiplied by the uniform vertical grid spacing h z and horizontal grid spacing h x = h y , and we used to define a horizontal convolution. In our computations, we used the discretization in Eq. (1), but for brevity in notation, we write in the remainder of this paper for the vertical integral We here considered the deep-focus travel-time geometry used by HDS2012 for the target depth of r = 0.96R and computed travel-time sensitivity kernelsK for this measurement using the Born approximation and the method of Böning et al. (2016), see Figure 1. This method assumes that the flow is constant in time. Sensitivity functions obtained using the Born approximation compare well in different approaches (e.g., Birch et al. 2007;Burston et al. 2015;Böning et al. 2016;Mandal et al. 2017). To compute the sensitivity functions, we used a spatial grid that was four times as wide as the MURaM grid to capture the entire sensitivity function. It was then Fourier-projected on the MURaM grid using the periodicity of the flow field. Furthermore, we used only half the spatial resolution for computing kernels. This is sufficient because the smallest seismic wavelength is still more then ten times larger than the spatial resolution. Appendix B gives further details on the deep-focus travel-time geometry, which was modeled using the kernels. The travel-time geometry was designed by HDS2012 to be predominantly sensitive to u x flows (the east-west component of the flow) close to a target depth of 0.96 R . Figure 1 shows that the travel-time measurements are sensitive to flows in all spatial directions and especially to near-surface flows, although there is some degree of focussing for K x .
Using these kernels and the flow fields from MURaM, we obtained forward-modeled travel-time maps using Eq. (2). These travel times are free of seismic noise ( = 0) and are thus referred to as noiseless travel times. The bottom left panel in Figure 2 shows an example of these maps for one realization of the flow field. The top panels of Figure 2 show that u x indeed contributes most to the travel times, as intended by HDS2012. However, the u y and u z flow components also contribute significantly, and a large fraction of the u x contribution comes from near-surface flows (bottom middle panel in Fig. 2), which was unintended. On the other hand, the bottom right panel in the figure shows that the signal from the flows in the deeper layers is somewhat correlated with the total signal, which shows that the deep flows might be detected using these data.

Noise model and covariance of travel times
In Cartesian geometry and in the quiet Sun, it is reasonable to assume that the correlation in the noise of two travel-time measurements only depends on their separation (e.g., Gizon  −100 0 100 Cut at y = 0 Cut at r = R Fig. 1. Sensitivity of deep-focus travel times (target depth 0.96R ) to flows. The observational setup is taken from HDS2012 and is described in detail in Appendix B. We computed the sensitivity functions using the first Born approximation (Böning et al. 2016).  Figure 3 and contributions from the different flow components (top panels) as well as different depth ranges (remaining panels) obtained using Eq. (2) with = 0.

2004)
, so that, see Appendix A, where Λ is the noise covariance matrix and h k is the resolution in Fourier space. Here, we assumed that averages q = q(t) t for any quantity q are taken over many independent realizations q(t), which are observed for different nonoverlapping periods indexed by t. We assumed that enough data were averaged so that the temporal averages approximate their expectation values.
As a consequence of the horizontal translation invariance, the problem decouples in k space, that is, We compute the noise covariance matrix for the deep-focus measurement using the method of Gizon & Birch (2004) and Fournier et al. (2014) and draw realizations of the noise as described in Appendix A. Figure 3 shows an example realization of the synthetic noise. It is qualitatively very similar to the actual data while being a different stochastic realization. This can be seen by comparing the left panel in Figure 3 to Fig. 5 in the supplementary material of HDS2012. Our synthetic noise has a standard deviation of 10.3 s and the contribution of the travel-time signal to the right panel of Figure 3 has a standard deviation of 1.9 s. Both compare reasonably well with the data (HDS2012), where the noise has a standard deviation of 11.8 ± 0.3 s and the data 2.5 ± 0.02 s (Birch et al., in prep.). We note that this noise model has been tested successfully against data (e.g., Figs. 5, 6, 8, 10, and 11 in Gizon & Birch 2004;Fig. 5 in Fournier et al. 2014).
However, we also note that it is not clear why Fig. 3 in HDS2012 shows a very similar travel-time map, but with features aligned in north-south direction, as opposed to the features aligned in east-west direction in Fig. 5 in the supplementary information of HDS2012 and in our Fig. 3. Except for the difference in the orientation, the three figures are qualitatively very similar. Because the measurements were averaged (see also Fig. 1), it is expected that travel-times are more correlated in east-west direction, as is shown in our noise maps and in Fig. 5 in the supplementary material of HDS2012. On the other hand, the orientation of the noise does not affect the results in our study or in HDS2012. Therefore we do not address this issue further.

Assumptions for estimating an upper limit
In this subsection, we summarize the essential assumptions leading to the estimate obtained by HDS2012 and introduce their method. We also briefly outline the motivation for these assumptions. After summarizing the assumptions that lead to this estimate, we assess their validity in the following section. We therefore do not discuss their validity in this section. We here explicitly mark the most important assumptions made by HDS2012 to obtain an upper limit.
However, we stress already here that rather than an inequality sign, an approximate equality sign is a more adequate assumption in many cases. In the following, we therefore include alternative formulations of the assumptions as approximations for future reference in parentheses.

Noise correction
A major step is the so-called noise reduction, which aims at estimating the noiseless (or noise-reduced) travel times τ NR = τ − , which is the signal of interest. In order to estimate the amplitude of the signal from the noisy travel-time measurements, HDS2012 used a relation from Gizon & Birch (2004, Section 6). Correcting an obvious error in Gizon & Birch (2004), Hanasoge et al. (2012 used the correct version where σ 2 is the variance of the data, T is the observation time, N 2 0 /T = N 2 is the variance of the noise for an observation time T , and S 2 is the contribution of the signal to the total variance. This equation is valid in the case of time-independent flows and omits a small 1/T 2 contribution to the noise (see Eqs. (14) and (B.6) in Fournier et al. 2014). From Equation (7), it follows that and by assuming that the signal-to-noise ratio is independent of the spatial scale, HDS2012 obtained Here, q fft ( ) indicates an azimuthally averaged spectrum that was obtained from a horizontal Fourier transform of the quantity q (e.g., q = τ or q = u x ) to be compared with the spherical notation of HDS2012. The quantity q fft ( ) is defined as a mode amplitude per spherical harmonic degree (same unit as q). As usual, an appropriate summation of q fft ( ) 2 can be used to estimate the variance in real space, see Appendix A for the exact normalization. Starting from Equation (9), we use a notation like "(A1)" to mark inequalities or equalities that correspond to the most important assumptions from HDS2012 that we assess in this study. These assumptions are summarized in Section 2.4.3.
The main aspect of assumption (A1) that we assess in this study is the assumed independence of k or of the signal-tonoise ratio, which is implicit in Eq. (9). HDS2012 fit the signalto-noise ratio S /N in Eq. (7) to the data. We do not study the validity of this fitting procedure here.
In order to estimate the signal from the noisy data, HDS2012 used assumption (A1), which says that the noise-reduced traveltime mode amplitude τ fft NR ( ) is bounded by (or can be approximated by) the signal-to-noise ratio times the observed noisy travel-time mode amplitude τ fft ( ). Here, τ fft NR ( ) 2 is a spectral decomposition of the signal variance, and similarly, τ fft ( ) 2 is a decomposition of the total variance σ 2 .
For synthetic data, we know and we can thus compare this value with the value of S /N used by HDS2012.

Calibration curve
When the amplitude of the signal has been estimated from the data or a noiseless travel-time spectrum is given for synthetic data, the aim is to estimate the amplitude of the convective motions at the depth that generates the signal. HDS2012 proposed to do this using a so-called calibration curve. This calibration curve simply converts the noise-reduced travel-time spectrum into a velocity spectrum.
To introduce the derivation of the calibration curve proposed by HDS2012, including the underlying assumptions, we start from Equation (6). Assuming that the statistical properties of the data are translation invariant, we can only consider correlations τ * (k) τ(k ) for k = k . One then obtains where, following HDS2012, we assumed that the spectrum of the travel-time signal is bounded below by (or can be estimated by) the spectrum of travel-times stemming only from the xcomponent of the flow in assumption (A3), or in an intermediate step as a sum of spectra from the three components, ignoring correlations of different flow components, in assumption (A2). These assumptions are motivated by the idea that the measurement geometry is designed to be mostly sensitive to the x direction. u x contributes most to the signal, see the top left and bottom left panel in Figure 2. However, the measurements are also sensitive to the other flow components, see Figure 1 and the top middle and top right panels in Figure 2. We evaluate and discuss the validity of the assumptions made by HDS2012 and the effect on the inferred flow spectrum in detail in Section 3. HDS2012 assumed that the flow near the target depth of the travel-time measurement bounds the spectrum from below (contributes dominantly to the spectrum of travel times). In addition, they assumed that the correlation of the flow at different depths u * x (k, z) u x (k, z ) may be bounded (or modeled) by a Gaussian, more specifically that some Gaussian is centered at the target depth z T = (0.96 − 1)R (z = 0 is at r = R ) with a width of σ z = 16.77 Mm = 1.8H P , where 1.8 pressure scale heights H P is the mixing length. It is required here that the Gaussian falls off faster than the correlation function, such that and where We did not confirm whether the assumption of a Gaussian shape of the correlation function is an appropriate choice. This is left to future studies. The final assumption is where I = {k such that |k| − h k /2 ≤ /r < |k| + h k /2} selects all horizontal wave vectors of similar length, N is the number of elements in I , and = j takes the values j = j h k r with integer j in order to be compatible with the discrete Fourier grid. This introduces the so-called calibration curve, C . By making assumption (A5) in Eq. (17), we assume that the azimuthal dependence of the flow and the kernel spectra are not anticorrelated, such that the azimuthal average of the kernels and flow spectra can be taken independently, without increasing the result. This was assumed but not explicitly mentioned by Hanasoge et al. (2012). By taking the azimuthal average of the travel-time power spectrum in Eq. (11) and by using the above assumptions, one obtains the final inequality on which the estimate of the upper limit of the flow amplitude by HDS2012 is based. At the same time, we convert from Cartesian into spherical geometry in the Fourier domain using = kr and h = h k r. Here and in the definition of I above, we used r ≈ const = R for the entire domain, which is equivalent to the assumption of Cartesian geometry. In addition, we employed the convention for converting spectra from Fourier space into spherical harmonic space outlined in Appendix A, which is inspired by Birch et al. (in prep.), In order to facilitate the numerical computation of Eq. (16), HDS2012 instead chose a smaller width σ z = σ z /D, where D = 9.64, see also Birch et al. (in prep.). As HDS2012 derived K or C from numerical simulations of wave propagation, shrinking the correlation width by the factor D permitted them to deduct a larger number of independent measurements from one simulation, and thus to reduce the numerical cost. However, the quantity D at the same time acts as an additional free parameter. We show below that its introduction has important effects on the estimated spectrum.
After obtaining K(k, σ z ) and the alternative calibration curve HDS2012 then approximated C by The integration around the target depth with a Gaussian weighting of width σ z is equivalent to assuming that the flows are vertically correlated with each other for about a scale height. This is reasonable for the simulated data (Lord et al. 2014), although a scale dependence of this width may be a more adequate assumption. At the same time, the value of the vertical correlation length of the flows is not known for the Sun.

Summary of assumptions
As a result of the above assumptions, we obtain, see Eqs. (9), (21), and (23), where the equivalent of the right-hand side was used by HDS2012 to estimate the flow near the target depth. We writeû x for the estimate of u x . We now briefly summarize the most important assumptions that were explicitly marked above and that were proposed by HDS2012. We evaluate them in the following section.
(A1) The variance in the travel-time maps stemming from the signal and due to the flow is bounded above by (or can be estimated by) obtaining the S/N from a fit to the data (see Hanasoge et al. 2012, in prep.). The S/N is independent of spatial scale, see Eq. This assumption stems from the observation that the traveltime geometry is chosen such that the major contribution is in fact from this component.
(A4) The spectrum of travel times is further bounded from below by (or can be estimated by) the contribution of the dominant depth range, which is assumed to be a pressure scale height around the target depth of the deep-focus measurement, see Eq. (15). (A5) The azimuthal dependence of u x and K x is independent, that is, it does not matter whether these quantities are first multiplied and are then averaged over the azimuth of kk, or whether the average is taken first and is then multiplied, see Eq. (17). (A6) To obtain the calibration curve, similar results are obtained regardless of whether the kernels are integrated over about a pressure scale height, σ z or over the smaller width σ z = σ z /D and then the result is multiplied by D, that is, C ≈ D C , see Eq. (23).
Furthermore, a few more assumptions were (at least implicitly) made by HDS2012. We state these assumptions here for completeness; these assumptions are not be studied here but should be addressed in future work: (B1) In fitting the signal-to-noise ratio, HDS2012 followed the assumption of Gizon & Birch (2004) that the noise decreases with 1/ √ T . A potential 1/T term (see Eqs. (13)

Scale-dependent noise correction is important
For MURaM-type flows, we find that the signal-to-noise ratio significantly depends on the spatial scale and is not independent of scale, as assumed by HDS2012, see the top panel in Figure 4 and assumption (A1). The true signal-to-noise ratio for the Sun may be different to what we report here if the subsurface flows in the Sun are substantially different from MURaM flows. We strongly recommend that the noise correction be made as a function of harmonic degree (or wave number), possibly even as a function of azimuth (i.e., m or k x /|k|). The middle and bottom panels in Figure 4 show the effect of this scale-dependent noise correction on the spectrum of travel times and the estimated flow amplitudes. For MURaM flows, a constant signal-to-noise ratio for all scales clearly leads to an underestimation of the flow amplitudes at the largest scales. Impact on estimated velocity spectrum noise-correction (S/N ) from this work noise correction (S/N ) from HDS2012 exact noise correction (noiseless data τ fft NR ) Fig. 4. Inverse of the signal-to-noise ratio (top) and its effect on the noise reduction (middle) as well as the estimate of the spectrum of horizontal flows (bottom, as a velocity amplitude per mode) for different approaches. The signal-to-noise ratio is clearly scale dependent, in contradiction to assumption (A1). We use = kR to convert from wave number into harmonic degree. We do not show results for = 0 because this corresponds to an arbitrary change in reference frame.
the flow) around the target depth of 0.96 R . Figure 1 shows that the travel times are in principle sensitive to all flow components and to a broad range of depths, including flows near the surface. However, there is some focus on the target depth for the sensitivity to u x flows, see Figure 1 (bottom left panel).
This is also reflected in the contribution of the different flow components and depth layers to the travel-time signal, which we show in Figure 2. Although u x (top left) contributes most to the signal (bottom left), u y and u z also contribute significantly (top center and right). The contribution from the surface layers (bottom center) is also large. However, the signal from the deeper u x flows (bottom right) is visible in the signal, which means that u x flows around the target depth might be detected using this method.

Calibration curve can be reproduced, but depends on assumed flow properties
Second, we find that the computation of the calibration curve, which was done by HDS2012 using simulations of wave propagation, can be reproduced, see the blue line in Figure 5. The agreement is quite good when compared to Fig. 2 of the supplementary material of HDS2012, even when we computed it on the coarser-grained wave number grid of the MURaM simulation, see the green dashed line in Figure C.2. However, HDS2012 computed the calibration curve assuming a width of convective features of σ z = 1.74 Mm and later rescaled the result by a factor of D = 9.64 to mimic an effective width of σ z = Dσ z , which is similar to the mixing length or 1.8 pressure scale heights at 0.96R (see Sec. 2.4.2 and Birch et al.,in prep.). We compare this to directly assuming a larger width of convective features of σ z = Dσ z , see C in Figure 5. This alternative calibration curve is lower. Using C thus results in a correspondingly higher estimated upper limit, see Eq. (24).
In other words, the use of the width σ z = σ z /D and rescaling the calibration curve by using DC as done by HDS2012 effectively results in a lower estimate for the upper limit. This difference arises because the kernel is not constant in the area σ 2 z around the target depth, see Fig. C.4. This increases the value of DC compared to C . Consequently, assumption (A6) is thus not met and the resulting inequality is in the wrong sense. As the value of σ z is an assumption on the vertical correlation length of the flow, an unknown property of the Sun enters the method here. In addition, the vertical correlation function is approximated by a Gaussian, although its shape is not known, see the discussion in Section 3.7.
The introduction of the rescaling with D acts similar to a free parameter. The overall effect of this is further discussed in section 3.6.

Concept of a calibration curve is generally valid
Furthermore, we confirmed the assumptions (A2)-(A6) that were made to derive the calibration curve, see Figure 6. While some of the inequalities were met, some were violated, although not substantially. We conclude from this that the concept of a calibration curve is generally valid when the resulting estimate is rather understood as a rough estimate of the amplitude of the flow and not a strict upper limit. We recall that the calibration curve depends on the assumed vertical correlation function, which is not known for the Sun, see the discussion in Section 3.7.

Azimuthal averages have to be treated more carefully
We find that when the effect of off-diagonal flow-correlation is neglected, assumption (A2) is not justified because the offdiagonal elements make a significant negative contribution, see Figure C.6. The reason for this is as follows. When the dependence of K * x K y and u * x u y on the direction of kk is considered, we observe that the two quantities are anticorrelated, see the left panel in Figure C.5. This is the case for almost all k and results in a negative total effect of the off-diagonal terms on the travel-time spectrum. It might be possible to remove this anticorrelation in future studies by using a pure point-to-point travel-time geometry, where cross correlations are not averaged over arcs as done by HDS2012, or by considering a decomposition in divergent and vortical flows, rather than x and y components, see also the discussion in Section 4.  Fig. 2 in the supplementary material) can be reproduced (blue solid curve). When a more realistic width σ z = 16.8 Mm = 1.8 H P of flow features is used to compute the calibration curve, a change is seen (dashed orange curve), which would lead to an increase in the estimated upper limit, see assumption (A6). The crosses and pluses indicate the grid points in harmonic degree, which were obtained by converting horizontal Fourier modes from Cartesian to spherical geometry (see Appendix A.4). The kernels and thus the calibration curves shown here were computed on a spatial grid with a four times larger spatial extent than the MURaM grid, therefore the resolution in Fourier space, h l = rh k = r 2π L , is four times finer than in other plots in this paper. See Fig. C.2 for a more detailed comparison.
We find that the diagonal elements have a positive contribution (everywhere but for a small region for d = z, see Figure C.6). As a consequence, neglecting d = d = y and d = d = z, as in assumption (A3), is justified to obtain an upper limit. However, we find that all flow components contribute significantly to the travel-time spectrum, see Figure 6 (compare the blue, orange, and green curves in the left panel). This is consistent with all flow components contributing to the travel-time signal (see Sec. 3.2). To obtain an accurate estimate of the flow spectrum, the effect of all flow components thus has to be taken into account.
Similar to the off-diagonal case, neglecting the dependence of K * x K x and u * x u x on the direction of k introduces an error, see the right panel in Figure C.5. The magnitude of the error depends on k and r/R , but it is generally about 20% (as the effect on the spectrum, not on the mode amplitude). The resulting effect on the travel-time spectrum is of the wrong sign and artificially increases it, see Figure 6. Strictly speaking, assumption (A5) is thus not justified.

Major contribution from near-surface flows, small contribution from target depth
We find that the major contribution to the travel-time spectrum comes from near the surface. Only about 10-30%, rarely up to 40%, of the travel-time spectrum is produced below 0.99R or in a Gaussian envelope with width σ z around the target depth, see Figure C.3. Assumption (A4) is thus well satisfied as an inequality, but is far from being a good approximation. This result is consistent with the sensitivity kernels peaking near the surface (Fig. 1) and with the travel-time signal having a dominant contribution from the surface (Sec. 3.2).

Rather a rough estimate than a strict upper limit
Finally, we compared the resulting estimates of the flow amplitude with the actual estimates in Figure 7. Taking all previous results into account, we find that it is more adequate to consider the estimates to be rough estimates of the magnitude of the flow than a strict upper limit. Equivalently, we find it to be more accurate to write an approximate equality sign instead of an inequality in most cases for assumptions (A1)-(A6). Figure 7 compares the estimateŝ with the amplitude of the time-averaged flow, u fft x ( , z T ), which was averaged over six snapshots that were taken within the period of one day (see Sec. 2.1). We averaged all estimates and actual flow amplitudes over the 11 available realizations. These averages were taken over the corresponding spectra before the square root was taken to obtain the flow amplitude.

Why the method gives the correct order of magnitude as a result
Considering the results shown in Figure 7, we conclude that for MURaM-like flows, the method of HDS2012 just as applied in that paper (blue curve) gives the correct order of magnitude of the flow within a factor of about two for the larger scales with 90. For scales with < 60, the estimate is not an upper limit because it underestimates the true flow by a factor of about two. For the smaller scales with > 100, the estimate is too large by nearly an order of magnitude, although it is a true upper limit.
When the noise correction is applied correctly and the free parameter D is eliminated (solid orange curve), the agreement is better. In nearly the entire range ( < 110), the improved estimate constitutes an upper limit and is consistent with the actual flow spectrum within a factor of two.
Given that for both cases some of the inequalities are not met, see Figure 6 and Section 3.4.1, the question remains why the method works, at least for estimating the order of magnitude. To understand this, we first note that considering only the effect of d = x flow correlations roughly gives the right amplitude of the travel-time spectrum. This is because the cumulative effects of the other flow components are partly positive and partly negative, and roughly cancel out, see Figure 6 (blue versus green curve in left panel) and Figure C.6 (cumulative effect of assumptions A2 and A3).
When we only take the flow near the target depth into account, the travel-time spectrum is greatly reduced, see Figure C.3. The flows around the target depth only contribute about 10-40% to the squared signal. See also the (A4) curve in Figure 6 and the corresponding estimate in the right panel in Figure 6. This effect is almost completely compensated for by the amplification from the change from the calibration curve C to DC , see middle panel in Figure 6. By this change, a free parameter D is effectively introduced by HDS2012, which compensates for the inaccuracies that occurred previously. We can only speculate about the reason why the compensation using D is apparently of the correct magnitude to yield a tight upper limit, as can be seen in the middle panel of Figure 6 and for the dashed line in Fig. 7. It may be due to chance or arise because Hanasoge et al. Comparison of final estimateŝ u fft,HDS2012 x (N/S = 4.7 ± 0.8, using ClD) u fft x (N/S from this work, using ClD) u fft x (N/S from this work, using C l ) u fft x from simulation (0.96R ) u fft x for noiseless data using C l Fig. 7. Comparison of estimates of the spectrum of convective flows obtained from the synthetic data based on flows from a MURaM simulation. The method of HDS2012 applied to this synthetic data (solid blue line) can be improved by using a scale-dependent signal-to-noise correction (dashed orange line, this work). Both estimates give the correct order of magnitude of the flow (green line), although the estimate from the method of HDS2012 is not a strict upper limit. When the underlying assumptions are improved (solid orange line), our best estimate results in a larger upper limit.
(2010) and HDS2012 calibrated the inferred spectrum of convection with the flows from simulations, and hence intuitively chose a value for D that gives the correct magnitude of the estimated spectrum for the flow field used in their study.
For smaller scales ( > 110), the estimates become worse, as can be seen toward the right end of Figures 7 and 6. This is because for increasingly smaller horizontal scales, the sensitivity of the travel times to flows near the target depth decreases and the sensitivity to near-surface flows increases.

Importance of the vertical correlation structure and the surface contribution
Because the signal receives a major contribution from nearsurface flows (Sections 3.2 and 3.4.2, Fig. C.3), a correct conversion of a travel-time spectrum to a flow spectrum always has to take the relation of the surface flows to the deep flows into account, see especially Figure C.6. It would be possible to measure this correlation function for MURaM flows, but the true vertical correlation function for the Sun is not known. We therefore find as a major conclusion that the vertical correlation structure of the flow is needed as a prior knowledge for this method to be usable, or equivalently, that its result crucially depends on the assumed correlation structure. This point concerns not only the correlation of the surface with the deeper regions, but also the relative amplitude of the deep flows compared to those closer to the surface. In other words, without knowing the effect of the near-surface layers to the travel-time spectrum, it is impossible to infer the spectrum at greater depths. Inferences of the spectrum at depth from observations can thus only be trustworthy when the spectrum is inferred at several depths at once, for example, using information from different target depths and an inversion procedure. Using different target depths would in principle also include information on the vertical correlation structure. Alternatively, it may be possible to improve the observational setup so that the sensitivity kernel is better focused at the target depth (see the bottom left panel in Fig. 1 for the current lack of focus and Sec. 3.2). If the measurement were only sensitive to flows close to the target depth, the problem discussed above would be avoided. Reducing the sensitivity to near-surface flows would be a first step toward such an improvement.

Conclusions
We have performed an independent evaluation of the method suggested by HDS2012 to estimate an upper limit of solar convective flow velocities from helioseismic measurements at a target depth of r = 0.96R . We did this using synthetic helioseismic data for which the true flow is given by the output of a larger scale MURaM MHD convection simulation of the quiet Sun (Lord 2014) similar to Lord et al. (2014). Furthermore, we used consistent normalizations of the relevant spherical harmonic and Fourier transforms as proposed by Birch et al. (in prep.).
For the case of MURaM flows, we find that the estimates obtained using this method give the correct order of magnitude of the actual flow amplitudes within a factor of two. However, at the largest scales, the actual flow amplitude is underestimated and the upper limit is too low by a factor of about two. This is due to the scale dependence of the signal-to-noise ratio, which was not considered in the original study (HDS2012).
Because several simplifying assumptions enter the method we studied, the question arises why it roughly works. One way to look at this is by considering individual effects of every assumption made. When the correction for the signal-to-noise ratio is made, we find that when only the x component of the flow is considered, this is a reasonable approximate model for the travel-time spectrum, although the other components contribute significantly as well. Even though the flows around the target depth then only contribute 10-40% to the travel-time spectrum, the effect of this on the estimated upper limit is nearly compensated for by a rescaling of the calibration curve introduced by HDS2012. As a result, the estimated upper limit is just above the actual flow spectrum. If the rescaling of the calibration curve were not done, the upper limit would be higher by a factor of about 1.5. The dilemma is that the answer to how the calibration curve is correctly computed depends on the vertical correlation length of the actual flows in the Sun, the shape of the vertical correlation function, and the ratio of the amplitudes at depth to the surface, none of which are known. Another way to look at this is as follows. For MURaM flows, even though some inequalities that each corresponds to an assumption are not met, others are met with substantial leeway, so that the final estimate still is an upper bound, except at large scales because of the scale dependence of the signal-to-noise correction.
Several lessons can be learned from our study that should be taken into account in future inferences of the flow spectrum at depth. The first is the mentioned scale dependence of the signalto-noise ratio.
Second, the uncertainty of how the calibration curve is to be computed is substantial. This is because the correct calibration curve depends on the properties of the actual flows in the Sun, especially the vertical correlation function of the horizontal flows, which is not known. In addition, the travel-time measurements are more sensitive to near-surface flows than to flows near the target depth. As the ratio between the near-surface spectrum and the spectrum at depth is not known, the effect of the near-surface flows on the measured travel-time spectrum is hard to infer from observations of only one target depth. Possible solutions to this may be performing inversions using several target depths to infer the vertical correlation structure to improve the goodness of the focus at depth (e.g., using helioseismic holography, Lindsey & Braun 2000), or to fit several different models for power spectra and vertical correlation lengths to the data.
Third, the dependence of the kernel and the flow on the azimuthal angle in the Fourier domain has to be taken into account. As a result of this dependence, correlations of u x with u y have non-negligible effects on the travel-time spectrum. A solution may be decomposing the flow into divergent and vortical components, rather than x and y, or using point-to-point or point-toannulus geometry rather than an arc-to-arc travel-time geometry. Alternatively, because the problem decouples in k space, an inversion may solve for every kk individually.
This study may be extended in future work. First, the effect of the time dependence of the flow has not been studied, neither on the signal-to-noise ratio due to the decrease of the signal with time, nor by forward-modeling the effect of time-dependent flows on the travel times. Second, the potential relevance of a 1/T 2 term in the noise model, which may arise in addition to the usual 1/T dependence of the variance (see Eqs. (13) and (B.6) in Fournier et al. 2014), may be studied analytically. Third, this study should be extended to spherical geometry and to include different types of flow fields, including simulations with different spectral shapes for the flow field (e.g., Cossette & Rast 2016;Featherstone & Hindman 2016) and different flow topologies (e.g., Spruit 1997;Brandenburg 2016;Bekki et al. 2017;Hotta et al. 2019;Anders et al. 2019).
On the one hand, the results presented here give evidence that the method of HDS2012 correctly estimates the order of magnitude of the flow under the assumption that solar flows were similar to MURaM (if consistent normalizations of spherical harmonics are used, see also Birch et al., in prep.). However, the observed spectrum then leads to the conclusion that the convective conundrum persists at depth. If the vertical correlation function of flow components in the Sun is very different to MURaM, for example, with a very different correlation length, this conclusion may well be different. Determining the correlation function of solar convection may be simpler if a parameterized version of this function were available, for example, obtained by comparing different simulations.
On the other hand, we cannot resolve the inconsistency of these results with those obtained by Greer et al. (2015) using ring-diagram analysis at this stage. A deeper analysis of that method is required to resolve this issue, see also Nagashima et al. (2020). We are confident that by a careful and detailed evaluation of all aspects of both methods this discrepancy can be understood and resolved. In addition, global helioseismic methods might soon be used to infer the spectrum of solar convection at depth (e.g., Roth & Stix 2003;Woodard 2016;Mani & Hanasoge 2020). Helioseismology of convection is thus becoming a promising topic to be studied in the near future.
The noise covariance matrix is defined as In order to generate realizations of the noise, we set (A.9) or equivalently, A.11) where N(k) are complex Gaussians with independent real and imaginary parts and unit variance that are independent for different kk, except for the condition N(−k) = N(k) * .

Appendix A.3.2: Continuous setting
In a continuous setting (Gizon & Birch 2002), (A.20) To transform back to a discrete setting, we can substitute and we also obtain C = 1/h k .
Appendix A.4: Power spectra and equivalent mode amplitudes in spherical harmonics Following Birch et al. (in prep.), we converted the power spectrum of any quantity from Fourier space into an equivalent power spectrum in spherical harmonic space with the same variance at a single location or per area. As our Fourier transform definition is differently normalized than in Birch et al. (in prep.), we here summarize all formulas appropriate for the convention chosen here.
For any quantity q (e.g., q = τ or q = u x ), we write where h k r is the equivalent resolution in harmonic degree, I = { j h k | j = 0, . . . , N x − 1}, the sets I are defined in Section 2.4.2, and we use = kr to convert wave number into spherical harmonic degree. We then define the amplitude per mode by where q( ) 2 is an interpolated power per spherical harmonic mode, where P fft q,i is an interpolated version of P fft q . As we work in Cartesian geometry, this is equivalent to assuming that the radial coordinate r is approximately constant throughout the domain. We thus assume r = const = R . For u x at r = 0.96R , the difference due to that compared to assuming r = const = 0.96R is only 2% ≈ 1 − √ 0.96. The corresponding energy spectrum E q as defined by Birch et al. (in prep.)  In summary, the quantity E q is directly comparable to E φ from Birch et al. (in prep.) when we here choose q = u x .  Fig. 5. When the kernels are interpolated onto the MURaM grid, the results do not substantially change (dotted curves), except at ≈ 20. The crosses and plusses indicate the grid points in harmonic degree, which were obtained by converting horizontal Fourier modes from Cartesian into spherical geometry (see Appendix A.4). The kernels were computed on a spatial grid with a four times larger spatial extent than the MURaM grid, therefore the resolution in Fourier space, h l = rh k = r 2π L , is four times finer. Figure C.2 shows a more detailed comparison of the calibration curves C l and C l /D, computed for the kernel grid and after interpolating the kernels on the MURaM grid. Figure C.3 shows the contribution of the region below 0.99 R to the travel-time spectrum.
Finally, we show a few more detailed plots to verify the assumptions for selected spatial scales kR and depths z. Plots for other spatial scales and depths are qualitatively similar. As a shorthand for Figures C.4 -C.6, we write for any quantity Q(k) mean k Q(k) = mean |k|=k Q(k) = 1 N kr k∈I kr Q(k), (C.1)  x u x . For the plot, each quantity was divided by its maximum value in the given range. Error bars of C u were obtained from the standard deviation of the 11 available simulation snapshots and are similar for C K C u . At other depths and similar wave numbers, the plots are qualitatively similar. In the right panel, we also show that taking the mean before or after multiplying C K xy and C u xy changes the result (thick solid black line).