Issue 
A&A
Volume 658, February 2022



Article Number  A142  
Number of page(s)  9  
Section  The Sun and the Heliosphere  
DOI  https://doi.org/10.1051/00046361/202141743  
Published online  16 February 2022 
Multiscale deep learning for estimating horizontal velocity fields on the solar surface
^{1}
Department of Astronomical Science, School of Physical Sciences, The Gradiate University for Advanced Studies, SOKENDAI, 2211 Osawa, Mitaka, Tokyo 1818588, Japan
email: ryohtaroh.ishikawa@grad.nao.ac.jp
^{2}
National Astronomical Observatory of Japan, 2211 Osawa, Mitaka, Tokyo 1818588, Japan
^{3}
National Institute for Fusion Science, 3226 Oroshicho, Toki, Gihu 5095292, Japan
^{4}
Department of Physics and Astronomy, Aichi University of Education, Kariya, Aichi 4468501, Japan
^{5}
MaxPlanckInstitut für Sonnensystemforschung (MPS), JustusvonLiebigWeg 3, 37077 Göttingen, Germany
Received:
8
July
2021
Accepted:
1
November
2021
Context. The dynamics in the photosphere is governed by the multiscale turbulent convection termed as granulation and supergranulation. It is important to derive threedimensional velocity vectors to understand the nature of the turbulent convection and to evaluate the vertical Poynting flux toward the upper atmosphere. The lineofsight component of the velocity can be obtained by observing the Doppler shifts. However, it is difficult to obtain the velocity component perpendicular to the line of sight, which corresponds to the horizontal velocity in disk center observations.
Aims. We present a new method based on a deep neural network that can estimate the horizontal velocity from the spatial and temporal variations of the intensity and vertical velocity. We suggest a new measure for examining the performance of the method.
Methods. We developed a convolutional neural network model with a multiscale deep learning architecture. The method consists of multiple convolutional kernels with various sizes of receptive fields, and performs convolution for spatial and temporal axes. The network is trained with data from three different numerical simulations of turbulent convection. Furthermore, we introduced a novel coherence spectrum to assess the horizontal velocity fields that were derived for each spatial scale.
Results. The multiscale deep learning method successfully predicts the horizontal velocities for each convection simulation in terms of the global correlation coefficient, which is often used to evaluate the prediction accuracy of the methods. The coherence spectrum reveals the strong dependence of the correlation coefficients on the spatial scales. Although the coherence spectra are higher than 0.9 for largescale structures, they drastically decrease to less than 0.3 for smallscale structures, wherein the global correlation coefficient indicates a high value of approximately 0.95. By comparing the results of the three convection simulations, we determined that this decrease in the coherence spectrum occurs around the energy injection scales, which are characterized by the peak of the power spectra of the vertical velocities.
Conclusions. The accuracy for the smallscale structures is not guaranteed solely by the global correlation coefficient. To improve the accuracy on small scales, it is important to improve the loss function for enhancing the smallscale structures and to utilize other physical quantities related to the nonlinear cascade of convective eddies as input data.
Key words: Sun: granulation / Sun: photosphere / methods: data analysis / methods: observational
© ESO 2022
1. Introduction
The dynamics in the solar photosphere are governed by thermally driven convection. This in turn produces cellular patterns known as granules that are observed in visible light continuum images. The bright areas with hot rising flows are surrounded by darker and cooler intergranular lanes. A typical granule has a diameter of approximately 1000 km and lasts for approximately 10 min (Nordlund et al. 2009 and references therein). The turbulent nature of the granular convection inherently creates smallscale flow structures that are smaller than the typical size of granules (e.g., Matsumoto & Kitai 2010, Katsukawa & Orozco Suárez 2012, Rempel 2018). These types of smallscale flows interact with magnetic fields and can produce the upward Poyinting flux. This in turn can drive various phenomena, such as explosions (Shibata et al. 2007; Toriumi et al. 2017), jets (Hollweg et al. 1982; Iijima & Yokoyama 2017), and heating (van Ballegooijen et al. 2011; De Pontieu et al. 2012), in the upper atmosphere, chromosphere, and corona. Supergranulation is another convective pattern observed on the solar surface; it is characterized by horizontal flow fields with a large spatial scale of about 30 Mm (Rieutord & Rincon 2010) and a typical lifetime of about 1.7 days (Hirzberger et al. 2008). Photospheric magnetic fields are passively advected into the edges of supergranules and form network structures (Gošić et al. 2014). Recent observation found that persistent vortex flows exist at supergranular vertices, and magnetic flux can be concentrated in the vortices (Requerey et al. 2018).
We can obtain the lineofsight (LOS) component of the flow velocities by spectroscopic observation via the Doppler effect. Conversely, to date, there are no direct methods for observing the components perpendicular to the LOS. These components correspond to the horizontal velocity on the solar surface in disk center observations. The most commonly used method for estimating the horizontal velocity field is local correlation tracking (LCT; November & Simon 1988). This method uses two consecutive images and computes the crosscorrelation, and thereby detects the motions of granule patterns. Although the LCT technique can evaluate the horizontal velocity with good accuracy on a larger scale, its accuracy on a scale smaller than the granules is limited by as much as a factor of three (Verma et al. 2013) or more (Malherbe et al. 2018). The errors are preferentially high in the boundaries between granular cells (Louis et al. 2015). This is mainly due to the fact that the window utilized in the LCT method to compute the crosscorrelations blurs velocity fields. However, the accuracy on a smaller scale is important for evaluating the interaction between magnetic fields and horizontal flows because the magnetic fields are often concentrated in small regions in the photosphere (e.g., Parnell 2002).
An alternative approach involves identifying features that are observed as bright points in Gband or continuum images, and obtaining horizontal velocities by tracking them (Berger et al. 1998; Utz et al. 2010). The method can provide horizontal velocities of small magnetic features in intergranular lanes. Given that these magnetic features appear associated with strong concentrations of magnetic fields, we cannot obtain velocity fields and their spatial distribution across the entire areas by using this method.
There is a new method for estimating the horizontal velocity that employs a deep learning approach. Asensio Ramos et al. (2017) developed a model using a convolutional neural network (DeepVel), which was trained on a set of velocity fields simulated for the photosphere. DeepVel can estimate the horizontal velocity at various heights in the solar atmosphere without averaging. Tremblay et al. (2021a) showed that the Pearson linear correlation between the estimation and the answer was approximately 0.8. The correlation increases when the horizontal velocity fields are averaged over several granular lifetimes (Tremblay et al. 2018); the increment of the accuracy by taking the average shows the same trend as the LCT. The results of DeepVel and LCT are similar when they are averaged; however, DeepVel still has the advantage of reproducing the kinetic power spectra on subsupergranule scales. Tremblay & Attie (2020) developed a new architecture for DeepVel using the UNET architecture and found that it is more effective than other tracking methods. However, their accuracies have not been verified at various spatial scales.
There are several concerns of estimating the horizontal velocity on small scales, which should be clarified before we use the method for observational data. The motivation of this study was to evaluate the accuracy of the above methods on various spatial scales, to reveal their weaknesses, and to suggest possible improvements. We developed a new method for estimating the spatial distribution of the horizontal velocity based on a multiscale deep learning architecture with several sizes of convolutional kernels to capture the multiscale nature of the solar convection. We adopted the new method for three different numerical simulations of convection, and discussed the relationship between the power spectra of the velocities and performance of the network. Furthermore, we suggest a new measure for evaluating the scalebyscale velocity estimation.
2. Method
We developed a convolutional neural network that predicts the spatial distribution of the horizontal velocity from the spatial and temporal variations of vertical velocity and temperature^{1}. This model includes multiscale deep learning architectures: the convolution layers have receptive fields of various sizes (Fig. 1 and Table 1). The sizes of the kernels corresponded to 3 × 3, 7 × 7, 15 × 15, 31 × 31, and 51 × 51. This type of multiscale architecture exhibits an advantage in detecting the solar convection motion, which is highly turbulent to the extent that the horizontal velocities exhibit broad power spectra. This architecture is similar to the inception module (Szegedy et al. 2014). The inception module is a network that consists of kernels of various sizes (1 × 1, 3 × 3, and 5 × 5) and the pooling layer. By utilizing kernels of varying sizes the inception module can efficiently detect spatially concentrated structures in a single region, and also highly spread structures.
Fig. 1. Structure of the network. BN and SE denote the batch normalization and squeezeandexcitation, respectively. The number of kernels in each convolutional layer is presented in Table 1. 
Number of kernels in each convolutional layer.
In the first block of the model, 3D convolutions were conducted along the spatial and temporal axes, and the channels corresponded to the different physical quantities. After the first block the data was reshaped, and thereby time and physical quantities were converged into new channels. Then, in the second block the convolutions were conducted only along the spatial axes. A potential problem corresponds to the large number of parameters due to the large size of kernels corresponding to 31 × 31 and 51 × 51. To reduce the number of parameters, we also included 1x1 convolutions before each convolution layer. Furthermore, some of the outputs of the convolutions were highly correlated. This can decrease the efficiency of the optimization. The 1 × 1 bottleneck layer can partially resolve this problem. After the 3D convolutions, the feature maps have four dimensions: two spatial axes, temporal axes, and channels. The reshape layer concatenates the temporal axes and the channels to change the structure of the feature maps into three dimensions. The “squeezeandexcitation block” (Hu et al. 2017) can improve the performance of the network by modeling interdependencies between the channels. This squeezeandexcitation block produces a collection of modulation weights for the channels. These weights are applied to the feature maps by multiplying and the results are fed into the subsequent layers. Additionally, we included the batch normalization (Ioffe & Szegedy 2015) after all the convolutions. This normalized the outputs of convolutions into zero mean and unit covariance, and this in turn accelerated the training. All the convolution layers were initialized with a random method (He et al. 2015a). Asensio Ramos et al. (2017) developed a deep neural network model (DeepVel) to estimate the horizontal velocity. Their model was based on ResNet (He et al. 2015b), which consists of deeply stacked layers with only 2D convolutions^{2} with a receptive field of 3 × 3. The total number of trainable parameters of our model corresponded to ∼4.0 × 10^{5}, which was less than that of DeepVel by a factor of 4.
3. Data
To train the neural network, we used numerical simulation data in three different types of convection models. By comparing several cases with different energy injection scales and spectral properties, we can evaluate and discuss the versatility of the neural network.
The first model is a convection model in which convection is driven via cooling at the top boundary. We termed this model nonlocal. The 3D compressible magnetohydrodynamic (MHD) turbulence without any rotations was considered in a Cartesian domain by covering the depth of the convection zone (Masada & Sano 2016). The horizontal sizes of the domain were four times larger than the vertical size. The superadiabatic condition with superadiabaticity δ ≃ 10^{−5} was imposed on only 5% of the region to ensure that the top boundary was convectively unstable, whereas the remaining part of the region remained adiabatic to ensure convective stability. The cool downward plumes produced near the top boundary drove the convection: fast downward motions with the entrainment behavior appeared locally and transiently.
The second model was termed the local model. This is a convection model in which convection is driven by local entropygradient. The entire convective zone was convectively unstable with superadiabatic conditions. The same Cartesian domain was considered as that in the nonlocal model. Unlike the nonlocal model, convective cells were generated over the entire convective zone in which the spatial scale of cells were dependent on the local scale height in the vertical direction (Cossette & Rast 2016). Hence, the convective motions with larger cells, which were produced near the bottom region, were more pronounced in the local model than those in the nonlocal model.
In the numerical simulations of the nonlocal and local models, a spatial resolution of 256 × 256 × 128 was used. The other physical and numerical parameters were similar to those in an earlier study by Masada & Sano (2016) (see Yokoi et al. 2021 for details). The apparent convective cells in the nonlocal and local simulations roughly correspond to supergranular cells rather than granular ones. The spatiotemporal data of the simulated turbulent fields were utilized for training the network, where the MHD turbulence was fully developed to the statistically steady state after several tens of the turnover time of the convective cells. We used the temperature and vertical velocities at 3.3 Mm below the top boundary as the input data of the network. Furthermore, the horizontal velocities were at the same depth as the corresponding ground truth. We performed downsampling for these data by setting their sizes to 128 × 128 pixels, which covered a region of 748 Mm × 748 Mm. Specifically, 1000 snapshots were available for both the models. Because the input data do not represent the parameters on the surface, the trained network is not directly applicable to an observation. Instead, we compared the characteristics of the network with those trained with a realistic simulation, as described below.
We also used a 3D compressible radiation MHD simulation, which was calculated with the MURaM code (Vögler et al. 2005) as the third model. In the code the radiative energy exchange was solved via a nongray radiative transfer under the assumption of local thermal equilibrium (LTE) to reproduce more realistic granular scale flows in the photosphere. Hence, in this study we used the same simulation setup as that used by Riethmüller et al. (2014), in which the simulation box covered 6 Mm in both horizontal directions with a grid size of 10.42 km and 1.4 Mm in the vertical direction with a 14 km grid size. A unipolar homogeneous vertical magnetic field of B_{z} = 30 G was introduced as an initial condition, and the simulation was run for a total solar time of 16 h, which was long enough to reach a statistically stationary state. We obtained more than 1600 snapshots of the data cube with a mean cadence of 35 s. We extracted a small region, which covers 5.4 Mm in both the horizontal directions, and downsampled the region with a grid size of 42 km. Thus, the size of the region corresponded to 128 × 128 pixels. We used the temperature and vertical velocity at the surface with an optical depth of unity.
The snapshots of the three simulations are shown in Fig. 2. The top three panels show the spatial distributions of the vertical velocities (U_{z}), while the middle three panels show those of the horizontal velocities (U_{y}). The vertical and horizontal velocities of each model were normalized with averages and standard deviations. The bottom left panel shows the probability density functions of the horizontal velocities of the three simulations. The fast velocity component existed in certain localized regions within short lifetimes. This appeared as a nonGaussian distribution in the probability density function. The power spectra of the horizontal and vertical velocities are shown in the bottom center and right panels. Here, the power spectrum of a physical quantity X is defined as
Fig. 2. Representation of the training data sets. The spatial distributions of the vertical velocities (top panels) and Y component of the horizontal velocities (middle panels) of the nonlocal (left), local (center), and MURaM (right) simulation data. The bottom panels show the probability density functions (left), power spectra of the horizontal velocities (center), and power spectra of the vertical velocities (right) of the three simulations. The vertical and horizontal velocities in the simulations are normalized with zero and unit dispersion. This satisfies Eq. (3). 
where Δk denotes the sampling interval of the wavenumber and denotes the Fourier transform of the spatial distributions of the physical quantity X. The Fourier transform of X is written as
where N^{2} denotes the number of pixels in an image, which corresponded to 128 × 128. Then the power spectrum was consistent with that defined in a previous study (Rieutord et al. 2010) and satisfied
where the overline denotes the average over the entire FOV. Given the normalization of the vertical and horizontal velocities, the dispersion corresponds to unity.
The power spectra of the vertical velocity showed their peaks. These peaks corresponded to the typical scales of the convective cells, while the spectra were broad. The power spectra of the horizontal velocity exhibited high power on large scales, whereas they did not exhibit clear peaks, especially in the nonlocal and local models.
4. Training process
We used the spatial distributions of temperature and vertical velocity with a size of 128 × 128 pixels and three consecutive frames as the input data. The output of the network was the spatial distribution of the horizontal velocity (U_{x} or U_{y}). We used the mean squared error between the horizontal velocities predicted by the network and those in the original data set for the loss function. The loss function was expected to be minimized in the training. Each physical quantity in the data set was normalized to zero average and unit standard deviation. We prepared 350 sets of data for the training, 40 sets for the crossvalidation, and 40 sets for the test. Although the snapshots in the data sets were temporally consecutive, the training, crossvalidation, and test data sets were sufficiently separated, with an ample time interval that was longer than the turnover times of the convection in all the simulations. The optimization of the network was performed by using the Adam firstorder gradientbased optimization (Kingma & Ba 2014).
Given the normalization of the training data set, the average and standard deviation of the horizontal velocity predicted by the network were also near zero and unity, respectively. When we develop a network for application to actual observational data, the network should be supervised by realistic numerical simulations of the photosphere, such as MURaM, with the spectral line synthesis and degradation with respect to the pointspreadfunction of observational instruments. The network should predict the absolute value of the horizontal velocity without any normalization. Hence, restoration of the velocity from the normalized value after the prediction by multiplying the standard deviation of the horizontal velocity in the simulation or that obtained with recent observations (e.g., Oba et al. 2020) can potentially be an option.
5. Results
In this section we show the comparison between the original simulation and model prediction of the horizontal velocities for the three convection models. In the LCT and DeepVel, only the emergent intensity distributions were used for the estimation. Furthermore, we used the vertical velocity distributions for the input of the network and examined their benefits. Table 2 lists the results with different inputs for each convection model. The results show that a higher accuracy of estimation was realized by using the vertical velocity when compared to the temperature distributions. This result is consistent with Tremblay & Attie (2020). By using vertical velocity and temperature, the network exhibited the highest global correlation coefficient. However, it was slightly smaller than that obtained by using only vertical velocity for the local model. In the rest of this paper we focus on the network that uses the vertical velocity and temperature as the input.
Global correlation coefficients with different inputs.
Figure 3 compares the horizontal velocity distributions in the simulation and prediction for each convection model. In the figure similar cellular patterns are observed in the simulated and predicted distributions. The global correlation coefficients were 0.948, 0.948, and 0.771 for the nonlocal, local, and MURaM models, respectively. The performance of the network for the MURaM model was not as good when compared with those for the nonlocal and local models in terms of the global correlation coefficients, and the predicted patterns appeared blurred. The right column in Fig. 3 shows the differences between the simulations and predictions. The difference was expected to be high at the boundaries between the convection cells. This was similar to the trend observed in the case of the LCT method (Louis et al. 2015). Although only the results for the U_{y} are shown in the figure, the results for the U_{x} are similar because of the symmetric setup of the simulation and the network.
Fig. 3. Spatial distributions of the vertical velocity, Y component of the horizontal velocities during the prediction and simulation, and the difference between them in the nonlocal model (top), the local model (middle), and the MURaM model (bottom). The vertical velocity and horizontal velocity in the simulation are normalized as zero mean with unit dispersion. The vertical velocity and temperature are used as inputs even though only the vertical velocity is shown in this figure. 
Figure 4 shows the comparisons between the simulated velocity fields and predicted velocity fields in terms of the power spectra and histograms. The power spectra of the predicted horizontal velocity distributions are similar to those of the simulation in the larger scales. However, the power spectra were slightly underestimated for all the scales, and noticeable errors were observed for the smaller scales. In the case of the MURaM model, the network significantly underestimated the velocity on small scales. The occurrence of the high velocities became less frequent in the predictions. The blur in the images blended the highvelocity components, which were preferentially localized in the small regions.
Fig. 4. Comparison between the simulated velocity field (red) and predicted velocity (blue) in terms of power spectra (left) and histograms (center). The histograms of normalized dot product (see Eq. (4)) are shown in the right column. 
We also investigated dot products between the estimated and simulated horizontal velocity vectors. The normalized dot product at each pixel is defined as
where v and v_{0} represent the estimated and simulated horizontal velocity vectors, respectively. This definition is consistent with Tremblay & Attie (2020) and Tremblay et al. (2021b). This dot product indicates the difference of the orientation of the vectors. The histograms shown in the right column of Fig. 4 have a peak around A ∼ 1, which indicates that the estimated and simulated horizontal velocity vectors are well aligned in most regions. The histograms have tails toward the negative dot products, although the fraction of the tails is small. The tail is relatively large in the histograms of the MURaM model. The averages of the normalized dot products are 0.92, 0.91, and 0.70 for the nonlocal, local, and MURaM models, respectively.
To examine the performance of the network, we defined the coherence spectrum, which represents the correlation at each wavenumber. This has never been introduced in past studies including as Louis et al. (2015) and Asensio Ramos et al. (2017). The coherence spectrum of distributions of two physical quantities X and Y can be defined as follows. First, we calculate the crossspectrum of X and Y as
where the asterisk denotes the complex conjugate, and the ensemble average ⟨⟩ is calculated over all 40 frames in the test data. Finally, we define the coherence spectrum by normalizing the crossspectrum with the average power spectra as
The crossspectrum is equivalent to the Fourier transform of the crosscorrelation function of X and Y. Therefore, the coherence spectrum γ_{XY}(k) represents the correlation between the two distributions at each wavenumber. A useful relationship exists among the coherence spectrum, power spectra, and global correlation coefficient:
Here R_{XY}(0, 0) denotes the global correlation coefficient without any translation (see Appendix A). This equation indicates that the global correlation coefficient is equivalent to the average of the coherence spectrum weighted with the power spectra. Hence, this highlights the importance of the coherence spectrum analysis, shown in this study, in examining the prediction accuracy scalebyscale, particularly in smallscale structures. As evident in the local model, the power spectrum of the horizontal velocity exhibited high power on large scales. Subsequently, the coherence spectra of large scales exhibited the importance of the global correlation coefficient. Conversely, the power spectrum of the horizontal velocity in the MURaM model exhibited relatively high power on small scales when compared to the nonlocal and local models. This indicated a greater importance of the coherence spectra on small scales.
The results of the coherence spectra γ_{XY}(k) for the three convection models are shown in Fig. 5. We reconstructed the DeepVel and trained it with three convection simulations with vertical velocity as the input, while the original DeepVel was trained with the emergent intensity. The performance of our network was similar to or better than that of DeepVel for all the simulation data and at all the wavenumbers, with the exception of the large scale in the nonlocal model. However, it was still difficult to reproduce the smallscale structures. In the nonlocal and local models, the coherence spectra exceeded 0.9 for small wavenumbers, and they drastically decreased to 0.3 or less for large wavenumbers. Furthermore, even the global correlation coefficients for both the models exceeded 0.9. The discussion on the smallscale structures should be treated with care even if the global correlation coefficient is high. In the MURaM model, which exhibited lower steep power spectra of the horizontal velocity (Fig. 2), the performance was limited to a wide range of wavenumbers when compared with those of the nonlocal and local cases. However, a rough trend and better accuracy on the large scale were observed.
Fig. 5. Coherence spectra between the simulation and prediction as a function of the wavenumber with our network (solid) and DeepVel (dashed; Asensio Ramos et al. 2017). It should be noted that the vertical velocity distribution rather than the emergent intensity is used for the input to DeepVel. The coherence spectra show strong scaledependence. 
When comparing the results of the three convection models, it can be observed that the different onset of the significant decrease in the coherence spectrum appears on smaller scales than on the energy injection scales where the power spectra of the vertical velocities have their peaks. This signature suggests that the decrease in performance is not simply due to the small size of the spatial patterns, but is also related to turbulent structures. For example, the energy injection scale of k ∼ 0.7, which is observed from the power spectrum of the vertical velocity in the nonlocal case (Fig. 2), is correlated to the onset wavenumber for degrading the prediction accuracy toward the high wavenumber region. This is attributed to the nonlinear cascade of convective eddies. As presented in Table 2, the high performance of this network relies on the vertical velocity. This implies that the relationship between the horizontal and vertical flows is rather simple on large scales, whereas it is more complex on small scales, probably due to the nonlinearity.
The small global correlation coefficient for the MURaM model is related to two characteristics of the convection: the large energy injection scale and the small slope of the power spectra. The energy injection scale is higher than that of the nonlocal model, which diminishes the coherence spectra over a wide range of spatial scales. In combination with the higher power on small scales, the resultant global correlation coefficient becomes small for the MURaM model. A large global correlation coefficient was realized for the local model even though the energy injection scale was similar to that in the MURaM model. This was due to the fact that the power spectrum of the horizontal velocity in the local model exhibited a steep slope, which in turn induced large weights on large scales.
To improve the accuracy on smaller scales, the optimization of the small kernels should be performed more efficiently. A potential method of realizing this involves improving the loss function. In this study the network was trained by minimizing the mean squared error between the prediction and the simulation. Given that the horizontal velocity distribution exhibits high power on large scales, as shown in Fig. 2, larger weights are used on the largerscale structures for the loss function. This can promote the learning of the largescale structures and can inhibit the detection of the smallscale structures. Hence, an appropriate loss function can improve the optimization. An alternative method involves increasing the number of physical quantities to the input. In this study, the spatial distributions of the vertical velocity and temperature were used as the input data. It can be beneficial to use other physical quantities that are associated with the nonlinear process, such as the velocities at multiple heights. Furthermore, the spatial distributions with smallscale enhancement obtained via preprocessing can also aid in improving the performance.
6. Conclusions
We developed a novel convolutional neural network to estimate the spatial distribution of horizontal velocity by using the spatial distributions of temperature and vertical velocity. This new network was comprised of convolutional layers with various sizes of receptive fields. This led to the efficient detection of spatially spread features and concentrated features. Given that the velocity distribution driven by the convection exhibited multiscale structures and broad power spectra, this type of a multiscale network exhibited an advantage in detecting the solar surface convective motion. Our network exhibited a higher performance on almost all the spatial scales when compared to those reported in previous studies.
In most previous studies the evaluation of the method was performed solely via the global correlation coefficient between the simulated velocity field and the predicted velocity field. This in turn cannot show the accuracy on different spatial scales. To evaluate the accuracy on each scale, we introduced the coherence spectrum, which represents the correlation at each wavenumber. The evaluation with the coherence spectrum revealed the strong scaledependence of the accuracy. The horizontal velocities on large scales were accurately predicted with our network, while those on small scales were limited. Recently, the smallscale vortical motions in the solar photosphere has attracted significant attention observationally (Bonet et al. 2008; Vargas Domínguez et al. 2011) and theoretically (Shelyag et al. 2011; Moll et al. 2011). However, given the rapid decrease in the accuracy in estimating the horizontal velocities toward the small scales, we should be careful when discussing them observationally. The accuracy for the smallscale structures is crucial for calculating vorticities via a derivative of horizontal velocities.
By comparing the results of the three convection models we observed that the rapid decrease in coherence spectrum occurred on scales that were lower than the energy injection scales, which were characterized by the peaks of the power spectra of the vertical velocities. This implies that the network was not appropriately trained to reproduce the velocity fields on small scales generated by turbulent cascades. To improve the accuracy on small scales, it is potentially important to consider improving the loss function in the network. This can be realized by adding more weight to the smallscale structures and inputting other physical quantities, such as the vertical velocities at multiple heights, which can be related to the nonlinear process. These challenges can be explored in future studies.
In this study we developed a new model for estimating the horizontal velocity field with numerical simulation data and evaluated its performance with a new measure. By adopting our network to the highresolution observation data obtained via the SUNRISE3 balloonborne telescope (Katsukawa et al. 2020; Feller et al. 2020) and Daniel K. Inouye Solar Telescope (Rimmele et al. 2020; Rast et al. 2021), we can estimate the horizontal velocity distributions to a high accuracy over a wide range of wavenumbers.
The trained network and sample data set can be found at https://github.com/RTIshikawa/MultiScaleDL
Asensio Ramos et al. (2017) describe the convolution as “3dimensional”. The convolution was performed along spatial axes (x and yaxes) and also dealt with the channels. In this paper we use “2D” after the procedure name “Conv2D” in Keras. The 3D convolution in our network represents the convolution along the spatial and time axes as channels.
Acknowledgments
R. T. I. is supported by JSPS Research Fellowships for Young Scientists. This study is supported by the NINS program for crossdisciplinary study on Turbulence, Transport, and Heating Dynamics in Laboratory and Astrophysical Plasmas: “SoLaBoX” (Grant Numbers 01321802 and 01311904) and by JSPS KAKENHI Grant Numbers JP18H05234 (PI: Y. Katsukawa), JP19J20294 (PI: R. T. Ishikawa), and 20KK0072 (PI: S. Toriumi).
References
 Asensio Ramos, A., Requerey, I. S., & Vitas, N. 2017, A&A, 604, A11 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Berger, T. E., Löfdahl, M. G., Shine, R. S., & Title, A. M. 1998, ApJ, 495, 973 [NASA ADS] [CrossRef] [Google Scholar]
 Bonet, J. A., Márquez, I., Sánchez Almeida, J., et al. 2008, ApJ, 687, L131 [NASA ADS] [CrossRef] [Google Scholar]
 Cossette, J.F., & Rast, M. P. 2016, ApJ, 829, L17 [Google Scholar]
 De Pontieu, B., Carlsson, M., Rouppe van der Voort, L. H. M., & Rutten, R. J. 2012, ApJ, 752, L12 [NASA ADS] [CrossRef] [Google Scholar]
 Feller, A., Gandorfer, A., Iglesias, F. A., et al. 2020, SPIE, 11447, 1447AK [NASA ADS] [Google Scholar]
 Gošić, M., Bellot Rubio, L. R., Orozco Suárez, D., Katsukawa, Y., & del Toro Iniesta, J. C. 2014, ApJ, 797, 49 [Google Scholar]
 He, K., Zhang, X., Ren, S., & Sun, J. 2015a, ArXiv eprints [arXiv:1502.01852] [Google Scholar]
 He, K., Zhang, X., Ren, S., & Sun, J. 2015b, ArXiv eprints [arXiv:1512.03385] [Google Scholar]
 Hirzberger, J., Gizon, L., Solanki, S. K., & Duvall, T. L. 2008, Sol. Phys., 251, 417 [NASA ADS] [CrossRef] [Google Scholar]
 Hollweg, J. V., Jackson, S., & Galloway, D. 1982, Sol. Phys., 75, 35 [NASA ADS] [CrossRef] [Google Scholar]
 Hu, J., Shen, L., Albanie, S., Sun, G., & Wu, E. 2017, ArXiv eprnts [arXiv:1709.01507] [Google Scholar]
 Iijima, H., & Yokoyama, T. 2017, ApJ, 848, 38 [NASA ADS] [CrossRef] [Google Scholar]
 Ioffe, S., & Szegedy, C. 2015, ArXiv eprints [arXiv:1502.03167] [Google Scholar]
 Katsukawa, Y., & Orozco Suárez, D. 2012, ApJ, 758, 139 [Google Scholar]
 Katsukawa, Y., del Toro Iniesta, J. C., Solanki, S. K., et al. 2020, SPIE, 11447, 114470Y [NASA ADS] [Google Scholar]
 Kingma, D. P., & Ba, J. 2014, ArXiv eprints [arXiv:1412.6980] [Google Scholar]
 Louis, R. E., Ravindra, B., Georgoulis, M. K., & Küker, M. 2015, Sol. Phys., 290, 1135 [NASA ADS] [CrossRef] [Google Scholar]
 Malherbe, J.M., Roudier, T., Stein, R., & Frank, Z. 2018, Sol. Phys., 293, 4 [NASA ADS] [CrossRef] [Google Scholar]
 Masada, Y., & Sano, T. 2016, ApJ, 822, L22 [NASA ADS] [CrossRef] [Google Scholar]
 Matsumoto, T., & Kitai, R. 2010, ApJ, 716, L19 [NASA ADS] [CrossRef] [Google Scholar]
 Moll, R., Cameron, R. H., & Schüssler, M. 2011, A&A, 533, A126 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Nesis, A., Hammer, R., Schleicher, H., & Roth, M. 2012, A&A, 542, A85 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Nordlund, A., Stein, R. F., & Asplund, M. 2009, Liv. Rev. Sol. Phys., 6, 2 [Google Scholar]
 November, L. J., & Simon, G. W. 1988, ApJ, 333, 427 [Google Scholar]
 Oba, T., Iida, Y., & Shimizu, T. 2020, ApJ, 890, 141 [NASA ADS] [CrossRef] [Google Scholar]
 Parnell, C. E. 2002, MNRAS, 335, 389 [NASA ADS] [CrossRef] [Google Scholar]
 Rast, M. P., Nazaret Bello, G., Luis B. R., et al. 2021, Sol. Phys., 296, 70 [NASA ADS] [CrossRef] [Google Scholar]
 Rempel, M. 2018, ApJ, 859, 161 [Google Scholar]
 Requerey, I. S., Cobo, B. R., Gošić, M., & Bellot Rubio, L. R. 2018, A&A, 610, A84 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Riethmüller, T. L., Solanki, S. K., Berdyugina, S. V., et al. 2014, A&A, 568, A13 [Google Scholar]
 Rieutord, M., & Rincon, F. 2010, Liv. Rev. Sol. Phys., 7, 2 [Google Scholar]
 Rieutord, M., Roudier, T., Rincon, F., et al. 2010, ApJ, 512, A4 [Google Scholar]
 Rimmele, T. R., Warner, M., Keil, S. L., et al. 2020, Sol. Phys., 295, 172 [Google Scholar]
 Shelyag, S., Keys, P., Mathioudakis, M., et al. 2011, A&A, 526, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Shibata, K., Nakamura, T., Matsumoto, T., Otsuji, K., et al. 2007, Sci, 318, 1591 [NASA ADS] [CrossRef] [Google Scholar]
 Szegedy, C., Liu, W., Jia, Y., et al. 2014, ArXiv eprints [arXiv:1409.4842] [Google Scholar]
 Toriumi, S., Katsukawa, Y., & Cheung, M. C. M. 2017, ApJ, 836, 63 [Google Scholar]
 Tremblay, B., & Attie, R. 2020, Front. Astron. Space Sci., 7, 25 [NASA ADS] [CrossRef] [Google Scholar]
 Tremblay, B., Roudier, T., Rieutord, M., & Vincent, A. 2018, Sol. Phys., 293, 57 [NASA ADS] [CrossRef] [Google Scholar]
 Tremblay, B., Cossette, J.F., Kazachenko, M. D., Charbonneau, P., & Vincent, A. 2021a, J. Space Weather Space Clim., 11, 9 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Tremblay, B., Reardon, K., Attié, R., et al. 2021b, Cambridge Workshop on Cool Stars, Stellar Systems, and the Sun, 204 [Google Scholar]
 Utz, D., Hanslmeier, A., Muller, R., et al. 2010, A&A, 511, A39 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 van Ballegooijen, A. A., AsgariTarghi, M., Cranmer, S. R., & DeLuca, E. E. 2011, ApJ, 736, 3 [Google Scholar]
 Vargas Domínguez, S., Palacios, J., Balmaceda, L., et al. 2011, MNRAS, 416, 148 [Google Scholar]
 Verma, M., Steffen, M., & Denker, C. 2013, A&A, 555, A136 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Vögler, A., Shelyag, S., Schüssler, M., et al. 2005, A&A, 429, 335 [Google Scholar]
 Yokoi, N., Masada, Y., & Takiwaki, T. 2021, MNRAS, submitted, [arXiv:2111.08921] [Google Scholar]
Appendix A: Coherence spectrum analysis
We introduce a new measure called coherence spectrum for evaluating the network. In this section we describe the definition of the coherence spectrum and provide a proof of Equation (7). When we have the spatial distributions of physical quantities X and Y, we can define the 2D crossspectrum by using the Fourier transform as follows:
The power spectrum is written as
where Δk is the sampling interval of the wavenumber. Furthermore, the crossspectrum is equivalent to the Fourier transform of the correlation function between X and Y, which corresponds to the Wiener–Khinchin theorem (Nesis et al. 2012):
Then, we obtain the relationship between the global correlation coefficient R_{XY} and 2D crossspectrum:
Here we define the crossspectrum by calculating the ensemble average ⟨⟩ and integrating the 2D crossspectrum as follows:
The phase of the averaged 2D crossspectrum corresponds to the average phase difference between the two Fourier components. Given the contour integration, the imaginary part of the crossspectrum converges to zero because is always satisfied. This integration is justified when the statistical isotropy is assumed or else the resultant coherence spectrum can be misleading. Consequently, the coherence spectrum defined below is a real number:
Hence, we obtain
where we use equation and the condition of the statistical steady state .
All Tables
All Figures
Fig. 1. Structure of the network. BN and SE denote the batch normalization and squeezeandexcitation, respectively. The number of kernels in each convolutional layer is presented in Table 1. 

In the text 
Fig. 2. Representation of the training data sets. The spatial distributions of the vertical velocities (top panels) and Y component of the horizontal velocities (middle panels) of the nonlocal (left), local (center), and MURaM (right) simulation data. The bottom panels show the probability density functions (left), power spectra of the horizontal velocities (center), and power spectra of the vertical velocities (right) of the three simulations. The vertical and horizontal velocities in the simulations are normalized with zero and unit dispersion. This satisfies Eq. (3). 

In the text 
Fig. 3. Spatial distributions of the vertical velocity, Y component of the horizontal velocities during the prediction and simulation, and the difference between them in the nonlocal model (top), the local model (middle), and the MURaM model (bottom). The vertical velocity and horizontal velocity in the simulation are normalized as zero mean with unit dispersion. The vertical velocity and temperature are used as inputs even though only the vertical velocity is shown in this figure. 

In the text 
Fig. 4. Comparison between the simulated velocity field (red) and predicted velocity (blue) in terms of power spectra (left) and histograms (center). The histograms of normalized dot product (see Eq. (4)) are shown in the right column. 

In the text 
Fig. 5. Coherence spectra between the simulation and prediction as a function of the wavenumber with our network (solid) and DeepVel (dashed; Asensio Ramos et al. 2017). It should be noted that the vertical velocity distribution rather than the emergent intensity is used for the input to DeepVel. The coherence spectra show strong scaledependence. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.