Application of neural networks to synchro-Compton blazar emission models

Jets from supermassive black holes in the centers of active galaxies are the most powerful persistent sources of electromagnetic radiation in the Universe. To infer the physical conditions in the otherwise out-of-reach regions of extragalactic jets, we usually rely on fitting their spectral energy distribution (SED). Radiative models for the jet non-thermal emission usually rely on numerical solvers of coupled partial differential equations. In this work, we use machine learning to reduce the high computational cost of SED model evaluation, which is needed for SED fitting with Bayesian inference methods. We compute SEDs based on the synchrotron self-Compton model for blazar emission with the radiation code ATHEνA, and use them to train neural networks, exploring whether these can replace the original computationally expensive code. We find that a neural network with gated recurrent unit neurons can effectively replace the ATHEνA leptonic code for this application, and that it can be efficiently coupled with MCMC and nested sampling algorithms for fitting purposes. We demonstrate this through applications to simulated data sets and to observational data. We offer this tool to the community through a public repository. This proof-of-concept application of neural networks to blazar science is the first step toward future applications involving hadronic processes and even larger parameter spaces.


Introduction
Active galactic nuclei (AGNs) are known to launch relativistic and collimated plasma outflows, known as jets. While jetted AGNs comprise a small fraction of the AGN population (Padovani et al. 2017), they have attracted a great deal of attention in recent decades because they are the most powerful and persistent sources of non-thermal electromagnetic radiation (for recent reviews, see Blandford et al. 2019; Hovatta & Lindfors 2019). Blazars are AGNs whose jets are directed toward us. The combination of a small viewing angle with the relativistic motion of the emitting plasma results in strong Doppler beaming, making blazars ideal sources for studying non-thermal radiation processes in jets.
The spectral energy distributions (SEDs) of blazars are central to their study: they map the energy output of the jet across different photon energies, presenting a panoramic view of the radiative processes within the jet. Decoding the SED is pivotal for uncovering the physical conditions and radiative processes at play in the otherwise out-of-reach regions of extragalactic jets. A significant hurdle in this process, however, is the fitting of the SED itself, which relies on comparing the observed energy distributions with model predictions. Radiative models for jet emission are primarily numerical, as they require the solution of a system of stiff coupled partial differential equations (PDEs) describing the time evolution of the distribution functions of radiating particles and photons (e.g., Dimitrakoudis et al. 2012; Gao et al. 2017; Cerruti et al. 2015; Gasparyan et al. 2022). Such numerical codes tend to have a high computational cost, especially in the case of lepto-hadronic models, where each SED computation can last from a few minutes up to an hour (depending on the numerical schemes used by the PDE solver, but also on the approximations made for each operator in the PDE). If such numerical models were used in a Markov chain Monte Carlo (MCMC) algorithm to determine the posterior distributions of the model parameters, the overall execution time could become prohibitively long. Addressing this computational bottleneck is central to accelerating the pace of discovery in blazar science.
Although the idea behind neural networks (NNs) has been around for more than 80 years (McCulloch & Pitts 1943), their widespread use has become feasible thanks to hardware developments over the last decade. Inspired by recent trends in computational biology (Wang et al. 2019) and by similar applications in other disciplines (e.g., Wiecha & Muskens 2020; Alarfaj et al. 2022), this work investigates the possibility of replacing a numerical model of average computational complexity, namely, the synchrotron self-Compton model (SSC, e.g., Maraschi et al. 1992; Bloom & Marscher 1996; Mastichiadis & Kirk 1997), with a NN (see Abiodun et al. 2018). The first phase involves assembling a dataset derived from the numerical model. This dataset must span the parameter space comprehensively and uniformly; it should be large enough to enable rigorous training and analysis, and it should be efficiently constructed to conserve time and resources. With a robust dataset in hand, the exploration of various NN architectures takes center stage.
Machine learning (ML) blends scientific rigor with practical intuition; there is no single model that stands as the unequivocal solution. Each model is a unique combination of structural and algorithmic choices, tailored to specific aspects of complex problems. Consequently, we investigate a suite of NN configurations to identify those offering optimal performance and accuracy for the problem at hand.
This paper is structured as follows. In Sect. 2, we outline the methodology used to construct a model based on a NN, including the use of a numerical code to construct a dataset and the various NN tuning and evaluation tests. In Sect. 3, we present a few scenarios where the trained NN model is coupled with a Bayesian inference framework to fit simulated data and actual blazar observations. We also comment on the efficiency of our approach compared to numerical codes. Finally, we present the conclusions of this work and future prospects in Sect. 4.

Methods
Our approach (schematically shown in Fig. 1) comprises the following steps: (i) we built an ensemble of blazar SED models, (ii) we trained the NNs using the available datasets, and (iii) we evaluated our results on datasets that were not used in the training. In the following, we elaborate on these steps.

Generation of dataset
To generate blazar SEDs, we use the radiative transfer code ATHEνA (Mastichiadis & Kirk 1995; Dimitrakoudis et al. 2012), which solves a system of coupled PDEs describing the evolution of relativistic particle distributions and photons contained in a non-thermal emitting source. For the current project, we used the leptonic module of the code, which describes a magnetized source containing relativistic electron-positron pairs and photons. Electrons are continuously injected into the source at a fixed rate (or injection luminosity) and escape on an energy-independent timescale equal to the light-crossing time of the source. Photons are produced via electron synchrotron radiation and scattered to higher energies through inverse Compton scattering, while they also escape from the source on the light-crossing time. For a detailed description of the modeling of each physical process, we refer to Mastichiadis & Kirk (1995) and Dimitrakoudis et al. (2012). The SSC model parameters, which are used as input to ATHEνA, are as follows: (i) the characteristic size of the emitting region, R (i.e., its radius, assuming a spherical source in the jet rest frame); (ii) the magnetic field strength, B (measured in the jet rest frame); (iii) the electron injection compactness, ℓ_e, a dimensionless measure of the power injected into relativistic electrons (defined as ℓ_e = σ_T L_e/(4πR m_e c³)); (iv) the power-law slope, p, of the electron distribution at injection (i.e., dN_e/(dt dγ_e) ∝ γ_e^(−p)); and (v) the Lorentz factor range of radiating electrons, [γ_min, γ_max], also at injection.
For the generation of the dataset, we used parameter values motivated by SSC models of blazar emission (e.g., Celotti & Ghisellini 2008). For instance, we take the lower bound of the electron distribution to range between 10^0.1 and 10^4. To avoid very narrow electron distributions or cases where γ_max < γ_min (which would lead to code crashes), we set the upper cutoff of the distribution to lie between 10²γ_min and min(10^8, γ_H); here, γ_H = eBR/(m_e c²) is an upper limit imposed by the Hillas condition (Hillas 1984), which ensures the magnetic confinement of the most energetic electrons in the source. Each variable is randomly drawn from a uniform distribution in logarithmic space (except for the slope p), with details given in Table 1. Given the theoretically known dependence of the synchrotron and SSC fluxes and of the characteristic photon energies on model parameters such as R and B (see, e.g., Mastichiadis & Kirk 1997; Finke et al. 2008), the selected ranges are wide enough to produce a diverse dataset, crucial for the NN training (see the subsections below).
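The sampling scheme above can be sketched in a few lines of Python. The numerical bounds below are illustrative stand-ins for the actual values of Table 1 (which is not reproduced here), and the function name is our own; only the structure of the draw (log-uniform variables, linear slope p, and the γ_max constraint from the Hillas condition) follows the text.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_ssc_parameters():
    """Draw one SSC parameter set: log-uniform in all variables except
    the slope p.  The ranges are illustrative, not those of Table 1."""
    logR = rng.uniform(14.0, 17.0)      # log10 source radius [cm]
    logB = rng.uniform(-2.0, 1.0)       # log10 magnetic field [G]
    log_le = rng.uniform(-6.0, -2.0)    # log10 electron injection compactness
    p = rng.uniform(1.5, 3.5)           # injection power-law slope (linear draw)
    log_gmin = rng.uniform(0.1, 4.0)    # log10 gamma_min

    # Hillas limit gamma_H = e*B*R/(m_e c^2), Gaussian cgs units
    e, me, c = 4.803e-10, 9.109e-28, 2.998e10
    gamma_H = e * (10**logB) * (10**logR) / (me * c**2)

    # gamma_max drawn between 10^2 * gamma_min and min(10^8, gamma_H)
    log_gmax_hi = min(8.0, np.log10(gamma_H))
    log_gmax = rng.uniform(log_gmin + 2.0, log_gmax_hi)
    return logR, logB, log_le, p, log_gmin, log_gmax
```

Each draw can then be fed to the radiative code to produce one dataset entry.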
Another parameter that should be taken into account for the SED fitting is the Doppler factor, δ, of the emitting region, which is used to transform the photon spectra from the jet comoving frame to the observer's frame. However, this is a nuisance parameter for the dataset creation, since the Doppler boosting can be applied at a later stage to transform the comoving photon spectra (unprimed quantities) to the observer's frame, namely: ν_obs = δν and L_ν,obs = δ³L_ν (here, ν is the photon frequency and L_ν is the specific luminosity).
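This post-hoc transformation is a one-liner, which is why δ can be kept out of the training set. A minimal sketch (the function name is ours):

```python
import numpy as np

def to_observer_frame(nu, L_nu, delta):
    """Boost a comoving photon spectrum to the observer's frame:
    nu_obs = delta * nu,  L_nu_obs = delta^3 * L_nu."""
    return delta * np.asarray(nu), delta**3 * np.asarray(L_nu)
```

Because the transformation acts after the fact on any comoving SED, δ can be varied freely during fitting without re-evaluating the radiative model.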
To generate the dataset, we sampled the space of leptonic parameters and fed each set to ATHEνA. The system evolves for 5 R/c, which is sufficiently long for the establishment of a steady state (equilibrium solution of the PDEs). The parameters, along with the photon PDE solution, are concatenated to form each dataset entry. All SEDs were computed on the same 500 grid points of photon energies, logarithmically spaced between 10⁻¹⁵ and 10¹⁰ (in units of m_e c²). The SED fluxes returned by ATHEνA at each grid point (νF_ν in code units) were re-normalized using min-max scaling between 0 and 1 before being used for the training; the normalization of inputs is standard practice (Bishop 1995). In total, we created a dataset of 10 000 samples for the range of parameters listed in Table 1, within ∼500 CPU hours (i.e., about 3 min per model) on a desktop computer (AMD Ryzen 5950X). The generated dataset is split into 80% for training, 10% for validation, and 10% for testing. All samples are stored in a single file, one per line: the NN inputs are the 6 SSC model parameters, stored in the first 6 columns, while the output SED (i.e., the 500 grid points) is stored in the following columns.
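The scaling and splitting steps can be sketched as follows. This is our own minimal implementation, not the paper's code; keeping the per-SED scaling constants is an assumption we add so that predictions can be mapped back to physical flux units.

```python
import numpy as np

def minmax_scale(seds):
    """Scale each SED (one row per sample) to [0, 1], returning the
    scaling constants so predictions can be inverted later."""
    lo = seds.min(axis=1, keepdims=True)
    hi = seds.max(axis=1, keepdims=True)
    return (seds - lo) / (hi - lo), lo, hi

def split_dataset(X, Y, rng, f_train=0.8, f_val=0.1):
    """Shuffle and split into train/validation/test (default 80/10/10)."""
    idx = rng.permutation(len(X))
    n_tr, n_va = int(f_train * len(X)), int(f_val * len(X))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[tr], Y[tr]), (X[va], Y[va]), (X[te], Y[te])
```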

Training of the NNs
Once a dataset has been obtained, it is essential to select the appropriate type of artificial neuron and the corresponding network structure. The prediction of the SED can be perceived as a regression problem, making the simple artificial neuron an apparent choice. However, it is also possible to treat the bins of the SED as sequential data, in which case recurrent neurons become potential options for constructing the NN. We employed several NN topologies that varied with respect to the number of hidden layers, the number of neurons per hidden layer, and the type of neuron: an artificial neural network (ANN) with a deep stack of hidden layers, a gated recurrent unit (GRU, Cho et al. 2014), and a long short-term memory (LSTM, Hochreiter & Schmidhuber 1997). The ANN takes an input vector, calculates a dot product with a vector of weights, adds a bias, and passes the result through an activation function. Apart from ANN cells, there are also recurrent neural network (RNN) cells, such as LSTM and GRU, that maintain a state (memory) of past inputs, making them well suited for tasks such as time series prediction. A noteworthy point is that both LSTM and GRU do not suffer from the unstable gradient problem, which is crucial for successful training; a thorough study of this problem was presented by Glorot & Bengio (2010).
When handling time series-like inputs, recurrent cells are used to predict the next value, and each cell in a layer feeds its output to its neighbor, thus forming a chain. A characteristic of a time-series input is that each value is bound to a range that depends on the values of its neighbors, so the series cannot have a large derivative at any point. The same holds for SEDs, so we can handle the values of the SED as a time series-like input, with time replaced by photon frequency. Using a network with GRU or LSTM cells thus helps ensure that the output (in our case, the SED) maintains a relatively smooth shape.
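To make the recurrence concrete, here is a minimal numpy sketch of a single GRU update following the formulation of Cho et al. (2014); the function name and the weight-matrix naming convention are ours. The gates z (update) and r (reset) control how much of the previous state h is kept, which is what gives the chained output its smoothness.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU update (Cho et al. 2014):
    z: update gate, r: reset gate, h_tilde: candidate state."""
    z = sigmoid(Wz @ x + Uz @ h + bz)
    r = sigmoid(Wr @ x + Ur @ h + br)
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)
    # New state interpolates between old state and candidate
    return (1.0 - z) * h + z * h_tilde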
Several ML frameworks are currently available for practical applications in various programming languages. Python is a very popular choice in the astrophysics community and, as a result, many scientific software packages are available. The major ML packages in Python are: Tensorflow, created by Google Brain (Abadi et al. 2015); PyTorch, developed by Facebook's AI Research lab (Paszke et al. 2019); scikit-learn, started by David Cournapeau (Pedregosa et al. 2011); and Theano, developed by the Montreal Institute for Learning Algorithms (The Theano Development Team et al. 2016). Tensorflow and Theano can be used through Keras (Chollet et al. 2015), an open-source application programming interface (API) that provides a Python interface for NNs. Keras was developed with a focus on fast experimentation and prototyping, being both user-friendly and modular. In this work, we opted for Tensorflow.
We trained the three NNs using the 10k dataset. The hyperparameters and complexity of each NN are presented in Table 2. An additional necessary parameter is the number of epochs, that is, the number of training cycles. The appropriate number of epochs must be carefully selected for each network to avoid overfitting. To determine it, an initial training phase is conducted with a substantially large number of epochs (set at 3000), since experience has shown that overfitting, if it occurs, typically starts after 200 epochs. Finally, one can choose a loss function other than the typical mean squared error (MSE); since our networks handle normalized data, we decided to use the mean squared logarithmic error (MSLE) instead. The training of these networks is monitored in real time with Tensorboard. This enables us to check whether a network is overfitting or whether it has achieved a satisfactory loss, which is on the order of 10⁻⁴. Whichever of these conditions is met first dictates the stopping point. The epoch at which training stopped is noted, and the network is then retrained for this specific number of epochs.
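For reference, the MSLE loss used above has the standard definition (as in, e.g., the Keras implementation); here is a numpy sketch with our own function name:

```python
import numpy as np

def msle(y_true, y_pred):
    """Mean squared logarithmic error:
    mean((log(1 + y_true) - log(1 + y_pred))^2).
    For targets scaled to [0, 1], this penalizes relative rather than
    purely absolute deviations, weighting faint SED bins more evenly."""
    return np.mean((np.log1p(y_true) - np.log1p(y_pred))**2)
```

This behavior is one plausible motivation for preferring MSLE over MSE on normalized SEDs, where the dynamic range across bins is still large.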

Evaluation of results
After training our three NNs, we proceeded to evaluate the models by comparing the NN predictions to the SEDs from the test sample. We first used standard metrics, namely, the R² score (Draper & Smith 1998), the mean squared error (MSE), the mean absolute error (MAE), the Kolmogorov-Smirnov (KS, Hodges Jr 1958) criterion, and dynamic time warping (DTW, Senin 2008), to compare the performance of the trained NNs. The MSE and MAE metrics are defined as (1/n)Σ_i(y_i − ŷ_i)² and (1/n)Σ_i|y_i − ŷ_i|, respectively, where y_i is the predicted value, ŷ_i is the model value, and n is the number of grid points; DTW is an algorithm of complexity O(n²) primarily used for measuring the similarity between two temporal sequences that may vary in time or speed, by allowing for a stretching or compressing of the sequences. In terms of R², the GRU and LSTM NNs have significantly better values than the ANN. The ranking results (listed in Table 3) show that the GRU is ranked first by the three metrics, while the ANN is ranked third by all criteria. We also visually inspected the test dataset, comparing the predictions of the NNs against the results of ATHEνA (see Fig. 2). From this qualitative test, it became clear that the ANN failed to describe the SED, suggesting that more complexity has to be added to the network. Meanwhile, both GRU and LSTM delivered acceptable solutions (based on the eyeball test). The main difference was that the GRU offered smoother solutions for the SED, while the LSTM predictions had noticeably small-amplitude scatter across neighboring values. In combination with the evaluation results listed in Table 3, we selected the GRU NN for the applications that follow. We note that the fact that, in our case, the GRU model generates smoother SEDs compared to the LSTM is not a feature of the cell architecture; it applies only to the particular trained model. If the training is repeated with the same number of epochs, the resulting model will be different due to the stochastic nature of the training algorithm (Kingma & Ba 2014). The evaluation and comparison of the models should therefore be repeated whenever a new network is trained.
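The per-point metrics are one-liners, while DTW is the quadratic dynamic program described in the text. A minimal sketch (our own implementation with an absolute-difference cost; libraries may use other cost functions):

```python
import numpy as np

def mse(y, yhat):
    return np.mean((np.asarray(y) - np.asarray(yhat))**2)

def mae(y, yhat):
    return np.mean(np.abs(np.asarray(y) - np.asarray(yhat)))

def dtw_distance(a, b):
    """Dynamic time warping distance via the O(n*m) dynamic program:
    D[i, j] = cost(i, j) + min(D[i-1, j], D[i, j-1], D[i-1, j-1])."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Unlike MSE or MAE, DTW tolerates small horizontal shifts of spectral features, which is why it complements the pointwise metrics when comparing SED shapes.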
In Fig. 3, we present random examples drawn from the test dataset that highlight the wide variety of SEDs (in terms of flux and shape) reproduced by the GRU NN, including spectra that are not typical of blazars. For instance, there are cases without a clear two-hump morphology, due to the smooth superposition of the synchrotron and SSC components (see, e.g., panels a and b) or due to a very extended synchrotron spectrum (panel q), which are more reminiscent of the spectra of pulsar wind nebulae (see, e.g., Amato & Olmi 2021, for the Crab nebula). Moreover, the sample contains spectra produced in the so-called inverse Compton catastrophe limit (Readhead 1994; Petropoulou et al. 2015b); these SEDs consist of three distinct components attributed to synchrotron, first-order, and second-order SSC emission, with the latter carrying most of the bolometric power of the source (see panels l and o). In most cases, the trained NN accurately describes the numerical model (e.g., panels a, h, and i). There are also cases where the NN prediction misses the details of the model but still captures the general trend (e.g., panels t, o, and p). As we show in the following section, such differences are not crucial for the interpretation of observed spectra in the context of fitting with Bayesian methods.

Simulated SEDs
A fundamental question is whether a trained NN can be used to recover the physical parameters corresponding to an observed SED, provided the "correct" answer is already known. The only way to test this is to use simulated SEDs based on an SSC model with known parameters, which are then fitted with the NN; the fitted values can then be compared to the true values of the SSC model to determine whether the NN can constrain the parameters with sufficient accuracy.
To create a simulated SED, we selected 25 points of an SSC model computed with ATHEνA over a wide frequency range and added Gaussian noise with a 0.1 standard deviation to the logarithmic flux values. We then attributed to each flux point an error between 0.1 and 0.4 in logarithm to mimic statistical uncertainties in the measurement. As blazars are observed in specific energy bands, we selected points that broadly correspond to the radio, UV, X-ray, and γ-ray bands. By creating an incomplete SED with gaps in regions where the solution may have large derivatives (e.g., spectral cutoffs), we increased the difficulty of the NN fitting. We also note that the simulated flux points may overshoot the baseline ATHEνA model in certain frequency bands due to the randomness of the simulation and the small number of selected points. This mimics observational effects related to non-simultaneous observations and intrinsic source variability. We used the trained GRU NN to fit the data in Python with emcee (Foreman-Mackey et al. 2013). For constructing the log-likelihood function and fitting the data, we followed a methodology commonly adopted in similar problems (e.g., Karaferias et al. 2023; Stathopoulos et al. 2024). We also added a term, ln f, as a free parameter to account for the systematic scatter and noise of the simulated data and the NN model. By including ln f, we obtain an excess variance compared to the statistical uncertainties, namely, σ²_tot,i = σ²_i + e^(2 ln f), where σ_i is the error of the individual flux points.
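The likelihood with the ln f term follows the standard pattern used with emcee; a minimal sketch, where `model` stands in for the trained NN predictor and the function name is ours:

```python
import numpy as np

def log_likelihood(theta, x, y, yerr, model):
    """Gaussian log-likelihood with an excess-variance term:
    the last entry of theta is ln f, and
    sigma_tot_i^2 = sigma_i^2 + exp(2 ln f)."""
    *params, log_f = theta
    mu = model(params, x)
    s2 = yerr**2 + np.exp(2.0 * log_f)
    return -0.5 * np.sum((y - mu)**2 / s2 + np.log(2.0 * np.pi * s2))
```

This callable (plus a log-prior) is all emcee's `EnsembleSampler` needs; the ln f term inflates the error bars when the model systematically misses the data.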
Here, we present results for two characteristic cases. The simulated SEDs (markers), together with the corresponding SED from the ATHEνA code (dashed line) and the fitted models (shaded regions), are shown in Fig. 4. The corner plots of the posterior distributions are presented in Figs. A.1 and A.2. The NN model successfully fits the simulated data and captures the overall shape of the true model. In terms of parameter estimation, all parameters were consistent within 2σ with the original values used for simulating the data sets (see Table 4).

Observed SEDs
We demonstrate the capabilities of the trained NN model to fit a real blazar SED with emcee and UltraNest (Buchner 2021), which employs nested sampling to scan multi-modal posterior distributions.The latter allows us to better explore possible degeneracies in multi-parameter problems.
For this application, we chose 3HSP J095507.9+355101 (Giommi et al. 2020; Paliya et al. 2020), a BL Lac object at redshift z = 0.557 (Paiano et al. 2020; Paliya et al. 2020) that belongs to the rare class of extreme blazars (Biteau et al. 2020). This blazar has been detected in GeV γ-rays by the Fermi Large Area Telescope (LAT) and is part of the 4FGL catalog (Abdollahi et al. 2020). 3HSP J095507.9+355101 is an ideal test bed because it combines measurements with small errors at lower energies with several upper limits in the γ-rays. Another reason for our source selection is that this target was also recently modeled with an emcee adaptation of the leptonic module of LeHaMoC, a Python-based lepto-hadronic code that offers a factor of ∼30-100 speed improvement compared to ATHEνA while producing consistent results (Stathopoulos et al. 2024). We can therefore compare the efficiency and accuracy of SED fitting performed with a fast leptonic code and with a trained NN.
We used the same dataset as Stathopoulos et al. (2024). The quasi-simultaneous observations in the UV, soft X-rays, and hard X-rays provide a detailed picture of the low-energy part of the spectrum. On the contrary, the high-energy part of the spectrum is less constrained observationally. For fitting the data and constructing the log-likelihood function, we followed Stathopoulos et al. (2024), taking into account asymmetric errors and upper limits in the Fermi data. For our application, we defined uniform priors covering the full parameter space used to build the datasets, plus a Doppler factor (in logarithm) in the range [0, 3]. We ran emcee with 48 walkers and 10 000 steps each. We started with a guess value that resembles the general spectral shape and discarded the first 1000 steps of each chain as burn-in. We also ran UltraNest with 400 live points, a maximum number of improvement loops of 3, and a target evidence uncertainty of 0.5. The results of the fits are presented in Fig. 5. We find no noticeable differences between the emcee and UltraNest results, demonstrating that the SSC model has no multimodal posterior distributions (presented in Figs. A.3 and A.4). When compared to the fitting results obtained with a leptonic code, the posterior distributions of parameters obtained by the GRU NN and by LeHaMoC (Stathopoulos et al. 2024) are identical in terms of median values, standard deviation, and overall distribution shape (as shown in the corner plot of Fig. A.3). Moreover, the physical processes that were neglected in the NN training (i.e., synchrotron self-absorption and γγ pair production) turn out to be negligible for this application, since both processes were included in the fitting with LeHaMoC.
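Nested samplers such as UltraNest sample the unit hypercube, so the uniform priors above enter through a prior-transform function. A minimal sketch; the bounds array is illustrative (stand-ins for Table 1 plus log10 δ in [0, 3]), not the paper's exact values:

```python
import numpy as np

# Illustrative uniform-prior bounds: stand-ins for Table 1, plus log10(delta)
PARAM_BOUNDS = np.array([
    [14.0, 17.0],   # log10 R [cm]
    [-2.0, 1.0],    # log10 B [G]
    [-6.0, -2.0],   # log10 electron compactness
    [1.5, 3.5],     # slope p
    [0.1, 4.0],     # log10 gamma_min
    [2.1, 8.0],     # log10 gamma_max
    [0.0, 3.0],     # log10 Doppler factor
])

def prior_transform(cube):
    """Map the unit hypercube sampled by a nested sampler onto
    uniform priors over PARAM_BOUNDS."""
    lo, hi = PARAM_BOUNDS[:, 0], PARAM_BOUNDS[:, 1]
    return lo + (hi - lo) * np.asarray(cube)
```

This transform, together with a log-likelihood, is what UltraNest's `ReactiveNestedSampler` takes as input.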

Efficiency of NNs
Here, we briefly comment on the efficiency of the NN-generated model in comparison to the other numerical codes used in our tests.
The ATHEνA code, first presented in the mid-1990s (Mastichiadis & Kirk 1995), was not designed for fitting large samples of data. As a result, the numerical scheme adopted in ATHEνA is optimized not for speed but for accuracy (see also Cerruti et al. 2022, for a comparison of ATHEνA against other proprietary codes). The typical execution time of ATHEνA for an SSC model is about 1-4 min. Moreover, the code may fail for certain input parameters (when these lead to very large derivatives in the PDEs), which cannot be determined a priori. For instance, about 4% of the cases failed when creating the dataset for the NN training. LeHaMoC, on the other hand, is more flexible for the generation of large datasets and for use in fitting algorithms (Stathopoulos et al. 2024): it delivers a factor of ∼30-100 improvement in speed compared to ATHEνA and does not crash regardless of the input parameters. When combined with the emcee parallelization option, LeHaMoC can fit the SED of 3HSP J095507.9+355101 within one day (using 16 CPU cores/32 threads). In contrast, the fitting of 3HSP J095507.9+355101 with the GRU NN model was completed in approximately 20 min on a single core, which is about 1000 times faster compared to LeHaMoC.
Therefore, implementing a NN such as the GRU offers a huge improvement in computational time, as it performs an MCMC fitting with emcee in a matter of minutes. When combined with nested sampling algorithms such as UltraNest, the sampling efficiency can drop significantly, as is common for high-dimensional problems. For our application to 3HSP J095507.9+355101, about 20 million GRU NN models were evaluated to search the full parameter space (see Table 1) within 21 h. In Table 5, we list execution times for the various cases discussed in this work. All computational tasks were performed with the same hardware (AMD Ryzen 5950X). Quoted times were scaled to a single CPU core when parallel computing was used. Times may vary depending on the hardware used; the values listed in Table 5 are therefore meant to be comparative.

Conclusions
We have presented a proof-of-concept study of blazar SED modeling that relies on the use of neural networks (NNs) and Bayesian inference. We have demonstrated how computationally expensive numerical models can be substituted with trained NNs. By testing three typical configurations of neurons, we conclude that a GRU NN offers optimal results for the astrophysical problem at hand. We have tested the efficiency of our approach against simulated datasets and observational data. Our results demonstrate the big leaps in computing efficiency that can be achieved compared to state-of-the-art radiative numerical codes. This in turn makes the fitting process about 1000 times faster, enabling the use not only of typical MCMC methods but also of nested sampling algorithms such as those of UltraNest. These findings provide additional motivation for developing and testing different NNs or neurons and for experimenting with more complex configurations. A natural next step would be to train NNs against more complex radiative models that include lepto-hadronic processes. A typical example is the case of 3HSP J095507.9+355101, where lepto-hadronic models computed with ATHEνA have only been tested by eye against the data (see Petropoulou et al. 2015a, 2020, for such applications). Given the large number of parameters (at least 11) and the execution time (tens of minutes, even with faster codes such as LeHaMoC), a statistical fitting with Bayesian methods is prohibitive. In lepto-hadronic models, the number of PDEs that need to be solved increases from two to at least five. Moreover, the PDE describing the evolution of each particle species in the source is non-linearly coupled to the PDEs of the other species. Due to this non-linear coupling, a small change in the model parameters may lead to drastic changes in the resulting SED. The complexity of lepto-hadronic models is further increased by the fact that not all equilibrium solutions of the problem are constant in time. There are regions of the high-dimensional parameter space, as extensively discussed by Mastichiadis et al. (2020), that produce oscillatory equilibrium solutions (Petropoulou & Mastichiadis 2012, 2018; Mastichiadis et al. 2005), known as limit cycles in non-linear dynamics (Strogatz 2000). The methodology presented here can also be extended to other astrophysical problems, such as the spectral modeling of X-ray pulsars (for examples of available numerical models, see Wolff et al. 2016; West et al. 2017; Becker & Wolff 2022) or gamma-ray bursts (GRBs, e.g., Rudolph et al. 2023).
While we were finalizing this work, the paper by Bégué et al. (2023) appeared. Although the main concept and the radiation model are the same, the two studies are independent and complementary, focusing on different NN architectures, evaluations, and modeling examples.

Fig. 2 .
Fig. 2. Visual comparison of the GRU (magenta points) and LSTM (maroon points) NNs on top of the ATHEνA SED. Both NN-generated models are able to capture the overall trends; however, the GRU offers a better description of the ATHEνA SED, with less scatter.

Fig. 3 .
Fig. 3. Examples of SEDs from the test dataset (i.e., not used for the training). Predictions of the trained GRU NN (colored markers) are overplotted on the numerical models created with ATHEνA (solid lines).

Fig. 4 .
Fig. 4. Simulated blazar SEDs fitted with the GRU NN-generated model. Simulated data are shown with markers. The dashed magenta line shows the SED from the ATHEνA code. Dark- and light-shaded areas denote the range of GRU NN models with parameters from the 50% and 90% posterior distributions, respectively.

Fig. A. 3 .
Fig. A.3. Corner plot showing the posterior distributions of the GRU NN model parameters for 3HSP J095507.9+355101, as derived from emcee. Dashed lines in the histograms indicate the median and the 68% range of values. The corner plot of the SSC model fitted by LeHaMoC, as presented by Stathopoulos et al. (2024), is overplotted in magenta.

Table 1 .
Input parameter ranges used for the generation of the training and validation datasets with the ATHEνA code. Notes. Here, U() stands for a uniform distribution.

Table 2 .
Hyper-parameters for the NNs.

Table 3 .
Evaluation of results.
Notes. (*) Median metric value for each NN based on the test sample. (**) Percentage of times each NN ranked 1st/2nd/3rd in the comparison between them over the test sample.

Table 4 .
Values from GRU NN fitting to simulated SEDs with emcee.

Table 5 .
Typical CPU times for SSC computations.