A&A, Volume 683, March 2024
Article Number: A185
Number of pages: 12
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/202348566
Published online: 19 March 2024
Application of neural networks to synchro-Compton blazar emission models^{★}
^{1} Department of Physics, National and Kapodistrian University of Athens, University Campus Zografos, 15784 Athens, Greece
e-mail: tzavellas.anastasios@gmail.com; gevas@phys.uoa.gr
^{2} Institute of Accelerating Systems & Applications, University Campus Zografos, Athens, Greece
Received: 10 November 2023
Accepted: 21 December 2023
Context. Jets from supermassive black holes at the centers of active galaxies are the most powerful and persistent sources of electromagnetic radiation in the Universe. To infer the physical conditions in the otherwise out-of-reach regions of extragalactic jets, we usually rely on fitting their spectral energy distributions (SEDs). The calculation of radiative models for the jet’s nonthermal emission usually relies on numerical solvers of coupled partial differential equations.
Aims. In this work, we use machine learning to tackle the problem of high computational complexity to significantly reduce the SED model evaluation time, which is necessary for SED fittings carried out with Bayesian inference methods.
Methods. We computed the SEDs based on the synchrotron self-Compton model for blazar emission using the radiation code ATHEvA. We used them to train neural networks (NNs) to explore whether they can replace the original code, which is computationally expensive.
Results. We find that a NN with gated recurrent unit neurons (GRUN) can effectively replace the ATHEvA leptonic code for this application, while it can be efficiently coupled with Markov chain Monte Carlo (MCMC) and nested sampling algorithms for fitting purposes. We demonstrate this approach through an application to simulated data sets, as well as a subsequent application to observational data.
Conclusions. We present a proof-of-concept application of NNs to blazar science as the first step in a list of future applications involving hadronic processes and even larger parameter spaces. We offer this tool to the community through a public repository.
Key words: radiation mechanisms: non-thermal / radiative transfer / methods: numerical / methods: statistical
The results of our work are available on GitHub: https://github.com/tzavellas/blazar_ml. This includes: (a) the NNs and the accompanying code used to train them, (b) code for the visualization of results in Python and Jupyter notebooks with instructions, and (c) part of the ATHEvA datasets that can be used for evaluation and plotting examples.
© The Authors 2024
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.
1 Introduction
Active galactic nuclei (AGNs) are known to launch relativistic and collimated plasma outflows, known as jets. While jetted AGNs comprise a small fraction of the AGN population (Padovani et al. 2017), they have attracted a great deal of attention in recent decades because they are the most powerful and persistent sources of nonthermal electromagnetic radiation (for recent reviews, see Blandford et al. 2019; Hovatta & Lindfors 2019). Blazars are AGNs with jets that are directed toward us. The combination of the small viewing angle with the relativistic motion of the emitting plasma results in strong Doppler beaming, making blazars ideal sources for studying nonthermal radiation processes in jets.
The spectral energy distributions (SEDs) of blazars are central to their study; they can be thought of as a map of the energy output of the jet across different photon energies, presenting a panoramic view of the radiative processes within the jet. Decoding the SED is pivotal in uncovering the prevailing physical conditions and radiative processes at play in the otherwise out-of-reach regions of extragalactic jets. However, a significant hurdle in this process is the fitting of the SED itself, which relies on a juxtaposition of the observed energy distributions with model predictions. Radiative models for jet emission are primarily numerical, as they require the solution of a system of stiff coupled partial differential equations (PDEs) describing the evolution with time of the distribution functions of radiating particles and photons (e.g., Dimitrakoudis et al. 2012; Gao et al. 2017; Cerruti et al. 2015; Gasparyan et al. 2022). Such numerical codes tend to have a high computational complexity, especially in the case of leptohadronic models^{1}, where each SED computation could last from a few minutes up to an hour (depending on the numerical schemes used by the PDE solver, but also on the approximations made for each operator in the PDE). If such numerical models were to be used in a Markov chain Monte Carlo (MCMC) algorithm to determine the posterior distributions of the model parameters, the overall execution time could become prohibitively long. Addressing this computational bottleneck is central to accelerating the pace of discovery in blazar science.
Although the idea behind the application of neural networks (NNs) has been around for more than 50 years (McCulloch & Pitts 1943), their extended use has become more feasible thanks to hardware developments over the last decade. Inspired by recent trends in computational biology (Wang et al. 2019) and similar applications in other disciplines (e.g., Wiecha & Muskens 2020; Alarfaj et al. 2022), this work investigates the possibility of replacing a numerical model of average computational complexity, namely, the synchrotron self-Compton^{2} model (SSC, e.g., Maraschi et al. 1992; Bloom & Marscher 1996; Mastichiadis & Kirk 1997), with a NN (see Abiodun et al. 2018). In pursuit of this objective, the initial phase involves the meticulous assembly of a dataset derived from the numerical model. While the intricate processes of dataset generation are set aside for brevity, it is imperative that this dataset is comprehensive and uniformly spans the parameter space. Furthermore, this compilation should be ample enough in size to enable rigorous training and analysis, and it should also be efficiently constructed to conserve time and resources. With a robust dataset in hand, the exploration of various NN architectures takes center stage.
The realm of machine learning (ML) is characterized by its blend of scientific rigor and artistic intuition; there exists no monolithic model that stands as the unequivocal solution. Each model is a unique amalgamation of structural nuances and algorithmic intricacies, tailored to address specific aspects of complex problems. Consequently, we will investigate a suite of NN configurations to identify those offering optimal performance and accuracy in the context of the defined problem space.
This paper is structured as follows. In Sect. 2, we outline the methodology used to construct a model based on a NN, which includes the use of a numerical code to construct a dataset and the various NN tuning and evaluation tests. In Sect. 3, we present a few scenarios where the trained NN model is coupled with Bayesian inference to fit simulated data and actual blazar observations. We also provide notes on the efficiency of our approach compared to numerical codes. Finally, we present the conclusions of this work and some future prospects in Sect. 4.
Fig. 1 Implementation flow chart describing our methodology. 
2 Methods
Our approach (shown schematically in Fig. 1) comprises the following steps: (i) we built an ensemble of blazar SED models, (ii) we trained the NNs using the available datasets, and (iii) we evaluated our results on datasets that were not used in the training. In the following, we elaborate on these steps.
Table 1. Input parameter ranges used for the generation of training and validation datasets with the ATHEvA code.
2.1 Generation of dataset
To generate blazar SEDs, we use the radiative transfer code ATHEvA (Mastichiadis & Kirk 1995; Dimitrakoudis et al. 2012) that solves a system of coupled PDEs describing the evolution of relativistic particle distributions and photons contained in a nonthermal emitting source. For the current project, we used the leptonic module of the code that describes a magnetized source containing relativistic electron-positron pairs and photons. Electrons are continuously injected into the source at a fixed rate (or injection luminosity) and escape on an energy-independent timescale that is equal to the light-crossing time of the source. Photons are produced via electron synchrotron radiation and scattered to higher energies through inverse Compton scattering, while they escape on the light-crossing time from the source^{3}. For a detailed description of the modeling of each physical process, we refer to Mastichiadis & Kirk (1995) and Dimitrakoudis et al. (2012).
The SSC model parameters, which are used as input to ATHEvA, are as follows: (i) the characteristic size of the emitting region, R (i.e., its radius, assuming a spherical source in the jet rest frame); (ii) the magnetic field strength, B (measured in the jet rest frame); (iii) the electron injection compactness, ℓ_{e}, which is a dimensionless measure of the power injected into relativistic electrons (defined as ℓ_{e} = σ_{T} L_{e}/(4πRm_{e}c^{3})); (iv) the power-law slope, p, of the electron distribution at injection (i.e., an injection spectrum ∝ γ^{−p}); and (v) the Lorentz factor range of radiating electrons, [γ_{min}, γ_{max}], also at injection.
For the generation of the dataset, we used parameter values motivated by SSC models of blazar emission (e.g., Celotti & Ghisellini 2008). For instance, we take the lower bound of the electron distribution to range between 10^{0.1} and 10^{4}. To avoid very narrow electron distributions or cases where γ_{max} < γ_{min} (which would lead to code crashing), we set the upper cutoff of the distribution to lie between 10^{2}γ_{min} and min(10^{8}, γ_{H}); here, γ_{H} = eBR/(m_{e}c^{2}) is an upper limit imposed by the Hillas condition (Hillas 1984), which ensures the magnetic confinement of the most energetic electrons in the source. Each variable is randomly drawn from a uniform distribution in logarithmic space (except for the slope p), with details given in Table 1. Given the theoretically known dependence of the synchrotron and SSC fluxes and the characteristic photon energies on model parameters, such as R and B (see, e.g., Mastichiadis & Kirk 1997; Finke et al. 2008), the selected ranges are wide enough to produce a diverse dataset, crucial for the NN training (see the subsections below).
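The sampling strategy above can be sketched as follows. This is a minimal illustration, not the authors' generator: the numerical ranges are placeholders standing in for the actual Table 1 values, and the constants are in CGS units.

```python
import numpy as np

# CGS constants
E_CHARGE = 4.803e-10   # electron charge [esu]
M_E = 9.109e-28        # electron mass [g]
C = 2.998e10           # speed of light [cm/s]

rng = np.random.default_rng(42)

def draw_parameters():
    """Draw one SSC input set: log-uniform in each range except the slope p.
    Ranges are illustrative placeholders, not the exact Table 1 values."""
    while True:
        log_R = rng.uniform(14.0, 17.0)    # source radius [cm]
        log_B = rng.uniform(-2.0, 1.0)     # magnetic field [G]
        log_le = rng.uniform(-5.0, -1.0)   # electron compactness
        p = rng.uniform(1.5, 3.0)          # injection slope (linear space)
        log_gmin = rng.uniform(0.1, 4.0)   # lower Lorentz-factor cutoff

        # Hillas limit: gamma_H = e * B * R / (m_e c^2)
        gamma_H = E_CHARGE * 10**log_B * 10**log_R / (M_E * C**2)
        log_gmax_hi = min(8.0, np.log10(gamma_H))
        if log_gmin + 2.0 < log_gmax_hi:   # redraw if the allowed window is empty
            # gamma_max between 10^2 * gamma_min and min(10^8, gamma_H)
            log_gmax = rng.uniform(log_gmin + 2.0, log_gmax_hi)
            return log_R, log_B, log_le, p, log_gmin, log_gmax

sample = draw_parameters()
```

The rejection loop enforces both the γ_max > 10² γ_min constraint and the Hillas bound before a parameter set is accepted.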
Another parameter that should be taken into account for the SED fitting is the Doppler factor, δ, of the emitting region, which is used to transform the photon spectra from the jet comoving frame to the observer’s frame. However, this is a nuisance parameter for the dataset creation, since the Doppler boosting can be applied at a later stage to transform the comoving photon spectra (unprimed quantities) to the observer’s frame, namely: v_{obs} = δv and L_{v,obs} = δ^{3}L_{v} (here, v is the photon frequency and L_{v} is the specific luminosity).
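The frame transformation above is a one-liner; the sketch below is a hypothetical helper (not part of ATHEvA) applying v_obs = δv and L_{v,obs} = δ³L_{v} to a toy spectrum.

```python
import numpy as np

def doppler_boost(nu, L_nu, delta):
    """Transform a comoving-frame spectrum to the observer's frame:
    nu_obs = delta * nu,  L_nu_obs = delta**3 * L_nu."""
    nu = np.asarray(nu, dtype=float)
    L_nu = np.asarray(L_nu, dtype=float)
    return delta * nu, delta**3 * L_nu

# Toy flat comoving spectrum on a logarithmic frequency grid
nu = np.logspace(10, 25, 500)        # comoving photon frequencies [Hz]
L_nu = np.ones_like(nu)              # flat stand-in spectrum
nu_obs, L_obs = doppler_boost(nu, L_nu, delta=10.0)
```

Because the boost is applied after the fact, a single ATHEvA run can serve any value of δ during the fit.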
To generate the dataset, we sampled the space of leptonic parameters and fed each set to ATHEvA. The system evolves for 5 R/c, which is sufficiently long for the establishment of a steady state (equilibrium solution of the PDEs). The parameters along with the photon PDE solution are concatenated to form each dataset entry. All SEDs were computed at the same 500 grid points of photon energies, logarithmically spaced between 10^{−15} and 10^{10} (in units of m_{e}c^{2}). The SED fluxes returned by ATHEvA at each grid point (vF_{v} in code units) were renormalized using min–max scaling between the values of 0 and 1 before being used for the training. The normalization of inputs is standard practice (Bishop 1995). In total, we created a dataset of 10 000 samples for the range of parameters listed in Table 1, within ~500 CPU hours (i.e., about 3 min per model) on a desktop computer (AMD Ryzen 5950X). The generated dataset is split into 80% for training, 10% for validation, and 10% for testing. All samples are stored in a single file with one sample per line: the six SSC model parameters (the NN inputs) occupy the first six columns, while the output SED (i.e., 500 grid points) is stored in the following columns.
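The file layout and preprocessing described above can be sketched as follows, with random numbers standing in for the actual ATHEvA outputs:

```python
import numpy as np

N_PARAMS, N_GRID = 6, 500
rng = np.random.default_rng(0)

# Stand-in for the stored file: one sample per row,
# 6 input parameters followed by 500 SED grid values.
data = rng.random((10_000, N_PARAMS + N_GRID))
X, Y = data[:, :N_PARAMS], data[:, N_PARAMS:]

# Min-max scale each SED to [0, 1] before training
y_min = Y.min(axis=1, keepdims=True)
y_max = Y.max(axis=1, keepdims=True)
Y_scaled = (Y - y_min) / (y_max - y_min)

# 80/10/10 split into training, validation, and test sets
n = len(X)
idx = rng.permutation(n)
train, val, test = np.split(idx, [int(0.8 * n), int(0.9 * n)])
```

Keeping (y_min, y_max) per sample allows the NN output to be mapped back to physical vF_v units after prediction.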
2.2 Training of the NNs
Once a dataset has been obtained, it is essential to select the appropriate type of artificial neuron for the NN and its corresponding structure. The prediction of the SED can be perceived as a regression problem, making the simple artificial neuron an apparent choice. However, it is also possible to treat the bins of the SED as sequential data; in this case, recurrent neurons become potential options for constructing the NN. We employed several NN topologies that varied with respect to the number of hidden layers, the number of neurons per hidden layer, and the type of neuron: an artificial neural network (ANN) with a deep stack of hidden layers, a gated recurrent unit (GRU, Cho et al. 2014), and a long short-term memory (LSTM, Hochreiter & Schmidhuber 1997). The ANN takes an input vector, calculates a dot product with a vector of weights, adds a bias, and passes the result through an activation function. Apart from ANN cells, there are also recurrent neural network (RNN) cells, such as LSTM and GRU, that maintain a state (memory) of past inputs, making them well-suited for tasks such as time-series prediction. A noteworthy point is that both LSTM and GRU do not suffer from the unstable gradient problem, which is crucial for successful training; a thorough study of this was done by Glorot & Bengio (2010).
When handling time-series-like inputs, recurrent cells are used to predict the next value, and each cell in a layer feeds its output to its neighbor, thus forming a chain. A characteristic of a time-series input is that each value is bound to a range that depends on the values of its neighbors, which means there cannot be a large derivative at any point of the series. The same holds for SEDs, so we can handle the energy values of the SED as a time-series-like input, with the time replaced by the photon frequency. Using a network with GRU or LSTM cells thus ensures, in a way, that the output (i.e., in our case the SED) maintains a somewhat smooth shape.
Currently, there are a few ML frameworks available for several programming languages that can be used for practical applications. The Python programming language is a very popular choice in the astrophysics community and, as a result, there are many scientific software packages available. The major ML packages available in Python are: Tensorflow, created by Google Brain (Abadi et al. 2015), PyTorch, developed by Facebook’s AI Research lab (Paszke et al. 2019), scikit-learn, initiated by David Cournapeau (Pedregosa et al. 2011), and Theano, developed by the Montreal Institute for Learning Algorithms (The Theano Development Team et al. 2016). Tensorflow and Theano can be used through Keras (Chollet et al. 2015), an open-source application programming interface (API) that provides a Python interface for NNs. It was developed with a focus on enabling fast experimentation and prototyping, being both user-friendly and modular. In this work, we opted for Tensorflow.
We trained the three NNs using the 10k dataset. The hyperparameters and complexity of each NN are presented in Table 2. An additional necessary parameter is the number of epochs, which refers to the number of training cycles. The appropriate number of epochs must be carefully selected for each network to address the issue of overfitting. To determine the optimal number of epochs for each network, an initial training phase is conducted with a substantially large number of epochs (set at 3000), since experience has demonstrated that overfitting, if it occurs, typically starts after 200 epochs. Finally, we can choose a loss function other than the typical mean squared error (MSE). Since our networks handle normalized data, we decided to use the mean squared logarithmic error (MSLE) as a loss function instead. The training process of these networks is observed in real time using Tensorboard. This enables us to monitor whether a network is overfitting or whether it has achieved a satisfactory loss error, which is on the order of 10^{−4}. Whichever of these conditions is met first dictates the stopping point. The epoch at which training is stopped is noted, and then the network is retrained for this specific number of epochs.
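A minimal Keras sketch of a GRU network of this kind is given below. The layer width is a placeholder rather than the tuned value of Table 2, and the commented-out training call assumes hypothetical X_train/Y_train arrays; this is an illustration of the parameters-in, SED-out mapping, not the authors' exact architecture.

```python
import numpy as np
import tensorflow as tf

N_PARAMS, N_GRID = 6, 500

# Illustrative GRU topology: the 6 parameters are repeated so that each of
# the 500 SED bins is one step of the recurrent sequence.
model = tf.keras.Sequential([
    tf.keras.layers.RepeatVector(N_GRID),                  # (batch, 500, 6)
    tf.keras.layers.GRU(64, return_sequences=True),        # (batch, 500, 64)
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1)),
    tf.keras.layers.Reshape((N_GRID,)),                    # (batch, 500)
])

# MSLE loss on the min-max-normalized SEDs
model.compile(optimizer="adam",
              loss=tf.keras.losses.MeanSquaredLogarithmicError())

# Building by calling the model on a dummy batch of parameter vectors
dummy = np.zeros((2, N_PARAMS), dtype="float32")
pred = model(dummy)

# Training would then look like (hypothetical arrays, TensorBoard monitoring):
# cb = [tf.keras.callbacks.TensorBoard(log_dir="logs")]
# model.fit(X_train, Y_train, validation_data=(X_val, Y_val),
#           epochs=3000, callbacks=cb)
```

An LSTM variant is obtained by swapping the GRU layer for tf.keras.layers.LSTM with the same arguments.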
Table 2. Hyperparameters for the NNs.
2.3 Evaluation of results
After training our three NNs, we proceeded to the evaluation of the models by comparing the NN predictions to the SEDs from the test sample. We first used standard metrics, namely, the R^{2} score (Draper & Smith 1998), mean squared error (MSE), mean absolute error (MAE)^{4}, the Kolmogorov–Smirnov (KS, Hodges Jr 1958) criterion, and dynamic time warping^{5} (DTW, Senin 2008), to compare the performance of the trained NNs. In terms of R^{2}, the GRU and LSTM NNs have significantly better values than the ANN network. The ranking results (listed in Table 3) show that the GRU is ranked first among the three metrics, while the ANN is ranked third by all criteria. We also visually inspected the test dataset, comparing the predictions of the NNs against the results of ATHEvA (see Fig. 2). From this qualitative test, it became clear that the ANN failed to describe the SED, suggesting that more complexity has to be added to the network. Meanwhile, both GRU and LSTM delivered acceptable solutions (based on the eyeball test). The main difference was that GRU offered smoother solutions for the SED, while the LSTM predictions had noticeable small-amplitude scatter across neighboring values. In combination with the evaluation results listed in Table 3, we selected the GRU NN for the applications that follow. We note that, in our case, the fact that the GRU model generates smoother SEDs than LSTM is not a feature of the cell architecture and applies only to the particular trained model. If the training is repeated with the same number of epochs, the resulting model will be different due to the stochastic nature of the training algorithm (Kingma & Ba 2014). The evaluation and comparison of the models should always be repeated when a new network is trained.
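Most of these metrics can be computed in a few lines of NumPy/SciPy; the sketch below covers R², MSE, MAE, and a two-sample KS statistic for a single SED (DTW is omitted, as it requires an extra library):

```python
import numpy as np
from scipy.stats import ks_2samp

def evaluate(y_true, y_pred):
    """Return (R^2, MSE, MAE, KS statistic) for one predicted SED."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    mse = np.mean((y_true - y_pred) ** 2)
    mae = np.mean(np.abs(y_true - y_pred))
    ks = ks_2samp(y_true, y_pred).statistic
    return r2, mse, mae, ks

# A perfect prediction recovers R^2 = 1 and zero errors
y = np.linspace(0.0, 1.0, 500)
r2, mse, mae, ks = evaluate(y, y)
```

In practice one would average these per-SED scores over the whole test set before ranking the three networks.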
In Fig. 3, we present random examples drawn from the test dataset that highlight the wide variety of SEDs (in terms of flux and shape) reproduced by the GRU NN, including spectra that are not typical of blazars. For instance, there are cases without a clear two-hump morphology due to the smooth superposition of the synchrotron and SSC components (see, e.g., panels a and b) or due to a very extended synchrotron spectrum (panel q), which are more reminiscent of the spectra of pulsar wind nebulae (see, e.g., Amato & Olmi 2021, for the Crab nebula). Moreover, the sample contains spectra produced in the so-called inverse Compton catastrophe limit (Readhead 1994; Petropoulou et al. 2015b); these SEDs consist of three distinct components attributed to synchrotron, first-order, and second-order SSC emission, with the latter carrying most of the bolometric power of the source (see panels l and o). In most cases, the trained NN describes the numerical model accurately (e.g., panels a, h, and i). There are also cases where the NN prediction misses the details of the model but still captures the general trend (e.g., panels t, o, and p). As we show in the following section, such differences are not crucial for the interpretation of observed spectra in the context of fitting with Bayesian methods.
Fig. 2 Visual comparison of GRU (magenta points) and LSTM (maroon points) NNs on top of the ATHEvA SED. Both NN-generated models are able to capture the overall trends; however, GRU offers a better description of the ATHEvA SED with less scatter. 
Table 3. Evaluation of results.
3 Results
3.1 Simulated SEDs
A fundamental question pertains to whether a trained NN can be used to recover the physical parameters corresponding to an observed SED, provided the "correct" answer is already known. The only way to test this is to use simulated SEDs based on an SSC model with known parameters, which are then fitted by the NN; the fitted values can then be compared to the true values of the SSC model to address the question of whether the NN can constrain the parameters with sufficient accuracy.
To create a simulated SED, we selected 25 points of an SSC model computed with ATHEvA from a wide frequency range and added Gaussian noise with 0.1 standard deviation to the logarithmic flux values. We then attributed an error to each flux point between 0.1 and 0.4 in logarithm to mimic statistical uncertainties in the measurement. As blazars are observed in specific energy bands, we selected points that broadly correspond to radio, UV, Xray, and γray bands. By creating an incomplete SED with gaps in regions where the solution may have large derivatives (e.g., spectral cutoffs), we increased the difficulty for the NN fitting. We also note that the simulated flux points may overshoot the baseline model of ATHEvA in certain frequency bands due to the randomness of the simulation and the small number of selected points. This mimics observational effects related to nonsimultaneous observations and intrinsic source variability.
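A toy version of this simulation procedure might look as follows; the baseline spectral shape and band boundaries are arbitrary stand-ins for the actual ATHEvA model and observing bands:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy baseline model: log(nu F_nu) on a fine frequency grid
log_nu = np.linspace(8.0, 27.0, 500)
log_flux = -np.abs(log_nu - 15.0) / 5.0 - 10.0   # arbitrary smooth shape

# Keep 25 points in windows loosely mimicking radio/UV/X-ray/gamma-ray
# coverage, leaving gaps where spectral cutoffs would fall
bands = (log_nu < 11.0) | ((log_nu > 14.5) & (log_nu < 18.5)) | (log_nu > 22.0)
keep = np.sort(rng.choice(np.where(bands)[0], size=25, replace=False))

# Gaussian scatter of 0.1 dex, plus assigned error bars of 0.1-0.4 dex
sim_flux = log_flux[keep] + rng.normal(0.0, 0.1, size=25)
sim_err = rng.uniform(0.1, 0.4, size=25)
```

Because the scatter is drawn independently of the assigned error bars, individual points can overshoot the baseline model, mimicking the non-simultaneity effects mentioned above.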
We used the trained GRU NN to fit the data in Python with emcee^{6} (Foreman-Mackey et al. 2013). For constructing the log-likelihood function and fitting the data, we followed a commonly adopted methodology in similar problems (e.g., Karaferias et al. 2023; Stathopoulos et al. 2024). We also added a term ln f as a free parameter to account for the systematic scatter and noise of the simulated data and the NN model. By including ln f, we get an excess variance compared to statistical uncertainties, namely, s_{i}^{2} = σ_{i}^{2} + m_{i}^{2} e^{2 ln f}, where σ_{i} is the error of individual flux points and m_{i} is the model flux at the same frequency.
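A sketch of such a log-likelihood, following the common emcee recipe for the ln f term, is shown below; a toy linear model stands in for the NN SED evaluation, and the function names are illustrative.

```python
import numpy as np

def log_likelihood(theta, x, y, yerr, model_fn):
    """Gaussian log-likelihood with an excess-variance term ln f:
    s_i^2 = yerr_i^2 + m_i^2 * exp(2 ln f), with m_i the model prediction."""
    *params, log_f = theta
    m = model_fn(params, x)
    s2 = yerr**2 + m**2 * np.exp(2.0 * log_f)
    return -0.5 * np.sum((y - m) ** 2 / s2 + np.log(2.0 * np.pi * s2))

# Toy check with a linear "model" standing in for the NN SED
model_fn = lambda p, x: p[0] * x + p[1]
x = np.linspace(0.0, 1.0, 25)
y = 2.0 * x + 1.0
yerr = np.full_like(x, 0.1)
ll = log_likelihood([2.0, 1.0, -5.0], x, y, yerr, model_fn)
```

As expected, the likelihood is maximized near the true parameters; in the actual fit, model_fn would wrap a forward pass of the trained GRU NN plus the Doppler transformation.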
Here, we present results for two characteristic cases. The simulated SEDs (markers), together with the corresponding SED from the ATHEvA code (dashed line) and the fitted models (shaded regions), are shown in Fig. 4. The corner plots of the posterior distributions are presented in Figs. A.1 and A.2. The NN model is able to successfully fit the simulated data and capture the overall shape of the true model. In terms of parameter estimation, all parameters were consistent within 2σ with the original values used for simulating the datasets (see Table 4).
Fig. 3 Examples of SEDs from the test dataset (i.e., not used for the training). Predictions of the trained GRU NN are overplotted (colored markers) to the numerical models created with ATHEvA (solid lines). 
Fig. 4 Simulated blazar SEDs fitted with the GRU NN-generated model. Simulated data are shown with markers. The dashed magenta line plots the SED from the ATHEvA code. Dark and light-shaded areas denote the range of GRU NN models with parameters from the 50% and 90% posterior distributions, respectively. 
Table 4. Values from GRU NN fitting to simulated SEDs with emcee.
3.2 Observed SEDs
We demonstrate the capabilities of the trained NN model to fit a real blazar SED with emcee and UltraNest (Buchner 2021), which employs nested sampling to scan multimodal posterior distributions. The latter allows us to better explore possible degeneracies in multiparameter problems.
For this application, we chose 3HSP J095507.9+355101 (Giommi et al. 2020; Paliya et al. 2020), which is a BL Lac object at redshift z = 0.557 (Paiano et al. 2020; Paliya et al. 2020) that belongs to the rare class of extreme blazars (Biteau et al. 2020). This blazar has been detected in GeV γ-rays by the Fermi Large Area Telescope (LAT) and is part of the 4FGL catalog (Abdollahi et al. 2020). 3HSP J095507.9+355101 is an ideal testbed because it combines measurements with small errors at lower energies with several upper limits in the γ-rays. Another reason for our source selection was that this target was also recently modeled with an emcee adaptation of the leptonic module of LeHaMoC^{7}, a Python-based leptohadronic code that offers a factor of ~30–100 speed improvement compared to ATHEvA, while producing consistent results (Stathopoulos et al. 2024). Therefore, we can compare the efficiency and accuracy of SED fitting performed with a fast leptonic code and a trained NN.
We used the same dataset as Stathopoulos et al. (2024). The quasi-simultaneous observations in the UV, soft X-rays, and hard X-rays provide a detailed picture of the low-energy part of the spectrum. On the contrary, the high-energy part of the spectrum is less constrained observationally. For fitting the data and constructing the log-likelihood function, we followed Stathopoulos et al. (2024), taking into account asymmetric errors and upper limits in the Fermi data. For our application, we defined uniform priors covering all the parameter space used to build the datasets, with a Doppler factor (in logarithm) in the range [0, 3]. We ran emcee with 48 walkers and 10 000 steps each. We started with a guess value that resembles the general spectral shape and discarded the first 1000 steps of each chain as burn-in. We also ran UltraNest with 400 live points, a maximum number of improvement loops of 3, and a target evidence uncertainty of 0.5.
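The uniform priors over the parameter box can be sketched as below. The bounds are illustrative stand-ins for the Table 1 ranges plus the Doppler-factor range, and the commented lines only indicate how an emcee run with these settings would typically be invoked (emcee itself is not imported here).

```python
import numpy as np

# Flat prior bounds: illustrative stand-ins for the Table 1 ranges,
# plus log10(delta) in [0, 3]
BOUNDS = np.array([
    [14.0, 17.0],   # log10 R [cm]
    [-2.0, 1.0],    # log10 B [G]
    [-5.0, -1.0],   # log10 l_e
    [1.5, 3.0],     # p
    [0.1, 4.0],     # log10 gamma_min
    [2.1, 8.0],     # log10 gamma_max
    [0.0, 3.0],     # log10 delta
])

def log_prior(theta):
    """Flat (uniform) prior over the rectangular parameter box."""
    theta = np.asarray(theta, dtype=float)
    inside = np.all((BOUNDS[:, 0] <= theta) & (theta <= BOUNDS[:, 1]))
    return 0.0 if inside else -np.inf

# A typical emcee invocation with these settings would look like:
# sampler = emcee.EnsembleSampler(48, len(BOUNDS), log_prob)
# sampler.run_mcmc(p0, 10_000)
# chain = sampler.get_chain(discard=1_000, flat=True)
```

The log-posterior passed to the sampler is simply log_prior(theta) plus the log-likelihood of the previous section.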
The results of the fits are presented in Fig. 5. We find no noticeable differences between the emcee and UltraNest results, demonstrating that the SSC model has no multimodal posterior distributions (presented in Figs. A.3 and A.4). When compared to the fitting results obtained with a leptonic code, the posterior distributions of parameters obtained by the GRU NN and LeHaMoC (Stathopoulos et al. 2024) are identical in terms of median values, standard deviation, and overall distribution shape (as shown in the corner plot of Fig. A.3). Moreover, the physical processes that were neglected in the NN training (i.e., synchrotron selfabsorption and γγ pair production) turned out to be negligible for this application, since both processes were included in the fitting with LeHaMoC.
Fig. 5 Observed SED of 3HSP J095507.9+355101 (colored markers) fitted with the GRU NN-generated model, with emcee (left) and UltraNest (right). The dark and light-shaded areas denote, respectively, the 50% and 90% confidence regions of the fitted models. 
3.3 Efficiency of NNs
Here, we briefly comment on the efficiency of the NN-generated model in comparison to the other numerical codes used in our tests.
The ATHEvA code, first presented in the mid-1990s (Mastichiadis & Kirk 1995), did not target the fitting of large samples of data. As a result, the numerical scheme adopted in ATHEvA is not optimized for speed, but for accuracy (see also Cerruti et al. 2022, for a comparison of ATHEvA against other proprietary codes). The typical execution time of ATHEvA for an SSC model is about 1–4 min. Moreover, the code may fail for certain input parameters (when these lead to very large derivatives in the PDEs), which cannot be determined a priori. For instance, about 4% of the cases failed when creating the dataset for the NN training. LeHaMoC, on the other hand, is more flexible for the generation of big datasets and its use in fitting algorithms (Stathopoulos et al. 2024): it delivers a factor of ~30–100 improvement in speed compared to ATHEvA and does not crash regardless of the input parameters. When combined with the parallelization option of emcee, LeHaMoC can fit the SED of 3HSP J095507.9+355101 within 1 day (using 16/32 CPU cores/threads). On the contrary, the fitting of 3HSP J095507.9+355101 using the GRU NN model was completed in approximately 20 min on a single core, which is about 1000 times faster compared to LeHaMoC.
Therefore, implementing a NN such as the GRU offers a huge improvement in computational time, as it performs an MCMC fitting with emcee in a matter of minutes. When combined with nested sampling algorithms like UltraNest, the efficiency of the sampling can drop significantly, as is common for high-dimensional problems. For our application to 3HSP J095507.9+355101, about 20 million GRU NN models were evaluated to search all the parameter space (see Table 1) within 21 h. In Table 5, we list execution times for the various cases discussed in this work. All computational tasks were performed with the same hardware (AMD Ryzen 5950X). Quoted times were scaled to a single CPU core when parallel computing was used. Times may vary depending on the hardware used; therefore, the values listed in Table 5 are meant to be comparative.
Table 5. Typical CPU times for SSC computations.
4 Conclusions
We have presented a proof-of-concept study of blazar SED modeling that relies on the use of neural networks (NNs) and Bayesian inference. We have demonstrated how computationally expensive numerical models can be substituted with trained NNs. By testing three typical configurations of neurons, we conclude that a GRU NN offers optimal results for the astrophysical problem at hand. We have tested the efficiency of our approach against simulated datasets and observational data. Our results demonstrate the big leaps in computing efficiency that can be achieved compared to state-of-the-art radiative numerical codes. This, in turn, makes the process of fitting about 1000 times faster, enabling the use of not only typical MCMC methods, but also nested sampling algorithms such as UltraNest. These findings provide additional motivation for developing and testing different NNs or neurons and experimenting with more complex configurations.
A natural next step would be the training of NNs against more complex radiative models that include leptohadronic processes. A typical example is the case of 3HSP J095507.9+355101, where leptohadronic models computed with ATHEvA have only been tested by eye against the data (see Petropoulou et al. 2015a, 2020, for such applications). Given the large number of parameters (at least 11) and the execution time (tens of minutes, even with faster codes such as LeHaMoC), a statistical fitting with Bayesian methods is prohibitive. In leptohadronic models, the number of PDEs that need to be solved increases from two to at least five. Moreover, the PDE describing the evolution of each particle species in the source is nonlinearly coupled to the PDEs of other species. Due to this nonlinear coupling, a small change in the model parameters may lead to drastic changes in the resulting SED. The complexity of leptohadronic models is further increased by the fact that not all equilibrium solutions of the problem are constant in time. There are regions of the high-dimensional parameter space, as extensively discussed by Mastichiadis et al. (2020), that produce oscillatory equilibrium solutions (Petropoulou & Mastichiadis 2012, 2018; Mastichiadis et al. 2005), known as limit cycles in nonlinear dynamics (Strogatz 2000). The methodology presented here can also be extended to other astrophysical problems, such as the spectral modeling of X-ray pulsars (for examples of available numerical models, see Wolff et al. 2016; West et al. 2017; Becker & Wolff 2022) or gamma-ray bursts (GRBs, e.g., Rudolph et al. 2023).
While we were finalizing this work, the paper by Bégué et al. (2023) appeared. Although the main concept and the radiation model are the same, the two studies are independent and complementary: they focus on a different NN architecture, with distinct evaluations and modeling examples.
Acknowledgements
We thank the anonymous referee for their comments and suggestions that helped improve the clarity of the manuscript. M.P. acknowledges support from the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “2nd call for H.F.R.I. Research Projects to support Faculty members and Researchers” through the project UNTRAPHOB (Project ID 3013). G.V. acknowledges support by H.F.R.I. through the project ASTRAPE (Project ID 7802).
Appendix A Corner plots
Fig. A.1 Corner plot showing the posterior distributions of the parameters of the GRU NN model presented in Fig. 4 (left). Parameters of the initial model are marked with magenta lines. 
Fig. A.2 Corner plot showing the posterior distributions of the parameters of the GRU NN model presented in Fig. 4 (right). Parameters of the initial model are marked with magenta lines. 
Fig. A.3 Corner plot showing the posterior distributions of the GRU NN model parameters for 3HSP J095507.9+355101 as derived from emcee. Dashed lines in the histograms indicate the median and 68% range of values. The corner plot of the SSC model fitted with LeHaMoC, as presented by Stathopoulos et al. (2024), is overplotted in magenta. 
Fig. A.4 Corner plot showing the posterior distributions of the GRU NN model parameters for 3HSP J095507.9+355101 as derived from UltraNest. Dashed lines in the histograms indicate the median and 68% range of values. 
References
 Abadi, M., Agarwal, A., Barham, P., et al. 2015, TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, software available from tensorflow.org [Google Scholar]
 Abdollahi, S., Acero, F., Ackermann, M., et al. 2020, ApJS, 247, 33 [Google Scholar]
 Abiodun, O. I., Jantan, A., Omolara, A. E., et al. 2018, Heliyon, 4, e00938 [CrossRef] [PubMed] [Google Scholar]
 Alarfaj, F. K., Khan, N. A., Sulaiman, M., & Alomair, A. M. 2022, Symmetry, 14, 2482 [Google Scholar]
 Amato, E., & Olmi, B. 2021, Universe, 7, 448 [NASA ADS] [CrossRef] [Google Scholar]
 Becker, P. A., & Wolff, M. T. 2022, ApJ, 939, 67 [NASA ADS] [CrossRef] [Google Scholar]
 Bégué, D., Sahakyan, N., Dereli Bégué, H., et al. 2023, ApJ, submitted [arXiv:2311.02979] [Google Scholar]
 Bishop, C. M. 1995, Neural Networks for Pattern Recognition (Oxford: Clarendon Press) [Google Scholar]
 Biteau, J., Prandini, E., Costamante, L., et al. 2020, Nat. Astron., 4, 124 [Google Scholar]
 Blandford, R., Meier, D., & Readhead, A. 2019, ARA&A, 57, 467 [NASA ADS] [CrossRef] [Google Scholar]
 Bloom, S. D., & Marscher, A. P. 1996, ApJ, 461, 657 [NASA ADS] [CrossRef] [Google Scholar]
 Buchner, J. 2021, J. Open Source Softw., 6, 3001 [CrossRef] [Google Scholar]
 Celotti, A., & Ghisellini, G. 2008, MNRAS, 385, 283 [NASA ADS] [CrossRef] [Google Scholar]
 Cerruti, M., Zech, A., Boisson, C., & Inoue, S. 2015, MNRAS, 448, 910 [Google Scholar]
 Cerruti, M., Kreter, M., Petropoulou, M., et al. 2022, in 37th International Cosmic Ray Conference, 979 [Google Scholar]
 Cho, K., Van Merriënboer, B., Bahdanau, D., & Bengio, Y. 2014, arXiv e-prints [arXiv:1409.1259] [Google Scholar]
 Chollet, F., et al. 2015, Keras, https://keras.io [Google Scholar]
 Dimitrakoudis, S., Mastichiadis, A., Protheroe, R. J., & Reimer, A. 2012, A&A, 546, A120 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Draper, N. R., & Smith, H. 1998, Applied Regression Analysis, 326 (John Wiley & Sons) [Google Scholar]
 Finke, J. D., Dermer, C. D., & Böttcher, M. 2008, ApJ, 686, 181 [NASA ADS] [CrossRef] [Google Scholar]
 Foreman-Mackey, D., Hogg, D. W., Lang, D., & Goodman, J. 2013, PASP, 125, 306 [Google Scholar]
 Gao, S., Pohl, M., & Winter, W. 2017, ApJ, 843, 109 [NASA ADS] [CrossRef] [Google Scholar]
 Gasparyan, S., Bégué, D., & Sahakyan, N. 2022, MNRAS, 509, 2102 [Google Scholar]
 Giommi, P., Padovani, P., Oikonomou, F., et al. 2020, A&A, 640, L4 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Glorot, X., & Bengio, Y. 2010, in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 249 [Google Scholar]
 Hillas, A. M. 1984, ARA&A, 22, 425 [Google Scholar]
 Hochreiter, S., & Schmidhuber, J. 1997, Neural Comput., 9, 1735 [CrossRef] [Google Scholar]
 Hodges Jr, J. 1958, Arkiv för matematik, 3, 469 [CrossRef] [Google Scholar]
 Hovatta, T., & Lindfors, E. 2019, New Astron. Rev., 87, 101541 [CrossRef] [Google Scholar]
 Karaferias, A. S., Vasilopoulos, G., Petropoulou, M., et al. 2023, MNRAS, 520, 281 [NASA ADS] [CrossRef] [Google Scholar]
 Kingma, D. P., & Ba, J. 2014, arXiv e-prints [arXiv:1412.6980] [Google Scholar]
 Maraschi, L., Ghisellini, G., & Celotti, A. 1992, ApJ, 397, L5 [CrossRef] [Google Scholar]
 Mastichiadis, A., & Kirk, J. G. 1995, A&A, 295, 613 [NASA ADS] [Google Scholar]
 Mastichiadis, A., & Kirk, J. G. 1997, A&A, 320, 19 [NASA ADS] [Google Scholar]
 Mastichiadis, A., Protheroe, R. J., & Kirk, J. G. 2005, A&A, 433, 765 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Mastichiadis, A., Florou, I., Kefala, E., Boula, S. S., & Petropoulou, M. 2020, MNRAS, 495, 2458 [NASA ADS] [CrossRef] [Google Scholar]
 McCulloch, W. S., & Pitts, W. 1943, Bull. Math. Biophys., 5, 115 [CrossRef] [Google Scholar]
 Padovani, P., Alexander, D. M., Assef, R. J., et al. 2017, A&ARv, 25, 2 [Google Scholar]
 Paiano, S., Falomo, R., Padovani, P., et al. 2020, MNRAS, 495, L108 [Google Scholar]
 Paliya, V. S., Böttcher, M., Olmo-García, A., et al. 2020, ApJ, 902, 29 [NASA ADS] [CrossRef] [Google Scholar]
 Paszke, A., Gross, S., Massa, F., et al. 2019, in Advances in Neural Information Processing Systems 32 (Curran Associates, Inc.), 8024 [Google Scholar]
 Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825 [Google Scholar]
 Petropoulou, M., & Mastichiadis, A. 2012, MNRAS, 421, 2325 [NASA ADS] [CrossRef] [Google Scholar]
 Petropoulou, M., & Mastichiadis, A. 2018, MNRAS, 477, 2917 [Google Scholar]
 Petropoulou, M., Dimitrakoudis, S., Padovani, P., Mastichiadis, A., & Resconi, E. 2015a, MNRAS, 448, 2412 [Google Scholar]
 Petropoulou, M., Piran, T., & Mastichiadis, A. 2015b, MNRAS, 452, 3226 [Google Scholar]
 Petropoulou, M., Oikonomou, F., Mastichiadis, A., et al. 2020, ApJ, 899, 113 [NASA ADS] [CrossRef] [Google Scholar]
 Readhead, A. C. S. 1994, ApJ, 426, 51 [Google Scholar]
 Rudolph, A., Petropoulou, M., Bošnjak, Ž., & Winter, W. 2023, ApJ, 950, 28 [NASA ADS] [CrossRef] [Google Scholar]
 Senin, P. 2008, Dynamic Time Warping Algorithm Review (Honolulu, USA: Information and Computer Science Department) 855, 40 [Google Scholar]
 Stathopoulos, S. I., Petropoulou, M., Vasilopoulos, G., & Mastichiadis, A. 2024, A&A, in press, https://doi.org/10.1051/0004-6361/202347277 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Strogatz, S. H. 2000, Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry and Engineering (Westview Press) [Google Scholar]
 The Theano Development Team, Al-Rfou, R., Alain, G., et al. 2016, arXiv e-prints [arXiv:1605.02688] [Google Scholar]
 Wang, S., Fan, K., Luo, N., et al. 2019, Nat. Commun., 10, 4354 [NASA ADS] [CrossRef] [Google Scholar]
 West, B. F., Wolfram, K. D., & Becker, P. A. 2017, ApJ, 835, 129 [NASA ADS] [CrossRef] [Google Scholar]
 Wiecha, P. R., & Muskens, O. L. 2020, Nano Lett., 20, 329 [NASA ADS] [CrossRef] [Google Scholar]
 Wolff, M. T., Becker, P. A., Gottlieb, A. M., et al. 2016, ApJ, 831, 194 [Google Scholar]
Additional processes, such as photon-photon pair production and synchrotron self-absorption, which may affect the lowest- and highest-energy parts of the photon spectrum, are not included in this proof-of-concept work. NNs trained with these processes will be gradually added to the GitHub repository of the project.
All Tables
Input parameter ranges used for the generation of training and validation datasets with the ATHEvA code.
All Figures
Fig. 1 Implementation flow chart describing our methodology. 
Fig. 2 Visual comparison of GRU (magenta points) and LSTM (maroon points) NNs on top of the ATHEvA SED. Both NN-generated models capture the overall trends; however, the GRU offers a better description of the ATHEvA SED with less scatter. 
Fig. 3 Examples of SEDs from the test dataset (i.e., not used for the training). Predictions of the trained GRU NN are overplotted (colored markers) on the numerical models created with ATHEvA (solid lines). 
Fig. 4 Simulated blazar SEDs fitted with the GRU NN-generated model. Simulated data are shown with markers. The dashed magenta line plots the SED from the ATHEvA code. Dark- and light-shaded areas denote the range of GRU NN models with parameters from the 50% and 90% posterior distributions, respectively. 
Fig. 5 Observed SED of 3HSP J095507.9+355101 (colored markers) fitted with the GRU NN-generated model, with emcee (left) and UltraNest (right). The dark- and light-shaded areas denote, respectively, the 50% and 90% confidence regions of the fitted models. 