A&A 438, 1021-1028 (2005)
DOI: 10.1051/0004-6361:20052865
A. Asensio Ramos1 - H. Socas-Navarro2
1 - Istituto Nazionale di Astrofisica (INAF), Osservatorio Astrofisico di Arcetri, Largo Enrico Fermi 5, 50125 Florence, Italy
2 - High Altitude Observatory, NCAR, 3450 Mitchell Ln, Boulder CO 80307-3000, USA
Received 11 February 2005 / Accepted 19 April 2005
Abstract
A novel approach is presented for the solution of instantaneous chemical equilibrium problems. Owing to its intrinsically local character, the chemical equilibrium can be considered as a mapping of the three-dimensional parameter space spanned by the temperature, the hydrogen density and the electron density into many one-dimensional spaces representing the number density of each species. We take advantage of the ability of artificial neural networks to approximate non-linear functions and construct neural networks for the fast and efficient solution of the chemical equilibrium problem under physical conditions typical of stellar atmospheres. The neural network approach has the advantage of providing an analytic function that can be rapidly evaluated. The networks are trained with a learning set (which covers the entire parameter space) until a relative error below 1% is reached. An additional verification set is used to check that the networks are not overtrained. The networks are then applied to a snapshot of a realistic three-dimensional convection simulation of the solar atmosphere, showing good generalization properties.
Key words: molecular processes - astrochemistry - methods: numerical
In order to calculate the atomic and molecular number densities using the ICE approximation, one needs to solve a non-linear system of equations (see below). Although efficient numerical methods exist for solving this kind of system, the computing time becomes prohibitive when the ICE problem has to be solved at a large number of points (e.g., dense grids, multi-dimensional geometries, iterative inversions, etc.). It is therefore very important to develop a numerical method for the rapid solution of chemical equilibrium problems.
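To make the nature of the problem concrete, the following minimal Python sketch solves a deliberately tiny ICE-like system (only C, O and CO, with an invented equilibrium constant and invented fictitious pressures, none of which come from this paper) using the kind of Newton-type root finding alluded to above:

```python
import numpy as np
from scipy.optimize import fsolve

# Toy instantaneous-chemical-equilibrium system with only C, O and CO.
# K_CO stands for the pressure equilibrium constant P_C * P_O / P_CO, and
# P_C_fic, P_O_fic for the fictitious elemental pressures (see Eq. 10).
K_CO = 1e-2                    # illustrative value, NOT a physical constant
P_C_fic, P_O_fic = 1.0, 2.0    # illustrative elemental pressures

def residuals(p):
    P_C, P_O = p
    P_CO = P_C * P_O / K_CO          # chemical-equilibrium condition
    return [P_C + P_CO - P_C_fic,    # mass conservation for carbon
            P_O + P_CO - P_O_fic]    # mass conservation for oxygen

P_C, P_O = fsolve(residuals, x0=[0.1, 0.1])
print(P_C, P_O, P_C * P_O / K_CO)
```

Even this two-equation system requires an iterative solver; the full problem couples many more species and must be solved at millions of grid points, which is what motivates the fast approximation developed below.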
Artificial Neural Networks (ANNs) have proven to be a powerful approach to a broad variety of problems (see, e.g., Bishop 1996). In the solar community, they have been recently applied to the problem of inferring the magnetic field from observations of the polarization profiles of selected atomic lines (Socas-Navarro 2005; Carroll & Staude 2001; Socas-Navarro 2003). In this paper, we make use of the ability of ANNs with one hidden layer to approximate any non-linear continuous function (e.g., Jones 1990; Blum & Li 1991) to solve the ICE problem.
The structure of the paper is as follows. Section 2 describes the ICE approximation and its standard solution. Section 3 describes our approach to the ICE problem using ANNs, discussing how it can be trained to associate a combination of physical parameters with the number density of each species included in the problem. Section 4 details how the trained network can be applied to solve the ICE problem in realistic convection simulations of the solar atmosphere. In Sect. 5 we show the dependence of the outputs of the ANN on the physical parameters, which can be easily done because of the intrinsic analytic character of the mapping generated by the ANN. Finally, the most relevant conclusions are summarized in Sect. 6.
Following the same line of reasoning, one can consider the ionization equilibria for the atomic species, which take the Saha-like form

$$\frac{P(A^+)\, P_{\rm e}}{P(A)} = K^{\rm ion}_A(T),$$

where the ionization equilibrium constant $K^{\rm ion}_A(T)$ depends only on the temperature.
Under ICE conditions, the atomic and molecular number densities are obtained by solving the conservation of mass together with the chemical equilibrium conditions given by Eq. (2). The conservation of mass establishes that the sum of the partial pressures of all the species containing a given atomic element (taking into account the stoichiometry) equals the fictitious pressure of the given element:

$$P^{\rm fic}(X) = \sum_{m} \nu_{Xm}\, P_m, \qquad (10)$$

where the sum extends over all species $m$ containing element $X$ and $\nu_{Xm}$ is the number of atoms of $X$ in species $m$.
As we discussed above, the problem of obtaining the atomic and molecular number densities is completely defined once the temperature and the total density (or pressure) are known. However, we also need to know the value of the equilibrium constant at each temperature. It is known from statistical mechanics that the equilibrium constant can be obtained with the aid of the partition functions of the molecule and of the individual atomic components, their respective masses and the dissociation energy D0 (see, e.g., Tejero Ordoñez & Cernicharo 1991, for detailed information on how the equilibrium constants can be calculated). For a diatomic molecule AB, the expression is given by (Tejero Ordoñez & Cernicharo 1991, and references therein):

$$K_{AB}(T) = \frac{P_A P_B}{P_{AB}} = \left(\frac{2\pi \mu k T}{h^2}\right)^{3/2} kT\, \frac{Q_A Q_B}{Q_{AB}}\, {\rm e}^{-D_0/kT},$$

where $\mu$ is the reduced mass of the molecule and $Q$ denotes the internal partition functions.
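As an illustration of the textbook form quoted above, a short sketch in CGS units follows; the partition-function values passed in the example are placeholders, not data from this paper:

```python
import numpy as np

k_B = 1.380649e-16     # erg/K
h   = 6.62607015e-27   # erg s
amu = 1.66053907e-24   # g
eV  = 1.602176634e-12  # erg

def K_p(T, m_A, m_B, Q_A, Q_B, Q_AB, D0):
    """Pressure equilibrium constant K = P_A P_B / P_AB (CGS).

    m_A, m_B: atomic masses in amu; Q_*: partition functions at T;
    D0: dissociation energy in eV. Illustrative textbook form only.
    """
    mu = m_A * m_B / (m_A + m_B) * amu                    # reduced mass
    trans = (2 * np.pi * mu * k_B * T / h**2) ** 1.5      # translational part
    return trans * k_B * T * (Q_A * Q_B / Q_AB) * np.exp(-D0 * eV / (k_B * T))

# Rough CO estimate at 5000 K with made-up partition functions
# (D0 = 11.09 eV is the well-known CO dissociation energy):
print(K_p(5000.0, 12.0, 16.0, Q_A=9.0, Q_B=9.0, Q_AB=300.0, D0=11.09))
```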
Table 1: Number of hidden neurons used for each ANN.
The ICE problem consists of obtaining the partial pressures of the atomic species (or the atomic number densities) that are consistent with Eq. (8) once the local temperature, local hydrogen density and local electron density are given. One could also take a different point of view and regard the ICE problem as a mapping of the three-dimensional space spanned by $(T, n({\rm H}), n({\rm e}))$ onto several one-dimensional spaces, one for each atomic species included in the problem. The functional form of the functions $f_i: (T, n({\rm H}), n({\rm e})) \mapsto P_i$ is not known.
An Artificial Neural Network (ANN) with one hidden layer is a universal approximant to any non-linear continuous function (e.g., Jones 1990; Blum & Li 1991). The schematic structure of such a network is shown in Fig. 1. We have constructed an ANN with an input layer of three neurons where the temperature, hydrogen density and electron density are introduced. The neurons of the input layer have a linear activation function. The hidden layer consists of $N_h$ neurons with a non-linear (hyperbolic-tangent) activation function. Each hidden neuron is connected to all the neurons of the input layer by a certain weight (indicated by arrows in the figure). The value obtained at each hidden neuron is a linear combination of the values at the neurons of the input layer multiplied by the weights. The non-linear function is then applied to these values, which are multiplied by another set of weights and summed to give the final output of the neural network. The output of the network can be written as:

$$P_i = \sum_{j=1}^{N_h} \beta_{ij}\, \tanh\!\left( \sum_{k=1}^{3} \omega_{jk}\, x_k \right), \qquad (13)$$

where $x_1 = T$, $x_2 = n({\rm H})$ and $x_3 = n({\rm e})$ are the (suitably scaled) inputs, $\omega_{jk}$ are the input-to-hidden weights and $\beta_{ij}$ are the hidden-to-output weights.
Figure 1: Schematic structure of the neural network model used for approximating the ICE results. The input layer and the output layer are linear, while the hidden layer is non-linear, with a hyperbolic-tangent activation function. The input layer consists of three neurons for the temperature, the hydrogen number density and the electron number density. The output neuron gives the partial pressure of element i for the physical conditions in the input layer.
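Evaluating such a network amounts to two small matrix products. A minimal sketch of Eq. (13) follows; the random weights stand in for trained ones, and biases are omitted to match the description above:

```python
import numpy as np

def ann_forward(x, W_in, w_out):
    """Evaluate the one-hidden-layer network of Eq. (13).

    x     : inputs (T, n_H, n_e), suitably scaled, shape (3,)
    W_in  : input-to-hidden weights, shape (N_h, 3)
    w_out : hidden-to-output weights, shape (N_h,)
    """
    hidden = np.tanh(W_in @ x)   # linear combination + tanh activation
    return w_out @ hidden        # weighted sum gives the partial pressure

# Hypothetical example with N_h = 5 hidden neurons and random weights:
rng = np.random.default_rng(0)
W_in, w_out = rng.normal(size=(5, 3)), rng.normal(size=5)
print(ann_forward(np.array([0.3, -0.1, 0.7]), W_in, w_out))
```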
The use of a neural network approach for the solution of the ICE problem is appealing for several reasons. Firstly, we obtain an analytic function which is infinitely differentiable and can easily be used in any subsequent calculation. This way, we can investigate in great detail the dependence of the atomic and/or molecular abundances on temperature, hydrogen density and electron density. Secondly, the powerful approximation capabilities of ANNs make them very suitable for multi-dimensional interpolation problems. Even more striking is the ability of ANNs to extrapolate outside the range of parameters used in the learning process. A direct consequence of these properties is that the number of points needed for the learning process can be very small compared to the case of standard interpolation techniques. Furthermore, the number of parameters needed to perform the interpolation is also small. Essentially, one transforms the problem from storing in memory the whole set of N points relating T, n(H) and n(e) to the partial pressures Pi for every species, to storing only the weights present in Eq. (13) for each species. This reduction is proportional to the ratio between N and the number of weights, which can be very large for large N, as we will see below. Finally, it is also important to note that the neural network can be straightforwardly implemented on parallel architectures.
In order to obtain a neural network which is able to approximate the abundance of each species included in the chemical equilibrium calculation, we randomly select Nl sets of temperature, hydrogen density and electron density. For each of these combinations, we solve the ICE equations and calculate the abundances of all the atomic species. We have verified that taking Nl=1000 gives a good coverage of the parameter space, given that the points are randomly selected. The temperature is varied in the range from 3500 K to 14 000 K, which is representative of the physical conditions in typical stellar atmospheres. Concerning the hydrogen and electron densities, we decided to use a logarithmic scaling. The hydrogen density is varied from 10^12 cm^-3 to 10^18 cm^-3 while the electron density is varied from 10^8 cm^-3 to 10^17 cm^-3. Again, these values are a reasonable representation of realistic conditions.
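A sketch of this sampling scheme (the seed and array layout are arbitrary choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)
N_l = 1000

T   = rng.uniform(3500.0, 14000.0, N_l)      # temperature [K]
# Densities are drawn uniformly in the logarithm, as described above:
n_H = 10.0 ** rng.uniform(12.0, 18.0, N_l)   # hydrogen density [cm^-3]
n_e = 10.0 ** rng.uniform(8.0, 17.0, N_l)    # electron density [cm^-3]

# The log-scaled quantities are what would be fed to the network inputs:
learning_set = np.column_stack([T, np.log10(n_H), np.log10(n_e)])
```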
Once the learning set is selected, the training of the network for each species i reduces to the minimization of the following error function:
$$\epsilon_i = \frac{1}{N_l} \sum_{k=1}^{N_l} \left[ P_i^{\rm ANN}\big(T_k, n_k({\rm H}), n_k({\rm e})\big) - P_i^{\rm ICE}\big(T_k, n_k({\rm H}), n_k({\rm e})\big) \right]^2, \qquad (14)$$

where $P_i^{\rm ICE}$ is the partial pressure obtained from the full solution of the ICE equations.
Figure 2: Relative error obtained after the training of the neural network for hydrogen, oxygen, carbon and iron. The distributions of relative errors are shown for the learning set and for a verification set, which aids in detecting overtraining. The value of the standard deviation $\sigma$ of each distribution is indicated.
It is interesting to note that the usual approach of training the neural network with a learning set obtained from the original problem can be circumvented here. We do not need to rely on the precision given by the Newton-Raphson solution, and the neural network can be made as precise as desired. This is a consequence of the fact that the equations describing the ICE approximation are known. The equations of ICE can be formally written as $\mathbf{F}(\mathbf{P}) = 0$, where $\mathbf{P}$ is the set of partial pressures (or atomic number densities) which satisfies the equations. Therefore, we can build as many neural networks as species included in the ICE problem and train each network by minimizing the following error function:

$$\epsilon = \sum_{k=1}^{N_l} \left\| \mathbf{F}\!\left(\mathbf{P}^{\rm ANN}\big(T_k, n_k({\rm H}), n_k({\rm e})\big)\right) \right\|^2,$$

i.e., the residuals of the ICE equations evaluated at the network outputs.
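The idea can be sketched as follows: instead of regression targets from a Newton-Raphson solver, the loss is the squared residual of the equilibrium equations themselves. The toy residual below is a hypothetical one-unknown stand-in for F(P) = 0, chosen only so the example runs; the optimizer and network size are likewise illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x_train = rng.uniform(-1.0, 1.0, (200, 1))   # scaled physical parameter(s)

def net(params, x, n_h=5):
    W = params[:n_h].reshape(n_h, 1)          # input -> hidden weights
    w = params[n_h:]                          # hidden -> output weights
    return np.tanh(x @ W.T) @ w               # Eq. (13)-like output

def ice_residual(P, x):
    # Hypothetical stand-in for the ICE equations F(P) = 0; here
    # P + P**2 - (1 + x) = 0, which has a known smooth solution.
    return P + P**2 - (1.0 + x[:, 0])

def loss(params):
    P = net(params, x_train)
    return np.sum(ice_residual(P, x_train) ** 2)

res = minimize(loss, rng.normal(size=10), method="BFGS")
print("final residual loss:", res.fun)
```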
A critical parameter in the training ability of a neural network is the number of hidden neurons $N_h$. It is important to build networks with a sufficient number of hidden neurons to accurately approximate the non-linear function we are interested in. However, since the number of learning points is not infinite, the number of hidden neurons one can use is practically limited by the overtraining phenomenon (in principle, the number of training points should be much larger than the number of hidden neurons). If $N_h$ is very large, the network is capable of correctly approximating all the points in the learning set, but its interpolation capacity is lost, producing large oscillations between the points of the learning set. This is a direct consequence of the increase in the degrees of freedom of the network when the number of hidden neurons is increased. One way to avoid this overtraining is to use two different data sets: one for learning purposes and the other for testing. During the training process, the error given by expression (14) is evaluated for both the training set and the test set. When the network is training correctly, both errors decrease. When the network starts to be overtrained, the error of the test set starts to increase. We have followed this scheme in order to stop the training process before overtraining, as illustrated in the sketch below.
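A generic early-stopping loop of this kind might look as follows (a sketch; `step` and `loss_test` are placeholders for whatever optimizer and expression (14)-style error evaluation are used):

```python
import numpy as np

def train_with_early_stopping(step, params, loss_test,
                              max_epochs=10000, patience=50):
    """Stop training when the test-set error starts to rise.

    step:      one training iteration, params -> updated params (placeholder)
    loss_test: evaluates the error function on the independent test set
    """
    best_params, best_err, stall = params.copy(), np.inf, 0
    for _ in range(max_epochs):
        params = step(params)
        err = loss_test(params)
        if err < best_err:                 # test error still decreasing
            best_params, best_err, stall = params.copy(), err, 0
        else:
            stall += 1
            if stall >= patience:          # onset of overtraining: stop
                break
    return best_params, best_err
```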
In our case, we have used two different values for the number of hidden neurons. We have verified that the smaller value gives sufficiently accurate results for 13 of the 21 atomic species included in the ICE calculations, while we were forced to increase this number for the rest of the species. Table 1 lists the number of hidden neurons for each species. We did not investigate the dependence of the minimum error on the number of hidden neurons, so that the increase was somewhat arbitrary.
Figure 3: Histograms of relative errors for the 2.55 × 10^6 points of the three-dimensional convection snapshot.
The training of the neural networks was stopped when the standard deviation of the relative error between the output of the neural network and the values obtained from the solution of the ICE equations, for both the learning and the test sets, was 1% or below. The relative error was always larger for the test set than for the learning set, but the training always reached the 1% limit and was stopped before any indication of overtraining was observed. In Fig. 2 we present the relative error obtained for the 1000 training points for the abundances of H, C, O and Fe. The relative errors for the rest of the species behave similarly, except for He, which presents a much smaller dispersion. Since He cannot form molecules, it is always in its atomic form and the non-linear mapping that the neural network has to learn is strongly simplified.
We have indicated in each plot the value of the standard deviation $\sigma$ and the percentage of points contained within $1\sigma$, $2\sigma$ and $3\sigma$. This is an indication of the precision obtained with the neural network. It is common to all the networks that more than 95% of the points are inside $3\sigma$, which translates into a relative error of ~3% (since $\sigma$ is always of the order of, or below, 1%). However, we note that it is also common to all the networks that more than 75% of the points are below $1\sigma$. Along with the plots of the results obtained for the learning set, we have included the results for the verification set. It is interesting to note that, although the standard deviation of the distribution of relative errors is larger than that obtained for the learning set, typically more than 80% of the points are below $1\sigma$. In the case of the network for the carbon abundance, we have a relatively high value of $\sigma$, but more than 90% of the points are below this limit.
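The statistics quoted above can be computed directly from the two sets of solutions; a brief sketch (the array names are hypothetical):

```python
import numpy as np

def error_statistics(p_ann, p_ice):
    """Relative-error statistics of the kind shown in Figs. 2 and 3."""
    rel = (p_ann - p_ice) / p_ice     # relative error of each point
    sigma = np.std(rel)               # standard deviation of the distribution
    # Fraction of points within 1, 2 and 3 standard deviations:
    frac = [np.mean(np.abs(rel) <= n * sigma) for n in (1, 2, 3)]
    return sigma, frac

# e.g. sigma, (f1, f2, f3) = error_statistics(ann_outputs, ice_solutions)
```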
Once the neural networks were trained to the desired precision, as shown in Fig. 2, we applied them to realistic situations in order to compare their performance with the complete ICE calculations. To this end, we make use of a three-dimensional snapshot of the convection simulation of the solar atmosphere of Asplund et al. (2000). We consider this simulation to be an adequate representation of the physical conditions in the solar atmosphere, since several investigations have reported good agreement between the synthetic line profiles and the observed ones (e.g., Asplund et al. 2004, 2003; Shchukina & Trujillo Bueno 2001). We have solved the ICE problem at each point of the snapshot box and confronted the results with those obtained from the neural networks.
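Because the network is a closed-form expression, applying it to the whole snapshot amounts to a single vectorized evaluation of Eq. (13). A sketch, assuming trained weights `W_in`, `w_out` and snapshot cubes `T`, `n_H`, `n_e` (all hypothetical names):

```python
import numpy as np

def evaluate_cube(T, n_H, n_e, W_in, w_out):
    """Evaluate the trained network at every point of a 3D snapshot."""
    # Flatten the cubes into a (npoints, 3) input array, using the same
    # log scaling of the densities as in the learning set:
    x = np.stack([T.ravel(), np.log10(n_H).ravel(),
                  np.log10(n_e).ravel()], axis=1)
    P = np.tanh(x @ W_in.T) @ w_out   # Eq. (13) for all points at once
    return P.reshape(T.shape)
```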
Figure 4: Number density of carbon and oxygen obtained with the simplified model and with the neural network for two different combinations of hydrogen and electron densities and for different values of the temperature. Note that this kind of investigation is greatly simplified due to the analytical character of the neural network. |
The normalized error histograms are shown in Fig. 3 for H, C, O and Fe, with vertical dashed lines indicating the positions of $1\sigma$, $2\sigma$ and $3\sigma$ of the distribution. Note that the histograms are decreasing functions of the relative error, so that the number of points with large relative errors is much smaller than the number of points with small relative errors. We verify that the standard deviation of the distribution of relative errors is quite similar to that obtained for the verification set. This reinforces our confidence that the learning set and the verification set provide sufficient coverage of the three-dimensional parameter space $(T, n({\rm H}), n({\rm e}))$. The shapes of the histograms turn out to be fairly linear on this log-linear scale, so that we can consider, as a first approximation, that the number of points with a certain relative error r decays exponentially, $N(r) \propto {\rm e}^{-r/r_0}$, with a decay constant $r_0$ that differs among species such as hydrogen, carbon and oxygen. Iron is an exception: the distribution of points with relative errors below ~2.3% seems to be constant, while the number of points with errors larger than ~2.3% is three orders of magnitude smaller. The behavior of the histograms for the rest of the species is very similar to those shown in Fig. 3 for the selected ones, maintaining the properties obtained for the verification sets.
Figure 5: Contribution to the total pressure of carbon and oxygen in two different cases. The curves have been obtained using the neural networks. The sum of all the contributions closely matches the fictitious pressure of carbon and oxygen. This is the reason why the very simplified model for C and O works very well.
Note that the CO molecule is efficiently formed when the temperature falls below ~5000 K in both situations. The neutral species are mainly present at intermediate temperatures, typical of photospheric regions, while the ionized species tend to dominate at high temperatures, above 6000-7000 K.
We have successfully trained 21 neural networks which approximate the solution of the Instantaneous Chemical Equilibrium problem. We have generated a learning set which is representative of the physical conditions in stellar atmospheres and have verified that the coverage of the three-dimensional parameter space is sufficiently good with only 1000 points. We have trained the neural networks so that the standard deviation of the relative error is below, or of the order of, 1%. In order to avoid overtraining, we have employed an independent verification set. The standard deviation of the relative error in this verification set is larger than that for the learning set, but also of the order of 1%. We applied the trained neural networks to the ICE problem in a three-dimensional convection simulation representative of the physical conditions in the solar atmosphere. The histograms of relative errors built with all the points in the simulation show that the neural networks are capable of solving the ICE problem with relative errors similar to those obtained for the verification set. The advantages of the neural network approach are its intrinsic analytic character, the possibility of parallelization and the very fast evaluation. The development of a fast approach to ICE is important for, among other applications, recent multi-dimensional simulations of stellar atmospheres which involve very dense grids and iterative inversions of observed spectra.
Acknowledgements
This research has been partly funded by the Ministerio de Educación y Ciencia through project AYA2004-05792 and by the European Solar Magnetism Network (contract HPRN-CT-2002-00313).