Online material
Appendix A: Database biases
The content of the computed part of the NASA Ames PAH IR Spectroscopic Database has some intrinsic biases (Bauschlicher et al. 2010; Boersma et al. 2014). These biases originate historically from limited computational power, which makes smaller PAH species (N_{C} < 20) easier to calculate, and from a focus on astrophysically relevant species, i.e., pure, catacondensed, neutral, and singly positively ionized PAH species (Tielens 2008). The bias towards small PAHs in the database is somewhat negated by the physics of the PAH emission process and the wavelength range considered here (5–15 μm). Small PAHs get significantly hotter than their larger counterparts upon absorbing the same photon energy. This pushes more of their emission blueward of the wavelength region considered here. Similarly, large PAHs (N_{C} > 80) stay significantly cooler, which pushes more of their emission redward of this region. From a stability standpoint, the family of catacondensed, compact PAHs is very stable and hence more likely to survive the rigors of interstellar space (Allamandola et al. 1985). These species are well represented in the database. The database does undersample dehydrogenated PAHs, both in the levels of dehydrogenation and the possible permutations. However, it has been shown that the removal of only one or two hydrogens from catacondensed PAHs does not alter the spectrum much and that fully dehydrogenated species in space are probably rare (Bauschlicher & Ricca 2013). Variations in peripheral hydrogen adjacencies are reflected by variations in the 10–15 μm region of the PAH spectrum (Hony et al. 2001). As PAHs become increasingly large, while remaining compact, they obtain more straight edges. This is reflected in their spectra by a strong 11.2 μm feature. Adding more irregular PAHs to the database can alleviate some of the current bias towards compact, straight-edged PAHs.
Of all studied heteroatom substitutions, nitrogen is the most viable candidate 1) because its inclusion does not affect PAH stability; 2) because nitrogen is abundant in the circumstellar shells around carbon-rich AGB stars (Allamandola et al. 1985, 1989; Frenklach & Feigelson 1989); 3) because of the place where PAHs are thought to be formed (Boersma et al. 2006, and references therein); and 4) because of their known presence in meteorites (Hayatsu et al. 1977). Other substitutions either have little effect (e.g., silicon, magnesium) or significantly disrupt the aromatic network, and therefore reduce the stability of the PAH, e.g., oxygen (Hudgins et al. 2005). Singly charged PAH anions are well represented in the database for the larger PAH species. Considering detailed charge balance, doubly charged PAH cations only become important in the more extreme astrophysical environments, and higher ionization states can safely be ignored. These and other considerations regarding database biases and their astrophysical relevance have also been discussed in Boersma et al. (2013). The spectra of our database mixtures are affected by these biases. The region of the spectrum most affected is the 10–15 μm range because of the underrepresentation of irregular PAHs. This could also explain the disparity between the observations and the database mixtures in this region, e.g., the weak 12.7 μm feature.
Fig. A.1
Structure and DFT-computed 5–15 μm vibrational infrared spectra of a selection of PAHs. PAHs are a class of carbonaceous molecules that form a skeleton where carbon atoms are arranged in a honeycomb structure with hydrogen atoms sitting on the periphery. Additional atoms, such as nitrogen, can also be present in the skeleton. For each species, the chemical formula, simple name (if it exists), and the corresponding mid-IR spectrum calculated by DFT at 0 K are given. All data are taken from the NASA Ames PAH IR Spectroscopic Database (Bauschlicher et al. 2010; Boersma et al. 2014).

Appendix B: Correlation between the 1000 mixtures
Figure B.1 shows the probability density function (PDF) of the correlation matrix of the 1000 mixtures (Fig. 2, left panel). The peak, average, and median correlation coefficients are shown in blue, red, and green, respectively. The 1σ variation around the mean is shown in yellow. The peak of the PDF (the most likely correlation) is found at 0.96 and the standard deviation is 0.023, i.e., 85% of the correlations fall between 0.94 and 0.98. Only 4% of the correlations are below 0.9. The distribution is sharp and narrow, showing without a doubt that random PAH mixtures are indeed very alike.
Fig. B.1
Probability density function of the correlation coefficients between the average 5–15 μm spectra from 1000 mixtures of 548 species with random abundances between 0 and 1. The peak, average, and median correlation coefficients are shown in blue, red, and green, respectively. The 1σ variation around the mean is shown in yellow.
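The correlation analysis of this appendix can be sketched as follows. This is only an illustration under stated assumptions: the database spectra are replaced by synthetic toy spectra (sums of Gaussian bands), the PDF is estimated with a simple histogram, and all variable names are hypothetical. The numbers it prints therefore do not reproduce the values quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: toy stand-in for the database. Each "species" is a sum of a few
# Gaussian bands at random positions; real DFT spectra would be loaded instead.
n_species, n_bins, n_mix, n_bands = 548, 200, 1000, 6
wavelength = np.linspace(5.0, 15.0, n_bins)  # micron
centers = rng.uniform(5.0, 15.0, size=(n_species, n_bands, 1))
heights = rng.uniform(0.1, 1.0, size=(n_species, n_bands, 1))
S = (heights * np.exp(-0.5 * ((wavelength - centers) / 0.1) ** 2)).sum(axis=1)

# Random abundances between 0 and 1, one row per mixture.
A = rng.uniform(0.0, 1.0, size=(n_mix, n_species))
X = A @ S  # shape (n_mix, n_bins)

# Pairwise Pearson correlation between mixtures; keep each pair once.
corr = np.corrcoef(X)
coeffs = corr[np.triu_indices(n_mix, k=1)]

# Histogram-based estimate of the PDF and its summary statistics.
density, edges = np.histogram(coeffs, bins=50, density=True)
bin_centers = 0.5 * (edges[:-1] + edges[1:])
peak = bin_centers[np.argmax(density)]
mean, median, sigma = coeffs.mean(), np.median(coeffs), coeffs.std()
frac_within_1sigma = np.mean(np.abs(coeffs - mean) <= sigma)
print(f"peak={peak:.3f} mean={mean:.3f} median={median:.3f} sigma={sigma:.3f}")
```

The Pearson coefficient is insensitive to the overall scale of each mixture, so the mixtures need not be normalized before they are correlated.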

Appendix C: Statistical analysis
First we concentrate on the database and apply the following procedure:

1. Randomly select r spectra from the database.

2. Create m random linear combinations (mixtures) of the r spectra by assigning each spectrum a random abundance between 0 and 1 such that X = AS (C.1), where X is an m × n matrix holding m mixed spectra over n wavelength bins; A is an m × r abundance matrix containing random numbers between 0 and 1; and S is an r × n matrix holding the original set of database spectra. For this analysis we set m to 100, creating 100 random mixtures each time.

3. Repeat steps (1)–(2) p times, randomly selecting a new set of r PAH spectra each time (here we vary r between 10 and 100). Thus, p spectra in matrix X are created, which we will denote as X_{i}. We use p = 100 for this analysis. This means that there are m random mixtures of r spectra, and we reselect and remix those r spectra p times.

4. For every wavelength bin λ, we find the maximum, minimum, and mean values of the spectra with index i, X_{i}, in the X matrix. From these values we create three spectra, S_{max}(λ), S_{min}(λ), and S(λ). The spectra S_{max}(λ) and S_{min}(λ) represent the boundaries within which any spectrum (X_{i}) falls. The mean spectrum is the kernel spectrum for that particular set of m × p mixed spectra.

5. We calculate the Euclidean distance of the minimum and maximum spectra from the mean (Eqs. C.2 and C.3), and from these we define N_{r} (Eq. C.4), which is a measure of the maximum variation in the mixtures for each set of r spectra.
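The five steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. Three choices are assumptions that the text does not specify: spectra are drawn without replacement, each mixture is divided by its total abundance so that mixtures of different r are comparable, and N_{r} is taken as the sum of the two Euclidean distances. All function names are hypothetical.

```python
import numpy as np

def mix_spectra(S, r, m, rng):
    """Steps 1-2: pick r spectra at random and build m random mixtures X = A S."""
    # ASSUMPTION: selection without replacement.
    idx = rng.choice(S.shape[0], size=r, replace=False)
    A = rng.uniform(0.0, 1.0, size=(m, r))  # abundances between 0 and 1
    X = A @ S[idx]
    # ASSUMPTION: normalize by total abundance (abundance-weighted average).
    return X / A.sum(axis=1, keepdims=True)

def envelope(X):
    """Step 4: minimum, maximum, and mean spectrum at every wavelength bin."""
    return X.min(axis=0), X.max(axis=0), X.mean(axis=0)

def variation_norm(X):
    """Step 5: Euclidean distances of the envelope from the mean spectrum."""
    s_min, s_max, s_mean = envelope(X)
    d_max = np.linalg.norm(s_max - s_mean)
    d_min = np.linalg.norm(s_min - s_mean)
    # ASSUMPTION: the two distances are combined by summing them.
    return d_max + d_min

def database_norm(S, r, m=100, p=100, rng=None):
    """Step 3: repeat the selection and mixing p times, then evaluate the norm."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.vstack([mix_spectra(S, r, m, rng) for _ in range(p)])  # (m*p, n)
    return variation_norm(X)
```

Calling `database_norm(S, r)` for r = 10, 20, ..., 100 on an array `S` of database spectra (one row per species) would give one norm value per r under these assumptions.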
Now we consider the observations and calculate N_{obs} using the following steps:

1. Subtract a linear baseline (corresponding to the emission from another dust component) from each observed spectrum.

2. Out of the ten observed rest-frame spectra, define a minimum, maximum, and mean at each wavelength bin.

3. Similarly to step 4 in the database analysis, we find the maximum, minimum, and mean value of the observed spectra with index i, X_{i}, in the X matrix and for every wavelength bin, λ. From these values we create three spectra, S_{obs,max}(λ), S_{obs,min}(λ), and S_{obs}(λ). The spectra S_{obs,max}(λ) and S_{obs,min}(λ) represent the boundaries within which any of the observed spectra (X_{i}) fall. The mean spectrum S_{obs}(λ) is the average of the particular set of rest-frame observations presented in Fig. 1. Similarly to N_{r} for the database spectra (Eq. C.4), we define N_{obs}.
Fig. C.1
Top panel: range of variations in the kernel spectra as a function of the number of PAH species considered in the mixture: 10 species (light grey) through 90 species (dark grey). In red is the range for 100 species and in black the range for 548 species. Bottom panel: evolution of the norm N_{r} which captures the variations in the kernel spectra (blue line) and the norm N_{obs} which captures the variations in the observations (see text for details). 
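The observational part of the analysis in Appendix C can be sketched in the same spirit. This is a minimal illustration under assumptions: the text does not say how the linear baseline is fitted, so the sketch uses the straight line through the first and last points of each spectrum, and it combines the two Euclidean distances by summing them. The function names are hypothetical.

```python
import numpy as np

def subtract_linear_baseline(wavelength, flux):
    """Remove a straight-line baseline from one spectrum."""
    # ASSUMPTION: the baseline passes through the first and last data points.
    slope = (flux[-1] - flux[0]) / (wavelength[-1] - wavelength[0])
    baseline = flux[0] + slope * (wavelength - wavelength[0])
    return flux - baseline

def observed_norm(wavelength, spectra):
    """Envelope of the baseline-subtracted spectra and its distance from the mean."""
    cleaned = np.array([subtract_linear_baseline(wavelength, f) for f in spectra])
    s_min = cleaned.min(axis=0)
    s_max = cleaned.max(axis=0)
    s_mean = cleaned.mean(axis=0)
    # ASSUMPTION: the two Euclidean distances are combined by summing them.
    return np.linalg.norm(s_max - s_mean) + np.linalg.norm(s_min - s_mean)
```

Here `spectra` is expected to be an array with one observed rest-frame spectrum per row, all sampled on the same `wavelength` grid.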

Appendix C.1: Comparison between database and observations statistics
In Fig. C.1, we present the results of the statistical analysis graphically. The top panel of Fig. C.1 shows the shaded regions between S_{min} and S_{max}, which mark the boundaries of the m × p mixtures of r spectra. The lightest grey shaded region represents r = 10, and increasingly darker greys represent r = 20–90. The red region corresponds to r = 100 and the black region to r = 548, i.e., the whole database. It is clear from the figure that increasing the number of species in the sample decreases the variations between the kernel spectra. This can be investigated more quantitatively by following the evolution of the norm N_{r} as a function of the number of species present in the mixture, r. This is done in the bottom panel of Fig. C.1, where the decrease of N_{r} can be seen clearly. One way to compare the variations of the observed AIB spectrum with those present in the kernel spectra is to compare N_{r} with N_{obs}, which has a constant value reported in Fig. C.1. When N_{r} < N_{obs}, the spectral variations (in terms of the Euclidean norm as defined in Appendix C) of the database mixtures are within the spectral variations seen in the observed spectra. This happens when r > 30 (Fig. C.1).
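The comparison between N_{r} and N_{obs} amounts to finding the smallest r for which the database norm drops below the observed one. A minimal sketch, with a hypothetical helper name and made-up placeholder numbers that are not values from this work:

```python
import numpy as np

def first_r_below(r_values, n_r, n_obs):
    """Return the smallest r with N_r < N_obs, or None if there is none."""
    r_values = np.asarray(r_values)
    below = np.asarray(n_r) < n_obs
    # argmax returns the index of the first True entry.
    return int(r_values[np.argmax(below)]) if below.any() else None

# Placeholder numbers for illustration only.
r_grid = [10, 20, 30, 40]
n_r_toy = [4.0, 3.0, 2.0, 1.0]
print(first_r_below(r_grid, n_r_toy, n_obs=2.5))
```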
Appendix D: Blind signal separation
Blind signal separation is commonly used to recover a set of unknown source signals from a set of observed signals that are mixtures, or combinations, of these original source signals, with unknown mixture parameters (Hyvarinen et al. 2001). Several methods and algorithms exist in the literature. The astronomical PAH cation and neutral spectra presented in this Letter were obtained with Lee and Seung's non-negative matrix factorization (NMF; Lee & Seung 2001). NMF was applied to data of the reflection nebula NGC 7023 obtained with the Infrared Spectrograph onboard the Spitzer Space Telescope. Details on the procedure can be found in Berne et al. (2010).
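For readers unfamiliar with NMF, the multiplicative update rules of Lee & Seung for the Euclidean (Frobenius) cost can be sketched as below. This is a generic illustration, not the procedure applied to the NGC 7023 data: the text does not state which cost function, initialization, or stopping rule was used, so the Euclidean variant, the random initialization, and the fixed iteration count are assumptions, and the function name is hypothetical.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, eps=1e-12, rng=None):
    """Factor a non-negative matrix V (m x n) as W (m x k) @ H (k x n)."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = V.shape
    # ASSUMPTION: strictly positive random initialization.
    W = rng.uniform(0.1, 1.0, size=(m, k))
    H = rng.uniform(0.1, 1.0, size=(k, n))
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, np.array(errors)
```

With the observed spectra stacked as the rows of `V`, the rows of `H` play the role of the recovered source spectra and `W` holds their weights in each observation.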
© ESO, 2014