Quantum modeling of the optical spectra of carbon cluster structural families and relation to the interstellar extinction UV bump

Context. The UV bump observed in the interstellar medium extinction curve of galaxies has been assigned to π→π transitions within the sp2 conjugated network of carbon grains. These grains are commonly thought to be graphitic particles or polycyclic aromatic hydrocarbons. However, questions are still open regarding the shape and degree of amorphization of these particles, which could account for the variations in the 2175 Å astronomical bump. Optical spectra of graphitic and onion-like carbon structures were previously obtained from dielectric constant calculations based on oscillating dipole models. In the present study, we compute the optical spectra of entire populations of carbon clusters using an explicit quantum description of their electronic structure for each individual isomer. Aims. Our aim is to determine the optical spectra of pure carbon clusters Cn=24,42,60 sorted into structural populations according to specific order parameters, namely asphericity and sp2 fraction, and to correlate these order parameters to the spectral features of the band in the region of the UV bump. Our comparison involves data measured for the astronomical UV bump as well as experimental spectra of carbon species formed in laboratory flames. Methods. The individual spectrum of each isomer is determined using the time-dependent density functional tight-binding method. The final spectrum for a given population is obtained by averaging the individual spectra for all isomers of a given family. Our method allows for an explicit description of particle shape, as well as structural and electronic disorder with respect to purely graphitic structures. Results. The spectra of the four main populations of cages, flakes, pretzels, and branched structures (Dubosq et al. 2019, A&A, 625, L11) all display strong absorption in the 2–8 eV domain, mainly due to π→π transitions. The absorption features, however, differ from one family to another and our quantum modeling indicates that the best candidates for the interstellar UV bump at 217.5 nm are cages and then flakes, while the opposite trend is found for the carbonaceous species formed in flame experiments; the other two families of pretzels and branched structures play a lesser role in both cases. Conclusions. Our quantum modeling shows the potential contribution of carbon clusters with a high fraction of conjugated sp2 atoms to the astronomical UV bump and to the spectrum of carbonaceous species formed in flames. While astronomical spectra are better accounted for using rather spherical isomers such as cages, planar flake structures are involved as a much greater component in flame experiments. Interestingly, these flake isomers have been proposed as likely intermediates in the formation mechanisms leading to buckminsterfullerene, which was recently detected in space. This study, although restricted here to the case of pure carbon clusters, will be extended towards several directions of astronomical relevance. In particular, the ability of the present approach to deal with large-scale molecular systems at an explicit quantum level of electronic structure and its transferable character towards different charge states and the possible presence of heteroatoms makes it the method of choice to address the important case of neutral and ionic hydrogenated compounds.


Introduction
The UV bump, first observed in the 1960s (Stecher 1965a), is a broad ultraviolet absorption bump observed on the interstellar medium extinction curves from the Milky Way to high-redshift galaxies, with a stable position centered at 217.5 nm, although showing variations in intensity and width in different lines of sight (Bless & Savage 1972;Fitzpatrick & Massa 1986Gordon et al. 2003;Elíasdóttir et al. 2009;Clayton et al. 2015;Zafar et al. 2018;Fitzpatrick et al. 2019, and references therein). This spectral feature has long been assigned to π→π * transitions specific to sp 2 conjugated carbon bonding of graphite particles or polycyclic aromatic hydrocarbons (Mishra & Li 2015. Graphite particles were first proposed in the mid-1960s (Stecher 1965b). Graphite, which is made of stacked graphene layers, represents the perfectly ordered prototypical solid sp 2 structure, whose optical properties in the 0.3-200 µm domain were measured during the last decade by Kuzmenko et al. (2008). Graphite's properties were included in astrophysical dust models such as the one developed by Draine & Lee (1984), the Drude model appearing as a convenient way to model the astronomical UV bump (Fitzpatrick & Massa 1986.
However, considering graphite grains of a single size and shape in dust models did not reproduce the observed width and A&A 634, A62 (2020) peak characteristics simultaneously, which led the authors to consider more spherical grains with specific size distributions in order to reach spectral agreement (Czyzak et al. 1981;Rouleau et al. 1997;Will & Aannestad 1999). In particular, and as an alternative carrier, Henrard et al. (1993) suggested the possibility of onion-type carbon particles composed of multiple shells of sp 2 carbon atoms around a spherical core, which were later explored in the laboratory and confirmed as relevant candidates (de Heer & Ugarte 1993;Henrard et al. 1997;Ugarte 1995;Chhowalla et al. 2003;Iglesias-Groth 2004). This hypothesis has been revived since C 60 buckminsterfullerene was identified in the ISM (Cami et al. 2010;Sellgren et al. 2010;Campbell et al. 2015Campbell et al. , 2016Walker et al. 2015Walker et al. , 2016Walker et al. , 2017. The variations in the UV bump width and shape have been related to the presence of defects within the sp 2 conjugated carbon network. Several theoretical and experimental studies were subsequently carried out to unravel the structural origin of these variations. The effects of disorder on the optical absorption spectra of graphitic grains were modeled by Papoular et al. (2013) who computed their dielectric constant using a tight-binding model for nano-sized building blocks of a few finite graphene sheets (flakes), introducing disorder by replacing sp 2 carbons by sp 3 or sp 1 atoms. Interestingly, these authors highlighted an effect on the width of the computed band that they located at 5.77 eV, in agreement with Draine's model (Draine & Lee 1984). Following similar ideas, classical continuum modeling based on the dipolar approximation (Moulin et al. 2008) showed that the optical response of soot particles represented by layers of carbon agglomerated as an onion depends quite sensitively on the details of the nanostructure. Experimental studies also pointed out the role of curvature and probably identified onion-like structures as shaping the band at about 220 nm in carbonaceous particles (Llamas-Jansa et al. 2007;Jaeger et al. 2008).
More generally, the contribution of more disordered species was further elaborated under the form of hydrogenated amorphous carbon (HAC; Jones et al. 1990;Gadallah et al. 2011) and soot particles (Wada et al. 1999;Gavilan et al. 2016Gavilan et al. , 2017. In an attempt to establish correlations between carbon grain morphology and spectral data, Rotundi et al. (1998) proposed that carbon nanostructures ordered on the micrometer scale could be better candidates to interpret the UV bump rather than graphitic particles. This motivated the determination of experimental optical spectra of soot particles for species produced either in plasma form (Wada et al. 1999) or in flames (Gavilan et al. 2016(Gavilan et al. , 2017. Wada et al. (1999) produced what they refer to as quenched carbonaceous composites (QCCs), and proposed that they were constituted of onion-like carbonaceous particles with a small amount of sp 3 carbon atoms. In their flame experiments, Gavilan et al. (2017) produced compounds analogous to interstellar carbon dust with various levels of nanostructuration (Carpentier et al. 2012), leading to variations in the positions of the UV bump.
Several experimental and theoretical studies have also suggested that polycyclic aromatic hydrocarbons (PAHs) might contribute to the UV bump (Beegle et al. 1997;Steglich et al. 2010;Malloci et al. 2008Malloci et al. , 2007a. Together with larger grains, PAHs were included in the model proposed by Cecchi-Pestellini et al. (2008). However, the corresponding structures and charge states were chosen in an ad hoc way and in limited numbers (50 molecules in the size range 10 to 66 carbon atoms, in the charge states 0, ±1, and +2 only).
It is thus commonly admitted that the UV bump is assigned to carbon grains with a high sp 2 ratio, whose variations in the electronic structure lead to changes in the shape and width of the band profile. Experimental studies have shown that the optical response of carbonaceous grains formed in flames or plasmas is related to their nanostructure (de Heer & Ugarte 1993;Mennella et al. 1995;Wada et al. 1999). Unfortunately, modeling the optical spectra of carbon grains taking into account their structure at the atomistic level of details is a theoretical challenge. So far, and to the best of our knowledge, either the Drude model or an ensemble of oscillating dipoles were considered so as to reproduce the astronomical UV bump. Although efficient, these approaches are partly phenomenological and do not account for the details of the electronic structure, especially the excited states that are involved in the optical spectrum.
As an alternative to the above-mentioned approaches, the present article proposes a novel description of the nanostructures and optical spectra of broad populations of isomers for large carbon clusters C n (n = 60, 42, and 24), employing a quantum description of the electronic structure allowing excited states to be determined. More precisely, we use time-dependent density functional tight-binding (TD-DFTB; Niehaus et al. 2001) to systematically calculate the UV-visible response for each individual isomer. The isomer populations were sampled specifically to cover highly diverse degrees of aromaticity, and structural and chemical orderings. Such large numbers of carbon atoms also allow the nanometer size range to be reached, which is relevant for interstellar carbonaceous dust particle distributions adopted in astrophysical models. The main goal of the present investigation is to assess the possible contribution of these various isomers to the UV bump and to evaluate their relevance as a component of interstellar dust. This work follows on an earlier effort in which the structural diversity itself was characterized (Bonnin et al. 2019) and its correlation with the infrared (IR) response evaluated in comparison with the spectra measured in planetary nebulae (Dubosq et al. 2019). Here we employ the same samples, but focus on the optical response in the UV-visible range, using appropriate but consistent methodologies for addressing electronic excitations in large databases containing hundreds of thousands of structures.
This article is organized as follows. In Sect. 2 we recall the methodology used to generate and analyze the isomer samples, and provide some details about the calculations of optical spectra based on TD-DFTB. In Sect. 3 we discuss the spectra of the main isomer families of C 60 clusters, as well as smaller ones. Based on these results, in Sect. 4 we discuss the possible contribution of these isomer families to the astronomical or experimentally observed UV bump. Finally, we draw some conclusions and outlook in Sect. 5.

Building up the populations of isomers
The structural diversity of size-selected carbon clusters was explored in our previous work (Bonnin et al. 2019;Dubosq et al. 2019) and classified according to dedicated order parameters, from which the asphericity β and the ratio of sp 2 hybridization among carbon atoms were found to be the most insightful. Based on these parameters, four main families could be identified: cages, planar polyaromatics (PPAs; also called flakes), pretzels, and branched structures.
Briefly, cages represent the most ordered (high % sp 2 ) and spherical isomers (low β) and the most stable ones (see Fig. 1 in Dubosq et al. 2019). PPAs or flakes are more planar (β ∼ 0.5) but still relatively ordered (medium % sp 2 , % sp 1 carbon atoms at the edge) with an intermediate energetic stability. Pretzel isomers are more disordered with an increased number of sp 1 atoms and have a quite pronounced spherical character (β < 0.4). Finally, the branched family tends to be constituted of more sp 1 atoms than sp 2 atoms with no spherical character (β close to 1). Pretzel and branched structures represent the least stable of the four populations (see Fig. 1 in Dubosq et al. 2019). These results are shown in Fig. 1a.
An initial sample of about 100 000 structures was determined for each of the four families from a force field exploration (Bonnin et al. 2019) followed by local optimization (Dubosq et al. 2019) at the level of self-consistent charge density functional tight-binding (SCC-DFTB) (Elstner et al. 1998). Among these structures, between about 5000 and 12 000 (8-17%) were selected for electronic spectroscopy in order to save computational burden (see Fig. 1a for n = 60, Figs. A.1a and b for n = 42 and 24). The characteristics of these families are reported in Table A.1. These configurations were selected based on their values of β and sp 2 fraction lying near the average values of the entire sample for this family and within two standard deviations of the corresponding distribution. When the number of configurations meeting this criterion was considered too large with respect to computational time limitations, some were randomly removed.

Computation of the optical spectra with the TD-DFTB approach: benchmark and limitations
Full time-dependent density functional theory (or higher level methods) are not practical for treating thousands of systems containing hundreds of valence electrons, and the TD-DFTB method (linear response formulation) derived by Niehaus et al. (2001) was used instead to calculate vertical excitation spectra for all minima in our DFTB sample. TD-DFTB was originally shown to satisfactorily reproduce π→π transitions in π-conjugated systems, and the main features of the experimental UV-visible spectrum of C 60 buckminsterfullerene in the 3-7 eV range, the peak positions being systematically redshifted similarly to the linear response TD-DFT results (Niehaus et al. 2001).
The good performance of TD-DFTB relative to TD-DFT for polyaromatic compounds was further demonstrated on the examples of cationic PAHs (Rapacioli et al. 2015). TD-DFT itself is not exact, and TD-DFTB has inherited some of its limitations, notably for double excitations that are not accounted for or the use of an approximated exchange correlation functional.
In the present work, and to further decrease the computational cost, the dimension of the TD-DFTB linear response matrix was reduced by neglecting the contribution of the highestenergy single-particle excitations (higher than 24 eV for all sizes and families of structures). The technique is described in Appendix B.3. Overall, the accuracy of TD-DFTB was assessed against available experiments and dedicated TD-DFT calculations using the BLYP (Becke Lee-Yang-Par) functional and the pc-2 (polarization consistent) basis set (Jensen 2002(Jensen , 2001 for selected isomers and the details of this benchmarking procedure are provided in Appendix B. This benchmarking could be performed only up to about 8 eV, above which the description of the electronic excitations becomes less reliable (see Fig. B.3). Higher lying excitations are likely to involve orbitals of Rydberg character (Malloci et al. 2007a), which are not present in our theoretical method due to the use of a confined minimal basis set. We thus limit our comparison to excitation energies below 8.2 eV (∼150 nm) in the following discussion. From this benchmarking it also follows that transition energies should be scaled by 0.988 for a better comparison with experimental measurements, the maximum discrepancy on band positions remaining within 0.3-0.4 eV for individual fullerenes (see Tables B.1

and B.2).
Having validated the TD-DFTB approach for the present system, the individual spectra were computed, including 3000 excited states for isomers of C 60 , 1050 for C 42 , and 600 for C 24 . These rather large numbers of excited states are required to access the region of the UV bump and the onset of the far-UV region where the spectrum starts to increase (Greenberg et al. 1983). All absorption bands shown below were convoluted with a Lorentzian profile with a full width at half maximum (FWHM) of 0.01 eV, a value chosen to match the typical band profile of the first electronic transitions of such large molecules (Pino et al. 2011).
Following the approach used to analyze the IR spectral domain (Dubosq et al. 2019), global optical spectra for each of the four families (and subfamilies) were constructed by adding the individual contributions from all members of those families.
The (TD-)DFTB calculations were performed with the deMonNano software package (Heine et al. 2009) employing the mio set of parameters (Elstner et al. 1998). TD-DFT calculations were carried out with the Orca software package (Neese 2012).

Results: electronic spectra of C n (n = 60, 42, 24) isomer families
The optical absorption spectra computed for the four isomer families of C n (n = 60, 42, 24) are reported in Fig shapes that differ from one family to the other. The evolution of the maximum position as a function of size is notably reported in Fig. 5b and the corresponding values are given in Table C.1. Despite the multideterminantal character of the transitions making the analysis more difficult, insights into their nature were gained by visualizing the orbitals of the dominant single-particle contributions for a selected number of transitions at various energies, for a small sample of structures within each family. Results for the four specific isomers depicted in Fig. 1b are given in Appendix C.3. In the present work, we refer to σ and π characters as local concepts because the isomers have no particular symmetry in general: σ orbitals are symmetric with respect to the considered C-C bond, whereas π orbitals are antisymmetric with respect to a local molecular plane.
The optical spectrum of the cage population, characterized by a high sp 2 ratio, exhibits a pronounced absorption band centered at 5.52 eV (225 nm or 4.45 µm −1 ) with a FWHM (obtained after continuum substraction) of 1.53 eV in the case of C 60 , corresponding to (σ, π)→π transitions (see Fig. C.1 with examples of involved orbitals). At higher energy, the intensity increases again, starting at ∼8 eV. When decreasing the size of the cluster to n = 42, the position of the maximum is found to be hardly affected (5.49 eV, FWHM = 1.49 eV), while the red tail of the highest-energy band appears redshifted. The case of C 24 is somewhat different because of the very limited number of cage isomers it supports (11 only). The spectra for subfamilies of a given sp 2 fraction are reported in Figs. 3a and b for C 60 and C 42 , respectively. Decreasing the number of sp 2 atoms hardly affects the position of the maximum (5.41-5.52 eV for C 60 , 5.42-5.47 eV for C 42 ). However, the shape of the bump appears sensitive to this order parameter in the case of C 60 and broadens with decreasing sp 2 fraction, the FWHM varying from 1.13 (100% sp 2 ) to 1.72 eV (93% sp 2 ).
The spectrum of the flake population is characterized by one broad band centered at 5.74 eV (215 nm or 4.66 µm −1 ) in the case of C 60 , and is followed by a plateau at higher energies. Upon decreasing the cluster size, the band maximum shifts towards higher energies, 6.17 eV for C 42 and 8.48 eV for C 24 . The shape of this band, which is particularly broad for the smaller cluster, suggests that it originates from the combination of different sub-bands. This is likely due to an evolution in the nature of the electronic transitions, from mostly π→π character at lower energies reaching ∼6-7 eV toward increasingly σ→π character at higher energies (see The spectra for subfamilies of a given sp 2 fraction are reported in Figs. 4a and b for C 60 and C 42 , respectively. Decreasing the number of sp 2 atoms hardly affects either the position of the maximum (5.69-5.77 eV in the case of C 60 ) or the shape of the bump. However, in the case of C 42 , the position appears slightly blueshifted when decreasing the sp 2 fraction, varying from 6.10 to 6.24 eV.
The spectrum of the pretzel population displays only one broad band with a maximum at 7.52 eV (165 nm or 6.07 µm −1 ) in the case of C 60 , which marks the onset of a plateau at higher energies. The position of the maximum tends to be shifted towards higher energies with decreasing cluster size. In the studied Finally, the combined spectrum of the branched structures shows one band in the selected energy range. This band is centered at 5.31 eV (233 nm or 4.28 µm −1 ) in the case of C 60 and  Fitzpatrick & Massa 2005 or in circumstellar environnments (CSM; Drilling et al. 1997;Buss et al. 1989;Waelkens et al. 1995;Gavilan et al. 2017) is also reported. Panel b: position of the first electronic transition, with dashed lines drawn to guide the eyes. astronomical features of the interstellar and circumstellar environments. The evolution of the first electronic transition energy as a function of size for all families and all cluster sizes is reported in Fig. 5b. The value of this gap is found to decrease with increasing cluster size, which was expected since π→π transitions are involved, and this gives credit to the method used.
In summary, the spectra of the cage population are hardly dependent on size and so are the spectra obtained for pretzels structures, albeit to a lesser extent. In particular, the position of the lowest-energy broad band found for the cages remains unchanged for C 42 and C 60 , while the red tail of the highestenergy band appears redshifted with decreasing cluster size. In the spectra of the flake and branched populations, the unique broad band becomes blueshifted and flattens out with decreasing cluster size. However, our approach does not allow us to analyze the origin of this plateau at higher energies that lie beyond the benchmarking range.
At this stage we showed how a quantum description of the electronic structure of carbon clusters classified into distinct families enables us to establish a correlation between the spectral and the structural features. In the next section we consider the contributions of the different families to the astronomical UV bump or to the experimental spectrum of carbon species produced in laboratory flames.

Comparison with astronomical and experimental data
In order to assess the possible contributions of the previously presented families to the astronomical UV bump, we attempted to fit the average galactic extinction curve reported by Fitzpatrick & Massa (2007) using the least-squares method including all populations of all carbon clusters sizes (see Fig. 6a). We focused on the UV bump only, i.e., we did not try to fit the far UV rise or the energy range lower than 4 eV (3.2 µm −1 ) as the lowestenergy part exhibits a shoulder near 2.8 µm −1 partly assigned to the contribution of larger grains in the dust distribution in astrophysical models (see Fig. 5 in Clayton et al. 2015, and references therein). Since the spectra of pretzels and branched isomers appear to have characteristic features far from those of the astronomical UV bump, we constrained in our fit the weights of pretzels and branched isomers to be lower than those of cages and flakes. This assumption is justified by the greater amount of sp 1 atoms these isomers contain and their larger amorphous character with respect to the cages and flakes families, consistently with the suggestion that the astronomical UV bump is due to the resonance of electrons in sp 2 orbitals of a conjugated system. As can be seen in Fig. 6a, a good agreement could be obtained for 55% of cages (24% for n = 60, 31% for n = 42) and 31% of flakes (n = 60). The presence of the cage population reproduces the top of the bump, but it cannot reproduce its red and blue tails, while the contribution of the flakes tends to correct this shortcoming. The isomers belonging to the cage and flake families possess mainly conjugated sp 2 atoms, but also a smaller fraction of sp 1 atoms that can be regarded as defects. In addition, more than half the total population that fits has a spherical character (cage family). These results are quite consistent with previous astrophysical modeling and experimental data which concluded that grain size and shape distributions are necessary to mimic the UV bump feature (Czyzak et al. 1981;Rouleau et al. 1997;Will & Aannestad 1999), and that pure graphite alone could not account for the 2175 Å feature. The cage family can be regarded as one layer of the carbon-onion structures, considered as good candidates to account for the UV bump (Chhowalla et al. 2003, and references therein). The TD-DFTB approach does not allow a reliable determination of optical transitions for the onion species, due to possible charge transfer between layers not being properly described. A specific study based on constrained DFTB (Scholz et al. 2013) or long-range-corrected TD-DFTB schemes (Kranz et al. 2017) would be required to address this specific issue.
For the present astronomical measurements in the UVvisible range, the calculated spectrum lacks intensity at energies above ∼6 eV, and we believe that polyaromatic hydrogenated compounds might help in solving this discrepancy. Such a hypothetical contribution is supported by earlier TD-DFT calculations by Malloci et al. (2005) who found a bump at ∼6 eV for various PAH isomers. These conclusions also confirm the need for considering, in accordance with the astrophysical environments, the inclusion of a small percentage of hydrogen in carbon clusters in future extensions of the present work. We now compare our results with experimental spectra of carbonaceous species produced in laboratory flames (Gavilan et al. 2017) using an approach similar to the one used to fit the UV bump. The corresponding results are given in Fig. 6b. The best fit was obtained for more than 60% of flake isomers (n = 60, 42, and 24) and 25% of cages. Adding small amounts of pretzels and branched structures in the relative weights improves the agreement by a small but noticeable amount, in particular the red tail of the experimental bump.
Interestingly, in the experimental soot spectrum the strong UV band in the 260-200 nm range shifts towards lower energies when the height above the burner (HAB) increases (Gavilan et al. 2017), whereas in the computed spectra of the flake families, the central wavelength of the UV bump decreases with increasing cluster size (see Fig. 5). This indicates that the farther away from the burner, the larger the flakes should be. At short distances from the burner (HAB = 14 mm), the position of the maximum is blueshifted with respect to that at 22 mm, and would instead correspond to a PAH population (see Fig. B.3). Therefore, our calculations, although in a restricted size range, support the interpretation of the difference in nanostructuration between the carrier of the interstellar bump at 217 nm and that of the circumstellar bump observed at various positions in the 250-230 nm range (Gavilan et al. 2017;Drilling et al. 1997;Waelkens et al. 1995).

Concluding remarks
The optical response of pure carbon clusters C n with n = 24, 42, and 60 was computationally determined using a multiscale approach based on the systematic identification of important families of isomers and the calculation of individual absorption spectra by means of time-dependent density functional tight-binding theory. The TD-DFTB method offers a good compromise in terms of chemical accuracy and numerical efficiency, which allowed tens of thousands of structures to be analyzed. With respect to earlier classical models, the detailed effect of chemical structure (shape and carbon hybridization) on the electronic spectrum was obtained from an approximated DFT scheme. The isomer populations identified in previous work (Dubosq et al. 2019), and especially those of cage structures, were found to be of astrophysical relevance and were possibly contributing to the 6-9 µm plateau observed in planetary nebulae. Here a significant extension was provided by considering the effects of electronic excitations and the optical spectra in the UV-visible range. A few limitations of the present work which deserve more scrutiny in the future were noted. On the technical side, the optical spectra are reliable only in a rather limited range of excitation energies ( 8 eV) and cannot properly describe the contribution to the far-UV rise. Furthermore, our sample of conformations was computationally limited to three cluster sizes, instead of a more likely broad distribution of sizes, or larger clusters with a stronger graphitic character.
By assigning global absorption spectra from the (unweighted) individual contributions within each population, the present work showed that cages and flakes are likely contributors of soot constituents and carriers of the interstellar UV bump with a major contribution of cage structures to the astronomical spectra. Likewise, flakes were found to be the dominant motif to interpret measurements on laboratory analogs of carbonaceous dust. Both comparisons confirm the origin of the UV bump that is commonly assigned to π→π transitions. In particular, our quantum modeling confirms the potential contribution of mostly spherical carbon clusters, and to a lesser extent planar nanostructures, with a high fraction of conjugated sp 2 atoms to the astronomical UV bump. Interestingly, some of these isomers are potentially involved in the formation mechanism of buckminsterfullerene, which has recently been detected in space. The advantage of the approach developed in the present work is its transferable character and ability to treat chemically different systems and ionic states. The present investigation, although restricted to pure carbon clusters, paves the way for similar studies on hydrogenated clusters and, more generally, A62, page 7 of 13 A&A 634, A62 (2020) for clusters containing heteroatoms such as oxygen or nitrogen. It will be particularly interesting to apply our methodology to hydrocarbon compounds at various hydrogen loadings, notably including -but not limited to -fully or partially hydrogenated PAHs.  .1a and b show the two-dimensional distributions of isomers projected onto the (% sp 2 ratio, β) plane from which the Table A.1. Ranges in asphericity β and % sp 2 covered by the four isomer families and number # of family members for which the TD-DFTB spectra were calculated. it was necessary to assess the accuracy of its predictions against reference data. For disordered carbonaceous clusters, no experimental data on their optical spectra are available and we resorted to dedicated TD-DFT calculations. For PAH molecules Halasinski et al. 2005;Parac & Grimme 2003) and carbon chains (Maier 1998(Maier , 1997, TD-DFT calculations and experimental data are both available and those systems were also used for benchmarking. The vertical TD-DFT calculations performed in this work were achieved using the BLYP functional and the pc-2 basis set, with the geometries optimized in their ground electronic states at the same level of theory.

Appendix A: Structural diversity of C 42 and C 24
TD-DFTB vs. TD-DFT for various isomers of pure carbon clusters The comparison was limited to 15 isomeric structures of C 24 due to increased difficulties in the analysis of larger systems, plus C 60 buckminsterfullerene. In order to assign the computed transitions at TD-DFTB level to those at TD-DFT(BLYP/pc-2) level, we examined the orbitals involved in the lowest-energy transitions. High-energy transitions can hardly be analyzed due to the extreme multideterminantal character of the involved excited states. As seen in Fig. B.1, the TD-DFTB and TD-DFT transition energies are found to be highly correlated, with a Pearson linear correlation coefficient of 0.9931 ± 0.0071.
TD-DFT vs. experimental spectra for PAHs and small carbon chains Figure B.2 shows the correlation between the experimentally measured bands and the excitation energies obtained at the TD-DFT(BLYP/pc-2) level. The data were associated with each other based on the order of appearance of the band and the intensity. This correlation plot was constructed from a set of 36 molecules (26 PAHs, 8 carbon chains, and the C 60 and C 70 fullerenes in their lowestenergy structures). Again, the experimental and TD-DFT(BLYP/ pc-2) absorption bands appear highly correlated with one another and have a Pearson linear correlation coefficient of 0.9951 ± 0.0072.
Finally, the two series of results on carbon clusters and PAHs together with small carbon chains were combined, allowing us to define a global scaling factor of 0.988 ± 0.010 between the band computed at the TD-DFTB level and the reference data.

B.2. Assessment of the TD-DFTB scaling factor for PAH populations and fullerenes
As the initial benchmark was achieved on a sample of C 24 structures alone, the validity of our method was also evaluated on fullerene isomers and on PAH populations, which are both relevant to the current populations of cages and flakes identified in our systematic sampling. The optical absorption spectra of C 60 buckminsterfullerene and of C 70 fullerene computed at the TD-DFTB and TD-DFT(BLYP/pc-2) levels are represented in Fig. B.4, the positions of the main absorption bands being given in Tables B.1 and B.2, respectively.
To test the accuracy of our approach on a broader population of molecules, additional calculations were performed on the set of PAH molecules reported by Malloci et al. (2007b). The geometries were first reoptimized at the SCC-DFTB level of theory, and their averaged TD-DFTB spectrum was computed as  Notes. Experimental values are from Bauernschmitt et al. (1998) andSmith (1996).
the sum of individual spectra, weighted by the number of carbon atoms.
As can be seen in Fig. B.3, similar trends are observed for TD-DFTB and TD-DFT spectra up to ∼8.2 eV or 6.6 µm −1 . Above this limit, the oscillator strength continues decreasing in the TD-DFTB spectrum, while it increases with TD-DFT Notes. Experimental values are from Bauernschmitt et al. (1998) andSmith (1996). The red and black curves were normalized as proposed by these authors. (Malloci et al. 2007b). This difference can be explained by the intrinsic approximation of the DFTB approach, which relies on a confined minimal valence basis set. The TD-DFTB method thus clearly lacks a description of Rydberg orbitals that would be needed to properly account for higher-energy transitions above 8.2 eV.

B.3. Reduction of the TD-DFTB linear response matrix
To reduce the computational cost and enable thousands of structures to be processed, the TD-DFTB linear response matrix was truncated following the procedure detailed here. In our case the initial linear response matrix is a square matrix whose dimension is equal to the square of the number of occupied orbitals multiplied by the number of unoccupied orbitals. For instance, in the case of C 60 the size of the LR matrix amounts to 14 400 × 14 400. Due to the computational cost of the diagonalization procedure of the full LR matrix, a reduced matrix was considered by conserving only lower-energy single-particle excitations. Typically, we eliminate single-particle excitations with energies higher than 24 eV. This choice for the size of the response matrix offers a good compromise between accuracy and computational cost. It should be kept in mind that truncating the linear response matrix in this way introduces significant errors for the computed high-energy excitations, but poorly affects the low-energy excitations. We checked the validity of the approximation by insuring that the spectra were hardly modified after truncating the LR matrix (see, e.g., Fig. B.4).

Appendix C: Complementary data
C.1. Data related to computed optical spectra Table C.1 gives the main spectral features (positions and widths) of the average absorption band obtained for the four families and the three cluster sizes, as shown in Fig. 2.

C.2. Comparison with experiments
Table C.2 below gives the detailed weights employed to reproduce the experimental and astronomical spectra from the various populations of isomers and the three cluster sizes.

C.3. Nature of orbitals involved in the electronic transitions
The orbitals involved in transitions at two energies (5.3 and 8.1 eV) for the particular cage isomer of C 60 shown in Fig. 1(b) are depicted in Fig. C.1. As shown on the left side, all depleted orbitals (i.e., orbitals lacking one electron) have a mixed π and σ character. On the right side, target orbitals carrying one extra electron during the excitation are of π type. Likewise, the orbitals involved in transitions at various energies (2.1, 4.6, 5.8, and 7.7 eV) for the specific flake isomer of C 60 in Fig. 1b are represented in Fig. C.2. Here all depleted orbitals are of π character except orbitals 63 and 68, which are of σ type. Target orbitals are all of π type. Interestingly, some virtual (target) orbitals such as orbitals 127 and 123 are mostly located at the edges of the flake, highlighting differences with the orbitals involved in the electronic transitions of PAHs where the carbon atoms are bonded to hydrogen.
The orbitals involved in one intense transition at 6.6 eV for the pretzel isomer of C 60 shown in Fig. 1b are depicted in Fig. C.3. For this isomer, the depleted and target orbitals are of π and π types, respectively.
Finally, the orbitals involved in one intense transition at 4.9 eV for the branched isomer of C 60 shown in Fig. 1b are represented in Fig. C.4. The depleted and target orbitals are also of π and π types, respectively.