We explored several different settings for the PIKAIA control parameters, because the tuning of those parameters is very problem dependent (Charbonneau & Knapp 1996). The values we decided to use are listed in Table 1. Four notes can be made here.

Firstly, we adopted the steady-state-delete-worst reproduction plan (irep=3), in which a new individual replaces the least-fit member of the population whenever its fitness is superior to that of the least-fit member. Choosing this reproduction plan implies that the elitism control parameter (ielite) is non-operative, because elitism is implicit in this plan. We evaluated two other reproduction plans (Charbonneau & Knapp 1996): full generational replacement and steady-state-delete-random. The steady-state-delete-worst reproduction plan produced on average the best results.
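The replacement rule described above can be illustrated with a minimal Python sketch. The function name `steady_state_delete_worst` and the list-based population are assumptions made for illustration; this is not PIKAIA's Fortran routine.

```python
def steady_state_delete_worst(population, fitness, child, child_fitness):
    """Insert `child` in place of the least-fit member if it is fitter.

    `population` and `fitness` are parallel lists and are modified in place.
    Returns True when the child was accepted, False otherwise.
    """
    # Index of the least-fit member of the current population.
    worst = min(range(len(fitness)), key=fitness.__getitem__)
    if child_fitness > fitness[worst]:
        population[worst] = child
        fitness[worst] = child_fitness
        return True
    return False


# Hypothetical three-member population, for illustration only.
population = ["a", "b", "c"]
fitness = [0.30, 0.10, 0.25]
accepted = steady_state_delete_worst(population, fitness, "d", 0.20)
rejected = steady_state_delete_worst(population, fitness, "e", 0.05)
```

Because only the least-fit member can ever be displaced, the fittest individual always survives, which is why a separate elitism switch has no effect under this plan.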

Secondly, the mutation rate of 0.35 corresponds, for the default two-digit accuracy, to an average of 2.8 mutations per astrophysical parameter.

Thirdly, the fitness differential parameter fdif, a measure of the selection pressure, would normally be chosen as high as possible (fdif=1 in this case). However, it may be possible to circumvent local minima by lowering that value slightly (Charbonneau & Knapp 1996). Setting fdif=0.95 turned out to be a good trade-off.
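To make the role of a selection-pressure parameter concrete, the sketch below computes rank-based selection probabilities with a tunable differential. The linear weighting formula and the function names are illustrative assumptions; they are not claimed to reproduce PIKAIA's internal selection routine.

```python
def rank_selection_weights(n_pop, fdif):
    """Relative selection weight per rank (rank 1 = fittest).

    fdif = 0 gives every rank the same weight (no selection pressure);
    fdif = 1 gives the strongest linear bias toward the fittest ranks.
    The linear form is an assumption made for this illustration.
    """
    if not 0.0 <= fdif <= 1.0:
        raise ValueError("fdif must lie in [0, 1]")
    np1 = n_pop + 1
    return [np1 + fdif * (np1 - 2 * rank) for rank in range(1, n_pop + 1)]


def selection_probabilities(n_pop, fdif):
    """Normalize the rank weights so that they sum to one."""
    weights = rank_selection_weights(n_pop, fdif)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical population of 10 individuals.
probs_full = selection_probabilities(10, 1.0)   # maximum pressure
probs_used = selection_probabilities(10, 0.95)  # slightly relaxed pressure
```

In this toy model, lowering the differential from 1 to 0.95 slightly raises the chance that low-ranked individuals are selected as parents, which keeps more diversity in the population.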

Fourthly, we want to explore as large a fraction of the parameter space as possible at the first entry into AMORE. This is done by using only a one-digit accuracy (nd=1). Because the parameter space boundaries are actively re-scaled, we do not require a very high precision in our exploration. A one percent accuracy (nd=2) of the parameter space is sufficient in the subsequent PIKAIA cycles.
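The relation between the number of digits and the resolution of the search can be sketched as follows, assuming each parameter is normalized to [0, 1] and stored as nd decimal digits. The helper names `encode` and `decode` are hypothetical, and truncation (rather than rounding) is an assumption of this sketch.

```python
def encode(x, nd):
    """Encode a normalized parameter x in [0, 1] as a list of nd decimal digits."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be normalized to [0, 1]")
    scale = 10 ** nd
    # Truncate to nd digits; clamp so that x = 1.0 maps to all nines.
    value = min(int(x * scale), scale - 1)
    return [(value // 10 ** (nd - 1 - i)) % 10 for i in range(nd)]


def decode(digits):
    """Invert encode(): map a digit list back to a value in [0, 1)."""
    value = 0
    for d in digits:
        value = 10 * value + d
    return value / 10 ** len(digits)


coarse = encode(0.25, 1)  # one digit: steps of 10 percent of the range
fine = encode(0.25, 2)    # two digits: steps of 1 percent of the range
```

With nd=1 each parameter can take only 10 distinct values within its current limits, so a coarse first pass covers the whole range cheaply; with nd=2 the step is one percent of the range.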

In biological terms, the PIKAIA control parameters define the ecosystem in which our population evolves.

All computations presented in this paper were performed with an executable generated with the g77 compiler. This executable was then installed on various PCs running Red Hat Linux 6.X and 7.X. The PCs were equipped with Intel Pentium III or Athlon processors with clock speeds ranging from 600 to 1200 MHz.

All tests, unless stated otherwise, use the synthetic population as described by Ng (1998):

**Table 3:** Average fitness ( $\overline {f_{\rm A}}$ ) for different values of the parameters as obtained from Table A.1. The first column gives the name of the parameter that is varied and the second column its value. The third and fourth columns give $\overline {f_{\rm A}}$ and its standard deviation for all models, i.e. without removing the models with `pcorr` = 0.0. The fifth and sixth columns give the same quantities after removal of the models with `pcorr` = 0.0. See Sect. 5.1.2 for additional details.
parameter	value	$\overline {f_{\rm A}}$	$\sigma _{n-1}$	$\overline {f_{\rm A}}$	$\sigma _{n-1}$
`pcross`	0.50	0.276	0.070	0.287	0.066
	0.85	0.299	0.063	0.297	0.056
`rcross`	1.00	0.293	0.065	0.290	0.063
	2.00	0.285	0.068	0.297	0.065
	3.00	0.285	0.070	0.290	0.057
`rbrood`	1.00	0.286	0.077	0.230	0.059
	2.00	0.292	0.058	0.301	0.055
	4.00	0.285	0.066	0.276	0.067
`pcreep`	0.0	0.287	0.071	0.292	0.068
	0.3	0.290	0.065	0.297	0.058
	0.7	0.285	0.066	0.288	0.060
`pcorr`	0.0	0.278	0.077	NA	NA
	0.3	0.285	0.065	NA	NA
	0.7	0.300	0.057	NA	NA

4.3 Description of the tests

4.3.1 Test 1: Determining values for pcross, rcross, rbrood, pcreep, and pcorr

In the first test, we evaluate the 162 models listed in Table A.1 in order to study the effect of the PIKAIA parameters pcross, rcross, rbrood, pcreep, and pcorr on the convergence and the computational effort. A secondary objective of this test is to gain an understanding of the degeneracy of the parameter space.

All astrophysical parameters to be retrieved are left free, floating between reasonable minimum and maximum values (see Table 2 for details). AMORE runs for 20 iterations of 20 generations each (ngen=20) to recover the a priori known parameters of the synthetic population. The number of iterations and the number of generations per iteration determine the total length of an evolutionary run: $20 \times 20 = 400$ generations. Note that the range of each parameter is kept within reasonable limits and is not taken excessively large, because an overly wide range might mean that no acceptable parameter setting is found within the standard iteration loop.

4.3.2 Test 2: Rounding

4.3.3 Test 3: Fixing parameters at the correct value

**Table 4:** The effect of rounding in the parameters on the fitness evaluation; $f_{\rm trial}$ is the fitness obtained with *AMORE* for the given values of the parameters. The *average value* and $\sigma$ are based on the 162 models in Table A.1. The $\sigma$ is obtained from an unweighted average for the fitness of the models involved. The $\widehat{\sigma}$ is the error as estimated by Ng (1998).
parameter	$\log d$ (pc)	$A_{\rm V}$	$\log t_{\rm low}$	$\log t_{\rm high}$	$[Z]_{\rm low}$	$[Z]_{\rm high}$	$\alpha$	$\beta$	$f_{\rm trial}$
original	3.906335	$0.\!\!^{\rm m}00$	9.90309	9.95424	$-0.60206$	0.17609	2.35	1.0	0.44597
round-v1	3.906	$0.\!\!^{\rm m}00$	9.903	9.954	$-0.60$	0.18	2.35	1.0	0.28595
round-v2	3.906	$0.\!\!^{\rm m}00$	9.903	9.954	$-0.602$	0.176	2.35	1.0	0.30812
round-v3	3.9063	$0.\!\!^{\rm m}00$	9.9031	9.9542	$-0.602$	0.176	2.35	1.0	0.42439
average value	3.8958	$0.\!\!^{\rm m}027$	9.866	9.984	$-0.554$	0.244	2.358	1.574
$\sigma$	0.0033	$0.\!\!^{\rm m}014$	0.049	0.047	0.050	0.13	0.034	1.40
$\widehat{\sigma}$	0.012	$0.\!\!^{\rm m}06$	0.043	0.023	0.18	0.08	0.03	1.4
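The three rounding variants in Table 4 can be reproduced from the original values with a short Python sketch. The number of decimals per parameter is read off the table; the dictionary keys are hypothetical names chosen for this illustration. The parameters $A_{\rm V}$, $\alpha$ and $\beta$ are identical in all four rows and are therefore omitted.

```python
# Original (unrounded) values from the first row of Table 4.
ORIGINAL = {
    "log_d": 3.906335,
    "log_t_low": 9.90309,
    "log_t_high": 9.95424,
    "Z_low": -0.60206,
    "Z_high": 0.17609,
}

# Decimal places kept per parameter in each variant (read off Table 4).
DECIMALS = {
    "round-v1": {"log_d": 3, "log_t_low": 3, "log_t_high": 3, "Z_low": 2, "Z_high": 2},
    "round-v2": {"log_d": 3, "log_t_low": 3, "log_t_high": 3, "Z_low": 3, "Z_high": 3},
    "round-v3": {"log_d": 4, "log_t_low": 4, "log_t_high": 4, "Z_low": 3, "Z_high": 3},
}


def round_parameters(params, decimals):
    """Round every parameter to its own number of decimal places."""
    return {name: round(value, decimals[name]) for name, value in params.items()}


rounded = {variant: round_parameters(ORIGINAL, dec) for variant, dec in DECIMALS.items()}
```

The rounded values differ from the originals by at most a few times $10^{-3}$, yet Table 4 shows the fitness dropping from 0.44597 to 0.28595 for round-v1.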

In the third test we take six models in which one of the parameters is fixed at its correct value, in order to study the effect on the convergence. The models chosen were two of high, two of intermediate and two of low fitness, as determined from the first test. The convergence in this test can basically go two ways: either it is faster, because fewer parameters have to be optimized, or it is slower, because AMORE has less maneuverability in this situation. We adjusted the limits for age and metallicity given in Table 2 such that AMORE would not try to find solutions in forbidden regions of parameter space, which might severely slow down convergence because AMORE would constantly reject the chosen parameter values.

For example, fixing the $[Z]_{\rm low}$ parameter at its correct value of $-0.60206$ means that we have to adjust the lower limit for $[Z]_{\rm high}$ to $-0.60206$ as well.

Fixing the $\log t_{\rm low}$ parameter also implies that the initial guess has to be adjusted. We set this initial guess to 10.1.
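The limit adjustment described above can be sketched in Python. The numerical limits below are placeholders chosen for illustration only (the actual values are those of Table 2), and `fix_low_parameter` is a hypothetical helper that handles only the case of pinning a "low" parameter.

```python
# Placeholder limits for illustration only; the real ones are listed in Table 2.
limits = {
    "Z_low": (-1.0, 0.5),
    "Z_high": (-1.0, 0.5),
    "log_t_low": (9.0, 10.2),
    "log_t_high": (9.0, 10.2),
}

# Each "low" parameter must not exceed its "high" partner.
PARTNER = {"Z_low": "Z_high", "log_t_low": "log_t_high"}


def fix_low_parameter(limits, name, value):
    """Pin `name` at `value` and raise the partner's lower limit to `value`.

    Returns a new dictionary; the input limits are left untouched.
    """
    partner = PARTNER[name]
    lower, upper = limits[partner]
    if value > upper:
        raise ValueError("fixed value lies above the partner's upper limit")
    adjusted = dict(limits)
    adjusted[name] = (value, value)                # parameter is no longer free
    adjusted[partner] = (max(lower, value), upper)  # exclude the forbidden region
    return adjusted


adjusted = fix_low_parameter(limits, "Z_low", -0.60206)
```

Raising the lower limit of the partner up front means that no trial solutions with $[Z]_{\rm high} < [Z]_{\rm low}$ are generated, so none have to be rejected during the run.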

4.3.4 Test 4: Fixing parameters at the wrong value

In the fourth test we take six models in which one of the parameters is fixed at a $1\sigma$ offset (determined from the first test) from its original value, in order to study the effect on the "second best" setting of the remaining parameters. Normally one would expect a fitness $f\!>\!{1\over3}$. In this case, however, $F\!<\!F_{\rm P}^2+F_\chi^2 = 1^2+(1+1)^2 = 5$ and the associated fitness constraint drops to $f\!>\!{1\over6}$. This assessment ignores the fact that, when a parameter is offset from its optimum value, the number of matched points decreases and $F_{\rm P}$ increases. Using Eq. (6) one has $F\!=\!2$ for a good fit. On average the offset per parameter $k$ from the optimum value is $\sqrt{1\over4}~\sigma_k\!=\!{1\over2}~\sigma_k$; at best the offset is $0~\sigma_k$, and in the worst case it is $\sqrt{2}~\sigma_k$. So with one parameter $k$ put at a $1~\sigma_k$ offset we distinguish three possibilities:

1. at best: $F\!=\!(1+2) = 3 \rightarrow f\!=\!{1\over4}$
2. on average: $F\!=\!(1+{1\over2})^2+{7\over4} = 4 \rightarrow f\!=\!{1\over5}$
3. at worst: $F\!=\!(1+\sqrt{2})^2 = 5.8 \rightarrow f\!=\!{1\over6.8}$

Note that the worst case limit is in agreement with the results presented in Table A.3.
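The three cases can be checked numerically with a short Python sketch. It rests on assumptions that are inferred rather than stated explicitly here: the mapping $f = 1/(1+F)$ is deduced from the quoted pairs ($F = 2 \rightarrow f = {1\over3}$, $F = 5 \rightarrow f = {1\over6}$), the count of eight parameters is taken from the columns of Table 4, and the offsets are assumed to enter $F$ as squared terms.

```python
import math

N_PARAMS = 8   # number of astrophysical parameters (columns of Table 4)
F_GOOD = 2.0   # merit value quoted in the text for a good fit


def fitness(F):
    """Assumed mapping f = 1 / (1 + F), inferred from the quoted (F, f) pairs."""
    return 1.0 / (1.0 + F)


def merit_with_offset(own_offset, forced_offset=1.0):
    """F after shifting one parameter by `forced_offset` sigma.

    `own_offset` is the offset (in sigma) that parameter already had;
    the remaining parameters keep their share, F_GOOD - own_offset**2.
    """
    rest = F_GOOD - own_offset ** 2
    return (own_offset + forced_offset) ** 2 + rest


# Pre-existing offset of the shifted parameter in each scenario.
cases = {
    "at best": 0.0,                                 # parameter was at its optimum
    "on average": math.sqrt(F_GOOD / N_PARAMS),     # equal share: 1/2 sigma
    "at worst": math.sqrt(F_GOOD),                  # it carried all of F: sqrt(2) sigma
}

results = {}
for label, own in cases.items():
    F = merit_with_offset(own)
    results[label] = (F, fitness(F))
```

Under these assumptions the sketch returns $F = 3$, $4$ and about $5.8$, matching the values listed above.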

The effect of the $1\sigma$ offset of one of the parameters will partly be canceled by forcing other parameters away from their optimum values. For example, the effect of an increased extinction can be masked partially by generating a bluer stellar population with a lower metallicity and a younger age. As a result, the fitness will not be around $f\!\simeq\!{1\over{6.8}}$, but somewhere in the range ${1\over{6.8}}\!<\!f\!<\!{1\over3}$.

We fixed the parameters both at one sigma above and one sigma below the original value, because the evolutionary effects do not have to be symmetric. The only exception is the extinction, which we fix only at one sigma above the original value of $A_{\rm V} = 0.0$.

Again, we adjusted the upper and lower limits for age and metallicity.
