A&A, Volume 675, July 2023, Article Number A60
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/202346207
Published online: 30 June 2023

© The Authors 2023

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. This article is published in open access under the Subscribe to Open model. Open Access funding provided by Max Planck Society.

1 Introduction

Very long baseline interferometry (VLBI) is a radio interferometric technique in which all antennas in the array observe the same source at the same time. The recorded signals at each antenna pair in the array are correlated, gradually sampling the Fourier transform of the true image brightness distribution of the observed source at Fourier frequencies determined by the baselines projected on the sky plane. Imaging, that is, the procedure of creating an image from sparsely sampled Fourier coefficients (visibilities), is a challenging, ill-posed inverse problem.

Three main families of imaging algorithms in radio astronomy have been proposed by the community: CLEAN-based algorithms (Högbom 1974; Clark 1980; Bhatnagar & Cornwell 2004; Cornwell 2008; Rau & Cornwell 2011; Müller & Lobanov 2023a); maximum entropy-based and regularized maximum likelihood-based (RML) algorithms (Cornwell & Evans 1985; Chael et al. 2016, 2018; Akiyama et al. 2017a,b; Müller & Lobanov 2022); and Bayesian-based algorithms (Broderick et al. 2020; Arras et al. 2021; Tiede 2022). Since the point spread function (PSF) has a nonvanishing kernel (i.e., missing data), the image reconstruction is not unique: the solution to the unpenalized optimization problem is multivalued. This inherent degeneracy is addressed by regularization, either by imposing support constraints manually (by placing “CLEAN windows”) in CLEAN or by adding a penalty term to the objective functional in RML. However, the result is that only one representative solution is recovered, and it does not reflect the problem of missing data, namely that more than one model fits the visibilities and a unique, mathematically ideal solution does not exist. Instead, for these methods the problem of missing data can be addressed by testing the reconstructions with different combinations of penalizations, as done in Event Horizon Telescope Collaboration (2019, 2022). Alternatively, the third family (i.e., the Bayesian methods) can be used, as it imposes a prior distribution and looks for all possible images that fit the data, but the high computational cost makes this approach very slow, and it requires powerful machines.

The RML methods minimize the weighted sum of a data fidelity term (ensuring proximity to the observed data) and a regularization term (ensuring simplicity and fidelity of the guess solution). Viable regularization terms used in frontline VLBI applications, such as observations with the Event Horizon Telescope (EHT), include total flux and non-negativity constraints, smoothness assumptions (total variation, total squared variation), entropy functionals, and sparsity priors. For more details, we refer to the discussion in Event Horizon Telescope Collaboration (2019). More recently, multiscalar penalization terms have been proposed (Müller & Lobanov 2022, 2023b). Most RML techniques use local search based on quasi-Newton approaches. While the reconstructions are generally excellent, with improvements over CLEAN in resolution, accuracy, and dynamic range, in particular for very sparse data (see, e.g., the comparisons in Event Horizon Telescope Collaboration 2019, 2022; Müller & Lobanov 2022; Roelofs et al. 2023), this strategy comes with two major drawbacks.

The first drawback is that the landscape of the objective is highly complex. Hence, local optimization strategies can easily become trapped in local minima instead of reaching the global minimum. Since station-based gains have a priori known uncertainties (which can, however, be unbound in one direction in the case of uncharacterized telescope errors) that need to be self-calibrated during the imaging, current RML pipelines use the closure phases and closure amplitudes instead of the visibilities in a first round of self-calibration (Event Horizon Telescope Collaboration 2019, 2022). The hybrid imaging problem (i.e., the common reconstruction of the gains and the image in alternating imaging and self-calibration rounds) is nonconvex. Similarly, the projection onto the calibration-independent closure quantities is nonconvex as well, and therefore the use of gradient descent-based optimization strategies is further questioned (i.e., the optimization problem is multimodal). Although this issue may be addressed effectively by systematic tests of various starting models, regularization parameter combinations, and reconstruction methods (as done in Event Horizon Telescope Collaboration 2019, 2022), a more global optimization strategy is desired.

The second issue is that with a larger number of possible regularization terms that need to be combined to achieve an ideal image, it is not known a priori which selection of weighting parameters to choose. This problem is typically solved with a brute-force approach: a library of synthetic data is created and studied with every possible parameter combination (a parameter survey). Only the parameter combination that passes several acceptance criteria is used for the analysis of the real observational data. This procedure is tedious and time consuming. Moreover, it is poorly motivated: the set of test images could impact the top-set selection and thus the quality of the reconstruction. A multiobjective algorithm that evolves the subspace of nondominated solutions in parallel and selects the ideal hyperparameter array automatically is therefore needed.

Overall, applications to very sparse VLBI data sets, such as in Event Horizon Telescope Collaboration (2019, 2022), were successful in addressing both of the issues raised above related to the multimodality and multiobjectivity of the problem by parameter surveys, by combining the reconstructions of different methods and teams, and by extensive testing of the data set. This strategy allowed for strong indications of the fidelity of the ring-like appearance of the black hole shadow. In this work, we build upon the success of such a survey strategy in identifying the fidelity of the recovered images but look for a reformulation of the problem that allows for a faster, less time-consuming alternative. We present a novel imaging algorithm that provides an independent, alternative claim on the morphology of the image. Moreover, this algorithm may provide the possibility to accelerate parameter surveys. To this end, we present a novel formulation for RML-like problems adapted to the multimodality and multiobjectivity of the problem. All the solutions calculated in a parameter survey span a subspace of “optimally balanced” (or more correctly, nondominated) solutions. Instead of computing this subspace by independent optimizations on a regular grid of coordinates and selecting the best representative (i.e., computing a parameter survey), we aim for the complete subspace to be the result of our optimization procedure. This new modeling consists of solving a multiobjective optimization problem (MOP) where the objective function is a combination of the most commonly used regularizing terms in current RML methods (Chael et al. 2016, 2018; Akiyama et al. 2017a,b). We solve this problem by using the global search technique of genetic algorithms (GA); in particular, we utilize the Multiobjective Evolutionary Algorithm Based on Decomposition (MOEA/D) algorithm (Zhang & Li 2007). In this way, we avoid falling into one local minimum, and we ensure more diversity in the solutions. We compute a set of candidate solutions that are nondominated (i.e., optimal with respect to the corresponding regularization). Our strategy is similar in philosophy to the parameter surveys for RML methods but not equivalent to them since we use a different optimality concept. Instead of searching over several hyperparameter combinations, we jointly evolve the solutions to all hyperparameter combinations together (i.e., we speed up the exploration of hyperparameters drastically through genetic crosstalk of similar regularizer weight combinations). In a subsequent paper, we will extend the problem to capture dynamics (and then to reconstruct “movies” instead of snapshots) and polarimetry.

In contrast to CLEAN and RML methods, we do not recover only one representative solution. Instead, the hypersurface of all nondominated (locally optimal) models is recovered. Every model in this surface of possible solutions corresponds to a specific regularizer weight combination, and they are all explored in parallel. This diversity of solutions represents the full set of possible image morphologies. To find the most natural solution within this hypersurface, we propose two different approaches: first, calculating the minimal distance to the ideal point and, second, looking for accumulation points. Though Bayesian methods are also able to reconstruct a set of candidate solutions related to the multimodality of the problem, the parameter space they explore makes them slow for that purpose (they require hours or days of computation), despite considerable recent progress in speeding up their performance (e.g., Knollmüller & Enßlin 2019; Tiede 2022) and their consequent application even to large data sets (e.g., Arras et al. 2021; Tychoniec et al. 2022). In contrast, the set of candidate solutions computed by MOEA/D does not have a Bayesian interpretation as a posterior distribution, and hence it does not allow for an uncertainty quantification.

The paper is organized as follows: in Sect. 2, we describe the basics of VLBI measurements and image reconstruction using RML methods. In Sects. 3 and 4, we give a short overview of multiobjective optimization and introduce the terms that will be useful later, together with a global search technique used to solve these types of problems. We present the model of our problem in Sect. 5; every solution of this problem is an optimal image. We test our algorithm and discuss several points in Sect. 6 (synthetic data) and Sect. 7 (real data). For all of these tests, we use self-calibrated data. In Sect. 8, we run the algorithm on data that are not self-calibrated, and we study the importance of the initial point for constraining the problem when there is no self-calibration model in the case of sparse uv coverage, while in cases with better uv coverage, no extra constraint is needed to recover the intrinsic source structure. The main part of this first paper ends in Sect. 9, which contains the conclusions of the work. Appendices are included to keep the main text concise.

2 VLBI measurements and imaging

An interferometer consists of a set of T telescopes observing the same source at the same time. The signal recorded at two independent stations is correlated. This correlation product is called the visibility 𝒱(u, υ), with harmonic coordinates (u, υ) determined by the baseline of the antenna pair. The true sky brightness distribution I(l, m) and 𝒱 are related by the van Cittert–Zernike theorem (Thompson et al. 1994):
$\mathcal{V}(u,\upsilon) = \int\!\!\int I(l,m)\, e^{-2\pi i (lu + m\upsilon)}\, \mathrm{d}l\, \mathrm{d}m.$ (1)

With a full aperture, the true image could be recovered from the visibilities by an inverse Fourier transform. However, for VLBI, only a sparse subsample of the Fourier coefficients is measured. We call the subsample of measured Fourier frequencies the UV coverage. The measured visibility between one pair of antennas (indexed by i, j ∈ {1, 2, …, T}) at time t is corrupted by station-based gains ɡi and additional thermal noise Ni,j specific to the baseline, such that the measured visibility on this baseline reads:
$V(i,j,t) = g_i g_j^{*}\, \mathcal{V}(i,j,t) + N_{i,j}.$ (2)

Closure quantities are gain-independent quantities derived from the observed visibilities, that is, the closure phases over a triangle of antennas i, j, k ∈ {1, 2, …, T}:
$\Psi_{ijk} = \arg\left( V_{ij} V_{jk} V_{ki} \right),$ (3)

and the closure amplitudes over a quadrangle of antennas i, j, k, l ∈ {1, 2, …, T}:
$A_{ijkl} = \frac{\left| V_{ij} \right| \left| V_{kl} \right|}{\left| V_{ik} \right| \left| V_{jl} \right|}.$ (4)
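As an illustration of Eqs. (3) and (4), the short sketch below computes a closure phase and a log closure amplitude directly from complex visibilities. The dictionary layout of the visibilities and the numerical values are assumptions made for this example only, not part of any published pipeline.

```python
import numpy as np

def closure_phase(vis, i, j, k):
    """Closure phase (Eq. 3) over the antenna triangle (i, j, k).
    vis[(a, b)] is assumed to hold the complex visibility V_ab."""
    return np.angle(vis[(i, j)] * vis[(j, k)] * vis[(k, i)])

def log_closure_amplitude(vis, i, j, k, l):
    """Logarithm of the closure amplitude (Eq. 4) over the quadrangle (i, j, k, l)."""
    return (np.log(np.abs(vis[(i, j)])) + np.log(np.abs(vis[(k, l)]))
            - np.log(np.abs(vis[(i, k)])) - np.log(np.abs(vis[(j, l)])))

# Hypothetical visibilities for four stations at one time stamp:
vis = {(0, 1): 0.4 + 0.1j, (1, 2): 0.3 - 0.2j, (2, 0): 0.5 + 0.05j,
       (0, 2): 0.5 - 0.05j, (1, 3): 0.2 + 0.1j, (2, 3): 0.35 + 0.0j}
print(closure_phase(vis, 0, 1, 2))
print(log_closure_amplitude(vis, 0, 1, 2, 3))
```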

For RML methods, we optimize not only a single but several objective functionals (Chael et al. 2016, 2018; Akiyama et al. 2017a,b; Event Horizon Telescope Collaboration 2019; Müller & Lobanov 2022). These include data fidelity terms that measure the fidelity of the guess solution to the observed data, for example, the fit quality to the visibilities:
$S_{\mathrm{vis}}(\mathcal{V}, V) = \frac{1}{N_{\mathrm{vis}}} \sum_{i=1}^{N_{\mathrm{vis}}} \frac{\left| \mathcal{V}_i - V_i \right|^2}{\Sigma_i^2},$ (5)

with the number of observed visibilities Nvis, visibilities of the guess solution 𝒱i, and error Σi. Moreover, the fit to the amplitudes:
$S_{\mathrm{amp}}(\mathcal{V}, V) = \frac{1}{N_{\mathrm{vis}}} \sum_{i=1}^{N_{\mathrm{vis}}} \frac{\left( \left| \mathcal{V}_i \right| - \left| V_i \right| \right)^2}{\Sigma_i^2},$ (6)

to the closure phases:
$S_{\mathrm{cph}}(\mathcal{V}, V) = \frac{1}{N_{\mathrm{cph}}} \sum_{i=1}^{N_{\mathrm{cph}}} \frac{\left| \Psi_i(\mathcal{V}) - \Psi_i(V) \right|^2}{\Sigma_{\mathrm{cph},i}^2},$ (7)

and closure amplitudes:
$S_{\mathrm{cla}}(\mathcal{V}, V) = \frac{1}{N_{\mathrm{cla}}} \sum_{i=1}^{N_{\mathrm{cla}}} \frac{\left| \ln A_i(\mathcal{V}) - \ln A_i(V) \right|^2}{\Sigma_{\mathrm{cla},i}^2}$ (8)

could be used. Regularization terms measure the feasibility of the solution to fit the data with a model that is as simple as possible. Usual choices include a flux constraint:
$R_{\mathrm{flux}}(I, f) = \left\| \int\!\!\int I(l,m)\, \mathrm{d}l\, \mathrm{d}m - f \right\|,$ (9)

where f is the total flux, together with norm constraints:
$R_{l1}(I) = \left\| I \right\|_{l^1},$ (10)
$R_{l2}(I) = \left\| I \right\|_{l^2},$ (11)

smoothness priors:
$R_{\mathrm{tv}}(I) = \int \left\| \nabla I \right\| \mathrm{d}l\, \mathrm{d}m, \qquad R_{\mathrm{tsv}}(I) = \sqrt{\int \left\| \nabla I \right\|^2 \mathrm{d}l\, \mathrm{d}m},$ (12)

and an entropy functional:
$R_{\mathrm{entr}}(I) = \int\!\!\int I \ln\!\left( \frac{I}{M} \right) \mathrm{d}l\, \mathrm{d}m,$ (13)

where M denotes the brightness distribution of a model image.
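The regularization terms of Eqs. (9)–(13) translate into a few lines of numpy on a pixelized image. The sketch below uses simple forward differences for the gradient and ignores pixel-size factors; it is one possible discretization for illustration, not the exact implementation used in this work.

```python
import numpy as np

def r_flux(img, f):
    # Total flux constraint (Eq. 9): deviation of the summed brightness from f.
    return np.abs(img.sum() - f)

def r_l1(img):
    return np.abs(img).sum()            # sparsity prior (Eq. 10)

def r_l2(img):
    return np.sqrt((img ** 2).sum())    # l2 norm (Eq. 11)

def _gradients(img):
    # Forward differences along both image axes (replicated last row/column).
    dx = np.diff(img, axis=0, append=img[-1:, :])
    dy = np.diff(img, axis=1, append=img[:, -1:])
    return dx, dy

def r_tv(img):
    # Isotropic total variation (Eq. 12, left): sum of gradient magnitudes.
    dx, dy = _gradients(img)
    return np.sqrt(dx ** 2 + dy ** 2).sum()

def r_tsv(img):
    # Total squared variation (Eq. 12, right).
    dx, dy = _gradients(img)
    return np.sqrt((dx ** 2 + dy ** 2).sum())

def r_entropy(img, prior, eps=1e-12):
    # Relative entropy against a prior image M (Eq. 13).
    return np.sum(img * np.log((img + eps) / (prior + eps)))
```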

In RML methods, these terms are added with corresponding weighting parameters (e.g., α, β, …, κ ∈ ℝ) to create the full objective functional ℱ:
$\mathcal{F} = \alpha S_{\mathrm{vis}} + \beta S_{\mathrm{amp}} + \gamma S_{\mathrm{cph}} + \delta S_{\mathrm{cla}} + \epsilon R_{\mathrm{flux}} + \zeta R_{l1} + \eta R_{l2} + \theta R_{\mathrm{tv}} + \iota R_{\mathrm{tsv}} + \kappa R_{\mathrm{entr}}.$ (14)

The joint minimization of Svis, Samp, Scph, and Scla combines redundant information (e.g., the information encoded in the closure amplitudes is also encoded in the amplitudes), a potential weak point of forward modeling techniques. In Sect. 8, we show how our algorithm works with terms that are only closure dependent.
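A minimal sketch of how the χ² data terms of Eqs. (5) and (6) and the weighted objective of Eq. (14) could be assembled is given below. The forward operator mapping an image to model visibilities is left as a placeholder, and the closure terms would be added analogously; none of the names here refer to an existing library.

```python
import numpy as np

def chi2_vis(model_vis, obs_vis, sigma):
    # Eq. (5): reduced chi-square of the complex visibilities.
    return np.mean(np.abs(model_vis - obs_vis) ** 2 / sigma ** 2)

def chi2_amp(model_vis, obs_vis, sigma):
    # Eq. (6): reduced chi-square of the visibility amplitudes.
    return np.mean((np.abs(model_vis) - np.abs(obs_vis)) ** 2 / sigma ** 2)

def objective(img, data, weights, forward, regularizers):
    """Weighted sum of Eq. (14). `forward(img)` is a placeholder mapping the
    image to model visibilities on the observed (u, v) points; `regularizers`
    is a list of (weight, function) pairs such as [(zeta, r_l1), (theta, r_tv)]."""
    model_vis = forward(img)
    value = (weights['alpha'] * chi2_vis(model_vis, data['vis'], data['sigma'])
             + weights['beta'] * chi2_amp(model_vis, data['vis'], data['sigma']))
    # The closure terms of Eqs. (7) and (8) would be added here analogously.
    for w, reg in regularizers:
        value += w * reg(img)
    return value
```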

3 Multiobjective optimization

A multiobjective (minimization) optimization problem in its general form can be stated as (Pardalos et al. 2017)

Problem 1 (MOP standard form)
$\min_{x \in D} \; F(x) := \left( f_1(x), \ldots, f_m(x) \right), \quad \text{subject to } x \in D \subseteq \mathbb{R}^{n},$ (MOP)

where D is the decision space, ℝ^m is the space of objectives, and F : D → ℝ^m is the vector-valued multiobjective optimization functional whose individual components f_i : D → ℝ, i = 1, …, m, are the single objective functionals. The feasible set D ⊆ ℝ^n is generally expressed by a number of inequality constraints, D = {x ∈ ℝ^n | C_i(x) ≤ 0}. This is the setting of our work.

It is common to find discrepancies between objectives, meaning that a single point x ∈ D that minimizes (or maximizes) all the f_i simultaneously does not exist (Pardalos et al. 2017). Therefore, the goal of the MOP is to find the best compromise among all of the objectives. This compromise is defined by means of a special set of solutions: the Pareto front $P_{\mathbb{R}^m}(F)$ (Pardalos et al. 2017).

The Pareto front consists of the Pareto optimal (nondominated) solutions in the space of objectives. A point x* ∈ D is considered Pareto optimal (nondominated) if there is no point x ∈ D such that f_i(x) ≤ f_i(x*) for all i = 1, …, m and f_j(x) < f_j(x*) for at least one j = 1, …, m. In other words, we call a point x* ∈ D Pareto optimal if further improvement in one objective necessarily worsens another objective functional.
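Pareto dominance as defined above translates directly into code. The following sketch filters the nondominated members from a set of objective vectors with a simple quadratic scan; it illustrates the concept and is not the bookkeeping used inside MOEA/D.

```python
import numpy as np

def dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb (minimization):
    fa is no worse in every objective and strictly better in at least one."""
    fa, fb = np.asarray(fa), np.asarray(fb)
    return bool(np.all(fa <= fb) and np.any(fa < fb))

def nondominated(F):
    """Indices of the nondominated rows of F (shape: n_points x m)."""
    F = np.asarray(F)
    keep = []
    for i in range(len(F)):
        if not any(dominates(F[j], F[i]) for j in range(len(F)) if j != i):
            keep.append(i)
    return keep

# Example: the middle point is dominated by the first one.
print(nondominated([[1.0, 2.0], [1.5, 2.5], [2.0, 1.0]]))
```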

In general, $P_{\mathbb{R}^m}(F)$ cannot be found analytically. In particular, most efforts are devoted to approximating $P_{\mathbb{R}^m}(F)$ or to identifying characteristic members (Pardalos et al. 2017). The selection of these representative members should be done carefully to avoid infeasibly long running times or little diversity among the solutions. We also point out that the Pareto front represents a novel approach in VLBI to the multiple data and regularization terms discussed in Sect. 2. With the weighted sum approach (Eq. (14)) and with varying parameter combinations {α, β, …, κ}, we calculated a hypersurface that effectively approximates the Pareto front.

The Pareto front is bound by the nadir and the ideal objective vectors (see for instance Pardalos et al. 2017). While the nadir is not used in this work (we refer the reader to the bibliography for a clear definition), the ideal objective vector is the vector l = (l_1, …, l_m) whose components are computed as (compare also Fig. 1):
$l_i = \inf_{x \in P_{\mathbb{R}^m}(F)} f_i(x), \quad i = 1, \ldots, m.$ (15)

In this work, we used this vector to define a metric used to return one representative image, despite all the images belonging to the Pareto front being equally valid solutions.
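For illustration, the ideal objective vector of Eq. (15) and the closest-to-ideal selection criterion reduce to a component-wise minimum and a Euclidean distance. The per-axis renormalization in the sketch below is an assumption that mirrors the postprocessing described in Sect. 5, not a prescription from this section.

```python
import numpy as np

def ideal_point(F):
    # Eq. (15): component-wise infimum over the (approximate) Pareto front.
    return np.min(np.asarray(F, dtype=float), axis=0)

def closest_to_ideal(F):
    """Index of the front member with the smallest Euclidean distance to the
    ideal point, after rescaling every objective axis to [0, 1]."""
    F = np.asarray(F, dtype=float)
    F = (F - F.min(axis=0)) / (np.ptp(F, axis=0) + 1e-12)
    dist = np.linalg.norm(F - F.min(axis=0), axis=1)
    return int(np.argmin(dist))
```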

Among all the possible strategies used to approximate the Pareto front (we refer the reader to the recent summary by Sharma & Chahar 2022 for a comprehensive overview), the so-called multiobjective evolutionary algorithms (MOEAs) have been found to be efficient approaches. In this work, we used MOEA/D (Zhang & Li 2007; Li & Zhang 2009). This technique approximates $P_{\mathbb{R}^m}(F)$ by solving a set of scalar functionals associated with the objectives in a collaborative manner via an evolutionary algorithm. This cooperative strategy allows for the handling of large-scale optimization problems by decomposing them into smaller subproblems (Tsurkov 2001). The MOEA/D has a high search ability for continuous, combinatorial, and multiobjective optimization, and it has a lower complexity than other algorithms. It is beyond the scope of this paper to compare algorithms for solving MOPs, and we thus refer the reader to Xin-She & Xing-Shi (2019) for more details on the multiple variants of algorithms for solving MOPs.

Fig. 1. Schematic illustration of the Pareto front for two given functions (f1 and f2), its ideal point, and two concepts for the most natural preferred points.

4 MOEA/D

Nonconvex problems, such as Prob. (MOP-MOEA/D), generally have more than one optimal solution. Such solutions are called “local optimal solutions”. Gradient- or Hessian-based algorithms are questionable for these types of problems because they are only able to return the first local solution they find. We refer the reader to Mus & Martí-Vidal (in prep.) for a longer discussion on the initial point dependence in nonconvex problems. In this section, we summarize a global search strategy called MOEA/D, first proposed in Zhang & Li (2007), that overcomes this issue. For more details, we refer to Zhang & Li (2007), Li & Zhang (2009), and Xin-She & Xing-Shi (2019).

The MOEA/D algorithm solves the optimization problem with a genetic algorithm. Genetic algorithms are inspired by natural evolution. At every generation, a population of solutions is created from the preceding generation. The evolution from one generation to the next is calculated by genetic operations, that is, by random mutation of the individual members (genes) of the population and by mating (i.e., mixing) of randomly selected parents from the parent generation.

In MOEA/D, the problem is decomposed into scalar subproblems either by a Tchebycheff decomposition or by a weighted sum decomposition. For this work, we focus on the weighted sum decomposition due to its philosophical similarity to established RML reconstructions in VLBI (Chael et al. 2016, 2018; Johnson et al. 2017; Akiyama et al. 2017b). We defined weight arrays $\lambda^1 = \{\lambda_1^1, \ldots, \lambda_m^1\}$, $\lambda^2 = \{\lambda_1^2, \ldots, \lambda_m^2\}$, … that are related to the objective functionals f1, f2, …, fm. Every weight array was normalized:
$\lambda_i^j \in [0, 1], \quad \forall i, j,$ (16)
$\sum_{i=1}^{m} \lambda_i^j = 1, \quad \forall j.$ (17)
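Weight arrays satisfying Eqs. (16) and (17) can be laid out on a regular grid over the unit simplex, as in the following sketch; the grid resolution H used here is a placeholder, not the value adopted later in the paper.

```python
import itertools
import numpy as np

def simplex_weights(m, H):
    """All weight vectors with components k/H (k integer) that sum to one,
    i.e. a regular grid satisfying Eqs. (16) and (17) for m objectives."""
    combos = [c for c in itertools.product(range(H + 1), repeat=m) if sum(c) == H]
    return np.array(combos, dtype=float) / H

weights = simplex_weights(m=3, H=4)          # 15 weight vectors for 3 objectives
print(weights.shape, weights.sum(axis=1))    # every row sums to one
```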

Prob. (MOP) was decomposed into single optimization problems by a weighted sum approach:
$x^j \in \arg\min_x \sum_{i=1}^{m} \lambda_i^j f_i(x).$ (18)

The nondominated single solutions in {x^j} ⊂ D approximated the Pareto front. The optimization was done with a genetic algorithm that interchanges information between several genes in one population at every iteration. For details, we refer the reader to Zhang & Li (2007) and Li & Zhang (2009). In a nutshell, we defined the closest neighborhoods UB(λj) around every weight array λj. The update step consisted roughly of the following substeps, performed for every j: First, we selected two random indices k, l from the neighborhood UB(λj). Second, we generated a new solution yj through genetic operations from xk and xl, that is, by random mutation and crossover among different candidates. Third, we updated all the solutions in the neighborhood, that is, for all indices n ∈ UB(λj) we set x^n = y^j if $\sum_{i=1}^{m} \lambda_i^n f_i(y^j) < \sum_{i=1}^{m} \lambda_i^n f_i(x^n)$. Finally, we found the nondominated solutions. We then proceeded with the next update step, that is, we reiterated these four substeps until convergence was achieved. The MOEA/D algorithm therefore evolves the population at every point in the Pareto front at the same time. Moreover, it preserves diversity since isolated neighborhoods are protected from each other.
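The update cycle summarized above can be condensed into the following sketch of one MOEA/D generation with weighted sum decomposition. The genetic `variation` step is kept abstract here (it is spelled out in Appendix A), and the data structures are assumptions for illustration rather than the actual implementation.

```python
import numpy as np

def moead_step(pop, F_vals, weights, neighbors, objective, variation, rng):
    """One MOEA/D generation (weighted-sum decomposition, minimization).
    pop:       (N, n_pix) array of current solutions
    F_vals:    (N, m) array of their objective vectors
    weights:   (N, m) weight arrays lambda^j
    neighbors: list of index arrays U_B(lambda^j) for every j
    """
    for j in range(len(pop)):
        # 1. pick two random parents from the neighborhood of weight array j
        k, l = rng.choice(neighbors[j], size=2, replace=False)
        # 2. create a candidate by genetic variation (mutation + crossover)
        y = variation(pop[j], pop[k], pop[l], rng)
        Fy = np.asarray(objective(y))
        # 3. replace every neighbor whose scalarized subproblem y improves
        for n in neighbors[j]:
            if np.dot(weights[n], Fy) < np.dot(weights[n], F_vals[n]):
                pop[n], F_vals[n] = y, Fy
    # 4. the nondominated members of pop approximate the Pareto front
    return pop, F_vals
```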

5 Modelization of the problem

To model the problem, we chose seven objective functionals:
$f_1: \alpha S_{\mathrm{vis}} + \beta S_{\mathrm{amp}} + \gamma S_{\mathrm{cph}} + \delta S_{\mathrm{cla}} + \zeta R_{l1},$ (19)
$f_2: \alpha S_{\mathrm{vis}} + \beta S_{\mathrm{amp}} + \gamma S_{\mathrm{cph}} + \delta S_{\mathrm{cla}} + \theta R_{\mathrm{tv}},$ (20)
$f_3: \alpha S_{\mathrm{vis}} + \beta S_{\mathrm{amp}} + \gamma S_{\mathrm{cph}} + \delta S_{\mathrm{cla}} + \tau R_{\mathrm{tsv}},$ (21)
$f_4: \alpha S_{\mathrm{vis}} + \beta S_{\mathrm{amp}} + \gamma S_{\mathrm{cph}} + \delta S_{\mathrm{cla}} + \eta R_{l2},$ (22)
$f_5: \alpha S_{\mathrm{vis}} + \beta S_{\mathrm{amp}} + \gamma S_{\mathrm{cph}} + \delta S_{\mathrm{cla}} + \epsilon R_{\mathrm{flux}},$ (23)
$f_6: \alpha S_{\mathrm{vis}} + \beta S_{\mathrm{amp}} + \gamma S_{\mathrm{cph}} + \delta S_{\mathrm{cla}} + \kappa R_{\mathrm{entr}},$ (24)
$f_7: \alpha S_{\mathrm{vis}} + \beta S_{\mathrm{amp}} + \gamma S_{\mathrm{cph}} + \delta S_{\mathrm{cla}}.$ (25)

Therefore, the nonconvex multiobjective problem to be solved is

Problem 2 (MOP for imaging reconstruction)
$\min_{x \in D} \; F(x) := \left( f_1(x), \ldots, f_7(x) \right), \quad \text{subject to } x \in \mathbb{R}_{+}^{N_{\mathrm{pix}}}.$ (MOP-MOEA/D)

This modelization is flexible, and new regularization functionals can easily be included. Due to Eq. (17), the weighted sum decomposition with the weight array {ζ, ϵ, η, θ, τ, κ, 1 − ζ − ϵ − η − θ − τ − κ} is equivalent to Eq. (14). Hence, every item in the Pareto front corresponds to the optimal solution of a single hyperparameter combination (thus replacing parameter surveys). Moreover, this selection of the objective functionals ensures that the optimization is compatible with the data in every optimization direction (i.e., every fi ensures fit quality to the data). In contrast to the parameter surveys that were proposed for RML methods but are numerically unfeasible and poorly motivated, all the solutions corresponding to a specific regularizer weight combination λj are evolved together in our approach. Due to the genetic mixing of neighboring solutions, MOEA/D shares information between the solutions for similar weight arrays λj and is improved in this regard compared to parameter surveys that calculate the solutions independently. However, we mention that while the similarity to parameter surveys constitutes the main motivation behind the application of MOEA/D for VLBI imaging, Pareto optimality is a slightly different optimality concept.

Although we use the combined objectives f1, …, f7 for the MOEA/D, during postprocessing we examine the front related to the penalty terms (which can be read off from the differences f1 − f7 for Rl1, f2 − f7 for Rtv, and so on). For more details, we refer to Sect. 6.2, where we show examples of different Pareto fronts. We plotted the Pareto front in the first row of the corresponding figures as projections onto a three-dimensional domain. In the second row, we present the same front but with different projections. Every single point in these plots corresponds to an image recovered for a specific weighting combination. When inspecting the front, we identified several disjoint clusters. These clusters demonstrate the several image morphologies that become visible when changing the weight parameter combinations. The image diversity within one cluster, however, is small. We found the clusters with a standard clustering algorithm. First, we renormalized every axis such that the values were in the range [0, 1]. Then, for every point, we found the respective neighbors (where a point is classified as a neighbor when the relative distance between the two points in the seven-dimensional space spanned by [f1, …, f7] is smaller than some threshold). Finally, we classified all data points that were connected by a path of neighbors as being part of the same cluster.
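The clustering procedure described above (per-axis renormalization, a neighbor threshold, and connected components of the neighbor graph) can be sketched as follows; the threshold value is an arbitrary placeholder.

```python
import numpy as np

def cluster_front(F, threshold=0.1):
    """Group Pareto-front members whose renormalized objective vectors are
    connected by a chain of neighbors closer than `threshold`."""
    F = np.asarray(F, dtype=float)
    F = (F - F.min(axis=0)) / (np.ptp(F, axis=0) + 1e-12)   # rescale every axis to [0, 1]
    dists = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)
    labels = -np.ones(len(F), dtype=int)
    current = 0
    for seed in range(len(F)):
        if labels[seed] >= 0:
            continue
        stack = [seed]                       # flood-fill one connected component
        while stack:
            i = stack.pop()
            if labels[i] >= 0:
                continue
            labels[i] = current
            stack.extend(np.flatnonzero((dists[i] < threshold) & (labels < 0)))
        current += 1
    return labels
```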

We mention that every solution in the Pareto front is an optimal solution with respect to the multiobjective optimization problem. Mathematically, there is no preferred solution. All image morphologies in the different clusters are mathematically reasonable solutions that fit the data. Therefore, the Pareto front is the main output of MOEA/D: it represents the fact that there is not a single preferred image due to missing Fourier coefficients, and it presents an illustration of possible image features, similar in philosophy to Bayesian algorithms (draws from the posterior distribution) and to the survey of images produced in a parameter survey. However, it is standard in VLBI imaging to select one image that is most natural with respect to scientific perception. We therefore present two strategies for finding the most natural image among all optimal solutions. These two strategies are illustrated in Fig. 1.

In one strategy, we can first define the ideal point following Eq. (15). As the ideal point is the point that would be optimal in every objective, it can be found by identifying the cross section of the minima of every single objective (see Fig. 1). The ideal point, however, has no physical meaning since it is not a solution of Prob. (MOP-MOEA/D). A natural choice for a single image is thus the recovered solution in the Pareto front that has the shortest (Euclidean) distance to the ideal point.

As a second strategy, we can look for accumulation points. Every cluster of solutions represents a locally optimal image morphology. We assume that most solutions cluster around the most natural solutions, while the edges of the parameter space, with more exotic solutions, are less densely sampled. To find a natural representative image, we therefore look for the point with the highest number of close neighbors: the accumulation point of the cluster.
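The accumulation-point criterion amounts to counting close neighbors in the renormalized objective space and picking the member with the highest count, as in this sketch (same assumptions and placeholder threshold as the clustering sketch above).

```python
import numpy as np

def accumulation_point(F, threshold=0.1):
    """Index of the front member with the largest number of neighbors closer
    than `threshold` in the renormalized objective space."""
    F = np.asarray(F, dtype=float)
    F = (F - F.min(axis=0)) / (np.ptp(F, axis=0) + 1e-12)
    dists = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)
    counts = (dists < threshold).sum(axis=1) - 1   # exclude the point itself
    return int(np.argmax(counts))
```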

The traditional multiobjective optimization theory lacks a standard methodology to address the challenge of solution selection, which entails the identification of a representative solution from an entire set of solutions comprising the Pareto front. Although a few strategies have been put forth in the literature (for instance Schwind et al. 2014, 2016), there is no consensus on an optimal approach. In this paper, we introduce two distinct criteria for solution selection: the accumulation point and the closest image to the ideal. Notably, any method that yields an image belonging to the Pareto front is a valid method of selecting a mathematical solution for the given problem.

6 Verification on synthetic data

6.1 Synthetic data

We tested our algorithm on the four synthetic geometric models (double, crescent, disk, and ring) that were used for the imaging verification by the EHT in 2017 (Event Horizon Telescope Collaboration 2019). For each model, two observational data sets were generated: one based on the EHT array of 2017 (coverage and thermal noise level of the 2017 observations of M87 on April 5) and the other based on a possible EHT plus ngEHT configuration (Roelofs et al. 2023), including ten additional stations, a quadrupled bandwidth, and enhanced frequency coverage. We simulated synthetic data at 230 GHz and added thermal noise according to the expected systematic noise levels used in Roelofs et al. (2023).

The geometric models were the same as those used for imaging verification in Event Horizon Telescope Collaboration (2019) and Müller & Lobanov (2022). The sizes of the ring, crescent, and disk mimic the size of the famous black hole shadow in Messier 87 (M87; Event Horizon Telescope Collaboration 2019): the ring and crescent have a radius of 22 µas, and the disk has a diameter of 70 µas. The double source mimics a completely different image structure, with features comparable to 3C279 as seen by the EHT (Kim et al. 2020). All geometric models were normalized to a total flux of 0.6 Jy. Moreover, we blurred the geometric models with a 10 µas blurring kernel before synthetically observing them in order to avoid sharp edges.

Fig. 2. uv coverage for M87 on April 11, 2017. The red dots show EHT, and the black crosses indicate ngEHT at 345 GHz.

6.2 EHT + ngEHT array

Figure 2 depicts the UV coverage of the EHT plus ngEHT array. The crosses and points together form the full EHT plus ngEHT array. The UV plane is less sparse for the combined EHT and ngEHT instrument, thus leading to improved constraints on the inverse problem, which nevertheless remains ill posed. To mimic the uncertainty in the phases, we performed the reconstruction from the amplitudes and closure quantities only, as proposed in Chael et al. (2018) and Müller & Lobanov (2022) and applied in Event Horizon Telescope Collaboration (2019, 2022). In particular, we used the data term combination α = 0 and β = γ = δ = 1. The weights were chosen on a grid with ten steps in every dimension, giving rise to 3003 parameter combinations that satisfy Eq. (17). We used a genetic algorithm with 4000 iterations. Moreover, we set the hyperparameters related to the genetic operations to the default values proposed in Li & Zhang (2009); for more details on the optimal choice of the genetic parameters (e.g., mating probability, random mutation size, and number of iterations), we refer the reader to Appendix A. Moreover, we added an overall re-scaling to every penalty term, with factors found to be optimal as described in Appendix A. We selected a 40 µas Gaussian as a prior, and the entropy functional was computed relative to this prior image. Finally, we had to choose an initial population for MOEA/D. Instead of starting from a pre-imaging model, we started the MOEA/D algorithm from a random distribution: every gene in the initial population was drawn independently from a uniform distribution with values between 0 and 1 Jy, and all the genes in the initial population were then re-scaled to the respective compact flux.
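Our implementation relies on the pygmo package (Biscani & Izzo 2020). The sketch below indicates how such a setup could look with a user-defined seven-objective problem; the dummy objective function, the image size, and the population size are placeholders, and the moead keyword names may differ between pygmo versions.

```python
import numpy as np
import pygmo as pg

class ImagingMOP:
    """User-defined pygmo problem wrapping seven objectives f1..f7.
    `objectives(x)` is a placeholder for the functionals of Sect. 5."""
    def __init__(self, npix, objectives):
        self.npix = npix
        self.objectives = objectives

    def fitness(self, x):
        return list(self.objectives(x))            # seven objective values

    def get_nobj(self):
        return 7

    def get_bounds(self):
        # non-negative pixel fluxes, upper bound 1 Jy per pixel (assumption)
        return (np.zeros(self.npix), np.ones(self.npix))

# Hypothetical usage with dummy objectives (for illustration only):
prob = pg.problem(ImagingMOP(npix=32 * 32, objectives=lambda x: np.abs(x[:7])))
algo = pg.algorithm(pg.moead(gen=4000, weight_generation="grid",
                             decomposition="weighted", CR=1.0, F=0.5,
                             eta_m=20, realb=0.9))
# population size chosen to be compatible with a uniform weight grid for 7 objectives
pop = algo.evolve(pg.population(prob, size=210))
ndf = pg.fast_non_dominated_sorting(pop.get_f())[0][0]   # indices of the first front
front = pop.get_f()[ndf]
```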

We show our results for the four geometric models in Figs. 3–6. The Pareto front is a hypersurface in a seven-dimensional space (six penalty terms and one combined data term). We plotted the front as a series of three projections into three-dimensional space (two penalty terms and the data term, respectively) in the top-row panels of Figs. 3–6. At first glance, we saw that the entropy, the total flux constraint, and the data term are strongly correlated, since a wrong total flux or source size worsens the fit to the observed amplitudes.

When inspecting specific fronts, we observed that Rl1 and Rl2, as well as Rtv and Rtsv, seemed to be correlated. That is reasonable, as these regularization terms promote similar prior information (sparsity and smoothness, respectively). In the second-row panels of Figs. 3–6, we present the same Pareto front, but this time we combined the terms with conflicting assumptions (smoothness versus sparsity), that is, the various prior assumptions that we aim to balance with an RML method. As expected, we observed an anti-correlation in most cases. The Pareto front represents all optimal balances along this line of conflicting assumptions. In consequence, the front becomes divided into a varying number of clusters. The diversity of the images within every cluster is small, but the diversity from one cluster to the next is significant. In the lower panels of Figs. 3–6, we show the ground truth image (top left) and a single representative (the accumulation point) of every cluster. We also show the solution that is preferred by the accumulation point selection criterion (i.e., the one that has the largest number of close neighbors), and we show the solution that is preferred by the closest-to-the-ideal criterion. The two criteria coincide except in the disk model, but in all cases, the algorithm selected a reasonable reconstruction. For all four geometric models, the reconstruction is quite successful. The image features are recovered very well, although MOEA/D seems to slightly prefer blurred reconstructions.

Fig. 3. Solution space of the MOEA/D. First two rows: Pareto front for the crescent case using the EHT + ngEHT array. The Pareto front is a seven-dimensional hypersurface, which we illustrate with six projections. The six projections show the correlation between two regularizers and their values with respect to f7 (the functional containing only the data fidelity terms); the bluer the points, the lower the value of f7. Bottom two rows: solution clusters for the crescent case using the EHT + ngEHT array. The first panel of the solution clusters (top-left corner) shows the true image. The family of solutions can be grouped into eight clusters. The red box surrounding a cluster indicates the solution preferred by the accumulation point strategy, while the blue box marks the solution closest to the ideal.

Fig. 4. Same as Fig. 3 but for the disk model.

Fig. 5. Same as Fig. 3 but for the double model.

Fig. 6. Same as Fig. 3 but for the ring model.

6.3 EHT array

Figure 2 shows the UV coverage of the supermassive black hole M87 during the 2017 EHT observation campaign. The poor coverage of the campaign made the imaging more challenging, in particular since we were limited to the closure quantities. This resulted in a more difficult optimization problem, as demonstrated by the enhanced diversity in the population of the GA. The reconstruction was done with 5000 iterations and a random starting point. Moreover, we used the same genetic parameters as for the EHT plus ngEHT data sets.

We show the recovered clustered images for all four geometric models in Figs. 7–10. The recovered images show a wide variety of image morphologies. There are reasonably well-reproduced (but slightly blurred) reconstructions (e.g., clusters 1 and 6 for the crescent; clusters 1 and 4 for the ring). Moreover, we observed over-resolved reconstructions (e.g., clusters 0 and 4 for the crescent; clusters 0 and 5 for the double structure). Since every point in the Pareto front has a one-to-one correspondence to a specific weight vector combination {λij}, we could investigate which solutions were causing the over-resolved structural patterns. As expected, these solutions are at the edge of the Pareto front with a dominating sparsity term. Finally, we found some clusters that show phantom repetitions of the same structures (e.g., clusters 2 and 3 for the ring; cluster 2 for the disk). These secondary phantom images are not unusual for image reconstructions given the combination of visibility phase uncertainty and poor UV coverage. This issue is addressed in RML algorithms either by surveying the correct balancing of the various regularization terms (Event Horizon Telescope Collaboration 2019, 2022) or by multiscale-based hard thresholding (Müller & Lobanov 2022). An analysis of the reconstructions in MOEA/D showed that these solutions are related to the unpenalized reconstruction. This result can be well explained by the dirty beam. In particular, the cluster 2 reconstruction of the disk example resembles the dirty image, that is, the unpenalized reconstruction converges to the dirty image, as it is the easiest solution that fits the data.

For all four geometric models, we selected well-reconstructed cluster images using the accumulation point strategy and the closest-to-the-ideal strategy. These strategies may give rise to a completely data-driven image reconstruction without the need for parameter surveys. Among all the optimal images, we selected the best one by looking for the image with the most close neighbors in the Pareto front.

Fig. 7. Solution clusters for the crescent case using the EHT array. The first panel (top-left corner) shows the true image. The red box surrounds the cluster indicated by the accumulation point strategy, while the blue box highlights the cluster closest to the ideal.

Fig. 8. Same as Fig. 7 but for the disk model.

Fig. 9. Same as Fig. 7 but for the double-source model.

Fig. 10. Same as Fig. 7 but for the ring model.

7 Real data

We applied our algorithm to real data taken during the 2017 EHT campaign (Papers I–VI). We reconstructed images of M87 and of the calibrator source 3C279 using the UVFITS files available on the official webpage of the EHT. To obtain the images, we used the best parameter settings discussed in Sect. 6. For this work, we considered three variants of the data reduction pipeline: first, a random starting point and a random prior (Scenario A); second, a random starting point and a Gaussian prior (Scenario B); and finally, the image with the lowest χ2 (obtained with ehtim without any regularizer) as the starting point and a Gaussian prior (Scenario C). Although we only discuss Scenario C in this section, the remaining scenarios can be found in Appendix C. We emphasize the importance of the initial point for real data, in particular when the UV coverage is sparse. To avoid including unnecessary information in the paper, we only show the Pareto front and the clustered images. We note that the convergence of the algorithm was already demonstrated in Sect. 6.

Messier 87 was observed on four days: April 5, 6, 10, and 11 (see for example Paper I). The reconstructions for all days can be found in Appendix B. Figure 11 depicts the set of clustered solutions for April 11. Clusters 0 and 1 present a clear ring structure. Indeed, the representative of cluster 1 was chosen as the preferred image reconstruction, and it is very similar to the one published by the EHT. The rest of the clusters show more subtle differences. The three “families” in the Pareto front can be observed in this figure.

Fig. 11. Relative Pareto fronts (top two rows) and clustered images (bottom three rows) for M87 on April 11.

8 Closure-only imaging

In this paper, we consider α to be zero and β, γ, and δ to be different from zero. Hence, the reconstruction we present is independent of the highly unstable phase calibration, but it only works properly if the amplitudes of the data set are self-calibrated, as described in Readhead & Wilkinson (1978). The aim of self-calibration is to adjust the complex gains of an interferometer by iteratively comparing the calibrated visibilities to an improved model. Accurate modeling of the source structure is crucial for the correct convergence of self-calibration (see for instance Martí-Vidal & Marcaide 2008; Mus et al. 2022).

In this section, we apply our algorithm to non-self-calibrated data: the real EHT data of April 11 and the synthetic EHT plus ngEHT data. In this case, the non-closure-related fitting weights (α and β) are set to zero, that is, only the closure phases and the closure amplitudes are fitted, as suggested by Chael et al. (2018) and Müller & Lobanov (2022). Moreover, setting α = β = 0 removes the redundancy of data terms that we mention in Sect. 2.

The reconstruction problem becomes more challenging since the number of independent closure phases and closure amplitudes is smaller than the number of independent visibilities, that is, the data consistency is less constraining (Kulkarni et al. 1991; Chael et al. 2018; Blackburn et al. 2020). Moreover, we mention that Scph and Scla are only approximations to the true likelihoods (Lockhart & Gralla 2022).

Figure 12 shows the set of solutions recovered using the EHT plus ngEHT array and a random brightness distribution on the pixels as the starting point. We observed that MOEA/D is able to recover the intrinsic source structure even when self-calibration is not performed, although the obtained images are not as good as those obtained with a self-calibrated amplitude data term (i.e., β ≠ 0). We note that the selection criteria for optimal solutions did not seem to select the best reconstruction in this case.

In Fig. 13, we present the reconstructions for the real M87 EHT data. The poorer the UV coverage, the less constrained the optimization problem. In the case of self-calibrated data, the problem is biased by the self-calibration model. Without such a “biased” model, there are more degrees of freedom due to the smaller number of independent closure quantities. The larger number of degrees of freedom translates into more local minima and, in consequence, more clusters. We highlight that the starting point is even more crucial in the case of non-self-calibrated data (i.e., without amplitude consistency), particularly when the UV coverage is sparse. Since a closure-only data set is less constraining than fitting fully calibrated visibilities, it is harder for the method to converge by random mutation and random genetic mixing. For the left panel of Fig. 13, we used a ring as the starting point, and for the right panel, we used a Gaussian distribution. While we recovered a ring with a central depression in most of the clusters in the first case, in the second case a less pronounced ring-like structure was recovered, but it can still be seen. Hence, the intrinsic structure of the source is predominant in the data even when the starting distribution is not related to the real structure.

The quality of the obtained solutions was the worst when only closure quantities were fitted. Nevertheless, we could use one of the clusters to create an unbiased model for self-calibration and rerun MOEA/D with updated amplitude information. Another alternative is to use the selected cluster as the initial point and rerun MOEA/D. In this way, MOEA/D is run iteratively, improving the starting point.

Fig. 12. Cluster solutions for the case of the EHT + ngEHT data with noise included and not self-calibrated. The starting point is a random distribution in the pixels.

Fig. 13. Clusters of solutions for M87 from the April 11, 2017 EHT campaign. The starting points, a ring (left panel) and a Gaussian (right panel), play an important role in constraining the solutions in the absence of a self-calibration model.

9 Summary and conclusions

Imaging in radio astronomy is an ill-posed inverse problem, and it is particularly difficult when the UV coverage is very sparse, as is the case for global VLBI observations. Several algorithmic strategies have been developed to overcome this challenge, and they can be classified into three main families: CLEAN methods, (constrained) nonlinear optimization methods, and Bayesian approaches. Each algorithm family has its advantages and disadvantages. For example, optimization methods are considerably faster than Bayesian ones, but they lack a global posterior exploration, and a large, computationally expensive parameter survey is therefore required. On the other hand, Bayesian methods explore a huge parameter space, but their performance is slow compared to nonlinear optimization methods and CLEAN, in particular for large data sets. We have identified two specific issues related to the imaging problem, namely, that the problem is multimodal and multiobjective.

In this work, we have presented a novel multiobjective formulation based on evolutionary algorithms that overcomes these problems. We compute a set of candidate solutions that are nondominated (i.e., optimally balanced): the Pareto front. A parameter survey is no longer required with this approach since the complete Pareto front of optimal solutions is evolved in parallel, considerably reducing the time required by standard RML approaches to obtain a set of solutions. Furthermore, the result of a parameter survey depends on the set of test images; this issue does not arise for MOEA/D.

Moreover, the MOEA/D is a global search technique that is less likely to get trapped in local extrema. Therefore, we are able to recover a full subset of solutions that are the best compromise of multiple objective functionals. Every candidate solution in the Pareto front is related to a specific hyperparameter combination.

The MOEA/D algorithm is faster than any Bayesian approach, but it does not explore a posterior distribution. Nonetheless, it is very flexible, allowing new regularizers to be introduced very easily without exponentially increasing the computing complexity. We created a clustering algorithm to group “similar” solutions. Then, we implemented two different techniques to choose a representative image among all clusters. All of the clusters are mathematically valid images, and therefore any other criterion to choose one among them can also be used.

We successfully tested our algorithm on four synthetic models (double, disk, ring, and crescent) using a sparse array (EHT 2017) and a more complete instrumental configuration (EHT + ngEHT). Finally, we ran our algorithm on real 2017 EHT M87 data. In this work, we discussed the role of various regularization terms and their impact in a multiobjective framework. In a subsequent work, we will focus on the data terms. That is, we will include a wider variety of (nonredundant) data term combinations in the multiobjective formulation (also including dynamic and polarimetric data products), study their role in more detail, and extend MOEA/D to dynamic and polarimetric observations.


Acknowledgements

A.M. and H.M. have contributed equally to this work. This work was partially supported by the M2FINDERS project funded by the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Programme (Grant Agreement No. 101018682) and by the MICINN Research Project PID2019-108995GB-C22. A.M. also thanks the Generalitat Valenciana for funding, in the frame of the GenT Project CIDEGENT/2018/021, and the Max-Planck-Institut für Radioastronomie for covering the visit to Bonn that made this work possible. H.M. received financial support for this research from the International Max Planck Research School (IMPRS) for Astronomy and Astrophysics at the Universities of Bonn and Cologne. The authors thank Michael Janssen for his valuable comments to improve this work. We will make our imaging pipeline and our software available soon in the second release of MrBeam (https://github.com/hmuellergoe/mrbeam). Our software makes use of the publicly available ehtim (Chael et al. 2016, 2018), regpy (https://github.com/regpy/regpy), MrBeam (Müller & Lobanov 2022, 2023b,a), and pygmo (Biscani & Izzo 2020) packages.

Appendix A Parameter survey

The genetic evolution from one generation to the next in MOEA/D is calculated by genetic mixing and random mutation. These operations are controlled by specific control parameters. For full details, we refer to Li & Zhang (2009). Following the algorithmic outline presented in Sect. 4, assume that k, l are randomly selected indices in the neighborhood UB(λj). We aim to compute a new solution yj by genetic mixing and random mutation. The genetic mixing is (Li & Zhang 2009):
$y_i^j = \begin{cases} x_i^j + F \cdot (x_i^k - x_i^l) & \text{with probability } CR, \\ x_i^j & \text{with probability } 1 - CR. \end{cases}$ (A.1)

Random mutation can be written as (Li & Zhang 2009):
$y_i^j = \begin{cases} x_i^j + \sigma_i \cdot (b_i - a_i) & \text{with probability } p_m, \\ x_i^j & \text{with probability } 1 - p_m, \end{cases}$ (A.2)

where ai and bi are the lower and upper bounds of the current decision vector. The magnitude of the mutation is controlled by σi (Li & Zhang 2009):
$\sigma_i = \begin{cases} (2 \cdot \mathrm{rand})^{\frac{1}{\eta + 1}} - 1 & \text{if } \mathrm{rand} < 0.5, \\ 1 - (2 - 2 \cdot \mathrm{rand})^{\frac{1}{\eta + 1}} & \text{otherwise}, \end{cases}$ (A.3)

where rand is a random number uniformly distributed in [0, 1]. Overall, we have four control parameters: two related to the genetic mixing (F, CR) and two related to the polynomial mutation (pm, η). Moreover, the number of genes per generation and the number of generations are free parameters. An unsuitable combination of hyperparameters could keep MOEA/D from reaching convergence (e.g., if the random mutation is too small) or from having diversity in the solutions (e.g., if the genetic mating occurs too often). Thus, we started from the default choices suggested in Li & Zhang (2009) and surveyed several parameter combinations for the crescent example with EHT coverage. We varied F ∈ [0.1, 0.5, 0.9] and η ∈ [5, 20, 50], and we changed the grid of the weighting combinations to 107, 307, 1007, and 5007 weight combinations. Moreover, we tested 1000, 3000, and 5000 iterations. Except for parameters far out of the standard range (i.e., F = 0.9), the performance was overall quite similar, with a tendency toward more generations, a smaller number of weight arrays, and genetic parameters close to those found to be optimal in Li & Zhang (2009). We therefore used CR = 1, F = 0.5, η = 20, and pm = 0.9 for this manuscript. The number of weight parameter combinations was limited to 107 to keep the algorithm reasonably fast; moreover, our investigation of more parameter combinations did not suggest improved performance. Using 3000–5000 iterations seems sufficient and is also supported by the frozen-in convergence of the decision vectors at these generations.
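The differential-evolution mixing of Eq. (A.1) and the polynomial mutation of Eqs. (A.2) and (A.3) can be sketched as follows, using the parameter values quoted above; the vectorized layout over pixels is an assumption for illustration, not the exact implementation of Li & Zhang (2009).

```python
import numpy as np

def de_crossover(xj, xk, xl, F=0.5, CR=1.0, rng=np.random.default_rng()):
    # Eq. (A.1): differential-evolution style mixing of three parents.
    mask = rng.random(xj.shape) < CR
    return np.where(mask, xj + F * (xk - xl), xj)

def polynomial_mutation(y, a, b, pm=0.9, eta=20, rng=np.random.default_rng()):
    # Eqs. (A.2)-(A.3): polynomial mutation with distribution index eta,
    # applied here to the crossover offspring y with bounds a, b.
    r = rng.random(y.shape)
    sigma = np.where(r < 0.5,
                     (2.0 * r) ** (1.0 / (eta + 1.0)) - 1.0,
                     1.0 - (2.0 - 2.0 * r) ** (1.0 / (eta + 1.0)))
    mutate = rng.random(y.shape) < pm
    return np.where(mutate, y + sigma * (b - a), y)

# Example: produce one offspring from three parent images (flattened pixel arrays).
rng = np.random.default_rng(0)
a, b = np.zeros(16), np.ones(16)
parents = rng.uniform(a, b, size=(3, 16))
child = polynomial_mutation(de_crossover(*parents, rng=rng), a, b, rng=rng)
```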

In MOEA/D, we solved problems of the form:
$x^j \in \arg\min_x \sum_{i=1}^{m} \lambda_i^j f_i(x),$ (A.4)

with the objective functionals discussed in Sect. 5. Every point in the Pareto front represents a solution that is optimal with respect to the local parameter combination. In this way, we tested the output of RML imaging with several parameter configurations. However, due to the normalization of λj and the limited grid size, the weighting parameters for the different optimization terms differ only by one order of magnitude. Since the regularization terms are not normalized with respect to the pixel size (e.g., the TV pseudo-norm or the l1 norm change when a smaller pixel size is used), we had to re-scale the regularization terms to a similar order of magnitude of impact before running MOEA/D. We varied the pre-factor for Rl1, Rl2, Rtv, and Rtsv between 1, 10, and 100 and the pre-factor for the entropy regularization between 0.1, 1, and 10. The parameter combinations were tested on synthetic EHT and EHT plus ngEHT data for all four models; the optimal re-scaling factors were found to be 0.1 or 1 for the entropy, 10 for l2, 1 for l1, and 1 or 10 for the total variation and total squared variation.

Appendix B M87 in all epochs

In this appendix, we extend the results for M87 shown in Sect. 7 with the reconstructed images for the rest of the epochs. Figure B.1 depicts the image reconstructions for M87 on April 5, 6, and 10. In all of them, the preferred image is the one presenting the clearest ring structure.

Fig. B.1. Image reconstructions of the M87 2017 EHT observations. From top to bottom: April 5, April 6, and April 10.

Appendix C M87 reconstructed using different starting points

In this appendix, we present two reconstructions of M87 on April 11, using a random pixel brightness distribution and a ring model as starting points, respectively. Figure C.1 shows the different clusters of the obtained solutions. As expected, starting with a ring morphology helps the convergence of the algorithm. Nevertheless, even when starting with a random brightness distribution on the pixels, we can recover a ring structure, which is a robust signature of the presence of a ring in the data. Also notable is the smaller degree of diversity obtained with the geometric starting model, which could be due to the stronger constraints that such a starting point imposes compared to a random brightness distribution.

Fig. C.1

Set of solutions obtained with a random starting point (left panel) and a ring starting point (right panel).
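For completeness, the two starting points can be constructed along the following lines; the grid size, field of view, and ring parameters are illustrative values of our own choosing rather than the exact settings used for the reconstructions shown in Fig. C.1.

```python
import numpy as np

npix, fov_uas = 64, 128.0   # illustrative grid: 64x64 pixels, 128 microarcsecond field of view
total_flux = 1.0            # arbitrary flux normalization
rng = np.random.default_rng(0)

# 1) Random starting point: uniformly distributed pixel brightness
random_start = rng.random((npix, npix))
random_start *= total_flux / random_start.sum()

# 2) Ring starting point: a Gaussian ring of hypothetical radius and width
x = (np.arange(npix) - npix / 2) * fov_uas / npix
xx, yy = np.meshgrid(x, x)
r = np.hypot(xx, yy)
ring_radius, ring_width = 20.0, 5.0   # microarcseconds, hypothetical values
ring_start = np.exp(-0.5 * ((r - ring_radius) / ring_width) ** 2)
ring_start *= total_flux / ring_start.sum()
```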

References

  1. Akiyama, K., Ikeda, S., Pleau, M., et al. 2017a, AJ, 153, 159
  2. Akiyama, K., Kuramochi, K., Ikeda, S., et al. 2017b, ApJ, 838, 1
  3. Arras, P., Bester, H. L., Perley, R. A., et al. 2021, A&A, 646, A84
  4. Bhatnagar, S., & Cornwell, T. J. 2004, A&A, 426, 747
  5. Biscani, F., & Izzo, D. 2020, J. Open Source Softw., 5, 2338
  6. Blackburn, L., Pesce, D. W., Johnson, M. D., et al. 2020, ApJ, 894, 31
  7. Broderick, A. E., Pesce, D. W., Tiede, P., Pu, H.-Y., & Gold, R. 2020, ApJ, 898, 9
  8. Chael, A. A., Johnson, M. D., Narayan, R., et al. 2016, ApJ, 829, 11
  9. Chael, A. A., Johnson, M. D., Bouman, K. L., et al. 2018, ApJ, 857, 23
  10. Clark, B. G. 1980, A&A, 89, 377
  11. Cornwell, T. J. 2008, IEEE J. Selected Topics Signal Process., 2, 793
  12. Cornwell, T. J., & Evans, K. F. 1985, A&A, 143, 77
  13. Event Horizon Telescope Collaboration (Akiyama, K., et al.) 2019, ApJ, 875, L4
  14. Event Horizon Telescope Collaboration (Akiyama, K., et al.) 2022, ApJ, 930, L14
  15. Högbom, J. A. 1974, A&AS, 15, 417
  16. Johnson, M. D., Bouman, K. L., Blackburn, L., et al. 2017, ApJ, 850, 172
  17. Kim, J.-Y., Krichbaum, T. P., Broderick, A. E., et al. 2020, A&A, 640, A69
  18. Knollmüller, J., & Enßlin, T. A. 2019, ArXiv e-prints [arXiv:1901.11033]
  19. Kulkarni, S. R., Prasad, S., & Nakajima, T. 1991, J. Opt. Soc. Am. A, 8, 499
  20. Li, H., & Zhang, Q. 2009, IEEE Trans. Evol. Comput., 13, 284
  21. Lockhart, W., & Gralla, S. E. 2022, MNRAS, 509, 3643
  22. Martí-Vidal, I., & Marcaide, J. M. 2008, A&A, 480, 289
  23. Müller, H., & Lobanov, A. P. 2022, A&A, 666, A137
  24. Müller, H., & Lobanov, A. P. 2023a, A&A, 672, A26
  25. Müller, H., & Lobanov, A. P. 2023b, A&A, 673, A151
  26. Mus, A., Martí-Vidal, I., Wielgus, M., & Stroud, G. 2022, A&A, 666, A39
  27. Pardalos, P. M., Žilinskas, A., & Žilinskas, J. 2017, Non-Convex Multi-Objective Optimization (Berlin: Springer)
  28. Rau, U., & Cornwell, T. J. 2011, A&A, 532, A71
  29. Readhead, A. C. S., & Wilkinson, P. N. 1978, ApJ, 223, 25
  30. Roelofs, F., Blackburn, L., Lindahl, G., et al. 2023, Galaxies, 11, 12
  31. Schwind, N., Okimoto, T., Konieczny, S., Wack, M., & Inoue, K. 2014, 2014 IEEE 26th International Conference on Tools with Artificial Intelligence, 170
  32. Schwind, N., Okimoto, T., Clement, M., & Inoue, K. 2016, Proceedings of the Fifteenth International Conference on Principles of Knowledge Representation and Reasoning, KR’16 (AAAI Press), 601
  33. Sharma, S., & Chahar, V. 2022, Archives Comput. Methods Eng., 29, 3
  34. Thompson, A., Moran, J., & Swenson, G. 1994, Interferometry and Synthesis in Radio Astronomy (USA: Krieger Publishing Company)
  35. Tiede, P. 2022, J. Open Source Softw., 7, 4457
  36. Tsurkov, V. 2001, Large Scale Optimization (Berlin: Springer), 51
  37. Tychoniec, L., Guglielmetti, F., Arras, P., Ensslin, T., & Villard, E. 2022, Phys. Sci. Forum, 5, 52
  38. Xin-She, Y., & Xing-Shi, H. 2019, Mathematical Foundations of Nature-Inspired Algorithms, Springer Briefs in Optimization (Berlin: Springer)
  39. Zhang, Q., & Li, H. 2007, IEEE Trans. Evol. Comput., 11, 712
