Synthetic direct demodulation method and its applications in Insight-HXMT data analysis

Aims. A modulation equation relates the observed data to the object, where the observation is approximated by a linear system. Reconstructing the object from the observed data is therefore equivalent to solving the modulation equation. In this work we present the synthetic direct demodulation (synDD) method to reduce the dimensionality of a general modulation equation and solve the equation in its sparse representation. Methods. A principal component analysis is used to reduce the dimensionality of the kernel matrix, and k-means clustering is applied to its sparse representation in order to decompose the kernel matrix into a weighted sum of a series of circulant matrices. The matrix-vector and matrix-matrix multiplication complexities are thereby reduced from polynomial time to linear-logarithmic time. A general statistical solution of the modulation equation in sparse representation is derived. Several data-analysis pipelines are designed for the Hard X-ray Modulation Telescope (Insight-HXMT) based on the synDD method. Results. In this approach, a large set of data originating from the same object but sampled irregularly and/or observed with different instruments in multiple epochs can be reduced simultaneously in a synthetic observation model. We suggest using the proposed synDD method in Insight-HXMT data analysis, especially for the detection of X-ray transients and the monitoring of time-varying objects with scanning observations.


Introduction
The newly launched Hard X-ray Modulation Telescope (Insight-HXMT) is China's first X-ray astronomical satellite. Insight-HXMT carries three main payloads onboard: the High Energy telescope (HE), the Medium Energy telescope (ME), and the Low Energy telescope (LE) (Zhang et al. 2014). One of its primary scientific objectives is scanning the Galactic plane to monitor time-varying objects and to discover X-ray transients (Zhang et al. 2014). Insight-HXMT is based on the direct demodulation (DD) method because all telescopes onboard are position-insensitive collimated detectors, and images can only be reconstructed from the observed data by offline data analysis (Li & Wu 1993, 1994; Li 2007; Zhang 2009).
In the analysis of astronomical data, observed data can be modelled as object functions modulated by kernel functions, which characterise the observation process, mainly the instrument response. For example, an observed image is a spatial distribution of objects modulated by an imaging system, where a point spread function (PSF) serves as the modulation kernel function. Observed spectra are object spectra modulated by energy response matrices. The relations among the observed data, the object, and the observation process can be formulated with a Fredholm integral equation of the first kind,

d(ω) = ∫ h(x, ω) f(x) dx,    (1)

where d(ω), h(x, ω), and f(x) represent models of the observed data, the kernel, and the object, respectively, and ω and x are coordinate variables for the observed data domain (e.g. pixel indices, energy channels, possible state parameters of instruments, and so on) and the object domain (e.g. WCS celestial sphere coordinates), respectively. The object model f(x) represents both the sources and the sky background. The kernel h(x, ω) characterises the observation process, including the instrument response as well as non-sky backgrounds, for example, dark currents or cosmic rays for high-energy detectors. In order to solve the equation, both periodic calibration of the instrument response and non-sky background estimation are necessary to estimate the kernel. Differences between the true kernel and its estimate lead to systematic errors in the solution, that is, in the estimated object. Discussions of instrument response calibration, non-sky background modelling and estimation, and the systematic errors due to the above issues are beyond the scope of this work. (This work was supported by the National Natural Science Foundation of China, grants 11403014 and 11373025.)
The DD method was introduced by Li & Wu (1993) to solve the modulation equation and thereby reconstruct the unknown object from the observed data. In Eq. (1) a model of the observed data, rather than the data itself, is given. Since the data itself is a random outcome of the observation process, a statistical solution of the modulation equation is achieved via the DD method according to the probability distribution of the observed data implied by its model. For observed data dominated by Poisson fluctuations, the simplest implementation of the DD method degenerates into Richardson-Lucy (RL) iteration (Richardson 1972; Lucy 1974). As a non-parametric approach, reconstruction through RL iteration allows a larger pool of solutions that fit the observed data, compared with parametric fitting approaches that require explicit and known models of both signals and backgrounds (Puetter et al. 2005). However, a larger pool of solutions often results in an ill-posed problem, which is unstable or divergent. The DD method overcomes this problem by continuously cross-correlating both sides of Eq. (1), transforming the original modulation equation into its L-fold equivalents to improve its positive definiteness, and by applying non-linear physical constraints to regularize the iteration, thereby shrinking the pool of solutions so that the object reconstruction problem becomes tractable.
In real-world data analysis, it is often necessary to avoid data with a poor signal-to-noise ratio (S/N) through screening and selection; for example, the Good Time Interval (GTI) auxiliary record (Blackburn 1995) is used to select time series data in Rossi X-ray Timing Explorer (RXTE) as well as Insight-HXMT data analysis. It is also often necessary to include relevant data from other instruments or epochs in order to provide better statistics. Meanwhile, the dimensionality and irregularity of the corresponding modulation equation increase, which imposes growing complexity on the reconstruction, especially for Insight-HXMT data analysis, where the modulation kernels lack circular symmetry. The computational complexity introduced by rotation modulation is reduced by an angular clustering method specifically designed for the Insight-HXMT all-sky survey (Huo & Zhou 2013), since the rotation modulation can be represented much more sparsely with explicit angular coordinates.
Although the DD method treats the ill-posed object reconstruction problem, the lack of a modulation-kernel-constructing process or an accelerated implementation for general cases has prevented this method from being applied to data analysis tasks with high dimensionality or large datasets. In this article we provide the synthetic direct demodulation (synDD) method, which features 1. a modulation-kernel-constructing process to combine kernels characterizing individual instruments, observations, and/or data screening/selection into one synthetic kernel (modulation equation synthesis process), and 2. an accelerated implementation for general cases, achieved by decomposing an arbitrary kernel matrix into a weighted sum of a series of circulant matrices so that the fast Fourier transform (FFT) can help (modulation kernel matrix analysis process). Consequently, we can not only deal with the previously mentioned rotation modulation without losing important information sampled from different position angles, as well as the projection distortion that occurs on the tangential plane of the celestial sphere, which we put aside in previous work (Huo & Zhou 2013), but can also fold a more complicated kernel (e.g. with screening or weighting matrices) and additional observed data into our representation of the modulation equation.

Modulation equation
In this article, vectors and matrices are highlighted in bold, while regular typeface is used for scalars, plain sequences (or 1D arrays), and multidimensional arrays. Operations on plain arrays are element-wise if not specifically indicated. For instance, the bold x indicates a vector with elements x_1, x_2, ..., x_n, while the regular x simply denotes a plain sequence of these elements. Modular arithmetic is used for subscripts in this article.
Consider the algebraic form of Eq. (1),

d = H f,    (2)

where the N × 1 column vector d and the M × 1 vector f are discrete samplings of the models of the observed data and the object, while the N × M matrix H is the modulation kernel matrix, which is a discrete sampling of the modulation kernel.

Analysis and synthesis of modulation equation
Both N and M in Eq. (2) are used to indicate the size of the object reconstruction problem. Time costs in solving the modulation equation increase rapidly as the size of the problem increases unless the naive matrix-matrix and matrix-vector multiplications are carefully treated. For N × N matrices and N × 1 vectors the time costs of matrix-matrix and matrix-vector multiplications are proportional to N^3 and N^2, respectively. In computer science, such costs are measured with computational complexities, denoted by O(N^3) (cubic complexity) and O(N^2) (quadratic complexity). If the complexity could be reduced from polynomial (cubic or quadratic) to quasilinear, for example, O(N log N), by replacing all the naive matrix multiplications with FFTs, the time cost of a typical 512 × 512 Insight-HXMT image reconstruction (N = 512 × 512) can be reduced from 10^5 s to a few seconds (Huo & Zhou 2013). In order to achieve such a treatment, we decompose the kernel matrix H, for this purpose taken to be an arbitrary matrix, into the sum of a finite sequence of circulant kernel matrices multiplied by diagonal coefficient matrices, as

H = Σ_{k=1}^{K} A_k H_k,    (3)

where A_k is an N × N diagonal matrix serving as the coefficient of the kth N × M kernel matrix H_k, which is row-circulant. Diagonal entries of A_k have values of either 0 or 1 only. A matrix H_k is row-circulant if each of its row vectors is circularly shifted by one element to the right relative to its preceding row; that is, H_{k,i,j} = H_{k,i+1,j+1} = h_{k,j−i+1} for all i and j, where h_k is the first row vector of H_k. A row-circulant matrix is not necessarily square. If the number of its rows is greater than the number of its columns (N > M), its excess rows are considered as circularly appended to the first M rows of the matrix, while if N < M the matrix is considered as the first N rows truncated from an M × M square circulant matrix. In addition, it is required that

A_k A_l = 0 for k ≠ l,    (4)

and

Σ_{k=1}^{K} A_k = I_N,    (5)

where I_N is the N × N identity matrix.
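To make the decomposition in Eq. (3) concrete, the construction can be sketched numerically. The following Python/NumPy snippet is a toy illustration (the sizes, the cluster labels, and the `row_circulant` helper are made up for this sketch, not part of any pipeline): it builds row-circulant matrices, selects rows with disjoint 0/1 diagonal coefficient matrices, and checks that they reassemble the full kernel and partition the identity as in Eqs. (4) and (5).

```python
import numpy as np

def row_circulant(h, N):
    """Build an N x M row-circulant matrix whose first row is h.
    Row i is the first row circularly shifted i places to the right;
    excess rows (N > M) wrap around circularly."""
    return np.stack([np.roll(h, i) for i in range(N)])

rng = np.random.default_rng(0)
N, M, K = 6, 4, 2
h_rows = [rng.random(M) for _ in range(K)]       # first rows of the H_k
Hk = [row_circulant(h, N) for h in h_rows]
labels = np.array([0, 0, 1, 1, 0, 1])            # cluster index of each row
A = [np.diag((labels == k).astype(float)) for k in range(K)]

# Assemble H = sum_k A_k H_k, as in Eq. (3).
H = sum(A[k] @ Hk[k] for k in range(K))

# The coefficient matrices partition the identity: sum_k A_k = I_N.
assert np.allclose(sum(A), np.eye(N))
# Each row of H is the appropriately shifted kernel of its cluster.
for i in range(N):
    assert np.allclose(H[i], np.roll(h_rows[labels[i]], i))
```

Because each row of H belongs to exactly one cluster, the products A_k A_l vanish for k ≠ l automatically, which is what the later H^T H computation relies on.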
Allowing a_k to be a column vector of the diagonal elements of A_k, Eq. (2) becomes

d = Σ_{k=1}^{K} a_k ∘ (f * h̄_k),    (6)

where the operator ∘ denotes the element-wise multiplication, the operator * denotes the circular convolution, and h̄_k is the reverse of h_k. We note that a_k has N elements while f * h̄_k has M elements, so before calculating their element-wise multiplication the right operand f * h̄_k is truncated (if N > M) or padded (if N < M) to the same size as the left operand a_k. Such truncation or padding is applied when necessary in this article. A decomposition of d is naturally derived as

d = Σ_{k=1}^{K} d_k,    (7)

where the kth component of the observed data is d_k = a_k ∘ (f * h̄_k). The term analysis refers to separating the original modulation equation into a finite sequence of equations by decomposing the kernel matrix as well as the observed data, while the term synthesis refers to combining a finite sequence of modulation equations to form an equivalent one. Analysis of a modulation equation is useful for reducing the computational complexity of the corresponding inverse problem. Data obtained from different observations of the same object can be synthesised as in Eq. (7) and demodulated as in Eq. (2) or Eq. (6).
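As a sketch of how Eq. (6) replaces the matrix-vector product with FFTs, the snippet below (Python/NumPy; a toy square case N = M is assumed so that no truncation or padding is needed, and all sizes and kernels are arbitrary) checks that the masked circular correlations reproduce the naive product H f. The circular convolution with the reversed kernel, f * h̄_k, is a circular cross-correlation and is evaluated with a conjugated FFT.

```python
import numpy as np

def circ_corr(f, h):
    """Circular cross-correlation of f with h, i.e. f * reversed(h):
    the action of a row-circulant matrix with first row h on f."""
    return np.fft.ifft(np.fft.fft(f) * np.conj(np.fft.fft(h))).real

rng = np.random.default_rng(1)
N = M = 8
K = 2
h = [rng.random(M) for _ in range(K)]             # cluster kernels h_k
labels = rng.integers(0, K, size=N)
a = [(labels == k).astype(float) for k in range(K)]  # 0/1 masks a_k
f = rng.random(M)

# Naive modulation d = H f with H = sum_k A_k H_k.
H = sum(np.diag(a[k]) @ np.stack([np.roll(h[k], i) for i in range(N)])
        for k in range(K))
d_naive = H @ f

# Sparse representation, Eq. (6): d = sum_k a_k ∘ (f * reversed(h_k)).
d_fft = sum(a[k] * circ_corr(f, h[k]) for k in range(K))
assert np.allclose(d_naive, d_fft)
```

Each term costs one FFT pair, O(N log N), instead of the O(N^2) naive matrix-vector product.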

L-fold correlation
The one-fold correlation of a modulation equation such as Eq. (2) is achieved by left-multiplying both sides of the equation by the transpose of the kernel matrix, as

H^T d = H^T H f.    (8)

In the correlated equation, the unknown object image remains f, while the observed data as well as the kernel matrix are both transformed into the one-fold correlated data c_1 = H^T d and kernel P_1 = H^T H. The correlation can be repeated,

c_l = P_{l−1}^T c_{l−1},  P_l = P_{l−1}^T P_{l−1},    (9)

provided that c_0 = d and P_0 = H (Li & Wu 1994), so the L-fold correlated equation is

P_L f = c_L.    (10)

Although L-fold correlation is preferred when using the DD method, it is difficult to compute with naive matrix multiplication when the kernel matrix is very large, for example, a 2^20 × 2^20 kernel matrix, since the computational complexity of multiplication between two N × N matrices is O(N^3) if performed naively. With a more complicated algorithm the computational complexity can be reduced to O(N^2.376) (Coppersmith & Winograd 1990); however, it would still take hundreds of hours to compute the multiplication between two 2^20 × 2^20 matrices.
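The effect of the correlation can be illustrated on a toy system. The sketch below (Python/NumPy; the recursion c_l = P_{l−1}^T c_{l−1}, P_l = P_{l−1}^T P_{l−1} follows our reading of Li & Wu 1994, and the small well-conditioned kernel is invented for the example) checks that L-fold correlation leaves the solution unchanged while making the correlated kernel symmetric and positive definite.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5
H = rng.random((N, N)) + np.eye(N)   # well-conditioned toy kernel
f_true = rng.random(N)
d = H @ f_true

# L-fold correlation with c_0 = d, P_0 = H.
c, P = d, H
for _ in range(2):                   # L = 2
    c, P = P.T @ c, P.T @ P

# The correlated equation P_L f = c_L has the same solution ...
f_rec = np.linalg.solve(P, c)
assert np.allclose(f_rec, f_true)
# ... and P_L is symmetric positive definite, which stabilizes
# iterative solvers applied to the correlated equation.
assert np.allclose(P, P.T)
assert np.all(np.linalg.eigvalsh(P) > 0)
```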
We have derived a simple but efficient approach to compute the matrix multiplication from Eq. (3). First we compute the first column vector of H^T H; since A_k A_l = 0 for k ≠ l, the cross-terms vanish and

(H^T H) e_1 = Σ_{k=1}^{K} H_k^T A_k H_k e_1 = Σ_{k=1}^{K} (a_k ∘ (H_k e_1)) * h_k,    (11)

where e_1 is the first column of the identity matrix, and both H_k e_1 (the first column of H_k) and the outer circular convolution are evaluated with the FFT. We can compute the remaining column vectors in the same way.
The time cost of computing each column vector is proportional to KN log N, where K is the number of circulant matrices, if the circular convolution is calculated with the FFT algorithm. For rotation modulation, K is the number of position angle clusters; for wide-field image reconstruction, K is the number of field-of-view (FoV) clusters; and for a synthetic kernel, K is the number of observations. Because both the position angle and the FoV distortion vary gradually in practical observations, the number of their clusters is always far smaller than the number of pixels or bins of the observed data samples. We found that 20-100 clusters are sufficient to approximate the kernel variation, in the sense that the difference between the simulated true kernel and the kernel approximated with the weighted sum of circulant matrices is negligible compared to the required kernel calibration accuracy. Therefore K is always less than N by several orders of magnitude and is independent of N. As a result, the complexity is reduced to O(N^2 log N).
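The column-by-column computation of H^T H can be sketched as follows (Python/NumPy; a toy square case N = M with invented kernels and masks). A row-circulant matrix acts as a circular correlation, its transpose as a circular convolution, so each column of H^T H costs O(KN log N) FFT operations instead of an O(N^2) matrix-vector product.

```python
import numpy as np

def circ_corr(f, h):
    """H_k f for a row-circulant H_k with first row h."""
    return np.fft.ifft(np.fft.fft(f) * np.conj(np.fft.fft(h))).real

def circ_conv(y, h):
    """H_k^T y for the same row-circulant H_k: a circular convolution."""
    return np.fft.ifft(np.fft.fft(y) * np.fft.fft(h)).real

rng = np.random.default_rng(3)
N, K = 16, 3
h = [rng.random(N) for _ in range(K)]
labels = rng.integers(0, K, size=N)
a = [(labels == k).astype(float) for k in range(K)]
H = sum(np.diag(a[k]) @ np.stack([np.roll(h[k], i) for i in range(N)])
        for k in range(K))

def HtH_column(j):
    """jth column of H^T H in O(K N log N): forward-modulate the unit
    vector e_j, mask with a_k, then back-project per cluster."""
    e = np.zeros(N); e[j] = 1.0
    col = np.zeros(N)
    for k in range(K):                       # cross-terms vanish, Eq. (4)
        col += circ_conv(a[k] * circ_corr(e, h[k]), h[k])
    return col

assert np.allclose(HtH_column(0), (H.T @ H)[:, 0])
assert np.allclose(HtH_column(5), (H.T @ H)[:, 5])
```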

Screening and weighting
We can avoid data with poor S/N (screening) or adjust their weights accordingly by introducing a weight matrix M into Eq. (2),

M d = M H f,    (12)

where the weight matrix M is an N × N diagonal matrix. The ith element on its main diagonal, m_i, serves as the weight of the ith observed datum d_i. We can avoid certain observed data by assigning 0-valued elements as their weights. Modulated estimates of the object, H f̂ (f̂ is an estimate of the unknown object f), appear as denominators in reconstructions with the DD method as well as in RL iterations. The original kernel in Eq. (1) contains both the instrument response and the non-sky background, while the object contains the sources and the sky background; hence division by zero will not happen. But possible zero weights in M of Eq. (12) are like holes and may cause such a problem. To prevent this problem in later reconstructions, L-fold correlation is applied so that zero-holes in M H f̂ are smoothed to non-zeros by the kernel and their non-zero neighbours.

Additional observation
Observations at multiple epochs or with different instruments can be joined into a single modulation equation by using a partitioned kernel matrix, as

[d_1; d_2; …; d_Λ] = [H_1; H_2; …; H_Λ] f,    (13)

where d_λ and H_λ (λ = 1, 2, …, Λ) are the observed data and the kernel matrix of the λth observation, respectively, and the semicolons denote vertical stacking.
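The synthesis of Eq. (13) amounts to vertically stacking the data vectors and kernel matrices. A minimal sketch (Python/NumPy; two invented observations of the same toy object, solved here by least squares merely to show that the joint equation uses both data sets at once):

```python
import numpy as np

rng = np.random.default_rng(4)
M = 6
f = rng.random(M)                 # the common unknown object

# Two observations of the same object with different kernels/sizes.
H1 = rng.random((8, M))
H2 = rng.random((5, M))
d1, d2 = H1 @ f, H2 @ f

# Joined into one modulation equation with a partitioned kernel matrix.
H = np.vstack([H1, H2])
d = np.concatenate([d1, d2])
assert np.allclose(H @ f, d)

# The joint solution is constrained by both data sets simultaneously.
f_hat, *_ = np.linalg.lstsq(H, d, rcond=None)
assert np.allclose(f_hat, f)
```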

Iterative solution of synthetic modulation equation
A model of the observed data, rather than the observed data themselves, has been placed on the left side of Eqs. (1) and (6), where the model characterizes the statistically expected value of the observed data, while the data itself is a random outcome of a specific observation. For Poisson-fluctuation- (photon-noise-) dominated data, the likelihood function of the object image f as an unknown parameter given the observed data d is

L(f) = Π_{i=1}^{N} (H f)_i^{d_i} e^{−(H f)_i} / d_i!.    (14)

Once we find the f that maximizes the likelihood function, the maximum likelihood (ML) solution of the modulation equation is achieved. The object reconstruction problem is therefore transformed into solving the following equation:

∂ ln L(f) / ∂ f = 0.    (15)

Therefore,

H^T (d / (H f)) = H^T 1,    (16)

and, in the sparse representation of Eqs. (3) and (6),

Σ_{k=1}^{K} (a_k ∘ (d / Σ_{j=1}^{K} a_j ∘ (f * h̄_j))) * h_k = Σ_{k=1}^{K} a_k * h_k.    (17)

Fixed-point iteration is a method of finding a fixed point of a given function in numerical analysis. x is a fixed point of the function g(x) if x = g(x). The iteration x^{(n+1)} = g(x^{(n)}) is expected to converge to x if g is continuous. The ML estimate of the true image f is a fixed point of the function

g(f) = (f / w) ∘ Σ_{k=1}^{K} (a_k ∘ (d / Σ_{j=1}^{K} a_j ∘ (f * h̄_j))) * h_k.    (18)

We therefore expect to find the ML estimate iteratively. The iteration at the lth step is

f^{(l+1)} = (f^{(l)} / w) ∘ Σ_{k=1}^{K} (a_k ∘ (d / Σ_{j=1}^{K} a_j ∘ (f^{(l)} * h̄_j))) * h_k,    (19)

where the normalization factor w = Σ_k a_k * h_k. RL iteration can also be considered as a fixed-point iteration that achieves an ML solution restoring an image blurred by a convolution kernel. For a perfectly known modulation kernel and statistical model of the observed data, the proper criterion to terminate the fixed-point iteration can be found by monitoring the residuals. However, when the modulation kernel is not perfectly known or the statistical model of the observed data does not perfectly describe the random nature of the data, one should terminate the iteration as soon as the required source is resolved from the data. In addition, cross-validation and sensitivity assessment (Huo et al. 2015) are necessary to prevent overfitting or noise amplification. For Insight-HXMT survey data analysis, which is mainly focused on point-like-source detection and monitoring, the stopping criterion is not critical, since the demodulated image is not the final result of object reconstruction but only serves as a hint for the following parametric fitting procedures. Provided that both the object image f and the observed data d are N × 1 and the kernel matrix is then N × N, it takes ∼2N^2 scalar multiplications for each RL iteration of the original DD. In contrast, it takes ∼2KN log N scalar multiplications for each iteration of demodulation synthesised from K circulant kernel matrices with the aid of the FFT algorithm. The computational complexity is therefore reduced, provided that K ≪ N.
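A minimal, noise-free sketch of the fixed-point iteration in sparse representation (Python/NumPy; the one-dimensional toy object, kernels, and iteration count are all invented for illustration, and the square case N = M is assumed). Every forward and backward application of the kernel is an FFT-based circular correlation or convolution, so each iteration costs O(KN log N).

```python
import numpy as np

def circ_corr(f, h):   # H_k f: circular cross-correlation
    return np.fft.ifft(np.fft.fft(f) * np.conj(np.fft.fft(h))).real

def circ_conv(y, h):   # H_k^T y: circular convolution
    return np.fft.ifft(np.fft.fft(y) * np.fft.fft(h)).real

rng = np.random.default_rng(5)
N, K = 32, 2
h = [rng.random(N) + 0.5 for _ in range(K)]        # positive kernels
labels = rng.integers(0, K, size=N)
a = [(labels == k).astype(float) for k in range(K)]

f_true = np.zeros(N)
f_true[[4, 20]] = [3.0, 1.5]                       # two point sources
f_true += 0.1                                      # flat sky background

def forward(f):        # Eq. (6): d = sum_k a_k ∘ (f * reversed(h_k))
    return sum(a[k] * circ_corr(f, h[k]) for k in range(K))

d = forward(f_true)                                # noise-free data

# Fixed-point (RL-type) update:
#   f <- (f / w) ∘ sum_k (a_k ∘ (d / d_hat)) * h_k,  w = sum_k a_k * h_k
w = sum(circ_conv(a[k], h[k]) for k in range(K))
f = np.ones(N)
for _ in range(300):
    d_hat = forward(f)
    f = (f / w) * sum(circ_conv(a[k] * (d / d_hat), h[k]) for k in range(K))

rel_res = np.linalg.norm(forward(f) - d) / np.linalg.norm(d)
assert rel_res < 0.05   # the iterate reproduces the data closely
```

At the true image the ratio d/d_hat is one, so the update multiplies f by w/w and leaves it unchanged: f_true is indeed a fixed point of the iteration.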
Let us take Insight-HXMT as an example, where the level-1 scientific data products consist of lists of X-ray photon arrival events detected by each scientific instrument. Each event is described by properties such as time of arrival, detector identifier, energy channel identifier, anti-coincidence detector counts, and so on. Satellite orbit coordinates and telescope attitude (pointing angles and position angle) at the arrival of each event are interpolated from the housekeeping data provided by the satellite platform. Two-dimensional discrete samples of the observed image, 1D binned samples of the observed light curve, and/or a 1D binned energy spectrum are counted from the above events as well as from the auxiliary housekeeping data accordingly. For object image reconstruction, each image contains too many pixels for the naive matrix multiplication in Eq. (2), and the modulation cannot be computed with spherical harmonics or Fourier transforms due to the absence of circular symmetry in the kernel, that is, the point spread function (Huo & Zhou 2013). With the approach described here the time cost of Insight-HXMT image reconstruction is reduced by orders of magnitude.

Cluster analysis of the modulation kernel
Cluster analysis of the modulation kernel is the key building block of the method we present in this article. A kernel matrix is an ordered set of row vectors, which are classified into K groups through cluster analysis. In cluster analysis we refer to each group as a cluster. Row vectors in the same cluster are similar to each other, as if they were taken from a circulant matrix. A generalized Euclidean distance ‖x_i − x_j‖ is introduced here to measure the similarity between a pair of row vectors x_i and x_j. The two row vectors are considered sufficiently similar if the distance is less than the kernel calibration accuracy. So cluster analysis is the underlying procedure through which a kernel matrix H is decomposed into a sequence of circulant matrices H_k and their coefficients A_k. The number of clusters, K, determines the extent to which the computational complexity can be reduced. The smaller the number K, the more the complexity is reduced when we approximate H with Σ_{k=1}^{K} A_k H_k. The accuracy of the approximation is determined by the similarities between any two row vectors in the same cluster. Therefore, to perform a cluster analysis of a modulation kernel is to classify all row vectors of the kernel matrix into as few clusters as possible, while keeping the similarities between any two vectors in the same cluster acceptable.
Principal component analysis (PCA) is also used to decompose a kernel in order to reduce its dimensionality (Jolliffe 1986). We nevertheless categorize the analysis of a modulation kernel in this article as a cluster analysis, because PCA is only used here to reduce the dimensionality of calculating the similarities between row vectors; this means that instead of calculating the generalized Euclidean distance between two M × 1 row vectors, we actually calculate the distance between two representative vectors that each contain many fewer components, that is, the principal components. Therefore, PCA serves as preprocessing for the cluster analysis. Since each row vector of the kernel matrix has the same size as the object image, the expense of computing similarities between row vectors increases sharply with the image resolution, especially if the image has more than one dimension. Fortunately, a row vector of a kernel can be sparser in a certain representation than it appears in a naive pixel-wise representation. PCA is used to find the basis of a sparse representation of the given row vectors of a specific kernel. In PCA the basis vectors are termed principal components. By expressing a row vector as a finite linear combination of the basis vectors, its dimensionality is effectively reduced, since the number of non-zero coefficients of a sparse representation is usually smaller than the number of pixels of the image.
Each row of the given kernel matrix H is circularly shifted leftwards according to its row number in the matrix; that is, the ith row is shifted by i elements leftwards. In this way all row vectors of a circulant matrix become aligned, so that the shifted matrix appears as N vertically stacked copies of an identical row vector, ready for the subsequent PCA processing.
Letting x_i be the shifted ith row vector in pixel-wise representation, we use the NIPALS-PCA algorithm (Geladi & Kowalski 1986) to find the principal components of the set of shifted row vectors {x_i} iteratively, and transform each row vector x_i to a new vector t_i in a space of reduced dimensionality defined by the principal components.
The similarity between two row vectors x_i and x_j is measured by the Euclidean distance between their sparse representations t_i and t_j, namely ‖t_i − t_j‖. k-means clustering (Lloyd 1982) is then used to classify {t_i} into clusters. A drawback of k-means clustering is that the number of clusters is taken as an input parameter. The minimum number of clusters should therefore be determined through extra clustering runs beforehand, to make sure the similarities between vectors in the same cluster are acceptable.
Once the vectors {t_i} are classified into K clusters, a central vector of each cluster is calculated from the corresponding row vectors {x_i}, as

h_k = (1/N_k) Σ_{i ∈ S_k} x_i,

where N_k is the number of vectors in the kth cluster, while S_k is the set of indices of vectors in this cluster. A circulant matrix H_k is then constructed with h_k as its first row vector. Its coefficient matrix A_k is constructed from its diagonal elements a_k, as

a_{k,i} = 1 if i ∈ S_k, and a_{k,i} = 0 otherwise.
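The whole cluster-analysis pipeline (align by shifting, PCA, k-means, cluster centres) can be sketched end to end. In the Python/NumPy snippet below, the SVD replaces the NIPALS-PCA algorithm, a deterministic farthest-point initialization replaces whatever initialization the actual implementation uses, and the synthetic kernel (rows drawn from two Gaussian templates of different widths) is invented purely so that the clustering has a known ground truth.

```python
import numpy as np

rng = np.random.default_rng(6)
N = M = 24
# A kernel whose rows are shifted copies of two templates, plus noise.
templates = [np.exp(-0.5 * ((np.arange(M) - M // 2) / s) ** 2)
             for s in (1.5, 4.0)]
labels_true = rng.integers(0, 2, size=N)
H = np.stack([np.roll(templates[labels_true[i]], i) for i in range(N)])
H += 1e-3 * rng.random((N, M))

# Step 1: undo the row-dependent shift so that similar rows align.
X = np.stack([np.roll(H[i], -i) for i in range(N)])

# Step 2: PCA (via SVD here) for a low-dimensional representation.
J = 3
X0 = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - X0, full_matrices=False)
T = (X - X0) @ Vt[:J].T            # N x J principal component scores

# Step 3: k-means (Lloyd's algorithm) on the scores,
# seeded with two mutually distant points.
K = 2
far = int(np.argmax(np.linalg.norm(T - T[0], axis=1)))
C = T[[0, far]]
for _ in range(20):
    D = np.linalg.norm(T[:, None, :] - C[None, :, :], axis=2)
    assign = D.argmin(axis=1)
    C = np.stack([T[assign == k].mean(axis=0) for k in range(K)])

# Step 4: cluster centres in pixel space give the circulant kernels h_k,
# and the assignments give the 0/1 diagonals a_k.
h = [X[assign == k].mean(axis=0) for k in range(K)]
a = [(assign == k).astype(float) for k in range(K)]

# The recovered clustering matches the true row labels
# (up to a permutation of the cluster indices).
agree = (assign == labels_true).mean()
assert max(agree, 1 - agree) > 0.9
```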

Object, modulation kernels, and observed data
We produced a model with a series of point sources located on a spiral as the object (see Fig. 1). The distance between adjacent sources increases from the interior to the exterior of the spiral, while the intensity decreases. We can therefore use this model object to assess both the spatial resolution and the sensitivity of a given observation and the corresponding reconstruction.
The aperture PSF model of the High Energy (HE) telescope of Insight-HXMT (as shown in Fig. 2), which reflects the geometrical effects of all the collimators of the HE detectors, that is, the detection efficiency of the telescope for a test point source in its FoV, is used here to build the modulation kernels. The spatial coordinates of the test point source are the variables of the PSF.
We simulated observations in which the sky region is scanned in different directions, as shown in Fig. 3. The scanning speed along each row is 0.10° s⁻¹, while the interval between adjacent scanning rows is 3.0°.
With the simulated scanning paths, the PSF, the object, and a uniform background of 0.3 counts s⁻¹ deg⁻², we simulated the observed light curves.

Reconstruction and comparison
First we reconstructed the object from a single-epoch observed light curve (as shown in the top left panel of Fig. 4) on a low-resolution pixel grid (64 by 64 pixels). The reconstructed image is shown in Fig. 5. We estimated from the marginal null region of the reconstructed image that the 3σ pixel-wise sensitivity is 2.4 counts s⁻¹ deg⁻². Pixels with image values below the estimated sensitivity were therefore discarded.
Similarly, we reconstructed the object again from the simulated light curves of the four different scanning paths shown in Fig. 3, by synthesizing all modulation kernel matrices and all light curves together according to Sect. 2.5. The reconstructed image is shown in Fig. 6. The estimated 3σ sensitivity is 1.7 counts s⁻¹ deg⁻².
In the above reconstructions we employed the original DD iterations with vector extrapolation between iterations to speed up the convergence (Biggs & Andrews 1997). Vector extrapolation is a technique in numerical analysis that predicts the subsequent step from previous steps instead of computing the full iteration, meaning that fewer iterations are required. For the low-resolution reconstruction, we find that after 30 iterations the solutions no longer improve, due to significant artefacts. For each iteration the modulation was calculated through matrix multiplication routines provided by OpenBLAS, a widely used open-source implementation of the Basic Linear Algebra Subprograms (BLAS; Xianyi et al. 2014). This is referred to as the original DD, while the solution specified in Eq. (19) is referred to as the synthetic DD. The time cost per original DD iteration is 5.3 s on a dual-core PC. For comparison, we reconstructed the object from the simulated light curves of the four different scanning paths shown in Fig. 3 on the same pixel grid with synthetic DD, as shown in Fig. 7.

The time cost per synthetic DD iteration is 15 milliseconds on one personal computer (PC). As we can see, the image reconstructed with synthetic DD is similar to that with the original DD, but the time cost is reduced by orders of magnitude.

Subsequently, we tested a higher-resolution (512 by 512 pixels) reconstruction with synthetic DD. The original DD was not tested because of its expected time cost (at least 10 h); the time cost per original DD iteration increases polynomially, and more iterations are required at higher resolution. The image reconstructed from the simulated light curves of the four different scanning paths illustrated in Fig. 3 is shown in Fig. 8. The time cost per iteration on this pixel grid is 0.6 s.
We also examined the convergence of the cluster analysis as a prerequisite of synthetic DD. The residual of the sparse representation of the kernel matrix is reduced rapidly by the first components; subsequently, the marginal improvement from including more components diminishes, as illustrated in Fig. 9.

Finally, the time costs of reconstructions with the original DD iteration and the synthetic DD iteration are shown in Fig. 10.

Conclusion
The synDD method, including modulation kernel matrix analysis and modulation equation synthesis, reduces the polynomial computational complexity of a time-consuming process in solving the modulation equation to linear-logarithmic complexity, which in turn reduces the time cost of object reconstruction in Insight-HXMT data analysis by orders of magnitude. This encourages us to include more observed data, for example, from multiple epochs or multiple energy bands, and to reconstruct the object with higher resolution. It is therefore possible to achieve both improved sensitivity and higher resolution with even fewer resources. We suggest using the proposed method in Insight-HXMT data analysis, especially with scanning observations from multiple payloads, different energy bands, and/or multiple epochs, for transient detection, the monitoring of time-varying objects, and so on.
Table A.1. Symbols in algorithm descriptions.

FFT(···): The Fourier transform operator implemented with the FFT algorithm
IFFT(···): The inverse Fourier transform operator
Overline (¯): The element-wise complex conjugation operator
A small number specified as an input parameter
i and j: Array indices
N and M: Numbers of rows and columns of the original kernel matrix
k: Cluster index
K: The number of clusters
J: The expected number of principal components
d: Array of N elements representing the observed data
H: N × M array representing the original kernel matrix
X: N × M array representing the shifted kernel matrix, so that row vectors are aligned
X_0: Array of N elements storing the arithmetic average of the M column vectors of X
T: N × J array representing the principal component scores of the N row vectors of X, i.e., T[i] is the principal component score of X[i]
R: N × M array storing residuals in PCA (Subroutine 2)
λ and λ_p: Estimates of the eigenvalue of the covariance in Subroutine 2
S: Array of K lists; the kth list S[k] contains all the row numbers of vectors classified into the kth cluster at the current step
L: Array of K elements representing the numbers of vectors in the K clusters, e.g., L[k] is the number of vectors in the kth cluster
C (and C_p): K × J array, the kth row of which stores the central vector of the kth cluster calculated at the current step (and the previous step), represented by principal component scores
Y: K × M array storing the central vectors of each cluster in pixel-wise representation
D: K × N array representing the Euclidean distances between the N shifted vectors and the central vectors of the K clusters, e.g., D[k][i] represents the Euclidean distance between the ith shifted vector X[i] and the kth central vector C[k]
P: K × N × M array representing the K circulant kernel matrices in Eq. (3)
A: K × N × N array representing the K diagonal coefficient matrices in Eq. (3)
a: K × N array representing the vectors of diagonal elements of the K matrices
w: Array of M elements representing the normalization factor in Eq. (19)
f (and f_p): Array of M elements representing the object image estimated at the current step (and the previous step)
r: An array of N elements
q: An array of M elements