A&A 421, 977-982 (2004)
DOI: 10.1051/0004-6361:20040301

A new method for calculating the velocity ellipsoid

R. L. Branham, Jr.

Instituto Argentino de Nivología y Glaciología (IANIGLA), C.C. 330, 5500 Mendoza, Argentina

Received 19 February 2004 / Accepted 23 March 2004

Abstract
The velocity ellipsoid has traditionally been calculated by use of moments of the velocity components after the distribution of parallaxes has been corrected for parallax error. An alternative, used in this study, fits a quadric surface to the velocity components and reduces the data by use of a new mathematical technique, semidefinite programming, and a criterion, difference of squares, that calculates a unique ellipsoid that represents a global minimum of the reduction criterion. Use is made of 246 stars of spectral class O-B5 and luminosity class V to calculate the velocity ellipsoid. The results may be considered superior to those found from the method of moments.

Key words: Galaxy: kinematics and dynamics - methods: data analysis

1 Introduction

The velocity ellipsoid remains essential for the study of stellar kinematics. Until recently the paucity of parallaxes assured that the ellipsoid was calculated mainly by use of radial velocities alone or proper motions alone. The publication of the Hipparcos catalog (ESA 1997), with over 100 000 stellar parallaxes, remedied this situation. One may now calculate the velocity ellipsoid from total space motions for a reasonable number of stars.

Having an adequate number of stars also implies that one should look for the best method to calculate the ellipsoid. Trumpler & Weaver's classic text, Statistical Astronomy (1962), outlines a method based on computing the moments of the space velocities. Ogorodnokov (1965) presents basically the same prescription. Most of the traditional values for the parameters of the velocity ellipsoid, such as those Delhaye (1965) tabulates, are based on the method of moments. Oblak (1983), however, prefers an approach based on maximum likelihood estimation. A relatively recent study (Dehnen & Binney 1998), and one of the first to make extensive use of the Hipparcos data, returns to the use of statistical moments. This paper presents yet another idea, but one with attractive features, in particular that the ellipsoid calculated is unique.

Because stellar parallaxes incorporate significant errors, the median parallax for the Hipparcos catalog, 4.790 milli-arcsec (mas), having a median error of 1.090 mas, the velocities also contain significant errors. Trumpler & Weaver recommend correcting the parallaxes for error by use of an assumed distribution for the error: the corrected parallaxes may then be used to calculate the moments.

To use a reduction technique that allows for errors such as parallax error and implements a more rigorous reduction model than statistical moments seems preferable, however. Such a reduction technique, known as total least squares (TLS) exists. See Branham (2001) for a discussion of the method. To fit a quadric surface to the calculated space velocities becomes the more rigorous reduction model. I have applied the method, and it works well, but unfortunately suffers from a severe drawback, so severe that it cannot be recommended for general use. To understand the more complicated technique actually adopted, called semidefinite programming and which, unfortunately, does not correct for parallax error, it is instructive to review the TLS approach.

2 The reduction model

Let $\dot{x},\dot{y},\dot{z}$ be the space velocities of a star. The quadric surface to fit to these velocities becomes

 \begin{displaymath}%
a\dot{x}^{2}+b\dot{y}^{2}+c\dot{z}^{2}+d\dot{x}\dot{y}+e\dot{x}\dot{z}+f\dot{%
y}\dot{z}+g\dot{x}+h\dot{y}+k\dot{z}+l=0,
\end{displaymath} (1)

where a,...,l are the ten coefficients that define the quadric that must be determined. The equation may be expressed in more symmetric notation as

\begin{eqnarray*}\left(
\begin{array}{lll}
\dot{x} & \dot{y} & \dot{z}
\end{arr...
...ay}{l}
\dot{x} \\
\dot{y} \\
\dot{z}
\end{array}\right) +l=0.
\end{eqnarray*}


Equation (1) is homogeneous. Techniques exist, based on the singular value decomposition, to solve homogeneous equations by the precepts of TLS (Gene Golub, personal communication). But it is possible to convert the equation to a non-homogeneous equation. Set a=1, which corresponds to dividing Eq. (1) by a and redefining the coefficients b/a... as b,..., and write the equation as

 \begin{displaymath}%
l+b\dot{y}^{2}+c\dot{z}^{2}+d\dot{x}\dot{y}+e\dot{x}\dot{z}+f\dot{y}\dot{z}+g%
\dot{x}+h\dot{y}+k\dot{z}=-\dot{x}^{2}.
\end{displaymath} (2)

In Eq. (2) the coefficients b,...,k are associated with variables that contain errors whereas the coefficient l associates with an error-free variable. The equation should, therefore, be solved by a modification of TLS, known as mixed total least squares-least squares (TLS-LS), that allows for some of the columns of the data matrix containing no error. Branham (2001) discusses how to calculate a TLS-LS solution, along with corresponding mean errors; see particularly his Eq. (16).

The procedure outlined so far can work well in practice; see Branham (2003). But a possible pitfall must be addressed. Granted, Eq. (2) will calculate the coefficients of a quadric surface, but how do we know that the surface will be an ellipsoid? Suppose the coefficients correspond to a paraboloid, hyperboloid, or some other surface? Given the observational error in the data, can the possibility of calculating a surface other than an ellipsoid be ruled out a priori? The answer is no, the possibility cannot be ruled out. But at the cost of substituting a nonlinear reduction procedure for the linear one in Branham (2001) one can enforce the condition that the surface be an ellipsoid. The condition that a quadric surface be an ellipsoid is, if we use the coefficients of Eq. (2), that the matrix

 \begin{displaymath}%
A=\left(
\begin{array}{lll}
a & d/2 & e/2 \\
d/2 & b & f/2 \\
e/2 & f/2 & c
\end{array}\right)
\end{displaymath} (3)

be positive-definite. Such will be the case if all of  $A^{\prime }s$eigenvalues are real and positive. The solution itself may be calculated by quadratic programming, a special case of nonlinear programming, with nonlinear constraints (Gass 1975). Because quadratic programming is iterative, a first solution could be the one computed by the linear algorithm in Branham (2001). The covariance matrix remains the same, although it should not be calculated until the solution has converged.

It therefore seems as it a TLS reduction can be retained by use of nonlinear programming. Unfortunately, such a procedure suffers, as I found out from experience, as have others, from such severe problems of divergence, slow convergence, or convergence to a local rather than global minimum that it becomes unsuitable for general application.

3 Semidefinite programming

What can be done, then? Use the coefficients of Eq. (1) and let

\begin{eqnarray*}v=(\dot{x}\;\dot{y}\;\dot{z})^{T};
\end{eqnarray*}


and

\begin{eqnarray*}c=(g\;h\;k)^{T}.
\end{eqnarray*}


Then Eq. (1) can be rewritten as

 
(v-c)TA(v-c)-q2, (4)

where

\begin{eqnarray*}q^{2}=ag^{2}+bh^{2}+ck^{2}+ekg+dhg+\allowbreak fkh.
\end{eqnarray*}


Define a goodness-of-fit criterion by

 \begin{displaymath}%
F=\sum_{i=1}^{m}\left[ (v_{i}-c)^{T}A(v_{i}-c)-q^{2}\right] ^{2},
\end{displaymath} (5)

where m represents the number of data points. Calafiore (2002) refers to this criterion as "difference of squares'' (DOS). It is not a least squares criterion, but offers the substantial advantage that with DOS a unique ellipsoid satisfies the data and represents a global minimum of the criterion. See his article for proofs of these assertions. Convergence, moreover, is guaranteed if the solution is calculated by means of a new technique, semidefinite programming (SDP), developed during the 1990's. Although new, SDP is really an extension of linear programming, developed near the end of the 1940's.

Semidefinite programming can be summarized by (see Vandenberghe & Boyd (1996) for a complete discussion):

 \begin{displaymath}%
C*X = {\rm min ~ (or }-C*X = {\rm max})
\end{displaymath} (6)

subject to

\begin{eqnarray*}A_{k}*X=b_{k},\;(k=1,\cdots ,n),
\end{eqnarray*}


where n stands for the number of conditions imposed on the minimizer, Eq. (6), and here coincides with the number of unknowns to be determined. C,X, and Ak are symmetric n $\times$ n matrices, the bk are scalars, X is semidefinite (eigenvalues real and nonnegative), and the symbol "*'' refers to $E*F=trace(E\cdot F).$

Calafiore (2002) shows how to convert the problem of fitting an ellipsoid to data points to an SDP problem. Let u be a vector of 1's of size m and $\gamma$ an m-vector of positive scalars. Define

\begin{eqnarray*}g_{i}(\Omega )=\left[
\begin{array}{cc}
A & \left(
\begin{arr...
... h/2 & k/2
\end{array}\right) & \gamma _{i}
\end{array}\right].
\end{eqnarray*}


Then the ellipsoid fitting problem becomes:

\begin{eqnarray*}u^{T}\gamma = {\rm min.}
\end{eqnarray*}


subject to

\begin{eqnarray*}\left(
\begin{array}{ll}
1 & g_{i}(\Omega ) \\
g_{i}(\Omega ) & \gamma _{i}
\end{array}\right) \succ 0,, \;i=1,\cdots ,m;
\end{eqnarray*}



\begin{eqnarray*}A\succ 0;
\end{eqnarray*}


trace(A)=1.

The second condition states that the matrix A must be positive-definite and thus ensures that the quadric surface fitted is indeed an ellipsoid. The third condition prevents the algorithm from calculating the trivial solution $a=b=\cdots =l=0.$ These conditions are sufficient to determine a unique ellipsoid, but SDP is flexible enough to allow more conditions. One could, for example, determine the most spherical ellipsoid that satisfies the data and thus minimize the ratios between the velocity dispersions.

Because the DOS criterion is not a least squares criterion, much less a TLS criterion, parallax errors will not be corrected, as was done automatically with the algorithm presented in Branham (2003). One must, therefore, correct the parallax errors. I performed this operation using Smith & Eichhorn's (1996) procedure, which seems superior to the method that Trumpler & Weaver use: if  $F_{\rm o}(\xi )$ is the observed distribution of parallax error, the real distribution can be calculated from

\begin{eqnarray*}F_{t}(\xi )=F_{\rm o}(\varepsilon )-\frac{\sigma ^{2}}{2}\frac{{\rm d}^{2}F_{\rm o}(\xi )}{{\rm d}\varepsilon ^{2}},
\end{eqnarray*}


where $\sigma $ stands for the standard deviation of the parallax error. Smith & Eichhorn's procedure is superior because it corrects each individual parallax based on the estimated error for that parallax, information the Hipparcos catalog (ESA 1997), from which the parallaxes used in this research were taken, supplies.

4 The solar velocity and components of the velocity ellipsoid

After the coefficients of Eq. (4) have been found, the components of the 3-vector c give the solar velocity. Alternatively, Eq. (4) can be converted into Eq. (1) and the solar velocity calculated after the linear terms g,h,k are, by a translation of the axes, removed:

 \begin{displaymath}%
\left(
\begin{array}{l}
\dot{x}^{\prime } \\
\dot{y}^{\pr...
...-l_{1} \\
\dot{y}-l_{2} \\
\dot{z}-l_{3}
\end{array}\right),
\end{displaymath} (7)

where l1,l2,l3 are defined to eliminate g,h,k and $(\begin{array}{lll}
c_{1} & c_{2} & c_{3}
\end{array}) $ = $(\begin{array}{lll}
l_{1} & l_{2} & l_{3}\end{array}).$ This is done by solution of the 3 $\times$ 3 system

 \begin{displaymath}%
\left(
\begin{array}{lll}
2 & d & e \\
d & 2b & f \\
e &...
...=\left(
\begin{array}{l}
-g \\
-h \\
-k
\end{array}\right).
\end{displaymath} (8)

The $l^{\prime }s$ represent the components of the solar velocity S0,found from

 \begin{displaymath}%
S_{0}=\sqrt{l_{1}^{2}+l_{2}^{2}+l_{3}^{2}}.
\end{displaymath} (9)

The translation defined by Eq. (4) leaves the coefficients a,...,funchanged, but l becomes $l^{\prime
}=l-al_{1}^{2}-bl_{2}^{2}-cl_{3}^{2}-dl_{1}l_{2}-el_{1}l_{3}-fl_{2}l_{3}.$The components of the velocity ellipsoid may be found after an eigenvalue-eigenvector decomposition of the matrix

\begin{eqnarray*}\left(
\begin{array}{llll}
1 & d/2 & e/2 & 0 \\
d/2 & b & f/2...
...2 & f/2 & c & 0 \\
0 & 0 & 0 & l^{\prime }
\end{array}\right).
\end{eqnarray*}


Let $\lambda _{1},\lambda _{2},\lambda _{3},\lambda _{4}$ be the four eigenvalues. Then the components of the velocity ellipsoid are found from

 \begin{displaymath}%
\left(
\begin{array}{l}
\sigma _{x} \\
\sigma _{y} \\
\s...
..._{2}} \\
\sqrt{\lambda _{4}/\lambda _{3}}
\end{array}\right).
\end{displaymath} (10)

The eigenvectors represent the orientation of the velocity ellipsoid.

The preceding analysis first translates the axes to remove the linear terms and then rotates the new axes by an orthogonal matrix to diagonalize the matrix of second order terms. One could just as easily first rotate and then translate, after the coefficients g,h,k have also been rotated. The results will be the same except that the components of the solar velocity will be referred to the new axes rather than the original axes. I prefer to translate and then rotate.

5 The covariance matrix and mean errors

Mean errors remain an essential part of data adjustment. Their calculation depends on an n $\times$ n covariance matrix, COV, derived from Eq. (5),

 \begin{displaymath}%
COV=\sum_{i=1}^{m}\left(
\begin{array}{c}
\partial F/\part...
...partial b & \cdots & \partial F/\partial
l
\end{array}\right).
\end{displaymath} (11)

The product of a column and row vector represents the matrix or outer product, of dimension n $\times$ n and rank 1, of two vectors. To derive mean errors for quantities such as the solar velocity and the velocity dispersions, one may employ Rice's procedure (1902). To express Rice's procedure in modern notation identify the errors in a quantity such as the coefficient a with the differential of the quantity, da. The error in a quantity such as the solar velocity can be found from

 \begin{displaymath}%
({\rm d}S_{0})^{2}=\Delta \left(
\begin{array}{lll}
\parti...
...a \\
\vdots \\
\partial S_{0}/\partial k
\end{array}\right),
\end{displaymath} (12)

where $\Delta $ is the dispersion of the residuals as measured by the DOS criterion, see Eq. (14).

6 The observations and solutions

Research has shown that the velocity ellipsoid depends on the spectrum-luminosity class of the stars. I decided to investigate the ellipsoid for dwarf stars (luminosity class V) of spectral classes O-B5 after the stars belonging to the Gould belt had been eliminated. Why use these particular stars rather than some others? Because I had already completed a study of the kinematics of the non-Gould belt O-B5 stars (Branham 2002), and to calculate a velocity ellipsoid represents a useful extension of the work. The data, moreover, were available and, because these stars exhibit a noticeable vertex deviation, it would be important to see if the SDP approach finds a deviation. Parallaxes and proper motions were taken from the Hipparcos catalog and radial velocities from the Wilson (Nagy 1991) and Barbier-Brossat et al. (1994) catalogs. This yielded 246 stars. Because Galactic longitude l and latitude b are more natural coordinates for studies of Galactic kinematics than right ascension and declination, the proper motion form the Hipparcos catalog was converted to proper motion in the Galactic system.

The treatment of the data involved the following steps. Smith & Eichhorn's Eq. (21) corrected the parallaxes for parallax error on a star by star basis using the parallax with its associated mean error taken from the Hipparcos catalog. Proper motions and radial velocities were corrected for Galactic rotation. Let $\mu _{l}$ be the proper motion in Galactic longitude, $\mu _{b}$ the proper motion in Galactic latitude, $\dot{r}$ the radial velocity, and A and B the Oort constants. Then the corrected proper motions, $\mu _{l}^{\prime }$ and  $\mu _{b}^{\prime },$ and radial velocity, $\dot{r}^{\prime }$ are:

 \begin{displaymath}%
\begin{array}{l}
\kappa \mu _{l}^{\prime }\cos b=\kappa \mu...
...\dot{r}^{\prime }=\dot{r}-\pi A\sin 2l\cos ^{2}b-K.
\end{array}\end{displaymath} (13)

In these equations $\kappa $ is a conversion constant with value 4.74047 km s-1 yr, $\pi $ is a star's parallax, and K is the K term, significant for O-B stars. I used Kerr & Lyndon-Bell's (1986) values for the Oort constants, A= 14 km s-1 kpc-1, B=-12 in the same units, and for the K term took K=4.7 km s-1, a value that appears consistent with various determinations. Branham (2002) had already determined whether an O-B5 star belonged to the Gould belt or not. No Gould belt stars were used.

Table 1: Solution for coefficients of velocity ellipsoid from SDP.

Because the DOS criterion is sensitive to outliers, it is important to eliminate discordant data from the sample before proceeding. To select outliers I used a filter that excluded space velocities that exceeded five times the mean absolute deviation of the velocities. I have used this criterion before. It seems to work well and, because few OB stars are high velocity, there is little chance that many genuine high velocity stars will erroneously be considered outliers and rejected. The criterion rejected no stars. A different criterion, such as Pierce's (Branham 1990) would have rejected thirteen stars, and changed significantly the dispersions and mean errors in Table 1, but Pierce's criterion is based on a normal distribution for residuals. As we shall soon see a normal distribution of error hardly characterizes the stars used in this study. Figure 1 graphs the velocities of these stars. The elongated shape of the velocities, suggestive of an ellipsoidal distribution, is clearly visible. Figure 2 shows a projection onto the $\dot{x}-\dot{y}$ plane, where a vertex deviation becomes manifest.


  \begin{figure}
\par\includegraphics[width=8.8cm,clip]{0301fig1.eps}\end{figure} Figure 1: Velocity distribution of 246 O-B5 stars.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=8.8cm,clip]{0301fig2.eps}\end{figure} Figure 2: Velocity distribution of 246 O-B5 stars in $\dot{x}\;\dot{-}\;\dot{y}$ plane.
Open with DEXTER

Table 2: Velocity ellipsoid from marginal moments.

From these data I calculated from SDP the solution shown in Table 1. For comparison I also calculated a solution using the procedure, based on marginal moments, outlined in Trumpler & Weaver (1962). This solution, given in Table 2, differs significantly from the SDP solution. The significance of the difference will be discussed in a moment. (No mean errors accompany this solution because Trumpler & Weaver provide no prescription for their calculation.) As a check on the criterion used to eliminate outliers, even though in fact no outliers were eliminated, I performed a solution using the robust L1 criterion. Replace Eq. (5) by

 \begin{displaymath}%
F=\sum_{i=1}^{m}\left\vert (v_{i}-c)^{T}A(v_{i}-c)-q^{2}\right\vert.
\end{displaymath} (14)

Because of the robustness of the criterion, no stars need be eliminated; all 246 were included in the solution (not shown), which differed little from the solution of Table 1. Figure 3 graphs the ellipsoid along with the data, and Fig. 4 projects the ellipsoid onto the $\dot{x}-\dot{y}$ plane. The vertex deviation can be clearly seen. Figures 5 and 6 show the ellipsoid as seen in the $\dot{x}-\dot{z}$ and $\dot{y}-\dot{z}$ planes.


  \begin{figure}
\par\includegraphics[width=8.8cm,clip]{0301fig3.eps}\end{figure} Figure 3: Velocity ellipsoid.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=8.8cm,clip]{0301fig4.eps}\end{figure} Figure 4: Velocity ellipsoid in $\dot{x}-\dot{y}$ plane.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=8.8cm,clip]{0301fig5.eps}\end{figure} Figure 5: Velocity ellipsoid in $\dot{x}-\dot{z}$ plane.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=8.8cm,clip]{0301fig6.eps}\end{figure} Figure 6: Velocity ellipsoid in $\dot{x}-\dot{z}$ plane.
Open with DEXTER

7 Discussion

We have calculated a solution by use of SDP. Why take this more complicated route when Trumpler & Weaver's procedure with statistical moments also calculates the coefficients of an ellipsoid, albeit without mean errors, with less labor? Trumpler & Weaver justify their procedure by reminding us that statistical moments coincide with what maximum likelihood estimation gives when the residuals are randomly selected from a normal distribution. But do the residuals follow a normal distribution and are they random? First we must specify what we mean by a residual. For the SDP solution the definition of a residual comes from the DOS criterion,

 
ri=(vi-c)TA(vi-c)-q2, (15)

where the residual has dimensions of km2 s-2. For the Trumpler-Weaver solution, whose justification relies on maximum likelihood estimation applied to a normal distribution and least squares, it is more logical to take as the definition

 \begin{displaymath}%
r_{i}=\sqrt{(v_{i}-c)^{T}A(v_{i}-c)}-q,
\end{displaymath} (16)

Table 3: Statistics of the residuals.

where the residual now has dimensions of km s-1. Table 3 and Figs. 7 and 8 give a negative answer to whether the distributions are normal. Figure 7 comes from the SDP solution and Fig. 8 from the Trumpler & Weaver solution. A simple glance shows that the residuals are far from normal. Table 3 quantifies this first impression. The runs test measures the randomness of the residuals, essential if one asserts that a least squares solution is optimum. The difference between the expected 123 runs versus the observed 82 runs for SDP and 106 for Trumpler-Weaver demonstrates that the residuals are in fact not random. The nonzero coefficients of skewness confirm the asymmetry of the residuals. The kurtosis indicates that the SDP solution is leptokurtic, the Trumpler-Weaver solution platykurtic. And the Q factor, measuring the weight of the tails of a distribution, shows that the tails of the actual distributions of residuals are far lighter than those for a normal distribution. (Because different definitions are used for a residual, cols. two and three of Table 3 are not directly comparable. Suffice it to say that if we use the DOS definition of a residual with the Trumpler-Weaver solution, the dispersion of the residuals becomes 1871 km2 s-2, there are 132 runs with a skewness of 1.49, kurtosis of 5.52, and a Q factor of 0.37.)


  \begin{figure}
\par\includegraphics[width=8.8cm,clip]{0301fig7.eps}\end{figure} Figure 7: Histogram of residuals from SDP solution.
Open with DEXTER


  \begin{figure}
\par\includegraphics[width=8.8cm,clip]{0301fig8.eps}\end{figure} Figure 8: Histogram of residuals from Trumpler-Weaver solution.
Open with DEXTER

This behavior of the residuals, however, hardly surprises. The ellipsoidal hypothesis of the distribution of velocities, although explaining the distribution's main features, still only approximates the real, and complicated, stellar velocity structure in the solar vicinity. But the behavior also undercuts the theoretical justification for using statistical moments. We have no reason to assert that statistical moments calculate a solution superior to that given by SDP. When one remembers that the SDP solution with the DOS criterion also calculates a unique ellipsoid that represents a global minimum of Eq. (5), then one can maintain that the SDP solution becomes superior.

One may dispute such an assertion by claiming that some of the parameters presented in Table 1 differ considerably from values tabulated elsewhere, in Delhaye (1965), for example. The solar motion and vertex deviation more or less agree, but the velocity dispersions are far higher. But the traditional determinations rely for the most part on statistical moments, and concordance is hardly to be expected. They also differ from Dehnen & Binney's (1998) determinations. Dehnen & Binney do not explicitly find the velocity ellipsoid for O-B5 stars, but divide their data into nine color bins and calculate the ellipsoid, but not the solar motion, for each bin. The bin with (B-V)=-0.238 comes closest to my data. The velocity dispersions, $\sigma _{z}=14.35$ km s-1, $\sigma _{y}=26.81$ km s-1, $\sigma _{x}=37.74$ km s-1, do not differ too much from those of Table 1, particularly if the mean errors are taken into account; their $\sigma _{x},$ for example, varies from 33.72 to 51.22 at the 84.3% confidence level. But surprisingly they find a positive rather than a negative vertex deviation. Once again, however, because their analysis relies on statistical moments, concordance with my results becomes unlikely.

8 Conclusions

Semidefinite programming works well when used to calculate the velocity ellipsoid and by calculating a unique ellipsoid that represents a global minimum of the reduction criterion seems superior to the method of moments. This becomes manifest when the theoretical justification for our using statistical moments, maximum likelihood estimation applied to a normal distribution, is untenable. The flexibility of semidefinite programming allows one, should one so desire, to incorporate futher criteria, such as calculating the most spherical ellipsoid that satisfies the data.

Acknowledgements
I would like to thank G. Calafiore for sending me Matlab routines for ellipsoid fitting to data points.

References



Copyright ESO 2004