DBNN is based on a powerful and intuitive procedure originally developed and used by mathematicians (Laplace 1812) for the sensible classification of objects. Named after its inventor Bayes (1763), Bayes' theorem is, according to Laplace, the mathematical expression of common sense. It computes the conditional probability of an event, given that another event that could lead to it has occurred (for an introduction to Bayes' theorem see Loredo 1990). In complex problems, computing the Bayesian probability becomes laborious. However, a variant known as the naive Bayesian classifier (Elkan 1997) computes the Bayesian probability with reasonable accuracy under the assumption that, given the class, the attributes (the individual parameter values) are independent. In practice this independence assumption is frequently invalid, and the performance of the classifier degrades when the attributes are correlated.
The Difference Boosting Neural Network (DBNN; Philip & Joseph 2001) is a Bayesian classifier that is computationally less intensive than its peers and is closely related to the naive Bayesian classifier. DBNN, however, does not strictly impose independence of the attributes as a basic criterion and allows some correlation between them. It does this by associating a threshold window with each of the attribute values of the sample. The threshold window demands that all the attribute values lie in the range specified by the training set for each class of the sample. When any attribute value falls outside the range specified by the threshold function, the confidence in the classification is penalized by a certain factor.
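The threshold-window penalty can be illustrated with a minimal sketch. This is not the published DBNN implementation; the helper names and the value of the `penalty` factor are assumptions made purely for illustration:

```python
def train(samples):
    """Record class priors and per-class attribute ranges.
    samples: list of (attribute_tuple, class_label) pairs.
    (Hypothetical helper, for illustration only.)"""
    ranges, counts = {}, {}
    for attrs, cls in samples:
        counts[cls] = counts.get(cls, 0) + 1
        lo_hi = ranges.setdefault(cls, [(a, a) for a in attrs])
        ranges[cls] = [(min(lo, a), max(hi, a))
                       for (lo, hi), a in zip(lo_hi, attrs)]
    total = sum(counts.values())
    priors = {c: n / total for c, n in counts.items()}
    return ranges, priors

def score(attrs, cls, ranges, priors, penalty=0.5):
    """Class prior, multiplied by a penalty factor for every attribute
    that falls outside the class's training range (the "threshold
    window"). The penalty value 0.5 is an arbitrary assumption."""
    p = priors[cls]
    for (lo, hi), a in zip(ranges[cls], attrs):
        if not (lo <= a <= hi):
            p *= penalty
    return p
```

A sample whose attributes all fall inside the training range of a class keeps the full confidence for that class, while each out-of-range attribute multiplies the confidence down.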
A popular example manifesting correlation between input parameters is the XOR gate. A typical XOR gate has two digital inputs and one digital output, where a digital state has only two possible values, high or low. The XOR gate outputs high only when the two inputs are dissimilar, as shown in Table 1. Although the conditions appear simple, this is a case where conditional independence is violated: knowing the value of only one of the inputs gives no information about the class of the object, and the actual class can be assigned only when both inputs are known together. Since the naive Bayesian classifier treats each input independently of the other, it cannot produce a confidence level better than 50% (both alternatives equally likely) on such data. DBNN, however, takes the values of the other parameters into account through the window function and is therefore able to represent the output states accurately. We then say that the network has learned the XOR problem.
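This failure mode can be checked directly with a small sketch (illustrative only, not part of the original analysis): a naive Bayes estimate built from the four-row XOR truth table assigns exactly 50% confidence to every input pair, because every single-input conditional probability equals 1/2.

```python
from collections import Counter, defaultdict
from itertools import product

# XOR truth table: the output is 1 exactly when the two inputs differ.
data = [((a, b), a ^ b) for a, b in product((0, 1), repeat=2)]

# Naive Bayes estimates from the table: class priors and the
# per-attribute conditionals P(x_i | class).
priors = Counter(y for _, y in data)
cond = defaultdict(Counter)
for (x1, x2), y in data:
    cond[y][(0, x1)] += 1
    cond[y][(1, x2)] += 1

def posterior(x, y):
    """Unnormalized naive Bayes score: P(y) * prod_i P(x_i | y)."""
    p = priors[y] / len(data)
    for i, v in enumerate(x):
        p *= cond[y][(i, v)] / priors[y]
    return p

# Every conditional equals 1/2, so both classes always tie at 50%.
for x, _ in data:
    s0, s1 = posterior(x, 0), posterior(x, 1)
    print(x, s0 / (s0 + s1))  # prints 0.5 for all four input pairs
```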
Other advantages of the DBNN algorithm are its explicit dependence on probability estimates, its ability to attach a confidence value to each prediction, and its greater training speed. For the particular application to star–galaxy classification, DBNN gives good results with fewer input parameters than SExtractor.
Philip & Joseph (2001) compared results from DBNN with the results obtained by Schiffmann et al. (1994) on sixteen other network models. While Schiffmann et al. (1994) report an average training time of 12 hours on their dataset, DBNN took only about 10 min to train on the same data. While their best result across these models produced an accuracy of 98.48% on independent test data, DBNN gave an accuracy of 98.60%.
One of the motivations for the DBNN classifier is that the human brain looks for differences rather than details when faced with situations that require distinguishing between almost identical objects. While the full Bayesian method is very elaborate and takes every possibility into consideration, the naive Bayesian classifier ignores all possible correlations between attribute values; DBNN is an attempt to have the best of both worlds by highlighting only the differences.
Copyright ESO 2002