Next: PISA2ARD
Up: Using the PISA parameters PISAPEAK, PISAKNN, PISA2CAT & PISA2ARD
Previous: Transforming PISA parameters to intensity invariant form

Distribution free classification

PISAKNN uses the results of PISAPEAK to separate objects into two classes by KNN (k nearest neighbours) distribution-free multivariate discrimination. The classes are seeded by supplying two files, each containing the indices of objects typical of the class in question ($>5$ objects per class, in approximately equal numbers). Classes then propagate from classified to unclassified objects: each unclassified object takes whichever class is most common among its 2*k nearest neighbours in the parameter space of the PISAPEAK results. This procedure is iterated until all objects are assigned and have a stable class, or until a maximum number of iterations is exceeded. The results of the discrimination are written to two output files, one for each class.
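The propagation described above can be illustrated with a minimal Python sketch. It is not the PISAKNN source: the function name `knn_propagate`, the Euclidean distance, the rule that seeds keep their class, and the rule that unclassified neighbours simply abstain from the vote are all assumptions made for illustration.

```python
import math

def knn_propagate(points, seeds_a, seeds_b, k=1, max_iter=50):
    """Seeded two-class KNN propagation (illustrative sketch only).

    points  : list of parameter tuples, one per object
    seeds_a : indices of objects typical of class "A"
    seeds_b : indices of objects typical of class "B"
    Returns a list of labels, "A", "B" or None (still unassigned).
    """
    labels = [None] * len(points)
    for i in seeds_a:
        labels[i] = "A"
    for i in seeds_b:
        labels[i] = "B"
    seeds = set(seeds_a) | set(seeds_b)

    for _ in range(max_iter):
        updated = list(labels)
        for i, p in enumerate(points):
            if i in seeds:
                continue  # assumption: seed objects never change class
            # The 2*k nearest neighbours of object i in parameter space.
            others = [j for j in range(len(points)) if j != i]
            others.sort(key=lambda j: math.dist(p, points[j]))
            nearest = others[: 2 * k]
            # Assumption: only already-classified neighbours vote.
            votes_a = sum(labels[j] == "A" for j in nearest)
            votes_b = sum(labels[j] == "B" for j in nearest)
            if votes_a > votes_b:
                updated[i] = "A"
            elif votes_b > votes_a:
                updated[i] = "B"
            # On a tie the object keeps its current label.
        stable = updated == labels
        labels = updated
        if stable and None not in labels:
            break  # every object assigned and no class changed
    return labels

# Two well-separated groups in a two-parameter space (made-up values).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.1), (0.3, 0.2),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 4.8), (4.7, 4.9)]
labels = knn_propagate(points, seeds_a=[0, 1], seeds_b=[5, 6], k=1)
```

In a real run the two seed lists would come from the two seed files and the two output files would list the indices carrying each label. The seeds here are deliberately fewer than the recommended number to keep the example short.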

KNN relies on good seed statistics, as propagation is essentially linear (but remember that this is in a multivariate sense). If the seed objects do not reasonably span the whole of the object parameter space, improper incursion can occur, leading to misclassification. The classification in the boundary areas between the classes depends on the nearest neighbour count: larger values help in investigating the `fuzzy' areas, whereas a very small value acts almost as a threshold (but on all the parameters, not just one).

KNN has the advantage over classical discriminant analysis that it does not rely on the classes of objects having multinormal distributions. The assumption of normality requires the objects contributing to each class to have a random spread across a particular part of parameter space. It is unlikely that this requirement can be met for small galaxy populations, although the assumption may be reasonable for large statistical samples.

The ellipticity is included in the analysis, but this may not always help selection. If some smallish round galaxies are present, using this variable will increase the likelihood of them being selected as stars; in this case it may be profitable to switch off the ellipticity. Ellipticity can be used for other purposes. For example, if you want a complete sample of stars, note that all stars will have ellipticities below a given threshold and can be selected on that basis. The list can then be refined by thresholding in peakedness to remove objects with large wings.
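The two-stage selection just described might be sketched as follows. The record layout, the function name `select_stars` and the threshold values are hypothetical, and the sketch assumes that objects with large wings show up as low peakedness values; check the sense of the peakedness parameter for your own data before adopting such a cut.

```python
def select_stars(objects, max_ellipticity, min_peakedness):
    """Select round objects, then drop those with large wings.

    objects is a list of dicts with (hypothetical) keys
    "index", "ellipticity" and "peakedness".
    """
    # Stage 1: keep everything rounder than the ellipticity threshold.
    round_objects = [o for o in objects if o["ellipticity"] < max_ellipticity]
    # Stage 2: assumed sense of the cut, low peakedness = extended wings.
    return [o for o in round_objects if o["peakedness"] >= min_peakedness]

# Made-up catalogue entries for illustration only.
catalogue = [
    {"index": 1, "ellipticity": 0.05, "peakedness": 0.90},  # round, peaked
    {"index": 2, "ellipticity": 0.40, "peakedness": 0.85},  # elongated
    {"index": 3, "ellipticity": 0.08, "peakedness": 0.30},  # round, winged
]
stars = select_stars(catalogue, max_ellipticity=0.2, min_peakedness=0.5)
```

Applying the ellipticity cut first keeps the sample complete for stars, and the peakedness cut then removes the round but extended objects that passed it.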



PISA
Position Intensity and Shape Analysis
Starlink User Note 109
Peter W. Draper and Nicholas Eaton
23 October 2002
E-mail: ussc@star.rl.ac.uk

Copyright © 2010 Science and Technology Facilities Council