Normalized to: Humphries, M.
[1]
oai:arXiv.org:1603.06395 [pdf] - 1396958
Comparison of three Statistical Classification Techniques for Maser
Identification
Submitted: 2016-03-21
We applied three statistical classification techniques - linear discriminant
analysis (LDA), logistic regression and random forests - to three astronomical
datasets associated with searches for interstellar masers. We compared the
performance of these methods in identifying whether specific mid-infrared or
millimetre continuum sources are likely to have associated interstellar masers.
We also discuss the ease, or otherwise, with which the results of each
classification technique can be interpreted. Non-parametric methods have the
potential to make accurate predictions when there are complex relationships
between critical parameters. We found that for the small datasets the
parametric methods, logistic regression and LDA, performed best, whereas for
the largest dataset the non-parametric random forests method achieved accuracy
comparable to that of the parametric techniques, with no significant improvement.
This suggests that, at least for the specific examples investigated here, the
accuracy of the predictions obtained is not limited by the use of
parametric models. We also found that, for LDA, transforming the input
parameters to match a normal distribution led to large improvements in
accuracy. The different classification techniques had significant overlap in
their predictions; further astronomical observations will enable the accuracy
of these predictions to be tested.
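
The remark about transforming inputs toward a normal distribution before
applying LDA can be illustrated with a minimal sketch. Everything in it is an
assumption made for illustration: the data are synthetic log-normal draws
rather than any of the maser datasets, the log transform is simply one common
choice (the abstract does not say which transformation was used), and the
classifier is the one-dimensional special case of LDA with equal class priors
and a shared variance, where the decision boundary is the midpoint of the two
class means. The helper names are hypothetical.

```python
import math
import random
from statistics import fmean


def fit_lda_1d(class0, class1):
    """1-D LDA with equal priors and shared variance.

    Under those assumptions the decision boundary is the midpoint
    of the two class means.
    """
    return 0.5 * (fmean(class0) + fmean(class1))


def accuracy(threshold, class0, class1):
    """Fraction classified correctly, assuming class1 has the larger mean."""
    correct = sum(x < threshold for x in class0)
    correct += sum(x >= threshold for x in class1)
    return correct / (len(class0) + len(class1))


def sample(rng, mu, sigma, n):
    """Draw n strictly positive, right-skewed (log-normal) values."""
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def log_all(values):
    """Log transform; the logs of log-normal values are normally distributed."""
    return [math.log(v) for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Synthetic stand-ins for one input parameter in two classes
# (illustrative parameters, not taken from the paper).
train0, train1 = sample(rng, 0.0, 1.5, n), sample(rng, 2.0, 1.5, n)
test0, test1 = sample(rng, 0.0, 1.5, n), sample(rng, 2.0, 1.5, n)

# LDA fitted on the raw, skewed values.
acc_raw = accuracy(fit_lda_1d(train0, train1), test0, test1)

# LDA fitted after transforming the input to be normally distributed.
acc_log = accuracy(
    fit_lda_1d(log_all(train0), log_all(train1)),
    log_all(test0),
    log_all(test1),
)

print(f"accuracy on raw input:        {acc_raw:.3f}")
print(f"accuracy on log-transformed:  {acc_log:.3f}")
```

In this toy setting the transformed input satisfies the normality assumption
behind LDA, so the midpoint boundary is close to optimal, whereas on the raw
values the heavy tail pulls the class means, and hence the boundary, away from
the bulk of the data. Whether a similar effect explains the improvement
reported above would have to be checked against the actual datasets.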