Normalized to: Ebner, J.
[1]
oai:arXiv.org:1711.06125 [pdf] - 1591327
Galaxy And Mass Assembly: Automatic Morphological Classification of
Galaxies Using Statistical Learning
Sreejith, Sreevarsha;
Pereverzyev, Sergiy;
Kelvin, Lee S.;
Marleau, Francine;
Haltmeier, Markus;
Ebner, Judith;
Bland-Hawthorn, Joss;
Driver, Simon P.;
Graham, Alister W.;
Holwerda, Benne W.;
Hopkins, A. M.;
Liske, J.;
Loveday, Jon;
Moffett, Amanda J.;
Pimbblet, K. A.;
Taylor, Edward N.;
Wang, Lingyu;
Wright, Angus H.
Submitted: 2017-11-16, last modified: 2017-11-17
We apply four statistical learning methods to a sample of $7941$ galaxies
($z<0.06$) from the Galaxy and Mass Assembly (GAMA) survey to test the
feasibility of using automated algorithms to classify galaxies. Using $10$
features measured for each galaxy (sizes, colours, shape parameters \& stellar
mass) we apply the techniques of Support Vector Machines (SVM), Classification
Trees (CT), Classification Trees with Random Forest (CTRF) and Neural Networks
(NN), returning True Prediction Ratios (TPRs) of $75.8\%$, $69.0\%$, $76.2\%$
and $76.0\%$, respectively. Cases in which all four algorithms agree
with each other yet disagree with the visual classification (`unanimous
disagreement') serve as a potential indicator of human error in
classification, occurring in $\sim9\%$ of ellipticals, $\sim9\%$ of Little Blue
Spheroids, $\sim14\%$ of early-type spirals, $\sim21\%$ of intermediate-type
spirals and $\sim4\%$ of late-type spirals \& irregulars. We observe that the
choice of parameters matters more than the choice of algorithm in
determining classification accuracy. Due to its simplicity in formulation and
implementation, we recommend the CTRF algorithm for classifying future galaxy
datasets. Adopting the CTRF algorithm, the TPRs of the five galaxy types are: E,
$70.1\%$; LBS, $75.6\%$; S0-Sa, $63.6\%$; Sab-Scd, $56.4\%$ and Sd-Irr,
$88.9\%$. Further, we train a binary classifier using this CTRF algorithm that
divides galaxies into spheroid-dominated (E, LBS \& S0-Sa) and disk-dominated
(Sab-Scd \& Sd-Irr), achieving an overall accuracy of $89.8\%$. This translates
into an accuracy of $84.9\%$ for spheroid-dominated systems and $92.5\%$ for
disk-dominated systems.
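
The quantities quoted in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: that a True Prediction Ratio is the fraction of galaxies of a given visual class that a classifier assigns to that same class, and that the overall binary accuracy is the sample-weighted mean of the two per-group accuracies. The function names and the toy labels are hypothetical and are not taken from the GAMA data or the authors' code.

```python
from collections import Counter


def per_class_tpr(visual, predicted):
    """Fraction of each visual class recovered by the classifier.

    Assumed definition of the True Prediction Ratio (TPR); the abstract
    does not spell it out.
    """
    totals = Counter(visual)
    hits = Counter(v for v, p in zip(visual, predicted) if v == p)
    return {label: hits[label] / totals[label] for label in totals}


def unanimous_disagreement(visual, predictions_by_method):
    """Indices where all methods agree with each other but not with the visual label."""
    flagged = []
    for i, v in enumerate(visual):
        labels = {preds[i] for preds in predictions_by_method.values()}
        if len(labels) == 1 and v not in labels:
            flagged.append(i)
    return flagged


def implied_spheroid_fraction(overall, acc_spheroid, acc_disk):
    """Solve overall = f * acc_spheroid + (1 - f) * acc_disk for f.

    This is an inference under the weighted-mean assumption, not a number
    reported in the abstract.
    """
    return (acc_disk - overall) / (acc_disk - acc_spheroid)


# Toy labels, invented for illustration only.
visual = ["E", "E", "LBS", "S0-Sa", "Sd-Irr", "Sd-Irr"]
predictions = {
    "SVM":  ["E", "S0-Sa", "LBS", "E",     "Sd-Irr", "Sd-Irr"],
    "CT":   ["E", "S0-Sa", "LBS", "S0-Sa", "Sd-Irr", "Sab-Scd"],
    "CTRF": ["E", "S0-Sa", "LBS", "S0-Sa", "Sd-Irr", "Sd-Irr"],
    "NN":   ["E", "S0-Sa", "E",   "S0-Sa", "Sd-Irr", "Sd-Irr"],
}

# Only galaxy 1 is labelled identically by all four methods while
# differing from its visual class.
flagged = unanimous_disagreement(visual, predictions)  # -> [1]
ctrf_tpr = per_class_tpr(visual, predictions["CTRF"])

# Binary accuracies quoted in the abstract, in percent.
fraction = implied_spheroid_fraction(89.8, 84.9, 92.5)
```

Under the weighted-mean assumption, the quoted binary accuracies would correspond to roughly a third of the test galaxies being spheroid-dominated; this is a consistency check on the quoted numbers, not a result stated by the authors.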