Imbalanced learning for RR Lyrae stars
Submitted: 2018-01-07
We apply machine learning and Convex-Hull algorithms to separate RR Lyrae
stars from other stars, such as main-sequence stars, white dwarf stars, carbon
stars, cataclysmic variables (CVs) and carbon-lines stars, using data from the
Sloan Digital Sky Survey (SDSS) and the Galaxy Evolution Explorer (GALEX). In
the low-dimensional space, the Convex-Hull algorithm is applied to select RR
Lyrae stars. Given different
input patterns of (u-g, g-r), (g-r, r-i), (r-i, i-z), (u-g, g-r, r-i), (g-r,
r-i, i-z), (u-g, g-r, i-z) and (u-g, r-i, i-z), different convex hulls can be
built for RR Lyrae stars. Comparing the performance of these input patterns
shows that (u-g, g-r, i-z) is the best. For this input pattern, the
efficiency (the fraction of true RR Lyrae stars in the predicted RR Lyrae
sample) is 4.2% at a completeness (the fraction of recovered RR Lyrae stars
in the whole RR Lyrae sample) of 100%; removing some outliers raises the
efficiency to 9.9% at 97% completeness and to 16.1% at 53% completeness. In the
high-dimensional space, machine learning algorithms are used with input
patterns (u-g, g-r, r-i, i-z), (u-g, g-r, r-i, i-z, r), (NUV-u, u-g, g-r, r-i,
i-z) and (NUV-u, u-g, g-r, r-i, i-z, r). RR Lyrae stars, the class of
interest in this paper, are rare compared with other stars. For such highly
imbalanced data, we use a cost-sensitive Support Vector Machine (SVM), a
cost-sensitive Random Forest and Fast Boxes. The results show that
information from GALEX helps identify RR Lyrae stars and that Fast Boxes
performs best on the skewed data in our case.
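
The convex-hull selection and the two metrics defined above can be
illustrated with a minimal sketch. It is not the implementation used in the
paper: it assumes a two-dimensional colour space only, swaps in Andrew's
monotone-chain algorithm as one standard way to build a hull, and uses
made-up colour values rather than SDSS data. All function and variable names
(convex_hull, in_hull, efficiency_completeness, rr, other) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # one star in a 2-D colour plane, e.g. (u-g, g-r)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(hull: Sequence[Point], p: Point) -> bool:
    """True if p lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        raise ValueError("hull needs at least three non-collinear vertices")
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def efficiency_completeness(
    colors: Sequence[Point], is_rrlyrae: Sequence[bool], hull: Sequence[Point]
) -> Tuple[float, float]:
    """Efficiency = true RR Lyrae / selected; completeness = true RR Lyrae / all RR Lyrae."""
    selected = [in_hull(hull, c) for c in colors]
    true_pos = sum(s and t for s, t in zip(selected, is_rrlyrae))
    n_selected = sum(selected)
    n_rrlyrae = sum(is_rrlyrae)
    efficiency = true_pos / n_selected if n_selected else 0.0
    completeness = true_pos / n_rrlyrae if n_rrlyrae else 0.0
    return efficiency, completeness


# Toy colours (assumed values, for illustration only).
rr = [(1.0, 0.0), (1.4, 0.0), (1.4, 0.4), (1.0, 0.4), (1.2, 0.2)]
other = [(1.1, 0.1), (1.3, 0.3), (1.2, 0.1), (2.0, 0.5), (0.5, 0.0), (1.5, 1.0)]

hull = convex_hull(rr)  # hull built from the known RR Lyrae stars
labels = [True] * len(rr) + [False] * len(other)
eff, comp = efficiency_completeness(rr + other, labels, hull)
print(f"efficiency={eff:.3f} completeness={comp:.3f}")
```

In this toy sample the hull encloses every RR Lyrae star, so completeness is
100% by construction, while the three contaminants that fall inside the hull
limit the efficiency. Trimming outlying RR Lyrae stars before building the
hull would shrink it, trading completeness for efficiency as described above.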