Normalized to: Igel, C.
[1]
oai:arXiv.org:1704.04650 [pdf] - 1563387
Big Universe, Big Data: Machine Learning and Image Analysis for
Astronomy
Submitted: 2017-04-15
Astrophysics and cosmology are rich with data. The advent of wide-area
digital cameras on large aperture telescopes has led to ever more ambitious
surveys of the sky. Data volumes that constituted entire surveys a decade ago
can now be acquired in a single night, and real-time analysis is often desired.
Thus, modern astronomy requires big data know-how; in particular, it demands
highly efficient machine learning and image analysis algorithms. But scalability is
not the only challenge: Astronomy applications touch several current machine
learning research questions, such as learning from biased data and dealing with
label and measurement noise. We argue that this makes astronomy a great domain
for computer science research, as it pushes the boundaries of data analysis. In
the following, we will present this exciting application area for data
scientists. We will focus on exemplary results, discuss main challenges, and
highlight some recent methodological advancements in machine learning and image
analysis triggered by astronomical applications.
[2]
oai:arXiv.org:1511.05424 [pdf] - 1550268
Sacrificing information for the greater good: how to select photometric
bands for optimal accuracy
Submitted: 2015-11-17, last modified: 2016-07-06
Large-scale surveys make huge amounts of photometric data available. Because
of the sheer number of objects, spectral data cannot be obtained for all of
them. Therefore it is important to devise techniques for reliably estimating
physical properties of objects from photometric information alone. These
estimates are needed to automatically identify interesting objects worth a
follow-up investigation as well as to produce the required data for a
statistical analysis of the space covered by a survey. We argue that machine
learning techniques are suitable to compute these estimates accurately and
efficiently. This study promotes a feature selection algorithm that selects
the most informative magnitudes and colours for a given task of estimating
physical quantities from photometric data alone. Using k-nearest-neighbours
regression, a well-known non-parametric machine learning method, we show that
the selected features significantly increase the accuracy of the estimates
compared to standard features and standard methods. We
illustrate the usefulness of the approach by estimating specific star formation
rates (sSFRs) and redshifts (photo-z's) using only the broad-band photometry
from the Sloan Digital Sky Survey (SDSS). For estimating sSFRs, we demonstrate
that our method produces better estimates than traditional spectral energy
distribution (SED) fitting. For estimating photo-z's, we show that our method
produces more accurate photo-z's than the method employed by SDSS. The study
highlights the general importance of performing proper model selection to
improve the results of machine learning systems and how feature selection can
provide insights into the predictive relevance of particular input features.
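As a rough illustration of the kind of procedure this abstract describes, the following Python sketch runs a greedy forward feature selection for k-nearest-neighbours regression over magnitudes and colours. The abstract does not state the search strategy or the selection criterion, so greedy forward selection, the hold-out RMSE criterion, all function names, and the synthetic "magnitudes" below are assumptions made for this example. It is not the authors' implementation and uses no SDSS data.

```python
import itertools
import numpy as np


def add_colours(mags):
    """Append all pairwise magnitude differences (colours) as extra columns."""
    pairs = list(itertools.combinations(range(mags.shape[1]), 2))
    colours = np.stack([mags[:, i] - mags[:, j] for i, j in pairs], axis=1)
    names = [f"m{i}" for i in range(mags.shape[1])]
    names += [f"m{i}-m{j}" for i, j in pairs]
    return np.hstack([mags, colours]), names


def knn_predict(X_train, y_train, X_query, k=5):
    """Plain k-nearest-neighbours regression: mean target of the k closest points."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def validation_rmse(X_train, y_train, X_val, y_val, features, k=5):
    """Hold-out RMSE of kNN regression restricted to the given feature columns."""
    pred = knn_predict(X_train[:, features], y_train, X_val[:, features], k)
    return float(np.sqrt(np.mean((pred - y_val) ** 2)))


def forward_select(X_train, y_train, X_val, y_val, k=5):
    """Greedily add the feature that most reduces validation RMSE; stop when none helps."""
    n_features = X_train.shape[1]
    selected, best_err = [], np.inf
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        errs = [
            validation_rmse(X_train, y_train, X_val, y_val, selected + [j], k)
            for j in candidates
        ]
        best = int(np.argmin(errs))
        if errs[best] >= best_err:
            break  # no candidate improves the hold-out error
        selected.append(candidates[best])
        best_err = errs[best]
    return selected, best_err


# Synthetic stand-in for photometry (an assumption, not real survey data):
# five "magnitudes", with a target that depends on a single colour, m0 - m2.
rng = np.random.default_rng(0)
mags = rng.normal(20.0, 1.0, size=(400, 5))
target = 2.0 * (mags[:, 0] - mags[:, 2]) + rng.normal(0.0, 0.1, size=400)

X, names = add_colours(mags)
X_tr, y_tr, X_va, y_va = X[:300], target[:300], X[300:], target[300:]

selected, err = forward_select(X_tr, y_tr, X_va, y_va, k=5)
print([names[j] for j in selected], round(err, 3))
```

The sketch scores candidate feature sets on a single hold-out split to keep it short. A more careful version would use cross-validation and a separate test set, so that the reported error is not biased by the selection itself.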