Normalized to: Gieseke, F.
[1]
oai:arXiv.org:2005.08126 [pdf] - 2096510
Inferring astrophysical X-ray polarization with deep learning
Submitted: 2020-05-16
We investigate the use of deep learning in the context of X-ray polarization
detection from astrophysical sources as will be observed by the Imaging X-ray
Polarimetry Explorer (IXPE), a future NASA-selected space-based mission
expected to be operative in 2021. In particular, we propose two models that can
be used to estimate the impact point as well as the polarization direction of
the incoming radiation. The results obtained show that data-driven approaches
offer a promising alternative to the existing analytical approaches. We also
discuss problems and challenges to be addressed in the near future.
[2]
oai:arXiv.org:1803.10032 [pdf] - 1739907
Return of the features. Efficient feature selection and interpretation
for photometric redshifts
Submitted: 2018-03-27, last modified: 2018-05-09
The explosion of data in recent years has generated an increasing need for
new analysis techniques in order to extract knowledge from massive datasets.
Machine learning has proved particularly useful for this task. Fully
automated methods have recently gained great popularity, even though those
methods often lack physical interpretability. In contrast, feature-based
approaches can provide both well-performing models and understandable
causalities with respect to the correlations found between features and
physical processes. Efficient feature selection is an essential tool to boost
the performance of machine learning models. In this work, we propose a forward
selection method in order to compute, evaluate, and characterize better
performing features for regression and classification problems. Given the
importance of photometric redshift estimation, we adopt it as our case study.
We synthetically created 4,520 features by combining magnitudes, errors, radii,
and ellipticities of quasars, taken from the SDSS. We apply a forward selection
process, a recursive method in which a huge number of feature sets is tested
through a kNN algorithm, leading to a tree of feature sets. The branches of the
tree are then used to perform experiments with the random forest, in order to
validate the best set with an alternative model. We demonstrate that the sets
of features determined with our approach improve the performance of the
regression models significantly when compared to the performance of the classic
features from the literature. The found features are unexpected and surprising,
being very different from the classic features. Therefore, a method to
interpret some of the found features in a physical context is presented. The
methodology described here is very general and can be used to improve the
performance of machine learning models for any regression or classification
task.
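The forward selection idea described in this abstract can be sketched as follows. This is a minimal, simplified illustration and not the authors' implementation: it keeps only a single greedy branch rather than a tree of feature sets, uses a plain hold-out split, and runs on synthetic toy data. The function names (`knn_predict`, `validation_rmse`, `forward_selection`) and the choice of k=3 are assumptions made for this example.

```python
import math
import random


def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return sum(train_y[i] for i in order[:k]) / k


def validation_rmse(X, y, cols, k=3):
    """Hold-out RMSE of kNN restricted to the feature columns in `cols`."""
    half = len(X) // 2
    Z = [[row[c] for c in cols] for row in X]
    sq_errors = [
        (knn_predict(Z[:half], y[:half], z, k) - t) ** 2
        for z, t in zip(Z[half:], y[half:])
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def forward_selection(X, y, max_features):
    """Greedily add the feature that most reduces the validation error."""
    selected, best_err = [], float("inf")
    remaining = set(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scores = {c: validation_rmse(X, y, selected + [c]) for c in remaining}
        best = min(scores, key=scores.get)
        if scores[best] >= best_err:
            break  # no candidate improves the model, so stop
        selected.append(best)
        best_err = scores[best]
        remaining.remove(best)
    return selected, best_err


# Toy data: the target depends on columns 0 and 2; column 1 is pure noise.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(200)]
y = [row[0] + 2.0 * row[2] for row in X]
selected, err = forward_selection(X, y, max_features=2)
print("selected columns:", selected, "validation RMSE:", round(err, 3))
```

Keeping several of the best candidates at each step, instead of only the single best one, would turn this single branch into the tree of feature sets that the abstract describes.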
[3]
oai:arXiv.org:1709.06257 [pdf] - 1640285
Deep-Learnt Classification of Light Curves
Submitted: 2017-09-19
Astronomical light curves are sparse, gappy, and heteroscedastic. As a result,
standard time series methods regularly used for financial and similar datasets
are of little help and astronomers are usually left to their own instruments
and techniques to classify light curves. A common approach is to derive
statistical features from the time series and to use machine learning methods,
generally supervised, to separate objects into a few of the standard classes.
In this work, we transform the time series to two-dimensional light curve
representations in order to classify them using modern deep learning
techniques. In particular, we show that classifiers based on convolutional
neural networks work well for broad characterization and classification. We use
labeled datasets of periodic variables from the CRTS survey and show how this opens
doors for a quick classification of diverse classes with several possible
exciting extensions.
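The abstract does not say which two-dimensional representation is used. One option, assumed here purely for illustration, is a 2D histogram of the pairwise time and magnitude differences between all observations of a light curve, which copes naturally with sparse and irregular sampling. The function name `dmdt_image`, the bin edges, and the toy light curve are all hypothetical.

```python
from bisect import bisect_right
from itertools import combinations


def dmdt_image(times, mags, dt_edges, dm_edges):
    """2D histogram of pairwise (time difference, magnitude difference).

    Rows index magnitude-difference bins, columns index time-difference bins.
    Bins are half-open: edges[i] <= value < edges[i + 1].
    """
    n_rows, n_cols = len(dm_edges) - 1, len(dt_edges) - 1
    image = [[0] * n_cols for _ in range(n_rows)]
    # Sorting by time guarantees that every pair is (earlier, later).
    for (t1, m1), (t2, m2) in combinations(sorted(zip(times, mags)), 2):
        col = bisect_right(dt_edges, t2 - t1) - 1
        row = bisect_right(dm_edges, m2 - m1) - 1
        if 0 <= col < n_cols and 0 <= row < n_rows:
            image[row][col] += 1
    return image


# Toy, irregularly sampled light curve (illustrative values only).
times = [0, 1, 3, 10]
mags = [15.0, 15.2, 14.9, 15.1]
image = dmdt_image(times, mags, dt_edges=[0, 2, 5, 20], dm_edges=[-0.5, 0.0, 0.5])
for row in image:
    print(row)
```

An image of this kind has a fixed size regardless of how many observations the light curve contains, which is what makes it a convenient input for a convolutional network.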
[4]
oai:arXiv.org:1708.08947 [pdf] - 1587653
Convolutional Neural Networks for Transient Candidate Vetting in
Large-Scale Surveys
Gieseke, Fabian;
Bloemen, Steven;
Bogaard, Cas van den;
Heskes, Tom;
Kindler, Jonas;
Scalzo, Richard A.;
Ribeiro, Valério A. R. M.;
van Roestel, Jan;
Groot, Paul J.;
Yuan, Fang;
Möller, Anais;
Tucker, Brad E.
Submitted: 2017-08-29
Current synoptic sky surveys monitor large areas of the sky to find variable
and transient astronomical sources. As the number of detections per night at a
single telescope easily exceeds several thousand, current detection pipelines
make intensive use of machine learning algorithms to classify the detected
objects and to filter out the most interesting candidates. A number of upcoming
surveys will produce up to three orders of magnitude more data, which renders
high-precision classification systems essential to reduce the manual and,
hence, expensive vetting by human experts. We present an approach based on
convolutional neural networks to discriminate between true astrophysical
sources and artefacts in reference-subtracted optical images. We show that
relatively simple networks are already competitive with state-of-the-art
systems and that their quality can further be improved via slightly deeper
networks and additional preprocessing steps -- eventually yielding models
outperforming state-of-the-art systems. In particular, our best model correctly
classifies about 97.3% of all 'real' and 99.7% of all 'bogus' instances on a
test set containing 1,942 'bogus' and 227 'real' instances in total.
Furthermore, the networks considered in this work can also successfully
classify the objects at hand without relying on difference images, which
might pave the way for future detection pipelines not containing image
subtraction steps at all.
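The per-class rates quoted above can be combined into overall figures with a short calculation. This is a sketch based only on the numbers in the abstract; the balanced accuracy is an extra summary added here for illustration and is not reported in the source.

```python
# Counts and per-class rates quoted in the abstract.
n_real, n_bogus = 227, 1942
recall_real, recall_bogus = 0.973, 0.997

# Approximate numbers of correctly classified instances per class.
correct_real = recall_real * n_real
correct_bogus = recall_bogus * n_bogus

# Overall accuracy is dominated by the much larger 'bogus' class.
overall_accuracy = (correct_real + correct_bogus) / (n_real + n_bogus)

# Balanced accuracy weights both classes equally.
balanced_accuracy = (recall_real + recall_bogus) / 2

print(f"overall accuracy:  {overall_accuracy:.3f}")
print(f"balanced accuracy: {balanced_accuracy:.3f}")
```

With these rates, roughly six 'real' and six 'bogus' instances are misclassified, even though the class sizes differ by almost a factor of nine.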
[5]
oai:arXiv.org:1703.07607 [pdf] - 1582073
A probabilistic approach to emission-line galaxy classification
de Souza, R. S.;
Dantas, M. L. L.;
Costa-Duarte, M. V.;
Feigelson, E. D.;
Killedar, M.;
Lablanche, P. -Y.;
Vilalta, R.;
Krone-Martins, A.;
Beck, R.;
Gieseke, F.
Submitted: 2017-03-22, last modified: 2017-08-18
We invoke a Gaussian mixture model (GMM) to jointly analyse two traditional
emission-line classification schemes of galaxy ionization sources: the
Baldwin-Phillips-Terlevich (BPT) and $\rm W_{H\alpha}$ vs. [NII]/H$\alpha$
(WHAN) diagrams, using spectroscopic data from the Sloan Digital Sky Survey
Data Release 7 and SEAGal/STARLIGHT datasets. We apply a GMM to empirically
define classes of galaxies in a three-dimensional space spanned by the $\log$
[OIII]/H$\beta$, $\log$ [NII]/H$\alpha$, and $\log$ EW(H${\alpha}$), optical
parameters. The best-fit GMM based on several statistical criteria suggests a
solution with around four Gaussian components (GCs), which can explain up
to 97 per cent of the data variance. Using elements of information theory, we
compare each GC to its respective astronomical counterpart. GC1 and GC4 are
associated with star-forming galaxies, suggesting the need to define a new
starburst subgroup. GC2 is associated with BPT's Active Galactic Nuclei (AGN)
class and WHAN's weak AGN class. GC3 is associated with BPT's composite class
and WHAN's strong AGN class. Conversely, there is no statistical evidence --
based on four GCs -- for the existence of a Seyfert/LINER dichotomy in our
sample. Nevertheless, including an additional component, GC5, reveals it.
GC5 appears to be associated with the LINER and Passive galaxies on the BPT and WHAN
diagrams, respectively. Subtleties aside, we demonstrate the potential of our
methodology to recover and disentangle different objects within the wilderness of
astronomical datasets, while retaining the ability to convey physically
interpretable results. The probabilistic classifications from the GMM analysis
are publicly available within the COINtoolbox
(https://cointoolbox.github.io/GMM_Catalogue/).
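A probabilistic classification of the kind mentioned above assigns each galaxy a membership probability for every mixture component. The sketch below shows that step for an already fitted mixture. It is a simplified illustration: it assumes diagonal covariances and only two components, and the weights, means, and variances are hypothetical placeholders, not values fitted in the paper.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of an axis-aligned (diagonal-covariance) Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, weights, means, variances):
    """Posterior probability that point x belongs to each mixture component."""
    log_terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    shift = max(log_terms)  # log-sum-exp trick for numerical stability
    unnorm = [math.exp(t - shift) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical two-component mixture in the three-dimensional space
# (log [OIII]/Hbeta, log [NII]/Halpha, log EW(Halpha)); NOT fitted values.
weights = [0.6, 0.4]
means = [[-0.3, -0.5, 1.5], [0.4, -0.1, 0.5]]
variances = [[0.05, 0.03, 0.10], [0.08, 0.04, 0.20]]

probs = responsibilities([-0.25, -0.45, 1.4], weights, means, variances)
print([round(p, 4) for p in probs])
```

The output is a vector of probabilities that sums to one, so an object lying between two groups keeps a graded membership instead of being forced into a single class.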
[6]
oai:arXiv.org:1704.04650 [pdf] - 1563387
Big Universe, Big Data: Machine Learning and Image Analysis for
Astronomy
Submitted: 2017-04-15
Astrophysics and cosmology are rich with data. The advent of wide-area
digital cameras on large aperture telescopes has led to ever more ambitious
surveys of the sky. Data volumes of entire surveys a decade ago can now be
acquired in a single night and real-time analysis is often desired. Thus,
modern astronomy requires big data know-how; in particular, it demands highly
efficient machine learning and image analysis algorithms. But scalability is
not the only challenge: Astronomy applications touch several current machine
learning research questions, such as learning from biased data and dealing with
label and measurement noise. We argue that this makes astronomy a great domain
for computer science research, as it pushes the boundaries of data analysis. In
the following, we will present this exciting application area for data
scientists. We will focus on exemplary results, discuss main challenges, and
highlight some recent methodological advancements in machine learning and image
analysis triggered by astronomical applications.
[7]
oai:arXiv.org:1701.08748 [pdf] - 1935344
On the realistic validation of photometric redshifts, or why Teddy will
never be Happy
Submitted: 2017-01-30, last modified: 2017-03-20
Two of the main problems encountered in the development and accurate
validation of photometric redshift (photo-z) techniques are the lack of
spectroscopic coverage in feature space (e.g. colours and magnitudes) and the
mismatch between photometric error distributions associated with the
spectroscopic and photometric samples. Although these issues are well known,
there is currently no standard benchmark allowing a quantitative analysis of
their impact on the final photo-z estimation. In this work, we present two
galaxy catalogues, Teddy and Happy, built to enable a more demanding and
realistic test of photo-z methods. Using photometry from the Sloan Digital Sky
Survey and spectroscopy from a collection of sources, we constructed datasets
which mimic the biases between the underlying probability distribution of the
real spectroscopic and photometric sample. We demonstrate the potential of
these catalogues by submitting them to the scrutiny of different photo-z
methods, including machine learning (ML) and template fitting approaches.
Beyond the expected poor results from most ML algorithms for cases with missing
coverage in feature space, we were able to recognize the superiority of global
models in the same situation and the general failure across all types of
methods when incomplete coverage is convoluted with the presence of photometric
errors - a data situation which photo-z methods were not trained to deal with
up to now and which must be addressed by future large scale surveys. Our
catalogues represent the first controlled environment allowing a
straightforward implementation of such tests. The data are publicly available
within the COINtoolbox (https://github.com/COINtoolbox/photoz_catalogues).
[8]
oai:arXiv.org:1608.08016 [pdf] - 1467912
Uncertain Photometric Redshifts
Submitted: 2016-08-29
Photometric redshifts play an important role as a measure of distance for
various cosmological topics. Spectroscopic redshifts are only available for a
very limited number of objects but can be used for creating statistical models.
A broad variety of photometric catalogues provide uncertain low resolution
spectral information for galaxies and quasars that can be used to infer a
redshift. Many different techniques have been developed to produce those
redshift estimates with increasing precision. Instead of providing only a point
estimate, astronomers have started to generate probability density functions
(PDFs), which should characterise the uncertainties of the
estimate. In this work, we present two simple approaches to generating
those PDFs. We use the example of generating the photometric redshift PDFs of
quasars from SDSS(DR7) to validate our approaches and to compare them with
point estimates. We do not aim to present a new best-performing method; instead,
we choose an intuitive approach that is based on well-known machine learning
algorithms. Furthermore, we introduce proper tools for evaluating the
performance of PDFs in the context of astronomy. The continuous ranked
probability score (CRPS) and the probability integral transform (PIT) are well
accepted in the weather forecasting community. Both tools reflect how well the
PDFs reproduce the real values of the analysed objects. As we show, nearly all
currently used measures in astronomy show severe weaknesses when used to
evaluate PDFs.
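Both scores named in this abstract can be computed directly from samples drawn from a predictive PDF. The sketch below uses the standard sample-based CRPS estimator and the empirical CDF for the PIT. The function names and the redshift values are hypothetical and chosen only for illustration; the source does not state how the authors implement either score.

```python
def crps_from_samples(samples, truth):
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    term_truth = sum(abs(s - truth) for s in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_truth - 0.5 * term_spread


def pit_from_samples(samples, truth):
    """PIT value: the empirical predictive CDF evaluated at the true value."""
    return sum(s <= truth for s in samples) / len(samples)


# Hypothetical predictive samples for one object's redshift and its
# spectroscopic value (illustrative numbers, not taken from the paper).
z_samples = [1.9, 2.0, 2.1, 2.2, 2.4]
z_spec = 2.1
score = crps_from_samples(z_samples, z_spec)
pit = pit_from_samples(z_samples, z_spec)
print(f"CRPS = {score:.4f}, PIT = {pit:.2f}")
```

For a PDF that collapses to a single value, the CRPS reduces to the absolute error of that point estimate, which makes it directly comparable with point-estimate scores. For well-calibrated PDFs, the PIT values collected over many objects should be uniformly distributed between 0 and 1.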
[9]
oai:arXiv.org:1511.05424 [pdf] - 1550268
Sacrificing information for the greater good: how to select photometric
bands for optimal accuracy
Submitted: 2015-11-17, last modified: 2016-07-06
Large-scale surveys make huge amounts of photometric data available. Because
of the sheer number of objects, spectral data cannot be obtained for all of
them. Therefore, it is important to devise techniques for reliably estimating
physical properties of objects from photometric information alone. These
estimates are needed to automatically identify interesting objects worth a
follow-up investigation as well as to produce the required data for a
statistical analysis of the space covered by a survey. We argue that machine
learning techniques are suitable to compute these estimates accurately and
efficiently. This study promotes a feature selection algorithm, which selects
the most informative magnitudes and colours for a given task of estimating
physical quantities from photometric data alone. Using k-nearest-neighbours
regression, a well-known non-parametric machine learning method, we show that
using the selected features significantly increases the accuracy of the
estimations compared to using standard features and standard methods. We
illustrate the usefulness of the approach by estimating specific star formation
rates (sSFRs) and redshifts (photo-z's) using only the broad-band photometry
from the Sloan Digital Sky Survey (SDSS). For estimating sSFRs, we demonstrate
that our method produces better estimates than traditional spectral energy
distribution (SED) fitting. For estimating photo-z's, we show that our method
produces more accurate photo-z's than the method employed by SDSS. The study
highlights the general importance of performing proper model selection to
improve the results of machine learning systems and how feature selection can
provide insights into the predictive relevance of particular input features.
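A feature selection over magnitudes and colours needs a pool of candidate features to choose from. The sketch below builds one such pool from the five SDSS broad-band magnitudes, taking a colour to be the difference between the magnitudes in two bands. Using every pairwise colour is an assumption made here; the abstract does not list the exact candidate set. The function name and the magnitudes are hypothetical.

```python
from itertools import combinations

BANDS = ["u", "g", "r", "i", "z"]  # the five SDSS broad-band filters


def candidate_features(magnitudes):
    """Return the magnitudes plus every pairwise colour (bluer minus redder)."""
    features = dict(magnitudes)
    for blue, red in combinations(BANDS, 2):
        features[f"{blue}-{red}"] = magnitudes[blue] - magnitudes[red]
    return features


# Hypothetical magnitudes for a single object (illustrative values only).
obj = {"u": 19.8, "g": 18.9, "r": 18.4, "i": 18.1, "z": 17.9}
feats = candidate_features(obj)
print(len(feats), "candidate features:", sorted(feats))
```

Five magnitudes and ten pairwise colours give fifteen candidates, a pool small enough that a selection algorithm can evaluate many subsets with a fast model such as k-nearest-neighbours regression.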
[10]
oai:arXiv.org:1512.06810 [pdf] - 1935274
Exploring the spectroscopic diversity of type Ia supernovae with
DRACULA: a machine learning approach
Sasdelli, Michele;
Ishida, E. E. O.;
Vilalta, R.;
Aguena, M.;
Busti, V. C.;
Camacho, H.;
Trindade, A. M. M.;
Gieseke, F.;
de Souza, R. S.;
Fantaye, Y. T.;
Mazzali, P. A.
Submitted: 2015-12-21, last modified: 2016-06-30
The existence of multiple subclasses of type Ia supernovae (SNeIa) has been
the subject of great debate in the last decade. One major challenge inevitably
met when trying to infer the existence of one or more subclasses is the
time-consuming and subjective process of subclass definition. In this work, we
show how machine learning tools facilitate identification of subtypes of SNeIa
through the establishment of a hierarchical group structure in the continuous
space of spectral diversity formed by these objects. Using deep learning, we
were able to perform this identification in a four-dimensional feature space
(+1 for time evolution), while the standard Principal Component Analysis barely
achieves similar results using 15 principal components. This is evidence that
the progenitor system and the explosion mechanism can be described by a small
number of initial physical parameters. As a proof of concept, we show that our
results are in close agreement with a previously suggested classification
scheme and that our proposed method can grasp the main spectral features behind
the definition of such subtypes. This allows the confirmation of the velocity
of lines as a first order effect in the determination of SNIa subtypes,
followed by 91bg-like events. Given the expected data deluge in the forthcoming
years, our proposed approach is essential to allow a quick and statistically
coherent identification of SNeIa subtypes (and outliers). All tools used in
this work were made publicly available in the Python package Dimensionality
Reduction And Clustering for Unsupervised Learning in Astronomy (DRACULA) and
can be found within COINtoolbox (https://github.com/COINtoolbox/DRACULA).
[11]
oai:arXiv.org:1210.7071 [pdf] - 582047
Finding New High-Redshift Quasars by Asking the Neighbours
Submitted: 2012-10-26
Quasars with a high redshift (z) are important for understanding the evolution
processes of galaxies in the early universe. However, only a few of these
distant objects are known to date. The costs of building and operating a
10-metre class telescope limit the number of facilities and, thus, the
available observation time. Therefore, an efficient selection of candidates is
mandatory. This paper presents a new approach to select quasar candidates with
high redshift (z>4.8) based on photometric catalogues. We have chosen to use
the z>4.8 limit for our approach because the dominant Lyman alpha emission line
of a quasar can only be found in the Sloan i and z-band filters. As part of the
candidate selection approach, a photometric redshift estimator is also presented.
Three of the 120,000 generated candidates have been spectroscopically
analysed in follow-up observations and a new z=5.0 quasar was found. This
result is consistent with the estimated detection ratio of about 50 per cent
and we expect 60,000 high-redshift quasars to be part of our candidate sample.
The created candidates are available for download at MNRAS or at
http://www.astro.rub.de/polsterer/quasar-candidates.csv.
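The redshift limit and the expected yield quoted above can be checked with a short calculation. This sketch assumes the standard rest-frame Lyman alpha wavelength of 1215.67 Angstrom, which is not stated in the abstract; the function name is hypothetical.

```python
LYMAN_ALPHA_REST_ANGSTROM = 1215.67  # rest-frame wavelength of Lyman alpha


def observed_wavelength(rest_angstrom, z):
    """Cosmological redshift stretches wavelengths by a factor (1 + z)."""
    return rest_angstrom * (1.0 + z)


for z in (4.8, 5.0):
    lam = observed_wavelength(LYMAN_ALPHA_REST_ANGSTROM, z)
    print(f"z = {z}: Lyman alpha observed at {lam:.0f} Angstrom")

# Expected yield quoted in the abstract: 120,000 candidates at a detection
# ratio of about 50 per cent.
expected_quasars = 120_000 * 0.5
print(f"expected high-redshift quasars: {expected_quasars:.0f}")
```

At z=4.8 the line is observed at about 7051 Angstrom, at the red end of the optical range, where the abstract places it in the Sloan i and z bands.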
[12]
oai:arXiv.org:1108.4696 [pdf] - 1516217
Detecting Quasars in Large-Scale Astronomical Surveys
Submitted: 2011-08-23
We present a classification-based approach to identify quasi-stellar radio
sources (quasars) in the Sloan Digital Sky Survey and evaluate its performance
on a manually labeled training set. While reasonable results can already be
obtained via approaches working only on photometric data, our experiments
indicate that simple but problem-specific features extracted from spectroscopic
data can significantly improve the classification performance. Since our
approach works orthogonally to existing classification schemes used for building
the spectroscopic catalogs, our classification results are well suited for a
mutual assessment of the approaches' accuracies.