Normalized to: D'Isanto, A.
[1]
oai:arXiv.org:2006.08238 [pdf] - 2114365
Comparison of outlier detection methods on astronomical image data
Submitted: 2020-06-15
Among the many challenges posed by the huge data volumes produced by the new
generation of astronomical instruments is the search for rare and
peculiar objects. Unsupervised outlier detection algorithms may provide a
viable solution. In this work we compare the performance of six methods: the
Local Outlier Factor, Isolation Forest, k-means clustering, a measure of
novelty, and both a normal and a convolutional autoencoder. These methods were
applied to data extracted from SDSS stripe 82. After discussing the sensitivity
of each method to its own set of hyperparameters, we combine the results from
each method to rank the objects and produce a final list of outliers.
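The abstract does not state how the per-method results are combined. As one possible illustration, the sketch below averages the ranks that each method assigns to every object. The method names, the scores, and the mean-rank rule are assumptions made for this example, not the procedure used in the paper.

```python
def ranks_descending(scores):
    """Rank 1 goes to the highest score; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def combine_by_mean_rank(score_table):
    """score_table maps a method name to its scores (higher = more anomalous)."""
    per_method = [ranks_descending(s) for s in score_table.values()]
    n_objects = len(per_method[0])
    mean_rank = [sum(r[i] for r in per_method) / len(per_method)
                 for i in range(n_objects)]
    # Objects with the smallest mean rank are the strongest outlier candidates.
    ordering = sorted(range(n_objects), key=lambda i: mean_rank[i])
    return ordering, mean_rank


# Hypothetical outlier scores for four objects from three methods.
scores = {
    "method_a": [0.1, 0.9, 0.3, 0.2],
    "method_b": [0.2, 0.7, 0.8, 0.1],
    "method_c": [0.0, 1.0, 0.4, 0.5],
}
ordering, mean_rank = combine_by_mean_rank(scores)
print(ordering)
```

Working on ranks rather than raw scores sidesteps the fact that different detectors produce scores on incomparable scales.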
[2]
oai:arXiv.org:1803.10032 [pdf] - 1739907
Return of the features. Efficient feature selection and interpretation
for photometric redshifts
Submitted: 2018-03-27, last modified: 2018-05-09
The explosion of data in recent years has generated an increasing need for
new analysis techniques in order to extract knowledge from massive datasets.
Machine learning has proved particularly useful to perform this task. Fully
automated methods have recently gained great popularity, even though those
methods often lack physical interpretability. In contrast, feature-based
approaches can provide both well-performing models and understandable
causalities with respect to the correlations found between features and
physical processes. Efficient feature selection is an essential tool to boost
the performance of machine learning models. In this work, we propose a forward
selection method in order to compute, evaluate, and characterize better
performing features for regression and classification problems. Given the
importance of photometric redshift estimation, we adopt it as our case study.
We synthetically created 4,520 features by combining magnitudes, errors, radii,
and ellipticities of quasars, taken from the SDSS. We apply a forward selection
process, a recursive method in which a huge number of feature sets is tested
through a kNN algorithm, leading to a tree of feature sets. The branches of the
tree are then used to perform experiments with the random forest, in order to
validate the best set with an alternative model. We demonstrate that the sets
of features determined with our approach improve the performances of the
regression models significantly when compared to the performance of the classic
features from the literature. The selected features are unexpected,
being very different from the classic features. Therefore, a method to
interpret some of the found features in a physical context is presented. The
methodology described here is very general and can be used to improve the
performance of machine learning models for any regression or classification
task.
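A greedy forward selection driven by a kNN regressor can be sketched as follows. This is a simplified single-branch variant, assuming a validation RMSE as the selection criterion; the paper's procedure explores a tree of feature sets, which is not reproduced here. All function names and the toy data are hypothetical.

```python
import math
import random


def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training rows closest to x."""
    neighbours = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))
    return sum(y for _, y in neighbours[:k]) / k


def validation_rmse(train, val, features, k):
    """RMSE of a kNN regressor restricted to the given feature indices."""
    (X_tr, y_tr), (X_va, y_va) = train, val
    pick = lambda rows: [[row[j] for j in features] for row in rows]
    X_tr_sub, X_va_sub = pick(X_tr), pick(X_va)
    sq = [(knn_predict(X_tr_sub, y_tr, x, k) - y) ** 2
          for x, y in zip(X_va_sub, y_va)]
    return math.sqrt(sum(sq) / len(sq))


def forward_selection(train, val, max_features, k=3):
    """Add one feature at a time, keeping the one that lowers the RMSE most."""
    n_features = len(train[0][0])
    selected, best = [], float("inf")
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        if not candidates:
            break
        score, j = min((validation_rmse(train, val, selected + [j], k), j)
                       for j in candidates)
        if score >= best:  # stop when no candidate improves the score
            break
        selected.append(j)
        best = score
    return selected, best


# Toy data: feature 0 determines the target, features 1 and 2 are noise.
rng = random.Random(0)
rows = [[i / 40, rng.random(), rng.random()] for i in range(40)]
targets = [2 * row[0] for row in rows]
train = (rows[0::2], targets[0::2])
val = (rows[1::2], targets[1::2])
selected, best = forward_selection(train, val, max_features=2)
print(selected, best)
```

On this toy set the informative feature is picked first; with thousands of candidate features, each round costs one kNN evaluation per remaining candidate.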
[3]
oai:arXiv.org:1706.02467 [pdf] - 1626256
Photometric redshift estimation via deep learning
Submitted: 2017-06-08, last modified: 2017-09-08
The need to analyze the available large synoptic multi-band surveys drives
the development of new data-analysis methods. Photometric redshift estimation
is one field of application where such new methods have substantially improved
the results. Up to now, the vast majority of applied redshift estimation
methods have utilized photometric features. We aim to develop a method to
derive probabilistic photometric redshifts directly from multi-band imaging
data, rendering pre-classification of objects and feature extraction obsolete.
A modified version of a deep convolutional network was combined with a mixture
density network. The estimates are expressed as Gaussian mixture models
representing the probability density functions (PDFs) in the redshift space. In
addition to the traditional scores, the continuous ranked probability score
(CRPS) and the probability integral transform (PIT) were applied as performance
criteria. We have adopted a feature-based random forest and a plain mixture
density network to compare performances on experiments with data from SDSS
(DR9). We show that the proposed method is able to predict redshift PDFs
independently from the type of source, for example galaxies, quasars or stars.
The prediction performance is better than that of both reference methods
presented and is comparable to results from the literature. The presented method
is extremely general and allows us to solve any kind of probabilistic
regression problem based on imaging data, for example estimating the metallicity
or star formation rate of galaxies. This kind of methodology is tremendously
important for the next generation of surveys.
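When a predictive PDF is a Gaussian mixture, the CRPS has a known closed form, so no numerical integration is needed. The sketch below evaluates it for one object; the parameter names (`weights`, `means`, `sigmas`) and the example values are assumptions for illustration, not taken from the paper's code.

```python
import math


def _pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _a(mu, var):
    """Expected absolute value of a normal variable with mean mu, variance var."""
    sigma = math.sqrt(var)
    z = mu / sigma
    return 2.0 * sigma * _pdf(z) + mu * (2.0 * _cdf(z) - 1.0)


def crps_gaussian_mixture(weights, means, sigmas, y):
    """CRPS of a Gaussian mixture forecast for the observed value y.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'| with X, X' independent draws
    from the mixture; both expectations are sums of _a terms.
    """
    comps = list(zip(weights, means, sigmas))
    first = sum(w * _a(y - m, s * s) for w, m, s in comps)
    second = sum(wi * wj * _a(mi - mj, si * si + sj * sj)
                 for wi, mi, si in comps for wj, mj, sj in comps)
    return first - 0.5 * second


# Hypothetical bimodal redshift PDF and a spectroscopic redshift of 0.5.
score = crps_gaussian_mixture([0.7, 0.3], [0.5, 1.8], [0.1, 0.2], 0.5)
print(score)
```

Lower values are better; for a point forecast the CRPS reduces to the absolute error, which makes it directly comparable with point-estimate scores.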
[4]
oai:arXiv.org:1703.01979 [pdf] - 1581773
Uncertain Photometric Redshifts with Deep Learning Methods
Submitted: 2017-03-06
Accurate photometric redshift estimation is of fundamental importance in
astronomy, because redshift information must be obtained efficiently without
resorting to spectroscopic analysis. We
propose a method for determining accurate multimodal photo-z probability
density functions (PDFs) using Mixture Density Networks (MDN) and Deep
Convolutional Networks (DCN). A comparison with a Random Forest (RF) is
performed.
[5]
oai:arXiv.org:1608.08016 [pdf] - 1467912
Uncertain Photometric Redshifts
Submitted: 2016-08-29
Photometric redshifts play an important role as a measure of distance for
various cosmological topics. Spectroscopic redshifts are only available for a
very limited number of objects but can be used for creating statistical models.
A broad variety of photometric catalogues provide uncertain low resolution
spectral information for galaxies and quasars that can be used to infer a
redshift. Many different techniques have been developed to produce those
redshift estimates with increasing precision. Instead of providing a point
estimate only, astronomers have started to generate probability density
functions (PDFs), which should characterise the uncertainties of the
estimation. In this work we present two simple approaches to generating
those PDFs. We use the example of generating the photometric redshift PDFs of
quasars from SDSS (DR7) to validate our approaches and to compare them with
point estimates. We do not aim to present a new best-performing method;
instead, we choose an intuitive approach that is based on well-known machine
learning algorithms. Furthermore, we introduce proper tools for evaluating the
performance of PDFs in the context of astronomy. The continuous ranked
probability score (CRPS) and the probability integral transform (PIT) are well
accepted in the weather forecasting community. Both tools reflect how well the
PDFs reproduce the real values of the analysed objects. As we show, nearly all
currently used measures in astronomy show severe weaknesses when used to
evaluate PDFs.
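The PIT of one object is the predicted cumulative distribution evaluated at its true value; collected over many objects, the values should be uniformly distributed if the PDFs are well calibrated. The sketch below assumes each PDF is represented by a set of samples, which is an assumption of this example; the function names are hypothetical.

```python
def pit_from_samples(samples, y_true):
    """Empirical CDF of the predictive samples evaluated at the true value."""
    return sum(1 for s in samples if s <= y_true) / len(samples)


def pit_histogram(pit_values, n_bins=10):
    """Count PIT values per equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    for p in pit_values:
        # A PIT of exactly 1.0 is placed in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


# Hypothetical predictive samples for three objects and their true redshifts.
predictions = [
    ([0.9, 1.0, 1.1, 1.2], 1.05),
    ([0.2, 0.3, 0.4, 0.5], 0.10),
    ([2.0, 2.1, 2.2, 2.3], 2.30),
]
pits = [pit_from_samples(samples, y) for samples, y in predictions]
print(pits, pit_histogram(pits, n_bins=4))
```

A flat histogram indicates calibrated PDFs, a U shape indicates PDFs that are too narrow, and a central hump indicates PDFs that are too broad.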
[6]
oai:arXiv.org:1601.03931 [pdf] - 1364937
An analysis of feature relevance in the classification of astronomical
transients with machine learning methods
Submitted: 2016-01-15
The exploitation of present and future synoptic (multi-band and multi-epoch)
surveys requires an extensive use of automatic methods for data processing and
data interpretation. In this work, using data extracted from the Catalina Real
Time Transient Survey (CRTS), we investigate the classification performance of
some well-tested methods: Random Forest, MLPQNA (Multi Layer Perceptron with
Quasi Newton Algorithm) and K-Nearest Neighbors, paying special attention to
the feature selection phase. To this end, several classification
experiments were performed, namely the identification of cataclysmic variables,
the separation of galactic from extra-galactic objects, and the identification
of supernovae.