Normalized to: Hilbe, J.
[1]
oai:arXiv.org:1409.7699 [pdf] - 1807260
The Overlooked Potential of Generalized Linear Models in Astronomy-II:
Gamma regression and photometric redshifts
Submitted: 2014-09-26, last modified: 2018-12-30
Machine learning techniques offer a valuable toolbox for use within
astronomy to solve problems involving so-called big data. They provide a means
to make accurate predictions about a particular system without prior knowledge
of the underlying physical processes of the data. In this article, and the
companion papers of this series, we present the set of Generalized Linear
Models (GLMs) as a fast alternative method for tackling general astronomical
problems, including the ones related to the machine learning paradigm. To
demonstrate the applicability of GLMs to inherently positive and continuous
physical observables, we explore their use in estimating the photometric
redshifts of galaxies from their multi-wavelength photometry. Using the gamma
family with a log link function we predict redshifts from the PHoto-z Accuracy
Testing simulated catalogue and a subset of the Sloan Digital Sky Survey from
Data Release 10. We obtain fits that result in catastrophic outlier rates as
low as ~1% for simulated and ~2% for real data. Moreover, we can easily obtain
such levels of precision within a matter of seconds on a normal desktop
computer and with training sets that contain merely thousands of galaxies. Our
software is publicly available as a user-friendly package for Python and R,
and via an interactive web application
(https://cosmostatisticsinitiative.shinyapps.io/CosmoPhotoz). This software
allows users to apply a set of GLMs to their own photometric catalogues and
generates publication-quality plots with minimum effort from the user. By
making GLMs easier for the astronomical community to use, this paper series
aims to make them widely known and to encourage their implementation in future
large-scale projects, such as the Large Synoptic Survey Telescope.
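As a rough illustration of the approach described in this abstract, the sketch below fits a gamma GLM with a log link by iteratively reweighted least squares (IRLS) and computes a catastrophic outlier rate. It is not the CosmoPhotoz implementation: the single predictor `x`, the function names, and the outlier threshold of 0.15 on |z_phot - z_spec| / (1 + z_spec) are assumptions made for this example.

```python
import math


def _ols(x, y):
    """Closed-form least squares for y ~ b0 + b1 * x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def fit_gamma_log_glm(x, y, n_iter=100, tol=1e-12):
    """Fit E[y] = exp(b0 + b1 * x) for a positive response by IRLS.

    With a log link and variance proportional to mu**2, the IRLS weights
    are constant, so each step is an unweighted regression of the
    working response z = eta + (y - mu) / mu on x.
    """
    # Starting values: least squares on log(y).
    b0, b1 = _ols(x, [math.log(v) for v in y])
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        z = [e + (yi - math.exp(e)) / math.exp(e) for e, yi in zip(eta, y)]
        new_b0, new_b1 = _ols(x, z)
        step = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if step < tol:
            break
    return b0, b1


def catastrophic_outlier_rate(z_phot, z_spec, threshold=0.15):
    """Fraction of objects with |z_phot - z_spec| / (1 + z_spec) > threshold.

    The threshold of 0.15 is an assumed, commonly used convention.
    """
    bad = sum(
        abs(zp - zs) / (1.0 + zs) > threshold
        for zp, zs in zip(z_phot, z_spec)
    )
    return bad / len(z_spec)
```

In practice the linear predictor would contain several photometric features rather than one, and a statistics library would normally handle the fit; the one-predictor version only makes the IRLS update easy to follow.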
[2]
oai:arXiv.org:1603.06256 [pdf] - 1935289
Is the cluster environment quenching the Seyfert activity in elliptical
and spiral galaxies?
de Souza, R. S.;
Dantas, M. L. L.;
Krone-Martins, A.;
Cameron, E.;
Coelho, P.;
Hattab, M. W.;
de Val-Borro, M.;
Hilbe, J. M.;
Elliott, J.;
Hagen, A.
Submitted: 2016-03-20, last modified: 2016-07-06
We developed a hierarchical Bayesian model (HBM) to investigate how the
presence of Seyfert activity in galaxies relates to their environment, herein represented
by the galaxy cluster mass, $M_{200}$, and the normalized cluster-centric
distance, $r/r_{200}$. We achieved this by constructing an unbiased sample of
galaxies from the Sloan Digital Sky Survey, with morphological classifications
provided by the Galaxy Zoo Project. A propensity score matching approach is
introduced to control for the effects of confounding variables: stellar mass,
galaxy colour, and star formation rate. The connection between Seyfert-activity
and environmental properties in the de-biased sample is modelled within an HBM
framework using the so-called logistic regression technique, suitable for the
analysis of binary data (e.g., whether or not a galaxy hosts an AGN). Unlike
standard ordinary least squares fitting methods, our methodology naturally
allows modelling the probability of Seyfert-AGN activity in galaxies on their
natural scale, i.e. as a binary variable. Furthermore, we demonstrate how an
HBM can incorporate information on each particular galaxy morphological type in
a unified framework. In elliptical galaxies, our analysis indicates a strong
correlation of Seyfert-AGN activity with $r/r_{200}$, and a weaker correlation
with the mass of the host. In spiral galaxies these trends do not appear,
suggesting that the link between Seyfert activity and the properties of spiral
galaxies is independent of the environment.
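The propensity score matching step mentioned in this abstract can be sketched as a greedy one-to-one nearest-neighbour match. The abstract does not say which matching algorithm the authors used, so the function name, the `caliper` parameter, and the assumption that propensity scores have already been estimated (for example, from stellar mass, colour, and star formation rate) are all hypothetical.

```python
def match_nearest(treated, control, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, control: lists of pre-computed propensity scores in [0, 1].
    caliper: maximum allowed score difference for a valid match
             (an assumed tuning parameter).
    Returns a list of (treated_index, control_index) pairs.
    """
    available = set(range(len(control)))
    pairs = []
    for i, score in enumerate(treated):
        if not available:
            break
        # Closest remaining control object; sorted() keeps ties deterministic.
        j = min(sorted(available), key=lambda k: abs(control[k] - score))
        if abs(control[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs
```

Treated objects with no control within the caliper are dropped, which trades sample size for a better balance of the confounding variables between the two groups.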
[3]
oai:arXiv.org:1506.04792 [pdf] - 1935110
The Overlooked Potential of Generalized Linear Models in Astronomy-III:
Bayesian Negative Binomial Regression and Globular Cluster Populations
Submitted: 2015-06-15, last modified: 2015-08-13
In this paper, the third in a series illustrating the power of generalized
linear models (GLMs) for the astronomical community, we elucidate the potential
of the class of GLMs which handles count data. The size of a galaxy's globular
cluster population $N_{\rm GC}$ is a long-standing puzzle in the astronomical
literature. It falls in the category of count data analysis, yet it is usually
modelled as if it were a continuous response variable. We have developed a
Bayesian negative binomial regression model to study the connection between
$N_{\rm GC}$ and the following galaxy properties: central black hole mass,
dynamical bulge mass, bulge velocity dispersion, and absolute visual magnitude.
The methodology introduced herein naturally accounts for heteroscedasticity,
intrinsic scatter, errors in measurements in both axes (either discrete or
continuous), and allows modelling the population of globular clusters on their
natural scale as a non-negative integer variable. Prediction intervals of 99%
around the trend for expected $N_{\rm GC}$ comfortably envelop the data,
notably including the Milky Way, which has hitherto been considered a
problematic outlier. Finally, we demonstrate how random intercept models can
incorporate information on each particular galaxy morphological type. Bayesian
variable selection methodology allows for automatically identifying galaxy
types with different productions of GCs, suggesting that on average S0 galaxies
have a GC population 35% smaller than other types with similar brightness.
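To make the count-data point concrete, the sketch below writes out the negative binomial log-probability in the common mean/dispersion (NB2) parameterisation, where the variance is mu + mu**2 / theta. This is only the likelihood building block, not the authors' Bayesian model; the parameter names `mu` and `theta` and the choice of parameterisation are assumptions for this example.

```python
import math


def nb_logpmf(k, mu, theta):
    """Log-probability of count k under a negative binomial (NB2) model.

    mu:    expected count (mu > 0)
    theta: dispersion parameter (theta > 0); larger theta means less
           overdispersion, and theta -> infinity recovers the Poisson.
    """
    return (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )


def nb_variance(mu, theta):
    """Variance of the NB2 distribution: mu + mu**2 / theta."""
    return mu + mu ** 2 / theta
```

Because the variance grows faster than the mean, a model built on this distribution can absorb the extra scatter seen in count data that a Poisson model, with variance equal to the mean, cannot.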
[4]
oai:arXiv.org:1507.01293 [pdf] - 1429369
Using gamma regression for photometric redshifts of survey galaxies
Submitted: 2015-07-05
Machine learning techniques offer a plethora of opportunities in tackling big
data within the astronomical community. We present the set of Generalized
Linear Models as a fast alternative for determining photometric redshifts of
galaxies. These tools are not commonly applied within astronomy, despite being
widely used in other fields. With this technique, we achieve catastrophic
outlier rates of order 1% in a matter of seconds
on large datasets of size ~1,000,000. To make these techniques easily
accessible to the astronomical community, we developed a set of libraries and
tools that are publicly available.
[5]
oai:arXiv.org:1409.7696 [pdf] - 1047947
The Overlooked Potential of Generalized Linear Models in Astronomy - I:
Binomial Regression
Submitted: 2014-09-26, last modified: 2015-04-04
Revealing hidden patterns in astronomical data is often the path to
fundamental scientific breakthroughs; meanwhile the complexity of scientific
inquiry increases as more subtle relationships are sought. Contemporary data
analysis problems often elude the capabilities of classical statistical
techniques, suggesting the use of cutting-edge statistical methods. In this
light, astronomers have overlooked a whole family of statistical techniques for
exploratory data analysis and robust regression, the so-called Generalized
Linear Models (GLMs). In this paper -- the first in a series aimed at
illustrating the power of these methods in astronomical applications -- we
elucidate the potential of a particular class of GLMs for handling
binary/binomial data, the so-called logit and probit regression techniques,
from both a maximum likelihood and a Bayesian perspective. As a case in point,
we present the use of these GLMs to explore the conditions of star formation
activity and metal enrichment in primordial minihaloes from cosmological
hydro-simulations including detailed chemistry, gas physics, and stellar
feedback. We predict that for a dark mini-halo with metallicity $\approx 1.3
\times 10^{-4} Z_{\odot}$, an increase of $1.2 \times 10^{-2}$ in the gas
molecular fraction increases the probability of star formation occurrence by a
factor of 75%. Finally, we highlight the use of receiver operating
characteristic curves as a diagnostic for binary classifiers, and ultimately we
use these to demonstrate the competitive predictive performance of GLMs against
the popular technique of artificial neural networks.
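The two link functions and the ROC diagnostic named in this abstract can be sketched in a few lines. The function names are hypothetical, and the area under the ROC curve is computed here by direct pair counting, which is one standard way to obtain it but is not necessarily how the authors did so.

```python
import math


def inv_logit(eta):
    """Logit link inverse: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Probit link inverse: the standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def roc_auc(scores, labels):
    """Area under the ROC curve by pair counting.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    with ties counted as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

An area of 0.5 corresponds to a classifier no better than chance and 1.0 to perfect separation, which is what makes the quantity convenient for comparing a GLM against another classifier on the same data.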
[6]
oai:arXiv.org:1301.3069 [pdf] - 614001
New Organizations to Support Astroinformatics and Astrostatistics
Submitted: 2013-01-14
In the past two years, the environment within which astronomers conduct their
data analysis and management has rapidly changed. Working Groups associated
with international societies and Big Data projects have emerged to support and
stimulate the new fields of astroinformatics and astrostatistics. Sponsoring
societies include the International Statistical Institute, International
Astronomical Union, American Astronomical Society, and Large Synoptic Survey
Telescope project. They enthusiastically support cross-disciplinary activities
where the advanced capabilities of computer science, statistics and related
fields of applied mathematics are applied to advance research on planets,
stars, galaxies and the Universe. The ADASS community is encouraged to join
these organizations and to explore and engage in their public communication Web
site, the Astrostatistics and Astroinformatics Portal (http://asaip.psu.edu).