Normalized to: Portillo, S.
[1]
oai:arXiv.org:2002.10464 [pdf] - 2131872
Dimensionality Reduction of SDSS Spectra with Variational Autoencoders
Submitted: 2020-02-24, last modified: 2020-07-09
High resolution galaxy spectra contain much information about galactic
physics, but the high dimensionality of these spectra makes it difficult to
fully utilize the information they contain. We apply variational autoencoders
(VAEs), a non-linear dimensionality reduction technique, to a sample of spectra
from the Sloan Digital Sky Survey. In contrast to Principal Component Analysis
(PCA), a widely used technique, VAEs can capture non-linear relationships
between latent parameters and the data. We find that a VAE can reconstruct the
SDSS spectra well with only six latent parameters, outperforming PCA with the
same number of components. Different galaxy classes are naturally separated in
this latent space, without class labels having been given to the VAE. The VAE
latent space is interpretable because the VAE can be used to make synthetic
spectra at any point in latent space. For example, making synthetic spectra
along tracks in latent space yields sequences of realistic spectra that
interpolate between two different types of galaxies. Using the latent space to
find outliers may yield interesting spectra: in our small sample, we
immediately find unusual data artifacts and stars misclassified as galaxies. In
this exploratory work, we show that VAEs create compact, interpretable latent
spaces that capture non-linear features of the data. While a VAE takes
substantial time to train (~1 day for 48000 spectra), once trained, VAEs can
enable the fast exploration of large astronomical data sets.
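As a rough illustration of the linear baseline mentioned above, the following sketch reconstructs data from a small number of principal components. It does not implement a VAE, and it does not use SDSS data: the "spectra" are a synthetic toy family generated from two hidden parameters, and the names `pca_reconstruct`, `mse`, `theta`, and `wave` are hypothetical choices made for this example only.

```python
import numpy as np

def pca_reconstruct(spectra, n_components):
    """Project rows onto the leading principal components and map back."""
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    latents = centered @ components.T  # shape (n_spectra, n_components)
    return latents, latents @ components + mean

def mse(a, b):
    """Mean squared reconstruction error."""
    return float(np.mean((a - b) ** 2))

# ASSUMPTION: synthetic stand-in for spectra, a non-linear function of two
# hidden parameters (a line whose center shifts, plus a tilted continuum).
rng = np.random.default_rng(0)
theta = rng.uniform(-1.0, 1.0, size=(500, 2))
wave = np.linspace(0.0, 1.0, 200)
spectra = (np.exp(-((wave[None, :] - 0.5 - 0.3 * theta[:, :1]) ** 2) / 0.005)
           + theta[:, 1:] * wave[None, :])

# Reconstruction error as a function of the number of retained components.
errors = {k: mse(spectra, pca_reconstruct(spectra, k)[1]) for k in (1, 2, 6, 20)}
for k, err in errors.items():
    print(f"{k:2d} components: MSE = {err:.3e}")
```

Because the shifting line is a non-linear feature, one would expect a linear method to need more than two components to describe this two-parameter family, which is the kind of limitation that motivates a non-linear encoder.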
[2]
oai:arXiv.org:1907.04929 [pdf] - 2068905
Multiband Probabilistic Cataloging: A Joint Fitting Approach to Point
Source Detection and Deblending
Submitted: 2019-07-10, last modified: 2020-03-20
Probabilistic cataloging (PCAT) outperforms traditional cataloging methods on
single-band optical data in crowded fields (Portillo et al. 2017). We extend
our work to multiple bands, achieving greater sensitivity ($\sim$ 0.4 mag) and
greater speed (500x) compared to previous single-band results. We demonstrate
the effectiveness of multiband PCAT on mock data, both in terms of recovering
accurate posteriors in the catalog space, and in directly deblending sources.
When applied to Sloan Digital Sky Survey (SDSS) observations of M2, taking
Hubble Space Telescope data as truth, our joint fit on $r$ and $i$ band data
goes $\sim0.4$ mag deeper than single-band probabilistic cataloging and has a
false discovery rate less than 20\% for F606W$\leq 20$. Compared to DAOPHOT,
the two-band SDSS catalog fit goes nearly 1.5 magnitudes deeper using the same
data, and maintains a lower false discovery rate down to F606W$\sim 20.5$.
Given recent improvements in computational speed, multiband PCAT shows promise
in application to large-scale surveys and is a plausible framework for joint
analysis of multi-instrument observational data.
[3]
oai:arXiv.org:1902.02374 [pdf] - 2068883
Photometric Biases in Modern Surveys
Submitted: 2019-02-06, last modified: 2020-02-14
Many surveys use maximum-likelihood (ML) methods to fit models when
extracting photometry from images. We show these ML estimators systematically
overestimate the flux as a function of the signal-to-noise ratio and the number
of model parameters involved in the fit. This bias is substantially worse for
resolved sources: while a 1% bias is expected for a 10$\sigma$ point source, a
10$\sigma$ resolved galaxy with a simplified Gaussian profile suffers a 2.5%
bias. This bias also behaves differently depending on how multiple bands are used
in the fit: simultaneously fitting all bands leads the flux bias to become
roughly evenly distributed between them, while fixing the position in
"non-detection" bands (i.e. forced photometry) gives flux estimates in those
bands that are biased low, compounding a bias in derived colors. We show that
these effects are present in idealized simulations, outputs from the Hyper
Suprime-Cam fake object pipeline (SynPipe), and observations from Sloan Digital
Sky Survey Stripe 82. Prescriptions to correct for the ML bias in flux, and its
uncertainty, are provided.
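The origin of this kind of bias can be sketched with a toy simulation, which is not the prescription from the paper. It assumes a one-dimensional Gaussian profile, unit-variance Gaussian noise, and a position that is fit only over integer pixel shifts. The function name `simulate_flux_bias` and all of its parameters are hypothetical. For each noisy image, the best-fit flux at a fixed (true) position is compared with the flux at the maximum-likelihood position.

```python
import numpy as np

def simulate_flux_bias(snr=10.0, n_trials=4000, n_pix=64, psf_sigma=2.0, seed=0):
    """Return mean(flux_fixed)/true and mean(flux_free)/true for a toy 1D fit."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_pix)
    psf = np.exp(-0.5 * ((x - n_pix // 2) / psf_sigma) ** 2)
    norm = psf @ psf
    # With unit-variance noise the fixed-position flux error is 1/sqrt(norm),
    # so this true flux gives the requested signal-to-noise ratio.
    true_flux = snr / np.sqrt(norm)

    # ASSUMPTION: candidate positions are integer shifts of the template.
    shifts = np.arange(-5, 6)
    templates = np.stack([np.roll(psf, s) for s in shifts])

    data = true_flux * psf + rng.standard_normal((n_trials, n_pix))
    # Linear least-squares flux for every trial at every candidate position.
    flux_at_shift = data @ templates.T / norm

    # Position held at the truth: an unbiased linear estimator.
    fixed = flux_at_shift[:, shifts == 0][:, 0]
    # Position fit as well: chi^2 drops by flux^2 * norm, so the ML position
    # is the shift that maximizes flux^2.
    best = np.argmax(flux_at_shift ** 2, axis=1)
    free = flux_at_shift[np.arange(n_trials), best]
    return fixed.mean() / true_flux, free.mean() / true_flux

fixed_ratio, free_ratio = simulate_flux_bias()
print(f"fixed position: {fixed_ratio:.4f}   fitted position: {free_ratio:.4f}")
```

In this toy setup the fitted-position flux can never be smaller than the fixed-position flux whenever the selected flux is positive, because the position is chosen to maximize it. That selection effect is one simple way to see why adding parameters to the fit pushes the flux estimate upward.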
[4]
oai:arXiv.org:1903.06796 [pdf] - 1850859
Astro2020 Science White Paper: The Next Decade of Astroinformatics and
Astrostatistics
Siemiginowska, A.;
Eadie, G.;
Czekala, I.;
Feigelson, E.;
Ford, E. B.;
Kashyap, V.;
Kuhn, M.;
Loredo, T.;
Ntampaka, M.;
Stevens, A.;
Avelino, A.;
Borne, K.;
Budavari, T.;
Burkhart, B.;
Cisewski-Kehe, J.;
Civano, F.;
Chilingarian, I.;
van Dyk, D. A.;
Fabbiano, G.;
Finkbeiner, D. P.;
Foreman-Mackey, D.;
Freeman, P.;
Fruscione, A.;
Goodman, A. A.;
Graham, M.;
Guenther, H. M.;
Hakkila, J.;
Hernquist, L.;
Huppenkothen, D.;
James, D. J.;
Law, C.;
Lazio, J.;
Lee, T.;
López-Morales, M.;
Mahabal, A. A.;
Mandel, K.;
Meng, X. L.;
Moustakas, J.;
Muna, D.;
Peek, J. E. G.;
Richards, G.;
Portillo, S. K. N.;
Scargle, J.;
de Souza, R. S.;
Speagle, J. S.;
Stassun, K. G.;
Stenning, D. C.;
Taylor, S. R.;
Tremblay, G. R.;
Trimble, V.;
Yanamandra-Fisher, P. A.;
Young, C. A.
Submitted: 2019-03-15
Over the past century, major advances in astronomy and astrophysics have been
largely driven by improvements in instrumentation and data collection. With the
amassing of high quality data from new telescopes, and especially with the
advent of deep and large astronomical surveys, it is becoming clear that future
advances will also rely heavily on how those data are analyzed and interpreted.
New methodologies derived from advances in statistics, computer science, and
machine learning are beginning to be employed in sophisticated investigations
that are not only bringing forth new discoveries, but are placing them on a
solid footing. Progress in wide-field sky surveys, interferometric imaging,
precision cosmology, exoplanet detection and characterization, and many
subfields of stellar, Galactic and extragalactic astronomy, has resulted in
complex data analysis challenges that must be solved to perform scientific
inference. Research in astrostatistics and astroinformatics will be necessary
to develop the state-of-the-art methodology needed in astronomy. Overcoming
these challenges requires dedicated, interdisciplinary research. We recommend:
(1) increasing funding for interdisciplinary projects in astrostatistics and
astroinformatics; (2) dedicating space and time at conferences for
interdisciplinary research and promotion; (3) developing sustainable funding
for long-term astrostatistics appointments; and (4) funding infrastructure
development for data archives and archive support, state-of-the-art algorithms,
and efficient computing.
[5]
oai:arXiv.org:1803.08931 [pdf] - 1805993
Mapping Distances Across the Perseus Molecular Cloud Using CO
Observations, Stellar Photometry, and Gaia DR2 Parallax Measurements
Submitted: 2018-03-23, last modified: 2018-10-17
We present a new technique to determine distances to major star-forming
regions across the Perseus Molecular Cloud, using a combination of stellar
photometry, astrometric data, and $\rm ^{12} CO$ spectral-line maps.
Incorporating the Gaia DR2 parallax measurements when available, we start by
inferring the distance and reddening to stars from their Pan-STARRS1 and 2MASS
photometry, based on a technique presented in Green et al. 2014; Green et al.
2015 and implemented in their 3D "Bayestar" dust map of three-quarters of the
sky. We then refine the Green et al. technique by using the velocity slices of
a CO spectral cube as dust templates and modeling the cumulative distribution
of dust along the line of sight towards these stars as a linear combination of
the emission in the slices. Using a nested sampling algorithm, we fit these
per-star distance-reddening measurements to find the distances to the CO
velocity slices towards each star-forming region. This results in distance
estimates explicitly tied to the velocity structure of the molecular gas. We
determine distances to the B5, IC348, B1, NGC1333, L1448, and L1451
star-forming regions and find that individual clouds are located between
$\approx 275-300$ pc, with typical combined uncertainties of $\approx 5\%$. We
find that the velocity gradient across Perseus corresponds to a distance
gradient of about 25 pc, with the eastern portion of the cloud farther away
than the western portion. We determine an average distance to the complex of
$294\pm 17$ pc, about 60 pc higher than the distance derived to the western
portion of the cloud using parallax measurements of water masers associated
with young stellar objects. The method we present is not limited to the Perseus
Complex, but may be applied anywhere on the sky with adequate CO data in the
pursuit of more accurate 3D maps of molecular clouds in the solar neighborhood
and beyond.
[6]
oai:arXiv.org:1711.09907 [pdf] - 1759253
Developing the 3-Point Correlation Function For the Turbulent
Interstellar Medium
Submitted: 2017-11-27, last modified: 2018-10-01
We present the first application of the angle-dependent 3-Point Correlation
Function (3PCF) to the density fields of magnetohydrodynamic (MHD) turbulence
simulations intended to model interstellar medium (ISM) turbulence. Previous work has
demonstrated that the angle-averaged bispectrum, the 3PCF's Fourier-space
analog, is sensitive to the sonic and Alfv\'enic Mach numbers of turbulence.
Here we show that introducing angular information via multipole moments with
respect to the triangle opening angle offers considerable additional
discriminatory power on these parameters. We exploit a fast, order $N_{\rm g}
\log N_{\rm g}$ ($N_{\rm g}$ the number of grid cells used for a Fourier
Transform) 3PCF algorithm to study a suite of MHD turbulence simulations with
10 different combinations of sonic and Alfv\'enic Mach numbers over a range
from sub- to supersonic and sub- to super-Alfv\'enic. The 3PCF algorithm's
speed for the first time enables full quantification of the time-variation of
our signal: we study 9 timeslices for each condition, demonstrating that the
3PCF is sufficiently time-stable to be used as an ISM diagnostic. In future,
applying this framework to 3-D dust maps will enable better treatment of dust
as a cosmological foreground as well as reveal conditions in the ISM that shape
star formation.
[7]
oai:arXiv.org:1710.01785 [pdf] - 1682476
Too hot to handle? Analytic solutions for massive neutrino or warm dark
matter cosmologies
Submitted: 2017-10-04
We obtain novel closed form solutions to the Friedmann equation for
cosmological models containing a component whose equation of state is that of
radiation $(w=1/3)$ at early times and that of cold pressureless matter $(w=0)$
at late times. The equation of state smoothly transitions from the early to
late-time behavior and exactly describes the evolution of a species with a
Dirac Delta function distribution in momentum magnitudes $|\vec{p}_0|$ (i.e.
all particles have the same $|\vec{p}_0|$). Such a component, here termed "hot
matter", is an approximate model for both neutrinos and warm dark matter. We
consider it alone and in combination with cold matter and with radiation, also
obtaining closed-form solutions for the growth of super-horizon perturbations
in each case. The idealized model recovers $t(a)$ to better than $1.5\%$
accuracy for all $a$ relative to a Fermi-Dirac distribution (which describes
neutrinos). We conclude by adding the second moment of the distribution to our
exact solution and then generalizing to include all moments of an arbitrary
momentum distribution in a closed form solution.
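The limiting behavior of the equation of state described above can be checked with a short worked derivation. This is a sketch under stated assumptions, not the paper's own derivation: it uses natural units ($c=1$), assumes the momentum magnitude redshifts as $|\vec{p}_0|/a$, and introduces a particle mass $m$, energy $E$, number density $n$, pressure $P$, and energy density $\rho$, none of which are named in the abstract.

```latex
\begin{align}
E(a) &= \sqrt{\frac{|\vec{p}_0|^2}{a^2} + m^2}, \qquad
\rho = n\,E, \qquad
P = \frac{n\,|\vec{p}_0|^2}{3\,a^2\,E}, \\
w(a) &= \frac{P}{\rho}
      = \frac{|\vec{p}_0|^2/a^2}{3\left(|\vec{p}_0|^2/a^2 + m^2\right)}
      = \frac{1}{3}\,\frac{1}{1 + \left(a\,m/|\vec{p}_0|\right)^2}.
\end{align}
```

For $a \ll |\vec{p}_0|/m$ this gives $w \to 1/3$ (radiation), and for $a \gg |\vec{p}_0|/m$ it gives $w \to 0$ (cold matter), with a smooth transition in between.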
[8]
oai:arXiv.org:1703.01303 [pdf] - 1581748
Improved Point Source Detection in Crowded Fields using Probabilistic
Cataloging
Submitted: 2017-03-03, last modified: 2017-08-07
Cataloging is challenging in crowded fields because sources are extremely
covariant with their neighbors and blending makes even the number of sources
ambiguous. We present the first optical probabilistic catalog, cataloging a
crowded (~0.1 sources per pixel brighter than 22nd magnitude in F606W) Sloan
Digital Sky Survey r band image from M2. Probabilistic cataloging returns an
ensemble of catalogs inferred from the image and thus can capture source-source
covariance and deblending ambiguities. By comparing to a traditional catalog of
the same image and a Hubble Space Telescope catalog of the same region, we show
that our catalog ensemble better recovers sources from the image. It goes more
than a magnitude deeper than the traditional catalog while having a lower false
discovery rate brighter than 20th magnitude. We also present an algorithm for
reducing this catalog ensemble to a condensed catalog that is similar to a
traditional catalog, except it explicitly marginalizes over source-source
covariances and nuisance parameters. We show that this condensed catalog has a
similar completeness and false discovery rate to the catalog ensemble. Future
telescopes will be more sensitive, and thus more of their images will be
crowded. Probabilistic cataloging performs better than existing software in
crowded fields and so should be considered when creating photometric pipelines
in the Large Synoptic Survey Telescope era.
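For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how completeness and false discovery rate might be computed by positional matching against a reference catalog. It is a simplification and not the matching procedure used in the paper: it matches on position only, within a hypothetical `radius`, and does not enforce one-to-one matches or compare magnitudes. The function name `match_rates` is also hypothetical.

```python
import numpy as np

def match_rates(catalog_xy, truth_xy, radius):
    """Return (completeness, false_discovery_rate) from a positional match.

    ASSUMPTION: a catalog source counts as real if any truth source lies
    within `radius`, and a truth source counts as recovered if any catalog
    source lies within `radius`.
    """
    catalog_xy = np.asarray(catalog_xy, dtype=float).reshape(-1, 2)
    truth_xy = np.asarray(truth_xy, dtype=float).reshape(-1, 2)
    if len(catalog_xy) == 0 or len(truth_xy) == 0:
        raise ValueError("both catalogs must contain at least one source")

    # Pairwise distances, shape (n_catalog, n_truth).
    dist = np.linalg.norm(catalog_xy[:, None, :] - truth_xy[None, :, :], axis=2)
    within = dist <= radius

    completeness = within.any(axis=0).mean()          # truth sources recovered
    false_discovery_rate = 1.0 - within.any(axis=1).mean()  # unmatched catalog sources
    return float(completeness), float(false_discovery_rate)

# Toy example: three true sources, four cataloged sources, two of them spurious.
truth = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
catalog = [(0.3, 0.0), (10.0, 0.4), (50.0, 50.0), (60.0, 60.0)]
completeness, fdr = match_rates(catalog, truth, radius=0.75)
print(f"completeness = {completeness:.3f}, false discovery rate = {fdr:.3f}")
```

In this toy example two of the three true sources are recovered and two of the four cataloged sources have no counterpart, so the completeness is 2/3 and the false discovery rate is 1/2.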
[9]
oai:arXiv.org:1607.04637 [pdf] - 1641209
Inference of Unresolved Point Sources At High Galactic Latitudes Using
Probabilistic Catalogs
Submitted: 2016-07-15, last modified: 2017-03-09
Detection of point sources in images is a fundamental operation in
astrophysics, and is crucial for constraining population models of the
underlying point sources or characterizing the background emission. Standard
techniques fall short in the crowded-field limit, losing sensitivity to faint
sources and failing to track their covariance with close neighbors. We
construct a Bayesian framework to perform inference of faint or overlapping
point sources. The method involves probabilistic cataloging, where samples are
taken from the posterior probability distribution of catalogs consistent with
an observed photon count map. In order to validate our method we sample random
catalogs of the gamma-ray sky in the direction of the North Galactic Pole (NGP)
by binning the data in energy and Point Spread Function (PSF) classes. Using
three energy bins spanning $0.3 - 1$, $1 - 3$ and $3 - 10$ GeV, we identify
$270\substack{+30 \\ -10}$ point sources inside a $40^\circ \times 40^\circ$
region around the NGP above our point-source inclusion limit of $3 \times
10^{-11}$/cm$^2$/s/sr/GeV at the $1-3$ GeV energy bin. Modeling the flux
distribution as a power law, we infer the slope to be $-1.92\substack{+0.07 \\
-0.05}$ and estimate the contribution of point sources to the total emission as
$18\substack{+2 \\ -2}$\%. These uncertainties in the flux distribution are
fully marginalized over the number as well as the spatial and spectral
properties of the unresolved point sources. This marginalization allows a
robust test of whether the apparently isotropic emission in an image is due to
unresolved point sources or of truly diffuse origin.
[10]
oai:arXiv.org:1402.6703 [pdf] - 1362535
The Characterization of the Gamma-Ray Signal from the Central Milky Way:
A Compelling Case for Annihilating Dark Matter
Submitted: 2014-02-26, last modified: 2015-03-17
Past studies have identified a spatially extended excess of $\sim$1-3 GeV
gamma rays from the region surrounding the Galactic Center, consistent with the
emission expected from annihilating dark matter. We revisit and scrutinize this
signal with the intention of further constraining its characteristics and
origin. By applying cuts to the \textit{Fermi} event parameter CTBCORE, we
suppress the tails of the point spread function and generate high resolution
gamma-ray maps, enabling us to more easily separate the various gamma-ray
components. Within these maps, we find the GeV excess to be robust and highly
statistically significant, with a spectrum, angular distribution, and overall
normalization that is in good agreement with that predicted by simple
annihilating dark matter models. For example, the signal is very well fit by a
36-51 GeV dark matter particle annihilating to $b\bar{b}$ with an annihilation
cross section of $\sigma v = (1-3)\times 10^{-26}$ cm$^3$/s (normalized to a
local dark matter density of 0.4 GeV/cm$^3$). Furthermore, we confirm that the
angular distribution of the excess is approximately spherically symmetric and
centered around the dynamical center of the Milky Way (within
$\sim$$0.05^{\circ}$ of Sgr A$^*$), showing no sign of elongation along the
Galactic Plane. The signal is observed to extend to at least $\simeq10^{\circ}$
from the Galactic Center, disfavoring the possibility that this emission
originates from millisecond pulsars.
[11]
oai:arXiv.org:1406.0507 [pdf] - 893668
Sharper Fermi LAT Images: instrument response functions for an improved
event selection
Submitted: 2014-06-02, last modified: 2014-09-28
The Large Area Telescope on the Fermi Gamma-ray Space Telescope has a point
spread function with large tails, consisting of events affected by tracker
inefficiencies, inactive volumes, and hard scattering; these tails can make
source confusion a limiting factor. The parameter CTBCORE, included in the
publicly available Extended Fermi LAT data, estimates the quality of each
event's direction reconstruction; by implementing a cut in this parameter, the
tails of the point spread function can be suppressed at the cost of losing
effective area. We implement cuts on CTBCORE and present updated instrument
response functions derived from the Fermi LAT data itself, along with all-sky
maps generated with these cuts. Having shown the effectiveness of these cuts,
especially at low energies, we encourage their use in analyses where angular
resolution is more important than Poisson noise.