Normalized to: Kaban, A.
[1]
oai:arXiv.org:0709.0928 [pdf] - 4664
Robust mixtures in the presence of measurement errors
Submitted: 2007-09-06
We develop a mixture-based approach to robust density modeling and outlier
detection for experimental multivariate data that includes measurement error
information. Our model is designed to infer atypical measurements that are not
due to errors, aiming to retrieve potentially interesting peculiar objects.
Since exact inference is not possible in this model, we develop a
tree-structured variational EM solution. This compares favorably against a
fully factorial approximation scheme, approaching the accuracy of a
Markov-Chain-EM, while maintaining computational simplicity. We demonstrate the
benefits of including measurement errors in the model, in terms of improved
outlier detection rates in varying measurement uncertainty conditions. We then
use this approach in detecting peculiar quasars from an astrophysical survey,
given photometric measurements with errors.
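
As a rough illustration of how per-measurement errors can enter a mixture density, the sketch below computes component responsibilities for a one-dimensional Gaussian mixture in which each observation's known error variance is added to every component variance. This is a simplifying assumption made for illustration: it uses plain Gaussian components and exact posteriors, and does not reproduce the robust components or the tree-structured variational EM described in the abstract. The function and parameter names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def responsibilities(x, error_sd, weights, means, variances):
    """Posterior component probabilities for one noisy observation.

    Assumes x = w + e, where w is drawn from the mixture and e is zero-mean
    Gaussian noise with known standard deviation error_sd, so each component
    is broadened by the error variance (an illustrative assumption).
    """
    joint = [
        w * gaussian_pdf(x, m, v + error_sd ** 2)
        for w, m, v in zip(weights, means, variances)
    ]
    density = sum(joint)  # marginal density of the observation
    return [j / density for j in joint], density


# A larger measurement error makes the component assignment less certain.
precise, _ = responsibilities(1.0, 0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
noisy, _ = responsibilities(1.0, 3.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

In this toy setting, an observation with a large reported error yields a low marginal density without being confidently assigned to any component, which is the kind of case an error-aware model needs to separate from genuinely atypical objects.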
[2]
oai:arXiv.org:astro-ph/0608623 [pdf] - 84500
Young stellar populations in early-type galaxies in the Sloan Digital
Sky Survey
Submitted: 2006-08-29, last modified: 2006-11-13
We use a purely data-driven rectified factor analysis to identify early-type
galaxies with recent star formation in DR4 of the SDSS Spectroscopic Catalogue.
We compare the spectra and environment of these galaxies with `normal'
early-types, and a sample of independently selected E+A galaxies. We calculate
the projected local galaxy surface density (Sigma_5 and Sigma_10) for each
galaxy in our sample, and find that the dependence, on projected local density,
of the properties of E+As is not significantly different from that of
early-types with young stellar populations, dropping off rapidly towards denser
environments, and flattening off at densities < 0.1-0.3 Mpc^-2. The dearth of
E+A galaxies in dense environments confirms that E+As are most likely the
products of galaxy-galaxy merging or interactions, rather than star-forming
galaxies whose star formation has been quenched by processes unique to dense
environments. We see a tentative peak in the number of E+A galaxies at Sigma_10
~ 0.1-0.3 Mpc^-2, which may represent the local galaxy density at which the
rate of galaxy-galaxy merging or interaction peaks. Analysis of the
spectra of our early-types with young stellar populations suggests that they
have a stellar component dominated by F stars, ~ 1-4 Gyr old, together with a
mature, metal-rich population characteristic of `typical' early-types. The
young stars represent > 10% of the stellar mass in these galaxies. This,
together with the similarity of the environments in which this `E+F' population
and the E+A galaxy sample are found, suggests that E+F galaxies used to be E+A
galaxies, but have evolved by a further ~ one to a few Gyr. Our factor analysis
is sensitive enough to identify this hidden population. (Abridged)
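
For readers unfamiliar with Sigma_5 and Sigma_10, the sketch below uses the commonly adopted definition Sigma_N = N / (pi d_N^2), where d_N is the projected distance to the N-th nearest neighbouring galaxy. The abstract does not state which variant the authors use (for example, any redshift window or edge correction), so this is an assumption, and the function name and flat (x, y) coordinates in Mpc are hypothetical.

```python
import math


def projected_surface_density(target, neighbours, n):
    """Return Sigma_n = n / (pi * d_n**2) for one galaxy.

    target     -- (x, y) projected position in Mpc (assumed flat geometry)
    neighbours -- iterable of (x, y) positions of other galaxies, in Mpc
    n          -- rank of the neighbour that sets the aperture (e.g. 5 or 10)
    """
    distances = sorted(
        math.hypot(x - target[0], y - target[1]) for x, y in neighbours
    )
    if len(distances) < n:
        raise ValueError("fewer than n neighbours available")
    d_n = distances[n - 1]  # projected distance to the n-th nearest neighbour
    if d_n == 0.0:
        raise ValueError("n-th neighbour coincides with the target")
    return n / (math.pi * d_n ** 2)


# Five neighbours at 1..5 Mpc along one axis: d_5 = 5 Mpc.
sigma_5 = projected_surface_density((0.0, 0.0), [(float(k), 0.0) for k in range(1, 6)], 5)
```

The result is in units of Mpc^-2, matching the density thresholds quoted in the abstract.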
[3]
oai:arXiv.org:astro-ph/0609094 [pdf] - 84686
On class visualisation for high dimensional data: Exploring scientific
datasets
Submitted: 2006-09-04
Parametric Embedding (PE) has recently been proposed as a general-purpose
algorithm for class visualisation. It takes class posteriors produced by a
mixture-based clustering algorithm and projects them in 2D for visualisation.
However, although this fully modularised combination of objectives (clustering
and projection) is attractive for its conceptual simplicity, in the case of
high dimensional data, we show that a better combination of these
objectives can be achieved by integrating them both into a consistent
probabilistic model. In this way, the projection step will fulfil a role of
regularisation, guarding against the curse of dimensionality. As a result, the
tradeoff between clustering and visualisation turns out to enhance the
predictive abilities of the overall model. We present results on both synthetic
data and two real-world high-dimensional data sets: observed spectra of
early-type galaxies and gene expression arrays.
[4]
oai:arXiv.org:astro-ph/0511503 [pdf] - 77861
A data-driven Bayesian approach for finding young stellar populations in
early-type galaxies from their UV-optical spectra
Submitted: 2005-11-16
We present the results of a novel application of Bayesian modelling
techniques, which, although purely data driven, have a physically interpretable
result, and will be useful as an efficient data mining tool. We base our
studies on the UV-to-optical spectra (observed and synthetic) of early-type
galaxies. A probabilistic latent variable architecture is formulated, and a
rigorous Bayesian methodology is employed for solving the inverse modelling
problem from the available data. A powerful aspect of our formalism is that it
allows us to recover a limited fraction of missing data due to incomplete
spectral coverage, as well as to handle observational errors in a principled
way. We apply this method to a sample of 21 well-studied early-type spectra,
with known star-formation histories. We find that our data-driven Bayesian
modelling allows us to identify those early-types which contain a significant
stellar population <~ 1 Gyr old. This method would therefore be a very useful
tool for automatically discovering various interesting sub-classes of galaxies.
(Abridged)
[5]
oai:arXiv.org:astro-ph/0505059 [pdf] - 72829
Finding Young Stellar Populations in Elliptical Galaxies from
Independent Components of Optical Spectra
Submitted: 2005-05-03
Elliptical galaxies are believed to consist of a single population of old
stars formed together at an early epoch in the Universe, yet recent analyses of
galaxy spectra seem to indicate the presence of significant younger populations
of stars in them. The detailed physical modelling of such populations is
computationally expensive, inhibiting the detailed analysis of the several
million galaxy spectra becoming available over the next few years. Here we
present a data mining application aimed at decomposing the spectra of
elliptical galaxies into several coeval stellar populations, without the use of
detailed physical models. This is achieved by performing a linear independent
basis transformation that essentially decouples the initial problem of joint
processing of a set of correlated spectral measurements into that of the
independent processing of a small set of prototypical spectra. Two methods are
investigated: (1) A fast projection approach is derived by exploiting the
correlation structure of neighboring wavelength bins within the spectral data.
(2) A factorisation method that takes advantage of the positivity of the
spectra is also investigated. The preliminary results show that typical
features observed in stellar population spectra of different evolutionary
histories can be convincingly disentangled by these methods, despite the
absence of input physics. The success of this basis transformation analysis in
recovering physically interpretable representations indicates that this
technique is a potentially powerful tool for astronomical data mining.
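
The abstract does not name the positivity-exploiting factorisation it uses. As one well-known instance of that idea, the sketch below implements non-negative matrix factorisation with multiplicative updates for the squared error, approximating a non-negative matrix of spectra V (one spectrum per row) as W H with both factors non-negative. This is a substituted technique shown under that assumption, not the authors' method; all names are hypothetical.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-12):
    """Factorise non-negative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive start so the multiplicative updates can move every entry.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(n)]
    return w, h


# Toy "spectra": four mixtures of two non-negative prototype spectra.
prototypes = [[1.0, 2.0, 0.0, 1.0, 3.0], [0.0, 1.0, 4.0, 2.0, 0.5]]
mixing = [[1.0, 0.0], [0.5, 0.5], [0.2, 1.0], [2.0, 0.3]]
spectra = matmul(mixing, prototypes)
weights, basis = nmf(spectra, rank=2)
```

Here the rows of `basis` play the role of the small set of prototypical spectra, and `weights` gives each galaxy's non-negative contribution from each of them. The factorisation is not unique, so the recovered rows need not match the generating prototypes exactly.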