Full-text search for arXiv

Kaban, Ata

Normalized to: Kaban, A.

5 article(s) in total. 4 co-authors, from 1 to 5 common article(s). Median position in authors list is 2.0.

[1] oai:arXiv.org:0709.0928 [pdf] - 4664

Robust mixtures in the presence of measurement errors

Sun, Jianyong; Kaban, Ata; Raychaudhury, Somak

Comments: (Refereed) Proceedings of the 24th Annual International Conference on Machine Learning 2007 (ICML07), (Ed.) Z. Ghahramani. June 20-24, 2007, Oregon State University, Corvallis, OR, USA, pp. 847-854; Omnipress. ISBN 978-1-59593-793-3; 8 pages, 6 figures

Submitted: 2007-09-06

We develop a mixture-based approach to robust density modeling and outlier detection for experimental multivariate data that includes measurement error information. Our model is designed to infer atypical measurements that are not due to errors, aiming to retrieve potentially interesting peculiar objects. Since exact inference is not possible in this model, we develop a tree-structured variational EM solution. This compares favorably against a fully factorial approximation scheme, approaching the accuracy of a Markov-Chain-EM, while maintaining computational simplicity. We demonstrate the benefits of including measurement errors in the model, in terms of improved outlier detection rates in varying measurement uncertainty conditions. We then use this approach in detecting peculiar quasars from an astrophysical survey, given photometric measurements with errors.

[2] oai:arXiv.org:astro-ph/0608623 [pdf] - 84500

Young stellar populations in early-type galaxies in the Sloan Digital Sky Survey

Nolan, Louisa A.; Raychaudhury, Somak; Kaban, Ata

Comments: 7 pages, 5 figures, submitted to MNRAS, minor revision

Submitted: 2006-08-29, last modified: 2006-11-13

We use a purely data-driven rectified factor analysis to identify early-type galaxies with recent star formation in DR4 of the SDSS Spectroscopic Catalogue. We compare the spectra and environment of these galaxies with 'normal' early-types, and a sample of independently selected E+A galaxies. We calculate the projected local galaxy surface density (Sigma_5 and Sigma_10) for each galaxy in our sample, and find that the dependence, on projected local density, of the properties of E+As is not significantly different from that of early-types with young stellar populations, dropping off rapidly towards denser environments, and flattening off at densities < 0.1-0.3 Mpc^-2. The dearth of E+A galaxies in dense environments confirms that E+As are most likely the products of galaxy-galaxy merging or interactions, rather than star-forming galaxies whose star formation has been quenched by processes unique to dense environments. We see a tentative peak in the number of E+A galaxies at Sigma_10 ~ 0.1-0.3 Mpc^-2, which may represent the local galaxy density at which the rate of galaxy-galaxy merging or interaction peaks. Analysis of the spectra of our early-types with young stellar populations suggests that they have a stellar component dominated by F stars, ~ 1-4 Gyr old, together with a mature, metal-rich population characteristic of 'typical' early-types. The young stars represent > 10% of the stellar mass in these galaxies. This, together with the similarity of the environments in which this 'E+F' population and the E+A galaxy sample are found, suggests that E+F galaxies used to be E+A galaxies, but have evolved by a further ~ one to a few Gyr. Our factor analysis is sensitive enough to identify this hidden population. (Abridged)
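
Illustrative note on Sigma_5 and Sigma_10: the abstract does not state how the projected local surface density is computed. A commonly used estimator is Sigma_N = N / (pi d_N^2), where d_N is the projected distance to the N-th nearest neighbour. The sketch below assumes that estimator and assumes positions are already given as projected coordinates in Mpc; the function name and inputs are hypothetical, and any velocity or magnitude cuts the authors may apply are not modelled.

```python
import numpy as np

def projected_surface_density(positions_mpc, n_neighbour=5):
    """Estimate Sigma_N = N / (pi * d_N^2) for every galaxy.

    positions_mpc: array of shape (n_galaxies, 2), projected coordinates in Mpc
                   (assumed input format, not taken from the paper).
    n_neighbour:   N, e.g. 5 for Sigma_5 or 10 for Sigma_10.
    Returns an array of surface densities in Mpc^-2.
    """
    pos = np.asarray(positions_mpc, dtype=float)
    if pos.shape[0] <= n_neighbour:
        raise ValueError("need more than n_neighbour galaxies")
    # Pairwise projected separations between all galaxies.
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the galaxy itself (distance 0),
    # so column N holds the distance to the N-th nearest neighbour.
    d_n = np.sort(dist, axis=1)[:, n_neighbour]
    return n_neighbour / (np.pi * d_n ** 2)

# Toy usage: four galaxies spaced 1 Mpc apart along a line.
toy_positions = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
sigma_1 = projected_surface_density(toy_positions, n_neighbour=1)
sigma_2 = projected_surface_density(toy_positions, n_neighbour=2)
```

The brute-force pairwise distance matrix is fine for small samples; for a survey-sized catalogue a spatial index would be the more practical choice.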

[3] oai:arXiv.org:astro-ph/0609094 [pdf] - 84686

On class visualisation for high dimensional data: Exploring scientific datasets

Kaban, Ata; Sun, Jianyong; Raychaudhury, Somak; Nolan, Louisa

Comments: to appear in Lecture Notes in Artificial Intelligence vol. 4265, the (refereed) proceedings of the Ninth International Conference on Discovery Science (DS-2006), October 2006, Barcelona, Spain. 12 pages, 8 figures

Submitted: 2006-09-04

Parametric Embedding (PE) has recently been proposed as a general-purpose algorithm for class visualisation. It takes class posteriors produced by a mixture-based clustering algorithm and projects them in 2D for visualisation. However, although this fully modularised combination of objectives (clustering and projection) is attractive for its conceptual simplicity, in the case of high dimensional data, we show that a more optimal combination of these objectives can be achieved by integrating them both into a consistent probabilistic model. In this way, the projection step will fulfil a role of regularisation, guarding against the curse of dimensionality. As a result, the tradeoff between clustering and visualisation turns out to enhance the predictive abilities of the overall model. We present results on both synthetic data and two real-world high-dimensional data sets: observed spectra of early-type galaxies and gene expression arrays.

[4] oai:arXiv.org:astro-ph/0511503 [pdf] - 77861

A data-driven Bayesian approach for finding young stellar populations in early-type galaxies from their UV-optical spectra

Nolan, L. A.; Harva, M. O.; Kaban, A.; Raychaudhury, S.

Comments: 19 pages, 15 figures, accepted for publication in MNRAS

Submitted: 2005-11-16

We present the results of a novel application of Bayesian modelling techniques, which, although purely data driven, have a physically interpretable result, and will be useful as an efficient data mining tool. We base our studies on the UV-to-optical spectra (observed and synthetic) of early-type galaxies. A probabilistic latent variable architecture is formulated, and a rigorous Bayesian methodology is employed for solving the inverse modelling problem from the available data. A powerful aspect of our formalism is that it allows us to recover a limited fraction of missing data due to incomplete spectral coverage, as well as to handle observational errors in a principled way. We apply this method to a sample of 21 well-studied early-type spectra, with known star-formation histories. We find that our data-driven Bayesian modelling allows us to identify those early-types which contain a significant stellar population <~ 1 Gyr old. This method would therefore be a very useful tool for automatically discovering various interesting sub-classes of galaxies. (abridged)

[5] oai:arXiv.org:astro-ph/0505059 [pdf] - 72829

Finding Young Stellar Populations in Elliptical Galaxies from Independent Components of Optical Spectra

Kaban, Ata; Nolan, Louisa A.; Raychaudhury, Somak

Comments: 12 pages, 7 figures; accepted in SIAM 2005 International Conference on Data Mining, Newport Beach, CA, April 2005

Submitted: 2005-05-03

Elliptical galaxies are believed to consist of a single population of old stars formed together at an early epoch in the Universe, yet recent analyses of galaxy spectra seem to indicate the presence of significant younger populations of stars in them. The detailed physical modelling of such populations is computationally expensive, inhibiting the detailed analysis of the several million galaxy spectra becoming available over the next few years. Here we present a data mining application aimed at decomposing the spectra of elliptical galaxies into several coeval stellar populations, without the use of detailed physical models. This is achieved by performing a linear independent basis transformation that essentially decouples the initial problem of joint processing of a set of correlated spectral measurements into that of the independent processing of a small set of prototypical spectra. Two methods are investigated: (1) A fast projection approach is derived by exploiting the correlation structure of neighboring wavelength bins within the spectral data. (2) A factorisation method that takes advantage of the positivity of the spectra is also investigated. The preliminary results show that typical features observed in stellar population spectra of different evolutionary histories can be convincingly disentangled by these methods, despite the absence of input physics. The success of this basis transformation analysis in recovering physically interpretable representations indicates that this technique is a potentially powerful tool for astronomical data mining.
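
Illustrative note on method (2): the abstract says only that the factorisation exploits the positivity of the spectra and does not name the algorithm. As a stand-in, the sketch below uses standard non-negative matrix factorisation with multiplicative updates (Lee and Seung, squared-error objective), which may differ from the authors' method. The data layout (rows are galaxies, columns are wavelength bins), the function name, and the synthetic data are all assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative matrix V (galaxies x wavelength bins) as W @ H.

    H (rank x bins) plays the role of a small set of prototypical spectra;
    W (galaxies x rank) holds the non-negative mixing weights.
    Uses multiplicative updates for the squared-error objective.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic usage: 6 "spectra" built from 2 non-negative components.
_rng = np.random.default_rng(1)
V_toy = _rng.random((6, 2)) @ _rng.random((2, 8))
W_short, H_short = nmf(V_toy, rank=2, n_iter=5)
W_long, H_long = nmf(V_toy, rank=2, n_iter=200)
err_short = np.linalg.norm(V_toy - W_short @ H_short)
err_long = np.linalg.norm(V_toy - W_long @ H_long)
```

Both runs start from the same random initialisation, so comparing err_short with err_long shows how the reconstruction error changes with more iterations.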