Full-text search for arXiv

Hosenie, Zafiirah

Normalized to: Hosenie, Z.

5 articles in total; 14 co-authors, each sharing between 1 and 4 articles. Median position in the author list is 4.0.

[1] oai:arXiv.org:2002.12386 - 2065446

Imbalance Learning for Variable Star Classification

Hosenie, Zafiirah; Lyon, Robert; Stappers, Benjamin; Mootoovaloo, Arrykrishna; McBride, Vanessa

Comments: 11 pages, 8 figures, Accepted for publication in MNRAS

Submitted: 2020-02-27

The accurate automated classification of variable stars into their respective sub-types is difficult. Machine-learning-based solutions often fall foul of the imbalanced learning problem, which causes poor generalisation performance in practice, especially on rare variable star sub-types. In previous work, we attempted to overcome such deficiencies by developing a hierarchical machine learning classifier. This 'algorithm-level' approach to tackling imbalance yielded promising results on Catalina Real-Time Transient Survey (CRTS) data, outperforming the binary and multi-class classification schemes previously applied in this area. In this work, we attempt to further improve hierarchical classification performance by applying 'data-level' approaches that directly augment the training data so that they better describe under-represented classes. We apply and report results for three data augmentation methods in particular: $\textit{R}$andomly $\textit{A}$ugmented $\textit{S}$ampled $\textit{L}$ight curves from magnitude $\textit{E}$rror ($\texttt{RASLE}$), augmenting light curves with Gaussian Process modelling ($\texttt{GpFit}$), and the Synthetic Minority Over-sampling Technique ($\texttt{SMOTE}$). By combining the 'algorithm-level' approach (i.e. the hierarchical scheme) with the 'data-level' approach, we further improve variable star classification accuracy by 1-4 per cent. We find that a higher classification rate is obtained when $\texttt{GpFit}$ is used in the hierarchical model. Further improvement of the metric scores requires a better standard set of correctly identified variable stars and, perhaps, enhanced features.
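To make the $\texttt{SMOTE}$ idea concrete: each synthetic sample is placed at a random point on the line segment between a minority-class example and one of its $k$ nearest minority-class neighbours in feature space. The sketch below is a minimal pure-Python illustration of that interpolation step (using the common variant that picks seed points at random), not the authors' augmentation pipeline; the function name `smote_oversample`, the default $k = 5$ and the toy data are assumptions for illustration only.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=None):
    """Return n_new synthetic rows interpolated between minority-class neighbours.

    Hypothetical helper: minority is a list of feature vectors (lists of floats)
    from a single rare class and must contain at least two rows.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other minority points
    # Precompute each point's k nearest minority-class neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in others[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # random seed point from the rare class
        j = rng.choice(neighbours[i])  # one of its k nearest neighbours
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic
```

For example, `smote_oversample(rare, 40, k=5, seed=0)` on ten three-feature rows returns 40 new three-feature rows. Because each new row is a convex combination of two existing rows, it always lies inside the region already spanned by the rare class; SMOTE fills in that region rather than extrapolating beyond it.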

[2] oai:arXiv.org:1907.08189 - 1925160

Comparing Multi-class, Binary and Hierarchical Machine Learning Classification schemes for variable stars

Hosenie, Zafiirah; Lyon, Robert; Stappers, Benjamin; Mootoovaloo, Arrykrishna

Comments: 16 pages, 11 figures, accepted for publication in MNRAS

Submitted: 2019-07-18

Upcoming synoptic surveys are set to generate an unprecedented amount of data. This requires an automated framework that can quickly and efficiently provide classification labels for several new object classification challenges. Using data describing 11 types of variable stars from the Catalina Real-Time Transient Survey (CRTS), we illustrate how to capture the most important information from computed features and describe in detail how to use Information Theory robustly for feature selection and evaluation. We apply three Machine Learning (ML) algorithms and demonstrate how to optimise these classifiers via cross-validation techniques. For the CRTS dataset, we find that the Random Forest (RF) classifier performs best in terms of balanced accuracy and geometric mean. We demonstrate substantially improved classification results by converting the multi-class problem into a binary classification task, achieving a balanced-accuracy rate of $\sim$99 per cent for the classification of ${\delta}$-Scuti and Anomalous Cepheids (ACEP). Additionally, we describe how classification performance can be improved by converting a 'flat' multi-class problem into a hierarchical taxonomy. We develop a new hierarchical structure and propose a new set of classification features, enabling the accurate identification of subtypes of Cepheids, RR Lyrae and eclipsing binary stars in CRTS data.
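Balanced accuracy and geometric mean are preferred over plain accuracy for imbalanced data because every class contributes equally, however rare it is. The sketch below assumes the common definitions (balanced accuracy as the arithmetic mean of per-class recall, geometric mean as the geometric mean of per-class recall); the abstract does not spell these out, and the function names and toy labels are hypothetical.

```python
import math


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly (every class must occur in y_true)."""
    recall = {}
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recall[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return recall


def accuracy(y_true, y_pred):
    """Plain accuracy: dominated by the most common class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recall."""
    r = per_class_recall(y_true, y_pred)
    return sum(r.values()) / len(r)


def geometric_mean(y_true, y_pred):
    """Geometric mean of per-class recall; drops to 0 if any class is never recovered."""
    r = per_class_recall(y_true, y_pred)
    return math.prod(r.values()) ** (1 / len(r))
```

With a toy set of eight RRab and two ACEP labels, where one ACEP is misclassified as RRab, plain accuracy is 0.9, but balanced accuracy is 0.75 and the geometric mean is about 0.707, exposing the poor recall on the rare class.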

[3] oai:arXiv.org:1807.02701 - 1858909

DeepSource: Point Source Detection using Deep Learning

Sadr, A. Vafaei; Vos, Etienne E.; Bassett, Bruce A.; Hosenie, Zafiirah; Oozeer, N.; Lochner, Michelle

Comments: 15 pages, 13 figures, submitted to MNRAS

Submitted: 2018-07-07

Point source detection at low signal-to-noise is challenging for astronomical surveys, particularly in radio interferometry images where the noise is correlated. Machine learning is a promising solution, allowing the development of algorithms tailored to specific telescope arrays and science cases. We present DeepSource, a deep learning solution that uses convolutional neural networks to achieve these goals. DeepSource enhances the Signal-to-Noise Ratio (SNR) of the original map and then uses dynamic blob detection to detect sources. Trained and tested on two sets of 500 simulated 1 deg × 1 deg MeerKAT images with a total of 300,000 sources, DeepSource is essentially perfect in both purity and completeness down to SNR = 4 and outperforms PyBDSF in all metrics. For uniformly-weighted images it achieves a Purity × Completeness (PC) score at SNR = 3 of 0.73, compared with 0.31 for the best PyBDSF model. For natural weighting we find a smaller improvement of ~40% in the PC score at SNR = 3. If instead we ask at what SNR either the purity or the completeness first drops to 90%, we find that DeepSource reaches this value at SNR = 3.6, compared with 4.3 for PyBDSF (natural weighting). A key advantage of DeepSource is that it can learn to optimally trade off purity and completeness for any science case under consideration. Our results show that deep learning is a promising approach to point source detection in astronomical images.
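The PC score quoted above combines two rates into one number. Assuming the usual definitions (purity: fraction of detections that correspond to real sources; completeness: fraction of real sources that are detected), it can be computed as in this small sketch; the function name and the counts in the example are hypothetical, not values from the paper.

```python
def purity_completeness(n_detected, n_detected_true, n_true_sources):
    """Return (purity, completeness, purity * completeness) from matched source counts."""
    purity = n_detected_true / n_detected  # how many detections are real
    completeness = n_detected_true / n_true_sources  # how many real sources were found
    return purity, completeness, purity * completeness
```

For instance, 100 detections of which 90 match one of 120 injected sources give a purity of 0.9, a completeness of 0.75 and a PC score of about 0.675. Taking the product means a detector cannot score well by maximising one rate at the expense of the other.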

[4] oai:arXiv.org:1704.03467 - 1582495

No evidence for extensions to the standard cosmological model

Heavens, Alan; Fantaye, Yabebal; Sellentin, Elena; Eggers, Hans; Hosenie, Zafiirah; Kroon, Steve; Mootoovaloo, Arrykrishna

Comments: 5 pages. Accepted for publication in PRL. The effect of including recent H0 measurements has been added

Submitted: 2017-04-11, last modified: 2017-08-09

We compute the Bayesian Evidence for models considered in the main analysis of Planck cosmic microwave background data. By utilising carefully-defined nearest-neighbour distances in parameter space, we reuse the Monte Carlo Markov Chains already produced for parameter inference to compute Bayes factors $B$ for many different model-dataset combinations. The standard 6-parameter flat $\Lambda$CDM model is favoured over all other models considered, with curvature being mildly favoured only when CMB lensing is not included. Many alternative models are strongly disfavoured by the data, including primordial correlated isocurvature models ($\ln B=-7.8$), non-zero tensor-to-scalar ratio ($\ln B=-4.3$), running of the spectral index ($\ln B = -4.7$), curvature ($\ln B=-3.6$), non-standard numbers of neutrinos ($\ln B=-3.1$), non-standard neutrino masses ($\ln B=-3.2$), non-standard lensing potential ($\ln B=-4.6$), evolving dark energy ($\ln B=-3.2$), sterile neutrinos ($\ln B=-6.9$), and extra sterile neutrinos with a non-zero tensor-to-scalar ratio ($\ln B=-10.8$). Other models are less strongly disfavoured with respect to flat $\Lambda$CDM. As with all analyses based on Bayesian Evidence, the final numbers depend on the widths of the parameter priors. We adopt the priors used in the Planck analysis and also perform a prior sensitivity analysis. Our quantitative conclusion is that extensions beyond the standard cosmological model are disfavoured by Planck data. Only when newer Hubble constant measurements are included does $\Lambda$CDM become disfavoured, and only mildly, compared with a dynamical dark energy model ($\ln B\sim +2$).
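To read the quoted values of $\ln B$ as odds, recall that $B$ is a ratio of evidences, so $e^{-\ln B}$ is the factor by which the data disfavour a model. The sketch below assumes the sign convention implied above ($B$ = evidence of the alternative model divided by that of flat $\Lambda$CDM) and, for the posterior probability, a simple two-model comparison with equal prior odds unless stated otherwise; both helpers are hypothetical.

```python
import math


def odds_against(ln_b):
    """Factor by which the data disfavour a model with log Bayes factor ln_b (< 0) vs the reference."""
    return math.exp(-ln_b)


def posterior_prob(ln_b, prior_odds=1.0):
    """Posterior probability of the alternative model in a two-model comparison.

    Posterior odds = prior odds * Bayes factor; probability = odds / (1 + odds).
    """
    odds = prior_odds * math.exp(ln_b)
    return odds / (1.0 + odds)
```

Under these assumptions, $\ln B = -7.8$ corresponds to odds of roughly 2,400 to 1 against the correlated isocurvature models, $\ln B = -3.6$ leaves curvature with a posterior probability of about 0.027, and $\ln B \sim +2$ corresponds to odds of about 7 to 1 in favour of dynamical dark energy when the newer Hubble constant measurements are included.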

[5] oai:arXiv.org:1704.03472 - 1562206

Marginal Likelihoods from Monte Carlo Markov Chains

Heavens, Alan; Fantaye, Yabebal; Mootoovaloo, Arrykrishna; Eggers, Hans; Hosenie, Zafiirah; Kroon, Steve; Sellentin, Elena

Comments:

Submitted: 2017-04-11

In this paper, we present a method for computing the marginal likelihood, also known as the model likelihood or Bayesian evidence, from Markov Chain Monte Carlo (MCMC) or other sampled posterior distributions. To do this, one needs to estimate the density of points in parameter space, which can be challenging in high-dimensional spaces. Here we present a Bayesian analysis in which we obtain the posterior for the marginal likelihood using $k$th nearest-neighbour distances in parameter space, measured with the Mahalanobis distance metric, under the assumption that the points in the chain (thinned if required) are independent. We generalise the algorithm to apply to importance-sampled chains, where each point is assigned a weight. We illustrate this with an idealised posterior of known form with an analytic marginal likelihood, and show that for chains of length $\sim 10^5$ points, the technique is effective for parameter spaces with up to $\sim 20$ dimensions. We also argue that $k=1$ is the optimal choice, and discuss failure modes for the algorithm. In a companion paper (Heavens et al. 2017) we apply the technique to the main MCMC chains from the 2015 Planck analysis of cosmic microwave background radiation data, and infer that, quantitatively, the simplest 6-parameter flat $\Lambda$CDM standard model of cosmology is preferred over all extensions considered.
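The core idea can be illustrated with a simplified point estimate for $k=1$. Writing $f(\theta) = L(\theta)\pi(\theta) = Z\,p(\theta)$, the volume $V_i$ of the Mahalanobis ball around sample $\theta_i$ that reaches its nearest neighbour among the other $N-1$ samples is, under a Poisson approximation, exponentially distributed with mean $1/[(N-1)p(\theta_i)]$, so the terms $(N-1) f(\theta_i) V_i$ average to $Z$. The pure-Python sketch below is non-authoritative: it assumes unweighted, independent samples with no repeated points, estimates the covariance from the samples themselves, and returns a single number rather than the posterior for $Z$ derived in the paper; the function names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with a = L L^T for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def nn_log_evidence(samples, log_post):
    """Simplified k = 1 nearest-neighbour estimate of ln Z.

    samples: N independent posterior draws (lists of d floats); thin MCMC chains and
    drop repeated points first, since a zero distance breaks the log-volume.
    log_post: unnormalised ln[L(theta) * prior(theta)] at each sample.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(p[j] for p in samples) / n for j in range(d)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in samples) / (n - 1)
            for b in range(d)] for a in range(d)]
    L = cholesky(cov)
    # Whiten by solving L y = (x - mean): Euclidean distance in y is Mahalanobis distance in x.
    white = []
    for p in samples:
        y = []
        for i in range(d):
            y.append((p[i] - mean[i] - sum(L[i][m] * y[m] for m in range(i))) / L[i][i])
        white.append(y)
    log_sqrt_det = sum(math.log(L[i][i]) for i in range(d))  # ln sqrt(det cov)
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_terms = []
    for i, yi in enumerate(white):
        r2 = min(sum((a - b) ** 2 for a, b in zip(yi, yj))
                 for j, yj in enumerate(white) if j != i)
        # Volume of the Mahalanobis ball: unit-ball volume * r^d * sqrt(det cov).
        log_vol = log_unit_ball + (d / 2) * math.log(r2) + log_sqrt_det
        log_terms.append(math.log(n - 1) + log_post[i] + log_vol)
    # ln of the mean of the terms, computed stably with log-sum-exp.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms) / n)
```

As a check in the spirit of the idealised test above, draws from a 2-D unit Gaussian with $f(\theta) = 3\,e^{-|\theta|^2/2}$ have the analytic evidence $Z = 3 \times 2\pi$, and a few thousand samples should recover $\ln Z$ to within a few per cent. The brute-force neighbour search is $O(N^2)$, so a tree-based search would be needed at the $\sim 10^5$-point chain lengths mentioned in the abstract.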