Full-text search for arXiv

Reis, Itamar

Normalized to: Reis, I.

4 articles in total. 8 co-authors, with 1 to 3 articles in common. Median position in authors list is 1.0.

[1] oai:arXiv.org:1911.06823 [pdf] - 1999710

Effectively using unsupervised machine learning in next generation astronomical surveys

Reis, Itamar; Rotman, Michael; Poznanski, Dovi; Prochaska, J. Xavier; Wolf, Lior

Comments: Comments are welcome! The portal is available at https://galaxyportal.space/

Submitted: 2019-11-15

In recent years, many works have shown that unsupervised Machine Learning (ML) can help detect unusual objects and uncover trends in large astronomical datasets, but a few challenges remain. We show here, for example, that different methods, or even small variations of the same method, can produce significantly different outcomes. While intuitively somewhat surprising, this can naturally occur when applying unsupervised ML to high-dimensional data, where there can be many reasonable yet different answers to the same question. In such a case, the outcome of any single unsupervised ML method should be considered a sample from a conceivably wide range of possibilities. We therefore suggest an approach that eschews finding an optimal outcome, instead facilitating the production and examination of many valid ones. This can be achieved by incorporating unsupervised ML into data visualisation portals. We present here such a portal that we are developing, applied to the sample of SDSS spectra of galaxies. The main feature of the portal is interactive 2D maps of the data. Different maps are constructed by applying dimensionality reduction to different subspaces of the data, so that each map contains different information that in turn gives a different perspective on the data. The interactive maps are intuitive to use, and we demonstrate how peculiar objects and trends can be detected with a few button clicks. We believe that including tools in this spirit in next generation astronomical surveys will be important for making unexpected discoveries, either by professional astronomers or by citizen scientists, and will generally enable the benefits of visual inspection even when dealing with very complex and extensive datasets. Our portal is available online at galaxyportal.space.

[2] oai:arXiv.org:1811.05994 [pdf] - 1806220

Probabilistic Random Forest: A machine learning algorithm for noisy datasets

Reis, Itamar; Baron, Dalya; Shahaf, Sahar

Comments: Accepted by AJ, comments are welcome! Code is available at https://github.com/ireis/PRF

Submitted: 2018-11-14

Machine learning (ML) algorithms are becoming increasingly important in the analysis of astronomical data. However, since most ML algorithms are not designed to take data uncertainties into account, ML-based studies are mostly restricted to data with high signal-to-noise ratio. Astronomical datasets of such high quality are uncommon. In this work we modify the long-established Random Forest (RF) algorithm to take into account uncertainties in the measurements (i.e., features) as well as in the assigned classes (i.e., labels). To do so, the Probabilistic Random Forest (PRF) algorithm treats the features and labels as probability distribution functions, rather than deterministic quantities. We perform a variety of experiments where we inject different types of noise into a dataset, and compare the accuracy of the PRF to that of RF. The PRF outperforms RF in all cases, with a moderate increase in running time. We find an improvement in classification accuracy of up to 10% in the case of noisy features, and up to 30% in the case of noisy labels. The PRF accuracy decreased by less than 5% for a dataset with as many as 45% misclassified objects, compared to a clean dataset. Apart from improving the prediction accuracy in noisy datasets, the PRF naturally copes with missing values in the data, and outperforms RF when applied to a dataset with different noise characteristics in the training and test sets, suggesting that it can be used for Transfer Learning.
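
The idea of treating a feature as a probability distribution rather than a fixed number can be illustrated on a single split node. The sketch below is not the PRF package's API; the function names, the Gaussian noise model, and the two-leaf "stump" are assumptions made for illustration only. Instead of sending an object left or right, the node sends it both ways, weighted by the probability that the true feature value lies below the threshold. Label uncertainties are not covered here.

```python
import math


def prob_left(value, sigma, threshold):
    """Probability that the true feature is <= threshold.

    Assumes (for illustration) that the measurement is Gaussian
    with mean `value` and standard deviation `sigma`.
    """
    if sigma == 0:
        # No uncertainty: fall back to the ordinary hard split.
        return 1.0 if value <= threshold else 0.0
    z = (threshold - value) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def soft_stump_predict(value, sigma, threshold, left_probs, right_probs):
    """Class probabilities from a one-split tree with a noisy feature.

    `left_probs` and `right_probs` are the class distributions stored
    in the two leaves (hypothetical values supplied by the caller).
    The result is their mixture, weighted by the split probabilities.
    """
    p = prob_left(value, sigma, threshold)
    return [p * l + (1.0 - p) * r for l, r in zip(left_probs, right_probs)]
```

For a measurement sitting exactly on the threshold, `soft_stump_predict(1.0, 0.5, 1.0, [1.0, 0.0], [0.0, 1.0])` splits the object evenly between the two leaves, whereas a hard split would commit it fully to one side.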

[3] oai:arXiv.org:1711.00022 [pdf] - 1689724

Detecting outliers and learning complex structures with large spectroscopic surveys - a case study with APOGEE stars

Reis, Itamar; Poznanski, Dovi; Baron, Dalya; Zasowski, Gail; Shahaf, Sahar

Comments: Published version. Data products from this work are available online, see github.com/ireis/APOGEE_tSNE_nb

Submitted: 2017-10-31, last modified: 2018-05-28

In this work we apply and expand on a recently introduced outlier detection algorithm that is based on an unsupervised random forest. We use the algorithm to calculate a similarity measure for stellar spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE). We show that the similarity measure traces non-trivial physical properties and contains information about complex structures in the data. We use it for visualization and clustering of the dataset, and discuss its ability to find groups of highly similar objects, including spectroscopic twins. Searching the dataset with the similarity matrix allows us to find objects that are impossible to find using their best-fitting model parameters. This includes extreme objects for which the models fail, and rare objects that are outside the scope of the model. We use the similarity measure to detect outliers in the dataset, and find a number of previously unknown Be-type stars, spectroscopic binaries, carbon-rich stars, young stars, and a few that we cannot interpret. Our work further demonstrates the potential for scientific discovery when combining machine learning methods with modern survey data.
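
One common way to turn a fitted forest into a similarity measure is to count how often two objects end up in the same leaf. The sketch below assumes that definition and assumes the leaf index of every object in every tree is already available (for instance from a forest library's leaf-lookup call); the abstract itself does not spell out these details, and all function names are hypothetical. The outlier score used here, the mean dissimilarity to all other objects, is likewise an illustrative choice.

```python
def leaf_similarity(leaves_a, leaves_b):
    """Fraction of trees in which two objects share a leaf.

    Each argument is a sequence of leaf indices, one per tree.
    """
    if len(leaves_a) != len(leaves_b) or not leaves_a:
        raise ValueError("need two non-empty leaf lists of equal length")
    matches = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return matches / len(leaves_a)


def similarity_matrix(leaf_table):
    """Pairwise similarities; leaf_table[i] holds object i's leaf per tree."""
    n = len(leaf_table)
    return [
        [leaf_similarity(leaf_table[i], leaf_table[j]) for j in range(n)]
        for i in range(n)
    ]


def outlier_scores(sim):
    """Mean dissimilarity (1 - similarity) of each object to all others."""
    n = len(sim)
    if n < 2:
        raise ValueError("need at least two objects")
    return [
        sum(1.0 - sim[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
```

Objects with the highest scores share leaves with few others and are the natural candidates for inspection as outliers; rows of the same matrix can also be sorted to retrieve the nearest neighbours of a given spectrum.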

[4] oai:arXiv.org:1805.09829 [pdf] - 1732695

Redshifted broad absorption line quasars found via machine-learned spectral similarity

Reis, Itamar; Poznanski, Dovi; Hall, Patrick B.

Comments: Submitted to MNRAS. Comments are welcome!

Submitted: 2018-05-24

We report the discovery of 31 new redshifted broad absorption line quasars (RSBALs) from the Sloan Digital Sky Survey (SDSS). Previously, only 19 such objects were known. The identification of the new objects was enabled by calculating similarities between quasar spectra in the SDSS. Using these similarities, we look for objects that are similar to the ones in the original sample, visually inspecting only hundreds out of over 160,000 spectra considered. We compare the performance of several similarity measures, as well as different methods of employing them, in finding the RSBALs. We find that decision-tree-based similarities recover the most objects, and that an ensemble of methods performs better than any single one. As the similarities are not tailored for the specific problem of finding RSBALs, they could be used for searching for other types of quasars. The similarities and the code for their calculation are available online.
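
The search strategy described above, ranking candidates by similarity to known examples and combining several similarity measures, can be sketched as follows. The abstract does not say how the ensemble is formed; averaging ranks across measures is an assumption made here for illustration, and the function names are hypothetical.

```python
def ranks(scores):
    """Rank of each candidate within one measure (0 = most similar)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def ensemble_top_k(score_lists, k):
    """Indices of the k candidates with the best mean rank.

    `score_lists` holds one list per similarity measure; entry i of
    each list is candidate i's similarity to the known sample.
    Mean-rank aggregation is an illustrative assumption.
    """
    if not score_lists:
        raise ValueError("need at least one similarity measure")
    n = len(score_lists[0])
    if any(len(s) != n for s in score_lists):
        raise ValueError("all measures must score the same candidates")
    rank_lists = [ranks(s) for s in score_lists]
    mean_rank = [sum(r[i] for r in rank_lists) / len(rank_lists) for i in range(n)]
    return sorted(range(n), key=lambda i: mean_rank[i])[:k]
```

Only the returned shortlist would then need visual inspection, which is how a search over a very large set of spectra can be reduced to a few hundred candidates.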