Normalized to: Miquel, C.
[1]
oai:arXiv.org:2002.02460 [pdf] - 2044677
Intelligent Arxiv: Sort daily papers by learning users topics preference
Submitted: 2020-02-06
Current daily paper releases are becoming increasingly large and areas of
research are growing in diversity. This makes it harder for scientists to keep
up to date with the current state of the art and identify relevant work within
their lines of interest. The goal of this article is to address this problem
using Machine Learning techniques. We model a scientific paper as a
combination of scientific knowledge from diverse topics brought together in a
new problem. In light of this, we apply the unsupervised Machine Learning
technique of Latent Dirichlet Allocation (LDA) to the corpus of papers in a
given field to: i) define and extract the underlying topics in the corpus;
ii) get the topic weight vector for each paper in the corpus; and iii) get the
topic weight vector for new papers. By registering the papers preferred by a
user, we build a user weight vector from the topic weight vectors of the
selected papers. Hence, by taking the inner product between the user vector
and the topic weight vector of each paper in the daily Arxiv release, we can
sort the papers according to the user's preference for the underlying topics.
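The sorting step described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the topic weight
vectors are taken as already produced by a fitted LDA model, the user vector
is assumed to be the plain average of the selected papers' vectors (the
abstract does not specify the combination rule), and the function names,
paper identifiers and numbers are hypothetical toy values.

```python
from typing import Dict, List, Sequence


def build_user_vector(liked: Sequence[Sequence[float]]) -> List[float]:
    """Combine the topic weight vectors of the papers a user selected.

    Assumption: a simple average; the abstract does not state the exact rule.
    """
    if not liked:
        raise ValueError("need at least one selected paper")
    n_topics = len(liked[0])
    return [sum(vec[k] for vec in liked) / len(liked) for k in range(n_topics)]


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product between two topic weight vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def sort_daily_release(
    user_vector: Sequence[float], daily: Dict[str, Sequence[float]]
) -> List[str]:
    """Return paper ids ordered from highest to lowest inner product."""
    return sorted(
        daily,
        key=lambda paper_id: inner_product(user_vector, daily[paper_id]),
        reverse=True,
    )


# Toy topic weight vectors over 3 topics (made-up numbers, not real LDA output).
liked_papers = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]
daily_release = {
    "paper_a": [0.1, 0.1, 0.8],
    "paper_b": [0.7, 0.2, 0.1],
    "paper_c": [0.2, 0.7, 0.1],
}

user_vector = build_user_vector(liked_papers)
ranking = sort_daily_release(user_vector, daily_release)
print(ranking)
```

In this toy case the user vector is dominated by the first topic, so the
paper with the largest weight on that topic is ranked first.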
We have created the website IArxiv.org where users can read sorted daily
Arxiv releases (and more) while the algorithm learns each user's preference,
yielding a more accurate sorting every day. The current IArxiv.org version
runs on Arxiv categories astro-ph, gr-qc, hep-ph and hep-th, and we plan to
extend it to others. We propose several useful and relevant new
implementations to be developed in addition, as well as new Machine Learning
techniques beyond LDA, to further improve the accuracy of this new tool.