The search of scientific textual information in the Virtual Observatory.
Sergey Karpov (Special Astrophysical Observatory of Russian Academy of Sciences)
Oleg Bartunov (Sternberg Astronomical Institute of Moscow State University)
We discuss the problem of incorporating full-text search into the framework of the Virtual Observatory (VO). The VO provides programmatic access to astronomical data, usually available in tabular form from a distributed network of astronomical data centers. However, many astronomical resources are available only in textual form, such as scientific papers, preprints, and web pages. These often contain information not yet available in catalogues, so it is important to provide a specialized search geared to the data on astronomical objects in such sources.
While this does not require any significant change to the Registry concept (any article may be considered a separate resource, accessed through the standard OAI-2.0 interface, superseded by the Registry), it raises specific problems related to search queries for such resources. Searching astronomical texts is complicated by the large diversity of object name nomenclature and terminology, which is poorly suited to the usual indexing schemes. For example, ‘Messier 82’, ‘M82’ and ‘M 82’ all refer to the same object, which in total has 60 unique names in different catalogues.
We describe a two-stage approach that may be applied to this task. It consists of normalizing the archived texts (converting different variants of the same name to a standard form, e.g. M 82 → M82) and expanding an object name query to include all of the object's aliases (M82 → M82 + UGC5322 + all other 58 possible names).
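The two stages can be illustrated with a minimal Python sketch. It is not the implementation described in this abstract: the prefix rules, the small alias table (a stand-in for a real name resolver, listing only three of the names of M82) and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical normalization rules: spelling variants of a catalogue
# prefix mapped to one canonical prefix.
PREFIXES = {"messier": "M", "m": "M", "ugc": "UGC", "ngc": "NGC"}

# Matches a catalogue prefix, optional whitespace, and a number.
# Longer alternatives come first so "Messier" is not matched as "M".
_NAME_RE = re.compile(r"\b(messier|ugc|ngc|m)\s*(\d+)\b", re.IGNORECASE)

# Hypothetical alias table standing in for a name-resolving service.
# Only a few aliases are listed; a real resolver would return them all.
ALIASES = {"M82": ["M82", "UGC5322", "NGC3034"]}


def normalize(text):
    """Stage 1: rewrite object name variants to the standard form."""
    return _NAME_RE.sub(
        lambda m: PREFIXES[m.group(1).lower()] + m.group(2), text
    )


def expand(name):
    """Stage 2: expand a query name into the list of its known aliases."""
    canonical = normalize(name)
    # Fall back to the canonical name alone when no aliases are known.
    return ALIASES.get(canonical, [canonical])


if __name__ == "__main__":
    print(normalize("Messier 82, M 82 and M82"))
    print(expand("M 82"))
```

Because the same normalization is applied to the archived texts and to the query, a search for any one alias can match a text that uses any other, provided the alias table is complete. A simple pattern like this one will also rewrite unrelated strings such as a length written as "m 10", which is one reason a curated dictionary is preferable in practice.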
We implemented this approach as a Registry for arXiv.org astro-ph abstracts. For the former task we used a specially designed normalization dictionary for the open-source PostgreSQL RDBMS with the Tsearch2 full-text search plugin, and for the latter the SIMBAD name-resolving web services. We also implemented a conventional web form query interface that supports both full-text search and queries by object name.