Normalized to: Brodley, C.
[1]
oai:arXiv.org:0905.3428 [pdf] - 24487
Finding Anomalous Periodic Time Series: An Application to Catalogs of
Periodic Variable Stars
Submitted: 2009-05-20
Catalogs of periodic variable stars contain large numbers of periodic
light-curves (photometric time series data from the astrophysics domain).
Separating anomalous objects from well-known classes is an important step
towards the discovery of new classes of astronomical objects. Most anomaly
detection methods for time series data assume either a single continuous time
series or a set of time series whose periods are aligned. Light-curve data
precludes the use of these methods as the periods of any given pair of
light-curves may be out of sync. One may use an existing anomaly detection
method if, prior to similarity calculation, one performs the costly act of
aligning two light-curves, an operation that scales poorly to massive data
sets. This paper presents PCAD, an unsupervised anomaly detection method for
large sets of unsynchronized periodic time-series data that outputs a ranked
list of both global and local anomalies. It calculates its anomaly score for
each light-curve in relation to a set of centroids produced by a modified
k-means clustering algorithm. Our method is able to scale to large data sets
through the use of sampling. We validate our method on both light-curve data
and other time series data sets. We demonstrate its effectiveness at finding
known anomalies, and discuss the effect of sample size and number of centroids
on our results. We compare our method to naive solutions and existing time
series anomaly detection methods for unphased data, and show that PCAD's
reported anomalies are comparable to or better than all other methods. Finally,
astrophysicists on our team have verified that PCAD finds true anomalies that
might be indicative of novel astrophysical phenomena.
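
The abstract does not spell out PCAD's distance measure or its modified
k-means step, so the following is only a minimal sketch of the general idea:
compare unsynchronized periodic curves with a distance that is minimized over
all cyclic shifts, then score each curve by its distance to the nearest
centroid. It assumes every light-curve has already been folded on its period
and resampled to the same number of points. The names
shift_invariant_distance and anomaly_scores are hypothetical, and the
centroids are taken as given rather than learned.

```python
import numpy as np


def shift_invariant_distance(a, b):
    """Smallest Euclidean distance between a and any cyclic shift of b.

    Assumes a and b are equal-length samples of one full period.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("curves must be 1-D arrays of equal length")
    # Circular cross-correlation via FFT:
    # corr[s] = sum_i a[i] * b[(i - s) % n] = dot(a, np.roll(b, s))
    corr = np.fft.irfft(np.fft.rfft(a) * np.conj(np.fft.rfft(b)), n=len(a))
    # ||a - roll(b, s)||^2 = ||a||^2 + ||b||^2 - 2 * corr[s]
    squared = a @ a + b @ b - 2.0 * corr.max()
    # Clamp tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(squared, 0.0)))


def anomaly_scores(curves, centroids):
    """Score each curve by its distance to the nearest centroid.

    Assumption: centroids are supplied by the caller; the paper's modified
    k-means procedure is not reproduced here.
    """
    return np.array([
        min(shift_invariant_distance(curve, c) for c in centroids)
        for curve in curves
    ])
```

Sorting the scores in descending order (for example with
np.argsort(-scores)) would give a ranked list of candidates. The FFT step
evaluates all n shifts in O(n log n) time instead of the O(n^2) cost of
trying each shift explicitly.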
[2]
oai:arXiv.org:0901.3329 [pdf] - 20559
Event Discovery in Time Series
Submitted: 2009-01-21
The discovery of events in time series can have important implications, such
as identifying microlensing events in astronomical surveys, or changes in a
patient's electrocardiogram. Current methods for identifying events require a
sliding window of a fixed size, which is not ideal for all applications and
could overlook important events. In this work, we develop probability models
for calculating the significance of an arbitrary-sized sliding window and use
these probabilities to find areas of significance. Because a brute-force search
of all sliding windows and all window sizes would be computationally
intractable, we introduce a method for quickly approximating the results. We
apply our method to over 100,000 astronomical time series from the MACHO
survey, in which 56 different sections of the sky are considered, each with one
or more known events. Our method was able to recover 100% of these events in
the top 1% of the results, essentially pruning 99% of the data. Interestingly,
our method was able to identify events that do not pass traditional event
discovery procedures.
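
To make the cost of the brute-force search concrete, here is a minimal sketch
that scores every window of every size in a single series. It is not the
paper's probability model or its fast approximation, neither of which is
given in the abstract. It assumes independent Gaussian noise around a constant
baseline, estimates that baseline robustly with the median and the median
absolute deviation, and uses a z-score of the window sum as the significance
measure. The function name most_significant_window is hypothetical.

```python
import numpy as np


def most_significant_window(x, min_size=1):
    """Exhaustively scan all windows; return (score, start, end).

    The window covers x[start:end]. Assumption: noise is i.i.d. Gaussian
    around a constant baseline, so the sum of w baseline-subtracted points
    has standard deviation sigma * sqrt(w).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Robust estimates, so that the event itself barely affects them.
    mu = np.median(x)
    sigma = 1.4826 * np.median(np.abs(x - mu))
    if sigma == 0:
        raise ValueError("series has zero robust scatter")
    # Prefix sums let each window sum be read off in O(1).
    csum = np.concatenate(([0.0], np.cumsum(x - mu)))
    best = (0.0, 0, 0)
    for w in range(min_size, n + 1):
        sums = csum[w:] - csum[:-w]          # every window of length w
        z = np.abs(sums) / (sigma * np.sqrt(w))
        i = int(np.argmax(z))
        if z[i] > best[0]:
            best = (float(z[i]), i, i + w)
    return best
```

Even with prefix sums this visits O(n^2) windows per series, which is the
scaling that motivates an approximation when there are many long series. The
returned score is a maximum over many overlapping windows, so it should not
be read as a p-value without accounting for that.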