SAX · tf·idf · cosine similarity · Java

Classify time series — and see why.

SAX-VSM turns each class of time series into a weighted bag of SAX words. Classification is a cosine-similarity lookup, and because every word maps back to a subsequence, the model tells you which shapes drove its decision.

Read the algorithm → The paper ↗ Source ↗

How it works

Two classic ideas, composed

SAX-VSM joins Symbolic Aggregate approXimation, a symbolic representation of time series, with the vector space model from information retrieval. Each class collapses into one tf·idf-weighted term vector; an unlabeled series is scored against every class vector by cosine similarity and takes the label of the most similar one.

step 01

Discretize

A sliding window combined with SAX converts each class's training series into a single bag of SAX words, one bag per class.
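A minimal, standalone sketch of this step, not the library's API. The class name SaxSketch, the fixed three-letter alphabet (breakpoints ±0.43), the use of the population standard deviation, and the requirement that the window length be divisible by the word length are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the sax-vsm library.
final class SaxSketch {
    // Breakpoints splitting a standard normal into 3 equiprobable regions (a, b, c).
    private static final double[] CUTS = {-0.43, 0.43};

    // Z-normalize: zero mean, unit (population) standard deviation.
    static double[] zNormalize(double[] x) {
        double mean = 0;
        for (double v : x) mean += v;
        mean /= x.length;
        double var = 0;
        for (double v : x) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / x.length);
        double[] out = new double[x.length];
        if (sd < 1e-8) return out; // flat window: leave as all zeros
        for (int i = 0; i < x.length; i++) out[i] = (x[i] - mean) / sd;
        return out;
    }

    // PAA: mean of each of `segments` equal-length chunks.
    static double[] paa(double[] x, int segments) {
        if (x.length % segments != 0) {
            throw new IllegalArgumentException("length must be divisible by segments");
        }
        int len = x.length / segments;
        double[] out = new double[segments];
        for (int s = 0; s < segments; s++) {
            double sum = 0;
            for (int i = s * len; i < (s + 1) * len; i++) sum += x[i];
            out[s] = sum / len;
        }
        return out;
    }

    // Map one window to a SAX word: z-normalize, reduce with PAA, then look up letters.
    static String toWord(double[] window, int segments) {
        StringBuilder sb = new StringBuilder();
        for (double v : paa(zNormalize(window), segments)) {
            int idx = 0;
            while (idx < CUTS.length && v > CUTS[idx]) idx++;
            sb.append((char) ('a' + idx));
        }
        return sb.toString();
    }

    // Slide a window over the series; skip a word equal to its predecessor
    // (numerosity reduction).
    static List<String> discretize(double[] series, int window, int segments) {
        List<String> words = new ArrayList<>();
        String prev = null;
        for (int i = 0; i + window <= series.length; i++) {
            String w = toWord(Arrays.copyOfRange(series, i, i + window), segments);
            if (!w.equals(prev)) words.add(w);
            prev = w;
        }
        return words;
    }

    public static void main(String[] args) {
        double[] ramp = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(discretize(ramp, 6, 3)); // prints [abc]
    }
}
```

Every window of a linear ramp has the same shape after z-normalization, so all three windows map to the same word and numerosity reduction keeps only one copy.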

step 02

Weight

tf·idf scales each word by how characteristic it is of its class, yielding one weight vector per class.
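The weighting can be sketched as follows, assuming a common log-scaled tf·idf variant: tf = ln(1 + count) and idf = log10(number of classes / number of classes containing the word). The class name TfIdfSketch and the map-based layout are hypothetical, and the library's exact formula may differ in detail.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the sax-vsm library.
final class TfIdfSketch {
    // Input: class label -> bag of SAX words. Output: class label -> word -> weight.
    static Map<String, Map<String, Double>> weigh(Map<String, List<String>> bags) {
        int classes = bags.size();
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        Map<String, Integer> docFreq = new HashMap<>(); // classes containing each word

        for (Map.Entry<String, List<String>> bag : bags.entrySet()) {
            Map<String, Integer> c = new HashMap<>();
            for (String word : bag.getValue()) c.merge(word, 1, Integer::sum);
            counts.put(bag.getKey(), c);
            for (String word : c.keySet()) docFreq.merge(word, 1, Integer::sum);
        }

        Map<String, Map<String, Double>> weights = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> cls : counts.entrySet()) {
            Map<String, Double> vector = new HashMap<>();
            for (Map.Entry<String, Integer> wc : cls.getValue().entrySet()) {
                double tf = Math.log(1.0 + wc.getValue());
                double idf = Math.log10((double) classes / docFreq.get(wc.getKey()));
                vector.put(wc.getKey(), tf * idf);
            }
            weights.put(cls.getKey(), vector);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, List<String>> bags = new HashMap<>();
        bags.put("A", Arrays.asList("abc", "abc", "bca"));
        bags.put("B", Arrays.asList("abc", "cab"));
        System.out.println(weigh(bags));
    }
}
```

Under this formula a word that occurs in every class gets idf = log10(1) = 0, so it drops out of every class vector; only words that discriminate between classes carry weight.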

step 03

Classify

An unlabeled series is discretized the same way and assigned the label of the most cosine-similar class vector.
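A sketch of the lookup, assuming the unlabeled series has already been turned into a word-frequency map and each class into a weight map. CosineSketch and its method names are hypothetical, not the library's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the sax-vsm library.
final class CosineSketch {
    // Cosine similarity between two sparse vectors keyed by SAX word.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (double v : b.values()) normB += v * v;
        if (normA == 0 || normB == 0) return 0; // an empty vector matches nothing
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the label whose class vector is most similar to the query.
    static String classify(Map<String, Double> query,
                           Map<String, Map<String, Double>> classVectors) {
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> cls : classVectors.entrySet()) {
            double sim = cosine(query, cls.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = cls.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> up = new HashMap<>();
        up.put("abc", 1.0);
        Map<String, Double> down = new HashMap<>();
        down.put("cba", 1.0);
        Map<String, Map<String, Double>> classVectors = new HashMap<>();
        classVectors.put("up", up);
        classVectors.put("down", down);

        Map<String, Double> query = new HashMap<>();
        query.put("abc", 2.0);
        query.put("bbb", 1.0);
        System.out.println(classify(query, classVectors)); // prints up
    }
}
```

Because the dot product is a sum over shared words, the per-word terms show which SAX words contributed most to the winning score.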

The primer

A step-by-step walk through the math

The algorithm pages build SAX-VSM from the ground up — z-normalization, PAA, the SAX symbol table, sliding-window discretization and numerosity reduction, then tf·idf and cosine similarity — each with worked R examples and figures.

Start the primer → Pattern discovery

Figure: SAX-VSM in a nutshell. Training time series become tf·idf weight vectors used for cosine-similarity classification.

Use it

Get the library

maven central

Add the dependency

SAX-VSM ships as net.seninp:sax-vsm:2.0.0 on Maven Central.

<dependency>
  <groupId>net.seninp</groupId>
  <artifactId>sax-vsm</artifactId>
  <version>2.0.0</version>
</dependency>

reference

Cite the work

Senin, P., Malinchik, S. SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model. ICDM 2013.

PDF ↗ GitHub ↗