Piecewise Aggregate Approximation of time series

PAA approximates a time-series X of length n into vector \( \bar{X}=(\bar{x}_{1},…,\bar{x}_{M}) \) of any arbitrary length \( M \leq n \) where each of \( \bar{x_{i}} \) is calculated as follows:

Which simply means that in order to reduce the dimensionality from n to M, we first divide the original time-series into M equally sized frames and secondly compute the mean values for each frame. The sequence assembled from the mean values is the PAA approximation (i.e., transform) of the original time-series. As it was shown by Keogh et al, the complexity of the PAA transform can be reduced from O(NM) to O(Mm) where m is the number of frames. By using the following distance measure

Yi & Faloutsos, and Keogh et al, have shown that PAA satisfies to the lower bounding condition and guarantees no false dismissals, i.e.:

Example

In this primer I use the next time series:

series1 <- c(2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34)

and the following R code:

paa <- function(ts, paa_size){
  len = length(ts)
  if (len == paa_size) {
    ts
  }
  else {
    if (len %% paa_size == 0) {
      colMeans(matrix(ts, nrow=len %/% paa_size, byrow=F))
    }
    else {
      res = rep.int(0, paa_size)
      for (i in c(0:(len * paa_size - 1))) {
        idx = i %/% len + 1# the spot
        pos = i %/% paa_size + 1 # the col spot
        res[idx] = res[idx] + ts[pos]
      }
      for (i in c(1:paa_size)) {
        res[i] = res[i] / len
      }
      res
    }
  }
}

whose application produces a seven-point piecewise aggregate approximation:

s1_paa = paa(series1,7)
(2.23, 5.62, 8.67, 6.36, 4.58, 3.33, 1.45)

or a 9-point approximation which is a bit trickier:

s1_paa = paa(series1,9)
(2.14, 3.63, 8.26, 8.28, 6.27, 4.65, 4.45, 2.39, 1.38)