Data Science, Machine Learning
ROCKET: Fast and Accurate Time Series Classification
State-of-the-art algorithm for time series classification with Python
“The task of time series classification can be thought of as involving learning or detecting signals or patterns within time series associated with relevant classes.” (Dempster et al., 2020, authors of the ROCKET paper)
Most time series classification methods with state-of-the-art (SOTA) accuracy have high computational complexity and scale poorly. This means they are slow to train on smaller datasets and effectively unusable on large datasets.
ROCKET (RandOm Convolutional KErnel Transform) matches the accuracy of competing SOTA algorithms, including convolutional neural networks, in a fraction of the time. The algorithms were evaluated on the benchmark datasets in the UCR Archive.
ROCKET first transforms the time series dataset using random convolutional kernels, such as those used in a CNN, and then trains a linear classifier with these features.
How much faster is ROCKET? Training and testing ROCKET sequentially on 85 benchmark datasets took 1 hour 40 minutes. For the same task, the next fastest SOTA algorithm (cBOSS) took 19 hours 33 minutes. For more speed comparisons, see the paper.
In the remainder of this article, I will:
- Discuss alternative time series classifiers
- Explain how ROCKET works
- Provide a Python code example
What are the alternatives?
Other methods for time series classification usually rely on specific representations of series, such as shape, frequency, or variance. The convolutional kernels of ROCKET replace this engineered feature extraction with a single mechanism that can capture many of the same features.
Survey of time series classification
Time series transformation is a foundational idea of time series classification. Many time-series specific algorithms are compositions of transformed time series and conventional classification algorithms, such as those in scikit-learn.
For an introductory survey of time series classification algorithms, see my earlier article.
Competing SOTA methods
The following methods strive to improve upon the speed and accuracy of the algorithms described in the Survey above.
- Proximity Forest is an ensemble of decision trees that are split on an elastic distance measure.
- TS-CHIEF extends Proximity Forest by using dictionary-based and interval-based splitting criteria.
- InceptionTime is an ensemble of 5 deep CNNs based on the Inception architecture.
- Mr-SEQL applies a linear classifier to features extracted by symbolic representations of time series (SAX, SFA).
- cBOSS, or contractable BOSS, is a dictionary-based classifier based on the SFA transform.
- catch22 is a set of 22 pre-selected time series transformations that can be passed to a classifier.
How does ROCKET work?
ROCKET first transforms a time series using convolutional kernels and second passes the transformed data to a linear classifier.
Convolutional Kernels
The convolutional kernels, the same as those found in convolutional neural networks, are initialized with random length, weights, bias, dilation, and padding. See the paper for how the random parameters are sampled — they are part of ROCKET and the sampling does not need to be tuned. The stride is always one. ROCKET does not apply non-linear transforms, such as ReLU, on the resulting features.
ROCKET uses a very large number of kernels — the default is 10,000. It is possible to use so many because the cost of computing convolutions is very low. This is due to the fact that the kernel weights are not “learned” and that there is only a single layer of convolutions.
Unlike typical CNNs, ROCKET uses a variety of kernels. The random lengths, dilations, paddings, weights, and biases allow ROCKET to capture a wide range of information. In particular, the variety of kernel dilation allows ROCKET to capture patterns at different frequencies and scales.
These random kernels, in combination, are able to capture features relevant to time series classification. Alone, a single random convolutional kernel may only weakly capture a useful feature from a time series.
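As a rough illustration of what "random" means here, the following NumPy sketch samples kernel parameters along the lines described in the ROCKET paper: length drawn from {7, 9, 11}, mean-centered Gaussian weights, a uniform bias, an exponentially scaled dilation, and padding switched on or off with equal probability. The function name `sample_kernel` and the input length of 251 are illustrative assumptions, and this is not the sktime implementation.

```python
import numpy as np

def sample_kernel(input_length, rng):
    """Sample the parameters of one random kernel (illustrative sketch)."""
    length = int(rng.choice([7, 9, 11]))            # kernel length
    weights = rng.normal(0.0, 1.0, size=length)     # weights ~ N(0, 1)
    weights -= weights.mean()                       # mean-center the weights
    bias = rng.uniform(-1.0, 1.0)                   # bias ~ U(-1, 1)
    # Dilation is floor(2^x) with x ~ U(0, A); A is chosen so that the
    # dilated kernel is never longer than the input series.
    max_exponent = np.log2((input_length - 1) / (length - 1))
    dilation = int(2 ** rng.uniform(0.0, max_exponent))
    # Padding is applied (or not) with equal probability.
    use_padding = rng.integers(2) == 1
    padding = ((length - 1) * dilation) // 2 if use_padding else 0
    return weights, bias, dilation, padding

rng = np.random.default_rng(0)
# 100 kernels for a series of length 251 (an assumed example length)
kernels = [sample_kernel(input_length=251, rng=rng) for _ in range(100)]
```

Nothing in this step depends on the training labels, which is why sampling even 10,000 kernels is cheap.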
The Convolutional Kernel Transform
Each kernel is convolved with each time series to produce a feature map. The kernel’s feature map is aggregated to produce two features per kernel: the maximum value and proportion of positive values.
The maximum value feature is similar to global max pooling.
The proportion of positive values (PPV) measures how prevalent the pattern captured by the kernel is within the series. According to the authors, this feature is the most critical contributor to ROCKET's high accuracy.
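To make the two features concrete, here is a minimal NumPy sketch that applies a single dilated kernel to a single series and returns the maximum and the proportion of positive values. The helper name `apply_kernel` and the toy 3-point kernel are assumptions for illustration (ROCKET's own kernels are longer), and the real implementation is optimized differently.

```python
import numpy as np

def apply_kernel(series, weights, bias, dilation, padding):
    """Return (max, ppv) of the feature map for one series and one kernel."""
    if padding > 0:
        series = np.pad(series, padding)            # zero-pad both ends
    span = (len(weights) - 1) * dilation            # reach of the dilated kernel
    n_out = len(series) - span                      # stride is always one
    # idx[i, j] = i + j * dilation: positions touched by output element i
    idx = np.arange(n_out)[:, None] + dilation * np.arange(len(weights))[None, :]
    feature_map = series[idx] @ weights + bias
    return feature_map.max(), np.mean(feature_map > 0)

series = np.array([0, 0, 1, 0, 0, 1, 0, 0], dtype=float)
weights = np.array([-1.0, 2.0, -1.0])               # toy "peak detector" kernel

max_1, ppv_1 = apply_kernel(series, weights, bias=0.0, dilation=1, padding=0)
max_2, ppv_2 = apply_kernel(series, weights, bias=0.0, dilation=2, padding=0)
print(max_1, ppv_1)   # 2.0 and 1/3: two of six positions respond positively
print(max_2, ppv_2)   # 2.0 and 0.5: two of four positions respond positively
```

The maximum is the same in both cases, while the PPV changes with the dilation, which hints at why the two features carry different information.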
Linear Classification
For smaller datasets, the authors recommend a ridge regression classifier due to fast cross-validation of the regularization parameter and no other hyperparameters.
Regularization is critical when the number of features exceeds the number of training examples, as is often the case with small datasets. (By default, ROCKET uses 10,000 kernels and generates two features per kernel, resulting in 20,000 features.)
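The sketch below illustrates this regime with synthetic data: far more features than examples, and a ridge classifier that picks its regularization strength by cross-validation. The sizes (36 examples, 500 hypothetical kernels) and the random features are assumptions chosen to keep the example fast; they are not ROCKET features.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

rng = np.random.default_rng(0)
n_examples, n_kernels = 36, 500          # assumed sizes, for illustration only
n_features = 2 * n_kernels               # two features (max, PPV) per kernel

# Synthetic stand-in for transformed features: many more columns than rows
y = np.arange(n_examples) % 2
X = rng.normal(size=(n_examples, n_features))
X[:, 0] += 3.0 * (2 * y - 1)             # give one feature some real signal

alphas = np.logspace(-3, 3, 10)
clf = RidgeClassifierCV(alphas=alphas)   # cross-validates over alphas
clf.fit(X, y)
print(n_features, clf.alpha_)            # feature count and the selected alpha
```

The regularization strength is the only hyperparameter being tuned here.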
For large datasets, the authors recommend logistic regression with stochastic gradient descent due to scalability.
In “large” datasets, the number of training examples is much larger than the number of extracted features.
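Why stochastic gradient descent scales can be seen in a small sketch: each update touches only a minibatch, so the full dataset never has to be processed at once. The following is a hand-written binary logistic regression in NumPy on synthetic data. It is an illustrative assumption, not scikit-learn's SGDClassifier and not the authors' training setup.

```python
import numpy as np

def sgd_logistic_regression(X, y, epochs=20, batch_size=64, lr=0.1, seed=0):
    """Binary logistic regression trained with minibatch SGD (y in {0, 1})."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))             # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            logits = X[batch] @ w + b
            probs = 1.0 / (1.0 + np.exp(-logits))
            error = probs - y[batch]                # gradient of log loss w.r.t. logits
            w -= lr * X[batch].T @ error / len(batch)
            b -= lr * error.mean()
    return w, b

# Synthetic data with many more examples than features
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 20))
true_w = rng.normal(size=20)
y = (X @ true_w > 0).astype(float)

w, b = sgd_logistic_regression(X, y)
accuracy = np.mean(((X @ w + b) > 0) == (y == 1))
print(accuracy)
```

The cost of one update depends on the batch size and the number of features, not on the total number of training examples.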
How to use ROCKET with Python?
The ROCKET transform is implemented in the sktime Python package.
The following code example is adapted from the sktime Demo of ROCKET Transform.
First, load the required packages.
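import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sktime.datasets import load_arrow_head  # univariate dataset
# Import path of the sktime release this example was written against;
# recent releases expose the class as sktime.transformations.panel.rocket.Rocket
from sktime.transformers.series_as_features.rocket import Rocket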
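Next, set up the training and test data. In this case, I use the univariate ArrowHead dataset for convenience; the Rocket transform can also be applied to multivariate data. Note that the code below trains on the archive's "test" split (175 series) and evaluates on its "train" split (36 series), the reverse of the archive's default split.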
X_train, y_train = load_arrow_head(split="test", return_X_y=True)
X_test, y_test = load_arrow_head(split="train", return_X_y=True)
print(X_train.shape, X_test.shape)
>> (175, 1) (36, 1)
Transform the training data using the Rocket transform. By default, ROCKET uses 10,000 kernels. In general, more kernels result in higher classification accuracy; however, there is a trade-off between increased accuracy and computation time. Even with a large number of kernels, ROCKET is still very fast.
rocket = Rocket(num_kernels=10000, random_state=111)
rocket.fit(X_train)
X_train_transform = rocket.transform(X_train)
X_train_transform.shape
>> (175, 20000)
Initialize and train a linear classifier from scikit-learn. The authors of sktime recommend using RidgeClassifierCV for smaller datasets (<20k training examples). For larger datasets, use logistic regression trained with stochastic gradient descent: SGDClassifier(loss='log') (spelled loss='log_loss' in scikit-learn 1.1 and later).
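# Note: the normalize argument was removed in scikit-learn 1.2; on newer
# versions, drop it and scale the features beforehand (e.g. with StandardScaler)
classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10), normalize=True)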
classifier.fit(X_train_transform, y_train)
Finally, to score the trained model and generate predictions, transform the test data using Rocket and call the trained model.
X_test_transform = rocket.transform(X_test)
classifier.score(X_test_transform, y_test)
>> 0.9167
Citation
Dempster, A., Petitjean, F. & Webb, G.I. ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Min Knowl Disc 34, 1454–1495 (2020). https://doi.org/10.1007/s10618-020-00701-z