GluonTS - probabilistic time series modeling

Time series forecasting is an active research area. For a broader introduction to time series in general, others have already written detailed and systematic overviews, which interested readers can consult directly.

This post walks through the Quick Start Tutorial from the official GluonTS documentation, translating it and adding my own notes along the way. If you spot any mistakes, corrections are welcome. The original tutorial can be found in the official GluonTS API documentation.

1. Quick Start Tutorial

The GluonTS toolkit contains components and tools for building time series models using MXNet. The models currently included are forecasting models, but the components also support other time series use cases, such as classification or anomaly detection.
The toolkit is not intended as a forecasting solution for businesses or end users, but for scientists and engineers who want to tune algorithms or build and test their own models.
The contents include:

  • Components for building new models (distribution functions, pipelines for feature processing, date features, etc.)
  • Data loading and processing
  • Several pre-built models
  • Plotting and evaluation facilities
  • Artificial and real datasets

Import related libraries:

# Third-party imports
%matplotlib inline
import mxnet as mx
from mxnet import gluon
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import json

2. Dataset

GluonTS comes with a number of publicly available datasets that can be imported directly:

from gluonts.dataset.repository.datasets import get_dataset, dataset_recipes
from gluonts.dataset.util import to_pandas

To download one of the built-in datasets, simply call get_dataset with one of the available names. GluonTS can reuse a saved dataset so that it does not need to be downloaded again: simply set regenerate=False.

dataset = get_dataset("m4_hourly", regenerate=True)

In general, a dataset provided by GluonTS is an object consisting of three main members:

  • dataset.train is an iterable collection of data entries used for training. Each entry corresponds to one time series.
  • dataset.test is an iterable collection of data entries used for inference. The test dataset is an extended version of the train dataset: each time series has an additional window at its end that is not seen during training. The length of this window is equal to the recommended prediction length.
  • dataset.metadata contains metadata about the dataset, such as the frequency of the time series, the recommended prediction horizon, associated features, etc.

# Plot the first time series in the training set
entry = next(iter(dataset.train))
train_series = to_pandas(entry)
train_series.plot()
plt.grid(which="both")
plt.legend(["train series"], loc="upper left")
plt.show()

# Plot the first time series in the test set
entry = next(iter(dataset.test))
test_series = to_pandas(entry)
test_series.plot()
plt.axvline(train_series.index[-1], color='r') # end of train dataset
plt.grid(which="both")
plt.legend(["test series", "end of train series"], loc="upper left")
plt.show()
print(f"Length of forecasting window in test dataset: {len(test_series) - len(train_series)}")
print(f"Recommended prediction horizon: {dataset.metadata.prediction_length}")
print(f"Frequency of the time series: {dataset.metadata.freq}")

3. Custom datasets

At this point it is worth emphasizing that a custom dataset does not need to follow this particular format. The only requirements are that it is iterable and that each entry has a target and a start field. To make this more concrete, assume the common case where the dataset is a numpy.array and the index of each time series is given by a pandas.Timestamp (which may differ between time series):

N = 10  # number of time series
T = 100  # number of timesteps
prediction_length = 24
freq = "1H"
custom_dataset = np.random.normal(size=(N, T))
start = pd.Timestamp("01-01-2019", freq=freq)  # can be different for each time series

Now you can split the dataset and convert it to the appropriate GluonTS format in just two lines of code:

from gluonts.dataset.common import ListDataset
# train dataset: cut the last window of length "prediction_length", add "target" and "start" fields
train_ds = ListDataset([{'target': x, 'start': start}
                        for x in custom_dataset[:, :-prediction_length]],
                       freq=freq)
# test dataset: use the whole dataset, add "target" and "start" fields
test_ds = ListDataset([{'target': x, 'start': start}
                       for x in custom_dataset],
                      freq=freq)
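
As a quick sanity check, each training series should now be exactly prediction_length shorter than its test counterpart. A minimal sketch, relying on the fact that ListDataset entries are dictionaries with a "target" array:

# Sanity check (sketch): the training series are shorter by exactly prediction_length
train_entry = next(iter(train_ds))
test_entry = next(iter(test_ds))
assert len(test_entry["target"]) - len(train_entry["target"]) == prediction_length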

4. Train an existing model (Estimator)

GluonTS comes with a number of pre-built models. All the user needs to do is configure some hyperparameters. The existing models focus on (but are not limited to) probabilistic forecasting, i.e., predictions in the form of a probability distribution rather than a single point estimate.
We will start with GluonTS's pre-built feedforward neural network estimator, a simple but powerful forecasting model. We will use it to demonstrate the workflow of training a model, producing forecasts, and evaluating the results.
GluonTS's built-in feedforward neural network (SimpleFeedForwardEstimator) accepts an input window of length context_length and predicts the distribution of the following prediction_length values. In GluonTS terms, the feedforward neural network model is an example of an Estimator. In GluonTS, an Estimator object represents a forecasting model together with details such as its coefficients and weights.
In general, each estimator (pre-built or custom) is configured with a number of hyperparameters. These can be common across all estimators (for example, prediction_length), though they need not be identical, or specific to a particular estimator (for example, the number of layers of a neural network or the stride of a CNN).
Finally, each estimator is configured with a Trainer, which defines how the model will be trained: the number of epochs, the learning rate, and so on.

from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.trainer import Trainer
estimator = SimpleFeedForwardEstimator(
    num_hidden_dimensions=[10],
    prediction_length=dataset.metadata.prediction_length,
    context_length=100,
    freq=dataset.metadata.freq,
    trainer=Trainer(ctx="cpu",
                    epochs=5,
                    learning_rate=1e-3,
                    num_batches_per_epoch=100
                   )
)

After specifying all the required hyperparameters of our estimator, we can train it with the training dataset dataset.train by calling the estimator's train method. The training algorithm returns a fitted model (a Predictor in GluonTS terms) that can be used to construct forecasts.

predictor = estimator.train(dataset.train)

With a predictor in hand, we can forecast the last window of each series in dataset.test and evaluate the performance of the model.
GluonTS provides a make_evaluation_predictions function that automates the process of forecasting and model evaluation. Roughly speaking, this function performs the following steps:

  • Removes the final window of length prediction_length from each time series in dataset.test that we want to predict
  • The estimator uses the remaining data to predict (in the form of sample paths) the "future" window that was just removed
  • The function outputs the forecast sample paths and dataset.test (as Python generator objects)

from gluonts.evaluation.backtest import make_evaluation_predictions
forecast_it, ts_it = make_evaluation_predictions(
    dataset=dataset.test,  # test dataset
    predictor=predictor,  # predictor
    num_samples=100,  # number of sample paths we want for evaluation
)

First, we can convert these generators to lists to simplify subsequent calculations.

(PS: a personal observation. Calling list() here allocates a lot of memory, because forecast_it is itself an iterator; consuming it lazily with next(iter(forecast_it)) can save considerable memory. On the other hand, a list allows flexible access by index.)

forecasts = list(forecast_it)
tss = list(ts_it)
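
For illustration, the lazier alternative mentioned above would look like the sketch below; note that it consumes the generators, so it would be used instead of (not in addition to) the list() calls:

# Lazy alternative (sketch): take only the first entry instead of building full lists
first_forecast = next(iter(forecast_it))
first_ts = next(iter(ts_it))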

We can examine the first element of these lists (corresponding to the first time series of the dataset). Let's start with the list that contains the time series, tss. We expect the first entry of tss to contain the target of the first time series of dataset.test.

# first entry of the time series list
ts_entry = tss[0]
# first 5 values of the time series (convert from pandas to numpy)
np.array(ts_entry[:5]).reshape(-1,)
# first entry of dataset.test
dataset_test_entry = next(iter(dataset.test))
# first 5 values
dataset_test_entry['target'][:5]
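
The two should coincide; a quick check (a sketch, comparing the first five values of each):

# Sanity check (sketch): the evaluation series matches the raw test target
assert np.allclose(np.array(ts_entry[:5]).reshape(-1,), dataset_test_entry['target'][:5])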

The entries in the forecasts list are a bit more complex. They are objects that contain all the sample paths, in the form of a numpy.ndarray of dimension (num_samples, prediction_length), along with the start date of the forecast window, the frequency of the time series, etc. We can simply access these through the corresponding attributes of the forecast object.

# first entry of the forecast list
forecast_entry = forecasts[0]
print(f"Number of sample paths: {forecast_entry.num_samples}")
print(f"Dimension of samples: {forecast_entry.samples.shape}")
print(f"Start date of the forecast window: {forecast_entry.start_date}")
print(f"Frequency of the time series: {forecast_entry.freq}")

We can also do computations that summarize the sample paths, such as computing the mean or a quantile for each of the 48 time steps of the forecast window.

print(f"Mean of the future window:\n {forecast_entry.mean}")
print(f"0.5-quantile (median) of the future window:\n {forecast_entry.quantile(0.5)}")

The Forecast object also has a plot method, which summarizes the forecast paths as the mean, prediction intervals, and so on. The prediction intervals are shaded in different colors, like a "fan chart".

def plot_prob_forecasts(ts_entry, forecast_entry):
    plot_length = 150
    prediction_intervals = (50.0, 90.0)
    legend = ["observations", "median prediction"] + [f"{k}% prediction interval" for k in prediction_intervals][::-1]

    fig, ax = plt.subplots(1, 1, figsize=(10, 7))
    ts_entry[-plot_length:].plot(ax=ax)  # plot the time series
    forecast_entry.plot(prediction_intervals=prediction_intervals, color='g')
    plt.grid(which="both")
    plt.legend(legend, loc="upper left")
    plt.show()
plot_prob_forecasts(ts_entry, forecast_entry)


We can also evaluate the quality of our forecasts numerically. In GluonTS, the Evaluator class can compute aggregate performance metrics as well as per-time-series metrics (the latter is very useful when analyzing the performance over heterogeneous time series).

from gluonts.evaluation import Evaluator
evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9])
agg_metrics, item_metrics = evaluator(iter(tss), iter(forecasts), num_series=len(dataset.test))

The aggregate metrics are aggregated over both time steps and time series.

print(json.dumps(agg_metrics, indent=4))
### Output ###
{
    "MSE": 8443908.678894352,
    "abs_error": 8877123.154870987,
    "abs_target_sum": 145558863.59960938,
    "abs_target_mean": 7324.822041043146,
    "seasonal_error": 336.9046924038305,
    "MASE": 3.246015285156499,
    "sMAPE": 0.18506043295168975,
    "MSIS": 38.217292786814824,
    "QuantileLoss[0.1]": 5679215.008755971,
    "Coverage[0.1]": 0.09983896940418678,
    "QuantileLoss[0.5]": 8877123.216818333,
    "Coverage[0.5]": 0.5041767310789049,
    "QuantileLoss[0.9]": 7424048.535117529,
    "Coverage[0.9]": 0.892109500805153,
    "RMSE": 2905.8404427797395,
    "NRMSE": 0.39671140493208645,
    "ND": 0.0609864829618992,
    "wQuantileLoss[0.1]": 0.0390166209622099,
    "wQuantileLoss[0.5]": 0.06098648338748198,
    "wQuantileLoss[0.9]": 0.051003754436685866,
    "mean_wQuantileLoss": 0.05033561959545924,
    "MAE_Coverage": 0.004076086956521706
}
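
The per-series metrics in item_metrics can be inspected in the same way; a minimal sketch, assuming (as in the official tutorial) that item_metrics is a pandas DataFrame with one row per time series:

# item_metrics contains one row of metrics per time series
item_metrics.head()
# scatter two per-series metrics against each other to spot heterogeneous behavior
item_metrics.plot(x='MSIS', y='MASE', kind='scatter')
plt.grid(which="both")
plt.show()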

5. Create your own forecast model

PS: To create your own forecasting model, you need to build your own neural network architecture, define your own loss function, and so on, in order to flexibly assemble your own Estimator. For my own purposes I have not studied this in depth and have mostly called the existing model APIs; readers interested in the pre-built models can consult the official model API (estimator) documentation for the specific parameters and methods. If you are interested in building your own model, feel free to discuss it with me.

Below, only the official code is reproduced, without much additional translation or commentary.

class MyTrainNetwork(gluon.HybridBlock):
    def __init__(self, prediction_length, **kwargs):
        super().__init__(**kwargs)
        self.prediction_length = prediction_length

        with self.name_scope():
            # Set up a 3 layer neural network that directly predicts the target values
            self.nn = mx.gluon.nn.HybridSequential()
            self.nn.add(mx.gluon.nn.Dense(units=40, activation='relu'))
            self.nn.add(mx.gluon.nn.Dense(units=40, activation='relu'))
            self.nn.add(mx.gluon.nn.Dense(units=self.prediction_length, activation='softrelu'))

    def hybrid_forward(self, F, past_target, future_target):
        prediction = self.nn(past_target)
        # calculate L1 loss with the future_target to learn the median
        return (prediction - future_target).abs().mean(axis=-1)


class MyPredNetwork(MyTrainNetwork):
    # The prediction network only receives past_target and returns predictions
    def hybrid_forward(self, F, past_target):
        prediction = self.nn(past_target)
        return prediction.expand_dims(axis=1)

from gluonts.model.estimator import GluonEstimator
from gluonts.model.predictor import Predictor, RepresentableBlockPredictor
from gluonts.core.component import validated
from gluonts.support.util import copy_parameters
from gluonts.transform import ExpectedNumInstanceSampler, Transformation, InstanceSplitter
from gluonts.dataset.field_names import FieldName
from mxnet.gluon import HybridBlock
class MyEstimator(GluonEstimator):
    @validated()
    def __init__(
        self,
        freq: str,
        context_length: int,
        prediction_length: int,
        trainer: Trainer = Trainer()
    ) -> None:
        super().__init__(trainer=trainer)
        self.context_length = context_length
        self.prediction_length = prediction_length
        self.freq = freq


    def create_transformation(self):
        # Feature transformation that the model uses for input.
        # Here we use a transformation that randomly selects training samples from all time series.
        return InstanceSplitter(
                    target_field=FieldName.TARGET,
                    is_pad_field=FieldName.IS_PAD,
                    start_field=FieldName.START,
                    forecast_start_field=FieldName.FORECAST_START,
                    train_sampler=ExpectedNumInstanceSampler(num_instances=1),
                    past_length=self.context_length,
                    future_length=self.prediction_length,
                )

    def create_training_network(self) -> MyTrainNetwork:
        return MyTrainNetwork(
            prediction_length=self.prediction_length
        )

    def create_predictor(
        self, transformation: Transformation, trained_network: HybridBlock
    ) -> Predictor:
        prediction_network = MyPredNetwork(
            prediction_length=self.prediction_length
        )

        copy_parameters(trained_network, prediction_network)

        return RepresentableBlockPredictor(
            input_transform=transformation,
            prediction_net=prediction_network,
            batch_size=self.trainer.batch_size,
            freq=self.freq,
            prediction_length=self.prediction_length,
            ctx=self.trainer.ctx,
        )
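
The custom estimator can then be trained and used exactly like the pre-built ones. A minimal sketch, reusing the hyperparameter values from the SimpleFeedForwardEstimator example above:

# Train the custom estimator and obtain a predictor, just as before
estimator = MyEstimator(
    prediction_length=dataset.metadata.prediction_length,
    context_length=100,
    freq=dataset.metadata.freq,
    trainer=Trainer(ctx="cpu",
                    epochs=5,
                    learning_rate=1e-3,
                    num_batches_per_epoch=100
                   )
)
predictor = estimator.train(dataset.train)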