
Preprocessing for Time Series

create_multivariate_windows(df, train_horizon, pred_horizon, train_columns=None, pred_columns=None)

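Given a dataframe or raw data sequence with more than one dimension in time, this will create deep learning friendly matrices (X, y) with a given training and prediction length. If no columns are provided, the whole matrix will be used as the training and prediction set. X has shape (num_windows, train_horizon, number of training columns) and y has shape (num_windows, pred_horizon, number of prediction columns), where num_windows = number of rows - train_horizon - pred_horizon + 1.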

Parameters:

df (array_like, required): raw data potentially with column names if a full dataframe (>1d in time)
train_horizon (int, required): number of data points in each training observation of the X matrix
pred_horizon (int, required): number of data points in each prediction observation of the y matrix
train_columns (list, default None): list of columns to use from dataframe for training matrices
pred_columns (list, default None): list of columns to use from dataframe for prediction matrices
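
To make the two horizons concrete, the following sketch works through the window arithmetic by hand for a short series. The helper name `window_slices` is hypothetical and not part of timewarpy; it only reproduces the row-index ranges used by the loop in the function's source listing.

```python
def window_slices(num_rows, train_horizon, pred_horizon):
    """Return (X_range, y_range) row-index pairs for every window.

    Hypothetical helper for illustration only; it is not part of timewarpy.
    """
    num_windows = num_rows - train_horizon - pred_horizon + 1
    slices = []
    for i in range(num_windows):
        # rows that form one training observation
        x_range = (i, i + train_horizon)
        # rows immediately after it that form the prediction target
        y_range = (i + train_horizon, i + train_horizon + pred_horizon)
        slices.append((x_range, y_range))
    return slices


# 10 rows, 4 training points and 2 prediction points per window
slices = window_slices(num_rows=10, train_horizon=4, pred_horizon=2)
print(len(slices))   # 5 windows: 10 - 4 - 2 + 1
print(slices[0])     # ((0, 4), (4, 6))
print(slices[-1])    # ((4, 8), (8, 10))
```

Each range is half-open, matching Python slicing, so the last window's target ends exactly at the final row.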
Source code in timewarpy/preprocess.py
import numpy as np


def create_multivariate_windows(df, train_horizon, pred_horizon, train_columns=None, pred_columns=None):
    """Given a dataframe or raw data sequence with more than one dimension in time,
    this will create deep learning friendly matrices (X, y) with a given training
    and prediction length. If no columns are provided, the whole matrix will be used
    as the training and prediction set.

    Args:
        df (array_like): raw data potentially with column names if a full dataframe (>1d in time)
        train_horizon (int): number of data points in each training observation of the X matrix
        pred_horizon (int): number of data points in each prediction observation of the y matrix
        train_columns (list, optional): list of columns to use from dataframe for training matrices.
            Defaults to None.
        pred_columns (list, optional): list of columns to use from dataframe for prediction matrices.
            Defaults to None.
    """
    # number of windows
    num_windows = df.shape[0] - train_horizon - pred_horizon + 1

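    # create numpy objects; each column list is optional on its own, so fall
    # back to the whole matrix for whichever one is not provided
    if train_columns is not None:
        raw_X_data = df[train_columns].to_numpy()
    else:
        raw_X_data = np.array(df)
    if pred_columns is not None:
        raw_y_data = df[pred_columns].to_numpy()
    else:
        raw_y_data = np.array(df)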

    # allocate memory to numpy
    X = np.zeros([num_windows, train_horizon, raw_X_data.shape[1]])
    y = np.zeros([num_windows, pred_horizon, raw_y_data.shape[1]])

    # loop through the windows
    for i in range(num_windows):

        # store X and Y information
        X[i] = raw_X_data[i: i + train_horizon]
        y[i] = raw_y_data[i + train_horizon: i + train_horizon + pred_horizon]

    return X, y
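
For long series the Python loop in the listing may be slow. As an alternative (not what timewarpy does), the same windows can be built with NumPy's `sliding_window_view`, available in NumPy 1.20 and later. The function name `windows_vectorized` is hypothetical, and the sketch assumes a 2-D array input whose rows are time steps and that no column selection is needed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windows_vectorized(data, train_horizon, pred_horizon):
    """Build (X, y) windows without an explicit loop.

    Hypothetical helper, not part of timewarpy. Assumes `data` is 2-D
    with shape (time_steps, features).
    """
    data = np.asarray(data, dtype=float)
    num_windows = data.shape[0] - train_horizon - pred_horizon + 1
    if num_windows < 1:
        raise ValueError("series is shorter than train_horizon + pred_horizon")

    # sliding_window_view puts the window axis last: (windows, features, horizon),
    # so swap the last two axes to get (windows, horizon, features).
    X = sliding_window_view(data, train_horizon, axis=0).transpose(0, 2, 1)
    y = sliding_window_view(data[train_horizon:], pred_horizon, axis=0).transpose(0, 2, 1)

    # X has trailing windows with no matching target; drop them, and copy so the
    # results are ordinary writable arrays rather than read-only views.
    return X[:num_windows].copy(), y.copy()


data = np.arange(20).reshape(10, 2)   # 10 time steps, 2 features
X, y = windows_vectorized(data, train_horizon=4, pred_horizon=2)
print(X.shape, y.shape)               # (5, 4, 2) (5, 2, 2)
```

The copies trade some memory for safety; keeping the views would avoid duplicating the data but leaves arrays that cannot be written to.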

create_univariate_windows(df, train_horizon, pred_horizon, column=None)

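Given a dataframe or raw data sequence, this will create deep learning friendly matrices (X, y) with a given training and prediction length. X has shape (num_windows, train_horizon, 1) and y has shape (num_windows, pred_horizon), where num_windows = number of rows - train_horizon - pred_horizon + 1.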

Parameters:

df (array_like, required): raw data potentially with column names if a full dataframe
train_horizon (int, required): number of data points in each training observation of the X matrix
pred_horizon (int, required): number of data points in each prediction observation of the y matrix
column (str, default None): column to use if df is a pandas dataframe
Source code in timewarpy/preprocess.py
import numpy as np


def create_univariate_windows(df, train_horizon, pred_horizon, column=None):
    """Given a dataframe or raw data sequence, this will create deep learning friendly
    matrices (X, y) with a given training and prediction length.

    Args:
        df (array_like): raw data potentially with column names if a full dataframe
        train_horizon (int): number of data points in each training observation of the X matrix
        pred_horizon (int): number of data points in each prediction observation of the y matrix
        column (str, optional): column to use if df is a pandas dataframe. Defaults to None.
    """
    # create numpy object
    if column is not None:
        raw_data = df[[column]].to_numpy()
    else:
        raw_data = np.asarray(df)
        if raw_data.ndim == 1:
            # give a flat sequence the trailing axis that X[i] expects
            raw_data = raw_data.reshape(-1, 1)

    # number of windows
    num_windows = raw_data.shape[0] - train_horizon - pred_horizon + 1

    # allocate memory to numpy
    X = np.zeros([num_windows, train_horizon, 1])
    y = np.zeros([num_windows, pred_horizon])

    # loop through the windows
    for i in range(num_windows):

        # store X and Y information
        X[i] = raw_data[i: i + train_horizon]
        y[i] = raw_data[i + train_horizon: i + train_horizon + pred_horizon].reshape(1, -1)[0]

    return X, y
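
The output shapes differ from the multivariate case: X keeps a trailing feature axis of length 1, while y is flattened to two dimensions. The sketch here copies the listing's windowing logic into a stand-in function (`univariate_windows_sketch`, a hypothetical name) so that it runs without timewarpy installed. It assumes a 1-D NumPy array as input and reshapes it to a single column first.

```python
import numpy as np


def univariate_windows_sketch(series, train_horizon, pred_horizon):
    """Stand-in that mirrors the windowing logic for a 1-D series.

    Hypothetical helper for illustration; not the timewarpy function itself.
    """
    raw = np.asarray(series, dtype=float).reshape(-1, 1)   # one column
    num_windows = raw.shape[0] - train_horizon - pred_horizon + 1
    X = np.zeros([num_windows, train_horizon, 1])
    y = np.zeros([num_windows, pred_horizon])
    for i in range(num_windows):
        X[i] = raw[i: i + train_horizon]
        # flatten the (pred_horizon, 1) slice into a row of targets
        y[i] = raw[i + train_horizon: i + train_horizon + pred_horizon].ravel()
    return X, y


series = np.arange(8)                   # 0, 1, ..., 7
X, y = univariate_windows_sketch(series, train_horizon=3, pred_horizon=2)
print(X.shape, y.shape)                 # (4, 3, 1) (4, 2)
print(X[0].ravel(), y[0])               # [0. 1. 2.] [3. 4.]
```

The first window uses points 0 to 2 as input and points 3 and 4 as the target; each later window shifts forward by one point.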