Data Loader Module

The data loader module provides utilities for loading and managing time series classification datasets.

TSCData Container

class tscf_eval.TSCData

Bases: object

A small, immutable, well-typed container for time-series classification datasets.

name: str
split: Literal['train', 'test']
X: ndarray
y: ndarray
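
As a rough illustration of what "immutable container" means for these four fields, the sketch below uses a frozen dataclass. `MiniTSCData` is a hypothetical stand-in, not the library's class, and it stores plain tuples instead of ndarrays so that it runs without extra dependencies.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Literal


@dataclass(frozen=True)
class MiniTSCData:
    """Hypothetical stand-in for TSCData (illustration only)."""

    name: str
    split: Literal["train", "test"]
    X: tuple  # one inner tuple of values per instance
    y: tuple  # one label per instance


data = MiniTSCData("demo", "train", X=((0.1, 0.2), (0.3, 0.4)), y=("a", "b"))

try:
    data.name = "other"  # a frozen dataclass rejects attribute assignment
    reassigned = True
except FrozenInstanceError:
    reassigned = False
```
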
static from_arrays(name, split, X, y, *, squeeze_univariate=True)

Create a TSCData instance from numpy arrays / array-likes.

Parameters:
  • name (str) – Dataset name.

  • split ({'train', 'test'}) – Which split this instance belongs to.

  • X (array-like) – Time-series data. Accepts either 2D (n, L) for univariate data or 3D (n, D, L) for multivariate data.

  • y (array-like) – 1D labels of length n.

  • squeeze_univariate (bool, optional) – If True and X has shape (n, 1, L), squeeze the channel dimension to produce shape (n, L).

Return type:

TSCData

Returns:

TSCData – Constructed immutable container.

Raises:

ValueError – If X or y does not have the expected dimensions, or if X and y disagree on the number of instances.
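
The shape rules above can be sketched with NumPy as follows. `check_shapes` is a hypothetical helper written for this illustration; it mirrors the documented behavior and is not the library's implementation, and the error messages are made up.

```python
import numpy as np


def check_shapes(X, y, squeeze_univariate=True):
    """Hypothetical helper mirroring the documented shape rules."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim not in (2, 3):
        raise ValueError(f"X must be 2D (n, L) or 3D (n, D, L), got {X.ndim}D")
    if y.ndim != 1:
        raise ValueError(f"y must be 1D, got {y.ndim}D")
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y disagree on the number of instances")
    if squeeze_univariate and X.ndim == 3 and X.shape[1] == 1:
        X = X[:, 0, :]  # (n, 1, L) -> (n, L)
    return X, y


X, y = check_shapes(np.zeros((4, 1, 10)), [0, 1, 0, 1])
X_kept, _ = check_shapes(np.zeros((4, 1, 10)), [0, 1, 0, 1], squeeze_univariate=False)
```

With the defaults, the single-channel input comes back as a 2D array; passing `squeeze_univariate=False` keeps the channel axis.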

static from_dataframe(name, split, df, *, label_col, feature_cols=None)

Create a TSCData instance from a wide-format DataFrame.

The dataframe format expected is one row per instance, numeric columns representing time points (or channels flattened), and a column containing the label.

Parameters:
  • name (str) – Dataset name used in the resulting TSCData.

  • split ({'train', 'test'}) – Split label to set on the resulting object.

  • df (pandas.DataFrame) – Source table.

  • label_col (str) – Column name in df containing labels.

  • feature_cols (sequence of str, optional) – Columns to use as features in the desired order. If None, numeric columns except label/split columns are used.

Note: Label maps are not used; labels are returned in their original form.

Return type:

TSCData

Returns:

TSCData – Constructed dataset object.

Raises:

ValueError – If label_col is missing or no numeric feature columns are found when feature_cols is not provided.
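
A minimal sketch of the column selection described above, using pandas. `pick_feature_cols` is a hypothetical helper, not part of the library; it only excludes the label column, since how a split column is identified is not specified here.

```python
import pandas as pd


def pick_feature_cols(df, label_col, feature_cols=None):
    """Hypothetical helper: choose feature columns as the text describes."""
    if label_col not in df.columns:
        raise ValueError(f"label column {label_col!r} not found")
    if feature_cols is not None:
        return list(feature_cols)  # explicit list wins, order preserved
    numeric = df.select_dtypes(include="number").columns
    cols = [c for c in numeric if c != label_col]
    if not cols:
        raise ValueError("no numeric feature columns found")
    return cols


# Wide format: one row per instance, one column per time point, plus a label.
df = pd.DataFrame({"t_0": [0.1, 0.3], "t_1": [0.2, 0.4], "label": ["a", "b"]})
cols = pick_feature_cols(df, label_col="label")
X = df[cols].to_numpy()
y = df["label"].to_numpy()
```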

property n_instances: int

Number of instances in the dataset.

Returns:

int – Number of rows / time-series instances (n).

property series_length: int

Length of each time series in time points.

For univariate data this is the second axis length of X when X has shape (n, L). For multivariate data (X shape (n, D, L)) this returns L.

Returns:

int – Series length (L).

property n_dims: int

Number of dimensions (channels) per time series.

Returns:

int – 1 for univariate series (X is 2D) or D for multivariate series (X is 3D with shape (n, D, L)).

property n_classes: int

Number of unique class labels present in y.

Returns:

int – The number of distinct labels (classes) in the label array.

property is_univariate: bool

Whether the dataset is univariate.

Returns:

bool – True if each instance has a single channel (D == 1), False otherwise.
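
The properties above all follow from the shape of X and the contents of y. The sketch below derives them from a shape tuple and a label list; `shape_properties` is a hypothetical helper for illustration, not a library function.

```python
def shape_properties(shape, labels):
    """Hypothetical helper deriving the documented properties from X.shape and y."""
    if len(shape) == 2:    # univariate: (n, L)
        n, length = shape
        dims = 1
    elif len(shape) == 3:  # multivariate: (n, D, L)
        n, dims, length = shape
    else:
        raise ValueError("expected a 2D or 3D shape")
    return {
        "n_instances": n,
        "series_length": length,
        "n_dims": dims,
        "n_classes": len(set(labels)),
        "is_univariate": dims == 1,
    }


uni = shape_properties((100, 24), ["a", "b"] * 50)
multi = shape_properties((100, 3, 24), [0, 1, 2, 0] * 25)
```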

describe()

Return a small dictionary summarizing dataset properties.

The dictionary contains basic metadata useful for logging or quick inspection: dataset name and split, shapes (instances, series length, dimensions), number of classes, class counts and the optional label mapping if present.

Return type:

dict

Returns:

dict – Summary dictionary with keys: 'name', 'split', 'n_instances', 'series_length', 'n_dims', 'n_classes', 'class_counts'.
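
For orientation, a summary with the listed keys might look like the dictionary built below. The values are placeholders, and representing 'class_counts' as a label-to-count dict is an assumption, not something confirmed by the text above.

```python
from collections import Counter

labels = ["a", "b", "a", "a"]  # placeholder labels for illustration

summary = {
    "name": "demo",
    "split": "train",
    "n_instances": len(labels),
    "series_length": 24,  # placeholder value
    "n_dims": 1,
    "n_classes": len(set(labels)),
    "class_counts": dict(Counter(labels)),  # assumed layout: label -> count
}
```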

to_dataframe(*, label_name='label', prefix='t_')

Return a wide-format DataFrame representing the dataset.

Parameters:
  • label_name (str, optional) – Column name to use for labels in the returned dataframe.

  • prefix (str, optional) – Prefix for generated numeric/time columns.

Return type:

DataFrame

Returns:

pandas.DataFrame – Wide-format dataframe with numeric columns for each time point (and channel) and a final column with labels.

map_labels(mapping)

Return a copy of this dataset with labels remapped.

Parameters:

mapping (dict[int | str, int | str]) – Mapping from original labels to new labels. If a label is not present in mapping, it is left unchanged.

Return type:

TSCData

Returns:

TSCData – New instance with remapped y.
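
The pass-through rule for unmapped labels can be sketched on a plain list. `remap` is a hypothetical helper for illustration only.

```python
def remap(labels, mapping):
    """Hypothetical helper: labels missing from `mapping` pass through unchanged."""
    return [mapping.get(label, label) for label in labels]


y = ["cat", "dog", "bird", "cat"]
y_new = remap(y, {"cat": 0, "dog": 1})  # "bird" has no entry and is kept as-is
```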

select_classes(keep)

Return a subset of the dataset containing only the specified classes.

Parameters:

keep (Iterable[int | str]) – Labels to keep. Items not present in the dataset are ignored.

Return type:

TSCData

Returns:

TSCData – New instance containing only instances whose label is in keep.
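
A minimal sketch of the filtering rule, on plain lists rather than arrays. `keep_classes` is a hypothetical helper; labels in `keep` that never occur simply match nothing.

```python
def keep_classes(X, y, keep):
    """Hypothetical helper: keep only instances whose label is in `keep`."""
    wanted = set(keep)
    idx = [i for i, label in enumerate(y) if label in wanted]
    return [X[i] for i in idx], [y[i] for i in idx]


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
y = ["a", "b", "c"]
X_sub, y_sub = keep_classes(X, y, keep=["a", "c", "zzz"])  # "zzz" is ignored
```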

save(path)

Save the dataset to a compressed NumPy .npz file.

The file contains arrays for X, y, name and split. Use TSCData.load() to restore.

Parameters:

path (str or pathlib.Path) – Destination file path. The function will use numpy.savez_compressed.

Return type:

None

static load(path)

Load a TSCData instance previously written with save().

Parameters:

path (str or pathlib.Path) – Path to .npz file produced by save().

Return type:

TSCData

Returns:

TSCData – Restored dataset.
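
The save/load round trip can be approximated directly with NumPy. This sketch assumes name and split are stored as zero-dimensional string arrays under the keys shown; the actual layout inside the library's .npz files may differ.

```python
import tempfile
from pathlib import Path

import numpy as np

X = np.arange(12, dtype=float).reshape(3, 4)
y = np.array([0, 1, 0])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_train.npz"
    # Assumed layout: two data arrays plus two zero-dimensional string arrays.
    np.savez_compressed(path, X=X, y=y, name=np.array("demo"), split=np.array("train"))

    with np.load(path) as archive:
        X_back = archive["X"]
        y_back = archive["y"]
        name_back = archive["name"].item()    # 0-d array -> Python str
        split_back = archive["split"].item()
```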

__init__(name, split, X, y)
Return type:

None

Data Loaders

Base Class

class tscf_eval.DataLoader

Bases: ABC

Abstract base class for dataset loaders.

Subclasses implement dataset-specific loading logic. Implementations must provide load() which returns a TSCData for the requested split and describe() which returns a small metadata dictionary suitable for discovery and logging. Use load_both() as a convenience to obtain both train and test splits.

abstractmethod load(split, **kwargs)

Load and return a TSCData for split.

Parameters:
  • split ({'train', 'test'}) – Which split to load.

  • **kwargs – Loader-specific options forwarded to the concrete loader.

Return type:

TSCData

Returns:

TSCData – Loaded dataset for the requested split.

Raises:

RuntimeError – Implementations may raise when the split is not available or when underlying I/O fails.

abstractmethod describe()

Return a small metadata dictionary describing available datasets.

The returned dictionary should contain enough information for discovery and logging (for example, available dataset names, default paths, and per-split summaries). The exact structure is loader-specific but should be JSON-serializable.

Return type:

dict

Returns:

dict – Metadata dictionary with loader-specific structure.

load_both(**kwargs)

Load both train and test splits and return them as a tuple.

Parameters:

**kwargs – Loader-specific options forwarded to load().

Return type:

tuple[TSCData, TSCData]

Returns:

  • train (TSCData) – Training dataset.

  • test (TSCData) – Test dataset.
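
The interface described in this section can be sketched with `abc`. `SketchLoader` and `ToyLoader` are hypothetical classes that mirror the documented method names; they return plain dicts instead of TSCData and are not the library's code.

```python
from abc import ABC, abstractmethod


class SketchLoader(ABC):
    """Hypothetical mirror of the DataLoader interface (not the library class)."""

    @abstractmethod
    def load(self, split, **kwargs):
        ...

    @abstractmethod
    def describe(self):
        ...

    def load_both(self, **kwargs):
        # Forward the same options to both calls and return (train, test).
        return self.load("train", **kwargs), self.load("test", **kwargs)


class ToyLoader(SketchLoader):
    """Toy subclass returning plain dicts instead of TSCData."""

    def load(self, split, **kwargs):
        if split not in ("train", "test"):
            raise RuntimeError(f"split {split!r} is not available")
        return {"name": "toy", "split": split, "y": [0, 1]}

    def describe(self):
        return {"name": "toy", "splits": ["train", "test"]}


train, test = ToyLoader().load_both()
```

A subclass only needs to supply `load()` and `describe()`; `load_both()` is inherited.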

UCR Loader

class tscf_eval.UCRLoader

Bases: DataLoader

Loader for UCR time-series classification datasets from the UCR archive.

This loader delegates to the aeon library’s dataset utilities (aeon.datasets.load_classification). The aeon package must be installed for this loader to work.

Parameters:

dataset_name (str) – Name of the UCR dataset (e.g., ‘ItalyPowerDemand’, ‘GunPoint’).

__init__(dataset_name)

Create a loader for a named UCR dataset.

Parameters:

dataset_name (str) – Name of the UCR dataset (e.g., ‘ItalyPowerDemand’).

load(split, **kwargs)

Load a split (‘train’ or ‘test’) of the dataset using aeon.

Parameters:
  • split ({'train', 'test'}) – Which split to load.

  • **kwargs – Additional arguments forwarded to the underlying loader in aeon.

Return type:

TSCData

Returns:

TSCData – Dataset container with feature arrays X and labels y. For univariate datasets, X has shape (N, T). For multivariate datasets, X has shape (N, C, T) where C is the number of channels/dimensions.

describe()

Return a compact description for the dataset.

The description contains per-split metadata (from TSCData.describe()) and an overall summary (currently the combined number of classes observed across splits).

Return type:

dict

Returns:

dict – Dictionary with keys:

  • 'name': Dataset name.

  • 'splits': Dict mapping 'train'/'test' to their descriptions.

  • 'overall': Dict with 'n_classes' (total unique classes).
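
The three-key layout above, with the overall class count taken as the union of labels across splits, can be sketched as follows. `describe_splits` is a hypothetical helper, and the per-split entries are reduced stand-ins for what TSCData.describe() returns.

```python
def describe_splits(name, train_labels, test_labels):
    """Hypothetical helper assembling the documented three-key layout."""

    def per_split(split, labels):
        # Reduced stand-in for TSCData.describe(); only a few fields are shown.
        return {"split": split, "n_instances": len(labels), "n_classes": len(set(labels))}

    return {
        "name": name,
        "splits": {
            "train": per_split("train", train_labels),
            "test": per_split("test", test_labels),
        },
        # Union across splits, so a class seen in only one split still counts.
        "overall": {"n_classes": len(set(train_labels) | set(test_labels))},
    }


info = describe_splits("toy", train_labels=[1, 2, 1], test_labels=[2, 3])
```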

File Loader

class tscf_eval.FileLoader

Bases: DataLoader

Load a wide-format CSV/XLSX file (or pair of files) as TSCData.

Supports two modes:
  • Provide train_path and test_path (two-file mode).

  • Provide data_path and split_col indicating which rows belong to train/test (single-file mode).

The table should be wide-format: one row per instance, numeric columns representing time points (or flattened channels), and a separate label column.

__init__(*, train_path=None, test_path=None, data_path=None, split_col=None, train_value='train', test_value='test', label_col=None, feature_cols=None, sheet_name=None, name='local_wide')

Initialize a file-based loader.

Parameters:
  • train_path, test_path (str or pathlib.Path, optional) – Paths to separate train/test files (two-file mode). Mutually exclusive with data_path mode.

  • data_path (str or pathlib.Path, optional) – Path to a single file containing both splits; requires split_col to be provided.

  • split_col (str, optional) – Column name in data_path indicating split membership.

  • train_value, test_value (str) – Values in split_col that indicate train/test rows.

  • label_col (str) – Column name containing labels (required).

  • feature_cols (sequence of str, optional) – Optional explicit list of feature columns to use.

  • sheet_name (str or int, optional) – When reading Excel files, the sheet to use.

  • name (str) – Dataset name to assign to produced TSCData objects.
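
One way to picture the two modes is as a small argument check. `resolve_mode` is a hypothetical function; the exception type and messages are made up, and requiring both train_path and test_path in two-file mode is an assumption rather than documented behavior.

```python
def resolve_mode(train_path=None, test_path=None, data_path=None, split_col=None):
    """Hypothetical check of the two documented modes; messages are made up."""
    two_file = train_path is not None or test_path is not None
    single_file = data_path is not None
    if two_file and single_file:
        raise ValueError("use either train_path/test_path or data_path, not both")
    if two_file:
        if train_path is None or test_path is None:  # assumption: both required
            raise ValueError("two-file mode needs both train_path and test_path")
        return "two-file"
    if single_file:
        if split_col is None:
            raise ValueError("single-file mode needs split_col")
        return "single-file"
    raise ValueError("no input paths given")


mode_a = resolve_mode(train_path="train.csv", test_path="test.csv")
mode_b = resolve_mode(data_path="all.csv", split_col="split")
```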

load(split, **kwargs)

Load the requested split and return a TSCData.

Parameters:
  • split ({'train', 'test'}) – Which split to load.

  • **kwargs – Additional options (not currently used).

Return type:

TSCData

Returns:

TSCData – Dataset container with feature arrays X and labels y. X has shape (N, T) where N is the number of instances and T is the number of time points (feature columns).

Raises:

ValueError – If split_col is specified but not found in the data file.
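
In single-file mode, loading a split amounts to filtering rows by the split column. The pandas sketch below shows that step on an in-memory CSV; `rows_for_split` and the column names are hypothetical and chosen only for this example.

```python
import io

import pandas as pd

csv_text = """t_0,t_1,label,split
0.1,0.2,a,train
0.3,0.4,b,train
0.5,0.6,a,test
"""


def rows_for_split(df, split, split_col="split", train_value="train", test_value="test"):
    """Hypothetical helper: select the rows that belong to one split."""
    if split_col not in df.columns:
        raise ValueError(f"split column {split_col!r} not found")
    wanted = train_value if split == "train" else test_value
    return df[df[split_col] == wanted].reset_index(drop=True)


df = pd.read_csv(io.StringIO(csv_text))
train_df = rows_for_split(df, "train")
test_df = rows_for_split(df, "test")
X_train = train_df[["t_0", "t_1"]].to_numpy()  # shape (N, T)
```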

describe()

Return a concise description for the dataset(s) represented by this loader.

The return value includes per-split metadata (via TSCData.describe()) and an overall summary (combined number of classes across splits).

Return type:

dict

Returns:

dict – Dictionary with keys:

  • 'name': Dataset name.

  • 'splits': Dict mapping 'train'/'test' to their descriptions.

  • 'overall': Dict with 'n_classes' (total unique classes).

References

  • Dau, H. A., et al. (2019). “The UCR Time Series Archive.” IEEE/CAA Journal of Automatica Sinica, 6(6), 1293-1305.