Data Loader Module
The data loader module provides utilities for loading and managing time series classification datasets.
TSCData Container
- class tscf_eval.TSCData
Bases: object
Immutable, well-typed container for time-series classification datasets.
- static from_arrays(name, split, X, y, *, squeeze_univariate=True)
Create a TSCData instance from NumPy arrays or array-likes.
- Parameters:
name (str) – Dataset name.
split ({'train', 'test'}) – Which split this instance belongs to.
X (array-like) – Time-series data. Accepts either a 2D array of shape (n, L) for univariate data or a 3D array of shape (n, D, L) for multivariate data.
y (array-like) – 1D labels of length n.
squeeze_univariate (bool, optional) – If True and X has shape (n, 1, L), squeeze the channel dimension to produce shape (n, L).
- Return type:
TSCData
- Returns:
TSCData – Constructed immutable container.
- Raises:
ValueError – If X or y does not have the expected dimensions, or if their numbers of instances disagree.
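The shape rules for from_arrays can be sketched in plain NumPy. normalize_arrays below is a hypothetical helper written for illustration, not part of tscf_eval, and the order of checks (validate first, then squeeze) is an assumption.

```python
import numpy as np


def normalize_arrays(X, y, *, squeeze_univariate=True):
    """Hypothetical helper mirroring the documented shape rules of from_arrays."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Only 2D (n, L) univariate or 3D (n, D, L) multivariate input is accepted.
    if X.ndim not in (2, 3):
        raise ValueError(f"X must be 2D (n, L) or 3D (n, D, L), got ndim={X.ndim}")
    if y.ndim != 1:
        raise ValueError(f"y must be 1D, got ndim={y.ndim}")
    # The number of series must match the number of labels.
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X has {X.shape[0]} instances but y has {y.shape[0]} labels")
    # Drop a singleton channel axis: (n, 1, L) -> (n, L).
    if squeeze_univariate and X.ndim == 3 and X.shape[1] == 1:
        X = X[:, 0, :]
    return X, y


X3 = np.zeros((4, 1, 10))
y = np.array([0, 1, 0, 1])
X2, _ = normalize_arrays(X3, y)
print(X2.shape)  # (4, 10)
```

Passing squeeze_univariate=False keeps the (n, 1, L) layout, which can be useful when downstream code always expects a channel axis.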
- static from_dataframe(name, split, df, *, label_col, feature_cols=None)
Create a TSCData instance from a wide-format DataFrame.
The expected format is one row per instance, with numeric columns representing time points (or flattened channels) and one column containing the label. No label mapping is applied; labels are returned in their original form.
- Parameters:
name (str) – Dataset name used in the resulting TSCData.
split ({'train', 'test'}) – Split label to set on the resulting object.
df (pandas.DataFrame) – Source table.
label_col (str) – Name of the column in df containing the labels.
feature_cols (sequence of str, optional) – Columns to use as features, in the desired order. If None, all numeric columns except the label and split columns are used.
- Return type:
TSCData
- Returns:
TSCData – Constructed dataset object.
- Raises:
ValueError – If label_col is missing, or if feature_cols is not provided and no numeric feature columns are found.
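The default feature-column selection can be approximated with pandas.DataFrame.select_dtypes. This is a minimal sketch, not the library's code: select_feature_columns is hypothetical, and the split_col argument is an assumption about how the "split column" is identified, since from_dataframe itself takes no such parameter.

```python
import pandas as pd


def select_feature_columns(df, label_col, feature_cols=None, split_col=None):
    """Hypothetical sketch of the documented feature-column defaulting."""
    if label_col not in df.columns:
        raise ValueError(f"label column {label_col!r} not found")
    if feature_cols is not None:
        # Explicit columns win and keep the caller's order.
        return list(feature_cols)
    # Default: every numeric column except the label and split columns.
    excluded = {label_col, split_col}
    numeric = df.select_dtypes(include="number").columns
    cols = [c for c in numeric if c not in excluded]
    if not cols:
        raise ValueError("no numeric feature columns found")
    return cols


df = pd.DataFrame({
    "t_0": [0.1, 0.5],
    "t_1": [0.2, 0.4],
    "label": ["a", "b"],
    "split": ["train", "test"],
})
cols = select_feature_columns(df, label_col="label", split_col="split")
X = df[cols].to_numpy()       # shape (n, L)
y = df["label"].to_numpy()    # labels kept in their original form
print(cols, X.shape)  # ['t_0', 't_1'] (2, 2)
```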
- property n_instances: int
Number of instances in the dataset.
- Returns:
int – Number of rows (time-series instances), n.
- property series_length: int
Length of each time series in time points.
For univariate data (X of shape (n, L)) this is the length of the second axis of X. For multivariate data (X of shape (n, D, L)) this returns L.
- Returns:
int – Series length (L).
- property n_dims: int
Number of dimensions (channels) per time series.
- Returns:
int – 1 for univariate series (X is 2D), or D for multivariate series (X is 3D with shape (n, D, L)).
- property n_classes: int
Number of unique class labels present in y.
- Returns:
int – The number of distinct labels (classes) in the label array.
- property is_univariate: bool
Whether the dataset is univariate.
- Returns:
bool – True if each instance has a single channel (D == 1), False otherwise.
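The five properties follow directly from the shapes of X and y. The sketch below re-derives them from raw arrays; summarize_shape is a hypothetical helper, and it assumes the properties are computed from the stored arrays exactly as the descriptions state.

```python
import numpy as np


def summarize_shape(X, y):
    """Hypothetical re-derivation of the TSCData properties from raw arrays."""
    X = np.asarray(X)
    n_instances = X.shape[0]
    series_length = X.shape[-1]  # L is the last axis for both (n, L) and (n, D, L)
    n_dims = 1 if X.ndim == 2 else X.shape[1]
    n_classes = len(np.unique(y))
    return {
        "n_instances": n_instances,
        "series_length": series_length,
        "n_dims": n_dims,
        "n_classes": n_classes,
        "is_univariate": n_dims == 1,
    }


info = summarize_shape(np.zeros((6, 3, 50)), ["x", "y", "x", "z", "y", "x"])
print(info)
```

Note that an unsqueezed (n, 1, L) array still counts as univariate here, matching the D == 1 rule.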
- describe()
Return a small dictionary summarizing dataset properties.
The dictionary contains basic metadata useful for logging or quick inspection: dataset name and split, shapes (instances, series length, dimensions), number of classes, class counts and the optional label mapping if present.
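Class counts are the part of such a summary that needs a little care for logging, because NumPy scalars are not JSON-serializable. The sketch below is hypothetical: class_summary and its key names are assumptions, not the actual keys returned by describe().

```python
import json

import numpy as np


def class_summary(name, split, y):
    """Hypothetical subset of a describe()-style summary; key names are assumptions."""
    labels, counts = np.unique(np.asarray(y), return_counts=True)
    return {
        "name": name,
        "split": split,
        "n_classes": int(len(labels)),
        # Cast NumPy scalars to built-in types so the dict is JSON-serializable.
        "class_counts": {str(k): int(v) for k, v in zip(labels, counts)},
    }


summary = class_summary("toy", "train", [1, 0, 1, 1, 0])
print(json.dumps(summary))
```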
- to_dataframe(*, label_name='label', prefix='t_')
Return a wide-format DataFrame representing the dataset.
- Parameters:
label_name (str, default 'label') – Name of the label column.
prefix (str, default 't_') – Prefix for the names of the time-point (feature) columns.
- Return type:
DataFrame
- Returns:
pandas.DataFrame – Wide-format dataframe with a numeric column for each time point (and channel) and a final column containing the labels.
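A wide table like this can be built by flattening each instance into one row. In this minimal sketch, to_wide_frame is hypothetical, and both the channel-major flattening order and the sequential column names (t_0, t_1, ...) for multivariate data are assumptions; the library's actual multivariate naming is not documented here.

```python
import numpy as np
import pandas as pd


def to_wide_frame(X, y, *, label_name="label", prefix="t_"):
    """Hypothetical flattening of (n, L) or (n, D, L) arrays into a wide table."""
    X = np.asarray(X)
    n = X.shape[0]
    # Flatten row-major, so all of channel 0's time points come before channel 1's.
    flat = X.reshape(n, -1)
    columns = [f"{prefix}{i}" for i in range(flat.shape[1])]
    df = pd.DataFrame(flat, columns=columns)
    df[label_name] = np.asarray(y)  # label column is appended last
    return df


df = to_wide_frame(np.arange(12).reshape(2, 2, 3), ["a", "b"])
print(list(df.columns))
```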
- save(path)
Save the dataset to a compressed NumPy .npz file.
The file contains arrays for X, y, name and split. Use TSCData.load() to restore it.
- Parameters:
path (str or pathlib.Path) – Destination file path. The data is written with numpy.savez_compressed.
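The round trip through a compressed .npz archive can be reproduced with NumPy alone. This sketch writes the four documented keys directly with numpy.savez_compressed and reads them back with numpy.load; storing name and split as 0-d string arrays is an assumption about the file layout, not something the documentation specifies.

```python
import pathlib
import tempfile

import numpy as np

X = np.random.default_rng(0).normal(size=(3, 8))
y = np.array(["a", "b", "a"])

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "toy_train.npz"
    # Same four keys the documentation lists for TSCData.save().
    np.savez_compressed(path, X=X, y=y, name=np.array("toy"), split=np.array("train"))
    # Arrays are read into memory on access, so they stay valid after the file closes.
    with np.load(path) as data:
        X_back = data["X"]
        y_back = data["y"]
        name_back = str(data["name"])
        split_back = str(data["split"])

print(name_back, split_back, X_back.shape)  # toy train (3, 8)
```

String labels are saved as a fixed-width Unicode array, so loading them does not require allow_pickle=True.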
Data Loaders
Base Class
- class tscf_eval.DataLoader
Bases: ABC
Abstract base class for dataset loaders.
Subclasses implement dataset-specific loading logic. Implementations must provide load(), which returns a TSCData for the requested split, and describe(), which returns a small metadata dictionary suitable for discovery and logging. Use load_both() as a convenience to obtain both the train and test splits.
- abstractmethod load(split, **kwargs)
Load and return a TSCData for split.
- Parameters:
split ({'train', 'test'}) – Which split to load.
**kwargs – Loader-specific options forwarded to the concrete loader.
- Return type:
TSCData
- Returns:
TSCData – Loaded dataset for the requested split.
- Raises:
RuntimeError – Implementations may raise this when the split is not available or when the underlying I/O fails.
- abstractmethod describe()
Return a small metadata dictionary describing available datasets.
The returned dictionary should contain enough information for discovery and logging (for example, available dataset names, default paths, and per-split summaries). The exact structure is loader-specific but should be JSON-serializable.
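The loader contract can be illustrated with Python's abc module. To keep the example self-contained, SketchLoader is a stand-in that mirrors the documented interface rather than importing tscf_eval.DataLoader, and SyntheticLoader is a hypothetical subclass that returns raw (X, y) tuples instead of TSCData. The (train, test) return order of load_both is an assumption.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchLoader(ABC):
    """Stand-in for the DataLoader contract; not the tscf_eval class itself."""

    @abstractmethod
    def load(self, split, **kwargs):
        ...

    @abstractmethod
    def describe(self):
        ...

    def load_both(self, **kwargs):
        # Assumed behavior: return (train, test) in that order.
        return self.load("train", **kwargs), self.load("test", **kwargs)


class SyntheticLoader(SketchLoader):
    """Hypothetical loader that generates random (X, y) pairs."""

    def __init__(self, n=10, length=32, seed=0):
        self.n, self.length, self.seed = n, length, seed

    def load(self, split, **kwargs):
        if split not in ("train", "test"):
            raise RuntimeError(f"unknown split {split!r}")
        # Use a different seed per split so train and test differ.
        rng = np.random.default_rng(self.seed + (split == "test"))
        X = rng.normal(size=(self.n, self.length))
        y = rng.integers(0, 2, size=self.n)
        return X, y  # a real loader would wrap these in TSCData

    def describe(self):
        # JSON-serializable metadata, as the contract asks.
        return {"name": "synthetic", "splits": ["train", "test"], "series_length": self.length}


train, test = SyntheticLoader(n=4).load_both()
```

Because both methods are abstract, instantiating the base class directly raises TypeError, which catches incomplete subclasses early.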
UCR Loader
- class tscf_eval.UCRLoader
Bases: DataLoader
Loader for time-series classification datasets from the UCR archive.
This loader delegates to the aeon library’s dataset utilities (aeon.datasets.load_classification). The aeon package must be installed for this loader to work.
- Parameters:
dataset_name (str) – Name of the UCR dataset (e.g., ‘ItalyPowerDemand’, ‘GunPoint’).
- load(split, **kwargs)
Load a split (‘train’ or ‘test’) of the dataset using aeon.
- Parameters:
split ({'train', 'test'}) – Which split to load.
**kwargs – Additional arguments forwarded to the underlying aeon loader.
- Return type:
TSCData
- Returns:
TSCData – Dataset container with feature array X and labels y. For univariate datasets, X has shape (N, T). For multivariate datasets, X has shape (N, C, T), where C is the number of channels (dimensions).
- describe()
Return a compact description of the dataset.
The description contains per-split metadata (from TSCData.describe()) and an overall summary (currently, the combined number of classes observed across splits).
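Because a class may appear in only one split, the combined count is the size of the union of the two label sets rather than either split's n_classes. A minimal sketch, assuming the summary is computed this way; combined_class_count is a hypothetical helper.

```python
import numpy as np


def combined_class_count(y_train, y_test):
    """Count distinct labels observed in either split (hypothetical helper)."""
    return int(len(np.union1d(np.asarray(y_train), np.asarray(y_test))))


print(combined_class_count(["a", "b", "b"], ["b", "c"]))  # 3
```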
File Loader
- class tscf_eval.FileLoader
Bases: DataLoader
Load a wide-format CSV/XLSX file (or a pair of files) as TSCData.
- Supports two modes:
Two-file mode: provide train_path and test_path.
Single-file mode: provide data_path and split_col, which indicates whether each row belongs to train or test.
The table should be in wide format: one row per instance, numeric columns representing time points (or flattened channels), and a separate label column.
- __init__(*, train_path=None, test_path=None, data_path=None, split_col=None, train_value='train', test_value='test', label_col=None, feature_cols=None, sheet_name=None, name='local_wide')
Initialize a file-based loader.
- Parameters:
train_path, test_path (str or pathlib.Path, optional) – Paths to separate train and test files (two-file mode). Mutually exclusive with data_path.
data_path (str or pathlib.Path, optional) – Path to a single file containing both splits; requires split_col.
split_col (str, optional) – Name of the column in the data_path file that indicates split membership.
train_value, test_value (str) – Values in split_col that mark train and test rows.
label_col (str) – Name of the column containing the labels (required).
feature_cols (sequence of str, optional) – Explicit list of feature columns to use.
sheet_name (str or int, optional) – Sheet to read when loading Excel files.
name (str) – Dataset name assigned to the produced TSCData objects.
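The mode rules for these arguments can be expressed as a small validation function. This is a hypothetical sketch: resolve_mode is not part of tscf_eval, and both the choice of ValueError and the requirement that two-file mode receive both paths are assumptions beyond what the documentation states.

```python
def resolve_mode(train_path=None, test_path=None, data_path=None, split_col=None, label_col=None):
    """Hypothetical validation of FileLoader's two input modes."""
    if label_col is None:
        raise ValueError("label_col is required")
    two_file = train_path is not None or test_path is not None
    # The two modes are mutually exclusive.
    if two_file and data_path is not None:
        raise ValueError("use either train_path/test_path or data_path, not both")
    if two_file:
        if train_path is None or test_path is None:
            raise ValueError("two-file mode needs both train_path and test_path")
        return "two-file"
    if data_path is None:
        raise ValueError("provide train_path/test_path or data_path")
    # Single-file mode needs a column telling train rows from test rows.
    if split_col is None:
        raise ValueError("data_path requires split_col")
    return "single-file"


print(resolve_mode(data_path="all.csv", split_col="split", label_col="label"))  # single-file
```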
- load(split, **kwargs)
Load the requested split and return a TSCData.
- Parameters:
split ({'train', 'test'}) – Which split to load.
**kwargs – Additional options (currently unused).
- Return type:
TSCData
- Returns:
TSCData – Dataset container with feature array X and labels y. X has shape (N, T), where N is the number of instances and T is the number of time points (feature columns).
- Raises:
ValueError – If split_col is specified but not found in the data file.
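In single-file mode, loading a split amounts to filtering rows on split_col. The minimal pandas sketch below reads an in-memory CSV. select_split is hypothetical, and the explicit feature-column list stands in for whatever column selection the loader actually performs. For XLSX input, pandas.read_excel with its sheet_name argument would take the place of read_csv.

```python
import io

import pandas as pd

CSV = """t_0,t_1,t_2,label,split
0.1,0.2,0.3,a,train
0.4,0.5,0.6,b,test
0.7,0.8,0.9,a,train
"""


def select_split(df, split, split_col, train_value="train", test_value="test"):
    """Hypothetical sketch of single-file mode: keep only the rows of one split."""
    if split_col not in df.columns:
        # Mirrors the documented ValueError for a missing split column.
        raise ValueError(f"split column {split_col!r} not found")
    wanted = train_value if split == "train" else test_value
    return df[df[split_col] == wanted].reset_index(drop=True)


df = pd.read_csv(io.StringIO(CSV))
train = select_split(df, "train", split_col="split")
X = train[["t_0", "t_1", "t_2"]].to_numpy()  # shape (N, T)
y = train["label"].to_numpy()
print(X.shape, list(y))  # (2, 3) ['a', 'a']
```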
- describe()
Return a concise description of the dataset(s) represented by this loader.
The return value includes per-split metadata (via TSCData.describe()) and an overall summary (the combined number of classes across splits).
References
Dau, H. A., et al. (2019). “The UCR Time Series Archive.” IEEE/CAA Journal of Automatica Sinica, 6(6), 1293-1305.