Counterfactuals Module

The counterfactuals module provides implementations of counterfactual explanation algorithms for time series classification.

All implementations wrap existing methods from the literature and provide a unified interface for benchmarking and evaluation.

Base Class

class tscf_eval.Counterfactual

Bases: ABC

Minimal base interface for counterfactual explainers.

Subclasses must implement explain. The method operates on a single instance (not batches) and returns the generated counterfactual, its predicted label, and an optional metadata dictionary describing how the counterfactual was produced.

Label Mapping

Subclasses that accept a model and data should call _init_label_mapping() during initialisation (e.g. in __post_init__). This populates _classes and enables _idx_to_label() / _label_to_idx() for converting between probability-column indices and actual class labels.
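To illustrate the index/label conversion described above, here is a minimal sketch built on a hypothetical stand-in class. It does not call the library helpers, whose signatures are not shown here. It assumes the classes are the sorted unique labels of the reference set, which is the column order a scikit-learn style predict_proba would use.

```python
import numpy as np


class LabelMapping:
    """Hypothetical stand-in for the label-mapping helpers (not the library code)."""

    def __init__(self, y_ref):
        # Assumption: probability columns follow the sorted unique labels.
        self.classes = np.unique(np.asarray(y_ref))

    def idx_to_label(self, idx):
        # Probability-column index -> actual class label.
        return self.classes[idx].item()

    def label_to_idx(self, label):
        # Actual class label -> probability-column index.
        matches = np.flatnonzero(self.classes == label)
        if matches.size == 0:
            raise ValueError(f"unknown label: {label!r}")
        return int(matches[0])


mapping = LabelMapping(y_ref=[3, 7, 7, 3, 9])
proba = np.array([0.1, 0.7, 0.2])  # one row of predict_proba output
predicted = mapping.idx_to_label(int(np.argmax(proba)))
```

Here the arg-max column is 1, which maps back to the label 7 rather than to the raw index.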

abstractmethod explain(x, y_pred=None)

Return a counterfactual for a single instance x.

Parameters:

x (ndarray) – A single time-series instance. Supported shapes include (T,), (1, T) or (1, 1, T) for compatibility with callers that may add a leading batch or channel dimension.
y_pred (int | None) – Optional precomputed predicted label for x. If None, the explainer implementation may compute the prediction from its internally held model.

Return type:

tuple[ndarray, int, dict[str, Any]]

Returns:

cf_x – Counterfactual series (shape (T,) or matching input format).
cf_label – Predicted label for the counterfactual.
meta – Metadata dictionary with information about the generation process (e.g., neighbor indices, distances, edits, timings).

explain_k(x, k=5, y_pred=None)

Generate k diverse counterfactuals for a single instance.

The default implementation calls explain k times. Subclasses may override this method to implement more sophisticated diversity mechanisms (e.g., different random seeds, target classes, or optimization restarts).

Parameters:

x (np.ndarray) – A single time-series instance.
k (int, default 5) – Number of counterfactuals to generate.
y_pred (int, optional) – Optional precomputed predicted label for x.

Return type:

tuple[ndarray, ndarray, list[dict[str, Any]]]

Returns:

cfs (np.ndarray) – Array of counterfactuals with shape (k, ...), where ... matches the shape of the input x.
cf_labels (np.ndarray) – Array of predicted labels for each counterfactual, shape (k,).
metas (list[dict]) – List of k metadata dictionaries.


Examples

>>> cfs, labels, metas = explainer.explain_k(x, k=5)
>>> cfs.shape  # (5, T) for univariate or (5, C, T) for multivariate
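The documented default, in which explain_k calls explain k times, can be sketched as follows. BaseSketch and ShiftExplainer are hypothetical stand-ins written for this example, not the tscf_eval classes, and collecting the outputs with np.stack is an assumption.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseSketch(ABC):
    """Hypothetical mirror of the documented interface."""

    @abstractmethod
    def explain(self, x, y_pred=None):
        """Return (cf_x, cf_label, meta) for a single instance."""

    def explain_k(self, x, k=5, y_pred=None):
        # Default behaviour: call explain k times and collect the outputs.
        results = [self.explain(x, y_pred=y_pred) for _ in range(k)]
        cfs = np.stack([cf for cf, _, _ in results])
        cf_labels = np.array([label for _, label, _ in results])
        metas = [meta for _, _, meta in results]
        return cfs, cf_labels, metas


class ShiftExplainer(BaseSketch):
    """Toy explainer: adds a random offset and reports a fixed label."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def explain(self, x, y_pred=None):
        offset = self.rng.normal()
        return x + offset, 1, {"offset": float(offset)}


x = np.zeros(8)
cfs, labels, metas = ShiftExplainer().explain_k(x, k=3)
```

With a deterministic explain, this default would return k identical counterfactuals, which is why subclasses are encouraged to override it.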

Implementations

CELS

Counterfactual explanations via learned saliency maps that blend the original instance with its nearest unlike neighbor. Based on Li et al. (2023).

class tscf_eval.CELS

Bases: Counterfactual

CELS counterfactual generator via learned saliency maps.

Implementation of the CELS algorithm by Li et al. (2023) [cels1].

CELS learns a saliency map theta that blends the original instance with its nearest unlike neighbor (NUN) to produce a counterfactual explanation. The saliency map is optimized to minimize a composite loss balancing prediction change, sparsity, and temporal smoothness.

The counterfactual is computed as:

x' = x * (1 - theta) + NUN * theta

After optimization, theta is binarized to produce contiguous edits.
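The blending formula and the binarization step can be illustrated with a few lines of NumPy. The arrays below are made-up toy values, and the saliency map is given rather than learned.

```python
import numpy as np

x = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])      # original instance
nun = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])    # nearest unlike neighbor
theta = np.array([0.1, 0.2, 0.8, 0.9, 0.7, 0.3])  # saliency map in [0, 1]

# Soft blend used during optimization: x' = x * (1 - theta) + NUN * theta
cf_soft = x * (1.0 - theta) + nun * theta

# Post-processing: binarize theta at the threshold (0.5 by default).
threshold = 0.5
mask = (theta > threshold).astype(float)
cf_hard = x * (1.0 - mask) + nun * mask

# Fraction of time steps taken from the NUN.
mask_density = float(mask.mean())
```

In this toy case the binarized mask selects the contiguous steps 2 to 4, so half of the series is copied from the NUN.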

Parameters:

model (object) – A classifier with a probability estimator (predict_proba or a compatible interface). Must be differentiable or approximable.
data (tuple (X_ref, y_ref)) – Reference dataset used for finding nearest unlike neighbors.
budget_coeff (float, default 0.6) – Weight a of the budget (sparsity) loss term L_Budget = a * mean(|theta|). Higher values produce sparser saliency maps.
tv_coeff (float, default 0.5) – Weight b of the total variation loss term L_TV = b * mean(|theta_t - theta_{t+1}|^tv_beta). Higher values produce temporally smoother saliency maps.
max_coeff (float, default 0.7) – Weight g of the prediction loss term L_Max = g * (1 - P(target | x')). Higher values prioritize changing the prediction.
tv_beta (float, default 3.0) – Exponent for the total variation norm. Higher values penalize large jumps more aggressively while tolerating small ones.
gradient_subsample (int or None, default 50) – Number of features to randomly sample for gradient computation each iteration. Uses stochastic gradient descent when set to a value less than the total number of features. Set to None to use all features (full gradient). Lower values speed up computation but may require more iterations to converge.
learning_rate (float, default 0.1) – Step size for Adam optimizer. Internally scaled by data standard deviation so the effective step adapts to input magnitude.
max_iter (int, default 5000) – Maximum number of optimization iterations.
tau (float, default 0.5) – Decision threshold for the target class probability. The prediction is treated as flipped once P(target_class) >= tau.
patience (int, default 30) – Number of consecutive iterations with P(target_class) >= tau required before stopping. This lets the optimizer keep refining the saliency map after the prediction flips.
tolerance (float, default 1e-4) – Convergence tolerance for total loss change between iterations.
threshold (float, default 0.5) – Binarization threshold for the saliency map during post-processing. Values of theta above this become 1 (use NUN), below become 0 (keep original).
random_state (int or None, default 0) – PRNG seed for reproducible optimization.
temperature (float or None, default None) – Temperature scaling for soft probability computation. Higher values produce smoother gradients by preventing sigmoid saturation when decision function values are large. If None, auto-calibrates based on model decision function values.
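As a rough sketch of how the three weighted terms interact, the function below evaluates them for a univariate saliency map. Summing the terms into one scalar is an assumption, cels_loss_sketch is a hypothetical name, and the target probability is passed in rather than computed from a model.

```python
import numpy as np


def cels_loss_sketch(theta, target_prob, budget_coeff=0.6, tv_coeff=0.5,
                     max_coeff=0.7, tv_beta=3.0):
    """Sum of the three documented terms (the summation is an assumption)."""
    theta = np.asarray(theta, dtype=float)
    l_budget = budget_coeff * np.mean(np.abs(theta))
    l_tv = tv_coeff * np.mean(np.abs(theta[:-1] - theta[1:]) ** tv_beta)
    l_max = max_coeff * (1.0 - target_prob)
    return float(l_budget + l_tv + l_max)


# Two maps with the same budget: one contiguous block versus two isolated spikes.
smooth = cels_loss_sketch([0.0, 0.0, 1.0, 1.0, 0.0, 0.0], target_prob=0.9)
jagged = cels_loss_sketch([0.0, 1.0, 0.0, 1.0, 0.0, 0.0], target_prob=0.9)
```

Both maps activate two time steps, but the jagged one has four jumps instead of two, so its total variation term and overall loss are higher.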

predict_proba

Wrapped probability prediction function.

Type: callable

rng

Random number generator for reproducibility.

Type: numpy.random.Generator

X_ref

Reference dataset features.

Type: np.ndarray

y_ref

Reference dataset labels.

Type: np.ndarray

References

[cels1]

Li, P., Tang, B., & Ning, Y. (2023). CELS: Counterfactual Explanation of Time-Series via Learned Saliency Maps. In Proceedings of the IEEE International Conference on Big Data 2023, pp. 1952-1957. IEEE. https://github.com/Luckilyeee/CELS

model: Any

data: tuple[ndarray, ndarray]

budget_coeff: float = 0.6

tv_coeff: float = 0.5

max_coeff: float = 0.7

tv_beta: float = 3.0

gradient_subsample: int | None = 50

learning_rate: float = 0.1

max_iter: int = 5000

tau: float = 0.5

patience: int = 30

tolerance: float = 0.0001

threshold: float = 0.5

random_state: int | None = 0

temperature: float | None = None

__post_init__()

Initialise probability wrapper, RNG, reference data, and label mapping.

Validates all hyperparameters and computes normalisation statistics from the reference dataset. Warns if the model is unlikely to work well with gradient-based optimisation.

Return type: None

explain(x, y_pred=None, *, class_of_interest=None)

Generate a counterfactual explanation via learned saliency map.

Parameters:

x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Base predicted class for x. If None, computed via the model.
class_of_interest (int, optional) – Target class for the counterfactual. If None, uses the highest-probability alternative to y_pred.

Return type:

tuple[ndarray, int, dict[str, Any]]

Returns:

cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
- method: Algorithm identifier ('cels').
- class_of_interest: Target class.
- nun_index_in_ref: Index of NUN in reference dataset.
- n_iterations: Number of iterations performed.
- converged: Whether optimization converged.
- final_target_prob: Final probability of target class.
- final_loss: Final composite loss value.
- mask_density: Fraction of saliency map above threshold.
- validity: Whether the counterfactual class differs from base.


explain_k(x, k=5, y_pred=None, *, class_of_interest=None)

Generate k diverse counterfactuals using different NUNs.

CELS supports diverse counterfactual generation by using different nearest unlike neighbors as the blending source. Each counterfactual is generated with a different NUN, producing structurally diverse explanations while keeping the learned saliency map approach.
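One way to obtain k different blending sources is to rank the unlike reference instances by distance and keep the k closest. The sketch below assumes Euclidean distance on flattened series and treats any reference instance whose label differs from y_pred as unlike. k_nearest_unlike is a hypothetical helper, not a library function.

```python
import numpy as np


def k_nearest_unlike(x, X_ref, y_ref, y_pred, k=5):
    """Indices (into X_ref) of the k closest instances whose label != y_pred."""
    X_flat = X_ref.reshape(len(X_ref), -1)
    dists = np.linalg.norm(X_flat - x.reshape(-1), axis=1)
    unlike = np.flatnonzero(y_ref != y_pred)
    order = unlike[np.argsort(dists[unlike], kind="stable")]
    return order[:k]


X_ref = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.5, 0.5], [3.0, 3.0]])
y_ref = np.array([0, 1, 1, 0, 1])
x = np.array([0.1, 0.1])

nun_indices = k_nearest_unlike(x, X_ref, y_ref, y_pred=0, k=2)
```

Each returned index would then serve as the NUN for one counterfactual.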

Parameters:

x (np.ndarray) – Input time series of shape (T,) or (C, T).
k (int, default 5) – Number of counterfactuals to generate.
y_pred (int, optional) – Precomputed predicted label for x.
class_of_interest (int, optional) – Target class for counterfactuals.

Return type:

tuple[ndarray, ndarray, list[dict[str, Any]]]

Returns:

cfs (np.ndarray) – Array of k counterfactuals with shape (k, ...).
cf_labels (np.ndarray) – Array of k predicted labels.
metas (list[dict]) – List of k metadata dictionaries.


__init__(model, data, budget_coeff=0.6, tv_coeff=0.5, max_coeff=0.7, tv_beta=3.0, gradient_subsample=50, learning_rate=0.1, max_iter=5000, tau=0.5, patience=30, tolerance=0.0001, threshold=0.5, random_state=0, temperature=None)

Parameters:

model (Any)
data (tuple[ndarray, ndarray])
budget_coeff (float)
tv_coeff (float)
max_coeff (float)
tv_beta (float)
gradient_subsample (int | None)
learning_rate (float)
max_iter (int)
tau (float)
patience (int)
tolerance (float)
threshold (float)
random_state (int | None)
temperature (float | None)

Return type:

None

CoMTE

Counterfactual explanations for Multivariate Time series using greedy channel substitution from distractor series. Based on Ates et al. (2021).

class tscf_eval.COMTE

Bases: Counterfactual

CoMTE (Sequential Greedy) counterfactual generator for time-series.

Implementation of the CoMTE algorithm by Ates et al. (2021) [comte1].

Produces counterfactuals by greedily replacing whole variables (channels) from distractor series drawn from a reference set. Distractors are selected among reference instances predicted as the target class. For each distractor the algorithm performs a sequential greedy search that replaces channels one-by-one, choosing at each step the channel swap that most increases the model probability f_c of the target class. The best counterfactual across distractors is chosen using the paper’s loss:

L = max(0, tau - f_c)^2 + lambda_reg * max(0, n_vars - delta)
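The loss above and the sequential greedy search for a single distractor can be sketched as follows. The function names and the toy probability function are hypothetical, and stopping as soon as f_c reaches tau is an assumption about the search rather than a statement of the library's behaviour.

```python
import numpy as np


def comte_loss(f_c, n_vars, tau=0.95, delta=3, lambda_reg=0.8):
    """L = max(0, tau - f_c)^2 + lambda_reg * max(0, n_vars - delta)."""
    return max(0.0, tau - f_c) ** 2 + lambda_reg * max(0, n_vars - delta)


def greedy_channel_swap(x, distractor, target_prob_fn, tau=0.95):
    """Replace whole channels of x with the distractor's, one per step."""
    cf = x.copy()
    edited = []
    remaining = list(range(x.shape[0]))
    while remaining and target_prob_fn(cf) < tau:
        # Try every remaining channel and keep the swap with the best f_c.
        scores = []
        for c in remaining:
            trial = cf.copy()
            trial[c] = distractor[c]
            scores.append(target_prob_fn(trial))
        best = remaining[int(np.argmax(scores))]
        cf[best] = distractor[best]
        edited.append(best)
        remaining.remove(best)
    return cf, edited


# Toy probability: rises with the means of channels 0 and 2 (illustrative only).
def toy_prob(series):
    return float(np.clip(0.6 * series[0].mean() + 0.4 * series[2].mean(), 0, 1))


x = np.zeros((3, 4))
distractor = np.ones((3, 4))
cf, edited = greedy_channel_swap(x, distractor, toy_prob)
loss = comte_loss(toy_prob(cf), n_vars=len(edited))
```

The search swaps channel 0 first, then channel 2, and leaves the uninformative channel 1 untouched. Both loss terms are zero because the threshold is met with fewer than delta edits.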

Supported distances:

'dtw' : multivariate DTW via dtw_distance_vec_multich
'euclidean' : Euclidean distance on flattened series
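To show how the two metrics differ, the sketch below compares a flattened Euclidean distance with a textbook dynamic-programming DTW averaged over channels. It is not the library's dtw_distance_vec_multich, and all function names here are hypothetical.

```python
import numpy as np


def euclidean_flat(a, b):
    """Euclidean distance between two series after flattening."""
    return float(np.linalg.norm(np.ravel(a) - np.ravel(b)))


def dtw_1d(a, b):
    """Textbook O(len(a) * len(b)) DTW with absolute-difference cost."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def dtw_multichannel(a, b):
    """Average of per-channel DTW distances for (C, T) inputs."""
    a, b = np.atleast_2d(a), np.atleast_2d(b)
    return float(np.mean([dtw_1d(ca, cb) for ca, cb in zip(a, b)]))


a = np.array([[0.0, 1.0, 2.0, 1.0, 0.0]])
b = np.array([[0.0, 0.0, 1.0, 2.0, 1.0]])  # same bump, shifted by one step

d_euclid = euclidean_flat(a, b)
d_dtw = dtw_multichannel(a, b)
```

For this shifted pair the Euclidean distance is 2.0 while the DTW distance is 1.0, because warping absorbs most of the shift.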

Parameters:

model (object) – A classifier with a probability estimator (predict_proba or a compatible interface). The helper predict_proba_fn wraps model inference.
data (tuple (X_ref, y_ref)) – Reference dataset used to select distractors.
distance ({'euclidean', 'dtw'}, default 'dtw') – Distance metric to find nearest distractors.
- 'euclidean': Euclidean distance on flattened vectors. Faster but ignores temporal alignment.
- 'dtw': Dynamic Time Warping distance (per-channel, averaged). Respects temporal shifts and is recommended for time series.
n_distractors (int, default 10) – Maximum number of distractors to try.
tau (float, default 0.95) – Target probability threshold for the class of interest c.
delta (int, default 3) – Preferred number of variable edits (paper’s sweet spot). The loss penalizes each edited variable beyond delta.
lambda_reg (float, default 0.8) – Regularization weight in the paper’s loss.
random_state (int or None, default 0) – Seed for reproducible distractor tie-breaking.

References

[comte1]

Ates, E., Aksar, B., Leung, V. J., & Coskun, A. K. (2021). Counterfactual Explanations for Multivariate Time Series. ICAPAI 2021. https://github.com/peaclab/CoMTE

model: Any

data: tuple[ndarray, ndarray]

distance: Literal['euclidean', 'dtw'] = 'dtw'

n_distractors: int = 10

tau: float = 0.95

delta: int = 3

lambda_reg: float = 0.8

random_state: int | None = 0

__post_init__()

Initialise probability wrapper, RNG, reference data, and label mapping.

Validates all hyperparameters and pre-computes reference-set predictions to avoid redundant calls during distractor selection.

Raises: ValueError – If distance is not in {'euclidean', 'dtw'}, n_distractors < 1, tau is outside (0, 1], delta < 1, or lambda_reg < 0.

explain(x, y_pred=None, *, class_of_interest=None)

Generate a counterfactual toward a class of interest.

Parameters:

x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Base predicted class for x. If None, computed via the model.
class_of_interest (int, optional) – Target class for the counterfactual. If None, uses the highest-probability alternative to y_pred.

Return type:

tuple[ndarray, int, dict[str, Any]]

Returns:

cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
- method: Algorithm identifier ('comte_greedy').
- distance: Distance metric used.
- class_of_interest: Target class.
- tau, delta, lambda_reg: Algorithm parameters.
- distractor_index_in_ref: Index of selected distractor.
- distractor_distance: Distance to selected distractor.
- edits_variables: List of edited channel indices.
- target_prob: Final target class probability.
- loss: Final loss value.


explain_k(x, k=5, y_pred=None, *, class_of_interest=None)

Generate k diverse counterfactuals using different distractors.

COMTE naturally supports diverse counterfactual generation by using different distractor instances from the reference set. Each counterfactual is generated using a different distractor, producing structurally diverse explanations.

Parameters:

x (np.ndarray) – Input time series.
k (int, default 5) – Number of counterfactuals to generate.
y_pred (int, optional) – Precomputed predicted label for x.
class_of_interest (int, optional) – Target class for counterfactuals.

Return type:

tuple[ndarray, ndarray, list[dict[str, Any]]]

Returns:

cfs (np.ndarray) – Array of k counterfactuals.
cf_labels (np.ndarray) – Array of k predicted labels.
metas (list[dict]) – List of k metadata dictionaries.


__init__(model, data, distance='dtw', n_distractors=10, tau=0.95, delta=3, lambda_reg=0.8, random_state=0)

Parameters:

model (Any)
data (tuple[ndarray, ndarray])
distance (Literal['euclidean', 'dtw'])
n_distractors (int)
tau (float)
delta (int)
lambda_reg (float)
random_state (int | None)

Return type:

None

NativeGuide

Instance-based counterfactual explanations using nearest unlike neighbor guidance with DTW barycenter averaging. Based on Delaney et al. (2021).

class tscf_eval.NativeGuide

Bases: Counterfactual

NativeGuide counterfactual generator for time-series.

Implementation of the NativeGuide algorithm by Delaney et al. (2021) [ng1].

The algorithm retrieves a “native guide” (nearest unlike neighbor, NUN) from a reference set and then generates the counterfactual according to method:

'blend' (original paper): Blends the query with the NUN using weighted DTW barycenter averaging, incrementally increasing the guide’s influence until the prediction flips.
'ng': Copies a contiguous window from the NUN into the query, growing the window until the prediction flips.
'dtw_dba': Like 'ng', but uses a DTW-DBA barycenter of k unlike neighbors as the guide.
'cam': Like 'ng', but uses a CAM importance function to select the discriminative window.
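The window-growing idea behind the 'ng' variant can be sketched as below. Scanning every start position for each window length is an assumption made for illustration, and grow_window_cf and toy_label are hypothetical names, not part of the library.

```python
import numpy as np


def grow_window_cf(x, nun, predict_label, y_pred):
    """Copy ever longer NUN windows into x until the label changes."""
    T = x.shape[-1]
    for window_len in range(1, T + 1):
        for start in range(0, T - window_len + 1):
            cf = x.copy()
            cf[..., start:start + window_len] = nun[..., start:start + window_len]
            if predict_label(cf) != y_pred:
                return cf, start, window_len
    return nun.copy(), 0, T  # fall back to the guide itself


# Toy classifier: class 1 when the series mean exceeds 0.25.
def toy_label(series):
    return int(series.mean() > 0.25)


x = np.zeros(10)
nun = np.ones(10)
cf, window_start, window_len = grow_window_cf(x, nun, toy_label, y_pred=0)
```

In this toy setting the prediction flips once three time steps have been copied from the guide.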

Parameters:

model (object) – A classifier-like object that exposes a probability estimator. The internal helper predict_proba_fn adapts common interfaces (e.g. scikit-learn, aeon).
data (tuple) – A tuple (X_ref, y_ref) containing the reference dataset used to select distractors. X_ref can have shape (N, T) or (N, C, T).
method ({'blend', 'ng', 'dtw_dba', 'cam'}, default 'blend') – Strategy for counterfactual generation:
- 'blend': Original paper method. Weighted averaging of query and NUN using DTW barycenter, incrementally increasing NUN influence.
- 'ng': Window replacement using nearest-unlike neighbor.
- 'dtw_dba': Window replacement using DTW-DBA barycenter of k neighbors.
- 'cam': Window replacement guided by CAM importance function.
distance ({'euclidean', 'dtw'}, default 'dtw') – Distance metric used to rank distractors when searching the reference set.
- 'euclidean': Euclidean distance on flattened vectors. Faster but ignores temporal alignment.
- 'dtw': Dynamic Time Warping distance (per-channel, averaged). Respects temporal shifts and is recommended for time series.
k_unlike (int, default 5) – Number of unlike neighbors to consider when computing a DTW-DBA guide.
random_state (int or None, default 0) – PRNG seed for deterministic behaviour where applicable.
beta_step (float, default 0.01) – For method='blend': increment for the blending weight beta at each iteration (original paper uses 0.01).
target_prob (float, default 0.5) – For method='blend': target probability threshold for the counterfactual class (original paper uses 0.5).
cam_importance_fn (callable or None, default None) – Required when method='cam': a function with signature (series, y_pred) -> np.ndarray that returns an importance map of shape (T,) or (C, T).
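The roles of beta_step and target_prob in the 'blend' method can be sketched with a simplified loop. Note the substitution: this example uses a plain linear blend, whereas the documented method uses weighted DTW barycenter averaging. blend_cf and toy_target_prob are hypothetical names.

```python
import numpy as np


def blend_cf(x, nun, target_prob_fn, beta_step=0.01, target_prob=0.5):
    """Increase the guide's weight beta until the target probability is reached."""
    n_steps = int(round(1.0 / beta_step))
    for step in range(1, n_steps + 1):
        beta = step * beta_step
        # Plain linear blend; the documented method uses weighted DTW
        # barycenter averaging instead.
        cf = (1.0 - beta) * x + beta * nun
        if target_prob_fn(cf) >= target_prob:
            return cf, beta
    return nun.copy(), 1.0


# Toy probability of the target class: the series mean, clipped to [0, 1].
def toy_target_prob(series):
    return float(np.clip(series.mean(), 0.0, 1.0))


x = np.zeros(20)
nun = np.ones(20)
cf, beta = blend_cf(x, nun, toy_target_prob, beta_step=0.1, target_prob=0.5)
```

A smaller beta_step yields counterfactuals closer to the query at the cost of more model evaluations.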

Notes

The public API is explain(x, y_pred=None) -> (cf, cf_label, meta). The returned meta dictionary contains keys such as nun_index_in_X, neighbor_indices, neighbor_distance, window_start, window_len, and beta (for blend method).

References

[ng1]

Delaney, E., Greene, D., & Keane, M. T. (2021). Instance-Based Counterfactual Explanations for Time Series Classification. ICCBR 2021. https://github.com/e-delaney/Instance-Based_CFE_TSC

model: Any

data: tuple[ndarray, ndarray]

method: Literal['blend', 'ng', 'dtw_dba', 'cam'] = 'blend'

distance: Literal['euclidean', 'dtw'] = 'dtw'

k_unlike: int = 5

random_state: int | None = 0

beta_step: float = 0.01

target_prob: float = 0.5

cam_importance_fn: Callable[[ndarray, int], ndarray] | None = None

__post_init__()

Initialise probability wrapper, RNG, reference data, and label mapping.

Validates all hyperparameters, pre-computes reference-set predictions, and checks method-specific requirements (e.g. cam_importance_fn when method='cam').

Raises: ValueError – If X_ref and y_ref have mismatched sample counts, method or distance is not in the allowed set, beta_step or target_prob is outside (0, 1], or method='cam' is used without a cam_importance_fn.

explain(x, y_pred=None)

Generate a counterfactual explanation for a time series instance.

Parameters:

x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Precomputed predicted class for x. If None, computed via the model.

Return type:

tuple[ndarray, int, dict[str, Any]]

Returns:

cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
- method: Algorithm variant used.
- distance: Distance metric used.
- nun_index_in_X: Index of nearest unlike neighbor.
- neighbor_indices: Indices of neighbors (for dtw_dba).
- neighbor_distance: Distance to nearest unlike neighbor.
- beta: Blending weight (for blend method, else None).
- window_start: Start of replacement window (else None).
- window_len: Length of replacement window (else None).


explain_k(x, k=5, y_pred=None)

Generate k diverse counterfactuals using different unlike neighbors.

NativeGuide naturally supports diverse counterfactual generation by using different unlike neighbors as guides. Each counterfactual is generated using a different neighbor, producing structurally diverse explanations.

Parameters:

x (np.ndarray) – Input time series of shape (T,) or (C, T).
k (int, default 5) – Number of counterfactuals to generate.
y_pred (int, optional) – Precomputed predicted label for x.

Return type:

tuple[ndarray, ndarray, list[dict[str, Any]]]

Returns:

cfs (np.ndarray) – Array of k counterfactuals with shape (k, ...).
cf_labels (np.ndarray) – Array of k predicted labels.
metas (list[dict]) – List of k metadata dictionaries.


__init__(model, data, method='blend', distance='dtw', k_unlike=5, random_state=0, beta_step=0.01, target_prob=0.5, cam_importance_fn=None)

Parameters:

model (Any)
data (tuple[ndarray, ndarray])
method (Literal['blend', 'ng', 'dtw_dba', 'cam'])
distance (Literal['euclidean', 'dtw'])
k_unlike (int)
random_state (int | None)
beta_step (float)
target_prob (float)
cam_importance_fn (Callable[[ndarray, int], ndarray] | None)

Return type:

None

SETS

Shapelet-based counterfactual explanations using class-specific shapelet manipulation with contiguous perturbations. Based on Bahri et al. (2022).

class tscf_eval.SETS

Bases: Counterfactual

SETS counterfactual generator using class-specific shapelets.

Implementation of the SETS algorithm by Bahri et al. (2022) [sets1].

SETS leverages the inherent interpretability of shapelets to produce counterfactual explanations with contiguous, visually meaningful perturbations. The preprocessing phase discovers class-exclusive shapelets and their typical occurrence positions; the generation phase removes original-class shapelets and introduces target-class shapelets to flip the classifier prediction.
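The core edit, locating where a shapelet occurs and overwriting that segment, can be sketched as follows. The example assumes plain Euclidean matching without normalization and uses made-up shapelets. best_match is a hypothetical helper, and the real pipeline relies on a fitted shapelet transform.

```python
import numpy as np


def best_match(series, shapelet):
    """Start index and distance of the window closest to the shapelet."""
    length = len(shapelet)
    dists = np.array([
        np.linalg.norm(series[i:i + length] - shapelet)
        for i in range(len(series) - length + 1)
    ])
    start = int(np.argmin(dists))
    return start, float(dists[start])


series = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0])
original_class_shapelet = np.array([1.0, 2.0, 1.0])
target_class_shapelet = np.array([-1.0, -2.0, -1.0])

start, dist = best_match(series, original_class_shapelet)

# Overwrite the matched segment with a target-class shapelet (contiguous edit).
cf = series.copy()
cf[start:start + len(target_class_shapelet)] = target_class_shapelet
```

The perturbation stays confined to one contiguous window, which is what makes shapelet-based edits easy to inspect visually.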

Parameters:

model (object) – A classifier with predict_proba (or compatible interface).
data (tuple (X_ref, y_ref)) – Reference dataset for shapelet extraction and NUN lookup.
n_shapelet_samples (int, default 10000) – Number of candidate shapelets to evaluate during extraction.
max_shapelets (int or None, default None) – Maximum shapelets to retain. None uses aeon’s default (min(10 * n_cases, 1000)).
min_shapelet_length (int, default 3) – Minimum shapelet length.
max_shapelet_length (int or None, default None) – Maximum shapelet length. None uses the full series length.
time_limit_in_minutes (float, default 0.0) – Time budget for shapelet extraction (0 = use n_shapelet_samples).
threshold_percentile (float, default 10.0) – Bottom percentile of per-shapelet scaled distances used as the occlusion threshold. Lower values are stricter.
max_combination_dims (int, default 3) – Maximum number of dimensions to combine when single-dimension edits fail. Caps the combinatorial search at C(D, k) for k ≤ max_combination_dims.
random_state (int or None, default 0) – PRNG seed for reproducibility.
n_jobs (int, default 1) – Number of parallel jobs for shapelet extraction.
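To give a sense of how max_combination_dims bounds the search, the sketch below enumerates channel subsets up to the cap. Including the single-channel subsets in the same enumeration and the order of enumeration are assumptions. dimension_subsets is a hypothetical helper.

```python
from itertools import combinations
from math import comb


def dimension_subsets(n_dims, max_combination_dims=3):
    """Yield channel subsets of size 1, 2, ... up to the cap."""
    for k in range(1, min(max_combination_dims, n_dims) + 1):
        yield from combinations(range(n_dims), k)


subsets = list(dimension_subsets(n_dims=5, max_combination_dims=3))
expected = sum(comb(5, k) for k in range(1, 4))  # C(5,1) + C(5,2) + C(5,3) = 25
```

With 5 channels and the default cap of 3, at most 25 subsets are examined instead of all 31 non-empty ones; the saving grows quickly with the number of channels.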

predict_proba

Wrapped probability prediction function.

Type: callable

rng

Random number generator.

Type: numpy.random.Generator

X_ref

Reference dataset features.

Type: np.ndarray

y_ref

Reference dataset labels.

Type: np.ndarray

References

[sets1]

Bahri, O., Filali Boubrahimi, S., & Hamdi, S. M. (2022). Shapelet-Based Counterfactual Explanations for Multivariate Time Series. In Proceedings of the ACM SIGKDD Workshop on Mining and Learning from Time Series (KDD-MiLeTS 2022). https://github.com/omarbahri/SETS

model: Any

data: tuple[ndarray, ndarray]

n_shapelet_samples: int = 10000

max_shapelets: int | None = None

min_shapelet_length: int = 3

max_shapelet_length: int | None = None

time_limit_in_minutes: float = 0.0

threshold_percentile: float = 10.0

max_combination_dims: int = 3

random_state: int | None = 0

n_jobs: int = 1

__post_init__()

Initialise prediction wrapper, reference data, and shapelet pipeline.

Validates parameters, fits the shapelet transform, computes the occlusion threshold, assigns class-exclusive shapelets, builds heat maps, and computes per-channel information gain.

explain(x, y_pred=None, *, class_of_interest=None)

Generate a counterfactual explanation using SETS.

Parameters:

x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Base predicted class for x. If None, computed via model.
class_of_interest (int, optional) – Target class. If None, uses the highest-probability alternative to y_pred.

Return type:

tuple[ndarray, int, dict[str, Any]]

Returns:

cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
- method: 'sets'
- class_of_interest: Target class.
- nun_index_in_ref: Index of the NUN used.
- dimensions_modified: Channels edited.
- phase_a_edits: Number of Phase A replacements.
- phase_b_edits: Number of Phase B insertions.
- n_class_shapelets: Total surviving class-exclusive shapelets.
- validity: Whether the target class was achieved.
- failure_reason: None if successful, description otherwise.


explain_k(x, k=5, y_pred=None, *, class_of_interest=None)

Generate k diverse counterfactuals using different NUNs.

SETS supports diverse counterfactual generation by using different nearest unlike neighbors as the replacement source for Phase A. Each counterfactual is generated with a different NUN, producing structurally diverse explanations.

Parameters:

x (np.ndarray) – Input time series of shape (T,) or (C, T).
k (int, default 5) – Number of counterfactuals to generate.
y_pred (int, optional) – Precomputed predicted label for x.
class_of_interest (int, optional) – Target class for counterfactuals.

Return type:

tuple[ndarray, ndarray, list[dict[str, Any]]]

Returns:

cfs (np.ndarray) – Array of k counterfactuals with shape (k, ...).
cf_labels (np.ndarray) – Array of k predicted labels.
metas (list[dict]) – List of k metadata dictionaries.


__init__(model, data, n_shapelet_samples=10000, max_shapelets=None, min_shapelet_length=3, max_shapelet_length=None, time_limit_in_minutes=0.0, threshold_percentile=10.0, max_combination_dims=3, random_state=0, n_jobs=1)

Parameters:

model (Any)
data (tuple[ndarray, ndarray])
n_shapelet_samples (int)
max_shapelets (int | None)
min_shapelet_length (int)
max_shapelet_length (int | None)
time_limit_in_minutes (float)
threshold_percentile (float)
max_combination_dims (int)
random_state (int | None)
n_jobs (int)

Return type:

None

TSEvo

Evolutionary counterfactual generation using multi-objective optimization (NSGA-II) with three mutation strategies: authentic, frequency, and gaussian. Based on Höllig et al. (2022).

class tscf_eval.TSEvo

Bases: Counterfactual

TSEvo counterfactual generator using multi-objective evolutionary optimization.

Implementation of the TSEvo algorithm by Höllig et al. (2022) [tsevo1].

TSEvo uses NSGA-II (Non-dominated Sorting Genetic Algorithm II) to evolve counterfactual explanations that balance three objectives: changing the model’s prediction (validity), minimizing perturbation (proximity), and keeping changes sparse (sparsity).

The algorithm supports three mutation strategies that can be used individually or combined:

authentic: Replace windows with segments from reference series
frequency: Replace frequency bands via FFT transformation
gaussian: Apply Gaussian perturbation based on reference statistics
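The three strategies above can be sketched as simple univariate mutation operators. The function names, the choice of a single window, band, or time step per mutation, and the random reference data are all assumptions made for illustration; the library's operators may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)


def mutate_authentic(x, reference, window, rng):
    """Copy one window from a reference series into x."""
    start = int(rng.integers(0, len(x) - window + 1))
    out = x.copy()
    out[start:start + window] = reference[start:start + window]
    return out


def mutate_frequency(x, reference, band):
    """Swap one band of rFFT coefficients with the reference's."""
    fx, fr = np.fft.rfft(x), np.fft.rfft(reference)
    lo, hi = band
    fx[lo:hi] = fr[lo:hi]
    return np.fft.irfft(fx, n=len(x))


def mutate_gaussian(x, ref_mean, ref_std, rng):
    """Resample one time step from per-step reference statistics."""
    t = int(rng.integers(0, len(x)))
    out = x.copy()
    out[t] = rng.normal(ref_mean[t], ref_std[t])
    return out


T = 32
x = np.sin(np.linspace(0, 2 * np.pi, T))
X_ref = rng.normal(size=(10, T))  # stand-in for target-class series
reference = X_ref[0]

cf_a = mutate_authentic(x, reference, window=5, rng=rng)
cf_f = mutate_frequency(x, reference, band=(2, 6))
cf_g = mutate_gaussian(x, X_ref.mean(axis=0), X_ref.std(axis=0), rng=rng)
```

The authentic and Gaussian operators change a bounded number of time steps, whereas a frequency-band swap generally alters the whole series in the time domain.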

Parameters:

model (object) – A classifier with a probability estimator (predict_proba or a compatible interface). The helper predict_proba_fn wraps model inference.
data (tuple (X_ref, y_ref)) – Reference dataset used for mutation operations. Series predicted as the target class are used during evolution.
transformer ({'authentic', 'frequency', 'gaussian', 'all'}, default 'authentic') – Mutation strategy to use:
- 'authentic': Authentic opposing information (window replacement)
- 'frequency': Frequency band mapping via FFT
- 'gaussian': Gaussian perturbation from reference statistics
- 'all': Randomly select among all strategies per individual
n_generations (int, default 100) – Number of evolutionary generations.
population_size (int, default 50) – Population size (μ in NSGA-II).
crossover_prob (float, default 0.9) – Probability of applying crossover between individuals.
mutation_prob (float, default 0.6) – Probability of applying mutation to an individual.
window_sizes (tuple of int, default (5, 10, 20)) – Candidate window sizes for authentic mutation operator.
random_state (int or None, default 0) – PRNG seed for reproducible evolution.
verbose (int, default 0) – Verbosity level (0=silent, 1=progress, 2=detailed).

predict_proba

Wrapped probability prediction function.

Type: callable

rng

Random number generator for reproducibility.

Type: numpy.random.Generator

X_ref

Reference dataset features.

Type: np.ndarray

y_ref

Reference dataset labels.

Type: np.ndarray

References

[tsevo1]

Höllig, J., Kulbach, C., & Thoma, S. (2022). TSEvo: Evolutionary Counterfactual Explanations for Time Series Classification. ICMLA 2022. https://github.com/JHoelli/TSEvo

model: Any

data: tuple[ndarray, ndarray]

transformer: Literal['authentic', 'frequency', 'gaussian', 'all'] = 'authentic'

n_generations: int = 100

population_size: int = 50

crossover_prob: float = 0.9

mutation_prob: float = 0.6

window_sizes: tuple[int, ...] = (5, 10, 20)

random_state: int | None = 0

verbose: int = 0

__post_init__()

Initialise probability wrapper, RNG, reference data, and label mapping.

Validates all hyperparameters and ensures the deap package is available for evolutionary computation. Rounds population_size up to the nearest multiple of four as required by NSGA-II tournament selection.
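The rounding mentioned above amounts to a small piece of integer arithmetic. The helper below is a hypothetical illustration of that step, not the library's code.

```python
def round_up_to_multiple_of_four(population_size):
    """Smallest multiple of four that is >= population_size."""
    return ((population_size + 3) // 4) * 4


rounded_default = round_up_to_multiple_of_four(50)  # documented default -> 52
rounded_exact = round_up_to_multiple_of_four(48)    # already a multiple -> 48
```

Under this arithmetic the default population_size of 50 would become 52, so the population actually evolved can be slightly larger than the value requested.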

explain(x, y_pred=None, *, class_of_interest=None)

Generate a counterfactual explanation using evolutionary optimization.

Parameters:

x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Base predicted class for x. If None, computed via the model.
class_of_interest (int, optional) – Target class for the counterfactual. If None, uses the highest-probability alternative to y_pred.

Return type:

tuple[ndarray, int, dict[str, Any]]

Returns:

cf (np.ndarray) – Best counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
- method: Algorithm identifier ('tsevo').
- transformer: Mutation strategy used.
- class_of_interest: Target class.
- n_generations: Number of generations evolved.
- population_size: Population size used.
- objectives: Final objective values (output_dist, input_dist, sparsity).
- pareto_front_size: Number of solutions in Pareto front.
- validity: Whether prediction changed (True/False).


__init__(model, data, transformer='authentic', n_generations=100, population_size=50, crossover_prob=0.9, mutation_prob=0.6, window_sizes=(5, 10, 20), random_state=0, verbose=0)

Parameters:

model (Any)
data (tuple[ndarray, ndarray])
transformer (Literal['authentic', 'frequency', 'gaussian', 'all'])
n_generations (int)
population_size (int)
crossover_prob (float)
mutation_prob (float)
window_sizes (tuple[int, ...])
random_state (int | None)
verbose (int)

Return type:

None

Glacier

Gradient-based counterfactual generation with guided locally constrained optimization using importance-weighted proximity. Based on Wang et al. (2024).

class tscf_eval.Glacier

Bases: Counterfactual

Glacier counterfactual generator using gradient-based optimization.

Implementation of the Glacier algorithm by Wang et al. (2024) [glacier1].

Glacier uses gradient-based optimization with guided constraints to generate counterfactual explanations. The key innovation is applying importance-based weights that allow free modification of less-important time series regions while preserving critical features.

The optimization minimizes a composite loss:

L = w * L_pred + (1-w) * L_proximity

where:

- L_pred: Prediction margin loss (distance to target class probability)
- L_proximity: Weighted distance from original (importance-weighted)
- w: pred_margin_weight parameter
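
To show how the importance weights change the trade-off, the sketch below evaluates the composite loss for the same edit under two weight vectors. It assumes, purely for illustration, that L_proximity is a weighted mean absolute difference and that the value of L_pred is supplied by the caller; the exact form of either term is not specified here, and `glacier_style_loss` is a hypothetical helper rather than part of tscf_eval.

```python
from typing import Sequence


def glacier_style_loss(
    l_pred: float,
    x: Sequence[float],
    cf: Sequence[float],
    weights: Sequence[float],
    w: float = 0.75,
) -> float:
    """Compute w * L_pred + (1 - w) * L_proximity with an assumed weighted-L1 proximity."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    l_proximity = sum(wt * abs(c - o) for wt, c, o in zip(weights, cf, x)) / len(x)
    return w * l_pred + (1.0 - w) * l_proximity


x = [0.0, 0.0, 0.0, 0.0]
cf = [0.0, 0.0, 2.0, 2.0]          # the edit touches only the last two timesteps
uniform = [1.0, 1.0, 1.0, 1.0]     # every timestep is penalised equally
binary = [1.0, 1.0, 0.0, 0.0]      # assumed binary weights: last two timesteps are free to change

loss_uniform = glacier_style_loss(0.2, x, cf, uniform)
loss_binary = glacier_style_loss(0.2, x, cf, binary)
```

With the binary weights the edit incurs no proximity cost, so only the prediction term remains, which matches the idea of leaving less important regions free to change.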

Parameters:

model (object) – A classifier with a probability estimator (predict_proba or a compatible interface). Must be differentiable or approximable.
data (tuple (X_ref, y_ref)) – Reference dataset used for computing feature importance and normalization statistics.
pred_margin_weight (float, default 0.75) – Weight balancing prediction margin loss vs proximity loss. Higher values prioritize changing the prediction over staying close to the original. Range: [0, 1]. Values >= 0.75 recommended for non-neural-network classifiers where finite-difference gradients are weak relative to the proximity gradient.
learning_rate (float, default 0.01) – Step size for Adam optimizer. Internally scaled by data standard deviation so the effective step adapts to input magnitude.
max_iter (int, default 300) – Maximum number of optimization iterations.
tau (float, default 0.5) – Decision threshold for target class probability. Optimization stops when P(target_class) >= tau.
tolerance (float, default 1e-4) – Convergence tolerance for prediction margin loss.
weight_type ({'uniform', 'local', 'unconstrained'}, default 'uniform') – Type of importance weighting:
- ‘uniform’: Equal weights across all timesteps
- ‘local’: Segment-based LIME importance following the paper. Uses matrix-profile changepoint segmentation, STFT background perturbation, and Ridge regression surrogate to compute per-segment importance, producing binary timestep weights. Requires stumpy and scipy for full functionality (falls back to uniform segments / mean background otherwise).
- ‘unconstrained’: No proximity penalty (pure prediction optimization)
random_state (int or None, default 0) – PRNG seed for reproducible optimization.
gradient_subsample (int or None, default 50) – Number of features to randomly sample for gradient computation each iteration. Uses stochastic gradient descent when set to a value less than the total number of features. Set to None to use all features (full gradient). Lower values speed up computation but may require more iterations to converge.
temperature (float or None, default None) – Temperature scaling for soft probability computation. Higher values produce smoother gradients by preventing sigmoid saturation when decision function values are large. If None, auto-calibrates based on model decision function values (recommended for most use cases). Increase manually (e.g., 2.0-5.0) if counterfactuals are unchanged with ROCKET or other margin-based classifiers.
n_segments (int, default 10) – Number of changepoints for segment-based local importance (weight_type='local'). Produces n_segments + 1 segments. Ignored when weight_type is not 'local'.
segment_window (int, default 10) – Window size for the matrix-profile segmentation algorithm. Ignored when weight_type is not 'local'.
n_perturbations (int, default 100) – Number of binary perturbation samples for the LIME surrogate model used in segment-based local importance. Ignored when weight_type is not 'local'.
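
The effect of gradient_subsample can be sketched with a central finite-difference gradient that perturbs only a random subset of features per call. This is an illustrative reading of the parameter description, not the library's code: `subsampled_fd_gradient`, the `eps` step, and the use of plain Python lists (array code would normally use NumPy) are all assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence


def subsampled_fd_gradient(
    loss: Callable[[Sequence[float]], float],
    x: Sequence[float],
    gradient_subsample: Optional[int] = 50,
    eps: float = 1e-4,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Central-difference gradient of loss at x on a random subset of features.

    Features that are not sampled keep a gradient of 0.0 for this call.
    """
    rng = rng or random.Random(0)
    n = len(x)
    if gradient_subsample is None or gradient_subsample >= n:
        indices = list(range(n))  # full gradient
    else:
        indices = rng.sample(range(n), gradient_subsample)

    grad = [0.0] * n
    for i in indices:
        plus, minus = list(x), list(x)
        plus[i] += eps
        minus[i] -= eps
        # Two loss evaluations per sampled feature.
        grad[i] = (loss(plus) - loss(minus)) / (2.0 * eps)
    return grad


def squared_norm(v: Sequence[float]) -> float:
    return sum(t * t for t in v)


full = subsampled_fd_gradient(squared_norm, [1.0, 2.0, 3.0], gradient_subsample=None)
partial = subsampled_fd_gradient(squared_norm, [1.0, 2.0, 3.0], gradient_subsample=2)
```

Each call costs two loss evaluations per sampled feature, which is why a smaller subsample is cheaper per iteration but gives a noisier update.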

predict_proba

Wrapped probability prediction function.

Type: callable

rng

Random number generator for reproducibility.

Type: numpy.random.Generator

X_ref

Reference dataset features.

Type: np.ndarray

y_ref

Reference dataset labels.

Type: np.ndarray

_mean

Mean of reference data (for normalization).

Type: np.ndarray

_std

Standard deviation of reference data (for normalization).

Type: np.ndarray

References

[glacier1]

Wang, Z., Samsten, I., Miliou, I., Mochaourab, R., & Papapetrou, P. (2024). Glacier: Guided Locally Constrained Counterfactual Explanations for Time Series Classification. Machine Learning, 113(3). https://github.com/zhendong3wang/learning-time-series-counterfactuals

model: Any

data: tuple[ndarray, ndarray]

pred_margin_weight: float = 0.75

learning_rate: float = 0.01

max_iter: int = 300

tau: float = 0.5

tolerance: float = 0.0001

weight_type: Literal['uniform', 'local', 'unconstrained'] = 'uniform'

random_state: int | None = 0

gradient_subsample: int | None = 50

temperature: float | None = None

n_segments: int = 10

segment_window: int = 10

n_perturbations: int = 100

__post_init__()[source]

Initialise probability wrapper, RNG, reference data, and label mapping.

Validates all hyperparameters and computes normalisation statistics from the reference dataset. Warns if the model is unlikely to work well with gradient-based optimisation.
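
One way to read the learning_rate description ("scaled by data standard deviation") together with the normalisation statistics computed here is sketched below as a single Adam update. The standard Adam defaults (beta1, beta2, eps), the use of one scalar standard deviation, and the name `adam_step` are assumptions made for illustration; the actual scaling used internally may differ.

```python
import math
from typing import List, Sequence, Tuple


def adam_step(
    x: Sequence[float],
    grad: Sequence[float],
    m: Sequence[float],
    v: Sequence[float],
    t: int,
    learning_rate: float = 0.01,
    data_std: float = 1.0,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update with step size learning_rate * data_std; t is the 1-based step count."""
    step_size = learning_rate * data_std
    new_x, new_m, new_v = [], [], []
    for xi, gi, mi, vi in zip(x, grad, m, v):
        mi = beta1 * mi + (1.0 - beta1) * gi          # first-moment estimate
        vi = beta2 * vi + (1.0 - beta2) * gi * gi     # second-moment estimate
        m_hat = mi / (1.0 - beta1 ** t)               # bias correction
        v_hat = vi / (1.0 - beta2 ** t)
        new_x.append(xi - step_size * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_x, new_m, new_v


# On the first step Adam moves each coordinate by roughly the step size,
# so data with std 10 moves by about 0.1 instead of 0.01.
x1, m1, v1 = adam_step([1.0], [2.0], [0.0], [0.0], t=1, learning_rate=0.01, data_std=10.0)
```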

explain(x, y_pred=None, *, class_of_interest=None)[source]

Generate a counterfactual explanation using gradient-based optimization.

Parameters:

x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Base predicted class for x. If None, computed via the model.
class_of_interest (int, optional) – Target class for the counterfactual. If None, uses the highest-probability alternative to y_pred.

Return type:

tuple[ndarray, int, dict[str, Any]]

Returns:

cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
- method: Algorithm identifier ('glacier').
- weight_type: Constraint type used.
- class_of_interest: Target class.
- pred_margin_weight: Weight parameter used.
- learning_rate: Learning rate used.
- n_iterations: Number of iterations performed.
- converged: Whether optimization converged.
- final_target_prob: Final probability of target class.
- final_loss: Final composite loss value.

Parameters:

x (ndarray)
y_pred (int | None)
class_of_interest (int | None)

__init__(model, data, pred_margin_weight=0.75, learning_rate=0.01, max_iter=300, tau=0.5, tolerance=0.0001, weight_type='uniform', random_state=0, gradient_subsample=50, temperature=None, n_segments=10, segment_window=10, n_perturbations=100)

Parameters:

model (Any)
data (tuple[ndarray, ndarray])
pred_margin_weight (float)
learning_rate (float)
max_iter (int)
tau (float)
tolerance (float)
weight_type (Literal['uniform', 'local', 'unconstrained'])
random_state (int | None)
gradient_subsample (int | None)
temperature (float | None)
n_segments (int)
segment_window (int)
n_perturbations (int)

Return type:

None

LatentCF++

Gradient-based counterfactual generation with importance-weighted proximity constraints, optimizing directly in the input space. Based on Wang et al. (2021).

class tscf_eval.LatentCF[source]

Bases: Counterfactual

LatentCF++ counterfactual generator using gradient-based optimization.

Implementation of the LatentCF++ algorithm by Wang et al. (2021) [latentcf1].

LatentCF++ generates counterfactuals by optimizing in the latent space (or directly in input space when no autoencoder is provided). The algorithm balances a prediction margin loss, which drives the prediction toward the target class, against a weighted proximity loss, which keeps the counterfactual close to the original and steers changes toward less important regions.

The optimization minimizes a composite loss:

L = w * L_pred + (1-w) * L_proximity

where:

- L_pred: Mean squared error between the desired probability (1.0) and the current target-class probability
- L_proximity: Weighted mean absolute error from the original
- w: pred_margin_weight parameter
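
The two terms as described above can be written out directly. The sketch treats the target-class probability as a single scalar, so the mean squared error reduces to one squared difference; `latentcf_loss` is a hypothetical helper and plain Python lists stand in for arrays.

```python
from typing import Sequence


def latentcf_loss(
    target_prob: float,
    x: Sequence[float],
    cf: Sequence[float],
    step_weights: Sequence[float],
    pred_margin_weight: float = 0.75,
) -> float:
    """Compute w * L_pred + (1 - w) * L_proximity for one candidate counterfactual."""
    # Squared error between the desired probability (1.0) and the current one.
    l_pred = (1.0 - target_prob) ** 2
    # Weighted mean absolute error between candidate and original.
    l_proximity = sum(
        wt * abs(c - o) for wt, c, o in zip(step_weights, cf, x)
    ) / len(x)
    return pred_margin_weight * l_pred + (1.0 - pred_margin_weight) * l_proximity


x = [0.0, 0.0]
cf = [1.0, 3.0]
loss_far = latentcf_loss(0.5, x, cf, step_weights=[1.0, 1.0])
loss_near = latentcf_loss(0.9, x, cf, step_weights=[1.0, 1.0])
```

For a fixed edit, the loss falls as the target probability rises, since only the prediction term changes.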

Parameters:

model (object) – A classifier with a probability estimator (predict_proba or a compatible interface).
data (tuple (X_ref, y_ref)) – Reference dataset used for computing feature importance (for ‘global’ weight strategy) and normalization statistics.
probability (float, default 0.5) – Target probability threshold. Optimization aims for P(target) >= probability.
tolerance (float, default 1e-6) – Convergence tolerance. Optimization stops when prediction margin loss is below tolerance AND target probability is reached.
max_iter (int, default 300) – Maximum number of optimization iterations.
learning_rate (float, default 0.01) – Step size for Adam optimizer. Internally scaled by data standard deviation so the effective step adapts to input magnitude.
pred_margin_weight (float, default 0.75) – Weight balancing prediction margin loss vs proximity loss. Range: [0, 1]. Higher values prioritize changing the prediction. Values >= 0.75 recommended for non-neural-network classifiers.
step_weights ({'uniform', 'local', 'global'}, default 'uniform') – Strategy for computing importance weights:
- ‘uniform’: Equal weights across all timesteps
- ‘local’: Per-sample importance via perturbation-based sensitivity
- ‘global’: Dataset-level importance computed across reference samples
random_state (int or None, default 0) – PRNG seed for reproducible optimization.
gradient_subsample (int or None, default 50) – Number of features to randomly sample for gradient computation each iteration. Uses stochastic gradient descent when set to a value less than the total number of features. Set to None to use all features (full gradient). Lower values speed up computation but may require more iterations to converge.
temperature (float or None, default None) – Temperature scaling for soft probability computation. Higher values produce smoother gradients by preventing sigmoid saturation when decision function values are large. If None, auto-calibrates based on model decision function values (recommended for most use cases). Increase manually (e.g., 2.0-5.0) if counterfactuals are unchanged with ROCKET or other margin-based classifiers.
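
The saturation problem that temperature addresses can be seen in a small binary-case sketch. It assumes the soft probability is a sigmoid of the decision value divided by the temperature; `soft_probability` and `soft_probability_slope` are hypothetical helpers, and the auto-calibration mentioned above is not reproduced.

```python
import math


def soft_probability(decision_value: float, temperature: float = 1.0) -> float:
    """Numerically stable sigmoid(decision_value / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    z = decision_value / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_probability_slope(decision_value: float, temperature: float = 1.0) -> float:
    """Derivative of the soft probability with respect to the decision value."""
    p = soft_probability(decision_value, temperature)
    return p * (1.0 - p) / temperature


# A large margin saturates the sigmoid at temperature 1, leaving almost no gradient.
slope_t1 = soft_probability_slope(20.0, temperature=1.0)
# Raising the temperature to 5 restores a usable slope.
slope_t5 = soft_probability_slope(20.0, temperature=5.0)
```

A vanishing slope of this kind is one plausible reason a counterfactual stays identical to the input with margin-based classifiers.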

predict_proba

Wrapped probability prediction function.

Type: callable

rng

Random number generator for reproducibility.

Type: numpy.random.Generator

X_ref

Reference dataset features.

Type: np.ndarray

y_ref

Reference dataset labels.

Type: np.ndarray

_global_weights

Precomputed global weights (cached after first use).

Type: np.ndarray or None

References

[latentcf1]

Wang, Z., Samsten, I., Mochaourab, R., & Papapetrou, P. (2021). Learning Time Series Counterfactuals via Latent Space Representations. In International Conference on Discovery Science (DS 2021). https://github.com/zhendong3wang/learning-time-series-counterfactuals

model: Any

data: tuple[ndarray, ndarray]

probability: float = 0.5

tolerance: float = 1e-06

max_iter: int = 300

learning_rate: float = 0.01

pred_margin_weight: float = 0.75

step_weights: Literal['uniform', 'local', 'global'] = 'uniform'

random_state: int | None = 0

gradient_subsample: int | None = 50

temperature: float | None = None

__post_init__()[source]

Initialise probability wrapper, RNG, reference data, and label mapping.

Validates all hyperparameters. Warns if the model is unlikely to work well with gradient-based optimisation.

explain(x, y_pred=None, *, class_of_interest=None)[source]

Generate a counterfactual explanation using LatentCF++ optimization.

Parameters:

x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Base predicted class for x. If None, computed via the model.
class_of_interest (int, optional) – Target class for the counterfactual. If None, uses the highest-probability alternative to y_pred.

Return type:

tuple[ndarray, int, dict[str, Any]]

Returns:

cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
- method: Algorithm identifier ('latent_cf').
- step_weights: Weight strategy used.
- class_of_interest: Target class.
- pred_margin_weight: Weight parameter used.
- learning_rate: Learning rate used.
- n_iterations: Number of iterations performed.
- converged: Whether optimization converged.
- final_target_prob: Final probability of target class.
- final_loss: Final composite loss value.
- validity: Whether counterfactual changed prediction.

Parameters:

x (ndarray)
y_pred (int | None)
class_of_interest (int | None)

__init__(model, data, probability=0.5, tolerance=1e-06, max_iter=300, learning_rate=0.01, pred_margin_weight=0.75, step_weights='uniform', random_state=0, gradient_subsample=50, temperature=None)

Parameters:

model (Any)
data (tuple[ndarray, ndarray])
probability (float)
tolerance (float)
max_iter (int)
learning_rate (float)
pred_margin_weight (float)
step_weights (Literal['uniform', 'local', 'global'])
random_state (int | None)
gradient_subsample (int | None)
temperature (float | None)

Return type:

None

References

The counterfactual methods implemented in this module are based on the following papers:

Li, P., Tang, B., & Ning, Y. (2023). “CELS: Counterfactual Explanation of Time-Series via Learned Saliency Maps.” In Proceedings of the IEEE International Conference on Big Data 2023, pp. 1952-1957. IEEE. [Paper] [Code]
Ates, E., Aksar, B., Leung, V. J., & Coskun, A. K. (2021). “Counterfactual Explanations for Multivariate Time Series.” In Proceedings of the 2021 International Conference on Applied Artificial Intelligence (ICAPAI), pp. 1-8. [Paper] [Code]
Delaney, E., Greene, D., & Keane, M. T. (2021). “Instance-Based Counterfactual Explanations for Time Series Classification.” In Case-Based Reasoning Research and Development (ICCBR 2021), pp. 32-47. Springer. [Paper] [Code]
Bahri, O., Filali Boubrahimi, S., & Hamdi, S. M. (2022). “Shapelet-Based Counterfactual Explanations for Multivariate Time Series.” In Proceedings of the ACM SIGKDD Workshop on Mining and Learning from Time Series (KDD-MiLeTS 2022). [Paper] [Code]
Höllig, J., Kulbach, C., & Thoma, S. (2022). “TSEvo: Evolutionary Counterfactual Explanations for Time Series Classification.” In Proceedings of the 21st IEEE International Conference on Machine Learning and Applications (ICMLA 2022), pp. 29-36. [Paper] [Code]
Wang, Z., Samsten, I., Miliou, I., Mochaourab, R., & Papapetrou, P. (2024). “Glacier: Guided Locally Constrained Counterfactual Explanations for Time Series Classification.” Machine Learning, 113(3). [Paper] [Code]
Wang, Z., Samsten, I., Mochaourab, R., & Papapetrou, P. (2021). “Learning Time Series Counterfactuals via Latent Space Representations.” In International Conference on Discovery Science (DS 2021), Lecture Notes in Computer Science, vol 12986, pp. 369-384. Springer. [Paper] [Code]

The implementations also use TSInterpret as a foundation:

Höllig, J., Kulbach, C., & Thoma, S. (2023). “TSInterpret: A Python Package for the Interpretability of Time Series Classification.” Journal of Open Source Software, 8(85), 5220. [Paper]