Counterfactuals Module
The counterfactuals module provides implementations of counterfactual explanation algorithms for time series classification.
All implementations wrap existing methods from the literature and provide a unified interface for benchmarking and evaluation.
Base Class
- class tscf_eval.Counterfactual[source]
Bases: ABC
Minimal base interface for counterfactual explainers.
Subclasses must implement explain. The method operates on a single instance (not batches) and returns the generated counterfactual, its predicted label, and an optional metadata dictionary describing how the counterfactual was produced.
Label Mapping
Subclasses that accept a model and data should call _init_label_mapping() during initialisation (e.g. in __post_init__). This populates _classes and enables _idx_to_label() / _label_to_idx() for converting between probability-column indices and actual class labels.
- abstractmethod explain(x, y_pred=None)[source]
Return a counterfactual for a single instance x.
- Parameters:
x – A single time-series instance. Supported shapes include (T,), (1, T) or (1, 1, T) for compatibility with callers that may add a leading batch or channel dimension.
y_pred – Optional precomputed predicted label for x. If None, the explainer implementation may compute the prediction from its internally-held model.
- Returns:
cf_x – Counterfactual series (shape (T,) or matching input format).
cf_label – Predicted label for the counterfactual.
meta – Metadata dictionary with information about the generation process (e.g., neighbor indices, distances, edits, timings).
- explain_k(x, k=5, y_pred=None)[source]
Generate k diverse counterfactuals for a single instance.
The default implementation calls explain k times. Subclasses may override this method to implement more sophisticated diversity mechanisms (e.g., different random seeds, target classes, or optimization restarts).
- Returns:
cfs (np.ndarray) – Array of counterfactuals with shape (k, ...), where ... matches the shape of the input x.
cf_labels (np.ndarray) – Array of predicted labels for each counterfactual, shape (k,).
metas (list[dict]) – List of k metadata dictionaries.
Examples
>>> cfs, labels, metas = explainer.explain_k(x, k=5)
>>> cfs.shape  # (5, T) for univariate or (5, C, T) for multivariate
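The contract described above (a single-instance explain returning a triple, and a default explain_k that calls it k times) can be illustrated with a minimal, self-contained sketch. The classes `CounterfactualSketch` and `ShiftExplainer` are hypothetical stand-ins written for this example; they are not the tscf_eval classes and only mirror the documented return shapes.

```python
from abc import ABC, abstractmethod

import numpy as np


class CounterfactualSketch(ABC):
    """Hypothetical stand-in for the base interface (not the tscf_eval class)."""

    @abstractmethod
    def explain(self, x, y_pred=None):
        """Return (cf_x, cf_label, meta) for a single instance."""

    def explain_k(self, x, k=5, y_pred=None):
        # Default behaviour as documented: call explain k times and stack the results.
        results = [self.explain(x, y_pred=y_pred) for _ in range(k)]
        cfs = np.stack([cf for cf, _, _ in results])
        cf_labels = np.array([label for _, label, _ in results])
        metas = [meta for _, _, meta in results]
        return cfs, cf_labels, metas


class ShiftExplainer(CounterfactualSketch):
    """Toy explainer: adds a constant offset and labels by the sign of the mean."""

    def __init__(self, offset=1.0):
        self.offset = offset

    def explain(self, x, y_pred=None):
        x = np.asarray(x, dtype=float)
        cf = x + self.offset
        cf_label = int(cf.mean() > 0)
        return cf, cf_label, {"method": "shift", "offset": self.offset}


explainer = ShiftExplainer(offset=2.0)
cfs, labels, metas = explainer.explain_k(np.zeros(8), k=3)
print(cfs.shape, labels.shape, len(metas))
```

Because this toy explain is deterministic, the k results are identical; the implementations below override explain_k for exactly that reason.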
Implementations
CELS
Counterfactual explanations via learned saliency maps that blend the original instance with its nearest unlike neighbor. Based on Li et al. (2023).
- class tscf_eval.CELS[source]
Bases: Counterfactual
CELS counterfactual generator via learned saliency maps.
Implementation of the CELS algorithm by Li et al. (2023) [cels1].
CELS learns a saliency map theta that blends the original instance with its nearest unlike neighbor (NUN) to produce a counterfactual explanation. The saliency map is optimized to minimize a composite loss balancing prediction change, sparsity, and temporal smoothness.
The counterfactual is computed as:
x’ = x * (1 - theta) + NUN * theta
After optimization, theta is binarized to produce contiguous edits.
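As a rough illustration of the blending formula and the binarization step, the sketch below applies a thresholded mask to a toy series. The function name `blend_with_nun` is hypothetical, and plain thresholding is an assumption: it does not by itself guarantee contiguous edits, so the library's post-processing may do more than this.

```python
import numpy as np


def blend_with_nun(x, nun, theta, threshold=0.5):
    """Binarize theta and blend: x' = x * (1 - theta) + NUN * theta."""
    theta = np.asarray(theta, dtype=float)
    # Values above the threshold take the NUN, the rest keep the original.
    mask = (theta > threshold).astype(float)
    cf = x * (1.0 - mask) + nun * mask
    return cf, mask


x = np.array([0.0, 0.0, 0.0, 0.0, 0.0])
nun = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
theta = np.array([0.1, 0.7, 0.9, 0.6, 0.2])

cf, mask = blend_with_nun(x, nun, theta)
print(cf)    # positions 1..3 come from the NUN
print(mask)
```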
- Parameters:
model (object) – A classifier with a probability estimator (predict_proba or a compatible interface). Must be differentiable or approximable.
data (tuple (X_ref, y_ref)) – Reference dataset used for finding nearest unlike neighbors.
budget_coeff (float, default 0.6) – Weight for the budget (sparsity) loss term L_Budget = a * mean(|theta|). Higher values produce sparser saliency maps.
tv_coeff (float, default 0.5) – Weight for the total variation loss term L_TV = b * mean(|theta_t - theta_{t+1}|^tv_beta). Higher values produce temporally smoother saliency maps.
max_coeff (float, default 0.7) – Weight for the prediction loss term L_Max = g * (1 - P(target | x')). Higher values prioritize changing the prediction.
tv_beta (float, default 3.0) – Exponent for the total variation norm. Higher values penalize large jumps more aggressively while tolerating small ones.
gradient_subsample (int or None, default 50) – Number of features to randomly sample for gradient computation each iteration. Uses stochastic gradient descent when set to a value less than the total number of features. Set to None to use all features (full gradient). Lower values speed up computation but may require more iterations to converge.
learning_rate (float, default 0.1) – Step size for Adam optimizer. Internally scaled by data standard deviation so the effective step adapts to input magnitude.
max_iter (int, default 5000) – Maximum number of optimization iterations.
tau (float, default 0.5) – Decision threshold for target class probability. Optimization considers convergence when P(target_class) >= tau.
patience (int, default 30) – Number of consecutive iterations where P(target) >= tau before stopping. Allows the optimizer to further refine the saliency map after the prediction flips.
tolerance (float, default 1e-4) – Convergence tolerance for total loss change between iterations.
threshold (float, default 0.5) – Binarization threshold for the saliency map during post-processing. Values of theta above this become 1 (use NUN), below become 0 (keep original).
random_state (int or None, default 0) – PRNG seed for reproducible optimization.
temperature (float or None, default None) – Temperature scaling for soft probability computation. Higher values produce smoother gradients by preventing sigmoid saturation when decision function values are large. If None, auto-calibrates based on model decision function values.
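The three loss terms named in the parameter descriptions can be put together in a small sketch. The function `cels_loss` is hypothetical, it handles a univariate saliency map only, and it assumes the terms are simply summed; the library may combine or scale them differently.

```python
import numpy as np


def cels_loss(theta, target_prob, budget_coeff=0.6, tv_coeff=0.5,
              max_coeff=0.7, tv_beta=3.0):
    """Sum of budget, total-variation and prediction terms (assumed additive)."""
    theta = np.asarray(theta, dtype=float)
    # L_Budget = a * mean(|theta|): penalizes dense saliency maps.
    l_budget = budget_coeff * np.mean(np.abs(theta))
    # L_TV = b * mean(|theta_t - theta_{t+1}|^tv_beta): penalizes jumps over time.
    l_tv = tv_coeff * np.mean(np.abs(np.diff(theta)) ** tv_beta)
    # L_Max = g * (1 - P(target | x')): penalizes a low target-class probability.
    l_max = max_coeff * (1.0 - target_prob)
    return float(l_budget + l_tv + l_max)


smooth = cels_loss([0.0, 0.0, 1.0, 1.0], target_prob=0.5)
jagged = cels_loss([0.0, 1.0, 0.0, 1.0], target_prob=0.5)
print(smooth, jagged)
```

Both maps have the same density, so the difference between the two values comes only from the total variation term, which favours the contiguous map.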
- predict_proba
Wrapped probability prediction function.
- Type:
callable
- rng
Random number generator for reproducibility.
- X_ref
Reference dataset features.
- Type:
np.ndarray
- y_ref
Reference dataset labels.
- Type:
np.ndarray
References
[cels1] Li, P., Tang, B., & Ning, Y. (2023). CELS: Counterfactual Explanation of Time-Series via Learned Saliency Maps. In Proceedings of the IEEE International Conference on Big Data 2023, pp. 1952-1957. IEEE. https://github.com/Luckilyeee/CELS
- __post_init__()[source]
Initialise probability wrapper, RNG, reference data, and label mapping.
Validates all hyperparameters and computes normalisation statistics from the reference dataset. Warns if the model is unlikely to work well with gradient-based optimisation.
- explain(x, y_pred=None, *, class_of_interest=None)[source]
Generate a counterfactual explanation via learned saliency map.
- Parameters:
x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Base predicted class for x. If None, computed via the model.
class_of_interest (int, optional) – Target class for the counterfactual. If None, uses the highest-probability alternative to y_pred.
- Returns:
cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
  method: Algorithm identifier ('cels').
  class_of_interest: Target class.
  nun_index_in_ref: Index of NUN in reference dataset.
  n_iterations: Number of iterations performed.
  converged: Whether optimization converged.
  final_target_prob: Final probability of target class.
  final_loss: Final composite loss value.
  mask_density: Fraction of saliency map above threshold.
  validity: Whether the counterfactual class differs from base.
- explain_k(x, k=5, y_pred=None, *, class_of_interest=None)[source]
Generate k diverse counterfactuals using different NUNs.
CELS supports diverse counterfactual generation by using different nearest unlike neighbors as the blending source. Each counterfactual is generated with a different NUN, producing structurally diverse explanations while keeping the learned saliency map approach.
- Returns:
cfs (np.ndarray) – Array of k counterfactuals with shape (k, ...).
cf_labels (np.ndarray) – Array of k predicted labels.
metas (list[dict]) – List of k metadata dictionaries.
- __init__(model, data, budget_coeff=0.6, tv_coeff=0.5, max_coeff=0.7, tv_beta=3.0, gradient_subsample=50, learning_rate=0.1, max_iter=5000, tau=0.5, patience=30, tolerance=0.0001, threshold=0.5, random_state=0, temperature=None)
- Return type:
None
CoMTE
Counterfactual explanations for Multivariate Time series using greedy channel substitution from distractor series. Based on Ates et al. (2021).
- class tscf_eval.COMTE[source]
Bases: Counterfactual
CoMTE (Sequential Greedy) counterfactual generator for time-series.
Implementation of the CoMTE algorithm by Ates et al. (2021) [comte1].
Produces counterfactuals by greedily replacing whole variables (channels) from distractor series drawn from a reference set. Distractors are selected among reference instances predicted as the target class. For each distractor the algorithm performs a sequential greedy search that replaces channels one-by-one, choosing at each step the channel swap that most increases the model probability f_c of the target class. The best counterfactual across distractors is chosen using the paper’s loss:
L = max(0, tau - f_c)^2 + lambda_reg * max(0, n_vars - delta)
Supported distances:
'dtw': multivariate DTW via dtw_distance_vec_multich
'euclidean': Euclidean distance using flattened pairwise distances
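The sequential greedy search and the loss above can be sketched for a single distractor as follows. The helpers `comte_loss` and `greedy_channel_swap`, and the toy probability function, are hypothetical; stopping as soon as f_c reaches tau is an assumption made for this illustration.

```python
import numpy as np


def comte_loss(f_c, n_vars, tau=0.95, delta=3, lambda_reg=0.8):
    """L = max(0, tau - f_c)^2 + lambda_reg * max(0, n_vars - delta)."""
    return max(0.0, tau - f_c) ** 2 + lambda_reg * max(0, n_vars - delta)


def greedy_channel_swap(x, distractor, target_prob_fn, tau=0.95):
    """Swap in whole channels one at a time, best gain first."""
    cf = x.copy()
    edited = []
    remaining = list(range(x.shape[0]))
    while remaining and target_prob_fn(cf) < tau:
        best_ch, best_p = None, -np.inf
        for ch in remaining:
            trial = cf.copy()
            trial[ch] = distractor[ch]
            p = target_prob_fn(trial)
            if p > best_p:
                best_ch, best_p = ch, p
        # Commit the swap that most increases the target-class probability.
        cf[best_ch] = distractor[best_ch]
        edited.append(best_ch)
        remaining.remove(best_ch)
    return cf, edited


# Toy model (assumption): target probability is a weighted sum of channel means.
weights = np.array([0.2, 0.7, 0.3])


def toy_prob(series):
    return float(np.clip(weights @ series.mean(axis=1), 0.0, 1.0))


x = np.zeros((3, 4))
distractor = np.ones((3, 4))
cf, edited = greedy_channel_swap(x, distractor, toy_prob)
loss = comte_loss(toy_prob(cf), n_vars=len(edited))
print(edited, loss)
```

In the full method this search would be repeated for each distractor, and the counterfactual with the lowest loss kept.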
- Parameters:
model (object) – A classifier with a probability estimator (predict_proba or a compatible interface). The helper predict_proba_fn wraps model inference.
data (tuple (X_ref, y_ref)) – Reference dataset used to select distractors.
distance ({'euclidean', 'dtw'}, default 'dtw') – Distance metric to find nearest distractors.
  'euclidean': Euclidean distance on flattened vectors. Faster but ignores temporal alignment.
  'dtw': Dynamic Time Warping distance (per-channel, averaged). Respects temporal shifts and is recommended for time series.
n_distractors (int) – Maximum number of distractors to try.
tau (float) – Target probability threshold for class c.
delta (int) – Preferred number of variable edits (paper’s sweet spot).
lambda_reg (float) – Regularization weight in the paper loss.
random_state (Optional[int]) – Seed for reproducible distractor tie-breaking.
References
[comte1] Ates, E., Aksar, B., Leung, V. J., & Coskun, A. K. (2021). Counterfactual Explanations for Multivariate Time Series. ICAPAI 2021. https://github.com/peaclab/CoMTE
- __post_init__()[source]
Initialise probability wrapper, RNG, reference data, and label mapping.
Validates all hyperparameters and pre-computes reference-set predictions to avoid redundant calls during distractor selection.
- Raises:
ValueError – If distance is not in {'euclidean', 'dtw'}, n_distractors < 1, tau is outside (0, 1], delta < 1, or lambda_reg < 0.
- explain(x, y_pred=None, *, class_of_interest=None)[source]
Generate a counterfactual toward a class of interest.
- Parameters:
x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Base predicted class for x. If None, computed via the model.
class_of_interest (int, optional) – Target class for the counterfactual. If None, uses the highest-probability alternative to y_pred.
- Returns:
cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
  method: Algorithm identifier ('comte_greedy').
  distance: Distance metric used.
  class_of_interest: Target class.
  tau, delta, lambda_reg: Algorithm parameters.
  distractor_index_in_ref: Index of selected distractor.
  distractor_distance: Distance to selected distractor.
  edits_variables: List of edited channel indices.
  target_prob: Final target class probability.
  loss: Final loss value.
- explain_k(x, k=5, y_pred=None, *, class_of_interest=None)[source]
Generate k diverse counterfactuals using different distractors.
COMTE naturally supports diverse counterfactual generation by using different distractor instances from the reference set. Each CF is generated using a different distractor, producing structurally diverse explanations.
- Returns:
cfs (np.ndarray) – Array of k counterfactuals.
cf_labels (np.ndarray) – Array of k predicted labels.
metas (list[dict]) – List of k metadata dictionaries.
- __init__(model, data, distance='dtw', n_distractors=10, tau=0.95, delta=3, lambda_reg=0.8, random_state=0)
NativeGuide
Instance-based counterfactual explanations using nearest unlike neighbor guidance with DTW barycenter averaging. Based on Delaney et al. (2021).
- class tscf_eval.NativeGuide[source]
Bases: Counterfactual
NativeGuide counterfactual generator for time-series.
Implementation of the NativeGuide algorithm by Delaney et al. (2021) [ng1].
The algorithm retrieves a “native guide” (nearest-unlike neighbor, NUN) from a reference set. Depending on the method, it either:
‘blend’ (original paper): Blends the query with the NUN using weighted DTW barycenter averaging, incrementally increasing the guide’s influence until prediction flips.
‘ng’: Copies a contiguous window from the NUN into the query, growing the window until prediction flips.
‘dtw_dba’: Like ‘ng’ but uses a DTW-DBA barycenter of k unlike neighbors as the guide.
‘cam’: Like ‘ng’ but uses a CAM importance function to select the discriminative window.
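The window-growing idea behind the ‘ng’ variant can be sketched in a few lines. The function `grow_window_cf` and the toy classifier are hypothetical, and scanning every start position for each window length is an assumption; the library may choose the window position differently (the ‘cam’ variant, for instance, uses an importance function).

```python
import numpy as np


def grow_window_cf(x, nun, predict_label, y_pred):
    """Copy a growing contiguous window from the NUN until the label flips."""
    T = x.shape[-1]
    for window_len in range(1, T + 1):
        for window_start in range(0, T - window_len + 1):
            stop = window_start + window_len
            cf = x.copy()
            cf[..., window_start:stop] = nun[..., window_start:stop]
            if predict_label(cf) != y_pred:
                return cf, window_start, window_len
    # Fallback: the label never flipped, so return the guide itself.
    return nun.copy(), 0, T


# Toy classifier (assumption): class 1 when the series sums to more than 2.
def toy_label(series):
    return int(series.sum() > 2)


x = np.zeros(6)
nun = np.array([0.0, 1.0, 1.0, 1.0, 1.0, 0.0])
cf, window_start, window_len = grow_window_cf(x, nun, toy_label, y_pred=toy_label(x))
print(cf, window_start, window_len)
```

The returned window_start and window_len correspond to the metadata keys of the same name described for this class.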
- Parameters:
model (object) – A classifier-like object that exposes a probability estimator. The internal helper predict_proba_fn adapts common interfaces (e.g. scikit-learn, aeon).
data (tuple) – A tuple (X_ref, y_ref) containing the reference dataset used to select distractors. X_ref can have shape (N, T) or (N, C, T).
method ({'blend', 'ng', 'dtw_dba', 'cam'}, default 'blend') – Strategy for counterfactual generation:
  ‘blend’: Original paper method. Weighted averaging of query and NUN using DTW barycenter, incrementally increasing NUN influence.
  ‘ng’: Window replacement using nearest-unlike neighbor.
  ‘dtw_dba’: Window replacement using DTW-DBA barycenter of k neighbors.
  ‘cam’: Window replacement guided by CAM importance function.
distance ({'euclidean', 'dtw'}, default 'dtw') – Distance metric used to rank distractors when searching the reference set.
  'euclidean': Euclidean distance on flattened vectors. Faster but ignores temporal alignment.
  'dtw': Dynamic Time Warping distance (per-channel, averaged). Respects temporal shifts and is recommended for time series.
k_unlike (int, default 5) – Number of unlike neighbors to consider when computing a DTW-DBA guide.
random_state (int or None, default 0) – PRNG seed for deterministic behaviour where applicable.
beta_step (float, default 0.01) – For method='blend': increment for the blending weight beta at each iteration (original paper uses 0.01).
target_prob (float, default 0.5) – For method='blend': target probability threshold for the counterfactual class (original paper uses 0.5).
cam_importance_fn (callable or None) – When method=='cam', a function with signature (series, y_pred) -> np.ndarray that returns an importance map of shape (T,) or (C, T).
Notes
The public API is explain(x, y_pred=None) -> (cf, cf_label, meta). The returned meta dictionary contains keys such as nun_index_in_X, neighbor_indices, neighbor_distance, window_start, window_len, and beta (for blend method).
References
[ng1] Delaney, E., Greene, D., & Keane, M. T. (2021). Instance-Based Counterfactual Explanations for Time Series Classification. ICCBR 2021. https://github.com/e-delaney/Instance-Based_CFE_TSC
- __post_init__()[source]
Initialise probability wrapper, RNG, reference data, and label mapping.
Validates all hyperparameters, pre-computes reference-set predictions, and checks method-specific requirements (e.g. cam_importance_fn when method='cam').
- Raises:
ValueError – If X and y have mismatched sample counts, method or distance is not in the allowed set, beta_step or target_prob is outside (0, 1], or method='cam' without a cam_importance_fn.
- explain(x, y_pred=None)[source]
Generate a counterfactual explanation for a time series instance.
- Parameters:
x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Precomputed predicted class for x. If None, computed via the model.
- Returns:
cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
  method: Algorithm variant used.
  distance: Distance metric used.
  nun_index_in_X: Index of nearest unlike neighbor.
  neighbor_indices: Indices of neighbors (for dtw_dba).
  neighbor_distance: Distance to nearest unlike neighbor.
  beta: Blending weight (for blend method, else None).
  window_start: Start of replacement window (else None).
  window_len: Length of replacement window (else None).
- explain_k(x, k=5, y_pred=None)[source]
Generate k diverse counterfactuals using different unlike neighbors.
NativeGuide naturally supports diverse counterfactual generation by using different unlike neighbors as guides. Each counterfactual is generated using a different neighbor, producing structurally diverse explanations.
- Returns:
cfs (np.ndarray) – Array of k counterfactuals with shape (k, ...).
cf_labels (np.ndarray) – Array of k predicted labels.
metas (list[dict]) – List of k metadata dictionaries.
- __init__(model, data, method='blend', distance='dtw', k_unlike=5, random_state=0, beta_step=0.01, target_prob=0.5, cam_importance_fn=None)
- Return type:
None
SETS
Shapelet-based counterfactual explanations using class-specific shapelet manipulation with contiguous perturbations. Based on Bahri et al. (2022).
- class tscf_eval.SETS[source]
Bases: Counterfactual
SETS counterfactual generator using class-specific shapelets.
Implementation of the SETS algorithm by Bahri et al. (2022) [sets1].
SETS leverages the inherent interpretability of shapelets to produce counterfactual explanations with contiguous, visually meaningful perturbations. The preprocessing phase discovers class-exclusive shapelets and their typical occurrence positions; the generation phase removes original-class shapelets and introduces target-class shapelets to flip the classifier prediction.
- Parameters:
model (object) – A classifier with predict_proba (or compatible interface).
data (tuple (X_ref, y_ref)) – Reference dataset for shapelet extraction and NUN lookup.
n_shapelet_samples (int, default 10000) – Number of candidate shapelets to evaluate during extraction.
max_shapelets (int or None, default None) – Maximum shapelets to retain. None uses aeon’s default (min(10 * n_cases, 1000)).
min_shapelet_length (int, default 3) – Minimum shapelet length.
max_shapelet_length (int or None, default None) – Maximum shapelet length. None uses the full series length.
time_limit_in_minutes (float, default 0.0) – Time budget for shapelet extraction (0 = use n_shapelet_samples).
threshold_percentile (float, default 10.0) – Bottom percentile of per-shapelet scaled distances used as the occlusion threshold. Lower values are stricter.
max_combination_dims (int, default 3) – Maximum number of dimensions to combine when single-dimension edits fail. Caps the combinatorial search at C(D, k) for k ≤ max_combination_dims.
random_state (int or None, default 0) – PRNG seed for reproducibility.
n_jobs (int, default 1) – Number of parallel jobs for shapelet extraction.
- predict_proba
Wrapped probability prediction function.
- Type:
callable
- rng
Random number generator.
- X_ref
Reference dataset features.
- Type:
np.ndarray
- y_ref
Reference dataset labels.
- Type:
np.ndarray
References
[sets1] Bahri, O., Filali Boubrahimi, S., & Hamdi, S. M. (2022). Shapelet-Based Counterfactual Explanations for Multivariate Time Series. In Proceedings of the ACM SIGKDD Workshop on Mining and Learning from Time Series (KDD-MiLeTS 2022). https://github.com/omarbahri/SETS
- __post_init__()[source]
Initialise prediction wrapper, reference data, and shapelet pipeline.
Validates parameters, fits the shapelet transform, computes the occlusion threshold, assigns class-exclusive shapelets, builds heat maps, and computes per-channel information gain.
- explain(x, y_pred=None, *, class_of_interest=None)[source]
Generate a counterfactual explanation using SETS.
- Returns:
cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
  method: 'sets'
  class_of_interest: Target class.
  nun_index_in_ref: Index of the NUN used.
  dimensions_modified: Channels edited.
  phase_a_edits: Number of Phase A replacements.
  phase_b_edits: Number of Phase B insertions.
  n_class_shapelets: Total surviving class-exclusive shapelets.
  validity: Whether the target class was achieved.
  failure_reason: None if successful, description otherwise.
- explain_k(x, k=5, y_pred=None, *, class_of_interest=None)[source]
Generate k diverse counterfactuals using different NUNs.
SETS supports diverse counterfactual generation by using different nearest unlike neighbors as the replacement source for Phase A. Each counterfactual is generated with a different NUN, producing structurally diverse explanations.
- Returns:
cfs (np.ndarray) – Array of k counterfactuals with shape (k, ...).
cf_labels (np.ndarray) – Array of k predicted labels.
metas (list[dict]) – List of k metadata dictionaries.
- __init__(model, data, n_shapelet_samples=10000, max_shapelets=None, min_shapelet_length=3, max_shapelet_length=None, time_limit_in_minutes=0.0, threshold_percentile=10.0, max_combination_dims=3, random_state=0, n_jobs=1)
- Return type:
None
TSEvo
Evolutionary counterfactual generation using multi-objective optimization (NSGA-II) with three mutation strategies: authentic, frequency, and gaussian. Based on Höllig et al. (2022).
- class tscf_eval.TSEvo[source]
Bases: Counterfactual
TSEvo counterfactual generator using multi-objective evolutionary optimization.
Implementation of the TSEvo algorithm by Höllig et al. (2022) [tsevo1].
TSEvo uses NSGA-II (Non-dominated Sorting Genetic Algorithm II) to evolve counterfactual explanations that balance three objectives: changing the model’s prediction (validity), minimizing perturbation (proximity), and keeping changes sparse (sparsity).
The algorithm supports three mutation strategies that can be used individually or combined:
authentic: Replace windows with segments from reference series
frequency: Replace frequency bands via FFT transformation
gaussian: Apply Gaussian perturbation based on reference statistics
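To make the authentic strategy concrete, the sketch below copies one randomly placed window from a reference series into an individual. The function `authentic_mutation` is hypothetical, and replacing the window at the same time position in both series is an assumption of this example.

```python
import numpy as np


def authentic_mutation(x, reference, window_sizes=(5, 10, 20), rng=None):
    """Replace one random window of x with the same window from a reference series."""
    rng = np.random.default_rng(rng)
    T = x.shape[-1]
    # Only keep window sizes that fit into the series.
    valid = [w for w in window_sizes if w <= T]
    window = int(rng.choice(valid)) if valid else T
    start = int(rng.integers(0, T - window + 1))
    child = x.copy()
    child[..., start:start + window] = reference[..., start:start + window]
    return child, start, window


x = np.zeros(30)
reference = np.ones(30)
child, start, window = authentic_mutation(x, reference, rng=0)
print(start, window, child.sum())
```

The input x is left untouched, which matters in an evolutionary loop where parents may be reused across generations.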
- Parameters:
model (object) – A classifier with a probability estimator (predict_proba or a compatible interface). The helper predict_proba_fn wraps model inference.
data (tuple (X_ref, y_ref)) – Reference dataset used for mutation operations. Series predicted as the target class are used during evolution.
transformer ({'authentic', 'frequency', 'gaussian', 'all'}, default 'authentic') – Mutation strategy to use:
  ‘authentic’: Authentic opposing information (window replacement)
  ‘frequency’: Frequency band mapping via FFT
  ‘gaussian’: Gaussian perturbation from reference statistics
  ‘all’: Randomly select among all strategies per individual
n_generations (int, default 100) – Number of evolutionary generations.
population_size (int, default 50) – Population size (μ in NSGA-II).
crossover_prob (float, default 0.9) – Probability of applying crossover between individuals.
mutation_prob (float, default 0.6) – Probability of applying mutation to an individual.
window_sizes (tuple of int, default (5, 10, 20)) – Candidate window sizes for authentic mutation operator.
random_state (int or None, default 0) – PRNG seed for reproducible evolution.
verbose (int, default 0) – Verbosity level (0=silent, 1=progress, 2=detailed).
- predict_proba
Wrapped probability prediction function.
- Type:
callable
- rng
Random number generator for reproducibility.
- X_ref
Reference dataset features.
- Type:
np.ndarray
- y_ref
Reference dataset labels.
- Type:
np.ndarray
References
[tsevo1] Höllig, J., Kulbach, C., & Thoma, S. (2022). TSEvo: Evolutionary Counterfactual Explanations for Time Series Classification. ICMLA 2022. https://github.com/JHoelli/TSEvo
- __post_init__()[source]
Initialise probability wrapper, RNG, reference data, and label mapping.
Validates all hyperparameters and ensures the deap package is available for evolutionary computation. Rounds population_size up to the nearest multiple of four as required by NSGA-II tournament selection.
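The rounding rule is simple integer arithmetic. The helper name below is hypothetical and only shows the computation, not how the class stores the result.

```python
def round_up_to_multiple_of_four(population_size):
    """Smallest multiple of four that is >= population_size."""
    return ((population_size + 3) // 4) * 4


rounded_default = round_up_to_multiple_of_four(50)
print(rounded_default)
```

With the default population_size=50, this arithmetic gives 52 individuals.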
- explain(x, y_pred=None, *, class_of_interest=None)[source]
Generate a counterfactual explanation using evolutionary optimization.
- Parameters:
x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Base predicted class for x. If None, computed via the model.
class_of_interest (int, optional) – Target class for the counterfactual. If None, uses the highest-probability alternative to y_pred.
- Returns:
cf (np.ndarray) – Best counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
  method: Algorithm identifier ('tsevo').
  transformer: Mutation strategy used.
  class_of_interest: Target class.
  n_generations: Number of generations evolved.
  population_size: Population size used.
  objectives: Final objective values (output_dist, input_dist, sparsity).
  pareto_front_size: Number of solutions in Pareto front.
  validity: Whether prediction changed (True/False).
- __init__(model, data, transformer='authentic', n_generations=100, population_size=50, crossover_prob=0.9, mutation_prob=0.6, window_sizes=(5, 10, 20), random_state=0, verbose=0)
Glacier
Gradient-based counterfactual generation with guided locally constrained optimization using importance-weighted proximity. Based on Wang et al. (2024).
- class tscf_eval.Glacier[source]
Bases: Counterfactual
Glacier counterfactual generator using gradient-based optimization.
Implementation of the Glacier algorithm by Wang et al. (2024) [glacier1].
Glacier uses gradient-based optimization with guided constraints to generate counterfactual explanations. The key innovation is applying importance-based weights that allow free modification of less-important time series regions while preserving critical features.
The optimization minimizes a composite loss:
L = w * L_pred + (1-w) * L_proximity
where:
- L_pred: Prediction margin loss (distance to target class probability)
- L_proximity: Weighted distance from original (importance-weighted)
- w: pred_margin_weight parameter
- Parameters:
model (object) – A classifier with a probability estimator (predict_proba or a compatible interface). Must be differentiable or approximable.
data (tuple (X_ref, y_ref)) – Reference dataset used for computing feature importance and normalization statistics.
pred_margin_weight (float, default 0.75) – Weight balancing prediction margin loss vs proximity loss. Higher values prioritize changing the prediction over staying close to the original. Range: [0, 1]. Values >= 0.75 recommended for non-neural-network classifiers where finite-difference gradients are weak relative to the proximity gradient.
learning_rate (float, default 0.01) – Step size for Adam optimizer. Internally scaled by data standard deviation so the effective step adapts to input magnitude.
max_iter (int, default 300) – Maximum number of optimization iterations.
tau (float, default 0.5) – Decision threshold for target class probability. Optimization stops when P(target_class) >= tau.
tolerance (float, default 1e-4) – Convergence tolerance for prediction margin loss.
weight_type ({'uniform', 'local', 'unconstrained'}, default 'uniform') – Type of importance weighting:
  ‘uniform’: Equal weights across all timesteps
  ‘local’: Segment-based LIME importance following the paper. Uses matrix-profile changepoint segmentation, STFT background perturbation, and Ridge regression surrogate to compute per-segment importance, producing binary timestep weights. Requires stumpy and scipy for full functionality (falls back to uniform segments / mean background otherwise).
  ‘unconstrained’: No proximity penalty (pure prediction optimization)
random_state (int or None, default 0) – PRNG seed for reproducible optimization.
gradient_subsample (int or None, default 50) – Number of features to randomly sample for gradient computation each iteration. Uses stochastic gradient descent when set to a value less than the total number of features. Set to None to use all features (full gradient). Lower values speed up computation but may require more iterations to converge.
temperature (float or None, default None) – Temperature scaling for soft probability computation. Higher values produce smoother gradients by preventing sigmoid saturation when decision function values are large. If None, auto-calibrates based on model decision function values (recommended for most use cases). Increase manually (e.g., 2.0-5.0) if counterfactuals are unchanged with ROCKET or other margin-based classifiers.
n_segments (int, default 10) – Number of changepoints for segment-based local importance (weight_type='local'). Produces n_segments + 1 segments. Ignored when weight_type is not 'local'.
segment_window (int, default 10) – Window size for the matrix-profile segmentation algorithm. Ignored when weight_type is not 'local'.
n_perturbations (int, default 100) – Number of binary perturbation samples for the LIME surrogate model used in segment-based local importance. Ignored when weight_type is not 'local'.
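The interaction between finite-difference gradients and gradient_subsample can be sketched as below. The function `subsampled_fd_gradient` is hypothetical, and the use of central differences with a fixed step `eps` is an assumption; the library's estimator may differ in both respects.

```python
import numpy as np


def subsampled_fd_gradient(loss_fn, x, gradient_subsample=50, eps=1e-3, rng=None):
    """Central-difference gradient on a random subset of features, zero elsewhere."""
    rng = np.random.default_rng(rng)
    flat = x.ravel()
    n = flat.size
    if gradient_subsample is None or gradient_subsample >= n:
        idx = np.arange(n)  # full gradient
    else:
        idx = rng.choice(n, size=gradient_subsample, replace=False)
    grad = np.zeros(n)
    for i in idx:
        step = np.zeros(n)
        step[i] = eps
        plus = loss_fn((flat + step).reshape(x.shape))
        minus = loss_fn((flat - step).reshape(x.shape))
        grad[i] = (plus - minus) / (2 * eps)
    return grad.reshape(x.shape)


def squared_norm(z):
    return float(np.sum(z ** 2))


x = np.arange(1.0, 7.0).reshape(2, 3)
full = subsampled_fd_gradient(squared_norm, x, gradient_subsample=None)
part = subsampled_fd_gradient(squared_norm, x, gradient_subsample=2, rng=0)
print(full)
print(part)
```

Each sampled feature costs two model evaluations here, which is why a smaller subsample speeds up an iteration at the price of a noisier update.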
- predict_proba
Wrapped probability prediction function.
- Type:
callable
- rng
Random number generator for reproducibility.
- X_ref
Reference dataset features.
- Type:
np.ndarray
- y_ref
Reference dataset labels.
- Type:
np.ndarray
- _mean
Mean of reference data (for normalization).
- Type:
np.ndarray
- _std
Standard deviation of reference data (for normalization).
- Type:
np.ndarray
References
[glacier1] Wang, Z., Samsten, I., Miliou, I., Mochaourab, R., & Papapetrou, P. (2024). Glacier: Guided Locally Constrained Counterfactual Explanations for Time Series Classification. Machine Learning, 113(3). https://github.com/zhendong3wang/learning-time-series-counterfactuals
- __post_init__()[source]
Initialise probability wrapper, RNG, reference data, and label mapping.
Validates all hyperparameters and computes normalisation statistics from the reference dataset. Warns if the model is unlikely to work well with gradient-based optimisation.
- explain(x, y_pred=None, *, class_of_interest=None)[source]
Generate a counterfactual explanation using gradient-based optimization.
- Parameters:
x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Base predicted class for x. If None, computed via the model.
class_of_interest (int, optional) – Target class for the counterfactual. If None, uses the highest-probability alternative to y_pred.
- Returns:
cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
  method: Algorithm identifier ('glacier').
  weight_type: Constraint type used.
  class_of_interest: Target class.
  pred_margin_weight: Weight parameter used.
  learning_rate: Learning rate used.
  n_iterations: Number of iterations performed.
  converged: Whether optimization converged.
  final_target_prob: Final probability of target class.
  final_loss: Final composite loss value.
- __init__(model, data, pred_margin_weight=0.75, learning_rate=0.01, max_iter=300, tau=0.5, tolerance=0.0001, weight_type='uniform', random_state=0, gradient_subsample=50, temperature=None, n_segments=10, segment_window=10, n_perturbations=100)
- Parameters:
model (Any)
pred_margin_weight (float)
learning_rate (float)
max_iter (int)
tau (float)
tolerance (float)
weight_type (Literal['uniform', 'local', 'unconstrained'])
random_state (int | None)
gradient_subsample (int | None)
temperature (float | None)
n_segments (int)
segment_window (int)
n_perturbations (int)
- Return type:
None
LatentCF++
Gradient-based counterfactual generation with importance-weighted proximity constraints, optimizing directly in the input space. Based on Wang et al. (2021).
- class tscf_eval.LatentCF[source]
Bases: Counterfactual
LatentCF++ counterfactual generator using gradient-based optimization.
Implementation of the LatentCF++ algorithm by Wang et al. (2021) [latentcf1].
LatentCF++ generates counterfactuals by optimizing in the latent space (or directly in input space when no autoencoder is provided). The algorithm balances prediction margin loss (driving toward target class) with weighted proximity loss (staying close to original, prioritizing less important regions).
The optimization minimizes a composite loss:
L = w * L_pred + (1-w) * L_proximity
where:
L_pred: Mean squared error between the desired probability (1.0) and the current target-class probability
L_proximity: Weighted mean absolute error from the original series
w: The pred_margin_weight parameter
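The composite loss above can be evaluated for a single candidate as in the sketch below. The function name `composite_loss` and the exact weighting scheme (per-timestep weights multiplied into the absolute differences before averaging) are assumptions for illustration only.

```python
import numpy as np

def composite_loss(p_target, x_cf, x_orig, step_weights, w=0.75):
    """Return (L, L_pred, L_proximity) for one candidate counterfactual."""
    # Squared error between the desired probability (1.0) and the current one.
    l_pred = (1.0 - p_target) ** 2
    # Weighted mean absolute deviation from the original series.
    l_prox = float(np.mean(step_weights * np.abs(x_cf - x_orig)))
    return w * l_pred + (1.0 - w) * l_prox, l_pred, l_prox

x_orig = np.zeros(4)
x_cf = np.array([0.2, 0.0, -0.4, 0.0])
weights = np.ones(4)  # corresponds to the 'uniform' strategy

loss, l_pred, l_prox = composite_loss(0.6, x_cf, x_orig, weights, w=0.75)
```

With w = 0.75 the prediction term dominates, so the optimizer accepts larger edits to reach the target class. Lowering w favours counterfactuals that stay closer to the original.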
- Parameters:
model (object) – A classifier with a probability estimator (predict_proba or a compatible interface).
data (tuple (X_ref, y_ref)) – Reference dataset used for computing feature importance (for the 'global' weight strategy) and normalization statistics.
probability (float, default 0.5) – Target probability threshold. Optimization aims for P(target) >= probability.
tolerance (float, default 1e-6) – Convergence tolerance. Optimization stops when the prediction margin loss is below tolerance AND the target probability is reached.
max_iter (int, default 300) – Maximum number of optimization iterations.
learning_rate (float, default 0.01) – Step size for the Adam optimizer. Internally scaled by the data standard deviation so the effective step adapts to input magnitude.
pred_margin_weight (float, default 0.75) – Weight balancing prediction margin loss against proximity loss. Range: [0, 1]. Higher values prioritize changing the prediction. Values >= 0.75 are recommended for non-neural-network classifiers.
step_weights ({'uniform', 'local', 'global'}, default 'uniform') – Strategy for computing importance weights:
'uniform': Equal weights across all timesteps
'local': Per-sample importance via perturbation-based sensitivity
'global': Dataset-level importance computed across reference samples
random_state (int or None, default 0) – PRNG seed for reproducible optimization.
gradient_subsample (int or None, default 50) – Number of features to randomly sample for gradient computation each iteration. Uses stochastic gradient descent when set to a value less than the total number of features. Set to None to use all features (full gradient). Lower values speed up computation but may require more iterations to converge.
temperature (float or None, default None) – Temperature scaling for soft probability computation. Higher values produce smoother gradients by preventing sigmoid saturation when decision function values are large. If None, auto-calibrates based on model decision function values (recommended for most use cases). Increase manually (e.g., 2.0-5.0) if counterfactuals are unchanged with ROCKET or other margin-based classifiers.
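The effect of temperature on sigmoid saturation can be seen in a small sketch. It assumes a binary classifier whose decision value is squashed as sigmoid(d / temperature); the function name `soft_probability` and this exact form are assumptions, not the library's confirmed implementation.

```python
import numpy as np

def soft_probability(decision_value, temperature=1.0):
    """Temperature-scaled sigmoid of a decision function value."""
    d = np.asarray(decision_value, dtype=float)
    return 1.0 / (1.0 + np.exp(-d / temperature))

# Two nearby, large decision values, as a margin-based classifier may produce.
d = np.array([8.0, 8.5])
p_t1 = soft_probability(d, temperature=1.0)  # saturated near 1.0
p_t5 = soft_probability(d, temperature=5.0)  # still responsive to changes

# Change in probability caused by the same change in decision value.
delta_t1 = p_t1[1] - p_t1[0]
delta_t5 = p_t5[1] - p_t5[0]
```

At temperature 1.0 both probabilities are almost identical, so a perturbation of the input barely moves the loss. At temperature 5.0 the same perturbation produces a visibly larger change, which gives the optimizer a usable signal.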
- predict_proba
Wrapped probability prediction function.
- Type:
callable
- rng
Random number generator for reproducibility.
- Type:
- X_ref
Reference dataset features.
- Type:
np.ndarray
- y_ref
Reference dataset labels.
- Type:
np.ndarray
References
[latentcf1] Wang, Z., Samsten, I., Mochaourab, R., & Papapetrou, P. (2021). Learning Time Series Counterfactuals via Latent Space Representations. In International Conference on Discovery Science (DS 2021). https://github.com/zhendong3wang/learning-time-series-counterfactuals
- __post_init__()[source]
Initialise probability wrapper, RNG, reference data, and label mapping.
Validates all hyperparameters. Warns if the model is unlikely to work well with gradient-based optimisation.
- explain(x, y_pred=None, *, class_of_interest=None)[source]
Generate a counterfactual explanation using LatentCF++ optimization.
- Parameters:
x (np.ndarray) – Input time series of shape (T,) for univariate or (C, T) for multivariate data.
y_pred (int, optional) – Base predicted class for x. If None, computed via the model.
class_of_interest (int, optional) – Target class for the counterfactual. If None, uses the highest-probability alternative to y_pred.
- Returns:
cf (np.ndarray) – Counterfactual time series with the same shape as x.
cf_label (int) – Predicted class label for the counterfactual.
meta (dict) – Metadata dictionary containing:
method: Algorithm identifier ('latent_cf').
step_weights: Weight strategy used.
class_of_interest: Target class.
pred_margin_weight: Weight parameter used.
learning_rate: Learning rate used.
n_iterations: Number of iterations performed.
converged: Whether optimization converged.
final_target_prob: Final probability of target class.
final_loss: Final composite loss value.
validity: Whether counterfactual changed prediction.
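The gradient_subsample behaviour of LatentCF can be pictured with a sketch that estimates the gradient on a random subset of coordinates only. The source does not state how gradients are obtained; central finite differences, the helper name `subsampled_gradient`, and the `eps` step are assumptions chosen because they work with any black-box loss.

```python
import numpy as np

def subsampled_gradient(loss_fn, x, n_sample, rng, eps=1e-3):
    """Central finite-difference gradient on a random subset of coordinates.

    Coordinates that are not sampled keep a zero gradient for this step.
    Passing n_sample=None evaluates every coordinate (full gradient).
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    n = x.size
    if n_sample is None or n_sample >= n:
        idx = np.arange(n)
    else:
        idx = rng.choice(n, size=n_sample, replace=False)
    for i in idx:
        step = np.zeros_like(x)
        step.flat[i] = eps
        grad.flat[i] = (loss_fn(x + step) - loss_fn(x - step)) / (2 * eps)
    return grad, idx

# Toy quadratic loss whose exact gradient is 2 * x.
loss_fn = lambda v: float(np.sum(v ** 2))
x = np.array([1.0, 2.0, 3.0, 4.0])
rng = np.random.default_rng(0)

grad, idx = subsampled_gradient(loss_fn, x, n_sample=2, rng=rng)
full_grad, _ = subsampled_gradient(loss_fn, x, n_sample=None, rng=rng)
```

Each subsampled step costs two loss evaluations per sampled coordinate instead of two per feature. This is the trade-off the parameter description refers to: cheaper iterations, possibly more of them.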
- __init__(model, data, probability=0.5, tolerance=1e-06, max_iter=300, learning_rate=0.01, pred_margin_weight=0.75, step_weights='uniform', random_state=0, gradient_subsample=50, temperature=None)
- Parameters:
- Return type:
None
References
The counterfactual methods implemented in this module are based on the following papers:
Li, P., Tang, B., & Ning, Y. (2023). “CELS: Counterfactual Explanation of Time-Series via Learned Saliency Maps.” In Proceedings of the IEEE International Conference on Big Data 2023, pp. 1952-1957. IEEE. [Paper] [Code]
Ates, E., Aksar, B., Leung, V. J., & Coskun, A. K. (2021). “Counterfactual Explanations for Multivariate Time Series.” In Proceedings of the 2021 International Conference on Applied Artificial Intelligence (ICAPAI), pp. 1-8. [Paper] [Code]
Delaney, E., Greene, D., & Keane, M. T. (2021). “Instance-Based Counterfactual Explanations for Time Series Classification.” In Case-Based Reasoning Research and Development (ICCBR 2021), pp. 32-47. Springer. [Paper] [Code]
Bahri, O., Filali Boubrahimi, S., & Hamdi, S. M. (2022). “Shapelet-Based Counterfactual Explanations for Multivariate Time Series.” In Proceedings of the ACM SIGKDD Workshop on Mining and Learning from Time Series (KDD-MiLeTS 2022). [Paper] [Code]
Höllig, J., Kulbach, C., & Thoma, S. (2022). “TSEvo: Evolutionary Counterfactual Explanations for Time Series Classification.” In Proceedings of the 21st IEEE International Conference on Machine Learning and Applications (ICMLA 2022), pp. 29-36. [Paper] [Code]
Wang, Z., Samsten, I., Miliou, I., Mochaourab, R., & Papapetrou, P. (2024). “Glacier: Guided Locally Constrained Counterfactual Explanations for Time Series Classification.” Machine Learning, 113(3). [Paper] [Code]
Wang, Z., Samsten, I., Mochaourab, R., & Papapetrou, P. (2021). “Learning Time Series Counterfactuals via Latent Space Representations.” In International Conference on Discovery Science (DS 2021), Lecture Notes in Computer Science, vol 12986, pp. 369-384. Springer. [Paper] [Code]
The implementations also use TSInterpret as a foundation:
Höllig, J., Kulbach, C., & Thoma, S. (2023). “TSInterpret: A Python Package for the Interpretability of Time Series Classification.” Journal of Open Source Software, 8(85), 5220. [Paper]