TSCFEval
TSCFEval is a model-agnostic Python framework for systematic evaluation of counterfactual explanations in Time Series Classification (TSC).
Unlike existing libraries that focus on counterfactual generation, TSCFEval is specifically designed for counterfactual evaluation, consolidating fragmented evaluation practices from the TSC counterfactual literature into a unified, extensible toolkit.
This library accompanies the paper:
Zamith Santos, B., Andrade Lira, M. F., Cerri, R., & Cavalcante Prudêncio, R. B. (2026). TSCFEval: A Model-Agnostic Framework for Evaluating Time Series Classification Counterfactuals. In Explainable Artificial Intelligence. xAI 2026. Communications in Computer and Information Science. Springer, Cham.
Accepted at the XAI World Conference 2026 (Fortaleza, Ceará, Brazil).
Given a time series classifier and counterfactual explanations, TSCFEval provides:
- 11 evaluation metrics organized into six quality dimensions (core quality, distribution alignment, structural properties, model behavior, stability, and computational performance)
- Weighted scalarization for aggregating metrics into composite scores, enabling customizable method ranking
- Confidence-stratified instance selection for benchmarking across the decision boundary
- Three benchmarking scenarios: single dataset with multiple CF methods, single dataset with multiple classifiers, and multiple datasets with a fixed classifier
- 7 built-in CF methods for generating counterfactuals
- Pareto and Friedman analysis for principled multi-criteria comparison
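The weighted scalarization listed above can be illustrated with a minimal sketch. It assumes min-max normalization across methods and flips metrics that should be minimized so that higher is always better. The function name `composite_scores`, the example scores, and the weights are hypothetical and are not TSCFEval's API.

```python
def composite_scores(scores, directions, weights):
    """Weighted sum of min-max normalized metrics (illustrative only).

    scores:     {method: {metric: value}}
    directions: {metric: "maximize" | "minimize"}
    weights:    {metric: non-negative weight}
    """
    total_weight = sum(weights.values())
    composite = {method: 0.0 for method in scores}
    for metric, weight in weights.items():
        values = [scores[m][metric] for m in scores]
        lo, hi = min(values), max(values)
        for method in scores:
            # Min-max normalize to [0, 1]; a constant metric gives 0.5 to every method.
            norm = 0.5 if hi == lo else (scores[method][metric] - lo) / (hi - lo)
            if directions[metric] == "minimize":
                norm = 1.0 - norm  # flip so that higher is always better
            composite[method] += (weight / total_weight) * norm
    return composite


# Hypothetical scores for two methods (illustrative values, not real results).
scores = {
    "method_a": {"validity": 1.0, "sparsity": 0.6},
    "method_b": {"validity": 0.8, "sparsity": 0.2},
}
directions = {"validity": "maximize", "sparsity": "minimize"}
weights = {"validity": 2.0, "sparsity": 1.0}

composite = composite_scores(scores, directions, weights)
ranking = sorted(composite.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

Changing the weights changes the ranking, which is the sense in which the ranking is customizable: with the weights swapped, `method_b` would come out ahead in this toy example.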
Installation
```bash
pip install tscf-eval
```
With optional dependencies (the quotes keep shells such as zsh from expanding the brackets):
```bash
pip install "tscf-eval[dtw]"   # DTW distance support
pip install "tscf-eval[full]"  # All features
```
Available Methods and Metrics
Counterfactual Methods
| Method | Strategy | Description | Reference |
|---|---|---|---|
|  | Saliency map | Learned saliency map blending with nearest unlike neighbor | Li et al., 2023 |
|  | Instance-based | Nearest unlike neighbor guidance (blend, ng, dtw_dba, cam) | Delaney et al., 2021 |
|  | Instance-based | Greedy channel substitution for multivariate TS | Ates et al., 2021 |
|  | Shapelet-based | Class-specific shapelet manipulation with contiguous perturbations | Bahri et al., 2022 |
|  | Evolutionary | Multi-objective optimization via NSGA-II | Hollig et al., 2022 |
|  | Gradient-based | Gradient optimization with importance-weighted proximity | Wang et al., 2024 |
|  | Gradient-based | Latent space optimization with local/global weighting | Wang et al., 2021 |
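Several of the methods above are guided by a nearest unlike neighbor (NUN): the closest training series that carries a different label. The sketch below illustrates the general idea with a linear blend toward the NUN that stops at the first prediction flip. It is a plain-Python illustration under simple assumptions (Euclidean distance, a toy threshold classifier, hypothetical helper names) and is not the implementation of `NativeGuide` or of any method in the table.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_unlike_neighbor(x, X_train, y_train, label):
    """Closest training series whose label differs from `label`."""
    unlike = [xi for xi, yi in zip(X_train, y_train) if yi != label]
    return min(unlike, key=lambda xi: squared_distance(x, xi))


def blend_toward(x, nun, predict, steps=20):
    """Interpolate from x toward nun and stop at the first prediction flip."""
    original = predict(x)
    for k in range(1, steps + 1):
        beta = k / steps
        candidate = [(1 - beta) * a + beta * b for a, b in zip(x, nun)]
        if predict(candidate) != original:
            return candidate, beta
    # Fall back to the neighbor itself if no intermediate blend flips the label.
    return list(nun), 1.0


def toy_predict(series):
    """Hypothetical classifier: class 1 if the series mean exceeds 0.5."""
    return int(sum(series) / len(series) > 0.5)


X_train = [[0.0] * 4, [0.1] * 4, [1.0] * 4, [2.0] * 4]
y_train = [0, 0, 1, 1]
x = [0.0, 0.0, 0.0, 0.0]

nun = nearest_unlike_neighbor(x, X_train, y_train, toy_predict(x))
cf, beta = blend_toward(x, nun, toy_predict)
print(nun, beta)
```

Stopping at the smallest blend weight that flips the prediction keeps the counterfactual as close to the original series as this simple search allows.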
Evaluation Metrics
TSCFEval implements 11 metrics organized into six quality dimensions:
| Dimension | Metric | Description | Direction |
|---|---|---|---|
| Core Quality |  | Fraction of CFs that flip the prediction (hard or soft mode) | maximize |
| Core Quality |  | Closeness to original instance (L1, L2, L-inf, DTW) | maximize |
| Core Quality |  | Fraction of changed features | minimize |
| Distribution |  | Whether CFs lie within data distribution (LOF, IF, MP-OCSVM, DTW-LOF) | maximize |
| Distribution |  | Variety among multiple CFs via DPP (Euclidean or DTW) | maximize |
| Structure |  | How contiguous the edits are | maximize |
| Structure |  | Number and length of edit segments | minimize |
| Model Behavior |  | Model confidence on original and CF predictions | maximize |
| Model Behavior |  | Ease of reverting CF changes via single-feature edits | maximize |
| Stability |  | Local Lipschitz-like stability to input perturbations (Euclidean or DTW) | minimize |
| Performance |  | Generation time per instance | minimize |
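To make a few of the descriptions above concrete, the sketch below computes hard validity, a raw L2 distance, the fraction of changed time steps, and the lengths of contiguous edit segments for a univariate series. It is a plain-Python illustration under simple assumptions: the helper names and the change tolerance `tol` are hypothetical, the distance is reported raw rather than as a closeness score to be maximized, and nothing here reproduces TSCFEval's own implementations or normalization.

```python
import math


def hard_validity(y, y_cf):
    """Fraction of counterfactuals whose label differs from the original label."""
    return sum(a != b for a, b in zip(y, y_cf)) / len(y)


def l2_distance(x, x_cf):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_cf)))


def changed_mask(x, x_cf, tol=1e-8):
    """True at every time step where the counterfactual differs from the original."""
    return [abs(a - b) > tol for a, b in zip(x, x_cf)]


def changed_fraction(x, x_cf, tol=1e-8):
    mask = changed_mask(x, x_cf, tol)
    return sum(mask) / len(mask)


def edit_segments(x, x_cf, tol=1e-8):
    """Lengths of maximal runs of consecutive changed time steps."""
    lengths, run = [], 0
    for changed in changed_mask(x, x_cf, tol):
        if changed:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


x = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x_cf = [0.0, 3.0, 4.0, 0.0, 0.0, 1.0]

print(l2_distance(x, x_cf))
print(changed_fraction(x, x_cf))            # 0.5
print(edit_segments(x, x_cf))               # [2, 1]
print(hard_validity([0, 0, 1], [1, 0, 0]))
```

In this example three of six time steps change, grouped into two segments, so an edit of the same size placed in one contiguous run would have the same changed fraction but a single segment.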
Quick Start
Evaluating Counterfactuals
```python
from sklearn.neighbors import KNeighborsClassifier
from tscf_eval import (
    Evaluator, Validity, Proximity, Sparsity,
    UCRLoader, NativeGuide,
)

# Load data
loader = UCRLoader("ItalyPowerDemand")
train, test = loader.load("train"), loader.load("test")

# Train classifier
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(train.X, train.y)

# Generate counterfactuals using NativeGuide
explainer = NativeGuide(clf, (train.X, train.y), method="blend")
X, X_cf, y, y_cf = [], [], [], []
for x in test.X[:10]:
    cf, cf_label, _ = explainer.explain(x)
    X.append(x)
    X_cf.append(cf)
    y.append(clf.predict(x.reshape(1, -1))[0])
    y_cf.append(cf_label)

# Evaluate counterfactual quality
evaluator = Evaluator([
    Validity(),
    Proximity(p=2, distance="lp"),
    Proximity(distance="dtw"),
    Sparsity(),
])
results = evaluator.evaluate(X, X_cf, y=y, y_cf=y_cf)
```
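When several methods are evaluated this way, their scores can be compared without collapsing them into a single number. The sketch below shows a minimal Pareto dominance filter over hypothetical per-method scores. The function names and values are illustrative assumptions, not TSCFEval's API or real results, and every objective is assumed to be oriented so that higher is better.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(p >= q for p, q in zip(a, b)) and any(p > q for p, q in zip(a, b))


def pareto_front(points):
    """Names of the points that no other point dominates."""
    return [
        name
        for name, vec in points.items()
        if not any(
            dominates(other, vec)
            for other_name, other in points.items()
            if other_name != name
        )
    ]


# Hypothetical (validity, 1 - changed fraction) pairs; higher is better on both.
points = {
    "method_a": (1.0, 0.4),
    "method_b": (0.8, 0.8),
    "method_c": (0.7, 0.7),
}
front = pareto_front(points)
print(front)  # ['method_a', 'method_b']
```

Here `method_c` is dropped because `method_b` is better on both objectives, while `method_a` and `method_b` remain because each wins on one objective.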