TSCFEval

TSCFEval is a model-agnostic Python framework for systematic evaluation of counterfactual explanations in Time Series Classification (TSC).

Unlike existing libraries that focus on counterfactual generation, TSCFEval is specifically designed for counterfactual evaluation, consolidating fragmented evaluation practices from the TSC counterfactual literature into a unified, extensible toolkit.

This library accompanies the paper:

Zamith Santos, B., Andrade Lira, M. F., Cerri, R., & Cavalcante Prudêncio, R. B. (2026). TSCFEval: A Model-Agnostic Framework for Evaluating Time Series Classification Counterfactuals. In Explainable Artificial Intelligence. xAI 2026. Communications in Computer and Information Science. Springer, Cham.

Accepted at the XAI World Conference 2026 (Fortaleza, Ceará, Brazil).

Given a time series classifier and counterfactual explanations, TSCFEval provides:

  • 11 evaluation metrics organized into six quality dimensions (core quality, distribution alignment, structural properties, model behavior, stability, and computational performance)

  • Weighted scalarization for aggregating metrics into composite scores, enabling customizable method ranking

  • Confidence-stratified instance selection for benchmarking across the decision boundary

  • Three benchmarking scenarios: single dataset with multiple CF methods, single dataset with multiple classifiers, and multiple datasets with a fixed classifier

  • 7 built-in CF methods for generating counterfactuals

  • Pareto and Friedman analysis for principled multi-criteria comparison
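To make the scalarization and Pareto items above concrete, here is a minimal NumPy sketch. It is not TSCFEval's own API: the function names (`normalize_scores`, `weighted_score`, `pareto_front`), the min-max normalization, and the toy scores are all assumptions chosen for illustration. Each row is one CF method and each column one metric with its optimization direction.

```python
import numpy as np

def normalize_scores(scores, directions):
    """Min-max normalize each metric column to [0, 1] so that higher is better."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    norm = (scores - lo) / span
    # Flip "minimize" metrics so that every column is maximized afterwards.
    flip = np.array([d == "minimize" for d in directions])
    norm[:, flip] = 1.0 - norm[:, flip]
    return norm

def weighted_score(scores, directions, weights):
    """Composite score per method: weighted sum of normalized metrics."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # weights are rescaled to sum to 1
    return normalize_scores(scores, directions) @ w

def pareto_front(scores, directions):
    """Indices of methods that no other method dominates."""
    norm = normalize_scores(scores, directions)
    n = norm.shape[0]
    front = []
    for i in range(n):
        dominated = any(
            np.all(norm[j] >= norm[i]) and np.any(norm[j] > norm[i])
            for j in range(n) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Toy scores (made up): rows are methods A, B, C;
# columns are validity, sparsity, generation time.
scores = [[1.0, 0.30, 2.0],
          [0.9, 0.10, 0.5],
          [0.8, 0.40, 3.0]]
directions = ["maximize", "minimize", "minimize"]

composite = weighted_score(scores, directions, weights=[0.5, 0.3, 0.2])
front = pareto_front(scores, directions)
print(composite, front)
```

In this toy example method C is worse than A on every metric, so it drops out of the Pareto front, while A and B trade validity against sparsity and speed. Changing the weights changes which of the two ranks first, which is the point of a customizable composite score.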

Installation

pip install tscf-eval

With optional dependencies:

pip install "tscf-eval[dtw]"   # DTW distance support
pip install "tscf-eval[full]"  # All features (quotes keep shells such as zsh from expanding the brackets)

Available Methods and Metrics

Counterfactual Methods

| Method | Strategy | Description | Reference |
|---|---|---|---|
| CELS | Saliency map | Learned saliency map blending with nearest unlike neighbor | Li et al., 2023 |
| NativeGuide | Instance-based | Nearest unlike neighbor guidance (blend, ng, dtw_dba, cam) | Delaney et al., 2021 |
| COMTE | Instance-based | Greedy channel substitution for multivariate TS | Ates et al., 2021 |
| SETS | Shapelet-based | Class-specific shapelet manipulation with contiguous perturbations | Bahri et al., 2022 |
| TSEvo | Evolutionary | Multi-objective optimization via NSGA-II | Hollig et al., 2022 |
| Glacier | Gradient-based | Gradient optimization with importance-weighted proximity | Wang et al., 2024 |
| LatentCF (LatentCF++) | Gradient-based | Latent space optimization with local/global weighting | Wang et al., 2021 |
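Several of the methods above rely on the nearest unlike neighbor (NUN): the closest training series whose label differs from the query's. The sketch below illustrates the general idea of blending a query toward its NUN until the prediction flips. It is a simplified illustration, not the library's `NativeGuide` implementation nor the exact procedure of Delaney et al.; the helper names, the linear blending schedule, and the toy `predict` function are assumptions.

```python
import numpy as np

def nearest_unlike_neighbor(x, X_train, y_train, original_label):
    """Return the training series closest to x (Euclidean) with a different label."""
    candidates = X_train[y_train != original_label]
    dists = np.linalg.norm(candidates - x, axis=1)
    return candidates[np.argmin(dists)]

def blend_counterfactual(x, nun, predict, original_label, steps=20):
    """Move x toward nun in equal steps; return the first blend whose label differs."""
    for beta in np.linspace(0.0, 1.0, steps + 1)[1:]:
        candidate = (1.0 - beta) * x + beta * nun
        if predict(candidate) != original_label:
            return candidate, float(beta)
    # Fallback: if no blend flipped the label, return the NUN itself.
    return nun.copy(), 1.0

# Toy classifier (hypothetical): class 1 when the series mean exceeds 0.5.
def predict(series):
    return int(series.mean() > 0.5)

X_train = np.array([[0.0, 0.0, 0.0, 0.0],
                    [1.0, 1.0, 1.0, 1.0],
                    [2.0, 2.0, 2.0, 2.0]])
y_train = np.array([0, 1, 1])

x = np.array([0.1, 0.1, 0.1, 0.1])
label = predict(x)
nun = nearest_unlike_neighbor(x, X_train, y_train, label)
cf, beta = blend_counterfactual(x, nun, predict, label)
print(cf, beta)
```

Stopping at the first flipping blend keeps the counterfactual close to the original series, at the cost of landing near the decision boundary.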

Evaluation Metrics

TSCFEval implements 11 metrics organized into six quality dimensions:

| Dimension | Metric | Description | Direction |
|---|---|---|---|
| Core Quality | Validity | Fraction of CFs that flip the prediction (hard or soft mode) | maximize |
| Core Quality | Proximity | Closeness to original instance (L1, L2, L-inf, DTW) | maximize |
| Core Quality | Sparsity | Fraction of changed features | minimize |
| Distribution | Plausibility | Whether CFs lie within data distribution (LOF, IF, MP-OCSVM, DTW-LOF) | maximize |
| Distribution | Diversity | Variety among multiple CFs via DPP (Euclidean or DTW) | maximize |
| Structure | Contiguity | How contiguous the edits are | maximize |
| Structure | Composition | Number and length of edit segments | minimize |
| Model Behavior | Confidence | Model confidence on original and CF predictions | maximize |
| Model Behavior | Controllability | Ease of reverting CF changes via single-feature edits | maximize |
| Stability | Robustness | Local Lipschitz-like stability to input perturbations (Euclidean or DTW) | minimize |
| Performance | Efficiency | Generation time per instance | minimize |
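As a rough guide to what a few of these metrics measure, the following NumPy sketch computes validity, an L2 distance, sparsity, and edit segments for univariate series. These are textbook-style definitions written for illustration; the function names and the change tolerance `tol` are assumptions, and TSCFEval's own metric classes may normalize or aggregate differently (for example, turning a distance into a score to maximize).

```python
import numpy as np

def validity(y, y_cf):
    """Fraction of counterfactuals whose label differs from the original label."""
    return float(np.mean(np.asarray(y) != np.asarray(y_cf)))

def l2_distance(x, x_cf):
    """Euclidean distance between the original series and its counterfactual."""
    return float(np.linalg.norm(np.asarray(x_cf, float) - np.asarray(x, float)))

def changed_mask(x, x_cf, tol=1e-8):
    """Boolean mask of time steps whose value changed by more than tol."""
    return np.abs(np.asarray(x_cf, float) - np.asarray(x, float)) > tol

def sparsity(x, x_cf, tol=1e-8):
    """Fraction of time steps that were changed."""
    return float(np.mean(changed_mask(x, x_cf, tol)))

def edit_segments(x, x_cf, tol=1e-8):
    """Lengths of maximal runs of consecutive changed time steps."""
    mask = changed_mask(x, x_cf, tol).astype(int)
    padded = np.concatenate(([0], mask, [0]))
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)   # a run begins where the mask rises
    ends = np.flatnonzero(diff == -1)    # and ends where it falls
    return (ends - starts).tolist()

# Toy series (made up): steps 1-2 and step 5 are edited.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x_cf = np.array([0.0, 1.5, 2.5, 3.0, 4.0, 6.0])

print(validity([0, 0, 1], [1, 0, 0]))
print(l2_distance(x, x_cf))
print(sparsity(x, x_cf))
print(edit_segments(x, x_cf))
```

Fewer and longer segments indicate more contiguous edits, which is the intuition behind the Contiguity and Composition rows.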

Quick Start

Evaluating Counterfactuals

from sklearn.neighbors import KNeighborsClassifier
from tscf_eval import (
    Evaluator, Validity, Proximity, Sparsity,
    UCRLoader, NativeGuide,
)

# Load data
loader = UCRLoader("ItalyPowerDemand")
train, test = loader.load("train"), loader.load("test")

# Train classifier
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(train.X, train.y)

# Generate counterfactuals using NativeGuide
explainer = NativeGuide(clf, (train.X, train.y), method="blend")
X, X_cf, y, y_cf = [], [], [], []
for x in test.X[:10]:
    cf, cf_label, _ = explainer.explain(x)
    X.append(x)
    X_cf.append(cf)
    y.append(clf.predict(x.reshape(1, -1))[0])
    y_cf.append(cf_label)

# Evaluate counterfactual quality
evaluator = Evaluator([
    Validity(),
    Proximity(p=2, distance="lp"),
    Proximity(distance="dtw"),
    Sparsity(),
])
results = evaluator.evaluate(X, X_cf, y=y, y_cf=y_cf)
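The example above simply explains the first ten test instances. The confidence-stratified selection mentioned in the feature list would instead pick instances from across the classifier's confidence range. The sketch below shows one way such a selection could work with plain NumPy. It does not use TSCFEval's own selector; the function name, the equal-width binning, and the toy probabilities are assumptions.

```python
import numpy as np

def stratified_by_confidence(proba, n_bins=3, per_bin=2, seed=0):
    """Pick up to per_bin instance indices from each confidence bin."""
    proba = np.asarray(proba, dtype=float)
    confidence = proba.max(axis=1)  # confidence = probability of the predicted class
    # Equal-width bins between the lowest and highest observed confidence.
    edges = np.linspace(confidence.min(), confidence.max(), n_bins + 1)
    # Using interior edges only yields bin ids in 0..n_bins-1.
    bin_ids = np.digitize(confidence, edges[1:-1])
    rng = np.random.default_rng(seed)
    selected = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_ids == b)
        if members.size:
            take = min(per_bin, members.size)
            selected.extend(rng.choice(members, size=take, replace=False).tolist())
    return sorted(selected), confidence

# Toy class probabilities (made up) for seven instances of a binary problem.
proba = np.array([[0.51, 0.49], [0.45, 0.55], [0.70, 0.30], [0.25, 0.75],
                  [0.95, 0.05], [0.01, 0.99], [0.98, 0.02]])
selected, confidence = stratified_by_confidence(proba)
print(selected)
```

With a scikit-learn style classifier, `proba` would typically come from `clf.predict_proba(test.X)`, and the selected indices would replace the `test.X[:10]` slice.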
