TSCFEval
========

**TSCFEval** is a model-agnostic Python framework for systematic evaluation of counterfactual explanations in Time Series Classification (TSC). Unlike existing libraries that focus on counterfactual generation, TSCFEval is specifically designed for counterfactual evaluation, consolidating fragmented evaluation practices from the TSC counterfactual literature into a unified, extensible toolkit.

This library is part of the paper:

    Zamith Santos, B., Andrade Lira, M. F., Cerri, R., & Cavalcante Prudêncio, R. B. (2026). *TSCFEval: A Model-Agnostic Framework for Evaluating Time Series Classification Counterfactuals*. In Explainable Artificial Intelligence. xAI 2026. Communications in Computer and Information Science. Springer, Cham.

Accepted at the **XAI World Conference 2026** (Fortaleza, Ceará, Brazil).

Given a time series classifier and counterfactual explanations, TSCFEval provides:

- **11 evaluation metrics** organized into **six quality dimensions** (core quality, distribution alignment, structural properties, model behavior, stability, and computational performance)
- **Weighted scalarization** for aggregating metrics into composite scores, enabling customizable method ranking
- **Confidence-stratified instance selection** for benchmarking across the decision boundary
- **Three benchmarking scenarios**: single dataset with multiple CF methods, single dataset with multiple classifiers, and multiple datasets with a fixed classifier
- **7 built-in CF methods** for generating counterfactuals
- **Pareto and Friedman analysis** for principled multi-criteria comparison

Installation
------------

.. code-block:: bash

   pip install tscf-eval

With optional dependencies:

.. code-block:: bash

   pip install tscf-eval[dtw]    # DTW distance support
   pip install tscf-eval[full]   # All features

Available Methods and Metrics
-----------------------------

Counterfactual Methods
~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 15 15 40 30

   * - Method
     - Strategy
     - Description
     - Reference
   * - ``CELS``
     - Saliency map
     - Learned saliency map blending with nearest unlike neighbor
     - Li et al., 2023
   * - ``NativeGuide``
     - Instance-based
     - Nearest unlike neighbor guidance (blend, ng, dtw_dba, cam)
     - Delaney et al., 2021
   * - ``COMTE``
     - Instance-based
     - Greedy channel substitution for multivariate TS
     - Ates et al., 2021
   * - ``SETS``
     - Shapelet-based
     - Class-specific shapelet manipulation with contiguous perturbations
     - Bahri et al., 2022
   * - ``TSEvo``
     - Evolutionary
     - Multi-objective optimization via NSGA-II
     - Hollig et al., 2022
   * - ``Glacier``
     - Gradient-based
     - Gradient optimization with importance-weighted proximity
     - Wang et al., 2024
   * - ``LatentCF`` (LatentCF++)
     - Gradient-based
     - Latent space optimization with local/global weighting
     - Wang et al., 2021

Evaluation Metrics
~~~~~~~~~~~~~~~~~~

TSCFEval implements 11 metrics organized into six quality dimensions:

.. list-table::
   :header-rows: 1
   :widths: 20 20 35 25

   * - Dimension
     - Metric
     - Description
     - Direction
   * - Core Quality
     - ``Validity``
     - Fraction of CFs that flip the prediction (hard or soft mode)
     - maximize
   * - Core Quality
     - ``Proximity``
     - Closeness to original instance (L1, L2, L-inf, DTW)
     - maximize
   * - Core Quality
     - ``Sparsity``
     - Fraction of changed features
     - minimize
   * - Distribution
     - ``Plausibility``
     - Whether CFs lie within data distribution (LOF, IF, MP-OCSVM, DTW-LOF)
     - maximize
   * - Distribution
     - ``Diversity``
     - Variety among multiple CFs via DPP (Euclidean or DTW)
     - maximize
   * - Structure
     - ``Contiguity``
     - How contiguous the edits are
     - maximize
   * - Structure
     - ``Composition``
     - Number and length of edit segments
     - minimize
   * - Model Behavior
     - ``Confidence``
     - Model confidence on original and CF predictions
     - maximize
   * - Model Behavior
     - ``Controllability``
     - Ease of reverting CF changes via single-feature edits
     - maximize
   * - Stability
     - ``Robustness``
     - Local Lipschitz-like stability to input perturbations (Euclidean or DTW)
     - minimize
   * - Performance
     - ``Efficiency``
     - Generation time per instance
     - minimize

Quick Start
-----------

Evaluating Counterfactuals
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from sklearn.neighbors import KNeighborsClassifier
   from tscf_eval import (
       Evaluator, Validity, Proximity, Sparsity,
       UCRLoader, NativeGuide,
   )

   # Load data
   loader = UCRLoader("ItalyPowerDemand")
   train, test = loader.load("train"), loader.load("test")

   # Train classifier
   clf = KNeighborsClassifier(n_neighbors=3)
   clf.fit(train.X, train.y)

   # Generate counterfactuals using NativeGuide
   explainer = NativeGuide(clf, (train.X, train.y), method="blend")
   X, X_cf, y, y_cf = [], [], [], []
   for x in test.X[:10]:
       cf, cf_label, _ = explainer.explain(x)
       X.append(x)
       X_cf.append(cf)
       y.append(clf.predict(x.reshape(1, -1))[0])
       y_cf.append(cf_label)

   # Evaluate counterfactual quality
   evaluator = Evaluator([
       Validity(),
       Proximity(p=2, distance="lp"),
       Proximity(distance="dtw"),
       Sparsity(),
   ])
   results = evaluator.evaluate(X, X_cf, y=y, y_cf=y_cf)

Contents
--------

.. toctree::
   :maxdepth: 2
   :caption: User Guide

   installation
   quickstart
   examples

.. toctree::
   :maxdepth: 2
   :caption: API Reference

   api/evaluator
   api/counterfactuals
   api/data_loader
   api/benchmark

.. toctree::
   :maxdepth: 1
   :caption: Development

   contributing
   changelog

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`