Quick Start Guide ================= This guide will help you get started with tscf-eval for evaluating counterfactual explanations for time series classification. Loading Data ------------ tscf-eval provides utilities for loading time series data. From the UCR Archive ~~~~~~~~~~~~~~~~~~~~ The easiest way to get started is using the UCR Time Series Archive: .. code-block:: python from tscf_eval import UCRLoader loader = UCRLoader("ItalyPowerDemand") train_data = loader.load("train") test_data = loader.load("test") print(f"Train: {train_data.X.shape}, Test: {test_data.X.shape}") print(train_data.describe()) From NumPy Arrays ~~~~~~~~~~~~~~~~~ You can also create data containers from your own arrays: .. code-block:: python from tscf_eval import TSCData import numpy as np X = np.random.randn(100, 50) # 100 instances, 50 time points y = np.array([0] * 50 + [1] * 50) data = TSCData.from_arrays( name="my_dataset", split="train", X=X, # Shape: (n, T) or (n, C, T) y=y, # Shape: (n,) ) Basic Usage ----------- Evaluating Counterfactuals ~~~~~~~~~~~~~~~~~~~~~~~~~~ The core functionality of tscf-eval is evaluating counterfactual quality using the :class:`~tscf_eval.Evaluator` class: .. code-block:: python from sklearn.neighbors import KNeighborsClassifier from tscf_eval import ( Evaluator, Validity, Proximity, Sparsity, UCRLoader, NativeGuide, ) # Load data loader = UCRLoader("ItalyPowerDemand") train, test = loader.load("train"), loader.load("test") # Train classifier clf = KNeighborsClassifier(n_neighbors=3) clf.fit(train.X, train.y) # Generate counterfactuals using NativeGuide explainer = NativeGuide(clf, (train.X, train.y), method="blend") X, X_cf, y, y_cf = [], [], [], [] for x in test.X[:10]: cf, cf_label, _ = explainer.explain(x) X.append(x) X_cf.append(cf) y.append(clf.predict(x.reshape(1, -1))[0]) y_cf.append(cf_label) # Create evaluator with desired metrics evaluator = Evaluator([ Validity(), Proximity(p=2, distance="lp"), Proximity(distance="dtw"), # DTW-based proximity Sparsity(), ]) # Run evaluation results = evaluator.evaluate(X, X_cf, y=y, y_cf=y_cf) # Access results print(f"Validity: {results['validity_soft']:.2f}") print(f"Proximity (L2): {results['proximity_l2']:.2f}") print(f"Proximity (DTW): {results['proximity_dtw']:.2f}") print(f"Sparsity: {results['sparsity']:.2f}") Using a Classifier ~~~~~~~~~~~~~~~~~~ For metrics like Validity and Controllability, you can provide a fitted classifier instead of labels: .. code-block:: python from sklearn.neighbors import KNeighborsClassifier from tscf_eval import ( Evaluator, Validity, Proximity, Sparsity, UCRLoader, COMTE, ) # Load data loader = UCRLoader("ItalyPowerDemand") train, test = loader.load("train"), loader.load("test") # Train a classifier clf = KNeighborsClassifier(n_neighbors=3) clf.fit(train.X, train.y) # Generate counterfactuals using COMTE explainer = COMTE(clf, (train.X, train.y), distance="dtw") X, X_cf = [], [] for x in test.X[:10]: cf, _, _ = explainer.explain(x) X.append(x) X_cf.append(cf) # Evaluate using the classifier (labels inferred from model) evaluator = Evaluator([ Validity(), Proximity(p=2, distance="lp"), Proximity(distance="dtw"), Sparsity(), ]) results = evaluator.evaluate(X, X_cf, model=clf) Some metrics require additional inputs: - ``model``: Validity, Controllability, Confidence - ``X_train``: Plausibility, Diversity - ``time_per_instance``: Efficiency Available Metrics ----------------- tscf-eval provides 11 metric classes organized into six quality dimensions: .. list-table:: :header-rows: 1 :widths: 20 50 30 * - Metric - Description - Range * - Validity - Fraction of CFs that change prediction - [0, 1] * - Proximity(p) - Proximity score ``1 / (1 + d)``, where ``d`` is distance - [0, 1] * - Sparsity - Fraction of changed features - [0, 1] * - Plausibility - Outlier detection score - [0, 1] * - Diversity - DPP-based diversity score - [0, +inf) * - Controllability - Ease of reverting changes - [0, 1] * - Confidence - Model confidence statistics - dict * - Composition - Edit segment statistics - dict * - Contiguity - Edit contiguity score - [0, 1] * - Robustness - Lipschitz-like stability - [0, +inf) * - Efficiency - Mean time per instance - seconds Next Steps ---------- - See :doc:`examples` for more detailed usage examples - Explore the :doc:`api/evaluator` for all available metrics - Check :doc:`api/counterfactuals` for counterfactual generation methods