Examples
========

This guide provides comprehensive examples for common use cases with TSCFEval,
from generating counterfactuals to running benchmarks and analyzing results.

.. contents:: Table of Contents
   :local:
   :depth: 2


Generating Counterfactuals
--------------------------

TSCFEval provides 7 built-in counterfactual methods covering different generation
strategies: instance-based (NativeGuide, COMTE), evolutionary (TSEvo), gradient-based
(Glacier, LatentCF), saliency-based (CELS), and shapelet-based (SETS).

All methods follow a unified interface:

1. Initialize with a fitted classifier and training data tuple ``(X_train, y_train)``
2. Call ``explain(x)`` to generate a counterfactual for instance ``x``
3. Returns a tuple ``(cf, cf_label, meta)`` containing the counterfactual,
   its predicted label, and method-specific metadata

Using NativeGuide
~~~~~~~~~~~~~~~~~

NativeGuide is an instance-based method that generates counterfactuals by guiding
the original instance toward its nearest unlike neighbor (NUN), the closest training
instance with a different predicted class. It supports four blending strategies:

- ``blend``: Linear interpolation toward NUN until prediction flips
- ``ng``: Native Guide with weighted averaging
- ``dtw_dba``: DTW Barycentric Averaging for time-series-aware blending
- ``cam``: Class Activation Map weighted guidance

.. code-block:: python

   from sklearn.neighbors import KNeighborsClassifier
   from tscf_eval import UCRLoader, NativeGuide

   # Load data and train classifier
   loader = UCRLoader("ItalyPowerDemand")
   train, test = loader.load("train"), loader.load("test")
   clf = KNeighborsClassifier(n_neighbors=3)
   clf.fit(train.X, train.y)

   # Create explainer (methods: "blend", "ng", "dtw_dba", "cam")
   explainer = NativeGuide(clf, (train.X, train.y), method="blend")

   # Generate counterfactual for a single instance
   x = test.X[0]
   cf, cf_label, meta = explainer.explain(x)

   print(f"Original prediction: {clf.predict(x.reshape(1, -1))[0]}")
   print(f"Counterfactual prediction: {cf_label}")

Using COMTE
~~~~~~~~~~~

COMTE (Counterfactual Multivariate Time-series Explanations) generates counterfactuals
by greedily substituting channels from a "distractor" series, a training instance
from a different class. It iteratively replaces channels until the prediction flips,
producing sparse, interpretable explanations that highlight which channels are most
important for the classification decision. COMTE works with both univariate and
multivariate time series and uses Euclidean or DTW distance for distractor selection:

.. code-block:: python

   from tscf_eval import UCRLoader, COMTE

   # Reuses clf, train, and test from the NativeGuide example above
   explainer = COMTE(clf, (train.X, train.y), distance="dtw")
   cf, cf_label, meta = explainer.explain(test.X[0])
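
The greedy channel substitution described above can be sketched in plain NumPy.
This is an illustrative sketch under simplifying assumptions, not TSCFEval's
implementation: ``greedy_channel_swap`` and ``toy_predict`` are hypothetical names,
the distractor is assumed to be chosen already, and the classifier is a toy function
that only looks at one channel.

.. code-block:: python

   import numpy as np

   def greedy_channel_swap(x, distractor, predict):
       """Swap channels of x for the distractor's until the prediction flips.

       x, distractor: arrays of shape (n_channels, n_timesteps).
       predict: hypothetical callable returning P(target class) for one series.
       """
       cf = x.copy()
       remaining = list(range(x.shape[0]))
       swapped = []
       while remaining:
           # Score every remaining channel by the probability reached after swapping it
           scores = []
           for c in remaining:
               trial = cf.copy()
               trial[c] = distractor[c]
               scores.append(predict(trial))
           best = remaining[int(np.argmax(scores))]
           cf[best] = distractor[best]
           swapped.append(best)
           remaining.remove(best)
           if predict(cf) > 0.5:  # prediction flipped to the target class
               break
       return cf, swapped

   def toy_predict(series):
       # Toy classifier: only the mean of channel 2 matters
       return 1.0 / (1.0 + np.exp(-series[2].mean()))

   x = -np.ones((3, 4))
   distractor = np.ones((3, 4))
   cf, swapped = greedy_channel_swap(x, distractor, toy_predict)
   print(swapped)  # [2]

Only the channel the toy classifier relies on is replaced, which is the kind of
sparse, channel-level explanation the method aims for.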

Using TSEvo
~~~~~~~~~~~

TSEvo uses multi-objective evolutionary optimization (NSGA-II) to generate
counterfactuals that balance validity, proximity, and plausibility. It applies
mutation operators to evolve a population of candidate counterfactuals over
multiple generations. Three transformer types control how mutations are applied:

- ``authentic``: Mutations based on authentic patterns from training data
- ``frequency``: Frequency-domain perturbations
- ``gaussian``: Random Gaussian noise perturbations

.. code-block:: python

   from tscf_eval import UCRLoader, TSEvo

   # Transformers: "authentic", "frequency", "gaussian"
   explainer = TSEvo(clf, (train.X, train.y), transformer="authentic")
   cf, cf_label, meta = explainer.explain(test.X[0])

Using Glacier
~~~~~~~~~~~~~

Glacier (Guided Locally Constrained Counterfactual Explanations) uses gradient-based
optimization with importance-weighted proximity constraints. It optimizes in the input
space while penalizing changes to important time points more heavily. Requires a
differentiable classifier (e.g., neural networks). The ``weight_type`` parameter
controls how importance weights are computed:

- ``uniform``: Equal weight for all time points
- ``local``: Weights based on local gradients (instance-specific)
- ``global``: Weights based on global feature importance

.. code-block:: python

   from tscf_eval import UCRLoader, Glacier

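   # Weight types: "uniform", "local", "global"
   # As noted above, Glacier expects a differentiable clf (e.g., a neural network),
   # unlike the k-NN model trained in the earlier examples
   explainer = Glacier(clf, (train.X, train.y), weight_type="uniform")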
   cf, cf_label, meta = explainer.explain(test.X[0])
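
To make the idea of importance-weighted proximity concrete, here is a minimal NumPy
sketch. It assumes a logistic model with known weights so the gradient can be written
by hand; ``weighted_gradient_cf``, ``lam``, and ``lr`` are hypothetical names chosen
for illustration and are not part of Glacier's API.

.. code-block:: python

   import numpy as np

   def sigmoid(z):
       return 1.0 / (1.0 + np.exp(-z))

   def weighted_gradient_cf(x, w, b, importance, lam=0.1, lr=0.1, n_steps=500):
       """Gradient descent on -log p(target) + lam * sum(importance * (cf - x)**2)."""
       cf = x.astype(float).copy()
       for _ in range(n_steps):
           p = sigmoid(w @ cf + b)
           if p > 0.5:  # the toy model now predicts the target class
               break
           # Gradient of the classification loss plus the weighted proximity penalty
           grad = -(1.0 - p) * w + 2.0 * lam * importance * (cf - x)
           cf -= lr * grad
       return cf

   x = np.zeros(4)
   w = np.ones(4)
   b = -2.0
   importance = np.array([10.0, 10.0, 0.0, 0.0])  # penalize changes to the first two points
   cf = weighted_gradient_cf(x, w, b, importance)
   print(np.round(cf - x, 2))

The time points with high importance end up changed less than the unpenalized ones,
which is the behavior the weighting is meant to produce.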

Using SETS and CELS
~~~~~~~~~~~~~~~~~~~

SETS and CELS use different strategies for identifying discriminative regions:

- **SETS** (Shapelet-based Explanations for Time Series): Identifies class-discriminative
  shapelets and generates counterfactuals by manipulating these subsequences. Produces
  contiguous, localized perturbations that are often more interpretable.

- **CELS** (Counterfactual Explanations via Learned Saliency): Uses learned saliency maps
  to identify important time points, then blends the original instance with its nearest
  unlike neighbor weighted by the saliency scores. Produces smooth counterfactuals that
  focus changes on the most discriminative regions.

.. code-block:: python

   from tscf_eval import UCRLoader, SETS, CELS

   # SETS: Shapelet-based explanations
   explainer_sets = SETS(clf, (train.X, train.y))
   cf, cf_label, meta = explainer_sets.explain(test.X[0])

   # CELS: Saliency map blending
   explainer_cels = CELS(clf, (train.X, train.y))
   cf, cf_label, meta = explainer_cels.explain(test.X[0])
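
The saliency-weighted blending that CELS is described as using can be illustrated
with a few lines of NumPy. This sketch assumes the saliency map is already available
and scaled to [0, 1]; how CELS learns the map is not shown, and ``saliency_blend`` is
a hypothetical helper name.

.. code-block:: python

   import numpy as np

   def saliency_blend(x, nun, saliency):
       """Move x toward its nearest unlike neighbor only where saliency is high."""
       s = np.clip(np.asarray(saliency, dtype=float), 0.0, 1.0)
       return (1.0 - s) * x + s * nun

   x = np.array([0.0, 0.0, 0.0, 0.0])
   nun = np.array([1.0, 2.0, 3.0, 4.0])
   saliency = np.array([0.0, 0.0, 1.0, 0.5])
   cf = saliency_blend(x, nun, saliency)
   print(cf)  # [0. 0. 3. 2.]

Time points with zero saliency are left untouched, so the changes concentrate on
the regions marked as discriminative.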


Evaluating Counterfactuals
--------------------------

TSCFEval provides 11 metrics across 6 quality dimensions for comprehensive
counterfactual evaluation:

1. **Core Quality**: Validity, Proximity, Sparsity
2. **Distribution Alignment**: Plausibility, Diversity
3. **Structural Properties**: Contiguity, Composition
4. **Model Behavior**: Confidence, Controllability
5. **Stability**: Robustness
6. **Performance**: Efficiency

The ``Evaluator`` class provides a flexible interface for computing any combination
of these metrics. Each metric has specific requirements (e.g., some need the model,
others need training data) which are detailed in the API reference.

Basic Evaluation
~~~~~~~~~~~~~~~~

The core metrics (Validity, Proximity, Sparsity) measure fundamental counterfactual
quality. Validity checks if the prediction changed, Proximity measures how close
the counterfactual is to the original, and Sparsity quantifies the fraction of
changed features:

.. code-block:: python

   from sklearn.neighbors import KNeighborsClassifier
   from tscf_eval import UCRLoader, NativeGuide
   from tscf_eval.evaluator import Evaluator, Validity, Proximity, Sparsity

   # Load data and train classifier
   loader = UCRLoader("ItalyPowerDemand")
   train, test = loader.load("train"), loader.load("test")
   clf = KNeighborsClassifier(n_neighbors=3)
   clf.fit(train.X, train.y)

   # Generate counterfactuals
   explainer = NativeGuide(clf, (train.X, train.y), method="blend")
   X, X_cf, y, y_cf = [], [], [], []
   for x in test.X[:10]:
       cf, cf_label, _ = explainer.explain(x)
       X.append(x)
       X_cf.append(cf)
       y.append(clf.predict(x.reshape(1, -1))[0])
       y_cf.append(cf_label)

   # Create evaluator
   evaluator = Evaluator([
       Validity(),
       Proximity(p=2, distance="lp"),
       Proximity(distance="dtw"),
       Sparsity(),
   ])

   # Evaluate
   results = evaluator.evaluate(X, X_cf, y=y, y_cf=y_cf)

   print(f"Validity: {results['validity_soft']:.2%}")
   print(f"Proximity (L2): {results['proximity_l2']:.4f}")
   print(f"Proximity (DTW): {results['proximity_dtw']:.4f}")
   print(f"Sparsity: {results['sparsity']:.2%}")
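
For intuition, the three core metrics can be computed by hand on a toy example.
The definitions below are simplified assumptions (mean over instances, sparsity as
the fraction of changed time points); TSCFEval's exact aggregation and normalization
may differ, so treat this as a sketch rather than a reference implementation.

.. code-block:: python

   import numpy as np

   # Two toy instances and their counterfactuals
   X = np.array([[0.0, 1.0, 2.0, 3.0],
                 [1.0, 1.0, 1.0, 1.0]])
   X_cf = np.array([[0.0, 1.0, 2.0, 0.0],
                    [1.0, 1.0, 1.0, 1.0]])
   y = np.array([0, 1])
   y_cf = np.array([1, 1])

   # Validity: fraction of counterfactuals whose label differs from the original
   validity = float(np.mean(y != y_cf))

   # Proximity (L2): mean Euclidean distance between each original and its counterfactual
   proximity_l2 = float(np.mean(np.linalg.norm(X - X_cf, axis=1)))

   # Sparsity: fraction of time points that changed
   sparsity = float(np.mean(~np.isclose(X, X_cf)))

   print(validity, proximity_l2, sparsity)  # 0.5 1.5 0.125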

Using Model-Dependent Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some metrics require access to the classifier to compute their values:

- **Validity**: When labels aren't provided, predictions are inferred from the model
- **Controllability**: Measures how easily the counterfactual changes can be reverted
  by modifying a single feature (requires making predictions on modified instances)
- **Confidence**: Reports the model's predicted probabilities for both the original
  instance and the counterfactual (requires ``predict_proba``)

.. code-block:: python

   from tscf_eval.evaluator import (
       Evaluator, Validity, Proximity, Sparsity,
       Controllability, Confidence
   )

   evaluator = Evaluator([
       Validity(),
       Proximity(distance="dtw"),
       Sparsity(),
       Controllability(),
       Confidence(),
   ])

   # Pass the model to evaluate()
   results = evaluator.evaluate(X, X_cf, model=clf, X_train=train.X)

   print(f"Validity: {results['validity_soft']:.2%}")
   print(f"Controllability: {results['controllability']:.4f}")
   print(f"Mean CF confidence: {results['mean_conf_cf']:.4f}")

Using Distribution Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~

Distribution metrics assess whether counterfactuals are realistic and diverse:

- **Plausibility**: Measures whether counterfactuals lie within the training data
  distribution using outlier detection. High plausibility means the counterfactual
  resembles real training instances. Methods include LOF (Local Outlier Factor),
  Isolation Forest, and DTW-based LOF for time-series-aware detection.

- **Diversity**: When generating multiple counterfactuals per instance, measures
  the variety among them using Determinantal Point Processes (DPP). Higher diversity
  means the counterfactuals explore different regions of the feature space.

Both metrics require ``X_train`` to be passed to ``evaluate()``:

.. code-block:: python

   from tscf_eval.evaluator import (
       Evaluator, Plausibility, Diversity, Contiguity
   )

   evaluator = Evaluator([
       Plausibility(method="lof"),       # Local Outlier Factor
       Plausibility(method="dtw_lof"),   # DTW-based LOF
       Diversity(distance="euclidean"),
       Diversity(distance="dtw"),
       Contiguity(),
   ])

   # Pass X_train for distribution metrics
   results = evaluator.evaluate(X, X_cf, y=y, y_cf=y_cf, X_train=train.X)
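
The LOF-based plausibility check can be approximated with scikit-learn directly.
This sketch fits ``LocalOutlierFactor`` in novelty mode on synthetic training data
and reports the fraction of counterfactuals flagged as inliers; using the inlier
fraction as the score is an assumption, and TSCFEval may aggregate differently.

.. code-block:: python

   import numpy as np
   from sklearn.neighbors import LocalOutlierFactor

   rng = np.random.default_rng(0)
   X_train = rng.normal(0.0, 1.0, size=(200, 8))  # synthetic training series

   X_cf = np.vstack([
       np.zeros(8),       # close to the bulk of the training data
       np.full(8, 25.0),  # far away from every training instance
   ])

   # novelty=True is required to score data that was not seen during fit
   lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
   inlier = lof.predict(X_cf) == 1  # predict returns +1 for inliers, -1 for outliers
   plausibility = float(np.mean(inlier))
   print(inlier, plausibility)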

Measuring Efficiency
~~~~~~~~~~~~~~~~~~~~

The Efficiency metric tracks how long it takes to generate each counterfactual.
This is important for comparing methods in practical applications where generation
time matters. You must measure the time yourself and pass it to the evaluator:

.. code-block:: python

   import time
   from tscf_eval import TSEvo
   from tscf_eval.evaluator import Evaluator, Validity, Proximity, Efficiency

   explainer = TSEvo(clf, (train.X, train.y), transformer="authentic")
   X, X_cf, times = [], [], []
   for x in test.X[:5]:
       start = time.perf_counter()
       cf, _, _ = explainer.explain(x)
       times.append(time.perf_counter() - start)
       X.append(x)
       X_cf.append(cf)

   evaluator = Evaluator([Validity(), Proximity(distance="dtw"), Efficiency()])
   results = evaluator.evaluate(X, X_cf, model=clf, time_per_instance=times)

   print(f"Mean time: {results['efficiency_time_s']:.4f}s")

Full Evaluation with All Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

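For comprehensive evaluation, you can use all available metrics together. Note that
this requires providing all optional parameters (``model``, ``X_train``, ``y``, ``y_cf``,
``time_per_instance``) to satisfy each metric's requirements. All of these must describe
the same instances in the same order, so regenerate ``y``, ``y_cf``, and ``times`` together
with ``X`` and ``X_cf`` rather than mixing lists from the earlier snippets: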

.. code-block:: python

   import time
   from tscf_eval import UCRLoader, Glacier
   from tscf_eval.evaluator import (
       Evaluator, Validity, Proximity, Sparsity,
       Plausibility, Diversity, Controllability, Confidence,
       Composition, Contiguity, Robustness, Efficiency
   )

   evaluator = Evaluator([
       # Core
       Validity(),
       Proximity(p=2, distance="lp"),
       Proximity(distance="dtw"),
       Sparsity(),
       # Distribution
       Plausibility(method="lof"),
       Plausibility(method="dtw_lof"),
       Diversity(distance="dtw"),
       # Model behavior
       Controllability(),
       Confidence(),
       # Structure
       Composition(),
       Contiguity(),
       # Stability and performance
       Robustness(distance="dtw"),
       Efficiency(),
   ])

   results = evaluator.evaluate(
       X, X_cf,
       model=clf,
       X_train=train.X,
       y=y,
       y_cf=y_cf,
       time_per_instance=times,
   )


Running Benchmarks
------------------

The ``BenchmarkRunner`` class provides a structured framework for systematically
comparing counterfactual methods. It handles:

- **Instance selection**: Random or confidence-stratified sampling of test instances
- **Parallel execution**: Run multiple explainers in parallel with ``n_jobs``
- **Progress tracking**: Built-in progress bars with tqdm
- **Result aggregation**: Aggregate results by explainer, dataset, or model

TSCFEval supports three benchmarking scenarios:

1. **Single dataset, multiple CF methods**: Compare explainer algorithms on a fixed dataset
2. **Single dataset, multiple classifiers**: Study how the classifier affects CF quality
3. **Multiple datasets, fixed classifier**: Assess generalization across datasets

Single-Dataset Benchmark
~~~~~~~~~~~~~~~~~~~~~~~~

The most common scenario: compare multiple counterfactual methods on a single dataset
with a fixed classifier. Use ``instance_selection="stratified_confidence"`` to ensure
coverage of both high-confidence and uncertain instances near the decision boundary:

.. code-block:: python

   from sklearn.neighbors import KNeighborsClassifier
   from tscf_eval import Evaluator, Validity, Proximity, Sparsity
   from tscf_eval.benchmark import (
       BenchmarkRunner, DatasetConfig, ModelConfig, ExplainerConfig,
   )
   from tscf_eval.counterfactuals import COMTE, NativeGuide, Glacier
   from tscf_eval.data_loader import UCRLoader

   # Load data
   loader = UCRLoader("ItalyPowerDemand")
   train, test = loader.load("train"), loader.load("test")

   # Train classifier
   clf = KNeighborsClassifier(n_neighbors=3)
   clf.fit(train.X, train.y)

   # Configure explainers
   explainer_configs = [
       ExplainerConfig("comte", COMTE, {"distance": "dtw"}),
       ExplainerConfig("ng_blend", NativeGuide, {"method": "blend"}),
       ExplainerConfig("glacier", Glacier, {"weight_type": "uniform"}),
   ]

   # Configure evaluator
   evaluator = Evaluator([
       Validity(),
       Proximity(distance="dtw"),
       Sparsity(),
   ])

   # Run benchmark
   runner = BenchmarkRunner(
       datasets=[DatasetConfig("ItalyPowerDemand", train.X, train.y, test.X, test.y)],
       models=[ModelConfig("knn", clf)],
       explainers=explainer_configs,
       evaluator=evaluator,
       n_instances=20,
       instance_selection="stratified_confidence",
       verbose=True,
   )
   results = runner.run()

   # View results
   print(results.to_dataframe())
   print(results.aggregate(by="explainer"))
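
One plausible way to implement confidence-stratified selection is to bin test
instances by the classifier's confidence and sample evenly from each bin. The sketch
below is hypothetical: ``stratified_by_confidence`` and its parameters are
illustrative names, and the strategy ``BenchmarkRunner`` actually uses may differ.

.. code-block:: python

   import numpy as np

   def stratified_by_confidence(confidence, n_instances, n_bins=4, seed=0):
       """Pick roughly n_instances indices, spread evenly over confidence quantile bins."""
       rng = np.random.default_rng(seed)
       confidence = np.asarray(confidence)
       # Quantile edges give bins with roughly equal numbers of instances
       edges = np.quantile(confidence, np.linspace(0, 1, n_bins + 1))
       bins = np.clip(
           np.searchsorted(edges, confidence, side="right") - 1, 0, n_bins - 1
       )
       per_bin = n_instances // n_bins
       chosen = []
       for b in range(n_bins):
           idx = np.flatnonzero(bins == b)
           take = min(per_bin, len(idx))
           chosen.extend(rng.choice(idx, size=take, replace=False))
       return np.sort(np.array(chosen))

   confidence = np.linspace(0.5, 1.0, 40)  # toy max-probability values
   chosen = stratified_by_confidence(confidence, n_instances=8)
   print(chosen)

In practice, ``confidence`` would typically come from the maximum of
``clf.predict_proba(test.X)`` along the class axis.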

Multi-Dataset Benchmark
~~~~~~~~~~~~~~~~~~~~~~~

To assess how well counterfactual methods generalize, run benchmarks across multiple
datasets. This enables statistical testing (e.g., Friedman test) to determine if
performance differences are significant across problem domains:

.. code-block:: python

   from sklearn.neighbors import KNeighborsClassifier
   from tscf_eval.benchmark import (
       BenchmarkRunner, DatasetConfig, ModelConfig, ExplainerConfig,
   )
   from tscf_eval.counterfactuals import COMTE, NativeGuide
   from tscf_eval.data_loader import UCRLoader

   # Load datasets and train models
   dataset_names = ["ItalyPowerDemand", "GunPoint", "ECG200"]
   datasets, model_configs = [], []

   for name in dataset_names:
       loader = UCRLoader(name)
       train, test = loader.load("train"), loader.load("test")
       datasets.append(DatasetConfig(name, train.X, train.y, test.X, test.y))

       clf = KNeighborsClassifier(n_neighbors=3)
       clf.fit(train.X, train.y)
       model_configs.append(ModelConfig("knn", clf))

   # Run benchmark
   runner = BenchmarkRunner(
       datasets=datasets,
       models=model_configs,
       explainers=[
           ExplainerConfig("comte", COMTE, {"distance": "dtw"}),
           ExplainerConfig("ng_blend", NativeGuide, {"method": "blend"}),
       ],
       n_instances=10,
       n_jobs=-1,  # Parallel execution
       verbose=True,
   )
   results = runner.run()

   # Aggregate across datasets
   print(results.aggregate(by="explainer"))


Analyzing Results
-----------------

Counterfactual evaluation is inherently multi-objective: high validity may come at
the cost of low proximity, and sparse explanations may sacrifice plausibility.
TSCFEval provides tools for principled multi-criteria analysis.

Pareto Analysis
~~~~~~~~~~~~~~~

Pareto analysis identifies methods that are not dominated by any other method on
the selected metrics. A method is Pareto-optimal if no other method is at least as
good on every metric and strictly better on at least one. This avoids the need to
specify metric weights upfront:

.. code-block:: python

   from tscf_eval.benchmark import ParetoAnalyzer

   analyzer = ParetoAnalyzer(metrics=[
       "validity_soft", "proximity_dtw", "sparsity",
   ])

   # Find non-dominated methods
   pareto_methods = analyzer.pareto_front(results)
   print(f"Pareto-optimal: {pareto_methods}")

   # Full ranking table
   print(analyzer.dominance_ranking(results))

   # Export to LaTeX
   latex = analyzer.to_latex(results, caption="Results", label="tab:results")
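
The dominance check behind a Pareto front is short enough to write out. The sketch
below uses the standard rule that one method dominates another when it is at least
as good on every metric and strictly better on at least one. The ``pareto_front``
function here is a standalone illustration with toy scores, not the
``ParetoAnalyzer`` implementation.

.. code-block:: python

   import numpy as np

   def pareto_front(scores, maximize):
       """Return the row indices of non-dominated methods.

       scores: array of shape (n_methods, n_metrics).
       maximize: boolean array, True where higher is better.
       """
       # Flip minimize-metrics so that larger is always better
       s = np.where(maximize, scores, -scores)
       front = []
       for i in range(len(s)):
           dominated = any(
               np.all(s[j] >= s[i]) and np.any(s[j] > s[i])
               for j in range(len(s)) if j != i
           )
           if not dominated:
               front.append(i)
       return front

   # Columns: validity (maximize), proximity (minimize); toy values
   scores = np.array([
       [0.90, 2.0],  # method A
       [0.80, 1.0],  # method B
       [0.70, 3.0],  # method C: worse than A on both metrics
   ])
   print(pareto_front(scores, maximize=np.array([True, False])))  # [0, 1]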

Visualizing Pareto Fronts
~~~~~~~~~~~~~~~~~~~~~~~~~

Pareto front visualizations help understand the trade-offs between metrics.
The 2D plot shows which methods lie on the Pareto front (non-dominated solutions)
for any pair of metrics. Consistency heatmaps show how often each method appears
on the Pareto front across different datasets:

.. code-block:: python

   import matplotlib.pyplot as plt

   # 2D Pareto front plot
   ax = analyzer.plot_front(
       results,
       x_metric="proximity_dtw",
       y_metric="validity_soft",
       annotate=True,
   )
   plt.savefig("pareto_front.png")

   # Cross-dataset consistency heatmap
   results_by_dataset = {
       ds: results.filter(datasets=[ds])
       for ds in results.datasets
   }
   consistency_df = analyzer.consistency(results_by_dataset)
   analyzer.plot_consistency_heatmap(consistency_df)
   plt.savefig("consistency.png")

Weighted Scalarization
~~~~~~~~~~~~~~~~~~~~~~

When you need a single ranking of methods, weighted scalarization combines metrics
into a composite score. Each metric is min-max normalized to [0, 1] with direction
awareness: metrics to maximize keep their normalized value, while metrics to minimize
are inverted so that higher is always better. The normalized values are then combined
via weighted sum, which lets you customize rankings based on your priorities:

.. code-block:: python

   from tscf_eval.benchmark import WeightedScalarizer

   # Equal weights
   scalarizer = WeightedScalarizer(metrics=[
       "validity_soft", "proximity_dtw", "sparsity",
   ])
   print(scalarizer.score(results))

   # Custom weights
   scalarizer = WeightedScalarizer(
       metrics=["validity_soft", "proximity_dtw", "sparsity"],
       weights={"validity_soft": 3.0, "proximity_dtw": 1.0, "sparsity": 1.0},
   )

   # Sensitivity analysis
   sens_df = scalarizer.sensitivity(results, vary_metric="validity_soft", n_steps=11)
   scalarizer.plot_sensitivity(sens_df)
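
The normalization and weighting steps can be reproduced in a few lines of NumPy.
This is a sketch of the general technique with toy scores; ``weighted_score`` is a
hypothetical helper, and normalizing the weights to sum to one is an assumption
about how ``WeightedScalarizer`` treats them.

.. code-block:: python

   import numpy as np

   def weighted_score(scores, maximize, weights):
       """Min-max normalize each metric, orient it so higher is better, then weight."""
       scores = np.asarray(scores, dtype=float)
       lo, hi = scores.min(axis=0), scores.max(axis=0)
       span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant metrics
       norm = (scores - lo) / span
       norm = np.where(maximize, norm, 1.0 - norm)  # invert metrics that should be minimized
       w = np.asarray(weights, dtype=float)
       return norm @ (w / w.sum())

   # Columns: validity (maximize), proximity (minimize); toy values
   scores = [[0.90, 2.0], [0.80, 1.0], [0.70, 3.0]]
   composite = weighted_score(
       scores, maximize=np.array([True, False]), weights=[3.0, 1.0]
   )
   print(np.round(composite, 3))  # [0.875 0.625 0.   ]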

Statistical Testing
~~~~~~~~~~~~~~~~~~~

When benchmarking across multiple datasets, the Friedman test determines if there
are statistically significant differences between methods. It's a non-parametric
alternative to repeated-measures ANOVA, ranking methods within each dataset and
testing if the average ranks differ significantly:

.. code-block:: python

   from tscf_eval.benchmark import friedman_test

   fr = friedman_test(results, metric="validity_soft")
   print(f"Statistic: {fr.statistic:.3f}, p-value: {fr.p_value:.4f}")
   print(fr.rankings)
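
The same test is available in SciPy, which is useful for checking results by hand.
The sketch below ranks three methods within each of four datasets using toy scores
and then runs ``scipy.stats.friedmanchisquare``; whether TSCFEval's ``friedman_test``
wraps this function is not stated here.

.. code-block:: python

   import numpy as np
   from scipy.stats import friedmanchisquare, rankdata

   # Rows: datasets, columns: methods (toy scores, higher is better)
   scores = np.array([
       [0.90, 0.80, 0.70],
       [0.85, 0.80, 0.60],
       [0.95, 0.70, 0.65],
       [0.90, 0.75, 0.70],
   ])

   # Rank methods within each dataset (rank 1 = best), then average over datasets
   ranks = rankdata(-scores, axis=1)
   mean_ranks = ranks.mean(axis=0)

   # friedmanchisquare expects one sequence of measurements per method
   stat, p_value = friedmanchisquare(*scores.T)
   print(mean_ranks, stat, p_value)

The test needs at least three methods, and its power is limited when only a few
datasets are available.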


Extending TSCFEval
------------------

TSCFEval is designed to be extensible. You can add your own counterfactual methods
and evaluation metrics that integrate seamlessly with the benchmarking framework.

Custom Counterfactual Method
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To add a new counterfactual method, inherit from the ``Counterfactual`` base class
and implement the ``explain`` method. The method receives a single instance ``x``
and returns a tuple ``(cf, cf_label, meta)``:

- ``cf``: The generated counterfactual (same shape as input)
- ``cf_label``: The predicted class label for the counterfactual
- ``meta``: A dictionary with method-specific metadata (e.g., generation parameters)

Here's an example of a simple interpolation-based method:

.. code-block:: python

   import numpy as np
   from tscf_eval.counterfactuals import Counterfactual

   class MyCounterfactual(Counterfactual):
       """Custom counterfactual using nearest unlike neighbor interpolation."""

       def __init__(self, model, data, n_steps=50):
           self.model = model
           self.X_train, self.y_train = data
           self.n_steps = n_steps

       def explain(self, x, y_pred=None):
           x = np.asarray(x).squeeze()
           if y_pred is None:
               y_pred = int(self.model.predict(x.reshape(1, -1))[0])

           # Find nearest unlike neighbor
           preds = self.model.predict(self.X_train)
           unlike_mask = preds != y_pred
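           unlike_samples = self.X_train[unlike_mask]
           if len(unlike_samples) == 0:
               raise ValueError("no training instance is predicted as a different class")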
           distances = np.linalg.norm(
               unlike_samples.reshape(len(unlike_samples), -1) - x.flatten(),
               axis=1
           )
           target = unlike_samples[np.argmin(distances)]

           # Interpolate toward target until prediction flips
           cf = x.copy()
           for i in range(self.n_steps):
               alpha = (i + 1) / self.n_steps
               cf = (1 - alpha) * x + alpha * target.squeeze()
               cf_label = int(self.model.predict(cf.reshape(1, -1))[0])
               if cf_label != y_pred:
                   break

           meta = {"method": "my_cf", "steps": i + 1, "alpha": alpha}
           return cf, cf_label, meta

   # Use in benchmarks
   from tscf_eval.benchmark import ExplainerConfig

   config = ExplainerConfig("my_method", MyCounterfactual, {"n_steps": 50})

Custom Evaluation Metric
~~~~~~~~~~~~~~~~~~~~~~~~

To add a new evaluation metric, inherit from the ``Metric`` base class and implement:

- ``name()``: Returns the metric key used in results dictionaries
- ``compute(X, X_cf, **kwargs)``: Computes and returns the metric value

The ``compute`` method receives the original instances ``X``, counterfactuals ``X_cf``,
and any additional keyword arguments passed to ``evaluate()`` (e.g., ``model``,
``X_train``, ``y``, ``y_cf``). Here's an example metric that reports the fraction of
instances whose largest pointwise change exceeds a threshold:

.. code-block:: python

   import numpy as np
   from tscf_eval.evaluator import Metric

   class MaxChangeMetric(Metric):
       """Fraction of instances where max change exceeds threshold."""

       def __init__(self, threshold=0.1):
           self.threshold = threshold

       def name(self):
           return f"max_change_t{self.threshold}"

       def compute(self, X, X_cf, **kwargs):
           diff = np.abs(np.array(X) - np.array(X_cf))
           max_changes = np.max(diff.reshape(len(X), -1), axis=1)
           return float(np.mean(max_changes > self.threshold))

   # Use in evaluator
   from tscf_eval.evaluator import Evaluator, Validity

   evaluator = Evaluator([
       Validity(),
       MaxChangeMetric(threshold=0.1),
       MaxChangeMetric(threshold=0.5),
   ])


Complete Workflow
-----------------

This end-to-end example demonstrates a typical TSCFEval workflow: loading data,
training a classifier, running a benchmark, and analyzing results with multiple
analysis tools. The results are saved to JSON for later analysis or visualization:

.. code-block:: python

   import json
   from sklearn.neighbors import KNeighborsClassifier
   from tscf_eval import UCRLoader
   from tscf_eval.counterfactuals import COMTE, NativeGuide
   from tscf_eval.evaluator import Evaluator, Validity, Proximity, Sparsity
   from tscf_eval.benchmark import (
       BenchmarkRunner, DatasetConfig, ModelConfig, ExplainerConfig,
       ParetoAnalyzer, WeightedScalarizer,
   )

   # 1. Load data
   loader = UCRLoader("ItalyPowerDemand")
   train, test = loader.load("train"), loader.load("test")

   # 2. Train classifier
   clf = KNeighborsClassifier(n_neighbors=5)
   clf.fit(train.X, train.y)

   # 3. Run benchmark
   runner = BenchmarkRunner(
       datasets=[DatasetConfig("ItalyPowerDemand", train.X, train.y, test.X, test.y)],
       models=[ModelConfig("knn", clf)],
       explainers=[
           ExplainerConfig("comte", COMTE, {"distance": "dtw"}),
           ExplainerConfig("ng_blend", NativeGuide, {"method": "blend"}),
       ],
       evaluator=Evaluator([Validity(), Proximity(distance="dtw"), Sparsity()]),
       n_instances=20,
       instance_selection="stratified_confidence",
       verbose=True,
   )
   results = runner.run()

   # 4. View results
   print(results.to_dataframe())
   print(results.aggregate(by="explainer"))

   # 5. Pareto analysis
   analyzer = ParetoAnalyzer(metrics=["validity_soft", "proximity_dtw", "sparsity"])
   print(f"Pareto-optimal: {analyzer.pareto_front(results)}")
   print(analyzer.dominance_ranking(results))

   # 6. Weighted ranking
   scalarizer = WeightedScalarizer(metrics=["validity_soft", "proximity_dtw", "sparsity"])
   print(scalarizer.score(results))

   # 7. Save results
   with open("results.json", "w") as f:
       json.dump(results.to_dict(), f, indent=2)