verskyt.interventions ¶

class FeatureInfo(layer_name: str, feature_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer)[source]¶

Bases: object

Information about a feature in a TNN layer.

Contains metadata and vector data for a single feature, enabling inspection and modification of learned feature representations.

layer_name¶

Name of the layer containing this feature.

Type: str

feature_index¶

Index of the feature within the layer’s feature bank.

Type: int

vector¶

The feature vector data.

Type: torch.Tensor

layer_ref¶

Reference to the layer object.

Type: Union[TverskyProjectionLayer, TverskySimilarityLayer]

layer_name: str¶

feature_index: int¶

vector: Tensor¶

layer_ref: TverskyProjectionLayer | TverskySimilarityLayer¶

property shape: Size¶

Get the shape of the feature vector.

Returns: Shape of the feature vector, typically [in_features].
Return type: torch.Size

property norm: float¶

Get the L2 norm of the feature vector.

Returns:

L2 norm of the feature vector, useful for comparing feature magnitudes and analyzing learned representations.

Return type:

float

__init__(layer_name: str, feature_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer) → None¶
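The `shape` and `norm` properties can be pictured with a small stand-in. This is only a sketch: `FeatureRecord` is a hypothetical class that uses a plain Python list in place of `torch.Tensor`, and it is not the library's implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FeatureRecord:
    """Hypothetical stand-in for FeatureInfo, with a list instead of a Tensor."""

    layer_name: str
    feature_index: int
    vector: List[float]

    @property
    def shape(self) -> Tuple[int, ...]:
        # A 1-D vector has shape (in_features,).
        return (len(self.vector),)

    @property
    def norm(self) -> float:
        # L2 norm: square root of the sum of squared components.
        return math.sqrt(sum(v * v for v in self.vector))


record = FeatureRecord(layer_name="layer0", feature_index=2, vector=[3.0, 4.0])
print(record.shape, record.norm)
```

Comparing `norm` values across records is one way to spot features whose magnitude differs markedly from the rest of the bank.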

class InterventionManager(model: Module, model_name: str = 'TNN_Model')[source]¶

Bases: object

Manager for interventions on Tversky Neural Networks.

Provides a unified API for inspecting, modifying, and analyzing TNN models to enable interpretability research and counterfactual analysis. Supports tracking of interventions and restoration of original model states.

This class serves as the central hub for TNN interpretability, offering:

- Comprehensive prototype and feature discovery across all layers
- Safe parameter modification with automatic state tracking
- Integration with impact assessment and grounding frameworks
- Batch operations for systematic intervention studies

Note

The manager automatically discovers TNN layers (TverskyProjectionLayer and TverskySimilarityLayer) within the provided model and maintains original parameter states for restoration.

__init__(model: Module, model_name: str = 'TNN_Model')[source]¶

Initialize InterventionManager for a TNN model.

Automatically discovers all TNN layers within the model and captures the original parameter state for later restoration.

Parameters:

model (nn.Module) – PyTorch model containing TverskyProjectionLayer or TverskySimilarityLayer instances.
model_name (str, optional) – Human-readable name for the model. Defaults to “TNN_Model”.

Note

The manager will only operate on TverskyProjectionLayer and TverskySimilarityLayer instances found within the model.

property num_layers: int¶

Get the number of TNN layers in the model.

Returns:

Total count of TverskyProjectionLayer and TverskySimilarityLayer instances found in the model.

Return type:

int

property layer_names: List[str]¶

Get names of all TNN layers in the model.

Returns:

List of layer names that can be used with other manager methods for layer-specific operations.

Return type:

List[str]

get_layer_info(layer_name: str) → Dict[str, Any][source]¶

Get comprehensive information about a TNN layer.

Provides detailed metadata about layer configuration, parameter shapes, and capabilities for inspection and intervention planning.

Parameters:

layer_name (str) – Name of the layer to inspect. Must be one of the names returned by the layer_names property.

Returns:

Dictionary containing layer metadata including:

layer_name: Name of the layer
layer_type: Class name of the layer
in_features: Input feature dimension
num_prototypes: Number of prototypes (if applicable)
num_features: Number of features (if applicable)
learnable_ab: Whether alpha/beta are learnable (if applicable)

Return type:

Dict[str, Any]

Raises:

ValueError – If layer_name is not found in the model.
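Because several keys are marked "if applicable", code that consumes the returned dictionary may want to tolerate their absence. The sketch below assumes such keys are either missing or set to `None`; the dictionary values and the `describe` helper are hypothetical and only illustrate the documented key names.

```python
from typing import Any, Dict

# Hypothetical example of a returned dictionary; the values are made up.
info: Dict[str, Any] = {
    "layer_name": "projection",
    "layer_type": "TverskyProjectionLayer",
    "in_features": 16,
    "num_prototypes": 4,
    "num_features": 8,
    "learnable_ab": True,
}


def describe(layer_info: Dict[str, Any]) -> str:
    """Format layer metadata, skipping keys that do not apply to the layer."""
    parts = [f"{layer_info['layer_name']} ({layer_info['layer_type']})"]
    for key in ("in_features", "num_prototypes", "num_features", "learnable_ab"):
        # Assumption: inapplicable keys are absent or None.
        if layer_info.get(key) is not None:
            parts.append(f"{key}={layer_info[key]}")
    return ", ".join(parts)


print(describe(info))
```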

list_prototypes(layer_name: str | None = None) → List[PrototypeInfo][source]¶

List all prototypes in the model or in a specific layer.

Discovers and returns metadata for all prototype vectors across TNN layers, enabling systematic inspection and analysis.

Parameters:

layer_name (Optional[str], optional) – If specified, only return prototypes from this layer. If None, returns prototypes from all layers. Defaults to None.

Returns:

List of PrototypeInfo objects containing prototype vectors and metadata. Each object provides access to the prototype vector, layer reference, and computed properties.

Return type:

List[PrototypeInfo]

Note

Only layers with a ‘prototypes’ attribute (typically TverskyProjectionLayer) contribute to the returned list.

list_features(layer_name: str | None = None) → List[FeatureInfo][source]¶

List all features in the model or in a specific layer.

Discovers and returns metadata for all feature vectors across TNN layers, enabling systematic inspection and analysis of the learned feature representations.

Parameters:

layer_name (Optional[str], optional) – If specified, only return features from this layer. If None, returns features from all layers. Defaults to None.

Returns:

List of FeatureInfo objects containing feature vectors and metadata. Each object provides access to the feature vector, layer reference, and computed properties.

Return type:

List[FeatureInfo]

Note

Only layers with a ‘feature_bank’ attribute contribute to the returned list. This typically includes both TverskyProjectionLayer and TverskySimilarityLayer instances.

get_prototype(layer_name: str, prototype_index: int) → PrototypeInfo[source]¶

Get specific prototype information.

Retrieves detailed information about a single prototype vector, including its current values and layer context.

Parameters:

layer_name (str) – Name of the layer containing the prototype. Must be one of the names returned by layer_names.
prototype_index (int) – Index of the prototype within the layer. Must be in range [0, num_prototypes).

Returns:

Object containing the prototype vector, metadata, and layer reference for further operations.

Return type:

PrototypeInfo

Raises:

ValueError – If layer_name is not found or layer has no prototypes.
IndexError – If prototype_index is out of bounds.

get_feature(layer_name: str, feature_index: int) → FeatureInfo[source]¶

Get specific feature information.

Retrieves detailed information about a single feature vector, including its current values and layer context.

Parameters:

layer_name (str) – Name of the layer containing the feature. Must be one of the names returned by layer_names.
feature_index (int) – Index of the feature within the layer’s feature bank. Must be in range [0, num_features).

Returns:

Object containing the feature vector, metadata, and layer reference for further operations.

Return type:

FeatureInfo

Raises:

ValueError – If layer_name is not found or layer has no feature bank.
IndexError – If feature_index is out of bounds.

modify_prototype(layer_name: str, prototype_index: int, new_vector: Tensor, track_intervention: bool = True) → PrototypeInfo[source]¶

Modify a prototype vector in a TNN layer.

Safely modifies a prototype vector with automatic validation and optional intervention tracking for impact assessment and restoration.

Parameters:

layer_name (str) – Name of the layer containing the prototype. Must be one of the names returned by layer_names.
prototype_index (int) – Index of the prototype to modify. Must be in range [0, num_prototypes).
new_vector (torch.Tensor) – New prototype vector to set. Must match the shape of the existing prototype.
track_intervention (bool, optional) – Whether to record this intervention in the history for impact assessment. Defaults to True.

Returns:

Updated PrototypeInfo object reflecting the new prototype vector state.

Return type:

PrototypeInfo

Raises:

ValueError – If layer_name is not found, layer has no prototypes, or new_vector shape doesn’t match expected dimensions.
IndexError – If prototype_index is out of bounds.

Note

When track_intervention=True, the original vector is stored for potential restoration via reset_to_original().

modify_feature(layer_name: str, feature_index: int, new_vector: Tensor, track_intervention: bool = True) → FeatureInfo[source]¶

Modify a feature vector in a TNN layer.

Safely modifies a feature vector with automatic validation and optional intervention tracking for impact assessment and restoration.

Parameters:

layer_name (str) – Name of the layer containing the feature. Must be one of the names returned by layer_names.
feature_index (int) – Index of the feature to modify within the layer’s feature bank. Must be in range [0, num_features).
new_vector (torch.Tensor) – New feature vector to set. Must match the shape of the existing feature.
track_intervention (bool, optional) – Whether to record this intervention in the history for impact assessment. Defaults to True.

Returns:

Updated FeatureInfo object reflecting the new feature vector state.

Return type:

FeatureInfo

Raises:

ValueError – If layer_name is not found, layer has no feature bank, or new_vector shape doesn’t match expected dimensions.
IndexError – If feature_index is out of bounds.

Note

When track_intervention=True, the original vector is stored for potential restoration via reset_to_original().
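The validate-then-track behaviour described for `modify_prototype()` and `modify_feature()` can be sketched with a toy container. `TinyBank` and its `modify` method are hypothetical stand-ins that use Python lists, not the library's code; the record keys follow the documented intervention history, and starting the sequential counter at zero is an assumption.

```python
from typing import Any, Dict, List


class TinyBank:
    """Hypothetical stand-in for a layer's vector bank plus an intervention log."""

    def __init__(self, vectors: List[List[float]]) -> None:
        self.vectors = [list(v) for v in vectors]
        self.history: List[Dict[str, Any]] = []

    def modify(self, index: int, new_vector: List[float], track: bool = True) -> List[float]:
        # Bounds check first, mirroring the documented IndexError.
        if not 0 <= index < len(self.vectors):
            raise IndexError(f"index {index} out of range [0, {len(self.vectors)})")
        # Shape check, mirroring the documented ValueError.
        if len(new_vector) != len(self.vectors[index]):
            raise ValueError("new_vector must match the shape of the existing vector")
        if track:
            self.history.append({
                "index": index,
                "original_vector": list(self.vectors[index]),  # copy before overwriting
                "new_vector": list(new_vector),
                "timestamp": len(self.history),  # sequential counter (start value assumed)
            })
        self.vectors[index] = list(new_vector)
        return self.vectors[index]


bank = TinyBank([[1.0, 0.0], [0.0, 1.0]])
bank.modify(0, [0.5, 0.5])
try:
    bank.modify(0, [1.0, 2.0, 3.0])  # wrong length, so it is rejected
except ValueError as err:
    print("rejected:", err)
print(len(bank.history))
```

Validating before writing means a rejected call leaves both the vectors and the history untouched.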

reset_to_original() → None[source]¶

Reset all TNN layers to their original state.

Restores all prototype vectors, feature vectors, and learnable parameters (alpha, beta) to their values at manager initialization. Also clears the intervention history.

Note

This operation cannot be undone. All modifications made through modify_prototype() and modify_feature() will be reverted to the original model state.
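The snapshot-and-restore idea behind `reset_to_original()` can be illustrated as follows. This is a minimal sketch under stated assumptions: the `params` dictionary, its keys, and the module-level `reset_to_original` function are hypothetical and stand in for the manager's internal state.

```python
import copy
from typing import Dict, List

# Hypothetical parameter store: name -> list of vectors.
params: Dict[str, List[List[float]]] = {
    "prototypes": [[1.0, 0.0], [0.0, 1.0]],
    "feature_bank": [[0.2, 0.8]],
}

# Snapshot taken once, at "manager initialization".
original = copy.deepcopy(params)
history: List[str] = []

# Two interventions.
params["prototypes"][0] = [9.0, 9.0]
history.append("prototype_modification")
params["feature_bank"][0] = [0.0, 0.0]
history.append("feature_modification")


def reset_to_original() -> None:
    """Restore every parameter from the snapshot and clear the history."""
    params.clear()
    params.update(copy.deepcopy(original))  # copy again so the snapshot stays pristine
    history.clear()


reset_to_original()
print(params == original, history)
```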

get_intervention_history() → List[Dict[str, Any]][source]¶

Get history of all interventions performed.

Returns a copy of the intervention history containing detailed records of all modifications made through this manager.

Returns:

List of intervention records, each containing:

type: Type of intervention (‘prototype_modification’ or ‘feature_modification’)
layer_name: Name of the affected layer
index: Index of the modified parameter
original_vector: Original parameter vector (cloned)
new_vector: New parameter vector (cloned)
timestamp: Sequential intervention number

Return type:

List[Dict[str, Any]]
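A history with these keys is straightforward to summarize. The records below are hypothetical (vectors are shown as lists rather than tensors, and the layer name is invented); only the key names come from the description above.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical records using the documented keys.
records: List[Dict[str, Any]] = [
    {"type": "prototype_modification", "layer_name": "proj", "index": 0,
     "original_vector": [1.0, 0.0], "new_vector": [0.0, 1.0], "timestamp": 0},
    {"type": "feature_modification", "layer_name": "proj", "index": 3,
     "original_vector": [0.5, 0.5], "new_vector": [0.0, 0.0], "timestamp": 1},
    {"type": "prototype_modification", "layer_name": "proj", "index": 1,
     "original_vector": [0.0, 1.0], "new_vector": [1.0, 1.0], "timestamp": 2},
]

# Count interventions per type and find the most recent one.
counts = Counter(r["type"] for r in records)
latest = max(records, key=lambda r: r["timestamp"])

print(dict(counts))
print(latest["type"], latest["index"])
```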

summary() → str[source]¶

Get a summary of the model and available interventions.

Provides a comprehensive overview of the model structure, TNN layers, and intervention capabilities for inspection.

Returns:

Multi-line summary string containing:

Model name and layer count
Detailed information for each TNN layer
Number of prototypes and features per layer
Parameter values (alpha, beta, theta)
Total intervention count

Return type:

str

Classes¶

PrototypeInfo¶

class PrototypeInfo(layer_name: str, prototype_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer)[source]¶

Information about a prototype in a TNN layer.

Contains metadata and vector data for a single prototype, enabling inspection and modification of learned prototype representations.

layer_name¶

Name of the layer containing this prototype.

Type: str

prototype_index¶

Index of the prototype within the layer.

Type: int

vector¶

The prototype vector data.

Type: torch.Tensor

layer_ref¶

Reference to the layer object.

Type: Union[TverskyProjectionLayer, TverskySimilarityLayer]

layer_name: str¶

prototype_index: int¶

vector: Tensor¶

layer_ref: TverskyProjectionLayer | TverskySimilarityLayer¶

property shape: Size¶

Get the shape of the prototype vector.

Returns: Shape of the prototype vector, typically [in_features].
Return type: torch.Size

property norm: float¶

Get the L2 norm of the prototype vector.

Returns:

L2 norm of the prototype vector, useful for comparing prototype magnitudes and analyzing learned representations.

Return type:

float

__init__(layer_name: str, prototype_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer) → None¶

Module: analysis¶

Analysis tools for TNN interventions.

Provides counterfactual analysis and impact assessment capabilities for understanding how interventions affect model behavior.

class ImpactMetrics(output_distance: float, output_correlation: float, prediction_change_rate: float, confidence_change: float, feature_activation_change: Tensor | None = None, similarity_score_change: Tensor | None = None, effect_size: float = 0.0, significance: float | None = None)[source]¶

Bases: object

Metrics quantifying the impact of an intervention.

Comprehensive metrics for evaluating how parameter modifications affect model behavior, including output changes, prediction shifts, and statistical significance measures.

output_distance¶

L2 distance between original and modified outputs.

Type: float

output_correlation¶

Pearson correlation between original and modified outputs.

Type: float

prediction_change_rate¶

Fraction of samples with changed predictions.

Type: float

confidence_change¶

Average change in prediction confidence scores.

Type: float

feature_activation_change¶

Changes in feature activation patterns, if computed.

Type: Optional[torch.Tensor]

similarity_score_change¶

Changes in similarity scores, if computed.

Type: Optional[torch.Tensor]

effect_size¶

Cohen’s d or similar standardized effect size measure.

Type: float

significance¶

p-value from statistical significance test, if performed.

Type: Optional[float]

output_distance: float¶

output_correlation: float¶

prediction_change_rate: float¶

confidence_change: float¶

feature_activation_change: Tensor | None = None¶

similarity_score_change: Tensor | None = None¶

effect_size: float = 0.0¶

significance: float | None = None¶

__init__(output_distance: float, output_correlation: float, prediction_change_rate: float, confidence_change: float, feature_activation_change: Tensor | None = None, similarity_score_change: Tensor | None = None, effect_size: float = 0.0, significance: float | None = None) → None¶
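To make the first three metrics concrete, the sketch below computes an L2 distance, a Pearson correlation, and a prediction change rate from two small sets of outputs. It uses plain lists instead of tensors, and it assumes the distance and correlation are taken over the flattened outputs; the library may aggregate differently. All helper names are hypothetical.

```python
import math
from typing import List, Sequence


def flatten(rows: Sequence[Sequence[float]]) -> List[float]:
    return [v for row in rows for v in row]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # undefined for constant data; 0.0 is a placeholder choice
    return cov / math.sqrt(var_a * var_b)


def argmax(row: Sequence[float]) -> int:
    return max(range(len(row)), key=row.__getitem__)


def prediction_change_rate(original: Sequence[Sequence[float]],
                           modified: Sequence[Sequence[float]]) -> float:
    # Fraction of samples whose predicted class (argmax) differs.
    changed = sum(argmax(o) != argmax(m) for o, m in zip(original, modified))
    return changed / len(original)


original_out = [[2.0, 1.0], [0.5, 1.5], [1.0, 3.0]]
modified_out = [[1.0, 2.0], [0.5, 1.5], [1.0, 3.0]]

distance = l2_distance(flatten(original_out), flatten(modified_out))
correlation = pearson(flatten(original_out), flatten(modified_out))
change_rate = prediction_change_rate(original_out, modified_out)
print(distance, correlation, change_rate)
```

Only the first sample changes its predicted class here, so the change rate is one third even though every modified output stays close to the original.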

class CounterfactualResult(original_input: Tensor, original_output: Tensor, original_prediction: int, modified_input: Tensor, modified_output: Tensor, modified_prediction: int, intervention_description: str, success: bool, input_perturbation_norm: float, output_change_norm: float, confidence_change: float)[source]¶

Bases: object

Result of a counterfactual analysis.

Contains the complete record of a successful counterfactual generation, including original and modified states, intervention details, and quantitative measures of the change achieved.

original_input¶

Original input sample.

Type: torch.Tensor

original_output¶

Model output for original input.

Type: torch.Tensor

original_prediction¶

Predicted class for original input.

Type: int

modified_input¶

Input after intervention (may be unchanged).

Type: torch.Tensor

modified_output¶

Model output after intervention.

Type: torch.Tensor

modified_prediction¶

Predicted class after intervention.

Type: int

intervention_description¶

Human-readable description of the intervention.

Type: str

success¶

Whether intervention achieved the desired outcome.

Type: bool

input_perturbation_norm¶

L2 norm of input perturbation.

Type: float

output_change_norm¶

L2 norm of output change.

Type: float

confidence_change¶

Change in prediction confidence.

Type: float

original_input: Tensor¶

original_output: Tensor¶

original_prediction: int¶

modified_input: Tensor¶

modified_output: Tensor¶

modified_prediction: int¶

intervention_description: str¶

success: bool¶

input_perturbation_norm: float¶

output_change_norm: float¶

confidence_change: float¶

__init__(original_input: Tensor, original_output: Tensor, original_prediction: int, modified_input: Tensor, modified_output: Tensor, modified_prediction: int, intervention_description: str, success: bool, input_perturbation_norm: float, output_change_norm: float, confidence_change: float) → None¶

class ImpactAssessment(intervention_manager: InterventionManager)[source]¶

Bases: object

Assess the impact of interventions on model behavior.

Provides comprehensive methods to quantify how prototype or feature modifications affect model outputs, enabling systematic evaluation of intervention effectiveness and model interpretability.

This class works in conjunction with InterventionManager to provide safe, temporary modifications with automatic restoration, allowing researchers to explore counterfactual scenarios without permanent model changes.

Note

All interventions are automatically reverted after assessment, ensuring the model state remains unchanged unless explicitly modified through the InterventionManager.

__init__(intervention_manager: InterventionManager)[source]¶

Initialize ImpactAssessment.

Parameters: intervention_manager (InterventionManager) – InterventionManager instance to analyze. Must be initialized with a TNN model.

Note

The impact assessor uses the manager’s model directly and leverages its intervention tracking capabilities.

assess_prototype_impact(layer_name: str, prototype_index: int, new_vector: Tensor, test_inputs: Tensor, test_targets: Tensor | None = None) → ImpactMetrics[source]¶

Assess impact of modifying a prototype on model behavior.

Temporarily modifies a prototype vector and quantifies the resulting changes in model outputs, predictions, and confidence scores across a set of test inputs. The original prototype is automatically restored.

Parameters:

layer_name (str) – Name of the layer containing the prototype. Must be one of the manager’s discovered layer names.
prototype_index (int) – Index of the prototype to modify. Must be in range [0, num_prototypes).
new_vector (torch.Tensor) – New prototype vector to test. Must match the shape of the existing prototype.
test_inputs (torch.Tensor) – Input data to evaluate impact on. Shape should be [batch_size, in_features].
test_targets (Optional[torch.Tensor], optional) – Target labels for computing accuracy-based metrics. Defaults to None.

Returns:

Comprehensive metrics quantifying the intervention’s effects, including output distance, correlation, prediction changes, confidence shifts, and statistical effect size.

Return type:

ImpactMetrics

Note

The prototype is automatically restored to its original value after assessment, regardless of success or failure.
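Restoration "regardless of success or failure" is the behaviour a `try`/`finally` block provides. The sketch below shows that pattern with a hypothetical `temporary_vector` context manager over a list of vectors; it illustrates the idea and is not the library's mechanism.

```python
from contextlib import contextmanager
from typing import Iterator, List

prototypes: List[List[float]] = [[1.0, 0.0], [0.0, 1.0]]


@contextmanager
def temporary_vector(bank: List[List[float]], index: int,
                     new_vector: List[float]) -> Iterator[None]:
    """Swap in new_vector for the duration of the block, then restore."""
    saved = list(bank[index])
    bank[index] = list(new_vector)
    try:
        yield
    finally:
        bank[index] = saved  # runs on normal exit and when an exception is raised


with temporary_vector(prototypes, 0, [5.0, 5.0]):
    inside = list(prototypes[0])  # the modified value is visible here

try:
    with temporary_vector(prototypes, 1, [7.0, 7.0]):
        raise RuntimeError("evaluation failed")
except RuntimeError:
    pass

print(inside, prototypes)
```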

assess_feature_impact(layer_name: str, feature_index: int, new_vector: Tensor, test_inputs: Tensor, test_targets: Tensor | None = None) → ImpactMetrics[source]¶

Assess impact of modifying a feature on model behavior.

Temporarily modifies a feature vector and quantifies the resulting changes in model outputs, predictions, and confidence scores across a set of test inputs. The original feature is automatically restored.

Parameters:

layer_name (str) – Name of the layer containing the feature. Must be one of the manager’s discovered layer names.
feature_index (int) – Index of the feature to modify within the layer’s feature bank. Must be in range [0, num_features).
new_vector (torch.Tensor) – New feature vector to test. Must match the shape of the existing feature.
test_inputs (torch.Tensor) – Input data to evaluate impact on. Shape should be [batch_size, in_features].
test_targets (Optional[torch.Tensor], optional) – Target labels for computing accuracy-based metrics. Defaults to None.

Returns:

Comprehensive metrics quantifying the intervention’s effects, including output distance, correlation, prediction changes, confidence shifts, and statistical effect size.

Return type:

ImpactMetrics

Note

The feature is automatically restored to its original value after assessment, regardless of success or failure.

sensitivity_analysis(layer_name: str, parameter_type: str, parameter_index: int, test_inputs: Tensor, perturbation_scales: List[float] = None) → Dict[float, ImpactMetrics][source]¶

Perform sensitivity analysis by applying different scales of perturbation.

Parameters:

layer_name (str) – Name of the layer to analyze.
parameter_type (str) – Either ‘prototype’ or ‘feature’.
parameter_index (int) – Index of the parameter to perturb.
test_inputs (torch.Tensor) – Input data for evaluation.
perturbation_scales (List[float], optional) – Perturbation scales to test. Defaults to None.

Returns:

Dictionary mapping perturbation scales to impact metrics
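A scale sweep of this kind can be sketched with a toy model. Several things here are assumptions: the perturbation is taken to be additive along a fixed direction, the model is a plain dot product rather than a Tversky layer, and each scale maps to a single output distance instead of a full ImpactMetrics object. The names `model_outputs` and `sweep` are hypothetical.

```python
import math
from typing import Dict, List


def model_outputs(prototype: List[float], inputs: List[List[float]]) -> List[float]:
    """Toy model: one dot-product score per input row."""
    return [sum(x * p for x, p in zip(row, prototype)) for row in inputs]


def sweep(prototype: List[float], direction: List[float],
          inputs: List[List[float]], scales: List[float]) -> Dict[float, float]:
    baseline = model_outputs(prototype, inputs)
    results: Dict[float, float] = {}
    for scale in scales:
        # Assumed perturbation: original vector plus scale times a direction.
        perturbed = [p + scale * d for p, d in zip(prototype, direction)]
        outputs = model_outputs(perturbed, inputs)
        # L2 distance between perturbed and baseline outputs.
        results[scale] = math.sqrt(sum((o - b) ** 2 for o, b in zip(outputs, baseline)))
    return results


inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = sweep([0.5, -0.5], [1.0, 0.0], inputs, [0.0, 0.1, 0.5])
for scale, distance in result.items():
    print(scale, distance)
```

For this linear toy model the distance grows in proportion to the scale; a real layer would generally respond nonlinearly, which is what a sensitivity analysis is meant to reveal.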

class CounterfactualAnalyzer(intervention_manager: InterventionManager)[source]¶

Bases: object

Perform counterfactual analysis on TNN models.

Generates counterfactual examples by finding minimal parameter interventions that change model predictions for specific inputs. Uses gradient-based optimization to discover how prototype or feature modifications can achieve desired prediction outcomes.

This class enables researchers to understand model decision boundaries and generate explanations for model behavior through systematic parameter space exploration.

Note

All interventions are temporary and automatically restored, allowing safe exploration of counterfactual scenarios.

__init__(intervention_manager: InterventionManager)[source]¶

Initialize CounterfactualAnalyzer.

Parameters:

intervention_manager (InterventionManager) – InterventionManager instance to use for parameter modifications and model access.

find_prototype_counterfactuals(input_sample: Tensor, target_class: int, layer_name: str, max_iterations: int = 100, learning_rate: float = 0.01) → List[CounterfactualResult][source]¶

Find counterfactual examples by modifying prototypes.

Parameters:

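input_sample (torch.Tensor) – Input sample to generate counterfactuals for.
target_class (int) – Desired output class.
layer_name (str) – Name of the layer whose prototypes are modified.
max_iterations (int, optional) – Maximum number of optimization iterations. Defaults to 100.
learning_rate (float, optional) – Learning rate for the optimization. Defaults to 0.01.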

Returns:

List of successful counterfactual results
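The general idea of a gradient-based prototype counterfactual can be shown on a toy nearest-prototype classifier. This is a sketch under strong assumptions: the score is negative squared Euclidean distance rather than Tversky similarity, only the target class's prototype is updated, and `predict` and `find_counterfactual` are hypothetical helpers, not the library's algorithm.

```python
from typing import List, Optional, Tuple


def predict(x: List[float], prototypes: List[List[float]]) -> int:
    """Toy classifier: class whose prototype has the highest (least negative) score."""
    scores = [-sum((a - b) ** 2 for a, b in zip(x, p)) for p in prototypes]
    return max(range(len(scores)), key=scores.__getitem__)


def find_counterfactual(x: List[float], prototypes: List[List[float]], target_class: int,
                        max_iterations: int = 100,
                        learning_rate: float = 0.01) -> Optional[Tuple[List[float], int]]:
    candidate = [list(p) for p in prototypes]  # work on a copy; the input is untouched
    for step in range(1, max_iterations + 1):
        p = candidate[target_class]
        # Gradient of ||x - p||^2 with respect to p is 2 * (p - x).
        grad = [2.0 * (pi - xi) for pi, xi in zip(p, x)]
        candidate[target_class] = [pi - learning_rate * g for pi, g in zip(p, grad)]
        if predict(x, candidate) == target_class:
            return candidate[target_class], step
    return None  # no counterfactual found within the iteration budget


x = [1.0, 0.0]
prototypes = [[1.0, 0.2], [-1.0, 0.0]]
before = predict(x, prototypes)
found = find_counterfactual(x, prototypes, target_class=1, learning_rate=0.1)
print(before, found)
```

With the smaller default step of 0.01 this toy search runs out of iterations before the prediction flips, which shows how `max_iterations` and `learning_rate` trade off against each other.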

find_feature_counterfactuals(input_sample: Tensor, target_class: int, layer_name: str, max_iterations: int = 100, learning_rate: float = 0.01) → List[CounterfactualResult][source]¶

Find counterfactual examples by modifying features.

Parameters:

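input_sample (torch.Tensor) – Input sample to generate counterfactuals for.
target_class (int) – Desired output class.
layer_name (str) – Name of the layer whose features are modified.
max_iterations (int, optional) – Maximum number of optimization iterations. Defaults to 100.
learning_rate (float, optional) – Learning rate for the optimization. Defaults to 0.01.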

Returns:

List of successful counterfactual results

analyze_decision_boundary(input_samples: Tensor, layer_name: str, num_perturbations: int = 10) → Dict[str, Any][source]¶

Analyze how the decision boundary changes with interventions.

Parameters:

input_samples (torch.Tensor) – Input samples near the decision boundary.
layer_name (str) – Name of the layer to analyze.
num_perturbations (int, optional) – Number of random perturbations to test. Defaults to 10.

Returns:

Dictionary with boundary analysis results
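One way to picture a random-perturbation boundary probe is to jitter the prototypes and count how often predictions flip. The sketch assumes Gaussian noise on every prototype component and a toy nearest-prototype classifier; `flip_rate`, its `noise` and `seed` parameters, and the single returned number are hypothetical simplifications of whatever the real method reports.

```python
import random
from typing import List


def predict(x: List[float], prototypes: List[List[float]]) -> int:
    """Toy classifier: index of the nearest prototype (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in prototypes]
    return min(range(len(distances)), key=distances.__getitem__)


def flip_rate(samples: List[List[float]], prototypes: List[List[float]],
              num_perturbations: int = 10, noise: float = 0.5, seed: int = 0) -> float:
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    baseline = [predict(x, prototypes) for x in samples]
    flips = 0
    for _ in range(num_perturbations):
        # Perturb a copy of every prototype; the originals are left untouched.
        perturbed = [[v + rng.gauss(0.0, noise) for v in p] for p in prototypes]
        flips += sum(predict(x, perturbed) != b for x, b in zip(samples, baseline))
    return flips / (num_perturbations * len(samples))


samples = [[0.1, 0.0], [-0.1, 0.0]]  # points close to the boundary at x = 0
prototypes = [[1.0, 0.0], [-1.0, 0.0]]
rate = flip_rate(samples, prototypes)
print(rate)
```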
