verskyt.interventions
Intervention tools for analyzing and modifying TNN models.
Module: manager
Intervention Manager for Tversky Neural Networks.
Provides high-level APIs for inspecting and modifying TNN models, enabling interpretability and counterfactual analysis.
- class PrototypeInfo(layer_name: str, prototype_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer)
Bases: object
Information about a prototype in a TNN layer.
Contains metadata and vector data for a single prototype, enabling inspection and modification of learned prototype representations.
- vector
The prototype vector data.
- Type:
torch.Tensor
- layer_ref
Reference to the layer object.
- Type:
TverskyProjectionLayer | TverskySimilarityLayer
- property shape: Size
Get the shape of the prototype vector.
- Returns:
Shape of the prototype vector, typically [in_features].
- Return type:
torch.Size
- property norm: float
Get the L2 norm of the prototype vector.
- Returns:
L2 norm of the prototype vector, useful for comparing prototype magnitudes and analyzing learned representations.
- Return type:
float
- __init__(layer_name: str, prototype_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer) -> None
- class FeatureInfo(layer_name: str, feature_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer)
Bases: object
Information about a feature in a TNN layer.
Contains metadata and vector data for a single feature, enabling inspection and modification of learned feature representations.
- vector
The feature vector data.
- Type:
torch.Tensor
- layer_ref
Reference to the layer object.
- Type:
TverskyProjectionLayer | TverskySimilarityLayer
- property shape: Size
Get the shape of the feature vector.
- Returns:
Shape of the feature vector, typically [in_features].
- Return type:
torch.Size
- property norm: float
Get the L2 norm of the feature vector.
- Returns:
L2 norm of the feature vector, useful for comparing feature magnitudes and analyzing learned representations.
- Return type:
float
- __init__(layer_name: str, feature_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer) -> None
- class InterventionManager(model: Module, model_name: str = 'TNN_Model')
Bases: object
Manager for interventions on Tversky Neural Networks.
Provides a unified API for inspecting, modifying, and analyzing TNN models to enable interpretability research and counterfactual analysis. Supports tracking of interventions and restoration of original model states.
This class serves as the central hub for TNN interpretability, offering:
- Comprehensive prototype and feature discovery across all layers
- Safe parameter modification with automatic state tracking
- Integration with impact assessment and grounding frameworks
- Batch operations for systematic intervention studies
Note
The manager automatically discovers TNN layers (TverskyProjectionLayer and TverskySimilarityLayer) within the provided model and maintains original parameter states for restoration.
- __init__(model: Module, model_name: str = 'TNN_Model')
Initialize InterventionManager for a TNN model.
Automatically discovers all TNN layers within the model and captures the original parameter state for later restoration.
- Parameters:
model (nn.Module) – PyTorch model containing TverskyProjectionLayer or TverskySimilarityLayer instances.
model_name (str, optional) – Human-readable name for the model. Defaults to “TNN_Model”.
Note
The manager will only operate on TverskyProjectionLayer and TverskySimilarityLayer instances found within the model.
- property num_layers: int
Get the number of TNN layers in the model.
- Returns:
Total count of TverskyProjectionLayer and TverskySimilarityLayer instances found in the model.
- Return type:
int
- property layer_names: List[str]
Get names of all TNN layers in the model.
- Returns:
List of layer names that can be used with other manager methods for layer-specific operations.
- Return type:
List[str]
- get_layer_info(layer_name: str) -> Dict[str, Any]
Get comprehensive information about a TNN layer.
Provides detailed metadata about layer configuration, parameter shapes, and capabilities for inspection and intervention planning.
- Parameters:
layer_name (str) – Name of the layer to inspect. Must be one of the names returned by the layer_names property.
- Returns:
Dictionary containing layer metadata, including:
- layer_name: Name of the layer
- layer_type: Class name of the layer
- in_features: Input feature dimension
- num_prototypes: Number of prototypes (if applicable)
- num_features: Number of features (if applicable)
- learnable_ab: Whether alpha/beta are learnable (if applicable)
- Return type:
Dict[str, Any]
- Raises:
ValueError – If layer_name is not found in the model.
- list_prototypes(layer_name: str | None = None) -> List[PrototypeInfo]
List all prototypes in the model or in a specific layer.
Discovers and returns metadata for all prototype vectors across TNN layers, enabling systematic inspection and analysis.
- Parameters:
layer_name (Optional[str], optional) – If specified, only return prototypes from this layer. If None, returns prototypes from all layers. Defaults to None.
- Returns:
List of PrototypeInfo objects containing prototype vectors and metadata. Each object provides access to the prototype vector, layer reference, and computed properties.
- Return type:
List[PrototypeInfo]
Note
Only layers with a ‘prototypes’ attribute (typically TverskyProjectionLayer) will contribute to the returned list.
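Example (illustrative sketch): the listing can be combined with the norm property to rank prototypes by magnitude. The snippet below does not import verskyt; StubPrototype and StubManager are hypothetical stand-ins that expose only list_prototypes(), layer_name, prototype_index, and norm, and the helper rank_by_norm is likewise an assumption rather than part of the library. Any object with the same interface should work in place of the stub.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubPrototype:
    """Hypothetical stand-in exposing fields documented for PrototypeInfo."""
    layer_name: str
    prototype_index: int
    vector: List[float]

    @property
    def norm(self) -> float:
        # L2 norm: square root of the sum of squared components.
        return math.sqrt(sum(v * v for v in self.vector))


class StubManager:
    """Hypothetical stand-in for InterventionManager.list_prototypes()."""

    def __init__(self, prototypes):
        self._prototypes = list(prototypes)

    def list_prototypes(self, layer_name: Optional[str] = None):
        # None means "all layers", mirroring the documented default.
        if layer_name is None:
            return list(self._prototypes)
        return [p for p in self._prototypes if p.layer_name == layer_name]


def rank_by_norm(manager, layer_name=None):
    """Return (layer_name, prototype_index, norm) tuples, largest norm first."""
    infos = manager.list_prototypes(layer_name)
    ranked = sorted(infos, key=lambda p: p.norm, reverse=True)
    return [(p.layer_name, p.prototype_index, p.norm) for p in ranked]


manager = StubManager([
    StubPrototype("proj", 0, [3.0, 4.0]),
    StubPrototype("proj", 1, [1.0, 0.0]),
    StubPrototype("head", 0, [6.0, 8.0]),
])
ranking = rank_by_norm(manager)
```

Passing a layer name, as in rank_by_norm(manager, "proj"), restricts the ranking to that layer.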
- list_features(layer_name: str | None = None) -> List[FeatureInfo]
List all features in the model or in a specific layer.
Discovers and returns metadata for all feature vectors across TNN layers, enabling systematic inspection and analysis of the learned feature representations.
- Parameters:
layer_name (Optional[str], optional) – If specified, only return features from this layer. If None, returns features from all layers. Defaults to None.
- Returns:
List of FeatureInfo objects containing feature vectors and metadata. Each object provides access to the feature vector, layer reference, and computed properties.
- Return type:
List[FeatureInfo]
Note
Only layers with a ‘feature_bank’ attribute will contribute to the returned list. This typically includes both TverskyProjectionLayer and TverskySimilarityLayer instances.
- get_prototype(layer_name: str, prototype_index: int) -> PrototypeInfo
Get specific prototype information.
Retrieves detailed information about a single prototype vector, including its current values and layer context.
- Parameters:
layer_name (str) – Name of the layer containing the prototype.
prototype_index (int) – Index of the prototype to retrieve.
- Returns:
Object containing the prototype vector, metadata, and layer reference for further operations.
- Return type:
PrototypeInfo
- Raises:
ValueError – If layer_name is not found or layer has no prototypes.
IndexError – If prototype_index is out of bounds.
- get_feature(layer_name: str, feature_index: int) -> FeatureInfo
Get specific feature information.
Retrieves detailed information about a single feature vector, including its current values and layer context.
- Parameters:
layer_name (str) – Name of the layer containing the feature.
feature_index (int) – Index of the feature to retrieve.
- Returns:
Object containing the feature vector, metadata, and layer reference for further operations.
- Return type:
FeatureInfo
- Raises:
ValueError – If layer_name is not found or layer has no feature bank.
IndexError – If feature_index is out of bounds.
- modify_prototype(layer_name: str, prototype_index: int, new_vector: Tensor, track_intervention: bool = True) -> PrototypeInfo
Modify a prototype vector in a TNN layer.
Safely modifies a prototype vector with automatic validation and optional intervention tracking for impact assessment and restoration.
- Parameters:
layer_name (str) – Name of the layer containing the prototype. Must be one of the names returned by layer_names.
prototype_index (int) – Index of the prototype to modify. Must be in range [0, num_prototypes).
new_vector (torch.Tensor) – New prototype vector to set. Must match the shape of the existing prototype.
track_intervention (bool, optional) – Whether to record this intervention in the history for impact assessment. Defaults to True.
- Returns:
Updated PrototypeInfo object reflecting the new prototype vector state.
- Return type:
PrototypeInfo
- Raises:
ValueError – If layer_name is not found, layer has no prototypes, or new_vector shape doesn’t match expected dimensions.
IndexError – If prototype_index is out of bounds.
Note
When track_intervention=True, the original vector is stored for potential restoration via reset_to_original().
- modify_feature(layer_name: str, feature_index: int, new_vector: Tensor, track_intervention: bool = True) -> FeatureInfo
Modify a feature vector in a TNN layer.
Safely modifies a feature vector with automatic validation and optional intervention tracking for impact assessment and restoration.
- Parameters:
layer_name (str) – Name of the layer containing the feature. Must be one of the names returned by layer_names.
feature_index (int) – Index of the feature to modify within the layer’s feature bank. Must be in range [0, num_features).
new_vector (torch.Tensor) – New feature vector to set. Must match the shape of the existing feature.
track_intervention (bool, optional) – Whether to record this intervention in the history for impact assessment. Defaults to True.
- Returns:
Updated FeatureInfo object reflecting the new feature vector state.
- Return type:
FeatureInfo
- Raises:
ValueError – If layer_name is not found, layer has no feature bank, or new_vector shape doesn’t match expected dimensions.
IndexError – If feature_index is out of bounds.
Note
When track_intervention=True, the original vector is stored for potential restoration via reset_to_original().
- reset_to_original() -> None
Reset all TNN layers to their original state.
Restores all prototype vectors, feature vectors, and learnable parameters (alpha, beta) to their values at manager initialization. Also clears the intervention history.
Note
This operation cannot be undone. All modifications made through modify_prototype() and modify_feature() will be reverted to the original model state.
- get_intervention_history() -> List[Dict[str, Any]]
Get history of all interventions performed.
Returns a copy of the intervention history containing detailed records of all modifications made through this manager.
- Returns:
List of intervention records, each containing:
- type: Type of intervention (‘prototype_modification’ or ‘feature_modification’)
- layer_name: Name of the affected layer
- index: Index of the modified parameter
- original_vector: Original parameter vector (cloned)
- new_vector: New parameter vector (cloned)
- timestamp: Sequential intervention number
- Return type:
List[Dict[str, Any]]
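Example (illustrative sketch): the modify, history, and reset workflow described above follows a snapshot-and-restore pattern. The snippet below shows that pattern in plain Python and is not the library's implementation; TrackedStore and its method names are hypothetical, vectors are plain lists rather than tensors, and numbering the timestamp from 0 is an assumption.

```python
import copy


class TrackedStore:
    """Hypothetical stand-in: holds vectors, records edits, restores them."""

    def __init__(self, vectors):
        self.vectors = vectors                     # layer name -> list of vectors
        self._original = copy.deepcopy(vectors)    # snapshot taken at construction
        self._history = []

    def modify(self, layer_name, index, new_vector, track=True):
        if layer_name not in self.vectors:
            raise ValueError(f"unknown layer: {layer_name}")
        bank = self.vectors[layer_name]
        if not 0 <= index < len(bank):
            raise IndexError(f"index {index} out of bounds")
        if len(new_vector) != len(bank[index]):
            raise ValueError("new_vector does not match the existing shape")
        if track:
            # Store copies so later edits cannot alter the record.
            self._history.append({
                "type": "prototype_modification",
                "layer_name": layer_name,
                "index": index,
                "original_vector": list(bank[index]),
                "new_vector": list(new_vector),
                "timestamp": len(self._history),   # assumed to start at 0
            })
        bank[index] = list(new_vector)

    def history(self):
        # Return a copy so callers cannot mutate the internal log.
        return copy.deepcopy(self._history)

    def reset(self):
        # Restore the snapshot and clear the log, as documented for reset_to_original().
        self.vectors.clear()
        self.vectors.update(copy.deepcopy(self._original))
        self._history.clear()


store = TrackedStore({"proj": [[1.0, 2.0], [3.0, 4.0]]})
store.modify("proj", 0, [0.0, 0.0])
records = store.history()
store.reset()
```

Because history() returns a copy, records still holds the logged intervention after reset() has cleared the store's own log.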
- summary() -> str
Get a summary of the model and available interventions.
Provides a comprehensive overview of the model structure, TNN layers, and intervention capabilities for inspection.
- Returns:
Multi-line summary string containing:
- Model name and layer count
- Detailed information for each TNN layer
- Number of prototypes and features per layer
- Parameter values (alpha, beta, theta)
- Total intervention count
- Return type:
str
Module: analysis
Analysis tools for TNN interventions.
Provides counterfactual analysis and impact assessment capabilities for understanding how interventions affect model behavior.
- class ImpactMetrics(output_distance: float, output_correlation: float, prediction_change_rate: float, confidence_change: float, feature_activation_change: Tensor | None = None, similarity_score_change: Tensor | None = None, effect_size: float = 0.0, significance: float | None = None)
Bases: object
Metrics quantifying the impact of an intervention.
Comprehensive metrics for evaluating how parameter modifications affect model behavior, including output changes, prediction shifts, and statistical significance measures.
- feature_activation_change
Changes in feature activation patterns, if computed.
- Type:
Optional[torch.Tensor]
- similarity_score_change
Changes in similarity scores, if computed.
- Type:
Optional[torch.Tensor]
- class CounterfactualResult(original_input: Tensor, original_output: Tensor, original_prediction: int, modified_input: Tensor, modified_output: Tensor, modified_prediction: int, intervention_description: str, success: bool, input_perturbation_norm: float, output_change_norm: float, confidence_change: float)
Bases: object
Result of a counterfactual analysis.
Contains the complete record of a successful counterfactual generation, including original and modified states, intervention details, and quantitative measures of the change achieved.
- original_input
Original input sample.
- Type:
torch.Tensor
- original_output
Model output for original input.
- Type:
torch.Tensor
- modified_input
Input after intervention (may be unchanged).
- Type:
torch.Tensor
- modified_output
Model output after intervention.
- Type:
torch.Tensor
- class ImpactAssessment(intervention_manager: InterventionManager)
Bases: object
Assess the impact of interventions on model behavior.
Provides comprehensive methods to quantify how prototype or feature modifications affect model outputs, enabling systematic evaluation of intervention effectiveness and model interpretability.
This class works in conjunction with InterventionManager to provide safe, temporary modifications with automatic restoration, allowing researchers to explore counterfactual scenarios without permanent model changes.
Note
All interventions are automatically reverted after assessment, ensuring the model state remains unchanged unless explicitly modified through the InterventionManager.
- __init__(intervention_manager: InterventionManager)
Initialize ImpactAssessment.
- Parameters:
intervention_manager (InterventionManager) – InterventionManager instance to analyze. Must be initialized with a TNN model.
Note
The impact assessor uses the manager’s model directly and leverages its intervention tracking capabilities.
- assess_prototype_impact(layer_name: str, prototype_index: int, new_vector: Tensor, test_inputs: Tensor, test_targets: Tensor | None = None) -> ImpactMetrics
Assess impact of modifying a prototype on model behavior.
Temporarily modifies a prototype vector and quantifies the resulting changes in model outputs, predictions, and confidence scores across a set of test inputs. The original prototype is automatically restored.
- Parameters:
layer_name (str) – Name of the layer containing the prototype. Must be one of the manager’s discovered layer names.
prototype_index (int) – Index of the prototype to modify. Must be in range [0, num_prototypes).
new_vector (torch.Tensor) – New prototype vector to test. Must match the shape of the existing prototype.
test_inputs (torch.Tensor) – Input data to evaluate impact on. Shape should be [batch_size, in_features].
test_targets (Optional[torch.Tensor], optional) – Target labels for computing accuracy-based metrics. Defaults to None.
- Returns:
Comprehensive metrics quantifying the intervention’s effects, including output distance, correlation, prediction changes, confidence shifts, and statistical effect size.
- Return type:
ImpactMetrics
Note
The prototype is automatically restored to its original value after assessment, regardless of success or failure.
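Example (illustrative sketch): the "modify temporarily, measure, always restore" behavior described above can be expressed with try/finally. The snippet below is a plain-Python illustration, not the library's implementation. The dot-product scorer stands in for a real TNN forward pass, the function names are hypothetical, and the two metric definitions (mean Euclidean distance between outputs, fraction of inputs whose argmax changes) are assumptions about what output_distance and prediction_change_rate might measure.

```python
import math


def predict(prototypes, x):
    """Toy scorer (an assumption, not a Tversky layer): one dot product per prototype."""
    return [sum(p_i * x_i for p_i, x_i in zip(p, x)) for p in prototypes]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def assess_impact(prototypes, index, new_vector, inputs):
    """Swap in new_vector, compare outputs, and restore the original vector."""
    before = [predict(prototypes, x) for x in inputs]
    saved = list(prototypes[index])
    try:
        prototypes[index] = list(new_vector)
        after = [predict(prototypes, x) for x in inputs]
    finally:
        # Runs on success and on error, so the prototype is always restored.
        prototypes[index] = saved
    distance = sum(math.dist(b, a) for b, a in zip(before, after)) / len(inputs)
    changed = sum(argmax(b) != argmax(a) for b, a in zip(before, after))
    return {
        "output_distance": distance,
        "prediction_change_rate": changed / len(inputs),
    }


prototypes = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[2.0, 1.0], [1.0, 2.0]]
metrics = assess_impact(prototypes, 0, [0.0, 3.0], inputs)
```

In this toy setup one of the two inputs changes its predicted class, and prototypes holds its original values again once the call returns.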
- assess_feature_impact(layer_name: str, feature_index: int, new_vector: Tensor, test_inputs: Tensor, test_targets: Tensor | None = None) -> ImpactMetrics
Assess impact of modifying a feature on model behavior.
Temporarily modifies a feature vector and quantifies the resulting changes in model outputs, predictions, and confidence scores across a set of test inputs. The original feature is automatically restored.
- Parameters:
layer_name (str) – Name of the layer containing the feature. Must be one of the manager’s discovered layer names.
feature_index (int) – Index of the feature to modify within the layer’s feature bank. Must be in range [0, num_features).
new_vector (torch.Tensor) – New feature vector to test. Must match the shape of the existing feature.
test_inputs (torch.Tensor) – Input data to evaluate impact on. Shape should be [batch_size, in_features].
test_targets (Optional[torch.Tensor], optional) – Target labels for computing accuracy-based metrics. Defaults to None.
- Returns:
Comprehensive metrics quantifying the intervention’s effects, including output distance, correlation, prediction changes, confidence shifts, and statistical effect size.
- Return type:
ImpactMetrics
Note
The feature is automatically restored to its original value after assessment, regardless of success or failure.
- sensitivity_analysis(layer_name: str, parameter_type: str, parameter_index: int, test_inputs: Tensor, perturbation_scales: List[float] = None) -> Dict[float, ImpactMetrics]
Perform sensitivity analysis by applying perturbations at different scales.
- Parameters:
layer_name (str) – Name of the layer to analyze.
parameter_type (str) – Either ‘prototype’ or ‘feature’.
parameter_index (int) – Index of the parameter to perturb.
test_inputs (torch.Tensor) – Input data for evaluation.
perturbation_scales (List[float], optional) – List of perturbation scales to test.
- Returns:
Dictionary mapping perturbation scales to impact metrics.
- Return type:
Dict[float, ImpactMetrics]
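Example (illustrative sketch): a sensitivity sweep perturbs one vector at several scales and records one metric per scale. The snippet below is a plain-Python illustration only. How the library constructs its perturbations is not documented here, so the fixed direction argument, the dot-product scorer, and the use of a single float per scale (instead of an ImpactMetrics object) are all assumptions.

```python
import math


def score(prototypes, x):
    """Toy scorer (an assumption): one dot product per prototype."""
    return [sum(p_i * x_i for p_i, x_i in zip(p, x)) for p in prototypes]


def sensitivity(prototypes, index, inputs, direction, scales):
    """Map each scale to the mean output shift caused by original + scale * direction."""
    baseline = [score(prototypes, x) for x in inputs]
    original = list(prototypes[index])
    results = {}
    try:
        for scale in scales:
            prototypes[index] = [o + scale * d for o, d in zip(original, direction)]
            shifted = [score(prototypes, x) for x in inputs]
            results[scale] = sum(
                math.dist(b, s) for b, s in zip(baseline, shifted)
            ) / len(inputs)
    finally:
        # Restore the unperturbed vector whatever happens in the loop.
        prototypes[index] = original
    return results


prototypes = [[1.0, 0.0], [0.0, 1.0]]
curve = sensitivity(
    prototypes, 0, inputs=[[1.0, 1.0]], direction=[1.0, 0.0], scales=[0.0, 0.5, 1.0]
)
```

With a linear scorer the shift grows in proportion to the scale; a real TNN layer would generally produce a nonlinear curve.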
- class CounterfactualAnalyzer(intervention_manager: InterventionManager)
Bases: object
Perform counterfactual analysis on TNN models.
Generates counterfactual examples by finding minimal parameter interventions that change model predictions for specific inputs. Uses gradient-based optimization to discover how prototype or feature modifications can achieve desired prediction outcomes.
This class enables researchers to understand model decision boundaries and generate explanations for model behavior through systematic parameter space exploration.
Note
All interventions are temporary and automatically restored, allowing safe exploration of counterfactual scenarios.
- __init__(intervention_manager: InterventionManager)
Initialize CounterfactualAnalyzer.
- Parameters:
intervention_manager (InterventionManager) – InterventionManager instance to use for parameter modifications and model access.
- find_prototype_counterfactuals(input_sample: Tensor, target_class: int, layer_name: str, max_iterations: int = 100, learning_rate: float = 0.01) -> List[CounterfactualResult]
Find counterfactual examples by modifying prototypes.
- Parameters:
input_sample (torch.Tensor) – Input sample to generate counterfactuals for.
target_class (int) – Desired output class.
layer_name (str) – Layer in which to modify prototypes.
max_iterations (int, optional) – Maximum number of optimization iterations. Defaults to 100.
learning_rate (float, optional) – Learning rate for optimization. Defaults to 0.01.
- Returns:
List of successful counterfactual results.
- Return type:
List[CounterfactualResult]
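Example (illustrative sketch): gradient-based counterfactual search repeatedly nudges a parameter until the predicted class for a fixed input becomes the target class. The snippet below shows that idea in plain Python and is not the library's optimizer. The similarity function (negative squared Euclidean distance), the choice to update only the target class prototype, and all function names are assumptions, and the search makes no attempt to find a minimal intervention.

```python
import math


def scores(prototypes, x):
    """Toy similarity (an assumption): negative squared Euclidean distance."""
    return [-sum((xi - pi) ** 2 for xi, pi in zip(x, p)) for p in prototypes]


def predict(prototypes, x):
    s = scores(prototypes, x)
    return max(range(len(s)), key=s.__getitem__)


def find_counterfactual(prototypes, x, target_class,
                        max_iterations=100, learning_rate=0.01):
    # Work on a copy so the caller's prototypes are never modified.
    working = [list(p) for p in prototypes]
    original_prediction = predict(working, x)
    for iteration in range(1, max_iterations + 1):
        target = working[target_class]
        # The gradient of -||x - p||^2 with respect to p is 2 * (x - p);
        # ascending it raises the target class score.
        working[target_class] = [
            pi + learning_rate * 2.0 * (xi - pi) for xi, pi in zip(x, target)
        ]
        if predict(working, x) == target_class:
            return {
                "success": True,
                "iterations": iteration,
                "original_prediction": original_prediction,
                "modified_prediction": target_class,
                "modified_prototype": working[target_class],
                "perturbation_norm": math.dist(
                    prototypes[target_class], working[target_class]
                ),
            }
    return {
        "success": False,
        "iterations": max_iterations,
        "original_prediction": original_prediction,
    }


prototypes = [[0.0, 0.0], [4.0, 0.0]]
result = find_counterfactual(prototypes, [1.0, 0.0], target_class=1,
                             learning_rate=0.1)
```

The returned dictionary loosely mirrors some CounterfactualResult fields (success, predictions, perturbation size), and the search stops at the first iteration where the prediction flips.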
- find_feature_counterfactuals(input_sample: Tensor, target_class: int, layer_name: str, max_iterations: int = 100, learning_rate: float = 0.01) -> List[CounterfactualResult]
Find counterfactual examples by modifying features.
- Parameters:
input_sample (torch.Tensor) – Input sample to generate counterfactuals for.
target_class (int) – Desired output class.
layer_name (str) – Layer in which to modify features.
max_iterations (int, optional) – Maximum number of optimization iterations. Defaults to 100.
learning_rate (float, optional) – Learning rate for optimization. Defaults to 0.01.
- Returns:
List of successful counterfactual results.
- Return type:
List[CounterfactualResult]
- analyze_decision_boundary(input_samples: Tensor, layer_name: str, num_perturbations: int = 10) -> Dict[str, Any]
Analyze how the decision boundary changes with interventions.
- Parameters:
input_samples (torch.Tensor) – Set of input samples near the decision boundary.
layer_name (str) – Layer to analyze.
num_perturbations (int, optional) – Number of random perturbations to test. Defaults to 10.
- Returns:
Dictionary with boundary analysis results.
- Return type:
Dict[str, Any]
Classes¶
ImpactAssessment¶
- class ImpactAssessment(intervention_manager: InterventionManager)[source]¶
Assess the impact of interventions on model behavior.
Provides comprehensive methods to quantify how prototype or feature modifications affect model outputs, enabling systematic evaluation of intervention effectiveness and model interpretability.
This class works in conjunction with InterventionManager to provide safe, temporary modifications with automatic restoration, allowing researchers to explore counterfactual scenarios without permanent model changes.
Note
All interventions are automatically reverted after assessment, ensuring the model state remains unchanged unless explicitly modified through the InterventionManager.
- __init__(intervention_manager: InterventionManager)[source]¶
Initialize ImpactAssessment.
- Parameters:
intervention_manager (InterventionManager) – InterventionManager instance to analyze. Must be initialized with a TNN model.
Note
The impact assessor uses the manager’s model directly and leverages its intervention tracking capabilities.
- assess_prototype_impact(layer_name: str, prototype_index: int, new_vector: Tensor, test_inputs: Tensor, test_targets: Tensor | None = None) ImpactMetrics[source]¶
Assess impact of modifying a prototype on model behavior.
Temporarily modifies a prototype vector and quantifies the resulting changes in model outputs, predictions, and confidence scores across a set of test inputs. The original prototype is automatically restored.
- Parameters:
layer_name (str) – Name of the layer containing the prototype. Must be one of the manager’s discovered layer names.
prototype_index (int) – Index of the prototype to modify. Must be in range [0, num_prototypes).
new_vector (torch.Tensor) – New prototype vector to test. Must match the shape of the existing prototype.
test_inputs (torch.Tensor) – Input data to evaluate impact on. Shape should be [batch_size, in_features].
test_targets (Optional[torch.Tensor], optional) – Target labels for computing accuracy-based metrics. Defaults to None.
- Returns:
Comprehensive metrics quantifying the intervention’s effects, including output distance, correlation, prediction changes, confidence shifts, and statistical effect size.
- Return type:
ImpactMetrics
Note
The prototype is automatically restored to its original value after assessment, regardless of success or failure.
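The restore-on-exit behaviour described in the note can be pictured as a context manager built on try/finally. This is only a sketch of the pattern, not the library's implementation: `temporary_override` and the plain-list "prototype bank" are hypothetical stand-ins that do not appear in verskyt.

```python
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def temporary_override(bank: List[List[float]], index: int,
                       new_vector: List[float]) -> Iterator[None]:
    """Swap bank[index] for new_vector and restore the original on exit."""
    if len(new_vector) != len(bank[index]):
        raise ValueError("new_vector must match the shape of the existing vector")
    original = list(bank[index])  # copy so later mutation cannot leak in
    bank[index] = list(new_vector)
    try:
        yield
    finally:
        # Runs on normal exit and also when the body raises.
        bank[index] = original


prototypes = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical two-prototype bank
try:
    with temporary_override(prototypes, 0, [0.5, 0.5]):
        modified_during = list(prototypes[0])
        raise RuntimeError("simulated failure during assessment")
except RuntimeError:
    pass
restored_after = prototypes[0]
```

Even though the body raises, `restored_after` holds the original vector again, which is the guarantee the note describes.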
- assess_feature_impact(layer_name: str, feature_index: int, new_vector: Tensor, test_inputs: Tensor, test_targets: Tensor | None = None) ImpactMetrics[source]¶
Assess impact of modifying a feature on model behavior.
Temporarily modifies a feature vector and quantifies the resulting changes in model outputs, predictions, and confidence scores across a set of test inputs. The original feature is automatically restored.
- Parameters:
layer_name (str) – Name of the layer containing the feature. Must be one of the manager’s discovered layer names.
feature_index (int) – Index of the feature to modify within the layer’s feature bank. Must be in range [0, num_features).
new_vector (torch.Tensor) – New feature vector to test. Must match the shape of the existing feature.
test_inputs (torch.Tensor) – Input data to evaluate impact on. Shape should be [batch_size, in_features].
test_targets (Optional[torch.Tensor], optional) – Target labels for computing accuracy-based metrics. Defaults to None.
- Returns:
Comprehensive metrics quantifying the intervention’s effects, including output distance, correlation, prediction changes, confidence shifts, and statistical effect size.
- Return type:
ImpactMetrics
Note
The feature is automatically restored to its original value after assessment, regardless of success or failure.
- sensitivity_analysis(layer_name: str, parameter_type: str, parameter_index: int, test_inputs: Tensor, perturbation_scales: List[float] = None) Dict[float, ImpactMetrics][source]¶
Perform sensitivity analysis by applying different scales of perturbation.
- Parameters:
layer_name – Name of layer to analyze
parameter_type – ‘prototype’ or ‘feature’
parameter_index – Index of parameter to perturb
test_inputs – Input data for evaluation
perturbation_scales – List of perturbation scales to test. Defaults to None.
- Returns:
Dictionary mapping perturbation scales to impact metrics
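The idea of a scale sweep can be illustrated without the library. The sketch below assumes a fixed perturbation direction and a single scalar impact value per scale; how verskyt actually generates perturbations, and which metrics it records, is not stated on this page. `sensitivity_sweep` and the toy scoring function are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def sensitivity_sweep(
    vector: Sequence[float],
    direction: Sequence[float],
    evaluate: Callable[[List[float]], float],
    scales: Sequence[float],
) -> Dict[float, float]:
    """Map each scale to |evaluate(vector + scale * direction) - baseline|."""
    baseline = evaluate(list(vector))
    results: Dict[float, float] = {}
    for scale in scales:
        perturbed = [v + scale * d for v, d in zip(vector, direction)]
        results[scale] = abs(evaluate(perturbed) - baseline)
    return results


# Toy "model": dot product of the parameter vector with one fixed input.
test_input = [1.0, 2.0]


def score(weights: List[float]) -> float:
    return sum(w * x for w, x in zip(weights, test_input))


sweep = sensitivity_sweep([0.5, 0.5], [1.0, 0.0], score, [0.0, 0.5, 1.0])
```

The returned dictionary has the same shape as the documented return value (scale as key, impact as value), with a single float standing in for a full `ImpactMetrics` object.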
CounterfactualAnalyzer¶
- class CounterfactualAnalyzer(intervention_manager: InterventionManager)[source]¶
Perform counterfactual analysis on TNN models.
Generates counterfactual examples by finding minimal parameter interventions that change model predictions for specific inputs. Uses gradient-based optimization to discover how prototype or feature modifications can achieve desired prediction outcomes.
This class enables researchers to understand model decision boundaries and generate explanations for model behavior through systematic parameter space exploration.
Note
All interventions are temporary and automatically restored, allowing safe exploration of counterfactual scenarios.
- __init__(intervention_manager: InterventionManager)[source]¶
Initialize CounterfactualAnalyzer.
- Parameters:
intervention_manager (InterventionManager) – InterventionManager instance to use for parameter modifications and model access.
- find_prototype_counterfactuals(input_sample: Tensor, target_class: int, layer_name: str, max_iterations: int = 100, learning_rate: float = 0.01) List[CounterfactualResult][source]¶
Find counterfactual examples by modifying prototypes.
- Parameters:
input_sample – Input sample to generate counterfactuals for
target_class – Desired output class
layer_name – Layer to modify prototypes in
max_iterations – Maximum optimization iterations
learning_rate – Learning rate for optimization
- Returns:
List of successful counterfactual results
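A minimal sketch of a gradient-based prototype search, under loudly stated assumptions: the class score here is a plain dot product rather than Tversky similarity, only the target-class prototype is updated, and `find_counterfactual` is a hypothetical helper, not the library method. It shows how `max_iterations` and `learning_rate` bound the search.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def predict(prototypes: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Predicted class = index of the best-scoring prototype (assumed scoring)."""
    scores = [dot(p, x) for p in prototypes]
    return max(range(len(scores)), key=lambda k: scores[k])


def find_counterfactual(
    prototypes: Sequence[Sequence[float]],
    x: Sequence[float],
    target_class: int,
    max_iterations: int = 100,
    learning_rate: float = 0.01,
) -> Optional[Tuple[List[Vector], int]]:
    """Return (modified prototypes, update steps), or None if no flip occurs."""
    modified = [list(p) for p in prototypes]  # never touch the originals
    for step in range(max_iterations):
        if predict(modified, x) == target_class:
            return modified, step
        # For a dot-product score, d(score_target)/d(prototype_target) = x.
        modified[target_class] = [
            p + learning_rate * xi for p, xi in zip(modified[target_class], x)
        ]
    if predict(modified, x) == target_class:
        return modified, max_iterations
    return None


prototypes = [[1.0, 0.0], [0.0, 1.0]]
sample = [1.0, 0.5]
found = find_counterfactual(prototypes, sample, target_class=1,
                            max_iterations=10, learning_rate=0.25)
not_found = find_counterfactual(prototypes, sample, target_class=1,
                                max_iterations=1, learning_rate=0.25)
```

With a budget of one iteration the search gives up and returns `None`, which mirrors the documented behaviour of returning only successful results.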
- find_feature_counterfactuals(input_sample: Tensor, target_class: int, layer_name: str, max_iterations: int = 100, learning_rate: float = 0.01) List[CounterfactualResult][source]¶
Find counterfactual examples by modifying features.
- Parameters:
input_sample – Input sample to generate counterfactuals for
target_class – Desired output class
layer_name – Layer to modify features in
max_iterations – Maximum optimization iterations
learning_rate – Learning rate for optimization
- Returns:
List of successful counterfactual results
- analyze_decision_boundary(input_samples: Tensor, layer_name: str, num_perturbations: int = 10) Dict[str, Any][source]¶
Analyze how the decision boundary changes with interventions.
- Parameters:
input_samples – Set of input samples near decision boundary
layer_name – Layer to analyze
num_perturbations – Number of random perturbations to test
- Returns:
Dictionary with boundary analysis results
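One way to read "random perturbations near the decision boundary" is as a flip-rate estimate: perturb the parameters several times and count how often predictions change. The sketch below makes that reading concrete. The Gaussian noise, the dot-product scoring, the `noise_scale` and `seed` arguments, and the `flip_rate` helper are all assumptions for illustration and are not part of the documented API.

```python
import random
from typing import List, Sequence


def predict(prototypes: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Predicted class = index of the best-scoring prototype (assumed scoring)."""
    scores = [sum(p * xi for p, xi in zip(proto, x)) for proto in prototypes]
    return max(range(len(scores)), key=lambda k: scores[k])


def flip_rate(
    prototypes: Sequence[Sequence[float]],
    samples: Sequence[Sequence[float]],
    num_perturbations: int = 10,
    noise_scale: float = 0.1,
    seed: int = 0,
) -> float:
    """Fraction of (perturbation, sample) pairs whose prediction changes."""
    if not samples or num_perturbations <= 0:
        raise ValueError("need at least one sample and one perturbation")
    rng = random.Random(seed)  # seeded so repeated runs agree
    baseline = [predict(prototypes, x) for x in samples]
    flips = 0
    for _ in range(num_perturbations):
        noisy: List[List[float]] = [
            [p + rng.gauss(0.0, noise_scale) for p in proto] for proto in prototypes
        ]
        flips += sum(predict(noisy, x) != b for x, b in zip(samples, baseline))
    return flips / (num_perturbations * len(samples))


prototypes = [[1.0, 0.0], [0.0, 1.0]]
near_boundary = [[1.0, 0.99], [0.99, 1.0]]
rate_noisy = flip_rate(prototypes, near_boundary, noise_scale=0.5)
rate_zero = flip_rate(prototypes, near_boundary, noise_scale=0.0)
```

A flip rate of zero with zero noise is a useful sanity check before interpreting non-zero rates.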
ImpactMetrics¶
- class ImpactMetrics(output_distance: float, output_correlation: float, prediction_change_rate: float, confidence_change: float, feature_activation_change: Tensor | None = None, similarity_score_change: Tensor | None = None, effect_size: float = 0.0, significance: float | None = None)[source]¶
Metrics quantifying the impact of an intervention.
Comprehensive metrics for evaluating how parameter modifications affect model behavior, including output changes, prediction shifts, and statistical significance measures.
- feature_activation_change¶
Changes in feature activation patterns, if computed.
- Type:
Optional[torch.Tensor]
- similarity_score_change¶
Changes in similarity scores, if computed.
- Type:
Optional[torch.Tensor]
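To make two of these fields concrete, here is a sketch of how a prediction change rate and an output distance could be computed from model outputs before and after an intervention. The definitions used (fraction of changed argmax predictions, mean per-sample L2 distance) are plausible assumptions; this page does not state the formulas the library uses, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    return max(range(len(row)), key=lambda i: row[i])


def prediction_change_rate(before: List[List[float]],
                           after: List[List[float]]) -> float:
    """Fraction of samples whose predicted class differs after the change."""
    changed = sum(argmax(b) != argmax(a) for b, a in zip(before, after))
    return changed / len(before)


def mean_output_distance(before: List[List[float]],
                         after: List[List[float]]) -> float:
    """Mean L2 distance between corresponding output rows."""
    distances = [
        math.sqrt(sum((x - y) ** 2 for x, y in zip(b, a)))
        for b, a in zip(before, after)
    ]
    return sum(distances) / len(distances)


# Four samples, two classes; only the second sample's output changes.
before = [[2.0, 1.0], [0.0, 3.0], [1.0, 0.0], [0.5, 0.4]]
after = [[2.0, 1.0], [3.0, 0.0], [1.0, 0.0], [0.5, 0.4]]
rate = prediction_change_rate(before, after)
distance = mean_output_distance(before, after)
```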
CounterfactualResult¶
- class CounterfactualResult(original_input: Tensor, original_output: Tensor, original_prediction: int, modified_input: Tensor, modified_output: Tensor, modified_prediction: int, intervention_description: str, success: bool, input_perturbation_norm: float, output_change_norm: float, confidence_change: float)[source]¶
Result of a counterfactual analysis.
Contains the complete record of a successful counterfactual generation, including original and modified states, intervention details, and quantitative measures of the change achieved.
- original_input¶
Original input sample.
- Type:
torch.Tensor
- original_output¶
Model output for original input.
- Type:
torch.Tensor
- modified_input¶
Input after intervention (may be unchanged).
- Type:
torch.Tensor
- modified_output¶
Model output after intervention.
- Type:
torch.Tensor
Module: grounding¶
Feature grounding for Tversky Neural Networks.
Provides tools for grounding features and prototypes to semantic concepts, enabling human-interpretable understanding of TNN internals.
- class ConceptGrounding(layer_name: str, parameter_type: str, parameter_index: int, concept_name: str, concept_description: str, confidence: float, activation_correlation: float | None = None, visual_similarity: float | None = None, semantic_coherence: float | None = None, grounding_method: str = 'manual', validation_samples: List[Any] | None = None)[source]¶
Bases: object
Associates a TNN parameter with a semantic concept.
Records the association between a learned parameter (prototype or feature) and a human-interpretable concept, including confidence measures and supporting evidence for the grounding.
- validation_samples¶
Samples used for validation.
- Type:
Optional[List[Any]]
- __init__(layer_name: str, parameter_type: str, parameter_index: int, concept_name: str, concept_description: str, confidence: float, activation_correlation: float | None = None, visual_similarity: float | None = None, semantic_coherence: float | None = None, grounding_method: str = 'manual', validation_samples: List[Any] | None = None) None¶
- class ConceptLibrary(concepts: Dict[str, Dict[str, Any]] = <factory>)[source]¶
Bases: object
Library of semantic concepts for grounding.
Maintains a collection of semantic concepts with their descriptions, examples, and associated groundings for systematic interpretability analysis.
- concepts¶
Dictionary mapping concept names to concept metadata including description, examples, properties, and list of associated groundings.
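The `<factory>` default in the signature is how dataclasses display a `default_factory` field, which gives every instance its own dictionary. The sketch below shows that behaviour with a hypothetical stand-in class. The metadata key names ("description", "examples", "properties", "groundings") are assumed from the description above and are not confirmed by this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ConceptLibrarySketch:
    """Hypothetical stand-in for illustration; not the verskyt class."""

    # default_factory builds a fresh dict per instance instead of sharing one.
    concepts: Dict[str, Dict[str, Any]] = field(default_factory=dict)


first = ConceptLibrarySketch()
second = ConceptLibrarySketch()
first.concepts["stripes"] = {
    "description": "Alternating light and dark bands",  # assumed key names
    "examples": [],
    "properties": {},
    "groundings": [],
}
```

Adding a concept to `first` leaves `second` empty, which would not hold if a single mutable dictionary were shared as the default.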
- class FeatureGrounder(intervention_manager: InterventionManager)[source]¶
Bases: object
Ground TNN features and prototypes to semantic concepts.
Provides comprehensive methods for associating learned parameters with human-interpretable concepts, enabling semantic understanding of TNN internals through both manual and automatic grounding approaches.
The grounder maintains a concept library and supports multiple grounding methods including manual assignment, activation-based analysis, and similarity-based matching. All groundings include confidence measures and can be validated against held-out data.
Note
The grounder works in conjunction with InterventionManager to access model parameters and layer information for grounding analysis.
- __init__(intervention_manager: InterventionManager)[source]¶
Initialize FeatureGrounder.
- Parameters:
intervention_manager (InterventionManager) – InterventionManager instance providing access to TNN model and layer information.
Note
Initializes an empty concept library and grounding dictionary. Concepts must be added before grounding can be performed.
- add_concept(name: str, description: str, examples: List[Any] | None = None, properties: Dict[str, Any] | None = None)[source]¶
Add a concept to the concept library.
- ground_feature_manually(layer_name: str, feature_index: int, concept_name: str, confidence: float = 1.0) ConceptGrounding[source]¶
Manually ground a feature to a concept.
- Parameters:
layer_name – Name of layer containing the feature
feature_index – Index of the feature
concept_name – Name of the concept to ground to
confidence – Confidence in the grounding
- Returns:
ConceptGrounding object
- ground_prototype_manually(layer_name: str, prototype_index: int, concept_name: str, confidence: float = 1.0) ConceptGrounding[source]¶
Manually ground a prototype to a concept.
- Parameters:
layer_name – Name of layer containing the prototype
prototype_index – Index of the prototype
concept_name – Name of the concept to ground to
confidence – Confidence in the grounding
- Returns:
ConceptGrounding object
- ground_features_by_activation(layer_name: str, concept_inputs: Dict[str, Tensor], confidence_threshold: float = 0.7) List[ConceptGrounding][source]¶
Ground features by analyzing their activation patterns on concept examples.
- Parameters:
layer_name – Name of layer to analyze
concept_inputs – Dict mapping concept names to input tensors
confidence_threshold – Minimum confidence for automatic grounding
- Returns:
List of ConceptGrounding objects
- ground_prototypes_by_similarity(layer_name: str, concept_prototypes: Dict[str, Tensor], confidence_threshold: float = 0.8) List[ConceptGrounding][source]¶
Ground prototypes by similarity to concept prototype vectors.
- Parameters:
layer_name – Name of layer to analyze
concept_prototypes – Dict mapping concept names to prototype vectors
confidence_threshold – Minimum similarity for automatic grounding
- Returns:
List of ConceptGrounding objects
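A sketch of threshold-based matching, assuming cosine similarity as the measure (this page does not say which similarity the library uses). `match_by_similarity` is a hypothetical helper that returns plain tuples rather than `ConceptGrounding` objects.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_by_similarity(
    prototypes: Sequence[Sequence[float]],
    concept_vectors: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> List[Tuple[int, str, float]]:
    """Return (prototype_index, concept_name, similarity) for best matches
    whose similarity reaches the threshold."""
    matches: List[Tuple[int, str, float]] = []
    if not concept_vectors:
        return matches
    for index, proto in enumerate(prototypes):
        name, similarity = max(
            ((n, cosine(proto, v)) for n, v in concept_vectors.items()),
            key=lambda pair: pair[1],
        )
        if similarity >= threshold:
            matches.append((index, name, similarity))
    return matches


prototypes = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
concepts = {"horizontal": [1.0, 0.0], "vertical": [0.0, 1.0]}
matches = match_by_similarity(prototypes, concepts, threshold=0.8)
```

The middle prototype sits at 45 degrees to both concept vectors (similarity about 0.71), so it stays ungrounded at the default threshold of 0.8.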
- get_grounding(layer_name: str, parameter_type: str, parameter_index: int) ConceptGrounding | None[source]¶
Get grounding for a specific parameter.
- Parameters:
layer_name – Name of the layer
parameter_type – ‘feature’ or ‘prototype’
parameter_index – Index of the parameter
- Returns:
ConceptGrounding if exists, None otherwise
- list_groundings(layer_name: str | None = None, concept_name: str | None = None) List[ConceptGrounding][source]¶
List groundings, optionally filtered by layer or concept.
- Parameters:
layer_name – Filter by layer name
concept_name – Filter by concept name
- Returns:
List of ConceptGrounding objects
- remove_grounding(layer_name: str, parameter_type: str, parameter_index: int) bool[source]¶
Remove a grounding.
- Parameters:
layer_name – Name of the layer
parameter_type – ‘feature’ or ‘prototype’
parameter_index – Index of the parameter
- Returns:
True if grounding was removed, False if not found
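The lookup and removal semantics above (None when absent, True or False on removal) fit naturally onto a dictionary keyed by layer name, parameter type, and index. The sketch below assumes that storage layout; `GroundingRegistrySketch` is hypothetical and stores a concept name where the library would store a `ConceptGrounding`.

```python
from typing import Dict, Optional, Tuple

GroundingKey = Tuple[str, str, int]


class GroundingRegistrySketch:
    """Hypothetical registry keyed by (layer_name, parameter_type, index)."""

    def __init__(self) -> None:
        self._groundings: Dict[GroundingKey, str] = {}

    def set(self, layer_name: str, parameter_type: str,
            parameter_index: int, concept_name: str) -> None:
        if parameter_type not in ("feature", "prototype"):
            raise ValueError("parameter_type must be 'feature' or 'prototype'")
        self._groundings[(layer_name, parameter_type, parameter_index)] = concept_name

    def get(self, layer_name: str, parameter_type: str,
            parameter_index: int) -> Optional[str]:
        # dict.get returns None when the key is missing.
        return self._groundings.get((layer_name, parameter_type, parameter_index))

    def remove(self, layer_name: str, parameter_type: str,
               parameter_index: int) -> bool:
        # pop with a default avoids KeyError and reports whether anything was removed.
        key = (layer_name, parameter_type, parameter_index)
        return self._groundings.pop(key, None) is not None


registry = GroundingRegistrySketch()
registry.set("layer1", "prototype", 0, "stripes")
looked_up = registry.get("layer1", "prototype", 0)
first_removal = registry.remove("layer1", "prototype", 0)
second_removal = registry.remove("layer1", "prototype", 0)
after_removal = registry.get("layer1", "prototype", 0)
```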
- explain_parameter(layer_name: str, parameter_type: str, parameter_index: int) str[source]¶
Generate human-readable explanation of a parameter.
- Parameters:
layer_name – Name of the layer
parameter_type – ‘feature’ or ‘prototype’
parameter_index – Index of the parameter
- Returns:
Human-readable explanation string
- validate_groundings(validation_inputs: Tensor, validation_concepts: List[str]) Dict[str, float][source]¶
Validate groundings using held-out validation data.
- Parameters:
validation_inputs – Input data for validation
validation_concepts – Expected concept labels for each input
- Returns:
Dictionary with validation metrics
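The returned dictionary of validation metrics could, for example, hold per-concept and overall accuracy. The sketch below assumes the grounded model has already been reduced to one predicted concept label per input. That reduction, the metric names, and the `concept_accuracy` helper are illustrative assumptions rather than the library's behaviour.

```python
from collections import defaultdict
from typing import Dict, List


def concept_accuracy(predicted: List[str], expected: List[str]) -> Dict[str, float]:
    """Per-concept accuracy plus an 'overall' entry.

    Assumes no concept is itself named 'overall'.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for guess, truth in zip(predicted, expected):
        totals[truth] += 1
        hits[truth] += int(guess == truth)
    metrics = {name: hits[name] / totals[name] for name in totals}
    metrics["overall"] = sum(hits.values()) / len(expected) if expected else 0.0
    return metrics


predicted = ["stripes", "dots", "stripes", "dots"]
expected = ["stripes", "stripes", "stripes", "dots"]
metrics = concept_accuracy(predicted, expected)
```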