verskyt.interventions

Intervention tools for analyzing and modifying TNN models.

Module: manager

Intervention Manager for Tversky Neural Networks.

Provides high-level APIs for inspecting and modifying TNN models, enabling interpretability and counterfactual analysis.

class PrototypeInfo(layer_name: str, prototype_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer)

Bases: object

Information about a prototype in a TNN layer.

Contains metadata and vector data for a single prototype, enabling inspection and modification of learned prototype representations.

layer_name

Name of the layer containing this prototype.

Type:

str

prototype_index

Index of the prototype within the layer.

Type:

int

vector

The prototype vector data.

Type:

torch.Tensor

layer_ref

Reference to the layer object.

Type:

Union[TverskyProjectionLayer, TverskySimilarityLayer]

layer_name: str
prototype_index: int
vector: Tensor
layer_ref: TverskyProjectionLayer | TverskySimilarityLayer
property shape: Size

Get the shape of the prototype vector.

Returns:

Shape of the prototype vector, typically [in_features].

Return type:

torch.Size

property norm: float

Get the L2 norm of the prototype vector.

Returns:

L2 norm of the prototype vector, useful for comparing prototype magnitudes and analyzing learned representations.

Return type:

float

__init__(layer_name: str, prototype_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer) → None
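Both shape and norm are derived directly from vector. As a minimal stdlib-only sketch of those semantics, the hypothetical VectorRecord class below stores the vector as a plain Python list rather than a torch.Tensor; it is not the library's implementation. With a real tensor, the equivalents would presumably be vector.shape and vector.norm().item().

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class VectorRecord:
    """Hypothetical stand-in for PrototypeInfo/FeatureInfo (plain lists, no torch)."""

    layer_name: str
    index: int
    vector: List[float]

    @property
    def shape(self) -> tuple:
        # A 1-D vector has shape (in_features,)
        return (len(self.vector),)

    @property
    def norm(self) -> float:
        # L2 norm: square root of the sum of squared components
        return math.sqrt(sum(v * v for v in self.vector))


rec = VectorRecord("proj", 0, [3.0, 4.0])
print(rec.shape, rec.norm)  # (2,) 5.0
```

The same record shape serves for features; only the name of the index field differs.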
class FeatureInfo(layer_name: str, feature_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer)

Bases: object

Information about a feature in a TNN layer.

Contains metadata and vector data for a single feature, enabling inspection and modification of learned feature representations.

layer_name

Name of the layer containing this feature.

Type:

str

feature_index

Index of the feature within the layer’s feature bank.

Type:

int

vector

The feature vector data.

Type:

torch.Tensor

layer_ref

Reference to the layer object.

Type:

Union[TverskyProjectionLayer, TverskySimilarityLayer]

layer_name: str
feature_index: int
vector: Tensor
layer_ref: TverskyProjectionLayer | TverskySimilarityLayer
property shape: Size

Get the shape of the feature vector.

Returns:

Shape of the feature vector, typically [in_features].

Return type:

torch.Size

property norm: float

Get the L2 norm of the feature vector.

Returns:

L2 norm of the feature vector, useful for comparing feature magnitudes and analyzing learned representations.

Return type:

float

__init__(layer_name: str, feature_index: int, vector: Tensor, layer_ref: TverskyProjectionLayer | TverskySimilarityLayer) → None
class InterventionManager(model: Module, model_name: str = 'TNN_Model')

Bases: object

Manager for interventions on Tversky Neural Networks.

Provides a unified API for inspecting, modifying, and analyzing TNN models to enable interpretability research and counterfactual analysis. Supports tracking of interventions and restoration of original model states.

This class serves as the central hub for TNN interpretability, offering:
  • Comprehensive prototype and feature discovery across all layers

  • Safe parameter modification with automatic state tracking

  • Integration with impact assessment and grounding frameworks

  • Batch operations for systematic intervention studies

Note

The manager automatically discovers TNN layers (TverskyProjectionLayer and TverskySimilarityLayer) within the provided model and maintains original parameter states for restoration.

__init__(model: Module, model_name: str = 'TNN_Model')

Initialize InterventionManager for a TNN model.

Automatically discovers all TNN layers within the model and captures the original parameter state for later restoration.

Parameters:
  • model (nn.Module) – PyTorch model containing TverskyProjectionLayer or TverskySimilarityLayer instances.

  • model_name (str, optional) – Human-readable name for the model. Defaults to “TNN_Model”.

Note

The manager will only operate on TverskyProjectionLayer and TverskySimilarityLayer instances found within the model.
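The source says only that TNN layers are discovered automatically. A plausible mechanism (an assumption, not confirmed here) is walking the model with nn.Module.named_modules() and keeping instances of the two layer types, which would make layer names the dotted module paths. The stdlib-only sketch below imitates that with hypothetical Node, ProjectionLayer, and SimilarityLayer classes.

```python
from typing import Dict, Iterator, List, Tuple


class Node:
    """Hypothetical minimal module: a named tree of children (like nn.Module)."""

    def __init__(self, **children: "Node") -> None:
        self.children: Dict[str, "Node"] = dict(children)

    def named_modules(self, prefix: str = "") -> Iterator[Tuple[str, "Node"]]:
        # Yield (dotted_name, module) pairs, root first, like nn.Module.named_modules()
        yield prefix, self
        for name, child in self.children.items():
            child_prefix = f"{prefix}.{name}" if prefix else name
            yield from child.named_modules(child_prefix)


class ProjectionLayer(Node):  # hypothetical stand-in for TverskyProjectionLayer
    pass


class SimilarityLayer(Node):  # hypothetical stand-in for TverskySimilarityLayer
    pass


def discover_layers(model: Node) -> List[str]:
    # Keep only modules whose type matches one of the TNN layer types
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, (ProjectionLayer, SimilarityLayer))
    ]


model = Node(encoder=Node(proj=ProjectionLayer()), head=SimilarityLayer())
print(discover_layers(model))  # ['encoder.proj', 'head']
```

Under this assumption, the values returned by layer_names are the strings to pass as layer_name to the other methods.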

property num_layers: int

Get the number of TNN layers in the model.

Returns:

Total count of TverskyProjectionLayer and TverskySimilarityLayer instances found in the model.

Return type:

int

property layer_names: List[str]

Get names of all TNN layers in the model.

Returns:

List of layer names that can be used with other manager methods for layer-specific operations.

Return type:

List[str]

get_layer_info(layer_name: str) → Dict[str, Any]

Get comprehensive information about a TNN layer.

Provides detailed metadata about layer configuration, parameter shapes, and capabilities for inspection and intervention planning.

Parameters:

layer_name (str) – Name of the layer to inspect. Must be one of the names returned by the layer_names property.

Returns:

Dictionary containing layer metadata including:
  • layer_name: Name of the layer

  • layer_type: Class name of the layer

  • in_features: Input feature dimension

  • num_prototypes: Number of prototypes (if applicable)

  • num_features: Number of features (if applicable)

  • learnable_ab: Whether alpha/beta are learnable (if applicable)

Return type:

Dict[str, Any]

Raises:

ValueError – If layer_name is not found in the model.

list_prototypes(layer_name: str | None = None) → List[PrototypeInfo]

List all prototypes in the model or in a specific layer.

Discovers and returns metadata for all prototype vectors across TNN layers, enabling systematic inspection and analysis.

Parameters:

layer_name (Optional[str], optional) – If specified, only return prototypes from this layer. If None, returns prototypes from all layers. Defaults to None.

Returns:

List of PrototypeInfo objects containing prototype vectors and metadata. Each object provides access to the prototype vector, layer reference, and computed properties.

Return type:

List[PrototypeInfo]

Note

Only layers with a ‘prototypes’ attribute (typically TverskyProjectionLayer) contribute to the returned list.

list_features(layer_name: str | None = None) → List[FeatureInfo]

List all features in the model or in a specific layer.

Discovers and returns metadata for all feature vectors across TNN layers, enabling systematic inspection and analysis of the learned feature representations.

Parameters:

layer_name (Optional[str], optional) – If specified, only return features from this layer. If None, returns features from all layers. Defaults to None.

Returns:

List of FeatureInfo objects containing feature vectors and metadata. Each object provides access to the feature vector, layer reference, and computed properties.

Return type:

List[FeatureInfo]

Note

Only layers with a ‘feature_bank’ attribute contribute to the returned list. This typically includes both TverskyProjectionLayer and TverskySimilarityLayer instances.

get_prototype(layer_name: str, prototype_index: int) → PrototypeInfo

Get information about a specific prototype.

Retrieves detailed information about a single prototype vector, including its current values and layer context.

Parameters:
  • layer_name (str) – Name of the layer containing the prototype. Must be one of the names returned by layer_names.

  • prototype_index (int) – Index of the prototype within the layer. Must be in range [0, num_prototypes).

Returns:

Object containing the prototype vector, metadata, and layer reference for further operations.

Return type:

PrototypeInfo

Raises:
  • ValueError – If layer_name is not found or layer has no prototypes.

  • IndexError – If prototype_index is out of bounds.

get_feature(layer_name: str, feature_index: int) → FeatureInfo

Get information about a specific feature.

Retrieves detailed information about a single feature vector, including its current values and layer context.

Parameters:
  • layer_name (str) – Name of the layer containing the feature. Must be one of the names returned by layer_names.

  • feature_index (int) – Index of the feature within the layer’s feature bank. Must be in range [0, num_features).

Returns:

Object containing the feature vector, metadata, and layer reference for further operations.

Return type:

FeatureInfo

Raises:
  • ValueError – If layer_name is not found or layer has no feature bank.

  • IndexError – If feature_index is out of bounds.

modify_prototype(layer_name: str, prototype_index: int, new_vector: Tensor, track_intervention: bool = True) → PrototypeInfo

Modify a prototype vector in a TNN layer.

Safely modifies a prototype vector with automatic validation and optional intervention tracking for impact assessment and restoration.

Parameters:
  • layer_name (str) – Name of the layer containing the prototype. Must be one of the names returned by layer_names.

  • prototype_index (int) – Index of the prototype to modify. Must be in range [0, num_prototypes).

  • new_vector (torch.Tensor) – New prototype vector to set. Must match the shape of the existing prototype.

  • track_intervention (bool, optional) – Whether to record this intervention in the history for impact assessment. Defaults to True.

Returns:

Updated PrototypeInfo object reflecting the new prototype vector state.

Return type:

PrototypeInfo

Raises:
  • ValueError – If layer_name is not found, layer has no prototypes, or new_vector shape doesn’t match expected dimensions.

  • IndexError – If prototype_index is out of bounds.

Note

When track_intervention=True, the original vector is stored for potential restoration via reset_to_original().
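The Raises sections of get_prototype() and modify_prototype() imply three checks: unknown layer (ValueError), index outside [0, num_prototypes) (IndexError), and shape mismatch (ValueError). The sketch below shows that validation order with a hypothetical modify_vector helper over plain lists; the check order and messages are assumptions, not verskyt's code.

```python
from typing import Dict, List


def modify_vector(
    bank: Dict[str, List[List[float]]],
    layer_name: str,
    index: int,
    new_vector: List[float],
) -> List[float]:
    """Hypothetical validation mirroring the documented Raises sections."""
    if layer_name not in bank:
        raise ValueError(f"Layer {layer_name!r} not found")
    vectors = bank[layer_name]
    if not 0 <= index < len(vectors):
        raise IndexError(f"Index {index} out of range [0, {len(vectors)})")
    if len(new_vector) != len(vectors[index]):
        raise ValueError(
            f"Shape mismatch: expected {len(vectors[index])}, got {len(new_vector)}"
        )
    old = list(vectors[index])  # copy of the previous value, e.g. for a history record
    vectors[index] = list(new_vector)
    return old


bank = {"proj": [[1.0, 0.0], [0.0, 1.0]]}
old = modify_vector(bank, "proj", 1, [0.5, 0.5])
```

Validating before writing means a rejected call leaves the stored vector untouched.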

modify_feature(layer_name: str, feature_index: int, new_vector: Tensor, track_intervention: bool = True) → FeatureInfo

Modify a feature vector in a TNN layer.

Safely modifies a feature vector with automatic validation and optional intervention tracking for impact assessment and restoration.

Parameters:
  • layer_name (str) – Name of the layer containing the feature. Must be one of the names returned by layer_names.

  • feature_index (int) – Index of the feature to modify within the layer’s feature bank. Must be in range [0, num_features).

  • new_vector (torch.Tensor) – New feature vector to set. Must match the shape of the existing feature.

  • track_intervention (bool, optional) – Whether to record this intervention in the history for impact assessment. Defaults to True.

Returns:

Updated FeatureInfo object reflecting the new feature vector state.

Return type:

FeatureInfo

Raises:
  • ValueError – If layer_name is not found, layer has no feature bank, or new_vector shape doesn’t match expected dimensions.

  • IndexError – If feature_index is out of bounds.

Note

When track_intervention=True, the original vector is stored for potential restoration via reset_to_original().

reset_to_original() → None

Reset all TNN layers to their original state.

Restores all prototype vectors, feature vectors, and learnable parameters (alpha, beta) to their values at manager initialization. Also clears the intervention history.

Note

This operation cannot be undone. All modifications made through modify_prototype() and modify_feature() will be reverted to the original model state.
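The reset behavior amounts to "snapshot at construction, restore on demand, clear history". The hypothetical ToyManager below sketches that pattern with deep copies of plain lists; with real tensors one would typically copy saved values back into the parameters in place under torch.no_grad(), but how verskyt does it is not documented here.

```python
import copy
from typing import Any, Dict, List


class ToyManager:
    """Hypothetical sketch of snapshot-at-init / restore semantics (not verskyt's code)."""

    def __init__(self, params: Dict[str, List[List[float]]]) -> None:
        self.params = params
        self._original = copy.deepcopy(params)  # captured once, at initialization
        self.history: List[Dict[str, Any]] = []

    def modify(self, layer: str, index: int, new_vector: List[float]) -> None:
        self.history.append({"layer_name": layer, "index": index})
        self.params[layer][index] = list(new_vector)

    def reset_to_original(self) -> None:
        # Mutate the existing dict so callers holding a reference to it see the reset
        for layer, vectors in self._original.items():
            self.params[layer] = copy.deepcopy(vectors)
        self.history.clear()


params = {"proj": [[1.0, 2.0]]}
mgr = ToyManager(params)
mgr.modify("proj", 0, [9.0, 9.0])
mgr.reset_to_original()
print(params["proj"][0], len(mgr.history))  # [1.0, 2.0] 0
```

Because the snapshot is taken at construction, any change made after that point, through the manager or not, is reverted.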

get_intervention_history() → List[Dict[str, Any]]

Get history of all interventions performed.

Returns a copy of the intervention history containing detailed records of all modifications made through this manager.

Returns:

List of intervention records, each containing:
  • type: Type of intervention (‘prototype_modification’ or ‘feature_modification’)

  • layer_name: Name of the affected layer

  • index: Index of the modified parameter

  • original_vector: Original parameter vector (cloned)

  • new_vector: New parameter vector (cloned)

  • timestamp: Sequential intervention number

Return type:

List[Dict[str, Any]]
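The record layout above can be sketched with a hypothetical HistoryLog class. Two details are assumptions: the sequential timestamp starts at 0 here, and "returns a copy" is implemented as a deep copy so callers cannot alter the stored vectors.

```python
import copy
from typing import Any, Dict, List


class HistoryLog:
    """Hypothetical sketch of the documented intervention-record layout."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []

    def record(self, kind: str, layer_name: str, index: int,
               original: List[float], new: List[float]) -> None:
        self._records.append({
            "type": kind,  # 'prototype_modification' or 'feature_modification'
            "layer_name": layer_name,
            "index": index,
            "original_vector": list(original),  # copies, like the documented clones
            "new_vector": list(new),
            "timestamp": len(self._records),  # sequential number, not wall-clock time
        })

    def get_history(self) -> List[Dict[str, Any]]:
        # Return a copy so callers cannot mutate the internal log
        return copy.deepcopy(self._records)


log = HistoryLog()
log.record("prototype_modification", "proj", 1, [0.0, 1.0], [0.5, 0.5])
log.record("feature_modification", "proj", 0, [1.0, 1.0], [2.0, 2.0])
protos = [r for r in log.get_history() if r["type"] == "prototype_modification"]
```

Filtering by type, as in the last line, is a convenient way to separate prototype and feature interventions.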

summary() → str

Get a summary of the model and available interventions.

Provides a comprehensive overview of the model structure, TNN layers, and intervention capabilities for inspection.

Returns:

Multi-line summary string containing:
  • Model name and layer count

  • Detailed information for each TNN layer

  • Number of prototypes and features per layer

  • Parameter values (alpha, beta, theta)

  • Total intervention count

Return type:

str

Module: analysis

Analysis tools for TNN interventions.

Provides counterfactual analysis and impact assessment capabilities for understanding how interventions affect model behavior.

class ImpactMetrics(output_distance: float, output_correlation: float, prediction_change_rate: float, confidence_change: float, feature_activation_change: Tensor | None = None, similarity_score_change: Tensor | None = None, effect_size: float = 0.0, significance: float | None = None)

Bases: object

Metrics quantifying the impact of an intervention.

Comprehensive metrics for evaluating how parameter modifications affect model behavior, including output changes, prediction shifts, and statistical significance measures.

output_distance

L2 distance between original and modified outputs.

Type:

float

output_correlation

Pearson correlation between original and modified outputs.

Type:

float

prediction_change_rate

Fraction of samples with changed predictions.

Type:

float

confidence_change

Average change in prediction confidence scores.

Type:

float

feature_activation_change

Changes in feature activation patterns, if computed.

Type:

Optional[torch.Tensor]

similarity_score_change

Changes in similarity scores, if computed.

Type:

Optional[torch.Tensor]

effect_size

Cohen’s d or similar standardized effect size measure.

Type:

float

significance

p-value from statistical significance test, if performed.

Type:

Optional[float]

output_distance: float
output_correlation: float
prediction_change_rate: float
confidence_change: float
feature_activation_change: Tensor | None = None
similarity_score_change: Tensor | None = None
effect_size: float = 0.0
significance: float | None = None
__init__(output_distance: float, output_correlation: float, prediction_change_rate: float, confidence_change: float, feature_activation_change: Tensor | None = None, similarity_score_change: Tensor | None = None, effect_size: float = 0.0, significance: float | None = None) → None
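The attribute descriptions name the metrics but not their exact formulas. The pure-Python sketch below computes plausible versions from original and modified output rows under stated assumptions: distance and correlation are taken over all flattened output values, "confidence" is the top softmax probability per sample, and effect size is Cohen's d with a pooled standard deviation. All function names are hypothetical; the library's computation may differ.

```python
import math
from statistics import mean, stdev
from typing import Dict, List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(xs: Vector) -> int:
    return max(range(len(xs)), key=xs.__getitem__)


def pearson(a: Vector, b: Vector) -> float:
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cohens_d(a: Vector, b: Vector) -> float:
    # Difference of means divided by the pooled standard deviation
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    return (mean(b) - mean(a)) / pooled


def impact_metrics(orig: List[Vector], mod: List[Vector]) -> Dict[str, float]:
    flat_o = [v for row in orig for v in row]
    flat_m = [v for row in mod for v in row]
    conf_o = [max(softmax(r)) for r in orig]  # assumption: confidence = top probability
    conf_m = [max(softmax(r)) for r in mod]
    changed = sum(argmax(o) != argmax(m) for o, m in zip(orig, mod))
    return {
        "output_distance": math.dist(flat_o, flat_m),
        "output_correlation": pearson(flat_o, flat_m),
        "prediction_change_rate": changed / len(orig),
        "confidence_change": mean(cm - co for co, cm in zip(conf_o, conf_m)),
        "effect_size": cohens_d(flat_o, flat_m),
    }


m = impact_metrics([[2.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [0.0, 1.0]])
```

In this example the first sample's prediction flips and the second's does not, so prediction_change_rate is 0.5.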
class CounterfactualResult(original_input: Tensor, original_output: Tensor, original_prediction: int, modified_input: Tensor, modified_output: Tensor, modified_prediction: int, intervention_description: str, success: bool, input_perturbation_norm: float, output_change_norm: float, confidence_change: float)

Bases: object

Result of a counterfactual analysis.

Contains the complete record of a counterfactual generation attempt, including original and modified states, intervention details, and quantitative measures of the change achieved. Whether the attempt reached the desired outcome is recorded in success.

original_input

Original input sample.

Type:

torch.Tensor

original_output

Model output for original input.

Type:

torch.Tensor

original_prediction

Predicted class for original input.

Type:

int

modified_input

Input after intervention (may be unchanged).

Type:

torch.Tensor

modified_output

Model output after intervention.

Type:

torch.Tensor

modified_prediction

Predicted class after intervention.

Type:

int

intervention_description

Human-readable description of the intervention.

Type:

str

success

Whether intervention achieved the desired outcome.

Type:

bool

input_perturbation_norm

L2 norm of input perturbation.

Type:

float

output_change_norm

L2 norm of output change.

Type:

float

confidence_change

Change in prediction confidence.

Type:

float

original_input: Tensor
original_output: Tensor
original_prediction: int
modified_input: Tensor
modified_output: Tensor
modified_prediction: int
intervention_description: str
success: bool
input_perturbation_norm: float
output_change_norm: float
confidence_change: float
__init__(original_input: Tensor, original_output: Tensor, original_prediction: int, modified_input: Tensor, modified_output: Tensor, modified_prediction: int, intervention_description: str, success: bool, input_perturbation_norm: float, output_change_norm: float, confidence_change: float) → None
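Several CounterfactualResult fields are derived from the others. The hypothetical counterfactual_fields helper below sketches one way to compute them from plain lists. It assumes predictions are the argmax of the outputs and that success means the modified prediction equals the requested target class; neither detail is confirmed by the source.

```python
import math
from typing import Dict, List


def counterfactual_fields(
    original_input: List[float],
    modified_input: List[float],
    original_output: List[float],
    modified_output: List[float],
    target_class: int,
) -> Dict[str, object]:
    """Hypothetical derivation of the computed CounterfactualResult fields."""

    def predict(out: List[float]) -> int:
        return max(range(len(out)), key=out.__getitem__)

    modified_prediction = predict(modified_output)
    return {
        "original_prediction": predict(original_output),
        "modified_prediction": modified_prediction,
        # Parameter-only interventions leave the input unchanged, so this is often 0.0
        "input_perturbation_norm": math.dist(original_input, modified_input),
        "output_change_norm": math.dist(original_output, modified_output),
        # Assumption: success means the prediction reached the requested class
        "success": modified_prediction == target_class,
    }


r = counterfactual_fields([1.0, 2.0], [1.0, 2.0], [3.0, 1.0], [0.0, 5.0], target_class=1)
```

Here the input is unchanged (a pure parameter intervention), while the output moves far enough to flip the prediction from class 0 to class 1.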
class ImpactAssessment(intervention_manager: InterventionManager)

Bases: object

Assess the impact of interventions on model behavior.

Provides comprehensive methods to quantify how prototype or feature modifications affect model outputs, enabling systematic evaluation of intervention effectiveness and model interpretability.

This class works in conjunction with InterventionManager to provide safe, temporary modifications with automatic restoration, allowing researchers to explore counterfactual scenarios without permanent model changes.

Note

All interventions are automatically reverted after assessment, ensuring the model state remains unchanged unless explicitly modified through the InterventionManager.

__init__(intervention_manager: InterventionManager)

Initialize ImpactAssessment.

Parameters:

intervention_manager (InterventionManager) – InterventionManager instance to analyze. Must be initialized with a TNN model.

Note

The impact assessor uses the manager’s model directly and leverages its intervention tracking capabilities.

assess_prototype_impact(layer_name: str, prototype_index: int, new_vector: Tensor, test_inputs: Tensor, test_targets: Tensor | None = None) → ImpactMetrics

Assess impact of modifying a prototype on model behavior.

Temporarily modifies a prototype vector and quantifies the resulting changes in model outputs, predictions, and confidence scores across a set of test inputs. The original prototype is automatically restored.

Parameters:
  • layer_name (str) – Name of the layer containing the prototype. Must be one of the manager’s discovered layer names.

  • prototype_index (int) – Index of the prototype to modify. Must be in range [0, num_prototypes).

  • new_vector (torch.Tensor) – New prototype vector to test. Must match the shape of the existing prototype.

  • test_inputs (torch.Tensor) – Input data to evaluate impact on. Shape should be [batch_size, in_features].

  • test_targets (Optional[torch.Tensor], optional) – Target labels for computing accuracy-based metrics. Defaults to None.

Returns:

Comprehensive metrics quantifying the intervention’s effects, including output distance, correlation, prediction changes, confidence shifts, and statistical effect size.

Return type:

ImpactMetrics

Note

The prototype is automatically restored to its original value after assessment, regardless of success or failure.
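"Restored regardless of success or failure" is the classic try/finally guarantee. The stdlib sketch below expresses it as a hypothetical temporarily_set context manager over a list of vectors; how verskyt implements the restoration internally is not documented here.

```python
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def temporarily_set(
    vectors: List[List[float]], index: int, new_vector: List[float]
) -> Iterator[None]:
    """Hypothetical sketch: swap a vector in, always restore it on exit."""
    saved = list(vectors[index])
    vectors[index] = list(new_vector)
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises an exception
        vectors[index] = saved


protos = [[1.0, 0.0]]
with temporarily_set(protos, 0, [0.0, 1.0]):
    inside = list(protos[0])  # evaluate the modified model here
```

Wrapping the evaluation in a context manager keeps the modify, evaluate, restore sequence in one place and makes it hard to forget the restore step.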

assess_feature_impact(layer_name: str, feature_index: int, new_vector: Tensor, test_inputs: Tensor, test_targets: Tensor | None = None) → ImpactMetrics

Assess impact of modifying a feature on model behavior.

Temporarily modifies a feature vector and quantifies the resulting changes in model outputs, predictions, and confidence scores across a set of test inputs. The original feature is automatically restored.

Parameters:
  • layer_name (str) – Name of the layer containing the feature. Must be one of the manager’s discovered layer names.

  • feature_index (int) – Index of the feature to modify within the layer’s feature bank. Must be in range [0, num_features).

  • new_vector (torch.Tensor) – New feature vector to test. Must match the shape of the existing feature.

  • test_inputs (torch.Tensor) – Input data to evaluate impact on. Shape should be [batch_size, in_features].

  • test_targets (Optional[torch.Tensor], optional) – Target labels for computing accuracy-based metrics. Defaults to None.

Returns:

Comprehensive metrics quantifying the intervention’s effects, including output distance, correlation, prediction changes, confidence shifts, and statistical effect size.

Return type:

ImpactMetrics

Note

The feature is automatically restored to its original value after assessment, regardless of success or failure.

sensitivity_analysis(layer_name: str, parameter_type: str, parameter_index: int, test_inputs: Tensor, perturbation_scales: List[float] = None) Dict[float, ImpactMetrics][source]

Perform sensitivity analysis by perturbing a single prototype or feature at several scales and assessing the impact of each perturbation.

Parameters:
  • layer_name – Name of layer to analyze

  • parameter_type – ‘prototype’ or ‘feature’

  • parameter_index – Index of parameter to perturb

  • test_inputs – Input data for evaluation

  • perturbation_scales – List of perturbation scales to test

Returns:

Dictionary mapping perturbation scales to impact metrics
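A minimal sketch of a scale sweep, assuming additive Gaussian noise along one fixed random direction so that scales differ only in magnitude. verskyt's actual perturbation scheme and default scales are not documented here; `sensitivity_sweep`, its default scale list, and the scalar `impact_fn` (a stand-in for `assess_prototype_impact` / `assess_feature_impact`, which return `ImpactMetrics` rather than a float) are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]


def sensitivity_sweep(
    original: Sequence[float],
    impact_fn: Callable[[Vector], float],
    perturbation_scales: Optional[List[float]] = None,
    seed: int = 0,
) -> Dict[float, float]:
    """Map each perturbation scale to the impact of a perturbed copy of `original`."""
    if perturbation_scales is None:
        perturbation_scales = [0.01, 0.1, 0.5, 1.0]  # illustrative defaults only
    rng = random.Random(seed)
    # One fixed noise direction, reused for every scale.
    noise = [rng.gauss(0.0, 1.0) for _ in original]
    results: Dict[float, float] = {}
    for scale in perturbation_scales:
        perturbed = [o + scale * n for o, n in zip(original, noise)]
        results[scale] = impact_fn(perturbed)
    return results


# Usage: measure impact as the L2 distance from the unperturbed vector.
base = [1.0, -2.0, 0.5]


def l2_from_base(v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(v, base)) ** 0.5


sweep = sensitivity_sweep(base, l2_from_base, [0.1, 1.0])
```

Because the noise direction is fixed, the impact in this toy grows linearly with the scale; a real model's response is generally nonlinear, which is what the sweep is meant to reveal.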

class CounterfactualAnalyzer(intervention_manager: InterventionManager)[source]

Bases: object

Perform counterfactual analysis on TNN models.

Generates counterfactual examples by finding minimal parameter interventions that change model predictions for specific inputs. Uses gradient-based optimization to discover how prototype or feature modifications can achieve desired prediction outcomes.

This class enables researchers to understand model decision boundaries and generate explanations for model behavior through systematic parameter space exploration.

Note

All interventions are temporary and automatically restored, allowing safe exploration of counterfactual scenarios.

__init__(intervention_manager: InterventionManager)[source]

Initialize CounterfactualAnalyzer.

Parameters:
  • intervention_manager (InterventionManager) – InterventionManager instance to use for parameter modifications and model access.

find_prototype_counterfactuals(input_sample: Tensor, target_class: int, layer_name: str, max_iterations: int = 100, learning_rate: float = 0.01) List[CounterfactualResult][source]

Find counterfactual examples by modifying prototypes.

Parameters:
  • input_sample – Input sample to generate counterfactuals for

  • target_class – Desired output class

  • layer_name – Layer to modify prototypes in

  • max_iterations – Maximum optimization iterations

  • learning_rate – Learning rate for optimization

Returns:

List of successful counterfactual results
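The class description says counterfactuals are found by gradient-based optimization of prototypes. The toy below illustrates that idea under a deliberate simplification: a nearest-prototype classifier scored by negative squared Euclidean distance (not Tversky similarity). It descends the gradient of the distance between the input and the target-class prototype and stops at the first iteration where the prediction flips, which keeps the change small along the gradient path. `predict`, `prototype_counterfactual`, and `ToyCounterfactual` are hypothetical names, not verskyt's implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(x: Sequence[float], prototypes: Sequence[Sequence[float]]) -> int:
    # Toy score: negative squared distance (a stand-in for Tversky similarity).
    scores = [-sq_dist(x, p) for p in prototypes]
    return max(range(len(scores)), key=lambda i: scores[i])


@dataclass
class ToyCounterfactual:
    modified_prototype: Vector
    prediction: int
    iterations: int
    success: bool


def prototype_counterfactual(x: Sequence[float], prototypes: Sequence[Sequence[float]],
                             target_class: int, max_iterations: int = 100,
                             learning_rate: float = 0.01) -> ToyCounterfactual:
    bank = [list(p) for p in prototypes]  # work on copies; originals stay intact
    proto = bank[target_class]
    for step in range(1, max_iterations + 1):
        # d/dp ||x - p||^2 = -2 (x - p); a descent step moves p toward x.
        proto = [p + learning_rate * 2.0 * (xi - p) for xi, p in zip(x, proto)]
        bank[target_class] = proto
        pred = predict(x, bank)
        if pred == target_class:
            return ToyCounterfactual(proto, pred, step, True)
    return ToyCounterfactual(proto, predict(x, bank), max_iterations, False)


# Usage: push class 1's prototype until the sample is classified as class 1.
sample = [0.8, 0.2]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
cf = prototype_counterfactual(sample, prototypes, target_class=1, learning_rate=0.1)
```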

find_feature_counterfactuals(input_sample: Tensor, target_class: int, layer_name: str, max_iterations: int = 100, learning_rate: float = 0.01) List[CounterfactualResult][source]

Find counterfactual examples by modifying features.

Parameters:
  • input_sample – Input sample to generate counterfactuals for

  • target_class – Desired output class

  • layer_name – Layer to modify features in

  • max_iterations – Maximum optimization iterations

  • learning_rate – Learning rate for optimization

Returns:

List of successful counterfactual results

analyze_decision_boundary(input_samples: Tensor, layer_name: str, num_perturbations: int = 10) Dict[str, Any][source]

Analyze how the decision boundary changes with interventions.

Parameters:
  • input_samples – Set of input samples near the decision boundary

  • layer_name – Layer to analyze

  • num_perturbations – Number of random perturbations to test

Returns:

Dictionary with boundary analysis results
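One plausible reading of "how the decision boundary changes with interventions" is to apply random prototype perturbations and count how often each sample's prediction flips: samples that flip frequently lie close to the boundary. The sketch below does this for a toy nearest-prototype classifier, assuming independent Gaussian noise on every prototype coordinate. The function name, the `noise_scale` parameter, and the returned keys are hypothetical and are not the keys verskyt's dictionary contains.

```python
import random
from typing import Any, Dict, List, Sequence

Vector = Sequence[float]


def _predict(x: Vector, prototypes: Sequence[Vector]) -> int:
    # Toy classifier: nearest prototype by squared Euclidean distance.
    scores = [-sum((a - b) ** 2 for a, b in zip(x, p)) for p in prototypes]
    return max(range(len(scores)), key=lambda i: scores[i])


def boundary_flip_analysis(samples: Sequence[Vector], prototypes: Sequence[Vector],
                           num_perturbations: int = 10, noise_scale: float = 0.1,
                           seed: int = 0) -> Dict[str, Any]:
    if num_perturbations <= 0:
        raise ValueError("num_perturbations must be positive")
    rng = random.Random(seed)
    baseline = [_predict(x, prototypes) for x in samples]
    flip_rates: List[float] = []
    flip_counts = [0] * len(samples)
    for _ in range(num_perturbations):
        # Add independent Gaussian noise to every prototype coordinate.
        noisy = [[v + rng.gauss(0.0, noise_scale) for v in p] for p in prototypes]
        flips = 0
        for i, (x, base) in enumerate(zip(samples, baseline)):
            if _predict(x, noisy) != base:
                flips += 1
                flip_counts[i] += 1
        flip_rates.append(flips / len(samples))
    return {
        "flip_rates": flip_rates,
        "mean_flip_rate": sum(flip_rates) / len(flip_rates),
        # Samples that flip often sit close to the decision boundary.
        "per_sample_flip_frequency": [c / num_perturbations for c in flip_counts],
    }


# Usage: one sample near the boundary between two prototypes, one far from it.
report = boundary_flip_analysis([[0.51, 0.0], [-5.0, 0.0]],
                                [[0.0, 0.0], [1.0, 0.0]], num_perturbations=20)
```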

Classes

ImpactMetrics

class ImpactMetrics(output_distance: float, output_correlation: float, prediction_change_rate: float, confidence_change: float, feature_activation_change: Tensor | None = None, similarity_score_change: Tensor | None = None, effect_size: float = 0.0, significance: float | None = None)[source]

Metrics quantifying the impact of an intervention.

Comprehensive metrics for evaluating how parameter modifications affect model behavior, including output changes, prediction shifts, and statistical significance measures.

output_distance

L2 distance between original and modified outputs.

Type:

float

output_correlation

Pearson correlation between original and modified outputs.

Type:

float

prediction_change_rate

Fraction of samples with changed predictions.

Type:

float

confidence_change

Average change in prediction confidence scores.

Type:

float

feature_activation_change

Changes in feature activation patterns, if computed.

Type:

Optional[torch.Tensor]

similarity_score_change

Changes in similarity scores, if computed.

Type:

Optional[torch.Tensor]

effect_size

Cohen’s d or similar standardized effect size measure.

Type:

float

significance

p-value from statistical significance test, if performed.

Type:

Optional[float]

output_distance: float
output_correlation: float
prediction_change_rate: float
confidence_change: float
feature_activation_change: Tensor | None = None
similarity_score_change: Tensor | None = None
effect_size: float = 0.0
significance: float | None = None
__init__(output_distance: float, output_correlation: float, prediction_change_rate: float, confidence_change: float, feature_activation_change: Tensor | None = None, similarity_score_change: Tensor | None = None, effect_size: float = 0.0, significance: float | None = None) None
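The field descriptions suggest fairly direct computations. The sketch below computes analogues of five fields from two batches of outputs, assuming that outputs are per-sample class scores, that "confidence" means the maximum softmax probability, and that Cohen's d is taken over per-sample confidences with a pooled standard deviation. verskyt may define these differently; `compute_impact` and `ImpactSketch` are hypothetical, and `significance`, `feature_activation_change`, and `similarity_score_change` are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Outputs = Sequence[Sequence[float]]  # one row of class scores per sample


@dataclass
class ImpactSketch:
    # Hypothetical stand-in for a subset of the ImpactMetrics fields.
    output_distance: float
    output_correlation: float
    prediction_change_rate: float
    confidence_change: float
    effect_size: float


def _softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def _argmax(row: Sequence[float]) -> int:
    return max(range(len(row)), key=lambda i: row[i])


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else float("nan")


def _cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    # Standardized mean difference using the pooled sample standard deviation.
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((y - mb) ** 2 for y in b)
    pooled = math.sqrt(ss / (na + nb - 2)) if na + nb > 2 else 0.0
    return (mb - ma) / pooled if pooled > 0 else 0.0


def compute_impact(original: Outputs, modified: Outputs) -> ImpactSketch:
    flat_o = [v for row in original for v in row]
    flat_m = [v for row in modified for v in row]
    n = len(original)
    changed = sum(_argmax(o) != _argmax(m) for o, m in zip(original, modified))
    conf_o = [max(_softmax(r)) for r in original]
    conf_m = [max(_softmax(r)) for r in modified]
    return ImpactSketch(
        output_distance=math.sqrt(sum((x - y) ** 2 for x, y in zip(flat_o, flat_m))),
        output_correlation=_pearson(flat_o, flat_m),
        prediction_change_rate=changed / n,
        confidence_change=sum(m - o for o, m in zip(conf_o, conf_m)) / n,
        effect_size=_cohens_d(conf_o, conf_m),
    )


# Usage: the intervention flips the first sample's prediction but not the second's.
metrics = compute_impact([[2.0, 0.0], [0.0, 2.0]], [[0.0, 2.0], [0.0, 2.0]])
```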

CounterfactualResult

class CounterfactualResult(original_input: Tensor, original_output: Tensor, original_prediction: int, modified_input: Tensor, modified_output: Tensor, modified_prediction: int, intervention_description: str, success: bool, input_perturbation_norm: float, output_change_norm: float, confidence_change: float)[source]

Result of a counterfactual analysis.

Contains the complete record of a counterfactual generation attempt, including original and modified states, intervention details, whether the attempt succeeded, and quantitative measures of the change achieved.

original_input

Original input sample.

Type:

torch.Tensor

original_output

Model output for original input.

Type:

torch.Tensor

original_prediction

Predicted class for original input.

Type:

int

modified_input

Input after intervention (may be unchanged).

Type:

torch.Tensor

modified_output

Model output after intervention.

Type:

torch.Tensor

modified_prediction

Predicted class after intervention.

Type:

int

intervention_description

Human-readable description of the intervention.

Type:

str

success

Whether the intervention achieved the desired outcome.

Type:

bool

input_perturbation_norm

L2 norm of input perturbation.

Type:

float

output_change_norm

L2 norm of output change.

Type:

float

confidence_change

Change in prediction confidence.

Type:

float

original_input: Tensor
original_output: Tensor
original_prediction: int
modified_input: Tensor
modified_output: Tensor
modified_prediction: int
intervention_description: str
success: bool
input_perturbation_norm: float
output_change_norm: float
confidence_change: float
__init__(original_input: Tensor, original_output: Tensor, original_prediction: int, modified_input: Tensor, modified_output: Tensor, modified_prediction: int, intervention_description: str, success: bool, input_perturbation_norm: float, output_change_norm: float, confidence_change: float) None
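A common way to use a batch of results is to keep the successful ones and prefer the one that changed the output least. The sketch below does this with a hypothetical `CFSummary` stand-in that carries only a few of the `CounterfactualResult` fields; ranking by `output_change_norm` is one reasonable choice of "minimal", not a rule the library prescribes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CFSummary:
    # Hypothetical stand-in carrying a few CounterfactualResult fields.
    intervention_description: str
    success: bool
    output_change_norm: float
    confidence_change: float


def most_minimal_success(results: List[CFSummary]) -> Optional[CFSummary]:
    """Return the successful result with the smallest output change, if any."""
    successes = [r for r in results if r.success]
    # `default=None` avoids a ValueError when nothing succeeded.
    return min(successes, key=lambda r: r.output_change_norm, default=None)


# Usage: the unsuccessful result is ignored even though its change is smallest.
candidates = [
    CFSummary("shift prototype 2", True, 2.0, 0.1),
    CFSummary("shift prototype 0", False, 0.5, 0.0),
    CFSummary("shift prototype 1", True, 1.2, 0.3),
]
best = most_minimal_success(candidates)
```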

Module: grounding

Feature grounding for Tversky Neural Networks.

Provides tools for grounding features and prototypes to semantic concepts, enabling human-interpretable understanding of TNN internals.

class ConceptGrounding(layer_name: str, parameter_type: str, parameter_index: int, concept_name: str, concept_description: str, confidence: float, activation_correlation: float | None = None, visual_similarity: float | None = None, semantic_coherence: float | None = None, grounding_method: str = 'manual', validation_samples: List[Any] | None = None)[source]

Bases: object

Associates a TNN parameter with a semantic concept.

Records the association between a learned parameter (prototype or feature) and a human-interpretable concept, including confidence measures and supporting evidence for the grounding.

layer_name

Name of the layer containing the parameter.

Type:

str

parameter_type

Type of parameter (‘feature’ or ‘prototype’).

Type:

str

parameter_index

Index of the parameter within the layer.

Type:

int

concept_name

Name of the associated semantic concept.

Type:

str

concept_description

Human-readable description of the concept.

Type:

str

confidence

Confidence in the grounding, range [0, 1].

Type:

float

activation_correlation

Correlation with concept activations.

Type:

Optional[float]

visual_similarity

Visual similarity to concept examples.

Type:

Optional[float]

semantic_coherence

Semantic coherence measure.

Type:

Optional[float]

grounding_method

Method used for grounding (‘manual’, ‘activation_based’, etc.).

Type:

str

validation_samples

Samples used for validation.

Type:

Optional[List[Any]]

layer_name: str
parameter_type: str
parameter_index: int
concept_name: str
concept_description: str
confidence: float
activation_correlation: float | None = None
visual_similarity: float | None = None
semantic_coherence: float | None = None
grounding_method: str = 'manual'
validation_samples: List[Any] | None = None
__init__(layer_name: str, parameter_type: str, parameter_index: int, concept_name: str, concept_description: str, confidence: float, activation_correlation: float | None = None, visual_similarity: float | None = None, semantic_coherence: float | None = None, grounding_method: str = 'manual', validation_samples: List[Any] | None = None) None
class ConceptLibrary(concepts: Dict[str, Dict[str, Any]] = <factory>)[source]

Bases: object

Library of semantic concepts for grounding.

Maintains a collection of semantic concepts with their descriptions, examples, and associated groundings for systematic interpretability analysis.

concepts

Dictionary mapping concept names to concept metadata including description, examples, properties, and list of associated groundings.

Type:

Dict[str, Dict[str, Any]]

concepts: Dict[str, Dict[str, Any]]
add_concept(name: str, description: str, examples: List[Any] | None = None, properties: Dict[str, Any] | None = None)[source]

Add a concept to the library.

get_concept(name: str) Dict[str, Any] | None[source]

Get concept information.

list_concepts() List[str][source]

List all concept names.

__init__(concepts: Dict[str, Dict[str, Any]] = <factory>) None
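The `= <factory>` default in the signature is how a dataclass field declared with `field(default_factory=...)` is rendered, so each library gets its own fresh dictionary rather than a shared one. The sketch below is a hypothetical `MiniConceptLibrary` that mirrors the three documented methods; the metadata keys follow the attribute description (description, examples, properties, groundings), but their exact names in verskyt are an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MiniConceptLibrary:
    # field(default_factory=dict) gives every instance its own empty dict.
    concepts: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def add_concept(self, name: str, description: str,
                    examples: Optional[List[Any]] = None,
                    properties: Optional[Dict[str, Any]] = None) -> None:
        # Assumed metadata layout; key names are illustrative.
        self.concepts[name] = {
            "description": description,
            "examples": examples or [],
            "properties": properties or {},
            "groundings": [],
        }

    def get_concept(self, name: str) -> Optional[Dict[str, Any]]:
        return self.concepts.get(name)

    def list_concepts(self) -> List[str]:
        return list(self.concepts)


# Usage: two libraries do not share state.
lib_a, lib_b = MiniConceptLibrary(), MiniConceptLibrary()
lib_a.add_concept("stripes", "Striped texture")
```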
class FeatureGrounder(intervention_manager: InterventionManager)[source]

Bases: object

Ground TNN features and prototypes to semantic concepts.

Provides comprehensive methods for associating learned parameters with human-interpretable concepts, enabling semantic understanding of TNN internals through both manual and automatic grounding approaches.

The grounder maintains a concept library and supports multiple grounding methods including manual assignment, activation-based analysis, and similarity-based matching. All groundings include confidence measures and can be validated against held-out data.

Note

The grounder works in conjunction with InterventionManager to access model parameters and layer information for grounding analysis.

__init__(intervention_manager: InterventionManager)[source]

Initialize FeatureGrounder.

Parameters:

intervention_manager (InterventionManager) – InterventionManager instance providing access to TNN model and layer information.

Note

Initializes an empty concept library and grounding dictionary. Concepts must be added before grounding can be performed.

add_concept(name: str, description: str, examples: List[Any] | None = None, properties: Dict[str, Any] | None = None)[source]

Add a concept to the concept library.

ground_feature_manually(layer_name: str, feature_index: int, concept_name: str, confidence: float = 1.0) ConceptGrounding[source]

Manually ground a feature to a concept.

Parameters:
  • layer_name – Name of layer containing the feature

  • feature_index – Index of the feature

  • concept_name – Name of the concept to ground to

  • confidence – Confidence in the grounding, in the range [0, 1]. Defaults to 1.0.

Returns:

ConceptGrounding object

ground_prototype_manually(layer_name: str, prototype_index: int, concept_name: str, confidence: float = 1.0) ConceptGrounding[source]

Manually ground a prototype to a concept.

Parameters:
  • layer_name – Name of layer containing the prototype

  • prototype_index – Index of the prototype

  • concept_name – Name of the concept to ground to

  • confidence – Confidence in the grounding, in the range [0, 1]. Defaults to 1.0.

Returns:

ConceptGrounding object

ground_features_by_activation(layer_name: str, concept_inputs: Dict[str, Tensor], confidence_threshold: float = 0.7) List[ConceptGrounding][source]

Ground features by analyzing their activation patterns on concept examples.

Parameters:
  • layer_name – Name of layer to analyze

  • concept_inputs – Dict mapping concept names to input tensors

  • confidence_threshold – Minimum confidence for automatic grounding

Returns:

List of ConceptGrounding objects
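How activation-based confidence is scored is not specified here. The sketch below assumes a feature's activation on an input is the rectified dot product of the two vectors, averages it over each concept's examples, and uses the best concept's share of the total mean activation as the confidence, keeping only groundings at or above `confidence_threshold`. `ground_by_activation` is hypothetical and returns plain tuples rather than `ConceptGrounding` objects.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def ground_by_activation(features: Sequence[Vector],
                         concept_inputs: Dict[str, Sequence[Vector]],
                         confidence_threshold: float = 0.7
                         ) -> List[Tuple[int, str, float]]:
    """Return (feature_index, concept_name, confidence) for confident matches."""
    groundings: List[Tuple[int, str, float]] = []
    for idx, f in enumerate(features):
        mean_act: Dict[str, float] = {}
        for concept, inputs in concept_inputs.items():
            # Assumed activation: rectified dot product between input and feature.
            acts = [max(0.0, sum(a * b for a, b in zip(x, f))) for x in inputs]
            mean_act[concept] = sum(acts) / len(acts)
        total = sum(mean_act.values())
        if total == 0.0:
            continue  # feature never fires on any concept example
        best = max(mean_act, key=lambda c: mean_act[c])
        confidence = mean_act[best] / total  # share of activation on the best concept
        if confidence >= confidence_threshold:
            groundings.append((idx, best, confidence))
    return groundings


# Usage: each feature fires mostly on one concept's examples.
feature_bank = [[1.0, 0.0], [0.0, 1.0]]
examples = {"red": [[1.0, 0.0], [0.9, 0.1]], "blue": [[0.0, 1.0], [0.1, 0.9]]}
found = ground_by_activation(feature_bank, examples)
```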

ground_prototypes_by_similarity(layer_name: str, concept_prototypes: Dict[str, Tensor], confidence_threshold: float = 0.8) List[ConceptGrounding][source]

Ground prototypes by similarity to concept prototype vectors.

Parameters:
  • layer_name – Name of layer to analyze

  • concept_prototypes – Dict mapping concept names to prototype vectors

  • confidence_threshold – Minimum similarity for automatic grounding

Returns:

List of ConceptGrounding objects
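For similarity-based grounding, the sketch below assumes cosine similarity between each learned prototype and each concept vector; verskyt may use a different measure (for example its Tversky similarity). A prototype is grounded to its most similar concept only when that similarity reaches `confidence_threshold`. `ground_by_similarity` is hypothetical and returns tuples instead of `ConceptGrounding` objects.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def ground_by_similarity(prototypes: Sequence[Vector],
                         concept_prototypes: Dict[str, Vector],
                         confidence_threshold: float = 0.8
                         ) -> List[Tuple[int, str, float]]:
    """Return (prototype_index, concept_name, similarity) for confident matches."""
    if not concept_prototypes:
        return []
    groundings: List[Tuple[int, str, float]] = []
    for idx, p in enumerate(prototypes):
        sims = {name: cosine(p, v) for name, v in concept_prototypes.items()}
        best = max(sims, key=lambda name: sims[name])
        if sims[best] >= confidence_threshold:
            groundings.append((idx, best, sims[best]))
    return groundings


# Usage: prototype 1 is equally close to both concepts (about 0.707) and is skipped.
matches = ground_by_similarity([[1.0, 0.0], [1.0, 1.0]],
                               {"a": [1.0, 0.0], "b": [0.0, 1.0]})
```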

get_grounding(layer_name: str, parameter_type: str, parameter_index: int) ConceptGrounding | None[source]

Get grounding for a specific parameter.

Parameters:
  • layer_name – Name of the layer

  • parameter_type – ‘feature’ or ‘prototype’

  • parameter_index – Index of the parameter

Returns:

ConceptGrounding if exists, None otherwise

list_groundings(layer_name: str | None = None, concept_name: str | None = None) List[ConceptGrounding][source]

List groundings, optionally filtered by layer or concept.

Parameters:
  • layer_name – Filter by layer name

  • concept_name – Filter by concept name

Returns:

List of ConceptGrounding objects

remove_grounding(layer_name: str, parameter_type: str, parameter_index: int) bool[source]

Remove a grounding.

Parameters:
  • layer_name – Name of the layer

  • parameter_type – ‘feature’ or ‘prototype’

  • parameter_index – Index of the parameter

Returns:

True if grounding was removed, False if not found
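The get, list, and remove methods all address a grounding by the triple (layer_name, parameter_type, parameter_index). The hypothetical `GroundingRegistry` below sketches that bookkeeping with a dictionary keyed on the triple. It assumes one grounding per parameter (re-grounding overwrites) and that the two `list_groundings` filters combine with AND; neither detail is confirmed by the reference above.

```python
from typing import Dict, List, Optional, Tuple

# (layer_name, parameter_type, parameter_index) identifies one parameter.
Key = Tuple[str, str, int]


class GroundingRegistry:
    """Hypothetical store holding at most one concept per parameter."""

    def __init__(self) -> None:
        self._groundings: Dict[Key, str] = {}

    def add(self, layer_name: str, parameter_type: str, parameter_index: int,
            concept_name: str) -> None:
        if parameter_type not in ("feature", "prototype"):
            raise ValueError("parameter_type must be 'feature' or 'prototype'")
        # Re-grounding the same parameter overwrites the previous entry.
        self._groundings[(layer_name, parameter_type, parameter_index)] = concept_name

    def get_grounding(self, layer_name: str, parameter_type: str,
                      parameter_index: int) -> Optional[str]:
        return self._groundings.get((layer_name, parameter_type, parameter_index))

    def list_groundings(self, layer_name: Optional[str] = None,
                        concept_name: Optional[str] = None) -> List[Tuple[Key, str]]:
        # None means "do not filter on this field"; given filters must all match.
        return [(key, concept) for key, concept in self._groundings.items()
                if (layer_name is None or key[0] == layer_name)
                and (concept_name is None or concept == concept_name)]

    def remove_grounding(self, layer_name: str, parameter_type: str,
                         parameter_index: int) -> bool:
        key = (layer_name, parameter_type, parameter_index)
        if key in self._groundings:
            del self._groundings[key]
            return True
        return False


# Usage
registry = GroundingRegistry()
registry.add("layer1", "feature", 3, "stripes")
registry.add("layer2", "prototype", 0, "cat")
```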

explain_parameter(layer_name: str, parameter_type: str, parameter_index: int) str[source]

Generate human-readable explanation of a parameter.

Parameters:
  • layer_name – Name of the layer

  • parameter_type – ‘feature’ or ‘prototype’

  • parameter_index – Index of the parameter

Returns:

Human-readable explanation string

generate_model_explanation() str[source]

Generate a high-level explanation of the entire model.

validate_groundings(validation_inputs: Tensor, validation_concepts: List[str]) Dict[str, float][source]

Validate groundings using held-out validation data.

Parameters:
  • validation_inputs – Input data for validation

  • validation_concepts – Expected concept labels for each input

Returns:

Dictionary with validation metrics
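One way to validate groundings against labelled held-out data is to let each input "vote" for the concept of its most strongly activated grounded feature and compare that with the expected label. The sketch below assumes feature activations have already been computed from `validation_inputs` and reports two hypothetical metrics, coverage and accuracy; the metric names and scoring rule are illustrative and are not the keys verskyt returns.

```python
from typing import Dict, Optional, Sequence


def grounding_accuracy(feature_activations: Sequence[Sequence[float]],
                       feature_concepts: Sequence[Optional[str]],
                       validation_concepts: Sequence[str]) -> Dict[str, float]:
    """Score groundings by checking the top grounded feature's concept per input."""
    covered = correct = 0
    for acts, expected in zip(feature_activations, validation_concepts):
        # Only features that carry a grounding can vote for a concept.
        grounded = [(a, c) for a, c in zip(acts, feature_concepts) if c is not None]
        if not grounded:
            continue
        covered += 1
        predicted = max(grounded, key=lambda pair: pair[0])[1]
        if predicted == expected:
            correct += 1
    n = len(validation_concepts)
    return {
        "coverage": covered / n if n else 0.0,
        "accuracy": correct / covered if covered else 0.0,
    }


# Usage: feature 2 is ungrounded, so it never votes; the third input is misattributed.
activations = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.9], [0.3, 0.1, 0.9]]
scores = grounding_accuracy(activations, ["red", "blue", None], ["red", "blue", "blue"])
```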
