verskyt.benchmarks

Benchmark suites for validating paper results and comparing performance.

Module: xor_suite

XOR benchmark suite for Tversky Neural Networks.

Reproduces XOR experiments from “Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity” (Doumbouya et al., 2025).

Provides both a fast benchmark for development and a full replication of the paper's experiments.

class XORConfig(intersection_methods: List[str | verskyt.core.similarity.IntersectionReduction] = <factory>, difference_methods: List[str | verskyt.core.similarity.DifferenceReduction] = <factory>, normalization: List[bool] = <factory>, feature_counts: List[int] = <factory>, prototype_init: List[str] = <factory>, feature_init: List[str] = <factory>, random_seeds: List[int] = <factory>, epochs: int = 1000, learning_rate: float = 0.1, convergence_threshold: float = 1.0)

Bases: object

Configuration for XOR benchmark experiments.

intersection_methods: List[str | IntersectionReduction]
difference_methods: List[str | DifferenceReduction]
normalization: List[bool]
feature_counts: List[int]
prototype_init: List[str]
feature_init: List[str]
random_seeds: List[int]
epochs: int = 1000
learning_rate: float = 0.1
convergence_threshold: float = 1.0
property total_runs: int

Calculate total number of experimental runs.
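The source does not say how this count is derived. A minimal sketch, assuming one run per combination of the seven list-valued fields (a full Cartesian product), could look like the following. `GridConfig` is a hypothetical stand-in rather than the library's class, and the default values are illustrative placeholders that have not been checked against verskyt.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List


@dataclass
class GridConfig:
    """Hypothetical stand-in for XORConfig; all defaults are placeholders."""

    intersection_methods: List[str] = field(default_factory=lambda: ["product", "mean"])
    difference_methods: List[str] = field(
        default_factory=lambda: ["ignorematch", "substractmatch"]
    )
    normalization: List[bool] = field(default_factory=lambda: [False, True])
    feature_counts: List[int] = field(default_factory=lambda: [2, 4])
    prototype_init: List[str] = field(default_factory=lambda: ["uniform", "normal"])
    feature_init: List[str] = field(default_factory=lambda: ["uniform", "normal"])
    random_seeds: List[int] = field(default_factory=lambda: [0, 1])

    @property
    def total_runs(self) -> int:
        # Assumption: every swept setting is crossed with every other one,
        # so the run count is the product of the list lengths.
        swept = (
            self.intersection_methods,
            self.difference_methods,
            self.normalization,
            self.feature_counts,
            self.prototype_init,
            self.feature_init,
            self.random_seeds,
        )
        return prod(len(values) for values in swept)


default_runs = GridConfig().total_runs
more_seeds_runs = GridConfig(random_seeds=[0, 1, 2]).total_runs
```

Under this assumption, adding one seed to a grid multiplies the run count rather than adding to it, which is why the full replication is so much larger than the fast benchmark.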

__init__(intersection_methods: List[str | verskyt.core.similarity.IntersectionReduction] = <factory>, difference_methods: List[str | verskyt.core.similarity.DifferenceReduction] = <factory>, normalization: List[bool] = <factory>, feature_counts: List[int] = <factory>, prototype_init: List[str] = <factory>, feature_init: List[str] = <factory>, random_seeds: List[int] = <factory>, epochs: int = 1000, learning_rate: float = 0.1, convergence_threshold: float = 1.0) -> None

class XORResult(intersection_method: str, difference_method: str, normalize: bool, feature_count: int, prototype_init: str, feature_init: str, seed: int, final_loss: float, final_accuracy: float, converged: bool, training_time: float, loss_history: List[float] | None = None, accuracy_history: List[float] | None = None)

Bases: object

Results from a single XOR training run.

intersection_method: str
difference_method: str
normalize: bool
feature_count: int
prototype_init: str
feature_init: str
seed: int
final_loss: float
final_accuracy: float
converged: bool
training_time: float
loss_history: List[float] | None = None
accuracy_history: List[float] | None = None
__init__(intersection_method: str, difference_method: str, normalize: bool, feature_count: int, prototype_init: str, feature_init: str, seed: int, final_loss: float, final_accuracy: float, converged: bool, training_time: float, loss_history: List[float] | None = None, accuracy_history: List[float] | None = None) -> None

class XORBenchmark(config: XORConfig)

Bases: object

XOR benchmark runner for Tversky Neural Networks.

__init__(config: XORConfig)

run_single_experiment(intersection_method: str, difference_method: str, normalize: bool, feature_count: int, prototype_init: str, feature_init: str, seed: int, track_history: bool = False) -> XORResult

Run a single XOR training experiment.
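For orientation, the sketch below shows the XOR task itself and one plausible reading of the `converged` flag: a run counts as converged when its final accuracy reaches `convergence_threshold` (default 1.0). That reading is an assumption, since the source does not define the criterion. The helper names `accuracy`, `has_converged`, `or_gate`, and `exact_xor` are hypothetical and not part of verskyt.

```python
from typing import Callable, List, Tuple

# The four XOR input pairs and their target labels.
XOR_INPUTS: List[Tuple[float, float]] = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
XOR_TARGETS: List[int] = [0, 1, 1, 0]


def accuracy(predict: Callable[[Tuple[float, float]], int]) -> float:
    """Fraction of the four XOR points that `predict` labels correctly."""
    correct = sum(predict(x) == y for x, y in zip(XOR_INPUTS, XOR_TARGETS))
    return correct / len(XOR_TARGETS)


def has_converged(final_accuracy: float, threshold: float = 1.0) -> bool:
    # Assumed criterion: accuracy must reach the configured threshold.
    return final_accuracy >= threshold


def or_gate(x: Tuple[float, float]) -> int:
    """A linearly separable rule (logical OR); it misclassifies (1, 1)."""
    return int(x[0] + x[1] >= 1.0)


def exact_xor(x: Tuple[float, float]) -> int:
    """The target function itself, used here as a perfect predictor."""
    return int(x[0] != x[1])


or_accuracy = accuracy(or_gate)
xor_accuracy = accuracy(exact_xor)
```

With a threshold of 1.0, a model that behaves like `or_gate` would not count as converged under this reading, because it gets only three of the four points right.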

run_benchmark(verbose: bool = True, track_history: bool = False) -> List[XORResult]

Run complete benchmark suite.
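One way such a sweep might be driven is to enumerate every combination of settings and pass each one to `run_single_experiment` as keyword arguments. The sketch below only builds those keyword dictionaries. The keys match the parameter names of `run_single_experiment` documented above, but `iter_run_kwargs`, the grid values, and the iteration order are assumptions, not the library's implementation.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def iter_run_kwargs(grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one keyword dict per combination of the swept settings."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Illustrative grid; the method and initializer names are placeholders.
grid = {
    "intersection_method": ["product", "mean"],
    "difference_method": ["ignorematch"],
    "normalize": [False, True],
    "feature_count": [2, 4],
    "prototype_init": ["uniform"],
    "feature_init": ["uniform"],
    "seed": [0, 1],
}

runs = list(iter_run_kwargs(grid))
```

Each element of `runs` could then be unpacked as `benchmark.run_single_experiment(**kwargs)`, where `benchmark` is an `XORBenchmark` instance. `itertools.product` varies the last key fastest, so consecutive runs here differ first by seed.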

analyze_results() -> Dict[str, float]

Analyze benchmark results and compute convergence rates.
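The keys of the returned dictionary are not documented here. As a sketch of what a convergence-rate summary could compute, the example below reports the fraction of converged runs overall and per intersection method. `RunRecord`, `convergence_rates`, and the key format are hypothetical; only the field names `intersection_method` and `converged` come from `XORResult`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class RunRecord:
    """Hypothetical stand-in carrying two of the XORResult fields."""

    intersection_method: str
    converged: bool


def convergence_rates(results: Iterable[RunRecord]) -> Dict[str, float]:
    """Return the fraction of converged runs overall and per method."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for result in results:
        # Each run counts toward the overall bucket and its method's bucket.
        for key in ("overall", f"intersection_method={result.intersection_method}"):
            totals[key] += 1
            hits[key] += int(result.converged)
    return {key: hits[key] / totals[key] for key in totals}


records = [
    RunRecord("product", True),
    RunRecord("product", False),
    RunRecord("mean", True),
    RunRecord("mean", True),
]
rates = convergence_rates(records)
```

Grouping by one field keeps the example short; the same pattern extends to any other swept setting by adding its key to the inner loop.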

run_fast_xor_benchmark(verbose: bool = True) -> Tuple[List[XORResult], Dict[str, float]]

Run fast XOR benchmark for development (96 runs, ~60 seconds).

run_full_xor_replication(verbose: bool = True) -> Tuple[List[XORResult], Dict[str, float]]

Run full paper replication (12,960 runs, ~2.2 hours).

XOR benchmark suite for validating non-linear learning capabilities.