Benchmark

Cluster-split evaluation of the currently deployed allosteric-site model on 226 held-out proteins — none with more than 30% sequence identity to the training set. No cherry-picking, no per-protein tuning, no hidden fallbacks. The model version, training data, and this page's source JSON are all linked at the bottom.

Methodology

Task: predict which residues of a protein form an allosteric binding site.
Test set: data/mega/split_test.csv — 226 entries from AlloBench + Allosteric DB, each in an MMseqs2 cluster with <30% sequence identity to every training entry.
Scoring: for each test protein we call the live /predict endpoint, take the per-pocket allosteric probability, and project it onto every residue the pocket contains. Residues outside the top-5 predicted pockets get probability 0. Ground truth per residue comes from the curated allo_residues list in each dataset row. We then pool residue-level predictions and labels across all test proteins and compute ROC-AUC, PR-AUC, precision, recall, F1, and best-F1.
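The pocket-to-residue projection and the pooled metrics can be sketched as below. This is a minimal illustration, not the harness code: the pocket dict shape (`"probability"`, `"residues"` fields) and the helper names are assumptions about what /predict returns.

```python
# Sketch of the residue-level scoring described above (assumed response schema).
import numpy as np
from sklearn.metrics import (average_precision_score, precision_recall_curve,
                             roc_auc_score)

def residue_scores(n_residues, pockets):
    """Project the top-5 pocket probabilities onto member residues; all other
    residues keep probability 0."""
    scores = np.zeros(n_residues)
    top5 = sorted(pockets, key=lambda p: p["probability"], reverse=True)[:5]
    for pocket in top5:
        for i in pocket["residues"]:
            # A residue in two pockets keeps the higher probability.
            scores[i] = max(scores[i], pocket["probability"])
    return scores

def aggregate_metrics(y_true, y_score):
    """Pool residue labels/predictions across proteins, then compute the
    headline metrics; best-F1 is the maximum over the PR-curve thresholds."""
    prec, rec, _ = precision_recall_curve(y_true, y_score)
    f1 = 2 * prec * rec / np.clip(prec + rec, 1e-12, None)
    return {
        "roc_auc": roc_auc_score(y_true, y_score),
        "pr_auc": average_precision_score(y_true, y_score),
        "best_f1": float(np.max(f1)),
    }
```

In the real run, `y_true`/`y_score` are the concatenation of every test protein's per-residue labels and projected scores, so proteins with more residues weigh proportionally more.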
Comparison baselines: PASSer 2.0 and VN-EGNN report on overlapping but not identical test sets. Their cited numbers are shown for scale; the only apples-to-apples number is our own, which you can reproduce by running /opt/allopath/benchmark_harness.py against this same split.

Comparison

| Model | ROC-AUC | PR-AUC | F1 (best) | Test set | Source |
| --- | --- | --- | --- | --- | --- |
| loak allonet GBT v1 | — | — | — | 226 cluster-split | ours |
| PASSer 2.0 (2023) | ~0.80 | — | — | PASSer dataset | cited |
| VN-EGNN (2024) | ~0.82 | — | — | COACH420 + PDBbind | cited |

Per-protein AUC distribution

[Interactive chart: per-protein AUC distribution, rendered from this page's source JSON.]