About Loak

Loak is an academic research tool that predicts allosteric binding sites on protein structures and suggests candidate small-molecule modulators. It runs in any modern web browser, takes a PDB ID or UniProt accession as input, and returns a ranked list of candidate allosteric pockets with associated probability scores, per-residue dynamics, and generated drug-like molecules.

Allosteric site prediction · Normal mode analysis · Perturbation–response scanning · AutoDock Vina docking · Open benchmarks

What the tool does

Allosteric sites are protein binding pockets that, when occupied by a small molecule, regulate the protein's active site remotely. They are pharmacologically valuable because they enable selective modulation rather than blunt inhibition, and they often differ between closely related proteins in the same family, giving drug designers a selectivity handle that orthosteric (active-site) inhibitors lack.

Loak's core job is to look at a protein structure and say: "Here are the five pockets most likely to be allosteric, ranked by trained-model probability, with a per-residue flexibility and communication-pathway analysis to explain why." For each ranked pocket, Loak can also generate drug-like candidate molecules, dock them into the pocket with AutoDock Vina, and report the resulting binding energy, predicted activator-versus-inhibitor label, affinity estimate, and explicit non-covalent contact list (H-bonds, salt bridges, hydrophobic contacts).

Pipeline

  1. PDB file fetched from RCSB PDB (or uploaded by user).
  2. Structure parsed with BioPython; residue graph built at 8 Å Cα contact cutoff.
  3. Normal-mode analysis via ProDy (ANM + GNM, 30 modes).
  4. Perturbation–response matrix (PRS) computed from the mode spectrum (100 repeats).
  5. Candidate pockets detected by VN-EGNN (default) or fpocket as fallback.
  6. Per-pocket feature vector assembled (32 features: geometric, physicochemical, dynamic, allosteric-specific).
  7. Trained gradient-boosted classifier scores each pocket (probability of being allosteric).
  8. Communication pathways between pockets and the active site computed with Dijkstra on the residue contact graph, weighted by PRS coupling.
  9. Optional molecule generation: real AutoDock Vina docking at the predicted pocket centre, pKd prediction via a second trained regressor, explicit non-covalent contact detection.
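
Steps 2 and 8 of the pipeline can be illustrated with a minimal sketch in plain Python. It is not Loak's code: the function names, the toy coordinates, and the choice of edge cost (the reciprocal of a coupling value, so strongly coupled contacts are cheaper to traverse) are all assumptions made for illustration. The source only states that the graph uses an 8 Å Cα cutoff and that Dijkstra is run with PRS-derived weights.

```python
import heapq
import math


def build_contact_graph(ca_coords, cutoff=8.0):
    """Link residues whose C-alpha atoms lie within `cutoff` angstroms."""
    graph = {i: [] for i in range(len(ca_coords))}
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                graph[i].append(j)
                graph[j].append(i)
    return graph


def shortest_pathway(graph, coupling, source, target):
    """Dijkstra over the contact graph.

    Edge cost is 1 / coupling(i, j): an assumed weighting, chosen so that
    strongly coupled residue pairs are preferred.
    """
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt in graph[node]:
            weight = coupling(node, nxt)
            if weight <= 0:
                continue  # treat non-positive coupling as "no edge"
            new_cost = cost + 1.0 / weight
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]


# Toy C-alpha coordinates (invented): four residues on a line, one off-axis.
coords = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (15, 0, 0), (5, 5, 0)]
graph = build_contact_graph(coords)

# Hypothetical coupling values standing in for PRS matrix entries.
strong_pairs = {frozenset((0, 4)), frozenset((4, 2))}


def coupling(i, j):
    return 4.0 if frozenset((i, j)) in strong_pairs else 1.0


path, cost = shortest_pathway(graph, coupling, source=0, target=3)
print(path, cost)  # [0, 4, 2, 3] 1.5
```

In this toy case the route through residue 4 wins over the direct chain 0-1-2-3 because its two strongly coupled edges cost 0.25 each instead of 1.0.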

Honest accuracy numbers

Loak publishes cluster-split held-out test metrics. These are the numbers we trust for claims about the model's performance on proteins it has never seen:

Held-out AUC: 0.65 (cluster-split, <30% identity)
Retrospective top-3 hit: 4 / 22 (published allosteric drugs)
Linear-probe ceiling: 0.59 (per-residue ESM+PRS)

What this means in plain English: on proteins that resemble the training set, the model performs well. On truly novel proteins with no close homolog in training, it scores about 0.65 AUC and places the allosteric site inside its top 3 predictions in 4 of 22 retrospective cases (about 18%). That is decision-support grade: useful for triage, not for publishing binding claims. See the live benchmark and retrospective validation pages for the raw per-case numbers.
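
For readers who want to check the arithmetic, the sketch below shows how a top-k hit rate and a ROC AUC can be computed from scratch. The helper names are hypothetical, and the per-case ranks and the four-point score list are invented placeholders; only the 4 / 22 total is taken from this page. It is not Loak's evaluation code.

```python
def top_k_hit_rate(hit_ranks, k=3):
    """hit_ranks holds the 1-based rank of the true site per case, or None if missed."""
    hits = sum(1 for rank in hit_ranks if rank is not None and rank <= k)
    return hits, len(hit_ranks), hits / len(hit_ranks)


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Placeholder ranks: 4 cases inside the top 3, 18 outside (invented split).
ranks = [1, 2, 3, 3] + [4] * 10 + [None] * 8
hits, total, rate = top_k_hit_rate(ranks, k=3)
print(f"{hits} / {total} = {rate:.1%}")  # 4 / 22 = 18.2%

# Tiny invented example: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3]))  # 0.75
```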

What Loak is not

Trust layer

Every number displayed on Loak carries a scientific basis (hover over the value in the UI to see it). If a prediction fell back to a heuristic because the trained model was unavailable, the response is tagged using_heuristic_fallback=true and flagged in the UI. Training reports and split CSVs are available on request; nothing is hidden behind a "trust us."
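
A script consuming Loak's output could check the fallback tag before trusting a score. The sketch below assumes the response is JSON with using_heuristic_fallback as a top-level boolean; the response format, the function name, and the other fields in the sample payloads are assumptions, since this page names only the tag itself.

```python
import json


def used_heuristic_fallback(response_text):
    """Return True if the (assumed JSON) response carries the fallback tag."""
    payload = json.loads(response_text)
    # A missing tag is treated as "trained model was used" (an assumption).
    return bool(payload.get("using_heuristic_fallback", False))


# Hypothetical payloads; every field except the tag is invented for illustration.
trained = '{"pdb_id": "XXXX", "using_heuristic_fallback": false}'
fallback = '{"pdb_id": "XXXX", "using_heuristic_fallback": true}'

print(used_heuristic_fallback(trained))   # False
print(used_heuristic_fallback(fallback))  # True
```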

Technology

Loak is a single-page web application. The frontend is vanilla HTML / CSS / JavaScript with the NGL 3D molecular viewer. The backend is FastAPI on Python 3.12. Machine-learning models are built with scikit-learn, LightGBM, and ESM-2 from Meta AI. 3D docking uses AutoDock Vina. Normal-mode analysis runs on ProDy. All computation runs server-side on commodity hardware.

Data sources

Privacy

Loak does not require an account. Queries are logged server-side for rate limiting and abuse prevention (IP + query + timestamp, retained 30 days). Uploaded PDB files are written to a per-job temporary directory and deleted after 24 hours. No tracking cookies, no analytics scripts, no third-party advertising. See the security.txt for responsible-disclosure contact.

Contact

For scientific questions, collaboration, or bug reports, use the responsible-disclosure address in security.txt. For anyone reviewing a SmartScreen or other domain-reputation false positive: the site is a non-commercial academic tool with no downloads, no authentication flows, and no tracking.