Chai-1 is a GPU-accelerated, multi-modal foundation model for biomolecular structure prediction. It predicts protein, nucleic acid, ligand, and multimeric complex structures directly from FASTA or SMILES inputs. It optionally accepts MSAs, structural templates, and experimental constraints (e.g., cross-linking mass spectrometry), which can improve interface accuracy as measured by DockQ. Chai-1 also supports single-sequence inference without MSAs while retaining much of its predictive accuracy, and provides confidence metrics (ipTM, pLDDT) alongside predicted structures, which this API returns in CIF format, for drug discovery, antibody optimization, and protein engineering workflows.
Predict
Predict 3D structures and optional confidence metrics with Chai-1
- POST /api/v3/chai1/predict/
Predict endpoint for Chai-1.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, optional) — Configuration parameters:
num_trunk_recycles (int, range: 1-10, default: 3) — Number of trunk recycles
num_diffusion_timesteps (int, range: 50-200, default: 200) — Number of diffusion timesteps
num_diffn_samples (int, range: 1-5, default: 1) — Number of diffusion samples
use_esm_embeddings (bool, default: True) — Whether to use ESM embeddings
seed (int, default: 42) — Random seed for reproducibility
include (array, default: []) — Output score options (forced to empty list)
items (array of objects, max: 1) — Input items for prediction:
molecules (array of objects, max: 5) — List of molecules:
name (string, required) — Name of the molecule
type (string, allowed: “protein”, “RNA”, “DNA”, “ligand”, “polymer_hybrid”, “water”, “unknown”) — Type of the molecule
sequence (string, optional) — Sequence of the molecule
smiles (string, optional) — SMILES representation of the molecule
alignment (object, optional) — Alignment data for protein molecules:
mgnify (string, optional) — Mgnify alignment data
small_bfd (string, optional) — Small BFD alignment data
uniref90 (string, optional) — UniRef90 alignment data
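The parameter ranges above can be checked on the client before a request is sent. The following is a minimal sketch: `build_params` is a hypothetical helper, not part of any official client, and it leaves out `include` because that field is described as forced to an empty list.

```python
# Allowed ranges and defaults copied from the parameter list above.
PARAM_RANGES = {
    "num_trunk_recycles": (1, 10),
    "num_diffusion_timesteps": (50, 200),
    "num_diffn_samples": (1, 5),
}
PARAM_DEFAULTS = {
    "num_trunk_recycles": 3,
    "num_diffusion_timesteps": 200,
    "num_diffn_samples": 1,
    "use_esm_embeddings": True,
    "seed": 42,
}


def build_params(**overrides):
    """Hypothetical helper: merge overrides into the defaults and range-check them."""
    unknown = set(overrides) - set(PARAM_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**PARAM_DEFAULTS, **overrides}
    for name, (low, high) in PARAM_RANGES.items():
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            raise ValueError(f"{name} must be an integer in [{low}, {high}], got {value!r}")
    return params


params = build_params(num_diffn_samples=5, seed=7)
```

Failing early on the client avoids spending a round trip on a request that would come back as 400 Bad Request.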
Example request:
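The original example body is not reproduced here. As a stand-in, the sketch below builds (but does not send) a request using only the Python standard library. Several things are assumptions: the host in `BASE_URL` is a placeholder because this page gives only the path, the nesting of `molecules` inside `items` is inferred from the field list above, and the molecules themselves are illustrative inputs.

```python
import json
import urllib.request

# Assumption: the host is not stated on this page; substitute the real base URL.
BASE_URL = "https://example.invalid"
ENDPOINT = "/api/v3/chai1/predict/"


def build_request(api_key, molecules, params=None):
    """Build (but do not send) a POST request for the Chai-1 predict endpoint."""
    # The API accepts at most one item per request, so wrap the molecules in a single item.
    payload = {"items": [{"molecules": molecules}]}
    if params:
        payload["params"] = params
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )


# Illustrative inputs: a short made-up peptide and aspirin as the ligand.
molecules = [
    {"name": "A", "type": "protein", "sequence": "MKTAYIAKQRQISFVKSHFSRQ"},
    {"name": "L", "type": "ligand", "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"},
]
request = build_request("YOUR_API_KEY", molecules, {"num_diffn_samples": 1, "seed": 42})

# To send it (requires network access and a valid key):
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```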
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
cif (string) — CIF content as a string
pae (array of arrays of floats, optional) — Predicted aligned error matrix, dimensions: [N, N], where N is the number of residues
plddt (array of floats, optional) — Per-residue confidence scores, size: N, where N is the number of residues in the protein
Example response:
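The original example body is not reproduced here. The sketch below shows one way to read a response with the documented shape. The response dictionary is made up for illustration: the CIF text and the scores are placeholders, not real model output, and the scale of the pLDDT values (0 to 1 or 0 to 100) is not stated on this page. `summarize_result` is a hypothetical helper name.

```python
from statistics import mean


def summarize_result(result):
    """Return the CIF text, mean pLDDT (or None), and whether a PAE matrix is present."""
    plddt = result.get("plddt")  # optional field
    return {
        "cif": result["cif"],
        "mean_plddt": mean(plddt) if plddt else None,
        "has_pae": result.get("pae") is not None,
    }


# Placeholder response with the documented fields; values are not real predictions.
response = {
    "results": [
        {
            "cif": "data_model\n#\n",
            "plddt": [80.0, 90.0, 70.0],
            "pae": [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]],
        }
    ]
}
summary = summarize_result(response["results"][0])
```

Because `pae` and `plddt` are optional, the helper uses `dict.get` for both and only requires `cif`.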
Performance
Chai-1 inference runs on NVIDIA A100 GPUs (80GB VRAM), leveraging GPU acceleration for rapid structural prediction.
Chai-1 matches or outperforms comparable models on protein-ligand and protein-protein interaction prediction tasks, with the largest gains in challenging antibody-antigen docking scenarios:
Protein-ligand prediction (PoseBusters benchmark): Chai-1 achieves a ligand RMSD success rate (RMSD < 2 Å) of 77%, comparable to AlphaFold3 (76%) and substantially higher than RoseTTAFold All-Atom (42%).
Protein-protein interface prediction: Chai-1 achieves a DockQ success rate (DockQ > 0.23) of 75.1%, surpassing AlphaFold-Multimer 2.3 (67.7%). In antibody-protein interface prediction specifically, Chai-1 achieves 52.9% DockQ success, significantly outperforming AlphaFold-Multimer 2.3 (38.0%).
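Both figures above are threshold-based success rates: the fraction of targets whose score passes a cutoff (ligand RMSD < 2 Å, DockQ > 0.23). A minimal sketch of that calculation follows. The per-target values are hypothetical and chosen only to make the arithmetic easy to follow; they are not benchmark data.

```python
def success_rate(scores, threshold, higher_is_better=True):
    """Fraction of targets whose score passes the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    if higher_is_better:
        passed = sum(score > threshold for score in scores)
    else:
        passed = sum(score < threshold for score in scores)
    return passed / len(scores)


# Hypothetical per-target values, for illustration only.
dockq_scores = [0.10, 0.25, 0.60, 0.80]
ligand_rmsds = [0.9, 1.5, 2.4, 3.1]

dockq_success = success_rate(dockq_scores, 0.23)  # DockQ > 0.23
rmsd_success = success_rate(ligand_rmsds, 2.0, higher_is_better=False)  # RMSD < 2 Å
```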
Chai-1 uses protein language model (ESM) embeddings, which give it strong performance even in single-sequence mode (without MSAs). The results below compare single-sequence Chai-1 with AlphaFold-Multimer 2.3:
Single-sequence mode protein-protein DockQ success rate: 69.8% (Chai-1) vs. 67.7% (AlphaFold-Multimer 2.3).
Single-sequence antibody-protein DockQ success rate: 47.9% (Chai-1) vs. 38.0% (AlphaFold-Multimer 2.3).
Chai-1 provides accurate confidence scores (ipTM, pLDDT, PAE) that correlate strongly with experimentally determined structure quality, enabling reliable ranking and filtering of predicted structures without additional computational overhead.
BioLM’s optimized inference pipeline uses distributed genetic search (Jackhmmer) across UniRef90, UniProt, and MGnify databases, providing faster MSA generation compared to traditional approaches (e.g., original AlphaFold2 pipeline), without compromising prediction accuracy.
Chai-1 achieves state-of-the-art accuracy on CASP15 protein monomer prediction tasks, surpassing AlphaFold-Multimer 2.3 (average LDDT: 0.849 vs. 0.843) and significantly outperforming ESM3 (LDDT: 0.801).
Chai-1 predictions are chemically valid and exhibit lower inter-molecular clash rates compared to AlphaFold3, eliminating the need for additional clash penalties during prediction ranking.
Applications
Predicting protein-small molecule binding structures to accelerate drug discovery by accurately modeling ligand interactions directly from protein sequences and chemical structures; useful for pharmaceutical companies prioritizing lead compounds, though accuracy may decrease for highly flexible ligands or very large protein complexes.
Antibody-antigen complex modeling to guide antibody optimization and maturation efforts; valuable for biotech companies developing therapeutic antibodies by enabling accurate prediction of binding interfaces, although achieving high-resolution predictions may require additional experimental constraints such as epitope mapping data.
Multimeric protein complex structure prediction to inform protein engineering and synthetic biology workflows; beneficial for designing novel protein assemblies or optimizing existing complexes, particularly when experimental structural data is limited, though relative orientations of chains may occasionally be predicted inaccurately without experimental constraints.
Single-sequence protein structure prediction for rapid exploration of protein design space in immunological applications; enables efficient computational screening of highly variable protein sequences without requiring multiple sequence alignments (MSAs), ideal for antibody design campaigns, but performance may be lower compared to predictions leveraging full evolutionary information.
Nucleic acid-protein interaction modeling to support gene-editing and RNA therapeutic development; enables accurate structural predictions of protein-DNA and protein-RNA complexes from sequence alone, facilitating design of CRISPR/Cas systems or RNA-targeting therapeutics, though accuracy may be improved further by incorporating nucleic acid-specific evolutionary information or experimental constraints.
Limitations
Maximum Sequence Length: Protein sequences are limited to 1024 residues, RNA/DNA sequences to 3072 bases, and ligand sequences to 128 residues. Input sequences exceeding these limits will be rejected by the API.
Batch Size: The API supports a fixed batch_size of 1 per request. For multiple predictions, submit separate API requests sequentially.
Chai-1 predictions can be sensitive to modified residues. Substituting modified residues with standard amino acids or removing them entirely may significantly alter predicted structures.
Chai-1 may correctly predict individual chains within a complex but can sometimes fail to accurately position them relative to each other, especially without experimental constraints (e.g., epitope mapping or cross-linking data).
While Chai-1 performs well in single-sequence mode without MSAs (multiple sequence alignments), its accuracy on protein monomers is notably lower than with full MSA inputs. For monomer folding tasks where no MSAs are available, alternative models such as ESMFold may be more suitable.
High-quality antibody-antigen complex prediction remains challenging. Without experimentally derived constraints (e.g., epitope residues or distance restraints), Chai-1 frequently produces lower-quality predictions for antibody-antigen interfaces.
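The length and batch-size limits above can be enforced on the client before anything is submitted. The sketch below is one way to do that, with hypothetical helper names. It checks only the `sequence` field of protein, RNA, and DNA molecules. It does not check ligands, because this page does not say how the 128 limit applies to SMILES input.

```python
MAX_MOLECULES = 5  # per item, from the request schema
# Sequence length limits from the list above, keyed by molecule type.
MAX_LENGTH = {"protein": 1024, "RNA": 3072, "DNA": 3072}


def check_item(molecules):
    """Return a list of problems; an empty list means the item passes these checks."""
    problems = []
    if len(molecules) > MAX_MOLECULES:
        problems.append(f"{len(molecules)} molecules exceeds the maximum of {MAX_MOLECULES}")
    for molecule in molecules:
        limit = MAX_LENGTH.get(molecule.get("type"))
        sequence = molecule.get("sequence") or ""
        if limit is not None and len(sequence) > limit:
            problems.append(
                f"{molecule.get('name')}: length {len(sequence)} exceeds the limit of {limit}"
            )
    return problems


def split_into_requests(items):
    """Batch size is fixed at 1, so emit one request payload per item."""
    return [{"items": [item]} for item in items]


item_ok = [{"name": "A", "type": "protein", "sequence": "M" * 1024}]
item_too_long = [{"name": "B", "type": "protein", "sequence": "M" * 1025}]
payloads = split_into_requests([item_ok, item_ok])
```

Each payload from `split_into_requests` would then be sent as its own request, one after another.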
How We Use It
BioLM leverages Chai-1 for accurate and rapid prediction of biomolecular structures, enabling streamlined protein engineering and antibody optimization workflows. By integrating Chai-1 via standardized APIs into BioLM’s predictive modeling pipelines, we rapidly evaluate protein-ligand and antibody-antigen interactions, guiding experimental prioritization and reducing laboratory cycles. Chai-1’s ability to incorporate experimental restraints complements BioLM’s use of sequence embeddings and thermodynamic metrics, facilitating multi-round protein optimization and targeted engineering efforts.
Integrates seamlessly with BioLM’s predictive and generative modeling pipelines to accelerate protein engineering cycles.
Enables efficient use of experimental data, improving antibody-antigen binding predictions and guiding experimental validation.
References
Chai Discovery team (2024). Chai-1: Decoding the molecular interactions of life. Chai Discovery.
