Chai-1 is a GPU-accelerated, multi-modal foundation model for biomolecular structure prediction. It predicts the structures of proteins, nucleic acids, ligands, and multimeric complexes directly from sequence (FASTA) or SMILES inputs. It optionally accepts MSAs, structural templates, and experimental constraints (e.g., cross-linking mass spectrometry), which can substantially improve DockQ scores on complexes. Chai-1 also supports single-sequence inference without MSAs while retaining much of its predictive accuracy, and it reports confidence metrics (ipTM, pLDDT) alongside CIF-format structure outputs for drug discovery, antibody optimization, and protein engineering workflows.
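The predict endpoint takes a JSON list of molecules rather than a raw FASTA file, so FASTA input has to be converted first. The sketch below shows one way to do that. `fasta_to_molecules` is a hypothetical helper. It assumes that every record shares one molecule type and that the first word of each header is a suitable molecule name.

```python
def fasta_to_molecules(fasta_text, mol_type="protein"):
    """Convert FASTA text into Chai-1 molecule dicts (hypothetical helper).

    The first word of each '>' header becomes the molecule name; every record
    is assigned the same molecule type.
    """
    molecules = []
    name, seq_parts = None, []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:  # close the previous record
                molecules.append({"name": name, "type": mol_type, "sequence": "".join(seq_parts)})
            header = line[1:].split()
            # Fall back to a positional name when the header is empty
            name = header[0] if header else f"chain_{len(molecules) + 1}"
            seq_parts = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            seq_parts.append(line.upper())  # FASTA sequences may wrap across lines
    if name is not None:  # close the final record
        molecules.append({"name": name, "type": mol_type, "sequence": "".join(seq_parts)})
    return molecules
```

For example, `{"molecules": fasta_to_molecules(">TestProtein\nMAAASN\nDENERK\n")}` produces the same item used in the requests below.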

Predict

Predict 3D structures and optional confidence metrics with Chai-1

python
from biolmai import BioLM
response = BioLM(
    entity="chai1",
    action="predict",
    params={
      "num_trunk_recycles": 4,
      "num_diffusion_timesteps": 180,
      "num_diffn_samples": 1,
      "use_esm_embeddings": True,
      "seed": 42,
      "include": []
    },
    items=[
      {
        "molecules": [
          {
            "name": "TestProtein",
            "type": "protein",
            "sequence": "MAAASNDENERK"
          }
        ]
      }
    ]
)
print(response)
bash
curl -X POST https://biolm.ai/api/v3/chai1/predict/ \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "params": {
    "num_trunk_recycles": 4,
    "num_diffusion_timesteps": 180,
    "num_diffn_samples": 1,
    "use_esm_embeddings": true,
    "seed": 42,
    "include": []
  },
  "items": [
    {
      "molecules": [
        {
          "name": "TestProtein",
          "type": "protein",
          "sequence": "MAAASNDENERK"
        }
      ]
    }
  ]
}'
python
import requests

url = "https://biolm.ai/api/v3/chai1/predict/"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
      "params": {
        "num_trunk_recycles": 4,
        "num_diffusion_timesteps": 180,
        "num_diffn_samples": 1,
        "use_esm_embeddings": True,
        "seed": 42,
        "include": []
      },
      "items": [
        {
          "molecules": [
            {
              "name": "TestProtein",
              "type": "protein",
              "sequence": "MAAASNDENERK"
            }
          ]
        }
      ]
    }

response = requests.post(url, headers=headers, json=payload)
print(response.json())
r
library(httr)

url <- "https://biolm.ai/api/v3/chai1/predict/"
headers <- c("Authorization" = "Token YOUR_API_KEY", "Content-Type" = "application/json")
body <- list(
  params = list(
    num_trunk_recycles = 4,
    num_diffusion_timesteps = 180,
    num_diffn_samples = 1,
    use_esm_embeddings = TRUE,
    seed = 42,
    include = list()
  ),
  items = list(
    list(
      molecules = list(
        list(
          name = "TestProtein",
          type = "protein",
          sequence = "MAAASNDENERK"
        )
      )
    )
  )
)

res <- POST(url, add_headers(.headers = headers), body = body, encode = "json")
print(content(res))
POST /api/v3/chai1/predict/

Predict endpoint for Chai-1.

Request Headers:

  • Authorization: Token YOUR_API_KEY

  • Content-Type: application/json

Request

  • params (object, optional) — Configuration parameters:

    • num_trunk_recycles (int, range: 1-10, default: 3) — Number of trunk recycles

    • num_diffusion_timesteps (int, range: 50-200, default: 200) — Number of diffusion timesteps

    • num_diffn_samples (int, range: 1-5, default: 1) — Number of diffusion samples

    • use_esm_embeddings (bool, default: True) — Whether to use ESM embeddings

    • seed (int, default: 42) — Random seed for reproducibility

    • include (array, default: []) — Output score options; any value supplied is currently overridden with an empty list

  • items (array of objects, max: 1) — Input items for prediction:

    • molecules (array of objects, max: 5) — List of molecules:

      • name (string, required) — Name of the molecule

      • type (string, allowed: “protein”, “RNA”, “DNA”, “ligand”, “polymer_hybrid”, “water”, “unknown”) — Type of the molecule

      • sequence (string, optional) — Sequence of the molecule

      • smiles (string, optional) — SMILES representation of the molecule

      • alignment (object, optional) — Alignment data for protein molecules:

        • mgnify (string, optional) — MGnify alignment data

        • small_bfd (string, optional) — Small BFD alignment data

        • uniref90 (string, optional) — UniRef90 alignment data
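The parameter ranges and allowed molecule types listed above can be checked on the client side before a request is sent. This is a minimal sketch that assumes the list above is complete and current. `validate_request` is a hypothetical helper, not part of the BioLM SDK. The minimum of one item and one molecule is an assumption, because the list states only maximums.

```python
# Constraints transcribed from the parameter list above. The lower bounds of 1 on
# items and molecules are assumptions; the list only states maximums.
PARAM_RANGES = {
    "num_trunk_recycles": (1, 10),
    "num_diffusion_timesteps": (50, 200),
    "num_diffn_samples": (1, 5),
}
ALLOWED_TYPES = {"protein", "RNA", "DNA", "ligand", "polymer_hybrid", "water", "unknown"}
MAX_ITEMS = 1
MAX_MOLECULES = 5


def validate_request(payload):
    """Return a list of problems in a Chai-1 predict payload (empty if none found)."""
    errors = []
    params = payload.get("params", {})
    for key, (lo, hi) in PARAM_RANGES.items():
        if key not in params:
            continue  # omitted parameters fall back to the documented defaults
        value = params[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not lo <= value <= hi:
            errors.append(f"params.{key}={value!r} must be an int in [{lo}, {hi}]")

    items = payload.get("items", [])
    if not 1 <= len(items) <= MAX_ITEMS:
        errors.append(f"items must hold 1 to {MAX_ITEMS} entries, got {len(items)}")
    for i, item in enumerate(items):
        molecules = item.get("molecules", [])
        if not 1 <= len(molecules) <= MAX_MOLECULES:
            errors.append(f"items[{i}].molecules must hold 1 to {MAX_MOLECULES} entries")
        for j, mol in enumerate(molecules):
            if not mol.get("name"):
                errors.append(f"items[{i}].molecules[{j}] is missing a name")
            if mol.get("type") not in ALLOWED_TYPES:
                errors.append(f"items[{i}].molecules[{j}].type={mol.get('type')!r} is not allowed")
    return errors
```

The function returns an empty list for the example payload used on this page. Server-side validation is still authoritative.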

Example request:

http
POST /api/v3/chai1/predict/ HTTP/1.1
Host: biolm.ai
Authorization: Token YOUR_API_KEY
Content-Type: application/json

{
  "params": {
    "num_trunk_recycles": 4,
    "num_diffusion_timesteps": 180,
    "num_diffn_samples": 1,
    "use_esm_embeddings": true,
    "seed": 42,
    "include": []
  },
  "items": [
    {
      "molecules": [
        {
          "name": "TestProtein",
          "type": "protein",
          "sequence": "MAAASNDENERK"
        }
      ]
    }
  ]
}
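The example request contains a single protein. Based on the schema, a protein-ligand complex is expressed by adding a second molecule with type "ligand" that carries a `smiles` string instead of a `sequence`. This sketch builds such a payload. It assumes the endpoint accepts protein and ligand entries in the same `molecules` list, as the schema suggests. `build_complex_item` is hypothetical, and the ligand (aspirin) is purely illustrative. `include` is omitted because it is overridden server-side.

```python
import json


def build_complex_item(protein_name, protein_seq, ligand_name, ligand_smiles):
    """Build one request item pairing a protein chain with a small-molecule ligand."""
    return {
        "molecules": [
            {"name": protein_name, "type": "protein", "sequence": protein_seq},
            {"name": ligand_name, "type": "ligand", "smiles": ligand_smiles},
        ]
    }


payload = {
    "params": {
        "num_trunk_recycles": 3,
        "num_diffusion_timesteps": 200,
        "num_diffn_samples": 1,
        "use_esm_embeddings": True,  # Python True serializes to JSON true
        "seed": 42,
    },
    "items": [
        build_complex_item("TestProtein", "MAAASNDENERK", "Aspirin", "CC(=O)OC1=CC=CC=C1C(=O)O")
    ],
}

body = json.dumps(payload, indent=2)  # request body for POST /api/v3/chai1/predict/
print(body)
```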
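Status Codes:

  • 200 OK: prediction succeeded; the response body has the structure shown in the example response below.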

Response

  • results (array) — One entry per input item, in the order requested. In the example response below, each entry is a list of result objects with these fields:

    • cif (string) — CIF content as a string

    • pae (array of arrays of floats, optional) — Predicted aligned error matrix, dimensions: [N, N], where N is the number of residues

    • plddt (array of floats, optional) — Per-residue confidence scores, size: N, where N is the number of residues in the protein
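pLDDT is a per-residue array and PAE is an N×N matrix. For ranking or filtering, a common first step is to reduce them to a few summary numbers. This minimal sketch assumes the fields appear as described in the schema above. `summarize_confidence` is a hypothetical helper. The score scale (0 to 1 or 0 to 100) is not stated here, so the sketch reports raw statistics and does not apply a cutoff.

```python
from statistics import mean


def summarize_confidence(result):
    """Summarize the optional pLDDT and PAE arrays of one Chai-1 result object."""
    summary = {}
    plddt = result.get("plddt")
    if plddt:  # optional field; may be absent or empty
        summary["mean_plddt"] = mean(plddt)
        summary["min_plddt"] = min(plddt)  # weakest residue, useful for spotting disordered regions
    pae = result.get("pae")
    if pae:  # N x N matrix; average over every entry
        summary["mean_pae"] = mean(value for row in pae for value in row)
    return summary
```

If a result contains only `cif`, the function returns an empty dict, so callers can tell that no scores were returned.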

Example response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    [
      {
        "cif": "data_model\n_entry.id model\n_struct.entry_id model\n_struct.pdbx_model_details .\n_struct.pdbx_structure_determination_methodology computational\n_struct.title 'Chai-1 predicted structure'\n_audit_conform.... (truncated for documentation)"
      }
    ]
  ]
}
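The example response nests each item's result objects in an inner list, while the schema describes a flat array of objects. This sketch accepts both shapes and writes each CIF string to its own file. `iter_structures`, `write_cifs`, the output directory, and the file-naming scheme are hypothetical choices, not part of the API.

```python
from pathlib import Path


def iter_structures(response_json):
    """Yield (item_index, entry_index, result_dict) from a predict response.

    Handles both the nested shape in the example response and a flat array.
    """
    for i, entry in enumerate(response_json.get("results", [])):
        entries = entry if isinstance(entry, list) else [entry]
        for j, result in enumerate(entries):
            yield i, j, result


def write_cifs(response_json, out_dir="chai1_out"):
    """Write every returned CIF string to out_dir and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, j, result in iter_structures(response_json):
        path = out / f"item{i}_entry{j}.cif"
        path.write_text(result["cif"])
        paths.append(path)
    return paths
```

With the `requests` example above, `write_cifs(response.json())` saves the predicted structure as `chai1_out/item0_entry0.cif`, which standard molecular viewers can open.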

Performance

  • Chai-1 inference runs on NVIDIA A100 GPUs (80 GB VRAM), leveraging GPU acceleration for rapid structural prediction.

  • Chai-1 matches or exceeds leading structure prediction methods on protein-ligand and protein-protein interface prediction, including challenging antibody-antigen docking:

    • Protein-ligand prediction (PoseBusters benchmark): Chai-1 achieves a ligand RMSD success rate (RMSD < 2 Å) of 77%, comparable to AlphaFold3 (76%) and substantially higher than RoseTTAFold All-Atom (42%).

    • Protein-protein interface prediction: Chai-1 achieves a DockQ success rate (DockQ > 0.23) of 75.1%, surpassing AlphaFold-Multimer 2.3 (67.7%). In antibody-protein interface prediction specifically, Chai-1 achieves 52.9% DockQ success, significantly outperforming AlphaFold-Multimer 2.3 (38.0%).

  • Chai-1 uses protein language model embeddings (ESM embeddings), which give it strong performance even in single-sequence mode (without MSAs). In this mode, its interface success rates still exceed those of the AlphaFold-Multimer 2.3 baseline above:

    • Single-sequence mode protein-protein DockQ success rate: 69.8% (Chai-1) vs. 67.7% (AlphaFold-Multimer 2.3).

    • Single-sequence antibody-protein DockQ success rate: 47.9% (Chai-1) vs. 38.0% (AlphaFold-Multimer 2.3).

  • Chai-1 provides accurate confidence scores (ipTM, pLDDT, PAE) that correlate strongly with experimentally determined structure quality, enabling reliable ranking and filtering of predicted structures without additional computational overhead.

  • BioLM’s optimized inference pipeline uses distributed genetic search (Jackhmmer) across UniRef90, UniProt, and MGnify databases, providing faster MSA generation compared to traditional approaches (e.g., original AlphaFold2 pipeline), without compromising prediction accuracy.

  • On CASP15 protein monomer prediction, Chai-1 slightly surpasses AlphaFold-Multimer 2.3 (average LDDT: 0.849 vs. 0.843) and clearly outperforms ESM3 (LDDT: 0.801).

  • Chai-1 predictions are chemically valid and exhibit lower inter-molecular clash rates compared to AlphaFold3, eliminating the need for additional clash penalties during prediction ranking.
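Each success rate above is the fraction of benchmark targets whose metric passes a fixed threshold: DockQ > 0.23 for interfaces and ligand RMSD < 2 Å for poses. This sketch makes that arithmetic explicit. The input values are hypothetical and are not benchmark data. Both comparisons are strict, so a target that lands exactly on the threshold counts as a failure.

```python
def success_rate(values, threshold, higher_is_better):
    """Fraction of targets whose metric passes a strict threshold."""
    if not values:
        raise ValueError("need at least one value")
    if higher_is_better:
        hits = sum(1 for v in values if v > threshold)  # e.g. DockQ > 0.23
    else:
        hits = sum(1 for v in values if v < threshold)  # e.g. ligand RMSD < 2 Å
    return hits / len(values)


# Hypothetical per-target metrics, NOT benchmark data.
dockq_scores = [0.80, 0.15, 0.49, 0.23]
ligand_rmsds_angstrom = [1.2, 3.5, 0.8, 2.0]

print(success_rate(dockq_scores, 0.23, higher_is_better=True))           # 0.5
print(success_rate(ligand_rmsds_angstrom, 2.0, higher_is_better=False))  # 0.5
```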

Applications

  • Predicting protein-small molecule binding structures to accelerate drug discovery by accurately modeling ligand interactions directly from protein sequences and chemical structures; useful for pharmaceutical companies prioritizing lead compounds, though accuracy may decrease for highly flexible ligands or very large protein complexes.

  • Antibody-antigen complex modeling to guide antibody optimization and maturation efforts; valuable for biotech companies developing therapeutic antibodies by enabling accurate prediction of binding interfaces, although achieving high-resolution predictions may require additional experimental constraints such as epitope mapping data.

  • Multimeric protein complex structure prediction to inform protein engineering and synthetic biology workflows; beneficial for designing novel protein assemblies or optimizing existing complexes, particularly when experimental structural data is limited, though relative orientations of chains may occasionally be predicted inaccurately without experimental constraints.

  • Single-sequence protein structure prediction for rapid exploration of protein design space in immunological applications; enables efficient computational screening of highly variable protein sequences without requiring multiple sequence alignments (MSAs), ideal for antibody design campaigns, but performance may be lower compared to predictions leveraging full evolutionary information.

  • Nucleic acid-protein interaction modeling to support gene-editing and RNA therapeutic development; enables accurate structural predictions of protein-DNA and protein-RNA complexes from sequence alone, facilitating design of CRISPR/Cas systems or RNA-targeting therapeutics, though accuracy may be improved further by incorporating nucleic acid-specific evolutionary information or experimental constraints.

Limitations

  • Maximum Sequence Length: Protein sequences are limited to 1024 residues, RNA/DNA sequences to 3072 bases, and ligand sequences to 128 residues. Input sequences exceeding these limits will be rejected by the API.

  • Batch Size: The API supports a fixed batch_size of 1 per request. For multiple predictions, submit separate API requests sequentially.

  • Chai-1 predictions can be sensitive to modified residues. Substituting modified residues with standard amino acids or removing them entirely may significantly alter predicted structures.

  • Chai-1 may correctly predict individual chains within a complex but can sometimes fail to accurately position them relative to each other, especially without experimental constraints (e.g., epitope mapping or cross-linking data).

  • While Chai-1 performs well in single-sequence mode without MSAs (Multiple Sequence Alignments), accuracy for protein monomer prediction is notably lower compared to predictions using full MSA inputs. For monomer folding tasks without available MSAs, alternative models like ESMFold may be more suitable.

  • High-quality antibody-antigen complex prediction remains challenging. Without experimentally derived constraints (e.g., epitope residues or distance restraints), Chai-1 frequently produces lower-quality predictions for antibody-antigen interfaces.
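The length limits and the fixed batch size of 1 can be enforced on the client side: check every input first, then send one request per item. This sketch is written under those assumptions. `length_problems` and `predict_sequentially` are hypothetical helpers. `send` is any callable that posts a payload and returns the parsed response. The ligand limit is not checked here because its unit for SMILES input is not spelled out.

```python
# Per-type length limits from the Limitations list above. The ligand limit (128)
# is not checked because its unit for SMILES input is not spelled out.
MAX_LENGTHS = {"protein": 1024, "RNA": 3072, "DNA": 3072}


def length_problems(item):
    """List molecules in one request item that exceed the documented length limits."""
    problems = []
    for mol in item.get("molecules", []):
        limit = MAX_LENGTHS.get(mol.get("type"))
        length = len(mol.get("sequence", ""))
        if limit is not None and length > limit:
            problems.append(f"{mol.get('name')}: {length} > {limit}")
    return problems


def predict_sequentially(items, params, send):
    """Send one request per item, since the API accepts a batch size of 1."""
    # Check every item up front so a bad input does not stop the run halfway.
    for index, item in enumerate(items):
        problems = length_problems(item)
        if problems:
            raise ValueError(f"item {index}: " + "; ".join(problems))
    # One payload per item, submitted in order.
    return [send({"params": params, "items": [item]}) for item in items]
```

With `url` and `headers` defined as in the `requests` example above, `send` could be `lambda payload: requests.post(url, headers=headers, json=payload).json()`.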

How We Use It

BioLM leverages Chai-1 for accurate and rapid prediction of biomolecular structures, enabling streamlined protein engineering and antibody optimization workflows. By integrating Chai-1 via standardized APIs into BioLM’s predictive modeling pipelines, we rapidly evaluate protein-ligand and antibody-antigen interactions, guiding experimental prioritization and reducing laboratory cycles. Chai-1’s ability to incorporate experimental restraints complements BioLM’s use of sequence embeddings and thermodynamic metrics, facilitating multi-round protein optimization and targeted engineering efforts.

  • Integrates seamlessly with BioLM’s predictive and generative modeling pipelines to accelerate protein engineering cycles.

  • Enables efficient use of experimental data, improving antibody-antigen binding predictions and guiding experimental validation.

References