ESMFold is a GPU-accelerated protein structure prediction model that uses an evolutionary-scale protein language model to infer atomic-level 3D coordinates directly from an amino acid sequence, without requiring multiple sequence alignments (MSAs) or templates. It outputs structures in PDB format with confidence metrics (pLDDT, pTM), and inference takes about 14 seconds for a 384-residue protein on a single NVIDIA V100 GPU. It accepts both single-chain proteins and multimeric complexes, which supports high-throughput structural analysis for protein engineering, metagenomics, and computational biology research.

Predict

Predict the 3D structure of one or more protein chains (separated by colons) using ESMFold.

python
from biolmai import BioLM
response = BioLM(
    entity="esmfold",
    action="predict",
    params={},
    items=[
      {
        "sequence": "GAMEDTQVAW"
      },
      {
        "sequence": "MKTIIALSYIFCLVFADYKDDDD:VLLPAGKQ"
      }
    ]
)
print(response)

bash
curl -X POST https://biolm.ai/api/v3/esmfold/predict/ \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "items": [
    {
      "sequence": "GAMEDTQVAW"
    },
    {
      "sequence": "MKTIIALSYIFCLVFADYKDDDD:VLLPAGKQ"
    }
  ]
}'

python
import requests

url = "https://biolm.ai/api/v3/esmfold/predict/"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "items": [
        {"sequence": "GAMEDTQVAW"},
        {"sequence": "MKTIIALSYIFCLVFADYKDDDD:VLLPAGKQ"},
    ]
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())

r
library(httr)

url <- "https://biolm.ai/api/v3/esmfold/predict/"
headers <- c("Authorization" = "Token YOUR_API_KEY", "Content-Type" = "application/json")
body <- list(
  items = list(
    list(
      sequence = "GAMEDTQVAW"
    ),
    list(
      sequence = "MKTIIALSYIFCLVFADYKDDDD:VLLPAGKQ"
    )
  )
)

res <- POST(url, add_headers(.headers = headers), body = body, encode = "json")
print(content(res))

POST /api/v3/esmfold/predict/

Predict endpoint for ESMFold.

Request

  • params (object, optional) — Configuration parameters:

    • batch_size (int, default: 2) — Number of sequences processed per batch, maximum 2

    • max_sequence_len (int, default: 768) — Maximum allowed length of a single protein sequence

    • max_n_multimers (int, default: 4) — Maximum number of chains allowed in a multimer sequence

  • items (array of objects, min: 1, max: 2) — Input sequences:

    • sequence (string, min length: 1, max length: 771, required) — Protein sequence written in the extended amino acid alphabet, with “:” separating the chains of a multimer; up to 3 non-consecutive “:” characters are allowed
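
The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the BioLM SDK: `validate_sequence` is a hypothetical helper, the constants mirror the documented defaults, and rejecting a leading or trailing “:” is an assumption (the schema only rules out consecutive separators). It does not check the amino acid alphabet, since the exact set of accepted letters is not listed here.

```python
MAX_RESIDUES = 768  # documented default of max_sequence_len
MAX_CHAINS = 4      # documented default of max_n_multimers


def validate_sequence(sequence: str) -> list[str]:
    """Split a ':'-separated sequence into chains; raise ValueError if a limit is broken."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    chains = sequence.split(":")
    # An empty chain means a consecutive ':' (or, by assumption, a leading/trailing one).
    if any(chain == "" for chain in chains):
        raise ValueError("':' separators must sit between non-empty chains")
    if len(chains) > MAX_CHAINS:
        raise ValueError(f"at most {MAX_CHAINS} chains are allowed, got {len(chains)}")
    # Separators do not count towards the residue limit.
    n_residues = sum(len(chain) for chain in chains)
    if n_residues > MAX_RESIDUES:
        raise ValueError(f"at most {MAX_RESIDUES} residues are allowed, got {n_residues}")
    return chains
```

Calling `validate_sequence("MKTIIALSYIFCLVFADYKDDDD:VLLPAGKQ")` returns the two chains as a list, which is also a convenient way to count the chains in a multimer input.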

Example request:

http
POST /api/v3/esmfold/predict/ HTTP/1.1
Host: biolm.ai
Authorization: Token YOUR_API_KEY
Content-Type: application/json

{
  "items": [
    {
      "sequence": "GAMEDTQVAW"
    },
    {
      "sequence": "MKTIIALSYIFCLVFADYKDDDD:VLLPAGKQ"
    }
  ]
}

Response

  • results (array of objects) — One result per input item, in the order requested:

    • pdb (string) — Predicted protein structure in standard PDB format

    • mean_plddt (float, range: 0.0 - 100.0) — Mean predicted Local Distance Difference Test (pLDDT) score across the structure; higher values indicate greater prediction confidence

    • ptm (float, range: 0.0 - 1.0) — Predicted Template Modeling (pTM) score evaluating global structural accuracy

Example response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "pdb": "PARENT N/A\nATOM      1  N   GLY A   1       2.704  13.046  14.934  1.00 67.45           N  \nATOM      2  CA  GLY A   1       1.649  12.125  15.325  1.00 67.60           C  \nATOM      3  C   GLY A   1 ... (truncated for documentation)",
      "mean_plddt": 64.428955078125,
      "ptm": 0.012728862464427948
    },
    {
      "pdb": "PARENT N/A\nATOM      1  N   MET A   1       1.328 -16.623 -16.398  1.00 42.52           N  \nATOM      2  CA  MET A   1       2.126 -17.137 -15.289  1.00 44.35           C  \nATOM      3  C   MET A   1 ... (truncated for documentation)",
      "mean_plddt": 47.16063690185547,
      "ptm": 0.10539554059505463
    }
  ]
}
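
The pdb field is plain text, so it can be written to disk or inspected without extra dependencies. The sketch below assumes, as the example response suggests, that the B-factor column of each ATOM record carries the per-atom pLDDT; `residue_plddt` and `save_pdb` are hypothetical helper names, and the parser relies on the standard fixed-width PDB column layout.

```python
def residue_plddt(pdb_text: str) -> dict[tuple[str, int], float]:
    """Map (chain ID, residue number) to the B-factor of that residue's CA atom."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based): atom name 13-16, chain ID 22,
        # residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def save_pdb(pdb_text: str, path: str) -> None:
    """Write the predicted structure to a .pdb file for use in a structure viewer."""
    with open(path, "w") as handle:
        handle.write(pdb_text)
```

With the requests example above, `residue_plddt(response.json()["results"][0]["pdb"])` would give a per-residue confidence profile for the first input, which helps locate low-confidence regions that the single mean_plddt value hides.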

Performance

  • ESMFold runs inference on NVIDIA A10G GPUs, providing GPU-accelerated structure prediction optimized for rapid turnaround.

  • Inference for a single 384-residue protein sequence takes approximately 14.2 seconds on a single NVIDIA V100 GPU; AlphaFold2 takes roughly 6 times longer for the same sequence length.

  • For shorter sequences (< 200 residues), ESMFold achieves over 60x faster inference compared to AlphaFold2, due to the elimination of multiple sequence alignment (MSA) search overhead.

  • ESMFold accuracy, measured by mean LDDT, is lower than that of MSA-based methods on the CASP14 dataset: 0.68 for ESMFold, compared with 0.85 for AlphaFold2 and 0.81 for RoseTTAFold.

  • On the CAMEO dataset, ESMFold delivers mean TM-scores of 0.90 (easy targets), 0.79 (medium targets), and 0.45 (hard targets), whereas AlphaFold2 achieves 0.93 (easy), 0.86 (medium), and 0.62 (hard). The gap is small on easy targets and widens on hard ones.

  • ESMFold leverages BioLM’s optimized deployment architecture, including chunked attention, mixed precision computation, and CPU offloading, enabling efficient inference even on long sequences (e.g., CASP14 target T1044 with 2166 residues) on a single GPU.

  • ESMFold uses a simplified architecture derived from AlphaFold2’s Evoformer block, removing reliance on MSAs and templates, which significantly reduces inference latency without substantial accuracy loss for most applications.

  • Predictive confidence scores (mean pLDDT) provided by ESMFold correlate strongly with actual model accuracy (LDDT), allowing reliable filtering and selection of high-quality predictions.
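
Because mean pLDDT tracks accuracy, a simple filter on the response is often enough to select predictions worth keeping. The sketch below is illustrative only: `rank_confident` is a hypothetical helper, and the default cutoff of 70 is taken from the guidance in the Limitations section rather than from any API setting.

```python
def rank_confident(results, min_plddt=70.0):
    """Return (input index, result) pairs with mean_plddt >= min_plddt, best first."""
    kept = [
        (index, result)
        for index, result in enumerate(results)
        if result["mean_plddt"] >= min_plddt
    ]
    # Sort by confidence, highest first; the index keeps the link to the input item.
    return sorted(kept, key=lambda pair: pair[1]["mean_plddt"], reverse=True)
```

Applied to the example response above, the default cutoff would keep neither prediction, since both have a mean_plddt below 70.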

Applications

  • Rapid structure prediction for protein engineering pipelines, enabling researchers to quickly assess structural viability of designed protein variants without the computational overhead of traditional MSA-based methods; valuable for accelerating iterative design cycles in protein therapeutics or industrial biocatalysts; not optimal for proteins with highly novel folds lacking evolutionary homologs.

  • High-throughput structural screening of protein libraries, allowing biotech companies to efficiently filter and prioritize large numbers of candidate proteins based on predicted structural stability and folding quality (pLDDT scores); useful for identifying promising scaffolds or functional variants for further experimental validation; less suitable for accurately modeling multi-chain protein complexes without additional refinement.

  • Structural annotation of metagenomic protein sequences, providing biotech firms with structural insights into previously uncharacterized proteins from environmental samples; facilitates discovery of novel protein scaffolds or functional domains for industrial biocatalysis or synthetic biology applications; limited accuracy for sequences with extremely low evolutionary similarity to known proteins.

  • Computational ranking of protein design candidates, enabling protein engineering teams to prioritize designs based on predicted structural confidence (pLDDT) and language model perplexity; valuable for reducing experimental costs by focusing efforts on the most structurally viable candidates; not recommended as the sole criterion for functional optimization, as functional activity predictions require additional modeling or experimental validation.

  • Single-chain protein fold prediction for synthetic biology workflows, allowing synthetic biologists to rapidly verify structural feasibility of designed proteins or protein domains before DNA synthesis; valuable for reducing the risk of synthesizing non-folding or unstable proteins; does not account for protein-protein interactions or multi-domain assembly, requiring additional modeling steps for complex constructs.

Limitations

  • Maximum Sequence Length: Sequences submitted to ESMFold must not exceed 768 amino acids (excluding chain separators). Longer sequences must be truncated or split into smaller segments.

  • Batch Size: The API supports a maximum batch_size of 2 sequences per request. Larger batches must be divided into multiple API calls.

  • Multimeric Input: ESMFold can predict structures for up to 4 chains (multimers), separated by colon (“:”) characters. Complexes with more chains must be modeled separately or simplified.

  • Accuracy vs. Speed: While ESMFold is significantly faster than AlphaFold2, it generally provides lower accuracy, especially for challenging targets with limited evolutionary information (low MSA depth). For critical final-stage predictions, AlphaFold2 or other high-accuracy models may be more appropriate.

  • Protein Complexes: Although ESMFold can predict multimeric structures, it was trained exclusively on single chains. Predictions for protein complexes are therefore out-of-distribution and typically less accurate compared to specialized multimer models such as AlphaFold-Multimer.

  • Confidence Estimation (pLDDT): Predicted confidence scores (mean_plddt) correlate well with actual accuracy, but structures with low mean_plddt (<70) should be interpreted cautiously. For highly novel sequences or orphans with no close homologs, predictions may be less reliable.
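
Since each request carries at most 2 items, a larger set of sequences has to be split across several calls. A minimal sketch of that split follows; `batch_payloads` is a hypothetical helper that only builds request bodies in the shape shown in the example request and performs no network calls.

```python
def batch_payloads(sequences, batch_size=2):
    """Yield request bodies that each contain at most batch_size items."""
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        yield {"items": [{"sequence": sequence} for sequence in chunk]}
```

Each yielded payload can be sent with `requests.post(url, headers=headers, json=payload)` as in the Python example above, and the results lists concatenated in order to line up with the original inputs.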

How We Use It

ESMFold enables rapid, high-throughput prediction of protein structures directly from sequence data, significantly accelerating protein design and engineering workflows. By integrating ESMFold within BioLM’s predictive modeling pipelines, we provide researchers with immediate access to accurate structural insights without the need for computationally intensive multiple sequence alignment searches. This capability supports faster decision-making cycles in enzyme optimization, antibody maturation, and targeted protein modifications, seamlessly complementing BioLM’s thermodynamic, biophysical, and structural predictive models.

  • Enhances iterative protein engineering by quickly evaluating structural consequences of sequence modifications.

  • Integrates efficiently with BioLM’s predictive scoring and filtering models to streamline candidate molecule selection for experimental validation.
