ESM-IF1 is an inverse folding model that predicts amino acid sequences from backbone atom coordinates. It was trained on 12 million AlphaFold2-predicted structures and combines an autoregressive transformer with invariant geometric layers. On structurally held-out backbones it achieves 51% native sequence recovery overall, rising to 72% for buried residues. GPU-accelerated inference supports protein design, multi-state conformations, and complex binding interfaces in bioengineering and synthetic biology workflows.

Generate

Generate protein sequences from a provided protein backbone using the ESM-IF1 inverse folding model.

python
from biolmai import BioLM
response = BioLM(
    entity="esm-if1",
    action="generate",
    params={
      "chain": "A",
      "num_samples": 1,
      "temperature": 0.6,
      "multichain_backbone": False  # Python boolean literal, not JSON's false
    },
    items=[
      {
        "pdb": "ATOM      1  N   ALA A   1      11.104  13.207  11.947  1.00 20.00           N  \nATOM      2  CA  ALA A   1      12.560  13.051  11.824  1.00 20.00           C  \nATOM      3  C   ALA A   1      13.069  11.615  12.062  1.00 20.00           C  \nATOM      4  O   ALA A   1      12.436  10.671  11.586  1.00 20.00           O  \nATOM      5  CB  ALA A   1      13.255  13.861  10.726  1.00 20.00           C  \nTER       6      ALA A   1                                                      \nEND\n"
      }
    ]
)
print(response)
bash
curl -X POST https://biolm.ai/api/v3/esm-if1/generate/ \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "params": {
    "chain": "A",
    "num_samples": 1,
    "temperature": 0.6,
    "multichain_backbone": false
  },
  "items": [
    {
      "pdb": "ATOM      1  N   ALA A   1      11.104  13.207  11.947  1.00 20.00           N  \nATOM      2  CA  ALA A   1      12.560  13.051  11.824  1.00 20.00           C  \nATOM      3  C   ALA A   1      13.069  11.615  12.062  1.00 20.00           C  \nATOM      4  O   ALA A   1      12.436  10.671  11.586  1.00 20.00           O  \nATOM      5  CB  ALA A   1      13.255  13.861  10.726  1.00 20.00           C  \nTER       6      ALA A   1                                                      \nEND\n"
    }
  ]
}'
python
import requests

url = "https://biolm.ai/api/v3/esm-if1/generate/"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
      "params": {
        "chain": "A",
        "num_samples": 1,
        "temperature": 0.6,
        "multichain_backbone": False  # serialized to JSON false by requests
      },
      "items": [
        {
          "pdb": "ATOM      1  N   ALA A   1      11.104  13.207  11.947  1.00 20.00           N  \nATOM      2  CA  ALA A   1      12.560  13.051  11.824  1.00 20.00           C  \nATOM      3  C   ALA A   1      13.069  11.615  12.062  1.00 20.00           C  \nATOM      4  O   ALA A   1      12.436  10.671  11.586  1.00 20.00           O  \nATOM      5  CB  ALA A   1      13.255  13.861  10.726  1.00 20.00           C  \nTER       6      ALA A   1                                                      \nEND\n"
        }
      ]
    }

response = requests.post(url, headers=headers, json=payload)
print(response.json())
r
library(httr)

url <- "https://biolm.ai/api/v3/esm-if1/generate/"
headers <- c("Authorization" = "Token YOUR_API_KEY", "Content-Type" = "application/json")
body <- list(
  params = list(
    chain = "A",
    num_samples = 1,
    temperature = 0.6,
    multichain_backbone = FALSE
  ),
  items = list(
    list(
      pdb = "ATOM      1  N   ALA A   1      11.104  13.207  11.947  1.00 20.00           N
ATOM      2  CA  ALA A   1      12.560  13.051  11.824  1.00 20.00           C
ATOM      3  C   ALA A   1      13.069  11.615  12.062  1.00 20.00           C
ATOM      4  O   ALA A   1      12.436  10.671  11.586  1.00 20.00           O
ATOM      5  CB  ALA A   1      13.255  13.861  10.726  1.00 20.00           C
TER       6      ALA A   1
END
"
    )
  )
)

res <- POST(url, add_headers(.headers = headers), body = body, encode = "json")
print(content(res))
POST /api/v3/esm-if1/generate/

Generate endpoint for ESM-IF1.

Request

  • params (object, required) — Configuration parameters:

    • chain (string, max length: 1, default: “A”) — Chain identifier from the input PDB structure

    • num_samples (int, range: 1-3, default: 1) — Number of sequences to generate per input structure

    • temperature (float, range: 0.0-8.0, default: 0.6) — Sampling temperature for sequence generation

    • multichain_backbone (bool, default: False) — Indicates if input backbone includes multiple chains

  • items (array of objects, min: 1, max: 1) — Input data items:

    • pdb (string, min length: 1, max length: 100000, required) — Protein backbone structure in PDB format
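The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch only: `build_generate_payload` is a hypothetical helper, not part of the BioLM client, and the server may apply further validation that is not documented here.

```python
from typing import Any, Dict

# Limits copied from the parameter list above; the helper itself is hypothetical.
MAX_PDB_CHARS = 100_000
NUM_SAMPLES_RANGE = (1, 3)
TEMPERATURE_RANGE = (0.0, 8.0)


def build_generate_payload(
    pdb: str,
    chain: str = "A",
    num_samples: int = 1,
    temperature: float = 0.6,
    multichain_backbone: bool = False,
) -> Dict[str, Any]:
    """Return a request body for the generate endpoint, or raise ValueError."""
    if len(chain) > 1:
        raise ValueError("chain must be at most one character")
    if not NUM_SAMPLES_RANGE[0] <= num_samples <= NUM_SAMPLES_RANGE[1]:
        raise ValueError("num_samples must be between 1 and 3")
    if not TEMPERATURE_RANGE[0] <= temperature <= TEMPERATURE_RANGE[1]:
        raise ValueError("temperature must be between 0.0 and 8.0")
    if not 1 <= len(pdb) <= MAX_PDB_CHARS:
        raise ValueError("pdb must contain 1 to 100000 characters")
    return {
        "params": {
            "chain": chain,
            "num_samples": num_samples,
            "temperature": temperature,
            "multichain_backbone": multichain_backbone,
        },
        # The endpoint accepts exactly one item per request.
        "items": [{"pdb": pdb}],
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, which serializes Python `False` to JSON `false`.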

Example request:

http
POST /api/v3/esm-if1/generate/ HTTP/1.1
Host: biolm.ai
Authorization: Token YOUR_API_KEY
Content-Type: application/json

{
  "params": {
    "chain": "A",
    "num_samples": 1,
    "temperature": 0.6,
    "multichain_backbone": false
  },
  "items": [
    {
      "pdb": "ATOM      1  N   ALA A   1      11.104  13.207  11.947  1.00 20.00           N  \nATOM      2  CA  ALA A   1      12.560  13.051  11.824  1.00 20.00           C  \nATOM      3  C   ALA A   1      13.069  11.615  12.062  1.00 20.00           C  \nATOM      4  O   ALA A   1      12.436  10.671  11.586  1.00 20.00           O  \nATOM      5  CB  ALA A   1      13.255  13.861  10.726  1.00 20.00           C  \nTER       6      ALA A   1                                                      \nEND\n"
    }
  ]
}

Response

  • results (array of objects) — One result per input item, in the order requested:

    • (array of objects, length: num_samples) — Generated sequence samples for the input backbone structure:

      • sequence (string) — Generated amino acid sequence (single-letter amino acid codes)

      • recovery (float, range: 0.0 - 1.0) — Fraction of residues matching the native sequence (sequence recovery accuracy)
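As an illustration of the recovery field, the sketch below computes the fraction of matching positions between a designed and a native sequence. This is the usual definition of sequence recovery; how the service itself aligns and compares sequences is not documented here, so treat the function as an assumption rather than the server's implementation.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        return 0.0
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)
```

Under this definition, a one-residue alanine backbone for which the model returns "M" has a recovery of 0.0.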

Example response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    [
      {
        "sequence": "M",
        "recovery": 0.0
      }
    ]
  ]
}
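
The nested shape of results (one list of samples per input item) can be unpacked as sketched below. The helper name `best_samples` is hypothetical, and ranking by recovery is only one possible selection rule.

```python
from typing import Any, Dict, List, Tuple


def best_samples(response: Dict[str, Any]) -> List[Tuple[str, float]]:
    """For each input item, return the (sequence, recovery) pair with the highest recovery."""
    best = []
    for samples in response["results"]:  # one list of samples per input item
        top = max(samples, key=lambda s: s["recovery"])
        best.append((top["sequence"], top["recovery"]))
    return best


# The example response shown above, as a Python dict.
example = {"results": [[{"sequence": "M", "recovery": 0.0}]]}
print(best_samples(example))  # [('M', 0.0)]
```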

Performance

  • ESM-IF1 is designed for inverse folding: predicting protein sequences from given backbone structures.

  • Inference runs in a GPU-accelerated environment on NVIDIA T4 GPUs.

  • Each request processes a single input structure (batch size of 1).

  • The model handles protein backbones of up to 500 residues, which covers a wide range of protein design applications.

  • ESM-IF1 achieves 51% native sequence recovery on structurally held-out backbones, and up to 72% for buried residues. ESMFold predicts structure from sequence, which is the opposite task, so the two models are not directly comparable.

  • AlphaFold2 likewise addresses structure prediction rather than inverse folding. It complements ESM-IF1 rather than competing with it.

  • Training on predicted structures improves the model's ability to generalize across protein design tasks.

  • Perplexity and sequence recovery both improve when the model is trained on a mix of experimental and predicted structures rather than on experimental structures alone.

  • Typical completion times are on the order of seconds per request.

  • The architecture is purpose-built for inverse folding and balances speed and accuracy for targeted protein sequence design.

Applications

  • Designing novel protein sequences given backbone coordinates, enabling rapid exploration of protein variants optimized for stability or function; valuable for biotech companies performing de novo protein engineering or developing protein therapeutics; limitations include lower accuracy in predicting surface-exposed residues due to their inherent flexibility.

  • Optimizing protein-protein interfaces by predicting sequences that enhance binding affinity, useful for engineering biologics with improved therapeutic efficacy; particularly valuable for companies designing protein-based therapeutics or biosensors; accuracy depends on availability of accurate structural information for the binding partners.

  • Predicting mutational effects on protein stability and binding affinity through zero-shot scoring of sequence variants, enabling rapid screening and prioritization of candidate mutations; beneficial for protein engineering teams developing stable protein scaffolds or affinity-enhanced binding domains; performance may degrade for mutations involving large structural rearrangements or insertions.

  • Multi-state protein sequence design by conditioning on multiple backbone conformations, enabling design of proteins that adopt desired conformations under different conditions; useful for developing protein switches or biosensors that respond to environmental changes; less effective for highly flexible or disordered regions lacking clear structural conformations.

  • Infilling partially masked protein backbone structures to generate plausible sequences for missing regions, enabling completion of partial protein structures from experimental data; valuable for companies performing structural biology studies or protein fragment optimization; limited accuracy for longer masked spans (greater than 30 residues) or highly flexible loop regions.

Limitations

  • Batch Size: The maximum number of input items per request is limited to 1.

  • Maximum Input Length: Input PDB structures must not exceed 100,000 characters (max_pdb_str_len); longer structures must be truncated or split.

  • Chain Selection: The algorithm predicts sequences for a single chain specified by the chain parameter (default "A"); multichain predictions require setting multichain_backbone to True but may yield reduced accuracy.

  • Sequence Length: The algorithm is trained and optimized for protein backbones up to 500 amino acids; longer sequences may result in degraded performance and increased inference time.

  • The model is trained on AlphaFold2-predicted structures and may produce suboptimal results for backbones significantly different from those in the training data, such as highly flexible regions, disordered proteins, or structures with novel folds.

  • ESM-IF1 is designed for fixed-backbone inverse folding tasks; it does not perform well for tasks requiring backbone flexibility, side-chain packing optimization, or predicting the effects of large insertions and deletions.
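
The length limits above can be checked before submitting a structure. The sketch below keeps only the backbone atoms of one chain and counts its residues, using the fixed column positions of PDB ATOM records. It assumes the endpoint accepts a backbone-only PDB containing N, CA, C, and O atoms, which this page does not confirm; `backbone_only` is a hypothetical helper.

```python
from typing import List, Tuple

BACKBONE_ATOMS = {"N", "CA", "C", "O"}
MAX_PDB_CHARS = 100_000  # documented character limit for the pdb field
MAX_RESIDUES = 500       # documented backbone length the model is optimized for


def backbone_only(pdb_text: str, chain: str = "A") -> Tuple[str, int]:
    """Return (trimmed PDB text, residue count) for one chain's backbone atoms."""
    kept: List[str] = []
    residues = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[21] != chain:  # chain identifier is column 22
            continue
        if line[12:16].strip() not in BACKBONE_ATOMS:  # atom name, columns 13-16
            continue
        kept.append(line)
        residues.add((line[22:26], line[26]))  # residue number plus insertion code
    kept.append("END")
    return "\n".join(kept) + "\n", len(residues)


sample = (
    "ATOM      1  N   ALA A   1      11.104  13.207  11.947  1.00 20.00           N  \n"
    "ATOM      2  CA  ALA A   1      12.560  13.051  11.824  1.00 20.00           C  \n"
    "ATOM      5  CB  ALA A   1      13.255  13.861  10.726  1.00 20.00           C  \n"
)
trimmed, n_res = backbone_only(sample)
within_limits = len(trimmed) <= MAX_PDB_CHARS and n_res <= MAX_RESIDUES
```

Dropping side-chain atoms reduces the character count of large structures, which may help an input fit under the character limit without truncating residues.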

How We Use It

ESM-IF1 enables BioLM users to perform rapid, structure-guided protein sequence optimization and design by predicting amino acid sequences from backbone coordinates. Combined with other predictive models (e.g., masked language models, embedding-based filtering, and 3D structure-derived metrics), ESM-IF1 supports iterative rounds of protein engineering, antibody maturation, and enzyme optimization. Its standardized, scalable API helps users streamline engineering workflows and turn structure-to-sequence predictions into concrete design decisions.

  • Accelerates iterative protein optimization by integrating predictive and generative modeling pipelines.

  • Facilitates informed selection of candidate sequences for synthesis based on predicted stability, binding affinity, and other biophysical properties.
