ESM-2 35M is a GPU-accelerated protein language model trained on evolutionary-scale sequence data, enabling the scoring and exploration of protein sequences beyond known natural families. It supports masked amino acid prediction, mean and per-token embedding generation, and sequence likelihood evaluation through pseudo-perplexity metrics. The API provides efficient inference for high-throughput protein engineering, sequence design, and exploration of sequence spaces not observed in nature.

Predict

Predict the masked amino acid(s) in the provided protein sequence(s) using ESM2-35M.

python
from biolmai import BioLM
response = BioLM(
    entity="esm2-35m",
    action="predict",
    params={},
    items=[
      {
        "sequence": "MVLS<mask>GEWQ"
      }
    ]
)
print(response)
bash
curl -X POST https://biolm.ai/api/v3/esm2-35m/predict/ \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "items": [
    {
      "sequence": "MVLS<mask>GEWQ"
    }
  ]
}'
python
import requests

url = "https://biolm.ai/api/v3/esm2-35m/predict/"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "items": [
        {
            "sequence": "MVLS<mask>GEWQ"
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
r
library(httr)

url <- "https://biolm.ai/api/v3/esm2-35m/predict/"
headers <- c("Authorization" = "Token YOUR_API_KEY", "Content-Type" = "application/json")
body <- list(
  items = list(
    list(
      sequence = "MVLS<mask>GEWQ"
    )
  )
)

res <- POST(url, add_headers(.headers = headers), body = body, encode = "json")
print(content(res))
POST /api/v3/esm2-35m/predict/

Predict endpoint for ESM-2 35M.

Request Headers:

  • Authorization — Token YOUR_API_KEY

  • Content-Type — application/json

Request

  • params (object, optional) — Configuration parameters:

    • include (array of strings, default: ["mean"]) — Output types to include in the response; allowed values: "mean", "per_token", "logits"

  • items (array of objects, min: 1, max: 5) — Input sequences for prediction:

    • sequence (string, min length: 1, max length: 2048, required) — Protein sequence using standard unambiguous amino acid codes

Example request:

http
POST /api/v3/esm2-35m/predict/ HTTP/1.1
Host: biolm.ai
Authorization: Token YOUR_API_KEY
Content-Type: application/json

{
  "items": [
    {
      "sequence": "MVLS<mask>GEWQ"
    }
  ]
}
Status Codes:

  • 200 OK — Request processed successfully

Response

  • results (array of objects) — One result per input item, in the order requested:

    • logits (array of arrays of floats) — Raw model logits, one array per sequence position, spanning the token vocabulary

    • sequence_tokens (array of strings) — Tokens of the input sequence, one per position

    • vocab_tokens (array of strings) — Vocabulary tokens in the order corresponding to each logits array

Example response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "logits": [
        [
          0.8847527503967285,
          -0.13512253761291504,
          "... (truncated for documentation)"
        ],
        [
          0.3143197000026703,
          0.1602669060230255,
          "... (truncated for documentation)"
        ],
        "... (truncated for documentation)"
      ],
      "sequence_tokens": [
        "M",
        "V",
        "... (truncated for documentation)"
      ],
      "vocab_tokens": [
        "L",
        "A",
        "... (truncated for documentation)"
      ]
    }
  ]
}
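The logits in the predict response can be converted to residue probabilities client-side. The sketch below is illustrative, assuming logits are per-position arrays aligned with vocab_tokens as the truncated example suggests; the helper names are not part of the API. It ranks candidate residues for one position and computes a pseudo-perplexity score from per-position logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_residues(logits, vocab_tokens, k=3):
    """Rank vocabulary tokens by probability for a single position."""
    probs = softmax(logits)
    return sorted(zip(vocab_tokens, probs), key=lambda t: t[1], reverse=True)[:k]

def pseudo_perplexity(position_logits, sequence, vocab_tokens):
    """exp(mean negative log-probability) of the observed residues."""
    nll = 0.0
    for logits, residue in zip(position_logits, sequence):
        probs = softmax(logits)
        nll -= math.log(probs[vocab_tokens.index(residue)])
    return math.exp(nll / len(sequence))

# Toy 4-token vocabulary for illustration only
vocab = ["L", "A", "G", "V"]
print(top_residues([2.0, 0.5, -1.0, 0.1], vocab, k=2))
# Uniform logits over a 4-token vocabulary give a pseudo-perplexity of ~4
print(pseudo_perplexity([[0.0] * 4, [0.0] * 4], "LA", vocab))
```

Lower pseudo-perplexity indicates a sequence the model finds more natural, which is useful for ranking design candidates.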

Encode

Generate embeddings (mean and per-token) for the provided protein sequences using ESM2-35M.

python
from biolmai import BioLM
response = BioLM(
    entity="esm2-35m",
    action="encode",
    params={},
    items=[
      {
        "sequence": "ACDEFGHIKLMNPQRSTVWY"
      }
    ]
)
print(response)
bash
curl -X POST https://biolm.ai/api/v3/esm2-35m/encode/ \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "items": [
    {
      "sequence": "ACDEFGHIKLMNPQRSTVWY"
    }
  ],
  "params": {}
}'
python
import requests

url = "https://biolm.ai/api/v3/esm2-35m/encode/"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "items": [
        {
            "sequence": "ACDEFGHIKLMNPQRSTVWY"
        }
    ],
    "params": {}
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
r
library(httr)

url <- "https://biolm.ai/api/v3/esm2-35m/encode/"
headers <- c("Authorization" = "Token YOUR_API_KEY", "Content-Type" = "application/json")
body <- list(
  items = list(
    list(
      sequence = "ACDEFGHIKLMNPQRSTVWY"
    )
  ),
  params = list()
)

res <- POST(url, add_headers(.headers = headers), body = body, encode = "json")
print(content(res))
POST /api/v3/esm2-35m/encode/

Encode endpoint for ESM-2 35M.

Request Headers:

  • Authorization — Token YOUR_API_KEY

  • Content-Type — application/json

Request

  • params (object, optional) — Configuration parameters:

    • include (array of strings, default: ["mean"]) — Types of embeddings or logits to return; allowed values: "mean", "per_token", "logits"

  • items (array of objects, min: 1, max: 5) — Input sequences:

    • sequence (string, min length: 1, max length: 2048, required) — Protein sequence using standard unambiguous amino acid codes; ambiguous amino acids not allowed

Example request:

http
POST /api/v3/esm2-35m/encode/ HTTP/1.1
Host: biolm.ai
Authorization: Token YOUR_API_KEY
Content-Type: application/json

{
  "items": [
    {
      "sequence": "ACDEFGHIKLMNPQRSTVWY"
    }
  ],
  "params": {}
}
Status Codes:

  • 200 OK — Request processed successfully

Response

  • results (array of objects) — One result per input item, in the order requested:

    • embeddings (array of objects) — Embeddings generated for the input sequence:

      • layer (integer) — Index of the model layer the embedding was extracted from

      • embedding (array of floats) — Embedding vector; mean-pooled over sequence positions when "mean" is requested

Example response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "embeddings": [
        {
          "layer": 12,
          "embedding": [
            -0.04749707132577896,
            0.0048291562125086784,
            "... (truncated for documentation)"
          ]
        }
      ]
    }
  ]
}
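Mean embeddings returned by the encode endpoint can be compared directly, for example with cosine similarity, to rank sequences by representational closeness. A minimal sketch, assuming two vectors extracted from the "embedding" field of the response shown above; the toy vectors are placeholders, not real model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two mean embeddings from the API
emb_a = [-0.047, 0.0048, 0.12]
emb_b = [-0.051, 0.0030, 0.10]
print(round(cosine_similarity(emb_a, emb_b), 4))
```

The same function applies unchanged to the full-length vectors the API returns, since it only assumes the two inputs have equal dimensionality.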

Performance

  • ESM-2 35M is optimized for GPU inference, running efficiently on NVIDIA L4 GPUs.

  • Typical inference speed is approximately 1.5 seconds per single-sequence prediction, significantly faster than larger ESM-2 variants (e.g., 650M, 3B, 15B), which typically take between 4 and 10 seconds per sequence on the same hardware.

  • Compared to larger ESM-2 models (150M, 650M, and 3B), the 35M parameter model has reduced predictive accuracy on structure prediction benchmarks:

    • On CASP14 test set, ESM-2 35M achieves a TM-score of 0.41, compared to 0.47 (150M), 0.51 (650M), and 0.52 (3B).

    • On CAMEO test set, ESM-2 35M achieves a TM-score of 0.56, compared to 0.65 (150M), 0.70 (650M), and 0.72 (3B).

  • Unsupervised contact prediction accuracy (long-range precision at L) is 0.30, lower than larger ESM-2 models: 0.44 (150M), 0.52 (650M), and 0.54 (3B).

  • Despite lower accuracy, ESM-2 35M offers a substantial performance advantage in terms of speed and computational efficiency, making it suitable for high-throughput applications or initial screening tasks where rapid inference is prioritized over maximum predictive accuracy.

  • BioLM has optimized deployment of ESM-2 35M to ensure stable GPU utilization, efficient memory management, and minimal latency overhead during inference.

Applications

  • Rapid structure prediction for protein engineering workflows, enabling researchers to quickly screen and prioritize candidate proteins for stability, folding, or functional properties; particularly valuable when experimental structure determination is slow or costly; not optimal for proteins undergoing significant conformational changes or intrinsically disordered regions.

  • High-throughput structural annotation of metagenomic protein sequences, allowing biotech companies to rapidly identify novel protein folds or domains from environmental samples; useful for discovering new protein scaffolds or functions in enzyme engineering or synthetic biology; may have reduced accuracy for sequences with minimal evolutionary similarity to known proteins.

  • Single-sequence structure modeling for protein design tasks, enabling computational design teams to evaluate designed sequences without relying on multiple sequence alignments (MSAs); especially beneficial for novel proteins lacking evolutionary homologs; however, accuracy may decrease relative to MSA-dependent methods for highly divergent sequences or complex multidomain proteins.

  • Structural embedding generation for protein function prediction, providing computational biologists with atomic-level embeddings that capture evolutionary and structural information; valuable for clustering proteins by functional similarity or predicting functional sites; less suitable for precise modeling of protein-protein interactions or complexes without additional docking methods.

  • Fast structural characterization for protein variant screening, allowing protein engineering teams to quickly assess structural impacts of mutations or sequence modifications; beneficial for stability engineering or affinity maturation workflows; accuracy may be lower for mutations causing significant structural rearrangements or large insertions/deletions.

Limitations

  • Maximum Sequence Length: The ESM-2 35M API accepts sequences up to a maximum of 2048 amino acids. Longer sequences must be truncated or split into multiple requests.

  • Batch Size: The maximum batch_size is 5 sequences per request. For larger-scale analyses, parallel requests are required.

  • The ESM-2 35M model is optimized for rapid structure prediction directly from single sequences, but accuracy is generally lower compared to AlphaFold2 for proteins with very few evolutionary homologs (low MSA depth). For orphan proteins or highly novel sequences without evolutionary context, predictions may be less reliable.

  • Model accuracy (measured by predicted LDDT) correlates with language model perplexity; sequences poorly modeled by the language model (high perplexity) typically yield lower-confidence predictions. Users should interpret predictions cautiously for sequences with high perplexity scores.

  • ESM-2 35M is not designed to predict protein complexes or interactions. While it can process artificially concatenated chains, accuracy for protein-protein interfaces and complex structures is significantly lower compared to specialized multimeric predictors such as AlphaFold-Multimer.

  • Embeddings from the 35M model are less expressive than those from larger protein language models. For downstream tasks demanding maximal representation quality, such as fine-grained clustering or function prediction, consider larger ESM-2 variants or ProtT5.
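Given the 5-item batch limit and 2048-residue maximum noted above, larger sequence sets need client-side chunking before submission. A minimal sketch; the helper name is illustrative, and simple truncation is only one option (splitting long sequences into windows may suit some workflows better):

```python
def build_batches(sequences, batch_size=5, max_len=2048):
    """Split sequences into request-sized item lists, truncating overlong entries."""
    items = [{"sequence": seq[:max_len]} for seq in sequences]
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# 12 sequences -> 3 batches (5, 5, 2); each batch is one POST payload's "items"
batches = build_batches(["ACDEFGHIKL"] * 12)
print([len(b) for b in batches])  # [5, 5, 2]
```

Each resulting batch can then be sent as the "items" field of a separate request, in parallel if throughput matters.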

How We Use It

The ESM-2 35M model enables rapid, scalable exploration of protein sequence space for protein engineering and optimization workflows, allowing researchers to quickly generate accurate sequence embeddings and structural predictions to guide experimental prioritization. By integrating ESM-2 35M embeddings into downstream predictive models and generative AI pipelines, BioLM accelerates tasks such as enzyme design, antibody maturation, and multi-round optimization cycles, resulting in reduced experimental costs and improved hit rates.

  • Integrates efficiently with predictive and generative modeling workflows to streamline protein design and optimization.

  • Enables rapid sequence-based ranking and filtering, significantly reducing experimental timelines and resource requirements.
