ESM-2 150M is a GPU-accelerated, transformer-based protein language model with 30 layers and roughly 150 million parameters, trained with a masked language modeling objective on evolutionary protein sequence data (UniRef). Given an amino acid sequence, it returns per-position logits over its token vocabulary (useful for filling <mask> positions and scoring substitutions), layer-wise sequence embeddings, and predicted inter-residue contact maps. These outputs support variant scoring, sequence generation by iterative mask filling, feature extraction for downstream property models, and contact-based structural analysis in protein engineering, antibody optimization, and enzyme design workflows.

Predict

Predict per-position token logits for input sequences, for example to score candidate residues at <mask> positions

python
from biolmai import BioLM
response = BioLM(
    entity="esm2-150m",
    action="predict",
    params={},
    items=[
      {
        "sequence": "MKTIIALSYIFCLVFADYGPTNVGSLIN<mask>VVHNNNVYPGGGSGGGSGTASCTTMKTIIAL"
      },
      {
        "sequence": "PGGGSGGGSGTASCTTMKTIIALSYIFCLVFADYGPTNVG<mask>SLINKVVHNNNVYMKTIIALSY"
      }
    ]
)
print(response)
bash
curl -X POST https://biolm.ai/api/v3/esm2-150m/predict/ \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "items": [
    {
      "sequence": "MKTIIALSYIFCLVFADYGPTNVGSLIN<mask>VVHNNNVYPGGGSGGGSGTASCTTMKTIIAL"
    },
    {
      "sequence": "PGGGSGGGSGTASCTTMKTIIALSYIFCLVFADYGPTNVG<mask>SLINKVVHNNNVYMKTIIALSY"
    }
  ]
}'
python
import requests

url = "https://biolm.ai/api/v3/esm2-150m/predict/"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "items": [
        {
            "sequence": "MKTIIALSYIFCLVFADYGPTNVGSLIN<mask>VVHNNNVYPGGGSGGGSGTASCTTMKTIIAL"
        },
        {
            "sequence": "PGGGSGGGSGTASCTTMKTIIALSYIFCLVFADYGPTNVG<mask>SLINKVVHNNNVYMKTIIALSY"
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
r
library(httr)

url <- "https://biolm.ai/api/v3/esm2-150m/predict/"
headers <- c("Authorization" = "Token YOUR_API_KEY", "Content-Type" = "application/json")
body <- list(
  items = list(
    list(
      sequence = "MKTIIALSYIFCLVFADYGPTNVGSLIN<mask>VVHNNNVYPGGGSGGGSGTASCTTMKTIIAL"
    ),
    list(
      sequence = "PGGGSGGGSGTASCTTMKTIIALSYIFCLVFADYGPTNVG<mask>SLINKVVHNNNVYMKTIIALSY"
    )
  )
)

res <- POST(url, add_headers(.headers = headers), body = body, encode = "json")
print(content(res))
POST /api/v3/esm2-150m/predict/

Predict endpoint for ESM-2 150M.

Request Headers:

  • Authorization: Token YOUR_API_KEY (required)

  • Content-Type: application/json

Request

  • params (object, optional): Configuration parameters; the predict examples send an empty object or omit it.

  • items (array of objects, min items: 1): Input sequences, one object per sequence:

    • sequence (string, required): Protein sequence in single-letter amino acid codes; may contain <mask> tokens at the positions to be predicted.
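
To score several positions of one sequence, each position can be masked in its own item. The sketch below builds such a payload. `mask_positions` is a hypothetical helper, positions are assumed to be 0-based, and it assumes the endpoint accepts several masked items in one request, as the example requests do.

```python
MASK_TOKEN = "<mask>"


def mask_positions(sequence: str, positions: list) -> list:
    """Return one request item per position, with that residue replaced by <mask>.

    Hypothetical helper: positions are 0-based indices into the plain sequence.
    """
    items = []
    for pos in positions:
        if not 0 <= pos < len(sequence):
            raise IndexError(f"position {pos} is outside a sequence of length {len(sequence)}")
        masked = sequence[:pos] + MASK_TOKEN + sequence[pos + 1:]
        items.append({"sequence": masked})
    return items


# Build a predict payload that masks residues 3 and 7 in separate items.
payload = {"items": mask_positions("MKTIIALSYIFCLV", [3, 7])}
print(payload)
```

The resulting payload can be posted with the same `requests.post` call as the Python example above.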

Example request:

http
POST /api/v3/esm2-150m/predict/ HTTP/1.1
Host: biolm.ai
Authorization: Token YOUR_API_KEY
Content-Type: application/json

{
  "items": [
    {
      "sequence": "MKTIIALSYIFCLVFADYGPTNVGSLIN<mask>VVHNNNVYPGGGSGGGSGTASCTTMKTIIAL"
    },
    {
      "sequence": "PGGGSGGGSGTASCTTMKTIIALSYIFCLVFADYGPTNVG<mask>SLINKVVHNNNVYMKTIIALSY"
    }
  ]
}
Status Codes:

  • 200 OK: The request succeeded and the body contains one result per input item.

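Response

  • results (array of objects): One result per input item, in the order requested:

    • logits (array of arrays of floats): One row per position in sequence_tokens; each row holds one unnormalized score per entry in vocab_tokens.

    • sequence_tokens (array of strings): The tokenized input sequence.

    • vocab_tokens (array of strings): The model vocabulary, giving the column order of each logits row.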

Example response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "logits": [
        [
          0.5648517608642578,
          -1.3809473514556885,
          "... (truncated for documentation)"
        ],
        [
          1.06108558177948,
          -1.0631077289581299,
          "... (truncated for documentation)"
        ],
        "... (truncated for documentation)"
      ],
      "sequence_tokens": [
        "M",
        "K",
        "... (truncated for documentation)"
      ],
      "vocab_tokens": [
        "L",
        "A",
        "... (truncated for documentation)"
      ]
    },
    {
      "logits": [
        [
          -1.4461108446121216,
          -0.23059141635894775,
          "... (truncated for documentation)"
        ],
        [
          -0.9730031490325928,
          0.14847764372825623,
          "... (truncated for documentation)"
        ],
        "... (truncated for documentation)"
      ],
      "sequence_tokens": [
        "P",
        "G",
        "... (truncated for documentation)"
      ],
      "vocab_tokens": [
        "L",
        "A",
        "... (truncated for documentation)"
      ]
    }
  ]
}
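
A minimal sketch for reading a predict result. It assumes, as the example response suggests but does not state, that logits rows align with sequence_tokens, that columns align with vocab_tokens, and that each mask appears as a single "<mask>" entry in sequence_tokens. The hypothetical helper `top_k_at_mask` applies a softmax to the row at each masked position and returns the most probable tokens; the optional `allowed` set restricts the ranking to, for example, the 20 canonical amino acids, since the vocabulary may also contain special tokens.

```python
import math
from typing import Optional


def top_k_at_mask(result: dict, k: int = 3, allowed: Optional[set] = None) -> list:
    """Return (position, [(token, probability), ...]) for each <mask> position.

    Assumes logits rows align with sequence_tokens and columns with vocab_tokens.
    If `allowed` is given, only those vocabulary tokens are ranked and the
    softmax is renormalized over them.
    """
    vocab = result["vocab_tokens"]
    cols = [j for j, tok in enumerate(vocab) if allowed is None or tok in allowed]
    out = []
    for pos, token in enumerate(result["sequence_tokens"]):
        if token != "<mask>":
            continue
        row = [result["logits"][pos][j] for j in cols]
        # Numerically stable softmax: subtract the largest logit first.
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        ranked = sorted(
            ((vocab[j], e / total) for j, e in zip(cols, exps)),
            key=lambda pair: pair[1],
            reverse=True,
        )
        out.append((pos, ranked[:k]))
    return out


# Toy result using the documented keys; the numbers are invented.
toy = {
    "sequence_tokens": ["M", "<mask>", "K"],
    "vocab_tokens": ["L", "A", "G"],
    "logits": [[0.1, 0.2, 0.3], [2.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
}
print(top_k_at_mask(toy, k=2))
print(top_k_at_mask(toy, k=2, allowed={"A", "G"}))
```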

Encode

Generate embeddings and predicted contact maps for input sequences

python
from biolmai import BioLM
response = BioLM(
    entity="esm2-150m",
    action="encode",
    params={
      "repr_layers": [
        -1,
        -2
      ],
      "include": [
        "mean",
        "contacts"
      ]
    },
    items=[
      {
        "sequence": "MKTIIALSYIFCLVFADYGPTNVGSLINKVVHNNNVYPGGGSGGGSGTASCTTMKTIIALSYIFCLV"
      },
      {
        "sequence": "PGGGSGGGSGTASCTTMKTIIALSYIFCLVFADYGPTNVGSLINKVVHNNNVYMKTIIALSYIFCLV"
      }
    ]
)
print(response)
bash
curl -X POST https://biolm.ai/api/v3/esm2-150m/encode/ \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "params": {
    "repr_layers": [
      -1,
      -2
    ],
    "include": [
      "mean",
      "contacts"
    ]
  },
  "items": [
    {
      "sequence": "MKTIIALSYIFCLVFADYGPTNVGSLINKVVHNNNVYPGGGSGGGSGTASCTTMKTIIALSYIFCLV"
    },
    {
      "sequence": "PGGGSGGGSGTASCTTMKTIIALSYIFCLVFADYGPTNVGSLINKVVHNNNVYMKTIIALSYIFCLV"
    }
  ]
}'
python
import requests

url = "https://biolm.ai/api/v3/esm2-150m/encode/"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "params": {
        "repr_layers": [
            -1,
            -2
        ],
        "include": [
            "mean",
            "contacts"
        ]
    },
    "items": [
        {
            "sequence": "MKTIIALSYIFCLVFADYGPTNVGSLINKVVHNNNVYPGGGSGGGSGTASCTTMKTIIALSYIFCLV"
        },
        {
            "sequence": "PGGGSGGGSGTASCTTMKTIIALSYIFCLVFADYGPTNVGSLINKVVHNNNVYMKTIIALSYIFCLV"
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
r
library(httr)

url <- "https://biolm.ai/api/v3/esm2-150m/encode/"
headers <- c("Authorization" = "Token YOUR_API_KEY", "Content-Type" = "application/json")
body <- list(
  params = list(
    repr_layers = list(
      -1,
      -2
    ),
    include = list(
      "mean",
      "contacts"
    )
  ),
  items = list(
    list(
      sequence = "MKTIIALSYIFCLVFADYGPTNVGSLINKVVHNNNVYPGGGSGGGSGTASCTTMKTIIALSYIFCLV"
    ),
    list(
      sequence = "PGGGSGGGSGTASCTTMKTIIALSYIFCLVFADYGPTNVGSLINKVVHNNNVYMKTIIALSYIFCLV"
    )
  )
)

res <- POST(url, add_headers(.headers = headers), body = body, encode = "json")
print(content(res))
POST /api/v3/esm2-150m/encode/

Encode endpoint for ESM-2 150M.

Request Headers:

  • Authorization: Token YOUR_API_KEY (required)

  • Content-Type: application/json

Request

  • params (object): Configuration parameters:

    • repr_layers (array of ints): Layers whose representations are returned; negative values count back from the final layer, so [-1, -2] selects the last two layers (reported as layers 30 and 29 in the example response).

    • include (array of strings): Outputs to return; the examples request "mean" (a mean-pooled embedding for each selected layer) and "contacts" (a predicted residue-residue contact map).

  • items (array of objects, min items: 1): Input sequences, one object per sequence:

    • sequence (string, required): Protein sequence in single-letter amino acid codes.
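
The example response reports layers 29 and 30 for repr_layers [-1, -2]. That appears to match the indexing convention of fair-esm's `extract.py`, where layer 0 is the token embedding and layers 1 to 30 are the transformer blocks of this 30-layer model. The sketch below resolves negative indices under that assumption; `resolve_layers` is a hypothetical helper, not part of the BioLM client.

```python
def resolve_layers(repr_layers: list, num_layers: int = 30) -> list:
    """Map possibly negative layer indices to absolute layer numbers.

    Assumed convention (as in fair-esm's extract.py): layer 0 is the token
    embedding and layers 1..num_layers are transformer blocks, so -1 is the
    final block.
    """
    resolved = []
    for i in repr_layers:
        if not -(num_layers + 1) <= i <= num_layers:
            raise ValueError(f"layer index {i} is out of range for {num_layers} layers")
        resolved.append((i + num_layers + 1) % (num_layers + 1))
    return resolved


print(resolve_layers([-1, -2]))  # [30, 29], matching the layers in the example response
```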

Example request:

http
POST /api/v3/esm2-150m/encode/ HTTP/1.1
Host: biolm.ai
Authorization: Token YOUR_API_KEY
Content-Type: application/json

{
  "params": {
    "repr_layers": [
      -1,
      -2
    ],
    "include": [
      "mean",
      "contacts"
    ]
  },
  "items": [
    {
      "sequence": "MKTIIALSYIFCLVFADYGPTNVGSLINKVVHNNNVYPGGGSGGGSGTASCTTMKTIIALSYIFCLV"
    },
    {
      "sequence": "PGGGSGGGSGTASCTTMKTIIALSYIFCLVFADYGPTNVGSLINKVVHNNNVYMKTIIALSYIFCLV"
    }
  ]
}
Status Codes:

  • 200 OK: The request succeeded and the body contains one result per input item.

Response

  • results (array of objects): One result per input item, in the order requested:

    • embeddings (array of objects): One entry per requested layer when "mean" is included:

      • layer (int): Layer number.

      • embedding (array of floats): Mean-pooled sequence embedding from that layer.

    • contacts (array of arrays of floats): Present when "contacts" is included; an L × L matrix of predicted contact probabilities, where L is the sequence length.
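
Contact maps are usually read by keeping high-probability pairs that are well separated in sequence. A minimal sketch, assuming contacts is an L × L matrix indexed by 0-based residue position, as in the example response below. `top_contacts`, the 0.5 threshold, and the minimum separation of 6 residues are illustrative choices, not API defaults.

```python
def top_contacts(contacts: list, threshold: float = 0.5, min_separation: int = 6) -> list:
    """Return (i, j, probability) for pairs i < j with probability above threshold.

    Pairs fewer than min_separation residues apart are skipped because
    near-diagonal contacts are trivially predicted.
    """
    n = len(contacts)
    pairs = []
    for i in range(n):
        for j in range(i + min_separation, n):
            prob = contacts[i][j]
            if prob > threshold:
                pairs.append((i, j, prob))
    # Most confident contacts first.
    pairs.sort(key=lambda t: t[2], reverse=True)
    return pairs


# Toy 8 x 8 symmetric map with two long-range contacts (values invented).
n = 8
toy = [[0.1] * n for _ in range(n)]
toy[0][7] = toy[7][0] = 0.9
toy[1][7] = toy[7][1] = 0.6
print(top_contacts(toy))  # [(0, 7, 0.9), (1, 7, 0.6)]
```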

Example response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "embeddings": [
        {
          "layer": 29,
          "embedding": [
            -2.011397361755371,
            -3.4499614238739014,
            "... (truncated for documentation)"
          ]
        },
        {
          "layer": 30,
          "embedding": [
            -0.05783110484480858,
            -0.2212517261505127,
            "... (truncated for documentation)"
          ]
        }
      ],
      "contacts": [
        [
          0.9296878576278687,
          0.0017561335116624832,
          "... (truncated for documentation)"
        ],
        [
          0.0017561335116624832,
          0.0960121601819992,
          "... (truncated for documentation)"
        ],
        "... (truncated for documentation)"
      ]
    },
    {
      "embeddings": [
        {
          "layer": 29,
          "embedding": [
            -2.1167407035827637,
            -3.1774816513061523,
            "... (truncated for documentation)"
          ]
        },
        {
          "layer": 30,
          "embedding": [
            -0.07335882633924484,
            -0.17132967710494995,
            "... (truncated for documentation)"
          ]
        }
      ],
      "contacts": [
        [
          0.9773598313331604,
          0.991390585899353,
          "... (truncated for documentation)"
        ],
        [
          0.991390585899353,
          0.991047203540802,
          "... (truncated for documentation)"
        ],
        "... (truncated for documentation)"
      ]
    }
  ]
}
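
Mean embeddings from encode are often compared with cosine similarity, for example to cluster sequences or find near neighbors. The sketch below pulls one layer's vector out of each result using the field names in the example response; `layer_embedding` and `cosine_similarity` are hypothetical helpers and the toy vectors are invented.

```python
import math


def layer_embedding(result: dict, layer: int) -> list:
    """Return the embedding vector reported for `layer` in one encode result."""
    for entry in result["embeddings"]:
        if entry["layer"] == layer:
            return entry["embedding"]
    raise KeyError(f"layer {layer} not present in this result")


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy results shaped like the example response; the 3-element vectors are invented.
r1 = {"embeddings": [{"layer": 30, "embedding": [1.0, 0.0, 1.0]}]}
r2 = {"embeddings": [{"layer": 30, "embedding": [1.0, 1.0, 0.0]}]}
print(cosine_similarity(layer_embedding(r1, 30), layer_embedding(r2, 30)))
```

Comparing vectors from the same layer matters, since different layers occupy different representation spaces.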

Performance

  • GPU-accelerated inference on NVIDIA T4 GPUs provides fast processing and high throughput for sequence encoding and scoring tasks.

  • Typical inference completion time is approximately 8-12 seconds per single-sequence batch, providing efficient turnaround for interactive and high-throughput applications.

  • ESM-2 150M offers significantly faster inference than larger ESM-2 variants (e.g., ESM-2 650M), with approximately 2-3x speedup due to its smaller parameter count, enabling quicker iteration cycles in protein design workflows.

  • Predictive accuracy on structure-related tasks is lower than that of larger ESM-2 models (e.g., ESM-2 650M), with roughly 5-10% lower precision on long-range contact prediction benchmarks.

  • Deployment uses tensor parallelism and FP16 precision to improve GPU utilization and reduce latency overhead.

  • Ideal for rapid prototyping and exploratory protein design tasks where inference speed and cost-efficiency are prioritized over absolute predictive accuracy.

Applications

  • De novo protein design for novel therapeutics: ESM-2 150M can be used to propose new protein sequences not found in nature, for example by iteratively filling <mask> positions, which is valuable for creating novel therapeutics. This capability supports the design of proteins that target specific diseases or biological pathways more effectively than existing treatments.

  • Enzyme engineering through variant scoring: ESM-2 150M can score candidate substitutions within an existing enzyme sequence, helping prioritize variants when engineering enzymes with specific catalytic properties. This is particularly useful for industries looking to optimize enzyme activity for industrial processes, such as biofuel production or pharmaceutical synthesis.

  • Antibody and nanobody optimization for improved binding affinity: ESM-2 150M can be used to design antibodies and nanobodies with enhanced binding properties, which is essential for developing more effective diagnostic tools and therapeutic agents. This application is valuable for companies aiming to improve the specificity and efficacy of their antibody-based products.

  • Protein sequence diversity exploration for vaccine development: The model's ability to generate diverse protein sequences allows researchers to explore a wide range of potential antigens for vaccine development. This is particularly important for creating vaccines that can provide broader protection against rapidly mutating pathogens.

  • Contact map prediction to aid structural biology research: ESM-2 150M's predicted residue-residue contacts can help researchers understand the structural basis of protein function, which is critical for drug design and understanding disease mechanisms. This application is most beneficial when experimental structural data is limited or unavailable.
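
For the variant-scoring uses above, one widely used approach with masked language models is masked-marginal scoring: mask the position, then compare log p(mutant) with log p(wild type) from that position's logits. Because both terms come from the same softmax, the normalizer cancels and the score equals the logit difference; the sketch computes it through a log-softmax for clarity. It assumes the layout of the example predict response (logits rows aligned with sequence_tokens, columns with vocab_tokens, no leading special token); `masked_marginal_score` is a hypothetical helper.

```python
import math


def log_softmax(row: list) -> list:
    """Numerically stable log-softmax of one logits row."""
    peak = max(row)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_norm for x in row]


def masked_marginal_score(result: dict, position: int, wild_type: str, mutant: str) -> float:
    """log p(mutant) - log p(wild_type) at a masked position of one predict result.

    Positive values mean the model prefers the mutant residue there.
    Assumes logits rows align with sequence_tokens (no leading special token).
    """
    if result["sequence_tokens"][position] != "<mask>":
        raise ValueError(f"position {position} is not masked in this result")
    vocab = result["vocab_tokens"]
    log_probs = log_softmax(result["logits"][position])
    return log_probs[vocab.index(mutant)] - log_probs[vocab.index(wild_type)]


# Toy result (invented numbers): position 1 is masked and the wild type there is "A".
toy = {
    "sequence_tokens": ["M", "<mask>", "K"],
    "vocab_tokens": ["L", "A", "G"],
    "logits": [[0.1, 0.2, 0.3], [2.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
}
print(masked_marginal_score(toy, 1, wild_type="A", mutant="L"))  # about 2.0, the logit difference
```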

Limitations

  • Batch Size: The number of items per request is limited; larger workloads must be split across multiple requests.

  • Maximum Input Length: Input sequences must not exceed the endpoint's maximum sequence length; longer sequences must be truncated or split into shorter segments.

  • Single-Chain Focus: The model was trained on individual protein sequences, so each item is treated as a single chain; interactions between chains in a complex are not modeled explicitly.

  • Sequence Length Sensitivity: Performance and accuracy may degrade for very long protein sequences or highly complex topologies.

  • Generative Diversity: Although ESM-2 150M can be used to generate novel sequences, it may not adequately explore highly divergent or unusual structural motifs compared with specialized generative models such as diffusion-based architectures.

  • Ranking and Filtering: ESM-2 150M is best suited for early-stage sequence generation and screening; for final-stage validation or high-precision structure prediction, consider slower but more accurate models such as AlphaFold2.
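
Because each request carries a limited number of items, large jobs are split across requests. A minimal sketch: `chunk_items` is a hypothetical helper, and `max_items` is a hypothetical cap that you would set to the endpoint's documented limit. Each returned payload can be posted exactly like the `requests` examples above.

```python
from typing import Optional


def chunk_items(items: list, max_items: int, params: Optional[dict] = None) -> list:
    """Split request items into payloads holding at most max_items items each.

    max_items is a hypothetical cap; set it to the endpoint's documented limit.
    """
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    payloads = []
    for start in range(0, len(items), max_items):
        payload = {"items": items[start:start + max_items]}
        if params is not None:
            payload["params"] = params
        payloads.append(payload)
    return payloads


seqs = ["MKTIIALSYIFCLV", "PGGGSGGGSG", "MKTIIAL", "SLINKVVHNN", "ASCTTMK"]
payloads = chunk_items(
    [{"sequence": s} for s in seqs],
    max_items=2,
    params={"repr_layers": [-1], "include": ["mean"]},
)
print([len(p["items"]) for p in payloads])  # [2, 2, 1]
```

Results come back in request order within each payload, so concatenating the per-payload results preserves the original item order.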

How We Use It

BioLM integrates the ESM-2 150M model to accelerate protein engineering workflows by enabling rapid sequence-based design and optimization of novel proteins. The model's embeddings, masked-token predictions, and predicted contacts support efficient generation and triage of diverse protein variants, guiding downstream experimental validation and iterative optimization cycles. By combining ESM-2 150M outputs with predictive biophysical property models and 3D structural metrics, BioLM supports precise ranking and filtering of designed sequences, resulting in improved lab success rates, reduced synthesis costs, and shorter research timelines.

  • Integrates seamlessly with downstream predictive models for biophysical and structural properties

  • Accelerates iterative protein optimization cycles through informed variant selection
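
Ranking whole candidate sequences can use the pseudo-log-likelihood (PLL): the sum over positions i of log p(s_i) when position i alone is masked, which requires one masked item per position. A minimal sketch, assuming `masked_results[i]` holds the predict result for the sequence with residue i masked, in the same logits layout as the example predict response. `pseudo_log_likelihood` and `rank_by_pll` are hypothetical helpers, and the toy logits are invented.

```python
import math


def log_softmax(row: list) -> list:
    """Numerically stable log-softmax of one logits row."""
    peak = max(row)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_norm for x in row]


def pseudo_log_likelihood(sequence: str, masked_results: list) -> float:
    """Sum over positions i of log p(sequence[i]) with only position i masked.

    masked_results[i] is assumed to be the predict result for `sequence`
    with residue i replaced by <mask>.
    """
    if len(masked_results) != len(sequence):
        raise ValueError("need one masked result per residue")
    total = 0.0
    for i, residue in enumerate(sequence):
        result = masked_results[i]
        log_probs = log_softmax(result["logits"][i])
        total += log_probs[result["vocab_tokens"].index(residue)]
    return total


def rank_by_pll(candidates: dict) -> list:
    """candidates maps name -> (sequence, masked_results); highest PLL first."""
    scored = [(name, pseudo_log_likelihood(seq, res)) for name, (seq, res) in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy two-residue example with an invented two-letter vocabulary. The same
# masked results are reused for both candidates only to keep the toy small.
vocab = ["L", "A"]
shared = [
    {"vocab_tokens": vocab, "logits": [[2.0, 0.0], [0.0, 0.0]]},  # residue 0 masked
    {"vocab_tokens": vocab, "logits": [[0.0, 0.0], [0.0, 2.0]]},  # residue 1 masked
]
ranking = rank_by_pll({"LA": ("LA", shared), "LL": ("LL", shared)})
print(ranking)
```

PLL becomes more negative as sequences get longer, so dividing by length is a common way to compare candidates of different sizes.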

References