ESM-2 150M is a GPU-accelerated transformer-based protein language model trained on extensive evolutionary protein sequence data. It generates biologically plausible protein sequences and predicts structural compatibility via learned inter-residue distances, enabling de novo protein design and unconstrained sequence generation. ESM-2 150M supports fixed-backbone design tasks, soluble monomer generation, and exploration of diverse protein topologies, useful in protein engineering, antibody optimization, and enzyme design workflows.
Predict
Predict properties or scores for input sequences
- POST /api/v3/esm2-150m/predict/
Predict endpoint for ESM-2 150M.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, required) — Configuration parameters:
chain (string, max length: 1, default: “A”) — Chain identifier for input structure
num_samples (int, range: 1-3, default: 1) — Number of sequences to generate per input item
temperature (float, range: 0.0-8.0, default: 0.6) — Sampling temperature
multichain_backbone (bool, default: false) — Indicates if input structure contains multiple chains
items (array of objects, min items: 1, max items: 1) — Input structures:
pdb (string, min length: 1, max length: 100000, required) — Protein structure in PDB format
Example request:
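A minimal Python sketch of a request body assembled from the parameters above. It assumes the JSON layout implied by the params/items nesting, and build_payload is a hypothetical helper, not part of any BioLM client library:

```python
# Limit copied from the documented pdb max length.
MAX_PDB_LEN = 100_000


def build_payload(pdb: str, chain: str = "A", num_samples: int = 1,
                  temperature: float = 0.6,
                  multichain_backbone: bool = False) -> dict:
    """Assemble a request body, rejecting values outside the documented ranges."""
    if not 1 <= len(pdb) <= MAX_PDB_LEN:
        raise ValueError(f"pdb must be 1-{MAX_PDB_LEN} characters long")
    if len(chain) > 1:
        raise ValueError("chain must be at most one character")
    if not 1 <= num_samples <= 3:
        raise ValueError("num_samples must be between 1 and 3")
    if not 0.0 <= temperature <= 8.0:
        raise ValueError("temperature must be between 0.0 and 8.0")
    return {
        "params": {
            "chain": chain,
            "num_samples": num_samples,
            "temperature": temperature,
            "multichain_backbone": multichain_backbone,
        },
        # The endpoint accepts exactly one structure per request.
        "items": [{"pdb": pdb}],
    }
```

Passing the result to json.dumps produces the request body, with multichain_backbone serialized as the JSON literal false; pdb_text in a call such as build_payload(pdb_text, num_samples=3) is a placeholder for the contents of a PDB file.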
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
sequence (string) — Generated protein sequence
recovery (float, range: 0.0-1.0) — Sequence recovery: fraction of positions where the generated residue matches the native residue of the input structure
Example response:
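The recovery value in a response can be sanity-checked locally against the native sequence of the input structure. A minimal sketch, assuming the designed and native sequences are position-aligned strings of equal length (the service's exact computation is not documented here); sequence_recovery is a hypothetical helper:

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where designed and native residues are identical."""
    if len(designed) != len(native):
        raise ValueError("sequences must be the same length (position-aligned)")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Compare case-insensitively, one position at a time.
    matches = sum(d == n for d, n in zip(designed.upper(), native.upper()))
    return matches / len(native)
```

For example, sequence_recovery("MKTAY", "MKSAY") returns 0.8, since 4 of the 5 positions match.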
Encode
Generate embeddings for input sequences
- POST /api/v3/esm2-150m/encode/
Encode endpoint for ESM-2 150M.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, required) — Configuration parameters:
chain (string, max length: 1, default: “A”) — Chain identifier for the input structure
num_samples (int, range: 1-3, default: 1) — Number of sequences to generate per input structure
temperature (float, range: 0.0-8.0, default: 0.6) — Sampling temperature
multichain_backbone (bool, default: false) — Indicates if input structure contains multiple chains
items (array of objects, min items: 1, max items: 1) — Input structures:
pdb (string, min length: 1, max length: 100000, required) — Protein structure in PDB format
Example request:
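A minimal standard-library sketch of sending a request with the headers listed above. The base URL https://biolm.ai is an assumption (this page gives only the path), and make_request and call are hypothetical helpers:

```python
import json
import urllib.request

# Assumption: API host; only the /api/v3/... path is documented here.
BASE_URL = "https://biolm.ai"


def make_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request for an ESM-2 150M endpoint ("predict" or "encode")."""
    url = f"{BASE_URL}/api/v3/esm2-150m/{endpoint}/"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )


def call(endpoint: str, payload: dict, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    req = make_request(endpoint, payload, api_key)
    # urlopen raises urllib.error.HTTPError for 400 and 500 responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as call("encode", body, "YOUR_API_KEY"), where body is a dict with the params and items keys described above, returns the decoded JSON response.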
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
Each element (array of objects) — Array of generated samples for one input item, length: num_samples (1-3):
sequence (string) — Generated amino acid sequence (single-letter codes)
recovery (float, range: 0.0-1.0) — Sequence recovery: fraction of positions where the generated residue matches the native residue of the input structure
Example response:
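Because encode nests one array of samples inside each result, flattening the response into rows is a convenient first step. A sketch assuming exactly the nesting described above (results, then a per-item array, then sample objects with sequence and recovery); flatten_results is a hypothetical helper:

```python
from typing import Any


def flatten_results(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn {"results": [[sample, ...], ...]} into one flat row per sample."""
    rows = []
    for item_idx, samples in enumerate(response["results"]):
        for sample_idx, sample in enumerate(samples):
            rows.append({
                "item": item_idx,
                "sample": sample_idx,
                "sequence": sample["sequence"],
                "recovery": float(sample["recovery"]),
            })
    # Highest-recovery samples first.
    rows.sort(key=lambda r: r["recovery"], reverse=True)
    return rows
```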
Performance
GPU-accelerated inference on NVIDIA T4 GPUs is intended to keep processing fast and throughput high for protein sequence generation tasks.
Typical inference time is approximately 8-12 seconds per request (one input item), which is fast enough for interactive use; high-throughput runs require many separate requests, since each request holds a single item.
ESM-2 150M runs inference approximately 2-3x faster than larger ESM-2 variants such as ESM-2 650M because of its smaller parameter count, allowing quicker iteration cycles in protein design workflows.
Predictive accuracy on sequence-to-structure tasks is lower than that of larger ESM-2 models such as ESM-2 650M, with roughly 5-10% lower precision on long-range contact prediction benchmarks.
Deployment uses tensor parallelism and FP16 precision to improve GPU utilization and reduce latency overhead.
Ideal for rapid prototyping and exploratory protein design tasks where inference speed and cost-efficiency are prioritized over absolute predictive accuracy.
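Taking the 8-12 s figure at face value, and assuming one request at a time with no client-side concurrency, a rough per-client throughput estimate at the maximum num_samples of 3 is:

```latex
\text{requests/h} = \frac{3600\ \text{s/h}}{t_{\text{request}}}
\quad\Rightarrow\quad
\frac{3600}{12} = 300, \qquad \frac{3600}{8} = 450

\text{sequences/h} = 3 \times (300 \text{ to } 450) = 900 \text{ to } 1350
```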
Applications
De novo protein design for novel therapeutic applications: ESM-2 150M can generate entirely new protein sequences not found in nature, which is valuable for creating novel therapeutics. This capability supports the design of proteins that target specific diseases or biological pathways, potentially more effectively than existing treatments.
Fixed-backbone protein design for targeted enzyme engineering: By generating sequences that fit a predetermined protein structure, ESM-2 150M aids the engineering of enzymes with specific catalytic properties. This is particularly useful in industries that optimize enzyme activity for processes such as biofuel production or pharmaceutical synthesis.
Antibody and nanobody optimization for improved binding affinity: ESM-2 150M can be used to design antibodies and nanobodies with enhanced binding properties, which supports the development of more effective diagnostic tools and therapeutic agents. This application is valuable for companies aiming to improve the specificity and efficacy of their antibody-based products.
Protein sequence diversity exploration for vaccine development: The model’s ability to generate diverse protein sequences lets researchers explore a wide range of potential antigens for vaccine development. This is particularly important for vaccines intended to provide broader protection against rapidly mutating pathogens.
Protein folding prediction to aid structural biology research: ESM-2 150M’s predictions about protein folding can help researchers understand the structural basis of protein function, which is critical for drug design and for understanding disease mechanisms. This application is most useful when experimental structural data is limited or unavailable.
Limitations
Batch Size: The maximum number of items per request is 1; higher throughput requires multiple requests.
Maximum Input Length: The input PDB structure string must not exceed max_pdb_str_len (100,000) characters; larger structures must be truncated or simplified.
Single-Chain Focus: The model primarily supports single-chain protein design; set multichain_backbone to false for optimal results.
Sequence Length Sensitivity: Performance and accuracy may degrade for very long protein sequences or highly complex topologies.
Generative Diversity: Although the model can generate novel sequences, it may not adequately explore highly divergent or unusual structural motifs compared with specialized generative models such as diffusion-based architectures.
Ranking and Filtering: The model is best suited for early-stage sequence generation; for final-stage validation or high-precision structure prediction, consider slower but more accurate models such as AlphaFold2.
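Since each request carries a single item, larger jobs are split into one request per structure. A minimal sketch of fanning those requests out with a thread pool; submit stands for any function you supply that sends one single-item request and returns the decoded response, and max_workers=4 is an arbitrary assumption that should respect whatever rate limits apply to your account:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def fan_out(pdbs: Sequence[str], submit: Callable[[str], dict],
            max_workers: int = 4) -> list[dict]:
    """Send one request per structure concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of the inputs, not completion order.
        return list(pool.map(submit, pdbs))
```

Threads suit this workload because each worker spends almost all of its time waiting on the network, so the global interpreter lock is not a bottleneck.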
How We Use It
BioLM integrates the ESM-2 150M model to accelerate protein engineering workflows by enabling rapid sequence-based design and optimization of novel proteins. The model’s sequence-to-structure capabilities facilitate the efficient generation of diverse protein variants, guiding downstream experimental validation and iterative optimization cycles. By combining ESM-2 150M outputs with predictive biophysical property models and 3D structural metrics, BioLM supports precise ranking and filtering of designed sequences, resulting in improved lab success rates, reduced synthesis costs, and shorter research timelines.
Integrates seamlessly with downstream predictive models for biophysical and structural properties
Accelerates iterative protein optimization cycles through informed variant selection
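The ranking step is not specified in detail here, so the following is only a hypothetical illustration of combining ESM-2 150M recovery with a downstream property score: designs below a recovery cutoff are dropped and the rest are sorted by a weighted blend. The Candidate fields, the 0.3 cutoff, and the 0.5 weight are all assumptions:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sequence: str
    recovery: float        # from the ESM-2 150M response
    property_score: float  # hypothetical: downstream property model output, scaled to 0-1


def rank_candidates(cands: list[Candidate], min_recovery: float = 0.3,
                    weight_recovery: float = 0.5) -> list[Candidate]:
    """Drop low-recovery designs, then sort by a weighted blend of both scores."""
    kept = [c for c in cands if c.recovery >= min_recovery]
    w = weight_recovery

    def score(c: Candidate) -> float:
        return w * c.recovery + (1.0 - w) * c.property_score

    return sorted(kept, key=score, reverse=True)
```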
