ESM-1b is a 33-layer, ~650M-parameter Transformer protein language model trained with a masked language modeling objective on high-diversity UniRef50 (UR50/S) sequences, from the study scaling unsupervised learning to 250 million protein sequences (86B amino acids). The API provides GPU-accelerated encoder and predictor endpoints that return per-sequence and per-residue embeddings, attention maps, logits, and masked-token predictions for batches of up to 8 proteins (length ≤1,022). These representations support tasks such as remote homology search, structure/contact feature extraction, and mutational effect modeling.
Predict¶
Predict masked amino acids in protein sequences using ESM-1b
- POST /api/v3/esm1b/predict/¶
Predict endpoint for ESM1b.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
items (array of objects, min: 1, max: 8, required) — Masked input sequences:
sequence (string, min length: 1, max length: 1022, required) — Amino acid sequence validated against AAExtendedPlusExtra; the “<mask>” token is allowed, and at least one occurrence is required
Example request:
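A minimal request sketch in Python, assuming the documented Token authentication header and a hypothetical base URL (replace BASE_URL with your deployment's host); only the request shape follows the schema above.

```python
# Sketch of a Predict request; BASE_URL is an assumption, not a fixed value.
import requests

BASE_URL = "https://biolm.ai"  # assumption: replace with your service host
API_KEY = "YOUR_API_KEY"

payload = {
    "items": [
        # One masked position in a short example sequence.
        {"sequence": "MKT<mask>YIAKQRQISFVKSHFSRQLEERLGLIEVQ"}
    ]
}

resp = requests.post(
    f"{BASE_URL}/api/v3/esm1b/predict/",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Token {API_KEY}",
    },
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["results"][0].keys())  # logits, sequence_tokens, vocab_tokens
```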
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
logits (array of arrays of floats, shape: [L, V]) — Unnormalized scores for each vocabulary token at each token position, where L is the tokenized sequence length and V is the vocabulary size
sequence_tokens (array of strings, length: L) — Tokenized input sequence, including special tokens such as “<mask>”
vocab_tokens (array of strings, length: V) — Output vocabulary tokens corresponding to the second dimension of logits
Example response:
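A small post-processing sketch that works purely from the response fields documented above: it locates each “<mask>” position in sequence_tokens and reads off the highest-scoring vocabulary token from the matching row of logits.

```python
# Sketch of interpreting a Predict response per the schema above.
def top_predictions(result):
    logits = result["logits"]            # shape [L, V]
    tokens = result["sequence_tokens"]   # length L, includes special tokens
    vocab = result["vocab_tokens"]       # length V
    preds = []
    for pos, tok in enumerate(tokens):
        if tok == "<mask>":
            row = logits[pos]
            best = max(range(len(row)), key=row.__getitem__)
            preds.append((pos, vocab[best], row[best]))
    return preds  # list of (token position, predicted token, raw logit)

# Example, continuing from the request sketch above:
# for pos, aa, score in top_predictions(resp.json()["results"][0]):
#     print(f"position {pos}: {aa} (logit {score:.2f})")
```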
Encode¶
Encode protein sequences with ESM-1b, requesting multiple representation layers and embedding types
- POST /api/v3/esm1b/encode/¶
Encode endpoint for ESM1b.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, optional) — Configuration parameters:
repr_layers (array of integers, default: [-1]) — Indices of model layers to return representations for
include (array of strings, default: [“mean”]) — Representation types to return; allowed values: “mean”, “per_token”, “bos”, “logits”, “attentions”
items (array of objects, min: 1, max: 8, required) — Input sequences:
sequence (string, min length: 1, max length: 1022, required) — Amino acid sequence using the extended alphabet plus “-”
Example request:
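A request sketch mirroring the schema above, asking for mean and per-token embeddings from the last two layers; as in the Predict example, BASE_URL is an assumption rather than a documented value.

```python
# Sketch of an Encode request with explicit params.
import requests

BASE_URL = "https://biolm.ai"  # assumption
API_KEY = "YOUR_API_KEY"

payload = {
    "params": {
        "repr_layers": [-1, -2],           # last two layers
        "include": ["mean", "per_token"],  # see allowed values above
    },
    "items": [
        {"sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"},
        {"sequence": "GSHMLEDPVDAFIGQ"},
    ],
}

resp = requests.post(
    f"{BASE_URL}/api/v3/esm1b/encode/",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Token {API_KEY}",
    },
    json=payload,
    timeout=120,
)
resp.raise_for_status()
results = resp.json()["results"]
```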
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
sequence_index (int) — Zero-based index of the input sequence in the request
embeddings (array of objects, optional) — Mean sequence-level embeddings for requested layers
layer (int) — Model layer index associated with the embedding
embedding (array of floats, length: 1280) — Layer-specific sequence embedding vector
bos_embeddings (array of objects, optional) — Beginning-of-sequence token embeddings for requested layers
layer (int) — Model layer index associated with the BOS embedding
embedding (array of floats, length: 1280) — Layer-specific BOS token embedding vector
per_token_embeddings (array of objects, optional) — Per-token embeddings for requested layers
layer (int) — Model layer index associated with the per-token embeddings
embeddings (array of arrays of floats, shape: [L, 1280]) — Layer-specific embeddings per token, where L is the tokenized sequence length including special tokens
attentions (array of arrays of floats, optional) — Self-attention weights, flattened across layers, heads, and token positions
logits (array of arrays of floats, shape: [L, V], optional) — Per-token output scores over the amino acid vocabulary, where V is the size of vocab_tokens
vocab_tokens (array of strings, optional) — Vocabulary entries corresponding to the last dimension of logits
Example response:
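A sketch of unpacking an Encode response into NumPy arrays, using only the field names documented above (mean embeddings and per_token_embeddings, one entry per requested layer). Whether the layer index is echoed back as -1 or as the absolute layer number is an assumption to verify against a real response.

```python
# Sketch of response handling for the Encode schema above.
import numpy as np

def mean_embeddings(result):
    """Map layer index -> 1280-d mean embedding vector."""
    return {
        e["layer"]: np.asarray(e["embedding"], dtype=np.float32)
        for e in result.get("embeddings", [])
    }

def per_token_embeddings(result):
    """Map layer index -> [L, 1280] per-token embedding matrix."""
    return {
        e["layer"]: np.asarray(e["embeddings"], dtype=np.float32)
        for e in result.get("per_token_embeddings", [])
    }

# Example, continuing from the request sketch above:
# last_layer = sorted(mean_embeddings(results[0]))[-1]
# X = np.stack([mean_embeddings(r)[last_layer] for r in results])  # [n, 1280]
```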
Performance¶
Model architecture and training:
- 33-layer bidirectional Transformer language model (~650M parameters) trained on UR50/S (high-diversity UniRef50) with a masked language modeling objective
- Encodes global evolutionary statistics across families rather than relying on per-target MSAs, enabling single-sequence usage at scale
- Representations contain linearly decodable information about secondary structure, long-range contacts, and remote homology (as shown in the ESM-1b paper benchmarks)
Comparative representation quality vs. other BioLM encoders:
- Versus smaller ESM-2 encoders (8M/35M/150M), ESM-1b generally provides stronger embeddings for remote homology detection, secondary structure probes, and long-range contact probes, particularly when paired with simple downstream heads
- Versus ESM-2 650M, ESM-1b has slightly weaker structure-aware representations on very large, diverse datasets but remains a practical default where historical benchmarks, tooling, and stability are preferred over marginal accuracy gains
Use as a feature source vs. structure-prediction models:
- ESM-1b is not a 3D structure predictor; it exposes sequence-level and token-level representations and logits suited for downstream models
- When paired with lightweight supervised heads, ESM-1b features can match or exceed alignment-based baselines (e.g., CCMpred-style contacts, sequence-profile-based secondary structure), but will not reach the end-to-end 3D accuracy of models such as AlphaFold2 or ESMFold
Masked prediction and mutation scoring performance:
- Within BioLM’s catalog, ESM-1b offers high-quality amino-acid distributions at masked positions, making it preferable to lightweight encoders for mutation-impact ranking and masked-design loops
- Newer Evo/E1-series generative models can achieve similar or better ranking quality at comparable compute cost, but ESM-1b remains better characterized and more widely benchmarked for variant-effect and mutational scanning tasks
Applications¶
Protein fitness and mutational scanning surrogates for engineering campaigns: use ESM1b log-likelihoods or masked-token predictions via the predictor endpoint to prioritize single and combinatorial mutations in directed evolution or library design, reducing wet-lab screening when optimizing stability, activity, or manufacturability; most informative when the target function is well represented in natural sequence space and less so for highly novel functions (a masked-marginal scoring sketch follows at the end of this section).
Structure- and contact-aware feature generation for downstream ML models: use the encoder endpoint to extract mean, per-token, BOS, and attention-derived embeddings from selected layers as input features to custom supervised models (e.g., stability predictors, aggregation-risk classifiers, interface-scoring models) that require structural signal without running full 3D prediction on every variant, enabling large-scale in silico screening of thousands to millions of protein designs.
Remote homology and scaffolding search in large sequence libraries: embed proprietary or public protein sequence collections with ESM1b and run nearest-neighbor or clustering methods on the resulting vectors to detect remote structural relatives and alternative scaffolds that simple sequence identity thresholds miss, supporting reuse of known assays and expression systems; less suitable when precise domain boundaries or high-quality alignments are the primary requirement.
MSA-free pre-filtering for protein design and variant triage: apply ESM1b log-probabilities or embeddings to very large mutational or generative libraries where building robust MSAs is infeasible (for example, highly diverse synthetic libraries or metagenomic-like spaces), quickly discarding sequences that are strongly out-of-distribution with respect to evolutionary statistics and focusing experimental campaigns on candidates more likely to be foldable and functional
Integration into automated protein engineering pipelines: incorporate ESM1b sequence embeddings and log-likelihoods as one stage in multi-model ranking stacks (alongside stability, solubility, and developability predictors) to orchestrate design–test–learn loops for industrial enzymes and other production proteins; ESM1b provides a general-purpose sequence representation capturing secondary and tertiary trends but does not replace task-specific models or experimental validation
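The sketch below, referenced in the first application above, illustrates masked-marginal mutation ranking against the Predict schema: mask one position, convert the logits at the mask to log-probabilities, and score each substitution as log p(mutant) minus log p(wild type). The call_predict helper is a placeholder for the request shown in the Predict section and is assumed to return results[0] for a single-item request.

```python
# Sketch of masked-marginal mutation scoring (higher = more plausible
# substitution relative to the wild-type residue at that position).
import math

def masked_marginal_scores(sequence, position, call_predict):
    """`position` is a 0-based index into the raw amino acid sequence."""
    wild_type = sequence[position]
    masked = sequence[:position] + "<mask>" + sequence[position + 1:]
    result = call_predict(masked)                       # placeholder API call
    mask_idx = result["sequence_tokens"].index("<mask>")
    row = result["logits"][mask_idx]
    vocab = result["vocab_tokens"]
    # Log-softmax over the vocabulary at the masked position.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    log_p = {tok: x - log_z for tok, x in zip(vocab, row)}
    wt_lp = log_p[wild_type]
    # Keep only single-letter amino acid tokens.
    return {aa: lp - wt_lp for aa, lp in log_p.items()
            if len(aa) == 1 and aa.isalpha()}
```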
Limitations¶
Maximum sequence length: Input sequence strings are limited to ESM1bParams.max_sequence_len=1022 amino acids (excluding BOS/EOS). Longer proteins must be truncated, split into overlapping windows, or summarized with domain-level sequences (see the windowing and batching sketch at the end of this section). This constraint applies to all request types (ESM1bEncodeRequest, ESM1bPredictRequest, ESM1bLogProbRequest).
Batch size and throughput: Each request can include at most ESM1bParams.batch_size=8 items in items. Large sequence collections must be sharded across multiple API calls. The model is relatively large (~650M parameters), so high-volume or latency-sensitive workloads may require additional engineering (asynchronous batching, caching, pre-computed embeddings).
Representation-only model (no 3D structure): ESM-1b provides sequence-level encodings and token-level scores via the encoder endpoint (repr_layers and include options such as mean, per_token, bos, logits, attentions), but does not perform explicit 3D structure prediction. It encodes information correlated with secondary and tertiary structure, but for detailed structure (backbone coordinates, side-chain conformations) and stability ranking of a small candidate set, structure-specific models (e.g., AlphaFold2, ESMFold, antibody-specific structure models) are usually more appropriate.
Masked language modeling, not full generative design: The predictor endpoint (ESM1bPredictRequest) requires at least one <mask> token in each sequence (enforced by SingleOrMoreOccurrencesOf(token="<mask>")) and returns per-token logits over the vocabulary. It is optimized for scoring substitutions and local infills around <mask>, not for unconditional de novo generation or long autoregressive design. For large-scale sequence generation, causal-LM-style models (e.g., ProGen2, ProtGPT2) or diffusion-based backbone generators are often better suited.
Embeddings are generic, not task-specific: Encodings returned via the encoder endpoint (embeddings, per_token_embeddings, bos_embeddings, attentions, and optional logits with vocab_tokens) capture broad evolutionary and structural signals learned from UR50/S, but they are not specialized for any one downstream task. For high-precision applications (e.g., quantitative activity prediction, developability screening, antibody optimization), these representations typically need to be combined with domain-specific models, MSAs, structural features, or fine-tuning.
Scope of biological generalization: ESM-1b is trained on natural protein sequences using a 20-amino-acid-centric vocabulary plus a small set of extra tokens (validated by AAExtendedPlusExtra on sequence). It may perform poorly on highly non-natural sequences (e.g., heavily synthetic repeats, non-standard amino acids encoded as arbitrary symbols, extremely long low-complexity regions) and does not model DNA/RNA. For nucleic acid-focused applications or very unusual protein-like polymers, dedicated DNA/RNA models or custom training are typically required.
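A sketch of working around the two limits above: split long sequences into overlapping windows of at most 1,022 residues and shard the resulting items into batches of at most 8 per request. The window overlap of 128 is an arbitrary illustrative choice, not a documented recommendation.

```python
# Sketch of windowing and batching helpers for the limits noted above.
def overlapping_windows(sequence, max_len=1022, overlap=128):
    """Yield windows no longer than max_len, overlapping by `overlap`."""
    if len(sequence) <= max_len:
        yield sequence
        return
    step = max_len - overlap
    for start in range(0, len(sequence), step):
        yield sequence[start:start + max_len]
        if start + max_len >= len(sequence):
            break

def batches(items, batch_size=8):
    """Shard a list of request items into API-sized batches."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Example:
# items = [{"sequence": w} for w in overlapping_windows(long_sequence)]
# for chunk in batches(items):
#     ...  # one /api/v3/esm1b/encode/ call per chunk
```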
How We Use It¶
ESM1b underpins many of our protein design and optimization workflows by providing rich single-sequence embeddings and structure-aware features that plug into standardized APIs for screening, ranking, and iterative optimization. Teams use these embeddings as a shared representation across enzyme engineering, antibody maturation, and developability assessment, combining them with supervised models (e.g., stability, activity, immunogenicity), structure-derived metrics from separate structure tools, and assay readouts to prioritize variants for synthesis and multi-round improvement.
ESM1b embeddings integrate with downstream property predictors (stability, expression, binding, aggregation) to enable high-throughput in silico triage and reduce experimental burden.
In antibody and enzyme campaigns, ESM1b-based features are combined with structural models, physicochemical summaries (charge, size, hydrophobicity), and assay results to drive multi-round design–test–learn cycles through scalable, API-driven workflows.
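As a concrete illustration of the triage pattern described above, the sketch below fits a simple supervised head on mean embeddings against a measured property. The embedding matrix and assay values are hypothetical placeholders, not BioLM data or a BioLM-provided model.

```python
# Illustrative sketch: supervised property head on ESM1b mean embeddings.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X = np.random.rand(200, 1280)  # placeholder: mean embeddings, one row per variant
y = np.random.rand(200)        # placeholder: measured stability/activity values

model = Ridge(alpha=1.0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f}")

# In practice, X would be built from /api/v3/esm1b/encode/ mean embeddings
# and the fitted model used to rank new variants before synthesis.
```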
References¶
Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C. L., Ma, J., & Fergus, R. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences of the United States of America, 118(15), e2016239118.
