AbLang-2 is an antibody-specific language model designed to address germline bias in antibody sequence prediction, tuned for accurate inference of the non-germline residues critical to antigen-binding affinity and specificity. Trained on 35.6M unpaired and 1.26M paired VH/VL sequences from the Observed Antibody Space (OAS), AbLang-2 employs a modified masked language modeling approach with focal loss to improve prediction accuracy, enabling identification of valid antibody mutations with high cumulative probability. The API supports efficient mutation suggestion for antibody design, optimization, and affinity maturation tasks in therapeutic antibody discovery workflows.
Predict¶
Predict likelihood for these input sequences
- POST /api/v3/ablang2/predict/¶
Predict endpoint for AbLang-2.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, optional) — Configuration parameters:
include (string, optional, default: "likelihood") — Type of prediction output; must be "likelihood"
items (array of objects, min: 1, max: 32) — Input antibody sequences:
heavy (string, min length: 1, max length: 1024, required) — Heavy chain amino acid sequence using extended amino acid alphabet
light (string, min length: 1, max length: 1024, required) — Light chain amino acid sequence using extended amino acid alphabet
Example request:
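A minimal request sketch using Python's standard library. The base URL (`https://biolm.ai`) and the truncated example sequences are assumptions for illustration; substitute your own API token and antibody sequences.

```python
import json
import urllib.request

API_URL = "https://biolm.ai/api/v3/ablang2/predict/"  # assumed base URL
API_KEY = "YOUR_API_KEY"  # placeholder; use your real token

# Illustrative heavy/light chain fragments (not real candidates).
payload = {
    "params": {"include": "likelihood"},
    "items": [
        {
            "heavy": "EVQLVESGGGLVQPGGSLRLSCAAS",
            "light": "DIQMTQSPSSLSASVGDRVTITCRAS",
        }
    ],
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Token {API_KEY}",
    },
    method="POST",
)
# with urllib.request.urlopen(request) as resp:  # uncomment with a valid key
#     result = json.load(resp)
```

The request body must satisfy the documented constraints: 1 to 32 items, each chain between 1 and 1024 residues.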
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
likelihood (array of arrays of floats, shape: [num_sequence_tokens, vocab_size]) — Token-wise likelihood scores
sequence_tokens (array of strings, length: num_sequence_tokens) — Sequence tokens corresponding to likelihood scores
vocab_tokens (array of strings, length: vocab_size) — Vocabulary tokens used in likelihood calculation
Example response:
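An illustrative, made-up response showing the documented shape: one `likelihood` row per sequence token, one column per vocabulary token. The token sets and scores are invented for brevity; the real model returns its full vocabulary.

```python
# Hypothetical predict response for a 3-token sequence and a 4-token
# vocabulary; values are fabricated to demonstrate the shape only.
example_response = {
    "results": [
        {
            "sequence_tokens": ["E", "V", "Q"],
            "vocab_tokens": ["A", "E", "Q", "V"],
            "likelihood": [
                [-3.1, -0.2, -4.0, -2.7],  # scores for position 1 ("E")
                [-2.9, -3.8, -3.5, -0.1],  # position 2 ("V")
                [-3.3, -2.6, -0.3, -4.1],  # position 3 ("Q")
            ],
        }
    ],
}

result = example_response["results"][0]
# Each likelihood row has one score per vocabulary token.
rows, cols = len(result["likelihood"]), len(result["likelihood"][0])
```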
Encode¶
Generate embeddings for input sequences
- POST /api/v3/ablang2/encode/¶
Encode endpoint for AbLang-2.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, optional) — Configuration parameters:
include (string, optional, default: "seqcoding", enum: ["seqcoding", "rescoding"]) — Encoding type to include in response
align (boolean, optional, default: false) — Alignment flag, applicable only if “rescoding” is selected
items (array of objects, min: 1, max: 32) — Input antibody sequences:
heavy (string, required, min length: 1, max length: 1024) — Heavy chain amino acid sequence, validated against extended amino acid alphabet
light (string, required, min length: 1, max length: 1024) — Light chain amino acid sequence, validated against extended amino acid alphabet
Example request:
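A sketch of an encode request body asking for aligned per-residue embeddings; the sequences are illustrative fragments, and the body is sent as a POST with the same headers shown above.

```python
import json

# Hypothetical encode request: "rescoding" with alignment enabled.
encode_payload = {
    "params": {"include": "rescoding", "align": True},
    "items": [
        {
            "heavy": "EVQLVESGGGLVQPGGSLRLSCAAS",
            "light": "DIQMTQSPSSLSASVGDRVTITCRAS",
        },
        {
            "heavy": "QVQLQQSGAELVRPGASVKLSCKAS",
            "light": "EIVLTQSPATLSLSPGERATLSCRAS",
        },
    ],
}

body = json.dumps(encode_payload)  # POST body for /api/v3/ablang2/encode/
```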
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
seqcoding (array of floats, size: embedding dimension) — Sequence-level embeddings for the input antibody sequences
Example response:
Performance¶
Batch size: up to 32 antibody sequence pairs (heavy + light)
Sequence length limit: 1024 amino acids per heavy or light chain
Optimized for CPU inference (2 vCPU, 4 GB RAM); GPU acceleration not required
Typical inference speed: sub-second per full batch of 32 sequence pairs
Likelihood estimation, sequence restoration, and embedding generation (seqcoding and rescoding) complete in milliseconds per sequence, making AbLang-2 substantially faster than larger transformer-based antibody models such as IgT5 Paired and IgBert Paired
This efficiency comes from the smaller model size and optimized inference pipeline, without significant loss in embedding quality or predictive accuracy for antibody design and optimization tasks
Applications¶
Antibody sequence optimization for improved binding affinity, enabling researchers and biotech companies to rapidly refine antibody candidates by computationally predicting beneficial mutations; valuable for accelerating antibody maturation processes and reducing experimental screening costs; not optimal for predicting antibody stability under manufacturing conditions.
In silico screening of antibody libraries to identify high-affinity binders, allowing biotech companies to efficiently prioritize antibody candidates before costly wet-lab validation; particularly useful in therapeutic antibody discovery pipelines to reduce experimental overhead; less effective for predicting complex antibody-antigen interactions involving significant conformational changes.
Computational humanization of non-human antibodies, assisting biotech firms in rapidly identifying sequence modifications that reduce immunogenicity while preserving antigen-binding properties; valuable for streamlining therapeutic antibody development and regulatory approval; limited in accurately predicting immunogenicity impacts arising from post-translational modifications.
Identification of antibody sequence liabilities such as aggregation-prone motifs or instability hotspots, enabling early-stage filtering of problematic candidates; beneficial for reducing downstream manufacturing and formulation issues; not suitable for comprehensive structural stability predictions requiring detailed 3D modeling.
Rapid clustering and classification of antibody repertoires from sequencing data, facilitating biotech companies’ analysis of immune response diversity and identification of promising therapeutic candidates; valuable in vaccine response profiling and antibody discovery from immune libraries; limited accuracy when analyzing heavily mutated or highly divergent antibody sequences.
Limitations¶
Maximum Sequence Length: The API accepts antibody sequences with a maximum length of 1024 amino acids for both heavy and light chains. Longer sequences must be truncated or split before submission.
Batch Size: Requests are limited to a maximum of 32 antibody sequence pairs per API call. For larger datasets, split your sequences into multiple batches.
AbLang-2 embeddings (seqcoding and rescoding) are specialized for antibody sequences and may not generalize well to non-antibody proteins or unrelated biological sequences.
The restore functionality requires at least one unknown amino acid position marked explicitly with *; sequences without unknown positions cannot utilize this feature.
AbLang-2 predictions do not provide structural information (e.g., antibody 3D structure predictions or CDR loop conformations). For antibody structure prediction tasks, consider specialized tools such as NanoBodyBuilder or ABodyBuilder.
The model does not directly support antibody-antigen binding affinity prediction or epitope mapping; complementary predictive tools or experimental validation may be required for these use cases.
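Given the 32-pair batch limit, larger libraries can be chunked client-side before submission. A minimal sketch; the helper name is ours, not part of the API.

```python
def batch_items(items, batch_size=32):
    """Yield successive chunks of at most `batch_size` sequence pairs."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]

# e.g. a library of 70 pairs splits into batches of 32, 32, and 6,
# each small enough for a single API call.
library = [{"heavy": "EVQL", "light": "DIQM"}] * 70
batches = list(batch_items(library))
```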
How We Use It¶
BioLM integrates AbLang-2 into antibody optimization workflows, enabling rapid sequence evaluation and targeted maturation strategies. Its standardized API facilitates scalable inference across large antibody libraries, streamlining candidate selection based on predicted binding affinity and specificity. By combining AbLang-2 outputs with complementary structural prediction models and biophysical property calculators, BioLM accelerates antibody engineering cycles, improving success rates and reducing experimental overhead.
Accelerates antibody candidate prioritization through scalable inference
Integrates seamlessly with structural modeling and biophysical property prediction tools for comprehensive antibody optimization
References¶
Olsen, T. H., Moal, I. H., & Deane, C. M. (2022). AbLang: an antibody language model for completing antibody sequences. Bioinformatics Advances.
Olsen, T. H., Moal, I. H., & Deane, C. M. (2024). Addressing the antibody germline bias and its effect on language models for improved antibody design. Bioinformatics.
Olsen, T. H., Boyles, F., & Deane, C. M. (2022). Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Science.
Raybould, M. I. J., Marks, C., Lewis, A. P., Shi, J., Bujotzek, A., Taddese, B., & Deane, C. M. (2020). Thera-SAbDab: the Therapeutic Structural Antibody Database. Nucleic Acids Research.
Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., Smetanin, N., Verkuil, R., Kabeli, O., Shmueli, Y., dos Santos Costa, A., Fazel-Zarandi, M., Sercu, T., Candido, S., & Rives, A. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science.
Ruffolo, J. A., Gray, J. J., & Sulam, J. (2021). Deciphering antibody affinity maturation with language models and weakly supervised learning. arXiv preprint.
Prihoda, D., Maamary, J., Waight, A., Juan, V., Fayadat-Dilman, L., Svozil, D., & Bitton, D. A. (2022). BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning. mAbs.
Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C. L., Ma, J., & Fergus, R. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint.