AbLang-2 is an antibody-specific language model optimized to address germline bias in antibody sequence prediction, specifically tuned for accurate inference of non-germline residues critical for antigen-binding affinity and specificity. Trained on 35.6M unpaired and 1.26M paired VH/VL sequences from Observed Antibody Space (OAS), AbLang-2 employs a modified masked language modeling approach with focal loss to improve prediction accuracy, enabling identification of valid antibody mutations with high cumulative probability. The API supports efficient mutation suggestion for antibody design, optimization, and affinity maturation tasks in therapeutic antibody discovery workflows.

Predict

Predict likelihood for these input sequences

python
from biolmai import BioLM
response = BioLM(
    entity="ablang2",
    action="predict",
    params={
      "include": "likelihood"
    },
    items=[
      {
        "heavy": "QVQLVQSGAEVKKPGASVKVSCK",
        "light": "DIQMTQSPASLSASVGDRVTITC"
      },
      {
        "heavy": "EVQLVESGGGLVKPGGSLKLSCA",
        "light": "KVVMTQSPDSLSASLGDRVTITC"
      }
    ]
)
print(response)
bash
curl -X POST https://biolm.ai/api/v3/ablang2/predict/ \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "params": {
    "include": "likelihood"
  },
  "items": [
    {
      "heavy": "QVQLVQSGAEVKKPGASVKVSCK",
      "light": "DIQMTQSPASLSASVGDRVTITC"
    },
    {
      "heavy": "EVQLVESGGGLVKPGGSLKLSCA",
      "light": "KVVMTQSPDSLSASLGDRVTITC"
    }
  ]
}'
python
import requests

url = "https://biolm.ai/api/v3/ablang2/predict/"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "params": {
        "include": "likelihood"
    },
    "items": [
        {
            "heavy": "QVQLVQSGAEVKKPGASVKVSCK",
            "light": "DIQMTQSPASLSASVGDRVTITC"
        },
        {
            "heavy": "EVQLVESGGGLVKPGGSLKLSCA",
            "light": "KVVMTQSPDSLSASLGDRVTITC"
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
r
library(httr)

url <- "https://biolm.ai/api/v3/ablang2/predict/"
headers <- c("Authorization" = "Token YOUR_API_KEY", "Content-Type" = "application/json")
body <- list(
  params = list(
    include = "likelihood"
  ),
  items = list(
    list(
      heavy = "QVQLVQSGAEVKKPGASVKVSCK",
      light = "DIQMTQSPASLSASVGDRVTITC"
    ),
    list(
      heavy = "EVQLVESGGGLVKPGGSLKLSCA",
      light = "KVVMTQSPDSLSASLGDRVTITC"
    )
  )
)

res <- POST(url, add_headers(.headers = headers), body = body, encode = "json")
print(content(res))
POST /api/v3/ablang2/predict/

Predict endpoint for AbLang-2.

Request

  • params (object, optional) — Configuration parameters:

    • include (string, default: “likelihood”) — Type of prediction output; must be “likelihood”

  • items (array of objects, min: 1, max: 32) — Input antibody sequences:

    • heavy (string, min length: 1, max length: 1024, required) — Heavy chain amino acid sequence using extended amino acid alphabet

    • light (string, min length: 1, max length: 1024, required) — Light chain amino acid sequence using extended amino acid alphabet

Example request:

http
POST /api/v3/ablang2/predict/ HTTP/1.1
Host: biolm.ai
Authorization: Token YOUR_API_KEY
Content-Type: application/json

{
  "params": {
    "include": "likelihood"
  },
  "items": [
    {
      "heavy": "QVQLVQSGAEVKKPGASVKVSCK",
      "light": "DIQMTQSPASLSASVGDRVTITC"
    },
    {
      "heavy": "EVQLVESGGGLVKPGGSLKLSCA",
      "light": "KVVMTQSPDSLSASLGDRVTITC"
    }
  ]
}

Response

  • results (array of objects) — One result per input item, in the order requested:

    • likelihood (array of arrays of floats, shape: [num_sequence_tokens, vocab_size]) — Token-wise likelihood scores

    • sequence_tokens (array of strings, length: num_sequence_tokens) — Sequence tokens corresponding to likelihood scores

    • vocab_tokens (array of strings, length: vocab_size) — Vocabulary tokens used in likelihood calculation

Example response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "likelihood": [
        [
          -4.1002655029296875,
          -3.792412519454956,
          "... (truncated for documentation)"
        ],
        [
          -4.9303059577941895,
          -2.364351749420166,
          "... (truncated for documentation)"
        ],
        "... (truncated for documentation)"
      ],
      "sequence_tokens": [
        "<",
        "Q",
        "... (truncated for documentation)"
      ],
      "vocab_tokens": [
        "M",
        "R",
        "... (truncated for documentation)"
      ]
    },
    {
      "likelihood": [
        [
          -3.3265843391418457,
          -5.960440158843994,
          "... (truncated for documentation)"
        ],
        [
          -3.3216309547424316,
          -3.2128522396087646,
          "... (truncated for documentation)"
        ],
        "... (truncated for documentation)"
      ],
      "sequence_tokens": [
        "<",
        "E",
        "... (truncated for documentation)"
      ],
      "vocab_tokens": [
        "M",
        "R",
        "... (truncated for documentation)"
      ]
    }
  ]
}
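
The likelihood matrix can be turned into per-position residue suggestions. The sketch below is a minimal illustration, not the service's own post-processing: it assumes each row of likelihood lines up with the entry at the same index in sequence_tokens and each column with vocab_tokens, as the shapes above suggest. A softmax is applied per row, which yields a normalized distribution whether the scores are raw logits or already log-probabilities. The helper names and the four-token toy vocabulary are hypothetical.

```python
import math


def softmax(scores):
    """Convert a row of scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_per_position(result, k=3):
    """Return (input_token, [(vocab_token, probability), ...]) per position."""
    vocab = result["vocab_tokens"]
    suggestions = []
    for token, row in zip(result["sequence_tokens"], result["likelihood"]):
        probs = softmax(row)
        ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
        suggestions.append((token, ranked[:k]))
    return suggestions


# Toy result with a hypothetical 4-token vocabulary (real responses are larger
# and may contain special tokens such as "<").
toy_result = {
    "sequence_tokens": ["Q", "V"],
    "vocab_tokens": ["A", "Q", "V", "L"],
    "likelihood": [
        [-2.0, 3.0, -1.0, 0.0],
        [-1.0, -2.0, 2.5, 1.0],
    ],
}

suggestions = top_k_per_position(toy_result, k=2)
for token, ranked in suggestions:
    print(token, [(t, round(p, 3)) for t, p in ranked])
```

Positions where the input residue is not among the top-ranked tokens are natural candidates for closer inspection during mutation design.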

Encode

Generate embeddings for input sequences

python
from biolmai import BioLM
response = BioLM(
    entity="ablang2",
    action="encode",
    params={
      "include": "seqcoding",
      "align": False
    },
    items=[
      {
        "heavy": "QVQLVQSGAEVKKQ",
        "light": "DVVMTQTPLSLPVTP"
      },
      {
        "heavy": "QVQLVESGGGSVQPGRSLR",
        "light": "EIVLTQSPGTLSLSPGERA"
      }
    ]
)
print(response)
bash
curl -X POST https://biolm.ai/api/v3/ablang2/encode/ \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "params": {
    "include": "seqcoding",
    "align": false
  },
  "items": [
    {
      "heavy": "QVQLVQSGAEVKKQ",
      "light": "DVVMTQTPLSLPVTP"
    },
    {
      "heavy": "QVQLVESGGGSVQPGRSLR",
      "light": "EIVLTQSPGTLSLSPGERA"
    }
  ]
}'
python
import requests

url = "https://biolm.ai/api/v3/ablang2/encode/"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "params": {
        "include": "seqcoding",
        "align": False  # Python boolean; serialized to JSON false by requests
    },
    "items": [
        {
            "heavy": "QVQLVQSGAEVKKQ",
            "light": "DVVMTQTPLSLPVTP"
        },
        {
            "heavy": "QVQLVESGGGSVQPGRSLR",
            "light": "EIVLTQSPGTLSLSPGERA"
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
r
library(httr)

url <- "https://biolm.ai/api/v3/ablang2/encode/"
headers <- c("Authorization" = "Token YOUR_API_KEY", "Content-Type" = "application/json")
body <- list(
  params = list(
    include = "seqcoding",
    align = FALSE
  ),
  items = list(
    list(
      heavy = "QVQLVQSGAEVKKQ",
      light = "DVVMTQTPLSLPVTP"
    ),
    list(
      heavy = "QVQLVESGGGSVQPGRSLR",
      light = "EIVLTQSPGTLSLSPGERA"
    )
  )
)

res <- POST(url, add_headers(.headers = headers), body = body, encode = "json")
print(content(res))
POST /api/v3/ablang2/encode/

Encode endpoint for AbLang-2.

Request

  • params (object, optional) — Configuration parameters:

    • include (string, optional, default: “seqcoding”, enum: [“seqcoding”, “rescoding”]) — Encoding type to include in response

    • align (boolean, optional, default: false) — Alignment flag, applicable only if “rescoding” is selected

  • items (array of objects, min: 1, max: 32) — Input antibody sequences:

    • heavy (string, required, min length: 1, max length: 1024) — Heavy chain amino acid sequence, validated against extended amino acid alphabet

    • light (string, required, min length: 1, max length: 1024) — Light chain amino acid sequence, validated against extended amino acid alphabet
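
The constraints listed above can be checked on the client before a request is sent. The following is a rough sketch under stated assumptions: build_encode_payload is a hypothetical helper, not part of the BioLM SDK, and it enforces only the limits documented here (the include enum, a boolean align, 1 to 32 items, and 1 to 1024 characters per chain). It does not validate the amino acid alphabet, because the exact set of accepted letters is not given in this section.

```python
import json

VALID_INCLUDE = {"seqcoding", "rescoding"}
MAX_ITEMS = 32
MAX_CHAIN_LENGTH = 1024


def build_encode_payload(items, include="seqcoding", align=False):
    """Validate inputs against the documented limits and return the payload."""
    if include not in VALID_INCLUDE:
        raise ValueError(f"include must be one of {sorted(VALID_INCLUDE)}")
    if not isinstance(align, bool):
        raise TypeError("align must be a boolean")
    if not 1 <= len(items) <= MAX_ITEMS:
        raise ValueError(f"items must contain 1 to {MAX_ITEMS} entries")
    for index, item in enumerate(items):
        for chain in ("heavy", "light"):
            sequence = item.get(chain)
            if not isinstance(sequence, str):
                raise ValueError(f"item {index}: '{chain}' is required")
            if not 1 <= len(sequence) <= MAX_CHAIN_LENGTH:
                raise ValueError(f"item {index}: '{chain}' length out of range")
    return {"params": {"include": include, "align": align}, "items": list(items)}


items = [
    {"heavy": "QVQLVQSGAEVKKQ", "light": "DVVMTQTPLSLPVTP"},
    {"heavy": "QVQLVESGGGSVQPGRSLR", "light": "EIVLTQSPGTLSLSPGERA"},
]
payload = build_encode_payload(items, include="rescoding", align=True)
body = json.dumps(payload)  # Python True/False become JSON true/false
print(body)
```

The returned dictionary can be passed as the json argument of requests.post, or its params and items entries handed to the SDK call shown earlier.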

Example request:

http
POST /api/v3/ablang2/encode/ HTTP/1.1
Host: biolm.ai
Authorization: Token YOUR_API_KEY
Content-Type: application/json

{
  "params": {
    "include": "seqcoding",
    "align": false
  },
  "items": [
    {
      "heavy": "QVQLVQSGAEVKKQ",
      "light": "DVVMTQTPLSLPVTP"
    },
    {
      "heavy": "QVQLVESGGGSVQPGRSLR",
      "light": "EIVLTQSPGTLSLSPGERA"
    }
  ]
}

Response

  • results (array of objects) — One result per input item, in the order requested:

    • seqcoding (array of floats, size: embedding dimension) — Sequence-level embeddings for the input antibody sequences

Example response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "seqcoding": [
        -0.21477425201138592,
        -0.12190765886129264,
        "... (truncated for documentation)"
      ]
    },
    {
      "seqcoding": [
        -0.23653832053072577,
        -0.12340051473963053,
        "... (truncated for documentation)"
      ]
    }
  ]
}
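
One common way to use sequence-level embeddings is to compare antibodies by cosine similarity. The sketch below assumes a response shaped like the example above, with one fixed-length seqcoding vector per item. The three-dimensional vectors are toy values chosen for readability and are not real model output, and cosine_similarity is an illustrative helper rather than an API feature.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy response with hypothetical 3-dimensional embeddings.
toy_response = {
    "results": [
        {"seqcoding": [1.0, 0.0, 1.0]},
        {"seqcoding": [1.0, 1.0, 0.0]},
    ]
}

vec_a, vec_b = (result["seqcoding"] for result in toy_response["results"])
similarity = cosine_similarity(vec_a, vec_b)
print(f"cosine similarity: {similarity:.3f}")
```

Pairwise similarities computed this way can feed a standard clustering routine when grouping repertoire sequences.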

Performance

  • Batch size: up to 32 antibody sequence pairs (heavy + light)

  • Sequence length limit: 1024 amino acids per heavy or light chain

  • Optimized for CPU inference (2 vCPU, 4 GB RAM); GPU acceleration not required

  • Typical inference speed: sub-second per batch (32 sequence pairs), significantly faster than larger transformer-based antibody models such as IgT5 Paired or IgBert Paired

  • Embedding generation (seqcoding and rescoding) performance is faster than comparable antibody embedding models (e.g., IgT5, IgBert) due to smaller model size and optimized inference pipeline

  • Likelihood estimation and sequence restoration tasks complete rapidly (milliseconds per sequence), outperforming larger antibody language models in latency-sensitive applications

  • AbLang-2 provides superior computational efficiency compared to larger antibody-focused transformer models (e.g., IgT5 Paired, IgBert Paired) without significant loss in embedding quality or predictive accuracy for antibody design and optimization tasks

Applications

  • Antibody sequence optimization for improved binding affinity, enabling researchers and biotech companies to rapidly refine antibody candidates by computationally predicting beneficial mutations; valuable for accelerating antibody maturation processes and reducing experimental screening costs; not optimal for predicting antibody stability under manufacturing conditions.

  • In silico screening of antibody libraries to identify high-affinity binders, allowing biotech companies to efficiently prioritize antibody candidates before costly wet-lab validation; particularly useful in therapeutic antibody discovery pipelines to reduce experimental overhead; less effective for predicting complex antibody-antigen interactions involving significant conformational changes.

  • Computational humanization of non-human antibodies, assisting biotech firms in rapidly identifying sequence modifications that reduce immunogenicity while preserving antigen-binding properties; valuable for streamlining therapeutic antibody development and regulatory approval; limited in accurately predicting immunogenicity impacts arising from post-translational modifications.

  • Identification of antibody sequence liabilities such as aggregation-prone motifs or instability hotspots, enabling early-stage filtering of problematic candidates; beneficial for reducing downstream manufacturing and formulation issues; not suitable for comprehensive structural stability predictions requiring detailed 3D modeling.

  • Rapid clustering and classification of antibody repertoires from sequencing data, facilitating biotech companies’ analysis of immune response diversity and identification of promising therapeutic candidates; valuable in vaccine response profiling and antibody discovery from immune libraries; limited accuracy when analyzing heavily mutated or highly divergent antibody sequences.

Limitations

  • Maximum Sequence Length: The API accepts heavy and light chain sequences of up to 1024 amino acids each. Longer sequences must be truncated or split before submission.

  • Batch Size: Requests are limited to a maximum of 32 antibody sequence pairs per API call. For larger datasets, split your sequences into multiple batches.

  • AbLang-2 embeddings (seqcoding and rescoding) are specialized for antibody sequences and may not generalize well to non-antibody proteins or unrelated biological sequences.

  • The restore functionality requires at least one unknown amino acid position marked explicitly with *; sequences without unknown positions cannot utilize this feature.

  • AbLang-2 predictions do not provide structural information (e.g., antibody 3D structure predictions or CDR loop conformations). For antibody structure prediction tasks, consider specialized tools such as NanoBodyBuilder or ABodyBuilder.

  • The model does not directly support antibody-antigen binding affinity prediction or epitope mapping; complementary predictive tools or experimental validation may be required for these use cases.
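
For datasets larger than the batch limit, the splitting step can be sketched as follows. This is a minimal illustration that assumes only the 32-item limit stated above; chunk_items is a hypothetical helper, and the placeholder sequences stand in for real heavy and light chains.

```python
MAX_ITEMS_PER_REQUEST = 32  # batch limit stated in the list above


def chunk_items(items, batch_size=MAX_ITEMS_PER_REQUEST):
    """Split a list of heavy/light pairs into request-sized batches, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[start:start + batch_size]
            for start in range(0, len(items), batch_size)]


# 70 placeholder pairs; real inputs would carry full chain sequences.
all_items = [{"heavy": "QVQLVQSGAEVKKQ", "light": "DVVMTQTPLSLPVTP"}
             for _ in range(70)]

batches = chunk_items(all_items)
print([len(batch) for batch in batches])
```

Each batch would then be sent as the items field of its own request. Since results come back in the order requested, concatenating the per-batch results preserves the original ordering.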

How We Use It

BioLM integrates AbLang-2 into antibody optimization workflows, enabling rapid sequence evaluation and targeted maturation strategies. Its standardized API supports scalable inference across large antibody libraries, so candidates can be ranked by sequence likelihood and compared through their embeddings. Because AbLang-2 does not predict binding affinity directly, BioLM combines its outputs with complementary structural prediction models and biophysical property calculators to shorten antibody engineering cycles and reduce experimental overhead.

  • Accelerates antibody candidate prioritization through scalable inference

  • Integrates seamlessly with structural modeling and biophysical property prediction tools for comprehensive antibody optimization

References