IgT5 Paired is an antibody-specific encoder-decoder transformer trained on 2 million natively paired heavy and light chain sequences from the Observed Antibody Space (OAS) dataset. The model generates residue-level embeddings and predicts masked antibody residues, reaching 62% and 87% residue recovery on heavy and light chain CDR3 loops, respectively. BioLM provides GPU-accelerated API inference for embedding generation, antibody property prediction, affinity maturation, and sequence optimization tasks.

Encode

Generate embeddings for input sequences

python
from biolmai import BioLM
response = BioLM(
    entity="igt5-paired",
    action="encode",
    params={
      "include": [
        "mean",
        "residue"
      ]
    },
    items=[
      {
        "heavy": "QVQLVESGGGLVQPGGSLRLSCAASGDIF",
        "light": "DIQMTQSPSSLSASVGDRVTITCRAS"
      },
      {
        "heavy": "EVQLVESGGDVVQPGRSLRLSCAASGFTF",
        "light": "DILMTQSPSSLSASVGDRVTITCRASK"
      }
    ]
)
print(response)

bash
curl -X POST https://biolm.ai/api/v3/igt5-paired/encode/ \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "params": {
    "include": [
      "mean",
      "residue"
    ]
  },
  "items": [
    {
      "heavy": "QVQLVESGGGLVQPGGSLRLSCAASGDIF",
      "light": "DIQMTQSPSSLSASVGDRVTITCRAS"
    },
    {
      "heavy": "EVQLVESGGDVVQPGRSLRLSCAASGFTF",
      "light": "DILMTQSPSSLSASVGDRVTITCRASK"
    }
  ]
}'

python
import requests

url = "https://biolm.ai/api/v3/igt5-paired/encode/"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "params": {
        "include": [
            "mean",
            "residue"
        ]
    },
    "items": [
        {
            "heavy": "QVQLVESGGGLVQPGGSLRLSCAASGDIF",
            "light": "DIQMTQSPSSLSASVGDRVTITCRAS"
        },
        {
            "heavy": "EVQLVESGGDVVQPGRSLRLSCAASGFTF",
            "light": "DILMTQSPSSLSASVGDRVTITCRASK"
        }
    ]
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())

r
library(httr)

url <- "https://biolm.ai/api/v3/igt5-paired/encode/"
headers <- c("Authorization" = "Token YOUR_API_KEY", "Content-Type" = "application/json")
body <- list(
  params = list(
    include = list(
      "mean",
      "residue"
    )
  ),
  items = list(
    list(
      heavy = "QVQLVESGGGLVQPGGSLRLSCAASGDIF",
      light = "DIQMTQSPSSLSASVGDRVTITCRAS"
    ),
    list(
      heavy = "EVQLVESGGDVVQPGRSLRLSCAASGFTF",
      light = "DILMTQSPSSLSASVGDRVTITCRASK"
    )
  )
)

res <- POST(url, add_headers(.headers = headers), body = body, encode = "json")
print(content(res))

POST /api/v3/igt5-paired/encode/

Encode endpoint for IgT5 Paired.

Request

  • params (object, optional) — Configuration parameters:

    • include (array of strings, default: ["mean"]) — Output embedding types to include:

      • "mean" — Mean embedding representation (default)

      • "residue" — Per-residue embedding representations

  • items (array of objects, min: 1, max: 8) — Input antibody sequences:

    • heavy (string, optional, min length: 1, max length: 256) — Heavy chain amino acid sequence (standard amino acid codes only)

    • light (string, optional, min length: 1, max length: 256) — Light chain amino acid sequence (standard amino acid codes only)

    • sequence (string, optional, min length: 1, max length: 512) — Single unpaired amino acid sequence (standard amino acid codes only)
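
The constraints listed above can be checked client-side before a request is sent. The following is a minimal sketch, not part of the BioLM SDK: `validate_items` and the constants are hypothetical names, and it assumes that "standard amino acid codes" means the 20 canonical one-letter codes.

```python
# Assumption: "standard amino acid codes" = the 20 canonical one-letter codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MAX_ITEMS = 8
MAX_LENGTHS = {"heavy": 256, "light": 256, "sequence": 512}


def validate_items(items):
    """Return a list of problems found; an empty list means the items pass."""
    problems = []
    if not 1 <= len(items) <= MAX_ITEMS:
        problems.append(f"expected 1 to {MAX_ITEMS} items, got {len(items)}")
    for index, item in enumerate(items):
        if not item:
            problems.append(f"item {index}: no sequence fields provided")
        for field, value in item.items():
            if field not in MAX_LENGTHS:
                problems.append(f"item {index}: unknown field {field!r}")
                continue
            if not 1 <= len(value) <= MAX_LENGTHS[field]:
                problems.append(
                    f"item {index}: {field} length {len(value)} is outside "
                    f"1..{MAX_LENGTHS[field]}"
                )
            invalid = sorted(set(value) - STANDARD_AMINO_ACIDS)
            if invalid:
                problems.append(
                    f"item {index}: {field} has invalid characters {invalid}"
                )
    return problems


if __name__ == "__main__":
    items = [{"heavy": "QVQLVESGGGLVQPGGSLRLSCAASGDIF",
              "light": "DIQMTQSPSSLSASVGDRVTITCRAS"}]
    print(validate_items(items))
```

The server remains the authority on what is accepted; a check like this only catches obvious problems before a network round trip.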

Example request:

http
POST /api/v3/igt5-paired/encode/ HTTP/1.1
Host: biolm.ai
Authorization: Token YOUR_API_KEY
Content-Type: application/json

{
  "params": {
    "include": [
      "mean",
      "residue"
    ]
  },
  "items": [
    {
      "heavy": "QVQLVESGGGLVQPGGSLRLSCAASGDIF",
      "light": "DIQMTQSPSSLSASVGDRVTITCRAS"
    },
    {
      "heavy": "EVQLVESGGDVVQPGRSLRLSCAASGFTF",
      "light": "DILMTQSPSSLSASVGDRVTITCRASK"
    }
  ]
}

Response

  • results (array of objects) — One result per input item, in the order requested:

    • embeddings (array of floats, size: 1024, optional) — Mean embedding vector for the provided antibody sequence; each float represents one dimension of the embedding space.

    • residue_embeddings (array of arrays of floats, shape: [sequence_length, 1024], optional) — Per-residue embedding vectors; each inner array corresponds to one residue in the input sequence, with 1024 dimensions per residue.

Example response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "embeddings": [
        0.006445258855819702,
        -0.02826373279094696,
        "... (truncated for documentation)"
      ],
      "residue_embeddings": [
        [
          0.0030039038974791765,
          -0.2898975610733032,
          "... (truncated for documentation)"
        ],
        [
          0.14874276518821716,
          -0.06095488741993904,
          "... (truncated for documentation)"
        ],
        "... (truncated for documentation)"
      ]
    },
    {
      "embeddings": [
        -0.026388389989733696,
        -0.032129086554050446,
        "... (truncated for documentation)"
      ],
      "residue_embeddings": [
        [
          -0.09284166991710663,
          -0.12178697437047958,
          "... (truncated for documentation)"
        ],
        [
          0.18048305809497833,
          -0.020936869084835052,
          "... (truncated for documentation)"
        ],
        "... (truncated for documentation)"
      ]
    }
  ]
}
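
A response with the shape described above can be sanity-checked before downstream use. The sketch below is illustrative only: `summarize_results` is a hypothetical helper, it assumes the JSON body has already been parsed into a Python dict, and it assumes real responses contain plain floats (the truncation strings above appear only in this documentation).

```python
def summarize_results(response_json, expected_dim=1024):
    """Check embedding dimensions and report what each result contains."""
    summaries = []
    for index, result in enumerate(response_json["results"]):
        mean = result.get("embeddings")              # optional mean vector
        residues = result.get("residue_embeddings")  # optional per-residue vectors
        if mean is not None and len(mean) != expected_dim:
            raise ValueError(
                f"result {index}: mean embedding has {len(mean)} dimensions"
            )
        if residues is not None:
            for vector in residues:
                if len(vector) != expected_dim:
                    raise ValueError(
                        f"result {index}: residue vector has "
                        f"{len(vector)} dimensions"
                    )
        summaries.append({
            "has_mean": mean is not None,
            "num_residue_vectors": len(residues) if residues is not None else 0,
        })
    return summaries


if __name__ == "__main__":
    # Synthetic stand-in for a parsed response body.
    fake = {"results": [{"embeddings": [0.0] * 1024,
                         "residue_embeddings": [[0.0] * 1024] * 5}]}
    print(summarize_results(fake))
```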

Performance

  • BioLM serves IgT5 Paired on NVIDIA T4 GPUs (16 GB VRAM) for GPU-accelerated embedding generation and downstream prediction tasks.

  • IgT5 Paired significantly outperforms general protein language models (e.g., ProtT5, ProtBert) on antibody-specific predictive tasks such as binding affinity prediction, achieving higher R² values (e.g., 0.297 vs. 0.186 for ProtT5 on binding affinity benchmarks).

  • Compared to other antibody-specific language models provided by BioLM (e.g., IgBert Paired, IgBert Unpaired), IgT5 Paired demonstrates superior sequence recovery accuracy, especially in highly variable regions such as CDR-H3 (62.0% residue recovery vs. 60.1% for IgBert Paired).

  • IgT5 Paired embeddings, derived from averaging heavy and light chain representations, provide robust cross-chain contextual information, resulting in improved predictive performance on antibody-antigen binding tasks compared to concatenated embeddings from unpaired models.

  • Due to its larger parameter count (approximately 3 billion parameters), IgT5 Paired inference is computationally more demanding than IgBert Paired (420 million parameters), but BioLM’s GPU optimizations ensure practical inference times suitable for high-throughput antibody engineering workflows.

  • Typical inference completion time is on the order of seconds per batch, enabling efficient embedding generation for large-scale antibody sequence analysis and design.

Applications

  • Antibody affinity maturation by predicting beneficial mutations in complementarity-determining regions (CDRs), enabling researchers to rapidly identify sequence variants with enhanced antigen-binding affinity; valuable for accelerating lead optimization in therapeutic antibody development pipelines, although predictions may require experimental validation to confirm functional improvements.

  • Identification of antibody sequence liabilities through sequence recovery analysis, enabling early detection of problematic residues that may negatively impact antibody stability or manufacturability; useful for biopharmaceutical companies aiming to streamline candidate selection and reduce downstream development risks, though it does not directly predict biophysical properties such as aggregation or viscosity.

  • Generation of antibody embeddings for predictive modeling of antigen binding affinity, enabling ranking and prioritization of antibody candidates in large-scale screening campaigns; particularly beneficial in computational antibody discovery workflows to reduce experimental screening burden, although embedding-based predictions alone may not fully capture complex structural interactions and may need to be complemented by structural modeling.

  • In silico humanization of therapeutic antibodies by assessing sequence similarity to human antibody repertoires, enabling identification of residues likely to elicit immunogenic responses; valuable for reducing immunogenicity risks early in therapeutic antibody design, though this method alone may not fully predict clinical immunogenicity outcomes.

  • Cross-chain pairing prediction for antibody variable regions by leveraging learned heavy-light chain interactions, enabling reconstruction of native antibody pairs from bulk-sequenced repertoires; particularly valuable for single-cell immune repertoire analysis and antibody repertoire mining, although accuracy may diminish for highly diverse or novel antibody sequences.
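
One simple way to use mean embeddings for candidate prioritization is to compare them against the embedding of a reference antibody. The sketch below ranks candidates by cosine similarity. This is a generic similarity heuristic swapped in for illustration, not an affinity predictor and not a BioLM API; the helper names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(reference, candidates):
    """Sort (name, similarity) pairs from most to least similar to the reference.

    `candidates` maps a candidate name to its mean embedding vector.
    """
    scored = [
        (name, cosine_similarity(reference, vector))
        for name, vector in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-dimensional vectors stand in for 1024-dimensional mean embeddings.
    reference = [1.0, 0.0]
    candidates = {"ab_1": [0.0, 1.0], "ab_2": [2.0, 0.0], "ab_3": [1.0, 1.0]}
    for name, score in rank_by_similarity(reference, candidates):
        print(name, round(score, 3))
```

Similarity to a known binder is only a proxy; a supervised model trained on measured affinities would normally replace the cosine score in a real workflow.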

Limitations

  • Maximum Sequence Length: The heavy and light chains each have a maximum length of 256 amino acids, while the sequence input for unpaired sequences has a maximum length of 512 amino acids.

  • Batch Size: Each API request accepts between 1 and 8 entries in the items array.

  • Input Type Constraints: The paired model requires both heavy and light chains to be provided; the unpaired model requires a single sequence input. Mixing these inputs within a single request is not allowed.

  • The IgT5 Paired model is specifically optimized for antibody variable region sequences. Performance on general protein sequences, or sequences significantly diverging from typical antibody variable regions, may be suboptimal.

  • IgT5 embeddings are effective for predicting antibody binding affinity but may not be optimal for predicting general protein properties (e.g., expression levels), where general protein language models (e.g., ProtT5) may perform better.

  • IgT5 Paired embeddings encode cross-chain features useful for tasks involving paired antibody sequences. For tasks involving only single-chain sequences, the IgT5 Unpaired model may be more appropriate.
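
Because a request accepts at most 8 items, larger sequence sets need to be split across several requests. The following is a minimal sketch of that batching, assuming the request body shape shown in the Encode examples; `batch_items` and `build_payloads` are hypothetical helpers, not part of the BioLM SDK.

```python
def batch_items(items, max_batch_size=8):
    """Split a list of items into consecutive chunks of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        items[start:start + max_batch_size]
        for start in range(0, len(items), max_batch_size)
    ]


def build_payloads(items, include=("mean",)):
    """Build one request body per batch, in the original item order."""
    return [
        {"params": {"include": list(include)}, "items": batch}
        for batch in batch_items(items)
    ]


if __name__ == "__main__":
    pairs = [
        {"heavy": "QVQLVESGGGLVQPGGSLRLSCAASGDIF",
         "light": "DIQMTQSPSSLSASVGDRVTITCRAS"}
    ] * 19
    for payload in build_payloads(pairs, include=("mean", "residue")):
        print(len(payload["items"]))
```

Each payload can then be sent as the JSON body of a separate POST request, for example with `requests.post(url, headers=headers, json=payload)` as in the Encode example.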

How We Use It

BioLM integrates IgT5 Paired into antibody engineering workflows to enable rapid and accurate prediction of antibody binding affinity, stability, and developability directly from paired heavy-light chain sequences. This capability accelerates iterative antibody optimization by providing actionable insights for candidate selection and filtering, significantly reducing experimental cycles and associated costs. By combining IgT5 Paired embeddings with BioLM’s antibody-specific predictive models and generative design tools, research teams can quickly identify lead candidates with improved biophysical properties and therapeutic potential.

  • Integrates seamlessly with BioLM’s generative antibody design and multi-parameter optimization models

  • Enables rapid prioritization of antibody variants, shortening discovery timelines and lowering experimental overhead
