NanoBodyBuilder2 is a GPU-accelerated transformer model designed specifically for nanobody sequence infilling, trained on 10 million camelid-derived sequences from the INDI database. It provides positional amino acid predictions for masked residues, enabling accurate mutational exploration without relying on germline-based alignments. Typical applications include nanobody humanization, affinity optimization, and stability prediction for therapeutic discovery and bioengineering tasks.

Predict

Predict the 3D structure (in PDB format) of nanobody heavy-chain sequences.

python
from biolmai import BioLM
response = BioLM(
    entity="nanobodybuilder2",
    action="predict",
    params={},
    items=[
      {
        "H": "QVQLVQSGAEVKKPGASVKVSCKVSGYTSPTTIHWVRQAPGKGLEWMGGISPYRGDTIYAQKFQGRVTMTEDTSTDTAYMELSSLKSEDTAVYYCARDGYSSGYYGMDVWGQGTTVTVSS"
      },
      {
        "H": "QVQLVQSGAEVKKPGASVKVSCKVSGYTSPTTIHWVRQAPGKGLEWMGGINAGTGDTIYAQKFQGRVTMTEDTSTDTAYMELSSLKSEDTAVYYCARDGYSSGYYGMDVWGQGTTVTVSS"
      }
    ]
)
print(response)

bash
curl -X POST https://biolm.ai/api/v3/nanobodybuilder2/predict/ \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "items": [
    {
      "H": "QVQLVQSGAEVKKPGASVKVSCKVSGYTSPTTIHWVRQAPGKGLEWMGGISPYRGDTIYAQKFQGRVTMTEDTSTDTAYMELSSLKSEDTAVYYCARDGYSSGYYGMDVWGQGTTVTVSS"
    },
    {
      "H": "QVQLVQSGAEVKKPGASVKVSCKVSGYTSPTTIHWVRQAPGKGLEWMGGINAGTGDTIYAQKFQGRVTMTEDTSTDTAYMELSSLKSEDTAVYYCARDGYSSGYYGMDVWGQGTTVTVSS"
    }
  ],
  "params": {}
}'

python
import requests

url = "https://biolm.ai/api/v3/nanobodybuilder2/predict/"
headers = {
    "Authorization": "Token YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "items": [
        {
            "H": "QVQLVQSGAEVKKPGASVKVSCKVSGYTSPTTIHWVRQAPGKGLEWMGGISPYRGDTIYAQKFQGRVTMTEDTSTDTAYMELSSLKSEDTAVYYCARDGYSSGYYGMDVWGQGTTVTVSS"
        },
        {
            "H": "QVQLVQSGAEVKKPGASVKVSCKVSGYTSPTTIHWVRQAPGKGLEWMGGINAGTGDTIYAQKFQGRVTMTEDTSTDTAYMELSSLKSEDTAVYYCARDGYSSGYYGMDVWGQGTTVTVSS"
        }
    ],
    "params": {}
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())

r
library(httr)

url <- "https://biolm.ai/api/v3/nanobodybuilder2/predict/"
headers <- c("Authorization" = "Token YOUR_API_KEY", "Content-Type" = "application/json")
body <- list(
  items = list(
    list(
      H = "QVQLVQSGAEVKKPGASVKVSCKVSGYTSPTTIHWVRQAPGKGLEWMGGISPYRGDTIYAQKFQGRVTMTEDTSTDTAYMELSSLKSEDTAVYYCARDGYSSGYYGMDVWGQGTTVTVSS"
    ),
    list(
      H = "QVQLVQSGAEVKKPGASVKVSCKVSGYTSPTTIHWVRQAPGKGLEWMGGINAGTGDTIYAQKFQGRVTMTEDTSTDTAYMELSSLKSEDTAVYYCARDGYSSGYYGMDVWGQGTTVTVSS"
    )
  ),
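  # A named empty list serializes to the JSON object {}; a bare list() would serialize to []
  params = setNames(list(), character(0))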
)

res <- POST(url, add_headers(.headers = headers), body = body, encode = "json")
print(content(res))

POST /api/v3/nanobodybuilder2/predict/

Predict endpoint for NanoBodyBuilder2.

Request Headers:

  • Authorization: Token YOUR_API_KEY

  • Content-Type: application/json

Request

  • params (object, optional) — Configuration parameters:

    • plddt (bool, default: False) — Whether to compute per-residue pLDDT confidence scores

    • seed (int, optional, default: 42) — Random seed for reproducibility

  • items (array of objects, min: 1, max: 1) — Input sequences:

    • H (string, required, min length: 1, max length: 2048) — Heavy-chain amino acid sequence, using extended amino acid alphabet

    • L (string, required, min length: 1, max length: 2048) — Light-chain amino acid sequence, using extended amino acid alphabet
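The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the BioLM SDK: `build_payload`, `MAX_LEN`, and `MAX_ITEMS` are hypothetical names, and the sketch assumes that `L` only needs checking when it is present, since the examples on this page send `H` alone.

```python
from typing import Any

MAX_LEN = 2048   # documented maximum length for H and L
MAX_ITEMS = 1    # documented maximum number of items per request


def build_payload(items: list[dict[str, str]],
                  plddt: bool = False,
                  seed: int = 42) -> dict[str, Any]:
    """Assemble a request body, raising ValueError if a documented limit is violated."""
    if not 1 <= len(items) <= MAX_ITEMS:
        raise ValueError(f"expected 1..{MAX_ITEMS} items, got {len(items)}")
    for index, item in enumerate(items):
        if "H" not in item:
            raise ValueError(f"item {index} has no heavy-chain sequence 'H'")
        for chain in ("H", "L"):
            seq = item.get(chain)
            if seq is None:
                continue  # assumption: L is only validated when supplied
            if not isinstance(seq, str) or not 1 <= len(seq) <= MAX_LEN:
                raise ValueError(
                    f"item {index}: '{chain}' must be a string of 1..{MAX_LEN} residues"
                )
    return {
        "items": [dict(item) for item in items],
        "params": {"plddt": bool(plddt), "seed": int(seed)},
    }
```

The returned dictionary can be passed as the JSON body in any of the client examples above, for instance `requests.post(url, headers=headers, json=build_payload([{"H": sequence}], plddt=True))`.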

Example request:

http
POST /api/v3/nanobodybuilder2/predict/ HTTP/1.1
Host: biolm.ai
Authorization: Token YOUR_API_KEY
Content-Type: application/json

{
  "items": [
    {
      "H": "QVQLVQSGAEVKKPGASVKVSCKVSGYTSPTTIHWVRQAPGKGLEWMGGISPYRGDTIYAQKFQGRVTMTEDTSTDTAYMELSSLKSEDTAVYYCARDGYSSGYYGMDVWGQGTTVTVSS"
    },
    {
      "H": "QVQLVQSGAEVKKPGASVKVSCKVSGYTSPTTIHWVRQAPGKGLEWMGGINAGTGDTIYAQKFQGRVTMTEDTSTDTAYMELSSLKSEDTAVYYCARDGYSSGYYGMDVWGQGTTVTVSS"
    }
  ],
  "params": {}
}

Status Codes:

  • 200 OK: the request succeeded (see the example response)

Response

  • results (array of objects) — One result per input item, in the order requested:

    • pdb (string) — Predicted 3D structure in PDB format

    • plddt (array of arrays of floats, optional) — Predicted per-residue confidence scores (pLDDT), range: 0.0–100.0, shape: [sequence_length, 1]
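When `plddt` is requested, the scores arrive as a nested array of shape [sequence_length, 1], so they usually need flattening before use. The sketch below assumes only the response fields documented above; `summarize_result` is a hypothetical helper, and the cutoff of 70 is an arbitrary illustrative threshold rather than a value taken from this documentation.

```python
from statistics import mean
from typing import Any, Optional


def summarize_result(result: dict[str, Any],
                     low_cutoff: float = 70.0) -> dict[str, Any]:
    """Flatten per-residue pLDDT scores and flag low-confidence positions.

    `result` is one element of the response's `results` array.
    Positions are reported 1-based, following sequence order.
    """
    raw = result.get("plddt")  # absent unless params.plddt was true
    if raw is None:
        return {"mean_plddt": None, "low_confidence_positions": []}
    scores = [float(row[0]) for row in raw]  # [L, 1] -> flat list of length L
    low = [pos for pos, score in enumerate(scores, start=1) if score < low_cutoff]
    mean_score: Optional[float] = mean(scores) if scores else None
    return {"mean_plddt": mean_score, "low_confidence_positions": low}
```

Calling `summarize_result(response_json["results"][0])` on a decoded response gives a quick per-structure confidence overview without parsing the PDB text.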

Example response:

http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "results": [
    {
      "pdb": "REMARK  NANOBODY STRUCTURE MODELLED USING NANOBODYBUILDER2                      \nREMARK  STRUCTURE REFINED USING OPENMM 8.2, 2025-06-17                          \nATOM      1  N   GLN H   1     -10.551... (truncated for documentation)"
    },
    {
      "pdb": "REMARK  NANOBODY STRUCTURE MODELLED USING NANOBODYBUILDER2                      \nREMARK  STRUCTURE REFINED USING OPENMM 8.2, 2025-06-17                          \nATOM      1  N   GLN H   1     -10.287... (truncated for documentation)"
    }
  ]
}
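The `pdb` field is ordinary PDB text, so after JSON decoding it can be read with fixed-column slicing. The sketch below extracts the C-alpha trace using the standard PDB ATOM record columns; `ca_trace` is a hypothetical helper, and because the example response above is truncated, it has only been exercised on synthetic ATOM lines.

```python
def ca_trace(pdb_text: str) -> list[tuple[str, int, str, float, float, float]]:
    """Return (chain, residue number, residue name, x, y, z) for each C-alpha atom.

    Uses the fixed-width columns of the PDB ATOM record:
    atom name 13-16, residue name 18-20, chain 22, residue number 23-26,
    and x/y/z in columns 31-38, 39-46, 47-54 (1-based, inclusive).
    """
    trace = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip REMARK, TER, END and other record types
        if line[12:16].strip() != "CA":
            continue
        trace.append((
            line[21],               # chain identifier
            int(line[22:26]),       # residue sequence number
            line[17:20].strip(),    # three-letter residue name
            float(line[30:38]),     # x coordinate
            float(line[38:46]),     # y coordinate
            float(line[46:54]),     # z coordinate
        ))
    return trace
```

For anything beyond quick inspection, a dedicated structure parser is likely a better choice; to keep the structure for a viewer, writing `result["pdb"]` to a `.pdb` file is enough.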

Performance

  • NanoBodyBuilder2 is specifically optimized for nanobody sequence infilling, significantly outperforming general-purpose protein language models such as ESM-2 650M, as well as human antibody-specific models (e.g., AbLangHeavy, human_320, human_640) on nanobody sequences.

  • Predictive accuracy for nanobody sequences (measured by positional amino acid reconstruction) is approximately 77% for the V-region, compared to around 64% for human antibody-specific models and 57% for general protein models like ESM-2 650M.

  • Within nanobody-specific regions, NanoBodyBuilder2 achieves approximately 88% accuracy on framework regions (FW) and around 45% accuracy on complementarity-determining regions (CDRs), notably outperforming human antibody-specific models (around 76% FW, 30% CDRs).

  • NanoBodyBuilder2 is particularly effective in predicting therapeutic nanobody sequences, achieving approximately 77% accuracy in the V-region, compared to roughly 70-73% for human-specific antibody models.

  • The model is GPU-accelerated, deployed on NVIDIA T4 GPUs, ensuring rapid inference suitable for interactive and high-throughput applications.

  • NanoBodyBuilder2 provides predictive performance comparable to larger nanobody-specific models (e.g., nanoBERT_big with 86M parameters) while being significantly more computationally efficient (nanoBERT_small architecture with 14M parameters).

  • Fine-tuning NanoBodyBuilder2 on nanobody-specific datasets (e.g., thermostability datasets like NBThermo) yields superior performance compared to human antibody-specific models, demonstrating improved Pearson correlation (mean ~0.5 vs. ~0.39 for Circular Dichroism data; ~0.41 vs. ~0.07 for DSF (SYPRO) data).

  • NanoBodyBuilder2 leverages BioLM’s optimized deployment infrastructure, ensuring scalable, reliable, and efficient inference for nanobody engineering tasks.

Applications

  • Nanobody humanization: predicts context-aware amino acid substitutions that reduce immunogenicity risk. This helps therapeutic developers identify mutations that balance human-like sequence profiles against nanobody stability and function, and it is particularly valuable where germline-based methods fall short because camelid germline databases are incomplete. It is not a direct immunogenicity predictor; candidate substitutions still require experimental validation.

  • Affinity maturation: supports computational mutagenesis of therapeutic nanobodies with residue-level suggestions for mutations likely to enhance target binding. Teams can use these suggestions to prioritize experimental mutational libraries and reduce screening costs. It is especially useful for optimizing CDRs, but less effective for highly diverse CDR3 loops unless experimental data are available.

  • Stability engineering: identifies sequence substitutions associated with increased thermostability. This helps formulation teams select candidate mutations for improved shelf life and reduced aggregation during manufacturing, and prioritize variants before costly stability assays. It is not a substitute for empirical stability testing.

  • Sequence liability removal: identifies and replaces residues in nanobody frameworks that are prone to chemical degradation or aggregation. Engineering out these liabilities early in therapeutic development reduces downstream manufacturing and formulation problems. Effectiveness is limited for liabilities that arise from complex post-translational modifications or formulation-specific interactions.

  • Computational library design: generates context-aware mutational diversity at specific positions for nanobody discovery campaigns. The resulting targeted libraries are enriched for functional variants, which improves hit rates and reduces screening effort. It is most useful in framework and CDR1/CDR2 regions; predictions in highly variable CDR3 loops remain challenging without experimental validation.

Limitations

  • Maximum Sequence Length: The NanoBodyBuilder2 API supports sequences up to 2048 amino acids. Sequences exceeding this length are not supported and will result in an error.

  • Batch Size: Only 1 sequence can be processed per API request. For larger-scale analysis, multiple sequential API calls are required.

  • NanoBodyBuilder2 is specifically optimized for predicting nanobody (VHH single-domain antibody) structures. It is not suitable for general antibody modeling tasks, such as canonical human or mouse antibody structure prediction, where dedicated models like ABodyBuilder or general-purpose models (e.g., AlphaFold2, ESMFold) may yield better accuracy.

  • The model’s accuracy for Complementarity Determining Region 3 (CDR3) loops is lower than for framework regions and the other CDRs, because CDR3 is inherently more diverse and variable. Users should interpret CDR3 predictions cautiously, especially when precise loop conformations are critical.

  • NanoBodyBuilder2 is trained primarily on natural llama-derived nanobody sequences. Its predictive accuracy may decrease for engineered or highly mutated therapeutic nanobodies that deviate significantly from natural sequence distributions.

  • This model does not provide sequence embeddings or encodings. Users requiring sequence embeddings for downstream tasks such as clustering, visualization, or sequence similarity analysis should use alternative transformer-based models (e.g., nanoBERT) designed specifically for embedding generation.
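Given the one-sequence batch limit and the 2048-residue cap listed above, larger jobs reduce to a loop of single-item requests. The sketch below is one way to arrange that with the standard library only; `post_json` and `predict_sequentially` are hypothetical helpers, and retry logic, rate limiting, and error handling are deliberately left out.

```python
import json
import urllib.request
from typing import Any, Callable, Iterable, Optional

API_URL = "https://biolm.ai/api/v3/nanobodybuilder2/predict/"
MAX_LEN = 2048  # documented maximum sequence length


def post_json(payload: dict[str, Any], api_key: str) -> dict[str, Any]:
    """Send one request body to the predict endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def predict_sequentially(sequences: Iterable[str],
                         send: Callable[[dict[str, Any]], dict[str, Any]],
                         params: Optional[dict[str, Any]] = None) -> list[dict[str, Any]]:
    """Issue one single-item request per sequence and collect results in input order."""
    results = []
    for seq in sequences:
        if not 1 <= len(seq) <= MAX_LEN:
            raise ValueError(f"sequence length {len(seq)} is outside 1..{MAX_LEN}")
        reply = send({"items": [{"H": seq}], "params": dict(params or {})})
        results.extend(reply["results"])
    return results
```

A typical call would be `predict_sequentially(sequences, lambda body: post_json(body, "YOUR_API_KEY"))`. Passing the sender as an argument keeps the loop testable without network access.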

How We Use It

The NanoBodyBuilder2 algorithm enables BioLM to perform targeted nanobody engineering by predicting biologically feasible amino acid substitutions within nanobody sequences. Integrated into BioLM’s comprehensive protein design pipelines, NanoBodyBuilder2 complements structural modeling, thermostability prediction, and sequence embedding algorithms to accelerate antibody optimization and humanization tasks. Standardized, scalable APIs facilitate seamless incorporation of NanoBodyBuilder2 predictions into iterative design cycles, guiding experimental validation and reducing development timelines.

  • Supports iterative cycles of nanobody optimization and humanization.

  • Complements structural modeling and biophysical property prediction tools within BioLM workflows.

References