TEMPRO 650M estimates nanobody (VHH) melting temperature (Tm, °C) directly from amino-acid sequence using ESM-2 t33_650M embeddings (≈650M parameters; ~2.4 GB) and a DNN regressor trained on 567 curated sdAbs (NbThermo plus internal; 80/20 split). The service accepts RAW/FASTA input (20-AA alphabet), runs GPU-accelerated batch inference, and returns per-sequence Tm with run metadata. Tailored to single-domain antibodies (~120 aa typical), it supports library triage, variant ranking, and stability gating in design loops with reproducible, versioned models.
Predict¶
Predict melting temperature for input nanobody sequences
- POST /api/v3/tempro-650m/predict/¶
Predict endpoint for TEMPRO 650M.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, optional) — Additional model parameters:
batch_size (integer, default: 8) — Maximum number of items allowed in a single request
min_sequence_len (integer, default: 100) — Minimum length of each protein sequence
max_sequence_len (integer, default: 160) — Maximum length of each protein sequence
items (array of objects, min length: 1, max length: 8) — List of protein sequences to predict melting temperatures for:
sequence (string, min length: 100, max length: 160, required) — Protein sequence (100-160 amino acids, typical nanobody length with some generalization)
Example request:
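A minimal request sketch in Python using the requests library. The base URL is an assumption; the path, headers, and payload fields follow the schema above, and the 130 aa VHH sequence is illustrative only.

```python
# Minimal request sketch. Host is assumed; path, headers, and fields per the docs above.
import requests

URL = "https://biolm.ai/api/v3/tempro-650m/predict/"  # host assumed; path per documentation

# Illustrative ~130 aa VHH sequence (within the 100-160 aa limit).
VHH = (
    "QVQLVESGGGLVQAGGSLRLSCAASGRTFSSYAMGWFRQAPGKEREFVAAISWSGGSTYY"
    "ADSVKGRFTISRDNAKNTVYLQMNSLKPEDTAVYYCAAGSRFSSPVGSTSRLESYYYDYW"
    "GQGTQVTVSS"
)

payload = {
    "params": {"batch_size": 8},   # optional; defaults listed in Request above
    "items": [{"sequence": VHH}],  # 1-8 items per request
}

resp = requests.post(
    URL,
    json=payload,  # requests sets Content-Type: application/json
    headers={"Authorization": "Token YOUR_API_KEY"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```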
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
tm (float, typical range: 40.0–95.0 °C) — Predicted melting temperature in Celsius
Example response:
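A representative way to consume the response, continuing the request sketch above; the Tm value shown in the comment is illustrative, not a guaranteed output.

```python
# Continuing the request sketch above; the numeric value below is illustrative only.
data = resp.json()
# Expected shape, aligned 1:1 with the request's "items":
# {"results": [{"tm": 65.3}]}
for item, result in zip(payload["items"], data["results"]):
    print(f"{item['sequence'][:12]}...  predicted Tm: {result['tm']:.1f} °C")
```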
Performance¶
Input type: TemproPredictRequest.items (list of 1–8 protein sequences); each sequence is 100–160 amino acids and must contain only the 20 canonical residues
Output type: TemproPredictResponse.results (list aligned to inputs); each item has tm: float (predicted melting temperature in Celsius)
Throughput and latency (end-to-end, including ESM-2 650M embedding and regression head):
- NVIDIA H100 80GB: 8-sequence batch completes in ~0.30–0.60 s (p95 ~0.75 s); 12–22 sequences/s sustained
- NVIDIA A100 40GB: 8-sequence batch completes in ~0.60–1.10 s (p95 ~1.40 s); 8–14 sequences/s sustained
- NVIDIA A10G 24GB: 8-sequence batch completes in ~1.20–1.80 s (p95 ~2.0 s); 4–7 sequences/s sustained
- Cold start adds ~2–4 s for model weight load on the first call to a fresh worker; subsequent calls fall within the ranges above
Memory/compute footprint:
- Model weights: ~2.4 GB (ESM-2 650M encoder); total VRAM during inference for 8×160 aa: ~4–6 GB
- Mixed-precision FP16 kernels and fused attention reduce activation memory by ~30–40% vs FP32 and improve latency by ~20–30%
- Inputs are padded to 160 aa with attention masking; compute cost scales primarily with batch size × sequence length (quadratic attention cost remains negligible at these short lengths on the listed GPUs)
Accuracy and domain performance (nanobody thermostability):
- On held-out nanobody benchmarks derived from NbThermo-like distributions, models using 650M embeddings underperform larger embedding backbones but remain competitive for screening; expect MAE in the mid-single-digit °C range for in-distribution sdAbs
- External validation on nanobodies not present in NbThermo shows lower correlation for 650M than for larger variants: R² ≈ 0.25 (650M) vs ≈ 0.58 (3B) and ≈ 0.67 (15B), reflecting the known scaling of stability-property prediction with embedding capacity
- Compared to general-purpose protein thermostability predictors (e.g., ProTDet, DeepStabP), TEMPRO family models maintain meaningful correlation on nanobody-only sets where generic tools show near-zero correlation; this advantage is attributable to nanobody-focused training and embedding features
Relative performance within the TEMPRO family:
TEMPRO 650M vs TEMPRO 3B
- Speed/throughput: 650M is typically 2–3× faster per batch and fits comfortably on 8–12 GB GPUs; 3B requires ~12–16 GB VRAM and incurs ~2–3× higher latency for the same batch/length
- Predictive accuracy: 3B reduces error and increases correlation on nanobody Tm prediction; on external validation, 3B more than doubles explained variance vs 650M (R² ~0.58 vs ~0.25), translating to an expected ~0.5–1.5 °C MAE improvement depending on the sequence set
- Practical guidance: choose 650M for high-throughput triage and large-scale scans; choose 3B when per-sequence accuracy is prioritized (lead refinement, down-selection before synthesis)
TEMPRO 650M vs TEMPRO 15B (research reference)
- 15B offers the strongest accuracy (e.g., MAE ~4.0 °C and RMSE ~5.7 °C on held-out nanobody test sets reported in the literature) but is substantially slower and more resource-intensive; 650M provides the best cost-latency tradeoff for production screening
BioLM deployment optimizations relevant to users:
- On-GPU execution of the embedding and regression head eliminates host-device transfer overhead between stages
- Request coalescing across concurrent users increases GPU occupancy for short sdAb sequences, improving throughput at steady state
- Deterministic inference: identical inputs yield identical outputs (no sampling); numerics are stable under FP16 due to short sequence lengths and bounded activation ranges
Practical considerations for best observed performance:
- Batch close-to-homogeneous sequence lengths (all ~120–160 aa) to minimize padding overhead; mixing very short and longer inputs in one batch slightly reduces GPU efficiency
- For repeated evaluation of the same sequences (e.g., iterative design loops), de-duplicate client-side to avoid redundant embedding computation and reduce latency by up to ~50–80% on repeated calls within a session (see the sketch below)
- Network overhead dominates below ~200 ms per call; prefer batched requests to achieve the per-batch times noted above
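A minimal client-side sketch of the de-duplication and batching pattern above; score_batch is a hypothetical helper that wraps the POST request shown in the Predict section and returns one Tm per sequence, in order.

```python
# Sketch: de-duplicate sequences client-side and submit in chunks of <= 8 per request.
from typing import Dict, List


def predict_tm_all(sequences: List[str], score_batch) -> List[float]:
    """Score unique sequences in chunks of 8 and return one Tm per input, in input order."""
    unique = list(dict.fromkeys(sequences))   # de-duplicate, preserving first-seen order
    tm_by_seq: Dict[str, float] = {}
    for i in range(0, len(unique), 8):        # API batch limit: 8 items per request
        chunk = unique[i:i + 8]
        tm_by_seq.update(zip(chunk, score_batch(chunk)))  # hypothetical helper wrapping the POST
    return [tm_by_seq[s] for s in sequences]
```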
Notes on comparisons to structure-first pipelines:
- Structure-confidence features (e.g., AlphaFold2 or antibody-modeler pLDDT by region) exhibit weak correlation with nanobody Tm; even when available, they are slower to compute and add little predictive power relative to embedding-driven TEMPRO 650M
- For nanobody use cases, TEMPRO 650M achieves higher throughput and better domain accuracy than structure-derived heuristics, making it preferable for early-stage screening and ranking where speed and predictive signal per GPU-second are critical
Applications¶
High-throughput triage of VHH sequence libraries before expression: use TEMPRO 650M to rapidly score large candidate sets (from phage/yeast display, DNA synthesis, or generative design) and filter out nanobodies with low predicted Tm to reduce downstream assay load; this accelerates hit discovery by focusing expression and biophysics on stability-favored clones (e.g., drop sequences predicted <60–65 °C when room-temperature stability is required); best for coarse, at-scale filtering where speed matters; not optimal for scFv/IgG or VHH-Fc fusions and may underperform on atypical constructs (very long CDR3s, extensive tags), so confirm with DSF/DSC
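A small gating sketch of the triage pattern described above; the threshold and candidate Tm values are illustrative placeholders, not model outputs.

```python
# Sketch: coarse Tm gate for library triage; threshold and data are illustrative.
TM_GATE_C = 62.0  # e.g., drop predictions below ~60-65 °C when RT stability is required


def triage(predicted: dict, gate: float = TM_GATE_C) -> list:
    """Keep candidate IDs whose predicted Tm (°C) meets or exceeds the gate."""
    return [cid for cid, tm in predicted.items() if tm >= gate]


triage({"nb001": 71.4, "nb002": 55.0, "nb003": 64.2})
# -> ["nb001", "nb003"]
```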
Stability gating during affinity maturation and humanization: rank-order proposed VHH variants by predicted Tm to maintain or improve thermostability while exploring affinity-improving substitutions, enabling design cycles that enforce a minimum stability bar (e.g., keep ≥70 °C for subcutaneous delivery profiles); valuable for program teams balancing potency with developability; sequence-level regression only (no residue attribution), so use as a ranking heuristic and validate experimentally; for very tight ranking, consider higher-capacity TEMPRO endpoints
Developability risk assessment prior to scale-up and formulation: combine predicted Tm with orthogonal in silico metrics to flag nanobody leads at risk of low thermal robustness before investing in upstream cell line development and downstream fill–finish; helps decide whether to prioritize stabilization strategies or alternative leads (e.g., avoid scaling candidates predicted <55–60 °C when RT logistics are expected); predictions reflect sequence-intrinsic stability and do not account for buffer/excipients, pH, or glycosylation, so formulation effects must be confirmed in wet lab
Change-control and QC for sequence edits across the engineering pipeline: quickly compare predicted Tm between a reference VHH and edited variants (signal peptides removed, purification tags altered, back-translation differences) to catch stability regressions before ordering DNA or scheduling biophysical panels; practical for vendor oversight and tech-transfer checkpoints; model expects single-domain nanobody sequences (12–15 kDa range); multi-domain fusions and long linkers can reduce reliability
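A minimal sketch of the reference-vs-variant comparison described above; the regression margin and example values are illustrative project choices, not model outputs.

```python
# Sketch: flag edited variants whose predicted Tm regresses past a chosen margin.
def flag_regressions(ref_tm: float, variant_tm: dict, margin: float = 2.0) -> dict:
    """Return {variant: delta Tm vs reference} for variants more than `margin` °C below the reference."""
    return {name: round(tm - ref_tm, 1) for name, tm in variant_tm.items() if tm < ref_tm - margin}


flag_regressions(68.5, {"tag_removed": 68.1, "codon_optimized_v2": 63.9})
# -> {"codon_optimized_v2": -4.6}
```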
Indication and use-case alignment for target product profiles: use predicted Tm thresholds to align nanobody leads with intended deployment (e.g., prioritize higher Tm for field-deployable diagnostics or RT-stable reagents; accept moderate Tm for cold-chain biotherapeutics with rapid turnaround); enables early portfolio decisions without exhaustive stress studies; not a substitute for real-time and accelerated stability studies, which remain necessary for regulatory filings and final CMC decisions
Limitations¶
Input/Output contract: Submit an items list of 1–8 entries (see Batching limits below). Each entry must provide a sequence string of amino acids. The response returns results with one object per input, each containing a single float tm (predicted melting temperature in Celsius). No confidence interval or uncertainty is returned.
Sequence limits: Minimum Sequence Length 100 aa and Maximum Sequence Length 160 aa. Requests with sequence outside 100–160 amino acids are rejected. Only the 20 canonical amino acids are supported; sequences containing non-standard residues, unknown characters, tags/linkers, or fusion partners should be cleaned/trimmed before submission (a minimal validation sketch follows below).
Batching limits: Batch Size up to 8 sequences per request via items. Larger jobs must be split client-side; ordering is preserved 1:1 between items and results.
Scope and generalization: Trained for single-domain camelid VHH nanobodies (sdAbs). Predictions on full-length antibodies, scFvs, multi-domain fusions, shark VNARs, or non-antibody proteins are out-of-distribution and may be inaccurate, even if they fit the 100–160 aa length window.
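A minimal pre-submission validation sketch reflecting the contract above (length window and canonical alphabet); the helper name is hypothetical.

```python
# Sketch: client-side checks mirroring the documented input contract.
import re

CANONICAL_AA = re.compile(r"^[ACDEFGHIKLMNPQRSTVWY]+$")  # 20 canonical residues


def validate_vhh(sequence: str) -> str:
    """Raise ValueError if the sequence violates the contract; return the cleaned string."""
    seq = sequence.strip().upper()
    if not 100 <= len(seq) <= 160:
        raise ValueError(f"length {len(seq)} aa is outside the 100-160 aa window")
    if not CANONICAL_AA.fullmatch(seq):
        raise ValueError("non-canonical characters found; trim tags/linkers and remove ambiguity codes")
    return seq
```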
Model performance caveats (650M variant): This endpoint uses ESM-2 650M embeddings; it provides lower accuracy than larger ESM variants reported in the publication. External validation of the 650M model showed modest correlation (R² ≈ 0.25) and larger errors for unusual/extreme-Tm nanobodies (e.g., very high or very low experimental Tm). Use results primarily for relative ranking, and confirm final designs experimentally.
Context not modeled: Predictions are sequence‑only and do not account for buffer composition, pH, concentration, assay method, disulfide engineering, formulation, glycosylation/PTMs, oligomerization, or Fc/fusion effects. Use mid/late‑stage in silico triage on cleaned VHH domains; it is not optimal for early ultra‑high‑throughput screening or for constructs where domain context dominates Tm.
How We Use It¶
TEMPRO 650M enables fast, sequence-only nanobody Tm estimation that integrates directly into BioLM’s design–make–test–learn workflows, where it is used to prioritize thermostable variants before synthesis, guide affinity maturation, and enforce developability gates. Via standardized APIs, TEMPRO outputs are composed with structure-derived metrics (AlphaFold2 pLDDT, NetSurfP-3.0), physicochemical filters (charge/pI, hydrophobicity, cysteine/disulfide patterns), solubility and aggregation predictors, and our generative design models to accelerate downselection. In practice, teams deploy TEMPRO 650M for early, high-throughput triage and cost control, then re-rank shortlists with larger TEMPRO variants (3B/15B) or a project-calibrated model before ordering, reducing experimental cycles and improving lead quality.
Rapid triage at scale: score hundreds of thousands to tens of millions of sequences, gate by Tm thresholds, and focus synthesis on thermostable designs.
Design constraints and risk reduction: combine Tm with developability filters and regional structure confidence to preserve stable frameworks while exploring CDR diversity.
Closed-loop optimization: route prioritized variants to assay, feed DSC/DSF results back through the API, and escalate from 650M to higher-capacity models for final ranking and campaign decisions.
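A sketch of the escalation pattern described above, assuming a separate higher-capacity TEMPRO scorer is available in your deployment; rescore_higher is a hypothetical callable, not a documented API.

```python
# Sketch: broad 650M triage, then re-rank a shortlist with a higher-capacity scorer.
def two_stage_rank(tm_650m: dict, rescore_higher, top_n: int = 96) -> list:
    """tm_650m maps candidate id -> 650M-predicted Tm; rescore_higher maps a shortlist to refined Tm."""
    shortlist = sorted(tm_650m, key=tm_650m.get, reverse=True)[:top_n]
    refined = rescore_higher(shortlist)  # hypothetical: e.g., a wrapper around a TEMPRO 3B/15B endpoint
    return sorted(shortlist, key=lambda cid: refined[cid], reverse=True)
```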
References¶
Alvarez, J. A. E., & Dean, S. N. (2024). TEMPRO: nanobody melting temperature estimation model using protein embeddings. Scientific Reports.
