TEMPRO 3B estimates nanobody (VHH) melting temperature (Tm, °C) from sequence using ESM-2 t36_3B UR50D embeddings (2560-dimensional per residue) and a trained DNN regressor, tailored to and validated on 567 nanobodies. It supports single or batch GPU-accelerated inference; inputs are amino-acid sequences (FASTA or plain text), and output is a per-sequence Tm. On held-out data, the 3B variant achieved ~4.2 °C MAE; external validation gave R² ≈ 0.58. The API supports batching, ID passthrough, and JSON or CSV results. Use cases include variant ranking, library triage, thermostability screening, and design workflows.
Predict¶
Predict melting temperature for input nanobody sequences
- POST /api/v3/tempro-3b/predict/¶
 Predict endpoint for TEMPRO 3B.
- Request Headers:
 Content-Type – application/json
 Authorization – Token YOUR_API_KEY
Request
params (object, optional) — Model parameters:
batch_size (int, default: 8) — Maximum number of sequences accepted per request
min_sequence_len (int, default: 100) — Minimum sequence length (in amino acids)
max_sequence_len (int, default: 160) — Maximum sequence length (in amino acids)
items (array of objects, min: 1, max: 8) — List of protein sequences to predict melting temperatures for:
sequence (string, length: 100–160, required) — Protein sequence (100–160 amino acids, typical nanobody length with some generalization)
Example request:
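A representative request body, sketched in Python so it can be serialized with the standard library. The 122-residue VHH sequence below is illustrative only (not from the training set); real requests should send your own sequences within the documented 100–160 aa limit.

```python
import json

# Illustrative 122-aa camelid VHH sequence (within the 100-160 aa limit)
VHH = (
    "QVQLVESGGGLVQAGGSLRLSCAASGRTFSSYAMGWFRQAPGKEREFVAAISWSGGSTYY"
    "ADSVKGRFTISRDNAKNTVYLQMNSLKPEDTAVYYCAAGSTSTATPLRVDYWGQGTQVTVSS"
)

payload = {
    "params": {"batch_size": 8},   # optional; defaults shown in the schema above
    "items": [{"sequence": VHH}],  # 1-8 items per request
}

body = json.dumps(payload, indent=2)
print(body)
```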
- Status Codes:
 200 OK – Successful response
 400 Bad Request – Invalid input
 500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
tm (float, typical range: 40.0–95.0 °C) — Predicted melting temperature in Celsius
Example response:
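A matching response shape (the Tm value is illustrative). Results align positionally with the submitted items, so pairing predictions back to inputs is a simple index walk:

```python
import json

raw = '{"results": [{"tm": 66.3}]}'  # illustrative server reply
data = json.loads(raw)

# One predicted Tm (Celsius) per submitted sequence, in request order
tms = [r["tm"] for r in data["results"]]
print(tms)
```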
Performance¶
Input and output types: request accepts a list of 1–8 protein sequences; each sequence must be 100–160 amino acids using the 20 standard residues; response returns one floating-point tm (Celsius) per input sequence
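The input constraints above can be checked client-side before submission to avoid 400 responses; a minimal validator (the helper name is ours, not part of the API):

```python
# The 20 standard amino acids accepted by the endpoint
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def validate_sequence(seq: str, min_len: int = 100, max_len: int = 160) -> bool:
    """Return True if seq satisfies the documented length and alphabet rules."""
    return min_len <= len(seq) <= max_len and set(seq) <= STANDARD_AA
```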
Model variant and pipeline: TEMPRO 3B uses ESM-2 3B embeddings (UR50) followed by a compact feed-forward DNN; inference computes embeddings on-the-fly and streams them directly into the regressor (no structure prediction), enabling low-latency, GPU-accelerated execution
Predictive accuracy (nanobody Tm): on the internal hold-out set, ESM-2 3B embeddings with the DNN achieve approximately MAE ≈ 4.2 °C and RMSE ≈ 5.8–6.0 °C; on an external set of experimentally measured nanobody Tm values (INDI), the 3B variant attains R² ≈ 0.58 (vs 0.67 for 15B and 0.25 for 650M), reflecting a favorable accuracy/speed trade-off for production use
Relative accuracy within the TEMPRO family: TEMPRO 3B is within ~0.2 °C MAE of TEMPRO 15B on the internal hold-out set (≈ 4.24 °C vs 4.03 °C) while offering substantially lower compute cost; compared to TEMPRO 650M, 3B reduces error modestly on the internal hold-out (≈ 0.2–0.5 °C MAE) and improves external generalization markedly (R² +0.33 on INDI)
Latency and throughput: for 100–160 aa nanobody sequences, the 3B variant executes significantly faster than 15B and slower than 650M; in practice, TEMPRO 3B is roughly 4–6× faster than a 15B-based deployment and ~3–5× slower than a 650M-based deployment at comparable batch sizes; this gap stems from the parameter count and attention FLOPs scaling of the embedding model rather than the downstream regressor
GPU and memory footprint: optimized for NVIDIA A100 80 GB and L4/L40S; typical inference VRAM utilization for 3B FP16 is ~6–12 GB depending on concurrency and batch composition; the deployment uses mixed precision (FP16) with fused attention kernels to reduce memory bandwidth pressure and improve tokens/s
System optimizations: dynamic request coalescing and shape-aware batching keep SM occupancy high across variable-length inputs; ahead-of-time graph capture plus operator fusion amortize Python overhead; content-hash de-duplication avoids recomputing embeddings for identical sequences appearing within the same batch
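The content-hash de-duplication described above can be sketched as a cache keyed by a hash of the sequence, so identical sequences in one batch share a single embedding pass (a simplification; the service's actual implementation is not documented here):

```python
import hashlib

def embed(seq: str) -> list:
    # Stand-in for the expensive ESM-2 forward pass
    return [float(len(seq))]

def embed_batch(seqs):
    """Compute embeddings once per unique sequence within a batch."""
    cache = {}
    out = []
    for seq in seqs:
        key = hashlib.sha256(seq.encode()).hexdigest()
        if key not in cache:
            cache[key] = embed(seq)  # computed only on first occurrence
        out.append(cache[key])
    return out
```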
Determinism and numeric stability: inference is seed-stable at the DNN layer; the embedding forward pass uses deterministic kernels where available; mixed precision is enabled with loss-scaling to maintain numerical fidelity; internal A/B tests show negligible drift from FP32 (≤ 0.05 °C MAE delta) on representative nanobody sets
Comparative cost-efficiency: per sequence, TEMPRO 3B yields the best accuracy-per-millisecond among the TEMPRO variants—achieving near-15B accuracy at a fraction of its compute and memory, while offering materially better accuracy than 650M at a modest additional cost
Where TEMPRO 3B fits among related BioLM models: unlike structure predictors (e.g., AlphaFold2, ESMFold, NanoBodyBuilder2), TEMPRO 3B does not run 3D modeling and is therefore orders of magnitude faster and cheaper for thermostability estimation; compared to other sequence-only models on the platform, TEMPRO 3B’s nanobody-specific training and ESM-2 3B embeddings deliver substantially higher correlation to experimental nanobody Tm than general protein stability classifiers, especially on high-Tm and low-Tm extremes
Scaling behavior under load: throughput scales linearly with additional GPUs due to stateless inference; per-GPU performance remains stable with high concurrency as dynamic batching aggregates small requests; the service maintains tight p95 latency bounds without sacrificing accuracy by avoiding quantization that meaningfully affects regression quality
Practical guidance: for large nanobody panels, prefer TEMPRO 3B over 15B to maximize sequences processed per GPU-hour with minimal accuracy loss; for quick triage where cost dominates and small accuracy regressions are acceptable, TEMPRO 650M remains faster but exhibits notably weaker external correlation on nanobody datasets
Applications¶
High-throughput triage of nanobody libraries from display or AI generation: run TEMPRO 3B on thousands to millions of VHH sequences to prioritize candidates above a thermostability gate (for example, Tm ≥ 65–70 °C) before expression and DSC/CD. This cuts wet-lab screening cost and cycle time by focusing on molecules more likely to survive handling, transport, and downstream processing. Best used for relative ranking; absolute Tm can deviate by several degrees and depends on assay buffer and conditions.
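A thermostability gate like the one described reduces to a filter over (id, predicted Tm) pairs; the 65 °C cutoff below is just the example from the text, and the right threshold is program-specific:

```python
def gate_by_tm(predictions, cutoff=65.0):
    """Keep only candidates at or above the Tm gate, ranked best-first."""
    passed = [(seq_id, tm) for seq_id, tm in predictions if tm >= cutoff]
    return sorted(passed, key=lambda x: x[1], reverse=True)
```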
Variant ranking for thermostabilization campaigns: enumerate targeted point mutations or small combinatorial sets (for example, CDR1 scans or FR2–FR3 cysteine-introducing designs) and score each variant with TEMPRO 3B to pick a compact, high-confidence subset for synthesis. This enables rapid, design–build–test loops that improve Tm while maintaining binding through parallel affinity assays. TEMPRO 3B does not predict affinity, expression, or solubility and is not a generative model; use it to rank your designed variants.
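Enumerating a single-site scan over a chosen region (e.g., CDR1 positions) before scoring can be sketched as below; positions are 0-based and nothing here is part of the API:

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"

def point_mutants(seq, positions):
    """Yield (name, variant) for every single substitution at the given positions."""
    for pos in positions:
        wt = seq[pos]
        for aa in STANDARD_AA:
            if aa != wt:
                # Conventional mutation name, e.g. "C2A" (1-based position)
                yield f"{wt}{pos + 1}{aa}", seq[:pos] + aa + seq[pos + 1:]
```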
Developability gating for therapeutic and diagnostic nanobodies: apply predicted Tm as an early developability filter to reduce late-stage attrition—select leads more likely to tolerate room-temperature handling, shipping, and storage for systemic therapeutics, imaging agents, or point-of-care diagnostics. This helps formulation teams focus on inherently more stable scaffolds. Not a substitute for forced-degradation or aggregation studies; Tm is one dimension of developability and does not directly capture viscosity or immunogenicity risks.
Resource prioritization for structural modeling and biophysical assays: use TEMPRO 3B scores to decide which sequences warrant structure prediction, cloning, and biophysical characterization, enabling “fewer, better” experiments. Teams can schedule DSC only on top-ranked sequence sets and defer lower-confidence designs. Predictions are for isolated VHH domains; results may shift after tags, linkers, or conjugations and should be rechecked post-modification.
Stability-aware assembly of multi-format VHH constructs: select monomer VHH building blocks with higher predicted Tm for biparatopic binders, multivalents, CAR binders, or Fc fusions to reduce risk of low overall thermal margins. This improves the chance that the assembled construct remains stable under manufacturing and storage constraints. TEMPRO 3B estimates Tm of single-domain VHHs; fusion context, inter-domain contacts, and formulation effects are not modeled and require empirical verification.
Limitations¶
API I/O constraints: Input is items (list), where each element contains a single sequence. Allowed items length is 1–8 (batch_size = 8). Each sequence must be 100–160 amino acids (min_sequence_len = 100, max_sequence_len = 160) using the 20 standard amino acids; non-canonical or ambiguous residues are rejected. Output is results (list) aligned 1:1 with inputs; each entry has a single tm (float, °C). No confidence intervals, per-residue attributions, or intermediate features are returned
Domain specificity: Trained specifically on camelid VHH nanobodies; reliability degrades for other formats (scFv, Fab, full IgG), shark VNAR, multimeric fusions, or sequences with signal peptides/tags/linkers. Inputs outside typical VHH composition may pass length checks but remain out-of-distribution and less accurate
Assay/context dependence: Reported Tm values depend on buffer, pH, concentration, scan rate, and assay modality. Predictions are context-agnostic and best used for relative ranking under comparable conditions, not for absolute specification or release criteria
Sequence-only model: The predictor uses protein language model embeddings (ESM) from sequence; it does not leverage experimental/3D structure or per-region confidence. Prior work shows AlphaFold2 pLDDT per-region features have low direct correlation to nanobody Tm, and adding NetSurfP3 or physicochemical features only helps when combined carefully—this API returns a single sequence-based estimate and may miss structure- or formulation-driven effects (e.g., specific disulfide placements, CDR conformations)
Out-of-distribution sequences: Highly synthetic sequences (unusual cysteine patterns, extreme charge/hydrophobicity, long low-complexity repeats) or sequences containing non-natural residues will either be rejected (invalid alphabet) or yield degraded accuracy. Extrapolation outside the typical nanobody Tm range seen in literature should be treated cautiously
Throughput and decision-fit: Requests are limited to at most 8 sequences per call (items ≤ batch_size); large libraries require client-side batching and aggregation. The model returns a single scalar tm and does not estimate ΔΔTm for mutations; for mutation prioritization, interpretability, or end-of-pipeline go/no-go decisions where small Tm deltas matter, pair with orthogonal models and experimental validation
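The client-side batching mentioned above reduces to chunking the library into groups of at most 8 and concatenating results in input order; `predict` below is a stand-in for a real API call, not part of this service:

```python
def chunked(seqs, size=8):
    """Split a sequence list into API-sized batches."""
    return [seqs[i:i + size] for i in range(0, len(seqs), size)]

def predict_all(seqs, predict, size=8):
    """Score an arbitrarily large library, preserving input order."""
    results = []
    for batch in chunked(seqs, size):
        results.extend(predict(batch))  # predict: list[str] -> list[float] of Tm
    return results
```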
How We Use It¶
BioLM integrates TEMPRO 3B as a fast, sequence-only nanobody Tm estimator to gate design candidates, reduce synthesis load, and shorten optimization cycles. In our antibody/nanobody workflows, TEMPRO 3B is used for early-stage triage and ranking across large variant libraries, then combined with masked language model proposals, AlphaFold/ESMFold structure QC, NetSurfP3 features, MAESTRO ΔΔG, and developability metrics (charge, hydrophobicity, liabilities) to enforce thermostability thresholds aligned to storage, formulation, and CMC targets. Standardized APIs enable batch scoring, automated decision rules (e.g., Tm cutoffs per program), and seamless push of annotations into LIMS/ELN, accelerating lab-in-the-loop campaigns from initial scaffolds through affinity maturation and humanization.
Primary applications: stability-aware ranking of CDR variants and humanization candidates; guided mutational scanning to lift Tm while preserving binding; portfolio-wide risk screening prior to synthesis.
Integration pattern: TEMPRO 3B for high-throughput triage, followed by higher-capacity ensembles (e.g., ESM-15B-based models) at down-selection; results feed directly into design-of-experiments and procurement queues.
References¶
Alvarez, J. A. E., & Dean, S. N. (2024). TEMPRO: nanobody melting temperature estimation model using protein embeddings. Scientific Reports.
