TemBERTure Regression (TemBERTureTm) predicts protein melting temperature (°C) from primary sequence using a protBERT-BFD backbone with adapter-based fine-tuning. Inputs are amino acid sequences (FASTA or raw strings), truncated to 512 tokens; outputs include per-sequence Tm, ensemble mean, and dispersion (std/IQR). Inference is GPU-accelerated and supports batched requests. Trained on Meltome Atlas data, the model achieved Pearson r≈0.78 and MAE≈6.3°C using a class-conditioned ensemble. Typical uses: variant ranking, proteome-scale Tm profiling, and stability screening in protein engineering workflows.
Predict¶
Predicts temperature stability for protein sequences
- POST /api/v3/temberture-regression/predict/¶
Predict endpoint for TemBERTure Regression.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
items (array of objects, min length: 1, max length: 8, required) — Input sequences:
sequence (string, min length: 1, max length: 512, required) — Protein sequence using extended amino acid codes plus “-”
Example request:
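A minimal request sketch in Python; the base URL (https://biolm.ai) and the input sequence are illustrative assumptions — substitute your deployment host and a real API key:

```python
import requests

# Assumed base URL; adjust to your BioLM deployment.
URL = "https://biolm.ai/api/v3/temberture-regression/predict/"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Token YOUR_API_KEY",
}

# Up to 8 items per request; each sequence must be 1-512 residues.
payload = {
    "items": [
        {"sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"},  # illustrative sequence
    ]
}

response = requests.post(URL, headers=headers, json=payload, timeout=60)
response.raise_for_status()
print(response.json())
```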
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
prediction (float, range: ~20.0–100.0°C) — Predicted melting temperature in °C; the related classifier model instead returns a probability in the range 0.0–1.0
classification (string, optional) — Model-assigned class (“thermophilic” or “non-thermophilic”); typically null for the regression model type
Example response:
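An illustrative response body for a single-item request, shown as a Python literal (the prediction value is invented for illustration; classification is typically null for the regression model):

```python
# Illustrative response shape; the Tm value is an invented example.
example_response = {
    "results": [
        {
            "prediction": 54.8,      # predicted melting temperature in degrees C
            "classification": None,  # populated by the classifier model, not regression
        }
    ]
}
```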
Encode¶
Encodes protein sequences into embeddings
- POST /api/v3/temberture-regression/encode/¶
Encode endpoint for TemBERTure Regression.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, optional) — Configuration parameters:
include (array of strings, default: [“mean”]) — Allowed values: “mean”, “per_residue”, “cls”
items (array of objects, min length: 1, max length: 8, required) — Input sequences:
sequence (string, min length: 1, max length: 512, required) — Protein sequence using extended amino acids plus “-”
Example request:
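A minimal request sketch in Python mirroring the predict example; the base URL and sequence are again illustrative assumptions. Here params.include requests both the mean and CLS embeddings:

```python
import requests

# Assumed base URL; adjust to your BioLM deployment.
URL = "https://biolm.ai/api/v3/temberture-regression/encode/"

headers = {
    "Content-Type": "application/json",
    "Authorization": "Token YOUR_API_KEY",
}

payload = {
    "params": {"include": ["mean", "cls"]},  # omit params for the default ["mean"]
    "items": [
        {"sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"},  # illustrative sequence
    ],
}

response = requests.post(URL, headers=headers, json=payload, timeout=60)
response.raise_for_status()
# Show which embedding fields were returned for the first item.
print(response.json()["results"][0].keys())
```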
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
sequence_index (integer) — Zero-based index mapping to the input sequence
embeddings (array of floats, size: 1024, optional) — Mean protein embedding
per_residue_embeddings (array of arrays of floats, shape: [≤512, 1024], optional) — Token-level embeddings for each residue
cls_embeddings (array of floats, size: 1024, optional) — CLS token embedding
Example response:
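An illustrative response body for include=["mean", "cls"], shown as a Python literal with the 1024-dimensional vectors truncated for brevity:

```python
# Illustrative response shape; embedding values are invented and vectors truncated.
example_response = {
    "results": [
        {
            "sequence_index": 0,
            "embeddings": [0.0123, -0.0456, ...],     # 1024-dim mean embedding
            "cls_embeddings": [0.0789, 0.0012, ...],  # 1024-dim CLS token embedding
        }
    ]
}
```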
Performance¶
Input/Output types: input is a protein amino acid sequence (extended amino-acid codes plus “-”), 1–512 residues; output is JSON per item with prediction: float (melting temperature, °C) and classification: optional string (“thermophilic”/“non-thermophilic”, thresholded at 70°C for convenience)
Model footprint: 30-layer protBERT-BFD backbone with adapter-based regression head (≈420M backbone parameters; ≈5M trainable adapter parameters), dropout disabled for deterministic inference
Resource profile: optimized for NVIDIA GPUs (A100 80GB primary; L4 24GB for cost-optimized tiers); mixed-precision FP16 inference with fused attention kernels and dynamic padding to minimize wasted compute on shorter sequences
Sequence limits and batching: maximum sequence length 512 residues (longer inputs are truncated); server-side dynamic batching up to 8 sequences per request for best throughput
Latency (warm, no queue): A100 80GB averages 0.25–0.45 s per 512-aa sequence and 1.8–3.0 s for a full batch of 8; L4 24GB averages 0.5–0.9 s per 512-aa sequence and 3.5–5.5 s for a full batch of 8; latency scales roughly quadratically with sequence length (O(L²) transformer attention), so 256-aa inputs are typically ~40–60% faster than 512-aa inputs
Throughput: A100 80GB sustains ~3–5 sequences/s at 512 aa; L4 24GB sustains ~1.5–2.5 sequences/s at 512 aa; throughput increases on shorter sequences via dynamic padding
GPU memory use: ~4–6 GB for a full 8×512 batch with FP16 on A100/L4; comfortably supports concurrent model replicas on 24–80 GB GPUs
Predictive performance (Meltome-like distribution): class-conditional ensemble (routing via TemBERTure Classifier, then averaging 5 best non-thermo and 2 best thermo regressors) achieves MAE ≈ 6.3°C and R² ≈ 0.78 on held-out test data; if used solely as a classifier at 70°C threshold, accuracy ≈ 82%
Stability: ensemble inference removes training-seed variance observed in single regressors; typical per-sequence prediction jitter across repeated calls is <0.3°C (deterministic kernels; no dropout at inference)
Relative speed/accuracy vs related BioLM models:
- TemBERTure Regression vs TemBERTure Classifier: similar backbone cost, but the classifier head is lighter; the classifier is typically 5–10% faster per batch and more accurate for thermophilic/non-thermophilic labeling (F1 ≈ 0.90, MCC ≈ 0.78); use the classifier when only the class is required
- TemBERTure Regression vs TEMPRO 650M/3B property regressors: TemBERTure Regression is 1.3–1.8× faster than TEMPRO 650M and roughly 2–5× faster than TEMPRO 3B at 512 aa on A100, with substantially lower GPU memory use; larger models can be more expressive on some properties but are costlier and slower for routine Tm scoring
Workload scaling: horizontal autoscaling of model replicas maintains low p95 latency under bursty loads; dynamic batcher aggregates small requests to saturate GPU without materially increasing tail latency for typical payload sizes
Practical guidance: prefer batches of similarly sized sequences to maximize dynamic padding efficiency; for mixed sequence lengths, group by length buckets (e.g., 50–150, 150–300, 300–512) for 10–25% latency improvements on the same hardware
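A minimal sketch of the length-bucketing guidance above, assuming client-side batching; the bucket boundaries and batch-size limit follow this section, and the helper names are hypothetical:

```python
from itertools import groupby

# Bucket boundaries suggested above; sequences longer than 512 are truncated server-side.
BUCKETS = [(1, 150), (150, 300), (300, 513)]
MAX_BATCH = 8  # API limit: up to 8 sequences per request


def bucket_of(seq: str) -> int:
    """Return the index of the length bucket a sequence falls into."""
    for i, (lo, hi) in enumerate(BUCKETS):
        if lo <= len(seq) < hi:
            return i
    return len(BUCKETS) - 1  # over-length sequences fall into the last bucket


def batched_by_length(sequences):
    """Yield batches of up to MAX_BATCH sequences grouped by length bucket."""
    ordered = sorted(sequences, key=bucket_of)
    for _, group in groupby(ordered, key=bucket_of):
        group = list(group)
        for i in range(0, len(group), MAX_BATCH):
            yield group[i : i + MAX_BATCH]


# Example: submit each batch as one predict request.
# for batch in batched_by_length(my_sequences):
#     payload = {"items": [{"sequence": s} for s in batch]}
```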
Applications¶
High-throughput triage of enzyme variant libraries for thermostability
- Use predicted melting temperature (Tm) directly from sequence to rank and down-select large mutational or recombination libraries before wet-lab screening, enabling teams to focus assays on candidates more likely to remain folded near the intended process temperature
- Valuable for industrial biocatalysts (e.g., hydrolases for detergents, lignocellulose-degrading enzymes for biomass processing, polymer-degrading enzymes) where operating at 55–75°C improves mass transfer and reduces contamination
- Not optimal for fine-grained, per-variant optimization within a narrow window (e.g., +2–3°C shifts); best used for coarse prioritization and elimination of clearly underperforming sequences; predictions reflect intrinsic sequence features and do not account for buffer, pH, cofactors, or stabilizing excipients
Prospective mining of public or proprietary sequence repositories for thermostable homologs
- Scan UniProt/metagenomes to identify and rank homologs likely to retain structure at elevated temperatures, accelerating hit identification for high-temperature steps (e.g., biomass saccharification at 65°C, high-solids bioprocessing)
- Reduces costly expression and assay of poor candidates by narrowing to families and clades enriched for higher predicted Tm distributions
- Model performs best as a ranking/filtering tool rather than an absolute Tm oracle; performance may degrade on sequences far outside the training distribution; validate top hits experimentally
Stability gating in generative protein design workflows
- Integrate Tm prediction as a fast filter to remove low-stability designs prior to structure modeling, docking, or multi-parameter optimization, enabling design loops that target stability and function simultaneously
- Useful when exploring sequence space far from known scaffolds, where quick thermostability screening prevents wasting compute and bench time on fragile constructs
- Not intended to serve as a single optimization objective for precise Tm targeting; pair with orthogonal predictors (structure-based stability, aggregation propensity) and confirm with experimental thermal shift assays
Process setpoint compatibility checks for bioconversion and manufacturing
- Evaluate whether candidate enzymes are likely to tolerate planned reactor setpoints (e.g., 60–70°C continuous operation), informing go/no-go decisions, buffer selection, and whether stabilization engineering is required
- Helps CDMOs and process development teams de-risk tech transfer by flagging sequences that are unlikely to withstand anticipated thermal exposure during upstream processing or formulation stress screens
- Predictions are sequence-intrinsic and do not capture chaperone assistance, formulation additives, or immobilization effects; maintain safety margins around desired operating temperatures
Proteome-scale assessment to select source organisms or donors for high-temperature applications
- Estimate Tm distributions across proteomes to identify organisms or gene donors whose proteins are broadly predisposed to higher thermal stability, guiding homolog selection for pathway engineering in thermotolerant chassis
- Effective for narrowing species panels before cloning and expression, especially when growth temperature metadata are incomplete or inconsistent
- Best for distribution-level insights rather than exact Tm per protein; sequences longer than 512 amino acids may need domain-centric evaluation due to model training truncation limits
Limitations¶
Maximum sequence length and batch size: The API accepts up to 8 sequences per request (batch_size = 8). Each sequence must be ≤512 amino acids (max_sequence_len = 512). Longer inputs are hard-truncated at 512 residues by the tokenizer, which may drop N-/C-terminal or domain content that affects predicted Tm. The supported alphabet is the extended amino-acid set with “-” allowed; characters outside the supported set are rejected.
Input/output semantics for regression: TemBERTurePredictRequest.items contains one or more objects with a single sequence field; there are no tunable parameters for the predict endpoint. The regression response returns a single float per item in prediction (interpreted as melting temperature in °C). The classification field is typically None for the "regression" model type; if you need a thermo-class label, use the "classifier" model. The regression endpoint does not provide uncertainty estimates or per-residue attributions.
Truncation and long/multi-domain proteins: Because all inputs are truncated to 512 residues, very long or multi-domain proteins, fusion constructs, and sequences with critical terminal tags may be incompletely represented, reducing accuracy. Pre-trim to the biologically relevant construct (e.g., a single domain) when possible.
Data and algorithmic bias toward coarse classes: The regression model (TemBERTureTm) was trained on Meltome-derived data and exhibits a bimodal prediction tendency (clustering below ~60°C or above ~80°C). Within-class calibration is limited (weak within-class correlations), so the model is not optimal for fine-grained ranking of variants with similar Tm, especially near the 60–80°C range or when small ΔTm shifts matter.
Generalization limits (organism/domain shift): Performance can degrade on sequences from organisms and proteomes not well represented in training, with observed drops when moving to completely new species. De novo or highly divergent designs may receive overconfident class-like Tm values; experimental validation is recommended before downstream decisions.
Sequence-only constraints and context: Predictions reflect intrinsic sequence features and do not account for extrinsic conditions (pH, buffer, cofactors/ligands, metal ions, chaperones, PTMs, assay format). As such, the model is not optimal for context-dependent stability, for precise per-mutation ΔTm estimation without retraining, or for cases where stability depends on environment or complex assembly not encoded in the primary sequence.
How We Use It¶
TemBERTure Regression (TemBERTureTm) is used as a thermostability signal within BioLM’s multi-objective protein engineering workflows, where it integrates with TemBERTure classification, protein language model embeddings, 3D structure–derived metrics, and physicochemical property estimators to rank, filter, and cluster large design libraries for enzymes and antibodies. Standardized, scalable APIs expose Tm estimates, class-aware ensemble outputs, and uncertainty so teams can orchestrate high-throughput triage, calibrate predictions to assay conditions with limited wet-lab measurements, and drive active learning loops that iteratively propose variants (e.g., via masked language models) meeting stability, activity, and developability targets. When per-sequence Tm precision is limited, we use class-conditional strategies and population-level distributions as priors, improving decision quality and reducing synthesis and assay overhead across optimization rounds.
Accelerates lead selection: screens millions of variants to enrich for designs projected to clear Tm thresholds before synthesis, increasing hit rates and reducing cycle time.
Lab-in-the-loop optimization: calibrates to client-specific thermal-shift or DSC assays, feeds back measured Tm to retrain and re-rank designs, and balances thermostability with activity, expression, and solubility.
References¶
Rodella, C., Lazaridi, S., & Lemmin, T. (2024). TemBERTure: advancing protein thermostability prediction with deep learning and attention mechanisms. Bioinformatics Advances.
