TemBERTure Classifier (TemBERTureCLS) predicts protein thermostability class (thermophilic, >60°C, vs non-thermophilic) from primary sequence using a protBERT-BFD backbone with Pfeiffer adapter fine-tuning. The service accepts raw amino acid sequences, tokenizes at the amino acid level, supports inputs up to 512 residues, and returns class labels with scores; optional per-residue attention scores support interpretation. Trained on the curated TemBERTureDB, it achieves ~0.89 accuracy, 0.90 F1, and 0.78 MCC. GPU-accelerated batch inference enables triage for enzyme engineering, library pruning, and metagenome annotation.
Predict¶
Predict properties or scores for input sequences
- POST /api/v3/temberture-classifier/predict/¶
Predict endpoint for TemBERTure Classifier.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
items (array of objects, min length: 1, max length: 8) — Input sequences
sequence (string, min length: 1, max length: 512, required) — Protein sequence with extended amino acid codes plus “-”
Example request:
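A minimal sketch of a predict request using only the Python standard library. The `biolm.ai` domain and the example sequence are placeholders; substitute your deployment's base URL and API key.

```python
import json
import urllib.request

# Hypothetical base URL; the path matches the endpoint documented above.
URL = "https://biolm.ai/api/v3/temberture-classifier/predict/"
API_KEY = "YOUR_API_KEY"

# 1-8 items per request, each sequence 1-512 residues.
payload = {
    "items": [
        {"sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"},
    ]
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Token " + API_KEY,
    },
    method="POST",
)
# resp = urllib.request.urlopen(req)          # uncomment to send
# results = json.load(resp)["results"]        # one result per input item
```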
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
prediction (float, range: 0.0–1.0 for classifier or approximately 20.0–110.0 °C for regression) — Model output score or predicted melting temperature
classification (string, optional) — Predicted protein thermal class (e.g. “thermophilic” or “non-thermophilic”)
Example response:
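An illustrative response shape based on the schema above, with a simple thresholding step; the numeric values are made up for demonstration.

```python
# Field names follow the documented schema; values are illustrative only.
example_response = {
    "results": [
        {"prediction": 0.93, "classification": "thermophilic"},
        {"prediction": 0.12, "classification": "non-thermophilic"},
    ]
}

THRESHOLD = 0.5  # default decision boundary; tune for your dataset
calls = [
    "thermophilic" if r["prediction"] >= THRESHOLD else "non-thermophilic"
    for r in example_response["results"]
]
# calls -> ["thermophilic", "non-thermophilic"]
```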
Encode¶
Generate embeddings for input sequences
- POST /api/v3/temberture-classifier/encode/¶
Encode endpoint for TemBERTure Classifier.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, optional) — Configuration parameters:
include (array of strings, default: [“mean”]) — Types of embeddings to include (possible values: “mean”, “per_residue”, “cls”)
items (array of objects, min length: 1, max length: 8) — Input sequences:
sequence (string, min length: 1, max length: 512, required) — Protein sequence using extended amino acid codes plus “-”
Example request:
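A minimal sketch of an encode request showing how `params.include` selects embedding types. The domain and sequence are placeholders.

```python
import json
import urllib.request

# Hypothetical base URL; the path matches the endpoint documented above.
URL = "https://biolm.ai/api/v3/temberture-classifier/encode/"

payload = {
    # any subset of "mean", "per_residue", "cls"; default is ["mean"]
    "params": {"include": ["mean", "per_residue"]},
    "items": [{"sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"}],
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Token YOUR_API_KEY"},
    method="POST",
)
# results = json.load(urllib.request.urlopen(req))["results"]  # uncomment to send
```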
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
sequence_index (int) — Zero-based index of the input sequence
embeddings (array of float, shape: 1024, optional) — Mean protein embeddings
per_residue_embeddings (array of arrays of float, shape: [L, 1024], optional) — Per-residue embeddings (L ≤ 512)
cls_embeddings (array of float, shape: 1024, optional) — CLS token embedding
Example response:
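An illustrative encode response and how to consume it. Vector dimensions are truncated to 3 for readability; real embeddings are length 1024, and `per_residue_embeddings` has one row per residue.

```python
# Field names follow the documented schema; values and dims are illustrative.
example_response = {
    "results": [
        {
            "sequence_index": 0,
            "embeddings": [0.01, -0.23, 0.45],           # mean embedding
            "per_residue_embeddings": [[0.1, 0.2, 0.3],  # one row per residue
                                       [0.0, -0.1, 0.2]],
        }
    ]
}

r = example_response["results"][0]
n_residues = len(r["per_residue_embeddings"])
dim = len(r["embeddings"])
# Mean-pool the per-residue rows yourself when you request only "per_residue".
mean_vec = [sum(row[j] for row in r["per_residue_embeddings"]) / n_residues
            for j in range(dim)]
```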
Performance¶
Input and output types:
- Input: amino acid sequence(s) as strings; extended protein alphabet supported, with the gap character '-' permitted
- Output (per sequence): prediction (float in [0.0, 1.0], higher means more thermophilic) and classification (string: "thermophilic" or "non-thermophilic")
Hardware and runtime characteristics:
- Deployed on NVIDIA H100 and A100 for high-throughput inference, with L4 used for cost-optimized throughput; mixed-precision (FP16/BF16) inference with fused attention kernels
- Typical end-to-end compute share for 512-aa inputs: >95% GPU-bound model compute, <5% CPU tokenization overhead
- Throughput at 512 aa (dynamic batching enabled): H100 ~1,600–2,100 sequences/min; A100 ~900–1,300 sequences/min; L4 ~400–650 sequences/min
- Memory footprint at 512 aa (single process, adapters active): ~1–3 GB GPU memory; adapters add negligible inference overhead relative to the base model
Model architecture and complexity:
- Based on protBERT-BFD (~420M parameters, 30 transformer layers, 16 heads, 1024 hidden size); attention cost scales as O(L²) with sequence length L
- Adapter-based fine-tuning (Pfeiffer adapters) reduces trainable parameters from ~420M to ~5M; inference cost remains comparable to full fine-tuning, with negligible adapter overhead
Predictive performance (TemBERTure Classifier):
- In-domain test set (TemBERTureDB, clustered split to prevent leakage): accuracy 0.89, F1 0.90, MCC 0.78; balanced per-class F1 (non-thermophilic 0.88, thermophilic 0.90); low run-to-run variance across seeds
- Cross-dataset generalization after 50% identity filtering: accuracy ~0.86 (iThermo) and ~0.83 (TemStaPro test subset); precision remains high for non-thermophiles, with a moderate drop for thermophiles on unseen organisms
- Identity-stratified performance shows stable non-thermophilic classification across identity bins; thermophilic performance degrades primarily below 20% identity, consistent with harder out-of-distribution sequences
Comparative performance within BioLM's model family:
- Versus TemBERTure Regression used as a classifier (70°C threshold): the classifier achieves higher accuracy (0.89 vs ~0.82) and markedly better thermophile recall with lower variance across random seeds; the regression model exhibits a bimodal bias around class boundaries when used for Tm prediction
- Versus ESM-2 650M plus a lightweight classifier head (internal benchmark on matched splits): TemBERTure Classifier provides comparable or better class balance and thermophile recall while running ~1.4–1.8× faster per 512-aa sequence on A100-class GPUs, owing to the smaller base model and optimized kernels; ESM-2 150M is faster but shows lower thermophile recall and MCC on the same benchmarks
Operational optimizations for scale:
- Dynamic request coalescing and sequence-length bucketing to maximize GPU occupancy with minimal queuing overhead
- Kernel-fused attention and mixed-precision execution to reduce latency without measurable loss in classification accuracy (logit deltas vs FP32 within numerical noise)
- Horizontal autoscaling across GPU pools to sustain high throughput; cold-start amortization via warm pools and weight preloading
Practical guidance:
- Scores are well calibrated for a default threshold of 0.5 on in-domain data; in cross-organism scenarios, users targeting thermophile enrichment should consider slightly lower thresholds to maximize recall while monitoring precision
- Inference cost grows quadratically with sequence length; batching proteins of similar lengths yields the best device utilization and end-to-end throughput
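The length-bucketing advice can be sketched as a small client-side batching helper: group sequences of similar length into batches of at most 8 (the API's batch limit) so each request carries minimal padding overhead. The bucket width of 64 residues is an illustrative choice, not an API parameter.

```python
# Group similar-length sequences into batches of <= max_batch for submission.
def bucket_batches(sequences, bucket_width=64, max_batch=8):
    buckets = {}
    for seq in sorted(sequences, key=len):
        key = len(seq) // bucket_width          # bucket by length band
        buckets.setdefault(key, []).append(seq)
    batches = []
    for key in sorted(buckets):
        group = buckets[key]
        for i in range(0, len(group), max_batch):
            batches.append(group[i:i + max_batch])
    return batches

seqs = ["A" * n for n in (30, 35, 200, 210, 500)]
batches = bucket_batches(seqs)
# Each batch holds sequences from one length band, at most 8 per batch.
```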
Applications¶
High-throughput triage of enzyme variant libraries for hot-process biocatalysis: use TemBERTureCLS to rank sequences by thermophilic class score before expression, reducing wet-lab screening by focusing on variants more likely to remain active at ≥60°C. Example uses include cellulases for biomass saccharification, amylases for starch liquefaction, and lipases/esterases for solvent-rich reactions. Limitations: the model outputs a binary class (thermophilic vs non-thermophilic) rather than an exact Tm; inputs are limited to 512 residues; treat the score as an enrichment prior rather than a final release criterion.
Genome and metagenome mining for thermostable homolog discovery: apply the classifier across UniProt/NCBI/metagenome assemblies to prioritize candidates predicted thermophilic, accelerating hit-finding for high-temperature reactors and solvent-tolerant processes. Example uses include selecting DNA/RNA polymerase or transaminase homologs for 65–80°C workflows. Limitations: training data are enriched for bacterial/archaeal proteomes and organism growth temperatures; predictions for eukaryotic secreted proteins or extreme membrane proteins may be less reliable; always confirm experimentally.
Design-loop guidance in enzyme engineering pipelines: integrate the class score as a lightweight objective to bias ML-guided design, recombination, or directed evolution toward variants more likely to be thermostable, while filtering out destabilizing proposals early. Example uses include narrowing combinatorial libraries for oxidoreductases or hydrolases prior to structural modeling/MD and wet-lab rounds. Limitations: the score is not calibrated for fine-grained ΔTm at single-mutation resolution; combine with structure/biophysics filters and experimental counterscreens.
Process fit and host/process selection: rapidly assess whether a biocatalyst family is compatible with thermophilic process conditions, or select homologs predicted thermophilic for expression in high-temperature hosts or reactors. Example uses include choosing heat-tolerant dehydrogenases for continuous flow at 70°C or proteases for high-temperature detergent formulations. Limitations: the model does not account for buffer composition, cofactors/metals, pH, or formulation excipients; use it as one input alongside stability assays.
Pre-synthesis QC for construct design and fusion architectures: screen designed constructs (tags, linkers, domain swaps) to flag sequences likely to be non-thermophilic when the application requires heat robustness, reducing wasted DNA synthesis and expression runs. Example uses include selecting truncation boundaries for thermostable catalytic domains or choosing linkers for thermostable fusions intended for hot reactors. Limitations: sequences are evaluated only up to the 512-residue limit; chimeric/fusion behavior depends on context beyond primary sequence, so treat results as triage signals.
Limitations¶
API limits and request shape: Maximum sequence length is 512 amino acids and maximum batch size is 8. Requests with `items` longer than 512 residues or with more than 8 sequences are rejected; long proteins are not auto-truncated or tiled, so split multi-domain constructs yourself and aggregate decisions downstream. Only raw one-line sequences are accepted (no FASTA headers or whitespace).
Input alphabet and formatting: Each `items` element must contain a `sequence` using the standard amino-acid alphabet; extended tokens and `-` are accepted. Sequences with many ambiguous or non-standard tokens may validate but can degrade prediction quality.
Output semantics (classifier): Each result returns a scalar `prediction` and, when available, a `classification` label (`thermophilic` or `non-thermophilic`). The `prediction` is not guaranteed to be probability-calibrated outside the training distribution; set decision thresholds appropriate for your dataset and objectives. If you need vector representations for downstream calibration or ranking, use the encode endpoint with `include` options: `mean` (sequence-level embedding), `per_residue` (per-position embeddings), or `cls` (CLS token embedding).
Scientific scope: The classifier predicts a coarse thermophilicity class derived primarily from organism growth temperature (>60°C vs. <30°C) and curated Meltome/BacDive labels; it does not estimate absolute melting temperature (Tm), mutation effects (ΔΔG/ΔTm), or context-specific stability (pH, salts, cofactors, ligands, membranes).
Generalization and dataset bias: Performance can drop under domain shift—e.g., sequences from unseen taxa or very low similarity (<20% identity) to training data—especially for the thermophilic class. Training data are enriched for bacterial/archaeal proteins; coverage of eukaryotic, viral, antibody, and highly engineered proteins is limited, so caution is advised.
When not optimal: Use cases needing exact Tm ranking, fine-grained mutation scanning, or structure-aware assessment are better served by complementary tools (e.g., regression plus calibration, ΔΔG predictors, or structure models). For early-stage triage of very large libraries, consider faster heuristics or embeddings (`include=mean`) for pre-filtering, then apply the classifier to the narrowed set.
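A mean-embedding pre-filter of the kind described above can be sketched as a cosine-similarity ranking against known thermophilic references, with only the shortlist forwarded to the predict endpoint. The 3-d vectors and sequence names here are toy stand-ins; real mean embeddings from the encode endpoint are 1024-d.

```python
import math

# Cosine similarity between two embedding vectors.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical mean embeddings (toy 3-d values for illustration).
reference = [1.0, 0.0, 0.5]        # known thermophile's mean embedding
library = {
    "seq_a": [0.9, 0.1, 0.4],
    "seq_b": [-1.0, 0.2, 0.0],
    "seq_c": [1.1, -0.1, 0.6],
}

# Rank the library by similarity to the reference; classify only the top slice.
ranked = sorted(library, key=lambda k: cosine(library[k], reference), reverse=True)
shortlist = ranked[:2]             # forward only these to the predict endpoint
```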
How We Use It¶
TemBERTure Classifier enables rapid, sequence-only assessment of thermophilic class and is embedded as a decision layer across BioLM protein design and optimization workflows. We use its class score to triage large variant libraries from masked language models and evolutionary sampling, gate temperature-aware regression ensembles, and inform assay design (e.g., screening temperatures and host systems). Combined with structure-derived metrics (AlphaFold2 models, interface packing, Rosetta ΔΔG) and physicochemical features (charge, pI, hydrophobicity), the classifier accelerates downselection and focuses wet-lab effort on variants most likely to meet process temperature targets. Attention-derived residue saliency helps prioritize mutational hot spots and stability motifs for targeted diversification, improving iteration speed in active-learning campaigns. Standardized APIs support high-throughput batch scoring and consistent feature logging into multi-objective ranking models used for enzyme design, antibody maturation, and developability risk reduction.
- Upstream filter in generative loops to enforce thermostability constraints and raise hit quality before synthesis.
- Routing signal for class-specific models (e.g., TemBERTureTm ensembles, solubility/aggregation predictors) and DOE planning at relevant temperature regimes.
- Feature in multi-objective optimization alongside activity and expression, reducing experimental cycles to reach required Topt/Tm bands.
References¶
Rodella, C., Lazaridi, S., & Lemmin, T. (2024). TemBERTure: advancing protein thermostability prediction with deep learning and attention mechanisms. Bioinformatics Advances, 00, 1–10.
