HyperMPNN is a retrained ProteinMPNN-style inverse folding model optimized on AlphaFold-predicted hyperthermophilic proteomes to design protein sequences with thermostability-associated amino acid biases (more apolar cores, more positively charged surfaces). The API accepts fixed backbones as single- or multi-chain PDB strings and returns designed sequences, global scores, and per-residue log-probabilities, supporting batched, CPU-based inference for high-throughput redesign of enzymes, scaffolds, and protein nanoparticles beyond mesophilic baselines.
Generate
HyperMPNN-style ProteinMPNN request to generate thermostable sequence designs for a single-chain soluble protein, with chain A biased toward hydrophobic and charged residues typical of hyperthermophiles, explicit fixed/redesigned residues, symmetry constraints, and per-residue transmembrane annotations.
- POST /api/v3/hyper-mpnn/generate/
Generate endpoint for HyperMPNN.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, required) — Configuration parameters for HyperMPNN generation:
temperature (float, default: 0.1) — Sampling temperature
fixed_residues (array of strings, default: empty array) — Residue specification strings “[ChainID][ResidueNumber][OptionalInsertionCode]” to keep fixed; each must exist in the input PDB
redesigned_residues (array of strings, default: empty array) — Residue specification strings “[ChainID][ResidueNumber][OptionalInsertionCode]” to redesign; each must exist in the input PDB
bias_AA (object, default: empty object) — Global amino-acid bias map (keys: single-letter unambiguous amino-acid codes, values: float weights)
bias_AA_per_residue (object, default: empty object) — Per-residue amino-acid bias map:
<residue_spec> (object) — Key is a residue specification string “[ChainID][ResidueNumber][OptionalInsertionCode]” that must exist in the input PDB
<aa_code> (float) — Key is a single-letter unambiguous amino-acid code, value is a float weight
omit_AA (string, default: “”, allowed chars: unambiguous amino-acid codes) — Global string of amino acids to omit
omit_AA_per_residue (object, default: empty object) — Per-residue amino acids to omit:
<residue_spec> (string) — Residue specification string “[ChainID][ResidueNumber][OptionalInsertionCode]” that must exist in the input PDB, mapped to a string of unambiguous amino-acid codes
symmetry_residues (array of arrays of strings, default: empty array) — Groups of residue specification strings “[ChainID][ResidueNumber][OptionalInsertionCode]” that must exist in the input PDB
symmetry_weights (array of arrays of floats, default: empty array) — Symmetry weights corresponding to groups in symmetry_residues; each inner array length must match the corresponding symmetry_residues group
homo_oligomer (boolean, default: false) — Homo-oligomer design flag
chains_to_design (array of strings, default: empty array) — Chain IDs to design; each must exist in the input PDB
parse_these_chains_only (array of strings, default: empty array) — Chain IDs to parse from the input PDB; each must exist in the input PDB
parse_atoms_with_zero_occupancy (boolean, default: false) — Whether atoms with zero occupancy are parsed from the PDB
number_of_batches (int, range: 1-48, default: 1) — Number of design batches
batch_size (int, range: 1-1000, default: 1) — Number of designs per batch
repack_everything (boolean or null, default: false) — Whether all residues are repacked in side-chain mode
pack_side_chains (boolean or null, default: false) — Whether side chains are packed in side-chain mode
number_of_packs_per_design (int or null, range: 1-8, default: 1) — Number of packing runs per design in side-chain mode
sc_num_samples (int or null, range: 1-64, default: 16) — Number of side-chain samples in side-chain denoising
sc_num_denoising_steps (int or null, range: 1-10, default: 3) — Number of denoising steps in side-chain mode
force_hetatm (boolean or null, default: false) — Whether to include HETATM records during parsing
pack_with_ligand_context (boolean or null, default: true) — Whether to pack using ligand context when ligands are present
fasta_seq_separation (string, default: “:”) — Separator string used for concatenating FASTA sequences
file_ending (string, default: “”) — File ending label string
zero_indexed (int, default: 0) — Residue indexing mode flag
pdb_path (null, fixed: null) — Unused field; always null
redesigned_residues_multi (null, fixed: null) — Unused field; always null
fixed_residues_multi (null, fixed: null) — Unused field; always null
bias_AA_per_residue_multi (null, fixed: null) — Unused field; always null
omit_AA_per_residue_multi (null, fixed: null) — Unused field; always null
save_stats (null, fixed: null) — Unused field; always null
verbose (boolean, default: true) — Verbosity flag for internal processing
ligand_mpnn_use_side_chain_context (null, fixed: null) — Unused field; always null
ligand_mpnn_use_atom_context (boolean or null, default: true) — LigandMPNN atom context flag
ligand_mpnn_cutoff_for_score (float or null, default: 8.0) — LigandMPNN distance cutoff in Å for scoring
global_transmembrane_label (string or null, allowed: “membrane”, “soluble”, default: “soluble”) — Global transmembrane label
transmembrane_buried (array of strings or null, default: null) — Residue specification strings “[ChainID][ResidueNumber][OptionalInsertionCode]” for buried transmembrane residues; each must exist in the input PDB
transmembrane_interface (array of strings or null, default: null) — Residue specification strings “[ChainID][ResidueNumber][OptionalInsertionCode]” for transmembrane interface residues; each must exist in the input PDB
items (array of objects, min items: 1, max items: 1) — Input structures:
pdb (string, min length: 1, max length: max_pdb_str_len, required) — PDB content string containing ATOM and/or HETATM records validated on input
Example request:
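The following is a minimal sketch of how such a request could be assembled and sent, not an official client. The field names come from the parameter list above. The host in API_BASE is a placeholder, and the helper names build_payload and post_generate are hypothetical; only the standard library is used.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the base URL of your deployment.
API_BASE = "https://YOUR_API_HOST"
ENDPOINT = "/api/v3/hyper-mpnn/generate/"


def build_payload(pdb_text, fixed_residues=(), chains_to_design=("A",),
                  bias_aa=None, temperature=0.1,
                  number_of_batches=1, batch_size=1):
    """Assemble a request body with one structure and a few documented params."""
    return {
        "params": {
            "temperature": temperature,
            "fixed_residues": list(fixed_residues),
            "chains_to_design": list(chains_to_design),
            "bias_AA": dict(bias_aa or {}),
            "number_of_batches": number_of_batches,
            "batch_size": batch_size,
            "global_transmembrane_label": "soluble",
        },
        "items": [{"pdb": pdb_text}],  # exactly one structure per request
    }


def post_generate(payload, api_key):
    """POST the payload with the documented headers and decode the JSON reply."""
    request = urllib.request.Request(
        API_BASE + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Toy one-atom PDB string, for illustration only.
toy_pdb = (
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000"
    "  1.00  0.00           C\n"
)
payload = build_payload(toy_pdb, fixed_residues=["A1"], batch_size=4)
print(json.dumps(payload["params"], indent=2))
```

Calling post_generate(payload, "YOUR_API_KEY") would send the request once API_BASE points at a real host. The sketch does not call it, so it runs without network access.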
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
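Many 400 responses can be avoided by checking a request locally before sending it. The sketch below covers three constraints stated in the parameter list: residue specifications must exist in the PDB, symmetry_weights groups must match symmetry_residues in length, and the batch parameters must stay in range. The helper names are hypothetical, the server's own validation may be stricter, and single-character chain IDs are assumed.

```python
import re

# Residue spec syntax: chain ID, residue number, optional insertion code
# (for example A10 or B52A). Assumption: single-character chain IDs.
SPEC_RE = re.compile(r"^[A-Za-z0-9]-?\d+[A-Za-z]?$")


def residues_in_pdb(pdb_text):
    """Collect residue specs from ATOM/HETATM records using fixed PDB columns."""
    specs = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 27:
            # chain = column 22, residue number = columns 23-26, icode = column 27
            specs.add(f"{line[21]}{line[22:26].strip()}{line[26].strip()}")
    return specs


def check_params(params, pdb_text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    present = residues_in_pdb(pdb_text)
    for key in ("fixed_residues", "redesigned_residues"):
        for spec in params.get(key, []):
            if not SPEC_RE.match(spec):
                problems.append(f"{key}: malformed residue spec {spec!r}")
            elif spec not in present:
                problems.append(f"{key}: {spec} is not in the PDB")
    groups = params.get("symmetry_residues", [])
    weights = params.get("symmetry_weights", [])
    if weights:
        if len(weights) != len(groups):
            problems.append("symmetry_weights: group count differs from symmetry_residues")
        else:
            for i, (group, weight) in enumerate(zip(groups, weights)):
                if len(group) != len(weight):
                    problems.append(
                        f"symmetry_weights[{i}]: length {len(weight)} != {len(group)}"
                    )
    if not 1 <= params.get("number_of_batches", 1) <= 48:
        problems.append("number_of_batches must be in 1-48")
    if not 1 <= params.get("batch_size", 1) <= 1000:
        problems.append("batch_size must be in 1-1000")
    return problems


def atom_line(serial, chain, resseq, x):
    """Format a toy CA record in fixed PDB columns (illustration only)."""
    return f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}    {x:8.3f}{0.0:8.3f}{0.0:8.3f}"


toy_pdb = "\n".join(atom_line(i, "A", i, 3.8 * i) for i in range(1, 4))
params = {
    "fixed_residues": ["A1", "A9"],        # A9 is absent from the toy structure
    "symmetry_residues": [["A1", "A2"]],
    "symmetry_weights": [[1.0]],           # one weight for a two-residue group
    "batch_size": 2000,                    # above the documented maximum
}
for problem in check_params(params, toy_pdb):
    print(problem)
```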
Response
results (array of objects) — One result per input item, in the order requested:
sequence (string) — Designed amino acid sequence using single-letter residue codes, length ≤ 1024
pdb (string) — Designed structure in PDB format text, including REMARK and ATOM/HETATM records
overall_confidence (float) — Scalar design confidence score parsed from a JSON number or numeric string
ligand_confidence (float) — Scalar ligand-environment confidence score parsed from a JSON number or numeric string
seq_rec (float) — Scalar sequence recovery metric parsed from a JSON number or numeric string
log_probs (array of arrays of floats) — Per-position log-probabilities over the residue vocabulary; shape: [L, V], where L = length of sequence and V = number of residue categories; each inner value parsed as float
sampling_probs (array of arrays of floats) — Per-position sampling probabilities over the residue vocabulary; shape: [L, V], where L = length of sequence and V = number of residue categories; each inner value parsed as float
pdb_packed (object, optional) — Side-chain–packed structures from side-chain models keyed by identifier:
<chain_or_model_id> (string) — Packed PDB content for the given chain or packing context
Example response:
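As an illustration of consuming the fields listed above, the sketch below ranks results by overall_confidence and computes a per-position entropy from sampling_probs. The response object is a hand-written mock with a toy vocabulary of size 2, not real API output. Treating a higher overall_confidence as better is an assumption, and the helper names are hypothetical.

```python
import math


def position_entropies(sampling_probs):
    """Shannon entropy in nats for each row of an [L, V] probability table."""
    return [
        -sum(p * math.log(p) for p in row if p > 0.0)
        for row in sampling_probs
    ]


def rank_results(response):
    """Sort results by overall_confidence, highest first.

    float() accepts both JSON numbers and numeric strings, matching the
    field description. Assumption: a higher score means a better design.
    """
    return sorted(
        response["results"],
        key=lambda result: float(result["overall_confidence"]),
        reverse=True,
    )


# Hand-written mock with a toy two-letter vocabulary (real V is larger).
mock_response = {
    "results": [
        {
            "sequence": "MK",
            "overall_confidence": "0.41",
            "sampling_probs": [[0.5, 0.5], [1.0, 0.0]],
        },
        {
            "sequence": "MR",
            "overall_confidence": 0.57,
            "sampling_probs": [[0.9, 0.1], [0.2, 0.8]],
        },
    ]
}

for result in rank_results(mock_response):
    entropies = position_entropies(result["sampling_probs"])
    print(result["sequence"], float(result["overall_confidence"]),
          [round(e, 3) for e in entropies])
```

Low-entropy positions are those where sampling was close to deterministic; high-entropy positions are where the model considered several residues plausible.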
Performance
Computational characteristics relative to ProteinMPNN:
HyperMPNN uses the same message-passing architecture, depth, and width as ProteinMPNN, retrained on ~29k AlphaFold structures from hyperthermophiles; per-residue inference cost and memory footprint are effectively identical on the same hardware
BioLM deploys HyperMPNN with the same optimized implementation stack as other MPNN variants, so throughput, latency scaling with protein length, and cost per redesigned residue match ProteinMPNN and LigandMPNN
Thermostability-targeted design behavior:
When redesigning mesophilic proteins, HyperMPNN shifts surface composition toward hyperthermophile-like profiles (more apolar and positively charged residues, fewer polar uncharged), whereas ProteinMPNN tends toward alanine/glutamate/lysine-rich mesophile-like surfaces
In protein cores, HyperMPNN increases apolar content by ~4–5% relative to mesophilic references, closely matching native hyperthermophile cores; ProteinMPNN designs deviate from this pattern
For non-hyperthermophilic inputs (e.g., E. coli), HyperMPNN redesigns typically move median net charge into the positive regime (≈ +3), while ProteinMPNN produces negatively charged designs, opposite to the charge trend associated with thermostability in hyperthermophiles
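The composition trends above can be checked on any returned sequence with a few lines of code. This sketch uses a simplified net charge (Lys and Arg count +1, Asp and Glu count -1, His and termini ignored) and one common grouping of apolar residues. Both definitions are assumptions and may differ from those used in the HyperMPNN study.

```python
# Assumption: one common grouping of apolar residues; other groupings exist.
APOLAR = set("AVLIMFWPG")
POSITIVE = set("KR")   # His is ignored in this simplified charge model
NEGATIVE = set("DE")


def net_charge(sequence):
    """Count of K/R minus count of D/E."""
    positives = sum(aa in POSITIVE for aa in sequence)
    negatives = sum(aa in NEGATIVE for aa in sequence)
    return positives - negatives


def apolar_fraction(sequence):
    """Fraction of residues that fall in the APOLAR set."""
    if not sequence:
        raise ValueError("sequence is empty")
    return sum(aa in APOLAR for aa in sequence) / len(sequence)


# Made-up sequence for illustration only.
example = "MKRLVEIAKKLG"
print(net_charge(example), round(apolar_fraction(example), 2))
```

Comparing these two numbers between an input sequence and its redesigns gives a quick view of whether the expected shift in charge and apolar content has occurred.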
Structural-physics and sequence-modeling performance:
HyperMPNN approximately doubles the median number of salt bridges per redesigned E. coli protein (≈17 vs ≈9 for ProteinMPNN) while maintaining native-like backbone geometry, enabling more extensive electrostatic networks without introducing non-native compaction or expansion
Designed sequences preserve contact order and radius of gyration distributions comparable to both mesophilic and hyperthermophilic natives, indicating that HyperMPNN alters sequence-level stability determinants without distorting global fold topology
On its held-out hyperthermophile test set, HyperMPNN attains perplexity ≈ 5.18 and per-residue recovery ≈ 0.48, matching ProteinMPNN’s accuracy; relative to ProteinMPNN, users can expect similar native-sequence recovery but with a systematic bias toward thermostable, hyperthermophile-like compositions
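For reference, the two sequence-modeling metrics quoted above can be computed as follows: recovery is the fraction of positions where the designed residue equals the native one, and perplexity is the exponential of the negative mean log-probability assigned to the native residues. The sketch assumes natural-log probabilities and a caller-supplied alphabet giving the column order of log_probs. The API's actual column order is not stated here, so the two-letter alphabet below is a toy.

```python
import math


def sequence_recovery(designed, native):
    """Fraction of aligned positions where designed and native residues match."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d == n for d, n in zip(designed, native)) / len(native)


def perplexity(log_probs, native, alphabet):
    """exp(-mean log p(native residue)) for an [L, V] table of natural logs.

    `alphabet` maps column index to residue letter; its order is an
    assumption that must match however log_probs was produced.
    """
    if len(log_probs) != len(native) or not native:
        raise ValueError("log_probs must have one row per native residue")
    column = {aa: i for i, aa in enumerate(alphabet)}
    total = sum(log_probs[pos][column[aa]] for pos, aa in enumerate(native))
    return math.exp(-total / len(native))


# Toy two-letter alphabet and hand-made probabilities.
toy_log_probs = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.25), math.log(0.75)],
]
print(sequence_recovery("ACD", "ACE"))
print(perplexity(toy_log_probs, native="AB", alphabet="AB"))
```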
Experimental performance and workflow integration:
In redesigns of the I53-50B pentameric nanoparticle component (parent Tm ~65°C), a HyperMPNN consensus design remained folded to at least 95°C by CD, representing a ≥30°C apparent stability increase and demonstrating that the learned composition shift can translate into large experimental Tm gains
Compared with a ProteinMPNN design on the same backbone that is also stable to 95°C but strongly negatively charged (≈ −7.9), the HyperMPNN design (net charge ≈ +2.0) better reproduces hyperthermophile-like surface electrostatics, which can be advantageous when targeting positively charged lumen-facing surfaces
Within multi-model BioLM pipelines (e.g., HyperMPNN for global redesign followed by structure prediction and supervised stability scoring), HyperMPNN typically contributes only a small fraction of total runtime while materially enriching for high-Tm candidates relative to starting from ProteinMPNN or language-model-based sequence generators
Applications
Thermostabilization of existing industrial enzymes given a reliable 3D structure (experimental or AlphaFold2), by using HyperMPNN (model_type="hyper") to redesign non-catalytic residues toward hyperthermophile-like amino acid composition, enabling operation at higher process temperatures (e.g., shifting a 50–60°C enzyme toward 80–95°C use) and reducing contamination risk and cleaning costs; activity and solubility still require experimental verification
Design of highly heat-stable self-assembling protein nanoparticles and carriers (e.g., I53-50-like scaffolds) by restructuring surface and core residues while preserving assembly interfaces using HyperMPNN’s structure-conditioned sequence model, allowing vaccine and biologic delivery systems that maintain integrity after prolonged exposure to elevated temperatures and cold-chain interruptions; requires an oligomer model built into a single PDB for design
Pre-screening of stabilizing sequence variants before wet-lab campaigns in protein engineering pipelines, by using HyperMPNN via the generate endpoint to produce thermostable sequence panels that are then filtered with secondary in silico models (e.g., ddG predictors, activity models) and assays, reducing the number of variants that need to be constructed while focusing on mutation patterns consistent with hyperthermophilic stability strategies
Retrofitting mesophilic biocatalysts for high-temperature manufacturing steps where substrate solubility or reaction rates are limiting, via structure-guided redesign of the scaffold (keeping active-site residues in fixed_residues) to incorporate apolar core packing and charged surface patterns learned from hyperthermophiles; users should expect that catalytic performance, expression levels, and solubility may change and must be re-optimized experimentally
Early-stage stability risk mitigation for protein-based therapeutics and diagnostic reagents (e.g., binding scaffolds, cytokine mimetics) by generating HyperMPNN designs biased toward higher melting temperatures, providing alternative sequence options when formulation, storage, or transport conditions are expected to exceed typical room-temperature stability thresholds; not optimal for targets lacking a reliable monomeric or oligomeric structure model or where function depends on finely tuned conformational dynamics near physiological temperature
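For the enzyme use cases above, the fixed_residues list has to be built before a request is sent. One simple heuristic, shown here as a sketch and not as a method prescribed by HyperMPNN, is to fix the catalytic residues plus every residue whose CA atom lies within a distance cutoff of them. The 8 Å cutoff, the helper names, and the toy coordinates are all assumptions.

```python
import math


def ca_coords(pdb_text):
    """Map residue spec -> CA coordinate, read from fixed PDB columns."""
    coords = {}
    for line in pdb_text.splitlines():
        is_ca = line.startswith("ATOM") and len(line) >= 54 and line[12:16].strip() == "CA"
        if is_ca:
            spec = f"{line[21]}{line[22:26].strip()}{line[26].strip()}"
            coords[spec] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def fixed_shell(pdb_text, catalytic, cutoff=8.0):
    """Residue specs whose CA lies within `cutoff` Å of any catalytic CA.

    Raises KeyError if a catalytic spec is missing from the structure.
    """
    coords = ca_coords(pdb_text)
    centers = [coords[spec] for spec in catalytic]
    return [
        spec for spec, xyz in coords.items()
        if any(math.dist(xyz, center) <= cutoff for center in centers)
    ]


def atom_line(serial, chain, resseq, x):
    """Format a toy CA record in fixed PDB columns (illustration only)."""
    return f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}    {x:8.3f}{0.0:8.3f}{0.0:8.3f}"


# Three toy residues placed 0, 5 and 20 Å along the x axis.
toy_pdb = "\n".join(
    atom_line(i + 1, "A", i + 1, x) for i, x in enumerate([0.0, 5.0, 20.0])
)
print(fixed_shell(toy_pdb, ["A1"]))  # prints ['A1', 'A2']
```

The returned list can be passed as fixed_residues. A larger cutoff protects more of the active-site environment and leaves fewer positions open to redesign.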
Limitations
Maximum sequence length: HyperMPNN inherits the MPNN architectural limit of at most 1024 residues per chain as parsed from the input pdb string. Chains with more than 1024 residues are not supported and must be truncated or split before calling the API.
Batching and throughput: HyperMPNNGenerateRequest.items must contain exactly 1 structure (min_items=1, max_items=1). Within params, batch_size must satisfy 1 <= batch_size <= 1000 and number_of_batches must satisfy 1 <= number_of_batches <= 48. Very large design campaigns should be split across multiple API requests by the client.
Structure and residue specification requirements: The pdb field must be a valid PDB-formatted string (ATOM/HETATM records with consistent chain and residue numbering) and is validated by validate_pdb. Residue-level options such as fixed_residues, redesigned_residues, bias_AA_per_residue, omit_AA_per_residue, symmetry_residues, transmembrane_buried, and transmembrane_interface use the [ChainID][ResidueNumber][OptionalInsertionCode] syntax (for example A10, B52A) and must refer to residues present in the uploaded structure; invalid chain IDs or out-of-range residue numbers will raise validation errors.
Thermostability-focused design bias: HyperMPNN is a retrained ProteinMPNN variant biased toward the amino acid composition of hyperthermophilic proteins (increased apolar core and positively charged surface residues). It is intended for stabilizing well-structured proteins and is not optimized for preserving native-like flexibility, fine-tuning low-temperature activity, or reproducing organism-specific mesophilic amino acid compositions. For neutral or organism-matched designs, other MPNN variants or protein language models may be more suitable.
Backbone dependence and design scope: The model assumes a reasonably accurate backbone (experimental or high-confidence predicted). It does not fix large backbone errors, change topology, or assess foldability from sequence alone; it only designs sequences on the provided structure. HyperMPNN generally introduces many mutations to embed hyperthermophile-like patterns and is therefore less appropriate than single-mutation ddG predictors for local scanning or minimal-change engineering.
Domain and expression context: Training data are hyperthermophilic and predominantly prokaryotic. The strong bias toward high-temperature stability can reduce folding or expression efficiency in mesophilic hosts such as E. coli. HyperMPNN is not ideal for mammalian-only folds with no hyperthermophilic analogs, highly disordered or IDR-rich targets, membrane proteins (use membrane-aware MPNN variants instead), or applications where maintaining wild-type expression at low or moderate temperature is the primary goal.
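The length and batching limits above can be handled on the client side. The sketch below counts residues per chain from ATOM records and splits a large campaign into per-request settings that respect the documented bounds. It assumes that one request yields number_of_batches × batch_size designs, which follows from the parameter descriptions but is not stated explicitly. The helper names are hypothetical.

```python
MAX_BATCHES = 48         # documented upper bound for number_of_batches
MAX_BATCH_SIZE = 1000    # documented upper bound for batch_size
MAX_CHAIN_LENGTH = 1024  # documented per-chain residue limit


def chain_lengths(pdb_text):
    """Count distinct residues per chain from ATOM records (fixed PDB columns)."""
    residues = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 27:
            # key on residue number plus insertion code within each chain
            residues.setdefault(line[21], set()).add((line[22:26], line[26]))
    return {chain: len(found) for chain, found in residues.items()}


def plan_requests(total_designs, batch_size=100):
    """Split a campaign into per-request number_of_batches/batch_size settings.

    The last request may overshoot slightly because the batch count is
    rounded up to cover total_designs.
    """
    if total_designs < 1:
        raise ValueError("total_designs must be at least 1")
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be in 1-1000")
    batches_left = -(-total_designs // batch_size)  # ceiling division
    plans = []
    while batches_left > 0:
        batches = min(batches_left, MAX_BATCHES)
        plans.append({"number_of_batches": batches, "batch_size": batch_size})
        batches_left -= batches
    return plans


def atom_line(serial, chain, resseq):
    """Format a toy CA record in fixed PDB columns (illustration only)."""
    return f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"


toy_pdb = "\n".join(atom_line(i, "A", i) for i in range(1, 4))
too_long = [c for c, n in chain_lengths(toy_pdb).items() if n > MAX_CHAIN_LENGTH]
print(chain_lengths(toy_pdb), too_long)
print(plan_requests(10_000, batch_size=100))
```

With batch_size=100, a 10,000-design campaign needs 100 batches, which the sketch spreads over three requests of 48, 48, and 4 batches.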
How We Use It
HyperMPNN provides thermostability-focused sequence redesign as a configurable step within broader protein engineering programs. It is combined with structure prediction, sequence embeddings, and developability filters to raise the likelihood of higher melting temperatures for enzymes, antibodies, and protein nanoparticles. Because it uses the same standardized MPNN API schema, teams can pair HyperMPNN designs with folding models (e.g., AlphaFold2-based services), language-model embeddings, biophysical scoring (charge, salt bridges, radius of gyration), and client assay data, and route large variant sets through a consistent design → filter → rank → test → iterate loop. This makes thermostability a scalable design objective rather than an ad hoc consideration, with the aim of improving hit quality and shortening optimization cycles.
In enzyme and industrial biocatalyst projects, HyperMPNN redesigns are evaluated alongside activity and solubility predictors to select variants that balance elevated operating temperatures with process-relevant performance.
In vaccine and nanoparticle applications, HyperMPNN complements interface design, antigen display, and manufacturability assessments, supporting thermostable carriers that better tolerate distribution constraints without adding extra assay rounds.
References
Ertelt, M., Schlegel, P., Beining, M., Kaysser, L., Meiler, J., & Schoeder, C. T. (2024). HyperMPNN – A general strategy to design thermostable proteins learned from hyperthermophiles. Preprint and code available at https://github.com/meilerlab/HyperMPNN.
