Per-Residue Label Membrane MPNN is a GPU-accelerated variant of ProteinMPNN that designs amino acid sequences on fixed backbones while explicitly modeling membrane environments at per-residue resolution. Given a PDB structure, it samples sequences for chains of up to 1,024 residues, with optional constraints on fixed or redesigned positions, residue-level transmembrane labels (buried or interface), symmetry, and amino acid biases or omissions. Typical uses include engineering membrane proteins with region-specific solubility, stability, or interface properties.
Generate
This endpoint generates sequence designs with Per-Residue Label Membrane MPNN.
- POST /api/v3/per-residue-label-membrane-mpnn/generate/
Generate endpoint for Per-Residue Label Membrane MPNN.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, required) — Configuration parameters for sequence generation:
temperature (float, default: 0.1) — Sampling temperature
fixed_residues (array of strings, default: []) — Residue specifications to keep fixed
redesigned_residues (array of strings, default: []) — Residue specifications to redesign
bias_AA (object, default: {}) — Per–amino-acid sampling bias; keys are single-letter amino acid codes, values are floats
bias_AA_per_residue (object, default: {}) — Per-residue amino-acid sampling bias; keys are residue specifications, values are objects mapping single-letter amino acid codes to floats
omit_AA (string, default: “”) — Concatenated single-letter amino acid codes to omit globally
omit_AA_per_residue (object, default: {}) — Per-residue amino acids to omit; keys are residue specifications, values are strings of single-letter amino acid codes
symmetry_residues (array of arrays of strings, default: []) — Groups of residue specifications constrained by symmetry
symmetry_weights (array of arrays of floats, default: []) — Symmetry weights corresponding to symmetry_residues groups
homo_oligomer (boolean, default: False) — Homo-oligomer flag
chains_to_design (array of strings, default: []) — Chain IDs to include in design
parse_these_chains_only (array of strings, default: []) — Chain IDs to parse from the input PDB
parse_atoms_with_zero_occupancy (boolean, default: False) — Include atoms with zero occupancy
number_of_batches (int, range: 1-1, default: 1) — Number of batches to generate
batch_size (int, range: 1-2, default: 1) — Number of designs per batch
repack_everything (boolean, default: False, optional) — Side-chain repacking flag
pack_side_chains (boolean, default: False, optional) — Enable side-chain packing
number_of_packs_per_design (int, range: 1-8, default: 1, optional) — Number of packing runs per design
sc_num_samples (int, range: 1-64, default: 16, optional) — Number of side-chain samples per design
sc_num_denoising_steps (int, range: 1-10, default: 3, optional) — Number of denoising steps for side-chain sampling
force_hetatm (boolean, default: False, optional) — Force inclusion of HETATM records
pack_with_ligand_context (boolean, default: True, optional) — Use ligand context for packing
fasta_seq_separation (string, default: “:”, optional) — FASTA sequence separation character
file_ending (string, default: “”, optional) — File ending tag
zero_indexed (int, default: 0, optional) — Residue indexing mode
pdb_path (null, fixed) — Unused field
redesigned_residues_multi (null, fixed) — Unused field
fixed_residues_multi (null, fixed) — Unused field
bias_AA_per_residue_multi (null, fixed) — Unused field
omit_AA_per_residue_multi (null, fixed) — Unused field
save_stats (null, fixed) — Unused field
verbose (boolean, default: True, optional) — Verbosity flag
ligand_mpnn_use_side_chain_context (null, fixed) — Unused field
ligand_mpnn_use_atom_context (boolean, default: True, optional) — Use atom-level context for ligand-aware models
ligand_mpnn_cutoff_for_score (float, default: 8.0, optional) — Distance cutoff for ligand scoring
global_transmembrane_label (string, allowed: “membrane”, “soluble”, default: “soluble”, optional) — Global transmembrane label
transmembrane_buried (array of strings, default: null, optional) — Residue specifications labeled as buried transmembrane
transmembrane_interface (array of strings, default: null, optional) — Residue specifications labeled as transmembrane interface
items (array of objects, min: 1, max: 1, required) — Input structures:
pdb (string, min length: 1, max length: max_pdb_str_len, required) — PDB-format structure string validated for syntax and content
Example request:
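A minimal sketch of assembling a request in Python with the standard library only. The host `https://api.example.invalid`, the residue identifiers, and the truncated PDB string are placeholders chosen for illustration; only the path, the headers, and the field names come from this reference. The network call itself is left commented out.

```python
import json
import urllib.request

# Hypothetical host; only the path below comes from this reference.
BASE_URL = "https://api.example.invalid"
ENDPOINT = "/api/v3/per-residue-label-membrane-mpnn/generate/"


def build_generate_request(pdb_string, api_key):
    """Assemble (but do not send) a POST request for the generate endpoint."""
    payload = {
        "params": {
            "temperature": 0.1,
            "batch_size": 2,           # documented range: 1-2
            "number_of_batches": 1,    # documented range: 1-1
            "chains_to_design": ["A"],
            "global_transmembrane_label": "soluble",
            # Placeholder residue specifications (chain ID + residue number).
            "transmembrane_buried": ["A15", "A16"],
            "transmembrane_interface": ["A30"],
        },
        "items": [{"pdb": pdb_string}],  # exactly one item per request
    }
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )


# Placeholder input; a real call needs a complete PDB-format string.
request = build_generate_request("ATOM ...", "YOUR_API_KEY")
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)["results"]
```

Parameters omitted from `params` are assumed to fall back to the defaults listed above.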
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
sequence (string) — Designed amino acid sequence in 1-letter codes
pdb (string) — PDB-formatted structure associated with the designed sequence, including ATOM/HETATM records
overall_confidence (float) — Design confidence score, dimensionless
ligand_confidence (float) — Ligand-context confidence score, dimensionless
seq_rec (float, range: 0.0–100.0) — Sequence recovery percentage, units: percent
log_probs (array of arrays of floats, shape: [L, 20]) — Per-position log-probabilities over 20 amino acids, where L is sequence length
sampling_probs (array of arrays of floats, shape: [L, 20]) — Per-position sampling probabilities over 20 amino acids, where L is sequence length
results (array of objects, side-chain variant) — One result per input item, in the order requested:
sequence (string) — Designed amino acid sequence in 1-letter codes
pdb (string) — PDB-formatted structure associated with the designed sequence, including ATOM/HETATM records
overall_confidence (float) — Design confidence score, dimensionless
ligand_confidence (float) — Ligand-context confidence score, dimensionless
seq_rec (float, range: 0.0–100.0) — Sequence recovery percentage, units: percent
log_probs (array of arrays of floats, shape: [L, 20]) — Per-position log-probabilities over 20 amino acids, where L is sequence length
sampling_probs (array of arrays of floats, shape: [L, 20]) — Per-position sampling probabilities over 20 amino acids, where L is sequence length
pdb_packed (object) — Side-chain–repacked structures per chain
<chain_id> (string) — Repacked structure for the given chain identifier in PDB format
Example response:
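A minimal sketch of reading one result, assuming the response JSON carries the top-level `results` array described above. The values are made up (a two-residue design with only a few fields shown), and `positional_entropy` is a hypothetical helper, not part of the API. It summarizes how sharp each row of `sampling_probs` is without needing to know the order of the 20 amino acid columns, which this page does not specify.

```python
import math

# Made-up response for illustration only; real values come from the API.
mock_response = {
    "results": [
        {
            "sequence": "AL",
            "overall_confidence": 0.5,
            "seq_rec": 50.0,
            # Shape [L, 20]: one row of 20 probabilities per position.
            "sampling_probs": [[0.05] * 20, [1.0] + [0.0] * 19],
        }
    ]
}


def positional_entropy(probs):
    """Shannon entropy (nats) of each row; low values mean a sharp distribution."""
    return [-sum(p * math.log(p) for p in row if p > 0.0) for row in probs]


result = mock_response["results"][0]
entropy = positional_entropy(result["sampling_probs"])
# A uniform row gives ln(20); a one-hot row gives 0.
```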
Performance
Model architecture and conditioning:
Variant of ProteinMPNN sharing the same graph neural network core, with additional per-residue conditioning channels for transmembrane_buried and transmembrane_interface annotations
Per-residue labels are validated against the input backbone and propagated through the graph, adding <10% computational overhead relative to base ProteinMPNN for typical protein sizes
Relative speed and throughput:
Inference speed is effectively the same order as ProteinMPNN and Soluble ProteinMPNN on the same backbone, with very similar memory footprint and FLOP usage
Substantially faster and cheaper than structure-prediction–based redesign workflows (for example, running AlphaFold2 or ESMFold on many sequence variants for the same backbone)
Comparative design behavior:
Compared to Global Label Membrane MPNN, per-residue labeling yields higher effective design accuracy at membrane interfaces (more realistic enrichment of interface aromatics/charges and buried hydrophobics), at the cost of a modest runtime increase
Compared to base ProteinMPNN on membrane-exposed regions, per-residue conditioning reduces over-solubilization and better recovers hydrophobic cores and amphipathic patterns in predicted transmembrane segments
Compared to Soluble ProteinMPNN on backbones containing membrane-like segments, it better preserves membrane-adapted packing in buried regions while allowing explicit solubilization or reweighting at user-labeled interface positions
Scoring, calibration, and deployment characteristics:
overall_confidence is directly comparable to scores from ProteinMPNN and Soluble ProteinMPNN and can be used to rank sequences within or across design runs
sampling_probs and log_probs are typically sharper in membrane-buried segments than for soluble-only models, reflecting stronger learned constraints in lipid-exposed cores
Runs efficiently on the same GPU configurations as other MPNN-family models; when co-deployed with ProteinMPNN and LigandMPNN, per-GPU throughput is effectively identical, and CPU-only execution remains feasible for small systems though GPU is preferred for large design campaigns
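As a sketch of the ranking use mentioned above, designs collected from one or more runs can be sorted by `overall_confidence`. This assumes that a higher score indicates a more confident design, which this page does not state explicitly; the records are made up for illustration.

```python
# Made-up design records; only the field names come from the response schema.
designs = [
    {"sequence": "MKTLLV", "overall_confidence": 0.41},
    {"sequence": "MKSLLI", "overall_confidence": 0.57},
    {"sequence": "MRTLAV", "overall_confidence": 0.49},
]

# Assumption: higher overall_confidence is better.
ranked = sorted(designs, key=lambda d: d["overall_confidence"], reverse=True)
best = ranked[0]["sequence"]
```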
Applications
Rapid assessment of transmembrane vs soluble exposure in designed proteins by assigning per-residue membrane/soluble labels to 3D backbones, enabling protein engineers to see which positions the Membrane MPNN model treats as membrane-facing vs solvent-facing and prioritize mutations or redesign of problematic patches for expression, aggregation resistance, and formulation robustness
Design of soluble analogues of membrane protein scaffolds for screening and mechanistic assays by mapping per-residue membrane propensities on GPCRs, ion channels, or transporters and then programmatically inverting hydrophobic/hydrophilic patterns at positions labeled as membrane-exposed, enabling panel designs for high-throughput ligand screening or structural biology without detergents or reconstitution (not intended to preserve native signaling or transport function outside membranes)
Targeted re-embedding and re-parameterization of membrane-facing residues in computational protein redesign pipelines by using per-residue labels as constraints or weights when calling Membrane MPNN for sequence optimization, so residues classified as membrane-buried retain appropriate hydrophobic character while loop or extracellular residues are diversified for stability, manufacturability, or binding interface engineering in GPCR-like or rhomboid-like scaffolds
Automated quality control of generative backbone designs for complex helical bundles and β-barrels by running the per-residue label model on candidate structures to detect inconsistent patterns (for example, polar residues predicted as membrane-exposed across large surfaces, or strongly membrane-like segments in an intended soluble scaffold), helping teams filter or flag designs before downstream modeling, synthesis, and expression; not optimal for very small or highly disordered proteins where membrane vs soluble segmentation is intrinsically ambiguous
Integration of membrane-region annotations into large-scale ML-driven protein engineering workflows where organizations maintain libraries of real or designed membrane proteins (such as receptors, transporters, or membrane enzymes) and use per-residue labels to drive feature engineering (for example, region-specific mutational constraints or region-aware stability predictors), improving robustness and interpretability of design–build–test–learn cycles across diverse membrane topologies
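One way to express the idea of inverting hydrophobic/hydrophilic patterns at labeled positions is through the `bias_AA_per_residue` parameter. The sketch below builds such an object for a list of residue specifications. The helper name, the two amino acid groupings, the bias magnitude, and the sign convention (positive favors, negative disfavors) are all assumptions made for illustration; this page only states that the values are floats.

```python
HYDROPHOBIC = "AFILMVW"   # illustrative grouping, not defined by the API
HYDROPHILIC = "DEKNQRST"  # illustrative grouping, not defined by the API


def solubilizing_bias(residue_specs, strength=1.0):
    """Build a bias_AA_per_residue object for the given residue specifications.

    Assumes positive values favor an amino acid and negative values disfavor it.
    """
    per_aa = {aa: -strength for aa in HYDROPHOBIC}
    per_aa.update({aa: strength for aa in HYDROPHILIC})
    # Copy the mapping so each residue gets its own dict.
    return {spec: dict(per_aa) for spec in residue_specs}


bias = solubilizing_bias(["A15", "A16"], strength=2.0)
```

The resulting object would be passed as `params["bias_AA_per_residue"]`; the effect of any given magnitude should be checked empirically on a small run.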
Limitations
Maximum sequence length: Backbones with more than 1024 residues per chain are not supported. Any PDB input where a chain exceeds this will be rejected because max_sequence_len is fixed at 1024 for this model.
Batching and throughput limits: Each request may contain at most one design item (items has max_items=1), and sequence generation is limited to batch_size <= 2 and number_of_batches <= 1. This API is therefore optimized for per‑structure design, not for very high‑throughput screening of hundreds of backbones in a single call.
Membrane label specification: Per‑residue membrane labeling must be provided as residue identifiers (for example, A15 or B120A) in the transmembrane_buried and transmembrane_interface lists. These identifiers must correspond to existing chains and residue indices in the uploaded pdb; labels outside the detected chain lengths, invalid chain IDs, or malformed residue strings will cause validation errors and no design will be produced.
Backbone‑fixed design only: The model assumes the input pdb backbone is already a suitable target structure. It does not relax the backbone, redesign topology, or assess foldability in a membrane vs. soluble context. Non‑physical backbones, misaligned transmembrane segments, or unrealistic per‑residue labels can yield low‑quality or misleading sequences even if the API call succeeds.
Membrane‑aware, not generic: This checkpoint (per_residue_label_membrane) is specialized for designs where residue‑level transmembrane vs. interface annotations matter (for example, tuning exposed vs. buried residues across a bilayer). For generic soluble design, ligand‑context design, or global membrane tagging tasks, the soluble, protein, ligand, or global_label_membrane variants are typically more appropriate and computationally simpler.
No guarantee of experimental behavior: Output sequences (sequence, log_probs, sampling_probs and the redesigned pdb) are optimized for compatibility with the provided backbone and labels, not for expression yield, aggregation resistance, binding, or activity in vitro or in vivo. For applications such as enzyme design, GPCR‑like receptor engineering, or solubilizing complex membrane folds, this model should be used within a broader design and validation pipeline (for example, downstream structure prediction, stability filters, and experimental screening).
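The limits above can be checked on the client before a request is sent. The following is a sketch only: the regular expression is inferred from the two example identifiers (A15, B120A) and assumes a single-letter chain ID, a residue number, and an optional insertion code; `parse_residue_spec` and `preflight` are hypothetical helpers, and the server's own validation may differ.

```python
import re

# Chain ID, residue number, optional insertion code, e.g. "A15" or "B120A".
# This pattern is an assumption inferred from the examples in this section.
RESIDUE_SPEC = re.compile(r"^([A-Za-z])(-?\d+)([A-Za-z]?)$")

MAX_RESIDUES_PER_CHAIN = 1024
MAX_BATCH_SIZE = 2


def parse_residue_spec(spec):
    """Split a specification such as 'B120A' into (chain, number, insertion code)."""
    match = RESIDUE_SPEC.match(spec)
    if match is None:
        raise ValueError(f"malformed residue specification: {spec!r}")
    chain, number, icode = match.groups()
    return chain, int(number), icode


def preflight(chain_lengths, labels, batch_size=1):
    """Return a list of problems found before sending a request.

    chain_lengths maps chain ID -> residue count (computed by the caller).
    """
    problems = []
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        problems.append(f"batch_size {batch_size} outside 1-{MAX_BATCH_SIZE}")
    for chain, length in chain_lengths.items():
        if length > MAX_RESIDUES_PER_CHAIN:
            problems.append(f"chain {chain} has {length} residues")
    for spec in labels:
        try:
            chain, _, _ = parse_residue_spec(spec)
        except ValueError as err:
            problems.append(str(err))
            continue
        if chain not in chain_lengths:
            problems.append(f"{spec}: unknown chain {chain}")
    return problems


issues = preflight({"A": 300, "B": 1100}, ["A15", "B120A", "C7", "AX12"], batch_size=3)
```

Here `issues` collects one entry each for the oversized batch, the over-length chain B, the unknown chain in C7, and the malformed string AX12.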
How We Use It
Per-Residue Label Membrane MPNN enables data teams and protein engineers to redesign membrane-derived folds into soluble or otherwise context-optimized variants as a standardized API step within larger ML-driven design pipelines. By integrating residue-level membrane/solvent labeling with structure-aware sequence design, it complements backbone generation (e.g., AF2- or diffusion-based designs), structure prediction, and sequence encoders, so teams can systematically tune surface hydrophobicity, burial, and interface exposure on complex topologies such as GPCR-, rhomboid-, or claudin-like folds. Typical use cases include converting integral membrane backbones into soluble surrogates for biophysics and screening, or re-optimizing per-residue environments on designed receptors to improve expression, manufacturability, or assay compatibility, with redesigned panels flowing directly into iterative wet-lab testing.
Integrates with generative backbone models, AlphaFold-like predictors, and sequence-embedding services to form automated design–predict–filter loops for membrane and membrane-derived targets.
Supports multi-objective campaigns (stability, solubility, manufacturability, epitope exposure) by enabling per-residue-aware redesigns that can be ranked and combined with downstream structure-, sequence-, and physics-based scoring before experimental screening.
References
Goverde, C. A., Pacesa, M., Dornfeld, L. J., Georgeon, S., Rosset, S., Dauparas, J., Schellhaas, C., Kozlov, S., Baker, D., Ovchinnikov, S., & Correia, B. E. (2023). Computational design of soluble analogues of integral membrane protein structures. bioRxiv. https://doi.org/10.1101/2023.05.09.540044
