AntiFold is an antibody-specific inverse folding model fine-tuned from ESM-IF1 for structure-constrained sequence design of antibody variable domains. It generates sequences predicted to preserve the input backbone structure and reports per-residue mutation tolerance scores and amino acid probabilities, with sampling temperature as a control over sequence diversity. GPU-accelerated API inference supports rapid antibody optimization, affinity maturation, and rational antibody design workflows. AntiFold achieves 60% CDRH3 sequence recovery and improves zero-shot prediction of binding affinity.
Predict
Predict per-residue or global properties for the given antibody structure.
- POST /api/v3/antifold/predict/
Predict endpoint for AntiFold.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, required) — Configuration parameters:
heavy_chain (string, optional) — Chain ID for antibody heavy chain
light_chain (string, optional) — Chain ID for antibody light chain
nanobody_chain (string, optional) — Chain ID for nanobody chain
antigen_chain (string, optional) — Chain ID for antigen chain
include (array of strings, optional) — Additional outputs to include, allowed values:
“logprobs” — Softmax log probabilities
“logits” — Raw logits before softmax
num_seq_per_target (integer, range: 1-100, default: 1) — Number of sequences to generate per input structure
sampling_temp (float, range: 0.0-4.0, default: 0.2) — Sampling temperature for sequence generation
regions (array, optional, default: [“CDR1”, “CDR2”, “CDR3”]) — Regions to sample mutations from, allowed values:
“all” — All antibody regions
“allH” — All heavy chain regions
“allL” — All light chain regions
“FWH” — Heavy chain framework regions
“FWL” — Light chain framework regions
“CDRH” — Heavy chain CDR regions
“CDRL” — Light chain CDR regions
“FW1”, “FW2”, “FW3”, “FW4” — Framework regions 1-4 (both chains)
“FWH1”, “FWH2”, “FWH3”, “FWH4” — Heavy chain framework regions 1-4
“FWL1”, “FWL2”, “FWL3”, “FWL4” — Light chain framework regions 1-4
“CDR1”, “CDR2”, “CDR3” — CDR regions 1-3 (both chains)
“CDRH1”, “CDRH2”, “CDRH3” — Heavy chain CDR regions 1-3
“CDRL1”, “CDRL2”, “CDRL3” — Light chain CDR regions 1-3
integer values — Specific residue positions (must be within chain length)
limit_expected_variation (boolean, optional, default: false) — Limit mutations to expected variation
exclude_heavy (boolean, optional, default: false) — Exclude heavy chain from sampling
exclude_light (boolean, optional, default: false) — Exclude light chain from sampling
items (array of objects, min: 1, max: 1, required) — Input antibody structures:
pdb (string, required, max length: 100000) — Antibody structure in PDB format (ATOM/HETATM records only)
Example request:
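A minimal sketch of assembling this request with Python's standard library. The host https://biolm.ai, the placeholder chain IDs H and L, and the helper name build_predict_request are assumptions for illustration; the path, headers, and params/items layout follow the fields documented above. The function only builds the request object, so nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.request

# Assumed host; the path and headers follow the endpoint documentation above.
BASE_URL = "https://biolm.ai"
PREDICT_PATH = "/api/v3/antifold/predict/"


def build_predict_request(pdb_str: str, api_key: str,
                          heavy_chain: str = "H",
                          light_chain: str = "L") -> urllib.request.Request:
    """Build, but do not send, a POST request for the predict endpoint."""
    payload = {
        "params": {
            "heavy_chain": heavy_chain,   # placeholder chain IDs (hypothetical)
            "light_chain": light_chain,
            "include": ["logprobs"],      # optional extra output
            "num_seq_per_target": 1,
            "sampling_temp": 0.2,
            "regions": ["CDR1", "CDR2", "CDR3"],
        },
        # The predict schema above allows a single item per request.
        "items": [{"pdb": pdb_str}],
    }
    return urllib.request.Request(
        BASE_URL + PREDICT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )
```

Sending it is then a matter of `with urllib.request.urlopen(req) as resp: result = json.load(resp)`.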
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
sequences (array of objects) — Generated antibody sequences:
global_score (float) — Overall log-likelihood score of the generated antibody sequence
score (float) — Log-likelihood score for the sampled region(s)
heavy (string) — Generated heavy chain amino acid sequence
light (string, optional) — Generated light chain amino acid sequence (if a light chain was provided)
temperature (float, range: 0.0–4.0) — Sampling temperature controlling sequence diversity
mutations (int) — Number of mutations introduced relative to input sequence
seq_recovery (float, range: 0.0–1.0) — Fraction of residues identical to input sequence
logprobs (array of arrays of floats, optional, shape: [sequence_length, 20]) — Log-probabilities for each amino acid at each residue position
logits (array of arrays of floats, optional, shape: [sequence_length, 20]) — Raw model logits for each amino acid at each residue position
pdb_posins (array of ints, optional, length: sequence_length) — Residue positions corresponding to input PDB numbering
pdb_chain (array of strings, optional, length: sequence_length) — Chain identifiers corresponding to input PDB structure
pdb_res (array of strings, optional, length: sequence_length) — Original amino acid residues from input PDB structure
top_res (array of strings, optional, length: sequence_length) — Highest probability amino acid predicted at each residue position
perplexity (array of floats, optional, length: sequence_length, range: ≥1.0) — Structural tolerance to mutations at each residue position (lower values indicate higher structural constraint)
vocab (array of strings, optional, length: 20) — Amino acid vocabulary used by the model (standard 20 amino acids)
Example response:
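A sketch of reading the fields listed above, assuming the parsed JSON body has exactly that nesting (results, then sequences). The hypothetical helper flatten_sequences returns one record per generated sequence and can optionally drop designs with more than max_mutations substitutions. Ranking by score or global_score is left out; confirm their sign convention on your own results before sorting on them.

```python
from typing import Any, Dict, List, Optional


def flatten_sequences(response: Dict[str, Any],
                      max_mutations: Optional[int] = None) -> List[Dict[str, Any]]:
    """Flatten results[*].sequences[*] into one record per generated sequence."""
    rows = []
    for item_index, result in enumerate(response.get("results", [])):
        for seq in result.get("sequences", []):
            # Optionally skip designs that drift too far from the input.
            if max_mutations is not None and seq.get("mutations", 0) > max_mutations:
                continue
            rows.append({
                "item": item_index,                 # index of the input structure
                "heavy": seq.get("heavy"),
                "light": seq.get("light"),          # None when no light chain was given
                "score": seq.get("score"),
                "global_score": seq.get("global_score"),
                "mutations": seq.get("mutations"),
                "seq_recovery": seq.get("seq_recovery"),
            })
    return rows
```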
Encode
Generate embeddings for the specified heavy and light chains.
- POST /api/v3/antifold/encode/
Encode endpoint for AntiFold.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, required) — Configuration parameters:
heavy_chain (string, optional) — Chain identifier for antibody heavy chain (single character)
light_chain (string, optional) — Chain identifier for antibody light chain (single character)
nanobody_chain (string, optional) — Chain identifier for nanobody (single character)
antigen_chain (string, optional) — Chain identifier for antigen (single character)
include (array of strings, optional, default: [“mean”]) — Output types to include, possible values: “mean”, “residue”, “logits”
num_seq_per_target (int, optional, range: 1-100, default: 1) — Number of sequences to generate per input structure
sampling_temp (float, optional, range: 0.0-4.0, default: 0.2) — Sampling temperature for sequence generation
regions (array, optional, default: [“CDR1”, “CDR2”, “CDR3”]) — Regions to sample sequences for; possible values: “all”, “allH”, “allL”, “FWH”, “FWL”, “CDRH”, “CDRL”, “FW1”, “FWH1”, “FWL1”, “CDR1”, “CDRH1”, “CDRL1”, “FW2”, “FWH2”, “FWL2”, “CDR2”, “CDRH2”, “CDRL2”, “FW3”, “FWH3”, “FWL3”, “CDR3”, “CDRH3”, “CDRL3”, “FW4”, “FWH4”, “FWL4”, or integer positions within chain length
limit_expected_variation (bool, optional, default: false) — Limit expected variation in generated sequences
exclude_heavy (bool, optional, default: false) — Exclude heavy chain from sequence sampling
exclude_light (bool, optional, default: false) — Exclude light chain from sequence sampling
items (array of objects, required, min length: 1, max length: 32) — Input antibody structures:
pdb (string, required, min length: 1, max length: 100000) — Antibody structure in PDB format (ATOM/HETATM records only)
Example request:
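Encode accepts up to 32 items per request, so larger structure sets need to be split. A sketch of that batching, assuming placeholder chain IDs H and L and a hypothetical helper name encode_payloads; the limits mirror the items and pdb constraints above. Each yielded body would be POSTed to /api/v3/antifold/encode/ with the same headers as the other endpoints.

```python
from typing import Iterator, List, Sequence

MAX_ITEMS_PER_REQUEST = 32   # encode schema: items max length 32
MAX_PDB_CHARS = 100_000      # encode schema: pdb max length 100000


def encode_payloads(pdb_strings: List[str], heavy_chain: str = "H",
                    light_chain: str = "L",
                    include: Sequence[str] = ("mean",)) -> Iterator[dict]:
    """Yield encode request bodies, each holding at most 32 items."""
    # Validate every structure up front so no partial batch set is produced.
    for pdb in pdb_strings:
        if not 1 <= len(pdb) <= MAX_PDB_CHARS:
            raise ValueError(f"PDB string length {len(pdb)} outside 1..{MAX_PDB_CHARS}")
    for start in range(0, len(pdb_strings), MAX_ITEMS_PER_REQUEST):
        batch = pdb_strings[start:start + MAX_ITEMS_PER_REQUEST]
        yield {
            "params": {
                "heavy_chain": heavy_chain,
                "light_chain": light_chain,
                "include": list(include),
            },
            "items": [{"pdb": pdb} for pdb in batch],
        }
```

Validation runs when the generator is first consumed, for example by `list(encode_payloads(structures))`.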
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
sequences (array of objects) — Generated antibody sequences:
global_score (float) — Overall log-likelihood score of the generated antibody sequence
score (float) — Log-likelihood score for the sampled regions
heavy (string) — Generated heavy chain amino acid sequence
light (string, optional) — Generated light chain amino acid sequence (if applicable)
temperature (float, range: 0.0–4.0) — Sampling temperature parameter used for sequence generation
mutations (int, range: ≥ 0) — Number of amino acid mutations introduced compared to the original sequence
seq_recovery (float, range: 0.0–1.0) — Fraction of residues identical to the original sequence in sampled regions
logprobs (array of arrays of floats, shape: [sequence_length, 20], optional) — Log-probabilities for each amino acid at each residue position
logits (array of arrays of floats, shape: [sequence_length, 20], optional) — Raw logits for each amino acid at each residue position
pdb_posins (array of integers, length: sequence_length, optional) — Residue numbering positions from input PDB structure
pdb_chain (array of strings, length: sequence_length, optional) — Chain identifiers corresponding to each residue position from input PDB structure
pdb_res (array of strings, length: sequence_length, optional) — Original amino acid residues from input PDB structure
top_res (array of strings, length: sequence_length, optional) — Highest-probability amino acid residue predicted at each position
perplexity (array of floats, length: sequence_length, optional) — Predicted structural tolerance to mutations at each residue position (range: ≥ 1.0, lower values indicate fewer tolerated mutations)
vocab (array of strings, length: 20, optional) — Amino acid vocabulary corresponding to logits and logprobs arrays (standard 20 amino acids)
Example response:
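The perplexity field is described as a per-residue mutation-tolerance measure with values of at least 1.0. One common definition consistent with that range is the exponential of the Shannon entropy of the predicted amino acid distribution, which gives 1 for a fully confident position and 20 for a uniform one. The sketch below computes that quantity from a logits array; treat the definition as an assumption rather than the service's exact formula.

```python
import math
from typing import List, Sequence


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one residue's 20 logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def per_residue_perplexity(logits: Sequence[Sequence[float]]) -> List[float]:
    """Perplexity = exp(entropy) of each position's predicted distribution.

    With a 20-letter vocabulary the result lies in [1, 20]: 1 when all mass
    is on one amino acid, 20 when the distribution is uniform.
    """
    out = []
    for row in logits:
        probs = softmax(row)
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        out.append(math.exp(entropy))
    return out
```

If you requested logprobs instead of logits, skip the softmax and use p = exp(logprob) directly.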
Generate
Generate new sequences for selected regions (the CDRs by default), with control over sampling temperature.
- POST /api/v3/antifold/generate/
Generate endpoint for AntiFold.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, required) — Configuration parameters:
heavy_chain (string, optional) — Chain ID for antibody heavy chain; must exist in provided PDB
light_chain (string, optional) — Chain ID for antibody light chain; must exist in provided PDB
nanobody_chain (string, optional) — Chain ID for nanobody; mutually exclusive with heavy_chain and light_chain; must exist in provided PDB
antigen_chain (string, optional) — Chain ID for antigen; must exist in provided PDB
include (array of strings, optional) — Additional output data to include; possible values:
logprobs
logits
num_seq_per_target (int, range: 1-100, default: 1) — Number of sequences to generate per input structure
sampling_temp (float, range: 0.0-4.0, default: 0.2) — Sampling temperature parameter controlling sequence diversity
regions (array, default: [“CDR1”, “CDR2”, “CDR3”]) — Regions to generate sequences for; possible values: “all”, “allH”, “allL”, “FWH”, “FWL”, “CDRH”, “CDRL”, “FW1”, “FWH1”, “FWL1”, “CDR1”, “CDRH1”, “CDRL1”, “FW2”, “FWH2”, “FWL2”, “CDR2”, “CDRH2”, “CDRL2”, “FW3”, “FWH3”, “FWL3”, “CDR3”, “CDRH3”, “CDRL3”, “FW4”, “FWH4”, “FWL4”, or a list of integer residue positions (must exist in provided PDB chains)
limit_expected_variation (boolean, optional, default: false) — Limit generated sequences to residues with low expected structural variation
exclude_heavy (boolean, optional, default: false) — Exclude heavy chain from sequence generation
exclude_light (boolean, optional, default: false) — Exclude light chain from sequence generation
items (array of objects, min: 1, max: 1, required) — Input antibody structures:
pdb (string, min length: 1, max length: 100000, required) — Antibody structure in PDB format; must contain specified chain IDs and residue positions
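sampling_temp controls sequence diversity. In temperature sampling generally, logits are divided by the temperature before the softmax, so values near 0 concentrate probability on the top residue and larger values flatten the distribution. The sketch below shows that standard technique for one residue position; it is an illustration, not AntiFold's internal sampler, and the vocabulary order is an assumption (check the vocab field of a response). A temperature of 0.0 is treated as greedy selection.

```python
import math
import random
from typing import Optional, Sequence

# Assumed vocabulary order; confirm against the `vocab` field of a response.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_residue(logits: Sequence[float], temperature: float,
                   rng: Optional[random.Random] = None) -> str:
    """Sample one amino acid from temperature-scaled logits."""
    if len(logits) != len(AMINO_ACIDS):
        raise ValueError("expected one logit per amino acid")
    if not 0.0 <= temperature <= 4.0:
        raise ValueError("sampling_temp must be in [0.0, 4.0]")
    if temperature == 0.0:
        # Zero temperature collapses to the most likely residue.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return AMINO_ACIDS[best]
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]  # unnormalised softmax
    rng = rng or random.Random()
    return rng.choices(AMINO_ACIDS, weights=weights, k=1)[0]
```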
Example request:
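A sketch of building a generate request body with client-side checks for the constraints listed above: the region names, the 1 to 100 and 0.0 to 4.0 ranges, a single item per request, and nanobody_chain being exclusive of heavy_chain and light_chain. The function name and the num_seq_per_target=10 default are illustrative choices, not service defaults; the body would be POSTed to /api/v3/antifold/generate/ with the headers shown above.

```python
from typing import Any, Dict, Optional, Sequence, Union

# Region names listed for the generate endpoint above (28 in total).
ALLOWED_REGIONS = {
    "all", "allH", "allL", "FWH", "FWL", "CDRH", "CDRL",
    *(f"{prefix}{i}" for prefix in ("FW", "FWH", "FWL") for i in range(1, 5)),
    *(f"{prefix}{i}" for prefix in ("CDR", "CDRH", "CDRL") for i in range(1, 4)),
}


def build_generate_payload(pdb_str: str,
                           regions: Sequence[Union[str, int]] = ("CDR1", "CDR2", "CDR3"),
                           num_seq_per_target: int = 10, sampling_temp: float = 0.2,
                           heavy_chain: Optional[str] = None,
                           light_chain: Optional[str] = None,
                           nanobody_chain: Optional[str] = None) -> Dict[str, Any]:
    """Build a generate request body, checking documented constraints locally."""
    if nanobody_chain and (heavy_chain or light_chain):
        raise ValueError("nanobody_chain is mutually exclusive with heavy_chain/light_chain")
    if not 1 <= num_seq_per_target <= 100:
        raise ValueError("num_seq_per_target must be in 1..100")
    if not 0.0 <= sampling_temp <= 4.0:
        raise ValueError("sampling_temp must be in 0.0..4.0")
    for region in regions:
        # Integers are residue positions; strings must be known region names.
        if not isinstance(region, int) and region not in ALLOWED_REGIONS:
            raise ValueError(f"unknown region: {region!r}")
    params: Dict[str, Any] = {
        "num_seq_per_target": num_seq_per_target,
        "sampling_temp": sampling_temp,
        "regions": list(regions),
    }
    # Only send the chain IDs that were actually given.
    for key, value in (("heavy_chain", heavy_chain), ("light_chain", light_chain),
                       ("nanobody_chain", nanobody_chain)):
        if value is not None:
            params[key] = value
    # The generate schema above accepts exactly one item per request.
    return {"params": params, "items": [{"pdb": pdb_str}]}
```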
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
sequences (array of objects) — Generated antibody sequences and associated metrics:
global_score (float) — Overall sequence log-likelihood score
score (float) — Sequence log-likelihood score for sampled region
heavy (string) — Generated heavy chain amino acid sequence
light (string, optional) — Generated light chain amino acid sequence (if applicable)
temperature (float, range: 0.0–4.0) — Sampling temperature controlling sequence diversity
mutations (int) — Number of amino acid mutations compared to input sequence
seq_recovery (float, range: 0.0–1.0) — Fraction of residues identical to input sequence
logprobs (array of arrays of floats, optional, shape: [sequence_length, 20]) — Log-probabilities (base e) for each amino acid at each residue position
logits (array of arrays of floats, optional, shape: [sequence_length, 20]) — Raw model logits for each amino acid at each residue position
pdb_posins (array of ints, optional, length: sequence_length) — Residue positions from input PDB structure
pdb_chain (array of strings, optional, length: sequence_length) — Chain identifiers from input PDB structure
pdb_res (array of strings, optional, length: sequence_length) — Original amino acid residues from input PDB structure
top_res (array of strings, optional, length: sequence_length) — Highest-probability amino acid residues predicted by the model at each position
perplexity (array of floats, optional, length: sequence_length, range: ≥1.0) — Per-residue perplexity indicating mutation tolerance without altering backbone structure
vocab (array of strings, optional, length: 20) — Amino acid vocabulary used by the model (standard 20 amino acids)
Example response:
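To see which positions a design changed, the mutations and seq_recovery fields can be recomputed locally from the input and generated sequences. The sketch assumes both sequences are aligned and of equal length, which holds for fixed-backbone inverse folding; positions are 1-based indices into the string, not PDB or IMGT numbering (pdb_posins maps between them). The helper name is hypothetical, and because the service may compute seq_recovery over the sampled regions only, this whole-chain value can differ from the returned one.

```python
from typing import List, Tuple


def compare_sequences(original: str, designed: str) -> Tuple[List[str], float]:
    """Return mutations as e.g. 'V2I' (1-based index) and the fraction of identical residues."""
    if len(original) != len(designed):
        raise ValueError("sequences must have equal length")
    if not original:
        raise ValueError("sequences must be non-empty")
    mutations = [
        f"{wt}{i}{mut}"
        for i, (wt, mut) in enumerate(zip(original, designed), start=1)
        if wt != mut
    ]
    recovery = 1.0 - len(mutations) / len(original)
    return mutations, recovery
```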
Performance
AntiFold inference runs on GPU-accelerated hardware (NVIDIA A100 GPUs), giving higher throughput and lower latency than CPU-based inference.
AntiFold shows higher amino acid recovery for antibody complementarity-determining regions (CDRs) than related inverse folding models available on BioLM, such as AbMPNN, ESM-IF1, and the general-purpose ProteinMPNN:
AntiFold achieves 60% amino acid recovery for the critical CDRH3 loop, outperforming AbMPNN (56%), ESM-IF1 (43%), and ProteinMPNN (35%).
Across the other CDR loops, AntiFold also shows higher sequence recovery, ranging from 75–84%, compared with 63–76% for AbMPNN.
Designed antibody sequences generated by AntiFold maintain high structural fidelity upon refolding, achieving an average backbone RMSD of 0.95 Å for CDR loops, versus 0.98 Å for AbMPNN, 1.01 Å for ESM-IF1, and 1.03 Å for ProteinMPNN.
AntiFold predictions correlate with experimental antibody-antigen binding affinity measurements (Spearman’s rank correlation of 0.42), outperforming other inverse folding models on BioLM, including AbMPNN (0.32), ESM-IF1 (0.33), and ProteinMPNN (0.30).
AntiFold identifies mutations detrimental to antibody-antigen binding affinity, helping users prioritize variants likely to maintain or enhance binding. In benchmark affinity maturation experiments, AntiFold ranked improved-affinity variants in the top 80%, compared with 73% for ProteinMPNN, 57% for ESM-IF1, and 55% for AbMPNN.
BioLM deploys AntiFold, which was fine-tuned from the ESM-IF1 architecture using layer-wise learning rate decay, IMGT-weighted masking, and Gaussian noise augmentation. This training makes it robust on both experimentally solved and computationally predicted antibody structures.
Applications
Antibody affinity maturation by generating structurally constrained sequence variants, enabling researchers to efficiently identify mutations that enhance antigen binding while preserving antibody structural integrity; valuable for therapeutic antibody developers aiming to improve potency without compromising stability or manufacturability; not optimal for predicting non-structural properties such as immunogenicity or aggregation propensity.
Optimization of antibody humanization workflows by scoring candidate humanized sequences for structural compatibility, allowing biopharma companies to rapidly filter out structurally disruptive humanization mutations and reduce experimental validation efforts; particularly useful for accelerating development timelines and minimizing risks associated with structural instability; less suitable for evaluating immunogenicity or humanness directly.
Antibody library design for phage or yeast display by sampling structurally diverse yet backbone-constrained CDR sequences, enabling biotech companies to generate focused libraries enriched with structurally viable variants; significantly improves screening efficiency by reducing structurally incompatible sequences; not intended for predicting expression levels or display efficiency directly.
Structural risk assessment of antibody sequence variants identified from deep mutational scanning (DMS) experiments, allowing researchers to rapidly prioritize variants predicted to retain structural integrity; valuable for guiding experimental validation and reducing the number of structurally disruptive variants carried forward; does not directly predict functional properties such as specificity or off-target binding.
Computational triage of antibody lead candidates by evaluating structural tolerance to mutations and identifying sequence liabilities early in the discovery pipeline, enabling biotech teams to prioritize structurally robust antibody candidates and reduce downstream development risks; particularly beneficial for resource-constrained teams aiming to minimize costly late-stage failures; not designed to replace comprehensive experimental developability assessments.
Limitations
Batch Size: AntiFold requests support up to 32 items per batch for prediction and encoding, but only 1 item per batch for sequence generation (generate_batch_size). Ensure your workflows account for these batch size constraints to avoid errors.
Maximum PDB Length: Input PDB structures must not exceed max_pdb_str_len (100,000) characters. Exceeding this length will result in validation errors.
AntiFold is specifically fine-tuned for antibody variable domain structures. It is not suitable for general protein structures or non-antibody proteins. Using AntiFold on non-antibody inputs may yield poor or unreliable results.
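Since only ATOM/HETATM records are accepted and the pdb field is capped at 100,000 characters, stripping other records, and optionally chains the request does not use, is a simple way to stay under the limit. A sketch assuming the standard fixed-column PDB format, where the chain identifier sits in column 22; the function name prepare_pdb is hypothetical.

```python
from typing import Optional, Set

MAX_PDB_CHARS = 100_000  # max length of the `pdb` field in the request schemas above


def prepare_pdb(pdb_text: str, keep_chains: Optional[Set[str]] = None,
                max_chars: int = MAX_PDB_CHARS) -> str:
    """Keep only ATOM/HETATM records (optionally for selected chains) and check the size limit."""
    records = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Column 22 (index 21) holds the chain ID in fixed-column PDB format.
        if keep_chains is not None and (len(line) < 22 or line[21] not in keep_chains):
            continue
        records.append(line)
    cleaned = "\n".join(records)
    if not cleaned:
        raise ValueError("no ATOM/HETATM records found")
    if len(cleaned) > max_chars:
        raise ValueError(f"PDB string is {len(cleaned)} characters; limit is {max_chars}")
    return cleaned
```

For example, `prepare_pdb(text, keep_chains={"H", "L", "A"})` keeps the heavy, light, and antigen chains while dropping waters on other chains, headers, and remarks.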
AntiFold predictions are most accurate for shorter CDR loops (e.g., CDRH3 loops between 6-9 residues). Accuracy notably decreases for longer loops (16 residues or more). Consider alternative methods or additional validation steps when designing antibodies with unusually long CDR loops.
AntiFold is optimized for maintaining antibody backbone structure while exploring sequence diversity. It is not designed for tasks requiring significant backbone conformational changes or de novo antibody structure design. For such tasks, consider methods explicitly designed for backbone flexibility or structure prediction (e.g., AlphaFold2, ESMFold).
While AntiFold can predict antibody-antigen binding affinity changes, it performs best when antigen information is included. Without antigen context, affinity predictions are less reliable. If antigen structure data is unavailable, consider complementary affinity prediction methods specialized for sequence-only inputs.
How We Use It
AntiFold enables BioLM to accelerate antibody design and optimization by generating antibody sequences that retain structural integrity and antigen-binding properties. Integrated into BioLM’s antibody optimization pipelines, AntiFold informs decisions on mutations that preserve structural stability and enhance binding affinity, significantly reducing the experimental search space and increasing the likelihood of identifying viable antibody candidates for synthesis and testing.
Integrates effectively with BioLM’s predictive models and generative AI tools to streamline antibody affinity maturation and developability optimization.
Enables targeted selection of structurally viable antibody variants, improving experimental efficiency and reducing time-to-market.
References
Høie, M. H., Hummer, A., Olsen, T. H., Aguilar-Sanjuan, B., Nielsen, M., & Deane, C. M. (2024). AntiFold: Improved antibody structure-based design using inverse folding. Bioinformatics.
