AbLang-2 is an antibody-specific language model for paired heavy/light chains that reduces germline bias to better characterize non-germline mutations. The API supports up to 32 VH–VL pairs per request (≤1024 residues per chain) for CPU-only inference. It provides sequence-level (seqcoding) and residue-level (rescoding) embeddings, masked-position restoration for “*” placeholders in variable regions, and per-residue mutation likelihoods, supporting affinity maturation analysis, sequence completion, and repertoire engineering workflows.
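The request limits above can be enforced client-side before submission. This is a minimal sketch: the JSON field names (`items`, `heavy`, `light`) are assumptions for illustration, while the limits (≤32 VH–VL pairs, ≤1024 residues per chain, "*" as the mask placeholder) come from the entry.

```python
# Build and validate a hypothetical AbLang-2 request payload.
# Field names are illustrative assumptions; the limits follow the
# catalog entry (<=32 VH-VL pairs, <=1024 residues per chain,
# "*" marking positions to restore).
VALID = set("ACDEFGHIKLMNPQRSTVWY*")

def build_ablang2_payload(pairs):
    """pairs: list of (heavy, light) amino acid strings; '*' marks masks."""
    if len(pairs) > 32:
        raise ValueError("AbLang-2 accepts at most 32 VH-VL pairs per request")
    items = []
    for heavy, light in pairs:
        for name, chain in (("heavy", heavy), ("light", light)):
            if len(chain) > 1024:
                raise ValueError(f"{name} chain exceeds 1024 residues")
            if not set(chain) <= VALID:
                raise ValueError(f"unexpected characters in {name} chain")
        items.append({"heavy": heavy, "light": light})
    return {"items": items}

payload = build_ablang2_payload([("EVQLVESGG*LVQPGG", "DIQMTQSPSSLSASV")])
print(len(payload["items"]))  # 1
```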
Check back regularly for additional model deployments.
Please reach out to request any new model deployments.
antibody generator embeddings predict
AbLang provides embeddings, sequence restoration, and likelihood computation tailored to the structural and functional characteristics of heavy- and light-chain antibodies. Its deep learning framework, trained on a large collection of heavy-chain antibody sequences, supports antibody design and optimization for therapeutic applications.
antibody generator embeddings predict generate
AbLang is an AI language model tailored for antibody design, enabling embeddings, sequence restoration, and likelihood computation for heavy/light-chain antibodies. Trained on a comprehensive database of human antibody sequences, AbLang provides insights into antibody sequence functionality and aids in the development of therapeutic antibodies.
antibody structure prediction folding
ABodyBuilder2 predicts all-atom 3D structures of paired antibody variable domains (VH/VL) directly from heavy and light chain amino acid sequences. The model returns refined PDB-format Fv structures with accurate CDR loop and VH–VL packing geometries, suitable for large-scale analysis of antibody repertoires. ABodyBuilder2 is CPU-efficient (2 vCPUs, 8 GB RAM per request) and supports batched inference (up to 8 items, 2048 residues per chain) for high-throughput structural characterization and downstream design workflows.
antibody structure prediction folding
ABodyBuilder3 Language predicts atomic-resolution 3D Fv structures from paired antibody heavy (H) and light (L) chain amino acid sequences. It implements an antibody-specific AlphaFold-Multimer–style architecture to model VH–VL orientation, frameworks, and CDR loops, including challenging CDR-H3 regions. The API accepts one H/L pair per request (up to 2048 residues per chain) and returns a refined PDB model, supporting high-throughput structural analysis of immune receptor datasets and antibody engineering workflows.
antibody structure prediction folding
ABodyBuilder3 pLDDT predicts all-atom 3D antibody Fv structures from paired heavy (H) and light (L) chain amino acid sequences and returns per-residue pLDDT-like confidence scores. The service accepts single-sequence inputs (no MSAs/templates) up to 2048 residues per chain and batch size 1, and outputs a refined PDB plus optional pLDDT score arrays for each chain. Typical uses include antibody engineering, CDR loop analysis, structure-based filtering, and prioritization for experimental validation.
protein structure folding alphafold2 msa jackhmmer mmseqs2
AlphaFold2 predicts atomic-resolution 3D protein structures from amino acid sequence using deep neural networks conditioned on MSAs and structural templates. This service provides GPU-accelerated single-chain inference for sequences up to 512 residues, with configurable database searches (UniRef90, MGnify, small BFD) and MMseqs2-based alignment. The encoder endpoint returns MSAs and template hits; the predictor endpoint returns PDB-format models, with optional Amber-style relaxation control, for protein engineering and structural biology workflows.
protein antibody structure esm generation inverse-folding embedding inverse-fold
AntiFold is an antibody-specific inverse folding model that designs sequences conditioned on variable-domain structures, including paired VH/VL antibodies or nanobodies, with optional antigen context. The API provides GPU-accelerated endpoints to score per-residue mutation tolerance (logits, log-probabilities, perplexity) and to generate up to 50,000 backbone-constrained sequences per target over user-selected IMGT/position-based regions, with temperature-controlled diversity. Typical uses include CDR design, affinity maturation guidance, and structure-preserving antibody optimization.
antibody prediction embedding
BioLM-AbLEF encodes paired antibody Fv heavy/light chain sequences using frozen AbLang embeddings to generate 1536-dimensional Fv representations and predict developability-relevant properties. The API provides two GPU-accelerated actions: an encoder that returns Fv embeddings, and a predictor that outputs hydrophobic interaction chromatography retention time (minutes) and Fab melting temperature by DSF (°C). Typical uses include early-stage mAb screening, ranking, and triage under sequence length limits of 160 residues per chain and batch sizes up to 8.
prediction
Predict protein and peptide solubility from sequence using the CamSol-like intrinsic solubility model. BioLMSol returns an overall solubility score (higher = more soluble) and, optionally, a per-residue intrinsic solubility profile for up to 10 sequences of length ≤2048 amino acids per request. The API supports configurable pH (0.0–14.0) for charge-related calculations and is suited to construct optimization, aggregation risk assessment, solubility-focused protein engineering, and high-throughput screening workflows.
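A request against the limits above can be validated locally before calling the service. The JSON shape (`params`, `items`, `sequence`) is an assumption for illustration; the constraints (pH 0.0–14.0, ≤10 sequences, ≤2048 residues) are from the entry.

```python
# Assemble and validate a hypothetical BioLMSol request.
# Key names are illustrative assumptions; limits follow the entry.
def build_sol_request(sequences, ph=7.0, per_residue=False):
    if not (0.0 <= ph <= 14.0):
        raise ValueError("pH must be within 0.0-14.0")
    if len(sequences) > 10:
        raise ValueError("at most 10 sequences per request")
    for s in sequences:
        if len(s) > 2048:
            raise ValueError("sequence exceeds 2048 residues")
    return {"params": {"ph": ph, "per_residue": per_residue},
            "items": [{"sequence": s} for s in sequences]}

print(build_sol_request(["MKTAYIAK"], ph=7.4)["params"]["ph"])  # 7.4
```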
prediction embedding biosecurity toxins classification
BioLMTox-2 is a GPU-accelerated, sequence-only protein toxin classifier fine-tuned from a 650M-parameter ESM-2 model to predict toxin vs. not-toxin labels for amino acid sequences up to 2048 residues. The API supports batched inference (up to 16 sequences) with sub-second single-sequence inference on a T4-class GPU, and provides both classification scores and sequence embeddings. Typical applications include peptide and protein therapeutic screening, biosecurity sequence filters, and large-scale protein engineering or synthetic biology workflows.
esm pdb
Compute the root-mean-square deviation (RMSD) between two protein structures provided as PDB strings using Biotite’s C-/NumPy-accelerated structure routines. The service supports batch processing of up to 8 structure pairs and explicit mapping of chain IDs for each input (single- or multi-chain), enabling targeted comparison of domains, interfaces, or subunits. This is useful for assessing structural similarity, conformational changes, and model-to-reference deviations in protein engineering and structural biology workflows.
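The underlying quantity is straightforward: the square root of the mean squared distance between corresponding atoms. The service uses Biotite for parsing, chain mapping, and superposition; the sketch below only illustrates the RMSD formula itself on coordinates assumed already superimposed.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.
    Assumes the structures are already superimposed (the actual
    service handles parsing and alignment via Biotite)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have equal length")
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return sqrt(sq / len(coords_a))

print(round(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (2, 0, 0)]), 4))  # 0.7071
```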
protein structure dna prediction generation rna embedding
Boltz-2 jointly predicts 3D protein-ligand complex structures and small-molecule binding affinity in a single GPU-accelerated workflow. The API returns CIF structures plus confidence metrics (pLDDT, PAE, PDE, ipTM) and, when an affinity binder chain is specified, an affinity ensemble comprising a binary binding probability and a log10(IC50-like) affinity value, with optional molecular-weight correction. Boltz-2 supports MSAs, optional templates, and geometric constraints (bond, pocket, contact), enabling virtual screening, analog ranking, and hit-to-lead optimization at scale.
protein structure dna prediction esm transformer ligand multimer diffusion single-sequence multimodal rna
Chai-1 predicts atomic-resolution 3D structures for proteins, antibodies, multimers, nucleic acids, and protein–ligand complexes from sequence and SMILES input using a multi-modal architecture that can incorporate optional MSA-derived alignments. The GPU-accelerated service (A100 80GB, up to 5 molecules per request and 2048 tokens per complex) returns CIF structures and optional per-residue pLDDT and PAE matrices, supporting single-sequence and MSA-backed inference for structure-based design, docking, and interaction modeling workflows.
protein prediction embedding
CLEAN predicts enzyme EC functions directly from amino acid sequence using contrastive learning over ESM-1b representations. The service provides two actions: generating 128-dimensional, function-aware embeddings (encoder) and distance-based EC predictions (predictor) with configurable limits (up to 20 EC numbers per sequence, confidence filtering via a 0–1 GMM-based score). Typical uses include annotating understudied enzymes, flagging potential misannotations, and detecting candidate promiscuous activities in large-scale genomics or metagenomic datasets.
antibody prediction
DeepViscosity classifies monoclonal antibodies at 150 mg/mL into low-viscosity (≤20 cP) or high-viscosity (>20 cP) classes from paired Fv sequences (VH, VL). The service uses 102 ensemble ANN models trained on 229 mAbs and 30 DeepSP-derived spatial surface descriptors capturing charge and hydrophobicity. The API supports batches of up to 10 antibodies (50–200 AA per chain), returns ensemble class probabilities, and can optionally include the 30 DeepSP features for developability analysis and viscosity driver interpretation.
protein prediction blast alignment
DIAMOND provides ultra-fast, sensitive protein sequence alignments at tree-of-life scale, matching BLASTP-level sensitivity while achieving orders-of-magnitude speedups via cache-aware double indexing, multiple spaced seeds, and vectorized Smith–Waterman extension. The API runs large-batch protein searches (up to 300,000 sequences, length ≤2,000 AA) against configured DIAMOND databases, returning top hits with identities, scores, E-values, and CIGAR strings for applications in functional annotation, remote homology detection, gene age inference, and phylogenomics.
dna bert prediction embedding
DNABERT-2 provides GPU-accelerated embeddings and sequence log probabilities for DNA sequences up to 2,048 bases, using a Transformer encoder with Byte Pair Encoding (BPE) trained on multi-species genomes. The API supports batched requests (up to 10 sequences) for high-throughput analysis, enabling downstream tasks such as regulatory element classification, variant effect modeling, and similarity search. BPE tokenization and efficient attention reduce computational cost while maintaining strong performance on diverse genome analysis benchmarks.
dna bert
A pre-trained DNA language model for sequence classification, variant calling, and related genomic tasks, offering improved accuracy over conventional k-mer-based methods for researchers analyzing genomic sequence data.
dna prediction analytics design feature-extraction
DNA Chisel computes a configurable set of sequence-level features for linear DNA, including GC content and its windowed variability, codon adaptation index (CAI), melting temperature, restriction site counts for user-selected enzymes, k-mer uniqueness metrics (non-unique 6-mers), codon usage statistics (entropy, rare codons, methionine frequency), composition skews and nucleotide entropy, and simple motif or structure proxies (hairpin score, TATA boxes, tandem repeats, Kozak strength). The API processes one sequence per request for analysis, QC, and design workflows.
protein generation embedding
DSM 150M Base is a 150M-parameter diffusion-trained protein language model that encodes amino acid sequences into embeddings, scores sequence log-likelihood under high corruption, and performs mask-based sequence completion. The API provides GPU-accelerated endpoints for encoding (mean/per-residue/CLS embeddings), likelihood scoring (log-probability, perplexity), and generation of up to 32 candidates per masked or unconditional input, for sequences up to length 2048. Typical uses include mutational scanning, variant exploration, and binder template optimization.
protein generation embedding
DSM 650M Base is a 650M-parameter diffusion-based protein language model that unifies representation learning and sequence generation. It encodes protein sequences into embeddings (mean, per-residue, CLS), scores sequences by log-likelihood and perplexity, and performs iterative denoising over masked tokens using configurable diffusion steps and remasking strategies. The API supports unconditional generation and masked inpainting for up to 2,048 residues, enabling biomimetic variant design and downstream annotation or screening workflows.
protein generation embedding
DSM 650M PPI encodes, scores, and designs protein-protein binders using a 650M-parameter diffusion-based protein language model fine-tuned on interaction pairs. The API supports three actions: encoding interacting chains into joint embeddings, scoring candidate partners via interaction log-probability and perplexity, and generating sequence variants for masked regions using diffusion-style remasking (configurable temperature, top-k/top-p, and step_divisor) for workflows in binder discovery and PPI engineering.
protein prediction embedding
E1 150M is a 150M-parameter retrieval-augmented protein language model for encoding sequences and scoring variants. The API supports GPU-accelerated encoding of single proteins with optional homologous context (up to 50 context sequences, 2048 residues each), returning mean and per-residue embeddings and logits from selectable layers. Masked prediction and log-probability scoring enable zero-shot fitness estimation and residue-level constraint analysis for protein engineering, mutational scanning, and structural proxy tasks.
protein prediction embedding
E1 300M is a 300M-parameter retrieval-augmented protein encoder that conditions query sequences on optional homologous context to produce structure- and fitness-relevant representations. The API supports GPU-accelerated encoding (mean and per-token embeddings, logits) and masked amino acid prediction for sequences up to 2048 residues, with up to 50 context sequences. Typical uses include zero-shot fitness scoring, variant ranking, mutation effect analysis, and embeddings for structure-aware protein engineering workflows.
protein prediction embedding
E1 600M is a 600M-parameter retrieval-augmented protein encoder that conditions on up to 50 homologous context sequences per query (each up to 2048 residues) to produce sequence embeddings and masked amino acid predictions. The API exposes GPU-accelerated endpoints for encoding (mean and per-token representations, optional logits) and masked prediction, supporting extended amino acid alphabets and batch sizes up to 8. Typical uses include zero-shot fitness scoring, variant ranking, and embeddings for structural or contact-map analyses.
prediction esm embedding
Encode protein sequences with the 33-layer ESM-1b Transformer to obtain rich sequence embeddings and masked-token predictions. The service supports GPU-accelerated batched inference for up to 8 sequences of length ≤1022, returning mean, per-residue, BOS, attention, and logit representations from configurable layers. These embeddings capture evolutionary and structural signals useful for downstream models such as mutational effect prediction, secondary structure or contact prediction, and remote homology search.
prediction esm mlm
ESM-1v scores the functional impact of protein sequence variants using a 650M-parameter transformer trained on UniRef90 for zero-shot mutation effect prediction. The API supports scoring single-site substitutions by masking exactly one residue per sequence (up to 512 amino acids, batch size up to 5) and returns per-amino-acid log-probability scores at the masked position from a selected model (n1–n5) or an ensemble over all five. Typical uses include prioritizing missense variants in deep mutational scans, protein engineering, and variant interpretation.
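Because the endpoint requires exactly one masked residue per sequence, a helper like the following can prepare inputs for a scan. The literal `<mask>` token is an assumption (it is the convention in ESM-style models); the 512-residue cap follows the entry.

```python
def mask_single_site(sequence, position, mask_token="<mask>"):
    """Replace exactly one residue (0-based position) with the mask token,
    per the one-masked-position requirement. The mask token string is an
    assumption; the length limit follows the entry (<=512 aa)."""
    if len(sequence) > 512:
        raise ValueError("sequence exceeds 512 amino acids")
    if not 0 <= position < len(sequence):
        raise IndexError("position out of range")
    return sequence[:position] + mask_token + sequence[position + 1:]

print(mask_single_site("MKTAYIAK", 3))  # MKT<mask>YIAK
```

Scanning a region then reduces to calling this once per position and batching at most five sequences per request.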
enzyme protein maturation esm
First of five models in the ESM-1v series, each trained with a different random seed. A language model trained on UniRef90 that predicts whether point variants are favorable or unfavorable, serving as a zero-shot (unsupervised) predictor of functional effects. Ensemble it with the remaining four models for best results.
enzyme protein maturation esm
Conveniently retrieve predictions from all five ESM-1v models for a given position, allowing for rapid variant ranking and deep mutational scans.
enzyme protein maturation esm
The 1st of the ESM-1v models, used to predict amino acid likelihoods at masked positions.
enzyme protein maturation esm
The 5th of the ESM-1v models, used to predict amino acid likelihoods at masked positions.
protein prediction esm embedding mlm
ESM-2 150M is a transformer protein language model that encodes amino acid sequences into rich embeddings capturing evolutionary and structural patterns without using MSAs or external databases. The API provides GPU-accelerated encoding of up to 8 protein sequences (≤2048 residues) per request, with options for mean, per-residue, and BOS embeddings, attention maps, logits, and unsupervised contact maps. These representations support downstream tasks such as structure-aware protein engineering, function prediction, and variant effect modeling.
protein prediction esm embedding mlm
ESM-2 35M is a compact transformer protein language model for masked language modeling and sequence embeddings. This API exposes fast CPU- and GPU-backed inference for sequences up to 2,048 amino acids, supporting batched requests (up to 8 sequences) for mean, per-token, and BOS embeddings, attention-based contact maps, logits, and masked-position amino acid predictions. It is well suited for large-scale property prediction, mutational effect modeling, and as a general-purpose encoder in protein engineering workflows.
protein prediction esm embedding mlm
ESM-2 3B is a transformer protein language model trained on UniRef with a masked amino acid objective, producing sequence representations that encode evolutionary and structural constraints. This API exposes two actions: an encoder that returns selected-layer embeddings (per-sequence mean, per-token, BOS), attention weights, inter-residue contact maps, and per-token logits; and a predictor that recovers amino acids at masked positions.
enzyme protein embeddings esm logits
Use this 2nd-largest ESM-2 model for highly informative embeddings, contact maps, and more.
enzyme protein antibody prediction esm transformer embedding mlm
ESM-2 650M is a transformer protein language model for masked amino acid prediction and sequence embeddings, exposing encoder and predictor endpoints. The encoder supports batches of up to 8 sequences of length ≤2048, returning mean or per-token embeddings, BOS embeddings, self-attention maps, contact maps, and logits from configurable representation layers. The predictor fills one or more masked positions with predicted amino acids.
protein prediction esm embedding mlm
ESM-2 8M is a compact transformer protein language model that generates sequence embeddings and predicts masked amino acids from single protein sequences. The API supports batch processing of up to 8 sequences with lengths up to 2048 residues, returning mean, per-token, BOS embeddings, attention maps, contact scores, and logits as requested. Smaller variants run on CPU, while larger ones use GPU for faster inference. Typical uses include feature extraction for downstream models, contact map estimation, and zero-shot mutation effect scoring.
prediction esm
Predict protein melting temperatures (Tm) and thermophilic vs non-thermophilic class directly from amino acid sequence using an ESM2-based regression model with random forest on layer-33 embeddings. The API supports batch prediction (up to 8 sequences, max length 1,022) and optionally incorporates optimal growth temperature and experimental condition (cell or lysate) to refine Tm estimates. Typical uses include screening enzyme variants, assessing stability in proteome-wide studies, and prioritizing candidates for experimental validation.
protein prediction esm language-model embedding mlm
Generate protein sequence embeddings and masked-token predictions using ESM C 300M, a 300M-parameter transformer trained with masked language modeling on large-scale protein datasets. The API provides GPU-accelerated encoding for up to 8 sequences of length ≤2048, returning mean or per-token representations and optional logits. Use these embeddings and scores as inputs to downstream models for structure, function, fitness, and design tasks, or for large-scale similarity search and clustering.
protein prediction esm embedding mlm
ESM C 600M is a 600M-parameter transformer protein language model for unsupervised representation learning on evolutionary-scale sequence data. The API provides GPU-accelerated encoding of proteins up to 2048 residues (batch size up to 8), returning mean and/or per-token embeddings from configurable layers and optional logits, as well as masked-residue prediction via a dedicated predictor endpoint. Typical uses include feature extraction for downstream models, mutation scoring, protein engineering, and large-scale functional annotation.
protein structure prediction esm folding alphafold2
ESMFold predicts atomic-resolution 3D structures of single proteins or small complexes (up to 4 chains, provided as colon-separated sequences) directly from amino acid sequence without MSAs or templates. The service runs the ESM-2-based folding network on GPU with batch prediction (up to 2 sequences, length ≤ 768 residues each, including chain separators) and returns PDB-formatted coordinates plus mean pLDDT and pTM confidence scores, supporting protein engineering, metagenomic annotation, and structural analysis workflows.
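The colon-separated multimer convention and the length limits above can be checked before submission. A minimal sketch, assuming only what the entry states (≤4 chains, ≤768 residues per sequence including separators):

```python
def esmfold_multimer_input(chains):
    """Join chains into the colon-separated multimer format and check
    the documented limits: at most 4 chains, total length <= 768
    including the ':' separators."""
    if not 1 <= len(chains) <= 4:
        raise ValueError("ESMFold accepts 1 to 4 chains per sequence")
    seq = ":".join(chains)
    if len(seq) > 768:
        raise ValueError("length including chain separators exceeds 768")
    return seq

print(esmfold_multimer_input(["MKTAYIAK", "GSHMLED"]))  # MKTAYIAK:GSHMLED
```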
protein structure prediction esm
For multi-chain proteins like antibodies, predict the folded structure in seconds with v2 of ESMFold. Similar speed and accuracy to the single-chain folding endpoint, now available for more complex multi-chain inputs. Returns PDBs via REST in under a minute, using one of the largest protein language models to date.
structure esm generation inverse-fold
Generate protein sequences from backbone coordinates using the ESM-IF1 inverse folding model. Given a PDB structure, the service samples up to 3 sequences per request for a specified chain, with configurable temperature and optional multi-chain conditioning for complexes. Outputs include designed amino acid sequences and per-sample native sequence recovery, enabling fixed-backbone and interface-focused protein design, as well as rapid in silico exploration of sequence variants compatible with a target 3D conformation.
protein dna prediction generation multimodal rna
Evo 1.5 8k Base is a 7B-parameter autoregressive genomic language model trained on ~300B nucleotides from prokaryotic genomes with 8,192-token context at single-nucleotide resolution. The API provides GPU-accelerated scoring (log-probabilities) and conditional generation of unambiguous DNA sequences up to 4,096 bases, with batch sizes up to 2 and configurable sampling (temperature, top-k/top-p). Typical uses include genome-scale mutational scoring, regulatory element or gene design, and generation of multi-gene constructs and CRISPR-associated loci.
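A generation request can be sanity-checked against the documented constraints (unambiguous A/C/G/T input, ≤4,096 bases) before submission. The field names and the assumption that prompt plus generated length share the 4,096-base budget are illustrative, not confirmed by the entry:

```python
def build_evo_request(prompt, n_tokens=256, temperature=0.8, top_k=4):
    """Validate a hypothetical Evo generation request. Only A/C/G/T
    are allowed; the 4,096-base cap is applied to prompt + generation
    here as an assumption. Key names are illustrative."""
    bad = set(prompt) - set("ACGT")
    if bad:
        raise ValueError(f"ambiguous bases not allowed: {sorted(bad)}")
    if len(prompt) + n_tokens > 4096:
        raise ValueError("prompt plus generated length exceeds 4096 bases")
    return {"sequence": prompt,
            "params": {"n_tokens": n_tokens, "temperature": temperature,
                       "top_k": top_k}}

print(build_evo_request("ACGTACGT")["params"]["temperature"])  # 0.8
```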
protein dna prediction generation multimodal rna embedding
Evo 2 1B Base is a genomic language model for DNA sequence analysis and generation, trained on the OpenGenome2 corpus with single-nucleotide resolution. This API exposes the 1B-parameter, 8k-context variant for GPU-accelerated encoding, likelihood scoring, and short-range generation of unambiguous DNA (A/C/G/T) up to 4,096 bp per request. Typical uses include computing log-probabilities for variant effect scoring, deriving sequence embeddings for downstream models, and prompt-conditioned local sequence design in genomics and synthetic biology workflows.
dna rna language-model
1.5B-parameter Evo multimodal language model (DNA/RNA) with 8k context; supports log-probability scoring and sequence generation.
protein structure generation design mpnn
Design protein sequences on fixed backbones using a ProteinMPNN variant globally conditioned to be membrane or soluble. Global Label Membrane MPNN takes PDB backbones (up to 1,024 residues) and generates batches of sequences per structure with a user-specified global_transmembrane_label, preserving fold while matching the desired environment. This is useful for designing membrane-mimetic scaffolds, solubilized analogues of membrane folds, and for comparing membrane vs soluble sequence solutions for the same topology.
protein structure generation mpnn
HyperMPNN designs thermostable protein variants from single-chain 3D structures by retraining ProteinMPNN on predicted hyperthermophilic proteomes. The API generates sequence designs biased toward hyperthermophilic-like amino acid compositions (enriched hydrophobic core and charged surfaces) while supporting fixed and redesigned residues, homo-oligomer and symmetry constraints, and optional transmembrane region annotations. It returns full-length sequences with per-residue probabilities and confidence scores for structure-guided stability engineering.
antibody embeddings bert prediction generation language-model embedding
IgBert is an antibody-specific BERT model trained on over two billion unpaired and two million paired heavy–light variable region sequences from Observed Antibody Space. This API provides GPU-accelerated encoding of paired or unpaired antibody sequences into mean and residue-level embeddings, log-probability scoring of full sequences, and constrained sequence generation using “*” placeholders. Typical uses include repertoire analysis, developability filtering, affinity/expression proxy modeling, and in silico antibody design workflows.
antibody embeddings t5 embedding
IgT5 Paired encodes antibody variable-region heavy–light chain pairs into contextual embeddings trained on millions of paired sequences from Observed Antibody Space. The encoder endpoint supports GPU-accelerated batches of up to 8 items, each with heavy and light chains up to 256 residues, or a single unpaired sequence up to 512 residues. It returns mean and/or per-residue embeddings per input, enabling downstream models for binding affinity or expression prediction and antibody engineering tasks that require cross-chain representations.
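The paired-versus-unpaired input rules above lend themselves to a small client-side check. A sketch, with the item key names (`heavy`, `light`, `sequence`) assumed for illustration and the limits (batch ≤8, paired chains ≤256 residues each, unpaired ≤512) taken from the entry:

```python
def build_igt5_batch(items):
    """items: dicts that are either paired ({'heavy','light'}) or
    unpaired ({'sequence'}). Enforces the documented limits; key
    names are illustrative assumptions."""
    if len(items) > 8:
        raise ValueError("batch size is limited to 8 items")
    for it in items:
        if "heavy" in it and "light" in it:
            if len(it["heavy"]) > 256 or len(it["light"]) > 256:
                raise ValueError("paired chains are limited to 256 residues each")
        elif "sequence" in it:
            if len(it["sequence"]) > 512:
                raise ValueError("unpaired sequences are limited to 512 residues")
        else:
            raise ValueError("item must be paired (heavy+light) or unpaired (sequence)")
    return {"items": items}

print(len(build_igt5_batch([{"heavy": "EVQ", "light": "DIQ"}])["items"]))  # 1
```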
antibody embeddings t5 embedding
Generate antibody-specific embeddings for unpaired variable region sequences using IgT5, a T5-based transformer model pre-trained on over two billion OAS antibody sequences and fine-tuned for encoding tasks. The encoder endpoint supports single-chain inputs up to 512 residues with batch processing up to 8 items, returning mean and optional per-residue embeddings. These representations are suitable for downstream tasks such as repertoire analysis, property prediction, and therapeutic antibody design workflows.
antibody structure prediction folding
ImmuneFold predicts atomic-level 3D structures of antibodies and nanobodies from amino acid sequences using a LoRA-fine-tuned ESMFold model specialized for immune proteins. The API supports heavy/light chain or single-domain inputs (up to 256 residues per chain) and optional antigen context via a PDB snippet, returning PDB coordinates with pLDDT and pTM confidence scores. ImmuneFold focuses on accurate modeling of CDRs, particularly CDR H3, for structure-guided antibody engineering and immunotherapy design workflows.
structure prediction folding tcr t-cell
ImmuneFold TCR predicts atomic-resolution 3D structures of T-cell receptors using a LoRA-fine-tuned Evoformer trunk on top of ESMFold. The API supports sequence-based TCR complex modeling from paired α/β chains plus peptide–MHC (A, B, P, M), with sequence lengths up to 256 and batched inference up to 32 items. Outputs include PDB coordinates and confidence metrics (per-residue pLDDT, global pLDDT, pTM), enabling structure-guided analysis and downstream, structure-based TCR–epitope binding assessment.
protein structure generation design mpnn
LigandMPNN designs protein sequences and sidechain conformations in explicit atomic context of bound small molecules, nucleotides or metals, using fixed protein backbones and ligand coordinates from input PDBs. The API generates up to 2×1 sequences per request with optional sidechain repacking (up to 16 samples, 3 denoising steps), and reports overall and ligand-site confidence, per-residue log-probabilities and sequence recovery over redesigned residues. Typical uses include enzyme active-site design, small-molecule binders, sensors and DNA/RNA-binding interfaces.
protein design mpnn
ProteinMPNN generative design model. Supports sequence generation from input PDBs.
prediction embedding
MSA Transformer encodes protein multiple sequence alignments to produce structure-aware sequence embeddings, tied row attention maps, and contact probability matrices. The API accepts aligned protein MSAs (up to 1024 positions × 256 sequences per item, batch size 4) and returns mean or per-position embeddings from selected layers, plus optional row attentions and contact maps. This enables analysis of coevolutionary constraints and extraction of features for downstream structure, fitness, and function modeling.
antibody embeddings bert prediction generation language-model embedding nanobody
nanoBERT is a nanobody-specific BERT model trained on 10 million INDI VHH sequences for masked-residue prediction and sequence representation. The API provides CPU-only, batched inference (up to 32 sequences, length ≤154 AAs) for encoding (mean and per-residue embeddings, logits), sequence infilling using "*" masks, and log-probability scoring. Typical uses include mapping nanobody mutational feasibility, ranking variants by model nativeness, and supplying embeddings for downstream stability or developability models.
antibody structure prediction folding nanobody
NanoBodyBuilder2 predicts all-atom 3D structures for single-chain nanobody (VHH) heavy-chain sequences, with a focus on accurate CDR loop conformations, including CDR3. The API accepts nanobody heavy-chain sequences (``H``) up to 2048 residues and returns refined PDB models produced via a deep learning backbone followed by physics-based energy minimisation. NanoBodyBuilder2 runs on CPU-only nodes (2 vCPUs, 8 GB RAM) and is suitable for high-throughput nanobody repertoire analysis, developability assessment, and structure-guided design workflows.
dna embeddings prediction multimodal language-model embedding
Omni-DNA 1B is a transformer-based genomic language model that encodes unambiguous DNA sequences into fixed-length embeddings and estimates sequence log probabilities via next-token prediction. The API supports batches of up to 2 sequences, each up to 2048 bp, with GPU-accelerated inference on 1B parameters. Omni-DNA 1B is pretrained on 300B nucleotides and achieves strong performance on NT and Genomic Benchmark tasks, making it suitable for regulatory element modeling, variant prioritization pipelines, and downstream classifier or generative model conditioning.
protein antibody prediction feature-extraction peptide
Compute a rich set of physicochemical and structural descriptors for peptide and protein sequences, including charge, hydrophobicity, hydrophobic moment, aliphatic and instability indices, Boman index, isoelectric point, molecular weight, m/z, mass shift, and per-residue descriptor vectors. The encoder endpoint supports batches of up to 10 sequences (length ≤ 2048 AAs) and can optionally return vector profiles, enabling feature engineering, QSAR modeling, AMP classification, and design/optimization of peptides and proteins.
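One of the hydrophobicity descriptors above can be illustrated directly: the GRAVY score is the mean Kyte-Doolittle hydropathy over the sequence. The scale values below are the published Kyte-Doolittle constants; whether the service uses exactly this scale is an assumption.

```python
# Kyte-Doolittle hydropathy scale (published constants).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.
    Positive values indicate a more hydrophobic sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)

print(round(gravy("AILV"), 3))  # 3.575
```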
protein structure generation design mpnn
Design protein sequences for membrane protein backbones using a ProteinMPNN variant trained with per-residue membrane annotations. For each input PDB, the model samples sequences conditioned on position-wise labels for buried transmembrane and interface residues, enabling fine-grained control over solubility and membrane preference at specific sites. Use residue-level labels, bias terms, and fixed/redesigned residue sets to guide mutations for tuning topology, surface polarity, and environment compatibility.
structure prediction
Pro4S Classification predicts protein solubility from amino acid sequence and corresponding 3D structure (PDB or mmCIF) for a specified chain. The model fuses sequence embeddings, structural graphs, and surface descriptors to output a solubility probability in [0, 1] and an optional soluble/insoluble label. This service is suited for screening recombinant constructs, assessing developability, and prioritizing *de novo* protein designs for higher expression success.
structure prediction
Pro4S Regression predicts quantitative protein solubility (0–1 score) from amino acid sequence plus 3D structure provided as PDB or CIF, using fused sequence, structural, and surface features. The API supports batches of up to 4 sequences (≤2048 residues) with single-chain structures and returns a continuous solubility_score suitable for ranking and filtering. Typical uses include prioritizing highly expressible, soluble designs and excluding unstable or aggregation-prone candidates in protein engineering workflows.
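The continuous score lends itself to ranking and filtering. A minimal sketch, assuming the response can be flattened into per-item dicts with an ``id`` and the documented ``solubility_score`` field (the ``id`` key and list shape are our assumptions):

```python
def rank_by_solubility(results, min_score=0.5):
    """Rank designs by the continuous solubility_score returned per item
    and drop candidates below a chosen cutoff. The response shape
    (a list of dicts with 'id' and 'solubility_score') is an assumption.
    """
    kept = [r for r in results if r["solubility_score"] >= min_score]
    return sorted(kept, key=lambda r: r["solubility_score"], reverse=True)

mock = [
    {"id": "design_A", "solubility_score": 0.82},
    {"id": "design_B", "solubility_score": 0.31},
    {"id": "design_C", "solubility_score": 0.67},
]
ranked = rank_by_solubility(mock)
# design_A ranks first; design_B falls below the cutoff
```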
generation
Generate up to 3 plausible protein sequence continuations from an enzyme-like N-terminal prompt using the ProGen2 BFD90 variant, trained on Uniref90 plus a metagenomic-rich BFD90 dataset. Sampling is GPU-accelerated on T4-class hardware, supporting context and generated lengths up to 512 amino acids per request. This service is suited for exploring sequence variants in silico for protein engineering, library design, and hypothesis generation around mutational landscapes.
protein generator gpt
The BFD-90 pretrained model from the ProGen2 suite of generative models. Tune outputs with your choice of pretrained model, temperature, length, and more.
generation
Generate up to three plausible protein sequences by autoregressively extending a user-provided amino acid context using a 2.7B-parameter ProGen2 model (LARGE or BFD90 variant). The GPU-accelerated service supports sequences up to 512 residues, configurable temperature and nucleus (top-p) sampling, and returns per-sequence log-likelihood scores (sum and mean). Typical uses include exploring local sequence variants for downstream structure prediction, fitness modeling, or protein engineering design cycles.
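The per-sequence scores are simple aggregates of per-token log-probabilities. A minimal sketch of that arithmetic (the input list of token log-probs is illustrative; the service computes these server-side):

```python
def sequence_loglik(token_logprobs):
    """Aggregate per-token log-probabilities into the sum and mean
    log-likelihood scores reported per generated sequence.
    """
    total = sum(token_logprobs)
    return {"sum": total, "mean": total / len(token_logprobs)}

scores = sequence_loglik([-0.2, -1.0, -0.3, -0.5])
# sum ~ -2.0, mean ~ -0.5
```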
protein generator gpt
ProGen2 Large contains 2.7B parameters and is the second-largest model in the suite of generative models. Tune outputs with your choice of pretrained model, temperature, length, and more.
generation
Generate up to 3 plausible protein sequence continuations from an N-terminal context using a 764M-parameter ProGen2 autoregressive transformer trained on ∼1B diverse natural proteins (UniRef90+BFD30). The GPU-accelerated service supports contexts and generated lengths up to 512 residues, with configurable temperature and nucleus (top-p) sampling. Typical uses include exploring sequence variants for protein engineering, library design, and hypothesis testing while scoring variants by log-likelihood.
protein generator gpt
For faster protein generation, use this 764M parameter model from the ProGen2 suite of generative models. Tune outputs with your choice of pretrained model, temperature, length, and more.
generation
Generate up to 3 antibody VH sequences autoregressively using the OAS-trained ProGen2 variant, conditioned on an OAS-style germline framework provided as context. Sampling controls (temperature, top-p) and a configurable maximum sequence length of 12–512 amino acids allow exploration of VH sequence diversity within realistic immune-repertoire constraints. Typical uses include constructing in silico VH libraries for downstream screening, developability profiling, and integration into antibody engineering workflows.
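A minimal sketch of assembling and sanity-checking the sampling controls before a call. The 12-512 length bound comes from the description above; the parameter names (``context``, ``max_length``, ``temperature``, ``top_p``) are illustrative assumptions:

```python
def vh_generation_params(context, max_length=256, temperature=1.0, top_p=0.95):
    """Assemble sampling parameters for an OAS-conditioned VH generation
    call. Field names are illustrative; only the 12-512 length bound
    comes from the service description.
    """
    if not 12 <= max_length <= 512:
        raise ValueError("max_length must be between 12 and 512")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    return {"context": context, "max_length": max_length,
            "temperature": temperature, "top_p": top_p}
```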
antibody generator gpt
ProGen2 OAS was trained on the Observed Antibody Space, making it well suited to generative antibody applications. Tune outputs with your choice of pretrained model, temperature, length, and more.
protein antibody structure prediction
PROPERMAB extracts 34 sequence- and structure-based developability features for monoclonal antibody Fv domains using ABodyBuilder2-derived 3D structures. Given paired VH/VL sequences (IgG1, IgG2, or IgG4; kappa or lambda) and 1–5 structure prediction runs, the service computes charge, hydrophobicity, aromatic, spatial clustering, and domain asymmetry metrics. These features support *in silico* modeling of hydrophobicity, HIC retention time, high-concentration viscosity risk, aggregation, and other antibody developability attributes.
enzyme protein antibody structure embeddings folding proteins generation seq2seq encoder-decoder inverse-folding t5 embedding
ProstT5 AA2Fold predicts 3Di fold sequences directly from amino acid input using a bilingual protein language model fine-tuned on AlphaFoldDB-derived structures. The API exposes GPU-accelerated encoding and generation for AA→3Di, supporting batches of up to 16 sequences (encode) or 2 sequences (generate) with lengths up to 1000 and 512 residues, respectively. ProstT5 enables fast structure-aware embeddings and fold tokens for remote homology search, large-scale proteome annotation, and downstream structure-related prediction models without explicit 3D coordinate inference.
enzyme protein antibody structure embeddings folding proteins generation seq2seq encoder-decoder inverse-folding t5 embedding inverse-fold
ProstT5 Fold2AA generates amino acid sequences from protein 3Di structural token strings and can also embed these fold representations via a GPU-accelerated encoder. The API supports batches of up to 2 input 3Di sequences (length ≤512 for generation, ≤1000 for encoding) using lowercase 3Di tokens (a–y). Typical uses include inverse folding, structure-guided sequence diversification, and deriving structure-aware embeddings for remote homology search and downstream ML models without explicit 3D coordinate prediction.
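The input constraints above are easy to enforce client-side. A minimal sketch covering the documented limits (lowercase a-y tokens, length ≤512 for generation and ≤1000 for encoding, at most 2 sequences per request); the helper and its ``action`` argument are our own naming:

```python
import string

THREE_DI = set(string.ascii_lowercase[:25])  # 'a'..'y', per the limits above

def validate_3di(seqs, action="generate"):
    """Check 3Di inputs against the documented limits: lowercase a-y
    tokens, length <=512 for generation and <=1000 for encoding,
    batches of up to 2 sequences.
    """
    max_len = 512 if action == "generate" else 1000
    if len(seqs) > 2:
        raise ValueError("at most 2 sequences per request")
    for s in seqs:
        if len(s) > max_len:
            raise ValueError(f"length {len(s)} exceeds {max_len} for {action}")
        if set(s) - THREE_DI:
            raise ValueError("3Di strings must use lowercase tokens a-y")
    return True
```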
enzyme protein prediction EC
ProteInfer EC Prediction API leverages AI to accurately predict enzyme commission (EC) numbers from protein sequences, enhancing enzymatic research and biotechnological applications.
protein GO predictor function
Harness the power of deep learning to predict Gene Ontology (GO) terms for proteins directly from sequence data. ProteInfer leverages a comprehensive dataset and advanced models to provide accurate functional annotations, offering insights into biological processes, cellular components, and molecular functions.
protein structure generation design mpnn
Design protein sequences for fixed 3D backbones using ProteinMPNN, a message-passing neural network for structure-conditioned inverse folding. The API accepts single- or multi-chain PDB inputs (up to 1024 residues total) and generates batched sequence samples with controllable temperature, residue-level constraints (fixed/redesigned/omitted/biased), chain selection, and symmetry tying across positions or chains. Variants support soluble and membrane labeling and ligand-context design, enabling monomers, oligomers, interfaces, and assemblies.
antibody prediction classification feature-extraction nanobody renumbering
SADIE annotates antibody amino acid sequences to assign V/J genes, CDR and framework boundaries, and standardized numbering using HMMER-based alignment and configurable schemes (Chothia, IMGT, Kabat) and region definitions. The API returns AIRR-like fields including species, chain type, gene calls, per-gene identity and scores, residue-level numbering, and region sequences (FWR/CDR with and without gaps), supporting heavy and light chains. This service enables downstream repertoire analysis, CDR engineering, and immunoinformatics data curation.
protein structure generation design mpnn
Design soluble protein sequences for complex and membrane-derived folds from fixed backbones using a ProteinMPNN variant retrained on soluble structures. The service samples sequences that match the input backbone while reducing surface hydrophobic residues, and supports per-residue constraints, amino acid sampling biases, and global or per-residue transmembrane labeling controls. Typical use cases include designing soluble analogs of GPCR-, claudin-, and rhomboid-like folds for structure-function studies and screening-compatible scaffolds.
protein prediction
SoluProt predicts the probability of soluble overexpression of recombinant proteins in *E. coli* directly from amino acid sequence, returning both a calibrated probability (0–1) and a binary solubility call (threshold ≥ 0.5). The model is based on a gradient boosting classifier trained on a curated TargetTrack-derived dataset and evaluated on an independent NESG-based test set (accuracy 58.5%, AUC 0.62). The API supports batched inference on 1–100 sequences of length 20–5000 aa for prioritizing soluble candidates in cloning, expression screening, and enzyme discovery pipelines.
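A minimal sketch of the documented input limits and the binary call. The 0.5 threshold and 20-5000 aa / 1-100 sequence bounds come from the description above; the helper names are ours:

```python
def soluprot_batch(seqs):
    """Client-side checks mirroring the documented limits:
    1-100 sequences per request, each 20-5000 residues long.
    """
    if not 1 <= len(seqs) <= 100:
        raise ValueError("batch must contain 1-100 sequences")
    for s in seqs:
        if not 20 <= len(s) <= 5000:
            raise ValueError("sequence length must be 20-5000 aa")
    return seqs

def solubility_call(prob, threshold=0.5):
    """Reproduce the documented binary call: 'soluble' when the
    returned probability meets the 0.5 threshold."""
    return "soluble" if prob >= threshold else "insoluble"
```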
protein structure prediction ddg
SPURS predicts changes in protein thermostability (ΔΔG, kcal/mol) for amino acid substitutions by integrating ESM sequence embeddings with ProteinMPNN structure embeddings via a rewired adapter architecture. The API accepts a protein sequence plus PDB or mmCIF content and a single-letter chain ID, supports batched inference for up to 4 proteins per call with sequences up to 1,024 residues, and returns either ΔΔG for user-specified point mutations or an L×20 single-mutation ΔΔG matrix for saturation mutagenesis and stability-focused protein engineering.
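Reading a specific mutation out of the L×20 saturation matrix requires a fixed amino-acid column order. A minimal sketch, assuming an alphabetical single-letter ordering (the column order is our assumption and should be checked against the API response; mutation strings like ``A42G`` use 1-based positions):

```python
AA_ORDER = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the L x 20 matrix

def lookup_ddg(matrix, sequence, mutation):
    """Read a single-mutation ddG from an L x 20 saturation matrix for a
    mutation string like 'A42G' (wild-type, 1-based position, mutant).
    The AA_ORDER column ordering above is an assumption.
    """
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    if sequence[pos - 1] != wt:
        raise ValueError(f"wild-type mismatch at position {pos}")
    return matrix[pos - 1][AA_ORDER.index(mut)]
```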
structure prediction folding tcr t-cell
TCRBuilder2 predicts 3D T-cell receptor variable-domain structures from paired alpha (A) and beta (B) chain amino acid sequences, returning refined all-atom PDB models. The model is TCR-specific, trained on curated structural datasets, and runs CPU-only with support for batches of up to 8 TCRs and chain lengths up to 2048 residues. The service supports high-throughput TCR structure generation for repertoire analysis, TCR–pMHC interaction studies, and structure-guided TCR engineering.
structure prediction folding tcr t-cell
TCRBuilder2+ predicts atomic-level 3D structures of paired T-cell receptors from alpha (A) and beta (B) chain amino acid sequences, using updated ImmuneBuilder weights for improved backbone accuracy across CDR and framework regions. The API accepts batches of up to 8 TCRs (sequence length ≤2048 residues per chain) and returns refined PDB structures suitable for downstream analysis. Typical uses include structural characterization of TCR repertoires, TCR–pMHC interaction studies, and immunotherapy design workflows.
prediction transformer embedding
TemBERTure is an adapter-finetuned protBERT-BFD model for protein thermostability analysis that classifies sequences as thermophilic or non-thermophilic and outputs a calibrated probability score. The API exposes a GPU-accelerated classifier (up to 8 sequences per request, max length 512) and an encoder that returns sequence-level, per-residue, or CLS embeddings for downstream modeling. Typical uses include screening enzymes or proteomes, guiding thermostable variant selection, and building custom thermostability predictors.
prediction transformer embedding
Predict protein melting temperature (Tm, °C) and thermal class from amino acid sequence using a protBERT-BFD–based regression and classification framework. The predictor endpoint returns a continuous Tm estimate plus an optional thermophilic/non-thermophilic label, while the encoder endpoint provides sequence, per-residue, or CLS embeddings (max 512 residues, batch size up to 8). Typical uses include thermostability triage of enzyme variants, proteome-scale Tm profiling, and feature extraction for downstream screening models.
protein prediction
TEMPRO 3B predicts nanobody (sdAb/VHH) melting temperature (Tm, °C) directly from amino acid sequence using protein language model embeddings. The API accepts single-chain protein sequences of 100–160 residues (typical nanobody length) and returns per-sequence Tm estimates suitable for ranking variants, screening libraries, and triaging designs by thermostability. The service is optimized for batched inference (up to 8 sequences per request) and supports high-throughput workflows in therapeutic, diagnostic, and research settings.
protein prediction
TEMPRO 650M predicts nanobody (VHH/sdAb) melting temperature (Tm) directly from amino acid sequence using ESM-2 650M protein embeddings and a deep neural network trained on 567 nanobodies. The API accepts single-chain nanobody-like sequences of 100–160 residues (batch size up to 8) and returns per-sequence Tm estimates in °C. Typical uses include screening, ranking, and prioritizing nanobody variants for thermostability in therapeutic, diagnostic, and biotechnological workflows.
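For high-throughput use, candidates outside the accepted length range should be filtered and the rest split into request-sized batches. A minimal sketch using the documented 100-160 residue range and batch size of 8 (the helper name is ours):

```python
def batch_for_tempro(seqs, batch_size=8, lo=100, hi=160):
    """Filter sequences to the accepted 100-160 residue range and split
    the remainder into request-sized batches (up to 8 sequences each)."""
    ok = [s for s in seqs if lo <= len(s) <= hi]
    return [ok[i:i + batch_size] for i in range(0, len(ok), batch_size)]
```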
protein structure prediction ddg
ThermoMPNN predicts ΔΔG for single amino acid substitutions in structured proteins using a graph neural network with transfer learning from ProteinMPNN. Given a PDB structure (up to 1,024 residues) and an optional chain ID, the API scores user-specified point mutations or performs full site-saturation mutagenesis when no mutation list is provided. This service supports high-throughput, GPU-accelerated stability profiling for protein engineering, variant prioritization, and mechanistic analysis of local stability effects.
protein structure prediction ddg
ThermoMPNN-D predicts protein stability changes (ΔΔG, kcal/mol) for single and double point mutations using a structure-based neural network derived from ProteinMPNN. The API exposes three modes: single-mutation, additive double-mutation (sum of two single ΔΔG values), and epistatic double-mutation prediction, with optional site-saturation mutagenesis scans when mutations are omitted. A distance filter (Å) restricts double mutations to nearby residues, and a configurable ΔΔG threshold (default ≤ -0.5 kcal/mol) prioritizes stabilizing variants for protein engineering.
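The additive double-mutation mode described above can be sketched client-side: a pair's ΔΔG is the sum of the two single-mutation values, the distance filter restricts pairs to nearby residues, and the threshold keeps stabilizing variants. The per-residue dict inputs below are illustrative, not the API's wire format:

```python
import math

def additive_double_ddg(single_ddg, pairs, coords, max_dist=5.0, cutoff=-0.5):
    """Additive double-mutation sketch: pair ddG = sum of the two single
    ddG values; pairs farther apart than max_dist angstroms are skipped;
    variants at or below the stabilizing threshold (default -0.5
    kcal/mol) are kept.
    """
    kept = []
    for i, j in pairs:
        if math.dist(coords[i], coords[j]) > max_dist:
            continue  # distance filter: only nearby residue pairs
        ddg = single_ddg[i] + single_ddg[j]
        if ddg <= cutoff:
            kept.append(((i, j), round(ddg, 3)))
    return kept

coords = {0: (0.0, 0.0, 0.0), 1: (3.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0)}
single = {0: -0.4, 1: -0.3, 2: -1.0}
stabilizing = additive_double_ddg(single, [(0, 1), (0, 2)], coords)
# pair (0, 2) is dropped by the distance filter; (0, 1) sums to -0.7
```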
UniRef50 Embedding Similars API
Accelerate your protein sequence searches with our blazing-fast Nearest Neighbors Search API. By leveraging protein language models to generate vector embeddings for over 65 million UniRef50 sequences, our service enables similarity searches up to 1,200 times faster than traditional alignment-based methods such as BLAST.
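The underlying idea is nearest-neighbor search over embedding vectors. A tiny brute-force sketch of cosine-similarity ranking (the production service replaces this linear scan with an approximate index over the 65M+ UniRef50 embeddings; the 2-D vectors are purely illustrative):

```python
import math

def top_neighbors(query, db, k=3):
    """Rank database vectors by cosine similarity to the query and
    return the indices of the k nearest neighbors."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    ranked = sorted(range(len(db)), key=lambda i: cos(query, db[i]),
                    reverse=True)
    return ranked[:k]

order = top_neighbors((1.0, 0.0), [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1)])
# index 0 is an exact match, index 2 is close, index 1 is orthogonal
```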
generation embedding
ZymCTRL is a conditional enzyme language model trained on >36M BRENDA sequences to generate artificial enzymes conditioned on EC numbers and encode them into sequence embeddings. The API exposes two actions: generation of amino acid sequences up to 1,024 residues from user-specified EC numbers, and encoding of sequences into fixed-length or per-residue embeddings, optionally conditioned on EC. Typical uses include designing candidate enzymes for specific catalytic reactions, exploring sequence space distant from known enzymes, and deriving representations for downstream modeling.
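Since generation is conditioned on EC numbers, it helps to validate the conditioning tag before submitting. A minimal sketch that accepts only fully specified four-field EC numbers (whether the service also accepts partial classes like ``1.1.-.-`` is not stated above, so this check is deliberately strict):

```python
import re

EC_PATTERN = re.compile(r"^\d+\.\d+\.\d+\.\d+$")

def check_ec(ec):
    """Validate a four-field EC number (e.g. '1.1.1.1') before using it
    as a ZymCTRL conditioning tag. Partial classes such as '1.1.-.-'
    are rejected by this strict pattern.
    """
    if not EC_PATTERN.match(ec):
        raise ValueError(f"not a fully specified EC number: {ec!r}")
    return ec
```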
ZymCTRL API
enzymes generation