Peptides computes a focused set of physicochemical descriptors for antimicrobial peptide (AMP) analysis, including sequence length, amino acid composition, net charge, aliphatic index, molecular weight, isoelectric point, hydrophobicity (GRAVY), instability index, Boman index, and hydrophobic moment, with optional per-residue profiles. The encoder endpoint accepts amino acid sequences up to 2048 residues in batches of up to 10, returning numeric and vector features for AMP classification, virtual screening, and peptide design workflows.
Encode
Compute physicochemical descriptors for input sequences
- POST /api/v3/peptides/encode/
Encode endpoint for Peptides.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, optional) — Configuration parameters:
include (array of strings, optional, default: []) — Additional feature groups to compute
Allowed values:
"vector" — Include vector-based feature profiles
items (array of objects, required, min items: 1, max items: 10) — Input peptide sequences:
sequence (string, required, min length: 1, max length: 2048) — Peptide sequence using extended amino acid codes
Example request:
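The original example body is not reproduced on this page. A minimal Python sketch of a request that follows the schema above is given here instead. Only the path, headers, and field names come from this page; the host `https://example.invalid`, the helper name `build_encode_request`, and the sample sequence are assumptions made for illustration.

```python
import json
import urllib.request

# Hypothetical placeholder host; only the path below is documented on this page.
BASE_URL = "https://example.invalid"
ENDPOINT = "/api/v3/peptides/encode/"


def build_encode_request(sequences, api_key, include_vector=False):
    """Build (but do not send) a POST request for the encode endpoint."""
    if not 1 <= len(sequences) <= 10:
        raise ValueError("items must contain between 1 and 10 sequences")
    payload = {
        # "vector" is the only allowed value listed for params.include.
        "params": {"include": ["vector"] if include_vector else []},
        "items": [{"sequence": s} for s in sequences],
    }
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )


# Illustrative input only; the sequence is an arbitrary example peptide.
request = build_encode_request(["GLFDIVKKVVGALGSL"], "YOUR_API_KEY", include_vector=True)
body = json.loads(request.data)
```

Once `BASE_URL` points at the real host, the request could be sent with `urllib.request.urlopen(request)`.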
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
features (object) — Computed peptide features and descriptor values:
aliphatic_index (float, ≥ 0.0) — Aliphatic index value
boman (float) — Boman (potential protein interaction) index
charge (float) — Net charge at pH 7
descriptors (object, optional) — Collection of peptide structural descriptors (when present):
length (int, ≥ 1) — Number of amino acids in sequence
amino_acid_composition (object) — Amino acid class composition metrics:
Tiny (object) — Class containing amino acids A, C, G, S, T
number (int, ≥ 0) — Count of tiny amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of tiny amino acids
Small (object) — Class containing amino acids A, B, C, D, G, N, P, S, T, V
number (int, ≥ 0) — Count of small amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of small amino acids
Aliphatic (object) — Class containing amino acids A, I, L, V
number (int, ≥ 0) — Count of aliphatic amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of aliphatic amino acids
Aromatic (object) — Class containing amino acids F, H, W, Y
number (int, ≥ 0) — Count of aromatic amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of aromatic amino acids
NonPolar (object) — Class containing amino acids A, C, F, G, I, L, M, P, V, W, Y
number (int, ≥ 0) — Count of non-polar amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of non-polar amino acids
Polar (object) — Class containing amino acids D, E, H, K, N, Q, R, S, T, Z
number (int, ≥ 0) — Count of polar amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of polar amino acids
Charged (object) — Class containing amino acids B, D, E, H, K, R, Z
number (int, ≥ 0) — Count of charged amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of charged amino acids
Basic (object) — Class containing amino acids H, K, R
number (int, ≥ 0) — Count of basic amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of basic amino acids
Acidic (object) — Class containing amino acids B, D, E, Z
number (int, ≥ 0) — Count of acidic amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of acidic amino acids
frequencies (object) — Per-residue frequencies, as fractional counts for each amino acid symbol (e.g. A_frequency, R_frequency)
hydrophobic_moment (float, ≥ 0.0) — Hydrophobic moment value
hydrophobicity (float) — Mean hydrophobicity index
instability_index (float) — Instability index value
isoelectric_point (float) — Isoelectric point (pI) on pH scale
mass_shift (float) — Mass shift value in daltons (Da)
molecular_weight (float) — Molecular weight in daltons (Da)
mz (float) — Mass-to-charge ratio (m/z)
hydrophobic_moment_profile (array of floats, length: sequence length - 10) — Sliding-window hydrophobic moment values (empty if vector features are not requested)
hydrophobicity_profile (array of floats, length: sequence length - 10) — Sliding-window mean hydrophobicity values (empty if vector features are not requested)
linker_preference_profile (array of floats, length: sequence length) — Linker preference scores per position (empty if vector features are not requested)
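The amino_acid_composition fields listed above can be sanity-checked locally. The sketch below uses the residue classes exactly as listed in the schema and assumes that mole_percent is 100 × number / length; that formula is an assumption, since this page does not state it. The function name `class_composition` is hypothetical.

```python
# Residue classes as listed in the response schema above.
CLASSES = {
    "Tiny": "ACGST",
    "Small": "ABCDGNPSTV",
    "Aliphatic": "AILV",
    "Aromatic": "FHWY",
    "NonPolar": "ACFGILMPVWY",
    "Polar": "DEHKNQRSTZ",
    "Charged": "BDEHKRZ",
    "Basic": "HKR",
    "Acidic": "BDEZ",
}


def class_composition(sequence):
    """Return {class: {"number": int, "mole_percent": float}} for one sequence.

    Assumes mole_percent = 100 * number / length (not confirmed by this page).
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must contain at least one residue")
    result = {}
    for name, members in CLASSES.items():
        count = sum(1 for residue in seq if residue in members)
        result[name] = {"number": count, "mole_percent": 100.0 * count / len(seq)}
    return result


# Arbitrary example peptide: 4 Lys, 3 Leu, 3 Ala.
composition = class_composition("KLAKLAKKLA")
```

For this ten-residue input, the Basic class counts the four lysines and the Aliphatic class counts the six Ala and Leu residues.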
Example response:
Performance
CPU-only execution with 0.125 vCPU and 1 GB RAM per worker; no GPU is used. The implementation relies on lightweight, analytic descriptors (charge, hydrophobicity, instability index, etc.), so runtime scales approximately linearly with sequence length and batch size.
The endpoint returns descriptors rather than predictions, so predictive performance depends on the downstream model. For descriptors of this kind, as computed by the original Peptides R package, reported AMP classification accuracy on benchmark datasets is about 95% with linear discriminant analysis and about 85% with CART-based models.
Compared to structure-focused models in the BioLM ecosystem (e.g., ESMFold or AlphaFold2), Peptides is substantially faster and less compute-intensive because it avoids 3D structure prediction and focuses on closed-form physicochemical calculations. For rapid AMP screening or feature generation, it typically offers higher throughput per CPU core than deep sequence or structure models.
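To illustrate why closed-form descriptors are cheap, the sketch below computes two of them in a single pass over the sequence. It uses the standard published definitions (the Kyte-Doolittle hydropathy scale for GRAVY and Ikai's coefficients for the aliphatic index); whether the API uses exactly these scales and constants is an assumption, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Ikai (1980): X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)).

    X(...) is the mole percent of each residue in the sequence.
    """
    n = len(sequence)

    def mole_pct(residue):
        return 100.0 * sequence.count(residue) / n

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))
```

Both functions do a constant amount of work per residue, which is consistent with the roughly linear scaling described above. This sketch handles only the 20 standard residue codes; `gravy` raises `KeyError` on anything else.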
Applications
Rapid screening and prioritization of antimicrobial peptide (AMP) candidates by computing physicochemical descriptors (e.g., hydrophobic moment, net charge, aliphatic and instability indices, GRAVY, amino-acid composition), enabling biotech teams to rank and filter large in silico peptide libraries for therapeutic or agricultural applications; not optimal for peptides longer than ~50 amino acids where AMP-focused heuristics are less informative.
Optimization of membrane-active peptide antibiotics by quantifying Boman index, hydrophobicity, and hydrophobic moment profiles, helping researchers tune peptide–membrane selectivity and reduce off-target protein binding in lead series; less informative for peptides whose primary mechanism is via specific intracellular protein targets.
Stability and manufacturability assessment of AMP hits using aliphatic and instability indices alongside molecular weight and isoelectric point, allowing formulation and CMC teams to flag candidates with predicted poor thermostability, solubility, or proteolytic sensitivity before synthesis; accuracy may be reduced for sequences with noncanonical residues or extensive post-translational modifications, which are not explicitly modeled.
Classification and pre-filtering of novel peptide sequences by exporting computed descriptors and frequencies as model-ready feature vectors for downstream machine learning (e.g., LDA, CART, other in-house classifiers), enabling companies to build internal AMP vs non-AMP or spectrum-of-activity models; performance may degrade on peptides that are highly dissimilar from the physicochemical space represented in training data.
Support for peptide–membrane modeling workflows by generating hydrophobicity and hydrophobic moment profiles suitable for integration with external molecular dynamics pipelines (e.g., GROMACS XVG analyses handled outside this API), helping structural biologists rationalize sequence regions that are likely transmembrane, surface-associated, or globular; not suitable for direct parsing or visualization of MD trajectory files within this API.
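For the machine-learning use case above, one way to turn a features object into a model-ready vector is to flatten its nested numeric fields under stable names. The sketch below is generic and does not depend on the exact nesting of the response; the helper name `flatten_features` is hypothetical, and the sample dictionary uses placeholder values and an assumed nesting, not real API output.

```python
def flatten_features(features, prefix=""):
    """Flatten nested numeric fields into a list of (name, value) pairs.

    Keys are visited in sorted order so every sequence yields the same column
    order. List-valued fields (e.g. per-position profiles) are skipped because
    their length varies with sequence length.
    """
    pairs = []
    for key in sorted(features):
        value = features[key]
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            pairs.extend(flatten_features(value, prefix=name + "."))
        elif isinstance(value, bool):
            continue  # bool is a subclass of int; do not treat it as numeric
        elif isinstance(value, (int, float)):
            pairs.append((name, float(value)))
    return pairs


# Placeholder values and nesting for illustration only; not real API output.
example_features = {
    "charge": 1.0,
    "descriptors": {
        "length": 4,
        "amino_acid_composition": {"Basic": {"number": 1, "mole_percent": 25.0}},
    },
    "hydrophobicity_profile": [],
}
names, vector = zip(*flatten_features(example_features))
```

Applying the same function to every result gives rows with identical column order, which is what most tabular classifiers expect.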
Limitations
Maximum Sequence Length: Each sequence must be at most 2048 amino acids long (longer inputs should be truncated or split before submission).
Batch Size: A single request can contain up to 10 sequences in items (larger datasets must be chunked across multiple API calls).
Input Scope: Features are computed from primary amino acid sequences only; no structural (3D), molecular dynamics, or GROMACS XVG file inputs are supported.
Output Scope: The API returns physicochemical descriptors and profiles in features; it does not directly predict antimicrobial activity, potency, toxicity, or other biological endpoints.
Sequence Length Regime: Descriptors are most relevant for short peptides (typically <50 residues). For long protein-like sequences, interpretations for antimicrobial peptide design or classification may be less meaningful.
Embeddings and ML Workflows: Outputs are engineered physicochemical descriptors (numeric and vector features), not latent sequence embeddings. For unsupervised clustering, visualization, or representation learning, sequence-embedding models are generally more appropriate.
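The length and batch-size limits above can be enforced client-side before any call is made. The following is a minimal sketch; the helper name `make_batches` is hypothetical, and the request bodies it yields contain only the required items field.

```python
MAX_SEQUENCE_LENGTH = 2048  # documented per-sequence limit
MAX_BATCH_SIZE = 10         # documented per-request limit


def make_batches(sequences):
    """Validate sequences, then yield request bodies of at most 10 items each."""
    for index, seq in enumerate(sequences):
        if not 1 <= len(seq) <= MAX_SEQUENCE_LENGTH:
            raise ValueError(
                f"sequence {index} has length {len(seq)}; "
                f"expected 1..{MAX_SEQUENCE_LENGTH}"
            )
    for start in range(0, len(sequences), MAX_BATCH_SIZE):
        chunk = sequences[start:start + MAX_BATCH_SIZE]
        yield {"items": [{"sequence": seq} for seq in chunk]}


# 23 sequences are split into request bodies of 10, 10, and 3 items.
batches = list(make_batches(["ACDK"] * 23))
```

Because `make_batches` is a generator, validation runs when iteration starts; an over-long sequence raises `ValueError` before any batch is produced.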
How We Use It
BioLM uses the Peptides algorithm as a standardized feature generator for antimicrobial peptide R&D, turning raw sequences into rich physicochemical descriptors that feed larger ML workflows for classification, prioritization, and design. These descriptors (charge, hydrophobicity, aliphatic index, instability index, Boman index, hydrophobic moment, membrane-position proxies, and related profiles) are combined with sequence embeddings, AMP classifiers, and generative models to define candidate design spaces, enforce developability constraints, and link in silico optimization with downstream synthesis and testing.
Enables high-throughput, API-driven computation of peptide properties and profiles for batches of up to 10 sequences (length ≤ 2048 aa).
Integrates into iterative design loops, where Peptides features guide filtering, ranking, and retraining of AMP and protein engineering models.
References
Osorio, D., Rondon-Villarreal, P., & Torres, R. (2015). Peptides: A package for data mining of antimicrobial peptides. The R Journal, 7(1), 4–14. https://doi.org/10.32614/RJ-2015-001
