The Peptides algorithm computes ten physicochemical properties critical for antimicrobial peptide (AMP) characterization and design, including length, net charge, amino acid composition, hydrophobicity (GRAVY), instability index, and hydrophobic moment. It accepts amino acid sequences and returns descriptors useful in AMP classification, screening, and design workflows. The descriptors are computed directly from sequence on CPU, so batches complete quickly, which suits high-throughput peptide engineering, bioinformatics annotation pipelines, and computational drug discovery.
Encode
Compute physicochemical features for input sequences
- POST /api/v3/peptides/encode/
Encode endpoint for Peptides.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
- params (object, optional) — Configuration parameters:
  - include (array of strings, optional, default: []) — Additional feature types to compute. Allowed values:
    - "vector" — Compute vector-based features
- items (array of objects, required, min items: 1, max items: 10) — Input peptide sequences:
  - sequence (string, required, min length: 1, max length: 2048) — Peptide sequence using extended amino acid codes
Example request:
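A minimal sketch of a request built from the schema above. The host in `BASE_URL` is a placeholder and an assumption (this page gives only the path), the helper names `build_payload` and `build_request` are hypothetical, and the sequence is purely illustrative. The request is prepared but not sent.

```python
import json
import urllib.request

# Assumption: placeholder host, not stated on this page. Substitute your own.
BASE_URL = "https://biolm.ai"
ENDPOINT = "/api/v3/peptides/encode/"

MAX_ITEMS = 10      # documented max items per request
MAX_LENGTH = 2048   # documented max sequence length


def build_payload(sequences, include_vector=False):
    """Build a request body that follows the documented schema."""
    if not 1 <= len(sequences) <= MAX_ITEMS:
        raise ValueError(f"items must contain 1 to {MAX_ITEMS} sequences")
    for seq in sequences:
        if not 1 <= len(seq) <= MAX_LENGTH:
            raise ValueError(f"sequence length must be 1 to {MAX_LENGTH}")
    payload = {"items": [{"sequence": seq} for seq in sequences]}
    if include_vector:
        # "vector" is the only allowed value listed for params.include.
        payload["params"] = {"include": ["vector"]}
    return payload


def build_request(payload, api_key):
    """Prepare (but do not send) the POST request with the documented headers."""
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )


payload = build_payload(["GIGKFLHSAKKFGKAFVGEIMNS"], include_vector=True)
request = build_request(payload, "YOUR_API_KEY")
# To send it: urllib.request.urlopen(request)
```

Validating the item count and sequence length client-side avoids a round trip that would otherwise end in a 400 Bad Request.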
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
features (object) — Computed physicochemical and structural peptide features:
aliphatic_index (float, ≥ 0.0) — Thermostability index based on aliphatic amino acid content
boman (float) — Potential peptide-protein interaction index (typically negative or near zero for antimicrobial peptides)
charge (float) — Net peptide charge at pH 7 (computed using EMBOSS pKa scale)
descriptors (object) — Collection of peptide structural descriptors:
length (int, ≥ 1) — Number of amino acids in sequence
amino_acid_composition (object) — Counts and mole percentages (%) of amino acid classes:
Tiny (object) — Amino acids: A, C, G, S, T
number (int, ≥ 0) — Count of tiny amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of tiny amino acids
Small (object) — Amino acids: A, B, C, D, G, N, P, S, T, V
number (int, ≥ 0) — Count of small amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of small amino acids
Aliphatic (object) — Amino acids: A, I, L, V
number (int, ≥ 0) — Count of aliphatic amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of aliphatic amino acids
Aromatic (object) — Amino acids: F, H, W, Y
number (int, ≥ 0) — Count of aromatic amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of aromatic amino acids
NonPolar (object) — Amino acids: A, C, F, G, I, L, M, P, V, W, Y
number (int, ≥ 0) — Count of non-polar amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of non-polar amino acids
Polar (object) — Amino acids: D, E, H, K, N, Q, R, S, T, Z
number (int, ≥ 0) — Count of polar amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of polar amino acids
Charged (object) — Amino acids: B, D, E, H, K, R, Z
number (int, ≥ 0) — Count of charged amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of charged amino acids
Basic (object) — Amino acids: H, K, R
number (int, ≥ 0) — Count of basic amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of basic amino acids
Acidic (object) — Amino acids: B, D, E, Z
number (int, ≥ 0) — Count of acidic amino acids
mole_percent (float, 0.0–100.0) — Mole percentage of acidic amino acids
frequencies (object) — Amino acid frequency distribution (counts and percentages)
hydrophobic_moment (float, ≥ 0.0) — Amphiphilicity measure (computed using Eisenberg scale, window size: 11, angle: 100°)
hydrophobicity (float) — Mean hydrophobicity index (Eisenberg scale)
instability_index (float, typically < 40.0 for stable peptides) — Stability prediction based on dipeptide composition
isoelectric_point (float, pH scale ~0–14) — pH at which peptide net charge is zero (computed using EMBOSS pKa scale)
mass_shift (float) — Mass shift value (Daltons, Da)
molecular_weight (float, Daltons, Da) — Peptide molecular mass (computed using ExPASy formulas)
mz (float) — Mass-to-charge ratio (m/z)
hydrophobic_moment_profile (array of floats, length: sequence length - window size + 1) — Sliding-window hydrophobic moment values (window size: 11, angle: 100°)
hydrophobicity_profile (array of floats, length: sequence length - window size + 1) — Sliding-window hydrophobicity index values (Eisenberg scale, window size: 11)
linker_preference_profile (array of floats, length: sequence length) — Linker preference scores per amino acid position
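To make two of the fields above concrete, the sketch below recomputes the amino acid class counts and mole percentages locally, and the number of values expected in a sliding-window profile. It is an illustration built only from the class definitions and length formula listed above, not the service's implementation. The function names are hypothetical, and returning 0 for sequences shorter than the window is an assumption, since the API's behavior in that case is not documented here.

```python
# Residue classes exactly as listed in the response schema above.
CLASSES = {
    "Tiny": "ACGST",
    "Small": "ABCDGNPSTV",
    "Aliphatic": "AILV",
    "Aromatic": "FHWY",
    "NonPolar": "ACFGILMPVWY",
    "Polar": "DEHKNQRSTZ",
    "Charged": "BDEHKRZ",
    "Basic": "HKR",
    "Acidic": "BDEZ",
}


def class_composition(sequence):
    """Return {class: {"number": int, "mole_percent": float}} for a sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    result = {}
    for name, members in CLASSES.items():
        count = sum(1 for residue in seq if residue in members)
        result[name] = {
            "number": count,
            "mole_percent": 100.0 * count / len(seq),
        }
    return result


def profile_length(sequence_length, window=11):
    """Number of sliding-window values: sequence length - window size + 1.

    Assumption: clamp to 0 when the sequence is shorter than the window.
    """
    return max(sequence_length - window + 1, 0)


composition = class_composition("GIGKFLHSAKKFGKAFVGEIMNS")
n_windows = profile_length(23)  # 23 - 11 + 1 = 13
```

A residue can belong to several classes at once (for example, H counts as Aromatic, Polar, Charged, and Basic), so the class percentages are not expected to sum to 100.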
Example response:
Performance
Technical Specifications:
- The Peptides API accepts at most 10 sequences per request.
- Each sequence may be at most 2048 amino acids long.
- The API does not use GPU acceleration; it runs on CPU resources.
Performance Metrics:
- A batch typically completes within seconds; the exact time depends on the number and length of the sequences.
- The API is designed for high-throughput analysis and handles multiple sequences per request.
Comparative Performance:
- Peptides computes physicochemical descriptors from sequence rather than predicting structure, so it is faster and less resource-intensive than structure prediction models such as AlphaFold2 or ESMFold.
- The trade-off is scope: it returns sequence-derived properties only and provides no structural prediction.
Optimization and Scalability:
- The API is designed to be called from larger workflows for high-throughput peptide analysis.
- BioLM's infrastructure is intended to keep turnaround times consistent under high demand.
Accuracy and Reliability:
- The API returns descriptors, not class predictions. When these descriptors are used as inputs to an antimicrobial peptide classifier, accuracies of up to 95% have been reported using linear discriminant analysis.
- It computes ten key characteristics used in peptide classification and design.
Use Case Suitability:
- Suited to applications that need rapid computation of physicochemical properties without the overhead of full structural prediction.
- Particularly suited to antimicrobial peptide design and classification work, where many candidates must be characterized quickly.
Applications
Rapid screening and prioritization of antimicrobial peptide (AMP) candidates by computing physicochemical descriptors (e.g., hydrophobic moment, net charge, instability index), enabling biotech companies to efficiently identify promising peptides for therapeutic development or agricultural biocontrol; not optimal for peptides longer than 50 amino acids due to limitations in descriptor accuracy.
Optimization of peptide-based antibiotics by calculating the Boman index and hydrophobicity profiles, helping researchers predict peptide-membrane interactions and improve selectivity against bacterial membranes; limited applicability for peptides that act via intracellular targets.
Stability assessment of antimicrobial peptides through aliphatic and instability indices, allowing protein engineers to identify candidates with increased thermostability and shelf-life for pharmaceutical formulations; less effective for peptides containing unnatural amino acids or extensive post-translational modifications.
Classification and filtering of novel peptide sequences using computed physicochemical properties (e.g., isoelectric point, amino acid composition) combined with machine learning methods (e.g., linear discriminant analysis, regression trees), enabling biotech companies to rapidly differentiate antimicrobial peptides from non-antimicrobial peptides; accuracy may decrease for peptides significantly divergent from training datasets.
Integration with GROMACS molecular dynamics simulations by reading and plotting XVG output files, facilitating detailed biophysical characterization of peptide-membrane interactions, essential for rational peptide design; not suitable for direct analysis of non-GROMACS molecular dynamics data formats.
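The screening and prioritization use case described above can be sketched as a simple filter over computed descriptors. This assumes the relevant values have already been pulled out of the API response into flat dictionaries. The function name `prioritize`, the `id` field, and the sample values are hypothetical placeholders, not real API output. The length and instability cutoffs follow the figures quoted on this page (50 residues, instability index below 40), while the positive-charge requirement is an added assumption reflecting that many AMPs are cationic.

```python
def prioritize(candidates, max_length=50, max_instability=40.0, min_charge=0.0):
    """Keep short, predicted-stable, positively charged peptides.

    Returns the surviving candidates sorted by net charge, highest first.
    All thresholds are illustrative defaults, not recommendations.
    """
    kept = [
        c for c in candidates
        if c["length"] <= max_length
        and c["instability_index"] < max_instability
        and c["charge"] > min_charge
    ]
    return sorted(kept, key=lambda c: c["charge"], reverse=True)


# Placeholder descriptor values, invented for illustration only.
candidates = [
    {"id": "cand-1", "length": 23, "charge": 3.0, "instability_index": 25.0},
    {"id": "cand-2", "length": 80, "charge": 5.0, "instability_index": 20.0},
    {"id": "cand-3", "length": 18, "charge": 6.0, "instability_index": 35.0},
    {"id": "cand-4", "length": 20, "charge": -1.0, "instability_index": 10.0},
    {"id": "cand-5", "length": 30, "charge": 2.0, "instability_index": 55.0},
]

ranked = prioritize(candidates)
ranked_ids = [c["id"] for c in ranked]  # ["cand-3", "cand-1"]
```

A hard filter like this is only a first pass; the descriptors could equally be fed to a trained classifier such as the linear discriminant analysis mentioned above.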
Limitations
Maximum Sequence Length: The API accepts sequences up to 2048 amino acids; longer sequences must be truncated or split into smaller segments.
Batch Size: Requests are limited to a maximum batch_size of 10 sequences per API call.
The Peptides algorithm computes physicochemical properties useful for antimicrobial peptide classification and design; however, it does not provide predictions of biological activity or antimicrobial potency directly.
The algorithm calculates peptide descriptors based solely on primary amino acid sequences and does not incorporate tertiary structure or conformational dynamics, which may limit accuracy for peptides whose activity strongly depends on 3D structure.
This method is optimized for short peptides (<50 amino acids) typical of antimicrobial peptides; accuracy and relevance of computed properties may decrease for significantly longer protein sequences.
The Peptides algorithm is not designed for tasks requiring sequence embeddings or encodings suitable for clustering, visualization, or downstream machine learning tasks beyond basic classification based on physicochemical descriptors.
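The batch and length limits listed above can be handled client-side before any request is made. The sketch below splits over-long sequences into consecutive segments and groups the segments into request bodies of at most 10 items. The helper names are hypothetical, and non-overlapping splitting is just one possible strategy; descriptors computed on a segment describe that segment, not the full-length sequence.

```python
def split_sequence(sequence, max_length=2048):
    """Split a sequence into consecutive, non-overlapping segments."""
    return [
        sequence[i:i + max_length]
        for i in range(0, len(sequence), max_length)
    ]


def make_payloads(sequences, batch_size=10, max_length=2048):
    """Return request bodies that respect the documented limits.

    Also returns, for every segment in order, the index of the input
    sequence it came from, so results can be mapped back afterwards.
    Empty input sequences produce no segments and are skipped.
    """
    segments, sources = [], []
    for index, seq in enumerate(sequences):
        for segment in split_sequence(seq, max_length):
            segments.append(segment)
            sources.append(index)
    payloads = [
        {"items": [{"sequence": s} for s in segments[i:i + batch_size]]}
        for i in range(0, len(segments), batch_size)
    ]
    return payloads, sources


# 24 short peptides plus one 5000-residue sequence (split into 3 segments).
payloads, sources = make_payloads(["ACDEFGHIK"] * 24 + ["A" * 5000])
batch_sizes = [len(p["items"]) for p in payloads]  # [10, 10, 7]
```

Since results come back one per input item in request order, the `sources` list is enough to reassociate each result with its original sequence.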
How We Use It
BioLM leverages the Peptides algorithm to streamline and enhance the analysis and engineering of antimicrobial peptides, crucial for developing new antibiotics with lower resistance profiles. By integrating the Peptides algorithm into our comprehensive ML pipelines, BioLM accelerates the discovery and optimization of novel peptide sequences, enabling precise tuning of physicochemical properties for specific applications. This integration allows seamless data flow and analysis, facilitating rapid classification and design iterations within larger protein engineering workflows. The Peptides algorithm complements other BioLM services by providing essential insights into peptide stability and interactions, driving efficient research and practical outcomes in protein design.
Accelerates peptide classification and design: Integrates seamlessly with BioLM’s predictive models to refine peptide sequences for desired properties.
Facilitates multi-round optimization: Enables iterative feedback loops in peptide engineering processes, enhancing the precision of design and application.
References
Osorio, D., Rondón-Villarreal, P., & Torres, R. (2015). Peptides: A Package for Data Mining of Antimicrobial Peptides. The R Journal, 7(1), 4–14.
