Screen 1,000 Peptides Before Lunch

This guide builds a multi-stage pipeline that screens a peptide library for thermal stability (predicted melting temperature) and solubility.




⚠️ Preview Feature: The biolmai.pipeline module used in this guide is in preview and not yet publicly released. Early access is available on request; contact us to get access.

What you'll learn:

  • Defining a multi-stage pipeline with parallel predictions
  • Filtering by melting temperature and solubility
  • Exploring results with summary(), stats(), and SQL queries

Requirements:

pip install "biolmai[pipeline]" matplotlib
export BIOLMAI_TOKEN=your-token-here

Setup

import os
from biolmai.pipeline import (
    DataPipeline, DuckDBDataStore,
    ThresholdFilter, RankingFilter,
    ValidAminoAcidFilter, EmbeddingSpec,
    DiversitySamplingFilter,
)

TOKEN = os.environ.get("BIOLMAI_TOKEN", "")
if not TOKEN:
    raise EnvironmentError(
        "Set BIOLMAI_TOKEN before running.\n"
        "Get one at https://biolm.ai/ui/accounts/user-api-tokens/"
    )

Peptide library

25 antimicrobial peptides of varying length, charge, and hydrophobicity.

MY_PEPTIDES = [
    # Magainins / frog-derived
    "GIGKFLHSAKKFGKAFVGEIMNS",
    "GIGKFLHSAGKFGKAFVGEIMKS",
    "GLFDIIKKIAESF",
    "GLFDIVKKVVGALGSL",
    "FLPLILRKIVTAL",
    # Human defensins / cathelicidins
    "LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES",
    "RLFDKIRQVIRKF",
    "KWKLFKKIPKFLHLAKKF",
    # Insect-derived
    "GIGAVLKVLTTGLPALISWIKRKRQQ",
    "VDKGSYLPRPTPPRPIYNRN",
    # Synthetic / designed
    "KLAKLAKKLAKLAK",
    "LKLLKKLLKLLKKL",
    "RRWWRRWWRR",
    "KWKWKWKWKW",
    "GIKKFLGSIWKFIKAFVKEIMN",
    # Short peptides
    "RRWQWR",
    "RWRWRW",
    "FKRIVQRIKDFL",
    "KFLKKAKKFGK",
    "GIGKFLHSAK",
    "KWKLFKKI",
    "RLFDKIRQ",
    # Longer peptides
    "GLFDIIKKIAESFLPKV",
    "GIGKFLHSAKKFGKAFV",
    "KWKLFKKIPKFLHLAK",
]
lengths = [len(s) for s in MY_PEPTIDES]
print(f"{len(MY_PEPTIDES)} peptides, length range: {min(lengths)}-{max(lengths)} aa")
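To make "varying length, charge, and hydrophobicity" concrete, the following is a minimal sketch that computes rough descriptors for a few of the peptides above. It is not part of biolmai: the `describe` helper, the residue sets, and the charge rule (K/R counted as +1, D/E as -1, histidine and termini ignored) are simplifying assumptions for illustration only.

```python
# Rough sequence descriptors; a simplification, not a pKa-based charge model.
POSITIVE = set("KR")            # lysine, arginine (histidine ignored here)
NEGATIVE = set("DE")            # aspartate, glutamate
HYDROPHOBIC = set("AVILMFWC")   # one simplified hydrophobic set (assumption)


def describe(seq: str) -> dict:
    """Return length, approximate net charge, and hydrophobic fraction."""
    charge = sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    return {
        "length": len(seq),
        "net_charge": charge,
        "hydrophobic_fraction": round(hydrophobic, 2),
    }


for seq in ["KLAKLAKKLAKLAK", "RRWWRRWWRR", "GLFDIIKKIAESF"]:
    print(seq, describe(seq))
```

Running the same helper over all of MY_PEPTIDES gives a quick view of how much of the charge and hydrophobicity range the library covers before any model is called.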

Build and run the pipeline

The dependency graph:

validate
   ├── predict_tm   ─┐
   └── predict_sol  ─┴── filter_tm >= 40°C ── rank top 15 by solubility

The two prediction stages can run in parallel because each depends only on validate and neither depends on the other.
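The scheduling idea behind the diagram can be sketched with the standard library alone. This is not how biolmai.pipeline is implemented (its internals are not described in this guide); it only shows how a dependency map yields "waves" of stages that are safe to run concurrently. The `execution_waves` helper is hypothetical, and the edges into filter_tm and top15 are read off the diagram.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Stage -> stages it depends on, mirroring the diagram above.
# The edges into filter_tm and top15 are an assumption read off the diagram.
DEPENDS_ON = {
    "validate": [],
    "predict_tm": ["validate"],
    "predict_sol": ["validate"],
    "filter_tm": ["predict_tm", "predict_sol"],
    "top15": ["filter_tm"],
}


def execution_waves(graph):
    """Group stages into waves; every stage in a wave has all dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave finished
    return waves


print(execution_waves(DEPENDS_ON))
# [['validate'], ['predict_sol', 'predict_tm'], ['filter_tm'], ['top15']]
```

Both prediction stages land in the second wave, which is why they can be dispatched at the same time.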

pipeline = DataPipeline(sequences=MY_PEPTIDES, verbose=True)

pipeline.add_filter(ValidAminoAcidFilter(), stage_name="validate")

pipeline.add_prediction(
    "temperature-regression", extractions="prediction",
    columns="melting_temperature", stage_name="predict_tm",
    depends_on=["validate"],
)
pipeline.add_prediction(
    "biolmsol", extractions="solubility_score",
    columns="solubility", stage_name="predict_sol",
    depends_on=["validate"],
)

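# Keep peptides with a predicted melting temperature of at least 40 °C,
# then keep the 15 highest-scoring survivors by solubility.
pipeline.add_filter(ThresholdFilter("melting_temperature", min_value=40.0), stage_name="filter_tm")
pipeline.add_filter(RankingFilter("solubility", n=15, ascending=False), stage_name="top15")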

pipeline.run()
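For intuition, the three filtering steps (validate, filter_tm, top15) can be approximated in plain Python. This is a sketch under stated assumptions, not the library's implementation: the `screen` helper is hypothetical, validity is assumed to mean "only the 20 canonical amino acids", the threshold is assumed inclusive (matching ">= 40°C" in the diagram), and the scores are placeholders rather than model output.

```python
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # 20 canonical amino acids (assumed rule)


def screen(records, min_tm=40.0, top_n=15):
    """Validate, keep rows with tm >= min_tm, then return the top_n by solubility."""
    valid = [r for r in records if r["sequence"] and set(r["sequence"]) <= VALID_AA]
    stable = [r for r in valid if r["tm"] >= min_tm]
    return sorted(stable, key=lambda r: r["solubility"], reverse=True)[:top_n]


# Placeholder scores for illustration only; not model output.
records = [
    {"sequence": "KLAKLAKKLAKLAK", "tm": 52.0, "solubility": 0.61},
    {"sequence": "RRWWRRWWRR", "tm": 38.5, "solubility": 0.90},  # below the Tm cutoff
    {"sequence": "GIGKFLHSAK", "tm": 45.0, "solubility": 0.72},
    {"sequence": "KWKXFKKI", "tm": 60.0, "solubility": 0.95},    # 'X' is not canonical
]

for row in screen(records, top_n=2):
    print(row["sequence"], row["tm"], row["solubility"])
```

Order matters here: ranking runs on the sequences that survive the melting-temperature cutoff, so a highly soluble but unstable peptide never reaches the final list.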

Explore results

pipeline.summary()
pipeline.stats()
# Top 10 sequences by melting temperature
pipeline.query("""
    SELECT s.sequence,
           MAX(CASE WHEN p.prediction_type = 'melting_temperature' THEN p.value END) AS tm,
           MAX(CASE WHEN p.prediction_type = 'solubility' THEN p.value END) AS solubility
    FROM sequences s
    JOIN predictions p ON s.sequence_id = p.sequence_id
    GROUP BY s.sequence
    ORDER BY tm DESC
    LIMIT 10
""")
pipeline.plot("funnel")
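The query above pivots a long-format predictions table (one row per sequence and prediction type) into one row per sequence using MAX(CASE WHEN ...). The sketch below reproduces that pattern on an in-memory SQLite database so it can be tried without pipeline access. SQLite stands in for the pipeline's own store here; the table and column names are copied from the query, while the column types and the rows are placeholder assumptions, not model output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sequences (sequence_id INTEGER PRIMARY KEY, sequence TEXT);
    CREATE TABLE predictions (sequence_id INTEGER, prediction_type TEXT, value REAL);
""")

# Placeholder values for illustration only; not model output.
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?)",
    [(1, "KLAKLAKKLAKLAK"), (2, "GIGKFLHSAK")],
)
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?)",
    [
        (1, "melting_temperature", 52.0), (1, "solubility", 0.61),
        (2, "melting_temperature", 45.0), (2, "solubility", 0.72),
    ],
)

# Each CASE picks out one prediction type; MAX collapses the group to one value.
rows = conn.execute("""
    SELECT s.sequence,
           MAX(CASE WHEN p.prediction_type = 'melting_temperature' THEN p.value END) AS tm,
           MAX(CASE WHEN p.prediction_type = 'solubility' THEN p.value END) AS solubility
    FROM sequences s
    JOIN predictions p ON s.sequence_id = p.sequence_id
    GROUP BY s.sequence
    ORDER BY tm DESC
""").fetchall()

for row in rows:
    print(row)
```

Because the join starts from the sequences table, a query of this shape returns every sequence that has predictions, regardless of whether it passed the later filter stages.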

Next Steps

Check out additional tutorials at jupyter.biolm.ai, or head over to our BioLM Documentation to explore additional models and functionality.

See more use-cases and APIs on your BioLM Console Catalog.

