Screen 1,000 Peptides Before Lunch¶

Multi-stage pipeline to screen a peptide library for thermal stability and solubility.

⚠️ Preview Feature — The biolmai.pipeline module used in this guide is currently in preview and not yet publicly released. Access is available to early users on request. Contact us to get access.

What you'll learn:

Defining a multi-stage pipeline with parallel predictions
Filtering by melting temperature and solubility
Exploring results with summary(), stats(), and SQL queries

Requirements:

pip install "biolmai[pipeline]" matplotlib
export BIOLMAI_TOKEN=your-token-here

Setup¶

import os
# Only the classes used in this guide are imported.
from biolmai.pipeline import (
    DataPipeline,
    ThresholdFilter, RankingFilter,
    ValidAminoAcidFilter,
)

TOKEN = os.environ.get("BIOLMAI_TOKEN", "")
if not TOKEN:
    raise EnvironmentError(
        "Set BIOLMAI_TOKEN before running.\n"
        "Get one at https://biolm.ai/ui/accounts/user-api-tokens/"
    )

Peptide library¶

25 antimicrobial peptides of varying length, charge, and hydrophobicity.

MY_PEPTIDES = [
    # Magainins / frog-derived
    "GIGKFLHSAKKFGKAFVGEIMNS",
    "GIGKFLHSAGKFGKAFVGEIMKS",
    "GLFDIIKKIAESF",
    "GLFDIVKKVVGALGSL",
    "FLPLILRKIVTAL",
    # Human defensins / cathelicidins
    "LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES",
    "RLFDKIRQVIRKF",
    "KWKLFKKIPKFLHLAKKF",
    # Insect-derived
    "GIGAVLKVLTTGLPALISWIKRKRQQ",
    "VDKGSYLPRPTPPRPIYNRN",
    # Synthetic / designed
    "KLAKLAKKLAKLAK",
    "LKLLKKLLKLLKKL",
    "RRWWRRWWRR",
    "KWKWKWKWKW",
    "GIKKFLGSIWKFIKAFVKEIMN",
    # Short peptides
    "RRWQWR",
    "RWRWRW",
    "FKRIVQRIKDFL",
    "KFLKKAKKFGK",
    "GIGKFLHSAK",
    "KWKLFKKI",
    "RLFDKIRQ",
    # Longer peptides
    "GLFDIIKKIAESFLPKV",
    "GIGKFLHSAKKFGKAFV",
    "KWKLFKKIPKFLHLAK",
]
print(f"{len(MY_PEPTIDES)} peptides, length range: {min(len(s) for s in MY_PEPTIDES)}–{max(len(s) for s in MY_PEPTIDES)} aa")
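To see how the library varies in charge and hydrophobicity, the sketch below computes two rough descriptors per sequence. It is an illustration only and is not part of biolmai: `net_charge`, `mean_hydropathy`, and `describe` are hypothetical helper names, the charge estimate simply counts K/R against D/E (ignoring histidine and the termini), and hydrophobicity is the mean Kyte-Doolittle hydropathy (GRAVY).

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(seq):
    """Crude net charge near neutral pH: +1 per K/R, -1 per D/E."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def mean_hydropathy(seq):
    """GRAVY score: mean Kyte-Doolittle hydropathy over the sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)


def describe(seq):
    """Bundle length, charge, and GRAVY into one record."""
    return {
        "length": len(seq),
        "charge": net_charge(seq),
        "gravy": round(mean_hydropathy(seq), 2),
    }


# Three sequences taken from the library above.
examples = ["KLAKLAKKLAKLAK", "RRWWRRWWRR", "GLFDIIKKIAESF"]
for seq in examples:
    print(seq, describe(seq))
```

Running the same loop over `MY_PEPTIDES` gives a quick sanity check of the library's spread before any API calls are made.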

Build and run the pipeline¶

The dependency graph:

validate
   ├── predict_tm   ─┐
   └── predict_sol  ─┴── filter_tm >= 40°C ── rank top 15 by solubility

Both prediction stages run in parallel because each depends only on validate and neither depends on the other.
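The scheduling idea behind the diagram can be sketched with the standard library's `graphlib`. This is not how biolmai resolves stages internally (that is not documented here); it only shows why two stages with the same single dependency become ready at the same time. The edges into `filter_tm` and `top15` are read off the diagram and are an assumption, and `execution_waves` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Stage -> set of stages it depends on, mirroring the diagram.
# The edges into filter_tm and top15 are assumed from the diagram.
graph = {
    "validate": set(),
    "predict_tm": {"validate"},
    "predict_sol": {"validate"},
    "filter_tm": {"predict_tm", "predict_sol"},
    "top15": {"filter_tm"},
}


def execution_waves(deps):
    """Group stages into waves; stages in one wave have no unmet dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking the next one
    return waves


waves = execution_waves(graph)
for i, wave in enumerate(waves, start=1):
    print(f"wave {i}: {wave}")
```

The second wave contains both `predict_sol` and `predict_tm`, which is the point at which a scheduler is free to run them concurrently.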

pipeline = DataPipeline(sequences=MY_PEPTIDES, verbose=True)

pipeline.add_filter(ValidAminoAcidFilter(), stage_name="validate")

pipeline.add_prediction(
    "temperature-regression", extractions="prediction",
    columns="melting_temperature", stage_name="predict_tm",
    depends_on=["validate"],
)
pipeline.add_prediction(
    "biolmsol", extractions="solubility_score",
    columns="solubility", stage_name="predict_sol",
    depends_on=["validate"],
)

pipeline.add_filter(ThresholdFilter("melting_temperature", min_value=40.0), stage_name="filter_tm")
pipeline.add_filter(RankingFilter("solubility", n=15, ascending=False), stage_name="top15")

pipeline.run()
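The two filter stages can be pictured as a cutoff followed by a sort. The sketch below is a plain-Python stand-in, not the library's implementation: `threshold_filter` and `ranking_filter` are hypothetical functions, the records are made-up placeholder values rather than real predictions, and the cutoff is treated as inclusive to match the `>= 40°C` label in the diagram.

```python
# Placeholder predictions (made-up values, not model output).
records = [
    {"sequence": "seq_A", "melting_temperature": 52.0, "solubility": 0.81},
    {"sequence": "seq_B", "melting_temperature": 38.5, "solubility": 0.95},
    {"sequence": "seq_C", "melting_temperature": 40.0, "solubility": 0.40},
    {"sequence": "seq_D", "melting_temperature": 61.2, "solubility": 0.67},
    {"sequence": "seq_E", "melting_temperature": 45.3, "solubility": 0.88},
]


def threshold_filter(rows, column, min_value):
    """Keep rows whose value in `column` is at least `min_value`."""
    return [row for row in rows if row[column] >= min_value]


def ranking_filter(rows, column, n, ascending=False):
    """Sort by `column` and keep the first `n` rows."""
    ordered = sorted(rows, key=lambda row: row[column], reverse=not ascending)
    return ordered[:n]


# Same order as the pipeline: stability cutoff first, then rank by solubility.
stable = threshold_filter(records, "melting_temperature", min_value=40.0)
top = ranking_filter(stable, "solubility", n=2, ascending=False)

for row in top:
    print(row["sequence"], row["melting_temperature"], row["solubility"])
```

Note that `seq_B` has the highest solubility but never reaches the ranking step because it fails the temperature cutoff; the order of the two filters matters.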

Explore results¶

pipeline.summary()

pipeline.stats()

# Top 10 sequences by melting temperature
pipeline.query("""
    SELECT s.sequence,
           MAX(CASE WHEN p.prediction_type = 'melting_temperature' THEN p.value END) AS tm,
           MAX(CASE WHEN p.prediction_type = 'solubility' THEN p.value END) AS solubility
    FROM sequences s
    JOIN predictions p ON s.sequence_id = p.sequence_id
    GROUP BY s.sequence
    ORDER BY tm DESC
    LIMIT 10
""")

pipeline.plot("funnel")
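The SQL query in this section uses a common pivot pattern: predictions are stored one row per (sequence, prediction type), and `MAX(CASE WHEN ...)` turns them into one column per type. The sketch below reproduces that pattern on toy data so it can be tried without a token. It uses the standard library's `sqlite3` as a stand-in for the pipeline's own store, the table layout is inferred from the column names in the query, and the values are made-up placeholders.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Minimal tables with only the columns the query refers to (assumed layout).
con.executescript("""
    CREATE TABLE sequences (sequence_id INTEGER PRIMARY KEY, sequence TEXT);
    CREATE TABLE predictions (sequence_id INTEGER, prediction_type TEXT, value REAL);
    INSERT INTO sequences VALUES (1, 'seq_A'), (2, 'seq_B');
    INSERT INTO predictions VALUES
        (1, 'melting_temperature', 52.0), (1, 'solubility', 0.81),
        (2, 'melting_temperature', 61.2), (2, 'solubility', 0.67);
""")

# Each CASE picks out one prediction type; MAX collapses the group to that value.
rows = con.execute("""
    SELECT s.sequence,
           MAX(CASE WHEN p.prediction_type = 'melting_temperature' THEN p.value END) AS tm,
           MAX(CASE WHEN p.prediction_type = 'solubility' THEN p.value END) AS solubility
    FROM sequences s
    JOIN predictions p ON s.sequence_id = p.sequence_id
    GROUP BY s.sequence
    ORDER BY tm DESC
""").fetchall()
con.close()

for row in rows:
    print(row)
```

Each output row carries the sequence with its melting temperature and solubility side by side, sorted from most to least thermally stable.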

Next Steps¶

Check out additional tutorials at jupyter.biolm.ai, or head over to our BioLM Documentation to explore additional models and functionality.
