Screen 1,000 Peptides Before Lunch¶
Multi-stage pipeline to screen a peptide library for thermal stability and solubility.
⚠️ Preview Feature: The biolmai.pipeline module used in this guide is currently in preview and not yet publicly released. Early access is available on request; contact us to get access.
What you'll learn:
- Defining a multi-stage pipeline with parallel predictions
- Filtering by melting temperature and solubility
- Exploring results with summary(), stats(), and SQL queries
Requirements:
pip install "biolmai[pipeline]" matplotlib
export BIOLMAI_TOKEN=your-token-here
Setup¶
import os
from biolmai.pipeline import (
    DataPipeline, DuckDBDataStore,
    ThresholdFilter, RankingFilter,
    ValidAminoAcidFilter, EmbeddingSpec,
    DiversitySamplingFilter,
)

# Fail fast if the API token is missing from the environment.
TOKEN = os.environ.get("BIOLMAI_TOKEN", "")
if not TOKEN:
    raise EnvironmentError(
        "Set BIOLMAI_TOKEN before running.\n"
        "Get one at https://biolm.ai/ui/accounts/user-api-tokens/"
    )

Peptide library¶
25 antimicrobial peptides of varying length, charge, and hydrophobicity.
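To make "length, charge, and hydrophobicity" concrete, here is a minimal standard-library sketch that estimates the three properties for a few of the sequences. The helper names `net_charge` and `mean_hydropathy` are hypothetical and not part of biolmai. The charge estimate is deliberately crude (it counts K/R as +1 and D/E as -1, ignoring histidine and the termini), and hydrophobicity is the mean Kyte-Doolittle hydropathy.

```python
# Kyte-Doolittle hydropathy index per residue (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(seq: str) -> int:
    """Crude net charge near neutral pH: +1 per K/R, -1 per D/E."""
    positive = sum(seq.count(r) for r in "KR")
    negative = sum(seq.count(r) for r in "DE")
    return positive - negative


def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[r] for r in seq) / len(seq)


# Three sequences taken from the library in this guide.
examples = ["KLAKLAKKLAKLAK", "RRWQWR", "GLFDIIKKIAESF"]
for seq in examples:
    print(
        f"{seq:<16} len={len(seq):>2} "
        f"charge={net_charge(seq):+d} "
        f"hydropathy={mean_hydropathy(seq):+.2f}"
    )
```

These are rough descriptors for eyeballing the spread of a library; they are not the quantities the pipeline predicts.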
MY_PEPTIDES = [
    # Magainins / frog-derived
    "GIGKFLHSAKKFGKAFVGEIMNS",
    "GIGKFLHSAGKFGKAFVGEIMKS",
    "GLFDIIKKIAESF",
    "GLFDIVKKVVGALGSL",
    "FLPLILRKIVTAL",
    # Human defensins / cathelicidins
    "LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES",
    "RLFDKIRQVIRKF",
    "KWKLFKKIPKFLHLAKKF",
    # Insect-derived
    "GIGAVLKVLTTGLPALISWIKRKRQQ",
    "VDKGSYLPRPTPPRPIYNRN",
    # Synthetic / designed
    "KLAKLAKKLAKLAK",
    "LKLLKKLLKLLKKL",
    "RRWWRRWWRR",
    "KWKWKWKWKW",
    "GIKKFLGSIWKFIKAFVKEIMN",
    # Short peptides
    "RRWQWR",
    "RWRWRW",
    "FKRIVQRIKDFL",
    "KFLKKAKKFGK",
    "GIGKFLHSAK",
    "KWKLFKKI",
    "RLFDKIRQ",
    # Longer peptides
    "GLFDIIKKIAESFLPKV",
    "GIGKFLHSAKKFGKAFV",
    "KWKLFKKIPKFLHLAK",
]
print(f"{len(MY_PEPTIDES)} peptides, length range: {min(len(s) for s in MY_PEPTIDES)}–{max(len(s) for s in MY_PEPTIDES)} aa")

Build and run the pipeline¶
The dependency graph:
validate
├── predict_tm ─┐
└── predict_sol ─┴── filter_tm >= 40°C ── rank top 15 by solubility
The two prediction stages can run in parallel: each depends only on validate, and neither depends on the other.
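The batching implied by that graph can be sketched with Python's standard `graphlib` module (Python 3.9+). This is a generic illustration of dependency-based scheduling, not a description of how biolmai schedules stages internally. The stage names come from the diagram; the edges into `filter_tm` are an assumption based on how the diagram draws the join, and `execution_batches` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Stage -> set of stages it depends on, transcribed from the diagram.
# Assumption: filter_tm waits for both prediction stages, as the join suggests.
DEPENDENCIES = {
    "validate": set(),
    "predict_tm": {"validate"},
    "predict_sol": {"validate"},
    "filter_tm": {"predict_tm", "predict_sol"},
    "top15": {"filter_tm"},
}


def execution_batches(deps):
    """Group stages into batches whose members do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies finished,
        # so the stages in one batch could be dispatched concurrently.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(execution_batches(DEPENDENCIES), start=1):
    print(number, batch)
```

The second batch contains both `predict_sol` and `predict_tm`, which is the point at which the two predictions can run side by side.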
pipeline = DataPipeline(sequences=MY_PEPTIDES, verbose=True)
pipeline.add_filter(ValidAminoAcidFilter(), stage_name="validate")
pipeline.add_prediction(
    "temperature-regression", extractions="prediction",
    columns="melting_temperature", stage_name="predict_tm",
    depends_on=["validate"],
)
pipeline.add_prediction(
    "biolmsol", extractions="solubility_score",
    columns="solubility", stage_name="predict_sol",
    depends_on=["validate"],
)
pipeline.add_filter(ThresholdFilter("melting_temperature", min_value=40.0), stage_name="filter_tm")
pipeline.add_filter(RankingFilter("solubility", n=15, ascending=False), stage_name="top15")
pipeline.run()

Explore results¶
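The SQL query in this section uses conditional aggregation (`MAX(CASE WHEN ... END)`) to pivot one-row-per-prediction data into one row per sequence. The sketch below reproduces that pattern in an in-memory SQLite database so it can be tried without pipeline access. The two-table schema is only inferred from the column names in the query, the values are made up, and the pipeline's actual data store may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sequences (sequence_id INTEGER PRIMARY KEY, sequence TEXT);
    CREATE TABLE predictions (sequence_id INTEGER, prediction_type TEXT, value REAL);
""")

# Made-up values, for illustration only (not real model output).
toy = {
    "KLAKLAKKLAKLAK": {"melting_temperature": 52.0, "solubility": 0.81},
    "RRWQWR": {"melting_temperature": 38.5, "solubility": 0.93},
    "GLFDIIKKIAESF": {"melting_temperature": 45.2, "solubility": 0.40},
}
for seq_id, (seq, preds) in enumerate(toy.items(), start=1):
    conn.execute("INSERT INTO sequences VALUES (?, ?)", (seq_id, seq))
    conn.executemany(
        "INSERT INTO predictions VALUES (?, ?, ?)",
        [(seq_id, name, value) for name, value in preds.items()],
    )

# Long-to-wide pivot: one output row per sequence, one column per prediction type.
PIVOT = """
    SELECT s.sequence,
           MAX(CASE WHEN p.prediction_type = 'melting_temperature' THEN p.value END) AS tm,
           MAX(CASE WHEN p.prediction_type = 'solubility' THEN p.value END) AS solubility
    FROM sequences s
    JOIN predictions p ON s.sequence_id = p.sequence_id
    GROUP BY s.sequence
"""

# Same shape as the last two pipeline stages: keep tm >= 40, then rank by solubility.
rows = conn.execute(
    f"SELECT * FROM ({PIVOT}) WHERE tm >= ? ORDER BY solubility DESC LIMIT ?",
    (40.0, 2),
).fetchall()
for row in rows:
    print(row)
```

With these toy values, RRWQWR is dropped by the threshold even though it has the highest solubility, which mirrors the order of the filter and ranking stages.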
pipeline.summary()

pipeline.stats()

# Top 10 sequences by melting temperature
pipeline.query("""
    SELECT s.sequence,
           MAX(CASE WHEN p.prediction_type = 'melting_temperature' THEN p.value END) AS tm,
           MAX(CASE WHEN p.prediction_type = 'solubility' THEN p.value END) AS solubility
    FROM sequences s
    JOIN predictions p ON s.sequence_id = p.sequence_id
    GROUP BY s.sequence
    ORDER BY tm DESC
    LIMIT 10
""")

pipeline.plot("funnel")

Next Steps¶
Check out additional tutorials at jupyter.biolm.ai, or head over to our BioLM Documentation to explore additional models and functionality.
