Fold First

Use Boltz-2 confidence scores as a coarse gate before downstream property predictions.




⚠️ Preview Feature — The biolmai.pipeline module used in this guide is currently in preview and not yet publicly released. Access is available to early users on request. Contact us to get access.

What you'll learn:

  • Using Boltz-2 in single-sequence (no-MSA) mode as a fast fold quality filter
  • Interpreting pLDDT, pTM, and interface PAE
  • Building a three-gate pipeline: fold → stability → solubility
  • When to use and when to skip the fold gate

Requirements:

pip install "biolmai[pipeline]" matplotlib
export BIOLMAI_TOKEN=your-token-here

Note: This notebook requires Boltz-2 access via the BioLM API. Confirm availability at biolm.ai/models.

Setup

import os

import matplotlib.pyplot as plt
from biolmai.pipeline import (
    DataPipeline, DuckDBDataStore,
    ThresholdFilter, RankingFilter,
    ValidAminoAcidFilter, EmbeddingSpec,
    DiversitySamplingFilter,
)

# Fail early if the API token is missing, before any pipeline stage runs.
TOKEN = os.environ.get("BIOLMAI_TOKEN", "")
if not TOKEN:
    raise EnvironmentError(
        "Set BIOLMAI_TOKEN before running.\n"
        "Get one at https://biolm.ai/ui/accounts/user-api-tokens/"
    )

Designed library

A mixed library of designed sequences — some will fold well, some won't.

DESIGNED_LIBRARY = [
    # Well-folded scaffolds (expected high pLDDT)
    "MKTAYIAKQRQISFVKSHFSRQLEERVKILEQELEKAKEELKERLEELEKAKEEL",
    "GSHMDELYKAALEKAKQELKEAKQELKEAKQELKEAKQELKEAKQELKEAKQEL",
    "MHHHHHHSSGENLYFQGAEAAAKEAAAKEAAAKEAAAKEAAAKEAAAKEAAAK",
    "EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARI",
    "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIY",
    # Likely disordered / low confidence
    "GSGSGSGSGSGSGSGSGSGSGSGSGSGSGSGSGSGSGSGSGSGSGSGSGS",
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
    "GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG",
    "KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK",
    "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE",
    # Mixed
    "GIGKFLHSAKKFGKAFVGEIMNS",
    "KWKLFKKIPKFLHLAKKF",
    "LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES",
]
print(f"{len(DESIGNED_LIBRARY)} sequences in designed library")
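Before spending API calls on a library, it can help to sanity-check it locally. The following is a minimal sketch in plain Python; it does not use or reproduce the biolmai filters (the behavior of ValidAminoAcidFilter is not shown in this guide), and the helper name check_library is hypothetical.

```python
from collections import Counter

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 canonical residues


def check_library(sequences):
    """Return (invalid, duplicates) for a list of protein sequences.

    invalid:    dict mapping each offending sequence to its sorted
                list of non-standard characters
    duplicates: list of sequences that appear more than once
    """
    invalid = {}
    for seq in sequences:
        bad = sorted(set(seq.upper()) - STANDARD_AA)
        if bad:
            invalid[seq] = bad
    counts = Counter(sequences)
    duplicates = [seq for seq, n in counts.items() if n > 1]
    return invalid, duplicates


# Toy input for illustration only (not the tutorial library).
example = ["MKTAYIAKQR", "MKTAYIAKQR", "MKXAYZ"]
invalid, duplicates = check_library(example)
print("invalid:", invalid)
print("duplicates:", duplicates)
```

Calling check_library(DESIGNED_LIBRARY) before building the pipeline would flag malformed or repeated entries without touching the API.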

The fold-first pipeline

Three gates in sequence:

  1. Boltz-2 (no-MSA): fast coarse fold quality filter
  2. Melting temperature: thermal stability (only runs on sequences that pass the fold gate)
  3. Solubility ranking: final selection (only runs on sequences that pass both gates)

pipeline = DataPipeline(sequences=DESIGNED_LIBRARY, verbose=True)

# Gate 1: coarse fold quality (fast, no MSA)
pipeline.add_prediction(
    "boltz2", action="predict",
    params={"use_msa": False},
    extractions=["mean_plddt", "ptm"],
    columns=["plddt", "ptm"],
    stage_name="fold_coarse",
)
pipeline.add_filter(
    ThresholdFilter("plddt", min_value=65.0),
    stage_name="fold_gate",
)

# Gate 2: stability (only for fold-passing sequences)
pipeline.add_prediction(
    "temperature-regression", extractions="prediction",
    columns="melting_temperature", stage_name="tm",
    depends_on=["fold_gate"],
)
pipeline.add_filter(
    ThresholdFilter("melting_temperature", min_value=50.0),
    stage_name="tm_gate",
)

# Gate 3: solubility ranking
pipeline.add_prediction(
    "biolmsol", extractions="solubility_score",
    columns="solubility", stage_name="sol",
    depends_on=["tm_gate"],
)
pipeline.add_filter(
    RankingFilter("solubility", n=5, ascending=False),
    stage_name="top_5",
)

pipeline.run()
pipeline.summary()
pipeline.plot("funnel")
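To make the funnel logic concrete, here is a plain-Python sketch of the same three gates applied to rows with precomputed scores. It is an illustration of the gating idea, not the biolmai implementation: the function names are hypothetical, the scores are toy values rather than model output, and the thresholds are assumed to be inclusive (>=). In the real pipeline, later predictions run only for sequences that survive the earlier gates, whereas this sketch assumes every score is already available.

```python
def threshold_gate(rows, key, min_value):
    """Keep rows whose score for `key` is at least `min_value`."""
    return [r for r in rows if r[key] >= min_value]


def top_n(rows, key, n):
    """Keep the `n` rows with the highest score for `key`."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


def run_funnel(rows, plddt_min=65.0, tm_min=50.0, n=5):
    """Apply fold gate, stability gate, then solubility ranking.

    Returns the surviving rows and the row count after each stage.
    """
    counts = {"input": len(rows)}
    rows = threshold_gate(rows, "plddt", plddt_min)
    counts["fold_gate"] = len(rows)
    rows = threshold_gate(rows, "melting_temperature", tm_min)
    counts["tm_gate"] = len(rows)
    rows = top_n(rows, "solubility", n)
    counts["top_n"] = len(rows)
    return rows, counts


# Toy scores for illustration only; these are not real predictions.
toy = [
    {"id": "a", "plddt": 82.0, "melting_temperature": 61.0, "solubility": 0.7},
    {"id": "b", "plddt": 40.0, "melting_temperature": 70.0, "solubility": 0.9},
    {"id": "c", "plddt": 71.0, "melting_temperature": 45.0, "solubility": 0.8},
    {"id": "d", "plddt": 90.0, "melting_temperature": 55.0, "solubility": 0.4},
]
survivors, counts = run_funnel(toy, n=1)
print([r["id"] for r in survivors], counts)
```

In the toy data, sequence "b" has the best solubility score but never reaches the ranking step because it fails the fold gate, which is exactly the behavior the gate ordering is meant to enforce.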

Inspect fold quality scores

fold_df = pipeline.query("""
    SELECT s.sequence,
           MAX(CASE WHEN p.prediction_type = 'plddt' THEN ROUND(p.value,1) END) AS plddt,
           MAX(CASE WHEN p.prediction_type = 'ptm'   THEN ROUND(p.value,3) END) AS ptm
    FROM sequences s
    JOIN predictions p ON s.sequence_id = p.sequence_id
    WHERE p.prediction_type IN ('plddt', 'ptm')
    GROUP BY s.sequence
    ORDER BY plddt DESC
""")
fold_df
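When reading the table, it can be useful to bucket pLDDT into confidence bands. The sketch below borrows the bands popularized by the AlphaFold Protein Structure Database (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low); applying them to Boltz-2 output is an assumption, not something this guide specifies. The helper names are hypothetical, and because some tools report pLDDT on a 0-1 scale, the code rescales values at or below 1.0, which would misread a genuine score of 1.0 or lower on a 0-100 scale.

```python
def to_percent(plddt):
    """Rescale a 0-1 pLDDT to 0-100.

    Assumption: values above 1.0 are already on a 0-100 scale.
    """
    return plddt * 100.0 if plddt <= 1.0 else plddt


def plddt_band(plddt):
    """Map a pLDDT score to an AlphaFold-style confidence band."""
    score = to_percent(plddt)
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


for value in (92.3, 70.0, 0.65, 45.0):
    print(value, "->", plddt_band(value))
```

With a pandas DataFrame such as fold_df, the same function could be applied as fold_df["plddt"].map(plddt_band). Note that the fold gate threshold of 65 used above admits sequences from the upper part of the "low" band. pTM is reported on a 0-1 scale, and values above roughly 0.5 are commonly read as suggesting a correct overall fold, though that cutoff is a rule of thumb.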

When to use (and skip) the fold gate

Always use it for:

  • Designed enzyme variants — if it doesn't fold, it doesn't catalyze
  • Antibody and nanobody libraries — misfolded binders don't bind
  • Scaffold-based design — low pLDDT means the design failed at the first hurdle

Skip it for:

  • Short peptides (<30 residues) — structure predictors aren't reliable here
  • Intrinsically disordered proteins — low pLDDT is the correct prediction
  • Peptide binders or disordered linkers — disorder may be the feature
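One way to act on the length guideline is to split the library before building the pipeline, so short peptides bypass the fold gate instead of being discarded by it. This is a minimal sketch: split_by_length is a hypothetical helper, the 30-residue cutoff comes from the list above, and the example sequences are the three "Mixed" entries from the designed library. Disorder cannot be detected from length alone, so intrinsically disordered designs still need to be routed by hand.

```python
MIN_FOLD_GATE_LENGTH = 30  # cutoff taken from the guideline above


def split_by_length(sequences, min_length=MIN_FOLD_GATE_LENGTH):
    """Split sequences into (gated, ungated) by residue count.

    gated:   long enough for the fold gate to be informative
    ungated: short peptides that should skip the fold gate
    """
    gated = [s for s in sequences if len(s) >= min_length]
    ungated = [s for s in sequences if len(s) < min_length]
    return gated, ungated


peptides = [
    "GIGKFLHSAKKFGKAFVGEIMNS",                # 23 residues
    "KWKLFKKIPKFLHLAKKF",                     # 18 residues
    "LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES",  # 37 residues
]
gated, ungated = split_by_length(peptides)
print(f"{len(gated)} gated, {len(ungated)} ungated")
```

The gated list could then feed the three-gate pipeline, while the ungated list goes to a pipeline that starts at the stability or solubility stage.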

Next Steps

Check out additional tutorials at jupyter.biolm.ai, or head over to our BioLM Documentation to explore additional models and functionality.

See more use-cases and APIs on your BioLM Console Catalog.


