Mutation Scanning

Score all single-point variants of a target sequence and visualize the mutational landscape.

⚠️ Preview Feature — The biolmai.pipeline module used in this guide is currently in preview and not yet publicly released. Access is available to early users on request. Contact us to get access.

What you'll learn:

Generating a complete single-point mutant library programmatically
Running multi-model predictions with parallel stages
Visualizing a ΔTm mutational landscape heatmap
Multi-model consensus ranking

Requirements:

pip install "biolmai[pipeline]" matplotlib numpy pandas
export BIOLMAI_TOKEN=your-token-here

Setup

import os

import matplotlib.pyplot as plt
import numpy as np
from biolmai.pipeline import DataPipeline, RankingFilter, ThresholdFilter

# Fail early with a helpful message if the API token is missing
TOKEN = os.environ.get("BIOLMAI_TOKEN", "")
if not TOKEN:
    raise EnvironmentError(
        "Set BIOLMAI_TOKEN before running.\n"
        "Get one at https://biolm.ai/ui/accounts/user-api-tokens/"
    )

Generate the single-point mutant library

WILD_TYPE = "MKTAYIAKQRQISFVKSHFSRQLEER"
# Target region as a 0-based, end-exclusive slice: residues 6-22 in 1-based numbering (17 positions)
CDR3_START, CDR3_END = 5, 22
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

variants = [WILD_TYPE]  # include wild type as reference
for pos in range(CDR3_START, CDR3_END):
    for aa in AMINO_ACIDS:
        if aa != WILD_TYPE[pos]:
            mutant = WILD_TYPE[:pos] + aa + WILD_TYPE[pos+1:]
            variants.append(mutant)

print(f"{len(variants)} sequences ({len(variants)-1} single-point variants + wild type)")
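The library has 17 positions × 19 alternative residues + 1 wild type = 324 sequences. Raw sequences are hard to scan in result tables, so a small helper that turns each variant back into conventional mutation notation (wild-type residue, 1-based position, new residue, e.g. `I6W`) can make later outputs easier to read. This is a minimal sketch; `mutation_label` is our own hypothetical helper, not part of `biolmai`.

```python
def mutation_label(wild_type: str, variant: str) -> str:
    """Return e.g. 'I6W' for a single-point variant, or 'WT' if identical.

    Positions are reported 1-based, matching the heatmap tick labels.
    (Hypothetical helper for illustration; not a biolmai function.)
    """
    if len(wild_type) != len(variant):
        raise ValueError("variant length differs from wild type")
    diffs = [i for i, (a, b) in enumerate(zip(wild_type, variant)) if a != b]
    if not diffs:
        return "WT"
    if len(diffs) > 1:
        raise ValueError(f"expected a single-point variant, found {len(diffs)} substitutions")
    i = diffs[0]
    return f"{wild_type[i]}{i + 1}{variant[i]}"


WILD_TYPE = "MKTAYIAKQRQISFVKSHFSRQLEER"
CDR3_START, CDR3_END = 5, 22
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Expected library size: 17 positions x 19 substitutions, plus the wild type
expected = (CDR3_END - CDR3_START) * (len(AMINO_ACIDS) - 1) + 1

variants = [WILD_TYPE] + [
    WILD_TYPE[:pos] + aa + WILD_TYPE[pos + 1:]
    for pos in range(CDR3_START, CDR3_END)
    for aa in AMINO_ACIDS
    if aa != WILD_TYPE[pos]
]
assert len(variants) == expected == 324

labels = [mutation_label(WILD_TYPE, v) for v in variants]
print(labels[:4])  # ['WT', 'I6A', 'I6C', 'I6D']
```

Once `df` exists in the ΔTm step, a column such as `df["mutation"] = df["sequence"].map(lambda s: mutation_label(WILD_TYPE, s))` would label each row.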

Score with multiple models

The two prediction stages run in parallel over all 324 sequences (324 × 2 = 648 predictions), and every result is cached in DuckDB.

pipeline = DataPipeline(sequences=variants, verbose=True)

pipeline.add_prediction(
    "temperature-regression", extractions="prediction",
    columns="melting_temperature", stage_name="tm",
)
pipeline.add_prediction(
    "biolmsol", extractions="solubility_score",
    columns="solubility", stage_name="sol",
)

# Keep variants that pass an absolute 55 °C Tm floor (not a wild-type-relative cutoff),
# then keep the 20 most soluble of those
pipeline.add_filter(ThresholdFilter("melting_temperature", min_value=55.0))
pipeline.add_filter(RankingFilter("solubility", n=20, ascending=False))

pipeline.run()
pipeline.summary()
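To make the filter chain concrete, here is roughly what the two `add_filter` calls do, written as plain pandas on a toy table. This only illustrates the intended semantics under our assumptions that `min_value` is an inclusive floor and that `ascending=False` keeps the highest scores; the pipeline's own implementation may differ in details such as tie handling or missing predictions. `threshold_then_rank` is a hypothetical helper and the scores are made up.

```python
import pandas as pd

# Hypothetical scores for five variants (illustrative values only)
scores = pd.DataFrame({
    "sequence": ["v1", "v2", "v3", "v4", "v5"],
    "melting_temperature": [61.2, 54.9, 58.0, 55.0, 49.3],
    "solubility": [0.42, 0.90, 0.77, 0.55, 0.95],
})


def threshold_then_rank(df, tm_floor=55.0, n=20):
    """Sketch of ThresholdFilter(min_value=tm_floor) then RankingFilter(n=n, ascending=False)."""
    passed = df[df["melting_temperature"] >= tm_floor]  # assumed inclusive floor
    return passed.nlargest(n, "solubility")              # most soluble first


top = threshold_then_rank(scores, tm_floor=55.0, n=2)
print(top["sequence"].tolist())  # ['v3', 'v4']
```

Filter order matters: `v2` and `v5` have the highest solubility but are removed by the Tm floor before ranking.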

Compute ΔTm relative to wild type

df = pipeline.query("""
    SELECT s.sequence,
           MAX(CASE WHEN p.prediction_type = 'melting_temperature' THEN p.value END) AS tm,
           MAX(CASE WHEN p.prediction_type = 'solubility' THEN p.value END) AS solubility
    FROM sequences s
    JOIN predictions p ON s.sequence_id = p.sequence_id
    GROUP BY s.sequence
""")

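# The wild type is part of the library, so its prediction serves as the reference
wt_tm = df.loc[df["sequence"] == WILD_TYPE, "tm"].iloc[0]
df["delta_tm"] = df["tm"] - wt_tm
print(f"Wild-type Tm: {wt_tm:.1f}°C")
df.sort_values("delta_tm", ascending=False).head(10)  # 10 most stabilizing variants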
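The SQL query pivots a long table (one row per sequence and prediction type) into a wide one (one column per prediction type). If the predictions are already in a long-format DataFrame, the same reshape and the ΔTm calculation, ΔTm = Tm(variant) - Tm(wild type), can be sketched in pandas as follows. The column names mirror the query above; the sequence stand-ins (`WT`, `I6A`, `I6W`) and values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format predictions, one row per (sequence, prediction_type)
toy_long = pd.DataFrame({
    "sequence": ["WT", "WT", "I6A", "I6A", "I6W", "I6W"],
    "prediction_type": ["melting_temperature", "solubility"] * 3,
    "value": [60.0, 0.50, 57.5, 0.62, 63.0, 0.41],
})

# pandas equivalent of MAX(CASE WHEN ...) ... GROUP BY sequence
toy_wide = toy_long.pivot_table(
    index="sequence", columns="prediction_type", values="value", aggfunc="max"
).reset_index()
toy_wide = toy_wide.rename(columns={"melting_temperature": "tm"})
toy_wide.columns.name = None

# ΔTm relative to the wild-type row
toy_wt_tm = toy_wide.loc[toy_wide["sequence"] == "WT", "tm"].iloc[0]
toy_wide["delta_tm"] = toy_wide["tm"] - toy_wt_tm
print(toy_wide.sort_values("delta_tm", ascending=False)[["sequence", "delta_tm"]])
```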

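Mutational landscape heatmap

Each cell shows the predicted ΔTm for replacing the column's wild-type residue with the row's amino acid. Red marks stabilizing substitutions and blue destabilizing ones; the color scale is clipped at ±10 °C, and blank cells are the wild-type residue at each position.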

positions = list(range(CDR3_START, CDR3_END))
aa_list = sorted(AMINO_ACIDS)
heatmap = np.full((len(aa_list), len(positions)), np.nan)

for i, aa in enumerate(aa_list):
    for j, pos in enumerate(positions):
        if aa == WILD_TYPE[pos]:
            continue
        mutant = WILD_TYPE[:pos] + aa + WILD_TYPE[pos+1:]
        row = df[df["sequence"] == mutant]
        if len(row) > 0:
            heatmap[i, j] = row["delta_tm"].iloc[0]

fig, ax = plt.subplots(figsize=(14, 6))
im = ax.imshow(heatmap, cmap="RdBu_r", aspect="auto", vmin=-10, vmax=10)
ax.set_xticks(range(len(positions)))
ax.set_xticklabels([WILD_TYPE[p] + str(p+1) for p in positions], rotation=90)
ax.set_yticks(range(len(aa_list)))
ax.set_yticklabels(aa_list)
plt.colorbar(im, ax=ax, label="ΔTm (°C) vs. wild type")
ax.set_title("Saturation mutagenesis: predicted ΔTm landscape")
ax.set_xlabel("Position (wild-type residue + number)")
ax.set_ylabel("Substitution amino acid")
plt.tight_layout()
plt.show()
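The nested loop above filters the whole DataFrame once per cell. For larger libraries, a dictionary keyed by sequence gives constant-time lookups instead. The sketch below uses a hypothetical helper, `landscape_matrix`, that builds the same layout: rows are substitution residues in alphabetical order, columns are positions, and wild-type cells (or variants missing from the lookup) stay NaN.

```python
import numpy as np


def landscape_matrix(wild_type, start, end, delta_by_seq, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Return (aa_list, positions, matrix) with matrix[i, j] = ΔTm of aa_list[i] at positions[j].

    Hypothetical helper for illustration; not a biolmai function.
    """
    aa_list = sorted(alphabet)
    positions = list(range(start, end))
    matrix = np.full((len(aa_list), len(positions)), np.nan)
    for j, pos in enumerate(positions):
        for i, aa in enumerate(aa_list):
            if aa == wild_type[pos]:
                continue  # wild-type cell: no substitution to score
            mutant = wild_type[:pos] + aa + wild_type[pos + 1:]
            matrix[i, j] = delta_by_seq.get(mutant, np.nan)
    return aa_list, positions, matrix


# Toy example with made-up ΔTm values for a 3-residue sequence
toy_wt = "ACD"
toy_deltas = {"GCD": 1.5, "AGD": -2.0}
toy_aas, toy_positions, toy_matrix = landscape_matrix(toy_wt, 0, 2, toy_deltas)
print(toy_matrix.shape)  # (20, 2)
```

On the real data, the lookup would be `dict(zip(df["sequence"], df["delta_tm"]))`, with `WILD_TYPE`, `CDR3_START` and `CDR3_END` as the other arguments.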

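Multi-model consensus

Convert each model's scores to z-scores so they share a common scale, then rank variants by the average of the two. A variant that scores well on two independent predictors is a stronger candidate than one favored by a single model. The normalization uses only the variants that survived the filters above, so the consensus ranks candidates within that shortlist.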
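Written out, the consensus score for variant $i$ is the mean of its two per-model z-scores, where $\mu$ and $\sigma$ are the mean and standard deviation over the filtered variants (pandas `Series.std()` uses the sample standard deviation, with `ddof=1`):

```latex
z^{\mathrm{Tm}}_i = \frac{T_{m,i} - \mu_{T_m}}{\sigma_{T_m}}, \qquad
z^{\mathrm{sol}}_i = \frac{S_i - \mu_S}{\sigma_S}, \qquad
\mathrm{consensus}_i = \tfrac{1}{2}\left(z^{\mathrm{Tm}}_i + z^{\mathrm{sol}}_i\right)
```

Here $T_{m,i}$ is the predicted melting temperature and $S_i$ the predicted solubility score of variant $i$; both models receive equal weight.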

df_filt = pipeline.get_final_data()  # variants that passed both filters


def zscore(s):
    """Standardize a score column to mean 0 and standard deviation 1."""
    return (s - s.mean()) / s.std()


df_filt["tm_norm"] = zscore(df_filt["melting_temperature"])
df_filt["sol_norm"] = zscore(df_filt["solubility"])
df_filt["consensus"] = (df_filt["tm_norm"] + df_filt["sol_norm"]) / 2

print("Top 10 consensus candidates:")
df_filt.nlargest(10, "consensus")[["sequence", "melting_temperature", "solubility", "consensus"]]
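With only 20 surviving variants, a single extreme prediction can pull a mean and standard deviation noticeably. A common alternative, and not what the code above does, is to average each variant's rank across models. A minimal pandas sketch on made-up data, assuming higher is better for both scores:

```python
import pandas as pd

# Hypothetical filtered results (illustrative values only)
toy = pd.DataFrame({
    "sequence": ["v1", "v2", "v3", "v4"],
    "melting_temperature": [58.0, 70.0, 62.0, 55.0],
    "solubility": [0.80, 0.10, 0.85, 0.30],
})

# Rank 1 = best on each model (higher is better for both scores)
tm_rank = toy["melting_temperature"].rank(ascending=False)
sol_rank = toy["solubility"].rank(ascending=False)
toy["mean_rank"] = (tm_rank + sol_rank) / 2

# Lower mean rank = stronger consensus candidate
print(toy.nsmallest(len(toy), "mean_rank")[["sequence", "mean_rank"]])
```

On the real data, the same three lines would apply to `df_filt`. Ranks ignore how far apart scores are, which trades sensitivity to large differences for robustness to outliers.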

Next Steps

Check out additional tutorials at jupyter.biolm.ai, or head over to our BioLM Documentation to explore additional models and functionality.
