Mutation Scanning

Score all single-point variants of a target sequence and visualize the mutational landscape.




⚠️ Preview Feature — The biolmai.pipeline module used in this guide is currently in preview and not yet publicly released. Access is available to early users on request. Contact us to get access.

What you'll learn:

  • Generating a complete single-point mutant library programmatically
  • Running multi-model predictions with parallel stages
  • Visualizing a ΔTm mutational landscape heatmap
  • Multi-model consensus ranking

Requirements:

pip install "biolmai[pipeline]" matplotlib numpy pandas
export BIOLMAI_TOKEN=your-token-here

Setup

import os

import matplotlib.pyplot as plt
import numpy as np
from biolmai.pipeline import (
    DataPipeline, DuckDBDataStore,
    ThresholdFilter, RankingFilter,
    ValidAminoAcidFilter, EmbeddingSpec,
    DiversitySamplingFilter,
)

TOKEN = os.environ.get("BIOLMAI_TOKEN", "")
if not TOKEN:
    raise EnvironmentError(
        "Set BIOLMAI_TOKEN before running.\n"
        "Get one at https://biolm.ai/ui/accounts/user-api-tokens/"
    )

Generate the single-point mutant library

WILD_TYPE = "MKTAYIAKQRQISFVKSHFSRQLEER"
CDR3_START, CDR3_END = 5, 22   # 0-based, end-exclusive: 17-residue target region (residues 6-22, 1-based)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

variants = [WILD_TYPE]  # include wild type as reference
for pos in range(CDR3_START, CDR3_END):
    for aa in AMINO_ACIDS:
        if aa != WILD_TYPE[pos]:
            mutant = WILD_TYPE[:pos] + aa + WILD_TYPE[pos+1:]
            variants.append(mutant)

print(f"{len(variants)} sequences ({len(variants)-1} single-point variants + wild type)")
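As a sanity check on the library size, the same loop can be wrapped in a small helper and compared against the closed-form count of (number of positions) × 19. This is a minimal sketch: the helper name `single_point_variants` is illustrative and not part of biolmai, and the count formula assumes every wild-type residue in the range is one of the 20 standard amino acids.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_point_variants(wild_type, start, end, alphabet=AMINO_ACIDS):
    """Return all single substitutions of wild_type at 0-based positions [start, end)."""
    if not 0 <= start < end <= len(wild_type):
        raise ValueError("start/end must satisfy 0 <= start < end <= len(wild_type)")
    variants = []
    for pos in range(start, end):
        for aa in alphabet:
            if aa != wild_type[pos]:  # skip the wild-type residue itself
                variants.append(wild_type[:pos] + aa + wild_type[pos + 1:])
    return variants

wild_type = "MKTAYIAKQRQISFVKSHFSRQLEER"
library = single_point_variants(wild_type, 5, 22)

# 17 positions x 19 substitutions each
expected = (22 - 5) * (len(AMINO_ACIDS) - 1)
print(len(library), expected)
```

Adding the wild type as a reference gives the 324 sequences scored in the next step.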

Score with multiple models

The two prediction stages run in parallel. That is 324 sequences (323 variants plus wild type) × 2 models, with all results cached in DuckDB.

from biolmai.pipeline import DataPipeline, ThresholdFilter, RankingFilter

pipeline = DataPipeline(sequences=variants, verbose=True)

pipeline.add_prediction(
    "temperature-regression", extractions="prediction",
    columns="melting_temperature", stage_name="tm",
)
pipeline.add_prediction(
    "biolmsol", extractions="solubility_score",
    columns="solubility", stage_name="sol",
)

# Keep variants whose predicted Tm clears a fixed 55.0 °C floor (not derived from
# the wild type), then keep the top 20 of those by solubility
pipeline.add_filter(ThresholdFilter("melting_temperature", min_value=55.0))
pipeline.add_filter(RankingFilter("solubility", n=20, ascending=False))

pipeline.run()
pipeline.summary()
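To make the filter semantics concrete, here is a plain-pandas sketch of a minimum-threshold filter followed by a top-n ranking on made-up scores. It illustrates the intent of the two filter stages only. It is not how ThresholdFilter or RankingFilter are implemented, and treating the threshold as inclusive is an assumption.

```python
import pandas as pd

# Made-up scores for five hypothetical variants
scores = pd.DataFrame({
    "sequence": ["s1", "s2", "s3", "s4", "s5"],
    "melting_temperature": [52.0, 58.5, 61.0, 55.0, 70.2],
    "solubility": [0.9, 0.4, 0.8, 0.6, 0.7],
})

# Step 1: threshold on Tm (inclusive bound assumed for this sketch)
kept = scores[scores["melting_temperature"] >= 55.0]

# Step 2: rank the survivors by solubility, highest first, and keep the top 2
top = kept.nlargest(2, "solubility")
print(top)
```

Note that s1 has the best solubility but never reaches the ranking step, because the threshold is applied first. The order of filters matters.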

Compute ΔTm relative to wild type

df = pipeline.query("""
    SELECT s.sequence,
           MAX(CASE WHEN p.prediction_type = 'melting_temperature' THEN p.value END) AS tm,
           MAX(CASE WHEN p.prediction_type = 'solubility' THEN p.value END) AS solubility
    FROM sequences s
    JOIN predictions p ON s.sequence_id = p.sequence_id
    GROUP BY s.sequence
""")

wt_tm = df[df["sequence"] == WILD_TYPE]["tm"].iloc[0]
df["delta_tm"] = df["tm"] - wt_tm
print(f"Wild-type Tm: {wt_tm:.1f}°C")
df.sort_values("delta_tm", ascending=False).head(10)
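The table above identifies variants only by full sequence, which is hard to read. One option is to derive a conventional mutation label (wild-type residue, 1-based position, new residue) by comparing each sequence to the wild type. The helper below is a sketch. `mutation_label` is a hypothetical name, not a biolmai function.

```python
def mutation_label(wild_type, variant):
    """Return e.g. 'I6A' for a single substitution, or 'WT' for the wild type."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have equal length")
    # Collect every position where the two sequences differ
    diffs = [(i, w, v) for i, (w, v) in enumerate(zip(wild_type, variant)) if w != v]
    if not diffs:
        return "WT"
    if len(diffs) > 1:
        raise ValueError("expected at most one substitution")
    i, w, v = diffs[0]
    return f"{w}{i + 1}{v}"  # 1-based position, matching the heatmap axis labels

wild_type = "MKTAYIAKQRQISFVKSHFSRQLEER"
print(mutation_label(wild_type, "MKTAYAAKQRQISFVKSHFSRQLEER"))
```

With the DataFrame from the query, this could be applied as `df["mutation"] = df["sequence"].map(lambda s: mutation_label(WILD_TYPE, s))`.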

Mutational landscape heatmap

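Each cell shows the predicted ΔTm for substituting the column's wild-type residue with the row's amino acid. Red = stabilizing (positive ΔTm), blue = destabilizing (negative ΔTm). Cells where the row's amino acid is the wild-type residue are left blank (NaN), as are variants with no prediction.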

positions = list(range(CDR3_START, CDR3_END))
aa_list = sorted(AMINO_ACIDS)
heatmap = np.full((len(aa_list), len(positions)), np.nan)

for i, aa in enumerate(aa_list):
    for j, pos in enumerate(positions):
        if aa == WILD_TYPE[pos]:
            continue
        mutant = WILD_TYPE[:pos] + aa + WILD_TYPE[pos+1:]
        row = df[df["sequence"] == mutant]
        if len(row) > 0:
            heatmap[i, j] = row["delta_tm"].iloc[0]

fig, ax = plt.subplots(figsize=(14, 6))
im = ax.imshow(heatmap, cmap="RdBu_r", aspect="auto", vmin=-10, vmax=10)
ax.set_xticks(range(len(positions)))
ax.set_xticklabels([WILD_TYPE[p] + str(p+1) for p in positions], rotation=90)
ax.set_yticks(range(len(aa_list)))
ax.set_yticklabels(aa_list)
plt.colorbar(im, ax=ax, label="ΔTm (°C) vs. wild type")
ax.set_title("Saturation mutagenesis: predicted ΔTm landscape")
ax.set_xlabel("Position (wild-type residue + number)")
ax.set_ylabel("Substitution amino acid")
plt.tight_layout()
plt.show()
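The same matrix can be read numerically, for example to list the most stabilizing substitution at each position. The sketch below uses a made-up 3 × 3 matrix in the same layout as `heatmap` (rows are substitutions, columns are positions). `np.nanargmax` ignores the NaN cells but raises `ValueError` if an entire column is NaN, so columns with no predictions would need to be dropped first.

```python
import numpy as np

# Toy data in the same layout as the heatmap: rows = amino acids, columns = positions
aa_list = ["A", "G", "V"]
position_labels = ["I6", "A7", "K8"]
heatmap = np.array([
    [1.5, np.nan, -0.5],   # A (NaN where A is the wild-type residue)
    [-2.0, 0.3, 2.1],      # G
    [0.4, -1.0, 0.0],      # V
])

# Row index of the largest non-NaN delta-Tm in each column
best_rows = np.nanargmax(heatmap, axis=0)
best = [
    (position_labels[j], aa_list[i], float(heatmap[i, j]))
    for j, i in enumerate(best_rows)
]
for pos, aa, dtm in best:
    print(f"{pos} -> {aa}: {dtm:+.1f}")
```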

Multi-model consensus

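Convert each model's scores to z-scores so they share a common scale, then rank variants by the mean of the two. Mutations that score well on independent predictors are more likely to hold up in experimental validation. The means and standard deviations below are computed over the rows returned by get_final_data(), not over the full library.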

import pandas as pd

df_filt = pipeline.get_final_data()
df_filt["tm_norm"] = (df_filt["melting_temperature"] - df_filt["melting_temperature"].mean()) / df_filt["melting_temperature"].std()
df_filt["sol_norm"] = (df_filt["solubility"] - df_filt["solubility"].mean()) / df_filt["solubility"].std()
df_filt["consensus"] = (df_filt["tm_norm"] + df_filt["sol_norm"]) / 2

print("Top 10 consensus candidates:")
df_filt.nlargest(10, "consensus")[["sequence", "melting_temperature", "solubility", "consensus"]]
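A worked example on three made-up variants shows how the z-score averaging behaves. It is a sketch with one addition that the code above does not have: a guard for a column with zero variance, where dividing by the standard deviation would produce NaN or infinity. The `zscore` helper is illustrative. pandas' `Series.std()` uses the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

def zscore(col):
    """Standardize a column; return zeros if it has no spread."""
    std = col.std()  # sample standard deviation (ddof=1)
    if pd.isna(std) or std == 0:
        return col * 0.0
    return (col - col.mean()) / std

toy = pd.DataFrame({
    "sequence": ["v1", "v2", "v3"],
    "melting_temperature": [60.0, 62.0, 64.0],
    "solubility": [0.7, 0.5, 0.6],
})
toy["consensus"] = (zscore(toy["melting_temperature"]) + zscore(toy["solubility"])) / 2

ranked = toy.sort_values("consensus", ascending=False)
print(ranked[["sequence", "consensus"]])
```

Here v3 ranks first: it has the highest Tm and middling solubility, while v1 trades the best solubility against the lowest Tm. Equal weighting of the two models is a design choice and can be changed by replacing the simple mean with a weighted one.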

Next Steps

Check out additional tutorials at jupyter.biolm.ai, or head over to our BioLM Documentation to explore additional models and functionality.

See more use-cases and APIs on your BioLM Console Catalog.


© 2022 - 2026 BioLM. All Rights Reserved.