Mutation Scanning¶
Score all single-point variants of a target sequence and visualize the mutational landscape.
⚠️ Preview Feature — The
biolmai.pipeline module used in this guide is currently in preview and not yet publicly released. Early users can request access; contact us to get it.
What you'll learn:
- Generating a complete single-point mutant library programmatically
- Running multi-model predictions with parallel stages
- Visualizing a ΔTm mutational landscape heatmap
- Multi-model consensus ranking
Requirements:
pip install biolmai[pipeline] matplotlib numpy
export BIOLMAI_TOKEN=your-token-here
Setup¶
import os
from biolmai.pipeline import (
    DataPipeline, DuckDBDataStore,
    ThresholdFilter, RankingFilter,
    ValidAminoAcidFilter, EmbeddingSpec,
    DiversitySamplingFilter,
)
TOKEN = os.environ.get("BIOLMAI_TOKEN", "")
if not TOKEN:
    raise EnvironmentError(
        "Set BIOLMAI_TOKEN before running.\n"
        "Get one at https://biolm.ai/ui/accounts/user-api-tokens/"
    )
import numpy as np
import matplotlib.pyplot as plt
Generate the single-point mutant library¶
WILD_TYPE = "MKTAYIAKQRQISFVKSHFSRQLEER"
CDR3_START, CDR3_END = 5, 22  # 17-residue target region
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
variants = [WILD_TYPE]  # include wild type as reference
for pos in range(CDR3_START, CDR3_END):
    for aa in AMINO_ACIDS:
        if aa != WILD_TYPE[pos]:
            mutant = WILD_TYPE[:pos] + aa + WILD_TYPE[pos+1:]
            variants.append(mutant)
print(f"{len(variants)} sequences ({len(variants)-1} single-point variants + wild type)")
Score with multiple models¶
Both prediction stages run in parallel. 324 sequences × 2 models, all cached in DuckDB.
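The figure of 324 follows from the library built above: 17 positions × 19 substitutions per position, plus the wild type. A minimal sketch that checks this arithmetic and also attaches a conventional mutation label (wild-type residue, 1-based position, new residue) to each variant. The helper name single_point_variants is an illustration, not part of biolmai.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_variants(wild_type, start, end, alphabet=AMINO_ACIDS):
    """Return {label: sequence} for every single substitution in [start, end).

    Hypothetical helper for illustration; positions are 0-based and end-exclusive,
    labels use 1-based numbering (e.g. "I6A").
    """
    variants = {}
    for pos in range(start, end):
        wt_aa = wild_type[pos]
        for aa in alphabet:
            if aa == wt_aa:
                continue  # skip the identity "mutation"
            label = f"{wt_aa}{pos + 1}{aa}"
            variants[label] = wild_type[:pos] + aa + wild_type[pos + 1:]
    return variants


wild_type = "MKTAYIAKQRQISFVKSHFSRQLEER"
library = single_point_variants(wild_type, 5, 22)

n_positions = 22 - 5                               # 17 scanned positions
expected = n_positions * (len(AMINO_ACIDS) - 1)    # 19 substitutions each
print(len(library), expected, len(library) + 1)    # 323 323 324
```

Keeping the labels alongside the sequences makes later tables easier to read than raw 26-residue strings.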
from biolmai.pipeline import DataPipeline, ThresholdFilter, RankingFilter
pipeline = DataPipeline(sequences=variants, verbose=True)
pipeline.add_prediction(
    "temperature-regression", extractions="prediction",
    columns="melting_temperature", stage_name="tm",
)
pipeline.add_prediction(
    "biolmsol", extractions="solubility_score",
    columns="solubility", stage_name="sol",
)
# Keep variants whose predicted Tm meets the 55.0 °C floor, then keep the top 20 by solubility
pipeline.add_filter(ThresholdFilter("melting_temperature", min_value=55.0))
pipeline.add_filter(RankingFilter("solubility", n=20, ascending=False))
pipeline.run()
pipeline.summary()
Compute ΔTm relative to wild type¶
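ΔTm is each variant's predicted Tm minus the wild-type Tm, so the calculation depends on the wild-type row being present in the results. A minimal sketch of that step on made-up numbers, with an explicit guard for a missing reference. The function name delta_tm and the toy values are assumptions for illustration; the same logic carries over to a DataFrame column.

```python
def delta_tm(tm_by_sequence, wild_type):
    """Return {sequence: Tm - wild-type Tm}; raise if the reference is absent."""
    if wild_type not in tm_by_sequence:
        raise KeyError("wild-type sequence has no Tm prediction; cannot compute ΔTm")
    wt_tm = tm_by_sequence[wild_type]
    return {seq: tm - wt_tm for seq, tm in tm_by_sequence.items()}


# Made-up Tm values in °C, for illustration only.
toy_tm = {"MKTAYI": 58.0, "MKTAYA": 60.5, "MKTAYG": 54.0}
deltas = delta_tm(toy_tm, "MKTAYI")
print(deltas["MKTAYA"], deltas["MKTAYG"])  # 2.5 -4.0
```

Failing loudly here is preferable to an opaque index error if the wild type is ever absent from the queried table.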
df = pipeline.query("""
    SELECT s.sequence,
           MAX(CASE WHEN p.prediction_type = 'melting_temperature' THEN p.value END) AS tm,
           MAX(CASE WHEN p.prediction_type = 'solubility' THEN p.value END) AS solubility
    FROM sequences s
    JOIN predictions p ON s.sequence_id = p.sequence_id
    GROUP BY s.sequence
""")
wt_tm = df[df["sequence"] == WILD_TYPE]["tm"].iloc[0]
df["delta_tm"] = df["tm"] - wt_tm
print(f"Wild-type Tm: {wt_tm:.1f}°C")
df.sort_values("delta_tm", ascending=False).head(10)
Mutational landscape heatmap¶
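Each cell shows the predicted ΔTm for substituting the column's wild-type residue with the row's amino acid. Red cells are stabilizing (ΔTm > 0) and blue cells are destabilizing (ΔTm < 0); the color scale is clipped at ±10 °C. Cells where the row's amino acid is the wild-type residue are left blank.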
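To make the grid layout concrete, here is a small sketch that fills the same rows-by-positions structure from a plain dictionary, using a three-residue toy sequence and invented ΔTm values. The names build_grid and delta_by_sequence are hypothetical; a dictionary lookup per cell is one possible alternative to filtering a DataFrame for every cell.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_grid(wild_type, start, end, delta_by_sequence):
    """Return (row_labels, grid) with grid[i][j] = ΔTm of residue i at position j.

    Cells for the wild-type residue, or for variants with no score, stay NaN.
    """
    rows = sorted(AMINO_ACIDS)
    grid = [[math.nan] * (end - start) for _ in rows]
    for i, aa in enumerate(rows):
        for j, pos in enumerate(range(start, end)):
            if aa == wild_type[pos]:
                continue  # no substitution at this cell
            mutant = wild_type[:pos] + aa + wild_type[pos + 1:]
            grid[i][j] = delta_by_sequence.get(mutant, math.nan)
    return rows, grid


# Toy inputs: invented ΔTm values for two variants of "MKT".
toy_deltas = {"AKT": 1.5, "MAT": -2.0}
rows, grid = build_grid("MKT", 0, 2, toy_deltas)

print(rows[0], grid[0])                          # A [1.5, -2.0]
print(math.isnan(grid[rows.index("M")][0]))      # True: M is wild type at position 0
```

The nested list can be converted with np.array(grid) before plotting.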
positions = list(range(CDR3_START, CDR3_END))
aa_list = sorted(AMINO_ACIDS)
heatmap = np.full((len(aa_list), len(positions)), np.nan)
for i, aa in enumerate(aa_list):
    for j, pos in enumerate(positions):
        if aa == WILD_TYPE[pos]:
            continue
        mutant = WILD_TYPE[:pos] + aa + WILD_TYPE[pos+1:]
        row = df[df["sequence"] == mutant]
        if len(row) > 0:
            heatmap[i, j] = row["delta_tm"].iloc[0]
fig, ax = plt.subplots(figsize=(14, 6))
im = ax.imshow(heatmap, cmap="RdBu_r", aspect="auto", vmin=-10, vmax=10)
ax.set_xticks(range(len(positions)))
ax.set_xticklabels([WILD_TYPE[p] + str(p+1) for p in positions], rotation=90)
ax.set_yticks(range(len(aa_list)))
ax.set_yticklabels(aa_list)
plt.colorbar(im, ax=ax, label="ΔTm (°C) vs. wild type")
ax.set_title("Saturation mutagenesis: predicted ΔTm landscape")
ax.set_xlabel("Position (wild-type residue + number)")
ax.set_ylabel("Substitution amino acid")
plt.tight_layout()
plt.show()
Multi-model consensus¶
Convert each model's scores to z-scores so the two scales are comparable, then rank by the mean of the two. Mutations that score well on independent predictors are more likely to validate.
import pandas as pd
df_filt = pipeline.get_final_data()
df_filt["tm_norm"] = (df_filt["melting_temperature"] - df_filt["melting_temperature"].mean()) / df_filt["melting_temperature"].std()
df_filt["sol_norm"] = (df_filt["solubility"] - df_filt["solubility"].mean()) / df_filt["solubility"].std()
df_filt["consensus"] = (df_filt["tm_norm"] + df_filt["sol_norm"]) / 2
print("Top 10 consensus candidates:")
df_filt.nlargest(10, "consensus")[["sequence", "melting_temperature", "solubility", "consensus"]]
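The consensus score above can be checked by hand on a toy table. The sketch below uses only the standard library and invented values; statistics.stdev is the sample standard deviation, which matches the default of pandas .std(). The equal weighting of the two models mirrors the code above and is a choice, not a requirement.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to zero mean and unit sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


# Invented scores for three candidates, for illustration only.
names = ["v1", "v2", "v3"]
tm = [56.0, 58.0, 60.0]    # °C
sol = [0.9, 0.5, 0.7]      # arbitrary solubility units

tm_z = z_scores(tm)
sol_z = z_scores(sol)
consensus = [(a + b) / 2 for a, b in zip(tm_z, sol_z)]

# Rank candidates from best to worst consensus score.
ranking = [n for _, n in sorted(zip(consensus, names), reverse=True)]
print(ranking)  # ['v3', 'v1', 'v2']
```

Here v3 wins because it is best on Tm and average on solubility, while v1 trades the best solubility against the worst Tm. Because z-scores depend on the set they are computed over, the ranking can shift if the normalization is done before rather than after filtering.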