GPT2 for Semi-Directed Antibody Generation

Set Your API Token

To use the BioLM API, you need a token. You can get one from the User API Tokens page.

Paste the API token you generated in the cell below, as the value of the variable BIOLMAI_TOKEN.

BIOLMAI_TOKEN = " "  # !!! YOUR API TOKEN HERE !!!

API Call with Python requests

We need to make sure we have the Python requests module loaded first.

try:
    # Install packages to make API requests in JLite
    import micropip
    await micropip.install('requests')
    await micropip.install('pyodide-http')
    # Patch requests for in-browser support
    import pyodide_http
    pyodide_http.patch_all()
except ModuleNotFoundError:
    pass  # Won't be using micropip outside of JLite

import requests
from IPython.display import JSON  # Helpful UI for JSON display


import pandas as pd
import os, sys
import json
import datetime
import urllib3

def generate_gpt2_cv2_hchain(seed_seq=None):
    """POST a request to generate new GPT2 antibody heavy chains from a model fine-tuned on SARS-CoV-2 immune responses."""
    url = "https://biolm.ai/api/v1/models/gpt2_sarscovd_heavy/generate/"

    # Send an empty string when no seed is given
    if not seed_seq:
        seed_seq = ''

    payload = json.dumps({
        "instances": [
            {"data": {"text": seed_seq}}
        ]
    })

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Token {BIOLMAI_TOKEN.strip()}",
    }

    response = requests.post(url, headers=headers, data=payload, timeout=480)
    # Raise on HTTP error responses instead of failing later on a missing key
    response.raise_for_status()

    return response.json()['predictions']['generated']

resp = generate_gpt2_cv2_hchain('EVQ')

resp

df = pd.DataFrame(['EVQ' for _ in range(10000)], columns=['seed_seq'])

def apply_gen_abs(seed_seq):
    g = generate_gpt2_cv2_hchain(seed_seq)
    _d = pd.DataFrame.from_dict(g)
    _d = _d.query('perplexity <= 125.0').reset_index(drop=True)
    _d = _d.loc[_d.text.str.len() <= 256, :].reset_index(drop=True)
    return _d

generated_seq_dfs = df.seed_seq.iloc[:2500].apply(apply_gen_abs)  # for parallel requests, use pandarallel's parallel_apply instead
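The generation step above issues its requests one after another. As a rough illustration of running them concurrently with only the standard library, here is a minimal sketch built on concurrent.futures.ThreadPoolExecutor. Everything in it is an assumption for demonstration: fake_generate is a hypothetical offline stand-in for generate_gpt2_cv2_hchain, its record layout (text and perplexity keys) is inferred from the filtering code above, and max_workers is an arbitrary value that should respect whatever rate limits apply to your account. It assumes a regular CPython kernel; thread pools may not be available in a browser-based kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_generate(seed_seq):
    """Hypothetical offline stand-in for generate_gpt2_cv2_hchain."""
    # Assumed record layout: one dict per generated sequence.
    return [
        {"text": seed_seq + "LVESGGG", "perplexity": 90.0},   # passes both filters
        {"text": seed_seq + "A" * 300, "perplexity": 50.0},   # longer than 256
        {"text": seed_seq + "QLQESGP", "perplexity": 400.0},  # perplexity above 125
    ]


def keep_record(rec, max_perplexity=125.0, max_len=256):
    """Same perplexity and length cut-offs as apply_gen_abs."""
    return rec["perplexity"] <= max_perplexity and len(rec["text"]) <= max_len


def generate_parallel(seeds, generate_fn, max_workers=8):
    """Call generate_fn for every seed in a thread pool and filter the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises worker exceptions.
        batches = list(pool.map(generate_fn, seeds))
    return [rec for batch in batches for rec in batch if keep_record(rec)]


rows = generate_parallel(["EVQ"] * 4, fake_generate, max_workers=2)
print(len(rows))
```

To try this against the real endpoint, pass generate_gpt2_cv2_hchain as generate_fn; the resulting list of dicts can be handed to pd.DataFrame for the same downstream steps.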

generated_seqs = pd.concat(list(generated_seq_dfs), axis=0)
generated_seqs['len'] = generated_seqs.text.str.len()

generated_seqs.sort_values('perplexity', ascending=True).head(10)

generated_seqs.sample(10)

generated_seqs.to_csv('generated_sars_cov2_ab_seqs.csv', index=False)

generated_seqs.shape

Perplexity correlates with similarity to known molecules: the lower the value, the more likely the sequence folds into something real. About 9.5k generated sequences have a perplexity <= 125.0; these can now be ranked and selected further using other models.
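For intuition about the perplexity values, the sketch below computes perplexity under its standard definition, the exponential of the mean negative log-probability per token. Whether the API computes its perplexity field in exactly this way is an assumption, and the probabilities are made-up numbers, not model output.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from the natural-log probability assigned to each token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Illustrative numbers only, not model output.
confident = [math.log(0.5)] * 10   # every token predicted with p = 0.5
uncertain = [math.log(0.05)] * 10  # every token predicted with p = 0.05

print(perplexity(confident))
print(perplexity(uncertain))
```

The sequence whose tokens the model predicts more confidently gets the lower perplexity, which is why a low-perplexity cut-off favors sequences resembling the training data.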

Rank with ESM-1v & Other Evaluations

To pull out likely functional sequences, we could also score these with ESM-1v, or any ESM flavor, since those models were trained on functional sequences only. See In silico Deep Mutational Scan for more info.

We could also check how close the low-perplexity generated sequences are to those in the test set, for example by aligning them or calculating Levenshtein distances to test-set antibodies. Numbering the antibodies would let us compare their CDR loops with those of known SARS-CoV-2 antibodies. Many other evaluations are possible; which ones to run is up to you.
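As one way to approach the distance comparison, here is a minimal pure-Python sketch of Levenshtein distance and a nearest-reference lookup. The helper names and the toy sequence fragments are hypothetical; for thousands of full-length sequences a compiled library would likely be a better fit.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_reference(query, references):
    """Return (distance, reference) for the closest reference sequence."""
    return min((levenshtein(query, ref), ref) for ref in references)


# Toy fragments for illustration only, not real test-set antibodies.
references = ["EVQLVESGGG", "QVQLQESGPG"]
dist, ref = nearest_reference("EVQLVESGGA", references)
print(dist, ref)
```

Applied to the generated sequences, the minimum distance per sequence gives a quick view of how novel each one is relative to the reference set.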

Next Steps

Check out additional tutorials at jupyter.biolm.ai, or head over to our BioLM Documentation to explore additional models and functionality.

See more use-cases and APIs on your BioLM Console Catalog.

BioLM hosts deep learning models and runs inference at scale. You do the science.

Contact us to learn more.
