In order to use the BioLM API, you need to have a token. You can get one from the User API Tokens page.
Paste the API token you generated in the cell below, as the value
of the variable BIOLMAI_TOKEN.
BIOLMAI_TOKEN = " "  # !!! YOUR API TOKEN HERE !!!We need to make sure we have the Python requests module loaded first.
try:
    # Install packages to make API requests in JLite
    import micropip
    await micropip.install('requests')
    await micropip.install('pyodide-http')
    # Patch requests for in-browser support
    import pyodide_http
    pyodide_http.patch_all()
except ModuleNotFoundError:
    pass  # Won't be using micropip outside of JLite
import requests
from IPython.display import JSON  # Helpful UI for JSON display
import pandas as pd
import os, sys
import json
import datetime
import urllib3def generate_gpt2_cv2_hchain(seed_seq=None):
    """POST create a new GPT2 antibody from fine-tuned SARS-Cov2 immune responses."""
    url = "https://biolm.ai/api/v1/models/gpt2_sarscovd_heavy/generate/"
    
    if not seed_seq:
        seed_seq = ''
        
    payload = json.dumps({
      "instances": [
        {
          "data": {
            "text": seed_seq
          }
        }
      ]
    })
    
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Token {BIOLMAI_TOKEN.strip()}",
    }
    response = requests.request("POST", url, headers=headers, data=payload, timeout=480)
    
    resp_json = response.json()
    
    return resp_json['predictions']['generated']resp = generate_gpt2_cv2_hchain('EVQ')
respdf = pd.DataFrame(['EVQ' for _ in range(10000)], columns=['seed_seq'])def apply_gen_abs(seed_seq):
    g = generate_gpt2_cv2_hchain(seed_seq)
    _d = pd.DataFrame.from_dict(g)
    _d = _d.query('perplexity <= 125.0').reset_index(drop=True)
    _d = _d.loc[_d.text.str.len() <= 256, :].reset_index(drop=True)
    return _dgenerated_seq_dfs = df.seed_seq.iloc[:2500].apply(apply_gen_abs)  # use parallel_apply and pandarralel for parallel requestsgenerated_seqs = pd.concat(list(generated_seq_dfs), axis=0)
generated_seqs['len'] = generated_seqs.text.str.len()generated_seqs.sort_values('perplexity', ascending=True).head(10)generated_seqs.sample(10)generated_seqs.to_csv('generated_sars_cov2_ab_seqs.csv', index=False)generated_seqs.shapeThe perplexity measure is correlated with similarity to known molecules - the lower the values, the more likely the sequence folds into something real. There are ~9.5k sequences with a perplexity <= 125.0, to be further ranked and selected using other models now.
In order to pull out likely functional sequences, we could also score these with ESM-1v - or any ESM flavor - since those models were trained on functional sequences only. See In silico Deep Mutational Scan for more info.
We could also see how close the low-perplexity generated sequences are to those in the test set. Align or calculate Levenshtein distances from antibodies in the test set. Number the antibodies so we can assess their CDR loops comapred to known SARS-Cov-2 antibodies. And of course many other evaluations we could make, which will be up to you.
Check out additional tutorials at jupyter.biolm.ai, or head over to our BioLM Documentation to explore additional models and functionality.
<span></span>BioLM offers tailored AI solutions to meet your experimental needs. We deliver top-tier results with our model-agnostic approach, powered by our highly scalable and real-time GPU-backed APIs and years of experience in biological data modeling, all at a competitive price.
