BioLMTox Toxin Similarity
Enhance biosecurity predictions with the BioLMTox embedding endpoint.
from IPython.display import JSON  # Helpful UI for JSON display
import time
from biolmai import BioLM
import requests  # Not used directly; the BioLM client below makes the API calls
import csv  # To read example data
import numpy as np
from numpy.linalg import norm

lines = []
with open('data/protein/data/PLA2.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        lines.append(row)
print(lines)

Load the example toxin sequences from the CSV file. The code below takes the sequence column (index 1) of rows 1 and 2:
SEQ1 = lines[1][1]
SEQ2 = lines[2][1]

print("Sequence length 1: {}".format(len(SEQ1)))
print("Sequence length 2: {}".format(len(SEQ2)))

SEQ1 is https://www.uniprot.org/uniprotkb/Q45Z47/entry and SEQ2 is https://www.uniprot.org/uniprotkb/P82114/entry. Both are snake venom proteins.
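The indexing `lines[1][1]` and `lines[2][1]` skips row 0 and reads column 1 of the next two rows. If you want to load every sequence in the file instead of just two, the same step can be wrapped in a small helper. This is a minimal sketch that assumes the first row is a header and the sequence sits in column 1. The name `load_sequences` is hypothetical and is not part of the tutorial or the `biolmai` package.

```python
import csv
from typing import List


def load_sequences(path: str, column: int = 1, has_header: bool = True) -> List[str]:
    """Read one sequence per CSV row (hypothetical helper, not part of biolmai).

    Assumes the sequence is stored in `column` (0-based) and, when
    `has_header` is True, that the first row holds column names.
    """
    sequences = []
    with open(path, newline="") as csvfile:
        reader = csv.reader(csvfile)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if len(row) > column:  # ignore blank or short rows
                sequences.append(row[column].strip())  # drop stray whitespace
    return sequences
```

With the tutorial's file, `seqs = load_sequences('data/protein/data/PLA2.csv')` would give `seqs[0]` and `seqs[1]` equal to `SEQ1` and `SEQ2`. This holds only if row 0 really is a header and the sequences have no surrounding whitespace.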
Define Endpoint Params
Let's make a secure REST API request to the BioLM API to compute the sequence embeddings quickly on a GPU.
# Make the request - let's time it!
start = time.time()
result = BioLM(entity="biolmtox2", action="encode", type="sequence", items=[SEQ1, SEQ2])
end = time.time()
print(f"BioLMTox prediction took {end - start:.4f} seconds.")
# If you wish to view the full result, you can expand the tree in the cell below
JSON(result)

The response is a list with one entry per input sequence. Each entry's "mean_representation" field holds an embedding vector of fixed length 640.
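The similarity cells read each embedding from the `mean_representation` field of a response entry. When comparing more than two sequences, it helps to stack all entries into one NumPy matrix and check their length up front. This is a minimal sketch that assumes each entry is a dict with that key, as in this tutorial. `stack_embeddings` is a hypothetical helper, and `expected_dim=640` reflects the length stated above rather than a documented guarantee of the API.

```python
from typing import Any, Dict, List

import numpy as np


def stack_embeddings(result: List[Dict[str, Any]],
                     key: str = "mean_representation",
                     expected_dim: int = 640) -> np.ndarray:
    """Stack per-sequence embeddings into an (n_sequences, expected_dim) array.

    Hypothetical helper: assumes each response entry is a dict whose `key`
    field holds a flat list of floats, as in the tutorial's cells.
    """
    rows = []
    for i, entry in enumerate(result):
        vec = np.asarray(entry[key], dtype=float)
        if vec.shape != (expected_dim,):
            raise ValueError(f"entry {i}: expected shape ({expected_dim},), got {vec.shape}")
        rows.append(vec)
    if not rows:
        return np.empty((0, expected_dim))  # no entries: empty matrix with the right width
    return np.vstack(rows)
```

Under those assumptions, `stack_embeddings(result)` for the two-sequence request above would return a `(2, 640)` array. A shape mismatch raises an error instead of silently producing a meaningless similarity score.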
# Define similarity measure
def cos_similarity(a, b):
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    return np.dot(a, b) / (norm(a) * norm(b))

# Convert sequence embeddings to NumPy arrays
em_1 = np.asarray(result[0]["mean_representation"])
em_2 = np.asarray(result[1]["mean_representation"])

# Compute the similarity measure
em_similarity = cos_similarity(em_1, em_2)
print(f'sequence embedding cosine similarity:\n{em_similarity}')

The cosine similarity between the two toxin sequences is high, as expected: one sequence is Phospholipase A2 OS2 and the other is Basic phospholipase A2 homolog MjTX-I. Both are phospholipase A2-related snake venom proteins.
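`cos_similarity` compares one pair of embeddings. To screen several sequences at once, the same measure, a·b / (‖a‖‖b‖), can be computed for every pair in one step. First L2-normalize each row of an embedding matrix, then take a single matrix product. This is a minimal sketch that assumes the embeddings are already stacked row-wise into a NumPy array. The function name `pairwise_cosine` and the `eps` guard are illustrative choices, not part of the tutorial.

```python
import numpy as np


def pairwise_cosine(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Return the (n, n) matrix of cosine similarities between rows of `embeddings`."""
    x = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Divide each row by its norm; eps avoids division by zero for all-zero rows.
    unit = x / np.maximum(norms, eps)
    sim = unit @ unit.T
    # Clip tiny floating-point overshoot so values stay within [-1, 1].
    return np.clip(sim, -1.0, 1.0)
```

For the two sequences above, `pairwise_cosine(np.vstack([em_1, em_2]))[0, 1]` should match `em_similarity` up to floating-point rounding. Normalizing once and using one matrix product avoids recomputing norms for each of the n² pairs.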
Next Steps
Check out additional tutorials at jupyter.biolm.ai, or head over to our BioLM Documentation to explore additional models and functionality.
