BioLMTox Toxin Similarity

Enhance biosecurity predictions with the BioLMTox embedding endpoint.

from IPython.display import JSON  # Helpful UI for JSON display
import time
from biolmai import BioLM  # BioLM Python client, used to make calls to BioLM.ai
import csv  # To read example data
import numpy as np
from numpy.linalg import norm
lines = []
with open('data/protein/data/PLA2.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        lines.append(row)
print(lines)
[['label', 'sequence'], ['toxin', 'MHPAHLLVLLAVCVSLLGASDIPPLPLNLAQFGFMIRCANGGSRSPLDYTDYGCYCGKGGRGTPVDDLDRCCQVHDECYGEAEKRLGCSPFVTLYSWKCYGKAPSCNTKTDCQRFVCNCDAKAAECFARSPYQKKNWNINTKARCK'], ['toxin', 'MRTLWIMAVLLVGVEGSLVELGKMILQETGKNPVTSYGAYGCNCGVLGRGKPKDATDRCCYVHKCCYKKLTDCNPKKDRYSYSWKDKTIVCGENNSCLKELCECDKAVAICLRENLDTYNKKYKNNYLKPFCKKADPC']]

Extract the two example toxin sequences from the rows loaded above. Row 0 is the CSV header, so the sequences are in rows 1 and 2.

SEQ1 = lines[1][1]
SEQ2 = lines[2][1]
print("Sequence length 1: {}".format(len(SEQ1)))
print("Sequence length 2: {}".format(len(SEQ2)))
Sequence length 1: 146
Sequence length 2: 138
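Indexing rows by position works for this two-sequence file, but it depends on the column order. As a minimal sketch, assuming the CSV keeps its 'label' and 'sequence' header, csv.DictReader reads the column by name instead. The helper read_sequences and the in-memory CSV are hypothetical and only for illustration; the placeholder sequences are truncated prefixes of the two sequences above.

```python
import csv
import io

# Hypothetical in-memory CSV with the same header as PLA2.csv.
# The sequences are truncated placeholders, not the full proteins.
example_csv = "label,sequence\ntoxin,MHPAHLL\ntoxin,MRTLWIM\n"


def read_sequences(handle):
    """Return the 'sequence' column from an open CSV handle that has a header row."""
    return [row["sequence"] for row in csv.DictReader(handle)]


sequences = read_sequences(io.StringIO(example_csv))
print(sequences)
```

With the real file, the same helper could be called on the handle returned by open('data/protein/data/PLA2.csv', newline='').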

Define Endpoint Params

Let's make a secure REST API request to the BioLM API to compute the sequence embeddings on GPU.

# Make the request - let's time it!
start = time.time()
result = BioLM(entity="biolmtox2", action="encode", type="sequence", items=[SEQ1, SEQ2])
end = time.time()
print(f"BioLMTox prediction took {end - start:.4f} seconds.")

# If you wish to view the full result, you can expand the tree in the cell below
JSON(result)
BioLMTox prediction took 0.5072 seconds.
<IPython.core.display.JSON object>

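The response is a list with one entry per input sequence. Each entry holds a fixed-length embedding vector of 640 values under the "mean_representation" key, regardless of the length of the input sequence.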
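Before computing similarities, it can help to confirm that the response has the expected shape. The sketch below stacks the embeddings into a single NumPy matrix and checks the dimension. It runs on a mock response built to mimic the structure described above, so the values are placeholders; stack_embeddings and EMBEDDING_DIM are hypothetical names, not part of the BioLM client.

```python
import numpy as np

EMBEDDING_DIM = 640  # embedding length stated in this tutorial


def stack_embeddings(result, key="mean_representation"):
    """Stack per-sequence embeddings into an array of shape (n_sequences, EMBEDDING_DIM)."""
    matrix = np.asarray([item[key] for item in result], dtype=float)
    if matrix.ndim != 2 or matrix.shape[1] != EMBEDDING_DIM:
        raise ValueError(f"unexpected embedding shape: {matrix.shape}")
    return matrix


# Mock response with placeholder values, mimicking the structure only.
mock_result = [
    {"mean_representation": [0.0] * EMBEDDING_DIM},
    {"mean_representation": [1.0] * EMBEDDING_DIM},
]

matrix = stack_embeddings(mock_result)
print(matrix.shape)  # (2, 640)
```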

# Define similarity measure: cosine of the angle between vectors a and b
def cos_similarity(a, b):
    return np.dot(a, b) / (norm(a) * norm(b))
# convert sequence embeddings to numpy arrays
em_1 = np.asarray(result[0]["mean_representation"])
em_2 = np.asarray(result[1]["mean_representation"])
# compute similarity measures
em_similarity = cos_similarity(em_1, em_2)
print(f'sequence embedding cosine similarity:\n{em_similarity}')
sequence embedding cosine similarity:
0.9651337988386686

The cosine similarity between the two toxin embeddings is high (about 0.965), as expected: one sequence is Phospholipase A2 OS2 and the other is Basic phospholipase A2 homolog MjTX-I. Both are phospholipase A2 proteins found in snake venom.
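The same measure extends to more than two sequences. As a minimal sketch, the function below normalizes each embedding to unit length and takes a matrix product, which yields every pairwise cosine similarity at once. The name cosine_similarity_matrix and the 2-dimensional toy vectors are assumptions for illustration; real BioLMTox embeddings would have 640 values each.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Return an (n, n) matrix whose entry [i, j] is the cosine similarity of rows i and j."""
    X = np.asarray(embeddings, dtype=float)
    # Scale each row to unit length so dot products equal cosine similarities.
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return unit @ unit.T


# Toy vectors: the first two are orthogonal, the third lies halfway between them.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
sim = cosine_similarity_matrix(toy)
print(np.round(sim, 3))
```

The diagonal is always 1, since every vector is identical in direction to itself, and the matrix is symmetric. Note that an all-zero embedding would cause a division by zero and would need to be handled separately.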

Next Steps

Check out additional tutorials at jupyter.biolm.ai, or head over to our BioLM Documentation to explore additional models and functionality.

See more use-cases and APIs on your BioLM Console Catalog.



© 2022 - 2025 BioLM. All Rights Reserved.