Fast & Accurate PDB Prediction with ESMFold

Having the ability to use AlphaFold2, ESM, and other recent structural modeling NNs is great, but what if you don't want to leave Python, don't want to spin up a GPU, want to avoid containerization, or need to massively scale out your PDB file prediction?

You can predict a PDB file for proteins up to 1024+ residues in length using the highly accurate ESMFold, which is scaled out and pre-loaded into memory on BioLM.ai. The API docs show an example protein and PDB string response.

# Standard library first, then the 3D viewer and the BioLM SDK
import time

import py3Dmol
from biolmai import BioLM

SEQ = "MAETAVINHKKRNSPRIVQSNDLEAAYSLSRDQKRMLYLFVDQIRKSDGTLQEHDGICEIHVAKYAEIFGLTSAEASKDIRQALKSFAGKEVVFYRPEEDAGDEKGYESFPWFIKRAHSPSRGLYSVHINPYLIPFFIGLQNRFTQFRLSETKEITNPYAMRLYESLCQYRKPDGSGIVSLKIDWIIERYQLPQSYQRMPDFRRRFLQVCVNEINSRTPMRLSYIEKKKGRQTTHIVFSFRDITSMTTG"

print("Sequence length: {}".format(len(SEQ)))

Sequence length: 249
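
Before sending a sequence to the API, it can help to check it locally. The following is a minimal sketch and not part of the BioLM SDK: `validate_sequence` and its `max_len` parameter are hypothetical names, and the default of 1024 is only a placeholder borrowed from the length mentioned above, so check the API docs for the actual limit.

```python
# Hypothetical pre-flight check; not part of the BioLM SDK.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def validate_sequence(seq, max_len=1024):
    """Return a list of problems found in `seq` (an empty list means none).

    `max_len` is an assumed placeholder limit, not a documented API constraint.
    """
    problems = []
    if not seq:
        problems.append("sequence is empty")
    if len(seq) > max_len:
        problems.append(f"length {len(seq)} exceeds max_len={max_len}")
    # Collect any characters that are not standard one-letter residue codes
    bad = sorted(set(seq.upper()) - STANDARD_AA)
    if bad:
        problems.append(f"non-standard residue codes: {''.join(bad)}")
    return problems


print(validate_sequence("MAETAVINHKKRNSPRIVQ"))  # the start of SEQ
print(validate_sequence("MAETXB1"))              # contains invalid codes
```

Returning a list of problems rather than raising makes it easy to filter a large batch of sequences and report everything that is wrong in one pass.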

Make API Request

There is already a server on BioLM with ESMFold loaded into memory, so predictions should be fast.

SEQ = "MAETAVINHKKRNSPRIVQSNDLEAAYSLSRDQKRMLYLFVDQIRKSDGTLQEHDGICEIHVAKYAEIFGLTSAEASKDIRQALKSFAGKEVVFYRPEEDAGDEKGYESFPWFIKRAHSPSRGLYSVHINPYLIPFFIGLQNRFTQFRLSETKEITNPYAMRLYESLCQYRKPDGSGIVSLKIDWIIERYQLPQSYQRMPDFRRRFLQVCVNEINSRTPMRLSYIEKKKGRQTTHIVFSFRDITSMTTG"

start = time.time()
result = BioLM(entity="esmfold", action="predict", items=[{"sequence": SEQ}])
end = time.time()
print(f"ESMFold prediction took {end - start:.4f} seconds.")

ESMFold prediction took 0.7810 seconds.

If the model were starting cold, there would be an initial wait of several minutes while this large model loads into memory, after which subsequent API requests would respond normally, without delay. This is known as the model's cold-start time. It is generally not very noticeable, but it can be here, since ESMFold is one of the largest protein models available at the time of this writing.
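
One way to see whether a cold start occurred is to time several consecutive requests and compare the first against the rest. The sketch below is generic: `time_calls` is a hypothetical helper, and `fake_predict` is a stand-in that only sleeps, so swap in a function that makes the real BioLM request to measure actual latencies.

```python
import time


def time_calls(fn, n_calls=3):
    """Call `fn` n_calls times and return the elapsed seconds for each call."""
    timings = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in for a real request (assumption: replace the body with the
# BioLM call from the cell above).
def fake_predict():
    time.sleep(0.01)


timings = time_calls(fake_predict, n_calls=3)
for i, t in enumerate(timings, start=1):
    print(f"call {i}: {t:.4f} s")
```

If the first timing is much larger than the others, the model was most likely being loaded during that request.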

Visualize Structure in 3D

We have the PDB file contents as a string. We can use it directly to visualize the structure.

# View the file contents first
import json

pdb_pred = result["pdb"]  # Extract the contents of the PDB file

json.dumps(pdb_pred)[:1000]  # Look at the first 1000 characters, since PDBs are long...

'"PARENT N/A\\nATOM      1  N   MET A   1     -24.201  39.742   4.574  1.00 95.19           N  \\nATOM      2  CA  MET A   1     -23.369  39.073   3.578  1.00 96.49           C  \\nATOM      3  C   MET A   1     -22.229  38.311   4.245  1.00 95.33           C  \\nATOM      4  CB  MET A   1     -22.807  40.084   2.578  1.00 94.53           C  \\nATOM      5  O   MET A   1     -21.274  38.917   4.734  1.00 87.34           O  \\nATOM      6  CG  MET A   1     -23.860  40.697   1.669  1.00 90.04           C  \\nATOM      7  SD  MET A   1     -23.145  41.882   0.464  1.00 92.34           S  \\nATOM      8  CE  MET A   1     -23.408  43.442   1.354  1.00 90.35           C  \\nATOM      9  N   ALA A   2     -22.354  37.046   4.473  1.00 94.99           N  \\nATOM     10  CA  ALA A   2     -21.304  36.222   5.067  1.00 95.54           C  \\nATOM     11  C   ALA A   2     -20.200  35.928   4.055  1.00 92.69           C  \\nATOM     12  CB  ALA A   2     -21.890  34.919   5.604  1.00 93.42           C  \\nATO'
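
The last numeric column of each ATOM record above is the B-factor field of the PDB format. Structure predictors such as ESMFold commonly store a per-residue confidence score (pLDDT) in that field, but treat that as an assumption to confirm against the API docs. As a minimal sketch, the hypothetical helper `ca_bfactors` below reads that column for alpha-carbon atoms using the fixed-column PDB layout.

```python
def ca_bfactors(pdb_text):
    """Return the B-factor value of every alpha-carbon (CA) ATOM record.

    Uses the fixed-column PDB layout: atom name in columns 13-16 and
    temperature factor in columns 61-66 (1-based).
    """
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


# Three records copied from the output above; with a real response,
# pass `pdb_pred` instead of `sample`.
sample = "\n".join([
    "ATOM      1  N   MET A   1     -24.201  39.742   4.574  1.00 95.19           N  ",
    "ATOM      2  CA  MET A   1     -23.369  39.073   3.578  1.00 96.49           C  ",
    "ATOM     10  CA  ALA A   2     -21.304  36.222   5.067  1.00 95.54           C  ",
])

scores = ca_bfactors(sample)
print(f"{len(scores)} residues, mean B-factor {sum(scores) / len(scores):.2f}")
```

Slicing by column rather than splitting on whitespace matters because adjacent PDB fields can run together with no space between them.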

Let's use the py3Dmol Python package to visualize the PDB here, in-browser.

view = py3Dmol.view(js='https://3Dmol.org/build/3Dmol-min.js', width=800, height=400)
view.addModel(pdb_pred, 'pdb')
view.setStyle({'model': -1}, {"cartoon": {'color': 'spectrum'}})
view.zoomTo()

<py3Dmol.view at 0x7c8aa3e6ae90>
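
Since the prediction is a plain string, it can also be written to disk for use in other tools. The sketch below uses only the standard library; `save_pdb` is a hypothetical helper, and the output location under the system temp directory is an arbitrary choice for illustration.

```python
import tempfile
from pathlib import Path


def save_pdb(pdb_text, path):
    """Write a PDB string to `path`, creating parent folders, and return the Path."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(pdb_text)
    return out


# With a real response, pass `pdb_pred`; a one-line stand-in is used here.
example_text = "PARENT N/A\n"
target = Path(tempfile.gettempdir()) / "predictions" / "example.pdb"
saved = save_pdb(example_text, target)
print(saved, saved.stat().st_size, "bytes")
```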

See more use-cases and APIs on your BioLM Console Catalog.

BioLM hosts deep learning models and runs inference at scale. You do the science.

Contact us to learn more.
