AlphaFold2 is a neural network for accurate prediction of protein 3D structures from amino acid sequences, achieving atomic-level accuracy that is competitive with experimental structures in many cases (median backbone accuracy of 0.96 Å Cα r.m.s.d.95 on CASP14). It uses multiple sequence alignments (MSAs) and structural templates to produce full-length predictions along with per-residue (pLDDT) and global (pTM) confidence metrics. BioLM provides scalable, GPU-accelerated API access to AlphaFold2, supporting large-scale structure annotation, protein engineering, and structural biology research.
Predict
Predicts the 3D structure of a protein from its amino acid sequence.
- POST /api/v3/alphafold2/predict/
Predict endpoint for AlphaFold2.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object)
databases (array of strings, default: [“mgnify”, “small_bfd”, “uniref90”]) — Allowed values: “small_bfd”, “mgnify”, “uniref90”
predictions_per_model (integer, default: 1, range: 1–8) — Number of predictions to generate per model
relax (string, default: “none”) — Allowed values: “all”, “best”, “none”
return_templates (boolean, default: true) — Whether to include template data in the output
msa_iterations (integer, default: 1, range: 1–5) — Number of MSA refinement iterations
max_msa_sequences (integer, optional, default: null, range: 1–4000) — Maximum number of MSA sequences
algorithm (string, default: “mmseqs2”) — Allowed value: “mmseqs2”
items (array of objects, min: 1, max: 1)
sequence (string, min length: 1, max length: 512) — Protein sequence with extended amino acid validation
Example request:
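A minimal Python sketch of a Predict request that uses only the standard library. The host https://biolm.ai is an assumption (this page documents only the path), and the helper names build_predict_payload and predict are hypothetical. The body mirrors the params/items schema above and sends one item per request.

```python
import json
import urllib.request

# Assumption: the API host. Only the path below comes from this page.
BASE_URL = "https://biolm.ai"
PREDICT_PATH = "/api/v3/alphafold2/predict/"
DEFAULT_DATABASES = ["mgnify", "small_bfd", "uniref90"]


def build_predict_payload(sequence, predictions_per_model=1, relax="none",
                          msa_iterations=1, databases=None):
    """Build a request body following the documented Predict schema."""
    params = {
        "databases": list(databases) if databases is not None else list(DEFAULT_DATABASES),
        "predictions_per_model": predictions_per_model,
        "relax": relax,
        "msa_iterations": msa_iterations,
        "algorithm": "mmseqs2",
    }
    # The endpoint accepts exactly one item per request.
    return {"params": params, "items": [{"sequence": sequence}]}


def predict(sequence, api_key, timeout=3600, **params):
    """POST one sequence to the Predict endpoint and return the decoded JSON body."""
    body = json.dumps(build_predict_payload(sequence, **params)).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL + PREDICT_PATH,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )
    # Predictions can take a long time; raise the timeout if requests are cut off.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs network access and a valid key; not executed here):
# result = predict("MKTAYIAKQRQISFVKSHFSRQ", api_key="YOUR_API_KEY", relax="best")
# pdb_strings = result["results"][0]["pdbs"]
```

A 400 or 500 status makes urlopen raise urllib.error.HTTPError, which can be caught to inspect the status code and response body.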
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
pdbs (array of strings, size: 1–8) — Predicted protein structures in PDB format
Example response:
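A minimal Python sketch for consuming this response and ranking the returned structures. It assumes the PDB strings follow AlphaFold2's convention of storing per-residue pLDDT in the B-factor column; the response schema above does not state this, so treat the scores as unverified. The function names are hypothetical.

```python
import json


def ca_b_factors(pdb_text):
    """Return the B-factor values of CA atoms in a PDB-format string."""
    values = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def summarize_predictions(response):
    """Return (index, mean CA B-factor) pairs for each PDB, highest first."""
    data = json.loads(response) if isinstance(response, str) else response
    summary = []
    for result in data["results"]:
        for index, pdb_text in enumerate(result["pdbs"]):
            values = ca_b_factors(pdb_text)
            mean = sum(values) / len(values) if values else float("nan")
            summary.append((index, mean))
    # Higher pLDDT means higher model confidence, so sort in descending order.
    return sorted(summary, key=lambda item: item[1], reverse=True)
```

If the B-factor assumption holds, the first entry identifies the prediction with the highest mean confidence.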
Encode
Generates multiple sequence alignments (MSAs) and, optionally, template hits for a protein sequence, in a format compatible with AlphaFold2, Chai1, and other models.
- POST /api/v3/alphafold2/encode/
Encode endpoint for AlphaFold2.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object) — Configuration parameters:
databases (array of strings, default: [“mgnify”, “small_bfd”, “uniref90”], possible values: “small_bfd”, “mgnify”, “uniref90”) — List of databases
return_templates (boolean, default: true) — Whether to include template information
msa_iterations (integer, range: 1..5, default: 1) — Number of MSA search iterations
max_msa_sequences (integer, range: 1..4000, optional) — Maximum number of MSA sequences
algorithm (string, default: “mmseqs2”, possible values: “mmseqs2”) — MSA search algorithm
items (array of objects, min items: 1, max items: 1) — Input data:
sequence (string, min length: 1, max length: 512, required) — Sequence data
Example request:
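Invalid input returns 400 Bad Request, so it can help to check the documented limits client-side before sending. A Python sketch with a hypothetical helper name, build_encode_payload; it checks only the ranges listed above and does not reproduce the server's extended amino acid validation. The resulting body would be POSTed to /api/v3/alphafold2/encode/ with the headers listed above.

```python
ALLOWED_DATABASES = {"small_bfd", "mgnify", "uniref90"}
MAX_SEQUENCE_LENGTH = 512


def build_encode_payload(sequence, databases=("mgnify", "small_bfd", "uniref90"),
                         msa_iterations=1, max_msa_sequences=None,
                         return_templates=True):
    """Check inputs against the documented Encode limits, then build the body."""
    if not 1 <= len(sequence) <= MAX_SEQUENCE_LENGTH:
        raise ValueError(
            f"sequence length must be 1-{MAX_SEQUENCE_LENGTH}, got {len(sequence)}")
    unknown = set(databases) - ALLOWED_DATABASES
    if unknown:
        raise ValueError(f"unsupported databases: {sorted(unknown)}")
    if not 1 <= msa_iterations <= 5:
        raise ValueError("msa_iterations must be between 1 and 5")
    params = {
        "databases": list(databases),
        "return_templates": return_templates,
        "msa_iterations": msa_iterations,
        "algorithm": "mmseqs2",
    }
    # max_msa_sequences is optional; leave it out to use the server default.
    if max_msa_sequences is not None:
        if not 1 <= max_msa_sequences <= 4000:
            raise ValueError("max_msa_sequences must be between 1 and 4000")
        params["max_msa_sequences"] = max_msa_sequences
    # The endpoint accepts exactly one item per request.
    return {"params": params, "items": [{"sequence": sequence}]}
```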
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) ― One result per input item, in the order requested:
alignments (object) ― Contains alignment data
small_bfd (array of strings, optional) ― Aligned sequences from the small_bfd database
mgnify (array of strings, optional) ― Aligned sequences from the mgnify database
uniref90 (array of strings, optional) ― Aligned sequences from the uniref90 database
templates (array of objects, optional) ― Contains template hit data
index (integer) ― Template index
name (string) ― Template name
aligned_cols (integer) ― Number of aligned columns
sum_probs (float) ― Summation of alignment probabilities
query (string) ― Query subsequence
hit_sequence (string) ― Template subsequence
indices_query (array of integers) ― Query residue positions
indices_hit (array of integers) ― Template residue positions
Example response:
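A Python sketch for a quick quality check on one Encode result: the number of aligned sequences per database, a flag for alignments shallower than the roughly 30-sequence threshold discussed on this page, and the best template coverage of the query. It assumes each string in an alignments list is one aligned sequence and that negative values in indices_query mark gaps (as in AlphaFold's own template parser). Raw counts include redundant hits, so they overstate effective depth. The function names are hypothetical.

```python
def msa_depths(result):
    """Number of aligned sequences returned per database.

    Assumes each string in an alignments list is one aligned sequence.
    """
    alignments = result.get("alignments") or {}
    return {name: len(sequences) for name, sequences in alignments.items()}


def template_coverage(template, query_length):
    """Fraction of query residues aligned to one template hit.

    Negative indices are treated as gaps (an assumption).
    """
    covered = {i for i in template["indices_query"] if i >= 0}
    return len(covered) / query_length


def summarize_encode(result, query_length, min_depth=30):
    """Summarize one Encode result: MSA depths, a shallow-MSA flag, best template."""
    depths = msa_depths(result)
    total = sum(depths.values())
    templates = result.get("templates") or []
    best = max((template_coverage(t, query_length) for t in templates), default=0.0)
    return {
        "depths": depths,
        "total_depth": total,
        # Raw counts are only a rough proxy for alignment depth.
        "shallow_msa": total < min_depth,
        "best_template_coverage": best,
    }
```

Call summarize_encode on each element of response["results"], passing the length of the submitted sequence as query_length.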
Performance
AlphaFold2 provides highly accurate protein structure predictions, significantly outperforming other structure prediction models such as ESMFold and ABodyBuilder3 pLDDT in terms of atomic-level accuracy, particularly for complex proteins and novel folds.
On CASP14 targets, AlphaFold2 achieves a median backbone accuracy of approximately 0.96 Å RMSD95 (Cα root-mean-square deviation at 95% residue coverage), compared with about 2.8 Å RMSD95 for the next best-performing method.
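The RMSD95 metric can be made concrete with a short NumPy sketch. This is a simplified approximation, not the exact procedure behind the CASP14 numbers: it takes matched N×3 arrays of predicted and reference Cα coordinates, superposes them with the Kabsch algorithm, keeps the 95% of residues with the smallest deviations, re-superposes on those, and reports their RMSD. The function names are hypothetical.

```python
import numpy as np


def superpose_deviations(mobile, target):
    """Kabsch-superpose mobile onto target; return per-atom deviations."""
    mobile_c = mobile - mobile.mean(axis=0)
    target_c = target - target.mean(axis=0)
    h = mobile_c.T @ target_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return np.linalg.norm(mobile_c @ rotation.T - target_c, axis=1)


def rmsd(deviations):
    """Root-mean-square of per-atom deviations."""
    return float(np.sqrt(np.mean(deviations ** 2)))


def rmsd_at_coverage(pred_ca, true_ca, coverage=0.95):
    """Approximate RMSD95: drop the worst-fitting residues, re-superpose, report RMSD."""
    n_keep = max(3, int(round(coverage * len(pred_ca))))
    first_pass = superpose_deviations(pred_ca, true_ca)
    keep = np.argsort(first_pass)[:n_keep]
    return rmsd(superpose_deviations(pred_ca[keep], true_ca[keep]))
```

Trimming the worst 5% keeps a few badly placed residues, such as a disordered terminus, from dominating the score.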
AlphaFold2 produces more accurate side-chain conformations than ESMFold and ABodyBuilder3 pLDDT, especially when backbone predictions are highly accurate, with median all-atom accuracy of approximately 1.5 Å RMSD95.
While AlphaFold2 is more accurate than ESMFold, it is computationally more demanding, requiring significantly more GPU resources and processing time per prediction.
BioLM optimizes AlphaFold2 for GPU-accelerated inference on NVIDIA A100 GPUs with 80 GB of memory, typically using three GPUs per prediction task.
AlphaFold2 on BioLM uses MMseqs2 to generate multiple sequence alignments (MSAs). MMseqs2 is faster than the jackhmmer-based search used in the original AlphaFold2 pipeline, which reduces the overall time required per prediction.
The accuracy of AlphaFold2 predictions depends strongly on MSA depth: accuracy drops substantially when the median alignment depth falls below approximately 30 sequences, and additional sequences beyond approximately 100 yield diminishing returns.
AlphaFold2 performs less effectively than specialized models such as NanoBodyBuilder2 when predicting nanobody structures, as NanoBodyBuilder2 is specifically optimized for the unique structural characteristics of nanobodies.
AlphaFold2’s accuracy decreases for proteins with many heterotypic (cross-chain) interactions, whereas it excels at predicting structures of single-chain proteins and homomeric complexes with extensive intra-chain or homotypic contacts.
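BioLM’s deployment of AlphaFold2 retains the model’s core architectural features, including iterative recycling, in which the network refines its own outputs over several passes, and invariant point attention (IPA) in the structure module. These components contribute to robust predictions across diverse protein families.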
Applications
Rapid prediction of protein structures directly from amino acid sequences, enabling faster iteration cycles in protein engineering projects by eliminating the need for time-intensive experimental structure determination.
High-resolution modeling of protein backbones and side-chains to inform rational mutagenesis strategies, improving the accuracy of protein stability predictions and functional site identification for commercial enzyme optimization.
Accurate structural characterization of novel protein scaffolds or domains lacking homologous templates, supporting design workflows for synthetic biology applications such as biosensors or protein-based materials.
In silico identification of structurally stable protein variants to prioritize candidates for experimental screening, significantly reducing laboratory resource requirements and accelerating timelines in protein therapeutic development.
Reliable prediction of protein-protein interaction interfaces to guide engineering of fusion proteins or multi-domain constructs, though predictions may be less accurate for proteins relying heavily on heteromeric interactions or external cofactors for structural stability.
Limitations
Maximum Sequence Length: The AlphaFold2 API accepts protein sequences of up to 512 amino acids. Longer sequences must be truncated or split into smaller segments.
Batch Size: The API accepts a maximum of 1 sequence per request. To process multiple sequences, submit separate requests.
MSA Depth and Quality: Prediction accuracy depends heavily on the quality and depth of the multiple sequence alignment (MSA), and it drops significantly when the median alignment depth is below approximately 30 sequences. Although the API allows up to 4000 sequences in the MSA, proteins with few known homologs produce shallow alignments that may yield less accurate predictions.
Protein Complexes and Inter-chain Contacts: AlphaFold2 is optimized for predicting single-chain protein structures or homomeric complexes. It performs poorly on proteins whose structure is primarily determined by interactions with other, distinct chains (heteromeric complexes). For proteins with extensive heterotypic contacts, consider alternative modeling approaches.
Computational Cost and Runtime: AlphaFold2 is computationally intensive, especially for sequences approaching the maximum length limit. Prediction runtimes can reach several hours, particularly with higher msa_iterations values or when structural relaxation is requested (relax set to all or best).
Structural Relaxation: Structural relaxation (the relax parameter) can improve stereochemical quality but does not typically improve prediction accuracy. It adds computational overhead and may not be necessary for all use cases.
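For the sequence-length limit, one simple client-side option is to split a long sequence into overlapping windows that each fit in a single request. A Python sketch; the function name split_sequence and the 64-residue default overlap are assumptions, not API features. Each window is predicted independently (one request per window, given the batch size of 1), and a window that cuts through a domain may be predicted poorly, so splitting at known domain boundaries is preferable when they are available. Stitching the resulting structures back together is not covered here.

```python
def split_sequence(sequence, max_length=512, overlap=64):
    """Split a sequence into overlapping windows no longer than max_length.

    Returns (start, window) pairs, where start is the 0-based offset of the
    window in the original sequence.
    """
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must satisfy 0 <= overlap < max_length")
    if len(sequence) <= max_length:
        return [(0, sequence)]
    step = max_length - overlap
    windows = []
    start = 0
    while True:
        end = min(start + max_length, len(sequence))
        windows.append((start, sequence[start:end]))
        # Stop once a window reaches the end of the sequence.
        if end == len(sequence):
            break
        start += step
    return windows
```

The start offsets make it possible to map residue numbers in each predicted window back to positions in the full-length sequence.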
How We Use It
BioLM integrates AlphaFold2 predictions into our protein engineering workflows to accelerate molecular design and optimization, providing accurate structural insights that inform downstream predictive modeling and generative design tasks. By leveraging AlphaFold2’s atomic-level predictions, we enable rapid assessment of structural hypotheses, enhance selection of promising candidates, and significantly reduce experimental iterations. The standardized API facilitates seamless integration with our predictive and generative models, embedding analysis of predicted structures into automated ranking, filtering, and optimization pipelines.
Provides structural context to refine predictive models and improve hit rates for engineered proteins.
Integrates seamlessly with generative AI workflows and structure-based property predictions, reducing time-to-market for designed molecules.
References
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., Back, T., Petersen, S., Reiman, D., Clancy, E., Zielinski, M., Steinegger, M., Pacholska, M., Berghammer, T., Bodenstein, S., Silver, D., Vinyals, O., Senior, A. W., Kavukcuoglu, K., Kohli, P., & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589.
