ProGen2 BFD90 is a protein language model pretrained on data that includes the comprehensive BFD90 database. Its broad training data makes it well suited to sequence generation and analysis for protein folding studies, structural bioinformatics, and protein engineering.
Generate new protein sequences for a given PDB structure. The model was trained on 12M protein structures predicted by AlphaFold2. It achieves 51% native sequence recovery on structurally held-out backbones, rising to 72% for buried residues. This makes it possible to explore new regions of sequence space.
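Native sequence recovery is the fraction of positions where a designed sequence carries the same amino acid as the native sequence for that backbone. The sketch below assumes the two sequences are already aligned and of equal length. It also assumes the caller supplies a per-position burial mask. How residues are classified as buried is not specified here, and the example labels are hypothetical.

```python
from typing import Optional, Sequence

def sequence_recovery(native: str, designed: str,
                      mask: Optional[Sequence[bool]] = None) -> float:
    """Fraction of positions where the designed residue matches the native one.

    If `mask` is given, only positions where it is True are counted
    (for example, residues the caller has classified as buried).
    """
    if len(native) != len(designed):
        raise ValueError("native and designed sequences must have equal length")
    if mask is None:
        mask = [True] * len(native)
    if len(mask) != len(native):
        raise ValueError("mask length must match sequence length")
    positions = [i for i, keep in enumerate(mask) if keep]
    if not positions:
        raise ValueError("mask selects no positions")
    matches = sum(native[i] == designed[i] for i in positions)
    return matches / len(positions)


native = "MKTAYIAK"
designed = "MKSAYLAK"
# Hypothetical burial labels, for illustration only.
buried = [False, True, True, False, True, True, False, False]
print(sequence_recovery(native, designed))          # 0.75
print(sequence_recovery(native, designed, buried))  # 0.5
```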
Predict function by identifying the most likely Enzyme Commission (EC) numbers for an enzyme sequence. InterProScan and BLAST compare each sequence against reference databases. ProteInfer instead uses a deep convolutional neural network to predict protein function directly, without slow and costly alignments. This is especially useful for enzymes with multiple functions or domains.
Search antibodies in AbLang's embedding space, which captures structural similarities and biophysical properties beyond sequence alone. This approach lets you rapidly explore over 2 billion antibodies and uncover diverse biosimilar candidates with greater accuracy than traditional methods.
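Similarity search in an embedding space usually means ranking candidates by a vector similarity to a query embedding. The sketch below uses cosine similarity with brute-force top-k ranking over a small dictionary. The three-dimensional "embeddings" are hypothetical toy vectors. The endpoint does not describe how it actually searches 2 billion antibodies. At that scale, an approximate nearest-neighbor index would normally replace the linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)

def nearest_neighbors(query: Sequence[float],
                      library: Dict[str, Sequence[float]],
                      k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force top-k search: score every library entry, keep the best k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings; real antibody embeddings have far more dimensions.
library = {
    "ab_1": [0.9, 0.1, 0.0],
    "ab_2": [0.0, 1.0, 0.0],
    "ab_3": [0.7, 0.7, 0.0],
}
hits = nearest_neighbors([1.0, 0.0, 0.0], library, k=2)
print([name for name, _ in hits])  # ['ab_1', 'ab_3']
```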
ProGen2 Medium is designed for general protein sequence analysis and generation. Its mid-range parameter count balances capability against cost. This makes it suitable for a wide range of applications and efficient across many protein types.
Visualize the attention weights assigned to each amino acid in a sequence. Higher weights may indicate residues that are important for understanding the protein's properties.
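A per-residue attention score can be obtained by averaging the attention each position receives across all query positions and heads. The result can then be drawn as one bar per residue. The sketch below shows this as one simple aggregation choice. It is an assumption: the endpoint does not state which layers, heads, or aggregation it uses. The toy attention matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def attention_received(heads: List[Matrix]) -> List[float]:
    """Average attention each position receives.

    heads[h][i][j] is the weight query position i assigns to key position j
    in head h. Each column j is summed over all queries and heads, then
    divided by (number of heads * sequence length).
    """
    if not heads:
        raise ValueError("need at least one attention head")
    length = len(heads[0])
    totals = [0.0] * length
    for matrix in heads:
        for row in matrix:
            for j, weight in enumerate(row):
                totals[j] += weight
    return [t / (len(heads) * length) for t in totals]

def render(sequence: str, scores: List[float], width: int = 20) -> None:
    """Print one text bar per residue, scaled to the largest score."""
    top = max(scores) or 1.0
    for residue, score in zip(sequence, scores):
        print(f"{residue} {'#' * round(width * score / top)} {score:.3f}")

# Hypothetical attention from two heads over a 2-residue sequence.
head_a = [[0.5, 0.5], [0.0, 1.0]]
head_b = [[1.0, 0.0], [1.0, 0.0]]
scores = attention_received([head_a, head_b])
render("MK", scores)
```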
For multi-chain proteins such as antibodies, predict the folded structure in seconds with v2 of ESMFold. Speed and accuracy are similar to the single-chain folding endpoint, now extended to more complex inputs. Get predicted PDBs via REST in under a minute, using one of the largest protein LMs to date.
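For a multi-chain prediction, it is often useful to summarize confidence per chain. The sketch below reads the fixed-column ATOM records of a PDB file and averages the B-factor field of C-alpha atoms for each chain ID. It assumes, as ESMFold's PDB output does, that per-residue confidence (pLDDT) is written into the B-factor column. The scale (0 to 1 or 0 to 100) can differ between tools and versions, so check it against your own output. The `atom_line` helper only builds synthetic records for the example.

```python
from collections import defaultdict
from typing import Dict, List

def mean_bfactor_per_chain(pdb_text: str) -> Dict[str, float]:
    """Average the B-factor field of C-alpha ATOM records, grouped by chain ID.

    Uses the fixed PDB columns (1-based, inclusive): atom name 13-16,
    chain ID 22, B-factor (temperature factor) 61-66.
    """
    values: Dict[str, List[float]] = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, headers, TER, END, etc.
        if line[12:16].strip() != "CA":
            continue  # one value per residue: keep only C-alpha atoms
        values[line[21]].append(float(line[60:66]))
    return {chain: sum(v) / len(v) for chain, v in values.items()}

def atom_line(serial: int, name: str, chain: str, res_seq: int, b_factor: float) -> str:
    """Build a minimal synthetic ATOM record (alanine at the origin)."""
    return (f"{'ATOM':<6}{serial:>5} {name:<4} {'ALA':>3} {chain}{res_seq:>4}    "
            f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{b_factor:>6.2f}")

pdb = "\n".join([
    atom_line(1, " CA ", "A", 1, 80.0),
    atom_line(2, " CB ", "A", 1, 10.0),  # ignored: not a C-alpha atom
    atom_line(3, " CA ", "A", 2, 90.0),
    atom_line(4, " CA ", "B", 1, 60.0),
])
print(mean_bfactor_per_chain(pdb))  # {'A': 85.0, 'B': 60.0}
```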
A language model specialized for predicting variant effects. Ensembling the results of all five models enables state-of-the-art (SOTA) zero-shot prediction of the functional effects of sequence variants.
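A common zero-shot recipe scores a substitution in two steps. First, compute the log-likelihood ratio between the mutant and wild-type amino acids at the (masked) position. Then average that ratio across the models in the ensemble. The sketch below assumes each model has already produced log-probabilities for the site. The numbers are hypothetical, and the endpoint's exact scoring is not stated beyond "ensembling." Under this convention, a negative score means the model finds the mutant less likely than the wild type. This is often read as a sign of a potentially disruptive change.

```python
from typing import Dict, List

def log_likelihood_ratio(site_log_probs: Dict[str, float], wt: str, mt: str) -> float:
    """log p(mutant) - log p(wild type) at one masked position."""
    return site_log_probs[mt] - site_log_probs[wt]

def ensemble_score(per_model: List[Dict[str, float]], wt: str, mt: str) -> float:
    """Average the log-likelihood ratio across all models in the ensemble."""
    if not per_model:
        raise ValueError("need at least one model")
    ratios = [log_likelihood_ratio(lp, wt, mt) for lp in per_model]
    return sum(ratios) / len(ratios)

# Hypothetical log-probabilities for A (wild type) and G (mutant) from five models.
site = [
    {"A": -0.5, "G": -2.5},
    {"A": -1.0, "G": -2.0},
    {"A": -0.5, "G": -1.5},
    {"A": -1.0, "G": -3.0},
    {"A": -0.5, "G": -0.5},
]
print(ensemble_score(site, "A", "G"))  # -1.2
```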
Strengthen biosecurity assessments by classifying toxins and pathogenic sequences with an ESM-2 language model fine-tuned for toxin classification. The model's protein embeddings can also be used to train downstream models.
ProGen2 OAS is trained on immune repertoire data from the Observed Antibody Space (OAS) database, making it well suited to immunology research and antibody design. This specialized training improves generation accuracy for antibody sequences.
Predict structure using Meta's SOTA protein language model. Accuracy is comparable to AlphaFold2 and RoseTTAFold, with substantially faster inference. Get predicted PDBs via REST in seconds, using one of the largest protein LMs to date.
Uses protein language models to capture intricate protein properties, including structural aspects, as features for classification and regression models. Aimed at data scientists and bioinformaticians, this API can substantially improve machine learning results in protein analysis.
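Per-residue embeddings have one vector per amino acid, so proteins of different lengths produce matrices of different sizes. A standard way to turn them into fixed-length features for a classifier or regressor is to mean-pool over the residues. The sketch below assumes the API returns per-residue vectors. If it already returns pooled embeddings, this step is unnecessary. The two-dimensional embeddings are hypothetical.

```python
from typing import List

def mean_pool(per_residue: List[List[float]]) -> List[float]:
    """Collapse an (L x D) per-residue embedding into one D-dimensional vector."""
    if not per_residue:
        raise ValueError("empty embedding")
    dim = len(per_residue[0])
    pooled = [0.0] * dim
    for vec in per_residue:
        if len(vec) != dim:
            raise ValueError("inconsistent embedding dimension")
        for d, x in enumerate(vec):
            pooled[d] += x
    return [x / len(per_residue) for x in pooled]

# Two hypothetical proteins of different lengths with 2-dimensional embeddings.
embeddings = {
    "protein_1": [[1.0, 2.0], [3.0, 4.0]],
    "protein_2": [[0.0, 1.0], [2.0, 3.0], [7.0, 8.0]],
}
# One fixed-length feature row per protein, ready for a downstream model.
features = [mean_pool(e) for e in embeddings.values()]
print(features)  # [[2.0, 3.0], [3.0, 4.0]]
```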
Perform an in silico deep mutational scan with Evolutionary Scale Modeling (ESM), proposing single-point mutations that may stabilize molecules such as antibodies and enzymes.
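A single-point deep mutational scan covers every substitution at every position: 19 alternatives for each of the L residues, or 19 × L variants in total. The sketch below only enumerates those variants using the usual wild-type/position/mutant labels (for example, `K2R`). Each variant would then be scored by the model. That scoring step is not shown here.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def single_point_mutants(seq: str) -> List[Tuple[str, str]]:
    """Return (label, mutant_sequence) for every single-point substitution.

    Labels use 1-based positions, e.g. 'M1A'. A non-standard wild-type
    residue (such as 'X') gets all 20 substitutions at its position.
    """
    mutants = []
    for i, wt in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # skip the identity "mutation"
            mutants.append((f"{wt}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]))
    return mutants

scan = single_point_mutants("MKT")
print(len(scan))  # 57 (3 positions x 19 substitutions)
print(scan[0])    # ('M1A', 'AKT')
```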
Generate a fast, accurate contact map to visualize pairwise residue interactions within a protein sequence. The model predicts the likelihood of contact between every pair of amino acids in the protein.
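Residues are commonly considered in contact when their Cβ atoms (Cα for glycine) lie within 8 Å of each other. A predicted contact map is then an L × L matrix of contact probabilities. The sketch below turns such a matrix into a ranked list of contacts. It reads only the upper triangle and skips near-diagonal pairs, which are trivially close in sequence. The 0.5 threshold and minimum separation of 6 are illustrative assumptions, not the endpoint's settings.

```python
from typing import List, Tuple

def predicted_contacts(probs: List[List[float]], threshold: float = 0.5,
                       min_separation: int = 6) -> List[Tuple[int, int, float]]:
    """List (i, j, probability) for pairs with j - i >= min_separation
    and probability >= threshold, most confident first (0-based indices)."""
    length = len(probs)
    contacts = []
    for i in range(length):
        for j in range(i + min_separation, length):  # upper triangle only
            if probs[i][j] >= threshold:
                contacts.append((i, j, probs[i][j]))
    contacts.sort(key=lambda c: c[2], reverse=True)
    return contacts

# Hypothetical 8-residue probability matrix (symmetric, mostly zeros).
probs = [[0.0] * 8 for _ in range(8)]
probs[0][7] = probs[7][0] = 0.9
probs[1][7] = probs[7][1] = 0.6
probs[0][2] = probs[2][0] = 0.95  # high, but too close in sequence to count
print(predicted_contacts(probs))  # [(0, 7, 0.9), (1, 7, 0.6)]
```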
Predict a protein's biological roles as Gene Ontology (GO) terms using neural networks, bypassing traditional sequence comparison. ProteInfer GO offers rapid, alignment-free predictions, which are especially useful for complex proteins involved in multiple processes.
RFdiffusion is a protein structure generation model with capabilities in motif scaffolding, unconditional protein generation, symmetric unconditional generation, symmetric motif scaffolding, binder design, and design diversification.
Extract dense representations of DNA sequences from an upgraded version of DNABERT, pretrained on large-scale multi-species genome data. These representations can be used for visualization, sequence comparison, and downstream modeling.
Extract dense representations of DNA sequences from a BERT model pretrained on the human genome. These representations can be used for visualization, sequence comparison, and downstream modeling. They remain meaningful even for organisms outside the pretraining data.
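The original DNABERT splits DNA into overlapping k-mers, with k between 3 and 6, before passing the tokens to the BERT encoder. The sketch below shows only that tokenization step. It assumes this endpoint follows the original k-mer scheme, which the description does not confirm. The encoder and its embeddings are not reproduced here.

```python
from typing import List

def kmer_tokenize(seq: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    A sequence of length n yields n - k + 1 tokens; shorter sequences yield none.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

print(kmer_tokenize("acgtac", k=3))  # ['ACG', 'CGT', 'GTA', 'TAC']
```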
Compute advanced embeddings from antibody sequences, providing rich features for machine learning and clustering tasks. Suited to nuanced antibody analysis, these embeddings support classification and regression modeling in AI-based antibody development.
Reconstruct the start and end of antibody sequences from NGS data, with or without alignment to known germlines. This helps ensure complete and accurate antibody profiling for immunology research and therapeutic development.