Sadie Antibody (SADIE, the Sequencing Analysis and Data Library for Immunoinformatics Exploration) provides AIRR-compliant annotation, identifying CDRs, framework regions, somatic hypermutation rates, and V(D)J segment usage directly from raw antibody sequence data. The API outputs structured AIRR tables, which feed downstream clustering, lineage analysis, and antibody engineering workflows. Typical use cases include antibody discovery, repertoire analysis, and bioinformatics pipelines for therapeutic antibody optimization.
Predict
Predict properties or scores for input sequences
- POST /api/v3/sadie-antibody/predict/
Predict endpoint for Sadie Antibody.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, required) — Configuration parameters:
region_assign (enum: [imgt, kabat, chothia, abm, contact, scdr], default: “imgt”, optional) — Region definition
scheme (enum: [imgt, kabat, chothia], default: “chothia”, optional) — Numbering scheme
scfv (boolean, default: false) — Whether to allow single-chain Fv
allowed_chain (array of strings, default: [“H”, “K”, “L”]) — Must be a subset of [L, H, K, A, B, G, D]
items (array of objects, min: 1, max: 8, required) — Input sequences:
sequence (string, min length: 1, max length: 2048, required) — Protein sequence with extended amino acid codes
Example request:
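A minimal sketch of how a request matching the schema above might be assembled in Python, using only the standard library. The base URL, the helper names `build_payload` and `build_request`, and the sample sequence fragment are assumptions made for illustration; only the endpoint path, headers, and field names come from this page.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the host of your deployment.
BASE_URL = "https://biolm.ai"
ENDPOINT = "/api/v3/sadie-antibody/predict/"


def build_payload(sequences, scheme="chothia", region_assign="imgt",
                  scfv=False, allowed_chain=("H", "K", "L")):
    """Assemble a request body that follows the documented schema."""
    return {
        "params": {
            "region_assign": region_assign,
            "scheme": scheme,
            "scfv": scfv,
            "allowed_chain": list(allowed_chain),
        },
        "items": [{"sequence": s} for s in sequences],
    }


def build_request(payload, api_key, base_url=BASE_URL):
    """Create (but do not send) a POST request with the documented headers."""
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )


# Short heavy-chain-like fragment used purely as a placeholder input.
payload = build_payload(["EVQLVESGGGLVQPGGSLRLSCAAS"])
request = build_request(payload, "YOUR_API_KEY")
print(json.dumps(payload, indent=2))
# To send: pass `request` to urllib.request.urlopen and json.load the body.
```

The defaults in `build_payload` mirror the documented parameter defaults, so passing only a list of sequences should produce a valid body under these assumptions.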
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
domain_no (integer) — Numerical domain index
hmm_species (string) — Species name from HMM alignment
chain_type (string) — Single-letter chain identifier
e_value (float) — Statistical E-value
score (float) — Alignment score
identity_species (string) — Closest species by identity
v_gene (string) — Top V gene call
v_identity (float) — V gene percent identity
j_gene (string) — Top J gene call
j_identity (float) — J gene percent identity
Chain (string) — Chain label
Numbering (array of integers, length ≤ 2048) — Residue index mapping
Insertion (array of strings, length ≤ 2048) — Insertions at each residue index
scheme (string) — Numbering scheme (e.g. “chothia”, “kabat”, “imgt”)
region_definition (string) — Region definition (e.g. “imgt”, “kabat”, “chothia”, “abm”, “contact”, “scdr”)
fwr1_aa_gaps (string) — Amino acids of FWR1 segment with gaps
fwr1_aa_no_gaps (string) — Amino acids of FWR1 segment without gaps
cdr1_aa_gaps (string) — Amino acids of CDR1 segment with gaps
cdr1_aa_no_gaps (string) — Amino acids of CDR1 segment without gaps
fwr2_aa_gaps (string) — Amino acids of FWR2 segment with gaps
fwr2_aa_no_gaps (string) — Amino acids of FWR2 segment without gaps
cdr2_aa_gaps (string) — Amino acids of CDR2 segment with gaps
cdr2_aa_no_gaps (string) — Amino acids of CDR2 segment without gaps
fwr3_aa_gaps (string) — Amino acids of FWR3 segment with gaps
fwr3_aa_no_gaps (string) — Amino acids of FWR3 segment without gaps
cdr3_aa_gaps (string) — Amino acids of CDR3 segment with gaps
cdr3_aa_no_gaps (string) — Amino acids of CDR3 segment without gaps
fwr4_aa_gaps (string) — Amino acids of FWR4 segment with gaps
fwr4_aa_no_gaps (string) — Amino acids of FWR4 segment without gaps
leader (string) — Residues before alignment start
follow (string) — Residues after alignment end
Example response:
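A hedged sketch of reading a response of the shape documented above. The `sample_response` dictionary is a hand-written placeholder with shortened, illustrative strings, not real API output. The helper names are hypothetical, and the handling of `Numbering` and `Insertion` assumes the two arrays are parallel and that a blank insertion entry means no insertion code.

```python
REGIONS = ("fwr1", "cdr1", "fwr2", "cdr2", "fwr3", "cdr3", "fwr4")


def regions_without_gaps(result):
    """Map each region name to its ungapped amino acid string."""
    return {name: result.get(f"{name}_aa_no_gaps", "") for name in REGIONS}


def position_labels(result):
    """Pair Numbering with Insertion to build labels such as '52A'.

    Assumption: the arrays are parallel, and a blank insertion entry
    means the position carries no insertion code.
    """
    labels = []
    for number, insertion in zip(result["Numbering"], result["Insertion"]):
        labels.append(f"{number}{insertion.strip()}")
    return labels


# Hand-written placeholder, NOT real API output; values are illustrative.
sample_response = {
    "results": [
        {
            "chain_type": "H",
            "scheme": "chothia",
            "region_definition": "imgt",
            "v_gene": "IGHV-PLACEHOLDER",
            "j_gene": "IGHJ-PLACEHOLDER",
            "cdr1_aa_no_gaps": "GFTF",
            "cdr2_aa_no_gaps": "ISGS",
            "cdr3_aa_no_gaps": "ARDY",
            "Numbering": [51, 52, 52, 53],
            "Insertion": ["", "", "A", ""],
        }
    ]
}

for result in sample_response["results"]:
    regions = regions_without_gaps(result)
    cdrs = {k: v for k, v in regions.items() if k.startswith("cdr")}
    print(result["chain_type"], cdrs)
    print(position_labels(result))
```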
Performance
Sadie Antibody API provides high-throughput, GPU-accelerated antibody sequence numbering and annotation, suitable for large-scale immunoinformatics workflows.
Implements a Hidden Markov Model (HMM)-based approach for antibody numbering, significantly outperforming traditional BLAST-based annotation methods (such as IgBLAST) in terms of speed and scalability.
Typical runtime per antibody sequence is under 1 second, enabling rapid annotation of large antibody datasets.
Offers multiple numbering schemes (IMGT, Kabat, Chothia), with the Chothia scheme providing optimal balance between accuracy and computational efficiency.
Region definitions (IMGT, Kabat, Chothia, ABM, Contact, SCDR) are supported, allowing flexible annotation tailored to downstream analysis needs.
Compared to similar numbering and annotation algorithms (e.g., ANARCI), Sadie Antibody API achieves comparable or better accuracy with significantly improved computational efficiency due to GPU acceleration and optimized implementation.
Sadie Antibody API is particularly optimized for antibody engineering workflows, providing accurate identification of framework and CDR regions critical for antibody design, maturation, and optimization tasks.
Input type: amino acid sequences (1 to 8 per request, each up to 2048 residues); Output type: numbered antibody sequences with annotated framework and CDR regions, V/J gene assignments, and alignment metrics.
BioLM’s deployment of Sadie Antibody API leverages GPU acceleration and optimized software architecture, ensuring consistent high performance and scalability for large-scale antibody informatics pipelines.
Applications
Antibody sequence annotation for rapid identification of complementarity-determining regions (CDRs) and framework regions, enabling efficient antibody engineering and optimization by accurately mapping functional domains for affinity maturation or humanization workflows
Clustering antibody sequences based on CDR similarity to identify related antibody families, enabling streamlined selection of candidate antibodies for therapeutic development and optimization, though not optimal for datasets with highly divergent sequences
Standardized AIRR-compliant annotation output for antibody sequences, facilitating interoperability and data sharing between bioinformatics pipelines and immunoinformatics tools, essential for reproducible antibody discovery and characterization workflows
Antibody sequence numbering using common schemes (Chothia, Kabat, IMGT) to enable consistent residue-level comparisons across diverse antibody libraries, critical for accurate structural modeling, mutational analysis, and patent filings, but limited to standard antibody formats and not suitable for unconventional antibody-like scaffolds
Generation of structured, annotated antibody sequence objects (ReceptorChain) that encapsulate detailed annotations (e.g., germline assignments, CDR definitions, alignment scores), enabling efficient downstream computational analyses such as antibody repertoire profiling or machine learning-based antibody design, although not intended for direct structural prediction tasks
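The CDR-based clustering mentioned above could be sketched as follows. This is a deliberately simple greedy grouping (same V gene, same CDR3 length, Hamming identity to the first cluster member), chosen for illustration; it is an assumption, not the method used by SADIE or BioLM. The records are hand-written placeholders that reuse the `v_gene` and `cdr3_aa_no_gaps` field names from the response schema.

```python
from collections import defaultdict


def hamming_identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_by_cdr3(results, min_identity=0.8):
    """Greedy clustering within (V gene, CDR3 length) buckets.

    A record joins the first cluster whose first member has CDR3
    identity >= min_identity; otherwise it starts a new cluster.
    Returns lists of indices into `results`.
    """
    buckets = defaultdict(list)
    for index, result in enumerate(results):
        key = (result["v_gene"], len(result["cdr3_aa_no_gaps"]))
        buckets[key].append(index)

    clusters = []
    for indices in buckets.values():
        bucket_clusters = []
        for index in indices:
            cdr3 = results[index]["cdr3_aa_no_gaps"]
            for cluster in bucket_clusters:
                representative = results[cluster[0]]["cdr3_aa_no_gaps"]
                if hamming_identity(cdr3, representative) >= min_identity:
                    cluster.append(index)
                    break
            else:
                bucket_clusters.append([index])
        clusters.extend(bucket_clusters)
    return clusters


# Hand-written placeholder records, not real API output.
records = [
    {"v_gene": "V-A", "cdr3_aa_no_gaps": "ARDYGSSWYF"},
    {"v_gene": "V-A", "cdr3_aa_no_gaps": "ARDYGSSWYY"},
    {"v_gene": "V-A", "cdr3_aa_no_gaps": "AKGGTTLLPQ"},
    {"v_gene": "V-B", "cdr3_aa_no_gaps": "ARDYGSSWYF"},
]
print(cluster_by_cdr3(records))  # → [[0, 1], [2], [3]]
```

Because the grouping requires equal CDR3 lengths and a shared V gene call, highly divergent sequences fall into singleton clusters, which is consistent with the caveat in the list above.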
Limitations
Maximum Sequence Length: The maximum allowed sequence length is 2048 amino acids. Longer sequences must be truncated or split into smaller segments before submission.
Batch Size: The API supports a maximum batch size of 8 sequences per request. Larger datasets must be processed in multiple batches.
Supported Numbering Schemes: SADIE supports the numbering schemes imgt, kabat, and chothia. Alternative numbering schemes are not supported.
Region Definitions: Region definitions are limited to imgt, kabat, chothia, abm, contact, and scdr. Custom or alternative region definitions cannot be used.
Chain Type Constraints: The API supports chains H, K, L, A, B, G, and D. Other chain types or non-standard antibody formats might not be optimally annotated.
Species and Germline Database: SADIE uses a predefined germline database primarily optimized for human and common model organisms. Custom species or unusual germline configurations may not be accurately annotated.
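The length and batch-size limits above can be enforced client-side before any request is sent. The sketch below is one way to do that; the function names are hypothetical, and only the two numeric limits are taken from this page.

```python
MAX_SEQUENCE_LENGTH = 2048  # documented per-sequence limit
MAX_BATCH_SIZE = 8          # documented per-request limit


def validate_sequence(sequence):
    """Raise ValueError if a sequence violates the documented length bounds."""
    if not 1 <= len(sequence) <= MAX_SEQUENCE_LENGTH:
        raise ValueError(
            f"sequence length {len(sequence)} outside 1..{MAX_SEQUENCE_LENGTH}"
        )
    return sequence


def batch_items(sequences, batch_size=MAX_BATCH_SIZE):
    """Yield lists of request items, each with at most batch_size sequences."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    checked = [validate_sequence(s) for s in sequences]
    for start in range(0, len(checked), batch_size):
        yield [{"sequence": s} for s in checked[start:start + batch_size]]


# 19 placeholder sequences split into batches of at most 8.
batches = list(batch_items(["ACDEFGHIK"] * 19))
print([len(b) for b in batches])  # → [8, 8, 3]
```

Each yielded list can be used directly as the `items` array of one request.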
How We Use It
The Sadie Antibody algorithm enables BioLM to efficiently annotate, number, and cluster antibody sequences, streamlining antibody design and optimization workflows. By providing standardized antibody annotation consistent with AIRR guidelines, Sadie integrates seamlessly into our broader protein engineering pipelines, enhancing the accuracy and consistency of candidate selection. BioLM leverages Sadie Antibody to rapidly filter and rank antibody sequences based on precise CDR definitions, enabling accelerated antibody maturation cycles and reducing the time and cost associated with antibody discovery.
Integrates directly with BioLM’s predictive modeling and generative AI services, enabling end-to-end antibody optimization.
Accelerates research outcomes by quickly identifying lead candidates ready for synthesis and laboratory validation.
References
Walker, L. M., Phogat, S. K., Chan-Hui, P. Y., Wagner, D., Phung, P., Goss, J. L., Wrin, T., Simek, M. D., Fling, S., Mitcham, J. L., Lehrman, J. K., Priddy, F. H., Olsen, O. A., Frey, S. M., Hammond, P. W., Kaminsky, S., Zamb, T., Moyle, M., Koff, W. C., Poignard, P., & Burton, D. R. (2009). Broad and potent neutralizing antibodies from an African donor reveal a new HIV-1 vaccine target. Science, 326(5950), 285-289.
Deli, A., Kurella, V. B., & Kelsoe, G. (2020). HuGL mouse models for the study of human antibody repertoires. Frontiers in Immunology, 11, 1947.
Dunbar, J., & Deane, C. M. (2016). ANARCI: antigen receptor numbering and receptor classification. Bioinformatics, 32(2), 298-300.
Martin, A. C. R. (2023). Antibody Numbering and CDR Definitions. Bioinformatics Group, University College London.
Lefranc, M.-P., Giudicelli, V., Duroux, P., Jabado-Michaloud, J., Folch, G., Aouinti, S., Carillon, E., Duvergey, H., Houles, A., Paysan-Lafosse, T., Hadi-Saljoqi, S., Sasorith, S., Lefranc, G., & Kossida, S. (2015). IMGT®, the international ImMunoGeneTics information system® 25 years on. Nucleic Acids Research, 43(D1), D413-D422.
Ye, J., Ma, N., Madden, T. L., & Ostell, J. M. (2013). IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Research, 41(W1), W34-W40.
Vander Heiden, J. A., Marquez, S., Marthandan, N., Bukhari, S. A. C., Busse, C. E., Corrie, B., Hershberg, U., Kleinstein, S. H., Matsen, F. A., Ralph, D. K., Rosenfeld, A. M., Schramm, C. A., Christley, S., & Laserson, U. (2018). AIRR Community Standardized Representations for Annotated Immune Repertoires. Frontiers in Immunology, 9, 2206.
Willis, J. R., Sincomb, T., & Kibet, C. K. (2023). SADIE: Sequencing Analysis and Data Library for Immunoinformatics Exploration. GitHub Repository.
