Date: 8 December 2022

UniProt is a high-quality, comprehensive protein resource whose core activity is the expert review and annotation of proteins with experimentally investigated functions. At the same time, the UniProt database contains large numbers of proteins that are predicted from gene models to exist but have no associated experimental evidence indicating their function. UniProt commits significant resources to developing computational methods that annotate the function of these predicted proteins, based on the data in entries that have gone through the expert review process.

We will describe the two main automated annotation systems currently in use. The first is UniRule, an established UniProt system in which curators manually develop annotation rules. The second is ARBA (Association-Rule-Based Annotator), a multi-class learning system that uses rule-mining techniques to generate concise annotation models. ARBA employs a data exclusion algorithm that censors data unsuitable for computational annotation, and it generates human-readable rules for each UniProt release. As part of our engagement with the machine learning community, we will also introduce ProtNLM (Protein Natural Language Model), a contribution from Google Research that annotates proteins with "uncharacterised" names.
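To make the idea of association-rule mining concrete, the sketch below mines rules of the form "if a protein has all of these features, assign this annotation" and keeps only those meeting a support and confidence threshold. This is a generic support/confidence sketch, not ARBA's actual algorithm: the training data, feature names, thresholds and the `mine_rules`/`prune_redundant` helpers are all hypothetical, and ARBA's data exclusion step is not modelled.

```python
from itertools import combinations

# Hypothetical toy training set: each reviewed protein is a pair of
# (features such as InterPro signatures and taxon, curated annotations).
proteins = [
    ({"IPR000001", "IPR000002", "Bacteria"}, {"GO:0005524"}),
    ({"IPR000001", "IPR000002", "Bacteria"}, {"GO:0005524"}),
    ({"IPR000001", "Archaea"}, {"GO:0005524"}),
    ({"IPR000002", "Bacteria"}, set()),
]


def mine_rules(data, min_support=2, min_confidence=0.9, max_len=2):
    """Return (conditions, annotation, support, confidence) tuples."""
    features = sorted(set().union(*(f for f, _ in data)))
    annotations = sorted(set().union(*(a for _, a in data)))
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(features, size):
            cond = frozenset(combo)
            # Annotation sets of every protein that satisfies all conditions.
            covered = [a for f, a in data if cond <= f]
            if len(covered) < min_support:
                continue
            for ann in annotations:
                hits = sum(ann in a for a in covered)
                conf = hits / len(covered)
                if hits >= min_support and conf >= min_confidence:
                    rules.append((cond, ann, hits, conf))
    return rules


def prune_redundant(rules):
    """Drop rules whose conditions strictly contain another rule's, for the same annotation."""
    return [r for r in rules
            if not any(o[1] == r[1] and o[0] < r[0] for o in rules)]


rules = mine_rules(proteins)
concise = prune_redundant(rules)
for cond, ann, support, conf in concise:
    print(sorted(cond), "->", ann, f"support={support} confidence={conf:.2f}")
```

On this toy data three rules pass the thresholds, and pruning keeps only the most general one (IPR000001 implies GO:0005524), which illustrates in miniature why a concise, human-readable rule set is preferable to many overlapping rules.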

We will also introduce UniFIRE, an open-source software tool that enables researchers to annotate their own protein datasets using the above-mentioned annotation systems. To make the tool easy to download and set up, we have containerised UniFIRE together with all its dependencies and the latest set of UniRule and ARBA rules. In this webinar, we will show how to create functional predictions for protein sequences using this container image.
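The core step of applying a rule set to your own sequences can be pictured as matching each protein's features against each rule's conditions. The sketch below is illustrative only and does not reproduce UniFIRE's real rule format, input files or command line: it assumes features such as InterPro signature matches and taxonomy have already been computed for each sequence, and the `Rule` class, rule IDs, accessions and the `annotate` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str              # hypothetical rule identifier
    conditions: frozenset     # every condition must be present for the rule to fire
    annotations: tuple        # annotations propagated when the rule fires


def annotate(proteins, rules):
    """Map each protein ID to the sorted annotations of every rule it satisfies."""
    result = {}
    for acc, features in proteins.items():
        hits = set()
        for rule in rules:
            if rule.conditions <= features:
                hits.update(rule.annotations)
        result[acc] = sorted(hits)
    return result


# Hypothetical rules and a hypothetical user dataset with precomputed features.
rules = [
    Rule("RULE-001", frozenset({"IPR000001"}), ("GO:0005524",)),
    Rule("RULE-002", frozenset({"IPR000002", "Bacteria"}), ("EC 1.1.1.1",)),
]
my_proteins = {
    "seq1": {"IPR000001", "Bacteria"},
    "seq2": {"IPR000002", "Bacteria"},
    "seq3": {"IPR000003"},
}

predictions = annotate(my_proteins, rules)
for acc, anns in predictions.items():
    print(acc, anns or "no prediction")
```

Sequences that match no rule, such as `seq3` here, simply receive no prediction rather than a guessed one.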

Keywords: Fermentation, Microbial ecosystems webinar, UniProt: The Universal Protein Resource, Proteins (proteins), BLAST, Open Targets Platform, Cross domain (cross-domain), Chemical biology (chemical-biology), Drug discovery, Drug target identification, UniRule, ARBA, Automated annotation, MetaboLights: Metabolomics repository and reference database, Chemical Entities of Biological Interest, ChEBI, Metabolites, Molecular building blocks of life, Human Cell Atlas Data Coordination Platform, Single-cell transcriptomics, HCA data portal, Programmatic access, API, Python, Complex Portal, macromolecular assembly, InterPro, Boolean modelling, Europe PubMed Central, Literature (literature), Open access, Protein Data Bank in Europe - Knowledge Base, 3D structure, AlphaFold Database, DeepMind, Artificial intelligence, AI, Structure prediction, cancer, Boolean, Ensembl Genomes, DNA & RNA (dna-rna), European Nucleotide Archive, Data archive, Raw sequencing data, RNAcentral, Non-coding RNA, ncRNA, GPU, Data protection, Job dispatcher, Bioimage analysis resource, Accessibility, Missense variation, Biostatistics, Rfam, non-coding RNA, Infernal software, Sequence annotation, Root microbiome, Abiotic stress, land management, Plant genotype, Plant webinar series, HPC, database development, cross-linked databases, Plant database, data infrastructure, Plant breeding, Data standards, data management, data sharing, Hyb-Seq method, Flowering plants, Crop improvement, Pangenomics, Pangenomes, Virtual humans, Drug-target identification, plant-microbe interactions, Spatial transcriptomics, Plant research, Drug targets, Machine learning, Mathematical modelling, plant science, Data integration, plant-environment interaction, Phenotyping, field phenotyping, Deep phenotyping, EOSC-Life, NHGRI-EBI GWAS Catalog, clinical data, genome-wide association, Proteomics, Proteomes, Peptide search, plants, European Variation Archive, EVA, Variant clusters, Variant data annotation, Constraint-based metabolic modelling, UniProt knowledgebase, protein variant impact, disease-associated protein variants, Bioethics, FAIR principles, ELSI, cohort data, translational research, BioModels database, Mathematical modeling, Reproducibility, Systems biology models, workflows, federated analysis, polygenic risk scores, IntAct Molecular Interaction Database, PSICQUIC, IMEx, Complex portal, Agent-based modelling, Macrophages, Tumorigenesis, Ensembl, Training (Training), On-demand, teaching, introduction, Building blocks, Data analysis, COSMIC, Cancer mutation, Somatic mutation, UniFIRE, Protein annotation

Organizer: European Bioinformatics Institute (EBI)

Target audience: Plant research

Capacity: 1000

Event types:

  • Workshops and courses

