Date: 18 - 20 February 2020

This workshop covers the use of the PDBe graph database to extract data for solving complex structural biology queries. It will introduce the PDBe graph database and how to write Cypher queries to retrieve data of interest. Workshop participants will be able to use the graph database to explore data relevant to their own research with support and guidance from the development team at PDBe.

The graph database integrates annotations provided by PDBe-KB partners and is implemented in Neo4j. In this graph, each PDB entry is represented as a tree. The root is the PDB entry, which is connected to its chains and entities, which are in turn connected to residues. Each of the more than 150 million PDB residues is linked to the available annotations (for example, whether the residue is part of a catalytic site or lies on a macromolecular interaction interface). Each residue is also directly connected to its corresponding UniProt residue. Storing PDBe-KB data as a graph offers significant benefits. In particular, it allows straightforward transfer of annotations between PDB entries that map to the same UniProt accession, and to UniProt accessions with high sequence identity.
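The annotation transfer described above can be illustrated with a minimal in-memory sketch in plain Python. This is not the PDBe graph database or its Cypher interface. The dictionaries, identifiers (`entry_1`, `Q_EXAMPLE`, the annotation labels) and function names are hypothetical and chosen for illustration only. The sketch shows the core idea: PDB residues from different entries that point to the same UniProt residue can share each other's annotations. It covers only the same-accession case, not transfer across highly identical accessions.

```python
from collections import defaultdict

# Hypothetical toy data: PDB entry -> chain -> list of
# (PDB residue number, UniProt accession, UniProt residue position).
PDB_ENTRIES = {
    "entry_1": {"A": [(10, "Q_EXAMPLE", 25), (11, "Q_EXAMPLE", 26), (12, "Q_EXAMPLE", 27)]},
    "entry_2": {"B": [(5, "Q_EXAMPLE", 25), (6, "Q_EXAMPLE", 26)]},
}

# Hypothetical annotations attached to individual PDB residues.
ANNOTATIONS = {
    ("entry_1", "A", 11): {"catalytic_site"},
    ("entry_2", "B", 5): {"interaction_interface"},
}


def uniprot_index(entries):
    """Group PDB residues by the UniProt residue they map to."""
    index = defaultdict(list)
    for entry_id, chains in entries.items():
        for chain_id, residues in chains.items():
            for res_num, accession, position in residues:
                index[(accession, position)].append((entry_id, chain_id, res_num))
    return index


def transferred_annotations(entries, annotations, target):
    """Collect annotations for `target` from every PDB residue mapped to the same UniProt residue."""
    entry_id, chain_id, res_num = target
    # Find the UniProt residue that the target PDB residue maps to.
    for r, accession, position in entries[entry_id][chain_id]:
        if r == res_num:
            key = (accession, position)
            break
    else:
        return set()  # target residue has no UniProt mapping in the toy data
    collected = set()
    for pdb_residue in uniprot_index(entries)[key]:
        collected |= annotations.get(pdb_residue, set())
    return collected


if __name__ == "__main__":
    # Residue 6 of entry_2 inherits the catalytic-site label from entry_1, residue 11.
    print(transferred_annotations(PDB_ENTRIES, ANNOTATIONS, ("entry_2", "B", 6)))
```

In the graph database, the same lookup corresponds to following the edge from a PDB residue to its UniProt residue and back out to other PDB residues. This avoids building an index by hand as the sketch does.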

Read the database schema here.

Keywords: Proteins, Structures

Venue: European Bioinformatics Institute, Hinxton

Region: Cambridge

Country: United Kingdom

Postcode: CB10 1SD

Target audience: This workshop is aimed at bioinformaticians with experience of analysing data from the PDB, either by processing archive files or via API access. We encourage applications from individuals with specific questions relating to PDB data that are difficult to answer using existing data queries. Programming experience is required. Familiarity with Python is preferred, although this is not an absolute requirement. An example use case might involve research into a specific drug molecule, where protein structure is relevant to drug specificity. The graph database would allow the analysis of all common interaction sites in the PDB at the residue level, with the potential to expand this search to ligands containing similar fragments. Additional searches could analyse the protein-protein interaction sites of different isoforms of the same protein and cross-reference them with sequence conservation data and predicted functional annotations. When applying, researchers should submit a 200-word abstract describing their work and potential queries related to PDB data. This should include details of how they have previously accessed PDB data and the types of questions they are trying to answer.

Capacity: 10
