Date: No date given

This Learning Pathway collects the results of Intellectual Output 2 in the Gallantries Project.

Keywords: beginner, data-science, galaxy-interface, microbiome, variant-analysis, visualisation

Learning objectives:

  • Analyze and preprocess Nanopore reads
  • Apply ML techniques to analyse your own datasets
  • Be able to write simple shell scripts for running multiple workflows concurrently or sequentially.
  • Build complex and customized plots from data in a data frame.
  • Check quality reports generated by FastQC and NanoPlot for metagenomics Nanopore data
  • Create a number of Circos plots using the Galaxy tool
  • Create clean, non-repetitive workflows
  • Describe what faceting is and apply faceting in ggplot.
  • Familiarise yourself with the various different track types
  • Filter, annotate and report lists of variants
  • Identify pathogens based on the found virulence factor gene products via assembly, identify strains and indicate all antimicrobial resistance genes in samples
  • Identify pathogens via SNP calling and build the consensus genome of the samples
  • Identify yeast species contained in a sequenced beer sample using DNA
  • Inspect metagenomics data
  • Interpret and visualize the results obtained from ML analyses on omics datasets
  • Learn about SRA aligned read format and vcf files for Runs containing SARS-CoV-2 content
  • Learn about the Rule Based Uploader
  • Learn even more about the Rule Based Uploader
  • Learn how to change a workflow using the workflow editor
  • Learn how to extract a workflow from a Galaxy history
  • Learn how to use Pangolin to assign annotated variants to lineages.
  • Learn how to use Workflow Parameters to improve your Workflows
  • Learn to use the planemo run subcommand to run workflows from the command line.
  • Modify the aesthetics of an existing ggplot plot (including axis labels and color).
  • Perform taxonomy profiling of the samples, identifying and visualizing taxa up to species level
  • Perform variant linkage analyses for phenotypically selected recombinant progeny
  • Plot an E. coli genome in Galaxy
  • Preprocess the sequencing data to remove adapters, poor quality base content and host/contaminating reads
  • Produce scatter plots, boxplots, and time series plots using ggplot.
  • Relate all samples' pathogenic genes for tracking pathogens via phylogenetic trees and heatmaps
  • Run metagenomics tools
  • Set universal plot settings.
  • Understand and master dataset collections
  • Understand the differences between categories of ML algorithms and the kinds of problems each can be applied to
  • Understand different applications of ML in different -omics studies
  • Understand how to search the metadata for these Runs to find your dataset of interest and then import that data in your preferred format
  • Understand key aspects of workflows
  • Understand the ML taxonomy and the commonly used machine learning algorithms for analysing -omics data
  • Use Kraken2 to assign taxonomic labels
  • Use Nanopore data for studying soil metagenomics
  • Use joint variant calling and extraction to facilitate variant comparison across samples
  • Use some basic, widely used R packages for ML
  • Use the scientific library matplotlib to explore tabular datasets
  • Visualize the microbiome community of a beer sample
  • Add tracks for the annotations, sequencing data, and variants to a genome plot

Event types:

  • Workshops and courses

Scientific topics: Metagenomics, Microbial ecology, Taxonomy, Sequence analysis, Metabarcoding, Public health and epidemiology, Sequence assembly, Pathology

