Learning Pathway Gallantries Grant - Intellectual Output 2 - Large-scale data analysis, and introduction to visualisation and data modelling
Date: No date given
This Learning Pathway collects the results of Intellectual Output 2 in the Gallantries Project.
Keywords: beginner, data-science, galaxy-interface, microbiome, variant-analysis, visualisation
Learning objectives:
- Analyze and preprocess Nanopore reads
- Apply ML techniques to analyse your own datasets
- Be able to write simple shell scripts for running multiple workflows concurrently or sequentially.
- Build complex and customized plots from data in a data frame.
- Check quality reports generated by FastQC and NanoPlot for metagenomics Nanopore data
- Create a number of Circos plots using the Galaxy tool
- Create clean, non-repetitive workflows
- Describe what faceting is and apply faceting in ggplot.
- Familiarise yourself with the various track types
- Filter, annotate and report lists of variants
- Identify pathogens from the virulence factor gene products found via assembly, identify strains, and detect all antimicrobial resistance genes in the samples
- Identify pathogens via SNP calling and build consensus genomes of the samples
- Identify yeast species contained in a sequenced beer sample using DNA
- Inspect metagenomics data
- Interpret and visualize the results obtained from ML analyses on omics datasets
- Learn about the SRA aligned-read format and VCF files for Runs containing SARS-CoV-2 content
- Learn about the Rule Based Uploader
- Learn even more about the Rule Based Uploader
- Learn how to change a workflow using the workflow editor
- Learn how to extract a workflow from a Galaxy history
- Learn how to use Pangolin to assign annotated variants to lineages.
- Learn how to use Workflow Parameters to improve your Workflows
- Learn to use the `planemo run` subcommand to run workflows from the command line.
- Modify the aesthetics of an existing ggplot plot (including axis labels and color).
- Perform taxonomic profiling, identifying and visualizing taxa down to species level in the samples
- Perform variant linkage analyses for phenotypically selected recombinant progeny
- Plot an E. coli genome in Galaxy
- Preprocess the sequencing data to remove adapters, poor quality base content and host/contaminating reads
- Produce scatter plots, boxplots, and time series plots using ggplot.
- Relate the pathogenic genes across all samples to track pathogens via phylogenetic trees and heatmaps
- Run metagenomics tools
- Set universal plot settings.
- Understand and master dataset collections
- Understand the differences between categories of ML algorithms and the kinds of problems to which they can be applied
- Understand different applications of ML in different -omics studies
- Understand how to search the metadata for these Runs to find your dataset of interest and then import that data in your preferred format
- Understand key aspects of workflows
- Understand the ML taxonomy and the commonly used machine learning algorithms for analysing -omics data
- Use Kraken2 to assign taxonomic labels
- Use Nanopore data for studying soil metagenomics
- Use joint variant calling and extraction to facilitate variant comparison across samples
- Use some basic, widely used R packages for ML
- Use the scientific library matplotlib to explore tabular datasets
- Visualize the microbiome community of a beer sample
- Add tracks for the annotations, sequencing data, and variants to a genome plot.
Event types:
- Workshops and courses
Scientific topics: Metagenomics, Microbial ecology, Taxonomy, Sequence analysis, Metabarcoding, Public health and epidemiology, Sequence assembly, Pathology