Statistics and Machine Learning for Life Sciences

Overview

Statistics is integral to scientific research, particularly in the life sciences, which rely heavily on quantitative methods. Among other things, it is an essential tool for gaining insight into the relationships between biological measurements and variables.

Machine learning (ML) also helps make sense of large and complex datasets, and can be very useful for mining biological data to uncover insights that advance the field of bioinformatics.

This course guides participants through the concepts of statistical modelling while relating and contrasting them with machine learning approaches to both classification and regression.

Particular attention will be given to evaluating the relevance of the resulting models and to interpreting them in order to gain new biological knowledge.

Audience

This course is intended for life scientists who want a better understanding of these methods and of how to apply them to their own datasets.

Learning outcomes

At the end of the course, the participants will be able to:
* perform linear and logistic regressions, and critically evaluate their results
* describe the general Machine Learning data analysis pipeline
* implement a classification task and appraise the resulting model
* contrast the statistical and Machine Learning approaches to regression, and choose the one most appropriate to their question.

Prerequisites

Knowledge / competencies

The course is aimed at life scientists who are already familiar with the Python programming language and have a basic knowledge of statistics. The required competencies and knowledge correspond to those taught in courses such as: First Steps with Python in Life Sciences, Introduction to statistics with Python and Introduction to statistics with R.

Before applying to this course, please self-assess your Python and statistics skills using the quiz here.

Technical

You are required to have your own computer with an internet connection and the following tools installed PRIOR to the course: tools to be installed.

Schedule

Day 1
* Warm-up: loading and plotting data with Python
* Linear modelling: ordinary least squares, from fitting to model comparison
* Logistic regression and Generalized Linear Models (GLM): from regression to classification

Day 2
* The Machine Learning pipeline and evaluation
* Machine Learning and classification: logistic regression classifier and random forests
* Machine Learning and regression

Application

The course is not yet open for registration.

The registration fee is 200 CHF for academics and 1000 CHF for for-profit companies.

You will receive confirmation of your registration by email. Upon receipt of this email, you will be asked to confirm attendance by paying the fee within 5 days.

Applications will close as soon as the maximum capacity is reached. The deadline for free-of-charge cancellation is 26/11/2023. Fees will not be reimbursed for cancellations after this date. Please note that participation in SIB courses is subject to our general conditions.

Venue and Time

This course will be streamed.

The course will start at 9:00 and end around 17:00 CET.

Precise information will be provided to the participants in due time.

Additional information

Coordination: Grégoire Rossier, SIB training group

We will recommend 0.5 ECTS credits for this course, provided the exam at the end of the course is passed.

You are welcome to subscribe to the SIB courses mailing list, using the form here, to be informed of all future courses and workshops as well as important deadlines.

SIB abides by the ELIXIR Code of Conduct. Participants of SIB courses are also required to abide by the same code.

For more information, please contact training@sib.swiss.

Keywords: training, biostatistics, data mining, machine learning, torsten schwede & thierry sengstag group

Authors: SIB Swiss Institute of Bioinformatics, Wandrille Duchemin

