Machine Learning Modeling of Anticancer Peptides

Abstract

Biological molecules such as proteins, peptides, DNA, and RNA can be represented by their biochemical or sequence-based properties, and machine learning (ML) models can use these properties to infer biological meaning. A descriptor, or feature, is a quantitative or qualitative measure of a property associated with a sequence. For example, a chemical compound can be described by its charge, chemical formula, molecular weight, number of rotatable bonds, and so on. Similarly, properties derived from a biological sequence can be used to describe a biological activity such as anticancer activity. Properties of a group of peptide sequences, such as overall charge, hydrophobicity profile, or k-mer composition, can be used to build an ML model, which can then predict the biological properties of unknown peptides. Computational methods have proven useful for the initial screening and prediction of peptides with various biological properties, and they have emerged as effective alternatives to lengthy and expensive experimental approaches. Finding anticancer peptides (ACPs) through wet-lab methods is costly and time-consuming, so an efficient computational approach is useful for predicting potential ACPs before wet-lab experimentation. In this tutorial, we discuss how peptide-based properties such as charge, hydrophobicity, and amino acid composition can be used to predict the biological properties of peptides. We also learn how to use the utilities of the Peptide Design and Analysis Under Galaxy (PDAUG) package to calculate various peptide-based descriptors and use them for ML modeling. We use the CTD (composition, transition, and distribution) descriptor to represent the peptide sequences in the training set, test six different ML algorithms, and assess the effect of normalization on the accuracy of the ML models.
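To make the idea of a descriptor concrete, the following is a minimal Python sketch of turning peptide sequences into numeric feature vectors and normalizing them. It is not the PDAUG implementation and does not compute the CTD descriptor; it uses plain amino acid composition and a crude charge estimate instead. The function names and the two example sequences are hypothetical, chosen only for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return the fraction of each standard residue in the peptide."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


def approximate_net_charge(peptide):
    """Crude charge estimate (an assumption for illustration):
    +1 per K or R, -1 per D or E; ignores pH, histidine, and termini."""
    peptide = peptide.upper()
    positive = sum(peptide.count(aa) for aa in "KR")
    negative = sum(peptide.count(aa) for aa in "DE")
    return positive - negative


def featurize(peptide):
    """Concatenate composition (20 values) and charge (1 value)."""
    return amino_acid_composition(peptide) + [float(approximate_net_charge(peptide))]


def min_max_normalize(rows):
    """Rescale each feature column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Made-up sequences, not taken from any ACP data set.
peptides = ["KKLLKK", "DEAG"]
features = [featurize(p) for p in peptides]
normalized = min_max_normalize(features)
```

Each row of `normalized` could then be paired with a class label (ACP or non-ACP) to form a training set. Min-max scaling is only one possible normalization; which method the tutorial applies is not stated here.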

About This Material

This is a hands-on tutorial from the Galaxy Training Network (GTN), usable either for individual self-study or as teaching material in a classroom.

Questions this will address

  • Which machine learning (ML) algorithm is best at distinguishing anticancer peptides (ACPs) from non-anticancer peptides (non-ACPs)?

Learning Objectives

  • Learn how to calculate peptide descriptors
  • Learn how to create a training data set from features
  • Assess which ML algorithm best predicts anticancer peptides

Licence: Creative Commons Attribution 4.0 International

Keywords: ML, Proteomics, cancer

Target audience: Students

Resource type: e-learning

Version: 8

Status: Active

Prerequisites:

Introduction to Galaxy Analyses

Date modified: 2023-11-09

Date published: 2021-01-22

Authors: Daniel Blankenberg, Jayadev Joshi

Contributors: Björn Grüning, Helena Rasche, Jayadev Joshi, Melanie Föll, Saskia Hiltemann, Subina Mehta

Scientific topics: Proteomics
