Clustering in Machine Learning

Abstract

The goal of unsupervised learning is to discover hidden patterns in unlabeled data. Clustering is one approach to unsupervised learning. In this tutorial, we discuss clustering, its types, and a few algorithms for finding clusters in data. Clustering groups data points based on their similarity. Each group is called a cluster; the data points within a cluster are highly similar to one another and less similar to the data points in other clusters. The goal is to partition a set of data points so that similar items fall into the same cluster and dissimilar items fall into different clusters. Later in the tutorial, we discuss how to choose a similarity metric between data points and how to use it in different clustering algorithms.
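To make the idea of grouping by similarity concrete, here is a minimal Python sketch. It is not taken from the tutorial itself: the toy 2-D points, the choice of Euclidean distance as the similarity metric, the hand-picked starting centroids, and the helper names `euclidean`, `mean_point`, and `kmeans` are all assumptions made for illustration. It follows the basic k-means idea of assigning each point to its nearest centroid and then recomputing the centroids.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points (assumed similarity metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, centroids, iterations=10):
    """Simplified k-means: fixed iteration count, caller supplies start centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters, centroids

# Hypothetical toy data: two visually separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters, centroids = kmeans(points, centroids=[points[0], points[3]])

for label, members in enumerate(clusters):
    print(f"cluster {label}: {members}")
```

On this toy data, every distance between two points in the same cluster is smaller than any distance between points in different clusters, which is the property the abstract describes. Real datasets are rarely this well separated, and the result of k-means generally depends on the starting centroids and the chosen metric.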

About This Material

This is a hands-on tutorial from the Galaxy Training Network (GTN). It can be used for individual self-study or as teaching material in a classroom.

Questions this will address

  • How can clustering algorithms be used to categorize data into different clusters?

Learning Objectives

  • Learn the background of clustering
  • Learn the hierarchical clustering algorithm
  • Learn the k-means clustering algorithm
  • Learn the DBSCAN clustering algorithm
  • Apply clustering algorithms to different datasets
  • Learn how to visualize clusters

Licence: Creative Commons Attribution 4.0 International

Keywords: Statistics and machine learning

Target audience: Students

Resource type: e-learning

Version: 13

Status: Active

Prerequisites:

Introduction to Galaxy Analyses

Date modified: 2024-01-15

Date published: 2020-05-08

Authors: Alireza Khanteymoori, Anup Kumar

Scientific topics: Statistics and probability

