# More Clustering

## Session Overview

This lecture covers hierarchical clustering and introduces k-means clustering.

## Session Activities

### Lecture Videos

Topics covered: Feature vectors, scaling, k-means clustering.

### Recitation Videos

Topics covered: Unsupervised learning, k-means clustering, distance metric, cluster merging, centroid, k-means error, holdout set, k value significance, features of k-means clustering, merits and disadvantages of types of clustering.

## Check Yourself

How do we use nominal (non-numeric or noncontinuous) categories as features?

Convert each possible value to a real number.
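One common way to do this conversion, sketched below with illustrative category names (not code from the lecture), is one-hot encoding, which avoids implying a false ordering among the values:

```python
def one_hot(value, categories):
    """Return a binary feature vector with a 1 in the slot for `value`."""
    return [1 if c == value else 0 for c in categories]

colors = ["red", "green", "blue"]
print(one_hot("green", colors))  # → [0, 1, 0]
```

A simple alternative is to map each value to an integer (red → 0, green → 1, blue → 2), but that implicitly asserts that green is "between" red and blue, which a distance metric will take literally.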

Why do we need to use scaling (normalization)?

To indicate the relative importance of each feature.
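Without scaling, a feature measured in large units (e.g. income in dollars) dominates the distance metric over a feature in small units (e.g. age in years). A minimal z-scaling sketch (not the lecture's code):

```python
def z_scale(values):
    """Standardize a feature so it has mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]

print(z_scale([1, 2, 3, 4]))
```

After scaling, every feature contributes on the same footing; if some feature should matter more, it can then be reweighted deliberately rather than by accident of units.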

How does k-means clustering work?

Choose k points, randomly or otherwise, to be the initial centroids, and assign every other point to its nearest centroid. Then compute a new, better centroid for each cluster (typically the mean of its points), reassign points, and repeat until the difference between the current set of clusters and the previous set is insignificant.
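The loop above can be sketched as follows; this is a minimal illustration (using exact convergence rather than a tolerance), not the course's implementation:

```python
import random

def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def k_means(points, k, max_iters=100):
    # Choose k of the points, randomly, as the initial centroids.
    centroids = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iters):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: euclidean(p, centroids[j]))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its cluster's points.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        # Stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

Because the initial centroids are random, different runs can converge to different clusterings; in practice the algorithm is run several times and the result with the lowest error is kept.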

## Problem Sets

### Problem Set 9: Schedule Optimization (Due)

At an institute of higher education that shall remain nameless, it used to be the case that a human adviser would help each student formulate a list of subjects that would meet the student's objectives. However, because of financial troubles, the Institute has decided to replace human advisers with software. Given the amount of work a student wants to do, the program returns a list of subjects that maximizes the amount of value. The goal of this problem set is to implement optimization algorithms.
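One family of optimization algorithms relevant here is greedy selection; the sketch below uses hypothetical subject data and a value-per-work heuristic, and is not the problem set's actual code or API:

```python
def greedy_schedule(subjects, max_work):
    """subjects: list of (name, value, work) tuples.
    Greedily pick subjects by value/work ratio until max_work is reached."""
    chosen, total_work = [], 0
    for name, value, work in sorted(subjects, key=lambda s: s[1] / s[2],
                                    reverse=True):
        if total_work + work <= max_work:
            chosen.append(name)
            total_work += work
    return chosen

# Hypothetical data: (subject, value, hours of work).
subjects = [("6.0002", 10, 4), ("8.01", 6, 3), ("21W.731", 4, 4)]
print(greedy_schedule(subjects, 7))  # → ['6.0002', '8.01']
```

A greedy choice is fast but not guaranteed optimal; an exhaustive or dynamic-programming search can find the true maximum at higher cost, a trade-off this kind of problem set typically explores.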

Note: Solutions are not available for this assignment.

### Problem Set 10 (Assigned)

Problem set 10 is assigned in this session. The instructions and solutions can be found on the session page where it is due, Lecture 22 Using Graphs to Model Problems, Part 2.
