More Clustering | Unit 3 | Introduction to Computer Science and Programming | Electrical Engineering and Computer Science

OCW Scholar

Session Overview

This lecture covers hierarchical clustering and introduces k-means clustering.

This image is from the Wikimedia Commons. This image is in the public domain.

Session Activities

Lecture Videos

Lecture 20: More Clustering (00:49:09)

> Download from iTunes U (MP4 - 107MB)

> Download from Internet Archive (MP4 - 107MB)

> Download English-US transcript (PDF)

> Download English-US caption (SRT)

About this Video

Topics covered: Feature vectors, scaling, k-means clustering.

Resources

Lecture code handout (PDF)

Lecture code (PY)

Lecture slides (PDF)

Lecture data files (ZIP) (This ZIP file contains: 3 .txt files.)

Recitation Videos

Recitation 8: Hierarchical and k-means Clustering (00:50:49)

> Download from iTunes U (MP4 - 113MB)

> Download from Internet Archive (MP4 - 113MB)

> Download English-US transcript (PDF)

> Download English-US caption (SRT)

About this Video

Topics covered: Unsupervised learning, k-means clustering, distance metrics, cluster merging, centroids, k-means error, holdout sets, significance of the value of k, features of k-means clustering, merits and disadvantages of the types of clustering.

Check Yourself

How do we use nominal (non-numeric or noncontinuous) categories as features?

Convert each possible value to a real number.
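As a minimal sketch of that idea, the snippet below assigns each distinct value a real number in order of first appearance. The function name `encode_nominal` and the sample data are hypothetical and are not taken from the lecture code. Note that any such numbering imposes an ordering and a distance between categories, which may or may not be meaningful for a given feature.

```python
def encode_nominal(values):
    """Map each distinct nominal value to a real number.

    Codes are assigned in order of first appearance: 0.0, 1.0, 2.0, ...
    Returns the encoded list and the value-to-code mapping.
    """
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = float(len(codes))
    return [codes[v] for v in values], codes


# Hypothetical feature: the diet of four animals.
diets = ["herbivore", "carnivore", "omnivore", "carnivore"]
encoded, mapping = encode_nominal(diets)
print(encoded)  # [0.0, 1.0, 2.0, 1.0]
```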

Why do we need to use scaling (normalization)?

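So that features measured on large numeric ranges do not dominate the distance calculation. Scaling puts every feature on a comparable footing, and any weights applied afterward indicate the relative importance of each feature.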
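One common way to scale is to shift and stretch each feature so that it has mean 0 and standard deviation 1, and then optionally multiply by a weight. The sketch below assumes that approach; the names `z_scale` and `scale_features`, the `weights` parameter, and the sample data are hypothetical and may differ from the lecture code.

```python
import math


def z_scale(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    variance = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(variance)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


def scale_features(points, weights=None):
    """Z-scale every feature (column) and apply an optional weight to each."""
    num_features = len(points[0])
    if weights is None:
        weights = [1.0] * num_features
    columns = [z_scale([p[i] for p in points]) for i in range(num_features)]
    return [[columns[i][j] * weights[i] for i in range(num_features)]
            for j in range(len(points))]


# Hypothetical data: (height in meters, income in dollars).
raw = [[1.5, 30000.0], [1.8, 90000.0], [1.6, 60000.0]]
scaled = scale_features(raw)
```

Without scaling, the income column would dominate any Euclidean distance between these points simply because its numbers are larger. After scaling, both columns contribute on the same footing unless a weight says otherwise.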

How does k-means clustering work?

A number, k, of points is chosen, randomly or otherwise, to be the initial centroids; all other points are assigned to their nearest centroid. A new centroid is then computed for each cluster (the mean of the points assigned to it), and the assignment and update steps repeat until the difference between the current set of clusters and the previous set is insignificant.
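The loop described in this answer can be sketched as follows. This is a minimal illustration, not the lecture's code: it assumes Euclidean distance, random initial centroids, and a stopping rule based on how far the centroids move, and the names `k_means`, `cutoff`, `max_iters`, and `seed` are hypothetical.

```python
import random


def distance(p, q):
    """Euclidean distance between two points given as lists of numbers."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def k_means(points, k, cutoff=1e-6, max_iters=100, seed=0):
    """Return (centroids, clusters) after running k-means on points."""
    rng = random.Random(seed)
    # Choose k of the points at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        biggest_move = 0.0
        for i, cluster in enumerate(clusters):
            if not cluster:
                continue  # An empty cluster keeps its old centroid.
            new_centroid = mean_point(cluster)
            biggest_move = max(biggest_move, distance(centroids[i], new_centroid))
            centroids[i] = new_centroid
        # Stop once no centroid moved by more than the cutoff.
        if biggest_move < cutoff:
            break
    return centroids, clusters


# Hypothetical data: two well-separated groups of three points each.
data = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
        [8.0, 8.0], [9.0, 9.0], [8.5, 9.5]]
centroids, clusters = k_means(data, k=2)
```

Because the result depends on the initial centroids, it is common to run the procedure several times with different seeds and keep the clustering with the lowest total error. The features would normally be scaled before clustering.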

Problem Sets

Problem Set 9: Schedule Optimization (Due)

At an institute of higher education that shall remain nameless, it used to be the case that a human adviser would help each student formulate a list of subjects that would meet the student's objectives. However, because of financial troubles, the Institute has decided to replace human advisers with software. Given the amount of work a student wants to do, the program returns a list of subjects that maximizes the total value. The goal of this problem set is to implement optimization algorithms.

Instructions (PDF)
Code Files (ZIP) (This ZIP file contains: 2 .py files and 2 .txt files.)

Note: Solutions are not available for this assignment.

Problem Set 10 (Assigned)

Problem Set 10 is assigned in this session. The instructions and solutions can be found on the page for the session where it is due, Lecture 22: Using Graphs to Model Problems, Part 2.
