In this lecture, we'll be discussing the story of Netflix and how their recommendation system is worth a million dollars. Through this example, we'll introduce the method of clustering.

Netflix is an online DVD rental and streaming video service. Customers can receive movie rentals by mail, and they can also watch selected movies and TV shows online. Netflix has more than 40 million subscribers worldwide and an annual revenue of $3.6 billion. A key aspect of the company is being able to offer customers accurate movie recommendations based on each customer's own preferences and viewing history.

From 2006 through 2009, Netflix ran a contest asking the public to submit algorithms to predict user ratings for movies. Such an algorithm would be useful for Netflix when making recommendations to users. Netflix provided a training data set of about 100 million user ratings and a test data set of about three million user ratings. They offered a grand prize of $1 million to the team that could beat Netflix's current algorithm, called Cinematch, by more than 10%, measured in terms of root mean squared error, or RMSE (a short numerical sketch of this metric appears at the end of this segment). Netflix believed that their recommendation system was so valuable that it was worth a million dollars to improve.

The contest had a few rules. One was that if the grand prize had not yet been reached, a progress prize of $50,000 per year would be awarded for the best result so far, as long as it was at least a 1% improvement over the previous year. Another rule was that teams had to submit their code and a description of their algorithm to be awarded any prize. And lastly, if a team met the 10% improvement goal, a last call would be issued, and 30 days would remain for all teams to submit their best algorithms.

So what happened? The contest went live on October 2, 2006. By October 8, only six days later, a team had submitted an algorithm that beat Cinematch. A week later, on October 15, there were already three teams submitting algorithms that beat Cinematch.
One of these solutions beat Cinematch by more than 1%, already qualifying for a progress prize. The contest was hugely popular all over the world: by June 2007, over 20,000 teams had registered from over 150 countries. The 2007 progress prize went to a team called BellKor, with an 8.43% improvement over Cinematch.

The following year, several teams from across the world joined forces to improve the accuracy even further. In 2008, the progress prize again went to team BellKor, but this time the team included members from the team BigChaos in addition to the original members of BellKor. This was the last progress prize, because another 1% improvement would reach the grand prize goal of 10%.

On June 26, 2009, the team BellKor's Pragmatic Chaos, composed of members from three different original teams, submitted a 10.05% improvement over Cinematch, signaling the last call for the contest. Other teams had 30 days to submit algorithms before the contest closed. These 30 days were filled with intense competition and even more progress. But before revealing what happened, let's investigate how we could try to predict user ratings. In the next video, we'll discuss how recommendation systems generally work.
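To make the contest's yardstick concrete, here is a minimal sketch in Python of how root mean squared error and a relative improvement over a baseline could be computed. The ratings and RMSE values below are illustrative stand-ins, not the official contest figures.

    import math

    def rmse(predicted, actual):
        # Root mean squared error between predicted and actual ratings.
        n = len(actual)
        return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

    # Illustrative ratings on Netflix's 1-to-5 star scale (not real contest data).
    predicted = [3.8, 2.1, 4.5, 3.0, 4.9]
    actual = [4, 2, 5, 3, 4]
    print(round(rmse(predicted, actual), 4))  # 0.4712 for these made-up numbers

    # Relative improvement over a baseline, expressed as a fraction of the
    # baseline RMSE. The values below are illustrative stand-ins for Cinematch
    # and a challenger submission, not the published contest numbers.
    baseline_rmse = 0.95
    candidate_rmse = 0.855
    improvement = (baseline_rmse - candidate_rmse) / baseline_rmse
    print(f"{improvement:.1%}")  # 10.0%

A lower RMSE means the predicted ratings sit closer to the ratings users actually gave, and the grand prize target was a reduction in RMSE of more than 10% relative to Cinematch.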