In this lecture, we'll be discussing the story of Netflix and how their recommendation system is worth a million dollars. Through this example, we'll introduce the method of clustering.

Netflix is an online DVD rental and streaming video service. Customers can receive movie rentals by mail, and they can also watch selected movies and TV shows online. Netflix has more than 40 million subscribers worldwide and an annual revenue of $3.6 billion. A key aspect of the company is being able to offer customers accurate movie recommendations based on each customer's own preferences and viewing history.

From 2006 through 2009, Netflix ran a contest asking the public to submit algorithms to predict user ratings for movies. Such an algorithm would be useful for Netflix when making recommendations to users. Netflix provided a training data set of about 100 million user ratings and a test data set of about three million user ratings. They offered a grand prize of $1 million to the team that could beat Netflix's current algorithm, called Cinematch, by more than 10%, measured in terms of root mean squared error, or RMSE (a short numerical sketch of this metric appears at the end of this segment). Netflix believed that their recommendation system was so valuable that it was worth a million dollars to improve.

The contest had a few rules. One was that if the grand prize had not yet been reached, a progress prize of $50,000 per year would be awarded for the best result so far, as long as it was at least a 1% improvement over the previous year. Another rule was that teams had to submit their code and a description of their algorithm to be awarded any prize. And lastly, if a team met the 10% improvement goal, a last call would be issued, and 30 days would remain for all teams to submit their best algorithms.

So what happened? The contest went live on October 2, 2006. By October 8, only six days later, a team had submitted an algorithm that beat Cinematch. A week later, on October 15, there were already three teams submitting algorithms that beat Cinematch.
One of these solutions beat Cinematch by more than 1%, already qualifying for a progress prize. The contest was hugely popular all over the world: by June 2007, over 20,000 teams had registered from over 150 countries. The 2007 progress prize went to a team called BellKor, with an 8.43% improvement over Cinematch.

The following year, several teams from across the world joined forces to improve the accuracy even further. In 2008, the progress prize again went to team BellKor, but this time the team included members from the team BigChaos in addition to the original members of BellKor. This was the last progress prize, because another 1% improvement would reach the grand prize goal of 10%.

On June 26, 2009, the team BellKor's Pragmatic Chaos, composed of members from three different original teams, submitted a 10.05% improvement over Cinematch, signaling the last call for the contest. Other teams had 30 days to submit algorithms before the contest closed. These 30 days were filled with intense competition and even more progress. But before revealing what happened, let's investigate how we could try to predict user ratings. In the next video, we'll discuss how recommendation systems generally work.
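To make the contest's yardstick concrete, here is a minimal sketch in Python of how root mean squared error and a relative improvement over a baseline could be computed. The ratings and RMSE values below are illustrative stand-ins, not the official contest figures.

    import math

    def rmse(predicted, actual):
        # Root mean squared error between predicted and actual ratings.
        n = len(actual)
        return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

    # Illustrative ratings on Netflix's 1-to-5 star scale (not real contest data).
    predicted = [3.8, 2.1, 4.5, 3.0, 4.9]
    actual = [4, 2, 5, 3, 4]
    print(round(rmse(predicted, actual), 4))  # 0.4712 for these made-up numbers

    # Relative improvement over a baseline, expressed as a fraction of the
    # baseline RMSE. The values below are illustrative stand-ins for Cinematch
    # and a challenger submission, not the published contest numbers.
    baseline_rmse = 0.95
    candidate_rmse = 0.855
    improvement = (baseline_rmse - candidate_rmse) / baseline_rmse
    print(f"{improvement:.1%}")  # 10.0%

A lower RMSE means the predicted ratings sit closer to the ratings users actually gave, and the grand prize target was a reduction in RMSE of more than 10% relative to Cinematch.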