1 00:00:09,500 --> 00:00:12,130 In this lecture, we introduce linear regression 2 00:00:12,130 --> 00:00:16,470 a simple but very powerful method to analyze data 3 00:00:16,470 --> 00:00:18,840 and make predictions and apply it 4 00:00:18,840 --> 00:00:22,360 in a very unexpected context-- predicting 5 00:00:22,360 --> 00:00:23,670 the quality of wines. 6 00:00:29,250 --> 00:00:34,170 Bordeaux is a region in France popular for producing wine. 7 00:00:34,170 --> 00:00:35,970 While this wine has been produced 8 00:00:35,970 --> 00:00:38,670 in much the same way for hundreds of years, 9 00:00:38,670 --> 00:00:40,650 there are differences in price and quality 10 00:00:40,650 --> 00:00:44,400 from year to year that are sometimes very significant. 11 00:00:44,400 --> 00:00:46,250 Bordeaux wines are widely believed 12 00:00:46,250 --> 00:00:48,640 to taste better when they are older, 13 00:00:48,640 --> 00:00:51,250 so there's an incentive to store young wines 14 00:00:51,250 --> 00:00:53,300 until they are mature. 15 00:00:53,300 --> 00:00:56,280 The main problem is that it is hard to determine 16 00:00:56,280 --> 00:01:00,200 the quality of the wine when it is so young just by tasting it, 17 00:01:00,200 --> 00:01:03,990 since the taste will change so significantly by the time it 18 00:01:03,990 --> 00:01:06,360 will actually be consumed. 19 00:01:06,360 --> 00:01:10,010 This is why wine tasters and experts are helpful. 20 00:01:10,010 --> 00:01:13,230 They taste the wines and then predict 21 00:01:13,230 --> 00:01:16,690 which ones will be the best wines later. 22 00:01:16,690 --> 00:01:19,340 The question we'll address in this lecture-- 23 00:01:19,340 --> 00:01:24,260 can analytics model this process better 24 00:01:24,260 --> 00:01:25,510 and make stronger predictions?
25 00:01:28,840 --> 00:01:31,990 On March 4, 1990, the New York Times 26 00:01:31,990 --> 00:01:35,479 announced that Princeton economics professor Orley 27 00:01:35,479 --> 00:01:38,759 Ashenfelter can predict the quality of Bordeaux wine 28 00:01:38,759 --> 00:01:41,370 without tasting a single drop. 29 00:01:41,370 --> 00:01:43,300 Ashenfelter's predictions have nothing 30 00:01:43,300 --> 00:01:46,450 to do with assessing the aroma of the wine, 31 00:01:46,450 --> 00:01:50,870 looking at the legs, or declaring that the wine tastes 32 00:01:50,870 --> 00:01:53,740 citrusy, oaky, or nutty. 33 00:01:53,740 --> 00:01:55,890 They are the results of a mathematical model. 34 00:01:58,430 --> 00:02:01,850 Ashenfelter used a method called linear regression. 35 00:02:01,850 --> 00:02:06,970 The method predicts an outcome variable or dependent variable. 36 00:02:06,970 --> 00:02:10,060 And in doing so, it uses a set of what 37 00:02:10,060 --> 00:02:11,420 are called independent variables. 38 00:02:15,260 --> 00:02:16,890 For the dependent variable, Ashenfelter 39 00:02:16,890 --> 00:02:25,240 chose a typical price in 1990-1991 for Bordeaux wine 40 00:02:25,240 --> 00:02:26,340 in an auction. 41 00:02:26,340 --> 00:02:28,640 This approximates quality. 42 00:02:28,640 --> 00:02:31,480 As independent variables, he used 43 00:02:31,480 --> 00:02:35,620 age of the wine-- so the older wines are more expensive-- 44 00:02:35,620 --> 00:02:39,120 and weather-related information, specifically 45 00:02:39,120 --> 00:02:42,200 the average growing season temperature, the harvest 46 00:02:42,200 --> 00:02:43,700 rain, and winter rain. 47 00:02:46,570 --> 00:02:49,820 In these figures, we depict the data 48 00:02:49,820 --> 00:02:53,340 during the period from 1952 to 1978.
49 00:02:53,340 --> 00:02:56,490 There are four independent variables-- 50 00:02:56,490 --> 00:03:02,080 the age of the wine, the average growing season temperature, 51 00:03:02,080 --> 00:03:04,450 the harvest rain, and winter rain. 52 00:03:07,650 --> 00:03:12,810 And on the vertical axis, you observe the logarithm 53 00:03:12,810 --> 00:03:18,130 of the price, the realization in an auction. 54 00:03:18,130 --> 00:03:21,050 So these are the primitive data that Ashenfelter used. 55 00:03:24,360 --> 00:03:28,640 So Ashenfelter believed that his predictions 56 00:03:28,640 --> 00:03:31,140 are more accurate than those of the world's 57 00:03:31,140 --> 00:03:33,850 most influential wine critic. 58 00:03:33,850 --> 00:03:37,560 His name is Robert Parker. 59 00:03:37,560 --> 00:03:41,480 In response, Parker called Ashenfelter 60 00:03:41,480 --> 00:03:45,910 "an absolute total sham," and he added that Ashenfelter is 61 00:03:45,910 --> 00:03:51,030 "rather like a movie critic who never goes to see the movie 62 00:03:51,030 --> 00:03:53,150 but tells you how good it is based 63 00:03:53,150 --> 00:03:55,490 on the actors and the director."
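The regression setup described in this lecture (the logarithm of the auction price as the dependent variable, and age, average growing season temperature, harvest rain, and winter rain as the four independent variables) can be sketched in a few lines of Python. This is a minimal illustration that assumes ordinary least squares via NumPy; it is not necessarily how Ashenfelter fitted his model. The data are synthetic placeholders, and the coefficients in `assumed_coef` are invented purely so the fit can be checked. None of the numbers are real wine data or Ashenfelter's estimates.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares fit; returns [intercept, coef_1, ..., coef_k]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Predicted dependent variable for each row of X."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


# --- Synthetic placeholder data (NOT real wine data) ---
rng = np.random.default_rng(0)
n = 30  # hypothetical number of vintages

age = rng.uniform(5, 35, size=n)               # years
growing_temp = rng.uniform(15, 18, size=n)     # degrees Celsius (assumed unit)
harvest_rain = rng.uniform(50, 300, size=n)    # millimetres (assumed unit)
winter_rain = rng.uniform(400, 800, size=n)    # millimetres (assumed unit)
X = np.column_stack([age, growing_temp, harvest_rain, winter_rain])

# Invented coefficients: intercept, age, temperature, harvest rain, winter rain.
assumed_coef = np.array([1.0, 0.05, 0.5, -0.01, 0.002])

# Noise-free log price, so the fit should recover assumed_coef.
# Real auction data would be noisy and the fit would only be approximate.
log_price = assumed_coef[0] + X @ assumed_coef[1:]

coef = fit_linear_regression(X, log_price)

# Predict the log price for one hypothetical vintage, then undo the logarithm.
new_vintage = np.array([[10.0, 17.0, 100.0, 600.0]])
predicted_log_price = predict(coef, new_vintage)
predicted_price = np.exp(predicted_log_price)
```

Because the model predicts the logarithm of the price, the prediction is exponentiated to get back to a price. With real data, the fitted coefficients would be estimates rather than exact values.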