1 00:00:00,000 --> 00:00:02,000 OPERATOR: The following content is provided under a 2 00:00:02,000 --> 00:00:03,840 Creative Commons license. 3 00:00:03,840 --> 00:00:06,840 Your support will help MIT OpenCourseWare continue to 4 00:00:06,840 --> 00:00:10,530 offer high quality educational resources for free. 5 00:00:10,530 --> 00:00:13,390 To make a donation or view additional materials from 6 00:00:13,390 --> 00:00:17,600 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:17,600 --> 00:00:21,810 ocw.mit.edu. 8 00:00:21,810 --> 00:00:24,460 PROFESSOR: I want to pick up exactly where I 9 00:00:24,460 --> 00:00:26,260 left off last time, 10 00:00:26,260 --> 00:00:29,080 when I was talking about various sins one can commit 11 00:00:29,080 --> 00:00:31,140 with statistics. 12 00:00:31,140 --> 00:00:45,400 And I had been talking about the sin of data enhancement, 13 00:00:45,400 --> 00:00:49,280 where the basic idea is that you take a piece of data, and 14 00:00:49,280 --> 00:00:51,820 you read much more into it than it implies. 15 00:00:51,820 --> 00:00:56,810 In particular, a very common thing people do with data is 16 00:00:56,810 --> 00:01:01,900 extrapolate. 17 00:01:01,900 --> 00:01:08,080 I'd given you a couple of examples. 18 00:01:08,080 --> 00:01:17,270 In the real world, it's often not desirable to say that I 19 00:01:17,270 --> 00:01:21,770 have a point here, and a point here, therefore the next point 20 00:01:21,770 --> 00:01:23,840 will surely be here, 21 00:01:23,840 --> 00:01:27,400 and we can just extrapolate in a straight line. 22 00:01:27,400 --> 00:01:30,560 We saw some examples earlier where I had an algorithm to 23 00:01:30,560 --> 00:01:34,410 generate points, and we fit a curve to them, used the curve to 24 00:01:34,410 --> 00:01:36,190 predict future points, and discovered 25 00:01:36,190 --> 00:01:40,720 it was nowhere close. 26 00:01:40,720 --> 00:01:44,650 Unfortunately, we often see people do this sort of thing.
27 00:01:44,650 --> 00:01:49,950 One of my favorite stories is about William Ruckelshaus, who was 28 00:01:49,950 --> 00:01:52,140 head of the Environmental Protection Agency 29 00:01:52,140 --> 00:01:54,740 in the early 1970s. 30 00:01:54,740 --> 00:01:58,840 He held a press conference and spoke about the increased use 31 00:01:58,840 --> 00:02:03,360 of cars, and the decreased amount of carpooling. 32 00:02:03,360 --> 00:02:05,970 He was trying to get people to carpool, since at the time 33 00:02:05,970 --> 00:02:10,980 carpooling was on the way down, and I now quote, "each 34 00:02:10,980 --> 00:02:15,760 car entering the central city, sorry, in 1960," he said, 35 00:02:15,760 --> 00:02:20,630 "each car entering the central city had 1.7 people in it. 36 00:02:20,630 --> 00:02:25,540 By 1970, this had dropped to less than 1.2. 37 00:02:25,540 --> 00:02:30,660 If present trends continue, by 1980, more than 1 out of every 38 00:02:30,660 --> 00:02:36,890 10 cars entering the city will have no driver." Amazingly 39 00:02:36,890 --> 00:02:40,990 enough, the press reported this as a straight story, and 40 00:02:40,990 --> 00:02:45,130 talked about how occupancy would be dropping dramatically. 41 00:02:45,130 --> 00:02:49,640 Of course, as it happened, that didn't occur. 42 00:02:49,640 --> 00:02:53,470 But it's just an example of how much trouble you can get 43 00:02:53,470 --> 00:02:56,880 into by extrapolating. 44 00:02:56,880 --> 00:03:00,870 The final sin I want to talk about is probably the most 45 00:03:00,870 --> 00:03:13,830 common, and it's called the Texas sharpshooter fallacy. 46 00:03:13,830 --> 00:03:15,626 Now before I get into that, are any of 47 00:03:15,626 --> 00:03:17,390 you here from Texas? 48 00:03:17,390 --> 00:03:19,670 All right, you're going to be offended. 49 00:03:19,670 --> 00:03:23,870 Let me think, OK, anybody here from Oklahoma? 50 00:03:23,870 --> 00:03:24,380 You'll like it.
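The arithmetic behind the Ruckelshaus forecast quoted above can be made concrete with a small Python sketch (not from the lecture). It assumes the figures as quoted, 1.7 people per car in 1960 and 1.2 in 1970 (the quote says "less than 1.2", so 1.2 is an upper bound), and the helper name `extrapolate_linear` is hypothetical. Extending the straight line through those two points to 1980 gives an average occupancy below one person per car, which is impossible if every car needs a driver.

```python
def extrapolate_linear(x0, y0, x1, y1, x):
    """Extend the straight line through (x0, y0) and (x1, y1) out to x."""
    slope = (y1 - y0) / (x1 - x0)
    return y1 + slope * (x - x1)


# Average people per car entering the city, as quoted in the lecture.
occupancy_1980 = extrapolate_linear(1960, 1.7, 1970, 1.2, 1980)
print(round(occupancy_1980, 2))  # 0.7: fewer than one occupant (driver) per car
```

The line itself is fitted correctly; the error is in trusting it outside the range of the data, where a physical constraint (at least one person per car) breaks the trend.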
51 00:03:24,380 --> 00:03:26,610 I'll dump on Oklahoma instead. That will be much better. 52 00:03:26,610 --> 00:03:32,530 We'll talk about the Oklahoma sharpshooter fallacy. 53 00:03:32,530 --> 00:03:37,780 We won't talk about the BCS rankings, though. 54 00:03:37,780 --> 00:03:40,800 So the idea here is a pretty simple one. 55 00:03:40,800 --> 00:03:45,020 There's a famous marksman who fires his gun randomly at the 56 00:03:45,020 --> 00:03:52,470 side of a barn, leaves a bunch of holes in it, then goes and 57 00:03:52,470 --> 00:03:59,480 takes a can of paint and draws bullseyes around all the 58 00:03:59,480 --> 00:04:02,550 places his bullets happened to hit. 59 00:04:02,550 --> 00:04:07,770 And people walk by the barn and say, God, he is good. 60 00:04:07,770 --> 00:04:13,750 So obviously, it's not a good thing, but it's amazingly easy to 61 00:04:13,750 --> 00:04:16,400 fall into this trap. 62 00:04:16,400 --> 00:04:17,790 So here's another example. 63 00:04:17,790 --> 00:04:24,840 In August of 2001, a paper which people took seriously 64 00:04:24,840 --> 00:04:27,950 appeared in a moderately serious journal called The New 65 00:04:27,950 --> 00:04:33,050 Scientist. And it announced that researchers in Scotland 66 00:04:33,050 --> 00:04:37,850 had proven that anorexics are more likely to have 67 00:04:37,850 --> 00:04:41,830 been born in June. 68 00:04:41,830 --> 00:04:44,080 I'm sure you all knew that. 69 00:04:44,080 --> 00:04:45,750 How did they prove this? 70 00:04:45,750 --> 00:04:47,570 Or demonstrate this? 71 00:04:47,570 --> 00:04:59,950 They studied 446 women, 72 00:04:59,950 --> 00:05:04,790 each of whom had been diagnosed as anorexic. 73 00:05:04,790 --> 00:05:21,520 And they observed that about 30 percent more than average 74 00:05:21,520 --> 00:05:33,620 were born in June.
75 00:05:33,620 --> 00:05:36,950 Now, the monthly average of births, if you divide 446 76 00:05:36,950 --> 00:05:43,200 by 12, is about 37, so that tells us that about 48 77 00:05:43,200 --> 00:05:47,210 were born in June. 78 00:05:47,210 --> 00:05:50,360 So at first sight, this seems significant, and in fact if 79 00:05:50,360 --> 00:05:55,370 you run tests, and ask what's the likelihood of that many 80 00:05:55,370 --> 00:05:58,270 more being born in 1 month, you'll find 81 00:05:58,270 --> 00:06:04,780 that it's quite unlikely. 82 00:06:04,780 --> 00:06:08,680 In fact, you'll find the probability of this happening 83 00:06:08,680 --> 00:06:16,210 just by accident is only about 3 percent. 84 00:06:16,210 --> 00:06:19,180 What's wrong with the logic here? 85 00:06:19,180 --> 00:06:19,670 Yes? 86 00:06:19,670 --> 00:06:24,010 STUDENT: They only studied diagnosed anorexics. 87 00:06:24,010 --> 00:06:26,210 PROFESSOR: No, because they were only interested in the 88 00:06:26,210 --> 00:06:30,700 question of when anorexics are born, so it made sense to only 89 00:06:30,700 --> 00:06:31,390 study those. 90 00:06:31,390 --> 00:06:34,630 Now maybe you're right that we should check whether, in fact, 91 00:06:34,630 --> 00:06:37,860 more people are born in June, period. 92 00:06:37,860 --> 00:06:38,740 That could be true. 93 00:06:38,740 --> 00:06:40,100 This would be one of the fallacies we 94 00:06:40,100 --> 00:06:42,220 looked at before, right? 95 00:06:42,220 --> 00:06:44,990 That there's a lurking variable, which is just that 96 00:06:44,990 --> 00:06:47,550 people are more likely to be born in June. 97 00:06:47,550 --> 00:06:50,610 So that's certainly a possibility. 98 00:06:50,610 --> 00:06:52,320 What else? 99 00:06:52,320 --> 00:06:53,370 What else is the flaw? 100 00:06:53,370 --> 00:07:04,610 Where's the flaw in this logic? 101 00:07:04,610 --> 00:07:06,680 Well, what did they do?
102 00:07:06,680 --> 00:07:12,380 They committed the Oklahoma sharpshooter fallacy. 103 00:07:12,380 --> 00:07:17,960 What they did is, they looked at 12 months, they took the 104 00:07:17,960 --> 00:07:23,190 month with the most births in it, which happened to be June, 105 00:07:23,190 --> 00:07:29,460 and calculated the probability for that month: 3 percent. 106 00:07:29,460 --> 00:07:34,130 They didn't start with the hypothesis that it was June. 107 00:07:34,130 --> 00:07:36,770 They started with 12 months, and then they drew a bullseye 108 00:07:36,770 --> 00:07:39,890 around June. 109 00:07:39,890 --> 00:07:43,560 So the right question to ask is, what's the probability, 110 00:07:43,560 --> 00:07:49,930 not that June had 48 or more babies, but that at least one of the 111 00:07:49,930 --> 00:07:55,090 12 months had 48 or more babies? 112 00:07:55,090 --> 00:08:04,570 That probability is a lot higher than 3 percent, right? 113 00:08:04,570 --> 00:08:10,540 In fact, it's about 30 percent. 114 00:08:10,540 --> 00:08:17,750 So what we see is, again, perfectly reasonable 115 00:08:17,750 --> 00:08:21,440 statistical techniques, but not looking at things 116 00:08:21,440 --> 00:08:23,810 in the right way, 117 00:08:23,810 --> 00:08:26,760 and answering the wrong question. 118 00:08:26,760 --> 00:08:29,100 Does that make sense to everybody? 119 00:08:29,100 --> 00:08:31,990 And you can see why people can fall into this trap, right? 120 00:08:31,990 --> 00:08:36,820 It was a perfectly sensible, or seemingly sensible, argument. 121 00:08:36,820 --> 00:08:41,560 So the moral of this particular thing is, be very 122 00:08:41,560 --> 00:08:48,840 careful about looking at your data, drawing a conclusion, 123 00:08:48,840 --> 00:08:53,330 and then asking how probable that was to have occurred. 124 00:08:53,330 --> 00:08:56,120 Because again, you're probably, or maybe, drawing 125 00:08:56,120 --> 00:08:59,790 the bullseye around something that's already there.
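The difference between the two questions in the anorexia example can be checked with a short Python sketch. It assumes, as a null hypothesis, that each of the 446 women was equally likely to be born in any of the 12 months (so it deliberately ignores the lurking-variable possibility that June is simply a popular birth month), and it reads "48 babies" as "at least 48". The function names are hypothetical. `prob_at_least` computes the exact binomial tail for June alone; `simulate` estimates both that probability and the probability that some month reaches 48. The lecture quotes roughly 3 percent and 30 percent; the sketch's estimates depend on the trial count and seed, so treat them as approximate.

```python
import math
import random

N_WOMEN = 446   # sample size quoted in the lecture
N_MONTHS = 12
THRESHOLD = 48  # about 30 percent above the monthly average of ~37
JUNE = 5        # month index, with January == 0


def prob_at_least(k, n, p):
    """Exact binomial tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def simulate(num_trials, seed=0):
    """Monte Carlo estimates, assuming every birth month is equally likely, of
    P(June has >= THRESHOLD births) and P(some month has >= THRESHOLD births)."""
    rng = random.Random(seed)
    june_hits = 0
    any_month_hits = 0
    for _ in range(num_trials):
        counts = [0] * N_MONTHS
        for _ in range(N_WOMEN):
            counts[rng.randrange(N_MONTHS)] += 1
        if counts[JUNE] >= THRESHOLD:   # the bullseye drawn in advance
            june_hits += 1
        if max(counts) >= THRESHOLD:    # the bullseye drawn after the fact
            any_month_hits += 1
    return june_hits / num_trials, any_month_hits / num_trials


exact_june = prob_at_least(THRESHOLD, N_WOMEN, 1 / N_MONTHS)
june_est, any_month_est = simulate(2000)
print(f"exact     P(June >= {THRESHOLD}):      {exact_june:.3f}")
print(f"simulated P(June >= {THRESHOLD}):      {june_est:.3f}")
print(f"simulated P(any month >= {THRESHOLD}): {any_month_est:.3f}")
```

Within one run the "any month" estimate can never be smaller than the "June" estimate, because every trial in which June reaches 48 also has some month reaching 48. Choosing the month after looking at the data silently answers the larger-probability question.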
126 00:08:59,790 --> 00:09:05,840 Now if they had taken another set of 446 anorexics, and 127 00:09:05,840 --> 00:09:09,390 again June was the month, then there would be some 128 00:09:09,390 --> 00:09:11,320 credibility in it. 129 00:09:11,320 --> 00:09:13,560 Because they would have started with the hypothesis, 130 00:09:13,560 --> 00:09:17,700 not that there existed a month, but that June was 131 00:09:17,700 --> 00:09:19,480 particularly likely. 132 00:09:19,480 --> 00:09:22,300 But then they would have to also check and make sure that 133 00:09:22,300 --> 00:09:25,690 June isn't just a popular month to be born, as was 134 00:09:25,690 --> 00:09:27,500 suggested earlier. 135 00:09:27,500 --> 00:09:30,560 All right, I could go on and on with this sort of thing, 136 00:09:30,560 --> 00:09:32,990 it's kind of fun. 137 00:09:32,990 --> 00:09:34,210 But I won't. 138 00:09:34,210 --> 00:09:36,900 Instead I'm going to torture you with yet one more 139 00:09:36,900 --> 00:09:39,310 simulation. 140 00:09:39,310 --> 00:09:43,420 You may be tempted at this point to just zone out. 141 00:09:43,420 --> 00:09:45,100 Try not to. 142 00:09:45,100 --> 00:09:48,460 And as an added incentive for you to pay attention, I'm 143 00:09:48,460 --> 00:09:52,300 going to warn you that this particular simulation will 144 00:09:52,300 --> 00:09:58,080 appear in the final, or a variant of it. 145 00:09:58,080 --> 00:10:02,160 And what we'll be doing is, early next week we'll be 146 00:10:02,160 --> 00:10:06,790 distributing code, which we'll ask you to study, about two or 147 00:10:06,790 --> 00:10:11,580 three pages of code, and then on the final we'll be asking 148 00:10:11,580 --> 00:10:13,570 you questions about the code. 149 00:10:13,570 --> 00:10:15,140 Not that you have to memorize it, we'll give 150 00:10:15,140 --> 00:10:16,690 you a copy of it. 151 00:10:16,690 --> 00:10:20,140 But you should understand it before you walk 152 00:10:20,140 --> 00:10:21,750 in to take the final. 
153 00:10:21,750 --> 00:10:24,130 Because there will not be time to look at that code for the 154 00:10:24,130 --> 00:10:26,720 first time during the quiz, and figure 155 00:10:26,720 --> 00:10:30,130 out what it's doing. 156 00:10:30,130 --> 00:10:38,640 OK, so let's look at it. 157 00:10:38,640 --> 00:10:41,930 I should also warn you that this code includes some Python 158 00:10:41,930 --> 00:10:46,100 concepts, at least one, that you have not yet seen. 159 00:10:46,100 --> 00:10:48,370 We'll see it briefly today. 160 00:10:48,370 --> 00:10:51,950 This is on purpose, because one of the things I hope you 161 00:10:51,950 --> 00:10:55,520 have learned to do this semester, is look up things 162 00:10:55,520 --> 00:10:57,760 you don't know, and figure out what they do. 163 00:10:57,760 --> 00:10:59,810 What they mean. 164 00:10:59,810 --> 00:11:03,830 Because we obviously can't, in any course, or even any set of 165 00:11:03,830 --> 00:11:05,270 courses, tell you everything you'll ever 166 00:11:05,270 --> 00:11:07,150 want to know in life. 167 00:11:07,150 --> 00:11:10,290 So intentionally, we've seeded some things in this program 168 00:11:10,290 --> 00:11:13,190 that will be unfamiliar, so during the time you're 169 00:11:13,190 --> 00:11:16,440 studying the program, get online, look it up, figure out 170 00:11:16,440 --> 00:11:18,150 what they do. 171 00:11:18,150 --> 00:11:21,580 If you have trouble, we will be having office hours, where 172 00:11:21,580 --> 00:11:23,920 you can go and get some help. 173 00:11:23,920 --> 00:11:26,720 But the TAs will expect you to have at least tried to figure 174 00:11:26,720 --> 00:11:27,620 it out yourself. 175 00:11:27,620 --> 00:11:27,980 Yeah? 176 00:11:27,980 --> 00:11:30,420 STUDENT: Will the final be open note? 177 00:11:30,420 --> 00:11:33,450 PROFESSOR: Final will be open book, open notes, just like 178 00:11:33,450 --> 00:11:34,570 the quizzes. 
179 00:11:34,570 --> 00:11:37,680 It will be the first two hours of the allotted time; we won't 180 00:11:37,680 --> 00:11:44,400 go the whole 3 hours, OK? 181 00:11:44,400 --> 00:11:48,300 So it won't be hugely longer than the quizzes. 182 00:11:48,300 --> 00:11:49,700 It will be a little bit longer. 183 00:11:49,700 --> 00:11:56,250 And again, very much in the same style as the quizzes. 184 00:11:56,250 --> 00:11:59,350 All right, let's look at this. 185 00:11:59,350 --> 00:12:03,990 Let's assume that you've won the lottery, and have serious 186 00:12:03,990 --> 00:12:09,570 money that you foolishly wish to invest in the stock market. 187 00:12:09,570 --> 00:12:15,890 There are two basic strategies to choose from in investing. 188 00:12:15,890 --> 00:12:27,450 You can either have what's called an indexed portfolio, 189 00:12:27,450 --> 00:12:41,510 or a managed portfolio. 190 00:12:41,510 --> 00:12:45,300 With indexed portfolios, you basically say, I want to own 191 00:12:45,300 --> 00:12:49,430 all of the stocks that there are, and if the stock market 192 00:12:49,430 --> 00:12:51,630 goes up, I make money, if the stock market goes 193 00:12:51,630 --> 00:12:53,160 down, I lose money. 194 00:12:53,160 --> 00:12:55,760 I'm not going to think I'm clever and can pick 195 00:12:55,760 --> 00:12:57,350 winners and losers; I'm just betting on 196 00:12:57,350 --> 00:13:00,000 the market as a whole. 197 00:13:00,000 --> 00:13:03,570 They're attractive in that, a, they don't 198 00:13:03,570 --> 00:13:05,660 require a lot of thought. 199 00:13:05,660 --> 00:13:09,270 And b, they have what's called a low expense ratio: since 200 00:13:09,270 --> 00:13:11,660 they're easy to implement, you don't pay anyone to be 201 00:13:11,660 --> 00:13:13,670 brilliant to implement it for you. 202 00:13:13,670 --> 00:13:17,100 So they have very low fees.
203 00:13:17,100 --> 00:13:21,090 With a managed portfolio, you find somebody you think is really 204 00:13:21,090 --> 00:13:26,680 smart, and you pay them a fair amount of money, and in return 205 00:13:26,680 --> 00:13:30,420 they assert that they will pick winners for you, and in 206 00:13:30,420 --> 00:13:33,420 fact, you will outperform the stock market. 207 00:13:33,420 --> 00:13:36,600 If the market goes up 6 percent, well, you'll go up 10 percent 208 00:13:36,600 --> 00:13:40,460 or more, and if it goes down, don't worry, I'm so smart your 209 00:13:40,460 --> 00:13:45,200 stocks won't go down. 210 00:13:45,200 --> 00:13:47,520 There's a lot of debate about which is the 211 00:13:47,520 --> 00:13:52,800 better of these two. 212 00:13:52,800 --> 00:13:55,570 And so now we're going to try and see if we can write a 213 00:13:55,570 --> 00:14:01,190 simulation that will give us some insight as to which of 214 00:14:01,190 --> 00:14:06,010 these might be better or worse. 215 00:14:06,010 --> 00:14:10,780 All right, so that's the basic problem. 216 00:14:10,780 --> 00:14:16,110 Now, as we know, and by the way we're not going to write a 217 00:14:16,110 --> 00:14:18,600 perfect simulation here, because we're going to try and 218 00:14:18,600 --> 00:14:21,070 do it in 40 minutes, or 30 minutes. 219 00:14:21,070 --> 00:14:24,590 And it would take at least an hour to do a perfect simulation 220 00:14:24,590 --> 00:14:26,140 of the stock market. 221 00:14:26,140 --> 00:14:29,090 All right. 222 00:14:29,090 --> 00:14:33,190 The first thing we need to do is have some sort of a theory. 223 00:14:33,190 --> 00:14:36,730 When we did the spring, we had this theory of Hooke's Law 224 00:14:36,730 --> 00:14:40,010 that told us something, and we built a simulation, or built 225 00:14:40,010 --> 00:14:42,830 some tools around that theory. 226 00:14:42,830 --> 00:14:47,460 Now we need to think about a model of the stock market.
227 00:14:47,460 --> 00:14:53,190 And the model we're going to use is based on what's called 228 00:14:53,190 --> 00:15:00,840 the Efficient Market Hypothesis. 229 00:15:00,840 --> 00:15:05,070 So the moral here, again, is whenever you're doing an 230 00:15:05,070 --> 00:15:07,490 implementation of a simulation, you do need to 231 00:15:07,490 --> 00:15:12,210 have some underlying theory about the model. 232 00:15:12,210 --> 00:15:14,900 What this model asserts is that markets are 233 00:15:14,900 --> 00:15:31,130 informationally efficient. 234 00:15:31,130 --> 00:15:36,280 That is to say, current prices reflect all publicly known 235 00:15:36,280 --> 00:15:43,510 information about each stock, and therefore are unbiased. 236 00:15:43,510 --> 00:15:46,860 If people thought that the stock was underpriced, 237 00:15:46,860 --> 00:15:49,170 well, people would buy more of it and the price would have 238 00:15:49,170 --> 00:15:51,070 risen already. 239 00:15:51,070 --> 00:15:53,570 If people thought the stock was overpriced, well, people 240 00:15:53,570 --> 00:15:56,810 would have tried to sell it, and it would have come down. 241 00:15:56,810 --> 00:16:01,420 So this is a very popular theory, believed by many 242 00:16:01,420 --> 00:16:06,880 famous economists today, and in the past. And it says, OK, 243 00:16:06,880 --> 00:16:13,700 that effectively means that the market is memoryless. 244 00:16:13,700 --> 00:16:16,330 OK, that it doesn't matter what the price of the stock 245 00:16:16,330 --> 00:16:18,610 was yesterday. 246 00:16:18,610 --> 00:16:23,100 Today, it's priced based on the best-known information, and so 247 00:16:23,100 --> 00:16:29,900 tomorrow it's equally likely to go up or down. 248 00:16:29,900 --> 00:16:32,640 Relative to the whole market, right? 249 00:16:32,640 --> 00:16:36,640 It's well known that over periods of multiple decades, 250 00:16:36,640 --> 00:16:39,090 the market has a tendency to go up.
251 00:16:39,090 --> 00:16:42,620 And so there's an upward bias to the stock market, contrary 252 00:16:42,620 --> 00:16:46,150 to what you may have seen recently. 253 00:16:46,150 --> 00:16:49,480 But that no particular stock is more or less likely to 254 00:16:49,480 --> 00:16:53,610 outperform the market, because all the information is 255 00:16:53,610 --> 00:16:56,690 incorporated in the price. 256 00:16:56,690 --> 00:17:01,100 And that leads to a notion of being able to model the 257 00:17:01,100 --> 00:17:04,800 market, how? 258 00:17:04,800 --> 00:17:09,010 How would you model individual stocks if you believe this 259 00:17:09,010 --> 00:17:14,350 hypothesis? 260 00:17:14,350 --> 00:17:15,520 Somebody? 261 00:17:15,520 --> 00:17:16,340 What's going to happen? 262 00:17:16,340 --> 00:17:18,640 STUDENT: Random walk. 263 00:17:18,640 --> 00:17:20,910 PROFESSOR: Yes, exactly right. 264 00:17:20,910 --> 00:17:23,260 So we would model it as a random walk. 265 00:17:23,260 --> 00:17:26,110 In fact, there's a very famous book called A Random Walk Down 266 00:17:26,110 --> 00:17:30,240 Wall Street, that was one of the first to make this 267 00:17:30,240 --> 00:17:34,040 hypothesis. 268 00:17:34,040 --> 00:17:38,730 Now later, we may decide to abandon this model, but for 269 00:17:38,730 --> 00:17:43,010 the moment let's accept that. 270 00:17:43,010 --> 00:17:46,500 And let's think about how we're going to build the 271 00:17:46,500 --> 00:17:53,810 simulation. 272 00:17:53,810 --> 00:17:58,190 Whenever I think about how to build an interesting program, 273 00:17:58,190 --> 00:18:01,070 and I hope whenever you think about it, the first thing I 274 00:18:01,070 --> 00:18:05,720 think about is, what are the classes I might want to have, 275 00:18:05,720 --> 00:18:08,680 what are the types? 276 00:18:08,680 --> 00:18:12,130 And it seems pretty obvious that at least two of the 277 00:18:12,130 --> 00:18:18,070 things I'm going to want are stock and market. 
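As a rough illustration of the random-walk model the student suggests, here is a minimal Python sketch, not the lecture's code. It assumes each day's move is independent of the past and equally likely to be up or down by a fixed fraction (`step`, a hypothetical parameter defaulting to 1 percent); the lecture's simulation instead draws each stock's moves from its own distribution.

```python
import random


def random_walk_prices(start_price, num_days, step=0.01, seed=None):
    """Memoryless price path: each day the price rises or falls by the
    fraction `step` with equal probability, regardless of past moves."""
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(num_days):
        move = step if rng.random() < 0.5 else -step
        prices.append(prices[-1] * (1 + move))
    return prices


path = random_walk_prices(100.0, 10, seed=1)
print([round(p, 2) for p in path])
```

Nothing about yesterday's price enters the choice of today's move, which is the "memoryless" property the Efficient Market Hypothesis implies.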
278 00:18:18,070 --> 00:18:22,770 After all, I'm going to try and build a simulation of the 279 00:18:22,770 --> 00:18:26,170 stock market, so I might as well have the notion of a 280 00:18:26,170 --> 00:18:31,330 market, and probably the notion of a stock. 281 00:18:31,330 --> 00:18:35,880 Which should I implement first? 282 00:18:35,880 --> 00:18:40,200 Well, my usual style of programming would be to 283 00:18:40,200 --> 00:18:45,260 implement the one that's lowest down in the hierarchy, 284 00:18:45,260 --> 00:18:48,370 near the bottom. 285 00:18:48,370 --> 00:18:50,810 I won't be able to show you what a market does unless I 286 00:18:50,810 --> 00:18:55,430 have stocks, but I can look at what an individual stock does 287 00:18:55,430 --> 00:18:58,120 without having a market. 288 00:18:58,120 --> 00:19:00,620 So why do I implement this first? 289 00:19:00,620 --> 00:19:06,080 Because it will be easier to unit test. I can build class 290 00:19:06,080 --> 00:19:10,780 stock, and I can test class stock, before I 291 00:19:10,780 --> 00:19:16,080 have a class market. 292 00:19:16,080 --> 00:19:30,740 So now let's look at it. 293 00:19:30,740 --> 00:19:38,510 Clean up the desktop a little bit. 294 00:19:38,510 --> 00:19:42,490 This is similar to, but not identical to, what you have in 295 00:19:42,490 --> 00:19:48,390 your handout. 296 00:19:48,390 --> 00:19:53,150 All right, so there's class stock. 297 00:19:53,150 --> 00:19:58,350 And I'm going to initialize it, create them, with an 298 00:19:58,350 --> 00:19:59,710 opening price. 299 00:19:59,710 --> 00:20:02,400 When a stock is first listed in the market, it comes with 300 00:20:02,400 --> 00:20:06,040 some price. 301 00:20:06,040 --> 00:20:10,550 I'm going to keep, as part of each stock, its history of prices, 302 00:20:10,550 --> 00:20:13,860 which we can initialize, well, I've initialized it as empty, 303 00:20:13,860 --> 00:20:17,210 but that's probably the wrong thing, right?
304 00:20:17,210 --> 00:20:26,500 I probably should have had it start, 305 00:20:26,500 --> 00:20:32,040 right here, with the opening price. 306 00:20:32,040 --> 00:20:34,650 Now comes an interesting part. 307 00:20:34,650 --> 00:20:38,620 Self dot distribution. 308 00:20:38,620 --> 00:20:46,990 Well, I lied to you a little bit in my description of what 309 00:20:46,990 --> 00:20:53,760 it meant to have the Efficient Market Hypothesis. 310 00:20:53,760 --> 00:20:59,670 I said that no stock is likely to outperform the market or 311 00:20:59,670 --> 00:21:02,480 underperform the market. 312 00:21:02,480 --> 00:21:06,980 But that's not quite true, because what they typically 313 00:21:06,980 --> 00:21:21,120 actually say is that it's adjusted for risk. 314 00:21:21,120 --> 00:21:27,240 It's clear that some stocks are more volatile than others. 315 00:21:27,240 --> 00:21:31,830 If you buy stock in an electrical utility which has a 316 00:21:31,830 --> 00:21:34,940 guaranteed revenue stream, because no matter how bad the 317 00:21:34,940 --> 00:21:39,790 economy gets, a lot of people still use electricity, you 318 00:21:39,790 --> 00:21:43,450 don't expect it to fluctuate a lot. 319 00:21:43,450 --> 00:21:49,850 If you buy stock in a high-tech company that sells 320 00:21:49,850 --> 00:21:53,580 things on the internet, you might expect it to fluctuate 321 00:21:53,580 --> 00:21:55,160 enormously. 322 00:21:55,160 --> 00:21:59,570 Or if you buy stock in a retailer, you might expect it 323 00:21:59,570 --> 00:22:04,530 to go up or down more dramatically with the economy, 324 00:22:04,530 --> 00:22:09,470 and so in fact there is a notion of risk, and I'm not 325 00:22:09,470 --> 00:22:12,370 going to do this in this simulation, but usually people 326 00:22:12,370 --> 00:22:15,240 have to be paid to take risk.
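The stock constructor described above, with the fix the professor mentions (starting the history with the opening price instead of an empty list), might look roughly like this in Python. This is a sketch rather than the handout's code: the attributes `price`, `history`, and `distribution` follow the spoken description, while the method name `set_price` is an assumption based on the lecture's remark that setting a stock's price appends it to the history. `distribution` is assumed to be a zero-argument function that returns a fractional price move.

```python
import random


class Stock:
    """Sketch of the lecture's stock: an opening price, a price history,
    and a per-stock distribution from which daily moves are drawn."""

    def __init__(self, price, distribution):
        self.price = price
        # Start the history with the opening price rather than an empty list.
        self.history = [price]
        # Zero-argument function returning a random fractional move.
        self.distribution = distribution

    def set_price(self, price):
        """Record a new price (hypothetical method name)."""
        self.price = price
        self.history.append(price)


s = Stock(100.0, lambda: random.uniform(-0.05, 0.05))
s.set_price(101.5)
print(s.history)  # [100.0, 101.5]
```

Because the history starts with the opening price, a stock's history always has one more entry than the number of price changes it has seen, which keeps plots of different stocks aligned from day 0.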
327 00:22:15,240 --> 00:22:18,660 And so it's usually the case that you can get a higher 328 00:22:18,660 --> 00:22:20,870 return if you're willing to take more risk. 329 00:22:20,870 --> 00:22:27,420 We might or might not have time to come back to that. 330 00:22:27,420 --> 00:22:32,980 But more generally, the point is that each stock actually 331 00:22:32,980 --> 00:22:36,150 behaves a little bit differently. 332 00:22:36,150 --> 00:22:39,350 There's a distribution of how it would move. 333 00:22:39,350 --> 00:22:45,040 So even if, on average, the stock is expected to not move 334 00:22:45,040 --> 00:22:50,840 at all from where it starts, some stocks will be expected 335 00:22:50,840 --> 00:22:53,500 to just trundle along without much 336 00:22:53,500 --> 00:22:56,630 change, not very volatile. 337 00:22:56,630 --> 00:23:02,550 And other stocks might jump up and down a lot because they're 338 00:23:02,550 --> 00:23:03,370 very volatile. 339 00:23:03,370 --> 00:23:07,240 Even if the expected value is the same, they'd 340 00:23:07,240 --> 00:23:10,300 move around a lot. 341 00:23:10,300 --> 00:23:12,500 So how can we model this kind of thing? 342 00:23:12,500 --> 00:23:16,930 Well, we've already looked at the basic notion. 343 00:23:16,930 --> 00:23:19,850 Last time, in the last lecture, we looked 344 00:23:19,850 --> 00:23:25,640 at the idea of a distribution. 345 00:23:25,640 --> 00:23:29,610 And when we do a simulation, we're pulling the samples from 346 00:23:29,610 --> 00:23:33,330 some distribution. 347 00:23:33,330 --> 00:23:37,340 It could be normal, that is, a 348 00:23:37,340 --> 00:23:41,530 Gaussian, where if you recall there was a mean, and a 349 00:23:41,530 --> 00:23:47,510 standard deviation, and most values were going to be close 350 00:23:47,510 --> 00:23:49,480 to the mean. 351 00:23:49,480 --> 00:23:52,170 Especially if there is a small standard deviation.
352 00:23:52,170 --> 00:23:56,340 If there's a large standard deviation, they'd be spread out. 353 00:23:56,340 --> 00:23:58,680 Or it could be uniform, where every 354 00:23:58,680 --> 00:24:00,190 value was equally probable. 355 00:24:00,190 --> 00:24:03,560 We also looked at exponential. 356 00:24:03,560 --> 00:24:08,300 So we're going to assign to each stock, when we create it, 357 00:24:08,300 --> 00:24:11,230 a distribution. 358 00:24:11,230 --> 00:24:17,330 Some way of visualizing, or thinking about, where we draw 359 00:24:17,330 --> 00:24:20,370 the price changes from. 360 00:24:20,370 --> 00:24:26,880 This gets us into a new linguistic concept, which 361 00:24:26,880 --> 00:24:29,790 we'll see down here. 362 00:24:29,790 --> 00:24:33,310 You don't have this particular code on your handout, but you do 363 00:24:33,310 --> 00:24:36,340 have code that uses the same concept. 364 00:24:36,340 --> 00:24:39,430 So here's my unit test procedure. 365 00:24:39,430 --> 00:24:42,240 And here's where I'm going to create distributions. 366 00:24:42,240 --> 00:24:45,460 And I'm going to look at two. 367 00:24:45,460 --> 00:24:49,930 A random-- a uniform, and a Gaussian. 368 00:24:49,930 --> 00:24:56,810 What lambda does is create a function on the fly, 369 00:24:56,810 --> 00:24:59,370 as the program runs, 370 00:24:59,370 --> 00:25:03,440 that I can then pass around. 371 00:25:03,440 --> 00:25:08,520 So here, I'm going to use random dot uniform, 372 00:25:08,520 --> 00:25:11,880 for example, between minus volatility and plus 373 00:25:11,880 --> 00:25:13,270 volatility. 374 00:25:13,270 --> 00:25:16,710 So ignoring the lambda, what do we expect random dot 375 00:25:16,710 --> 00:25:19,340 uniform to do? 376 00:25:19,340 --> 00:25:26,020 It's equally likely, in the range from minus volatility to 377 00:25:26,020 --> 00:25:35,470 plus volatility, to return any value in here.
378 00:25:35,470 --> 00:25:39,640 But notice the previous line, where I am computing 379 00:25:39,640 --> 00:25:42,620 volatility. 380 00:25:42,620 --> 00:25:47,680 If I wanted every stock to have the same volatility, I 381 00:25:47,680 --> 00:25:50,710 could just do that, if you will, at the 382 00:25:50,710 --> 00:25:52,940 time I wrote my program. 383 00:25:52,940 --> 00:25:55,240 But here I want it to be determined, 384 00:25:55,240 --> 00:25:57,280 chosen at run time. 385 00:25:57,280 --> 00:26:03,070 So first, I choose a volatility randomly, from some 386 00:26:03,070 --> 00:26:06,950 distribution of possible volatilities from 0 to, in 387 00:26:06,950 --> 00:26:12,930 this case, 0.2. 388 00:26:12,930 --> 00:26:17,140 Think of this as the percentage move per day. 389 00:26:17,140 --> 00:26:22,070 So 2/10 of a percent would be the move here. 390 00:26:22,070 --> 00:26:28,130 And then I'll create this function, this distribution, 391 00:26:28,130 --> 00:26:33,720 d1, which will, whenever I call it, give me a random, a 392 00:26:33,720 --> 00:26:37,930 uniformly selected value between minus and plus 393 00:26:37,930 --> 00:26:42,210 volatility. 394 00:26:42,210 --> 00:26:48,670 Then when I create the stock, here, I can pass it 395 00:26:48,670 --> 00:26:54,040 in, pass in d1. 396 00:26:54,040 --> 00:26:55,310 OK, it's a new concept. 397 00:26:55,310 --> 00:26:58,240 I don't expect you'll all immediately grab it, but you 398 00:26:58,240 --> 00:27:05,000 will need to understand it before the quiz comes along. 399 00:27:05,000 --> 00:27:08,110 And then I could also do a Gaussian one here, with the 400 00:27:08,110 --> 00:27:11,230 mean of 0 and the standard deviation of volatility 401 00:27:11,230 --> 00:27:13,530 divided by 2. 402 00:27:13,530 --> 00:27:15,260 Where do these parameters come from? 403 00:27:15,260 --> 00:27:17,530 I made them up out of whole cloth.
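A minimal Python sketch of the run-time distributions just described, under stated assumptions: the wrapper function `make_distributions` is hypothetical, while `random.uniform` and `random.gauss` are the standard-library calls named in the lecture. It picks a volatility uniformly between 0 and 0.2, then builds two zero-argument lambdas: a uniform one on [-volatility, volatility] and a Gaussian one with mean 0 and standard deviation volatility / 2. If the draws are applied as fractional changes, as in the price update of price times 1 plus the move, a volatility of 0.2 allows daily moves of up to 20 percent.

```python
import random


def make_distributions(max_volatility=0.2):
    """Choose a volatility at run time, then build two zero-argument
    functions (distributions) that each draw a fractional daily move."""
    volatility = random.uniform(0, max_volatility)
    # Uniform: every value in [-volatility, volatility] is equally likely.
    d1 = lambda: random.uniform(-volatility, volatility)
    # Gaussian: mean 0, standard deviation volatility / 2.
    d2 = lambda: random.gauss(0.0, volatility / 2)
    return volatility, d1, d2


random.seed(0)
vol, uniform_move, gauss_move = make_distributions()
samples = [uniform_move() for _ in range(1000)]
print(f"volatility = {vol:.3f}")
print(f"min/max of uniform draws: {min(samples):.3f} / {max(samples):.3f}")
print(f"one Gaussian draw: {gauss_move():.3f}")

# Pitfall if lambdas are built in a loop: each one looks up `v` when called.
late = [lambda: v for v in (0.1, 0.2)]
pinned = [lambda v=v: v for v in (0.1, 0.2)]  # default argument pins the value
print([f() for f in late])    # [0.2, 0.2]
print([f() for f in pinned])  # [0.1, 0.2]
```

Each call to `make_distributions` gets its own `volatility`, so every stock built this way keeps an independent spread. The last lines matter only if such lambdas are created directly in a loop over stocks (the transcript doesn't show whether they are): a lambda reads a free variable when it is called, not when it is created, so without a default argument every stock would end up with the last volatility chosen.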
404 00:27:17,530 --> 00:27:20,620 Later we'll talk about how one could think about them more 405 00:27:20,620 --> 00:27:24,710 intelligently. 406 00:27:24,710 --> 00:27:29,210 Now what do I do with that? 407 00:27:29,210 --> 00:27:31,040 All right, we'll see that in a minute. 408 00:27:31,040 --> 00:27:38,120 But do people understand what the basic idea here is? 409 00:27:38,120 --> 00:27:44,110 Now, I can set the price of a stock. 410 00:27:44,110 --> 00:27:47,910 And when I do that, I'll append it to history. 411 00:27:47,910 --> 00:27:50,380 I can, oh, these have got some remnants which we 412 00:27:50,380 --> 00:27:52,330 really don't need. 413 00:27:52,330 --> 00:27:58,290 I'll get rid of this, which is just an uninteresting thing. 414 00:27:58,290 --> 00:28:05,020 And let's look at make move, 415 00:28:05,020 --> 00:28:06,490 because this is the interesting thing. 416 00:28:06,490 --> 00:28:11,520 Make move is what we call to change the price of a stock, 417 00:28:11,520 --> 00:28:14,800 at the beginning or end of a day if you will. 418 00:28:14,800 --> 00:28:18,730 So the first thing it does is say, if self dot price is 419 00:28:18,730 --> 00:28:23,360 0, I'm just going to return. 420 00:28:23,360 --> 00:28:26,550 This is not the right thing to do, by the way. 421 00:28:26,550 --> 00:28:30,790 Again, there are some bugs in here. 422 00:28:30,790 --> 00:28:33,120 You won't find these bugs in your handout, right? 423 00:28:33,120 --> 00:28:35,240 The code is different in the handout. 424 00:28:35,240 --> 00:28:39,400 But I wanted to show these to you so we could think about them. 425 00:28:39,400 --> 00:28:41,820 What I'm more interested in here than in the result of the 426 00:28:41,820 --> 00:28:46,220 simulation is the process of creating it. 427 00:28:46,220 --> 00:28:48,400 So why did I put this here? 428 00:28:48,400 --> 00:28:53,900 Why did I say if self dot price equals 0 return?
429 00:28:53,900 --> 00:28:57,660 Because the first time I wrote the program, I didn't have 430 00:28:57,660 --> 00:29:00,730 anything like that here, and a stock could go 431 00:29:00,730 --> 00:29:03,040 to 0 and then recover. 432 00:29:03,040 --> 00:29:05,650 Or even go to negative values. 433 00:29:05,650 --> 00:29:09,110 Well, we know stock prices are never negative. 434 00:29:09,110 --> 00:29:12,010 And in fact we know if the price goes to 0, it's delisted 435 00:29:12,010 --> 00:29:14,460 from the exchange. 436 00:29:14,460 --> 00:29:16,460 So I said, all right, we'd better make a special case of 437 00:29:16,460 --> 00:29:24,130 that. It turns out that this will be a bug, and I want you 438 00:29:24,130 --> 00:29:29,740 to think about why it's wrong for me to put this check here. 439 00:29:29,740 --> 00:29:32,750 The check needs to be somewhere in the program, but 440 00:29:32,750 --> 00:29:36,270 this is not the right place for it. 441 00:29:36,270 --> 00:29:40,640 So think about why I didn't leave it here. 442 00:29:40,640 --> 00:29:44,620 OK, then we'll get the old price, which we're going to 443 00:29:44,620 --> 00:29:49,180 try and remember, and now comes the interesting part. 444 00:29:49,180 --> 00:29:51,010 We're going to try and figure out how the 445 00:29:51,010 --> 00:29:54,840 price should change. 446 00:29:54,840 --> 00:29:56,650 So I'm first going to compute something 447 00:29:56,650 --> 00:29:59,650 called the base move. 448 00:29:59,650 --> 00:30:02,280 Think of this as kind of the basis from which we'll be 449 00:30:02,280 --> 00:30:05,880 computing the actual move. 450 00:30:05,880 --> 00:30:12,030 I'll draw something from the distribution, and this is 451 00:30:12,030 --> 00:30:15,720 interesting: I'm now calling self dot distribution, and 452 00:30:15,720 --> 00:30:20,270 remember this will be different for each stock.
453 00:30:20,270 --> 00:30:23,840 It will return me some random value from either the uniform 454 00:30:23,840 --> 00:30:28,460 or the Gaussian distribution, 455 00:30:28,460 --> 00:30:30,840 with a different volatility for each stock, because that 456 00:30:30,840 --> 00:30:35,340 was also selected randomly, plus some market bias. 457 00:30:35,340 --> 00:30:37,670 Saying, well, the market on average will go up a little 458 00:30:37,670 --> 00:30:42,530 bit, or go down a little bit. 459 00:30:42,530 --> 00:30:46,500 And then I'll set the new price, if you will, self dot 460 00:30:46,500 --> 00:30:52,240 price, to self dot price times 1 plus the base move. 461 00:30:52,240 --> 00:30:53,410 So notice what this says. 462 00:30:53,410 --> 00:31:00,670 If the base move is 0, then the price doesn't change. 463 00:31:00,670 --> 00:31:02,740 So that makes sense. 464 00:31:02,740 --> 00:31:04,780 Interesting question. 465 00:31:04,780 --> 00:31:12,800 Why do you think I said self dot price times 1 plus the 466 00:31:12,800 --> 00:31:16,280 base move, rather than just adding the base move to the 467 00:31:16,280 --> 00:31:18,480 price of the stock? 468 00:31:18,480 --> 00:31:22,090 Again, the first time I coded this, I had an addition there 469 00:31:22,090 --> 00:31:24,710 instead of a multiplication. 470 00:31:24,710 --> 00:31:31,190 What would the ramifications of an addition there be? 471 00:31:31,190 --> 00:31:35,360 That would say that how much the stock changed is independent 472 00:31:35,360 --> 00:31:39,430 of its current price. 473 00:31:39,430 --> 00:31:43,780 And when I ran that, I got weird results, because we know 474 00:31:43,780 --> 00:31:49,750 that a Google priced at, say, 300, is much more likely to 475 00:31:49,750 --> 00:31:53,440 move by 10 points in a day than a stock 476 00:31:53,440 --> 00:31:58,290 that's priced at $0.50.
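The two update rules are easiest to compare on the two example prices above. In this small sketch, where the function names are hypothetical, the same draw of 0.02 is applied to a $300 stock and a $0.50 stock. Added directly, it moves both stocks by the same number of points: a negligible percentage of the expensive stock and a large percentage of the cheap one. Used multiplicatively, it moves either stock by 2%.

```python
def additive_step(price, move):
    # Change in points is independent of the current price.
    return price + move

def multiplicative_step(price, move):
    # Change in points is proportional to the current price.
    return price * (1.0 + move)

move = 0.02  # one hypothetical daily draw
for price in (300.0, 0.50):
    add_change = additive_step(price, move) - price
    mult_change = multiplicative_step(price, move) - price
    print(f"price {price:7.2f}: "
          f"additive {add_change:+.4f} ({add_change / price:+.2%}), "
          f"multiplicative {mult_change:+.4f} ({mult_change / price:+.2%})")
```

The multiplicative rule keeps the percentage move independent of price. That matches the observation that cheap and expensive stocks have similar percentage moves.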
477 00:31:58,290 --> 00:32:02,120 So in fact, it is the case, if you look at data, and by the 478 00:32:02,120 --> 00:32:04,360 way, that's the way I ended up setting a lot of these 479 00:32:04,360 --> 00:32:07,840 parameters and playing with it, was comparing what my 480 00:32:07,840 --> 00:32:11,190 simulation said to historical stock data. 481 00:32:11,190 --> 00:32:14,540 And indeed it is the case that the amount the 482 00:32:14,540 --> 00:32:18,060 stock moves tends to be proportional to 483 00:32:18,060 --> 00:32:20,130 the price of the stock. 484 00:32:20,130 --> 00:32:22,910 Expensive stocks move more, in points. 485 00:32:22,910 --> 00:32:25,430 Interestingly enough, the percentage moves are not much 486 00:32:25,430 --> 00:32:29,630 different between cheap stocks and expensive stocks. 487 00:32:29,630 --> 00:32:34,770 And that's why I ended up using a multiplicative factor, 488 00:32:34,770 --> 00:32:47,880 rather than an additive factor. 489 00:32:47,880 --> 00:32:50,060 This is again a general lesson. 490 00:32:50,060 --> 00:32:53,150 As you build these kinds of simulations, or anything like 491 00:32:53,150 --> 00:32:57,950 this, you need to think through whether things should 492 00:32:57,950 --> 00:32:59,880 be multiplicative or additive. 493 00:32:59,880 --> 00:33:03,160 Because you get very different results, typically. 494 00:33:03,160 --> 00:33:07,270 Multiplicative is what you want to do if the amount of 495 00:33:07,270 --> 00:33:12,900 change is proportional to the current size, whether it's 496 00:33:12,900 --> 00:33:16,970 price or anything else, and additive if the change is 497 00:33:16,970 --> 00:33:21,840 independent of the current value. That, I think, is 498 00:33:21,840 --> 00:33:27,140 the general way to think about it. 499 00:33:27,140 --> 00:33:31,350 Now, you'll see this other kind of peculiar thing. 500 00:33:31,350 --> 00:33:39,880 So I've now set the price, and then I've got this test here.
501 00:33:39,880 --> 00:33:45,570 If mo: mo stands for momentum. 502 00:33:45,570 --> 00:33:53,390 I'm now exploring the question of whether or not stock price changes 503 00:33:53,390 --> 00:34:03,000 are indeed memoryless. 504 00:34:03,000 --> 00:34:09,240 And the fancy word for that is Poisson. 505 00:34:09,240 --> 00:34:14,710 People often model things as Poisson processes, which is to 506 00:34:14,710 --> 00:34:20,410 say, processes in which past behavior has no impact on 507 00:34:20,410 --> 00:34:24,030 future behavior; they're memoryless. 508 00:34:24,030 --> 00:34:29,200 And in fact, that's what the Efficient Market Hypothesis 509 00:34:29,200 --> 00:34:31,940 purports to say. 510 00:34:31,940 --> 00:34:35,930 It says that, since all the information is in the current 511 00:34:35,930 --> 00:34:38,790 price, you don't have to worry about whether it went up or 512 00:34:38,790 --> 00:34:45,610 down yesterday to decide what it's going to do today. 513 00:34:45,610 --> 00:34:48,500 There are people who don't believe that, and instead 514 00:34:48,500 --> 00:34:52,470 argue that there is this notion called momentum. 515 00:34:52,470 --> 00:34:54,690 These are called momentum investors. 516 00:34:54,690 --> 00:34:58,180 And they say what's most likely to happen today is 517 00:34:58,180 --> 00:35:00,110 what happened yesterday. 518 00:35:00,110 --> 00:35:01,490 Or more likely. 519 00:35:01,490 --> 00:35:03,810 If the stock went up yesterday, it's more likely to 520 00:35:03,810 --> 00:35:08,360 go up today than if it didn't go up yesterday.
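The difference between the memoryless view and the momentum view shows up in how each day's change relates to the previous day's change. The sketch below uses a lag-1 autocorrelation, a standard diagnostic that this note introduces; the lecture does not use it. In a memoryless series, yesterday's change says nothing about today's, so the autocorrelation should be near 0. Adding a Gaussian multiple of yesterday's change makes it clearly positive. The uniform draws on [-1, 1], the `gauss(0.5, 0.5)` factor, and the function names are all hypothetical choices for illustration.

```python
import random

def daily_changes(num_days, momentum, seed=0):
    """Hypothetical sketch: a series of daily changes. Without momentum each
    change is an independent draw (memoryless); with momentum a Gaussian
    multiple of yesterday's change is added in."""
    rng = random.Random(seed)
    changes = []
    last = 0.0
    for _ in range(num_days):
        change = rng.uniform(-1.0, 1.0)
        if momentum:
            change += rng.gauss(0.5, 0.5) * last
        changes.append(change)
        last = change
    return changes

def lag1_autocorrelation(xs):
    # Correlation between each value and the value just before it.
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i - 1] - mean) for i in range(1, n))
    return cov / var

for mo in (False, True):
    series = daily_changes(20_000, mo)
    print(f"momentum={mo}: lag-1 autocorrelation {lag1_autocorrelation(series):+.3f}")
```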
521 00:35:08,360 --> 00:35:12,180 So I wasn't sure which religion I was willing to 522 00:35:12,180 --> 00:35:16,910 believe in, if either, so I added a parameter, mo: if 523 00:35:16,910 --> 00:35:20,870 you believe in momentum, then you should change 524 00:35:20,870 --> 00:35:23,970 the price by -- 525 00:35:23,970 --> 00:35:27,310 And here I just did something taking a Gaussian times the 526 00:35:27,310 --> 00:35:32,230 last change, and, in fact, added it in. 527 00:35:32,230 --> 00:35:35,610 So if it went up yesterday, it will more likely go up today, 528 00:35:35,610 --> 00:35:39,130 because I'm throwing in a positive number, otherwise a 529 00:35:39,130 --> 00:35:41,430 negative number. 530 00:35:41,430 --> 00:35:44,750 Notice that this is additive, 531 00:35:44,750 --> 00:35:48,910 because it's dealing with yesterday's price 532 00:35:48,910 --> 00:35:51,140 change, not with the price itself. 533 00:35:51,140 --> 00:35:56,740 OK, so that's why we're dealing with that. 534 00:35:56,740 --> 00:35:59,950 Now, here's where I should've put in this test 535 00:35:59,950 --> 00:36:03,660 that I had up here. 536 00:36:03,660 --> 00:36:05,850 Get it out from there. 537 00:36:05,850 --> 00:36:08,860 Because what I want to do is say, if self dot price is less 538 00:36:08,860 --> 00:36:14,390 than 0.01, I'm going to set it to 0, just keep it there. 539 00:36:14,390 --> 00:36:19,770 That doesn't solve the problem we had before, though, right? 540 00:36:19,770 --> 00:36:26,020 Then I'm going to append it, and keep the last change for 541 00:36:26,020 --> 00:36:28,360 future use. 542 00:36:28,360 --> 00:36:32,220 OK, do people understand what's going on here? 543 00:36:32,220 --> 00:36:35,140 And then show history is just going to produce a plot. 544 00:36:35,140 --> 00:36:37,990 We've seen that a million times before. 545 00:36:37,990 --> 00:36:42,840 Any questions about this? 546 00:36:42,840 --> 00:36:45,480 Well, I have a question.
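The pieces of make move described above fit together as follows. This is a sketch, not the handout code. `Stock`, `make_move`, `last_change`, and the `gauss(0.5, 0.5)` momentum factor are assumptions made for illustration; the lecture only says "a Gaussian times the last change." The floor check runs after the price update, which is where the lecture says it belongs. The sketch also zeroes the remembered change when a stock is delisted, so momentum cannot pull a zero price back up. That detail is a choice made here, not something the lecture specifies.

```python
import random

class Stock:
    """Hypothetical sketch of a simulated stock; names are illustrative."""

    def __init__(self, price, distribution):
        self.price = price
        self.history = [price]
        self.distribution = distribution  # zero-argument function: one daily draw
        self.last_change = 0.0

    def make_move(self, bias, mo):
        old_price = self.price
        # Base move: a draw from this stock's own distribution plus market bias.
        base_move = self.distribution() + bias
        # Multiplicative update: the change is proportional to the current price.
        self.price = self.price * (1.0 + base_move)
        if mo:
            # Momentum (assumed form): add a Gaussian multiple of yesterday's change.
            self.price += random.gauss(0.5, 0.5) * self.last_change
        # Floor check placed after the price has been updated.
        if self.price < 0.01:
            self.price = 0.0
            self.last_change = 0.0  # no momentum carries past delisting
        else:
            self.last_change = self.price - old_price
        self.history.append(self.price)

random.seed(3)
crash = Stock(100.0, lambda: -2.0)  # forces the price below zero on day 1
for _ in range(10):
    crash.make_move(bias=0.0, mo=True)

volatility = 0.05
ordinary = Stock(100.0, lambda: random.uniform(-volatility, volatility))
for _ in range(100):
    ordinary.make_move(bias=0.0, mo=True)
print(crash.history[:4], len(ordinary.history), min(ordinary.history))
```

In this version a delisted stock still appends 0.0 to its history every day, so all histories stay the same length.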
547 00:36:45,480 --> 00:36:46,640 Does it make any sense? 548 00:36:46,640 --> 00:36:48,000 Is it going to work at all? 549 00:36:48,000 --> 00:36:51,220 So now let's test it. 550 00:36:51,220 --> 00:36:57,790 So, I now have this unit test program 551 00:36:57,790 --> 00:37:01,360 called unit test stock. 552 00:37:01,360 --> 00:37:04,450 I originally did not make it a function; I had it in-line, 553 00:37:04,450 --> 00:37:07,600 and I realized that was really stupid, because I wanted to do 554 00:37:07,600 --> 00:37:10,720 it a lot of times. 555 00:37:10,720 --> 00:37:15,710 So it's got an internal procedure, internal function, 556 00:37:15,710 --> 00:37:20,260 local to the unit test, that runs the simulation. 557 00:37:20,260 --> 00:37:25,880 And it takes the set of stocks to simulate, a figure 558 00:37:25,880 --> 00:37:29,290 number (this is going to print a bunch of graphs, and I want 559 00:37:29,290 --> 00:37:32,320 to say what graph it is), and whether or not I 560 00:37:32,320 --> 00:37:35,940 believe in big mo. 561 00:37:35,940 --> 00:37:41,470 It sets the mean to 0, and then for s in the stocks, it 562 00:37:41,470 --> 00:37:46,340 moves it, giving it the bias and the momentum, then it 563 00:37:46,340 --> 00:37:49,190 shows the history. 564 00:37:49,190 --> 00:37:53,440 And then it computes the mean of all the 565 00:37:53,440 --> 00:37:54,160 stocks in it. 566 00:37:54,160 --> 00:37:57,570 We've seen this sort of thing many times before. 567 00:37:57,570 --> 00:37:59,810 I've then got some constants. 568 00:37:59,810 --> 00:38:02,190 By the way, I want to emphasize that I've named 569 00:38:02,190 --> 00:38:04,860 these constants to make them easier to change. 570 00:38:04,860 --> 00:38:10,710 Starting with 20 stocks, 100 days. 571 00:38:10,710 --> 00:38:15,940 And then what I do is: stocks 1 will be the 572 00:38:15,940 --> 00:38:19,430 empty list, stocks 2 is the empty list.
Why do you think 573 00:38:19,430 --> 00:38:27,010 I'm starting with a bias of 0? 574 00:38:27,010 --> 00:38:32,490 Because what do you think the mean should be if I simulate 575 00:38:32,490 --> 00:38:35,990 various things with a bias of 0? 576 00:38:35,990 --> 00:38:39,585 If I start with $100 as the average price of the stock, what 577 00:38:39,585 --> 00:38:42,240 should the average price of the stock be? 578 00:38:42,240 --> 00:38:45,350 If my code is correct, what should the average price be 579 00:38:45,350 --> 00:38:50,270 after, say, 100 days, if there's no bias? 580 00:38:50,270 --> 00:38:51,480 Pardon? 581 00:38:51,480 --> 00:38:53,810 100, exactly. 582 00:38:53,810 --> 00:38:56,580 Since there's no upward or downward bias. 583 00:38:56,580 --> 00:39:00,130 They may fluctuate wildly, but if I look at enough stocks, 584 00:39:00,130 --> 00:39:03,600 the average should be right around 100. 585 00:39:03,600 --> 00:39:05,690 I don't know what the average would be if I chose a 586 00:39:05,690 --> 00:39:07,080 different bias. 587 00:39:07,080 --> 00:39:11,940 It's a little bit complicated, so I chose the simplest bias. 588 00:39:11,940 --> 00:39:14,360 Important lesson, so that there would be some 589 00:39:14,360 --> 00:39:18,250 predictability in the results, and I would have some, if you 590 00:39:18,250 --> 00:39:22,300 will, smoke test for knowing whether or not 591 00:39:22,300 --> 00:39:27,530 my code seemed to be working. 592 00:39:27,530 --> 00:39:31,800 All right, and initially, well, maybe initially, just to 593 00:39:31,800 --> 00:39:39,360 be simple, I'm going to start with momentum equal to false. 594 00:39:39,360 --> 00:39:41,990 Because, again, it seems simpler to have a model where 595 00:39:41,990 --> 00:39:43,790 there's no momentum. 596 00:39:43,790 --> 00:39:46,630 I'm looking for the simplest model possible for the first 597 00:39:46,630 --> 00:39:49,230 time I run it.
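Why 100 is the right target can be checked with a short calculation. The simplifying assumptions are this note's, not the lecture's: no momentum, no 0.01 floor, and, given a stock's volatility, each day's draw $X_i$ independent with mean 0. That last condition holds for both the uniform and the Gaussian draws described earlier. With starting price $P_0$, bias $b$, and $n$ days of the multiplicative update:

$$
P_n = P_0 \prod_{i=1}^{n} \left(1 + X_i + b\right),
\qquad
\mathbb{E}[P_n] = P_0 \prod_{i=1}^{n} \mathbb{E}\left[1 + X_i + b\right] = P_0 \,(1 + b)^n .
$$

With $b = 0$ the expected price is exactly $P_0 = 100$, whatever the volatilities, which is why a zero bias makes a good smoke test. A positive bias compounds: for example, $b = 0.01$ over $n = 100$ days gives $100 \cdot 1.01^{100} \approx 270$.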
598 00:39:49,230 --> 00:39:51,780 And then we looked at this little loop before: for i in 599 00:39:51,780 --> 00:39:55,470 range number of stocks, I'm going to create two different 600 00:39:55,470 --> 00:40:00,330 lists of stocks, one where the moves, or distributions, are 601 00:40:00,330 --> 00:40:04,530 chosen from a uniform distribution, and the other where they're Gaussian. 602 00:40:04,530 --> 00:40:07,780 Because I'm sort of curious as to, again, which is the right 603 00:40:07,780 --> 00:40:14,090 way to think about this, all right? 604 00:40:14,090 --> 00:40:18,940 And then, I'm going to just call it. 605 00:40:18,940 --> 00:40:22,030 We'll see what we get. 606 00:40:22,030 --> 00:40:22,780 So let's do it. 607 00:40:22,780 --> 00:40:24,730 Let's hope that all the changes I made have not 608 00:40:24,730 --> 00:40:29,920 introduced a syntax error. 609 00:40:29,920 --> 00:40:31,670 All right, well, at least it did something. 610 00:40:31,670 --> 00:40:39,750 Let's see what it did. 611 00:40:39,750 --> 00:40:43,740 So the test on the left, you'll remember, 612 00:40:43,740 --> 00:40:47,930 test one, I believe, was the uniform distribution, and 613 00:40:47,930 --> 00:40:51,180 test two is the Gaussian. 614 00:40:51,180 --> 00:40:55,290 So what should we do first? 615 00:40:55,290 --> 00:41:00,240 Well, let's do smoke test number one: is the mean more 616 00:41:00,240 --> 00:41:02,790 or less what we expected? 617 00:41:02,790 --> 00:41:05,740 Well, it looks like it's dead on 100, which was our initial 618 00:41:05,740 --> 00:41:09,120 price in test two. 619 00:41:09,120 --> 00:41:12,440 And in test one it's a little bit above 100. 620 00:41:12,440 --> 00:41:16,960 But we didn't do that many stocks, or that many days, so 621 00:41:16,960 --> 00:41:24,230 it's quite plausible that it's correct.
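Eyeballing one run per distribution is a weak check. A minimal harness can repeat the whole 20-stock, 100-day simulation many times and report how the per-trial means spread out. The same harness with a positive bias then serves as a second smoke test, since the mean should rise. This sketch assumes uniform moves only, no momentum, and a maximum volatility of 0.02, which is smaller than the lecture's 0.2 and keeps the spread tight. The names `one_trial` and `run_trials` are hypothetical; this is not the course's code.

```python
import random
import statistics

NUM_STOCKS = 20
NUM_DAYS = 100
START_PRICE = 100.0
MAX_VOLATILITY = 0.02  # assumption for this sketch, not the lecture's 0.2

def one_trial(rng, bias):
    # One run: mean final price over NUM_STOCKS independent stocks.
    total = 0.0
    for _ in range(NUM_STOCKS):
        volatility = rng.uniform(0, MAX_VOLATILITY)
        price = START_PRICE
        for _ in range(NUM_DAYS):
            price *= 1.0 + rng.uniform(-volatility, volatility) + bias
        total += price
    return total / NUM_STOCKS

def run_trials(bias, num_trials=100, seed=0):
    # Repeat the run and summarize how the trial means are spread.
    rng = random.Random(seed)
    means = [one_trial(rng, bias) for _ in range(num_trials)]
    return statistics.mean(means), min(means), max(means)

for bias in (0.0, 0.001):
    avg, lowest, highest = run_trials(bias)
    print(f"bias={bias}: mean of means {avg:.2f}, "
          f"range {lowest:.2f} to {highest:.2f}")
```

Individual trials still bounce around 100 at zero bias. The mean over many trials pins it down much more tightly than a single run can.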
622 00:41:24,230 --> 00:41:28,000 But, not to be sure, just to increase 623 00:41:28,000 --> 00:41:46,740 my confidence, I'm going to just run it again. 624 00:41:46,740 --> 00:41:51,110 Well, here I'm a little bit below 100 in test two, and test 625 00:41:51,110 --> 00:41:53,510 one a little bit below 100 as well. 626 00:41:53,510 --> 00:41:56,820 You remember last time it was a little bit above 100. 627 00:41:56,820 --> 00:41:59,900 I feel pretty good about this, and in fact I ran it a lot of 628 00:41:59,900 --> 00:42:01,540 times in my office. 629 00:42:01,540 --> 00:42:05,100 And it just bounces around, hovering around 100. 630 00:42:05,100 --> 00:42:06,760 Of course, this is the wrong way to do it. 631 00:42:06,760 --> 00:42:09,690 I should really just put it in a nice test harness, where I 632 00:42:09,690 --> 00:42:14,370 run 100, 200, 1,000 trials, but I didn't want to bore you 633 00:42:14,370 --> 00:42:15,930 with that here. 634 00:42:15,930 --> 00:42:20,950 So, OK, we passed the first smoke test. 635 00:42:20,950 --> 00:42:25,130 We seem to be where we expect to be. 636 00:42:25,130 --> 00:42:30,080 Well, let's try smoke test two. 637 00:42:30,080 --> 00:42:32,420 What else might we want to look at, to see if we've got things 638 00:42:32,420 --> 00:42:34,580 working properly? 639 00:42:34,580 --> 00:42:39,590 Well, I kind of ignored the notion of bias by making it 0, 640 00:42:39,590 --> 00:42:49,730 so let's give it a big bias here. 641 00:42:49,730 --> 00:43:09,830 Assuming it will let me edit it. 642 00:43:09,830 --> 00:43:11,550 We just gotta start it up again; it's the 643 00:43:11,550 --> 00:43:34,890 safest thing to do. 644 00:43:34,890 --> 00:43:38,120 You wouldn't think I would have, I don't have -- all 645 00:43:38,120 --> 00:43:42,800 right, be that way about it. 646 00:43:42,800 --> 00:43:46,440 Fortunately, we've been through this before. 647 00:43:46,440 --> 00:43:49,750 We know if we relaunch the Finder.
648 00:43:49,750 --> 00:43:58,300 Who says Mac OS is flawless? 649 00:43:58,300 --> 00:44:07,290 All right, we were down here, and I was saying, let's try a 650 00:44:07,290 --> 00:44:10,490 larger bias. 651 00:44:10,490 --> 00:44:12,580 Again, we're trying to see if it does what we 652 00:44:12,580 --> 00:44:13,970 think it might do. 653 00:44:13,970 --> 00:44:17,020 So what do you think it should do with a bias? 654 00:44:17,020 --> 00:44:21,710 Where should the mean be now? 655 00:44:21,710 --> 00:44:24,730 Still around 100? 656 00:44:24,730 --> 00:44:26,300 Or higher, right? 657 00:44:26,300 --> 00:44:29,660 Because we've now put in a bias suggesting that 658 00:44:29,660 --> 00:44:31,460 it should go up. 659 00:44:31,460 --> 00:44:33,090 Oops. 660 00:44:33,090 --> 00:44:37,720 It wouldn't have hurt it. 661 00:44:37,720 --> 00:44:39,550 All right. 662 00:44:39,550 --> 00:44:49,730 So let's run it. 663 00:44:49,730 --> 00:44:54,220 Sure enough, we see that for test two it's a little bit 664 00:44:54,220 --> 00:45:02,760 over 100, and for test one it's way over 100. 665 00:45:02,760 --> 00:45:11,130 Well, let's make sure it's not a fluke. 666 00:45:11,130 --> 00:45:21,730 Try it again. 667 00:45:21,730 --> 00:45:26,940 So, sure enough, changing the bias changed the price, and 668 00:45:26,940 --> 00:45:31,170 even changed it in the right direction. 669 00:45:31,170 --> 00:45:33,990 So we can feel pretty comfortable that it's doing 670 00:45:33,990 --> 00:45:35,380 something good with that. 671 00:45:35,380 --> 00:45:37,350 We could also feel pretty comfortable that that's 672 00:45:37,350 --> 00:45:40,140 probably way too high a bias, right? 673 00:45:40,140 --> 00:45:45,740 We would not expect that the mean should be over 160, or in 674 00:45:45,740 --> 00:45:49,850 one case, 150, after only 100 days of trading, right? 675 00:45:49,850 --> 00:45:53,580 Things don't typically go up 50% in 100 days.
676 00:45:53,580 --> 00:46:00,000 They go down 50%, but -- 677 00:46:00,000 --> 00:46:03,590 All right, so that's good. 678 00:46:03,590 --> 00:46:06,960 Oh, let's look at something else now. 679 00:46:06,960 --> 00:46:10,360 Let's go back to a simpler bias here. 680 00:46:10,360 --> 00:46:13,440 We'll run it again. 681 00:46:13,440 --> 00:46:16,910 And think about what's the difference between the 682 00:46:16,910 --> 00:46:26,110 uniform and the Gaussian? 683 00:46:26,110 --> 00:46:35,490 Can we deduce anything about those? 684 00:46:35,490 --> 00:46:37,870 Not, well, let me ask you. 685 00:46:37,870 --> 00:46:45,260 What do you think, yes or no? 686 00:46:45,260 --> 00:46:47,450 Anybody see anything interesting here? 687 00:46:47,450 --> 00:46:47,730 Yeah? 688 00:46:47,730 --> 00:46:52,625 STUDENT: The variance of the Gaussian seems to be less than 689 00:46:52,625 --> 00:46:54,860 the variance of the uniform. 690 00:46:54,860 --> 00:46:55,960 PROFESSOR: The variance of the Gaussian -- 691 00:46:55,960 --> 00:46:59,620 STUDENT: -- is less. 692 00:46:59,620 --> 00:47:02,810 PROFESSOR: So all right, that appears to be the case here. 693 00:47:02,810 --> 00:47:04,860 But let's run it again, as we've done with 694 00:47:04,860 --> 00:47:06,490 all the other tests. 695 00:47:06,490 --> 00:47:09,680 So we have a hypothesis. 696 00:47:09,680 --> 00:47:13,440 Let's not fall victim to the Oklahoma sharpshooter. 697 00:47:13,440 --> 00:47:19,860 We'll test our hypothesis, or at least examine it again, see 698 00:47:19,860 --> 00:47:27,230 if it's, in some sense, repeatable. 699 00:47:27,230 --> 00:47:32,860 Well, now what do we see? 700 00:47:32,860 --> 00:47:35,880 Doesn't seem to be true this time, right? 701 00:47:35,880 --> 00:47:37,960 Not obviously. 702 00:47:37,960 --> 00:47:42,240 So, we're not sure about this. 703 00:47:42,240 --> 00:47:43,960 So this is something that we would need 704 00:47:43,960 --> 00:47:49,420 to investigate further.
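For a single day's draw, the student's hypothesis can be checked directly. With the parameters described earlier (uniform on [-v, v], and a Gaussian with mean 0 and standard deviation v/2), the uniform's variance is v²/3 and the Gaussian's is v²/4. Per draw, then, the Gaussian is the narrower of the two. The Gaussian is also unbounded, however, so it occasionally produces a move larger than anything the uniform can. The sketch below, whose function name is hypothetical, samples both distributions. It does not settle the question about final prices, since each stock also draws its own volatility and the daily moves compound over 100 days.

```python
import random
import statistics

def compare_move_distributions(volatility=0.2, n=20_000, seed=0):
    """Draw n daily moves from each distribution (uniform on [-v, v];
    Gaussian with mean 0, sd v/2) and summarize them."""
    rng = random.Random(seed)
    uniform = [rng.uniform(-volatility, volatility) for _ in range(n)]
    gaussian = [rng.gauss(0, volatility / 2) for _ in range(n)]
    return {
        "uniform_var": statistics.pvariance(uniform),       # theory: v**2 / 3
        "gaussian_var": statistics.pvariance(gaussian),     # theory: v**2 / 4
        "uniform_max_abs": max(abs(x) for x in uniform),    # never exceeds v
        "gaussian_max_abs": max(abs(x) for x in gaussian),  # tail can exceed v
    }

summary = compare_move_distributions()
for name, value in summary.items():
    print(f"{name}: {value:.5f}")
```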
705 00:47:49,420 --> 00:47:52,320 And we would need to look at it, and it's going to 706 00:47:52,320 --> 00:47:57,470 be very tricky, by the way, as to what the right answer is. 707 00:47:57,470 --> 00:48:04,200 But if you think about it, it would not be surprising if the 708 00:48:04,200 --> 00:48:09,350 Gaussians, at least, gave us some surprising, more extreme 709 00:48:09,350 --> 00:48:12,030 results than the uniform. 710 00:48:12,030 --> 00:48:16,950 Because the uniform, as we've set it up here, is bounded. 711 00:48:16,950 --> 00:48:22,280 The minimum and the maximum are bounded. 712 00:48:22,280 --> 00:48:26,950 With the Gaussian, there's a tail. 713 00:48:26,950 --> 00:48:29,830 And you might every once in a while get this, at least as 714 00:48:29,830 --> 00:48:31,450 we've done it in this case, this large 715 00:48:31,450 --> 00:48:34,570 move out at the end. 716 00:48:34,570 --> 00:48:36,790 You might not. 717 00:48:36,790 --> 00:48:40,460 There's nothing profound about this, other than the 718 00:48:40,460 --> 00:48:44,350 understanding that the details of how you set these things up 719 00:48:44,350 --> 00:48:47,210 can matter a lot. 720 00:48:47,210 --> 00:48:54,770 Well, the final thing I want to look at is momentum. 721 00:48:54,770 --> 00:49:07,040 So let's go back, and let's set mo to true here. 722 00:49:07,040 --> 00:49:10,930 Well, it doesn't want us to set mo to true here. 723 00:49:10,930 --> 00:49:16,720 Ah, there it does. 724 00:49:16,720 --> 00:49:19,460 And now let's run it and see what happens. 725 00:49:19,460 --> 00:49:25,850 What do you think should happen? 726 00:49:25,850 --> 00:49:26,410 Anybody? 727 00:49:26,410 --> 00:49:31,726 STUDENT: [INAUDIBLE] 728 00:49:31,726 --> 00:49:36,620 PROFESSOR: I think you're right. 729 00:49:36,620 --> 00:49:41,420 These ones should curl, see if I can -- oh, not bad. 730 00:49:41,420 --> 00:49:48,610 Let's run it.
731 00:49:48,610 --> 00:49:53,460 Well, it's a little hard to see, but things 732 00:49:53,460 --> 00:49:56,500 tend to take off. 733 00:49:56,500 --> 00:49:59,895 Because once a stock starts moving, it tends to keep moving in 734 00:49:59,895 --> 00:50:05,640 that direction. 735 00:50:05,640 --> 00:50:10,710 All right. 736 00:50:10,710 --> 00:50:12,750 How do we go about choosing these parameters? 737 00:50:12,750 --> 00:50:14,880 How do we go about deciding what to do? 738 00:50:14,880 --> 00:50:18,510 Well, we play with it, the way I've been playing with it, and 739 00:50:18,510 --> 00:50:22,490 compare the results to some set of real data. 740 00:50:22,490 --> 00:50:26,820 And then we try and get our simulation to match the past, 741 00:50:26,820 --> 00:50:31,600 and hope that that will help it predict the future. 742 00:50:31,600 --> 00:50:33,480 We don't have enough time to go through all of it, 743 00:50:33,480 --> 00:50:35,460 to do that a lot. 744 00:50:35,460 --> 00:50:38,850 I will be posting code that you can play with, and I 745 00:50:38,850 --> 00:50:42,850 suggest you go through exactly this kind of exercise. 746 00:50:42,850 --> 00:50:45,590 Because this is really the way that people do develop 747 00:50:45,590 --> 00:50:46,930 simulations. 748 00:50:46,930 --> 00:50:48,810 They don't, out of whole cloth, get it 749 00:50:48,810 --> 00:50:50,700 right the first time. 750 00:50:50,700 --> 00:50:54,580 They build them, they do what-if games, they play with them, 751 00:50:54,580 --> 00:50:57,870 and then they try and adjust them to get them right. 752 00:50:57,870 --> 00:51:00,980 The nice thing here is, you can decide whether you believe in 753 00:51:00,980 --> 00:51:04,370 momentum and see what it would mean, or not mean, etc. 754 00:51:04,370 --> 00:51:06,940 All right, one more lecture. 755 00:51:06,940 --> 00:51:09,140 See you guys next week.