1 00:00:00,000 --> 00:00:02,000 OPERATOR: The following content is provided under a 2 00:00:02,000 --> 00:00:03,840 Creative Commons license. 3 00:00:03,840 --> 00:00:06,840 Your support will help MIT OpenCourseWare continue to 4 00:00:06,840 --> 00:00:10,530 offer high-quality educational resources for free. 5 00:00:10,530 --> 00:00:13,390 To make a donation, or view additional material from 6 00:00:13,390 --> 00:00:17,600 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:17,600 --> 00:00:19,570 ocw.mit.edu. 8 00:00:19,570 --> 00:00:23,360 PROFESSOR: All right, so today we're returning to 9 00:00:23,360 --> 00:00:25,510 simulations. 10 00:00:25,510 --> 00:00:29,910 And I'm going to do it, at first, a little bit more abstractly, 11 00:00:29,910 --> 00:00:32,550 and then come back to some details. 12 00:00:32,550 --> 00:00:38,310 So there are different ways to classify simulation models. 13 00:00:38,310 --> 00:00:53,570 The first is whether it's stochastic or deterministic. 14 00:00:53,570 --> 00:00:57,400 And the difference here is that in a deterministic simulation, 15 00:00:57,400 --> 00:01:04,170 you should get the same result every time you run it. 16 00:01:04,170 --> 00:01:06,880 And there are a lot of uses we'll see for deterministic 17 00:01:06,880 --> 00:01:08,880 simulations. 18 00:01:08,880 --> 00:01:14,030 And then there are stochastic simulations, where the answer 19 00:01:14,030 --> 00:01:17,720 will differ from run to run because there's an element of 20 00:01:17,720 --> 00:01:20,500 randomness in it. 21 00:01:20,500 --> 00:01:23,030 So here, if you run it again and again, you get the same 22 00:01:23,030 --> 00:01:29,310 outcome every time; here, you may not. 23 00:01:29,310 --> 00:01:35,830 So, for example, the problem set that's due today -- is 24 00:01:35,830 --> 00:01:40,740 that a stochastic or deterministic simulation? 25 00:01:40,740 --> 00:01:42,940 Somebody? 26 00:01:42,940 --> 00:01:45,740 Stochastic, exactly.
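To make the deterministic/stochastic distinction concrete, here is a minimal Python 3 sketch (the function names are hypothetical, not taken from the course handout). The deterministic model returns the same value on every run; the stochastic random walk generally does not, although fixing the random seed replays the same sequence and so makes one particular run repeatable.

```python
import random

def deterministic_sim(steps):
    """Deterministic model: always move one unit to the right."""
    position = 0
    for _ in range(steps):
        position += 1
    return position

def stochastic_sim(steps, rng):
    """Stochastic model: each step is +1 or -1 with equal probability."""
    position = 0
    for _ in range(steps):
        position += rng.choice((-1, 1))
    return position

# Deterministic: every run gives the same answer.
print([deterministic_sim(100) for _ in range(3)])  # [100, 100, 100]

# Stochastic: runs generally differ from one another.
rng = random.Random()
print([stochastic_sim(100, rng) for _ in range(3)])

# Fixing the seed replays the same random sequence, so a particular
# stochastic run can be reproduced (handy for debugging).
print(stochastic_sim(100, random.Random(42)) == stochastic_sim(100, random.Random(42)))  # True
```

Seeding does not make the model deterministic in the lecture's sense; it only lets you rerun the same sample path while you debug.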
27 00:01:45,740 --> 00:01:49,200 And that's what we're going to focus on in this class, 28 00:01:49,200 --> 00:01:51,990 because one of the interesting questions we'll see about 29 00:01:51,990 --> 00:01:58,160 stochastic simulations is, how often do you have to run them 30 00:01:58,160 --> 00:02:00,990 before you believe the answer? 31 00:02:00,990 --> 00:02:03,940 And that turns out to be a very important issue. 32 00:02:03,940 --> 00:02:05,960 You run it once, you get an answer, you can't 33 00:02:05,960 --> 00:02:07,740 take it to the bank, 34 00:02:07,740 --> 00:02:09,850 because the next time you run it, you may get a completely 35 00:02:09,850 --> 00:02:11,250 different answer. 36 00:02:11,250 --> 00:02:15,250 So that will get us a little bit into the whole issue of 37 00:02:15,250 --> 00:02:19,130 statistical analysis. 38 00:02:19,130 --> 00:02:31,180 Another interesting dichotomy is static vs dynamic. 39 00:02:31,180 --> 00:02:33,810 We'll look at both, but we'll spend more 40 00:02:33,810 --> 00:02:37,640 time on dynamic models. 41 00:02:37,640 --> 00:02:43,340 So the issue -- it's not my phone. 42 00:02:43,340 --> 00:02:45,340 If it's your mother, you could feel free to take it, 43 00:02:45,340 --> 00:02:48,150 otherwise -- 44 00:02:48,150 --> 00:02:50,600 OK, no problem. 45 00:02:50,600 --> 00:02:54,090 Inevitable. 46 00:02:54,090 --> 00:02:57,640 In a dynamic situation, time plays a role, 47 00:02:57,640 --> 00:03:00,900 and you look at how things evolve over time. 48 00:03:00,900 --> 00:03:07,950 In a static simulation, there is no issue with time. 49 00:03:07,950 --> 00:03:11,700 We'll be looking at both, but most of the time we'll be 50 00:03:11,700 --> 00:03:15,990 focusing on dynamic ones. 51 00:03:15,990 --> 00:03:20,060 So an example of this kind of thing would be a queuing 52 00:03:20,060 --> 00:03:24,930 network model.
53 00:03:24,930 --> 00:03:27,630 This is one of the most popular and important kinds of 54 00:03:27,630 --> 00:03:31,250 dynamic simulations, 55 00:03:31,250 --> 00:03:36,680 where you try and look at how queues, a fancy word for 56 00:03:36,680 --> 00:03:40,580 lines, evolve over time. 57 00:03:40,580 --> 00:03:44,320 So, for example, people who are trying to decide how many 58 00:03:44,320 --> 00:03:47,970 lanes should be on a highway, or how far apart the exits 59 00:03:47,970 --> 00:03:52,390 should be, or what the ratio of Fast Lane tolls to 60 00:03:52,390 --> 00:03:54,650 manually staffed tolls should be, 61 00:03:54,650 --> 00:04:00,240 all use queuing networks to try and answer those questions. 62 00:04:00,240 --> 00:04:02,230 And we'll look at some examples of these later 63 00:04:02,230 --> 00:04:04,450 because they are very important, 64 00:04:04,450 --> 00:04:06,080 particularly for things related to 65 00:04:06,080 --> 00:04:10,440 scheduling and planning. 66 00:04:10,440 --> 00:04:26,790 A third dichotomy is discrete vs continuous. 67 00:04:26,790 --> 00:04:32,590 Imagine, for example, trying to analyze the flow of traffic 68 00:04:32,590 --> 00:04:35,590 along the highway. 69 00:04:35,590 --> 00:04:40,020 One way to do it is to try and have a simulation which 70 00:04:40,020 --> 00:04:43,950 models each vehicle. 71 00:04:43,950 --> 00:04:46,070 That would be a discrete simulation, because you've got 72 00:04:46,070 --> 00:04:47,800 different parts. 73 00:04:47,800 --> 00:04:52,490 Alternatively, you might decide to treat traffic as a 74 00:04:52,490 --> 00:04:56,970 flow, kind of like water flowing through things, where 75 00:04:56,970 --> 00:04:59,410 changes in the flow can be described by 76 00:04:59,410 --> 00:05:01,910 differential equations. 77 00:05:01,910 --> 00:05:08,020 That would lead to a continuous model.
78 00:05:08,020 --> 00:05:13,560 Another example is, a lot of effort has gone into analyzing 79 00:05:13,560 --> 00:05:18,090 the way blood flows through the human body. 80 00:05:18,090 --> 00:05:22,250 You can try and model it discretely, where you take 81 00:05:22,250 --> 00:05:27,060 each red blood cell, each white blood cell, and look at 82 00:05:27,060 --> 00:05:30,770 how they move, or simulate how they move. 83 00:05:30,770 --> 00:05:33,280 Or you could treat it continuously and say, well, 84 00:05:33,280 --> 00:05:36,990 we're just going to treat blood as a fluid, not made up 85 00:05:36,990 --> 00:05:41,190 of discrete components, and write some equations to model 86 00:05:41,190 --> 00:05:45,630 how that fluid goes through and then simulate that. 87 00:05:45,630 --> 00:05:51,300 In this course, we're going to be focusing mostly on discrete 88 00:05:51,300 --> 00:05:56,610 simulations. 89 00:05:56,610 --> 00:05:59,230 Now if we think about the random walk we looked at, 90 00:05:59,230 --> 00:06:10,350 indeed it was stochastic, dynamic, and discrete. 91 00:06:10,350 --> 00:06:15,770 The random walk was an example of what's called a Monte Carlo 92 00:06:15,770 --> 00:06:21,520 simulation. 93 00:06:21,520 --> 00:06:28,240 This term was coined there. 94 00:06:28,240 --> 00:06:32,450 Anyone know what that is? 95 00:06:32,450 --> 00:06:34,170 Anyone want to guess? 96 00:06:34,170 --> 00:06:39,530 It's the casino, in Monaco, in Monte Carlo. 97 00:06:39,530 --> 00:06:43,060 It was at one time, before there was even a Las Vegas, 98 00:06:43,060 --> 00:06:47,090 the most famous casino in the world certainly. 99 00:06:47,090 --> 00:06:50,060 Still one of the more opulent ones as you can see. 100 00:06:50,060 --> 00:06:53,560 And unlike Las Vegas, it's real opulence, as opposed to 101 00:06:53,560 --> 00:06:57,090 faux opulence. 
102 00:06:57,090 --> 00:07:01,680 And this term, Monte Carlo simulation, was coined by Ulam 103 00:07:01,680 --> 00:07:07,010 and Metropolis, two mathematicians, back in 1949, 104 00:07:07,010 --> 00:07:10,050 in reference to the fact that at Monte Carlo, people bet on 105 00:07:10,050 --> 00:07:15,480 roulette wheels and cards on a table, games of chance, 106 00:07:15,480 --> 00:07:19,500 where there is randomness and things are discrete, in 107 00:07:19,500 --> 00:07:20,120 some sense. 108 00:07:20,120 --> 00:07:23,400 And they decided, well, this is just like gambling, and so 109 00:07:23,400 --> 00:07:28,370 they called them Monte Carlo simulations. 110 00:07:28,370 --> 00:07:33,820 What is it that makes this approach work? 111 00:07:33,820 --> 00:07:40,430 And, in some sense, I won't go into a lot of the math, but I 112 00:07:40,430 --> 00:07:44,470 would like to get some concepts across. 113 00:07:44,470 --> 00:07:47,130 This is an application of what are called inferential 114 00:07:47,130 --> 00:07:50,960 statistics. 115 00:07:50,960 --> 00:07:55,500 You have some sample size, some number of points, and 116 00:07:55,500 --> 00:08:00,860 from that you try to infer something more general. 117 00:08:00,860 --> 00:08:06,230 We always depend upon one property when we do this. 118 00:08:06,230 --> 00:08:18,990 And that property is that a randomly chosen sample tends 119 00:08:18,990 --> 00:08:44,360 to exhibit the same properties as the population from which 120 00:08:44,360 --> 00:08:55,600 it is drawn. 121 00:08:55,600 --> 00:08:59,360 So you take a population of anything, red balls and black 122 00:08:59,360 --> 00:09:06,020 balls, or students, or steps, and at random draw some 123 00:09:06,020 --> 00:09:11,940 sample, and you assume that that sample has properties 124 00:09:11,940 --> 00:09:17,480 similar to the entire population.
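The key assumption, that a random sample tends to share the properties of its population, can be checked directly in a small sketch. The population below (3,000 red balls and 7,000 black balls) is a hypothetical example chosen for illustration, and the helper name `sample_fraction` is likewise an assumption, not from the handout.

```python
import random

def sample_fraction(population, predicate, sample_size, rng):
    """Fraction of a random sample (drawn without replacement) that satisfies predicate."""
    sample = rng.sample(population, sample_size)
    return sum(1 for item in sample if predicate(item)) / sample_size

# Hypothetical population: 3,000 red balls and 7,000 black balls (30% red).
population = ['red'] * 3000 + ['black'] * 7000
rng = random.Random(0)

true_fraction = population.count('red') / len(population)
estimate = sample_fraction(population, lambda ball: ball == 'red', 1000, rng)

print('population fraction red:', true_fraction)
print('sample fraction red:', round(estimate, 3))
```

A sample of 1,000 drawn at random lands close to the population's 30 percent; a sample chosen non-randomly (say, the first 1,000 entries of the list) would not, which is exactly why the "randomly chosen" part of the assumption matters.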
125 00:09:17,480 --> 00:09:23,750 So if I were to go around this room and choose some random 126 00:09:23,750 --> 00:09:32,220 sample of you guys and write down your hair color, we would 127 00:09:32,220 --> 00:09:35,730 be assuming that the fraction of you with blonde hair in 128 00:09:35,730 --> 00:09:39,330 that sample would be the same as the fraction of you with 129 00:09:39,330 --> 00:09:42,480 blonde hair in the whole class. 130 00:09:42,480 --> 00:09:44,510 That's kind of what this means. 131 00:09:44,510 --> 00:09:50,550 And the same would be true of black hair, auburn hair, etc. 132 00:09:50,550 --> 00:09:57,360 So consider, for example, flipping a coin. 133 00:09:57,360 --> 00:10:03,020 If I were to flip it some number of times, say 100 134 00:10:03,020 --> 00:10:09,780 times, you might be able, from the proportion of heads 135 00:10:09,780 --> 00:10:13,380 and tails, to infer whether or not 136 00:10:13,380 --> 00:10:15,230 the coin was fair, 137 00:10:15,230 --> 00:10:17,700 that is to say, half the time it would be heads and half the 138 00:10:17,700 --> 00:10:21,460 time it would be tails, or whether it was unfair, that 139 00:10:21,460 --> 00:10:24,450 it was somehow weighted, so that heads would come up more 140 00:10:24,450 --> 00:10:26,160 than tails. 141 00:10:26,160 --> 00:10:28,800 And you might say if we did this 100 times and looked at 142 00:10:28,800 --> 00:10:33,660 the results, then we could make a decision about what 143 00:10:33,660 --> 00:10:37,130 would happen in general when we looked at the coin. 144 00:10:37,130 --> 00:10:45,830 So let's look at an example of doing that. 145 00:10:45,830 --> 00:10:48,880 So I wrote a little program, it's on your 146 00:10:48,880 --> 00:10:54,540 handout, to flip a coin. 147 00:10:54,540 --> 00:10:56,210 So this looks like the simulations 148 00:10:56,210 --> 00:10:58,370 we looked at before.
149 00:10:58,370 --> 00:11:02,400 I've got flipTrials, which says that the number of heads 150 00:11:02,400 --> 00:11:07,320 and tails is 0, then for i in xrange. 151 00:11:07,320 --> 00:11:09,690 What is xrange? 152 00:11:09,690 --> 00:11:13,410 So normally you would have written for i in range, zero 153 00:11:13,410 --> 00:11:16,580 to numFlips. 154 00:11:16,580 --> 00:11:21,570 What range does is create a list, in this case from 0 to 155 00:11:21,570 --> 00:11:27,420 99, and go through the list one at a time. 156 00:11:27,420 --> 00:11:34,150 That's fine, but suppose numFlips were a billion. 157 00:11:34,150 --> 00:11:36,280 Well, range would create a list with a 158 00:11:36,280 --> 00:11:39,090 billion numbers in it, 159 00:11:39,090 --> 00:11:42,190 which would take a lot of space in the computer. 160 00:11:42,190 --> 00:11:44,980 And it's kind of wasted. 161 00:11:44,980 --> 00:11:50,630 What xrange says is, don't bother creating the list; just 162 00:11:50,630 --> 00:11:54,440 go through, in this case, the numbers one at a time. 163 00:11:54,440 --> 00:11:58,890 So it's much more efficient than range. 164 00:11:58,890 --> 00:12:03,260 It will behave the same way as far as the answers you get, 165 00:12:03,260 --> 00:12:06,480 but it doesn't use as much space. 166 00:12:06,480 --> 00:12:09,270 And since some of the simulations we'll be doing 167 00:12:09,270 --> 00:12:13,160 will have lots of trials, or lots of flips, it's worth the 168 00:12:13,160 --> 00:12:15,730 trouble to use xrange instead of range. 169 00:12:15,730 --> 00:12:19,180 Yeah? 170 00:12:19,180 --> 00:12:23,870 Pardon? 171 00:12:23,870 --> 00:12:25,588 STUDENT: Like, why would we ever use range 172 00:12:25,588 --> 00:12:26,210 instead of xrange? 173 00:12:26,210 --> 00:12:28,530 PROFESSOR: No good reason, when dealing with numbers, 174 00:12:28,530 --> 00:12:31,650 unless you wanted to do something different with the 175 00:12:31,650 --> 00:12:39,490 list.
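The lecture's code is Python 2, where `xrange` avoids building the list. In Python 3, `xrange` no longer exists and `range` itself is lazy, so the sketch below uses Python 3 to show the space difference being described. Exact byte counts depend on the interpreter and platform, so they are not shown as fixed outputs.

```python
import sys

n = 10**9  # a billion, as in the lecture's example

# Python 3's range is lazy (it plays the role of Python 2's xrange):
# it stores only start, stop and step, so its size does not depend on n.
lazy = range(n)
print(sys.getsizeof(lazy))   # a few dozen bytes

# A materialized list pays for every element; even a million
# entries takes several megabytes on a typical 64-bit CPython.
eager = list(range(10**6))
print(sys.getsizeof(eager))

# The lazy object still answers the same questions a list would.
print(len(lazy), lazy[-1])   # 1000000000 999999999
```

The only practical reason to materialize the list is when you need list behavior, such as modifying elements, which matches the answer given in the lecture.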
But there's no good reason. 176 00:12:39,490 --> 00:12:40,900 The right answer for the purposes of 177 00:12:40,900 --> 00:12:43,040 today is, no good reason. 178 00:12:43,040 --> 00:12:45,480 I typically use xrange all the time if I'm 179 00:12:45,480 --> 00:12:47,980 thinking about it. 180 00:12:47,980 --> 00:12:51,700 It was just something that didn't seem worth introducing 181 00:12:51,700 --> 00:12:53,220 earlier in the semester. 182 00:12:53,220 --> 00:12:55,210 But good question. 183 00:12:55,210 --> 00:13:03,010 Certainly deserving of a piece of candy. 184 00:13:03,010 --> 00:13:06,610 All right, so for i in xrange, coin is equal to some 185 00:13:06,610 --> 00:13:08,950 random integer, 0 or 1. 186 00:13:08,950 --> 00:13:13,910 If coin is equal to 0, then heads, else tails. 187 00:13:13,910 --> 00:13:16,010 Well, that's pretty easy. 188 00:13:16,010 --> 00:13:19,620 And then all I'm going to do here is go through and flip it 189 00:13:19,620 --> 00:13:23,720 a bunch of times, and we'll get some 190 00:13:23,720 --> 00:13:27,920 answer, and do some plots. 191 00:13:27,920 --> 00:13:31,900 So let's look at an example. 192 00:13:31,900 --> 00:13:55,480 We'll try -- we'll flip 100 coins, we'll do 100 trials and 193 00:13:55,480 --> 00:13:59,660 see what we get. 194 00:13:59,660 --> 00:14:01,360 Error in multi-line statement. 195 00:14:01,360 --> 00:14:05,020 All right, what have I done wrong here? 196 00:14:05,020 --> 00:14:08,410 Obviously I did something by accident, edited something I 197 00:14:08,410 --> 00:14:12,920 did not intend to edit. 198 00:14:12,920 --> 00:14:15,190 Anyone spot what I did wrong? 199 00:14:15,190 --> 00:14:18,410 Pardon? 200 00:14:18,410 --> 00:14:21,150 The parentheses. 201 00:14:21,150 --> 00:14:22,520 I typed where I didn't intend. 202 00:14:22,520 --> 00:14:24,960 Which line? 203 00:14:24,960 --> 00:14:29,930 Down at the bottom? 204 00:14:29,930 --> 00:14:35,940 Obviously my, here, yes, I deleted that.
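The coin-flipping loop described above can be sketched as follows. This is a reconstruction from the spoken description, not the handout's exact code: the names `flipTrials` and `numFlips` follow the lecture, while `runTrials` and its return format are assumptions, and the plotting step is omitted. It is written for Python 3, so `range` stands in for `xrange`.

```python
import random

def flipTrials(numFlips):
    """Flip a fair coin numFlips times; return (heads, tails)."""
    heads, tails = 0, 0
    for _ in range(numFlips):       # Python 3 range is lazy, like Python 2 xrange
        coin = random.randint(0, 1)  # randint includes both endpoints: 0 or 1
        if coin == 0:
            heads += 1
        else:
            tails += 1
    return heads, tails

def runTrials(numFlips, numTrials):
    """Return absolute and percentage |heads - tails| for each trial (hypothetical helper)."""
    diffs, pctDiffs = [], []
    for _ in range(numTrials):
        heads, tails = flipTrials(numFlips)
        diff = abs(heads - tails)
        diffs.append(diff)
        pctDiffs.append(100.0 * diff / numFlips)
    return diffs, pctDiffs

random.seed(0)  # fixed seed only so the run can be repeated
diffs, pctDiffs = runTrials(100, 100)
print('mean |heads - tails|:', sum(diffs) / len(diffs))
print('mean % difference:', sum(pctDiffs) / len(pctDiffs))
```

The two lists correspond to the two views of the data discussed next: the raw difference (histogram) and the difference normalized by the number of flips (percentage plot).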
205 00:14:35,940 --> 00:14:46,930 Thank you. 206 00:14:46,930 --> 00:14:52,250 All right, so we have a couple of figures here. 207 00:14:52,250 --> 00:14:59,230 Figure one, I'm showing a histogram: 208 00:14:59,230 --> 00:15:05,680 the number of trials on the y-axis, and the difference 209 00:15:05,680 --> 00:15:08,130 between heads and tails, do I have more of one than the 210 00:15:08,130 --> 00:15:10,920 other, on the x-axis. 211 00:15:10,920 --> 00:15:14,310 And so what we can see is, out of my 100 trials, 212 00:15:14,310 --> 00:15:20,140 somewhere around 22 of them came out the same, 213 00:15:20,140 --> 00:15:21,410 or close to the same. 214 00:15:21,410 --> 00:15:25,900 But way over here we've got some funny ones. 215 00:15:25,900 --> 00:15:28,810 Out of 100, there was a difference of 25. 216 00:15:28,810 --> 00:15:31,070 Pretty big difference. 217 00:15:31,070 --> 00:15:34,890 Another way to look at the same data, and I'm doing this 218 00:15:34,890 --> 00:15:37,180 just to show that there are different ways of looking at 219 00:15:37,180 --> 00:15:46,650 data, is here: what I've plotted is, for each trial, the 220 00:15:46,650 --> 00:15:49,900 percent difference. 221 00:15:49,900 --> 00:15:51,400 So out of 100 flips. 222 00:15:51,400 --> 00:15:54,830 And this is normalizing it, because if I flip a million 223 00:15:54,830 --> 00:15:59,190 coins, I might expect the difference to be pretty big in 224 00:15:59,190 --> 00:16:01,420 absolute terms, but maybe very small as a 225 00:16:01,420 --> 00:16:04,790 percentage of a million. 226 00:16:04,790 --> 00:16:08,900 And so here, we can again see that with these stochastic kinds 227 00:16:08,900 --> 00:16:12,020 of things, there's a pretty big difference, right? 228 00:16:12,020 --> 00:16:17,680 We've got one where it was over 25 percent, and several 229 00:16:17,680 --> 00:16:20,290 where it's zero.
230 00:16:20,290 --> 00:16:24,030 So the point here, as we can see from this graph, is that if I'd 231 00:16:24,030 --> 00:16:28,260 done only one trial and just assumed that was the answer as 232 00:16:28,260 --> 00:16:31,040 to whether my coin was weighted or not, I could 233 00:16:31,040 --> 00:16:35,100 really have fooled myself. 234 00:16:35,100 --> 00:16:40,600 So the main point is that you need to be careful when 235 00:16:40,600 --> 00:16:44,520 you're doing these kinds of things. 236 00:16:44,520 --> 00:16:48,310 And this green line here is the mean. 237 00:16:48,310 --> 00:16:54,740 So it says, on average, the difference was seven percent. 238 00:16:54,740 --> 00:17:14,750 Well, suppose, maybe, instead of flipping 100, I were to 239 00:17:14,750 --> 00:17:42,480 flip 1,000. 240 00:17:42,480 --> 00:17:44,990 Well, it doesn't seem to want to notice it. 241 00:17:44,990 --> 00:17:48,560 One more try and then I'll just restart it, which is 242 00:17:48,560 --> 00:18:07,250 always the safest thing, as we've discussed before. 243 00:18:07,250 --> 00:18:14,760 Well, we won't panic. 244 00:18:14,760 --> 00:18:20,830 Sometimes this helps. 245 00:18:20,830 --> 00:18:26,820 If not, here we go. 246 00:18:26,820 --> 00:18:29,890 So let's say we wanted to flip 1,000 coins. 247 00:18:29,890 --> 00:18:33,880 So now what do we think? 248 00:18:33,880 --> 00:18:36,920 Is the difference going to be bigger or smaller than when we 249 00:18:36,920 --> 00:18:37,620 flipped 100? 250 00:18:37,620 --> 00:18:42,750 Is the average difference between heads and tails bigger 251 00:18:42,750 --> 00:18:48,950 or smaller with 1,000 flips than with 100 flips? 252 00:18:48,950 --> 00:18:51,930 Well, the percentage will be smaller, but in absolute 253 00:18:51,930 --> 00:18:55,970 terms, it's probably going to be bigger, right? 254 00:18:55,970 --> 00:19:01,750 Because I've got more chances to stray. 255 00:19:01,750 --> 00:19:17,890 But we'll find out.
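The intuition that the absolute gap grows while the percentage shrinks can be made quantitative with a standard result for a fair coin (not derived in the lecture). Let each of the $n$ flips contribute $X_i = +1$ for heads and $X_i = -1$ for tails. The gap $D = H - T$ is then a sum of independent terms with mean 0 and variance 1, and for large $n$ the normal approximation gives the expected absolute gap:

$$
D = H - T = \sum_{i=1}^{n} X_i, \qquad \operatorname{Var}(D) = n, \qquad \mathbb{E}\,|D| \approx \sqrt{n}\,\sqrt{\tfrac{2}{\pi}} = \sqrt{\tfrac{2n}{\pi}}
$$

$$
\frac{100\,\mathbb{E}\,|D|}{n} \approx 100\sqrt{\tfrac{2}{\pi n}}
$$

For $n = 100$ this is about 8 flips, or 8 percent, consistent with the roughly seven percent mean seen in the 100-flip run; for $n = 1{,}000$ it is about 25 flips, or about 2.5 percent. The absolute gap grows like $\sqrt{n}$ while the percentage shrinks like $1/\sqrt{n}$.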
256 00:19:17,890 --> 00:19:22,930 So here we see that the mean difference is somewhere in the 257 00:19:22,930 --> 00:19:25,670 twenties, which is much higher than the mean 258 00:19:25,670 --> 00:19:28,370 difference for 100 flips. 259 00:19:28,370 --> 00:19:31,920 On the other hand, if we look at the percentage, we see it's 260 00:19:31,920 --> 00:19:32,560 much lower. 261 00:19:32,560 --> 00:19:34,770 Instead of seven percent, it's around two and a half percent 262 00:19:34,770 --> 00:19:37,710 on average. 263 00:19:37,710 --> 00:19:40,160 There's something else interesting to observe in 264 00:19:40,160 --> 00:19:44,860 figure two, relative to what we saw with 100 flips. 265 00:19:44,860 --> 00:19:46,500 What else is pretty interesting about the 266 00:19:46,500 --> 00:19:49,620 difference between these two figures, if you can remember 267 00:19:49,620 --> 00:19:52,460 the other one? 268 00:19:52,460 --> 00:19:52,820 Yeah? 269 00:19:52,820 --> 00:19:54,500 STUDENT: There are no zeros? 270 00:19:54,500 --> 00:19:57,850 PROFESSOR: There are no zeros. 271 00:19:57,850 --> 00:20:04,660 That's right, as it happens there were no zeros. 272 00:20:04,660 --> 00:20:07,290 Not so surprising that it didn't ever come out 273 00:20:07,290 --> 00:20:10,560 exactly 500, 500. 274 00:20:10,560 --> 00:20:14,420 What else? 275 00:20:14,420 --> 00:20:16,810 What was the biggest difference, percentage-wise, 276 00:20:16,810 --> 00:20:19,030 we saw last time? 277 00:20:19,030 --> 00:20:20,930 Over 25. 278 00:20:20,930 --> 00:20:24,940 So notice how much narrower the range is here. 279 00:20:24,940 --> 00:20:29,480 Instead of ranging from 2 to 25 or something like that, it 280 00:20:29,480 --> 00:20:35,620 ranges from 0 to 7, or maybe 7 and a little. 281 00:20:35,620 --> 00:20:41,400 So, by flipping more coins, the experiment becomes more 282 00:20:41,400 --> 00:20:43,800 reproducible.
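The narrowing of the range can be reproduced without plots by comparing summary statistics of the per-trial percentage differences for 100 and 1,000 flips. This is a sketch: `pct_diffs` is a hypothetical helper, and the seed and trial counts are arbitrary choices.

```python
import random
import statistics

def pct_diffs(num_flips, num_trials, rng):
    """Percentage |heads - tails| for each of num_trials experiments."""
    results = []
    for _ in range(num_trials):
        heads = sum(rng.randint(0, 1) for _ in range(num_flips))
        tails = num_flips - heads
        results.append(100.0 * abs(heads - tails) / num_flips)
    return results

rng = random.Random(1)
small = pct_diffs(100, 100, rng)    # 100 trials of 100 flips
large = pct_diffs(1000, 100, rng)   # 100 trials of 1,000 flips

for label, data in (('100 flips', small), ('1,000 flips', large)):
    print('%-12s mean %.2f  stdev %.2f  max %.2f'
          % (label, statistics.mean(data), statistics.stdev(data), max(data)))
```

A smaller standard deviation across trials is one concrete way to read "more reproducible": each experiment lands closer to the others.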
283 00:20:43,800 --> 00:20:49,450 It looks the same because of the scaling, but in 284 00:20:49,450 --> 00:20:52,630 fact the range is much narrower. 285 00:20:52,630 --> 00:20:57,200 Each experiment tends to give an answer closer to all the 286 00:20:57,200 --> 00:21:01,280 other experiments. 287 00:21:01,280 --> 00:21:03,230 That's a good thing. 288 00:21:03,230 --> 00:21:08,300 It should give you confidence that the answers are getting 289 00:21:08,300 --> 00:21:10,900 pretty close to right, 290 00:21:10,900 --> 00:21:13,810 that they're not bouncing all over the place. 291 00:21:13,810 --> 00:21:17,360 And if I were to flip a million coins, we would find 292 00:21:17,360 --> 00:21:22,010 the range would get very tight. 293 00:21:22,010 --> 00:21:24,810 So notice that even though there's similar information in 294 00:21:24,810 --> 00:21:30,380 the histogram and the plot, different things leap out at 295 00:21:30,380 --> 00:21:36,790 you as you look at them. 296 00:21:36,790 --> 00:21:41,240 All right, we could ask a lot of other interesting questions 297 00:21:41,240 --> 00:21:46,460 about coins here. 298 00:21:46,460 --> 00:21:49,080 But we'll come back to this in a minute and look at some 299 00:21:49,080 --> 00:21:52,030 other questions. 300 00:21:52,030 --> 00:21:56,390 I want to talk again a little bit more generally. 301 00:21:56,390 --> 00:21:59,370 It's kind of easy to think about running a simulation to 302 00:21:59,370 --> 00:22:01,770 predict the future. 303 00:22:01,770 --> 00:22:06,730 So in some sense, we look at this, and this predicts what 304 00:22:06,730 --> 00:22:10,680 might happen if I flipped 1,000 coins: 305 00:22:10,680 --> 00:22:16,620 that the most likely event would be that I'd have 306 00:22:16,620 --> 00:22:24,910 something under 10 in the difference between heads and 307 00:22:24,910 --> 00:22:28,680 tails, but that it's not terribly unlikely that I might 308 00:22:28,680 --> 00:22:34,730 have close to 70 as a difference.
309 00:22:34,730 --> 00:22:38,090 And if I ran more than 100 trials I'd see more, but this 310 00:22:38,090 --> 00:22:42,660 helps me predict what might happen. 311 00:22:42,660 --> 00:22:47,150 Now, we don't always use simulations to predict what 312 00:22:47,150 --> 00:22:47,960 might happen. 313 00:22:47,960 --> 00:22:52,690 We sometimes actually use simulations to understand the 314 00:22:52,690 --> 00:22:54,540 current state of the world. 315 00:22:54,540 --> 00:22:59,460 So, for example, if I told you that we are going to flip 316 00:22:59,460 --> 00:23:06,930 three coins, and I wanted you to predict the probability 317 00:23:06,930 --> 00:23:09,960 that either all three would be heads, or all 318 00:23:09,960 --> 00:23:12,050 three would be tails. 319 00:23:12,050 --> 00:23:14,410 Well, if you'd studied any probability, you would know 320 00:23:14,410 --> 00:23:15,990 how to do that. 321 00:23:15,990 --> 00:23:18,770 If you hadn't studied probability, you would say, 322 00:23:18,770 --> 00:23:22,210 well, that's OK, we have a simulation right here. 323 00:23:22,210 --> 00:23:27,170 Let's just do it. 324 00:23:27,170 --> 00:23:45,360 Here we go again. 325 00:23:45,360 --> 00:23:46,670 And so let's try it. 326 00:23:46,670 --> 00:24:06,160 Let's flip three coins, and let's do it 4,000 times here. 327 00:24:06,160 --> 00:24:07,630 Well, that's kind of hard to read. 328 00:24:07,630 --> 00:24:09,720 It's pretty dense. 329 00:24:09,720 --> 00:24:19,260 But we can see that the mean here is 50. 330 00:24:19,260 --> 00:24:24,630 And this is a little easier to read. 331 00:24:24,630 --> 00:24:31,580 This tells us how many times the difference will, 332 00:24:31,580 --> 00:24:38,200 right, be zero: 3,000 out of 4,000. 333 00:24:38,200 --> 00:24:39,920 Is that right? 334 00:24:39,920 --> 00:24:41,020 What do you think? 335 00:24:41,020 --> 00:24:42,050 Do you believe this? 336 00:24:42,050 --> 00:24:47,140 Have I done the right thing?
three coins, 4,000 flips, how 337 00:24:47,140 --> 00:24:49,280 often should they all be heads, or how often should 338 00:24:49,280 --> 00:24:54,000 they all be tails? 339 00:24:54,000 --> 00:24:54,840 What does this tell us? 340 00:24:54,840 --> 00:25:01,150 It tells us one-fourth of the time they'll all be -- the 341 00:25:01,150 --> 00:25:03,920 difference between, wait a minute, how can the difference 342 00:25:03,920 --> 00:25:09,610 between -- something's wrong with my code, right? 343 00:25:09,610 --> 00:25:11,800 Because I only have two possible values. 344 00:25:11,800 --> 00:25:17,110 I hadn't expected this. 345 00:25:17,110 --> 00:25:20,690 I obviously messed something up. 346 00:25:20,690 --> 00:25:21,000 Pardon? 347 00:25:21,000 --> 00:25:22,770 STUDENT: It's right. 348 00:25:22,770 --> 00:25:24,050 PROFESSOR: It's right, because? 349 00:25:24,050 --> 00:25:28,992 STUDENT: Because you had an odd number of flips, and when 350 00:25:28,992 --> 00:25:29,360 you split them -- 351 00:25:29,360 --> 00:25:29,720 PROFESSOR: Pardon? 352 00:25:29,720 --> 00:25:32,070 STUDENT: When you split an odd number -- 353 00:25:32,070 --> 00:25:35,070 PROFESSOR: Exactly. 354 00:25:35,070 --> 00:25:36,380 So it is correct. 355 00:25:36,380 --> 00:25:37,860 And it gives us what we want. 356 00:25:37,860 --> 00:25:41,150 But now, let's think about a different situation. 357 00:25:41,150 --> 00:25:42,420 Anybody got a coin here? 358 00:25:42,420 --> 00:25:46,670 Anyone give me three coins? 359 00:25:46,670 --> 00:25:55,100 I can trust somebody, I hope. 360 00:25:55,100 --> 00:25:57,160 What a cheap -- anybody got silver dollars, would be 361 00:25:57,160 --> 00:25:59,420 preferable? 362 00:25:59,420 --> 00:26:01,470 All right, look at this. 363 00:26:01,470 --> 00:26:06,640 She's very carefully given me three pennies. 364 00:26:06,640 --> 00:26:09,710 She had big, big money in that purse, too, but she didn't 365 00:26:09,710 --> 00:26:11,120 want me to have it.
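The three-coin experiment is small enough to check against the exact answer. Two of the eight equally likely outcomes (HHH and TTT) are "all the same", so the probability is 2/8 = 1/4; and with an odd number of coins the difference between heads and tails can only be 1 or 3, never 0. That also accounts for the plotted mean of about 50 percent: 0.75 × 33⅓% + 0.25 × 100% = 50%. The sketch below uses a hypothetical helper name and an arbitrary seed.

```python
import random

def three_coin_trials(num_trials, rng):
    """Tally |heads - tails| outcomes when flipping three fair coins."""
    counts = {}
    for _ in range(num_trials):
        heads = sum(rng.randint(0, 1) for _ in range(3))
        diff = abs(heads - (3 - heads))
        counts[diff] = counts.get(diff, 0) + 1
    return counts

rng = random.Random(2)
counts = three_coin_trials(4000, rng)

# Only differences 1 and 3 can occur with an odd number of coins.
print(sorted(counts.items()))
# The exact probability of all heads or all tails is 2/8 = 1/4.
print('estimated P(all same):', counts.get(3, 0) / 4000)
```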
366 00:26:11,120 --> 00:26:15,140 All right, so I'm going to take these three pennies, 367 00:26:15,140 --> 00:26:19,440 jiggle them up, and now ask you, what's the probability 368 00:26:19,440 --> 00:26:24,290 that all three of them are heads? 369 00:26:24,290 --> 00:26:26,770 Anyone want to tell me? 370 00:26:26,770 --> 00:26:30,180 It's either 0 or 1, right? 371 00:26:30,180 --> 00:26:32,050 And I can actually look at you and tell you 372 00:26:32,050 --> 00:26:33,890 exactly which it is. 373 00:26:33,890 --> 00:26:38,420 And you can't see which it is. 374 00:26:38,420 --> 00:26:43,070 So, how should you think about what the probability is? 375 00:26:43,070 --> 00:26:48,960 Well, you might as well assume that it's whatever this graph 376 00:26:48,960 --> 00:26:52,810 tells you it is, 377 00:26:52,810 --> 00:26:55,730 because the fact that you don't have access to the 378 00:26:55,730 --> 00:26:59,570 information means that you really might as well treat the 379 00:26:59,570 --> 00:27:02,250 present as if it's the future, 380 00:27:02,250 --> 00:27:04,400 that it's unknown. 381 00:27:04,400 --> 00:27:08,950 And so, in fact, frequently, when there's data out there 382 00:27:08,950 --> 00:27:13,420 that we don't have access to, we use simulations and 383 00:27:13,420 --> 00:27:17,570 probabilities to estimate, make our best guess, about the 384 00:27:17,570 --> 00:27:20,690 current state of the world. 385 00:27:20,690 --> 00:27:24,670 And so, in fact, guessing the value of the current state is 386 00:27:24,670 --> 00:27:27,900 really no different from predicting the value of a 387 00:27:27,900 --> 00:27:34,230 future state when you don't have the information. 388 00:27:34,230 --> 00:27:40,530 In general, all right, now, just to show that your 389 00:27:40,530 --> 00:27:52,860 precautions were unnecessary. 390 00:27:52,860 --> 00:27:55,170 Where was I? 391 00:27:55,170 --> 00:27:57,660 Right.
392 00:27:57,660 --> 00:28:05,470 In general, when we're trying to predict the future, or in 393 00:28:05,470 --> 00:28:09,850 this case, guess the present, we have to use information we 394 00:28:09,850 --> 00:28:17,100 already have to make our prediction or our best guess. 395 00:28:17,100 --> 00:28:21,590 So to do that, we have to always ask the question, is 396 00:28:21,590 --> 00:28:28,860 past behavior a good prediction of future behavior? 397 00:28:28,860 --> 00:28:32,095 So if I flip a coin 1,000 times and count up the heads 398 00:28:32,095 --> 00:28:35,230 and tails, is that a good prediction of what will happen 399 00:28:35,230 --> 00:28:39,720 the next time? 400 00:28:39,720 --> 00:28:44,010 This is a step people often omit in doing these 401 00:28:44,010 --> 00:28:46,360 predictions. 402 00:28:46,360 --> 00:28:49,430 See the recent meltdown of the financial system, 403 00:28:49,430 --> 00:28:52,910 where people had lots of stochastic simulations 404 00:28:52,910 --> 00:28:55,040 predicting what the market would do, and they were all 405 00:28:55,040 --> 00:28:58,640 wrong, because they were all based upon assuming samples 406 00:28:58,640 --> 00:29:02,660 drawn from the past would predict the future. 407 00:29:02,660 --> 00:29:06,270 So, as we build these models, that's the question you always 408 00:29:06,270 --> 00:29:08,840 have to ask yourself: 409 00:29:08,840 --> 00:29:14,770 is, in some sense, this true? 410 00:29:14,770 --> 00:29:17,810 Because usually what we're doing is, we're choosing a 411 00:29:17,810 --> 00:29:21,130 random sample from the past and hoping it 412 00:29:21,130 --> 00:29:24,960 predicts the future. 413 00:29:24,960 --> 00:29:27,700 That is to say, is the population we have available 414 00:29:27,700 --> 00:29:31,620 the same as the one in the future?
415 00:29:31,620 --> 00:29:34,620 So it's easy to see how one might use these kinds of 416 00:29:34,620 --> 00:29:39,570 simulations to figure out things that are inherently 417 00:29:39,570 --> 00:29:41,150 stochastic. 418 00:29:41,150 --> 00:29:47,300 So for example, to predict a poker hand. 419 00:29:47,300 --> 00:29:49,830 What's the probability of my getting a full house when I 420 00:29:49,830 --> 00:29:53,060 draw this card from the deck? 421 00:29:53,060 --> 00:29:55,770 To predict the probability of coming up with a particular 422 00:29:55,770 --> 00:29:57,530 kind of poker hand. 423 00:29:57,530 --> 00:30:02,080 Is a full house more probable than a straight? 424 00:30:02,080 --> 00:30:03,300 Or not? 425 00:30:03,300 --> 00:30:06,600 Well, you can deal out lots of cards, and count them up, just 426 00:30:06,600 --> 00:30:12,360 as Ulam suggested for Solitaire. 427 00:30:12,360 --> 00:30:14,690 And that's often what we do. 428 00:30:14,690 --> 00:30:18,720 Interestingly enough though, we can use randomized 429 00:30:18,720 --> 00:30:25,430 techniques to get solutions to problems that are not 430 00:30:25,430 --> 00:30:29,000 inherently stochastic. 431 00:30:29,000 --> 00:30:31,000 And that's what I want to do now. 432 00:30:31,000 --> 00:30:37,770 So, consider for example, pi. 433 00:30:37,770 --> 00:30:41,900 Many of you have probably heard of this. 434 00:30:41,900 --> 00:30:45,140 For thousands of years, literally, people have known 435 00:30:45,140 --> 00:30:51,390 that there is a constant, pi, associated with circles such 436 00:30:51,390 --> 00:30:59,820 that pi times the radius squared equals the area. 437 00:30:59,820 --> 00:31:03,160 And they've known that pi times the diameter is equal to 438 00:31:03,160 --> 00:31:08,740 the circumference. 439 00:31:08,740 --> 00:31:13,120 So, back in the days of the Egyptian pharaohs, it was 440 00:31:13,120 --> 00:31:15,870 known that such a constant existed. 
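Going back to the poker question raised above (is a full house more probable than a straight?), the "deal lots of cards and count them up" approach can be sketched as follows. Several things are assumptions rather than details from the lecture: the card encoding, the helper names, the choice to count only plain straights (straight flushes are excluded), and the number of hands dealt. For reference, the exact five-card probabilities are 3,744/2,598,960 ≈ 0.00144 for a full house and 10,200/2,598,960 ≈ 0.00392 for a straight.

```python
import random
from collections import Counter

RANKS = range(2, 15)        # 11 = J, 12 = Q, 13 = K, 14 = A
SUITS = 'SHDC'
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]

def is_full_house(hand):
    """Three cards of one rank and two of another."""
    return sorted(Counter(rank for rank, _ in hand).values()) == [2, 3]

def is_straight(hand):
    """Five consecutive ranks (ace may play low), not all of one suit."""
    ranks = sorted(rank for rank, _ in hand)
    if len(set(ranks)) != 5:
        return False
    consecutive = ranks[-1] - ranks[0] == 4 or ranks == [2, 3, 4, 5, 14]
    flush = len({suit for _, suit in hand}) == 1
    return consecutive and not flush  # straight flushes are excluded

def estimate(num_hands, rng):
    """Deal num_hands random five-card hands; return (P(full house), P(straight))."""
    full_houses = straights = 0
    for _ in range(num_hands):
        hand = rng.sample(DECK, 5)
        full_houses += is_full_house(hand)
        straights += is_straight(hand)
    return full_houses / num_hands, straights / num_hands

rng = random.Random(3)
p_full_house, p_straight = estimate(100_000, rng)
print('P(full house) ~', p_full_house)
print('P(straight)   ~', p_straight)
```

Because both events are rare, the estimates are noisy even with 100,000 hands; this is the same "how many runs before you believe the answer" question raised at the start of the lecture.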
441 00:31:15,870 --> 00:31:17,910 In fact, it didn't acquire the name pi 442 00:31:17,910 --> 00:31:20,260 until the 18th century. 443 00:31:20,260 --> 00:31:22,500 And so they called it other things, but 444 00:31:22,500 --> 00:31:26,660 they knew it existed. 445 00:31:26,660 --> 00:31:30,130 And for thousands of years, people have speculated on what 446 00:31:30,130 --> 00:31:32,460 its value was. 447 00:31:32,460 --> 00:31:42,530 Sometime around 1650 BC, the Egyptians said that pi was 448 00:31:42,530 --> 00:31:46,850 3.16, in something called the Rhind Papyrus, 449 00:31:46,850 --> 00:31:52,230 something they found. 450 00:31:52,230 --> 00:31:56,620 Many years later, about 1,000 years later, the 451 00:31:56,620 --> 00:32:04,540 Bible said pi was three. 452 00:32:04,540 --> 00:32:11,450 And I quote; this is describing the specifications 453 00:32:11,450 --> 00:32:15,170 for the Great Temple of Solomon. "He made a molten sea 454 00:32:15,170 --> 00:32:19,720 of 10 cubits from brim to brim, round in compass, and 5 455 00:32:19,720 --> 00:32:23,800 cubits the height thereof, and a line of 30 cubits did 456 00:32:23,800 --> 00:32:29,780 compass it round about." So, all right, what we've got 457 00:32:29,780 --> 00:32:32,070 here is everything we need to plug into these 458 00:32:32,070 --> 00:32:37,070 equations and solve for pi, and it comes out three. 459 00:32:37,070 --> 00:32:42,300 And it does this in more than one place in the Bible. 460 00:32:42,300 --> 00:32:46,440 I will not comment on the theological implications of 461 00:32:46,440 --> 00:32:49,960 this assertion. 462 00:32:49,960 --> 00:32:52,030 Sarah Palin might. 463 00:32:52,030 --> 00:32:55,330 And Mike Huckabee certainly would. 464 00:32:55,330 --> 00:33:00,790 The first theoretical calculation of pi was carried 465 00:33:00,790 --> 00:33:05,820 out by Archimedes, a great Greek mathematician from 466 00:33:05,820 --> 00:33:11,150 Syracuse, somewhere around 250 BC.
467 00:33:11,150 --> 00:33:19,040 And he said that pi was somewhere between 223 divided 468 00:33:19,040 --> 00:33:31,190 by 71 and 22 divided by 7. 469 00:33:31,190 --> 00:33:34,550 This was amazingly profound. 470 00:33:34,550 --> 00:33:39,340 He knew he didn't know what the answer was, but he had a 471 00:33:39,340 --> 00:33:42,030 way to give an upper and a lower bound, and say it was 472 00:33:42,030 --> 00:33:45,190 somewhere between these two values. 473 00:33:45,190 --> 00:33:50,630 And in fact if you calculate it, the average of those two 474 00:33:50,630 --> 00:33:56,210 values is about 3.1418. 475 00:33:56,210 --> 00:33:58,830 Not bad for the time. 476 00:33:58,830 --> 00:34:00,880 This was not by measurement; he actually had a very 477 00:34:00,880 --> 00:34:04,750 interesting way of calculating it. 478 00:34:04,750 --> 00:34:09,260 All right, so this is where it stood for years and years, 479 00:34:09,260 --> 00:34:12,200 because of course people forgot the Rhind Papyrus, and 480 00:34:12,200 --> 00:34:15,250 they forgot Archimedes, and they believed the Bible, and 481 00:34:15,250 --> 00:34:19,040 so three was used for a long time. 482 00:34:19,040 --> 00:34:23,710 People sort of knew it wasn't right, but still. 483 00:34:23,710 --> 00:34:33,170 Then, quite interestingly, Buffon and Laplace, two great 484 00:34:33,170 --> 00:34:36,580 French mathematicians (actually, people had better 485 00:34:36,580 --> 00:34:39,690 estimates using Archimedes' methods long before they came 486 00:34:39,690 --> 00:34:45,780 along), proposed a way to do it using a simulation. 487 00:34:45,780 --> 00:34:52,580 Now, since Laplace lived between 1749 and 1827, it was 488 00:34:52,580 --> 00:34:56,240 not a computer simulation. 489 00:34:56,240 --> 00:34:59,010 So I'm going to show you, basically, the way that he 490 00:34:59,010 --> 00:35:08,150 proposed to do it.
The basic idea was: you take a square, 491 00:35:08,150 --> 00:35:12,410 assume that's a square, and you inscribe in it a quarter 492 00:35:12,410 --> 00:35:16,060 of a circle. 493 00:35:16,060 --> 00:35:23,870 So here, the side of the square is r, the radius of the circle. 494 00:35:23,870 --> 00:35:31,490 And then you get some person (he used needles, but I'm 495 00:35:31,490 --> 00:35:37,130 going to use darts) to throw darts at the shape. 496 00:35:37,130 --> 00:35:44,520 And some number of the darts will land in the circle part, 497 00:35:44,520 --> 00:35:48,960 and some number of the darts will land out here, in the 498 00:35:48,960 --> 00:35:55,610 part of the square that's outside the circle. 499 00:35:55,610 --> 00:36:02,260 And then we can look at the ratio of the darts in the 500 00:36:02,260 --> 00:36:10,560 shaded area divided by the total number of 501 00:36:10,560 --> 00:36:16,900 darts in the square. 502 00:36:16,900 --> 00:36:27,220 And that should be approximately equal to the shaded area divided by the area of 503 00:36:27,220 --> 00:36:31,070 the square. 504 00:36:31,070 --> 00:36:34,800 The notion being, if they're landing at random in these 505 00:36:34,800 --> 00:36:41,460 places, the proportion here and not here will depend upon 506 00:36:41,460 --> 00:36:44,540 the relative areas. 507 00:36:44,540 --> 00:36:45,870 And that certainly makes sense. 508 00:36:45,870 --> 00:36:49,780 If this were half the area of the square, then you'd expect 509 00:36:49,780 --> 00:36:55,130 half the darts to land in here. 510 00:36:55,130 --> 00:36:58,970 And then, as you can see in your handout, a little simple 511 00:36:58,970 --> 00:37:09,000 algebra on this, plus the fact that pi r squared equals the area of the circle, lets you 512 00:37:09,000 --> 00:37:16,180 solve for pi, and you get that pi is equal to 4 513 00:37:16,180 --> 00:37:20,210 times h, where h is the hit ratio, the fraction of darts 514 00:37:20,210 --> 00:37:22,810 falling in here.
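The "little simple algebra" can be written out as follows. This is a sketch under the setup described above: a quarter circle of radius r inscribed in an r-by-r square, with h the fraction of darts (among those landing in the square) that fall inside the quarter circle.

```latex
\begin{aligned}
h \approx \frac{\text{darts in quarter circle}}{\text{darts in square}}
  &\approx \frac{\text{area of quarter circle}}{\text{area of square}}
   = \frac{\pi r^2 / 4}{r^2} = \frac{\pi}{4} \\
\pi &\approx 4h
\end{aligned}
```

The radius cancels, which is why the simulation can use any square size; a unit square is the usual choice.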
515 00:37:22,810 --> 00:37:27,330 So do people sort of see why that should work intuitively? 516 00:37:27,330 --> 00:37:31,660 And that it's a very clever idea to use randomness to find 517 00:37:31,660 --> 00:37:34,090 a value that there's nothing random about. 518 00:37:34,090 --> 00:37:36,570 So we can now do the experiment. 519 00:37:36,570 --> 00:37:42,330 I need volunteers to throw darts. 520 00:37:42,330 --> 00:37:44,100 Come on. 521 00:37:44,100 --> 00:37:48,500 Come on up. 522 00:37:48,500 --> 00:37:50,360 I need more volunteers. 523 00:37:50,360 --> 00:37:51,360 I have a lot of darts. 524 00:37:51,360 --> 00:37:56,230 Anybody else? 525 00:37:56,230 --> 00:37:58,130 Anybody? 526 00:37:58,130 --> 00:38:00,270 All right, then since you're all in the front 527 00:38:00,270 --> 00:38:07,330 row, you get stuck. 528 00:38:07,330 --> 00:38:11,180 So now we'll try it. 529 00:38:11,180 --> 00:38:14,780 And you guys, we'll see how many of you hit in the circle, 530 00:38:14,780 --> 00:38:20,910 and how many of you hit there. 531 00:38:20,910 --> 00:38:24,160 Go ahead, on the count of 3, everybody throw. 532 00:38:24,160 --> 00:38:28,350 1, 2, 3. 533 00:38:28,350 --> 00:38:30,880 Ohh! 534 00:38:30,880 --> 00:38:34,220 He did that on purpose. 535 00:38:34,220 --> 00:38:36,450 You'll notice Professor Grimson isn't here today, and 536 00:38:36,450 --> 00:38:38,240 that's because I told him he was going to have to hold the 537 00:38:38,240 --> 00:38:41,240 dartboard. 538 00:38:41,240 --> 00:38:47,680 Well, here's what we see: we ignore the ones that missed 539 00:38:47,680 --> 00:38:49,890 altogether. 540 00:38:49,890 --> 00:38:52,590 And we'll see that, truly, I'm assuming 541 00:38:52,590 --> 00:38:54,180 these are random throws. 542 00:38:54,180 --> 00:38:55,570 We have one here and two here. 543 00:38:55,570 --> 00:39:01,430 Well, your eyes will tell you that's the wrong ratio.
544 00:39:01,430 --> 00:39:04,660 That suggests that having students throw darts is not 545 00:39:04,660 --> 00:39:07,390 the best way to solve this problem. 546 00:39:07,390 --> 00:39:09,770 And so you will see in your handout a computer 547 00:39:09,770 --> 00:39:14,250 simulation of it. 548 00:39:14,250 --> 00:39:20,300 So let's look at that. 549 00:39:20,300 --> 00:39:23,520 So this is find pi. 550 00:39:23,520 --> 00:39:26,380 So at the beginning of this code, which, by the way, is not on 551 00:39:26,380 --> 00:39:29,540 your handout, is some magic. 552 00:39:29,540 --> 00:39:32,530 I got tired of looking at big numbers without commas 553 00:39:32,530 --> 00:39:34,910 separating the thousands places. 554 00:39:34,910 --> 00:39:36,440 You've seen me in other lectures counting 555 00:39:36,440 --> 00:39:38,920 the number of zeros. 556 00:39:38,920 --> 00:39:43,630 What we have here just tells it, and I have to have 557 00:39:43,630 --> 00:39:47,050 two versions, one for the Mac and one for the PC, 558 00:39:47,050 --> 00:39:51,070 to set some variables that say how to write integers, and things in 559 00:39:51,070 --> 00:39:54,380 general, and I'm saying here, do it the way you would do it 560 00:39:54,380 --> 00:39:57,940 in the United States, in English. 561 00:39:57,940 --> 00:40:00,530 And UTF-8 is just an extended character encoding. 562 00:40:00,530 --> 00:40:02,350 Anyway, you don't need to learn anything about this 563 00:40:02,350 --> 00:40:05,930 magic, but it's just a handy little way to make the numbers 564 00:40:05,930 --> 00:40:09,290 easier to read. 565 00:40:09,290 --> 00:40:14,940 All right, so let's try and look at it. 566 00:40:14,940 --> 00:40:18,740 There's not much interesting to see here. 567 00:40:18,740 --> 00:40:21,680 I've done this little thing, format ints, that uses this 568 00:40:21,680 --> 00:40:25,910 magic to say grouping equals true, which means put a comma 569 00:40:25,910 --> 00:40:27,990 in the thousands places.
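The lecture's helper goes through the `locale` module. As a hedged, locale-independent alternative (not the handout's code), Python's format specification accepts a `,` option that inserts thousands separators directly. The name `format_int` is hypothetical.

```python
def format_int(n: int) -> str:
    """Return n with commas separating the thousands places."""
    # The ',' option in the format spec adds thousands separators
    # without touching the process-wide locale settings.
    return f"{n:,}"


print(format_int(10_000_000))  # 10,000,000
```

The locale-based route the lecture describes would call `locale.setlocale(locale.LC_ALL, ...)` and then `locale.format_string("%d", n, grouping=True)`. Locale names differ between operating systems, which fits the need for separate Mac and PC versions; the format-spec approach avoids that.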
570 00:40:27,990 --> 00:40:29,470 But again, you can ignore all that. 571 00:40:29,470 --> 00:40:37,860 The interesting part is that from pylab I import star, 572 00:40:37,860 --> 00:40:39,640 and import random and math. 573 00:40:39,640 --> 00:40:42,730 As some of you observed, the order of these imports matters. 574 00:40:42,730 --> 00:40:45,970 I think I sent out an email yesterday explaining what was 575 00:40:45,970 --> 00:40:47,510 going on here. 576 00:40:47,510 --> 00:40:49,880 This was one of the things that I knew, and probably 577 00:40:49,880 --> 00:40:51,120 should've mentioned. 578 00:40:51,120 --> 00:40:54,250 But since I knew it, I thought everyone knew. 579 00:40:54,250 --> 00:40:55,230 Silly me. 580 00:40:55,230 --> 00:40:58,760 It was of course a dumb thing to think. 581 00:40:58,760 --> 00:41:00,940 And then I'm going to throw a bunch of darts. 582 00:41:00,940 --> 00:41:03,140 The other thing you'll notice is that throw darts has a 583 00:41:03,140 --> 00:41:05,910 parameter called should plot. 584 00:41:05,910 --> 00:41:09,370 And that's because when I throw a billion darts, I 585 00:41:09,370 --> 00:41:11,530 really don't want to try and take the time to plot a 586 00:41:11,530 --> 00:41:13,420 billion points. 587 00:41:13,420 --> 00:41:18,030 So let's first look at a little example. 588 00:41:18,030 --> 00:41:32,090 We'll try throwing 10,000 darts. 589 00:41:32,090 --> 00:41:38,640 And it gives me an estimated value of pi of 3.16. 590 00:41:38,640 --> 00:41:44,370 And what we'll see here is that as the number of darts 591 00:41:44,370 --> 00:41:46,660 thrown grows, the estimate changes, right? 592 00:41:46,660 --> 00:41:52,780 When I threw one dart, the estimate of pi was 4. 593 00:41:52,780 --> 00:41:56,780 When I threw my second dart, it dropped all the way to 2. 594 00:41:56,780 --> 00:41:59,450 And then it bounced around a while, and then at the end, it 595 00:41:59,450 --> 00:42:03,420 starts to really stabilize around the true value.
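A minimal sketch of what a dart-throwing function along these lines might look like. It is not the handout's code: the names `throw_darts`, `should_plot` and `seed`, and the choice of a unit square with the quarter circle centred at the origin, are assumptions. It uses a private `random.Random` instance and imports matplotlib only when plotting is requested.

```python
import math
import random


def throw_darts(num_darts, should_plot=False, seed=None):
    """Estimate pi by throwing num_darts random darts at the unit square.

    The square is [0, 1) x [0, 1) and the quarter circle of radius 1 is
    centred at the origin, so a dart is a hit if it lies within 1 of (0, 0).
    """
    if num_darts < 1:
        raise ValueError("num_darts must be at least 1")
    rng = random.Random(seed)  # private generator; a fixed seed makes runs repeatable
    in_circle = 0
    estimates = []  # running estimate after each dart, kept only when plotting
    for dart in range(1, num_darts + 1):
        x, y = rng.random(), rng.random()  # uniform point in the square
        if math.hypot(x, y) <= 1.0:  # inside the quarter circle
            in_circle += 1
        if should_plot:
            estimates.append(4 * in_circle / dart)  # pi is roughly 4 * hit ratio
    if should_plot:
        # Imported lazily so estimating pi does not require matplotlib.
        import matplotlib.pyplot as plt
        # A logarithmic x-axis makes the early fluctuations visible.
        plt.semilogx(list(range(1, num_darts + 1)), estimates)
        plt.xlabel("Darts thrown")
        plt.ylabel("Estimate of pi")
        plt.show()
    return 4 * in_circle / num_darts


print(throw_darts(10_000, seed=0))
```

Leaving `should_plot` off also skips building the list of running estimates, which is what makes very large dart counts practical.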
596 00:42:03,420 --> 00:42:06,680 You'll notice, by the way, that what I've got here is a 597 00:42:06,680 --> 00:42:08,420 logarithmic x-axis. 598 00:42:08,420 --> 00:42:11,620 If you look at the code, you'll see I've told it to be 599 00:42:11,620 --> 00:42:13,710 semilogx. 600 00:42:13,710 --> 00:42:16,440 And that's because I wanted you to be able to see what was 601 00:42:16,440 --> 00:42:19,460 happening early on, where it was fluctuating. 602 00:42:19,460 --> 00:42:21,490 But out here it's kind of boring, because the 603 00:42:21,490 --> 00:42:23,970 fluctuations are so small. 604 00:42:23,970 --> 00:42:34,100 So that was a good way to do it. 605 00:42:34,100 --> 00:42:37,020 All right now. 606 00:42:37,020 --> 00:42:38,940 Do I think I have enough samples here? 607 00:42:38,940 --> 00:42:43,290 Well, I don't want you to cheat and look at the estimate 608 00:42:43,290 --> 00:42:45,070 and say no, you don't, because you know that's 609 00:42:45,070 --> 00:42:46,740 not the right answer. 610 00:42:46,740 --> 00:42:51,670 And it's not even as good as Archimedes did. 611 00:42:51,670 --> 00:42:55,360 But how could you sort of look at the data and get a sense 612 00:42:55,360 --> 00:42:58,580 that maybe this is not the right answer? 613 00:42:58,580 --> 00:43:01,720 Well, even at the end, if we look at it, it's still 614 00:43:01,720 --> 00:43:03,390 wiggling around a fair amount. 615 00:43:03,390 --> 00:43:12,050 We can zoom in. 616 00:43:12,050 --> 00:43:14,220 And it's bouncing up and down here. 617 00:43:14,220 --> 00:43:20,350 I'm in a region, but it sort of makes us think that maybe 618 00:43:20,350 --> 00:43:22,100 it hasn't stabilized, right? 619 00:43:22,100 --> 00:43:27,150 You'd like it to not be moving very much.
620 00:43:27,150 --> 00:43:31,080 Now, by the way, the other thing we could've looked at 621 00:43:31,080 --> 00:43:38,490 when we ran it... let's run it again; we'll probably get a 622 00:43:38,490 --> 00:43:43,700 different answer, by the way. 623 00:43:43,700 --> 00:43:47,160 Yeah, notice the different answer here. 624 00:43:47,160 --> 00:43:49,020 It turns out to be a better answer, but that's just an 625 00:43:49,020 --> 00:43:55,760 accident, right? 626 00:43:55,760 --> 00:43:59,270 Notice in the beginning it fluctuates wildly, and it 627 00:43:59,270 --> 00:44:01,810 fluctuates less wildly at the end. 628 00:44:01,810 --> 00:44:04,850 Why is that? 629 00:44:04,850 --> 00:44:06,545 And don't just say because it's close to right 630 00:44:06,545 --> 00:44:09,210 and it knows it. 631 00:44:09,210 --> 00:44:13,180 Why does the mathematics of this, in some sense, tell us 632 00:44:13,180 --> 00:44:16,450 it has to fluctuate less wildly at the end? 633 00:44:16,450 --> 00:44:17,200 Yes? 634 00:44:17,200 --> 00:44:22,950 STUDENT: [INAUDIBLE] 635 00:44:22,950 --> 00:44:27,100 PROFESSOR: Exactly, exactly right. 636 00:44:27,100 --> 00:44:30,130 If I've only thrown two darts, the third dart can make a big 637 00:44:30,130 --> 00:44:33,110 difference in the average value. 638 00:44:33,110 --> 00:44:35,710 But if I've thrown a million darts, the million-and-first 639 00:44:35,710 --> 00:44:38,530 can't matter very much. 640 00:44:38,530 --> 00:44:41,510 And what this tells me is that, as I want ever more digits of 641 00:44:41,510 --> 00:44:47,570 precision, I have to run a lot more trials to get there. 642 00:44:47,570 --> 00:44:50,960 And that's often true: simulations can get you in the 643 00:44:50,960 --> 00:44:55,180 neighborhood quickly, but the more precision you want, the 644 00:44:55,180 --> 00:45:00,920 more the number of steps you need grows, and it grows quite quickly.
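To make the million-and-first-dart argument concrete, here is a small sketch; the helper names are hypothetical. If the estimate after n darts is 4 times hits over n, dart n+1 can move it by at most 4/(n+1). Separately, each dart hits with probability pi/4, so the typical error of the estimate shrinks like 1/sqrt(n): one more decimal digit of precision costs roughly 100 times as many darts.

```python
import math


def max_change_from_next_dart(n):
    """Upper bound on how far dart n+1 can move the estimate 4*hits/n.

    A miss changes it by 4*hits/(n*(n+1)) <= 4/(n+1);
    a hit changes it by 4*(n-hits)/(n*(n+1)) <= 4/(n+1).
    """
    return 4 / (n + 1)


def std_error_of_estimate(n):
    """Standard deviation of 4*hits/n when each dart hits with probability pi/4.

    Uses the true value of pi only to size the error; the 1/sqrt(n) shape
    is the point.
    """
    p = math.pi / 4
    return 4 * math.sqrt(p * (1 - p) / n)


for n in (100, 10_000, 1_000_000, 100_000_000):
    print(f"{n:,} darts: next dart moves estimate by at most "
          f"{max_change_from_next_dart(n):.1e}; "
          f"typical error about {std_error_of_estimate(n):.1e}")
```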
645 00:45:00,920 --> 00:45:03,570 Now, the fact that I got such different answers the two 646 00:45:03,570 --> 00:45:08,000 times I ran this suggests strongly that I shouldn't 647 00:45:08,000 --> 00:45:10,310 believe either answer. 648 00:45:10,310 --> 00:45:13,300 Right? 649 00:45:13,300 --> 00:45:17,300 So we need to do something else. 650 00:45:17,300 --> 00:45:30,890 So let's try something else. 651 00:45:30,890 --> 00:45:32,760 Let's try throwing a lot more darts here, 652 00:45:32,760 --> 00:45:42,420 and see what we get. 653 00:45:42,420 --> 00:45:44,780 Now if you look at my code, you'll see I'm printing 654 00:45:44,780 --> 00:45:47,950 intermediate values. 655 00:45:47,950 --> 00:45:50,700 Every million darts, I'm printing the value. 656 00:45:50,700 --> 00:45:53,890 And I did that because the first time I ran this on a big 657 00:45:53,890 --> 00:45:56,920 number, I was afraid I had an infinite loop and my program 658 00:45:56,920 --> 00:45:58,570 was not working. 659 00:45:58,570 --> 00:46:01,040 So I just said, all right, let's put a print statement in 660 00:46:01,040 --> 00:46:06,560 the loop, so I could see that it's making progress. 661 00:46:06,560 --> 00:46:08,910 And then I decided it was just kind of nice to look at it, to 662 00:46:08,910 --> 00:46:11,390 see what was going on here. 663 00:46:11,390 --> 00:46:15,970 So now you see that if I throw 10 million darts, I'm starting 664 00:46:15,970 --> 00:46:18,500 to get a much better estimate. 665 00:46:18,500 --> 00:46:22,580 You'll also see, as predicted, that as I get further out, the 666 00:46:22,580 --> 00:46:26,300 value of the estimate changes less and less with each 667 00:46:26,300 --> 00:46:29,590 million new darts, because it's a smaller fraction of the 668 00:46:29,590 --> 00:46:31,570 total darts. 669 00:46:31,570 --> 00:46:36,000 But it's getting a lot better. 670 00:46:36,000 --> 00:46:40,650 Still not quite there. 
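A hedged sketch of the kind of progress printing described here; `throw_darts_with_progress` and `report_every` are hypothetical names, not the handout's code. Because each batch of darts is a smaller fraction of the running total, successive reports tend to differ less and less.

```python
import math
import random


def throw_darts_with_progress(num_darts, report_every=1_000_000, seed=None):
    """Estimate pi, printing the running estimate every report_every darts.

    The periodic print is only a progress indicator: it shows the loop is
    still running and lets you watch the estimate settle.
    """
    rng = random.Random(seed)
    in_circle = 0
    for dart in range(1, num_darts + 1):
        if math.hypot(rng.random(), rng.random()) <= 1.0:  # hit in the quarter circle
            in_circle += 1
        if dart % report_every == 0:
            print(f"{dart:,} darts: estimate = {4 * in_circle / dart:.6f}")
    return 4 * in_circle / num_darts


# A small run for illustration; the lecture reports every million darts.
throw_darts_with_progress(50_000, report_every=10_000, seed=1)
```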
671 00:46:40,650 --> 00:46:47,190 Let's just see what happens, I can throw in another one. 672 00:46:47,190 --> 00:46:51,230 This is going to take a little while. 673 00:46:51,230 --> 00:46:53,930 So I can talk while it's running. 674 00:46:53,930 --> 00:47:06,790 Oops, what did I do here? 675 00:47:06,790 --> 00:47:12,520 So, it's going to keep on going and going and going. 676 00:47:12,520 --> 00:47:15,520 And then if we were to run it with this number of darts 677 00:47:15,520 --> 00:47:18,950 several times over, we would discover that we got answers 678 00:47:18,950 --> 00:47:22,040 that were very, very similar. 679 00:47:22,040 --> 00:47:27,010 From that we can take comfort, statistically, that we're 680 00:47:27,010 --> 00:47:30,880 really getting close to the same answer every time, so 681 00:47:30,880 --> 00:47:34,810 we've probably thrown enough darts to feel comfortable that 682 00:47:34,810 --> 00:47:39,580 we're doing what's statistically the right thing. 683 00:47:39,580 --> 00:47:41,650 And that there maybe isn't a lot of point in 684 00:47:41,650 --> 00:47:43,750 throwing more darts. 685 00:47:43,750 --> 00:47:48,120 Does that mean that we have the right answer? 686 00:47:48,120 --> 00:47:51,690 No, not necessarily, and that's what we're going to 687 00:47:51,690 --> 00:47:53,750 look at next week.
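One way to check the claim that repeated runs with many darts give very similar answers is to run several independent trials and measure their spread. This is a sketch with hypothetical names (`estimate_pi`, `repeat_trials`), using the standard `statistics` module. A small standard deviation says the runs agree with each other, which, as just noted, is not the same as being right.

```python
import math
import random
import statistics


def estimate_pi(num_darts, rng):
    """One simulation: 4 times the fraction of darts inside the quarter circle."""
    hits = sum(1 for _ in range(num_darts)
               if math.hypot(rng.random(), rng.random()) <= 1.0)
    return 4 * hits / num_darts


def repeat_trials(num_darts, num_trials, seed=None):
    """Run num_trials independent simulations; return (mean, sample std dev)."""
    rng = random.Random(seed)
    estimates = [estimate_pi(num_darts, rng) for _ in range(num_trials)]
    return statistics.mean(estimates), statistics.stdev(estimates)


for n in (1_000, 50_000):
    mean, sd = repeat_trials(n, num_trials=20, seed=42)
    print(f"{n:,} darts x 20 trials: mean {mean:.4f}, std dev {sd:.4f}")
```

With more darts per trial the standard deviation across trials should shrink, roughly by the square root of the increase in darts.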