OPERATOR: The following content is provided under a Creative Commons license. Your support helps MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: You'll recall, at least some of you will recall, that last time we ended up looking at a simulation of a drunken university student wandering randomly in a field. I'm going to return to that.

Before I get to the interesting part, I do want to call your attention to something that some people seemed a little bit confused by last time. So you remember that we had this thing called perform trial. And the way we basically tested the drunk, to see what would happen, is we would call this, passing in a time (the number of steps, the amount of time that the drunk was wandering) and the field. And there was just one line that I blazed past, probably too quickly.

So what that line is saying is: from the field, we're going to call the method get drunk, which is going to return us a drunk.
And then we're going to call the move method for that drunk, passing the field as an argument.

Now the question is, what would have happened if I had left off these parentheses? There. A volunteer? What would... well, we will do the experiment in a minute.

And by the way, I want to emphasize the importance of experimentation. I was kind of surprised, for example, to get email from people saying, well, what was the correct answer to question 3 on the quiz? And my sort of response was, why don't you type it and find out? When in doubt, run the experiment.

So, what would have happened had I failed to type that pair of parentheses? Anybody? Well, what is get drunk? It is a method. And remember, in Python, methods, like classes, are themselves objects. So without the parentheses, get drunk would have given us an object which was a method, and then, well, let's try it. The error message will tell us everything we need to know: function has no attribute move. Sure enough, the method does not have an attribute move.
The instance of the class has the attribute. So what those parentheses did was tell me to invoke the method get drunk. Now, instead of that being the method itself, it's the result returned by the method, which is an instance of class drunk, and that instance has an attribute move, which is itself a method.

This is a very common programming paradigm, and it's important to sort of lock into your heads the distinction between the method and an invocation of the method. Because you can write either, and sometimes you won't get an error message, you'll just get the wrong answer. And that's kind of bad.

Does that make sense to everybody? Does it not make sense to anybody, maybe, is the question I should ask? All right, we'll go on, and we'll see more examples of this kind of thing.

Now last week, I ran this several times to see what would happen. We saw that in fact, contrary to most everyone's expectation, the longer we ran the simulation, the further the drunk was from the starting point. And we saw that by plotting how far the drunk was after each time unit. We ran it several times, and we got different answers.
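
Here is a small sketch of the method-versus-invocation distinction discussed above. The classes `Drunk` and `Field` and the method name `get_drunk` are hypothetical stand-ins written for this illustration, not the handout's code. The sketch shows both failure modes mentioned in the lecture. The first is an `AttributeError`, raised when the missing parentheses mean we ask a method object for an attribute it doesn't have. The second is a silent wrong answer, when a method object is used where its result was intended.

```python
class Drunk:
    """Hypothetical stand-in for the lecture's drunk class."""

    def __init__(self, name):
        self.name = name
        self.steps_taken = 0

    def move(self, field):
        # A real drunk would update its location in the field;
        # this stand-in only counts how many times it has moved.
        self.steps_taken += 1


class Field:
    """Hypothetical stand-in for the lecture's field class."""

    def __init__(self, drunk):
        self.drunk = drunk

    def get_drunk(self):
        return self.drunk


f = Field(Drunk("Homer"))

method_obj = f.get_drunk   # no parentheses: the bound method object itself
drunk = f.get_drunk()      # parentheses: call it and get the Drunk back
drunk.move(f)              # fine: the Drunk instance has a move attribute

try:
    f.get_drunk.move(f)    # the bug: a method object has no attribute move
except AttributeError as err:
    print("AttributeError:", err)

# The silent variety: a method object is always truthy, so this check
# "succeeds" even though calling get_drunk() would return None.
empty = Field(None)
print(bool(empty.get_drunk))    # True
print(bool(empty.get_drunk()))  # False
```

The second pair of prints is the dangerous case: no exception is raised, so only checking the value reveals the mistake.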
And at that point we should ask ourselves, is that really the way to go about answering the original question, which was, how far should we expect the drunk to be? The answer is no. I don't want to sit there at the keyboard typing in 400 examples and then, in my head, trying to figure out what's, quote, typical, unquote. Instead, I need to organize my simulation so that it runs a number of trials for me and summarizes the results.

All the simulations we're going to look at, in fact almost all the simulations anyone will ever write, sort of have the same kind of structure. We start with an inner loop that simulates one trial. That's what we have here; we happen to have a function. So when I say inner loop, I don't mean that I'll write a program with a bunch of nested loops. I mean that I'll write a program with some function calls, and down at sort of the bottom of the stack will be perform trial, which simulates one trial of some number of seconds, in this case.

And then I'll, quote, enclose, unquote, the inner loop in another loop that conducts an appropriate number of trials.
Now a little bit later in the term, we'll get to the question of how we know what an appropriate number is. For today, we'll just say, a lot. And we'll talk about that a little bit more.

And then finally, what we want to do is calculate and present some relevant statistics about the trials. And we'll talk about what's relevant. I want to emphasize today the presentation of them. Last time we looked at a graph, which I think you'll all agree was a lot prettier to look at than an array of, say, 1000 numbers.

All right. So now, on your handout and on the screen, you'll see the way we've done this. We've already seen perform trial, so we know what the inner loop looks like. We can ignore first test; that was just my taking the code that I had inline last time and putting it in a function. And now what you'll see in the handout is perform sim and answer question. So, let's look at those.

So perform sim, sim for simulation, takes the amount of time (the number of steps for each trial) plus the number of trials.
It starts with an empty list, dist, short for distances here, saying so far we haven't run any trials, so we don't know what any of the distances are. And then, for trial in range number of trials, it creates a drunk. I'm putting the trial as part of the drunk's name, just so we can make sure that the drunks are all different. Then I'm going to create a field with that drunk in it at location 0,0. And then I'm going to call perform trial with the time and that field to get the distances.

What does perform trial return? We looked at it earlier. What is perform trial returning? Somebody? Surely someone can figure this one out. I have a whole new bag of candy here, post-Halloween. What kind of thing is it returning? Is it returning a number? If you think it's returning a number... oh, OK, are you going to tell me what kind of thing it's returning, please?

A list, thank you. And it's a list of what? I'm giving you the candy on spec, assuming you'll give me the right answer. A list of distances, exactly. So, how far away it is after each time step.
This is exactly the list that we graphed last time. So perform sim will get that list and append it to dist list. So dist list will be a list of lists, and each element will be a list of distances, right? OK, so that's good. And that's now step 2.

In some sense, all of this is irrelevant to the actual question we started with. This is just the structure. Then I'm going to write a function called answer question, or ans quest, designed to actually address the original question. So it, too, is going to create a list. This is a list called means. Then it's going to call perform sim to get this list of lists. Then it will go through that, calculate the means, and create a list of means. And then it will plot it, and before this lecture's over we'll get to how the plotting works.

So I'm following exactly that structure here. It calls this function, which runs an appropriate number of trials by calling that function, and then we calculate and present some statistics. So now let's run it.

All right, so what have I done?
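
The three-part structure described above can be sketched as follows. It has an inner trial, an enclosing loop over trials, and a summary step. This is a minimal sketch under stated assumptions, not the handout's code:

* The names `perform_trial`, `perform_sim`, and `ans_quest` only mirror the functions named in the lecture.
* The drunk and field classes are replaced by plain coordinates.
* Each step is assumed to be one unit north, south, east, or west.

Plotting is left out, so the summary step just returns the list of means.

```python
import math
import random

# Assumption: each step is one unit north, south, east, or west.
COMPASS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def perform_trial(time, rng):
    """Inner loop: one walk of `time` steps starting at (0, 0).

    Returns the distance from the origin at each time step, starting
    with 0.0 at time 0, so the list has time + 1 entries.
    """
    x, y = 0, 0
    distances = [0.0]
    for _ in range(time):
        dx, dy = rng.choice(COMPASS)
        x, y = x + dx, y + dy
        distances.append(math.hypot(x, y))
    return distances


def perform_sim(time, num_trials, seed=0):
    """Enclosing loop: run num_trials trials and return a list of lists."""
    rng = random.Random(seed)  # seeded so runs are repeatable while debugging
    dist_lists = []
    for _ in range(num_trials):
        dist_lists.append(perform_trial(time, rng))
    return dist_lists


def ans_quest(time, num_trials, seed=0):
    """Summary step: mean distance at each time step across all trials."""
    dist_lists = perform_sim(time, num_trials, seed)
    means = []
    for t in range(time + 1):
        tot = 0.0
        for dist_l in dist_lists:
            tot += dist_l[t]
        means.append(tot / len(dist_lists))  # divide by the number of trials
    return means
```

For example, `ans_quest(500, 100)` returns 501 means, one per time step including time 0. Passing the result to a plotting function would complete the "present" step.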
I typed an inadvertent chara... ah, yes, I typed an s which I didn't intend to type. It's going to take a little while; it's loading PyLab. Now it's running the simulation. All right, and here's a picture.

So, when we ran it (let's look at the code for a minute here), what we can see is that at the bottom, I called ans quest, saying each trial should be 500 steps, and run 100 trials. And then we plot this graph. The graph is lurking somewhere in there; yes, it is.

And one of the nice things we'll see is that it's kind of smooth. We'll come back to this, but the fact that it's sort of smooth makes me feel that running 100 trials might actually be enough to give me a consistent answer. You know, if it had been bouncing up and down as we went, then we'd say, jeez, no trend here. Seeing a relatively smooth trend makes me feel somewhat comfortable that we're actually getting an appropriate answer. It also suggests that if I were to run 500 trials, the line would be smoother, but it would look kind of the same, because it doesn't look like it's moving here in arbitrary directions by large amounts. It's not like the stock market.

Should I be happy?
I've sort of done what I wanted. I kind of think I have an answer now, which is that after 500 steps, the drunk should be four and a half units away from the origin. What do you think? Who thinks this is the right answer? So who thinks it's a wrong answer, raise your hand? All right, TAs, what do you guys think? Putting you on the spot. Right answer or wrong answer? They think it's right. Well, shame on them.

Let's rack our brains back to a week ago, when we ran a bunch of individual tests, and let's see what we get if we do that. And the point here is, it's always good to check. My recollection, when I looked at this, was that something was amiss. Because I kind of remember, when I ran the test last time, we were more like 40 away than four away.

Well, all right, let's try it. Well, this sometimes happens. All right, I'm going to have to restart IDLE here. As all of you who use Macintoshes know, this happens sometimes; it's not catastrophic. Sigh.

So this reminds me of the old joke.
A computer scientist and a mechanical engineer were riding in a car, and the car stalled, stopped running. And the mechanical engineer said, I know what to do: let's go out and check the carburetor and look at the engine. The computer scientist said, no, that's the wrong thing to do. What you ought to do is turn off the key, get out of the car, shut the doors, open the doors, get back in, and restart it. And sure enough, it worked. So when in doubt, reboot.

So, we'll come down, we'll do that, and we're going to call first test here and see what that gives us. And we'll ignore that for the moment.

Well, look at this. We ran a bunch of Homer's random walks, and maybe it isn't 40, but not even one of them was four. So what we see now is that we've run two tests and gotten inconsistent answers. Well, we don't know which one is wrong. We know that one of them is wrong. Actually, we don't even know that; maybe we just got unlucky with these five tests. But odds are something is wrong, that there's a bug here. And we have to figure out which one.
So how would we go about doing that? Well, I'm going to do what I always recommend people do, which is to find a really simple example, one for which I actually know the answer. So what would be a good example for which I might know the answer? Give me the simplest example of a simulation of the random walk I could run, where you're confident you know what the answer is.

Yeah? One step, exactly. And what's the answer after one step? One. She can't catch and talk at the same time. Exactly. So we know that if we simulate one step, the drunk has moved in some direction and is going to be exactly one step from the origin. So now we can go and see what we get. So let's do that. We'll change this to be one, and we'll change this to be one. We'll see what the answer is.

Well. 50? Well, that kind of makes me worry. 1. All right, so we see that the simple test of Homer gives me the right answer. We don't know it's always the right answer; we know it is at least for this one.
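
The debugging strategy above can be turned into an automatic check: pick a case whose answer is known in advance. This sketch does not use the handout's classes. It assumes the same hypothetical unit-step walk (one unit north, south, east, or west), and `final_distance` and `check_known_cases` are illustrative names.

Under that assumption, two cases can be worked out by hand. After one step, every walk must be exactly 1 from the origin. After two steps, the only possible distances are 0, sqrt(2), and 2.

```python
import math
import random

# Assumption: each step is one unit north, south, east, or west.
COMPASS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def final_distance(num_steps, rng):
    """Distance from the origin after a walk of num_steps unit steps."""
    x = y = 0
    for _ in range(num_steps):
        dx, dy = rng.choice(COMPASS)
        x += dx
        y += dy
    return math.hypot(x, y)


def check_known_cases(num_trials=100, seed=1):
    """Compare the simulation against answers we can work out by hand."""
    rng = random.Random(seed)
    one_step = [final_distance(1, rng) for _ in range(num_trials)]
    assert all(d == 1.0 for d in one_step), "one step must land exactly 1 away"

    # Same direction twice -> 2, opposite directions -> 0, perpendicular -> sqrt(2).
    allowed = (0.0, math.sqrt(2), 2.0)
    two_steps = [final_distance(2, rng) for _ in range(num_trials)]
    assert all(any(math.isclose(d, a) for a in allowed) for d in two_steps)

    # The mean over the one-step walks must therefore be exactly 1.
    return sum(one_step) / len(one_step)


print(check_known_cases())  # 1.0
```

A check like this validates the walk itself. It says nothing about the code that averages the results, which still needs its own simple test.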
But we know the other one was just way off. Interestingly, unlike the previous time, instead of being way too low, it's way too high. So that gives us some pause for thought. And now we need to go in and debug it.

So let's go and debug it. It seems to me the right thing to do is to go here. Oh, boy, I'm going to have to restart it again; not good. And we'll print an intermediate value. Actually, maybe we'll do it... What would be a nice thing to do here? We're going to come here. Well, you know, we want to go somewhere halfway through the program and print some intermediate value that will give us some information. So, this might be a good place. And what should we print? Well, what values do you think you should get here? Pardon?

STUDENT: The total distance so far.

PROFESSOR: The total distance so far. So that would be a good thing to print. And what do we think it should be? We'll comment this one out, since we think that works. And just to be safe, let's not even run 100 trials; let's run one trial, or two trials maybe. Let's see what we get.
0 and then 2. Well, 0 was sort of what we expected the first time around, but 2? How did you get to be 2? Anyone want to see what's going on here? So we see, right here, we have the wrong answer. Well, maybe we should see what things looked like before this? Is it that the lists are wrong? What am I doing wrong here? I'll bet someone can figure this out. Pardon?

STUDENT: [INAUDIBLE]

PROFESSOR: Well, I'm adding them up, fair enough. But tot looks OK. So, all right, maybe we should take a look at means, right? Let's take a look at what that looks like. Not bad. All right, so maybe my example's too simple. Let's try a little bit bigger. Hmmm... 2.5?

All right, so now I know what's going wrong. It's not that I'm messing up tot; somehow I'm computing the mean incorrectly. Where am I computing the mean? There are only two expressions here. There's tot, and we've checked that.
So there must be a problem with the divisor; that's the only thing that's left. Yeah?

STUDENT: [INAUDIBLE]

PROFESSOR: Exactly right. I should be dividing by the length of the list of lists, the number of things I'm adding to tot. So I just, inadvertently, divided by... well, I have a list of lists, and what I really wanted to do is divide by the number of lists. For each time step, I'm adding one distance from each list to tot. So at the end I need to divide by the number of lists whose values I added, not by the length of one of the lists, right?

So now, let's see what happens if we run it. Now we get some output printed, which I really didn't want, but it happens. Well, this looks a lot better, right? Sure enough, it's 1. All right, so now I'm feeling better. I'm going to get rid of this print statement, since we're going to run a more extensive test. And now we can go back to our original question and run it. Well, this looks a lot more consistent with what we saw before.
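
To make the divisor bug concrete, here is a hypothetical reconstruction of the summary step with both divisors side by side. The function names are invented for this sketch, and the input is synthetic: one-step walks, each recorded as the distances [0.0, 1.0] at times 0 and 1. The correct mean at time 1 is therefore exactly 1.

```python
def means_buggy(dist_lists):
    """The mistake described above: divide by the length of ONE trial's list."""
    means = []
    for t in range(len(dist_lists[0])):
        tot = 0.0
        for dist_l in dist_lists:
            tot += dist_l[t]
        means.append(tot / len(dist_l))      # wrong: number of time steps + 1
    return means


def means_fixed(dist_lists):
    """Divide by the number of trials, i.e. the number of values added to tot."""
    means = []
    for t in range(len(dist_lists[0])):
        tot = 0.0
        for dist_l in dist_lists:
            tot += dist_l[t]
        means.append(tot / len(dist_lists))  # right: number of lists
    return means


one_step_100_trials = [[0.0, 1.0]] * 100
print(means_buggy(one_step_100_trials))  # [0.0, 50.0]
print(means_fixed(one_step_100_trials))  # [0.0, 1.0]
```

With 100 trials, the buggy version divides a total of 100 by 2 and reports 50. With two trials, both divisors equal 2, so that particular test cannot expose the bug. With five trials, the buggy version reports 2.5. This is one reason to try more than one small example.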
It says that on average, you should be around 20. So we feel pretty good about that. Now, just to feel even better, I'm going to double the number of trials and see what that tells us. And it's still around 20, with the line a little smoother. If I were to do 1000 trials, it would get a little smoother still, and it would still be around 20. It may be slightly different each time, but it's consistent with what we saw before, when we ran the other program. We can feel that we're actually doing something useful. So now we can conclude, and it would actually be the correct conclusion, that we know about how far this random drunk is going to move in 500 steps. And if you want to know how far he would move in 1000 steps, we could try that, too.

All right. What are the lessons here? One lesson is to look at the labels on the axes. If we just looked at the graph without noticing these numbers, it looks the same, right? In some sense, this doesn't look any different from when the numbers were four. So you can't just look at the shape of the curve; you have to look at the values.
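
Here is a sketch of the "double the number of trials" check. If the estimate barely moves when the trial count doubles, the number of trials is probably large enough for this question. Like the other sketches, it assumes unit steps in the four compass directions rather than the handout's classes, and `mean_final_distance` is a hypothetical helper. Under that assumption, the estimate for 500 steps settles near 20, in line with the result above. The exact digits depend on the seed.

```python
import math
import random

# Assumption: each step is one unit north, south, east, or west.
COMPASS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def mean_final_distance(num_steps, num_trials, seed=0):
    """Average distance from the origin after num_steps, over num_trials walks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_trials):
        x = y = 0
        for _ in range(num_steps):
            dx, dy = rng.choice(COMPASS)
            x += dx
            y += dy
        total += math.hypot(x, y)
    return total / num_trials


# Double the number of trials and see whether the answer changes much.
for trials in (100, 200, 400):
    estimate = mean_final_distance(500, trials, seed=trials)
    print(f"{trials:4d} trials: {estimate:.2f}")
```

The printed estimates should cluster together. As the lecture stresses, though, the values matter as much as the shape of the curve. A consistent bug, like the wrong divisor, would cluster just as nicely around the wrong number.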
So what does that tell me? It tells me that a responsible person will always label the axes, as I have done here. I'm not only giving you the numbers, but telling you it's the distance. I hate it when I look at graphs and I have to guess what the x- and y-axes are. Here it says time versus distance, and you'll also notice I put a title on it. So there's a point there.

Next, when you're doing it, ask if the answer makes sense. One of the things we'll see as we go on is that you can get all your statistics right and still get the wrong answer because of a consistent bug. So always ask: do I believe it, or is this so counterintuitive that I'm suspicious? And as part of that, ask whether it is consistent with other evidence. In this case we had the evidence of watching an individual walk. Now, those two things were not consistent. We don't know which is wrong, but it must be one of them.

And then the final point I wanted to make is that you can be pretty systematic about debugging, and in particular, debug with a simple example.
Right. Instead of trying to debug 500 steps and 100 trials, I said, all right, let's look at one step and four trials, five trials. That's a case where in my head I knew what it should look like, and then I could check it.

All right. Jumping up a level or three of abstraction now. What we've done is introduce the notion of a random walk in the context of a pretty contrived example. But in fact, it's worth knowing that random walks are used all over the place to solve real problems and deal with real phenomena.

Take, for example, something like Brownian motion, which can be used to model the path traced by a molecule as it travels in a liquid or a gas. Typically, people who do that model it using a random walk. Depending upon, say, the density of the gas or the liquid and the size of the molecules, they change parameters in the simulation, such as how far it goes in each unit of time. But they use a random walk to try to model what will really happen.

People have attempted for, well, maybe 150 years to use random walks to model the stock market.
448 00:33:12,580 --> 00:33:17,540 There was a very famous book called A Random Walk Down Wall 449 00:33:17,540 --> 00:33:23,980 Street that argued that a 450 00:33:23,980 --> 00:33:29,210 random walk was a good way to model the market. 451 00:33:29,210 --> 00:33:32,260 There's a lot of evidence that says that's wrong, but people 452 00:33:32,260 --> 00:33:36,310 continue to attempt it. 453 00:33:36,310 --> 00:33:39,820 They use it a lot in biology to do 454 00:33:39,820 --> 00:33:44,930 things like model kinetics: 455 00:33:44,930 --> 00:33:50,390 the kinetics of a protein, DNA strand exchange, things of 456 00:33:50,390 --> 00:33:52,140 that nature, 457 00:33:52,140 --> 00:33:55,490 the separation of macromolecules, the movement 458 00:33:55,490 --> 00:34:02,630 of microorganisms; all of those things are modeled 459 00:34:02,630 --> 00:34:04,490 this way in biology. 460 00:34:04,490 --> 00:34:10,330 People use it to model evolution. 461 00:34:10,330 --> 00:34:15,610 They treat mutations as a kind of random event. 462 00:34:15,610 --> 00:34:20,450 So, we'll come back to this, but random walks are used over 463 00:34:20,450 --> 00:34:24,100 and over and over again in the sciences and the social sciences, 464 00:34:24,100 --> 00:34:29,020 and are therefore a very useful thing to know about. 465 00:34:29,020 --> 00:34:30,890 All right, we're going to come back to that. 466 00:34:30,890 --> 00:34:34,600 We're even going to come back to our drunken student and 467 00:34:34,600 --> 00:34:37,220 look at other kinds of random walks than the kind we 468 00:34:37,220 --> 00:34:38,950 just looked at. 469 00:34:38,950 --> 00:34:42,650 Before I do that, though, I want to back up and take the 470 00:34:42,650 --> 00:34:45,240 magic out of plotting.
471 00:34:45,240 --> 00:34:49,940 So we've gone from the sublime, what random walks 472 00:34:49,940 --> 00:34:54,430 are good for, to in some sense the ridiculous, the actual 473 00:34:54,430 --> 00:34:56,590 syntax for plotting things. 474 00:34:56,590 --> 00:35:01,680 And maybe it's not ridiculous, but it's boring. 475 00:35:01,680 --> 00:35:05,420 But you need it, so let's look at it. 476 00:35:05,420 --> 00:35:12,650 So we're doing this using a package called Pylab, which is 477 00:35:12,650 --> 00:35:27,330 itself built on a package called NumPy, either 478 00:35:27,330 --> 00:35:31,750 pronounced num-pee or num-pie; you can choose your 479 00:35:31,750 --> 00:35:34,150 pronunciation as you prefer. 480 00:35:34,150 --> 00:35:38,770 NumPy basically gives you a lot of operations on numbers 481 00:35:38,770 --> 00:35:43,450 and arrays of numbers, and on top of that, someone built Pylab, which 482 00:35:43,450 --> 00:35:47,590 is designed to provide a Python interface to a lot of 483 00:35:47,590 --> 00:35:50,800 the functionality you get in Matlab. 484 00:35:50,800 --> 00:35:55,110 And in particular, we're going to be using today the plotting 485 00:35:55,110 --> 00:36:00,140 functionality that comes with Matlab, or rather Pylab's version of it. 486 00:36:00,140 --> 00:36:05,230 So we're going to say from pylab import *; that's just 487 00:36:05,230 --> 00:36:09,350 so I don't have to type pylab.plot every time. 488 00:36:09,350 --> 00:36:11,010 And I'm going to import random, which we're 489 00:36:11,010 --> 00:36:12,770 going to use later. 490 00:36:12,770 --> 00:36:15,200 So let's look at it now. 491 00:36:15,200 --> 00:36:19,860 The first thing we're going to do is plot 1, 2, 3, 4, and then 492 00:36:19,860 --> 00:36:24,560 5, 6, 7, 8. 493 00:36:24,560 --> 00:36:28,930 And then at the very bottom, you'll see this line, show.
494 00:36:28,930 --> 00:36:31,480 That's going to annoy the heck out of you for the 495 00:36:31,480 --> 00:36:34,050 rest of the semester. 496 00:36:34,050 --> 00:36:37,860 Because what happens is, Pylab produces all these beautiful 497 00:36:37,860 --> 00:36:40,330 plots, and then does not display them 498 00:36:40,330 --> 00:36:43,240 until you type show. 499 00:36:43,240 --> 00:36:47,350 So remember, at the end of every program, more or less, the 500 00:36:47,350 --> 00:36:50,640 last thing you execute should be show. 501 00:36:50,640 --> 00:36:54,160 You don't want to execute it in the middle, because 502 00:36:54,160 --> 00:36:57,710 in the middle, in interactive mode at 503 00:36:57,710 --> 00:37:00,530 least, it just stops 504 00:37:00,530 --> 00:37:03,650 and displays the graphs, and until you make the plots go 505 00:37:03,650 --> 00:37:07,260 away, it won't execute the next line. 506 00:37:07,260 --> 00:37:09,900 That's why I've tucked show at the very bottom of my 507 00:37:09,900 --> 00:37:11,170 script here. 508 00:37:11,170 --> 00:37:14,600 Inevitably, you will forget to type show. 509 00:37:14,600 --> 00:37:17,600 You will ask a TA, how come my graphs aren't appearing on the 510 00:37:17,600 --> 00:37:21,330 screen, and the TA will say, did you call show? 511 00:37:21,330 --> 00:37:25,780 And you'll go -- but it happens to all of us. 512 00:37:25,780 --> 00:37:27,450 All right, so let's try it. 513 00:37:27,450 --> 00:37:34,870 See what we get. 514 00:37:34,870 --> 00:37:41,080 So sure enough, it's plotted the values 1, 2, 3, 4, and 5, 515 00:37:41,080 --> 00:37:44,810 6, 7, 8 as y-values. 516 00:37:44,810 --> 00:37:49,610 Two things I want you to notice here. 517 00:37:49,610 --> 00:37:56,150 One is that both plots showed up on the same figure, 518 00:37:56,150 --> 00:37:58,290 which is sometimes what you want, and sometimes 519 00:37:58,290 --> 00:37:59,960 not what you want.
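In current matplotlib, this first experiment looks roughly like the sketch below. It assumes matplotlib is installed and uses matplotlib.pyplot directly rather than pylab (pylab is a convenience module that pulls pyplot and NumPy into a single namespace, and matplotlib now discourages it); first and second are illustrative names for the line objects that plot returns.

```python
import matplotlib.pyplot as plt

# Only y-values are given, so plot supplies the x-values 0, 1, 2, 3 itself.
first, = plt.plot([1, 2, 3, 4])
# No new figure was requested, so this line lands on the same axes.
second, = plt.plot([5, 6, 7, 8])

# Nothing is guaranteed to appear until show() runs; with an interactive
# backend it blocks until the windows are closed, so keep it last.
plt.show()
```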
520 00:37:59,960 --> 00:38:02,400 You'll notice that also happened with the random walk 521 00:38:02,400 --> 00:38:05,360 we looked at, where when I plotted five different walks 522 00:38:05,360 --> 00:38:08,180 for Homer they all showed up superimposed 523 00:38:08,180 --> 00:38:12,500 on top of one another. 524 00:38:12,500 --> 00:38:16,700 The other thing I want you to notice is that the x-axis runs 525 00:38:16,700 --> 00:38:19,840 from 0 to 3. 526 00:38:19,840 --> 00:38:22,550 So you might have thought that what we would 527 00:38:22,550 --> 00:38:27,240 see is a 45 degree angle on these things. 528 00:38:27,240 --> 00:38:32,320 But of course, Python, when not instructed otherwise, 529 00:38:32,320 --> 00:38:35,890 always starts at zero. 530 00:38:35,890 --> 00:38:43,680 Since I gave plot only the y-values, it 531 00:38:43,680 --> 00:38:48,870 used default values for x. 532 00:38:48,870 --> 00:38:53,090 It was smart enough to say, since we have four y-values, 533 00:38:53,090 --> 00:38:57,270 we need four x-values, and I'll choose the integers 534 00:38:57,270 --> 00:39:04,650 0, 1, 2, 3 as those values. 535 00:39:04,650 --> 00:39:08,560 Now you don't have to do that. 536 00:39:08,560 --> 00:39:18,300 We could do this instead. 537 00:39:18,300 --> 00:39:20,600 Let's try this one. 538 00:39:20,600 --> 00:39:26,010 What did I just do? 539 00:39:26,010 --> 00:39:31,760 Let's comment these two out, if we could only get there. 540 00:39:31,760 --> 00:39:42,270 This is highly annoying. 541 00:39:42,270 --> 00:39:45,820 Let's hope it doesn't tell me that I have to -- 542 00:39:45,820 --> 00:39:50,180 All right, so let's go here. 543 00:39:50,180 --> 00:39:58,200 We'll get rid of those guys, and we'll try this one. 544 00:39:58,200 --> 00:40:17,660 We'll plot 1, 2, 3, 4 against 1, 4, 9, 16. 545 00:40:17,660 --> 00:40:19,420 OK?
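Passing two lists instead of one makes the first list the x-values and the second the y-values. A minimal sketch, again assuming matplotlib.pyplot in place of pylab; xs, ys, and line are illustrative names.

```python
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4]
ys = [1, 4, 9, 16]

# First argument: x-values; second argument: y-values.
line, = plt.plot(xs, ys)
# Only these four points exist; the default style joins consecutive
# points with straight segments, so the "curve" has visible corners.
plt.show()
```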
546 00:40:19,420 --> 00:40:26,060 So now it's using 1, 2, 3, 4 as the x-values, and the 547 00:40:26,060 --> 00:40:28,550 y-values I gave it. 548 00:40:28,550 --> 00:40:31,920 First x, then y. 549 00:40:31,920 --> 00:40:34,080 Now it looks a little funny, right? You might not have 550 00:40:34,080 --> 00:40:36,840 expected it to look like this. 551 00:40:36,840 --> 00:40:40,560 You'll notice there are these little corners here. 552 00:40:40,560 --> 00:40:43,580 That's because I gave it a small 553 00:40:43,580 --> 00:40:46,440 number of points, only four. 554 00:40:46,440 --> 00:40:51,240 It's found those four points, and it's connected each 555 00:40:51,240 --> 00:40:55,700 point to the next by a straight line. 556 00:40:55,700 --> 00:40:59,360 And since the points are kind of spread out, the line has 557 00:40:59,360 --> 00:41:03,210 visible bends in it. 558 00:41:03,210 --> 00:41:05,280 Does that make sense to everyone? 559 00:41:05,280 --> 00:41:09,210 Now, it's often deceptive to plot things this way, where 560 00:41:09,210 --> 00:41:11,900 you think you have a continuous function when in 561 00:41:11,900 --> 00:41:17,640 fact you just have a few isolated points. 562 00:41:17,640 --> 00:41:22,630 So let's look at another example. 563 00:41:22,630 --> 00:41:27,590 Here, what I've done is call figure, and 564 00:41:27,590 --> 00:41:34,990 remember, this is pylab.figure, which says, create a 565 00:41:34,990 --> 00:41:36,990 new figure. 566 00:41:36,990 --> 00:41:40,680 So instead of putting this new curve on the same figure as 567 00:41:40,680 --> 00:41:44,530 the old curve, it starts a new one. 568 00:41:44,530 --> 00:41:48,610 And furthermore, I've got this obscure little thing 569 00:41:48,610 --> 00:41:50,560 at the end of it.
570 00:41:50,560 --> 00:41:54,690 After you give it the x- and y-values, you can give it 571 00:41:54,690 --> 00:41:59,340 some instructions about how you want to plot points, or 572 00:41:59,340 --> 00:42:00,890 anything else. 573 00:42:00,890 --> 00:42:05,780 In this case, what this little string says is, each point 574 00:42:05,780 --> 00:42:12,820 should be drawn as a red circle: r for red, o for circle. 575 00:42:12,820 --> 00:42:16,340 I'm not asking you to remember this; 576 00:42:16,340 --> 00:42:21,520 the good news is there's very good documentation on it. 577 00:42:21,520 --> 00:42:27,550 You'll find a pointer to it in the readings, 578 00:42:27,550 --> 00:42:29,800 and it will tell you everything you need to know, 579 00:42:29,800 --> 00:42:32,380 all of the wizardry and the magic you can put in these 580 00:42:32,380 --> 00:42:36,130 strings to tell it how to draw things. 581 00:42:36,130 --> 00:42:41,460 These are basically the same strings, borrowed from Matlab. 582 00:42:41,460 --> 00:42:48,370 And now let's run it. 583 00:42:48,370 --> 00:42:53,620 Figure one is the same figure we saw before. 584 00:42:53,620 --> 00:42:58,660 But figure two has not connected the dots or drawn 585 00:42:58,660 --> 00:43:05,220 a line; it's actually plotted 586 00:43:05,220 --> 00:43:11,450 each point as a red circle. 587 00:43:11,450 --> 00:43:15,950 Now when I look at this, there's something that's not 588 00:43:15,950 --> 00:43:18,300 very pleasing about it. 589 00:43:18,300 --> 00:43:23,680 In particular, I know I plotted four points, but at a 590 00:43:23,680 --> 00:43:30,510 quick glance it looks like there are only three. 591 00:43:30,510 --> 00:43:36,040 And that's because it's taken this fourth point and stuck it 592 00:43:36,040 --> 00:43:38,580 way up there in the corner where I missed it. 593 00:43:38,580 --> 00:43:40,040 It's there.
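The figure call, the 'ro' format string, and the axis, title, and label calls the lecture applies to this same plot can be combined as below. A sketch assuming matplotlib.pyplot (which pylab wraps); xs, ys, fig2, and dots are illustrative names, and the title and label strings follow the lecture's earnings, days, and dollars example.

```python
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4]
ys = [1, 4, 9, 16]

plt.figure()                    # figure 1: default style, points joined by lines
plt.plot(xs, ys)

fig2 = plt.figure()             # start a new figure instead of reusing figure 1
dots, = plt.plot(xs, ys, 'ro')  # 'r' = red, 'o' = circle markers; no line style given, so no line
plt.axis([0, 6, 0, 20])         # [xmin, xmax, ymin, ymax]: keeps (4, 16) away from the corner
plt.title('Earnings')
plt.xlabel('Days')
plt.ylabel('Dollars')

plt.show()
```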
594 00:43:40,040 --> 00:43:44,590 But it's so close to the edge of the graph that it's kind of 595 00:43:44,590 --> 00:43:49,580 hard to see. 596 00:43:49,580 --> 00:44:09,520 So I can fix that by executing the command axis, which tells 597 00:44:09,520 --> 00:44:13,820 it what range I want each axis to cover. 598 00:44:13,820 --> 00:44:17,240 And this says, I want the x-axis to go from 0 to 6, and the 599 00:44:17,240 --> 00:44:22,090 y-axis from 0 to 20. 600 00:44:22,090 --> 00:44:26,450 We'll do that, and also to avoid boring you, we'll do 601 00:44:26,450 --> 00:44:29,680 more at the same time. 602 00:44:29,680 --> 00:44:36,400 We'll put some labels on these things. 603 00:44:36,400 --> 00:44:38,810 I'm going to say that the title of the graph is going to 604 00:44:38,810 --> 00:44:44,960 be Earnings, and that the x-axis will be labelled Days, 605 00:44:44,960 --> 00:44:48,500 and the y-axis will be labelled Dollars. 606 00:44:48,500 --> 00:44:55,210 So, earnings in dollars against days. 607 00:44:55,210 --> 00:45:03,970 OK, now let's see what happens when we do this. 608 00:45:03,970 --> 00:45:07,680 Well, we get the same ugly figure one as before, and now 609 00:45:07,680 --> 00:45:12,190 you can see that in figure two I've changed the axis ranges so that my 610 00:45:12,190 --> 00:45:14,950 graph shows up in the middle rather than at the 611 00:45:14,950 --> 00:45:17,230 edges, and is therefore easier to read. 612 00:45:17,230 --> 00:45:22,520 I put a title at the top, and I put labels on the axes. 613 00:45:22,520 --> 00:45:26,850 For every graph that I ask you to do in this course, I want you to 614 00:45:26,850 --> 00:45:29,935 put a title on it and to label your axes so we know what 615 00:45:29,935 --> 00:45:32,560 we're reading. 616 00:45:32,560 --> 00:45:36,830 Again, nothing very deep here; this is really just syntax, 617 00:45:36,830 --> 00:45:38,900 just to give you an idea of the sorts of 618 00:45:38,900 --> 00:45:45,250 things you can do. 619 00:45:45,250 --> 00:45:47,480 All right.
Now we get to something a little bit more 620 00:45:47,480 --> 00:45:52,220 interesting. 621 00:45:52,220 --> 00:46:02,120 Let's look at this code here. 622 00:46:02,120 --> 00:46:07,470 So far, what I've been passing to the plot function for the 623 00:46:07,470 --> 00:46:13,410 x- and y-values are lists. 624 00:46:13,410 --> 00:46:22,530 In fact, what Pylab uses is something it gets from NumPy, 625 00:46:22,530 --> 00:46:33,500 which are not really lists, but what it calls arrays. 626 00:46:33,500 --> 00:46:39,810 Now, truth be told, most programming languages use 627 00:46:39,810 --> 00:46:42,860 array to mean something quite different. 628 00:46:42,860 --> 00:46:53,380 But in NumPy an array is basically a vector or a matrix, 629 00:46:53,380 --> 00:46:56,600 on which we can do some interesting things. 630 00:46:56,600 --> 00:47:02,210 So for example, say I write x-axis equals 631 00:47:02,210 --> 00:47:05,040 array of 1, 2, 3, 4. 632 00:47:05,040 --> 00:47:09,380 Array works like a type conversion, so array applied to a list is just 633 00:47:09,380 --> 00:47:11,790 like applying float to an int. 634 00:47:11,790 --> 00:47:14,090 If I apply float to an int, it turns it into a 635 00:47:14,090 --> 00:47:15,960 floating point number. 636 00:47:15,960 --> 00:47:22,780 If I apply array to a list, it turns it into an array. 637 00:47:22,780 --> 00:47:25,380 Once it's an array, as we'll see, we can do some very 638 00:47:25,380 --> 00:47:28,720 interesting things with it. 639 00:47:28,720 --> 00:47:34,100 Now, getting an array by coercing a list 640 00:47:34,100 --> 00:47:35,900 is probably the most common way to 641 00:47:35,900 --> 00:47:37,600 get one, by the way, 642 00:47:37,600 --> 00:47:40,590 because you build up a list in simulations of the sort we 643 00:47:40,590 --> 00:47:43,480 looked at, and then you might want to change it to an array 644 00:47:43,480 --> 00:47:46,010 to perform some operations on it.
645 00:47:46,010 --> 00:47:50,080 But you can also get an array directly with arange. 646 00:47:50,080 --> 00:47:52,540 This is just like the range function we've been using all 647 00:47:52,540 --> 00:47:56,900 term, but whereas the range function gives you a list of 648 00:47:56,900 --> 00:48:04,390 ints, this gives you an array of ints. 649 00:48:04,390 --> 00:48:08,120 And the nice thing about an array is that I can perform 650 00:48:08,120 --> 00:48:13,550 operations on it like this. 651 00:48:13,550 --> 00:48:17,840 So if I say y-axis equals x-axis raised to the third 652 00:48:17,840 --> 00:48:22,370 power, that's something I can't do with a list; I get an 653 00:48:22,370 --> 00:48:25,990 error message if I try that with a list. What it does 654 00:48:25,990 --> 00:48:29,990 is, point-wise, take each element in the 655 00:48:29,990 --> 00:48:34,290 array and cube it. 656 00:48:34,290 --> 00:48:38,300 So the nice thing about arrays is that you can use them to do the 657 00:48:38,300 --> 00:48:41,350 kinds of things that, if you ever took a linear algebra 658 00:48:41,350 --> 00:48:43,810 course, you learned about doing. 659 00:48:43,810 --> 00:48:46,510 You can multiply an array by an array, element by element, or 660 00:48:46,510 --> 00:48:49,050 multiply an array by an integer. 661 00:48:49,050 --> 00:48:53,020 And sometimes that's a very convenient thing to do. 662 00:48:53,020 --> 00:48:55,270 It's not what this course is about, and I don't want to 663 00:48:55,270 --> 00:48:56,680 emphasize it. 664 00:48:56,680 --> 00:49:00,960 I just want you to know it's there so that if in some subsequent 665 00:49:00,960 --> 00:49:06,430 life you want to do more complicated manipulations, 666 00:49:06,430 --> 00:49:09,410 you'll know that that's possible. 667 00:49:09,410 --> 00:49:27,200 So let's run this and see what we get. 668 00:49:27,200 --> 00:49:32,150 So first, we'll ignore the figure for 669 00:49:32,150 --> 00:49:36,830 the moment.
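The array operations described here can be sketched with NumPy directly (pylab re-exports these names). The variable names x_axis and y_axis stand in for the lecture's x-axis and y-axis; test is assumed to be built with arange, which matches the printout described next but is not shown in the transcript. Note that * between two arrays multiplies element by element; the linear-algebra product is np.dot or the @ operator, included here only for contrast.

```python
import numpy as np

x_axis = np.array([1, 2, 3, 4])  # coerce a list to an array, as float(3) coerces an int
test = np.arange(1, 5)           # like range(1, 5), but the result is an array

y_axis = x_axis ** 3             # cubes each element, point-wise
print(test)                      # [1 2 3 4]
print(y_axis)                    # [ 1  8 27 64]

doubled = x_axis * 2             # array times a number: every element doubled
products = x_axis * y_axis       # array times array: element by element
dot = x_axis @ y_axis            # the linear-algebra dot product, for contrast

# The same exponentiation on a plain list is an error.
try:
    [1, 2, 3, 4] ** 3
except TypeError as err:
    print('list ** 3 fails:', err)
```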
670 00:49:36,830 --> 00:49:42,440 And we've seen that when I printed test and I printed 671 00:49:42,440 --> 00:49:48,390 x-axis, they look the same; they are the same. 672 00:49:48,390 --> 00:49:51,520 And in fact, I can do this interesting thing now: 673 00:49:51,520 --> 00:49:54,740 print test double-equals x-axis. 674 00:49:54,740 --> 00:49:56,410 You might have thought that would return a 675 00:49:56,410 --> 00:50:00,050 single value, True. 676 00:50:00,050 --> 00:50:04,440 Instead it returns an array, where it's done a point-wise 677 00:50:04,440 --> 00:50:08,490 comparison of each element. 678 00:50:08,490 --> 00:50:13,070 So when we deal with arrays, they don't behave like lists. 679 00:50:13,070 --> 00:50:15,720 And you can imagine that it might be very convenient to be 680 00:50:15,720 --> 00:50:17,460 able to do this. 681 00:50:17,460 --> 00:50:21,250 It answers the questions, are all the elements the same, 682 00:50:21,250 --> 00:50:23,180 and which ones are the same? 683 00:50:23,180 --> 00:50:27,110 So you can imagine doing some very clever things with these. 684 00:50:27,110 --> 00:50:32,250 And certainly, if you can convert problems to vectors of 685 00:50:32,250 --> 00:50:37,980 this sort, you can do things that seem almost magical. 686 00:50:37,980 --> 00:50:44,450 And then when we look at the figure, which should be tucked 687 00:50:44,450 --> 00:50:51,120 away somewhere here, what did I do with the figure? 688 00:50:51,120 --> 00:50:57,250 Did I make it go away? 689 00:50:57,250 --> 00:51:05,020 Well, I think I did one of those ugly things and made it 690 00:51:05,020 --> 00:51:07,290 go away again. 691 00:51:07,290 --> 00:51:08,010 Oh, no, there it is. 692 00:51:08,010 --> 00:51:10,410 All right. 693 00:51:10,410 --> 00:51:14,620 And sure enough, here we're plotting a cubic. 694 00:51:14,620 --> 00:51:16,020 All right.
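The point-wise comparison can be sketched as follows, again with NumPy directly and again assuming test was built with arange. The .all() and np.flatnonzero calls, which answer "are all the elements the same?" and "which ones are the same?", are illustrative additions rather than code from the lecture; other is a hypothetical second array.

```python
import numpy as np

x_axis = np.array([1, 2, 3, 4])
test = np.arange(1, 5)

same = test == x_axis           # an array of Booleans, one per position
print(same)                     # [ True  True  True  True]
print(same.all())               # are all the elements the same?

other = np.array([1, 0, 3, 0])
print(np.flatnonzero(test == other))  # which positions match? indices 0 and 2

# Plain lists compare as whole objects and give a single bool.
print([1, 2, 3, 4] == [1, 2, 3, 4])

# Using an element-wise result directly as a condition is ambiguous.
try:
    if test == other:
        pass
except ValueError as err:
    print('ambiguous:', err)
```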
695 00:51:16,020 --> 00:51:21,340 Nothing very important to observe about any of that, 696 00:51:21,340 --> 00:51:23,790 other than that arrays are really quite interesting and 697 00:51:23,790 --> 00:51:26,660 can be very valuable. 698 00:51:26,660 --> 00:51:31,070 Finally, the thing I want to show you is that there are a 699 00:51:31,070 --> 00:51:39,390 lot of things we can do that are more interesting than what 700 00:51:39,390 --> 00:51:46,940 we've done. 701 00:51:46,940 --> 00:51:49,270 So now I'm going to use the random module, which I imported 702 00:51:49,270 --> 00:51:58,180 before, and show you that we can plot things other than 703 00:51:58,180 --> 00:52:01,690 simply curves. 704 00:52:01,690 --> 00:52:04,600 In this case, I'm going to plot a histogram. 705 00:52:04,600 --> 00:52:07,940 And what this histogram is going to show is, I'm going to 706 00:52:07,940 --> 00:52:12,510 throw a pair of dice a large number of times, add up 707 00:52:12,510 --> 00:52:16,250 the two faces, and see what I get. 708 00:52:16,250 --> 00:52:20,260 So the die values are 1 through 6, and for i in range 709 00:52:20,260 --> 00:52:22,310 10,000, a lot of dice, 710 00:52:22,310 --> 00:52:24,940 I'm just going to append random.choice, which we've seen 711 00:52:24,940 --> 00:52:30,910 before, on each of the two dice, added to get their sum. 712 00:52:30,910 --> 00:52:33,900 And then I'm going to plot a histogram, pylab.hist 713 00:52:33,900 --> 00:52:42,220 instead of plot, and we'll get something quite different: 714 00:52:42,220 --> 00:52:46,730 a nice little histogram showing me the values I get, 715 00:52:46,730 --> 00:52:50,250 and we will come back to this later and talk about why this 716 00:52:50,250 --> 00:52:52,490 is called a normal distribution.
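The dice experiment might be written as follows, using the standard random module and matplotlib.pyplot. The names die_values, vals, counts, and edges are illustrative. The bins, align, and rwidth arguments are additions so that each possible sum from 2 to 12 gets its own centred bar. One caution when reading the result: for exactly two dice the sum has a triangular distribution, peaking at 7 with probability 6/36, so it only approximates the bell shape of a normal distribution.

```python
import random

import matplotlib.pyplot as plt

die_values = [1, 2, 3, 4, 5, 6]
vals = []
for i in range(10000):
    # Throw two dice and record the sum of their faces.
    vals.append(random.choice(die_values) + random.choice(die_values))

# hist does the counting itself. Bin edges 2, 3, ..., 13 give one bin per
# possible sum (the last bin, [12, 13], includes its right edge), and
# align='left' centres each bar on its integer sum.
counts, edges, _ = plt.hist(vals, bins=list(range(2, 14)), align='left', rwidth=0.8)
plt.title('Sum of two dice, 10,000 throws')
plt.xlabel('Sum')
plt.ylabel('Number of throws')
plt.show()
```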