1 00:00:00,790 --> 00:00:03,130 The following content is provided under a Creative 2 00:00:03,130 --> 00:00:04,550 Commons license. 3 00:00:04,550 --> 00:00:06,760 Your support will help MIT OpenCourseWare 4 00:00:06,760 --> 00:00:10,850 continue to offer high quality educational resources for free. 5 00:00:10,850 --> 00:00:13,390 To make a donation or to view additional materials 6 00:00:13,390 --> 00:00:17,320 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,320 --> 00:00:18,570 at ocw.mit.edu. 8 00:00:30,590 --> 00:00:32,840 JOHN GUTTAG: Today we're starting 9 00:00:32,840 --> 00:00:35,270 a new topic, which is, of course, related 10 00:00:35,270 --> 00:00:36,260 to previous topics. 11 00:00:39,230 --> 00:00:44,150 As usual, if you go to either the 6.0002 or the 6.00 web site, 12 00:00:44,150 --> 00:00:49,700 you'll find both today's PowerPoint and today's Python. 13 00:00:49,700 --> 00:00:52,050 You'll discover if you look at the Python file 14 00:00:52,050 --> 00:00:54,530 that there is quite a lot of code in there. 15 00:00:54,530 --> 00:00:57,050 And I'll be talking only about some of it. 16 00:00:57,050 --> 00:01:00,350 But it's probably all worth looking at. 17 00:01:00,350 --> 00:01:06,530 And a fair amount of reading associated with this week. 18 00:01:06,530 --> 00:01:09,140 Why are we looking at random walks? 19 00:01:09,140 --> 00:01:13,700 See a picture here of, think of them as molecules just bouncing 20 00:01:13,700 --> 00:01:15,351 around. 21 00:01:15,351 --> 00:01:16,850 This is actually a picture of what's 22 00:01:16,850 --> 00:01:20,750 called Brownian motion, though Robert Brown probably 23 00:01:20,750 --> 00:01:23,960 did not discover it. 24 00:01:23,960 --> 00:01:27,110 We're looking at random walks because, well, first of all, 25 00:01:27,110 --> 00:01:29,900 they're important in many domains. 
26 00:01:29,900 --> 00:01:32,270 There are people who will argue, for example, 27 00:01:32,270 --> 00:01:35,000 that the movement of prices in the stock market 28 00:01:35,000 --> 00:01:37,970 is best modeled as a random walk. 29 00:01:37,970 --> 00:01:40,850 There was a very popular book called A Random Walk Down Wall 30 00:01:40,850 --> 00:01:43,330 Street that made this argument. 31 00:01:43,330 --> 00:01:49,604 And a lot of modern portfolio analysis is based upon that. 32 00:01:49,604 --> 00:01:51,770 Those of you who are not interested in making money, 33 00:01:51,770 --> 00:01:55,990 and I presume that's most of you, 34 00:01:55,990 --> 00:01:58,860 it's also very important in many physical processes. 35 00:01:58,860 --> 00:02:02,490 We use random walks, say, to model diffusion, 36 00:02:02,490 --> 00:02:06,450 heat diffusion, or the diffusion of molecules 37 00:02:06,450 --> 00:02:08,320 in suspension, et cetera. 38 00:02:08,320 --> 00:02:12,150 So they're very important in a lot of scientific, and indeed, 39 00:02:12,150 --> 00:02:14,880 social disciplines. 40 00:02:14,880 --> 00:02:17,010 They're not the only important thing, 41 00:02:17,010 --> 00:02:19,230 so why are we looking at those? 42 00:02:19,230 --> 00:02:22,530 Because I think it provides a really good illustration 43 00:02:22,530 --> 00:02:29,410 of how we can use simulation to understand the world around us. 44 00:02:29,410 --> 00:02:32,650 And it does give me an excuse to cover some important topics 45 00:02:32,650 --> 00:02:34,600 related to programming. 46 00:02:34,600 --> 00:02:37,390 You'll remember that one of the subtexts of the course 47 00:02:37,390 --> 00:02:40,690 is while I'm covering a lot of what you might think 48 00:02:40,690 --> 00:02:44,080 of as abstract material, we're using it 49 00:02:44,080 --> 00:02:46,990 as an excuse to teach more about programming and software 50 00:02:46,990 --> 00:02:48,190 engineering. 
51 00:02:48,190 --> 00:02:51,370 A little practice with classes and subclassing, 52 00:02:51,370 --> 00:02:56,110 and we're going to also look at producing plots. 53 00:02:56,110 --> 00:02:58,930 So the first random walk I want to look at 54 00:02:58,930 --> 00:03:02,470 is actually not a diffusion process or the stock market, 55 00:03:02,470 --> 00:03:05,230 but an actual walk. 56 00:03:05,230 --> 00:03:07,930 So imagine that you've got a field which has somehow 57 00:03:07,930 --> 00:03:11,920 inexplicably been mown to look like a piece of graph paper, 58 00:03:11,920 --> 00:03:15,160 and you've got a drunk wandering around the field, 59 00:03:15,160 --> 00:03:20,090 taking a step every once in a while in some random direction. 60 00:03:20,090 --> 00:03:23,390 We can then ask the question is there 61 00:03:23,390 --> 00:03:27,260 an interesting relationship between the number of steps 62 00:03:27,260 --> 00:03:30,770 the drunk takes and how far the drunk is 63 00:03:30,770 --> 00:03:34,680 from the origin at the end of those steps? 64 00:03:34,680 --> 00:03:37,560 You could imagine that if the drunk takes more steps, 65 00:03:37,560 --> 00:03:40,000 he's ever further from the origin. 66 00:03:40,000 --> 00:03:42,810 Or maybe you could imagine, since it's random, 67 00:03:42,810 --> 00:03:47,070 that he just wanders away and he wanders back in all directions 68 00:03:47,070 --> 00:03:50,340 and more or less never gets very far. 69 00:03:50,340 --> 00:03:53,250 So just out of curiosity, I'll take a poll. 70 00:03:53,250 --> 00:03:56,370 Who thinks that the drunk doesn't much 71 00:03:56,370 --> 00:03:58,170 matter how many steps he takes, he'll 72 00:03:58,170 --> 00:04:02,110 be more or less the same distance away? 73 00:04:02,110 --> 00:04:04,810 And who thinks the more steps he takes, the further away he's 74 00:04:04,810 --> 00:04:05,540 likely to be? 
75 00:04:08,210 --> 00:04:10,700 It seems to be a season where when you take polls, 76 00:04:10,700 --> 00:04:12,020 they come out almost tied. 77 00:04:18,140 --> 00:04:20,690 Let's look at a small example. 78 00:04:20,690 --> 00:04:23,440 Suppose he takes one step only. 79 00:04:23,440 --> 00:04:25,670 Well, if he takes one step, and we'll 80 00:04:25,670 --> 00:04:28,615 assume for simplicity that he's not so drunk 81 00:04:28,615 --> 00:04:29,615 that he moves at random. 82 00:04:29,615 --> 00:04:34,310 He either moves north or south, east or west. 83 00:04:34,310 --> 00:04:37,650 These are all the places he can get to in one step. 84 00:04:37,650 --> 00:04:40,940 What they have in common is that after one step, 85 00:04:40,940 --> 00:04:46,590 the drunk is always exactly one unit away from the origin. 86 00:04:46,590 --> 00:04:49,090 Well, how about after two steps? 87 00:04:49,090 --> 00:04:52,430 So without loss of generality, let's assume that 88 00:04:52,430 --> 00:04:54,160 the first step-- 89 00:04:54,160 --> 00:04:56,900 let me use the pen that you're supposed 90 00:04:56,900 --> 00:05:02,070 to use to write on this, rather than this pen, which would 91 00:05:02,070 --> 00:05:03,960 make a real mess on my screen. 92 00:05:07,990 --> 00:05:10,720 What did I do with it? 93 00:05:10,720 --> 00:05:12,630 Well, I won't write on it. 94 00:05:12,630 --> 00:05:15,160 So without loss of generality, we'll 95 00:05:15,160 --> 00:05:20,330 assume that the drunk is there after one step. 96 00:05:20,330 --> 00:05:22,370 Took one step to the east. 97 00:05:22,370 --> 00:05:25,040 Well, after two steps, those are all the possible places 98 00:05:25,040 --> 00:05:26,660 he could be. 99 00:05:26,660 --> 00:05:30,900 So on average, how far is the drunk from the origin? 
100 00:05:30,900 --> 00:05:35,270 Well, if we look, he could either be two steps away, 101 00:05:35,270 --> 00:05:39,620 if he took another step east, zero steps away, 102 00:05:39,620 --> 00:05:47,180 if he took a step west, or what do we see for the top two? 103 00:05:47,180 --> 00:05:49,490 Well, the top and the bottom one, 104 00:05:49,490 --> 00:05:51,940 we can go back and use the Pythagorean theorem. 105 00:05:57,580 --> 00:06:01,060 c squared equals a squared plus b squared. 106 00:06:04,050 --> 00:06:06,690 And that will tell us that it'll be the square root 107 00:06:06,690 --> 00:06:10,130 of a squared plus b squared. 108 00:06:10,130 --> 00:06:15,680 And that will tell us how far away the upper two are, 109 00:06:15,680 --> 00:06:19,330 and then we can just average them and get a distance. 110 00:06:19,330 --> 00:06:22,870 And as we can see, on average, the drunk 111 00:06:22,870 --> 00:06:25,300 will be a little bit further away after two steps than 112 00:06:25,300 --> 00:06:28,110 after one step. 113 00:06:28,110 --> 00:06:32,490 Well how about after 100,000 steps? 114 00:06:32,490 --> 00:06:35,790 It would be a little bit tedious to go through the case 115 00:06:35,790 --> 00:06:38,190 analysis I just did. 116 00:06:38,190 --> 00:06:41,500 There are a lot of cases after 100,000 steps. 117 00:06:41,500 --> 00:06:43,820 So we end up resorting to a simulation. 118 00:06:46,400 --> 00:06:49,700 So we'll structure it exactly the same way 119 00:06:49,700 --> 00:06:52,790 we've been structuring our other simulations. 120 00:06:52,790 --> 00:06:58,220 We're going to simulate one walk of k steps, n such walks, 121 00:06:58,220 --> 00:07:00,470 and then report the average distance 122 00:07:00,470 --> 00:07:01,905 from the origin of the n walks. 
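The two-step case analysis above can be checked with a few lines of arithmetic. This is only a sketch: it assumes, as in the lecture, that the first step went east to (1, 0), and the variable names are illustrative rather than taken from the course code.

```python
import math

# The four equally likely positions after a second step,
# assuming the first step went east to (1, 0).
positions = [(2, 0), (0, 0), (1, 1), (1, -1)]

# Pythagorean theorem: distance from the origin is sqrt(x^2 + y^2).
distances = [math.hypot(x, y) for x, y in positions]

# Average over the four cases: (2 + 0 + sqrt(2) + sqrt(2)) / 4.
mean_distance = sum(distances) / len(distances)
print(mean_distance)
```

The mean works out to (2 + 2√2)/4, a little more than 1.2, which is why the drunk is on average slightly further away after two steps than after one.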
123 00:07:05,160 --> 00:07:08,610 Before we do that, in line with the software engineering 124 00:07:08,610 --> 00:07:11,640 theme of the course, we'll start by defining 125 00:07:11,640 --> 00:07:13,580 some useful abstractions. 126 00:07:13,580 --> 00:07:15,460 There are three of them I want to look at. 127 00:07:15,460 --> 00:07:19,440 Location, the field that the drunk is in, 128 00:07:19,440 --> 00:07:21,885 and the drunk him or herself. 129 00:07:24,710 --> 00:07:27,390 So let's first look at location. 130 00:07:27,390 --> 00:07:31,590 This is going to be an immutable type. 131 00:07:31,590 --> 00:07:33,220 So what we see here-- 132 00:07:33,220 --> 00:07:34,845 as long as I can't point in the screen, 133 00:07:34,845 --> 00:07:37,970 I'll point with a pointer-- 134 00:07:37,970 --> 00:07:40,270 is that we'll initiate it. 135 00:07:40,270 --> 00:07:43,860 We'll initialize it with an x and y value. 136 00:07:43,860 --> 00:07:45,990 That makes sense. 137 00:07:45,990 --> 00:07:50,200 We'll be able to have two getters, getX and getY. 138 00:07:50,200 --> 00:07:53,200 And here's how we see it's immutable. 139 00:07:53,200 --> 00:07:56,530 What move is doing is it's not changing the location, 140 00:07:56,530 --> 00:07:59,060 it's returning a new location. 141 00:07:59,060 --> 00:08:02,880 Perhaps move is poor choice of name for that. 142 00:08:02,880 --> 00:08:04,090 But that is what it's doing. 143 00:08:04,090 --> 00:08:06,310 It's just returning a new location 144 00:08:06,310 --> 00:08:09,580 where it adds the change in x and the change in y 145 00:08:09,580 --> 00:08:11,980 to get two new xy values. 146 00:08:14,570 --> 00:08:16,470 Notice, by the way, that I'm not restricting 147 00:08:16,470 --> 00:08:19,880 these to be integers, or one, or anything like that. 148 00:08:19,880 --> 00:08:22,570 So this would work even if I did not 149 00:08:22,570 --> 00:08:25,930 want to take those nice little east-west, north-south steps. 
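A minimal sketch of the Location abstraction as described so far. The method names getX, getY, and move come from the lecture; the parameter names deltaX and deltaY and the attribute names are assumptions, and the actual course file may differ.

```python
class Location(object):
    """A point in the field, treated as immutable."""

    def __init__(self, x, y):
        # x and y are not restricted to integers, so fractional steps work too.
        self.x = x
        self.y = y

    def move(self, deltaX, deltaY):
        # Does not change this object; builds and returns a new Location.
        return Location(self.x + deltaX, self.y + deltaY)

    def getX(self):
        return self.x

    def getY(self):
        return self.y


start = Location(0, 0)
moved = start.move(1.1, -0.5)
print(start.getX(), start.getY())   # the original location is unchanged
print(moved.getX(), moved.getY())   # the new location holds the shifted values
```

Nothing in Python enforces the immutability here; it holds only because no method assigns to self.x or self.y after construction.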
150 00:08:29,600 --> 00:08:33,020 I've got an underbar underbar str, __str__, 151 00:08:33,020 --> 00:08:35,270 and then here's my implementation-- 152 00:08:35,270 --> 00:08:37,260 you can see it's very sophisticated-- 153 00:08:37,260 --> 00:08:38,720 of the Pythagorean theorem. 154 00:08:41,450 --> 00:08:45,440 So I just do it that way, and that 155 00:08:45,440 --> 00:08:49,050 will get me the distance between two things. 156 00:08:49,050 --> 00:08:52,880 It's one of the annoying things about classes in, actually, 157 00:08:52,880 --> 00:08:55,520 all languages I know with classes, is 158 00:08:55,520 --> 00:08:57,530 you would like to think that self 159 00:08:57,530 --> 00:09:00,075 and other-- there's a symmetry here. 160 00:09:00,075 --> 00:09:02,450 The distance from self to other is the same as from other 161 00:09:02,450 --> 00:09:03,560 to self. 162 00:09:03,560 --> 00:09:05,870 But syntactically, because of the way 163 00:09:05,870 --> 00:09:08,420 the language is structured, we treat them a little bit 164 00:09:08,420 --> 00:09:09,333 differently. 165 00:09:12,190 --> 00:09:15,080 How about class Drunk? 166 00:09:15,080 --> 00:09:17,690 Well, this is kind of boring. 167 00:09:17,690 --> 00:09:22,240 Drunk has a name and a string. 168 00:09:22,240 --> 00:09:22,970 And that's all. 169 00:09:26,130 --> 00:09:27,660 The point of this, and I don't think 170 00:09:27,660 --> 00:09:30,720 we've looked at this before, is this 171 00:09:30,720 --> 00:09:34,800 is not intended to be a useful class on its own. 172 00:09:34,800 --> 00:09:37,970 It's what we call a base class. 173 00:09:37,970 --> 00:09:44,590 The notion here is its only purpose is to be inherited. 174 00:09:44,590 --> 00:09:47,380 It's not supposed to be useful on its own, 175 00:09:47,380 --> 00:09:49,660 but it does give me something that will 176 00:09:49,660 --> 00:09:54,710 be used for the two subclasses. 177 00:09:54,710 --> 00:09:56,490 And we'll look at two subclasses.
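The distance method and the base class described above might look roughly like this. It is a sketch, not the course file: the method name distFrom, the string formats, and the attribute names are assumptions made for illustration.

```python
class Location(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def distFrom(self, other):
        # Pythagorean theorem on the differences in x and y.
        # The method name distFrom is assumed, not confirmed by the transcript.
        xDist = self.x - other.x
        yDist = self.y - other.y
        return (xDist**2 + yDist**2) ** 0.5

    def __str__(self):
        return '<' + str(self.x) + ', ' + str(self.y) + '>'


class Drunk(object):
    """Base class: it holds a name and a string form, and exists to be inherited."""

    def __init__(self, name=None):
        self.name = name

    def __str__(self):
        return 'This drunk is named ' + str(self.name)


a, b = Location(0, 0), Location(3, 4)
# The two calls read differently but compute the same symmetric distance.
print(a.distFrom(b), b.distFrom(a))
print(b, Drunk('Homer'))
```

The syntactic asymmetry mentioned in the lecture shows up in the call: one location is the receiver (self) and the other is an argument, even though the result does not depend on which is which.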
178 00:09:56,490 --> 00:09:59,300 The so-called usual drunk, the one 179 00:09:59,300 --> 00:10:02,810 I tried to simulate when I was wandering around, 180 00:10:02,810 --> 00:10:04,760 wanders around at random. 181 00:10:04,760 --> 00:10:06,170 And a drunk I like to think of it 182 00:10:06,170 --> 00:10:09,200 as a New Englander, or a masochistic drunk, 183 00:10:09,200 --> 00:10:12,740 who tries forever to move ever northward, because he 184 00:10:12,740 --> 00:10:15,710 or she wants to be frozen. 185 00:10:15,710 --> 00:10:18,637 I do like this picture of entering the state of Maine 186 00:10:18,637 --> 00:10:19,220 in the winter. 187 00:10:22,870 --> 00:10:25,510 So here is the usual drunk. 188 00:10:25,510 --> 00:10:29,140 Subclass of drunk, and it can take steps 189 00:10:29,140 --> 00:10:35,230 at random, one step, either increasing y, a step north, 190 00:10:35,230 --> 00:10:40,570 decreasing y, a step south, increasing x, a step east, 191 00:10:40,570 --> 00:10:45,770 or decreasing x a step west. 192 00:10:45,770 --> 00:10:46,940 So those are the choices. 193 00:10:46,940 --> 00:10:49,410 And it's going to return one of those at random. 194 00:10:49,410 --> 00:10:53,350 I think we saw random.choice in the last lecture. 195 00:10:53,350 --> 00:10:56,420 And then our masochistic drunk, it's 196 00:10:56,420 --> 00:11:01,100 almost the same, except the choices are slightly different. 197 00:11:01,100 --> 00:11:03,890 If he chooses to head north, he doesn't go one step. 198 00:11:03,890 --> 00:11:08,380 He goes 1.1 steps north. 199 00:11:08,380 --> 00:11:12,902 And if he chooses to go south, he only goes 9/10 of a step. 200 00:11:12,902 --> 00:11:14,360 So what we're seeing here is what's 201 00:11:14,360 --> 00:11:17,250 called a biased random walk. 202 00:11:17,250 --> 00:11:21,110 And the bias here is the direction of the walk 203 00:11:21,110 --> 00:11:23,330 that he's moving either up or down. 204 00:11:27,612 --> 00:11:28,195 Pretty simple. 
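A sketch of the two subclasses just described. takeStep and random.choice are named in the lecture; the class names UsualDrunk and MasochisticDrunk, the stepChoices variable, and the tuple representation of a step are assumptions about how the course code spells things.

```python
import random


class Drunk(object):
    def __init__(self, name=None):
        self.name = name


class UsualDrunk(Drunk):
    def takeStep(self):
        # (change in x, change in y): north, south, east, west
        stepChoices = [(0, 1), (0, -1), (1, 0), (-1, 0)]
        return random.choice(stepChoices)


class MasochisticDrunk(Drunk):
    def takeStep(self):
        # Biased walk: northward steps are longer, southward steps shorter.
        stepChoices = [(0.0, 1.1), (0.0, -0.9), (1.0, 0.0), (-1.0, 0.0)]
        return random.choice(stepChoices)


random.seed(0)  # fixed seed only so the illustration is repeatable
steps = [MasochisticDrunk('Homer').takeStep() for _ in range(10000)]
meanDy = sum(dy for _, dy in steps) / len(steps)
print(meanDy)
```

With each of the four choices equally likely, the expected northward drift of the biased walk is (1.1 - 0.9)/4 = 0.05 per step, so the sample mean printed above should land near that value.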
205 00:11:30,755 --> 00:11:33,310 How about, just to test things out, 206 00:11:33,310 --> 00:11:37,040 we'll ask the question is this an immutable or a mutable type? 207 00:11:41,660 --> 00:11:44,710 Are drunks mutable or immutable? 208 00:11:44,710 --> 00:11:48,160 This is a deep philosophical question. 209 00:11:48,160 --> 00:11:50,820 But if we ignore the philosophical underpinnings 210 00:11:50,820 --> 00:11:53,380 of that question, what about the two types here? 211 00:11:56,250 --> 00:12:00,010 Who thinks it's immutable? 212 00:12:00,010 --> 00:12:01,010 Who thinks it's mutable? 213 00:12:03,970 --> 00:12:05,220 Why do you think it's mutable? 214 00:12:05,220 --> 00:12:06,178 What's getting changed? 215 00:12:09,860 --> 00:12:11,990 The answer is nothing. 216 00:12:15,000 --> 00:12:23,210 It gets created, and then it's returning the step, 217 00:12:23,210 --> 00:12:26,260 but it's not actually changing the drunk. 218 00:12:26,260 --> 00:12:29,620 So so far we have two things that are immutable, drunks 219 00:12:29,620 --> 00:12:32,380 and locations. 220 00:12:32,380 --> 00:12:33,480 Let's look at fields. 221 00:12:33,480 --> 00:12:37,690 Fields are a little bit more complicated. 222 00:12:37,690 --> 00:12:41,730 So field will be a dictionary, and the dictionary 223 00:12:41,730 --> 00:12:47,590 is going to map a drunk to his or her location in the field. 224 00:12:47,590 --> 00:12:52,430 So we can add a drunk at some location, 225 00:12:52,430 --> 00:12:54,620 and we're going to check. 226 00:12:54,620 --> 00:12:57,680 And if the drunk is already there, 227 00:12:57,680 --> 00:13:00,290 we're not going to put the drunk in. 228 00:13:00,290 --> 00:13:04,040 We're going to raise a ValueError, "Duplicate drunk." 229 00:13:04,040 --> 00:13:09,230 Otherwise we're going to set the value of the drunk in this mapping 230 00:13:09,230 --> 00:13:10,610 to loc. 231 00:13:10,610 --> 00:13:14,540 Now you see, by the way, why I wanted drunks to be immutable. 
232 00:13:14,540 --> 00:13:17,000 Because they have to be hashable so 233 00:13:17,000 --> 00:13:20,990 I can use them as a key in a dictionary. 234 00:13:20,990 --> 00:13:23,710 So it was not an idle question whether they were immutable. 235 00:13:23,710 --> 00:13:27,250 It was an important question. 236 00:13:27,250 --> 00:13:29,230 I can get the location of a drunk. 237 00:13:29,230 --> 00:13:31,720 If the drunk is not in there, then I'll 238 00:13:31,720 --> 00:13:35,490 raise a different ValueError, "Drunk not in field." 239 00:13:35,490 --> 00:13:37,030 Otherwise I'll return the location 240 00:13:37,030 --> 00:13:38,440 associated with that drunk. 241 00:13:42,750 --> 00:13:47,810 And finally, we're going to have moveDrunk. 242 00:13:47,810 --> 00:13:51,670 Again I'll check whether the drunk is there. 243 00:13:51,670 --> 00:13:55,760 If the drunk is there, I'm going to get the distance 244 00:13:55,760 --> 00:14:00,070 in x and the distance in y by calling drunk.takeStep. 245 00:14:02,630 --> 00:14:06,410 So we saw takeStep for a drunk didn't move the drunk anywhere, 246 00:14:06,410 --> 00:14:08,080 because the drunks were immutable, 247 00:14:08,080 --> 00:14:10,790 but returned new locations. 248 00:14:10,790 --> 00:14:13,860 A new x and new y values. 249 00:14:13,860 --> 00:14:20,350 And then I'm going to use that to move the drunk in the field. 250 00:14:23,230 --> 00:14:26,955 So I'll set self.drunk, so drunk to move x distance and y 251 00:14:26,955 --> 00:14:27,455 distance. 252 00:14:32,800 --> 00:14:38,950 So it's very simple, but having built this set of classes, 253 00:14:38,950 --> 00:14:42,180 we can now actually write the simulation. 254 00:14:42,180 --> 00:14:42,760 Oh. 255 00:14:42,760 --> 00:14:43,718 What about our classes? 256 00:14:43,718 --> 00:14:45,220 Are they mutable or immutable? 257 00:14:45,220 --> 00:14:46,245 Not classes. 258 00:14:46,245 --> 00:14:46,995 What about fields? 
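The Field operations just walked through can be sketched as follows. moveDrunk, takeStep, and the two error messages come from the lecture; the names addDrunk, getLoc, and the attribute drunks are assumptions, and EastDrunk is a hypothetical stand-in that always steps east so the result is predictable.

```python
class Location(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def move(self, deltaX, deltaY):
        return Location(self.x + deltaX, self.y + deltaY)


class EastDrunk(object):
    """Hypothetical stand-in drunk: always steps one unit east."""

    def takeStep(self):
        return (1, 0)


class Field(object):
    def __init__(self):
        self.drunks = {}  # maps each drunk to its current Location

    def addDrunk(self, drunk, loc):
        if drunk in self.drunks:
            raise ValueError('Duplicate drunk')
        self.drunks[drunk] = loc

    def getLoc(self, drunk):
        if drunk not in self.drunks:
            raise ValueError('Drunk not in field')
        return self.drunks[drunk]

    def moveDrunk(self, drunk):
        if drunk not in self.drunks:
            raise ValueError('Drunk not in field')
        xDist, yDist = drunk.takeStep()
        # The drunk and the old Location are left alone; only the mapping changes.
        self.drunks[drunk] = self.drunks[drunk].move(xDist, yDist)


field = Field()
homer = EastDrunk()
field.addDrunk(homer, Location(0, 0))
field.moveDrunk(homer)
loc = field.getLoc(homer)
print(loc.x, loc.y)
```

Instances of a plain Python class are hashable by identity by default, which is what lets a drunk serve as a dictionary key here.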
259 00:14:51,280 --> 00:14:54,280 Any votes for mutable? 260 00:14:54,280 --> 00:14:56,650 Yeah, exactly. 261 00:14:56,650 --> 00:15:00,280 Because you can see I'm mutating it right here. 262 00:15:00,280 --> 00:15:02,770 I'm changing the value of the dictionary. 263 00:15:02,770 --> 00:15:06,100 And in fact, every time I add a drunk to the field, 264 00:15:06,100 --> 00:15:08,860 I'm changing the value of the dictionary, which 265 00:15:08,860 --> 00:15:12,360 is to say mutating the field. 266 00:15:12,360 --> 00:15:14,160 So I'll have a bunch of locations, 267 00:15:14,160 --> 00:15:16,500 which are immutable objects. 268 00:15:16,500 --> 00:15:19,020 Makes sense that a location is immutable. 269 00:15:19,020 --> 00:15:21,570 A bunch of drunks, and the thing I'm going to change 270 00:15:21,570 --> 00:15:23,520 is where the drunks are in the field. 271 00:15:29,460 --> 00:15:34,370 I said we'd start by simulating a single walk. 272 00:15:34,370 --> 00:15:40,910 So here it is, a walk in a field with a drunk, and that drunk 273 00:15:40,910 --> 00:15:42,890 will take some number of steps in the field. 274 00:15:46,850 --> 00:15:47,900 And you can see this. 275 00:15:47,900 --> 00:15:48,900 It's very simple. 276 00:15:48,900 --> 00:15:50,960 I just have a loop. 277 00:15:50,960 --> 00:15:53,640 Drunk takes some number of random steps, 278 00:15:53,640 --> 00:15:57,710 and I'm going to return the distance from the start 279 00:15:57,710 --> 00:16:02,700 to the final location of the drunk. 280 00:16:02,700 --> 00:16:04,420 So how far is the drunk from the origin? 281 00:16:06,970 --> 00:16:09,780 I then need to simulate multiple walks. 282 00:16:12,790 --> 00:16:15,840 Notice here that I've got the number of steps, 283 00:16:15,840 --> 00:16:23,300 the number of trials, and dClass stands for class of the drunk. 
284 00:16:23,300 --> 00:16:26,510 And that's because I want to use the same function 285 00:16:26,510 --> 00:16:29,000 to simulate as many different kinds of drunks 286 00:16:29,000 --> 00:16:30,530 as I care about. 287 00:16:30,530 --> 00:16:33,200 We've only seen two here, the masochistic drunk 288 00:16:33,200 --> 00:16:37,010 and the usual drunk, but you can imagine many other kinds 289 00:16:37,010 --> 00:16:37,540 as well. 290 00:16:40,770 --> 00:16:41,850 So let's do it. 291 00:16:41,850 --> 00:16:45,770 So here I'm going to simulate a walk for one drunk, Homer. 292 00:16:45,770 --> 00:16:49,310 So we'll create a drunk named Homer, or the variable Homer, 293 00:16:49,310 --> 00:16:53,550 which is the drunk class. 294 00:16:53,550 --> 00:16:59,000 Then the origin, distances, and for t 295 00:16:59,000 --> 00:17:02,980 in range number of trials, we'll just do it, 296 00:17:02,980 --> 00:17:05,319 and then we'll return the distances. 297 00:17:05,319 --> 00:17:07,790 So it's initialized to the empty list. 298 00:17:07,790 --> 00:17:11,710 So we're going to return a list for however many trials we do, 299 00:17:11,710 --> 00:17:15,650 how far the drunk ended up from the origin. 300 00:17:15,650 --> 00:17:17,960 Then we can average that, and we look at the mean. 301 00:17:17,960 --> 00:17:20,420 Maybe we'll look at the min or the max. 302 00:17:20,420 --> 00:17:23,629 Lots of different questions we could ask about the behavior. 303 00:17:26,160 --> 00:17:29,860 And now we can put it all together here. 304 00:17:29,860 --> 00:17:35,850 So drunkTest will take a set of different walk lengths, 305 00:17:35,850 --> 00:17:39,240 a list of different walk lengths, the number of trials, 306 00:17:39,240 --> 00:17:41,580 and the class. 307 00:17:41,580 --> 00:17:43,530 And for a number of steps and walk lengths, 308 00:17:43,530 --> 00:17:45,630 distances will be simWalks of number 309 00:17:45,630 --> 00:17:46,820 of steps, numTrials, dClass. 
310 00:17:49,610 --> 00:17:52,450 And then I'm going to just print some statistics. 311 00:17:56,180 --> 00:17:58,360 You may or may not have seen this. 312 00:17:58,360 --> 00:18:02,640 This is something that's built in to Python. 313 00:18:02,640 --> 00:18:09,130 I can ask for the name of a class. 314 00:18:09,130 --> 00:18:12,760 So dClass, remember, is a class, and __name__ will give me 315 00:18:12,760 --> 00:18:14,620 the name of the class. 316 00:18:14,620 --> 00:18:17,710 Might be usual, it might be drunk, in this case. 317 00:18:21,750 --> 00:18:22,740 So let's try it. 318 00:18:27,780 --> 00:18:29,310 So the code we've looked at. 319 00:18:34,460 --> 00:18:37,970 So let's go down here, and we'll run it, 320 00:18:37,970 --> 00:18:40,795 and we'll try it for walks of 10, 100, 1,000, 321 00:18:40,795 --> 00:18:42,740 and 10,000 steps. 322 00:18:42,740 --> 00:18:43,910 And we'll do 100 trials. 323 00:18:54,840 --> 00:18:57,150 Here's what we got. 324 00:18:57,150 --> 00:19:01,800 So my question to you is does this look plausible? 325 00:19:09,806 --> 00:19:10,930 What is it telling us here? 326 00:19:10,930 --> 00:19:14,080 Well, it's telling us here that the length of the walk 327 00:19:14,080 --> 00:19:16,930 actually doesn't really affect-- 328 00:19:16,930 --> 00:19:21,110 the number of steps doesn't affect how far the drunk gets. 329 00:19:21,110 --> 00:19:22,310 There's some randomness. 330 00:19:22,310 --> 00:19:27,080 8.6, 8.57, 9.2, 8.7. 331 00:19:27,080 --> 00:19:29,990 Not much variance. 332 00:19:29,990 --> 00:19:33,980 So we've done this simulation and we've learned something, 333 00:19:33,980 --> 00:19:36,620 maybe. 334 00:19:36,620 --> 00:19:39,060 So does this look plausible? 335 00:19:42,210 --> 00:19:45,660 We can look at it here. 336 00:19:45,660 --> 00:19:46,850 I've just transcribed it. 337 00:19:49,960 --> 00:19:52,070 What do you think? 338 00:19:52,070 --> 00:19:54,592 Well, go ahead. 
339 00:19:54,592 --> 00:19:56,740 AUDIENCE: I was going to say, it seems plausible 340 00:19:56,740 --> 00:19:58,307 because after the first two steps, 341 00:19:58,307 --> 00:20:02,203 there's a 50% chance he's going closer to the origin. 342 00:20:02,203 --> 00:20:04,650 And a 50% chance he's going away from it. 343 00:20:04,650 --> 00:20:07,800 JOHN GUTTAG: So we have at least one vote for plausible, 344 00:20:07,800 --> 00:20:13,120 and it's certainly a plausible argument. 345 00:20:13,120 --> 00:20:15,910 Well, one of the things we need to learn to do 346 00:20:15,910 --> 00:20:21,050 is whenever we build a simulation, 347 00:20:21,050 --> 00:20:24,680 we need to do what I call a sanity check 348 00:20:24,680 --> 00:20:28,010 to see whether or not the simulation actually 349 00:20:28,010 --> 00:20:30,510 makes sense. 350 00:20:30,510 --> 00:20:33,440 So if we're going to do a sanity check, 351 00:20:33,440 --> 00:20:36,860 what might we do in this case? 352 00:20:36,860 --> 00:20:42,700 We should try it on cases where we think we know the answer. 353 00:20:42,700 --> 00:20:44,980 So we say, let's take a really simple case 354 00:20:44,980 --> 00:20:48,010 where we're pretty sure we know what the answer is. 355 00:20:48,010 --> 00:20:49,690 Let's run our simulation and make 356 00:20:49,690 --> 00:20:52,690 sure it gives us the right answer for this simple case. 357 00:20:55,310 --> 00:20:59,610 So if we think of a sanity check here, 358 00:20:59,610 --> 00:21:02,600 maybe we should look at these numbers. 359 00:21:02,600 --> 00:21:03,930 We just did it. 360 00:21:03,930 --> 00:21:08,700 We know how far the drunk should get in zero steps. 361 00:21:08,700 --> 00:21:10,800 How far should the drunk move in zero steps? 362 00:21:10,800 --> 00:21:12,300 Zero. 363 00:21:12,300 --> 00:21:14,850 How far should the drunk move in one steps? 364 00:21:14,850 --> 00:21:16,780 We know that should be one. 
365 00:21:16,780 --> 00:21:19,890 Two steps, well, we knew what that should be. 366 00:21:19,890 --> 00:21:23,460 Well, if I run this sanity check, 367 00:21:23,460 --> 00:21:24,870 these are the numbers I get. 368 00:21:28,780 --> 00:21:32,410 I should be pretty suspicious. 369 00:21:32,410 --> 00:21:33,940 I should also be suspicious they're 370 00:21:33,940 --> 00:21:39,470 kind of the same numbers I got for 10,000 steps. 371 00:21:39,470 --> 00:21:41,270 What should I think about? 372 00:21:41,270 --> 00:21:43,640 I should think that maybe there's a bug in my code. 373 00:21:46,810 --> 00:21:49,710 So if we now go back and look at the code, 374 00:21:49,710 --> 00:21:53,400 yes, this fails the pants on fire test 375 00:21:53,400 --> 00:21:55,930 that there's clearly something wrong with these numbers. 376 00:21:59,670 --> 00:22:04,650 What we were appending is walk of Homer, numTrials, 1. 377 00:22:04,650 --> 00:22:08,340 Well, numTrials is a constant. 378 00:22:08,340 --> 00:22:10,880 It's always 100. 379 00:22:10,880 --> 00:22:15,410 What I intended to write here was not numTrials but numSteps. 380 00:22:19,430 --> 00:22:21,260 I actually did this the first time 381 00:22:21,260 --> 00:22:23,940 I wrote this simulation many years ago. 382 00:22:23,940 --> 00:22:27,080 I made this typo, if you will, and I 383 00:22:27,080 --> 00:22:29,510 got these bizarre answers. 384 00:22:29,510 --> 00:22:31,900 So I looked at the code and I said, well, 385 00:22:31,900 --> 00:22:34,100 that's actually wrong. 386 00:22:34,100 --> 00:22:36,080 No wonder it's always the same number. 387 00:22:36,080 --> 00:22:37,775 I'm calling it with a constant. 388 00:22:37,775 --> 00:22:40,670 The constant happens to be 100. 389 00:22:40,670 --> 00:22:42,950 So let's go fix the simulation. 390 00:22:57,670 --> 00:22:59,150 So this should have been numSteps. 391 00:23:16,080 --> 00:23:17,250 Now let's run it again. 392 00:23:22,560 --> 00:23:26,920 Well, these results are pretty different. 
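The bug and its fix can be reproduced in a stripped-down form. This sketch deliberately simplifies the lecture's design: it tracks the position as plain coordinates instead of using the Location, Field, and Drunk classes, and the names simWalksBuggy and meanDistance are hypothetical, introduced only to put the two versions side by side.

```python
import random


def walk(numSteps):
    """Distance from the origin after numSteps random unit steps."""
    x, y = 0, 0
    for _ in range(numSteps):
        dx, dy = random.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
        x, y = x + dx, y + dy
    return (x**2 + y**2) ** 0.5


def simWalksBuggy(numSteps, numTrials):
    # The typo from the lecture: numTrials is passed where numSteps was intended,
    # so every walk has the same length no matter what numSteps is.
    return [walk(numTrials) for _ in range(numTrials)]


def simWalks(numSteps, numTrials):
    return [walk(numSteps) for _ in range(numTrials)]


def meanDistance(distances):
    return sum(distances) / len(distances)


random.seed(0)  # fixed seed only so the illustration is repeatable
for numSteps in (0, 1, 2, 10, 1000):
    print(numSteps,
          'buggy mean:', round(meanDistance(simWalksBuggy(numSteps, 100)), 2),
          'fixed mean:', round(meanDistance(simWalks(numSteps, 100)), 2))
```

The zero-step and one-step rows are the sanity check: the fixed version must report exactly 0 and exactly 1 there, while the buggy version reports roughly the same mean in every row because each of its walks is always 100 steps long.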
393 00:23:26,920 --> 00:23:31,622 Now we see that in fact, they're increasing. 394 00:23:34,720 --> 00:23:37,860 Should I just look at this and be happy? 395 00:23:37,860 --> 00:23:39,030 Probably not. 396 00:23:39,030 --> 00:23:41,520 I should run my sanity check again 397 00:23:41,520 --> 00:23:47,530 and make sure I get the right results for zero, one, and two. 398 00:23:47,530 --> 00:23:50,475 So let's go back and do that, just to be a little bit safe. 399 00:23:58,430 --> 00:24:01,120 So I'll just change this tuple of values to be-- 400 00:24:15,650 --> 00:24:19,450 and I should feel a lot better about this. 401 00:24:19,450 --> 00:24:21,950 The mean, the max, and the min are all zero 402 00:24:21,950 --> 00:24:24,820 when he doesn't take any steps. 403 00:24:24,820 --> 00:24:27,960 One is exactly what we should expect, 404 00:24:27,960 --> 00:24:31,890 and two is also-- the mean is where we would guess it to be, 405 00:24:31,890 --> 00:24:34,500 and the max is two, happened to take 406 00:24:34,500 --> 00:24:38,130 two steps in the same direction, and the min is zero, happened 407 00:24:38,130 --> 00:24:40,230 to end up where he started. 408 00:24:40,230 --> 00:24:44,040 So I've passed my sanity check. 409 00:24:44,040 --> 00:24:45,810 Doesn't mean my simulation is right, 410 00:24:45,810 --> 00:24:48,180 but at least I have reason to be hopeful. 411 00:24:51,380 --> 00:24:53,660 So getting back. 412 00:25:01,380 --> 00:25:05,730 So we saw these results, and now we're getting the indication 413 00:25:05,730 --> 00:25:09,090 that in fact, contrary to what we might have thought, 414 00:25:09,090 --> 00:25:13,110 it does appear to be that the more steps the drunk takes, 415 00:25:13,110 --> 00:25:15,020 the further away the drunk ends up. 416 00:25:17,600 --> 00:25:20,560 And that was the usual drunk. 417 00:25:20,560 --> 00:25:22,430 We can try the masochistic drunk, 418 00:25:22,430 --> 00:25:26,350 and then we see something pretty interesting. 
419 00:25:26,350 --> 00:25:30,130 I won't make you sit through it, but when we run it, 420 00:25:30,130 --> 00:25:32,530 here are the usual drunks, the numbers, 421 00:25:32,530 --> 00:25:35,110 and I just looked at it for 1,000 and 10,000 422 00:25:35,110 --> 00:25:37,300 so it would fit on the screen. 423 00:25:37,300 --> 00:25:43,690 You see for the usual drunk, it's 26.8, roughly 90. 424 00:25:43,690 --> 00:25:46,870 Fair dispersion in the min and the max. 425 00:25:46,870 --> 00:25:50,920 And the masochistic drunk seems to be making considerably more 426 00:25:50,920 --> 00:25:56,080 progress than the usual drunk. 427 00:25:56,080 --> 00:25:58,780 So we see is this bias actually appears 428 00:25:58,780 --> 00:26:00,025 to be changing the distance. 429 00:26:03,770 --> 00:26:06,090 Well, that's interesting. 430 00:26:06,090 --> 00:26:08,850 Now we could ask the question why? 431 00:26:08,850 --> 00:26:11,840 What's going on? 432 00:26:11,840 --> 00:26:16,130 And to do that, I want to go and start visualizing 433 00:26:16,130 --> 00:26:18,170 what's the trend? 434 00:26:18,170 --> 00:26:21,770 So rather than just looking at two numbers or three numbers, 435 00:26:21,770 --> 00:26:24,670 as we've been doing, I'm going to draw a pretty picture. 436 00:26:24,670 --> 00:26:26,070 Actually, I'm not going to draw. 437 00:26:26,070 --> 00:26:27,710 I'm a terrible artist. 438 00:26:27,710 --> 00:26:30,800 But Python will draw us some pretty pictures. 439 00:26:30,800 --> 00:26:33,320 We're going to simulate walks of multiple lengths 440 00:26:33,320 --> 00:26:36,170 for each kind of drunk, and then plot 441 00:26:36,170 --> 00:26:38,180 the distance at the end of each length 442 00:26:38,180 --> 00:26:39,980 walk for each kind of drunk. 443 00:26:42,790 --> 00:26:46,100 I now digress for a moment to talk about how we do plotting. 444 00:26:48,650 --> 00:26:51,380 So we're going to use something called Pylab. 
445 00:26:51,380 --> 00:26:56,730 I've listed here four really important libraries 446 00:26:56,730 --> 00:27:01,770 that you will surely end up using extensively 447 00:27:01,770 --> 00:27:06,060 if you continue to use Python for research purposes. 448 00:27:06,060 --> 00:27:10,280 NumPy adds vectors, matrices, and many high-level 449 00:27:10,280 --> 00:27:12,200 mathematical functions. 450 00:27:12,200 --> 00:27:13,500 Actually, it might be NumPy. 451 00:27:13,500 --> 00:27:14,416 It might be "Num-Pee." 452 00:27:14,416 --> 00:27:16,700 I'm not sure how to pronounce it. 453 00:27:16,700 --> 00:27:18,470 But we'll call it NumPy. 454 00:27:18,470 --> 00:27:21,420 So these are really useful things. 455 00:27:21,420 --> 00:27:24,860 SciPy adds on top of that a bunch 456 00:27:24,860 --> 00:27:30,590 of mathematical classes and functions useful to scientists. 457 00:27:30,590 --> 00:27:32,990 Things like-- well, we'll look at some of them 458 00:27:32,990 --> 00:27:35,220 as we go on through the term. 459 00:27:35,220 --> 00:27:40,450 MatPlotLib adds an object-oriented programming 460 00:27:40,450 --> 00:27:42,460 interface for plotting. 461 00:27:42,460 --> 00:27:45,920 Anybody here used MATLAB? 462 00:27:45,920 --> 00:27:46,880 Great. 463 00:27:46,880 --> 00:27:53,120 Well you'll find that MatPlotLib is the Mat, think MATLAB. 464 00:27:53,120 --> 00:27:55,910 Lets you, in Python, use all the plotting stuff 465 00:27:55,910 --> 00:28:00,440 that you've come to either like or hate in MATLAB. 466 00:28:00,440 --> 00:28:02,000 So it's really convenient. 467 00:28:02,000 --> 00:28:03,740 If you know how to do plots in MATLAB, 468 00:28:03,740 --> 00:28:05,840 you'll know how to do it in Python. 469 00:28:05,840 --> 00:28:09,260 PyLab combines all of these to give you 470 00:28:09,260 --> 00:28:12,900 a MATLAB-like interface to Python. 
471 00:28:12,900 --> 00:28:15,780 So once you have PyLab, you can do a lot of things 472 00:28:15,780 --> 00:28:18,390 that you would normally want to do in MATLAB, 473 00:28:18,390 --> 00:28:20,580 for example, produce weird-looking plots 474 00:28:20,580 --> 00:28:23,530 like this one. 475 00:28:23,530 --> 00:28:29,440 I'm going to show you one of the many, many plotting commands. 476 00:28:29,440 --> 00:28:31,940 It is called plot. 477 00:28:31,940 --> 00:28:34,370 It takes two arguments, which must be 478 00:28:34,370 --> 00:28:36,340 sequences of the same length. 479 00:28:38,950 --> 00:28:40,920 The first argument is the x-coordinates, 480 00:28:40,920 --> 00:28:43,450 the second argument is the y-coordinates corresponding 481 00:28:43,450 --> 00:28:44,920 to the x ones. 482 00:28:44,920 --> 00:28:48,910 There are roughly three zillion optional arguments, 483 00:28:48,910 --> 00:28:52,530 and you'll see me use only a small subset of them. 484 00:28:52,530 --> 00:28:55,330 It plots the points in order. 485 00:28:55,330 --> 00:29:00,670 First the first xy, then the second xy, then the third xy. 486 00:29:00,670 --> 00:29:03,850 Why is it important that I say it plots them in order? 487 00:29:03,850 --> 00:29:08,350 Because by default, as each point is plotted, 488 00:29:08,350 --> 00:29:11,320 it draws a line connecting one point to the next point 489 00:29:11,320 --> 00:29:13,060 to the next point. 490 00:29:13,060 --> 00:29:15,460 And so the order in which they're plotted 491 00:29:15,460 --> 00:29:18,260 will determine where the lines go. 492 00:29:18,260 --> 00:29:21,290 Now we'll see, as we go on, that we often don't draw the lines, 493 00:29:21,290 --> 00:29:25,330 but by default they are drawn. 494 00:29:25,330 --> 00:29:27,340 Here's an example. 495 00:29:27,340 --> 00:29:31,320 You start by importing PyLab. 496 00:29:31,320 --> 00:29:36,520 Then I've given xVals and yVals1, and if I call 497 00:29:36,520 --> 00:29:40,230 pylab.plot of xVals, yVals1. 
498 00:29:40,230 --> 00:29:42,410 Here is one of the arguments I can give it. 499 00:29:42,410 --> 00:29:46,950 I'm saying I'd like this to be plotted in blue, b for blue, 500 00:29:46,950 --> 00:29:51,540 and I'd like it to be plotted as a solid line, a single dash. 501 00:29:51,540 --> 00:29:54,630 And I want to give that line a label, which 502 00:29:54,630 --> 00:29:55,440 I've said is first. 503 00:29:58,360 --> 00:30:00,960 yVals2 is a different list. 504 00:30:00,960 --> 00:30:01,930 I'll plot it again. 505 00:30:01,930 --> 00:30:06,810 Here I'm going to say I want a red dotted line, 506 00:30:06,810 --> 00:30:09,100 and the label will be second. 507 00:30:09,100 --> 00:30:10,920 And then after plotting it, I'm going 508 00:30:10,920 --> 00:30:15,030 to invoke pylab.legend, which puts this nice little box up 509 00:30:15,030 --> 00:30:17,460 here in the corner, in this case, 510 00:30:17,460 --> 00:30:20,265 saying that first is a solid blue line and second 511 00:30:20,265 --> 00:30:27,970 a dashed red line, I should say. 512 00:30:27,970 --> 00:30:29,680 Now again, there are lots of arguments, 513 00:30:29,680 --> 00:30:32,800 lots of other arguments I could give in plot. 514 00:30:32,800 --> 00:30:35,500 Also legend, I can tell it where to put the legend, 515 00:30:35,500 --> 00:30:37,030 if I so choose. 516 00:30:37,030 --> 00:30:38,620 Here I've just said, put it wherever 517 00:30:38,620 --> 00:30:43,180 you happen to want to put it, or PyLab wants to put it. 518 00:30:43,180 --> 00:30:46,045 So a very simple way to produce a plot. 519 00:30:51,620 --> 00:30:54,860 There are lots of details and many more 520 00:30:54,860 --> 00:30:58,250 examples about plotting in the assigned reading. 521 00:30:58,250 --> 00:31:01,460 We've posted a video that Professor Grimson produced 522 00:31:01,460 --> 00:31:06,320 for an online course, 600.1x.
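Returning to the plot example just described, a minimal sketch of it might look like the following. The data values are made up for illustration, and the wrapper function `drawExample` is hypothetical; only `pylab.plot`, `pylab.legend`, and `pylab.show` are real PyLab calls.

```python
# Illustrative data (an assumption): plot needs x and y sequences of equal length.
xVals = [1, 2, 3, 4]
yVals1 = [1, 2, 3, 4]
yVals2 = [1, 7, 3, 5]

def drawExample():
    """Plot both series with labels and a legend, then display the figure."""
    # Imported here so the data above can be inspected without Matplotlib installed.
    import pylab
    pylab.plot(xVals, yVals1, 'b-', label='first')    # 'b-': solid blue line
    pylab.plot(xVals, yVals2, 'r--', label='second')  # 'r--': dashed red line
    pylab.legend()  # no location given, so PyLab chooses where the box goes
    pylab.show()
```

Calling `drawExample()` opens a window with the two lines and the legend. Passing something like `loc='lower center'` to `pylab.legend` would pin the legend to a chosen spot instead.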
523 00:31:06,320 --> 00:31:10,430 It's about a 50-minute video broken into multiple segments 524 00:31:10,430 --> 00:31:13,390 about how to use plotting in PyLab with a lot more detail 525 00:31:13,390 --> 00:31:15,110 than I've given you. 526 00:31:15,110 --> 00:31:17,859 You'll see if you read the code for this lecture. 527 00:31:17,859 --> 00:31:19,400 And as you see this lecture, there'll 528 00:31:19,400 --> 00:31:23,210 be lots of other plots showing up of different kinds. 529 00:31:23,210 --> 00:31:26,150 These are my two favorite online sites 530 00:31:26,150 --> 00:31:29,090 for finding out what to do. 531 00:31:29,090 --> 00:31:33,570 And of course, you can google all sorts of things. 532 00:31:33,570 --> 00:31:36,420 That's all I'm going to tell you about how to produce plots 533 00:31:36,420 --> 00:31:39,330 in class, but we're going to expect 534 00:31:39,330 --> 00:31:41,580 you to learn a lot about it, because I think 535 00:31:41,580 --> 00:31:43,650 it's a really useful skill. 536 00:31:43,650 --> 00:31:45,630 And at the very least, you should 537 00:31:45,630 --> 00:31:50,070 feel comfortable that any plot I show you, you now-- 538 00:31:50,070 --> 00:31:51,780 obviously not right now-- 539 00:31:51,780 --> 00:31:54,760 but you will eventually know how to produce. 540 00:31:54,760 --> 00:31:57,660 So if you do these things, you'll 541 00:31:57,660 --> 00:31:58,980 be more than up to speed. 542 00:32:02,800 --> 00:32:08,450 So I started by saying I wanted to plot the trends in distance, 543 00:32:08,450 --> 00:32:09,840 and they're interesting. 544 00:32:09,840 --> 00:32:14,600 So here's the usual drunk and the masochistic drunk. 545 00:32:14,600 --> 00:32:16,730 So you can see, sure enough, the usual drunk, 546 00:32:16,730 --> 00:32:20,360 this fuchsia line is progressing very slowly 547 00:32:20,360 --> 00:32:24,880 and the masochistic drunk, considerably faster.
548 00:32:24,880 --> 00:32:28,150 I looked at these, and after looking at those two, 549 00:32:28,150 --> 00:32:30,660 I tried to figure out whether there 550 00:32:30,660 --> 00:32:33,700 was some mathematical explanation of what 551 00:32:33,700 --> 00:32:36,670 was going on, and decided, well, it 552 00:32:36,670 --> 00:32:38,830 looked to me like the usual drunk 553 00:32:38,830 --> 00:32:41,440 was moving at about the square root of the number of steps. 554 00:32:44,560 --> 00:32:48,010 Not so odd to think about it, if you go back to old Pythagoras 555 00:32:48,010 --> 00:32:49,880 here. 556 00:32:49,880 --> 00:32:52,870 And sure enough, when I plot, and I ran this simulation up 557 00:32:52,870 --> 00:32:55,120 to 100,000 steps. 558 00:32:55,120 --> 00:32:57,950 When I plot the square root of the number of steps, 559 00:32:57,950 --> 00:33:03,280 it's not identical, but it's pretty darn close. 560 00:33:03,280 --> 00:33:06,630 Seems to be moving just a tad faster than the square root, 561 00:33:06,630 --> 00:33:08,440 but not much. 562 00:33:08,440 --> 00:33:09,690 But who knows exactly? 563 00:33:09,690 --> 00:33:11,760 But pretty good. 564 00:33:11,760 --> 00:33:16,090 And then the masochistic drunk seems 565 00:33:16,090 --> 00:33:23,960 to be moving at a rate of numSteps times 0.05. 566 00:33:23,960 --> 00:33:27,574 A less intuitive answer than the square root. 567 00:33:27,574 --> 00:33:29,240 Why do you think it might be doing that? 568 00:33:33,890 --> 00:33:38,300 Well, what we notice is that-- 569 00:33:38,300 --> 00:33:39,800 and we'll look at this-- 570 00:33:39,800 --> 00:33:42,380 maybe there's not much difference 571 00:33:42,380 --> 00:33:46,010 between what the masochistic drunk and the usual drunk 572 00:33:46,010 --> 00:33:50,640 do on the x-axis, east and west. 573 00:33:50,640 --> 00:33:52,760 In fact, they shouldn't be. 
574 00:33:52,760 --> 00:33:56,210 But there should be a difference on the y-axis, 575 00:33:56,210 --> 00:33:59,900 because every time, 1/4 of the time, 576 00:33:59,900 --> 00:34:05,840 the drunk is taking a step north of 1.1 units, 577 00:34:05,840 --> 00:34:11,034 and 1/4 of the time, he's taking a step south of 0.9 units. 578 00:34:13,560 --> 00:34:16,560 And so 1/2 the time, the steps are 579 00:34:16,560 --> 00:34:22,239 diverging by a small fraction. 580 00:34:22,239 --> 00:34:25,320 And if we think about it, 0.1 1/2 the time. 581 00:34:28,880 --> 00:34:29,480 We divide it. 582 00:34:29,480 --> 00:34:32,989 We get 0.05. 583 00:34:32,989 --> 00:34:36,620 So at least we need to do some more analysis, 584 00:34:36,620 --> 00:34:39,380 but the data is pretty compelling here 585 00:34:39,380 --> 00:34:40,940 that it's a very good fit. 586 00:34:43,920 --> 00:34:46,139 Well, let's look at the ending location. 587 00:34:46,139 --> 00:34:49,830 So here you see a rather different kind of plot. 588 00:34:49,830 --> 00:34:52,260 Here I'm showing that you can plot 589 00:34:52,260 --> 00:34:56,719 these things without connecting them by lines. 590 00:35:01,890 --> 00:35:05,190 And giving them different shapes. 591 00:35:05,190 --> 00:35:08,630 So what here I've said is that the masochistic drunk 592 00:35:08,630 --> 00:35:14,500 we're going to plot using red triangles, 593 00:35:14,500 --> 00:35:21,915 and the usual drunk I'm going to plot using black plus signs. 594 00:35:25,180 --> 00:35:27,870 And since I'm going to plot the location 595 00:35:27,870 --> 00:35:32,690 at the end of many walks, it doesn't make sense 596 00:35:32,690 --> 00:35:35,180 to draw lines connecting everything, 597 00:35:35,180 --> 00:35:39,780 because all we're caring about here is the endpoints. 598 00:35:39,780 --> 00:35:42,720 So since I only want the endpoints, 599 00:35:42,720 --> 00:35:45,600 I'm plotting them a different way. 
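The 0.05 figure worked out above can be checked numerically. The sketch below is not the lecture's code: the function names are hypothetical, and the east and west steps are assumed to be exactly one unit, with all four directions equally likely.

```python
import random

# Assumed step choices for the masochistic drunk: 1.1 units north,
# 0.9 units south, and one unit east or west.
MASOCHIST_STEPS = [(0.0, 1.1), (0.0, -0.9), (1.0, 0.0), (-1.0, 0.0)]

def expectedDriftPerStep(stepChoices):
    """Average northward displacement of one step, all choices equally likely."""
    return sum(dy for _, dy in stepChoices) / len(stepChoices)

def meanFinalY(stepChoices, numSteps, numTrials):
    """Average y-coordinate at the end of numTrials walks of numSteps steps."""
    total = 0.0
    for trial in range(numTrials):
        y = 0.0
        for step in range(numSteps):
            y += random.choice(stepChoices)[1]
        total += y
    return total / numTrials

random.seed(0)
numSteps = 1000
# (1.1 - 0.9) / 4 = 0.05 units north per step on average.
print('predicted drift:', expectedDriftPerStep(MASOCHIST_STEPS) * numSteps)
print('simulated drift:', meanFinalY(MASOCHIST_STEPS, numSteps, 200))
```

The two printed numbers should land close to each other, which is consistent with the distance growing roughly like numSteps times 0.05 once the walk is long enough for the northward drift to dominate.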
600 00:35:45,600 --> 00:35:55,880 So for example, I can write something like plot(xVals, 601 00:35:55,880 --> 00:36:06,340 yVals, and then if I do something like let's see. 602 00:36:12,020 --> 00:36:14,610 Just 'bo' in fact. 603 00:36:14,610 --> 00:36:17,150 What that says is plot blue circles. 604 00:36:21,094 --> 00:36:22,010 I could have written-- 605 00:36:24,830 --> 00:36:33,370 in fact I did write 'k+' and that says black plus signs. 606 00:36:33,370 --> 00:36:34,840 And I don't actually remember what 607 00:36:34,840 --> 00:36:37,110 I did to get the triangles, but it's in the code. 608 00:36:41,250 --> 00:36:44,970 And so it's very flexible, what you can do. 609 00:36:44,970 --> 00:36:47,040 And as you can see here, we get the insight 610 00:36:47,040 --> 00:36:51,750 I'd communicated earlier that if you look at east and west, 611 00:36:51,750 --> 00:36:56,870 not much difference between this ball and that ball. 612 00:36:56,870 --> 00:37:02,060 They seem to be moving about the same spread, the same outliers. 613 00:37:02,060 --> 00:37:04,470 But this ball is displaced north. 614 00:37:07,370 --> 00:37:10,400 And not surprisingly, after 10,000 steps, 615 00:37:10,400 --> 00:37:12,200 you would not expect any of these points 616 00:37:12,200 --> 00:37:16,200 to be below zero, where you'd expect roughly half 617 00:37:16,200 --> 00:37:17,625 of these points to be below zero. 618 00:37:21,460 --> 00:37:24,630 And indeed that's about true. 619 00:37:24,630 --> 00:37:28,950 And we see here what's going on that if we 620 00:37:28,950 --> 00:37:33,840 look at the mean absolute difference in x and y, 621 00:37:33,840 --> 00:37:36,800 we see that not a huge difference 622 00:37:36,800 --> 00:37:39,480 between the usual drunk and the masochistic drunk. 623 00:37:39,480 --> 00:37:41,550 There happens to be a distance. 624 00:37:41,550 --> 00:37:44,050 But a huge difference-- 625 00:37:44,050 --> 00:37:46,200 sorry, x and x.
626 00:37:46,200 --> 00:37:50,150 Comparing the two y values, there's a big difference, 627 00:37:50,150 --> 00:37:51,487 as you see here. 628 00:37:51,487 --> 00:37:52,820 So what's the point of all this? 629 00:37:52,820 --> 00:37:56,090 It's not to learn about different kinds of drunks. 630 00:37:56,090 --> 00:37:59,120 It's to show how, by visualization, we 631 00:37:59,120 --> 00:38:02,990 can get insight into our data that if I just 632 00:38:02,990 --> 00:38:07,040 printed spreadsheets showing you all of these endpoints, 633 00:38:07,040 --> 00:38:10,160 it would be hard to make sense of what was there. 634 00:38:10,160 --> 00:38:13,130 So get accustomed to using plotting 635 00:38:13,130 --> 00:38:14,630 to help you understand data. 636 00:38:17,510 --> 00:38:21,710 Now let's play a little bit more with the simulation. 637 00:38:21,710 --> 00:38:23,900 We looked at different kinds of drunks. 638 00:38:23,900 --> 00:38:27,030 Let's look at different kinds of fields. 639 00:38:27,030 --> 00:38:29,000 So I want to look at a field with what 640 00:38:29,000 --> 00:38:30,440 we'll call a wormhole in it. 641 00:38:34,032 --> 00:38:35,740 For those of you of a certain generation, 642 00:38:35,740 --> 00:38:38,980 you will recognize the Tardis. 643 00:38:38,980 --> 00:38:42,850 So the idea here is that the field is such 644 00:38:42,850 --> 00:38:45,700 that as you wander around it, everything is normal. 645 00:38:45,700 --> 00:38:48,190 But every once in a while you hit a place 646 00:38:48,190 --> 00:38:49,750 where you're magically transported 647 00:38:49,750 --> 00:38:52,330 to a different place. 648 00:38:52,330 --> 00:38:55,480 So it behaves very peculiarly. 649 00:38:55,480 --> 00:38:58,910 So let's call this an OddField. 650 00:38:58,910 --> 00:39:03,480 Not odd numbers, but odd as in strange. 651 00:39:03,480 --> 00:39:07,620 So it's going to be a subclass of field. 
652 00:39:07,620 --> 00:39:09,780 We're going to have a parameter that tells us 653 00:39:09,780 --> 00:39:12,390 how many wormholes it has. 654 00:39:12,390 --> 00:39:16,880 A default value of 1,000. 655 00:39:16,880 --> 00:39:23,000 And we'll see how we use xRange and yRange shortly. 656 00:39:23,000 --> 00:39:24,510 So what's the first thing we do? 657 00:39:24,510 --> 00:39:28,400 Well, we'll call Field.__init__ to initialize the field 658 00:39:28,400 --> 00:39:30,200 in the usual way. 659 00:39:30,200 --> 00:39:33,720 And then we're going to create a dictionary of wormholes. 660 00:39:33,720 --> 00:39:37,560 So for w in the range number of wormholes, 661 00:39:37,560 --> 00:39:41,450 I'm going to choose a random x and a random y 662 00:39:41,450 --> 00:39:45,980 in the range minus xRange to plus xRange, minus yRange to yRange. 663 00:39:49,060 --> 00:39:52,050 So this is going to be where the wormholes are located. 664 00:39:52,050 --> 00:39:53,790 And then for each of those, I'm going 665 00:39:53,790 --> 00:39:57,690 to get a random location where you're, in some sense, 666 00:39:57,690 --> 00:40:01,830 teleported to if you enter the wormhole. 667 00:40:01,830 --> 00:40:04,770 So here we're using random to get random integers. 668 00:40:04,770 --> 00:40:07,390 We've seen that before. 669 00:40:07,390 --> 00:40:09,600 And so the new location will be the location 670 00:40:09,600 --> 00:40:14,040 of the new x and the new y, and we're 671 00:40:14,040 --> 00:40:18,570 going to update this dictionary of wormholes 672 00:40:18,570 --> 00:40:23,280 to say that paired with the location x, y is newLoc. 673 00:40:29,471 --> 00:40:34,310 Now when we move the drunk, and again this is just changing-- 674 00:40:34,310 --> 00:40:37,460 we're overriding moveDrunk, so we're overriding one 675 00:40:37,460 --> 00:40:41,410 of the methods in Field. 676 00:40:41,410 --> 00:40:46,230 So Field.moveDrunk will take a self and a drunk.
677 00:40:49,140 --> 00:40:51,460 It's going to get the x value, the y value, 678 00:40:51,460 --> 00:40:55,350 and if that is in the wormholes, it's 679 00:40:55,350 --> 00:40:57,810 going to move the drunk to the location associated 680 00:40:57,810 --> 00:40:59,650 with that wormhole. 681 00:40:59,650 --> 00:41:02,320 So we move a drunk, and if the drunk ends up 682 00:41:02,320 --> 00:41:07,060 being in the wormhole, he gets transported. 683 00:41:07,060 --> 00:41:10,750 So we're using Field.moveDrunk. 684 00:41:10,750 --> 00:41:16,150 So notice that we're using the moveDrunk of the superclass, 685 00:41:16,150 --> 00:41:18,100 even though we're overriding it here. 686 00:41:18,100 --> 00:41:21,400 Because we've overridden it here, I have to say Field. 687 00:41:21,400 --> 00:41:24,010 to indicate I want the one from the superclass, not 688 00:41:24,010 --> 00:41:25,510 the subclass. 689 00:41:25,510 --> 00:41:28,900 And then we're doing something peculiar after the move. 690 00:41:32,730 --> 00:41:36,120 So interestingly here, I've taken, 691 00:41:36,120 --> 00:41:39,990 I think, a usual drunk and plotted the usual drunk 692 00:41:39,990 --> 00:41:44,450 on a walk of 500 steps. 693 00:41:44,450 --> 00:41:50,250 One walk, and shown all the places the drunk visited. 694 00:41:50,250 --> 00:41:52,820 So we've seen three kinds of plots, 695 00:41:52,820 --> 00:41:56,100 one showing how far the drunk would get at different length 696 00:41:56,100 --> 00:42:01,800 walks, one showing all the places 697 00:42:01,800 --> 00:42:05,850 the drunk would end up with many walks of the same length, 698 00:42:05,850 --> 00:42:10,890 and here a single walk, all the places the drunk visits. 699 00:42:10,890 --> 00:42:12,480 And as you can see, the wormholes 700 00:42:12,480 --> 00:42:16,450 produce a profound effect, in this case, 701 00:42:16,450 --> 00:42:19,720 on where the drunks end up. 702 00:42:19,720 --> 00:42:21,030 And again, you have the code. 
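The OddField just described could be sketched as follows. This is a reconstruction from the spoken description, not the posted lecture code. The `Location`, `UsualDrunk`, and `Field` classes are simplified stand-ins, and names such as `numHoles`, `getLoc`, and `addDrunk` are assumptions introduced so the example runs on its own.

```python
import random

class Location:
    """Stand-in for the lecture's location class (assumed interface)."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def move(self, deltaX, deltaY):
        return Location(self.x + deltaX, self.y + deltaY)
    def getX(self):
        return self.x
    def getY(self):
        return self.y

class UsualDrunk:
    """Stand-in drunk that takes unit steps N, S, E, or W."""
    def takeStep(self):
        return random.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])

class Field:
    """Stand-in field mapping each drunk to its current location."""
    def __init__(self):
        self.drunks = {}
    def addDrunk(self, drunk, loc):
        self.drunks[drunk] = loc
    def getLoc(self, drunk):
        return self.drunks[drunk]
    def moveDrunk(self, drunk):
        xDist, yDist = drunk.takeStep()
        self.drunks[drunk] = self.drunks[drunk].move(xDist, yDist)

class OddField(Field):
    """A field with wormholes that teleport a drunk who steps on one."""
    def __init__(self, numHoles=1000, xRange=100, yRange=100):
        Field.__init__(self)  # initialize the field in the usual way
        self.wormholes = {}   # maps an (x, y) entrance to an exit Location
        for w in range(numHoles):
            x = random.randint(-xRange, xRange)
            y = random.randint(-yRange, yRange)
            newX = random.randint(-xRange, xRange)
            newY = random.randint(-yRange, yRange)
            self.wormholes[(x, y)] = Location(newX, newY)

    def moveDrunk(self, drunk):
        # Use the superclass move first, then apply the wormhole rule.
        Field.moveDrunk(self, drunk)
        x = self.drunks[drunk].getX()
        y = self.drunks[drunk].getY()
        if (x, y) in self.wormholes:
            self.drunks[drunk] = self.wormholes[(x, y)]
```

Because the entrances are stored as integer coordinate pairs, the lookup only matches a drunk whose steps keep it on integer coordinates, as the stand-in `UsualDrunk` does here.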
703 00:42:21,030 --> 00:42:24,420 You can run this yourself and simulate it 704 00:42:24,420 --> 00:42:25,950 and see what you get. 705 00:42:25,950 --> 00:42:29,970 And I think I've set random.seed to zero 706 00:42:29,970 --> 00:42:32,630 in each of the simulations in the code, 707 00:42:32,630 --> 00:42:35,070 but you should play with it, change it, 708 00:42:35,070 --> 00:42:38,021 to just see that you'll actually get different results 709 00:42:38,021 --> 00:42:38,895 with different seeds. 710 00:42:43,810 --> 00:42:50,380 Let me summarize here, and say the point of going 711 00:42:50,380 --> 00:42:55,430 through these random walks is not the simulations themselves, 712 00:42:55,430 --> 00:42:58,090 but how we built them. 713 00:42:58,090 --> 00:43:02,070 That we started by defining the classes. 714 00:43:02,070 --> 00:43:04,250 We then built functions corresponding 715 00:43:04,250 --> 00:43:10,520 to one trial, multiple trials, and reported the results. 716 00:43:10,520 --> 00:43:14,090 And then made a set of incremental changes 717 00:43:14,090 --> 00:43:17,600 to the simulation so that we could investigate 718 00:43:17,600 --> 00:43:18,780 different questions. 719 00:43:21,300 --> 00:43:23,670 So we started with a simple simulation 720 00:43:23,670 --> 00:43:28,090 with just the usual drunk and the simple field, 721 00:43:28,090 --> 00:43:31,930 and we noticed it didn't work. 722 00:43:31,930 --> 00:43:32,830 How did we know it? 723 00:43:32,830 --> 00:43:35,470 Well, not because when we did the full simulation 724 00:43:35,470 --> 00:43:38,410 we had great insight. 725 00:43:38,410 --> 00:43:40,300 I probably could have fooled 1/2 of you 726 00:43:40,300 --> 00:43:43,370 and convinced you that that was a reasonable answer. 727 00:43:43,370 --> 00:43:46,870 But as soon as we went and did the sanity check, where 728 00:43:46,870 --> 00:43:52,780 we knew the answer, we could know something was wrong.
729 00:43:52,780 --> 00:43:54,730 And then we went and we fixed it. 730 00:43:54,730 --> 00:43:56,590 And then we went and we elaborated it 731 00:43:56,590 --> 00:43:57,730 a step at a time. 732 00:43:57,730 --> 00:43:59,560 I first got a more sophistic-- 733 00:43:59,560 --> 00:44:01,090 I shouldn't say sophisticated. 734 00:44:01,090 --> 00:44:03,510 A different kind of drunk. 735 00:44:03,510 --> 00:44:07,350 And then we went to a different kind of field. 736 00:44:07,350 --> 00:44:10,810 Finally, we spent time showing how 737 00:44:10,810 --> 00:44:13,630 to use plots to get an insight. 738 00:44:13,630 --> 00:44:16,340 And in the remaining few minutes of the class, 739 00:44:16,340 --> 00:44:19,580 I want to go back and show you some of the plotting commands. 740 00:44:23,970 --> 00:44:25,780 To show you how these plots were produced. 741 00:44:42,480 --> 00:44:45,020 So one of the things I did, since I 742 00:44:45,020 --> 00:44:48,830 knew I was going to be producing a lot of different plots, 743 00:44:48,830 --> 00:44:53,060 I decided I would actually not spend time worrying 744 00:44:53,060 --> 00:44:56,926 about what kind of markers-- 745 00:44:56,926 --> 00:44:58,550 those are the things like the triangles 746 00:44:58,550 --> 00:45:00,650 and the plus sign-- or what colors for each one 747 00:45:00,650 --> 00:45:03,080 individually, but instead I'd set up 748 00:45:03,080 --> 00:45:06,560 a styleIterator that would just return 749 00:45:06,560 --> 00:45:07,860 a bunch of different styles. 750 00:45:07,860 --> 00:45:13,160 So once and for all, I could define n styles, 751 00:45:13,160 --> 00:45:16,490 and then when I want to plot a new kind of drunk, 752 00:45:16,490 --> 00:45:18,500 I would just call the styleIterator 753 00:45:18,500 --> 00:45:21,680 to get the next style. 754 00:45:21,680 --> 00:45:24,790 So this is a fairly common kind of paradigm 755 00:45:24,790 --> 00:45:28,420 to say that I just want to do this once and for all.
756 00:45:28,420 --> 00:45:30,880 I don't want to have to go through each time I do this. 757 00:45:35,380 --> 00:45:37,030 So what do the styles look like? 758 00:45:43,940 --> 00:45:47,441 Let me just get this window. 759 00:45:47,441 --> 00:45:47,940 Oh. 760 00:45:55,460 --> 00:45:57,990 So here it is. 761 00:46:03,530 --> 00:46:05,910 I said there were going to be three styles that I'm 762 00:46:05,910 --> 00:46:07,830 going to iterate through. 763 00:46:07,830 --> 00:46:12,400 Style one is going to be an m, I guess 764 00:46:12,400 --> 00:46:20,050 that's magenta with a line, a blue with a dashed line, 765 00:46:20,050 --> 00:46:29,810 and green with a line with a comma and a minus sign. 766 00:46:29,810 --> 00:46:33,440 So these are called the styles. 767 00:46:33,440 --> 00:46:37,370 And you can control the marker, if you have a marker. 768 00:46:37,370 --> 00:46:38,930 You can control the line. 769 00:46:38,930 --> 00:46:41,220 You can control the color. 770 00:46:41,220 --> 00:46:43,950 Also you can control the size. 771 00:46:43,950 --> 00:46:47,070 You can give the sizes of all these things. 772 00:46:47,070 --> 00:46:50,640 What you'll see when you look at the code 773 00:46:50,640 --> 00:46:55,110 is I don't like the default style 774 00:46:55,110 --> 00:46:57,250 settings, because when they show up on the screen, 775 00:46:57,250 --> 00:46:59,020 they're too small. 776 00:46:59,020 --> 00:47:02,310 So there's something called rcParams. 777 00:47:02,310 --> 00:47:03,810 Those of you who are Unix hackers 778 00:47:03,810 --> 00:47:06,889 can maybe guess where that name came from. 779 00:47:06,889 --> 00:47:08,430 And I've just set a bunch of things, 780 00:47:08,430 --> 00:47:11,670 like that my default line width will be four points. 781 00:47:11,670 --> 00:47:14,610 The size for the titles will be 20. 782 00:47:14,610 --> 00:47:16,120 You can put titles on the graphs. 783 00:47:18,990 --> 00:47:20,700 Various kinds of things.
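A style iterator of the kind described above might be sketched like this. The method name `nextStyle` and the three style strings are assumptions. In particular, `'g-.'` (a green dash-dot line) is only a guess at the third style mentioned, chosen because it is a valid Matplotlib format string.

```python
class styleIterator:
    """Cycle through a fixed tuple of plot style strings, wrapping around."""
    def __init__(self, styles):
        self.index = 0
        self.styles = styles

    def nextStyle(self):
        # Return the current style, then advance, wrapping back to the start.
        result = self.styles[self.index]
        self.index = (self.index + 1) % len(self.styles)
        return result

# Assumed styles: magenta solid, blue dashed, green dash-dot.
styleChoice = styleIterator(('m-', 'b--', 'g-.'))
for drunkKind in ('UsualDrunk', 'MasochistDrunk'):  # hypothetical labels
    print(drunkKind, 'will be plotted with style', styleChoice.nextStyle())
```

Each new kind of drunk simply asks for the next style, so the styles are defined in one place and never have to be chosen plot by plot.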
784 00:47:20,700 --> 00:47:26,120 Again, once and for all trying to set some of these parameters 785 00:47:26,120 --> 00:47:27,730 so they get used over and over again. 786 00:47:38,910 --> 00:47:43,260 And then finally down here, you'll 787 00:47:43,260 --> 00:47:45,870 see that I did things like you want 788 00:47:45,870 --> 00:47:48,900 to put titles on the slides. 789 00:47:48,900 --> 00:47:50,490 So on the graph. 790 00:47:50,490 --> 00:47:52,970 So here's the location at end of walk. 791 00:47:52,970 --> 00:47:55,250 Title is just a string. 792 00:47:55,250 --> 00:47:58,840 You want to label your x and y-axis, 793 00:47:58,840 --> 00:48:01,990 so I've labeled them here. 794 00:48:01,990 --> 00:48:06,140 And here I've said where I want the legend to appear 795 00:48:06,140 --> 00:48:07,175 in the lower center. 796 00:48:10,380 --> 00:48:15,210 I've also set the y-limits and the x-limits on the axis, 797 00:48:15,210 --> 00:48:17,430 because I wanted a little extra room. 798 00:48:17,430 --> 00:48:20,760 Otherwise, by default it will put points 799 00:48:20,760 --> 00:48:24,620 right on the axes, which I find hard to read. 800 00:48:24,620 --> 00:48:27,620 Anyway, the point here is not that you understand 801 00:48:27,620 --> 00:48:29,930 all of this instantaneously. 802 00:48:29,930 --> 00:48:34,850 The point I want to communicate is that it's very flexible. 803 00:48:34,850 --> 00:48:37,190 And so if you decide you don't like 804 00:48:37,190 --> 00:48:39,810 the way a plot looks and you want to change it, 805 00:48:39,810 --> 00:48:41,830 and you know what you want it to look like, 806 00:48:41,830 --> 00:48:45,380 there's almost surely a way to make it do that. 807 00:48:45,380 --> 00:48:47,360 So don't despair. 808 00:48:47,360 --> 00:48:50,150 You can look at the references I gave earlier 809 00:48:50,150 --> 00:48:50,990 and figure that out. 810 00:48:53,650 --> 00:48:56,140 Next lecture we're going to move on. 
811 00:48:56,140 --> 00:48:58,030 No more random walks. 812 00:48:58,030 --> 00:49:00,550 We'll look at simulating other things, 813 00:49:00,550 --> 00:49:05,230 and in particular, we'll look at the question of how believable 814 00:49:05,230 --> 00:49:07,360 is a simulation? 815 00:49:07,360 --> 00:49:11,610 See you Wednesday if the world has not come to an end.