1 00:00:00,530 --> 00:00:02,960 The following content is provided under a Creative 2 00:00:02,960 --> 00:00:04,370 Commons license. 3 00:00:04,370 --> 00:00:07,410 Your support will help MIT OpenCourseWare continue to 4 00:00:07,410 --> 00:00:11,060 offer high-quality educational resources for free. 5 00:00:11,060 --> 00:00:13,960 To make a donation or view additional materials from 6 00:00:13,960 --> 00:00:19,790 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:19,790 --> 00:00:21,040 ocw.mit.edu. 8 00:00:23,300 --> 00:00:24,890 PROFESSOR: OK, we're all ready to go. 9 00:00:29,860 --> 00:00:35,900 This is Discrete Stochastic Processes as you all know. 10 00:00:35,900 --> 00:00:37,150 It is-- 11 00:00:39,050 --> 00:00:40,615 want to get it where I can read it, too. 12 00:00:44,150 --> 00:00:46,270 We're going to try to deal with a bunch of different 13 00:00:46,270 --> 00:00:51,450 topics today, some of which are a little bit philosophical 14 00:00:51,450 --> 00:00:55,360 saying, what is probability really? 15 00:00:55,360 --> 00:01:00,220 You are supposed to have taken a course in probability, but 16 00:01:00,220 --> 00:01:04,760 unfortunately courses in probability are almost always 17 00:01:04,760 --> 00:01:08,080 courses in how to solve well-posed problems. 18 00:01:08,080 --> 00:01:11,750 The big problem in probability theory, and particularly 19 00:01:11,750 --> 00:01:18,040 stochastic processes is not so much how do you solve 20 00:01:18,040 --> 00:01:20,290 well-posed problems. 21 00:01:20,290 --> 00:01:22,320 Anybody can do that. 22 00:01:22,320 --> 00:01:28,040 Or anybody who has a little bit of background can do it. 23 00:01:28,040 --> 00:01:33,550 The hard problem is finding the right models for a 24 00:01:33,550 --> 00:01:35,060 real-world problem. 25 00:01:35,060 --> 00:01:36,980 I will call these real-world problems. 
26 00:01:36,980 --> 00:01:40,730 I hate people who call things real-world because they sound 27 00:01:40,730 --> 00:01:43,310 like they dislike theory or something. 28 00:01:43,310 --> 00:01:47,380 It's the only word I can think of though, because physical 29 00:01:47,380 --> 00:01:51,200 world is no longer adequate because so much of the 30 00:01:51,200 --> 00:01:55,190 applications of probability are to all sorts of different 31 00:01:55,190 --> 00:01:58,670 things, not having to do with the physical world very much, 32 00:01:58,670 --> 00:02:03,400 but having to do with things like the business world, or 33 00:02:03,400 --> 00:02:07,850 the economic world, or the biological world, or all of 34 00:02:07,850 --> 00:02:10,190 these things. 35 00:02:10,190 --> 00:02:15,990 So real-world is just a code word we'll use to distinguish 36 00:02:15,990 --> 00:02:21,280 theory from anything real that you might have to deal with. 37 00:02:21,280 --> 00:02:23,830 Theory is very nice because theory-- 38 00:02:23,830 --> 00:02:25,670 everything is specified. 39 00:02:25,670 --> 00:02:27,800 There's a right answer to everything. 40 00:02:27,800 --> 00:02:29,640 There is a wrong answer usually. 41 00:02:29,640 --> 00:02:32,350 But there's at least one right answer. 42 00:02:32,350 --> 00:02:34,420 And most of us like those things because 43 00:02:34,420 --> 00:02:35,620 they're very specific. 44 00:02:35,620 --> 00:02:39,250 People go into engineering and science very often because 45 00:02:39,250 --> 00:02:42,150 they don't like the uncertainty of a 46 00:02:42,150 --> 00:02:44,130 lot of other fields. 
47 00:02:44,130 --> 00:02:48,170 The problem is as soon as you go into probability theory, 48 00:02:48,170 --> 00:02:51,560 you're moving away from that safe region where everything 49 00:02:51,560 --> 00:02:55,260 is specified, and you're moving into a region where 50 00:02:55,260 --> 00:02:58,360 things, in fact, are not very well-specified and you have to 51 00:02:58,360 --> 00:02:59,900 be careful about it. 52 00:02:59,900 --> 00:03:02,670 OK, so first we're going to talk about probability in the 53 00:03:02,670 --> 00:03:06,150 real world and probability as a branch of mathematics. 54 00:03:06,150 --> 00:03:07,840 Then we're going to say what discrete 55 00:03:07,840 --> 00:03:10,200 stochastic processes are. 56 00:03:10,200 --> 00:03:14,850 Then we're going to talk just a very, very little bit about 57 00:03:14,850 --> 00:03:17,140 the processes we're going to study. 58 00:03:17,140 --> 00:03:21,430 If you want to see more of that, you have two 59 00:03:21,430 --> 00:03:22,530 chapters of the notes. 60 00:03:22,530 --> 00:03:23,720 You can look at them. 61 00:03:23,720 --> 00:03:26,260 You can look at the table of contents. 62 00:03:26,260 --> 00:03:30,910 And more than that, if you look at my website, you will 63 00:03:30,910 --> 00:03:33,490 see the notes for all the other chapters if you want to 64 00:03:33,490 --> 00:03:36,500 read ahead or if you want to really find out what kinds of 65 00:03:36,500 --> 00:03:38,660 things we're going to talk about and what kinds of things 66 00:03:38,660 --> 00:03:40,320 we're not going to talk about. 67 00:03:40,320 --> 00:03:42,180 Then we're going to talk about when, where, 68 00:03:42,180 --> 00:03:44,000 and how is this useful? 69 00:03:44,000 --> 00:03:47,385 The short answer to that is it's useful everywhere. 70 00:03:47,385 --> 00:03:49,930 But we'll have to see why that is. 71 00:03:49,930 --> 00:03:51,620 Then we're going to talk about the axioms 72 00:03:51,620 --> 00:03:53,150 of probability theory. 
73 00:03:53,150 --> 00:03:58,310 You cannot take any elementary course in probability, or even 74 00:03:58,310 --> 00:04:03,690 in statistics, without seeing the axioms of probability. 75 00:04:03,690 --> 00:04:06,740 And in almost all of those cases, and in almost all of 76 00:04:06,740 --> 00:04:10,350 the graduate courses I've seen, you see them, they 77 00:04:10,350 --> 00:04:14,340 disappear, and suddenly you're solving problems in whatever 78 00:04:14,340 --> 00:04:16,470 way you can, and the axioms have nothing 79 00:04:16,470 --> 00:04:18,040 to do with it anymore. 80 00:04:18,040 --> 00:04:21,390 So we're going to see that, in fact, the axioms do have 81 00:04:21,390 --> 00:04:23,680 something to do with this. 82 00:04:23,680 --> 00:04:29,940 Those of you who want to be real engineers and not 83 00:04:29,940 --> 00:04:31,990 mathematicians, you'll find this a little 84 00:04:31,990 --> 00:04:33,910 uncomfortable at times. 85 00:04:33,910 --> 00:04:37,050 We are going to be proving things. 86 00:04:37,050 --> 00:04:40,940 And you will have to get used to that. 87 00:04:40,940 --> 00:04:43,560 And I'll try to convince you of why it's important to be 88 00:04:43,560 --> 00:04:45,180 able to prove things. 89 00:04:45,180 --> 00:04:47,470 Then we're going on to a review of probability, 90 00:04:47,470 --> 00:04:48,930 independent events, 91 00:04:48,930 --> 00:04:51,850 experiments, and random variables. 92 00:04:51,850 --> 00:04:53,100 So that's what we'll do today. 93 00:04:56,250 --> 00:04:59,990 Incidentally, this course started about-- 94 00:04:59,990 --> 00:05:02,730 must've been 25 years ago or so. 95 00:05:02,730 --> 00:05:07,790 I started it because we had a huge number of students at MIT 96 00:05:07,790 --> 00:05:12,120 who had been interested in communication and control, and 97 00:05:12,120 --> 00:05:15,170 who were suddenly starting to get interested in networks. 
98 00:05:15,170 --> 00:05:17,780 And there were all sorts of queuing problems that they had 99 00:05:17,780 --> 00:05:19,490 to deal with every day. 100 00:05:19,490 --> 00:05:21,950 And they started to read about queuing theory. 101 00:05:21,950 --> 00:05:25,580 And it was the most disjointed, crazy theory in 102 00:05:25,580 --> 00:05:28,690 the world where there were 1,000 different kinds of 103 00:05:28,690 --> 00:05:30,720 queues and each one of them had to be 104 00:05:30,720 --> 00:05:32,960 treated in its own way. 105 00:05:32,960 --> 00:05:36,130 And we realized that stochastic processes was the 106 00:05:36,130 --> 00:05:38,620 right way to tie all of that together, so we 107 00:05:38,620 --> 00:05:39,620 started this course. 108 00:05:39,620 --> 00:05:44,090 And we made it mostly discrete so it would deal primarily 109 00:05:44,090 --> 00:05:46,330 with network type applications. 110 00:05:46,330 --> 00:05:51,120 As things have grown, it now deals with a whole lot more 111 00:05:51,120 --> 00:05:52,320 applications. 112 00:05:52,320 --> 00:05:56,170 And we'll see how that works later on. 113 00:05:56,170 --> 00:06:00,230 OK, how did probability get started in the real world? 114 00:06:00,230 --> 00:06:02,540 Well, there were games of chance that everybody was 115 00:06:02,540 --> 00:06:03,690 interested in. 116 00:06:03,690 --> 00:06:06,120 People really like to gamble. 117 00:06:06,120 --> 00:06:07,980 I don't know why. 118 00:06:07,980 --> 00:06:09,720 I don't like to gamble that much. 119 00:06:09,720 --> 00:06:11,740 I would rather be certain about things. 120 00:06:11,740 --> 00:06:14,330 But most people love to gamble. 121 00:06:14,330 --> 00:06:16,730 And most people have an intuitive sense of what 122 00:06:16,730 --> 00:06:18,690 probability is about. 
123 00:06:18,690 --> 00:06:22,790 I mean, eight-year-old kids, when they start to learn to 124 00:06:22,790 --> 00:06:24,750 play games of chance-- 125 00:06:24,750 --> 00:06:26,410 and there are all sorts of board games 126 00:06:26,410 --> 00:06:27,870 that involve chance. 127 00:06:27,870 --> 00:06:29,740 These kids, if they're bright-- 128 00:06:29,740 --> 00:06:33,780 and I'm sure you people fall into that category-- 129 00:06:33,780 --> 00:06:37,230 they immediately start to figure out what the odds are. 130 00:06:37,230 --> 00:06:40,390 I mean, how many of you have never thought about what the 131 00:06:40,390 --> 00:06:43,640 odds are in some gambling game? 132 00:06:43,640 --> 00:06:46,320 OK, that makes my point. 133 00:06:46,320 --> 00:06:50,630 So all of you understand this at an intuitive level. 134 00:06:50,630 --> 00:06:54,310 But what makes games of chance easier to deal with than all 135 00:06:54,310 --> 00:06:59,340 the other issues where we have uncertainty in life? 136 00:06:59,340 --> 00:07:03,340 Well, games of chance are inherently repeatable. 137 00:07:03,340 --> 00:07:07,100 You play a game of chance and you play many, many hands, or 138 00:07:07,100 --> 00:07:10,740 many, many throws, or many, many trials. 139 00:07:10,740 --> 00:07:14,470 And after many, many trials of what's essentially the same 140 00:07:14,470 --> 00:07:18,080 experiment, you start to get a sense of what relative 141 00:07:18,080 --> 00:07:18,960 frequencies are. 142 00:07:18,960 --> 00:07:22,210 You start to get a sense of what the odds are because of 143 00:07:22,210 --> 00:07:26,110 doing this repeatedly. 144 00:07:26,110 --> 00:07:30,500 So games of chance are easy to use probability on because 145 00:07:30,500 --> 00:07:31,890 they are repeatable. 146 00:07:31,890 --> 00:07:35,820 You have essentially the same thing going on each time, but 147 00:07:35,820 --> 00:07:38,090 each time there's a different answer. 
148 00:07:38,090 --> 00:07:41,190 You flip a coin and sometimes it comes up heads and 149 00:07:41,190 --> 00:07:43,630 sometimes it comes up tails. 150 00:07:43,630 --> 00:07:49,310 So in fact, we have to figure out how to deal with that fact 151 00:07:49,310 --> 00:07:50,650 that there is uncertainty there. 152 00:07:50,650 --> 00:07:54,350 I'll talk about that in just another minute. 153 00:07:54,350 --> 00:07:58,600 But anyway, most of life's decisions involve uncertainty. 154 00:07:58,600 --> 00:08:03,920 I mean, for all of you, when you go into a PhD program, you 155 00:08:03,920 --> 00:08:04,960 have two problems. 156 00:08:04,960 --> 00:08:06,930 Am I going to enjoy this? 157 00:08:06,930 --> 00:08:09,070 And you don't know whether you're going to enjoy it 158 00:08:09,070 --> 00:08:12,210 because not until you really get into it do you have a 159 00:08:12,210 --> 00:08:15,540 sense of whether this set of problems you're dealing with 160 00:08:15,540 --> 00:08:20,060 is something that you like to deal with. 161 00:08:20,060 --> 00:08:26,330 And the only way you can do that is to make guesses. 162 00:08:26,330 --> 00:08:28,100 You come up with likelihoods. 163 00:08:28,100 --> 00:08:29,400 There's some likelihood. 164 00:08:29,400 --> 00:08:34,039 There's a risk cost-benefit that you deal with. 165 00:08:34,039 --> 00:08:38,270 And in life, risk cost-benefits are always based 166 00:08:38,270 --> 00:08:41,650 on some sense of what the likelihood of something is. 167 00:08:41,650 --> 00:08:42,710 Now, what is a likelihood? 168 00:08:42,710 --> 00:08:44,360 A likelihood is just a probability. 169 00:08:44,360 --> 00:08:46,980 It's a synonym for probability. 170 00:08:46,980 --> 00:08:50,060 When you get into the mathematics of probability, 171 00:08:50,060 --> 00:08:52,950 likelihood has a special meaning to it. 
172 00:08:52,950 --> 00:08:57,080 But in the real world, likelihood is just a word you 173 00:08:57,080 --> 00:08:59,540 use when you don't want to let people know that you're really 174 00:08:59,540 --> 00:09:01,790 talking about probabilities. 175 00:09:01,790 --> 00:09:07,710 OK, so that's where we are. 176 00:09:07,710 --> 00:09:10,942 But on the last slide, you saw the word "essentially, 177 00:09:10,942 --> 00:09:14,400 essentially, essentially." If you read the notes, and I hope 178 00:09:14,400 --> 00:09:18,060 you read the notes, because I spent the last three years 179 00:09:18,060 --> 00:09:20,220 doing virtually nothing but trying to 180 00:09:20,220 --> 00:09:22,280 make these notes clear. 181 00:09:22,280 --> 00:09:25,800 I would appreciate it if any of you, with whatever 182 00:09:25,800 --> 00:09:29,620 background you have, when you read these notes, if you read 183 00:09:29,620 --> 00:09:32,970 them twice and you still don't understand something, tell me 184 00:09:32,970 --> 00:09:34,300 you don't understand it. 185 00:09:34,300 --> 00:09:36,705 If you know why you don't understand it, I'd appreciate 186 00:09:36,705 --> 00:09:38,420 knowing that. 187 00:09:38,420 --> 00:09:42,490 But just saying "help" is enough to let me know that I 188 00:09:42,490 --> 00:09:45,730 still haven't made something as clear as it should be. 189 00:09:45,730 --> 00:09:49,240 At least as clear as it should be for some of the people who 190 00:09:49,240 --> 00:09:52,590 I think should be taking this course. 191 00:09:52,590 --> 00:09:56,000 One of the problems we have at MIT now, and at every graduate 192 00:09:56,000 --> 00:10:00,980 school in the world I think, is that human knowledge has 193 00:10:00,980 --> 00:10:05,390 changed and grown so much in the last 50 years. 
194 00:10:05,390 --> 00:10:09,010 So when you study something general, like probability, 195 00:10:09,010 --> 00:10:12,600 there's just an enormous mass of stuff you 196 00:10:12,600 --> 00:10:14,020 have to deal with. 197 00:10:14,020 --> 00:10:16,360 And because of that, when you try to write notes for a 198 00:10:16,360 --> 00:10:18,290 course, you don't know what anybody's 199 00:10:18,290 --> 00:10:20,360 background is anymore. 200 00:10:20,360 --> 00:10:24,580 I mean, it used to be that when you saw a graduate course 201 00:10:24,580 --> 00:10:27,320 at MIT, people would know what limits were. 202 00:10:27,320 --> 00:10:30,830 People would know what basic mathematics is all about. 203 00:10:30,830 --> 00:10:33,000 They would know what continuity means. 204 00:10:33,000 --> 00:10:35,110 They would know some linear algebra. 205 00:10:35,110 --> 00:10:37,440 They would know all sorts of different things. 206 00:10:37,440 --> 00:10:39,800 Many people still do know those things. 207 00:10:39,800 --> 00:10:44,830 Many other people have studied all sorts of other fascinating 208 00:10:44,830 --> 00:10:46,110 and very interesting things. 209 00:10:46,110 --> 00:10:48,440 They're just as smart, but they have a very different 210 00:10:48,440 --> 00:10:49,320 background. 211 00:10:49,320 --> 00:10:52,650 So if your background is different, it's not your fault 212 00:10:52,650 --> 00:10:55,090 that you don't have the kind of background that makes 213 00:10:55,090 --> 00:10:56,680 probability easy. 214 00:10:56,680 --> 00:10:58,250 Just yell. 215 00:10:58,250 --> 00:10:59,560 Or yell in class. 216 00:10:59,560 --> 00:11:01,440 Please ask questions. 217 00:11:01,440 --> 00:11:04,200 The fact that we're videotaping this makes it far 218 00:11:04,200 --> 00:11:08,340 more interesting for anybody who's using OpenCourseWare to 219 00:11:08,340 --> 00:11:12,710 see some kinds of questions going on, so I very much 220 00:11:12,710 --> 00:11:15,760 encourage that. 
221 00:11:15,760 --> 00:11:19,690 I'm fairly old at this point, and my memory is getting shot. 222 00:11:19,690 --> 00:11:23,500 So if you ask a question and I don't remember what it's all 223 00:11:23,500 --> 00:11:26,590 about, just be patient with it. 224 00:11:26,590 --> 00:11:28,870 I will come back the next time, or I'll send you an 225 00:11:28,870 --> 00:11:31,500 email straightening it out. 226 00:11:31,500 --> 00:11:35,640 But I will often get confused doing something, and that's 227 00:11:35,640 --> 00:11:36,850 just because of my age. 228 00:11:36,850 --> 00:11:40,090 It's what we call "senior moments." It's not that I 229 00:11:40,090 --> 00:11:42,110 don't understand the subject. 230 00:11:42,110 --> 00:11:43,630 I think I understand it, I just 231 00:11:43,630 --> 00:11:44,936 don't remember it anymore. 232 00:11:49,770 --> 00:11:52,470 Important point about probability. 233 00:11:52,470 --> 00:11:54,810 Think about flipping a coin. 234 00:11:54,810 --> 00:11:56,920 I'm going to talk about flipping coins a 235 00:11:56,920 --> 00:11:58,500 great deal this term. 236 00:11:58,500 --> 00:12:02,300 It's an absolutely trivial topic, but it's important 237 00:12:02,300 --> 00:12:07,300 because when you understand deep things about a large 238 00:12:07,300 --> 00:12:10,530 subject, the best way to understand them is to 239 00:12:10,530 --> 00:12:13,420 understand them in terms of the most trivial examples you 240 00:12:13,420 --> 00:12:14,880 can think of. 241 00:12:14,880 --> 00:12:18,770 Now, when you flip a coin, the outcome-- heads or tails-- 242 00:12:18,770 --> 00:12:24,020 really depends on the initial velocity, the orientation of 243 00:12:24,020 --> 00:12:27,310 the person flipping it, or the machine flipping it, the coin 244 00:12:27,310 --> 00:12:29,960 surfaces, the ground surface. 
245 00:12:29,960 --> 00:12:33,480 And after you put all of those things into a careful 246 00:12:33,480 --> 00:12:36,800 equation, you will know whether the coin is going to 247 00:12:36,800 --> 00:12:39,080 come up heads or tails. 248 00:12:39,080 --> 00:12:41,860 I don't think quantum theory has anything to do with 249 00:12:41,860 --> 00:12:43,620 something as big as a coin. 250 00:12:43,620 --> 00:12:45,210 I might be wrong. 251 00:12:45,210 --> 00:12:46,630 I've never looked into it. 252 00:12:46,630 --> 00:12:48,200 And frankly, I don't care. 253 00:12:48,200 --> 00:12:50,870 Because the point that I'm trying to make is that 254 00:12:50,870 --> 00:12:53,410 flipping a coin, and many of the things that you view as 255 00:12:53,410 --> 00:12:57,170 random, when you look at them in a slightly different way, 256 00:12:57,170 --> 00:12:59,020 are not random. 257 00:12:59,020 --> 00:13:01,390 There's a big field in communication called data 258 00:13:01,390 --> 00:13:02,810 compression. 259 00:13:02,810 --> 00:13:07,270 And data compression is based on random models for the data, 260 00:13:07,270 --> 00:13:09,480 which is going to be compressed. 261 00:13:09,480 --> 00:13:13,070 Now, what I'm saying here today is by no means random. 262 00:13:13,070 --> 00:13:14,900 Or maybe it is partly random. 263 00:13:14,900 --> 00:13:17,910 Maybe it's coming out of a random mind, I don't know. 264 00:13:17,910 --> 00:13:23,725 But all of the data we try to compress to the people who 265 00:13:23,725 --> 00:13:27,260 have created that data, it's not random at all. 266 00:13:27,260 --> 00:13:30,710 If you study the data carefully enough-- 267 00:13:30,710 --> 00:13:33,510 I mean, code breakers and people like that are extremely 268 00:13:33,510 --> 00:13:37,550 good at sorting out what the meaning is in something, which 269 00:13:37,550 --> 00:13:41,230 cannot be done by data compression techniques at all. 
270 00:13:41,230 --> 00:13:45,280 So the point is, when you're doing data compression, you 271 00:13:45,280 --> 00:13:48,010 model the data as being random and having certain 272 00:13:48,010 --> 00:13:49,570 characteristics. 273 00:13:49,570 --> 00:13:51,800 But it really isn't. 274 00:13:51,800 --> 00:13:54,130 So the model is no good. 275 00:13:54,130 --> 00:13:56,550 When you get to more important questions-- 276 00:13:56,550 --> 00:13:59,950 well, data compression is an important question. 277 00:13:59,950 --> 00:14:02,530 When you ask, what's the probability of another 278 00:14:02,530 --> 00:14:05,790 catastrophic oil spill in the next year? 279 00:14:05,790 --> 00:14:08,040 Or you ask the question, what's the probability that 280 00:14:08,040 --> 00:14:10,140 Google stock will double in five years? 281 00:14:10,140 --> 00:14:12,580 That's less important, but it's still 282 00:14:12,580 --> 00:14:15,630 important to many people. 283 00:14:15,630 --> 00:14:16,880 How do you model that? 284 00:14:19,410 --> 00:14:22,300 Understanding probability theory, understanding all the 285 00:14:22,300 --> 00:14:26,500 mathematics of this is not going to help you model this. 286 00:14:26,500 --> 00:14:29,880 Now, why do I make such a big deal about this? 287 00:14:29,880 --> 00:14:33,910 Well, there have been a number of times in the last 10 or 15 288 00:14:33,910 --> 00:14:38,480 years when the whole financial system of the world has almost 289 00:14:38,480 --> 00:14:43,200 been destroyed by very, very bright PhDs. 290 00:14:43,200 --> 00:14:47,680 Many of them coming from electrical engineering. 291 00:14:47,680 --> 00:14:50,010 Most of whom are really superb at 292 00:14:50,010 --> 00:14:52,580 understanding probability theory. 293 00:14:52,580 --> 00:14:55,490 And they have used their probability theory to analyze 294 00:14:55,490 --> 00:14:58,640 risk and other things in investments. 295 00:14:58,640 --> 00:15:00,630 And what has happened? 
296 00:15:00,630 --> 00:15:03,100 They do very well for a while. 297 00:15:03,100 --> 00:15:06,050 Suddenly they do so well that they think they can borrow all 298 00:15:06,050 --> 00:15:08,750 sorts of money and risk other people's money as 299 00:15:08,750 --> 00:15:09,610 well as their own. 300 00:15:09,610 --> 00:15:13,080 In fact, they try to do that right from the beginning. 301 00:15:13,080 --> 00:15:16,030 And then suddenly, the whole thing collapses. 302 00:15:16,030 --> 00:15:18,240 Because their models are no damn good. 303 00:15:18,240 --> 00:15:21,220 There's nothing wrong with their mathematics, it's that 304 00:15:21,220 --> 00:15:22,770 their models are no good. 305 00:15:22,770 --> 00:15:26,710 So please, especially if you're going to do something 306 00:15:26,710 --> 00:15:28,940 important in your lives-- 307 00:15:28,940 --> 00:15:32,410 if you're just going to write papers in engineering 308 00:15:32,410 --> 00:15:36,000 journals, maybe it's all right. 309 00:15:36,000 --> 00:15:39,940 But if you're going to make decisions about things, please 310 00:15:39,940 --> 00:15:43,060 spend some time thinking about the probability models that 311 00:15:43,060 --> 00:15:47,890 you come up with because this is vitally important. 312 00:15:47,890 --> 00:15:49,110 OK, what's probability? 313 00:15:49,110 --> 00:15:50,630 It's a branch of mathematics. 314 00:15:50,630 --> 00:15:53,870 Now we're into something that's more familiar, 315 00:15:53,870 --> 00:15:57,420 something that's simpler, something we can deal with. 316 00:16:00,070 --> 00:16:02,340 You might be uncomfortable with what 317 00:16:02,340 --> 00:16:04,280 probability really means. 318 00:16:04,280 --> 00:16:08,660 And all probability books, all stochastic process books are 319 00:16:08,660 --> 00:16:10,080 uncomfortable with this. 320 00:16:10,080 --> 00:16:14,250 Feller is the best book in probability 321 00:16:14,250 --> 00:16:17,260 there's ever been written. 
322 00:16:17,260 --> 00:16:21,090 Any question you have, he probably has the answer to it. 323 00:16:21,090 --> 00:16:26,960 When you look at what he says about real-world probability, 324 00:16:26,960 --> 00:16:31,150 the modeling issues, he's an extraordinarily bright guy. 325 00:16:31,150 --> 00:16:33,900 And he spent some time thinking about this. 326 00:16:33,900 --> 00:16:35,360 But you read it and you realize 327 00:16:35,360 --> 00:16:38,630 that it's pure nonsense. 328 00:16:38,630 --> 00:16:40,710 So please, take my word for it. 329 00:16:43,590 --> 00:16:46,780 Don't assume that real-world probability is something 330 00:16:46,780 --> 00:16:49,070 you're going to learn about from other people because you 331 00:16:49,070 --> 00:16:51,120 can't trust what any of them say. 332 00:16:51,120 --> 00:16:53,800 It's something you have to think through for yourselves, 333 00:16:53,800 --> 00:16:56,410 and we'll talk more about this as we go. 334 00:16:56,410 --> 00:16:59,200 But now, when we get into mathematics, that's fine. 335 00:16:59,200 --> 00:17:00,440 We just create models. 336 00:17:00,440 --> 00:17:03,090 And once we have the model, we just use it. 337 00:17:03,090 --> 00:17:06,680 We have standard models for all sorts of different 338 00:17:06,680 --> 00:17:08,260 standard problems. 339 00:17:08,260 --> 00:17:11,890 When you talk about coin tossing, what almost everyone 340 00:17:11,890 --> 00:17:16,339 means is not this crazy thing I was just talking about where 341 00:17:16,339 --> 00:17:20,800 you have an initial angular momentum when you flip a coin 342 00:17:20,800 --> 00:17:22,300 and all of that stuff. 343 00:17:22,300 --> 00:17:25,990 It's a purely mathematical model where a coin is flipped 344 00:17:25,990 --> 00:17:28,440 and with probability one half it comes up heads and with 345 00:17:28,440 --> 00:17:32,070 probability one half it comes up tails. 
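As a minimal sketch of this purely mathematical coin model (the function name, the fixed seed, and the sample sizes are illustrative assumptions, not something from the lecture), the following Python simulates independent tosses that each come up heads with probability one half and reports the relative frequency of heads. Under these assumptions the frequency should tend to settle near one half as the number of tosses grows, which is the sense in which repeated trials give you a feel for the odds.

```python
import random


def relative_frequency_of_heads(num_tosses, seed=0):
    """Simulate independent fair coin tosses and return the fraction of heads.

    This is the idealized model only: each toss is heads with probability
    one half, independently of all other tosses. It says nothing about the
    physics of a real coin.
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated exactly
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses


if __name__ == "__main__":
    # Print the relative frequency for a few (arbitrary) numbers of tosses.
    for n in (10, 1_000, 100_000):
        print(n, relative_frequency_of_heads(n))
```

The simulation only illustrates the model; whether a real coin, or any other real-world situation, is well described by such a model is a separate question that the code cannot answer.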
346 00:17:32,070 --> 00:17:36,050 OK, students are given a well-specified model, and they 347 00:17:36,050 --> 00:17:37,840 calculate various things. 348 00:17:37,840 --> 00:17:40,370 This is in mathematical probability. 349 00:17:40,370 --> 00:17:43,710 Heads and tails are equiprobable in that system. 350 00:17:43,710 --> 00:17:46,510 Subsequent tosses are independent. 351 00:17:46,510 --> 00:17:49,630 Here's a little bit of cynicism. 352 00:17:49,630 --> 00:17:52,700 I apologize for insulting you people with it. 353 00:17:52,700 --> 00:17:57,430 I apologize to any faculty member who later reads this. 354 00:17:57,430 --> 00:18:01,300 And I particularly apologize to businessmen and government 355 00:18:01,300 --> 00:18:03,070 people who might read it. 356 00:18:03,070 --> 00:18:06,840 Students compute, professors write papers, business and 357 00:18:06,840 --> 00:18:10,290 government leaders obtain questionable models and data 358 00:18:10,290 --> 00:18:12,540 on which they can blame failures. 359 00:18:12,540 --> 00:18:16,070 Most cynical towards business leaders because business 360 00:18:16,070 --> 00:18:19,220 leaders often hire consultants. 361 00:18:19,220 --> 00:18:23,300 Not so much to learn what to do, but so they have excuses 362 00:18:23,300 --> 00:18:26,430 when what they do doesn't work out right. 363 00:18:26,430 --> 00:18:30,220 When I say the students compute, what I mean is this: 364 00:18:30,220 --> 00:18:34,030 in almost all the courses you've taken up until now-- 365 00:18:34,030 --> 00:18:36,390 and in this course also-- 366 00:18:36,390 --> 00:18:38,220 what you're going to be doing is 367 00:18:38,220 --> 00:18:39,820 solving well-posed problems. 368 00:18:39,820 --> 00:18:43,730 You solve well-posed exercises because that's a good way to 369 00:18:43,730 --> 00:18:46,340 understand what the mathematics of 370 00:18:46,340 --> 00:18:48,410 the subject is about. 371 00:18:48,410 --> 00:18:51,780 Don't think that that's the only part of it. 
372 00:18:51,780 --> 00:18:54,690 If that's the only thing you're doing, you might as 373 00:18:54,690 --> 00:18:55,830 well not waste your time. 374 00:18:55,830 --> 00:18:57,600 You might as well do something else. 375 00:18:57,600 --> 00:19:00,730 You might as well go out and shovel snow today instead of 376 00:19:00,730 --> 00:19:02,860 trying to learn about probability theory. 377 00:19:02,860 --> 00:19:06,960 It's more pleasant to learn about probability theory. 378 00:19:06,960 --> 00:19:11,010 OK, the use of probability models has two major 379 00:19:11,010 --> 00:19:13,300 problems with it. 380 00:19:13,300 --> 00:19:16,400 The first problem is, how do you make a model for a 381 00:19:16,400 --> 00:19:18,600 real-world problem? 382 00:19:18,600 --> 00:19:23,930 And a partial answer is, learn about estimation and decisions 383 00:19:23,930 --> 00:19:26,150 in the context of standard models. 384 00:19:26,150 --> 00:19:31,870 In other words, decisions and estimation inside a completely 385 00:19:31,870 --> 00:19:34,880 mathematical framework. 386 00:19:34,880 --> 00:19:36,480 Then you learn a great deal about the 387 00:19:36,480 --> 00:19:39,140 real-world problem itself. 388 00:19:39,140 --> 00:19:43,880 Not about the mathematics of it, but about how you actually 389 00:19:43,880 --> 00:19:46,270 understand what's going on. 390 00:19:46,270 --> 00:19:49,740 If you talk to somebody who is a superb 391 00:19:49,740 --> 00:19:53,030 architect in any field-- 392 00:19:53,030 --> 00:19:57,420 networks, computer systems, control systems, anything-- 393 00:19:57,420 --> 00:19:59,500 what are you going to find? 394 00:19:59,500 --> 00:20:03,630 You're not going to find huge, involved sets of equations 395 00:20:03,630 --> 00:20:06,700 that they're going to use to explain something to you. 
396 00:20:06,700 --> 00:20:09,780 They're going to pick at-- if they're any good, they're going 397 00:20:09,780 --> 00:20:12,480 to take this big problem, and they're going to take your 398 00:20:12,480 --> 00:20:14,560 issue with this big problem. 399 00:20:14,560 --> 00:20:17,700 And they're going to find the one or two really important 400 00:20:17,700 --> 00:20:21,790 things that tell you something that you have to know. 401 00:20:21,790 --> 00:20:24,350 And that's what you want to get out of this course. 402 00:20:24,350 --> 00:20:28,150 You want to get the ability to take all of the chaff, put it 403 00:20:28,150 --> 00:20:31,920 all together, and be able to say one or two important 404 00:20:31,920 --> 00:20:35,220 things which are really necessary. 405 00:20:35,220 --> 00:20:37,670 That's where you're going to. 406 00:20:37,670 --> 00:20:41,300 Before you get there, you'll take low-level jobs in various 407 00:20:41,300 --> 00:20:43,570 companies and you'll compute a lot of things. 408 00:20:43,570 --> 00:20:45,310 You'll simulate a lot of things. 409 00:20:45,310 --> 00:20:47,540 You'll deal with a lot of detail. 410 00:20:47,540 --> 00:20:50,170 Eventually, you're going to get to the point where you've 411 00:20:50,170 --> 00:20:51,920 got to make major decisions. 412 00:20:51,920 --> 00:20:53,800 And you want to be ready for it. 413 00:20:53,800 --> 00:20:55,580 OK, that's enough philosophy. 414 00:20:55,580 --> 00:20:58,830 I will try to give no more philosophy today, except when 415 00:20:58,830 --> 00:21:00,080 I get pushed into it. 416 00:21:09,776 --> 00:21:15,930 OK, one of the problems in this problem of finding a good 417 00:21:15,930 --> 00:21:18,880 model is that no model is perfect. 418 00:21:18,880 --> 00:21:22,130 Namely, what happens is you keep finding more and more 419 00:21:22,130 --> 00:21:25,050 complicated models, which deal with more 420 00:21:25,050 --> 00:21:26,990 and more of the issues. 
421 00:21:26,990 --> 00:21:31,100 And as you deal with them, things get more complicated. 422 00:21:31,100 --> 00:21:34,380 You're more down in the level of details and 423 00:21:34,380 --> 00:21:35,640 you're finding out less. 424 00:21:35,640 --> 00:21:39,850 So you want to find some sort of match between a model that 425 00:21:39,850 --> 00:21:42,880 tells you something and a model which is complicated 426 00:21:42,880 --> 00:21:45,180 enough to deal with the issues. 427 00:21:45,180 --> 00:21:49,740 There's a beautiful quote by Alfred North Whitehead. 428 00:21:49,740 --> 00:21:51,960 I don't know whether you've ever heard of Whitehead. 429 00:21:51,960 --> 00:21:55,160 You've probably heard of Bertrand Russell, who was both 430 00:21:55,160 --> 00:22:01,000 a great logician and a great philosopher, and had a lot to 431 00:22:01,000 --> 00:22:04,590 do with the origins of set theory. 432 00:22:04,590 --> 00:22:08,570 Whitehead and Russell together wrote this massive 433 00:22:08,570 --> 00:22:14,160 book around the turn of the last century between the 1800s 434 00:22:14,160 --> 00:22:18,630 and the 1900s called Principia Mathematica where they try to 435 00:22:18,630 --> 00:22:22,220 resolve all of the paradoxes which were coming up in 436 00:22:22,220 --> 00:22:23,420 mathematics. 437 00:22:23,420 --> 00:22:27,920 And Whitehead's general philosophical comment was, 438 00:22:27,920 --> 00:22:33,580 "Seek simplicity and distrust it." 439 00:22:33,580 --> 00:22:37,025 Now, every time I look at that, I say, why in hell 440 00:22:37,025 --> 00:22:41,640 didn't he say, seek simplicity and question it? 441 00:22:41,640 --> 00:22:43,850 I mean, you all hear about questioning authority, of 442 00:22:43,850 --> 00:22:46,520 course, and that's important to do. 443 00:22:46,520 --> 00:22:50,310 Why when you find a simple model for something should you 444 00:22:50,310 --> 00:22:52,420 distrust it? 445 00:22:52,420 --> 00:22:55,670 Well, the reason is psychological.
446 00:22:55,670 --> 00:22:59,660 If you find a simple model for something and you question it, 447 00:22:59,660 --> 00:23:03,260 you have an enormous psychological bias towards not 448 00:23:03,260 --> 00:23:04,550 giving up the simple model. 449 00:23:04,550 --> 00:23:07,340 You want to keep that simple model. 450 00:23:07,340 --> 00:23:11,270 And therefore, it takes an enormous amount of evidence 451 00:23:11,270 --> 00:23:13,770 before you're going to give something up. 452 00:23:13,770 --> 00:23:15,530 Whitehead said something more than that. 453 00:23:15,530 --> 00:23:19,400 He said, "Seek simplicity and distrust it." 454 00:23:19,400 --> 00:23:21,940 Now, why do I talk about the philosophy of science when 455 00:23:21,940 --> 00:23:24,110 we're trying to learn about probability theory? 456 00:23:24,110 --> 00:23:27,480 Well, probability theory is a mathematical theory. 457 00:23:27,480 --> 00:23:29,960 It's the basis for a great deal of science. 458 00:23:29,960 --> 00:23:35,050 And it's the place where modeling is most difficult. 459 00:23:35,050 --> 00:23:37,600 Scientific questions in most areas, if there's no 460 00:23:37,600 --> 00:23:41,140 probability or uncertainty involved, you just do an 461 00:23:41,140 --> 00:23:44,170 experiment that tells you the answer. 462 00:23:44,170 --> 00:23:46,330 You might not do it carefully enough and then 10 other 463 00:23:46,330 --> 00:23:47,380 people do it. 464 00:23:47,380 --> 00:23:49,670 And finally, everybody agrees, this is the 465 00:23:49,670 --> 00:23:51,140 answer to that problem. 466 00:23:51,140 --> 00:23:53,930 In probability, it ain't that simple. 467 00:23:53,930 --> 00:23:57,720 And that's why one has to focus on this a little more 468 00:23:57,720 --> 00:23:59,740 than usual. 469 00:23:59,740 --> 00:24:04,730 The second problem is, how do you make a probability model 470 00:24:04,730 --> 00:24:06,430 that has no hidden paradoxes in it?
471 00:24:06,430 --> 00:24:11,130 In other words, when you make a mathematical model, how do 472 00:24:11,130 --> 00:24:14,930 you make sure that it really is well-posed? 473 00:24:14,930 --> 00:24:17,490 How do you make sure that when you solve a problem in that 474 00:24:17,490 --> 00:24:20,400 mathematical model that you don't come to something that 475 00:24:20,400 --> 00:24:22,820 doesn't make any sense? 476 00:24:22,820 --> 00:24:27,060 Well, everyone's answer to that is you use Kolmogorov's 477 00:24:27,060 --> 00:24:29,030 axioms of probability. 478 00:24:29,030 --> 00:24:33,190 Because back in 1933, Kolmogorov published this 479 00:24:33,190 --> 00:24:35,410 little thin book. 480 00:24:35,410 --> 00:24:37,370 Those of you who are interested in the history of 481 00:24:37,370 --> 00:24:40,620 science probably ought to read it. 482 00:24:40,620 --> 00:24:43,530 You will find you only understand the first five 483 00:24:43,530 --> 00:24:46,330 pages the first time you look at it. 484 00:24:46,330 --> 00:24:49,170 But it's worthwhile doing that because here was one of the 485 00:24:49,170 --> 00:24:53,930 truly great minds of the early 20th century. 486 00:24:53,930 --> 00:24:58,260 And he took everything he knew about probability, which was a 487 00:24:58,260 --> 00:25:01,630 whole lot more than I know certainly, and a whole lot 488 00:25:01,630 --> 00:25:04,740 more than anybody else at the time knew, and he collapsed it 489 00:25:04,740 --> 00:25:07,100 into these very simple axioms. 490 00:25:07,100 --> 00:25:12,330 And he said, if you obey these axioms in a model that you use 491 00:25:12,330 --> 00:25:16,580 in probability, those axioms will keep you out of any 492 00:25:16,580 --> 00:25:18,270 paradoxes at all. 493 00:25:18,270 --> 00:25:21,690 And then, he showed why that was and he showed how the 494 00:25:21,690 --> 00:25:23,845 axioms could be used and so forth. 
495 00:25:23,845 --> 00:25:25,990 So we're going to spend a little bit of time talking 496 00:25:25,990 --> 00:25:27,790 about them today. 497 00:25:27,790 --> 00:25:31,850 OK, quickly, what is a discrete stochastic process? 498 00:25:31,850 --> 00:25:34,180 Well, a stochastic process-- 499 00:25:34,180 --> 00:25:36,100 you've been talking about probability. 500 00:25:36,100 --> 00:25:41,690 And you might be getting the idea that I'm just using the 501 00:25:41,690 --> 00:25:45,670 name "stochastic processes" as a foil for talking about what 502 00:25:45,670 --> 00:25:48,210 I really love, which is the probability. 503 00:25:48,210 --> 00:25:51,210 And there's a certain amount of truth to that. 504 00:25:51,210 --> 00:25:56,360 But stochastic processes are special types of probability 505 00:25:56,360 --> 00:26:02,680 models where the sample points represent functions in time. 506 00:26:02,680 --> 00:26:05,420 In other words, when we're dealing with a probability 507 00:26:05,420 --> 00:26:07,270 model, the basis of a probability 508 00:26:07,270 --> 00:26:08,750 model is a sample space. 509 00:26:08,750 --> 00:26:12,220 It's the set of possible things that might happen. 510 00:26:12,220 --> 00:26:15,550 And you can reduce that to the sample points, which are the 511 00:26:15,550 --> 00:26:20,010 indivisible, little, tiny crumbs of what happens when 512 00:26:20,010 --> 00:26:20,790 you do an experiment. 513 00:26:20,790 --> 00:26:24,080 It's the thing which specifies everything that can be 514 00:26:24,080 --> 00:26:29,040 specified in that model of that experiment. 515 00:26:29,040 --> 00:26:31,670 OK, when you get to a stochastic process, what 516 00:26:31,670 --> 00:26:35,600 you're doing is you're looking at a situation in which these 517 00:26:35,600 --> 00:26:40,560 sample points, the solutions to what happens is, in fact, a 518 00:26:40,560 --> 00:26:44,550 whole sequence of random variables in time. 
519 00:26:49,475 --> 00:26:53,410 And what that means is instead of looking at just a vector of 520 00:26:53,410 --> 00:26:56,070 random variables, you're going to be looking at a whole 521 00:26:56,070 --> 00:26:59,040 sequence of random variables. 522 00:26:59,040 --> 00:27:03,890 Now, what is different about a vector of a very large number 523 00:27:03,890 --> 00:27:06,580 of random variables and an infinite 524 00:27:06,580 --> 00:27:09,280 sequence of random variables? 525 00:27:09,280 --> 00:27:14,150 Well, from an engineering standpoint, not very much. 526 00:27:14,150 --> 00:27:18,060 I mean, there's nothing you can do to actually look at an 527 00:27:18,060 --> 00:27:22,160 infinite sequence of random variables. 528 00:27:22,160 --> 00:27:26,380 If you start out at the Big Bang and you carry it on to 529 00:27:26,380 --> 00:27:29,220 what you might imagine is the time when the sun explodes or 530 00:27:29,220 --> 00:27:32,380 something, that's a finite amount of time. 531 00:27:32,380 --> 00:27:37,330 And if you imagine how fast you can observe things, 532 00:27:37,330 --> 00:27:39,530 there's a finite number of random 533 00:27:39,530 --> 00:27:41,280 variables you might observe. 534 00:27:41,280 --> 00:27:44,670 All these models we're going to be dealing with get outside 535 00:27:44,670 --> 00:27:47,960 of that realm, and they deal with something that starts 536 00:27:47,960 --> 00:27:50,200 infinitely far in the past and goes 537 00:27:50,200 --> 00:27:52,760 infinitely far in the future. 538 00:27:52,760 --> 00:27:55,860 It doesn't make much sense, does it? 539 00:27:55,860 --> 00:27:58,610 But then look at the alternative. 540 00:27:58,610 --> 00:28:01,800 You built a device which you're going to sell to 541 00:28:01,800 --> 00:28:04,840 people, and which they're going to use. 542 00:28:04,840 --> 00:28:06,660 And you know they're only going to use it three or four 543 00:28:06,660 --> 00:28:09,970 years until something better comes along.
544 00:28:09,970 --> 00:28:13,410 But do you want to build in to everything you're doing the 545 00:28:13,410 --> 00:28:15,790 idea that it's going to be obsolete in three years? 546 00:28:15,790 --> 00:28:17,390 No. 547 00:28:17,390 --> 00:28:21,510 You want to design this thing so, in fact, it will work for 548 00:28:21,510 --> 00:28:24,300 an essentially arbitrary amount of time. 549 00:28:24,300 --> 00:28:27,160 And therefore, you make a mathematical model of it. 550 00:28:27,160 --> 00:28:30,860 You look at what happens over an infinite span of time. 551 00:28:30,860 --> 00:28:34,330 So whenever we get into mathematics, we always go to 552 00:28:34,330 --> 00:28:36,290 an infinite number of things rather than a 553 00:28:36,290 --> 00:28:39,130 finite number of things. 554 00:28:39,130 --> 00:28:42,420 Now, discrete stochastic processes are those where the 555 00:28:42,420 --> 00:28:45,560 random variables are discrete in time. 556 00:28:45,560 --> 00:28:47,820 Namely, a finite number of possible outcomes 557 00:28:47,820 --> 00:28:49,880 from each of them. 558 00:28:49,880 --> 00:28:55,340 Or the set of possible sample values is discrete. 559 00:28:55,340 --> 00:28:57,000 What does that mean? 560 00:28:57,000 --> 00:29:00,090 It doesn't mean a whole lot when you really start asking 561 00:29:00,090 --> 00:29:02,680 detailed questions about this. 562 00:29:02,680 --> 00:29:07,420 What it means is, I want to talk about a particular kind 563 00:29:07,420 --> 00:29:08,760 of stochastic processes. 564 00:29:11,290 --> 00:29:15,740 And it's a class of processes which will be more than we can 565 00:29:15,740 --> 00:29:17,600 deal with in one term. 566 00:29:17,600 --> 00:29:20,900 And I want to exclude certain processes, like noise 567 00:29:20,900 --> 00:29:24,890 processes, because we don't have time to do both of them. 568 00:29:24,890 --> 00:29:27,980 So don't worry too much about exactly what a discrete 569 00:29:27,980 --> 00:29:29,230 stochastic process is. 
570 00:29:31,720 --> 00:29:34,930 It's whatever we want to call it when we deal with it. 571 00:29:38,220 --> 00:29:39,630 Oops. 572 00:29:39,630 --> 00:29:43,252 Oh, where am I? 573 00:29:43,252 --> 00:29:47,180 Oh, I wanted to talk about the different processes we're 574 00:29:47,180 --> 00:29:49,540 going to study. 575 00:29:49,540 --> 00:29:51,310 The first kind of process is what we 576 00:29:51,310 --> 00:29:54,310 call a counting process. 577 00:29:54,310 --> 00:29:57,230 The sample points in the process-- remember, a sample 578 00:29:57,230 --> 00:30:00,660 point specifies everything about an experiment. 579 00:30:00,660 --> 00:30:03,660 It tells you every little detail. 580 00:30:03,660 --> 00:30:07,010 And the sample points here in counting processes are 581 00:30:07,010 --> 00:30:08,980 sequences of arrivals. 582 00:30:08,980 --> 00:30:13,140 This is a very useful idea in dealing with queuing systems 583 00:30:13,140 --> 00:30:15,260 because queuing systems have arrivals. 584 00:30:15,260 --> 00:30:16,670 They have departures. 585 00:30:16,670 --> 00:30:20,120 They have rules for how the arrivals get processed before 586 00:30:20,120 --> 00:30:22,300 they get spit out. 587 00:30:22,300 --> 00:30:26,220 And a big part of that is studying first the arrival 588 00:30:26,220 --> 00:30:29,420 process, then we study the departure process. 589 00:30:29,420 --> 00:30:32,330 We study how to put them together. 590 00:30:32,330 --> 00:30:35,730 And when we get to chapter 2 of the notes, we're going to 591 00:30:35,730 --> 00:30:39,400 be studying Poisson processes, which are in a sense, the 592 00:30:39,400 --> 00:30:43,020 perfect discrete stochastic process. 593 00:30:43,020 --> 00:30:45,660 It's like coin tossing in probability. 594 00:30:45,660 --> 00:30:47,930 Everything that might be true with a 595 00:30:47,930 --> 00:30:50,530 Poisson process is true. 
596 00:30:50,530 --> 00:30:52,830 The only things that aren't true are the things that 597 00:30:52,830 --> 00:30:55,130 obviously can't be true. 598 00:30:55,130 --> 00:30:57,280 And we'll find out why that is and how that 599 00:30:57,280 --> 00:30:59,150 works a little later. 600 00:30:59,150 --> 00:31:00,750 We're then going to study renewal 601 00:31:00,750 --> 00:31:03,950 processes in chapter 4. 602 00:31:03,950 --> 00:31:07,510 We're going to put Markov chains in between. 603 00:31:07,510 --> 00:31:09,800 And you'll see why when we do it. 604 00:31:09,800 --> 00:31:13,960 And renewal processes are a more complicated kind of thing 605 00:31:13,960 --> 00:31:15,810 than Poisson processes. 606 00:31:15,810 --> 00:31:18,440 And there's no point confusing you at this point saying what 607 00:31:18,440 --> 00:31:20,800 the difference is, so I won't. 608 00:31:20,800 --> 00:31:25,690 Markov processes are processes. 609 00:31:25,690 --> 00:31:29,590 In other words, the sequences in time of things where what 610 00:31:29,590 --> 00:31:34,270 happens in the future depends on what happens in the past, 611 00:31:34,270 --> 00:31:36,820 only through the state at the present. 612 00:31:36,820 --> 00:31:39,140 In other words, if you can specify the state in the 613 00:31:39,140 --> 00:31:41,760 present, you can forget about everything in the past. 614 00:31:44,460 --> 00:31:49,750 If you have those kinds of processes around, you don't 615 00:31:49,750 --> 00:31:54,030 have to study history at all, which would be very nice. 616 00:31:54,030 --> 00:31:58,110 But unfortunately, not all processes behave that way. 617 00:31:58,110 --> 00:32:00,860 When you do the modeling to try to find out what the state 618 00:32:00,860 --> 00:32:03,650 is, which is what you have to know at the present, you find 619 00:32:03,650 --> 00:32:06,050 out there's a lot of history involved. 
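The Markov idea described above, that the future depends on the past only through the state at the present, can be illustrated with a small Python sketch. The two-state "weather" chain, its transition probabilities, and the function names are all made up for illustration; none of them come from the lecture or the course notes.

```python
import random

# Hypothetical two-state chain; the states and numbers are illustrative only.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def next_state_distribution(history):
    """Distribution of the next state given the whole history.

    Only the last entry (the present state) is used. That is the
    Markov property: everything earlier in the history is ignored.
    """
    return TRANSITIONS[history[-1]]


def simulate(start, steps, rng):
    """Generate one sample path of the chain with `steps` transitions."""
    path = [start]
    for _ in range(steps):
        dist = next_state_distribution(path)
        states = list(dist)
        weights = [dist[s] for s in states]
        path.append(rng.choices(states, weights=weights)[0])
    return path


# Two different pasts that end in the same present state
# give the same distribution for the future.
same = next_state_distribution(["rainy", "rainy", "sunny"]) == next_state_distribution(["sunny"])
print(same)
print(simulate("sunny", 10, random.Random(0)))
```

Each call to `simulate` produces one sample point of the process in the sense used in this lecture: a whole sequence of values in time.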
620 00:32:06,050 --> 00:32:08,470 OK, finally, we're going to talk about random walks and 621 00:32:08,470 --> 00:32:09,640 martingales. 622 00:32:09,640 --> 00:32:12,030 I'm not going to even say what a random walk or 623 00:32:12,030 --> 00:32:13,410 a martingale is. 624 00:32:13,410 --> 00:32:15,580 We will find out about that soon enough, but I want to 625 00:32:15,580 --> 00:32:18,760 tell you that's what's in chapter 7 of the notes. 626 00:32:18,760 --> 00:32:21,420 That's the last topic we will deal with. 627 00:32:21,420 --> 00:32:24,690 We'll study all sorts of mixtures of these. 628 00:32:24,690 --> 00:32:26,990 Things which involve a little bit of each. 629 00:32:26,990 --> 00:32:28,790 We'll start out working on one thing and 630 00:32:28,790 --> 00:32:30,070 we'll find out another 631 00:32:30,070 --> 00:32:34,000 one of these other topics is the right way to look at it. 632 00:32:34,000 --> 00:32:37,390 If you want to know more about that, please go look at the 633 00:32:37,390 --> 00:32:40,490 notes, and you'll find out as much as you want. 634 00:32:40,490 --> 00:32:46,210 But it's not appropriate to talk about it right now. 635 00:32:46,210 --> 00:32:49,400 OK, when, where, and how is this useful? 636 00:32:49,400 --> 00:32:52,410 You see, I'm almost at the point where we'll start 637 00:32:52,410 --> 00:32:55,130 actually talking about real stuff. 638 00:32:55,130 --> 00:32:58,440 And when I say real stuff, I mean mathematical stuff, which 639 00:32:58,440 --> 00:32:59,780 is not real stuff. 640 00:33:04,120 --> 00:33:05,040 Broad answer-- 641 00:33:05,040 --> 00:33:09,160 probability and stochastic processes are an important 642 00:33:09,160 --> 00:33:12,470 adjunct to rational thought about all human 643 00:33:12,470 --> 00:33:14,160 and scientific endeavor. 644 00:33:14,160 --> 00:33:16,500 That's a very strong statement. 645 00:33:16,500 --> 00:33:18,210 I happen to believe it. 646 00:33:18,210 --> 00:33:20,250 You might not believe it.
647 00:33:20,250 --> 00:33:22,720 And you're welcome to not believe it. 648 00:33:22,720 --> 00:33:26,220 It won't be on a quiz or anything, believe me. 649 00:33:26,220 --> 00:33:34,010 But almost anything you have to deal with is dealing with 650 00:33:34,010 --> 00:33:35,860 something in the future. 651 00:33:35,860 --> 00:33:38,890 I mean, you have to plan for things which are going to 652 00:33:38,890 --> 00:33:41,180 happen in the future. 653 00:33:41,180 --> 00:33:44,180 When you look at the future, there's a lot of uncertainty 654 00:33:44,180 --> 00:33:45,740 involved with it. 655 00:33:45,740 --> 00:33:48,080 One of the ways of dealing with uncertainty, 656 00:33:48,080 --> 00:33:50,670 and probably the only scientific way of dealing with 657 00:33:50,670 --> 00:33:53,180 uncertainty, is through the mechanism 658 00:33:53,180 --> 00:33:56,660 of probability models. 659 00:33:56,660 --> 00:34:01,760 So anything you want to deal with, which is important, 660 00:34:01,760 --> 00:34:04,000 you're probably better off knowing something about 661 00:34:04,000 --> 00:34:06,050 probability than not. 662 00:34:06,050 --> 00:34:09,810 A narrow answer is probability and stochastic processes are 663 00:34:09,810 --> 00:34:13,989 essential components of the following areas. 664 00:34:13,989 --> 00:34:17,889 Now, I must confess I made up this list in about 10 minutes 665 00:34:17,889 --> 00:34:20,860 without thinking about it very seriously. 666 00:34:20,860 --> 00:34:22,880 And these things are related to each other. 667 00:34:22,880 --> 00:34:25,690 Some of them are parts of others. 668 00:34:25,690 --> 00:34:26,360 Let me read them. 669 00:34:26,360 --> 00:34:28,330 Communication systems and networks. 670 00:34:28,330 --> 00:34:31,960 That's where I got involved in this question, and very 671 00:34:31,960 --> 00:34:33,350 important there. 672 00:34:33,350 --> 00:34:34,820 Computer systems.
673 00:34:34,820 --> 00:34:38,420 I also got involved in it because of computer systems. 674 00:34:38,420 --> 00:34:39,940 Queuing in all areas. 675 00:34:39,940 --> 00:34:42,400 Well, I got involved in queuing because of being 676 00:34:42,400 --> 00:34:44,409 interested in networks. 677 00:34:44,409 --> 00:34:46,820 Risk management in all areas. 678 00:34:46,820 --> 00:34:49,370 I got interested in that because I started to get 679 00:34:49,370 --> 00:34:52,790 disturbed about civilization destroying itself because 680 00:34:52,790 --> 00:34:56,199 people who have a great deal of power don't know anything 681 00:34:56,199 --> 00:34:57,449 about taking risks. 682 00:35:00,600 --> 00:35:02,560 OK, catastrophe management. 683 00:35:02,560 --> 00:35:04,860 How do you prevent oil spills and things like that? 684 00:35:04,860 --> 00:35:08,960 How do you prevent nuclear plants from going off? 685 00:35:08,960 --> 00:35:11,660 How do you prevent nuclear weapons from falling in the 686 00:35:11,660 --> 00:35:14,320 hands of the wrong people? 687 00:35:14,320 --> 00:35:17,050 These again, are probability issues. 688 00:35:17,050 --> 00:35:19,760 These are important probability issues because 689 00:35:19,760 --> 00:35:24,430 most people don't regard them as probability issues. 690 00:35:24,430 --> 00:35:28,520 If you say there is one chance in a billion that something 691 00:35:28,520 --> 00:35:33,510 will happen, 3/4 of the population will say, that's 692 00:35:33,510 --> 00:35:34,460 not acceptable. 693 00:35:34,460 --> 00:35:37,400 I don't want any risk. 694 00:35:37,400 --> 00:35:39,050 And these people are fools. 695 00:35:39,050 --> 00:35:44,950 But unfortunately, these fools outnumber those of us who have 696 00:35:44,950 --> 00:35:46,440 studied these issues. 697 00:35:46,440 --> 00:35:48,460 So we have to deal with it. 698 00:35:48,460 --> 00:35:51,340 We have to understand it if nothing else.
699 00:35:51,340 --> 00:35:53,890 OK, failures in all types of systems-- 700 00:35:53,890 --> 00:35:56,790 operations research, biology, medicine, optical systems, and 701 00:35:56,790 --> 00:35:58,070 control systems. 702 00:35:58,070 --> 00:35:59,500 Name your own favorite thing. 703 00:35:59,500 --> 00:36:01,550 You can put it all in. 704 00:36:01,550 --> 00:36:04,410 Probability gets used everywhere. 705 00:36:04,410 --> 00:36:05,660 OK, let's go to the axioms. 706 00:36:09,460 --> 00:36:13,480 Probability models have three components to them. 707 00:36:13,480 --> 00:36:16,020 There's a sample space. 708 00:36:16,020 --> 00:36:18,810 Now, here we're in mathematics again. 709 00:36:18,810 --> 00:36:24,050 The sample space is just a set of things. 710 00:36:24,050 --> 00:36:26,660 You don't have to be at all specific about what those 711 00:36:26,660 --> 00:36:28,970 things are. 712 00:36:28,970 --> 00:36:32,400 I mean, at this point we're right in to set theory, which 713 00:36:32,400 --> 00:36:36,400 is the most basic part of mathematics again. 714 00:36:36,400 --> 00:36:40,430 And a set contains elements. 715 00:36:40,430 --> 00:36:41,890 And that's what we're talking about here. 716 00:36:41,890 --> 00:36:43,700 So there's a sample space. 717 00:36:43,700 --> 00:36:45,770 There are the elements in that sample space. 718 00:36:48,310 --> 00:36:52,120 There's also a collection of things called events. 719 00:36:52,120 --> 00:36:56,650 Now, the events are subsets of the sample space. 720 00:36:56,650 --> 00:37:02,700 And if you're dealing with a finite set of things, there's 721 00:37:02,700 --> 00:37:06,740 no reason why the events should not be all subsets of 722 00:37:06,740 --> 00:37:09,830 that countable collection of things. 723 00:37:09,830 --> 00:37:13,870 If you have a deck of cards, there are 52 factorial ways of 724 00:37:13,870 --> 00:37:17,310 arranging the cards in that deck of cards. 725 00:37:17,310 --> 00:37:19,590 Very large number.
726 00:37:19,590 --> 00:37:23,290 But when you talk about subsets of that, you might as 727 00:37:23,290 --> 00:37:29,530 well talk about all combinations of those 728 00:37:29,530 --> 00:37:31,550 configurations of the deck. 729 00:37:31,550 --> 00:37:34,770 You can talk about, what's the probability that the first 730 00:37:34,770 --> 00:37:38,770 five cards in that deck happen to contain 4 aces? 731 00:37:38,770 --> 00:37:40,870 That's an easy thing to compute. 732 00:37:40,870 --> 00:37:44,010 I'm sure you've all computed it at some point or other. 733 00:37:44,010 --> 00:37:46,640 Those who like to play poker, of course do this. 734 00:37:46,640 --> 00:37:48,960 It's fun. 735 00:37:48,960 --> 00:37:51,540 But it's a straightforward problem. 736 00:37:51,540 --> 00:37:55,660 When you have these countable sets of things, there's no 737 00:37:55,660 --> 00:38:01,010 reason at all for not having the set of events consist of 738 00:38:01,010 --> 00:38:04,200 all possible subsets. 739 00:38:04,200 --> 00:38:06,150 Well, people believed that for a long time. 740 00:38:06,150 --> 00:38:09,640 One of the things that forced Kolmogorov to start dealing 741 00:38:09,640 --> 00:38:13,420 with these axioms was the realization that when you had 742 00:38:13,420 --> 00:38:18,010 much more complicated sets, where in fact you had the set 743 00:38:18,010 --> 00:38:21,910 of real numbers as possible outcomes, or sequences of 744 00:38:21,910 --> 00:38:26,460 things which go from 0 to infinity, and all of these 745 00:38:26,460 --> 00:38:32,000 sets, which are uncountable, you really can't make sense 746 00:38:32,000 --> 00:38:37,960 out of probability models where all subsets of sample 747 00:38:37,960 --> 00:38:41,240 points are called events. 748 00:38:41,240 --> 00:38:45,020 So in terms of measure theory, you're forced to restrict the 749 00:38:45,020 --> 00:38:46,980 set of things you call events. 
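The poker example mentioned above, the probability that the first five cards of the deck contain all four aces, can be worked out by counting. Here is a minimal Python sketch, assuming all 52 factorial orderings of the deck are equally likely, so that the first five cards form a uniformly random 5-card hand. The function name is illustrative, not something from the lecture.

```python
from fractions import Fraction
from math import comb


def prob_four_aces_in_first_five() -> Fraction:
    """Probability that the first five cards of a shuffled deck contain all 4 aces.

    Assumes every ordering of the 52-card deck is equally likely, so the
    set of the first five cards is a uniformly random 5-card hand.
    """
    total_hands = comb(52, 5)                # all possible 5-card hands
    favorable = comb(4, 4) * comb(48, 1)     # all four aces plus any one other card
    return Fraction(favorable, total_hands)


p = prob_four_aces_in_first_five()
print(p)  # 1/54145
```

The event here is a subset of the 52 factorial sample points, namely all the orderings whose first five cards include the four aces. Counting hands instead of orderings gives the same probability because each hand corresponds to the same number of orderings.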
750 00:38:46,980 --> 00:38:48,860 Now, we're not going to deal with measure 751 00:38:48,860 --> 00:38:51,200 theory in this subject. 752 00:38:51,200 --> 00:38:54,710 But every once in a while, we will have to mention it 753 00:38:54,710 --> 00:38:59,130 because the reason why a lot of things are the way they are 754 00:38:59,130 --> 00:39:01,260 is because of measure theory. 755 00:39:01,260 --> 00:39:03,900 So you'll have to be at least conscious of it. 756 00:39:03,900 --> 00:39:08,040 If you really want to be serious, as far as your study 757 00:39:08,040 --> 00:39:11,310 of mathematical probability theory, you really have to 758 00:39:11,310 --> 00:39:13,570 take a course in measure theory at some point. 759 00:39:13,570 --> 00:39:16,240 But you don't have to do it now. 760 00:39:16,240 --> 00:39:20,640 In fact, I would almost urge most of you not to do it now. 761 00:39:20,640 --> 00:39:26,440 Because once you get all the way into measure theory, 762 00:39:26,440 --> 00:39:29,310 you're so far into measure theory that you can't come 763 00:39:29,310 --> 00:39:31,790 back and think about real problems anymore. 764 00:39:31,790 --> 00:39:35,460 You're suddenly stuck in the world of mathematics, which 765 00:39:35,460 --> 00:39:38,080 happens to lots of people. 766 00:39:38,080 --> 00:39:41,100 So anyway, some of you should learn about all this 767 00:39:41,100 --> 00:39:42,720 mathematics. 768 00:39:42,720 --> 00:39:43,670 Some of you shouldn't. 769 00:39:43,670 --> 00:39:45,585 Some of you should learn about it later. 770 00:39:45,585 --> 00:39:48,150 So you can do whatever you want. 771 00:39:48,150 --> 00:39:52,590 OK, the axioms about events is that if you 772 00:39:52,590 --> 00:39:53,950 have a set of events. 
773 00:39:53,950 --> 00:39:58,260 In other words, a set of subsets, and it's a countable 774 00:39:58,260 --> 00:40:01,310 set, then the union of all of those-- 775 00:40:01,310 --> 00:40:05,190 the union from n equals 1 to infinity of A sub 776 00:40:05,190 --> 00:40:08,750 n is also an event. 777 00:40:08,750 --> 00:40:10,510 I've gone for 50 minutes and nobody has 778 00:40:10,510 --> 00:40:13,660 asked a question yet. 779 00:40:13,660 --> 00:40:14,910 Who has a question? 780 00:40:17,240 --> 00:40:19,990 Who thinks that all of this is nonsense? 781 00:40:19,990 --> 00:40:21,050 How many of you? 782 00:40:21,050 --> 00:40:22,300 I do. 783 00:40:26,104 --> 00:40:28,830 OK, I'll come back in another 10 minutes. 784 00:40:28,830 --> 00:40:31,320 And if nobody has a question by then, I'm just going to 785 00:40:31,320 --> 00:40:33,560 stop and wait. 786 00:40:33,560 --> 00:40:35,530 OK, so anyway. 787 00:40:35,530 --> 00:40:38,990 If you look at a union of events. 788 00:40:38,990 --> 00:40:44,410 Now, remember, that an event is a subset of points. 789 00:40:44,410 --> 00:40:46,920 We're just talking about set theory now. 790 00:40:46,920 --> 00:40:51,360 So the union of this union here-- 791 00:40:51,360 --> 00:40:52,620 excuse me. 792 00:40:52,620 --> 00:40:58,960 This union here is A1, all the points in A1, and all the 793 00:40:58,960 --> 00:41:02,230 points in A2, and all the points in A3, all the way up 794 00:41:02,230 --> 00:41:02,970 to infinity. 795 00:41:02,970 --> 00:41:05,060 That's what we're talking about here. 796 00:41:05,060 --> 00:41:09,020 And one of the axioms of probability theory is that if 797 00:41:09,020 --> 00:41:12,650 each of these things are events, then this union is 798 00:41:12,650 --> 00:41:13,420 also an event. 799 00:41:13,420 --> 00:41:14,590 That's just an axiom. 800 00:41:14,590 --> 00:41:18,930 You can't define events if that's not true.
801 00:41:18,930 --> 00:41:22,640 And if you try to define events where this isn't true, 802 00:41:22,640 --> 00:41:25,400 you eventually come into the most god-awful problems you 803 00:41:25,400 --> 00:41:27,090 might imagine. 804 00:41:27,090 --> 00:41:31,110 And suddenly, nothing makes sense anymore. 805 00:41:31,110 --> 00:41:36,700 Most of the time when we define a set of events in a 806 00:41:36,700 --> 00:41:40,170 probability model, each singleton event-- 807 00:41:40,170 --> 00:41:44,070 namely, each single point as a set, which contains only 808 00:41:44,070 --> 00:41:47,060 that element, is taken as an event. 809 00:41:47,060 --> 00:41:49,930 There's no real reason to not do that. 810 00:41:49,930 --> 00:41:54,890 If you don't do that, you might as well just put those 811 00:41:54,890 --> 00:41:59,430 points together and not regard them as separate points. 812 00:41:59,430 --> 00:42:02,910 We will see an example in a little bit where, in fact, you 813 00:42:02,910 --> 00:42:03,950 might want to do that. 814 00:42:03,950 --> 00:42:07,080 But let's hold that off for a little bit. 815 00:42:07,080 --> 00:42:09,320 OK, not all subsets need to be events. 816 00:42:09,320 --> 00:42:14,190 Usually, each sample point is taken to be a singleton event. 817 00:42:14,190 --> 00:42:17,870 And then non-events are truly weird things. 818 00:42:17,870 --> 00:42:20,320 I mean, as soon as you take all sample points to be 819 00:42:20,320 --> 00:42:24,460 events, all countable unions of sample points are events. 820 00:42:27,150 --> 00:42:30,490 And then intersections of events are events, and so 821 00:42:30,490 --> 00:42:32,330 forth, and so forth, and so forth. 822 00:42:32,330 --> 00:42:35,610 So most things are events. 823 00:42:35,610 --> 00:42:38,820 And just because of measure theory, you can't make all 824 00:42:38,820 --> 00:42:39,580 things events.
825 00:42:39,580 --> 00:42:43,610 And I'm not going to give you any example of that because 826 00:42:43,610 --> 00:42:47,030 examples are horrendous. 827 00:42:47,030 --> 00:42:51,120 OK, the empty set has to be an event. 828 00:42:51,120 --> 00:42:54,850 Why does the empty set have to be an event? 829 00:42:54,850 --> 00:42:56,293 If we're going to believe these axioms-- 830 00:42:58,950 --> 00:43:01,990 I'm in a real bind here because every one of you 831 00:43:01,990 --> 00:43:05,230 people has seen these axioms before. 832 00:43:05,230 --> 00:43:10,320 And you've all gone on and said, I can get an A in any 833 00:43:10,320 --> 00:43:13,630 probability class in the world without having any idea of 834 00:43:13,630 --> 00:43:15,900 what these axioms are all about. 835 00:43:15,900 --> 00:43:18,620 And therefore, it's unimportant. 836 00:43:18,620 --> 00:43:21,180 So you see something that says, the 837 00:43:21,180 --> 00:43:23,370 empty set is an event. 838 00:43:23,370 --> 00:43:24,450 And you say, well, of course that has 839 00:43:24,450 --> 00:43:25,780 nothing to do with anything. 840 00:43:25,780 --> 00:43:28,450 Why should I worry about whether the empty set is an 841 00:43:28,450 --> 00:43:28,960 event or not? 842 00:43:28,960 --> 00:43:31,730 The empty set can't happen, so how can it be an event? 843 00:43:31,730 --> 00:43:35,380 Well, because of these axioms, it has to be an event. 844 00:43:35,380 --> 00:43:40,750 The axioms say that if A is an event, and that's the whole 845 00:43:40,750 --> 00:43:43,640 sample space, then the complement has 846 00:43:43,640 --> 00:43:45,380 to be an event also. 847 00:43:45,380 --> 00:43:48,320 So that says that the empty set has to be an event. 848 00:43:48,320 --> 00:43:51,320 And that just follows from the axioms. 849 00:43:51,320 --> 00:43:53,910 If all sample points are singleton events, then all 850 00:43:53,910 --> 00:43:59,250 finite and countable sets are events.
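On a finite sample space, the event axioms discussed above can be checked mechanically. The Python sketch below is an illustration under stated assumptions: it takes the axioms to be that the whole sample space is an event, that the complement of an event is an event, and that unions of events are events. On a finite space countable unions reduce to finite ones, so checking pairwise unions is enough. All function and variable names are hypothetical.

```python
from itertools import combinations


def power_set(omega):
    """All subsets of a finite sample space, as a set of frozensets."""
    items = list(omega)
    return {
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    }


def satisfies_event_axioms(omega, events):
    """Check the event axioms for a collection of subsets of a finite omega."""
    omega = frozenset(omega)
    # The whole sample space must be an event.
    if omega not in events:
        return False
    # The complement of every event must be an event.
    for a in events:
        if omega - a not in events:
            return False
    # Unions of events must be events (pairwise suffices on a finite space).
    for a in events:
        for b in events:
            if a | b not in events:
                return False
    return True


omega = {1, 2, 3}

# Taking all subsets as events always works on a finite space.
all_subsets = power_set(omega)

# A coarser class where the points 2 and 3 are never separated.
coarse = {frozenset(), frozenset({1}), frozenset({2, 3}), frozenset(omega)}

# Broken: contains omega but not its complement, the empty set.
broken = {frozenset(omega), frozenset({1})}

print(satisfies_event_axioms(omega, all_subsets))  # True
print(satisfies_event_axioms(omega, coarse))       # True
print(satisfies_event_axioms(omega, broken))       # False
```

The `broken` collection fails for the reason given in the lecture: once the whole sample space is an event, its complement, the empty set, has to be an event too. The `coarse` collection is a case where singleton sets are not all events, which amounts to lumping the points 2 and 3 together.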
851 00:43:59,250 --> 00:44:01,790 And finally, deMorgan's law. 852 00:44:01,790 --> 00:44:05,700 Is there anyone who isn't familiar with deMorgan's law? 853 00:44:05,700 --> 00:44:09,210 Anyone who hasn't seen even that small 854 00:44:09,210 --> 00:44:12,140 amount of set theory? 855 00:44:12,140 --> 00:44:16,570 If not, look it up on-- 856 00:44:16,570 --> 00:44:23,120 what's the name of the computer-- 857 00:44:23,120 --> 00:44:25,560 Wikipedia. 858 00:44:25,560 --> 00:44:27,260 Most of you will think that things on 859 00:44:27,260 --> 00:44:29,470 Wikipedia are not reliable. 860 00:44:29,470 --> 00:44:34,070 Strangely enough, in terms of probability theory and a lot 861 00:44:34,070 --> 00:44:37,930 of mathematics, Wikipedia does things a whole lot better than 862 00:44:37,930 --> 00:44:40,150 most textbooks do. 863 00:44:40,150 --> 00:44:43,520 So any time you're unfamiliar with what a word means or 864 00:44:43,520 --> 00:44:46,170 something, you can look it up in your 865 00:44:46,170 --> 00:44:49,300 old probability textbook. 866 00:44:49,300 --> 00:44:51,270 If you've used [INAUDIBLE] 867 00:44:51,270 --> 00:44:53,310 and [INAUDIBLE], you will probably find the 868 00:44:53,310 --> 00:44:54,670 right answer there. 869 00:44:54,670 --> 00:44:57,630 Other textbooks, maybe the right answer. 870 00:44:57,630 --> 00:45:00,520 Wikipedia's more reliable than most of them. 871 00:45:00,520 --> 00:45:02,920 And it's also clearer than most of them. 872 00:45:02,920 --> 00:45:08,240 So I highly recommend using Wikipedia whenever you get 873 00:45:08,240 --> 00:45:10,750 totally confused by something. 874 00:45:10,750 --> 00:45:17,020 OK, so probability measure and events satisfies 875 00:45:17,020 --> 00:45:18,440 the following axioms. 876 00:45:18,440 --> 00:45:21,350 We've said what things are events. 877 00:45:21,350 --> 00:45:24,710 The only things that have probabilities are events. 878 00:45:24,710 --> 00:45:27,920 So the entire set has a probability. 
879 00:45:27,920 --> 00:45:30,630 When you do the experiment, something has to happen. 880 00:45:30,630 --> 00:45:32,560 So one of the sample points occurs. 881 00:45:32,560 --> 00:45:34,820 That's the whole idea of probability. 882 00:45:34,820 --> 00:45:37,960 And therefore, omega has probability 1. 883 00:45:37,960 --> 00:45:39,540 Capital Omega. 884 00:45:39,540 --> 00:45:44,010 If A is an event, then the probability of A has to be 885 00:45:44,010 --> 00:45:46,370 greater than or equal to 0. 886 00:45:46,370 --> 00:45:49,670 You can probably see without too much trouble why it has to 887 00:45:49,670 --> 00:45:51,790 be less than or equal to 1 also. 888 00:45:51,790 --> 00:45:53,700 But that's not one of the axioms. 889 00:45:53,700 --> 00:45:59,670 You see, when you state a set of axioms for something, you'd 890 00:45:59,670 --> 00:46:03,970 like to use the minimum set of axioms you can, so that you 891 00:46:03,970 --> 00:46:07,590 don't have to verify too many things before you say, yes, 892 00:46:07,590 --> 00:46:10,120 this satisfies all the axioms. 893 00:46:10,120 --> 00:46:12,680 So the second one is the probability of A has to be 894 00:46:12,680 --> 00:46:14,940 greater than or equal to 0. 895 00:46:14,940 --> 00:46:18,880 The third one says that if you have a sequence of disjoint 896 00:46:18,880 --> 00:46:23,450 events, incidentally when I say a sequence, I will almost 897 00:46:23,450 --> 00:46:25,885 always mean a countably infinite sequence-- 898 00:46:25,885 --> 00:46:29,740 A1, A2, A3, all the way up. 899 00:46:29,740 --> 00:46:32,980 If I'm talking about what most of you would 900 00:46:32,980 --> 00:46:35,130 call a finite sequence-- 901 00:46:35,130 --> 00:46:37,610 and I like the word "finite sequence," but I like to be 902 00:46:37,610 --> 00:46:39,450 able to talk about sequences. 
903 00:46:39,450 --> 00:46:42,180 When I'm talking about a finite sequence, I will usually call 904 00:46:42,180 --> 00:46:45,280 it an n-tuple of random variables or 905 00:46:45,280 --> 00:46:47,400 an n-tuple of things. 906 00:46:47,400 --> 00:46:51,380 So sequence really means you go the whole way out. 907 00:46:51,380 --> 00:46:55,800 OK, if A1, A2, all the way up are disjoint events-- 908 00:46:55,800 --> 00:46:56,630 disjoint. 909 00:46:56,630 --> 00:47:00,460 Disjoint means if omega is in one, it can't be in 910 00:47:00,460 --> 00:47:01,690 any of the others. 911 00:47:01,690 --> 00:47:05,990 Then the probability of this countable union is going to be 912 00:47:05,990 --> 00:47:08,830 equal to the sum of the probabilities of the 913 00:47:08,830 --> 00:47:10,980 individual events. 914 00:47:10,980 --> 00:47:13,910 Anyone who has ever done a probability problem knows all 915 00:47:13,910 --> 00:47:15,200 of these things. 916 00:47:15,200 --> 00:47:17,570 The only thing you don't know and you probably haven't 917 00:47:17,570 --> 00:47:19,760 thought about is why everything else 918 00:47:19,760 --> 00:47:21,730 follows from this. 919 00:47:21,730 --> 00:47:25,610 But this is the whole mathematical theory. 920 00:47:25,610 --> 00:47:27,040 Why should we study it anymore? 921 00:47:27,040 --> 00:47:28,210 We're done. 922 00:47:28,210 --> 00:47:29,340 We have the axioms. 923 00:47:29,340 --> 00:47:34,430 Everything else follows, it's just a matter of computation. 924 00:47:34,430 --> 00:47:38,180 Just sit down and do it. 925 00:47:38,180 --> 00:47:41,480 Not quite that simple. 926 00:47:41,480 --> 00:47:45,990 Anyway, a few consequences: the probability of the empty 927 00:47:45,990 --> 00:47:51,030 set is 0, which says when you do an experiment something's 928 00:47:51,030 --> 00:47:51,690 going to happen.
929 00:47:51,690 --> 00:47:55,360 And therefore, the probability that nothing happens is 0 930 00:47:55,360 --> 00:47:57,440 because that's what the model says. 931 00:47:57,440 --> 00:48:02,320 The probability of the complement of an event is 1 932 00:48:02,320 --> 00:48:05,380 minus the probability of that event. 933 00:48:05,380 --> 00:48:08,260 Which, in fact, is what says that all events have to have 934 00:48:08,260 --> 00:48:11,240 probabilities less than or equal to 1. 935 00:48:11,240 --> 00:48:15,250 And if the event A is contained in the event B-- 936 00:48:15,250 --> 00:48:17,870 remember when we talk about events, we're talking about 937 00:48:17,870 --> 00:48:20,780 two different things, both simultaneously. 938 00:48:20,780 --> 00:48:26,150 One of them is this beautiful idea with measure theory 939 00:48:26,150 --> 00:48:28,340 worked into it and everything else. 940 00:48:28,340 --> 00:48:31,910 And the other is just a simple set-theoretic idea. 941 00:48:31,910 --> 00:48:34,170 And all we need to be familiar with is the 942 00:48:34,170 --> 00:48:36,870 set-theoretic idea. 943 00:48:36,870 --> 00:48:42,190 Within that set-theoretic idea, A contained in B means 944 00:48:42,190 --> 00:48:46,400 that every sample point that's in A is also in B. It means 945 00:48:46,400 --> 00:48:49,650 that when you do an experiment, and the event A 946 00:48:49,650 --> 00:48:53,360 occurs, the event B has to occur because one of the 947 00:48:53,360 --> 00:48:57,610 things that compose A has to occur. 948 00:48:57,610 --> 00:49:01,255 And that thing has to be in B because A is contained in B. 949 00:49:01,255 --> 00:49:04,390 So the probability of A has to be less than or equal to the 950 00:49:04,390 --> 00:49:08,170 probability of B. That has to be less than or equal to 1. 951 00:49:08,170 --> 00:49:11,310 These are things you all know. 952 00:49:11,310 --> 00:49:13,230 Another consequence is the union bound.
953 00:49:13,230 --> 00:49:16,690 Many of you have probably seen the union bound. 954 00:49:16,690 --> 00:49:22,830 We will use it probably almost every day in this course. 955 00:49:22,830 --> 00:49:26,600 So it's good to have that as one of the things you remember 956 00:49:26,600 --> 00:49:28,740 at the highest level. 957 00:49:28,740 --> 00:49:31,780 If you have a set of events-- 958 00:49:31,780 --> 00:49:34,360 A1, A2, and so forth-- 959 00:49:34,360 --> 00:49:36,270 the probability of that union-- 960 00:49:36,270 --> 00:49:38,950 namely, the event that consists of all of them-- 961 00:49:38,950 --> 00:49:43,360 is less than or equal to the sum of the individual event 962 00:49:43,360 --> 00:49:45,010 probabilities. 963 00:49:45,010 --> 00:49:52,540 I give a little proof here for just two events, A1 and A2. 964 00:49:52,540 --> 00:49:54,790 So you see why this is true. 965 00:49:54,790 --> 00:49:57,950 I hope you can extend this to 3 and 4. 966 00:49:57,950 --> 00:50:00,400 I can't draw a picture of it very easily for 3 967 00:50:00,400 --> 00:50:01,780 and 4 and so forth. 968 00:50:01,780 --> 00:50:03,980 But here's the event A1. 969 00:50:03,980 --> 00:50:05,830 Here's the event A2. 970 00:50:05,830 --> 00:50:10,320 Visualize this as a set of sample points, which are just 971 00:50:10,320 --> 00:50:12,720 in the two-dimensional space here. 972 00:50:12,720 --> 00:50:15,730 So all these points here are in A1. 973 00:50:15,730 --> 00:50:18,790 All these points are in A2. 974 00:50:18,790 --> 00:50:20,750 This set of points here are the points that 975 00:50:20,750 --> 00:50:23,770 are in A1 and A2. 976 00:50:23,770 --> 00:50:27,610 I will use just writing things next to each other to mean 977 00:50:27,610 --> 00:50:28,630 intersection. 978 00:50:28,630 --> 00:50:33,950 And sometimes I'll use a big cap to mean intersection. 979 00:50:33,950 --> 00:50:37,810 So all of these things are both in A1 and A2. 980 00:50:37,810 --> 00:50:40,550 This is A2, but not A1.
981 00:50:40,550 --> 00:50:45,860 So the probability of this whole event, A1 union A2, is 982 00:50:45,860 --> 00:50:49,810 the probability of this thing and this thing together. 983 00:50:49,810 --> 00:50:53,300 So it's the probability of this plus the 984 00:50:53,300 --> 00:50:56,040 probability of this. 985 00:50:56,040 --> 00:50:59,890 The probability of this is less than the probability of 986 00:50:59,890 --> 00:51:05,630 A2 because this is contained in that whole rectangle. 987 00:51:05,630 --> 00:51:09,680 And therefore, the probability of the union of A1 and A2 is 988 00:51:09,680 --> 00:51:13,160 less than or equal to the probability of A1 plus 989 00:51:13,160 --> 00:51:15,060 probability of A2. 990 00:51:15,060 --> 00:51:18,700 Now, the classy way to extend this to a countably infinite 991 00:51:18,700 --> 00:51:21,860 set is to use induction. 992 00:51:21,860 --> 00:51:26,340 And I leave that as something that you can all play with 993 00:51:26,340 --> 00:51:30,090 some time when it's not between 9:30 and 11:00 in the 994 00:51:30,090 --> 00:51:32,050 morning and you're struggling to stay awake. 995 00:51:35,880 --> 00:51:38,410 And if you don't want to do that on your own, you can look 996 00:51:38,410 --> 00:51:40,430 at it in the notes. 997 00:51:40,430 --> 00:51:45,550 OK, these axioms look ho-hum to you. 998 00:51:45,550 --> 00:51:47,970 And you've always ignored them before, and you think you're 999 00:51:47,970 --> 00:51:50,440 going to be able to ignore them now. 1000 00:51:50,440 --> 00:51:53,670 Partly you can, but partly you can't because every once in a 1001 00:51:53,670 --> 00:51:58,660 while we'll start doing things where you really need to 1002 00:51:58,660 --> 00:52:02,510 understand what the axioms say. 1003 00:52:02,510 --> 00:52:06,770 OK, one other thing which you might not have noticed. 
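The two-event picture argument above can be written out as a short derivation. This is a sketch, not the version in the course notes; the notation $A_2 A_1^c$ (the part of $A_2$ outside $A_1$) is an assumption chosen to match the lecture's convention that writing events next to each other means intersection. The second line uses additivity for two disjoint events, which follows from the countable-additivity axiom by padding the sequence with empty sets.

```latex
\begin{align*}
A_1 \cup A_2 &= A_1 \cup (A_2 A_1^c), \qquad A_1 \cap (A_2 A_1^c) = \emptyset \\
\Pr\{A_1 \cup A_2\} &= \Pr\{A_1\} + \Pr\{A_2 A_1^c\} \\
&\le \Pr\{A_1\} + \Pr\{A_2\} \qquad \text{since } A_2 A_1^c \subseteq A_2 .
\end{align*}
```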
1004 00:52:06,770 --> 00:52:10,570 When you studied elementary probability, wherever you 1005 00:52:10,570 --> 00:52:15,040 studied it, what do you spend most of your time doing? 1006 00:52:15,040 --> 00:52:19,160 You spent most of your time talking about random variables 1007 00:52:19,160 --> 00:52:22,740 and talking about expectations. 1008 00:52:22,740 --> 00:52:28,370 The axioms don't have random variables in them. 1009 00:52:28,370 --> 00:52:30,850 They don't have expectations in them. 1010 00:52:30,850 --> 00:52:33,560 All they have in them is events and 1011 00:52:33,560 --> 00:52:36,230 probabilities of events. 1012 00:52:36,230 --> 00:52:39,380 So these axioms say that the really important things in 1013 00:52:39,380 --> 00:52:43,480 probability are the events and the probabilities of events. 1014 00:52:43,480 --> 00:52:46,820 And the random variables and the expectations are derived 1015 00:52:46,820 --> 00:52:51,480 quantities, which we'll now start to talk about. 1016 00:52:51,480 --> 00:52:54,470 OK, so we're now down to independent events and 1017 00:52:54,470 --> 00:52:56,980 experiments. 1018 00:52:56,980 --> 00:53:02,350 Two events, A1 and A2, are independent if the probability 1019 00:53:02,350 --> 00:53:05,310 of the two of them is equal to the product of their 1020 00:53:05,310 --> 00:53:06,290 probabilities. 1021 00:53:06,290 --> 00:53:08,400 You've all seen this. 1022 00:53:08,400 --> 00:53:09,970 I'm sure you've all seen it. 1023 00:53:09,970 --> 00:53:13,090 If you haven't at least seen it, you probably shouldn't be 1024 00:53:13,090 --> 00:53:17,240 in this class because even though the text does 1025 00:53:17,240 --> 00:53:21,920 everything in detail that has to be done, you need to have a 1026 00:53:21,920 --> 00:53:24,660 little bit of insight from having dealt with these 1027 00:53:24,660 --> 00:53:26,410 subjects before. 
1028 00:53:26,410 --> 00:53:28,830 If you don't have that, you're just going to get lost very, 1029 00:53:28,830 --> 00:53:31,570 very quickly. 1030 00:53:31,570 --> 00:53:35,440 So the probability of the intersection of the events A1 1031 00:53:35,440 --> 00:53:40,160 and A2 is the product of the two. 1032 00:53:40,160 --> 00:53:43,590 Now, in other words, you have a red die and a white die. 1033 00:53:43,590 --> 00:53:49,900 You flip the dice, what's the probability that you get a 1 1034 00:53:49,900 --> 00:53:52,360 for the red die and a 1 for the white die? 1035 00:53:52,360 --> 00:53:57,630 Well, the probability you get a 1 for the red die is 1/6. 1036 00:53:57,630 --> 00:54:00,360 Just by symmetry, there are only 6 possible things that 1037 00:54:00,360 --> 00:54:01,470 can happen. 1038 00:54:01,470 --> 00:54:04,040 Probability of white die comes up as 1. 1039 00:54:04,040 --> 00:54:07,530 Probability is 1/6 for that. 1040 00:54:07,530 --> 00:54:12,720 And the probability of the two things, they're independent. 1041 00:54:12,720 --> 00:54:16,970 There's a sense of real-world independence and probability 1042 00:54:16,970 --> 00:54:18,420 theory independence. 1043 00:54:18,420 --> 00:54:21,290 Real-world independence says the two things are isolated, 1044 00:54:21,290 --> 00:54:23,290 they don't interfere with each other. 1045 00:54:23,290 --> 00:54:28,020 Probability theory says just by definition, well, the 1046 00:54:28,020 --> 00:54:31,800 real-world idea of them not interfering with each other 1047 00:54:31,800 --> 00:54:33,390 should say-- 1048 00:54:33,390 --> 00:54:38,820 and I'm waving my hands here because this is so elementary, 1049 00:54:38,820 --> 00:54:39,690 you all know it. 1050 00:54:39,690 --> 00:54:43,620 And I would bore you if I talked about it more. 1051 00:54:43,620 --> 00:54:45,305 But I probably should talk about it, but 1052 00:54:45,305 --> 00:54:46,280 I'm not going to.
1053 00:54:46,280 --> 00:54:50,060 Anyway, this is the definition of independence. 1054 00:54:50,060 --> 00:54:54,000 If you don't have any idea of how this corresponds to being 1055 00:54:54,000 --> 00:54:58,510 unconnected in the real world, then go to Wikipedia. 1056 00:54:58,510 --> 00:54:59,330 Read the notes. 1057 00:54:59,330 --> 00:55:00,760 Well, you should read the notes anyway. 1058 00:55:00,760 --> 00:55:03,980 I hope you will read the notes because I'm not going to say 1059 00:55:03,980 --> 00:55:07,140 everything in class that needs to be said. 1060 00:55:07,140 --> 00:55:10,450 And you will get a better feeling for it. 1061 00:55:10,450 --> 00:55:12,820 Now, here's something important. 1062 00:55:12,820 --> 00:55:17,410 Given two probability models, a combined model can be 1063 00:55:17,410 --> 00:55:22,460 defined in which, first, the sample space, omega, is the 1064 00:55:22,460 --> 00:55:26,820 Cartesian product of omega 1 and omega 2. 1065 00:55:26,820 --> 00:55:28,950 Namely, it's the Cartesian product of 1066 00:55:28,950 --> 00:55:30,820 the two sample spaces. 1067 00:55:30,820 --> 00:55:34,390 Think of rolling the red die and the white die. 1068 00:55:34,390 --> 00:55:36,740 Rolling a red die is an experiment. 1069 00:55:36,740 --> 00:55:39,460 There are 6 possible outcomes, a 1 to a 6. 1070 00:55:39,460 --> 00:55:42,940 Roll a white die, there are 6 possible 1071 00:55:42,940 --> 00:55:44,940 outcomes, a 1 to a 6. 1072 00:55:44,940 --> 00:55:48,250 You roll the two dice together, and you really need 1073 00:55:48,250 --> 00:55:49,890 to have some way of putting these 1074 00:55:49,890 --> 00:55:51,510 two experiments together. 1075 00:55:51,510 --> 00:55:52,650 How do you put them together? 1076 00:55:52,650 --> 00:55:57,770 You talk about the outcome for the two dice, number for one 1077 00:55:57,770 --> 00:55:59,020 and number for the other.
1078 00:55:59,020 --> 00:56:03,200 The Cartesian product simply means you have the set made up 1079 00:56:03,200 --> 00:56:08,680 of 1 to 6 Cartesian product with 1 to 6. 1080 00:56:08,680 --> 00:56:10,065 So you have 36 possibilities. 1081 00:56:16,350 --> 00:56:19,010 It's an interesting thing, which comes from 1082 00:56:19,010 --> 00:56:20,790 Kolmogorov's axioms. 1083 00:56:20,790 --> 00:56:27,420 That, in fact, you can take any two probability models for 1084 00:56:27,420 --> 00:56:29,100 two different experiments. 1085 00:56:29,100 --> 00:56:33,440 You can take this Cartesian product of sample points. 1086 00:56:33,440 --> 00:56:36,830 You can assume that what happens here is independent of 1087 00:56:36,830 --> 00:56:38,500 what happens here. 1088 00:56:38,500 --> 00:56:41,600 And when you do this, you will, in fact, get something 1089 00:56:41,600 --> 00:56:45,180 for the two experiments put together which satisfies 1090 00:56:45,180 --> 00:56:47,850 Kolmogorov's axioms. 1091 00:56:47,850 --> 00:56:53,590 That is neither trivial nor very hard to prove. 1092 00:56:53,590 --> 00:56:58,020 I mean, for the case of two dice, you can see it almost 1093 00:56:58,020 --> 00:56:58,990 immediately. 1094 00:56:58,990 --> 00:57:00,720 I mean, you see what the sample space is. 1095 00:57:04,050 --> 00:57:06,220 It's this Cartesian product. 1096 00:57:06,220 --> 00:57:08,460 And you see what the probabilities have to be 1097 00:57:08,460 --> 00:57:12,370 because the probability of, say, 1 or 2 for the red die 1098 00:57:12,370 --> 00:57:18,070 and 1 or 2 for the white is 2/6 times 2/6. 1099 00:57:18,070 --> 00:57:24,180 So with probability 1/9, you're going to get a 1 or a 1100 00:57:24,180 --> 00:57:28,360 2 combined with a 1 or a 2. 1101 00:57:28,360 --> 00:57:31,440 I'm going to talk a little bit later about something 1102 00:57:31,440 --> 00:57:34,240 that you all know. 1103 00:57:34,240 --> 00:57:37,650 What happens if you roll two white dice?
1104 00:57:37,650 --> 00:57:40,380 This is something you all ought to think about a little 1105 00:57:40,380 --> 00:57:44,420 bit because it really isn't as simple as it sounds. 1106 00:57:44,420 --> 00:57:49,840 If you roll two dice, what's the probability that you'll 1107 00:57:49,840 --> 00:57:52,440 get a 1 and a 2? 1108 00:57:52,440 --> 00:57:56,440 And how can you justify that? 1109 00:57:56,440 --> 00:57:59,660 First, what's a sample space when you roll two white dice? 1110 00:58:05,350 --> 00:58:07,630 Well, if you look at the possible things that might 1111 00:58:07,630 --> 00:58:15,640 happen, you can get 1, 1; 2, 2; 3, 3; 4, 4; 5, 5; 6, 6. 1112 00:58:15,640 --> 00:58:18,140 You can also get 1, 2 or 2, 1. 1113 00:58:18,140 --> 00:58:21,930 But you can't tell them apart, so there's one sample point, 1114 00:58:21,930 --> 00:58:25,170 you might say, which is 1, 2, and 2, 1. 1115 00:58:25,170 --> 00:58:29,410 Another sample point which is 2, 3; 3, 2, and so forth. 1116 00:58:29,410 --> 00:58:33,740 If you count them up, there are 21 separate sample points 1117 00:58:33,740 --> 00:58:35,990 that you might have. 1118 00:58:35,990 --> 00:58:41,150 And when you look at what the probabilities ought to be, the 1119 00:58:41,150 --> 00:58:44,410 probabilities of the pairs i, i are 1/36 each. 1120 00:58:44,410 --> 00:58:48,830 And the probabilities of the i, j, where i is unequal 1121 00:58:48,830 --> 00:58:53,380 to j, are 1/18 each. 1122 00:58:53,380 --> 00:58:56,080 That's awful. 1123 00:58:56,080 --> 00:58:58,400 So what do you do? 1124 00:58:58,400 --> 00:59:00,540 When you're rolling two dice, you do the same thing that 1125 00:59:00,540 --> 00:59:02,860 everybody else does. 1126 00:59:02,860 --> 00:59:05,800 You say, well, even though it's two white dice, I'm going 1127 00:59:05,800 --> 00:59:08,850 to think of it as if it's a white die and a red die. 1128 00:59:08,850 --> 00:59:12,060 I'm going to think of it as if the two are distinguishable.
1129 00:59:12,060 --> 00:59:15,950 My sample space is going to be these 36 different things. 1130 00:59:15,950 --> 00:59:18,100 I will never be able to distinguish a 1131 00:59:18,100 --> 00:59:19,340 1, 2 from a 2, 1. 1132 00:59:19,340 --> 00:59:21,690 But I don't care because I now know the 1133 00:59:21,690 --> 00:59:24,450 probability of each of them. 1134 00:59:24,450 --> 00:59:30,850 What I'm trying to say by this is that this is a very, very 1135 00:59:30,850 --> 00:59:36,100 trivial example of where you really have to think through 1136 00:59:36,100 --> 00:59:40,000 the question of what kind of mathematical model do you want 1137 00:59:40,000 --> 00:59:44,220 of the most simple situation you can think of almost. 1138 00:59:44,220 --> 00:59:48,290 When you combine two different experiments together and you 1139 00:59:48,290 --> 00:59:52,930 lose distinguishability, then what do you do? 1140 00:59:52,930 --> 00:59:55,390 Well, the sensible thing to do is assume that the 1141 00:59:55,390 --> 00:59:58,010 distinguishability is still there, but it's not 1142 00:59:58,010 --> 00:59:59,260 observable. 1143 01:00:01,120 --> 01:00:05,760 But that makes it hard to make a correspondence between the 1144 01:00:05,760 --> 01:00:08,070 real world and the probability world. 1145 01:00:08,070 --> 01:00:10,660 So we'll come back to that later. 1146 01:00:10,660 --> 01:00:13,770 But for the most part, you don't have to worry about it 1147 01:00:13,770 --> 01:00:15,810 because this is something you've dealt 1148 01:00:15,810 --> 01:00:17,060 with all of your lives. 1149 01:00:19,710 --> 01:00:22,340 I mean, you've done probability 1150 01:00:22,340 --> 01:00:24,730 problems with dice. 1151 01:00:24,730 --> 01:00:27,200 You've done probability problems with all sorts of 1152 01:00:27,200 --> 01:00:29,140 other things where things are 1153 01:00:29,140 --> 01:00:31,230 indistinguishable from each other.
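The two-dice discussion above can be checked with a small enumeration. This is a minimal sketch, assuming fair dice; the names `ordered`, `unordered`, and `p_low_low` are hypothetical and chosen only for illustration. It builds the 36-point model for distinguishable dice, collapses it to the 21 observable outcomes, and recovers the 1/36, 1/18, and 1/9 figures mentioned in the lecture.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Ordered model: pretend the dice are distinguishable (red, white).
# Independence plus symmetry gives each ordered pair probability 1/6 * 1/6.
ordered = {(r, w): Fraction(1, 6) * Fraction(1, 6)
           for r, w in product(faces, faces)}

# Unordered model: what is observable with two white dice.
# Each ordered pair collapses onto its sorted version.
unordered = defaultdict(Fraction)
for (r, w), p in ordered.items():
    unordered[tuple(sorted((r, w)))] += p

# Event from the red/white example: red die in {1, 2} and white die in {1, 2}.
p_low_low = sum(p for (r, w), p in ordered.items() if r <= 2 and w <= 2)

print(len(ordered), len(unordered))          # 36 21
print(unordered[(1, 1)], unordered[(1, 2)])  # 1/36 1/18
print(p_low_low)                             # 1/9
print(sum(unordered.values()))               # 1
```

The unequal probabilities in the 21-point model are why the 36-point model, with its unobservable ordering, is usually the easier one to reason with.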
1154 01:00:31,230 --> 01:00:36,140 And after doing a few of these problems, you are used to 1155 01:00:36,140 --> 01:00:38,630 being schizophrenic about it. 1156 01:00:38,630 --> 01:00:41,300 And on one hand, thinking that these things are 1157 01:00:41,300 --> 01:00:43,090 distinguishable to figure out what all the 1158 01:00:43,090 --> 01:00:44,600 probabilities are. 1159 01:00:44,600 --> 01:00:47,460 And then you go back to saying, well, they aren't 1160 01:00:47,460 --> 01:00:51,600 really distinguishable, and you find the right answer. 1161 01:00:51,600 --> 01:00:54,010 So you don't have to worry about it. 1162 01:00:54,010 --> 01:00:59,390 All I'm trying to say here is that you should understand it. 1163 01:00:59,390 --> 01:01:03,460 Because when you get the complicated situations, this 1164 01:01:03,460 --> 01:01:07,540 is one of the main things which will cause confusion. 1165 01:01:07,540 --> 01:01:10,260 It's one of the main things where people write papers and 1166 01:01:10,260 --> 01:01:14,180 other people say that paper is wrong because they're both 1167 01:01:14,180 --> 01:01:18,420 thinking of different models for it. 1168 01:01:18,420 --> 01:01:23,130 Important thing is if you satisfy Kolmogorov's axioms in 1169 01:01:23,130 --> 01:01:27,140 each of a set of models, and most important thing is where 1170 01:01:27,140 --> 01:01:30,200 each of these models are exactly the same. 1171 01:01:30,200 --> 01:01:34,080 And then you make them each independent of each other, 1172 01:01:34,080 --> 01:01:36,960 Kolmogorov's axioms are going to be satisfied for the 1173 01:01:36,960 --> 01:01:40,950 combination, as well as the individual model. 1174 01:01:40,950 --> 01:01:44,030 Why do I care about that? 1175 01:01:44,030 --> 01:01:46,600 Because we're studying stochastic processes. 1176 01:01:46,600 --> 01:01:51,370 We're studying an infinite sequence of random variables. 
1177 01:01:51,370 --> 01:01:56,890 And I don't want to generate a complete probability model for 1178 01:01:56,890 --> 01:01:59,660 an infinite set of random variables every time I talk 1179 01:01:59,660 --> 01:02:02,410 about an infinite set of random variables. 1180 01:02:02,410 --> 01:02:04,850 If I'm talking about an infinite sequence of flipping 1181 01:02:04,850 --> 01:02:10,040 a coin, I want to do what you do, which is say, for each 1182 01:02:10,040 --> 01:02:16,700 coin, the coin is equiprobably a head or a tail. 1183 01:02:16,700 --> 01:02:19,830 And the coin tosses are independent of each other. 1184 01:02:19,830 --> 01:02:23,070 And I want to know that I can go from that to thinking about 1185 01:02:23,070 --> 01:02:24,060 this sequence. 1186 01:02:24,060 --> 01:02:26,990 Strange things will happen in these sequences when 1187 01:02:26,990 --> 01:02:28,250 we go to the limit. 1188 01:02:28,250 --> 01:02:32,720 But still, we don't want to have to worry about a model 1189 01:02:32,720 --> 01:02:34,770 for the whole infinite sequence. 1190 01:02:34,770 --> 01:02:39,650 So that's one of the things we should deal with. 1191 01:02:39,650 --> 01:02:40,900 Finally, random variables. 1192 01:02:44,330 --> 01:02:45,580 Definition. 1193 01:02:47,950 --> 01:02:51,810 Three years ago, I taught this course and I asked people to 1194 01:02:51,810 --> 01:02:55,880 write down definition of what a random variable was. 1195 01:02:55,880 --> 01:03:00,140 And almost no one really had any idea of what it was. 1196 01:03:00,140 --> 01:03:01,870 They said it was something that had a probability 1197 01:03:01,870 --> 01:03:06,620 density, or something that had a probability mass function, 1198 01:03:06,620 --> 01:03:10,390 or something that had a distribution function, or 1199 01:03:10,390 --> 01:03:12,570 something like that. 
1200 01:03:12,570 --> 01:03:17,190 What it is, if you want to get a definition which fits in 1201 01:03:17,190 --> 01:03:20,640 with the axioms, the only thing we know from the axioms 1202 01:03:20,640 --> 01:03:22,750 is there's a sample space. 1203 01:03:22,750 --> 01:03:25,500 There are events and there are probabilities. 1204 01:03:25,500 --> 01:03:29,910 So a random variable, what it really is, is it's a function 1205 01:03:29,910 --> 01:03:37,370 from the set of sample points to the set of real values. 1206 01:03:37,370 --> 01:03:41,150 And as you get into this, you will realize that the set of 1207 01:03:41,150 --> 01:03:43,300 real values does not include minus 1208 01:03:43,300 --> 01:03:45,250 infinity or plus infinity. 1209 01:03:45,250 --> 01:03:47,800 It says that every sample point gets mapped into a 1210 01:03:47,800 --> 01:03:49,800 finite value. 1211 01:03:49,800 --> 01:03:52,630 This happens, of course, when you flip a coin. 1212 01:03:52,630 --> 01:03:54,560 Well, flipping a coin, the outcome is 1213 01:03:54,560 --> 01:03:56,140 not a random variable. 1214 01:03:56,140 --> 01:03:59,440 But you'd like to make it a random variable, so you say, 1215 01:03:59,440 --> 01:04:02,810 OK, I'm going to model tails as 0 and heads 1216 01:04:02,810 --> 01:04:05,640 as 1, or vice versa. 1217 01:04:05,640 --> 01:04:07,200 And then what happens? 1218 01:04:07,200 --> 01:04:11,860 Your model for coin tossing, a sequence of coin tosses 1219 01:04:11,860 --> 01:04:17,550 becomes the same as your model for data. 1220 01:04:17,550 --> 01:04:20,690 So that what you know about coin tossing, you can apply to 1221 01:04:20,690 --> 01:04:22,920 data compression. 1222 01:04:22,920 --> 01:04:27,040 You see, when you think about these things mathematically, 1223 01:04:27,040 --> 01:04:29,500 then you can make all sorts of connections you 1224 01:04:29,500 --> 01:04:31,250 couldn't make otherwise. 
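The definition above, a random variable as a function from sample points to real values, can be sketched for a single coin toss. This is an illustrative sketch assuming a fair coin; the names `prob`, `X`, and `prob_X_le` are hypothetical. It uses the tails-as-0, heads-as-1 mapping mentioned in the lecture and computes the probability of the set of sample points mapped at or below a given real number.

```python
from fractions import Fraction

# Sample space for one coin toss, with a probability for each sample point.
prob = {"tails": Fraction(1, 2), "heads": Fraction(1, 2)}

def X(omega):
    """Random variable: maps each sample point to a finite real number."""
    return {"tails": 0, "heads": 1}[omega]

def prob_X_le(a):
    """Probability of the event {omega : X(omega) <= a}."""
    return sum(p for omega, p in prob.items() if X(omega) <= a)

print(prob_X_le(-1), prob_X_le(0), prob_X_le(0.5), prob_X_le(1))  # 0 1/2 1/2 1
```

Note that the probabilities are attached to sample points and events, not to `X` itself; the numbers printed are derived from the underlying model.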
1225 01:04:31,250 --> 01:04:37,010 So random variables have to satisfy the constraint that-- 1226 01:04:45,670 --> 01:04:51,340 they have to satisfy the constraint that the set of 1227 01:04:51,340 --> 01:04:56,970 sample points, such that x of omega, which is a real 1228 01:04:56,970 --> 01:05:01,080 number, is less than or equal to some given real number A. 1229 01:05:01,080 --> 01:05:04,050 That this set has to be an event. 1230 01:05:04,050 --> 01:05:05,680 Because those are the only things that have 1231 01:05:05,680 --> 01:05:07,310 probabilities. 1232 01:05:07,310 --> 01:05:09,970 So if we want to be able to talk about the probabilities 1233 01:05:09,970 --> 01:05:14,080 of these random variables lying in certain ranges, or 1234 01:05:14,080 --> 01:05:16,850 things like this, or having PMFs, or anything that you 1235 01:05:16,850 --> 01:05:19,750 like to do, you need this constraint on it. 1236 01:05:19,750 --> 01:05:24,470 It's an event for all A in the set of real numbers. 1237 01:05:24,470 --> 01:05:29,790 Also, if this set of things here 1238 01:05:29,790 --> 01:05:31,100 are each random variables. 1239 01:05:31,100 --> 01:05:33,900 In other words, if each of them are functions from the 1240 01:05:33,900 --> 01:05:42,600 sample space to the real line, then the set of omega such 1241 01:05:42,600 --> 01:05:46,620 that x1 of omega is less than or equal to A1, up to xn of 1242 01:05:46,620 --> 01:05:50,940 omega is less than or equal to An is an event also. 1243 01:05:50,940 --> 01:05:54,860 You might recognize this as the distribution function, the 1244 01:05:54,860 --> 01:05:58,190 joint distribution function for n random variables. 1245 01:05:58,190 --> 01:06:01,920 You might recognize this as the distribution function 1246 01:06:01,920 --> 01:06:05,830 evaluated at A for a single random variable. 1247 01:06:05,830 --> 01:06:12,770 So you define a random variable.
1248 01:06:12,770 --> 01:06:14,750 And what we're doing here is-- 1249 01:06:14,750 --> 01:06:17,880 it's kind of funny because we already have these axioms. 1250 01:06:17,880 --> 01:06:20,770 But now when we want to define things in the context of these 1251 01:06:20,770 --> 01:06:24,905 axioms, we need extra things in the definitions. 1252 01:06:28,840 --> 01:06:32,840 This is a distribution function. The distribution 1253 01:06:32,840 --> 01:06:35,990 function of the random variable X is the probability 1254 01:06:35,990 --> 01:06:39,860 that the random variable X is less than or equal to x, which 1255 01:06:39,860 --> 01:06:45,850 means that X is a mapping from omega into real numbers. 1256 01:06:45,850 --> 01:06:48,670 It says that with this mapping here, you're mapping this 1257 01:06:48,670 --> 01:06:51,730 whole sample space into the real line. 1258 01:06:51,730 --> 01:06:55,690 Some omegas get mapped into things less than or equal to a 1259 01:06:55,690 --> 01:06:56,930 real number x. 1260 01:06:56,930 --> 01:06:59,510 Some of them get mapped into things greater than 1261 01:06:59,510 --> 01:07:01,590 the real number x. 1262 01:07:01,590 --> 01:07:04,820 And the set that gets mapped into something less than or 1263 01:07:04,820 --> 01:07:08,400 equal to x, according to the definition of a random 1264 01:07:08,400 --> 01:07:10,230 variable, has to be an event. 1265 01:07:10,230 --> 01:07:13,000 Therefore, it has to have a probability. 1266 01:07:13,000 --> 01:07:18,390 And these probabilities increase as we go. 1267 01:07:18,390 --> 01:07:24,440 It is totally immaterial for all purposes whether we have a 1268 01:07:24,440 --> 01:07:28,690 less than or equal to here or a less than here. 1269 01:07:28,690 --> 01:07:32,450 And everyone follows the convention of using a less 1270 01:07:32,450 --> 01:07:36,420 than or equal to here rather than a less than here.
1271 01:07:36,420 --> 01:07:41,360 The importance of that is that when you look at a 1272 01:07:41,360 --> 01:07:44,240 distribution function, the distribution 1273 01:07:44,240 --> 01:07:46,820 function often has jumps. 1274 01:07:46,820 --> 01:07:53,100 And the distribution will have a jump whenever there's a 1275 01:07:53,100 --> 01:07:57,130 nonzero probability that the random variable takes on a 1276 01:07:57,130 --> 01:08:00,270 particular value x here. 1277 01:08:00,270 --> 01:08:04,160 It takes on this particular value with something more than 1278 01:08:04,160 --> 01:08:05,520 zero probability here. 1279 01:08:05,520 --> 01:08:09,330 If you have a probability density for a random variable, 1280 01:08:09,330 --> 01:08:12,680 this curve just moves up continuously. 1281 01:08:12,680 --> 01:08:14,400 And the derivative of this curve is 1282 01:08:14,400 --> 01:08:16,310 the probability density. 1283 01:08:16,310 --> 01:08:19,970 If you have a probability mass function, this is a staircase 1284 01:08:19,970 --> 01:08:21,220 type of function. 1285 01:08:24,510 --> 01:08:27,279 The fact that we define the distribution 1286 01:08:27,279 --> 01:08:31,410 function with a less than or equal to rather than a less 1287 01:08:31,410 --> 01:08:37,819 than means that in every one of these jumps, the value here 1288 01:08:37,819 --> 01:08:40,439 is the upper value of the jump. 1289 01:08:40,439 --> 01:08:45,180 Value here is the upper value of the jump, and so forth. 1290 01:08:45,180 --> 01:08:46,500 Now, I'm going to-- 1291 01:08:54,616 --> 01:08:57,649 I've already said half of this. 1292 01:08:57,649 --> 01:09:02,120 If X maps only into a finite or countable set of values, 1293 01:09:02,120 --> 01:09:03,370 it's discrete. 1294 01:09:05,350 --> 01:09:08,139 And it has a probability mass function-- 1295 01:09:08,139 --> 01:09:09,880 this notation.
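The staircase picture and the "upper value of the jump" remark can be checked numerically. The following is a minimal sketch, not something shown in the lecture: the fair six-sided die and the names `pmf`, `cdf`, and `cdf_strict` are assumptions chosen purely for illustration.

```python
from fractions import Fraction

# Hypothetical PMF for illustration: a fair six-sided die.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(x):
    """F(x) = Pr{X <= x}: add the PMF over all values not exceeding x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

def cdf_strict(x):
    """Pr{X < x}: the alternative convention with a strict inequality."""
    return sum((p for v, p in pmf.items() if v < x), Fraction(0))

# At a jump point the "<=" convention gives the upper value of the jump,
# the "<" convention gives the lower value, and the gap is the PMF there.
print(cdf(3))                  # 1/2
print(cdf_strict(3))           # 1/3
print(cdf(3) - cdf_strict(3))  # 1/6

# Between jump points the staircase is flat and the two conventions agree.
print(cdf(2.5) == cdf_strict(2.5))  # True
```

The two functions differ only at the points that carry nonzero probability, which is why choosing between "less than or equal to" and "less than" is a matter of convention rather than substance.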
1296 01:09:09,880 --> 01:09:13,640 If the derivative exists, then you say that the random 1297 01:09:13,640 --> 01:09:18,370 variable is continuous and it has a density. 1298 01:09:18,370 --> 01:09:22,319 And most problems that you do in probability theory, you're 1299 01:09:22,319 --> 01:09:23,689 dealing with random variables. 1300 01:09:23,689 --> 01:09:25,979 And they either have a probability mass function if 1301 01:09:25,979 --> 01:09:28,970 they're discrete or they have a density if they're 1302 01:09:28,970 --> 01:09:30,140 continuous. 1303 01:09:30,140 --> 01:09:33,689 And this is just saying some things are one way, some 1304 01:09:33,689 --> 01:09:36,069 things are the other way. 1305 01:09:36,069 --> 01:09:37,529 And some things are neither. 1306 01:09:37,529 --> 01:09:40,180 And we'll see lots of things that are neither. 1307 01:09:40,180 --> 01:09:43,330 And you need the distribution function to talk about things 1308 01:09:43,330 --> 01:09:44,529 that are neither. 1309 01:09:44,529 --> 01:09:46,770 We will find that the distribution function, which 1310 01:09:46,770 --> 01:09:50,120 you've hardly ever used in the past, is extraordinarily 1311 01:09:50,120 --> 01:09:55,440 important, both for theoretical purposes and for 1312 01:09:55,440 --> 01:09:56,230 other purposes. 1313 01:09:56,230 --> 01:10:02,080 You really need that as a way of solving problems, as well 1314 01:10:02,080 --> 01:10:06,040 as keeping yourself out of trouble. 1315 01:10:06,040 --> 01:10:10,530 For every random variable, the distribution function exists. 1316 01:10:10,530 --> 01:10:11,580 Why? 1317 01:10:11,580 --> 01:10:13,990 Anybody know why this has to exist for 1318 01:10:13,990 --> 01:10:15,240 every random variable? 1319 01:10:21,450 --> 01:10:22,215 Yeah. 1320 01:10:22,215 --> 01:10:23,670 AUDIENCE: Because the [INAUDIBLE]. 1321 01:10:31,230 --> 01:10:31,740 PROFESSOR: Yes. 1322 01:10:31,740 --> 01:10:35,470 Because we insisted that it did. 
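As a small illustration of a random variable that is neither discrete nor continuous, consider the sketch below. The example is an assumption made for illustration and does not come from the lecture: X equals 0 with probability 1/2 and is otherwise uniform on (0, 1). The names `mixed_cdf` and `jump_at` are likewise hypothetical.

```python
def mixed_cdf(x: float) -> float:
    """Distribution function of the assumed mixed random variable:
    a point mass of 1/2 at 0 plus a uniform(0, 1) part of weight 1/2."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5 + 0.5 * x
    return 1.0

def jump_at(x: float, eps: float = 1e-9) -> float:
    """Approximate jump size F(x) - F(x - eps), i.e. Pr{X = x}."""
    return mixed_cdf(x) - mixed_cdf(x - eps)

# The only jump is at 0, and it carries probability 1/2, so no PMF can
# account for all of the probability (the point masses sum to 1/2, not 1).
print(jump_at(0.0))
# Away from 0 the function rises continuously, but the jump at 0 means
# no density describes it either. The distribution function handles both.
print(jump_at(0.5))
print(mixed_cdf(-10.0), mixed_cdf(0.0), mixed_cdf(0.5), mixed_cdf(10.0))
```

The distribution function is defined everywhere for this random variable even though neither a probability mass function nor a density describes it completely.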
1323 01:10:35,470 --> 01:10:40,520 Namely, we insisted that this event actually was an event 1324 01:10:40,520 --> 01:10:42,990 for all little x. 1325 01:10:42,990 --> 01:10:46,410 That's part of the definition. 1326 01:10:46,410 --> 01:10:51,560 So, in fact, when you do these things a little more carefully 1327 01:10:51,560 --> 01:10:56,810 than you might be used to, the definition implies that the 1328 01:10:56,810 --> 01:11:01,790 distribution function always exists. 1329 01:11:01,790 --> 01:11:06,110 As a more real-world kind of argument, we now have a way of 1330 01:11:06,110 --> 01:11:09,680 dealing with things that are discrete, and continuous, and 1331 01:11:09,680 --> 01:11:14,530 mixed continuous and discrete, and anything else that you 1332 01:11:14,530 --> 01:11:19,095 might think of because the definition restricts it. 1333 01:11:19,095 --> 01:11:23,310 Now, one other thing. 1334 01:11:23,310 --> 01:11:25,185 How do I know that this starts at 0? 1335 01:11:29,650 --> 01:11:32,680 That's a more complicated thing. 1336 01:11:32,680 --> 01:11:37,820 And I'm not even going to do it in detail. 1337 01:11:37,820 --> 01:11:43,370 But since every omega maps into a finite number, you 1338 01:11:43,370 --> 01:11:47,300 can't have a jump down here at minus infinity. 1339 01:11:47,300 --> 01:11:51,280 And you can't have a jump here at plus infinity. 1340 01:11:51,280 --> 01:11:54,400 Because omegas don't map into plus 1341 01:11:54,400 --> 01:11:55,910 infinity or minus infinity. 1342 01:11:55,910 --> 01:11:58,760 So you have to start down here at 0. 1343 01:11:58,760 --> 01:12:00,510 You have to climb up here to 1. 1344 01:12:00,510 --> 01:12:01,870 You might never reach 1. 1345 01:12:01,870 --> 01:12:05,610 You might reach it only as a limit, but you have to reach 1346 01:12:05,610 --> 01:12:07,940 it as a limit. 1347 01:12:07,940 --> 01:12:09,002 Yes? 1348 01:12:09,002 --> 01:12:11,312 AUDIENCE: In the first paragraph, [INAUDIBLE]. 
1349 01:12:19,760 --> 01:12:21,790 PROFESSOR: If we have a sequence of [INAUDIBLE], yeah. 1350 01:12:21,790 --> 01:12:25,230 AUDIENCE: It's [INAUDIBLE]. 1351 01:12:25,230 --> 01:12:27,180 PROFESSOR: You are probably right. 1352 01:12:27,180 --> 01:12:29,820 Yes. 1353 01:12:29,820 --> 01:12:33,030 Well, I don't know. I didn't think about that one. 1354 01:12:33,030 --> 01:12:38,320 I don't think you're right, but we can argue about it. 1355 01:12:38,320 --> 01:12:41,130 But anyway, this has to start at 0. 1356 01:12:41,130 --> 01:12:42,380 It has to go up to 1. 1357 01:12:44,860 --> 01:12:50,400 OK, we did this. 1358 01:12:50,400 --> 01:12:54,150 Now, I'm going to go through a theoretical nitpick for the 1359 01:12:54,150 --> 01:12:56,960 last five minutes of the class. 1360 01:12:56,960 --> 01:12:59,340 Anyone who doesn't like theoretical nitpicks, you're 1361 01:12:59,340 --> 01:13:03,380 welcome to either go to sleep for five minutes, or you're 1362 01:13:03,380 --> 01:13:06,030 welcome to go out and get a cup of coffee, or whatever you 1363 01:13:06,030 --> 01:13:07,090 want to do. 1364 01:13:07,090 --> 01:13:10,230 I will do this to you occasionally. 1365 01:13:10,230 --> 01:13:15,260 And I realize it's almost torture for some of you, 1366 01:13:15,260 --> 01:13:20,890 because I want to get you used to thinking about how 1367 01:13:20,890 --> 01:13:26,590 relatively obvious things actually get proven. 1368 01:13:26,590 --> 01:13:29,650 I want to increase your ability to prove things. 1369 01:13:29,650 --> 01:13:34,010 The general statement about proving things, or at least 1370 01:13:34,010 --> 01:13:38,000 the way I prove things, is not the way most mathematicians 1371 01:13:38,000 --> 01:13:38,750 prove things. 1372 01:13:38,750 --> 01:13:41,440 Most mathematicians prove things by starting out with 1373 01:13:41,440 --> 01:13:46,190 axioms and going step by step until they get to what they're 1374 01:13:46,190 --> 01:13:48,340 trying to prove.
1375 01:13:48,340 --> 01:13:53,020 Now, every time I talk to a good mathematician, I find out 1376 01:13:53,020 --> 01:13:56,660 that's what they write down when they prove something, but 1377 01:13:56,660 --> 01:14:00,140 that's not the way they think about it at all. 1378 01:14:00,140 --> 01:14:04,540 All of us-- engineers, businesspeople, everyone-- 1379 01:14:04,540 --> 01:14:06,470 think about problems in a different way. 1380 01:14:06,470 --> 01:14:11,080 If we're trying to prove something, we first give a 1381 01:14:11,080 --> 01:14:13,770 really half-assed proof of it. 1382 01:14:13,770 --> 01:14:16,580 And after we do that, we look at it and we say, well, I 1383 01:14:16,580 --> 01:14:19,570 don't see why this is true and I don't see why that's true. 1384 01:14:19,570 --> 01:14:22,390 And then you go back and you patch these things up. 1385 01:14:22,390 --> 01:14:24,020 And then after you patch things up, it 1386 01:14:24,020 --> 01:14:25,290 starts to look ugly. 1387 01:14:25,290 --> 01:14:27,830 So you go back and do it a nicer way. 1388 01:14:27,830 --> 01:14:30,190 And you go back and forth and back and forth and back and 1389 01:14:30,190 --> 01:14:37,070 forth, using both contradiction and implication. 1390 01:14:37,070 --> 01:14:39,270 You use both of them. 1391 01:14:39,270 --> 01:14:42,750 Now, when you're proving things in this class, I don't 1392 01:14:42,750 --> 01:14:46,620 care whether you make it look like you're a formal 1393 01:14:46,620 --> 01:14:48,040 mathematician or not. 1394 01:14:48,040 --> 01:14:51,230 I would just as soon you didn't pretend you were a formal 1395 01:14:51,230 --> 01:14:52,560 mathematician. 1396 01:14:52,560 --> 01:14:56,870 I would like to see you prove things in such a way that it 1397 01:14:56,870 --> 01:15:00,890 is at least difficult to poke a hole in your argument.
1398 01:15:00,890 --> 01:15:03,620 In other words, I would like you to give an argument which 1399 01:15:03,620 --> 01:15:06,590 you've thought about enough that there aren't obvious 1400 01:15:06,590 --> 01:15:08,380 counterexamples to it. 1401 01:15:08,380 --> 01:15:10,940 And if you learn to do that, you're well on your way to 1402 01:15:10,940 --> 01:15:16,080 learning to use this theory in a way where you can actually 1403 01:15:16,080 --> 01:15:20,020 come up with correct answers. 1404 01:15:20,020 --> 01:15:23,090 And in fact, I'm not going to go through this proof at all. 1405 01:15:23,090 --> 01:15:24,830 And I don't think I really wanted to. 1406 01:15:24,830 --> 01:15:26,440 I just did it because-- 1407 01:15:26,440 --> 01:15:27,960 well, I think it's something you ought to read. 1408 01:15:31,180 --> 01:15:34,410 It is important to learn to prove things because when you 1409 01:15:34,410 --> 01:15:38,800 get to complicated systems, you cannot see your way 1410 01:15:38,800 --> 01:15:41,000 through them intuitively. 1411 01:15:41,000 --> 01:15:44,010 And if you can't see your way through it intuitively, you 1412 01:15:44,010 --> 01:15:47,170 need to understand something about how to prove things, and 1413 01:15:47,170 --> 01:15:50,320 you need to put all the techniques of proving things 1414 01:15:50,320 --> 01:15:53,920 that you learned together with all the techniques that you've 1415 01:15:53,920 --> 01:15:56,390 learned for doing things intuitively. 1416 01:15:56,390 --> 01:15:59,410 And you need to know how to put them together. 1417 01:15:59,410 --> 01:16:02,170 If you're stuck dealing only with things that are 1418 01:16:02,170 --> 01:16:05,100 intuitive, or things that you learned in high school like 1419 01:16:05,100 --> 01:16:11,660 calculus, then you really can't deal with complicated 1420 01:16:11,660 --> 01:16:12,960 systems very well. 1421 01:16:12,960 --> 01:16:16,270 OK, I'm going to end at that point.
1422 01:16:16,270 --> 01:16:19,040 You can read this theoretical nitpick if you want, 1423 01:16:19,040 --> 01:16:21,100 and play with it. 1424 01:16:21,100 --> 01:16:22,350 And we'll go on next time.