The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: So we started by talking about thermodynamics, and then switched to talking about probability. And you may well ask, what's the connection between these? We will eventually try to build that connection through statistical physics. And maybe this lecture today will show you why these elements of probability are important and essential to making that bridge.

So last time, I started with the Central Limit Theorem, which pertains to adding lots of variables together to form a sum, X = x_1 + x_2 + ... + x_N. And the control parameter that we will use is N, the number of terms in the sum.

In principle, there is a joint PDF that determines how these variables are distributed. And using that, we can calculate various characteristics of this sum. If I were to raise the sum to some power m, I could do that as a sum over i_1 running from 1 to N, i_2 running from 1 to N, and so on up to i_m running from 1 to N, of x_{i_1} x_{i_2} ... x_{i_m}-- basically, I multiplied m copies of the original sum together. And if I were to calculate some moment of this, the moment of a sum is the sum of the moments, so I could do this term by term.

Now the last thing that we did last time was to relate the characteristic function for the sum to the characteristic function of this joint probability distribution, and conclude that exactly the same relation holds if I put the index c for a cumulant: ⟨X^m⟩_c is the sum over i_1, ..., i_m of the joint cumulants ⟨x_{i_1} ... x_{i_m}⟩_c. That is, the mean is the sum of the means, the variance is the sum of all possible variances and covariances, and this holds to all orders. OK? Fine. So where do we go from here?
We are going to gradually simplify the problem in order to get the final result that we want. But that result is eventually a little bit more general than the simplifications. The first simplification is to look at independent variables. What happens when we have independent variables is that the probability distribution can be written as a product of probability distributions pertaining to the individual ones: I would have a p_1 acting on x_1, a p_2 acting on x_2, up to a p_N acting on x_N.

Now, when we did that, we saw that one of the conditions that follows-- if we Fourier transform and then expand in powers of k-- is that we never get, in the expansion of the log, terms that couple different k's. Essentially, all of the joint cumulants involving anything other than one variable by itself vanish. So in that limit, the only terms in this sum that survive are the ones in which all of the indices are the same. So in that case, I would write this as ⟨X^m⟩_c = sum over i from 1 to N of ⟨x_i^m⟩_c. So for independent variables, the variance is the sum of the variances, the third cumulant is the sum of the third cumulants, et cetera.

One more simplification-- again, not necessary for the final result that we have in mind-- let's just assume that all of these are identically distributed. By that I mean that it is the same probability distribution that I would use for each one of them. So this I could write as a product over i from 1 to N of the same p for each x_i. Just to introduce some notation that you may see every now and then: variables that are independent and identically distributed are sometimes called IIDs.

And if I focus my attention on these IIDs, then all of these terms are clearly the same. And the answer would simply be N times the cumulant that I would have for one of them: ⟨X^m⟩_c = N ⟨x^m⟩_c.
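[To see this relation in numbers-- a minimal sketch, not from the lecture, assuming unit-rate exponential variables, whose m-th cumulant is (m − 1)!, and using scipy's unbiased cumulant (k-statistic) estimator:]

```python
# Check <X^m>_c = N <x^m>_c for a sum of IID variables.
import math
import numpy as np
from scipy.stats import kstat  # unbiased cumulant estimator, m = 1..4

rng = np.random.default_rng(0)
N = 50              # number of terms in each sum
trials = 100_000    # independent realizations of the sum X

# IID terms: exponential with unit rate, so <x^m>_c = (m-1)!
x = rng.exponential(scale=1.0, size=(trials, N))
X = x.sum(axis=1)

for m in (1, 2, 3):
    estimated = kstat(X, m)
    expected = N * math.factorial(m - 1)   # N times the single-variable cumulant
    print(f"m={m}: estimated {estimated:8.2f}, expected {expected}")
```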
Actually, some version of this we already saw for the binomial distribution, in which the same coin, let's say, was thrown N independent times, and all of the cumulants for the number of heads were related to the cumulants of a single trial. OK? So fine-- nothing surprising so far.

However, let's now imagine that I construct a variable that I will call y: from the variable X, this sum that I have, I subtract N times the mean, and then I divide by the square root of N, so y = (X − N⟨x⟩)/√N. I can certainly choose to do so. Then what we observe is that the average of y, by this construction, is 0, because I made sure that the average of X is subtracted. No problem.

Average of y squared-- not the average of y squared, but the variance: it is easy to show that the variance doesn't depend on the subtraction. It is the variance of X divided by the square of that √N, so I will have ⟨y²⟩_c = ⟨X²⟩_c / N. And the big-X squared cumulant, according to this rule, is N times the small-x squared cumulant. So I get ⟨y²⟩_c = ⟨x²⟩_c. Still nothing interesting.

But now let's look at the m-th cumulant, ⟨y^m⟩_c, for m greater than 2. What do I get? I will get N times ⟨x^m⟩_c divided by N^{m/2}. The N^{m/2} came from raising the √N to the power m, since I'm looking at y to the m; and ⟨X^m⟩_c, according to this, is N times the cumulant of a single variable. Now we see that this is proportional to N^{1 − m/2}. And since I chose m greater than 2, in the limit that N becomes much, much larger than 1, this goes to 0.
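[Again as an illustrative aside, one can watch these scaled cumulants die off. The sketch below assumes unit-rate exponential variables-- an arbitrary choice, convenient because their sum is exactly Gamma(N, 1), so X can be sampled directly-- and checks that ⟨y³⟩_c decays like 2/√N:]

```python
# The third cumulant of y = (X - N<x>)/sqrt(N) should scale as N^(1 - 3/2) = N^(-1/2).
import numpy as np
from scipy.stats import kstat

rng = np.random.default_rng(1)
trials = 400_000
for N in (10, 100, 1000):
    X = rng.gamma(shape=N, scale=1.0, size=trials)   # X = x_1 + ... + x_N exactly
    y = (X - N * 1.0) / np.sqrt(N)                   # subtract N<x>=N, divide by sqrt(N)
    print(f"N={N:5d}: <y^3>_c = {kstat(y, 3):+.4f}, "
          f"predicted 2/sqrt(N) = {2 / np.sqrt(N):.4f}")
```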
So if I look at the limit where the number of terms in the sum is much larger than 1, what I conclude is that the probability distribution for this variable that I have constructed has zero mean, a finite variance, and all of the higher-order cumulants asymptotically vanish. So I know that the probability of y-- this variable that I constructed up there-- is given by the one distribution that we know is completely characterized by its first and second cumulants, which is the Gaussian: it is the exponential of minus y squared over twice its variance, appropriately normalized. Essentially, this sum is Gaussian distributed.

And this result is true even for things that are not IIDs, so long as the sum over i_1, ..., i_m from 1 to N of the joint cumulants ⟨x_{i_1} ... x_{i_m}⟩_c grows, as N goes to infinity, strictly more slowly than N^{m/2}. So basically, what I want to ensure is that when I construct the analog of this quantity and divide by N^{m/2}, it asymptotically goes to 0. In the case of IIDs, the numerator goes like N. It could be that I have correlations among the variables, et cetera, so that there are other terms in the sum because of the correlations; but as long as the sum total of them asymptotically grows more slowly than N^{m/2}, the statement that the sum is Gaussian distributed is going to be valid. Yes?

AUDIENCE: Question-- how can you compare a value of [INAUDIBLE] with the number of variables that you [INAUDIBLE]? Because this is-- just, if, say, your random value is set [? in advance-- ?]

PROFESSOR: So basically, you choose a probability distribution-- at least in this case, it is obvious. In this case, what we want to know is that there is a probability distribution for the individual variables, and I repeat it many, many times. So it is like the coin: for the coin, I will ensure that I throw it hundreds of times.
Now suppose that, for some strange reason, if I throw the coin once, the next five times it is much more likely to come up the same as before-- some strange coin, or whatever. Then there is some correlation up to five throws. So when I'm calculating things up to five, there are all kinds of corrections over here. But as long as that five is independent of the length of the sequence-- if I throw things 1,000 times and still only groups of five are correlated-- then this result still holds, because I have the additional parameter N to play with. So I want to have a parameter N, which I can take to infinity, that is independent of what characterizes the distribution of my variable.

AUDIENCE: I was mainly concerned with the fact that you compare the cumulant, which has the same dimension as your random variable. So if my random variable is-- I measure length or something. I do it many, many times. Length is measured in meters, and you try to compare it to a number of measurements. So shouldn't there be some dimensionful constant on the right?

PROFESSOR: So here, this quantity has dimensions of meters to the m-th power, and this quantity has dimensions of meters to the m-th power. This quantity is dimensionless, right? So what I want is the N dependence to be such that when I go to large N, it goes to 0. It is true that this is still multiplying something that has-- so it is.

AUDIENCE: It's like less than something of order of N to the m/2? OK.

PROFESSOR: Oh, this is what you-- order. Thank you.

AUDIENCE: The last time [INAUDIBLE] cumulant [INAUDIBLE]?

PROFESSOR: Yes, thank you. Any other correction, clarification? OK. So again, we will see that in statistical physics we will always have to deal with some analog of this N-- like the number of molecules of gas in this room, et cetera-- that enables us to use something like this.
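[As a small numerical sketch of the whole statement-- here assuming the simplest case, a fair coin, so each x_i is 0 or 1 and ⟨x²⟩_c = 1/4-- the standardized sum reproduces the Gaussian interval probabilities:]

```python
# Check that y = (X - N/2)/sqrt(N) matches Gaussian +/-1 and +/-2 sigma fractions.
import numpy as np

rng = np.random.default_rng(2)
N, trials = 1000, 200_000
X = rng.binomial(N, 0.5, size=trials)      # sum of N fair-coin throws
y = (X - N * 0.5) / np.sqrt(N)             # zero mean, variance <x^2>_c = 1/4

sigma = 0.5
for k, gauss in ((1, 0.6827), (2, 0.9545)):
    frac = np.mean(np.abs(y) < k * sigma)
    print(f"P(|y| < {k} sigma) = {frac:.4f}   (Gaussian: {gauss})")
```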
I mean, it is clear that in this case I chose to subtract the mean and divide by N to the 1/2. But suppose I didn't have the division by N to the 1/2. I could have divided, for example, by N. Then my distribution, for something built out of independent variables with a well-defined mean, would have gone to something like a delta function in the limit of N going to infinity. But I changed my scale, dividing by N to the 1/2 rather than N, to emphasize that the scale of the fluctuations is of the order of the square root of N. This is again something that generically happens. So let's say we know the energy of the gas in this room to be proportional to the volume, or whatever; the amount of uncertainty that we have will be of the order of the square root of the volume.

So it's clear that we are building results that have to do with dependencies on N. So let's look at some other things that happen when we are dealing with a large number of degrees of freedom.

We've already spoken about intensive variables, such as temperature, pressure, et cetera. Their characteristic is that if we express them in terms of, say, the number of constituents, they are independent of that number-- as opposed to extensive quantities, such as the energy or the volume, et cetera, that are proportional to it. We can certainly imagine things that would increase [INAUDIBLE] polynomially, of order N to some power: if I have N molecules of gas and I ask how many pairs of interactions I have, you would say it's N(N − 1)/2, for example. That would be something like this.

But most importantly, when we deal with statistical physics, we will encounter quantities that have exponential dependence-- that is, they will be something like e to the N times some quantity that will appear shortly. An example of that is when we were, for example, calculating the phase space of gas particles. A gas particle by itself can be in a volume V.
Two of them, jointly, can occupy a volume V squared; three of them, V cubed, et cetera. Eventually you get V to the N for N particles. So that's the kind of exponential dependence: the V to the N that you would have for the joint configuration-space volume of N particles. OK?

So some curious things happen when you have these kinds of quantities. And one thing that you may not realize is what happens when you are summing exponentials. So let's imagine that I have a sum S composed of a number of terms, i running from 1 to script N-- script N is the number of terms in the sum-- that are of this exponential type. Let me write them as epsilon_i, where epsilon_i satisfies two conditions. One of them: it is positive. And the other is that it has this kind of exponential dependence-- it is of order e to the N phi_i, where there could be some prefactor or something else in front to give you dimensions, and stuff like that that you were discussing. And I assume that the number of terms is less than or of the order of some polynomial, N to some power p. OK?

Then my claim is that, in some sense, the sum S is the largest term. OK? So let's put this graphically. What I'm telling you is that we have a whole bunch of terms, these epsilon_i's. They're all positive, so I can indicate them by bars of different lengths-- this is epsilon_1, epsilon_2, all the way to epsilon of script N-- and let's say that this guy is the largest, epsilon_max. And my task is to add up the lengths of all of these things.

So in what sense do I claim that the sum is just the largest term? It's in the following sense. You would agree that the sum is certainly larger than the largest term, because I have added lots of other things to the largest term, and they are all positive.
I say, fine: what I'm going to do is raise the length of everybody else to be the same as epsilon_max. And then I would say that the sum is certainly less than this artificial sum where I have raised everybody to epsilon_max. OK? So S is bounded between epsilon_max and script N times epsilon_max.

So then what I will do is take the log of this expression, and it will be bounded by the log of epsilon_max and the log of script N times epsilon_max, which is the same thing as log of epsilon_max plus log of script N. And then I divide by N. And note that the conditions that I have set up are such that log of script N over N is of the order of p log N over N, and log N over N goes to 0 as N goes to infinity. So this sum is bounded on both sides by the same thing. What we've established is that the limit of log S over N, as N goes to infinity, is the same as the limit of log of epsilon_max over N-- which is what? If I say my epsilon_max has this exponential dependence, it is phi_max.
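[A quick numerical sketch of this bound-- the φ_i values below are made up, drawn uniformly from [0, 1], with a fixed number of terms:]

```python
# (1/N) log( sum_i e^{N phi_i} ) approaches phi_max as N grows.
import numpy as np

rng = np.random.default_rng(3)
phi = rng.uniform(0.0, 1.0, size=50)   # the phi_i's; script-N = 50 terms
for N in (10, 100, 1000):
    m = N * phi.max()
    # stable log-sum-exp: log S = N*phi_max + log(sum e^{N(phi_i - phi_max)})
    log_S = m + np.log(np.sum(np.exp(N * phi - m)))
    print(f"N={N:5d}: (log S)/N = {log_S / N:.4f}, phi_max = {phi.max():.4f}")
```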
And this, actually, is the reason for something that you have probably seen: that in statistical physics, using, let's say, a micro-canonical ensemble, where you say exactly what the energy is, or the canonical ensemble, where the energy can be all over the place, you get the same result. This is why.

Any questions on this? Everybody's happy, obviously. Good.

AUDIENCE: [INAUDIBLE] a question?

PROFESSOR: Yes.

AUDIENCE: The N on the end, [INAUDIBLE]?

PROFESSOR: There's a script N, which is the number of terms. And there's the Roman N, which is the parameter that is the analog of the number of degrees of freedom-- the one that we usually deal with in statistical physics would be, say, the number of particles.

AUDIENCE: So number of measurements [INAUDIBLE] number of particles.

PROFESSOR: Number of measurements?

AUDIENCE: So the script N is what?

PROFESSOR: The script N could be, for example-- if I'm summing over all pairs of interactions, the number of pairs would go like N squared. Now, in practice, in all cases that you will deal with, this p would be one, so the number of terms would be of the order of the number of degrees of freedom. We will see some examples of that later on.

AUDIENCE: [INAUDIBLE] script N might be N squared?

PROFESSOR: If I'm forced to come up with a situation where script N is N squared, I would say: count the number of pairs. The number of pairs, if I have N [? sites, ?] is N(N − 1)/2, which is something that goes like N squared over 2. Can I come up with a physical situation where I'm summing over that number of terms? Not obviously, but it could be something like that. The situations that we come up with in statistical physics-- let's say in going from the micro-canonical to the canonical ensemble-- typically have you summing over energy levels. And typically, in a system that is bounded, the number of energy levels is proportional to the number of particles.

Now there are cases, actually-- in going from micro-canonical to canonical, like the energy of the gas in this room-- where the energy axis goes all the way from 0 to infinity. So there is a continuous version of the summation procedure that is then usually applied, which in mathematics is called saddle point integration.

So basically there, rather than having to deal with a sum, I deal with an integral. The integration is over some variable, let's say x-- it could be energy, whatever. And then I have a quantity that has this exponential character, I = ∫ dx e^{N φ(x)}. And then again, in some specific sense, I can just look at the largest value and replace this with e^{N φ} evaluated at x_max. I should really write this as a proportionality, but we'll see what that means shortly. So it's basically the picture from above, now with a continuous variable.
And for this continuous variable, let's say I have to integrate a quantity that is e to the N phi-- so maybe not a sum, but an integral over a function such as this. And let's say this is the place where the maximum occurs, x_m.

So the procedure of saddle point is to expand phi around its maximum. Then I can write I as the integral over x of the exponential of N times phi evaluated at the maximum, plus corrections. Now, in a Taylor series, the next term would typically involve the first derivative; but around the maximum, the first derivative is 0. And since it is a maximum, the second derivative, phi double prime evaluated at this x_m, is negative-- that's why I indicate it as minus the absolute value-- multiplying (x − x_m) squared over 2 and a factor of N. And then there would be higher-order terms, (x − x_m) cubed, et cetera. Actually, what I will do is expand all of those higher-order pieces separately, as a factor of e to the N over 6, phi triple prime evaluated at x_m, times (x − x_m) cubed, and then the fourth-order term, and so forth. So basically there is a series such as this that I have to look at:

I = ∫ dx exp[ N φ(x_m) − (N/2) |φ''(x_m)| (x − x_m)² ] × exp[ (N/6) φ'''(x_m) (x − x_m)³ + ... ]

The first factor, e to the N phi at x_m, you can take outside the integral. And the integration of the Gaussian factor against 1 is simply a Gaussian integral: what I get is the square root of 2 pi divided by N |phi double prime|. So that's the first term taken care of.

Now the next term, the way that I have it: I'm integrating something that is third order against a weight that is symmetric, so that gives me 0. The next-order term, which is (x − x_m) to the fourth power-- you already know how to calculate averages of various powers with a Gaussian using Wick's theorem-- is related to the square of the variance, and the variance is essentially this quantity out here, of order 1/(N |phi double prime|). So I will get a correction that is of order 1 over N.
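[As an illustrative check of the leading formula I ≈ e^{Nφ(x_m)} √(2π/(N|φ''(x_m)|))-- the choice φ(x) = −cosh x below is arbitrary, picked because it has its maximum at x_m = 0 with φ(0) = −1, φ''(0) = −1, and the exact integral happens to be the Bessel function 2K₀(N):]

```python
# Compare the exact integral of e^{N phi(x)} with its saddle point estimate.
import numpy as np
from scipy.integrate import quad
from scipy.special import k0   # modified Bessel function K_0

for N in (5, 20, 100):
    exact, _ = quad(lambda x: np.exp(-N * np.cosh(x)), -20, 20)  # = 2*K_0(N)
    saddle = np.exp(-N) * np.sqrt(2 * np.pi / N)   # e^{N phi(x_m)} sqrt(2 pi / N|phi''|)
    # the ratio approaches 1 with a correction of order 1/N (here -1/(8N) + ...)
    print(f"N={N:3d}: exact {exact:.6e}  saddle {saddle:.6e}  ratio {exact/saddle:.4f}")
```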
So if you have sufficient energy, you can actually numerically calculate what this correction is, and the higher-order terms, et cetera. Yes?

AUDIENCE: Could you briefly remind us what the second term in the bracket means?

PROFESSOR: This? In the exponent, I would have N phi at the maximum, then N phi prime times the deviation-- let's call the deviation y. But phi prime is 0 at the maximum. So the next-order term is phi double prime y squared over 2, and the one after that is phi triple prime y cubed over 6. And e to the N phi triple prime y cubed over 6, I can expand as 1 plus N phi triple prime y cubed over 6-- which is what this is. And then you can go and do that with all of the other terms. Yes?

AUDIENCE: Can't you also then expand around a local maximum?

PROFESSOR: Excellent. Good. So you are saying, why didn't I expand around this other maximum, or that one? So let's do that-- x_m prime, x_m double prime-- so I have a series around the other maxima. The next one would be e to the N phi of x_m prime, times root 2 pi over N phi double prime at x_m prime, times 1 plus order of 1 over N; and then the next one, and so forth.

Now, we are interested in the limit where N goes to infinity, or N much, much larger than 1. In that limit, imagine plotting not e to the N phi but phi itself, and let's say these two maxima of phi differ by-- I don't know, 0.1, or 10 to the minus 4; it doesn't matter. I'm multiplying by N in the exponent, and then comparing two exponentials. So if this maximum was at 1, I would have here e to the N; if the other one was at 1 minus epsilon, over there I would have e to the N minus N epsilon. And so I can always ignore that compared to this. And so basically, this is the leading term.
And if I were to take its log and divide by N, what do I get? I will get phi of x_m. And then from this I would get something like minus 1/2 log of N phi double prime at x_m over 2 pi; and since I divided by N, this correction is of order 1 over N. And the next term would be of order 1 over N squared.

So systematically, in the large-N limit, there is a series for the quantity log I divided by N that starts with phi of x_m, and the subsequent terms you can calculate. Actually, I was kind of hesitant in writing this as asymptotically equal, because you may worry about the dimensions: there should be something that has dimensions of x here. When I take the log, it doesn't matter that much, but the dimension appears over here. It's really the size of the contributing interval that matters, which is of the order of 1 over N to the 1/2-- and that's where the log N comes from.

Questions? Now let me do one example of this, because we will need it. We can easily show that N factorial can be written as the integral from 0 to infinity of dx, x to the N, e to the minus x. And if you don't believe this, you can start with the integral from 0 to infinity of dx e to the minus alpha x, which is 1 over alpha, and take many derivatives with respect to alpha. If you take N derivatives inside the integral, you get the integral from 0 to infinity of dx, x to the N, e to the minus alpha x, because every time you bring down a factor of x. On the other side, if you take derivatives, 1 over alpha becomes 1 over alpha squared, then 2 over alpha cubed, then 6 over alpha to the fourth. So basically you get N factorial over alpha to the N plus 1; I just set alpha equal to 1.

Now if you look at the thing that I have to integrate, it is, as a function of x, a quantity that starts by growing as x to the N and then decays exponentially: over here, I have x to the N; out there, I have e to the minus x. It is not quite of the form that I had before-- part of the exponent is proportional to N, part of it is not.
But you can still use exactly the saddle point approach for even this function, and that's what we will do. I will write this as the integral from 0 to infinity of dx, e to some function phi of x, where phi of x is N log x minus x. And then I will follow that procedure, even though this is not entirely proportional to N.

I find the maximum by setting phi prime to 0. Phi prime is N over x minus 1, so clearly phi prime equal to 0 gives me that x_max is N: the location of this maximum is in fact N. And the second derivative, phi double prime, is minus N over x squared, which, evaluated at the maximum, is minus 1 over N, because the maximum occurs at N.

So if I were to make a saddle point expansion of this, I would say that N factorial is the integral from 0 to infinity of dx, e to the phi evaluated at x_max-- which is N log N minus N. The first-derivative term is 0; the second derivative gives me minus 1 over 2N-- the factor of 2 because I'm expanding to second order-- times x minus the location of the maximum, squared. And there would be higher-order terms from the higher derivatives.

So I can clearly take e to the N log N minus N out front. And then the integration that I have is just a standard Gaussian with a variance that is proportional to N, so I get a factor of root 2 pi N. And then there are higher-order corrections that, if you are energetic, you can actually calculate; it's not that difficult. So you get Stirling's formula: in the limit of large N, log of N factorial is N log N minus N; and if you want, you can go one step further and add 1/2 log of 2 pi N, with the next-order term being of order 1/N.
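[A quick numerical check of the formula-- a sketch, using lgamma to get log N! without overflow:]

```python
# Stirling's formula: log N! = N log N - N + (1/2) log(2 pi N) + O(1/N).
import math

for N in (10, 100, 1000):
    exact = math.lgamma(N + 1)          # log(N!) computed exactly, no overflow
    stirling = N * math.log(N) - N + 0.5 * math.log(2 * math.pi * N)
    print(f"N={N:5d}: exact {exact:.6f}  Stirling {stirling:.6f}  "
          f"difference {exact - stirling:.2e}  (about 1/(12N) = {1 / (12 * N):.2e})")
```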
Any questions? OK. So where do I need to use this? In the next part, we are going to talk about entropy, information, and estimation.

The first four topics of the course are thermodynamics, probability, the kinetic theory of gases, and the basics of statistical physics. In each one of them, you will define some version of entropy. We already saw the thermodynamic one, as dS being dQ divided by T. Now, just thinking about probability will also enable you to define some form of entropy. So let's see how we go about it. And also information-- what does that mean? It goes back to the work of Shannon. And the idea is as follows.

Suppose you want to send a message of N characters. The characters themselves are taken from some kind of alphabet, if you like, x_1 through x_M, that has M characters. So, for example, if you're sending a message in the English language, you would be using the letters A through Z, so you have an M of 26; maybe if you want to include space and punctuation, it would be larger than that. But if you're dealing with the English language, the probabilities of the different characters are not the same: S and P you are going to encounter much more frequently than, say, Z or X. So let's say that the frequencies with which we expect these characters to occur are p_1 through p_M. OK?

Now, how many possible messages are there? The number of possible messages composed of N occurrences from an alphabet of M letters, you would say, is M to the N. Now, Shannon was concerned with sending the information about this message, let's say, over a line where you have converted it to, say, a binary code. And then you would say that the number of bits that corresponds to M to the N is N log base 2 of M. That is, if you had the simpler case where your selection was just head or tail-- it was binary-- and you wanted to send somebody else the outcome of 500 throws of a coin, it would be a sequence of 500 0's and 1's corresponding to heads or tails. So for the binary case, you would have to send one bit per outcome. If it is something like a base of DNA, and there are four possibilities, you would have two bits per base-- that would be log of 4 base 2.
And for English, it would be log base 2 of 26, or whatever the appropriate number is-- with punctuation it maybe comes to 32 possible characters-- and then five bits per [? element. ?] OK.

But you know that if you were to look at all possible messages, most of them would be junk. And in particular, if you had used a simple substitution code, for example, to mix up your message-- you replaced A by something else, et cetera-- the frequencies would be preserved. So clearly a nice way to decode this substitution code, if you have a long enough text, is to look at how many repetitions there are and match them with the frequencies that you expect for a real language.

So in a typical message, what you expect is N_i = p_i N occurrences of x_i. If you know, for example, what the frequencies of the letters in the alphabet are, then in a long enough message you expect that typically you would get those numbers. Of course, what that really means is that you're going to get corrections, because not all messages are the same. But the deviation from these counts, which are proportional to the frequencies, would in the limit of a very long message be of the order of N to the 1/2.

So ignoring this N to the 1/2, you would say that the typical message that I expect to receive will have characters according to these proportions. So let me ask the following question: not what is the number of all possible messages, but what is the number of typical messages? I will call that g. The number of typical messages is the number of ways of distributing these numbers of characters in a message of length N: g = N! / (N_1! N_2! ... N_M!). Again, there are clearly correlations; but for the time being we forget the correlations-- including them would only reduce this number. So this number is much, much less than M to the N.
Now here I'm going to make an excursion. So far everything was clear; now I'm going to say something that is theoretically correct, but practically not so much. You could, for example, have some way of labeling all possible typical messages: this would be typical message number one, number two, all the way to typical message number g, the number of typical messages. Suppose I could point to one of these messages and say, this is the message that was actually sent. How many bits of information would I need to indicate one number out of g? The number of bits of information for a typical message, rather than being N log M, would simply be log g.

So let's see what this log g is. For the time being, let's forget about the base of the log; I can always change base by dividing by the log of whatever quantity I'm using as the base. This is the log of N factorial divided by the product over i of the N_i factorials, where the N_i are these p_i N's. And in the limit of large N, I can use the Stirling formula that we had over there. So what I have is N log N minus N in the numerator, minus the sum over i of N_i log N_i minus N_i. Of course the sum over the N_i's cancels this N, so I don't need to worry about those terms. And I can rearrange this: I can write this N log N as the sum over i of N_i log N, and put the terms proportional to N_i together. You can see that I get N_i log of N_i over N, which is log of p_i. And then I can take out a factor of N and write it as

log g = −N Σ_i p_i log p_i.
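[A small numerical sketch of this result, with an assumed three-letter alphabet and frequencies chosen arbitrarily as (0.5, 0.3, 0.2), and the message length picked so that the p_i N are integers:]

```python
# log g = log( N! / prod_i (p_i N)! ) compared with -N sum_i p_i log p_i.
import math

p = [0.5, 0.3, 0.2]          # assumed letter frequencies, M = 3
N = 1000                      # message length (each p_i * N is an integer)
counts = [round(pi * N) for pi in p]

log_g = math.lgamma(N + 1) - sum(math.lgamma(n + 1) for n in counts)
entropy = -sum(pi * math.log(pi) for pi in p)
print(f"log g         = {log_g:.2f}")
print(f"-N sum p ln p = {N * entropy:.2f}")   # agree up to O(log N) corrections
```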
And just as an excursion, this is something that you have hopefully already seen: it is also called the mixing entropy, and we will see it later on too. That is, if I had initially a bunch of, let's say, things that were colored red, and separately in a box a bunch of things that were colored green, and then a bunch of things of a different color, and I knew initially where they were in each separate box; and I then mix them up together, so that they are arranged in all possible random ways and I don't know which is where; then I have done something that is irreversible. It is very easy to take these boxes of marbles of different colors and mix them up; you have to do more work to separate them out. And this increase in entropy is given by precisely the same formula here-- it's called the mixing entropy. Here, rather than thinking of these as particles, we were thinking of them as letters, and we mixed up the letters in all possible ways to make our messages.

But quite generally, for any discrete probability-- a probability with a set of possible outcomes occurring with probabilities p_i-- we can define an entropy S associated with this set of probabilities, given by this formula:

S = −Σ_i p_i log p_i.

If you like, it is-- well, not quite, the sign doesn't work out-- some kind of an average of log p: minus the average of log p. So any time we see a discrete probability, we can certainly do that.

It turns out that we will also encounter cases later on where, rather than a discrete probability, we have a probability density function. And we would be very tempted to define an entropy associated with a PDF to be something like minus the integral of dx, p of x, log of p of x. But this is kind of ill-defined, because a probability density depends on some quantity x that has units. If this were a probability along a line and I changed my units from meters to centimeters, then this log would gain a factor associated with the change in scale. So this is kind of undefined.
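[A tiny sketch of that unit problem: for a uniform density on [0, L], the candidate entropy −∫ dx p log p comes out to log L, so quoting the very same physical situation in different units shifts the answer:]

```python
# -int p(x) log p(x) dx for a uniform density on [0, L] equals log L,
# so changing units (meters -> centimeters, L -> 100 L) shifts it by log 100.
import math

for units, L in (("meters", 1.0), ("centimeters", 100.0)):
    p = 1.0 / L                     # normalized uniform density
    S = -L * p * math.log(p)        # the integral done by hand: L * (1/L) * log L
    print(f"same interval in {units:11s}: -int p log p = {S:.4f} (= log {L:g})")
```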
769 00:56:24,200 --> 00:56:28,910 One of the miracles of statistical physics 770 00:56:28,910 --> 00:56:33,060 is that we will find the exact measure 771 00:56:33,060 --> 00:56:35,840 to make this probability in the continuum 772 00:56:35,840 --> 00:56:41,840 unique and independent of the choice of-- I mean, 773 00:56:41,840 --> 00:56:44,710 there is a very precise choice of units 774 00:56:44,710 --> 00:56:47,310 for measuring things that would make this well-defined. 775 00:56:47,310 --> 00:56:47,810 Yes. 776 00:56:47,810 --> 00:56:51,198 AUDIENCE: But that would be undefined up to some sort of 777 00:56:51,198 --> 00:56:51,698 [INAUDIBLE]. 778 00:56:51,698 --> 00:56:52,670 PROFESSOR: After you [INAUDIBLE]. 779 00:56:52,670 --> 00:56:54,620 AUDIENCE: So you can still extract dependencies from it. 780 00:56:54,620 --> 00:56:56,660 PROFESSOR: You can still calculate things 781 00:56:56,660 --> 00:56:58,440 like differences, et cetera. 782 00:56:58,440 --> 00:57:01,495 But there is a certain lack of definition. 783 00:57:06,136 --> 00:57:06,635 Yes. 784 00:57:06,635 --> 00:57:09,012 AUDIENCE: [INAUDIBLE] the relation between this entropy 785 00:57:09,012 --> 00:57:12,160 defined here with the entropy defined earlier, 786 00:57:12,160 --> 00:57:15,210 you notice the parallel. 787 00:57:15,210 --> 00:57:17,360 PROFESSOR: We find that all you have to do 788 00:57:17,360 --> 00:57:20,030 is to multiply by the Boltzmann constant, 789 00:57:20,030 --> 00:57:23,330 and they would become identical. 790 00:57:23,330 --> 00:57:24,780 So we will see that. 791 00:57:24,780 --> 00:57:30,570 It turns out that the heat definition of entropy, 792 00:57:30,570 --> 00:57:32,990 once you look at the right variables 793 00:57:32,990 --> 00:57:37,770 to define probability with, then the entropy of a probability 794 00:57:37,770 --> 00:57:39,770 distribution is exactly the entropy 795 00:57:39,770 --> 00:57:42,450 that comes from the heat calculation. 796 00:57:42,450 --> 00:57:46,612 So up to here, there is a mere numerical constant 797 00:57:46,612 --> 00:57:47,570 that we have to define. 798 00:57:59,290 --> 00:58:00,420 All right. 799 00:58:00,420 --> 00:58:04,060 But what does this have to do with this Shannon story? 800 00:58:20,910 --> 00:58:27,228 Going back to the story, if I didn't know the probabilities, 801 00:58:27,228 --> 00:58:30,880 if I didn't know this, I would say 802 00:58:30,880 --> 00:58:36,120 that I need to pass on this amount of information. 803 00:58:36,120 --> 00:58:40,050 But if I somehow constructed the right scheme, 804 00:58:40,050 --> 00:58:43,020 and the person that I'm sending the message to 805 00:58:43,020 --> 00:58:46,940 knows the probabilities, then I need 806 00:58:46,940 --> 00:58:52,570 to send this amount of information, which is actually 807 00:58:52,570 --> 00:58:56,560 less than N log M. 808 00:58:56,560 --> 00:59:01,170 So clearly having knowledge of the probabilities 809 00:59:01,170 --> 00:59:05,840 gives you some ability, some amount of information, 810 00:59:05,840 --> 00:59:09,980 so that you have to send fewer bits. 811 00:59:09,980 --> 00:59:11,370 OK. 812 00:59:11,370 --> 00:59:32,440 So the reduction in number of bits due to knowledge of P 813 00:59:32,440 --> 00:59:39,880 is the difference between N log M, which I had to do before, 814 00:59:39,880 --> 00:59:43,430 and what I have to do now, which is 815 00:59:43,430 --> 00:59:50,000 minus N sum over i Pi log of Pi. 816 00:59:55,500 --> 01:00:08,710 So this is N log M plus N sum over i Pi log of Pi.
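A short sketch of this reduction, assuming a made-up four-letter alphabet and message length:

```python
# Sketch: bits saved by knowing the letter probabilities.
# Without them: N*log2(M) bits.  With them: -N*sum_i Pi*log2(Pi) bits.
from math import log2

M = 4                                # assumed alphabet size
p = [0.7, 0.1, 0.1, 0.1]             # assumed letter probabilities
N = 10_000                           # assumed message length

naive = N * log2(M)
typical = -N * sum(q * log2(q) for q in p)
print(naive, round(typical), round(naive - typical))
# reduction = N * (log2 M + sum_i Pi log2 Pi), as on the board
```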
817 01:00:08,710 --> 01:00:11,830 I can evaluate this reduction in any base. 818 01:00:11,830 --> 01:00:16,220 If I wanted to really count in terms of the number of bits, 819 01:00:16,220 --> 01:00:21,620 I would do both of these things in log base 2. 820 01:00:21,620 --> 01:00:26,860 It is clearly something that is proportional to the length 821 01:00:26,860 --> 01:00:27,810 of the message. 822 01:00:27,810 --> 01:00:32,700 That is, if I want to send a book that is twice as big, 823 01:00:32,700 --> 01:00:35,970 the number of bits will be reduced proportionately 824 01:00:35,970 --> 01:00:37,930 by this amount. 825 01:00:37,930 --> 01:00:39,820 So you can define a quantity that 826 01:00:39,820 --> 01:00:42,163 is basically the information per character. 827 01:00:45,550 --> 01:00:54,590 That is, given the knowledge of the probabilities, 828 01:00:54,590 --> 01:01:00,670 you really have gained an information per character 829 01:01:00,670 --> 01:01:06,650 which is log M plus sum over i Pi log Pi. 830 01:01:12,880 --> 01:01:16,725 Up to a sign and this additional constant of log M, 831 01:01:16,725 --> 01:01:21,705 the entropy-- because I can actually get rid of this N-- 832 01:01:21,705 --> 01:01:28,430 the entropy and the information are really the same thing 833 01:01:28,430 --> 01:01:30,890 up to a sign. 834 01:01:30,890 --> 01:01:33,070 And just to sort of make sure that we 835 01:01:33,070 --> 01:01:37,760 understand the appropriate limits. 836 01:01:37,760 --> 01:01:41,720 If I have something like the case 837 01:01:41,720 --> 01:01:46,330 where I have a uniform distribution. 838 01:01:46,330 --> 01:01:51,430 Let's say that I say that all characters in my message 839 01:01:51,430 --> 01:01:54,460 are equally likely to occur. 840 01:01:54,460 --> 01:01:58,180 If it's a coin-- an unbiased coin-- it's as likely in a throw 841 01:01:58,180 --> 01:02:00,140 to be heads or tails. 842 01:02:00,140 --> 01:02:02,400 You would say that if it's an unbiased coin, 843 01:02:02,400 --> 01:02:06,740 I really should send one bit per throw of the coin. 844 01:02:06,740 --> 01:02:09,720 And indeed, that will follow from this. 845 01:02:09,720 --> 01:02:11,650 Because in this case, you can see 846 01:02:11,650 --> 01:02:14,440 that the information contained is 847 01:02:14,440 --> 01:02:24,200 going to be log M. And then I have plus 1 over M log of 1 848 01:02:24,200 --> 01:02:28,520 over M. And there are M such terms that are uniform. 849 01:02:28,520 --> 01:02:31,240 And this gives me 0. 850 01:02:31,240 --> 01:02:34,460 There is no information here. 851 01:02:34,460 --> 01:02:37,870 If I ask what's the entropy in this case? 852 01:02:37,870 --> 01:02:41,210 The entropy is M terms. 853 01:02:41,210 --> 01:02:44,750 Each one of them has a factor of 1 over M. 854 01:02:44,750 --> 01:02:49,060 And then I have a log of 1 over M. 855 01:02:49,060 --> 01:02:52,400 And there is a minus sign here overall. 856 01:02:56,050 --> 01:03:00,150 So this is log of M. 857 01:03:00,150 --> 01:03:06,310 So you've probably seen this version of the entropy before. 858 01:03:06,310 --> 01:03:09,640 That if you have M equal possibilities, 859 01:03:09,640 --> 01:03:13,610 the entropy is related to log M. This 860 01:03:13,610 --> 01:03:19,380 is the case where all of the outcomes are equally likely. 861 01:03:19,380 --> 01:03:23,380 So basically this is a uniform probability. 862 01:03:23,380 --> 01:03:25,540 Everything is equally likely. 863 01:03:25,540 --> 01:03:27,480 You have no information.
864 01:03:27,480 --> 01:03:31,520 You have this maximal possible entropy. 865 01:03:31,520 --> 01:03:35,520 The other extreme of it would be where 866 01:03:35,520 --> 01:03:37,870 you have a definite result. 867 01:03:37,870 --> 01:03:42,320 You have a coin that always gives you heads. 868 01:03:42,320 --> 01:03:44,300 And if the other person knows that, 869 01:03:44,300 --> 01:03:46,360 you don't need to send any information. 870 01:03:46,360 --> 01:03:49,560 No matter what, a thousand throws will be a thousand heads. 871 01:03:49,560 --> 01:03:53,620 So here, Pi is a delta function. 872 01:03:53,620 --> 01:03:57,720 Let's say at i equals 5, or whatever the number is. 873 01:03:57,720 --> 01:04:01,370 So one of the variables in the list 874 01:04:01,370 --> 01:04:02,840 carries all the probability. 875 01:04:02,840 --> 01:04:05,520 All the others carry 0 probability. 876 01:04:05,520 --> 01:04:08,680 How much information do I have here? 877 01:04:08,680 --> 01:04:14,230 I have log M. Now when I go and look at the list, 878 01:04:14,230 --> 01:04:21,840 in the list, either P is 0, or P is 1. And the log of 1 879 01:04:21,840 --> 01:04:23,140 is 0, as is 0 times log of 0. 880 01:04:23,140 --> 01:04:25,240 So this is basically going to give me 0. 881 01:04:25,240 --> 01:04:28,150 Entropy in this case is 0. 882 01:04:28,150 --> 01:04:29,860 The information is maximum. 883 01:04:29,860 --> 01:04:32,310 You don't need to pass any information. 884 01:04:32,310 --> 01:04:34,230 So anything else is in between. 885 01:04:34,230 --> 01:04:37,720 So you sort of think of a probability that 886 01:04:37,720 --> 01:04:43,050 is some big thing, some small things, et cetera, 887 01:04:43,050 --> 01:04:45,160 you can figure out what its entropy is 888 01:04:45,160 --> 01:04:49,460 and what its information content is. 889 01:04:49,460 --> 01:04:52,560 So actually I don't know the answer. 890 01:04:52,560 --> 01:04:54,720 But I presume it's very easy to figure out 891 01:04:54,720 --> 01:04:57,810 what's the information per character 892 01:04:57,810 --> 01:05:00,400 of text in the English language. 893 01:05:00,400 --> 01:05:02,415 Once you know the frequencies of the characters, 894 01:05:02,415 --> 01:05:04,993 you can go and calculate this. 895 01:05:10,550 --> 01:05:11,240 Questions? 896 01:05:11,240 --> 01:05:11,790 Yes. 897 01:05:11,790 --> 01:05:13,540 AUDIENCE: Just to clarify the terminology, 898 01:05:13,540 --> 01:05:17,187 so the information means the [INAUDIBLE]? 899 01:05:17,187 --> 01:05:18,770 PROFESSOR: The number of bits that you 900 01:05:18,770 --> 01:05:22,860 have to transmit to the other person. 901 01:05:22,860 --> 01:05:25,410 So the other person knows the probability. 902 01:05:25,410 --> 01:05:28,095 Given that they know the probabilities, 903 01:05:28,095 --> 01:05:30,280 how many fewer bits of information 904 01:05:30,280 --> 01:05:33,160 should I send to them? 905 01:05:33,160 --> 01:05:36,520 So their knowledge corresponds to a gain 906 01:05:36,520 --> 01:05:39,742 in number of bits, which is given by this formula. 907 01:05:47,310 --> 01:05:50,980 If you know that the coin that I'm throwing 908 01:05:50,980 --> 01:05:55,660 is biased so that it always comes up heads, 909 01:05:55,660 --> 01:05:59,110 then I don't have to send you any information. 910 01:05:59,110 --> 01:06:01,480 So for every time I throw the coin, 911 01:06:01,480 --> 01:06:02,820 you have gained one bit of information. 912 01:06:12,940 --> 01:06:15,746 Other questions?
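A short sketch of the two limits just described, together with the per-character estimate suggested for English, using a toy text in place of real frequency data:

```python
# Sketch: uniform and delta-function limits, then per-character information
# estimated from empirical character frequencies of a toy text.
from collections import Counter
from math import log2

def entropy_bits(probs):
    """Entropy in bits (0 log 0 taken as 0)."""
    return -sum(q * log2(q) for q in probs if q > 0)

M = 6
print(entropy_bits([1 / M] * M), log2(M))   # uniform: S = log2 M, information 0
print(entropy_bits([0, 0, 0, 0, 1, 0]))     # delta at i = 5: S = 0, info = log2 M

text = "the quick brown fox jumps over the lazy dog " * 50   # stand-in for English
counts = Counter(text)
freqs = [c / len(text) for c in counts.values()]
print(log2(len(counts)) - entropy_bits(freqs))   # information per character
```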
913 01:06:15,746 --> 01:06:21,235 AUDIENCE: The equation, the top equation, so natural 914 01:06:21,235 --> 01:06:25,726 log [INAUDIBLE] natural log of 2, [INAUDIBLE]? 915 01:06:28,720 --> 01:06:34,510 PROFESSOR: I initially calculated, using Stirling's formula, 916 01:06:34,510 --> 01:06:40,970 that log of N factorial is N log N minus N. 917 01:06:40,970 --> 01:06:44,920 So since I had done everything in natural log, 918 01:06:44,920 --> 01:06:46,650 I maintained that. 919 01:06:46,650 --> 01:06:54,050 And then I used this symbol-- log base 2 of, say, 5-- 920 01:06:54,050 --> 01:06:57,899 which is the same thing that maybe you are used to with a different notation. 921 01:06:57,899 --> 01:06:58,440 I don't know. 922 01:07:01,920 --> 01:07:05,790 So if I don't indicate a number here, it's the natural log. 923 01:07:05,790 --> 01:07:07,280 It's base e. 924 01:07:07,280 --> 01:07:13,240 If I put a number, so log, let's say, base 2 of 5 925 01:07:13,240 --> 01:07:17,004 is log 5 divided by log 2. 926 01:07:20,378 --> 01:07:22,790 AUDIENCE: So [INAUDIBLE]? 927 01:07:22,790 --> 01:07:24,191 PROFESSOR: Log 2, log 2. 928 01:07:24,191 --> 01:07:24,690 Information. 929 01:07:27,289 --> 01:07:27,830 AUDIENCE: Oh. 930 01:07:30,430 --> 01:07:34,290 PROFESSOR: Or if you like, I could have divided by log 2 931 01:07:34,290 --> 01:07:35,748 here. 932 01:07:35,748 --> 01:07:37,692 AUDIENCE: But so there [INAUDIBLE] 933 01:07:37,692 --> 01:07:40,122 all of the other places, and you just 934 01:07:40,122 --> 01:07:42,552 [? write ?] all this [INAUDIBLE]. 935 01:07:42,552 --> 01:07:46,345 All right, thank you, [? Michael. ?] 936 01:07:46,345 --> 01:07:48,761 PROFESSOR: Right. 937 01:07:48,761 --> 01:07:49,260 Yeah. 938 01:07:49,260 --> 01:07:54,970 So this is the general way to convert 939 01:07:54,970 --> 01:07:58,440 between the natural log and a log in any base. 940 01:07:58,440 --> 01:08:01,340 In the language of electrical engineering, 941 01:08:01,340 --> 01:08:05,110 where Shannon worked, it is common to express everything 942 01:08:05,110 --> 01:08:07,270 in terms of the number of bits. 943 01:08:07,270 --> 01:08:09,110 So whenever I'm expressing things 944 01:08:09,110 --> 01:08:11,300 in terms of the number of bits, I really 945 01:08:11,300 --> 01:08:13,040 should use log base 2. 946 01:08:13,040 --> 01:08:16,319 So if I want to quote information in bits, 947 01:08:16,319 --> 01:08:18,439 I really should use log base 2. 948 01:08:18,439 --> 01:08:21,020 Whereas in statistical physics, we usually 949 01:08:21,020 --> 01:08:24,215 use the natural log in expressing entropy. 950 01:08:24,215 --> 01:08:26,657 AUDIENCE: Oh, so it doesn't really matter [INAUDIBLE]. 951 01:08:26,657 --> 01:08:28,490 PROFESSOR: It's just an overall coefficient. 952 01:08:28,490 --> 01:08:30,710 As I said, eventually, if I want 953 01:08:30,710 --> 01:08:34,130 to connect to the heat version of the entropy, 954 01:08:34,130 --> 01:08:36,770 I have to multiply by yet another number, which 955 01:08:36,770 --> 01:08:38,470 is the Boltzmann constant. 956 01:08:38,470 --> 01:08:42,410 So really the conceptual part is more 957 01:08:42,410 --> 01:08:45,263 important than the overall numerical factor. 958 01:08:50,979 --> 01:08:51,479 OK? 959 01:09:02,590 --> 01:09:08,167 I had a third item in my list here, which we can finish with, 960 01:09:08,167 --> 01:09:09,000 which is estimation. 961 01:09:20,920 --> 01:09:26,450 So frequently you are faced with the task 962 01:09:26,450 --> 01:09:29,800 of assigning probabilities.
963 01:09:29,800 --> 01:09:32,850 So there's a situation. 964 01:09:32,850 --> 01:09:35,490 You know that there's a number of outcomes. 965 01:09:35,490 --> 01:09:37,399 And you want to assign probabilities 966 01:09:37,399 --> 01:09:39,170 for these outcomes. 967 01:09:39,170 --> 01:09:43,939 And the procedure that we will use 968 01:09:43,939 --> 01:09:46,900 is summarized by the following sentence 969 01:09:46,900 --> 01:09:49,350 that I have to then define. 970 01:09:49,350 --> 01:09:59,420 The most unbiased-- let's actually 971 01:09:59,420 --> 01:10:01,550 just say it's the definition if you like-- 972 01:10:01,550 --> 01:10:14,940 the unbiased assignment of probabilities 973 01:10:14,940 --> 01:10:27,560 maximizes the entropy subject to constraints. 974 01:10:30,690 --> 01:10:31,575 Known constraints. 975 01:10:40,810 --> 01:10:42,990 What do I mean by that? 976 01:10:42,990 --> 01:10:48,080 So suppose I had told you that we are throwing a die. 977 01:10:48,080 --> 01:10:52,060 Or let's say a coin, but let's go back to the die. 978 01:10:52,060 --> 01:10:57,210 And the die has possible outcomes 1, 2, 3, 4, 5, 6. 979 01:10:57,210 --> 01:11:00,100 And this is the only thing that I know. 980 01:11:00,100 --> 01:11:03,360 So if somebody says that I'm throwing a die 981 01:11:03,360 --> 01:11:05,030 and you don't know anything else, 982 01:11:05,030 --> 01:11:08,675 there's no reason for you to privilege 6 with respect to 4, 983 01:11:08,675 --> 01:11:10,240 or 3 with respect to 5. 984 01:11:10,240 --> 01:11:14,310 So as far as I know, at this moment in time, all of these 985 01:11:14,310 --> 01:11:16,030 are equally likely. 986 01:11:16,030 --> 01:11:22,160 So I will assign each one of them a probability of 1/6. 987 01:11:22,160 --> 01:11:25,990 But we also saw over here what was happening. 988 01:11:25,990 --> 01:11:28,580 The uniform probability was the one 989 01:11:28,580 --> 01:11:30,770 that had the largest entropy. 990 01:11:30,770 --> 01:11:33,050 If I were to change the probability 991 01:11:33,050 --> 01:11:36,200 so that something goes up and something goes down, 992 01:11:36,200 --> 01:11:37,720 then I calculate that formula. 993 01:11:37,720 --> 01:11:41,050 And I find that the-- sorry-- the uniform 994 01:11:41,050 --> 01:11:42,490 one has the largest entropy. 995 01:11:42,490 --> 01:11:46,290 This has less entropy compared to the uniform one. 996 01:11:46,290 --> 01:11:52,420 So what we have done in assigning uniform probability 997 01:11:52,420 --> 01:11:56,790 is really to maximize the entropy subject to the fact 998 01:11:56,790 --> 01:11:59,260 that I don't know anything except that the probabilities 999 01:11:59,260 --> 01:12:02,420 should add up to 1. 1000 01:12:02,420 --> 01:12:06,230 But now suppose that somebody threw 1001 01:12:06,230 --> 01:12:08,450 the die many, many times. 1002 01:12:08,450 --> 01:12:11,230 And each time they were throwing the die, 1003 01:12:11,230 --> 01:12:14,050 they were recording the number. 1004 01:12:14,050 --> 01:12:16,700 But they didn't give us the numbers and frequencies. 1005 01:12:16,700 --> 01:12:20,770 What they told us was that at the end of many, many runs, 1006 01:12:20,770 --> 01:12:27,990 the average number that was coming up was 3.2, 1007 01:12:27,990 --> 01:12:30,190 4.7, whatever. 1008 01:12:30,190 --> 01:12:33,650 So we know the average of M. 1009 01:12:33,650 --> 01:12:35,820 So I know now some other constraint. 1010 01:12:35,820 --> 01:12:39,810 I've added to the information that I had.
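A small sketch checking the claim above that moving one die probability up and another down from the uniform assignment always lowers the entropy:

```python
# Sketch: the uniform die distribution has the largest entropy; any
# normalized perturbation of it comes out lower.
from math import log

def S(probs):
    return -sum(q * log(q) for q in probs if q > 0)

uniform = [1 / 6] * 6
for eps in (0.01, 0.05, 0.10):
    perturbed = [1 / 6 + eps, 1 / 6 - eps] + [1 / 6] * 4   # still sums to 1
    print(eps, S(uniform), S(perturbed))   # perturbed entropy is always smaller
```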
1011 01:12:39,810 --> 01:12:43,650 So if I want to reassign the probabilities given 1012 01:12:43,650 --> 01:12:47,090 that somebody told me that in a large number of runs, 1013 01:12:47,090 --> 01:12:49,790 the average value of the faces that showed up 1014 01:12:49,790 --> 01:12:52,020 was some particular value. 1015 01:12:52,020 --> 01:12:53,380 What do I do? 1016 01:12:53,380 --> 01:13:01,290 I say, well, I maximize S which depends on these Pi's, which 1017 01:13:01,290 --> 01:13:07,190 is minus sum over i Pi log of Pi, subject 1018 01:13:07,190 --> 01:13:09,730 to the constraints that I know. 1019 01:13:09,730 --> 01:13:11,950 Now one constraint we already used 1020 01:13:11,950 --> 01:13:16,070 previously is that the sum of the probabilities 1021 01:13:16,070 --> 01:13:17,863 is equal to 1. 1022 01:13:20,821 --> 01:13:27,440 This I introduce here through a Lagrange multiplier, 1023 01:13:27,440 --> 01:13:34,330 alpha, which I will adjust later to make sure that this holds. 1024 01:13:34,330 --> 01:13:39,710 And in general, what we do if we have multiple constraints is 1025 01:13:39,710 --> 01:13:44,970 we can add more and more Lagrange multipliers. 1026 01:13:44,970 --> 01:13:53,892 And the average of M is the sum over i of i times Pi. 1027 01:13:53,892 --> 01:13:57,150 So 1 times P of 1, 2 times P of 2, 1028 01:13:57,150 --> 01:14:02,650 et cetera, will give you whatever the average value is. 1029 01:14:02,650 --> 01:14:06,130 So these are the two constraints that I specified for you here. 1030 01:14:06,130 --> 01:14:10,300 There could've been other constraints, et cetera. 1031 01:14:10,300 --> 01:14:15,050 So then, if you have a function with constraints 1032 01:14:15,050 --> 01:14:19,070 that you have to extremize, you add these Lagrange multipliers. 1033 01:14:19,070 --> 01:14:22,084 Then you do dS by dPi. 1034 01:14:22,084 --> 01:14:34,100 dS by dPi is minus log of Pi 1035 01:14:34,100 --> 01:14:35,435 from here. 1036 01:14:35,435 --> 01:14:40,400 The derivative of log P is 1 over P, which with this Pi will give me minus 1. 1037 01:14:40,400 --> 01:14:43,740 There is a minus alpha here. 1038 01:14:43,740 --> 01:14:52,770 And then there's a minus beta times i from here. 1039 01:14:52,770 --> 01:14:58,570 And extremizing means I have to set this to 0. 1040 01:14:58,570 --> 01:15:04,770 So you can see that the solution to this is Pi-- 1041 01:15:04,770 --> 01:15:09,720 or actually log of Pi, let's say, is minus 1042 01:15:09,720 --> 01:15:15,260 (1 plus alpha) minus beta i. 1043 01:15:15,260 --> 01:15:23,660 So that Pi is e to the minus (1 plus alpha) 1044 01:15:23,660 --> 01:15:26,220 times e to the minus beta i. 1045 01:15:29,980 --> 01:15:32,480 I haven't completed the story. 1046 01:15:32,480 --> 01:15:36,300 I really have to solve the equations 1047 01:15:36,300 --> 01:15:39,830 for alpha and beta that would give me 1048 01:15:39,830 --> 01:15:44,980 the final results in terms of the expectation value of i 1049 01:15:44,980 --> 01:15:47,790 as well as some other quantities. 1050 01:15:47,790 --> 01:15:51,420 But this is the procedure that you would normally 1051 01:15:51,420 --> 01:15:57,520 use to give you the unbiased assignment of probabilities. 1052 01:15:57,520 --> 01:16:01,160 Now this actually goes back to what I said at the beginning.
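A minimal sketch of completing this story for the die, assuming a reported average of 4.5 (3.5 would be the unbiased value). Here alpha is absorbed into the normalization Z, and beta is found by bisection so the mean comes out right:

```python
# Sketch: maximum-entropy die probabilities Pi ~ exp(-beta*i), with beta
# tuned so the mean face value matches an assumed measured average.
from math import exp

faces = range(1, 7)

def mean_face(beta):
    w = [exp(-beta * i) for i in faces]
    Z = sum(w)                                # normalization fixes alpha
    return sum(i * wi for i, wi in zip(faces, w)) / Z

target = 4.5                                  # assumed reported average
lo, hi = -50.0, 50.0                          # mean_face decreases with beta
for _ in range(100):                          # bisection on beta
    mid = (lo + hi) / 2
    if mean_face(mid) > target:
        lo = mid                              # mean too high: raise beta
    else:
        hi = mid
beta = (lo + hi) / 2
Z = sum(exp(-beta * i) for i in faces)
p = [exp(-beta * i) / Z for i in faces]
print(beta, [round(q, 3) for q in p])         # negative beta favors high faces
```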
1053 01:16:01,160 --> 01:16:04,940 That is, there are two ways of assigning probabilities: 1054 01:16:04,940 --> 01:16:08,710 either objectively, by actually doing lots of measurements, 1055 01:16:08,710 --> 01:16:09,980 or subjectively. 1056 01:16:09,980 --> 01:16:12,160 So this is really formalizing what 1057 01:16:12,160 --> 01:16:14,630 this subjective procedure means. 1058 01:16:14,630 --> 01:16:16,810 So you put in all of the information 1059 01:16:16,810 --> 01:16:20,680 that you have, the number of states, any constraints. 1060 01:16:20,680 --> 01:16:23,310 And then you maximize the entropy that we 1061 01:16:23,310 --> 01:16:29,010 defined to get the 1062 01:16:29,010 --> 01:16:33,880 maximal-entropy assignment of probabilities 1063 01:16:33,880 --> 01:16:36,122 consistent with the things that you know. 1064 01:16:39,038 --> 01:16:43,350 You probably recognize this form as kind of a Boltzmann weight 1065 01:16:43,350 --> 01:16:46,760 that comes up again and again in statistical physics. 1066 01:16:46,760 --> 01:16:50,570 And that is again natural, because there are constraints, 1067 01:16:50,570 --> 01:16:52,310 such as the average value of energy, 1068 01:16:52,310 --> 01:16:54,240 average value of the number of particles, 1069 01:16:54,240 --> 01:16:59,470 et cetera, that, consistent with maximizing the entropy, 1070 01:16:59,470 --> 01:17:01,940 give you forms such as this. 1071 01:17:01,940 --> 01:17:04,850 So you can see that a lot of concepts 1072 01:17:04,850 --> 01:17:09,960 that we will later on be using in statistical physics 1073 01:17:09,960 --> 01:17:14,130 are already embedded in these discussions of probability. 1074 01:17:14,130 --> 01:17:18,170 And we've also seen how the large N aspect comes about, 1075 01:17:18,170 --> 01:17:19,480 et cetera. 1076 01:17:19,480 --> 01:17:22,270 So we now have the probabilistic tools. 1077 01:17:22,270 --> 01:17:26,185 And from next time, we will go on 1078 01:17:26,185 --> 01:17:28,550 to define the degrees of freedom. 1079 01:17:28,550 --> 01:17:33,110 What are the units that we are going to be talking about? 1080 01:17:33,110 --> 01:17:37,370 And how to assign them some kind of a probabilistic picture. 1081 01:17:37,370 --> 01:17:40,180 And then build on into statistical mechanics. 1082 01:17:40,180 --> 01:17:41,069 Yes. 1083 01:17:41,069 --> 01:17:43,065 AUDIENCE: So here, you write the letter i 1084 01:17:43,065 --> 01:17:46,558 to represent, in this case, the results of a random die roll, 1085 01:17:46,558 --> 01:17:49,959 but you can replace it with any function of a random variable? 1086 01:17:49,959 --> 01:17:50,750 PROFESSOR: Exactly. 1087 01:17:50,750 --> 01:17:54,775 So I could have, maybe rather than giving me 1088 01:17:54,775 --> 01:17:58,135 the average value of the number that was appearing on the face, 1089 01:17:58,135 --> 01:18:00,145 they would have given me the average inverse. 1090 01:18:03,870 --> 01:18:05,290 And then I would have had this. 1091 01:18:08,060 --> 01:18:09,560 I could have had multiple things. 1092 01:18:09,560 --> 01:18:12,530 So maybe somebody else measures something else. 1093 01:18:12,530 --> 01:18:14,760 And then my general form would be 1094 01:18:14,760 --> 01:18:19,460 e to the minus beta 1 times measurement of type one, 1095 01:18:19,460 --> 01:18:22,980 minus beta 2 times measurement of type two, et cetera.
1096 01:18:22,980 --> 01:18:25,975 And the rest of the thing over here is clearly just a constant 1097 01:18:25,975 --> 01:18:28,170 of proportionality that I would need 1098 01:18:28,170 --> 01:18:29,698 to adjust for the normalization. 1099 01:18:33,970 --> 01:18:34,950 OK? 1100 01:18:34,950 --> 01:18:38,211 So that's it for today.
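A closing sketch of the general multi-constraint form just described, with hypothetical multiplier values (not fitted to any measured averages), taking F1(i) = i and F2(i) = 1/i as the two assumed measurements:

```python
# Sketch: with several measured averages, the maximum-entropy weight
# generalizes to Pi ~ exp(-beta1*F1(i) - beta2*F2(i) - ...).
from math import exp

faces = range(1, 7)
beta1, beta2 = 0.2, -0.5              # hypothetical multipliers, not solved for
w = [exp(-beta1 * i - beta2 / i) for i in faces]
Z = sum(w)                            # the constant of proportionality
p = [wi / Z for wi in w]
print([round(q, 3) for q in p], sum(p))   # a normalized Boltzmann-like weight
```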