The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: Today we are introducing an exciting new pledge in 6.034. If you have already looked at any of the neural net problems, you will have noticed something. Patrick only has them posted back to 2006 so far, but even so, out of four tests there are perhaps two or three different ways that the neural nets were drawn. Our exciting new pledge is that this year we're going to draw them in one particular way. I will show you which way, assuming that this works. Yes. We are going to draw them like the one on the right. The one on the left is the same net as the one on the right. Before I've explained the difference between the two, you might think you want the one on the left. But you really want the one on the right, and I'll explain why. The 2007 quiz was drawn roughly like this.
That said, if you end up in tutorial or somewhere else doing one of the older quizzes, a lot of them were drawn exactly like this. One thing I really don't like about this representation is the naming. The inputs are called x's and the outputs are called y's, but there are two x's, so the inputs are not x and y. On top of that, the inputs often correspond to the x's of a graph, and people get confused. Another issue many people have is that the multiplication by the weight and the summation are both implied. The weights are written on the edges where outputs and inputs go, and the summation of the two inputs into the node is also implied. But take a look here. This is the same net. The w's written on these edges over here are these w's here. Actually, the better way to draw it would be like so, since each of these can have its own, different w. Each of the w's down here is explicitly assigned to a multiplier. In the old drawing, you just had to remember to multiply the weight by the input that was coming by.
Here you see an input come to a multiplier, where you multiply it by the weight. Once you've multiplied all the inputs by their weights, you send them to a sum. The sigma is just a sum: you add them all together and send the result into the sigmoid function. That's our old buddy, 1 over 1 plus e to the negative whatever our input was, with a threshold weight for an offset. Then we send the result of that into more multipliers with more weights, more sums, more sigmoids. So this is what it's going to look like on the quiz. And this is a conversion guide from version 0.9 data into version 1.0. If you see something that looks like this on one of the old quizzes you're doing, see if you can convert it, and then solve the problem. Chances are, if you can convert it, you're going to do fine. We'll start off with this conversion guide, which I'll leave up here. I'm also going to work out the formulas for you guys one more time. These are all the formulae that you're going to need on the quiz.
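The following is a minimal sketch, not course code, of how one node in the explicit drawing is computed. It follows the diagram left to right: one multiplier per incoming edge, a sum (the sigma box), and then the sigmoid. The function names and numbers are hypothetical. The threshold weight is modeled the way this recitation describes it, as a weight on a constant input of negative 1.

```python
import math


def sigmoid(x: float) -> float:
    """The logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, threshold_weight):
    """Multiplier stage, then sum stage, then sigmoid, as in the explicit diagram.

    The threshold (offset) weight acts like one more multiplier whose input
    is fixed at -1.
    """
    products = [x * w for x, w in zip(inputs, weights)]  # one multiplier per edge
    total = sum(products) + (-1.0) * threshold_weight     # the sigma box
    return sigmoid(total)                                  # the sigmoid box


# Hypothetical values: the weighted sum is 0.5 + 0.0 - 0.5 = 0, and sigmoid(0) = 0.5
print(neuron_output([1.0, 0.0], [0.5, 2.0], 0.5))  # 0.5
```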
Then we're going to decide what will change in the formulae if the sigmoid function in those neurons is changed into some other kind of function. This is a very likely if, because it happens a good number of times. Hint: it's already changed into an adder in the problem we're about to do. People change it all the time into some bizarro function. I've seen arc tangent, I think. So here we go. Let's look at the front of it. First of all, sigmoid. As I said a moment ago, our old buddy sigmoid is 1 over 1 plus e to the minus x. Here's a fun fact about sigmoid: its derivative can be written in terms of the sigmoid itself. Let's call the result of the sigmoid y. If y equals 1 over 1 plus e to the negative x, then the derivative of sigmoid is y times 1 minus y. You can also write out the whole nasty thing. It's 1 over 1 plus e to the negative x, times 1 minus 1 over 1 plus e to the negative x.
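You can check the y times 1 minus y identity numerically, which makes the fact concrete. This sketch compares it against a central finite difference. The helper names and the step size `h` are illustrative choices, not anything from the course.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    """Derivative of sigmoid written in terms of its output y: y * (1 - y)."""
    y = sigmoid(x)
    return y * (1.0 - y)


def numeric_derivative(f, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-2.0, 0.0, 1.5):
    print(x, sigmoid_derivative(x), numeric_derivative(sigmoid, x))
```

At x = 0 the output is 0.5, so the derivative is 0.5 times 0.5, or 0.25. That is the steepest point of the curve.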
This nice property of sigmoid is going to be important for us in the very near future, and that future begins now. So now the performance function. Neural nets inevitably act up and give us really crappy results, and this is the function we use to tell them just how wrong they are. The performance function can be any sane function that gives you a better score when your answers are closer to the answers you're looking for. You can decide whether better means lower or higher. In this case, however, we have chosen the performance function for a very sneaky reason. It is negative 1/2 times the quantity d minus o, squared, where d is the desired output and o is the actual output. So the performance is never positive. We want a small negative number or 0, which would mean we performed well. So why this? The main reason is its derivative with respect to o, since o is the variable we're actually changing. The 2 comes down and cancels the 1/2, and the negative from the chain rule cancels the leading negative. We get a simple d minus o. And yeah, we're using derivatives here.
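Here is a short sketch of that performance function and the derivative that makes it convenient. The names are hypothetical. The sign convention follows the transcript: P is 0 for a perfect answer and more negative the further off you are.

```python
def performance(d: float, o: float) -> float:
    """P = -1/2 (d - o)^2: 0 for a perfect answer, more negative when worse."""
    return -0.5 * (d - o) ** 2


def performance_derivative(d: float, o: float) -> float:
    """dP/do = d - o. The 2 cancels the 1/2, and the chain rule's -1 cancels the leading minus."""
    return d - o


print(performance(1.0, 0.9), performance(1.0, 0.5))  # closer output scores higher
print(performance_derivative(1.0, 0.3))              # positive: raising o improves P
```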
So those are fine. These are two assumptions, and either could be changed on your test. We're going to figure out what happens if we change the performance function, or if we change the sigmoid to some other function. What happens to the next three formulas? Those are basically the only things you need to know to do backpropagation. So let's look at that. First, w prime. This is the formula for a new weight after one step of backpropagation, a new weight in any of the positions you can see up here on this beautiful neural net. Each of the w's changes step by step. That's, in fact, how you do hill climbing in neural nets. You change the weights incrementally. You step a little bit in the direction that moves you toward your desired results until eventually, you hope, you have an intelligent neural net. On a computer, you may have many different training examples that you run it on in a cycle, hoping that you don't overfit to a single sample. On the test, we probably will not do that.
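This sketch shows the on-a-computer version: hill climbing by small weight steps while cycling through several training examples. It trains a single sigmoid neuron. It borrows the update w' = w + alpha times i times delta and the final-node delta (d minus o) times o times (1 minus o), both of which this recitation derives below. The OR data set, the zero starting weights, alpha = 1.0, and the epoch count are all hypothetical choices made for the illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_single_neuron(examples, alpha=1.0, epochs=2000):
    """Hill-climb one sigmoid neuron's weights, cycling through the examples.

    Each step nudges every weight by alpha * input * delta, where
    delta = (d - o) * o * (1 - o). The threshold weight sees a constant -1.
    """
    w = [0.0, 0.0]  # weights on the two inputs (hypothetical starting point)
    t = 0.0         # threshold weight
    for _ in range(epochs):
        for (x1, x2), d in examples:
            o = sigmoid(w[0] * x1 + w[1] * x2 - t)
            delta = (d - o) * o * (1.0 - o)
            w[0] += alpha * x1 * delta
            w[1] += alpha * x2 * delta
            t += alpha * (-1.0) * delta
    return w, t


OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, t = train_single_neuron(OR_EXAMPLES)
for (x1, x2), d in OR_EXAMPLES:
    print((x1, x2), d, round(sigmoid(w[0] * x1 + w[1] * x2 - t), 3))
```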
So let's look at how you calculate the weights for the next step from the weights you have at the current step. First things first. The new weight, w prime, starts with the old weight. That has to be there, or else we'd just jump off somewhere at random. We want to take a little step in some direction, so we start where we are, with the current weight. Then we add a product of three things. Say we're talking about the weight between some i and some j. Here are some examples of weight names. This is w 1 a, the weight between 1 and a. This is w 2 b, the weight between 2 and b. Makes sense? It makes sense so far. But what if a weight is just called w b? We'll get to these one-letter w's later. They're the bias, the offset. They are always attached to a negative 1.
So you can pretty much treat them as a negative 1 here, fed into a multiplier with this w b, if you like. This is implied for all of the offsets. So w prime equals w plus alpha times... Why is this Greek letter here? Where does it come from? How do we calculate it? Alpha is just some value you're given on the quiz. You'll find it somewhere. There's no way you'll have to calculate alpha. You might be asked to suggest an alpha, but probably not. Alpha sets the size of the little steps we take when we're doing hill climbing. With a very large alpha, you take huge steps. With a very small alpha, you take tentative steps. So alpha is basically there to scale the change and make the new value either very close to w or far from w, depending on our taste. Next comes plus alpha times i, where i is the value coming into the multiplier for the weight we're changing. For w 1 a, for instance, i would be the value of input 1. For w a c, i would be the output of node a. For w b c, i would be the output of node b.
In short, i is the input that comes in to meet that weight at the multiplier. That is then multiplied by delta j, the delta belonging to the node j that the weight leads into. What is a delta, you ask? Funny you should ask. It is a strange Greek letter. It sort of comes from the fact that we're doing some partial derivatives and so on. But the main way you'll figure out what the deltas are is with two formulae I haven't written in yet. So hold off on working out the delta until... well, actually, I'm about to tell you what the delta is right now. Think of the delta as using partial derivatives to figure out which way to step when you're hill climbing. When you hill climb, you look around, you find the direction of the highest increase, and then you step off in that direction. So the deltas tell you which way to step with the weights.
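The update rule w' = w + alpha times i times delta_j can be written as a one-line function. This sketch does that. The function name and all numbers are hypothetical. It shows the two cases described above: a weight whose i is an upstream node's output, and a threshold weight whose i is the constant negative 1.

```python
def updated_weight(w: float, alpha: float, i: float, delta_j: float) -> float:
    """w' = w + alpha * i * delta_j.

    w       -- current weight on the edge into node j
    alpha   -- step size given in the problem
    i       -- value arriving at that weight's multiplier (an input, an upstream
               node's output, or -1 for a threshold weight)
    delta_j -- delta of the downstream node j
    """
    return w + alpha * i * delta_j


# Hypothetical edge A -> C: A outputs 0.5, delta_C is 0.2, alpha is 0.1
w_ac_new = updated_weight(w=1.0, alpha=0.1, i=0.5, delta_j=0.2)   # about 1.01
# Hypothetical threshold weight on C: its input is always -1
w_c_new = updated_weight(w=0.3, alpha=0.1, i=-1.0, delta_j=0.2)   # about 0.28
```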
They do that by taking partial derivatives. Basically, you try to work out how the weight you're currently looking at contributes to the performance of the net, whether to its good performance or to its bad performance. Weights like w b c and w a c feed directly into the end of the net: they feed into the last node, whose value comes out as the output. Those are pretty easy. You can tell exactly how much those weights, and the values coming through them, contribute to the end. We do that with the partial derivative of the performance function with respect to the output, which I've already worked out. It's just d minus o. This is for the final weights, the weights in the last level. So it's d minus o, except we're not done yet, because when we take derivatives, we have to remember the chain rule.
To get from the end back to these weights, we pass through a sigmoid. (Here it isn't actually a sigmoid, but we'll pretend it is for the moment.) Since we pass through the sigmoid, we had better take the derivative of the sigmoid function, which is y times 1 minus y. So what is y? What is the output of the sigmoid? It's o. So we also multiply by o times 1 minus o. However, there is a... let me see, let me see, yes. Sorry, I'm carefully studying this sheet to make sure my nomenclature is exactly right for our new nomenclature. It's so new and brave that we only knew for sure on Wednesday that we were going to use it. So we have d minus o, times o, times 1 minus o. Now you say, that's fine, that gets us the deltas for these weights here, even this w c. But how are we going to get the deltas for the weights further back? Oh, I realize... yeah, I got it. By the way, this one is delta c: how is neuron c contributing to the output? It's contributing directly to the output, and it's got a sigmoid in it.
It doesn't really, but we're pretending it does for now. So delta c is d minus o, times o, times 1 minus o. What about an inner node, like node a or node b? What do we have to do there? They contribute to the output by contributing to node c, so we can do this problem recursively. First of all, as you have probably figured out, every delta gets an o times 1 minus o factor from the chain rule, because we're pretending that every node is a sigmoid. (We also have a dearth of good problems on the web right now that actually use sigmoids. There's only 2007.) So here's o times 1 minus o. What do we do for the rest of it? How does the node contribute to our final result? It contributes recursively. We're talking about delta i, where i is an inner node: it's not a final node, it's somewhere along the way. Delta i is o times 1 minus o, times the sum over j of the weight going from i to j, times delta j. The sum runs over all j such that i leads to j, meaning i has a direct path into j.
So if i, in this instance, were a, the only possible j would be c. That's right. We would not sum over b as one of the j's. A does not lead to b; a only leads to c. Also note that we never look at anything behind a, because that would be going backwards. To figure out which j's you're looking at, look directly forward at the next level. Suppose there were another node d here that a did not feed into. We still wouldn't include d, because a goes only to c. You look only at the next-level children. You sum over all those children the weight between i and the child, multiplied by the child's delta. That makes sense, right? Calling these the children for a moment, the child's delta is the way the child affects the output. If this node directly affects the output, then this other node affects the output by affecting this one, scaled by its weight. So, for instance, if the weight between a and c were 0, then a wouldn't affect the output at all, right?
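To put the two delta formulas together, this sketch runs one full backpropagation step on a small sigmoid net. The topology 1 -> A, 2 -> B, A and B -> C reuses the weight names mentioned in this recitation but is otherwise an assumption, as are all the numbers. All deltas are computed from the old weights before any weight changes. The final comment gives a finite-difference way to confirm that x1 times delta_a really is the partial derivative of P with respect to w 1 a.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def performance(d: float, o: float) -> float:
    return -0.5 * (d - o) ** 2


def forward(x1, x2, w):
    """Hypothetical net: 1 -> A, 2 -> B, A and B -> C.

    Threshold weights (keys 'A', 'B', 'C') multiply a constant input of -1.
    """
    o_a = sigmoid(w["1A"] * x1 - w["A"])
    o_b = sigmoid(w["2B"] * x2 - w["B"])
    o_c = sigmoid(w["AC"] * o_a + w["BC"] * o_b - w["C"])
    return o_a, o_b, o_c


def inner_delta(o_i, children):
    """delta_i = o_i (1 - o_i) * sum of w_ij * delta_j over nodes j that i feeds directly."""
    return o_i * (1.0 - o_i) * sum(w_ij * delta_j for w_ij, delta_j in children)


def backprop_step(x1, x2, d, w, alpha):
    """One backpropagation step; every delta is computed from the old weights."""
    o_a, o_b, o_c = forward(x1, x2, w)
    delta_c = (d - o_c) * o_c * (1.0 - o_c)           # final node
    delta_a = inner_delta(o_a, [(w["AC"], delta_c)])  # A's only child is C
    delta_b = inner_delta(o_b, [(w["BC"], delta_c)])  # B's only child is C
    # w' = w + alpha * i * delta_j, where i is whatever enters that weight's multiplier
    return {
        "1A": w["1A"] + alpha * x1 * delta_a,
        "2B": w["2B"] + alpha * x2 * delta_b,
        "A": w["A"] + alpha * -1.0 * delta_a,
        "B": w["B"] + alpha * -1.0 * delta_b,
        "AC": w["AC"] + alpha * o_a * delta_c,
        "BC": w["BC"] + alpha * o_b * delta_c,
        "C": w["C"] + alpha * -1.0 * delta_c,
    }


# Hypothetical starting weights and one training example (x1, x2, d) = (1, 1, 1)
example_weights = {"1A": 0.5, "2B": -0.3, "A": 0.1, "B": 0.2, "AC": 0.7, "BC": 0.4, "C": -0.2}
new_weights = backprop_step(1.0, 1.0, 1.0, example_weights, alpha=0.1)
print(performance(1.0, forward(1.0, 1.0, example_weights)[2]),
      performance(1.0, forward(1.0, 1.0, new_weights)[2]))  # second value is higher
# Sanity check: (new_weights["1A"] - 0.5) / 0.1 equals x1 * delta_a, i.e. dP/dw_1A
```

Notice that a zero weight between a and c makes `inner_delta` for a equal to zero, so w 1 a would not move. That matches the argument above.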
That's because its weight is 0. When we do this problem, we multiply by 0 and add it in, and it doesn't affect anything. If its weight is very high, it's going to really dominate c. That is taken into account here, and then we multiply by the child's delta. So, the following question. I spent a lot of time on formulae and not much time starting on the problem, so I won't call on someone at random. I'll take a volunteer instead, and if no one volunteers, I'll eventually tell you. We've got some nice formulae in the bottom three. If we change the sigmoid function, what has to change? The problem right here changes the sigmoid functions into adders. The only thing that changes in this crazy-ass problem is that we take all of the o times 1 minus o factors in delta f and delta i and replace them with the new derivative. Then we do the exact same thing we would've done. Correct. On a similar note, if you change the performance function, how many of the bottom three equations have to change at all?
356 00:21:22,070 --> 00:21:24,430 That's right, just one, just delta f. 357 00:21:24,430 --> 00:21:26,480 Take the d minus o, make it the new derivative 358 00:21:26,480 --> 00:21:28,390 of the new performance function. 359 00:21:28,390 --> 00:21:31,920 And in fact, delta i doesn't change at all. 360 00:21:31,920 --> 00:21:33,220 Does everyone see that? 361 00:21:33,220 --> 00:21:37,530 Because it is very common for something to be replaced, 362 00:21:37,530 --> 00:21:39,670 I think three of the four the quizzes that we have, 363 00:21:39,670 --> 00:21:43,190 replaced in some-- changed something in some way. 364 00:21:43,190 --> 00:21:43,690 All right. 365 00:21:43,690 --> 00:21:44,620 Let's go. 366 00:21:44,620 --> 00:21:47,690 We're going to do 2008 quiz, because it 367 00:21:47,690 --> 00:21:49,710 has a part of the end that screwed up everyone, 368 00:21:49,710 --> 00:21:51,510 and so let's make sure we get to that part. 369 00:21:51,510 --> 00:21:53,593 That's going to be the part that you probably care 370 00:21:53,593 --> 00:21:54,920 about the most at this point. 371 00:21:54,920 --> 00:21:58,050 So these are all adders instead of sigmoids. 372 00:21:58,050 --> 00:22:03,180 That means that they simply add up everything as normal, 373 00:22:03,180 --> 00:22:04,960 for a normal neural net, and then there's 374 00:22:04,960 --> 00:22:06,070 no sigmoid threshold. 375 00:22:06,070 --> 00:22:07,710 They just give some kind of value. 376 00:22:07,710 --> 00:22:08,450 Question? 377 00:22:08,450 --> 00:22:11,354 STUDENT: So we talked about those multiplier things, 378 00:22:11,354 --> 00:22:14,750 we don't have those in nodes? 379 00:22:14,750 --> 00:22:16,940 PROFESSOR: They're not neural net nodes. 380 00:22:16,940 --> 00:22:22,020 That is one of the reasons why that other form that you 381 00:22:22,020 --> 00:22:23,940 can see over there is elegant. 382 00:22:23,940 --> 00:22:25,720 It only has the actual nodes on it. 
383 00:22:25,720 --> 00:22:26,970 It is very compact. 384 00:22:26,970 --> 00:22:30,210 It's one of the front we've used in the previous tests. 385 00:22:30,210 --> 00:22:35,020 The question is, do those multipliers count as nodes? 386 00:22:35,020 --> 00:22:38,110 However by not putting in the multipliers, 387 00:22:38,110 --> 00:22:41,410 we feel it sometimes confuses people of explicitness. 388 00:22:41,410 --> 00:22:43,410 The ones that are nodes will always 389 00:22:43,410 --> 00:22:46,230 have a label, like a or here, you see 390 00:22:46,230 --> 00:22:49,240 there's a sigmoid and an L1. 391 00:22:49,240 --> 00:22:52,490 The multipliers are there for your convenience, 392 00:22:52,490 --> 00:22:54,310 to remind you to multiply, and also 393 00:22:54,310 --> 00:22:56,650 those, if you look those sigmoids that are over there, 394 00:22:56,650 --> 00:23:00,330 are there for your convenience to remind you to add. 395 00:23:00,330 --> 00:23:02,850 In fact, the only thing that counts 396 00:23:02,850 --> 00:23:04,636 as a node in the neural net-- and that's 397 00:23:04,636 --> 00:23:07,810 a very good question-- is usually the sigmoids, 398 00:23:07,810 --> 00:23:10,382 here it's the adders. 399 00:23:10,382 --> 00:23:12,090 We've essentially taken out the sigmoids. 400 00:23:12,090 --> 00:23:17,370 These adders are the-- oh, here's the way to tell. 401 00:23:17,370 --> 00:23:22,010 If it's got a threshold weight associated with it, 402 00:23:22,010 --> 00:23:25,200 then it's one of the actual nodes. 403 00:23:25,200 --> 00:23:26,220 A threshold weight. 404 00:23:26,220 --> 00:23:28,428 I guess the multipliers look like they have a weight, 405 00:23:28,428 --> 00:23:31,774 but this is just the weight that is being multiplied in. 
That weight gets multiplied in with the input. But if the node has a threshold weight, like w a or w b... oh, I promised I would tell you guys the difference between the two kinds of weights. So let's do that very quickly. Weights like w 2 b or w 1 a come between an input and a node, like 1 and a, or between two nodes, like a and c. You multiply the input by this weight, and eventually the products are added together. The threshold weights have just one letter, like w a, w b, w c. They essentially set the threshold for success or failure, a 1 or a 0, or anything in between, at each node. So maybe at some node you want a really high cutoff, where a very high value has to come in or else the output is a 0. Then you put in a high threshold weight. The threshold weight is multiplied by negative 1. In fact, if you want, you can treat the threshold weight times negative 1 as just one more term added into that sum, instead of placing it at the same location as the node. If that works better for you when you're converting, you can think of it that way.
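The two ways of reading a threshold weight give the same number, and this sketch checks that. In one reading it is subtracted at the node. In the other it is an ordinary weight on an extra input that is always negative 1, added into the same sum. The function names and sample values are hypothetical.

```python
def weighted_sum_with_threshold(inputs, weights, threshold_weight):
    """Sum of input * weight products, with the threshold weight subtracted at the node."""
    return sum(x * w for x, w in zip(inputs, weights)) - threshold_weight


def weighted_sum_with_minus_one_input(inputs, weights, threshold_weight):
    """The same sum, treating the threshold weight as a weight on a constant -1 input."""
    extended_inputs = list(inputs) + [-1.0]
    extended_weights = list(weights) + [threshold_weight]
    return sum(x * w for x, w in zip(extended_inputs, extended_weights))


print(weighted_sum_with_threshold([0.2, 0.9], [1.5, -0.5], 0.3))
print(weighted_sum_with_minus_one_input([0.2, 0.9], [1.5, -0.5], 0.3))
```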
430 00:24:52,010 --> 00:24:54,090 Because the threshold weight is essentially 431 00:24:54,090 --> 00:24:55,650 multiplied by negative 1 and added in 432 00:24:55,650 --> 00:24:58,130 at that same sum over there. 433 00:24:58,130 --> 00:25:02,062 So that's another way to do it. 434 00:25:02,062 --> 00:25:04,270 There's a lot of ways to visualize these neural nets. 435 00:25:04,270 --> 00:25:07,940 Just make sure you have a way that makes sense to you, 436 00:25:07,940 --> 00:25:09,690 and that you can tell pretty much whatever 437 00:25:09,690 --> 00:25:12,050 we write, as long as it looks vaguely like that, 438 00:25:12,050 --> 00:25:15,159 how to get it in your mind, into the representation 439 00:25:15,159 --> 00:25:15,950 that works for you. 440 00:25:15,950 --> 00:25:18,241 Because once you have the representation right for you, 441 00:25:18,241 --> 00:25:20,360 you're more than halfway to solving these guys. 442 00:25:20,360 --> 00:25:21,660 They aren't that bad. 443 00:25:21,660 --> 00:25:23,080 They just look nasty. 444 00:25:23,080 --> 00:25:24,310 They don't bite. 445 00:25:24,310 --> 00:25:24,900 OK. 446 00:25:24,900 --> 00:25:26,560 These are just adders. 447 00:25:26,560 --> 00:25:28,570 So if it's just an adder, then that 448 00:25:28,570 --> 00:25:34,410 means that, if we take all the x inputs coming in-- 449 00:25:34,410 --> 00:25:36,630 let's do x and y for the moment, so we can figure out 450 00:25:36,630 --> 00:25:39,980 the derivative-- then what comes out 451 00:25:39,980 --> 00:25:46,830 after we just add up the x, what comes out, y equals x, right? 452 00:25:46,830 --> 00:25:48,770 We're just adding it up. 453 00:25:48,770 --> 00:25:52,290 Adding up all the input, we're not doing anything to it. 454 00:25:52,290 --> 00:25:55,910 Y equals x is what this node does. 455 00:25:55,910 --> 00:25:56,790 You people see that? 456 00:25:56,790 --> 00:26:00,690 So the derivative is just one. 
457 00:26:00,690 --> 00:26:04,590 So that's pretty easy, because the first problem says, 458 00:26:04,590 --> 00:26:10,850 what is the new formula for delta f? 459 00:26:10,850 --> 00:26:12,890 So I'll just tell you. 460 00:26:12,890 --> 00:26:14,810 You guys probably figured it out. 461 00:26:14,810 --> 00:26:17,538 o times 1 minus o. 462 00:26:17,538 --> 00:26:21,530 Because we replaced d minus o with 1. 463 00:26:21,530 --> 00:26:22,230 OK? 464 00:26:22,230 --> 00:26:23,172 Makes sense so far? 465 00:26:23,172 --> 00:26:24,630 Please ask questions along the way, 466 00:26:24,630 --> 00:26:26,660 because I'm not going to be asking you guys. 467 00:26:26,660 --> 00:26:27,805 I'll do it myself. 468 00:26:27,805 --> 00:26:28,305 Question? 469 00:26:28,305 --> 00:26:29,888 STUDENT: Why do we replace d minus o with 1? 470 00:26:32,890 --> 00:26:34,600 PROFESSOR: That's a good question. 471 00:26:34,600 --> 00:26:37,510 The reason is that I did the wrong thing. 472 00:26:37,510 --> 00:26:42,230 So see, it's good that you guys are asking questions. 473 00:26:42,230 --> 00:26:44,850 It's actually o times 1 minus o that should be replaced with 1. 474 00:26:44,850 --> 00:26:49,350 The answer is delta f equals d minus o. 475 00:26:49,350 --> 00:26:52,170 So yes, perhaps I did it to trick you. 476 00:26:52,170 --> 00:26:54,570 No, I actually messed up. 477 00:26:54,570 --> 00:26:56,820 But yes, please ask questions along the way. 478 00:26:56,820 --> 00:26:59,080 Again, I don't have time to call on you guys 479 00:26:59,080 --> 00:27:01,520 at random to figure out if you guys are following along. 480 00:27:01,520 --> 00:27:03,020 So I'll do it myself. 481 00:27:03,020 --> 00:27:06,180 We're replacing the o times 1 minus o with 1 because 482 00:27:06,180 --> 00:27:07,860 of the fact that the sigmoid is gone, 483 00:27:07,860 --> 00:27:10,960 and we get just delta f equals d minus o. 484 00:27:10,960 --> 00:27:13,660 So great.
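A minimal sketch contrasting the output delta for a sigmoid output node with the one for the adder (y = x) output node in this problem. It assumes the usual sigmoid-version formula, delta = o(1 - o)(d - o); the function names are hypothetical.

```python
def output_delta_sigmoid(o, d):
    """Output delta for a sigmoid node: the sigmoid's derivative is o * (1 - o)."""
    return o * (1 - o) * (d - o)


def output_delta_adder(o, d):
    """Output delta for an adder node (y = x): the derivative is 1, so only d - o is left."""
    return d - o


print(output_delta_sigmoid(0.5, 1.0))  # 0.125
print(output_delta_adder(0.5, 1.0))    # 0.5
```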
485 00:27:13,660 --> 00:27:18,320 We now want to know what the equation is for delta i, 486 00:27:18,320 --> 00:27:19,720 at the node a. 487 00:27:19,720 --> 00:27:21,740 So delta a. 488 00:27:21,740 --> 00:27:24,270 Well, let's take a look. 489 00:27:24,270 --> 00:27:26,530 The o times 1 minus o is gone. 490 00:27:26,530 --> 00:27:30,210 Now we just have the sum over j, which you guys already 491 00:27:30,210 --> 00:27:36,120 told me is only over c: WAC times delta c. 492 00:27:36,120 --> 00:27:38,350 We know that delta c is d minus o. 493 00:27:38,350 --> 00:27:43,590 The answer is delta a is just WAC times d minus o. 494 00:27:43,590 --> 00:27:46,050 That time, I got it right. 495 00:27:46,050 --> 00:27:47,684 I see the answer here, 496 00:27:47,684 --> 00:27:49,600 though it's written in a very different format 497 00:27:49,600 --> 00:27:52,290 from the old quiz. 498 00:27:52,290 --> 00:27:54,590 Any questions on that? 499 00:27:54,590 --> 00:27:58,070 Well, that's part a finished, out of parts a through c. 500 00:27:58,070 --> 00:27:59,300 Let's go to part b. 501 00:27:59,300 --> 00:28:01,625 Part b is doing one step of backpropagation. 502 00:28:01,625 --> 00:28:04,430 There's almost always going to be one of these in here. 503 00:28:04,430 --> 00:28:07,740 So the first thing it asks is to figure out 504 00:28:07,740 --> 00:28:11,510 what the output o is for this neural net 505 00:28:11,510 --> 00:28:15,670 if all weights are initially 1 except that this guy right here 506 00:28:15,670 --> 00:28:18,120 is negative 0.5. 507 00:28:18,120 --> 00:28:22,120 All the other ones start off as 1. 508 00:28:22,120 --> 00:28:25,560 Let's do a step-- oh, let's see what the inputs are. 509 00:28:25,560 --> 00:28:28,040 The inputs are also all 1. 510 00:28:28,040 --> 00:28:31,320 Desired output is also 1. 511 00:28:31,320 --> 00:28:37,980 And in fact, the rate constant alpha is also 1. 512 00:28:37,980 --> 00:28:40,490 This is the only thing that isn't 1, folks.
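A sketch of the hidden-node delta worked out above. It assumes the standard form: for a sigmoid node, the sum over downstream nodes j of w_ij times delta_j is scaled by o(1 - o). For an adder that factor is 1. Node a feeds only c, so its sum has one term, WAC times delta c. The function names are hypothetical.

```python
def hidden_delta_adder(outgoing_weights, downstream_deltas):
    """Adder hidden node: delta_i = sum over j of w_ij * delta_j (the o * (1 - o) factor is 1)."""
    return sum(w * delta for w, delta in zip(outgoing_weights, downstream_deltas))


def hidden_delta_sigmoid(o, outgoing_weights, downstream_deltas):
    """Sigmoid hidden node: the same sum, scaled by o * (1 - o)."""
    return o * (1 - o) * hidden_delta_adder(outgoing_weights, downstream_deltas)


# Node a feeds only node c, so delta_a = w_ac * delta_c, where delta_c = d - o.
w_ac, d, o = 1.0, 1.0, 0.5
delta_c = d - o
print(hidden_delta_adder([w_ac], [delta_c]))  # 0.5
```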
513 00:28:40,490 --> 00:28:42,070 So let's see what happens. 514 00:28:42,070 --> 00:28:48,210 1 times 1 is 1, then this is negative 1 515 00:28:48,210 --> 00:28:50,870 times 1 is negative 1. 516 00:28:50,870 --> 00:28:53,062 That's 0. 517 00:28:53,062 --> 00:28:55,520 The exact same thing happens here because it's symmetrical. 518 00:28:55,520 --> 00:28:57,546 So these are both 0. 519 00:28:57,546 --> 00:29:03,150 0 times 1 is 0, 0 times 1 is 0. 520 00:29:03,150 --> 00:29:07,770 Then this is negative 1 times negative 0.5, which is positive 0.5, 521 00:29:07,770 --> 00:29:14,740 so 0 plus 0 plus a positive 0.5, the output is positive 0.5. 522 00:29:14,740 --> 00:29:16,700 Does everyone see that? 523 00:29:16,700 --> 00:29:20,830 If not, you can convince yourself 524 00:29:20,830 --> 00:29:21,900 that it is positive 0.5. 525 00:29:21,900 --> 00:29:23,570 That would be a good exercise for you, 526 00:29:23,570 --> 00:29:25,470 to run through one forward pass. 527 00:29:25,470 --> 00:29:28,650 The output is definitely positive 0.5. 528 00:29:28,650 --> 00:29:30,000 First time around. 529 00:29:30,000 --> 00:29:31,130 OK? 530 00:29:31,130 --> 00:29:34,000 Now we have to do one step of backpropagation. 531 00:29:34,000 --> 00:29:35,900 To do that, let's calculate all the deltas 532 00:29:35,900 --> 00:29:37,810 so that we can calculate all the new weights, 533 00:29:37,810 --> 00:29:39,560 the new weight primes. 534 00:29:39,560 --> 00:29:42,660 So delta c. 535 00:29:42,660 --> 00:29:43,490 That's easy. 536 00:29:43,490 --> 00:29:45,060 You guys can tell me what delta c is. 537 00:29:45,060 --> 00:29:47,440 We figured out what the new formula for delta c is. 538 00:29:47,440 --> 00:29:50,500 So, a simple subtraction problem? 539 00:29:50,500 --> 00:29:52,306 Everyone, delta c is? 540 00:29:52,306 --> 00:29:53,000 STUDENT: 0.5. 541 00:29:53,000 --> 00:29:58,086 PROFESSOR: 0.5, one half, yes. 542 00:29:58,086 --> 00:29:58,585 All right.
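A sketch of the forward pass just described. The wiring is inferred from the narration and is an assumption: node a sees only input 1 (through w1a), node b sees only input 2 (through w2b), node c sums a and b through WAC and WBC, and every threshold weight multiplies a constant negative 1. The dictionary keys are hypothetical names.

```python
def forward(i1, i2, w):
    """Forward pass through the all-adder net; threshold weights multiply a constant -1."""
    a = w["1a"] * i1 + w["a"] * -1
    b = w["2b"] * i2 + w["b"] * -1
    o = w["ac"] * a + w["bc"] * b + w["c"] * -1
    return a, b, o


# Every weight starts at 1 except the output threshold wc = -0.5; inputs are 1.
weights = {"1a": 1, "2b": 1, "a": 1, "b": 1, "ac": 1, "bc": 1, "c": -0.5}
a, b, o = forward(1, 1, weights)
d = 1
delta_c = d - o  # adder output, so no o * (1 - o) factor
print(a, b, o, delta_c)  # 0 0 0.5 0.5
```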
543 00:30:02,250 --> 00:30:07,690 We know that delta a and delta b are just WAC times delta c, 544 00:30:07,690 --> 00:30:09,150 and WBC times delta c. 545 00:30:09,150 --> 00:30:10,660 So they are? 546 00:30:10,660 --> 00:30:11,460 STUDENT: One half. 547 00:30:11,460 --> 00:30:14,440 PROFESSOR: Also one half, because all the weights were 1. 548 00:30:16,970 --> 00:30:17,750 Easy street. 549 00:30:17,750 --> 00:30:18,600 OK. 550 00:30:18,600 --> 00:30:20,980 All of the deltas are one half. 551 00:30:20,980 --> 00:30:23,290 And all but a few of the weights are 1. 552 00:30:23,290 --> 00:30:25,190 So let's figure out what the new weights are. 553 00:30:29,910 --> 00:30:32,140 New WAC, OK. 554 00:30:37,520 --> 00:30:38,320 Yeah, so let's see. 555 00:30:38,320 --> 00:30:40,480 What's going to be the new WAC? 556 00:30:40,480 --> 00:30:45,080 So the new WAC is going to be old 557 00:30:45,080 --> 00:30:48,880 WAC, which is 1, because all of them are 1 except for wc, 558 00:30:48,880 --> 00:30:53,960 plus the rate constant, which is 1, times the input coming 559 00:30:53,960 --> 00:30:57,270 in here, but remember that was 0, 560 00:30:57,270 --> 00:31:01,500 so actually it's just going to be the same as the old WAC. 561 00:31:01,500 --> 00:31:06,000 This is a symmetrical problem between b and a at the moment, 562 00:31:06,000 --> 00:31:08,720 so WBC is going to be the same. 563 00:31:08,720 --> 00:31:09,407 All right. 564 00:31:09,407 --> 00:31:10,990 Some things are going to change, though. 565 00:31:10,990 --> 00:31:14,960 What about wc, the one that was actually not 1? 566 00:31:14,960 --> 00:31:15,610 OK. 567 00:31:15,610 --> 00:31:25,120 So new wc, remember, the i for wc, 568 00:31:25,120 --> 00:31:26,720 the i that we use in this equation 569 00:31:26,720 --> 00:31:29,680 is always negative 1 because it's a threshold.
570 00:31:29,680 --> 00:31:37,700 So we have the old wc, which is negative 0.5, plus 1 times 571 00:31:37,700 --> 00:31:41,990 negative 1 times delta c, which is one half. 572 00:31:41,990 --> 00:31:49,330 So we have negative 0.5 plus negative 0.5 equals negative 1. 573 00:31:49,330 --> 00:31:57,260 w 1 a: well, w 1 a starts out as 1. 574 00:31:57,260 --> 00:32:04,180 Then we also know that the new w 1 a is going 575 00:32:04,180 --> 00:32:08,090 to be equal to 1 plus 1 times the input, which 576 00:32:08,090 --> 00:32:18,420 is 1, times delta of a, which is one half, so 1.5. 577 00:32:18,420 --> 00:32:25,000 And since it's symmetrical between a and b, then w 2 578 00:32:25,000 --> 00:32:29,400 b is also 1.5. 579 00:32:29,400 --> 00:32:35,730 And then finally, wa and wb, the offsets here: they 580 00:32:35,730 --> 00:32:41,250 start at 1, so they become 1 plus 1 times negative 1 times 0.5. 581 00:32:41,250 --> 00:32:44,150 So they're both, everyone? 582 00:32:44,150 --> 00:32:44,947 STUDENT: One half. 583 00:32:44,947 --> 00:32:45,780 PROFESSOR: One half. 584 00:32:45,780 --> 00:32:46,321 That's right. 585 00:32:55,330 --> 00:32:56,080 That's right. 586 00:32:56,080 --> 00:33:00,780 Because negative 1 is their i. 587 00:33:00,780 --> 00:33:05,440 Negative 1 times one half plus positive 1 is just one half. 588 00:33:05,440 --> 00:33:07,690 That's one full step. 589 00:33:07,690 --> 00:33:10,180 Maybe a mite easier than you might be used to seeing, 590 00:33:10,180 --> 00:33:11,386 but there's a full step. 591 00:33:11,386 --> 00:33:13,510 And it asks, what's going to be the output after one 592 00:33:13,510 --> 00:33:15,509 step of backpropagation? 593 00:33:15,509 --> 00:33:16,300 We can take a look. 594 00:33:19,860 --> 00:33:26,280 So we have 1 times the new w 1 a, which is 1.5, so you've got 1.5; 595 00:33:26,280 --> 00:33:29,130 then the new threshold wa is just 0.5, so we subtract 596 00:33:29,130 --> 00:33:31,920 0.5, and that's a 1 coming out of adder a.
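A sketch of the single backpropagation step worked above, under the same assumed wiring (a sees only input 1, b only input 2, c sums a and b). It uses the update rule w' = w + alpha times i times delta, with i = -1 for every threshold weight and all deltas computed from the old weights, then reruns the forward pass with the updated weights. Keys and function names are hypothetical.

```python
def forward(i1, i2, w):
    """Forward pass through the all-adder net; threshold weights multiply a constant -1."""
    a = w["1a"] * i1 - w["a"]
    b = w["2b"] * i2 - w["b"]
    o = w["ac"] * a + w["bc"] * b - w["c"]
    return a, b, o


def backprop_step(i1, i2, d, w, alpha=1.0):
    """One step of w' = w + alpha * i * delta for the all-adder net."""
    a, b, o = forward(i1, i2, w)
    delta_c = d - o              # adder output: derivative is 1
    delta_a = w["ac"] * delta_c  # adder hidden nodes, using the old weights
    delta_b = w["bc"] * delta_c
    return {
        "ac": w["ac"] + alpha * a * delta_c,  # the input on this edge is a's output
        "bc": w["bc"] + alpha * b * delta_c,
        "c": w["c"] + alpha * -1 * delta_c,   # threshold weights always see i = -1
        "1a": w["1a"] + alpha * i1 * delta_a,
        "2b": w["2b"] + alpha * i2 * delta_b,
        "a": w["a"] + alpha * -1 * delta_a,
        "b": w["b"] + alpha * -1 * delta_b,
    }


weights = {"1a": 1, "2b": 1, "a": 1, "b": 1, "ac": 1, "bc": 1, "c": -0.5}
new_weights = backprop_step(1, 1, 1, weights)
print(new_weights)
print(forward(1, 1, new_weights)[2])  # 3.0
```

WAC and WBC do not move because the values flowing along those edges were 0 on this pass; everything else shifts by one half.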
597 00:33:31,920 --> 00:33:34,910 We've got another 1 coming in here because it's symmetrical. 598 00:33:34,910 --> 00:33:39,370 So 1 and a 1, 1 times WAC is 1. 599 00:33:39,370 --> 00:33:41,020 1 times WBC is 1. 600 00:33:41,020 --> 00:33:44,990 So we have two 1s coming in here, they're added, that's 2. 601 00:33:44,990 --> 00:33:50,860 Then wc has become negative 1 at this point. 602 00:33:50,860 --> 00:33:55,970 So negative 1 times negative 1 is positive 1, plus 2 is 3, and the output is 3. 603 00:33:58,540 --> 00:33:59,070 All right. 604 00:33:59,070 --> 00:33:59,590 Cool. 605 00:33:59,590 --> 00:34:03,880 We've now finished part b, which is over half of everything. 606 00:34:03,880 --> 00:34:05,186 Oh no, we've not. 607 00:34:05,186 --> 00:34:05,810 One more thing. 608 00:34:10,929 --> 00:34:11,690 These are adders. 609 00:34:11,690 --> 00:34:12,565 They're not sigmoids. 610 00:34:16,210 --> 00:34:19,095 What if we train this entire neural net 611 00:34:19,095 --> 00:34:21,389 to try to learn this data, so that it 612 00:34:21,389 --> 00:34:25,520 can draw a line on a graph, or draw some lines, 613 00:34:25,520 --> 00:34:29,199 or do some kind of learning, to separate off the minuses 614 00:34:29,199 --> 00:34:31,070 from all the pluses? 615 00:34:31,070 --> 00:34:33,280 You've seen, maybe, and if not, you 616 00:34:33,280 --> 00:34:34,780 are about to in a second, because it 617 00:34:34,780 --> 00:34:37,600 asks you to do this in detail, that neural nets can usually 618 00:34:37,600 --> 00:34:40,830 draw one line on the graph for each of these, 619 00:34:40,830 --> 00:34:43,120 sort of, nodes in the net, because each of the nodes 620 00:34:43,120 --> 00:34:44,449 has some kind of threshold. 621 00:34:44,449 --> 00:34:49,480 And you can do some logic between them, like ANDs or ORs. 622 00:34:49,480 --> 00:34:52,830 What do you guys think this net is going to draw?
623 00:34:52,830 --> 00:34:55,389 Anyone can volunteer; I'm not going to call on anyone 624 00:34:55,389 --> 00:34:58,260 to give this answer. 625 00:34:58,260 --> 00:35:01,510 That's a little bit tricky, because usually 626 00:35:01,510 --> 00:35:03,890 if you had this many nodes, you could easily 627 00:35:03,890 --> 00:35:07,750 draw a box and box off the minuses from the pluses. 628 00:35:07,750 --> 00:35:11,680 However, it draws this. 629 00:35:11,680 --> 00:35:13,350 And it asks, what is the error? 630 00:35:13,350 --> 00:35:16,570 The error is-- oh yeah, it even tells you the error is 1/8, 631 00:35:16,570 --> 00:35:19,260 because why? 632 00:35:19,260 --> 00:35:20,660 These are all adders. 633 00:35:20,660 --> 00:35:23,460 You can't actually do anything logical. 634 00:35:23,460 --> 00:35:25,760 This entire net boils down to just one node, 635 00:35:25,760 --> 00:35:27,530 because it just adds up every time. 636 00:35:27,530 --> 00:35:30,330 It never takes a threshold at any point. 637 00:35:30,330 --> 00:35:33,160 So you can't turn it into logical ones and zeroes, 638 00:35:33,160 --> 00:35:37,440 because it's basically not digital at all; it's analog. 639 00:35:37,440 --> 00:35:39,500 It's giving us some very high number. 640 00:35:39,500 --> 00:35:41,840 So it all boils down to one cutoff. 641 00:35:41,840 --> 00:35:43,850 And that's the best one. 642 00:35:43,850 --> 00:35:46,700 The one that I drew right here. 643 00:35:46,700 --> 00:35:47,929 OK. 644 00:35:47,929 --> 00:35:49,220 Did that not make sense to you? 645 00:35:49,220 --> 00:35:50,280 That's OK. 646 00:35:50,280 --> 00:35:52,080 This problem is much harder. 647 00:35:52,080 --> 00:35:56,040 And putting them both on the same quiz was a bit brutal, 648 00:35:56,040 --> 00:35:57,880 but by the time you're done with this, 649 00:35:57,880 --> 00:36:00,380 you'll understand what a neural net can or can't do.
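Why the all-adder net reduces to one node, and so to one line, can be checked algebraically. Under the same assumed wiring, substituting a and b into c gives o = (WAC w1a) i1 + (WBC w2b) i2 - (WAC wa + WBC wb + wc), which is exactly a single adder node with two input weights and one threshold. Any cutoff on o is therefore one straight line. The sketch below (hypothetical names) compares the two on random inputs.

```python
import math
import random


def adder_net(i1, i2, w):
    """The all-adder net: a and b feed c, and thresholds multiply a constant -1."""
    a = w["1a"] * i1 - w["a"]
    b = w["2b"] * i2 - w["b"]
    return w["ac"] * a + w["bc"] * b - w["c"]


def collapse(w):
    """Weights (v1, v2, t) of a single adder node that computes the same function."""
    v1 = w["ac"] * w["1a"]
    v2 = w["bc"] * w["2b"]
    t = w["ac"] * w["a"] + w["bc"] * w["b"] + w["c"]
    return v1, v2, t


random.seed(0)
w = {k: random.uniform(-2, 2) for k in ["1a", "2b", "a", "b", "ac", "bc", "c"]}
v1, v2, t = collapse(w)
for _ in range(1000):
    i1, i2 = random.uniform(-5, 5), random.uniform(-5, 5)
    # Same output everywhere, so any cutoff on it is the single line v1*i1 + v2*i2 = const.
    assert math.isclose(adder_net(i1, i2, w), v1 * i1 + v2 * i2 - t, abs_tol=1e-9)
print("one adder node reproduces the whole net:", (v1, v2, t))
```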
650 00:36:00,380 --> 00:36:02,840 I put these in simplified form because of the fact 651 00:36:02,840 --> 00:36:06,350 that we don't care about their values or anything like that. 652 00:36:06,350 --> 00:36:09,570 But inside each of these little circles is a sigmoid; 653 00:36:09,570 --> 00:36:13,860 the multipliers and the summers are implied. 654 00:36:13,860 --> 00:36:16,380 I think the simplified form, when we're not actually 655 00:36:16,380 --> 00:36:18,590 doing backpropagation, is easier to view, and you can see 656 00:36:18,590 --> 00:36:19,950 how many nodes there are. 657 00:36:19,950 --> 00:36:21,741 For the same reason you asked your question 658 00:36:21,741 --> 00:36:22,850 about how many there are. 659 00:36:22,850 --> 00:36:26,130 So all of those big circles are nodes. 660 00:36:26,130 --> 00:36:30,080 And in those nodes is a sigmoid now, not those crazy adders. 661 00:36:30,080 --> 00:36:31,680 We have the following problem. 662 00:36:31,680 --> 00:36:33,590 We have to try to match each of a, 663 00:36:33,590 --> 00:36:37,690 b, c, d, e, f to 1, 2, 3, 4, 5, 6, using each of them 664 00:36:37,690 --> 00:36:39,100 only once. 665 00:36:39,100 --> 00:36:43,140 That's important, because some of the more powerful networks 666 00:36:43,140 --> 00:36:45,730 in here can do a lot of these. 667 00:36:45,730 --> 00:36:49,050 So it's like, yes, the powerful networks could 668 00:36:49,050 --> 00:36:50,620 do some of the easier problems here, 669 00:36:50,620 --> 00:36:53,940 but we want to match each net to a problem it can do, 670 00:36:53,940 --> 00:36:57,720 and there is exactly one mapping that will map-- that 671 00:36:57,720 --> 00:37:02,290 is one to one and uses all six of the nets 672 00:37:02,290 --> 00:37:04,700 to solve all six of these problems here. 673 00:37:04,700 --> 00:37:07,030 So some of you may be going like, what? 674 00:37:07,030 --> 00:37:08,870 How am I going to solve these problems?
675 00:37:08,870 --> 00:37:11,490 I gave away a hint before, which is 676 00:37:11,490 --> 00:37:16,940 that each node in the neural net, each sigmoid node 677 00:37:16,940 --> 00:37:21,070 can usually draw one line on the-- it 678 00:37:21,070 --> 00:37:23,290 can draw one line into the picture. 679 00:37:23,290 --> 00:37:25,225 The line can be diagonal if that node 680 00:37:25,225 --> 00:37:28,320 receives both of the inputs, which here are i 1 and i 2. 681 00:37:28,320 --> 00:37:30,470 See, there is an i 1 and an i 2 axis. 682 00:37:30,470 --> 00:37:32,330 Like an x- and a y-axis. 683 00:37:32,330 --> 00:37:36,420 The node has to be horizontal, or vertical, if-- sorry, 684 00:37:36,420 --> 00:37:38,920 the line has to be horizontal or vertical if the node only 685 00:37:38,920 --> 00:37:41,810 receives one of the inputs. 686 00:37:41,810 --> 00:37:46,560 And then, if you have a deeper level, 687 00:37:46,560 --> 00:37:51,090 these secondary level nodes can sort of do a logical, 688 00:37:51,090 --> 00:37:53,810 can do some kind of brilliant thing like an AND or an OR of 689 00:37:53,810 --> 00:37:58,020 the first two, which can help you out. 690 00:37:58,020 --> 00:37:58,770 All right. 691 00:37:58,770 --> 00:38:01,530 And so let's try to figure it out. 692 00:38:01,530 --> 00:38:05,030 So right off the bat, and I hope that people will help and call 693 00:38:05,030 --> 00:38:06,930 this out, because I know we don't 694 00:38:06,930 --> 00:38:09,305 have enough time for me to make sure you all get it. 695 00:38:09,305 --> 00:38:11,013 But right off the bat, which one of these 696 00:38:11,013 --> 00:38:12,699 looks like it's the easiest one? 697 00:38:12,699 --> 00:38:13,240 STUDENT: Six. 698 00:38:13,240 --> 00:38:13,865 PROFESSOR: Six. 699 00:38:13,865 --> 00:38:14,530 That's great. 700 00:38:14,530 --> 00:38:15,946 Six is definitely the easiest one. 701 00:38:15,946 --> 00:38:17,220 It's a single line.
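A small sketch of the one-line-per-node hint, assuming a standard sigmoid node. Its output crosses 0.5 exactly where w1 i1 + w2 i2 equals the threshold, so each node splits the plane with one straight line. Setting a weight to zero models a node that receives only one input: its line is then vertical (only i1) or horizontal (only i2). The weights here are illustrative, not taken from the quiz.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fires(i1, i2, w1, w2, threshold):
    """True when a sigmoid node's output is above 0.5, i.e. w1*i1 + w2*i2 > threshold.

    The boundary w1*i1 + w2*i2 = threshold is one straight line in the (i1, i2) plane.
    """
    return sigmoid(w1 * i1 + w2 * i2 - threshold) > 0.5


# A node that receives only i1 (w2 = 0): its answer never depends on i2,
# so its line is vertical, at i1 = 0.5 here.
print([fires(0.7, i2, 1.0, 0.0, 0.5) for i2 in (-3, 0, 3)])  # [True, True, True]
print([fires(0.3, i2, 1.0, 0.0, 0.5) for i2 in (-3, 0, 3)])  # [False, False, False]

# A node that receives both inputs can draw a diagonal line, here i1 + i2 = 1.
print(fires(0.2, 0.2, 1.0, 1.0, 1.0), fires(0.9, 0.9, 1.0, 1.0, 1.0))  # False True
```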
702 00:38:17,220 --> 00:38:19,430 So this is just how I would have solved this problem: 703 00:38:19,430 --> 00:38:20,530 find the easiest one. 704 00:38:20,530 --> 00:38:23,010 Which of these is the crappiest net? 705 00:38:23,010 --> 00:38:23,510 STUDENT: A. 706 00:38:23,510 --> 00:38:25,500 PROFESSOR: A is the crappiest net. 707 00:38:25,500 --> 00:38:27,440 But there's no way in hell that A 708 00:38:27,440 --> 00:38:30,510 is going to be able to get any of these except for six. 709 00:38:30,510 --> 00:38:39,240 So let's, right off the bat, say that six is A. All right. 710 00:38:39,240 --> 00:38:44,550 Six is A. That's A. We don't have to worry about A. OK. 711 00:38:44,550 --> 00:38:45,560 Cool. 712 00:38:45,560 --> 00:38:49,400 Now let's look at some other ones that are very interesting. 713 00:38:49,400 --> 00:38:52,430 All the rest of these draw two lines, 714 00:38:52,430 --> 00:38:53,835 well, these three draw two lines. 715 00:38:53,835 --> 00:38:55,380 These three draw three lines. 716 00:38:55,380 --> 00:38:58,800 They draw a triangle. 717 00:38:58,800 --> 00:39:02,740 So despite the fact that this c is a very powerful net, 718 00:39:02,740 --> 00:39:09,470 that indeed, with three whole levels here of sigmoids, 719 00:39:09,470 --> 00:39:12,400 it looks like there are only two 720 00:39:12,400 --> 00:39:14,490 in our little stable of nets that are equipped 721 00:39:14,490 --> 00:39:16,460 to handle numbers one and two. 722 00:39:16,460 --> 00:39:18,530 And those are? 723 00:39:18,530 --> 00:39:23,335 E and F, because E and F have three nodes at the first level. 724 00:39:23,335 --> 00:39:25,276 They can draw three lines. 725 00:39:25,276 --> 00:39:27,650 And then they can do something logical about those lines, 726 00:39:27,650 --> 00:39:31,690 like, for instance, whether it's inside all of those lines. 727 00:39:31,690 --> 00:39:32,960 There's a way to do that.
728 00:39:32,960 --> 00:39:36,260 You just-- basically you can give negative and positive 729 00:39:36,260 --> 00:39:38,120 weights as you so choose to make sure 730 00:39:38,120 --> 00:39:40,264 that it's under certain ones, above other ones, 731 00:39:40,264 --> 00:39:42,930 and then make the threshold such that it has to follow all three 732 00:39:42,930 --> 00:39:45,160 of your rules. 733 00:39:45,160 --> 00:39:49,725 So between E and F, which one should be two 734 00:39:49,725 --> 00:39:52,050 and which one should be one? 735 00:39:52,050 --> 00:39:53,160 Anyone see? 736 00:39:53,160 --> 00:39:55,140 Well, let's look at two and one. 737 00:39:55,140 --> 00:39:56,790 Which one is easier to do? 738 00:39:56,790 --> 00:39:57,880 Between two and one. 739 00:39:57,880 --> 00:39:59,120 Two. 740 00:39:59,120 --> 00:40:00,820 It's got a horizontal and a vertical. 741 00:40:00,820 --> 00:40:03,350 One has all three lines diagonal. 742 00:40:03,350 --> 00:40:08,262 And which one of these is the weaker net, between E and F? 743 00:40:08,262 --> 00:40:10,620 F. F has one node that can only do a horizontal, 744 00:40:10,620 --> 00:40:13,630 and one node that can only do a vertical line. 745 00:40:13,630 --> 00:40:16,370 So which one is F going to have to do? 746 00:40:16,370 --> 00:40:17,850 Two. 747 00:40:17,850 --> 00:40:19,210 And E does what? 748 00:40:19,210 --> 00:40:20,530 Good job, guys. 749 00:40:20,530 --> 00:40:22,470 Good job, you got this. 750 00:40:22,470 --> 00:40:24,940 So now let's look at the last three. 751 00:40:31,070 --> 00:40:32,790 Number three is definitely the hardest. 752 00:40:32,790 --> 00:40:35,470 It's an XOR.
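A sketch of how a second-level node can AND three first-level lines into a triangle, as E and F must do for problems 1 and 2. Hard step units stand in for very steep sigmoids, and the three lines are made up for illustration; they are not the quiz's triangles.

```python
def step(z):
    """Hard threshold, standing in for a very steep sigmoid."""
    return 1 if z > 0 else 0


def triangle_net(i1, i2):
    # First level: three lines; the sign of each weight picks which side counts as 1.
    h1 = step(i2)          # above the line i2 = 0
    h2 = step(i1 - i2)     # below the diagonal i2 = i1
    h3 = step(1.0 - i1)    # left of the line i1 = 1
    # Second level: AND, by setting the threshold so that all three must be 1.
    return step(h1 + h2 + h3 - 2.5)


# Inside the triangle with corners (0, 0), (1, 0), (1, 1) -> 1; outside -> 0.
print(triangle_net(0.6, 0.3), triangle_net(0.3, 0.6), triangle_net(1.5, 0.5))  # 1 0 0
```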
753 00:40:35,470 --> 00:40:37,260 Those of you who've played around 754 00:40:37,260 --> 00:40:41,750 with double o 2 kind of stuff, or even just logic, 755 00:40:41,750 --> 00:40:45,560 probably know that there is no way 756 00:40:45,560 --> 00:40:52,490 to make a sort of simple linear combination in one 757 00:40:52,490 --> 00:40:55,610 level of logic to create an XOR. 758 00:40:55,610 --> 00:40:58,660 XOR is very difficult to create. 759 00:40:58,660 --> 00:41:00,830 There are some interesting problems 760 00:41:00,830 --> 00:41:04,560 involving trying to teach an XOR to a neural net. 761 00:41:04,560 --> 00:41:06,280 A single neural net node is not 762 00:41:06,280 --> 00:41:09,720 able to get the XOR, because you can tell it, 763 00:41:09,720 --> 00:41:14,120 OK, I want this one to be high, and this one to be low. 764 00:41:14,120 --> 00:41:14,930 That's fine. 765 00:41:14,930 --> 00:41:16,610 You say these both have to be high. 766 00:41:16,610 --> 00:41:17,620 That's fine. 767 00:41:17,620 --> 00:41:21,110 It's hard to say, it's pretty much impossible to say 768 00:41:21,110 --> 00:41:24,860 in a single node, this one or this one, but not both, needs 769 00:41:24,860 --> 00:41:27,460 to be high. If you 770 00:41:27,460 --> 00:41:29,460 just play with it, you'll see. 771 00:41:29,460 --> 00:41:31,470 You need to set a threshold somewhere, 772 00:41:31,470 --> 00:41:33,680 and it's not going to be able to distinguish 773 00:41:33,680 --> 00:41:36,325 between them: if the threshold is set such that the 774 00:41:36,325 --> 00:41:38,450 OR is going to work, the whole OR is going to work. 775 00:41:38,450 --> 00:41:42,060 It's going to accept when both of them are positive as well. 776 00:41:42,060 --> 00:41:43,360 So how can we do XOR? 777 00:41:43,360 --> 00:41:44,730 We need more logic.
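The single-node XOR argument can be made precise, and then illustrated with a short brute-force search. Suppose one threshold node outputs 1 when w1 x1 + w2 x2 > t. Matching XOR would need t >= 0 (from input 0,0), w1 > t and w2 > t (from 1,0 and 0,1), yet w1 + w2 <= t (from 1,1). That is impossible, because w1 + w2 > 2t >= t. The grid search below only illustrates this on a finite grid; the inequality is what actually rules it out. Names are hypothetical.

```python
import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}


def single_node_fits(table, w1, w2, t):
    """Does one threshold node (output 1 iff w1*x1 + w2*x2 > t) match the truth table?"""
    return all((w1 * x1 + w2 * x2 > t) == bool(y) for (x1, x2), y in table.items())


def search(table, grid):
    """First (w1, w2, t) on the grid that fits the table, or None."""
    for w1, w2, t in itertools.product(grid, repeat=3):
        if single_node_fits(table, w1, w2, t):
            return w1, w2, t
    return None


grid = [k / 2 for k in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
print("OR: ", search(OR, grid))   # some weights on the grid work
print("XOR:", search(XOR, grid))  # None
```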
778 00:41:44,730 --> 00:41:46,990 We need to use some combination of ANDs and ORs 779 00:41:46,990 --> 00:41:48,480 in a two-level way. 780 00:41:48,480 --> 00:41:51,520 To do that, we need the deepest neural net that we have. 781 00:41:51,520 --> 00:41:53,390 There's only one that's capable of that. 782 00:41:53,390 --> 00:41:54,680 And that is? 783 00:41:54,680 --> 00:41:55,785 It's C. 784 00:41:55,785 --> 00:41:57,410 There are many different ways to do it. 785 00:41:57,410 --> 00:41:59,510 Let's think of a possibility. 786 00:41:59,510 --> 00:42:04,230 i 1 and i 2 draw these two lines. 787 00:42:04,230 --> 00:42:08,450 Let's call these one, two, three, four, five; 788 00:42:08,450 --> 00:42:12,010 node 1 and node 2 draw these two lines. 789 00:42:12,010 --> 00:42:14,680 And I'll just sort of draw it here for you guys. 790 00:42:14,680 --> 00:42:20,050 Then maybe node 3 gives value to-- yeah, 791 00:42:20,050 --> 00:42:29,810 let me see-- node three can give value to perhaps-- let's 792 00:42:29,810 --> 00:42:36,924 see-- node 3 can give value to everything that is-- there are 793 00:42:36,924 --> 00:42:38,090 a lot of possibilities here. 794 00:42:38,090 --> 00:42:48,410 Node 3 can give value to everything that is up here. 795 00:42:48,410 --> 00:42:50,650 Actually, node 3 can give value to everything 796 00:42:50,650 --> 00:42:56,630 except for this bottom part, and then 797 00:42:56,630 --> 00:43:05,510 node 4 could give value to say-- doesn't do it yet, 798 00:43:05,510 --> 00:43:07,700 but there's a few-- there's a few different ways 799 00:43:07,700 --> 00:43:09,130 to do it if you play around. 800 00:43:09,130 --> 00:43:12,470 The key idea is that node 3 and node 4 801 00:43:12,470 --> 00:43:17,350 can give value to some combination of AND, OR, or NOT, 802 00:43:17,350 --> 00:43:23,490 and then node 5 can give value based on being above or below 803 00:43:23,490 --> 00:43:26,280 a certain threshold on a combination of 3 and 4.
804 00:43:26,280 --> 00:43:29,200 You can build an XOR out of the logic gates. 805 00:43:29,200 --> 00:43:32,820 I'll put that on the back burner for a moment, 806 00:43:32,820 --> 00:43:35,200 as we continue onward, but clearly C 807 00:43:35,200 --> 00:43:38,010 has to do number three. 808 00:43:38,010 --> 00:43:38,620 OK. 809 00:43:38,620 --> 00:43:40,490 Now we're left with four and five. 810 00:43:40,490 --> 00:43:42,447 I think, interestingly, five looks 811 00:43:42,447 --> 00:43:45,030 like it may be more complicated than four, because of the fact 812 00:43:45,030 --> 00:43:48,297 that it needs lines in two different directions instead 813 00:43:48,297 --> 00:43:49,570 of two in the same direction. 814 00:43:52,930 --> 00:43:55,720 However, just the idea that the one with fewer lines 815 00:43:55,720 --> 00:43:58,720 is the simpler one may not get us through here. 816 00:43:58,720 --> 00:43:59,860 And there's a reason why. 817 00:43:59,860 --> 00:44:01,180 Look what we have left to use. 818 00:44:01,180 --> 00:44:06,610 We have to use D or B. What is the property of the two lines 819 00:44:06,610 --> 00:44:08,740 that D can draw? 820 00:44:08,740 --> 00:44:11,930 D being the simpler one. 821 00:44:11,930 --> 00:44:14,760 One horizontal, one vertical, that's right. 822 00:44:14,760 --> 00:44:16,260 So even though it may look simpler 823 00:44:16,260 --> 00:44:17,740 to just have two horizontal lines, 824 00:44:17,740 --> 00:44:20,860 it actually requires B. B is the only one that 825 00:44:20,860 --> 00:44:24,030 can draw two horizontal lines because D has to draw one 826 00:44:24,030 --> 00:44:25,660 horizontal and one vertical. 827 00:44:25,660 --> 00:44:33,160 So that leaves us with B on this, D on this. 828 00:44:33,160 --> 00:44:34,410 Excellent, we have a question.
829 00:44:34,410 --> 00:44:36,327 I would've thought it would have been possible 830 00:44:36,327 --> 00:44:37,743 that we had no questions, or maybe 831 00:44:37,743 --> 00:44:39,471 I just explained it the best I ever have. 832 00:44:39,471 --> 00:44:39,970 Question. 833 00:44:39,970 --> 00:44:43,825 STUDENT: I didn't get why B has to be two horizontal lines. 834 00:44:43,825 --> 00:44:44,700 PROFESSOR: All right. 835 00:44:44,700 --> 00:44:46,450 So the question is, I don't understand why 836 00:44:46,450 --> 00:44:48,200 B has to be two horizontal lines. 837 00:44:48,200 --> 00:44:52,340 The answer is, it doesn't. 838 00:44:52,340 --> 00:44:56,510 B can be anything, but D can't be two horizontal lines. 839 00:44:56,510 --> 00:44:58,180 And so by process of elimination, 840 00:44:58,180 --> 00:45:03,680 it's B. Well, take a look at D. 841 00:45:03,680 --> 00:45:09,530 So D has three nodes, one, two, three. 842 00:45:09,530 --> 00:45:12,660 Node 1 and node 2 can just draw a line anywhere 843 00:45:12,660 --> 00:45:15,140 they want, involving the inputs they receive. 844 00:45:15,140 --> 00:45:18,470 What input does node 1 receive? 845 00:45:18,470 --> 00:45:19,280 Let's go to node 1. 846 00:45:21,910 --> 00:45:26,550 So it can only make a cutoff based on i 1. 847 00:45:26,550 --> 00:45:30,800 So therefore, it can only draw by making the cutoff 848 00:45:30,800 --> 00:45:32,200 above and below a certain point. 849 00:45:32,200 --> 00:45:34,990 Node 1 can only draw vertical lines. 850 00:45:34,990 --> 00:45:37,174 Node 2 can only draw a horizontal line, 851 00:45:37,174 --> 00:45:38,590 because it can only make a cutoff 852 00:45:38,590 --> 00:45:41,780 based on where it is in i 2. 853 00:45:41,780 --> 00:45:44,390 Therefore, they can't both draw a horizontal. 854 00:45:44,390 --> 00:45:46,640 That's why this is the trickiest part. 855 00:45:46,640 --> 00:45:49,160 This last part, because B is more powerful.
856 00:45:49,160 --> 00:45:51,340 B does not only have to do two horizontal lines. 857 00:45:51,340 --> 00:45:54,280 It can do two diagonal lines. 858 00:45:54,280 --> 00:45:55,490 It can do anything it wants. 859 00:45:55,490 --> 00:45:58,222 It just happens that it's stuck doing this somewhat easier 860 00:45:58,222 --> 00:46:00,680 problem, because it is the only one left that 861 00:46:00,680 --> 00:46:02,490 has the power to do it. 862 00:46:02,490 --> 00:46:05,760 So let's see, we're done, and we'd 863 00:46:05,760 --> 00:46:09,490 have aced this part of the quiz that, like, no one got, 864 00:46:09,490 --> 00:46:11,240 well, not no one, but very few people got, 865 00:46:11,240 --> 00:46:13,660 when we put it on in 2008. 866 00:46:13,660 --> 00:46:17,140 The only thing we have left to ask 867 00:46:17,140 --> 00:46:22,430 is-- let me see-- yeah, the only thing we have left to ask 868 00:46:22,430 --> 00:46:29,931 is, what are we going to do here for this? 869 00:46:29,931 --> 00:46:30,430 All right. 870 00:46:30,430 --> 00:46:31,070 Let's see. 871 00:46:33,860 --> 00:46:38,360 For the XOR, let's see if I can do this XOR. 872 00:46:41,360 --> 00:46:43,240 OK. 873 00:46:43,240 --> 00:46:44,970 How about this one? 874 00:46:44,970 --> 00:46:45,510 Right. 875 00:46:45,510 --> 00:46:46,180 I'm an idiot. 876 00:46:46,180 --> 00:46:47,910 This is the easiest way. 877 00:46:47,910 --> 00:46:49,926 Number one draws this line. 878 00:46:49,926 --> 00:46:51,050 Number two draws this line. 879 00:46:51,050 --> 00:46:55,307 Number three ANDs the two lines. 880 00:46:55,307 --> 00:46:56,890 Number three says only if both of them 881 00:46:56,890 --> 00:46:58,360 are true will I accept. 882 00:46:58,360 --> 00:47:02,510 Number four NORs the two lines. 883 00:47:02,510 --> 00:47:05,550 And number five ORs three and four. 884 00:47:08,430 --> 00:47:09,740 Thank you. 885 00:47:09,740 --> 00:47:11,840 No, it's not that hard.
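A sketch of the five-node construction for net C, with step units in place of steep sigmoids and lines placed for inputs in {0, 1}. It assumes node 4 computes a NOR of the two lines (neither line true), which is one choice that makes node 5's OR come out as XOR. The line placements and weights are illustrative.

```python
def step(z):
    """Hard threshold, standing in for a very steep sigmoid."""
    return 1 if z > 0 else 0


def net_c(i1, i2):
    # Level 1: nodes 1 and 2 each draw one line.
    n1 = step(i1 - 0.5)        # 1 to the right of i1 = 0.5
    n2 = step(0.5 - i2)        # 1 below i2 = 0.5
    # Level 2: logic on the two lines.
    n3 = step(n1 + n2 - 1.5)   # AND: both lines say 1
    n4 = step(0.5 - n1 - n2)   # NOR: neither line says 1
    # Level 3: node 5 ORs nodes 3 and 4.
    return step(n3 + n4 - 0.5)


for i1, i2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(i1, i2, net_c(i1, i2))  # last column: 0, 1, 1, 0
```

Flipping which side of line 2 counts as 1 is what turns "both or neither" into XOR; with the other orientation the same AND/NOR/OR wiring would give XNOR.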
886 00:47:11,840 --> 00:47:13,902 I just completely blanked, because there's 887 00:47:13,902 --> 00:47:15,860 another way that a lot of people like to do it. 888 00:47:15,860 --> 00:47:17,300 It involves drawing in a lot of lines, 889 00:47:17,300 --> 00:47:18,570 and then making the clef b 2. 890 00:47:18,570 --> 00:47:20,280 But I can't remember it at the moment. 891 00:47:20,280 --> 00:47:21,610 Are there any other questions? 892 00:47:21,610 --> 00:47:26,958 Because I think if you have a question now, 893 00:47:26,958 --> 00:47:28,582 like four other people have it and just 894 00:47:28,582 --> 00:47:29,630 aren't raising their hands. 895 00:47:29,630 --> 00:47:31,470 So ask any questions about this drawing thing. 896 00:47:31,470 --> 00:47:31,970 Question? 897 00:47:31,970 --> 00:47:33,569 STUDENT: Why do we do this? 898 00:47:33,569 --> 00:47:35,360 PROFESSOR: Why do we do this drawing thing? 899 00:47:35,360 --> 00:47:37,680 That's a very good question. 900 00:47:37,680 --> 00:47:40,990 The answer is so that you can see what kinds of nets 901 00:47:40,990 --> 00:47:43,680 you might need to use in these simple problems, 902 00:47:43,680 --> 00:47:45,590 to answer these simple problems. 903 00:47:45,590 --> 00:47:51,320 So that if, Athena forbid, you 904 00:47:51,320 --> 00:47:54,340 have to use a neural net in a job 905 00:47:54,340 --> 00:47:56,640 somewhere to do some actual learning, 906 00:47:56,640 --> 00:48:00,260 and you see some sort of quality about the problem, 907 00:48:00,260 --> 00:48:02,910 you know not to make a net that's too simple, 908 00:48:02,910 --> 00:48:03,720 for instance. 909 00:48:03,720 --> 00:48:05,178 And you wouldn't want a net that is 910 00:48:05,178 --> 00:48:06,660 more complex than it has to be. 911 00:48:06,660 --> 00:48:10,840 So you can sort of see what the nets do at each level, 912 00:48:10,840 --> 00:48:13,140 and more visibly understand.
913 00:48:13,140 --> 00:48:15,970 I think a lot of people who drew problems like this just 914 00:48:15,970 --> 00:48:17,595 want to make sure people know, oh yeah, 915 00:48:17,595 --> 00:48:19,636 it's not just these numbers that we're mindlessly 916 00:48:19,636 --> 00:48:21,760 backpropagating from the other part of the problem 917 00:48:21,760 --> 00:48:23,720 to make them higher or lower. 918 00:48:23,720 --> 00:48:25,760 This is what we're doing at each level. 919 00:48:25,760 --> 00:48:29,280 This is the space that we're looking at. 920 00:48:29,280 --> 00:48:32,780 Each node is performing logic on the steps before. 921 00:48:32,780 --> 00:48:36,680 So that if you actually have to use a neural net later on, down 922 00:48:36,680 --> 00:48:41,409 the road, then you'll be able to figure out 923 00:48:41,409 --> 00:48:43,200 what your net's going to need to look like. 924 00:48:43,200 --> 00:48:45,530 You'll be able to figure out what it's doing, 925 00:48:45,530 --> 00:48:47,174 at least as well as you can figure out 926 00:48:47,174 --> 00:48:48,590 what it's doing, for a neural net, 927 00:48:48,590 --> 00:48:50,950 since it will often start getting 928 00:48:50,950 --> 00:48:54,070 these really crazy numbers and will have all sorts of nodes in it. 929 00:48:54,070 --> 00:48:56,981 And in a real neural net that's being used nowadays, 930 00:48:56,981 --> 00:48:58,730 there'll be tons of nodes, and you'll just 931 00:48:58,730 --> 00:49:00,104 see the numbers fluctuate wildly, 932 00:49:00,104 --> 00:49:04,402 and then suddenly it's going to start working or not. 933 00:49:04,402 --> 00:49:05,360 That's a good question. 934 00:49:05,360 --> 00:49:06,193 Any other questions? 935 00:49:06,193 --> 00:49:08,220 We still have a few minutes. 936 00:49:08,220 --> 00:49:09,340 Not many, but a few. 937 00:49:09,340 --> 00:49:11,564 Any other questions about any of this stuff? 938 00:49:11,564 --> 00:49:12,064 Sorry.
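One way to "see what the nets do at each level," as described above, is to evaluate each node over a grid of inputs and print where it fires. The sketch below assumes step-threshold units; the two line-drawing nodes, the AND node that performs logic on them, and the 5x5 grid over [0, 1] are all hypothetical choices for illustration.

```python
def step_unit(inputs, weights, threshold):
    """Outputs 1 if the weighted sum of inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def vertical_node(i1, i2):
    return step_unit([i1], [1.0], 0.5)  # first level: fires when i1 >= 0.5


def horizontal_node(i1, i2):
    return step_unit([i2], [1.0], 0.5)  # first level: fires when i2 >= 0.5


def and_node(i1, i2):
    # Second level: performs logic (AND) on the outputs of the level below.
    below = [vertical_node(i1, i2), horizontal_node(i1, i2)]
    return step_unit(below, [1.0, 1.0], 1.5)


def render(node, steps=5):
    """ASCII map of where a node fires: '#' = 1, '.' = 0; the top row is i2 = 1."""
    rows = []
    for r in range(steps - 1, -1, -1):
        i2 = r / (steps - 1)
        rows.append("".join("#" if node(c / (steps - 1), i2) else "."
                            for c in range(steps)))
    return "\n".join(rows)


for name, node in [("vertical", vertical_node), ("horizontal", horizontal_node),
                   ("AND", and_node)]:
    print(name)
    print(render(node))
```

The AND node's map is the top-right quadrant, exactly the overlap of the two first-level maps, which is the "logic on the steps before" made visible.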
939 00:49:12,064 --> 00:49:12,556 STUDENT: Talk about what you just asked. 940 00:49:12,556 --> 00:49:15,016 Just because we draw it, does the machine need to learn-- 941 00:49:20,929 --> 00:49:23,220 PROFESSOR: You're confused about why the machine is run, what, 942 00:49:23,220 --> 00:49:26,140 by the pictures on the right? 943 00:49:26,140 --> 00:49:27,137 Oh, OK. 944 00:49:27,137 --> 00:49:29,720 The machine does not have to learn by drawing pictures and calling 945 00:49:29,720 --> 00:49:30,676 them in. 946 00:49:30,676 --> 00:49:32,300 Let me give you some real applications. 947 00:49:32,300 --> 00:49:35,110 My friend at the University of Maryland 948 00:49:35,110 --> 00:49:38,300 recently actually used neural nets 949 00:49:38,300 --> 00:49:41,640 because, yeah, he actually did, because he 950 00:49:41,640 --> 00:49:45,410 was doing a game-playing competition, where 951 00:49:45,410 --> 00:49:48,795 the game was not known when you were designing your AI. 952 00:49:48,795 --> 00:49:52,490 It had to be able to-- there was some very elegant, general game 953 00:49:52,490 --> 00:49:54,950 solver thing that you had to be able to hook up into, 954 00:49:54,950 --> 00:49:58,140 and then they made up the rules, and you had a little bit 955 00:49:58,140 --> 00:49:59,570 of time, and then it started. 956 00:49:59,570 --> 00:50:03,320 Some of the AIs, what they did was, they trained, 957 00:50:03,320 --> 00:50:05,780 once they found out what the rules were, on their own, 958 00:50:05,780 --> 00:50:08,450 with the rules. In his case he had a neural net, because it 959 00:50:08,450 --> 00:50:11,660 was so generic, you just have a web of random gook. 960 00:50:11,660 --> 00:50:13,970 He thought it could learn anything, 961 00:50:13,970 --> 00:50:16,910 and then-- he never did tell me how it went, it probably 962 00:50:16,910 --> 00:50:18,090 didn't go well. 963 00:50:18,090 --> 00:50:20,700 But maybe it did.
964 00:50:20,700 --> 00:50:25,119 It basically tried to learn some things about the rules. 965 00:50:25,119 --> 00:50:26,660 Some of the other people who are more 966 00:50:26,660 --> 00:50:29,930 principled game players actually tried to find out 967 00:50:29,930 --> 00:50:32,950 fundamental properties of the space of the rules 968 00:50:32,950 --> 00:50:34,820 by testing a few different things, 969 00:50:34,820 --> 00:50:36,640 since in their view more knowledge 970 00:50:36,640 --> 00:50:39,110 is less search, so they could do less search 971 00:50:39,110 --> 00:50:41,180 when the actual game playing came on. 972 00:50:41,180 --> 00:50:43,390 And then when the actual game playing came on, 973 00:50:43,390 --> 00:50:48,680 pretty much everyone did some kind of game tree based stuff. 974 00:50:48,680 --> 00:50:51,640 He's telling me that a lot of Monte Carlo 975 00:50:51,640 --> 00:50:55,340 based game tree stuff, which is very nondeterministic, 976 00:50:55,340 --> 00:50:57,349 is what they're doing nowadays, rather than 977 00:50:57,349 --> 00:50:59,140 deterministic alpha-beta, although he 978 00:50:59,140 --> 00:51:02,380 said it converges to alpha-beta if it's given enough time. 979 00:51:02,380 --> 00:51:05,050 That's what he told me. But that's someone I 980 00:51:05,050 --> 00:51:06,570 know who is using neural nets. 981 00:51:06,570 --> 00:51:08,800 In a cognitive science class I took, I also 982 00:51:08,800 --> 00:51:11,720 saw neural nets that tried to attach like qualities 983 00:51:11,720 --> 00:51:15,620 to objects, by having just this huge, huge number of nodes 984 00:51:15,620 --> 00:51:18,160 in levels in between, and then eventually it was like, 985 00:51:18,160 --> 00:51:20,780 a duck flies, and you're like, how's it doing this again? 986 00:51:20,780 --> 00:51:22,640 I'm not sure, but it is.
987 00:51:22,640 --> 00:51:25,890 So the basic idea is that when-- one 988 00:51:25,890 --> 00:51:28,140 of the main reasons that neural nets were used so much 989 00:51:28,140 --> 00:51:31,340 back in the day is that people on many different sides 990 00:51:31,340 --> 00:51:33,450 of this problem, cognitive science, AI, 991 00:51:33,450 --> 00:51:35,620 whatever, were all saying, wait a minute, 992 00:51:35,620 --> 00:51:38,610 there are networks of neurons, and they can do stuff, 993 00:51:38,610 --> 00:51:40,360 and we're seeing it in different places. 994 00:51:40,360 --> 00:51:42,860 And when you've seen it in so many different places at once, 995 00:51:42,860 --> 00:51:44,490 it must be a genius idea that's going 996 00:51:44,490 --> 00:51:46,140 to revolutionize everything. 997 00:51:46,140 --> 00:51:47,780 And so then everyone started using 998 00:51:47,780 --> 00:51:50,860 them to try to connect all these things together, which I think 999 00:51:50,860 --> 00:51:53,340 is a noble endeavor, but unfortunately people 1000 00:51:53,340 --> 00:51:54,300 just stopped using them. 1001 00:51:54,300 --> 00:51:56,180 They didn't work as people wanted. 1002 00:51:56,180 --> 00:51:58,950 It turned out that figuring out how our neurons worked in our heads 1003 00:51:58,950 --> 00:52:03,860 was not the way to solve all AI-hard problems at once. 1004 00:52:03,860 --> 00:52:06,360 And they fell into disfavor, although they are still 1005 00:52:06,360 --> 00:52:09,510 used for some reasons, like the sum is like that. 1006 00:52:09,510 --> 00:52:11,610 So we wouldn't use them just to draw these pictures. 1007 00:52:11,610 --> 00:52:13,492 The reason why we have these pictures 1008 00:52:13,492 --> 00:52:15,950 is because we give you simple nets that you can work out 1009 00:52:15,950 --> 00:52:17,720 by hand on the quiz.
1010 00:52:17,720 --> 00:52:20,340 Any net that is really used nowadays 1011 00:52:20,340 --> 00:52:23,700 would make your head explode if we 1012 00:52:23,700 --> 00:52:26,140 tried to make you do something with it on the quiz. 1013 00:52:26,140 --> 00:52:27,660 It would just be horrible. 1014 00:52:27,660 --> 00:52:29,315 So I think that's a good question. 1015 00:52:29,315 --> 00:52:31,440 If there are no other questions, or even if there are, 1016 00:52:31,440 --> 00:52:34,390 because we have to head out, if there are any other questions, 1017 00:52:34,390 --> 00:52:37,360 you can see me as I'm walking out.