1 00:00:01,640 --> 00:00:04,040 The following content is provided under a Creative 2 00:00:04,040 --> 00:00:05,580 Commons license. 3 00:00:05,580 --> 00:00:07,880 Your support will help MIT OpenCourseWare 4 00:00:07,880 --> 00:00:12,270 continue to offer high-quality, educational resources for free. 5 00:00:12,270 --> 00:00:14,870 To make a donation or view additional materials 6 00:00:14,870 --> 00:00:18,830 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:18,830 --> 00:00:22,400 at ocw.mit.edu. 8 00:00:22,400 --> 00:00:24,590 TOMER ULLMAN: And today, with your active help 9 00:00:24,590 --> 00:00:27,470 and participation, I hope to run a probabilistic programming 10 00:00:27,470 --> 00:00:30,459 tutorial in the time that we have left. 11 00:00:30,459 --> 00:00:32,000 And we're going to focus specifically 12 00:00:32,000 --> 00:00:33,874 on a language called Church, which 13 00:00:33,874 --> 00:00:36,290 is a probabilistic programming language that was developed 14 00:00:36,290 --> 00:00:39,020 in Josh Tenenbaum's Group, but it's 15 00:00:39,020 --> 00:00:41,600 now taken on a life of its own and has set up 16 00:00:41,600 --> 00:00:44,510 shop in other places. 17 00:00:44,510 --> 00:00:46,010 Before I get started, I should say, 18 00:00:46,010 --> 00:00:47,720 I was, sort of, looking for a good image. 19 00:00:47,720 --> 00:00:49,400 I didn't like a blank page, so I was 20 00:00:49,400 --> 00:00:51,237 googling just Church tutorial. 21 00:00:51,237 --> 00:00:52,570 This is the first thing I found. 22 00:00:52,570 --> 00:00:54,290 It's an image for Minecraft about how 23 00:00:54,290 --> 00:00:56,030 to build a church in Minecraft. 24 00:00:56,030 --> 00:00:56,960 Does any of us-- 25 00:00:56,960 --> 00:00:58,375 people have heard of Minecraft? 26 00:00:58,375 --> 00:00:59,000 AUDIENCE: Yeah. 27 00:00:59,000 --> 00:00:59,900 TOMER ULLMAN: They've played with Minecraft? 
28 00:00:59,900 --> 00:01:01,730 OK-- just in case you don't know, 29 00:01:01,730 --> 00:01:04,959 Minecraft is a sort of procedurally-generated world 30 00:01:04,959 --> 00:01:06,932 where you get some building blocks, literally, 31 00:01:06,932 --> 00:01:08,890 building blocks, that you can build stuff with. 32 00:01:08,890 --> 00:01:11,330 And you can build an infinite number of things, including 33 00:01:11,330 --> 00:01:13,239 a computer, and a church. 34 00:01:13,239 --> 00:01:14,030 And it's very cool. 35 00:01:14,030 --> 00:01:15,530 And I thought it's actually not that 36 00:01:15,530 --> 00:01:18,542 bad of an image for a tutorial about probabilistic programming 37 00:01:18,542 --> 00:01:20,000 language, which is also about, sort 38 00:01:20,000 --> 00:01:22,160 of, procedurally-generative things that 39 00:01:22,160 --> 00:01:24,984 use small building blocks to build up an entire world. 40 00:01:24,984 --> 00:01:26,900 And I thought, OK, that's the first hit I got. 41 00:01:26,900 --> 00:01:28,010 What's the other hit? 42 00:01:28,010 --> 00:01:29,820 Well, it's just another church, and another church, 43 00:01:29,820 --> 00:01:31,340 and another church, another church, another church 44 00:01:31,340 --> 00:01:34,080 that you can build in Minecraft from many different angles 45 00:01:34,080 --> 00:01:36,620 and many different tutorials, so maybe, instead of this, 46 00:01:36,620 --> 00:01:38,870 you can just train some deep-learning algorithm 47 00:01:38,870 --> 00:01:43,110 to, I don't know, learn a billion churches and do that. 48 00:01:43,110 --> 00:01:45,000 That's not what we're after. 49 00:01:45,000 --> 00:01:47,240 So probabilistic programming, Josh already 50 00:01:47,240 --> 00:01:48,887 talked a bunch about this, so I'll 51 00:01:48,887 --> 00:01:50,720 sort of be repeating him, or channeling him. 
52 00:01:50,720 --> 00:01:53,245 It's about combining the best of both worlds 53 00:01:53,245 --> 00:01:54,620 in the, sort of, two states of AI 54 00:01:54,620 --> 00:01:58,354 right now, which is statistical modeling and logic. 55 00:01:58,354 --> 00:01:59,770 And in many models, you have this, 56 00:01:59,770 --> 00:02:02,854 sort of, dual question of representation and learning. 57 00:02:02,854 --> 00:02:05,270 And it's really, sort of, a problem for cognitive science, 58 00:02:05,270 --> 00:02:07,687 going back to the days of before cognitive science, right? 59 00:02:07,687 --> 00:02:10,020 I mean, this is the sort of problem that a lot of people 60 00:02:10,020 --> 00:02:12,170 had when they tried to model the human mind. 61 00:02:12,170 --> 00:02:13,330 This goes back to Turing. 62 00:02:13,330 --> 00:02:15,890 Sort of, when we want to build a system that 63 00:02:15,890 --> 00:02:18,380 is human-like in its intelligence, 64 00:02:18,380 --> 00:02:19,880 the two questions that we face are, 65 00:02:19,880 --> 00:02:21,740 what are the representations that it will have, 66 00:02:21,740 --> 00:02:23,114 and how is it going to learn them 67 00:02:23,114 --> 00:02:25,190 or how is it going to learn anything new? 68 00:02:25,190 --> 00:02:26,210 And you often have to, sort of-- it's 69 00:02:26,210 --> 00:02:27,440 a short blanket problem, right? 70 00:02:27,440 --> 00:02:28,880 If you try to cover your head, your feet 71 00:02:28,880 --> 00:02:29,765 are sort of not getting anything. 72 00:02:29,765 --> 00:02:30,950 If you try to cover your feet, your head's 73 00:02:30,950 --> 00:02:32,120 not getting anything. 
74 00:02:32,120 --> 00:02:34,250 Because oftentimes, you find that, if you stick 75 00:02:34,250 --> 00:02:36,110 to a particularly easy representation that's 76 00:02:36,110 --> 00:02:39,080 sort of easy to code or rather something, 77 00:02:39,080 --> 00:02:41,390 a kind of representation that's easy to learn, 78 00:02:41,390 --> 00:02:44,000 like say a vector of weights that you're just trying 79 00:02:44,000 --> 00:02:47,600 to shift your weights around, then, yes, that 80 00:02:47,600 --> 00:02:49,520 might be easy, relatively easy, but you're 81 00:02:49,520 --> 00:02:51,155 sort of stuck with the representation 82 00:02:51,155 --> 00:02:53,270 that you can learn are weights. 83 00:02:53,270 --> 00:02:55,310 Or Josh was making a big point about this, 84 00:02:55,310 --> 00:02:57,770 and it is a big point, that if you try to learn something 85 00:02:57,770 --> 00:03:01,190 like causal Bayes nets, then you're sort of limited 86 00:03:01,190 --> 00:03:02,242 by that representation. 87 00:03:02,242 --> 00:03:03,950 That is your representation of these sort 88 00:03:03,950 --> 00:03:07,010 of circles and arrows that go into other circles. 89 00:03:07,010 --> 00:03:09,302 And that might get you very, very far. 90 00:03:09,302 --> 00:03:11,510 And you might even have very good learning algorithms 91 00:03:11,510 --> 00:03:14,000 for those particular models, for those particular 92 00:03:14,000 --> 00:03:15,560 representations, that are tailored 93 00:03:15,560 --> 00:03:17,000 for those representations. 94 00:03:17,000 --> 00:03:19,040 Like, in these causal circles and arrows, 95 00:03:19,040 --> 00:03:22,852 belief propagation might be a very good learning algorithm, 96 00:03:22,852 --> 00:03:24,560 but if you commit to that representation, 97 00:03:24,560 --> 00:03:26,750 then you are sort of stuck with that representation.
98 00:03:26,750 --> 00:03:28,250 And you might not be flexible enough 99 00:03:28,250 --> 00:03:30,530 to learn all the stuff that you want to. 100 00:03:30,530 --> 00:03:33,320 And a very flexible representation, sort of, 101 00:03:33,320 --> 00:03:35,060 one of the more flexible ones that 102 00:03:35,060 --> 00:03:36,860 have come onto the scene in the past years 103 00:03:36,860 --> 00:03:39,442 is why don't we try to learn a program? 104 00:03:39,442 --> 00:03:41,400 I say it's come onto the scene in recent years. 105 00:03:41,400 --> 00:03:42,400 That's not exactly true. 106 00:03:42,400 --> 00:03:44,705 People have been interested in learning programs 107 00:03:44,705 --> 00:03:46,580 for many, many years, for many, many decades, 108 00:03:46,580 --> 00:03:48,350 but they sort of try to infer them 109 00:03:48,350 --> 00:03:49,880 from kind of a logical perspective, 110 00:03:49,880 --> 00:03:51,680 not really getting these probabilistic learning 111 00:03:51,680 --> 00:03:52,070 algorithms. 112 00:03:52,070 --> 00:03:53,611 I'm sort of throwing words out there, 113 00:03:53,611 --> 00:03:55,600 but it'll make more sense as I go through it. 114 00:03:55,600 --> 00:03:57,530 And you already have some of Josh's stuff 115 00:03:57,530 --> 00:03:58,470 to carry you through. 116 00:03:58,470 --> 00:04:00,470 But the point is, there's always these questions 117 00:04:00,470 --> 00:04:02,134 of learning and representation. 118 00:04:02,134 --> 00:04:03,800 For probabilistic programming languages, 119 00:04:03,800 --> 00:04:05,930 the representation is not circles and arrows, 120 00:04:05,930 --> 00:04:09,132 it's not vectors of weights, it is programs. 121 00:04:09,132 --> 00:04:10,590 That's what you're trying to learn. 122 00:04:10,590 --> 00:04:12,810 That's what you're trying to figure out the world with. 123 00:04:12,810 --> 00:04:14,226 And then there's a question of how 124 00:04:14,226 --> 00:04:18,709 do you learn these programs, but we'll get to that. 
125 00:04:18,709 --> 00:04:22,646 OK, let's see, so we think it's a good representation for AI 126 00:04:22,646 --> 00:04:24,020 and cognition for all the reasons 127 00:04:24,020 --> 00:04:25,922 that Josh just talked about. 128 00:04:25,922 --> 00:04:27,380 And there's been a growing interest 129 00:04:27,380 --> 00:04:29,450 in these things for the past 10 years, 130 00:04:29,450 --> 00:04:31,460 witnessed both by the proliferation 131 00:04:31,460 --> 00:04:33,880 of many, many different types of programming languages-- 132 00:04:33,880 --> 00:04:36,690 sorry, probabilistic programming languages. 133 00:04:36,690 --> 00:04:38,864 I don't know whether to call them PPL, or people, 134 00:04:38,864 --> 00:04:39,530 or what exactly. 135 00:04:39,530 --> 00:04:41,210 But probabilistic programming languages, 136 00:04:41,210 --> 00:04:42,980 there's been PyMC based on Python, 137 00:04:42,980 --> 00:04:44,396 there's Church, which you're going 138 00:04:44,396 --> 00:04:49,100 to play with right now, but also BLOG, WinBUGS, ProbLog, 139 00:04:49,100 --> 00:04:51,787 Venture, many others that I haven't mentioned here. 140 00:04:51,787 --> 00:04:53,370 So first of all, there's many of them. 141 00:04:53,370 --> 00:04:55,250 And also, DARPA has started taking interest 142 00:04:55,250 --> 00:04:59,080 and has given a large grant to advance this field. 143 00:04:59,080 --> 00:05:00,602 They think it might be big. 144 00:05:00,602 --> 00:05:02,810 If you're in, sort of, probabilistic programming more 145 00:05:02,810 --> 00:05:05,309 generally than Church, you think it's interesting to follow, 146 00:05:05,309 --> 00:05:07,940 you want to learn more about it, you should go to this thing, 147 00:05:07,940 --> 00:05:11,222 probabilistic-programming.org wiki. 148 00:05:11,222 --> 00:05:12,680 It sort of keeps it up-to-date with 149 00:05:12,680 --> 00:05:15,267 many, many, many, different types of programming languages. 
150 00:05:15,267 --> 00:05:17,600 You don't necessarily have to write this down right now. 151 00:05:17,600 --> 00:05:18,980 I will send you the slides later, 152 00:05:18,980 --> 00:05:21,110 but just, sort of, keep it in mind, form a 153 00:05:21,110 --> 00:05:23,400 link to it in your head. 154 00:05:23,400 --> 00:05:25,860 There's also this, sort of, nice summary from this DARPA-- 155 00:05:25,860 --> 00:05:28,310 so DARPA started funding this about a year ago. 156 00:05:28,310 --> 00:05:30,102 And someone already went to a summer school 157 00:05:30,102 --> 00:05:31,851 on probabilistic programming and, sort of, 158 00:05:31,851 --> 00:05:33,110 wrote the state of the field. 159 00:05:33,110 --> 00:05:34,110 It's six months ago. 160 00:05:34,110 --> 00:05:36,814 It's a bit outdated, but it also makes 161 00:05:36,814 --> 00:05:38,480 for an interesting read for those of you 162 00:05:38,480 --> 00:05:40,134 who want to follow that. 163 00:05:40,134 --> 00:05:42,050 OK, so that's about probabilistic programming, 164 00:05:42,050 --> 00:05:43,800 very, very, very generally. 165 00:05:43,800 --> 00:05:45,920 What about Church very, very, very generally? 166 00:05:45,920 --> 00:05:48,470 So as I said, Church is one example 167 00:05:48,470 --> 00:05:50,144 of a probabilistic programming language. 168 00:05:50,144 --> 00:05:51,560 It was developed by several people 169 00:05:51,560 --> 00:05:53,840 at MIT who have since gone on to do 170 00:05:53,840 --> 00:05:55,610 other different things like continue 171 00:05:55,610 --> 00:05:58,040 to develop Church at Stanford. 172 00:05:58,040 --> 00:05:59,950 That's Professor Noah Goodman. 173 00:05:59,950 --> 00:06:02,640 Although, of course, he's doing many, many other things. 174 00:06:02,640 --> 00:06:04,460 There's also been Vikash Mansinghka, 175 00:06:04,460 --> 00:06:06,980 who has gone on to develop other probabilistic programming 176 00:06:06,980 --> 00:06:11,330 languages like Venture at MIT.
177 00:06:11,330 --> 00:06:14,450 And one thing to say generally about probabilistic programming 178 00:06:14,450 --> 00:06:16,430 languages is that, usually they are 179 00:06:16,430 --> 00:06:18,350 based on an already existing language. 180 00:06:18,350 --> 00:06:21,350 So you take MATLAB and you try to make it probabilistic. 181 00:06:21,350 --> 00:06:24,920 You take Python and you try to make it probabilistic. 182 00:06:24,920 --> 00:06:27,580 Julia has a probabilistic programming implementation. 183 00:06:27,580 --> 00:06:30,050 Church in particular is based on Scheme, 184 00:06:30,050 --> 00:06:32,510 which is the derivative of LISP, which is itself 185 00:06:32,510 --> 00:06:35,452 sort of an attempt to capture lambda calculus, which is not 186 00:06:35,452 --> 00:06:37,160 a programming language, it is an approach 187 00:06:37,160 --> 00:06:41,750 to trying to think about all possible functions developed 188 00:06:41,750 --> 00:06:43,400 by Alonzo Church. 189 00:06:43,400 --> 00:06:45,320 And that's why Church is called Church. 190 00:06:45,320 --> 00:06:48,930 It has nothing to do with the actual buildings. 191 00:06:48,930 --> 00:06:51,830 So the point about Scheme which is very nice 192 00:06:51,830 --> 00:06:53,240 is that it's very compositional. 193 00:06:53,240 --> 00:06:54,740 And anything that you write can then 194 00:06:54,740 --> 00:06:57,230 be passed off into the other functions as the data. 195 00:06:57,230 --> 00:06:59,390 You'll see some examples of that. 196 00:06:59,390 --> 00:07:01,001 Church has several inference engines 197 00:07:01,001 --> 00:07:02,000 that you can try to run. 198 00:07:02,000 --> 00:07:03,260 We'll get into that. 199 00:07:03,260 --> 00:07:06,020 The backbone of it is Metropolis-Hastings-type 200 00:07:06,020 --> 00:07:08,390 sampling over possible programs, but it 201 00:07:08,390 --> 00:07:11,815 has other types of programming, including explicit enumeration. 
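[Editor's sketch, not part of the lecture.] The remark above about Scheme being compositional can be illustrated with a few lines of plain Scheme, which should also run in Church. The names `apply-twice` and `add-one` are made up for this illustration and do not come from the tutorial files.

```scheme
;; A function that takes another function as one of its arguments.
(define (apply-twice f x)
  (f (f x)))

;; An ordinary one-argument function.
(define (add-one n)
  (+ n 1))

;; add-one is handed to apply-twice like any other piece of data.
;; Evaluates to 7, since (add-one (add-one 5)) is 7.
(apply-twice add-one 5)
```

Because functions are ordinary values, a model written as a function can be passed along to other functions in the same way.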
202 00:07:11,815 --> 00:07:13,190 If your space is small enough, it 203 00:07:13,190 --> 00:07:16,190 can just look at all the possible ways to run a program. 204 00:07:16,190 --> 00:07:17,500 It has rejection query. 205 00:07:17,500 --> 00:07:18,547 Again, we'll get to this. 206 00:07:18,547 --> 00:07:20,630 Don't worry about, like, what is he talking about. 207 00:07:23,427 --> 00:07:25,010 Yeah, so it has a whole bunch of-- you 208 00:07:25,010 --> 00:07:27,300 know, particle filtering is one attempt at that. 209 00:07:27,300 --> 00:07:29,270 But the point is there are-- 210 00:07:29,270 --> 00:07:30,980 each probabilistic programming language 211 00:07:30,980 --> 00:07:33,770 has its own set of inference engines. 212 00:07:33,770 --> 00:07:36,000 Some of them try to go the Metropolis-Hastings route. 213 00:07:36,000 --> 00:07:37,760 Some of them try to say, well, it's 214 00:07:37,760 --> 00:07:39,301 a probabilistic programming language, 215 00:07:39,301 --> 00:07:41,414 but it's really limited to causal Bayes nets, 216 00:07:41,414 --> 00:07:42,830 so the inference engines are going 217 00:07:42,830 --> 00:07:45,080 to be stuff that's good for causal Bayes nets. 218 00:07:45,080 --> 00:07:49,530 But all of them sort of share this dream of, 219 00:07:49,530 --> 00:07:53,870 it's easier to write the forward model than the inference. 220 00:07:53,870 --> 00:07:54,950 And it's really annoying. 221 00:07:54,950 --> 00:07:57,650 Those of you who have ever tried to write an inference engine 222 00:07:57,650 --> 00:07:59,942 or to write inference over any sort of model, 223 00:07:59,942 --> 00:08:01,400 it's really annoying to write that. 224 00:08:01,400 --> 00:08:03,290 And it usually sort of only works for the one thing 225 00:08:03,290 --> 00:08:05,180 that you've built.
And one of the selling 226 00:08:05,180 --> 00:08:07,100 points of probabilistic programming languages, 227 00:08:07,100 --> 00:08:08,630 one of the reasons that DARPA took 228 00:08:08,630 --> 00:08:11,150 an interest, beyond the fact that they can try to capture 229 00:08:11,150 --> 00:08:14,240 the human mind, and flexible AI, and all that, is they 230 00:08:14,240 --> 00:08:16,760 have this sort of promise, this pitch that, why don't you 231 00:08:16,760 --> 00:08:18,830 just write down the forward model, 232 00:08:18,830 --> 00:08:21,530 how you think the world works, and we'll, kind of, take 233 00:08:21,530 --> 00:08:24,350 care of inference for you. 234 00:08:24,350 --> 00:08:26,090 And in many cases, it turns out to be 235 00:08:26,090 --> 00:08:28,549 a lot easier to write the forward model 236 00:08:28,549 --> 00:08:30,590 than to try to write the inference engine for it. 237 00:08:30,590 --> 00:08:32,659 In fact, you can very quickly get 238 00:08:32,659 --> 00:08:36,500 to something that's even, like, five or six lines of code long, 239 00:08:36,500 --> 00:08:39,854 that would be intractable, would be very hard to write down 240 00:08:39,854 --> 00:08:41,270 the analytic expression for, would 241 00:08:41,270 --> 00:08:43,070 be very hard to think about what would 242 00:08:43,070 --> 00:08:46,010 be the inference engine for, but it's really just easy to write. 243 00:08:46,010 --> 00:08:47,305 I mean, all you have is a set of assumptions. 244 00:08:47,305 --> 00:08:49,490 And you're trying to figure out how they work together. 245 00:08:49,490 --> 00:08:51,180 Again, we'll see some examples of that. 246 00:08:51,180 --> 00:08:53,380 But my point was all probabilistic programming 247 00:08:53,380 --> 00:08:55,880 languages are about writing the forward model and then, sort 248 00:08:55,880 --> 00:08:58,942 of, trying to do the inference for you. 
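[Editor's sketch, not part of the lecture.] To make the "write the forward model and let the language do the inference" pitch concrete, here is a minimal Church-style example. It assumes the `flip`, `repeat`, `hist`, and `rejection-query` forms as documented on probmods.org; the model itself (a coin that is either fair or biased toward heads with weight 0.9) and the names `sample-hypothesis`, `fair?`, `weight`, `toss`, and `tosses` are invented here for illustration. The exact way the condition is written has varied between Church implementations, so treat the `(condition ...)` line as an assumption.

```scheme
;; Draw one posterior sample of "is the coin fair?" given five observed heads.
(define (sample-hypothesis)
  (rejection-query
   ;; Forward model: how we think the data came about.
   (define fair? (flip 0.5))                ; prior: fair or biased, equally likely
   (define weight (if fair? 0.5 0.9))       ; hypothetical heads probabilities
   (define (toss) (flip weight))
   (define tosses (repeat 5 toss))
   ;; The quantity we want to know about.
   fair?
   ;; The observation we condition on (assumed syntax).
   (condition (equal? tosses (list #t #t #t #t #t)))))

;; Repeat the query to approximate the posterior over fair?.
(hist (repeat 500 sample-hypothesis) "Is the coin fair?")
```

Nothing in the model says how to invert it. The forward description is only a handful of lines, and the query operator is what turns it into an inference.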
249 00:08:58,942 --> 00:09:00,650 Another point about Church in particular, 250 00:09:00,650 --> 00:09:02,675 it is under construction, so you'll notice 251 00:09:02,675 --> 00:09:03,800 this when you write it now. 252 00:09:03,800 --> 00:09:04,580 It will break. 253 00:09:04,580 --> 00:09:05,430 It will freeze. 254 00:09:05,430 --> 00:09:08,810 It will do all sorts of annoying things, 255 00:09:08,810 --> 00:09:10,560 so it is under construction. 256 00:09:10,560 --> 00:09:12,976 It's not exactly something that you would then go and work 257 00:09:12,976 --> 00:09:14,330 with like MATLAB. 258 00:09:14,330 --> 00:09:17,570 Let me put some caveats on that caveat, which is these two 259 00:09:17,570 --> 00:09:18,680 asterisks right here. 260 00:09:18,680 --> 00:09:21,200 First of all, despite being a, sort of, a toy language, 261 00:09:21,200 --> 00:09:23,660 it's already been used in several serious scientific 262 00:09:23,660 --> 00:09:25,760 papers, including a paper in Science, 263 00:09:25,760 --> 00:09:29,370 because it is very easy to make certain points about cognition 264 00:09:29,370 --> 00:09:31,790 or about computational cognition in Church that 265 00:09:31,790 --> 00:09:34,400 is very hard to do in certain other languages. 266 00:09:34,400 --> 00:09:37,370 In particular, things that require recursion, or inference 267 00:09:37,370 --> 00:09:39,675 over inference, where you write down sort of the way 268 00:09:39,675 --> 00:09:41,300 that you think about an agent, then you 269 00:09:41,300 --> 00:09:43,130 put that into another agent, that 270 00:09:43,130 --> 00:09:45,260 can be very hard to write in certain languages. 271 00:09:45,260 --> 00:09:49,110 Church can kind of do that more easily. 272 00:09:49,110 --> 00:09:52,337 Let's see, I had another caveat, which is-- 273 00:09:52,337 --> 00:09:52,920 what was that? 
274 00:09:52,920 --> 00:09:54,680 Oh, another caveat is that, despite it 275 00:09:54,680 --> 00:09:56,600 being under construction, you sort of think, well, why should 276 00:09:56,600 --> 00:09:57,641 I worry about this thing? 277 00:09:57,641 --> 00:09:59,770 Why should I even bother hacking with it? 278 00:09:59,770 --> 00:10:03,470 Is because, you'll notice there's probmods.org. 279 00:10:03,470 --> 00:10:05,530 And there are just a ton, a ton of examples. 280 00:10:05,530 --> 00:10:08,060 There's a semester worth of examples 281 00:10:08,060 --> 00:10:11,380 of all sorts of things from both cognition, and AI, 282 00:10:11,380 --> 00:10:13,190 and interesting statistical models 283 00:10:13,190 --> 00:10:16,449 that are very easy to understand in Church. 284 00:10:16,449 --> 00:10:17,990 And for me at least, it was very much 285 00:10:17,990 --> 00:10:20,990 a process of demystification that something like this 286 00:10:20,990 --> 00:10:21,650 can help with. 287 00:10:21,650 --> 00:10:23,858 You learn about something like the Chinese restaurant 288 00:10:23,858 --> 00:10:25,889 process, the Dirichlet process, nonparametrics, 289 00:10:25,889 --> 00:10:28,430 and it's kind of hard to read the textbook description of it. 290 00:10:28,430 --> 00:10:29,670 It's hard to wrap your head around. 291 00:10:29,670 --> 00:10:31,430 And then you go and you write three lines of code, 292 00:10:31,430 --> 00:10:32,346 or five lines of code. 293 00:10:32,346 --> 00:10:35,187 And you think, oh, that wasn't so bad, right? 294 00:10:35,187 --> 00:10:36,770 And it's sort of easy to write a bunch 295 00:10:36,770 --> 00:10:39,050 of these things in Church, so it's a useful tool 296 00:10:39,050 --> 00:10:40,640 for demystification. 297 00:10:40,640 --> 00:10:42,140 It's a useful tool to get a handle 298 00:10:42,140 --> 00:10:44,730 on certain models in cognition and statistics, 299 00:10:44,730 --> 00:10:46,370 so those are the two asterisks. 
300 00:10:46,370 --> 00:10:51,592 Be warned, but also, you know, do play around with it. 301 00:10:51,592 --> 00:10:53,550 Let's see, the founding paper, for those of you 302 00:10:53,550 --> 00:10:55,841 who are interested, you can look at this link later on. 303 00:10:55,841 --> 00:10:58,260 It was by Goodman, Mansinghka, Dan Roy, Bonawitz, 304 00:10:58,260 --> 00:10:59,240 and Tenenbaum. 305 00:10:59,240 --> 00:11:01,220 And for those of you who, by the way, 306 00:11:01,220 --> 00:11:03,290 have already read about Church a bit, 307 00:11:03,290 --> 00:11:05,840 you think that this tutorial is a bit-- maybe it was-- 308 00:11:05,840 --> 00:11:08,720 I should say, we'll start off very, very easy, OK? 309 00:11:08,720 --> 00:11:09,980 We'll do things like addition. 310 00:11:09,980 --> 00:11:11,660 We'll do things like flipping coins, OK? 311 00:11:11,660 --> 00:11:12,799 If you think that this is-- 312 00:11:12,799 --> 00:11:14,590 maybe you've already read through probmods, 313 00:11:14,590 --> 00:11:17,060 you've already done a few chapters of that, by all means, 314 00:11:17,060 --> 00:11:19,130 use this time to continue to think 315 00:11:19,130 --> 00:11:21,230 about probabilistic programming, for example, 316 00:11:21,230 --> 00:11:22,610 either by talking to me, and I'll 317 00:11:22,610 --> 00:11:26,930 find something for you, or by going to forestdb.org. 318 00:11:26,930 --> 00:11:29,540 Again, I'll give you that link for those of you who want it. 319 00:11:29,540 --> 00:11:31,790 It has a whole repository of different probabilistic 320 00:11:31,790 --> 00:11:34,610 programming models that you can play with, think about, see 321 00:11:34,610 --> 00:11:36,690 how you would change them, and things like that. 322 00:11:36,690 --> 00:11:39,090 Also after this tutorial, if you're still interested, 323 00:11:39,090 --> 00:11:41,330 you can go to that link. 324 00:11:41,330 --> 00:11:43,170 Oh and one last thing. 
325 00:11:43,170 --> 00:11:45,890 There's sort of a-- you can't see that right there. 326 00:11:45,890 --> 00:11:48,820 One last thing that I should say about Church, 327 00:11:48,820 --> 00:11:49,820 it's based on Scheme. 328 00:11:49,820 --> 00:11:51,736 But a lot of the people that have sort of been 329 00:11:51,736 --> 00:11:53,420 doing a lot of work on it have become 330 00:11:53,420 --> 00:11:55,682 more in love with JavaScript. 331 00:11:55,682 --> 00:11:57,890 In fact, the thing that you're going to be working on 332 00:11:57,890 --> 00:11:59,750 is sort of a JavaScript implementation 333 00:11:59,750 --> 00:12:01,240 of Church under the hood. 334 00:12:01,240 --> 00:12:04,730 And they've started to implement something called WebPPL, so 335 00:12:04,730 --> 00:12:07,040 Web Probabilistic Programming Language. 336 00:12:07,040 --> 00:12:08,570 It's a language that's specifically 337 00:12:08,570 --> 00:12:09,699 a derivative of JavaScript. 338 00:12:09,699 --> 00:12:11,240 For those of you who like JavaScript, 339 00:12:11,240 --> 00:12:12,540 you can play with that. 340 00:12:12,540 --> 00:12:15,112 And if you go to WebPPL.org, if you search for WebPPL, 341 00:12:15,112 --> 00:12:16,820 again, I can leave you the link for that. 342 00:12:16,820 --> 00:12:18,950 It's sort of here, but you can't see it. 343 00:12:18,950 --> 00:12:20,870 There are, again, a lot of nice examples there 344 00:12:20,870 --> 00:12:22,730 of different programming language-- programs 345 00:12:22,730 --> 00:12:24,920 that you can write in JavaScript. 346 00:12:24,920 --> 00:12:29,420 OK, that was a very long-winded introduction, 347 00:12:29,420 --> 00:12:31,807 caveats, and setting up different things. 
348 00:12:31,807 --> 00:12:33,890 The objectives for this tutorial is, first of all, 349 00:12:33,890 --> 00:12:35,780 to become familiar with the Church syntax, 350 00:12:35,780 --> 00:12:38,720 it can be a little wonky, if you don't know it, at first, to run 351 00:12:38,720 --> 00:12:40,910 forward a few models to give you an example of just, 352 00:12:40,910 --> 00:12:44,300 before inference, an example of, here's my forward model, 353 00:12:44,300 --> 00:12:47,032 here's how I describe the world, now let's try sampling from it. 354 00:12:47,032 --> 00:12:48,740 Let's sample, sample again, sample again, 355 00:12:48,740 --> 00:12:51,665 sample again, see what distributions we get. 356 00:12:51,665 --> 00:12:53,767 Get a sense for the point that I'm 357 00:12:53,767 --> 00:12:55,850 going to make a few times, which is once you write 358 00:12:55,850 --> 00:12:58,250 your forward model, that is a representation 359 00:12:58,250 --> 00:12:59,935 of a distribution-- 360 00:12:59,935 --> 00:13:01,310 and I'll come back to this point, 361 00:13:01,310 --> 00:13:02,851 but just, sort of, keep that in mind. 362 00:13:02,851 --> 00:13:03,950 You write down a program. 363 00:13:03,950 --> 00:13:04,908 And you run it forward. 364 00:13:04,908 --> 00:13:05,870 And you get a sample. 365 00:13:05,870 --> 00:13:07,870 You run it again and you get a different sample. 366 00:13:07,870 --> 00:13:12,200 You run it in the limit, you get some distribution. 367 00:13:12,200 --> 00:13:15,110 Some other constructs like memoization-- 368 00:13:15,110 --> 00:13:16,790 after we do all of this, we'll try 369 00:13:16,790 --> 00:13:20,316 to get at sampling, and the query operator, and really, 370 00:13:20,316 --> 00:13:21,440 conditioning and inference. 371 00:13:21,440 --> 00:13:23,750 So we said we'll try to run a few models forward. 372 00:13:23,750 --> 00:13:26,910 Once we do that, we'll try to get the hang of inference. 
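[Editor's sketch, not part of the lecture.] The point that a forward model represents a distribution can be tried directly. This assumes the Church built-ins `flip`, `repeat`, `hist`, and `mem` as described on probmods.org; the helper names `bool->number`, `heads-in-three`, and `eye-color` are invented for this illustration.

```scheme
;; Turn a coin flip into 0 or 1 so that flips can be added up.
(define (bool->number b) (if b 1 0))

;; Forward model: number of heads in three fair coin flips.
(define (heads-in-three)
  (+ (bool->number (flip 0.5))
     (bool->number (flip 0.5))
     (bool->number (flip 0.5))))

;; Running the model once gives one sample.
(display (heads-in-three))

;; Running it many times approximates the distribution it represents.
(hist (repeat 1000 heads-in-three) "heads in three flips")

;; Memoization: mem makes a random function return the same value
;; every time it is called with the same argument.
(define eye-color
  (mem (lambda (person) (if (flip 0.5) 'blue 'brown))))

;; Both entries of this list are the same color for the same person.
(list (eye-color 'bob) (eye-color 'bob))
```

Each press of run draws fresh samples, so the histogram changes slightly from run to run while keeping the same overall shape.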
373 00:13:26,910 --> 00:13:29,840 So you'll try to write down a forward model about things 374 00:13:29,840 --> 00:13:32,639 like a coin, or goal inference, or things like that. 375 00:13:32,639 --> 00:13:34,430 And you'll try to actually infer something, 376 00:13:34,430 --> 00:13:35,930 like what is the weight of the coin, 377 00:13:35,930 --> 00:13:41,510 from some data, like some coin flips, some very simple stuff. 378 00:13:41,510 --> 00:13:44,180 OK, and we'll go through some examples, like, as I said, 379 00:13:44,180 --> 00:13:46,520 coin flipping, maybe causal networks, maybe intuitive 380 00:13:46,520 --> 00:13:48,320 physics and intuitive psychology. 381 00:13:48,320 --> 00:13:50,150 I do hope to get to intuitive psychology. 382 00:13:50,150 --> 00:13:52,300 We'll see if we get to that. 383 00:13:52,300 --> 00:13:53,810 So some prerequisites and set up, 384 00:13:53,810 --> 00:13:55,770 that's what I asked you to do at the beginning. 385 00:13:55,770 --> 00:13:58,340 If you happen to have a local implementation, 386 00:13:58,340 --> 00:13:59,690 you can open that now. 387 00:13:59,690 --> 00:14:05,720 If you didn't, just go to probmods.org/play-space.html 388 00:14:05,720 --> 00:14:08,120 and open that up. 389 00:14:08,120 --> 00:14:11,510 And we're going to play a game of Noisy Tomer Says. 390 00:14:11,510 --> 00:14:13,280 So now you should also-- 391 00:14:13,280 --> 00:14:15,430 open this, open a browser, go to that, 392 00:14:15,430 --> 00:14:17,270 or open your local implementation. 393 00:14:17,270 --> 00:14:23,144 Also open up the file that I sent you of-- 394 00:14:23,144 --> 00:14:25,310 it should have been called, like, student copy, 395 00:14:25,310 --> 00:14:26,630 something like that. 396 00:14:26,630 --> 00:14:28,713 It contains a bunch of things that we're basically 397 00:14:28,713 --> 00:14:31,887 going to just sort of copy, paste into the browser.
398 00:14:31,887 --> 00:14:33,470 Now, the nice thing about this browser 399 00:14:33,470 --> 00:14:35,270 is, it is sort of a working implementation of Church. 400 00:14:35,270 --> 00:14:36,395 You just paste in the code. 401 00:14:36,395 --> 00:14:37,130 You hit run. 402 00:14:37,130 --> 00:14:38,906 It runs, OK? 403 00:14:38,906 --> 00:14:41,405 So you guys should all more or less have a screen like this. 404 00:14:44,095 --> 00:14:46,282 I'll take this out so I don't sit on it right now. 405 00:14:46,282 --> 00:14:47,990 Does everyone have more or less something 406 00:14:47,990 --> 00:14:49,823 like this, some sort of browser that you can 407 00:14:49,823 --> 00:14:51,920 type things into and press run? 408 00:14:51,920 --> 00:14:52,980 Over there? 409 00:14:52,980 --> 00:14:55,730 OK, we'll start off with some very, very simple stuff 410 00:14:55,730 --> 00:15:00,320 that you should already have in the syntax of the Church 411 00:15:00,320 --> 00:15:02,960 tutorial, so just try either pasting in or typing 412 00:15:02,960 --> 00:15:06,300 in things like this thing. 413 00:15:06,300 --> 00:15:09,080 So the first thing you'll notice is that, over here, it's 414 00:15:09,080 --> 00:15:11,040 what's called-- 415 00:15:11,040 --> 00:15:13,810 sorry, let me adjust this screen so it's not actually-- 416 00:15:13,810 --> 00:15:15,110 so that you can see it. 417 00:15:18,000 --> 00:15:20,320 Zone C over here, you should be looking-- 418 00:15:20,320 --> 00:15:26,977 I've sort of done over here, plus 2 2, and the result is 4. 419 00:15:26,977 --> 00:15:28,560 So the first thing to see, some of you 420 00:15:28,560 --> 00:15:31,185 may be familiar with this, who's familiar with Polish notation, 421 00:15:31,185 --> 00:15:34,020 where you just go plus 2 2? 422 00:15:34,020 --> 00:15:35,630 Instead of going 2 plus-- 423 00:15:35,630 --> 00:15:38,670 who is not familiar with Polish notation? 424 00:15:38,670 --> 00:15:40,710 OK, good, thank you.
425 00:15:40,710 --> 00:15:43,650 Polish notation just means that, instead of writing 2 plus 2, 426 00:15:43,650 --> 00:15:46,710 you write plus 2 2, so you write the thing that operates, 427 00:15:46,710 --> 00:15:49,620 the function, outside, and you write all the arguments 428 00:15:49,620 --> 00:15:51,296 for the function like that. 429 00:15:51,296 --> 00:15:52,920 In fact, most of the time, you do this. 430 00:15:52,920 --> 00:15:54,390 When you write down functions for code, 431 00:15:54,390 --> 00:15:56,430 you usually write the function then the things 432 00:15:56,430 --> 00:15:57,327 that it operates on. 433 00:15:57,327 --> 00:15:59,160 But here, it's going to work for everything. 434 00:15:59,160 --> 00:16:00,743 And it can be a bit confusing at first 435 00:16:00,743 --> 00:16:03,252 when you do things like plus 2 2. 436 00:16:03,252 --> 00:16:05,460 The second thing is that you put brackets on anything 437 00:16:05,460 --> 00:16:07,346 that you want to evaluate, OK? 438 00:16:07,346 --> 00:16:08,970 So, for example, here is an expression. 439 00:16:08,970 --> 00:16:10,740 The expression is plus 2 2. 440 00:16:10,740 --> 00:16:13,350 And you want to evaluate that expression. 441 00:16:13,350 --> 00:16:17,550 So for example, I wanted to evaluate the expression-- 442 00:16:17,550 --> 00:16:19,737 I think I put some, like, cursor for-- 443 00:16:19,737 --> 00:16:21,570 so you can see what I'm doing with my thing. 444 00:16:21,570 --> 00:16:24,960 OK, if you want to do something like, you know, times 2 2, 445 00:16:24,960 --> 00:16:26,230 that would be the same thing. 446 00:16:26,230 --> 00:16:30,520 And I would go to run. 447 00:16:30,520 --> 00:16:32,595 And that would be, of course, 4 again. 448 00:16:32,595 --> 00:16:34,860 It lets you do some other examples from here. 449 00:16:34,860 --> 00:16:36,660 Like there's a bunch of simple logic, 450 00:16:36,660 --> 00:16:38,640 like you might do display.
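For readers following along without the slides, the two expressions discussed above can be sketched as follows. This is a minimal illustration in Church (Scheme syntax), assuming a Church or Scheme interpreter such as the browser playground mentioned earlier; the values in the comments are the ones stated in the lecture.

```scheme
;; Prefix (Polish) notation: the operator comes first,
;; and the parentheses ask the interpreter to evaluate the expression.
(+ 2 2)   ; 2 plus 2, evaluates to 4
(* 2 2)   ; 2 times 2, also 4
```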
451 00:16:38,640 --> 00:16:40,560 Display is just a way to run it, to-- sorry, 452 00:16:40,560 --> 00:16:43,200 to display the result over here. 453 00:16:43,200 --> 00:16:46,030 You can do a bunch of logic things, like equal. 454 00:16:46,030 --> 00:16:47,670 So again, the operator is outside. 455 00:16:47,670 --> 00:16:49,914 And you would do equal question mark 2 2, 456 00:16:49,914 --> 00:16:51,330 and then evaluate that expression. 457 00:16:51,330 --> 00:16:53,040 And you can do bigger than equals, 458 00:16:53,040 --> 00:16:54,165 all these different things. 459 00:16:54,165 --> 00:16:55,080 AUDIENCE: the question mark? 460 00:16:55,080 --> 00:16:56,130 TOMER ULLMAN: The question mark is just-- 461 00:16:56,130 --> 00:16:57,005 I've just named it that way. 462 00:16:57,005 --> 00:16:58,463 It doesn't actually have any sense. 463 00:16:58,463 --> 00:17:02,754 I could have just called it equal-- sorry, no, sorry. 464 00:17:02,754 --> 00:17:04,920 There is no particular meaning to the question mark. 465 00:17:04,920 --> 00:17:06,690 It's just that this thing, this operator, 466 00:17:06,690 --> 00:17:08,432 is called equal question mark. 467 00:17:08,432 --> 00:17:09,390 That's the name for it. 468 00:17:09,390 --> 00:17:11,069 And it's just-- it is the equals operator. 469 00:17:11,069 --> 00:17:12,652 That's how you check if two things are 470 00:17:12,652 --> 00:17:13,661 equal to one another. 471 00:17:13,661 --> 00:17:15,119 In languages like Python, you would 472 00:17:15,119 --> 00:17:18,119 do, you know, equals equals, like that. 473 00:17:18,119 --> 00:17:22,170 This is how you do it here, OK? 474 00:17:22,170 --> 00:17:24,270 Let's see, a few other simple syntax things. 475 00:17:24,270 --> 00:17:26,880 So you might say, for example, the statement 476 00:17:26,880 --> 00:17:30,060 for defining variables is, shockingly enough, 477 00:17:30,060 --> 00:17:33,720 define, so you would do define x 3. 
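A small sketch of the comparison operators and of define, as described above. The particular numbers compared are illustrative choices, not taken from the student file.

```scheme
;; equal? is just the name of the equality operator; the ? has no special meaning.
(display (equal? 2 2))   ; shows a true value
(display (>= 3 2))       ; "bigger than or equals", also true

;; define binds a name to a value.
(define x 3)
x                        ; evaluates to 3
```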
478 00:17:33,720 --> 00:17:37,220 And now, the next time that I do x, then hopefully-- 479 00:17:37,220 --> 00:17:40,020 and I run that-- then it'll show 3. 480 00:17:40,020 --> 00:17:42,300 There are a few other basic syntax things, 481 00:17:42,300 --> 00:17:45,030 like lists, that might be important, like, you know, 482 00:17:45,030 --> 00:17:48,300 define x to be a list of 1 2 3. 483 00:17:48,300 --> 00:17:51,150 And if you run that, then you'll get 1 2 3. 484 00:17:51,150 --> 00:17:53,490 Again, we're starting out very, very slow, 485 00:17:53,490 --> 00:17:56,070 but we'll hopefully get soon to more things 486 00:17:56,070 --> 00:17:58,510 like Gaussian processes. 487 00:17:58,510 --> 00:18:01,740 Some simple things like if-then statements-- 488 00:18:01,740 --> 00:18:04,680 OK, I'm just copying and pasting off of this document 489 00:18:04,680 --> 00:18:06,330 that you should all have, so that's 490 00:18:06,330 --> 00:18:07,871 why I'm, sort of, running through it. 491 00:18:07,871 --> 00:18:09,900 But the point is that you would do-- the syntax 492 00:18:09,900 --> 00:18:17,280 for doing an if-then conditional statement is like this. 493 00:18:17,280 --> 00:18:21,940 You write down if, and then you write down the condition 494 00:18:21,940 --> 00:18:24,935 that either evaluates to true or to false. 495 00:18:24,935 --> 00:18:29,910 So it's if this condition, do the first thing. 496 00:18:29,910 --> 00:18:33,206 If it's false, do the second thing. 497 00:18:33,206 --> 00:18:34,830 In this particular case, I have defined 498 00:18:34,830 --> 00:18:36,330 a variable called socrates. 499 00:18:36,330 --> 00:18:38,400 I've defined it as drunk. 500 00:18:38,400 --> 00:18:43,110 And then I run the condition equal socrates drunk, 501 00:18:43,110 --> 00:18:45,462 if that's true, then return the answer true. 502 00:18:45,462 --> 00:18:47,670 Or, you know, I could have written return the answer, 503 00:18:47,670 --> 00:18:48,600 Socrates is a drunk. 
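The list and the conditional being pasted in here probably look roughly like the sketch below. It assumes that "drunk" is stored as a quoted symbol and that the literals true and false are available (as they are in Church; plain Scheme writes #t and #f); the exact code in the student file is not shown in the transcript.

```scheme
;; A list of three numbers.
(define x (list 1 2 3))          ; x is the list (1 2 3)

;; (if condition then-branch else-branch)
(define socrates 'drunk)         ; assumption: a quoted symbol
(if (equal? socrates 'drunk)
    true                         ; returned when the condition holds
    false)                       ; returned otherwise
```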
504 00:18:48,600 --> 00:18:50,730 If it's false, return the answer false. 505 00:18:50,730 --> 00:18:53,430 Did everyone more or less get the conditional? 506 00:18:53,430 --> 00:18:56,160 It just says, if condition, return the first thing 507 00:18:56,160 --> 00:18:58,025 otherwise, the thing on the second line. 508 00:18:58,025 --> 00:18:59,400 Another important thing before we 509 00:18:59,400 --> 00:19:02,467 start getting at more things like recursion 510 00:19:02,467 --> 00:19:04,050 and forward sampling is the notion of, 511 00:19:04,050 --> 00:19:06,300 how would I define a function? 512 00:19:06,300 --> 00:19:08,160 So, so far we've defined variables, right? 513 00:19:08,160 --> 00:19:12,124 I could have defined something like define x 2, right? 514 00:19:12,124 --> 00:19:13,790 And then that would have just been that. 515 00:19:13,790 --> 00:19:15,998 But I want to define, probably, functions, so I might 516 00:19:15,998 --> 00:19:17,860 define something like define-- 517 00:19:17,860 --> 00:19:19,050 and now I have two options. 518 00:19:19,050 --> 00:19:23,112 There are two ways of defining functions in Church. 519 00:19:23,112 --> 00:19:24,570 One of them is to do the following. 520 00:19:24,570 --> 00:19:26,860 You define square. 521 00:19:26,860 --> 00:19:31,230 And then you say, well, square is, itself, a procedure. 522 00:19:31,230 --> 00:19:31,982 It is a lambda. 523 00:19:31,982 --> 00:19:33,690 And I'll explain this as I go along, just 524 00:19:33,690 --> 00:19:35,190 watch me, sort of, type it. 525 00:19:35,190 --> 00:19:38,060 It takes in a particular argument, say, x. 526 00:19:38,060 --> 00:19:41,794 And then what it does to, is it multiplies x by x. 527 00:19:41,794 --> 00:19:46,180 So the point is, you say, well, here, x is a particular thing. 528 00:19:46,180 --> 00:19:46,920 It is an object. 529 00:19:46,920 --> 00:19:47,419 What is it? 530 00:19:47,419 --> 00:19:48,460 It is just 2. 
531 00:19:48,460 --> 00:19:50,940 Here, square is a thing. 532 00:19:50,940 --> 00:19:52,290 What sort of thing is it? 533 00:19:52,290 --> 00:19:53,830 It is this thing. 534 00:19:53,830 --> 00:19:55,770 Ah, what is this thing? 535 00:19:55,770 --> 00:19:59,940 This thing is a procedure that-- this is the only thing that you 536 00:19:59,940 --> 00:20:01,500 need to know about functions. 537 00:20:01,500 --> 00:20:04,290 Lambda is the thing that actually defines functions, OK? 538 00:20:04,290 --> 00:20:06,870 It is a procedure that takes in some number of arguments, 539 00:20:06,870 --> 00:20:08,310 in this case, just one argument. 540 00:20:08,310 --> 00:20:09,270 You could have called it anything. 541 00:20:09,270 --> 00:20:09,990 I just called it x. 542 00:20:09,990 --> 00:20:11,430 You could have called it argument1. 543 00:20:11,430 --> 00:20:12,540 You could have called it socrates. 544 00:20:12,540 --> 00:20:14,010 You could have called it fubar. 545 00:20:14,010 --> 00:20:15,540 But the point is, it takes in this argument. 546 00:20:15,540 --> 00:20:17,760 And then what does it do with it is the next thing? 547 00:20:17,760 --> 00:20:20,327 So you say, lambda, number of arguments that you take in. 548 00:20:20,327 --> 00:20:21,660 And then what do you do with it? 549 00:20:21,660 --> 00:20:24,320 In this case, you just do times x x. 550 00:20:24,320 --> 00:20:27,240 So this is a function called square, very basic stuff. 551 00:20:27,240 --> 00:20:30,250 It takes in an argument and it multiplies it by itself, 552 00:20:30,250 --> 00:20:34,986 so it is the square of x, x times x, very simple. 553 00:20:34,986 --> 00:20:37,110 There's another way of doing that if you don't want 554 00:20:37,110 --> 00:20:38,880 to type out lambdas, if you don't want 555 00:20:38,880 --> 00:20:41,100 to start doing lambda this, lambda that, 556 00:20:41,100 --> 00:20:43,117 it's sort of annoying. 557 00:20:43,117 --> 00:20:44,700 Let me just give you one more example. 
558 00:20:44,700 --> 00:20:46,260 Like, if I wanted something with two arguments, 559 00:20:46,260 --> 00:20:48,843 I could have done-- you know, I could have called it something 560 00:20:48,843 --> 00:20:51,570 like my-proc lambda x y. 561 00:20:51,570 --> 00:20:53,950 And now, what it does is, it multiplies xy. 562 00:20:53,950 --> 00:20:56,604 OK, this is an example of a thing. 563 00:20:56,604 --> 00:20:57,645 What sort of thing is it? 564 00:20:57,645 --> 00:20:58,650 It is a procedure. 565 00:20:58,650 --> 00:21:00,900 I know it's a procedure because it starts with lambda. 566 00:21:00,900 --> 00:21:02,535 It takes in two arguments. 567 00:21:02,535 --> 00:21:04,427 Here they're called x and y. 568 00:21:04,427 --> 00:21:05,510 What does that do with it? 569 00:21:05,510 --> 00:21:06,900 It multiplies x times y. 570 00:21:06,900 --> 00:21:08,460 Really, this is just multiplication. 571 00:21:08,460 --> 00:21:09,930 So after I define this procedure, 572 00:21:09,930 --> 00:21:12,120 I could then do, like, my-proc-- sorry, 573 00:21:12,120 --> 00:21:13,470 I should have explained that. 574 00:21:13,470 --> 00:21:15,895 Then you do my-proc, say, 2 8, or something like that. 575 00:21:15,895 --> 00:21:16,770 AUDIENCE: [INAUDIBLE] 576 00:21:16,770 --> 00:21:17,894 TOMER ULLMAN: Yeah, sorry-- 577 00:21:17,894 --> 00:21:19,845 that's a very good question. 578 00:21:19,845 --> 00:21:20,970 And it would bring back 16. 579 00:21:20,970 --> 00:21:25,590 Sorry, once I define my thing, this is an operator now. 580 00:21:25,590 --> 00:21:28,470 This is an operator that can be applied to arguments. 581 00:21:28,470 --> 00:21:33,060 And you apply it by doing that parentheses that we just saw. 
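The two lambda definitions walked through above can be written out as a short sketch. The names square and my-proc come from the lecture; the argument 3 passed to square is an illustrative choice.

```scheme
;; square is bound to a procedure of one argument.
(define square (lambda (x) (* x x)))

;; my-proc is bound to a procedure of two arguments.
(define my-proc (lambda (x y) (* x y)))

(square 3)      ; 9
(my-proc 2 8)   ; 16, as in the lecture
```

Evaluating the bare name my-proc, without parentheses and arguments, returns the procedure itself rather than a number.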
582 00:21:33,060 --> 00:21:35,381 If I just tried, by the way, like, without applying it, 583 00:21:35,381 --> 00:21:37,880 if I just tried something like this, what you would get back 584 00:21:37,880 --> 00:21:44,050 is, it would say, this is a function, because it just says, 585 00:21:44,050 --> 00:21:44,980 what is this thing? 586 00:21:44,980 --> 00:21:46,030 You try to evaluate it. 587 00:21:46,030 --> 00:21:48,310 You're not evaluating on anything, so it just returns, 588 00:21:48,310 --> 00:21:48,830 what is this thing? 589 00:21:48,830 --> 00:21:49,520 It's a function. 590 00:21:49,520 --> 00:21:53,594 It's a function that expects x y and then multiplies them. 591 00:21:53,594 --> 00:21:55,510 If you actually want to apply it on something, 592 00:21:55,510 --> 00:21:57,676 you would need to provide with some input arguments. 593 00:21:57,676 --> 00:22:02,220 So I said, let's try to define square as a lambda of x. 594 00:22:02,220 --> 00:22:05,890 That does-- it takes in an x and multiplies x by x. 595 00:22:05,890 --> 00:22:09,400 There's one more way to define a function, which, 596 00:22:09,400 --> 00:22:11,860 it sort of gets rid of this lambda type thing. 597 00:22:11,860 --> 00:22:14,152 It's exactly equivalent to the thing I just showed you, 598 00:22:14,152 --> 00:22:16,234 it just takes a bit less writing, which is to say, 599 00:22:16,234 --> 00:22:16,780 define-- 600 00:22:16,780 --> 00:22:19,090 I just misspelled square, didn't I? 601 00:22:19,090 --> 00:22:20,080 Yes. 602 00:22:20,080 --> 00:22:23,530 Define square x-- like that-- 603 00:22:23,530 --> 00:22:25,870 times x x. 604 00:22:25,870 --> 00:22:28,570 Now what this is saying, so this just 605 00:22:28,570 --> 00:22:30,940 goes straight to saying, like, before I would say, 606 00:22:30,940 --> 00:22:33,730 define this thing 2. 607 00:22:33,730 --> 00:22:36,280 OK, and then I said, define this thing, the square, 608 00:22:36,280 --> 00:22:38,830 as this procedure. 
609 00:22:38,830 --> 00:22:41,770 Here you can say, I want to directly define a procedure. 610 00:22:41,770 --> 00:22:43,750 I'm not going to bother with this lambda stuff. 611 00:22:43,750 --> 00:22:45,190 I want to directly define a function. 612 00:22:45,190 --> 00:22:46,773 I want to directly define a procedure. 613 00:22:46,773 --> 00:22:47,440 Can I do that? 614 00:22:47,440 --> 00:22:49,060 Yes, you could if you wanted to. 615 00:22:49,060 --> 00:22:51,370 You would just directly put these brackets right there. 616 00:22:51,370 --> 00:22:52,690 You would say define. 617 00:22:52,690 --> 00:22:55,480 And if the next thing is some brackets, then it says, 618 00:22:55,480 --> 00:22:57,190 OK, I'm going to define a procedure 619 00:22:57,190 --> 00:22:59,470 where the name of the procedure is square. 620 00:22:59,470 --> 00:23:02,020 And it takes in one argument, which is x. 621 00:23:02,020 --> 00:23:04,562 And what it does to it is times x x. 622 00:23:04,562 --> 00:23:06,520 And if you do it that way, then under the hood, 623 00:23:06,520 --> 00:23:08,940 what Scheme does is actually writes it out like this. 624 00:23:08,940 --> 00:23:10,600 It puts in the lambda where it expects, 625 00:23:10,600 --> 00:23:13,050 but again, this is not terribly important stuff. 626 00:23:13,050 --> 00:23:14,800 And those of you are, sort of, tuning out, 627 00:23:14,800 --> 00:23:16,150 and saying, well, fine. 628 00:23:16,150 --> 00:23:18,108 And you just wanted to learn about-- a bit more 629 00:23:18,108 --> 00:23:20,990 about how probabilistic programming works, don't worry. 630 00:23:20,990 --> 00:23:23,254 We'll get to some examples in about 10 minutes. 631 00:23:23,254 --> 00:23:25,420 Here's another very useful thing that you might want 632 00:23:25,420 --> 00:23:27,220 to do in many of your things. 633 00:23:27,220 --> 00:23:28,540 This is called the map. 
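The shorthand form of define described above can be sketched like this. It is equivalent to the lambda version; the argument 4 is an illustrative choice.

```scheme
;; Brackets directly after define: name first, then the arguments.
(define (square x) (* x x))
;; This means the same as: (define square (lambda (x) (* x x)))

(square 4)   ; 16
```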
634 00:23:28,540 --> 00:23:31,420 And the way map works is, you map a function 635 00:23:31,420 --> 00:23:32,780 to a bunch of arguments. 636 00:23:32,780 --> 00:23:34,750 So you would say-- 637 00:23:34,750 --> 00:23:36,610 map is just a high-level function 638 00:23:36,610 --> 00:23:38,470 which takes in a particular procedure. 639 00:23:38,470 --> 00:23:42,640 Then it applies it to each one of these things individually, 640 00:23:42,640 --> 00:23:43,270 OK? 641 00:23:43,270 --> 00:23:44,830 So square, in this case, as we said, 642 00:23:44,830 --> 00:23:47,270 it is a thing that takes in one argument. 643 00:23:47,270 --> 00:23:49,520 So this is now going to take square and apply it to 1. 644 00:23:49,520 --> 00:23:50,845 So then I'm going to take square and apply it to 2, 645 00:23:50,845 --> 00:23:52,121 take square and apply it to 3. 646 00:23:52,121 --> 00:23:53,620 And the result of this is just going 647 00:23:53,620 --> 00:24:01,660 to be a list of squares, 1, 4, 9, 16, 25, simple enough? 648 00:24:01,660 --> 00:24:02,830 Yes. 649 00:24:02,830 --> 00:24:04,160 But map is very useful. 650 00:24:04,160 --> 00:24:05,840 You should probably know about it. 651 00:24:05,840 --> 00:24:08,710 OK, some simple things, like recursion, OK, 652 00:24:08,710 --> 00:24:11,110 so suppose I wanted to apply square to the list 653 00:24:11,110 --> 00:24:15,730 from 1 to 100, and suppose I didn't have the range 1 to 100. 654 00:24:15,730 --> 00:24:17,560 Most languages, and Scheme actually 655 00:24:17,560 --> 00:24:20,110 does, have something called range, which gives you 656 00:24:20,110 --> 00:24:21,850 all the numbers from 1 to 100. 657 00:24:21,850 --> 00:24:22,750 Suppose I didn't. 658 00:24:22,750 --> 00:24:24,700 Suppose I want to construct all the numbers 1 to 100. 659 00:24:24,700 --> 00:24:26,449 I don't want to actually write them down-- 660 00:24:26,449 --> 00:24:29,830 1, 2, 3, 4, 5, 6, all the way up to 100.
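A sketch of the map call described above, assuming the input list on screen was the numbers 1 through 5 (which matches the result read out in the lecture):

```scheme
(define (square x) (* x x))

;; map applies square to each element of the list in turn.
(map square (list 1 2 3 4 5))   ; (1 4 9 16 25)
```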
661 00:24:29,830 --> 00:24:31,700 I can write down something that does that. 662 00:24:31,700 --> 00:24:33,580 And it uses a little bit of recursion. 663 00:24:33,580 --> 00:24:35,140 And the way it does it is this. 664 00:24:35,140 --> 00:24:37,610 This is just to get you used to recursion, 665 00:24:37,610 --> 00:24:40,960 because we'll be seeing it a little bit later. 666 00:24:40,960 --> 00:24:46,754 And this says, OK, I'm going to define something called range, 667 00:24:46,754 --> 00:24:49,170 which takes in an argument-- you should now be used to it, 668 00:24:49,170 --> 00:24:51,390 this is the same thing that we defined over here. 669 00:24:51,390 --> 00:24:53,235 We're going to call something a procedure. 670 00:24:53,235 --> 00:24:54,474 And we're going to call-- 671 00:24:54,474 --> 00:24:55,890 we're going to define a procedure. 672 00:24:55,890 --> 00:24:57,000 It's called range. 673 00:24:57,000 --> 00:24:59,470 It takes in an argument, n, one argument. 674 00:24:59,470 --> 00:25:00,460 What does it do? 675 00:25:00,460 --> 00:25:01,920 Well, it depends. 676 00:25:01,920 --> 00:25:03,160 It does a conditional. 677 00:25:03,160 --> 00:25:06,720 A conditional, it depends, let's see, is n equal to 0? 678 00:25:06,720 --> 00:25:10,190 If it's 0, just give me back an empty list. 679 00:25:10,190 --> 00:25:13,080 Does everyone sort of see that, if equal n 0, 680 00:25:13,080 --> 00:25:14,220 give me back a list. 681 00:25:14,220 --> 00:25:15,090 What if it's not 0? 682 00:25:15,090 --> 00:25:16,650 What if I did range 10? 683 00:25:16,650 --> 00:25:18,739 Oh, well, in that case, append-- 684 00:25:18,739 --> 00:25:21,030 another thing that you might want to know, so it's just 685 00:25:21,030 --> 00:25:24,690 combine these two things-- append what with what? 686 00:25:24,690 --> 00:25:31,680 Append range again with n minus 1 and with n. 
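The recursive definition just described might be written as in the sketch below. Two things here are assumptions rather than the lecture's exact code: the procedure is named my-range (a hypothetical name, chosen so it does not clash with any built-in range), and n is wrapped in a one-element list because append joins lists.

```scheme
;; Builds the list (1 2 ... n) by recursion on n.
(define (my-range n)
  (if (= n 0)
      '()                                  ; base case: the empty list
      (append (my-range (- n 1))           ; numbers 1 .. n-1
              (list n))))                  ; followed by n itself

(my-range 5)   ; (1 2 3 4 5)
```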
687 00:25:31,680 --> 00:25:33,300 The point here is to say, OK, how 688 00:25:33,300 --> 00:25:35,434 do I get the numbers 1 to 100? 689 00:25:35,434 --> 00:25:36,600 I just, sort of, say range-- 690 00:25:36,600 --> 00:25:41,430 I want the range 1 to 100, so I say, 100-- am I at 0 yet? 691 00:25:41,430 --> 00:25:46,140 No, so take 100 and append it with range 99. 692 00:25:46,140 --> 00:25:47,740 What does range 99 do? 693 00:25:47,740 --> 00:25:49,110 Well, is 99 0? 694 00:25:49,110 --> 00:25:51,810 No, so give me back 99 plus what? 695 00:25:51,810 --> 00:25:53,060 Plus range 98. 696 00:25:53,060 --> 00:25:54,210 Is 98 0? 697 00:25:54,210 --> 00:25:57,290 No, keep going, so it's basically recursing-- range 698 00:25:57,290 --> 00:26:00,720 is a recursive function that calls itself until it 699 00:26:00,720 --> 00:26:04,110 hits 0, very simple recursion. 700 00:26:04,110 --> 00:26:05,610 And now you can do this to write out 701 00:26:05,610 --> 00:26:07,980 all the numbers from 1 to 100. 702 00:26:07,980 --> 00:26:09,750 And then you, if you were so inclined, 703 00:26:09,750 --> 00:26:14,440 you could do map square to that. 704 00:26:14,440 --> 00:26:15,670 OK, and we run that. 705 00:26:15,670 --> 00:26:18,010 And it gives me all the numbers from-- the squares 706 00:26:18,010 --> 00:26:19,690 of the numbers from 1 to 100. 707 00:26:19,690 --> 00:26:23,740 So far we've talked just about very basic stuff. 708 00:26:23,740 --> 00:26:26,290 This is no different from Scheme. 709 00:26:26,290 --> 00:26:28,817 You are all experts in Scheme notation and things like that. 710 00:26:28,817 --> 00:26:31,150 Let's move on to something a little bit more interesting 711 00:26:31,150 --> 00:26:33,940 that Church can do, which is, for example, take 712 00:26:33,940 --> 00:26:39,340 random sequences, and it can take random-- 713 00:26:39,340 --> 00:26:41,050 how should I put this?
714 00:26:41,050 --> 00:26:45,430 Kind of like plus is a basic thing in certain programming 715 00:26:45,430 --> 00:26:48,082 languages, it's a primitive, right? 716 00:26:48,082 --> 00:26:49,540 It's written into the language what 717 00:26:49,540 --> 00:26:52,030 plus means, what times means. 718 00:26:52,030 --> 00:26:53,470 You don't have to define that. 719 00:26:53,470 --> 00:26:55,619 The way most languages work is that they 720 00:26:55,619 --> 00:26:57,160 have this sort of long list of things 721 00:26:57,160 --> 00:26:58,329 that they need to evaluate. 722 00:26:58,329 --> 00:26:59,620 And they start evaluating them. 723 00:26:59,620 --> 00:27:02,650 And they're, sort of, OK, did I hit an expression I know, 724 00:27:02,650 --> 00:27:04,575 like a number or not? 725 00:27:04,575 --> 00:27:06,450 And it's, sort of, no, you didn't hit it yet. 726 00:27:06,450 --> 00:27:08,040 OK, fine, keep evaluating, keep evaluating, 727 00:27:08,040 --> 00:27:10,090 keep evaluating until you get some sort of primitive. 728 00:27:10,090 --> 00:27:12,070 And a primitive procedure could be something 729 00:27:12,070 --> 00:27:13,992 like plus or a number. 730 00:27:13,992 --> 00:27:15,700 In Church, there are primitive procedures 731 00:27:15,700 --> 00:27:18,067 which are random primitive procedures. 732 00:27:18,067 --> 00:27:19,900 They are procedures that, when you hit them, 733 00:27:19,900 --> 00:27:23,650 what you do is, you just return a value, a sampled value, 734 00:27:23,650 --> 00:27:26,480 from this expression, from this probability distribution. 735 00:27:26,480 --> 00:27:30,202 So the most basic random primitive, 736 00:27:30,202 --> 00:27:32,410 the most basic distribution that you can do in Church 737 00:27:32,410 --> 00:27:34,120 is something called flip. 
738 00:27:34,120 --> 00:27:38,300 And if you just write down flip in Church, 739 00:27:38,300 --> 00:27:41,730 what you'll get, if you run it like that, is it tells you, 740 00:27:41,730 --> 00:27:43,250 well, it's a function. 741 00:27:43,250 --> 00:27:45,050 And it depends on certain arguments. 742 00:27:45,050 --> 00:27:46,400 And it tells you many, many things about it, 743 00:27:46,400 --> 00:27:47,566 but that's not what we want. 744 00:27:47,566 --> 00:27:50,450 We want to evaluate it, so put some parentheses around it. 745 00:27:50,450 --> 00:27:51,490 And we'll run it. 746 00:27:51,490 --> 00:27:54,440 And it will give us back false. 747 00:27:54,440 --> 00:27:55,575 OK, let's try that again. 748 00:27:55,575 --> 00:27:57,020 So let's run that again. 749 00:27:57,020 --> 00:27:59,629 It will give us back true, OK, interesting. 750 00:27:59,629 --> 00:28:01,670 And if we run that again, you know, we get false. 751 00:28:01,670 --> 00:28:04,010 We run it again, and we get maybe true, maybe false. 752 00:28:04,010 --> 00:28:07,610 You could do repeat 1,000 times flip. 753 00:28:07,610 --> 00:28:09,410 OK, repeat is another important thing 754 00:28:09,410 --> 00:28:10,580 that you would need to know. 755 00:28:10,580 --> 00:28:12,860 It just says repeat as many times 756 00:28:12,860 --> 00:28:14,970 as you want to repeat some sort of function. 757 00:28:14,970 --> 00:28:16,970 In this case, the function is flip. 758 00:28:16,970 --> 00:28:19,290 OK, so repeat flip 1,000 times. 759 00:28:19,290 --> 00:28:21,540 I hope you guys are trying this while I'm saying this. 760 00:28:21,540 --> 00:28:23,039 Are people trying this more or less? 761 00:28:23,039 --> 00:28:24,110 OK, cool. 762 00:28:24,110 --> 00:28:25,451 So repeat 1,000 times flip. 763 00:28:25,451 --> 00:28:27,200 And what you'll get back is this long list 764 00:28:27,200 --> 00:28:29,670 of true, false, true, false, false, true, false, true.
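A sketch of the flip and repeat calls described above, for a Church interpreter. The results are random, so no specific output is shown.

```scheme
;; Bare flip (no parentheses) is the procedure itself, not a sample.
;; Wrapping it in parentheses draws one sample:
(flip)               ; true or false, each with probability 0.5

;; repeat takes a count and a procedure of no arguments,
;; and returns a list of that many independent samples.
(repeat 1000 flip)   ; a list of 1,000 true/false values
```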
765 00:28:29,670 --> 00:28:31,253 And they're independent from one another, 766 00:28:31,253 --> 00:28:33,627 because it's an exchangeable random sequence. 767 00:28:33,627 --> 00:28:35,460 And if you want to see what this looks like, 768 00:28:35,460 --> 00:28:37,293 well, you could just do something like hist. 769 00:28:38,992 --> 00:28:39,950 And you would run that. 770 00:28:39,950 --> 00:28:42,110 And you would get, you know, more or less 50-50. 771 00:28:42,110 --> 00:28:46,640 Not exactly 50-50, because I only ran it 1,000 times. 772 00:28:46,640 --> 00:28:48,950 If I had run this in the limit, what I would get 773 00:28:48,950 --> 00:28:52,550 is 50-50 on true-false. 774 00:28:52,550 --> 00:28:54,896 Now, what's nice about this is that this sort of gets 775 00:28:54,896 --> 00:28:57,437 at this thing that I was talking about earlier, where there's 776 00:28:57,437 --> 00:29:00,860 a dual representation for any sort of probability distribution. 777 00:29:00,860 --> 00:29:02,990 You could either write the probability distribution 778 00:29:02,990 --> 00:29:04,050 in math. 779 00:29:04,050 --> 00:29:08,310 You could sort of say, well, the probability of true is 0.5. 780 00:29:08,310 --> 00:29:11,600 And the probability of false is 0.5. 781 00:29:11,600 --> 00:29:13,885 Now I've defined a distribution in math. 782 00:29:13,885 --> 00:29:15,740 And now you can say, well, what's 783 00:29:15,740 --> 00:29:18,650 conditioned on this, what can you do, and things like that. 784 00:29:18,650 --> 00:29:22,310 Or what you can do is, you can write a program such that, 785 00:29:22,310 --> 00:29:26,840 when you run it, it will sample one of these values. 786 00:29:26,840 --> 00:29:28,820 And in the limit, it samples such 787 00:29:28,820 --> 00:29:31,827 that it approximates the thing that we just defined in math. 788 00:29:31,827 --> 00:29:34,160 And you might say, well, why not just define it in math?
789 00:29:34,160 --> 00:29:37,160 Because oftentimes, it gets very, very, hairy very, very 790 00:29:37,160 --> 00:29:38,090 fast. 791 00:29:38,090 --> 00:29:40,200 And in fact, any sort of probability distribution 792 00:29:40,200 --> 00:29:42,110 that's well-defined and well-behaved, 793 00:29:42,110 --> 00:29:45,020 you can write as a program. 794 00:29:45,020 --> 00:29:49,140 A program which, if you run it many times, its sampling 795 00:29:49,140 --> 00:29:52,020 profile, the thing it will give you back if you sample it many, 796 00:29:52,020 --> 00:29:53,810 many different times, will give you back 797 00:29:53,810 --> 00:29:55,880 that probability distribution. 798 00:29:55,880 --> 00:29:57,440 Or you could equivalently say that, 799 00:29:57,440 --> 00:29:59,390 what it means for a probability distribution 800 00:29:59,390 --> 00:30:02,540 to be a probability distribution is to be some sort of program, 801 00:30:02,540 --> 00:30:05,627 to be some sort of procedure that gives you back a sample. 802 00:30:05,627 --> 00:30:07,460 And in the limit, you get some sort of thing 803 00:30:07,460 --> 00:30:09,590 that we're going to call the probability distribution. 804 00:30:09,590 --> 00:30:11,780 Actually, that's the way we define the probability 805 00:30:11,780 --> 00:30:14,550 distribution. 806 00:30:14,550 --> 00:30:16,250 And again, this gets in-- so one way 807 00:30:16,250 --> 00:30:18,860 to think about Church programs is that any Church program 808 00:30:18,860 --> 00:30:21,590 that you write-- if you just write plus 2 2, 809 00:30:21,590 --> 00:30:22,850 you'll get back 4. 810 00:30:22,850 --> 00:30:25,320 That's, in a way, a deterministic program, right? 811 00:30:25,320 --> 00:30:29,420 The probability of getting back 4 on this execution equals 1, 812 00:30:29,420 --> 00:30:32,210 but there are many other things that you could write 813 00:30:32,210 --> 00:30:34,350 and you could get back interesting things for them. 
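The "program as distribution" idea described above can be tried directly with the fair coin. The sketch below assumes the hist plotting helper of the Church browser playground, called with just the list of samples; the exact plot depends on the random draws.

```scheme
;; Draw 1,000 samples from the fair coin and plot their frequencies.
;; With more samples, the bars approach 0.5 for true and 0.5 for false.
(hist (repeat 1000 flip))

;; A deterministic program is the degenerate case:
(+ 2 2)   ; returns 4 with probability 1
```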
814 00:30:34,350 --> 00:30:35,808 And the point is to write something 815 00:30:35,808 --> 00:30:39,710 like a generative model that describes some sort of thing 816 00:30:39,710 --> 00:30:40,550 about the world. 817 00:30:40,550 --> 00:30:43,005 And when you run it forward, you get to a certain sample, 818 00:30:43,005 --> 00:30:44,880 but if you run in many, many different times, 819 00:30:44,880 --> 00:30:47,090 it gives you the probability distribution 820 00:30:47,090 --> 00:30:48,819 that this model describes. 821 00:30:48,819 --> 00:30:51,110 And now, if you-- and again, I'm getting slightly ahead 822 00:30:51,110 --> 00:30:51,680 of myself. 823 00:30:51,680 --> 00:30:53,690 If you change that model, if you, for example, 824 00:30:53,690 --> 00:30:56,690 condition on something, you'll get a different model. 825 00:30:56,690 --> 00:30:58,181 You'll get a different program. 826 00:30:58,181 --> 00:30:59,930 And you're trying to find the program such 827 00:30:59,930 --> 00:31:02,280 that its output will match the data. 828 00:31:02,280 --> 00:31:04,380 OK, but let's back up a little bit. 829 00:31:04,380 --> 00:31:06,740 And we're still in flip land. 830 00:31:06,740 --> 00:31:08,660 So we have here something which is flip. 831 00:31:08,660 --> 00:31:11,110 That's very, very basic. 832 00:31:11,110 --> 00:31:11,926 Flip can also be-- 833 00:31:11,926 --> 00:31:12,800 AUDIENCE: [INAUDIBLE] 834 00:31:12,800 --> 00:31:13,740 TOMER ULLMAN: OK. 835 00:31:13,740 --> 00:31:15,260 Flip can also be a biased coin. 836 00:31:15,260 --> 00:31:16,990 So for example, if I do-- 837 00:31:16,990 --> 00:31:19,745 I define something like, you know, define-- 838 00:31:22,970 --> 00:31:24,850 let's do this slightly differently. 839 00:31:24,850 --> 00:31:26,900 Let's call this lambda something. 840 00:31:26,900 --> 00:31:32,450 And what it does is flip 0.9. 
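The biased coin being typed here can be sketched as a named procedure of no arguments. The name biased-flip is hypothetical (the lecture only says "call this lambda something"), and hist is assumed to be the playground's plotting helper.

```scheme
;; A procedure that, each time it is called, flips a coin
;; that comes up true with probability 0.9.
(define biased-flip (lambda () (flip 0.9)))

;; Sample it 1,000 times and plot the result:
;; roughly 90% true and 10% false.
(hist (repeat 1000 biased-flip))
```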
841 00:31:32,450 --> 00:31:34,700 So if you run this forward, what you'll get now 842 00:31:34,700 --> 00:31:36,962 is that flip can actually take in some arguments. 843 00:31:36,962 --> 00:31:38,420 If you don't give it any arguments, 844 00:31:38,420 --> 00:31:39,861 it'll just do flip 50-50. 845 00:31:39,861 --> 00:31:41,360 If you give it some arguments, it'll 846 00:31:41,360 --> 00:31:45,920 do flip a biased coin, where the coin is biased towards 0.9. 847 00:31:45,920 --> 00:31:48,500 And you can see that, after I repeated that 1,000 times, 848 00:31:48,500 --> 00:31:52,700 I get, you know, it's approximately 90% heads, 849 00:31:52,700 --> 00:31:54,432 or true, and about 10% tails. 850 00:31:54,432 --> 00:31:56,390 AUDIENCE: Why did you make the lambda in there? 851 00:31:56,390 --> 00:31:58,730 TOMER ULLMAN: Ah, perfect, I'm glad somebody 852 00:31:58,730 --> 00:31:59,840 has asked that question. 853 00:31:59,840 --> 00:32:03,050 So if I were just to do the following-- suppose 854 00:32:03,050 --> 00:32:09,140 that I were just do repeat flip 0.9 like that, 855 00:32:09,140 --> 00:32:10,490 think about what would happen. 856 00:32:10,490 --> 00:32:13,910 What would happen is, I would first evaluate flip 0.9. 857 00:32:13,910 --> 00:32:17,370 OK, that would give me back a value, either true or false. 858 00:32:17,370 --> 00:32:20,320 And then this would say, repeat that 1,000 times. 859 00:32:20,320 --> 00:32:24,050 You would get, like, 1,000 trues, or 1,000 falses, 860 00:32:24,050 --> 00:32:25,480 or whatever it was that was first. 861 00:32:25,480 --> 00:32:27,470 In fact, it's going to fail, because repeat 862 00:32:27,470 --> 00:32:28,419 expects a function. 863 00:32:28,419 --> 00:32:30,710 But the point is, the reason that this is going to fail 864 00:32:30,710 --> 00:32:32,790 is because it wants a particular function. 865 00:32:32,790 --> 00:32:35,780 This is not a function, this is a value. 866 00:32:35,780 --> 00:32:36,964 You evaluate this first. 
867 00:32:36,964 --> 00:32:38,630 It gives you a value like true or false. 868 00:32:38,630 --> 00:32:40,520 And then you repeat that value 1,000 times. 869 00:32:40,520 --> 00:32:41,561 That's not what you want. 870 00:32:41,561 --> 00:32:42,860 What you want is a procedure. 871 00:32:42,860 --> 00:32:46,800 A procedure, or a distribution, or something like that, 872 00:32:46,800 --> 00:32:49,470 some sort of function that, when you run it, 873 00:32:49,470 --> 00:32:52,580 you get a biased sample, so what would that look like? 874 00:32:52,580 --> 00:32:53,800 That would look like this. 875 00:32:53,800 --> 00:32:55,799 It would be-- or I could do something like this. 876 00:32:55,799 --> 00:32:59,990 Define my-coin weight-- 877 00:32:59,990 --> 00:33:03,360 OK, something like this. 878 00:33:03,360 --> 00:33:07,800 And what it does is this. 879 00:33:07,800 --> 00:33:11,430 Now what I've defined is, I've defined a procedure that 880 00:33:11,430 --> 00:33:13,919 takes in a particular weight. 881 00:33:13,919 --> 00:33:15,460 And what it does is that it gives you 882 00:33:15,460 --> 00:33:16,980 back a flip on that weight. 883 00:33:16,980 --> 00:33:19,196 AUDIENCE: [INAUDIBLE] 884 00:33:19,196 --> 00:33:21,070 TOMER ULLMAN: Yes, although you might, again, 885 00:33:21,070 --> 00:33:24,200 run into some problems, but we can get to that, because-- 886 00:33:24,200 --> 00:33:26,550 well, OK. 887 00:33:26,550 --> 00:33:29,992 So let's see-- 888 00:33:29,992 --> 00:33:32,860 AUDIENCE: How would you define it as a lambda calculus? 889 00:33:32,860 --> 00:33:35,360 TOMER ULLMAN: OK, so how you would define it with the lambda 890 00:33:35,360 --> 00:33:39,130 calculus is, you would say my-coin lambda 891 00:33:39,130 --> 00:33:42,860 weight this thing. 892 00:33:42,860 --> 00:33:45,530 OK, now we're saying, what sort of thing is coin? 893 00:33:45,530 --> 00:33:46,700 Coin is a procedure. 894 00:33:46,700 --> 00:33:48,033 How do we know it's a procedure?
895 00:33:48,033 --> 00:33:49,934 Because we have this lambda right here. 896 00:33:49,934 --> 00:33:51,350 How many arguments does it expect? 897 00:33:51,350 --> 00:33:52,790 One, it's called weight. 898 00:33:52,790 --> 00:33:53,825 What does it do? 899 00:33:53,825 --> 00:33:55,140 It flips a coin. 900 00:33:55,140 --> 00:33:56,545 It gives you back that sample. 901 00:33:56,545 --> 00:33:57,410 AUDIENCE: Can I do-- 902 00:33:57,410 --> 00:33:59,118 TOMER ULLMAN: The equivalent way of doing 903 00:33:59,118 --> 00:34:02,300 that is by writing this thing without any lambdas. 904 00:34:02,300 --> 00:34:07,400 You would just write define my-coin-- 905 00:34:07,400 --> 00:34:09,199 notice the brackets there, right? 906 00:34:09,199 --> 00:34:11,840 Before we didn't have brackets around that-- define my-coin 907 00:34:11,840 --> 00:34:15,409 weight flip weight, like that. 908 00:34:15,409 --> 00:34:17,810 And now you're sort of saying, like, this is a procedure. 909 00:34:17,810 --> 00:34:19,310 You should know it's a procedure, because it's 910 00:34:19,310 --> 00:34:21,643 the first thing that you're hitting after define because 911 00:34:21,643 --> 00:34:23,022 of the parentheses. 912 00:34:23,022 --> 00:34:24,230 What sort of procedure is it? 913 00:34:24,230 --> 00:34:25,070 It's called my-coin. 914 00:34:25,070 --> 00:34:26,785 It takes in weight. 915 00:34:26,785 --> 00:34:28,089 Again, these are equivalent. 916 00:34:28,089 --> 00:34:30,380 And to answer Nori's question about how would I just do 917 00:34:30,380 --> 00:34:32,110 that without having to define things, 918 00:34:32,110 --> 00:34:35,550 I would say something like, hist repeat 1,000. 919 00:34:35,550 --> 00:34:36,800 Now, what do I want to repeat? 920 00:34:36,800 --> 00:34:40,370 I want to repeat some sort of procedure that samples things. 921 00:34:40,370 --> 00:34:41,969 So it's-- I'll call it lambda. 922 00:34:41,969 --> 00:34:43,070 It's an empty lambda. 
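The difference between handing repeat a value and handing it a procedure can be sketched in Python. This is only an analogue of the Church code being typed in the lecture: `flip`, `repeat`, and `my_coin` here are stand-in Python functions written for this note, not the Church primitives themselves.

```python
import random
from collections import Counter

def flip(weight=0.5):
    """Return True with probability `weight` (stand-in for Church's flip)."""
    return random.random() < weight

def repeat(n, procedure):
    """Call a zero-argument procedure n times and collect the results."""
    return [procedure() for _ in range(n)]

def my_coin(weight):
    """A named procedure: takes a weight and returns one fresh flip."""
    return flip(weight)

# Passing flip(0.9) directly would evaluate it once and hand repeat a bool,
# so repeat(1000, flip(0.9)) raises TypeError: a bool is not callable.

# Wrapping the call in a zero-argument lambda makes every repetition re-sample.
samples = repeat(1000, lambda: flip(0.9))
print(Counter(samples))  # mostly True, with a minority of False
```

The lambda plays the role of the "empty lambda": it takes no arguments and only delays the call to `flip` until `repeat` asks for a sample.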
923 00:34:43,070 --> 00:34:44,480 It doesn't take in any arguments. 924 00:34:44,480 --> 00:34:46,429 It's just the procedure. 925 00:34:46,429 --> 00:34:51,889 And what it does is, it flips a coin 0.9. 926 00:34:51,889 --> 00:34:54,219 And if I run that, I'll get that. 927 00:34:54,219 --> 00:34:56,434 OK, yes, no? 928 00:34:56,434 --> 00:34:58,290 OK, good. 929 00:34:58,290 --> 00:35:02,121 OK, so let's see, there are many other primitives 930 00:35:02,121 --> 00:35:02,995 that we could get to. 931 00:35:02,995 --> 00:35:04,600 There is uniform-draw. 932 00:35:04,600 --> 00:35:06,640 You can look at this online, but there's-- 933 00:35:06,640 --> 00:35:08,139 the basic primitives are things like 934 00:35:08,139 --> 00:35:10,710 multinomial, uniform, random integer, beta, Dirichlet, 935 00:35:10,710 --> 00:35:13,040 there's also the Chinese restaurant process. 936 00:35:13,040 --> 00:35:16,450 So let's see, we can build in our own little distribution. 937 00:35:16,450 --> 00:35:17,860 OK, let's try doing that. 938 00:35:17,860 --> 00:35:21,939 So here I've defined something which, under the hood, 939 00:35:21,939 --> 00:35:23,980 it's actually-- it's an interesting distribution. 940 00:35:23,980 --> 00:35:25,000 You all probably know it. 941 00:35:25,000 --> 00:35:26,416 But the way I'm going to define it 942 00:35:26,416 --> 00:35:31,060 is, I'm going to call it times it counts until heads. 943 00:35:31,060 --> 00:35:33,490 This is a procedure that's going to flip a coin. 944 00:35:33,490 --> 00:35:35,870 And if it comes up-- 945 00:35:35,870 --> 00:35:38,200 it's going to flip a coin with a particular weight. 946 00:35:38,200 --> 00:35:40,570 If it comes up true, if it comes up heads, 947 00:35:40,570 --> 00:35:41,780 then it's just going to stop. 948 00:35:41,780 --> 00:35:43,420 It's going to give you back 0. 949 00:35:43,420 --> 00:35:46,094 If it doesn't stop, if it comes back tails, 950 00:35:46,094 --> 00:35:47,260 it's going to tell you that. 
951 00:35:47,260 --> 00:35:49,870 It's going to write down somewhere, like, 1. 952 00:35:49,870 --> 00:35:51,640 And it's going to keep going. 953 00:35:51,640 --> 00:35:55,460 It's going to recurse somehow, call itself, and then keep 954 00:35:55,460 --> 00:35:55,960 going. 955 00:35:55,960 --> 00:35:59,050 So this is for you, this is an exercise for you. 956 00:35:59,050 --> 00:36:03,070 You have it under the files, under 3.4, 957 00:36:03,070 --> 00:36:04,690 build your own distribution. 958 00:36:04,690 --> 00:36:06,370 I've left this open. 959 00:36:06,370 --> 00:36:08,320 Why don't you take two minutes. 960 00:36:08,320 --> 00:36:10,600 We're trying to build a procedure that gives me 961 00:36:10,600 --> 00:36:13,960 the amount of times that I need to flip a coin before I 962 00:36:13,960 --> 00:36:15,100 get back heads, OK? 963 00:36:15,100 --> 00:36:16,330 If I take a particular coin-- 964 00:36:16,330 --> 00:36:19,090 I guess I don't want to have one handy-- 965 00:36:19,090 --> 00:36:20,230 but I flip a coin. 966 00:36:20,230 --> 00:36:21,910 And I just-- you know, I flip it. 967 00:36:21,910 --> 00:36:24,670 If it comes back heads, I write down 0 and I'm done. 968 00:36:24,670 --> 00:36:26,920 If it comes back tails, I'm going to keep flipping it, 969 00:36:26,920 --> 00:36:28,720 so I flip it again. 970 00:36:28,720 --> 00:36:31,739 And you know, I might flip it 10 times until I get heads, 971 00:36:31,739 --> 00:36:34,030 so the point is that this procedure will, in that case, 972 00:36:34,030 --> 00:36:34,875 return 10. 973 00:36:34,875 --> 00:36:36,375 That would be one particular sample. 
974 00:36:36,375 --> 00:36:38,916 Now, of course, if I take the coin again and I flip it again, 975 00:36:38,916 --> 00:36:42,130 sometimes I get 10 times until heads, sometimes once, 976 00:36:42,130 --> 00:36:45,280 sometimes 5, sometimes 20, so I'm 977 00:36:45,280 --> 00:36:47,680 going to get a particular distribution 978 00:36:47,680 --> 00:36:50,770 on the number of times I need until I hit heads. 979 00:36:50,770 --> 00:36:53,020 And the thing that we're trying to implement right now 980 00:36:53,020 --> 00:36:54,770 is just a procedure that, what it does is, 981 00:36:54,770 --> 00:36:57,040 it implements this counting thing that I just 982 00:36:57,040 --> 00:36:59,834 said by literally flipping a coin-- well, 983 00:36:59,834 --> 00:37:01,750 I don't know if literally, but under the hood, 984 00:37:01,750 --> 00:37:03,084 flipping a coin. 985 00:37:03,084 --> 00:37:05,500 If the coin comes back heads, because this thing evaluates 986 00:37:05,500 --> 00:37:07,150 to true, give back 0. 987 00:37:07,150 --> 00:37:11,200 If it doesn't, give back plus 1 plus what? 988 00:37:11,200 --> 00:37:12,790 So fill in those dots-- it shouldn't 989 00:37:12,790 --> 00:37:15,070 be a long expression-- such that you'll get 990 00:37:15,070 --> 00:37:16,490 what I was just talking about. 991 00:37:16,490 --> 00:37:19,120 So, guys, let me tell you what I was going for. 992 00:37:19,120 --> 00:37:27,590 An int plus 1 countsTillHeads coinweight. 993 00:37:27,590 --> 00:37:31,890 OK, and now if you do something like countsTillHeads, 994 00:37:31,890 --> 00:37:35,940 I don't know, 0.1 or something like that, and you run it. 995 00:37:35,940 --> 00:37:37,830 And it gets saved-- 996 00:37:37,830 --> 00:37:40,260 so let's read through this for a second. 997 00:37:40,260 --> 00:37:42,304 What happens is, you defined a procedure. 998 00:37:42,304 --> 00:37:43,470 It's called countsTillHeads. 999 00:37:43,470 --> 00:37:45,390 It takes in a coin weight. 
1000 00:37:45,390 --> 00:37:47,110 It flips a coin. 1001 00:37:47,110 --> 00:37:49,440 If it comes back heads, it gives you back 0. 1002 00:37:49,440 --> 00:37:51,780 If it didn't come back heads, then you just do plus 1. 1003 00:37:51,780 --> 00:37:53,760 And then you just call that thing again. 1004 00:37:53,760 --> 00:37:57,750 You do countsTillHeads coinweight again and again. 1005 00:37:57,750 --> 00:38:02,790 If it comes back 0, then this time, you'll have plus 1 plus 0 1006 00:38:02,790 --> 00:38:05,880 if it came back heads in here. 1007 00:38:05,880 --> 00:38:09,550 But if it didn't, then this will be plus 1 plus something. 1008 00:38:09,550 --> 00:38:11,400 In effect, what we've defined here-- 1009 00:38:11,400 --> 00:38:12,630 those of you that have defined it, and if not, 1010 00:38:12,630 --> 00:38:13,421 just look at this-- 1011 00:38:13,421 --> 00:38:16,080 what you've defined here is sort of a procedure that 1012 00:38:16,080 --> 00:38:18,330 might give us back infinity in some way, 1013 00:38:18,330 --> 00:38:21,240 except it's becoming extremely unlikely to do so 1014 00:38:21,240 --> 00:38:23,560 with each particular flip of the coin. 1015 00:38:23,560 --> 00:38:25,230 Now, I run it once with 0.1. 1016 00:38:25,230 --> 00:38:26,190 I get 15. 1017 00:38:26,190 --> 00:38:28,824 I can run it again and I'll get, you know, 8. 1018 00:38:28,824 --> 00:38:30,240 That just means that, on that run, 1019 00:38:30,240 --> 00:38:32,910 I flipped it eight times before I got heads. 1020 00:38:32,910 --> 00:38:35,400 And again, I can do this many, many different times. 1021 00:38:35,400 --> 00:38:41,880 Like, I can do hist repeat 1,000 and then this thing, 1022 00:38:41,880 --> 00:38:44,610 some empty procedure that does that. 1023 00:38:44,610 --> 00:38:49,620 And what you get is this, which, in case it 1024 00:38:49,620 --> 00:38:51,205 doesn't look familiar-- sorry, it's 1025 00:38:51,205 --> 00:38:52,830 just the way these things usually look.
1026 00:38:52,830 --> 00:38:54,630 This is sort of flipping the x- and y-axis. 1027 00:38:54,630 --> 00:38:56,827 But the point is, how many times did I 1028 00:38:56,827 --> 00:38:59,160 have to flip it to get, you know-- how many times did it 1029 00:38:59,160 --> 00:38:59,659 happen? 1030 00:38:59,659 --> 00:39:02,550 Did I flip it three times, or one, or two, three times? 1031 00:39:02,550 --> 00:39:04,950 That's about 24%. 1032 00:39:04,950 --> 00:39:07,139 And it sort of goes down, and down, and down, 1033 00:39:07,139 --> 00:39:09,180 because it becomes much, much, much more unlikely 1034 00:39:09,180 --> 00:39:11,886 that I'll flip it 40 times until I get heads. 1035 00:39:11,886 --> 00:39:14,010 It could be that I'll keep flipping it to infinity, 1036 00:39:14,010 --> 00:39:16,410 but it's not going to happen. 1037 00:39:16,410 --> 00:39:18,900 This, in case you didn't know, falls off geometrically. 1038 00:39:18,900 --> 00:39:21,360 It's the geometric distribution. 1039 00:39:21,360 --> 00:39:24,180 That's a very fundamental, simple distribution. 1040 00:39:24,180 --> 00:39:25,840 And one way to write it is to say, 1041 00:39:25,840 --> 00:39:27,870 what's the probability of k? 1042 00:39:27,870 --> 00:39:30,090 The probability of k is-- 1043 00:39:30,090 --> 00:39:33,195 let's say, we have a coin which has the-- 1044 00:39:33,195 --> 00:39:36,750 its probability of coming up heads is p. 1045 00:39:36,750 --> 00:39:38,370 Then we say the probability of k is 1046 00:39:38,370 --> 00:39:44,010 1 minus p, to the k, times p, yes? 1047 00:39:44,010 --> 00:39:48,680 That is, the coin comes up tails, with probability 1 minus p, k times, and then heads. 1048 00:39:48,680 --> 00:39:51,746 The point is, you can define the geometric distribution by sort 1049 00:39:51,746 --> 00:39:53,120 of saying, what's the probability 1050 00:39:53,120 --> 00:39:55,130 of any particular number? 1051 00:39:55,130 --> 00:39:59,072 Or you can define the procedure for it, OK?
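As a rough Python analogue of the exercise (not the Church source from the lecture files), the recursive sampler and the closed-form probability can be put side by side. The sketch assumes the convention described in the exercise: the procedure returns 0 when the first flip is heads, so it counts the tails before the first heads. Under that convention, a coin with heads probability p gives the count k with probability (1 - p)^k * p. The names `counts_till_heads` and `geometric_pmf` are chosen for this note.

```python
import random
from collections import Counter

def counts_till_heads(coin_weight):
    """Number of tails before the first heads (0 if the first flip is heads)."""
    if random.random() < coin_weight:            # heads: stop and report 0
        return 0
    return 1 + counts_till_heads(coin_weight)    # tails: count it, flip again

def geometric_pmf(k, p):
    """Closed form for the same quantity: k tails, then one heads."""
    return (1 - p) ** k * p

n = 10_000
p = 0.1
counts = Counter(counts_till_heads(p) for _ in range(n))
for k in range(5):
    # Empirical frequency next to the analytical probability.
    print(k, counts[k] / n, round(geometric_pmf(k, p), 4))
```

The sampler never mentions the formula, yet with enough samples its frequencies settle near the closed-form values.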
1052 00:39:59,072 --> 00:40:00,530 Instead of writing down what should 1053 00:40:00,530 --> 00:40:04,580 be the probability of any particular sequence, 1054 00:40:04,580 --> 00:40:06,980 you can just write down the procedure that it describes. 1055 00:40:06,980 --> 00:40:07,920 This is the procedure. 1056 00:40:07,920 --> 00:40:09,628 The procedure doesn't explicitly tell you 1057 00:40:09,628 --> 00:40:11,930 what the distribution is, it just samples it. 1058 00:40:11,930 --> 00:40:13,910 You've built a procedure for flipping a coin. 1059 00:40:13,910 --> 00:40:15,993 And if you do it many, many, many different times, 1060 00:40:15,993 --> 00:40:19,070 what you'll get is the geometric distribution. 1061 00:40:19,070 --> 00:40:22,610 This will approach the geometric distribution. 1062 00:40:22,610 --> 00:40:25,080 I can probably also do density, and then it'll 1063 00:40:25,080 --> 00:40:26,510 show you it like that. 1064 00:40:28,912 --> 00:40:31,120 So that's what I was talking about before with, like, 1065 00:40:31,120 --> 00:40:32,350 trying to wrap your head around something 1066 00:40:32,350 --> 00:40:34,641 like the equivalence between a probability distribution 1067 00:40:34,641 --> 00:40:37,540 that you can write down in math or as an analytical expression 1068 00:40:37,540 --> 00:40:40,390 and writing down the equivalent procedure for generating 1069 00:40:40,390 --> 00:40:42,527 that probability distribution. 1070 00:40:42,527 --> 00:40:44,860 Let's move on to something a little bit more interesting 1071 00:40:44,860 --> 00:40:46,992 like Gaussian sampling. 1072 00:40:46,992 --> 00:40:48,700 If you're not with us, you can look at it 1073 00:40:48,700 --> 00:40:51,640 in 3.5, Gaussian Samples. 1074 00:40:51,640 --> 00:40:53,770 What I've done here is, basically, I'm 1075 00:40:53,770 --> 00:40:56,399 defining a particular center. 1076 00:40:56,399 --> 00:40:57,940 Let's walk through this for a second.
1077 00:40:57,940 --> 00:41:00,340 I'm defining a two-dimensional Gaussian. 1078 00:41:00,340 --> 00:41:02,470 What it does is, it takes a particular center. 1079 00:41:02,470 --> 00:41:05,050 A center is just an x-y point. 1080 00:41:05,050 --> 00:41:10,040 And it does, you know, Gaussian around the first one. 1081 00:41:10,040 --> 00:41:12,400 I'm trying to define a two-dimensional Gaussian. 1082 00:41:12,400 --> 00:41:14,140 The way I do it is, I take a point 1083 00:41:14,140 --> 00:41:17,350 around-- a one-dimensional Gaussian around this point. 1084 00:41:17,350 --> 00:41:19,120 And I take a one-dimensional Gaussian 1085 00:41:19,120 --> 00:41:20,320 around the second point. 1086 00:41:20,320 --> 00:41:21,492 And then I just draw it. 1087 00:41:21,492 --> 00:41:23,950 So in this particular case, I'm going to define my Gaussian 1088 00:41:23,950 --> 00:41:26,200 center as 3, 2. 1089 00:41:26,200 --> 00:41:29,710 OK, I'm going to take it x equals 3, y equals 2. 1090 00:41:29,710 --> 00:41:32,830 And I want to sample a Gaussian around 3, 2. 1091 00:41:32,830 --> 00:41:37,070 So I'm going to sample a Gaussian around 3 1092 00:41:37,070 --> 00:41:39,030 and a Gaussian around 2. 1093 00:41:39,030 --> 00:41:40,960 And I'm going to give you that back. 1094 00:41:40,960 --> 00:41:44,530 And if I repeat this 1,000 times, then-- and I scatter it, 1095 00:41:44,530 --> 00:41:47,830 I'll end up with a plot that looks a bit like this. 1096 00:41:47,830 --> 00:41:51,340 And you can see on the x-axis, this is 3. 1097 00:41:51,340 --> 00:41:52,360 And this is 2. 1098 00:41:52,360 --> 00:41:54,790 And it's basically a Gaussian with sampling points 1099 00:41:54,790 --> 00:41:57,250 from around this thing, another forward procedure 1100 00:41:57,250 --> 00:41:59,060 that I can sample. 1101 00:41:59,060 --> 00:42:01,350 OK, is everyone more or less on board with this? 1102 00:42:01,350 --> 00:42:04,610 Let's take two seconds to read this again.
1103 00:42:04,610 --> 00:42:08,640 A basic procedure in Church is Gaussian. 1104 00:42:08,640 --> 00:42:09,980 What I do is I basically-- 1105 00:42:09,980 --> 00:42:13,100 I try to call Gaussian on some number. 1106 00:42:13,100 --> 00:42:16,280 Gaussian takes in two arguments. 1107 00:42:16,280 --> 00:42:20,387 Gaussian takes in a mean and a variance. 1108 00:42:20,387 --> 00:42:22,220 In particular, I'm going to take a Gaussian. 1109 00:42:22,220 --> 00:42:26,930 And its mean is going to be the first argument of center. 1110 00:42:26,930 --> 00:42:28,970 Its variance is going to be 1. 1111 00:42:28,970 --> 00:42:30,620 I'm going to take a Gaussian sampled 1112 00:42:30,620 --> 00:42:35,250 from the second argument, the y, and a variance of 1. 1113 00:42:35,250 --> 00:42:37,920 And then I'm going to just give you back that point. 1114 00:42:37,920 --> 00:42:42,240 So this is a procedure that takes in a center point. 1115 00:42:42,240 --> 00:42:44,940 And each time you sample it, it will give you a sample 1116 00:42:44,940 --> 00:42:48,420 from around the mean 3, 2. 1117 00:42:48,420 --> 00:42:49,884 And if I run that-- 1118 00:42:49,884 --> 00:42:51,550 so now I've defined a particular center. 1119 00:42:51,550 --> 00:42:52,910 You know, I've defined it 3, 2. 1120 00:42:52,910 --> 00:42:55,114 I could have done many other different things. 1121 00:42:55,114 --> 00:42:56,280 And I repeat that 100 times. 1122 00:42:56,280 --> 00:43:03,416 I've basically drawn a sample from something around 3, 2. 1123 00:43:03,416 --> 00:43:05,790 This can quickly get more interesting if you do something 1124 00:43:05,790 --> 00:43:07,189 like a mixture of Gaussians. 1125 00:43:07,189 --> 00:43:09,480 So a Gaussian mixture model is usually just saying, OK, 1126 00:43:09,480 --> 00:43:10,680 I have some particular space.
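The two-dimensional Gaussian sampler just described can be sketched in Python as follows. This is an illustrative analogue, not the Church code from the tutorial files: `sample_2d_gaussian` is a name made up for this note, and because Python's `random.gauss` takes a standard deviation, the variance of 1 mentioned in the lecture is converted with a square root (which leaves it at 1).

```python
import random

def sample_2d_gaussian(center, variance=1.0):
    """Draw one (x, y) point: an independent 1-D Gaussian around each coordinate."""
    sigma = variance ** 0.5                 # random.gauss expects a standard deviation
    x = random.gauss(center[0], sigma)      # Gaussian around the first coordinate
    y = random.gauss(center[1], sigma)      # Gaussian around the second coordinate
    return (x, y)

center = (3, 2)
points = [sample_2d_gaussian(center) for _ in range(1000)]

# The sample means should sit near the chosen center.
mean_x = sum(x for x, _ in points) / len(points)
mean_y = sum(y for _, y in points) / len(points)
print(round(mean_x, 2), round(mean_y, 2))
```

Scattering `points` with any plotting tool would give a cloud around (3, 2), comparable to the plot shown in the lecture.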
1127 00:43:10,680 --> 00:43:12,638 And I'm trying to figure out how many Gaussians 1128 00:43:12,638 --> 00:43:15,600 are in this scene, so let's write down the forward model 1129 00:43:15,600 --> 00:43:16,360 for that thing. 1130 00:43:16,360 --> 00:43:18,270 What's the forward model for a mixture model? 1131 00:43:18,270 --> 00:43:20,310 The forward model is saying, I'm going to draw out 1132 00:43:20,310 --> 00:43:22,080 some number of Gaussians. 1133 00:43:22,080 --> 00:43:23,580 I don't know how many. 1134 00:43:23,580 --> 00:43:25,050 And I don't necessarily know what 1135 00:43:25,050 --> 00:43:27,019 their center point is, right? 1136 00:43:27,019 --> 00:43:28,560 And from each one of these, I'm going 1137 00:43:28,560 --> 00:43:29,857 to draw some number of samples. 1138 00:43:29,857 --> 00:43:32,190 Does everyone understand, more or less, that description 1139 00:43:32,190 --> 00:43:33,300 that I just gave? 1140 00:43:33,300 --> 00:43:34,780 We're going to write it out now. 1141 00:43:34,780 --> 00:43:35,940 But the point is, the generative model 1142 00:43:35,940 --> 00:43:37,530 in your head for a mixture of Gaussians 1143 00:43:37,530 --> 00:43:39,450 should be, there are some number of Gaussians. 1144 00:43:39,450 --> 00:43:40,509 I don't know what it is. 1145 00:43:40,509 --> 00:43:42,300 Each one of them is centered on some point. 1146 00:43:42,300 --> 00:43:43,440 I don't know what it is. 1147 00:43:43,440 --> 00:43:46,080 Let's say I know the variance just for simplicity, 1148 00:43:46,080 --> 00:43:49,320 but I could obviously put a prior on that. 1149 00:43:49,320 --> 00:43:51,030 And then I just sample from that. 1150 00:43:51,030 --> 00:43:53,100 And I'll get some distribution.
1151 00:43:53,100 --> 00:43:54,965 And then you could use-- we'll later on see, 1152 00:43:54,965 --> 00:43:56,590 once you write down that forward model, 1153 00:43:56,590 --> 00:43:58,940 it's pretty simple to then just invert it and say, 1154 00:43:58,940 --> 00:44:00,960 OK, I see some number of points. 1155 00:44:00,960 --> 00:44:02,730 How many Gaussians are there actually? 1156 00:44:02,730 --> 00:44:05,610 But let's write down the forward model. 1157 00:44:05,610 --> 00:44:08,610 So I have already done this ahead of time. 1158 00:44:08,610 --> 00:44:13,300 And I'll do it here. 1159 00:44:13,300 --> 00:44:16,470 So what I've done here, minus the typo, thanks, 1160 00:44:16,470 --> 00:44:20,096 is to say something like, I want a sample of Gaussian center 1161 00:44:20,096 --> 00:44:21,720 where I don't know where it is, but I'm 1162 00:44:21,720 --> 00:44:24,360 going to say that it's in this two-dimensional space between 0 1163 00:44:24,360 --> 00:44:29,882 and 10, a box that's 10 wide and 10 tall. 1164 00:44:29,882 --> 00:44:32,340 So for each new Gaussian, I don't know where its center is, 1165 00:44:32,340 --> 00:44:34,131 but I'm assuming it's somewhere in this box 1166 00:44:34,131 --> 00:44:35,280 that we're looking at. 1167 00:44:35,280 --> 00:44:37,390 And the way I do that is, I say, OK, I 1168 00:44:37,390 --> 00:44:38,940 define some sort of procedure. 1169 00:44:38,940 --> 00:44:40,830 Each time you evaluate this procedure, what 1170 00:44:40,830 --> 00:44:43,205 it's going to give you back is a pair, 1171 00:44:43,205 --> 00:44:44,580 where the first thing in the pair 1172 00:44:44,580 --> 00:44:47,890 is a uniform between 0 and 10, the second thing in the pair 1173 00:44:47,890 --> 00:44:49,740 is a uniform between 0 and 10. 
1174 00:44:49,740 --> 00:44:51,930 If all you were to do is sample Gaussian center, 1175 00:44:51,930 --> 00:44:57,300 you would get back some number uniformly-distributed in the 10 1176 00:44:57,300 --> 00:45:00,960 box, where the first one is, let's say, 1177 00:45:00,960 --> 00:45:03,649 x, and the second one is y. 1178 00:45:03,649 --> 00:45:05,190 And the next thing I do is, let's say 1179 00:45:05,190 --> 00:45:08,550 I want to define some number of Gaussians 1180 00:45:08,550 --> 00:45:12,230 and I don't know how many there are. 1181 00:45:12,230 --> 00:45:17,120 Let's say, for example, that I want 1182 00:45:17,120 --> 00:45:21,560 to put some sort of ignorance prior on Gaussians between-- 1183 00:45:21,560 --> 00:45:24,440 there might be one, there might be two, there might be 10. 1184 00:45:24,440 --> 00:45:28,282 Let's say I stop it at 10 or something like that. 1185 00:45:28,282 --> 00:45:30,740 So in this case, I just say, sample the number of Gaussians 1186 00:45:30,740 --> 00:45:33,200 from something like random integer 10, since this 1187 00:45:33,200 --> 00:45:34,700 goes to 0, and you don't want 0, I'm 1188 00:45:34,700 --> 00:45:38,430 just adding the number 1 here. 1189 00:45:38,430 --> 00:45:41,267 But what I also could have done, and I 1190 00:45:41,267 --> 00:45:43,100 think I was going to do this as an exercise, 1191 00:45:43,100 --> 00:45:45,266 but since we want to get to physics, and psychology, 1192 00:45:45,266 --> 00:45:47,840 and some more interesting stuff, what I could have done here 1193 00:45:47,840 --> 00:45:50,060 is define number of Gaussians-- 1194 00:45:50,060 --> 00:45:52,250 suppose I wanted to put a prior on there 1195 00:45:52,250 --> 00:45:54,710 being potentially an infinite number of Gaussians, 1196 00:45:54,710 --> 00:45:56,134 what would I do? 1197 00:45:56,134 --> 00:45:58,055 AUDIENCE: Dirichlet. 1198 00:45:58,055 --> 00:45:59,430 TOMER ULLMAN: A Dirichlet, right?
1199 00:45:59,430 --> 00:46:01,471 Or what else can I do that we've already learned? 1200 00:46:05,155 --> 00:46:06,530 We could do the geometric, right? 1201 00:46:06,530 --> 00:46:08,321 We just defined the geometric a second ago. 1202 00:46:08,321 --> 00:46:10,310 The geometric gives us a probability 1203 00:46:10,310 --> 00:46:13,284 on numbers basically going from 0 to infinity. 1204 00:46:13,284 --> 00:46:15,200 And it dies off very quickly, so this gives us 1205 00:46:15,200 --> 00:46:17,402 sort of a natural prior of some sort to say, 1206 00:46:17,402 --> 00:46:19,610 I think that there are some number of Gaussians here. 1207 00:46:19,610 --> 00:46:21,290 I don't know what it is. 1208 00:46:21,290 --> 00:46:22,880 I'm pretty sure it dies off. 1209 00:46:22,880 --> 00:46:25,790 Like, I don't think 100 is as equally likely as 10. 1210 00:46:25,790 --> 00:46:27,830 I don't think 10 is as equally likely as 1. 1211 00:46:27,830 --> 00:46:29,990 So I could have said, define number of Gaussians, 1212 00:46:29,990 --> 00:46:31,340 just draw from geometric. 1213 00:46:31,340 --> 00:46:33,830 And then I would have gotten some number, potentially 1214 00:46:33,830 --> 00:46:34,700 infinite. 1215 00:46:34,700 --> 00:46:38,270 You've just defined an infinite Gaussian mixture model. 1216 00:46:38,270 --> 00:46:40,610 And then I draw some number of centers 1217 00:46:40,610 --> 00:46:44,090 by basically repeating this procedure. 1218 00:46:44,090 --> 00:46:46,520 I sample the Gaussians. 1219 00:46:46,520 --> 00:46:48,732 And then I scatter the points. 1220 00:46:48,732 --> 00:46:50,690 Let's see, and then you can look at the points. 1221 00:46:50,690 --> 00:46:51,981 And this is a fun game to play. 1222 00:46:51,981 --> 00:46:55,080 It's basically recapturing a bit of what Josh said before, 1223 00:46:55,080 --> 00:46:57,950 which is to say, how many Gaussians 1224 00:46:57,950 --> 00:46:59,242 do you think are in this image? 
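The forward model for the mixture can be sketched in Python as below. It follows the steps described in the lecture (a center drawn uniformly from the 10-by-10 box, a number of Gaussians that is either 1 to 10 or drawn from a geometric-style prior, then points sampled around each center), but the function names, the 50 points per cluster, the unit standard deviation, and the choice to add 1 to the geometric draw so the count is never 0 are all assumptions made for this sketch.

```python
import random
from functools import reduce

def sample_gaussian_center():
    """A center drawn uniformly from the 10-by-10 box."""
    return (random.uniform(0, 10), random.uniform(0, 10))

def sample_number_of_gaussians(stop_weight=None):
    """Uniform on 1..10 by default; 1 + a geometric draw if stop_weight is given."""
    if stop_weight is None:
        return random.randrange(10) + 1       # randrange(10) is 0..9, so add 1
    count = 1
    while random.random() >= stop_weight:     # keep going until the coin says stop
        count += 1
    return count

def sample_points(center, n_points=50):
    """Points around one center, each coordinate an independent unit Gaussian."""
    return [(random.gauss(center[0], 1.0), random.gauss(center[1], 1.0))
            for _ in range(n_points)]

def sample_mixture(stop_weight=None, n_points=50):
    """Run the forward model once; return the hidden centers and the visible points."""
    n_gaussians = sample_number_of_gaussians(stop_weight)
    centers = [sample_gaussian_center() for _ in range(n_gaussians)]
    clusters = [sample_points(c, n_points) for c in centers]
    # Collapse the list of per-cluster lists into one flat list for plotting.
    points = reduce(lambda acc, cluster: acc + cluster, clusters, [])
    return centers, points

centers, points = sample_mixture()
print(len(centers), "Gaussians,", len(points), "points")
```

Calling `sample_mixture(stop_weight=0.5)` switches to the unbounded, geometric-style prior on the number of Gaussians.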
1225 00:46:59,242 --> 00:47:01,033 And you can sort of play that with yourself 1226 00:47:01,033 --> 00:47:02,150 to get a sense of it. 1227 00:47:02,150 --> 00:47:03,816 You know, you've defined some procedure. 1228 00:47:03,816 --> 00:47:06,237 You don't know how many Gaussians you actually created. 1229 00:47:06,237 --> 00:47:07,820 You don't know exactly where they are, 1230 00:47:07,820 --> 00:47:09,230 but you can run it forward. 1231 00:47:09,230 --> 00:47:11,930 And you can look at it and say, well, here I 1232 00:47:11,930 --> 00:47:13,334 think it's pretty obvious. 1233 00:47:13,334 --> 00:47:14,750 I think there's sort of a Gaussian 1234 00:47:14,750 --> 00:47:17,120 here, maybe a Gaussian here. 1235 00:47:17,120 --> 00:47:19,670 So I guess the number here is 2, but here it's 1236 00:47:19,670 --> 00:47:20,600 a bit less obvious. 1237 00:47:20,600 --> 00:47:25,100 And again, you can play with this. 1238 00:47:25,100 --> 00:47:27,259 So those of you who've written this down, 1239 00:47:27,259 --> 00:47:29,050 and assuming you've done either a Dirichlet 1240 00:47:29,050 --> 00:47:31,780 or a geometric distribution what you've basically done 1241 00:47:31,780 --> 00:47:36,520 is written down the forward model for an infinite Gaussian 1242 00:47:36,520 --> 00:47:38,080 mixture model. 1243 00:47:38,080 --> 00:47:41,410 And you did it in, more or less, five lines of code. 1244 00:47:41,410 --> 00:47:42,006 Yeah? 1245 00:47:42,006 --> 00:47:43,910 AUDIENCE: What is the fold there? 1246 00:47:46,042 --> 00:47:47,750 TOMER ULLMAN: Where do you see fold here? 1247 00:47:47,750 --> 00:47:50,580 AUDIENCE: Visualize scatter fold append 1248 00:47:50,580 --> 00:47:53,680 TOMER ULLMAN: Ah, yes, so fold is 1249 00:47:53,680 --> 00:47:55,837 another high-level procedure. 1250 00:47:55,837 --> 00:47:58,420 It's not terribly important for the purposes of this tutorial, 1251 00:47:58,420 --> 00:48:02,390 but what it does is, it basically takes in a function. 
1252 00:48:02,390 --> 00:48:03,880 It takes in a list of stuff. 1253 00:48:03,880 --> 00:48:07,210 And it basically applies it to the first argument. 1254 00:48:07,210 --> 00:48:09,370 Then it takes it and applies it to whatever 1255 00:48:09,370 --> 00:48:11,085 the result was plus the next item-- 1256 00:48:11,085 --> 00:48:11,710 AUDIENCE: Plus? 1257 00:48:11,710 --> 00:48:13,202 TOMER ULLMAN: --in the list. 1258 00:48:13,202 --> 00:48:14,570 Well, not exactly plus-- 1259 00:48:14,570 --> 00:48:14,830 AUDIENCE: In addition? 1260 00:48:14,830 --> 00:48:16,413 TOMER ULLMAN: --but, yes, in addition, 1261 00:48:16,413 --> 00:48:17,980 so you can have a fold which has, 1262 00:48:17,980 --> 00:48:19,720 for example, two arguments. 1263 00:48:19,720 --> 00:48:22,100 And what it does is it multiplies. 1264 00:48:22,100 --> 00:48:23,350 So then you would take a list. 1265 00:48:23,350 --> 00:48:25,420 And you would basically do-- 1266 00:48:25,420 --> 00:48:26,980 or rather, what is sum? 1267 00:48:26,980 --> 00:48:30,980 What sum is, basically, is a fold of plus over a list, 1268 00:48:30,980 --> 00:48:32,667 because it takes the first number, 1269 00:48:32,667 --> 00:48:34,750 sums it up with the second one, takes that result, 1270 00:48:34,750 --> 00:48:35,980 sums it up with a third one-- 1271 00:48:35,980 --> 00:48:37,650 AUDIENCE: [INAUDIBLE] 1272 00:48:37,650 --> 00:48:39,370 TOMER ULLMAN: Fold needs three arguments. 1273 00:48:39,370 --> 00:48:43,060 Fold needs a particular-- well, it needs the function 1274 00:48:43,060 --> 00:48:44,410 that you're going to apply. 1275 00:48:44,410 --> 00:48:47,560 It needs a starting point to start from. 1276 00:48:47,560 --> 00:48:50,490 And it needs a list that it's going to work on, 1277 00:48:50,490 --> 00:48:53,410 again, not terribly important for-- 1278 00:48:53,410 --> 00:48:54,852 AUDIENCE: So why do this?
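The three-argument fold described here (a function, a starting point, and a list) corresponds closely to Python's `functools.reduce`, so a small sketch may help. The `fold` wrapper below is written for this note; the order in which Church passes the accumulated result and the next item to the function may differ from Python's, which does not matter for the sum and product cases shown.

```python
from functools import reduce
import operator

def fold(function, start, items):
    """Combine items left to right, beginning from `start`."""
    return reduce(function, items, start)

# Sum is a fold of plus over a list: ((((0 + 1) + 2) + 3) + 4).
total = fold(operator.add, 0, [1, 2, 3, 4])
print(total)     # 10

# Swapping the function and the starting point gives a product instead.
product = fold(operator.mul, 1, [1, 2, 3, 4])
print(product)   # 24

# Folding list concatenation over a list of lists collapses it into one list.
flat = fold(operator.add, [], [[1, 2], [3], [4, 5]])
print(flat)      # [1, 2, 3, 4, 5]
```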
1279 00:48:54,852 --> 00:48:56,560 TOMER ULLMAN: So in this particular case, 1280 00:48:56,560 --> 00:48:59,590 what I'm trying to do in the background 1281 00:48:59,590 --> 00:49:01,810 is, I'm going to get a lot of Gaussians. 1282 00:49:01,810 --> 00:49:02,860 I don't know how many. 1283 00:49:02,860 --> 00:49:05,140 I'm going to get basically a list of lists. 1284 00:49:05,140 --> 00:49:06,520 It could be one. 1285 00:49:06,520 --> 00:49:07,480 It could be three. 1286 00:49:07,480 --> 00:49:08,724 It could be 10. 1287 00:49:08,724 --> 00:49:11,140 Each one of them is going to define some number of points. 1288 00:49:11,140 --> 00:49:12,700 And I just want to scatter them. 1289 00:49:12,700 --> 00:49:15,749 But scatter works by taking in one list, 1290 00:49:15,749 --> 00:49:17,540 so it's basically just a way of collapsing. 1291 00:49:17,540 --> 00:49:19,498 Say I have three, or 10, I don't know how many. 1292 00:49:19,498 --> 00:49:21,340 I'm trying to collapse some number of lists 1293 00:49:21,340 --> 00:49:23,410 into a single list. 1294 00:49:23,410 --> 00:49:26,080 We've defined some number of Gaussians. 1295 00:49:26,080 --> 00:49:27,850 This is a London Blitz example. 1296 00:49:27,850 --> 00:49:30,070 Josh was talking about this a little bit. 1297 00:49:30,070 --> 00:49:32,620 Those of you who want to, sort of, jump back in again, 1298 00:49:32,620 --> 00:49:38,590 you can go to 3.5.2 in the student document. 1299 00:49:38,590 --> 00:49:41,272 You can copy whatever is under that and paste it. 1300 00:49:41,272 --> 00:49:43,230 And let's talk about that example for a second. 1301 00:49:45,880 --> 00:49:49,720 What this thing is doing is, it's sort of Josh's example-- 1302 00:49:49,720 --> 00:49:52,390 do you remember his example of, we have some sort of grid. 1303 00:49:52,390 --> 00:49:53,920 And we're trying to say, is there 1304 00:49:53,920 --> 00:49:58,660 a suspicious cluster somewhere, a disease cluster?
1305 00:49:58,660 --> 00:49:59,410 We have some dots. 1306 00:49:59,410 --> 00:50:01,430 And we're trying to figure out is there 1307 00:50:01,430 --> 00:50:02,530 something going on here? 1308 00:50:02,530 --> 00:50:04,571 You know, there's sort of a faulty, I don't know, 1309 00:50:04,571 --> 00:50:07,140 whatever, asbestos or something like that. 1310 00:50:07,140 --> 00:50:08,390 And I want to figure that out. 1311 00:50:08,390 --> 00:50:10,897 So what you're going to get is sort of a 2D map. 1312 00:50:10,897 --> 00:50:12,730 You're going to get some dots from that map. 1313 00:50:12,730 --> 00:50:15,430 And you're trying to figure out-- your hypothesis is either 1314 00:50:15,430 --> 00:50:17,110 this is sort of randomly distributed, 1315 00:50:17,110 --> 00:50:21,730 it's uniform, or there's some sort of center here. 1316 00:50:21,730 --> 00:50:24,190 So how do we write down the forward model for something 1317 00:50:24,190 --> 00:50:25,490 like that? 1318 00:50:25,490 --> 00:50:26,800 We would write down either-- 1319 00:50:26,800 --> 00:50:27,970 the particular example I'm doing 1320 00:50:27,970 --> 00:50:29,980 here is another example that Tom Griffiths did, 1321 00:50:29,980 --> 00:50:34,090 which is, during the Blitz, during the London bombing-- 1322 00:50:34,090 --> 00:50:38,030 this is actually a very old example of finding patterns. 1323 00:50:38,030 --> 00:50:40,450 Some of the British, the people of London, 1324 00:50:40,450 --> 00:50:43,270 were convinced that there were spies in London that 1325 00:50:43,270 --> 00:50:46,355 were telling the Germans where to bomb during the Blitz. 1326 00:50:46,355 --> 00:50:47,980 And the way that they reasoned this is, 1327 00:50:47,980 --> 00:50:50,380 they looked at the pattern of bombings. 1328 00:50:50,380 --> 00:50:52,540 And they said, there's no way that this is random. 1329 00:50:52,540 --> 00:50:54,820 They just looked at, like, dots on a map.
1330 00:50:54,820 --> 00:50:56,830 And to them, it looked a bit like Gaussians, 1331 00:50:56,830 --> 00:50:58,210 or things like that. 1332 00:50:58,210 --> 00:51:00,301 They were working from, sort of, few examples. 1333 00:51:00,301 --> 00:51:02,800 When you look at, there's, sort of, these nice web-- "nice," 1334 00:51:02,800 --> 00:51:05,258 I don't know if it's nice-- but there's these websites that 1335 00:51:05,258 --> 00:51:08,980 show you the entire Blitz from when 1336 00:51:08,980 --> 00:51:10,300 it started to when it ended. 1337 00:51:10,300 --> 00:51:13,230 And it's basically a random distribution. 1338 00:51:13,230 --> 00:51:14,860 If you run statistical tests on it, 1339 00:51:14,860 --> 00:51:17,230 it's no different from a random distribution. 1340 00:51:17,230 --> 00:51:19,630 How would you run such a test on it? 1341 00:51:19,630 --> 00:51:21,420 What you would do, for example, is 1342 00:51:21,420 --> 00:51:24,340 you would write a forward model that says it's either random, 1343 00:51:24,340 --> 00:51:25,690 uniform, or it's not. 1344 00:51:25,690 --> 00:51:28,510 Now, tell me which one is more likely. 1345 00:51:28,510 --> 00:51:30,590 And that's what people have, kind of, done. 1346 00:51:30,590 --> 00:51:32,436 That's a nice data set to play around with. 1347 00:51:32,436 --> 00:51:34,060 The way that we've written it over here 1348 00:51:34,060 --> 00:51:35,860 is to say, look, we have two options. 1349 00:51:35,860 --> 00:51:41,500 Either it's a uniform bombing or it's some targeted bombing. 1350 00:51:41,500 --> 00:51:43,630 The uniform bombing is basically going to give us 1351 00:51:43,630 --> 00:51:46,810 just some point between 0-- 1352 00:51:46,810 --> 00:51:49,360 between this box of 0 to 10, just this thing that we 1353 00:51:49,360 --> 00:51:50,620 were talking about before. 1354 00:51:50,620 --> 00:51:52,960 It's going to sample uniformly from this box. 
1355 00:51:52,960 --> 00:51:55,754 The targeted bombing is going to sample some Gaussians, 1356 00:51:55,754 --> 00:51:56,920 just like we defined before. 1357 00:51:56,920 --> 00:51:57,920 You don't know how many. 1358 00:51:57,920 --> 00:51:59,980 You don't know where the center is. 1359 00:51:59,980 --> 00:52:03,430 And it's going to then sample from those Gaussians. 1360 00:52:03,430 --> 00:52:05,817 And it's going to give you back some sort of scatter. 1361 00:52:05,817 --> 00:52:07,400 And you're basically going to say, OK, 1362 00:52:07,400 --> 00:52:09,880 I don't know if it's random, uniform, 1363 00:52:09,880 --> 00:52:12,219 or if there's some targeted bombing going on here, 1364 00:52:12,219 --> 00:52:14,260 so I'm going to place, basically, some inference. 1365 00:52:14,260 --> 00:52:15,730 I'm going to flip a coin. 1366 00:52:15,730 --> 00:52:17,980 If it comes up heads, I'm going to do uniform bombing. 1367 00:52:17,980 --> 00:52:21,132 If it comes up tails, I'm going to do targeted bombing. 1368 00:52:21,132 --> 00:52:23,090 And then you could look at something like this. 1369 00:52:23,090 --> 00:52:25,950 And you can say, well, I don't know. 1370 00:52:25,950 --> 00:52:27,020 That's kind of odd. 1371 00:52:27,020 --> 00:52:29,980 I mean, it doesn't exactly look like a uniform bombing. 1372 00:52:29,980 --> 00:52:33,836 There's all this missing empty space over here, right? 1373 00:52:33,836 --> 00:52:35,960 It doesn't exactly look like one particular target. 1374 00:52:35,960 --> 00:52:37,840 And again, you can sort of play with this. 1375 00:52:37,840 --> 00:52:40,160 And we'll get into the inference about how to invert this thing. 1376 00:52:40,160 --> 00:52:42,326 But just as a forward model, you can play with this, 1377 00:52:42,326 --> 00:52:46,028 run it forward, and try to see if you can guess.
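The forward model described above (flip a coin, then sample either uniformly from the 0-to-10 box or from Gaussians with unknown centers) can be sketched in Python rather than Church. This is only an illustration under assumptions: the function names, the 50 points, the cap of three targets, and the spread of 0.5 are hypothetical choices, not values from the tutorial.

```python
import random


def uniform_bombing(n_points, rng):
    """Sample every point uniformly from the 0-to-10 box."""
    return [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n_points)]


def targeted_bombing(n_points, rng, max_targets=3, spread=0.5):
    """Sample an unknown number of Gaussian centers, then points around them."""
    # Assumption: between 1 and max_targets centers, each placed uniformly in the box.
    n_targets = rng.randint(1, max_targets)
    centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n_targets)]
    points = []
    for _ in range(n_points):
        cx, cy = rng.choice(centers)  # pick which target this point belongs to
        points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return points


def forward_model(n_points=50, seed=None):
    """Flip a fair coin: heads is uniform bombing, tails is targeted bombing."""
    rng = random.Random(seed)
    if rng.random() < 0.5:
        return "uniform", uniform_bombing(n_points, rng)
    return "targeted", targeted_bombing(n_points, rng)


label, points = forward_model(n_points=50, seed=0)
```

Running `forward_model` repeatedly with different seeds and plotting `points` without looking at `label` gives the guessing game from the talk: try to tell from the scatter alone which branch produced it.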