The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

GUEST SPEAKER: Hi, everybody. Today we're going to talk about semantic localization.

First I'm going to talk about what semantic localization is and what the motivation for it is. Then we'll go through an algorithm that will allow us to localize. And then we'll go into how to actually add semantic information into this algorithm.

Our focus in coming up with this was the orienteering Grand Challenge. In orienteering, basically you have a map and you have a compass, and you have to go around to various checkpoints. As you can imagine, this is sort of difficult, because you don't know where you are on this map except for what you can tell from your compass. And we weren't sure how to do this either, so we asked some orienteering experts.
So basically the key ideas are: if you are disoriented, that's fine. You just need to find a reference point so that you can figure out where you are. You can think about where you last knew where you were, and estimate what your movement has been since then. That can give you some options as to where you are. And you can look around for features that you can identify uniquely on the map.

And this is an example of an orienteering map. If you just have what's over here, it's not very useful. You can't actually tell what those things are. You might guess that green is grass, but you don't actually know what they are. This map isn't useful until you actually have a legend to go along with it, so that you can say you have roads, or footpaths, or pits, or different things on the map that you can actually identify.

So this is the difference between how robots typically localize and how humans localize.
So humans think about places in terms of features, in terms of what you can do with them, things like that, whereas robots might measure distances with laser scanners to form a precise map of the entire space. Humans, instead, might think about abstractions such as rooms and their locations relative to each other.

So we can define semantic information as signs and symbols that contain meaningful concepts for humans: basically, different sorts of abstractions. And why might we want this semantic information? Well, one very important application would be improving human-robot interaction. Say you want a robot that can make coffee. It has to understand commands like: go to the kitchen, get a mug, turn on the coffee maker. This is the language that humans think in, and it's very useful if a robot can actually understand this sort of language as well: know what a coffee maker is, what it does, and where it is.

And additionally, you can get some performance and memory benefits. You're not storing a full map with the distance to every single surface; you're just storing the locations of key information.
And this means that your search space is a lot smaller too. So you can prune everything that's not related to kitchens or coffee makers, and gain performance that way. Finally, the fact that you can simply use a camera instead of a complicated laser scanner means that this is a lot cheaper and more accessible for robots to have.

So we've talked about what semantic information is. Now let's define what semantic localization is. Basically, it's localizing based on semantic information rather than metric information, like distances.

For our Grand Challenge, we have a map with labeled objects and their coordinates. This is our semantic information. And we want to basically look around and ask: what can we see in our field of view? And from that, we want to figure out where we could be on this map. It's important to note that map building is actually a very different and hard problem with a lot of other research on it. In our case, we've been given a map, and we're simply talking about the problem of localizing based on that map.
And now I'm going to let Matt tell you about particle filters, which are an algorithm for localization.

MATT: All right, so my name is Matt Deyo. Now we're going to talk about particle filters. This is a tool that we're going to teach you to use, specifically for state estimation, using a given map and measurements.

So what is localization? We just went over it. Specifically, it's the question of: where am I? For any system to work, the robot needs to know where it is on a map. This is not so simple to answer.

Metric localization, as was said before, is quantitative. How many meters are you from this wall, or from your origin? What's your orientation in degrees? And here are some examples of that. You can convert between coordinate frames really easily if you have metric information.

So localization is, well, in our case, mathematical. The problem statement is: you have a control u. You're going to observe something about your state. It might be a camera image. It might be laser scans.
And then essentially, on your map, you're going to use probability to figure out where you are. So this equation specifically is taking into account the probability of being at a position at your current time, based on your previous position, the observation that you're taking right now, the control you gave it, and then your map, like I said. That's built on the observation noise, the actuation, and then the belief. We're specifically looking at the belief here for the particle filter.

So in our case, we're approximating it by several points, also known as particles. So we're going to look at an example here. I have to actually press the Play button. OK. Here's a quick demo.

[INAUDIBLE]

So, a quick YouTube demo of a particle filter in action. I'm just going to show it first. So, pause. The red dot here is our actual robot position, and it's trying to localize itself in a maze. There are black walls throughout here. And it looks like a grid world, but the initial distribution we had was completely random across all the walls.
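The belief update being described is the standard recursive Bayes filter. The slide's exact notation isn't visible in the transcript, so the textbook form is shown here as a reference, with $m$ the map, $u_t$ the control, $z_t$ the observation, and $\eta$ a normalizer:

```latex
\operatorname{bel}(x_t) \;=\; \eta \,
\underbrace{p(z_t \mid x_t, m)}_{\text{observation model}}
\int
\underbrace{p(x_t \mid x_{t-1}, u_t)}_{\text{actuation model}} \,
\underbrace{\operatorname{bel}(x_{t-1})}_{\text{previous belief}}
\; \mathrm{d}x_{t-1}
```

The particle filter approximates $\operatorname{bel}(x_t)$ with a finite set of weighted samples instead of evaluating this integral exactly.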
So it's taking in observations, probably from laser range finders. It essentially knows that there are walls around it, and it knows that there's not a wall in front of it. So that centers all of these guesses on the centers of hallways. Obviously it got rid of guesses that were close to the walls; there's a low probability that it's right up against a wall, because of the observations it's taking. And we're going to revisit this example at the end of the presentation.

Particle filters. So the method that we're going to teach you has four easy steps, one of which is not repeated. The initial step is your first sample of particles. If you don't know anything about your state at the start, then you can sample a distribution over your entire space. And then the repeated steps are updating weights, resampling, and then propagating dynamics.

So we're going to show you a toy example. This is a really simple example, just to get the idea across. We're focusing on just one dimension. We have an aircraft at constant altitude, so take that variable out.
The only thing we're trying to localize is a distance x over a map. The sensor is just a range finder pointing down at the ground from this constant altitude. And then the map is a known mapping of x location to ground altitude. We're about to see what that means. The goal here is determining where you are over the mountains.

So here we have our plane. As you see, we know the map below, and we know that it has to be flying at this altitude. So with a range finder, you can measure different depths down to the mountains. Here we have long distances, a medium distance, and a short distance if it's directly on top of a mountain. Constant altitude, unknown x: that's what we're solving for. And then we actually have a noisy forward velocity. So we know how fast the plane wants to be traveling in this direction, but obviously, with some noise, it could be travelling a little faster or a little slower.

So the first step: sampling. Our initial belief here is that we don't know where the plane is with respect to these mountains. So we're going to actually sample with a uniform distribution.
Well, this is as uniform as I could get it. So over this range, those are our initial particles: essentially, guesses for what the x location is.

We're going to update these weights based on the observation. So we get our first observation from our range finder, and it's this length. It's a long length. It's, I guess, green in this case. This is our measured value. Here are all the values that we expect to see at these particles. This is pretty easy to calculate based on the map that you have: the position of each particle maps directly to a range-finder measurement.

So we're going to compare them: the likelihood of getting each of these measurements. We measure the likelihood of each of these occurring, and we actually weight the particles. So based on what we measured, these are most likely, so they get a larger weight. And we are most likely not on top of a mountain at this point, so those get smaller weights.

So we're going to resample, given that. We're going to keep the same number of samples the entire time.
That's just how this particle filter works, except we're going to resample across our new distribution. So these are the weights that we just came up with, and we're going to resample over them. So now it's more likely that it's here, or here, or way over there.

And then the final step is propagating. This whole time that we've been filtering and updating our weights, the plane has been moving forward, so we need to take that into account. We have a forward velocity; let's say the range is 0 meters per second, to 5 meters per second, to 10 meters per second, as you can see on this plot. So we're most likely moving 5 meters per second. Essentially, this is applied over your change in time between sensor measurements. Obviously, if you're only getting sensor measurements at 10 hertz, then you can propagate in between each of those.

So here we are, using our newly sampled particles and propagating them forward. For example, this one moving there is a low likelihood: that's a large distance. And these moving a shorter distance is more likely, given our model. So we have those new particles.
The new weights are now based on the probability of transitioning to that point: how likely is it for the plane to move that far, essentially. And then we repeat. So we're repeating the steps again. We're going to get a measurement in from our range finder, compare it to the map at our particles, keep the ones that are most likely and weight them more, and then propagate them.

So as an example, here's time step 2, when we're measuring the uphill of this mountain. Time step 3, now we're halfway up the mountain, so the positions that we've kept are at similar heights, here and here. And then we can slowly get down to differentiating. So now that we're on top of a mountain, the only pattern that matches is here, and maybe over there. And then eventually, we get down to these two. And then eventually, as this mountain drops off, it's a unique position, where the range goes farther than it did at the other candidate. So in the end, our particle filter thinks that we're right around here or here. And finally, there's a pretty high chance that we're over that valley.
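The four steps just walked through (sample, weight, resample, propagate) fit in a few lines of Python. This is a minimal sketch of the 1D aircraft example, not the code behind the slides: the terrain profile, noise levels, and particle count are all made-up assumptions.

```python
import math
import random

random.seed(0)  # fixed seed so the toy run is repeatable

def terrain_height(x):
    # Hypothetical mountain profile: two bumps of different heights over a 100 m map.
    return 30 * math.exp(-((x - 30) ** 2) / 50) + 45 * math.exp(-((x - 70) ** 2) / 80)

ALTITUDE = 100.0               # constant flight altitude
RANGE_NOISE = 2.0              # assumed std dev of the range finder
VEL_MEAN, VEL_STD = 5.0, 1.0   # noisy forward velocity, m/s
N = 500                        # particle count stays constant throughout

def expected_range(x):
    # What the downward range finder should read if the plane were at x.
    return ALTITUDE - terrain_height(x)

def gaussian(err, std):
    # Unnormalized Gaussian likelihood of a measurement error.
    return math.exp(-0.5 * (err / std) ** 2)

# Initial step: sample uniformly, since we start with no idea where we are.
particles = [random.uniform(0, 100) for _ in range(N)]

def filter_step(particles, measured_range, dt=1.0):
    # 1. Update weights: how likely is the measurement at each particle?
    weights = [gaussian(measured_range - expected_range(p), RANGE_NOISE)
               for p in particles]
    # 2. Resample the same number of particles from the weighted set.
    particles = random.choices(particles, weights=weights, k=N)
    # 3. Propagate dynamics: move each particle forward with noisy velocity.
    return [p + random.gauss(VEL_MEAN, VEL_STD) * dt for p in particles]

# Simulate a plane that truly starts at x = 20 and flies forward.
true_x = 20.0
for _ in range(10):
    z = expected_range(true_x) + random.gauss(0, RANGE_NOISE)
    particles = filter_step(particles, z)
    true_x += VEL_MEAN

estimate = sum(particles) / N  # mean of the particle cloud
```

After ten measurements the particle cloud should have collapsed near the plane's true position, because the two bumps have different heights and so the range sequence is unambiguous.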
So we'll look at this again, now that you know how the filter works, and this will make a little more sense. So they started with a uniform distribution; they had no clue where they were at the beginning. And as the robot moves forward, they are resampling, and the measurements are changing, obviously, because it's seeing these two walls, and eventually it sees that there's a doorway to the left. And you can keep going forward as well. So eventually, that geometry doesn't line up with any other spot on the map.

Here, we see it nose into this hallway. I think this top hallway is the only one on this map that's that long. But it still doesn't know which direction it's going. It hasn't been able to differentiate that yet, but it's about to. So the only remaining particles are here and here. It knows it's in the middle of a hallway, and it knows it's moved about two blocks now without seeing a wall directly in front of it, which doesn't occur anywhere else without another turn-off. So it's about to solve itself, because eventually, it sees this wall over here.
And those observations don't match up with what it's actually seeing. So there we go. There's a [INAUDIBLE] particle filter successfully working for a little robot with a range finder.

I went too far. There we go.

DAVID STINGLEY: I'm David Stingley. And now, after we've gotten an idea of why we want to use semantic localization, and how to use particle filters to generate guesses for positions, we're going to talk about how we can use these two together to actually implement semantic localization on a robot.

So hearkening back to the implementation idea, we have three important parts. We have a belief representation. We have the actuation model: once we have a location, how are we going to move? As we said before, if you don't know exactly how fast you're moving, there's a probability that you moved to a bunch of different locations. And then finally, we have an observation model, which is the probability that you see some certain observation, given that you're in this newly simulated position.
If we continuously solve for our most probable x, then that's our location. So the particle that is the most probable is the place where we guess we are.

So let's look at pseudocode for what semantic localization would look like. As long as our robot is moving, we're going to make observations, and those observations are going to be [INAUDIBLE] z1. Then, for those observations, we're going to start off our particle filter and guess a certain number of probable locations. We're going to use our actuation model to update them. And then we're going to simulate observations at each location, and compare what we expect to see for that particle on our map against what we actually observed. This is going to be our update, and this is what we're going to focus on understanding, since a lot of the rest is covered by the particle filter.

So what kind of models can we pick? We need to define what our observation looks like, and you get a lot of choices. In metric localization, you'd use a labeled laser scan. You'd have perfect information about the environment.
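The loop just described can be sketched concretely. Everything below is an illustrative assumption rather than the class's actual implementation: a 1D road with labeled objects, a trivial forward-motion model, and a crude bag-of-objects likelihood that just penalizes each mismatched label count.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical 1D map: labeled objects at known coordinates along a road.
MAP = [("tree", 2.0), ("mailbox", 4.0), ("tree", 9.0), ("mailbox", 12.0)]
FOV = 3.0   # the robot sees objects within FOV meters ahead of it (assumed)
N = 200

def visible(x):
    # Bag of objects expected in view at position x: Y(x) in the lecture's notation.
    return Counter(label for label, pos in MAP if x <= pos <= x + FOV)

def likelihood(z, y):
    # Penalize each label-count mismatch between observed bag z and expected bag y.
    mismatch = sum((z - y).values()) + sum((y - z).values())
    return 0.1 ** mismatch

particles = [random.uniform(0, 15) for _ in range(N)]
true_x, velocity = 1.0, 1.0

for _ in range(8):
    true_x += velocity
    z = visible(true_x)                  # observed bag of objects (simulated here)
    # Actuation update: propagate each particle with a noisy motion model.
    particles = [p + random.gauss(velocity, 0.2) for p in particles]
    # Observation update: compare the simulated view at each particle with z.
    weights = [likelihood(z, visible(p)) for p in particles]
    particles = random.choices(particles, weights=weights, k=N)

estimate = sum(particles) / N
```

Even with such a coarse observation (just counts of trees and mailboxes in view), the sequence of observations along the road is enough to pin the particles down near the true position.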
And so you can see everything. It might be nice to use a scene of objects at specific locations, but that requires, once again, a lot of information. Now you need to know where the objects are; you need to have an idea of their exact locations and orientations with respect to each other. Or maybe it might be nice to just use a bag of objects: I see four things in my view pane versus three. These are all choices, and we're going to take a really quick look at the effects of each.

As we've stated before, there are a lot of choices of observation, and it can get complicated really quickly. So just to impress that upon you, imagine if you used laser scanners when you have three objects here: for instance, a house, a couple of trees, and a mailbox. You check each line for an intersection, and then you have to figure out what counts as your detection, since you're going to have to differentiate what's in your scene. The problem with that might be that, well, you saw an entire wall, for instance. Where do you want the house to be?
So let's say we assume that objects are points. You either completely see an object or you completely don't; that way, you don't have to half-see one. You just check whether whatever you chose as the center of mass intersects with your current view plane. If it does, then you're good. The issue with that is that if, for instance, something's center isn't inside the scene anymore, you just completely ignore it. We have the exact opposite problem.

So you might want to make it a bit more complex and have polygons, parts of objects. Do you see some percentage of something? How much of it is in the view plane? That adds in a lot of chances for error. And that's, I guess, the big point here to remember: depending on how we characterize our observations, we have different ways to get things wrong.

So let's say, for instance, your observation is distance and bearing to some object in the scene. What can go wrong? Well, you have noise inside your sensors. Your sensors might have limitations; you can't necessarily see to infinity.
So an object might be too far away for you to identify correctly, or it might be rotated in a way that you can't interpret correctly. What if you want to group your observations into classes of objects? Instead of everything just being an obstruction, now you have different types of obstructions (trees are different from mailboxes, of course), in which case you can have classification errors. What if you see a tree and you just think it's a really, really big mailbox, or you see a mailbox and you think it's a really small tree with a funny, little, metal top? These kinds of errors can make your scenes look incorrect. And if you decide to have sets of objects, well, what permutations matter? If you don't have a way of differentiating elements in the set, then two trees and a mailbox look the same whether the trees are on the left and the mailbox is on the right, or vice versa.
431 00:19:13,200 --> 00:19:16,040 So with that in mind, we're going 432 00:19:16,040 --> 00:19:17,780 to be talking about how we're going 433 00:19:17,780 --> 00:19:21,500 to solve this question of, given some position 434 00:19:21,500 --> 00:19:23,900 and your synthetic view, how likely 435 00:19:23,900 --> 00:19:25,997 is it that it matches your actual observation? 436 00:19:25,997 --> 00:19:28,580 And for that, we're just going to change what this probability 437 00:19:28,580 --> 00:19:31,730 statement looks like to make it a little more concrete. 438 00:19:31,730 --> 00:19:35,540 In this case, Z is the set of observed objects that you have. 439 00:19:35,540 --> 00:19:38,210 So we're going to say that there are objects in the scene, 440 00:19:38,210 --> 00:19:41,300 and we put them inside of this value, Z. 441 00:19:41,300 --> 00:19:43,940 So for instance, you see a house and you see a mailbox. 442 00:19:43,940 --> 00:19:46,580 That would be Z. Your set of objects is just two things. 443 00:19:46,580 --> 00:19:50,150 We're going to use the bag of objects approximation. 444 00:19:50,150 --> 00:19:52,880 Y of x is going to be the set of objects 445 00:19:52,880 --> 00:19:56,401 you expect to see given your map for the position that you're 446 00:19:56,401 --> 00:19:56,900 at. 447 00:19:56,900 --> 00:19:59,709 So given a position x, Y is the set of objects 448 00:19:59,709 --> 00:20:00,500 that you would see. 449 00:20:00,500 --> 00:20:02,125 So you might be in a position where you 450 00:20:02,125 --> 00:20:03,720 can see a house and a mailbox. 451 00:20:03,720 --> 00:20:06,304 Then Y is a house and a mailbox. 452 00:20:06,304 --> 00:20:07,970 And x is just going to be your position. 453 00:20:07,970 --> 00:20:10,261 That's the element that we're getting from our particle 454 00:20:10,261 --> 00:20:11,880 filter. 
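As a concrete illustration of the bag-of-objects idea, Z and Y(x) can be represented as multisets of class labels. The world layout and the visibility test here are made-up stand-ins, just to show the data shapes.

```python
from collections import Counter

# Z: the actual observation, a bag (multiset) of object classes.
# Order doesn't matter under the bag-of-objects approximation.
Z = Counter({"house": 1, "mailbox": 1})

def expected_objects(x, world, visible):
    """Y(x): the bag of objects the map says we'd see from position x.
    `world` is a list of (class_name, position) pairs from the map;
    `visible` is the field-of-view test for pose x."""
    return Counter(cls for cls, pos in world if visible(x, pos))
```

Each particle in the filter carries a position x, and the map turns that x into a synthetic bag Y(x) to compare against the real Z.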
455 00:20:11,880 --> 00:20:14,480 So in our example, as you can tell 456 00:20:14,480 --> 00:20:16,247 by how much I talk about it, we're 457 00:20:16,247 --> 00:20:18,330 going to use just two things, trees and mailboxes, 458 00:20:18,330 --> 00:20:21,530 because I like them. 459 00:20:21,530 --> 00:20:22,850 So here's my map. 460 00:20:22,850 --> 00:20:25,220 There's going to be a long road, and off to the side, 461 00:20:25,220 --> 00:20:26,418 we have trees and mailboxes. 462 00:20:26,418 --> 00:20:27,876 And let's say that your robot wants 463 00:20:27,876 --> 00:20:29,610 to be a paper delivery boy. 464 00:20:29,610 --> 00:20:32,210 So he needs to be able to figure out where the mailboxes are 465 00:20:32,210 --> 00:20:33,800 and how far down the street he is. 466 00:20:33,800 --> 00:20:35,750 Because if he were to throw the paper wrong, 467 00:20:35,750 --> 00:20:38,940 he'd throw a paper into a tree. 468 00:20:38,940 --> 00:20:41,440 So in this world, our robot is going to be here, represented 469 00:20:41,440 --> 00:20:42,950 by an orange hexagon. 470 00:20:42,950 --> 00:20:45,530 And it has this field of view. 471 00:20:45,530 --> 00:20:48,510 So given this field of view, what does Z look like, 472 00:20:48,510 --> 00:20:53,850 our actual observation? Is it one tree and one mailbox? 473 00:20:53,850 --> 00:20:56,305 Well, for this case, what we're going to say is, 474 00:20:56,305 --> 00:20:58,430 we're assuming that as long as the thing intersects 475 00:20:58,430 --> 00:21:00,980 with our field of view, we're going to see it. 476 00:21:00,980 --> 00:21:02,332 So, simplifying that, 477 00:21:02,332 --> 00:21:03,790 we're just going to say that we see 478 00:21:03,790 --> 00:21:05,660 both trees and this mailbox. 479 00:21:05,660 --> 00:21:07,570 Once again, we talked about how difficult 480 00:21:07,570 --> 00:21:08,870 it is to do this observation. 
481 00:21:08,870 --> 00:21:11,210 So for a simplifying assumption, let's say that we just 482 00:21:11,210 --> 00:21:14,360 completely see the tree. 483 00:21:14,360 --> 00:21:18,320 So that's great for our actual robot position. 484 00:21:18,320 --> 00:21:20,127 But when we start spawning particles, 485 00:21:20,127 --> 00:21:21,710 we need to figure out what we're going 486 00:21:21,710 --> 00:21:23,940 to say they synthetically see. 487 00:21:23,940 --> 00:21:27,325 So say we spawn a particle here, and we spawn one 488 00:21:27,325 --> 00:21:28,960 where we're just slightly off the road. 489 00:21:28,960 --> 00:21:30,001 We deviated a little bit. 490 00:21:30,001 --> 00:21:32,210 We're further forward than the actual position. 491 00:21:32,210 --> 00:21:35,870 We drove into somebody's house, another guy's house 492 00:21:35,870 --> 00:21:37,170 in the woods. 493 00:21:37,170 --> 00:21:40,250 Then what is each of these synthetic observations for the given 494 00:21:40,250 --> 00:21:41,390 points? 495 00:21:41,390 --> 00:21:43,806 Determining this is going to determine how we actually get 496 00:21:43,806 --> 00:21:46,050 that probability calculation. 497 00:21:46,050 --> 00:21:48,410 So what do we need to consider here? 498 00:21:48,410 --> 00:21:51,620 Well, the first thing we need to consider is classification. 499 00:21:51,620 --> 00:21:54,337 Like we said before, with this set of objects approximation, 500 00:21:54,337 --> 00:21:56,420 it's important that you understand if you classify 501 00:21:56,420 --> 00:21:58,880 things correctly or not. 502 00:21:58,880 --> 00:22:02,060 Past that, like we asked before, we said, 503 00:22:02,060 --> 00:22:04,140 wait, why didn't I just see a tree and a mailbox? 504 00:22:04,140 --> 00:22:05,890 Well, we need to know if we saw everything 505 00:22:05,890 --> 00:22:07,430 inside of our field of view. 506 00:22:07,430 --> 00:22:08,900 Maybe we did miss something. 
507 00:22:08,900 --> 00:22:11,960 Maybe instead, in that whole scene, we saw just one tree, 508 00:22:11,960 --> 00:22:15,020 one mailbox, even though the other tree intersected it. 509 00:22:15,020 --> 00:22:18,560 And noise can happen in reverse, so we could accidentally 510 00:22:18,560 --> 00:22:21,650 see a tree when there actually isn't one. 511 00:22:21,650 --> 00:22:24,700 And finally, we could see things overlap. 512 00:22:24,700 --> 00:22:27,220 We kind of ignored this before, but what if two trees were 513 00:22:27,220 --> 00:22:28,480 right on top of each other? 514 00:22:28,480 --> 00:22:30,080 It might make it kind of difficult 515 00:22:30,080 --> 00:22:32,120 for you to successfully see that there are 516 00:22:32,120 --> 00:22:34,910 two trees there instead of one. 517 00:22:34,910 --> 00:22:38,340 So to start with, we're going to strike this last possibility. 518 00:22:38,340 --> 00:22:41,300 It will become more evident later why this is important. 519 00:22:41,300 --> 00:22:43,010 But for now, we're just going to assume 520 00:22:43,010 --> 00:22:46,610 that every observation corresponds 521 00:22:46,610 --> 00:22:49,160 to only one object being seen. 522 00:22:49,160 --> 00:22:50,770 Otherwise, you could end up infinitely 523 00:22:50,770 --> 00:22:52,040 expanding your scene. 524 00:22:52,040 --> 00:22:52,880 Think about it. 525 00:22:52,880 --> 00:22:53,776 You see a tree. 526 00:22:53,776 --> 00:22:56,150 There might be a possibility that that tree is two trees. 527 00:22:56,150 --> 00:22:57,750 Well, you saw two trees, then. 528 00:22:57,750 --> 00:23:00,240 So maybe there's a probability that each of those two trees 529 00:23:00,240 --> 00:23:01,860 is also two trees. 530 00:23:01,860 --> 00:23:03,450 And you keep going and keep going, 531 00:23:03,450 --> 00:23:06,094 until the entire forest is behind one tree. 
532 00:23:06,094 --> 00:23:08,510 It would be kind of bad for the probability calculation, 533 00:23:08,510 --> 00:23:11,009 because you'd eventually have to cut that off at some point, 534 00:23:11,009 --> 00:23:13,329 so that your algorithm actually finishes. 535 00:23:13,329 --> 00:23:14,870 So for our purposes, we're just going 536 00:23:14,870 --> 00:23:17,411 to cut it off at the start and say that everything's just one 537 00:23:17,411 --> 00:23:17,920 object. 538 00:23:17,920 --> 00:23:20,210 It's just error. 539 00:23:20,210 --> 00:23:23,760 So now we need to talk about whether we classify correctly. 540 00:23:23,760 --> 00:23:25,840 If we can solve this equation, what's 541 00:23:25,840 --> 00:23:28,490 the probability that our classification is right? 542 00:23:28,490 --> 00:23:31,430 So for now, we're going to make two simplifying assumptions. 543 00:23:31,430 --> 00:23:34,700 They're going to remove the two problems that we had before. 544 00:23:34,700 --> 00:23:36,590 And don't worry, we'll relax them later. 545 00:23:36,590 --> 00:23:38,240 For our first assumption, we're going 546 00:23:38,240 --> 00:23:40,916 to assume that we see everything inside of our field of view. 547 00:23:40,916 --> 00:23:43,550 So that means we're not going to have any missed detections. 548 00:23:43,550 --> 00:23:45,930 And we never see something that doesn't exist. 549 00:23:45,930 --> 00:23:47,644 So we have no false detections. 550 00:23:47,644 --> 00:23:49,810 Everything that's in the scene, we see successfully. 551 00:23:49,810 --> 00:23:52,640 If it's not in the scene, we don't see it. 552 00:23:52,640 --> 00:23:55,700 So given that, and we spawn a robot here, 553 00:23:55,700 --> 00:24:00,240 and it has this field of view, what does this robot see? 554 00:24:00,240 --> 00:24:01,530 AUDIENCE: Three trees. 555 00:24:01,530 --> 00:24:02,810 DAVID STINGLEY: Yes. 556 00:24:02,810 --> 00:24:04,500 It happens to see three trees. 
557 00:24:04,500 --> 00:24:06,470 So remember our assumptions. 558 00:24:06,470 --> 00:24:10,380 We're going to say here that our actual observation for wherever 559 00:24:10,380 --> 00:24:13,790 our robot is, is one mailbox and two trees. 560 00:24:13,790 --> 00:24:15,980 And we can see that the synthetic robot that we made 561 00:24:15,980 --> 00:24:17,810 saw three trees. 562 00:24:17,810 --> 00:24:20,360 So what are some other forms of Y 563 00:24:20,360 --> 00:24:25,760 that we can make that would also map to this? 564 00:24:25,760 --> 00:24:28,570 What's the way that we can take our Y and transform it 565 00:24:28,570 --> 00:24:30,320 so that it maps to this Z? 566 00:24:30,320 --> 00:24:34,070 What kind of action would we have to take on this? 567 00:24:34,070 --> 00:24:36,320 If you were thinking that we'd have to misclassify one 568 00:24:36,320 --> 00:24:38,780 of these trees, you're correct. 569 00:24:38,780 --> 00:24:41,169 And remember, this is just a set of objects. 570 00:24:41,169 --> 00:24:43,085 So this doesn't have to be the first tree that 571 00:24:43,085 --> 00:24:44,150 got misclassified. 572 00:24:44,150 --> 00:24:45,220 There are three of them. 573 00:24:45,220 --> 00:24:49,500 We could have any permutation of these trees get misclassified. 574 00:24:49,500 --> 00:24:50,500 So it becomes important. 575 00:24:50,500 --> 00:24:52,270 And we're going to introduce this concept 576 00:24:52,270 --> 00:24:54,050 of this operator, phi. 577 00:24:54,050 --> 00:24:55,820 And what phi is going to do is it's 578 00:24:55,820 --> 00:24:59,870 going to be a way to map the permutations that we could have 579 00:24:59,870 --> 00:25:04,520 of misclassifications for Y to look like Z. That way, 580 00:25:04,520 --> 00:25:07,520 we don't have to try and write all the permutations down. 
581 00:25:07,520 --> 00:25:10,040 It's possible to do this essentially with a matrix, 582 00:25:10,040 --> 00:25:12,000 like a permutation matrix, that just reorders 583 00:25:12,000 --> 00:25:13,280 the elements that you have. 584 00:25:15,900 --> 00:25:19,650 So what does this probability look like now? 585 00:25:19,650 --> 00:25:22,200 We're going to use lowercase z and y, with an index i, 586 00:25:22,200 --> 00:25:24,770 to represent each individual element of those sets. 587 00:25:24,770 --> 00:25:29,945 So for some element in the actual observation, z, 588 00:25:29,945 --> 00:25:31,870 what's the probability that some element 589 00:25:31,870 --> 00:25:35,360 in our synthetic observation matches it? 590 00:25:35,360 --> 00:25:37,676 Well, for that, 591 00:25:37,676 --> 00:25:39,050 there's some probability of 592 00:25:39,050 --> 00:25:42,480 misclassifying, or of classifying correctly. 593 00:25:42,480 --> 00:25:45,530 So we're going to use c to represent a classification 594 00:25:45,530 --> 00:25:46,040 matrix. 595 00:25:46,040 --> 00:25:48,700 So there's some probability that we classify correctly, 596 00:25:48,700 --> 00:25:51,470 usually a higher probability, and then some small probability 597 00:25:51,470 --> 00:25:54,900 that an object becomes another type of object. 598 00:25:54,900 --> 00:25:58,160 So how often we misclassify is represented here. 599 00:25:58,160 --> 00:25:59,870 But we could add another term. 600 00:25:59,870 --> 00:26:02,090 Let's say that our classification engine gave 601 00:26:02,090 --> 00:26:04,160 us weighted values for things. 602 00:26:04,160 --> 00:26:07,910 It's common for neural nets and other types of systems 603 00:26:07,910 --> 00:26:09,740 that observe images to kind of give weights 604 00:26:09,740 --> 00:26:11,280 to their classifications. 605 00:26:11,280 --> 00:26:13,510 So you might want to use those weights to represent 606 00:26:13,510 --> 00:26:14,450 our confidence. 
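A minimal sketch of the classification matrix c: a nested table giving the probability of reporting each observed class for each true class. The numbers here are invented for illustration; a real matrix would come from evaluating your classifier.

```python
# Hypothetical confusion matrix c: C[true][observed] is the probability
# that an object whose true class is `true` is reported as `observed`.
# Each row sums to 1; diagonal entries are usually the largest.
C = {
    "tree":    {"tree": 0.9, "mailbox": 0.1},
    "mailbox": {"tree": 0.2, "mailbox": 0.8},
}

def p_class(z_i, y_i):
    """Probability that expected object y_i is observed as class z_i."""
    return C[y_i][z_i]
```

The extra terms mentioned above (classifier score, bearing-dependent accuracy) would each multiply in as additional factors alongside this lookup.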
607 00:26:14,450 --> 00:26:16,870 And there's a probability that that confidence 608 00:26:16,870 --> 00:26:19,130 tells us our classification might just 609 00:26:19,130 --> 00:26:21,210 be wrong from the get-go. 610 00:26:21,210 --> 00:26:22,710 In that case, we want to know what's 611 00:26:22,710 --> 00:26:25,370 the probability of that score given 612 00:26:25,370 --> 00:26:26,690 that type of object. 613 00:26:26,690 --> 00:26:28,700 And then we could have other things. 614 00:26:28,700 --> 00:26:30,740 For instance, let's say that our classifier 615 00:26:30,740 --> 00:26:33,586 was way better at identifying mailboxes from the front. 616 00:26:33,586 --> 00:26:35,210 And if we turn the mailbox to the side, 617 00:26:35,210 --> 00:26:37,021 it gives poor observations. 618 00:26:37,021 --> 00:26:39,020 And it tells us when it thinks that the mailbox is 619 00:26:39,020 --> 00:26:41,210 on the side, so it doesn't really know. 620 00:26:41,210 --> 00:26:43,040 In that case, maybe having the bearing 621 00:26:43,040 --> 00:26:44,930 of the object inside of our synthetic view 622 00:26:44,930 --> 00:26:47,210 allows us to determine another probability 623 00:26:47,210 --> 00:26:49,286 for misclassification. 624 00:26:49,286 --> 00:26:51,590 The important thing to notice here 625 00:26:51,590 --> 00:26:54,380 is that we can keep adding more terms to this. 626 00:26:54,380 --> 00:26:56,630 The more specific your classification can get, 627 00:26:56,630 --> 00:26:58,610 the more terms you can introduce into it. 628 00:26:58,610 --> 00:27:01,640 So you add some confusion terms to it, and make the probabilities 629 00:27:01,640 --> 00:27:06,660 of misclassification smaller, and smaller, and smaller. 630 00:27:06,660 --> 00:27:08,300 So with that in mind, we're going 631 00:27:08,300 --> 00:27:10,620 to look at what the probability for these entire sets 632 00:27:10,620 --> 00:27:12,500 is going to look like. 
633 00:27:12,500 --> 00:27:14,000 And what it really is going to be is 634 00:27:14,000 --> 00:27:17,140 just a product over all these classifications 635 00:27:17,140 --> 00:27:20,237 for some selection of phi. 636 00:27:20,237 --> 00:27:22,820 So you're going to have to take all the different permutations 637 00:27:22,820 --> 00:27:25,455 that you could possibly have, and for each of them, 638 00:27:25,455 --> 00:27:27,580 you're going to multiply all of these probabilities 639 00:27:27,580 --> 00:27:30,380 by each other. 640 00:27:30,380 --> 00:27:33,570 In the end, it's going to give you that entire probability. 641 00:27:33,570 --> 00:27:35,740 So it's all the permutations that you have, 642 00:27:35,740 --> 00:27:37,760 and then just the probability of each of those objects being 643 00:27:37,760 --> 00:27:39,410 classified as that type-- that you just 644 00:27:39,410 --> 00:27:42,720 map one set to the other set. 645 00:27:42,720 --> 00:27:44,600 So now let's take a look at another scene. 646 00:27:44,600 --> 00:27:47,720 So we spawned a particle, a fake robot, here. 647 00:27:47,720 --> 00:27:49,510 And it has this field of view. 648 00:27:49,510 --> 00:27:51,790 What does this field of view look like? 649 00:27:51,790 --> 00:27:55,970 If you said it looked like two mailboxes and two trees, 650 00:27:55,970 --> 00:27:56,512 you're right. 651 00:27:56,512 --> 00:27:58,219 And so we're going to assume that we have 652 00:27:58,219 --> 00:27:59,640 the same actual observation. 653 00:27:59,640 --> 00:28:03,250 We still see a mailbox and two trees. 654 00:28:03,250 --> 00:28:05,330 And we're going to remove our old assumption. 655 00:28:05,330 --> 00:28:07,280 We're going to say that it might be possible 656 00:28:07,280 --> 00:28:11,180 that we don't see everything in a synthetic robot's field. 
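The product-over-permutations idea from above can be sketched by brute force, reusing a made-up confusion matrix. This enumeration assumes equal-size sets with no missed or false detections yet, and it is factorial in the set size, so it is only an illustration of the formula, not how you would implement it at scale.

```python
from itertools import permutations

# Invented confusion matrix: C[true][observed].
C = {"tree": {"tree": 0.9, "mailbox": 0.1},
     "mailbox": {"tree": 0.2, "mailbox": 0.8}}

def likelihood(Z, Y):
    """P(Z | Y) under the bag-of-objects model: sum over every
    permutation phi of the synthetic set Y the product of the
    per-element classification probabilities. Assumes len(Z) == len(Y)."""
    total = 0.0
    for phi in permutations(Y):
        p = 1.0
        for z_i, y_i in zip(Z, phi):
            p *= C[y_i][z_i]  # probability y_i is observed as z_i
        total += p
    return total
```

Note that with repeated classes (three trees, say), identical permutations get counted separately here; a careful implementation would account for that when normalizing.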
657 00:28:11,180 --> 00:28:15,100 In which case, how can we map this synthetic observation 658 00:28:15,100 --> 00:28:17,570 to the actual observation? 659 00:28:17,570 --> 00:28:19,130 Well, we just add in a probability 660 00:28:19,130 --> 00:28:21,050 that we miss identifying an object. 661 00:28:21,050 --> 00:28:22,925 If we just didn't see one of these mailboxes, 662 00:28:22,925 --> 00:28:26,240 it looks exactly like this. 663 00:28:26,240 --> 00:28:29,780 So for that, we want to capture what's the probability that we 664 00:28:29,780 --> 00:28:30,560 see nothing. 665 00:28:30,560 --> 00:28:33,060 So say we have a synthetic view with some number of objects. 666 00:28:33,060 --> 00:28:37,590 What's the probability we just miss all of them? 667 00:28:37,590 --> 00:28:40,790 So for the probability that we miss all of them, 668 00:28:40,790 --> 00:28:42,350 we're going to add an assumption here, 669 00:28:42,350 --> 00:28:44,433 which is that we're going to say that there exists 670 00:28:44,433 --> 00:28:47,230 some probability that we see an object for a given 671 00:28:47,230 --> 00:28:51,470 synthetic view here, and that we don't see it with a probability 672 00:28:51,470 --> 00:28:52,612 of one minus that probability. 673 00:28:52,612 --> 00:28:54,320 So essentially, there's just two states-- 674 00:28:54,320 --> 00:28:59,030 either we see the object or we don't see the object. 675 00:28:59,030 --> 00:29:01,640 And we're going to say that the probability of identifying 676 00:29:01,640 --> 00:29:03,230 objects is independent. 677 00:29:03,230 --> 00:29:05,930 So if we see one object, then that 678 00:29:05,930 --> 00:29:08,870 does not change our chance of seeing the next object. 679 00:29:08,870 --> 00:29:11,750 Both of these assumptions help simplify the math here. 680 00:29:11,750 --> 00:29:14,130 In reality, these might be strongly interlinked. 
681 00:29:14,130 --> 00:29:16,620 For instance, if your robot's camera is broken, 682 00:29:16,620 --> 00:29:18,246 the probability that it doesn't see one 683 00:29:18,246 --> 00:29:20,203 object directly correlates with the other ones, 684 00:29:20,203 --> 00:29:21,860 because it won't see any of the objects 685 00:29:21,860 --> 00:29:24,290 successfully if it has no camera anymore. 686 00:29:24,290 --> 00:29:26,030 If your robot drives into a wall, 687 00:29:26,030 --> 00:29:28,196 and can't see anything because it's staring straight 688 00:29:28,196 --> 00:29:30,380 at the wall, the same kind of idea holds. 689 00:29:30,380 --> 00:29:31,800 But for the purposes of making it 690 00:29:31,800 --> 00:29:33,350 so that you don't have a lot of covariances, 691 00:29:33,350 --> 00:29:35,420 and a really, really big conditional statement 692 00:29:35,420 --> 00:29:37,970 of a lot of probabilities of seeing things, 693 00:29:37,970 --> 00:29:42,250 we make these assumptions so that our items are independent. 694 00:29:42,250 --> 00:29:44,450 Independence means we can multiply our probabilities 695 00:29:44,450 --> 00:29:45,780 together. 696 00:29:45,780 --> 00:29:47,990 So if we just don't want to see any of the objects, 697 00:29:47,990 --> 00:29:49,495 we just take 1 minus the probability 698 00:29:49,495 --> 00:29:52,080 of seeing the object for all the objects in the scene. 699 00:29:56,330 --> 00:30:00,476 So what goes hand-in-hand with not seeing anything? 700 00:30:00,476 --> 00:30:02,630 Think about if you had a robot far off 701 00:30:02,630 --> 00:30:03,965 in the distance over here. 702 00:30:03,965 --> 00:30:05,480 What can it see? 703 00:30:05,480 --> 00:30:07,100 Just two trees. 704 00:30:07,100 --> 00:30:10,370 So now, we're going to remove the idea that we can't 705 00:30:10,370 --> 00:30:12,420 see things that don't exist. 
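Under the independence assumption just described, the probability of missing every object in a synthetic view is a plain product of the per-object miss probabilities. The detection probabilities here are illustrative inputs.

```python
def p_miss_all(detect_probs):
    """Probability that none of the objects in a synthetic view are seen.
    Assumes each object is detected independently with its own probability,
    so missing all of them is the product of (1 - p_detect) terms."""
    p = 1.0
    for p_d in detect_probs:
        p *= (1.0 - p_d)
    return p
```

If detections were correlated (broken camera, robot facing a wall), this product would understate or overstate the true miss probability, which is exactly the trade-off the independence assumption makes.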
706 00:30:12,420 --> 00:30:15,230 So to make two trees map to two trees and a mailbox, 707 00:30:15,230 --> 00:30:19,120 we just made up a mailbox out of some noise. 708 00:30:19,120 --> 00:30:21,720 So what's the probability that we see a full scene 709 00:30:21,720 --> 00:30:23,444 when we can't see anything? 710 00:30:23,444 --> 00:30:25,860 And if you're thinking it's going to map to a very similar 711 00:30:25,860 --> 00:30:27,765 formula, you're kind of right. 712 00:30:27,765 --> 00:30:30,670 But we need to figure out a way to capture our noise statement. 713 00:30:30,670 --> 00:30:32,840 Specifically, we are going to model noise 714 00:30:32,840 --> 00:30:35,440 as a Poisson variable, which means that there's always 715 00:30:35,440 --> 00:30:39,610 some probability of seeing an object out of nothing. 716 00:30:39,610 --> 00:30:43,030 And it's going to be generated according to this factor Kz, 717 00:30:43,030 --> 00:30:44,680 which we'll get to on the next slide. 718 00:30:44,680 --> 00:30:47,260 You could choose a lot of different things 719 00:30:47,260 --> 00:30:49,680 to represent your noise in this case. 720 00:30:49,680 --> 00:30:52,240 A Poisson variable was just the choice of the specific method 721 00:30:52,240 --> 00:30:54,809 that we used and implemented from another paper. 722 00:30:54,809 --> 00:30:56,350 But with testing, you could find that 723 00:30:56,350 --> 00:30:58,170 different distributions 724 00:30:58,170 --> 00:31:02,170 fit the noise of your particular sensor better. 725 00:31:02,170 --> 00:31:05,239 So with a Poisson variable, what we're going to have is 726 00:31:05,239 --> 00:31:07,030 we're just going to have the product of all 727 00:31:07,030 --> 00:31:11,850 of our Poisson variables times this Kz factor for the given 728 00:31:11,850 --> 00:31:13,354 scene that we want to map to. 
729 00:31:13,354 --> 00:31:14,770 Essentially, it's just the product 730 00:31:14,770 --> 00:31:16,600 of all these independent Poisson variables 731 00:31:16,600 --> 00:31:21,650 for each of our different objects that we want to create. 732 00:31:21,650 --> 00:31:25,120 So with that in mind, what's this Kz factor that we're 733 00:31:25,120 --> 00:31:27,205 multiplying everything by? 734 00:31:27,205 --> 00:31:28,580 What it's actually going to be is 735 00:31:28,580 --> 00:31:31,799 a set of uniform distributions. 736 00:31:31,799 --> 00:31:33,340 These uniform distributions are going 737 00:31:33,340 --> 00:31:36,100 to be uniform over all the classifications we could get 738 00:31:36,100 --> 00:31:39,415 for an object we spawned from the noise, and all 739 00:31:39,415 --> 00:31:43,120 the possible scores, and all the possible bearings 740 00:31:43,120 --> 00:31:45,390 for this synthetic object. 741 00:31:45,390 --> 00:31:48,400 And if you remember a few slides ago or you rewind the video, 742 00:31:48,400 --> 00:31:51,790 you'll notice that these map directly to the categories 743 00:31:51,790 --> 00:31:53,950 that we put into that classification engine. 744 00:31:53,950 --> 00:31:54,940 In fact, they should. 745 00:31:54,940 --> 00:31:57,340 So if you added more things or took them out, 746 00:31:57,340 --> 00:31:59,740 you'd change this uniform distribution. 747 00:31:59,740 --> 00:32:01,810 The idea is that when you synthetically 748 00:32:01,810 --> 00:32:03,280 create an object out of noise, you 749 00:32:03,280 --> 00:32:06,310 might get any of those types of objects. 750 00:32:06,310 --> 00:32:09,990 But if you needed to create an object synthetically out of 751 00:32:09,990 --> 00:32:12,100 the noise, you might get a tree 752 00:32:12,100 --> 00:32:13,650 or you might get a mailbox. 753 00:32:13,650 --> 00:32:15,275 Either one of them might show up. 
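A sketch of the false-detection term under these assumptions: the count of hallucinated objects is Poisson-distributed, and each hallucinated object draws its class uniformly, which is the class component of the Kz factor (a full Kz would also be uniform over scores and bearings). The rate and class count here are invented values.

```python
import math

def p_false_detections(n_false, lam, n_classes):
    """Probability of hallucinating exactly n_false objects out of noise:
    Poisson(n_false; lam) for the count, times a uniform 1/n_classes
    choice of class for each spurious detection (the Kz factor,
    reduced here to just its class component)."""
    poisson = math.exp(-lam) * lam ** n_false / math.factorial(n_false)
    return poisson * (1.0 / n_classes) ** n_false
```

Adding or removing classes from the classification engine changes `n_classes` and therefore this uniform factor, which matches the point made above.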
754 00:32:15,275 --> 00:32:17,560 If you had a different or more intelligent 755 00:32:17,560 --> 00:32:20,520 distribution for how you might misclassify things-- 756 00:32:20,520 --> 00:32:23,200 for instance, let's say your noise, whenever it shows up, 757 00:32:23,200 --> 00:32:25,030 always looks like trees. 758 00:32:25,030 --> 00:32:27,560 Whenever you accidentally see noise as an object, 759 00:32:27,560 --> 00:32:28,672 it's always a tree. 760 00:32:28,672 --> 00:32:31,255 Then you might only want to have a probability of seeing trees 761 00:32:31,255 --> 00:32:33,977 over here, and it would change your distribution. 762 00:32:33,977 --> 00:32:35,560 But for simplicity's sake, we're going 763 00:32:35,560 --> 00:32:37,720 to assume that it's equally probable 764 00:32:37,720 --> 00:32:42,230 that any type of object shows up out of noise. 765 00:32:42,230 --> 00:32:44,600 So now, we're going to put it all together, since we 766 00:32:44,600 --> 00:32:47,200 relaxed our assumptions. 767 00:32:47,200 --> 00:32:49,090 So we have a lot of different things 768 00:32:49,090 --> 00:32:52,090 that can potentially map to this actual scene. 769 00:32:52,090 --> 00:32:55,360 We have scenes that lack objects. 770 00:32:55,360 --> 00:32:58,920 We have scenes that lack objects and therefore 771 00:32:58,920 --> 00:33:01,100 need to add the object in. 772 00:33:01,100 --> 00:33:04,030 And when they lack objects and have to add the object in, 773 00:33:04,030 --> 00:33:06,100 there's a chance they add in a different object. 774 00:33:06,100 --> 00:33:08,510 And then maybe they just misclassify something, 775 00:33:08,510 --> 00:33:10,830 like our misclassification idea. 776 00:33:10,830 --> 00:33:13,100 But wait-- I guess we can make this more complicated. 777 00:33:13,100 --> 00:33:15,370 What if one of them just wasn't seen, 778 00:33:15,370 --> 00:33:17,470 so we'd take one of these out? 779 00:33:17,470 --> 00:33:20,760 And then we just see two things from noise. 
780 00:33:20,760 --> 00:33:23,662 As you can see, these keep getting more complicated. 781 00:33:23,662 --> 00:33:26,120 But if you think about it, with our previous probabilities, 782 00:33:26,120 --> 00:33:27,940 every single one of these is going 783 00:33:27,940 --> 00:33:30,440 to add a lot of probability terms 784 00:33:30,440 --> 00:33:32,590 to be multiplied by each other. 785 00:33:32,590 --> 00:33:35,720 You can keep making this as complicated as you want. 786 00:33:35,720 --> 00:33:37,570 If we removed our first assumption, 787 00:33:37,570 --> 00:33:39,540 we'd be adding in probabilities of two objects 788 00:33:39,540 --> 00:33:41,195 becoming one object. 789 00:33:41,195 --> 00:33:43,510 The idea is like higher-order terms when 790 00:33:43,510 --> 00:33:45,310 you're doing approximations. 791 00:33:45,310 --> 00:33:48,100 The goal is to make sure that any particle you spawn 792 00:33:48,100 --> 00:33:51,160 could potentially be what you've actually seen. 793 00:33:51,160 --> 00:33:54,700 But we want the ones that are very low probability to really 794 00:33:54,700 --> 00:33:56,649 be very low probability. 795 00:33:56,649 --> 00:33:58,690 So we make sure that these incredibly complicated 796 00:33:58,690 --> 00:34:02,070 transformations are going to be so insignificant that they'll 797 00:34:02,070 --> 00:34:03,880 usually come out to almost 0. 798 00:34:06,400 --> 00:34:09,690 So if we wanted to solve this, we 799 00:34:09,690 --> 00:34:12,100 need to start by using a little assumption, which 800 00:34:12,100 --> 00:34:14,280 is the fact that we now have to fold in the idea 801 00:34:14,280 --> 00:34:16,020 that we could have some missed detections 802 00:34:16,020 --> 00:34:17,690 and some false detections. 803 00:34:17,690 --> 00:34:19,570 So false detections would increase the number 804 00:34:19,570 --> 00:34:21,570 of objects we've seen over the number of objects 805 00:34:21,570 --> 00:34:23,537 that are actually present. 
806 00:34:23,537 --> 00:34:25,870 And missed detections would reduce the number of objects 807 00:34:25,870 --> 00:34:26,956 that are actually present. 808 00:34:26,956 --> 00:34:28,330 Notice that this always has to be 809 00:34:28,330 --> 00:34:30,429 the same size as the number of expected objects, 810 00:34:30,429 --> 00:34:33,370 because you're trying to map whatever your synthetic scene 811 00:34:33,370 --> 00:34:37,060 is to the actual thing that you observe. 812 00:34:37,060 --> 00:34:40,060 When you do this, it really turns into a large number 813 00:34:40,060 --> 00:34:41,320 of multiplications. 814 00:34:41,320 --> 00:34:43,270 So this should look familiar. 815 00:34:43,270 --> 00:34:45,940 This is our term for successfully classifying 816 00:34:45,940 --> 00:34:47,514 all of our objects, and we're going 817 00:34:47,514 --> 00:34:48,889 to multiply it by the probability 818 00:34:48,889 --> 00:34:52,544 that we actually identify it, since we could now miss things. 819 00:34:52,544 --> 00:34:54,699 Then, we have to multiply that by the probability 820 00:34:54,699 --> 00:34:56,770 that for whatever objects we didn't see, 821 00:34:56,770 --> 00:34:59,630 we actually missed them. 822 00:34:59,630 --> 00:35:02,234 And then finally, we have to add in the noise, 823 00:35:02,234 --> 00:35:04,150 because the noise is going to make the objects 824 00:35:04,150 --> 00:35:06,700 that we're missing show up, so that we 825 00:35:06,700 --> 00:35:09,370 have a classification of the same size as the thing 826 00:35:09,370 --> 00:35:10,870 that we want to see. 827 00:35:10,870 --> 00:35:14,470 Now, one thing you should note is that we added in phi way 828 00:35:14,470 --> 00:35:16,110 down here at Kz. 829 00:35:16,110 --> 00:35:19,720 That's because we have to map these false detections that 830 00:35:19,720 --> 00:35:21,900 added to the number of objects that we could see, 831 00:35:21,900 --> 00:35:23,850 as well as the actual detections. 
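Putting the pieces together, here is one illustrative way to combine the classification, detection, miss, and noise terms by brute-force enumeration over which expected objects were detected and which observations are clutter. The constants and the exact factorization are assumptions made for this sketch, not the precise formulation of the paper the lecture follows, and the enumeration is only feasible for tiny scenes.

```python
import math
from itertools import permutations, combinations

# Invented model parameters.
C = {"tree": {"tree": 0.9, "mailbox": 0.1},
     "mailbox": {"tree": 0.2, "mailbox": 0.8}}  # confusion matrix
P_DETECT = 0.95   # per-object detection probability (assumed)
LAM = 0.1         # Poisson rate of false detections (assumed)
N_CLASSES = 2     # uniform class choice for clutter (the Kz idea)

def poisson(n, lam):
    return math.exp(-lam) * lam ** n / math.factorial(n)

def full_likelihood(Z, Y):
    """P(Z | Y) with missed and false detections: for every way to pick
    which expected objects were detected and which observations are
    clutter, multiply the classification, detection, miss, and noise
    terms, then sum over all those assignments (the phi mapping)."""
    total = 0.0
    for k in range(0, min(len(Z), len(Y)) + 1):
        n_false = len(Z) - k   # observations explained by noise
        n_miss = len(Y) - k    # expected objects we failed to see
        for det in combinations(range(len(Y)), k):      # detected objects
            for obs in combinations(range(len(Z)), k):  # matched observations
                for phi in permutations(obs):           # the mapping phi
                    p = P_DETECT ** k * (1 - P_DETECT) ** n_miss
                    p *= poisson(n_false, LAM) * (1 / N_CLASSES) ** n_false
                    for y_i, z_i in zip(det, phi):
                        p *= C[Y[y_i]][Z[z_i]]
                    total += p
    return total
```

The point of the structure, as in the lecture, is that wildly implausible explanations (everything missed, everything hallucinated) pick up so many small factors that they contribute almost nothing.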
832 00:35:23,850 --> 00:35:26,460 Essentially, we have to take all the objects that 833 00:35:26,460 --> 00:35:28,440 are seen here, real or not, and map them 834 00:35:28,440 --> 00:35:30,245 to all the objects we expect to see, 835 00:35:30,245 --> 00:35:31,870 because they're going to be one to one. 836 00:35:35,340 --> 00:35:38,210 So let's take a look at a little video that shows what 837 00:35:38,210 --> 00:35:40,410 semantic localization can do. 838 00:35:40,410 --> 00:35:43,320 So in this video, for a little heads up, 839 00:35:43,320 --> 00:35:45,750 we have essentially a scene where 840 00:35:45,750 --> 00:35:48,270 they have mapped out two things inside of a suburban area. 841 00:35:48,270 --> 00:35:50,910 They've mapped out cars and windows 842 00:35:50,910 --> 00:35:53,372 on the path of the robot as it drives around an area. 843 00:35:53,372 --> 00:35:55,455 And they're attempting to use just the information 844 00:35:55,455 --> 00:35:57,944 of the cars and windows visible in the scene in order 845 00:35:57,944 --> 00:35:59,860 to localize where they believe the robot to be 846 00:35:59,860 --> 00:36:01,018 during its journey. 847 00:36:03,494 --> 00:36:04,160 [VIDEO PLAYBACK] 848 00:36:04,160 --> 00:36:06,750 - This is a video extension to the paper-- 849 00:36:06,750 --> 00:36:08,496 DAVID STINGLEY: So I'm going to mute that. 850 00:36:08,496 --> 00:36:10,170 It's on. 851 00:36:10,170 --> 00:36:15,230 So this is a few steps into the process. 852 00:36:15,230 --> 00:36:17,180 And then it resets. 853 00:36:17,180 --> 00:36:20,080 It spawns a number of points for potential locations. 854 00:36:20,080 --> 00:36:22,330 And then it very quickly localizes and finds 855 00:36:22,330 --> 00:36:24,350 its particular rotation. 856 00:36:24,350 --> 00:36:25,820 These are all the identifications 857 00:36:25,820 --> 00:36:27,070 that are happening inside of the scene. 
858 00:36:27,070 --> 00:36:29,420 You can see the boundary boxes for cars and windows 859 00:36:29,420 --> 00:36:32,220 showing up, the cars in red, and the windows in green. 860 00:36:32,220 --> 00:36:35,300 It expands out its distribution of particles. 861 00:36:35,300 --> 00:36:36,950 And then shortly thereafter, it gets 862 00:36:36,950 --> 00:36:39,050 a couple of approximations. 863 00:36:39,050 --> 00:36:41,420 And then it settles in on a location. 864 00:36:41,420 --> 00:36:44,150 It happens to use kind of like a centralized weight for where 865 00:36:44,150 --> 00:36:46,810 it is for the set of particles, and then it draws the car 866 00:36:46,810 --> 00:36:48,170 as being in that location. 867 00:36:48,170 --> 00:36:50,224 If you let it keep running, it occasionally 868 00:36:50,224 --> 00:36:51,890 expands back out to make sure it doesn't 869 00:36:51,890 --> 00:36:54,530 fall into a local minimum, and then compresses once again 870 00:36:54,530 --> 00:36:56,690 to where its strongest belief is. 871 00:36:56,690 --> 00:36:59,270 Notice that it has a couple of seconds 872 00:36:59,270 --> 00:37:02,060 of having a very, very spread distribution, and very quickly 873 00:37:02,060 --> 00:37:03,980 converges onto a single point. 874 00:37:03,980 --> 00:37:06,200 That's because a lot of these locations 875 00:37:06,200 --> 00:37:07,830 become very, very low probability 876 00:37:07,830 --> 00:37:10,070 after you've seen a couple of scenes into the future. 877 00:37:13,255 --> 00:37:14,997 And we're going to pause this. 878 00:37:14,997 --> 00:37:15,580 [END PLAYBACK] 879 00:37:15,580 --> 00:37:18,100 I welcome you to go see the video 880 00:37:18,100 --> 00:37:19,640 yourself if you want to. 881 00:37:19,640 --> 00:37:20,720 You can check the title. 882 00:37:20,720 --> 00:37:22,220 It's pretty easy to find on YouTube, 883 00:37:22,220 --> 00:37:25,093 because it's pretty much the only one. 
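The expand-and-recompress behavior described in the video is characteristic of particle filter resampling. As a hedged sketch, with names, dimensions, and numbers chosen purely for illustration and not taken from the paper: each step draws survivors in proportion to their weights, then re-injects a few random particles so the belief can expand back out of a wrong consensus.

```python
import random

def resample(particles, weights, n_random=5, world_size=100.0):
    """One resampling step of a toy 1-D particle filter: draw survivors
    in proportion to their likelihood weights, then re-inject a few
    uniformly random particles so the filter can escape a local minimum
    (the occasional 're-expansion' seen in the video)."""
    survivors = random.choices(particles, weights=weights,
                               k=len(particles) - n_random)
    fresh = [random.uniform(0.0, world_size) for _ in range(n_random)]
    return survivors + fresh

def estimate(particles, weights):
    """Weighted mean of the particle set -- the 'centralized weight'
    used to draw the robot's believed position."""
    return sum(p * w for p, w in zip(particles, weights)) / sum(weights)
```

In a full filter, each particle's weight would come from a scene-likelihood term like the one discussed earlier, evaluated against the detections visible from that particle's pose.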
884 00:37:25,093 --> 00:37:27,710 So hopefully, it works now. 885 00:37:27,710 --> 00:37:29,720 I'll step back over here to the other side. 886 00:37:29,720 --> 00:37:31,480 So why would semantic localization 887 00:37:31,480 --> 00:37:33,210 be useful in this manner? 888 00:37:33,210 --> 00:37:35,440 In the example that was shown, it was 889 00:37:35,440 --> 00:37:37,090 done on post-processed data. 890 00:37:37,090 --> 00:37:38,020 They took a scene. 891 00:37:38,020 --> 00:37:39,540 They drove around it already. 892 00:37:39,540 --> 00:37:41,740 Then they did the identification on the video feed, 893 00:37:41,740 --> 00:37:45,210 and used that to do a localization. 894 00:37:45,210 --> 00:37:46,540 As we talked about before, 895 00:37:46,540 --> 00:37:49,270 people use sparse information to do a lot 896 00:37:49,270 --> 00:37:51,527 of their localization. 897 00:37:51,527 --> 00:37:53,860 People don't walk into a room and produce an exact laser 898 00:37:53,860 --> 00:37:56,840 scanned map of the entire area themselves. 899 00:37:56,840 --> 00:37:59,220 But they can store important information, 900 00:37:59,220 --> 00:38:01,750 like seeing objects and tables that they 901 00:38:01,750 --> 00:38:05,960 find with their shins, and use those to move around. 902 00:38:05,960 --> 00:38:09,089 Robots store that perfect map, if they can manage to make it. 903 00:38:09,089 --> 00:38:11,380 But they don't really have a good understanding of what 904 00:38:11,380 --> 00:38:13,990 that map might mean to a human. 905 00:38:13,990 --> 00:38:16,860 So that means that we're actually better 906 00:38:16,860 --> 00:38:18,970 at doing tasks within the environment. 
907 00:38:18,970 --> 00:38:20,929 If we wanted to go directly to a location, 908 00:38:20,929 --> 00:38:22,720 we don't have to look around and figure out 909 00:38:22,720 --> 00:38:25,230 our distance from the wall 910 00:38:25,230 --> 00:38:27,832 before we can then localize and move ourselves 911 00:38:27,832 --> 00:38:28,790 back over to the table. 912 00:38:28,790 --> 00:38:30,190 We just say, oh, I want to go to the table. 913 00:38:30,190 --> 00:38:31,190 And you look over there. 914 00:38:31,190 --> 00:38:32,200 Oh, there's a table. 915 00:38:32,200 --> 00:38:34,000 Table, got it. 916 00:38:34,000 --> 00:38:36,610 So how can we make robots think a little more like humans, 917 00:38:36,610 --> 00:38:39,620 so it's easier for us to give them instructions? 918 00:38:39,620 --> 00:38:42,010 If we can make the robot use a map that 919 00:38:42,010 --> 00:38:43,960 has just scenic objects, like we use 920 00:38:43,960 --> 00:38:45,960 a map that just has scenic objects, 921 00:38:45,960 --> 00:38:47,990 then moving between scenic objects 922 00:38:47,990 --> 00:38:50,590 is as simple as turning, finding the object, and saying, 923 00:38:50,590 --> 00:38:52,570 oh, well, I know where the object is. 924 00:38:52,570 --> 00:38:54,210 So I know roughly where I need to be. 925 00:38:54,210 --> 00:38:57,200 And then we start moving toward it. 926 00:38:57,200 --> 00:38:59,756 In conclusion, semantic localization has a lot of-- 927 00:38:59,756 --> 00:39:01,506 I'm going to leave the references slide up 928 00:39:01,506 --> 00:39:03,589 while I talk, so that if you wanted to go and take 929 00:39:03,589 --> 00:39:05,960 a chance to see any of these papers, you definitely can. 930 00:39:05,960 --> 00:39:07,870 Most of the work in this presentation comes directly 931 00:39:07,870 --> 00:39:09,070 from these sources. 
932 00:39:09,070 --> 00:39:12,070 Semantic localization offers us the opportunity 933 00:39:12,070 --> 00:39:16,820 to have robots use sparser information to 934 00:39:16,820 --> 00:39:19,540 localize, to find themselves inside of spaces. 935 00:39:19,540 --> 00:39:22,270 It also gives us a chance to have really tweakable factors 936 00:39:22,270 --> 00:39:23,800 for how you might understand where 937 00:39:23,800 --> 00:39:25,220 you are in a space intuitively. 938 00:39:25,220 --> 00:39:26,970 If you think certain things are important, 939 00:39:26,970 --> 00:39:29,000 you can add them in as probabilistic factors. 940 00:39:29,000 --> 00:39:32,970 If you don't, you can remove them just as easily. 941 00:39:32,970 --> 00:39:33,552 Thank you. 942 00:39:33,552 --> 00:39:36,010 And this is essentially the conclusion of our presentation. 943 00:39:36,010 --> 00:39:38,920 If you wanted to use something like this, 944 00:39:38,920 --> 00:39:43,815 I definitely recommend taking a look at this particular paper here. 945 00:39:43,815 --> 00:39:46,969 Notice how it says "via the matrix permanent"? 946 00:39:46,969 --> 00:39:48,760 As I said before, a lot of these operations 947 00:39:48,760 --> 00:39:51,120 are a series of multiplications and some permutations. 948 00:39:51,120 --> 00:39:52,940 There are permutation matrices. 949 00:39:52,940 --> 00:39:54,820 There's matrix multiplication. 950 00:39:54,820 --> 00:39:57,280 And there's a lot of ways to make matrix multiplication 951 00:39:57,280 --> 00:39:58,480 faster. 952 00:39:58,480 --> 00:40:00,889 This paper details how you can take the math that 953 00:40:00,889 --> 00:40:02,680 was shown before that's pretty inefficient, 954 00:40:02,680 --> 00:40:04,304 and turn it into something that you can 955 00:40:04,304 --> 00:40:07,270 do an estimate of very quickly. 956 00:40:07,270 --> 00:40:10,920 [APPLAUSE]
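For reference, the matrix permanent in that paper's title is the determinant's formula with the alternating signs removed, so it sums a product over every one-to-one assignment, which is exactly the sum over data associations from the likelihood derivation. A brute-force version, for illustration only (the paper's contribution is evaluating or approximating this sum much faster than this), might look like:

```python
from itertools import permutations
from math import prod

def permanent(M):
    """Permanent of a square matrix: sum over all permutations of the
    product of one entry per row and column, with no sign flips.  Each
    permutation corresponds to one way of matching detections to
    expected objects.  O(n!) here; Ryser's formula reduces the cost
    to O(2^n * n)."""
    n = len(M)
    return sum(prod(M[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))
```

With entry `M[i][j]` set to the probability that expected object `i` produced detection `j`, the permanent gives the total likelihood over all associations in one number.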