The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

GUEST SPEAKER: Hi, everybody. Today we're going to talk about semantic localization.

First I'm going to talk about what semantic localization is and what the motivation for it is. Then we'll go through an algorithm that will allow us to localize. And then we'll go into how to actually add semantic information into this algorithm.

Our focus in coming up with this was the orienteering Grand Challenge. In orienteering, basically you have a map and you have a compass, and you have to go around to various checkpoints. As you can imagine, this is sort of difficult, because you don't know where you are on this map except for what you can tell from your compass. And we weren't sure how to do this either, so we asked some orienteering experts.
So basically the key ideas are: if you are disoriented, that's fine. You just need to find a reference point so that you can figure out where you are. You can think about where you last knew where you were, and estimate what your movement has been since then. That can give you some options as to where you are. And you can look around for features that you can identify uniquely on the map.

And this is an example of an orienteering map. If you just have what's over here, it's not very useful. You can't actually tell what those things are. You might guess that green is grass, but you don't actually know what they are. This map isn't useful until you actually have a legend to go along with it, so that you can say you have roads, or footpaths, or pits, or different things on the map that you can actually identify.

So this is the difference between how robots typically localize and how humans localize.
So humans think about places in terms of features, in terms of what you can do with them, things like that, whereas robots might measure distances with laser scanners to form a precise map of the entire space. Humans, instead, might think about abstractions such as rooms and their locations relative to each other.

So we can define semantic information as signs and symbols that contain meaningful concepts for humans: basically, different sorts of abstractions. And why might we want this semantic information? Well, one very important application would be improving human-robot interaction. Say you want a robot that can make coffee. It has to understand commands like: go to the kitchen, get a mug, turn on the coffee maker. This is the language that humans think in, and it's very useful if a robot can actually understand this sort of language as well: know what a coffee maker is, what it does, and where it is.

And additionally, you can get some performance and memory benefits. You're not storing a full map with the distance to every single surface; you're just storing the locations of key information.
And this means that your search space is a lot smaller too. So you can prune everything that's not related to kitchens or coffee makers, and gain performance that way. Finally, the fact that you can simply use a camera instead of a complicated laser scanner means that this is a lot cheaper and more accessible for robots to have.

So we've talked about what semantic information is. Now let's define what semantic localization is. Basically, it's localizing based on semantic information rather than metric information, like distances.

For our Grand Challenge, we have a map with labeled objects and their coordinates. This is our semantic information. And we want to basically look around and ask: what can we see in our field of view? And from that, we want to figure out where we could be on this map. It's important to note that map building is actually a very different and hard problem with a lot of other research on it. In our case, we've been given a map, and we're simply talking about the problem of localizing based on that map.
And now I'm going to let Matt tell you about particle filters, which are an algorithm for localization.

MATT: All right, so my name is Matt Deyo. Now we're going to talk about particle filters. This is a tool that we're going to teach you to use, specifically for state estimation, using a given map and measurements.

So what is localization? We just went over it. Specifically, it's the question of: where am I? For any system to work, the robot needs to know where it is on a map. This is not so simple to answer.

Metric localization, as was said before, is quantitative. How many meters are you from this wall, or from your origin? What's your orientation in degrees? And here are some examples of that. You can convert between coordinate frames really easily if you have metric information.

So localization is, well, in our case, mathematical. The problem statement is: you have a control u. You're going to observe something about your state. It might be a camera image. It might be laser scans.
And then essentially, on your map, you're going to use probability to figure out where you are. So this equation specifically is taking into account the probability of being at a position at your current time, based on your previous position, the observation that you're taking right now, the control you gave it, and then your map, like I said. That's built on the observation noise, the actuation, and then the belief. We're specifically looking at the belief here for the particle filter.

So in our case, we're approximating it by several points, also known as particles. So we're going to look at an example here. I have to actually press the Play button. OK. Here's a quick demo.

[INAUDIBLE]

So, a quick YouTube demo of a particle filter in action. I'm just going to show it first. So, pause. The red dot here is our actual robot position, and it's trying to localize itself in a maze. There are black walls throughout here. And it looks like a grid world, but the initial distribution we had was completely random across all the walls.
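The belief update being described is the standard recursive Bayes filter. The slide's exact notation isn't visible in the transcript, so the textbook form is shown here as a reference, with $m$ the map, $u_t$ the control, $z_t$ the observation, and $\eta$ a normalizer:

```latex
\operatorname{bel}(x_t) \;=\; \eta \,
\underbrace{p(z_t \mid x_t, m)}_{\text{observation model}}
\int
\underbrace{p(x_t \mid x_{t-1}, u_t)}_{\text{actuation model}} \,
\underbrace{\operatorname{bel}(x_{t-1})}_{\text{previous belief}}
\; \mathrm{d}x_{t-1}
```

The particle filter approximates $\operatorname{bel}(x_t)$ with a finite set of weighted samples instead of evaluating this integral exactly.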
So it's taking in observations, probably from laser range finders. It essentially knows that there are walls around it, and it knows that there's not a wall in front of it. So that centers all of these guesses on the centers of hallways. Obviously it got rid of guesses that were close to the walls; there's a low probability that it's right up against a wall, because of the observations it's taking. And we're going to revisit this example at the end of the presentation.

Particle filters. So the method that we're going to teach you has four easy steps, one of which is not repeated. The initial step is your first sample of particles. If you don't know anything about your state at the start, then you can sample a distribution over your entire space. And then the repeated steps are updating weights, resampling, and then propagating dynamics.

So we're going to show you a toy example. This is a really simple example, just to get the idea across. We're focusing on just one dimension. We have an aircraft at constant altitude, so take that variable out.
The only thing we're trying to localize is a distance x over a map. The sensor is just a range finder pointing down at the ground from this constant altitude. And then the map is a known mapping of x location to ground altitude. We're about to see what that means. The goal here is determining where you are over the mountains.

So here we have our plane. As you see, we know the map below, and we know that it has to be flying at this altitude. So with a range finder, you can measure different depths down to the mountains. Here we have long distances, a medium distance, and a short distance if it's directly on top of a mountain. Constant altitude, unknown x: that's what we're solving for. And then we actually have a noisy forward velocity. So we know how fast the plane wants to be traveling in this direction, but obviously, with some noise, it could be travelling a little faster or a little slower.

So the first step: sampling. Our initial belief here is that we don't know where the plane is with respect to these mountains. So we're going to actually sample with a uniform distribution.
Well, this is as uniform as I could get it. So over this range, those are our initial particles: essentially, guesses for what the x location is.

We're going to update these weights based on the observation. So we get our first observation from our range finder, and it's this length. It's a long length. It's, I guess, green in this case. This is our measured value. Here are all the values that we expect to see at these particles. This is pretty easy to calculate based on the map that you have: the position of each particle maps directly to a range-finder measurement.

So we're going to compare them: the likelihood of getting each of these measurements. We measure the likelihood of each of these occurring, and we actually weight the particles. So based on what we measured, these are most likely, so they get a larger weight. And we are most likely not on top of a mountain at this point, so those get smaller weights.

So we're going to resample, given that. We're going to keep the same number of samples the entire time.
That's just how this particle filter works, except we're going to resample across our new distribution. So these are the weights that we just came up with, and we're going to resample over them. So now it's more likely that it's here, or here, or way over there.

And then the final step is propagating. This whole time that we've been filtering and updating our weights, the plane has been moving forward, so we need to take that into account. We have a forward velocity; let's say the range is 0 meters per second, to 5 meters per second, to 10 meters per second, as you can see on this plot. So we're most likely moving 5 meters per second. Essentially, this is applied over your change in time between sensor measurements. Obviously, if you're only getting sensor measurements at 10 hertz, then you can propagate in between each of those.

So here we are, using our newly sampled particles and propagating them forward. For example, this one moving there is a low likelihood: that's a large distance. And these moving a shorter distance is more likely, given our model. So we have those new particles.
The new weights are now based on the probability of transitioning to that point: how likely is it for the plane to move that far, essentially. And then we repeat. So we're repeating the steps again. We're going to get a measurement in from our range finder, compare it to the map at our particles, keep the ones that are most likely and weight them more, and then propagate them.

So as an example, here's time step 2, when we're measuring the uphill of this mountain. Time step 3, now we're halfway up the mountain, so the positions that we've kept are at similar heights, here and here. And then we can slowly get down to differentiating. So now that we're on top of a mountain, the only pattern that matches is here, and maybe over there. And then eventually, we get down to these two. And then eventually, as this mountain drops off, it's a unique position, where the range goes farther than it did at the other candidate. So in the end, our particle filter thinks that we're right around here or here. And finally, there's a pretty high chance that we're over that valley.
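The four steps just walked through (sample, weight, resample, propagate) fit in a few lines of Python. This is a minimal sketch of the 1D aircraft example, not the code behind the slides: the terrain profile, noise levels, and particle count are all made-up assumptions.

```python
import math
import random

random.seed(0)  # fixed seed so the toy run is repeatable

def terrain_height(x):
    # Hypothetical mountain profile: two bumps of different heights over a 100 m map.
    return 30 * math.exp(-((x - 30) ** 2) / 50) + 45 * math.exp(-((x - 70) ** 2) / 80)

ALTITUDE = 100.0               # constant flight altitude
RANGE_NOISE = 2.0              # assumed std dev of the range finder
VEL_MEAN, VEL_STD = 5.0, 1.0   # noisy forward velocity, m/s
N = 500                        # particle count stays constant throughout

def expected_range(x):
    # What the downward range finder should read if the plane were at x.
    return ALTITUDE - terrain_height(x)

def gaussian(err, std):
    # Unnormalized Gaussian likelihood of a measurement error.
    return math.exp(-0.5 * (err / std) ** 2)

# Initial step: sample uniformly, since we start with no idea where we are.
particles = [random.uniform(0, 100) for _ in range(N)]

def filter_step(particles, measured_range, dt=1.0):
    # 1. Update weights: how likely is the measurement at each particle?
    weights = [gaussian(measured_range - expected_range(p), RANGE_NOISE)
               for p in particles]
    # 2. Resample the same number of particles from the weighted set.
    particles = random.choices(particles, weights=weights, k=N)
    # 3. Propagate dynamics: move each particle forward with noisy velocity.
    return [p + random.gauss(VEL_MEAN, VEL_STD) * dt for p in particles]

# Simulate a plane that truly starts at x = 20 and flies forward.
true_x = 20.0
for _ in range(10):
    z = expected_range(true_x) + random.gauss(0, RANGE_NOISE)
    particles = filter_step(particles, z)
    true_x += VEL_MEAN

estimate = sum(particles) / N  # mean of the particle cloud
```

After ten measurements the particle cloud should have collapsed near the plane's true position, because the two bumps have different heights and so the range sequence is unambiguous.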
So we'll look at this again, now that you know how the filter works, and this will make a little more sense. So they started with a uniform distribution; they had no clue where they were at the beginning. And as the robot moves forward, they are resampling, and the measurements are changing, obviously, because it's seeing these two walls, and eventually it sees that there's a doorway to the left. And you can keep going forward as well. So eventually, that geometry doesn't line up with any other spot on the map.

Here, we see it nose into this hallway. I think this top hallway is the only one on this map that's that long. But it still doesn't know which direction it's going. It hasn't been able to differentiate that yet, but it's about to. So the only remaining particles are here and here. It knows it's in the middle of a hallway, and it knows it's moved about two blocks now without seeing a wall directly in front of it, which doesn't occur anywhere else without another turn-off. So it's about to solve itself, because eventually, it sees this wall over here.
And those observations don't match up with what it's actually seeing. So there we go. There's a [INAUDIBLE] particle filter successfully working for a little robot with a range finder.

I went too far. There we go.

DAVID STINGLEY: I'm David Stingley. And now, after we've gotten an idea of why we want to use semantic localization, and how to use particle filters to generate guesses for positions, we're going to talk about how we can use these two together to actually implement semantic localization on a robot.

So hearkening back to the implementation idea, we have three important parts. We have a belief representation. We have the actuation model: once we have a location, how are we going to move? As we said before, if you don't know exactly how fast you're moving, there's a probability that you moved to a bunch of different locations. And then finally, we have an observation model, which is the probability that you see some certain observation, given that you're in this newly simulated position.
If we continuously solve for our most probable x, then that's our location. So the particle that is the most probable is the place where we guess we are.

So let's look at pseudocode for what semantic localization would look like. As long as our robot is moving, we're going to make observations, and those observations are going to be [INAUDIBLE] z1. Then, for those observations, we're going to start off our particle filter and guess a certain number of probable locations. We're going to use our actuation model to update them. And then we're going to simulate observations at each location, and compare what we expect to see for that particle on our map against what we actually observed. This is going to be our update, and this is what we're going to focus on understanding, since a lot of the rest is covered by the particle filter.

So what kind of models can we pick? We need to define what our observation looks like, and you get a lot of choices. In metric localization, you'd use a labeled laser scan. You'd have perfect information about the environment.
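The loop just described can be sketched concretely. Everything below is an illustrative assumption rather than the class's actual implementation: a 1D road with labeled objects, a trivial forward-motion model, and a crude bag-of-objects likelihood that just penalizes each mismatched label count.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical 1D map: labeled objects at known coordinates along a road.
MAP = [("tree", 2.0), ("mailbox", 4.0), ("tree", 9.0), ("mailbox", 12.0)]
FOV = 3.0   # the robot sees objects within FOV meters ahead of it (assumed)
N = 200

def visible(x):
    # Bag of objects expected in view at position x: Y(x) in the lecture's notation.
    return Counter(label for label, pos in MAP if x <= pos <= x + FOV)

def likelihood(z, y):
    # Penalize each label-count mismatch between observed bag z and expected bag y.
    mismatch = sum((z - y).values()) + sum((y - z).values())
    return 0.1 ** mismatch

particles = [random.uniform(0, 15) for _ in range(N)]
true_x, velocity = 1.0, 1.0

for _ in range(8):
    true_x += velocity
    z = visible(true_x)                  # observed bag of objects (simulated here)
    # Actuation update: propagate each particle with a noisy motion model.
    particles = [p + random.gauss(velocity, 0.2) for p in particles]
    # Observation update: compare the simulated view at each particle with z.
    weights = [likelihood(z, visible(p)) for p in particles]
    particles = random.choices(particles, weights=weights, k=N)

estimate = sum(particles) / N
```

Even with such a coarse observation (just counts of trees and mailboxes in view), the sequence of observations along the road is enough to pin the particles down near the true position.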
And so you can see everything. It might be nice to use a scene of objects at specific locations, but that requires, once again, a lot of information. Now you need to know where the objects are; you need to have an idea of their exact locations and orientations with respect to each other. Or maybe it might be nice to just use a bag of objects: I see four things in my view pane versus three. These are all choices, and we're going to take a really quick look at the effects of each.

As we've stated before, there are a lot of choices of observation, and it can get complicated really quickly. So just to impress that upon you, imagine if you used laser scanners when you have three objects here: for instance, a house, a couple of trees, and a mailbox. You check each line for an intersection, and then you have to figure out what counts as your detection, since you're going to have to differentiate what's in your scene. The problem with that might be that, well, you saw an entire wall, for instance. Where do you want the house to be?
So let's say we assume that objects are points. You either completely see an object or you completely don't; that way, you don't have to half-see one. You just check whether whatever you chose as the center of mass intersects with your current view plane. If it does, then you're good. The issue with that is that if, for instance, something's center isn't inside the scene anymore, you just completely ignore it. We have the exact opposite problem.

So you might want to make it a bit more complex and have polygons, parts of objects. Do you see some percentage of something? How much of it is in the view plane? That adds in a lot of chances for error. And that's, I guess, the big point here to remember: depending on how we characterize our observations, we have different ways to get things wrong.

So let's say, for instance, your observation is distance and bearing to some object in the scene. What can go wrong? Well, you have noise inside your sensors. Your sensors might have limitations; you can't necessarily see to infinity.
So an object might be too far away for you to identify correctly, or it might be rotated in a way that you can't interpret correctly. What if you want to group your observations into classes of objects? Instead of everything just being an obstruction, now you have different types of obstructions (trees are different from mailboxes, of course), in which case you can have classification errors. What if you see a tree and you just think it's a really, really big mailbox, or you see a mailbox and you think it's a really small tree with a funny, little, metal top? These kinds of errors can make your scenes look incorrect. And if you decide to have sets of objects, well, what permutations matter? If you don't have a way of differentiating elements in the set, then two trees and a mailbox look the same whether the trees are on the left and the mailbox is on the right, or vice versa.
431 00:19:13,200 --> 00:19:16,040 So with that in mind, we're going 432 00:19:16,040 --> 00:19:17,780 to be talking about how we're going 433 00:19:17,780 --> 00:19:21,500 to solve this question of, given some position 434 00:19:21,500 --> 00:19:23,900 and your synthetic view, how likely 435 00:19:23,900 --> 00:19:25,997 is it that it matches your actual observation? 436 00:19:25,997 --> 00:19:28,580 And for that, we're just going to change what this probability 437 00:19:28,580 --> 00:19:31,730 statement looks like to make it a little more concrete. 438 00:19:31,730 --> 00:19:35,540 In this case, Z is the set of observed objects that you have. 439 00:19:35,540 --> 00:19:38,210 So we're going to say that there are objects in the scene, 440 00:19:38,210 --> 00:19:41,300 and we put them inside of this value, Z. 441 00:19:41,300 --> 00:19:43,940 So for instance, you see a house and you see a mailbox. 442 00:19:43,940 --> 00:19:46,580 That would be Z. Your set of objects is just two things. 443 00:19:46,580 --> 00:19:50,150 We're going to use the bag of objects approximation. 444 00:19:50,150 --> 00:19:52,880 Y of x is going to be the set of objects 445 00:19:52,880 --> 00:19:56,401 you expect to see given your map for the position that you're 446 00:19:56,401 --> 00:19:56,900 at. 447 00:19:56,900 --> 00:19:59,709 So given a position x, Y is the set of objects 448 00:19:59,709 --> 00:20:00,500 that you would see. 449 00:20:00,500 --> 00:20:02,125 So you might be in a position where you 450 00:20:02,125 --> 00:20:03,720 can see a house and a mailbox. 451 00:20:03,720 --> 00:20:06,304 Then Y is a house and a mailbox. 452 00:20:06,304 --> 00:20:07,970 And x is just going to be your position. 453 00:20:07,970 --> 00:20:10,261 That's the element that we're getting from our particle 454 00:20:10,261 --> 00:20:11,880 filter. 
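As a concrete illustration of the bag-of-objects idea, Z and Y(x) can be represented as multisets of class labels. The world layout and the visibility test here are made-up stand-ins, just to show the data shapes.

```python
from collections import Counter

# Z: the actual observation, a bag (multiset) of object classes.
# Order doesn't matter under the bag-of-objects approximation.
Z = Counter({"house": 1, "mailbox": 1})

def expected_objects(x, world, visible):
    """Y(x): the bag of objects the map says we'd see from position x.
    `world` is a list of (class_name, position) pairs from the map;
    `visible` is the field-of-view test for pose x."""
    return Counter(cls for cls, pos in world if visible(x, pos))
```

Each particle in the filter carries a position x, and the map turns that x into a synthetic bag Y(x) to compare against the real Z.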
455 00:20:11,880 --> 00:20:14,480 So in our example, as you can tell 456 00:20:14,480 --> 00:20:16,247 by how much I talk about it, we're 457 00:20:16,247 --> 00:20:18,330 going to use just two things, trees and mailboxes, 458 00:20:18,330 --> 00:20:21,530 because I like them. 459 00:20:21,530 --> 00:20:22,850 So here's my map. 460 00:20:22,850 --> 00:20:25,220 There's going to be a long road, and off to the side, 461 00:20:25,220 --> 00:20:26,418 we have trees and mailboxes. 462 00:20:26,418 --> 00:20:27,876 And let's say that your robot wants 463 00:20:27,876 --> 00:20:29,610 to be a paper delivery boy. 464 00:20:29,610 --> 00:20:32,210 So he needs to be able to figure out where the mailboxes are 465 00:20:32,210 --> 00:20:33,800 and how far down the street he is. 466 00:20:33,800 --> 00:20:35,750 Because if he were to throw the paper wrong, 467 00:20:35,750 --> 00:20:38,940 he'd throw a paper into a tree. 468 00:20:38,940 --> 00:20:41,440 So in this world, our robot is going to be here, represented 469 00:20:41,440 --> 00:20:42,950 by an orange hexagon. 470 00:20:42,950 --> 00:20:45,530 And it has this field of view. 471 00:20:45,530 --> 00:20:48,510 So given this field of view, what does Z look like, 472 00:20:48,510 --> 00:20:53,850 our actual observation? Is it one tree and one mailbox? 473 00:20:53,850 --> 00:20:56,305 Well, for this case, what we're going to say is, 474 00:20:56,305 --> 00:20:58,430 we're assuming that as long as the thing intersects 475 00:20:58,430 --> 00:21:00,980 with our field of view, we're going to see it. 476 00:21:00,980 --> 00:21:02,332 So, simplifying that, 477 00:21:02,332 --> 00:21:03,790 we're just going to say that we see 478 00:21:03,790 --> 00:21:05,660 both trees and this mailbox. 479 00:21:05,660 --> 00:21:07,570 Once again, we talked about how difficult 480 00:21:07,570 --> 00:21:08,870 it is to do this observation. 
481 00:21:08,870 --> 00:21:11,210 So for a simplifying assumption, let's say that we just 482 00:21:11,210 --> 00:21:14,360 completely see the tree. 483 00:21:14,360 --> 00:21:18,320 So that's great for our actual robot position. 484 00:21:18,320 --> 00:21:20,127 But when we start spawning particles, 485 00:21:20,127 --> 00:21:21,710 we need to figure out what we're going 486 00:21:21,710 --> 00:21:23,940 to say they synthetically see. 487 00:21:23,940 --> 00:21:27,325 So say we spawn a particle here, and we spawn one 488 00:21:27,325 --> 00:21:28,960 where we're just slightly off the road. 489 00:21:28,960 --> 00:21:30,001 We deviated a little bit. 490 00:21:30,001 --> 00:21:32,210 We're further forward than the actual position. 491 00:21:32,210 --> 00:21:35,870 We drove into somebody's house, another guy's house 492 00:21:35,870 --> 00:21:37,170 in the woods. 493 00:21:37,170 --> 00:21:40,250 Then what is each of these synthetic observations for the given 494 00:21:40,250 --> 00:21:41,390 points? 495 00:21:41,390 --> 00:21:43,806 Determining this is going to determine how we actually get 496 00:21:43,806 --> 00:21:46,050 that probability calculation. 497 00:21:46,050 --> 00:21:48,410 So what do we need to consider here? 498 00:21:48,410 --> 00:21:51,620 Well, the first thing we need to consider is classification. 499 00:21:51,620 --> 00:21:54,337 Like we said before, with this set of objects approximation, 500 00:21:54,337 --> 00:21:56,420 it's important that you understand if you classify 501 00:21:56,420 --> 00:21:58,880 things correctly or not. 502 00:21:58,880 --> 00:22:02,060 Past that, like we asked before, we said, 503 00:22:02,060 --> 00:22:04,140 wait, why didn't I just see a tree and a mailbox? 504 00:22:04,140 --> 00:22:05,890 Well, we need to know if we saw everything 505 00:22:05,890 --> 00:22:07,430 inside of our field of view. 506 00:22:07,430 --> 00:22:08,900 Maybe we did miss something. 
507 00:22:08,900 --> 00:22:11,960 Maybe instead, in that whole scene, we saw just one tree, 508 00:22:11,960 --> 00:22:15,020 one mailbox, even though the other tree intersected it. 509 00:22:15,020 --> 00:22:18,560 And noise can happen in reverse, so we could accidentally 510 00:22:18,560 --> 00:22:21,650 see a tree when there actually isn't one. 511 00:22:21,650 --> 00:22:24,700 And finally, we could see things overlap. 512 00:22:24,700 --> 00:22:27,220 We kind of ignored this before, but what if two trees were 513 00:22:27,220 --> 00:22:28,480 right on top of each other? 514 00:22:28,480 --> 00:22:30,080 It might make it kind of difficult 515 00:22:30,080 --> 00:22:32,120 for you to successfully see that there are 516 00:22:32,120 --> 00:22:34,910 two trees there instead of one. 517 00:22:34,910 --> 00:22:38,340 So to start with, we're going to strike this last possibility. 518 00:22:38,340 --> 00:22:41,300 It will become more evident later why this is important. 519 00:22:41,300 --> 00:22:43,010 But for now, we're just going to assume 520 00:22:43,010 --> 00:22:46,610 that every observation corresponds 521 00:22:46,610 --> 00:22:49,160 to only one object being seen. 522 00:22:49,160 --> 00:22:50,770 Otherwise, you could end up infinitely 523 00:22:50,770 --> 00:22:52,040 expanding your scene. 524 00:22:52,040 --> 00:22:52,880 Think about it. 525 00:22:52,880 --> 00:22:53,776 You see a tree. 526 00:22:53,776 --> 00:22:56,150 There might be a possibility that that tree is two trees. 527 00:22:56,150 --> 00:22:57,750 Well, you saw two trees, then. 528 00:22:57,750 --> 00:23:00,240 So maybe there's a probability that each of those two trees 529 00:23:00,240 --> 00:23:01,860 is also two trees. 530 00:23:01,860 --> 00:23:03,450 And you keep going and keep going, 531 00:23:03,450 --> 00:23:06,094 until the entire forest is behind one tree. 
532 00:23:06,094 --> 00:23:08,510 It would be kind of bad for the probability calculation, 533 00:23:08,510 --> 00:23:11,009 because you'd eventually have to cut that off at some point, 534 00:23:11,009 --> 00:23:13,329 so that your algorithm actually finishes. 535 00:23:13,329 --> 00:23:14,870 So for our purposes, we're just going 536 00:23:14,870 --> 00:23:17,411 to cut it off at the start and say that everything's just one 537 00:23:17,411 --> 00:23:17,920 object. 538 00:23:17,920 --> 00:23:20,210 It's just error. 539 00:23:20,210 --> 00:23:23,760 So now we need to talk about whether we classify correctly. 540 00:23:23,760 --> 00:23:25,840 If we can solve this equation, what's 541 00:23:25,840 --> 00:23:28,490 the probability that our classification is right? 542 00:23:28,490 --> 00:23:31,430 So for now, we're going to make two simplifying assumptions. 543 00:23:31,430 --> 00:23:34,700 They're going to remove the two problems that we had before. 544 00:23:34,700 --> 00:23:36,590 And don't worry, we'll relax them later. 545 00:23:36,590 --> 00:23:38,240 For our first assumption, we're going 546 00:23:38,240 --> 00:23:40,916 to assume that we see everything inside of our field of view. 547 00:23:40,916 --> 00:23:43,550 So that means we're not going to have any missed detections. 548 00:23:43,550 --> 00:23:45,930 And we never see something that doesn't exist. 549 00:23:45,930 --> 00:23:47,644 So we have no false detections. 550 00:23:47,644 --> 00:23:49,810 Everything that's in the scene, we see successfully. 551 00:23:49,810 --> 00:23:52,640 If it's not in the scene, we don't see it. 552 00:23:52,640 --> 00:23:55,700 So given that, and we spawn a robot here, 553 00:23:55,700 --> 00:24:00,240 and it has this field of view, what does this robot see? 554 00:24:00,240 --> 00:24:01,530 AUDIENCE: Three trees. 555 00:24:01,530 --> 00:24:02,810 DAVID STINGLEY: Yes. 556 00:24:02,810 --> 00:24:04,500 It happens to see three trees. 
557 00:24:04,500 --> 00:24:06,470 So remember our assumptions. 558 00:24:06,470 --> 00:24:10,380 We're going to say here that our actual observation for wherever 559 00:24:10,380 --> 00:24:13,790 our robot is, is one mailbox and two trees. 560 00:24:13,790 --> 00:24:15,980 And we can see that the synthetic robot that we made 561 00:24:15,980 --> 00:24:17,810 saw three trees. 562 00:24:17,810 --> 00:24:20,360 So what are some other forms of Y 563 00:24:20,360 --> 00:24:25,760 that we can make that would also map to this? 564 00:24:25,760 --> 00:24:28,570 What's the way that we can take our Y and transform it 565 00:24:28,570 --> 00:24:30,320 so that it maps to this Z? 566 00:24:30,320 --> 00:24:34,070 What kind of action would we have to take on this? 567 00:24:34,070 --> 00:24:36,320 If you were thinking that we'd have to misclassify one 568 00:24:36,320 --> 00:24:38,780 of these trees, you're correct. 569 00:24:38,780 --> 00:24:41,169 And remember, this is just a set of objects. 570 00:24:41,169 --> 00:24:43,085 So this doesn't have to be the first tree that 571 00:24:43,085 --> 00:24:44,150 got misclassified. 572 00:24:44,150 --> 00:24:45,220 There are three of them. 573 00:24:45,220 --> 00:24:49,500 We could have any permutation of these trees get misclassified. 574 00:24:49,500 --> 00:24:50,500 So it becomes important. 575 00:24:50,500 --> 00:24:52,270 And we're going to introduce this concept 576 00:24:52,270 --> 00:24:54,050 of this operator, phi. 577 00:24:54,050 --> 00:24:55,820 And what phi is going to do is it's 578 00:24:55,820 --> 00:24:59,870 going to be a way to map the permutations that we could have 579 00:24:59,870 --> 00:25:04,520 of misclassifications for Y to look like Z. That way, 580 00:25:04,520 --> 00:25:07,520 we don't have to try and write all the permutations down. 
581 00:25:07,520 --> 00:25:10,040 It's possible to do this essentially with a matrix, 582 00:25:10,040 --> 00:25:12,000 like a permutation matrix, that just reorders 583 00:25:12,000 --> 00:25:13,280 the elements that you have. 584 00:25:15,900 --> 00:25:19,650 So what does this probability look like now? 585 00:25:19,650 --> 00:25:22,200 We're going to use lowercase z and y, with an index i, 586 00:25:22,200 --> 00:25:24,770 to represent each individual element of those sets. 587 00:25:24,770 --> 00:25:29,945 So for some element in the actual observation, z, 588 00:25:29,945 --> 00:25:31,870 what's the probability that some element 589 00:25:31,870 --> 00:25:35,360 in our synthetic observation matches it? 590 00:25:35,360 --> 00:25:37,676 Well, for that, 591 00:25:37,676 --> 00:25:39,050 there's some probability of 592 00:25:39,050 --> 00:25:42,480 misclassifying, or of classifying correctly. 593 00:25:42,480 --> 00:25:45,530 So we're going to use c to represent a classification 594 00:25:45,530 --> 00:25:46,040 matrix. 595 00:25:46,040 --> 00:25:48,700 So there's some probability that we classify correctly, 596 00:25:48,700 --> 00:25:51,470 usually a higher probability, and then some small probability 597 00:25:51,470 --> 00:25:54,900 that an object becomes another type of object. 598 00:25:54,900 --> 00:25:58,160 So how often we misclassify is represented here. 599 00:25:58,160 --> 00:25:59,870 But we could add another term. 600 00:25:59,870 --> 00:26:02,090 Let's say that our classification engine gave 601 00:26:02,090 --> 00:26:04,160 us weighted values for things. 602 00:26:04,160 --> 00:26:07,910 It's common for neural nets and other types of systems 603 00:26:07,910 --> 00:26:09,740 that observe images to kind of give weights 604 00:26:09,740 --> 00:26:11,280 to their classifications. 605 00:26:11,280 --> 00:26:13,510 So you might want to use those weights to represent 606 00:26:13,510 --> 00:26:14,450 our confidence. 
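A minimal sketch of the classification matrix c: a nested table giving the probability of reporting each observed class for each true class. The numbers here are invented for illustration; a real matrix would come from evaluating your classifier.

```python
# Hypothetical confusion matrix c: C[true][observed] is the probability
# that an object whose true class is `true` is reported as `observed`.
# Each row sums to 1; diagonal entries are usually the largest.
C = {
    "tree":    {"tree": 0.9, "mailbox": 0.1},
    "mailbox": {"tree": 0.2, "mailbox": 0.8},
}

def p_class(z_i, y_i):
    """Probability that expected object y_i is observed as class z_i."""
    return C[y_i][z_i]
```

The extra terms mentioned above (classifier score, bearing-dependent accuracy) would each multiply in as additional factors alongside this lookup.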
607 00:26:14,450 --> 00:26:16,870 And there's a probability that that confidence 608 00:26:16,870 --> 00:26:19,130 tells us our classification might just 609 00:26:19,130 --> 00:26:21,210 be wrong from the get-go. 610 00:26:21,210 --> 00:26:22,710 In that case, we want to know what's 611 00:26:22,710 --> 00:26:25,370 the probability of that score given 612 00:26:25,370 --> 00:26:26,690 that type of object. 613 00:26:26,690 --> 00:26:28,700 And then we could have other things. 614 00:26:28,700 --> 00:26:30,740 For instance, let's say that our classifier 615 00:26:30,740 --> 00:26:33,586 was way better at identifying mailboxes from the front. 616 00:26:33,586 --> 00:26:35,210 And if we turn the mailbox to the side, 617 00:26:35,210 --> 00:26:37,021 it gives poor observations. 618 00:26:37,021 --> 00:26:39,020 And it tells us when it thinks that the mailbox is 619 00:26:39,020 --> 00:26:41,210 on the side, so it doesn't really know. 620 00:26:41,210 --> 00:26:43,040 In that case, maybe having the bearing 621 00:26:43,040 --> 00:26:44,930 of the object inside of our synthetic view 622 00:26:44,930 --> 00:26:47,210 allows us to determine another probability 623 00:26:47,210 --> 00:26:49,286 for misclassification. 624 00:26:49,286 --> 00:26:51,590 The important thing to notice here 625 00:26:51,590 --> 00:26:54,380 is that we can keep adding more terms to this. 626 00:26:54,380 --> 00:26:56,630 The more specific your classification can get, 627 00:26:56,630 --> 00:26:58,610 the more terms you can introduce into it. 628 00:26:58,610 --> 00:27:01,640 So you add some confusion terms to it, and make the probabilities 629 00:27:01,640 --> 00:27:06,660 of misclassification smaller, and smaller, and smaller. 630 00:27:06,660 --> 00:27:08,300 So with that in mind, we're going 631 00:27:08,300 --> 00:27:10,620 to look at what the probability for these entire sets 632 00:27:10,620 --> 00:27:12,500 is going to look like. 
633 00:27:12,500 --> 00:27:14,000 And what it really is going to be is 634 00:27:14,000 --> 00:27:17,140 just a product over all these classifications 635 00:27:17,140 --> 00:27:20,237 for some selection of phi. 636 00:27:20,237 --> 00:27:22,820 So you're going to have to take all the different permutations 637 00:27:22,820 --> 00:27:25,455 that you could possibly have, and for each of them, 638 00:27:25,455 --> 00:27:27,580 you're going to multiply all of these probabilities 639 00:27:27,580 --> 00:27:30,380 by each other. 640 00:27:30,380 --> 00:27:33,570 In the end, it's going to give you that entire probability. 641 00:27:33,570 --> 00:27:35,740 So it's all the permutations that you have, 642 00:27:35,740 --> 00:27:37,760 and then just the probability of each of those objects being 643 00:27:37,760 --> 00:27:39,410 classified as that type-- that you just 644 00:27:39,410 --> 00:27:42,720 map one set to the other set. 645 00:27:42,720 --> 00:27:44,600 So now let's take a look at another scene. 646 00:27:44,600 --> 00:27:47,720 So we spawned a particle, a fake robot, here. 647 00:27:47,720 --> 00:27:49,510 And it has this field of view. 648 00:27:49,510 --> 00:27:51,790 What does this field of view look like? 649 00:27:51,790 --> 00:27:55,970 If you said it looked like two mailboxes and two trees, 650 00:27:55,970 --> 00:27:56,512 you're right. 651 00:27:56,512 --> 00:27:58,219 And so we're going to assume that we have 652 00:27:58,219 --> 00:27:59,640 the same actual observation. 653 00:27:59,640 --> 00:28:03,250 We still see a mailbox and two trees. 654 00:28:03,250 --> 00:28:05,330 And we're going to remove our old assumption. 655 00:28:05,330 --> 00:28:07,280 We're going to say that it might be possible 656 00:28:07,280 --> 00:28:11,180 that we don't see everything in a synthetic robot's field. 
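The product-over-permutations idea from above can be sketched by brute force, reusing a made-up confusion matrix. This enumeration assumes equal-size sets with no missed or false detections yet, and it is factorial in the set size, so it is only an illustration of the formula, not how you would implement it at scale.

```python
from itertools import permutations

# Invented confusion matrix: C[true][observed].
C = {"tree": {"tree": 0.9, "mailbox": 0.1},
     "mailbox": {"tree": 0.2, "mailbox": 0.8}}

def likelihood(Z, Y):
    """P(Z | Y) under the bag-of-objects model: sum over every
    permutation phi of the synthetic set Y the product of the
    per-element classification probabilities. Assumes len(Z) == len(Y)."""
    total = 0.0
    for phi in permutations(Y):
        p = 1.0
        for z_i, y_i in zip(Z, phi):
            p *= C[y_i][z_i]  # probability y_i is observed as z_i
        total += p
    return total
```

Note that with repeated classes (three trees, say), identical permutations get counted separately here; a careful implementation would account for that when normalizing.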
657 00:28:11,180 --> 00:28:15,100 In which case, how can we map this synthetic observation 658 00:28:15,100 --> 00:28:17,570 to the actual observation? 659 00:28:17,570 --> 00:28:19,130 Well, we just add in a probability 660 00:28:19,130 --> 00:28:21,050 that we miss identifying an object. 661 00:28:21,050 --> 00:28:22,925 If we just didn't see one of these mailboxes, 662 00:28:22,925 --> 00:28:26,240 it looks exactly like this. 663 00:28:26,240 --> 00:28:29,780 So for that, we want to capture what's the probability that we 664 00:28:29,780 --> 00:28:30,560 see nothing. 665 00:28:30,560 --> 00:28:33,060 So say we have a synthetic view with some number of objects. 666 00:28:33,060 --> 00:28:37,590 What's the probability we just miss all of them? 667 00:28:37,590 --> 00:28:40,790 So for the probability that we miss all of them, 668 00:28:40,790 --> 00:28:42,350 we're going to add an assumption here, 669 00:28:42,350 --> 00:28:44,433 which is that we're going to say that there exists 670 00:28:44,433 --> 00:28:47,230 some probability that we see an object for a given 671 00:28:47,230 --> 00:28:51,470 synthetic view here, and that we don't see it with a probability 672 00:28:51,470 --> 00:28:52,612 of one minus that probability. 673 00:28:52,612 --> 00:28:54,320 So essentially, there's just two states-- 674 00:28:54,320 --> 00:28:59,030 either we see the object or we don't see the object. 675 00:28:59,030 --> 00:29:01,640 And we're going to say that the probability of identifying 676 00:29:01,640 --> 00:29:03,230 objects is independent. 677 00:29:03,230 --> 00:29:05,930 So if we see one object, then that 678 00:29:05,930 --> 00:29:08,870 does not change our chance of seeing the next object. 679 00:29:08,870 --> 00:29:11,750 Both of these assumptions help simplify the math here. 680 00:29:11,750 --> 00:29:14,130 In reality, these might be strongly interlinked. 
681 00:29:14,130 --> 00:29:16,620 For instance, if your robot's camera is broken, 682 00:29:16,620 --> 00:29:18,246 the probability that it doesn't see one 683 00:29:18,246 --> 00:29:20,203 object directly correlates with the other ones, 684 00:29:20,203 --> 00:29:21,860 because it won't see any of the objects 685 00:29:21,860 --> 00:29:24,290 successfully if it has no camera anymore. 686 00:29:24,290 --> 00:29:26,030 If your robot drives into a wall, 687 00:29:26,030 --> 00:29:28,196 and can't see anything because it's staring straight 688 00:29:28,196 --> 00:29:30,380 at the wall, the same kind of idea holds. 689 00:29:30,380 --> 00:29:31,800 But for the purposes of making it 690 00:29:31,800 --> 00:29:33,350 so that you don't have a lot of covariances, 691 00:29:33,350 --> 00:29:35,420 and a really, really big conditional statement 692 00:29:35,420 --> 00:29:37,970 of a lot of probabilities of seeing things, 693 00:29:37,970 --> 00:29:42,250 we make these assumptions so that our items are independent. 694 00:29:42,250 --> 00:29:44,450 Independence means we can multiply our probabilities 695 00:29:44,450 --> 00:29:45,780 together. 696 00:29:45,780 --> 00:29:47,990 So if we just don't want to see any of the objects, 697 00:29:47,990 --> 00:29:49,495 we just take 1 minus the probability 698 00:29:49,495 --> 00:29:52,080 of seeing the object for all the objects in the scene. 699 00:29:56,330 --> 00:30:00,476 So what goes hand-in-hand with not seeing anything? 700 00:30:00,476 --> 00:30:02,630 Think about if you had a robot far off 701 00:30:02,630 --> 00:30:03,965 in the distance over here. 702 00:30:03,965 --> 00:30:05,480 What can it see? 703 00:30:05,480 --> 00:30:07,100 Just two trees. 704 00:30:07,100 --> 00:30:10,370 So now, we're going to remove the idea that we can't 705 00:30:10,370 --> 00:30:12,420 see things that don't exist. 
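Under the independence assumption just described, the probability of missing every object in a synthetic view is a plain product of the per-object miss probabilities. The detection probabilities here are illustrative inputs.

```python
def p_miss_all(detect_probs):
    """Probability that none of the objects in a synthetic view are seen.
    Assumes each object is detected independently with its own probability,
    so missing all of them is the product of (1 - p_detect) terms."""
    p = 1.0
    for p_d in detect_probs:
        p *= (1.0 - p_d)
    return p
```

If detections were correlated (broken camera, robot facing a wall), this product would understate or overstate the true miss probability, which is exactly the trade-off the independence assumption makes.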
706 00:30:12,420 --> 00:30:15,230 So to make two trees map to two trees and a mailbox, 707 00:30:15,230 --> 00:30:19,120 we just made up a mailbox out of some noise. 708 00:30:19,120 --> 00:30:21,720 So what's the probability that we see a full scene 709 00:30:21,720 --> 00:30:23,444 when we can't see anything? 710 00:30:23,444 --> 00:30:25,860 And if you're thinking it's going to map to a very similar 711 00:30:25,860 --> 00:30:27,765 formula, you're kind of right. 712 00:30:27,765 --> 00:30:30,670 But we need to figure out a way to capture our noise statement. 713 00:30:30,670 --> 00:30:32,840 Specifically, we are going to model noise 714 00:30:32,840 --> 00:30:35,440 as a Poisson variable, which means that there's always 715 00:30:35,440 --> 00:30:39,610 some probability of seeing an object out of nothing. 716 00:30:39,610 --> 00:30:43,030 And it's going to be generated according to this factor Kz, 717 00:30:43,030 --> 00:30:44,680 which we'll get to on the next slide. 718 00:30:44,680 --> 00:30:47,260 You could choose a lot of different things 719 00:30:47,260 --> 00:30:49,680 to represent your noise in this case. 720 00:30:49,680 --> 00:30:52,240 A Poisson variable was just the choice of the specific method 721 00:30:52,240 --> 00:30:54,809 that we used and implemented from another paper. 722 00:30:54,809 --> 00:30:56,350 But with testing, you could find that 723 00:30:56,350 --> 00:30:58,170 different distributions 724 00:30:58,170 --> 00:31:02,170 fit the noise of your particular sensor better. 725 00:31:02,170 --> 00:31:05,239 So with a Poisson variable, what we're going to have is 726 00:31:05,239 --> 00:31:07,030 we're just going to have the product of all 727 00:31:07,030 --> 00:31:11,850 of our Poisson variables times this Kz factor for the given 728 00:31:11,850 --> 00:31:13,354 scene that we want to map to. 
729 00:31:13,354 --> 00:31:14,770 Essentially, it's just the product 730 00:31:14,770 --> 00:31:16,600 of all these independent Poisson variables 731 00:31:16,600 --> 00:31:21,650 for each of our different objects that we want to create. 732 00:31:21,650 --> 00:31:25,120 So with that in mind, what's this Kz factor that we're 733 00:31:25,120 --> 00:31:27,205 multiplying everything by? 734 00:31:27,205 --> 00:31:28,580 What it's actually going to be is 735 00:31:28,580 --> 00:31:31,799 a set of uniform distributions. 736 00:31:31,799 --> 00:31:33,340 These uniform distributions are going 737 00:31:33,340 --> 00:31:36,100 to be uniform over all the classifications we could get 738 00:31:36,100 --> 00:31:39,415 for an object we spawned from the noise, and all 739 00:31:39,415 --> 00:31:43,120 the possible scores, and all the possible bearings 740 00:31:43,120 --> 00:31:45,390 for this synthetic object. 741 00:31:45,390 --> 00:31:48,400 And if you remember a few slides ago or you rewind the video, 742 00:31:48,400 --> 00:31:51,790 you'll notice that these map directly to the categories 743 00:31:51,790 --> 00:31:53,950 that we put into that classification engine. 744 00:31:53,950 --> 00:31:54,940 In fact, they should. 745 00:31:54,940 --> 00:31:57,340 So if you added more things or took them out, 746 00:31:57,340 --> 00:31:59,740 you'd change this uniform distribution. 747 00:31:59,740 --> 00:32:01,810 The idea is that when you synthetically 748 00:32:01,810 --> 00:32:03,280 create an object out of noise, you 749 00:32:03,280 --> 00:32:06,310 might get any of those types of objects. 750 00:32:06,310 --> 00:32:09,990 But if you needed to create an object synthetically out of 751 00:32:09,990 --> 00:32:12,100 the noise, you might get a tree 752 00:32:12,100 --> 00:32:13,650 or you might get a mailbox. 753 00:32:13,650 --> 00:32:15,275 Either one of them might show up. 
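A sketch of the false-detection term under these assumptions: the count of hallucinated objects is Poisson-distributed, and each hallucinated object draws its class uniformly, which is the class component of the Kz factor (a full Kz would also be uniform over scores and bearings). The rate and class count here are invented values.

```python
import math

def p_false_detections(n_false, lam, n_classes):
    """Probability of hallucinating exactly n_false objects out of noise:
    Poisson(n_false; lam) for the count, times a uniform 1/n_classes
    choice of class for each spurious detection (the Kz factor,
    reduced here to just its class component)."""
    poisson = math.exp(-lam) * lam ** n_false / math.factorial(n_false)
    return poisson * (1.0 / n_classes) ** n_false
```

Adding or removing classes from the classification engine changes `n_classes` and therefore this uniform factor, which matches the point made above.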
754 00:32:15,275 --> 00:32:17,560 If you had a different or more intelligent 755 00:32:17,560 --> 00:32:20,520 distribution for how you might misclassify things-- 756 00:32:20,520 --> 00:32:23,200 for instance, let's say your noise, whenever it shows up, 757 00:32:23,200 --> 00:32:25,030 always looks like trees. 758 00:32:25,030 --> 00:32:27,560 Whenever you accidentally see noise as an object, 759 00:32:27,560 --> 00:32:28,672 it's always a tree. 760 00:32:28,672 --> 00:32:31,255 Then you might only want to have a probability of seeing trees 761 00:32:31,255 --> 00:32:33,977 over here, and it would change your distribution. 762 00:32:33,977 --> 00:32:35,560 But for simplicity's sake, we're going 763 00:32:35,560 --> 00:32:37,720 to assume that it's equally probable 764 00:32:37,720 --> 00:32:42,230 that any type of object shows up out of noise. 765 00:32:42,230 --> 00:32:44,600 So now, we're going to put it all together, since we 766 00:32:44,600 --> 00:32:47,200 relaxed our assumptions. 767 00:32:47,200 --> 00:32:49,090 So we have a lot of different things 768 00:32:49,090 --> 00:32:52,090 that can potentially map to this actual scene. 769 00:32:52,090 --> 00:32:55,360 We have scenes that lack objects. 770 00:32:55,360 --> 00:32:58,920 We have scenes that lack objects and therefore 771 00:32:58,920 --> 00:33:01,100 need to add the object in. 772 00:33:01,100 --> 00:33:04,030 And when they lack objects and have to add the object in, 773 00:33:04,030 --> 00:33:06,100 there's a chance they add in a different object. 774 00:33:06,100 --> 00:33:08,510 And then maybe they just misclassify something, 775 00:33:08,510 --> 00:33:10,830 like our misclassification idea. 776 00:33:10,830 --> 00:33:13,100 But wait-- I guess we can make this more complicated. 777 00:33:13,100 --> 00:33:15,370 What if one of them just wasn't seen, 778 00:33:15,370 --> 00:33:17,470 so we'd take one of these out? 779 00:33:17,470 --> 00:33:20,760 And then we just see two things from noise. 
780 00:33:20,760 --> 00:33:23,662 As you can see, these keep getting more complicated. 781 00:33:23,662 --> 00:33:26,120 But if you think about it, with our previous probabilities, 782 00:33:26,120 --> 00:33:27,940 every single one of these is going 783 00:33:27,940 --> 00:33:30,440 to add a lot of probability terms 784 00:33:30,440 --> 00:33:32,590 to be multiplied by each other. 785 00:33:32,590 --> 00:33:35,720 You can keep making this as complicated as you want. 786 00:33:35,720 --> 00:33:37,570 If we removed our first assumption, 787 00:33:37,570 --> 00:33:39,540 we'd be adding in probabilities of two objects 788 00:33:39,540 --> 00:33:41,195 becoming one object. 789 00:33:41,195 --> 00:33:43,510 The idea is like higher-order terms when 790 00:33:43,510 --> 00:33:45,310 you're doing approximations. 791 00:33:45,310 --> 00:33:48,100 The goal is to make sure that any particle you spawn 792 00:33:48,100 --> 00:33:51,160 could potentially be what you've actually seen. 793 00:33:51,160 --> 00:33:54,700 But we want the ones that are very low probability to really 794 00:33:54,700 --> 00:33:56,649 be very low probability. 795 00:33:56,649 --> 00:33:58,690 So we make sure that these incredibly complicated 796 00:33:58,690 --> 00:34:02,070 transformations are going to be so insignificant that they'll 797 00:34:02,070 --> 00:34:03,880 usually come out to almost 0. 798 00:34:06,400 --> 00:34:09,690 So if we wanted to solve this, we 799 00:34:09,690 --> 00:34:12,100 need to start by using a little assumption, which 800 00:34:12,100 --> 00:34:14,280 is the fact that we now have to fold in the idea 801 00:34:14,280 --> 00:34:16,020 that we could have some missed detections 802 00:34:16,020 --> 00:34:17,690 and some false detections. 803 00:34:17,690 --> 00:34:19,570 So false detections would increase the number 804 00:34:19,570 --> 00:34:21,570 of objects we've seen over the number of objects 805 00:34:21,570 --> 00:34:23,537 that are actually present. 
806 00:34:23,537 --> 00:34:25,870 And missed detections would reduce the number of objects 807 00:34:25,870 --> 00:34:26,956 that are actually present. 808 00:34:26,956 --> 00:34:28,330 Notice that this always has to be 809 00:34:28,330 --> 00:34:30,429 the same size as the number of expected objects, 810 00:34:30,429 --> 00:34:33,370 because you're trying to map whatever your synthetic scene 811 00:34:33,370 --> 00:34:37,060 is to the actual thing that you observe. 812 00:34:37,060 --> 00:34:40,060 When you do this, it really turns into a large number 813 00:34:40,060 --> 00:34:41,320 of multiplications. 814 00:34:41,320 --> 00:34:43,270 So this should look familiar. 815 00:34:43,270 --> 00:34:45,940 This is our term for successfully classifying 816 00:34:45,940 --> 00:34:47,514 all of our objects, and we're going 817 00:34:47,514 --> 00:34:48,889 to multiply it by the probability 818 00:34:48,889 --> 00:34:52,544 that we actually identify it, since we could now miss things. 819 00:34:52,544 --> 00:34:54,699 Then, we have to multiply that by the probability 820 00:34:54,699 --> 00:34:56,770 that for whatever objects we didn't see, 821 00:34:56,770 --> 00:34:59,630 we actually missed them. 822 00:34:59,630 --> 00:35:02,234 And then finally, we have to add in the noise, 823 00:35:02,234 --> 00:35:04,150 because the noise is going to make the objects 824 00:35:04,150 --> 00:35:06,700 that we're missing show up, so that we 825 00:35:06,700 --> 00:35:09,370 have a classification of the same size as the thing 826 00:35:09,370 --> 00:35:10,870 that we want to see. 827 00:35:10,870 --> 00:35:14,470 Now, one thing you should note is that we added in phi way 828 00:35:14,470 --> 00:35:16,110 down here at Kz. 829 00:35:16,110 --> 00:35:19,720 That's because we have to map these false detections that 830 00:35:19,720 --> 00:35:21,900 added to the number of objects that we could see, 831 00:35:21,900 --> 00:35:23,850 as well as the actual detections. 
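Putting the pieces together, here is one illustrative way to combine the classification, detection, miss, and noise terms by brute-force enumeration over which expected objects were detected and which observations are clutter. The constants and the exact factorization are assumptions made for this sketch, not the precise formulation of the paper the lecture follows, and the enumeration is only feasible for tiny scenes.

```python
import math
from itertools import permutations, combinations

# Invented model parameters.
C = {"tree": {"tree": 0.9, "mailbox": 0.1},
     "mailbox": {"tree": 0.2, "mailbox": 0.8}}  # confusion matrix
P_DETECT = 0.95   # per-object detection probability (assumed)
LAM = 0.1         # Poisson rate of false detections (assumed)
N_CLASSES = 2     # uniform class choice for clutter (the Kz idea)

def poisson(n, lam):
    return math.exp(-lam) * lam ** n / math.factorial(n)

def full_likelihood(Z, Y):
    """P(Z | Y) with missed and false detections: for every way to pick
    which expected objects were detected and which observations are
    clutter, multiply the classification, detection, miss, and noise
    terms, then sum over all those assignments (the phi mapping)."""
    total = 0.0
    for k in range(0, min(len(Z), len(Y)) + 1):
        n_false = len(Z) - k   # observations explained by noise
        n_miss = len(Y) - k    # expected objects we failed to see
        for det in combinations(range(len(Y)), k):      # detected objects
            for obs in combinations(range(len(Z)), k):  # matched observations
                for phi in permutations(obs):           # the mapping phi
                    p = P_DETECT ** k * (1 - P_DETECT) ** n_miss
                    p *= poisson(n_false, LAM) * (1 / N_CLASSES) ** n_false
                    for y_i, z_i in zip(det, phi):
                        p *= C[Y[y_i]][Z[z_i]]
                    total += p
    return total
```

The point of the structure, as in the lecture, is that wildly implausible explanations (everything missed, everything hallucinated) pick up so many small factors that they contribute almost nothing.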
832 00:35:23,850 --> 00:35:26,460 Essentially, we have to take all the objects that 833 00:35:26,460 --> 00:35:28,440 are seen here, real or not, and map them 834 00:35:28,440 --> 00:35:30,245 to all the objects we expect to see, 835 00:35:30,245 --> 00:35:31,870 because they're going to be one to one. 836 00:35:35,340 --> 00:35:38,210 So let's take a look at a little video that shows what 837 00:35:38,210 --> 00:35:40,410 semantic localization can do. 838 00:35:40,410 --> 00:35:43,320 So in this video, for a little heads up, 839 00:35:43,320 --> 00:35:45,750 we have essentially a scene where 840 00:35:45,750 --> 00:35:48,270 they have mapped out two things inside of a suburban area. 841 00:35:48,270 --> 00:35:50,910 They've mapped out cars and windows 842 00:35:50,910 --> 00:35:53,372 on the path of the robot as it drives around an area. 843 00:35:53,372 --> 00:35:55,455 And they're attempting to use just the information 844 00:35:55,455 --> 00:35:57,944 of the cars and windows visible in the scene in order 845 00:35:57,944 --> 00:35:59,860 to localize where they believe the robot to be 846 00:35:59,860 --> 00:36:01,018 during its journey. 847 00:36:03,494 --> 00:36:04,160 [VIDEO PLAYBACK] 848 00:36:04,160 --> 00:36:06,750 - This is a video extension to the paper-- 849 00:36:06,750 --> 00:36:08,496 DAVID STINGLEY: So I'm going to mute that. 850 00:36:08,496 --> 00:36:10,170 It's on. 851 00:36:10,170 --> 00:36:15,230 So this is a few steps into the process. 852 00:36:15,230 --> 00:36:17,180 And then it resets. 853 00:36:17,180 --> 00:36:20,080 It spawns a number of points for potential locations. 854 00:36:20,080 --> 00:36:22,330 And then it very quickly localizes and finds 855 00:36:22,330 --> 00:36:24,350 its particular rotation. 856 00:36:24,350 --> 00:36:25,820 These are all the identifications 857 00:36:25,820 --> 00:36:27,070 that are happening inside of the scene. 
858 00:36:27,070 --> 00:36:29,420 You can see the boundary boxes for cars and windows 859 00:36:29,420 --> 00:36:32,220 showing up, the cars in red, and the windows in green. 860 00:36:32,220 --> 00:36:35,300 It expands out its distribution of particles. 861 00:36:35,300 --> 00:36:36,950 And then shortly thereafter, it gets 862 00:36:36,950 --> 00:36:39,050 a couple of approximations. 863 00:36:39,050 --> 00:36:41,420 And then it settles in on a location. 864 00:36:41,420 --> 00:36:44,150 It happens to use kind of like a centralized weight for where 865 00:36:44,150 --> 00:36:46,810 it is for the set of particles, and then it draws the car 866 00:36:46,810 --> 00:36:48,170 as being in that location. 867 00:36:48,170 --> 00:36:50,224 If you let it keep running, it occasionally 868 00:36:50,224 --> 00:36:51,890 expands back out to make sure it doesn't 869 00:36:51,890 --> 00:36:54,530 fall into a local minimum, and then compresses once again 870 00:36:54,530 --> 00:36:56,690 to where its strongest belief is. 871 00:36:56,690 --> 00:36:59,270 Notice that it has a couple of seconds 872 00:36:59,270 --> 00:37:02,060 of having a very, very spread distribution, and very quickly 873 00:37:02,060 --> 00:37:03,980 converges onto a single point. 874 00:37:03,980 --> 00:37:06,200 That's because a lot of these locations 875 00:37:06,200 --> 00:37:07,830 become very, very low probability 876 00:37:07,830 --> 00:37:10,070 after you've seen a couple of scenes into the future. 877 00:37:13,255 --> 00:37:14,997 And we're going to pause this. 878 00:37:14,997 --> 00:37:15,580 [END PLAYBACK] 879 00:37:15,580 --> 00:37:18,100 I welcome you to go see the video 880 00:37:18,100 --> 00:37:19,640 yourself if you want to. 881 00:37:19,640 --> 00:37:20,720 You can check the title. 882 00:37:20,720 --> 00:37:22,220 It's pretty easy to find on YouTube, 883 00:37:22,220 --> 00:37:25,093 because it's pretty much the only one. 
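The expand-and-recompress behavior described in the video is characteristic of particle filter resampling. As a hedged sketch, with names, dimensions, and numbers chosen purely for illustration and not taken from the paper: each step draws survivors in proportion to their weights, then re-injects a few random particles so the belief can expand back out of a wrong consensus.

```python
import random

def resample(particles, weights, n_random=5, world_size=100.0):
    """One resampling step of a toy 1-D particle filter: draw survivors
    in proportion to their likelihood weights, then re-inject a few
    uniformly random particles so the filter can escape a local minimum
    (the occasional 're-expansion' seen in the video)."""
    survivors = random.choices(particles, weights=weights,
                               k=len(particles) - n_random)
    fresh = [random.uniform(0.0, world_size) for _ in range(n_random)]
    return survivors + fresh

def estimate(particles, weights):
    """Weighted mean of the particle set -- the 'centralized weight'
    used to draw the robot's believed position."""
    return sum(p * w for p, w in zip(particles, weights)) / sum(weights)
```

In a full filter, each particle's weight would come from a scene-likelihood term like the one discussed earlier, evaluated against the detections visible from that particle's pose.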
884 00:37:25,093 --> 00:37:27,710 So hopefully, it works now. 885 00:37:27,710 --> 00:37:29,720 I'll step back over here to the other side. 886 00:37:29,720 --> 00:37:31,480 So why would semantic localization 887 00:37:31,480 --> 00:37:33,210 be useful in this manner? 888 00:37:33,210 --> 00:37:35,440 In the example that was shown, it was 889 00:37:35,440 --> 00:37:37,090 done on post-processed data. 890 00:37:37,090 --> 00:37:38,020 They took a scene. 891 00:37:38,020 --> 00:37:39,540 They drove around it already. 892 00:37:39,540 --> 00:37:41,740 Then they did the identification on the video feed, 893 00:37:41,740 --> 00:37:45,210 and used that to do a localization. 894 00:37:45,210 --> 00:37:46,540 As we talked about before, 895 00:37:46,540 --> 00:37:49,270 people use sparse information to do a lot 896 00:37:49,270 --> 00:37:51,527 of their localization. 897 00:37:51,527 --> 00:37:53,860 People don't walk into a room and produce an exact laser 898 00:37:53,860 --> 00:37:56,840 scanned map of the entire area themselves. 899 00:37:56,840 --> 00:37:59,220 But they can store important information, 900 00:37:59,220 --> 00:38:01,750 like seeing objects and tables that they 901 00:38:01,750 --> 00:38:05,960 find with their shins, and use those to move around. 902 00:38:05,960 --> 00:38:09,089 Robots store that perfect map, if they can manage to make it. 903 00:38:09,089 --> 00:38:11,380 But they don't really have a good understanding of what 904 00:38:11,380 --> 00:38:13,990 that map might mean to a human. 905 00:38:13,990 --> 00:38:16,860 So that means that we're actually better 906 00:38:16,860 --> 00:38:18,970 at doing tasks within the environment. 
907 00:38:18,970 --> 00:38:20,929 If we wanted to go directly to a location, 908 00:38:20,929 --> 00:38:22,720 we don't have to look around and figure out 909 00:38:22,720 --> 00:38:25,230 our distance from the wall 910 00:38:25,230 --> 00:38:27,832 before we can then localize and move ourselves 911 00:38:27,832 --> 00:38:28,790 back over to the table. 912 00:38:28,790 --> 00:38:30,190 We just say, oh, I want to go to the table. 913 00:38:30,190 --> 00:38:31,190 And you look over there. 914 00:38:31,190 --> 00:38:32,200 Oh, there's a table. 915 00:38:32,200 --> 00:38:34,000 Table, got it. 916 00:38:34,000 --> 00:38:36,610 So how can we make robots think a little more like humans, 917 00:38:36,610 --> 00:38:39,620 so it's easier for us to give them instructions? 918 00:38:39,620 --> 00:38:42,010 If we can make the robot use a map that 919 00:38:42,010 --> 00:38:43,960 has just scenic objects, like we use 920 00:38:43,960 --> 00:38:45,960 a map that just has scenic objects, 921 00:38:45,960 --> 00:38:47,990 then moving between scenic objects 922 00:38:47,990 --> 00:38:50,590 is as simple as turning, finding the object, and saying, 923 00:38:50,590 --> 00:38:52,570 oh, well, I know where the object is. 924 00:38:52,570 --> 00:38:54,210 So I know roughly where I need to be. 925 00:38:54,210 --> 00:38:57,200 And then we start moving toward it. 926 00:38:57,200 --> 00:38:59,756 In conclusion, semantic localization has a lot of-- 927 00:38:59,756 --> 00:39:01,506 I'm going to leave the references slide up 928 00:39:01,506 --> 00:39:03,589 while I talk, so that if you wanted to go and take 929 00:39:03,589 --> 00:39:05,960 a chance to see any of these papers, you definitely can. 930 00:39:05,960 --> 00:39:07,870 Most of the work in this presentation comes directly 931 00:39:07,870 --> 00:39:09,070 from these sources. 
932 00:39:09,070 --> 00:39:12,070 Semantic localization offers us the opportunity 933 00:39:12,070 --> 00:39:16,820 to have robots use sparser information to 934 00:39:16,820 --> 00:39:19,540 localize, to find themselves inside of spaces. 935 00:39:19,540 --> 00:39:22,270 It also gives us a chance to have really tweakable factors 936 00:39:22,270 --> 00:39:23,800 for how you might understand where 937 00:39:23,800 --> 00:39:25,220 you are in a space intuitively. 938 00:39:25,220 --> 00:39:26,970 If you think certain things are important, 939 00:39:26,970 --> 00:39:29,000 you can add them in as probabilistic factors. 940 00:39:29,000 --> 00:39:32,970 If you don't, you can remove them just as easily. 941 00:39:32,970 --> 00:39:33,552 Thank you. 942 00:39:33,552 --> 00:39:36,010 And this is essentially the conclusion of our presentation. 943 00:39:36,010 --> 00:39:38,920 If you wanted to use something like this, 944 00:39:38,920 --> 00:39:43,815 I definitely recommend taking a look at this particular paper here. 945 00:39:43,815 --> 00:39:46,969 Notice how it says "via the matrix permanent"? 946 00:39:46,969 --> 00:39:48,760 As I said before, a lot of these operations 947 00:39:48,760 --> 00:39:51,120 are a series of multiplications and some permutations. 948 00:39:51,120 --> 00:39:52,940 There are permutation matrices. 949 00:39:52,940 --> 00:39:54,820 There's matrix multiplication. 950 00:39:54,820 --> 00:39:57,280 And there's a lot of ways to make matrix multiplication 951 00:39:57,280 --> 00:39:58,480 faster. 952 00:39:58,480 --> 00:40:00,889 This paper details how you can take the math that 953 00:40:00,889 --> 00:40:02,680 was shown before that's pretty inefficient, 954 00:40:02,680 --> 00:40:04,304 and turn it into something that you can 955 00:40:04,304 --> 00:40:07,270 do an estimate of very quickly. 956 00:40:07,270 --> 00:40:10,920 [APPLAUSE]
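For reference, the matrix permanent in that paper's title is the determinant's formula with the alternating signs removed, so it sums a product over every one-to-one assignment, which is exactly the sum over data associations from the likelihood derivation. A brute-force version, for illustration only (the paper's contribution is evaluating or approximating this sum much faster than this), might look like:

```python
from itertools import permutations
from math import prod

def permanent(M):
    """Permanent of a square matrix: sum over all permutations of the
    product of one entry per row and column, with no sign flips.  Each
    permutation corresponds to one way of matching detections to
    expected objects.  O(n!) here; Ryser's formula reduces the cost
    to O(2^n * n)."""
    n = len(M)
    return sum(prod(M[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))
```

With entry `M[i][j]` set to the probability that expected object `i` produced detection `j`, the permanent gives the total likelihood over all associations in one number.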