The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

AMNON SHASHUA: So unlike most of the talks that you have been given, I'm not going to teach you anything today. It's not going to be a teaching type of talk. It will be more toward looking at the crystal ball and trying to see how the future will unfold, a future where computer vision is a major agent in this transformative future.

So I'll start with transportation. This is the field where Mobileye is active. And then I'll move toward wearable computing, the field where OrCam is active. These are two companies that I co-founded, Mobileye in 1999 and OrCam in 2010.

Before that, just a few words about computer vision. I'm assuming that you all know about computer vision. It's the science of making computers see and extract meaning out of images, out of video. This is a field that in the past 20 years, through machine learning, has made a big jump. And in the past four years, through deep learning, it has made another jump, where there are certain narrow areas in computer vision and perception where computers reach human level perception and even surpass it. Facial recognition is one of those areas. And the belief is that in many narrow areas in computer vision, within the next five years we'll be able to reach human level perception. So it's a major branch of AI. It goes together with machine learning and, as I said, has made major progress.

And one very important thing, which is relevant to the industrial impact of computer vision, is that cameras are the lowest cost sensor that you can imagine. A camera sensor costs a few dollars. A lens costs a few dollars. All the rest is computing. And every sensor needs computing.
So if you can reach human level perception with a camera, you have a sensor whose cost is so low that it can be everywhere. And this is very, very important.

So I'll show you where things are standing in terms of avoiding a collision. For collision avoidance, you have a camera behind the windscreen, facing forward, analyzing the video coming from the camera. And the purpose of this analysis is to avoid collisions. So what does it mean to avoid collisions? The software needs to detect vehicles, it needs to detect pedestrians, lane markings, traffic signs, and traffic lights, and detect the lanes so it knows where the car is positioned relative to the lanes, and then send a signal to the car control systems to avoid an accident.

So let's look under the hood at what this means. I'll let this run a bit until all the information appears. So if we stop here, what do we see? The bounding boxes around cars mean that the system has detected cars. Red means that this vehicle is in our path. The green line here is the detection of the lane. This is a no-entry traffic sign. This is a traffic light being detected here. These are the pedestrians and cyclists. Even a pedestrian standing here is being detected. Let's let this run a bit further. All right, so these are pedestrians crossing the street. This is running at about 36 frames per second. So now imagine also the amount of computation that is running here. Again, this is the traffic sign, traffic light, pedestrians, pedestrians here.

So this is what the system does today: detect objects, detect lane marks, measure distances to the objects. And in case you are about to hit an object, the car will engage. At first it will give warnings. Then later it will apply automatic, autonomous braking in order to avoid the accident.
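To make the warn-then-brake escalation he describes concrete, here is a minimal sketch of that kind of decision layer, built around time to collision. The function names and thresholds are illustrative assumptions, not Mobileye's production logic.

```python
# Minimal sketch of a forward-collision decision layer, as described above:
# given a detected lead object's range and closing speed, compute time to
# collision (TTC) and escalate from warning to partial to full braking.
# The thresholds below are illustrative assumptions, not production values.

def time_to_collision(range_m: float, closing_speed_mps: float) -> float:
    """Seconds until contact if nothing changes; infinite if not closing."""
    if closing_speed_mps <= 0.0:
        return float("inf")
    return range_m / closing_speed_mps

def collision_response(range_m: float, closing_speed_mps: float) -> str:
    ttc = time_to_collision(range_m, closing_speed_mps)
    if ttc > 2.7:          # comfortable margin: do nothing
        return "none"
    if ttc > 1.6:          # driver can still react: audible/visual warning
        return "forward_collision_warning"
    if ttc > 0.9:          # driver likely too late: partial autonomous braking
        return "partial_braking"
    return "full_braking"  # last resort: hard braking to avoid or mitigate

# Example: a car 20 m ahead, closing at 15 m/s (54 km/h) -> TTC ~1.3 s
print(collision_response(20.0, 15.0))  # "partial_braking"
```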
And here is a list of many, many functions that the camera does in terms of detecting objects and trying to interpret the visual field, at a level of detail that is increasing over the years.

Now, computer vision is also creating a disruption. If you had asked an engineer, say, 15 years ago, what is a camera good for in this space, the engineer would say the camera is good for detecting lanes, because there's no other sensor that can find the lane marks, not a radar, not a laser scanner. And it may be good for helping radar-camera fusion, to compensate for shortcomings of the radar. Traffic signs, OK, it will be good for traffic signs. But that's it.

But what happened over the years is that the camera slowly started taking territory from the radar, until today the camera is really the primary sensor for active safety. Active safety is all this area of avoiding accidents. And you can see this through this chart. In 2007 we launched the first camera-radar fusion. So there's no disruption there. This is what people would normally think a camera is good for, combining with a radar. In 2008, the camera is also doing traffic sign recognition. No disruption there. In 2010, the camera is doing pedestrian detection. No disruption there, because there's no other sensor that can reliably detect pedestrians: pedestrians reflect radar very, very weakly, and pedestrians are mostly stationary objects, and radars are not good at detecting stationary objects.

But then in 2011, there's the first camera-only forward collision warning. And that was the beginning of a disruption. Forward collision warning is to detect a vehicle in front and provide a warning if you are about to collide with that vehicle. And this was a function that typically was in the territory of radars.
A radar sensor is very good at detecting vehicles and very good at ranging. It can very accurately get the range of a vehicle, say, 100 meters away, up to an accuracy of a few centimeters. No camera can reach those accuracies. So nobody believed that one day a camera would take over from the radar and do this function. And this is what happened in 2011.

And why did this happen? This happened because of a commercial constraint. The regulator, the American regulator, the National Highway Traffic Safety Administration, NHTSA, decided that by 2011 all cars need to have, as an option, two functions: forward collision warning and lane departure warning. Now, this creates a problem, because forward collision warning requires a radar and lane departure warning requires a camera. So now you put two sensors in the car, and it's expensive. If you can do it with one sensor, like with a camera, then you save a lot of money. So this pushed the car industry to adopt the idea that the camera can do forward collision warning.

And like all disruptions, once you start small you grow very, very fast. So in 2013 the camera is not only providing warnings, but also keeping a safe distance to the car in front. It's called adaptive cruise control. Then 2013 also brings emergency braking. So the camera not only decides that you're about to collide with a vehicle, it will also apply the brakes for you. In 2013 it was only partial braking, to avoid an accident up to 30 kilometers per hour. And then in 2015, this was a few months ago, the camera is now involved in full braking. That's one g of braking, avoiding an accident at about 70 to 80 kilometers per hour and mitigating an accident up to 220 kilometers per hour, just with the camera. So the camera is taking over and becoming the primary sensor in this area of active safety.
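Since he contrasts radar's centimeter-level ranging with what a camera can do, it may help to see how a single camera estimates range at all. Below is a sketch of one common monocular approach, using a pinhole-camera model and a flat-road assumption; the calibration constants are assumed values for illustration, not Mobileye's actual method.

```python
# One common way a single forward camera estimates range, sketched under a
# pinhole-camera model and a flat-road assumption: the image row where the
# target vehicle meets the road determines its distance. The calibration
# values below (focal length, camera height, horizon row) are assumptions.

FOCAL_LENGTH_PX = 1000.0   # focal length in pixels (assumed calibration)
CAMERA_HEIGHT_M = 1.2      # camera height above the road (assumed)
HORIZON_ROW_PX = 240.0     # image row of the horizon (assumed)

def range_from_bottom_row(bottom_row_px: float) -> float:
    """Distance to the point where the target touches the road.

    Under the pinhole + flat-road model:  Z = f * H / (y_bottom - y_horizon).
    Small errors in the bottom row translate into large range errors at long
    distances, which is why monocular ranging is coarser than radar.
    """
    dy = bottom_row_px - HORIZON_ROW_PX
    if dy <= 0:
        return float("inf")   # target appears at or above the horizon
    return FOCAL_LENGTH_PX * CAMERA_HEIGHT_M / dy

# Example: a bounding box whose bottom edge is 12 pixels below the horizon
print(round(range_from_bottom_row(252.0), 1))  # ~100.0 m
```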
Now, why is that? As I said, these are the milestones of the camera disruption. First, the camera has the highest density of information as a sensor. With a laser scanner or a radar, the number of pixels per angle, per degree, is much, much smaller. It's orders of magnitude smaller than a camera. So you have a lot of information from the camera. It's the lowest cost sensor. And also, cameras are getting better in terms of performance under low light. So with a camera today you can do much more, not only because computing has progressed, and not only because algorithms are now better, but also because the physics of the camera is progressing over time, especially the light sensitivity of the camera.

We also came to the conclusion that we need to build our own hardware and our own chip. These are very, very advanced microprocessors that, per silicon area, are about 10 times more efficient than any general purpose chip. And I'll not spend more time on this.

So this field has two major trends. One is active safety, which is driven by regulators. The regulators see that there is a sensor that is very low cost and saves lives. So what does the regulator do? They incentivize this kind of function to the car industry by coupling it to star ratings. So if you want to get your four stars or five stars, the NCAP stars on the car, you have to have this kind of technology as a standard fit in the car. So this pushes the industry by mandates. It pushes the industry to have active safety installed in every car. So by 2018 every new car will have such a system.

The other trend is the trend toward the future, which is autonomous driving. Now, autonomous driving has two facets. One is bringing the probability of an accident to an infinitesimally small probability. So zero accidents. Because the more you delegate the driving experience to a robotic system, the less the chance of an accident.
So it brings us to an era where there will be no accidents. But not less importantly, it has the potential to transform the entire transportation business: how we own cars, how we build cars, the number of cars that will be produced. And I'll spend a bit more time on that as I go forward.

Now, in terms of the regulation side, this is an example. You see here a Nissan Qashqai 2014 has five stars. And to know how it got the five stars, what you see here are all the tests. These are autonomous emergency braking tests. The car needs to detect the car in front, the target car, and apply the brakes before the collision. And the car is being tested. Without that it will not get the five stars.

You can see this also in the number of chips that have been shipped. Every car has a chip. This chip, the microprocessor, gets the information from the camera, and all the algorithms are on this microprocessor. We started launching this in 2007. In the first five years there were one million chips, so one million cars with the technology. And then in 2013 alone, 1.3 million. Then you see here, in 2014, 2.7 million. This year is going to be about five million. So you see this doubling. And this is really the effect of the regulation. In many industries regulation is an impediment. In this industry, regulation is something good. It pushes the industry to install these kinds of systems as standard.

OK. Another example of how this is moving: there's also an increasing awareness. This is a commercial from the 2014 Super Bowl by Hyundai. Hyundai is showcasing their new vehicle called Genesis. Now, there are many things that you can show when you want to showcase a new vehicle. You can talk about the design of the vehicle. You could talk about the engine, the infotainment. But they chose to talk about the active safety. So I'll show you.
[VIDEO PLAYBACK]

- Remember when only Dad could save the day? Auto emergency braking on the all new Genesis from Hyundai.

[END PLAYBACK]

AMNON SHASHUA: OK. So this is the camera behind the windscreen, detecting the car in front, or a pedestrian, and it will brake before the collision.

Now, to show you what this is about: that was the commercial, and in a commercial you can show anything you like. So now I'll show you something really from the field. In 2010 Volvo introduced the first pedestrian detection. The same thing: detect a pedestrian, and if you are about to collide with a pedestrian the car would brake, apply the brakes automatically. So in 2010 they had about 5,000 journalistic events, where they put a reporter behind the steering wheel, told the reporter to drive toward a mannequin, toward a doll, and lo and behold, the car would brake just before, a fraction of a second before, you hit the doll. But then, when you buy the car, you can do your own testing. So I downloaded this from the internet. It's a clip of a bunch of Polish guys. It's a bit funny, but you'll actually get a good feeling of what this system does by looking at this clip.

OK? So this is automatic emergency braking. Today it works; it avoids accidents at up to about 70 kilometers per hour.

OK? So now you have a better idea of what I'm talking about. So now let's go into the future. This was just setting the baseline: what is active safety, and where is computer vision inside this? So now let's look at the next four years. And the idea is to evolve this kind of technology to a point where you can delegate the driving experience to a robotic system. And then the question is what needs to be done. This slide shows that there are two paradigms, and the reality is somewhere in between these two paradigms.

The right hand side is where we are today. You are based only on sensing. You have a camera.
Maybe you also have a radar or a laser scanner for redundancy. You get the information from the sensors. You have algorithms that try to interpret the visual field and take action in case of an accident, or control the vehicle.

On the left hand side is the extreme case, the Google approach, where there is little sensing involved. It's a lot of recording. So you prerecord your drive. Once you have prerecorded the drive, all you need to do is to match your sensing to the prerecorded drive. Once you've found the match, you know your position exactly. So you don't need to detect lanes. You know all the moving objects, because the recording contains only stationary objects, so all the moving objects pop out. So the load on the sensing is much, much smaller than in the case where you didn't do a pre-drive and you didn't record. The problem with the recording is that we are talking about tons of data. It's a 360 degree, 3D recording, at several frames per second. So the amount of data is huge. So there are issues of how you manage this, how you record it, and how you update it over time, because you have to continuously update this kind of data. And reality is going to be somewhere in between.
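As a rough illustration of how "moving objects pop out" once you are localized against a prerecorded map of stationary structure, here is a toy sketch. The grid resolution, threshold, and data layout are assumptions made purely for illustration; real systems are far more elaborate.

```python
# A toy sketch of the "prerecorded drive" idea described above: once the car
# is localized against a prerecorded map of stationary 3D points, anything in
# the current scan that has no nearby counterpart in the map "pops out" as a
# potentially moving object. The grid resolution below is an assumption.

def voxel(point, res=0.5):
    """Quantize a 3D point (x, y, z) in meters to a coarse grid cell."""
    return tuple(int(c // res) for c in point)

def moving_candidates(current_points, prerecorded_points, res=0.5):
    """Return current points that fall in grid cells absent from the map."""
    static_cells = {voxel(p, res) for p in prerecorded_points}
    return [p for p in current_points if voxel(p, res) not in static_cells]

# Example: the map knows about a wall; the current scan also sees a pedestrian.
wall = [(10.0, y * 0.5, 1.0) for y in range(10)]
scan = wall + [(6.0, 1.0, 0.9)]           # the extra point is not in the map
print(moving_candidates(scan, wall))      # [(6.0, 1.0, 0.9)]
```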
So the first leap that is happening in the next five years is to reach human level perception. Now, it sounds very, very ambitious, but there are lots of indications that it is not science fiction. There is a very high probability that one can reach this. In certain areas, like face recognition and certain categorization tasks, if you look at the academic achievements, they have surpassed human level perception. I'll spend a few slides on this later.

So going from driver assist to human level perception, first, we need to extend the list of objects. Not only vehicles and pedestrians, but vehicles at any angle; knowing about 1,000 different object categories in the scene; knowing how to predict a path using context, which today is not being used; detailed road interpretation, knowing about curbs and barriers and guardrails. It's all the stuff that, when we look at the road, we naturally interpret very, very easily. These are the things that need to be done in order to reach human level perception. And the tool to do this is deep layered networks, which I'll spend a few slides on in a moment.

And the need for context: these are examples. For example, path planning. You want to fuse all the information available from the image, not only to look for the lanes, because in many situations you look at an image and you don't see lanes. But a human observer would very easily know where the path is just from looking at the context. In modeling the environment, ultimately every pixel gives you a category: tell me where this pixel is coming from, a pedestrian, a vehicle, the inside of a vehicle, a barrier, a curb, a guardrail, a lamp post, and so forth. 3D modeling of a vehicle: put a 3D bounding box around the vehicle so that we can know which side of the vehicle I'm looking at, whether it's the front, or rear, left side, right side, what the angle is. Know everything about vehicles as moving objects, and do a lot of scene recognition. I'll give some examples about that later.

So, deep networks. I know that you all know about deep networks. I'll just spend a few slides to state what the impact is there, not the impact from the point of view of a scientist, but the impact from the point of view of a technologist, because there isn't much science behind this. So the real turning point was 2012.
In 2012, the AlexNet: they built a convolutional net that was able to work on the ImageNet data set and reach a performance level which was more or less double the performance level of what was done before. This is another network, by Fergus, a very, very similar concept of convolution and pooling: convolution, pooling, two or three dense layers, and you get the output.

This is the ImageNet data set. You have about 1,000 categories over one million images. And these categories are very challenging. You look at the images of a sailing vessel or images of a husky, and the variation is huge. It's a really very difficult task. In 2011, the top five-- so the task is that you need to give a shortlist of five categories, and if the correct category is among the top five then you succeeded. And the performance was about 26% error. And this AlexNet reached 16%. So it's almost double the performance. So this caught the attention of the community. It's a big leap from 26% to 16%.

Now, if you look at what happened since then: in 2012, for this ImageNet competition, one out of six competitors used deep networks. A year later, 17 out of 24 competitors used deep networks. A year later, 31 out of 32 were using deep networks. So deep networks basically took over. If you look in terms of the performance, the human performance is about 5%. And right now we are at 6%, 5%, from the latest 2015 competitors. People started cheating; I think Baidu was caught cheating on this test. So I think 5% is more or less where things are going. And this is human level perception.
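The "top five" protocol he describes is easy to state in code. Here is a generic sketch of the metric, with made-up scores as an example; it is not the official ImageNet evaluation script.

```python
# A generic sketch of the ImageNet "top-5" protocol described above: a
# prediction counts as correct if the true label appears anywhere in the
# model's five highest-scoring categories. Not the official evaluation code.

def top5_error(predictions, true_labels):
    """predictions: one list of per-category scores per image."""
    errors = 0
    for scores, truth in zip(predictions, true_labels):
        top5 = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:5]
        if truth not in top5:
            errors += 1
    return errors / len(true_labels)

# Tiny example with 8 categories and 2 images:
preds = [[0.1, 0.5, 0.9, 0.2, 0.4, 0.3, 0.8, 0.7],   # truth 3 is not in the top 5
         [0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]]   # truth 0 is the top score
print(top5_error(preds, [3, 0]))  # 0.5
```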
Another big success was face recognition. This is a data set called Labeled Faces in the Wild, which contains pictures of celebrities, where for every celebrity you have pictures along a spectrum of many, many years. You can see the actor when he was 20 years old and then when he's 70 or 80 years old. Even for humans, this task is quite challenging: knowing whether two pictures are of the same person or not. And the human level performance is 97.5% correct. Now, if you look at techniques not using deep networks, they reached 91.4%. And in 2014 a group from Facebook and Lior Wolf from Tel Aviv University built a deep network to do face recognition and reached 97.3%, which is very, very close to human perception. And since then people have reached 99% on this database. And again, human level perception is 97.5%. So this is another area for these deep networks.

Also in speech: this is a recent paper by Baidu, headed by Andrew Ng. Just doing an end to end network, which also learns the structured prediction, they surpassed Siri, Cortana, and Google Now in performance. OK?

So the impact for automotive is that networks are very good at multi-class problems. The more categories you have, the better the performance of the network will be. They are very good at using context, for imagining or planning a path: taking an image as the input, with the output being the path, and cutting short all the processes of looking for lanes and those kinds of algorithms. Networks are ideal for pixel level labeling: for every pixel, give me a category. And you can use the networks for sensor integration, for determining the control of the vehicle by fusing a lot of information coming from various cameras.

The challenge of using deep networks is that deep networks are very, very large. They're not designed for real time. The networks that you find in academic papers, and the successes, are for easy problems. The problems that I've shown right now, the ImageNet, the face recognition, are considered relatively easy problems in the context of interpreting the image for autonomous driving.
So let me show you the things that one can do. Let's start with path planning. In this clip that I'll show you, the purpose of the network is to determine the path. That is the green line. Now, these clips are from scenes where it would be impossible to detect lanes, because there are simply no lanes. If you look at this, any lane detection system would find nothing in this kind of scene. Yet when you look at this image, you have no problem in determining where the path is, because you're looking at the entire image context. And this is what the network is doing. The input layer is the image; the output layer is this green line.

Or, for example, look at this urban setting. There are no lanes in an urban setting. Yet the system can predict where the path is by fusing information from the entire context. These are roads in California where they have these reflectors called Botts' dots. It's almost impossible to reliably fit lanes to this kind of information. Yet if you look at this holistic path planning, it can reliably tell you where the path is.
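Here is a minimal sketch, in PyTorch, of that "image in, path out" idea: a small convolutional network that regresses the lateral offset of the path at a few fixed look-ahead distances instead of first detecting lane marks. The layer sizes and the path parameterization are illustrative assumptions, not Mobileye's actual architecture.

```python
# A minimal sketch of holistic path planning as described above: a small
# convolutional network maps a camera frame directly to the lateral offset of
# the drivable path at a few fixed look-ahead distances. Layer sizes and the
# path parameterization are illustrative assumptions.

import torch
import torch.nn as nn

class PathRegressionNet(nn.Module):
    def __init__(self, num_lookahead_points: int = 10):
        super().__init__()
        self.features = nn.Sequential(             # convolution stack
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 8)),
        )
        self.head = nn.Sequential(                 # dense layers -> path
            nn.Flatten(),
            nn.Linear(64 * 4 * 8, 256), nn.ReLU(),
            nn.Linear(256, num_lookahead_points),  # lateral offset per distance
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(image))

# Example: one 3x256x512 camera frame -> 10 lateral offsets (in meters)
net = PathRegressionNet()
offsets = net(torch.zeros(1, 3, 256, 512))
print(offsets.shape)  # torch.Size([1, 10])
```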
Let's look at free space. The idea of free space is that when you want to do autonomous driving you need to know where not to drive. Right? You don't want to drive toward the curb. It's not only that you don't want to hit other moving objects; that's the easy part. You don't want to hit a barrier or a guardrail. So you want to know where the free space is. So you can think of a network that for every pixel will give you a label. And let's now focus only on the label of road versus not road. So all the pixels in green are road. Everything else is not road. You can see that the green is not going over the curb, which is nice.

But let's have it run a bit more, and then I'll stop it at a place where you'll see the power of context. Say I stop it here. Now look at the sidewalk there. The color of the sidewalk and the color of the road are identical. The height of the curb is about one centimeter. So it's not the height here, the geometry; it's basically the context. The network figured out, because there is a parked car there, that that part is not part of the road. So in order to make this judgment correctly, one needs to not just look at a small area around the pixel and decide whether it's road or not road. One needs to collect information from the entire image. This is the power of context. And this is something that the network can do.

You can see here the blue and red lines. Red means it's on a vehicle. Blue means it's on a physical barrier. So if I run this back here-- and this is done frame by frame, so it's a single frame thing. Same thing here: this height is one or two centimeters, and the color of the sidewalk and the color of the road are identical. So being able to make the correct judgment here is very, very challenging. And this is where a network can succeed. Here the network also predicts that this is the code for a curb. The red is the side of a vehicle or the front of the vehicle. And in the next one it predicts that this is part of a guardrail; the coding of this is part of a guardrail. So the system has about 15 categories: guardrail, curb, barrier, and so forth. Let's keep the questions for later.

So this is one area we call semantic free space. For every pixel in the scene, tell me what it is. Of course, first and foremost I'm interested to know where the road is, and then, at the edges of where the road ends, to know what the label is. Is it the side of a vehicle, the front of a vehicle, the rear of a vehicle? Is it a curb, barrier, guardrail, and so forth? And this, again, is done by a deep network. I'll skip this one.
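To make the per-pixel labeling concrete, here is a minimal PyTorch sketch of a small fully convolutional network that outputs a score for each of roughly 15 categories at every pixel. The category count and architecture are illustrative assumptions, not the system described in the talk.

```python
# A minimal sketch of semantic free space as described above: a small fully
# convolutional network assigns one of ~15 categories (road, curb, barrier,
# guardrail, vehicle side, ...) to every pixel. Architecture is an assumption.

import torch
import torch.nn as nn

class SemanticFreeSpaceNet(nn.Module):
    def __init__(self, num_classes: int = 15):
        super().__init__()
        self.encoder = nn.Sequential(              # shrink resolution, grow context
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(              # back to full resolution
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(image))   # (batch, classes, H, W)

# Example: label every pixel of a 3x128x256 frame, then take the arg-max class
net = SemanticFreeSpaceNet()
scores = net(torch.zeros(1, 3, 128, 256))
labels = scores.argmax(dim=1)                      # class index 0..14 per pixel
print(scores.shape, labels.shape)  # torch.Size([1, 15, 128, 256]) torch.Size([1, 128, 256])
```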
And then you can apply this with cameras at any angle. This is a camera looking at a corner, looking at 45 degrees to the right. So the system can know where the free space is. This is a camera from the side, with a fisheye lens. Again, using the same kind of technology, the system can know where the free space is. Same thing here. Here as well, day and night.

3D modeling. 3D modeling is to be able to put a bounding box, a 3D bounding box, around the vehicle. And the color coding here is that green is the front, red is the rear, blue is the right hand side, and yellow is the left hand side. If you let this run-- all right. Now, the importance of putting a 3D bounding box around the vehicle is that now you can place a camera at any angle. So it's not only a camera looking forward, but a camera at any angle, because the way a vehicle is defined is invariant to the camera position. So this is kind of a preparation for putting cameras all around the vehicle, at 360 degrees.

Scene recognition: for example, knowing that this is a bump is also done by a network that takes an image and outputs where the bumps are. Same thing here. More complicated than that is knowing where this stop line is. So when you go and detect traffic lights, detecting the traffic lights is the easy problem. A more complicated problem is to know the relevancy of the traffic lights, which traffic light is relevant to which direction. The third one, the most difficult problem, is to detect the stop line. The problem with the stop line is that when you see the stop line it's a bit too late. You see the stop line 20 to 30 meters away. So it's too late to start stopping and have a smooth stop. You want to predict where the stop line is 60 to 70 meters away. So here, you want your algorithm, or your network, to understand that you are approaching a junction and start estimating where the stop line should be, so you can start slowly reducing your speed, such that by the time you see where the stop line is you have already reduced your speed considerably. I'll skip this.
Lane assignment: knowing how many lanes there are and which lane you are in is also done by a network. The network will give a probability of whether this is a lane, this is a lane. For example, it knows that this is not a lane; it has here red, zero probability. So as you can see here-- I'll skip this one. So for every task there is a network. And these networks are quite sophisticated in accessing and integrating context. I'll skip this one with the traffic light.

So, multiple cameras. This is how it looks. The red ones are three cameras behind the windscreen. One is about 180 degrees, the other one is about 50, and the third one is about 25 degrees. And then there are another five cameras around the car that give you the full 360 degrees. And this kind of configuration, the first launch of it in a series-production car, is going to be in 2016. So I'm not talking about science fiction. This is how images look from some of these cameras.

So let me show you a first clip of automated driving. This is kind of a funny clip. This is an actor who played a major role in Star Trek, so I'll not say his name. Let's see whether you can identify him yourself. And he has a program for kids called Reading Rainbow. This program is 20 years old. And he came to Israel, and he wanted to drive the autonomous vehicle that we have, for his kids' program. So he was driving my car. My car is autonomous. I can drive from Tel Aviv to Jerusalem without touching the steering wheel. I do that all the time. So he was driving it. And it's a bit funny, but you'll get a feeling of what this is. So let's run this. It's two minutes.
[VIDEO PLAYBACK]

- Yes. They can. That's because technology companies, like Mobileye here in Israel, are about to introduce self-driving technologies to the world.

AMNON SHASHUA: You know who he is?

- In the not too distant future, just like in a science fiction movie, a driver will be able to hop in a car, tell it where you want it to go, and voila, the car will do the rest. So right now I'm driving like everybody does. My hands are on the steering wheel and my foot is on the brake, or the pedal, as required. And I'm in control of the car. But when I take my foot off the pedal and do this, now the car is driving itself. Wow. This really is amazing. I feel really safe with the car doing all of the driving. OK. Now watch this. And this is something that no one should ever do in a regular car, ever. Wow. That was freaky.

[END PLAYBACK]

AMNON SHASHUA: OK? So, does anyone from the young people know who he is? This is Geordi, from Star Trek. He had this visor. He was blind. He had a visor.

OK. So let's spend a few minutes talking about what the impact of autonomous driving is and how it's going to unfold. This is far from science fiction. It's actually unfolding as we speak. The first hands-free driving on highways is coming out now. The first one is Tesla. They have already launched; they made this public a week or two ago. Their first beta drivers are driving with the system. And I presume within a month it will also be installed for all other drivers. And with this you can drive hands-free on a highway, at unlimited speed. So you can drive at highway speeds, let go of the steering wheel, and the car will drive. GM already announced that in the middle of 2016 they will have Super Cruise, more or less the same kind of functionality. Audi also announced 2016.
And these are just the first comers. We are working with about 13 car manufacturers that will have this kind of capability within the next three to four years. So this will be in the mainstream. Now, what I put there in red is that the driver still has primary responsibility and has to be alert. That means that the technology is not perfect. It could make mistakes. Therefore, the driver still has the primary responsibility. So at this stage there's no disruption here. It's just a nice feature to have. For the car industry, this is the first step toward practicing for autonomous driving.

The second step starts in 2016, and this is with the eight cameras that I showed you a slide before. Here, the car can drive autonomously from highway to highway. So on-ramps and off-ramps are done autonomously. With Google Maps or whatever navigation program, you chart your route, and until the car reaches city boundaries it will go autonomously. From highway to highway it will switch, and do that autonomously. Still, the driver has primary responsibility and is alert. So nothing here is transformative. It's a nice feature. Again, it's part of a phased approach of the car industry to start practicing.

Starting from 2018 will come the first small disruption. The first small disruption is that technology will reach a level at which the driver is responsible, the driver must be there, but not necessarily alert. So it means that the driver is an attendant. The driver is monitoring, just like a pilot sitting in an airplane while the plane is on autopilot. The driver needs to be there in case there is a problem. The system will give a grace period of time until the driver needs to take back control. So it's not a matter of taking control in an instant, immediately.
784 00:37:47,130 --> 00:37:50,200 And so this transition from primary responsibility 785 00:37:50,200 --> 00:37:52,990 to monitoring, like in aviation, will be 786 00:37:52,990 --> 00:37:56,297 the first disruption, the beginnings of a disruption. 787 00:37:56,297 --> 00:37:58,630 So let's try to imagine what kind of disruption this is. 788 00:37:58,630 --> 00:38:01,180 So let's take Uber as an example. 789 00:38:01,180 --> 00:38:05,200 So today, you own a car. 790 00:38:05,200 --> 00:38:06,560 You have free time, 791 00:38:06,560 --> 00:38:09,280 say between 3:00 PM and 5:00 PM. 792 00:38:09,280 --> 00:38:13,960 So you take your car, open the Uber app, take passengers, 793 00:38:13,960 --> 00:38:16,030 and earn some money. 794 00:38:16,030 --> 00:38:17,290 That's Uber today. 795 00:38:17,290 --> 00:38:21,070 Now let's look at 2018 - 2019. 796 00:38:21,070 --> 00:38:23,380 You have zero skills and you don't have a car. 797 00:38:23,380 --> 00:38:25,420 All you have is a driver's license. 798 00:38:25,420 --> 00:38:28,480 So you are willing to be an attendant. 799 00:38:28,480 --> 00:38:30,900 So you say, OK, now I have free time. 800 00:38:30,900 --> 00:38:33,559 An Uber car would come with an attendant. 801 00:38:33,559 --> 00:38:35,100 You switch places with the attendant. 802 00:38:35,100 --> 00:38:37,867 You sit behind the steering wheel and you do nothing. 803 00:38:37,867 --> 00:38:38,950 You don't control the car. 804 00:38:38,950 --> 00:38:41,190 You don't control which passengers 805 00:38:41,190 --> 00:38:43,660 are being taken by the car. 806 00:38:43,660 --> 00:38:45,580 You simply sit there. 807 00:38:45,580 --> 00:38:49,160 Zero skills, therefore your payment is very, very small. 808 00:38:49,160 --> 00:38:52,750 So now these cars can drive 24/7 because the attendant can 809 00:38:52,750 --> 00:38:56,600 be replaced every hour or so. 810 00:38:56,600 --> 00:38:59,830 So here we have another business model 811 00:38:59,830 --> 00:39:02,620 which makes this public transportation, the Uber type 812 00:39:02,620 --> 00:39:05,050 of public transportation, now much more powerful 813 00:39:05,050 --> 00:39:06,740 than it is today. 814 00:39:06,740 --> 00:39:09,126 So this is kind of the beginning of disruption. 815 00:39:09,126 --> 00:39:10,250 What will be the next step? 816 00:39:10,250 --> 00:39:16,630 The next step, 2020-2022: imagine that a driverless car 817 00:39:16,630 --> 00:39:19,730 can drive without passengers. 818 00:39:19,730 --> 00:39:22,480 So this is one step before you can allow a car 819 00:39:22,480 --> 00:39:23,850 to drive autonomously. 820 00:39:23,850 --> 00:39:26,920 So without passengers means that all you need to prove 821 00:39:26,920 --> 00:39:28,510 is that the car, your car, would not 822 00:39:28,510 --> 00:39:30,860 hit other cars or pedestrians. 823 00:39:30,860 --> 00:39:33,387 But if it hits infrastructure, nobody gets 824 00:39:33,387 --> 00:39:35,470 killed because there are no passengers in the car. 825 00:39:35,470 --> 00:39:38,110 No passengers meaning nobody in the car. 826 00:39:38,110 --> 00:39:40,542 Now this is already a major disruption 827 00:39:40,542 --> 00:39:43,000 because what it means is that the household does not 828 00:39:43,000 --> 00:39:45,970 need to own multiple cars. 829 00:39:45,970 --> 00:39:46,930 One car is enough. 830 00:39:46,930 --> 00:39:48,280 I drive to work with the car. 831 00:39:48,280 --> 00:39:50,260 I send the car back home.
832 00:39:50,260 --> 00:39:53,470 It takes my wife, takes her to work, comes back home. 833 00:39:53,470 --> 00:39:54,320 You get the picture. 834 00:39:54,320 --> 00:39:57,400 So this is kind of a beginning of a major disruption. 835 00:39:57,400 --> 00:40:01,960 Then, about 2025 - 2030, with sufficient experience 836 00:40:01,960 --> 00:40:04,870 with mapping data, car-to-car communication, 837 00:40:04,870 --> 00:40:06,400 one can imagine how these cars would 838 00:40:06,400 --> 00:40:08,590 drive completely autonomously. 839 00:40:08,590 --> 00:40:12,272 And that is where the major disruption happens. 840 00:40:12,272 --> 00:40:13,420 OK? 841 00:40:13,420 --> 00:40:16,000 So this is autonomous driving. 842 00:40:16,000 --> 00:40:19,900 Let me go to the second part about wearable computing. 843 00:40:19,900 --> 00:40:21,320 And then we can take questions. 844 00:40:21,320 --> 00:40:24,850 So this will be much shorter. 845 00:40:24,850 --> 00:40:28,300 So again, computer vision, but now the camera 846 00:40:28,300 --> 00:40:30,350 is not beside us, like in the car. 847 00:40:30,350 --> 00:40:31,850 The camera is on us. 848 00:40:31,850 --> 00:40:34,930 Now if the camera is on us, the first question 849 00:40:34,930 --> 00:40:38,280 that you would ask is, who needs a camera to be on you? 850 00:40:38,280 --> 00:40:39,280 Right? 851 00:40:39,280 --> 00:40:42,220 So the first market segment for something like this 852 00:40:42,220 --> 00:40:45,130 are the blind and visually impaired. 853 00:40:45,130 --> 00:40:47,690 So the way to imagine this: 854 00:40:47,690 --> 00:40:51,074 you are a visually impaired or blind person, 855 00:40:51,074 --> 00:40:52,990 so you don't see well 856 00:40:52,990 --> 00:40:53,710 or you don't see at all. 857 00:40:53,710 --> 00:40:55,209 So it's very, very difficult for you 858 00:40:55,209 --> 00:40:56,960 to negotiate the visual world. 859 00:40:56,960 --> 00:40:59,980 You cannot read anything unless it's a few centimeters from 860 00:40:59,980 --> 00:41:01,000 your eye. 861 00:41:01,000 --> 00:41:03,220 You cannot recognize people unless they start talking 862 00:41:03,220 --> 00:41:03,719 to you, 863 00:41:03,719 --> 00:41:05,409 so you can recognize their voice. 864 00:41:05,409 --> 00:41:07,450 You cannot cross the street because you don't see 865 00:41:07,450 --> 00:41:08,892 the traffic light. 866 00:41:08,892 --> 00:41:11,350 You cannot get on a bus because you don't know what the bus 867 00:41:11,350 --> 00:41:12,110 number is. 868 00:41:12,110 --> 00:41:14,930 So basically you are very, very constrained, very limited. 869 00:41:14,930 --> 00:41:17,495 Now let's assume that you have a helper standing beside you. 870 00:41:17,495 --> 00:41:20,770 Now this helper is relatively intelligent 871 00:41:20,770 --> 00:41:24,795 and has normal eyesight. 872 00:41:24,795 --> 00:41:26,680 Now the helper looks at you, sees 873 00:41:26,680 --> 00:41:30,020 where you are pointing your hands, for example, 874 00:41:30,020 --> 00:41:32,861 or pointing your gaze, looks at the scene, 875 00:41:32,861 --> 00:41:35,110 understands what kind of information you want to know, 876 00:41:35,110 --> 00:41:37,510 and whispers the information into your ear. 877 00:41:37,510 --> 00:41:39,965 So say you want to catch a bus. 878 00:41:39,965 --> 00:41:42,340 You know that the bus is coming because you hear the bus, 879 00:41:42,340 --> 00:41:43,970 maybe you see a silhouette. 880 00:41:43,970 --> 00:41:45,220 So you look in that direction.
881 00:41:45,220 --> 00:41:47,500 The helper looks at the bus. 882 00:41:47,500 --> 00:41:48,670 It sees that there is a bus. 883 00:41:48,670 --> 00:41:50,410 Tells you what the bus number is. 884 00:41:50,410 --> 00:41:51,679 You want to cross the street. 885 00:41:51,679 --> 00:41:53,720 You know the traffic light is more or less there. 886 00:41:53,720 --> 00:41:55,750 But you cannot-- you don't know what the color of the traffic 887 00:41:55,750 --> 00:41:56,740 light is. 888 00:41:56,740 --> 00:41:58,334 So the helper looks at your gaze, 889 00:41:58,334 --> 00:42:00,000 sees that there's a traffic light there. 890 00:42:00,000 --> 00:42:01,440 Tells you it's a green light. 891 00:42:01,440 --> 00:42:03,730 You're opening a newspaper. 892 00:42:03,730 --> 00:42:05,350 You point somewhere on the newspaper, 893 00:42:05,350 --> 00:42:07,650 the helper would read you the article. 894 00:42:07,650 --> 00:42:09,300 Or there is a street name. 895 00:42:09,300 --> 00:42:11,560 You point towards the street name. 896 00:42:11,560 --> 00:42:13,960 The helper would look at the scene, 897 00:42:13,960 --> 00:42:16,010 understand that there is text in the wild, 898 00:42:16,010 --> 00:42:18,460 and simply read you the street name. 899 00:42:18,460 --> 00:42:22,390 A familiar face appears, the helper will whisper, 900 00:42:22,390 --> 00:42:24,700 you know, Joe has now-- is now in front of you. 901 00:42:24,700 --> 00:42:26,060 And so forth. 902 00:42:26,060 --> 00:42:29,290 So if you now replace this helper with computer vision 903 00:42:29,290 --> 00:42:31,810 you can imagine how this could help someone 904 00:42:31,810 --> 00:42:34,100 who is visually impaired. 905 00:42:34,100 --> 00:42:35,470 So let me show you-- 906 00:42:35,470 --> 00:42:39,530 so first of all, the number of visually impaired is quite big. 907 00:42:39,530 --> 00:42:43,960 So the number of blind people in the US is about 1.5 million. 908 00:42:43,960 --> 00:42:45,100 That's not big. 909 00:42:45,100 --> 00:42:48,310 The number of visually impaired, people whose 910 00:42:48,310 --> 00:42:52,180 impairment cannot be corrected with lenses, 911 00:42:52,180 --> 00:42:54,400 is about 26 million. 912 00:42:54,400 --> 00:42:55,630 So this is a sizable number. 913 00:42:55,630 --> 00:42:58,750 Worldwide it's above 400 million people 914 00:42:58,750 --> 00:42:59,960 who are visually impaired. 915 00:42:59,960 --> 00:43:02,950 And they don't have much technology to help them. 916 00:43:02,950 --> 00:43:05,710 So this is what OrCam is doing. 917 00:43:05,710 --> 00:43:08,767 It's a camera which clips on eyeglasses. 918 00:43:08,767 --> 00:43:11,350 And there is a computing device, which you put in your pocket. 919 00:43:11,350 --> 00:43:13,240 And the way you interact with the device is with your hand, 920 00:43:13,240 --> 00:43:14,110 with your finger. 921 00:43:14,110 --> 00:43:16,750 Because the camera is on you, it can also see your hand. 922 00:43:16,750 --> 00:43:19,330 Once you point, the camera starts 923 00:43:19,330 --> 00:43:21,700 to extract information from the scene and talks 924 00:43:21,700 --> 00:43:23,527 to you through an earpiece. 925 00:43:23,527 --> 00:43:24,610 So let's look at the clip. 926 00:43:24,610 --> 00:43:24,760 [VIDEO PLAYBACK] 927 00:43:24,760 --> 00:43:25,420 - Hi. 928 00:43:25,420 --> 00:43:28,630 I'm Liette and I'm visually impaired. 929 00:43:28,630 --> 00:43:32,539 I want to show you today how this device changed my life. 930 00:43:39,390 --> 00:43:40,781 - Massaryk.
931 00:43:40,781 --> 00:43:41,280 - Great. 932 00:43:41,280 --> 00:43:42,741 Let's go there. 933 00:43:46,150 --> 00:43:46,983 - Red light. 934 00:43:50,230 --> 00:43:51,440 Green light. 935 00:43:57,240 --> 00:43:58,350 50 shekel. 936 00:43:58,350 --> 00:43:59,554 - 50 shekel. 937 00:43:59,554 --> 00:44:00,470 Let's buy some coffee. 938 00:44:05,470 --> 00:44:06,450 - Breakfast. 939 00:44:06,450 --> 00:44:09,087 Bagel plus coffee with cream cheese [INAUDIBLE].. 940 00:44:09,087 --> 00:44:09,670 [END PLAYBACK] 941 00:44:09,670 --> 00:44:10,440 AMNON SHASHUA: OK? 942 00:44:10,440 --> 00:44:12,530 So you get the idea. 943 00:44:12,530 --> 00:44:14,500 So we started in 2010. 944 00:44:14,500 --> 00:44:18,730 By 2013 we already had a prototype working. 945 00:44:18,730 --> 00:44:22,027 And we had a visitor, John Markoff from The New York Times, 946 00:44:22,027 --> 00:44:23,860 and he came and he wrote a very nice article 947 00:44:23,860 --> 00:44:26,002 about what the company is doing. 948 00:44:26,002 --> 00:44:27,460 And we thought that at that time it 949 00:44:27,460 --> 00:44:30,520 would be good to launch the website of the company 950 00:44:30,520 --> 00:44:35,060 and try to get a number, say, the first 100 customers, 951 00:44:35,060 --> 00:44:38,580 so that we could start experimenting, do field studies 952 00:44:38,580 --> 00:44:41,500 with a prototype device. 953 00:44:41,500 --> 00:44:42,670 So we launched the website. 954 00:44:42,670 --> 00:44:45,100 We wrote that the device cost $2,500. 955 00:44:45,100 --> 00:44:46,961 That was June 2013. 956 00:44:46,961 --> 00:44:49,210 And the first 100 people who purchased the device 957 00:44:49,210 --> 00:44:51,010 would receive the device in September. 958 00:44:51,010 --> 00:44:54,970 So within an hour those 100 devices were sold. 959 00:44:54,970 --> 00:45:01,210 And then we kept a waiting list, which today is about 30,000. 960 00:45:01,210 --> 00:45:05,840 And we started shipping the devices about a month ago. 961 00:45:05,840 --> 00:45:12,970 So in the last year this device was with about 200 people. 962 00:45:12,970 --> 00:45:16,980 And we got a lot of feedback from real users and improved. 963 00:45:16,980 --> 00:45:19,150 And let me show you some real users. 964 00:45:19,150 --> 00:45:22,390 So this is Marcia from Brazil. 965 00:45:22,390 --> 00:45:24,790 The device at the moment only works in English. 966 00:45:24,790 --> 00:45:27,710 Later we'll add more languages. 967 00:45:27,710 --> 00:45:31,730 And so she's being trained to use the device. 968 00:45:31,730 --> 00:45:34,560 And this is a short clip of about two minutes. 969 00:45:34,560 --> 00:45:36,760 And, you know, watch her body language. 970 00:45:36,760 --> 00:45:40,210 And also she explains how she copes 971 00:45:40,210 --> 00:45:42,910 with her disability, especially how she distinguishes 972 00:45:42,910 --> 00:45:44,910 between different money notes. 973 00:45:44,910 --> 00:45:45,660 They're all green. 974 00:45:45,660 --> 00:45:47,522 So how do you distinguish between them? 975 00:45:47,522 --> 00:45:48,730 So let's have a look at this. 976 00:45:54,040 --> 00:45:57,780 So the device is reading the newspaper for her. 977 00:45:57,780 --> 00:46:01,576 [VIDEO PLAYBACK] 978 00:46:01,576 --> 00:46:11,038 - [INAUDIBLE] 979 00:46:11,038 --> 00:46:13,030 - $50. 980 00:46:13,030 --> 00:46:13,530 - $50. 981 00:46:13,530 --> 00:46:15,930 Cincuenta dollars. 982 00:46:15,930 --> 00:46:16,560 - Cincuenta. 983 00:46:16,560 --> 00:46:18,790 Let's see if [INAUDIBLE].
984 00:46:22,661 --> 00:46:24,030 - It green. 985 00:46:24,030 --> 00:46:36,950 All green and I put mark color, yellow, green, orange. 986 00:46:36,950 --> 00:46:38,180 Different note. 987 00:46:41,302 --> 00:46:53,144 [INAUDIBLE] 988 00:46:53,144 --> 00:46:54,126 - $20. 989 00:46:57,072 --> 00:46:59,036 [? Genia ?] 990 00:46:59,036 --> 00:47:03,946 - [INAUDIBLE] 991 00:47:03,946 --> 00:47:05,420 [END PLAYBACK] 992 00:47:05,420 --> 00:47:07,130 AMNON SHASHUA: OK. 993 00:47:07,130 --> 00:47:10,570 Here's a recent-- from CNN. 994 00:47:10,570 --> 00:47:14,330 It was aired a month ago. 995 00:47:14,330 --> 00:47:16,670 It also gives a bit more information about the device. 996 00:47:16,670 --> 00:47:17,470 Let's run this. 997 00:47:17,470 --> 00:47:18,820 It's again two minutes. 998 00:47:18,820 --> 00:47:19,486 [VIDEO PLAYBACK] 999 00:47:19,486 --> 00:47:22,630 - Two weekends ago I sat down and read The New York Times. 1000 00:47:22,630 --> 00:47:25,650 I haven't done that in maybe 30 years. 1001 00:47:25,650 --> 00:47:26,980 My wife came down. 1002 00:47:26,980 --> 00:47:28,250 I had a cup of coffee. 1003 00:47:28,250 --> 00:47:31,120 I'm reading The New York Times and she was crying. 1004 00:47:31,120 --> 00:47:35,080 - Just being able to read again is emotional for Howard Turman. 1005 00:47:35,080 --> 00:47:37,600 He started losing his vision as a child. 1006 00:47:37,600 --> 00:47:40,150 His new glasses don't fix his eyes 1007 00:47:40,150 --> 00:47:42,240 but they do the next best thing. 1008 00:47:42,240 --> 00:47:45,600 - Put on my glasses, it recognizes the finger, 1009 00:47:45,600 --> 00:47:46,780 snaps the picture. 1010 00:47:49,620 --> 00:47:50,920 Now it just reads. 1011 00:47:50,920 --> 00:47:54,490 - The glasses have a camera that recognizes text and can 1012 00:47:54,490 --> 00:47:56,300 read the world to him. 1013 00:47:56,300 --> 00:47:58,360 - Pull here. 1014 00:47:58,360 --> 00:48:02,440 - The technology is called OrCam and Turman says it gives him 1015 00:48:02,440 --> 00:48:04,090 a sense of normalcy. 1016 00:48:04,090 --> 00:48:07,180 - Even finding out that Dunkin' Donuts has a donut I never 1017 00:48:07,180 --> 00:48:09,430 tried was exciting. 1018 00:48:09,430 --> 00:48:11,530 - Dunkin' Donuts. 1019 00:48:11,530 --> 00:48:13,360 - It's a clip on camera. 1020 00:48:13,360 --> 00:48:17,230 So a camera that you can clip onto any eyeglasses. 1021 00:48:17,230 --> 00:48:19,270 And you have here a computing device, which 1022 00:48:19,270 --> 00:48:21,120 you can put in your pocket. 1023 00:48:21,120 --> 00:48:24,370 And the way it interacts, it's with a hand gesture. 1024 00:48:24,370 --> 00:48:26,580 For example, it's written there, rental and tours. 1025 00:48:29,680 --> 00:48:31,570 - Rentals and tours. 1026 00:48:31,570 --> 00:48:33,160 - It's not perfect though. 1027 00:48:33,160 --> 00:48:36,010 It uses a pretty bulky cable and sometimes it 1028 00:48:36,010 --> 00:48:38,017 needs a few tries to get things right. 1029 00:48:38,017 --> 00:48:40,350 - It doesn't read script because everybody's handwriting 1030 00:48:40,350 --> 00:48:40,960 is different. 1031 00:48:40,960 --> 00:48:45,060 So it doesn't do cursive very well at all. 1032 00:48:45,060 --> 00:48:47,650 - OrCam has a harder time in bright light, 1033 00:48:47,650 --> 00:48:51,200 or in tougher situations, like signs on windows. 1034 00:48:51,200 --> 00:48:54,510 - [INAUDIBLE] U donuts hours of operation. 1035 00:48:54,510 --> 00:48:55,780 Low PM. 1036 00:48:55,780 --> 00:48:56,950 Pound's PM. 
1037 00:48:56,950 --> 00:48:58,240 9:00 PM. 1038 00:48:58,240 --> 00:49:00,910 How was your service today? 1039 00:49:00,910 --> 00:49:03,790 - Shashua says improvements are on the way. 1040 00:49:03,790 --> 00:49:06,700 Where do you see this technology going over the long term? 1041 00:49:06,700 --> 00:49:09,100 - Reading, recognizing faces, recognizing products, 1042 00:49:09,100 --> 00:49:10,230 is only the beginning. 1043 00:49:10,230 --> 00:49:13,510 Where we want to get is complete visual understanding 1044 00:49:13,510 --> 00:49:15,050 at the level of human perception, 1045 00:49:15,050 --> 00:49:17,590 such that if you are disoriented you 1046 00:49:17,590 --> 00:49:19,530 can start understanding what's around you. 1047 00:49:19,530 --> 00:49:20,920 For example, where's the door? 1048 00:49:20,920 --> 00:49:21,750 The door is there. 1049 00:49:21,750 --> 00:49:22,583 Where is the window? 1050 00:49:22,583 --> 00:49:26,844 Where is an opening in the space around me? 1051 00:49:26,844 --> 00:49:27,700 OK? 1052 00:49:27,700 --> 00:49:28,940 This is face recognition. 1053 00:49:28,940 --> 00:49:30,540 So again, one of the first 100. 1054 00:49:30,540 --> 00:49:32,891 - Teach OrCam to recognize anybody? 1055 00:49:32,891 --> 00:49:33,390 - Yep. 1056 00:49:33,390 --> 00:49:35,410 - Who does it know? 1057 00:49:35,410 --> 00:49:36,300 - Libby, my mother. 1058 00:49:36,300 --> 00:49:38,114 - You want to show me? 1059 00:49:38,114 --> 00:49:39,105 - Yep. 1060 00:49:39,105 --> 00:49:39,605 OK. 1061 00:49:43,084 --> 00:49:44,078 - All right. 1062 00:49:44,078 --> 00:49:46,066 [INAUDIBLE] Let's see. 1063 00:49:46,066 --> 00:49:50,042 [INAUDIBLE] 1064 00:49:50,042 --> 00:49:51,944 - Libby. 1065 00:49:51,944 --> 00:49:52,527 [END PLAYBACK] 1066 00:49:52,527 --> 00:49:53,277 AMNON SHASHUA: OK? 1067 00:49:53,277 --> 00:49:55,450 So that's also face recognition. 1068 00:49:55,450 --> 00:49:57,690 Last two slides. 1069 00:49:57,690 --> 00:50:01,865 We also started providing the device to research groups. 1070 00:50:01,865 --> 00:50:05,330 And this is one of-- this is a paper in ARVO where they 1071 00:50:05,330 --> 00:50:08,970 took eight visually impaired people and gave them 1072 00:50:08,970 --> 00:50:10,190 the device for one month. 1073 00:50:10,190 --> 00:50:13,256 And then measured the change in quality of life. 1074 00:50:13,256 --> 00:50:15,380 And the way they measured the change in quality of life 1075 00:50:15,380 --> 00:50:18,290 was by interviewing them. 1076 00:50:18,290 --> 00:50:21,470 And seven out of the eight reported a significant change 1077 00:50:21,470 --> 00:50:22,390 in quality of life. 1078 00:50:22,390 --> 00:50:25,390 Now they sent us some of the interviews. 1079 00:50:25,390 --> 00:50:26,660 So on the next-- 1080 00:50:26,660 --> 00:50:30,929 here, I'm showing you part of the interview. 1081 00:50:30,929 --> 00:50:32,720 And what's interesting about this interview 1082 00:50:32,720 --> 00:50:34,880 is that there is a trick question. 1083 00:50:34,880 --> 00:50:37,130 The interviewer, after she tells him 1084 00:50:37,130 --> 00:50:40,520 how the device is, you know, lifesaving 1085 00:50:40,520 --> 00:50:44,630 and so forth, he tells her, well, the device is very expensive. 1086 00:50:44,630 --> 00:50:46,190 It's a few thousand dollars. 1087 00:50:46,190 --> 00:50:47,330 Is it worth it? 1088 00:50:47,330 --> 00:50:49,640 So it's one thing to get something for free and say 1089 00:50:49,640 --> 00:50:50,870 it's very, very good.
1090 00:50:50,870 --> 00:50:53,630 Another thing is when it's going to cost you thousands of dollars, 1091 00:50:53,630 --> 00:50:54,390 is it worth it? 1092 00:50:54,390 --> 00:50:55,670 And let's hear her answer, which is very nice. 1093 00:50:55,670 --> 00:50:56,336 [VIDEO PLAYBACK] 1094 00:50:56,336 --> 00:50:58,840 - In the first few days I had the OrCam 1095 00:50:58,840 --> 00:51:02,900 I was in total awe of it because for the first time 1096 00:51:02,900 --> 00:51:06,890 I was able to open mail and read it, 1097 00:51:06,890 --> 00:51:09,520 instead of having my husband read my mail. 1098 00:51:09,520 --> 00:51:13,070 And I was able to go to a restaurant 1099 00:51:13,070 --> 00:51:18,700 and actually read the menu and order myself with the waitress. 1100 00:51:18,700 --> 00:51:20,300 And that was exciting. 1101 00:51:20,300 --> 00:51:23,890 When you can't do something for such a long period of time, 1102 00:51:23,890 --> 00:51:26,750 the OrCam was incredible. 1103 00:51:26,750 --> 00:51:28,570 - Believe is what the estimate is. 1104 00:51:28,570 --> 00:51:31,130 Do you think such a high price would 1105 00:51:31,130 --> 00:51:33,530 be something people would be willing to pay 1106 00:51:33,530 --> 00:51:35,160 for a device like this? 1107 00:51:35,160 --> 00:51:39,034 Do you think it's marginally worth it right now? 1108 00:51:39,034 --> 00:51:41,330 - I think you're going to find that that's going 1109 00:51:41,330 --> 00:51:43,920 to be on a case by case basis. 1110 00:51:43,920 --> 00:51:46,430 You know, people who have money there's certainly 1111 00:51:46,430 --> 00:51:49,430 no problem $2,000. 1112 00:51:49,430 --> 00:51:50,930 I don't have money. 1113 00:51:50,930 --> 00:51:52,870 I am low income. 1114 00:51:52,870 --> 00:51:56,300 But I would save my money, scrape it together in order 1115 00:51:56,300 --> 00:51:58,566 to get it at $2,000. 1116 00:51:58,566 --> 00:51:59,260 [END PLAYBACK] 1117 00:51:59,260 --> 00:52:02,110 AMNON SHASHUA: So that's interesting. 1118 00:52:02,110 --> 00:52:03,410 Where is it going? 1119 00:52:03,410 --> 00:52:06,590 So there are two lines of progress. 1120 00:52:06,590 --> 00:52:10,370 One, within this existing niche, is 1121 00:52:10,370 --> 00:52:14,780 to make the camera understand the visual field at a higher 1122 00:52:14,780 --> 00:52:15,750 level of detail. 1123 00:52:15,750 --> 00:52:18,530 So one of the things that we are now working on 1124 00:52:18,530 --> 00:52:21,140 is what we call chatting mode. 1125 00:52:21,140 --> 00:52:24,110 So it's like-- it's the image annotation 1126 00:52:24,110 --> 00:52:27,380 type of experiment, or ImageNet together 1127 00:52:27,380 --> 00:52:30,470 with natural language processing. 1128 00:52:30,470 --> 00:52:34,880 Say you are visually impaired or blind and you're disoriented. 1129 00:52:34,880 --> 00:52:36,740 You don't know where you are. 1130 00:52:36,740 --> 00:52:38,990 So you would like the device to tell 1131 00:52:38,990 --> 00:52:40,670 you every second what it sees. 1132 00:52:40,670 --> 00:52:41,780 I see here Tommy. 1133 00:52:41,780 --> 00:52:42,710 I see here chairs. 1134 00:52:42,710 --> 00:52:44,020 I see here another person. 1135 00:52:44,020 --> 00:52:46,520 I see here a wall, an opening, a painting, blah, blah, blah, 1136 00:52:46,520 --> 00:52:49,850 blah, blah, until you get back your sense of orientation.
1137 00:52:49,850 --> 00:52:51,690 So you want the device to be able to have, 1138 00:52:51,690 --> 00:52:53,450 say, several thousand categories, 1139 00:52:53,450 --> 00:52:58,100 like in ImageNet, together with image annotation capability. 1140 00:52:58,100 --> 00:53:02,180 The kind of stuff that people are now writing articles about. 1141 00:53:02,180 --> 00:53:05,240 And to be able to do this at a frame rate of, 1142 00:53:05,240 --> 00:53:06,500 say, once per second. 1143 00:53:06,500 --> 00:53:08,960 So wherever I'm looking, tell me what-- 1144 00:53:08,960 --> 00:53:09,530 what you see. 1145 00:53:09,530 --> 00:53:12,020 This is one thing. Another thing is to have natural language 1146 00:53:12,020 --> 00:53:14,380 processing, NLP, ability. 1147 00:53:14,380 --> 00:53:18,686 For example, say you are looking at an electricity bill. 1148 00:53:18,686 --> 00:53:20,060 The system would know that you're 1149 00:53:20,060 --> 00:53:24,950 looking at an electricity bill and give you just the short version-- 1150 00:53:24,950 --> 00:53:26,674 what is the amount due, for example. 1151 00:53:26,674 --> 00:53:29,173 The system will tell you that you are looking at an electricity bill. 1152 00:53:29,173 --> 00:53:31,220 The amount due is such and such. 1153 00:53:31,220 --> 00:53:35,070 So we are putting more and more intelligence into the system. 1154 00:53:35,070 --> 00:53:36,270 So this is one area. 1155 00:53:36,270 --> 00:53:40,820 Another area is to go for a wearable device for people 1156 00:53:40,820 --> 00:53:43,100 with normal sight. 1157 00:53:43,100 --> 00:53:46,260 So here we're talking about, you know, real wearable computing. 1158 00:53:46,260 --> 00:53:49,490 So this Apple Watch is wearable computing. 1159 00:53:49,490 --> 00:53:51,285 But it doesn't do much computing. 1160 00:53:51,285 --> 00:53:51,920 Right? 1161 00:53:51,920 --> 00:53:56,030 It displays, you know, my text messages, emails, you know, 1162 00:53:56,030 --> 00:53:59,074 measures certain biometrics. 1163 00:53:59,074 --> 00:54:00,740 But that's not, you know, the holy grail 1164 00:54:00,740 --> 00:54:02,030 of wearable computing. 1165 00:54:02,030 --> 00:54:03,800 The holy grail of wearable computing 1166 00:54:03,800 --> 00:54:07,580 is to assume that you had Siri with eyes and ears. 1167 00:54:07,580 --> 00:54:11,600 So you had a camera on you that is observing the scene all 1168 00:54:11,600 --> 00:54:15,710 the time and providing you real-time information whenever 1169 00:54:15,710 --> 00:54:19,030 you need the information, like the people that you meet. 1170 00:54:19,030 --> 00:54:22,040 What were the recent tweets of those people that you met? 1171 00:54:22,040 --> 00:54:23,780 What is common between you and them 1172 00:54:23,780 --> 00:54:26,350 based on Facebook and LinkedIn and so forth? 1173 00:54:26,350 --> 00:54:28,100 So knowing more about the people you meet. 1174 00:54:28,100 --> 00:54:30,410 Knowing more about the stuff that you are doing. 1175 00:54:30,410 --> 00:54:32,480 And creating an archive of everything you 1176 00:54:32,480 --> 00:54:35,509 are doing throughout the day. 1177 00:54:35,509 --> 00:54:36,800 And this is a device like this. 1178 00:54:36,800 --> 00:54:40,400 This is how it looks. 1179 00:54:40,400 --> 00:54:42,410 We call it Cassie. 1180 00:54:42,410 --> 00:54:43,580 So this is a real device. 1181 00:54:46,220 --> 00:54:51,150 It works continuously for about 13 hours. 1182 00:54:51,150 --> 00:54:54,170 So you have a camera working continuously for 13 hours.
1183 00:54:54,170 --> 00:54:56,150 And the purpose of this camera-- 1184 00:54:56,150 --> 00:54:58,120 so the way-- you put it like this. 1185 00:54:58,120 --> 00:54:59,060 OK? 1186 00:54:59,060 --> 00:55:01,920 So the purpose of the camera is not to take pictures. 1187 00:55:01,920 --> 00:55:03,410 It doesn't store any pictures. 1188 00:55:03,410 --> 00:55:06,110 The purpose of the camera is to be a sensor, 1189 00:55:06,110 --> 00:55:09,470 to interpret the visual world and provide information 1190 00:55:09,470 --> 00:55:11,190 in real time. 1191 00:55:11,190 --> 00:55:13,070 And if everything goes well, we'll 1192 00:55:13,070 --> 00:55:16,940 start launching this within six months from now. 1193 00:55:16,940 --> 00:55:20,510 So this is the next big thing, to go into wearable computing, 1194 00:55:20,510 --> 00:55:24,800 to go into a domain in which a camera is on you and processing 1195 00:55:24,800 --> 00:55:26,280 information all the time. 1196 00:55:26,280 --> 00:55:29,530 Unlike the camera on your smartphone now, where 1197 00:55:29,530 --> 00:55:31,860 you take a picture on demand. 1198 00:55:31,860 --> 00:55:33,507 That's not working for you all the time. 1199 00:55:33,507 --> 00:55:35,090 Here it's working for me all the time. 1200 00:55:35,090 --> 00:55:37,490 All the time it's viewing the visual field. 1201 00:55:37,490 --> 00:55:39,440 Whenever it finds something interesting 1202 00:55:39,440 --> 00:55:41,330 it will send it to my smartphone, 1203 00:55:41,330 --> 00:55:45,590 like people that I meet and other activities that I do. 1204 00:55:45,590 --> 00:55:49,970 So this will be the beginning of real wearable computing. 1205 00:55:49,970 --> 00:55:51,505 So wearable computing with sensing, 1206 00:55:51,505 --> 00:55:55,310 with the ability to hear and see, 1207 00:55:55,310 --> 00:55:57,450 and process information in real time. 1208 00:55:57,450 --> 00:55:59,240 This is the next thing that-- 1209 00:55:59,240 --> 00:56:02,980 the next big challenge that we are working on.
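To make the "camera as a sensor, not a picture-taker" idea concrete, here is a minimal sketch of such a continuous assistant loop. It is illustrative only: the camera, perception, and output objects and their method names (capture, detect_pointing_finger, read_text, recognize_face, caption_scene, speak, notify_phone) are hypothetical placeholders standing in for real vision models and hardware, not OrCam's or the Cassie device's actual API.

import time

def assistant_loop(camera, perception, output, chatting_mode=False):
    """Interpret the visual field continuously; store nothing, report only what matters."""
    while True:
        frame = camera.capture()                               # grab the current view, then discard it
        region = perception.detect_pointing_finger(frame)
        if region is not None:                                 # user points at text: read it aloud
            output.speak(perception.read_text(frame, region))
        else:
            name = perception.recognize_face(frame)
            if name is not None:                               # familiar face: whisper the name
                output.speak(f"{name} is in front of you")
                output.notify_phone(f"You met {name}")         # push a note to the phone
            elif chatting_mode:                                # disoriented: describe the scene
                output.speak(perception.caption_scene(frame))
        time.sleep(1.0)                                        # roughly once per second, as mentioned above

The same loop covers both directions described in the talk: the chatting mode for the visually impaired, where the device keeps describing the scene, and the always-on wearable for sighted users, where only notable events are pushed to the phone.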