The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

AMNON SHASHUA: So unlike most of the talks that you have been given, I'm not going to teach you anything today. It's not going to be a teaching type of talk. It will be more toward looking at the crystal ball and trying to see how the future will unfold, a future where computer vision is a major agent in this transformative future.

So I'll start with transportation. This is the field where Mobileye is active. And then I'll move toward wearable computing, the field where OrCam is active. These are two companies that I co-founded, Mobileye in 1999 and OrCam in 2010.

Before that, just a few words about computer vision. I'm assuming that you all know about computer vision. It's the science of making computers see and extract meaning out of images, out of video. This is a field that in the past 20 years, through machine learning, has made a big jump. And in the past four years, through deep learning, it has made another jump, where there are certain narrow areas in computer vision and perception where computers reach human level perception and even surpass it. Facial recognition is one of those areas. And the belief is that in many narrow areas in computer vision, within the next five years we'll be able to reach human level perception. So it's a major branch of AI. It goes together with machine learning and, as I said, has made major progress.

And one very important thing, which is relevant to the industrial impact of computer vision, is that cameras are the lowest cost sensor that you can imagine. A camera sensor costs a few dollars. A lens costs a few dollars. All the rest is computing. And every sensor needs computing.
So if you can reach human level perception with a camera, you have a sensor whose cost is so low that it can be everywhere. And this is very, very important.

So I'll show you where things are standing in terms of avoiding a collision. For collision avoidance, you have a camera behind the windscreen, facing forward, analyzing the video coming from the camera. And the purpose of this analysis is to avoid collisions. So what does it mean to avoid collisions? The software needs to detect vehicles, it needs to detect pedestrians, lane markings, traffic signs, and traffic lights, and detect the lanes so it knows where the car is positioned relative to the lanes, and then send a signal to the car control systems to avoid an accident.

So let's look under the hood at what this means. I'll let this run a bit until all the information appears. So if we stop here, what do we see? The bounding boxes around cars mean that the system has detected cars. Red means that this vehicle is in our path. The green line here is the detection of the lane. This is a no-entry traffic sign. This is a traffic light being detected here. These are the pedestrians and cyclists. Even a pedestrian standing here is being detected. Let's let this run a bit further. All right, so these are pedestrians crossing the street. This is running at about 36 frames per second. So now imagine also the amount of computation that is running here. Again, this is the traffic sign, traffic light, pedestrians, pedestrians here.

So this is what the system does today: detect objects, detect lane marks, measure distances to the objects. And in case you are about to hit an object, the car will engage. At first it will give warnings. Then later it will apply automatic, autonomous braking in order to avoid the accident.
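To make the warn-then-brake escalation he describes concrete, here is a minimal sketch of that kind of decision layer, built around time to collision. The function names and thresholds are illustrative assumptions, not Mobileye's production logic.

```python
# Minimal sketch of a forward-collision decision layer, as described above:
# given a detected lead object's range and closing speed, compute time to
# collision (TTC) and escalate from warning to partial to full braking.
# The thresholds below are illustrative assumptions, not production values.

def time_to_collision(range_m: float, closing_speed_mps: float) -> float:
    """Seconds until contact if nothing changes; infinite if not closing."""
    if closing_speed_mps <= 0.0:
        return float("inf")
    return range_m / closing_speed_mps

def collision_response(range_m: float, closing_speed_mps: float) -> str:
    ttc = time_to_collision(range_m, closing_speed_mps)
    if ttc > 2.7:          # comfortable margin: do nothing
        return "none"
    if ttc > 1.6:          # driver can still react: audible/visual warning
        return "forward_collision_warning"
    if ttc > 0.9:          # driver likely too late: partial autonomous braking
        return "partial_braking"
    return "full_braking"  # last resort: hard braking to avoid or mitigate

# Example: a car 20 m ahead, closing at 15 m/s (54 km/h) -> TTC ~1.3 s
print(collision_response(20.0, 15.0))  # "partial_braking"
```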
And here is a list of many, many functions that the camera does in terms of detecting objects and trying to interpret the visual field, at a level of detail that is increasing over the years.

Now, computer vision is also creating a disruption. If you had asked an engineer, say, 15 years ago, what is a camera good for in this space, the engineer would say the camera is good for detecting lanes, because there's no other sensor that can find the lane marks, not a radar, not a laser scanner. And it may be good for helping radar-camera fusion, to compensate for shortcomings of the radar. Traffic signs, OK, it will be good for traffic signs. But that's it.

But what happened over the years is that the camera slowly started taking territory from the radar, until today the camera is really the primary sensor for active safety. Active safety is all this area of avoiding accidents. And you can see this through this chart. In 2007 we launched the first camera-radar fusion. So there's no disruption there. This is what people would normally think a camera is good for, combining with a radar. In 2008, the camera is also doing traffic sign recognition. No disruption there. In 2010, the camera is doing pedestrian detection. No disruption there, because there's no other sensor that can reliably detect pedestrians: pedestrians reflect radar very, very weakly, and pedestrians are mostly stationary objects, and radars are not good at detecting stationary objects.

But then in 2011, there's the first camera-only forward collision warning. And that was the beginning of a disruption. Forward collision warning is to detect a vehicle in front and provide a warning if you are about to collide with that vehicle. And this was a function that typically was in the territory of radars.
A radar sensor is very good at detecting vehicles and very good at ranging. It can very accurately get the range of a vehicle, say, 100 meters away, up to an accuracy of a few centimeters. No camera can reach those accuracies. So nobody believed that one day a camera would take over from the radar and do this function. And this is what happened in 2011.

And why did this happen? This happened because of a commercial constraint. The regulator, the American regulator, the National Highway Traffic Safety Administration, NHTSA, decided that by 2011 all cars need to have, as an option, two functions: forward collision warning and lane departure warning. Now, this creates a problem, because forward collision warning requires a radar and lane departure warning requires a camera. So now you put two sensors in the car, and it's expensive. If you can do it with one sensor, like with a camera, then you save a lot of money. So this pushed the car industry to adopt the idea that the camera can do forward collision warning.

And like all disruptions, once you start small you grow very, very fast. So in 2013 the camera is not only providing warnings, but also keeping a safe distance to the car in front. It's called adaptive cruise control. Then 2013 also brings emergency braking. So the camera not only decides that you're about to collide with a vehicle, it will also apply the brakes for you. In 2013 it was only partial braking, to avoid an accident up to 30 kilometers per hour. And then in 2015, this was a few months ago, the camera is now involved in full braking. That's one g of braking, avoiding an accident at about 70 to 80 kilometers per hour and mitigating an accident up to 220 kilometers per hour, just with the camera. So the camera is taking over and becoming the primary sensor in this area of active safety.
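Since he contrasts radar's centimeter-level ranging with what a camera can do, it may help to see how a single camera estimates range at all. Below is a sketch of one common monocular approach, using a pinhole-camera model and a flat-road assumption; the calibration constants are assumed values for illustration, not Mobileye's actual method.

```python
# One common way a single forward camera estimates range, sketched under a
# pinhole-camera model and a flat-road assumption: the image row where the
# target vehicle meets the road determines its distance. The calibration
# values below (focal length, camera height, horizon row) are assumptions.

FOCAL_LENGTH_PX = 1000.0   # focal length in pixels (assumed calibration)
CAMERA_HEIGHT_M = 1.2      # camera height above the road (assumed)
HORIZON_ROW_PX = 240.0     # image row of the horizon (assumed)

def range_from_bottom_row(bottom_row_px: float) -> float:
    """Distance to the point where the target touches the road.

    Under the pinhole + flat-road model:  Z = f * H / (y_bottom - y_horizon).
    Small errors in the bottom row translate into large range errors at long
    distances, which is why monocular ranging is coarser than radar.
    """
    dy = bottom_row_px - HORIZON_ROW_PX
    if dy <= 0:
        return float("inf")   # target appears at or above the horizon
    return FOCAL_LENGTH_PX * CAMERA_HEIGHT_M / dy

# Example: a bounding box whose bottom edge is 12 pixels below the horizon
print(round(range_from_bottom_row(252.0), 1))  # ~100.0 m
```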
Now, why is that? As I said, these are the milestones of the camera disruption. First, the camera has the highest density of information as a sensor. With a laser scanner or a radar, the number of pixels per angle, per degree, is much, much smaller. It's orders of magnitude smaller than a camera. So you have a lot of information from the camera. It's the lowest cost sensor. And also, cameras are getting better in terms of performance under low light. So with a camera today you can do much more, not only because computing has progressed, and not only because algorithms are now better, but also because the physics of the camera is progressing over time, especially the light sensitivity of the camera.

We also came to the conclusion that we need to build our own hardware and our own chip. These are very, very advanced microprocessors that, per silicon area, are about 10 times more efficient than any general purpose chip. And I'll not spend more time on this.

So this field has two major trends. One is active safety, which is driven by regulators. The regulators see that there is a sensor that is very low cost and saves lives. So what does the regulator do? They incentivize this kind of function to the car industry by coupling it to star ratings. So if you want to get your four stars or five stars, the NCAP stars on the car, you have to have this kind of technology as a standard fit in the car. So this pushes the industry by mandates. It pushes the industry to have active safety installed in every car. So by 2018 every new car will have such a system.

The other trend is the trend toward the future, which is autonomous driving. Now, autonomous driving has two facets. One is bringing the probability of an accident to an infinitesimally small probability. So zero accidents. Because the more you delegate the driving experience to a robotic system, the less the chance of an accident.
So it brings us to an era where there will be no accidents. But not less importantly, it has the potential to transform the entire transportation business: how we own cars, how we build cars, the number of cars that will be produced. And I'll spend a bit more time on that as I go forward.

Now, in terms of the regulation side, this is an example. You see here a Nissan Qashqai 2014 has five stars. And to know how it got the five stars, what you see here are all the tests. These are autonomous emergency braking tests. The car needs to detect the car in front, the target car, and apply the brakes before the collision. And the car is being tested. Without that it will not get the five stars.

You can see this also in the number of chips that have been shipped. Every car has a chip. This chip, the microprocessor, gets the information from the camera, and all the algorithms are on this microprocessor. We started launching this in 2007. In the first five years there were one million chips, so one million cars with the technology. And then in 2013 alone, 1.3 million. Then you see here, in 2014, 2.7 million. This year is going to be about five million. So you see this doubling. And this is really the effect of the regulation. In many industries regulation is an impediment. In this industry, regulation is something good. It pushes the industry to install these kinds of systems as standard.

OK. Another example of how this is moving: there's also an increasing awareness. This is a commercial from the 2014 Super Bowl by Hyundai. Hyundai is showcasing their new vehicle called Genesis. Now, there are many things that you can show when you want to showcase a new vehicle. You can talk about the design of the vehicle. You could talk about the engine, the infotainment. But they chose to talk about the active safety. So I'll show you.
[VIDEO PLAYBACK]

- Remember when only Dad could save the day? Auto emergency braking on the all new Genesis from Hyundai.

[END PLAYBACK]

AMNON SHASHUA: OK. So this is the camera behind the windscreen, detecting the car in front, or a pedestrian, and it will brake before the collision.

Now, to show you what this is about: that was the commercial, and in a commercial you can show anything you like. So now I'll show you something really from the field. In 2010 Volvo introduced the first pedestrian detection. The same thing: detect a pedestrian, and if you are about to collide with a pedestrian the car would brake, apply the brakes automatically. So in 2010 they had about 5,000 journalistic events, where they put a reporter behind the steering wheel, told the reporter to drive toward a mannequin, toward a doll, and lo and behold, the car would brake just before, a fraction of a second before, you hit the doll. But then, when you buy the car, you can do your own testing. So I downloaded this from the internet. It's a clip of a bunch of Polish guys. It's a bit funny, but you'll actually get a good feeling of what this system does by looking at this clip.

OK? So this is automatic emergency braking. Today it works; it avoids accidents at up to about 70 kilometers per hour.

OK? So now you have a better idea of what I'm talking about. So now let's go into the future. This was just setting the baseline: what is active safety, and where is computer vision inside this? So now let's look at the next four years. And the idea is to evolve this kind of technology to a point where you can delegate the driving experience to a robotic system. And then the question is what needs to be done. This slide shows that there are two paradigms, and the reality is somewhere in between these two paradigms.

The right hand side is where we are today. You are based only on sensing. You have a camera.
Maybe you also have a radar or a laser scanner for redundancy. You get the information from the sensors. You have algorithms that try to interpret the visual field and take action in case of an accident, or control the vehicle.

On the left hand side is the extreme case, the Google approach, where there is little sensing involved. It's a lot of recording. So you prerecord your drive. Once you have prerecorded the drive, all you need to do is to match your sensing to the prerecorded drive. Once you've found the match, you know your position exactly. So you don't need to detect lanes. You know all the moving objects, because the recording contains only stationary objects, so all the moving objects pop out. So the load on the sensing is much, much smaller than in the case where you didn't do a pre-drive and you didn't record. The problem with the recording is that we are talking about tons of data. It's a 360 degree, 3D recording, at several frames per second. So the amount of data is huge. So there are issues of how you manage this, how you record it, and how you update it over time, because you have to continuously update this kind of data. And reality is going to be somewhere in between.
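As a rough illustration of how "moving objects pop out" once you are localized against a prerecorded map of stationary structure, here is a toy sketch. The grid resolution, threshold, and data layout are assumptions made purely for illustration; real systems are far more elaborate.

```python
# A toy sketch of the "prerecorded drive" idea described above: once the car
# is localized against a prerecorded map of stationary 3D points, anything in
# the current scan that has no nearby counterpart in the map "pops out" as a
# potentially moving object. The grid resolution below is an assumption.

def voxel(point, res=0.5):
    """Quantize a 3D point (x, y, z) in meters to a coarse grid cell."""
    return tuple(int(c // res) for c in point)

def moving_candidates(current_points, prerecorded_points, res=0.5):
    """Return current points that fall in grid cells absent from the map."""
    static_cells = {voxel(p, res) for p in prerecorded_points}
    return [p for p in current_points if voxel(p, res) not in static_cells]

# Example: the map knows about a wall; the current scan also sees a pedestrian.
wall = [(10.0, y * 0.5, 1.0) for y in range(10)]
scan = wall + [(6.0, 1.0, 0.9)]           # the extra point is not in the map
print(moving_candidates(scan, wall))      # [(6.0, 1.0, 0.9)]
```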
So the first leap that is happening in the next five years is to reach human level perception. Now, it sounds very, very ambitious, but there are lots of indications that it is not science fiction. There is a very high probability that one can reach this. In certain areas, like face recognition and certain categorization tasks, if you look at the academic achievements, they have surpassed human level perception. I'll spend a few slides on this later.

So going from driver assist to human level perception, first, we need to extend the list of objects. Not only vehicles and pedestrians, but vehicles at any angle; knowing about 1,000 different object categories in the scene; knowing how to predict a path using context, which today is not being used; detailed road interpretation, knowing about curbs and barriers and guardrails. It's all the stuff that, when we look at the road, we naturally interpret very, very easily. These are the things that need to be done in order to reach human level perception. And the tool to do this is deep layered networks, which I'll spend a few slides on in a moment.

And the need for context: these are examples. For example, path planning. You want to fuse all the information available from the image, not only to look for the lanes, because in many situations you look at an image and you don't see lanes. But a human observer would very easily know where the path is just from looking at the context. In modeling the environment, ultimately every pixel gives you a category: tell me where this pixel is coming from, a pedestrian, a vehicle, the inside of a vehicle, a barrier, a curb, a guardrail, a lamp post, and so forth. 3D modeling of a vehicle: put a 3D bounding box around the vehicle so that we can know which side of the vehicle I'm looking at, whether it's the front, or rear, left side, right side, what the angle is. Know everything about vehicles as moving objects, and do a lot of scene recognition. I'll give some examples about that later.

So, deep networks. I know that you all know about deep networks. I'll just spend a few slides to state what the impact is there, not the impact from the point of view of a scientist, but the impact from the point of view of a technologist, because there isn't much science behind this. So the real turning point was 2012.
In 2012, the AlexNet: they built a convolutional net that was able to work on the ImageNet data set and reach a performance level which was more or less double the performance level of what was done before. This is another network, by Fergus, a very, very similar concept of convolution and pooling: convolution, pooling, two or three dense layers, and you get the output.

This is the ImageNet data set. You have about 1,000 categories over one million images. And these categories are very challenging. You look at the images of a sailing vessel or images of a husky, and the variation is huge. It's a really very difficult task. In 2011, the top five-- so the task is that you need to give a shortlist of five categories, and if the correct category is among the top five then you succeeded. And the performance was about 26% error. And this AlexNet reached 16%. So it's almost double the performance. So this caught the attention of the community. It's a big leap from 26% to 16%.

Now, if you look at what happened since then: in 2012, for this ImageNet competition, one out of six competitors used deep networks. A year later, 17 out of 24 competitors used deep networks. A year later, 31 out of 32 were using deep networks. So deep networks basically took over. If you look in terms of the performance, the human performance is about 5%. And right now we are at 6%, 5%, from the latest 2015 competitors. People started cheating; I think Baidu was caught cheating on this test. So I think 5% is more or less where things are going. And this is human level perception.
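The "top five" protocol he describes is easy to state in code. Here is a generic sketch of the metric, with made-up scores as an example; it is not the official ImageNet evaluation script.

```python
# A generic sketch of the ImageNet "top-5" protocol described above: a
# prediction counts as correct if the true label appears anywhere in the
# model's five highest-scoring categories. Not the official evaluation code.

def top5_error(predictions, true_labels):
    """predictions: one list of per-category scores per image."""
    errors = 0
    for scores, truth in zip(predictions, true_labels):
        top5 = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:5]
        if truth not in top5:
            errors += 1
    return errors / len(true_labels)

# Tiny example with 8 categories and 2 images:
preds = [[0.1, 0.5, 0.9, 0.2, 0.4, 0.3, 0.8, 0.7],   # truth 3 is not in the top 5
         [0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]]   # truth 0 is the top score
print(top5_error(preds, [3, 0]))  # 0.5
```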
Another big success was face recognition. This is a data set called Labeled Faces in the Wild, which contains pictures of celebrities, where for every celebrity you have pictures along a spectrum of many, many years. You can see the actor when he was 20 years old and then when he's 70 or 80 years old. Even for humans, this task is quite challenging: knowing whether two pictures are of the same person or not. And the human level performance is 97.5% correct. Now, if you look at techniques not using deep networks, they reached 91.4%. And in 2014 a group from Facebook and Lior Wolf from Tel Aviv University built a deep network to do face recognition and reached 97.3%, which is very, very close to human perception. And since then people have reached 99% on this database. And again, human level perception is 97.5%. So this is another area for these deep networks.

Also in speech: this is a recent paper by Baidu, headed by Andrew Ng. Just doing an end to end network, which also learns the structured prediction, they surpassed Siri, Cortana, and Google Now in performance. OK?

So the impact for automotive is that networks are very good at multi-class problems. The more categories you have, the better the performance of the network will be. They are very good at using context, for imagining or planning a path: taking an image as the input, with the output being the path, and cutting short all the processes of looking for lanes and those kinds of algorithms. Networks are ideal for pixel level labeling: for every pixel, give me a category. And you can use the networks for sensor integration, for determining the control of the vehicle by fusing a lot of information coming from various cameras.

The challenge of using deep networks is that deep networks are very, very large. They're not designed for real time. The networks that you find in academic papers, and the successes, are for easy problems. The problems that I've shown right now, the ImageNet, the face recognition, are considered relatively easy problems in the context of interpreting the image for autonomous driving.
So let me show you the things that one can do. Let's start with path planning. In this clip that I'll show you, the purpose of the network is to determine the path. That is the green line. Now, these clips are from scenes where it would be impossible to detect lanes, because there are simply no lanes. If you look at this, any lane detection system would find nothing in this kind of scene. Yet when you look at this image, you have no problem in determining where the path is, because you're looking at the entire image context. And this is what the network is doing. The input layer is the image; the output layer is this green line.

Or, for example, look at this urban setting. There are no lanes in an urban setting. Yet the system can predict where the path is by fusing information from the entire context. These are roads in California where they have these reflectors called Botts' dots. It's almost impossible to reliably fit lanes to this kind of information. Yet if you look at this holistic path planning, it can reliably tell you where the path is.
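Here is a minimal sketch, in PyTorch, of that "image in, path out" idea: a small convolutional network that regresses the lateral offset of the path at a few fixed look-ahead distances instead of first detecting lane marks. The layer sizes and the path parameterization are illustrative assumptions, not Mobileye's actual architecture.

```python
# A minimal sketch of holistic path planning as described above: a small
# convolutional network maps a camera frame directly to the lateral offset of
# the drivable path at a few fixed look-ahead distances. Layer sizes and the
# path parameterization are illustrative assumptions.

import torch
import torch.nn as nn

class PathRegressionNet(nn.Module):
    def __init__(self, num_lookahead_points: int = 10):
        super().__init__()
        self.features = nn.Sequential(             # convolution stack
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 8)),
        )
        self.head = nn.Sequential(                 # dense layers -> path
            nn.Flatten(),
            nn.Linear(64 * 4 * 8, 256), nn.ReLU(),
            nn.Linear(256, num_lookahead_points),  # lateral offset per distance
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(image))

# Example: one 3x256x512 camera frame -> 10 lateral offsets (in meters)
net = PathRegressionNet()
offsets = net(torch.zeros(1, 3, 256, 512))
print(offsets.shape)  # torch.Size([1, 10])
```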
Let's look at free space. The idea of free space is that when you want to do autonomous driving you need to know where not to drive. Right? You don't want to drive toward the curb. It's not only that you don't want to hit other moving objects; that's the easy part. You don't want to hit a barrier or a guardrail. So you want to know where the free space is. So you can think of a network that for every pixel will give you a label. And let's now focus only on the label of road versus not road. So all the pixels in green are road. Everything else is not road. You can see that the green is not going over the curb, which is nice.

But let's have it run a bit more, and then I'll stop it at a place where you'll see the power of context. Say I stop it here. Now look at the sidewalk there. The color of the sidewalk and the color of the road are identical. The height of the curb is about one centimeter. So it's not the height here, the geometry; it's basically the context. The network figured out, because there is a parked car there, that that part is not part of the road. So in order to make this judgment correctly, one needs to not just look at a small area around the pixel and decide whether it's road or not road. One needs to collect information from the entire image. This is the power of context. And this is something that the network can do.

You can see here the blue and red lines. Red means it's on a vehicle. Blue means it's on a physical barrier. So if I run this back here-- and this is done frame by frame, so it's a single frame thing. Same thing here: this height is one or two centimeters, and the color of the sidewalk and the color of the road are identical. So being able to make the correct judgment here is very, very challenging. And this is where a network can succeed. Here the network also predicts that this is the code for a curb. The red is the side of a vehicle or the front of the vehicle. And in the next one it predicts that this is part of a guardrail; the coding of this is part of a guardrail. So the system has about 15 categories: guardrail, curb, barrier, and so forth. Let's keep the questions for later.

So this is one area we call semantic free space. For every pixel in the scene, tell me what it is. Of course, first and foremost I'm interested to know where the road is, and then, at the edges of where the road ends, to know what the label is. Is it the side of a vehicle, the front of a vehicle, the rear of a vehicle? Is it a curb, barrier, guardrail, and so forth? And this, again, is done by a deep network. I'll skip this one.
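To make the per-pixel labeling concrete, here is a minimal PyTorch sketch of a small fully convolutional network that outputs a score for each of roughly 15 categories at every pixel. The category count and architecture are illustrative assumptions, not the system described in the talk.

```python
# A minimal sketch of semantic free space as described above: a small fully
# convolutional network assigns one of ~15 categories (road, curb, barrier,
# guardrail, vehicle side, ...) to every pixel. Architecture is an assumption.

import torch
import torch.nn as nn

class SemanticFreeSpaceNet(nn.Module):
    def __init__(self, num_classes: int = 15):
        super().__init__()
        self.encoder = nn.Sequential(              # shrink resolution, grow context
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(              # back to full resolution
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(image))   # (batch, classes, H, W)

# Example: label every pixel of a 3x128x256 frame, then take the arg-max class
net = SemanticFreeSpaceNet()
scores = net(torch.zeros(1, 3, 128, 256))
labels = scores.argmax(dim=1)                      # class index 0..14 per pixel
print(scores.shape, labels.shape)  # torch.Size([1, 15, 128, 256]) torch.Size([1, 128, 256])
```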
And then you can apply this with cameras at any angle. This is a camera looking at a corner, looking at 45 degrees to the right. So the system can know where the free space is. This is a camera from the side, with a fisheye lens. Again, using the same kind of technology, the system can know where the free space is. Same thing here. Here as well, day and night.

3D modeling. 3D modeling is to be able to put a bounding box, a 3D bounding box, around the vehicle. And the color coding here is that green is the front, red is the rear, blue is the right hand side, and yellow is the left hand side. If you let this run-- all right. Now, the importance of putting a 3D bounding box around the vehicle is that now you can place a camera at any angle. So it's not only a camera looking forward, but a camera at any angle, because the way a vehicle is defined is invariant to the camera position. So this is kind of a preparation for putting cameras all around the vehicle, at 360 degrees.

Scene recognition: for example, knowing that this is a bump is also done by a network that takes an image and outputs where the bumps are. Same thing here. More complicated than that is knowing where this stop line is. So when you go and detect traffic lights, detecting the traffic lights is the easy problem. A more complicated problem is to know the relevancy of the traffic lights, which traffic light is relevant to which direction. The third one, the most difficult problem, is to detect the stop line. The problem with the stop line is that when you see the stop line it's a bit too late. You see the stop line 20 to 30 meters away. So it's too late to start stopping and have a smooth stop. You want to predict where the stop line is 60 to 70 meters away. So here, you want your algorithm, or your network, to understand that you are approaching a junction and start estimating where the stop line should be, so you can start slowly reducing your speed, such that by the time you see where the stop line is you have already reduced your speed considerably. I'll skip this.
Lane assignment: knowing how many lanes there are and which lane you are in is also done by a network. The network will give a probability of whether this is a lane, this is a lane. For example, it knows that this is not a lane; it has here red, zero probability. So as you can see here-- I'll skip this one. So for every task there is a network. And these networks are quite sophisticated in accessing and integrating context. I'll skip this one with the traffic light.

So, multiple cameras. This is how it looks. The red ones are three cameras behind the windscreen. One is about 180 degrees, the other one is about 50, and the third one is about 25 degrees. And then there are another five cameras around the car that give you the full 360 degrees. And this kind of configuration, the first launch of it in a series-production car, is going to be in 2016. So I'm not talking about science fiction. This is how images look from some of these cameras.

So let me show you a first clip of automated driving. This is kind of a funny clip. This is an actor who played a major role in Star Trek, so I'll not say his name. Let's see whether you can identify him yourself. And he has a program for kids called Reading Rainbow. This program is 20 years old. And he came to Israel, and he wanted to drive the autonomous vehicle that we have, for his kids' program. So he was driving my car. My car is autonomous. I can drive from Tel Aviv to Jerusalem without touching the steering wheel. I do that all the time. So he was driving it. And it's a bit funny, but you'll get a feeling of what this is. So let's run this. It's two minutes.
[VIDEO PLAYBACK]

- Yes. They can. That's because technology companies, like Mobileye here in Israel, are about to introduce self-driving technologies to the world.

AMNON SHASHUA: You know who he is?

- In the not too distant future, just like in a science fiction movie, a driver will be able to hop in a car, tell it where you want it to go, and voila, the car will do the rest. So right now I'm driving like everybody does. My hands are on the steering wheel and my foot is on the brake, or the pedal, as required. And I'm in control of the car. But when I take my foot off the pedal and do this, now the car is driving itself. Wow. This really is amazing. I feel really safe with the car doing all of the driving. OK. Now watch this. And this is something that no one should ever do in a regular car, ever. Wow. That was freaky.

[END PLAYBACK]

AMNON SHASHUA: OK? So, does anyone from the young people know who he is? This is Geordi, from Star Trek. He had this visor. He was blind. He had a visor.

OK. So let's spend a few minutes talking about what the impact of autonomous driving is and how it's going to unfold. This is far from science fiction. It's actually unfolding as we speak. The first hands-free driving on highways is coming out now. The first one is Tesla. They have already launched; they made this public a week or two ago. Their first beta drivers are driving with the system. And I presume within a month it will also be installed for all other drivers. And with this you can drive hands-free on a highway, at unlimited speed. So you can drive at highway speeds, let go of the steering wheel, and the car will drive. GM already announced that in the middle of 2016 they will have Super Cruise, more or less the same kind of functionality. Audi also announced 2016.
And these are just the first comers. We are working with about 13 car manufacturers that will have this kind of capability within the next three to four years. So this will be in the mainstream. Now, what I put there in red is that the driver still has primary responsibility and has to be alert. That means that the technology is not perfect. It could make mistakes. Therefore, the driver still has the primary responsibility. So at this stage there's no disruption here. It's just a nice feature to have. For the car industry, this is the first step toward practicing for autonomous driving.

The second step starts in 2016, and this is with the eight cameras that I showed you a slide before. Here, the car can drive autonomously from highway to highway. So on-ramps and off-ramps are done autonomously. With Google Maps or whatever navigation program, you chart your route, and until the car reaches city boundaries it will go autonomously. From highway to highway it will switch, and do that autonomously. Still, the driver has primary responsibility and is alert. So nothing here is transformative. It's a nice feature. Again, it's part of a phased approach of the car industry to start practicing.

Starting from 2018 will come the first small disruption. The first small disruption is that technology will reach a level at which the driver is responsible, the driver must be there, but not necessarily alert. So it means that the driver is an attendant. The driver is monitoring, just like a pilot sitting in an airplane while the plane is on autopilot. The driver needs to be there in case there is a problem. The system will give a grace period of time until the driver needs to take back control. So it's not a matter of taking control in an instant, immediately.
784 00:37:47,130 --> 00:37:50,200 And so this transition from primary responsibility 785 00:37:50,200 --> 00:37:52,990 to monitoring, like in aviation, will be 786 00:37:52,990 --> 00:37:56,297 the first disruption, the beginnings of a disruption. 787 00:37:56,297 --> 00:37:58,630 So let's try to imagine what kind of disruption this is. 788 00:37:58,630 --> 00:38:01,180 So let's take Uber as an example. 789 00:38:01,180 --> 00:38:05,200 So today, you own a car. 790 00:38:05,200 --> 00:38:06,560 You have free time, 791 00:38:06,560 --> 00:38:09,280 say between 3:00 PM and 5:00 PM. 792 00:38:09,280 --> 00:38:13,960 So you take your car, open the Uber app, take passengers, 793 00:38:13,960 --> 00:38:16,030 and earn some money. 794 00:38:16,030 --> 00:38:17,290 That's Uber today. 795 00:38:17,290 --> 00:38:21,070 Now let's look at 2018 - 2019. 796 00:38:21,070 --> 00:38:23,380 You have zero skills and you don't have a car. 797 00:38:23,380 --> 00:38:25,420 All you have is a driver's license. 798 00:38:25,420 --> 00:38:28,480 So you are willing to be an attendant. 799 00:38:28,480 --> 00:38:30,900 So you say, OK, now I have free time. 800 00:38:30,900 --> 00:38:33,559 An Uber car would come with an attendant. 801 00:38:33,559 --> 00:38:35,100 You switch places with the attendant. 802 00:38:35,100 --> 00:38:37,867 You sit behind the steering wheel and you do nothing. 803 00:38:37,867 --> 00:38:38,950 You don't control the car. 804 00:38:38,950 --> 00:38:41,190 You don't control which passengers 805 00:38:41,190 --> 00:38:43,660 are being taken by the car. 806 00:38:43,660 --> 00:38:45,580 You simply sit there. 807 00:38:45,580 --> 00:38:49,160 Zero skills, therefore your payment is very, very small. 808 00:38:49,160 --> 00:38:52,750 So now these cars can drive 24/7 because the attendant can 809 00:38:52,750 --> 00:38:56,600 be replaced every hour or so. 810 00:38:56,600 --> 00:38:59,830 So here we have another business model 811 00:38:59,830 --> 00:39:02,620 which makes this public transportation, the Uber type 812 00:39:02,620 --> 00:39:05,050 of public transportation, now much more powerful 813 00:39:05,050 --> 00:39:06,740 than it is today. 814 00:39:06,740 --> 00:39:09,126 So this is kind of the beginning of disruption. 815 00:39:09,126 --> 00:39:10,250 What will be the next step? 816 00:39:10,250 --> 00:39:16,630 The next step, 2020-2022: imagine that a driverless car 817 00:39:16,630 --> 00:39:19,730 can drive without passengers. 818 00:39:19,730 --> 00:39:22,480 So this is one step before you can allow a car 819 00:39:22,480 --> 00:39:23,850 to drive autonomously. 820 00:39:23,850 --> 00:39:26,920 So without passengers means that all you need to prove 821 00:39:26,920 --> 00:39:28,510 is that the car, your car, would not 822 00:39:28,510 --> 00:39:30,860 hit other cars or pedestrians. 823 00:39:30,860 --> 00:39:33,387 But if it hits infrastructure, nobody gets 824 00:39:33,387 --> 00:39:35,470 killed because there are no passengers in the car. 825 00:39:35,470 --> 00:39:38,110 No passengers meaning nobody in the car. 826 00:39:38,110 --> 00:39:40,542 Now this is already a major disruption 827 00:39:40,542 --> 00:39:43,000 because what it means is that the household does not 828 00:39:43,000 --> 00:39:45,970 need to own multiple cars. 829 00:39:45,970 --> 00:39:46,930 One car is enough. 830 00:39:46,930 --> 00:39:48,280 I drive to work with the car. 831 00:39:48,280 --> 00:39:50,260 I send the car back home.
832 00:39:50,260 --> 00:39:53,470 It takes my wife, takes her to work, comes back home. 833 00:39:53,470 --> 00:39:54,320 You get the picture. 834 00:39:54,320 --> 00:39:57,400 So this is kind of a beginning of a major disruption. 835 00:39:57,400 --> 00:40:01,960 Then, about 2025 - 2030, with sufficient experience 836 00:40:01,960 --> 00:40:04,870 with mapping data, car-to-car communication, 837 00:40:04,870 --> 00:40:06,400 one can imagine how these cars would 838 00:40:06,400 --> 00:40:08,590 drive completely autonomously. 839 00:40:08,590 --> 00:40:12,272 And that is where the major disruption happens. 840 00:40:12,272 --> 00:40:13,420 OK? 841 00:40:13,420 --> 00:40:16,000 So this is autonomous driving. 842 00:40:16,000 --> 00:40:19,900 Let me go to the second part about wearable computing. 843 00:40:19,900 --> 00:40:21,320 And then we can take questions. 844 00:40:21,320 --> 00:40:24,850 So this will be much shorter. 845 00:40:24,850 --> 00:40:28,300 So again, computer vision, but now the camera 846 00:40:28,300 --> 00:40:30,350 is not beside us, like in the car. 847 00:40:30,350 --> 00:40:31,850 The camera is on us. 848 00:40:31,850 --> 00:40:34,930 Now if the camera is on us, the first question 849 00:40:34,930 --> 00:40:38,280 that you would ask is, who needs a camera to be on you? 850 00:40:38,280 --> 00:40:39,280 Right? 851 00:40:39,280 --> 00:40:42,220 So the first market segment for something like this 852 00:40:42,220 --> 00:40:45,130 are the blind and visually impaired. 853 00:40:45,130 --> 00:40:47,690 So the way to imagine this: 854 00:40:47,690 --> 00:40:51,074 you are a visually impaired or blind person, 855 00:40:51,074 --> 00:40:52,990 so you don't see well 856 00:40:52,990 --> 00:40:53,710 or you don't see at all. 857 00:40:53,710 --> 00:40:55,209 So it's very, very difficult for you 858 00:40:55,209 --> 00:40:56,960 to negotiate the visual world. 859 00:40:56,960 --> 00:40:59,980 You cannot read anything unless it's a few centimeters from 860 00:40:59,980 --> 00:41:01,000 your eye. 861 00:41:01,000 --> 00:41:03,220 You cannot recognize people unless they start talking 862 00:41:03,220 --> 00:41:03,719 to you, 863 00:41:03,719 --> 00:41:05,409 so you can recognize their voice. 864 00:41:05,409 --> 00:41:07,450 You cannot cross the street because you don't see 865 00:41:07,450 --> 00:41:08,892 the traffic light. 866 00:41:08,892 --> 00:41:11,350 You cannot get on a bus because you don't know what the bus 867 00:41:11,350 --> 00:41:12,110 number is. 868 00:41:12,110 --> 00:41:14,930 So basically you are very, very constrained, very limited. 869 00:41:14,930 --> 00:41:17,495 Now let's assume that you have a helper standing beside you. 870 00:41:17,495 --> 00:41:20,770 Now this helper is relatively intelligent 871 00:41:20,770 --> 00:41:24,795 and has normal eyesight. 872 00:41:24,795 --> 00:41:26,680 Now the helper looks at you, sees 873 00:41:26,680 --> 00:41:30,020 where you are pointing your hands, for example, 874 00:41:30,020 --> 00:41:32,861 or pointing your gaze, looks at the scene, 875 00:41:32,861 --> 00:41:35,110 understands what kind of information you want to know, 876 00:41:35,110 --> 00:41:37,510 and whispers the information into your ear. 877 00:41:37,510 --> 00:41:39,965 So say you want to catch a bus. 878 00:41:39,965 --> 00:41:42,340 You know that the bus is coming because you hear the bus, 879 00:41:42,340 --> 00:41:43,970 maybe you see a silhouette. 880 00:41:43,970 --> 00:41:45,220 So you look in that direction.
881 00:41:45,220 --> 00:41:47,500 The helper looks at the bus. 882 00:41:47,500 --> 00:41:48,670 It sees that there is a bus. 883 00:41:48,670 --> 00:41:50,410 Tells you what the bus number is. 884 00:41:50,410 --> 00:41:51,679 You want to cross the street. 885 00:41:51,679 --> 00:41:53,720 You know the traffic light is more or less there. 886 00:41:53,720 --> 00:41:55,750 But you cannot-- you don't know what the color of the traffic 887 00:41:55,750 --> 00:41:56,740 light is. 888 00:41:56,740 --> 00:41:58,334 So the helper looks at your gaze, 889 00:41:58,334 --> 00:42:00,000 sees that there's a traffic light there. 890 00:42:00,000 --> 00:42:01,440 Tells you it's a green light. 891 00:42:01,440 --> 00:42:03,730 You're opening a newspaper. 892 00:42:03,730 --> 00:42:05,350 You point somewhere on the newspaper, 893 00:42:05,350 --> 00:42:07,650 the helper would read you the article. 894 00:42:07,650 --> 00:42:09,300 Or there is a street name. 895 00:42:09,300 --> 00:42:11,560 You point towards the street name. 896 00:42:11,560 --> 00:42:13,960 The helper would look at the scene, 897 00:42:13,960 --> 00:42:16,010 understand that there is text in the wild, 898 00:42:16,010 --> 00:42:18,460 and simply read you the street name. 899 00:42:18,460 --> 00:42:22,390 A familiar face appears, the helper will whisper, 900 00:42:22,390 --> 00:42:24,700 you know, Joe has now-- is now in front of you. 901 00:42:24,700 --> 00:42:26,060 And so forth. 902 00:42:26,060 --> 00:42:29,290 So if you now replace this helper with computer vision 903 00:42:29,290 --> 00:42:31,810 you can imagine how this could help someone 904 00:42:31,810 --> 00:42:34,100 who is visually impaired. 905 00:42:34,100 --> 00:42:35,470 So let me show you-- 906 00:42:35,470 --> 00:42:39,530 so first of all, the number of visually impaired is quite big. 907 00:42:39,530 --> 00:42:43,960 So the number of blind people in the US is about 1.5 million. 908 00:42:43,960 --> 00:42:45,100 That's not big. 909 00:42:45,100 --> 00:42:48,310 The number of visually impaired, people whose 910 00:42:48,310 --> 00:42:52,180 impairment cannot be corrected with lenses, 911 00:42:52,180 --> 00:42:54,400 is about 26 million. 912 00:42:54,400 --> 00:42:55,630 So this is a sizable number. 913 00:42:55,630 --> 00:42:58,750 Worldwide it's above 400 million people 914 00:42:58,750 --> 00:42:59,960 who are visually impaired. 915 00:42:59,960 --> 00:43:02,950 And they don't have much technology to help them. 916 00:43:02,950 --> 00:43:05,710 So this is what OrCam is doing. 917 00:43:05,710 --> 00:43:08,767 It's a camera which clips on eyeglasses. 918 00:43:08,767 --> 00:43:11,350 And there is a computing device, which you put in your pocket. 919 00:43:11,350 --> 00:43:13,240 And the way you interact with the device is with your hand, 920 00:43:13,240 --> 00:43:14,110 with your finger. 921 00:43:14,110 --> 00:43:16,750 Because the camera is on you, it can also see your hand. 922 00:43:16,750 --> 00:43:19,330 Once you point, the camera starts 923 00:43:19,330 --> 00:43:21,700 to extract information from the scene and talks 924 00:43:21,700 --> 00:43:23,527 to you through an earpiece. 925 00:43:23,527 --> 00:43:24,610 So let's look at the clip. 926 00:43:24,610 --> 00:43:24,760 [VIDEO PLAYBACK] 927 00:43:24,760 --> 00:43:25,420 - Hi. 928 00:43:25,420 --> 00:43:28,630 I'm Liette and I'm visually impaired. 929 00:43:28,630 --> 00:43:32,539 I want to show you today how this device changed my life. 930 00:43:39,390 --> 00:43:40,781 - Massaryk.
931 00:43:40,781 --> 00:43:41,280 - Great. 932 00:43:41,280 --> 00:43:42,741 Let's go there. 933 00:43:46,150 --> 00:43:46,983 - Red light. 934 00:43:50,230 --> 00:43:51,440 Green light. 935 00:43:57,240 --> 00:43:58,350 50 shekel. 936 00:43:58,350 --> 00:43:59,554 - 50 shekel. 937 00:43:59,554 --> 00:44:00,470 Let's buy some coffee. 938 00:44:05,470 --> 00:44:06,450 - Breakfast. 939 00:44:06,450 --> 00:44:09,087 Bagel plus coffee with cream cheese [INAUDIBLE].. 940 00:44:09,087 --> 00:44:09,670 [END PLAYBACK] 941 00:44:09,670 --> 00:44:10,440 AMNON SHASHUA: OK? 942 00:44:10,440 --> 00:44:12,530 So you get the idea. 943 00:44:12,530 --> 00:44:14,500 So we started in 2010. 944 00:44:14,500 --> 00:44:18,730 By 2013 we already had a prototype working. 945 00:44:18,730 --> 00:44:22,027 And we had a visitor, John Markoff from The New York Times, 946 00:44:22,027 --> 00:44:23,860 and he came and he wrote a very nice article 947 00:44:23,860 --> 00:44:26,002 about what the company is doing. 948 00:44:26,002 --> 00:44:27,460 And we thought that at that time it 949 00:44:27,460 --> 00:44:30,520 would be good to launch the website of the company 950 00:44:30,520 --> 00:44:35,060 and try to get a number, say, the first 100 customers, 951 00:44:35,060 --> 00:44:38,580 so that we could start experimenting, do field studies 952 00:44:38,580 --> 00:44:41,500 with a prototype device. 953 00:44:41,500 --> 00:44:42,670 So we launched the website. 954 00:44:42,670 --> 00:44:45,100 We wrote that the device cost $2,500. 955 00:44:45,100 --> 00:44:46,961 That was June 2013. 956 00:44:46,961 --> 00:44:49,210 And the first 100 people who purchased the device 957 00:44:49,210 --> 00:44:51,010 would receive the device in September. 958 00:44:51,010 --> 00:44:54,970 So within an hour those 100 devices were sold. 959 00:44:54,970 --> 00:45:01,210 And then we kept a waiting list, which today is about 30,000. 960 00:45:01,210 --> 00:45:05,840 And we started shipping the devices about a month ago. 961 00:45:05,840 --> 00:45:12,970 So in the last year this device was with about 200 people. 962 00:45:12,970 --> 00:45:16,980 And we got a lot of feedback from real users and improved. 963 00:45:16,980 --> 00:45:19,150 And let me show you some real users. 964 00:45:19,150 --> 00:45:22,390 So this is Marcia from Brazil. 965 00:45:22,390 --> 00:45:24,790 The device at the moment only works in English. 966 00:45:24,790 --> 00:45:27,710 Later we'll add more languages. 967 00:45:27,710 --> 00:45:31,730 And so she's being trained to use the device. 968 00:45:31,730 --> 00:45:34,560 And this is a short clip of about two minutes. 969 00:45:34,560 --> 00:45:36,760 And, you know, watch her body language. 970 00:45:36,760 --> 00:45:40,210 And also she explains how she copes 971 00:45:40,210 --> 00:45:42,910 with her disability, especially how she distinguishes 972 00:45:42,910 --> 00:45:44,910 between different money notes. 973 00:45:44,910 --> 00:45:45,660 They're all green. 974 00:45:45,660 --> 00:45:47,522 So how do you distinguish between them? 975 00:45:47,522 --> 00:45:48,730 So let's have a look at this. 976 00:45:54,040 --> 00:45:57,780 So the device is reading the newspaper for her. 977 00:45:57,780 --> 00:46:01,576 [VIDEO PLAYBACK] 978 00:46:01,576 --> 00:46:11,038 - [INAUDIBLE] 979 00:46:11,038 --> 00:46:13,030 - $50. 980 00:46:13,030 --> 00:46:13,530 - $50. 981 00:46:13,530 --> 00:46:15,930 Cincuenta dollars. 982 00:46:15,930 --> 00:46:16,560 - Cincuenta. 983 00:46:16,560 --> 00:46:18,790 Let's see if [INAUDIBLE].
984 00:46:22,661 --> 00:46:24,030 - It green. 985 00:46:24,030 --> 00:46:36,950 All green and I put mark color, yellow, green, orange. 986 00:46:36,950 --> 00:46:38,180 Different note. 987 00:46:41,302 --> 00:46:53,144 [INAUDIBLE] 988 00:46:53,144 --> 00:46:54,126 - $20. 989 00:46:57,072 --> 00:46:59,036 [? Genia ?] 990 00:46:59,036 --> 00:47:03,946 - [INAUDIBLE] 991 00:47:03,946 --> 00:47:05,420 [END PLAYBACK] 992 00:47:05,420 --> 00:47:07,130 AMNON SHASHUA: OK. 993 00:47:07,130 --> 00:47:10,570 Here's a recent-- from CNN. 994 00:47:10,570 --> 00:47:14,330 It was aired a month ago. 995 00:47:14,330 --> 00:47:16,670 It also gives a bit more information about the device. 996 00:47:16,670 --> 00:47:17,470 Let's run this. 997 00:47:17,470 --> 00:47:18,820 It's again two minutes. 998 00:47:18,820 --> 00:47:19,486 [VIDEO PLAYBACK] 999 00:47:19,486 --> 00:47:22,630 - Two weekends ago I sat down and read The New York Times. 1000 00:47:22,630 --> 00:47:25,650 I haven't done that in maybe 30 years. 1001 00:47:25,650 --> 00:47:26,980 My wife came down. 1002 00:47:26,980 --> 00:47:28,250 I had a cup of coffee. 1003 00:47:28,250 --> 00:47:31,120 I'm reading The New York Times and she was crying. 1004 00:47:31,120 --> 00:47:35,080 - Just being able to read again is emotional for Howard Turman. 1005 00:47:35,080 --> 00:47:37,600 He started losing his vision as a child. 1006 00:47:37,600 --> 00:47:40,150 His new glasses don't fix his eyes 1007 00:47:40,150 --> 00:47:42,240 but they do the next best thing. 1008 00:47:42,240 --> 00:47:45,600 - Put on my glasses, it recognizes the finger, 1009 00:47:45,600 --> 00:47:46,780 snaps the picture. 1010 00:47:49,620 --> 00:47:50,920 Now it just reads. 1011 00:47:50,920 --> 00:47:54,490 - The glasses have a camera that recognizes text and can 1012 00:47:54,490 --> 00:47:56,300 read the world to him. 1013 00:47:56,300 --> 00:47:58,360 - Pull here. 1014 00:47:58,360 --> 00:48:02,440 - The technology is called OrCam and Turman says it gives him 1015 00:48:02,440 --> 00:48:04,090 a sense of normalcy. 1016 00:48:04,090 --> 00:48:07,180 - Even finding out that Dunkin' Donuts has a donut I never 1017 00:48:07,180 --> 00:48:09,430 tried was exciting. 1018 00:48:09,430 --> 00:48:11,530 - Dunkin' Donuts. 1019 00:48:11,530 --> 00:48:13,360 - It's a clip on camera. 1020 00:48:13,360 --> 00:48:17,230 So a camera that you can clip onto any eyeglasses. 1021 00:48:17,230 --> 00:48:19,270 And you have here a computing device, which 1022 00:48:19,270 --> 00:48:21,120 you can put in your pocket. 1023 00:48:21,120 --> 00:48:24,370 And the way it interacts, it's with a hand gesture. 1024 00:48:24,370 --> 00:48:26,580 For example, it's written there, rental and tours. 1025 00:48:29,680 --> 00:48:31,570 - Rentals and tours. 1026 00:48:31,570 --> 00:48:33,160 - It's not perfect though. 1027 00:48:33,160 --> 00:48:36,010 It uses a pretty bulky cable and sometimes it 1028 00:48:36,010 --> 00:48:38,017 needs a few tries to get things right. 1029 00:48:38,017 --> 00:48:40,350 - It doesn't read script because everybody's handwriting 1030 00:48:40,350 --> 00:48:40,960 is different. 1031 00:48:40,960 --> 00:48:45,060 So it doesn't do cursive very well at all. 1032 00:48:45,060 --> 00:48:47,650 - OrCam has a harder time in bright light, 1033 00:48:47,650 --> 00:48:51,200 or in tougher situations, like signs on windows. 1034 00:48:51,200 --> 00:48:54,510 - [INAUDIBLE] U donuts hours of operation. 1035 00:48:54,510 --> 00:48:55,780 Low PM. 1036 00:48:55,780 --> 00:48:56,950 Pound's PM. 
1037 00:48:56,950 --> 00:48:58,240 9:00 PM. 1038 00:48:58,240 --> 00:49:00,910 How was your service today? 1039 00:49:00,910 --> 00:49:03,790 - Shashua says improvements are on the way. 1040 00:49:03,790 --> 00:49:06,700 Where do you see this technology going over the long term? 1041 00:49:06,700 --> 00:49:09,100 - Reading, recognizing faces, recognizing products, 1042 00:49:09,100 --> 00:49:10,230 is only the beginning. 1043 00:49:10,230 --> 00:49:13,510 Where we want to get is complete visual understanding 1044 00:49:13,510 --> 00:49:15,050 at the level of human perception, 1045 00:49:15,050 --> 00:49:17,590 such that if you are disoriented you 1046 00:49:17,590 --> 00:49:19,530 can start understanding what's around you. 1047 00:49:19,530 --> 00:49:20,920 For example, where's the door? 1048 00:49:20,920 --> 00:49:21,750 The door is there. 1049 00:49:21,750 --> 00:49:22,583 Where is the window? 1050 00:49:22,583 --> 00:49:26,844 Where is an opening in the space around me? 1051 00:49:26,844 --> 00:49:27,700 OK? 1052 00:49:27,700 --> 00:49:28,940 This is face recognition. 1053 00:49:28,940 --> 00:49:30,540 So again, one of the first 100. 1054 00:49:30,540 --> 00:49:32,891 - Teach OrCam to recognize anybody? 1055 00:49:32,891 --> 00:49:33,390 - Yep. 1056 00:49:33,390 --> 00:49:35,410 - Who does it know? 1057 00:49:35,410 --> 00:49:36,300 - Libby, my mother. 1058 00:49:36,300 --> 00:49:38,114 - You want to show me? 1059 00:49:38,114 --> 00:49:39,105 - Yep. 1060 00:49:39,105 --> 00:49:39,605 OK. 1061 00:49:43,084 --> 00:49:44,078 - All right. 1062 00:49:44,078 --> 00:49:46,066 [INAUDIBLE] Let's see. 1063 00:49:46,066 --> 00:49:50,042 [INAUDIBLE] 1064 00:49:50,042 --> 00:49:51,944 - Libby. 1065 00:49:51,944 --> 00:49:52,527 [END PLAYBACK] 1066 00:49:52,527 --> 00:49:53,277 AMNON SHASHUA: OK? 1067 00:49:53,277 --> 00:49:55,450 So that's also face recognition. 1068 00:49:55,450 --> 00:49:57,690 Last two slides. 1069 00:49:57,690 --> 00:50:01,865 We also started providing the device to research groups. 1070 00:50:01,865 --> 00:50:05,330 And this is one of-- this is a paper in ARVO where they 1071 00:50:05,330 --> 00:50:08,970 took eight visually impaired people and gave them 1072 00:50:08,970 --> 00:50:10,190 the device for one month. 1073 00:50:10,190 --> 00:50:13,256 And then measured the change in quality of life. 1074 00:50:13,256 --> 00:50:15,380 And the way they measured the change in quality of life 1075 00:50:15,380 --> 00:50:18,290 was by interviewing them. 1076 00:50:18,290 --> 00:50:21,470 And seven out of the eight reported a significant change 1077 00:50:21,470 --> 00:50:22,390 in quality of life. 1078 00:50:22,390 --> 00:50:25,390 Now they sent us some of the interviews. 1079 00:50:25,390 --> 00:50:26,660 So on the next-- 1080 00:50:26,660 --> 00:50:30,929 here, I'm showing you part of the interview. 1081 00:50:30,929 --> 00:50:32,720 And what's interesting about this interview 1082 00:50:32,720 --> 00:50:34,880 is that there is a trick question. 1083 00:50:34,880 --> 00:50:37,130 The interviewer, after she tells him 1084 00:50:37,130 --> 00:50:40,520 how the device is, you know, lifesaving 1085 00:50:40,520 --> 00:50:44,630 and so forth, he tells her, well, the device is very expensive. 1086 00:50:44,630 --> 00:50:46,190 It's a few thousand dollars. 1087 00:50:46,190 --> 00:50:47,330 Is it worth it? 1088 00:50:47,330 --> 00:50:49,640 So it's one thing to get something for free and say 1089 00:50:49,640 --> 00:50:50,870 it's very, very good.
1090 00:50:50,870 --> 00:50:53,630 Another thing is when it's going to cost you thousands of dollars, 1091 00:50:53,630 --> 00:50:54,390 is it worth it? 1092 00:50:54,390 --> 00:50:55,670 And let's hear her answer, which is very nice. 1093 00:50:55,670 --> 00:50:56,336 [VIDEO PLAYBACK] 1094 00:50:56,336 --> 00:50:58,840 - In the first few days I had the OrCam 1095 00:50:58,840 --> 00:51:02,900 I was in total awe of it because for the first time 1096 00:51:02,900 --> 00:51:06,890 I was able to open mail and read it, 1097 00:51:06,890 --> 00:51:09,520 instead of having my husband read my mail. 1098 00:51:09,520 --> 00:51:13,070 And I was able to go to a restaurant 1099 00:51:13,070 --> 00:51:18,700 and actually read the menu and order myself with the waitress. 1100 00:51:18,700 --> 00:51:20,300 And that was exciting. 1101 00:51:20,300 --> 00:51:23,890 When you can't do something for such a long period of time, 1102 00:51:23,890 --> 00:51:26,750 the OrCam was incredible. 1103 00:51:26,750 --> 00:51:28,570 - Believe is what the estimate is. 1104 00:51:28,570 --> 00:51:31,130 Do you think such a high price would 1105 00:51:31,130 --> 00:51:33,530 be something people would be willing to pay 1106 00:51:33,530 --> 00:51:35,160 for a device like this? 1107 00:51:35,160 --> 00:51:39,034 Do you think it's marginally worth it right now? 1108 00:51:39,034 --> 00:51:41,330 - I think you're going to find that that's going 1109 00:51:41,330 --> 00:51:43,920 to be on a case by case basis. 1110 00:51:43,920 --> 00:51:46,430 You know, people who have money there's certainly 1111 00:51:46,430 --> 00:51:49,430 no problem $2,000. 1112 00:51:49,430 --> 00:51:50,930 I don't have money. 1113 00:51:50,930 --> 00:51:52,870 I am low income. 1114 00:51:52,870 --> 00:51:56,300 But I would save my money, scrape it together in order 1115 00:51:56,300 --> 00:51:58,566 to get it at $2,000. 1116 00:51:58,566 --> 00:51:59,260 [END PLAYBACK] 1117 00:51:59,260 --> 00:52:02,110 AMNON SHASHUA: So that's interesting. 1118 00:52:02,110 --> 00:52:03,410 Where is it going? 1119 00:52:03,410 --> 00:52:06,590 So there are two lines of progress. 1120 00:52:06,590 --> 00:52:10,370 One, within this existing niche, is 1121 00:52:10,370 --> 00:52:14,780 to make the camera understand the visual field at a higher 1122 00:52:14,780 --> 00:52:15,750 level of detail. 1123 00:52:15,750 --> 00:52:18,530 So one of the things that we are now working on 1124 00:52:18,530 --> 00:52:21,140 is what we call chatting mode. 1125 00:52:21,140 --> 00:52:24,110 So it's like-- it's the image annotation 1126 00:52:24,110 --> 00:52:27,380 type of experiment, or ImageNet together 1127 00:52:27,380 --> 00:52:30,470 with natural language processing. 1128 00:52:30,470 --> 00:52:34,880 Say you are visually impaired or blind and you're disoriented. 1129 00:52:34,880 --> 00:52:36,740 You don't know where you are. 1130 00:52:36,740 --> 00:52:38,990 So you would like the device to tell 1131 00:52:38,990 --> 00:52:40,670 you every second what it sees. 1132 00:52:40,670 --> 00:52:41,780 I see here Tommy. 1133 00:52:41,780 --> 00:52:42,710 I see here chairs. 1134 00:52:42,710 --> 00:52:44,020 I see here another person. 1135 00:52:44,020 --> 00:52:46,520 I see here a wall, an opening, a painting, blah, blah, blah, 1136 00:52:46,520 --> 00:52:49,850 blah, blah, until you get back your sense of orientation.
1137 00:52:49,850 --> 00:52:51,690 So you want the device to be able to have, 1138 00:52:51,690 --> 00:52:53,450 say, several thousand categories, 1139 00:52:53,450 --> 00:52:58,100 like in ImageNet, together with image annotation capability. 1140 00:52:58,100 --> 00:53:02,180 The kind of stuff that people are now writing articles about. 1141 00:53:02,180 --> 00:53:05,240 And to be able to do this at a frame rate of, 1142 00:53:05,240 --> 00:53:06,500 say, once per second. 1143 00:53:06,500 --> 00:53:08,960 So wherever I'm looking, tell me what-- 1144 00:53:08,960 --> 00:53:09,530 what you see. 1145 00:53:09,530 --> 00:53:12,020 This is one thing. Another thing is to have natural language 1146 00:53:12,020 --> 00:53:14,380 processing, NLP, ability. 1147 00:53:14,380 --> 00:53:18,686 For example, say you are looking at an electricity bill. 1148 00:53:18,686 --> 00:53:20,060 The system would know that you're 1149 00:53:20,060 --> 00:53:24,950 looking at an electricity bill and give you just the short version-- 1150 00:53:24,950 --> 00:53:26,674 what is the amount due, for example. 1151 00:53:26,674 --> 00:53:29,173 The system will tell you that you are looking at an electricity bill. 1152 00:53:29,173 --> 00:53:31,220 The amount due is such and such. 1153 00:53:31,220 --> 00:53:35,070 So we are putting more and more intelligence into the system. 1154 00:53:35,070 --> 00:53:36,270 So this is one area. 1155 00:53:36,270 --> 00:53:40,820 Another area is to go for a wearable device for people 1156 00:53:40,820 --> 00:53:43,100 with normal sight. 1157 00:53:43,100 --> 00:53:46,260 So here we're talking about, you know, real wearable computing. 1158 00:53:46,260 --> 00:53:49,490 So this Apple Watch is wearable computing. 1159 00:53:49,490 --> 00:53:51,285 But it doesn't do much computing. 1160 00:53:51,285 --> 00:53:51,920 Right? 1161 00:53:51,920 --> 00:53:56,030 It displays, you know, my text messages, emails, you know, 1162 00:53:56,030 --> 00:53:59,074 measures certain biometrics. 1163 00:53:59,074 --> 00:54:00,740 But that's not, you know, the holy grail 1164 00:54:00,740 --> 00:54:02,030 of wearable computing. 1165 00:54:02,030 --> 00:54:03,800 The holy grail of wearable computing 1166 00:54:03,800 --> 00:54:07,580 is to assume that you had Siri with eyes and ears. 1167 00:54:07,580 --> 00:54:11,600 So you had a camera on you that is observing the scene all 1168 00:54:11,600 --> 00:54:15,710 the time and providing you real-time information whenever 1169 00:54:15,710 --> 00:54:19,030 you need the information, like the people that you meet. 1170 00:54:19,030 --> 00:54:22,040 What were the recent tweets of those people that you met? 1171 00:54:22,040 --> 00:54:23,780 What is common between you and them 1172 00:54:23,780 --> 00:54:26,350 based on Facebook and LinkedIn and so forth? 1173 00:54:26,350 --> 00:54:28,100 So knowing more about the people you meet. 1174 00:54:28,100 --> 00:54:30,410 Knowing more about the stuff that you are doing. 1175 00:54:30,410 --> 00:54:32,480 And creating an archive of everything you 1176 00:54:32,480 --> 00:54:35,509 are doing throughout the day. 1177 00:54:35,509 --> 00:54:36,800 And this is a device like this. 1178 00:54:36,800 --> 00:54:40,400 This is how it looks. 1179 00:54:40,400 --> 00:54:42,410 We call it Cassie. 1180 00:54:42,410 --> 00:54:43,580 So this is a real device. 1181 00:54:46,220 --> 00:54:51,150 It works continuously for about 13 hours. 1182 00:54:51,150 --> 00:54:54,170 So you have a camera working continuously for 13 hours.
1183 00:54:54,170 --> 00:54:56,150 And the purpose of this camera-- 1184 00:54:56,150 --> 00:54:58,120 so the way-- you put it like this. 1185 00:54:58,120 --> 00:54:59,060 OK? 1186 00:54:59,060 --> 00:55:01,920 So the purpose of the camera is not to take pictures. 1187 00:55:01,920 --> 00:55:03,410 It doesn't store any pictures. 1188 00:55:03,410 --> 00:55:06,110 The purpose of the camera is to be a sensor, 1189 00:55:06,110 --> 00:55:09,470 to interpret the visual world and provide information 1190 00:55:09,470 --> 00:55:11,190 in real time. 1191 00:55:11,190 --> 00:55:13,070 And if everything goes well, we'll 1192 00:55:13,070 --> 00:55:16,940 start launching this within six months from now. 1193 00:55:16,940 --> 00:55:20,510 So this is the next big thing, to go into wearable computing, 1194 00:55:20,510 --> 00:55:24,800 to go into a domain in which a camera is on you and processing 1195 00:55:24,800 --> 00:55:26,280 information all the time. 1196 00:55:26,280 --> 00:55:29,530 Unlike the camera on your smartphone now, where 1197 00:55:29,530 --> 00:55:31,860 you take a picture on demand. 1198 00:55:31,860 --> 00:55:33,507 That's not working for you all the time. 1199 00:55:33,507 --> 00:55:35,090 Here it's working for me all the time. 1200 00:55:35,090 --> 00:55:37,490 All the time it's viewing the visual field. 1201 00:55:37,490 --> 00:55:39,440 Whenever it finds something interesting 1202 00:55:39,440 --> 00:55:41,330 it will send it to my smartphone, 1203 00:55:41,330 --> 00:55:45,590 like people that I meet and other activities that I do. 1204 00:55:45,590 --> 00:55:49,970 So this will be the beginning of real wearable computing. 1205 00:55:49,970 --> 00:55:51,505 So wearable computing with sensing, 1206 00:55:51,505 --> 00:55:55,310 with the ability to hear and see, 1207 00:55:55,310 --> 00:55:57,450 and process information in real time. 1208 00:55:57,450 --> 00:55:59,240 This is the next thing that-- 1209 00:55:59,240 --> 00:56:02,980 the next big challenge that we are working on.
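To make the "camera as a sensor, not a picture-taker" idea concrete, here is a minimal sketch of such a continuous assistant loop. It is illustrative only: the camera, perception, and output objects and their method names (capture, detect_pointing_finger, read_text, recognize_face, caption_scene, speak, notify_phone) are hypothetical placeholders standing in for real vision models and hardware, not OrCam's or the Cassie device's actual API.

import time

def assistant_loop(camera, perception, output, chatting_mode=False):
    """Interpret the visual field continuously; store nothing, report only what matters."""
    while True:
        frame = camera.capture()                               # grab the current view, then discard it
        region = perception.detect_pointing_finger(frame)
        if region is not None:                                 # user points at text: read it aloud
            output.speak(perception.read_text(frame, region))
        else:
            name = perception.recognize_face(frame)
            if name is not None:                               # familiar face: whisper the name
                output.speak(f"{name} is in front of you")
                output.notify_phone(f"You met {name}")         # push a note to the phone
            elif chatting_mode:                                # disoriented: describe the scene
                output.speak(perception.caption_scene(frame))
        time.sleep(1.0)                                        # roughly once per second, as mentioned above

The same loop covers both directions described in the talk: the chatting mode for the visually impaired, where the device keeps describing the scene, and the always-on wearable for sighted users, where only notable events are pushed to the phone.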