
Kyle Vogt (Cruise) Keynote - MIT AI Conference 2019

Transcript

[00:00:08] Host: So now we move into the autonomous vehicle segment of our programming. We'll kick it off with our keynote speaker, Kyle Vogt, who is the chief technology officer, president, and co-founder of Cruise Automation. Welcome, Kyle.

[00:00:22] Kyle Vogt: All right, everyone's here. Looks like we're awake. Thank you guys for having me. This is going to be an interesting talk, I hope, for you guys. I want to start by saying I think this is a really interesting time to be alive, because we're at this moment where you have this convergence of growing compute capabilities, and we're just on the cusp of some really interesting things happening in ML. I really think we've just barely started to scratch the surface of what's going to happen when we take these systems and apply them to real-world problems. I'm here to talk about autonomous driving because I think that's a big one, and a really important one, for a number of reasons. You can see the name of my talk there: a big theme here is that machine learning is a key enabler for autonomous driving. But first I want to talk a little bit about some of the things it takes to actually make that happen. This slide's got some vanity metrics. They're not really that important for the business — lots of people. What's not on here is that we've raised lots of money, like $7.5 billion in the last year or so, and I'll tell you why that actually matters. It's a big number. And as a general note, what I'm kind of excited about right now is that to really move the needle on some of these ML problems takes a lot of capital, and that's hard to do if you're a small startup or a small company and you want to change something. But transportation is like the air we breathe. It's where we live, where we eat. Moving around — wherever you go, whether it's to a job or to the store to buy food — affects everyone's life. Because of that, it's an enormous business opportunity, which means there's lots of money flowing into the space, which is great, because you see academia orienting toward this industry, since once people graduate they want to get jobs in that space. What that's doing is driving this very nice feedback cycle, where the money flowing into the space is generating more innovation and more research, which is pushing the industry forward faster than it would go without that.

[00:02:24] Kyle Vogt: So that's really great. So, Cruise — you may have heard of us — we're in San Francisco, mostly, with autonomous cars. Our goal is to get these cars out there at scale, and that's why we exist. But the real reason for that is that we live in kind of a barbaric world today, if you think about it. There are 100 people dying per day in car accidents, and obviously that's not what people talk about on the news — there are lots of other things that take our mind share — but this is a horrible problem. Car accidents are the number one killer of teenagers in this country, which is kind of insane, and I was thinking: why? How did we end up in this place where we have cars that are killing our children and we're not really talking about that much? I think the reason — maybe it's obvious in hindsight — is that there is no alternative. You are stuck buying a car and driving a car, and there's no other way to live your life without it, so you have to subject yourself, you and your family, to this terrible existence we have today. So I like to think of Cruise as existing to correct this. And personally, I can't figure out the meaning of life yet — I think Greg Brockman from OpenAI is going to be here later today, and he's building artificial general intelligence, and I'm pretty sure that thing will eventually tell us the meaning of life; it might just spit out 42 as the answer — but either way, until we figure out the meaning of life, I can't think of anything better to do than try to fix this thing that's so horribly broken, that's hidden in plain sight, and that no one really seems to be doing anything about. And just to elaborate a little more: I have a 14-month-old kid, so this killing-the-children thing really resonates with me. Think about what parents have to go through today. When a kid turns 16 and they need to get around — go to school, get a job, whatever — you have to get them a car, or most people get them a car. They all want to use Uber and Lyft these days, but most of them still need cars. And I was doing the math on this: when there's an SUV, say a 2,000-kilogram SUV, going 80 miles an hour on the highway, that thing is storing more kinetic energy than a stick of TNT, a stick of dynamite — about 1.2 megajoules of energy — that you're handing over to your 16-year-old kid and saying, have fun. I mean, would you give them a stick of dynamite for their 16th birthday? That's basically what we have to do today.
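As a quick sanity check on the figure quoted here, the kinetic energy of a 2,000 kg vehicle at 80 mph (about 35.8 m/s) works out to roughly the value he cites:

$$
E_k = \tfrac{1}{2} m v^2 = \tfrac{1}{2}\cdot 2000\,\mathrm{kg}\cdot(35.8\,\mathrm{m/s})^2 \approx 1.3\,\mathrm{MJ},
$$

which is on the order of the energy released by a stick of dynamite (roughly 1 MJ).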

[00:04:52] Kyle Vogt: So, not good. I think we can do much better. Our goal is to build technology that solves this problem. The minimum table stakes here is that you've got to put a product out there that does good, so our cars have to be better drivers than humans — safer. And we believe, just from a moral standpoint, that we're going to make the cars of the future clean, so they're all going to be electric, which means down the road maybe they can all run on sustainable energy. There are some other peripheral benefits too, but I think the safety one is the wake-up call for people: we really need to solve this problem. So I want to talk about cars for a second. This conference is about the future of computing, but I've got to talk about cars for one second, because if you're an engineer, this is the technology behind the scenes, just on the automotive side of things.

[00:06:22] Kyle Vogt: I was doing the math on this. To build a car — and if you want to solve this problem, you've got to have cars; you can't just bolt sensors onto an existing car, that doesn't work — what does that actually take? It takes roughly 2,000 engineer-years of time: basically 2,000 engineers for a year, or 2,000 years for one poor engineer, to build a new car. And that's for a big car company, like GM or others, where they've already got the assembly plants, all the engineering expertise in house, the supply chain, all that kind of stuff. It still takes them 2,000 engineer-years to put a new car together, and a minimum of a billion dollars. So it's very capital-intensive to solve this problem. That's the money piece. But again, going back to the fact that transportation is so huge: the opportunity is so big that we're able to bring in enough capital to really start to chip away at some of the really tough ML problems and even take them to the next level. I think you might see some of the greatest innovation in machine learning and computing technology driven almost entirely by the autonomous driving sector, mostly because the demands on the technology are so extreme, and it's one of the few places outside of, you know, serving up ads to people where there's that kind of revenue opportunity. I think that's a beautiful convergence of a business opportunity and one that can have real impact. So again, in terms of the meaning of life, it seems like a good thing to do.

[00:07:56] Kyle Vogt: I want to show you a little bit of why ML matters for self-driving. When I was 13, I worked on my first little prototype self-driving car, and in my head the problem was simple: you build a system that looks for the yellow lines on the side of the road. Isn't that convenient? There are two bright yellow lines that you can pick out of an image and try to drive the car along. Then, of course, getting into this problem you see it's much more complex. So here's a video from one of our cars, if it loads here. Basically, what I want you to pick up from these scenes is the complexity. Went right through that one. Okay, maybe we'll get the second one. Okay, I'll do this old school.

[00:08:44] Host Create clip Other ghost will figure it. So will you see about this Rose with no ley lines And people aren't really parked in any particular orderly fashion, and they're not even staying on their side of the road. And then there's these little social negotiations where you have to choose who goes first. And then in this case, there's actually a section of

[00:08:59] Host Create clip the road that's so narrow that you have to like, you can't even squeeze by at the same time. You have to, like, squeeze next to each other, and then the other person goes a little bit, and then there's a little more space, and then you go and then they go and all those things are like, not easy to draw it on a white board and say, like, if this situation happens, do that because you're white board would be like you need, like, 10 billion whiteboards to capture all the possible snares that you could have. And so to do this thing to, like, solve this problem that has this huge impact on is gonna, you know, save the Children and all that good stuff we need ml. We can't do it without him melt. And so that's kind of nice

[00:09:33] Host Create clip that we get to work on these things.

[00:09:35] Kyle Vogt: Here's another one. I'll give this one a second... there it goes. So the other thing about autonomous driving is that you don't have a structured world. There are often no lane lines, but you also have really pretty unpredictable other agents in the universe, and there are these social contracts, like people holding stop signs — that usually means don't run them over, and wait until they turn the sign to "slow," and then you can proceed. So figuring out the context of the situation — what is the intent of that person holding the sign as they're waving? Are they waving at me or at the other guy? Which way are they looking? And when they're waving, which lane are they telling you to go into? — these are extremely complicated perception problems, and I think each one of those in itself is maybe a PhD thesis or something like that, but certainly a really challenging problem. And so with autonomous driving, the reason there are so many people working on this — we have 1,500 people at this company, and we're really just getting started — is that the bar for launching is really, really high. If you want to beat humans: humans are actually really good at driving. We don't make catastrophic mistakes that often as human drivers. So you've got these three critical ingredients that all have to be nailed to make this work. You have to beat humans at a really challenging task, much harder than chess or playing video games or things like that, in these unstructured environments. You also have to do it in real time: you get maybe 100 milliseconds to make a decision about what's happening, which drives insane requirements on the computing side. And it's safety critical: you basically can't make mistakes, or if your ML system makes a mistake, you need to have a backup system, and then probably a backup system for that backup system, because people are trusting their lives to this. That kind of makes this the challenge of a generation. I think this has been said before, but the closest analogy I can think of is the Apollo program — except instead of being about national pride and being first, in this case we're trying to solve a problem that really exists, that's hidden in plain sight, and that's pretty horrible. So to get there, to solve these problems — you see the complexity of the scenes, they're so diverse, and there are so many different situations — you need machine learning at scale, and I think that presents a few interesting challenges. There are only really a few companies in the world that, I think, are forced to do this at that kind of scale, really need it, and are able to do it. I want to talk about some of the interesting things hidden in there on the scale side, and then what's coming next — where I think we'll be a few years from now.

[00:12:09] Kyle Vogt: So, some of the critical ingredients for ML at scale in autonomous driving. We're kind of bootstrapping the system, if you think about it: we have lots of cars driving, collecting lots of data, and that's a good source of information from which we can build our initial models. I think of it as bootstrapping. At least today, the predominant way to turn data — which is kind of useless by itself — into insights, or improvements to your model, is by labeling it and saying: this is what that person was doing, this is a car, this is a human, this is a stop sign. If you have a really large amount of data, that's actually a lot of work — a lot of people involved, a lot of tools involved. And in this case, the data sets we're working on are enormous. They're not like clickstreams, like you'd have at maybe Facebook or Google, where the information you're building a model on is where someone clicked on a page or what the order of the search results was. We have vehicles that generate gigabytes of data per second from over a couple dozen cameras, high-resolution lidars and radars, and all these sensors. So just the pure infrastructure required to move all that data around — we're talking gigabytes per second per car, times hundreds of cars — adds up really quickly. There are some fun problems in there that need to be solved along the way, and nothing off the shelf really does it. And then there's training infrastructure and evaluation tools: this is the part of the workflow a machine learning engineer goes through when they say, okay, I have a problem and a bunch of data — how do I turn that into a system that solves the problem? We talked a little bit about the fleet and the richness of that data. The reason we want lots of data and lots of driving is to try to maximize the entropy and diversity of the data sets we have. If you have a billion miles of driving on nothing but highways, your system really knows nothing about what it means to drive in cities. So for us, we basically send the cars loose in San Francisco with highly trained drivers behind the wheel, and the system runs in autonomous mode and teaches us about its performance, but in the background it's recording all this data, so that we have these crazy situations we can feed into our models and teach them the difference between, say, the little cutouts of people in this image and a real person. I don't know if you can even see it, but there's a real person standing in the middle there, between the cutouts. And in the example below, which I put in a blog post a couple of years ago — I'm not sure whether this looks to you like a normal scene or, if you put your engineer hat on, a complicated scene — the point is that one of the problems with non-ML systems is that it's really easy to handle each one of these situations in isolation: drive around an obstruction, drive through cones, drive around a vehicle, avoid a pedestrian. But when you mix all of those together and end up layering all these maneuvers on top of each other, anything that's based on a state machine, or some historical way of solving this problem, is going to get stuck and confused.
And so we need these situations both to understand the full scope of the problem we're trying to solve, and also to make sure that when you deploy the system — which is full of machine learning models — the vast majority of the time, whatever it encounters on the road is going to be something it's familiar with, meaning the scene is similar to what's in your training data set. And in the very rare cases when it encounters something totally new, like maybe some variation of this scene, it's familiar enough to what the system has seen before that it can reason about something safe to do, and it never finds itself in a situation that's completely different from anything it has ever seen, with no sort of backup plan.
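To make the data-volume claim concrete, here's a rough back-of-the-envelope sketch in Python. The per-car rate and "hundreds of cars" come from the talk; the hours-of-driving-per-day figure is purely an assumption for illustration.

```python
# Rough estimate of raw sensor-data volume for a logging fleet.
# "Gigabytes per second per car, times hundreds of cars" is from the talk;
# the 8 hours/day of driving is an assumed figure for illustration only.

GB_PER_SECOND_PER_CAR = 1.0   # order-of-magnitude rate cited in the talk
CARS = 200                    # "hundreds of cars"
DRIVING_HOURS_PER_DAY = 8     # assumption

seconds_per_day = DRIVING_HOURS_PER_DAY * 3600
daily_terabytes = GB_PER_SECOND_PER_CAR * seconds_per_day * CARS / 1000

print(f"~{daily_terabytes:,.0f} TB of raw sensor data per day")
# -> ~5,760 TB (roughly 5.8 petabytes) per day, before any filtering
```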

[00:15:51] Kyle Vogt: So that's the reason we need really large data sets, and that drives some of the complexity. When you have all this data — like I said before — it's completely useless to sit on a mountain of data if you don't know how to extract insights from it. So we've gone pretty far down the rabbit hole on using the minimal amount of human energy to extract the most insight out of our data. What you've got here are some of those tools, where instead of a human having to draw a box around a person to say "this is a person," we actually have ML models that assist the humans in doing the labeling. So there's this lovely feedback cycle where the better the ML models get, the easier it becomes to label new data, and the human labor required to generate these data sets, or insights from this data, drops over time. And on the bottom you can see a top-down, bird's-eye view of a scene, where the colored pixels represent lidar points. The challenge there is that a 10-second segment of driving may have 100 individual frames in it, and the task is basically to label the same car across those 100 frames of time. So we have systems that figure out that once you label this car here, you don't need to label it again — the system will do the rest for you. That's one thing that helps, and when you get to scale, these things become essential because of the size of the data sets you need; the cost basically goes through the roof if you're not careful.
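The talk doesn't describe the exact algorithm behind carrying one label across 100 lidar frames, but a minimal sketch of the idea — propagating a human-drawn box to later frames by matching the nearest cluster centroid — might look like this. All names and thresholds here are illustrative assumptions, not Cruise's actual pipeline.

```python
import numpy as np

def propagate_label(box_center, frames, max_jump_m=2.0):
    """Carry one human-labeled object through subsequent lidar frames.

    box_center: (x, y) centroid of the labeled object in the first frame.
    frames: list of (N_i, 2) arrays of candidate object centroids per frame
            (e.g., produced by an upstream clustering step).
    Returns one matched centroid per frame; stops early if the track is lost.
    """
    current = np.asarray(box_center, dtype=float)
    track = []
    for centroids in frames:
        centroids = np.asarray(centroids, dtype=float)
        if len(centroids) == 0:
            break  # nothing detected in this frame; track is lost
        dists = np.linalg.norm(centroids - current, axis=1)
        nearest = int(np.argmin(dists))
        if dists[nearest] > max_jump_m:
            break  # nearest candidate moved too far to be the same object
        current = centroids[nearest]
        track.append(current)
    return track
```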

[00:17:13] Host Create clip So

[00:17:13] Host Create clip this is

[00:17:14] Host Create clip another vanity metric slide. I think it's none of this stuff really matters. What matters is, can we solve the problem? But just to give you a sense of all the computation it takes to get this stuff done, you know, I'm gonna touch on simulation a little bit just because it's sort of very much related to how we used em Ellen today and will be very much more in the future. But you know, like if you have, um, an autonomous vehicle system. Basically, you can think of that as a virtual driver. And it's not enough to put that driver fruit through like a D M V driver test and say, Is it good enough when you push out US offer release? What you really want to know is across the next 1,000,000 miles of driving. What is the estimated performance of that software gonna be? And that's a really hard question to answer because of how diverse and crazy the world is, like, How can you predict, you know, when it sees those crazy scenes, how how it's gonna perform. And the answer is we basically record everything we've ever seen. And, uh, make sure that the new software does the right thing, which can also be used to train ml systems. And we also simulate a lot of stuff situations that we haven't seen but could plausibly see. So you put those together and you get pretty good coverage in a pretty good way to test the quality of your driving.
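The talk doesn't spell out how that replay testing works internally, but a minimal sketch of the idea — re-running a candidate software release against every recorded scene and checking that it still does the right thing — might look like this. The type and function names below are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RecordedScene:
    scene_id: str
    sensor_log: object            # raw sensor data captured on the road
    expected_behavior: Callable   # predicate: was the planner's output safe/correct?

def regression_replay(scenes: List[RecordedScene], planner: Callable) -> float:
    """Replay every recorded scene through a candidate release and report the
    fraction of scenes where the new software did the right thing."""
    passed = 0
    for scene in scenes:
        decision = planner(scene.sensor_log)       # run the new software on old data
        if scene.expected_behavior(decision):
            passed += 1
    return passed / len(scenes) if scenes else 1.0
```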

[00:18:25] Kyle Vogt: And then the last thing on this list, this ML experiment platform, is really interesting. I want to paint a picture for you real quick and see if it resonates with any of you. How many people here are in academia or currently working for an academic institution? Okay, a few of you. So I think the workflow, especially if you're a grad student, is: you're sitting at your desk, maybe with a computer with a GPU that was donated by NVIDIA or something like that, maybe you have $10,000 of cloud credits, and then you've got these open data sets you can download from the Internet that are pretty interesting but small and still kind of crappy. Those are the tools in your toolbox. And when you want to run an experiment or write your PhD thesis, you put all that together, come up with an algorithm, and hit "train." That's a gross simplification, but let's say you get to that point. And then you wait — yeah, that's your PhD thesis right there. You wait, like, two weeks for an answer to come back if you have a really complicated model: run all your data through the system, train a model, then evaluate it. So in my head, we're living in the dark ages of machine learning experimentation and development. It's the equivalent of having a C++ program you want to compile, and compiling takes two weeks — imagine trying to debug that. That's basically the world we're living in right now, where the iteration cycle is on the order of days or weeks, which is kind of insane. Oh, here's a cool slide — I forgot about this one. Evaluation tools are important. These are the other pieces: if you're sitting at your desk and you train, the output is probably a big CSV file, or whatever your data comes out as. So the next thing you want to do, once you've run that for three days or two weeks, is go measure everything and visually inspect the things you didn't inspect, or look at the quality of your data set, the diversity of your data set — and those things are really hard to do with off-the-shelf tools. So we built a bunch of them, and it can be really interesting when you put it all together. But the reason we do all these things is not because they look cool and there's some eye candy; it's because it helps us shorten that iteration time. We want to move out of the dark ages and into a world where you can iterate really, really fast on ML models to save the children.
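As a toy illustration of the kind of post-training evaluation pass he's describing — sweeping a predictions dump for accuracy and data-set diversity — here's a minimal sketch. The CSV column names are assumptions for illustration, not anything Cruise has described.

```python
import csv
from collections import Counter

def summarize_predictions(csv_path: str) -> None:
    """Minimal evaluation pass over a predictions dump: per-class counts and a
    crude accuracy figure, to spot obvious gaps in data-set diversity."""
    per_class = Counter()
    correct = 0
    total = 0
    with open(csv_path) as f:
        for row in csv.DictReader(f):          # assumes columns: label, prediction
            per_class[row["label"]] += 1
            correct += row["label"] == row["prediction"]
            total += 1
    print("examples per class:", dict(per_class))   # reveals under-represented classes
    print("accuracy:", correct / total if total else float("nan"))
```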

[00:20:35] Kyle Vogt: So I want to talk about what's next, the general trends we're seeing right now, and what I want the world to be like for all of you when you're working on ML problems. This one's pretty obvious: Moore's Law is kind of dead in terms of CPU frequency, but not really in terms of cramming transistors onto a chip, and I'd say Moore's Law is alive and well when it comes to neural network computation performance. We see all these custom chips coming out where it's not about faster clock speeds or more transistors, but about how you arrange those transistors to get more throughput for these types of compute workloads. So that's happening, and it means we're going to build more complex models, because — going back to the real-time aspect of self-driving cars — if you give me 100 milliseconds to solve a problem and I'm an engineer, I'm going to pack as much complexity into my model as I can before I run out of compute budget in those 100 milliseconds. If you give me a bigger computer, I'm going to make a more complex model to get better results. And usually — not always, but usually — when you have a larger or more complex model, the data set you need to maximize that model's potential before it saturates becomes larger. So you've got three things on exponential paths right now, which, if you're not careful, will blow up the cost of the data sets: either the cost of collecting the data or, more importantly, I think, the cost of labeling it. Right now, if you were to go to an off-the-shelf labeling company, one of the companies that exist today, and say, "I want a million miles of driving, every bit of data from those drives, annotated by humans," I think that would be billions of dollars, which is not tractable in any way, shape, or form.
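To make the compute-budget point concrete, here's a crude sketch of how an engineer might check whether a candidate model's forward pass fits the roughly 100 ms real-time budget he mentions. The timing methodology and names are illustrative assumptions, not a description of Cruise's tooling.

```python
import time

LATENCY_BUDGET_S = 0.100   # the ~100 ms real-time budget mentioned in the talk

def fits_budget(model, sample_input, warmup=3, trials=20):
    """Crude check of whether a model's forward pass fits the real-time budget.

    `model` is any callable (e.g., a compiled network's inference function).
    Returns (fits, mean_latency_seconds).
    """
    for _ in range(warmup):
        model(sample_input)                     # warm caches / JIT before timing
    start = time.perf_counter()
    for _ in range(trials):
        model(sample_input)
    mean_latency = (time.perf_counter() - start) / trials
    return mean_latency <= LATENCY_BUDGET_S, mean_latency
```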

[00:22:27] Kyle Vogt: So I think the thing we're doing today — this is not necessarily the future, but what we're doing today — is a lot more auto-labeling, and I don't know if that's the correct industry or academic term for it, but basically what I mean is that you take the human labeling step out of the loop. There's unsupervised learning and other ways to do this, but what I think is really interesting about driving is that there's a lot you can infer from the way a vehicle drives. If it didn't make any mistakes, then you can implicitly assume that a lot of things were correct about the way that vehicle drove. We've got hundreds of cars out there and lots of people driving around, so when the AVs are basically driving correctly and the people in the car are saying "you did a good job," that, to me, is a very rich source of information. I think you've seen other people in the industry do this. The simple example is when someone disengages an autonomous system: that's a signal, that's a label — there's probably something wrong happening at that point in time. Now, you don't know which of a million things could have gone wrong to cause that disengagement, so we do a lot of things with our human drivers to give them tools to tell us much more about what was happening. That's a source of labels, but it's still, in a sense, paying humans to do the job. What I think is really interesting is to basically use all the humans on the street around us as a source of ground truth, or labels. This little slide is kind of interesting — it's one small slice of the kinds of things that can be done right now. On the top, you've got a map, and the brightness of the lines on that map represents how frequently we have observed human drivers going down that particular lane of traffic. It's just a heat map of where people drive.
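A minimal sketch of that heat-map idea — rasterizing observed human-driven positions onto a grid so that brighter cells mark the paths people actually take — could look like the following. The grid size and cell resolution are arbitrary illustrative choices.

```python
import numpy as np

def drive_heatmap(trajectories, grid_size=(500, 500), cell_m=0.5):
    """Accumulate observed human-driven positions into a 2D heat map.

    trajectories: iterable of (N_i, 2) arrays of (x, y) positions in meters,
                  one array per observed vehicle track.
    Brighter cells = lanes/paths that human drivers follow more often.
    """
    heat = np.zeros(grid_size, dtype=np.int64)
    for track in trajectories:
        cells = (np.asarray(track, dtype=float) / cell_m).astype(int)
        in_bounds = ((cells[:, 0] >= 0) & (cells[:, 0] < grid_size[0]) &
                     (cells[:, 1] >= 0) & (cells[:, 1] < grid_size[1]))
        cells = cells[in_bounds]
        np.add.at(heat, (cells[:, 0], cells[:, 1]), 1)   # count visits per cell
    return heat
```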

[00:24:06] Kyle Vogt: If we analyze that data, we can infer a lot about the correct way for the AV to make a left turn, but perhaps more importantly, what other people are likely to do — not just in a particular intersection, but more generally when they're making a left turn. It's not just a perfect parabola; people aren't race car drivers trying to hit the apex of the turn at exactly the right speed. They do crazy things, and it's also dependent on context: where the other cars are on the road. So that's another place where, if we were to write a big heuristic system to do this, it wouldn't work — it would fail the second the world deviates from the very simple model we have in our heads when we're writing that heuristic. You need a more learned system to do this job. And if we were to try to pay humans to label, say, the billion observations our AVs have had of humans driving in San Francisco — every car we've seen and what it was doing at the time — that's another really expensive endeavor. But if you're able to take human driving, observed in the world around us, and infer from it — using things like a map — what those drivers were actually doing, you can build a system like what you're seeing down below, where we basically haven't told the system how to figure out what other cars are doing, but just by observing literally billions of instances of other people driving, we can train it to know what other cars are doing, or are about to do, in a complex urban environment — even if there are no lane lines, and even if the cars are doing these crazy little social interactions — which is pretty cool. So I think the conclusion from this is: if you're a company and your business model is dependent on humans labeling your data, you're going to get crushed by the companies that are thinking about how to reframe that problem so they don't need to pay for those labels or have humans in the loop.
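The talk doesn't say how the prediction model itself is built, but one common way to get supervision without human labels — and a reasonable reading of what he describes — is to treat each observed car's own future as the training target. A small illustrative sketch, where the function names and window sizes are assumptions:

```python
import numpy as np

def make_prediction_dataset(tracks, past=10, future=10):
    """Build (input, target) pairs for trajectory prediction directly from
    logged observations of other drivers. The 'label' for each example is
    simply what the observed car actually did next, so no human annotation
    is needed.

    tracks: iterable of (T_i, 2) arrays of observed (x, y) positions.
    Returns X of shape (M, past*2) and Y of shape (M, future*2).
    """
    X, Y = [], []
    for track in tracks:
        track = np.asarray(track, dtype=float)
        for t in range(past, len(track) - future):
            X.append(track[t - past:t].ravel())     # what the car did so far
            Y.append(track[t:t + future].ravel())   # what it actually did next
    return np.array(X), np.array(Y)
```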

[00:25:57] Kyle Vogt: The other thing — well, I talked about simulation briefly, so let me come back to that. This is not a picture of San Francisco; this is actually a rendering from our simulation engine. One of the really interesting things we can do with it is — right now, as I said, we're bootstrapping the system by having lots of people drive lots of cars, but in the future, systems like this (we're still working on it, but it's getting better) give us this whole universe we've created. We actually call it the Matrix. The AVs live in this little universe; they don't know they're in a simulation. And the data that comes back from it is getting so close to what the cars actually see out in the real world that I think we can put all these vehicles into a simulated world, kind of like an OpenAI Gym, and just let them go. We've started inserting other agents into that world — we have video game programmers taking the AI from StarCraft and other games and dropping it into this little simulation — so eventually what you have is this living, breathing ecosystem where the cars randomly encounter each other, with different flavors of driver profiles, and get themselves into these crazy, hairy, stuck situations. We're basically using this as a source of entropy, or fuzz testing, to introduce new sources of data that we can generate purely synthetically, without paying people to be on the road and labeling what's happening. And because it's a simulation, we have perfect ground truth — we know exactly where all the cars are — which means the labels that come out of it are far better than anything you could get from having people drive around and having humans try to approximate where the cars and people in a scene are. So this is the future. Right now you've got this workflow for ML where you create a model, traditionally generate the labels, train it, maybe analyze what happened, and iterate on that loop, and that can take on the order of a few days to a couple of weeks.
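He doesn't describe the simulator's interfaces, but the "perfect ground truth" point can be illustrated with a small sketch: in simulation, labels are just a dump of the engine's own state at the tick when the synthetic sensor data was rendered, so there is no annotation step at all. The types and fields below are invented for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SimAgent:
    agent_id: int
    kind: str        # "car", "pedestrian", "cyclist", ...
    position: tuple  # exact (x, y) in meters, straight from the physics engine
    heading: float
    speed: float

@dataclass
class SimTick:
    frame_id: int
    agents: List[SimAgent]

def ground_truth_labels(tick: SimTick) -> dict:
    """Emit labels for one simulation tick by dumping the engine's own state —
    perfect ground truth, with no human annotation involved."""
    return {
        "frame_id": tick.frame_id,
        "objects": [
            {"id": a.agent_id, "kind": a.kind, "pos": a.position,
             "heading": a.heading, "speed": a.speed}
            for a in tick.agents
        ],
    }
```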

[00:28:08] Host Create clip I

[00:28:08] Host Create clip want to, like, hit, compile, and I want to have in the background. When that happens, a data set generated that is larger than anything that become collected and annotated. Today. The model trained and analyzed and basically continuously optimized until it's saturated in the same amount of time it takes. It takes to compile code, and so is getting a little technical here. But the impact of this is any time you can take an activity that goes that takes on the order of days and weeks and compress it down to seconds, you're gonna see a super exponential increase in sort of the quality of systems were able to build and the types of problems we can solve. And I think perhaps most importantly, the barrier to entry for, um, you know, that person sitting at their desk and grad school with an interesting idea being ableto explore the full potential of that idea and truly drive innovation. Um, so I think that's where we're going. And, uh, you know, we're building all of that at Cruise. Hopefully, there's a takeaway here, which is driving a V or deploying a visa at scale requires ML at massive scale. And today we're kind of still living in the Dark ages. So you gotta throw a lot of people in a lot of money at that to do it. Um, it's a means to an end to solve this very important problem.

[00:29:18] Kyle Vogt: But I would also end by saying that, because of the things I've outlined today, I do think some of the best innovation is happening, and some of the most powerful tools are being created, inside AV companies. So if you're not already working on AVs, maybe this is a good time to ask yourself why. And yeah, as the slide says, "no sales pitch" — this isn't a sales pitch, guys. This is a call to action. You need to save the children. You absolutely need to be working on AVs if you're not already. I'm dead serious. Come on — you wake up in the morning, you know you're going to work, you're going to do the best possible thing with your skills as an engineer, or whatever it is you do, and it's going to matter. Humans are only going to go from driving cars to not driving cars once, and so you're either a part of that or you're not, and that's for you to determine — your legacy. So again, this isn't a sales pitch, but of course, if you do want to join Cruise, we have Michael over here from our recruiting team who would be happy to talk to you after the talk. All right. Thank you, guys.