
Scale AI - Interview with Christian Szegedy

Transcript

[00:00:00] Host: I'm Alex, the CEO and founder of Scale. We label data for AI companies. I'm here with Christian Szegedy, currently a research scientist at Google Research. He's worked on a number of influential results in his research career, so I'm really excited for this conversation. A couple of them: he published the first state-of-the-art use of deep neural networks for object detection in images in 2013. Then in 2014 he published the first paper on adversarial examples, which is obviously now a hot research topic. He also designed the Inception architecture, which is one of the most popular architectures for object detection in images. And he invented Batch Norm, which really introduced the concept of normalization in deep learning and is now used in most modern neural network architectures. And now he's one of the few deep learning researchers working on formal reasoning. Welcome.

Christian: Thank you.

Host: So I wanted to start by asking: when you were working on perception, starting six or seven years ago, why were you working on it? Why did you think it was interesting or important research?

[00:01:13] Christian: So when I joined Google in 2010, it was not really a popular topic. Most people looked at it with very skeptical eyes. My purpose in joining was to learn machine learning, and actually I was not so much into perception per se. I was much more excited about learning machine learning in general, because my goal was always to design systems that are artificially intelligent. So actually, reasoning was my original motivation to learn machine learning. But at that time, vision was one of the most obvious outlets, and I had the luck that I managed to get into a group that did research on computer vision.

[00:02:01] Host: Why did you believe in machine learning at that point? Because I think the results weren't that compelling then, not enough that you could really believe machines would be able to do all these things that humans do very well. So what was the core of that belief?

[00:02:17] Christian: I had believed in AI and machine learning for decades. I always wanted to work in this area. It's just that when I did my studies it was not very popular, and it was hard to get a job in it at the time. But I always believed that machines would eventually learn just as well as humans; we just didn't know the right techniques yet. So I was surprised by how few new ideas were necessary. Actually, most of the ideas we use date from the seventies and eighties. So, hardly anything new.

[00:02:44] Host: But you just had this personal conviction that, hey, machines should be able to learn as well as humans.

[00:02:50] Christian: Yes. My guess was that biological systems can do learning, and learning is everywhere in nature, so it shouldn't be something that requires a lot of really hard engineering or some really big leaps in ideas, because biology figured it out. So we should be able to somehow get there too. I didn't know it would be so simple, though; I thought it would be much more complicated.

[00:03:16] Host: Right, right. You expected it would be much harder to get good at these problems.

Christian: Yes, yes.

Host: And what did you expect would be the importance of your work in perception while you were working on it? It sounds like you sort of fell into it, but what did you think the impact of that work would be?

[00:03:37] Christian: So I didn't know what the timeline would be to get to something usable. The fact that it got to a usable state within two years, from AlexNet to products after 2014, was a surprise to me. I was not really surprised that it happened; I would have bet that it would happen within a decade, or five years. But I wouldn't have expected vision to improve so much in two years. When I started computer vision, I didn't even know how far along it was, so I was surprised by how poor it was before.

[00:04:09] Host: Right, yes. It went through this massive progress over just two years.

[00:04:17] Christian: Like that. I had zero background in computer vision, so my only bet for producing good results was to do something that nobody else did. I mean, a few people did, like Alex Krizhevsky's groundbreaking work, but it was not popular when I worked on it. The people working on computer vision and AI at Google were absolutely super skeptical about neural networks in general in 2012, so it had hardly any traction. That was nice for me, because it meant that the two of us could do a lot very quickly before most people jumped in.

[00:04:54] Host: It's one of these interesting things where the people who had been working on it for a very long time were very skeptical and didn't expect the methods to work. And so you had a beginner's mind, in some sense.

[00:05:05] Christian: Yes, a fresh approach. I thought that was my only chance, so I went at it 100 percent. Because if it didn't work out, well, I had no chance of catching up with all the other people anyway.

[00:05:17] Host: Got it. Yeah, it was almost like: okay, deep learning has to work.

[00:05:20] Christian: I mean, it didn't have to. I could have survived without it. I just said, okay, I'll bet on it, because I have nothing to lose.

[00:05:27] Host: The title of your first paper on adversarial examples was "Intriguing Properties of Neural Networks." It was almost like you had discovered this curiosity, and it wasn't framed in the context that it is now. Right now, safety is the primary context in which people talk about them.

[00:05:50] Christian: Yes, actually it's a funny story, because I had these adversarial examples lying around for more than a year, almost two years; I discovered them in 2011. I was too lazy to publish them, and then Wojciech came to me wanting to write a paper with all kinds of results, and he said: okay, you have this thing, we can combine it with the other stuff and publish a joint paper on various intriguing properties. And then people started to bail out and didn't put in their own material, because it was not interesting enough or whatever, and so the paper ended up being mostly about adversarial examples. If I had known that beforehand, I would have just written the paper with Wojciech alone, or maybe completely alone, and then I could have given it a title about adversarial examples. Actually, we had planned with my manager to write a paper with a title like "Blind Spots in Neural Networks," purely on that topic. But I was just too lazy to do it.

[00:06:47] Host: Interesting, I see. Really, there were other intriguing properties they wanted to talk about, but then everybody else dropped out and it was just about the adversarial examples.

[00:06:56] Christian: So basically the paper had another intriguing property in the first section, and then there were the adversarial examples. But nobody cares about the first intriguing property; it was not so surprising.

[00:07:08] Host: Right. And so it's interesting that you had discovered these adversarial examples two years before you published the paper, and you just didn't think they were important enough to publish.

[00:07:22] Christian: Yeah, I thought they should be published, but I just procrastinated on doing it. First, I always wanted to run a few more experiments, and I wanted to find somebody to run an ImageNet experiment. I had experiments on smaller datasets, but not on ImageNet, and I thought that was a prerequisite. Then Wojciech volunteered to do the ImageNet part, some people joined, and it was not planned this way; that's just how it happened. But it doesn't really matter. The main thing is that it's out there. I think if I hadn't published it, somebody else would have come up with the same idea; some people did come up with similar ideas within that same one-year period, from 2014 to 2015. So I had it in 2011, but unfortunately only I and a handful of people in the world knew about it.

[00:08:18] Host: Yeah, yeah. Well, I'm curious. Right now it's a pretty hot topic, generally framed again in the context of safety: how can we deploy these systems to the real world if adversarial examples exist? Why did you think, at the time, that adversarial examples were important?

[00:08:40] Christian: It was obvious to me that if deep learning takes off, you can use adversarial examples for all kinds of attacks. That was one of my motivating thoughts back then: I was saying, for example, that spam filters could be circumvented, things like that. So I thought a lot about the implications of this phenomenon. And people did react. Actually, Geoffrey Hinton was involved early on; I told him about adversarial examples a year before they were published, and he was really shocked. He said, if that's true, then we have to do something. I mean, there were obvious implications, practical implications.
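To make the attack concrete, here is a minimal editorial sketch, not code from the interview: a toy linear "classifier" with made-up weights, fooled by nudging the input in the direction of the gradient (the core idea later formalized as the fast gradient sign method).

```python
import numpy as np

# Toy linear "classifier": score > 0 -> class 1, else class 0.
w = np.array([1.0, -2.0, 3.0])          # made-up weights
x = np.array([0.5, 0.5, 0.1])           # clean input: score = 0.5 - 1.0 + 0.3 = -0.2 -> class 0

def predict(v):
    return int(w @ v > 0)

# Adversarial perturbation: step in the direction that increases the score,
# i.e. the sign of the gradient of the score with respect to the input.
eps = 0.2
grad = w                                 # d(w @ x)/dx = w for a linear model
x_adv = x + eps * np.sign(grad)          # a small nudge per coordinate

print(predict(x))      # 0
print(predict(x_adv))  # 1 (the class flips)
```

In a deep network the gradient is computed by backpropagation rather than read off the weights, but the mechanism, a small input perturbation aligned with the loss gradient, is the same.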

[00:09:25] Host: Yeah. Based on the state of research now, I think there's a lot of incremental work to make neural networks robust to these adversarial examples. I'm curious how your optimism, or bullishness, on deep learning has changed over time. It sounds like initially you were skeptical, then you started working on it and it was exciting, and now deep learning is the state of the art for a large number of problems. How has your optimism with respect to deep learning changed over time?

[00:09:55] Christian: My mind changed very, very quickly. I was in a very early group working on deep learning at Google; it was just a few people. And I saw immediately how super clever people like Jeff Dean were thinking about this, and what their thought process was. So I got convinced within a few months that it was a good idea. And actually, when people ask me whether it's all hype, I always say that even if research stopped right now and nobody did anything new in deep learning, just taking the technology that exists and exploiting it to the maximum extent would be like ten years of work, really cool advances in technology, just based on the current state of machine learning. But most people don't really get that; they don't see the potential. And on top of that, the research is accelerating. So I'm extremely bullish. That said, "deep learning" itself is a poor notion, because it was coined at a point when it referred to a few layers of matrix multiplications and nonlinearities; it's like, okay, we call it deep because it's not one layer. But it developed into something where you have complicated programs that are parameterized by all kinds of matrices or tensors, and you are learning those tensors. So it became a much more generic notion than it used to be, and I think this tendency will continue. Soon we will not be designing the neural architectures anymore; we will let other machine learning systems design them. Therefore I think deep learning itself has a very short time frame; in the future it will be superseded by general program synthesis. That's why I think program synthesis is the future, not deep learning.

[00:12:01] Host: Deep learning is sort of the dumbest way to do meta-programming.

Christian: Exactly, exactly.

Host: And meta-programming, program synthesis, is really the future.

[00:12:13] Christian: Yes. I think as we generalize deep learning and improve program synthesis, they will merge. Basically, machine learning is about creating a program that solves a task automatically. Currently, if you just use gradient descent to do this, you can solve certain tasks with a certain engineering effort. But if you have more sophisticated program synthesis methods, then you can solve much more general tasks automatically. Basically, we are moving into the terrain where everything will be synthesized by machines, so we get higher and higher levels of automated feedback loops.
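Szegedy's framing, machine learning as automatically creating a program that solves a task, can be illustrated with the simplest possible case. This is a hypothetical toy, not from his papers: gradient descent fills in the free parameters of a fixed program template f(x) = a*x + b from input-output examples.

```python
import numpy as np

# "Program synthesis by gradient descent", in miniature: the program template
# is f(x) = a*x + b, and learning fills in a and b from examples.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + 1.0                    # examples generated by the target program

a, b = 0.0, 0.0
lr = 0.05
for _ in range(2000):                  # plain gradient descent on squared error
    err = a * xs + b - ys
    a -= lr * 2 * np.mean(err * xs)
    b -= lr * 2 * np.mean(err)

print(float(round(a, 2)), float(round(b, 2)))  # recovers the target program: 2.0 1.0
```

More sophisticated program synthesis, in this view, would search over the template itself, not just its numeric parameters.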

[00:12:56] Host: Right. I think it's well put: even with just what we have today, if we exploited it and implemented it in a bunch of different areas, that would already be an astronomical amount of impact. But then research is improving as well, so you can layer two exponential curves on top of each other and say, hey, who knows where it's gonna go.

[00:13:22] Christian: Right, yes. I think it's kind of similar to how software developed from, let's say, 1990 to 2010. I think AI is eating software now, from, let's say, 2020 into the next 20 years at least. I don't know what comes after that, but I think the current moment is kind of like 1990: if you had bet on computers and software in 1990, you would have been as right as if you bet on AI now.

[00:13:53] Host: Right. Do you think that state-of-the-art perception systems have gotten to the point where perception is no longer the bottleneck for any robotics problem?

[00:14:04] Christian: I think it's not 100 percent there, but close. At least there is really a light at the end of the tunnel, which was not the case in 2010 or 2012. I would say most people think perception might not be perfect yet, but there is a clear improvement path that we are on, and it will continue until we are there.

[00:14:26] Host: You used to work on perception, and now you work on formal reasoning; as you mentioned before, formal reasoning was actually your primary goal. For example, your most recent paper was about theorem proving, actually proving mathematical theorems using deep learning. Why are you working on formal reasoning?

[00:14:48] Christian: The way that we interact with computers has been essentially the same since it was figured out around 1950. Essentially, we are still using glorified Fortran compilers. Actually, the complexity of programming didn't even go down; if anything, it went up. And I think now, with new techniques and technologies, it is slowly becoming possible for computers to adapt to humans, rather than humans having to adapt to computers. The past 30 or 40 years were about people getting used to computers, starting to use them, and training a lot of software engineers. But I think this can be fundamentally changed by changing the way we interact with computers, so that they can understand fuzzy reasoning, they can understand intention, they can understand a lot of things that humans take for granted. Then we will be able to interact with computers much more naturally, and productivity could go up a lot. And I think, in order to start on that, you should start with some practical task that exercises exactly this ability.

[00:16:03] Christian: So I thought: what is the simplest possible domain in which you can make sure that your system understands fuzzy human thought processes and can turn them into formalized processes? I think mathematics is the cleanest example of that, because it doesn't rely on any outside knowledge; you don't need anything extra. Everything is there in the axioms, and there are only a few of them. So the question is: can we create an assistant that lets you interact about mathematics just the way you would interact with a human, that can interact at the level of a really good mathematician, and that also understands you in natural language? That would be a first step toward systems that could revolutionize computer science, or how we interact with computers. Because once you have that first step, something like a superhuman mathematician, you can infuse more and more domain knowledge into it, and then you can do program synthesis with it. That's why I think program synthesis is the future: mathematics is just one step away from program synthesis, and program synthesis is the unification of machine learning and programming. So I think doing mathematics is the logical next step. But for some people this is completely outrageous; most people have been working on formal reasoning for decades...

[00:17:35] Christian: ...and they think it's kind of crazy. They say: that's ridiculous, you have a dream, it will never happen. But I saw that even in 2011, a lot of people in computer vision thought the deep learning stuff was a ridiculous dream, and it happened, and it happened in two years. So I think there is a significant chance that, with the pattern recognition and perception capabilities we have, we will be able to get to the point where we have mathematical capabilities and communication capabilities that are at least useful for humans. Basically, my point is that if we had software that could read mathematical literature in human form, that would be a very strong indication that it can really do the kind of fuzzy reasoning that is required. Imagine you have an employee you want to program something: you give him a task and say "do this." You don't have to describe every step, because then you wouldn't need the employee; you could just write the program yourself. Similarly with the computer: imagine you have an artificial agent that is like an employee. You tell it the same things you would tell your staff engineer, and it programs it for you and comes back. Then you say, "I wanted something slightly different," and it iterates a bit with you, and at the end of the day you get something useful. I think that, even compared to the best programmers, there is a potential 100x acceleration of software engineering if you had that kind of capability. And the possibility is that you could create the vast majority of software without knowing any programming.

[00:19:23] Host: Right, this sort of theorem proving is the first step toward full program synthesis.

[00:19:28] Christian: Yeah. Basically, to understand you without you fully specifying everything.

[00:19:33] Host: Right, right, the sort of fuzzy commands humans give. Having spent a lot of time doing math myself, I wonder: why do you think that's actually a tractable problem?

[00:19:46] Christian: Some of the similar reasoning problems, like computer games, chess, Go, looked like they required some magic ingredient, but most of them boiled down to perception, which we now handle pretty well, and you can solve them. There are complicating factors in math. One of the most complicating factors is that safe play is not an option: you cannot just be slightly better than your opponent; you either prove something or you don't, so failure is clear-cut. In chess you can start by winning, say, 10 percent of the time, play it safe, and very slowly pull yourself up. That's not an option in math. That's why we are working on it, and that's why we think it's an exciting problem: it's different and difficult. On the other hand, we think that perception is a very strong tool that allows us to do reasoning better than everything that came before. Almost all the existing tools will be obsoleted, I think, just like AlexNet obsoleted most of the prior computer vision and feature engineering. That is my expectation. And so far we have not seen ideas that are actually intractable for deep learning, given enough data. The tricky question is: how do we generate enough data, and how do we get an initial system that can reason at a decent level so that it can pull itself up?

[00:21:16] Host: In some sense, your belief is: we haven't yet found a problem that's too difficult for deep learning; you just need enough data. So what's your strategy for collecting enough data for formal reasoning?

[00:21:33] Christian: We rely on certain existing formal corpora. These are relatively small; typically they were developed to prove one big mathematical theorem, like the four color theorem, the Kepler conjecture, or results about finite simple groups, things like that. There are maybe three or four of these corpora, but they are not big enough to pull yourself up with. So the really crazy idea here is to read all of the human mathematics literature and learn to formalize it. We want to use that initial small set of formal data as a kind of spark to initiate a feedback loop, in which the system learns to read human-language mathematics, turns it into conjectures, proves those conjectures, and works its way up. That's our strategy. And since there is a lot of natural language mathematics, the good side effect is that at the same time, the system will learn to understand natural language. So if we get there, we don't just get a very good mathematical system; we get a really strong system that demonstrates strong natural language understanding, which would be a feat in itself. It's a long way to get there, but I think addressing both of them at the same time has a higher chance of success than addressing them in separation.

[00:23:05] Host: That's really interesting. Actually, you think that learning formal reasoning from math textbooks and math papers, literally doing that, has a higher likelihood of succeeding, even though strong natural language understanding is a prerequisite, which we're not quite close to.

[00:23:24] Christian: Okay, so consider the alternatives for getting strong reasoning. Either you collect a lot of training data; but even Scale wouldn't be able to collect that data for us, because you don't have access to mathematicians the way you have access to labelers. It's very hard to collect training data for formalized mathematics, so that's not a realistic path. Or, if you want to create a superhuman mathematician, the other alternative would be open-ended exploration of mathematics: figure out general principles of what makes mathematics interesting, and learn how to discover parts of mathematics. Even that wouldn't work too well, because even if your system learned to do very high-level mathematics, you would not be able to pose a concrete problem to it, since you don't speak the same language as the system. You would have to formalize your statements in the system's own terminology, which would be self-developed, which means you would have to decipher it like an alien artifact. It would be very hard to work with such a system. So I think a self-exploring mathematician system might learn to reason very well...

[00:24:34] Christian: ...but you cannot talk to it. So your only chance is to learn to communicate at the same time as learning to reason. And a lot of people make a mistake, I think, when they want to learn natural language understanding: they treat language as an object, and they try to learn natural language understanding as manipulation of the language itself. But natural languages, languages in general, are not about the language itself; they are just communication media for something else. Natural language understanding is not really about the language: it is communicating your understanding of something. It is really understanding of mathematics communicated via natural language, or understanding of the world communicated via natural language. Basically, natural language is a compressed communication channel. That's really hard to learn if you don't have something to communicate about, if you don't have a very controlled environment to communicate about, and mathematics is a rich but very controlled environment you can communicate about.

[00:25:48] Christian: So I think that's the perfect medium in which to develop natural language understanding as a prototype. And once you have mathematics, you can extend that system with all kinds of domain knowledge, because it has all the logic; you can argue about anything, and you have the logical foundations to do it. So I think natural language understanding alone is harder than natural language understanding together with mathematics, and doing mathematics alone is harder than doing it together with the language learning.

[00:26:19] Host: Right: with natural language understanding alone, you have a sort of communication vacuum, nothing to communicate about. And now you have a backbone of formal reasoning, formal statements.

[00:26:35] Christian: Basically a sandbox. Mathematics is a sandbox that you can manipulate in memory, like a small world that you can communicate about.

[00:26:43] Host: Right. Your current research agenda is extremely ambitious; these are ambitious things to achieve. Why do you think they're actually possible?

[00:26:51] Christian: So, one argument is that, for example, AlphaGo, AlphaZero, and these methods show that reasoning is possible with strong perception, and that strong perception is possible with deep neural networks. Therefore I think we can do a lot of the things humans can do with deep learning, with relatively high certainty. That's one of the arguments. The other: I agree it's still not guaranteed, even with this supporting evidence. Most people, even AI people, believed before AlphaGo came out that Go requires a level of human intuition that is not possible with computers, not now, maybe in 10 or 20 years. And then AlphaGo came out, and it turned out: okay, you add a neural network to an existing Go engine and you get pretty good, and then you go a bit further and you get even superhuman. So that component that previously looked fully human, the ability to do this fuzzy intuition, fuzzy reasoning, this kind of artistic sense that looked uniquely human and that most people thought was not possible for computers: it turned out that deep learning solved that intuition part. So basically, we now have artificial intuition, which is quite a different situation.

[00:28:26] Christian: So we can infuse that into a lot of domains, if we want.

[00:28:29] Host: In your most recent paper, the approach at a high level uses a tree search with a neural network as a sort of heuristic engine. It starts with this learned intuitive engine, and I think it switches between that and the sort of classical methods. What do you think are the limitations of that approach?

[00:28:51] Christian: Yeah, that's a very limited projection of the idea, because you already give it a skeleton: it has to live inside a certain search method, so your deep learning algorithm doesn't really have the freedom to explore the search space as it wants. Our next paper, which is coming out soon, in a few weeks, turns this around. What we say is: we have a certain environment in which the network has all kinds of actions to perform, and the network becomes the agent, while the search setup becomes just an environment it operates in. I think that's a much better approach. But even that probably has limitations, so we are thinking about how to transcend most of those limitations, to give maximum freedom to the network so that it can be maximally free to explore all of the spaces that are possible to explore. So yes, I agree with you: it was an important first step. We wanted to check whether the neural network adds something to existing search, and in that regard it was kind of a success. But it did not reach escape velocity; it cannot improve in an open-ended way.
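For readers unfamiliar with the setup being discussed, the "neural network inside a fixed search" pattern looks roughly like the following editorial sketch. Everything here is invented for illustration: integer states stand in for proof states, the two "tactics" stand in for proof steps, and the hand-written score function stands in for the learned heuristic that would normally rank which branch to expand next.

```python
import heapq

# Best-first search with a pluggable heuristic. The search skeleton below is
# fixed; the network's only role would be to supply score(). That rigidity is
# exactly the limitation discussed above: the learner cannot change the search.
GOAL = 37

def tactics(state):
    return [state + 1, state * 2]        # stand-ins for proof steps

def score(state):                        # stand-in for the learned heuristic
    return abs(GOAL - state)

def best_first(start, limit=1000):
    frontier = [(score(start), start, [start])]
    seen = set()
    while frontier and limit > 0:
        _, state, path = heapq.heappop(frontier)
        if state == GOAL:
            return path                  # the "proof": a path of tactic applications
        if state in seen or state > 2 * GOAL:
            continue
        seen.add(state)
        limit -= 1
        for nxt in tactics(state):
            heapq.heappush(frontier, (score(nxt), nxt, path + [nxt]))
    return None

print(best_first(1))  # [1, 2, 4, 8, 16, 32, 33, 34, 35, 36, 37]
```

The "turned around" version Szegedy describes would instead let the network choose actions directly in an environment, rather than only scoring nodes for a hard-coded loop like this one.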

[00:30:11] Host: What would you say are other exciting, or potentially underrated, areas of research in AI right now?

[00:30:20] Christian: I think one of them goes back to your other question: what should we do about AI being misused? A lot of people pay lip service and say we should do this and that, but the real questions are: how do you combat certain negative effects of machine learning, and what are those negative effects? Because a lot of them are kind of subtle. How do people make decisions about our lives, for example insurance companies or credit agencies and things like that? This is just one example; I don't claim to know everything. As AI gets applied more and more, all these biases that go into AI systems will affect everybody more and more, and I think that's something one should research much more and take much more seriously. I'm happy that Google is taking a lead on that; a lot of people have noticed, and there is a significant push. But I think this kind of research is not really rewarded at the societal level, because it has no obvious immediate monetary impact. Still, I think it's an important thing to do: to research how we are deploying these technologies and what their effects are.

[00:31:40] Host: Thanks so much for being here, Christian. To sum it all up: you're definitely an AI optimist, and formal reasoning is on the critical path to strong AI.

[00:31:51] Christian: Thank you, Alex.