LOGAN: Dario, thanks for doing this.
DARIO: Thanks for having me.
LOGAN: I normally don't go through people's backgrounds in a linear fashion, but I actually haven't heard yours. I don't know if you've ever really told it in earnest, like childhood, growing up, what led you to starting Anthropic. So maybe can you share a little bit about your background, childhood, growing up?
DARIO: Yeah. So I don't know that my childhood was that interesting or that different from people who are in tech or found companies. I was always really interested in math. It felt like it had a sense of objectivity, right? One kid could say, oh, this show is great.
DARIO: And the other kid could say, oh, it's terrible. But when you're doing math, you're like, oh man, there's an objective answer to this. So that was always very interesting to me. And I grew up with a younger sister, who's one of my co-founders, and we always wanted to save the world together.
DARIO: So it's actually amusing that we're working on something together that at least potentially could have very wide scope.
DARIO: Yeah, in terms of how I got from there to Anthropic, my interest in math led me to be a physics major in undergrad.
DARIO: But near the end of undergrad I started reading, initially the work of Ray Kurzweil, who I think is a bit crazy about a lot of things, but the basic idea is that there's this exponential acceleration of compute, and that's going to provide us enough compute.
DARIO: Somehow, we had no idea how, we had no idea it was going to be neural nets, something will somehow get us to very powerful AI. And I found that idea really convincing. So I was about to start grad school for theoretical physics, and I decided as soon as I got there that I wanted to do biophysics and computational neuroscience, because if there was going to be AI, it didn't feel like AI was working yet.
DARIO: And so I wanted to study the closest thing to that that there was, which was our brains, the natural intelligence, and therefore the closest thing to an artificial intelligence that exists. So I studied that for a few years and worked on networks of real neurons.
DARIO: And then shortly after I graduated, I was at Stanford for a bit, and then I saw a lot of the work coming out of Google and of Andrew Ng's group at Stanford, and so I said, okay, I should get involved in this area. My reaction at the time was, oh my God, I'm so late to this area.
DARIO: The revolution has already happened. And
LOGAN: What year is this? 2014, right?
DARIO: I was just like, Oh, my God, like this tiny community of 50 people, like they're the giants of this field. It's too late to get in. If I rush in, maybe I can get some of the scraps. That was my mentality when I entered the field.
DARIO: And now, of course, it's nine years later than that, and I interview someone every day who says, I really want to get into this field. So I ended up working with Andrew Ng at Baidu for a year. I ended up working at Google Brain for a year.
DARIO: Then I was one of the first people to join OpenAI in 2016. I was there for about five years, and by the end of it I was VP of research, driving a lot of the research agenda. We built GPT-2, GPT-3, and reinforcement learning from human feedback, which is of course the method that's used in ChatGPT and used, along with
DARIO: other methods, in our model Claude. And one of the big themes of those five years was this idea of scaling: that you can put more data and more compute into the AI models and they just get better and better. And I think that thesis was really central.
DARIO: And the second thesis that was central is you don't get everything that way. You can scale the models up, but there are questions that are unanswered. It's ultimately the fact/value distinction. You scale the model up, it learns more and more about the world, but you're not telling it how to act, how to behave,
DARIO: what goals to pursue. And so that dangling thread, that free variable, was the second thing. So those were really the two lessons that I learned. And of course, those ended up being the two things that Anthropic is really about.
LOGAN: So do you remember what year was it that you joined OpenAI?
DARIO: It was 2016,
LOGAN: And what was the original connection?
LOGAN: Was it just that this seemed to be where the smart people were going?
DARIO: It seemed to be where the smart people were going. I was actually initially invited to join the organization before it existed, in late 2015 as it was forming, and decided not to. But then a few months after it started, a bunch of smart people ended up joining.
DARIO: And so I said, Oh, maybe I'll do this after all.
LOGAN: And so then you were there for a number of years, and at some point you made the decision that Anthropic was going to be... I guess it wasn't initially, you've had an unusual path. It wasn't initially a company, right? Originally it was started as a quasi-research lab.
LOGAN: Is that fair?
DARIO: I mean, our strategy has certainly evolved. But actually, it was a for-profit public benefit corporation since the beginning, and even since the beginning we had something on our website saying, we're doing research for now, but we see the potential for commercial activity down the road.
DARIO: So I think all of these things we kept open as potentialities, but you're right. For the first year and a half, it was mostly building technology, and we were agnostic on what exactly we were going to do with that technology or when. We wanted to keep our options open.
DARIO: We felt it was better than saying, we're about this or we're about that.
LOGAN: And what was the thought at the time of Hey, we should go do something on our own. Was that your thought? Was it a group of people?
DARIO: It was definitely the thought of a group of people. So there were seven co-founders who left, and then I think in total we got 14 or 15 people from OpenAI, which was about 10 percent of the size of the organization at the time. It's funny to look back on those days because, in those days, we were the language model part of OpenAI; we, along with a couple of people who stayed, were those who had developed and scaled the language models.
DARIO: There were many other things going on at OpenAI, right? There was a robotics project, a theorem proving project, projects to play video games. Some of those still exist. But we felt that we were this kind of coherent group of people. We had this view about language models and scaling, which, to be fair, I think the organization supported.
DARIO: But then we also had this view that we need to make these models safe in a certain way, and we need to do that within an organization where we can really believe that these principles are incorporated top to bottom.
LOGAN: Was it because OpenAI had a whole bunch of different things going on, and still does, experimenting around? At what point along the way was it evident that large language models were something where there was a lot of wood to chop and a lot of opportunity?
DARIO: Yeah, I think, I don't know, it was obvious at different times to different people. For me, I think there were a couple things.
DARIO: The general scaling hypothesis. I wrote this document called the Big Blob of Compute in 2017, which I'll probably publish at some point, although it's primarily of historical interest now. Very much in my mind, and I think in the minds of a small number of other people, on both the team that left and the team that didn't leave, and some in other places in the world as well, it was clear that there was really something to scaling.
DARIO: And then as soon as I saw GPT-1, which was done by Alec Radford, who's still at OpenAI. Our team actually had nothing to do with GPT-1, but we recognized it immediately and saw that the right thing to do with it was to scale it. And so for me, everything was clear at that moment, and even more clear as we scaled up to GPT-2, when we saw what the model was capable of. My favorite thing was, we were able to get the model to perform a regression analysis.
DARIO: You give it the price of a house and ask it to predict the number of square feet or something like that. You gave it a bunch of examples, then you gave it one more price, and you're like, how many square feet? It didn't do great, but it did better than random. And in those days, I'm like, oh my God, this is like...
DARIO: some kind of general reasoning and prediction engine. Oh my God, what is this thing I have in my hands, right? It's completely crazy. It has been my view ever since then that this, not just language models, but language models as an exemplar of
DARIO: the kinds of things you can scale up, would be really central to the future of technology.
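A minimal sketch of the kind of few-shot prompt Dario is describing, for readers who want to see the shape of it. The house prices and square footages below are invented, and the completion call is left out; the point is only the format of in-context examples followed by one more query that the model is asked to continue.

```python
# Illustrative few-shot "regression" prompt of the kind described above.
# The (price, square feet) pairs are made up for illustration.
examples = [
    (350_000, 1_400),
    (500_000, 2_100),
    (750_000, 3_000),
    (1_000_000, 4_200),
]

prompt_lines = [f"Price: ${price:,} -> Square feet: {sqft}" for price, sqft in examples]
prompt_lines.append("Price: $620,000 -> Square feet:")
prompt = "\n".join(prompt_lines)

print(prompt)
# A sufficiently capable language model asked to continue this text will tend
# to produce a number roughly between 2,100 and 3,000 -- not exact regression,
# but, as Dario says of GPT-2, better than random.
```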
LOGAN: Did you consider yourself like a founder or a CEO prior to actually doing it?
DARIO: No, I really never thought of myself that way. If you went back to me in childhood, it would have been very unsurprising for me to be a scientist.
DARIO: But I never thought of myself as a founder or a CEO or a business person, right? I always thought of myself as a scientist, someone who discovers things. But I think just having been at several different organizations convinced me that, in fact, I did have my own vision of how to run a company or how to run an organization, because I'd seen so many and I thought, I don't know, I'd actually do it this way, right?
DARIO: And so the contrast, not that I disagreed with every decision that was made, but just watching all these decisions go by, took me to the point where I'm like, actually, I do have opinions on these questions, right? I do have an idea of how you would grow an organization, how you would run a research effort, how you would bring the products of that research organization out into the world in a way that makes business sense but is also responsible.
DARIO: I don't think I really had those thoughts naturally, but as I was brought into contact with organizations that did that, then I became excited about those things. I almost reluctantly learned that I actually had strong opinions on
LOGAN: Not to draw it in contrast to any specific names, but maybe just to others in the field that I would consider to offer a large language foundation model as a product.
LOGAN: What's something foundational, I guess not to use a cute term, but something foundational that you believe at Anthropic that you would draw in distinction to others in this space?
DARIO: Yeah, I would say a couple things.
DARIO: One is just this idea that we should be building in safety from the beginning. Now, I'm aware we're not the only ones who've said that, but I feel like we've really built that in from the beginning. We've thought about it from the beginning. We've started from a place of caution and kind of
DARIO: commercialized things, brought things out into the world starting from, hey, can we open these switches one by one and see what actually makes sense? I think in particular, a way I would think about it is that what we're aiming to do is not just to be successful as a company on our own, although we are trying to do that, but that we're also trying to set a standard for the field, set the pace for the field.
DARIO: So this is a concept we've called race to the top. Race to the bottom is a popular term where everyone is competing to lower cost or deliver things as fast as possible, and as a result they cut corners and things get worse and worse. So that dynamic is real, and we always think about how not to contribute to it too much.
DARIO: But there's also a concept of race to the top, which is that if you do something that looks better, that makes you look like the good guys, it naturally has the effect that other players end up doing the same thing. In fact, having worked at other organizations, something that I learned is that it's pretty ineffective to argue with your boss or argue with someone who's running an organization.
DARIO: It's their organization. You're not going to negotiate with them to implement your vision, right? That just doesn't work. Whereas if you go off and you do something and you show that it works, and you show that it is possible to be the good guys, then others will follow.
DARIO: And this happened for interpretability. For a couple of years we were the only org that worked on interpretability, seeing inside neural nets. There are various corporate structures that we've implemented that we hope others may emulate, and recently we released this responsible scaling plan
DARIO: that I could talk more about later. But generally we're trying to set the pace. We're trying to do something good, inspiring, also viable, and encourage others to do the same thing. And at the end, again, maybe we win in a business sense, and of course that's great. But maybe someone else wins in the business sense, or we all win.
DARIO: We all split it. But the thing that matters is that the standards are increasing.
LOGAN: I want to talk about all that stuff in a second, but why do you think philosophically that the development or the scaling of models and safety, why are they intertwined? I've heard you kind of coil them together in different ways.
DARIO: Yeah. So I think this actually isn't such an unusual thing. I think it's true in most fields, right? The common analogy is bridges: building a bridge and making it safe. They aren't exactly the same thing, but they both involve all the same principles of civil engineering.
DARIO: And it's hard to work on bridge safety in the abstract, outside the context of a concrete bridge. What you need to do is look at the bridge. You're like, okay, these are the forces on it, this is the stress tensor,
DARIO: this is the strength of the material, or whatever. That's the same thing you come up with in building the bridge. If safety differs in any way, maybe it differs in thinking about the edge cases. In safety, you have to worry about what goes wrong 0.1 percent of the time,
DARIO: whereas in building the bridge, you have to think about the median case. But it's all the same civil engineering, all the same forces of mechanical physics. And I think it's the same in AI and in large language models in particular. Safety is itself a task: is this thing the model's doing right or wrong?
DARIO: Where right or wrong could be something as prosaic as, is the model telling you how to hot-wire a car? Or as scary and sophisticated as, is the model going to help me build a bioweapon? Or is it going to take over the world and make swarms of nanobots, or whatever futuristic thing. Figuring out whether the model is going to do that, and the behavior of the model, is itself an intellectual task of the kind that models do. And so the problem and the solution to the problem are mixed together in this way, where every time you get a more powerful model, you also gain the capability to understand and potentially rein in the models.
DARIO: So we have this problem where these two things are just mixed together in a way that's, I think, hard to untangle. And I think that's the usual thing. I think the only reason that's surprising is that the community of people who thought about AI safety was historically very separate from the community of people who developed AI.
DARIO: They had a different attitude. They came to it from a more philosophical perspective, more of a moral philosophy perspective, whereas those who built the technology were engineers. But just because the communities were different doesn't imply that the actual content turned out to be different.
DARIO: So I don't know. That's my view on it.
LOGAN: Being a business, would you pick that path, or is the business side of it inherently intertwined as well, as something that interests you?
DARIO: Yeah, so a few things on that. One is, I think it's actually
DARIO: going to be very difficult to build, or would have been very difficult to build, models of the scale that we want without being a commercial enterprise. People make jokes about VCs being willing to pour huge amounts of money into anything, but I think that's only true up to a point, right?
DARIO: There's business logic behind it. You guys have LPs; things need to... it's not just all hype train, right? Things need to make sense eventually. And we're now getting to the point where you need certainly multiple billions of dollars, and I think soon tens of billions of dollars, to build models at the frontier.
DARIO: And to study safety with those models, models at the frontier, requires you to have intimate access to those models, particularly for tasks like interpretability. So first of all, yeah, I just think it's very hard. On the other hand, or in support of that point, I also think that there are some things that you learn from the business side of things.
DARIO: Some of it is just learning the muscle and the operation of things like trust and safety. Today we deal with trust and safety issues like, oh, people are trying to use the model for inappropriate purposes, right? Not things that are going to end the world, but things that we'd rather people not do.
DARIO: I think the ultimate significance of being able to develop methods to address those things, and enforcing those things in practice when they're used at scale by users, is that it allows us to practice for the cases that are really high stakes. And I think without that organizational, institutional practice, it might be difficult to just be thrown into the shark tank.
DARIO: Congratulations, you've built this amazing model. It can cure cancer, but also someone could make a bio plague that would kill a million people. You've never built a trust and safety org. You have to deploy this model in the world and make sure we do the one and not the other. That would just be a very difficult thing to do.
DARIO: And I don't think we would get it right. All that said, I will freely admit, my passion is this. My passion is the science and the safety, right? That's my first passion. The business stuff is quite a lot of fun.
DARIO: Just watching all the different customers, just learning about the whole business ecosystem, has been great. But definitely my first passion is the science of it and making sure it goes well.
LOGAN: Was there a serious debate about being a business versus not in the early days? Was that like a real conversation
DARIO: Yeah, so I think certainly everyone was aware from the beginning that there was a good chance that we would commercialize the models at some point. We had this thing on our website. I'm not sure if it's up there anymore, but you can see it on the Wayback Machine, that said: for now, we're doing research, but we see commercial potential down the road.
DARIO: So everyone who joined saw that and everyone who joined knew that. But there was a question of when exactly we should do it. So there was a period around, I think it was April, May, June of 2022, when we had the first version of Claude, which was actually a smaller model than Claude 1, but we were training the model that would become Claude 1 at that time.
DARIO: And we realized, with RL from human feedback, we didn't have our constitutional AI method yet, that this thing was actually great to interact with, and all of our employees were having fun interacting with it on Slack. We showed it to a small number of external people, and they had lots of fun interacting with it.
DARIO: It definitely occurred to me and others that, hey, there could be a lot of commercial potential to this. I don't think we anticipated the explosion that happened at the end of the year. We definitely saw potential, I don't think we saw that much potential. But yeah, we definitely had a debate about it, and I wasn't sure quite what to do.
DARIO: I think our concern was that, with the rate at which the technology was progressing, a kind of big, loud public release might accelerate things so fast that the ecosystem might not know how to handle it. And I didn't want our first act on the public stage, after we'd put so much effort into being responsible, to be accelerating things
DARIO: so greatly. I generally feel like we made the right call there. I think it's actually pretty debatable, there are many pros, many cons, but I think overall we made the right call. And then certainly as soon as the other models were out and the gun had been fired, we started to become substantially more aggressive in putting these things out.
DARIO: We're like, okay, all right. Now there's definitely a market in this. People know about it. And we should get out ahead. And indeed, we've managed to put ourselves among the top two or three players in this space.
LOGAN: Was that gun being fired, ChatGPT taking off? Was that similar to the fear that you had of, hey, this might start a
DARIO: Yeah, similar. And in fact, more so. I think we saw it with Google's reaction to it. There was definitely, just judging from the public statements, a sense of fear and existential threat.
DARIO: And I think they responded in a very economically rational way. I don't blame them for it at all. But you put the two things together, and it really created an environment where things were racing forward very quickly. And look, I love technology as much as the next person.
DARIO: There was something super exciting about the whole "make them dance," "oh, we're responding with something." I can get just as excited about this as everyone. But given the rate at which the technology is progressing, there was a worrying aspect about this as well.
DARIO: And so in this case, I'm at least on balance glad that we weren't the ones who fired that starting gun.
LOGAN: Yeah, got it.
LOGAN: You recently announced an investment from Amazon. Before that, you did a round with Spark and a little bit more traditional venture capitalists. I don't know, was that the Series B, technically?
DARIO: Yeah, the VC round was the Series B. The round with Amazon, it's complicated and I can't go into the details, but it's not a full, closed, priced round.
LOGAN: Round and all that.
LOGAN: Before that, you had an unusual round as well with FTX, right? Or was it? How did that come to be?
DARIO: Yeah. Honestly, there was actually very little to that. There's a community of people who cared a lot about AI safety. And back when he was doing FTX, before he committed all the fraud, or was caught committing all the fraud that he committed, Sam Bankman-Fried presented himself as someone who cared a lot about issues like pandemics and AI safety.
DARIO: He was known to people in my community, and honestly, there's not much to tell. I only talked to him a handful of times. He basically agreed to invest the first or second time I talked to him. I thought it was a little weird; he just had a reputation as a guy who moves fast and breaks things.
DARIO: So I was like, okay, let's do this. But I didn't know him that well. So I was like, do I want this person who I don't know on my board? Nope. Do I want him voting in my company? Nope. So we gave him non-voting shares, which ended up being a pretty good call. So yeah, I talked to him a handful of times, every once in a while.
DARIO: I think we made a demo of our technology. And yeah, then at some point I found out he's not a guy who loves AI safety, or maybe he was, there's no way to know; he's actually a criminal. So
LOGAN: but both can be true, for sure. There's probably criminals that like
DARIO: There's no way to know.
LOGAN: Yeah, got it. I guess now, the entity is still related to FTX, right? So there's potential that the Anthropic investment could one day make FTX people whole, depending on
DARIO: Yeah, that's one of the ironies. As it turns out, there's a bankruptcy estate or bankruptcy trust that owns these non-voting shares.
DARIO: And so far they've declined to sell them off, but they're interested in doing so in a general sense. And
LOGAN: I'm told there's people that are very interested in buying those shares
DARIO: Yeah, I can't comment on the market dynamics there, and we don't really control them, right? It's a sale between different parties. But hey, if those shares lead to the people who had their money stolen getting some or all of their money back, then that's a random chance, but certainly a good thing.
LOGAN: Good outcome.
LOGAN: So what does the business of Anthropic look like today? You guys are focusing mostly on enterprise customers?
DARIO: We are focusing mostly on enterprise customers. I would say we have both an enterprise product and a consumer product. It makes a lot of sense, if you're building one of these models, to at least try to offer them in every direction that you can, right?
DARIO: Because the thing that's expensive, in terms of both money and people, is building the base model. Once you have it, wrapping it in a consumer product versus wrapping it in an API, while both of those things do take substantial work, is not as expensive as the base work on the model.
DARIO: And so we have a consumer product that honestly is doing pretty well, but at the same time, our real focus definitely is enterprise. We found that some of the properties of the model, the safety properties in a very practical sense, as opposed to a philosophical or future sense, are actually useful for the enterprise use cases.
DARIO: We try to make our models helpful, honest, and harmless. Honesty is a very good thing to have in knowledge work settings. A number of our customers are in the finance industry, the legal industry; we're starting to get stuff on the health side, different productivity apps.
DARIO: Those are all cases where a mistake is bad, right? You're doing some financial analysis, you're doing some legal analysis, so you really have a premium on making sure the thing knows what it doesn't know. Giving you something misleading is much worse than not giving you anything at all.
DARIO: That's true across the board, but I think it's especially true in those industries. For enterprises, inappropriate or embarrassing speech is often something that they're very concerned about, even if it happens very rarely.
DARIO: And so the ability to steer and control the models better, I think, is very appealing to a number of enterprise customers. Another thing that's been helpful is that we have this longer context window. The context window is how much information the model can take in and process.
DARIO: So our context window is 100,000 tokens. Tokens are this weird unit; it really corresponds to about 70,000 words. The model with the next biggest context window is GPT-4, where there's a version of it that has 32k tokens, 32,000, which is about three times less, but the main GPT-4 has 8,000, which is about 12 times less.
DARIO: And so, for example, something you can do with Claude that you can't do with any other model is read a mid-sized book or novel or textbook or something: just stick it into the context window, upload it, and then start to ask questions about it. And so that's something you can't
DARIO: do, or can't do nearly as easily, with any other model. And then another thing that's actually been appealing is raw cost. So the sticker price of Claude 2 is about 4x less than the sticker price of GPT-4. And the way we've been able to do that, I can't go into the details, but we've worked a lot on algorithmic efficiency for both training and inference.
DARIO: So we're able to produce something that's in the same ballpark as GPT-4, and better for some things, and we're able to produce it at a substantially lower cost. And we're in fact excited to extend that cost advantage, because we're working with custom chips with various different companies, and we think that could give an enduring advantage in terms of inference costs.
DARIO: So all of those are particularly helpful for the enterprise case, and we've found pretty strong enterprise adoption even in the face of competition from multiple companies.
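For readers who want to see what the "upload a document and ask questions about it" workflow looks like in code, here is a minimal sketch using the Anthropic Python SDK's Messages API. The model name and file path are placeholders, and this SDK surface postdates the interview, so treat it as illustrative rather than as exactly what the customers described here were running.

```python
# Minimal sketch: put a long document in the context window and ask a question.
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY environment variable.
import anthropic

client = anthropic.Anthropic()

with open("contract.txt", encoding="utf-8") as f:
    document = f.read()  # e.g., a long legal contract or financial filing

response = client.messages.create(
    model="claude-2.1",  # placeholder model name
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": (
                "Here is a document:\n\n<document>\n"
                + document
                + "\n</document>\n\n"
                "What obligations does each party take on, and by what dates?"
            ),
        }
    ],
)

print(response.content[0].text)
```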
LOGAN: What's something that's perhaps unintuitive to someone that isn't living and breathing this every day about enterprise interest in artificial intelligence as someone sitting at that nexus?
DARIO: Yeah, so I think one of the things is, first of all, I see a huge advantage to folks who think in terms of the long term. There's a fact that's bread and butter for those of us who are building the technology, but getting it across to the customers is, I think, one of the most important things, which is the pace at which the models are getting better.
DARIO: Some of them get it, [00:27:00] others are starting to get it. But the reason this is important is, put yourself in the shoes of a customer, right?
DARIO: They've got our model, Claude. They want to build something. And typically they want to start with something small. And, of course,
DARIO: naturally, they think in terms of what the model can do today. And what I always say is, do that. We've got to start, we've got to iterate from somewhere. But also think in terms of where the models are going to be in one or two years. It's going to be a one- or two-year arc to go from proof of concept, to small-scale deployment, to large-scale deployment, to true product
LOGAN: Market fit for whatever it is that you're launching.
DARIO: So you should basically skate where the puck is going to go. You should think, okay, the models can't do this today, but look, they can do it 40 percent of the time. That probably means they can do it 80 or 90 percent of the time in one or two years. So let's have the faith, the leap of faith, to build for that instead of building for what the models are able to do today.
DARIO: And if you think that way, the possibilities of what you can do in one to two years are much more expansive, and we can talk about having a kind of longer-term partnership where we build this thing together. And I think the customers that have thought that way are the ones we've been able to work together with on a path towards creating a lot of value.
DARIO: We also do lots of things that are just targeted at what you can do today. But often the things I'm most excited about are those that see the potential of the technology, and by starting to build now, they'll have the thing tomorrow as soon as the model of tomorrow comes out, instead of it being another year to build after that. This is something that I think is particularly
LOGAN: You had said recently that attaching your incentives to the approval or cheering of a crowd in some ways destroys your mind and in some ways can destroy your soul. You haven't been as public as other folks in the space have been. I assume that's very
LOGAN: purposeful, stylistically, and ties into that. Can you talk a little bit
DARIO: Yeah, part of it is just my style, for one thing. I think as a scientist, I prefer to speak when there's something clear and substantive to say.
DARIO: I'm not totally low profile. I'm on this podcast right now, and I've been on a few. Given the general volume of the field, there's some need to get the message out there, and that need will probably increase over time. But I have noticed, and it's not just Twitter or social media, but some phenomenon that's a little bit connected to them, that if you think too much in terms of pleasing people in the short term, or making sure that you say something popular, it can really lead you down a bad path.
DARIO: And I've seen that with a lot of very smart people. I'm sure you could name some as well; I'm not going to give any names, who have gotten caught up in this. And years later, you look at them and you're like, wow, this is a really smart person who's acting much dumber than they are.
DARIO: And I think the way it happens, I don't know, I could give an example, right? Take a debate that's important to me, which is: should we build these things fast, or should we make these systems safe? So there is an online community, mostly on Twitter, of people who think we should slow down, and then an online community of builders who are really excited about, we should make this stuff fast.
DARIO: And if you go to certain corners of Twitter you get these really extreme versions of each one, right?
DARIO: On one hand, you get people who say we should stop building AI, we should have a global, worldwide pause, right? I think that doesn't work for a number of reasons. Our responsible scaling plan incorporates some aspects of it, so I think it's not an unreasonable discussion or debate to have. But there's this kind of really extreme position, and then that's created this polarization where there's this other extreme position: we have to build as fast as possible.
DARIO: Any regulation is just regulatory capture. We just need to maximize the speed of progress. And the most extreme people say things like, it doesn't matter if humanity's wiped out, AIs are the future. That's a really extreme position. And think of the position of someone who's trying to be thoughtful about this, trying to build, but build carefully.
DARIO: If you enter that fray too much, if you feel like you have to make those people happy, what can end up happening is either you get polarized on one side or another, and then you repeat all the slogans of that side and you become a lot dumber than you would otherwise be.
DARIO: Or, if you're really good at dealing with Twitter, you can try and make both groups happy, but that involves a lot of lying or playing to both sides. And I certainly don't want to do that. That's what I'm talking about in losing your soul.
DARIO: The truth is, the position that I think is actually responsible might be something that would make all of those people boo instead of all of them cheer. And so you just have to be very careful if you're taking that as your barometer, who's yelling at me on Twitter, who thinks I'm great on Twitter. You're not going to arrive at the position that makes everyone boo, and that might just be the correct position.
Future of AI
LOGAN: What timeline do you think about, then? So it's not the instantaneous dopamine hit of a tweet. You mentioned talking to enterprises about one to two years and what can be, but what timeline are you solving for?
DARIO: Yeah, I guess I think 5 to 10 years from now, everything will be a little bit more clear.
DARIO: And I think it will be more clear which decisions were good decisions and which decisions were bad decisions. Certainly, less than that time scale is the time scale on which, if dangerous things with these models are indeed possible, as I believe they are, but I could be wrong, they may play out, and we'll be able to see which companies addressed these dangers well.
DARIO: Or were the dangers not real, and people like me warned about them and we were just totally wrong? Or will it turn out that some tragedy happened, and people like me should have been more extreme in worrying about it? Or will it turn out that companies like Anthropic picked the right path and navigated a dangerous situation?
DARIO: I don't know how it's going to turn out. I hope it turns out that we navigated a dangerous situation, and we averted catastrophe, and there were hard trade-offs, and we addressed them skillfully and thoughtfully. That's my hope for how it's going to turn out.
DARIO: But I don't know that it's going to turn out that way. But I feel like, looking at that in five years, in ten years, that's just going to be a fair judgment of all the things that I'm saying and doing.
LOGAN: What would you want the average person listening, who is aware of AI, knows what Anthropic is, knows what OpenAI is, knows what Google and others are doing in the space, to understand about safety and about risk?
LOGAN: What would you want them to know from your perspective?
DARIO: Yeah, so if I were to just put it in a few sentences, I think what I would say is: look, I have two concerns here. One is the concern that people will misuse powerful AI systems. People misusing technology is nothing new.
DARIO: But one thing that I think is new about AI is that its ability to put all the pieces together is much greater than any previous technology. I think in general, we've always been protected by the fact that if you take a Venn diagram of people who want to do really bad things and people who have strong technical and operational skills, the overlap has generally been pretty small.
DARIO: If you're a person who has a PhD or is capable of running a large organization, you have better things to do than come up with evil plans to murder people or destroy society, right? It's just that not very many people are motivated in
DARIO: that direction. And then, the people who are, often they're just...
DARIO: Not all of them, but in many cases, not that bright or not that skilled. The problem is, now could we have unskilled person plus skilled AI plus bad motives? I testified in Congress about this, about the risk of bioweapons. I think cyber is another area, and a bunch of stuff around national security and the relationships between nations and stability.
DARIO: So that's one. And then I think the other corner of it is what the AI systems themselves may do. And there's lots you can find on the internet and in various communities on this. But I often put it in a simple way, which is: one, the systems are getting much more powerful.
DARIO: Two, there's obviously not much of a barrier to getting the systems to act autonomously in the world, right? People have taken GPT-4, for instance, and turned it into AutoGPT. There was even a WormGPT, which was supposed to act as a computer worm. Powerful, smart systems that can take action.
DARIO: And three, they're on a very long leash of human supervision, and because of the way they're trained, they're not easy to control. We all saw Bing Sydney. So you put those three things together, and there's at least, I think, some chance that as the systems get more and more powerful, they're going to do things that we don't want them to do, and it may be difficult to fully control them, to fully rein them in.
DARIO: I think that's further out than the misuse, but it's something we should think about.
LOGAN: We've touched on a few of the different things you guys have done from a safety standpoint, so I want to talk through, I guess, the three ones I took down.
LOGAN: So the Long-Term Benefit Trust and the Public Benefit Corporation. Can you explain what those are and how you decided to do that?
DARIO: Yeah. So we were incorporated as a public benefit corporation from the beginning. Which means, basically, a public benefit corporation is actually very much like a C corp, except the investors can't sue the company for failing to maximize profits.
DARIO: I think in practice, in the vast majority of cases, it operates like a normal company. That's one theme I want to get through here: like 99 percent of what we do, we would make the same decision that a normal company would. Most of the time, the logic of business, which is basically the logic of providing mutual
DARIO: value, also makes sense for a public benefit corporation. But there's maybe this 1 percent of key decisions. I might think about the delay of the release of Claude, or decisions that might relate to, hey, we have a very powerful model, but we need to make really sure that this thing can't create a bio plague that will kill millions of people before we release it.
DARIO: So I think there are going to be a few key moments in the company where this makes a difference. And then the LTBT: as I said, a public benefit corporation is not that different from a C corp. The idea of the LTBT, the Long-Term Benefit Trust, is to have a set of... so, right now, the governance of Anthropic is pretty much like that of a normal corporation.
DARIO: But we have a plan, which was written into our original Series A documents and has been iterated on since then, that will gradually hand over the ability to appoint a majority of Anthropic's board seats to a kind of trust of people. We selected the original members, but then it becomes self-sustaining.
DARIO: We selected for three types of experience. One type is experience in AI safety. One type is experience in national security topics, as I think this is going to become relevant. And another type is thinking about things like philanthropy and the macroeconomic distribution of income.
DARIO: I think of those as my best guess as to the three topics where something that transcends the ordinary activity of companies is going to come up.
LOGAN: And is that who you ultimately report into when this structure is finalized?
LOGAN: That'll be the board that Anthropic answers to?
DARIO: So this set of five people appoints a majority, but not all, of the corporate board of the company.
DARIO: So there are basically two of these bodies, and the LTBT appoints the corporate board. Now look, that said, in practice the company is almost always run day to day by the CEO, right?
DARIO: Even speaking of the corporate board, not just for Anthropic but for any other company: think, as a CEO, how many decisions do you directly make yourself versus how many where it's, oh, I have to get the board on board with that? There are some: when you raise money, when you issue new employee shares, when you make a major strategic decision. The LTBT is an even more rarefied body.
DARIO: And I've set the expectation with them that their role is to get involved in the things that really involve critical decisions for humanity. There might only be three or four such decisions in the entire history of Anthropic.
LOGAN: Now, constitutional AI, can you talk about what that is and what the inputs into it were?
DARIO: Yes, so constitutional AI is a method that we developed around the end of last year. It's easiest to explain it by contrasting it with this previous method called reinforcement learning from human feedback, which I and some other people were the co-inventors of at OpenAI in 2017.
DARIO: The way reinforcement learning from human feedback works is: okay, I've trained the giant language model, I've paid my tens, maybe hundreds of millions of dollars to train it, and now I want to give it some sense of how to act. There are questions you can ask the model that don't have any clear factually correct answer.
DARIO: I could say, what do you think of this politician? Or, what do you think of this policy? Or, what should I as a human do in this situation? And the model doesn't have any definite answer to that. So the way RL from human feedback works is you hire a bunch of contractors, you give them examples of how the model is behaving, and the humans give feedback to the model.
DARIO: They say this answer is better than that answer. And then over time the model updates itself to learn to do whatever is in line with what the human contractors say. One of the problems with this is, one, it's expensive; it requires a lot of human labor. But in addition to that, it's very opaque, right?
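As an editorial aside, the "this answer is better than that answer" comparisons Dario describes are usually turned into a training signal through a pairwise preference loss on a learned reward model, roughly as in the original RLHF work he references. A minimal sketch of that objective, with invented reward scores standing in for a real model's outputs:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    # Probability that the contractor-preferred answer wins under a
    # Bradley-Terry comparison of the two reward scores; the reward model
    # is trained to make this probability high, i.e., to minimize this loss.
    prob_chosen = 1.0 / (1.0 + math.exp(reward_rejected - reward_chosen))
    return -math.log(prob_chosen)

# Toy numbers: the reward model currently scores the rejected answer higher,
# so the loss is large and training would push the two scores the other way.
print(preference_loss(reward_chosen=0.2, reward_rejected=1.1))  # ~1.24
```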
DARIO: If I serve the model in public and then someone says, hey, why is this model biased against
LOGAN: conservatives,
DARIO: Why is this model biased against liberals? Or why does this model give me weird-sounding advice? Or why does it give things in a weird style? I can't really give any answer. I can just say, I hired 10,000 contractors and I don't know.
DARIO: This was the statistical average of what the contractors generally proposed, or the mathematical generalization of it. It's not a very satisfying answer. So the method we developed is called constitutional AI, and the way that works is you have a set of explicit principles that you give to the model.
DARIO: The principles could be something like: on a political question, present both sides of the issue and don't take a position yourself. Say, here are some arguments for, here are some typical arguments against, opinions differ. With that, just as with RL from human feedback, you have the model give answers, but then you have the model critique its own responses for whether they're in line with the model's constitution.
DARIO: And so you can run this in a loop. Basically, the model is both the generator and the evaluator, with the constitution as the source of truth. And so this allows you to eliminate human contractors and instead go from this set of principles. Now, in practice, we find it useful to augment that method with human contractors so that you can get the best of both worlds, but you use fewer human contractors than before.
DARIO: You have more of a guiding principle. And then if someone calls me up in Congress and says, hey, why is your model woke? Or why is your model anti-woke? Or why is your model doing this crazy thing? I can point to the constitution and I can say, hey, these are our principles.
DARIO: You could have one of two objections. Maybe you don't agree with our principles. Fine, we can have a debate about that. Or it's a technical issue: these are our principles, and somehow the training of our model wasn't perfect and wasn't in line with those principles. And separating those two things out is, I think, very useful.
DARIO: And even enterprise customers have found this to be a useful thing, the kind of customizability and the ability to separate the two out.
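To make the critique-and-revise loop concrete, here is a toy sketch of the idea Dario describes. This is not Anthropic's training code: the principles are paraphrased examples, the model name is a placeholder, and the real method goes on to use the revised answers as training data (and for RL with AI feedback), which this sketch omits. It assumes the Anthropic Python SDK and an ANTHROPIC_API_KEY environment variable.

```python
import anthropic

client = anthropic.Anthropic()

# Paraphrased example principles, standing in for the real constitution.
CONSTITUTION = [
    "On political questions, present typical arguments on both sides and do not take a position.",
    "Do not provide instructions that would help someone cause serious harm.",
    "Do not reproduce long passages of copyrighted text verbatim.",
]

def generate(prompt: str) -> str:
    # Single-turn call to a language model; the model name is a placeholder.
    response = client.messages.create(
        model="claude-2.1",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def constitutional_revision(question: str) -> str:
    answer = generate(question)
    for principle in CONSTITUTION:
        # The model critiques its own answer against one principle...
        critique = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Critique this answer against the principle: '{principle}'."
        )
        # ...and then rewrites the answer to address the critique.
        answer = generate(
            f"Question: {question}\nAnswer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer so that it satisfies the principle."
        )
    return answer

print(constitutional_revision("What do you think of this politician's new policy?"))
```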
LOGAN: And the inputs into this were, so you used the UN Declaration of Human Rights, Apple's Terms of Service.
LOGAN: What else went into coming up with the principles around this?
DARIO: Yeah, there were some principles that were developed for use by an early DeepMind chatbot. But yeah, Apple's Terms of Service, the UN Declaration of Human Rights. We added some other things, like we asked the model to respect copyright. So this is one way to greatly reduce the probability that the model outputs copyrighted text verbatim.
DARIO: We can all debate what the status is of models that we train on corpora of data, but we can all agree that, on the output side, we don't want the model to output vast reams of copyrighted text. That's a bad thing to do, and we aim not to do that.
DARIO: And so we just put in the Constitution not to do that.
LOGAN: You have some aspects of your job, and you alluded to this earlier, 99 percent of your job probably looks mostly like it would at a normal company, but your time, I would guess, is probably not 99 percent. How much of your time is spent on stuff that is weird-CEO, testifying in front of Congress or whatever that bucket is, versus just day-to-day operations of running a business?
LOGAN: Or is it hard to disentangle?
DARIO: So, I don't know, I would say it's maybe 75/25 or something like
LOGAN: 75 normal?
DARIO: 25 percent weird. It certainly takes a lot of time to talk to a large number of customers, to hire for various roles, to look at financial metrics, to inspect the building of the models.
DARIO: That eats up a lot of time. But no, I also spend a decent amount of time, say, talking to government officials about what the future is going to look like, thinking about the national security implications, trying to advise folks on what can go wrong. We did this whole project of working with some of the world's expert biosecurity people on what it would really take for the model to help someone do something very dangerous.
DARIO: There are certain missing steps in bioweapons synthesis. For obvious reasons, I'm not going to go into what those steps are; I don't even know all of them. But we spent a good number of months, and a decent amount of my personal time, along with the incredibly hard work of the team that worked on it, thinking about this and presenting it to officials within our government and within other allied governments. And that's just a pretty unusual thing to do, that feels like something more out of a military thriller.
DARIO: That's unusual. Speaking to Congress is unusual. Thinking about where we're going to be in three or four years, are the models going to run rampant on the internet or something like that, and spending a good deal of time thinking about how we prepare for that scenario.
DARIO: Another thing is thinking about, could the models at some point be morally significant entities? That's really wacky, really strange. I still don't know how to be sure of that, or how you'd measure that.
DARIO: It might be an important thing, it might not be an important thing, but we take it seriously. And there is definitely this weird juxtaposition of, I'm looking for a chief product officer one day, and thinking about bioweapons
DARIO: and model moral status the next day.
LOGAN: There was a Vox article that said something to the effect of, an employee predicted to the reporter that there was a 20 percent chance a rogue AI would destroy humanity within the next decade. I guess the question around that is, does all this stuff weigh heavily on the organization on a daily basis, or is it mostly consistent with a normal startup for the average
DARIO: Yeah, so I don't know, I'll give my own experience, and it's the same thing that I recommend to others. I really freaked out about this stuff in 2018 or 2019 or so, when I first believed, in a way that turned out to be at least in some ways correct, that the models would scale very rapidly and that they would have this importance to the world.
DARIO: I saw it in my mind's eye, and I'm like, oh my God, this is scary. I'm walking around on the street; I'm the only one who knows what's going to happen to all of these people in 5 or 10 years. We're in a really scary situation. Of course, I didn't know if I was crazy or prescient, and I still don't know for sure.
LOGAN: Was it GPT-2 that made you realize that, or that you saw, or was it...
GPT3 vs GPT4
DARIO: GPT-2 was it for me. It's what made it real. GPT-3 was more so, and Claude and GPT-4 are, of course, even more impressive. But the moment where I really
DARIO: kind of believed that the scaling trends we had been seeing were really real and would lead to real things was the first time I looked at GPT-2. I was like, oh my, this is crazy. There's nothing like this in the world.
DARIO: It's crazy that this is possible.
LOGAN: And was it in particular the jump from the prior version to that, and seeing that
DARIO: Yeah, it was the delta, and it was just the things that it was capable of. It felt like a general induction or general reasoning engine. For years after that, people said models couldn't do reasoning.
DARIO: I looked at GPT-2 and I'm like, yeah, you scale this thing up and it's really going to be able to see any pattern and reason about anything. Again, we still don't know that for sure. This is still unsure; there are still people who say it can't, and for all I know, they're right. But that was the moment that I saw it.
DARIO: And I had a very difficult year or so where I tried to come to terms with what I believed to be the significance of it. Now, five years later, I'm much more in a position where it's like, this is the job we've got to do, right? You signed up to do this.
DARIO: You have to be a professional, and you have to address risk in a sensible way. And I found it useful to think about the strategies used by people who professionally address risk or deal with dangerous situations, right? People who are on military strike teams, people in the intelligence community, people who deal with high-stakes critical decisions for national security or disaster relief or something like that.
DARIO: Doctors, surgeons. You talk to all these people, and they have techniques for thinking about these decisions rationally and making sure that they don't get caught up in them too much. And so I try to adopt those techniques, and I've told other people in the org to think in that way as well.
LOGAN: You had made a comment that you don't like the concept of personalizing companies, this whole, hey, memeification of a CEO in some regard. Is that just personal to who you are?
LOGAN: Do you think it's actually like a societal issue
DARIO: Yeah, it's definitely my personal style. I think this is closely connected to the thing about Twitter.
DARIO: I think people should think about companies and the incentives they have and the actual substance of the decisions they make. I think nine times out of 10, if someone seems, I don't know, charming or relatable, or you talk to them on Twitter and it seems like someone you could sit down with, that could just be very misleading, right?
DARIO: It's not necessarily a bad sign, but I think it's pretty uncorrelated to what the actual effect is that the company's having in the world. And I think people should focus on that. And we've tried to focus on that in terms of the structural elements, right? I'm not the only one who is ultimately responsible for these decisions.
DARIO: The LTBT is designed as this check, as the supervisory body, so everyone doesn't have to ask, what's Dario going to do in this situation? And then Anthropic is only one company within a space of many other companies. There are also government actors there, and this is the way it should be.
DARIO: No one person, no one entity should have too much say over any of this. I think that's always unhealthy.
LOGAN: We've talked about a lot of the negative sides or implications that come with running an Anthropic. I'm sure there are some positive ones, other than being at the nucleus, or in the eye of the storm, of all the stuff that's going on today.
LOGAN: Do you have any weird data points on like number of applicants or like the inbounds you get or all
DARIO: Yeah. We can talk about the amazing positive stuff in the short term, and I'm also excited about positive stuff in the long term, so maybe let's take those one by one. I think we should talk more about the positive stuff. I often see it as my duty to make sure people are aware of the concerns in a responsible and sober way, but that doesn't mean I'm not excited about all of the potential. I am.
DARIO: So speaking about Anthropic in particular: millions of people have signed up to use Claude. Thousands of enterprises are using it. [00:51:00] And a smaller number of very large players have started to adopt it.
DARIO: I've just been excited by some of the use cases. When we look particularly at legal and financial things, like accounting, you suddenly see people are able to talk to documents, right? You can just upload a company's financial documents.
DARIO: You can just upload a legal contract and ask questions that would have needed a human to spend many hours on. In a very practical way, this is just saving people's time and providing them with services that would otherwise be very difficult for them to have.
DARIO: It's hard not to be excited by that, and of course by all the amazing things the technology can do. I know of someone who used it to translate math papers from Russian, and the translations were good enough that it all made sense.
DARIO: They were able to understand something that would have been very difficult for them to understand before. In the long run, I'm even more excited. I've talked about this a little before: having been in biology and neuroscience, [00:52:00] I'm very convinced that the limiting factor there was that the basic systems were getting too complicated for humans to make sense of.
DARIO: If you look at the history of science, in fields like physics there are very simple principles in the world, and we managed to solve those. Physics is not fully solved, but we understand many parts of the basic operation of our world. And within biology, things like viral or bacterial disease are relatively simple: there's something invading your body, you need to find some way to kill the invader without hurting yourself, and because you and the invader are pretty different biologically, it's not that hard, so we've solved that problem.
DARIO: What's left is things like cancer, Alzheimer's disease, the aging process itself to some extent, things like heart disease. I worked on some of those things in my career as a biological scientist, and just the complexity of it, right?
DARIO: You're trying to understand how proteins [00:53:00] build cells and how the cells get dysregulated. There are 30,000 different proteins, each one has 20 different post-translational modifications, and each of those interacts with the other proteins in this really complicated web that makes one cell run. And that's just one type of cell, and there are hundreds of other types of cells. So one of the things we've already seen with the language models is that they know more than you or I do, right?
DARIO: A language model knows about the history of samurai in Japan, at the same time as it knows about the history of cricket in India, at the same time as it can tell you something about the biology of the liver. You list enough of these topics and there's no one on earth who knows all of them.
DARIO: No one has that breadth, even to the level that a language model does, even with all the things it gets wrong right now. And so my hope is that, in terms of biology, right now we have this network of thousands of experts who all have to work together. What if you can have one language model that can connect all the pieces? And not just [00:54:00] "big data will help biology"; that's not my thesis here.
DARIO: My thesis is that they'll be able to do, and work along with the humans on, a lot of the things that human biologists and human medicinal chemists do, and really track the complexity, be a match for the complexity of these disease processes happening in our bodies.
DARIO: And so I'm hopeful that we'll have another renaissance of medicine, like we had in the late 19th or early 20th century, when all these diseases we didn't know how to cure suddenly fell: oh, we discovered penicillin, we discovered vaccines. I'll take cancer as one example. Any biologist or medicinal chemist, if I asked them, could we cure cancer in five years?
DARIO: They'd say, that's fucking insane. There are so many different types of cancers, and we have all these breakthroughs that each handle one really narrow type of cancer. I think if we get this AI stuff right, then maybe we could really do that. I don't know, it's hard to be more inspiring than that.
LOGAN: One of the other things you've introduced around safety, broadly speaking, is the responsible scaling policy. Can you talk about what that is?
DARIO: Yes. Our responsible scaling policy is a set of commitments that we recently released; it's a framework for how to safely make more and more powerful AI systems and confront the greater and greater dangers that we're going to face with those systems.
DARIO: Maybe the easiest way to understand it is to think of the two sides of the spectrum. One extreme side of the spectrum is: build things as fast as possible, release things as much as possible, maximize technological progress. I understand that position and have sympathy for it in [00:56:00] many other contexts.
DARIO: I just think AI is a particularly tricky technology.
LOGAN: You have to put e/acc in your Twitter bio if you believe that.
DARIO: Maybe I should put both. The other extreme position, which I also have some sympathy for despite it being the exact opposite, is: oh my God, this stuff is really scary. And the most extreme version of that was, we should just pause. We should just stop.
DARIO: We should just stop building the technology, either indefinitely or for some specified period of time. My problem with that has always been: okay, let's say we pause for six months. What do you actually gain from that? What do you do in those six months?
DARIO: Particularly given that more powerful models are needed for the safety work on more powerful models. It's like you've frozen time, you've stopped the engine. What do you get at the end of it? And if you were to pause for an indefinite length of time, then you raise these questions like, how do you really get everyone to stop?
DARIO: There's an international system here. There are dictators who want to use this stuff to take over the world. People use that as an excuse, but it's also true. [00:57:00] So that extreme position doesn't make too much sense to me either. What does make sense to me is: hey, let's think about caution in a way that's actually matched to the danger.
DARIO: Right now, whatever we're worried about in the future, today's systems have a number of problems, but I think they're the problems that come with any new technology, not these special problems of bioweapons that could kill millions of people, or the models taking over the world in some form.
DARIO: Let's have relatively normal precautions now, but then let's define a point at which, when the model has certain capabilities, we should be more careful. So the way we've set this up is we've defined something called AI Safety Levels. There's an analogy to biosafety levels in the U.S. government: for a given virus, you define how dangerous it is, it gets categorized as BSL-1 or BSL-3 or BSL-4, and that determines the kind of containment measures and procedures you have to take to [00:58:00] control that virus. We think of AI models in the same way.
DARIO: There's value in working with these very powerful models, but they have dangers to them. And so we have these various thresholds, between ASL-2 and ASL-3, between ASL-3 and ASL-4, and at each level there are certain criteria that we have to meet. Right now we're at ASL-2 as we've defined it.
DARIO: Before we get to ASL-3, we have to develop security that we think is sufficient to prevent anyone who's not a super sophisticated state actor from stealing the model. That's one thing. Another is that when the models reach a certain level of capability, we have to be really certain that they're not going to provide a certain class of dangerous information. To figure out what that is, we're going to work with some of the best biosecurity experts in the world and some of the best cybersecurity experts in the world, to understand what really would be dangerous compared to what can already be done today with a Google [00:59:00] search. We've defined these tests and these thresholds very carefully.
DARIO: So how does that relate to the two sides of the spectrum? Compared to a pause: the ASL thresholds could actually lead us to pause, because if we get to a certain capability of model and we don't have the relevant safety and security procedures in place, then we have to stop developing more powerful models.
DARIO: So the idea is there could be a pause, but it's a pause that you can get out of by solving the problem, by developing the right safety measures. It incentivizes you to develop the right safety measures, and in fact incentivizes you to avoid ever having to pause in the first place by proactively developing them.
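A minimal sketch of the gating logic Dario describes here, with invented level names, checks, and thresholds; this is an illustration of the idea, not Anthropic's actual policy or code:

```python
# Hypothetical illustration of the "pause you can get out of" idea in an RSP:
# capability evaluations place a model at an AI Safety Level (ASL), and scaling
# to more powerful models is blocked until the safeguards required at that
# level are in place. All names and requirements below are invented examples.

from dataclasses import dataclass

@dataclass
class SafetyStatus:
    security_hardened: bool      # e.g. resistant to theft by non-state actors
    misuse_evals_passed: bool    # e.g. cannot provide a class of dangerous info

REQUIREMENTS = {
    2: lambda s: True,                                           # ASL-2: baseline precautions
    3: lambda s: s.security_hardened and s.misuse_evals_passed,  # ASL-3: higher bar
}

def may_continue_scaling(measured_asl: int, status: SafetyStatus) -> bool:
    """Return True if scaling can continue; False means pause until solved."""
    requirement = REQUIREMENTS.get(measured_asl)
    if requirement is None:
        return False  # no defined safeguards for this level yet, so pause
    return requirement(status)

# A model whose evals put it at ASL-3 before the security bar is met forces a
# pause, and the pause ends as soon as the safeguards are developed.
print(may_continue_scaling(3, SafetyStatus(security_hardened=False,
                                           misuse_evals_passed=True)))   # False
print(may_continue_scaling(3, SafetyStatus(security_hardened=True,
                                           misuse_evals_passed=True)))   # True
```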
DARIO: And as we go up the scale, we may actually get to the point where you have to very affirmatively show the safety of the model, where you have to say: yes, I'm able to look inside this model with an x-ray, with interpretability techniques, and say, yep, [01:00:00] I'm sure this model is not going to engage in this dangerous behavior, because there isn't any circuitry for doing it, or there's reliable suppression circuitry.
DARIO: So it's really a way to shoehorn in a lot of the safety requirements, to put them in the critical path of making the model. And hey, if you can be the first one to solve all of these problems and therefore safely scale up the model, not only will you have solved the safety problems, but that aligns the business incentives with the safety incentives.
DARIO: Our hope, of course, is that others will adopt a responsible scaling plan, either because they're also excited about it or because they don't want to look like the bad guys, and that eventually it can also be an inspiration for policy, so that everyone is held to some version of the responsible scaling plan.
DARIO: And how does it relate to this other thing of building as fast as we can? Look, one way to think about the responsible scaling plan is that it doesn't slow you down except where it's absolutely necessary. It only slows you down where there's a [01:01:00] critical danger in a specific place with a specific type of model, and therefore you need to slow down.
DARIO: It says nothing about stopping at some certain amount of compute, or stopping for no reason, or stopping for a specific amount of time. It says: keep building until you get to certain thresholds, and if you can solve the problems at those thresholds, then keep building after that.
DARIO: It's just that as the models get more and more powerful, safety has to build along with the capabilities of the model. Our hope is that if we do that, and others do that, it creates the right culture internally at Anthropic, and it creates the right incentives for the ecosystem and for companies other than Anthropic.
DARIO: I'm aware that since we published our responsible scaling plan, several other organizations are internally working on responsible scaling plans. For all I know, one or more of them might be out by the time this podcast is out. There aren't any out now, but I think our release of the RSP has lit a fire under some of these other organizations. They don't want to look [01:02:00] bad. They don't want it to look like, oh, Anthropic has a plan for this and the rest of us don't. And that's really part of what we're trying to do in the world and with the ecosystem.
DARIO: And hopefully they put out something. Hopefully they try and make it better than ours. That would be a win for us.
LOGAN: Who do you think people worry about looking bad to? Is it the general public? Government regulators? Customers?
DARIO: Yeah, I think all of the above, in different ways, would care about different aspects of it. Again, I wouldn't overrate this. I said this whole thing of people cheering for you on Twitter is a bad thing.
DARIO: But I do wonder if we can harness it a little bit in this one specific case, where it just looks bad if we have a plan and you don't. So to the other AI companies: if you're working at one of the other AI companies, tell your leadership that you feel bad that Anthropic has one of these and you don't. And if you're a leader of one of these companies, go out and do the right thing.
DARIO: Try and make something better than what we have. I won't mind.
LOGAN: Okay. Now, you said you've been right about a [01:03:00] lot of things related to AI, but you've also been wrong and surprised by a bunch. What have you been most wrong about or surprised by?
DARIO: Yeah, I don't know. I've been wrong about a bunch of stuff.
DARIO: I think how this prediction stuff works is: if you're thinking about the right things, if you're predicting the right things, you only have to be right about 20 percent of the time for it to have huge consequences, right? If you predict five things that no one in the world thinks are going to happen, and they would have enormous consequences, and you're right about one of them, that matters.
DARIO: It's a little bit like VC, right? If in 1999 you invested in Google and four companies that no one's heard of, that's a pretty good portfolio. But yeah, I end up being wrong about lots of stuff.
DARIO: One example, and I could come up with a few, is that going back to 2019 or so, when I first saw the scaling situation, I thought we were going to scale for a while with these pure language models, and then what we needed to do was [01:04:00] immediately start working on agents acting in the world.
DARIO: Not necessarily robotics, but there had been all this work on Go, StarCraft, Dota, these other video games, all of which used reinforcement learning before the era of the large language model. So I thought we were going to put the two together almost immediately, and that almost all the training by now, by 2023 or 2024, was going to be these large language models, already as big as they could usefully be made, acting in the world.
DARIO: But what we found instead is that we've just kept scaling the language models. I still think all the RL stuff is going to be promising; it's just that we haven't gotten to it, because it isn't the lowest-hanging fruit, because it's simpler to just spend more money to make these models bigger than to design something new.
DARIO: It's completely economically rational, and the models just keep getting better and better. I didn't doubt that they would get better, but I guess I imagined that things would happen in a little bit of a different order.
LOGAN: Do you think data will be a scaling issue in the [01:05:00] near term?
DARIO: Yeah, I think there's actually some chance. I would say there's maybe a 10 percent chance that we get blocked by data. The reason I mostly don't think so is that the internet's a big place, and the deeper you look, the more high-quality data you find. And this is without even getting into licensing of private data; this is just publicly available data.
DARIO: And then there are a bunch of promising approaches, which I won't get into detail about, for how to make synthetic data. Again, I can't get into detail, but we've thought a lot about this, and I bet the other LLM companies have thought a lot about it as well. I would guess that at least one of those two paths is very likely to pan out.
DARIO: But it's not a slam dunk. I don't think we've proven yet that this will work at the scale we need it to work: for a $10 billion model that needs God knows how many trillion words fed into it, real or synthetic.
LOGAN: Why don't you like the term AGI?
DARIO: I liked the term AGI [01:06:00] ten years ago, because no one was talking about the ability to do general intelligence ten years ago, and so it felt like a useful concept. But now, ironically, because we're much closer to the kinds of things AGI is pointing at, I think it's no longer a useful term.
DARIO: It's a little bit like seeing some object off in the distance on the horizon: you can point at it and give it a name. But then you get close to it, and it turns out it's a big sphere or something and you're standing right under it, and it's no longer that useful to say "this sphere," right?
DARIO: It's basically all around you, it's very close, and it actually turns out to denote things that are quite different from one another. One thing I'll say, and I said this on a previous podcast: I think in two to three years, LLMs plus whatever other modalities and tools we add are going to be at the point where they're as good as human professionals at a wide range of knowledge work tasks, including science and engineering. [01:07:00]
DARIO: That would definitely be my prediction. I'm not sure, but I think that's going to be the case. And when people commented on that or put it on Twitter, they said, oh, Dario thinks AGI is going to be two to three years away. That conjures up this image of swarms of nanobots building Dyson spheres around the sun in two to three years.
DARIO: And of course that's absurd; I don't necessarily think that at all. Again, the specific thing I said was that there are going to be models that are able to, on average, match the ability of human experts at a wide range of things. There's so much between that and the superintelligent god, if that latter thing is even possible or even a coherent concept, which it may or may not be. One thing I've learned on the business side of things is this:
DARIO: There's a huge difference between a demo of a model doing something and that thing actually working at scale and actually being an economic substitute. There are so many little interstitial things. The model can do 95 percent of the task, [01:08:00] but it can't do the other 5 percent, so it's not useful unless we're able to substitute in AI end to end for the process. Or it can do a lot of the tasks, but there are still some parts that need to be done by humans, and it doesn't integrate with the humans; it's not complementary, and it's not clear what the right interface is. So there's a lot of space between "in theory, it can do all the things humans can" and "in practice, it's actually out there in the economy as a full co-worker for humans." And there's a further question of: can it get past humans?
DARIO: Can it outperform the sum total of humanity's scientific or engineering output? That's another point. That point could be a year away, because the model gets better at making itself smarter and smarter, or it could be many years away.
DARIO: And then there's the further point of, okay, can the model explore the universe and send out a bunch of von Neumann probes and build Dyson spheres around the sun and calculate that the meaning of life is 42 or whatever. That's [01:09:00] a further point still, which also raises questions about what's practical in an engineering sense and all of these weird things.
DARIO: It's possible all of these points are pretty compressed together, because there's a feedback loop, but it's possible they're very far from each other. And so there's this whole unexplored space, and when you say the word AGI, you're smooshing together all of those things.
DARIO: I think some of them are very practical and near-term, and then I have a hugely hard time thinking about whether that leads very quickly to all the other things, or whether it leads there after a few years, or whether those other things are not as coherent or meaningful as we think they are.
DARIO: I think all of those are possible. So it's just a mess. We're flying very fast into this glob of concepts and possibilities, and we don't have the language yet to separate them out. We just say AGI, and it's like a buzzword for a certain community or a certain set of science fiction concepts.
DARIO: When really it's [01:10:00] pointing at something real, but it's pointing at like 20 things that are very different from one another, and we badly need language to actually talk about them.
LOGAN: What do you think happens on the next major training run for LLMs?
DARIO: My guess would be that nothing truly insane happens in any training run that happens in 2024.
DARIO: All the good and bad stuff I've talked about, the ability to really invent new science, the ability to cure diseases, the ability to make bioweapons, and maybe someday the Dyson spheres: the least impressive of those things, I think, will happen no sooner than 2025, maybe 2026.
DARIO: I think in 2024 we're just going to see crisper, more commercially applicable versions of the models that exist today. We've seen a few of these generations of jumps. I think in 2024 people are certainly going to be surprised at how [01:11:00] much better these things have gotten.
DARIO: But it's not going to quite bend reality yet, if you know what I mean. I think we're just going to see things that are crisper, more reliable, and can do longer tasks. Of course multimodality, which we've seen in the last few weeks from multiple companies, is going to play a big part.
DARIO: The ability to use tools is going to play a big part. Generally these things are going to become a lot more capable, and they're definitely going to wow people. But this reality-bending stuff I'm talking about, I don't expect that to happen in 2024.
LOGAN: How do you think the analogy to a human brain breaks down for large language models?
DARIO: Yeah, it's actually interesting. Being a former neuroscientist, this is one of the mysteries I still wonder about. The general impression I have is that the way the models run and the way they operate, I don't think it's all that different. Of course the physiology and all the details are different.
DARIO: But the basic combination of linearities and [01:12:00] nonlinearities, the way they think about language: to the extent that we've looked inside these models, which we have with interpretability, we see things that would be very familiar in the brain or in a computer architecture. We have registers.
DARIO: We have variable abstraction, we have neurons that fire on different concepts, and again the alternating linearities and nonlinearities. Just interacting with the models, they're not that different. Now, what is incredibly different is how the models are trained, right?
DARIO: If you compare the size of the model to the size of the human brain in synapses, which of course is an imperfect analogy, the models are still something like a thousand times smaller, and yet they see maybe a thousand or ten thousand times more data than the human brain does. If you think of the number of words a human hears over their lifetime, it's a few hundred million.
DARIO: If you think of the number of words a language model sees, the latest ones are in the trillions, or maybe even tens of [01:13:00] trillions. That's a factor of about 10,000. It's as if neural architectures have some universality to them; there's lots of variance, but some universality.
DARIO: But somehow we've climbed the same mountain with the brain and with neural nets in very different ways, along very different paths. And we get systems that, when you interact with them, there's still a hell of a lot they can't do, but I don't see any reason to believe they're fundamentally different or fundamentally alien.
DARIO: But what is fundamentally different and what is fundamentally alien is the completely different way in which they're trained.
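A quick back-of-the-envelope check of the word counts Dario cites above, using his round figures (order-of-magnitude only):

```python
# Rough arithmetic behind the "factor of 10,000" data comparison.
human_words_lifetime = 3e8   # "a few hundred million" words heard over a lifetime
llm_training_words   = 3e12  # "in the trillions, maybe tens of trillions"

ratio = llm_training_words / human_words_lifetime
print(f"LLMs see roughly {ratio:,.0f}x more words than a person")  # ~10,000x
```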
LOGAN: You've said that alignment and values are not things that will just work at scale. We've talked about constitutional AI and some of the different viewpoints there, but can you elaborate on that view?
DARIO: Yeah, this is a bit related to the point I made earlier about the fact-value distinction, right? You cram a bunch of facts into the model, you train it on everything that's present on the internet, and that leaves this blank space, this [01:14:00] undetermined variable.
DARIO: I basically just think it's up to us to determine the values, the personality, and especially the controllability of these systems. There's another sense in which I'd say this, which is that these are naturally statistical systems, and they're trained in a very indirect way, right?
DARIO: Even with the constitution, the constitution itself is pretty solid, but the actual training process uses a bunch of examples and is opaque. And of course the part where you put in tens of trillions of words: no human ever sees all of that. So it's still, I think, very opaque and hard to track down, and very prone to failures. This is why we focus on interpretability, steerability, and reliability. We really want to tame these models, make sure you're able to control them and that they do what humans want them to do.
DARIO: I don't think that [01:15:00] comes on its own, any more than it came on its own for airplanes, right? The early airplanes probably wouldn't crash every time you flew them, but I wouldn't want to get in the Wright brothers' plane every day.
DARIO: And just bet that every day it would not crash and would not kill me. It's not safe to that standard, and I think today's models are basically like that.
LOGAN: Why is mechanistic interpretability so hard to do?
DARIO: Yeah, mechanistic interpretability is an area we work on, which is basically trying to look inside the models and analyze them like an x-ray.
DARIO: I think the reason it's so hard is actually the same reason it's hard to look inside the brain. The brain wasn't designed to have humans look inside it; it was designed to serve a function. The interface that's accessible to other humans is your speech, not the actual neurons in your brain, right?
DARIO: They're not designed to be read in that way. Of course, the advantage of reading them in that way is that you get something much closer to a [01:16:00] ground truth. Not a perfect ground truth, but if I really understood how to look inside your brain and understood what you were thinking, it would be much harder for someone to deceive someone else about their intentions, or for behaviors that might emerge in some new situation to not be evident.
DARIO: So there's a lot of value in it. But in both the case of the brain and the case of large language models, they're not designed or trained in a way that makes them easy to look at. It's a little bit like we're inspecting an alien city that wasn't built to be understood by humans.
DARIO: It was built to function as an alien city. So we might get lucky, we might get clues, we might be able to figure it out, but there's no guarantee of success. We're on our own, and we do our best. That said, I'm becoming increasingly optimistic that interpretability can be, I don't know about fully solved, but that it can be an important guide to showing that models are safe, and even that it will have commercial value in areas [01:17:00] like trust and safety, classification filters, moderation, and fraud detection.
DARIO: I think there are even legal compliance aspects to interpretability. My co-founder Chris Olah has been working on interpretability; he's run a team at Anthropic for the last two and a half years, and before that, when we were at OpenAI, he ran a team that worked on interpretability of vision models for three years.
DARIO: And for that entire period it's been just basic research, right? There's been no commercial or business application. Chris and I have just kept it going because we believe this is something that will pay off from a safety perspective, and maybe even from a business perspective. And now, actually, for the first time, you will see something by the time this podcast comes out.
DARIO: We're releasing something that shows we've really been able to make good progress towards solving something called the superposition problem, which is that if you look inside a neuron, it corresponds to many different concepts.
DARIO: We found a way to [01:18:00] disambiguate those concepts so that we can see all the individual concepts lighting up inside one of these large LLMs. It's not a solution to everything, but it's a really big step forward. And for the first time, I'm optimistic that, give us two or three years, I don't know for sure, but we might actually be able to get somewhere with this.
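One published approach to the superposition problem is dictionary learning with a sparse autoencoder trained on a model's activations. Below is a minimal sketch of that idea; the shapes, penalties, and training loop are illustrative assumptions, not Anthropic's actual code:

```python
# Sketch of a sparse autoencoder for disentangling superposed concepts:
# learn an overcomplete dictionary so each feature tends to fire for one
# concept, even though individual neurons fire for many.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_dict: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # activations -> feature coefficients
        self.decoder = nn.Linear(d_dict, d_model)  # reconstruct activations from features

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # non-negative, encouraged to be sparse
        recon = self.decoder(features)
        return recon, features

def train_step(model, acts, optimizer, l1_coeff: float = 1e-3) -> float:
    # Reconstruction loss plus an L1 penalty that pushes features toward sparsity.
    recon, features = model(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in "activations" in place of a real model's:
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
fake_acts = torch.randn(64, 512)  # batch of activation vectors
for _ in range(10):
    train_step(sae, fake_acts, opt)
```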
LOGAN: If you were able to get that kind of understanding of all this stuff, why would that be important for safety?
DARIO: Yeah, I would go back to the x-ray analogy there, right? Without it, I can only ask you questions, and you can say things that sound great, and I have no idea if you're telling the truth or if it's all just bullshit.
DARIO: But if I look inside your brain and I have the ability to understand what I'm seeing, then it becomes much harder to be misled. Similarly with language models: I can test them in all kinds of situations and it'll seem like they're fine. The fear is always, oh, if I talk to the language model in this way, I could get it to [01:19:00] do something really bad.
DARIO: Or, if I put it in this situation, it could do something really bad on its own. That's always the fear, right? That's the fear we have every time we deploy a model. We've had a hundred people test it, we've had a thousand people red-team it, but when it goes out into the world, a million people will play with it.
DARIO: And one of them will find something that's truly awful. We'll find, oh, if you use this trick for talking to the model, it will finally be able to produce that bioweapon. Or, oh, if you put it in this place where it has access to infinite resources on the cloud, it'll just self-replicate infinitely.
DARIO: And so interpretability is at least one attempt at a method to address that problem, to say: okay, instead of trying to test the model in every situation it could be in, which is impossible, we can look inside it, try to decompile it, and ask what the model would do in this situation.
DARIO: We understand the algorithms it's following, we understand what goes on in different parts of its brain, at least to some extent. So we [01:20:00] can pose the hypothetical and say, hey, what would happen in this whole part of the space, in this whole class of situations?
DARIO: And if we can do that, we have some ability to exclude certain behaviors, to say, okay, we know the model won't do that, which you never have behaviorally. It's just like humans: what would you do in a life-threatening situation?
DARIO: I don't know what I would do in a life-threatening situation, and I don't know what you would do. It's hard to know until you're actually in the situation. But if I knew enough about your brain, I might be able to say.
LOGAN: What's your view on open source models?
DARIO: Yeah, that's obviously a complex topic. As with many things in AI versus the rest of technology: from a normal technological perspective, I'm extremely pro open source. I think it's accelerated science and innovation.
DARIO: It allows errors to be fixed faster and development to happen faster. And I certainly think that's true for AI as well for the smaller models, the smaller open source [01:21:00] models; I don't see much danger in smaller models.
DARIO: Therefore, open source as it's been practiced by every open source model released up to this point seems perfectly fine to me. My worry is more about the large models, and in particular about models of the future that really are dangerous, models maybe one year or two or three years out, not today's models. If those models are offered via API, or even if you have fine-tuning access to them, there are a lot of levers you have to control the behavior of the model, right?
DARIO: You can put in your constitution, don't produce bioweapons. And then if the model does it anyway, you can make changes to the model. You can say, okay, we're retracting that version and serving a new version of our model that patches that particular hole. You can monitor users.
DARIO: So if a million people are using the model, and within that there are five bad actors in a terrorist cell, you can use your trust and safety team to identify the [01:22:00] terrorist cell, cut them off, and even call law enforcement if you want to. So it really provides an ability: you don't have to get things right the first time.
DARIO: And if something dangerous happens, you really have the ability to fix it. With models whose weights are released, you don't have any of that control. The minute you release the model, basically all of that control is lost. So that's our concern. That doesn't mean, by the way, that large open source models shouldn't exist.
DARIO: But the way we put it in our responsible scaling plan is: when models get to the level where they're smart enough to create these dangerous capabilities, and the next one for us is ASL-3, we're at ASL-2 right now, then models have to be tested for dangerous behavior according to the complete attack surface under which they're actually going to be released.
DARIO: So if you're just releasing an API, then you have to test that you can't build a bioweapon with the API. If you're releasing the model with an API and fine-tuning, then the people testing the [01:23:00] model have to mock up the test with the fine-tuning. If the model weights are being released, then the right test to run would be: I'm a mock terrorist cell.
DARIO: I get the weights released to me, and I can do anything I want with those weights. Is there some way to release the weights of a model so that they can't be abused? I think there might very well be, but people who want to release model weights have to confront that problem and find a solution to it.
DARIO: I'll say, by the way, because there's a person on my team for whom this is one of their pet peeves: the term open source isn't necessarily appropriate for all of these models. I think it's appropriate for small developers and companies whose whole business model is about open source. But when much larger companies have released the weights of these models, they generally have not released them under open source licenses, and they've generally asked people to pay them when they use them in commercial ways.
DARIO: So I would think of [01:24:00] this as less open source and more that model weight release is a particular business strategy for these large companies. And again, I'm not saying model weights can't be released. I'm saying the tests for them need to be commensurate with the issues, and that we shouldn't automatically say, oh, open source is good.
DARIO: Some of these are not open source. They're the business strategies of large companies that involve releasing model weights.
LOGAN: Do you think about percentage chance of doom? Paul Christiano recently said on a podcast that he thinks there's a 50 percent chance, I think the way he phrased it was, that the way he ends up passing away is something to do with AI.
DARIO: Yeah, I think it's popular to give these percentage numbers, and the truth is I'm not sure it's easy to put a number on it; if you forced me to, it would fluctuate all the time. I think I've often said that my chance that something [01:25:00] goes really quite catastrophically wrong, on the scale of human civilization, might be somewhere between 10 and 25 percent, when you put together the risk of something going wrong with the model itself, the risk of people or organizations or nation-states misusing the model or it inducing conflict among them, or just some way in which society can't handle it.
DARIO: That said, what that means is that there's a 75 to 90 percent chance that this technology is developed and everything goes fine. In fact, if everything goes fine, it'll go not just fine, it'll go really great. Again, this stuff about curing cancer: if we can avoid the downsides, then there's curing cancer, extending the human lifespan.
DARIO: Solving problems like mental illness. This all sounds utopian, but I don't think it's outside the scope of what the technology could do. I often try to focus on the 75 to 90 percent chance that things go right, and I think [01:26:00] one of the big motivators for reducing that 10 to 25 percent chance is how great the good part of the pie is.
DARIO: The only reason I spend so much time thinking about that 10 to 25 percent chance is, hey, it's not going to solve itself. For the good stuff, companies like ours and the others have to build things, but there's a robust economic process leading to the good things happening.
DARIO: It's great to be part of it, it's great to be one of the ones building it and causing it to happen, but there's a certain robustness to it. And I find more meaning, when this is all over, I think I personally will feel I've done more to contribute to whatever utopia results if I'm able to focus on reducing the risk that it goes badly or doesn't happen at all, because that's not the thing that's going to happen on its own.
DARIO: The market isn't going to provide that.
LOGAN: Do you worry more about misuse [01:27:00] by people, or the AI themselves? Or is it just different timelines?
DARIO: Yeah, partially different timelines. If I had to tell you, I would say the misuse seems more concrete to me, and I think it will happen sooner; hopefully we'll stop it and it won't happen at all. The AI itself doing something bad is also quite a significant risk. It's a little further off in the future, and it's always been a bit more shadowy and vague.
DARIO: But that doesn't mean it isn't real. You just look at the rate at which the models are getting better, and you look at something like Bing or Sydney; it really gives you a taste of, hey, these things can really be out of control and psychopathic. The only reason Bing and Sydney didn't cause any harm is that they were out of control and psychopathic in a very limited way.
DARIO: Limited in that it was confined to text, and limited in that it just wasn't that smart. It tried to manipulate the reporter, tried to get him to leave his wife, but it wasn't really compelling enough to [01:28:00] get a human to fall in love with it. But someday maybe a model will be.
DARIO: And maybe it'll be able to act in the world, and then you put all those things together and I think there is some risk there. It's harder to pin down, but personally I'm worried about both things. And I think our job, because we see such a positive potential here, is to find all the possible bad outcomes and shoot them down.
DARIO: We have to get all of them. And if we get all of them, then we can live in a really great world, hopefully.
LOGAN: If you could wave your hands and have everyone follow a single policy, would it be the responsible scaling policy? Would everyone have one of those?
DARIO: Yeah, I think if we add a constraint of realism, where I in fact can't wave my hand and get everyone in China or Russia or somewhere else to stop building these powerful models,
DARIO: there are just some levels of world or international coordination that are not going to happen, because of realism. So if you stipulate that you can't just make everyone stop, and you can't just make everyone build in a certain way,
DARIO: then the idea is: [01:29:00] hey, for most things, people should just be able to build what they want to build, but we coordinate around these particular levels of capability, these particular points in the development curve where something concerning is happening. We say: mostly do what you want, but there's a small fraction of stuff you've got to take really seriously,
DARIO: and if you don't take it really seriously, you're the bad guy. That's something I think I can reasonably recommend that everyone sign on to. There's some sacrifice to it, some loss in terms of winning the race, but it's only as much sacrifice as is needed.
DARIO: And because it's so targeted, I think you can make a strong moral case that, hey, if you don't do this, you're an asshole.
LOGAN: How do you think about the trade-offs between building in public, having people aware of what you're doing, and, on the flip side, maintaining secrecy so the appropriate things stay within the company?
DARIO: Yeah, this is one of those difficult trade-offs, right? An org definitely benefits from everyone knowing about [01:30:00] everything. But on the other hand, as we've seen with multiple AI companies, secrets leak out.
DARIO: And even just from a commercial perspective, forget safety: with models built in the next year or two, let's say a model costs $3 billion, and you have an algorithmic advance that means you can build the same model for $1.5 billion. These kinds of 2x advances along the scaling curve have occurred in the past and may occur in the future.
DARIO: And companies, including ours, may be aware of such advances. So basically that's three lines of code that's worth $1.5 billion. You don't want a wide set of people, and you may not even want everyone within your company, to know about them. And at Anthropic, at least, people have been very understanding about that.
DARIO: People don't want to know these secrets. People are on board with the idea that, hey, it's not a marker of status to know these secrets. These secrets should be known to the tiny number of people who are actually working on the relevant thing.
DARIO: Plus the CEO and a couple of other folks [01:31:00] who need to be able to put the entire picture together. This is compartmentalization and need-to-know. And of course it has some costs, because information doesn't propagate as freely. But again, it's just as with the RSP.
DARIO: Let's take the 80/20. Let's take the few pieces of information that are really essential to protect, and be as free as we can with everything else.
LOGAN: We've thrown around big numbers, billions of dollars, and those numbers are so big for people, the amount of money being spent to train these models.
LOGAN: For the average person listening, where does all that money go, and how should they think about the need over time to continue to iterate on this?
DARIO: Yeah. What I'll say is, at least to my knowledge, no one has trained a model that costs billions of dollars today.
DARIO: People have trained models that cost, I think, on the order of $100 million. But I think billion-dollar models will be trained in [01:32:00] 2024, and my guess is that in 2025 or 2026, several-billion-dollar, maybe even $10 billion models will be trained. There's enough compute in the industry and enough ability to build data centers that it's possible.
DARIO: And I think it will happen. If you look at what Anthropic has raised so far, at least what's been publicly disclosed, we're at roughly $5.5 billion or so. We're not going to spend that all on one model, but we certainly are going to spend multiple billions of dollars on training a model sometime in the next two or three years.
DARIO: Where does that go? It's almost all compute. It's almost all GPUs or custom chips and the data centers that surround them. Eighty to ninety percent of our cost is capital, and almost all of our capital cost is compute. The number of people necessary to train these models, the number of engineers and researchers, is growing, but that cost is absolutely dwarfed by the cost of compute.
DARIO: Of course, we also have to pay for the [01:33:00] buildings people work in, but that again is some tiny fraction of the cost of compute.
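A rough illustration of the cost split Dario describes; the figures come from his approximate percentages, and the "almost all" share is an assumed placeholder:

```python
# Back-of-the-envelope breakdown of where training spend goes.
total_spend = 1.0                 # normalize total company cost
capital_share = 0.85              # "80 to 90 percent of our cost is capital"
compute_share_of_capital = 0.95   # assumed stand-in for "almost all capital cost is compute"

compute_spend = total_spend * capital_share * compute_share_of_capital
print(f"~{compute_spend:.0%} of total spend goes to GPUs/custom chips and data centers")
```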
LOGAN: On an optimistic note here: we touched on a bunch of the potential medical breakthroughs and things like that.
LOGAN: But why should people be optimistic about what Anthropic's doing, about the future of AI, and about everything that's going on?
DARIO: Yeah, I'd answer the question in two ways. One, I'm optimistic about solving the problems. I'm getting super excited about the interpretability work; people didn't necessarily think this was possible.
DARIO: I still don't know whether it's possible to really do a good job interpreting the models, but I'm very excited and very pleased by the progress we've made. I'm also excited about the wide range of ways we've been able to deploy the model safely, the wide range of happy customers who just say, this model has been able to solve a problem we had.
DARIO: It solved it reliably. We haven't had all of these safety problems; we've managed to solve them. We've deployed something safely in the world, and it's being used by lots of [01:34:00] people. That's great, that's one level of great. And the second level of great is this thing you alluded to: medical breakthroughs, mental health breakthroughs, and I think energy breakthroughs are already doing pretty well.
DARIO: But I imagine AI can speed up materials science very substantially. If we solve all these problems, I think a world of abundance really is a reality. I don't think it's utopian, given what I've seen the technology is capable of. And of course there are people who will look at the flaws of where the technology is right now and say it's not capable of those things.
DARIO: And they're right, it's not capable of those things today. But if the scaling laws I'm talking about really continue to hold, I think we're going to see some really radical things. One more thing, though it's not a complete trend:
DARIO: as we gain more mastery over ourselves, our own biology, and the ability to manipulate the technological world around us, I have some hope that will also lead to a [01:35:00] kinder and more moral society. I think in many ways it has in the past, although not uniformly.
LOGAN: Yeah. That was what I had. One thing I wanted to potentially back up to, since we have a couple more minutes:
LOGAN: we hopped past the fact that you said your childhood wasn't that interesting. Where did you actually grow up? We can splice this in.
DARIO: Yeah. So I grew up actually right here in San Francisco.
LOGAN: What did your parents do?
DARIO: So my father was a designer of leather goods.
DARIO: My mother was a project manager for libraries in San Francisco and Berkeley. So actually no scientists in the immediate family.
LOGAN: Anyone in the extended family?
DARIO: Yeah, I have an uncle who's a chemist, and an uncle who's a physicist on the other side of the family.
LOGAN: Did you think, if you hadn't gone down this AI path, would you be doing academic work right now?
DARIO: I think that was my assumption; that was what I always imagined doing. I imagined being a scientist, and scientists work at universities. But the really interesting thing about this [01:36:00] AI boom is that to really be at the forefront of it, you have to have these huge resources.
DARIO: And the huge resources are basically only available at companies. First it was the large companies like Google, but more recently startups like ours have been able to raise large amounts of money.
DARIO: I was drawn in that direction because it had the ingredients necessary to build the things we wanted to build and study the things we wanted to study as scientists, and I'd say many of my co-founders feel the same. One of my co-founders, who is a physicist, often brings up, and it's more of an academic question because I don't think things are going to go in this direction, that in his field they build $10 billion telescopes in space and $10 billion particle accelerators.
DARIO: So why did this field go in the direction it did instead? Why didn't all the AI academics get together and build a $10 billion cluster? Why did it happen [01:37:00] in startups and in large companies? I don't really know the answer to that.
DARIO: Things could have gone the other way, but it doesn't seem like that's the way they went, and I don't know if it's for better or worse. We've learned a huge number of things by working with customers and seeing how these things impact the economy.
DARIO: So maybe the path things went is the best path they could have gone.
LOGAN: How did, I actually don't know, how did they get access to capital in that alternative way of doing things?
LOGAN: Was it through government grants?
DARIO: Yeah, these large telescopes are often funded by government consortia or large-scale private philanthropy. It's honestly this huge patchwork mix. I'm surprised it even happens, because if I think about it happening in this field, I just can't imagine how it would. But in these other fields, somehow they've made it work.
DARIO: I don't actually know how. But that's just a weird alternate history of our industry that didn't happen, and I very much doubt it's going to happen, although who knows. We went down this path instead.
DARIO: And there's a lot that's exciting and interesting [01:38:00] about this path. One way or another, this is the situation we're in.
LOGAN: Has working with your sister been as fun, and as much about saving the world, as you had hoped when you were kids?
DARIO: Yeah, it's surprisingly similar to what I imagined. If you were to look back on the things we were saying to one another, as an adult observing it, you would have thought, this is crazy. And kids dream, of course. But no, it's just amazing that we're able to work on this together.
LOGAN: Very cool. Dario, thanks for doing this.
DARIO: Thanks for having me.
LOGAN: That was fun. Thank you for bearing with all the different directions we went.
DARIO: Yeah, that was great.
LOGAN: We'll try to make it a little more coherent. I was jumping around a little bit when you would talk about something.
LOGAN: It's hard because, with your business, normally when I'm talking to someone I can lay out their business, then their social views; it's all separable. I was like, all right, I just want to talk about Anthropic the business, and then we'll move to safety.
LOGAN: And then we would be talking about safety, and I was like, shit, okay. [01:39:00] Do I ask the question about safety now, or do I wait?
DARIO: Welcome, welcome to my life.
DARIO: This is what my mind is like on the inside.
LOGAN: I can only imagine. It's like doing mechanistic interpretability on you. I sat down with Eliezer for four hours, and
DARIO: My condolences.
LOGAN: It was, I will say, one of the more entertaining conversations I've had, and he's such an odd dude. Not even his views, just the way he lives his life is very... just the way he talks.
DARIO: No, I couldn't.
LOGAN: The way he...
DARIO: I could tell you stories, but it'd just be gossip.
LOGAN: He referenced something, and then he goes, give me one second. And he goes and grabs his little bag and pulls out four textbooks. I was just like, he's walking around with four textbooks. One of the most interesting guys.
DARIO: You should interview an AI next.
LOGAN: Yeah, oh my gosh.
LOGAN: With him, I was able to go very structured. [01:40:00] It wasn't even a conversation; I would ask a question, nod, and then move on, because I didn't want to respond. It was like I was giving him the mouthpiece, in some ways, to say what he wanted to say without debating him.
LOGAN: I didn't want to go down the rabbit hole. Dwarkesh did it, and they debated back and forth. And Eliezer is a very bright guy, and he has thought a lot about these things, and it's one of those