How Notion Cofounder Simon Last Builds AI for Millions of Users - Ep. 37 with Simon Last
Notion cofounder Simon Last told me everything he’s learned from integrating an AI application into a platform that has over 100 million users. Simon likes to keep a low profile, even though he’s the driving force behind Notion AI, one of the most widely scaled AI applications in the world. In this episode, we get into how AI has changed the way he builds software since the days he cofounded Notion with Ivan Zhao in 2013. He talks about the challenges that arise because AI doesn’t follow the deterministic rules of traditional software, and how he designs evals to build AI systems that are reliable at scale. Simon tells me about the AI tools he uses to code and how he would think about rebuilding Notion from scratch with them. He also shares his thoughts on how the growing capabilities of AI are redefining human roles, and argues that we have the responsibility to shape technology to align with our collective vision of the future. This is a must-watch for anyone interested in building reliable AI products at scale.
If you found this episode interesting, please like, subscribe, comment, and share! Want even more? Sign up for Every to unlock our ultimate guide to prompting ChatGPT. It’s usually only for paying subscribers, but you can get it here for free.
To hear more from Dan Shipper:
- Subscribe to Every: https://every.to/subscribe
- Follow him on X: https://twitter.com/danshipper
Links to resources mentioned in the episode:
- Simon Last: @simonlast
- Notion AI: https://www.notion.so/product/ai
- The AI code editor Simon uses: Cursor
- Published Nov 8, 2024
- Uploaded Jun 13, 2026
- Source
- share.transistor.fm
Full transcript
AI-generated transcript with timestamped sections.
[00:00] Let's just assume as a thought experiment that you're going to strip away everything that Notion currently is and rebuild it with [00:07] AI. How would you do it? One big principle would be there's fewer humans touching the database. The AI should be managing it for you. Ideally, you never need to update any of the fields, and it should know the amount based on your email. You all probably have one of the most scaled AI applications in the world. Previously, software was deterministic and now it's very stochastic. It's much more squishy. I'm curious what you learned about doing that. The AI thing is so hard because it can fail in some cases, but then there's this additional meta problem. [00:37] If you have an AI evaluating, you need to eval your eval now. How has working with AI changed what you think machines are good at versus what you [00:47] Hmm. Maybe a better question is, what should humans do? I feel like the most important thing there is making sure that the shape of the future is aligned with what we care about. What do we want them to be doing? [01:11] Hey guys, it's me, Dan Shipper. I just want to jump in to say that this episode is sponsored by Notion. I've known about and been using Notion since 2015, so almost 10 years. And I actually run Every and a decent part of my personal life on Notion. They care a ton about the craft and the underlying ideas behind the software that they make. And I think that comes through in the way Notion is built and the experience of using it every day.
[01:41] unlimited AI worth up to $6,000 for free if you're a startup. If that sounds interesting to you, go to ntn.so/every, select Every in the partner dropdown, and use the code everyxnotion to get access. And now back to the episode. Simon, welcome to the show. [01:59] Hey, thanks for having me. [02:00] So for people who don't know you, you are the co-founder of Notion. This is, I think, at least as far as I could find, the first interview that you've done outside of internal interviews for Notion, so I really appreciate you coming on. [02:12] Yeah, of course. I tend to keep a low profile, but I'm happy to do it. Great. And you're leading the AI initiatives at Notion. As far as I can tell, you were also really pushing AI internally before it became a thing, which is really interesting. And the place where I want to start with you is, obviously Notion is really well known for building thinking tools. And you were building thinking tools before there were even thinking machines. [02:42] create the right primitives to allow people to interact with that in a really flexible way, to build whatever they wanted to build or think however they wanted to think. And that was in a pre-AI era. And so where I wanted to start with you is to ask you what you think the right primitives are [02:57] for thinking with AI. [02:59] Yeah, that's a good question. [03:01] I think, yeah, probably helpful to start with what are the new primitives? [03:06] Maybe so. The way I think about it is we've got [03:09] the foundation models, or the model itself, which I think of as like a thinking box where
[03:16] you can give it a bunch of context and some task, and then it goes and does one thing for you. It can involve some reasoning and it can involve [03:23] formatting it [03:24] as an action, so doing something. And then the other tool is like [03:28] embeddings. [03:29] So just like [03:30] really good semantic search. [03:32] So I think of those as the new primitives that didn't really exist before. [03:35] I think a lot of the same primitives still matter a lot. Obviously, a relational database, [03:40] it's a pretty fundamental concept. If you're trying to track any information, [03:45] it's [03:46] pretty useful to do that. You don't just want to shove it into a text file. You want it in a structured format that's consistent, that you can query, where you can connect things. The good news is all the primitives still matter, but now you can [03:57] plug in these thinking boxes on top [04:00] to actually automate some of the tasks [04:03] that a human would do [04:04] in the past, and especially [04:06] things that are cumbersome and you don't want to do. The way I think about the primitives that connect to [04:10] AI: you've got databases. [04:13] You have a UI around the database that a human can look at and the AI can use. [04:18] The permission model is really important as well. There's a lot of coding agent [04:21] tools coming out. [04:23] It's super cool, but one issue with that is you don't really want it to just make a Postgres database for you every time. What's the permission model? What can it read or write to? [04:29] How can I see the schema? [04:31] It's actually really nice and important [04:33] to have a [04:35] permission model that the user can understand, and to control what the AI can read or write to. [04:42] A lot of the same primitives really matter, and then I just think about what [04:44] we're adding on top.
[04:47] Whereas before, [04:49] your database might have been essentially just data that maybe you do manual data entry to plug in, or some lightweight integration. But now you can actually [04:57] put this reasoning box on top and [04:59] much more fluidly transform information or [05:01] pipe it in and out or do reasoning steps on top of it. What do you think about chat as one of the primitives? And do you think that's going to continue to be a main way that we interact with these tools? Or are there other primitives that are going to become more important? [05:12] Yeah, I think chat, or some version of it, is probably here to stay. [05:20] The human interface is just so intuitive. You just talk to it. The big issue with chat [05:26] is that [05:28] you get this empty text box. [05:31] And most people don't know what to type in there. [05:33] It's really great [05:34] for someone that wants to explore the system and figure out what it can do, but not so great if you just want it to do some task for you. And this is actually true not just of chat but of anything like this. It's actually one of Notion's biggest challenges: [05:47] there's a lot of features, and it actually takes a little bit of exploration to figure them out. We call them tool makers, people that [05:53] are [05:54] interested [05:55] in exploring the boundaries of the tool and making their own little custom software. One big discovery for us over the years is most people just don't care about that. They just want a solution to the problem that they have. And they just want it to be presented to them. [06:08] They don't really have the patience to [06:10] go figure out this complex tool, which is totally understandable. And I think chat is like a low-level primitive where...
[06:18] It makes sense to have, but the real goal is to connect people to some workflow or use case that's solving their problem, [06:24] and [06:25] it's probably not [06:27] the best interface all the time. Yeah. We do a lot of work with big companies and I see that all the time: there's probably 5% to 10% of their people who want to play around with chat. They want to learn how all the AI stuff works. And then everyone else is like, let me just do my job. And usually I think what works is [06:44] letting those 5% to 10% find the workflows and then give the workflows to everybody else so that they don't have to chat with it. Or they can start with a chat that's prefilled with the common things that they're doing. One of the interesting things, I think, about chat, and I'm curious what your thoughts are, is [07:01] often in [07:04] UI pre-AI, you had to make discrete [07:08] updates to the state of the application: checking a radio box, say. It's discrete: it's either checked or not checked. And it's also usually along one dimension. [07:16] But with chat, you can move in a fuzzier, more continuous way through multiple dimensions at a time. Have you thought about that sort of change from discrete to continuous, or single dimension to multi-dimension? And how do you think those things work together best? My mental model is that... [07:32] Unless we're talking about embedding sliders, a "make it funnier" sort of thing. Excluding that, [07:36] where the actual [07:38] parameter is continuous, my mental model for this is you have your [07:42] software state, and you think of it like it's a JSON blob, and then you can have [07:47] user UI controls to manipulate that. And like you said, it's typically just edit this key to be false instead of true or something like that.
[07:53] And the user can only do one thing at a time. And I think of the AI as: you can give it some high-level instruction, and then it can [07:59] go execute a sequence of [08:01] commands, like a cascade of things, which are turning lots of the knobs. [08:04] Yeah. [08:06] So that's kind of how I think of it. It's more like... [08:08] Yeah, I guess the user's mental model can be fuzzier, but ultimately it still maps all the way down to what are the knobs that it's [08:14] turning. It's just that [08:15] maybe the user has a fuzzier understanding of it, and then it's going and doing like 10 things for you, and then it still works in the same way. And also it introduces this new challenge of explaining to the user what happened. [08:26] Especially if it's a complex state. Yeah. How are you thinking about that? What have you found in doing that? I think about it like [08:33] what is the thing that's changing and what is the most [08:36] efficient, understandable way to present that. [08:40] One that we've explored in the past is [08:42] asking it [08:43] to do edits across multiple documents. [08:45] And then, you know, [08:46] we essentially just came up with, I mean, nothing too crazy, but it's a UX where it groups it by the pages, and then it shows you the diff across each one. And then you can go zoom in and look at the ones that [08:57] you care about. But it's pretty tough. Yeah, I would say it's just a fundamentally hard problem if it's doing something complicated and then explaining the complicated thing. I mean, yeah, it's just hard. Yeah, that makes sense. Or, you know, one of the things I find is even if you get it to summarize what it did, the summaries are so high level. It's saying a lot without saying anything at all. And getting it to be concrete enough, but not too detailed, is a really [09:22] difficult challenge for some reason. Yeah.
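As a rough illustration of the mental model described above (application state as a JSON-like blob, one high-level instruction expanding into a cascade of small edits, and a per-page diff to explain what happened), here is a minimal Python sketch. The page names, keys, and the `apply_commands` and `group_by_page` helpers are hypothetical and are not Notion's implementation; the list of commands stands in for whatever a model would actually produce.

```python
from collections import defaultdict


def apply_commands(state, commands):
    """Apply (page, key, new_value) edits in order and record real changes."""
    changes = []
    for page, key, new_value in commands:
        old_value = state[page].get(key)
        if old_value != new_value:  # skip no-op edits so the diff stays short
            state[page][key] = new_value
            changes.append((page, key, old_value, new_value))
    return changes


def group_by_page(changes):
    """Group the change log by page so a user can zoom in on one page."""
    grouped = defaultdict(list)
    for page, key, old_value, new_value in changes:
        grouped[page].append(f"{key}: {old_value!r} -> {new_value!r}")
    return dict(grouped)


# Hypothetical state: two pages, each a flat dict of "knobs".
state = {
    "Roadmap": {"status": "draft", "archived": False},
    "Launch plan": {"status": "draft", "owner": "unassigned"},
}

# Hypothetical cascade that one instruction ("finalize the launch docs")
# might expand into. A real system would get this list from a model.
commands = [
    ("Roadmap", "status", "final"),
    ("Launch plan", "status", "final"),
    ("Launch plan", "owner", "Dana"),
    ("Roadmap", "archived", False),  # no-op: value is already False
]

changes = apply_commands(state, commands)
summary = group_by_page(changes)
for page, lines in summary.items():
    print(page)
    for line in lines:
        print("  " + line)
```

Recording the old and new value for every knob keeps the explanation concrete, which is the part the conversation identifies as hard when a summary is left entirely to the model.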
[09:27] I think that's probably just fundamentally hard. I think especially if the thing is complicated, [09:31] you're not going to fully understand it maybe until you read the whole thing. [09:33] And then I think depending on the use case, though, you probably can go pretty far with calibrating that prompt [09:39] to [09:40] the appropriate level of granularity. So I think you can get pretty far at least calibrating it. [09:47] I guess, maybe if you were to pick it apart, there's the problem of [09:51] is it summarizing at the appropriate level of granularity? And [09:55] maybe it's just missing an important detail that you actually wanted to be included. [09:59] And then maybe there's the more fundamental problem, you know, [10:02] you do want to reduce the information. [10:04] And so it makes sense to drop some things. [10:06] I want to go back to the relational database [10:10] point. I think for me, the way that I've thought about it, or the mental model I have for relational databases, and you may have a different one, is it's more effective to have a schema for a relational database if you know what the data is going to be used for. [10:22] So for example, it's easier to have a relational database for a CRM where it's like, I know I'm going to use it to keep track of customers, so I have a customer table. And what's interesting about embeddings is they're [10:34] able to capture so many more dimensions of what a piece of information is relevant to that you can use them for storing information in situations where you don't know what the information is going to be used for in the future. Obviously, so far with Notion, you've had to solve using a relational database to store information when you don't know what it's going to be used for. I'm curious how you think embeddings change that picture, if at all. That's a really good question. I think I
[11:00] To your first point, I'll first address the point that it's hard to design it when you don't know what it's useful for. I think that's a really good pointer: don't design schemas when you don't know what they're for yet. This is something that I've been playing around with, AI helping you design schemas. We've tried versions where it just comes up with all the properties you might want. [11:17] It can come up with a lot of things, and not all of them are useful. I've had a lot more success with [11:22] only giving the minimal schema that's required for the actual task that the user currently cares about. Each property should have a purpose. [11:29] That just really focuses the task and makes it more effective. [11:32] So I think that's [11:33] one point there. In terms of how I think about embeddings versus deterministic querying, which I think you're getting at, I just think of them as two different tools that you have in your toolbox. Ideally, you have both. And you can even maybe [11:45] combine them. So this is something that we're working on a lot: Q&A over databases. And when do you turn to a deterministic SQLite query and when do you turn to an embedding? [11:53] I think it really just depends on the question. And sometimes you want one, sometimes you want the other. And a lot of it is like a performance, cost, [12:01] latency concern. [12:03] Like, [12:04] you could just make everything embeddings. Or, I mean, you could just map a model over every row of the database every time. [12:09] And then [12:10] you don't need, you know, [12:11] embeddings, and no SQL either, right? Everything's unstructured. [12:15] I think that would be undesirable from a performance perspective, and also, [12:20] yeah, it wouldn't be fully deterministically accurate, which I think [12:24] people probably care about. If you're like, how many sales did we do last quarter? [12:27] Do you really want the model? It can make it up a little bit. 
It'll get close to right, but it won't be actually right. It seems a bit scary. So it depends on the question. If I'm asking, let's say I have a customer database or something, and I'm like, how many sales last quarter?
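The trade-off being described (exact aggregates go to a deterministic query, fuzzy questions go to semantic search) can be sketched roughly as below. Everything here is an assumption made for illustration: the `deals` table and its rows are made up, the keyword-based `route` function is a deliberately naive placeholder for a real routing layer, and the bag-of-words cosine similarity merely stands in for real embeddings.

```python
import math
import sqlite3
from collections import Counter

# Hypothetical customer data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deals (customer TEXT, notes TEXT, amount REAL, quarter TEXT)"
)
conn.executemany(
    "INSERT INTO deals VALUES (?, ?, ?, ?)",
    [
        ("Acme Films", "movie studio and streaming entertainment", 1200.0, "2024-Q3"),
        ("Bolt Logistics", "freight and trucking", 800.0, "2024-Q3"),
        ("Cedar Bank", "retail banking", 500.0, "2024-Q2"),
    ],
)


def exact_sales_total(quarter):
    """Deterministic path: sum the amount column, so the answer is exact."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM deals WHERE quarter = ?", (quarter,)
    ).fetchone()
    return row[0]


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def fuzzy_customer_search(query, top_k=1):
    """Fuzzy path: rank customers by similarity (toy stand-in for embeddings)."""
    q = bag_of_words(query)
    rows = conn.execute("SELECT customer, notes FROM deals").fetchall()
    ranked = sorted(rows, key=lambda r: cosine(q, bag_of_words(r[1])), reverse=True)
    return [customer for customer, _ in ranked[:top_k]]


def route(question, quarter="2024-Q3"):
    """Naive router: aggregate-sounding questions go to SQL, the rest to search."""
    if any(marker in question.lower() for marker in ("how many", "total", "sum")):
        return ("sql", exact_sales_total(quarter))
    return ("semantic", fuzzy_customer_search(question))


print(route("How many sales last quarter?"))
print(route("any customers in the entertainment space"))
```

The SQL path cannot "make it up a little bit", while the similarity path tolerates loose wording; in practice the routing decision itself would probably need to be smarter than keyword matching.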
[12:39] Yeah, I really do want the column, like amount, and to sum over it. But then if I'm saying like, [12:45] do we have any customers in the entertainment space or something? Maybe I want it to be flexible on that. So yeah, I just think of it like these are just tools in the toolbox and you want both. [12:54] And then [12:55] the challenge is in defining that routing, mapping layer of figuring out which tool is best for the job, combining them, and then presenting it to the user with [13:03] the best result. You very famously, a couple years, I think, into Notion's life, went to Kyoto, stripped it all down and pivoted the company, and it became what Notion is today. [13:14] And [13:15] I'm curious. [13:17] Let's just assume as a thought experiment that you're going to have a second Kyoto. You're going to go strip away everything that Notion currently is and rebuild it with [13:27] AI. [13:28] How would you do it, or how would you think about it from scratch? What would you do differently now that these tools are here? [13:34] Yeah, that's kind of how I operate. That's great. [13:39] When I'm thinking of a new project, I like to be pretty unencumbered by the way things work. But then the magic is also about [13:47] taking this unencumbered crazy idea, but then also, [13:50] ideally, you want an incremental roadmap for everything. So I think [13:54] there's a lot of details in there. I don't just want to make crazy ideas. I want to actually ship stuff incrementally, but then still get to the crazy place. The really key exciting thing to me is this thinking box, [14:03] where... [14:06] There's plenty of knowledge work tasks that people don't really want to be doing.
[14:09] or that are too expensive for them to do because you have to hire the humans to do it. Can we automate that stuff? One big principle would be, probably there's fewer [14:18] humans touching the database, [14:20] and the AI should be managing it for you. [14:23] We're talking about customers. Let's say you have a CRM-style thing. [14:26] Ideally, you never need to update any of the fields, right? If the deal closes, [14:30] it should know the amount based on your email. If someone talks in Slack about how the deal is at risk, that should be [14:36] Yeah. [14:37] in the structure somewhere. You shouldn't need to update stuff. And I think in the AI world, [14:42] the database becomes more of an implementation detail. And hopefully the user interacts more with the [14:49] processed outputs of it rather than the raw database itself. So maybe for sales, you really care about a daily progress bar [14:55] or seeing something about the productivity of your [14:58] sales people or something like that. [15:00] Those should all be just presented to you directly. And the database is just this background thing that's implementing the thing you care about. I love that, especially the first point about how you shouldn't have to interact with the database. What it reminds me of is there's this constant thing with Notion and with any other kind of tool like this, especially if you're doing it inside of a company. [15:19] You're always like, is this up to date? [15:21] There's 5% of things that make it into Notion, but then there's 95% that's completely unwritten. [15:27] And I think [15:29] companies operate better when more of that stuff is legible and more of that stuff is written down and updated. I've always had this thing that's like, I think companies should have librarians that are just responsible for that.
Having worked in a big company, I was the guy for a particular product, and it had a huge sales force, and I had written all these documents about how the product should be sold and what the details are and whatever. My previous company was a co-browsing company and I sold it, and I was the co-browse guy internally at Pega, this big public enterprise software company, and
[15:57] even though I'd written everything down, [15:59] all the sales people were just like, you're the co-browse guy, right? On chat. What about this question? I'd be like, [16:03] see my doc. But I think, one, discoverability was really poor for them. And then two, there's always that thing in your head where you're like, is this up to date? And it seems like what you're saying is that there's an opportunity now, without someone having to do it, to take a lot of that stuff that would ordinarily not be written down and get it into a format where it's recorded for other people to use. Is that kind of what you're saying? Yeah, we're definitely excited about that as a use case. With the current [16:30] Q&A with Notion and third-party connectors, at least if the salesperson goes and asks the question, they can ask the AI. [16:36] That's pretty cool. But then, [16:37] yeah, I think [16:38] a lot of the time you didn't write the doc in the first place. And then once you write it, you want to maintain it. I think those are both really interesting use cases that I think would be super exciting. A fun thing about [16:48] these thinking boxes is now you can treat a knowledge base like a database, [16:52] where the operations on it can be semantic. I think that's pretty exciting. Yeah. Thinking about [16:57] how can pieces of information conflict with each other? And how would you resolve that? Yeah, that's super interesting. I love the term thinking box. And it makes me think, what is thinking? What do you think the boundaries are of what that thinking box can do versus not do? [17:12] I don't think there's that many boundaries. I think the abstraction's kind of complete [17:16] in a way already. It's more just that the models kind of suck still. You give it vision, like assuming it's multimodal, right? [17:25] There's not much more to it, I think. Maybe there's like...
[17:27] robot actuation commands. But those can be represented in the same model too. [17:32] I kind of feel like the abstraction is already complete. I feel like the really critical thing is it has some context and has tools that it can use. [17:39] And then [17:41] those tools produce observations, and then you just loop on that. That's an agent that can do anything. Assuming the model actually works, which [17:50] they don't yet. Depends on the use case. I don't know if you saw, but Anthropic dropped a new computer use model for Claude the other day. And one of the things that they touted in that release is that you don't actually have to explicitly identify tools. And instead, the model just understands that there's a browser in front of it that has certain things. And the computer has applications that allow it to do things. And so the tools are implicit rather than explicit. What do you think are the tradeoffs there? Do you think explicit tools are [18:19] actually what's best, or should it be implicit, and where? [18:23] Yeah, super excited to see that. I mean, on a technical level, it's still tools. It's just the tools are like [18:27] click this coordinate and type this. I guess that's true. It's just meta tools. It's still implemented as tools. Yeah. It's just that [18:34] the click tool is [18:36] pretty powerful. A lot of things you can click. Click and type. You can do a lot of stuff, right? Yeah. And then the observation is to see what happened afterwards. [18:42] Yeah, I was super interested to see that. It's something I've been expecting to start working, and it seems like [18:48] it doesn't quite work that well yet, but you know, early signs, and it's cool that they're showing it to the world. The way I'm thinking about that is [18:56] Given the task,
[18:57] you want to give the AI the most convenient way to do the task possible. So there's some quality constraint around how to do it. [19:04] And then there's some [19:05] performance, latency constraint, and then maybe [19:09] something around how users can observe and control. [19:14] So I think [19:15] computer use, [19:17] it's very open-ended. [19:19] Like you said, at least currently the quality seems much lower [19:22] than if you were to give it a more specialized tool. The latency is very bad, super slow. You could get much better results by giving it a code API, for example. Like if your goal is to download a recipe, [19:34] you can have it go Google search and find the recipe, or you can give it a recipe search API, [19:39] and that's going to be done in less than a second. And then there's the controllability [19:45] thing, which is [19:46] pretty important, I think, especially if it's doing something autonomous [19:49] for you. It doesn't seem that interesting to me [19:51] to have it control your computer while you're watching. The interesting thing, it seems, is [19:56] you ask it to do something, and then it goes and comes back to you when it's done, and I can do something else. It has its own computer. It has its own computer, exactly. [20:03] And [20:04] I want to be able to control what it has access to. So I think that's pretty important. And [20:08] if you're giving it a computer, [20:11] it's pretty open-ended. So we need to develop some of those controls around it, but I'm excited about it. I think ultimately the way I think about it is it's just another tool in the toolbox. [20:19] And [20:21] the ultimate answer probably looks like a mixture: [20:24] when you can give it an API, that's much better. [20:27] And that will always be better.
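The loop described in this part of the conversation (context plus tools, tools produce observations, repeat) and the preference for a specialized API over open-ended computer use can be sketched as follows. This is a toy under stated assumptions, not any vendor's agent API: `recipe_api` and `browse_computer` are hypothetical tools, and `scripted_policy` is a hard-coded stand-in for the model's decision at each step.

```python
def recipe_api(query):
    """Specialized tool: hypothetical fast lookup; returns None when it has no answer."""
    recipes = {"pancakes": "flour, milk, eggs"}
    return recipes.get(query)


def browse_computer(query):
    """Open-ended fallback: stands in for slow, general computer use."""
    return f"(found by browsing) notes about {query}"


TOOLS = {"recipe_api": recipe_api, "browse_computer": browse_computer}


def scripted_policy(task, observations):
    """Stand-in for the model: choose the next action from what was observed so far."""
    if not observations:
        return ("recipe_api", task)  # always try the cheap, specific tool first
    last_tool, last_result = observations[-1]
    if last_tool == "recipe_api" and last_result is None:
        return ("browse_computer", task)  # escape hatch when the API cannot help
    return ("finish", last_result)


def run_agent(task, policy=scripted_policy, max_steps=5):
    """Loop: ask the policy for an action, run the tool, record the observation."""
    observations = []
    for _ in range(max_steps):
        action, argument = policy(task, observations)
        if action == "finish":
            return argument, observations
        result = TOOLS[action](argument)
        observations.append((action, result))
    raise RuntimeError("agent did not finish within max_steps")


answer, trace = run_agent("ramen")
print(answer)
print([tool for tool, _ in trace])
```

Keeping the set of callable tools in one explicit `TOOLS` table is also one simple place to enforce the controllability concern raised here, since the agent can only reach what is registered.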
[20:28] And then when you can't, it's nice to have this escape hatch where it's like, do stuff on a computer. That makes a lot of sense. I think that's totally right. For tasks where [20:36] they're repeated and you know they're going to happen inside your application, having a specific API that just does them really quick is great. Just like, you know, you have muscle memory for figuring out how to pick up a glass, and maybe there's a tool in your head that is really tuned for picking up a glass to drink, and then you fall back to this slower, more open-ended thing that can do much more for [20:59] tasks that the more specific tools can't handle. Yep. Yeah, I think another angle that's interesting too is the [21:08] market [21:09] dynamics angle. I think [21:11] we might start seeing people shutting down or not wanting that. There's gonna be a race where, you know, [21:17] people who use the computer-using agents are going to want them to access all their stuff, and then [21:21] the companies that manage those tools might not want that, [21:24] if it's like a third party. [21:26] There's already a whole industry around preventing bots from accessing websites and stuff. So this is like, [21:30] now bots are useful for real work. So what are we gonna do about that? I'm not sure how that's gonna play out. I think it'll be really interesting. What would be your guess? [21:39] I think probably it's inevitable that... [21:43] Well, [21:44] I think people are definitely gonna want to do this, and they're gonna have [21:47] legitimate reasons to do so. Unlike in the past, where maybe it's like scamming or hacking. Now it's like, no, I'm actually trying to perform this task. I'm paying for your software, so you should let me do that. [21:56] I think that makes sense. [21:58] I could see a world where, I think probably the ideal outcome is...
[22:02] something where [22:04] everyone allows that, but then they get paid for it [22:08] in some way. I don't know what the shape of that is exactly, but yeah, I think that's ideal. If you make some software that's valuable and [22:15] people are using it in this way, [22:17] somehow value should accrue to you. [22:19] Do you think we're going to see a world where there's [22:22] interfaces that are specifically for verified humans and then LLM-friendly ones? I guess that's an API, but it's something different from a traditional API, LLM-friendly [22:31] interfaces built specifically for LLMs. [22:34] Yeah, my job description is like, [22:36] design those. Yeah. There's all these quirks. The quirks will go away over time. Yeah, I saw someone [22:42] tweet the other day that you're going to have an alternate form of your website that's just plain HTML with divs and buttons. Yeah, I'd just use that. Back to the 90s. Yeah, yeah, yeah. I love that idea. I think [22:54] it's a tricky race, because on the one hand, the current models are not good at many things, and you do need to design those custom things. [23:02] But then, [23:03] as the models get better, maybe you need a bit less of that. And maybe they can also just build their own. Eventually, the model can just build its own scaffolding. You give it some thing, and it's like, all right, I'm going to make a whole Python code repo. [23:15] that [23:16] And maybe inside of that, it's going to figure out this problem that I was just describing, where it's like, all right, which things can I use code for? That's better. And then which things do I need to call out to some browser, like a headless browser, for? That's way less ideal, but I'll do it if I need to. And then, [23:29] yeah, I feel like the ultimate abstraction just closes over all this. [23:32] So...
[23:33] as a human, whether it's using a code API or a browser is just an implementation detail. You said part of your job is figuring out good interfaces for LLMs. What are the current properties of a good [23:44] LLM interface? [23:46] It's super fun. Okay, so yeah, there's a bunch of principles. So one is [23:50] that [23:51] you want to align to things the model's been trained on [23:55] as much as possible. What does that mean? [23:57] So just as a concrete example, [24:00] one way we've tried representing a Notion page is as this XML tree, which is much more faithful to the way it's persisted. [24:06] But [24:07] the model just wants to speak Markdown. [24:09] Interesting. But they're prompted in XML often. So it's prompted in XML, but typically the XML that's used in prompting is pretty simple. It's like, [24:17] wrapping this in a tag. It's very good at that. The format that we came up with is a much more complicated form of XML, where Notion blocks are in a tree with [24:25] many layers of nesting, and there's rules about which blocks can contain other blocks. [24:31] Describing the spec is like, [24:33] it's several thousand tokens. [24:35] And then the trouble with that is that [24:38] the models can do it, but you're actually harming its ability on other tasks. And I guess our mental model is, as it's making the tokens, it has to [24:46] attend to all your complex instructions about the formatting, and also the reasoning of, like, answer this question or something. Yeah. And it definitely makes it worse. It's better if you can speak the language [24:56] that it knows how to do. And that's just a matter of [24:58] what the companies trained it on. And so Markdown is a good example. Everyone just trains it on Markdown. And so it's just really good at that. You don't need to give any extra instructions for that. Even if Markdown has a more complicated structure, are you flattening the tree into--
[25:10] More linear. Well, I think I mean Markdown is also simpler. [25:13] It's just a very simple kind of like lossy [25:16] Language with... [25:18] very few [25:19] ways it can fail. That's one class of things like aligning to [25:23] um what the models already know another one is around you want the structure of your output [25:32] to be like as simple as possible. [25:34] for the output that you need. I think that's-- [25:37] Really key? Like for any formatting... [25:41] It depends on what you're doing. You want to really go hard on it. Make it as simple as possible while still doing the task that I care about. What would be an example of a time when you learned that or that stood out to you? [25:50] I think the XML structure like applies there too. Like... [25:54] our [25:55] When we first started doing it, our [25:57] original principle was like, oh, let's just like perfectly map [26:00] the [26:01] the way it's actually persisted and displayed to the user. That's ideal, right? There's no lossiness at all. Then there's all these little like [26:08] quirks about [26:09] just little things that it gets wrong. And it was often easier to just simplify it. And even if it's somewhat lossy, it's worth it because [26:18] at least you can control that. [26:20] Whereas if it's too hard for the model... [26:22] It's [26:23] Yeah, that's the end of the line. There's kind of the basics of describe your task as simply as possible. Use like few-shot examples. Another... [26:29] class of learning that's [26:31] kind of interesting is if you're working on the prompt and you notice some class of issues, [26:37] - The way I think about it is that, [26:39] My first line of defense is I want to try to make that class of issues
be impossible. [26:43] in the system or validation around it. That's like the ideal. And then if I can't think of a way to do that, then I'm going to try to make the prompt better. Add an example or like change instructions and like that. [26:53] I think that's a really fun one. And it's [26:55] Yeah, it really depends on the task, but I'll give you one example. In the example, I'm making fake test data, and then there's one prompt that is describing the kind of fake test data that you want to make, and there's another prompt which actually... [27:07] Writes out. [27:08] the fake test data in detail. And one little constraint with that is I don't want it to make [27:13] too much just because it'll be too many tokens, it'll be too long [27:16] And so I have this constraint in the description, it should only make up to 10 like records or something like that. [27:22] But sometimes it just wouldn't follow that instruction. That's annoying. And... [27:25] One little trick there is, oh, [27:28] actually while it's generating the descriptions, ask it to estimate [27:32] the number of records that will be needed for this test data. [27:36] And just forcing it to output that [27:39] One, it aligns it much better because it's like, you know, not just that it's in the instructions and it can maybe ignore that, but like it had to actually produce a number. And then also if the number it produces is too high, I can actually just throw an error and have it try again. That's really interesting. I think there's two principles packed in there. One is the thing you said, which is making a certain class of error impossible, which maybe it's like changing your prompt or maybe it's not even calling the AI in a situation where that error might come up or something like that, which is really interesting to me.
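The record-count trick described here can be sketched roughly as follows. This is only an illustration under assumptions: the JSON reply shape, the `estimated_record_count` field, and the `call_model` callable are hypothetical names chosen for the example, not Notion's actual code. The idea is that the model has to state a number, and ordinary code then validates that number and retries when it is over the limit.

```python
import json
from typing import Callable

MAX_RECORDS = 10  # the "up to 10 records" limit mentioned in the conversation


class TooManyRecordsError(ValueError):
    """Raised when the model's own estimate exceeds the allowed record count."""


def parse_description(raw: str, max_records: int = MAX_RECORDS) -> dict:
    """Parse the model's JSON reply and reject it if the estimate is too high."""
    reply = json.loads(raw)
    # Hypothetical field: the prompt is assumed to ask for this number explicitly.
    estimate = int(reply["estimated_record_count"])
    if estimate > max_records:
        raise TooManyRecordsError(
            f"estimated {estimate} records, limit is {max_records}"
        )
    return reply


def describe_test_data(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model until it returns a description that passes validation."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_description(call_model(prompt))
        except (KeyError, TypeError, ValueError) as err:
            # Malformed JSON, a missing field, or an over-limit estimate:
            # remember the error and try again.
            last_error = err
    raise RuntimeError(
        f"no valid description after {max_attempts} attempts"
    ) from last_error
```

Here `call_model` stands in for whatever function sends the prompt to a model and returns its text reply. Passing it in as a parameter keeps the validation logic testable with a stub.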
[28:09] thing where by asking it to like output how many examples it thinks are necessary you're like [28:15] you're aligning it in the same way that chain of thought works. Yeah, yeah, yeah. Doing structured chains of thought is really useful. Yeah, a specific to your task. Yeah. [28:22] I think you all probably have one of the most scaled AI applications in the world right now. And one of the things I think is really interesting to dig into is previously software was deterministic. [28:34] And now it's very stochastic. It's much more squishy. And especially at scale, like releasing squishy software to the world is like scary. And I'm curious about what you learned about doing that, what you learned about good evals, all that kind of stuff. [28:49] Yeah, it's really annoying. So yeah, I mean prior to 2022 I didn't I never did any AI stuff really besides like taking some classes in college and [28:59] Yeah, I definitely miss [29:01] the days of like, you know, I can like write a QA doc and write some tests and like it all kind of works. I have a good mental model. It's not going to fail at all. And [29:08] in all these cases. I think the AI thing is so hard because it can fail in some cases. But then there's this like additional meta problem. You don't even know [29:16] Like you might not even know the cases where it can fail. And usually what happens is as you're kind of ratcheting on a prompt... [29:21] you end up discovering more and more of these. And sometimes you discover like really major ones. After you had a huge eval set, and then you discover some new ones, like, oh man, this totally breaks it. [29:29] Even just... [29:30] discovering the distribution of the possible errors is like, [29:33] really hard. And I've definitely been [29:35] led astray multiple times of thinking that I'd solved it and then find this whole new class of errors that are like really hard to solve. This is really hard for evals.
[29:42] The way I think about that is you've got deterministic evals, and then you've got non-deterministic evals. You know, if it's possible to make a deterministic eval, that's great. And I love to design [29:51] workflows such that there's some like classifier elements within there. [29:56] producing an enum or yes no value or something like that. Those are great because [30:00] They're super easy to eval because you can just collect the data set of input and then the correct output, and then [30:04] just get a score. So I'm always trying to, yeah, if there's some complex workflow, I love to come up with classifiers within there. [30:11] That's one big strategy. [30:12] Um... [30:15] And then there's non-deterministic evals. Is the vibe of this correct? Yeah, yeah. So just like using an AI to-- Determine. --evaluate something. The trouble is-- [30:24] If you have an AI evaluating, you need to evaluate your eval now. [30:29] If running a prompt is hard, now you have a whole lot of them. I've found that... [30:34] You have to be pretty careful. I've definitely... [30:37] learn to be pretty cautious about these. It's easy to come up with an idea for a model-graded eval. [30:42] that sounds good, but then like in practice, you try making it and then you slog on it for a while. It's like, oh yeah, I'm discovering a hundred cases of this thing. So I think I found that they work best when the thing you're trying to evaluate is quite targeted and you can describe it very carefully. [30:57] Clearly, you want to make the shape of your task [30:59] Evaluation task. [31:01] such that like for the model you're using, [31:03] to run that. [31:05] the model is extremely good at the task. [31:09] to the point where you can actually trust it. And you don't need to spend a bunch of time evaluating that.
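A deterministic eval of the kind described here, where a classifier step outputs an enum or yes/no value, can be sketched as a plain accuracy check over a labeled dataset. This is a minimal sketch under assumptions: `toy_classifier` is a made-up stand-in for a real model call, and the dataset is invented for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str]  # (input text, expected label)
Failure = Tuple[str, str, str]  # (input text, expected label, predicted label)


def score_classifier(
    classify: Callable[[str], str], dataset: Iterable[Example]
) -> Tuple[float, List[Failure]]:
    """Run the classifier on every example and return accuracy plus failures."""
    failures: List[Failure] = []
    total = 0
    for text, expected in dataset:
        total += 1
        predicted = classify(text)
        if predicted != expected:
            failures.append((text, expected, predicted))
    accuracy = (total - len(failures)) / total if total else 0.0
    return accuracy, failures


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for a model call that returns a yes/no label."""
    return "yes" if "?" in text else "no"


# Invented dataset: does the input ask a question?
dataset = [
    ("Where is the Q3 roadmap?", "yes"),
    ("Summarize this page", "no"),
    ("Who owns the launch doc?", "yes"),
    ("Can you find last week's notes", "yes"),  # a question without a question mark
]

accuracy, failures = score_classifier(toy_classifier, dataset)
print(f"accuracy={accuracy:.2f}, failures={len(failures)}")
```

Keeping the failures alongside the score makes it easy to save newly discovered cases back into the dataset and to rerun the same check after each prompt change.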
[31:13] So, [31:14] Yeah, you want it to be like very clear... [31:16] like narrow really helps. And then using the appropriate model for it. But yeah, that's really hard. And then evaluation in general is like really hard. Yeah, another one is you wanna definitely have a really [31:25] solid loop around logging and collecting data sets, collecting issues and labeling them. And then optimizing [31:31] the loop that lets you [31:33] improve a prompt and [31:36] make sure that you're not regressing on the previous examples. So yeah, there's a whole [31:42] Yeah, there's a lot of stuff to do in there, and it's really annoying. Tell me more about that part. [31:46] using feedback or whatever to improve a prompt and then make sure you're not regressing how are you doing that [31:53] Yeah, I think it really just boils down to having really good evals that are appropriate to the prompt, and then having good datasets that capture the distribution of errors that you care about. [32:05] And then... [32:06] making it easy to run the evals and flag regressions. [32:10] it's kind of, [32:11] simple, but it's just, [32:13] It's annoying to set up and you have lots of prompts. You have to do it like many times. I found you want to have like... [32:18] solid, like standardized ways to do this. [32:20] Are you using all homegrown tools for this? Or are you using like, you know, Anthropic has an evals library. OpenAI has one. There's a lot of open source ones. We use Braintrust for like storing data sets and running evals, but all the actual evals, like we write ourselves. One of the things you said earlier that struck me is like, [32:37] This idea of exploring the distribution of errors you can get, and sometimes you'll have a big eval set, and then you'll find something in the distribution where it breaks everything.
[32:46] What have you learned about [32:47] Doing that exploration to minimize the chances you find something totally unexpected. Yeah, I [32:53] I don't try to do evals too early in a project. I think you can actually go too hard as well in the other direction. [33:00] I'm starting a new thing. [33:02] I feel like you want to start with more of a vibe check, and really be flexible about-- [33:07] how the task is structured. [33:11] Because I found that, especially early on, there's a lot of returns to just like changing the structure of the flow. [33:17] And if you spend too much time collecting data sets, you're just going to trip yourself up a lot. [33:21] So I feel like there's like a mode switch when you're like, all right, let's actually productionize this. [33:25] where you want to switch to. [33:26] actually intensively finding issues. [33:29] Yeah, I think it helps to have... [33:32] It helps to have like data labelers. [33:34] dedicated to that. [33:36] trying to understand as best you can [33:38] how it will actually be used and then [33:41] make your dataset map to it, ultimately that's the game. And then after you deploy it, then the game is around [33:46] actually flagging those examples and saving them. Right. How do you think about, for example, when I interact with Notion AI, you're going to go find a bunch of presumably embedded text from my Notion and then put it in the context and that's going to help you answer the question. How do you think about... [34:01] how much to pack the context. Are you trying to get all the information, like, [34:06] even if it's not as relevant in there? Or are you like being more selective about it and why? [34:11] Yeah, it's an empirical question. [34:13] For sure. So... [34:16] I think it's changing all the time. Like, even just this year, it's changed so much because the models have all gotten pretty long context. But...
[34:23] It's... [34:24] I don't know. Like, sometimes people make a tweet like, "RAG is dead" or something. But it's like-- I mean, there's also a latency cost concern. And there's a-- like, attention is different in the middle of the context versus the beginning and end. Yeah, yeah. It also just doesn't work yet. Yeah. I think we're definitely still strongly in the world where, like-- [34:42] you want to limit [34:43] the context. And if you can remove irrelevant stuff, there's definitely returns to that. [34:48] even with the latest models. [34:50] And if you ask OpenAI or Anthropic, they'll tell you that as well. So I don't know. It's hard. Yeah, it's really an empirical question though. I'm hoping the attention gets way better. [34:58] and the context gets longer, and it gets faster to process them, and caching is better. [35:03] then maybe I like that world because then we can worry less about [35:07] removing irrelevant stuff and that makes our job easier. Yeah, I'm very bullish on being able to do that. I feel like we're not quite there yet. And we've certainly expanded [35:16] the context that we show as models have gotten better. [35:19] Our original Q&A was on... [35:21] the original GPT-4, which is like 8k tokens. [35:24] So obviously we had to be super constrained on that. We can only show a few thousand tokens, especially if you want to do like multi-turn, and have it just forget things. Obviously now they're much longer so we can show more, but [35:34] But yeah, it's still definitely... [35:36] constrained by [35:38] the quality and the cost and latency. [35:40] What are you personally coding with? Are you using Cursor? AIs? Are you using Claude 3.5? Are you using o1? What's your workflow or stack? I use Cursor. The thing I like about Cursor is just the autocomplete they build is really good. It's like
[35:55] Way better than-- are you using Composer? [35:57] So it's just the one where it's just the tab completion where you can do like arbitrary edits. Is that? Yeah. Composer is like a window that pops out. Oh, so like the command K. Yeah, the command I. Oh, you gotta try it. It's really cool. So what does that do? It's different than that. Cursor Composer is much better at doing multi-file edits. It's a little bit more agentic. Yeah, I don't think I've actually even tried that. I've used the command K and then I've used the command I. Yeah. [36:20] Or the command sidebar. I think you have to do less because in the sidebar chat, you have to like go scroll through each step and then press apply. And it takes a while. And Composer like does all of that a little bit more automatically. [36:31] Yeah, I haven't tried much doing multi-file edits. I guess my mental model is like, it's probably not that good at that, but I should try it. I feel like I don't, I feel like I don't understand if it's good or not, which is, which I should. My typical workflow is like, yeah, autocomplete all the time. Obviously, that's just like, ambiently there. And then I'll ask for code. And sometimes I'll use a Notion or Claude or a ChatGPT interface as well. And then my model there is more about [36:52] I have a specific function I want it to write. That's typically the abstraction level that I ask it to code at. [36:58] Anything beyond that, I don't know. It's just... [37:01] No one's really cracked the multi-step coding agent thing yet, but it can write pretty good functions if you give it good instructions and context. And then I also use it for... [37:08] I've tried using the [37:10] retrieval over code, which [37:13] It's helped me a few times. We have a pretty big code base. And so I can be like, what's the function that does this thing? Yeah, yeah, yeah. [37:18] Is that in Cursor, retrieval over code, or is it somewhere else? It's in Cursor, yeah. 
That's part of the sidebar. I see, I see. That's really cool. And models-wise, are you using Sonnet or using o1?
[37:30] Sonnet is my day-to-day workhorse for sure. o1, yeah, I've been... [37:35] Playing with it a bunch. I've had some success with [37:38] o1 and o1-mini on coding stuff, but [37:42] It feels like it's not that much better than Sonnet. [37:44] at coding and then it's slow. And it feels like it's also weird to prompt right now. I've definitely gotten like weird results from it, especially when you like give it a bunch of context. [37:54] It's like they trained it a lot on these math and coding puzzles, which have like very little context and a lot of reasoning. [37:59] And it's really good at that, but only if you can [38:03] Put it in the shape of that. [38:04] And I found it's a bit finicky. So yeah, I'm really excited about the paradigm though. And I'm curious to see when they produce the final model that's been like fully trained on a bigger distribution of inputs. That's what I was going to ask you about is what do you think about scaling inference compute as a new paradigm and where do you think it's going? [38:20] Yeah, it's super exciting. It makes a lot of sense intuitively. I think I was surprised [38:25] when it first came out. I think I was initially underwhelmed [38:30] Because I intuitively have thought that like, oh, [38:32] Wouldn't it be better if it like reasoned in the latent space more? [38:36] Wouldn't that be more powerful? But then I was talking to some friends about it, and actually, it makes a lot of sense for it to be language. [38:42] because the model has all this prior over language already, and the reasoning over language. And so it makes a lot of sense. It's like the dumbest possible thing you could do. It's just like, [38:50] Let's. [38:51] think more over the-- using language. And then my understanding is the tricky part is just making the RL work. And there's all these details like OpenAI's figured out, but no one else has yet.
[39:04] Yeah, I feel pretty excited about it. I think I was pretty impressed by the graph of like, [39:09] Increased scaling compute. [39:11] it gets smarter and it makes a lot of sense. I feel like [39:15] The thing that's going to be the real unlock, though, [39:17] is putting tools in the chain of thought. And that's where it gets, I think, really interesting. Because right now, it's like the shape of the problem you need to give it is like, [39:27] You need to give it all the context it needs up front, and then it can think a lot about that. [39:31] and then produce an answer. But I think there's actually not that many [39:34] Things that are like... I mean, I'm sure there are plenty of things that are like that, but it's actually kind of a blunt... [39:39] or a specific, like, tool, it feels like. I'm... [39:42] Yeah, the big unlock that I'm... [39:44] excited for is [39:45] putting tools in there and then doing reinforcement learning over that. It was really interesting. That's kind of when like, [39:50] agents are going to actually work is when you can do that. You can give it some like high level task, give it these tools, like maybe it's like a browser and then [39:58] and then just enter a hard loop. Think a bunch, use some tools, see the outputs, [40:02] Think more and then keep looping on that. I feel like... [40:04] Yeah, about... [40:05] That can solve a lot of stuff. And then whoever can figure out that kind of like, [40:10] long horizon. [40:11] Like reinforcement over it. That seems... [40:13] Yeah. I think tool use is coming. I was at DevDay, and I think they said it's coming before the end of the year, which would be pretty cool. It makes sense. It's like the obvious thing they should do, but it's probably hard to figure it out. So you talked about having them be trained on these math and math reasoning problems. What do you think about zooming models into those kinds of problems versus like... [40:32] I don't know. 
There's endless amounts of like other things you could have it do. And they're trying to make it reason.
[40:38] Do you think that allows it to come up with new things or be creative? If you let one of these things run, let's say you could just scale the inference time [40:46] infinitely would it come up with new things or do you think that's a different [40:51] type of thinking that requires a different type of training loop. Yeah. [40:56] I think the interesting thing that obviously I don't know what OpenAI is doing, but like my speculation, the thing that [41:02] it seems like they're doing is that [41:05] Like the place where you can get [41:07] the reinforcement to work is when, [41:09] the results can be verified in some way. [41:13] you're like touching reality in some way. Like it's not that [41:17] The model is making stuff up and the human's like, oh, this one's slightly better. It's like you're writing some code and there's a unit test and they have to pass or there's a math problem and there's an answer to it. For this one reasoning step, there's like a correct thing to think. I think that's really interesting because... [41:32] It lets them... [41:34] scale up the training. And then I think you can do that. I think [41:38] With that as a tool, [41:40] You can now [41:42] discover new things just by mining it. If you know how to verify [41:47] some problem you can spend lots of time thinking about. [41:50] the answer and keep going until it's correct. So I think in that domain, you can be really creative of just like [41:56] creative ways to solve this. I think you're [41:59] Maybe pointing at something a bit fuzzier, though, like... [42:02] Maybe like aesthetic creativity or something like that. Something like that. Yeah. I don't. [42:06] That feels like a different thing. And it feels like not the direction that the companies are going now. It feels like they're really doubling down on these verifiable things. What's really interesting to me about that is like,
[42:16] Um, [42:17] when you... [42:18] look at how mathematicians or scientists talk about coming up with new theories, there's usually like a big aesthetic component, like the idea of beauty or simplicity or whatever is like driving. [42:28] driving them. And when you're coming up with new theories, they're often not verifiable when you first come up with them. So like an example is we didn't verify a lot of relativity for a long time after Einstein came up with it. Do you worry that focusing solely on training loops where each step can be verified, or at least the outcome can be verified, or some of the thinking steps can be verified, like limits the ability of these things to think in ways that are valuable? [42:53] Yeah. [42:54] I think to some extent it's all they can do probably because if we've exhausted all the human data, [42:59] And you need to make more data. [43:01] And you want it to be good. Yeah. [43:03] Yeah, yeah, you have to verify it. So some of that stuff might be off limits until we like, I don't know though, I guess [43:09] One way to think about it is when Einstein's coming up with this theory, [43:13] He's doing some verification. Presumably he was writing down some equations, like making sure they work out. Yeah. [43:17] But then he also has this built in aesthetic, which is a bit fuzzier of what's a nice theory. But that was trained on. [43:24] his previous learning on physics and all the stuff he learned in his life. [43:28] And I wouldn't be surprised if OpenAI makes some model and they train on all the physics and then they have it going mining to produce new physics and new math. I wouldn't be surprised if it actually... [43:37] develop something analogous to that aesthetic. You keep mining and then it discovers 10,000 new theorems. [43:42] that are all proven to be correct. [43:44] And then you start asking it high-level questions about math. I wouldn't be surprised at all if it had like some...
[43:48] analogous thing going on there. Yeah, I guess if the theorems that it's seeing are representative of that aesthetic, the aesthetic is in there, even if we're not really even talking about it. Yeah, you would hope that the aesthetic... [44:00] is pointing to the truth in some way, and it's coming from the truth. The simpler theory is nicer because it's more likely to be correct. [44:06] and [44:07] to the extent that those are true, I would expect the model to learn them too. Which is, that's a big assumption. Yeah, I don't know. Yeah, yeah. But then... It's definitely more likely to be useful. [44:17] or beautiful. Yeah, there's definitely like humans have additional constraints and in some ways the aesthetic is like a computational shortcut. The human Go player can prune out a bunch of classes of moves that maybe the AI would like to brute force more. [44:30] Maybe Einstein had to do that in order to just do the limits of human cognition. [44:34] Maybe the only way Einstein could have done it is to have [44:37] Aesthetic shortcuts. Yeah. I think the beauty of LLMs is like they get you the thinking machine without having to brute force. Like they do the same kind of aesthetic shortcut or like intuitive thinking where early AI attempts. It was like, OK, to solve chess like Deep Blue, we're just going through the branching tree. But then you get to these. [44:55] Like when you're going outside of chess, then you get to these problems where the branching tree is so computationally expensive to traverse that it doesn't work anymore. And language models have figured out how to set the frame of possibilities correctly so that they don't run into the computation. Yeah, yeah. I think it's both. If it's producing a new theorem... [45:13] You probably have it try many times. But then you can also have these thinking shortcuts. So I guess you get best of both worlds. Like you couldn't get a human to do that much work. Yeah, yeah, yeah. 
You're trying many different times and you're examining different smaller parts of the tree, but you're not brute forcing the entire tree because the theorems it produces are important. Yeah, there's probably still shortcuts. It's not going to do stuff that obviously doesn't make sense.
[45:31] But then maybe it's just a really hard task and there's like many things that could make sense. Yeah, yeah, yeah. One of the things that – and I could only find one other interview with you. So this is coming from that interview. But like one of the things that you said in that interview is that one of your core philosophies is if you can build something that you want – [45:47] That's like a great place to start. [45:49] Um, uh, so build stuff that you're going to use for yourself. You were the first users of Notion, all that kind of stuff. And I'm curious if you have anything that you feel like you're building for yourself right now or things like that you want right now. [46:01] Yeah, I would say that principle, it's not universal because obviously if you're designing, if you're making something for like diabetes patients and you don't have diabetes, someone still needs to make that. Maybe not only people with diabetes. Yeah, it's helpful if you have it. Yeah, so it's a luxury. I think of it as a wonderful luxury. [46:18] that if you're deciding what to do with your life, it's nice to pick one of those things. And I really cherish that. It's a thing that we can use every day. And I like lean into it and try to compound on it. What's something that I'm doing? It ties back to a lot of the AI stuff that we've talked about. I'm excited about [46:33] There's cumbersome tasks that me and people around me are doing all the time that [46:38] Yeah. I can just not do those things. Yeah, within the realm of like knowledge. [46:41] I guess we already talked about [46:44] some of them, but like, [46:45] keeping documentation up to date, [46:47] databases, not having to do manual data entry and keeping things up to date, that's super annoying. There's a [46:54] Common class of task around like rolling something up and like summarizing things. [46:58] from a database or from a box or something like that, you have your weekly update. 
And especially in a bigger organization, there's like many levels of weekly update. And it's super annoying. It's a huge use of everyone's time. Yeah.
[47:09] Why are we doing that? The data is there, the docs are written, and the Slack channels [47:13] responded to and it's all there and then [47:15] You could write an update at any level of granularity that they care about if you just have the right context and the instructions. I totally feel that. We have a product studio at Every, so we incubate little products. And everything that we incubate, we use ourselves, which I think is great. And I think one of the really cool things about AI is it means that the low-hanging fruit hasn't been picked yet. So you can make something for yourself. And it's not like there's 10 years of tech nerds who have made it before. It's all new, so everything gets made. [47:45] is every morning now I get like a little podcast of meetings or Discords, we use Discord to chat, things that I missed and it's the NotebookLM stuff where it's like this sort of NPR Fresh Air type thing and it's two hosts talking about what happened and I can get caught up while I'm [48:02] doing dishes or whatever. It's really fun. That's super fun. I'm super excited for that kind of thing. [48:06] I actually, I just wrote an idea down that's very similar to that the other day. That's so fun. Yeah, you would love to see that. I definitely... [48:11] vibe a lot with the feeling of openness in the space. [48:15] really exciting. One thing I really appreciate as well, and it really keeps me going a lot, is so many things haven't been tried and also... [48:23] It feels like [48:24] the technology is so overpowered in some ways that [48:28] I found it very common that I can just come up with a random... [48:31] a technical idea. Just like, oh, I wonder if it would work if I did it in this way. And then usually it does work, which is pretty cool. It's like once you develop the intuition.
[48:39] and [48:40] Maybe there's a paper about it you could go read, but it's really just like a simple, like most of the... [48:43] LLM papers are like simple ideas. Yeah. And sometimes there isn't. And it's like you're on the frontier. Yeah. Yeah. Exactly. Yeah. It's easy to be on the frontier. [48:51] And just to try stuff. And it often works, which is like such a good feeling. It's like, oh, I had this creative idea and I tried it and it works. Like, what's better than that? You know, that's... [48:58] That's pretty amazing. Totally. So I put out a tweet or an X post asking what I should ask you. And someone who submitted a question is Linus, who used to work on the AI team. Linus asked how working with AI has changed what you think [49:13] machines are good at versus what humans are good at. [49:18] Hmm. [49:18] Yeah, I mean, I think it's always shifting. That's one, like the only constant is change in that regard. [49:25] I'm pretty like AGI-pilled, so I think there will be nothing. [49:28] at some point. But right now there are many things that [49:32] humans are better at. [49:34] I think actually maybe a better question is what should humans do? [49:38] It will quickly be that you're actually just not better at anything. I feel like the most important thing there is making sure that the shape of the future is aligned with what we care about and we agree and consent to it and support it. [49:49] And a lot of it is around [49:51] Like. [49:52] us [49:53] orchestrating the tasks that the AIs are doing. What do you want them to be doing? That's a really [49:58] important question. And then once you have it going and doing that, making sure it is, it's doing the right thing and the way to observe it, check on it. And that feels like the key. So a way to boil that down is like,
[50:09] wanting, [50:11] is what humans are good at. I think so, yeah. It's like defining the high-level goal. And... [50:17] making sure that's what's being done. Like ultimately at the end of the day, so that's everything. And then over time, all the details of how it gets done will be whittled away and the AIs are doing all of it. [50:25] Which is great. I think that's a great world. You said you're AGI-pilled. What does that mean? [50:29] I would assign a pretty decent probability to AGI in the next 10 years and not necessarily requiring huge paradigm shifts. Do you have a personal definition of AGI, what that means to you? [50:41] Yeah, I like OpenAI's definition. It's just kind of do all the economically useful tasks. I think that's... [50:47] Pretty good. The critical question, I think, is around the intelligence explosion. So do you think it's possible? What's the probability of that? Like, how fast will it happen? And then the key question in there is like, [50:56] when will AIs be good enough at doing all the AI research tasks to significantly contribute to making the models better? [51:03] and [51:05] I'm pretty concerned about that. I find the case pretty compelling. You actually break down what an AI researcher actually does. [51:12] reading the literature and [51:15] talking to colleagues and gathering context and then you're coming up with like high level ideas and agendas and then maybe you're [51:20] designing experiments and then you're like writing code to run the experiments and then you're analyzing the experiments just writing more code and then you're looking at the results and deciding what to do next and like all these things feel like a sufficiently good [51:32] Foundation model could at least get you a lot of progress on all of them and eventually I don't see how you need to change the shape of it that much and then
[51:40] if you get that... [51:41] dynamic really kicking in, yeah, I think it'd be like a really big effect. Yeah. The thing that's interesting about that, going back to the OpenAI definition of AGI, I'm curious for your thoughts. The thing that strikes me about that idea, which is AGI is achieved when AI can do all the economically valuable tasks, is that AI changes what the economically valuable tasks are. [52:00] You're creating a moving target. In the same way, I think the Turing test turned out to be a moving target. In a world where humans are leveraging AI to do more economically valuable tasks than they could have done before, [52:12] it's self-fulfilling: AGI has not been achieved yet because we're changing the tasks. Yeah. We've seen that the past few years. People are always raising the bar. There's like memes about it. Oh, but it can't do this one particular thing that I really care about. It's not a static world. [52:26] I saw a tweet the other day. [52:27] I think just like [52:28] 68 years ago in South Korea, most people were farmers. [52:32] And then obviously, now hardly anyone is a farmer. It's been fully automated. [52:36] And the economy like 50x'd or something. That's pretty crazy. And I feel like we're headed into a similar dynamic where... [52:42] the jobs people do are going to completely change, [52:44] and there's going to be a bunch of new stuff that we do, and it's going to create 100x more economic value than all the economic value that we have now. AGI is happening in that curve somewhere, which is going to be pretty weird, I think. And I think a lot of that is going to come from the fact that [52:58] when you discover an economically useful task, with these models, if you buy more compute, you can just scale it up more, [53:03] which is a much more elastic way than you could with humans, where you have to like train up and hire and stuff like that. So... [53:08] I expect it to look pretty weird where, yeah, a bunch of
[53:12] classes of tasks are going to get [53:14] scaled up, and it's going to produce a lot of value. And then it's going to make the world... [53:17] It's going to make the shape of the economy look different, in a confusing way. [53:21] So in preparation for this interview, I asked Notion AI what I should ask you. [53:27] And one of the questions it asked that I thought was really interesting is, how do you see the role of human creativity and intuition change in a world with AI? I don't think I assign a special value to humans necessarily [53:40] being creative or intuitive. [53:43] side talking about this multiple times, but we were all surprised, I think, when DALL-E came out. It was like, oh, I thought it was going to be boring work, but it's actually like... [53:51] making beautiful images. It's like art. At least the technical part of art. Implementing the idea. Yeah, I feel like it's like a category error [53:58] to assign special value to any particular thing that a human can do [54:03] and that AI might not be good at now. [54:06] What really matters is more of the will of what you want to be done and your ability to [54:11] communicate that and observe it and [54:13] keep it on the road. That's what really matters. But then it's going to be better than us at being creative and being intuitive and these things. [54:19] How have you developed that will in yourself? [54:23] I think... [54:23] openness is important. So just openness to experience, especially in a world that's [54:29] changing as much as this. And I think being ambitious is important, especially with these crazy AIs, like [54:36] we should really ratchet up our ambition of [54:39] what we want [54:39] humanity [54:41] to do and
[54:43] not be so encumbered by previous failures when this new tech unlocks a lot of stuff. [54:48] That's pretty key. [54:49] Yeah. [54:49] I think, yeah, I'd be optimistic. [54:51] I love it. And I think that's probably as good as any place to end. I really appreciate you taking time to do this. I learned a lot. This is great. [55:21] knowledge bombs about ChatGPT. Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat. [55:31] craving for more. It's not just a show. It's a journey into the future with Dan Shipper as the captain of the spaceship. [55:38] So do yourself a favor. Hit like, smash subscribe, and strap in for the ride of your life. [55:44] And now, without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with you.