Nicholas

GPT 5.5 just did what no other model could

In this mini episode, I break down OpenAI’s new GPT 5.5 and GPT 5.5 Pro after weeks of early testing. I walk through three real jobs I threw at the model: building an app for me to teach my second grader more advanced subtraction concepts, tackling a tech debt problem in the ChatPRD codebase, and hacking into a proprietary Bluetooth pixel display that every other model had failed me on. My verdict: higher intelligence, better efficiency, and genuinely autonomous long-running loops that change what I think is worth tackling.

What you’ll learn:
How I think about GPT 5.5 Pro’s pricing vs. engineering time, and when I believe the “intelligence tax” is worth paying
Why I treat GPT 5.5 as a developer model first, and why I couldn’t find a consumer use case that justified its intelligence
The exact prompt pattern I use to unlock a long-running autonomous subagent loop
How I got a near-six-hour autonomous run to one-shot 98% of edge cases in a migration over millions of chat threads and drop my Sentry error rate to the floor
Why I’m now throwing GPT 5.5 at tech debt, flaky tests, and security backlogs first
How I combined a Bluetooth packet sniffer and GPT 5.5 to reverse-engineer a proprietary pixel speaker after Claude Code and GPT 5.4 both gave up
How I use the /personality command inside Codex to swap the default “baked potato” tone for something I actually enjoy working with

In this episode, I cover:
(00:00) Introduction to GPT 5.5 testing
(00:40) What is GPT 5.5 and how much does it cost?
(03:23) Testing GPT 5.5 in ChatGPT: the intelligence overhang problem

Published Apr 23, 2026
Uploaded Jun 12, 2026
File type: POD

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:30

[00:00] Welcome back to How I AI. I'm Clara Vaux, product leader and AI obsessive, here on a mission to help you build better with these new tools. [00:07] Today, I have a very special episode for you where I'm going to tell you everything I think about the new [00:13] GPT 5.5 model, which I've been able to test for the past couple of weeks. [00:18] Spoiler alert: it is a powerhouse, and I've been able to do things with this model, [00:23] especially around advanced coding, that I haven't been able to do before with [00:27] any other model on the market. [00:29] And I'm going to show you how it breaks my personal high-tech eval: [00:33] hacking into this little computer. [00:35] Let's get to it. So before I tell you what I built with GPT 5.5, let me tell you a little bit about the model itself. Today, OpenAI is releasing GPT 5.5 and GPT 5.5 Pro [00:51] into Codex and ChatGPT. They're not available in the API quite yet. [00:56] I've been testing this model for the past couple of weeks, and I will tell you [01:01] what OpenAI is saying is true. They're saying that it has a higher capacity for complex work [01:06] and is more efficient, including being more token efficient, at [01:10] getting that work done. So the whole idea with this model is that it's smarter and it's more efficient, so you're going to get more done. And that has really been [01:17] my experience. Now, I'm glad it's more efficient, because it is expensive. GPT 5.5 is $5 per million input tokens and $30 per million output tokens, and GPT 5.5 Pro,

1:30-3:02

[01:30] which has powered all the work I've been doing, is $30 per million input tokens and $180 per million output tokens. [01:38] This is a pricey one, but when I reflect on what I was able to achieve with this model in early testing, [01:45] I'm going to pay the intelligence tax, because I think what I was able to achieve is really important. [01:53] This is one of the things that I think about a lot [01:55] when I'm testing these new models or these new tools. [01:59] You know, everything has an ROI, and there can be an ROI in terms of speed: can I get the things done that I want to get done [02:06] faster? That's certainly been an accelerant from an AI tooling perspective, and something we've all experienced for the past couple of years. [02:13] But where GPT 5.5 really helps me is ambition. It has been able to do things that I literally have not been able to do before, for a couple of reasons. [02:23] One, its intelligence is higher, so it has solved problems that other models, and harnesses other than Codex, have really had a hard time with. The second thing I've experienced is that because the efficiency is higher, [02:35] I'm able to do more, faster, without losing context on what I'm working on, because it's happening really quickly, or because it's more autonomous, so I don't have to babysit it as much. So again, I'm getting more done. [02:47] So I do believe that what OpenAI is telling us is true. [02:52] But that's coming out of my own experience spending hours and hours and hours [02:57] with this model, throwing problems at it that other models have really had a hard time with, including

3:03-4:33

[03:03] GPT 5.5. [03:04] So, [03:05] let's talk about what I built. And folks, for the less technical people here, [03:11] one of the things I'm going to say about the model (I tested it a little bit in ChatGPT, but not a lot) [03:17] is that I don't know what to do with all this intelligence if you don't have complex problems to solve. [03:23] While I've tested it in ChatGPT in my personal account, which is what I got access to, [03:29] I don't have complex, high-intelligence [03:31] problems to solve in my personal account. So it was really hard for me to think of where I would use 5.5 or 5.5 Pro [03:42] in ChatGPT, simply because the problems I'm solving there aren't that hard. But I did try to solve problems there, so let's quickly talk about [03:50] how I used 5.5 in ChatGPT and what it gave me. It will give you an indication [03:56] of what I'm going to show you a little later. But again, [03:59] I think what the consumer, or even the everyday enterprise business user, is going to struggle with [04:06] when using ChatGPT with this model is how many problems they have that require [04:11] superintelligence. So again, I think this is going to be a model that developers and software engineers really love. And I'm really excited to see what OpenAI does in terms of unleashing [04:21] and packaging this intelligence into use cases that the quote-unquote everyday person can use. So that's a little bit of my [04:28] lecture on how much of an intelligence overhang we have, basically. [04:32] So,

4:33-6:04

[04:33] what did I ask GPT 5.5 to do in ChatGPT? A really simple thing. I'm teaching my second grader two-digit and three-digit subtraction. He's actually in first grade, but, you know, San Francisco, I'm trying to push him ahead. One of the ways I've been teaching him is by building little apps that help him understand subtraction with two and three digits and learn some [04:54] tactics to do it well. [04:57] So I asked it to build an app for me to teach my second grader more advanced subtraction concepts. [05:03] I haven't been super pleased with some of the vibe coding tools or Claude Code on this. [05:09] Nothing has built this exactly how I wanted, so I wanted to give 5.5 [05:14] a shot at it. First out of the gate, [05:17] it's a thinker: you can see here it thought for 17 minutes 27 seconds about this. You're going to have this experience with this model; it's going to be a theme of this mini episode. This thing will think. [05:28] It planned an app for advanced subtraction, built the code, all that kind of stuff. Now, here's my question: do we need 17 minutes of hyperintelligent thinking to build this app? [05:41] Probably not. If I weren't testing for the purpose of this podcast, [05:44] would I have waited 18 minutes for this app? [05:47] Probably not. So again, what are we going to do with all this intelligence? Is this the right form factor for, you know, a non-technical user to access it? I'm not 100% sure. [05:58] It built me an app here. You can see it includes many lessons, word problems, and read-aloud.
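The episode doesn't show the generated app's code. As a hedged illustration of the kind of "tactic" such an app might walk a child through, here is a minimal Python sketch of column subtraction with regrouping (borrowing). The function name `subtract_with_regrouping` and the wording of each step are hypothetical, not taken from the app.

```python
def subtract_with_regrouping(minuend: int, subtrahend: int) -> tuple[int, list[str]]:
    """Column subtraction, ones place first, borrowing from the next column when needed.

    Returns the difference plus a list of human-readable steps (hypothetical wording).
    """
    if minuend < 0 or subtrahend < 0 or subtrahend > minuend:
        raise ValueError("this sketch only handles 0 <= subtrahend <= minuend")
    places = ["ones", "tens", "hundreds", "thousands"]
    top = [int(d) for d in str(minuend)][::-1]        # digits, ones place first
    if len(top) > len(places):
        raise ValueError("this sketch only handles up to four digits")
    bottom = [int(d) for d in str(subtrahend)][::-1]
    bottom += [0] * (len(top) - len(bottom))          # pad the shorter number with zeros

    steps, diff_digits = [], []
    for i in range(len(top)):
        if top[i] < bottom[i]:
            # Regroup: take 1 from the next column, which is worth 10 in this one.
            # Because minuend >= subtrahend, a column to borrow from always exists.
            top[i + 1] -= 1
            top[i] += 10
            steps.append(f"{places[i]}: borrow 1 from the {places[i + 1]}, "
                         f"so this column becomes {top[i]}")
        diff_digits.append(top[i] - bottom[i])
        steps.append(f"{places[i]}: {top[i]} - {bottom[i]} = {diff_digits[-1]}")

    difference = int("".join(str(d) for d in reversed(diff_digits)))
    return difference, steps


result, steps = subtract_with_regrouping(52, 17)
print(result)  # 35
for step in steps:
    print(step)
```

Borrowing from a zero column (for example 300 - 1) cascades naturally here: the tens column briefly goes to -1, then borrows from the hundreds on its own turn.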

6:04-7:34

[06:04] It's... [06:05] fine. [06:06] It's fine. [06:07] It's fine. [06:08] It has different modules in it. The design leaves something to be desired, but again, I'm not really going to the GPT models for front end. [06:17] I really want them to solve my hardest technical problems. [06:21] So I would just say that in ChatGPT, I'm unsure yet, only because I'm not sure [06:27] what the average ChatGPT user is really trying to achieve and how much intelligence is required, even on the coding side. [06:36] I just wanted to start there: if you're in ChatGPT using 5.5, [06:41] let me know your hard intelligence problems so I can test them. For a basic "vibe code me a little simple app," it's fine. It's not great. It's not [06:50] any more impressive than [06:54] other things on the market, but it does a reasonable job. And the gist of 5.5 is that it's going to think a lot, [07:00] and it's going to give you this chain-of-thought reasoning here [07:04] to let you know how it's thinking and managing its own process. [07:08] Okay, so I'm going to put away ChatGPT. [07:11] It's fine. [07:12] Let's talk about using 5.5 Pro in Codex, and [07:18] you all, I love, I love her. I do. My initial reaction when I first started testing GPT 5.5 in Codex [07:27] was: [07:28] I am [07:29] cooking. What I mean by that is I was kicking off

7:34-9:08

[07:34] tons of tasks in parallel, [07:37] because the feedback loop was fast; you felt the efficiency right away. [07:41] I was knocking off very long-standing [07:44] tasks with [07:46] tons of subtasks underneath them, and I'll give examples of what those are. I was able to bite off a tech debt [07:52] problem in the ChatPRD codebase that [07:56] I have wanted to take care of for truly months. It has been plaguing me, [08:01] and GPT 5.5 blasted through it. So I want to show you a couple of those examples so you can understand what kinds of tasks GPT 5.5 plus Codex is really good at, [08:11] and why I think its higher intelligence, and the way it's configured to work autonomously and efficiently, are really beneficial for the software engineer. [08:20] The first thing I did, which I'm not going to show you for what will become very obvious reasons, is use OpenAI's Codex security product to run [08:30] a threat assessment and security scan on the ChatPRD codebase. And it was pretty good; we're pretty secure. But it did come up with some low-severity issues that we needed to remediate. [08:41] Instead of taking those one by one, [08:44] what I did is download the CSV of those issues, [08:47] upload it to Codex, and just say: can you please architecturally review these issues, group them if they're thematic, [08:53] propose a change, and then make those changes? [08:56] And I will say it just did it. It did it very well. We did human review on that, and we did code review on that, [09:02] and we were really happy with the quality of execution, but also with the fact that I can give it a list

9:08-10:39

[09:08] of generally associated, but not single-project, tasks, and it can execute on those well. The real validation of the quality of that output came when [09:18] we had our annual penetration test very quickly after that, [09:21] and our pen test came back [09:23] super clean. So [09:25] I would just say: if you have a triage list of technical debt, if you have a triage list of security issues, [09:34] maybe even front-end debt, [09:36] flaky tests (engineers, pay attention), [09:39] you can throw that list at GPT 5.5 and it will get that list done. So that's use case one, which I thought was really efficient and great. [09:46] Use case two, and I'm so disappointed it cleared how hard it worked on this project, but [09:52] I have, as I mentioned, this lingering tech debt in the ChatPRD codebase: we have millions of chats now in ChatPRD, and we were storing those chats in various legacy formats. [10:07] The model providers, both OpenAI and Anthropic, have changed the shape of their model responses over time. So, TL;DR for the folks who are less technical: [10:17] every model in the world has changed a little bit about how it returns data via API over the past three years. [10:22] We have a bunch of debt, data debt, around that, where we were storing legacy formats in our [10:29] database. [10:30] And because these legacy formats are AI calls, because they may or may not contain attachments, because they may or may not contain tools,

10:39-12:09

[10:39] it was very hard to build a clean, cohesive system [10:44] to backfill and sanitize that data into our go-forward data model. [10:49] I have just been slapping fix after fix and patch after patch [10:56] on this problem, because every time we patch it, we find another edge case. So this is [11:01] an example of a data migration problem with millions of rows, which might not sound [11:06] big to many people, but [11:08] is pretty significant to us in terms of the complexity of the data inside it: [11:13] functionally unstructured, lightly structured data with [11:16] tons of edge cases. [11:18] And [11:20] I finally was like, you know, GPT 5.5, take me away, and gave the model that problem. [11:27] And it executed [11:29] so well. It built, functionally in one shot, [11:33] a solution [11:34] that covered, I'm not kidding, 98% [11:39] of the edge cases that we had identified. So first of all, one-shot building a complex migration by pointing it to docs and libraries: [11:48] very, very good. That had been really hard for us to do, because the data was so complex and so unstructured, [11:54] before. The second thing, which I want to show you on the screen now, is that I needed [12:00] GPT 5.5 and Codex to validate that work. [12:04] And so [12:05] I pulled a production-like [12:08] set of examples

12:09-13:39

[12:09] into a test environment. [12:12] And I asked Codex: look, I need you to figure out a way to programmatically test every thread [12:18] that's in local (I pulled a local version of this production-like data), [12:24] post it to Anthropic [12:26] and OpenAI and any other provider that we're using. [12:29] I need you to make a scalable system for our team to do this programmatically, ideally through a CLI, [12:35] so that any agent can test any thread for these data issues. [12:39] And I've been saying this a lot to GPT 5.5: "I trust you." This is my prompt to GPT 5.5: "I trust you to make a call, figure out how to spawn a subagent to do this, test it and identify any issues, repair them, and get this ready for production. [12:54] Thank you," because I'm very polite. [12:57] This thing worked [12:59] for six hours. It was actually five hours and, like, 57 minutes. [13:04] Truly, it just banged its head against the wall for six [13:09] hours. And I did not have to do anything: zero prompts, zero follow-ups, zero steering. I think I had to approve one [13:18] script call or something for it to have access to run in its sandbox. [13:23] But otherwise, [13:24] it just went for [13:25] six hours. Everybody says, "Oh, I'm getting my [13:30] agent to run overnight." I personally had not seen it until GPT 5.5, in a very [13:34] constrained use case. And so [13:36] this thing will do long-running autonomous

13:40-15:11

[13:40] tasks that require a loop to check whether it's doing well [13:44] and moving things forward. [13:46] It ran for almost six hours, and then it [13:49] implemented the smoke test. [13:51] It tested all the example data, [13:54] and after this, after 2 million rows, we literally had [13:59] one edge case that was not caught. [14:02] So just think about that for a minute. [14:05] You know, we had 2 million rows [14:07] and one edge case, where before we were hitting edge case after edge case after edge case. Six hours of GPT 5.5. [14:15] And then we saw our error rate just hit the floor in our Sentry monitoring. [14:22] People say that AI coding is going to decrease quality because people are vibe coding. That is such an 18-months- or 12-months-ago narrative. [14:32] I think quality is going to go up. I've truly avoided this kind of problem because the intelligence was not there to do it autonomously. [14:41] My ability, and our engineering team's ability, to [14:46] break down the problem and spend the dedicated time hitting every edge case [14:52] in our synthetic data [14:53] was really hard. And, you know, every time you plug one hole, another one pops open. [14:59] Just being able to hand this to GPT 5.5 and Codex [15:03] has changed my life. So again, I am scared about how much this will cost me in [15:09] production with those tokens, but, like,

15:12-16:49

[15:12] it's cheaper than me, cheaper than my engineering team. [15:15] And it really did run for six hours. So I'm just like: throw this thing at your quality issues. [15:22] Throw this thing at your bug backlog. Throw this thing at a security [15:27] assessment and close the quality gaps, performance gaps, or security gaps [15:33] in your app. It does work [15:35] really, really, really well. [15:38] So that's my prime use case. If I didn't share anything else, [15:42] this would be enough. [15:44] It bit off the largest piece of tech debt in my app, [15:47] basically made my errors go to zero, [15:50] and did it all in six hours, autonomously, in a self-sustaining subagent loop. [15:56] I love you, GPT 5.5. [15:57] But there is a real eval, and I told you about it in the intro. [16:01] My real eval [16:03] is this thing. This is a Divoom Mini 2, a retro-PC-style Bluetooth speaker [16:10] and tiny screen. And I am not kidding, [16:15] I have been hacking on this thing [16:17] since [16:18] late January or February. I think I ordered it around Valentine's Day. [16:23] And my only goal [16:26] is to be able to display funny stuff on this screen. It comes with an out-of-the-box iPhone app, [16:31] so I can use this proprietary iPhone app to send [16:36] images to this thing, but I don't want that. I live in the terminal. I want to be able to do this programmatically. [16:41] This is proprietary code loaded on this device. I was very deep in Chinese-language

16:49-18:20

[16:49] repositories and documentation from Bluetooth hardware providers. I was in [16:55] deep, y'all. [16:58] First, I threw Claude Code at this [17:00] and said, can you figure this out? Claude Code could not figure it out, even with Opus. [17:05] I threw GPT 5.4 [17:08] at it. [17:09] It could not figure it out. [17:11] I cannot tell you how [17:13] crazy I went with this, but I'm going to try. [17:17] So, [17:18] this is a little device. You'd think you would be able to plug it in and just say, [17:22] "Dear Claude Code, tell me how this device works, make no mistakes." [17:26] No, that's not how it works. [17:28] It connects to your computer or to your phone via Bluetooth, so it is interacting with this app on your phone [17:36] through Bluetooth. [17:37] In the app, I can draw something, click send, and it will display here. [17:42] So I know that over Bluetooth I can change the display of this device, but we could not figure out how to encode that message. [17:52] What did I do? Well, this is a little peek. It has nothing to do with AI; it's a peek at how [17:59] "cuckoo bananas" your friend Claire is. [18:02] What I did is spend truly hours [18:06] downloading a Bluetooth logging profile on my phone for developer debugging. [18:13] I then hooked it up to [18:17] (sorry, I'm crazy) [18:18] hooked it up to a packet sniffer,

18:21-19:51

[18:21] so that when I was using the app here on my phone [18:24] and it sent an image to this computer, [18:27] it would log [18:29] and sniff the packets [18:30] and tell me what Bluetooth was sending to this little guy. [18:34] I threw these logs, and kind of all the information that I had, [18:39] at 5.5. Let me show you what happened. I'm going to get that repo up [18:45] really quickly [18:46] and show you my desperate prompting. I said: this thing is connected by Bluetooth. [18:51] Take what you know and please just do anything to figure out how to display on this. You have so much information. You should know how to do it. I believe in you. And guess what? This effing thing [19:04] did it. It did it. So, [19:07] my success, [19:08] my success measure here, [19:11] is that I was able to build a command-line tool [19:15] where I can run it in Terminal, press Enter, and, let's see, [19:21] did the benchmark hit? [19:24] Hello! It says hello. This is months and months of trying to hack into this stupid thing. It was [19:34] encoding and decoding bitmap files. It was crawling the web trying to find whether there was some secret SDK. Codex, you did the thing, and even better [19:45] than that, [19:46] it is now hooked up so that anytime I ask [19:50] Codex to do a thing,

19:52-21:25

[19:52] it will alert me on this. So let's give it [19:55] a little try live on the podcast, and then I will get you out of here. But I am telling you, [20:02] this, [20:02] hacking into a proprietary device, is my intelligence test now. [20:07] All right. [20:08] Let me share my screen really quickly and test whether this thing [20:13] works. I have my terminal up, and I am going to [20:18] go into Codex, [20:20] and I'm going to say something really simple. I'm going to say: [20:24] "What can you help me with?" [20:27] Okay. [20:28] I built into my Codex config [20:31] a notify hook [20:33] that should [20:34] do something on here [20:36] when it's time to be notified. So: [20:38] "What can you help me with?" Dear Codex, it's going to tell me. [20:42] And [20:43] let's see, it's done. Maybe I'm not paying attention to my computer. Let's see if it runs. It should make a noise. [20:51] "Your move." Well, "your move" without the E: "Your mauve." [20:56] It made a little beepy boop. You all, [20:58] this is changing [21:00] my life. So, [21:02] again, I did three assessments of GPT 5.5, and this is the one that impressed me most. I will share more about this on the blog. I might even do a little mini episode on this particular workflow, and I'll try to publish the code. [21:16] But you all, this was my delight moment. I screamed. My children were blown away; they have seen me slave over this thing. I was sending them messages, saying

21:25-23:02

[21:25] "Hey!" and then responding to their questions by just showing them the screen. [21:29] I am obsessed. So GPT 5.5 has hit my intelligence benchmark of: can you hack into this [21:36] Chinese [21:38] digital screen with proprietary Bluetooth transport mechanisms and [21:43] bitmap compression? And guess what? [21:46] 5.5 can. [21:48] All right, so that is a wrap for our quick review of GPT 5.5. TL;DR: [21:54] I love this thing. It is super smart, it is super efficient, and it will work on its own against complex problems, basically as hard as you ask it to. It has solved problems I have not been able to solve before. [22:07] The only thing I will leave you with is that it has what I call the baked potato personality that we've all come to know [22:16] and love from Codex. [22:18] It is a dull, dull dullard. But [22:22] I learned over the course of testing that if you do /personality [22:27] in Codex, you're able to change that to something a little friendlier. While some of my fellow early testers said it had too much of a Gen Z personality, I said I like to stay young: [22:39] give me that Gen Z GPT 5.5. I'll take it any day [22:43] over the paper-bag, [22:46] baked-potato personality that you get out of the box. Other than that, [22:49] it's my favorite senior software engineer, staff software engineer. [22:52] I'm going to go blow through a bunch of technical work, and I really love this model. So I can't wait to hear what you think. And if you figure out a high-intelligence test that works in ChatGPT,

23:02-23:33

[23:02] let me know. [23:03] Otherwise, enjoy coding, and I can't wait to see what you build. Thanks, y'all. Thanks so much for watching. If you enjoyed the show, please like and subscribe here on YouTube, or even better, leave us a comment with your thoughts. [23:16] You can also find this podcast on Apple Podcasts, Spotify, or your favorite podcast app. Please consider leaving us a rating and review, which will help others find the show. You can see all our episodes and learn more about the show at howiaipod.com. [23:33] See you next time.
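Clara's "intelligence tax" argument (01:17 to 01:45) is easy to make concrete. Below is a minimal Python sketch that prices a single run using the per-million-token rates quoted in the episode. Two assumptions apply. First, the model keys are illustrative labels, not API model IDs; the episode says the models weren't in the API yet. Second, the $30 input price for GPT 5.5 Pro is our reading of a garbled transcript line, picked because it keeps the same 1:6 input-to-output ratio as GPT 5.5 ($180 / 6). The token counts in the usage lines are made up for illustration.

```python
# Prices in USD per 1M tokens, as quoted in the episode.
# ASSUMPTION: the Pro input price ($30) is inferred from a garbled transcript line.
PRICES_PER_MILLION = {
    "gpt-5.5":     {"input": 5.00,  "output": 30.00},   # illustrative labels,
    "gpt-5.5-pro": {"input": 30.00, "output": 180.00},  # not API model IDs
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one run: each token count times its per-million price."""
    price = PRICES_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Hypothetical run: 2M input tokens, 500k output tokens.
for model in PRICES_PER_MILLION:
    print(model, run_cost(model, 2_000_000, 500_000))
# gpt-5.5 25.0
# gpt-5.5-pro 150.0
```

Comparing that figure with the loaded cost of the engineering hours the same work would take is the ROI check she describes; the hourly rate is yours to plug in.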

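Around 09:52 to 12:35, Clara describes normalizing millions of chat threads stored in legacy provider formats and then validating every thread. Neither ChatPRD's schema nor the model's migration code is shown, so what follows is only a minimal Python sketch under stated assumptions. It uses a hypothetical target model (`Message` holding typed `Part`s) and two legacy shapes that providers are known to have used. The first is OpenAI Chat Completions style: string `content`, a separate `tool_calls` list with JSON-string `arguments`, and `role: "tool"` replies. The second is Anthropic Messages style: content-block lists with `text`, `tool_use`, and `tool_result` blocks. The sketch reports anything it doesn't recognize as an edge case instead of guessing, and `validate_thread` (also a hypothetical name) collects every problem in a thread rather than stopping at the first.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Part:
    kind: str  # "text", "tool_call" or "tool_result" in this hypothetical model
    data: dict


@dataclass
class Message:
    role: str
    parts: list = field(default_factory=list)


class UnknownShape(ValueError):
    """A legacy shape this sketch does not recognize: an edge case to triage, not guess at."""


def _block_to_part(block) -> Part:
    # Content-block list shape, e.g. {"type": "text", "text": "..."}.
    btype = block.get("type") if isinstance(block, dict) else None
    if btype == "text":
        return Part("text", {"text": block.get("text", "")})
    if btype == "tool_use":
        return Part("tool_call", {"id": block.get("id"), "name": block.get("name"),
                                  "input": block.get("input", {})})
    if btype == "tool_result":
        return Part("tool_result", {"call_id": block.get("tool_use_id"),
                                    "content": block.get("content")})
    raise UnknownShape(f"unknown content block type: {btype!r}")


def normalize_message(raw: dict) -> Message:
    role = raw.get("role")
    if role not in {"system", "user", "assistant", "tool"}:
        raise UnknownShape(f"unexpected role: {role!r}")
    content = raw.get("content")
    parts = []
    if role == "tool":
        # Chat Completions tool reply: content plus the id of the call it answers.
        parts.append(Part("tool_result", {"call_id": raw.get("tool_call_id"), "content": content}))
    elif isinstance(content, str):
        parts.append(Part("text", {"text": content}))
    elif isinstance(content, list):
        parts.extend(_block_to_part(b) for b in content)
    elif content is not None:
        raise UnknownShape(f"unexpected content type: {type(content).__name__}")
    # Chat Completions tool calls sit beside content, with JSON-encoded arguments.
    for call in raw.get("tool_calls") or []:
        fn = call.get("function", {})
        parts.append(Part("tool_call", {"id": call.get("id"), "name": fn.get("name"),
                                        "input": json.loads(fn.get("arguments") or "{}")}))
    if not parts:
        raise UnknownShape("message has no content and no tool calls")
    return Message(role, parts)


def validate_thread(raw_messages: list) -> tuple:
    """Normalize every message; collect (index, problem) pairs instead of stopping at the first."""
    normalized, issues = [], []
    for i, raw in enumerate(raw_messages):
        try:
            normalized.append(normalize_message(raw))
        except (ValueError, AttributeError, TypeError) as exc:  # JSONDecodeError is a ValueError
            issues.append((i, str(exc)))
    return normalized, issues
```

A CLI like the one she asked Codex to build could loop `validate_thread` over exported threads and exit non-zero whenever issues come back; that wrapper is left out of this sketch.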