OpenAI DevDay 2025: The Infrastructure Play Nobody Saw Coming
- Sangamesh Gella
- Oct 7
- 6 min read
I spent Tuesday morning watching Sam Altman give developers something they didn't know they needed.
Not better models. Infrastructure.

What Actually Shipped
Four things dropped at DevDay. Each one changes the baseline for what you can build.
AgentKit – A complete toolkit for building AI agents. Visual builder, evaluation tools, and deployment infrastructure. Everything between "I have an idea" and "it's running in production."
ChatKit – Pre-built chat UI you can embed anywhere. Clean interface, accessible, and works on mobile: your brand, their rendering engine.
App SDK – Build complete applications inside ChatGPT. Not plugins. Not integrations. Apps that live where 800 million people already spend their time.
Enhanced Codex – Code generation that actually understands context well enough to be useful in production.
Then Altman dropped the stat: 800 million people use ChatGPT every week. Not "tried it once." Use it. Weekly.
That's not a product metric. That's distribution infrastructure.
The Shift Nobody's Talking About
For two years, every conversation about AI development has centred on models. Which one's smarter? Which one's cheaper? Which one hallucinates less? DevDay wasn't about models.
It was about OpenAI admitting that giving developers GPT-4 access and saying "good luck" isn't enough because every team that builds AI features ends up solving the same problems.
Agent orchestration. Conversation state. Evaluation. UI that doesn't look like garbage. Deployment that doesn't break at 2 am. AgentKit is OpenAI saying, "We built this infrastructure for ourselves. Here, you can use it too."
That's a different kind of offering.
What AgentKit Actually Changes
Before DevDay, building an AI agent meant writing everything yourself.
You'd pick a model (probably GPT-4). Then you'd build orchestration logic: how the agent decides what to do next, how it handles multi-step tasks, how it recovers from errors. Then conversation state management. Then evaluation frameworks, because you need to know if the thing actually works. Then deployment infrastructure. Then monitoring.
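To make "orchestration logic" concrete, here's a minimal sketch of the loop every team ends up hand-rolling: plan a step, execute it, retry on failure, re-plan with new context. Every name here is illustrative, not part of OpenAI's SDK.

```python
# Hand-rolled agent loop: the baseline plumbing AgentKit aims to replace.
# All names are hypothetical; this is a sketch, not OpenAI's API.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    history: list = field(default_factory=list)  # conversation/task state
    retries: int = 0

def run_agent(task, plan_step, execute_step, max_retries=2):
    """Drive a multi-step task: plan, execute, recover from errors."""
    state = AgentState()
    step = plan_step(task, state)           # decide what to do next
    while step is not None:
        try:
            result = execute_step(step)
            state.history.append((step, result))
            state.retries = 0
        except Exception:
            state.retries += 1              # crude error recovery
            if state.retries > max_retries:
                raise
            continue                        # retry the same step
        step = plan_step(task, state)       # re-plan with new context
    return state.history
```

Roughly thirty lines, and it still has no evaluation, no monitoring, and no deployment story. That's the gap between demo and production the article describes.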
Six weeks later, you'd have something that works in demo and breaks in production.
AgentKit removes all of that baseline work.
Agent Builder gives you a visual interface for designing agent logic. Define steps, set conditions, and handle branches. Instead of writing orchestration code, you're describing behaviour.
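"Describing behaviour" instead of writing orchestration code means the agent becomes data: steps, conditions, branches, interpreted by a runtime. Here's a toy version of that idea; the format is invented for illustration and is not Agent Builder's actual schema.

```python
# A tiny declarative agent: steps and branches as data, then interpreted.
# This mirrors the visual-builder idea; it is NOT Agent Builder's format.
AGENT = {
    "start": "classify",
    "steps": {
        "classify": {
            # route based on the user's message
            "run": lambda ctx: "refund" if "refund" in ctx["message"] else "other",
            "branch": {"refund": "refund_flow", "other": "handoff"},
        },
        "refund_flow": {"run": lambda ctx: "refund started", "branch": None},
        "handoff": {"run": lambda ctx: "routed to human", "branch": None},
    },
}

def interpret(agent, ctx):
    """Walk the step graph until a terminal step returns a result."""
    name = agent["start"]
    while name is not None:
        step = agent["steps"][name]
        result = step["run"](ctx)
        if step["branch"] is None:
            return result
        name = step["branch"][result]
```

A drag-and-drop canvas is just a friendlier editor for a structure like this: the boxes are steps, the arrows are branches.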
Evaluation tools that actually work. You can trace what your agent did step-by-step. Test individual components with datasets. Automatically optimise prompts. All the debugging infrastructure you'd eventually build anyway.
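The core of that debugging infrastructure is small: run the agent over a labelled dataset, record a per-case trace, report a pass rate. A bare-bones sketch, with an invented `evaluate` helper and a toy agent standing in for the real thing:

```python
# Minimal eval harness: dataset in, pass rate and step traces out.
# Names and the toy agent are illustrative, not OpenAI's eval API.
def evaluate(agent_fn, dataset):
    traces, passed = [], 0
    for case in dataset:
        output = agent_fn(case["input"])
        ok = case["check"](output)
        passed += ok
        traces.append({"input": case["input"], "output": output, "pass": ok})
    return passed / len(dataset), traces
```

The point isn't the ten lines; it's that once this exists, "does the agent work?" becomes a number you can watch regress instead of a vibe.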
ChatKit handles the UI. You don't need a frontend developer to make something that doesn't feel like a terminal window. Customise branding, plug it into your product, done.
OpenAI handles scaling, monitoring, and keeping the models running.
You write business logic. That's it.
The Agent Builder Moment
Here's what jumped out to me during the demo.
Altman called Agent Builder "Canva for agents." Visual interface, drag-and-drop, no code required if you don't want it.
That comparison isn't accidental.
Canva didn't make better design tools. They made design accessible to people who aren't designers. Agent Builder aims to achieve the same goal for AI agents.
Today, building agents requires technical expertise. You need to understand LLM prompting, API integration, error handling, and state management. It's a specialist skill.
If Agent Builder works the way they're positioning it, that changes. Product managers could prototype agents. Support teams could build their own automation. You don't need to be a machine learning engineer to ship AI features.
Maybe it works. Maybe it's overpromised. Either way, the bet they're making is clear: the bottleneck is no longer model capability. It's developer velocity.
The 800 Million Person Distribution Channel
Let's talk about the App SDK.
You can now build applications that live inside ChatGPT. Full apps. Not "hey ChatGPT, connect to my service and do X." Apps that users interact with without leaving the ChatGPT interface.
Think about what that means for customer acquisition.
Every app you build right now requires you to convince users to visit your website, create an account, and learn your interface. You're fighting for attention against every other app they use.
Or you build where 800 million people already are.
They're already logged in. They already trust the interface. Discovery happens through conversation: users describe what they need, and your app surfaces if it's relevant.
This isn't theoretical. This is how platform shifts happen. The iPhone didn't just make better mobile websites possible; it made apps the default distribution model for mobile content.
ChatGPT is doing the same thing for AI-native applications. Your competitors aren't just building better features. They're building where users already are.
For Developers: What Changes Tomorrow
If you're building AI features, three things just became table stakes:
Agents become the default interface. Not chat as a nice-to-have feature. Agents that can actually do things, complete tasks, make decisions, and handle complexity. AgentKit removes the infrastructure burden. You have no excuse anymore.
Evaluation becomes non-negotiable. You can't ship agents you can't measure. AgentKit's eval tools mean you're expected to know whether your agent works before customers discover it doesn't. The bar just went up.
Distribution through ChatGPT becomes viable. If your users might benefit from AI assistance, and you're not building a ChatGPT app, you're making a choice. Maybe it's the right choice for your product. But it's a choice, not a default.
For Businesses: The Strategic Question
Strip away the technical details. Here's what matters.
Where do your users go when they need help?
If they open your app, great. Build there. But if they're opening ChatGPT to figure out how to use your product, or to solve the problems your product is designed to solve, you've got a distribution problem masquerading as a feature problem.
App SDK means you can be where they already are. Not through a clunky "share to ChatGPT" button. Native presence in the interface they use daily.
What are you optimising for?
If you need tight control, a specific UI, complex workflows, and deep integration with your systems, keep building traditional apps. But if you need to be fast, experimental, and accessible to non-technical users? AgentKit gets you there in days, not months.
What's your moat?
If your competitive advantage is proprietary data or domain expertise, building AI agents that surface that knowledge through ChatGPT might be your distribution play. If your moat is user experience and brand, you need to control the interface. Neither answer is universal.
The Infrastructure Paradox
Here's the tension I'm sitting with.
OpenAI just made building AI agents dramatically easier. Which means more people will build them. Which means more agents competing for user attention. Which means the quality bar rises even as the technical bar falls.
AgentKit removes infrastructure complexity. Great. Now everyone has access to the same infrastructure. Your competitive advantage isn't "we figured out how to build agents." It's "we built agents that solve real problems better than anyone else."
That's a different game.
What Happens Next
This is early innings. We're maybe three weeks into understanding what's possible.
I'm watching:
How developers use Agent Builder. Does it actually let non-specialists build working agents? Or does it become another tool that only developers use, with a slightly better user experience?
What kinds of ChatGPT apps emerge? Are they utilities that could've been web apps? Or are they genuinely new categories of software that only make sense inside an AI interface?
Whether evaluation tools keep pace with agent complexity. Can we still measure whether an agent works as it grows more sophisticated? Or do we hit a point where "works" becomes subjective?
What businesses realise they need to rebuild from scratch. Some products will integrate ChatGPT as a feature. Others will realise their entire product should've been a ChatGPT app from the start.
The Bet OpenAI Is Making
All of this, AgentKit, ChatKit, App SDK, is a bet that the next generation of software lives inside AI interfaces, not alongside them.
Not "add AI to your product." Build your product as an AI-native experience.
Not "integrate with ChatGPT." Build inside ChatGPT.
Not "use AI for support or features." Build agents that are your product.
Maybe they're right. Maybe they're early. But watching 800 million people already change their behaviour, defaulting to ChatGPT for questions, problems, and workflows, it's hard to bet against distribution.
The Question I'm Sitting With
DevDay gave developers tools. Better, faster, easier tools. But easier tools don't make easier decisions. They make more options possible, which means more ways to be wrong.
You can now build AI agents in days. Should you? Where should they live? What problems should they solve? How do you measure if they're working?
Those aren't technical questions. They're strategic ones. And AgentKit doesn't answer them.
What I'm learning: the teams that win aren't the ones with better tools. They're the ones who choose deliberately, ship quickly, and adapt when users prove them wrong.
DevDay changed what's possible. It didn't change what's right for your product. That's still on you.
If you're building with these tools, experimenting with AgentKit, shipping ChatGPT apps, trying to figure out where agents fit in your product, I'd genuinely like to know what you're seeing.
Not the demo version. The 3 am debugging version.
What's working? What sounded good until users touched it? Where are the gaps between OpenAI's vision and your production reality?
I'm collecting notes partly because I'm curious. Partly because the most interesting problems always hide in the gap between announcement and implementation.
P.S. If you found this helpful, I write about Salesforce, AI tools, and productivity stuff that actually works: no fluff, no generic advice, just real experiences from the trenches. Head to my site and subscribe if you want more. Thanks for reading.