Designing Embedded AI Experiences Inside ChatGPT and Claude

Last October, I started leading design work for a new category of TurboTax experience: embedded AI applications running directly inside ChatGPT and Claude. The work began as a relatively small MVP ahead of tax season, with the first release launching in December. We followed with a larger V2 release in January, and then a major expansion in April alongside the Claude Connector launch. Over the course of those months, the product evolved from a lightweight embedded experience into something much deeper: a connected tax preparation workflow that allowed users to begin preparing their taxes almost entirely from inside AI platforms they were already using daily.

Users could connect accounts, go through personalized conversational tax interviews, upload and extract tax documents, generate dynamic filing checklists, and synchronize data directly back into the core TurboTax experience. It was one of the first times TurboTax had operated natively inside major consumer AI ecosystems at this level of depth, and the work ultimately went on to win a Webby Award. More importantly, though, it gave our team an unusually early look into what designing embedded AI applications inside systems like ChatGPT and Claude actually feels like in practice.

One of the most interesting aspects of the work was that we were not designing a traditional standalone product. We were designing an embedded application that had to exist coherently across multiple AI ecosystems simultaneously. Using the UI kits and platform guidance provided by OpenAI and Anthropic, we built what was effectively a centralized application layer for TurboTax powered by the Model Context Protocol (MCP). The core experience remained structurally similar regardless of where it launched, but through design tokens, platform-specific theming, and custom component layers, the application could adapt itself to feel native inside either ChatGPT or Claude while still running on the same underlying system architecture.
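To make the token-based theming idea more concrete, here is a minimal TypeScript sketch of the general pattern: one shared set of semantic tokens, small per-host overrides, and components that read only the resolved tokens. This is not the production TurboTax implementation, and it does not use any real OpenAI or Anthropic API. The host identifiers, token names, color values, and the renderChecklistCard helper are all hypothetical placeholders.

```typescript
// Hypothetical host identifiers; real platforms expose their own detection mechanisms.
type Host = "chatgpt" | "claude";

// Semantic design tokens shared by every component in the app.
interface ThemeTokens {
  surface: string;
  textPrimary: string;
  accent: string;
  radius: number; // corner radius in px
  fontFamily: string;
}

// Product-owned defaults (placeholder values).
const baseTokens: ThemeTokens = {
  surface: "#ffffff",
  textPrimary: "#1a1a1a",
  accent: "#236cff",
  radius: 8,
  fontFamily: "system-ui, sans-serif",
};

// Per-host overrides (placeholder values, not either platform's actual palette).
const hostOverrides: Record<Host, Partial<ThemeTokens>> = {
  chatgpt: { radius: 12 },
  claude: { surface: "#faf9f5", radius: 6, fontFamily: "Georgia, serif" },
};

// Host overrides win; anything not overridden falls back to the shared base.
function resolveTokens(host: Host): ThemeTokens {
  return { ...baseTokens, ...hostOverrides[host] };
}

// Minimal escaping so user-provided strings cannot inject markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// The component only knows about semantic tokens, so the same markup renders in either host.
function renderChecklistCard(title: string, items: string[], tokens: ThemeTokens): string {
  const style =
    `background:${tokens.surface};color:${tokens.textPrimary};` +
    `border-radius:${tokens.radius}px;font-family:${tokens.fontFamily}`;
  const list = items.map((item) => `<li>${escapeHtml(item)}</li>`).join("");
  return `<section style="${style}"><h3 style="color:${tokens.accent}">${escapeHtml(title)}</h3><ul>${list}</ul></section>`;
}
```

For example, renderChecklistCard("Documents to gather", ["W-2", "1099-INT"], resolveTokens("claude")) and the same call with resolveTokens("chatgpt") produce identical structure with different styling. Keeping the overrides small is the point of the design: the product logic stays in one place, and only the surface adapts to each host.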

In practice, it was one product with multiple deployments, multiple orchestration environments, and multiple interaction models depending on where the user entered the experience. The application also had to support bidirectional data synchronization between systems. Users were authorizing secure connections between TurboTax and the AI platforms themselves, allowing conversational workflows to persist information, extract documents, generate preparation states, and synchronize that information back into the primary TurboTax product.

At a high level, that sounds relatively straightforward. In reality, it introduced an entirely new category of product and UX challenges that I do not think the industry fully appreciates yet. Once you begin designing applications inside systems like ChatGPT and Claude, you are no longer fully designing the orchestration layer yourself. OpenAI and Anthropic own the orchestrator, and that changes almost everything about how product design behaves.

Traditional software assumes a relatively controlled environment. Product teams usually own the interface, the navigation model, the interaction sequencing, and most of the surrounding system behavior. Inside AI ecosystems, many of those assumptions no longer hold. The conversation itself becomes the navigation layer. The AI platform partially controls discovery, invocation, memory, rendering behavior, and context retention. Your application stops behaving like a standalone destination and instead becomes one capability among many inside a much larger intelligence environment.

That fundamentally changes the nature of the design problem. Users fluidly move between the native model, embedded applications, uploaded documents, conversational context, generated outputs, and external systems without necessarily perceiving hard boundaries between them. As a result, the work starts looking less like traditional screen design and more like designing orchestration systems. You are shaping continuity, conversational state transitions, interoperability, dynamic interfaces, and trust boundaries that exist across ecosystems you do not fully control.

One of the largest conceptual shifts for me personally was realizing how much the orchestrator itself becomes part of the user experience. In traditional software, if you carefully design a workflow, users generally experience that workflow consistently. Inside AI ecosystems, orchestration itself becomes probabilistic. The same user intent may surface differently depending on conversational history, memory state, model interpretation, invocation timing, or competing tools inside the ecosystem. Product teams are no longer designing fully deterministic flows. They are designing adaptive systems that cooperate with another intelligence layer operating above them.

This creates a very unusual dynamic because parts of the experience become emergent rather than explicitly authored. The platform determines how apps are surfaced, how tools are called, how memory behaves, how transitions occur between systems, and how much conversational continuity exists from one interaction to the next. Designing inside these ecosystems increasingly feels less like designing software and more like designing protocols between multiple layers of intelligence.

There are also surprisingly concrete UX limitations that emerge from this model. In ChatGPT, for example, product teams do not fully control how the canvas behaves once a user opens it. If there is critical information you always want visible up front, there may be no deterministic way to guarantee it, because the platform ultimately controls how the canvas is rendered and expanded. In Claude, the now-familiar “human in the loop” approval cards are similarly controlled by the orchestrator. As a product team, you do not fully own how or when those interaction patterns appear. As a result, there are moments where you cannot deterministically control how users answer certain questions or progress through sensitive workflows, even when those workflows are deeply connected to your application logic.

Those constraints fundamentally change how you think about product design. A large part of the work becomes designing around platform restrictions, orchestration constraints, and interaction systems you do not entirely own. Instead of fully controlling experiences, you are often designing resilient systems that can adapt to different orchestration behaviors while still maintaining continuity and trust.

At the same time, one of the biggest assumptions I changed my mind about during this work was that conversational interfaces would replace traditional interfaces outright. Conversation is incredibly effective for onboarding, ambiguity reduction, contextual intake, organization, and guidance. Users naturally prefer conversational interaction when they are uncertain, unfamiliar with a workflow, or trying to navigate complexity. But once workflows become denser, more stateful, more document-heavy, or more verification-oriented, users begin demanding structure again.

That does not mean conversation failed. It simply means human cognition still benefits from visibility, side-by-side review, structured confirmation, previews, persistent state, and auditability. One of the strongest patterns we observed was that users loved conversational intake but still wanted highly structured verification before committing actions, especially in workflows involving money, legal implications, identity, or irreversible outcomes. The future is probably not “everything becomes chat.” It seems much more likely that conversational orchestration will coexist with interfaces that materialize dynamically around the conversation itself, depending on context and intent.

Another thing that became immediately obvious during research was how quickly embedded AI systems change user expectations around continuity. As soon as the system begins remembering context, organizing information, understanding intent, and reducing friction, users start expecting that continuity everywhere. The moment the system loses context or breaks continuity, the experience suddenly feels fragmented. Not necessarily because the technology is broken, but because the user’s mental model has already shifted from “I’m using tools” to “I’m operating inside an intelligent environment.”

That transition happens remarkably fast. One of the biggest UX challenges we encountered was not whether users liked the AI interactions themselves. Most did. The challenge was what happened when continuity stopped. Users expected the intelligence layer to persist across onboarding, uploads, interviews, transitions, filing workflows, and product boundaries. From a technical perspective, those boundaries are understandable. From a user perspective, they increasingly feel artificial because the intelligence layer creates an expectation of seamlessness that traditional systems were never designed to support.

This also fundamentally changes how UX operates around trust and confidence. Traditional software is deterministic. Embedded AI systems are probabilistic. Users are constantly trying to understand what the AI knows, what it inferred, what was verified, what system is authoritative, how confident the output is, and who is accountable if something goes wrong. Those questions become especially important in workflows involving finance, healthcare, legal systems, or identity. The challenge becomes less about designing interactions and more about designing confidence calibration. Users need to understand uncertainty, provenance, verification, accountability, and the boundaries between systems that may all appear unified from the outside.
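One way to make those questions (what the AI knows, what it inferred, what was verified) explicit in an interface is to attach provenance to every value and let the UI derive its labels and its commit gate from that. The TypeScript sketch below illustrates the idea under stated assumptions: the Source categories, badge labels, the 0.9 confidence threshold, and the badgeFor and canCommit helpers are all hypothetical, not a description of how the TurboTax apps actually model data.

```typescript
// Hypothetical provenance categories for a single piece of tax data.
type Source = "user_stated" | "ai_inferred" | "document_extracted" | "system_of_record";

interface FieldValue {
  name: string;
  value: string;
  source: Source;
  confidence?: number; // 0..1, only meaningful for extracted values
}

// Labels the UI can show next to each value (placeholder copy).
type Badge = "Verified" | "From your document" | "You told us" | "Please confirm";

// Map provenance and confidence to a user-facing label.
function badgeFor(field: FieldValue, threshold = 0.9): Badge {
  if (field.source === "system_of_record") return "Verified";
  if (field.source === "user_stated") return "You told us";
  if (field.source === "document_extracted" && (field.confidence ?? 0) >= threshold) {
    return "From your document";
  }
  // AI inferences and low-confidence extractions always ask the user.
  return "Please confirm";
}

// Block committing until every value that needs confirmation has been confirmed.
function canCommit(fields: FieldValue[], confirmed: Set<string>): boolean {
  return fields.every((f) => badgeFor(f) !== "Please confirm" || confirmed.has(f.name));
}
```

The design choice here is that an inferred value is never silently treated like a verified one: the same provenance record drives both what the user sees and whether the workflow can move forward, which keeps confidence calibration from depending on copywriting alone.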

Another clear lesson from the work was how much the industry currently overestimates open-ended prompting. Blank prompt boxes assume users already possess vocabulary, confidence, process understanding, domain knowledge, and awareness of what the system is capable of doing. Many users do not. Especially in high-complexity workflows, what consistently performed better was structured guidance, contextual next steps, conversational scaffolding, progressive disclosure, intelligent suggestions, and dynamically generated interfaces around user intent. Good AI UX often reduces the amount of prompting required rather than increasing it.

After spending the last year working in this space, I genuinely believe embedded AI application design is becoming its own category. It sits somewhere between systems design, conversational UX, orchestration design, platform design, and traditional product design, but it is not fully any one of them. The work increasingly involves orchestrating intelligence, managing probabilistic systems, designing continuity, balancing automation with oversight, shaping trust boundaries, and coordinating dynamic interfaces across ecosystems you do not fully control.

What makes this moment particularly exciting is that very few established patterns exist yet. Most teams are still discovering these interaction models in real time. It feels far less like optimizing mature UX conventions and much more like helping define an entirely new computing paradigm while it is still forming.