
Tackling The Demo-to-Production Gap In AI Agents: How Raunak Bhandari Built KiwiQ For Enterprise Reliability

Raunak Bhandari, a former Google AI lead, designed KiwiQ, a multi-agent orchestration platform built to support complex enterprise workflows.

Raunak Bhandari

AI agents today can write copy, generate research summaries and draft campaign strategies within minutes. Yet inside enterprises, the real challenge begins once these systems collide with the messy reality of marketing operations: long workflows, multiple stakeholders, approvals, and the need to deliver consistent outcomes week after week.

This gap between impressive demonstrations and reliable deployment has emerged as one of the biggest constraints for organisations experimenting with agent-based AI systems.

In practice, companies repeatedly encounter three structural challenges. Context and memory degrade as workflows stretch across multiple documents and steps. Human approvals interrupt execution and slow entire processes. And when failures occur, limited visibility into the workflow makes debugging difficult.

This “demo-to-production gap” is where many promising AI initiatives stall.

Building for the Production Gap

It is this operational problem that Raunak Bhandari, a former Google AI lead, set out to solve when he designed KiwiQ, a multi-agent orchestration platform built to support complex enterprise workflows.

Bhandari architected and built KiwiQ’s core technical infrastructure with the goal of enabling AI agents to operate reliably across long-running enterprise processes rather than isolated demo tasks.

The platform is already being used by enterprise customers and currently supports more than 200 production workflows across marketing operations.

Vipul Chaudhary, CTO of Lokal, one of KiwiQ’s early enterprise customers, said the platform addressed a failure pattern his team repeatedly encountered while experimenting with agentic workflows.

“We evaluated several orchestration approaches before deploying KiwiQ across our content workflows at Lokal. The problem we kept hitting was context degradation — agents would lose track of brand guidelines and audience preferences midway through a campaign. KiwiQ’s multi-tier memory architecture, particularly the semantic retrieval layer, solved what had been a persistent failure mode for us.”

By grounding the architecture in these operational realities, Bhandari approached AI orchestration less as a chatbot interface and more as production software capable of executing complex, repeatable workflows.

The Founder’s Focus on Enterprise AI Failures

Bhandari’s focus on reliability emerged from his experience building large-scale machine learning systems at Google, where he worked on several products that required AI to function reliably in production environments used by millions.

At Google Maps, Bhandari led development of graph-based machine learning techniques to detect coordinated fraud targeting business listings and reviews. The problem had resisted conventional contribution-level classifiers because coordinated fraud deliberately varied individual behavior — Bhandari's insight was to reframe detection at the graph level, identifying coordination patterns across networks of actors rather than flagging individual contributions. The system improved detection of malicious actors by 18% while reducing false positives by 41%.


He later worked on deep learning systems used in Google Discover’s ranking infrastructure, as well as research on improving factual accuracy in Google’s Knowledge Graph, contributing to gains of nearly 30%.

Bhandari was also involved in early work related to Gemini, focused on improving factual verification mechanisms within large language models.

However, these projects revealed a recurring lesson.

Many sophisticated AI models did not fail because of poor algorithms; they failed because the surrounding infrastructure could not reliably support them in real-world operational environments.

It was this experience, Bhandari says, that shaped his thinking when designing KiwiQ.

Rather than focusing purely on model intelligence, he chose to focus on the infrastructure required to make AI systems dependable in production workflows.

Why Orchestration Has Become the Hard Problem

While many AI tools today can generate content or perform discrete tasks, reliably managing multi-step workflows remains significantly more complex.


Marketing operations are rarely linear. A campaign brief may trigger research, drafting, brand checks, legal review, localisation for multiple channels and performance feedback loops, often with humans intervening at several stages.

“You can get an AI agent to do impressive things in a controlled environment,” Bhandari explained. “The moment you try to run it across real enterprise workflows with multiple dependencies and humans in the loop, it tends to break.”

This operational fragility is precisely what KiwiQ was designed to address.

Inside the Architecture Bhandari Built

To support reliable production workflows, Bhandari designed KiwiQ around a structured orchestration architecture rather than independent agents operating in isolation.

At the core of the system is TeamGraph, an orchestration engine built by Bhandari to coordinate specialised agents across complex workflows. The platform includes an API layer and software development kit intended to support what he describes as an “agent software development lifecycle”: building, testing, deploying and refining AI-driven processes.


Several architectural decisions reflect Bhandari’s focus on reliability.

Multi-tier memory for long workflows

Bhandari designed KiwiQ’s memory system using multiple persistent layers. These include session memory with compression, document-level storage with version history and a semantic retrieval layer combining vector search with knowledge graph structures.

The goal is to prevent context loss when workflows span large numbers of steps or documents.
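The article describes the three tiers but not their implementation, so the following is a minimal illustrative sketch, not KiwiQ’s actual code. All class names (`SessionMemory`, `DocumentStore`, `SemanticIndex`) are hypothetical, and the semantic layer is reduced to a toy token-overlap search standing in for vector search plus a knowledge graph:

```python
import zlib


class SessionMemory:
    """Tier 1: recent turns kept verbatim, older turns compressed."""

    def __init__(self, max_active=4):
        self.active = []       # recent turns, kept verbatim
        self.compressed = []   # older turns, zlib-compressed
        self.max_active = max_active

    def add(self, turn: str):
        self.active.append(turn)
        while len(self.active) > self.max_active:
            old = self.active.pop(0)
            self.compressed.append(zlib.compress(old.encode()))

    def recall_all(self):
        old = [zlib.decompress(b).decode() for b in self.compressed]
        return old + self.active


class DocumentStore:
    """Tier 2: document-level storage with version history."""

    def __init__(self):
        self.versions = {}  # doc_id -> list of revisions

    def save(self, doc_id: str, text: str):
        self.versions.setdefault(doc_id, []).append(text)

    def latest(self, doc_id: str):
        return self.versions[doc_id][-1]


class SemanticIndex:
    """Tier 3: toy retrieval by token overlap; a real system would
    combine vector search with knowledge graph structures."""

    def __init__(self):
        self.entries = []  # (token set, original text)

    def index(self, text: str):
        self.entries.append((set(text.lower().split()), text))

    def query(self, question: str, k=1):
        q = set(question.lower().split())
        ranked = sorted(self.entries, key=lambda e: len(q & e[0]), reverse=True)
        return [text for _, text in ranked[:k]]


# Assemble a workflow memory from all three tiers.
session = SessionMemory(max_active=2)
docs = DocumentStore()
index = SemanticIndex()

docs.save("brief", "Brand voice: playful, audience: developers")
index.index(docs.latest("brief"))
for turn in ["draft outline", "review outline", "write intro"]:
    session.add(turn)

print(index.query("what is the brand voice?"))
```

The point of the layering is that a late workflow step can still retrieve the campaign brief semantically even after the session buffer has compressed or evicted the turns that introduced it.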

Parallel human-in-the-loop execution

Instead of pausing entire workflows for approvals, Bhandari built a system that isolates only the step awaiting human input while allowing other tasks to continue executing.
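The pattern of parking only the approval-gated step can be sketched with standard `asyncio` primitives. This is an illustrative model of the idea, not KiwiQ’s implementation; step names and timings are invented:

```python
import asyncio


async def step(name, approval=None):
    """A workflow step; if it needs sign-off, only this task waits."""
    if approval is not None:
        await approval.wait()   # parked until a human approves
    await asyncio.sleep(0.01)   # simulate the actual work
    return f"{name} done"


async def run_workflow():
    approval = asyncio.Event()

    # Legal review waits for a human; drafting and localisation proceed.
    tasks = [
        asyncio.create_task(step("draft")),
        asyncio.create_task(step("localise")),
        asyncio.create_task(step("legal-review", approval)),
    ]

    # Simulate the human approving after other work has started.
    await asyncio.sleep(0.02)
    approval.set()

    return await asyncio.gather(*tasks)


results = asyncio.run(run_workflow())
print(results)
```

Because the gate is per-task rather than per-workflow, a slow reviewer delays one branch instead of stalling the whole pipeline.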

The architecture also includes an internal pattern called “Ask Oscar”, where the system first attempts to resolve questions automatically by searching previous context before escalating to a human reviewer.

Bhandari structured the system around a central coordinating agent called Oscar, which manages interactions among specialised agents responsible for research, strategy, content generation and quality control.

Rather than communicating directly with one another, agents interact through this orchestrator, reducing circular dependencies and coordination failures.
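The hub-and-spoke routing and the “Ask Oscar” resolve-before-escalating pattern can be sketched together. Again, this is a hypothetical illustration under assumed interfaces (`Agent`, `Orchestrator`, the context-matching rule), not the platform’s real design:

```python
class Agent:
    """Stand-in for a specialised agent (research, content, etc.)."""

    def __init__(self, name):
        self.name = name

    def run(self, task, context):
        # A real agent would call a model; here we just record the work.
        return f"{self.name}: handled '{task}'"


class Orchestrator:
    """Hub-and-spoke coordinator: agents never talk to each other
    directly, so there are no circular dependencies between them."""

    def __init__(self, agents):
        self.agents = agents
        self.context = {}  # shared record of prior task outputs

    def dispatch(self, role, task):
        result = self.agents[role].run(task, self.context)
        self.context[task] = result
        return result

    def ask(self, question, human):
        # "Ask Oscar"-style resolution: search prior context first,
        # escalate to a human reviewer only if nothing matches.
        for task, answer in self.context.items():
            if question in task or task in question:
                return answer
        return human(question)


oscar = Orchestrator({
    "research": Agent("research"),
    "content": Agent("content"),
})
oscar.dispatch("research", "audience preferences")
oscar.dispatch("content", "draft newsletter")

# Resolved from context, no human needed:
print(oscar.ask("audience preferences", human=lambda q: "HUMAN: " + q))
# Not in context, so it escalates:
print(oscar.ask("legal signoff", human=lambda q: "HUMAN: " + q))
```

The shared-context lookup is what lets the system answer repeat questions itself instead of interrupting a reviewer a second time.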


Observability and workflow replay

To diagnose failures, Bhandari built an event-level logging system that records every step in the workflow execution. Teams can trace failures to specific agent decisions and replay workflows from intermediate checkpoints.
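Event-level logging with checkpoint replay can be modelled in a few lines. The sketch below is an assumed simplification (the `WorkflowLog` class and step functions are invented): every step appends an event, and a rerun from a checkpoint reuses logged outputs for everything before it:

```python
class WorkflowLog:
    """Event-level log: each step records an event, and the workflow
    can be replayed from any intermediate checkpoint."""

    def __init__(self):
        self.events = []

    def record(self, step, agent, output):
        self.events.append({"step": step, "agent": agent, "output": output})

    def replay_from(self, checkpoint, steps):
        """Re-run only the steps at or after `checkpoint`, reusing
        logged outputs for everything before it."""
        state = [e["output"] for e in self.events if e["step"] < checkpoint]
        self.events = [e for e in self.events if e["step"] < checkpoint]
        for i, (agent, fn) in enumerate(steps[checkpoint:], start=checkpoint):
            out = fn(state)
            self.record(i, agent, out)
            state.append(out)
        return state


log = WorkflowLog()
steps = [
    ("research", lambda s: "findings"),
    ("draft", lambda s: "draft v1"),
    ("review", lambda s: f"reviewed {s[-1]}"),
]
# First run: execute everything from step 0.
log.replay_from(0, steps)
# Suppose step 2 failed: rerun from that checkpoint only.
final = log.replay_from(2, steps)
print(final)
```

The value for debugging is that the log pins a failure to a specific agent decision at a specific step, and the replay re-executes only from that point rather than restarting a long workflow from scratch.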

According to the company, the infrastructure can support roughly 100 concurrent workflows per server while maintaining crash recovery capabilities.

Customer Validation of the Architecture

For early users, the architecture has addressed operational challenges that previously made agent-based systems difficult to deploy in production.

Aswin Chandrasekaran, CTO of HeyBubba AI and another KiwiQ customer, said debugging long AI workflows had been one of the biggest operational barriers.

“Before KiwiQ, when a content AI workflow failed at step 15 of a 20-step process, we had almost no way to diagnose what went wrong. The observability and checkpoint replay capabilities meant we could trace failures to specific agent decisions and rerun from that point. That’s the difference between a tool you demo and a tool you actually ship with.”

Early Use Cases and Future Plans

KiwiQ’s early deployments focus primarily on marketing workflows such as research, campaign strategy development, content creation and distribution.

Bhandari believes these processes improve significantly when feedback loops are incorporated into the system itself.

While the platform’s first use cases focus on marketing operations, Bhandari believes similar orchestration challenges exist across many enterprise functions including legal review, financial analysis and compliance-heavy operational processes.

KiwiQ was founded by Raunak Bhandari and Anish Bharadwaj, a former Amazon product leader, and is currently part of Foundation Capital’s highly selective IIT Build accelerator.

What Comes Next

Bhandari plans to open-source parts of the platform later this year, including KiwiQ’s API layer, software development kit and a natural-language interface designed for non-technical users.

As AI orchestration frameworks continue to proliferate, the long-term differentiator may not be model intelligence alone — but whether these systems can run reliably in production environments.

For Bhandari, that reliability problem remains the central challenge of agentic AI.

“The hardest problem in AI agents today isn’t reasoning,” he said. “It’s making them dependable enough to operate inside real organisations.”
