Your "AI Agent" Is Probably Just a Chatbot. Here's the Test.
Most vendors calling their products "AI agents" are selling rebranded chatbots. Here's a technical framework for telling the difference in your next demo.
Your vendor just demoed their "agentic AI platform." It answered questions, generated a report, and called an API. They're asking for $400K and 18 months. Here's the question they're hoping you won't ask: can it do any of that without you watching?
Gartner estimates that only around 130 vendors out of thousands actually have real agentic capabilities. The rest are selling rebranded chatbots with better marketing decks. The industry has a name for this now: agent washing. And it's costing enterprises more than money.
The numbers paint a grim picture. 42% of companies scrapped most of their AI initiatives in 2025, up from 17% in 2024. Gartner projects that 40% or more of agentic AI projects will be canceled by 2027. The cost isn't the vendor contract. It's the 18 months you can't get back, the organizational trust you burned, and the competitors who used that same window to deploy the real thing.
Why Agent Washing Matters Now
Global enterprise AI spending will hit $2.52 trillion in 2026. That is not a typo. Trillions. And a staggering amount of it is being misdirected.
PwC found that 56% of CEOs report AI produced neither revenue growth nor cost reduction. Only 12% achieved both. Meanwhile, 61% of leaders feel more pressure to prove AI ROI than they did a year ago. The pressure is increasing while the results stay flat. That is a recipe for disillusionment at the executive level, and agent washing is accelerating the cycle.
Here's what's happening on the vendor side. When agentic AI emerged as a real capability in 2025, "agent" became the hottest label in enterprise software. Vendors who were selling chatbots rebranded them as agents overnight. RPA companies added an LLM wrapper and called it agentic automation. Basic workflow tools added a conversational interface and started pitching "agentic orchestration." The underlying technology didn't change. The pitch deck did.
The damage goes beyond wasted budget. When a CTO buys an "AI agent" that turns out to be a chatbot with a nicer UI, the fallout cascades. Engineering teams who integrated the tool lose confidence. Business stakeholders who championed the initiative lose credibility. The next legitimate AI proposal faces a wall of skepticism built from the last one's failure.
The real cost is opportunity cost. Every month your organization spends on an agent-washed product is a month your competitors with real agents use to compound their advantage. In a market moving this fast, 18 months of misdirected effort is not recoverable.
What Real AI Agents Actually Do
Not every AI system needs to be an agent. Chatbots are useful. RPA has its place. The problem is when vendors call those things agents and charge agent prices for chatbot capabilities.
Here are five technical criteria that separate real agents from everything else. Call it the Agent Litmus Test. A minimal code sketch of all five follows the list.
1. Autonomy and Proactive Action. A real agent acts without constant human prompting. It initiates tasks based on goals, not just instructions. A chatbot answers "What's the system status?" when you ask. An agent monitors system health continuously and investigates anomalies before you know they exist. The distinction is simple: does it wait for you, or does it work while you sleep?
2. Planning and Multi-Step Reasoning. A real agent breaks down complex goals into sub-tasks, sequences them, and adapts its strategy when something fails. Not executing a predetermined script. Actually reasoning about what to do next based on what just happened. Ask the vendor: if step three of five fails, what does the system do? If the answer is "it stops and alerts a human," that's a workflow tool, not an agent.
3. Persistent Memory and State. A real agent retains context across sessions. It builds knowledge over time. It doesn't start from scratch every conversation. This is one of the hardest problems in agent design, and most systems still get it wrong. If the system can't reference what it learned last Tuesday, you're talking to a chatbot with a fresh context window every time.
4. Tool Orchestration and System Integration. A real agent calls APIs, connects systems, and uses multiple tools in sequence. Critically, it decides which tool to use and when. Not just generating text. Actually taking action in the world. If the demo never leaves the chat interface, that's a red flag the size of a building.
5. Error Recovery and Self-Correction. A real agent recovers from failures without human intervention. When an API call fails, it tries an alternative approach. When data is missing, it figures out where to find it. When a plan isn't working, it replans. This is where most "agents" fall apart. They handle the happy path beautifully and collapse the moment something unexpected happens.
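To make the five criteria concrete, here's what they look like as a minimal code skeleton. This is an illustrative sketch, not any vendor's implementation: every name is hypothetical, the tools are stubs, and a production system would put an LLM behind the planner and a real datastore behind the memory. The shape is the point.

```python
import json
import time
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")


class Memory:
    """Criterion 3: state that survives across sessions."""

    def __init__(self) -> None:
        self.data = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}

    def remember(self, key: str, value) -> None:
        self.data[key] = value
        MEMORY_FILE.write_text(json.dumps(self.data))


# Criterion 4: a registry of tools the agent chooses between at runtime.
def check_health(service: str) -> dict:
    return {"service": service, "latency_ms": 420}   # stubbed monitoring API


def restart_service(service: str) -> dict:
    return {"service": service, "restarted": True}   # stubbed action API


def page_oncall(service: str) -> dict:
    return {"service": service, "paged": True}       # stubbed fallback


TOOLS = {
    "check_health": check_health,
    "restart_service": restart_service,
    "page_oncall": page_oncall,
}


def plan(goal: str) -> list[dict]:
    """Criterion 2: decompose a goal into ordered steps with fallbacks.
    Stubbed here; a real agent would generate the plan with an LLM."""
    return [
        {"tool": "check_health", "args": {"service": "billing"}},
        {"tool": "restart_service", "args": {"service": "billing"},
         "fallback": {"tool": "page_oncall", "args": {"service": "billing"}}},
    ]


def run_step(step: dict) -> dict:
    """Criterion 5: on failure, try the fallback instead of halting."""
    try:
        return TOOLS[step["tool"]](**step["args"])
    except Exception:
        fallback = step.get("fallback")
        if fallback is None:
            raise  # nothing left to try; surface the failure
        return TOOLS[fallback["tool"]](**fallback["args"])


def agent_loop(memory: Memory) -> None:
    """Criterion 1: triggered by a schedule or monitor, not a human prompt."""
    status = check_health("billing")
    if status["latency_ms"] > 300:  # anomaly: act without being asked
        for step in plan("restore billing latency"):
            result = run_step(step)
            memory.remember(f"{step['tool']}@{time.time()}", result)


if __name__ == "__main__":
    agent_loop(Memory())  # each run builds on what earlier runs recorded
```

Notice what's missing: a chat interface. The loop triggers itself, plans, acts through tools, records what it learned, and falls back when a step fails. That's the skeleton to look for underneath any demo.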
If the vendor demo requires a human to guide every step, you're not looking at an agent. You're looking at an expensive chatbot with a human copilot.
The Red Flags in Vendor Demos
Watch enough enterprise AI demos and certain patterns become unmistakable.
Vague "agentic" language without specifics. The vendor says "agentic" repeatedly but cannot explain what the system does autonomously. Press for specifics. "What decisions does the agent make without human input?" Watch how they respond.
Scripted demos only. The demo follows a perfect, rehearsed path. Ask them to go off-script. Ask them to change the input data. Ask them to introduce an error. A real agent handles variability. A scripted demo handles exactly one scenario.
Cannot explain decision-making. Ask why the system chose a particular action. If the answer is essentially "the AI figured it out" with no ability to show reasoning traces, you have no way to audit, debug, or trust the system in production. Black boxes don't survive enterprise governance reviews. (A sketch of what a usable trace looks like follows this list.)
No tool use. The entire demo is text generation. The system produces reports, summaries, and recommendations, but never actually does anything. Never calls an API. Never writes to a database. Never triggers a workflow. Text generation is valuable, but it's not agentic.
Deflection on hard questions. "That's on our roadmap." "Most customers don't need that." "We're working on it for Q3." These are vendor euphemisms for "we can't do that."
Selling basic automation as advanced autonomy. The vendor claims sophisticated agentic capabilities, but the demo shows basic chatbot interactions with minor automation. They're selling the vision while delivering the MVP. You're paying enterprise prices for a product that's still in development.
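For calibration, here's a hedged sketch of what a usable reasoning trace can look like. The schema is invented for illustration; the point is that every action carries its inputs, rationale, and outcome, so a human can reconstruct the decision after the fact.

```python
# A sketch of the auditable decision trace a real agent should emit.
# The schema is illustrative, not any product's actual trace format.
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEvent:
    action: str          # what the agent did
    rationale: str       # why it chose this over alternatives
    inputs: dict         # the evidence it considered
    outcome: str         # what happened as a result
    timestamp: float = field(default_factory=time.time)


trace: list[TraceEvent] = []

trace.append(TraceEvent(
    action="restart_service(billing)",
    rationale="latency 420ms exceeded 300ms threshold; restart resolved "
              "the same pattern in 3 prior incidents (memory lookup)",
    inputs={"latency_ms": 420, "threshold_ms": 300, "prior_incidents": 3},
    outcome="latency recovered to 85ms",
))

# Exportable for governance review: each decision stays auditable later.
print(json.dumps([asdict(event) for event in trace], indent=2))
```

If the vendor can't produce something equivalent, "the AI figured it out" is the only audit trail you'll ever get.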
Real vendors with real agents welcome hard questions. They'll kill an API mid-demo to show you error recovery. They'll show you failure logs from production deployments. They'll tell you exactly where the system still needs human oversight. Confidence in the product looks like transparency, not deflection.
The Questions to Ask in Your Next Demo
Bring these to your next vendor meeting. They force specifics in ways that agent washers cannot fake.
On autonomy: "Show me the system completing a multi-step task end-to-end without human intervention." "What's the longest sequence of actions it can perform independently?" "What percentage of tasks in a typical deployment run without human input?"
On error handling: "Kill this API connection mid-task. What happens?" "Show me failure logs from a real enterprise deployment." "What's the most common failure mode your customers encounter, and how does the system handle it?" (A do-it-yourself version of this test is sketched at the end of this section.)
On memory and context: "How does the system retain context between sessions?" "Show me it referencing information from a previous interaction." "How does it handle contradictions between new information and what it previously learned?" This connects directly to why enterprise context is your moat, not the model. An agent that can't integrate with your accumulated organizational knowledge isn't going to deliver lasting value.
On tool integration: "List every external system this agent can interact with in production today." "Show me the agent orchestrating multiple tools in sequence to complete a single objective." "How do I add a new tool or system integration?"
On deployment reality: "How many enterprise customers have this running in production, not pilots?" "What percentage of deployed use cases run autonomously versus human-in-the-loop?" "What's the most common reason enterprise deployments fail?"
These questions share a common feature: they demand evidence, not promises. If the vendor can answer them with live demonstrations, you're probably looking at a real agent. If they can't, you just saved yourself $400K and 18 months.
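If the vendor offers sandbox access, your team can run the error-handling question itself instead of taking the demo's word for it. Below is a sketch of a fault-injection probe. AgentClient is a hypothetical stand-in, written here so the example runs; against a real product you'd patch whatever SDK entry point it actually exposes. The injection pattern is the part that transfers.

```python
from unittest import mock


class AgentClient:
    """Hypothetical stand-in for a vendor SDK, so the sketch runs as-is.
    This stub happens to implement a fallback; a real probe tests
    whether the vendor's product actually does."""

    def call_tool(self, tool: str, **kwargs) -> dict:
        return {"tool": tool, "ok": True}

    def run_task(self, goal: str) -> str:
        try:
            self.call_tool("primary_api")
        except ConnectionError:
            self.call_tool("fallback_api")  # a real agent reroutes
            return "completed_with_fallback"
        return "completed"


def probe_error_recovery(agent: AgentClient) -> str:
    original = agent.call_tool  # keep the unpatched method for passthrough

    def flaky(tool: str, **kwargs) -> dict:
        if tool == "primary_api":
            raise ConnectionError("injected outage")  # kill the API mid-task
        return original(tool, **kwargs)

    # Inject the failure, then watch what the agent does with it.
    with mock.patch.object(agent, "call_tool", side_effect=flaky):
        return agent.run_task("reconcile yesterday's invoices")


if __name__ == "__main__":
    print(probe_error_recovery(AgentClient()))
    # A real agent reports a degraded-but-completed status.
    # An agent-washed product raises, stalls, or returns junk.
```

Ten minutes of this against a sandbox tells you more than an hour of scripted demo.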
Why This Matters Beyond the Budget
The $400K contract isn't the real cost. The real cost is what happens to your organization after it realizes you bought vaporware.
Engineering teams lose trust in AI initiatives. They spent months integrating a tool that didn't deliver. The next AI proposal gets met with eye rolls instead of enthusiasm. Leadership becomes skeptical of future AI investments. The board starts asking harder questions about ROI, not because AI doesn't work, but because the last bet didn't.
Meanwhile, competitors with real agents pull ahead. McKinsey found that 62% of organizations are experimenting with agents, but only 23% are scaling them. MIT reports that 95% of GenAI pilots fail to reach production scale. The gap between experimenting and scaling is where agent washing inflicts its worst damage. It keeps organizations stuck in the experimentation phase, burning cycles on tools that will never scale.
This connects to a broader thesis. Your enterprise context and data remain your moat. Real agents integrate with your context. They learn your processes, connect to your systems, and build on your accumulated organizational knowledge. Agent-washed products promise that integration is "coming in Q3." It never comes.
The organizations that will lead in the next three years are the ones deploying real agents now. Teams will increasingly be generated, not hired. But you can't generate effective agent teams with chatbots wearing agent costumes. Agent washing erodes organizational trust at the exact moment when real agentic capabilities are finally becoming possible. That's the cruelest part. The technology is ready. The market confusion is what's holding enterprises back.
The Clear-Headed Take
Most "AI agents" on the market are rebranded chatbots. That is the uncomfortable reality behind the industry's hottest buzzword. Real agents demonstrate autonomy, planning, persistent memory, tool orchestration, and error recovery. Most products demo zero to one of those five.
Use the Agent Litmus Test in your next vendor evaluation. Ask the hard questions. Demand live demonstrations, not slide decks. Watch what happens when you go off-script. The vendor's reaction to pressure tells you more than their product ever could.
Real AI agents exist. They're being deployed in production right now by enterprises that took the time to separate signal from noise. But the signal-to-noise ratio is brutal. Gartner says roughly 130 out of thousands of vendors have the real thing. That means the overwhelming majority of "agent" products you'll evaluate are agent-washed.
Next time a vendor says "agentic," ask them to define it against these five criteria. If they can't, you just saved yourself 18 months and a very expensive lesson in enterprise disappointment.
What red flags have you seen in your own vendor evaluations?