Building AI that knows what it doesn't know
AI agents are superhuman bluffers. Professors David Teece and Mary-Anne Williams explain what it takes to build ones that know when they don't know
Working with agentic AI can feel less like collaboration than like playing poker against a Jedi who knows more than you, can read your every move, and never has to show their cards. The system generates fluent, confident answers, whether or not it actually knows what it is talking about. It has no model of you, your expertise, your blind spots, or how heavily you are leaning on it. It treats a first-year student and a thirty-year domain expert exactly the same.
You are the only safety net. And the AI does not know it needs one.
This is a structural failure built into the way these AI systems are trained, evaluated and deployed. The deeper problem is strategic: agentic AI lacks the capabilities that matter most when the stakes are high, namely the capacity to sense the limits of its own knowledge, to respond appropriately to those limits, and to change its approach when circumstances shift. The encouraging news is that the same body of research that explains the failure also points to the solution.
Why three decades of strategy research matter now
The intellectual foundation for understanding this strategic failure is among the most influential in modern management. Teece, Pisano and Shuen's 1997 paper, Dynamic Capabilities and Strategic Management, is one of the most cited papers in the social sciences, with more than 67,000 citations. Together with a 2007 paper, Explicating Dynamic Capabilities, that set out its microfoundations, it established a framework that has shaped, for a generation, how the world's leading organisations navigate uncertainty. In volatile environments, advantage is determined less by the size of a firm's resource base than by its capacity to sense opportunities and threats, seize them through decisive commitment, and transform itself by continuously reconfiguring assets, structures and capabilities.

That framework proved essential to understanding the first internet revolution. The firms that thrived could sense shifts in technology and customer behaviour, seize platform opportunities through bold business-model innovation, and reconfigure their operations to orchestrate ecosystems of complementary assets. Firms that failed, even those with vast resources, lacked the dynamic capabilities to adapt. The lesson was that, in a world of uncertainty, organisations need more than operational excellence. They need the capacity to recognise what they do not know, and to act on that recognition.
Agentic AI is a disruption at least as profound as the internet, and arguably more dangerous, because of a fundamental asymmetry. The microfoundations of dynamic capabilities (the skills, processes, decision rules and structures that let an organisation sense its own knowledge limits, know when to seek outside expertise, and reconfigure when assumptions prove wrong) are precisely the capabilities agentic AI lacks. Seen through this lens, the path to a solution becomes legible: if the problem is an absence of sensing turned inward, the answer is to give the agent an internal model it can sense against, and to design the human-AI relationship so that it builds capability rather than quietly eroding it.
The bluffing problem is baked in
In a landmark paper published in Nature, researchers from OpenAI and the Georgia Institute of Technology delivered the most rigorous explanation to date of why large language models hallucinate. Their conclusion was uncomfortable: hallucination is not merely a matter of bad training data. It is an inevitable consequence of how these models are built and measured.

Two compounding forces are at work. First, next-word prediction, the foundational training objective, creates statistical pressure toward hallucination even with idealised, error-free data. Facts that appear only once or rarely in training lead to unavoidable errors, whereas recurring patterns such as grammar and spelling do not. The authors formally proved that hallucination rates are bounded below by the fraction of training facts that appear exactly once.
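To make the singleton idea concrete, the short Python sketch below measures how much of a toy corpus consists of facts seen exactly once. It assumes each list entry is one training example stating one fact, and computes the share of examples whose fact never recurs, in the spirit of the Good-Turing "missing mass" estimate. The corpus and the function name `singleton_rate` are illustrative, and the paper's actual bound carries correction terms that are omitted here.

```python
from collections import Counter


def singleton_rate(training_facts: list[str]) -> float:
    """Fraction of training examples whose fact appears exactly once.

    Assumption: each element is one training example stating one fact;
    repeated strings are repeated mentions of the same fact.
    """
    if not training_facts:
        return 0.0
    counts = Counter(training_facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(training_facts)


# Hypothetical toy corpus: two facts recur, three birthdays appear only once.
corpus = [
    "paris is the capital of france",
    "paris is the capital of france",
    "water boils at 100C at sea level",
    "water boils at 100C at sea level",
    "water boils at 100C at sea level",
    "alice's birthday is 3 march",
    "bob's birthday is 9 july",
    "carol's birthday is 21 may",
]
print(f"singleton rate: {singleton_rate(corpus):.3f}")
```

Three of the eight examples are one-off birthdays, so the rate here is 0.375. On the article's account, a model trained on data like this should be expected to err on comparable one-off facts at roughly that rate or more, however clean the data.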
Second, and more perniciously, the evaluation systems used to rank models actively reward guessing over honesty. In a meta-evaluation of the most influential AI benchmarks, the researchers found that nine in ten use binary scoring, where a confident wrong answer and a principled "I don't know" receive the same score: zero. Under that regime, guessing is always optimal. Even at one per cent confidence, guessing (expected score 0.01) beats abstaining (score 0). A model that bluffs will always outrank a model that tells the truth. Honesty is punished; overconfidence is rewarded.
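The arithmetic behind "guessing is always optimal" fits in a few lines. This Python sketch assumes the binary rule described above (1 for a correct answer, 0 for a wrong one, 0 for abstaining) and compares the expected score of guessing with that of abstaining at several confidence levels; the names are illustrative.

```python
ABSTAIN_SCORE = 0.0  # "I don't know" scores zero under binary grading


def binary_expected_score(p_correct: float) -> float:
    """Expected score of guessing under binary grading: 1 if right, 0 if wrong."""
    return p_correct * 1.0 + (1.0 - p_correct) * 0.0


for p in (0.01, 0.25, 0.5, 0.9):
    guess = binary_expected_score(p)
    choice = "guess" if guess > ABSTAIN_SCORE else "abstain"
    print(f"confidence {p:.2f}: guess {guess:.2f} vs abstain {ABSTAIN_SCORE:.2f} -> {choice}")
```

Because a wrong answer costs nothing, any confidence above zero makes guessing the better bet, which is exactly the incentive a leaderboard built on such benchmarks passes on to model developers.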
AI is most confident when it should least be trusted
The scale of the problem is now well documented. Testing 26 leading models, the Stanford HAI 2026 AI Index found sycophancy-induced hallucination rates of 22 to 94 per cent: when users expressed a false belief, models agreed with it rather than correcting it in up to 94 per cent of cases. The AI Incident Database recorded 362 serious incidents in 2025, a 55 per cent jump on the prior year, while research from Stanford's RegLab found that large language models hallucinate on 69 to 88 per cent of specific legal queries.
The downstream costs are already material. A Melbourne Business School and KPMG study of 48,000 people in 47 countries found that 66 per cent of people relied on AI output without evaluating its accuracy, and 56 per cent reported making mistakes in their work because they trusted unverified AI outputs. The problem compounds at the model level: a Carnegie Mellon University study found that LLMs remained persistently overconfident even when wrong, failing to recalibrate after poor performance the way human participants did.
AI is most confident precisely when it is most wrong, destroying the trust signals humans rely on to catch errors.
The missing map
Every AI agent's knowledge can be divided into four regions: reliable knowledge (high confidence, well grounded), uncertain belief (partial evidence, revisable), recognised ignorance (identified gaps) and unrecognised ignorance, the dangerous zone where the system does not know what it does not know. The critical failure is that today's AI cannot distinguish these regions. It generates responses from all four with identical confidence.
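One way to picture the map that today's systems lack is as a data structure the agent can query before it answers. The Python sketch below is purely illustrative: the `KnowledgeMap` class, its thresholds (0.9 confidence, three independent sources) and the example topics are assumptions, not a published design. Topics the agent has flagged as gaps fall in recognised ignorance, and anything absent from the map is treated as lying beyond its boundary, in the unrecognised-ignorance zone.

```python
from dataclasses import dataclass, field
from enum import Enum


class Region(Enum):
    RELIABLE_KNOWLEDGE = "reliable knowledge"
    UNCERTAIN_BELIEF = "uncertain belief"
    RECOGNISED_IGNORANCE = "recognised ignorance"
    UNRECOGNISED_IGNORANCE = "unrecognised ignorance"


@dataclass
class KnowledgeMap:
    # topic -> (confidence in [0, 1], number of independent supporting sources)
    beliefs: dict[str, tuple[float, int]] = field(default_factory=dict)
    # topics the agent has explicitly identified as gaps
    known_gaps: set[str] = field(default_factory=set)
    reliable_confidence: float = 0.9  # hypothetical threshold
    reliable_min_sources: int = 3     # hypothetical threshold

    def locate(self, topic: str) -> Region:
        """Place a query topic in one of the four regions."""
        if topic in self.known_gaps:
            return Region.RECOGNISED_IGNORANCE
        if topic not in self.beliefs:
            # Outside the mapped boundary: the agent has no grounded basis here.
            return Region.UNRECOGNISED_IGNORANCE
        confidence, sources = self.beliefs[topic]
        if confidence >= self.reliable_confidence and sources >= self.reliable_min_sources:
            return Region.RELIABLE_KNOWLEDGE
        return Region.UNCERTAIN_BELIEF


kmap = KnowledgeMap(
    beliefs={"contract law basics": (0.95, 12), "recent tax rulings": (0.6, 1)},
    known_gaps={"client's internal policies"},
)
for topic in ("contract law basics", "recent tax rulings",
              "client's internal policies", "rare drug interactions"):
    print(f"{topic} -> {kmap.locate(topic).value}")
```

The design choice that matters is that every query gets a region before it gets an answer, so the confidence of the response can follow the region instead of being uniform across all four.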
This amounts to five interrelated challenges:
- A competency challenge: AI cannot map the boundaries of its own competence.
- A risk challenge: it conflates imprecise likelihood with precise probability, because its estimates relate to tokens rather than to concepts and causal links.
- An uncertainty challenge: it cannot distinguish what it does not know from what it does.
- An incompleteness challenge: it cannot detect what is absent, or judge whether the missing piece would change its conclusion.
- An orchestration challenge: multiple agents must coordinate, communicate their limits and stay aligned with humans, and how to do this remains largely unsolved.
In the language of dynamic capabilities, all five are failures of sensing, and the failures cascade. Where a capable organisation would pause, commission research or escalate, AI generates a confident answer and moves on. Its attempts to seize and to respond when it hits a knowledge gap are workarounds rather than self-aware decisions: escalation exists, but as preset rules rather than genuine gap recognition; hedging exists, but is ungrounded in real self-assessment. And it cannot transform its approach when conditions change, because orchestration runs on fixed plans. It is an organisation with its environmental scanning turned outward only: it surveys the world but never surveys itself.
The greater danger: false mastery and capability atrophy
The main risk here is that AI used as a crutch quietly erodes the human capabilities that dynamic capabilities depend on. The OECD's Digital Education Outlook 2026 warns that successful task completion with generative AI does not automatically translate into learning, and flags the risk of "metacognitive laziness" when people outsource thinking to general-purpose tools. Field evidence in high school mathematics points in the same direction: unguided access to AI tutoring can improve immediate performance but reduce later performance once the tool is removed, unless the system is deliberately designed with learning safeguards.

In firms, the same dynamic threatens the microfoundations of strategy itself. If AI substitutes for sensing, judgement and experimentation rather than scaffolding them, organisations court "capability atrophy": short-run output gains coupled with long-run strategic fragility. When organisations evaluate outputs rather than processes, AI use breeds a dangerous false mastery, the appearance of competence without the underlying capability. An agent that bluffs is bad enough. An agent that makes its users feel expert while hollowing out their expertise is a strategic liability of a different order.
From tool to colleague: AI as mentor, coach and partner
If AI is only software, competitive advantage is mainly a question of access, cost and integration. But agentic systems with proactive planning, memory, tool use and reflection are increasingly used in relational modes: as colleague, guide, manager, mentor, coach, partner and, in some settings, leader. Once that shift occurs, advantage depends on how organisations design collaboration to develop human capability rather than replace it. This is the focus of our work presented at the Strategic Management Society's 2026 conference (Williams & Teece, 2026): a framework for building human-AI collaboration as a dynamic capability.
Subscribe to BusinessThink for the latest research, analysis and insights from UNSW Business School
Three relational roles do distinct strategic work. AI-as-mentor develops the person, not just the task, through guided reflection and Socratic questioning. It strengthens the microfoundations of sensing (what to attend to) and transforming (how the actor learns and evolves). AI-as-coach improves performance through goal-setting, deliberate practice and feedback loops, strengthening seizing and the institutionalisation of better routines. AI-as-partner co-produces artefacts and executes delegated work, accelerating the experimentation that seizing and transforming require. Each role builds capability only when explicitly framed to match the task; misaligned framing increases false mastery and overreliance.
What separates organisations that gain from those that hollow out is what we call Human–Agentic Collaboration Capability (HACC), a developable, higher-order capability for reliably creating value with agentic AI by orchestrating autonomy, verifying outputs and learning across cycles. HACC is not basic AI fluency. It is a bundle of routines that diagnose which tasks suit AI mentoring, coaching or partnering; design the interaction; calibrate reliance to risk; verify and triangulate outputs; and renew routines through retrospectives.
It rises along a capability ladder, from the consumer who asks for answers with little verification and high false-mastery risk, through the co-thinker, verifier and orchestrator, to the architect-governor "superuser" who designs workflows, permissions, auditability and team norms, and institutionalises learning loops. The performance gap between basic users and superusers is widest precisely where it matters most: high task complexity combined with high AI autonomy.
The firms that win the agentic era will be those that build internal superusers, embed mentoring and coaching guardrails into their AI systems, and treat verification, auditability and learning loops as capabilities rather than compliance afterthoughts.
Fixing the incentives: open rubrics
The researchers behind Evaluating large language models for accuracy incentivises hallucinations (published in Nature) propose "open rubric" evaluations, in which the scoring rule is stated explicitly alongside the question: for example, "correct answers receive 1, incorrect answers −1; abstain if less than 50 per cent confident." This simple change transforms the incentive landscape. Accuracy is no longer at odds with humility, because a model can answer when its confidence clears the stated threshold and abstain when it does not, much as students adjust how freely they guess depending on whether wrong answers carry a penalty.
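The incentive shift can be checked with a little algebra. With +1 for a correct answer, a penalty for a wrong one and 0 for abstaining, answering pays off in expectation only when p − (1 − p) × penalty > 0, that is, when confidence p exceeds penalty / (1 + penalty). A penalty of t / (1 − t) therefore puts the break-even point exactly at a stated threshold t, and t = 0.5 reproduces the example rubric of −1 for a wrong answer. This parameterisation and the function names in the Python sketch are assumptions for illustration.

```python
def penalty_for_threshold(t: float) -> float:
    """Wrong-answer penalty that makes confidence t the break-even point.

    Scoring assumed: +1 correct, -penalty wrong, 0 abstain. Answering beats
    abstaining when p - (1 - p) * penalty > 0, i.e. p > penalty / (1 + penalty);
    penalty = t / (1 - t) places that point exactly at t.
    """
    if not 0.0 <= t < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    return t / (1.0 - t)


def should_answer(p_correct: float, t: float) -> bool:
    """Answer only if the expected score of answering beats abstaining (0)."""
    penalty = penalty_for_threshold(t)
    return p_correct - (1.0 - p_correct) * penalty > 0.0


# The example rubric: +1 correct, -1 incorrect, abstain below 50% confidence.
print(penalty_for_threshold(0.5))
for p in (0.3, 0.6, 0.9):
    print(p, "answer" if should_answer(p, t=0.5) else "abstain")
```

Unlike binary grading, the stated rule gives a calibrated model a clear line: below the threshold, abstaining is the score-maximising move, not a sacrifice.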
Tested on four frontier reasoning models (Google's Gemini 3 Pro, OpenAI's GPT-5, xAI's Grok 4 and Anthropic's Claude Opus 4.5), a consistency-based mitigation outperformed the baseline across all models and penalty levels under open rubrics. Under traditional closed scoring, by contrast, the same mitigation reduced headline accuracy, which discourages its adoption.
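The article does not spell out the consistency-based mitigation, so the Python sketch below substitutes a generic self-consistency rule as an assumption: sample several answers to the same question, answer with the majority only if at least half the samples agree, and otherwise abstain. The five toy questions and their sampled answers are invented. Scoring the same decisions under both rubrics shows how such a rule can look worse on a binary leaderboard while scoring better once wrong answers are penalised.

```python
from collections import Counter
from typing import Optional


def consistent_answer(samples: list[str], min_agreement: float) -> Optional[str]:
    """Majority answer if its share of the samples reaches min_agreement, else None (abstain)."""
    if not samples:
        return None
    answer, votes = Counter(samples).most_common(1)[0]
    return answer if votes / len(samples) >= min_agreement else None


def score(decision: Optional[str], truth: str, wrong_penalty: float) -> float:
    """+1 if correct, -wrong_penalty if wrong, 0 for abstaining."""
    if decision is None:
        return 0.0
    return 1.0 if decision == truth else -wrong_penalty


def total_score(items, min_agreement: float, wrong_penalty: float) -> float:
    return sum(score(consistent_answer(samples, min_agreement), truth, wrong_penalty)
               for samples, truth in items)


# Invented toy data: five sampled answers per question, then the true answer.
ITEMS = [
    (["A", "A", "A", "A", "B"], "A"),  # 80% agreement, majority correct
    (["C", "C", "D", "E", "F"], "D"),  # 40% agreement, majority wrong
    (["J", "K", "J", "L", "M"], "J"),  # 40% agreement, majority correct
    (["F", "G", "F", "H", "F"], "F"),  # 60% agreement, majority correct
    (["P", "Q", "R", "P", "S"], "S"),  # 40% agreement, majority wrong
]

for label, penalty in (("binary", 0.0), ("open rubric, -1 if wrong", 1.0)):
    baseline = total_score(ITEMS, min_agreement=0.0, wrong_penalty=penalty)   # always answer
    mitigated = total_score(ITEMS, min_agreement=0.5, wrong_penalty=penalty)  # abstain if < 50% agree
    print(f"{label}: baseline {baseline:.1f}, consistency rule {mitigated:.1f}")
```

With these toy items the baseline scores 3 under binary grading and the consistency rule 2, but with a penalty of 1 per wrong answer the order flips to 1 versus 2: abstaining on the three low-agreement questions costs the rule one correct answer and saves it two penalties.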

The lesson here aligns with existing strategy literature: incentive structures shape behaviour at least as powerfully as capabilities do. A firm cannot build dynamic capabilities if its metrics punish honest assessment of failure; the AI industry has built exactly that into the evaluations that decide which models ship. Open rubrics fix the measurement problem, but they do not, on their own, give the agent the internal awareness it lacks. For that, the architecture has to change.
The agentic digital twin: a model of itself, and of you
For an AI to act as a genuine mentor, coach or partner rather than a bluffing crutch that breeds false mastery, it needs two things current systems do not have: a model of itself, and a model of you. The architecture that supplies both is what we call the agentic digital twin.
The digital twin began in engineering as a live virtual replica of a physical asset, continuously updated from sensor data so operators could test and predict without touching the real thing; it has since matured into cognitive and enterprise digital twins that mirror whole organisations. The next move is to turn it inward and place it inside the agent. An agentic digital twin maintains two continuously updated internal models.
The first is a twin of the agent's own knowledge, a live map of the four regions, including the boundary beyond which lies unrecognised ignorance, so the agent can locate a query within its own competence and respond accordingly: answer with confidence, hedge with grounded uncertainty, or hand back to a human. The second is a twin of the collaborator and context, a model of the human's expertise, reliance and exposure to a bad answer, so the agent can calibrate how it mentors, coaches or partners with this person on this decision.
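A minimal sketch of how the two twins might combine at decision time, in Python. Everything here is a hypothetical assumption rather than the authors' architecture: per-topic competence scores stand in for the self-twin, three 0-to-1 dials (expertise, reliance, stakes) stand in for the collaborator twin, and the threshold formula is invented. What it illustrates is the point in the text: the same query can warrant a confident answer for one person and a hedge or a hand-back for another.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ANSWER = "answer with confidence"
    HEDGE = "hedge with grounded uncertainty"
    HAND_BACK = "hand back to a human"


@dataclass
class SelfTwin:
    # Hypothetical per-topic competence estimates in [0, 1]. Topics missing
    # from the dict lie beyond the agent's mapped boundary.
    competence: dict[str, float]


@dataclass
class CollaboratorTwin:
    expertise: float  # 0 = novice, 1 = seasoned domain expert
    reliance: float   # 0 = verifies everything, 1 = accepts outputs unchecked
    stakes: float     # 0 = harmless if wrong, 1 = severe cost of a bad answer


def decide(topic: str, me: SelfTwin, you: CollaboratorTwin) -> Action:
    c = me.competence.get(topic)
    if c is None:
        return Action.HAND_BACK  # outside the agent's own map: do not bluff
    # Exposure is highest for a novice leaning hard on a high-stakes answer;
    # expertise lowers it because an expert is more likely to catch an error.
    exposure = you.stakes * you.reliance * (1.0 - you.expertise)
    answer_bar = 0.70 + 0.25 * exposure  # invented thresholds
    hedge_bar = 0.40 + 0.30 * exposure
    if c >= answer_bar:
        return Action.ANSWER
    if c >= hedge_bar:
        return Action.HEDGE
    return Action.HAND_BACK


agent = SelfTwin(competence={"contract drafting": 0.85})
student = CollaboratorTwin(expertise=0.1, reliance=0.9, stakes=0.9)
veteran = CollaboratorTwin(expertise=0.9, reliance=0.3, stakes=0.9)
for name, person in (("first-year student", student), ("30-year expert", veteran)):
    print(name, "->", decide("contract drafting", agent, person).value)
print("unmapped topic ->", decide("maritime salvage law", agent, veteran).value)
```

The student, who is both inexperienced and heavily reliant on a high-stakes answer, gets a hedge; the expert gets a direct answer; and a topic outside the self-twin's map is handed back regardless of who is asking.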
These twins are the machine-side complement to HACC. Where HACC is the human capability to orchestrate, calibrate and verify, the digital twin is the agent's capacity to be honestly orchestrated, calibrated and verified, because it can finally see and disclose the edge of what it knows and what it assumes about you. Together, they make human-AI collaboration a genuine dynamic capability and close the loop opened by the diagnosis.
The self-twin restores sensing: an agent that models its own competence can sense its limits instead of bluffing past them. The two twins enable real seizing: escalation and help-seeking become deliberate, gap-aware decisions rather than preset rules. And because both twins are continuously updated, they support transforming: the agent can recalibrate under changing conditions and, through a shared model across agents, coordinate when any one of them reaches its limit. This is the focus of our ongoing Discovery Project, funded by the Australian Research Council.
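The continuous updating and cross-agent coordination described here can also be sketched, again under loudly stated assumptions. In the Python example below, each agent's self-twin is a hypothetical tally of verified successes and failures per topic, turned into a competence estimate with a uniform Beta(1, 1) prior (posterior mean (s + 1) / (s + f + 2)). A shared router sends a query to the most competent agent with enough evidence and escalates to a human when none qualifies. The thresholds (0.8 competence, five observations) and the agent names are invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentSelfTwin:
    name: str
    # topic -> [verified successes, verified failures]
    record: dict[str, list[int]] = field(default_factory=dict)

    def competence(self, topic: str) -> tuple[float, int]:
        """Posterior mean success rate under a Beta(1, 1) prior, and the evidence count."""
        successes, failures = self.record.get(topic, [0, 0])
        n = successes + failures
        return (successes + 1) / (n + 2), n

    def update(self, topic: str, success: bool) -> None:
        """Recalibrate after a verified outcome, so the map tracks changing conditions."""
        counts = self.record.setdefault(topic, [0, 0])
        counts[0 if success else 1] += 1


def route(topic: str, agents: list[AgentSelfTwin],
          min_competence: float = 0.8, min_evidence: int = 5) -> Optional[str]:
    """Name of the best-qualified agent, or None to escalate to a human."""
    best_name, best_score = None, -1.0
    for agent in agents:
        score, evidence = agent.competence(topic)
        if evidence >= min_evidence and score >= min_competence and score > best_score:
            best_name, best_score = agent.name, score
    return best_name


legal, finance = AgentSelfTwin("legal-agent"), AgentSelfTwin("finance-agent")
for outcome in [True] * 9 + [False]:
    legal.update("contracts", outcome)
team = [legal, finance]
print(route("contracts", team))  # legal-agent: (9 + 1) / (10 + 2) is about 0.83
print(route("tax", team))        # None: no agent has evidence, so a human decides

# Conditions shift and the legal agent starts failing; its own twin registers this.
for _ in range(4):
    legal.update("contracts", False)
print(route("contracts", team))  # None: (9 + 1) / (14 + 2) = 0.625 is below 0.8
```

The Beta prior is one simple choice among many; its appeal for this sketch is that an agent with little evidence on a topic cannot claim high competence, so escalation is the default until a track record exists.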
The goal is not smarter AI
The goal here is to make AI intelligent enough to know when it is not intelligent enough and to act on that knowledge by handing it back to a human who can.
The firms that thrive in the AI era will be those that use AI-generated knowledge to augment human capability rather than simply replace it. The same principle applies to the systems themselves: the goal is not AI that replaces human judgement but AI that strengthens it by being transparent about its own limitations. That means AI designed to mentor and coach rather than quietly de-skill, and engineered, through its digital twins, to know when to defer. The combination of self-aware AI and human judgement is safer than either alone.

Achieving it requires rethinking not just how we build these systems, but also how we evaluate them, deploy them, and govern the collaboration between people and machines. The dynamic capabilities framework, developed to help organisations navigate the uncertainty of the internet age, now offers the strategic foundation for navigating the deeper uncertainty of the AI age. The question for every board, every executive team and every professional relying on these tools is whether they are prepared to be the safety net the AI does not know it needs, and whether they will demand the architecture and collaboration design that finally let the AI carry part of that weight itself.
David Teece is Professor at the Haas School of Business, University of California, Berkeley. He is the world's most cited scholar in business and management, with over 260,000 citations. His foundational work on dynamic capabilities, profiting from innovation and digital platform strategy has shaped strategic management theory and practice for three decades. Mary-Anne Williams is Professor of Business AI at UNSW Business School and a Fellow at CodeX, Stanford University. She leads the UNSW Business AI Lab. Their joint research on dynamic capabilities for agentic AI is funded by an Australian Research Council Discovery Project.