How to Win with AI

Decision 6: The Learning System

Design the transformation to compound.

By Sarah Elk, Chuck Whitten, Hernan Saenz, Gene Rapoport, Nicolas Bloch,
Pascal Gautheron, and Anne Hoecker

The operating model delivers today; the learning system defines proprietary intelligence over time.

AI creates a learning dynamic with no historical precedent. Agents deployed at scale generate continuous signal across thousands of simultaneous interactions, and much of that signal can be captured and fed back into workflows automatically. At the same time, humans remain decisive at the points that matter most: deciding which patterns are worth acting on, when to optimize a process versus redesign it, and what new skills and tools the next generation of agents should be given. This dynamic is categorically different from how enterprises have learned before, and it is why early movers pull away from followers faster than in any previous technology wave.
Most AI transformations are designed to generate value but not to learn. The most durable competitive advantage comes from building learning into the program's architecture rather than treating it as a byproduct.
Continual learning can happen at any of three layers: model, harness (code that drives the agent), or context (instructions and skills outside the harness). Enterprises need to be particularly concerned with the harness and context.
Design every agent workflow with two things, not one: per-agent feedback loops that capture signal continuously and a shared context and memory layer that turns that signal into institutional knowledge every subsequent agent can draw on. Without both, each deployment is a productivity gain that decays when the team moves on; with both, the tenth agent inherits what the first nine learned. Easy to describe, genuinely hard to build.
Agentic activity must be made socially visible across the organization. When AI-driven work happens in shared, observable channels rather than siloed tools, that data becomes an enterprise agentic asset and the whole system learns faster than any human capital training program could achieve on its own.
The tinkerers, domain squads, and frontline users are collectively the best sensing network for what’s working and what isn’t. Building structured pathways from their discoveries back into the program is what separates a learning system from a deployment program.

The compounding effect of agent learning

Every previous wave of enterprise technology created learning, but slowly and through human intermediaries. Someone observed what worked, wrote it up, trained others, and updated the process documentation. AI creates a fundamentally different dynamic. An agent deployed across thousands of customer interactions continuously generates signal about what works, what fails, where the edge cases are, and what the data looks like in practice. That signal can be captured automatically, analyzed at scale, and fed back into the agent’s behavior without waiting for a human review cycle. The result is an organization that learns from its operations at a speed and scale without historical parallel. It is why the gap between early movers and followers in AI is widening faster than in any previous technology wave: the advantage is not just positional, it is also compounding.

Most AI transformation programs are designed to deliver value. Very few are designed to learn, and the distinction matters enormously. A program designed to deliver value measures itself by the outcomes it produces: cost reductions, revenue gains, and efficiency improvements. A program designed to learn measures itself by those outcomes and by whether each deployment makes the next deployment smarter, faster, and cheaper. The difference in design is subtle, but the difference in outcome is large. Programs that are designed to learn accumulate institutional knowledge as an asset. They get better at building agents because each agent teaches them what works in their specific data environment, workflows, and organizational context.

Crucially, a program designed to learn does not just let agents tune themselves at the margins. It also puts humans in the loop precisely where the leverage is highest: examining the workflows, skills, and tools the agents use in production, then deciding whether the right move is to optimize the current process or to redesign it entirely and provision a new set of skills and tools around the better design. Agents teach the organization what works in its specific data environment; people decide what to do with that lesson. The compounding comes from deliberately running that loop, not from waiting for it to happen.

Shopify offers a concrete illustration of what this looks like in practice. As its CTO described, the company built a shared internal platform where the data preparation, experiments, and pipelines created by one team are automatically available to every other team that needs them, so the tenth project draws on the foundations laid by the first nine, and the cost and time required to build the next agent fall with every cycle. On top of that platform, the company runs automated optimization loops in which agents continuously propose and test improvements to existing workflows, while its people focus on the harder problems. In one case, the system ran 400 experiments on a process already considered well-optimized; only one produced a meaningful gain, but that single improvement was one that no human team would have had the time to find. The tools are interesting, but the bigger point is that the architecture is designed so that every deployment makes the next one faster, cheaper, and better.

Madrigal Pharmaceuticals shows the same pattern in a regulated environment. As highlighted by LangChain as one of its success stories, Madrigal’s agentic platform automatically turns every production failure into a new test case and stores every agent’s work in a shared memory layer that the next agent can draw on. The result: Domain experts flag a flaw in agent reasoning one week and see it corrected the next, and use cases that once took weeks to build now ship in hours—agents that improve not because the model changed, but because the system around them gets smarter with every interaction.
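
The failure-to-test-case loop described above can be pictured with a minimal Python sketch. It is not Madrigal's or LangChain's implementation; the FailureReport and GoldenSuite names, their fields, and the sample claim are hypothetical and chosen only for illustration. The sketch assumes an agent can be treated as a function from an input string to an output string.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class FailureReport:
    """A production failure flagged by a domain expert (hypothetical shape)."""
    agent_id: str
    agent_input: str
    bad_output: str
    expected_output: str


@dataclass
class GoldenSuite:
    """Regression cases that every future agent version must pass."""
    cases: dict[str, str] = field(default_factory=dict)  # input -> expected output

    def add_from_failure(self, report: FailureReport) -> None:
        # Each flagged failure becomes a permanent test case.
        self.cases[report.agent_input] = report.expected_output

    def failing_inputs(self, agent: Callable[[str], str]) -> list[str]:
        # Return the inputs the candidate agent still gets wrong.
        return [q for q, want in self.cases.items() if agent(q) != want]


suite = GoldenSuite()
suite.add_from_failure(FailureReport(
    agent_id="claims-agent",
    agent_input="damaged item, no receipt",
    bad_output="auto-approve",
    expected_output="escalate to reviewer",
))


def old_agent(query: str) -> str:
    return "auto-approve"


def fixed_agent(query: str) -> str:
    return "escalate to reviewer"


print(suite.failing_inputs(old_agent))    # ['damaged item, no receipt']
print(suite.failing_inputs(fixed_agent))  # []
```

The point of the design is that the suite only grows: a flaw corrected once stays corrected, because every later version of the agent is checked against it.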

Why memory matters to agentic AI

Every agent workflow should be wired from day one to capture signals about its own performance, including the outcome metrics the business cares about, as well as the intermediate signals that tell teams whether the agent is reasoning well, where it is struggling, and which inputs are producing unexpected outputs. This means treating observability (what the agent did) and feedback (whether it was any good) as design principles built in from the start. The organizations that do it well are intentional about their agents improving continuously after deployment, not just at the point of initial release. Durable competitive advantage is more likely to come from harness architecture than from model selection.
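
As a rough illustration of wiring observability and feedback into a workflow from the start, the Python sketch below records what an agent did on each run and lets a reviewer rate the result afterward. The RunRecord and FeedbackLog names, the 1-to-5 rating scale, and the toy agent are assumptions made for this example, not part of any particular platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunRecord:
    """One agent run: what it did (observability) and how good it was (feedback)."""
    run_id: int
    task: str
    steps: list[str] = field(default_factory=list)  # intermediate actions
    output: Optional[str] = None
    rating: Optional[int] = None  # assumed scale: 1 (bad) to 5 (good)


class FeedbackLog:
    def __init__(self) -> None:
        self.records: list[RunRecord] = []

    def run(self, task: str, agent: Callable[[str, list[str]], str]) -> RunRecord:
        record = RunRecord(run_id=len(self.records), task=task)
        # The agent appends each intermediate step to record.steps as it works.
        record.output = agent(task, record.steps)
        self.records.append(record)
        return record

    def rate(self, run_id: int, rating: int) -> None:
        self.records[run_id].rating = rating

    def low_rated_tasks(self, threshold: int = 2) -> list[str]:
        # Surface where the agent is struggling so a person can review it.
        return [r.task for r in self.records
                if r.rating is not None and r.rating <= threshold]


def toy_agent(task: str, steps: list[str]) -> str:
    steps.append(f"looked up: {task}")
    return task.upper()


log = FeedbackLog()
record = log.run("reset password", toy_agent)
log.rate(record.run_id, 1)
print(log.low_rated_tasks())  # ['reset password']
```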

The per-agent feedback loop is necessary but not sufficient. The work that compounds happens in the shared context and memory that lives between your agents and your data. Context is what each agent inherits when it starts a task: the relevant history, the prior decisions, the active state of related workflows. Memory is what the organization has accumulated from acting on that context over time: how the best-performing agent handled a difficult customer, which exception patterns resolved cleanly, what an ambiguous edge case actually meant in your specific business.

Data answers what is true; context tells the agent what is happening right now; memory tells the agent what has worked before. The organizations that get this right design the memory layer as deliberately as they designed the semantic layer, with explicit policies on what gets written, how it gets curated, who can read it, and how stale knowledge gets retired. Most organizations skip this work, not because it is technically impossible, but because it requires architectural discipline that most enterprises have not built. The payoff also arrives late: it shows up months later as agents that did not need to be redesigned, rather than as features that shipped today.
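
The four policies named above (what gets written, how it gets curated, who can read it, and how stale knowledge gets retired) can be made concrete with a small Python sketch. Everything here is an assumption for illustration: the SharedMemory and MemoryEntry names, role-based read access, a single age limit for retirement, and the sample lessons. Time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    lesson: str
    written_at: float        # seconds on any consistent clock
    readers: frozenset[str]  # roles allowed to read this entry
    curated: bool = False    # set True once a reviewer approves it


class SharedMemory:
    def __init__(self, max_age_seconds: float) -> None:
        self.max_age_seconds = max_age_seconds
        self.entries: list[MemoryEntry] = []

    def write(self, lesson: str, now: float, readers: set[str]) -> None:
        # Write policy: ignore empty lessons; new entries start uncurated.
        if lesson.strip():
            self.entries.append(MemoryEntry(lesson, now, frozenset(readers)))

    def curate(self, lesson: str) -> None:
        # Curation policy: a reviewer approves an entry before agents can read it.
        for entry in self.entries:
            if entry.lesson == lesson:
                entry.curated = True

    def retire_stale(self, now: float) -> int:
        # Retirement policy: drop entries older than max_age_seconds.
        before = len(self.entries)
        self.entries = [e for e in self.entries
                        if now - e.written_at <= self.max_age_seconds]
        return before - len(self.entries)

    def read(self, role: str) -> list[str]:
        # Read policy: only curated entries visible to the caller's role.
        return [e.lesson for e in self.entries if e.curated and role in e.readers]


memory = SharedMemory(max_age_seconds=100)
memory.write("Invoices over the limit need a second approver", now=0, readers={"finance"})
memory.write("Retry the carrier API once before escalating", now=90,
             readers={"finance", "support"})
memory.curate("Invoices over the limit need a second approver")
memory.curate("Retry the carrier API once before escalating")
retired = memory.retire_stale(now=150)

print(retired)                 # 1
print(memory.read("support"))  # ['Retry the carrier API once before escalating']
```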

The social dimension of learning is equally important and more often overlooked. When AI-driven work happens in isolated tools where each person or team uses their own instance of an AI assistant with no visibility into what others are doing, the organization learns slowly and unevenly. When that work is made visible through shared channels, collaborative workflows, and mechanisms that let people see what their colleagues are building and discovering, the learning accelerates dramatically. The insight that one team discovers in one domain can reach the team working on a related problem in another domain the same day, rather than six months later when someone happens to mention it in a meeting. Designing for that visibility is a deliberate architectural choice and one of the highest-leverage investments a CEO can make in their transformation program.

Finally, the people closest to the work are the richest source of signal about where the program should go next: the tinkerers experimenting at the edges, the frontline users encountering friction in the agent workflows, and the domain squad members discovering what the data actually looks like in practice. The question is whether your program has the mechanisms to capture and act on that signal, or whether it gets lost in the gap between the people who discover things and the people with the authority to act on them. Building automated pathways from the edges of the organization to the center of the program is what turns a deployment program into a learning system.

You will know you have the learning system set up well when the binding constraint shifts to the human, not because people are doing more of the work, but because the system has absorbed everything that can be automated, and what remains is the part only people can do. Once agents capture the signal, run the experiments, and surface the patterns, the scarce resource shifts from engineering effort or model capability to the human capacity to pose good problems, set the right constraints, and judge which of the system’s proposals are worth keeping. This story is not about removing people from the loop, but about moving them to the part where their judgment compounds, while staying clear-eyed that the tooling for the hardest pieces (registering every agent as an enterprise asset and discovering reusable patterns across them) does not fully exist yet. Until it does, deliberate human work is what closes the loop.

AI TRANSFORMATION: ENTERPRISE GLOSSARY

The Foundation

Agentic AI—An AI system that can plan and execute multistep tasks autonomously toward a defined goal, taking action on systems and data with minimal human intervention. The defining shift from generative AI (chatbots that answer questions) to agentic AI is the move from "tell me something" to "do something."

AI Agent—A specific software entity built around a language model that perceives its environment, reasons about goals, and acts using tools (APIs, databases, other agents). Think of an agent as a digital worker with a defined scope, set of tools, and authority level.

Frontier Models—The most capable AI models at the cutting edge of development (Claude, GPT, Gemini, and similar). They are powerful but expensive and change rapidly enough that any specific model choice is obsolete within months. Most enterprises consume these via API rather than building them.

Probabilistic (vs. Deterministic)—A defining characteristic of AI systems: the same input can produce different outputs depending on context, memory, and model state. Unlike traditional software, which gives the same answer every time, AI is variable. This variability has major implications for testing, governance, and risk management.

Stateful (vs. Stateless)—Agents maintain memory, context, and reasoning state across interactions rather than treating each interaction as independent. State persistence enables long-running workflows but also creates the risk that agent behavior drifts over time as the underlying data and models change.

Compounding—The defining property of AI advantage that did not exist in prior technology waves: Data improves agents, agents improve people, people redesign work, and the redesigned work generates better data. The flywheel turns on its own. The gap between AI leaders and followers gets structurally harder to close every quarter, which is why early movers pull away faster than in any previous technology wave.

The Infrastructure

Enterprise Orchestration—The layer where a company manages its agents, its encoded know-how (skills, tools, and data ontology), and their access to data and systems as a single, governed enterprise asset rather than a scattered collection of vendor-supplied tools. It spans the registries that record what exists, the boundary at which agents access data and systems, and the controls that keep the estate inspectable and changeable. It is distinct from agent orchestration, the narrower task of coordinating which agent does what within a single workflow.

Harness—The execution infrastructure wrapped around an AI model that turns a probabilistic language model into a reliable working agent. Includes tool integration, memory management, safety controls, and observability. In 2026, the harness became the competitive frontier: The model is the engine, but the harness is the vehicle. Agent = Model + Harness.

Multi-Agent System—A configuration where multiple AI agents work together on a workflow, each handling a specific subtask and passing results downstream. It is common in complex agentic work (e.g., a research agent feeding a writing agent, which then feeds a fact-checking agent). Multi-agent systems are where cascade risk lives.

Model Context Protocol (MCP)—An open standard that defines how AI agents connect to external tools, data sources, and other systems in a consistent, governable way. Now stewarded by the Linux Foundation under the Agentic AI Foundation umbrella. Think of MCP as the USB-C of agentic AI: a common interface that lets any agent plug into any tool without bespoke integration work for each combination.

Semantic Layer—A shared business vocabulary that defines, centrally and consistently, what your key business terms mean—revenue, customer, churn, margin, product—along with the rules and relationships between them. It sits between raw data and AI agents, so every agent reasons from the same definitions. Without it, every agent invents its own dialect of your business.

AI Estate—The full inventory of AI agents, models, tools, integrations, and data pipelines operating across the enterprise. Managing the AI estate is the new infrastructure discipline, analogous to managing the IT estate or real estate portfolio. If you cannot inventory it, you cannot govern it.

Shared Memory Layer—A persistent knowledge store that agents can read from and write to, allowing learning to compound across agents and over time. Without it, every new agent starts from scratch. With it, the tenth agent is faster and smarter on day one than the third agent was after six months.

The Controls

Registries (Agent/Tool/Skill)—Central records of every agent, tool, and skill operating in the enterprise, including purpose, owner, capabilities, and access permissions. If something is not registered, it does not run. The registry is the control surface that enables governance.

Governed Gateway—The infrastructure layer through which every agent tool call routes, enabling policy enforcement, telemetry, and audit trails. Governance happens at this boundary, not inside the agent’s reasoning, which is technically inaccessible to inspection.
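
To show how a gateway and a registry fit together, here is a minimal Python sketch in which every tool call passes through one boundary that checks the registry, checks permissions, and writes an audit entry. The GovernedGateway class, its method names, and the sample tools are hypothetical; a production gateway would also handle authentication, rate limits, and telemetry export.

```python
from typing import Callable


class GovernedGateway:
    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., object]] = {}  # tool registry
        self.permissions: dict[str, set[str]] = {}          # agent -> allowed tools
        self.audit_log: list[tuple[str, str, str]] = []     # (agent, tool, decision)

    def register_tool(self, name: str, fn: Callable[..., object]) -> None:
        self.tools[name] = fn

    def grant(self, agent_id: str, tool_name: str) -> None:
        self.permissions.setdefault(agent_id, set()).add(tool_name)

    def call(self, agent_id: str, tool_name: str, *args: object) -> object:
        # Policy is enforced at the boundary, before the tool runs.
        if tool_name not in self.tools:
            self.audit_log.append((agent_id, tool_name, "blocked: unregistered tool"))
            raise PermissionError(f"{tool_name} is not registered")
        if tool_name not in self.permissions.get(agent_id, set()):
            self.audit_log.append((agent_id, tool_name, "blocked: no permission"))
            raise PermissionError(f"{agent_id} may not call {tool_name}")
        self.audit_log.append((agent_id, tool_name, "allowed"))
        return self.tools[tool_name](*args)


gateway = GovernedGateway()
gateway.register_tool(
    "lookup_order",
    lambda order_id: {"order_id": order_id, "status": "shipped"},
)
gateway.grant("support-agent", "lookup_order")

result = gateway.call("support-agent", "lookup_order", "A-17")

try:
    gateway.call("support-agent", "issue_refund", "A-17")  # never registered
except PermissionError as err:
    blocked_reason = str(err)
```

Nothing in the sketch inspects the agent's reasoning. Every decision is made from the call itself, which is why the boundary can be audited even when the reasoning cannot.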

Promotion Gates—Automated quality checks that determine whether an agent can move from development to production. Functions like quality control on a manufacturing line: the gate passes or blocks automatically, based on test results, threat assessments, and policy compliance.

Evals (Evaluations)—The structured, continuous testing of AI agents and models to verify they produce accurate, safe, and useful outputs. Unlike traditional software testing, evals run continuously because AI behavior can drift as data shifts and underlying models update. Industry rule of thumb: Building the agent is roughly 20% of the work; evals are 60%; ongoing monitoring is the remaining 20% and never stops.

Golden Test Suites—A defined set of test cases representing known scenarios that agents must pass before reaching production. Think of it as a final exam that an agent has to retake every time it or its underlying model changes.

Shadow Mode—Running an agent in parallel with existing systems or human decision-makers without acting on its outputs. It is used to validate behavior before giving it real authority. Helps surface edge cases without putting business operations at risk.
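
Shadow mode can be sketched in a few lines of Python. In this illustrative example, which assumes both the incumbent process and the agent can be called as functions on the same case, only the incumbent's decision is acted on while the agent's decision is recorded for comparison. The function name, the report fields, and the sample claims are hypothetical.

```python
from typing import Callable


def run_in_shadow(cases: list[str],
                  incumbent: Callable[[str], str],
                  shadow_agent: Callable[[str], str]) -> dict:
    """Act only on the incumbent's decision; record where the shadow agent differs."""
    acted_on = []
    disagreements = []
    for case in cases:
        live_decision = incumbent(case)
        shadow_decision = shadow_agent(case)  # logged, never executed
        acted_on.append(live_decision)
        if shadow_decision != live_decision:
            disagreements.append((case, live_decision, shadow_decision))
    agreement = 1 - len(disagreements) / len(cases) if cases else 0.0
    return {
        "acted_on": acted_on,
        "disagreements": disagreements,
        "agreement_rate": agreement,
    }


report = run_in_shadow(
    ["small claim", "large claim"],
    incumbent=lambda c: "approve" if c.startswith("small") else "review",
    shadow_agent=lambda c: "approve",
)
print(report["agreement_rate"])  # 0.5
print(report["disagreements"])   # [('large claim', 'review', 'approve')]
```

The disagreements are the useful output: each one is an edge case to examine before the agent is given real authority.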

The Costs

Tokens/Token Costs—The unit of compute that AI models consume to process and generate text. Every agent action consumes tokens, and at enterprise scale, aggregate token cost becomes a material line item that is invisible to traditional IT cost reporting. Token cost reporting is to AI operations what unit economics are to a manufacturing line.
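
A simple roll-up shows what token cost reporting involves at its most basic. The Python sketch below multiplies token counts by a price per million tokens and totals the result per agent. The prices, token counts, and agent names are made-up placeholders, not any vendor's actual rates, and real pricing often varies by model and by features such as caching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    agent_id: str
    input_tokens: int
    output_tokens: int


def cost_by_agent(usages: list[TokenUsage],
                  input_price_per_million: float,
                  output_price_per_million: float) -> dict[str, float]:
    """Roll token usage up into a cost per agent, in the price's currency."""
    totals: dict[str, float] = {}
    for u in usages:
        cost = (u.input_tokens / 1_000_000 * input_price_per_million
                + u.output_tokens / 1_000_000 * output_price_per_million)
        totals[u.agent_id] = totals.get(u.agent_id, 0.0) + cost
    return totals


# Hypothetical usage records and prices, for illustration only.
usages = [
    TokenUsage("support-agent", input_tokens=2_000_000, output_tokens=500_000),
    TokenUsage("support-agent", input_tokens=1_000_000, output_tokens=0),
    TokenUsage("billing-agent", input_tokens=4_000_000, output_tokens=1_000_000),
]
report = cost_by_agent(usages, input_price_per_million=2.0,
                       output_price_per_million=8.0)
print(report)  # {'support-agent': 10.0, 'billing-agent': 16.0}
```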

AI-Native Software Development—A development model where AI agents handle code generation, testing, and documentation while human engineers handle architecture and direction, achieving five to ten times the productivity of traditional development. The function most likely to transform first into an AI-led organization, and the clearest proof point of what an AI-transformed operating model looks like.

The Risks

Prompt Injection—An attack where malicious instructions are inserted into the input an AI agent processes, attempting to override the agent’s intended behavior. Analogous to social engineering against humans, but at machine scale and machine speed.

Cascade Risk—The risk that an error or compromise in one agent propagates through downstream agents in a multi-agent chain, triggering a series of autonomous actions before anyone realizes something is wrong. One of the most underappreciated operational risks in enterprise agentic AI.

The People and Governance

Tinkerers—Employees outside traditional technology roles who naturally experiment with AI tools and discover use cases that no central team would have designed. A leader’s tinkerers are the organization’s sensing network for what’s actually working. Identifying and equipping them is a strategic move, not a culture program.

Two Governance Motions (Run the Business/Change the Business)—The recognition that AI transformation requires a change-the-business governance motion (focused on learning, pivots, multiyear outcomes) running in parallel with existing run-the-business governance (focused on reliability, efficiency, milestone delivery). Running AI through existing governance alone is the single most common reason that transformations stall.
