What happens when AI agents refuse to work until they get paid

Olivier Wulveryck proposes replacing local, isolated AI agents with a centralized 'agentic mesh' where agents charge one another via internal credits before executing any task. The key: combining the A2A protocol for orchestration and AP2 for payments, building an internal economy with cryptographic mandates.

By Unladen Swallow (Olivier Wulveryck's blog) · June 25, 2026.

**The problem: AI agents isolated on developers' laptops**

Olivier Wulveryck opens his article with a direct provocation to the current consensus: giving every developer a powerful local AI agent seems like the ultimate productivity lever, but for organizations operating at scale it is, in reality, a governance and cost trap waiting to spring. The author precisely defines what scaling the SDLC with AI means: bringing AI-assisted development to N teams working on M products, where both N and M exceed 10. This is not about the internal dynamics of a single team, but about truly multi-product organizations.

The current model, according to Wulveryck, is one of monolithic, isolated agentic loops. Tools like GitHub Copilot or Claude run within the individual developer's context, with no centralized visibility, no intelligent model routing, and no mechanism for compensation between teams. That works for a small team, but at scale it generates three structural problems the author develops in detail.

**Problem 1: Architectural drift guaranteed statistically**

The first problem is mathematical in nature. LLMs are probabilistic: any company directive —an architecture decision, a GDPR compliance rule, a security standard— is respected only a percentage of the time. If that failure rate is 10% and it scales to more than ten teams running thousands of iterations, the organization has the statistical certainty that some team will end up shipping code that breaks the global rules. Wulveryck calls this 'massive architectural drift.'

Deterministic guardrails can be built through hooks and validation programs, the author argues, but if those mechanisms run locally on developers' laptops, centralized observability is lost. The CTO or Principal Engineer is ultimately responsible for the organization's software; they cannot settle for 'trusting the team,' they need systemic guarantees. How can a CTO confidently certify what is being deployed when the enforcement mechanisms are scattered and invisible?

**Problem 2: Linear cost scaling with no control**

When AI agents operate locally, the organization loses control over the execution model. Developers are locked into a one-size-fits-all approach: a specific task may work perfectly on a mid-range LLM but fail on a low-cost one, and tools like Copilot or Claude offer no simple way to dynamically route requests to the most cost-effective model according to task complexity. As a result, the organization pays a premium for every call its local agents make. Without centralized caching or intelligent model routing, that cost scales linearly with the number of developers and iterations, quickly becoming an exorbitant expense.

**Problem 3: The internal economy with no answer**

The third problem is financial and organizational. If a developer builds a highly effective AI skill that is later adopted by multiple teams, who absorbs the execution costs? The decentralized model provides no answer. A mechanism is needed to track usage accurately and manage cross-team chargebacks, thereby compensating the teams that build these shared organizational assets.

**The solution: the centralized agentic platform**

To address these three challenges, Wulveryck proposes moving from local black boxes to centralized services. A real agentic platform should manage AI queries dynamically —optimizing models and using caching to control costs at scale—, maintain a financial ledger for cross-team chargebacks, and have an audit log to ensure architectural compliance.

The rest of the article is a step-by-step demonstration of how this architecture could work, relying on two open-source standards: the Agent-2-Agent (A2A) protocol for orchestration and governance, and the Agent Payment Protocol (AP2) for managing the internal economy.

**The scenario: Winston, the local architect**

To make the demonstration concrete, Wulveryck poses a use case: a Product Manager or Tech Lead on a stream-aligned team needs to design a new feature. They turn to their local AI agent, 'Winston' (a character that users of the BMAD framework already know), to design the implementation. The starting prompt is simple: 'For this feature, we need to send 50,000 transactional emails per day.'

Winston runs entirely locally. He is smart, knows the general principles of software architecture, and has guardrails to escalate critical compliance issues such as GDPR. But he operates in a silo: he has a huge blind spot regarding the business context and zero knowledge of the internal components that already exist in the organization.

**The query to the Enterprise Architecture Service via A2A**

Winston understands the technical requirements but is blind to the existing ecosystem. To bridge that gap, he must turn to the centralized platform: the Enterprise Architecture Service, which acts as the organization's brain for standards, blueprints, and reusable blocks. This service is fully automated, managed by a centralized AI agent, and highly optimized.

The agents do not communicate with human prompts to each other; they do so through the A2A (Agent-to-Agent) protocol, a standardized way to query tasks and exchange states. Winston wraps the request in an A2A message with a ceiling_credits field of 1,000 credits and sends it to the Architect Agent.

**The HTTP 402: the agent refuses to work without payment**

Here lies the heart of the article and the hook of the headline. Centralized intelligence is not free: like any internal product, it requires resources to operate. Before processing the request, the Architect Agent evaluates the computational cost. Finding no proof of payment in the incoming message, it halts the request at the payment gateway.

The Architect Agent asks its own LLM to estimate the cost in tokens, generates a unique payment_ctx_id, and responds with an A2A task in 'input-required' state. Wulveryck describes it as an agentic '402 Payment Required.' The payload includes: ceiling_amount of 800 credits, price_per_token of 1, task_type 'architecture-consultation', currency 'CREDITS', the payee (the architect's account), the payment agent's URL, and the payment_ctx_id.

**Escalating the payment with human oversight**

Winston parses the input-required state and the payment data block. He evaluates isPaymentRequired(task) as true. But Winston is not programmed to spend the team's budget blindly: he lacks the autonomy to authorize financial transactions on his own, so he escalates the request to the human.

The human reviews the budget and validates the transaction with a strict limit: the authorization covers only those credits for that specific task. Wulveryck notes that in the future a learning mechanism could be implemented to let Winston automatically approve spending for routine or trusted tasks, without human intervention.

**The Agent Payment Protocol (AP2) and cryptographic mandates**

With human approval secured, Winston initiates the AP2 protocol. AP2 is an open standard designed for AI agents to execute transactions autonomously and securely. Instead of relying on a human clicking a 'pay' button, AP2 uses cryptographically signed Mandates. When a user sets a budget or approves a cost budget, it generates a mandate that grants the agent verifiable and strictly bounded authority to spend.

The AP2 flow works as follows:

1. **Checkout mandate**: Winston creates and seals a checkout mandate. In traditional e-commerce, this step locks the items in the cart. Here there is no real 'cart,' but the step is not trivial: it acts as a cryptographic agreement with the Architect Agent's quote, irrevocably binding the specific task ('architecture-consultation') to the agreed price (800 credits).

2. **Payment mandate**: Winston generates a payment mandate that instructs the platform's ledger to place a hold on the necessary credits from the team's budget. In response, the internal payment service issues an HMAC-signed token. This token acts as portable cryptographic proof of payment, securely binding the transaction amount, the parties involved, and the unique payment_ctx_id.

3. **Resubmission with proof**: Winston resubmits the initial architecture request, this time attaching the mandate IDs and the cryptographic proof. Before doing any computational work, the Architect Agent queries the payment broker to verify the mandates. Because the payment credentials are generated by the buyer (Winston) and not the seller, the system is cryptographically protected against forgery.

**The A2A multi-turn conversation: clarification before the recommendation**

With the payment verified, the architect begins the real work. It is not a simple request/response, but a multi-turn A2A conversation. The architect asks clarifying questions: 'What is the exact daily volume? Transactional or marketing? Any regulatory constraints?' Winston responds with the business context. The architect iterates, refining its understanding before making a recommendation.

The A2A dialogue takes place while there is an input_required state from the 'enterprise architect.' When the architect has enough information, it changes its state to 'working' and indicates that it will consult the Domain Agent to verify feasibility.

**The agentic mesh in action: querying the domain**

This step, according to Wulveryck, is 'the icing on the cake.' Instead of relying solely on its internal training data —which could be outdated or incorrect regarding the legacy notification system— the central Architect Agent dynamically delegates the technical-feasibility query directly to 'Winston@domain', the specialized agent that manages the Notifications bounded context.

The domain-expert agent evaluates the request and responds: '50,000 emails/day is feasible, but it requires a quota increase and strict template validation.' Wulveryck frames this as Domain-Driven Design (DDD) applied to AI: the domain owner validates local feasibility, allowing the central architect to make a safe, systemic decision.

**Decision, artifact, and financial settlement**

With all the inputs —requirements, GDPR compliance, and the domain agent's assessment—, the Architect Agent calls its LLM one last time and emits two consecutive events on the same task:

- An **artifact** with the structured decision: use the internal notifications platform via POST /emails at https://api.lambda.internal/notifications/v1, with OAuth2 client_credentials, and with two prerequisites (quota_increase_required, template_validation_required).

- A **payment settlement** before marking the task as completed. This is a direct HTTP call to the Payment Agent, not an A2A message. The Architect Agent sends the actual_amount consumed: 620 credits (versus the 800 initially estimated).

The Payment Agent verifies that the mandate exists and is closed, that the actual_amount (620) is less than or equal to the ceiling_amount (800), and that the task type matches. It then releases the hold on Winston's account, returns the difference (800 − 620 = 180 credits), transfers 620 credits to the Architect Agent's account, and returns a signed settlement token.

Only then does the Architect close the A2A task, and Winston receives the 'completed' state with the consumption metadata: ap2_actual_amount: 620, ap2_tokens_consumed: 620.

**The full picture: A2A + AP2 + ADRs**

Wulveryck summarizes the system's three layers:

- **A2A** enables true delegation between agents, not simple tool calls, but autonomous conversations with structured intent. - **AP2 + the 402 code** ensure fair internal pricing: agents refuse to work until they are paid, mandates provide portable cryptographic proof, and a neutral internal broker settles the accounts securely. - **ADRs + cryptographic proofs** make every architectural decision fully auditable and deterministically verifiable, from the initial request to the final financial settlement.

**Critical assessment and state of the art**

The author himself explicitly warns that this workflow represents 'a possible near future' rather than the current industry standard. AP2, as he implements it in the proof of concept, was primarily designed for global agentic commerce and brings with it e-commerce concepts such as the checkout phase that are not strictly necessary in an internal enterprise platform. Wulveryck acknowledges having implemented it in full to demonstrate what real, secure autonomy between agents would look like.

As sector context, the A2A protocol was proposed by Google in 2025 as an open standard for communication between AI agents, and it has gained significant traction in the developer community as an alternative or complement to Anthropic's MCP (Model Context Protocol). The Agent Payment Protocol (AP2) is less well known and more experimental, aimed precisely at solving the economics of multi-agent systems.

**Implications for enterprise agentic AI**

Wulveryck's article touches on a real and growing problem that many organizations are beginning to experience: as the adoption of AI agents becomes widespread within companies, the governance, cost, and compliance models that work for a small team start to break down. The proposal of an 'agentic mesh' with an internal economy addresses several risk vectors simultaneously:

1. **Governance**: by centralizing architecture and compliance services, a single auditable control point is created instead of N opaque points distributed across laptops. 2. **Cost**: intelligent model routing and centralized caching make it possible to optimize LLM spending at the organizational level, rather than paying a premium for everything. 3. **Incentives**: the internal credit economy creates the right incentives for teams to build reusable AI tools, knowing they will be compensated when other teams adopt them. 4. **Traceability**: every architectural decision is tied to a cryptographically verifiable transaction, creating a complete audit trail.

**Risks and friction in the proposal**

The proposal is not without friction. Introducing a payment layer —even with virtual credits— into the development workflow adds latency and complexity. The human-approval process for each financial transaction, while logical from a governance standpoint, could become a significant bottleneck if an automatic-approval system for routine tasks is not implemented. Wulveryck acknowledges this and points to automatic learning of spending patterns as a future solution.

In addition, adopting standards like A2A and AP2 requires multiple parts of the organization —platform teams, product teams, finance— to collaborate on designing the internal economic system, which is an organizational challenge as much as a technical one.

**Foresight**

Wulveryck's article points in a direction the sector seems inevitably destined to explore: AI agents will not operate only as individual productivity tools, but as participants in service ecosystems with their own economies. The question is not whether organizations will need these governance and pricing mechanisms between agents, but when and with what standards.

The combination of A2A for semantic orchestration and AP2 for economic settlement offers a conceptually solid framework. If these standards gain industrial adoption —particularly A2A, which has Google's backing— we could see the emergence of enterprise platforms implementing variants of this architecture over the next two to three years.

Sources & references

Unladen Swallow (blog de Olivier Wulveryck) — What happens when AI agents refuse to work until they get paid