AI agent costs just went metered. The problem is nobody owns the meter.

AI agent costs flipped from flat per-seat licenses to metered per-token bills in 2026. The real failure isn't the price — it's that no one in the org owns the meter.


On June 1, GitHub stopped absorbing the cost of running Copilot. Every plan moved to usage-based billing — premium request units replaced by “AI Credits” that drain with every token an agent reads or writes. Within hours, developers were posting receipts: a single request burning more than $6, a 7,000-unit quota on track to empty in under two days, 16% of a monthly Pro+ allowance gone on one task. The headlines called it a price hike. It wasn’t, exactly. The price was always there. GitHub just stopped paying it for you. AI agent costs didn’t spike so much as get handed over.

That’s the story worth your attention, because the same handoff is landing on every agentic product at once, and almost no one budgeted for it.

TL;DR: AI agent costs flipped this year from a flat per-seat license to a metered, per-token bill — GitHub Copilot is just the loudest example. The real failure isn’t the new price; it’s that no one in the org owns the meter. Finance budgeted seats. Engineering ships agent loops that consume hundreds of thousands of tokens a session. Operations hits the cap. Nobody forecasts or reconciles the consumption, because that process never existed when software cost the same every month. Fix the metering before you negotiate the bill.

What actually changed at GitHub

The mechanics matter, so be precise about them. As of June 1, 2026, GitHub Copilot’s premium request units are gone, replaced by AI Credits consumed “based on token usage, including input, output, and cached tokens, according to the published API rates for each model.” Each plan now comes with a fixed dollar allowance of credits — $10/month on Pro, $39 on Pro+, $19 per user on Business, $39 per user on Enterprise (with promotional bumps to $30 and $70 through August). Spend past the allowance and you pay by the token.
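To make the mechanics concrete, here is a minimal sketch of how a token-metered request turns into dollars and then into overage. The per-million-token rates and the token counts are hypothetical placeholders, not GitHub's published prices; only the $10 Pro allowance comes from the figures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRates:
    """Hypothetical USD rates per million tokens for one model."""
    input_per_m: float
    output_per_m: float
    cached_per_m: float


def request_cost(rates: ModelRates, input_tokens: int,
                 output_tokens: int, cached_tokens: int) -> float:
    """Dollar cost of one request: each token class is billed at its own rate."""
    total = (input_tokens * rates.input_per_m
             + output_tokens * rates.output_per_m
             + cached_tokens * rates.cached_per_m)
    return total / 1_000_000


def overage(allowance_usd: float, spend_usd: float) -> float:
    """Amount billed beyond the plan's included allowance (never negative)."""
    return max(0.0, spend_usd - allowance_usd)


# Illustrative numbers only: one long agentic request on a made-up rate card.
rates = ModelRates(input_per_m=3.00, output_per_m=15.00, cached_per_m=0.30)
cost = request_cost(rates, input_tokens=400_000,
                    output_tokens=50_000, cached_tokens=1_000_000)
print(f"one request: ${cost:.2f}")
print(f"overage on a $10 allowance after $12.25 of spend: ${overage(10.0, 12.25):.2f}")
```

The point of the sketch is the shape of the formula: cost depends on three token counts and a rate card, none of which appear in a seat count.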

GitHub’s rationale, in its own words, is honest and worth quoting: Copilot has become “an agentic platform capable of running long, multi-step coding sessions” with “significantly higher compute and inference demands,” and under flat pricing “a quick chat question and a multi-hour autonomous coding session can cost the user the same amount.” GitHub had been absorbing the difference. It stopped.

The backlash, captured by The Register on June 2, is the sound of a flat-rate assumption meeting a variable bill: one developer reporting $6 gone on a single request, another watching a quota built to last a month drain in two days, a third posting “1,180 credits used. 16% of my monthly Pro+ allowance. Gone. For basically nothing.” The common thread isn’t that Copilot got worse. It’s that the cost was invisible until someone put a meter on it.

This isn’t a GitHub story. It’s the whole category.

If this were one vendor being greedy, you could switch and move on. It isn’t. The seat-price era of AI is ending across the board because the underlying economics don’t fit a flat fee.

EY’s 2026 analysis of agentic token costs puts numbers on it. A customer-service interaction that cost about $0.04 in 2023 (input, retrieval, response) costs roughly $1.20 in 2026 once it becomes an orchestrated agent that plans, calls tools, and loops. That’s about 30 times the cost for what looks like the same task. The driver is consumption: “agentic workflows can consume hundreds of thousands of tokens in a single session,” EY notes, “much more than the hundreds needed to support a more traditional generative AI chat experience.” EY’s framing of the pricing shift is the whole point: from “fixed-price seat” to “metered token consumption.”

The vendors are already repricing around it. On April 9, 2026, ServiceNow collapsed its separately sold AI products into three tiers with embedded AI and token-based consumption baked in. Per TechTarget’s reporting, that was a direct response to enterprises that can’t get AI into production and prove ROI. The analyst line from Moor Insights & Strategy in that coverage lands: “AI right now is sort of like a cart full of groceries without a meal to make.” The grocery bill, though, now rings up by the item.

The real problem: a meter with no owner

Here’s the assertion that matters. The hard part of metered AI agent costs isn’t the rate. It’s that token consumption is a number no one in the org is responsible for.

Walk the seam. Finance signed off on a budget built from seat counts — a clean, flat, forecastable line. Engineering builds the agents, and a well-meaning loop that retries on failure or re-reads context every step can quietly multiply token burn by an order of magnitude without anyone noticing until the bill arrives. Operations is where it surfaces, usually as work stopping mid-month because a cap was hit. Three teams touch the cost. None of them owns the reading.
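A toy model shows how that multiplication happens. It assumes the simplest possible loop: every step re-sends the whole accumulated context, and every retry re-sends it again. The step counts and token sizes are made up for illustration.

```python
def tokens_for_session(steps: int, base_context: int,
                       tokens_added_per_step: int,
                       retries_per_step: int = 0) -> int:
    """Total input tokens when each step, and each retry, re-sends the full context."""
    total = 0
    context = base_context
    for _ in range(steps):
        attempts = 1 + retries_per_step
        total += context * attempts       # the whole context is billed on every attempt
        context += tokens_added_per_step  # tool output and notes grow the context
    return total


# A single chat-style call versus a 10-step loop with one retry per step.
single = tokens_for_session(steps=1, base_context=2_000, tokens_added_per_step=0)
looped = tokens_for_session(steps=10, base_context=2_000,
                            tokens_added_per_step=1_500, retries_per_step=1)
print(single, looped, looped / single)
```

With these inputs the loop bills 175,000 input tokens against 2,000 for the single call. Nothing in the loop looks wasteful on its own; the growth comes from context accumulating while every attempt pays for all of it.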

I’ve written before that shadow AI is an inventory gap, not a policy gap — you can’t govern what you can’t see. Metered cost is the same shape one layer over. You can’t forecast what you don’t instrument. Most orgs meter AI at the seat (“we have 200 Copilot licenses”) when the bill is now generated per token, per workflow, per agent loop. The dashboard shows users. The invoice shows consumption. Those are different numbers, and the gap between them is exactly where the surprise lives.

| | Seat era (until ~2026) | Meter era (2026 on) |
| --- | --- | --- |
| Unit of billing | Per user, per month, flat | Per token: input, output, cached |
| Predictability | Fixed; easy to budget | Variable; scales with work done |
| Who absorbs the compute | The vendor | You |
| A heavy session vs. a light one | Same cost | One request can burn $6+ |
| What the dashboard tracks | Number of seats | Should be tokens per workflow; usually isn’t |
| Who owns the forecast | Procurement (seat count) | Usually no one |

Why “it replaces a headcount” is the wrong ROI model

The metering shift also breaks the business case most agent projects were sold on. If you justified an agent by the salary it would replace, a per-token bill is a problem, because now you’re carrying a variable cost you didn’t model against a fixed saving you may not realize.
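The arithmetic behind that mismatch is short. In this sketch the $6,000 monthly saving and the run counts are hypothetical; the $1.20 cost per run borrows EY's 2026 figure for one agentic customer-service interaction.

```python
def net_monthly_effect(fixed_saving_usd: float, cost_per_run_usd: float,
                       runs_per_month: int) -> float:
    """Fixed saving minus the metered cost that scales with volume."""
    return fixed_saving_usd - cost_per_run_usd * runs_per_month


def break_even_runs(fixed_saving_usd: float, cost_per_run_usd: float) -> float:
    """Monthly run count at which the token bill cancels the saving."""
    return fixed_saving_usd / cost_per_run_usd


saving = 6_000.0   # hypothetical monthly saving from the replaced role
per_run = 1.20     # EY's 2026 estimate for one agentic interaction

print(break_even_runs(saving, per_run))            # runs per month where the saving is gone
print(net_monthly_effect(saving, per_run, 3_000))  # below break-even: positive
print(net_monthly_effect(saving, per_run, 8_000))  # above break-even: negative
```

The saving is a constant and the cost is a slope, so a business case built only on the constant has no way to see the month where volume crosses the break-even point.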

Gartner has been saying the quiet part out loud for a while. Its June 2025 forecast projected that more than 40% of agentic AI projects would be canceled by the end of 2027, driven by escalating costs, unclear business value, and inadequate risk controls. Its May 2026 follow-up is sharper: autonomous business and AI-driven layoffs “may create budget room but do not deliver returns.” Cutting a name from payroll frees budget. It doesn’t, by itself, make the process cheaper or better. And if the agent doing the work now runs on a meter, you may have traded a predictable salary for an unpredictable bill.

The ROI that survives a metered world is the same one that always held up: AI earns its cost when it removes a step from a process — a status chase, a re-keyed record, a reconciliation done by hand — not when it removes a person. A removed step lowers consumption and labor at once. A removed headcount with the same broken process underneath just moves the cost from the payroll line to the token line, and makes it harder to see.

The working version: meter the process, then the bill

None of this is an argument against agents. It’s an argument about order of operations — and it’s the same one I bring to every integration job.

Instrument consumption per workflow, not per seat. You can’t manage a number you don’t collect. Tag token usage to the agent and the workflow that spent it, so “the Copilot bill” becomes “the code-review agent costs X, the migration agent costs Y.” That’s the difference between a surprise and a forecast.
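One way to picture that tagging is a small in-memory ledger, sketched below. The class names, field names, and sample events are all hypothetical; a real setup would write the same tags to whatever telemetry or billing export you already run.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One model call, tagged with who spent the tokens."""
    workflow: str
    agent: str
    model: str
    tokens: int
    cost_usd: float


class UsageLedger:
    """Collects usage events and rolls cost up by any tag."""

    def __init__(self) -> None:
        self._events: list[UsageEvent] = []

    def record(self, event: UsageEvent) -> None:
        self._events.append(event)

    def cost_by(self, field: str) -> dict[str, float]:
        """Sum cost grouped by a tag such as 'workflow', 'agent', or 'model'."""
        totals: dict[str, float] = defaultdict(float)
        for event in self._events:
            totals[getattr(event, field)] += event.cost_usd
        return dict(totals)


ledger = UsageLedger()
ledger.record(UsageEvent("code-review", "reviewer", "small-model", 120_000, 1.50))
ledger.record(UsageEvent("code-review", "reviewer", "frontier-model", 40_000, 0.25))
ledger.record(UsageEvent("migration", "migrator", "frontier-model", 300_000, 4.00))

by_workflow = ledger.cost_by("workflow")
by_model = ledger.cost_by("model")
print(by_workflow)
print(by_model)
```

The design choice that matters is attaching the tags at the moment of the call. Attribution reconstructed later from an invoice is guesswork.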

Route by task. Sending every step to the most capable model is the single most expensive default in agentic AI. Routine extraction and classification run fine on cheap models; reserve the frontier model for the steps where it actually changes the answer. The routing decision, not the vendor’s price list, is where most of the bill is set.
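A routing rule can be as plain as a lookup. The sketch below assumes two made-up model tiers with made-up per-million-token prices and a three-step plan; it compares sending everything to the frontier model with routing routine steps to the cheap one.

```python
# Hypothetical model names and USD prices per million tokens.
PRICE_PER_M = {"small-model": 0.50, "frontier-model": 10.00}
ROUTINE_TASKS = {"extraction", "classification"}


def choose_model(task_type: str) -> str:
    """Cheap model for routine work, frontier model for everything else."""
    return "small-model" if task_type in ROUTINE_TASKS else "frontier-model"


def plan_cost(steps: list[tuple[str, int]], route: bool) -> float:
    """Cost of a list of (task_type, tokens) steps, with or without routing."""
    total = 0.0
    for task_type, tokens in steps:
        model = choose_model(task_type) if route else "frontier-model"
        total += tokens * PRICE_PER_M[model] / 1_000_000
    return total


steps = [("extraction", 200_000), ("classification", 100_000), ("planning", 50_000)]
all_frontier = plan_cost(steps, route=False)
routed = plan_cost(steps, route=True)
print(f"all frontier: ${all_frontier:.2f}, routed: ${routed:.2f}")
```

On these invented numbers the routed plan costs $0.65 against $3.50. The real ratio depends entirely on your task mix and rate card, and routing only pays off if the cheap model really does handle the routine steps correctly.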

Cap by workflow, with a named owner. A cap on a user is a blunt instrument that stops someone’s work at a random moment. A cap per workflow, owned by one person who reconciles the forecast against the invoice, turns the meter from a monthly shock into a managed line. This is the role that doesn’t exist on most org charts yet, and it’s the one that pays for itself.
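Here is a minimal sketch of a per-workflow cap, assuming a hypothetical WorkflowBudget record that carries the owner's name next to the limit. The owner, the cap, and the 80% alert threshold are illustrative choices, not a recommendation.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a charge would push a workflow past its monthly cap."""


@dataclass
class WorkflowBudget:
    workflow: str
    owner: str                # the named person who reconciles this line
    monthly_cap_usd: float
    spent_usd: float = 0.0
    alert_ratio: float = 0.8  # warn the owner before the cap is hit

    def charge(self, cost_usd: float) -> str:
        """Record a cost; return 'ok' or 'alert', or raise if the cap would be exceeded."""
        if self.spent_usd + cost_usd > self.monthly_cap_usd:
            raise BudgetExceeded(
                f"{self.workflow}: cap of ${self.monthly_cap_usd:.2f} reached, "
                f"contact {self.owner}"
            )
        self.spent_usd += cost_usd
        if self.spent_usd >= self.alert_ratio * self.monthly_cap_usd:
            return "alert"
        return "ok"


budget = WorkflowBudget(workflow="code-review", owner="A. Owner", monthly_cap_usd=100.0)
first = budget.charge(50.0)    # well under the cap
second = budget.charge(35.0)   # crosses the alert threshold
try:
    budget.charge(20.0)        # would exceed the cap, so it is refused
    blocked = False
except BudgetExceeded:
    blocked = True
print(first, second, blocked)
```

The alert matters more than the hard stop: it gives the owner time to act before work halts mid-month.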

Make the consumption legible before you renegotiate. When the same fact — what a workflow costs to run — is current in the vendor portal, stale in the budget spreadsheet, and absent from the team actually triggering the runs, no amount of haggling on the rate fixes it. That agreement about what gets measured and who owns it is a data contract, same as the ones between systems. It’s the unglamorous layer that decides whether the technology works, and it’s the work I do first on every engagement.
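The reconciliation itself can start as a simple comparison of two tables. This sketch assumes per-workflow forecast and invoice totals are available as plain dictionaries; the workflow names, dollar amounts, and 10% tolerance are hypothetical.

```python
def reconcile(forecast: dict[str, float], invoiced: dict[str, float],
              tolerance: float = 0.10) -> list[tuple[str, str]]:
    """Flag workflows with no forecast or with spend outside the tolerance band."""
    findings: list[tuple[str, str]] = []
    for workflow in sorted(set(forecast) | set(invoiced)):
        planned = forecast.get(workflow)
        actual = invoiced.get(workflow, 0.0)
        if planned is None:
            findings.append((workflow, "no forecast"))
        elif planned == 0:
            if actual > 0:
                findings.append((workflow, "variance"))
        elif abs(actual - planned) / planned > tolerance:
            findings.append((workflow, "variance"))
    return findings


forecast = {"code-review": 100.0, "migration": 200.0}
invoiced = {"code-review": 105.0, "migration": 320.0, "triage": 40.0}
findings = reconcile(forecast, invoiced)
print(findings)
```

A workflow that appears on the invoice with no forecast is the more telling finding. It means tokens are being spent somewhere nobody planned for.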

Done in the other order — negotiate the price, then discover the consumption — you’ve just locked in a meter you still can’t read.

The operator read

GitHub didn’t break Copilot. It did something more useful: it made the cost visible. For two years the token meter was running and the vendor was eating it, so the org never had to build the muscle to forecast or own consumption. That muscle is now mandatory, and it’s missing in most companies.

The bottleneck was never the model, and it isn’t the price either. It’s that AI quietly became a utility, billed by the unit like power or compute, while the org still manages it like a flat subscription. The throughline of this work holds here too: the technology arrived, and the process underneath it didn’t. If your AI agent costs jumped this quarter and no one can tell you which workflow spent the money, that’s not a pricing problem. It’s an ownership problem, and that’s the conversation worth having.

FAQ

Why did GitHub Copilot costs suddenly jump in 2026?
Because GitHub stopped absorbing the compute. On June 1, 2026, every Copilot plan moved to usage-based billing: premium request units were replaced by “AI Credits” that drain based on token usage, including input, output, and cached tokens, at each model’s published API rate. GitHub’s own explanation: Copilot is now “an agentic platform capable of running long, multi-step coding sessions” with “significantly higher compute and inference demands,” and under the old flat price “a quick chat question and a multi-hour autonomous coding session can cost the user the same amount.” The price didn’t really go up. The meter just got handed to you.

Why are enterprise AI agent costs so unpredictable?
Because an agent’s cost scales with how much work it does, not how many people use it. A traditional chat answer takes hundreds of tokens; per EY’s 2026 analysis, an agentic workflow can consume hundreds of thousands of tokens in a single session as it plans, retrieves, calls tools, and loops. EY puts the same customer-service interaction at $0.04 in 2023 and $1.20 in 2026 once it became an orchestrated agentic system, roughly 30 times as much. A flat seat license hides that variance. A per-token meter exposes it, and most budgets were built on the flat number.

Who should own AI agent costs inside a company?
Someone, by name, per workflow, and that role usually doesn’t exist yet. Finance budgeted seats. Engineering ships the agent loops that burn tokens. Operations hits the cap when work stops mid-month. The token consumption sits in the seam between them, and a cost no one owns is a cost no one forecasts. The fix is the same as any integration problem: make the consumption legible (instrument tokens per workflow, not just per user), then assign an owner who reconciles the forecast against the bill.

Do AI agents cut costs by replacing headcount?
That’s the wrong math, and it’s the one driving cancellations. Gartner’s May 2026 view is blunt: autonomous business and AI-driven layoffs “may create budget room but do not deliver returns.” Real ROI shows up when AI removes a step from a process (a status chase, a re-keyed record, a manual reconciliation), not when it removes a name from payroll. If you justify an agent by the salary it replaces while ignoring the metered token bill it adds, you can end up paying more for less.

How do you control AI agent costs the right way?
Meter the process before you negotiate the bill. Instrument token consumption per workflow so you know which agent loop costs what. Route by task (cheap models for routine steps, frontier models only where they change the answer) instead of sending everything to the most expensive model. Cap by workflow, not just by user. And give one person the forecast. The vendor’s pricing page is the last lever; the consumption you can’t see is the first.