The Subsidy Ended: What Tool-Using Agents Actually Cost

On June 1, GitHub Copilot’s usage-based billing became active for all Copilot plans, and developers reacted quickly and loudly. A Pro plan still costs $10, but it now comes with a monthly pool of AI credits. Those credits are priced at a penny each, and they’re consumed according to the model used and the tokens […]

Jun 9, 2026 0 0

The Subsidy Ended: What Tool-Using Agents Actually Cost

On June 1, GitHub Copilot’s usage-based billing became active for all Copilot plans, and developers reacted quickly and loudly. A Pro plan still costs $10, but it now comes with a monthly pool of AI credits. Those credits are priced at a penny each, and they’re consumed according to the model used and the tokens processed, including input, output, and cached tokens. For a heavy agentic session running a frontier model, that makes spend feel very different from a flat subscription.

That’s the news, and it’s worth understanding, but it isn’t the important part. Nothing about the underlying cost of agentic work actually changed on June 1. The tokens were always being consumed, the loops were always running, and the tool calls were always expanding the context. What changed is that the meter became visible. A workload that had been quietly subsidized under a flat rate started showing up as an itemized bill.

Where the tokens go

To see why the bill landed so hard, it helps to compare two things that look similar and bill very differently. A chat completion is close to a single transaction. You send a prompt, the model sends an answer, and you pay roughly once for the input and once for the output. A tool-using agent doesn’t work that way at all. An agent doesn’t answer a question so much as work toward it, and it works by looping. It reasons about the task, calls a tool, reads the result, reasons again, calls another tool, and continues until it decides it’s finished.

Every pass through that loop carries a cost that’s easy to miss. In many agent harnesses, each turn carries forward a large share of the accumulated context: prior messages, tool descriptions, retrieved files, and tool results. Even when some of that context is cached, summarized, or pruned, the system is still doing metered work to preserve enough state for the next decision. The final answer you actually wanted is only a thin slice of what you paid for. The loop is the bill.

This is why agent cost doesn’t scale politely. It scales with the number of turns, and the number of turns scales with how much discovery the agent has to do, which in turn scales with how vague the request was and how much irrelevant context it’s dragging along. A clean, well-scoped task might finish in three turns, while the same task posed as an open-ended question might wander through 15, each carrying the cost of everything that came before it. Under a flat rate, that difference was invisible. Under usage-based billing, it’s the difference between a small interaction and an expensive one.

Tool design is now part of the cost model

I wrote recently about a hidden tax on Model Context Protocol servers: the way an overstuffed tool catalog quietly degrades a model’s ability to route to the right tool. Bloated descriptions, overlapping responsibilities, and vague parameters make the model’s job harder and its choices worse. That argument was about accuracy. The billing change adds a second invoice for the same bloat, and this one is denominated in dollars.

The tool catalog is often part of what gets carried through the agent’s loop. A tool described in three tight sentences and a tool described in three rambling paragraphs may both function, but the second one pays rent in the context window every time an agent has it loaded. Multiply that across a catalog of 40 tools and a workflow that runs a dozen turns, and the cost of verbose tool design stops being a rounding error. Tool design was already a correctness discipline. It’s now a cost discipline as well. The same audit that tightens routing accuracy tightens the bill.

Where prompt discipline runs out

There’s a layer of this that individual users can control, and it’s worth knowing because the savings are real and immediate. Two patterns matter most, and I’ve been handing both to the engineers on a pilot I run for a large healthcare organization. They aren’t magic tricks. They’re ways to keep the agent out of unnecessary discovery loops.

The first pattern is about input. Prompt the agent like a short requirement rather than a broad question. A request such as “look at the encounter data and tell me what you find” forces the agent into discovery mode, where it burns turns figuring out what you meant, and every one of those turns carries the full context forward. Compare that to a prompt that front-loads the specifics by naming the project and the table, naming the date field to filter on, stating the output shape you want, and calling out anything that should be excluded. A better prompt would be: “Using the curated clinical project and the silver-zone encounters table, show total encounters by month for calendar year 2025, use admission_date_time for inclusion, and return one row per month ordered chronologically.” The second prompt collapses the loop. The agent has what it needs on the first turn, so it does the work instead of interviewing you for it.

In practice, the difference isn’t just polish. The vague version forces the agent to discover the data model, infer the date semantics, choose an aggregation, and decide on a display format. The specific version turns the task into a bounded query. That difference shows up in accuracy, latency, and cost.

The second pattern is about output, and it’s the lever most people overlook. Ask for plain text or Markdown during the intermediate steps, and save rich HTML formatting for the final, confirmed deliverable. Formatted output is expensive to generate, and requirements shift. If you ask for a polished HTML report on the first pass and then change a filter, you pay full output-token freight to regenerate all that layout, often more than once. The cheaper habit is to validate the numbers in text and format only at the end.

These patterns work, and they also have a ceiling. Both of them put the entire burden of cost control on the user, and they hold only as long as every user exercises the discipline on every prompt. The day someone reverts to “tell me what you find,” the savings evaporate, and the only thing standing between the team and a surprise invoice is a budget cap that reports the overspend after it has already happened.

Cost is a governance problem, not a budgeting one

That fragility is the real lesson. A budget cap is a backstop rather than a control. It will stop a runaway, but it tells you that you overspent rather than why, and it does nothing to make the next run cheaper. Treating cost as a budgeting problem leaves you forever reacting to the meter, while treating it as an architecture problem lets you build the savings in once and stop relying on everyone’s good behavior.

That means the controls that matter belong on the platform rather than in individual prompts. By the platform I don’t mean the agent itself, the coding assistant or chat client a developer drives day-to-day, and I don’t mean the model or a router sitting beneath it. I mean the control plane that sits above the agents, the layer where an organization enforces policy, access, observability, and now cost across every agent and model its developers touch. An administrative console that gives IT visibility into who is doing what and which capabilities they can install is an early, narrow instance of it. A router that sends planning to a cheap model is one feature that belongs there. The platform is where the rules live, and the agent is a consumer of those rules rather than the place you set them. The platform should route models by task, using cheaper models for planning and reserving frontier models for work that earns the price. It should bound the loop, requiring the agent to check in after a fixed number of iterations. It should cap tool-result payloads so a careless query cannot dump a million rows into the context window. It should default intermediate work to plain text, making the cheap path the path of least resistance instead of something users have to remember.

Every one of those controls is something a user can approximate by hand and something the platform can simply guarantee. This is the same principle I keep returning to in the context of data access, where safe behavior cannot depend on the person at the keyboard remembering the rules. Prompts guide behavior. Guardrails make the cheaper and safer behavior the default. Cost governance is guardrails as control plane, with a dollar sign attached, enforced at the same layer where you already enforce who is allowed to see which row.

The pattern, not the vendor

It would be a mistake to read this as only a GitHub story. GitHub is the current example because its change is visible and recent, but usage-based billing for agentic work is the direction of travel for many AI tools. The economics under the hood are similar: Agentic workloads turn single answers into loops of model calls, tool calls, and context management. The flat-rate subsidy was always going to come under pressure once the workload shifted from autocomplete to autonomy.

The organizations that treat June 1 as a pricing event will optimize a few prompts, grumble, and move on until the next vendor changes its meter. The ones that treat it as an architecture signal will push the cost controls down into the platform, where they hold regardless of which provider is counting which token. That’s the more durable place to stand. The bill didn’t get bigger this month. It got honest, and an honest bill is the kind you can engineer against.