GitHub Copilot AI Credits Cost Design: What Skills, MCP, and External Context Actually Bill
For / Key Points
For: Developers and teams using GitHub Copilot with Skills, MCP, and external documentation.
Key Points:
- The billing boundary is not whether information was fetched, but whether it entered the model context.
- Output tokens are more expensive, but very large inputs can dominate the bill on their own.
- Skills should be designed as a control plane that routes the agent to the smallest useful context, not as a bulk knowledge injector.
On June 1, 2026, the important cost unit for GitHub Copilot shifted from request count to token volume. A short user request can still become expensive if the agent loads Skills, fetches GitHub Issues or pull requests, and pushes external documentation into the model context.
This article answers one question. When you use Skills, MCP, and external references, what actually becomes billable in AI Credits?
The Billing Boundary Is Context Injection
Billing starts when information enters the context the model can use, not when the information is fetched. GitHub Copilot moved from Premium Request Units to GitHub AI Credits on June 1, 2026. Usage is calculated from input tokens, output tokens, and cached tokens at model-specific rates1. GitHub Docs also describes model pricing in terms of input sent to the model, output generated by the model, and cached context that is reused or stored2.
A file that only exists locally, or an Issue that only exists on GitHub, does not itself consume AI Credits. Even when gh or an MCP server retrieves data, the retrieval itself is not token-billed. Tokens are counted only once the agent passes the result into a prompt, summary, quote, or reasoning step.
The reverse misconception is more dangerous: the user's visible prompt is not the only billable input. Skill text, tool results, conversation history, intermediate generated text, and system/tool definitions all matter once they enter the active context.
If this boundary is unclear, you cannot tell whether a session became expensive because the GitHub API was called or because the fetched payload was injected wholesale into the model. Cost design starts with the text handed to the model, not the number of API calls.
Output Is Expensive, but Large Input Is Expensive Too
Output tokens often cost more than input tokens, but a large reference context can dominate the bill before the model writes much at all. For example, GPT-5.5 is priced at 5.00 USD per million input tokens, 0.50 USD per million cached input tokens, and 30.00 USD per million output tokens2. The output-to-input ratio is 6x, which can make input look cheap at first glance.
With the same GPT-5.5 rates, the arithmetic tells a different story.
- 200,000 input tokens: 0.2M x 5.00 USD = 1.00 USD = 100 AI Credits
- 2,000 output tokens: 0.002M x 30.00 USD = 0.06 USD = 6 AI Credits
- The same 200,000 tokens as cached input: 0.2M x 0.50 USD = 0.10 USD = 10 AI Credits
Even with a short final answer, the loaded context can cost more than 16 times the generated answer. Caching can reduce that input cost by 10x, but users do not fully control which context is treated as cached input. Anthropic models also include a separate cache write price in addition to cached input, so designs should not assume that cache always makes large context cheap2.
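The arithmetic above can be reproduced with a short script. This is a minimal sketch, not an official calculator: the rates are the GPT-5.5 figures quoted in this article, the conversion of 1.00 USD to 100 AI Credits is taken from the worked figures above, and the names `GPT_5_5_RATES` and `credits` are illustrative rather than part of any GitHub API.

```python
from decimal import Decimal

# USD per million tokens, as quoted in this article for GPT-5.5.
GPT_5_5_RATES = {
    "input": Decimal("5.00"),
    "cached_input": Decimal("0.50"),
    "output": Decimal("30.00"),
}

# Assumption taken from the worked figures above: 1.00 USD = 100 AI Credits.
CREDITS_PER_USD = Decimal("100")
TOKENS_PER_MILLION = Decimal("1000000")


def credits(tokens: int, kind: str, rates: dict = GPT_5_5_RATES) -> Decimal:
    """Return the AI Credits consumed by `tokens` of the given kind."""
    usd = Decimal(tokens) / TOKENS_PER_MILLION * rates[kind]
    return usd * CREDITS_PER_USD


fresh_input = credits(200_000, "input")
cached_input = credits(200_000, "cached_input")
answer = credits(2_000, "output")

print(f"200,000 fresh input tokens:  {fresh_input:.0f} credits")
print(f"200,000 cached input tokens: {cached_input:.0f} credits")
print(f"2,000 output tokens:         {answer:.0f} credits")
print(f"fresh input / output ratio:  {fresh_input / answer:.1f}x")
```

Swapping in another model's rates only requires a different dictionary, which makes it easy to compare how the input-to-output balance shifts between models.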
Reading is not free. Input can become the primary cost once the context grows by orders of magnitude.
One Prompt Can Trigger Multiple Model Calls
A short request does not guarantee a single model call. The user may type one sentence, such as "review this by checking GitHub with these rules," while the agent runs a longer internal chain.
Read the user prompt
→ choose a Skill
→ inject SKILL.md into context
→ fetch Issues or pull requests through MCP
→ inject fetched results into context
→ generate intermediate reasoning
→ read additional files
→ inject more context
→ generate the final response
Each step can add model input or output. GitHub Docs notes that agentic features such as agent mode and Copilot cloud agent can include multiple model calls within one task. Complex sessions across large codebases consume substantially more than a quick chat question3.
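The chain above can be approximated with a small model of how context accumulates. This is a sketch under a deliberately pessimistic assumption: every model call re-sends the full accumulated context, including earlier outputs, and nothing is served from cache. The `Step` type, the grouping of steps into four calls, and all token counts are hypothetical and chosen only to show the shape of the growth.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    added_context_tokens: int  # new text injected before this model call
    output_tokens: int         # text the model generates in this call


def total_billed_tokens(steps: list[Step]) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) summed over all model calls.

    Simplifying assumption: each call re-reads the whole accumulated
    context, and no part of it is billed at a cached rate.
    """
    context = 0
    total_input = 0
    total_output = 0
    for step in steps:
        context += step.added_context_tokens
        total_input += context         # the whole context is input again
        total_output += step.output_tokens
        context += step.output_tokens  # generated text joins the history
    return total_input, total_output


# Hypothetical token counts for a four-call session.
session = [
    Step("prompt + SKILL.md", 3_000, 200),
    Step("MCP results injected", 20_000, 500),
    Step("additional files", 10_000, 300),
    Step("final response", 0, 1_000),
]

input_tokens, output_tokens = total_billed_tokens(session)
injected = sum(step.added_context_tokens for step in session)
print(f"text injected once:  {injected:,} tokens")
print(f"input across calls:  {input_tokens:,} tokens")
print(f"output across calls: {output_tokens:,} tokens")
```

Under these assumptions, 33,000 tokens of injected text turn into 93,900 input tokens across four calls, because early context is read again by every later call. Real sessions will differ, especially when cached input applies.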
Copilot CLI adds another cost surface through compaction. When a conversation reaches about 80% of the context window, Copilot CLI automatically compacts context by sending the full conversation to the model and asking it to generate a structured summary4. That summarization step is also a model interaction.
"Short prompt = cheap" is therefore the wrong mental model. Actual cost depends on how many model calls the agent makes, how much tool output it keeps in context, and which model handles each step.
What Skills, MCP, and External References Bill
Skills are not a free way to feed unlimited knowledge into Copilot. GitHub Copilot Agent Skills work by letting Copilot decide when a Skill is relevant, then injecting the SKILL.md file into the agent context5. If the file is long, that long instruction payload can occupy context every time the Skill is used.
| Element | Billing implication |
|---|---|
| Skill description | May be used to decide whether the Skill applies |
| SKILL.md body | Input tokens when the Skill is injected into context |
| Supplementary Markdown or script output | Input tokens for the portion loaded into context |
| GitHub API or MCP results | Input tokens if passed through as context |
| Locally filtered summary | Only the smaller text that is passed to the model |
| Local grep or retrieval operation itself | Not directly billed in AI Credits |
Repository-wide custom instructions deserve the same treatment. GitHub describes copilot-instructions.md as context that is automatically added to requests in the repository context, and also documents limits such as keeping instructions under two pages and not making them task-specific6.
Context is a shared resource. Teams need to separate what is always present, what is loaded only when relevant, and what should be processed locally before reaching the model.
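One way to make that separation concrete is to tag each source with a tier and total the estimated tokens per tier. The sketch below is illustrative only: the four-characters-per-token ratio is a rough rule of thumb rather than a real tokenizer, and the source names and sizes are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    ALWAYS = "always present"
    ON_DEMAND = "loaded when relevant"
    LOCAL_ONLY = "processed locally"


def estimate_tokens(char_count: int, chars_per_token: int = 4) -> int:
    """Rough estimate only; real tokenizers vary by model and language."""
    return -(-char_count // chars_per_token)  # ceiling division


# Hypothetical sources: (name, size in characters, tier).
sources = [
    ("copilot-instructions.md", 4_000, Tier.ALWAYS),
    ("SKILL.md", 2_000, Tier.ON_DEMAND),
    ("filtered issue summary", 8_000, Tier.ON_DEMAND),
    ("raw issue export", 400_000, Tier.LOCAL_ONLY),
]


def budget(items) -> dict:
    """Sum estimated tokens per tier."""
    totals = {tier: 0 for tier in Tier}
    for _name, chars, tier in items:
        totals[tier] += estimate_tokens(chars)
    return totals


totals = budget(sources)
# Only the first two tiers ever reach the model in this design.
model_visible = totals[Tier.ALWAYS] + totals[Tier.ON_DEMAND]

for tier, tokens in totals.items():
    print(f"{tier.value:<22} ~{tokens:,} tokens")
print(f"{'reaches the model':<22} ~{model_visible:,} tokens")
```

The point of the exercise is the last line: the raw export is by far the largest item, but it never counts toward input as long as it stays in the local tier.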
Two Cost Surfaces That Are Easy to Miss
Two extra costs are easy to miss when you only look at token conversion. Both are documented in GitHub's pricing reference.
- Copilot code review has two charges: token usage is billed in AI Credits, and the agentic infrastructure that powers the review also consumes GitHub Actions minutes2.
- Long-context thresholds affect displayed rates: the listed GPT-5.4 pricing applies only to prompts at or below 272K tokens, and the listed Gemini 2.5 Pro and Gemini 3.1 Pro pricing applies only to prompts at or below 200K tokens2.
The idea that a larger context window is automatically better is a poor default under usage-based billing. Use standard context and reasoning settings by default, then expand only for tasks that justify the extra context.
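A simple guard can flag prompts that would cross the thresholds mentioned above before they are sent. This is a sketch that encodes only the limits quoted in this article, reading "K" as thousands of tokens. It does not know what rate applies above a threshold, and the function name is hypothetical.

```python
# Prompt-size limits for the listed rates, as quoted in this article.
LISTED_RATE_LIMIT = {
    "GPT-5.4": 272_000,
    "Gemini 2.5 Pro": 200_000,
    "Gemini 3.1 Pro": 200_000,
}


def within_listed_rate(model: str, prompt_tokens: int) -> bool:
    """Return True if the prompt is at or below the model's quoted limit."""
    if model not in LISTED_RATE_LIMIT:
        # The article gives no threshold for other models.
        raise KeyError(f"no threshold recorded for {model!r}")
    return prompt_tokens <= LISTED_RATE_LIMIT[model]


for model, tokens in [("GPT-5.4", 250_000), ("Gemini 2.5 Pro", 250_000)]:
    status = "within" if within_listed_rate(model, tokens) else "above"
    print(f"{model}: {tokens:,} tokens is {status} the listed-rate limit")
```

The same 250,000-token prompt lands on different sides of the limit depending on the model, which is one more reason to treat context size as a per-task decision.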
Filter Before the LLM Reads
The core practice is simple: filter before the LLM reads. Design Skills as a routing and control layer that tells the agent how to retrieve the smallest useful evidence set.
A minimal SKILL.md should behave more like a routing definition than a long operating manual. It should specify when to run, what to retrieve, and how to shape the output.
---
name: github-pr-review
description: Review GitHub pull requests. Use when asked for a PR review.
---
steps:
- Use gh to narrow target PRs (state=open, recently updated, assigned to me)
- Extract only changed files and relevant comments
- Pass the extracted JSON into the review step and focus on risk
Put filtering logic on the local side. Do not pass 100 Issues to the model when five filtered records answer the question. The following example fetches up to 100 open Issues, filters them locally by label, and passes only the first five matches to the LLM.
# any(...) keeps each Issue at most once, even if several labels match.
gh issue list --state open --limit 100 --json number,title,labels,updatedAt,url \
--jq '[.[] | select(any(.labels[]; .name == "review"))] | .[0:5]'
The design pattern comes down to four rules.
- Filter GitHub data by labels, state, path, and date before it enters context.
- Prefer structured JSON around 10 KB over free-form text around 100 KB.
- Search or index large documents, then pass only grounded excerpts instead of full text.
- Let scripts or lightweight models handle extraction, and reserve powerful models for final judgment.
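The first two rules can also be applied in a script that runs before anything reaches the model. The sketch below assumes the records have the shape produced by the gh command above (a `labels` list of objects with a `name` field, and ISO 8601 `updatedAt` timestamps). The function name, sample data, and example.com URLs are hypothetical.

```python
import json


def select_for_context(issues: list[dict], label: str, limit: int = 5) -> str:
    """Keep issues carrying `label`, newest first, and return compact JSON."""
    matching = [
        issue for issue in issues
        if any(item.get("name") == label for item in issue.get("labels", []))
    ]
    # ISO 8601 UTC timestamps in one format sort correctly as plain strings.
    matching.sort(key=lambda issue: issue["updatedAt"], reverse=True)
    # Drop fields the review step does not need.
    trimmed = [
        {"number": issue["number"], "title": issue["title"], "url": issue["url"]}
        for issue in matching[:limit]
    ]
    return json.dumps(trimmed, separators=(",", ":"))


# Hypothetical input: 100 issues, every fourth one labeled "review".
issues = [
    {
        "number": n,
        "title": f"Issue {n}",
        "labels": [{"name": "review"}] if n % 4 == 0 else [{"name": "bug"}],
        "updatedAt": f"2026-05-{(n % 28) + 1:02d}T00:00:00Z",
        "url": f"https://example.com/issues/{n}",
    }
    for n in range(1, 101)
]

raw = json.dumps(issues)
payload = select_for_context(issues, "review")
print(f"raw export:      {len(raw):,} characters")
print(f"context payload: {len(payload):,} characters")
```

Only `payload` is meant to be handed to the model. The raw export stays local, so its size affects the script's run time rather than the token bill.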
Accuracy and cost are not always in conflict. Removing irrelevant context can make the model focus on the information that actually matters.
Measure It With /context
The fastest way to validate the design is to measure it. Copilot CLI's /context command shows fixed System/Tools overhead, conversation history, free space, and the buffer that triggers context management4.
Test in stages. Run a small Skill once, compare the context usage when passing 10 GitHub records versus 100, then switch between lightweight and powerful models to see the slope. Changing only the length of SKILL.md is also a useful way to expose the hidden cost of instruction injection.
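Two readings are enough to estimate the slope mentioned above. This sketch fits a straight line through two measurements noted by hand, one taken with 10 records and one with 100. The readings are hypothetical, the linear model is an assumption, and nothing here parses actual /context output.

```python
def per_record_cost(
    small: tuple[int, int], large: tuple[int, int]
) -> tuple[float, float]:
    """Fit tokens = fixed + per_record * records through two readings.

    Each reading is (record_count, context_tokens_observed).
    """
    (n1, t1), (n2, t2) = small, large
    per_record = (t2 - t1) / (n2 - n1)
    fixed = t1 - per_record * n1
    return fixed, per_record


# Hypothetical readings noted after two otherwise identical runs.
fixed, per_record = per_record_cost((10, 14_000), (100, 59_000))
print(f"fixed overhead:  ~{fixed:,.0f} tokens")
print(f"marginal record: ~{per_record:,.0f} tokens each")
```

A large fixed term points at instructions, Skill text, and tool definitions, while a large per-record term points at the fetched payload. Each calls for a different fix.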
Usage-based billing is fairer because it tracks actual consumption, but it also makes design choices visible on the invoice. Treat Skills and MCP as a control plane, filter before context injection, and optimize for the smallest evidence set that can still support a correct answer.