TL;DR
A 200-tool MCP setup burns 150–220k tokens at session start; a two-meta-tool proxy cuts that to ~2,000 tokens — a 98–99% reduction.
At $3/million input tokens and 20 sessions/day over a 30-day month, that's $270/month at the 150k low end (about $396 at 220k), down to under $4/month.
The pattern: one discover_tools(query) meta-tool and one call_tool(name, args) meta-tool replace hundreds of schemas at startup, with on-demand retrieval via BM25 + synonym expansion.
A model burning its full context window on schemas before the first message has zero room for conversation history, documents, or reasoning — the real cost is context headroom, not dollars.
Build the proxy if you exceed ~50 tools or any single tool schema tops 2,000 tokens; below that, load everything conventionally.
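The TL;DR numbers can be checked with a few lines of arithmetic. The sketch below assumes a 30-day month and counts only the schema tokens loaded at session start. The function names and the `should_build_proxy` helper are illustrative, not taken from any library.

```python
def monthly_cost(tokens_per_session, sessions_per_day=20,
                 usd_per_million=3.0, days=30):
    """USD per month spent on schema tokens alone (assumes a 30-day month)."""
    return tokens_per_session * sessions_per_day * days * usd_per_million / 1_000_000


def reduction(before_tokens, after_tokens):
    """Fraction of startup tokens saved by switching to the proxy."""
    return 1 - after_tokens / before_tokens


def should_build_proxy(schema_token_counts, max_tools=50, max_schema_tokens=2_000):
    """Rule of thumb from the TL;DR: more than ~50 tools, or any schema over 2,000 tokens."""
    return (len(schema_token_counts) > max_tools
            or any(t > max_schema_tokens for t in schema_token_counts))


if __name__ == "__main__":
    print(monthly_cost(150_000))                  # 270.0 (low end, all schemas loaded)
    print(monthly_cost(220_000))                  # 396.0 (high end)
    print(round(monthly_cost(2_000), 2))          # 3.6 (two meta-tools)
    print(round(reduction(150_000, 2_000), 3))    # 0.987
    print(round(reduction(220_000, 2_000), 3))    # 0.991
```

The dollar gap is small next to the headroom gap: the same 150–220k tokens are also context the model can no longer spend on history, documents, or reasoning.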
The MCP token tax has a fix. It's not fewer tools. It's a smarter way to load them.
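To make the two-meta-tool pattern concrete, here is a minimal sketch in plain Python. It is not the author's implementation. The tool registry, the synonym table, and the BM25 parameters (`k1`, `b`) are all assumptions chosen for illustration, and a real proxy would forward `call_tool` to the underlying MCP servers instead of local lambdas.

```python
import math
import re
from collections import Counter

# Hypothetical registry: tool name -> (description, handler). Illustration only.
TOOLS = {
    "create_issue": ("Create a new issue or ticket in the bug tracker",
                     lambda args: f"created issue: {args['title']}"),
    "send_email": ("Send an email message to a recipient",
                   lambda args: f"sent mail to {args['to']}"),
    "query_database": ("Run a SQL query against the database",
                       lambda args: f"ran: {args['sql']}"),
}

# Hypothetical synonym table used to expand the query before scoring.
SYNONYMS = {"bug": ["issue", "ticket"], "mail": ["email"], "db": ["database", "sql"]}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


# One "document" per tool: its name plus its description, tokenized once.
DOCS = {name: tokenize(name + " " + desc) for name, (desc, _) in TOOLS.items()}
AVG_LEN = sum(len(doc) for doc in DOCS.values()) / len(DOCS)


def bm25(query_terms, doc, k1=1.5, b=0.75):
    """Okapi BM25 score of one tool document against the query terms."""
    tf = Counter(doc)
    score = 0.0
    for term in set(query_terms):
        n = sum(1 for d in DOCS.values() if term in d)  # document frequency
        idf = math.log(1 + (len(DOCS) - n + 0.5) / (n + 0.5))
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / AVG_LEN))
    return score


def discover_tools(query, top_k=3):
    """Meta-tool 1: return only the few tools that match the query."""
    terms = tokenize(query)
    expanded = terms + [s for t in terms for s in SYNONYMS.get(t, [])]
    scores = {name: bm25(expanded, doc) for name, doc in DOCS.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [{"name": n, "description": TOOLS[n][0]} for n in ranked if scores[n] > 0]


def call_tool(name, args):
    """Meta-tool 2: dispatch to the real tool by name."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name][1](args)


if __name__ == "__main__":
    print(discover_tools("send mail"))
    print(call_tool("send_email", {"to": "a@example.com"}))
```

Only the schemas for `discover_tools` and `call_tool` are loaded at startup; everything else is fetched when a query asks for it. The synonym step matters because a user who says "bug" should still reach a tool whose description says "issue".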
In The MCP Token Tax, I wrote about a structural problem in how agents load tool schemas: everything loads at session start, before the user types a word. A produc