I Tested TOON Format on My AI Agent Stack and Cut Token Usage by 40%

How switching from JSON to TOON encoding in my MCP servers reduced AI agent token consumption by 30-45% on real production queries. Full benchmark data and implementation guide.

I run a stack of AI agents that manage my entire consultancy. CRM updates, invoice tracking, content scheduling, client onboarding. All of it flows through MCP servers - lightweight middleware that connects AI models to your business tools so they can actually read and write data, not just talk about it.

The agents work well. But every time they pull a list of records from a database, the response comes back as JSON. Field names repeat on every single record, and structural characters add up fast. For a 10-record query with 10 fields, roughly half the characters the AI processes are structural overhead - brackets, braces, quotes, repeated keys - rather than data it actually needs.

I found TOON - Token-Oriented Object Notation - an open source format designed to fix this exact problem. It encodes the same data but strips the bloat. The claim was 40% fewer tokens with better parsing accuracy.

I decided to test it on my production MCP stack. Here's what happened.

How MCP Servers Handle Data (and Why It's Wasteful)

When an AI agent queries your CRM or database through an MCP server, the response is structured data - rows and columns, basically a spreadsheet. JSON wraps that spreadsheet in a lot of packaging: every cell gets labeled with its column name, every row gets wrapped in curly braces, every value gets quoted.

This is fine for web APIs where a few kilobytes don't matter. But AI agents pay per token. Every unnecessary bracket, every repeated field name across 20 records, every quote around a number - that's budget going to syntax instead of substance.

TOON takes a different approach. For records that share the same fields (which is most database queries), it declares the column names once at the top and sends each row as a simple comma-separated line. Same data. Way less packaging.

The Benchmark

I pulled real data from my NocoDB content tracking table and ran it through both formats. Five records, ten fields each.

| Format | Size | Token estimate |
| --- | --- | --- |
| Standard JSON (pretty-printed) | 4,846 chars | ~1,212 tokens |
| Compact JSON | 3,901 chars | ~976 tokens |
| TOON | 2,657 chars | ~665 tokens |

TOON used 45% less space than standard JSON. Even against compact JSON with no whitespace, it saved 32%.
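The token estimates above come from the rough "one token per ~4 characters" heuristic, and the percentages fall straight out of the character counts. A quick sketch of the arithmetic (the heuristic is an approximation, not a real tokenizer):

```typescript
// Rough token estimate: ~4 characters per token (a common heuristic,
// not an exact tokenizer count).
const estimateTokens = (chars: number): number => Math.ceil(chars / 4);

// Character counts from the benchmark table.
const sizes = { prettyJson: 4846, compactJson: 3901, toon: 2657 };

// Percentage saved by TOON relative to a baseline format.
const savings = (baseline: number, candidate: number): number =>
  Math.round((1 - candidate / baseline) * 100);

console.log(estimateTokens(sizes.toon));             // 665
console.log(savings(sizes.prettyJson, sizes.toon));  // 45
console.log(savings(sizes.compactJson, sizes.toon)); // 32
```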

Here's why. Instead of this:

{"Id": 23, "Name": "Pipeline Post", "Brand": "Personal", "Status": "Scheduled"}
{"Id": 24, "Name": "Strategy Post", "Brand": "Personal", "Status": "Scheduled"}

TOON sends this:

list[2]{Id,Name,Brand,Status}:
  23,Pipeline Post,Personal,Scheduled
  24,Strategy Post,Personal,Scheduled

Field names declared once in a header. No brackets. No quotes unless the value needs them. For list queries - which make up about 90% of what my MCP servers return - the savings are significant.
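For intuition, here's a toy encoder for just the tabular case shown above. This is not the real @toon-format/toon library - which also handles nesting, quoting, and escaping - just a sketch that assumes flat, uniform rows whose values need no quoting:

```typescript
type Row = Record<string, string | number>;

// Toy illustration of TOON's tabular form: declare the field names once
// in a header, then emit each row as a comma-separated line.
function encodeTabular(name: string, rows: Row[]): string {
  const fields = Object.keys(rows[0]);
  const header = `${name}[${rows.length}]{${fields.join(",")}}:`;
  const lines = rows.map((row) => "  " + fields.map((f) => String(row[f])).join(","));
  return [header, ...lines].join("\n");
}

const posts = [
  { Id: 23, Name: "Pipeline Post", Brand: "Personal", Status: "Scheduled" },
  { Id: 24, Name: "Strategy Post", Brand: "Personal", Status: "Scheduled" },
];
console.log(encodeTabular("list", posts));
// list[2]{Id,Name,Brand,Status}:
//   23,Pipeline Post,Personal,Scheduled
//   24,Strategy Post,Personal,Scheduled
```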

Testing Across Four MCP Servers

I didn't just benchmark static data. I rolled TOON out to four production MCP servers and tested with real queries:

| MCP Server | What it connects | Result |
| --- | --- | --- |
| NocoDB | Content tracking database | 45% savings; tabular compression kicked in |
| InvoiceNinja | Invoicing and product catalog | 35-40% savings after stripping empty arrays |
| Outline | Internal wiki and docs | 39% savings on search results (25 docs compressed into one table) |
| Listmonk | Email platform and subscriber lists | 22% savings on subscriber data; no help on HTML email templates |

Zero parsing errors across all four servers. Claude read every TOON response correctly - content records, product catalogs, subscriber lists, document search results.

Where It Didn't Help

Not everything improved.

Mixed data structures - when records have different shapes (some with extra nested fields, some without), TOON can't use the tabular compression. The savings dropped to near zero.

Large text fields - email templates with full HTML bodies, long document content. TOON can't compress a 10KB HTML string. It's just a string either way.

The fix for mixed structures was straightforward: strip the fields the AI doesn't need before encoding. My NocoDB records included empty relationship arrays on every response. Removing those made all records uniform, which unlocked tabular compression on queries that were previously the worst case.
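A minimal sketch of that cleanup step, assuming records where the only non-uniform fields are empty relationship arrays (the field names here are illustrative, not my actual schema):

```typescript
// Remove empty-array fields before encoding so every record shares the
// same shape, which lets TOON's tabular compression apply.
function stripEmptyArrays(record: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(record).filter(
      ([, value]) => !(Array.isArray(value) && value.length === 0)
    )
  );
}

// "LinkedTasks" is a hypothetical empty relationship field.
const raw = { Id: 23, Name: "Pipeline Post", LinkedTasks: [] };
console.log(stripEmptyArrays(raw)); // { Id: 23, Name: "Pipeline Post" }
```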

The Implementation

Adding TOON to an MCP server takes about 5 minutes. One npm package, one function change, and a fallback to standard JSON if anything breaks:

import { encode as toonEncode } from "@toon-format/toon";

// Opt in per server via an environment variable, so the change is
// reversible without touching code.
const USE_TOON = process.env.NOCODB_FORMAT === "toon";

function toTextResult(data) {
  if (USE_TOON) {
    // If TOON encoding fails for any data shape, fall back to plain JSON.
    try { return toonEncode(data); }
    catch { return JSON.stringify(data); }
  }
  return JSON.stringify(data);
}

The environment variable toggle lets you A/B test per server. Turn TOON on, run the same queries, compare results. If anything breaks, flip it back in seconds without touching code.

What This Means for Your AI Agent Costs

A single query saving 500 tokens isn't dramatic. But AI agent workflows aren't single queries. A typical content planning session hits my database 4-6 times, checks my social scheduler, queries my wiki, and sometimes pulls a full product catalog. That's 8-10 MCP calls.

At 35-45% savings per list query, the compound reduction across a full work session is meaningful. If you're building agents that make frequent database calls through MCP servers - and most useful agents do - this adds up.
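As back-of-the-envelope arithmetic (the per-call figures here are assumptions in the range measured above, not measurements):

```typescript
// A hypothetical session: 9 MCP list queries averaging 500 tokens each
// as JSON, with a 40% TOON saving on each response.
const calls = 9;
const jsonTokensPerCall = 500;
const toonSavingRate = 0.4;

const jsonTotal = calls * jsonTokensPerCall;          // 4500 tokens as JSON
const saved = Math.round(jsonTotal * toonSavingRate); // 1800 tokens saved
console.log(`${saved} of ${jsonTotal} tokens saved per session`);
```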

Should You Try This?

If you're building MCP servers or any integration that sends structured data to language models: yes. The implementation is trivial, the risk is zero (JSON fallback), and the savings are real for any response that looks like a list of records.

If your MCP servers mostly return long-form text (documents, email bodies, code), TOON won't change much. It's a structured data optimization, not a general compression tool.

TOON is open source, MIT licensed, with packages for TypeScript, Python, Go, Rust, and .NET. It took me an afternoon to benchmark, implement, and validate across my full stack.

The agents didn't notice the difference. My token usage did.

If you're exploring MCP servers or AI agent automation for your marketing ops, book a quick call - I'm always happy to talk through what's possible.