
GCF

The AI-native wire format for structured data.

50-92% fewer tokens than JSON. 100% comprehension on every frontier model. 43 billion+ lossless round-trips across 5 formats (JSON, YAML, TOML, CSV, MessagePack). Spec v3.2 Stable.

Five data formats flowing through GCF into the LLM context window

Zero-code option for MCP servers

One command. 50-92% fewer tokens. Zero code changes.

Zero-code adoption
Drop-in proxy for existing MCP servers. Or integrate natively with any of 6 languages below.
pip install gcf-proxy
npm i -g @blackwell-systems/gcf-proxy
go install github.com/blackwell-systems/gcf-proxy@latest
Zero runtime dependencies · No transitive deps · No supply chain risk · Permanent commitment · MIT licensed
🔄

The universal pivot for structured data

JSON, YAML, TOML, CSV, MessagePack: any format in, GCF in the context window, any format out. One format that speaks every format. No other wire format operates across format boundaries.

decode(encode(value)) == value for every structured value. Verified across 43,000,000,000+ lossless round-trips in 5 formats and 6 languages.
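
The round-trip property above can be checked mechanically. The following is a minimal sketch of such a check, not part of GCF itself: `failed_round_trips` is a hypothetical helper, and the standard-library `json.dumps`/`json.loads` pair stands in for a real codec so the example runs without extra packages.

```python
import json
from typing import Any, Callable, Iterable, List


def failed_round_trips(
    encode: Callable[[Any], str],
    decode: Callable[[str], Any],
    values: Iterable[Any],
) -> List[Any]:
    """Return every value for which decode(encode(value)) != value."""
    return [value for value in values if decode(encode(value)) != value]


# Sample values chosen for illustration only.
samples = [
    {"id": 1001, "customer": "Acme Corp", "total": 49.99},
    [1, 2, 3],
    {"nested": {"ok": True, "missing": None}},
    "plain string",
]

# json is only a stand-in codec here; an empty list means every sample survived.
failures = failed_round_trips(json.dumps, json.loads, samples)
print(failures)
```

With the gcf package installed, one would presumably pass `encode_generic` and `decode_generic` in place of the JSON functions.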

📉

50-92% fewer input tokens

50-69% on a single call (generic to graph profile). Up to 92% with session deduplication across repeated tool calls.

At 1000 orders, JSON doesn't even fit in a 200K context window. GCF fits in 47K. Positional fields, inline schemas, and hierarchical grouping eliminate per-record overhead.

✍️

63% fewer output tokens

LLMs produce valid GCF with a 3-line primer.

33% smaller output than TOON. 5/5 generation validity on every frontier model across 3 providers. Zero training.

🧠

100% comprehension on every frontier model

The only format that never fails. Tested across Claude, GPT-5.5, and Gemini with zero format instructions. On structurally complex code graphs, GCF scores 91.2% where JSON drops to 53.4% and TOON to 68.2%. 2,500+ evaluations, 4 providers.

🔒

Proven lossless. 6 languages. Spec v3.2 Stable.

43B+ (yes, really) round-trips across JSON, YAML, TOML, CSV, and MessagePack. Zero failures. 174 conformance fixtures. Cross-language 6x6 encode/decode matrix verified. Read the spec.

🏆

29% fewer tokens than TOON (16 datasets)

Wins 15/16 real-world datasets. 38% fewer on semi-uniform data, 33% on nested, 32% on K8s pod data. TOON's one win is 77 tokens on a single dataset.

See the Difference

Same data. Fewer tokens. Zero information loss.

JSON: 458 tokens
{
  "orders": [
    {"id": 1001, "customer": "Acme Corp",
     "total": 49.99, "status": "shipped",
     "items": 1},
    {"id": 1002, "customer": "Globex Inc",
     "total": 150.49, "status": "pending",
     "items": 2},
    {"id": 1003, "customer": "Initech LLC",
     "total": 250.99, "status": "processing",
     "items": 3},
    {"id": 1004, ...},
    {"id": 1005, ...},
    {"id": 1006, ...},
    {"id": 1007, ...},
    {"id": 1008, ...},
    {"id": 1009, ...},
    {"id": 1010, ...}
  ]
}
GCF: 177 tokens
GCF profile=generic
## orders [10]{id,customer,total,status,items}
1001|Acme Corp|49.99|shipped|1
1002|Globex Inc|150.49|pending|2
1003|Initech LLC|250.99|processing|3
1004|Umbrella Co|351.49|delivered|4
1005|Stark Ind|451.99|shipped|5
1006|Wayne Ent|552.49|pending|6
1007|Oscorp|652.99|shipped|7
1008|LexCorp|753.49|processing|8
1009|Cyberdyne|853.99|delivered|9
1010|Soylent|954.49|shipped|10

61% fewer tokens. Scales to 71%+ at production sizes.

10 rows, 5 fields. Token counts verified with tiktoken (cl100k).
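
The 61% figure follows directly from the two token counts quoted above. A short worked calculation, where `percent_saved` is a hypothetical helper name used only for illustration:

```python
# Token counts quoted above for the 10-row orders example.
json_tokens = 458
gcf_tokens = 177


def percent_saved(baseline: int, compact: int) -> float:
    """Percentage of tokens saved relative to the baseline count."""
    return 100.0 * (baseline - compact) / baseline


savings = percent_saved(json_tokens, gcf_tokens)
print(f"{savings:.1f}% fewer tokens")  # prints "61.4% fewer tokens"
```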

Two lines. Any language.

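from gcf import encode_generic, decode_generic

data = {"orders": [{"id": 1001, "customer": "Acme Corp"}]}  # any structured value

gcf_string = encode_generic(data)        # compact text for the LLM context window
original   = decode_generic(gcf_string)  # lossless decode back to the original value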

Two Profiles. One Format.

The generic profile is a strict subset of the graph profile. Call encode(), let the LLM read the result natively, then call decode() at the end.

Graph-shaped data is the fastest-growing data shape in AI: knowledge systems, ontologies, GraphRAG, code intelligence, agent memory. No other token-efficient format treats graphs as first-class. GCF is the only format with native graph syntax: local IDs, typed edges, distance grouping, and session deduplication that compounds to 92% savings across multi-turn sessions.

1. Generic Profile

Any structured data (subset)

Any structured value in, same value out. Verified lossless across 43 billion+ round-trips with JSON, YAML, TOML, CSV, and MessagePack. 71% fewer tokens than JSON.

JSON
[
  {"name":"validateToken",
   "kind":"func","refs":18},
  {"name":"refreshSession",
   "kind":"func","refs":6},
  {"name":"getConnection",
   "kind":"func","refs":34},
  {"name":"runMigration",
   "kind":"func","refs":3}
]
GCF
## results [4]{name,kind,refs}
validateToken|func|18
refreshSession|func|6
getConnection|func|34
runMigration|func|3

2. Graph Profile

Superset: adds IDs, edges, scores

Knowledge graphs, code intelligence, ontologies, relationship networks. No other format treats graph-shaped data as a first-class citizen.

JSON
{
  "symbols": [
    {"id":1,"kind":"func",
     "name":"handleReq"},
    {"id":2,"kind":"func",
     "name":"validate"},
    {"id":3,"kind":"iface",
     "name":"AuthCfg"}
  ],
  "edges": [
    {"src":1,"tgt":2,
     "type":"calls"},
    {"src":2,"tgt":3,
     "type":"implements"}
  ]
}
GCF
## symbols [3]
@1 func handleReq 0.95
@2 func validate 0.87
@3 iface AuthCfg 0.60

## edges [2]
@2<@1 calls
@3<@2 implements

3. Session Dedup

Graph profile

JSON retransmits everything on every call. GCF tracks which symbols have been sent and only transmits bare references for known ones. 92% savings by the 5th call.

JSON (call 2: full retransmit)
[
  {"id":1,"kind":"func",
   "name":"handleReq"},
  {"id":2,"kind":"func",
   "name":"validate"},
  {"id":3,"kind":"iface",
   "name":"AuthCfg"},
  {"id":4,"kind":"func",
   "name":"revoke"}
]
GCF (call 2: bare refs + new)
## symbols [4]
@1
@2
@3
@4 func revoke 0.91
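
The bookkeeping behind this can be sketched in a few lines. This is an illustrative sketch, not the GCF implementation: the `SessionEncoder` class and the `id`, `kind`, `name`, and `score` dictionary keys are assumptions made for the example.

```python
from typing import Dict, List, Set


class SessionEncoder:
    """Hypothetical encoder that remembers which symbol IDs were already sent."""

    def __init__(self) -> None:
        self.sent_ids: Set[int] = set()

    def encode_symbols(self, section: str, symbols: List[Dict]) -> str:
        lines = [f"## {section} [{len(symbols)}]"]
        for sym in symbols:
            if sym["id"] in self.sent_ids:
                # Already in the LLM's context: emit a bare reference only.
                lines.append(f"@{sym['id']}")
            else:
                # First sighting: emit the full declaration and remember the ID.
                lines.append(
                    f"@{sym['id']} {sym['kind']} {sym['name']} {sym['score']:.2f}"
                )
                self.sent_ids.add(sym["id"])
        return "\n".join(lines)


first_call = [
    {"id": 1, "kind": "func", "name": "handleReq", "score": 0.95},
    {"id": 2, "kind": "func", "name": "validate", "score": 0.87},
    {"id": 3, "kind": "iface", "name": "AuthCfg", "score": 0.60},
]
second_call = first_call + [
    {"id": 4, "kind": "func", "name": "revoke", "score": 0.91},
]

encoder = SessionEncoder()
call_1 = encoder.encode_symbols("symbols", first_call)
call_2 = encoder.encode_symbols("symbols", second_call)
print(call_2)
```

The second call reproduces the four-line payload shown above: three bare references plus one new declaration.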

GCF Grammar

Two profiles. Session-aware. No ambiguity.

Generic Profile

Section Headers

## symbols [4]{kind,qname}
## edges [3]
## targets
##! summary
  • ## section start
  • [N] element count
  • {fields} inline schema
  • [?] deferred count (streaming)
  • ##! summary trailer
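
The header pieces listed above are regular enough to parse with one expression. A minimal sketch, assuming the header shapes shown on this page; `Header` and `parse_header` are hypothetical names, and the naive comma split does not handle field names that themselves contain commas.

```python
import re
from typing import List, NamedTuple, Optional

# Matches "## name", "## name [N]", "## name [?]{a,b}" and "##! name".
HEADER_RE = re.compile(
    r"^##(?P<trailer>!)?\s+(?P<name>\w+)"
    r"(?:\s*\[(?P<count>\d+|\?)\])?"
    r"(?:\{(?P<fields>[^}]*)\})?"
)


class Header(NamedTuple):
    name: str
    count: Optional[int]  # None when the count is absent or deferred
    deferred: bool        # True for "[?]" (streaming)
    fields: List[str]     # inline schema, empty when not declared
    is_trailer: bool      # True for "##!" lines


def parse_header(line: str) -> Header:
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"not a section header: {line!r}")
    raw_count = match.group("count")
    raw_fields = match.group("fields")
    fields = (
        [f.strip().strip('"') for f in raw_fields.split(",")] if raw_fields else []
    )
    return Header(
        name=match.group("name"),
        count=int(raw_count) if raw_count not in (None, "?") else None,
        deferred=raw_count == "?",
        fields=fields,
        is_trailer=match.group("trailer") is not None,
    )


print(parse_header("## symbols [4]{kind,qname}"))
```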

Tabular Data

## users [3]{name,role,active}
Alice|admin|true
Bob|dev|true
Carol|dev|false
  • | pipe-separated values
  • Fields declared once in header
  • No quotes unless needed
  • No braces, no colons per row
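
To make the row layout concrete, here is a minimal sketch of a tabular encoder for uniform records. It is not the gcf library: `encode_table` and `format_cell` are hypothetical names, the schema is taken from the first row, and quoting of values that contain `|` or newlines is deliberately left out.

```python
from typing import Dict, List


def format_cell(value) -> str:
    """Render one cell; booleans are lowercased to match the examples above."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def encode_table(name: str, rows: List[Dict]) -> str:
    """Emit a section header with count and schema, then one pipe-separated row per record."""
    if not rows:
        raise ValueError("this sketch needs at least one row to infer the schema")
    fields = list(rows[0].keys())
    schema = ",".join(fields)
    lines = [f"## {name} [{len(rows)}]{{{schema}}}"]
    for row in rows:
        lines.append("|".join(format_cell(row[field]) for field in fields))
    return "\n".join(lines)


users = [
    {"name": "Alice", "role": "admin", "active": True},
    {"name": "Bob", "role": "dev", "active": True},
    {"name": "Carol", "role": "dev", "active": False},
]
encoded = encode_table("users", users)
print(encoded)
```

Field names appear once in the header, so each additional row costs only its values and separators.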

Scalars & Key-Value

name=Alice
age=30
active=true
missing=-
empty=""
  • key=value for primitives
  • - null, ~ absent, "" empty string
  • Quote if value contains | or newline

Nested Objects

## orders [2]{id,total,"customer>name","customer>tier"}
1001|249.99|Alice|premium
1002|89.50|Bob|standard
  • "customer>name" flattens nested field into column
  • > separates path levels
  • Values go directly in the row (no attachments)
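
Flattening with `>` paths is straightforward to sketch. The helpers below (`flatten`, `unflatten`, `header_fields`) are hypothetical names for illustration; the sketch assumes keys never contain `>` themselves and handles nested objects only, not nested arrays.

```python
from typing import Any, Dict, Iterable


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into one level of 'parent>child' keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}>{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Inverse of flatten: rebuild nested dicts from 'parent>child' keys."""
    nested: Dict[str, Any] = {}
    for path, value in flat.items():
        parts = path.split(">")
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


def header_fields(keys: Iterable[str]) -> str:
    """Quote path-style field names, as in the header shown above."""
    return ",".join(f'"{key}"' if ">" in key else key for key in keys)


order = {"id": 1001, "total": 249.99, "customer": {"name": "Alice", "tier": "premium"}}
flat = flatten(order)
header = f"## orders [1]{{{header_fields(flat)}}}"
print(header)
```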
Graph Profile

Symbols

@1 func handleReq 0.95
@2 iface AuthCfg 0.60 ast
@3
@4 func revoke 0.91
  • @N local ID, kind, qname, score, provenance
  • Bare @N = session ref (already transmitted)

Edges

@2<@1 calls
@3<@1 calls
@4<@2 implements
  • @target<@source type
  • One edge per line, no nesting overhead
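
A reader for these two line shapes fits in a few lines. This is a minimal sketch based only on the examples above, not the reference parser: `Symbol`, `Edge`, `parse_symbol`, and `parse_edge` are hypothetical names, and qnames are assumed to contain no whitespace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    local_id: int
    kind: Optional[str] = None
    qname: Optional[str] = None
    score: Optional[float] = None
    provenance: Optional[str] = None

    @property
    def is_bare_ref(self) -> bool:
        """A bare @N carries no declaration: it refers to a prior transmission."""
        return self.kind is None


@dataclass
class Edge:
    source: int
    target: int
    type: str


def parse_symbol(line: str) -> Symbol:
    parts = line.split()
    if not parts or not parts[0].startswith("@"):
        raise ValueError(f"not a symbol line: {line!r}")
    local_id = int(parts[0][1:])
    if len(parts) == 1:
        return Symbol(local_id)
    if len(parts) < 4:
        raise ValueError(f"incomplete symbol declaration: {line!r}")
    provenance = parts[4] if len(parts) > 4 else None
    return Symbol(local_id, parts[1], parts[2], float(parts[3]), provenance)


def parse_edge(line: str) -> Edge:
    # Layout is "@target<@source type": the target comes first.
    endpoints, edge_type = line.split(maxsplit=1)
    target, source = endpoints.split("<")
    return Edge(int(source.lstrip("@")), int(target.lstrip("@")), edge_type.strip())


print(parse_symbol("@2 iface AuthCfg 0.60 ast"))
print(parse_edge("@2<@1 calls"))
```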

Distance Groups

## targets [3]
@0 fn handleReq 0.95 lsp
@1 fn validate 0.87 lsp
@2 fn connect 0.91 lsp

## related [2]
@3 fn helper 0.60 ast
@4 iface Config 0.55 ast
  • targets, related, extended by relevance
  • LLM reads count from header, no scanning

Session Dedup (graph profile)

GCF profile=graph tool=blast_radius symbols=6 session=true
## targets
@0  # previously transmitted
@1  # previously transmitted
@5 fn pkg.NewFunc 0.85 lsp
  • session=true signals bare refs present
  • @N # previously transmitted = sent in prior call
  • LLM already has the full declaration in context
  • 88% savings vs JSON across multi-turn sessions

Delta Encoding (graph profile)

GCF tool=topology delta=true tokens=30 savings=85%
## removed
fn pkg.OldHandler
## added
@0 fn pkg.NewHandler 0.85 lsp
## edges_added
pkg.Router -> pkg.NewHandler calls
  • delta=true signals diff from prior payload
  • Only added/removed symbols and edges
  • 95% savings for small topology changes
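
The diff itself can be sketched as a comparison of two snapshots. This is an illustration, not the GCF implementation: `delta_lines` is a hypothetical helper, symbols are assumed to be keyed by qname with `(kind, score, provenance)` values, the scores for `pkg.Router` and `pkg.OldHandler` are made up, and the leading `GCF ... delta=true` line with its token statistics is omitted.

```python
from typing import Dict, List, Tuple

SymbolTable = Dict[str, Tuple[str, float, str]]  # qname -> (kind, score, provenance)
EdgeList = List[Tuple[str, str, str]]            # (source qname, target qname, type)


def delta_lines(
    prev_symbols: SymbolTable,
    curr_symbols: SymbolTable,
    prev_edges: EdgeList,
    curr_edges: EdgeList,
) -> List[str]:
    """Emit only what changed between the previous and current payload."""
    removed = [q for q in prev_symbols if q not in curr_symbols]
    added = [q for q in curr_symbols if q not in prev_symbols]
    edges_added = [e for e in curr_edges if e not in prev_edges]

    lines: List[str] = []
    if removed:
        lines.append("## removed")
        lines += [f"{prev_symbols[q][0]} {q}" for q in removed]
    if added:
        lines.append("## added")
        for local_id, q in enumerate(added):  # ID numbering is an assumption
            kind, score, provenance = curr_symbols[q]
            lines.append(f"@{local_id} {kind} {q} {score:.2f} {provenance}")
    if edges_added:
        lines.append("## edges_added")
        lines += [f"{src} -> {tgt} {kind}" for src, tgt, kind in edges_added]
    return lines


previous = {"pkg.Router": ("fn", 0.90, "lsp"), "pkg.OldHandler": ("fn", 0.80, "lsp")}
current = {"pkg.Router": ("fn", 0.90, "lsp"), "pkg.NewHandler": ("fn", 0.85, "lsp")}
delta = delta_lines(
    previous,
    current,
    [("pkg.Router", "pkg.OldHandler", "calls")],
    [("pkg.Router", "pkg.NewHandler", "calls")],
)
print("\n".join(delta))
```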

Streaming (both profiles)

## results [?]{name,kind}
validate|func
connect|func
handle|func
##! summary counts=3
  • [?] count unknown upfront
  • Rows emit instantly, O(1) memory
  • ##! trailer finalizes count
  • Zero buffering, zero latency
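
A generator is a natural fit for this pattern. The sketch below is illustrative only: `stream_section` is a hypothetical name, and the `counts=` trailer key is copied from the example above rather than from the spec.

```python
from typing import Dict, Iterable, Iterator, List


def stream_section(name: str, fields: List[str], rows: Iterable[Dict]) -> Iterator[str]:
    """Yield a deferred-count header, each row as it arrives, then a trailer."""
    schema = ",".join(fields)
    yield f"## {name} [?]{{{schema}}}"  # count unknown upfront
    count = 0
    for row in rows:
        # Each row is emitted immediately; only the running count is kept.
        yield "|".join(str(row[field]) for field in fields)
        count += 1
    yield f"##! summary counts={count}"  # trailer finalizes the count


def produce_rows() -> Iterator[Dict]:
    """Stand-in for a source whose length is not known in advance."""
    for name in ("validate", "connect", "handle"):
        yield {"name": name, "kind": "func"}


streamed = list(stream_section("results", ["name", "kind"], produce_rows()))
print("\n".join(streamed))
```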

100% comprehension. 71% fewer tokens. 2,400+ LLM evaluations.