You built an AI app. It works. But your OpenAI bill keeps climbing because 70% of queries are questions you've already answered. What if you could stop that in three lines of code?
Apache 2.0 · Python 3.10+ · PyPI
Not word-for-word — nobody asks the same exact string. But “How do I reset my password?” and “I forgot my password, help” are the same question. You pay for both. Every day. Every user.
Most caches are dumb: exact match or nothing. cacheback understands meaning. And when it finds similar queries in its memory, it synthesizes a fresh, contextual response.
“What is Python?” asked twice. Same meaning, instant return. <5ms. $0. Done.
→“Explain Python for beginners” — not identical, but close. CEAG synthesizes a fresh answer from cached knowledge. Fast. Fraction of cost.
→Never seen before. Calls the real API, caches the response. Next time someone asks something similar — it's ready. The cache gets smarter over time.
SQLite for storage, ONNX for embeddings. No Redis, no cloud, no API keys for the cache itself. If anything goes wrong — your app keeps running.
“How do I cancel?” and “Where's the cancel button?” match. Vector embeddings, not string comparison.
MiniLM-L6-v2 · ONNXCEAG creates fresh responses from cached knowledge. Unique, contextual answers — not copy-pasted text. Quality: 0.942.
Cached Ensemble Augmented GenerationChange OpenAI() to CachedOpenAI(). Same API, same types, same streaming. Anthropic wrapper too.
Cache hits stream back chunk-by-chunk, exactly like the original API. Your frontend doesn't know the difference.
buffer & replayDisk full? Corrupt database? ONNX model missing? Cache fails silently, app calls the API directly. 14 failure scenarios tested.
graceful degradationDon't want to change code? Run cacheback-proxy, point your base URL to it. Works with any language.
Simple cache works where questions repeat. CEAG synthesis goes further — it uses conversation context to create fresh responses even for personalized queries. We'd rather be honest upfront than after you install.
Pick your SDK. Change one import. Deploy. Your app is now 70% cheaper to run.
The full SDK is free, open source, Apache 2.0. Use it in production, fork it, sell products built on it. We make money when you want us on speed dial.
Everything. Forever. No trial, no limit.
You ship to prod. We watch your back.
Compliance, isolation, architecture help.
Two lines of code. Savings start on the first duplicate.