Every developer using LLMs faces the same three problems:
Cost blindness — you cannot answer "how much did I spend today?"
No failover — when OpenAI goes down, your app goes down
Wasted money — identical prompts hit the API over and over instead of being cached
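The caching problem is the easiest to picture: if the same prompt arrives twice, the second call should be answered from memory instead of billed again. A minimal sketch of that idea in Python (class and method names are mine for illustration, not llmux's internals):

```python
import hashlib

class ResponseCache:
    """Cache LLM responses keyed on a hash of (model, prompt)."""

    def __init__(self):
        self._store = {}

    def _key(self, model, prompt):
        # Hash model + prompt so identical requests map to the same entry.
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get(self, model, prompt):
        return self._store.get(self._key(model, prompt))

    def put(self, model, prompt, response):
        self._store[self._key(model, prompt)] = response

cache = ResponseCache()
cache.put("gpt-4o", "What is 2+2?", "4")
# The identical prompt is now served without a second API call.
assert cache.get("gpt-4o", "What is 2+2?") == "4"
```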
I built llmux to fix all three with zero code changes.
What is llmux?
A single Rust binary (~5MB) that sits between your code and every LLM API. It handles failover, caching, rate limiting, and cost tracking automatically.
Your code (any language)
         |
http://localhost:4000
         |
     ┌───────┐
     │ llmux │ ← single binary, ~5MB
     └───┬───┘
         │
   ┌─────┼─────┬─────┐
   ▼     ▼     ▼     ▼
OpenAI Claude Gemini Ollama
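The failover path in the diagram can be sketched as an ordered list of providers tried until one succeeds. This is a simplified illustration of the idea, not llmux's actual Rust code:

```python
class ProviderError(Exception):
    pass

def complete_with_failover(prompt, providers):
    """Try each (name, call) provider in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except ProviderError as exc:
            errors.append((name, str(exc)))
    # Every backend failed: surface the collected errors.
    raise ProviderError(f"all providers failed: {errors}")

# Example: the first provider is down, the second answers.
def openai_down(prompt):
    raise ProviderError("503 Service Unavailable")

def claude_up(prompt):
    return f"claude: {prompt}"

result = complete_with_failover("hello", [("openai", openai_down),
                                          ("claude", claude_up)])
```

With this shape, "OpenAI goes down" stops meaning "your app goes down": the request transparently lands on the next backend in the list.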
Zero Code Changes
You change one environment variable. That is it.
# Before
export OPENAI_BASE_URL=https://api.openai.com
# After
export OPENAI_BASE_URL=http://localhost:4000
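This works because OpenAI-compatible SDKs resolve their base URL from the environment when the client is constructed, so every request is rerouted without touching application code. A toy illustration of the mechanism (not the actual SDK source):

```python
import os

def resolve_endpoint(path="/v1/chat/completions"):
    """Build the request URL the way an OpenAI-compatible client would:
    read OPENAI_BASE_URL from the environment, else fall back to the
    official API host."""
    base = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com")
    return base.rstrip("/") + path

# Point the environment at the proxy; no application code changes.
os.environ["OPENAI_BASE_URL"] = "http://localhost:4000"
url = resolve_endpoint()
```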