⚡

RouteLLM — AI Inference Optimizer

Submit any AI task and the platform automatically selects the optimal model, compresses prompts, minimizes tokens, and routes to the cheapest provider that meets quality thresholds. An intelligent efficiency layer between users and LLM APIs.
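The core routing idea can be sketched in a few lines: pick the cheapest model whose capability meets the task's complexity. The model names, capability scores, and prices below are illustrative placeholders, not the platform's actual configuration.

```python
# Illustrative model table: (name, capability score 0-1, $ per 1K input tokens).
# Values are assumptions for the sketch, not real pricing.
MODELS = [
    ("claude-haiku",  0.45, 0.00025),
    ("claude-sonnet", 0.65, 0.003),
    ("claude-opus",   0.95, 0.015),
]

def route(complexity: float) -> str:
    """Return the cheapest model able to handle the task's complexity."""
    eligible = [m for m in MODELS if m[1] >= complexity]
    if not eligible:
        raise ValueError("no model meets the required quality threshold")
    return min(eligible, key=lambda m: m[2])[0]

print(route(0.30))  # low-complexity task -> cheapest eligible model
print(route(0.82))  # high-complexity task -> frontier model only
```

With this quality-floor rule, cheap models absorb everything they are capable of, and expensive models are reserved for tasks that genuinely need them.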

infrastructure · vibe coding · model routing · cost optimization
01
Task Queue & Smart Router
Incoming tasks are analyzed, routed to the optimal model, and tracked with cost savings
routellm.codelab.sh/tasks
📋 Task Queue
🔀 Router
📊 Analytics
🧩 Models
⚙️ Settings
Task Queue · 47 today
Filters + Submit Task
๐Ÿง‘โ€๐Ÿ’ป

Generate REST API endpoints for user auth module

Code generation · complexity: high · 2,840 tokens

Claude Opus
$0.042
↓ 38% saved
Done
📝

Summarize 12-page research paper on transformer efficiency

Summarization · complexity: low · 890 tokens

Claude Haiku
$0.001
↓ 92% saved
Done
๐Ÿ‘๏ธ

Extract table data from scanned invoice PDF

Vision + extraction · complexity: medium · 1,560 tokens

Claude Sonnet
$0.008
↓ 71% saved
Done
🧠

Multi-step math proof: convergence of infinite series

Reasoning · complexity: very high · 4,120 tokens

Claude Opus
$0.061
↓ 24% saved
Running
02
Optimization Engine
Task analysis, prompt compression, model selection rationale, and cost comparison
routellm.codelab.sh/router/task-0847
📋 Task Queue
🔀 Router
📊 Analytics
🧩 Models
Task #0847 · Routing Decision
Re-route · View Output
Task Analysis auto-detected
Task Type Code Generation
Complexity High (0.82)
Reasoning Depth Multi-step
Input Tokens 4,580
Compressed To 2,840 (↓38%)
Est. Output ~1,200 tokens
Selected Model Claude Opus
Confidence 96%
Optimization Pipeline 5 / 6 applied
1. Prompt Compression Applied
Removed redundant instructions, deduplicated context. 4,580 → 2,840 tokens (38% reduction).
2. Input Restructuring Applied
Converted free-text spec into structured JSON schema for clearer model comprehension.
3. Model Downgrade Check Skipped
Complexity 0.82 exceeds Sonnet threshold (0.65). Opus required for multi-step code gen.
4. Token Budget Cap Applied
max_tokens set to 1,500 (est. 1,200 + 25% buffer) to prevent runaway generation.
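Two of the stages above are easy to express directly. The 0.65 Sonnet complexity threshold and the 25% output buffer come from the screen; the function names and everything else are illustrative.

```python
# Assumed threshold from the routing decision screen: tasks at or below
# this complexity can be downgraded from Opus to Sonnet.
SONNET_COMPLEXITY_THRESHOLD = 0.65

def downgrade_check(complexity: float) -> str:
    """Stage 3: downgrade to the cheaper model only when complexity permits."""
    if complexity <= SONNET_COMPLEXITY_THRESHOLD:
        return "claude-sonnet"
    return "claude-opus"

def token_budget(est_output_tokens: int, buffer: float = 0.25) -> int:
    """Stage 4: cap max_tokens at the estimate plus a safety buffer."""
    return int(est_output_tokens * (1 + buffer))

print(downgrade_check(0.82))  # 0.82 exceeds 0.65 -> Opus, matching the skip above
print(token_budget(1200))     # 1,200 est. + 25% buffer -> 1,500 cap
```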
$0.068
Naïve Cost
$0.042
Optimized Cost
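The saved percentage follows directly from the two costs above; a minimal helper (hypothetical name, same arithmetic):

```python
def savings_pct(naive_cost: float, optimized_cost: float) -> int:
    """Percent saved versus the naive cost, rounded to a whole percent."""
    return round((1 - optimized_cost / naive_cost) * 100)

print(savings_pct(0.068, 0.042))  # -> 38, the "38% saved" shown in the queue
```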
03
Cost Analytics Dashboard
Track savings, model distribution, and token efficiency across all tasks
routellm.codelab.sh/analytics
📋 Task Queue
🔀 Router
📊 Analytics
🧩 Models
Cost Analytics · Last 30 days
This Month · Export CSV
1,284
Tasks Routed
$127
Total Saved
64%
Avg Savings
41%
Token Reduction
Cost Trend (Naïve vs Optimized)
[Line chart: daily cost, Naïve vs Optimized, Feb 9 to Mar 9; y-axis $0.05–$0.15]
Model Distribution
Claude Haiku
52%
Claude Sonnet
28%
Claude Opus
12%
GPT-4o
8%
52% of tasks route to Haiku: the cheapest model handles the majority of simple tasks