Blog

What we're learning & building

Practical AI engineering, architecture decisions, lessons from production — written by the engineers who build the systems.


All Posts (9)


Engineering

Why RAG Still Matters in the Age of 1M-Token Context Windows

Long-context models are impressive, but they don't eliminate the need for retrieval. Here's a practical breakd…

RAG · LLMs · Engineering

AR, AR · Feb 24, 2026 · 8 min

⚙️

Practical AI

Fine-Tuning vs. Prompting: A Decision Framework for Production Teams

The debate between fine-tuning and prompt engineering is often framed as binary. It's not. We share the framew…

Fine-Tuning · Prompting · Production

AR · Feb 17, 2026 · 11 min

🧠

Architecture

Four Memory Patterns for Production AI Agents

Working memory, episodic memory, semantic memory, procedural memory — what they are, how to implement them, an…

Memory · Agents · Architecture

SK, AR, AR · Feb 10, 2026 · 14 min

📊

Evaluation

Evaluating LLMs in Production: Beyond Benchmarks

MMLU and HumanEval tell you almost nothing about how a model will perform in your product. We walk through the…

Evaluation · LLMs · Production

AR, AR · Feb 3, 2026 · 9 min

⚡

Infrastructure

A Practical Guide to Speculative Decoding

Speculative decoding can give you 2–4× throughput gains — but it requires careful setup. We cover draft model …

Infrastructure · Performance · Inference

AR · Jan 27, 2026 · 12 min

🛡️

Security

Defending Against Prompt Injection in Production Agents

Prompt injection is the SQL injection of the AI era. We document the attack vectors we've encountered and the …

Security · Agents · Production

SK, AR · Jan 20, 2026 · 13 min

🕸️

Architecture

Combining Knowledge Graphs with Dense Retrieval: What We Learned

Pure vector search fails on multi-hop questions. Pure graph traversal is too rigid. We share our architecture …

RAG · Knowledge Graph · Architecture

SK, AR · Jan 13, 2026 · 16 min

💰

Infrastructure

The LLM Cost Reduction Playbook: From $0.08 to $0.003 per Query

A step-by-step walkthrough of how we reduced a client's inference costs by 96% over 6 months — through model s…

Infrastructure · Cost · Fine-Tuning

AR, AR · Jan 6, 2026 · 18 min

🤖

Architecture

Agent Orchestration Patterns: Supervisor, Pipeline, and Swarm

There's no single right way to orchestrate multiple AI agents. We break down three architectures we've deploye…

Agents · Architecture · Multi-Agent

SK, AR · Dec 30, 2025 · 10 min
