Blog

What we're learning & building

Practical AI engineering, architecture decisions, lessons from production, written by the engineers who build the systems.

Category:
All Posts (9)
🔍
Engineering
Why RAG Still Matters in the Age of 1M-Token Context Windows

Long context models are impressive, but they don't eliminate the need for retrieval. Here's a practical breakd…

RAG, LLMs, Engineering
AR, AR · Feb 24, 2026 · 8 min
⚙️
Practical AI
Fine-Tuning vs. Prompting: A Decision Framework for Production Teams

The debate between fine-tuning and prompt engineering is often framed as binary. It's not. We share the framew…

Fine-Tuning, Prompting, Production
AR · Feb 17, 2026 · 11 min
🧠
Architecture
Four Memory Patterns for Production AI Agents

Working memory, episodic memory, semantic memory, procedural memory: what they are, how to implement them, an…

Memory, Agents, Architecture
SK, AR, AR · Feb 10, 2026 · 14 min
📊
Evaluation
Evaluating LLMs in Production: Beyond Benchmarks

MMLU and HumanEval tell you almost nothing about how a model will perform in your product. We walk through the…

Evaluation, LLMs, Production
AR, AR · Feb 3, 2026 · 9 min
⚡
Infrastructure
A Practical Guide to Speculative Decoding

Speculative decoding can give you 2–4× throughput gains, but it requires careful setup. We cover draft model …

Infrastructure, Performance, Inference
AR · Jan 27, 2026 · 12 min
🛡️
Security
Defending Against Prompt Injection in Production Agents

Prompt injection is the SQL injection of the AI era. We document the attack vectors we've encountered and the …

Security, Agents, Production
SK, AR · Jan 20, 2026 · 13 min
🕸️
Architecture
Combining Knowledge Graphs with Dense Retrieval: What We Learned

Pure vector search fails on multi-hop questions. Pure graph traversal is too rigid. We share our architecture …

RAG, Knowledge Graph, Architecture
SK, AR · Jan 13, 2026 · 16 min
💰
Infrastructure
The LLM Cost Reduction Playbook: From $0.08 to $0.003 per Query

A step-by-step walkthrough of how we reduced a client's inference costs by 96% over 6 months, through model s…

Infrastructure, Cost, Fine-Tuning
AR, AR · Jan 6, 2026 · 18 min
🤖
Architecture
Agent Orchestration Patterns: Supervisor, Pipeline, and Swarm

There's no single right way to orchestrate multiple AI agents. We break down three architectures we've deploye…

Agents, Architecture, Multi-Agent
SK, AR · Dec 30, 2025 · 10 min