Context-aware support at enterprise scale
The company's existing support chatbot handled roughly 15,000 conversations per day across 50,000+ enterprise accounts. Each account had a unique configuration — product tier, feature set, integration stack, billing structure. When a customer opened a new support session, they faced a blank-slate bot with no recollection of their prior 40 conversations, their known issues, or their product setup. Customers were re-explaining their environment at the start of every session. "We're on the Enterprise tier, using the Salesforce integration, and we've had the recurring sync issue since March" — repeated, verbatim, by the same customer, 12 times across 12 sessions.
The re-explanation tax was directly measurable. Sessions that began with context re-establishment took 4.2 minutes longer on average and had a 31% lower first-contact resolution rate than sessions where context was available. CSAT had declined from 4.1 to 3.2 over eight months as product complexity grew faster than the support infrastructure. Churn analysis revealed that customers with CSAT below 3.5 churned at 2.3× the rate of those above 4.0. The support experience was becoming a meaningful churn driver for a company where annual contract values averaged $42,000.
We fine-tuned a 13B-parameter model on 2.3 million historical support conversations, product documentation, and resolved ticket logs. The fine-tuning process focused on three capabilities: accurate product terminology recall, appropriate escalation judgment, and response tone calibration for enterprise customers. Base model selection prioritized inference cost over raw capability: a well-fine-tuned smaller model could match frontier model performance on this domain-specific task, and inference cost directly affected unit economics.
The episodic memory system maintains a structured profile per customer account, updated after every conversation. The profile contains: account configuration snapshot, known recurring issues with resolution history, stated preferences and working hours, escalation history, and a semantic summary of the last 10 conversations. At session start, the relevant account profile is injected into the model's context. The agent greets customers with demonstrated awareness: it references their configuration, acknowledges known issues, and skips the environment re-establishment step that had been degrading the experience.
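A minimal sketch of how such a per-account profile might be represented and rendered into the model's context at session start. The field names follow the profile contents listed above, but the class names (`AccountProfile`, `KnownIssue`), the `build_session_context` helper, and the plain-text rendering format are illustrative assumptions, not the deployed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_SUMMARIES = 10  # the profile keeps summaries of the last 10 conversations


@dataclass
class KnownIssue:
    """A recurring issue plus what has been tried to resolve it (hypothetical shape)."""
    description: str
    resolution_history: List[str] = field(default_factory=list)


@dataclass
class AccountProfile:
    """Hypothetical per-account episodic memory record."""
    account_id: str
    configuration: Dict[str, str]
    known_issues: List[KnownIssue] = field(default_factory=list)
    preferences: Dict[str, str] = field(default_factory=dict)
    escalation_history: List[str] = field(default_factory=list)
    conversation_summaries: List[str] = field(default_factory=list)

    def record_conversation(self, summary: str) -> None:
        """Append a summary and drop anything older than the last MAX_SUMMARIES."""
        self.conversation_summaries.append(summary)
        del self.conversation_summaries[:-MAX_SUMMARIES]


def build_session_context(profile: AccountProfile) -> str:
    """Render the profile as plain text to prepend to the model's context."""
    lines = [f"Account: {profile.account_id}"]
    lines += [f"Config {k}: {v}" for k, v in profile.configuration.items()]
    for issue in profile.known_issues:
        history = "; ".join(issue.resolution_history) or "no resolution recorded"
        lines.append(f"Known issue: {issue.description} ({history})")
    lines += [f"Preference {k}: {v}" for k, v in profile.preferences.items()]
    lines += [f"Past escalation: {e}" for e in profile.escalation_history]
    lines += [f"Recent conversation: {s}" for s in profile.conversation_summaries]
    return "\n".join(lines)
```

Capping the summary list keeps the injected context bounded regardless of how long the account has been active; a production system would likely also need to budget tokens for the other fields.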
The system includes explicit confidence thresholds and escalation triggers. When the model's confidence in a resolution path falls below a calibrated threshold, as measured by a secondary classifier trained on resolution outcomes, the session is escalated to a human agent with the full context already prepared. Human agents receive a pre-populated handoff brief: customer context, session summary, what was tried, and a suggested resolution path. Average human agent handling time for escalated sessions dropped 38% because the preparation work that human agents previously did manually was now automated.
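The escalation trigger can be sketched as a simple threshold check that produces a handoff brief. This is an illustration only: the secondary classifier is stood in for by a plain confidence score in [0, 1], the threshold value `0.7` is a placeholder rather than the calibrated value, and the names `HandoffBrief` and `route_session` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

ESCALATION_THRESHOLD = 0.7  # placeholder; the real threshold is calibrated on outcomes


@dataclass
class HandoffBrief:
    """The four parts of the brief a human agent receives on escalation."""
    customer_context: str
    session_summary: str
    attempted_steps: List[str]
    suggested_resolution: str


def should_escalate(confidence: float,
                    threshold: float = ESCALATION_THRESHOLD) -> bool:
    """Escalate when the classifier's confidence falls below the threshold."""
    return confidence < threshold


def route_session(confidence: float,
                  customer_context: str,
                  session_summary: str,
                  attempted_steps: List[str],
                  suggested_resolution: str) -> Optional[HandoffBrief]:
    """Return a pre-populated brief if the session must be escalated, else None."""
    if not should_escalate(confidence):
        return None  # the AI agent keeps handling the session
    return HandoffBrief(
        customer_context=customer_context,
        session_summary=session_summary,
        attempted_steps=list(attempted_steps),
        suggested_resolution=suggested_resolution,
    )
```

Building the brief at the moment of escalation, from data the session already holds, is what removes the manual preparation step for the human agent.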
CSAT moved from 3.2 to 4.6 over the 90 days following full deployment. First-contact resolution rate reached 78%, up from 54%. Inference cost fell 40% because fine-tuned on-premises inference replaced frontier model API calls for the 85% of queries within the model's confident domain; frontier model fallback was reserved for genuinely novel or complex scenarios. Monthly churn among customers interacting primarily with the AI support channel declined by 18% compared to the prior 6-month baseline. The churn reduction, not the cost saving, became the primary business justification for continued investment.
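The relationship between the 85% on-premises share and the 40% cost reduction can be made explicit with a little arithmetic. The sketch below rests on two assumptions the write-up does not state: that every query previously went to the frontier API, and that per-query cost is uniform within each path. The function names are illustrative.

```python
def cost_reduction(local_share: float, cost_ratio: float) -> float:
    """Fractional saving versus an all-frontier baseline.

    local_share: fraction of queries served by the fine-tuned model.
    cost_ratio:  per-query cost of the fine-tuned model divided by the
                 per-query cost of the frontier API (assumed uniform).
    """
    blended = local_share * cost_ratio + (1.0 - local_share) * 1.0
    return 1.0 - blended


def implied_cost_ratio(local_share: float, reduction: float) -> float:
    """Invert cost_reduction: the cost ratio consistent with an observed saving."""
    return 1.0 - reduction / local_share


# Under the stated assumptions, 85% local traffic and a 40% saving imply the
# fine-tuned path costs roughly half as much per query as the frontier API.
ratio = implied_cost_ratio(local_share=0.85, reduction=0.40)
print(f"implied on-prem / frontier cost ratio: {ratio:.2f}")
```

If the baseline already served some traffic cheaply, or if the fallback queries are systematically more expensive, the implied ratio would differ; the point is only that the saving scales with both the local share and the per-query cost gap.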
The system continuously improves through a feedback loop: human agents who handle escalated sessions can flag whether the handoff brief was accurate. Inaccurate briefs are logged and used for quarterly fine-tuning runs. The episodic memory system is updated with every resolution outcome, making the account profile progressively richer with each interaction. After 6 months, the model required 23% fewer escalations on the same query distribution as at launch — a measurable signal that episodic memory accumulation was improving autonomous resolution capability over time.
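A minimal sketch of the feedback loop described above: human agents flag each handoff brief, inaccurate briefs are collected for the next fine-tuning run, and the escalation rate is compared against launch. The `FeedbackLog` class, its method names, and the example rates in the usage note are hypothetical; only the flag-then-collect flow comes from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BriefFeedback:
    """One human agent's verdict on a handoff brief."""
    session_id: str
    brief_accurate: bool
    note: str = ""


class FeedbackLog:
    """In-memory stand-in for wherever flagged briefs are actually stored."""

    def __init__(self) -> None:
        self._entries: List[BriefFeedback] = []

    def flag(self, session_id: str, brief_accurate: bool, note: str = "") -> None:
        self._entries.append(BriefFeedback(session_id, brief_accurate, note))

    def fine_tuning_candidates(self) -> List[BriefFeedback]:
        """Inaccurate briefs, to be folded into the next quarterly run."""
        return [e for e in self._entries if not e.brief_accurate]


def relative_change(rate_at_launch: float, rate_now: float) -> float:
    """Relative change in escalation rate; negative means fewer escalations."""
    return (rate_now - rate_at_launch) / rate_at_launch
```

For example, if a hypothetical 20% of sessions escalated at launch and 15.4% escalate now on the same query distribution, `relative_change(0.20, 0.154)` is about -0.23, matching the form of the 23% figure reported above.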