DRUT AI | Modern Agentic Ai Company

The Challenge

The Context Collapse Problem

The company's existing support chatbot handled roughly 15,000 conversations per day across 50,000+ enterprise accounts. Each account had a unique configuration — product tier, feature set, integration stack, billing structure. When a customer opened a new support session, they faced a blank-slate bot with no recollection of their prior 40 conversations, their known issues, or their product setup. Customers were re-explaining their environment at the start of every session. "We're on the Enterprise tier, using the Salesforce integration, and we've had the recurring sync issue since March" — repeated, verbatim, by the same customer, 12 times across 12 sessions.

CSAT in Free Fall

The re-explanation tax was directly measurable. Sessions that began with context re-establishment took 4.2 minutes longer on average and had a 31% lower first-contact resolution rate than sessions where context was available. CSAT declined from 4.1 to 3.2 over eight months as product complexity grew faster than the support infrastructure. Churn analysis revealed that customers with CSAT below 3.5 churned at 2.3× the rate of those above 4.0. The support experience was becoming a meaningful churn driver for a company where annual contract values averaged $42,000.

"Customers were re-explaining their environment at the start of every session: repeated, verbatim, by the same customer, 12 times across 12 sessions."

Our Solution

Fine-Tuned Domain Model

We fine-tuned a 13B parameter model on 2.3 million historical support conversations, product documentation, and resolved ticket logs. The fine-tuning process focused on three capabilities: accurate product terminology recall, appropriate escalation judgment, and response tone calibration for enterprise customers. The base model selection prioritized inference cost over raw capability — frontier model performance on this domain-specific task was achievable with a well-fine-tuned smaller model, and inference cost directly impacted unit economics.

Episodic Memory Architecture

The episodic memory system maintains a structured profile per customer account, updated after every conversation. The profile contains: account configuration snapshot, known recurring issues with resolution history, stated preferences and working hours, escalation history, and a semantic summary of the last 10 conversations. At session start, the relevant account profile is injected into the model's context. The agent greets customers with demonstrated awareness — referencing their configuration, acknowledging known issues, and skipping the environment re-establishment step that had been degrading experience.
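A minimal sketch of how such a profile and its context injection might look is shown here. The class name, field names, and the plain-text rendering are assumptions made for illustration; the source describes only which categories of information the profile holds, not its schema or storage format.

```python
from collections import deque
from dataclasses import dataclass, field

MAX_SUMMARIES = 10  # the profile keeps summaries of the last 10 conversations


@dataclass
class AccountProfile:
    """Hypothetical per-account profile; all field names are illustrative."""
    account_id: str
    configuration: dict[str, str]
    recurring_issues: list[str] = field(default_factory=list)
    preferences: dict[str, str] = field(default_factory=dict)
    escalation_history: list[str] = field(default_factory=list)
    recent_summaries: deque[str] = field(
        default_factory=lambda: deque(maxlen=MAX_SUMMARIES)
    )

    def record_conversation(self, summary: str) -> None:
        """Store a summary; the bounded deque discards the oldest entry."""
        self.recent_summaries.append(summary)


def build_session_context(profile: AccountProfile) -> str:
    """Render the profile as text to prepend to the model's context."""
    config = ", ".join(
        f"{key}: {value}" for key, value in sorted(profile.configuration.items())
    )
    issues = "; ".join(profile.recurring_issues) or "none"
    lines = [
        f"Account: {profile.account_id}",
        f"Configuration: {config}",
        f"Known recurring issues: {issues}",
        "Recent conversations:",
    ]
    lines += [f"- {summary}" for summary in profile.recent_summaries]
    return "\n".join(lines)


profile = AccountProfile(
    account_id="acme-001",  # made-up account for the example
    configuration={"tier": "Enterprise", "integration": "Salesforce"},
    recurring_issues=["sync issue since March"],
)
for i in range(12):
    profile.record_conversation(f"conversation {i + 1}")
context = build_session_context(profile)
```

After twelve recorded conversations the profile holds only the ten most recent summaries, which keeps the injected context bounded regardless of how long the account history grows.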

Tiered Escalation Logic

The system includes explicit confidence thresholds and escalation triggers. When the model's confidence in a resolution path, as measured by a secondary classifier trained on resolution outcomes, falls below a calibrated threshold, the session is escalated to a human agent with the full context already prepared. Human agents receive a pre-populated handoff brief: customer context, session summary, what was tried, and a suggested resolution path. Average human agent handling time for escalated sessions dropped 38% because the preparation work that human agents previously did manually was now automated.
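The escalation decision can be sketched roughly as follows. The threshold value of 0.7, the function name, and the brief's field names are hypothetical; the source states only that the threshold is calibrated and that the brief carries these four pieces of information. The confidence score is assumed to arrive from the secondary classifier as a number between 0 and 1.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical value: the source says only that the threshold is calibrated.
ESCALATION_THRESHOLD = 0.7


@dataclass
class HandoffBrief:
    """The four items a human agent receives on escalation."""
    customer_context: str
    session_summary: str
    attempted_steps: list[str]
    suggested_resolution: str


def maybe_escalate(
    confidence: float,
    customer_context: str,
    session_summary: str,
    attempted_steps: list[str],
    suggested_resolution: str,
    threshold: float = ESCALATION_THRESHOLD,
) -> Optional[HandoffBrief]:
    """Return a handoff brief when confidence is below the threshold.

    Returns None when the model is confident enough to keep the session.
    """
    if confidence >= threshold:
        return None
    return HandoffBrief(
        customer_context=customer_context,
        session_summary=session_summary,
        attempted_steps=list(attempted_steps),
        suggested_resolution=suggested_resolution,
    )


brief = maybe_escalate(
    confidence=0.42,
    customer_context="Enterprise tier, Salesforce integration",
    session_summary="Recurring sync failure reported again",
    attempted_steps=["Re-authorized integration", "Forced manual sync"],
    suggested_resolution="Check field mapping for the failing object",
)
no_brief = maybe_escalate(0.91, "ctx", "summary", [], "none needed")
```

Building the brief inside the same call that makes the escalation decision means a human agent never receives an escalated session without its context attached.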

"The agent greets customers with demonstrated awareness — referencing their configuration, acknowledging known issues, skipping re-establishment entirely."

Results

Measured Outcomes

CSAT moved from 3.2 to 4.6 over the 90 days following full deployment. First-contact resolution rate reached 78%, up from 54%. Inference cost fell 40% after frontier model API calls were replaced with fine-tuned on-premises inference for the 85% of queries within the model's confident domain; frontier model fallback was reserved for genuinely novel or complex scenarios. Monthly churn among customers interacting primarily with the AI support channel declined by 18% compared to the prior 6-month baseline. This churn reduction, not the cost reduction, became the primary business justification for continued investment.
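As a rough sanity check on how these figures relate, the arithmetic can be worked through under simplifying assumptions that the source does not state: that every query previously went to the frontier model at the same unit cost, and that fallback queries still cost that same unit. Under those assumptions, an 85% local share and a 40% overall reduction imply a per-query cost ratio for the fine-tuned model. The function name is illustrative.

```python
def implied_cost_ratio(local_share: float, total_reduction: float) -> float:
    """Solve local_share * r + (1 - local_share) = 1 - total_reduction for r.

    r is the fine-tuned model's per-query cost relative to the frontier
    model, assuming uniform per-query costs (an assumption, not a source fact).
    """
    blended_cost = 1.0 - total_reduction
    fallback_cost = 1.0 - local_share
    return (blended_cost - fallback_cost) / local_share


# Figures from the text: 85% of queries handled locally, 40% cost reduction.
ratio = implied_cost_ratio(local_share=0.85, total_reduction=0.40)

# First-contact resolution: 54% before, 78% after.
fcr_gain_points = 78 - 54
fcr_relative_gain = fcr_gain_points / 54

print(f"implied fine-tuned cost ratio: {ratio:.2f}")
print(f"FCR gain: {fcr_gain_points} points ({fcr_relative_gain:.0%} relative)")
```

Under these assumptions the fine-tuned model would cost a little over half as much per query as the frontier model. If fallback queries are more expensive than average, the implied ratio would be lower.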

Ongoing Improvement Loop

The system continuously improves through a feedback loop: human agents who handle escalated sessions can flag whether the handoff brief was accurate. Inaccurate briefs are logged and used for quarterly fine-tuning runs. The episodic memory system is updated with every resolution outcome, making the account profile progressively richer with each interaction. After 6 months, the model required 23% fewer escalations on the same query distribution as at launch — a measurable signal that episodic memory accumulation was improving autonomous resolution capability over time.
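One way the flagging step could feed the quarterly fine-tuning runs is sketched here. The record shape, function names, and the calendar-quarter grouping are assumptions; the source says only that inaccurate briefs are logged and used quarterly.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BriefFeedback:
    """Hypothetical record of a human agent's verdict on a handoff brief."""
    session_id: str
    flagged_on: date
    accurate: bool
    note: str = ""


def quarter_of(day: date) -> tuple[int, int]:
    """Map a date to (year, calendar quarter 1-4)."""
    return day.year, (day.month - 1) // 3 + 1


def inaccurate_briefs_for_quarter(
    feedback: list[BriefFeedback], year: int, quarter: int
) -> list[BriefFeedback]:
    """Select the briefs flagged inaccurate within one calendar quarter."""
    return [
        item
        for item in feedback
        if not item.accurate and quarter_of(item.flagged_on) == (year, quarter)
    ]


log = [
    BriefFeedback("s-1", date(2024, 1, 15), accurate=True),
    BriefFeedback("s-2", date(2024, 2, 3), accurate=False, note="wrong tier"),
    BriefFeedback("s-3", date(2024, 3, 30), accurate=False, note="stale issue"),
    BriefFeedback("s-4", date(2024, 4, 2), accurate=False, note="missed step"),
]
q1_batch = inaccurate_briefs_for_quarter(log, 2024, 1)
```

In this example the first-quarter batch contains sessions s-2 and s-3; s-4 falls into the following quarter's run.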

Implementation

Process & Timeline

Data Audit & Preparation

Extracted and cleaned 2.3M historical conversations. Designed memory schema and account profile structure with client data team.

Fine-Tuning Pipeline

Built the training data pipeline, ran supervised fine-tuning on Llama 3.1 13B, and evaluated iteratively against a held-out conversation set.

Memory System Build

Built episodic memory store, account profile update logic, and session context injection. Integrated with existing CRM data.

Escalation Integration

Built confidence classifier, escalation trigger logic, and handoff brief generation. Integrated with Zendesk agent workspace.

Staged Rollout

10% traffic for 2 weeks, monitoring CSAT and escalation rates. Full rollout after KPI validation.
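A common way to hold a fixed share of traffic in a staged rollout is deterministic hash bucketing, sketched below. This is an assumption about mechanism: the source does not say how the 10% was selected, or whether the split was by account or by session. Bucketing by account is chosen here so that a given customer consistently sees the same experience. The function name and salt are hypothetical.

```python
import hashlib


def in_rollout(account_id: str, percent: int, salt: str = "memory-agent-v1") -> bool:
    """Deterministically place an account in or out of the rollout group.

    The account ID is hashed with a salt and mapped to a bucket 0-99;
    accounts whose bucket is below `percent` receive the new system.
    """
    digest = hashlib.sha256(f"{salt}:{account_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


# Estimate the share of 10,000 made-up account IDs that land in a 10% rollout.
accounts = [f"account-{n}" for n in range(10_000)]
share = sum(in_rollout(a, 10) for a in accounts) / len(accounts)
print(f"share in rollout: {share:.1%}")
```

Because the assignment depends only on the account ID and the salt, raising `percent` from 10 to 100 for the full rollout keeps every early account in the treatment group.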

Technology Stack

Llama 3.1 13B Fine-tune, vLLM, PostgreSQL, pgvector, Python, FastAPI, Redis, Celery, Zendesk API, AWS SageMaker, Grafana

Customer Support AI with Long-Term Memory