We propose a dynamic memory architecture for AI agents that enables persistent cognitive behavior across sessions. By combining importance-based memory creation, vector-based short-term storage, decay dynamics, reinforcement mechanisms, and hierarchical long-term consolidation, this system allows agents to selectively remember, reinforce, and forget information — moving beyond stateless prompt engineering toward persistent, adaptive intelligence.
The inability to maintain coherent long-term memory is one of the fundamental limitations of current large language model deployments. While transformer-based models excel at in-context reasoning, their fixed context windows prevent them from accumulating experience across sessions, a property essential for agents operating in persistent, user-facing environments. Prior work has addressed this through external memory stores, but naive approaches suffer from retrieval precision problems at scale, semantic drift over time, and memory pollution from low-signal content. In this paper, we introduce a hybrid architecture that combines dense vector retrieval with symbolic knowledge consolidation and a principled memory lifecycle, enabling agents to maintain accurate, queryable episodic memory over extended operational horizons. This architecture is particularly valuable in domains where continuous interaction, personal context, and behavioral history significantly influence outcomes, including mental health support systems, coaching assistants, long-running AI agents, healthcare copilots, personal assistants, and adaptive learning systems.
A key principle of the system is selective memory formation. Not every interaction produces a memory point. Memory points are created only when a message or event crosses a predefined importance threshold. Generic or low-signal conversational content is intentionally ignored to prevent memory pollution and unnecessary storage growth. The importance score is assigned externally, either by an LLM classifier or by a rule-based heuristic, and the scoring method is treated as out of scope for this research. What matters architecturally is that a binary signal (store / discard) gates all memory creation. Examples of memory-worthy information include persistent user preferences, emotional states or recurring signals, long-term goals, behavioral patterns, and explicit user facts. The importance threshold can be dynamically adjusted via system prompts or configuration, allowing the system to tune how aggressively it stores memories depending on the domain. Mental health applications may store more emotional signals; productivity assistants may prioritize task-related memory. A special memory class, typed as FACT, bypasses the standard threshold entirely. Facts (e.g., name, age, diagnosis, location) have frozen decay by default and are written directly to long-term storage. Facts change only when a contradiction is detected, at which point the old fact is soft-deleted and replaced.
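The gating rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `Candidate`, `Route`, and `route_candidate`, the [0, 1] score range, and the default threshold of 0.6 are all hypothetical and not specified by the architecture.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DISCARD = "discard"
    SHORT_TERM = "short_term"  # goes to the short-term vector collection
    LONG_TERM = "long_term"    # FACT path: written directly to long-term storage


@dataclass
class Candidate:
    text: str
    importance: float              # assumed to lie in [0, 1]; assigned externally
    memory_type: str = "GENERIC"   # "FACT" bypasses the threshold


def route_candidate(candidate: Candidate, threshold: float = 0.6) -> Route:
    """Store/discard gate with the FACT bypass (threshold value is an assumption)."""
    if candidate.memory_type == "FACT":
        return Route.LONG_TERM
    if candidate.importance >= threshold:
        return Route.SHORT_TERM
    return Route.DISCARD
```

Raising or lowering `threshold` per deployment corresponds to the domain tuning described above; for example, a mental health application might pass a lower value so that more emotional signals are stored.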
The system is divided into short-term memory and long-term memory, each serving different cognitive roles.
Short-Term Memory Architecture
Short-term memory represents active session context and recently created memory points. It is implemented using two layers optimized for different purposes.
Redis Session Memory: Redis is used for ultra-fast session retrieval. It stores the current conversational context, is optimized for real-time lookup during inference, and is cleared automatically when the session ends. This prevents unnecessary vector queries for recent information. Redis acts as the ephemeral working memory of the system.
Qdrant Short-Term Vector Memory: The second short-term layer is a vector memory store implemented using Qdrant. This store contains memory points that have passed the importance threshold. Each memory point carries a semantic embedding, decay value, timestamp, importance score, reinforcement count, and contextual metadata. This layer acts as the active memory processing space where memory evolution occurs. Unlike Redis, entries in this collection persist temporarily and are continuously processed by decay mechanisms.
Long-Term Memory Architecture
Long-term memory is implemented using two complementary storage systems.
PostgreSQL Knowledge Store: PostgreSQL stores structured long-term knowledge in textual form. If consolidated memory introduces new factual information, it is appended to a structured memory string or knowledge record, providing a human-readable persistent memory layer.
Qdrant Long-Term Vector Memory: A corresponding vector representation of each long-term memory is stored in a dedicated Qdrant collection, enabling efficient semantic retrieval during prompt construction. If new summarized memory represents information already present in long-term memory, the PostgreSQL record is not modified; instead, the existing vector point receives an importance reinforcement, tracking how frequently or strongly a memory is referenced.
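As an illustration of the fields a short-term memory point carries, the following sketch models one as a plain dataclass. The class name `MemoryPoint`, the field names, the initial decay value of 1.0, and the `to_payload` helper are assumptions made for this example; the sketch deliberately avoids any vector-store client calls.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Any


@dataclass
class MemoryPoint:
    text: str
    embedding: list[float]            # semantic embedding
    importance: float                 # importance score
    decay: float = 1.0                # decay value (initial value is an assumption)
    reinforcement_count: int = 0
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    metadata: dict[str, Any] = field(default_factory=dict)  # contextual metadata

    def to_payload(self) -> dict[str, Any]:
        """Return every field except the vector in JSON-friendly form."""
        payload = asdict(self)
        payload.pop("embedding")
        payload["created_at"] = self.created_at.isoformat()
        return payload
```

Keeping the vector separate from the remaining fields mirrors the usual split in vector stores between the embedding and its accompanying metadata, so the decay worker can update scalar fields without touching the embedding.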
Each memory point contains a decay value representing its temporal relevance. Decay values range from -∞ to 2. Anything below the lower processing threshold decays out of short-term context and is either consolidated to long-term storage (if importance is sufficient) or removed completely. Supported decay models include linear decay, logarithmic decay, exponential decay, and hybrid models combining multiple functions. Decay rate depends on time elapsed since creation, reinforcement signals, contradiction signals, and contextual importance.
Reinforcement Mechanism: When a new interaction produces a memory candidate that is semantically similar to an existing memory point, the system does not create a duplicate. Instead, the existing point is reinforced: its importance score increases and its decay trajectory may slow. This allows the system to strengthen recurring information without creating redundant entries.
Contradiction Handling: When a new memory point contradicts an existing one, the new point is added to short-term memory normally. The previously stored contradicting point receives an accelerated decay rate, causing it to expire faster than it otherwise would. For long-term memory, the old contradicted point is soft-deleted (marked with a contradicted flag and timestamp rather than hard-deleted) and excluded from all future retrieval. This preserves an audit trail while ensuring the AI's working knowledge reflects the most current understanding.
FACT-type memories are exempt from standard decay. They remain frozen until a contradiction is explicitly detected, at which point soft-deletion and replacement follows the same contradiction flow described above.
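A minimal sketch of how the decay models, reinforcement, and short-term contradiction handling might fit together is shown below. The functional forms, the per-hour rate, the initial decay value, and the boost, slowdown, and acceleration factors are all assumptions chosen for illustration; only the upper bound of 2 comes from the text. The long-term soft-deletion flow is not covered here.

```python
import math
from dataclasses import dataclass

DECAY_MAX = 2.0  # upper bound of the decay range stated in the text


@dataclass
class DecayState:
    decay: float = 1.0         # assumed starting value
    rate: float = 0.1          # assumed decay rate per hour
    importance: float = 0.5    # assumed to lie in [0, 1]
    reinforcement_count: int = 0
    frozen: bool = False       # True for FACT-type memories


def linear(initial: float, rate: float, hours: float) -> float:
    return initial - rate * hours


def exponential(initial: float, rate: float, hours: float) -> float:
    return initial * math.exp(-rate * hours)


def logarithmic(initial: float, rate: float, hours: float) -> float:
    return initial - rate * math.log1p(hours)


MODELS = {"linear": linear, "exponential": exponential, "logarithmic": logarithmic}


def decayed_value(state: DecayState, hours: float, model: str = "linear") -> float:
    """Decay value after `hours`; frozen (FACT) memories do not decay."""
    if state.frozen:
        return state.decay
    return min(DECAY_MAX, MODELS[model](state.decay, state.rate, hours))


def reinforce(state: DecayState, boost: float = 0.1, slowdown: float = 0.8) -> None:
    """Raise importance and slow the decay trajectory instead of duplicating."""
    state.importance = min(1.0, state.importance + boost)
    state.rate *= slowdown
    state.reinforcement_count += 1


def contradict(state: DecayState, acceleration: float = 3.0) -> None:
    """Accelerate decay of a contradicted short-term point."""
    state.rate *= acceleration
```

With these assumed defaults, a point that is reinforced once decays at 0.08 per hour instead of 0.1, while a contradicted point decays three times faster. A hybrid model could be built by combining the functions in `MODELS`, for instance by switching from linear to exponential after a cutoff.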
The architecture uses two background workers to continuously evolve the memory system.
Decay Worker: The decay worker periodically scans the short-term Qdrant collection. Its responsibilities include applying decay functions to memory points, updating decay values based on elapsed time, maintaining importance scores, and identifying memory points approaching the expiration threshold.
Memory Consolidation Worker: When several memory points approach near-zero decay values, they are selected as candidates for long-term consolidation. The worker pops candidate points from the short-term Qdrant collection, evaluates and summarizes the combined information, compares the summarized memory against existing long-term memory, and decides whether the information represents new knowledge or a reinforcement of existing knowledge. This asynchronous design keeps inference latency low while ensuring the long-term store is continuously updated.
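The consolidation decision can be sketched as two small steps: partitioning expired points, then comparing a summary vector against long-term memory. This is a simplified illustration, not the system's implementation. The names `Point`, `partition`, and `classify_summary`, the thresholds (0.0, 0.6, 0.85), and the use of cosine similarity for the comparison are assumptions; the LLM summarization step and all storage calls are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    id: str
    decay: float
    importance: float


def partition(points, expiry_threshold=0.0, min_importance=0.6):
    """Split points into (active, consolidate, drop) lists."""
    active, consolidate, drop = [], [], []
    for p in points:
        if p.decay > expiry_threshold:
            active.append(p)        # still live in short-term memory
        elif p.importance >= min_importance:
            consolidate.append(p)   # expired but important enough to keep
        else:
            drop.append(p)          # expired and low importance
    return active, consolidate, drop


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_summary(summary_vec, long_term, sim_threshold=0.85):
    """Return ("reinforce", id) for a close long-term match, else ("new", None)."""
    best_id, best_sim = None, -1.0
    for mem_id, vec in long_term.items():
        sim = cosine(summary_vec, vec)
        if sim > best_sim:
            best_id, best_sim = mem_id, sim
    if best_id is not None and best_sim >= sim_threshold:
        return "reinforce", best_id
    return "new", None
```

In this sketch a "reinforce" result would increment the importance of the matched long-term vector point and leave the textual record unchanged, while a "new" result would append to the knowledge record and insert a new vector.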
When constructing prompts for the AI model, the system retrieves relevant memories from both memory layers via a structured pipeline. The retrieval pipeline proceeds as follows: first, long-term Qdrant memory is queried using semantic similarity; next, recent session memory is retrieved from Redis; then, active short-term memory is retrieved from Qdrant. The combined candidate set is then reranked based on a composite memory score that considers semantic similarity, importance factor, decay value, reinforcement count, and recency. The reranked memories are then summarized before being injected into the final prompt. This summarization step keeps the prompt token-efficient: the AI receives a distilled memory context containing the most relevant, recent, and reinforced information rather than raw memory dumps, and the context window does not overflow as the memory store grows.
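One possible form of the composite memory score is a weighted sum of normalized terms, sketched below. The weights, the normalizations, the 72-hour recency half-life, and the names `Retrieved`, `composite_score`, and `rerank` are hypothetical; the text names the five inputs but does not specify how they are combined.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    text: str
    similarity: float         # similarity to the query, assumed in [0, 1]
    importance: float         # assumed in [0, 1]
    decay: float              # at most 2.0 per the decay range
    reinforcement_count: int
    age_hours: float


# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {
    "similarity": 0.40,
    "importance": 0.20,
    "decay": 0.15,
    "reinforcement": 0.10,
    "recency": 0.15,
}


def composite_score(m: Retrieved, half_life_hours: float = 72.0) -> float:
    decay_term = max(0.0, min(m.decay, 2.0)) / 2.0                  # clamp to [0, 1]
    reinforcement_term = 1.0 - 1.0 / (1.0 + m.reinforcement_count)  # saturates toward 1
    recency_term = 0.5 ** (m.age_hours / half_life_hours)           # halves every half-life
    return (
        WEIGHTS["similarity"] * m.similarity
        + WEIGHTS["importance"] * m.importance
        + WEIGHTS["decay"] * decay_term
        + WEIGHTS["reinforcement"] * reinforcement_term
        + WEIGHTS["recency"] * recency_term
    )


def rerank(candidates: list[Retrieved], top_k: int = 5) -> list[Retrieved]:
    """Sort the merged candidate set by composite score, highest first."""
    return sorted(candidates, key=composite_score, reverse=True)[:top_k]
```

The output of `rerank` would then be passed to the summarization step. Two candidates with equal semantic similarity are separated by the remaining terms, so a recent, reinforced memory outranks a stale, unreinforced one.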
We conducted internal testing across several simulated long-horizon interaction scenarios, including personal assistant workflows, multi-session coaching dialogues, and preference tracking tasks. Our observations confirm that the decay and reinforcement mechanisms effectively prevent memory pollution over extended sessions. Memory points representing persistent user traits and recurring themes were consistently elevated in retrieval rankings, while transient or low-signal content expired without manual intervention. The contradiction handling mechanism was validated against scenarios involving changing user preferences and updated factual information. In all tested cases, the soft-deletion flow correctly demoted stale memories and elevated updated beliefs within one to two reinforcement cycles. FACT-type memory proved particularly robust — persistent facts such as user identity, stated goals, and diagnosed conditions remained stable across sessions and were correctly updated when contradictions were introduced. Latency overhead from the retrieval and reranking pipeline was acceptable across all evaluated configurations, with Redis session memory effectively eliminating redundant vector queries for recent context.
We have presented a hybrid adaptive memory architecture that addresses core limitations of existing agent memory systems: memory pollution from low-signal content, precision degradation at scale, poor long-horizon recall, and the inability to update beliefs when contradictions arise. The architecture introduces a principled memory lifecycle — from selective creation through decay and reinforcement to hierarchical consolidation — that moves AI systems closer to cognitive-inspired persistent intelligence. By combining Redis for ephemeral session context, Qdrant for active vector memory processing, and PostgreSQL for structured long-term knowledge, the system supports efficient, semantically grounded memory retrieval at inference time. Future work will explore multi-user memory isolation, personalized decay rate calibration, cross-session contradiction resolution strategies, and memory-grounded chain-of-thought reasoning. We believe persistent adaptive memory is a prerequisite for the next generation of user-facing AI agents, and this architecture takes a concrete step toward making it production-ready.