Memory System Architecture

This document provides a detailed overview of OmniButler's sophisticated memory system architecture, which enables context-aware AI interactions across multiple sessions.

Overview

The memory system is designed to provide the AI with access to multiple types of contextual information:

  1. Working Memory (Message History) - Recent messages in the current conversation
  2. Long-term Memory (Vector Store) - Semantic and episodic memories retrieved through similarity search
  3. Context State (Context Manager) - Session state and metadata preservation

The first two tiers are merged through LangChain's CombinedMemory mechanism, while context state is managed alongside them by the memory orchestrator. Together, these tiers give the AI access to relevant information while maintaining performance and keeping responses contextually appropriate.

Memory Types

  • Semantic Memory: General knowledge and facts stored as embeddings
  • Episodic Memory: Specific important events and interactions
  • Working Memory: Active conversation context in a sliding window

Architecture Diagram

```mermaid
graph TD

subgraph MemSys[Memory System]
    Memory[Memory Orchestrator]
    CM[CombinedMemory]
    MHM[Message History Manager]
    VSM[Vector Store Manager]
    CTX[Context Manager]
end

subgraph Redis[Redis Infrastructure]
    RH[Redis History Store]
    RV[Redis Vector Store]
    RM[Redis Metadata Store]
    RC[Redis Context Cache]
end

subgraph AISys[AI System]
    LLM[Language Model]
    Tools[AI Tools]
end

User[User] --> |Message| AI[AI System]

AI --> |Request| Memory
Memory --> CM
CM --> MHM
CM --> VSM
Memory --> CTX

MHM --> RH
VSM --> RV
VSM --> RM
CTX --> RC

CM --> |Combined Context| LLM
Memory --> |State| Tools

style Memory fill:#f9f,stroke:#333,stroke-width:2px
style Redis fill:#bbf,stroke:#333,stroke-width:2px
style AI fill:#dfd,stroke:#333,stroke-width:2px

```

Key Components

1. Memory Orchestrator (Memory Class)

The Memory class serves as the orchestrator for the entire memory system, integrating the different memory components through LangChain's CombinedMemory:

```python
class Memory:
    memory: CombinedMemory

    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self.index_name = f"{app_config.memory_index_name}:{user_id}"

        # Initialize managers
        self.vector_store_manager = VectorStoreManager(self.index_name)
        self.message_history_manager = MessageHistoryManager(
            session_id=self.index_name,
            buffer_size=app_config.memory_buffer_size,
        )
        self.context_manager = ContextManager(user_id)

        # Set up combined memory using LangChain's CombinedMemory
        self.memory = CombinedMemory(
            memories=[
                self.message_history_manager.memory,  # Working memory
                self.vector_store_manager.memory,      # Long-term memory
            ],
        )
```

The Memory class:

  • Uses LangChain's CombinedMemory to merge multiple memory sources
  • Provides async methods for memory operations (aadd_memory, asearch_all_memories)
  • Maintains user-specific memory isolation
  • Supports both synchronous and asynchronous operations
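
For illustration, the merged context can be pulled through the standard LangChain memory interface. The sketch below assumes only the configuration shown above; the "chat_history" key comes from the buffer memory's memory_key:

```python
# Sketch: loading merged context through CombinedMemory's standard API
memory = Memory(user_id="user-123")

# load_memory_variables merges the variables exposed by each sub-memory:
# "chat_history" from the buffer window plus the retriever's output key
context = memory.memory.load_memory_variables({"input": "What did we discuss?"})
print(context["chat_history"])  # recent messages from working memory
```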

2. Message History Manager

The MessageHistoryManager maintains a window of recent conversation history, enabling the AI to refer to recent exchanges:

```python
class MessageHistoryManager:
    def __init__(self, session_id: str, buffer_size: int):
        self.session_id = session_id
        self.buffer_size = buffer_size
        self.chat_memory = self.initialize_chat_memory()

    def initialize_chat_memory(self):
        message_history = RedisChatMessageHistory(
            session_id=self.session_id,
            url=app_config.redis_url,
        )

        buffer_memory = ConversationBufferWindowMemory(
            k=self.buffer_size,
            memory_key="chat_history",
            input_key="input",
            output_key="output",
            chat_memory=message_history,
            return_messages=True,
        )

        return buffer_memory
```

Key features:

  • Uses Redis for persistent storage of message history
  • Implements a sliding window to maintain only recent messages
  • Configurable window size to balance context and performance
  • Automatically serializes and deserializes messages
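
A short sketch of the sliding-window behavior (the session id and messages are illustrative):

```python
# Sketch: with buffer_size=2, only the last two exchanges reach the prompt
manager = MessageHistoryManager(session_id="demo-session", buffer_size=2)

manager.chat_memory.save_context({"input": "Hi"}, {"output": "Hello!"})
manager.chat_memory.save_context({"input": "Book a room"}, {"output": "Done."})
manager.chat_memory.save_context({"input": "Thanks"}, {"output": "Anytime."})

# The first exchange has slid out of the window; it stays in Redis,
# but is no longer part of the loaded context
recent = manager.chat_memory.load_memory_variables({})["chat_history"]
```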

3. Vector Store Manager with Two-Tier Metadata Storage

The VectorStoreManager enables semantic search across stored information with a sophisticated two-tier storage approach:

```python
class VectorStoreManager:
    # Class-level cache to reuse vector stores for the same index
    _vector_store_cache: dict[str, LangchainRedis] = {}

    def __init__(self, index_name: str, k: int | None = None):
        self.index_name = index_name
        self.k = k or app_config.retriever_k
        self.redis_vectorstore = None
        self.retriever_memory = None
        self.initialize_vectorstore()

    async def aadd_memory(self, content: str, metadata: dict[str, Any] | None = None) -> str:
        """Add a memory with two-tier storage:
        1. Vector store: content + minimal metadata for search
        2. Redis hash: full metadata preservation
        """
        # One id ties both tiers together (uuid4 from the standard uuid module)
        memory_id = str(uuid4())

        # Store minimal metadata in vector store (for compatibility)
        vector_metadata = {
            'temp_id': memory_id,
            'created_at': datetime.now(timezone.utc).isoformat()
        }
        self.redis_vectorstore.add_texts([content], metadatas=[vector_metadata])

        # Store full metadata separately in Redis hash
        if metadata:
            redis_client = redis.Redis(connection_pool=redis_manager.get_connection_pool())
            metadata_key = f"metadata:{self.index_name}:{memory_id}"
            redis_client.hset(metadata_key, mapping={
                k: json.dumps(v) if isinstance(v, (list, dict)) else str(v)
                for k, v in metadata.items()
            })

        return memory_id
```

Key features:

  • Two-tier storage: Vector store for search, Redis hash for full metadata
  • Connection pooling: Reuses Redis connections for performance
  • Vector store caching: Reuses vector store instances across requests
  • Metadata preservation: Complex metadata (lists, dicts, custom fields) stored separately
  • Similarity search: Retrieves both vector content and full metadata
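
The retrieval side of the two-tier design is not shown above; a plausible sketch follows. Only similarity_search_with_score and the metadata:{index}:{id} key scheme come from the code above; the helper name and result shape are assumptions:

```python
    async def asearch(self, query: str, k: int | None = None) -> list[dict[str, Any]]:
        """Hypothetical helper: tier 1 finds matches, tier 2 restores metadata."""
        redis_client = redis.Redis(connection_pool=redis_manager.get_connection_pool())
        results = []
        for doc, score in self.redis_vectorstore.similarity_search_with_score(
            query, k=k or self.k
        ):
            memory_id = doc.metadata.get("temp_id")
            # Rejoin the full metadata stored in the Redis hash (tier 2)
            raw = redis_client.hgetall(f"metadata:{self.index_name}:{memory_id}")
            results.append({
                "content": doc.page_content,
                "score": score,
                "metadata": {field.decode(): value.decode() for field, value in raw.items()},
            })
        return results
```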

4. Context Manager with Async Support

The ContextManager manages session state and message-specific context with full async support:

```python
class ContextManager:
    def __init__(self, user_id: str) -> None:
        self.user_id = user_id
        self.redis_client: Redis[bytes] = Redis.from_url(app_config.redis_url)
        self.async_redis_client: AsyncRedis[bytes] | None = None

    async def aupdate_context(self, key: str, value: Any, ttl: int | None = None) -> None:
        """Update context asynchronously with optional TTL."""
        redis_client = await self._get_async_redis()
        context_key = f"context:{self.user_id}:{key}"
        context_json = json.dumps(value)

        if ttl:
            await redis_client.set(context_key, context_json, ex=ttl)
        else:
            await redis_client.set(context_key, context_json)

    async def aget_all_context(self) -> dict[str, Any]:
        """Get all context keys for the user asynchronously."""
        redis_client = await self._get_async_redis()
        pattern = f"context:{self.user_id}:*"

        # Use scan to get all matching keys
        cursor = 0
        keys: list[bytes] = []
        while True:
            cursor, batch = await redis_client.scan(cursor, match=pattern, count=100)
            keys.extend(batch)
            if cursor == 0:
                break

        # Fetch the values for every matched key and strip the key prefix
        context: dict[str, Any] = {}
        if keys:
            values = await redis_client.mget(keys)
            prefix = f"context:{self.user_id}:"
            for key, value in zip(keys, values):
                if value is not None:
                    context[key.decode().removeprefix(prefix)] = json.loads(value)
        return context
```

Key features:

  • Dual sync/async support: Both synchronous and asynchronous methods
  • TTL support: Optional time-to-live for context entries
  • Bulk operations: Get all context with pattern matching
  • Lazy async client: Creates async Redis client only when needed
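
The lazy async client mentioned above could be implemented along these lines (a sketch; the actual implementation is not shown in this document):

```python
    async def _get_async_redis(self) -> "AsyncRedis[bytes]":
        # Create the async client on first use, then reuse the same instance
        if self.async_redis_client is None:
            self.async_redis_client = AsyncRedis.from_url(app_config.redis_url)
        return self.async_redis_client
```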

Integration with LLM Service

The memory system integrates with the LLMService to provide context for AI interactions:

```python
# LLMService.__init__ (shown without the surrounding class definition)
def __init__(
    self,
    user_id: str,
    format_: LLMOutputFormats = LLMOutputFormats.markdown,
    llm_name: str = SupportedLLMs.openai,
    session_id: str | None = None,
):
    self.format_ = format_
    self.memory = Memory(user_id)
    self.agent_executor = use_llm(
        memory=self.memory,
        user_id=user_id,
        format_=format_,
        llm_name=llm_name
    )
```

When generating responses, the memory system performs four steps, sketched in code after this list:

  1. Stores user inputs and AI responses in the message history
  2. Saves relevant information to the vector store for future retrieval
  3. Maintains message-specific context for stateful interactions
  4. Retrieves relevant information to provide context for the current interaction
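
Put together, a response cycle might look like the following sketch. The method name and exchange format are assumptions; ainvoke is the standard LangChain AgentExecutor entry point:

```python
    async def agenerate(self, user_input: str) -> str:
        # Hypothetical LLMService method illustrating the cycle above
        result = await self.agent_executor.ainvoke({"input": user_input})
        answer = result["output"]

        # Persist the exchange so future sessions can retrieve it semantically
        await self.memory.aadd_memory(
            content=f"User: {user_input}\nAssistant: {answer}",
            metadata={"type": "exchange"},
        )
        return answer
```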

Memory Usage in Prompt Templates

The memory system is integrated into the LLM prompt templates:

```python
prompt_template = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate.from_template(system_message_content),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
        SystemMessagePromptTemplate.from_template("Chat History:\n{chat_history}"),
        HumanMessagePromptTemplate.from_template("{input}"),
    ]
)
```
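
Here {chat_history} is filled from the ConversationBufferWindowMemory configured earlier (its memory_key is "chat_history"), {input} carries the current user message, and agent_scratchpad holds the agent's intermediate tool-calling steps.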

Data Flow

```mermaid
sequenceDiagram

participant User
participant LLMService
participant Memory
participant VectorStore
participant MessageHistory
participant ContextManager
participant LLM

User->>LLMService: Send message
LLMService->>Memory: Load context
Memory->>MessageHistory: Retrieve recent messages
Memory->>VectorStore: Semantic search
Memory->>ContextManager: Get message context
Memory-->>LLMService: Combined context
LLMService->>LLM: Generate response with context
LLM-->>LLMService: Response
LLMService->>Memory: Update memory
Memory->>MessageHistory: Store new exchange
Memory->>VectorStore: Update with new information
Memory->>ContextManager: Store message context
LLMService-->>User: Stream response

```

Technical Considerations

Two-Tier Storage Architecture

The memory system uses a two-tier storage approach for vector memories:

  1. Vector Store Layer: Stores embeddings with minimal metadata for search
  2. Metadata Layer: Stores full metadata in Redis hash sets

This approach solves LangChain's Redis vector store limitation of only supporting simple types (string, int, float) in metadata.
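
Because lists and dicts are JSON-encoded on write (see aadd_memory above), reads need a symmetric decode step. A minimal sketch, with a hypothetical helper name:

```python
import json
from typing import Any

def _decode_metadata(raw: dict[bytes, bytes]) -> dict[str, Any]:
    # Reverse the write-side encoding: JSON-decode what parses as JSON,
    # fall back to the plain string for everything else
    decoded: dict[str, Any] = {}
    for field, value in raw.items():
        text = value.decode()
        try:
            decoded[field.decode()] = json.loads(text)
        except json.JSONDecodeError:
            decoded[field.decode()] = text
    return decoded
```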

Persistence and TTL

Memory data is persisted in Redis with configurable Time-To-Live (TTL) settings:

  • Short-term message history: Sliding window (no explicit TTL)
  • Vector store data: Persistent (no TTL)
  • Full metadata: 30-day TTL (see the sketch below)
  • Context data: Optional TTL per key
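
For example, the 30-day metadata TTL can be attached with a plain EXPIRE right after the hash is written (a sketch; the helper name is hypothetical):

```python
import redis

METADATA_TTL_SECONDS = 60 * 60 * 24 * 30  # 30 days

def persist_metadata(client: redis.Redis, key: str, mapping: dict[str, str]) -> None:
    # Write the metadata hash, then attach the TTL so it expires over time
    client.hset(key, mapping=mapping)
    client.expire(key, METADATA_TTL_SECONDS)
```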

Performance Optimization

  1. Connection Pooling: All Redis operations use a shared connection pool (see the sketch after this list)
  2. Vector Store Caching: Vector store instances are cached and reused
  3. Sliding Window: Message history limited to recent messages
  4. Efficient Search: Similarity search with configurable k and score thresholds
  5. Async Operations: Full async support for non-blocking operations
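
The first of these optimizations might look like the sketch below. The real redis_manager module is not shown in this document; lru_cache and the placeholder URL stand in for its behavior:

```python
import redis
from functools import lru_cache

@lru_cache(maxsize=1)
def get_connection_pool() -> redis.ConnectionPool:
    # One pool per process; every client borrows sockets from it
    return redis.ConnectionPool.from_url("redis://localhost:6379/0")

# Clients built this way reuse pooled connections instead of reconnecting
client = redis.Redis(connection_pool=get_connection_pool())
```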

Memory Triggering

  • Semantic memories: Automatically retrieved based on similarity
  • Important facts: Must be manually triggered by the user or system
  • Working memory: Always available in the conversation context

Security Considerations

Memory data is secured through:

  • User-specific namespacing of all keys
  • Redis authentication and encryption
  • No storage of sensitive PII in the memory system
  • Proper type annotations for better code safety

Implementation Details

Adding Memories

```python
# Add a semantic memory
await memory.aadd_memory(
    content="The user prefers morning meetings",
    metadata={
        "type": "preference",
        "category": "scheduling",
        "importance": 0.8,
        "timestamp": datetime.now().isoformat()
    }
)

# Add an important fact (manually triggered)
await memory.aadd_memory(
    content="User's birthday is March 15th",
    metadata={
        "type": "important_fact",
        "category": "personal",
        "importance": 1.0,
        "recurring": True
    }
)

Searching Memories

```python
# Search all memories
memories = await memory.asearch_all_memories(
    query="When does the user like to meet?",
    k=5
)

# Each memory includes:
# - content: The actual memory text
# - metadata: Full preserved metadata
# - score: Similarity score
# - id: Unique identifier
```

Future Enhancements

  1. Temporal Memory Features (ADR-021): Time-based memory retrieval and decay
  2. Memory Compression: Using AI to summarize and compress older memories
  3. Cross-User Memory: Shared context while maintaining privacy
  4. Memory Importance Learning: AI-driven importance scoring
  5. Memory Analytics: Usage patterns and effectiveness metrics