AI Chatbot Architecture

This document outlines the implementation and architecture of OmniButler's AI chatbot system.

Overview

The OmniButler AI chatbot provides a powerful conversational interface that enables users to interact with their financial data using natural language. The system leverages state-of-the-art language models, a sophisticated memory architecture, and specialized tools to deliver accurate, context-aware responses.

System Architecture

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant WebSocket
    participant LLMService
    participant Memory
    participant LLM
    participant Tools

    User->>Frontend: Send message
    Frontend->>WebSocket: Send via WebSocket
    WebSocket->>WebSocket: Verify Firebase token
    WebSocket->>LLMService: Process message
    LLMService->>Memory: Retrieve context
    LLMService->>LLM: Generate response
    LLM->>Tools: Use specialized tools
    Tools-->>LLM: Tool responses
    LLM-->>LLMService: Generated content
    LLMService-->>WebSocket: Stream response chunks
    WebSocket-->>Frontend: Stream response
    Frontend-->>User: Display response
```

Key Components

1. WebSocket Endpoint

  • Provides real-time, bidirectional communication
  • Manages user authentication via Firebase tokens
  • Handles session creation and management
  • Streams responses in chunks for immediate feedback

2. LLM Service

The LLMService class orchestrates the interaction between user input and the language model:

  • Processes various input types (text, audio, image, location)
  • Manages streaming response generation
  • Coordinates memory usage and context retrieval
  • Handles event-based streaming of responses
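The responsibilities above can be sketched as a minimal class. This is a hypothetical simplification, not the production LLMService: the history list stands in for the Redis-backed memory, and the chunked echo stands in for the real model call, so only the streaming shape (an async generator yielding chunks) reflects the documented behavior.

```python
from dataclasses import dataclass, field
from typing import AsyncIterator


@dataclass
class LLMService:
    user_id: str
    format_: str = "markdown"
    history: list[str] = field(default_factory=list)

    async def generate_streaming_response(self, input_: str) -> AsyncIterator[str]:
        # Record the turn so later calls can retrieve context
        # (a stand-in for the Redis-backed message history).
        self.history.append(input_)
        # Stand-in for the model call: yield the reply in small chunks,
        # the way a real streaming backend delivers tokens.
        reply = f"Echo for {self.user_id}: {input_}"
        for i in range(0, len(reply), 8):
            yield reply[i : i + 8]
```

Consumers iterate the generator with `async for`, which is what lets the WebSocket layer forward chunks the moment they are produced.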

3. Memory System

A three-tiered memory architecture:

  • Message History: Stores previous conversations using Redis
  • Vector Store: Enables semantic search across user data
  • Context Manager: Preserves state between interactions
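The three tiers can be sketched as follows. This is an illustrative sketch, not the production code: an in-memory dict stands in for Redis, and naive keyword overlap stands in for real embedding search; only the tiered structure mirrors the architecture described above.

```python
class MessageHistory:
    """Tier 1: per-session message log (Redis-backed in production)."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = {}

    def append(self, session_id: str, message: str) -> None:
        self._store.setdefault(session_id, []).append(message)

    def recent(self, session_id: str, n: int = 10) -> list[str]:
        return self._store.get(session_id, [])[-n:]


class VectorStore:
    """Tier 2: semantic search over user data (keyword overlap
    stands in for real embedding similarity here)."""

    def __init__(self) -> None:
        self._docs: list[str] = []

    def add(self, text: str) -> None:
        self._docs.append(text)

    def search(self, query: str, k: int = 3) -> list[str]:
        words = set(query.lower().split())
        scored = [(len(words & set(d.lower().split())), d) for d in self._docs]
        return [d for score, d in sorted(scored, key=lambda s: -s[0])[:k] if score > 0]


class ContextManager:
    """Tier 3: combines both tiers into the context handed to the LLM."""

    def __init__(self, history: MessageHistory, vectors: VectorStore) -> None:
        self.history = history
        self.vectors = vectors

    def build_context(self, session_id: str, query: str) -> list[str]:
        return self.history.recent(session_id) + self.vectors.search(query)
```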

4. Language Model Integration

Supports multiple LLM providers:

  • OpenAI: Primary model provider for production
  • TogetherAI: Alternative model provider
  • Configuration for streaming capabilities
  • Custom prompting and response formatting
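Provider selection might be driven by a small config object like the sketch below. The setting keys (`LLM_PROVIDER`, `LLM_MODEL`, `LLM_STREAMING`) and the default model names are assumptions for illustration, not the actual configuration schema.

```python
from dataclasses import dataclass


@dataclass
class LLMConfig:
    provider: str
    model: str
    streaming: bool = True


# Hypothetical defaults; real model names are configured per deployment.
_DEFAULT_MODELS = {"openai": "gpt-4o", "together": "llama-3-70b-chat"}


def build_llm_config(settings: dict) -> LLMConfig:
    """Resolve provider, model, and streaming flag from deployment settings."""
    provider = settings.get("LLM_PROVIDER", "openai")
    if provider not in _DEFAULT_MODELS:
        raise ValueError(f"Unsupported LLM provider: {provider}")
    return LLMConfig(
        provider=provider,
        model=settings.get("LLM_MODEL", _DEFAULT_MODELS[provider]),
        streaming=bool(settings.get("LLM_STREAMING", True)),
    )
```

Defaulting to OpenAI with streaming enabled matches the roles described above: OpenAI as the primary production provider, TogetherAI as the alternative.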

5. Specialized Tools

The chatbot has access to specialized tools:

  • TransactionRetrieverTool: Searches financial transactions
  • WhatsappMessagingTool: Sends WhatsApp messages
  • GooglePlacesTool: Finds locations
  • get_current_time: Gets current time
  • Various other tools for financial data analysis
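A minimal sketch of how tools could be exposed to the model: a registry mapping tool names to callables, with a dispatcher the LLM layer invokes by name. `get_current_time` follows the source's description; the registry and `dispatch_tool` helper are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Callable


def get_current_time() -> str:
    """Return the current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


# Registry mapping tool names to callables; the LLM selects a tool by name.
TOOL_REGISTRY: dict[str, Callable[..., str]] = {
    "get_current_time": get_current_time,
}


def dispatch_tool(name: str, **kwargs) -> str:
    """Invoke a registered tool, raising on unknown names."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name](**kwargs)
```

Tools such as TransactionRetrieverTool would register the same way, with their query parameters passed through `**kwargs`.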

Communication Flow

  1. Authentication
     • Client establishes WebSocket connection
     • Client sends Firebase token and optional session ID
     • Server verifies token with Firebase
     • Server retrieves or creates the user from Firestore

  2. Message Processing
     • Client sends message through WebSocket
     • Server processes the message through LLMService
     • LLMService loads relevant context from memory

  3. Response Generation
     • LLM generates a response using available tools and context
     • Response is streamed in chunks through the WebSocket
     • Each chunk is delivered as a JSON message
     • A special "END_OF_MESSAGE" chunk signifies completion

  4. Error Handling
     • WebSocket disconnect events are logged
     • Authentication failures trigger connection closure
     • LLM errors are captured and reported
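The chunked wire format described above can be made concrete with a pair of helpers: the server wraps each chunk as a JSON message with a `content` key, and the client concatenates chunks until the "END_OF_MESSAGE" sentinel arrives. The helper names are illustrative; the JSON shape and sentinel value come from the flow above.

```python
import json

END_OF_MESSAGE = "END_OF_MESSAGE"


def frame_chunk(content: str) -> str:
    """Server side: wrap one response chunk as a JSON WebSocket message."""
    return json.dumps({"content": content})


def reassemble(frames: list[str]) -> str:
    """Client side: concatenate chunks until the sentinel arrives."""
    parts = []
    for frame in frames:
        content = json.loads(frame)["content"]
        if content == END_OF_MESSAGE:
            break
        parts.append(content)
    return "".join(parts)
```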

Prompt Engineering

The chatbot relies on carefully structured system prompts. An excerpt:

```
You are a highly knowledgeable and efficient assistant designed to support
Tech-Savvy Professional Parents aged 35-45. Your primary goal is to provide
accurate and comprehensive financial advice tailored to their complex needs
and busy schedules.
```

The system prompt includes:

  • Role definition and target audience
  • Guidelines for response style and content
  • Available tools and their usage
  • Guidelines for output formatting

Output Formatting

The system supports multiple output formats:

  • Markdown: For rich text web interfaces
  • SSML: For voice interfaces
  • WhatsApp: For messaging
  • Plain Text: For simple interfaces

Each format has specific formatting instructions for:

  • Financial data (currency symbols, decimal places)
  • Lists and structural elements
  • Formatting based on the output channel
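The per-channel rules for financial data can be sketched as a single formatting function. The function name and the exact markup per channel are illustrative assumptions; the channels themselves (Markdown, SSML, WhatsApp, plain text) and the currency/decimal-place concern come from the list above.

```python
def format_amount(amount: float, fmt: str) -> str:
    """Render a currency amount for the given output channel (USD assumed)."""
    text = f"${amount:,.2f}"  # currency symbol, thousands separator, two decimals
    if fmt == "markdown":
        return f"**{text}**"  # bold for rich-text web UIs
    if fmt == "ssml":
        return f'<say-as interpret-as="currency">{text}</say-as>'
    if fmt == "whatsapp":
        return f"*{text}*"  # WhatsApp bold uses single asterisks
    return text  # plain-text fallback
```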

Example Implementation

WebSocket Handler

```python
import logging

from fastapi import WebSocket, WebSocketDisconnect

logger = logging.getLogger(__name__)


@chatbot_router.websocket("/async/chatbot")
async def chatbot_websocket(websocket: WebSocket) -> None:
    await websocket.accept()

    # Authentication: the first message carries the Firebase token
    auth_message = await websocket.receive_json()
    token = auth_message.get("token")

    # User verification; close the connection on failure
    decoded_token = verify_firebase_token(token)
    if decoded_token is None:
        await websocket.close(code=1008)  # policy violation: auth failed
        return
    app_user = FirestoreAppUserRepository().find_by_provider_user_id(
        decoded_token.uid, decoded_token.provider_name
    )

    # Initialize the LLM service for this user and output format
    llm_service = LLMService(
        user_id=app_user.appUserId,
        format_=LLMOutputFormats.markdown,
    )

    # Message handling loop
    try:
        while True:
            message = await websocket.receive_text()

            # Stream response chunks as they are generated
            async for content in llm_service.generate_streaming_response(input_=message):
                await websocket.send_json({"content": content})

            # Sentinel chunk tells the client the response is complete
            await websocket.send_json({"content": "END_OF_MESSAGE"})
    except WebSocketDisconnect:
        logger.info("WebSocket disconnect")
```
Future Enhancements

  1. Multi-modal Input Processing
     • Enhanced image processing capabilities
     • Voice input enhancements
     • Document analysis

  2. Advanced Context Management
     • Improved retrieval mechanisms
     • Long-term memory optimization
     • Context prioritization

  3. Tool Enhancements
     • Additional financial analysis tools
     • Expanded third-party integrations
     • More sophisticated transaction analysis

  4. Performance Optimizations
     • Caching strategies for common queries
     • Distributed memory management
     • Response generation optimizations