AI Chatbot Architecture

This document outlines the implementation and architecture of OmniButler's AI chatbot system.

Overview

The OmniButler AI chatbot provides a powerful conversational interface that enables users to interact with their financial data using natural language. The system leverages state-of-the-art language models, a sophisticated memory architecture, and specialized tools to deliver accurate, context-aware responses.

System Architecture

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant WebSocket
    participant LLMService
    participant Memory
    participant LLM
    participant Tools

    User->>Frontend: Send message
    Frontend->>WebSocket: Send via WebSocket
    WebSocket->>WebSocket: Verify Firebase token
    WebSocket->>LLMService: Process message
    LLMService->>Memory: Retrieve context
    LLMService->>LLM: Generate response
    LLM->>Tools: Use specialized tools
    Tools-->>LLM: Tool responses
    LLM-->>LLMService: Generated content
    LLMService-->>WebSocket: Stream response chunks
    WebSocket-->>Frontend: Stream response
    Frontend-->>User: Display response
```

Key Components

1. WebSocket Endpoint

  • Provides real-time, bidirectional communication
  • Manages user authentication via Firebase tokens
  • Handles session creation and management
  • Streams responses in chunks for immediate feedback

2. LLM Service

The LLMService class orchestrates the interaction between user input and the language model:

  • Processes various input types (text, audio, image, location)
  • Manages streaming response generation
  • Coordinates memory usage and context retrieval
  • Handles event-based streaming of responses
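The responsibilities above can be sketched as a minimal class. This is a hypothetical simplification, not the production LLMService: the history list stands in for the Redis-backed memory, and the chunked echo stands in for the real model call, so only the streaming shape (an async generator yielding chunks) reflects the documented behavior.

```python
from dataclasses import dataclass, field
from typing import AsyncIterator


@dataclass
class LLMService:
    user_id: str
    format_: str = "markdown"
    history: list[str] = field(default_factory=list)

    async def generate_streaming_response(self, input_: str) -> AsyncIterator[str]:
        # Record the turn so later calls can retrieve context
        # (a stand-in for the Redis-backed message history).
        self.history.append(input_)
        # Stand-in for the model call: yield the reply in small chunks,
        # the way a real streaming backend delivers tokens.
        reply = f"Echo for {self.user_id}: {input_}"
        for i in range(0, len(reply), 8):
            yield reply[i : i + 8]
```

Consumers iterate the generator with `async for`, which is what lets the WebSocket layer forward chunks the moment they are produced.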

3. Memory System

A three-tiered memory architecture:

  • Message History: Stores previous conversations using Redis
  • Vector Store: Enables semantic search across user data
  • Context Manager: Preserves state between interactions
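The three tiers can be sketched as follows. This is an illustrative sketch, not the production code: an in-memory dict stands in for Redis, and naive keyword overlap stands in for real embedding search; only the tiered structure mirrors the architecture described above.

```python
class MessageHistory:
    """Tier 1: per-session message log (Redis-backed in production)."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = {}

    def append(self, session_id: str, message: str) -> None:
        self._store.setdefault(session_id, []).append(message)

    def recent(self, session_id: str, n: int = 10) -> list[str]:
        return self._store.get(session_id, [])[-n:]


class VectorStore:
    """Tier 2: semantic search over user data (keyword overlap
    stands in for real embedding similarity here)."""

    def __init__(self) -> None:
        self._docs: list[str] = []

    def add(self, text: str) -> None:
        self._docs.append(text)

    def search(self, query: str, k: int = 3) -> list[str]:
        words = set(query.lower().split())
        scored = [(len(words & set(d.lower().split())), d) for d in self._docs]
        return [d for score, d in sorted(scored, key=lambda s: -s[0])[:k] if score > 0]


class ContextManager:
    """Tier 3: combines both tiers into the context handed to the LLM."""

    def __init__(self, history: MessageHistory, vectors: VectorStore) -> None:
        self.history = history
        self.vectors = vectors

    def build_context(self, session_id: str, query: str) -> list[str]:
        return self.history.recent(session_id) + self.vectors.search(query)
```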

4. Language Model Integration

Supports multiple LLM providers:

  • OpenAI: Primary model provider for production
  • TogetherAI: Alternative model provider
  • Configuration for streaming capabilities
  • Custom prompting and response formatting
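Provider selection might be driven by a small config object like the sketch below. The setting keys (`LLM_PROVIDER`, `LLM_MODEL`, `LLM_STREAMING`) and the default model names are assumptions for illustration, not the actual configuration schema.

```python
from dataclasses import dataclass


@dataclass
class LLMConfig:
    provider: str
    model: str
    streaming: bool = True


# Hypothetical defaults; real model names are configured per deployment.
_DEFAULT_MODELS = {"openai": "gpt-4o", "together": "llama-3-70b-chat"}


def build_llm_config(settings: dict) -> LLMConfig:
    """Resolve provider, model, and streaming flag from deployment settings."""
    provider = settings.get("LLM_PROVIDER", "openai")
    if provider not in _DEFAULT_MODELS:
        raise ValueError(f"Unsupported LLM provider: {provider}")
    return LLMConfig(
        provider=provider,
        model=settings.get("LLM_MODEL", _DEFAULT_MODELS[provider]),
        streaming=bool(settings.get("LLM_STREAMING", True)),
    )
```

Defaulting to OpenAI with streaming enabled matches the roles described above: OpenAI as the primary production provider, TogetherAI as the alternative.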

5. Specialized Tools

The chatbot has access to specialized tools:

  • TransactionRetrieverTool: Searches financial transactions
  • WhatsappMessagingTool: Sends WhatsApp messages
  • GooglePlacesTool: Finds locations
  • get_current_time: Gets current time
  • Various other tools for financial data analysis
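A minimal sketch of how tools could be exposed to the model: a registry mapping tool names to callables, with a dispatcher the LLM layer invokes by name. `get_current_time` follows the source's description; the registry and `dispatch_tool` helper are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Callable


def get_current_time() -> str:
    """Return the current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


# Registry mapping tool names to callables; the LLM selects a tool by name.
TOOL_REGISTRY: dict[str, Callable[..., str]] = {
    "get_current_time": get_current_time,
}


def dispatch_tool(name: str, **kwargs) -> str:
    """Invoke a registered tool, raising on unknown names."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name](**kwargs)
```

Tools such as TransactionRetrieverTool would register the same way, with their query parameters passed through `**kwargs`.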

Communication Flow

  1. Authentication
     • Client establishes WebSocket connection
     • Client sends Firebase token and optional session ID
     • Server verifies token with Firebase
     • Server retrieves or creates the user from Firestore

  2. Message Processing
     • Client sends message through WebSocket
     • Server processes the message through LLMService
     • LLMService loads relevant context from memory

  3. Response Generation
     • LLM generates a response using available tools and context
     • Response is streamed in chunks through the WebSocket
     • Each chunk is delivered as a JSON message
     • A special "END_OF_MESSAGE" chunk signifies completion

  4. Error Handling
     • WebSocket disconnect events are logged
     • Authentication failures trigger connection closure
     • LLM errors are captured and reported
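The chunked wire format described above can be made concrete with a pair of helpers: the server wraps each chunk as a JSON message with a `content` key, and the client concatenates chunks until the "END_OF_MESSAGE" sentinel arrives. The helper names are illustrative; the JSON shape and sentinel value come from the flow above.

```python
import json

END_OF_MESSAGE = "END_OF_MESSAGE"


def frame_chunk(content: str) -> str:
    """Server side: wrap one response chunk as a JSON WebSocket message."""
    return json.dumps({"content": content})


def reassemble(frames: list[str]) -> str:
    """Client side: concatenate chunks until the sentinel arrives."""
    parts = []
    for frame in frames:
        content = json.loads(frame)["content"]
        if content == END_OF_MESSAGE:
            break
        parts.append(content)
    return "".join(parts)
```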

Prompt Engineering

The chatbot relies on carefully structured system prompts. An excerpt:

```
You are a highly knowledgeable and efficient assistant designed to support
Tech-Savvy Professional Parents aged 35-45. Your primary goal is to provide
accurate and comprehensive financial advice tailored to their complex needs
and busy schedules.
```

The system prompt includes:

  • Role definition and target audience
  • Guidelines for response style and content
  • Available tools and their usage
  • Guidelines for output formatting

Output Formatting

The system supports multiple output formats:

  • Markdown: For rich text web interfaces
  • SSML: For voice interfaces
  • WhatsApp: For messaging
  • Plain Text: For simple interfaces

Each format has specific formatting instructions for:

  • Financial data (currency symbols, decimal places)
  • Lists and structural elements
  • Formatting based on the output channel
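The per-channel rules for financial data can be sketched as a single formatting function. The function name and the exact markup per channel are illustrative assumptions; the channels themselves (Markdown, SSML, WhatsApp, plain text) and the currency/decimal-place concern come from the list above.

```python
def format_amount(amount: float, fmt: str) -> str:
    """Render a currency amount for the given output channel (USD assumed)."""
    text = f"${amount:,.2f}"  # currency symbol, thousands separator, two decimals
    if fmt == "markdown":
        return f"**{text}**"  # bold for rich-text web UIs
    if fmt == "ssml":
        return f'<say-as interpret-as="currency">{text}</say-as>'
    if fmt == "whatsapp":
        return f"*{text}*"  # WhatsApp bold uses single asterisks
    return text  # plain-text fallback
```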

Example Implementation

WebSocket Handler

```python
import logging

from fastapi import WebSocket, WebSocketDisconnect

logger = logging.getLogger(__name__)


@chatbot_router.websocket("/async/chatbot")
async def chatbot_websocket(websocket: WebSocket) -> None:
    await websocket.accept()

    # Authentication: the first message carries the Firebase token
    auth_message = await websocket.receive_json()
    token = auth_message.get("token")

    # User verification; close the connection on failure
    decoded_token = verify_firebase_token(token)
    if decoded_token is None:
        await websocket.close(code=1008)  # policy violation: auth failed
        return
    app_user = FirestoreAppUserRepository().find_by_provider_user_id(
        decoded_token.uid, decoded_token.provider_name
    )

    # Initialize the LLM service for this user and output format
    llm_service = LLMService(
        user_id=app_user.appUserId,
        format_=LLMOutputFormats.markdown,
    )

    # Message handling loop
    try:
        while True:
            message = await websocket.receive_text()

            # Stream response chunks as they are generated
            async for content in llm_service.generate_streaming_response(input_=message):
                await websocket.send_json({"content": content})

            # Sentinel chunk tells the client the response is complete
            await websocket.send_json({"content": "END_OF_MESSAGE"})
    except WebSocketDisconnect:
        logger.info("WebSocket disconnect")
```
Future Enhancements

  1. Multi-modal Input Processing
     • Enhanced image processing capabilities
     • Voice input enhancements
     • Document analysis

  2. Advanced Context Management
     • Improved retrieval mechanisms
     • Long-term memory optimization
     • Context prioritization

  3. Tool Enhancements
     • Additional financial analysis tools
     • Expanded third-party integrations
     • More sophisticated transaction analysis

  4. Performance Optimizations
     • Caching strategies for common queries
     • Distributed memory management
     • Response generation optimizations