# AI Chatbot Architecture
This document outlines the implementation and architecture of OmniButler's AI chatbot system.
## Overview
The OmniButler AI chatbot provides a powerful conversational interface that enables users to interact with their financial data using natural language. The system leverages state-of-the-art language models, a sophisticated memory architecture, and specialized tools to deliver accurate, context-aware responses.
## System Architecture
```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant WebSocket
    participant LLMService
    participant Memory
    participant LLM
    participant Tools

    User->>Frontend: Send message
    Frontend->>WebSocket: Send via WebSocket
    WebSocket->>WebSocket: Verify Firebase token
    WebSocket->>LLMService: Process message
    LLMService->>Memory: Retrieve context
    LLMService->>LLM: Generate response
    LLM->>Tools: Use specialized tools
    Tools-->>LLM: Tool responses
    LLM-->>LLMService: Generated content
    LLMService-->>WebSocket: Stream response chunks
    WebSocket-->>Frontend: Stream response
    Frontend-->>User: Display response
```
## Key Components
### 1. WebSocket Endpoint
- Provides real-time, bidirectional communication
- Manages user authentication via Firebase tokens
- Handles session creation and management
- Streams responses in chunks for immediate feedback
### 2. LLM Service
The LLMService class orchestrates the interaction between user input and the language model:
- Processes various input types (text, audio, image, location)
- Manages streaming response generation
- Coordinates memory usage and context retrieval
- Handles event-based streaming of responses
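The streaming behavior described above can be sketched as an async generator. This is a minimal illustration, not the actual OmniButler implementation; the class and method names mirror those used in the example handler later in this document, but the internals are assumed.

```python
import asyncio
from typing import AsyncIterator


class LLMService:
    """Hypothetical sketch of the streaming service (internals assumed)."""

    def __init__(self, user_id: str, format_: str = "markdown") -> None:
        self.user_id = user_id
        self.format_ = format_

    async def generate_streaming_response(self, input_: str) -> AsyncIterator[str]:
        # A real implementation would stream tokens from the LLM;
        # here we fake a few chunks to show the shape of the interface.
        for chunk in ("Here ", "is ", "your ", "answer."):
            await asyncio.sleep(0)  # cede control, as a real network stream would
            yield chunk


async def collect(service: LLMService, message: str) -> str:
    # Consume the async stream into a single string.
    return "".join([c async for c in service.generate_streaming_response(message)])
```

Consumers iterate with `async for`, which is what lets the WebSocket layer forward each chunk as soon as it is produced.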
### 3. Memory System
A three-tiered memory architecture:
- Message History: Stores previous conversations using Redis
- Vector Store: Enables semantic search across user data
- Context Manager: Preserves state between interactions
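The message-history tier could look like the sketch below. A plain dictionary stands in for Redis here; the real system would use a Redis client with session-keyed lists, and all names are assumptions for illustration.

```python
from collections import defaultdict


class MessageHistory:
    """In-memory stand-in for the Redis-backed message-history tier."""

    def __init__(self) -> None:
        # session_id -> ordered list of (role, content) turns
        self._store: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def append(self, session_id: str, role: str, content: str) -> None:
        self._store[session_id].append((role, content))

    def recent(self, session_id: str, limit: int = 10) -> list[tuple[str, str]]:
        # Return the most recent turns, oldest first, for prompt assembly.
        return self._store[session_id][-limit:]
```

Capping retrieval with `limit` keeps the prompt within the model's context window while preserving conversational continuity.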
### 4. Language Model Integration
Supports multiple LLM providers:
- OpenAI: Primary model provider for production
- TogetherAI: Alternative model provider
- Configuration for streaming capabilities
- Custom prompting and response formatting
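Provider selection might be configured along these lines. This is a hedged sketch: the config fields and the fallback logic are assumptions, and the model names are placeholders rather than the actual models in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    provider: str          # "openai" (primary) or "togetherai" (alternative)
    model: str             # placeholder names, not the actual production models
    streaming: bool = True # streaming is enabled by default for chat responses


def select_config(use_fallback: bool = False) -> LLMConfig:
    # Hypothetical selection: primary provider unless a fallback is requested.
    if use_fallback:
        return LLMConfig(provider="togetherai", model="fallback-model")
    return LLMConfig(provider="openai", model="primary-model")
```

Keeping the config immutable (`frozen=True`) avoids accidental mid-session provider changes.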
### 5. Specialized Tools
The chatbot has access to specialized tools:
- TransactionRetrieverTool: Searches financial transactions
- WhatsappMessagingTool: Sends WhatsApp messages
- GooglePlacesTool: Finds locations
- get_current_time: Gets the current time
- Various other tools for financial data analysis
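A tool registry along these lines would let the LLM look up callables by name. The decorator-based registry is an illustrative assumption, not the project's actual mechanism; only the tool name `get_current_time` comes from the list above.

```python
from datetime import datetime, timezone
from typing import Callable

# Hypothetical registry mapping tool names to callables the LLM can invoke.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function as a chatbot tool (assumed pattern)."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return decorator


@tool("get_current_time")
def get_current_time() -> str:
    # Tools return strings so their output can be fed back into the LLM prompt.
    return datetime.now(timezone.utc).isoformat()
```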
## Communication Flow
1. Authentication
    - Client establishes WebSocket connection
    - Client sends Firebase token and optional session ID
    - Server verifies token with Firebase
    - Server retrieves or creates user from Firestore
2. Message Processing
    - Client sends message through WebSocket
    - Server processes message through LLMService
    - LLMService loads relevant context from memory
3. Response Generation
    - LLM generates response using available tools and context
    - Response is streamed in chunks through WebSocket
    - Each chunk is delivered as a JSON message
    - Special "END_OF_MESSAGE" message signifies completion
4. Error Handling
    - WebSocket disconnect events are logged
    - Authentication failures trigger connection closure
    - LLM errors are captured and reported
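The chunked JSON protocol with the "END_OF_MESSAGE" sentinel can be sketched as a pair of pure functions, one per side of the wire. The transport is omitted; only the framing logic from the flow above is shown, and the function names are illustrative.

```python
import json
from typing import Iterable, Iterator


def frame_response(chunks: Iterable[str]) -> Iterator[str]:
    """Server side: wrap each chunk in a JSON message, then send the sentinel."""
    for chunk in chunks:
        yield json.dumps({"content": chunk})
    yield json.dumps({"content": "END_OF_MESSAGE"})


def read_response(messages: Iterable[str]) -> str:
    """Client side: accumulate chunks until the END_OF_MESSAGE sentinel arrives."""
    parts: list[str] = []
    for raw in messages:
        content = json.loads(raw)["content"]
        if content == "END_OF_MESSAGE":
            break
        parts.append(content)
    return "".join(parts)
```

Using an in-band sentinel rather than closing the socket lets one connection carry many request/response rounds.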
## Prompt Engineering
The chatbot uses sophisticated prompt engineering:
```text
You are a highly knowledgeable and efficient assistant designed to support
Tech-Savvy Professional Parents aged 35-45. Your primary goal is to provide
accurate and comprehensive financial advice tailored to their complex needs
and busy schedules.
```
The system prompt includes:
- Role definition and target audience
- Guidelines for response style and content
- Available tools and their usage
- Guidelines for output formatting
## Output Formatting
The system supports multiple output formats:
- Markdown: For rich text web interfaces
- SSML: For voice interfaces
- WhatsApp: For messaging
- Plain Text: For simple interfaces
Each format has specific formatting instructions for:
- Financial data (currency symbols, decimal places)
- Lists and structural elements
- Formatting based on the output channel
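As an illustration of channel-specific financial formatting, a dispatcher might look like the following. The exact rules (bold markers, SSML tags) are assumptions for the sketch, not the project's confirmed formatting instructions.

```python
def format_amount(amount: float, fmt: str) -> str:
    """Render a currency amount per output channel (rules here are assumed)."""
    value = f"${amount:,.2f}"   # currency symbol plus two decimal places
    if fmt == "markdown":
        return f"**{value}**"
    if fmt == "whatsapp":
        return f"*{value}*"     # WhatsApp bold uses single asterisks
    if fmt == "ssml":
        return f'<say-as interpret-as="currency">{value}</say-as>'
    return value                # plain text fallback
```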
## Example Implementation
### WebSocket Handler
```python
@chatbot_router.websocket("/async/chatbot")
async def chatbot_websocket(websocket: WebSocket) -> None:
    await websocket.accept()

    # Authentication
    auth_message = await websocket.receive_json()
    token = auth_message.get("token")

    # User verification: close the connection if the token is invalid
    decoded_token = verify_firebase_token(token)
    if decoded_token is None:
        await websocket.close(code=1008)  # policy violation: auth failed
        return
    app_user = FirestoreAppUserRepository().find_by_provider_user_id(
        decoded_token.uid, decoded_token.provider_name
    )

    # Initialize LLM service
    llm_service = LLMService(
        user_id=app_user.appUserId,
        format_=LLMOutputFormats.markdown,
    )

    # Message handling loop
    try:
        while True:
            message = await websocket.receive_text()
            # Stream response chunks, then signal completion
            async for content in llm_service.generate_streaming_response(input_=message):
                await websocket.send_json({"content": content})
            await websocket.send_json({"content": "END_OF_MESSAGE"})
    except WebSocketDisconnect:
        logger.info("WebSocket disconnect")
```
## Future Enhancements

1. Multi-modal Input Processing
    - Enhanced image processing capabilities
    - Voice input enhancements
    - Document analysis
2. Advanced Context Management
    - Improved retrieval mechanisms
    - Long-term memory optimization
    - Context prioritization
3. Tool Enhancements
    - Additional financial analysis tools
    - Expanded third-party integrations
    - More sophisticated transaction analysis
4. Performance Optimizations
    - Caching strategies for common queries
    - Distributed memory management
    - Response generation optimizations