WhatsApp Chat Integration

This document outlines OmniButler's WhatsApp communication channel implementation.

Overview

OmniButler integrates with WhatsApp through Twilio's API to provide a text-based interface for accessing financial information and AI assistant features. The implementation allows users to interact with the system naturally through WhatsApp messages.

System Architecture

```mermaid
sequenceDiagram
    participant User
    participant Twilio
    participant OmniButler
    participant LLMService
    participant Firestore

    User->>Twilio: Send WhatsApp message
    Twilio->>OmniButler: POST /api/v1/whatsapp/conversations
    OmniButler->>OmniButler: Background task processing
    OmniButler->>Firestore: Get user mapping
    OmniButler->>LLMService: Generate response
    LLMService->>Firestore: Query user data
    Firestore-->>LLMService: Return data
    LLMService-->>OmniButler: AI response
    OmniButler->>Twilio: Send message response
    Twilio->>User: Deliver WhatsApp message
    Twilio->>OmniButler: POST /api/v1/whatsapp/status (webhook)
    OmniButler->>Firestore: Log message status
```

Key Components

1. Webhook Endpoints

The system exposes two webhook endpoints, both served under /api/v1/whatsapp:

Message Reception Endpoint

@conversation_router.post("/conversations")
async def conversation(request: Request, background_tasks: BackgroundTasks) -> Response:
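    # Validate the Twilio request signature unless the request is flagged as a test.
    # NOTE: X-Test-Mode is a client-supplied header, so any caller can set it and
    # skip validation; honor this bypass only in non-production environments.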
    is_test_mode = request.headers.get("X-Test-Mode") == "true"
    if not is_test_mode and not validate_twilio_signature(request):
        logger.warning("Invalid Twilio signature")
        return Response(status_code=200)  # Return 200 to avoid Twilio retries

    # Parse the form data from Twilio
    form_data = await request.form()
    call_details = WhatsappMessageDetails.model_validate(dict(form_data))

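    # Log the received message with a truncated sender to limit exposure of the number.
    # NOTE: From begins with "whatsapp:+", which is already 10 characters, so this
    # slice logs only the prefix and none of the digits.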
    logger.info(
        f"Received WhatsApp message from {call_details.From[:10]}..., SID: {call_details.MessageSid}"
    )

    # Send an immediate success response
    response = Response(status_code=200)

    # Process the request in the background
    background_tasks.add_task(
        process_whatsapp_message, call_details
    )

    return response

Status Tracking Endpoint

@conversation_router.post("/status")
async def conversation_status(request: Request) -> Response:
    # Validate the Twilio request signature
    is_valid = validate_twilio_signature(request)
    if not is_valid:
        logger.warning("Invalid Twilio signature on status webhook")
        return Response(status_code=200)  # Return 200 to avoid Twilio retries

    # Extract and log status details
    form_data = await request.form()
    message_sid = form_data.get("MessageSid")
    message_status = form_data.get("MessageStatus")

    logger.info(f"WhatsApp message status update: {message_sid} -> {message_status}")

    # Update message status in database
    repository = FirestoreWhatsAppUserMappingRepository()
    await repository.update_message_status(
        message_sid=message_sid,
        status=message_status,
    )

    return Response(status_code=200)
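
Status callbacks are separate HTTP requests, so they may not arrive in the order the statuses occurred. The sketch below shows one way a repository could avoid overwriting a later status with an earlier one. The `STATUS_RANK` table and the `next_status` helper are hypothetical and not part of OmniButler; the ranking is an assumption about how outbound statuses progress.

```python
from typing import Optional

# Assumed progression of outbound message statuses (higher = later).
# Failure states are treated as terminal in this sketch.
STATUS_RANK = {
    "queued": 0,
    "sent": 1,
    "delivered": 2,
    "read": 3,
    "failed": 4,
    "undelivered": 4,
}


def next_status(current: Optional[str], incoming: str) -> Optional[str]:
    """Return the status to store: keep the most advanced one seen so far."""
    if incoming not in STATUS_RANK:
        return current  # ignore statuses this sketch does not know about
    if current is None or STATUS_RANK[incoming] > STATUS_RANK.get(current, -1):
        return incoming
    return current  # stale update that arrived late
```

With this rule, a late `sent` callback does not replace a stored `delivered` status.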

2. Message Processing

The WhatsappConversationUseCase class handles:

  • User identification based on phone number
  • Message content extraction
  • AI response generation
  • Response formatting for WhatsApp
  • Message delivery
  • Error handling and recovery
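
Two of these responsibilities, user identification and response formatting, can be illustrated with small helpers. This is a minimal sketch, not the actual `WhatsappConversationUseCase` code: the function names are hypothetical, and the 1600-character limit is an assumption that should be checked against the current Twilio documentation.

```python
WHATSAPP_PREFIX = "whatsapp:"
MAX_BODY_CHARS = 1600  # assumed per-message limit, not taken from the source


def normalize_sender(raw_from: str) -> str:
    """Strip the 'whatsapp:' prefix so the number can serve as a lookup key."""
    return raw_from.removeprefix(WHATSAPP_PREFIX).strip()


def split_for_whatsapp(text: str, limit: int = MAX_BODY_CHARS) -> list[str]:
    """Split a reply into chunks of at most `limit` characters,
    breaking at a space when one is available."""
    chunks: list[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space in range: fall back to a hard split
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk would then be sent as its own message, in order.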

3. Twilio Integration

  • Uses Twilio's messaging API for WhatsApp communication
  • Implements webhook endpoints for receiving messages and status updates
  • Processes messages asynchronously to provide immediate acknowledgment to Twilio
  • Validates webhook signatures for security
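
For reference, the signature check can be sketched with the standard library alone. Twilio signs form-encoded webhooks by appending each POST parameter (sorted by name, as name followed by value) to the full request URL, computing an HMAC-SHA1 with the auth token as key, and sending the Base64 result in the `X-Twilio-Signature` header. The helper names below are hypothetical; the project's `validate_twilio_signature` is not shown in this document and may instead rely on the `RequestValidator` class from Twilio's Python helper library.

```python
import base64
import hashlib
import hmac


def compute_twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Return the Base64 HMAC-SHA1 signature for a form-encoded webhook request."""
    # Concatenate the URL with every parameter as name + value, sorted by name.
    payload = url + "".join(f"{key}{params[key]}" for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: dict[str, str], signature: str
) -> bool:
    """Compare the expected signature with the X-Twilio-Signature header value."""
    expected = compute_twilio_signature(auth_token, url, params)
    # Constant-time comparison avoids leaking how much of the value matched.
    return hmac.compare_digest(expected, signature)
```

The URL must match what Twilio called, including scheme and query string, so deployments behind a proxy may need to reconstruct it from forwarded headers.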

Message Flow

  1. User sends a message via WhatsApp to the Twilio-provisioned phone number
  2. Twilio forwards the message to OmniButler's /conversations endpoint
  3. OmniButler acknowledges receipt with a 200 status code to Twilio
  4. Background processing begins:
       • The message is parsed and validated
       • The user is identified by their WhatsApp number using the Firestore repository
       • The message content is sent to the LLM service
       • An appropriate response is generated
       • The response is sent back to the user via Twilio's API
  5. Twilio sends delivery status updates to the /status endpoint
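
The background portion of this flow can be sketched as a single coroutine with its dependencies injected. This is an illustration under assumptions, not the body of `process_whatsapp_message`: the `IncomingMessage` type, the three callables, and the reply texts are hypothetical stand-ins for the Firestore lookup, the LLM service, and the Twilio client.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class IncomingMessage:
    sender: str  # e.g. "whatsapp:+14155550100"
    body: str


async def process_message(
    message: IncomingMessage,
    find_user_id: Callable[[str], Awaitable[Optional[str]]],
    generate_reply: Callable[[str, str], Awaitable[str]],
    send_reply: Callable[[str, str], Awaitable[None]],
) -> None:
    """Identify the user, generate a reply, and send it back."""
    user_id = await find_user_id(message.sender)
    if user_id is None:
        # Placeholder wording; the real copy is not specified in this document.
        await send_reply(message.sender, "This number is not linked to an account yet.")
        return
    try:
        reply = await generate_reply(user_id, message.body)
    except Exception:
        # Twilio already received its 200, so errors must be reported to the user here.
        reply = "Sorry, something went wrong. Please try again."
    await send_reply(message.sender, reply)
```

Because the webhook has already returned by the time this runs, any failure has to be handled inside the task rather than surfaced through the HTTP response.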

Implementation Details

WhatsApp Message DTO

class WhatsappMessageDetails(BaseModel):
    """
    Represents incoming WhatsApp message details from Twilio.

    Attributes:
        From: Sender's phone number with whatsapp: prefix
        Body: Message content text (optional for media messages)
        MessageSid: Unique message identifier from Twilio
        ProfileName: Sender's WhatsApp profile name
        WaId: WhatsApp ID (usually the phone number without prefix)
        NumMedia: Number of media attachments
        MediaUrl0: URL to the first media item (if any)
        MediaContentType0: MIME type of the first media item
    """
    From: str
    Body: str = ""
    MessageSid: str
    ProfileName: str = ""
    WaId: str = ""
    NumMedia: str = "0"
    MediaUrl0: str | None = None
    MediaContentType0: str | None = None
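
The DTO models only the first attachment, while Twilio numbers media fields from 0 up to NumMedia - 1. As a sketch of how all attachments could be read from the raw form payload, the hypothetical helper below (not part of the codebase) works on a plain dictionary of form fields and treats NumMedia as the string Twilio sends.

```python
def extract_media(form: dict[str, str]) -> list[tuple[str, str]]:
    """Return (url, content_type) pairs for every attachment in the payload."""
    try:
        count = int(form.get("NumMedia", "0"))
    except ValueError:
        count = 0  # malformed value: treat the message as text-only
    media: list[tuple[str, str]] = []
    for index in range(count):
        url = form.get(f"MediaUrl{index}")
        if url:
            media.append((url, form.get(f"MediaContentType{index}", "")))
    return media
```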

Configuration

Required environment variables:

  • TWILIO_ACCOUNT_SID: Twilio account identifier
  • TWILIO_AUTH_TOKEN: Authentication token for Twilio API
  • TWILIO_FROM_NUMBER: WhatsApp-enabled Twilio phone number
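
A minimal sketch of loading these variables at startup and failing fast when one is missing. The `TwilioSettings` class and `load_twilio_settings` function are hypothetical names; the project may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "TWILIO_FROM_NUMBER")


@dataclass(frozen=True)
class TwilioSettings:
    account_sid: str
    auth_token: str
    from_number: str


def load_twilio_settings(env: Mapping[str, str] = os.environ) -> TwilioSettings:
    """Read the required variables, raising if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return TwilioSettings(
        account_sid=env["TWILIO_ACCOUNT_SID"],
        auth_token=env["TWILIO_AUTH_TOKEN"],
        from_number=env["TWILIO_FROM_NUMBER"],
    )
```

Accepting the environment as a parameter keeps the function easy to test without touching the real process environment.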

Current Status

The WhatsApp integration is fully implemented: it uses Firestore for data storage and is integrated with the LLM system. Key improvements include:

  1. Clean architecture implementation with proper separation of concerns
  2. Migration from SQLAlchemy to Firestore for user mapping storage
  3. Consolidated webhook endpoints in a single file
  4. Better error handling and recovery
  5. Improved test coverage
  6. Background task processing for better performance

Future Enhancements

  1. Rich Media Support
       • Handle and respond to images, voice messages, and location data
       • Send formatted responses with images and interactive elements

  2. User Authentication
       • Implement secure authentication flow for WhatsApp users
       • Link WhatsApp numbers to existing user accounts

  3. Conversation Memory
       • Preserve context across multiple message exchanges
       • Access previous interactions for more coherent responses

  4. Interactive Features
       • Quick replies for common actions
       • Template messages for structured data
       • Transaction notifications and alerts