
ADR-006: Completing Automated Email Sync & Progress Tracking

Status

Implemented

Context

ADR-005 (005-email-integration-system.md) and the Email Processing Use Case define the intended architecture for automated email sync in OmniButler:

  • Automatic sync on Gmail connect (via ProviderUser watcher)
  • Periodic background sync (scheduler)
  • Batched, incremental processing (weekly, paginated)
  • Real-time progress tracking for user feedback/notifications

However, several critical features described in ADR-005 and the use case document are not implemented in the codebase:

  • The ProviderUser watcher does not trigger email sync on Gmail connect
  • There is no batch orchestration function (e.g., process_email_batch) coordinating incremental/batched sync
  • There is no periodic scheduler for ongoing sync
  • Progress is not written to Firestore in a frontend-consumable, unified way

This ADR fills those gaps, specifying exactly what must be built or refactored to realize the intended architecture. All code and process flows are to be taken verbatim from ADR-005 and email_processing.md. The one exception is the notification/progress tracking table, which is changed as described in section 4.

Gaps & Requirements (with References)

1. Enhanced ProviderUser Watcher

Reference: ADR-005, "Enhanced Provider User Watcher" section

Missing:

  • No logic to detect services.gmail activation and trigger sync
  • No creation of progress tracking state on connect

Required:

  • In src/infrastructure/database/firestore/watchers/provider_users.py, implement the on_provider_users_update function using the ADR-005 code block verbatim (add the function if it does not yet exist). This includes the logic that detects Gmail activation and triggers the initial sync.
  • On Gmail activation, create the initial progress document (status 'pending') in the notification/progress collection described in section 4.
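ADR-005's code block remains the authoritative implementation. As a rough illustration only, the activation check reduces to a before/after comparison of the ProviderUser document. This sketch assumes the document stores a services map whose gmail entry is truthy when Gmail is connected. The names gmail_activated, handle_provider_user_update, create_progress, and enqueue_initial_sync are hypothetical and are not part of ADR-005 or any Firestore API.

```python
from typing import Any, Callable, Mapping, Optional

Doc = Optional[Mapping[str, Any]]


def _gmail_enabled(doc: Doc) -> bool:
    """True if the (assumed) services.gmail entry is truthy."""
    if not doc:
        return False
    services = doc.get("services") or {}
    return bool(services.get("gmail"))


def gmail_activated(before: Doc, after: Doc) -> bool:
    """Gmail counts as newly connected when it was off (or the document
    did not exist) before the update and is on after it."""
    return not _gmail_enabled(before) and _gmail_enabled(after)


def handle_provider_user_update(
    provider_user_id: str,
    before: Doc,
    after: Doc,
    create_progress: Callable[[str], str],
    enqueue_initial_sync: Callable[[str, str], None],
) -> Optional[str]:
    """On Gmail activation, create the progress document first, then
    enqueue the initial sync. Returns the new syncId, or None."""
    if not gmail_activated(before, after):
        return None
    sync_id = create_progress(provider_user_id)  # hypothetical: writes a 'pending' doc
    enqueue_initial_sync(provider_user_id, sync_id)
    return sync_id
```

Creating the progress document before enqueueing the sync means the frontend can show a pending state even if the first batch has not started yet.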

2. Batch Orchestration Function

Reference: ADR-005, process_email_batch pseudocode; email_processing.md "Background Processing Integration"

Missing:

  • No function that coordinates batched, incremental sync (weekly, paginated)
  • No chaining of batches or progress updates

Required:

  • Implement process_email_batch using the code and process flow from ADR-005 and email_processing.md verbatim. The function should:
      • Accept provider_user_id, start_date, end_date, batch_size, page_token, etc.
      • Use EmailProcessingOrchestrator to fetch, store, summarize, tag, and embed the emails in the batch
      • Update the progress state after each batch
      • If more pages remain, schedule the next batch with the next page token
      • When the current week is exhausted, move to the previous week and repeat until the full sync period has been processed
      • On error, update the progress state with the error details
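The control flow in the list above can be sketched as follows. This is not the ADR-005 code. It is a minimal, dependency-free illustration that makes three assumptions: run_batch wraps EmailProcessingOrchestrator for one page of one week; schedule enqueues the next job (a background task or queue in production); and update_progress writes to the progress collection. BatchJob, BatchResult, oldest_date, run_batch, update_progress, and schedule are hypothetical names. Weeks are treated as half-open windows [start_date, end_date).

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Callable, Optional


@dataclass(frozen=True)
class BatchJob:
    provider_user_id: str
    start_date: date           # inclusive start of the current week window
    end_date: date             # exclusive end of the current week window
    oldest_date: date          # hypothetical: how far back the sync reaches
    batch_size: int = 100
    page_token: Optional[str] = None


@dataclass(frozen=True)
class BatchResult:
    processed: int
    next_page_token: Optional[str]


def process_email_batch(
    job: BatchJob,
    run_batch: Callable[[BatchJob], BatchResult],
    update_progress: Callable[..., None],
    schedule: Callable[[BatchJob], None],
) -> None:
    """Process one page of one week, then chain the next batch."""
    try:
        # Stand-in for EmailProcessingOrchestrator: fetch, store,
        # summarize, tag, and embed up to job.batch_size emails.
        result = run_batch(job)
    except Exception as exc:
        update_progress(job, status="failed", error=str(exc))
        return  # stop: no further batches are scheduled

    update_progress(job, status="in_progress", processed=result.processed)

    if result.next_page_token:
        # More pages in this week: same window, next page.
        schedule(replace(job, page_token=result.next_page_token))
        return

    if job.start_date <= job.oldest_date:
        update_progress(job, status="completed")
        return

    # Week exhausted: step back one week, clamped to oldest_date.
    prev_end = job.start_date
    prev_start = max(prev_end - timedelta(days=7), job.oldest_date)
    schedule(replace(job, start_date=prev_start, end_date=prev_end, page_token=None))
```

Each call handles a single page and hands the next job to schedule. As a result, a failed batch simply ends the chain, which matches the error rule in section 5.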

3. Periodic Email Sync Scheduler

Reference: ADR-005, "Periodic Email Sync" section; email_processing.md "Periodic Synchronization"

Missing:

  • No scheduler (APScheduler or otherwise) for hourly/daily sync
  • No integration with FastAPI app startup

Required:

  • Implement the scheduler as described in ADR-005, using the code block provided there. Place it in src/application/schedulers/email_sync.py (or similar).
  • Integrate scheduler startup in application/app.py as shown in ADR-005.
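Whatever the scheduler, each tick reduces to a selection step. The sketch below assumes each candidate exposes whether Gmail is enabled and when it was last synced; last_synced_at is a hypothetical field. SyncCandidate, periodic_sync_tick, has_active_sync, and start_sync are hypothetical names. The last two stand for hooks into the progress collection and process_email_batch. Only the standard library is used, so the selection logic can be unit-tested without APScheduler.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class SyncCandidate:
    provider_user_id: str
    gmail_enabled: bool
    last_synced_at: Optional[datetime]  # hypothetical field


def periodic_sync_tick(
    candidates: Iterable[SyncCandidate],
    now: datetime,
    min_interval: timedelta,
    has_active_sync: Callable[[str], bool],
    start_sync: Callable[[str], None],
) -> List[str]:
    """Start a sync for every eligible user; return the ids started."""
    started: List[str] = []
    for c in candidates:
        if not c.gmail_enabled:
            continue
        if c.last_synced_at is not None and now - c.last_synced_at < min_interval:
            continue  # synced recently enough
        if has_active_sync(c.provider_user_id):
            continue  # section 5: skip if a sync is already pending/in progress
        start_sync(c.provider_user_id)
        started.append(c.provider_user_id)
    return started
```

With APScheduler 3.x, one option is to wrap this function in a zero-argument job. You would register it with scheduler.add_job(job, "interval", hours=1) on an AsyncIOScheduler that is started during FastAPI startup and shut down on exit. The exact wiring should follow ADR-005.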

4. System-Wide Notification/Progress Tracking Table

Reference: ADR-005, "Progress Tracking" and "Data Model Extensions"; email_processing.md "Processing States"

Change:

  • Instead of using ProviderUser.email_processing_status, implement a new Firestore collection (e.g., notifications or syncProgress) for system-wide progress/notification tracking.
  • Schema (updated):
{
  'syncId': str,           # unique id for this sync operation
  'appUserId': str,        # AppUser id
  'familyId': str | None,  # Family id (None if not applicable)
  'providerUserId': str,   # ProviderUser id
  'type': 'email',         # sync type
  'status': 'pending' | 'in_progress' | 'completed' | 'failed',
  'progress': int,         # 0-100
  'error': str | None,
  'startedAt': timestamp,
  'updatedAt': timestamp,
  'details': dict,         # batch info, counts, etc.
}
  • All sync jobs (initial and periodic) write progress here, updating at each batch, on error, and on completion.
  • The frontend will subscribe to this collection for real-time notification and progress updates.
  • Security: Only the relevant app user, their family, and admins should be able to read these documents. Only backend services should write.
  • Single System: Only use the new system-wide notification/progress tracking collection. Do not update ProviderUser.email_processing_status; only one progress tracking system is maintained.
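Below is one possible Python model for the schema above. It assumes snake_case attributes mapped to the camelCase Firestore field names, plus timezone-aware datetimes, which the Firestore Python client stores as timestamps. SyncProgress, to_firestore, and can_read are hypothetical names. can_read only mirrors the intended read policy for illustration and testing. Actual enforcement for client reads belongs in Firestore Security Rules. Backend writes made through the server client libraries bypass those rules, so the rules can deny all client writes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Set

STATUSES = {"pending", "in_progress", "completed", "failed"}


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class SyncProgress:
    sync_id: str
    app_user_id: str
    provider_user_id: str
    family_id: Optional[str] = None
    sync_type: str = "email"
    status: str = "pending"
    progress: int = 0
    error: Optional[str] = None
    started_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)
    details: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Validate against the schema at construction time.
        if self.status not in STATUSES:
            raise ValueError(f"invalid status: {self.status!r}")
        if not 0 <= self.progress <= 100:
            raise ValueError(f"progress must be 0-100, got {self.progress}")

    def to_firestore(self) -> Dict[str, Any]:
        """Map attributes to the camelCase field names in the schema."""
        return {
            "syncId": self.sync_id,
            "appUserId": self.app_user_id,
            "familyId": self.family_id,
            "providerUserId": self.provider_user_id,
            "type": self.sync_type,
            "status": self.status,
            "progress": self.progress,
            "error": self.error,
            "startedAt": self.started_at,
            "updatedAt": self.updated_at,
            "details": self.details,
        }


def can_read(doc: Dict[str, Any], uid: str, family_ids: Set[str], is_admin: bool) -> bool:
    """Illustrative read policy: the owning app user, members of the
    document's family, or an admin."""
    if is_admin or doc.get("appUserId") == uid:
        return True
    family_id = doc.get("familyId")
    return family_id is not None and family_id in family_ids
```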

5. Batching, Progress Calculation, Error Handling, and Concurrency

  • Batching: Each batch should process a fixed number of emails (e.g., 100), as described in ADR-005/email_processing.md. If there are more emails, use pagination (next_page_token) and schedule the next batch.
  • Progress Calculation: Progress is the percentage of the total expected emails for the sync period that have been processed so far. If the total is unknown, use completed batches against an estimated batch count as a proxy, and refine the estimate as more information becomes available.
  • Error Handling: On any error, update the notification/progress document with status: 'failed' and include error details. If a batch fails, mark the sync as failed and stop further processing for that user until manual intervention or the next scheduled run.
  • Concurrency & Deduplication: Before starting a new sync, check whether one is already in progress for the user. If so, skip or queue the new sync. Ensure idempotency by using a unique syncId and checking its status before processing.
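The progress and deduplication rules above could look roughly like this. compute_progress and InMemorySyncRegistry are hypothetical. The registry stands in for the progress collection. A threading.Lock plays the role that a Firestore transaction would play in production, so that the in-progress check and the document creation happen atomically. Two further assumptions apply. Deduplication is keyed by providerUserId. Progress is capped at 99 until the sync is explicitly completed, because batch-based estimates can undershoot.

```python
import threading
import uuid
from typing import Dict, Optional

ACTIVE = {"pending", "in_progress"}


def compute_progress(processed: int, total_expected: Optional[int],
                     batches_done: int, estimated_batches: int) -> int:
    """Percent complete. Uses the email total when known, otherwise
    falls back to batch count as a proxy."""
    if total_expected:
        pct = processed * 100 // total_expected
    elif estimated_batches > 0:
        pct = batches_done * 100 // estimated_batches
    else:
        pct = 0
    # 100 is reserved for the explicit 'completed' update.
    return max(0, min(pct, 99))


class InMemorySyncRegistry:
    """Stand-in for the progress collection (illustration only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._docs: Dict[str, Dict[str, str]] = {}

    def try_start(self, provider_user_id: str,
                  sync_id: Optional[str] = None) -> Optional[str]:
        """Create a 'pending' sync unless it would duplicate one.
        Returns the syncId, or None if the start was skipped."""
        sync_id = sync_id or uuid.uuid4().hex
        with self._lock:  # check-then-create must be atomic
            if sync_id in self._docs:
                return None  # same syncId seen before: idempotent no-op
            for doc in self._docs.values():
                if doc["providerUserId"] == provider_user_id and doc["status"] in ACTIVE:
                    return None  # a sync is already running for this user
            self._docs[sync_id] = {"providerUserId": provider_user_id, "status": "pending"}
            return sync_id

    def finish(self, sync_id: str, status: str) -> None:
        with self._lock:
            self._docs[sync_id]["status"] = status
```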

6. Testing & Monitoring

  • Testing:
      • Unit tests for the watcher, batch orchestrator, scheduler, and progress tracking
      • Integration tests for end-to-end sync and progress updates
      • Simulate errors and race conditions, and verify progress/error reporting
  • Monitoring:
      • Add logging and monitoring for sync jobs, progress updates, and failures
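For the monitoring requirement, one low-cost option is to emit a single structured JSON log line per sync event, keyed by syncId, so that log-based metrics and alerts can filter on it. The logger name, the event names, and log_sync_event are hypothetical. Only the standard logging and json modules are used.

```python
import json
import logging
from typing import Any

logger = logging.getLogger("omnibutler.email_sync")  # hypothetical logger name


def log_sync_event(event: str, sync_id: str, provider_user_id: str, **fields: Any) -> str:
    """Emit one JSON log line for a sync event and return it.
    Failures are logged at ERROR so alerting can key on the level."""
    record = {"event": event, "syncId": sync_id, "providerUserId": provider_user_id, **fields}
    line = json.dumps(record, default=str, sort_keys=True)
    level = logging.ERROR if event == "sync_failed" else logging.INFO
    logger.log(level, line)
    return line
```

A call such as log_sync_event("batch_completed", sync_id, provider_user_id, processed=100) inside process_email_batch would produce one searchable record per batch.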

Implementation Plan

Phase 1: System-Wide Progress Tracking

  • Define the notification/progress schema and create the model and repository for the new Firestore collection

Phase 2: ProviderUser Watcher

  • Implement the watcher logic verbatim from ADR-005, including Gmail activation detection and sync trigger
  • Create initial progress state in the new notification/progress table

Phase 3: Batch Orchestration

  • Implement process_email_batch as described in ADR-005/email_processing.md
  • Integrate with watcher and scheduler
  • Update pipeline to write progress at each step, including appUserId and familyId

Phase 4: Scheduler

  • Implement APScheduler job for periodic sync as described in ADR-005
  • Integrate with app startup

Phase 5: Testing & Monitoring

  • Write unit/integration tests for watcher, batch orchestrator, scheduler, and progress tracking
  • Add logging and monitoring for sync jobs and progress

Consequences

Positive

  • Fulfills the vision of ADR-005 and email_processing.md
  • Enables real-time, detailed progress tracking for frontend notification system
  • Extensible to other sync types (transactions, etc.)
  • Robust error handling and observability
  • Single source of truth for progress tracking

Negative

  • Adds complexity (watcher, batch orchestrator, scheduler, new Firestore collection)
  • Slightly higher Firestore write/read costs
  • Requires careful coordination to avoid duplicate syncs

Mitigations

  • Use efficient Firestore queries and indexes
  • Deduplicate sync triggers by checking in-progress status
  • Monitor and alert on sync failures

References