ADR-006: Completing Automated Email Sync & Progress Tracking
Status
Implemented
Context
ADR-005 (005-email-integration-system.md) and the Email Processing Use Case define the intended architecture for automated email sync in OmniButler:
- Automatic sync on Gmail connect (via ProviderUser watcher)
- Periodic background sync (scheduler)
- Batched, incremental processing (weekly, paginated)
- Real-time progress tracking for user feedback/notifications
However, several critical features described in ADR-005 and the use case doc are not implemented in the codebase:
- The ProviderUser watcher does not trigger email sync on Gmail connect
- There is no batch orchestration function (e.g., process_email_batch) coordinating incremental/batched sync
- There is no periodic scheduler for ongoing sync
- Progress is not written to Firestore in a frontend-consumable, unified way
This ADR fills those gaps, specifying exactly what must be built or refactored to realize the intended architecture. All code/processes are to be taken verbatim from ADR-005 and email_processing.md, except for the notification/progress tracking table, which is updated as described below.
Gaps & Requirements (with References)
1. Enhanced ProviderUser Watcher
Reference: ADR-005, "Enhanced Provider User Watcher" section
Missing:
- No logic to detect services.gmail activation and trigger sync
- No creation of progress tracking state on connect
Required:
- In src/infrastructure/database/firestore/watchers/provider_users.py, implement the on_provider_users_update function as shown in ADR-005, including the logic for detecting Gmail activation and triggering the initial sync. Use the code block from ADR-005 verbatim.
- If not present, copy the function from ADR-005 and implement it as specified.
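The authoritative watcher code lives in ADR-005 and is not reproduced here. As a rough illustration of the activation check alone, the sketch below compares the before and after states of a ProviderUser document. The helper names and the assumption that services.gmail is simply truthy when Gmail is connected are hypothetical; the real field shape may differ.

```python
from typing import Any, Mapping, Optional


def _gmail_enabled(doc: Optional[Mapping[str, Any]]) -> bool:
    """Return True when the document has a truthy services.gmail entry.

    Assumption: services is a map and services.gmail is truthy when connected.
    """
    if not doc:
        return False
    services = doc.get("services") or {}
    return bool(services.get("gmail"))


def gmail_just_activated(
    before: Optional[Mapping[str, Any]],
    after: Optional[Mapping[str, Any]],
) -> bool:
    """True only on the transition from 'Gmail not connected' to 'connected'."""
    return _gmail_enabled(after) and not _gmail_enabled(before)
```

Checking the transition, rather than only the new state, keeps unrelated updates to an already connected ProviderUser from re-triggering the initial sync.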
2. Batch Orchestration Function
Reference: ADR-005, process_email_batch pseudocode; email_processing.md "Background Processing Integration"
Missing:
- No function that coordinates batched, incremental sync (weekly, paginated)
- No chaining of batches or progress updates
Required:
- Implement process_email_batch as described in ADR-005 and email_processing.md, using the code and process flow provided there. This function should:
  - Accept provider_user_id, start_date, end_date, batch_size, page_token, etc.
  - Use EmailProcessingOrchestrator to fetch, store, summarize, tag, embed emails for the batch
  - Update progress state after each batch
  - If more pages, schedule next batch (with next page token)
  - If batch complete, move to previous week and repeat until done
  - On error, update progress state with error info
- Use the code and logic from ADR-005/email_processing.md verbatim.
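The chaining rules listed above (next page first, then the previous week, then stop) can be illustrated with a small pure function. This is only a sketch of the control flow, not the process_email_batch implementation from ADR-005/email_processing.md. The BatchCursor type, the plan_next_batch name, the oldest_date cutoff, and the half-open [start_date, end_date) window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class BatchCursor:
    """Hypothetical description of one batch: a date window plus a page token."""
    start_date: date
    end_date: date
    page_token: Optional[str] = None


def plan_next_batch(
    current: BatchCursor,
    next_page_token: Optional[str],
    oldest_date: date,
) -> Optional[BatchCursor]:
    """Return the next batch to schedule, or None when the sync is finished."""
    if next_page_token:
        # More pages remain in the current window: keep the window, advance the page.
        return BatchCursor(current.start_date, current.end_date, next_page_token)
    if current.start_date <= oldest_date:
        # The current window already reaches the cutoff: nothing left to sync.
        return None
    # Window exhausted: step back one week, clamped to the cutoff.
    new_end = current.start_date
    new_start = max(oldest_date, new_end - timedelta(days=7))
    return BatchCursor(new_start, new_end)
```

Keeping this decision separate from the fetching and storing work makes the week-by-week walk easy to unit test without touching Gmail or Firestore.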
3. Periodic Email Sync Scheduler
Reference: ADR-005, "Periodic Email Sync" section; email_processing.md "Periodic Synchronization"
Missing:
- No scheduler (APScheduler or otherwise) for hourly/daily sync
- No integration with FastAPI app startup
Required:
- Implement the scheduler as described in ADR-005, using the code block provided there. Place it in src/application/schedulers/email_sync.py (or similar).
- Integrate scheduler startup in application/app.py as shown in ADR-005.
4. System-Wide Notification/Progress Tracking Table
Reference: ADR-005, "Progress Tracking" and "Data Model Extensions"; email_processing.md "Processing States"
Change:
- Instead of using ProviderUser.email_processing_status, implement a new Firestore collection (e.g., notifications or syncProgress) for system-wide progress/notification tracking.
- Schema (updated):
{
'syncId': str, # unique id for this sync operation
'appUserId': str, # AppUser id
'familyId': str, # Family id (if applicable)
'providerUserId': str, # ProviderUser id
'type': 'email', # sync type
'status': 'pending' | 'in_progress' | 'completed' | 'failed',
'progress': int, # 0-100
'error': str | None,
'startedAt': timestamp,
'updatedAt': timestamp,
'details': dict, # batch info, counts, etc.
}
- All sync jobs (initial and periodic) write progress here, updating at each batch, on error, and on completion.
- The frontend will subscribe to this collection for real-time notification and progress updates.
- Security: Only the relevant app user, their family, and admins should be able to read these documents. Only backend services should write.
- Single System: Only use the new system-wide notification/progress tracking collection. Do not update ProviderUser.email_processing_status - we will maintain only one progress tracking system.
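For readers implementing the repository, the schema above could be modelled roughly as follows. This is a sketch rather than the project's actual model: the SyncProgress class name, the to_document helper, treating familyId as optional, and the use of timezone-aware UTC datetimes for the timestamp fields are assumptions. Only the document field names come from the schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Literal, Optional

SyncStatus = Literal["pending", "in_progress", "completed", "failed"]


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class SyncProgress:
    """Hypothetical in-memory model of one progress/notification document."""
    sync_id: str
    app_user_id: str
    provider_user_id: str
    family_id: Optional[str] = None  # assumption: absent when no family applies
    type: str = "email"
    status: SyncStatus = "pending"
    progress: int = 0
    error: Optional[str] = None
    started_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)
    details: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The schema defines progress as an integer percentage, 0-100.
        if not 0 <= self.progress <= 100:
            raise ValueError("progress must be between 0 and 100")

    def to_document(self) -> Dict[str, Any]:
        """Serialize to the camelCase keys used in the schema."""
        return {
            "syncId": self.sync_id,
            "appUserId": self.app_user_id,
            "familyId": self.family_id,
            "providerUserId": self.provider_user_id,
            "type": self.type,
            "status": self.status,
            "progress": self.progress,
            "error": self.error,
            "startedAt": self.started_at,
            "updatedAt": self.updated_at,
            "details": self.details,
        }
```

Carrying appUserId and familyId on every document is what would let read access be scoped to the relevant user, their family, and admins.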
5. Batching, Progress Calculation, Error Handling, and Concurrency
- Batching: Each batch should process a fixed number of emails (e.g., 100), as described in ADR-005/email_processing.md. If there are more emails, use pagination (next_page_token) and schedule the next batch.
- Progress Calculation: Progress should be calculated as a percentage of total expected emails for the sync period. If the total is unknown, use batch count as a proxy and update as more info becomes available.
- Error Handling: On any error, update the notification/progress document with status: 'failed' and include error details. If a batch fails, mark the sync as failed and stop further processing for that user until manual intervention or the next scheduled run.
- Concurrency & Deduplication: Before starting a new sync, check if one is already in progress for the user. If so, skip or queue the new sync. Ensure idempotency by using unique syncId and checking status before processing.
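The progress and deduplication rules above might look like the following in code. This is a minimal sketch under stated assumptions: the function names, the estimated_batches parameter used as the batch-count proxy, and the choice to treat both 'pending' and 'in_progress' as blocking a new sync are hypothetical, not taken from ADR-005.

```python
from typing import Iterable, Optional

# Assumption: a sync that is pending or in progress blocks a new one.
ACTIVE_STATUSES = {"pending", "in_progress"}


def compute_progress(
    processed: int,
    total: Optional[int],
    batches_done: int = 0,
    estimated_batches: Optional[int] = None,
) -> int:
    """Return progress as an integer percentage in the range 0-100."""
    if total:
        # Preferred: share of the total expected emails for the sync period.
        pct = processed * 100 // total
    elif estimated_batches:
        # Fallback when the total is unknown: use batch count as a proxy.
        pct = batches_done * 100 // estimated_batches
    else:
        # Nothing to measure against yet.
        return 0
    return max(0, min(100, pct))


def should_start_sync(existing_statuses: Iterable[str]) -> bool:
    """Skip a new sync when another one for the same user is still active."""
    return not any(status in ACTIVE_STATUSES for status in existing_statuses)
```

For example, compute_progress(50, 200) yields 25, and should_start_sync(["in_progress"]) yields False, so the caller would skip or queue the new sync.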
6. Testing & Monitoring
- Testing:
  - Unit tests for watcher, batch orchestrator, scheduler, and progress tracking
  - Integration tests for end-to-end sync and progress updates
  - Simulate errors and race conditions, and verify progress/error reporting
- Monitoring:
  - Add logging and monitoring for sync jobs, progress updates, and failures
Implementation Plan
Phase 1: System-Wide Progress Tracking
- Implement the new Firestore notification/progress schema and repository
- Create models and repository for the new collection
Phase 2: ProviderUser Watcher
- Implement the watcher logic verbatim from ADR-005, including Gmail activation detection and sync trigger
- Create initial progress state in the new notification/progress table
Phase 3: Batch Orchestration
- Implement process_email_batch as described in ADR-005/email_processing.md
- Integrate with watcher and scheduler
- Update pipeline to write progress at each step, including appUserId and familyId
Phase 4: Scheduler
- Implement APScheduler job for periodic sync as described in ADR-005
- Integrate with app startup
Phase 5: Testing & Monitoring
- Write unit/integration tests for watcher, batch orchestrator, scheduler, and progress tracking
- Add logging and monitoring for sync jobs and progress
Consequences
Positive
- Fulfills the vision of ADR-005 and email_processing.md
- Enables real-time, detailed progress tracking for frontend notification system
- Extensible to other sync types (transactions, etc.)
- Robust error handling and observability
- Single source of truth for progress tracking
Negative
- Adds complexity (watcher, batch orchestrator, scheduler, new Firestore collection)
- Slightly higher Firestore write/read costs
- Requires careful coordination to avoid duplicate syncs
Mitigations
- Use efficient Firestore queries and indexes
- Deduplicate sync triggers by checking in-progress status
- Monitor and alert on sync failures