Skip to content

ADR-003: Database and Data Storage Strategy

Status: Implemented Date: 2024-06-10 Context: The system must store data securely and efficiently, balancing performance, cost, and compliance. Different types of data have different requirements for consistency, scalability, and security.

Decision: 1. Data Classification and Storage: - Firestore is used for: - User mappings and relationships - Non-sensitive metadata - ID-to-ID relationships - Temporary or frequently changing data - Data requiring real-time updates - Data with flexible schema requirements

  • SQLAlchemy (Turso) is used for:

    • Sensitive user information
    • Transaction history
    • Financial data
    • Authentication data
    • Personal identifiable information (PII)
    • Data requiring strong consistency
    • Data with complex relationships
  • Data Access Patterns:

  • Repository pattern is enforced for all data access
  • Each repository must implement a clear interface
  • Direct database access is prohibited outside repositories
  • Transactions are used for multi-document updates
  • Optimistic concurrency control is implemented where needed

Examples: - Good:

# Repository Interface
class UserRepository(Protocol):
    async def get_by_id(self, user_id: str) -> User | None:
        raise NotImplementedError

    async def create(self, user: User) -> bool:
        raise NotImplementedError

# Firestore Implementation
class FirestoreUserRepository:
    def __init__(self, db: firestore.AsyncClient):
        self.db = db
        self.collection = self.db.collection("appUsers")

    async def get_by_id(self, user_id: str) -> User | None:
        doc = await self.collection.document(user_id).get()
        if not doc.exists:
            return None
        return User.model_validate(doc.to_dict())

# SQLAlchemy Implementation
class TursoUserRepository:
    def __init__(self, session: AsyncSession):
        self.session = session

    async def get_by_id(self, user_id: str) -> User | None:
        result = await self.session.execute(
            select(UserModel).where(UserModel.id == user_id)
        )
        user = result.scalar_one_or_none()
        return User.from_model(user) if user else None

# Good Service Implementation
class UserService:
    def __init__(
        self,
        user_repo: UserRepository,
        logger: Logger,
        metrics: MetricsCollector
    ):
        self.user_repo = user_repo
        self.logger = logger
        self.metrics = metrics

    async def get_user(self, user_id: str) -> User | None:
        with self.metrics.timer("user.get_by_id"):
            try:
                user = await self.user_repo.get_by_id(user_id)
                self.metrics.increment(
                    "user.queries",
                    tags={"operation": "get_by_id", "result": "found" if user else "not_found"}
                )
                return user
            except Exception as e:
                self.logger.error(
                    "Failed to get user",
                    extra={"user_id": user_id, "error": str(e)}
                )
                raise UserServiceError("Failed to get user") from e

    async def update_user(self, user_id: str, data: UserUpdate) -> bool:
        with self.metrics.timer("user.update"):
            try:
                # Validate update data
                if not data.is_valid():
                    raise ValidationError("Invalid update data")

                # Get existing user
                user = await self.user_repo.get_by_id(user_id)
                if not user:
                    raise UserNotFoundError(f"User {user_id} not found")

                # Apply updates
                updated_user = user.apply_update(data)

                # Save changes
                success = await self.user_repo.update(updated_user)

                self.metrics.increment(
                    "user.updates",
                    tags={"operation": "update", "result": "success" if success else "failed"}
                )

                return success
            except Exception as e:
                self.logger.error(
                    "Failed to update user",
                    extra={"user_id": user_id, "error": str(e)}
                )
                raise UserServiceError("Failed to update user") from e

  • Bad:
    # Direct database access
    class UserService:
        def __init__(self, db):
            self.db = db
    
        def get_user(self, user_id: str):
            # Directly queries database without abstraction
            doc = self.db.collection("appUsers").document(user_id).get()
            return doc.to_dict()
    
        def update_user(self, user_id: str, data: dict):
            # No transaction handling
            # No validation
            self.db.collection("appUsers").document(user_id).update(data)
    

Consequences: - Positive: - Sensitive data is protected and compliant - Each database is used for its strengths - Code is decoupled from specific implementations - Data access is consistent and maintainable - Testing is easier with clear interfaces

  • Negative (if violated):
  • Security risks from improper data storage
  • Performance issues from using wrong database
  • Maintenance difficulties from mixed access patterns
  • Testing complexity from direct database access
  • Compliance issues from improper data handling

References: - See docs/dev_practices.md (Firestore Usage, Data Access Patterns, Data Consistency)