ADR-003: Database and Data Storage Strategy¶
Status: Implemented Date: 2024-06-10 Context: The system must store data securely and efficiently, balancing performance, cost, and compliance. Different types of data have different requirements for consistency, scalability, and security.
Decision: 1. Data Classification and Storage: - Firestore is used for: - User mappings and relationships - Non-sensitive metadata - ID-to-ID relationships - Temporary or frequently changing data - Data requiring real-time updates - Data with flexible schema requirements
-
SQLAlchemy (Turso) is used for:
- Sensitive user information
- Transaction history
- Financial data
- Authentication data
- Personal identifiable information (PII)
- Data requiring strong consistency
- Data with complex relationships
-
Data Access Patterns:
- Repository pattern is enforced for all data access
- Each repository must implement a clear interface
- Direct database access is prohibited outside repositories
- Transactions are used for multi-document updates
- Optimistic concurrency control is implemented where needed
Examples: - Good:
# Repository Interface
class UserRepository(Protocol):
async def get_by_id(self, user_id: str) -> User | None:
raise NotImplementedError
async def create(self, user: User) -> bool:
raise NotImplementedError
# Firestore Implementation
class FirestoreUserRepository:
def __init__(self, db: firestore.AsyncClient):
self.db = db
self.collection = self.db.collection("appUsers")
async def get_by_id(self, user_id: str) -> User | None:
doc = await self.collection.document(user_id).get()
if not doc.exists:
return None
return User.model_validate(doc.to_dict())
# SQLAlchemy Implementation
class TursoUserRepository:
def __init__(self, session: AsyncSession):
self.session = session
async def get_by_id(self, user_id: str) -> User | None:
result = await self.session.execute(
select(UserModel).where(UserModel.id == user_id)
)
user = result.scalar_one_or_none()
return User.from_model(user) if user else None
# Good Service Implementation
class UserService:
def __init__(
self,
user_repo: UserRepository,
logger: Logger,
metrics: MetricsCollector
):
self.user_repo = user_repo
self.logger = logger
self.metrics = metrics
async def get_user(self, user_id: str) -> User | None:
with self.metrics.timer("user.get_by_id"):
try:
user = await self.user_repo.get_by_id(user_id)
self.metrics.increment(
"user.queries",
tags={"operation": "get_by_id", "result": "found" if user else "not_found"}
)
return user
except Exception as e:
self.logger.error(
"Failed to get user",
extra={"user_id": user_id, "error": str(e)}
)
raise UserServiceError("Failed to get user") from e
async def update_user(self, user_id: str, data: UserUpdate) -> bool:
with self.metrics.timer("user.update"):
try:
# Validate update data
if not data.is_valid():
raise ValidationError("Invalid update data")
# Get existing user
user = await self.user_repo.get_by_id(user_id)
if not user:
raise UserNotFoundError(f"User {user_id} not found")
# Apply updates
updated_user = user.apply_update(data)
# Save changes
success = await self.user_repo.update(updated_user)
self.metrics.increment(
"user.updates",
tags={"operation": "update", "result": "success" if success else "failed"}
)
return success
except Exception as e:
self.logger.error(
"Failed to update user",
extra={"user_id": user_id, "error": str(e)}
)
raise UserServiceError("Failed to update user") from e
- Bad:
# Direct database access class UserService: def __init__(self, db): self.db = db def get_user(self, user_id: str): # Directly queries database without abstraction doc = self.db.collection("appUsers").document(user_id).get() return doc.to_dict() def update_user(self, user_id: str, data: dict): # No transaction handling # No validation self.db.collection("appUsers").document(user_id).update(data)
Consequences: - Positive: - Sensitive data is protected and compliant - Each database is used for its strengths - Code is decoupled from specific implementations - Data access is consistent and maintainable - Testing is easier with clear interfaces
- Negative (if violated):
- Security risks from improper data storage
- Performance issues from using wrong database
- Maintenance difficulties from mixed access patterns
- Testing complexity from direct database access
- Compliance issues from improper data handling
References:
- See docs/dev_practices.md (Firestore Usage, Data Access Patterns, Data Consistency)