ADR-004: Microservices Architecture¶
Status: ✅ Accepted
Date: 2024-03-15 (Q1 2024)
Decision Makers: CTO, Solution Architect, DevOps Lead
Consulted: Backend Team, Frontend Lead
Informed: Engineering Team, Product Team
Context¶
MachineAvatars started as a proof-of-concept monolithic application. As we prepared for production, we needed to decide on the architectural pattern for the backend.
PoC Limitations:
- Single Python FastAPI application (~15K lines of code)
- All functionality in one codebase
- Difficult to deploy changes (risk of breaking everything)
- Can't scale individual features independently
- Hard to assign ownership to teams
Production Requirements:
- Support 3 chatbot types (3D, Text, Voice) with different resource needs
- Independent scaling (voice chatbot uses 10x more CPU than text)
- Multiple teams working simultaneously without conflicts
- Fault isolation (one feature failure shouldn't crash entire system)
- Technology flexibility (e.g., use Node.js for real-time features)
Team Structure:
- 5 backend engineers
- 2 frontend engineers
- 1 DevOps engineer
Decision¶
We adopted a microservices architecture with 20+ independent services.
Service Breakdown¶
machineagents-backend/
├── gateway-service (Port 8000) - API Gateway & Routing
├── auth-service (Port 8001) - Authentication & JWT
├── user-service (Port 8002) - User Management
├── create-chatbot-service (Port 8003) - Chatbot Creation
├── selection-chatbot-service (Port 8004) - Chatbot Selection
├── data-crawling-service (Port 8005) - Website/PDF Processing
├── state-3d-chatbot-service (Port 8006) - 3D Chatbot State
├── state-text-chatbot-service (Port 8007) - Text Chatbot State
├── state-voice-chatbot-service (Port 8008) - Voice Chatbot State
├── system-prompt-service (Port 8009) - Prompt Management
├── chatbot-maintenance-service (Port 8010) - Chatbot Maintenance
├── response-3d-chatbot-service (Port 8011) - 3D Responses (CORE)
├── response-text-chatbot-service (Port 8012) - Text Responses
├── response-voice-chatbot-service (Port 8013) - Voice Responses
├── chat-history-service (Port 8014) - Conversation Storage
├── client-data-collection-service (Port 8015) - Analytics
├── llm-model-service (Port 8016) - LLM Abstraction
├── homepage-chatbot-service (Port 8017) - Homepage Chatbot
├── remote-physio-service (Port 8018) - Physiotherapy Assistant
└── payment-service (Port 8019) - Razorpay Integration
Service Communication: REST APIs (HTTP/JSON)
Alternatives Considered¶
Alternative 1: Monolithic Architecture¶
Evaluated: Single FastAPI application with modular code organization
Pros:
- ✅ Simplest to develop initially
- ✅ Easy to debug (one codebase)
- ✅ No network overhead between components
- ✅ Simpler deployment (one artifact)
- ✅ Easier testing (integration tests straightforward)
Cons:
- ❌ Can't scale independently: Text chatbot needs 1 instance, Voice needs 10
- ❌ Deployment risk: Small change requires full redeploy
- ❌ Blast radius: One bug can crash entire system
- ❌ Team coordination: Merge conflicts, deployment conflicts
- ❌ Technology lock-in: Everything must be Python
Scaling Challenge:
Monolith:
- Peak usage: 100 voice requests/sec (high CPU)
- Also handling: 500 text requests/sec (low CPU)
- Must scale to 10 instances for voice
- Result: 90% wasted capacity for text processing
Why Rejected: Poor resource utilization, high deployment risk
Alternative 2: Modular Monolith¶
Evaluated: Single codebase with clear module boundaries, separate deployables
Pros:
- ✅ Module boundaries enforce separation
- ✅ Can extract to microservices later
- ✅ Shared code easy (internal imports)
- ✅ Simpler than full microservices
Cons:
- ❌ Still monolithic deployment: Can't independently deploy modules
- ❌ Database coupling: Modules share same DB (coordination needed)
- ❌ Halfway measure: Complexity without full benefits
Why Rejected: Doesn't solve independent scaling problem
Alternative 3: Serverless Functions (Azure Functions)¶
Evaluated: Each feature as Azure Function
Pros:
- ✅ Auto-scaling (pay per invocation)
- ✅ Zero infrastructure management
- ✅ Natural isolation
- ✅ Cost-effective at low scale
Cons:
- ❌ Cold starts: 1-3 second delay (unacceptable for chatbots)
- ❌ Execution limits: 10 minute max (blocks long conversations)
- ❌ Stateless: Complex state management needed
- ❌ Vendor lock-in: Azure-specific
- ❌ Limited control: Can't optimize infrastructure
Benchmark:
- Serverless: 80th percentile = 1.2s latency (cold start), 10x slower ❌
- Container: 80th percentile = 120ms latency ✅
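A minimal sketch of how such a percentile comparison can be reproduced; the endpoint and sample count here are assumptions, not our actual benchmark harness:

import time

import httpx

def p80_latency_ms(url: str, samples: int = 50) -> float:
    """Return the 80th-percentile request latency in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        httpx.get(url, timeout=10.0)
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return timings[int(0.8 * len(timings)) - 1]

# Example (hypothetical health endpoint):
# print(p80_latency_ms("http://response-text-chatbot-service:8012/health"))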
Why Rejected: Cold starts unacceptable for user-facing chatbot
Alternative 4: Hybrid (Monolith + Key Microservices)¶
Evaluated: Core features in monolith, specialized services extracted
Pros:
- ✅ Simpler than full microservices
- ✅ Extract only what needs independent scaling
- ✅ Gradual migration path
Cons:
- ❌ Unclear boundaries: When to extract vs. keep in monolith?
- ❌ Two deployment models: Confusing for team
- ❌ Technical debt: Temptation to keep adding to monolith
Why Rejected: We want clean separation from the start
Decision Rationale¶
Why Microservices?¶
1. Independent Scaling (Resource Optimization)
# Different resource needs
response-voice-chatbot-service:
  cpu: 2 cores      # TTS/STT processing
  memory: 4GB
  replicas: 10      # High demand

response-text-chatbot-service:
  cpu: 0.5 cores
  memory: 1GB
  replicas: 2       # Lower demand

llm-model-service:
  cpu: 1 core
  memory: 2GB
  replicas: 5       # Shared by all
Cost Savings: 60% vs. scaling monolith for peak
2. Fault Isolation
graph LR
    A[User] --> B[Gateway]
    B --> C[Response 3D]
    B --> D[Response Text]
    B --> E[Response Voice]
    C --> F{Service Fails}
    F -->|3D Down| G[Text/Voice Still Work]
    style C fill:#F44336
    style D fill:#4CAF50
    style E fill:#4CAF50
Blast Radius: Service failure affects only that feature, not entire system
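A sketch of how the gateway can confine such a failure. The proxy handler, route map, and endpoint paths below are illustrative assumptions, not the actual gateway code:

import httpx
from fastapi import FastAPI, HTTPException

app = FastAPI()

# Hypothetical route map; service names and ports follow the breakdown above.
RESPONSE_SERVICES = {
    "3d": "http://response-3d-chatbot-service:8011",
    "text": "http://response-text-chatbot-service:8012",
    "voice": "http://response-voice-chatbot-service:8013",
}

@app.post("/api/{mode}-chat")
async def proxy_chat(mode: str, payload: dict):
    base_url = RESPONSE_SERVICES.get(mode)
    if base_url is None:
        raise HTTPException(status_code=404, detail="Unknown chatbot mode")
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            resp = await client.post(f"{base_url}/api/{mode}-chat", json=payload)
        return resp.json()
    except httpx.HTTPError:
        # Only this mode degrades; text/voice keep working if 3D is down.
        raise HTTPException(status_code=503, detail=f"{mode} chatbot unavailable")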
3. Team Ownership & Velocity
| Service | Owner | Deploy Frequency |
|---|---|---|
| response-3d-chatbot | Engineer 1 | 3x/week |
| response-text-chatbot | Engineer 2 | 2x/week |
| llm-model-service | ML Engineer | Monthly |
| payment-service | Engineer 3 | Rarely |
No coordination needed: Teams deploy independently
4. Technology Flexibility
Most services: Python FastAPI (team expertise)
Future real-time service: Node.js (WebSocket)
Future analytics: Go (performance)
Not locked into single stack
5. Clear API Contracts
# response-3d-chatbot-service API
POST /api/3d-chat

Request:
{
  "user_id": "...",
  "project_id": "...",
  "message": "...",
  "session_id": "..."
}

Response:
{
  "answer": "...",
  "visemes": [...],
  "audio_url": "..."
}
Well-defined interfaces enable parallel development
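That contract can be expressed as Pydantic models. The field names come from the JSON above; the types are assumptions:

from pydantic import BaseModel

class ChatRequest(BaseModel):
    user_id: str
    project_id: str
    message: str
    session_id: str

class ChatResponse(BaseModel):
    answer: str
    visemes: list[dict]   # viseme payload shape is an assumption
    audio_url: str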
Consequences¶
Positive Consequences¶
✅ Independent Scaling: 60% cost savings vs. monolith
✅ Fault Isolation: Service failures don't cascade
✅ Fast Deployment: 2-5 minutes per service (vs. 15 min monolith)
✅ Team Velocity: 3x more deploys/week (parallel work)
✅ Clear Ownership: Each service has dedicated owner
✅ Technology Choice: Can use best tool for job
✅ Easier Testing: Test services in isolation
Negative Consequences¶
❌ Operational Complexity: 20+ services to monitor
❌ Network Overhead: Inter-service calls add latency (5-15ms each)
❌ Distributed Debugging: Harder to trace requests across services
❌ Data Consistency: No distributed transactions
❌ DevOps Burden: 20 CI/CD pipelines, 20 Docker images
Mitigation Strategies¶
For Complexity:
- Centralized logging (Loki + DataDog)
- Distributed tracing (request IDs; see the sketch after this list)
- Service mesh (future: Istio)
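A minimal sketch of the request-ID approach, assuming an X-Request-ID header convention (the header name is our choice of illustration):

import uuid

from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def propagate_request_id(request: Request, call_next):
    # Reuse the caller's ID so one ID follows the request across services.
    request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))
    request.state.request_id = request_id
    response = await call_next(request)
    response.headers["X-Request-ID"] = request_id
    return response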
For Data Consistency:
- Event-driven architecture (planned)
- Eventual consistency (acceptable for our use cases)
- Compensating transactions where needed (see the sketch below)
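As an illustration of a compensating transaction (the endpoints and payloads here are assumptions): if activation fails after a successful payment, the payment is refunded rather than rolled back in a distributed transaction:

import httpx

async def activate_paid_chatbot(payment_id: str, chatbot_id: str):
    async with httpx.AsyncClient(timeout=10.0) as client:
        try:
            resp = await client.post(
                "http://chatbot-maintenance-service:8010/activate",
                json={"chatbot_id": chatbot_id},
            )
            resp.raise_for_status()
        except httpx.HTTPError:
            # Compensating action: undo the payment instead of a 2PC rollback.
            await client.post(
                "http://payment-service:8019/refunds",
                json={"payment_id": payment_id},
            )
            raise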
Implementation Details¶
Service Template¶
Every service follows standard pattern:
service-name/
├── src/
│ ├── main.py # FastAPI app entry point
│ ├── logger.py # Standardized logging
│ ├── models.py # Pydantic models
│ └── utils/
├── Dockerfile # Multi-stage build
├── requirements.txt # Dependencies
├── .env.example # Environment template
└── README.md # Service documentation
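A minimal sketch of what src/main.py looks like under this template; the health route is an assumption, and each real service registers its own routers:

# src/main.py -- minimal service skeleton
import logging

from fastapi import FastAPI

logger = logging.getLogger("service-name")
app = FastAPI(title="service-name")

@app.get("/health")
async def health() -> dict:
    # Used by the gateway and container platform for liveness checks.
    return {"status": "ok"}

# Real services register their routers here, e.g.:
# app.include_router(chat_router, prefix="/api")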
Docker Compose (Development)¶
# docker-compose.yml
services:
  gateway-service:
    build: ./gateway-service
    ports:
      - "8000:8000"
    environment:
      - MONGO_URI=${MONGO_URI}
    depends_on:
      - auth-service

  auth-service:
    build: ./auth-service
    ports:
      - "8001:8001"

  # ... 18 more services
Production Deployment (Azure Container Apps)¶
# Each service deployed independently
az containerapp create \
  --name response-3d-chatbot-service \
  --resource-group machineagents-rg \
  --image machineagents.azurecr.io/response-3d-chatbot:latest \
  --target-port 8011 \
  --min-replicas 2 \
  --max-replicas 10 \
  --cpu 2 --memory 4Gi
Service Communication Patterns¶
Synchronous (REST)¶
import httpx

# Note: httpx's module-level get/post are synchronous; async code uses AsyncClient.

# Gateway → Auth Service (every request)
async def verify_token(token: str):
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "http://auth-service:8001/verify",
            headers={"Authorization": f"Bearer {token}"},
        )
    return response.json()

# Response 3D → LLM Service (for AI responses)
async def call_llm(messages):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://llm-model-service:8016/call-model/openai-35",
            json={"messages": messages},
        )
    return response.json()
Asynchronous (Planned for Q2 2025)¶
Event Bus (Azure Service Bus):
- chatbot.created → analytics, email notifications
- payment.succeeded → user-service, chatbot activation
- document.uploaded → data-crawling, embedding generation
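Since this is planned rather than built, the following is only a sketch of publishing one such event with the azure-servicebus SDK; the topic name and payload are assumptions:

import json
import os

from azure.servicebus import ServiceBusClient, ServiceBusMessage

def publish_event(event_type: str, payload: dict) -> None:
    conn_str = os.environ["SERVICE_BUS_CONNECTION_STRING"]
    with ServiceBusClient.from_connection_string(conn_str) as client:
        # Hypothetical topic; subscribers filter on the message subject.
        with client.get_topic_sender(topic_name="machineagents-events") as sender:
            message = ServiceBusMessage(json.dumps(payload), subject=event_type)
            sender.send_messages(message)

# publish_event("chatbot.created", {"chatbot_id": "...", "owner": "..."})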
Performance Impact¶
Network Latency¶
| Request Flow | Latency | Hops |
|---|---|---|
| User → Gateway → Auth → Response Service → LLM | 150ms | 4 |
| Monolith (hypothetical) | 120ms | 1 |
| Overhead | +30ms | +3 |
Acceptable: 30ms overhead < 200ms total budget
Compliance & Security¶
Service-to-Service Auth:
- Azure Managed Identity (production)
- No hardcoded credentials
- Network isolation (VNet)
Data Flow:
- All inter-service calls encrypted (TLS)
- No direct DB access from frontend
- Gateway enforces authentication
Migration Path¶
Scenario 1: Consolidate Services¶
If operational burden too high:
- Merge related services (e.g., 3 state services → 1)
- Reduce 20 services → 8-10
- Keep core separation (auth, payment, chatbot responses)
- Estimated time: 6 weeks
Scenario 2: Move to Monolith¶
If microservices prove wrong:
- Merge all services into single FastAPI app
- Keep module boundaries
- Single deployment artifact
- Estimated time: 12 weeks (significant refactor)
Review Schedule¶
Next Review: 2025-06-30 (15 months after implementation)
Review Criteria:
- Deployment frequency still high (> 10/week total)
- Operational burden manageable (< 20 hours/week DevOps)
- Team satisfied with independence (> 4.0/5)
- Uptime > 99.5% across all services
Related ADRs¶
- ADR-001: Multi-Provider LLM Strategy - Enabled by service isolation
- ADR-005: Next.js Frontend - API Gateway patterns
- ADR-006: Azure Cloud Provider (planned) - Container infrastructure
Evidence & Metrics¶
Development Velocity (6 months post-migration):
- Deployments/week: 45 (vs. 8 with monolith)
- Mean time to deploy: 4 minutes (vs. 18 minutes)
- Incidents caused by deployments: 2 (vs. 12 with monolith)
Production Metrics:
- Uptime (individual services): 99.2%-99.9%
- Overall system uptime: 99.7%
- Latency impact: +25ms average (acceptable)
Last Updated: 2025-12-26
Review Date: 2025-06-30
Status: Active and successful
"Microservices: complexity traded for flexibility."