ADR-004: Microservices Architecture

Status: ✅ Accepted
Date: 2024-03-15 (Q1 2024)
Decision Makers: CTO, Solution Architect, DevOps Lead
Consulted: Backend Team, Frontend Lead
Informed: Engineering Team, Product Team


Context

MachineAvatars started as a proof-of-concept monolithic application. As we prepared for production, we needed to decide on the architectural pattern for the backend.

PoC Limitations:

  • Single Python FastAPI application (~15K lines of code)
  • All functionality in one codebase
  • Difficult to deploy changes (risk of breaking everything)
  • Can't scale individual features independently
  • Hard to assign ownership to teams

Production Requirements:

  • Support 3 chatbot types (3D, Text, Voice) with different resource needs
  • Independent scaling (voice chatbot uses 10x more CPU than text)
  • Multiple teams working simultaneously without conflicts
  • Fault isolation (one feature failure shouldn't crash entire system)
  • Technology flexibility (e.g., use Node.js for real-time features)

Team Structure:

  • 5 backend engineers
  • 2 frontend engineers
  • 1 DevOps engineer

Decision

We adopted a microservices architecture with 20+ independent services.

Service Breakdown

machineagents-backend/
├── gateway-service                (Port 8000) - API Gateway & Routing
├── auth-service                   (Port 8001) - Authentication & JWT
├── user-service                   (Port 8002) - User Management
├── create-chatbot-service         (Port 8003) - Chatbot Creation
├── selection-chatbot-service      (Port 8004) - Chatbot Selection
├── data-crawling-service          (Port 8005) - Website/PDF Processing
├── state-3d-chatbot-service       (Port 8006) - 3D Chatbot State
├── state-text-chatbot-service     (Port 8007) - Text Chatbot State
├── state-voice-chatbot-service    (Port 8008) - Voice Chatbot State
├── system-prompt-service          (Port 8009) - Prompt Management
├── chatbot-maintenance-service    (Port 8010) - Chatbot Maintenance
├── response-3d-chatbot-service    (Port 8011) - 3D Responses (CORE)
├── response-text-chatbot-service  (Port 8012) - Text Responses
├── response-voice-chatbot-service (Port 8013) - Voice Responses
├── chat-history-service           (Port 8014) - Conversation Storage
├── client-data-collection-service (Port 8015) - Analytics
├── llm-model-service              (Port 8016) - LLM Abstraction
├── homepage-chatbot-service       (Port 8017) - Homepage Chatbot
├── remote-physio-service          (Port 8018) - PhysioTherapy Assistant
└── payment-service                (Port 8019) - Razorpay Integration

Service Communication: REST APIs (HTTP/JSON)
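
For illustration, a minimal sketch of path-based routing at the gateway using FastAPI and httpx; the route prefixes and proxy logic are assumptions, not the production gateway code:

# Hypothetical gateway routing sketch; prefixes are illustrative
import httpx
from fastapi import FastAPI, Request, Response

app = FastAPI()

ROUTES = {
    "/auth": "http://auth-service:8001",
    "/3d-chat": "http://response-3d-chatbot-service:8011",
}

@app.api_route("/{path:path}", methods=["GET", "POST"])
async def proxy(path: str, request: Request) -> Response:
    full_path = f"/{path}"
    target = next((base for prefix, base in ROUTES.items()
                   if full_path.startswith(prefix)), None)
    if target is None:
        return Response(status_code=404)
    async with httpx.AsyncClient() as client:
        upstream = await client.request(
            request.method, f"{target}{full_path}",
            content=await request.body(),
            headers={"Authorization": request.headers.get("Authorization", "")},
        )
    return Response(content=upstream.content, status_code=upstream.status_code)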


Alternatives Considered

Alternative 1: Monolithic Architecture

Evaluated: Single FastAPI application with modular code organization

Pros:

  • ✅ Simplest to develop initially
  • ✅ Easy to debug (one codebase)
  • ✅ No network overhead between components
  • ✅ Simpler deployment (one artifact)
  • ✅ Easier testing (integration tests straightforward)

Cons:

  • Can't scale independently: Text chatbot needs 1 instance, Voice needs 10
  • Deployment risk: Small change requires full redeploy
  • Blast radius: One bug can crash entire system
  • Team coordination: Merge conflicts, deployment conflicts
  • Technology lock-in: Everything must be Python

Scaling Challenge:

Monolith:
- Peak usage: 100 voice requests/sec (high CPU)
- Also handling: 500 text requests/sec (low CPU)
- Must scale to 10 instances for voice
- Result: 90% wasted capacity for text processing

Why Rejected: Poor resource utilization, high deployment risk


Alternative 2: Modular Monolith

Evaluated: Single codebase with clear module boundaries, separate deployables

Pros:

  • ✅ Module boundaries enforce separation
  • ✅ Can extract to microservices later
  • ✅ Shared code easy (internal imports)
  • ✅ Simpler than full microservices

Cons:

  • Still monolithic deployment: Can't independently deploy modules
  • Database coupling: Modules share same DB (coordination needed)
  • Halfway measure: Complexity without full benefits

Why Rejected: Doesn't solve independent scaling problem


Alternative 3: Serverless Functions (Azure Functions)

Evaluated: Each feature as Azure Function

Pros:

  • ✅ Auto-scaling (pay per invocation)
  • ✅ Zero infrastructure management
  • ✅ Natural isolation
  • ✅ Cost-effective at low scale

Cons:

  • Cold starts: 1-3 second delay (unacceptable for chatbots)
  • Execution limits: 10 minute max (blocks long conversations)
  • Stateless: Complex state management needed
  • Vendor lock-in: Azure-specific
  • Limited control: Can't optimize infrastructure

Benchmark:

  • Serverless: 80th percentile = 1.2s latency (cold start), ~10x slower
  • Container: 80th percentile = 120ms latency

Why Rejected: Cold starts unacceptable for user-facing chatbot


Alternative 4: Hybrid (Monolith + Key Microservices)

Evaluated: Core features in monolith, specialized services extracted

Pros:

  • ✅ Simpler than full microservices
  • ✅ Extract only what needs independent scaling
  • ✅ Gradual migration path

Cons:

  • Unclear boundaries: When to extract vs. keep in monolith?
  • Two deployment models: Confusing for team
  • Technical debt: Temptation to keep adding to monolith

Why Rejected: We want clean separation from the start


Decision Rationale

Why Microservices?

1. Independent Scaling (Resource Optimization)

# Different resource needs
response-voice-chatbot-service:
  cpu: 2 cores # TTS/STT processing
  memory: 4GB
  replicas: 10 # High demand

response-text-chatbot-service:
  cpu: 0.5 cores
  memory: 1GB
  replicas: 2 # Lower demand

llm-model-service:
  cpu: 1 core
  memory: 2GB
  replicas: 5 # Shared by all

Cost Savings: 60% vs. scaling monolith for peak


2. Fault Isolation

graph LR
    A[User] --> B[Gateway]
    B --> C[Response 3D]
    B --> D[Response Text]
    B --> E[Response Voice]

    C --> F{Service Fails}
    F -->|3D Down| G[Text/Voice Still Work]

    style C fill:#F44336
    style D fill:#4CAF50
    style E fill:#4CAF50

Blast Radius: Service failure affects only that feature, not entire system


3. Team Ownership & Velocity

Service                  Owner         Deploy Frequency
response-3d-chatbot      Engineer 1    3x/week
response-text-chatbot    Engineer 2    2x/week
llm-model-service        ML Engineer   Monthly
payment-service          Engineer 3    Rarely

No coordination needed: Teams deploy independently


4. Technology Flexibility

Most services: Python FastAPI (team expertise)
Future real-time service: Node.js (WebSocket)
Future analytics: Go (performance)

Not locked into single stack


5. Clear API Contracts

# response-3d-chatbot-service API
POST /api/3d-chat
Request:
{
  "user_id": "...",
  "project_id": "...",
  "message": "...",
  "session_id": "..."
}

Response:
{
  "answer": "...",
  "visemes": [...],
  "audio_url": "..."
}

Well-defined interfaces enable parallel development
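
In code, each contract can live as shared Pydantic models. A minimal sketch based on the fields above (class names are hypothetical; the viseme payload type is an assumption):

from pydantic import BaseModel

class ChatRequest3D(BaseModel):        # hypothetical name
    user_id: str
    project_id: str
    message: str
    session_id: str

class ChatResponse3D(BaseModel):       # hypothetical name
    answer: str
    visemes: list[dict]                # lip-sync keyframes (assumed shape)
    audio_url: str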


Consequences

Positive Consequences

Independent Scaling: 60% cost savings vs. monolith
Fault Isolation: Service failures don't cascade
Fast Deployment: 2-5 minutes per service (vs. 15 min monolith)
Team Velocity: 3x more deploys/week (parallel work)
Clear Ownership: Each service has dedicated owner
Technology Choice: Can use best tool for job
Easier Testing: Test services in isolation

Negative Consequences

Operational Complexity: 20+ services to monitor
Network Overhead: Inter-service calls add latency (5-15ms each)
Distributed Debugging: Harder to trace requests across services
Data Consistency: No distributed transactions
DevOps Burden: 20 CI/CD pipelines, 20 Docker images

Mitigation Strategies

For Complexity:

  • Centralized logging (Loki + DataDog)
  • Distributed tracing (request IDs; see the middleware sketch below)
  • Service mesh (future: Istio)
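
A minimal sketch of the request-ID approach: the gateway attaches a correlation header that every service echoes into its logs and responses (the X-Request-ID header name is an assumption):

import uuid
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def add_request_id(request: Request, call_next):
    # Reuse the caller's ID if present so one ID spans all hops.
    request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))
    request.state.request_id = request_id
    response = await call_next(request)
    response.headers["X-Request-ID"] = request_id
    return response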

For Data Consistency:

  • Event-driven architecture (planned)
  • Eventual consistency (acceptable for our use cases)
  • Compensating transactions where needed (see the sketch below)
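
For example, without distributed transactions, a chatbot activation that fails after a successful charge must be undone explicitly. A hedged sketch; the three helpers are hypothetical stand-ins for real service calls:

async def charge_payment(user_id: str, plan: str) -> dict:
    return {"payment_id": "stub"}          # would call payment-service

async def activate_chatbot(user_id: str, plan: str) -> None:
    pass                                   # would call chatbot-maintenance-service

async def refund_payment(payment_id: str) -> None:
    pass                                   # compensating call to payment-service

async def purchase_chatbot(user_id: str, plan: str) -> None:
    payment = await charge_payment(user_id, plan)
    try:
        await activate_chatbot(user_id, plan)
    except Exception:
        # Compensating action: undo the first step's side effect.
        await refund_payment(payment["payment_id"])
        raise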

Implementation Details

Service Template

Every service follows standard pattern:

service-name/
├── src/
│   ├── main.py           # FastAPI app entry point
│   ├── logger.py         # Standardized logging
│   ├── models.py         # Pydantic models
│   └── utils/
├── Dockerfile            # Multi-stage build
├── requirements.txt      # Dependencies
├── .env.example          # Environment template
└── README.md             # Service documentation
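
A sketch of what the shared src/main.py skeleton might look like (the /health endpoint and app wiring are assumptions about the template, not verbatim code):

import logging

from fastapi import FastAPI

logger = logging.getLogger("service-name")
app = FastAPI(title="service-name")

@app.get("/health")
async def health() -> dict:
    # Liveness probe for docker-compose and Container Apps.
    return {"status": "ok"}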

Docker Compose (Development)

# docker-compose.yml
services:
  gateway-service:
    build: ./gateway-service
    ports:
      - "8000:8000"
    environment:
      - MONGO_URI=${MONGO_URI}
    depends_on:
      - auth-service

  auth-service:
    build: ./auth-service
    ports:
      - "8001:8001"

  # ... 18 more services

Production Deployment (Azure Container Apps)

# Each service deployed independently
az containerapp create \
--name response-3d-chatbot-service \
--resource-group machineagents-rg \
--image machineagents.azurecr.io/response-3d-chatbot:latest \
--target-port 8011 \
--min-replicas 2 \
--max-replicas 10 \
--cpu 2 --memory 4Gi

Service Communication Patterns

Synchronous (REST)

import httpx

# Gateway → Auth Service (every request)
async def verify_token(token: str) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "http://auth-service:8001/verify",
            headers={"Authorization": f"Bearer {token}"}
        )
        response.raise_for_status()
        return response.json()

# Response 3D → LLM Service (for AI responses)
async def call_llm(messages: list) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://llm-model-service:8016/call-model/openai-35",
            json={"messages": messages}
        )
        response.raise_for_status()
        return response.json()

Asynchronous (Planned for Q2 2025)

Event Bus (Azure Service Bus):
- chatbot.created → analytics, email notifications
- payment.succeeded → user-service, chatbot activation
- document.uploaded → data-crawling, embedding generation
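
A hedged sketch of what publishing a chatbot.created event could look like with the azure-servicebus SDK (topic name, connection-string variable, and payload shape are assumptions):

import json
import os

from azure.servicebus import ServiceBusClient, ServiceBusMessage

def publish_chatbot_created(chatbot_id: str) -> None:
    # Assumed env var and topic name; payload shape is illustrative.
    conn_str = os.environ["SERVICE_BUS_CONNECTION_STRING"]
    event = {"type": "chatbot.created", "chatbot_id": chatbot_id}
    with ServiceBusClient.from_connection_string(conn_str) as client:
        with client.get_topic_sender(topic_name="chatbot-events") as sender:
            sender.send_messages(ServiceBusMessage(json.dumps(event)))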

Performance Impact

Network Latency

Request Flow                                      Latency   Hops
User → Gateway → Auth → Response Service → LLM    150ms     4
Monolith (hypothetical)                           120ms     1
Overhead                                          +30ms     +3

Acceptable: 30ms overhead < 200ms total budget


Compliance & Security

Service-to-Service Auth:

  • Azure Managed Identity (production; see the token sketch below)
  • No hardcoded credentials
  • Network isolation (VNet)
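
A sketch of service-to-service token acquisition with the azure-identity SDK; the scope URI is a placeholder assumption:

import httpx
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()

async def call_internal(url: str, payload: dict) -> dict:
    # The scope below is a placeholder for the internal API audience.
    token = credential.get_token("api://machineagents-internal/.default")
    async with httpx.AsyncClient() as client:
        response = await client.post(
            url, json=payload,
            headers={"Authorization": f"Bearer {token.token}"},
        )
        response.raise_for_status()
        return response.json()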

Data Flow:

  • All inter-service calls encrypted (TLS)
  • No direct DB access from frontend
  • Gateway enforces authentication

Migration Path

Scenario 1: Consolidate Services

If operational burden too high:

  1. Merge related services (e.g., 3 state services → 1)
  2. Reduce 20 services → 8-10
  3. Keep core separation (auth, payment, chatbot responses)
  4. Estimated time: 6 weeks

Scenario 2: Move to Monolith

If microservices prove wrong:

  1. Merge all services into single FastAPI app
  2. Keep module boundaries
  3. Single deployment artifact
  4. Estimated time: 12 weeks (significant refactor)

Review Schedule

Next Review: 2025-06-30 (15 months after implementation)

Review Criteria:

  • Deployment frequency still high (> 10/week total)
  • Operational burden manageable (< 20 hours/week DevOps)
  • Team satisfied with independence (> 4.0/5)
  • Uptime > 99.5% across all services

Related Decisions

  • ADR-001: Multi-Provider LLM Strategy - Enabled by service isolation
  • ADR-005: Next.js Frontend - API Gateway patterns
  • ADR-006: Azure Cloud Provider (planned) - Container infrastructure

Evidence & Metrics

Development Velocity (6 months post-migration):

  • Deployments/week: 45 (vs. 8 with monolith)
  • Mean time to deploy: 4 minutes (vs. 18 minutes)
  • Incidents caused by deployments: 2 (vs. 12 with monolith)

Production Metrics:

  • Uptime (individual services): 99.2%-99.9%
  • Overall system uptime: 99.7%
  • Latency impact: +25ms average (acceptable)

Last Updated: 2025-12-26
Review Date: 2025-06-30
Status: Active and successful


"Microservices: complexity traded for flexibility."