ADR-004: Microservices Architecture

Status: ✅ Accepted
Date: 2024-03-15 (Q1 2024)
Decision Makers: CTO, Solution Architect, DevOps Lead
Consulted: Backend Team, Frontend Lead
Informed: Engineering Team, Product Team


Context

MachineAvatars started as a proof-of-concept monolithic application. As we prepared for production, we needed to decide on the architectural pattern for the backend.

PoC Limitations:

  • Single Python FastAPI application (~15K lines of code)
  • All functionality in one codebase
  • Difficult to deploy changes (risk of breaking everything)
  • Can't scale individual features independently
  • Hard to assign ownership to teams

Production Requirements:

  • Support 3 chatbot types (3D, Text, Voice) with different resource needs
  • Independent scaling (voice chatbot uses 10x more CPU than text)
  • Multiple teams working simultaneously without conflicts
  • Fault isolation (one feature failure shouldn't crash entire system)
  • Technology flexibility (e.g., use Node.js for real-time features)

Team Structure:

  • 5 backend engineers
  • 2 frontend engineers
  • 1 DevOps engineer

Decision

We adopted a microservices architecture with 20+ independent services.

Service Breakdown

machineagents-backend/
├── gateway-service                (Port 8000) - API Gateway & Routing
├── auth-service                   (Port 8001) - Authentication & JWT
├── user-service                   (Port 8002) - User Management
├── create-chatbot-service         (Port 8003) - Chatbot Creation
├── selection-chatbot-service      (Port 8004) - Chatbot Selection
├── data-crawling-service          (Port 8005) - Website/PDF Processing
├── state-3d-chatbot-service       (Port 8006) - 3D Chatbot State
├── state-text-chatbot-service     (Port 8007) - Text Chatbot State
├── state-voice-chatbot-service    (Port 8008) - Voice Chatbot State
├── system-prompt-service          (Port 8009) - Prompt Management
├── chatbot-maintenance-service    (Port 8010) - Chatbot Maintenance
├── response-3d-chatbot-service    (Port 8011) - 3D Responses (CORE)
├── response-text-chatbot-service  (Port 8012) - Text Responses
├── response-voice-chatbot-service (Port 8013) - Voice Responses
├── chat-history-service           (Port 8014) - Conversation Storage
├── client-data-collection-service (Port 8015) - Analytics
├── llm-model-service              (Port 8016) - LLM Abstraction
├── homepage-chatbot-service       (Port 8017) - Homepage Chatbot
├── remote-physio-service          (Port 8018) - PhysioTherapy Assistant
└── payment-service                (Port 8019) - Razorpay Integration

Service Communication: REST APIs (HTTP/JSON)
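
For illustration, a minimal sketch of path-based routing at the gateway using FastAPI and httpx; the route prefixes and proxy logic are assumptions, not the production gateway code:

# Hypothetical gateway routing sketch; prefixes are illustrative
import httpx
from fastapi import FastAPI, Request, Response

app = FastAPI()

ROUTES = {
    "/auth": "http://auth-service:8001",
    "/3d-chat": "http://response-3d-chatbot-service:8011",
}

@app.api_route("/{path:path}", methods=["GET", "POST"])
async def proxy(path: str, request: Request) -> Response:
    full_path = f"/{path}"
    target = next((base for prefix, base in ROUTES.items()
                   if full_path.startswith(prefix)), None)
    if target is None:
        return Response(status_code=404)
    async with httpx.AsyncClient() as client:
        upstream = await client.request(
            request.method, f"{target}{full_path}",
            content=await request.body(),
            headers={"Authorization": request.headers.get("Authorization", "")},
        )
    return Response(content=upstream.content, status_code=upstream.status_code)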


Alternatives Considered

Alternative 1: Monolithic Architecture

Evaluated: Single FastAPI application with modular code organization

Pros:

  • ✅ Simplest to develop initially
  • ✅ Easy to debug (one codebase)
  • ✅ No network overhead between components
  • ✅ Simpler deployment (one artifact)
  • ✅ Easier testing (integration tests straightforward)

Cons:

  • Can't scale independently: Text chatbot needs 1 instance, Voice needs 10
  • Deployment risk: Small change requires full redeploy
  • Blast radius: One bug can crash entire system
  • Team coordination: Merge conflicts, deployment conflicts
  • Technology lock-in: Everything must be Python

Scaling Challenge:

Monolith:
- Peak usage: 100 voice requests/sec (high CPU)
- Also handling: 500 text requests/sec (low CPU)
- Must scale to 10 instances for voice
- Result: 90% wasted capacity for text processing

Why Rejected: Poor resource utilization, high deployment risk


Alternative 2: Modular Monolith

Evaluated: Single codebase with clear module boundaries, separate deployables

Pros:

  • ✅ Module boundaries enforce separation
  • ✅ Can extract to microservices later
  • ✅ Shared code easy (internal imports)
  • ✅ Simpler than full microservices

Cons:

  • Still monolithic deployment: Can't independently deploy modules
  • Database coupling: Modules share same DB (coordination needed)
  • Halfway measure: Complexity without full benefits

Why Rejected: Doesn't solve independent scaling problem


Alternative 3: Serverless Functions (Azure Functions)

Evaluated: Each feature as Azure Function

Pros:

  • ✅ Auto-scaling (pay per invocation)
  • ✅ Zero infrastructure management
  • ✅ Natural isolation
  • ✅ Cost-effective at low scale

Cons:

  • Cold starts: 1-3 second delay (unacceptable for chatbots)
  • Execution limits: 10 minute max (blocks long conversations)
  • Stateless: Complex state management needed
  • Vendor lock-in: Azure-specific
  • Limited control: Can't optimize infrastructure

Benchmark:

  • Serverless: 80th percentile = 1.2s latency (cold start), ~10x slower
  • Container: 80th percentile = 120ms latency

Why Rejected: Cold starts unacceptable for user-facing chatbot


Alternative 4: Hybrid (Monolith + Key Microservices)

Evaluated: Core features in monolith, specialized services extracted

Pros:

  • ✅ Simpler than full microservices
  • ✅ Extract only what needs independent scaling
  • ✅ Gradual migration path

Cons:

  • Unclear boundaries: When to extract vs. keep in monolith?
  • Two deployment models: Confusing for team
  • Technical debt: Temptation to keep adding to monolith

Why Rejected: We want clean separation from the start


Decision Rationale

Why Microservices?

1. Independent Scaling (Resource Optimization)

# Different resource needs
response-voice-chatbot-service:
  cpu: 2 cores # TTS/STT processing
  memory: 4GB
  replicas: 10 # High demand

response-text-chatbot-service:
  cpu: 0.5 cores
  memory: 1GB
  replicas: 2 # Lower demand

llm-model-service:
  cpu: 1 core
  memory: 2GB
  replicas: 5 # Shared by all

Cost Savings: 60% vs. scaling monolith for peak


2. Fault Isolation

graph LR
    A[User] --> B[Gateway]
    B --> C[Response 3D]
    B --> D[Response Text]
    B --> E[Response Voice]

    C --> F{Service Fails}
    F -->|3D Down| G[Text/Voice Still Work]

    style C fill:#F44336
    style D fill:#4CAF50
    style E fill:#4CAF50

Blast Radius: Service failure affects only that feature, not entire system


3. Team Ownership & Velocity

Service                  Owner         Deploy Frequency
response-3d-chatbot      Engineer 1    3x/week
response-text-chatbot    Engineer 2    2x/week
llm-model-service        ML Engineer   Monthly
payment-service          Engineer 3    Rarely

No coordination needed: Teams deploy independently


4. Technology Flexibility

Most services: Python FastAPI (team expertise)
Future real-time service: Node.js (WebSocket)
Future analytics: Go (performance)

Not locked into single stack


5. Clear API Contracts

# response-3d-chatbot-service API
POST /api/3d-chat
Request:
{
  "user_id": "...",
  "project_id": "...",
  "message": "...",
  "session_id": "..."
}

Response:
{
  "answer": "...",
  "visemes": [...],
  "audio_url": "..."
}

Well-defined interfaces enable parallel development
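
In code, each contract can live as shared Pydantic models. A minimal sketch based on the fields above (class names are hypothetical; the viseme payload type is an assumption):

from pydantic import BaseModel

class ChatRequest3D(BaseModel):        # hypothetical name
    user_id: str
    project_id: str
    message: str
    session_id: str

class ChatResponse3D(BaseModel):       # hypothetical name
    answer: str
    visemes: list[dict]                # lip-sync keyframes (assumed shape)
    audio_url: str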


Consequences

Positive Consequences

Independent Scaling: 60% cost savings vs. monolith
Fault Isolation: Service failures don't cascade
Fast Deployment: 2-5 minutes per service (vs. 15 min monolith)
Team Velocity: 3x more deploys/week (parallel work)
Clear Ownership: Each service has dedicated owner
Technology Choice: Can use best tool for job
Easier Testing: Test services in isolation

Negative Consequences

Operational Complexity: 20+ services to monitor
Network Overhead: Inter-service calls add latency (5-15ms each)
Distributed Debugging: Harder to trace requests across services
Data Consistency: No distributed transactions
DevOps Burden: 20 CI/CD pipelines, 20 Docker images

Mitigation Strategies

For Complexity:

  • Centralized logging (Loki + DataDog)
  • Distributed tracing (request IDs; see the middleware sketch below)
  • Service mesh (future: Istio)
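
A minimal sketch of the request-ID approach: the gateway attaches a correlation header that every service echoes into its logs and responses (the X-Request-ID header name is an assumption):

import uuid
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def add_request_id(request: Request, call_next):
    # Reuse the caller's ID if present so one ID spans all hops.
    request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))
    request.state.request_id = request_id
    response = await call_next(request)
    response.headers["X-Request-ID"] = request_id
    return response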

For Data Consistency:

  • Event-driven architecture (planned)
  • Eventual consistency (acceptable for our use cases)
  • Compensating transactions where needed (see the sketch below)
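
For example, without distributed transactions, a chatbot activation that fails after a successful charge must be undone explicitly. A hedged sketch; the three helpers are hypothetical stand-ins for real service calls:

async def charge_payment(user_id: str, plan: str) -> dict:
    return {"payment_id": "stub"}          # would call payment-service

async def activate_chatbot(user_id: str, plan: str) -> None:
    pass                                   # would call chatbot-maintenance-service

async def refund_payment(payment_id: str) -> None:
    pass                                   # compensating call to payment-service

async def purchase_chatbot(user_id: str, plan: str) -> None:
    payment = await charge_payment(user_id, plan)
    try:
        await activate_chatbot(user_id, plan)
    except Exception:
        # Compensating action: undo the first step's side effect.
        await refund_payment(payment["payment_id"])
        raise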

Implementation Details

Service Template

Every service follows standard pattern:

service-name/
├── src/
│   ├── main.py           # FastAPI app entry point
│   ├── logger.py         # Standardized logging
│   ├── models.py         # Pydantic models
│   └── utils/
├── Dockerfile            # Multi-stage build
├── requirements.txt      # Dependencies
├── .env.example          # Environment template
└── README.md             # Service documentation
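
A sketch of what the shared src/main.py skeleton might look like (the /health endpoint and app wiring are assumptions about the template, not verbatim code):

import logging

from fastapi import FastAPI

logger = logging.getLogger("service-name")
app = FastAPI(title="service-name")

@app.get("/health")
async def health() -> dict:
    # Liveness probe for docker-compose and Container Apps.
    return {"status": "ok"}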

Docker Compose (Development)

# docker-compose.yml
services:
  gateway-service:
    build: ./gateway-service
    ports:
      - "8000:8000"
    environment:
      - MONGO_URI=${MONGO_URI}
    depends_on:
      - auth-service

  auth-service:
    build: ./auth-service
    ports:
      - "8001:8001"

  # ... 18 more services

Production Deployment (Azure Container Apps)

# Each service deployed independently
az containerapp create \
--name response-3d-chatbot-service \
--resource-group machineagents-rg \
--image machineagents.azurecr.io/response-3d-chatbot:latest \
--target-port 8011 \
--min-replicas 2 \
--max-replicas 10 \
--cpu 2 --memory 4Gi

Service Communication Patterns

Synchronous (REST)

import httpx

# Gateway → Auth Service (every request)
async def verify_token(token: str) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "http://auth-service:8001/verify",
            headers={"Authorization": f"Bearer {token}"}
        )
        response.raise_for_status()
        return response.json()

# Response 3D → LLM Service (for AI responses)
async def call_llm(messages: list) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://llm-model-service:8016/call-model/openai-35",
            json={"messages": messages}
        )
        response.raise_for_status()
        return response.json()

Asynchronous (Planned for Q2 2025)

Event Bus (Azure Service Bus):
- chatbot.created → analytics, email notifications
- payment.succeeded → user-service, chatbot activation
- document.uploaded → data-crawling, embedding generation
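
A hedged sketch of what publishing a chatbot.created event could look like with the azure-servicebus SDK (topic name, connection-string variable, and payload shape are assumptions):

import json
import os

from azure.servicebus import ServiceBusClient, ServiceBusMessage

def publish_chatbot_created(chatbot_id: str) -> None:
    # Assumed env var and topic name; payload shape is illustrative.
    conn_str = os.environ["SERVICE_BUS_CONNECTION_STRING"]
    event = {"type": "chatbot.created", "chatbot_id": chatbot_id}
    with ServiceBusClient.from_connection_string(conn_str) as client:
        with client.get_topic_sender(topic_name="chatbot-events") as sender:
            sender.send_messages(ServiceBusMessage(json.dumps(event)))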

Performance Impact

Network Latency

Request Flow                                      Latency   Hops
User → Gateway → Auth → Response Service → LLM    150ms     4
Monolith (hypothetical)                           120ms     1
Overhead                                          +30ms     +3

Acceptable: 30ms overhead < 200ms total budget


Compliance & Security

Service-to-Service Auth:

  • Azure Managed Identity (production; see the token sketch below)
  • No hardcoded credentials
  • Network isolation (VNet)
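
A sketch of service-to-service token acquisition with the azure-identity SDK; the scope URI is a placeholder assumption:

import httpx
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()

async def call_internal(url: str, payload: dict) -> dict:
    # The scope below is a placeholder for the internal API audience.
    token = credential.get_token("api://machineagents-internal/.default")
    async with httpx.AsyncClient() as client:
        response = await client.post(
            url, json=payload,
            headers={"Authorization": f"Bearer {token.token}"},
        )
        response.raise_for_status()
        return response.json()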

Data Flow:

  • All inter-service calls encrypted (TLS)
  • No direct DB access from frontend
  • Gateway enforces authentication

Migration Path

Scenario 1: Consolidate Services

If operational burden too high:

  1. Merge related services (e.g., 3 state services → 1)
  2. Reduce 20 services → 8-10
  3. Keep core separation (auth, payment, chatbot responses)
  4. Estimated time: 6 weeks

Scenario 2: Move to Monolith

If microservices prove wrong:

  1. Merge all services into single FastAPI app
  2. Keep module boundaries
  3. Single deployment artifact
  4. Estimated time: 12 weeks (significant refactor)

Review Schedule

Next Review: 2025-06-30 (15 months after implementation)

Review Criteria:

  • Deployment frequency still high (> 10/week total)
  • Operational burden manageable (< 20 hours/week DevOps)
  • Team satisfied with independence (> 4.0/5)
  • Uptime > 99.5% across all services

Related Decisions

  • ADR-001: Multi-Provider LLM Strategy - Enabled by service isolation
  • ADR-005: Next.js Frontend - API Gateway patterns
  • ADR-006: Azure Cloud Provider (planned) - Container infrastructure

Evidence & Metrics

Development Velocity (6 months post-migration):

  • Deployments/week: 45 (vs. 8 with monolith)
  • Mean time to deploy: 4 minutes (vs. 18 minutes)
  • Incidents caused by deployments: 2 (vs. 12 with monolith)

Production Metrics:

  • Uptime (individual services): 99.2%-99.9%
  • Overall system uptime: 99.7%
  • Latency impact: +25ms average (acceptable)

Last Updated: 2025-12-26
Review Date: 2025-06-30
Status: Active and successful


"Microservices: complexity traded for flexibility."