
Product Architecture Overview

Purpose: High-level system architecture and component breakdown
Audience: Technical Team, Architects, CTO, Technical Investors
Owner: Solution Architect, CTO
Last Updated: 2025-12-26
Version: 1.0


Executive Summary

MachineAvatars is built on a modern microservices architecture with 23 independent services, leveraging Azure cloud infrastructure, multi-LLM AI capabilities, and advanced RAG (Retrieval-Augmented Generation) for intelligent conversations.

Architecture Highlights:

  • 23 Microservices - Independent, scalable, fault-tolerant
  • Multi-LLM Strategy - 11 AI models across 5 providers
  • RAG Pipeline - Milvus vector database with 384-dim embeddings
  • Next.js Frontend - SSR/SSG for performance and SEO
  • Azure Cloud - Container Apps, Cosmos DB, Blob Storage
  • 99.7% Uptime - Production-proven reliability

High-Level Architecture

graph TB
    subgraph "User Layer"
        A[Web Browser]
        B[Mobile App - Future]
    end

    subgraph "Frontend - Next.js 14"
        C[Next.js Application]
        D[3D Avatar Renderer<br/>Three.js]
        E[API Client<br/>Axios]
    end

    subgraph "API Gateway"
        F[Gateway Service<br/>Port 8000]
    end

    subgraph "Authentication"
        G[Auth Service<br/>Port 8001<br/>JWT]
    end

    subgraph "Core Services"
        H[User Service<br/>Port 8002]
        I[Create Chatbot<br/>Port 8003]
        J[Selection Service<br/>Port 8004]
    end

    subgraph "Chatbot Services - 3 Types"
        K[3D Chatbot State<br/>Port 8006]
        L[Text Chatbot State<br/>Port 8007]
        M[Voice Chatbot State<br/>Port 8008]
        N[3D Response<br/>Port 8011]
        O[Text Response<br/>Port 8012]
        P[Voice Response<br/>Port 8013]
    end

    subgraph "AI/ML Layer"
        Q[LLM Model Service<br/>Port 8016]
        R[Data Crawling<br/>Port 8005]
        S[System Prompts<br/>Port 8009]
    end

    subgraph "Supporting Services"
        T[Chat History<br/>Port 8014]
        U[Payment Service<br/>Port 8019]
        V[Analytics<br/>Port 8015]
    end

    subgraph "Data Layer"
        W[(Azure Cosmos DB<br/>MongoDB API)]
        X[(Milvus<br/>Vector Database)]
        Y[Azure Blob Storage]
    end

    subgraph "External AI APIs"
        Z1[OpenAI<br/>GPT-4, GPT-3.5]
        Z2[Anthropic<br/>Claude]
        Z3[Google<br/>Gemini]
        Z4[Azure ML<br/>Llama, DeepSeek]
    end

    A --> C
    C --> E
    E --> F
    F --> G
    F --> H
    F --> I
    F --> J
    F --> K
    F --> L
    F --> M
    F --> N
    F --> O
    F --> P
    F --> T
    F --> U

    N --> Q
    O --> Q
    P --> Q

    Q --> Z1
    Q --> Z2
    Q --> Z3
    Q --> Z4

    N --> X
    O --> X
    P --> X

    R --> W
    R --> X

    H --> W
    I --> W
    J --> W
    K --> W
    L --> W
    M --> W
    T --> W
    U --> W

    N --> Y
    P --> Y

    style F fill:#FFE082
    style Q fill:#FFF3E0
    style W fill:#CE93D8
    style X fill:#90CAF9

System Components

1. Frontend Layer

Next.js Application (Port: 3000)

Technology:

  • Next.js 14.1.6 (App Router)
  • React 18.2.0
  • TypeScript 5.x
  • Tailwind CSS 3.4.1

Key Features:

  • SSR/SSG: Server-side rendering for SEO, static generation for marketing pages
  • 3D Avatars: Three.js 0.171.0 for avatar rendering (60 FPS)
  • Real-Time: WebSocket streaming for the voice chatbot (planned)
  • Responsive: Mobile-first design, works on all devices

Architecture Pattern:

app/
├── (marketing)/        # Public pages (SSG)
│   ├── page.tsx       # Homepage
│   ├── pricing/
│   └── features/
├── dashboard/          # Auth-required (SSR)
│   ├── chatbots/
│   └── settings/
└── api/                # Backend-for-frontend
    ├── auth/
    └── proxy/

Deployment: Vercel (free tier) / Azure Static Web Apps (production)


2. API Gateway Layer

Gateway Service (Port 8000)

Purpose: Single entry point for all backend API calls

Responsibilities:

  • Routing: Directs requests to appropriate microservices
  • Authentication: Validates JWT tokens via Auth Service
  • Rate Limiting: Prevents abuse (100 requests/min per user)
  • CORS: Handles cross-origin requests
  • Request/Response Logging: Centralized logging
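The 100 requests/min per-user limit can be enforced with a sliding-window counter. Below is a minimal in-memory sketch. It assumes the gateway keys requests by the user ID taken from the validated JWT. The class name `SlidingWindowLimiter` is hypothetical. With several gateway replicas, the counters would have to live in a shared store rather than a process-local dict.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Hypothetical per-user limiter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        # user_id -> timestamps of accepted requests still inside the window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        """Record and accept the request if the user is under the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Evict timestamps that have aged out of the sliding window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the gateway would reply 429 Too Many Requests
        hits.append(now)
        return True
```

In a FastAPI middleware, `limiter.allow(user_id)` would run after token validation, and a `False` result would become a 429 response.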

Technology:

  • FastAPI (Python)
  • Uvicorn (ASGI server)

Example Flow:

Client → Gateway (8000) → Auth (8001) → Response 3D (8011) → Client
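The routing step can be pictured as a prefix table that maps public paths to the internal service ports listed in this overview. Only `/api/3d-chat` appears elsewhere in this document. The other path prefixes and the `localhost` host are hypothetical. As the example flow shows, the gateway would validate the JWT with Auth (8001) before forwarding.

```python
# Path prefixes are hypothetical except /api/3d-chat; ports come from this overview.
ROUTES = {
    "/api/auth": 8001,
    "/api/users": 8002,
    "/api/chatbots/create": 8003,
    "/api/chatbots": 8004,
    "/api/3d-chat": 8011,
    "/api/text-chat": 8012,
    "/api/voice-chat": 8013,
    "/api/history": 8014,
    "/api/payments": 8019,
}


def resolve_upstream(path: str, host: str = "localhost") -> str:
    """Map a request path to an upstream base URL; the longest matching prefix wins."""
    matches = [p for p in ROUTES if path == p or path.startswith(p + "/")]
    if not matches:
        raise LookupError(f"no route for {path}")
    return f"http://{host}:{ROUTES[max(matches, key=len)]}"
```

Longest-prefix matching lets `/api/chatbots/create` reach the Create Chatbot service while other `/api/chatbots/...` paths go to the Selection service.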

3. Authentication & User Management

Auth Service (Port 8001)

Purpose: Authentication, authorization, token management

Features:

  • JWT Tokens: Secure, stateless authentication
  • Login/Signup: Email + password (bcrypt hashing)
  • OTP Verification: For email verification
  • Token Refresh: Automatic token renewal
  • Session Management: Track active sessions
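Stateless JWT authentication means any service holding the signing key can verify a token without a database lookup. The sketch below shows the mechanics using only the standard library. HS256, the `sub`/`exp` claims and the one-hour lifetime are assumptions, since the source does not state them. A production service would normally use a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id: str, secret: bytes, ttl: int = 3600) -> str:
    """Create an HS256-signed token carrying the user ID and an expiry time."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "exp": int(time.time()) + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_token(token: str, secret: bytes) -> dict:
    """Return the payload if signature and expiry are valid; otherwise raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    # Only HS256 is accepted here, so the header's alg field is not trusted or read
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload
```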

Database: users collection in Cosmos DB

Security:

  • Password hashing (bcrypt, cost factor 12)
  • Rate limiting (5 login attempts per 15 minutes)
  • Account lockout after failed attempts
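The login limit and the lockout can be modeled together. The sketch counts failures per account inside a 15-minute window and refuses further attempts once five have accumulated. `LoginGuard` is a hypothetical in-memory class. The lockout duration is also an assumption, because the source does not state it. Here an account unlocks when its oldest failure leaves the window.

```python
import time
from typing import Dict, List, Optional

MAX_FAILURES = 5
WINDOW_SECONDS = 15 * 60  # 15 minutes


class LoginGuard:
    """Hypothetical tracker: lock an account after 5 failed logins within 15 minutes."""

    def __init__(self) -> None:
        self._failures: Dict[str, List[float]] = {}

    def _recent(self, email: str, now: float) -> List[float]:
        # Keep only failures that are still inside the window
        recent = [t for t in self._failures.get(email, []) if now - t < WINDOW_SECONDS]
        self._failures[email] = recent
        return recent

    def is_locked(self, email: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return len(self._recent(email, now)) >= MAX_FAILURES

    def record_failure(self, email: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._recent(email, now).append(now)

    def record_success(self, email: str) -> None:
        # A successful login clears the failure history
        self._failures.pop(email, None)
```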

User Service (Port 8002)

Purpose: User profile management, settings

Features:

  • Update user profile
  • Manage subscription info
  • Email notifications
  • User preferences

Database: users collection


4. Chatbot Management

Create Chatbot Service (Port 8003)

Purpose: Create new chatbot instances

Flow:

  1. User selects chatbot type (3D, Text, Voice)
  2. Chooses avatar (for 3D)
  3. Sets personality/tone
  4. System creates project_id
  5. Initializes database collections

Database: chatbot_selection, projectid_creation


Selection Service (Port 8004)

Purpose: Retrieve chatbot configurations

Features:

  • Get chatbot details by project_id
  • List all chatbots for a user
  • Update chatbot settings
  • Delete chatbots (soft delete)
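A soft delete marks the record instead of removing it, so every read path must also exclude marked records. The sketch builds the MongoDB-style filter and update documents. The `is_deleted` and `deleted_at` field names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Tuple


def soft_delete_ops(user_id: str, project_id: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Filter and update documents that flag one chatbot as deleted (hypothetical fields)."""
    flt = {"user_id": user_id, "project_id": project_id}
    update = {"$set": {"is_deleted": True, "deleted_at": datetime.now(timezone.utc)}}
    return flt, update


def active_chatbots_filter(user_id: str) -> Dict[str, Any]:
    """Filter for listing a user's chatbots while hiding soft-deleted ones."""
    # $ne also matches documents where the flag is absent (records created before it existed)
    return {"user_id": user_id, "is_deleted": {"$ne": True}}
```

With PyMongo these would be used as `collection.update_one(*soft_delete_ops(uid, pid))` and `collection.find(active_chatbots_filter(uid))`.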

5. Chatbot State Management

Purpose: Manage chatbot state and session data

State Services (Ports 8006, 8007, 8008)

  • 3D Chatbot State (8006): Avatar animation states, lip-sync data
  • Text Chatbot State (8007): Message history, typing indicators
  • Voice Chatbot State (8008): Audio session, transcription state

Shared Functionality:

  • Session initialization
  • State persistence
  • Context management

6. Chatbot Response Generation

Response 3D Chatbot Service (Port 8011) ⭐ CORE SERVICE

Purpose: Generate AI responses for 3D avatar chatbots

Flow:

sequenceDiagram
    User->>3D Response: Send message
    3D Response->>Milvus: Search embeddings (top-5)
    Milvus-->>3D Response: Return relevant chunks
    3D Response->>LLM Service: Generate response (context + query)
    LLM Service-->>3D Response: AI answer
    3D Response->>Azure TTS: Text-to-speech
    Azure TTS-->>3D Response: Audio (WAV)
    3D Response->>Blob Storage: Upload audio
    3D Response-->>User: Response + audio URL + visemes

Key Features:

  • RAG pipeline (Milvus vector search)
  • Multi-LLM support (user-selectable model)
  • Voice synthesis (Azure TTS)
  • Lip-sync data (visemes for avatar animation)
  • Conversation history tracking

Performance:

  • p50 latency: 1.8 seconds
  • p95 latency: 3.5 seconds
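The 3D response sequence is a straight pipeline: retrieve, generate, synthesize, upload. The dependency-injected sketch below keeps that order explicit and makes each external call replaceable in tests. Every callable name and the prompt template are hypothetical stand-ins for the Milvus, LLM Service, Azure TTS and Blob Storage clients. It also assumes the TTS step returns viseme data alongside the audio.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AvatarReply:
    text: str
    audio_url: str
    visemes: List[dict] = field(default_factory=list)


def answer_3d(
    query: str,
    search: Callable[[str, int], List[str]],                # Milvus top-k chunk search
    generate: Callable[[str], str],                         # LLM Model Service call
    synthesize: Callable[[str], Tuple[bytes, List[dict]]],  # TTS -> (WAV bytes, visemes)
    upload: Callable[[bytes], str],                         # Blob upload -> audio URL
    top_k: int = 5,
) -> AvatarReply:
    """Retrieve context, generate an answer, voice it, and return text, audio URL, visemes."""
    chunks = search(query, top_k)
    context = "\n\n".join(chunks)
    # Hypothetical template; system prompts are managed by the System Prompt Service
    prompt = f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"
    text = generate(prompt)
    audio, visemes = synthesize(text)
    return AvatarReply(text=text, audio_url=upload(audio), visemes=visemes)
```

The text and voice services would follow the same shape with fewer steps. The text service skips `synthesize` and `upload`, which is consistent with its lower latency.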

Response Text Chatbot Service (Port 8012)

Purpose: Text-only chatbot responses

Simpler than 3D:

  • No TTS generation
  • No visemes
  • Faster (p50: 0.8 seconds)

Response Voice Chatbot Service (Port 8013)

Purpose: Voice conversation chatbots

Additional Features:

  • STT (Speech-to-Text): Whisper API for transcription
  • TTS: Azure Neural TTS
  • Real-time streaming: WebSocket support (planned)

7. AI/ML Layer

LLM Model Service (Port 8016) ⭐ CORE SERVICE

Purpose: Unified interface for all LLM providers

Supported Models (11 total):

Azure OpenAI:

  • GPT-4-0613 (gpt-4-0613)
  • GPT-3.5 Turbo 16K (gpt-35-turbo-16k-0613)
  • GPT-4o Mini (gpt-4o-mini-2024-07-18)
  • o1-mini (o1-mini-2024-09-12)

Azure ML Endpoints:

  • Llama 3.3 70B Instruct
  • DeepSeek R1
  • Ministral 3B
  • Phi-3 Small 8K

External APIs:

  • Gemini 2.0 Flash (Google)
  • Claude 3.5 Sonnet (Anthropic)
  • Grok-3 (xAI)

Routing Logic:

  • User-specified model (per chatbot)
  • Fallback chain (GPT-4 → GPT-3.5)
  • Cost optimization (auto-route simple queries to cheap models)

Configuration:

  • Temperature: 0.7 for all models that accept it (o1-mini only supports the default value of 1)
  • Max tokens: Varies by model
  • Retry logic: 3 attempts with exponential backoff
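The fallback chain and the retry policy combine naturally. The sketch tries each model in order and retries transient failures up to three times with exponential backoff before moving to the next model. `call_model`, `TransientError` and the 0.5-second base delay are hypothetical. The chain order and the three attempts come from the text above. Non-transient errors, such as an invalid request, propagate immediately instead of triggering a fallback; this is one design choice among several.

```python
import time
from typing import Callable, Optional, Sequence


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, rate limits, 5xx)."""


def complete_with_fallback(
    prompt: str,
    chain: Sequence[str],
    call_model: Callable[[str, str], str],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order; retry transient errors with exponential backoff."""
    last_error: Optional[Exception] = None
    for model in chain:
        for attempt in range(attempts):
            try:
                return call_model(model, prompt)
            except TransientError as exc:
                last_error = exc
                if attempt < attempts - 1:
                    sleep(base_delay * 2 ** attempt)  # 0.5 s, then 1.0 s
    raise RuntimeError("every model in the fallback chain failed") from last_error
```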

Data Crawling Service (Port 8005)

Purpose: Process documents and websites into embeddings

Supported Formats:

  • PDFs: Text extraction via PyPDF2
  • Text files: Direct processing
  • Websites: Crawl up to 50 URLs, extract content
  • Q&A pairs: Direct embedding

Pipeline:

  1. Extract text
  2. Preprocess (remove HTML, emails, phone numbers)
  3. Chunk: 1000 characters, 200 overlap
  4. Embed: BAAI/bge-small-en-v1.5 (384 dimensions)
  5. Store: Milvus vector database
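Step 2 (preprocessing) can be approximated with regular expressions. These patterns are simplified assumptions, not the service's actual rules. The phone pattern treats any run of ten or more digits (with common separators) as a phone number, so it also removes other long numbers. Real HTML pages are better stripped with a parser.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?:[+(]?\d[\s().-]*){10,}")  # 10+ digits with separators
SPACE_RE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Strip HTML tags, e-mail addresses and phone-like numbers, then collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = PHONE_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()
```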

Configuration:

  • Max chunks per document: 50
  • Chunk size: 1000 characters
  • Overlap: 200 characters
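With a 1,000-character window and a 200-character overlap, each chunk starts 800 characters after the previous one. The 50-chunk cap therefore covers roughly the first 49 × 800 + 1,000 = 40,200 characters of a document. The function below is a plain character-window sketch of that configuration. It assumes longer documents are truncated at the cap; the source does not say whether they are truncated or rejected.

```python
from typing import List

CHUNK_SIZE = 1000
OVERLAP = 200
MAX_CHUNKS = 50


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = OVERLAP,
               max_chunks: int = MAX_CHUNKS) -> List[str]:
    """Split text into overlapping character windows, capped at max_chunks."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap  # 800 characters between chunk starts
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text) or len(chunks) == max_chunks:
            break  # end of text reached, or per-document cap hit
    return chunks
```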

System Prompt Service (Port 8009)

Purpose: Manage AI system prompts and guardrails

Features:

  • Default prompts (Customer Support, Sales, Custom)
  • User-customized prompts
  • Guardrails (prohibited topics, keywords)
  • Prompt versioning

Database: system_prompts_default, system_prompts_user, guardrails
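A keyword guardrail can be as simple as a case-insensitive whole-word match against the prohibited list, run before the query reaches the LLM. The function name and the "return the first hit" behavior are hypothetical. The source only states that guardrails hold prohibited topics and keywords.

```python
import re
from typing import Iterable, Optional


def find_violation(message: str, prohibited: Iterable[str]) -> Optional[str]:
    """Return the first prohibited keyword present as a whole word or phrase, else None."""
    for keyword in prohibited:
        keyword = keyword.strip()
        if not keyword:
            continue  # an empty pattern would match everywhere
        if re.search(r"\b" + re.escape(keyword) + r"\b", message, flags=re.IGNORECASE):
            return keyword
    return None
```

Whole-word matching avoids false positives such as "crypto" inside "cryptography". Topic-level guardrails usually need more than keywords.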


8. Supporting Services

Chat History Service (Port 8014)

Purpose: Store and retrieve conversation history

Features:

  • Save chat messages (input, output, tokens)
  • Retrieve conversation by session_id
  • Analytics (token usage, response times)
  • TTL (Time-to-Live): 90 days auto-deletion

Database: chatbot_history collection

Schema:

{
  "user_id": "...",
  "project_id": "...",
  "session_id": "...",
  "chat_data": [
    {
      "input_prompt": "...",
      "output_response": "...",
      "input_tokens": 150,
      "output_tokens": 300,
      "total_tokens": 450
    }
  ],
  "session_total_tokens": 450
}
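Each turn appended to `chat_data` must also keep `session_total_tokens` in step. The pure-Python sketch below performs that update using the schema's field names. How the service actually persists the change is an assumption. The 90-day TTL works out to 90 × 86,400 = 7,776,000 seconds, the value an `expireAfterSeconds` TTL index would carry.

```python
from typing import Any, Dict

TTL_SECONDS = 90 * 24 * 60 * 60  # 90 days = 7,776,000 seconds


def append_turn(session: Dict[str, Any], prompt: str, response: str,
                input_tokens: int, output_tokens: int) -> Dict[str, Any]:
    """Append one exchange to a chatbot_history document and update the running total."""
    total = input_tokens + output_tokens
    session.setdefault("chat_data", []).append({
        "input_prompt": prompt,
        "output_response": response,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total,
    })
    session["session_total_tokens"] = session.get("session_total_tokens", 0) + total
    return session
```

In MongoDB terms the same change can be one atomic update, `{"$push": {"chat_data": turn}, "$inc": {"session_total_tokens": total}}`. This avoids read-modify-write races between concurrent requests.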

Payment Service (Port 8019)

Purpose: Handle subscriptions and payments

Integration: Razorpay (Indian payment gateway)

Features:

  • Create payment orders
  • Verify payment signatures
  • Webhook handling (payment success/failure)
  • Subscription management

Database: Payment records stored in Cosmos DB
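Razorpay documents two signature checks. For checkout, compute an HMAC-SHA256 of `order_id|payment_id` keyed with the API key secret and compare it with the `razorpay_signature` returned to the client. For webhooks, compute an HMAC-SHA256 of the raw request body with the webhook secret and compare it with the `X-Razorpay-Signature` header. A standard-library sketch of both checks follows. The secrets are placeholders, and the function names are hypothetical.

```python
import hashlib
import hmac


def _hmac_hex(secret: str, message: bytes) -> str:
    return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()


def verify_payment_signature(order_id: str, payment_id: str, signature: str,
                             key_secret: str) -> bool:
    """Checkout callback: HMAC-SHA256 of "order_id|payment_id" with the API key secret."""
    expected = _hmac_hex(key_secret, f"{order_id}|{payment_id}".encode())
    return hmac.compare_digest(expected, signature)


def verify_webhook_signature(raw_body: bytes, header_signature: str,
                             webhook_secret: str) -> bool:
    """Webhook: HMAC-SHA256 of the raw (unparsed) request body with the webhook secret."""
    expected = _hmac_hex(webhook_secret, raw_body)
    return hmac.compare_digest(expected, header_signature)
```

Verification must use the raw body bytes, because re-serializing parsed JSON can change whitespace or key order and break the signature.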



Client Data Collection Service (Port 8015)

Purpose: Analytics and usage tracking

Metrics Collected:

  • User interactions (clicks, searches)
  • Chatbot usage (conversations, messages)
  • Performance metrics (response times)
  • Error rates

Storage: Cosmos DB + future integration with DataDog


Superadmin Service (Port 8020)

Purpose: Administrative dashboard API for platform management

Features:

  • Legal document distribution (Terms & Conditions, Privacy Policy PDFs)
  • Subscription plan management (dynamic aggregation from 4 collections)
  • Admin authentication
  • User management & chat history retrieval

Database: 8 MongoDB collections (users, subscriptions, chatbot_history, etc.)

Security: ⚠️ Plain-text password authentication (requires migration to bcrypt)


Homepage Chatbot Service (Port 8021)

Purpose: Dedicated chatbot for MachineAgents.ai homepage/landing page

Key Features:

  • Intelligent greeting system with UTM-based personalization
  • Lead collection & form submission handling
  • 7 avatar support (Eva, Shayla, Myra, Chris, Jack, Anu, Emma)
  • No RAG/Embeddings - Pure GPT-4 sales assistant
  • Dual collections (generate_greeting + generate_greetings for compatibility)

Unique: Homepage-specific with no multi-tenancy (single chatbot)

Integration: TTS + lip-sync for avatar animation, Calendly booking links
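UTM-based personalization comes down to reading the standard `utm_*` query parameters from the landing URL and choosing a greeting variant. The greeting texts and the campaign keys below are entirely hypothetical; the source does not list them. The sketch only illustrates the parsing and the fallback to a default.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical greeting variants keyed by utm_campaign value
GREETINGS = {
    "pricing": "Hi! Comparing plans? I can help you pick one.",
    "demo": "Welcome! Would you like to book a quick demo?",
}
DEFAULT_GREETING = "Hi! How can I help you today?"


def pick_greeting(landing_url: str) -> str:
    """Choose a greeting from the landing URL's utm_campaign, falling back to a default."""
    params = parse_qs(urlparse(landing_url).query)
    campaign = params.get("utm_campaign", [""])[0].strip().lower()
    return GREETINGS.get(campaign, DEFAULT_GREETING)
```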


Remote Physio Service (Port 8022)

Purpose: ⚠️ Client-specific physiotherapy consultation chatbot

Key Features:

  • 8-stage state machine (from language selection through problem assessment and clinical summary to follow-up)
  • Bilingual support (English + Hindi/Hinglish)
  • Hybrid RAG (BM25 + Vector search) for exercise/assessment recommendations
  • Clinical summary generation with automated medical documentation
  • User profile persistence (name, age, weight stored permanently)
  • Session inactivity handling (5-minute timeout with return flow)

Integration: Remote Physios API (rp-api.anubhaanant.com, api.remotephysios.com)

Architecture Issue: Hardcoded client-specific logic embedded in main product (should be refactored for multi-tenancy)
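The source does not say how the BM25 and vector result lists are merged in the hybrid search. Reciprocal rank fusion (RRF) is one common choice and is shown here as an assumption. Each document's fused score is the sum of 1/(k + rank) over the lists it appears in, with k = 60 as the commonly used constant. Documents ranked well by both retrievers rise to the top.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60,
                           top_n: int = 5) -> List[str]:
    """Merge ranked ID lists (e.g. BM25 hits and vector hits) into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]
```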


9. Data Layer

Azure Cosmos DB (MongoDB API)

Purpose: Primary database for all metadata

Configuration:

  • API: MongoDB (v4.2 compatible)
  • Region: Primary - East US, Secondary - Southeast Asia
  • Throughput: Autoscale 400-4000 RU/s
  • Backup: Continuous (30-day point-in-time restore)

Collections:

  • users - User accounts
  • chatbot_selection - Chatbot configs
  • chatbot_history - Conversations (90-day TTL)
  • files - Document metadata with chunks
  • files_secondary - Document metadata without chunks
  • projectid_creation - Project metadata
  • system_prompts_default - Default prompts
  • system_prompts_user - User prompts
  • guardrails - Safety rules
  • And 4 more...

Performance:

  • p50 query latency: 28ms
  • p95 query latency: 62ms
  • Uptime: 99.98%

Milvus Vector Database

Purpose: Store and search document embeddings

Configuration:

  • Version: 2.3+
  • Collection: "embeddings"
  • Dimensions: 384 (BAAI/bge-small-en-v1.5)
  • Index: IVF_FLAT (nlist=128)
  • Metric: L2 distance
  • Architecture: Partition-based multi-tenancy

Partitions:

  • Each chatbot = separate partition
  • Naming: User_{user_id}_Project_{project_id}
  • Benefits: 10-100x faster search, data isolation
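The partition convention and the index configuration can be captured as small helpers. The dictionaries use the shape pymilvus accepts for index and search parameters. At query time IVF_FLAT scans only `nprobe` of the 128 clusters, trading recall for speed. `nprobe=16` is an assumed value, since the source gives only `nlist=128`. The sanitizing step is also an added assumption for IDs containing characters Milvus rejects in names. Note that it could map distinct IDs such as `a-b` and `a_b` to the same partition.

```python
import re

# IVF_FLAT with nlist=128 and L2 distance, as configured above
INDEX_PARAMS = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}}
# nprobe (clusters scanned per query) is an assumed value; the source does not state it
SEARCH_PARAMS = {"metric_type": "L2", "params": {"nprobe": 16}}


def _sanitize(value: str) -> str:
    # Milvus names allow only letters, digits and underscores
    return re.sub(r"[^0-9A-Za-z_]", "_", value)


def partition_name(user_id: str, project_id: str) -> str:
    """Build the per-chatbot partition name User_{user_id}_Project_{project_id}."""
    return f"User_{_sanitize(user_id)}_Project_{_sanitize(project_id)}"
```

With the pymilvus ORM API these would be passed as `collection.create_index("embedding", INDEX_PARAMS)` and `collection.search([query_vec], "embedding", SEARCH_PARAMS, limit=5, partition_names=[partition_name(uid, pid)])`.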

Schema:

{
  "id": INT64,
  "document_id": VARCHAR(100),
  "user_id": VARCHAR(100),
  "project_id": VARCHAR(100),  # Partition key
  "chunk_index": INT32,
  "text": VARCHAR(2000),
  "embedding": FLOAT_VECTOR(384),
  "data_type": VARCHAR(50),
  "source_url": VARCHAR(500),
  "created_at": VARCHAR(100)
}

Performance:

  • p50 search latency: 15ms
  • p95 search latency: 35ms
  • Scales to 100M+ vectors

Azure Blob Storage

Purpose: Store files and audio

Containers:

  • audio-files/ - TTS-generated audio (WAV, 7-day retention)
  • documents/ - Uploaded PDFs, text files
  • avatars/ - 3D avatar models (GLB format)

Configuration:

  • Tier: Hot (frequent access)
  • Redundancy: LRS (Locally Redundant Storage)
  • CDN: Azure CDN for fast delivery
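The 7-day retention on `audio-files/` is the kind of rule Azure Storage lifecycle management expresses declaratively. Whether the platform actually uses a lifecycle policy, rather than an application-side cleanup job, is an assumption. The rule name below is hypothetical. The sketch builds the policy document in Python and serializes it to JSON. In `prefixMatch`, the prefix begins with the container name.

```python
import json

policy = {
    "rules": [
        {
            "enabled": True,
            "name": "expire-tts-audio",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["audio-files/"]},
                "actions": {
                    "baseBlob": {"delete": {"daysAfterModificationGreaterThan": 7}}
                },
            },
        }
    ]
}

print(json.dumps(policy, indent=2))
```

The resulting JSON could be applied with `az storage account management-policy create --account-name <account> --resource-group <group> --policy @policy.json`.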

Data Flow Diagrams

User Query → AI Response Flow

sequenceDiagram
    participant U as User
    participant F as Frontend
    participant G as Gateway
    participant A as Auth
    participant R as Response Service
    participant M as Milvus
    participant L as LLM Service
    participant O as OpenAI
    participant T as Azure TTS
    participant B as Blob Storage

    U->>F: "What is your refund policy?"
    F->>G: POST /api/3d-chat
    G->>A: Verify JWT token
    A-->>G: Token valid
    G->>R: Forward request

    R->>R: Embed query (384-dim)
    R->>M: Search partition (top-5)
    M-->>R: Return relevant chunks

    R->>L: Call LLM with context
    L->>O: API call (GPT-3.5)
    O-->>L: AI response
    L-->>R: Response text

    R->>T: Text-to-speech
    T-->>R: Audio WAV
    R->>B: Upload audio
    B-->>R: Audio URL

    R-->>G: Response + URL + visemes
    G-->>F: JSON response
    F-->>U: Display answer + play audio

Document Upload → Embedding Flow

sequenceDiagram
    participant U as User
    participant F as Frontend
    participant D as Data Crawling
    participant E as Embedder
    participant M as Milvus
    participant C as Cosmos DB

    U->>F: Upload PDF
    F->>D: POST /crawl-data (PDF)

    D->>D: Extract text
    D->>D: Chunk (1000/200)

    loop For each chunk
        D->>E: Generate embedding
        E-->>D: 384-dim vector
    end

    D->>M: Create partition if needed
    D->>M: Bulk insert embeddings
    M-->>D: Insert success

    D->>C: Save metadata
    C-->>D: Save success

    D-->>F: Success response
    F-->>U: "Document processed!"

Technology Stack Summary

Frontend

Technology | Version | Purpose
--- | --- | ---
Next.js | 14.1.6 | React framework, SSR/SSG
React | 18.2.0 | UI library
TypeScript | 5.x | Type safety
Three.js | 0.171.0 | 3D avatar rendering
Tailwind CSS | 3.4.1 | Styling
Axios | 1.6.7 | HTTP client
Framer Motion | 11.0.8 | Animations

Backend

Technology | Version | Purpose
--- | --- | ---
FastAPI | Latest | Python web framework
Python | 3.9+ | Programming language
Uvicorn | Latest | ASGI server
PyMongo | Latest | MongoDB driver
Pymilvus | Latest | Milvus client

AI/ML

Technology | Version | Purpose
--- | --- | ---
OpenAI API | Latest | GPT-4, GPT-3.5
BAAI/bge-small-en-v1.5 | v1.5 | Embedding model (384-dim)
Azure TTS | Latest | Text-to-speech
Whisper API | Latest | Speech-to-text

Data

Technology | Version | Purpose
--- | --- | ---
Azure Cosmos DB | Latest | MongoDB-compatible DB
Milvus | 2.3+ | Vector database
Azure Blob Storage | Latest | File storage
etcd | Latest | Milvus metadata
MinIO | Latest | Milvus object storage

DevOps

Technology | Version | Purpose
--- | --- | ---
Docker | Latest | Containerization
Azure Container Apps | Latest | Hosting
GitHub Actions | Latest | CI/CD
DataDog | Latest | APM, monitoring
Loki | Latest | Logging

Scalability & Performance

Current Scale

  • Users: 5,000+
  • Chatbots: 500+
  • Conversations/Month: 150,000+
  • Documents: 500,000+
  • Embeddings: 25M vectors

Performance Metrics

Metric | p50 | p95 | Target
--- | --- | --- | ---
End-to-end response | 1.8s | 3.5s | < 3s
Milvus search | 15ms | 35ms | < 50ms
Cosmos DB query | 28ms | 62ms | < 100ms
LLM API call | 800ms | 1500ms | < 2s

Horizontal Scaling

  • Microservices: Each service independently scalable
  • Auto-scaling: 2-10 replicas based on CPU/memory
  • Milvus: Can add query nodes for read scaling
  • Cosmos DB: Auto-scales RU/s (400-4000)
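CPU-based replica counts are commonly computed with the Kubernetes Horizontal Pod Autoscaler formula, desired = ceil(current × observed / target), then clamped to the configured bounds (2 to 10 replicas here). Treat this as an approximation of how the platform's CPU/memory scale rules behave. The 70% CPU target is a hypothetical value.

```python
import math


def desired_replicas(current: int, cpu_utilization: float, target: float = 0.7,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """HPA-style rule: ceil(current * observed / target), clamped to [min, max] replicas.

    cpu_utilization and target are fractions (0.9 means 90% average CPU).
    """
    wanted = math.ceil(current * cpu_utilization / target)
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 4 replicas at 90% CPU against a 70% target scale out to 6. A nearly idle service never drops below the 2-replica floor.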


Review Cycle: Quarterly
Next Review: 2026-03-31


"20+ microservices, one powerful platform."