Product Architecture Overview¶

Purpose: High-level system architecture and component breakdown
Audience: Technical Team, Architects, CTO, Technical Investors
Owner: Solution Architect, CTO
Last Updated: 2025-12-26
Version: 1.0

Executive Summary¶

MachineAvatars is built on a modern microservices architecture with 23 independent services, leveraging Azure cloud infrastructure, multi-LLM AI capabilities, and advanced RAG (Retrieval-Augmented Generation) for intelligent conversations.

Architecture Highlights:

✅ 23 Microservices - Independent, scalable, fault-tolerant
✅ Multi-LLM Strategy - 9 AI models across 4 providers
✅ RAG Pipeline - Milvus vector database with 384-dim embeddings
✅ Next.js Frontend - SSR/SSG for performance and SEO
✅ Azure Cloud - Container Apps, Cosmos DB, Blob Storage
✅ 99.7% Uptime - Production-proven reliability

High-Level Architecture¶

graph TB
    subgraph "User Layer"
        A[Web Browser]
        B[Mobile App - Future]
    end

    subgraph "Frontend - Next.js 14"
        C[Next.js Application]
        D[3D Avatar Renderer<br/>Three.js]
        E[API Client<br/>Axios]
    end

    subgraph "API Gateway"
        F[Gateway Service<br/>Port 8000]
    end

    subgraph "Authentication"
        G[Auth Service<br/>Port 8001<br/>JWT]
    end

    subgraph "Core Services"
        H[User Service<br/>Port 8002]
        I[Create Chatbot<br/>Port 8003]
        J[Selection Service<br/>Port 8004]
    end

    subgraph "Chatbot Services - 3 Types"
        K[3D Chatbot State<br/>Port 8006]
        L[Text Chatbot State<br/>Port 8007]
        M[Voice Chatbot State<br/>Port 8008]
        N[3D Response<br/>Port 8011]
        O[Text Response<br/>Port 8012]
        P[Voice Response<br/>Port 8013]
    end

    subgraph "AI/ML Layer"
        Q[LLM Model Service<br/>Port 8016]
        R[Data Crawling<br/>Port 8005]
        S[System Prompts<br/>Port 8009]
    end

    subgraph "Supporting Services"
        T[Chat History<br/>Port 8014]
        U[Payment Service<br/>Port 8019]
        V[Analytics<br/>Port 8015]
    end

    subgraph "Data Layer"
        W[(Azure Cosmos DB<br/>MongoDB API)]
        X[(Milvus<br/>Vector Database)]
        Y[Azure Blob Storage]
    end

    subgraph "External AI APIs"
        Z1[OpenAI<br/>GPT-4, GPT-3.5]
        Z2[Anthropic<br/>Claude]
        Z3[Google<br/>Gemini]
        Z4[Azure ML<br/>Llama, DeepSeek]
    end

    A --> C
    C --> E
    E --> F
    F --> G
    F --> H
    F --> I
    F --> J
    F --> K
    F --> L
    F --> M
    F --> N
    F --> O
    F --> P
    F --> T
    F --> U

    N --> Q
    O --> Q
    P --> Q

    Q --> Z1
    Q --> Z2
    Q --> Z3
    Q --> Z4

    N --> X
    O --> X
    P --> X

    R --> W
    R --> X

    H --> W
    I --> W
    J --> W
    K --> W
    L --> W
    M --> W
    T --> W
    U --> W

    N --> Y
    P --> Y

    style F fill:#FFE082
    style Q fill:#FFF3E0
    style W fill:#CE93D8
    style X fill:#90CAF9

System Components¶

1. Frontend Layer¶

Next.js Application (Port: 3000)¶

Technology:

Next.js 14.1.6 (App Router)
React 18.2.0
TypeScript 5.x
Tailwind CSS 3.4.1

Key Features:

SSR/SSG: Server-side rendering for SEO, static generation for marketing pages
3D Avatars: Three.js 0.171.0 for avatar rendering (60 FPS)
Real-Time: WebSocket support for voice chatbot streaming
Responsive: Mobile-first design, works on all devices

Architecture Pattern:

app/
├── (marketing)/        # Public pages (SSG)
│   ├── page.tsx       # Homepage
│   ├── pricing/
│   └── features/
├── dashboard/          # Auth-required (SSR)
│   ├── chatbots/
│   └── settings/
└── api/                # Backend-for-frontend
    ├── auth/
    └── proxy/

Deployment: Vercel (free tier) / Azure Static Web Apps (production)

2. API Gateway Layer¶

Gateway Service (Port 8000)¶

Purpose: Single entry point for all backend API calls

Responsibilities:

Routing: Directs requests to appropriate microservices
Authentication: Validates JWT tokens via Auth Service
Rate Limiting: Prevents abuse (100 requests/min per user)
CORS: Handles cross-origin requests
Request/Response Logging: Centralized logging

Technology:

FastAPI (Python)
Uvicorn (ASGI server)

Example Flow:

Client → Gateway (8000) → Auth (8001) → Response 3D (8011) → Client

3. Authentication & User Management¶

Auth Service (Port 8001)¶

Purpose: Authentication, authorization, token management

Features:

JWT Tokens: Secure, stateless authentication
Login/Signup: Email + password (bcrypt hashing)
OTP Verification: For email verification
Token Refresh: Automatic token renewal
Session Management: Track active sessions

Database: users collection in Cosmos DB

Security:

Password hashing (bcrypt, 12 rounds)
Rate limiting (5 login attempts per 15 minutes)
Account lockout after failed attempts

User Service (Port 8002)¶

Purpose: User profile management, settings

Features:

Update user profile
Manage subscription info
Email notifications
User preferences

Database: users collection

4. Chatbot Management¶

Create Chatbot Service (Port 8003)¶

Purpose: Create new chatbot instances

Flow:

User selects chatbot type (3D, Text, Voice)
Chooses avatar (for 3D)
Sets personality/tone
System creates project_id
Initializes database collections

Database: chatbot_selection, projectid_creation

Selection Service (Port 8004)¶

Purpose: Retrieve chatbot configurations

Features:

Get chatbot details by project_id
List all chatbots for a user
Update chatbot settings
Delete chatbots (soft delete)

5. Chatbot State Management¶

Purpose: Manage chatbot state and session data

State Services (Ports 8006, 8007, 8008)¶

3D Chatbot State (8006): Avatar animation states, lip-sync data
Text Chatbot State (8007): Message history, typing indicators
Voice Chatbot State (8008): Audio session, transcription state

Shared Functionality:

Session initialization
State persistence
Context management

6. Chatbot Response Generation¶

Response 3D Chatbot Service (Port 8011) ⭐ CORE SERVICE¶

Purpose: Generate AI responses for 3D avatar chatbots

Flow:

sequenceDiagram
    User->>3D Response: Send message
    3D Response->>Milvus: Search embeddings (top-5)
    Milvus-->>3D Response: Return relevant chunks
    3D Response->>LLM Service: Generate response (context + query)
    LLM Service-->>3D Response: AI answer
    3D Response->>Azure TTS: Text-to-speech
    Azure TTS-->>3D Response: Audio (WAV)
    3D Response->>Blob Storage: Upload audio
    3D Response-->>User: Response + audio URL + visemes

Key Features:

RAG pipeline (Milvus vector search)
Multi-LLM support (user-selectable model)
Voice synthesis (Azure TTS)
Lip-sync data (visemes for avatar animation)
Conversation history tracking

Performance:

p50 latency: 1.8 seconds
p95 latency: 3.5 seconds

Response Text Chatbot Service (Port 8012)¶

Purpose: Text-only chatbot responses

Simpler than 3D:

No TTS generation
No visemes
Faster (p50: 0.8 seconds)

Response Voice Chatbot Service (Port 8013)¶

Purpose: Voice conversation chatbots

Additional Features:

STT (Speech-to-Text): Whisper API for transcription
TTS: Azure Neural TTS
Real-time streaming: WebSocket support (planned)

7. AI/ML Layer¶

LLM Model Service (Port 8016) ⭐ CORE SERVICE¶

Purpose: Unified interface for all LLM providers

Supported Models (10 total):

Azure OpenAI:

GPT-4-0613 (gpt-4-0613)
GPT-3.5 Turbo 16K (gpt-35-turbo-16k-0613)
GPT-4o Mini (gpt-4o-mini-2024-07-18)
o1-mini (o1-mini-2024-09-12)

Azure ML Endpoints:

Llama 3.3 70B Instruct
DeepSeek R1
Ministral 3B
Phi-3 Small 8K

External APIs:

Gemini 2.0 Flash (Google)
Claude 3.5 Sonnet (Anthropic)
Grok-3 (xAI)

Routing Logic:

User-specified model (per chatbot)
Fallback chain (GPT-4 → GPT-3.5)
Cost optimization (auto-route simple queries to cheap models)

Configuration:

Temperature: 0.7 (universal)
Max tokens: Varies by model
Retry logic: 3 attempts with exponential backoff

Data Crawling Service (Port 8005)¶

Purpose: Process documents and websites into embeddings

Supported Formats:

PDFs: Text extraction via PyPDF2
Text files: Direct processing
Websites: Crawl up to 50 URLs, extract content
Q&A pairs: Direct embedding

Pipeline:

Extract text
Preprocess (remove HTML, emails, phone numbers)
Chunk: 1000 characters, 200 overlap
Embed: BAAI/bge-small-en-v1.5 (384 dimensions)
Store: Milvus vector database

Configuration:

Max chunks per document: 50
Chunk size: 1000 characters
Overlap: 200 characters

System Prompt Service (Port 8009)¶

Purpose: Manage AI system prompts and guardrails

Features:

Default prompts (Customer Support, Sales, Custom)
User-customized prompts
Guardrails (prohibited topics, keywords)
Prompt versioning

Database: system_prompts_default, system_prompts_user, guardrails

8. Supporting Services¶

Chat History Service (Port 8014)¶

Purpose: Store and retrieve conversation history

Features:

Save chat messages (input, output, tokens)
Retrieve conversation by session_id
Analytics (token usage, response times)
TTL (Time-to-Live): 90 days auto-deletion

Database: chatbot_history collection

Schema:

{
  "user_id": "...",
  "project_id": "...",
  "session_id": "...",
  "chat_data": [
    {
      "input_prompt": "...",
      "output_response": "...",
      "input_tokens": 150,
      "output_tokens": 300,
      "total_tokens": 450
    }
  ],
  "session_total_tokens": 450
}

Payment Service (Port 8019)¶

Purpose: Handle subscriptions and payments

Integration: Razorpay (India payment gateway)

Features:

Create payment orders
Verify payment signatures
Webhook handling (payment success/failure)
Subscription management

Database: Payment records stored in Cosmos DB

Client Data Collection Service (Port 8015)¶

Purpose: Analytics and usage tracking

Metrics Collected:

User interactions (clicks, searches)
Chatbot usage (conversations, messages)
Performance metrics (response times)
Error rates

Storage: Cosmos DB + future integration with DataDog

Superadmin Service (Port 8020)¶

Purpose: Administrative dashboard API for platform management

Features:

Legal document distribution (Terms & Conditions, Privacy Policy PDFs)
Subscription plan management (dynamic aggregation from 4 collections)
Admin authentication
User management & chat history retrieval

Database: 8 MongoDB collections (users, subscriptions, chatbot_history, etc.)

Security: ⚠️ Plain-text password authentication (requires migration to bcrypt)

Homepage Chatbot Service (Port 8021)¶

Purpose: Dedicated chatbot for MachineAgents.ai homepage/landing page

Key Features:

Intelligent greeting system with UTM-based personalization
Lead collection & form submission handling
7 avatar support (Eva, Shayla, Myra, Chris, Jack, Anu, Emma)
No RAG/Embeddings - Pure GPT-4 sales assistant
Dual collections (generate_greeting + generate_greetings for compatibility)

Unique: Homepage-specific with no multi-tenancy (single chatbot)

Integration: TTS + lip-sync for avatar animation, Calendly booking links

Remote Physio Service (Port 8022)¶

Purpose: ⚠️ Client-specific physiotherapy consultation chatbot

Key Features:

8-stage state machine (language selection → problem assessment → clinical summary → follow-up)
Bilingual support (English + Hindi/Hinglish)
Hybrid RAG (BM25 + Vector search) for exercise/assessment recommendations
Clinical summary generation with automated medical documentation
User profile persistence (name, age, weight stored permanently)
Session inactivity handling (5-minute timeout with return flow)

Integration: Remote Physios API (rp-api.anubhaanant.com, api.remotephysios.com)

Architecture Issue: Hardcoded client-specific logic embedded in main product (should be refactored for multi-tenancy)

9. Data Layer¶

Azure Cosmos DB (MongoDB API)¶

Purpose: Primary database for all metadata

Configuration:

API: MongoDB (v4.2 compatible)
Region: Primary - East US, Secondary - Southeast Asia
Throughput: Autoscale 400-4000 RU/s
Backup: Continuous (30-day point-in-time restore)

Collections:

users - User accounts
chatbot_selection - Chatbot configs
chatbot_history - Conversations (90-day TTL)
files - Document metadata with chunks
files_secondary - Document metadata without chunks
projectid_creation - Project metadata
system_prompts_default - Default prompts
system_prompts_user - User prompts
guardrails - Safety rules
And 4 more...

Performance:

p50 query latency: 28ms
p95 query latency: 62ms
Uptime: 99.98%

Milvus Vector Database¶

Purpose: Store and search document embeddings

Configuration:

Version: 2.3+
Collection: "embeddings"
Dimensions: 384 (BAAI/bge-small-en-v1.5)
Index: IVF_FLAT (nlist=128)
Metric: L2 distance
Architecture: Partition-based multi-tenancy

Partitions:

Each chatbot = separate partition
Naming: User_{user_id}_Project_{project_id}
Benefits: 10-100x faster search, data isolation

Schema:

{
  "id": INT64,
  "document_id": VARCHAR(100),
  "user_id": VARCHAR(100),
  "project_id": VARCHAR(100),  # Partition key
  "chunk_index": INT32,
  "text": VARCHAR(2000),
  "embedding": FLOAT_VECTOR(384),
  "data_type": VARCHAR(50),
  "source_url": VARCHAR(500),
  "created_at": VARCHAR(100)
}

Performance:

p50 search latency: 15ms
p95 search latency: 35ms
Scales to 100M+ vectors

Azure Blob Storage¶

Purpose: Store files and audio

Containers:

audio-files/ - TTS-generated audio (WAV, 7-day retention)
documents/ - Uploaded PDFs, text files
avatars/ - 3D avatar models (GLB format)

Configuration:

Tier: Hot (frequent access)
Redundancy: LRS (Locally Redundant Storage)
CDN: Azure CDN for fast delivery

Data Flow Diagrams¶

User Query → AI Response Flow¶

sequenceDiagram
    participant U as User
    participant F as Frontend
    participant G as Gateway
    participant A as Auth
    participant R as Response Service
    participant M as Milvus
    participant L as LLM Service
    participant O as OpenAI
    participant T as Azure TTS
    participant B as Blob Storage

    U->>F: "What is your refund policy?"
    F->>G: POST /api/3d-chat
    G->>A: Verify JWT token
    A-->>G: Token valid
    G->>R: Forward request

    R->>M: Embed query (384-dim)
    R->>M: Search partition (top-5)
    M-->>R: Return relevant chunks

    R->>L: Call LLM with context
    L->>O: API call (GPT-3.5)
    O-->>L: AI response
    L-->>R: Response text

    R->>T: Text-to-speech
    T-->>R: Audio WAV
    R->>B: Upload audio
    B-->>R: Audio URL

    R-->>G: Response + URL + visemes
    G-->>F: JSON response
    F-->>U: Display answer + play audio

Document Upload → Embedding Flow¶

sequenceDiagram
    participant U as User
    participant F as Frontend
    participant D as Data Crawling
    participant E as Embedder
    participant M as Milvus
    participant C as Cosmos DB

    U->>F: Upload PDF
    F->>D: POST /crawl-data (PDF)

    D->>D: Extract text
    D->>D: Chunk (1000/200)

    loop For each chunk
        D->>E: Generate embedding
        E-->>D: 384-dim vector
    end

    D->>M: Create partition if needed
    D->>M: Bulk insert embeddings
    M-->>D: Insert success

    D->>C: Save metadata
    C-->>D: Save success

    D-->>F: Success response
    F-->>U: "Document processed!"

Technology Stack Summary¶

Frontend¶

Technology	Version	Purpose
Next.js	14.1.6	React framework, SSR/SSG
React	18.2.0	UI library
TypeScript	5.x	Type safety
Three.js	0.171.0	3D avatar rendering
Tailwind CSS	3.4.1	Styling
Axios	1.6.7	HTTP client
Framer Motion	11.0.8	Animations

Backend¶

Technology	Version	Purpose
FastAPI	Latest	Python web framework
Python	3.9+	Programming language
Uvicorn	Latest	ASGI server
PyMongo	Latest	MongoDB driver
Pymilvus	Latest	Milvus client

AI/ML¶

Technology	Version	Purpose
OpenAI API	Latest	GPT-4, GPT-3.5
BAAI/bge-small-en-v1.5	v1.5	Embedding model (384-dim)
Azure TTS	Latest	Text-to-speech
Whisper API	Latest	Speech-to-text

Data¶

Technology	Version	Purpose
Azure Cosmos DB	Latest	MongoDB-compatible DB
Milvus	2.3+	Vector database
Azure Blob Storage	Latest	File storage
etcd	Latest	Milvus metadata
MinIO	Latest	Milvus object storage

DevOps¶

Technology	Version	Purpose
Docker	Latest	Containerization
Azure Container Apps	Latest	Hosting
GitHub Actions	Latest	CI/CD
DataDog	Latest	APM, monitoring
Loki	Latest	Logging

Scalability & Performance¶

Current Scale¶

Users: 5,000+
Chatbots: 500+
Conversations/Month: 150,000+
Documents: 500,000+
Embeddings: 25M vectors

Performance Metrics¶

Metric	p50	p95	Target
End-to-end response	1.8s	3.5s	< 3s
Milvus search	15ms	35ms	< 50ms
Cosmos DB query	28ms	62ms	< 100ms
LLM API call	800ms	1500ms	< 2s

Horizontal Scaling¶

Microservices: Each service independently scalable
Auto-scaling: 2-10 replicas based on CPU/memory
Milvus: Can add query nodes for read scaling
Cosmos DB: Auto-scales RU/s (400-4000)

Last Updated: 2025-12-26
Version: 1.0
Review Cycle: Quarterly
Next Review: 2025-03-31

"20+ microservices, one powerful platform."

Product Architecture Overview¶

Executive Summary¶

High-Level Architecture¶

System Components¶

1. Frontend Layer¶

Next.js Application (Port: 3000)¶

2. API Gateway Layer¶

Gateway Service (Port 8000)¶

3. Authentication & User Management¶

Auth Service (Port 8001)¶

User Service (Port 8002)¶

4. Chatbot Management¶

Create Chatbot Service (Port 8003)¶

Selection Service (Port 8004)¶

5. Chatbot State Management¶

State Services (Ports 8006, 8007, 8008)¶

6. Chatbot Response Generation¶

Response 3D Chatbot Service (Port 8011) ⭐ CORE SERVICE¶

Response Text Chatbot Service (Port 8012)¶

Response Voice Chatbot Service (Port 8013)¶

7. AI/ML Layer¶

LLM Model Service (Port 8016) ⭐ CORE SERVICE¶

Data Crawling Service (Port 8005)¶

System Prompt Service (Port 8009)¶

8. Supporting Services¶

Chat History Service (Port 8014)¶

Payment Service (Port 8019)¶

Client Data Collection Service (Port 8015)¶

Superadmin Service (Port 8020)¶

Homepage Chatbot Service (Port 8021)¶

Remote Physio Service (Port 8022)¶

9. Data Layer¶

Azure Cosmos DB (MongoDB API)¶

Milvus Vector Database¶

Azure Blob Storage¶

Data Flow Diagrams¶

User Query → AI Response Flow¶

Document Upload → Embedding Flow¶

Technology Stack Summary¶

Frontend¶

Backend¶

AI/ML¶

Data¶

DevOps¶

Scalability & Performance¶

Current Scale¶

Performance Metrics¶

Horizontal Scaling¶

Related Documentation¶