Product Architecture Overview¶
Purpose: High-level system architecture and component breakdown
Audience: Technical Team, Architects, CTO, Technical Investors
Owner: Solution Architect, CTO
Last Updated: 2025-12-26
Version: 1.0
Executive Summary¶
MachineAvatars is built on a modern microservices architecture with 23 independent services, leveraging Azure cloud infrastructure, multi-LLM AI capabilities, and advanced RAG (Retrieval-Augmented Generation) for intelligent conversations.
Architecture Highlights:
- ✅ 23 Microservices - Independent, scalable, fault-tolerant
- ✅ Multi-LLM Strategy - 9 AI models across 4 providers
- ✅ RAG Pipeline - Milvus vector database with 384-dim embeddings
- ✅ Next.js Frontend - SSR/SSG for performance and SEO
- ✅ Azure Cloud - Container Apps, Cosmos DB, Blob Storage
- ✅ 99.7% Uptime - Production-proven reliability
High-Level Architecture¶
graph TB
subgraph "User Layer"
A[Web Browser]
B[Mobile App - Future]
end
subgraph "Frontend - Next.js 14"
C[Next.js Application]
D[3D Avatar Renderer<br/>Three.js]
E[API Client<br/>Axios]
end
subgraph "API Gateway"
F[Gateway Service<br/>Port 8000]
end
subgraph "Authentication"
G[Auth Service<br/>Port 8001<br/>JWT]
end
subgraph "Core Services"
H[User Service<br/>Port 8002]
I[Create Chatbot<br/>Port 8003]
J[Selection Service<br/>Port 8004]
end
subgraph "Chatbot Services - 3 Types"
K[3D Chatbot State<br/>Port 8006]
L[Text Chatbot State<br/>Port 8007]
M[Voice Chatbot State<br/>Port 8008]
N[3D Response<br/>Port 8011]
O[Text Response<br/>Port 8012]
P[Voice Response<br/>Port 8013]
end
subgraph "AI/ML Layer"
Q[LLM Model Service<br/>Port 8016]
R[Data Crawling<br/>Port 8005]
S[System Prompts<br/>Port 8009]
end
subgraph "Supporting Services"
T[Chat History<br/>Port 8014]
U[Payment Service<br/>Port 8019]
V[Analytics<br/>Port 8015]
end
subgraph "Data Layer"
W[(Azure Cosmos DB<br/>MongoDB API)]
X[(Milvus<br/>Vector Database)]
Y[Azure Blob Storage]
end
subgraph "External AI APIs"
Z1[OpenAI<br/>GPT-4, GPT-3.5]
Z2[Anthropic<br/>Claude]
Z3[Google<br/>Gemini]
Z4[Azure ML<br/>Llama, DeepSeek]
end
A --> C
C --> E
E --> F
F --> G
F --> H
F --> I
F --> J
F --> K
F --> L
F --> M
F --> N
F --> O
F --> P
F --> T
F --> U
N --> Q
O --> Q
P --> Q
Q --> Z1
Q --> Z2
Q --> Z3
Q --> Z4
N --> X
O --> X
P --> X
R --> W
R --> X
H --> W
I --> W
J --> W
K --> W
L --> W
M --> W
T --> W
U --> W
N --> Y
P --> Y
style F fill:#FFE082
style Q fill:#FFF3E0
style W fill:#CE93D8
style X fill:#90CAF9
System Components¶
1. Frontend Layer¶
Next.js Application (Port: 3000)¶
Technology:
- Next.js 14.1.6 (App Router)
- React 18.2.0
- TypeScript 5.x
- Tailwind CSS 3.4.1
Key Features:
- SSR/SSG: Server-side rendering for SEO, static generation for marketing pages
- 3D Avatars: Three.js 0.171.0 for avatar rendering (60 FPS)
- Real-Time: WebSocket support for voice chatbot streaming
- Responsive: Mobile-first design, works on all devices
Architecture Pattern:
app/
├── (marketing)/ # Public pages (SSG)
│ ├── page.tsx # Homepage
│ ├── pricing/
│ └── features/
├── dashboard/ # Auth-required (SSR)
│ ├── chatbots/
│ └── settings/
└── api/ # Backend-for-frontend
├── auth/
└── proxy/
Deployment: Vercel (free tier) / Azure Static Web Apps (production)
2. API Gateway Layer¶
Gateway Service (Port 8000)¶
Purpose: Single entry point for all backend API calls
Responsibilities:
- Routing: Directs requests to appropriate microservices
- Authentication: Validates JWT tokens via Auth Service
- Rate Limiting: Prevents abuse (100 requests/min per user)
- CORS: Handles cross-origin requests
- Request/Response Logging: Centralized logging
Technology:
- FastAPI (Python)
- Uvicorn (ASGI server)
Example Flow:
3. Authentication & User Management¶
Auth Service (Port 8001)¶
Purpose: Authentication, authorization, token management
Features:
- JWT Tokens: Secure, stateless authentication
- Login/Signup: Email + password (bcrypt hashing)
- OTP Verification: For email verification
- Token Refresh: Automatic token renewal
- Session Management: Track active sessions
Database: users collection in Cosmos DB
Security:
- Password hashing (bcrypt, 12 rounds)
- Rate limiting (5 login attempts per 15 minutes)
- Account lockout after failed attempts
User Service (Port 8002)¶
Purpose: User profile management, settings
Features:
- Update user profile
- Manage subscription info
- Email notifications
- User preferences
Database: users collection
4. Chatbot Management¶
Create Chatbot Service (Port 8003)¶
Purpose: Create new chatbot instances
Flow:
- User selects chatbot type (3D, Text, Voice)
- Chooses avatar (for 3D)
- Sets personality/tone
- System creates project_id
- Initializes database collections
Database: chatbot_selection, projectid_creation
Selection Service (Port 8004)¶
Purpose: Retrieve chatbot configurations
Features:
- Get chatbot details by project_id
- List all chatbots for a user
- Update chatbot settings
- Delete chatbots (soft delete)
5. Chatbot State Management¶
Purpose: Manage chatbot state and session data
State Services (Ports 8006, 8007, 8008)¶
- 3D Chatbot State (8006): Avatar animation states, lip-sync data
- Text Chatbot State (8007): Message history, typing indicators
- Voice Chatbot State (8008): Audio session, transcription state
Shared Functionality:
- Session initialization
- State persistence
- Context management
6. Chatbot Response Generation¶
Response 3D Chatbot Service (Port 8011) ⭐ CORE SERVICE¶
Purpose: Generate AI responses for 3D avatar chatbots
Flow:
sequenceDiagram
User->>3D Response: Send message
3D Response->>Milvus: Search embeddings (top-5)
Milvus-->>3D Response: Return relevant chunks
3D Response->>LLM Service: Generate response (context + query)
LLM Service-->>3D Response: AI answer
3D Response->>Azure TTS: Text-to-speech
Azure TTS-->>3D Response: Audio (WAV)
3D Response->>Blob Storage: Upload audio
3D Response-->>User: Response + audio URL + visemes
Key Features:
- RAG pipeline (Milvus vector search)
- Multi-LLM support (user-selectable model)
- Voice synthesis (Azure TTS)
- Lip-sync data (visemes for avatar animation)
- Conversation history tracking
Performance:
- p50 latency: 1.8 seconds
- p95 latency: 3.5 seconds
Response Text Chatbot Service (Port 8012)¶
Purpose: Text-only chatbot responses
Simpler than 3D:
- No TTS generation
- No visemes
- Faster (p50: 0.8 seconds)
Response Voice Chatbot Service (Port 8013)¶
Purpose: Voice conversation chatbots
Additional Features:
- STT (Speech-to-Text): Whisper API for transcription
- TTS: Azure Neural TTS
- Real-time streaming: WebSocket support (planned)
7. AI/ML Layer¶
LLM Model Service (Port 8016) ⭐ CORE SERVICE¶
Purpose: Unified interface for all LLM providers
Supported Models (10 total):
Azure OpenAI:
- GPT-4-0613 (gpt-4-0613)
- GPT-3.5 Turbo 16K (gpt-35-turbo-16k-0613)
- GPT-4o Mini (gpt-4o-mini-2024-07-18)
- o1-mini (o1-mini-2024-09-12)
Azure ML Endpoints:
- Llama 3.3 70B Instruct
- DeepSeek R1
- Ministral 3B
- Phi-3 Small 8K
External APIs:
- Gemini 2.0 Flash (Google)
- Claude 3.5 Sonnet (Anthropic)
- Grok-3 (xAI)
Routing Logic:
- User-specified model (per chatbot)
- Fallback chain (GPT-4 → GPT-3.5)
- Cost optimization (auto-route simple queries to cheap models)
Configuration:
- Temperature: 0.7 (universal)
- Max tokens: Varies by model
- Retry logic: 3 attempts with exponential backoff
Data Crawling Service (Port 8005)¶
Purpose: Process documents and websites into embeddings
Supported Formats:
- PDFs: Text extraction via PyPDF2
- Text files: Direct processing
- Websites: Crawl up to 50 URLs, extract content
- Q&A pairs: Direct embedding
Pipeline:
- Extract text
- Preprocess (remove HTML, emails, phone numbers)
- Chunk: 1000 characters, 200 overlap
- Embed: BAAI/bge-small-en-v1.5 (384 dimensions)
- Store: Milvus vector database
Configuration:
- Max chunks per document: 50
- Chunk size: 1000 characters
- Overlap: 200 characters
System Prompt Service (Port 8009)¶
Purpose: Manage AI system prompts and guardrails
Features:
- Default prompts (Customer Support, Sales, Custom)
- User-customized prompts
- Guardrails (prohibited topics, keywords)
- Prompt versioning
Database: system_prompts_default, system_prompts_user, guardrails
8. Supporting Services¶
Chat History Service (Port 8014)¶
Purpose: Store and retrieve conversation history
Features:
- Save chat messages (input, output, tokens)
- Retrieve conversation by session_id
- Analytics (token usage, response times)
- TTL (Time-to-Live): 90 days auto-deletion
Database: chatbot_history collection
Schema:
{
"user_id": "...",
"project_id": "...",
"session_id": "...",
"chat_data": [
{
"input_prompt": "...",
"output_response": "...",
"input_tokens": 150,
"output_tokens": 300,
"total_tokens": 450
}
],
"session_total_tokens": 450
}
Payment Service (Port 8019)¶
Purpose: Handle subscriptions and payments
Integration: Razorpay (India payment gateway)
Features:
- Create payment orders
- Verify payment signatures
- Webhook handling (payment success/failure)
- Subscription management
Database: Payment records stored in Cosmos DB
Client Data Collection Service (Port 8015)¶
Purpose: Analytics and usage tracking
Metrics Collected:
- User interactions (clicks, searches)
- Chatbot usage (conversations, messages)
- Performance metrics (response times)
- Error rates
Storage: Cosmos DB + future integration with DataDog
Superadmin Service (Port 8020)¶
Purpose: Administrative dashboard API for platform management
Features:
- Legal document distribution (Terms & Conditions, Privacy Policy PDFs)
- Subscription plan management (dynamic aggregation from 4 collections)
- Admin authentication
- User management & chat history retrieval
Database: 8 MongoDB collections (users, subscriptions, chatbot_history, etc.)
Security: ⚠️ Plain-text password authentication (requires migration to bcrypt)
Homepage Chatbot Service (Port 8021)¶
Purpose: Dedicated chatbot for MachineAgents.ai homepage/landing page
Key Features:
- Intelligent greeting system with UTM-based personalization
- Lead collection & form submission handling
- 7 avatar support (Eva, Shayla, Myra, Chris, Jack, Anu, Emma)
- No RAG/Embeddings - Pure GPT-4 sales assistant
- Dual collections (generate_greeting + generate_greetings for compatibility)
Unique: Homepage-specific with no multi-tenancy (single chatbot)
Integration: TTS + lip-sync for avatar animation, Calendly booking links
Remote Physio Service (Port 8022)¶
Purpose: ⚠️ Client-specific physiotherapy consultation chatbot
Key Features:
- 8-stage state machine (language selection → problem assessment → clinical summary → follow-up)
- Bilingual support (English + Hindi/Hinglish)
- Hybrid RAG (BM25 + Vector search) for exercise/assessment recommendations
- Clinical summary generation with automated medical documentation
- User profile persistence (name, age, weight stored permanently)
- Session inactivity handling (5-minute timeout with return flow)
Integration: Remote Physios API (rp-api.anubhaanant.com, api.remotephysios.com)
Architecture Issue: Hardcoded client-specific logic embedded in main product (should be refactored for multi-tenancy)
9. Data Layer¶
Azure Cosmos DB (MongoDB API)¶
Purpose: Primary database for all metadata
Configuration:
- API: MongoDB (v4.2 compatible)
- Region: Primary - East US, Secondary - Southeast Asia
- Throughput: Autoscale 400-4000 RU/s
- Backup: Continuous (30-day point-in-time restore)
Collections:
users- User accountschatbot_selection- Chatbot configschatbot_history- Conversations (90-day TTL)files- Document metadata with chunksfiles_secondary- Document metadata without chunksprojectid_creation- Project metadatasystem_prompts_default- Default promptssystem_prompts_user- User promptsguardrails- Safety rules- And 4 more...
Performance:
- p50 query latency: 28ms
- p95 query latency: 62ms
- Uptime: 99.98%
Milvus Vector Database¶
Purpose: Store and search document embeddings
Configuration:
- Version: 2.3+
- Collection: "embeddings"
- Dimensions: 384 (BAAI/bge-small-en-v1.5)
- Index: IVF_FLAT (nlist=128)
- Metric: L2 distance
- Architecture: Partition-based multi-tenancy
Partitions:
- Each chatbot = separate partition
- Naming:
User_{user_id}_Project_{project_id} - Benefits: 10-100x faster search, data isolation
Schema:
{
"id": INT64,
"document_id": VARCHAR(100),
"user_id": VARCHAR(100),
"project_id": VARCHAR(100), # Partition key
"chunk_index": INT32,
"text": VARCHAR(2000),
"embedding": FLOAT_VECTOR(384),
"data_type": VARCHAR(50),
"source_url": VARCHAR(500),
"created_at": VARCHAR(100)
}
Performance:
- p50 search latency: 15ms
- p95 search latency: 35ms
- Scales to 100M+ vectors
Azure Blob Storage¶
Purpose: Store files and audio
Containers:
audio-files/- TTS-generated audio (WAV, 7-day retention)documents/- Uploaded PDFs, text filesavatars/- 3D avatar models (GLB format)
Configuration:
- Tier: Hot (frequent access)
- Redundancy: LRS (Locally Redundant Storage)
- CDN: Azure CDN for fast delivery
Data Flow Diagrams¶
User Query → AI Response Flow¶
sequenceDiagram
participant U as User
participant F as Frontend
participant G as Gateway
participant A as Auth
participant R as Response Service
participant M as Milvus
participant L as LLM Service
participant O as OpenAI
participant T as Azure TTS
participant B as Blob Storage
U->>F: "What is your refund policy?"
F->>G: POST /api/3d-chat
G->>A: Verify JWT token
A-->>G: Token valid
G->>R: Forward request
R->>M: Embed query (384-dim)
R->>M: Search partition (top-5)
M-->>R: Return relevant chunks
R->>L: Call LLM with context
L->>O: API call (GPT-3.5)
O-->>L: AI response
L-->>R: Response text
R->>T: Text-to-speech
T-->>R: Audio WAV
R->>B: Upload audio
B-->>R: Audio URL
R-->>G: Response + URL + visemes
G-->>F: JSON response
F-->>U: Display answer + play audio
Document Upload → Embedding Flow¶
sequenceDiagram
participant U as User
participant F as Frontend
participant D as Data Crawling
participant E as Embedder
participant M as Milvus
participant C as Cosmos DB
U->>F: Upload PDF
F->>D: POST /crawl-data (PDF)
D->>D: Extract text
D->>D: Chunk (1000/200)
loop For each chunk
D->>E: Generate embedding
E-->>D: 384-dim vector
end
D->>M: Create partition if needed
D->>M: Bulk insert embeddings
M-->>D: Insert success
D->>C: Save metadata
C-->>D: Save success
D-->>F: Success response
F-->>U: "Document processed!"
Technology Stack Summary¶
Frontend¶
| Technology | Version | Purpose |
|---|---|---|
| Next.js | 14.1.6 | React framework, SSR/SSG |
| React | 18.2.0 | UI library |
| TypeScript | 5.x | Type safety |
| Three.js | 0.171.0 | 3D avatar rendering |
| Tailwind CSS | 3.4.1 | Styling |
| Axios | 1.6.7 | HTTP client |
| Framer Motion | 11.0.8 | Animations |
Backend¶
| Technology | Version | Purpose |
|---|---|---|
| FastAPI | Latest | Python web framework |
| Python | 3.9+ | Programming language |
| Uvicorn | Latest | ASGI server |
| PyMongo | Latest | MongoDB driver |
| Pymilvus | Latest | Milvus client |
AI/ML¶
| Technology | Version | Purpose |
|---|---|---|
| OpenAI API | Latest | GPT-4, GPT-3.5 |
| BAAI/bge-small-en-v1.5 | v1.5 | Embedding model (384-dim) |
| Azure TTS | Latest | Text-to-speech |
| Whisper API | Latest | Speech-to-text |
Data¶
| Technology | Version | Purpose |
|---|---|---|
| Azure Cosmos DB | Latest | MongoDB-compatible DB |
| Milvus | 2.3+ | Vector database |
| Azure Blob Storage | Latest | File storage |
| etcd | Latest | Milvus metadata |
| MinIO | Latest | Milvus object storage |
DevOps¶
| Technology | Version | Purpose |
|---|---|---|
| Docker | Latest | Containerization |
| Azure Container Apps | Latest | Hosting |
| GitHub Actions | Latest | CI/CD |
| DataDog | Latest | APM, monitoring |
| Loki | Latest | Logging |
Scalability & Performance¶
Current Scale¶
- Users: 5,000+
- Chatbots: 500+
- Conversations/Month: 150,000+
- Documents: 500,000+
- Embeddings: 25M vectors
Performance Metrics¶
| Metric | p50 | p95 | Target |
|---|---|---|---|
| End-to-end response | 1.8s | 3.5s | < 3s |
| Milvus search | 15ms | 35ms | < 50ms |
| Cosmos DB query | 28ms | 62ms | < 100ms |
| LLM API call | 800ms | 1500ms | < 2s |
Horizontal Scaling¶
- Microservices: Each service independently scalable
- Auto-scaling: 2-10 replicas based on CPU/memory
- Milvus: Can add query nodes for read scaling
- Cosmos DB: Auto-scales RU/s (400-4000)
Related Documentation¶
Last Updated: 2025-12-26
Version: 1.0
Review Cycle: Quarterly
Next Review: 2025-03-31
"20+ microservices, one powerful platform."