Database Schema - MongoDB Collections¶
Section: 4-data-architecture-governance
Document: MongoDB Database Schema
Database: Machine_agent_dev (Development), Machine_agent_demo (Production)
Collections: 9 primary collections
Last Updated: 2025-12-30
🎯 Overview¶
This document provides complete schema documentation for all 9 MongoDB collections used in the MachineAvatars platform, including field definitions, data types, indexes, relationships, and sample documents.
Database Provider: Azure Cosmos DB for MongoDB API
API Version: MongoDB 4.0
Region: Central India
Encryption: AES-256 (automatic)
📊 Collections Summary¶
| Collection | Purpose | Avg Docs | Avg Size | Critical Fields |
|---|---|---|---|---|
| users_multichatbot_v2 | User accounts & authentication | ~10K | 2KB | email, user_id, verified |
| chatbot_selections | Chatbot configurations | ~5K | 5KB | project_id, user_id, selection_avatar |
| chatbot_history | Conversation logs | ~500K | 1KB | project_id, session_id, question, answer |
| files | Uploaded file metadata (GridFS) | ~2K | 500B | user_id, project_id, file_type |
| files_secondary | Secondary file storage | ~1K | 500B | user_id, project_id |
| system_prompts_user | Custom system prompts | ~3K | 2KB | user_id, project_id, system_prompt |
| projectid_creation | Project metadata | ~5K | 1KB | user_id, project_id, chatbot_name |
| organisation_data | Enterprise organizations | ~50 | 3KB | organization_id, owner_user_id |
| trash_collection_name | Soft-deleted chatbots (7-day retention) | ~500 | 5KB | user_id, project_id, deleted_at |
1. users_multichatbot_v2¶
Purpose: User accounts, authentication, subscription management
Used By:
- auth-service (login)
- user-service (signup, OTP, features)
- All services (user context validation)
Schema¶
{
// Identity
"_id": ObjectId("65a1b2c3d4e5f6789abc"),
"user_id": "User-123456", // Generated on OTP verification
"email": "user@example.com", // Unique, indexed
"password": "plaintext123", // ⚠️ PLAIN TEXT - SECURITY ISSUE!
"name": "John Doe",
// Verification
"verified": false, // true after OTP verification
"otp": "654321", // 6-digit OTP
"otp_expiration": 1735214460, // Unix timestamp (60 seconds validity)
// Account Status
"account_status": "paused", // "paused" | "active" | "suspended"
"payment_status": "pending", // "pending" | "active" | "failed"
// Subscription
"subscription_id": "sub_009", // Free: sub_009, Pro: sub_001, etc.
"subscription_date": "2025-01-15", // ISO date string
"subscription_plan": "Free", // Plan name (optional, duplicated)
// Timestamps
"user_created_at(DATE)": "2025-01-15", // ISO date string
"created_at": ISODate("2025-01-15T10:00:00.000Z"), // Optional
// Organization (Enterprise only)
"organization_id": "org_abc123", // Optional
"role": "Admin", // "Owner" | "Admin" | "Editor" | "Viewer" | "Analyst"
"department_id": "dept_sales", // Optional
// Consent (GDPR/DPDPA)
"user_consent": {
"data_processing": true, // Required
"marketing_emails": false, // Optional
"analytics_tracking": true, // Optional
"consented_at": "2025-01-15T10:00:00Z",
"consent_version": "v1.0"
}
}
Indexes¶
db.users_multichatbot_v2.createIndex({ email: 1 }, { unique: true });
db.users_multichatbot_v2.createIndex({ user_id: 1 });
db.users_multichatbot_v2.createIndex({ subscription_id: 1 });
db.users_multichatbot_v2.createIndex({ organization_id: 1 });
Sample Document¶
{
"_id": ObjectId("65a1b2c3d4e5f6789abc"),
"user_id": "User-789456",
"email": "sarah.johnson@example.com",
"password": "SecurePass123", // ⚠️ Should be bcrypt hash!
"name": "Sarah Johnson",
"verified": true,
"account_status": "active",
"payment_status": "active",
"subscription_id": "sub_001", // Pro plan
"subscription_plan": "Pro",
"subscription_date": "2025-01-10",
"user_created_at(DATE)": "2025-01-10",
"role": "Owner"
}
Relationships¶
- 1:N with chatbot_selections - One user, many chatbots
- 1:N with chatbot_history - One user, many conversations
- 1:N with projectid_creation - One user, many projects
- 1:1 with features_per_user - One user, one feature set
- N:1 with organisation_data - Many users, one organization (Enterprise)
2. chatbot_selections¶
Purpose: Chatbot configurations and settings
Used By:
- create-chatbot-service (create chatbots)
- chatbot-maintenance-service (CRUD operations)
- selection-3d/text/voice-chatbot-service (get configuration)
Schema¶
{
// Identity
"_id": ObjectId("65a1b2c3d4e5f6789def"),
"user_id": "User-123456",
"project_id": "User-123456_Project_1", // Unique per user
// Avatar Selection (3D Chatbots)
"selection_avatar": "Avatar_Lisa", // "Avatar_Lisa" | "Avatar_Emma" | "Avatar_Jack" | etc.
"selection_voice": "Female_1", // Voice ID
"selection_model": "gpt-4-turbo", // LLM model
// Chatbot Type
"chatbot_type": "3d", // "3d" | "text" | "voice"
// Branding
"chatbot_name": "Customer Support Bot",
"chatbot_logo_url": "https://...", // Optional
"brand_color": "#FF6622", // Hex color
// Behavior
"temperature": 0.7, // LLM temperature (0.0 - 1.0)
"max_tokens": 2048, // Max response length
// Guardrails
"guardrails_enabled": true,
"guardrails_id": "507f1f77bcf86cd799439015", // Reference to chatbot_guardrails
// Greeting
"greeting_message": "Hello! How can I help you today?",
"greeting_audio_url": "https://...", // Optional pre-recorded greeting
// Analytics
"total_conversations": 1234,
"last_active": ISODate("2025-01-15T14:30:00.000Z"),
// Deletion
"isDeleted": false, // Soft delete flag
"deleted_at": null, // Timestamp when deleted
// Timestamps
"created_at": ISODate("2025-01-10T10:00:00.000Z"),
"updated_at": ISODate("2025-01-15T12:00:00.000Z")
}
Indexes¶
db.chatbot_selections.createIndex(
{ user_id: 1, project_id: 1 },
{ unique: true }
);
db.chatbot_selections.createIndex({ user_id: 1 });
db.chatbot_selections.createIndex({ isDeleted: 1 });
Sample Document¶
{
"_id": ObjectId("65a1b2c3d4e5f6789def"),
"user_id": "User-789456",
"project_id": "User-789456_Project_Support",
"selection_avatar": "Avatar_Emma",
"selection_voice": "Female_2",
"selection_model": "gpt-4-turbo",
"chatbot_type": "3d",
"chatbot_name": "Emma - Customer Support",
"brand_color": "#4A90E2",
"temperature": 0.5,
"max_tokens": 1500,
"guardrails_enabled": true,
"greeting_message": "Hi! I'm Emma. How may I assist you?",
"total_conversations": 567,
"last_active": ISODate("2025-01-15T16:45:00.000Z"),
"isDeleted": false,
"created_at": ISODate("2025-01-12T09:00:00.000Z"),
"updated_at": ISODate("2025-01-15T11:30:00.000Z")
}
Relationships¶
- N:1 with users_multichatbot_v2 - Many chatbots, one user
- 1:1 with projectid_creation - One chatbot, one project
- 1:N with chatbot_history - One chatbot, many conversations
- 1:1 with chatbot_guardrails - One chatbot, one guardrail config (optional)
3. chatbot_history¶
Purpose: Conversation logs and chat history
Used By:
- response-3d/text/voice-chatbot-service (save conversations)
- chat-history-service (retrieve history)
- analytics services (conversation analytics)
Schema¶
{
// Identity
"_id": ObjectId("65a1b2c3d4e5f6789ghi"),
"user_id": "User-123456",
"project_id": "User-123456_Project_1",
"session_id": "session_xyz789", // Groups conversation turns
// Conversation Turn
"question": "What are your business hours?",
"answer": "We are open Monday to Friday, 9 AM to 5 PM EST.",
// Context
"retrieved_context": [ // RAG context from Milvus
"Business hours: Mon-Fri 9-5 EST",
"Weekend: Closed"
],
// Model Details
"model_used": "gpt-4-turbo",
"tokens_used": 156, // Total tokens (prompt + completion)
"response_time_ms": 1234, // Latency in milliseconds
// Audio (3D/Voice chatbots)
"audio_duration_seconds": 3.5, // TTS audio length
"lipsync_data": [...], // Rhubarb lipsync JSON (3D only)
// Metadata
"user_ip": "203.0.113.50", // Client IP
"user_agent": "Mozilla/5.0...", // Browser
"device_type": "mobile", // "mobile" | "desktop"
// Timestamps
"timestamp": ISODate("2025-01-15T14:30:15.000Z"),
"created_at": ISODate("2025-01-15T14:30:15.000Z")
}
Indexes¶
db.chatbot_history.createIndex({ user_id: 1, project_id: 1 });
db.chatbot_history.createIndex({ session_id: 1 });
db.chatbot_history.createIndex({ timestamp: -1 }); // For chronological retrieval
db.chatbot_history.createIndex({ user_id: 1, timestamp: -1 });
Sample Document¶
{
"_id": ObjectId("65a1b2c3d4e5f6789ghi"),
"user_id": "User-789456",
"project_id": "User-789456_Project_Support",
"session_id": "session_abc123",
"question": "How do I reset my password?",
"answer": "To reset your password, click on 'Forgot Password' on the login page. You'll receive an OTP via email to verify your identity and set a new password.",
"retrieved_context": [
"Password reset process: Click 'Forgot Password', enter email, receive OTP, verify OTP, set new password.",
"OTPs are valid for 60 seconds."
],
"model_used": "gpt-4-turbo",
"tokens_used": 89,
"response_time_ms": 987,
"audio_duration_seconds": 5.2,
"user_ip": "198.51.100.42",
"device_type": "desktop",
"timestamp": ISODate("2025-01-15T10:22:33.000Z")
}
Retention Policy¶
| Plan | Retention Period |
|---|---|
| Free | 7 days |
| Pro | 30 days |
| Business | 90 days |
| Premium/Enterprise | Unlimited |
Auto-deletion: Conversations older than retention period are automatically deleted (via scheduled job)
4. files¶
Purpose: Metadata for uploaded files stored in GridFS
Used By:
- data-crawling-service (crawled websites)
- file-upload-service (user uploads)
- chatbot-maintenance-service (document updates)
Schema¶
{
// Identity
"_id": ObjectId("65a1b2c3d4e5f6789jkl"),
"user_id": "User-123456",
"project_id": "User-123456_Project_1",
// File Info
"file_name": "product_catalog.pdf",
"file_type": "pdf", // "pdf" | "docx" | "xlsx" | "txt" | "url" | "qna"
"file_size_bytes": 2048576, // 2MB
"mime_type": "application/pdf",
// Content
"extracted_text": "Full text content...", // Extracted text from file
"chunks": [ // Text chunking for RAG
{
"chunk_index": 0,
"content": "Chunk text...",
"start_pos": 0,
"end_pos": 500,
"length": 500
}
],
// Embeddings
"embeddings": [0.123, -0.456, ...], // Vector embedding (384 dimensions for bge-small)
"embedding_model": "BAAI/bge-small-en-v1.5",
// GridFS (if large file)
"file_blob": Binary("..."), // Binary data if small file
"gridfs_id": ObjectId("..."), // GridFS file ID if large file
// URL-specific (for crawled websites)
"source_url": "https://example.com/products", // Original URL
"crawled_at": ISODate("2025-01-15T10:00:00.000Z"),
// Timestamps
"created_at": ISODate("2025-01-15T10:00:00.000Z"),
"last_modified": ISODate("2025-01-15T12:30:00.000Z")
}
Indexes¶
db.files.createIndex({ user_id: 1, project_id: 1 });
db.files.createIndex({ file_type: 1 });
db.files.createIndex({ created_at: -1 });
Sample Document (PDF)¶
{
"_id": ObjectId("65a1b2c3d4e5f6789jkl"),
"user_id": "User-789456",
"project_id": "User-789456_Project_Support",
"file_name": "help_guide_2025.pdf",
"file_type": "pdf",
"file_size_bytes": 1536000,
"mime_type": "application/pdf",
"extracted_text": "MachineAvatars Help Guide 2025\n\nTable of Contents...",
"chunks": [
{
"chunk_index": 0,
"content": "MachineAvatars Help Guide 2025\n\nGetting Started...",
"start_pos": 0,
"end_pos": 512,
"length": 512
}
],
"gridfs_id": ObjectId("65a1b2c3d4e5f6789xyz"),
"embedding_model": "BAAI/bge-small-en-v1.5",
"created_at": ISODate("2025-01-14T15:00:00.000Z"),
"last_modified": ISODate("2025-01-14T15:00:00.000Z")
}
Sample Document (Crawled URL)¶
{
"_id": ObjectId("65a1b2c3d4e5f6789mno"),
"user_id": "User-789456",
"project_id": "User-789456_Project_Support",
"file_type": "url",
"source_url": "https://machineavatars.com/pricing",
"extracted_text": {
"https://machineavatars.com/pricing": "Pricing Plans\n\nFree Plan: $0/month..."
},
"chunks": [
{
"chunk_index": 0,
"content": "Pricing Plans\n\nFree Plan: $0/month, 1 chatbot, 50 conversations...",
"start_pos": 0,
"end_pos": 512,
"length": 512
}
],
"crawled_at": ISODate("2025-01-15T08:00:00.000Z"),
"created_at": ISODate("2025-01-15T08:00:00.000Z")
}
5. files_secondary¶
Purpose: Secondary file storage (specific use cases)
Schema: Similar to files collection
Note: Used for organization-specific files or backup storage
6. system_prompts_user¶
Purpose: Custom system prompts per chatbot
Used By:
- systemprompt-service (CRUD)
- response-3d/text/voice-chatbot-service (retrieve prompt)
Schema¶
{
// Identity
"_id": ObjectId("65a1b2c3d4e5f6789pqr"),
"user_id": "User-123456",
"project_id": "User-123456_Project_1",
// Prompt
"system_prompt": "You are Emma, a friendly customer support assistant for ACME Corp. Always be polite, concise, and helpful. If you don't know the answer, admit it and offer to connect the user with a human agent.",
// Configuration
"prompt_version": "v2.0", // Versioning for A/B testing
"is_active": true, // Enable/disable
// Timestamps
"created_at": ISODate("2025-01-10T10:00:00.000Z"),
"updated_at": ISODate("2025-01-15T14:00:00.000Z")
}
Indexes¶
db.system_prompts_user.createIndex(
{ user_id: 1, project_id: 1 },
{ unique: true }
);
db.system_prompts_user.createIndex({ is_active: 1 });
Sample Document¶
{
"_id": ObjectId("65a1b2c3d4e5f6789pqr"),
"user_id": "User-789456",
"project_id": "User-789456_Project_Support",
"system_prompt": "You are an AI assistant for TechCorp's IT Help Desk. You specialize in troubleshooting software issues, password resets, and hardware problems. Always ask clarifying questions before providing solutions. Be technical but clear.",
"prompt_version": "v3.1",
"is_active": true,
"created_at": ISODate("2025-01-12T11:00:00.000Z"),
"updated_at": ISODate("2025-01-15T09:30:00.000Z")
}
7. projectid_creation¶
Purpose: Project metadata and tracking
Used By:
- create-chatbot-service (create projects)
- All services (project validation)
Schema¶
{
// Identity
"_id": ObjectId("65a1b2c3d4e5f6789stu"),
"user_id": "User-123456",
"project_id": "User-123456_Project_1", // Unique identifier
// Project Info
"chatbot_name": "Customer Support Bot",
"chatbot_description": "Handles customer inquiries about products and services",
// Metadata
"chatbot_type": "3d", // "3d" | "text" | "voice"
"industry": "E-commerce", // Optional
"use_case": "Customer Support", // Optional
// Statistics
"total_documents": 15, // Files uploaded
"total_urls_crawled": 5, // Websites crawled
"total_qna_pairs": 23, // Manual Q&A
"total_vectors": 1250, // Embeddings in Milvus
// Status
"status": "active", // "active" | "paused" | "archived"
// Deletion
"isDeleted": false,
"deleted_at": null,
// Timestamps
"created_at": ISODate("2025-01-10T10:00:00.000Z"),
"updated_at": ISODate("2025-01-15T16:00:00.000Z")
}
Indexes¶
db.projectid_creation.createIndex(
{ user_id: 1, project_id: 1 },
{ unique: true }
);
db.projectid_creation.createIndex({ user_id: 1 });
db.projectid_creation.createIndex({ status: 1 });
Sample Document¶
{
"_id": ObjectId("65a1b2c3d4e5f6789stu"),
"user_id": "User-789456",
"project_id": "User-789456_Project_Support",
"chatbot_name": "TechSupport AI - Emma",
"chatbot_description": "AI-powered IT helpdesk assistant for internal employees",
"chatbot_type": "3d",
"industry": "Technology",
"use_case": "Internal IT Support",
"total_documents": 8,
"total_urls_crawled": 3,
"total_qna_pairs": 47,
"total_vectors": 892,
"status": "active",
"isDeleted": false,
"created_at": ISODate("2025-01-12T09:00:00.000Z"),
"updated_at": ISODate("2025-01-15T17:20:00.000Z")
}
8. organisation_data¶
Purpose: Enterprise organization management
Used By:
- Enterprise features only
- Multi-user account management
Schema¶
{
// Identity
"_id": ObjectId("65a1b2c3d4e5f6789vwx"),
"organization_id": "org_abc123",
// Organization Info
"name": "ACME Corporation",
"domain": "acme.com", // Email domain for SSO
"industry": "Manufacturing",
// Owner
"owner_user_id": "User-123456",
// Subscription
"plan": "Premium", // "Business" | "Premium" | "Enterprise"
"billing_email": "billing@acme.com",
"seats_purchased": 50, // Number of user seats
"seats_used": 32, // Current users
// Departments
"departments": [
{
"department_id": "dept_sales",
"name": "Sales",
"admin_user_ids": ["User-456"],
"member_user_ids": ["User-789", "User-321"]
},
{
"department_id": "dept_support",
"name": "Customer Support",
"admin_user_ids": ["User-654"],
"member_user_ids": ["User-987", "User-246"]
}
],
// SSO Configuration
"sso_enabled": true,
"sso_provider": "Okta", // "Okta" | "Azure AD" | "Google" | "Custom SAML"
"sso_metadata_url": "https://acme.okta.com/app/...",
// Security
"ip_whitelist": [
"203.0.113.0/24", // Office IP range
"198.51.100.50/32" // VPN gateway
],
// Features
"features_enabled": [
"department_partitioning",
"advanced_analytics",
"custom_branding",
"api_access",
"priority_support"
],
// Timestamps
"created_at": ISODate("2025-01-01T00:00:00.000Z"),
"updated_at": ISODate("2025-01-15T10:00:00.000Z")
}
Indexes¶
db.organisation_data.createIndex({ organization_id: 1 }, { unique: true });
db.organisation_data.createIndex({ owner_user_id: 1 });
db.organisation_data.createIndex({ domain: 1 });
Sample Document¶
{
"_id": ObjectId("65a1b2c3d4e5f6789vwx"),
"organization_id": "org_techcorp",
"name": "TechCorp Solutions Inc.",
"domain": "techcorp.com",
"industry": "Technology/SaaS",
"owner_user_id": "User-100001",
"plan": "Enterprise",
"billing_email": "accounting@techcorp.com",
"seats_purchased": 100,
"seats_used": 73,
"departments": [
{
"department_id": "dept_engineering",
"name": "Engineering",
"admin_user_ids": ["User-100002"],
"member_user_ids": ["User-100010", "User-100011", "User-100012"]
},
{
"department_id": "dept_sales",
"name": "Sales & Marketing",
"admin_user_ids": ["User-100003"],
"member_user_ids": ["User-100020", "User-100021"]
}
],
"sso_enabled": true,
"sso_provider": "Azure AD",
"sso_metadata_url": "https://login.microsoftonline.com/techcorp.com/...",
"ip_whitelist": [
"198.51.100.0/24",
"203.0.113.42/32"
],
"features_enabled": [
"department_partitioning",
"advanced_analytics",
"custom_branding",
"api_access",
"priority_support",
"hipaa_compliance",
"custom_integrations"
],
"created_at": ISODate("2024-12-01T00:00:00.000Z"),
"updated_at": ISODate("2025-01-15T14:30:00.000Z")
}
9. trash_collection_name¶
Purpose: Soft-deleted chatbots (7-day retention for recovery)
Used By:
- chatbot-maintenance-service (trash management)
Schema¶
{
// Same fields as chatbot_selections + projectid_creation
...
// Deletion Metadata
"deleted_at": ISODate("2025-01-15T10:00:00.000Z"), // When deleted
"deleted_by": "User-123456", // Who deleted (optional)
// Note: Milvus embeddings and Azure Blobs are NOT in trash (hard deleted)
}
Retention & Auto-Purge¶
Retention Period: 7 days
Auto-Purge: Documents older than 7 days should be automatically deleted (not implemented in code - manual cleanup needed)
Recommended Cron Job:
// Daily cleanup of trash older than 7 days
db.trash_collection_name.deleteMany({
deleted_at: { $lt: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) },
});
📊 Collection Relationships¶
graph TB
USER[users_multichatbot_v2]
ORG[organisation_data]
CHATBOT[chatbot_selections]
PROJECT[projectid_creation]
HISTORY[chatbot_history]
FILES[files]
PROMPTS[system_prompts_user]
FEATURES[features_per_user]
TRASH[trash_collection_name]
USER -->|1:N| CHATBOT
USER -->|1:N| PROJECT
USER -->|1:N| HISTORY
USER -->|1:N| FILES
USER -->|1:N| PROMPTS
USER -->|1:1| FEATURES
USER -->|N:1| ORG
CHATBOT -->|1:1| PROJECT
CHATBOT -->|1:N| HISTORY
PROJECT -->|1:N| FILES
PROJECT -->|1:1| PROMPTS
CHATBOT -.->|soft delete| TRASH
PROJECT -.->|soft delete| TRASH
style USER fill:#E3F2FD
style CHATBOT fill:#FFF3E0
style HISTORY fill:#C8E6C9
style TRASH fill:#FFCDD2
🔒 Data Security¶
Encryption:
- All collections encrypted at rest (AES-256, Azure-managed)
- TLS 1.3 for all database connections
Access Control:
- Azure Managed Identity authentication
- Connection strings stored in Azure Key Vault (planned)
- Database-level RBAC
Sensitive Data:
- Passwords: ⚠️ PLAIN TEXT - MUST MIGRATE TO BCRYPT
- Email: Encrypted at rest
- Conversation data: User-owned, exportable (GDPR)
- PII fields: Include in data export API
Compliance:
- GDPR: Right to erasure, data export2
- DPDPA: Data localization (Central India)
- HIPAA: On-premise only (not cloud)
🔗 Related Documentation¶
Services:
- Auth Service - users collection
- User Service - users, features
- Chatbot Maintenance - All collections
Data Architecture:
- Index - Data architecture overview
- Vector Store - Milvus collections
- Data Dictionary - Field reference
Security:
- Encryption - Database encryption
- Compliance - GDPR retention
Progress: Section 4 - 2/8 files complete (25%)
"Schema is the contract between code and data." 📊✅