Skip to content

Database Schema - MongoDB Collections

Section: 4-data-architecture-governance
Document: MongoDB Database Schema
Database: Machine_agent_dev (Development), Machine_agent_demo (Production)
Collections: 9 primary collections
Last Updated: 2025-12-30


🎯 Overview

This document provides complete schema documentation for all 9 MongoDB collections used in the MachineAvatars platform, including field definitions, data types, indexes, relationships, and sample documents.

Database Provider: Azure Cosmos DB for MongoDB API
API Version: MongoDB 4.0
Region: Central India
Encryption: AES-256 (automatic)


📊 Collections Summary

Collection Purpose Avg Docs Avg Size Critical Fields
users_multichatbot_v2 User accounts & authentication ~10K 2KB email, user_id, verified
chatbot_selections Chatbot configurations ~5K 5KB project_id, user_id, selection_avatar
chatbot_history Conversation logs ~500K 1KB project_id, session_id, question, answer
files Uploaded file metadata (GridFS) ~2K 500B user_id, project_id, file_type
files_secondary Secondary file storage ~1K 500B user_id, project_id
system_prompts_user Custom system prompts ~3K 2KB user_id, project_id, system_prompt
projectid_creation Project metadata ~5K 1KB user_id, project_id, chatbot_name
organisation_data Enterprise organizations ~50 3KB organization_id, owner_user_id
trash_collection_name Soft-deleted chatbots (7-day retention) ~500 5KB user_id, project_id, deleted_at

1. users_multichatbot_v2

Purpose: User accounts, authentication, subscription management

Used By:

  • auth-service (login)
  • user-service (signup, OTP, features)
  • All services (user context validation)

Schema

{
    // Identity
    "_id": ObjectId("65a1b2c3d4e5f6789abc"),
    "user_id": "User-123456",              // Generated on OTP verification
    "email": "user@example.com",            // Unique, indexed
    "password": "plaintext123",             // ⚠️ PLAIN TEXT - SECURITY ISSUE!
    "name": "John Doe",

    // Verification
    "verified": false,                      // true after OTP verification
    "otp": "654321",                        // 6-digit OTP
    "otp_expiration": 1735214460,           // Unix timestamp (60 seconds validity)

    // Account Status
    "account_status": "paused",             // "paused" | "active" | "suspended"
    "payment_status": "pending",            // "pending" | "active" | "failed"

    // Subscription
    "subscription_id": "sub_009",           // Free: sub_009, Pro: sub_001, etc.
    "subscription_date": "2025-01-15",      // ISO date string
    "subscription_plan": "Free",            // Plan name (optional, duplicated)

    // Timestamps
    "user_created_at(DATE)": "2025-01-15",  // ISO date string
    "created_at": ISODate("2025-01-15T10:00:00.000Z"),  // Optional

    // Organization (Enterprise only)
    "organization_id": "org_abc123",        // Optional
    "role": "Admin",                        // "Owner" | "Admin" | "Editor" | "Viewer" | "Analyst"
    "department_id": "dept_sales",          // Optional

    // Consent (GDPR/DPDPA)
    "user_consent": {
        "data_processing": true,            // Required
        "marketing_emails": false,          // Optional
        "analytics_tracking": true,         // Optional
        "consented_at": "2025-01-15T10:00:00Z",
        "consent_version": "v1.0"
    }
}

Indexes

db.users_multichatbot_v2.createIndex({ email: 1 }, { unique: true });
db.users_multichatbot_v2.createIndex({ user_id: 1 });
db.users_multichatbot_v2.createIndex({ subscription_id: 1 });
db.users_multichatbot_v2.createIndex({ organization_id: 1 });

Sample Document

{
    "_id": ObjectId("65a1b2c3d4e5f6789abc"),
    "user_id": "User-789456",
    "email": "sarah.johnson@example.com",
    "password": "SecurePass123",  // ⚠️ Should be bcrypt hash!
    "name": "Sarah Johnson",
    "verified": true,
    "account_status": "active",
    "payment_status": "active",
    "subscription_id": "sub_001",  // Pro plan
    "subscription_plan": "Pro",
    "subscription_date": "2025-01-10",
    "user_created_at(DATE)": "2025-01-10",
    "role": "Owner"
}

Relationships

  • 1:N with chatbot_selections - One user, many chatbots
  • 1:N with chatbot_history - One user, many conversations
  • 1:N with projectid_creation - One user, many projects
  • 1:1 with features_per_user - One user, one feature set
  • N:1 with organisation_data - Many users, one organization (Enterprise)

2. chatbot_selections

Purpose: Chatbot configurations and settings

Used By:

  • create-chatbot-service (create chatbots)
  • chatbot-maintenance-service (CRUD operations)
  • selection-3d/text/voice-chatbot-service (get configuration)

Schema

{
    // Identity
    "_id": ObjectId("65a1b2c3d4e5f6789def"),
    "user_id": "User-123456",
    "project_id": "User-123456_Project_1",  // Unique per user

    // Avatar Selection (3D Chatbots)
    "selection_avatar": "Avatar_Lisa",       // "Avatar_Lisa" | "Avatar_Emma" | "Avatar_Jack" | etc.
    "selection_voice": "Female_1",           // Voice ID
    "selection_model": "gpt-4-turbo",        // LLM model

    // Chatbot Type
    "chatbot_type": "3d",                    // "3d" | "text" | "voice"

    // Branding
    "chatbot_name": "Customer Support Bot",
    "chatbot_logo_url": "https://...",       // Optional
    "brand_color": "#FF6622",                // Hex color

    // Behavior
    "temperature": 0.7,                      // LLM temperature (0.0 - 1.0)
    "max_tokens": 2048,                      // Max response length

    // Guardrails
    "guardrails_enabled": true,
    "guardrails_id": "507f1f77bcf86cd799439015",  // Reference to chatbot_guardrails

    // Greeting
    "greeting_message": "Hello! How can I help you today?",
    "greeting_audio_url": "https://...",     // Optional pre-recorded greeting

    // Analytics
    "total_conversations": 1234,
    "last_active": ISODate("2025-01-15T14:30:00.000Z"),

    // Deletion
    "isDeleted": false,                      // Soft delete flag
    "deleted_at": null,                      // Timestamp when deleted

    // Timestamps
    "created_at": ISODate("2025-01-10T10:00:00.000Z"),
    "updated_at": ISODate("2025-01-15T12:00:00.000Z")
}

Indexes

db.chatbot_selections.createIndex(
  { user_id: 1, project_id: 1 },
  { unique: true }
);
db.chatbot_selections.createIndex({ user_id: 1 });
db.chatbot_selections.createIndex({ isDeleted: 1 });

Sample Document

{
    "_id": ObjectId("65a1b2c3d4e5f6789def"),
    "user_id": "User-789456",
    "project_id": "User-789456_Project_Support",
    "selection_avatar": "Avatar_Emma",
    "selection_voice": "Female_2",
    "selection_model": "gpt-4-turbo",
    "chatbot_type": "3d",
    "chatbot_name": "Emma - Customer Support",
    "brand_color": "#4A90E2",
    "temperature": 0.5,
    "max_tokens": 1500,
    "guardrails_enabled": true,
    "greeting_message": "Hi! I'm Emma. How may I assist you?",
    "total_conversations": 567,
    "last_active": ISODate("2025-01-15T16:45:00.000Z"),
    "isDeleted": false,
    "created_at": ISODate("2025-01-12T09:00:00.000Z"),
    "updated_at": ISODate("2025-01-15T11:30:00.000Z")
}

Relationships

  • N:1 with users_multichatbot_v2 - Many chatbots, one user
  • 1:1 with projectid_creation - One chatbot, one project
  • 1:N with chatbot_history - One chatbot, many conversations
  • 1:1 with chatbot_guardrails - One chatbot, one guardrail config (optional)

3. chatbot_history

Purpose: Conversation logs and chat history

Used By:

  • response-3d/text/voice-chatbot-service (save conversations)
  • chat-history-service (retrieve history)
  • analytics services (conversation analytics)

Schema

{
    // Identity
    "_id": ObjectId("65a1b2c3d4e5f6789ghi"),
    "user_id": "User-123456",
    "project_id": "User-123456_Project_1",
    "session_id": "session_xyz789",          // Groups conversation turns

    // Conversation Turn
    "question": "What are your business hours?",
    "answer": "We are open Monday to Friday, 9 AM to 5 PM EST.",

    // Context
    "retrieved_context": [                   // RAG context from Milvus
        "Business hours: Mon-Fri 9-5 EST",
        "Weekend: Closed"
    ],

    // Model Details
    "model_used": "gpt-4-turbo",
    "tokens_used": 156,                      // Total tokens (prompt + completion)
    "response_time_ms": 1234,                // Latency in milliseconds

    // Audio (3D/Voice chatbots)
    "audio_duration_seconds": 3.5,           // TTS audio length
    "lipsync_data": [...],                   // Rhubarb lipsync JSON (3D only)

    // Metadata
    "user_ip": "203.0.113.50",              // Client IP
    "user_agent": "Mozilla/5.0...",          // Browser
    "device_type": "mobile",                 // "mobile" | "desktop"

    // Timestamps
    "timestamp": ISODate("2025-01-15T14:30:15.000Z"),
    "created_at": ISODate("2025-01-15T14:30:15.000Z")
}

Indexes

db.chatbot_history.createIndex({ user_id: 1, project_id: 1 });
db.chatbot_history.createIndex({ session_id: 1 });
db.chatbot_history.createIndex({ timestamp: -1 }); // For chronological retrieval
db.chatbot_history.createIndex({ user_id: 1, timestamp: -1 });

Sample Document

{
    "_id": ObjectId("65a1b2c3d4e5f6789ghi"),
    "user_id": "User-789456",
    "project_id": "User-789456_Project_Support",
    "session_id": "session_abc123",
    "question": "How do I reset my password?",
    "answer": "To reset your password, click on 'Forgot Password' on the login page. You'll receive an OTP via email to verify your identity and set a new password.",
    "retrieved_context": [
        "Password reset process: Click 'Forgot Password', enter email, receive OTP, verify OTP, set new password.",
        "OTPs are valid for 60 seconds."
    ],
    "model_used": "gpt-4-turbo",
    "tokens_used": 89,
    "response_time_ms": 987,
    "audio_duration_seconds": 5.2,
    "user_ip": "198.51.100.42",
    "device_type": "desktop",
    "timestamp": ISODate("2025-01-15T10:22:33.000Z")
}

Retention Policy

Plan Retention Period
Free 7 days
Pro 30 days
Business 90 days
Premium/Enterprise Unlimited

Auto-deletion: Conversations older than retention period are automatically deleted (via scheduled job)


4. files

Purpose: Metadata for uploaded files stored in GridFS

Used By:

  • data-crawling-service (crawled websites)
  • file-upload-service (user uploads)
  • chatbot-maintenance-service (document updates)

Schema

{
    // Identity
    "_id": ObjectId("65a1b2c3d4e5f6789jkl"),
    "user_id": "User-123456",
    "project_id": "User-123456_Project_1",

    // File Info
    "file_name": "product_catalog.pdf",
    "file_type": "pdf",                      // "pdf" | "docx" | "xlsx" | "txt" | "url" | "qna"
    "file_size_bytes": 2048576,              // 2MB
    "mime_type": "application/pdf",

    // Content
    "extracted_text": "Full text content...",  // Extracted text from file
    "chunks": [                              // Text chunking for RAG
        {
            "chunk_index": 0,
            "content": "Chunk text...",
            "start_pos": 0,
            "end_pos": 500,
            "length": 500
        }
    ],

    // Embeddings
    "embeddings": [0.123, -0.456, ...],     // Vector embedding (384 dimensions for bge-small)
    "embedding_model": "BAAI/bge-small-en-v1.5",

    // GridFS (if large file)
    "file_blob": Binary("..."),              // Binary data if small file
    "gridfs_id": ObjectId("..."),            // GridFS file ID if large file

    // URL-specific (for crawled websites)
    "source_url": "https://example.com/products",  // Original URL
    "crawled_at": ISODate("2025-01-15T10:00:00.000Z"),

    // Timestamps
    "created_at": ISODate("2025-01-15T10:00:00.000Z"),
    "last_modified": ISODate("2025-01-15T12:30:00.000Z")
}

Indexes

db.files.createIndex({ user_id: 1, project_id: 1 });
db.files.createIndex({ file_type: 1 });
db.files.createIndex({ created_at: -1 });

Sample Document (PDF)

{
    "_id": ObjectId("65a1b2c3d4e5f6789jkl"),
    "user_id": "User-789456",
    "project_id": "User-789456_Project_Support",
    "file_name": "help_guide_2025.pdf",
    "file_type": "pdf",
    "file_size_bytes": 1536000,
    "mime_type": "application/pdf",
    "extracted_text": "MachineAvatars Help Guide 2025\n\nTable of Contents...",
    "chunks": [
        {
            "chunk_index": 0,
            "content": "MachineAvatars Help Guide 2025\n\nGetting Started...",
            "start_pos": 0,
            "end_pos": 512,
            "length": 512
        }
    ],
    "gridfs_id": ObjectId("65a1b2c3d4e5f6789xyz"),
    "embedding_model": "BAAI/bge-small-en-v1.5",
    "created_at": ISODate("2025-01-14T15:00:00.000Z"),
    "last_modified": ISODate("2025-01-14T15:00:00.000Z")
}

Sample Document (Crawled URL)

{
    "_id": ObjectId("65a1b2c3d4e5f6789mno"),
    "user_id": "User-789456",
    "project_id": "User-789456_Project_Support",
    "file_type": "url",
    "source_url": "https://machineavatars.com/pricing",
    "extracted_text": {
        "https://machineavatars.com/pricing": "Pricing Plans\n\nFree Plan: $0/month..."
    },
    "chunks": [
        {
            "chunk_index": 0,
            "content": "Pricing Plans\n\nFree Plan: $0/month, 1 chatbot, 50 conversations...",
            "start_pos": 0,
            "end_pos": 512,
            "length": 512
        }
    ],
    "crawled_at": ISODate("2025-01-15T08:00:00.000Z"),
    "created_at": ISODate("2025-01-15T08:00:00.000Z")
}

5. files_secondary

Purpose: Secondary file storage (specific use cases)

Schema: Similar to files collection

Note: Used for organization-specific files or backup storage


6. system_prompts_user

Purpose: Custom system prompts per chatbot

Used By:

  • systemprompt-service (CRUD)
  • response-3d/text/voice-chatbot-service (retrieve prompt)

Schema

{
    // Identity
    "_id": ObjectId("65a1b2c3d4e5f6789pqr"),
    "user_id": "User-123456",
    "project_id": "User-123456_Project_1",

    // Prompt
    "system_prompt": "You are Emma, a friendly customer support assistant for ACME Corp. Always be polite, concise, and helpful. If you don't know the answer, admit it and offer to connect the user with a human agent.",

    // Configuration
    "prompt_version": "v2.0",                // Versioning for A/B testing
    "is_active": true,                       // Enable/disable

    // Timestamps
    "created_at": ISODate("2025-01-10T10:00:00.000Z"),
    "updated_at": ISODate("2025-01-15T14:00:00.000Z")
}

Indexes

db.system_prompts_user.createIndex(
  { user_id: 1, project_id: 1 },
  { unique: true }
);
db.system_prompts_user.createIndex({ is_active: 1 });

Sample Document

{
    "_id": ObjectId("65a1b2c3d4e5f6789pqr"),
    "user_id": "User-789456",
    "project_id": "User-789456_Project_Support",
    "system_prompt": "You are an AI assistant for TechCorp's IT Help Desk. You specialize in troubleshooting software issues, password resets, and hardware problems. Always ask clarifying questions before providing solutions. Be technical but clear.",
    "prompt_version": "v3.1",
    "is_active": true,
    "created_at": ISODate("2025-01-12T11:00:00.000Z"),
    "updated_at": ISODate("2025-01-15T09:30:00.000Z")
}

7. projectid_creation

Purpose: Project metadata and tracking

Used By:

  • create-chatbot-service (create projects)
  • All services (project validation)

Schema

{
    // Identity
    "_id": ObjectId("65a1b2c3d4e5f6789stu"),
    "user_id": "User-123456",
    "project_id": "User-123456_Project_1",   // Unique identifier

    // Project Info
    "chatbot_name": "Customer Support Bot",
    "chatbot_description": "Handles customer inquiries about products and services",

    // Metadata
    "chatbot_type": "3d",                    // "3d" | "text" | "voice"
    "industry": "E-commerce",                // Optional
    "use_case": "Customer Support",          // Optional

    // Statistics
    "total_documents": 15,                   // Files uploaded
    "total_urls_crawled": 5,                 // Websites crawled
    "total_qna_pairs": 23,                   // Manual Q&A
    "total_vectors": 1250,                   // Embeddings in Milvus

    // Status
    "status": "active",                      // "active" | "paused" | "archived"

    // Deletion
    "isDeleted": false,
    "deleted_at": null,

    // Timestamps
    "created_at": ISODate("2025-01-10T10:00:00.000Z"),
    "updated_at": ISODate("2025-01-15T16:00:00.000Z")
}

Indexes

db.projectid_creation.createIndex(
  { user_id: 1, project_id: 1 },
  { unique: true }
);
db.projectid_creation.createIndex({ user_id: 1 });
db.projectid_creation.createIndex({ status: 1 });

Sample Document

{
    "_id": ObjectId("65a1b2c3d4e5f6789stu"),
    "user_id": "User-789456",
    "project_id": "User-789456_Project_Support",
    "chatbot_name": "TechSupport AI - Emma",
    "chatbot_description": "AI-powered IT helpdesk assistant for internal employees",
    "chatbot_type": "3d",
    "industry": "Technology",
    "use_case": "Internal IT Support",
    "total_documents": 8,
    "total_urls_crawled": 3,
    "total_qna_pairs": 47,
    "total_vectors": 892,
    "status": "active",
    "isDeleted": false,
    "created_at": ISODate("2025-01-12T09:00:00.000Z"),
    "updated_at": ISODate("2025-01-15T17:20:00.000Z")
}

8. organisation_data

Purpose: Enterprise organization management

Used By:

  • Enterprise features only
  • Multi-user account management

Schema

{
    // Identity
    "_id": ObjectId("65a1b2c3d4e5f6789vwx"),
    "organization_id": "org_abc123",

    // Organization Info
    "name": "ACME Corporation",
    "domain": "acme.com",                    // Email domain for SSO
    "industry": "Manufacturing",

    // Owner
    "owner_user_id": "User-123456",

    // Subscription
    "plan": "Premium",                       // "Business" | "Premium" | "Enterprise"
    "billing_email": "billing@acme.com",
    "seats_purchased": 50,                   // Number of user seats
    "seats_used": 32,                        // Current users

    // Departments
    "departments": [
        {
            "department_id": "dept_sales",
            "name": "Sales",
            "admin_user_ids": ["User-456"],
            "member_user_ids": ["User-789", "User-321"]
        },
        {
            "department_id": "dept_support",
            "name": "Customer Support",
            "admin_user_ids": ["User-654"],
            "member_user_ids": ["User-987", "User-246"]
        }
    ],

    // SSO Configuration
    "sso_enabled": true,
    "sso_provider": "Okta",                  // "Okta" | "Azure AD" | "Google" | "Custom SAML"
    "sso_metadata_url": "https://acme.okta.com/app/...",

    // Security
    "ip_whitelist": [
        "203.0.113.0/24",                    // Office IP range
        "198.51.100.50/32"                   // VPN gateway
    ],

    // Features
    "features_enabled": [
        "department_partitioning",
        "advanced_analytics",
        "custom_branding",
        "api_access",
        "priority_support"
    ],

    // Timestamps
    "created_at": ISODate("2025-01-01T00:00:00.000Z"),
    "updated_at": ISODate("2025-01-15T10:00:00.000Z")
}

Indexes

db.organisation_data.createIndex({ organization_id: 1 }, { unique: true });
db.organisation_data.createIndex({ owner_user_id: 1 });
db.organisation_data.createIndex({ domain: 1 });

Sample Document

{
    "_id": ObjectId("65a1b2c3d4e5f6789vwx"),
    "organization_id": "org_techcorp",
    "name": "TechCorp Solutions Inc.",
    "domain": "techcorp.com",
    "industry": "Technology/SaaS",
    "owner_user_id": "User-100001",
    "plan": "Enterprise",
    "billing_email": "accounting@techcorp.com",
    "seats_purchased": 100,
    "seats_used": 73,
    "departments": [
        {
            "department_id": "dept_engineering",
            "name": "Engineering",
            "admin_user_ids": ["User-100002"],
            "member_user_ids": ["User-100010", "User-100011", "User-100012"]
        },
        {
            "department_id": "dept_sales",
            "name": "Sales & Marketing",
            "admin_user_ids": ["User-100003"],
            "member_user_ids": ["User-100020", "User-100021"]
        }
    ],
    "sso_enabled": true,
    "sso_provider": "Azure AD",
    "sso_metadata_url": "https://login.microsoftonline.com/techcorp.com/...",
    "ip_whitelist": [
        "198.51.100.0/24",
        "203.0.113.42/32"
    ],
    "features_enabled": [
        "department_partitioning",
        "advanced_analytics",
        "custom_branding",
        "api_access",
        "priority_support",
        "hipaa_compliance",
        "custom_integrations"
    ],
    "created_at": ISODate("2024-12-01T00:00:00.000Z"),
    "updated_at": ISODate("2025-01-15T14:30:00.000Z")
}

9. trash_collection_name

Purpose: Soft-deleted chatbots (7-day retention for recovery)

Used By:

  • chatbot-maintenance-service (trash management)

Schema

{
    // Same fields as chatbot_selections + projectid_creation
    ...

    // Deletion Metadata
    "deleted_at": ISODate("2025-01-15T10:00:00.000Z"),  // When deleted
    "deleted_by": "User-123456",                         // Who deleted (optional)

    // Note: Milvus embeddings and Azure Blobs are NOT in trash (hard deleted)
}

Retention & Auto-Purge

Retention Period: 7 days
Auto-Purge: Documents older than 7 days should be automatically deleted (not implemented in code - manual cleanup needed)

Recommended Cron Job:

// Daily cleanup of trash older than 7 days
db.trash_collection_name.deleteMany({
  deleted_at: { $lt: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) },
});

📊 Collection Relationships

graph TB
    USER[users_multichatbot_v2]
    ORG[organisation_data]
    CHATBOT[chatbot_selections]
    PROJECT[projectid_creation]
    HISTORY[chatbot_history]
    FILES[files]
    PROMPTS[system_prompts_user]
    FEATURES[features_per_user]
    TRASH[trash_collection_name]

    USER -->|1:N| CHATBOT
    USER -->|1:N| PROJECT
    USER -->|1:N| HISTORY
    USER -->|1:N| FILES
    USER -->|1:N| PROMPTS
    USER -->|1:1| FEATURES
    USER -->|N:1| ORG

    CHATBOT -->|1:1| PROJECT
    CHATBOT -->|1:N| HISTORY

    PROJECT -->|1:N| FILES
    PROJECT -->|1:1| PROMPTS

    CHATBOT -.->|soft delete| TRASH
    PROJECT -.->|soft delete| TRASH

    style USER fill:#E3F2FD
    style CHATBOT fill:#FFF3E0
    style HISTORY fill:#C8E6C9
    style TRASH fill:#FFCDD2

🔒 Data Security

Encryption:

  • All collections encrypted at rest (AES-256, Azure-managed)
  • TLS 1.3 for all database connections

Access Control:

  • Azure Managed Identity authentication
  • Connection strings stored in Azure Key Vault (planned)
  • Database-level RBAC

Sensitive Data:

  • Passwords: ⚠️ PLAIN TEXT - MUST MIGRATE TO BCRYPT
  • Email: Encrypted at rest
  • Conversation data: User-owned, exportable (GDPR)
  • PII fields: Include in data export API

Compliance:

  • GDPR: Right to erasure, data export2
  • DPDPA: Data localization (Central India)
  • HIPAA: On-premise only (not cloud)

Services:

Data Architecture:

Security:


Progress: Section 4 - 2/8 files complete (25%)

"Schema is the contract between code and data." 📊✅