Data Dictionary

Section: 4-data-architecture-governance
Document: Complete Field Reference
Collections: All 9 MongoDB collections
Fields: 50+ documented


🎯 Overview

Comprehensive reference guide for all fields across MongoDB collections. Use this as a quick lookup for field names, types, descriptions, and validation rules.


📚 Field Reference (Alphabetical)

A

account_status

  • Type: String
  • Values: "paused" | "active" | "suspended"
  • Description: User account status
  • Default: "paused" (until OTP verification)
  • Collections: users_multichatbot_v2
  • Used By: All services (authorization checks)

admin_user_ids

  • Type: Array
  • Format: ["User-123456", "User-789012"]
  • Description: Department admins (organization management)
  • Collections: organisation_data (departments array)
  • Used By: Enterprise features

answer

  • Type: String
  • Max Length: 65535
  • Description: Chatbot response text
  • Collections: chatbot_history
  • Used By: response-3d-chatbot-service, response-text-chatbot-service, response-voice-chatbot-service

audio_duration_seconds

  • Type: Number (Float)
  • Description: TTS audio length in seconds
  • Example: 3.5
  • Collections: chatbot_history
  • Used By: response-3d-chatbot-service, response-voice-chatbot-service

B - C

brand_color

  • Type: String
  • Format: Hex color code (e.g., "#FF6622")
  • Description: Chatbot UI brand color
  • Collections: chatbot_selections
  • Used By: Frontend (chatbot UI)
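A minimal validation sketch for brand_color, assuming the stored value must be a six-digit hex code with a leading "#" like the documented "#FF6622" (three-digit shorthand such as "#F62" is rejected). The function name is_valid_brand_color is hypothetical.

```python
import re

# Assumption: exactly "#" followed by six hex digits, e.g. "#FF6622".
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def is_valid_brand_color(value: str) -> bool:
    """Return True if value is a #RRGGBB hex color code."""
    # fullmatch rejects partial matches such as "#FF6622 " or "#FF66220".
    return HEX_COLOR.fullmatch(value) is not None


print(is_valid_brand_color("#FF6622"))  # True
print(is_valid_brand_color("FF6622"))   # False: missing "#"
```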

chatbot_description

  • Type: String
  • Max Length: 500
  • Description: Chatbot purpose description
  • Collections: projectid_creation
  • Used By: create-chatbot-service

chatbot_logo_url

  • Type: String (URL)
  • Description: Custom chatbot logo
  • Optional: Yes
  • Collections: chatbot_selections
  • Used By: Frontend

chatbot_name

  • Type: String
  • Max Length: 100
  • Description: Display name for chatbot
  • Example: "Customer Support Bot"
  • Collections: chatbot_selections, projectid_creation
  • Used By: All chatbot services

chatbot_type

  • Type: String
  • Values: "3d" | "text" | "voice"
  • Description: Type of chatbot interface
  • Collections: chatbot_selections, projectid_creation
  • Used By: Selection services, response services

chunk_id

  • Type: String
  • Format: "{userid}_{projectid}_chunk_{index}"
  • Example: "User-123456_Project_1_chunk_0"
  • Description: Unique identifier for text chunk
  • Collections: files (chunks array), Milvus vectors
  • Used By: RAG pipeline

chunks

  • Type: Array
  • Schema:
    {
        "chunk_index": 0,
        "content": "Text...",
        "start_pos": 0,
        "end_pos": 500,
        "length": 500
    }
    
  • Description: Text divided into searchable chunks
  • Collections: files
  • Used By: Data ingestion, vectorization
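The chunks array can be illustrated with a short sketch. It assumes fixed 500-character windows with no overlap (matching the example's end_pos of 500) and builds chunk_id as the project identifier followed by "_chunk_{index}", which reproduces the example "User-123456_Project_1_chunk_0". The real ingestion services may split on sentences or tokens instead; split_into_chunks is a hypothetical name.

```python
from typing import Any, Dict, List


def split_into_chunks(project_id: str, text: str, chunk_size: int = 500) -> List[Dict[str, Any]]:
    """Split text into fixed-size, non-overlapping chunks shaped like the `chunks` schema."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    for index, start in enumerate(range(0, len(text), chunk_size)):
        content = text[start:start + chunk_size]
        chunks.append({
            "chunk_id": f"{project_id}_chunk_{index}",  # hypothetical id scheme
            "chunk_index": index,
            "content": content,
            "start_pos": start,
            "end_pos": start + len(content),  # exclusive end offset
            "length": len(content),
        })
    return chunks


for c in split_into_chunks("User-123456_Project_1", "x" * 1200):
    print(c["chunk_id"], c["start_pos"], c["end_pos"], c["length"])
# User-123456_Project_1_chunk_0 0 500 500
# User-123456_Project_1_chunk_1 500 1000 500
# User-123456_Project_1_chunk_2 1000 1200 200
```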

created_at

  • Type: ISODate
  • Format: ISO 8601 timestamp
  • Example: ISODate("2025-01-15T10:00:00.000Z")
  • Description: Record creation timestamp
  • Collections: All collections
  • Used By: All services (auditing)

D - E

deleted_at

  • Type: ISODate | null
  • Description: Soft delete timestamp
  • null: Record has not been deleted
  • Collections: chatbot_selections, projectid_creation, trash_collection_name
  • Used By: chatbot-maintenance-service

department_id

  • Type: String
  • Format: "dept_{name}"
  • Example: "dept_sales"
  • Description: Department unique identifier
  • Collections: organisation_data, users_multichatbot_v2
  • Used By: Enterprise multi-user features

device_type

  • Type: String
  • Values: "mobile" | "desktop"
  • Description: User device type
  • Collections: chatbot_history
  • Used By: Analytics

email

  • Type: String (Email)
  • Validation: Valid email format
  • Indexed: Unique index
  • Description: User email address (login identifier)
  • Collections: users_multichatbot_v2
  • Used By: auth-service, user-service

embedding

  • Type: Array | FLOAT_VECTOR (Milvus)
  • Dimensions: 1536 (text-embedding-ada-002) or 384 (BAAI/bge-small-en-v1.5)
  • Description: Vector embedding for semantic search
  • Collections: files, Milvus collections
  • Used By: RAG pipeline

embedding_model

  • Type: String
  • Values: "text-embedding-ada-002" | "BAAI/bge-small-en-v1.5"
  • Description: Model used to generate embeddings
  • Collections: files
  • Used By: Data ingestion services
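Because a Milvus FLOAT_VECTOR field is declared with a fixed dimension, vectors from the two documented models cannot share one collection. A hypothetical guard that checks a vector against its embedding_model before insertion might look like this; the dimension table restates the values documented under embedding, and check_embedding is an illustrative name.

```python
from typing import Sequence

# Dimensions as documented for the two embedding models in this dictionary.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "BAAI/bge-small-en-v1.5": 384,
}


def check_embedding(embedding: Sequence[float], embedding_model: str) -> None:
    """Raise ValueError if the vector length does not match its model."""
    expected = EMBEDDING_DIMENSIONS.get(embedding_model)
    if expected is None:
        raise ValueError(f"unknown embedding_model: {embedding_model!r}")
    if len(embedding) != expected:
        raise ValueError(
            f"{embedding_model} vectors have {expected} dimensions, got {len(embedding)}"
        )


check_embedding([0.0] * 384, "BAAI/bge-small-en-v1.5")  # passes silently
```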

extracted_text

  • Type: String | Object
  • Description: Text extracted from uploaded files or URLs
  • Format (files): String
  • Format (URLs): { "url": "text content" }
  • Collections: files
  • Used By: Data crawling, file upload services
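Consumers of extracted_text have to handle both storage formats. This sketch assumes the URL form is a mapping from each crawled URL to its text, and labels file text with file_name; normalize_extracted_text is a hypothetical helper, not an existing service function.

```python
from typing import Dict, List, Tuple, Union


def normalize_extracted_text(
    extracted_text: Union[str, Dict[str, str]], file_name: str
) -> List[Tuple[str, str]]:
    """Return (source, text) pairs regardless of how extracted_text was stored."""
    if isinstance(extracted_text, str):
        # Uploaded files: one plain string, labelled with the file name.
        return [(file_name, extracted_text)]
    if isinstance(extracted_text, dict):
        # Crawled URLs (assumed shape): {url: text, ...}.
        return list(extracted_text.items())
    raise TypeError(f"unexpected extracted_text type: {type(extracted_text).__name__}")


print(normalize_extracted_text("Widget A: $10", "product_catalog.pdf"))
# [('product_catalog.pdf', 'Widget A: $10')]
```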

F - G

features_enabled

  • Type: Array
  • Example: ["department_partitioning", "advanced_analytics", "custom_branding"]
  • Description: Enterprise features enabled for the organization
  • Collections: organisation_data
  • Used By: Enterprise feature flags

file_name

  • Type: String
  • Example: "product_catalog.pdf"
  • Description: Original uploaded file name
  • Collections: files, files_secondary
  • Used By: File management services

file_size_bytes

  • Type: Number (Integer)
  • Description: File size in bytes
  • Example: 2048576 (≈ 2 MB)
  • Collections: files
  • Used By: Storage management

file_type

  • Type: String
  • Values: "pdf" | "docx" | "xlsx" | "txt" | "url" | "qna"
  • Description: Type of uploaded/ingested content
  • Collections: files, chatbot_history (Milvus)
  • Used By: Data processing pipeline

greeting_message

  • Type: String
  • Max Length: 500
  • Description: Initial message shown to users
  • Example: "Hello! How can I help you today?"
  • Collections: chatbot_selections
  • Used By: Frontend chatbot UI

guardrails_enabled

  • Type: Boolean
  • Description: Whether content moderation is active
  • Collections: chatbot_selections
  • Used By: Response services (guardrails check)

I - L

industry

  • Type: String
  • Example: "E-commerce", "Healthcare", "Technology"
  • Description: Business industry classification
  • Collections: projectid_creation, organisation_data
  • Used By: Analytics, segmentation

ip_whitelist

  • Type: Array
  • Format: CIDR notation
  • Example: ["203.0.113.0/24", "198.51.100.50/32"]
  • Description: Allowed IP ranges (Enterprise security)
  • Collections: organisation_data
  • Used By: Network security, gateway
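A gateway-side check of ip_whitelist can be sketched with Python's standard ipaddress module. is_ip_allowed is a hypothetical helper name, and strict=False is an assumption that tolerates entries with host bits set (e.g. "203.0.113.7/24").

```python
import ipaddress
from typing import Iterable


def is_ip_allowed(client_ip: str, ip_whitelist: Iterable[str]) -> bool:
    """Return True if client_ip lies inside any CIDR range of the whitelist."""
    address = ipaddress.ip_address(client_ip)  # accepts IPv4 or IPv6
    for cidr in ip_whitelist:
        network = ipaddress.ip_network(cidr, strict=False)
        # Membership is simply False when the IP versions differ.
        if address in network:
            return True
    return False


whitelist = ["203.0.113.0/24", "198.51.100.50/32"]
print(is_ip_allowed("203.0.113.50", whitelist))   # True
print(is_ip_allowed("198.51.100.51", whitelist))  # False
```

In this sketch an empty whitelist denies every client; whether the gateway instead treats an empty list as "no restriction" is not specified here.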

isDeleted

  • Type: Boolean
  • Description: Soft delete flag
  • Default: false
  • Collections: chatbot_selections, projectid_creation
  • Used By: chatbot-maintenance-service
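Soft deletion touches both isDeleted and deleted_at. A minimal sketch of the MongoDB filter and update documents, assuming a chatbot counts as live whenever isDeleted is not true (so older documents without the flag still match). The helper names are hypothetical; with PyMongo these documents would be passed to collection.find(filter) and collection.update_one(filter, update).

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Live documents: isDeleted is anything but True (including absent).
ACTIVE_FILTER: Dict[str, Any] = {"isDeleted": {"$ne": True}}


def soft_delete_update(now: Optional[datetime] = None) -> Dict[str, Any]:
    """Update document that marks a chatbot/project as deleted."""
    now = now or datetime.now(timezone.utc)
    return {"$set": {"isDeleted": True, "deleted_at": now}}


def restore_update() -> Dict[str, Any]:
    """Update document that undoes a soft delete (deleted_at back to null)."""
    return {"$set": {"isDeleted": False, "deleted_at": None}}


# With PyMongo (connection details omitted):
# selections.find({"user_id": "User-123456", **ACTIVE_FILTER})
# selections.update_one({"user_id": "User-123456", "project_id": pid}, soft_delete_update())
```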

last_active

  • Type: ISODate
  • Description: Last chatbot interaction timestamp
  • Collections: chatbot_selections
  • Used By: Analytics (chatbot usage)

lipsync_data

  • Type: Array
  • Description: Rhubarb lip-sync animation data (3D chatbots only)
  • Collections: chatbot_history
  • Used By: response-3d-chatbot-service, frontend 3D renderer

M - N

max_tokens

  • Type: Number (Integer)
  • Range: 256 - 4096
  • Default: 2048
  • Description: Maximum LLM response tokens
  • Collections: chatbot_selections
  • Used By: Response services (LLM calls)

member_user_ids

  • Type: Array
  • Description: Department members (non-admin)
  • Collections: organisation_data (departments array)
  • Used By: Enterprise RBAC

model_used

  • Type: String
  • Values: "gpt-4-turbo" | "gpt-3.5-turbo" | "gpt-4" | etc.
  • Description: LLM model used for response generation
  • Collections: chatbot_history
  • Used By: Analytics, cost tracking

name

  • Type: String
  • Max Length: 100
  • Description: User's full name
  • Collections: users_multichatbot_v2, organisation_data
  • Used By: User management, display

O - P

organization_id

  • Type: String
  • Format: "org_{uuid}"
  • Example: "org_abc123"
  • Description: Enterprise organization unique identifier
  • Indexed: Yes (unique)
  • Collections: organisation_data, users_multichatbot_v2
  • Used By: Enterprise features

otp

  • Type: String
  • Format: 6 digits
  • Example: "123456"
  • Description: One-time password for email verification
  • Expiry: 60 seconds
  • Collections: users_multichatbot_v2
  • Used By: user-service (signup, password reset)

otp_expiration

  • Type: Number (Integer)
  • Format: Unix timestamp
  • Example: 1735214460
  • Description: OTP expiration time (60 seconds from generation)
  • Collections: users_multichatbot_v2
  • Used By: user-service (OTP validation)
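OTP issuance and checking, sketched under the documented rules (6 digits, expiry 60 seconds after generation, stored as a Unix timestamp). Using the secrets module for generation and a constant-time comparison are choices of this sketch, and generate_otp / is_otp_valid are hypothetical names.

```python
import secrets
import time
from typing import Optional, Tuple

OTP_TTL_SECONDS = 60  # documented expiry


def generate_otp(now: Optional[int] = None) -> Tuple[str, int]:
    """Return (otp, otp_expiration) for a new verification request."""
    issued_at = int(time.time()) if now is None else now
    otp = f"{secrets.randbelow(1_000_000):06d}"  # 6 digits, leading zeros kept
    return otp, issued_at + OTP_TTL_SECONDS


def is_otp_valid(submitted: str, stored_otp: str, otp_expiration: int,
                 now: Optional[int] = None) -> bool:
    """Accept the OTP only at or before otp_expiration and only on an exact match."""
    current = int(time.time()) if now is None else now
    if current > otp_expiration:
        return False
    # Constant-time comparison avoids leaking how many leading digits matched.
    return secrets.compare_digest(submitted, stored_otp)


otp, expiration = generate_otp(now=1735214400)
print(expiration)  # 1735214460, matching the documented example
```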

owner_user_id

  • Type: String
  • Format: "User-XXXXXX"
  • Description: Organization owner (admin role)
  • Collections: organisation_data
  • Used By: Enterprise management

password

  • Type: String
  • ⚠️ WARNING: Currently stored in PLAIN TEXT (security issue)
  • Description: User password (should be stored as a bcrypt hash)
  • Collections: users_multichatbot_v2
  • Used By: auth-service (login), user-service (signup)
  • FIX REQUIRED: Migrate to a password_hash field containing a bcrypt hash
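The fix note calls for bcrypt. So that this sketch runs without third-party packages, it uses the standard library's hashlib.pbkdf2_hmac as a stand-in; with the bcrypt package the equivalent calls are bcrypt.hashpw(password, bcrypt.gensalt()) and bcrypt.checkpw(password, hashed). The "pbkdf2_sha256$..." string layout, the iteration count, and the helper names are assumptions. migration_update shows how one plaintext record could be converted to password_hash.

```python
import hashlib
import hmac
import os
from typing import Any, Dict

ITERATIONS = 600_000  # assumption; tune to your latency budget


def hash_password(password: str) -> str:
    """Salted PBKDF2-HMAC-SHA256 hash encoded as algorithm$iterations$salt$digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, password_hash: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    try:
        algorithm, iterations, salt_hex, digest_hex = password_hash.split("$")
    except ValueError:
        return False  # malformed stored value
    if algorithm != "pbkdf2_sha256":
        return False
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)


def migration_update(user_doc: Dict[str, Any]) -> Dict[str, Any]:
    """Update document replacing a plaintext password with password_hash."""
    return {
        "$set": {"password_hash": hash_password(user_doc["password"])},
        "$unset": {"password": ""},
    }
```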

payment_status

  • Type: String
  • Values: "pending" | "active" | "failed" | "cancelled"
  • Description: Payment/subscription status
  • Collections: users_multichatbot_v2
  • Used By: Payment service, feature access control

plan

  • Type: String
  • Values: "Free" | "Pro" | "Business" | "Premium" | "Enterprise"
  • Description: Subscription plan name
  • Collections: organisation_data
  • Used By: Feature gating, analytics

project_id

  • Type: String
  • Format: "{userid}_Project" or "{userid}"
  • Example: "User-123456_Project_Support"
  • Description: Unique chatbot/project identifier
  • Indexed: Yes (composite with user_id)
  • Collections: chatbot_selections, chatbot_history, files, projectid_creation, etc.
  • Used By: All chatbot services

Q - R

question

  • Type: String
  • Max Length: 65535
  • Description: User's question/input
  • Collections: chatbot_history
  • Used By: Response services, analytics

response_time_ms

  • Type: Number (Integer)
  • Description: Total response generation time (milliseconds)
  • Example: 1234 (1.23 seconds)
  • Collections: chatbot_history
  • Used By: Performance monitoring

retrieved_context

  • Type: Array
  • Description: RAG context chunks from Milvus
  • Example: ["Business hours: Mon-Fri 9-5", "Contact: support@example.com"]
  • Collections: chatbot_history
  • Used By: Response services, debugging

role

  • Type: String
  • Values: "Owner" | "Admin" | "Editor" | "Viewer" | "Analyst"
  • Description: User's RBAC role
  • Collections: users_multichatbot_v2
  • Used By: Access control, authorization

S

seats_purchased

  • Type: Number (Integer)
  • Description: Number of user seats in organization plan
  • Example: 50
  • Collections: organisation_data
  • Used By: Enterprise billing, user limits

seats_used

  • Type: Number (Integer)
  • Description: Current number of active users
  • Collections: organisation_data
  • Used By: Seat management
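Enforcing the seat limit is safest when the check and the increment happen in one update, so two concurrent invites cannot both take the last seat. A sketch of the filter/update pair, assuming organisation_data documents carry organization_id, seats_used and seats_purchased at the top level; reserve_seat_ops and seats_available are hypothetical names.

```python
from typing import Any, Dict, Tuple


def reserve_seat_ops(organization_id: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return a (filter, update) pair that claims one seat only if one is free."""
    seat_filter = {
        "organization_id": organization_id,
        # $expr lets the server compare two fields of the same document.
        "$expr": {"$lt": ["$seats_used", "$seats_purchased"]},
    }
    seat_update = {"$inc": {"seats_used": 1}}
    return seat_filter, seat_update


def seats_available(org: Dict[str, Any]) -> int:
    """Free seats in an organisation_data document (never negative)."""
    return max(org["seats_purchased"] - org["seats_used"], 0)


print(seats_available({"seats_purchased": 50, "seats_used": 48}))  # 2

# With PyMongo (connection details omitted):
# result = db.organisation_data.update_one(*reserve_seat_ops("org_abc123"))
# result.modified_count == 0 means the org is unknown or has no free seat.
```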

selection_avatar

  • Type: String
  • Values: "Avatar_Lisa" | "Avatar_Emma" | "Avatar_Jack" | "Avatar_Sarah" | "Avatar_Michael"
  • Description: Selected 3D avatar character
  • Collections: chatbot_selections
  • Used By: response-3d-chatbot-service, frontend

selection_model

  • Type: String
  • Values: "gpt-4-turbo" | "gpt-4" | "gpt-3.5-turbo" | etc.
  • Description: Selected LLM model
  • Collections: chatbot_selections
  • Used By: Response services (LLM selection)

selection_voice

  • Type: String
  • Values: "Female_1" | "Female_2" | "Male_1" | "Male_2" | etc.
  • Description: Azure TTS voice ID
  • Collections: chatbot_selections
  • Used By: TTS services

session_id

  • Type: String
  • Format: "session_{randomstring}"
  • Example: "session_abc123xyz"
  • Description: Groups conversation turns from the same session
  • Collections: chatbot_history
  • Used By: Chat history retrieval, analytics

source_url

  • Type: String (URL)
  • Description: Original URL for crawled content
  • Collections: files (file_type="url")
  • Used By: Data crawling service

sso_enabled

  • Type: Boolean
  • Description: Single Sign-On enabled for organization
  • Collections: organisation_data
  • Used By: Enterprise SSO integration

sso_provider

  • Type: String
  • Values: "Okta" | "Azure AD" | "Google" | "Custom SAML"
  • Description: SSO identity provider
  • Collections: organisation_data
  • Used By: SSO authentication flow

status

  • Type: String
  • Values: "active" | "paused" | "archived"
  • Description: Project/chatbot operational status
  • Collections: projectid_creation
  • Used By: Chatbot management

subscription_date

  • Type: String (ISO Date)
  • Format: "YYYY-MM-DD"
  • Example: "2025-01-15"
  • Description: Subscription start date
  • Collections: users_multichatbot_v2
  • Used By: Billing, analytics

subscription_id

  • Type: String
  • Format: "sub_{number}"
  • Values: "sub_009" (Free), "sub_001" (Pro), etc.
  • Description: Subscription plan identifier
  • Collections: users_multichatbot_v2
  • Used By: Feature access control

system_prompt

  • Type: String
  • Max Length: 10000
  • Description: Custom system prompt for LLM
  • Example: "You are a helpful customer support assistant..."
  • Collections: system_prompts_user
  • Used By: Response services (LLM context)

T - U

temperature

  • Type: Number (Float)
  • Range: 0.0 - 1.0
  • Default: 0.7
  • Description: LLM creativity/randomness parameter
  • Collections: chatbot_selections
  • Used By: Response services (LLM calls)
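The documented ranges and defaults for temperature and max_tokens can be enforced when a chatbot_selections document is loaded. A sketch using a frozen dataclass; LLMSettings and settings_from_selection are hypothetical names, and rejecting (rather than clamping) out-of-range values is a choice of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class LLMSettings:
    temperature: float = 0.7   # documented default, range 0.0 - 1.0
    max_tokens: int = 2048     # documented default, range 256 - 4096

    def __post_init__(self) -> None:
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError(f"temperature {self.temperature} outside 0.0 - 1.0")
        if not 256 <= self.max_tokens <= 4096:
            raise ValueError(f"max_tokens {self.max_tokens} outside 256 - 4096")


def settings_from_selection(selection: Dict[str, Any]) -> LLMSettings:
    """Build LLM settings from a chatbot_selections document, applying defaults."""
    return LLMSettings(
        temperature=float(selection.get("temperature", 0.7)),
        max_tokens=int(selection.get("max_tokens", 2048)),
    )


print(settings_from_selection({"temperature": 0.2}))
# LLMSettings(temperature=0.2, max_tokens=2048)
```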

text

  • Type: String (MongoDB) | VARCHAR(65535) (Milvus)
  • Description: Text content (MongoDB) or vector metadata (Milvus)
  • Collections: files, Milvus collections
  • Used By: RAG pipeline, search results

timestamp

  • Type: ISODate | INT64 (Milvus)
  • Description: Record timestamp
  • Collections: chatbot_history, Milvus
  • Used By: Chronological queries, retention policies

tokens_used

  • Type: Number (Integer)
  • Description: Total tokens consumed (prompt + completion)
  • Collections: chatbot_history
  • Used By: Cost tracking, analytics

total_conversations

  • Type: Number (Integer)
  • Description: Lifetime conversation count for chatbot
  • Collections: chatbot_selections
  • Used By: Analytics, usage metrics

total_documents, total_urls_crawled, total_qna_pairs, total_vectors

  • Type: Number (Integer)
  • Description: Statistics for project knowledge base
  • Collections: projectid_creation
  • Used By: Analytics, project overview

updated_at

  • Type: ISODate
  • Description: Last modification timestamp
  • Collections: Most collections
  • Used By: Change tracking

url

  • Type: String (VARCHAR 2048 in Milvus)
  • Description: Source URL for content
  • Collections: files, Milvus
  • Used By: Attribution, source tracking

use_case

  • Type: String
  • Example: "Customer Support", "Lead Generation"
  • Description: Business use case for chatbot
  • Collections: projectid_creation
  • Used By: Analytics, categorization

user_agent

  • Type: String
  • Example: "Mozilla/5.0..."
  • Description: Browser user agent string
  • Collections: chatbot_history
  • Used By: Analytics, device detection

user_consent

  • Type: Object
  • Schema:
    {
        "data_processing": true,
        "marketing_emails": false,
        "analytics_tracking": true,
        "consented_at": "2025-01-15T10:00:00Z",
        "consent_version": "v1.0"
    }

  • Description: GDPR/DPDPA consent tracking
  • Collections: users_multichatbot_v2
  • Used By: Compliance, data processing
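A consent check before processing might read user_consent like this. It assumes a missing user_consent object means no consent was given, that only an explicit true counts as opt-in, and that callers may demand a specific consent_version; has_consent is a hypothetical helper.

```python
from typing import Any, Dict, Optional

CONSENT_PURPOSES = {"data_processing", "marketing_emails", "analytics_tracking"}


def has_consent(
    user: Dict[str, Any], purpose: str, required_version: Optional[str] = None
) -> bool:
    """True only if the user explicitly opted in to `purpose`."""
    if purpose not in CONSENT_PURPOSES:
        raise ValueError(f"unknown consent purpose: {purpose!r}")
    consent = user.get("user_consent") or {}  # missing object = no consent
    if required_version is not None and consent.get("consent_version") != required_version:
        return False  # consent was given under a different policy version
    return consent.get(purpose) is True


user = {"user_consent": {"data_processing": True, "marketing_emails": False,
                         "analytics_tracking": True,
                         "consented_at": "2025-01-15T10:00:00Z",
                         "consent_version": "v1.0"}}
print(has_consent(user, "analytics_tracking"))  # True
print(has_consent(user, "marketing_emails"))    # False
```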

user_created_at

  • Type: String (ISO Date)
  • Format: "YYYY-MM-DD"
  • Description: User account creation date
  • Collections: users_multichatbot_v2
  • Used By: User analytics

user_id

  • Type: String
  • Format: "User-XXXXXX" (6 random digits)
  • Example: "User-123456"
  • Generated: On OTP verification
  • Indexed: Yes
  • Description: Unique user identifier
  • Collections: All collections
  • Used By: All services (primary user reference)
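A sketch of user_id generation under the documented "User-XXXXXX" format, assuming leading zeros are allowed (so "User-004512" is valid) and that the caller supplies an exists() lookup against users_multichatbot_v2. generate_user_id and USER_ID_PATTERN are hypothetical names.

```python
import re
import secrets
from typing import Callable

USER_ID_PATTERN = re.compile(r"User-[0-9]{6}")


def generate_user_id(exists: Callable[[str], bool], max_attempts: int = 10) -> str:
    """Draw "User-" + 6 random digits until an unused ID is found."""
    for _ in range(max_attempts):
        candidate = f"User-{secrets.randbelow(1_000_000):06d}"  # zero-padded
        if not exists(candidate):
            return candidate
    raise RuntimeError(f"no free user_id after {max_attempts} attempts")


taken = {"User-123456"}
new_id = generate_user_id(lambda candidate: candidate in taken)
print(USER_ID_PATTERN.fullmatch(new_id) is not None)  # True
```

Six digits allow only 1,000,000 distinct IDs, so retries become more frequent as the ID space fills; a unique index on user_id is what actually guarantees uniqueness, while the retry loop only keeps inserts from failing.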

user_ip

  • Type: String (IP address)
  • Format: IPv4 or IPv6
  • Example: "203.0.113.50"
  • Description: Client IP address
  • Collections: chatbot_history
  • Used By: Security, analytics, geo-location

V - Z

verified

  • Type: Boolean
  • Description: Email verification status
  • Default: false
  • Set to true: After OTP verification
  • Collections: users_multichatbot_v2
  • Used By: auth-service (login check)
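The login gate implied by verified and account_status, sketched under the assumption that only verified users whose account_status is "active" may sign in (the documented default "paused" applies until OTP verification). login_block_reason is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def login_block_reason(user: Dict[str, Any]) -> Optional[str]:
    """Return None if the user may log in, otherwise a short reason."""
    if not user.get("verified", False):               # documented default: false
        return "email not verified"
    status = user.get("account_status", "paused")     # documented default
    if status != "active":                            # "paused" or "suspended"
        return f"account {status}"
    return None


print(login_block_reason({"verified": True, "account_status": "active"}))  # None
print(login_block_reason({"verified": False}))  # email not verified
```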


"Names matter. Document them well." 📖✅