Data Dictionary
Section: 4-data-architecture-governance
Document: Complete Field Reference
Collections: All 9 MongoDB collections
Fields: 50+ documented
🎯 Overview
Comprehensive reference guide for all fields across MongoDB collections. Use this as a quick lookup for field names, types, descriptions, and validation rules.
📚 Field Reference (Alphabetical)
A
account_status
- Type: String
- Values: "paused" | "active" | "suspended"
- Description: User account status
- Default: "paused" (until OTP verification)
- Collections: users_multichatbot_v2
- Used By: All services (authorization checks)
admin_user_ids
- Type: Array
- Format: ["User-123456", "User-789012"]
- Description: Department admins (organization management)
- Collections: organisation_data (departments array)
- Used By: Enterprise features
answer
- Type: String
- Max Length: 65535
- Description: Chatbot response text
- Collections: chatbot_history
- Used By: response-3d/text/voice-chatbot-service
audio_duration_seconds
- Type: Number (Float)
- Description: TTS audio length in seconds
- Example: 3.5
- Collections: chatbot_history
- Used By: response-3d-chatbot-service, response-voice-chatbot-service
B - C
brand_color
- Type: String
- Format: Hex color code (e.g., "#FF6622")
- Description: Chatbot UI brand color
- Collections: chatbot_selections
- Used By: Frontend (chatbot UI)
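A minimal sketch of how the hex format above could be checked before a value is stored. It assumes only the 6-digit "#RRGGBB" form from the example is accepted; the function name is hypothetical and not part of any documented service.

```python
import re

# Assumption: only the 6-digit "#RRGGBB" form shown in the example is valid.
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def is_valid_brand_color(value) -> bool:
    """Return True if value is a string such as "#FF6622"."""
    return isinstance(value, str) and HEX_COLOR.fullmatch(value) is not None
```

If shorthand colors such as "#F62" should also be allowed, the pattern would need a second alternative.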
chatbot_description
- Type: String
- Max Length: 500
- Description: Chatbot purpose description
- Collections: projectid_creation
- Used By: create-chatbot-service
chatbot_logo_url
- Type: String (URL)
- Description: Custom chatbot logo
- Optional: Yes
- Collections: chatbot_selections
- Used By: Frontend
chatbot_name
- Type: String
- Max Length: 100
- Description: Display name for chatbot
- Example: "Customer Support Bot"
- Collections: chatbot_selections, projectid_creation
- Used By: All chatbot services
chatbot_type
- Type: String
- Values: "3d" | "text" | "voice"
- Description: Type of chatbot interface
- Collections: chatbot_selections, projectid_creation
- Used By: Selection services, response services
chunk_id
- Type: String
- Format: "{userid}_{projectid}_chunk_{index}"
- Example: "User-123456_Project_1_chunk_0"
- Description: Unique identifier for text chunk
- Collections: files (chunks array), Milvus vectors
- Used By: RAG pipeline
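The chunk_id example above ends in "_chunk_" followed by a number. A small sketch of building and splitting such identifiers is shown here; it assumes the trailing number is a zero-based chunk index and treats everything before "_chunk_" as an opaque prefix. Both helper names are hypothetical.

```python
def make_chunk_id(prefix: str, index: int) -> str:
    """Build an id like "User-123456_Project_1_chunk_0" from a prefix and index."""
    if index < 0:
        raise ValueError("chunk index must be non-negative")
    return f"{prefix}_chunk_{index}"


def split_chunk_id(chunk_id: str):
    """Split a chunk id into (prefix, index); raise ValueError if malformed."""
    prefix, sep, index = chunk_id.rpartition("_chunk_")
    if not sep or not prefix or not index.isdigit():
        raise ValueError(f"not a chunk id: {chunk_id!r}")
    return prefix, int(index)
```

Using rpartition keeps the split correct even if the prefix itself happens to contain "_chunk_".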
chunks
- Type: Array
- Schema:
- Description: Text divided into searchable chunks
- Collections: files
- Used By: Data ingestion, vectorization
created_at
- Type: ISODate
- Format: ISO 8601 timestamp
- Example: ISODate("2025-01-15T10:00:00.000Z")
- Description: Record creation timestamp
- Collections: All collections
- Used By: All services (auditing)
D - E
deleted_at
- Type: ISODate | null
- Description: Soft delete timestamp
- Null: Not deleted
- Collections: chatbot_selections, projectid_creation, trash_collection_name
- Used By: chatbot-maintenance-service
department_id
- Type: String
- Format: "dept_{name}"
- Example: "dept_sales"
- Description: Department unique identifier
- Collections: organisation_data, users_multichatbot_v2
- Used By: Enterprise multi-user features
device_type
- Type: String
- Values: "mobile" | "desktop"
- Description: User device type
- Collections: chatbot_history
- Used By: Analytics
email
- Type: String (Email)
- Validation: Valid email format
- Indexed: Unique index
- Description: User email address (login identifier)
- Collections: users_multichatbot_v2
- Used By: auth-service, user-service
embedding
- Type: Array | FLOAT_VECTOR (Milvus)
- Dimensions: 1536 (OpenAI) or 384 (bge-small)
- Description: Vector embedding for semantic search
- Collections: files, Milvus collections
- Used By: RAG pipeline
embedding_model
- Type: String
- Values: "text-embedding-ada-002" | "BAAI/bge-small-en-v1.5"
- Description: Model used to generate embeddings
- Collections: files
- Used By: Data ingestion services
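Because the two embedding models produce vectors of different lengths, a length check before insertion can catch a model mismatch early. The sketch below pairs each embedding_model value with the dimension listed under embedding; the function name is hypothetical.

```python
# Dimensions as listed for the embedding field: 1536 (OpenAI) or 384 (bge-small).
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "BAAI/bge-small-en-v1.5": 384,
}


def check_embedding(model: str, vector) -> None:
    """Raise ValueError if the vector length does not match the model."""
    expected = EMBEDDING_DIMENSIONS.get(model)
    if expected is None:
        raise ValueError(f"unknown embedding model: {model!r}")
    if len(vector) != expected:
        raise ValueError(
            f"{model} expects {expected} dimensions, got {len(vector)}"
        )
```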
extracted_text
- Type: String | Object
- Description: Text extracted from uploaded files or URLs
- Format (files): String
- Format (URLs): { "url": "text content" }
- Collections: files
- Used By: Data crawling, file upload services
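Since extracted_text can be either a plain string or a URL-keyed object, consumers may find it easier to normalize both shapes first. This is a sketch under that assumption; the helper name and the use of None as the source for file uploads are illustrative choices.

```python
def normalize_extracted_text(extracted_text):
    """Return a list of (source, text) pairs.

    For file uploads (a plain string) the source is None.
    For crawled URLs (a dict) the source is the URL key.
    """
    if isinstance(extracted_text, str):
        return [(None, extracted_text)]
    if isinstance(extracted_text, dict):
        return [(url, text) for url, text in extracted_text.items()]
    raise TypeError(
        f"unsupported extracted_text type: {type(extracted_text).__name__}"
    )
```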
F - G
features_enabled
- Type: Array
- Example: ["department_partitioning", "advanced_analytics", "custom_branding"]
- Description: Enterprise features enabled for organization
- Collections: organisation_data
- Used By: Enterprise feature flags
file_name
- Type: String
- Example: "product_catalog.pdf"
- Description: Original uploaded file name
- Collections: files, files_secondary
- Used By: File management services
file_size_bytes
- Type: Number (Integer)
- Description: File size in bytes
- Example: 2048576 (approximately 2 MB)
- Collections: files
- Used By: Storage management
file_type
- Type: String
- Values: "pdf" | "docx" | "xlsx" | "txt" | "url" | "qna"
- Description: Type of uploaded/ingested content
- Collections: files, chatbot_history (Milvus)
- Used By: Data processing pipeline
greeting_message
- Type: String
- Max Length: 500
- Description: Initial message shown to users
- Example: "Hello! How can I help you today?"
- Collections: chatbot_selections
- Used By: Frontend chatbot UI
guardrails_enabled
- Type: Boolean
- Description: Whether content moderation is active
- Collections: chatbot_selections
- Used By: Response services (guardrails check)
I - L
industry
- Type: String
- Example: "E-commerce", "Healthcare", "Technology"
- Description: Business industry classification
- Collections: projectid_creation, organisation_data
- Used By: Analytics, segmentation
ip_whitelist
- Type: Array
- Format: CIDR notation
- Example: ["203.0.113.0/24", "198.51.100.50/32"]
- Description: Allowed IP ranges (Enterprise security)
- Collections: organisation_data
- Used By: Network security, gateway
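A sketch of how a gateway might test a client address against ip_whitelist, using Python's standard ipaddress module. It assumes an empty whitelist denies every address; the real gateway may treat an empty list as "no restriction". The function name is hypothetical.

```python
import ipaddress


def ip_allowed(client_ip: str, whitelist) -> bool:
    """Return True if client_ip falls inside any CIDR range in whitelist."""
    addr = ipaddress.ip_address(client_ip)
    for cidr in whitelist:
        # strict=False tolerates entries whose host bits are set.
        network = ipaddress.ip_network(cidr, strict=False)
        if addr.version == network.version and addr in network:
            return True
    # Assumption: nothing matched (or the list is empty) means deny.
    return False
```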
isDeleted
- Type: Boolean
- Description: Soft delete flag
- Default: false
- Collections: chatbot_selections, projectid_creation
- Used By: chatbot-maintenance-service
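The isDeleted flag and the deleted_at timestamp together describe a soft delete. The sketch below builds MongoDB-style filter and update documents as plain dictionaries; it assumes both fields are written in the same update, which this reference does not state. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def active_filter(user_id: str) -> dict:
    """Filter for a user's records that have not been soft-deleted."""
    # $ne also matches documents where isDeleted is missing entirely.
    return {"user_id": user_id, "isDeleted": {"$ne": True}}


def soft_delete_update(now: Optional[datetime] = None) -> dict:
    """Update document that marks a record as soft-deleted."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {"$set": {"isDeleted": True, "deleted_at": now}}
```

These dictionaries have the shape a MongoDB driver expects for a query filter and an update operation, so they can be passed to the driver unchanged.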
last_active
- Type: ISODate
- Description: Last chatbot interaction timestamp
- Collections: chatbot_selections
- Used By: Analytics (chatbot usage)
lipsync_data
- Type: Array
- Description: Rhubarb lip-sync animation data (3D chatbots only)
- Collections: chatbot_history
- Used By: response-3d-chatbot-service, frontend 3D renderer
M - N
max_tokens
- Type: Number (Integer)
- Range: 256 - 4096
- Default: 2048
- Description: Maximum LLM response tokens
- Collections: chatbot_selections
- Used By: Response services (LLM calls)
member_user_ids
- Type: Array
- Description: Department members (non-admin)
- Collections: organisation_data (departments array)
- Used By: Enterprise RBAC
model_used
- Type: String
- Values: "gpt-4-turbo" | "gpt-3.5-turbo" | "gpt-4" | etc.
- Description: LLM model used for response generation
- Collections: chatbot_history
- Used By: Analytics, cost tracking
name
- Type: String
- Max Length: 100
- Description: User's full name
- Collections: users_multichatbot_v2, organisation_data
- Used By: User management, display
O - P
organization_id
- Type: String
- Format: "org_{uuid}"
- Example: "org_abc123"
- Description: Enterprise organization unique identifier
- Indexed: Yes (unique)
- Collections: organisation_data, users_multichatbot_v2
- Used By: Enterprise features
otp
- Type: String
- Format: 6 digits
- Example: "123456"
- Description: One-time password for email verification
- Expiry: 60 seconds
- Collections: users_multichatbot_v2
- Used By: user-service (signup, password reset)
otp_expiration
- Type: Number (Integer)
- Format: Unix timestamp
- Example: 1735214460
- Description: OTP expiration time (60 seconds from generation)
- Collections: users_multichatbot_v2
- Used By: user-service (OTP validation)
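The otp and otp_expiration fields can be illustrated with a short sketch: a 6-digit code plus a Unix timestamp 60 seconds in the future. The function names are hypothetical, and treating a code as still valid at the exact expiration second is an assumption.

```python
import secrets
import time

OTP_TTL_SECONDS = 60  # expiry stated for the otp field


def new_otp(now=None) -> dict:
    """Return a fresh otp / otp_expiration pair."""
    now = int(time.time()) if now is None else int(now)
    return {
        "otp": f"{secrets.randbelow(10**6):06d}",  # zero-padded to 6 digits
        "otp_expiration": now + OTP_TTL_SECONDS,
    }


def otp_is_valid(record: dict, submitted: str, now=None) -> bool:
    """Check the submitted code against the stored one and its expiry."""
    now = int(time.time()) if now is None else int(now)
    if now > record["otp_expiration"]:
        return False
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return secrets.compare_digest(record["otp"].encode(), submitted.encode())
```

With a generation time of 1735214400, the expiration comes out as 1735214460, matching the otp_expiration example.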
owner_user_id
- Type: String
- Format: "User-XXXXXX"
- Description: Organization owner (admin role)
- Collections: organisation_data
- Used By: Enterprise management
password
- Type: String
- ⚠️ WARNING: Currently stored in PLAIN TEXT! (security issue)
- Description: User password (should be bcrypt hash!)
- Collections: users_multichatbot_v2
- Used By: auth-service (login), user-service (signup)
- FIX REQUIRED: Migrate to password_hash field with bcrypt
payment_status
- Type: String
- Values: "pending" | "active" | "failed" | "cancelled"
- Description: Payment/subscription status
- Collections: users_multichatbot_v2
- Used By: Payment service, feature access control
plan
- Type: String
- Values: "Free" | "Pro" | "Business" | "Premium" | "Enterprise"
- Description: Subscription plan name
- Collections: organisation_data
- Used By: Feature gating, analytics
project_id
- Type: String
- Format: "{userid}_Project" or "{userid}"
- Example: "User-123456_Project_Support"
- Description: Unique chatbot/project identifier
- Indexed: Yes (composite with user_id)
- Collections: chatbot_selections, chatbot_history, files, projectid_creation, etc.
- Used By: All chatbot services
Q - R
question
- Type: String
- Max Length: 65535
- Description: User's question/input
- Collections: chatbot_history
- Used By: Response services, analytics
response_time_ms
- Type: Number (Integer)
- Description: Total response generation time (milliseconds)
- Example: 1234 (1.23 seconds)
- Collections: chatbot_history
- Used By: Performance monitoring
retrieved_context
- Type: Array
- Description: RAG context chunks from Milvus
- Example: ["Business hours: Mon-Fri 9-5", "Contact: support@example.com"]
- Collections: chatbot_history
- Used By: Response services, debugging
role
- Type: String
- Values: "Owner" | "Admin" | "Editor" | "Viewer" | "Analyst"
- Description: User's RBAC role
- Collections: users_multichatbot_v2
- Used By: Access control, authorization
S
seats_purchased
- Type: Number (Integer)
- Description: Number of user seats in organization plan
- Example: 50
- Collections: organisation_data
- Used By: Enterprise billing, user limits
seats_used
- Type: Number (Integer)
- Description: Current number of active users
- Collections: organisation_data
- Used By: Seat management
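As an illustration of how seats_purchased and seats_used relate, here is a minimal seat check. It assumes a new user can be added only while seats_used is below seats_purchased; the actual enforcement rule is not specified in this reference, and the function names are hypothetical.

```python
def seats_available(org: dict) -> int:
    """Number of unused seats, never negative."""
    return max(org["seats_purchased"] - org["seats_used"], 0)


def can_add_user(org: dict) -> bool:
    """Assumption: adding a user requires at least one free seat."""
    return seats_available(org) > 0
```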
selection_avatar
- Type: String
- Values: "Avatar_Lisa" | "Avatar_Emma" | "Avatar_Jack" | "Avatar_Sarah" | "Avatar_Michael"
- Description: Selected 3D avatar character
- Collections: chatbot_selections
- Used By: response-3d-chatbot-service, frontend
selection_model
- Type: String
- Values: "gpt-4-turbo" | "gpt-4" | "gpt-3.5-turbo" | etc.
- Description: Selected LLM model
- Collections: chatbot_selections
- Used By: Response services (LLM selection)
selection_voice
- Type: String
- Values: "Female_1" | "Female_2" | "Male_1" | "Male_2" | etc.
- Description: Azure TTS voice ID
- Collections: chatbot_selections
- Used By: TTS services
session_id
- Type: String
- Format: "session_{randomstring}"
- Example: "session_abc123xyz"
- Description: Groups conversation turns from same session
- Collections: chatbot_history
- Used By: Chat history retrieval, analytics
source_url
- Type: String (URL)
- Description: Original URL for crawled content
- Collections: files (file_type="url")
- Used By: Data crawling service
sso_enabled
- Type: Boolean
- Description: Single Sign-On enabled for organization
- Collections: organisation_data
- Used By: Enterprise SSO integration
sso_provider
- Type: String
- Values: "Okta" | "Azure AD" | "Google" | "Custom SAML"
- Description: SSO identity provider
- Collections: organisation_data
- Used By: SSO authentication flow
status
- Type: String
- Values: "active" | "paused" | "archived"
- Description: Project/chatbot operational status
- Collections: projectid_creation
- Used By: Chatbot management
subscription_date
- Type: String (ISO Date)
- Format: "YYYY-MM-DD"
- Example: "2025-01-15"
- Description: Subscription start date
- Collections: users_multichatbot_v2
- Used By: Billing, analytics
subscription_id
- Type: String
- Format: "sub_{number}"
- Values: "sub_009" (Free), "sub_001" (Pro), etc.
- Description: Subscription plan identifier
- Collections: users_multichatbot_v2
- Used By: Feature access control
system_prompt
- Type: String
- Max Length: 10000
- Description: Custom system prompt for LLM
- Example: "You are a helpful customer support assistant..."
- Collections: system_prompts_user
- Used By: Response services (LLM context)
T - U
temperature
- Type: Number (Float)
- Range: 0.0 - 1.0
- Default: 0.7
- Description: LLM creativity/randomness parameter
- Collections: chatbot_selections
- Used By: Response services (LLM calls)
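The documented ranges and defaults for temperature (0.0 to 1.0, default 0.7) and max_tokens (256 to 4096, default 2048) can be applied in one place before an LLM call. This is a sketch only; the function name is hypothetical, and rejecting out-of-range values instead of clamping them is an assumption.

```python
DEFAULT_TEMPERATURE = 0.7
DEFAULT_MAX_TOKENS = 2048


def llm_params(selection: dict) -> dict:
    """Read temperature and max_tokens from a chatbot_selections document."""
    temperature = selection.get("temperature", DEFAULT_TEMPERATURE)
    max_tokens = selection.get("max_tokens", DEFAULT_MAX_TOKENS)
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature out of range: {temperature}")
    if not 256 <= max_tokens <= 4096:
        raise ValueError(f"max_tokens out of range: {max_tokens}")
    return {"temperature": float(temperature), "max_tokens": int(max_tokens)}
```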
text
- Type: String | VARCHAR(65535) (Milvus)
- Description: Text content (MongoDB) or vector metadata (Milvus)
- Collections: files, Milvus collections
- Used By: RAG pipeline, search results
timestamp
- Type: ISODate | INT64 (Milvus)
- Description: Record timestamp
- Collections: chatbot_history, Milvus
- Used By: Chronological queries, retention policies
tokens_used
- Type: Number (Integer)
- Description: Total tokens consumed (prompt + completion)
- Collections: chatbot_history
- Used By: Cost tracking, analytics
total_conversations
- Type: Number (Integer)
- Description: Lifetime conversation count for chatbot
- Collections: chatbot_selections
- Used By: Analytics, usage metrics
total_documents, total_urls_crawled, total_qna_pairs, total_vectors
- Type: Number (Integer)
- Description: Statistics for project knowledge base
- Collections: projectid_creation
- Used By: Analytics, project overview
updated_at
- Type: ISODate
- Description: Last modification timestamp
- Collections: Most collections
- Used By: Change tracking
url
- Type: String (VARCHAR 2048 in Milvus)
- Description: Source URL for content
- Collections: files, Milvus
- Used By: Attribution, source tracking
use_case
- Type: String
- Example: "Customer Support", "Lead Generation"
- Description: Business use case for chatbot
- Collections: projectid_creation
- Used By: Analytics, categorization
user_agent
- Type: String
- Example: "Mozilla/5.0..."
- Description: Browser user agent string
- Collections: chatbot_history
- Used By: Analytics, device detection
user_consent
- Type: Object
- Schema:
- Description: GDPR/DPDPA consent tracking
- Collections: users_multichatbot_v2
- Used By: Compliance, data processing
user_created_at(DATE)
- Type: String (ISO Date)
- Format: "YYYY-MM-DD"
- Description: User account creation date
- Collections: users_multichatbot_v2
- Used By: User analytics
user_id
- Type: String
- Format: "User-XXXXXX" (6 random digits)
- Example: "User-123456"
- Generated: On OTP verification
- Indexed: Yes
- Description: Unique user identifier
- Collections: All collections
- Used By: All services (primary user reference)
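A sketch of generating and validating identifiers in the "User-XXXXXX" format. Six digits allow only one million distinct values, so the sketch retries on collision; the is_taken callback, the retry limit, and the use of leading zeros are all assumptions for illustration.

```python
import re
import secrets

USER_ID = re.compile(r"User-[0-9]{6}")


def is_valid_user_id(value) -> bool:
    """Return True for strings such as "User-123456"."""
    return isinstance(value, str) and USER_ID.fullmatch(value) is not None


def new_user_id(is_taken, max_attempts: int = 10) -> str:
    """Generate an unused id; is_taken(candidate) is a hypothetical lookup."""
    for _ in range(max_attempts):
        candidate = f"User-{secrets.randbelow(10**6):06d}"
        if not is_taken(candidate):
            return candidate
    raise RuntimeError("could not find a free user_id")
```

In practice a unique index on user_id would be the final guard against duplicates, with the lookup serving only to reduce failed inserts.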
user_ip
- Type: String (IP address)
- Format: IPv4 or IPv6
- Example: "203.0.113.50"
- Description: Client IP address
- Collections: chatbot_history
- Used By: Security, analytics, geo-location
V - Z
verified
- Type: Boolean
- Description: Email verification status
- Default: false
- Set to true: After OTP verification
- Collections: users_multichatbot_v2
- Used By: auth-service (login check)
🔗 Related Documentation
- Database Schema - Full collection schemas
- Vector Store - Milvus field details
- Index - Architecture overview
Progress: Section 4 - 8/8 files complete (100%)
"Names matter. Document them well." 📖✅