Chatbot Maintenance Service (Port 8010)¶
Service Path: machineagents-be/chatbot-maintence-service/
Port: 8010
Total Lines: 1,129
Purpose: Comprehensive chatbot lifecycle management including deletion (soft/hard), trash recovery, document content management, guardrails configuration, and screenshot storage for chatbot UI simulation.
Table of Contents¶
- Service Overview
- Architecture & Dependencies
- Database Collections
- Core Features
- API Endpoints
- Chatbot Deletion System
- Trash Management System
- Document Content Management
- Guardrails System
- Screenshot Management
- Security Analysis
- Integration Points
Service Overview¶
Primary Responsibilities¶
- Chatbot Lifecycle Management:
  - Soft delete with `isDeleted` flag (CosmosDB)
  - Hard delete of embeddings (Milvus)
  - Hard delete of files (Azure Blob Storage)
  - Complete cleanup across 11 collections
- Trash & Recovery:
  - 7-day trash retention window
  - Restore deleted chatbots
  - Permanent deletion from trash
- Document Content Updates:
  - Modify extracted text
  - Regenerate embeddings
  - Update chunks
  - Create new documents
- Guardrails Configuration:
  - System-wide default guardrails
  - User/project-specific guardrails
  - Category-based content restrictions
  - Hierarchical fallback system
- Screenshot Management:
  - Upload mobile/desktop screenshots
  - Azure Blob Storage integration
  - Device-specific retrieval with fallback
  - User-Agent detection
Architecture & Dependencies¶
Technology Stack¶
Framework:
- FastAPI (web framework)
- Uvicorn (ASGI server)
Databases:
- MongoDB (CosmosDB) - 11 collections
- Milvus - Vector embeddings (via shared service)
Storage:
- Azure Blob Storage - Screenshots & files
AI/ML:
- FastEmbed (BAAI/bge-small-en-v1.5) - Embeddings
Shared Services:
- database/milvus_embeddings_service - Milvus operations
- storage/azure_blob_service - Azure Blob operations
Key Imports¶
from fastapi import FastAPI, HTTPException, Query, UploadFile, File, Form, Request, Body
from pymongo import MongoClient
from datetime import datetime, timedelta
from bson import ObjectId
from pydantic import BaseModel, Field
from typing import List, Dict, Any, Optional, Tuple
from azure.storage.blob import BlobServiceClient
from fastembed.embedding import FlagEmbedding as Embedding
# Shared services
import sys
sys.path.insert(0, '/app/shared')
from database.milvus_embeddings_service import get_milvus_embeddings_service
from storage.azure_blob_service import get_azure_blob_service
Environment Variables¶
MONGO_URI=mongodb://...
MONGO_DB_NAME=Machine_agent_dev
AZURE_STORAGE_ACCOUNT_NAME=qablobmachineagents
AZURE_STORAGE_ACCOUNT_KEY=kRXPNm77... # ⚠️ HARDCODED FALLBACK
AZURE_SCREENSHOTS_CONTAINER_NAME=screenshots-dev
Embedding Model Configuration¶
Same model as:
- Data Crawling Service
- Response 3D/Text/Voice Chatbot Services
- Selection Chatbot Service
Database Collections¶
11 MongoDB Collections¶
users_collection = db["users_multichatbot_v2"] # User accounts
chatbot_collection = db["chatbot_selections"] # Chatbot configs
trash_chatbot_collection = db["trash_collection_name"] # Deleted chatbots (7 days)
selection_collection = db["selection_history"] # Avatar/voice/model selections
files_collection = db["files"] # Uploaded documents
guardrails_collection = db["chatbot_guardrails"] # Content restrictions
user_system_prompt_collection = db["system_prompts_user"] # Custom prompts
projectid_collection = db["projectid_creation"] # Project metadata
trash_project_collection = db["trash_collection_name"] # Deleted projects
generate_greeting_collection = db["generate_greeting"] # Avatar greetings
files_collection2 = db["organisation_data"] # Organization data
files_secondary_collection = db["files_secondary"] # Secondary files
history_collection = db["chatbot_history"] # Chat logs
lead_collection = db["LEAD_COLLECTION"] # Lead forms
Note: trash_chatbot_collection and trash_project_collection both use the same collection "trash_collection_name" - documents are differentiated by having project_id field or not.
Core Features¶
1. Tri-Storage Deletion Architecture¶
The Problem: Data exists in 3 different storage systems
The Solution: Coordinated deletion across all three
| Storage System | Deletion Type | Strategy |
|---|---|---|
| Milvus | Hard Delete | Immediate permanent deletion of embeddings |
| Azure Blob | Hard Delete | Immediate permanent deletion of files |
| CosmosDB | Soft Delete | Set isDeleted: true + deleted_at timestamp |
Why Different Strategies?
- Milvus: No soft delete feature - embeddings must be purged
- Azure Blob: Storage cost optimization - delete immediately
- CosmosDB: Business logic - 7-day recovery window for users
2. 7-Day Trash System¶
Retention Policy:
- Deleted chatbots move to trash_collection_name
- Available for restore within 7 days
- Auto-purge after 7 days (not implemented in code - manual)
Recovery Flow:
- User requests restore
- System finds trash docs (deleted_at within the last 7 days)
- Removes deleted_at and selection_model fields
- Re-inserts into main collections
- Deletes from trash
Limitation: Milvus/Blob data is GONE - only metadata recovers
3. Document Content Update System¶
Use Case: User wants to edit pre-crawled content without re-uploading
Capabilities:
- Update extracted_text
- Regenerate embeddings with FastEmbed
- Recreate chunks (single chunk per document)
- Update last_modified timestamp
- Create new documents if none exist
Upsert Logic:
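The update-or-insert decision can be modeled as a small in-memory sketch (illustrative only; `upsert_document` and its signature are invented here, and the real endpoint runs the equivalent check against the `files` collection, as shown later under POST /v2/update_document_content):

```python
from datetime import datetime, timezone

def upsert_document(files, user_id, project_id, file_type, content, embedding, chunks):
    """In-memory model of the upsert: update the matching document if one
    exists for (user_id, project_id, file_type), otherwise insert a new one."""
    now = datetime.now(timezone.utc)
    for doc in files:
        if (doc.get("user_id"), doc.get("project_id"), doc.get("file_type")) == (
            user_id, project_id, file_type,
        ):
            # Existing document: overwrite content, embedding, and chunks
            doc.update(extracted_text=content, embeddings=embedding,
                       chunks=chunks, last_modified=now)
            return "updated"
    # No match: create a fresh document with both timestamps set
    files.append({
        "user_id": user_id, "project_id": project_id, "file_type": file_type,
        "extracted_text": content, "embeddings": embedding, "chunks": chunks,
        "created_at": now, "last_modified": now,
    })
    return "created"

store = []
print(upsert_document(store, "User-123", "P1", "text", "v1", [0.1], []))  # created
print(upsert_document(store, "User-123", "P1", "text", "v2", [0.2], []))  # updated
```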
4. Hierarchical Guardrails System¶
3-Level Priority:
- User-specific (user_id + project_id) - Highest priority
- System default (_system_default_user_ + _system_default_project_)
- None - No guardrails
Category Management:
- Add/Update: Set category instruction to a string
- Delete: Set category instruction to null
- Dynamic topics: Unlimited custom categories
5. Screenshot Upload & Retrieval¶
Device Types:
- mobile - Mobile device screenshots
- desktop - Desktop screenshots
Storage Path: {project_id}/{user_id}/{device_type}/{sanitized_filename}
Retrieval Logic:
- Detect device from query param OR User-Agent header
- Search for requested device type
- Fallback: If not found, try opposite device type
- Return public blob URL
API Endpoints¶
Endpoint Summary¶
| Endpoint | Method | Purpose |
|---|---|---|
| /v2/get-chatbot-selection | GET | Fetch chatbot config (non-deleted) |
| /v2/delete-chatbot-selection | DELETE | Tri-storage deletion |
| /v2/get-trashed-chatbot-sevendays | GET | List all trashed chatbots (7 days) |
| /v2/get-trashed-chatbots | GET | Restore specific chatbot from trash |
| /v2/delete-trashed-chatbots | DELETE | Permanently delete from trash |
| /v2/get_extracted_text/{user_id}/{project_id} | GET | Retrieve categorized documents |
| /v2/update_document_content/{user_id}/{project_id} | POST | Update/create documents |
| /v2/guardrails/default | POST | Create/update system default guardrails |
| /v2/guardrails/user/{user_id}/project/{project_id} | PUT | Upsert user-specific guardrails |
| /v2/guardrails | GET | Get guardrails with fallback |
| /screenshots/upload | POST | Upload screenshot to Azure Blob |
| /screenshots | GET | Retrieve screenshot with device fallback |
Total: 12 endpoints
Chatbot Deletion System¶
DELETE /v2/delete-chatbot-selection¶
Purpose: Complete chatbot deletion across all storage systems
Request:
Flow:
Step 1: Delete Milvus Embeddings (Hard Delete)¶
milvus_service = get_milvus_embeddings_service()
deleted = milvus_service.delete_embeddings_by_user_project(
collection_name="embeddings",
user_id=user_id,
project_id=project_id
)
What Gets Deleted:
- All vector embeddings for this project
- Indexed documents in Milvus
- NOT RECOVERABLE
Step 2: Delete Azure Blob Files (Hard Delete)¶
azure_blob = get_azure_blob_service()
blob_prefix = f"{user_id}/{project_id}/"
blobs = azure_blob.list_blobs(prefix=blob_prefix)
for blob in blobs:
azure_blob.delete_blob(blob.get('name'))
What Gets Deleted:
- All uploaded files (PDFs, docs, images)
- Screenshot images
- NOT RECOVERABLE
Step 3: Soft Delete CosmosDB Documents¶
11 Collections Updated:
collections_to_check = [
("chatbot_selections", chatbot_collection),
("selection_history", selection_collection),
("organisation_data", files_collection2),
("files", files_collection),
("chatbot_guardrails", guardrails_collection),
("system_prompts_user", user_system_prompt_collection),
("files_secondary", files_secondary_collection),
("generate_greeting", generate_greeting_collection),
("chatbot_history", history_collection),
("LEAD_COLLECTION", lead_collection),
("projectid_creation", projectid_collection)
]
for name, collection in collections_to_check:
collection.update_many(
{"user_id": user_id, "project_id": project_id},
{"$set": {"isDeleted": True, "deleted_at": deletion_timestamp}}
)
Response:
{
"message": "Chatbot deleted successfully",
"details": {
"milvus_embeddings_deleted": true,
"azure_blobs_deleted": 5,
"cosmos_documents_soft_deleted": 23,
"deleted_at": "2024-01-15T10:30:00.000Z"
}
}
Error Handling¶
Resilient Design:
- Milvus failure → Log error, continue deletion
- Blob failure → Log error, continue deletion
- CosmosDB failure → Raises HTTPException
Why? Partial deletion is better than no deletion - user can retry
Trash Management System¶
GET /v2/get-trashed-chatbot-sevendays¶
Purpose: List all trashed chatbots for a user
Request:
Query Logic:
seven_days_ago = datetime.utcnow() - timedelta(days=7)
trashed_chatbots = trash_chatbot_collection.find({
"user_id": user_id,
"deleted_at": {"$gte": seven_days_ago}
})
Response:
{
"user_id": "User-123",
"trashed_chatbots": [
{
"_id": "507f1f77bcf86cd799439011",
"project_id": "User-123_Project_1",
"selection_avatar": "Avatar_Lisa",
"selection_voice": "Female_1",
"deleted_at": "2024-01-10T10:30:00.000Z"
}
]
}
GET /v2/get-trashed-chatbots - RESTORE Function¶
Purpose: Recover deleted chatbot from trash
Request:
Flow:
- Find Trash Documents:
trashed_chatbots = trash_chatbot_collection.find({
"user_id": user_id,
"project_id": project_id,
"deleted_at": {"$gte": seven_days_ago}
})
trashed_projects = trash_project_collection.find({
"user_id": user_id,
"project_id": project_id,
"deleted_at": {"$gte": seven_days_ago}
})
- Clean & Restore:
# Remove deletion metadata
restored_chatbots = [{
k: v for k, v in cb.items()
if k not in ["_id", "deleted_at", "selection_model"]
} for cb in trashed_chatbots]
# Re-insert into main collections
chatbot_collection.insert_many(restored_chatbots)
projectid_collection.insert_many(restored_projects)
- Delete from Trash:
trash_chatbot_collection.delete_many({"_id": {"$in": chatbot_object_ids}})
trash_project_collection.delete_many({"_id": {"$in": project_object_ids}})
Response:
{
"trashed_chatbots": [...],
"trashed_projects": [...],
"restored_chatbots_count": 1,
"restored_projects_count": 1
}
⚠️ LIMITATION: Milvus embeddings and Azure Blobs are NOT restored (hard deleted)
DELETE /v2/delete-trashed-chatbots¶
Purpose: Permanently delete from trash (no recovery)
Request:
Implementation:
chatbot_result = trash_chatbot_collection.delete_many({
"user_id": user_id,
"project_id": project_id
})
project_result = trash_project_collection.delete_many({
"user_id": user_id,
"project_id": project_id
})
Response:
{
"message": "2 trashed chatbot(s) and 1 trashed project(s) permanently deleted.",
"chatbots_deleted": 2,
"projects_deleted": 1,
"total_deleted": 3
}
Document Content Management¶
GET /v2/get_extracted_text/{user_id}/{project_id}¶
Purpose: Retrieve all extracted text categorized by file type
Request:
Query:
documents = files_collection.find(
{"user_id": user_id, "project_id": project_id},
{"file_blob": 0, "embeddings": 0} # Exclude large fields
)
Categorization Logic:
categorized_data = {
"files": [], # PDF, DOCX, JSON, CSV, etc.
"qna": [], # Q&A pairs
"text": [], # Plain text/TXT files
"url": [] # Crawled web content
}
for doc in documents:
    doc_id = str(doc["_id"])
    file_type = doc.get("file_type", "unknown").lower()
    text_content = doc.get("extracted_text")
    entry = {"id": doc_id, "file_type": file_type, "content": text_content}
    if file_type == "url" and isinstance(text_content, dict):
        # URL content: separate entries per URL
        for url, content in text_content.items():
            categorized_data["url"].append({
                "id": doc_id,
                "file_type": file_type,
                "url": url,
                "content": content
            })
    elif file_type == "qna":
        categorized_data["qna"].append(entry)
    elif file_type in ("text", "txt"):
        categorized_data["text"].append(entry)
    else:
        categorized_data["files"].append(entry)
Response:
{
"user_id": "User-123",
"project_id": "User-123_Project_1",
"categorized_texts": {
"files": [
{
"id": "507f1f77bcf86cd799439011",
"file_type": "pdf",
"content": "Financial report Q4 2023..."
}
],
"qna": [
{
"id": "507f1f77bcf86cd799439012",
"file_type": "qna",
"content": "Q: What are your hours? A: 9 AM - 5 PM"
}
],
"url": [
{
"id": "507f1f77bcf86cd799439013",
"file_type": "url",
"url": "https://example.com/about",
"content": "About us page content..."
}
]
}
}
Empty Category Removal:
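Categories that end up with no entries are dropped from the response (assumed behavior, inferred from the example response above, which omits the empty "text" bucket). A minimal sketch:

```python
# Example categorized result where only "files" received documents
categorized_data = {
    "files": [{"id": "1", "file_type": "pdf", "content": "..."}],
    "qna": [],
    "text": [],
    "url": [],
}
# Keep only categories that actually contain entries
categorized_data = {k: v for k, v in categorized_data.items() if v}
print(list(categorized_data))  # ['files']
```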
POST /v2/update_document_content/{user_id}/{project_id}¶
Purpose: Update or create document content with new embeddings
Request:
POST /v2/update_document_content/User-123/User-123_Project_1
Content-Type: application/json
{
"documents": [
{
"content": "Updated company information...",
"file_type": "text"
}
]
}
Flow:
1. Find Existing Document¶
2. Generate New Embedding¶
embedding_generator = embedder.embed([new_content])
embedding_list = list(embedding_generator)
new_embedding = list(map(float, embedding_list[0]))
Model: BAAI/bge-small-en-v1.5 (384 dimensions)
3. Create Chunk¶
new_chunks = [{
"chunk_index": 0,
"content": new_content,
"start_pos": 0,
"end_pos": len(new_content),
"length": len(new_content)
}]
Note: Single chunk per document (no chunking strategy)
4. Update or Insert¶
If Document Exists:
files_collection.update_one(
{"_id": doc_id},
{"$set": {
"extracted_text": new_content,
"embeddings": new_embedding,
"chunks": new_chunks,
"last_modified": datetime.utcnow()
}}
)
If Document Does NOT Exist:
new_doc = {
"user_id": user_id,
"project_id": project_id,
"extracted_text": new_content,
"embeddings": new_embedding,
"chunks": new_chunks,
"file_type": file_type,
"created_at": datetime.utcnow(),
"last_modified": datetime.utcnow()
}
files_collection.insert_one(new_doc)
Response:
{
"user_id": "User-123",
"project_id": "User-123_Project_1",
"total_documents": 2,
"updated_documents": 1,
"failed_documents": 1,
"results": [
{
"doc_id": "507f1f77bcf86cd799439011",
"status": "success",
"message": "Document updated successfully"
},
{
"doc_id": "507f1f77bcf86cd799439014",
"status": "created",
"message": "Document created successfully"
}
]
}
⚠️ LIMITATION: Only updates files collection, NOT Milvus - embeddings become out of sync
Guardrails System¶
Architecture¶
Purpose: Content moderation and topic restrictions for chatbot responses
3-Level Hierarchy:
1. User-Specific Guardrails (user_id + project_id)
↓ (if not found)
2. System Default Guardrails (_system_default_user_ + _system_default_project_)
↓ (if not found)
3. No Guardrails (empty response)
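The three-level fallback can be modeled as a small standalone function (illustrative sketch; `resolve_guardrails` and the dict-backed `store` are invented for this example, while the actual endpoint logic appears under GET /v2/guardrails):

```python
def resolve_guardrails(store, user_id, project_id):
    """Model of the 3-level fallback: user-specific doc wins, then the
    system default, then nothing. `store` maps (user_id, project_id)
    tuples to guardrail documents."""
    doc = store.get((user_id, project_id))
    if doc is not None:
        # Level 1: user-specific guardrails found
        return {**doc, "is_system_default_source": False}
    default = store.get(("_system_default_user_", "_system_default_project_"))
    if default is not None:
        # Level 2: fall back to the system default
        return {**default, "is_system_default_source": True}
    # Level 3: nothing configured (the real endpoint raises a 404 here)
    return None

store = {
    ("_system_default_user_", "_system_default_project_"): {"name": "Default"},
    ("User-123", "User-123_Project_1"): {"name": "Custom"},
}
print(resolve_guardrails(store, "User-123", "User-123_Project_1"))  # user-specific doc
print(resolve_guardrails(store, "User-999", "Other_Project"))       # system default doc
```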
Data Model¶
Pydantic Models:
class GuardrailBase(BaseModel):
name: str
description: Optional[str] = None
categories: Dict[str, str] = Field(default_factory=dict)
class GuardrailCreateUpdateWithCategories(BaseModel):
name: str
description: Optional[str] = None
categories: Dict[str, Optional[str]] = Field(default_factory=dict)
class GuardrailAPIResponse(GuardrailBase):
id: str = Field(alias="_id")
user_id: str
project_id: str
created_at: datetime
updated_at: datetime
is_system_default_source: bool = False
Key Difference:
- Input (GuardrailCreateUpdateWithCategories): categories values are Optional[str] (null = delete)
- Output (GuardrailAPIResponse): categories values are always str
Category Update Logic¶
Helper Function:
def prepare_category_updates(input_categories: Dict[str, Optional[str]]) -> Tuple[Dict[str, str], List[str]]:
categories_to_set_op: Dict[str, str] = {}
categories_to_unset_op: List[str] = []
for category_name, instruction in input_categories.items():
dot_notation_key = f"categories.{category_name.strip()}"
if instruction is None:
# Mark for removal
categories_to_unset_op.append(dot_notation_key)
elif isinstance(instruction, str) and instruction.strip():
# Mark for set/update
categories_to_set_op[dot_notation_key] = instruction.strip()
else:
# Skip invalid
pass
return categories_to_set_op, categories_to_unset_op
MongoDB Operations:
- Add/Update: $set: {"categories.Politics": "Avoid political discussions"}
- Delete: $unset: {"categories.Finance": ""}
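A quick standalone check of the helper above (the function is reproduced here so the snippet runs on its own; no MongoDB needed):

```python
from typing import Dict, List, Optional, Tuple

def prepare_category_updates(
    input_categories: Dict[str, Optional[str]]
) -> Tuple[Dict[str, str], List[str]]:
    # Reproduction of the helper above so this snippet is self-contained.
    categories_to_set_op: Dict[str, str] = {}
    categories_to_unset_op: List[str] = []
    for category_name, instruction in input_categories.items():
        dot_notation_key = f"categories.{category_name.strip()}"
        if instruction is None:
            categories_to_unset_op.append(dot_notation_key)  # null -> $unset
        elif isinstance(instruction, str) and instruction.strip():
            categories_to_set_op[dot_notation_key] = instruction.strip()  # -> $set
        # blank strings are skipped: neither set nor unset
    return categories_to_set_op, categories_to_unset_op

sets, unsets = prepare_category_updates({
    "Politics": "Avoid political discussions",
    "Finance": None,   # marked for removal
    "Empty": "   ",    # skipped entirely
})
print(sets)    # {'categories.Politics': 'Avoid political discussions'}
print(unsets)  # ['categories.Finance']
```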
POST /v2/guardrails/default¶
Purpose: Create or update system-wide default guardrails
Request:
POST /v2/guardrails/default
Content-Type: application/json
{
"name": "Default System Guardrails",
"description": "Standard content restrictions.",
"categories": {
"Politics": "Please refrain from discussing politics.",
"Religion": "Discussions on religion are not permitted.",
"Profanity": "Avoid using profanity or offensive language."
}
}
To Remove a Category:
{
"name": "Default System Guardrails",
"categories": {
"Politics": "Strictly no political discussions.",
"Religion": null
}
}
Implementation:
now = datetime.utcnow()
query = {
"user_id": "_system_default_user_",
"project_id": "_system_default_project_"
}
categories_to_set, categories_to_unset = prepare_category_updates(input_categories)
update_payload = {
"$set": {
"name": name,
"description": description,
"updated_at": now,
**categories_to_set # e.g., {"categories.Politics": "..."}
},
"$setOnInsert": {
"user_id": "_system_default_user_",
"project_id": "_system_default_project_",
"created_at": now
}
}
if categories_to_unset:
update_payload["$unset"] = {key: "" for key in categories_to_unset}
result = guardrails_collection.update_one(query, update_payload, upsert=True)
Response:
{
"_id": "507f1f77bcf86cd799439011",
"user_id": "_system_default_user_",
"project_id": "_system_default_project_",
"name": "Default System Guardrails",
"description": "Standard content restrictions.",
"categories": {
"Politics": "Please refrain from discussing politics.",
"Profanity": "Avoid using profanity or offensive language."
},
"created_at": "2024-01-15T10:00:00.000Z",
"updated_at": "2024-01-15T10:30:00.000Z",
"is_system_default_source": true
}
PUT /v2/guardrails/user/{user_id}/project/{project_id}¶
Purpose: Create/update user-specific guardrails
Request:
PUT /v2/guardrails/user/User-123/project/User-123_Project_1
Content-Type: application/json
{
"name": "My Custom Guardrails",
"description": "Additional restrictions for my chatbot",
"categories": {
"Crypto": "Do not discuss cryptocurrency investments.",
"Health": "Avoid providing medical advice."
}
}
Validation:
if user_id == "_system_default_user_" and project_id == "_system_default_project_":
raise HTTPException(
status_code=400,
detail="Cannot modify default guardrails via this endpoint. Use /v2/guardrails/default."
)
Implementation: Same as default guardrails but with real user_id/project_id
Response:
{
"_id": "507f1f77bcf86cd799439012",
"user_id": "User-123",
"project_id": "User-123_Project_1",
"name": "My Custom Guardrails",
"categories": {
"Crypto": "Do not discuss cryptocurrency investments.",
"Health": "Avoid providing medical advice."
},
"created_at": "2024-01-15T11:00:00.000Z",
"updated_at": "2024-01-15T11:00:00.000Z",
"is_system_default_source": false
}
GET /v2/guardrails¶
Purpose: Retrieve guardrails with automatic fallback
Request:
Flow:
- Try User-Specific:
user_specific_doc = guardrails_collection.find_one({
"user_id": user_id,
"project_id": project_id
})
if user_specific_doc:
return user_specific_doc
- Fallback to Default:
default_doc = guardrails_collection.find_one({
"user_id": "_system_default_user_",
"project_id": "_system_default_project_"
})
if default_doc:
return default_doc
- No Guardrails Found:
raise HTTPException(
status_code=404,
detail="No guardrails configured for this user/project, and no system default guardrails found."
)
Response Indicator:
{
"_id": "507f1f77bcf86cd799439011",
"user_id": "_system_default_user_",
"project_id": "_system_default_project_",
"name": "Default System Guardrails",
"categories": {...},
"is_system_default_source": true
}
is_system_default_source Flag:
- true → Returned default guardrails (user-specific not found)
- false → Returned user-specific guardrails
Empty Categories Handling:
if "categories" not in doc_to_return or not isinstance(doc_to_return.get("categories"), dict):
doc_to_return["categories"] = {}
Screenshot Management¶
Purpose¶
Store and serve mobile/desktop screenshots for chatbot UI simulation/demo purposes.
Azure Blob Storage Configuration¶
STORAGE_ACCOUNT_NAME = os.getenv("AZURE_STORAGE_ACCOUNT_NAME", "qablobmachineagents")
STORAGE_ACCOUNT_KEY = os.getenv("AZURE_STORAGE_ACCOUNT_KEY", "kRXPNm77...") # ⚠️ HARDCODED FALLBACK
SCREENSHOTS_CONTAINER_NAME = os.getenv("AZURE_SCREENSHOTS_CONTAINER_NAME", "screenshots")
blob_service_client = BlobServiceClient(
account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
credential=STORAGE_ACCOUNT_KEY
)
Container Auto-Creation:
try:
container_client.create_container()
except Exception as ex:
if 'ContainerAlreadyExists' not in str(ex):
raise
POST /screenshots/upload¶
Purpose: Upload mobile or desktop screenshot
Request:
POST /screenshots/upload
Content-Type: multipart/form-data
project_id=User-123_Project_1
user_id=User-123
type=mobile
file=<binary image data>
Parameters:
- project_id (Form) - Project identifier
- user_id (Form) - User identifier
- type (Form) - Enum: mobile | desktop
- file (File) - Image file
Blob Naming:
sanitized_filename = "".join(
c for c in file.filename
if c.isalnum() or c in (' ', '.', '_')
).rstrip().replace(' ', '_')
blob_name = f"{project_id}/{user_id}/{type.value}/{sanitized_filename}"
# Example: User-123_Project_1/User-123/mobile/homepage_screenshot.png
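The sanitization rule can be checked standalone (the filename is an invented example):

```python
# Keep only alphanumerics, spaces, dots, and underscores, then
# strip trailing whitespace and replace spaces with underscores.
filename = "Home Page (v2)!.png"
sanitized = "".join(
    c for c in filename if c.isalnum() or c in (' ', '.', '_')
).rstrip().replace(' ', '_')
print(sanitized)  # Home_Page_v2.png
```

Note that parentheses and punctuation are dropped silently, so two distinct filenames can collide on the same blob name; combined with `overwrite=True` on upload, the later file wins.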
Upload:
blob_client = container_client.get_blob_client(blob_name)
file.file.seek(0)
blob_client.upload_blob(file.file, overwrite=True)
Response:
{
"message": "Screenshot uploaded successfully",
"url": "https://qablobmachineagents.blob.core.windows.net/screenshots/User-123_Project_1/User-123/mobile/homepage_screenshot.png"
}
GET /screenshots¶
Purpose: Retrieve screenshot with device-based fallback
Request:
Parameters:
- project_id (Query) - Project identifier
- user_id (Query) - User identifier
- device (Query, Optional) - mobile | desktop
Device Detection Logic:
- From Query Param: an explicit device=mobile|desktop value takes precedence
- From User-Agent Header:
if requested_type is None:
user_agent = request.headers.get("User-Agent", "").lower()
if "mobile" in user_agent:
requested_type = ScreenshotType.mobile
else:
requested_type = ScreenshotType.desktop
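The detection order can be modeled as a tiny standalone helper (illustrative; `detect_device` is invented here, and the service uses a ScreenshotType enum rather than plain strings):

```python
def detect_device(requested, user_agent):
    """Explicit query param wins; otherwise sniff the User-Agent,
    defaulting to desktop when 'mobile' does not appear."""
    if requested in ("mobile", "desktop"):
        return requested
    return "mobile" if "mobile" in user_agent.lower() else "desktop"

print(detect_device(None, "Mozilla/5.0 (iPhone; Mobile) Safari"))  # mobile
print(detect_device("desktop", "Mozilla/5.0 (iPhone; Mobile)"))    # desktop
```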
Retrieval Flow:
- Search for Requested Type:
blob_name_requested = f"{project_id}/{user_id}/{requested_type.value}/"
blob_list = list(container_client.list_blobs(name_starts_with=blob_name_requested))
if blob_list:
first_blob = blob_list[0]
blob_url = f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net/{SCREENSHOTS_CONTAINER_NAME}/{first_blob.name}"
return {"url": blob_url, "type": requested_type.value}
- Fallback to Opposite Type:
fallback_type = ScreenshotType.desktop if requested_type == ScreenshotType.mobile else ScreenshotType.mobile
blob_name_fallback = f"{project_id}/{user_id}/{fallback_type.value}/"
blob_list_fallback = list(container_client.list_blobs(name_starts_with=blob_name_fallback))
if blob_list_fallback:
# Return fallback screenshot
return {"url": blob_url, "type": fallback_type.value}
- No Screenshots Found:
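The full retrieval order can be modeled in memory (illustrative only; `find_screenshot` is invented for this sketch and returns None where the service presumably responds with a 404):

```python
def find_screenshot(blob_names, project_id, user_id, requested):
    """Search the requested device prefix first, then the opposite device;
    return (blob_name, device) for the first match, or None."""
    fallback = "desktop" if requested == "mobile" else "mobile"
    for device in (requested, fallback):
        prefix = f"{project_id}/{user_id}/{device}/"
        matches = [n for n in blob_names if n.startswith(prefix)]
        if matches:
            return matches[0], device
    return None  # no screenshot for either device type

blobs = ["P1/U1/desktop/home.png"]
print(find_screenshot(blobs, "P1", "U1", "mobile"))
# ('P1/U1/desktop/home.png', 'desktop') -- fallback kicked in
```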
Response:
{
"url": "https://qablobmachineagents.blob.core.windows.net/screenshots/User-123_Project_1/User-123/mobile/homepage.png",
"type": "mobile"
}
Use Case: Frontend displays chatbot UI preview using uploaded screenshots
Security Analysis¶
🔴 CRITICAL: Hardcoded Azure Storage Credentials¶
Lines 985-986:
STORAGE_ACCOUNT_NAME = os.getenv("AZURE_STORAGE_ACCOUNT_NAME", "qablobmachineagents")
STORAGE_ACCOUNT_KEY = os.getenv("AZURE_STORAGE_ACCOUNT_KEY", "kRXPNm77OyebjhfvHSchOHE1KwpkQefdjZLt4k/Nwajf3xUO+HIts2+hoBmF1iiO9Gv8Z9JbYH/v+ASt1ubG5w==")
Risk: Full access to Azure Blob Storage account + all containers
Impact:
- Upload malicious files
- Delete all screenshots
- Access other containers
- High storage cost (upload spam)
Fix:
STORAGE_ACCOUNT_KEY = os.getenv("AZURE_STORAGE_ACCOUNT_KEY")
if not STORAGE_ACCOUNT_KEY:
raise RuntimeError("AZURE_STORAGE_ACCOUNT_KEY environment variable not set")
🟠 SECURITY: Overly Permissive CORS¶
Lines 29-35:
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
Risk: Cross-Origin Resource Sharing from ANY domain
Fix:
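One possible fix, sketched with placeholder origins (the real frontend domains must be substituted):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    # Placeholder: replace with the actual frontend origins
    allow_origins=["https://app.example.com"],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["Content-Type", "Authorization"],
)
```

Note that `allow_credentials=True` combined with `allow_origins=["*"]` is explicitly disallowed by the CORS spec, which is another reason the current configuration needs tightening.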
🟡 CODE QUALITY: Same Collection for Different Trash Types¶
Lines 69, 75:
trash_chatbot_collection = db["trash_collection_name"]
trash_project_collection = db["trash_collection_name"]
Issue: Both variables point to same collection - differentiation only by document structure
Better Design:
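A possible split, using hypothetical collection names (any migration would also need to move existing trash documents):

```python
# Hypothetical: dedicated collections per trash type, so documents no
# longer need to be told apart by their shape
trash_chatbot_collection = db["trash_chatbots"]
trash_project_collection = db["trash_projects"]
```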
🟡 DATA INTEGRITY: Milvus/Blob Not Restored¶
Lines 154-265: Deletion flow hard-deletes Milvus + Blob data
Issue: Trash restore only recovers CosmosDB metadata - embeddings/files gone forever
User Impact:
- Restored chatbot has no knowledge base
- RAG system returns no results
- User must re-upload all documents
Solution: Consider not hard-deleting Milvus/Blob immediately, or warn user prominently
🟢 GOOD PRACTICE: Shared Service Integration¶
Lines 23-24:
from database.milvus_embeddings_service import get_milvus_embeddings_service
from storage.azure_blob_service import get_azure_blob_service
Benefit: Centralized Milvus/Blob logic across all services
🟢 GOOD PRACTICE: Resilient Deletion¶
Lines 183-185, 209-210:
except Exception as milvus_error:
logger.error(f"❌ Error deleting Milvus embeddings: {milvus_error}")
# Continue with deletion even if Milvus fails
Benefit: Partial deletion better than complete failure
Integration Points¶
1. Response Chatbot Services Integration¶
Deleted Chatbot Check:
Response services (3D/Text/Voice) check isDeleted flag:
chatbot_config = chatbot_collection.find_one({
"user_id": user_id,
"project_id": project_id,
"$or": [
{"isDeleted": {"$exists": False}},
{"isDeleted": False}
]
})
if not chatbot_config or chatbot_config.get("isDeleted") is True:
return {"error": "This chatbot is no longer available."}
Soft Delete Advantage: Response services immediately stop serving deleted chatbots
2. Data Crawling Service Integration¶
Blob Storage Cleanup:
When chatbot deleted, this service removes blobs crawled by Data Crawling Service:
blob_prefix = f"{user_id}/{project_id}/"
# Deletes all: PDFs, extracted text, images uploaded via crawling
3. Shared Service Dependencies¶
Milvus Embeddings Service:
Azure Blob Service:
4. Frontend Integration¶
Trash UI Flow:
- User clicks "Delete Chatbot" → Frontend calls DELETE /v2/delete-chatbot-selection
- Chatbot moves to trash (7-day window)
- User views trash → Frontend calls GET /v2/get-trashed-chatbot-sevendays
- User restores → Frontend calls GET /v2/get-trashed-chatbots
- User permanently deletes → Frontend calls DELETE /v2/delete-trashed-chatbots
Guardrails UI Flow:
- Admin sets defaults → POST /v2/guardrails/default
- User customizes → PUT /v2/guardrails/user/{user_id}/project/{project_id}
- Response services fetch → GET /v2/guardrails
- Response services apply restrictions based on categories
Screenshot UI Flow:
- User uploads chatbot UI preview → POST /screenshots/upload
- Marketing page displays preview → GET /screenshots
- Mobile users auto-detected via User-Agent
5. Guardrails Consumer Services¶
Response 3D Chatbot Service:
guardrails = requests.get(f"{MAINTENANCE_SERVICE_URL}/v2/guardrails?user_id={user_id}&project_id={project_id}")
# Apply guardrails.categories to LLM system prompt
Integration Pattern: Same for Text/Voice chatbot services
Summary¶
Service Statistics¶
- Total Lines: 1,129
- Total Endpoints: 12
- Total Collections: 11 (MongoDB) + 1 (Milvus) + 1 (Azure Blob)
- Shared Services: 2 (Milvus, Azure Blob)
- Security Issues: 2 critical (hardcoded key, CORS)
Key Capabilities¶
- ✅ Tri-Storage Deletion - Coordinated Milvus + Blob + CosmosDB cleanup
- ✅ 7-Day Trash Recovery - Soft delete with restore window
- ✅ Document Content Updates - Re-embed and update knowledge base
- ✅ Hierarchical Guardrails - User-specific with system fallback
- ✅ Screenshot Management - Device-specific with auto-fallback
Critical Fixes Needed¶
- 🔴 Externalize Azure Storage credentials
- 🟠 Restrict CORS to known origins
- 🟡 Split trash collections (chatbots vs projects)
- 🟡 Warn users about Milvus/Blob permanent deletion
- 🟡 Sync Milvus when updating document content
Deployment Notes¶
Docker Compose (Port 8010):
chatbot-maintenance-service:
build: ./chatbot-maintence-service
container_name: chatbot-maintenance-service
ports:
- "8010:8010"
volumes:
- ./shared:/app/shared:ro
environment:
- MONGO_URI=...
- AZURE_STORAGE_ACCOUNT_NAME=qablobmachineagents
- AZURE_STORAGE_ACCOUNT_KEY=***
- AZURE_SCREENSHOTS_CONTAINER_NAME=screenshots-dev
Shared Volume Required: /app/shared for Milvus/Blob services
Documentation Complete: Chatbot Maintenance Service (Port 8010)