Chatbot Maintenance Service (Port 8010)¶
Service Path: machineagents-be/chatbot-maintence-service/
Port: 8010
Total Lines: 1,129
Purpose: Comprehensive chatbot lifecycle management including deletion (soft/hard), trash recovery, document content management, guardrails configuration, and screenshot storage for chatbot UI simulation.
Table of Contents¶
- Service Overview
- Architecture & Dependencies
- Database Collections
- Core Features
- API Endpoints
- Chatbot Deletion System
- Trash Management System
- Document Content Management
- Guardrails System
- Screenshot Management
- Security Analysis
- Integration Points
Service Overview¶
Primary Responsibilities¶
- Chatbot Lifecycle Management:
  - Soft delete with `isDeleted` flag (CosmosDB)
  - Hard delete of embeddings (Milvus)
  - Hard delete of files (Azure Blob Storage)
  - Complete cleanup across 11 collections
- Trash & Recovery:
  - 7-day trash retention window
  - Restore deleted chatbots
  - Permanent deletion from trash
- Document Content Updates:
  - Modify extracted text
  - Regenerate embeddings
  - Update chunks
  - Create new documents
- Guardrails Configuration:
  - System-wide default guardrails
  - User/project-specific guardrails
  - Category-based content restrictions
  - Hierarchical fallback system
- Screenshot Management:
  - Upload mobile/desktop screenshots
  - Azure Blob Storage integration
  - Device-specific retrieval with fallback
  - User-Agent detection
Architecture & Dependencies¶
Technology Stack¶
Framework:
- FastAPI (web framework)
- Uvicorn (ASGI server)
Databases:
- MongoDB (CosmosDB) - 11 collections
- Milvus - Vector embeddings (via shared service)
Storage:
- Azure Blob Storage - Screenshots & files
AI/ML:
- FastEmbed (BAAI/bge-small-en-v1.5) - Embeddings
Shared Services:
- database/milvus_embeddings_service - Milvus operations
- storage/azure_blob_service - Azure Blob operations
Key Imports¶
from fastapi import FastAPI, HTTPException, Query, UploadFile, File, Form, Request, Body
from pymongo import MongoClient
from datetime import datetime, timedelta
from bson import ObjectId
from pydantic import BaseModel, Field
from typing import List, Dict, Any, Optional, Tuple
from azure.storage.blob import BlobServiceClient
from fastembed.embedding import FlagEmbedding as Embedding
# Shared services
import sys
sys.path.insert(0, '/app/shared')
from database.milvus_embeddings_service import get_milvus_embeddings_service
from storage.azure_blob_service import get_azure_blob_service
Environment Variables¶
MONGO_URI=mongodb://...
MONGO_DB_NAME=Machine_agent_dev
AZURE_STORAGE_ACCOUNT_NAME=qablobmachineagents
AZURE_STORAGE_ACCOUNT_KEY=kRXPNm77... # ⚠️ HARDCODED FALLBACK
AZURE_SCREENSHOTS_CONTAINER_NAME=screenshots-dev
Embedding Model Configuration¶
Same model as:
- Data Crawling Service
- Response 3D/Text/Voice Chatbot Services
- Selection Chatbot Service
Database Collections¶
11 MongoDB Collections¶
users_collection = db["users_multichatbot_v2"] # User accounts
chatbot_collection = db["chatbot_selections"] # Chatbot configs
trash_chatbot_collection = db["trash_collection_name"] # Deleted chatbots (7 days)
selection_collection = db["selection_history"] # Avatar/voice/model selections
files_collection = db["files"] # Uploaded documents
guardrails_collection = db["chatbot_guardrails"] # Content restrictions
user_system_prompt_collection = db["system_prompts_user"] # Custom prompts
projectid_collection = db["projectid_creation"] # Project metadata
trash_project_collection = db["trash_collection_name"] # Deleted projects
generate_greeting_collection = db["generate_greeting"] # Avatar greetings
files_collection2 = db["organisation_data"] # Organization data
files_secondary_collection = db["files_secondary"] # Secondary files
history_collection = db["chatbot_history"] # Chat logs
lead_collection = db["LEAD_COLLECTION"] # Lead forms
Note: trash_chatbot_collection and trash_project_collection both use the same collection "trash_collection_name" - documents are differentiated by having project_id field or not.
Core Features¶
1. Tri-Storage Deletion Architecture¶
The Problem: Data exists in 3 different storage systems
The Solution: Coordinated deletion across all three
| Storage System | Deletion Type | Strategy |
|---|---|---|
| Milvus | Hard Delete | Immediate permanent deletion of embeddings |
| Azure Blob | Hard Delete | Immediate permanent deletion of files |
| CosmosDB | Soft Delete | Set isDeleted: true + deleted_at timestamp |
Why Different Strategies?
- Milvus: No soft delete feature - embeddings must be purged
- Azure Blob: Storage cost optimization - delete immediately
- CosmosDB: Business logic - 7-day recovery window for users
2. 7-Day Trash System¶
Retention Policy:
- Deleted chatbots move to trash_collection_name
- Available for restore within 7 days
- Auto-purge after 7 days (not implemented in code - manual)
Recovery Flow:
- User requests restore
- System finds trash docs (deleted_at within the last 7 days)
- Removes deleted_at and selection_model fields
- Re-inserts into main collections
- Deletes from trash
Limitation: Milvus/Blob data is GONE - only metadata recovers
3. Document Content Update System¶
Use Case: User wants to edit pre-crawled content without re-uploading
Capabilities:
- Update extracted_text
- Regenerate embeddings with FastEmbed
- Recreate chunks (single chunk per document)
- Update last_modified timestamp
- Create new documents if none exist
Upsert Logic:
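The update-or-insert decision can be modeled as a small in-memory sketch (illustrative only; `upsert_document` and its signature are invented here, and the real endpoint runs the equivalent check against the `files` collection, as shown later under POST /v2/update_document_content):

```python
from datetime import datetime, timezone

def upsert_document(files, user_id, project_id, file_type, content, embedding, chunks):
    """In-memory model of the upsert: update the matching document if one
    exists for (user_id, project_id, file_type), otherwise insert a new one."""
    now = datetime.now(timezone.utc)
    for doc in files:
        if (doc.get("user_id"), doc.get("project_id"), doc.get("file_type")) == (
            user_id, project_id, file_type,
        ):
            # Existing document: overwrite content, embedding, and chunks
            doc.update(extracted_text=content, embeddings=embedding,
                       chunks=chunks, last_modified=now)
            return "updated"
    # No match: create a fresh document with both timestamps set
    files.append({
        "user_id": user_id, "project_id": project_id, "file_type": file_type,
        "extracted_text": content, "embeddings": embedding, "chunks": chunks,
        "created_at": now, "last_modified": now,
    })
    return "created"

store = []
print(upsert_document(store, "User-123", "P1", "text", "v1", [0.1], []))  # created
print(upsert_document(store, "User-123", "P1", "text", "v2", [0.2], []))  # updated
```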
4. Hierarchical Guardrails System¶
3-Level Priority:
- User-specific (user_id + project_id) - Highest priority
- System default (_system_default_user_ + _system_default_project_)
- None - No guardrails
Category Management:
- Add/Update: Set category instruction to a string
- Delete: Set category instruction to null
- Dynamic topics: Unlimited custom categories
5. Screenshot Upload & Retrieval¶
Device Types:
- mobile - Mobile device screenshots
- desktop - Desktop screenshots
Storage Path: {project_id}/{user_id}/{device_type}/{sanitized_filename}
Retrieval Logic:
- Detect device from query param OR User-Agent header
- Search for requested device type
- Fallback: If not found, try opposite device type
- Return public blob URL
API Endpoints¶
Endpoint Summary¶
| Endpoint | Method | Purpose |
|---|---|---|
| /v2/get-chatbot-selection | GET | Fetch chatbot config (non-deleted) |
| /v2/delete-chatbot-selection | DELETE | Tri-storage deletion |
| /v2/get-trashed-chatbot-sevendays | GET | List all trashed chatbots (7 days) |
| /v2/get-trashed-chatbots | GET | Restore specific chatbot from trash |
| /v2/delete-trashed-chatbots | DELETE | Permanently delete from trash |
| /v2/get_extracted_text/{user_id}/{project_id} | GET | Retrieve categorized documents |
| /v2/update_document_content/{user_id}/{project_id} | POST | Update/create documents |
| /v2/guardrails/default | POST | Create/update system default guardrails |
| /v2/guardrails/user/{user_id}/project/{project_id} | PUT | Upsert user-specific guardrails |
| /v2/guardrails | GET | Get guardrails with fallback |
| /screenshots/upload | POST | Upload screenshot to Azure Blob |
| /screenshots | GET | Retrieve screenshot with device fallback |
Total: 12 endpoints
Chatbot Deletion System¶
DELETE /v2/delete-chatbot-selection¶
Purpose: Complete chatbot deletion across all storage systems
Request:
Flow:
Step 1: Delete Milvus Embeddings (Hard Delete)¶
milvus_service = get_milvus_embeddings_service()
deleted = milvus_service.delete_embeddings_by_user_project(
collection_name="embeddings",
user_id=user_id,
project_id=project_id
)
What Gets Deleted:
- All vector embeddings for this project
- Indexed documents in Milvus
- NOT RECOVERABLE
Step 2: Delete Azure Blob Files (Hard Delete)¶
azure_blob = get_azure_blob_service()
blob_prefix = f"{user_id}/{project_id}/"
blobs = azure_blob.list_blobs(prefix=blob_prefix)
for blob in blobs:
azure_blob.delete_blob(blob.get('name'))
What Gets Deleted:
- All uploaded files (PDFs, docs, images)
- Screenshot images
- NOT RECOVERABLE
Step 3: Soft Delete CosmosDB Documents¶
11 Collections Updated:
collections_to_check = [
("chatbot_selections", chatbot_collection),
("selection_history", selection_collection),
("organisation_data", files_collection2),
("files", files_collection),
("chatbot_guardrails", guardrails_collection),
("system_prompts_user", user_system_prompt_collection),
("files_secondary", files_secondary_collection),
("generate_greeting", generate_greeting_collection),
("chatbot_history", history_collection),
("LEAD_COLLECTION", lead_collection),
("projectid_creation", projectid_collection)
]
for name, collection in collections_to_check:
collection.update_many(
{"user_id": user_id, "project_id": project_id},
{"$set": {"isDeleted": True, "deleted_at": deletion_timestamp}}
)
Response:
{
"message": "Chatbot deleted successfully",
"details": {
"milvus_embeddings_deleted": true,
"azure_blobs_deleted": 5,
"cosmos_documents_soft_deleted": 23,
"deleted_at": "2024-01-15T10:30:00.000Z"
}
}
Error Handling¶
Resilient Design:
- Milvus failure → Log error, continue deletion
- Blob failure → Log error, continue deletion
- CosmosDB failure → Raises HTTPException
Why? Partial deletion is better than no deletion - user can retry
Trash Management System¶
GET /v2/get-trashed-chatbot-sevendays¶
Purpose: List all trashed chatbots for a user
Request:
Query Logic:
seven_days_ago = datetime.utcnow() - timedelta(days=7)
trashed_chatbots = trash_chatbot_collection.find({
"user_id": user_id,
"deleted_at": {"$gte": seven_days_ago}
})
Response:
{
"user_id": "User-123",
"trashed_chatbots": [
{
"_id": "507f1f77bcf86cd799439011",
"project_id": "User-123_Project_1",
"selection_avatar": "Avatar_Lisa",
"selection_voice": "Female_1",
"deleted_at": "2024-01-10T10:30:00.000Z"
}
]
}
GET /v2/get-trashed-chatbots - RESTORE Function¶
Purpose: Recover deleted chatbot from trash
Request:
Flow:
- Find Trash Documents:
trashed_chatbots = trash_chatbot_collection.find({
"user_id": user_id,
"project_id": project_id,
"deleted_at": {"$gte": seven_days_ago}
})
trashed_projects = trash_project_collection.find({
"user_id": user_id,
"project_id": project_id,
"deleted_at": {"$gte": seven_days_ago}
})
- Clean & Restore:
# Remove deletion metadata
restored_chatbots = [{
k: v for k, v in cb.items()
if k not in ["_id", "deleted_at", "selection_model"]
} for cb in trashed_chatbots]
# Re-insert into main collections
chatbot_collection.insert_many(restored_chatbots)
projectid_collection.insert_many(restored_projects)
- Delete from Trash:
trash_chatbot_collection.delete_many({"_id": {"$in": chatbot_object_ids}})
trash_project_collection.delete_many({"_id": {"$in": project_object_ids}})
Response:
{
"trashed_chatbots": [...],
"trashed_projects": [...],
"restored_chatbots_count": 1,
"restored_projects_count": 1
}
⚠️ LIMITATION: Milvus embeddings and Azure Blobs are NOT restored (hard deleted)
DELETE /v2/delete-trashed-chatbots¶
Purpose: Permanently delete from trash (no recovery)
Request:
Implementation:
chatbot_result = trash_chatbot_collection.delete_many({
"user_id": user_id,
"project_id": project_id
})
project_result = trash_project_collection.delete_many({
"user_id": user_id,
"project_id": project_id
})
Response:
{
"message": "2 trashed chatbot(s) and 1 trashed project(s) permanently deleted.",
"chatbots_deleted": 2,
"projects_deleted": 1,
"total_deleted": 3
}
Document Content Management¶
GET /v2/get_extracted_text/{user_id}/{project_id}¶
Purpose: Retrieve all extracted text categorized by file type
Request:
Query:
documents = files_collection.find(
{"user_id": user_id, "project_id": project_id},
{"file_blob": 0, "embeddings": 0} # Exclude large fields
)
Categorization Logic:
categorized_data = {
"files": [], # PDF, DOCX, JSON, CSV, etc.
"qna": [], # Q&A pairs
"text": [], # Plain text/TXT files
"url": [] # Crawled web content
}
for doc in documents:
    doc_id = str(doc["_id"])
    file_type = doc.get("file_type", "unknown").lower()
    text_content = doc.get("extracted_text")
    entry = {"id": doc_id, "file_type": file_type, "content": text_content}
    if file_type == "url" and isinstance(text_content, dict):
        # URL content: separate entries per URL
        for url, content in text_content.items():
            categorized_data["url"].append({
                "id": doc_id,
                "file_type": file_type,
                "url": url,
                "content": content
            })
    elif file_type == "qna":
        categorized_data["qna"].append(entry)
    elif file_type in ("text", "txt"):
        categorized_data["text"].append(entry)
    else:
        categorized_data["files"].append(entry)
Response:
{
"user_id": "User-123",
"project_id": "User-123_Project_1",
"categorized_texts": {
"files": [
{
"id": "507f1f77bcf86cd799439011",
"file_type": "pdf",
"content": "Financial report Q4 2023..."
}
],
"qna": [
{
"id": "507f1f77bcf86cd799439012",
"file_type": "qna",
"content": "Q: What are your hours? A: 9 AM - 5 PM"
}
],
"url": [
{
"id": "507f1f77bcf86cd799439013",
"file_type": "url",
"url": "https://example.com/about",
"content": "About us page content..."
}
]
}
}
Empty Category Removal:
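Categories that end up with no entries are dropped from the response (assumed behavior, inferred from the example response above, which omits the empty "text" bucket). A minimal sketch:

```python
# Example categorized result where only "files" received documents
categorized_data = {
    "files": [{"id": "1", "file_type": "pdf", "content": "..."}],
    "qna": [],
    "text": [],
    "url": [],
}
# Keep only categories that actually contain entries
categorized_data = {k: v for k, v in categorized_data.items() if v}
print(list(categorized_data))  # ['files']
```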
POST /v2/update_document_content/{user_id}/{project_id}¶
Purpose: Update or create document content with new embeddings
Request:
POST /v2/update_document_content/User-123/User-123_Project_1
Content-Type: application/json
{
"documents": [
{
"content": "Updated company information...",
"file_type": "text"
}
]
}
Flow:
1. Find Existing Document¶
2. Generate New Embedding¶
embedding_generator = embedder.embed([new_content])
embedding_list = list(embedding_generator)
new_embedding = list(map(float, embedding_list[0]))
Model: BAAI/bge-small-en-v1.5 (384 dimensions)
3. Create Chunk¶
new_chunks = [{
"chunk_index": 0,
"content": new_content,
"start_pos": 0,
"end_pos": len(new_content),
"length": len(new_content)
}]
Note: Single chunk per document (no chunking strategy)
4. Update or Insert¶
If Document Exists:
files_collection.update_one(
{"_id": doc_id},
{"$set": {
"extracted_text": new_content,
"embeddings": new_embedding,
"chunks": new_chunks,
"last_modified": datetime.utcnow()
}}
)
If Document Does NOT Exist:
new_doc = {
"user_id": user_id,
"project_id": project_id,
"extracted_text": new_content,
"embeddings": new_embedding,
"chunks": new_chunks,
"file_type": file_type,
"created_at": datetime.utcnow(),
"last_modified": datetime.utcnow()
}
files_collection.insert_one(new_doc)
Response:
{
"user_id": "User-123",
"project_id": "User-123_Project_1",
"total_documents": 2,
"updated_documents": 1,
"failed_documents": 1,
"results": [
{
"doc_id": "507f1f77bcf86cd799439011",
"status": "success",
"message": "Document updated successfully"
},
{
"doc_id": "507f1f77bcf86cd799439014",
"status": "created",
"message": "Document created successfully"
}
]
}
⚠️ LIMITATION: Only updates files collection, NOT Milvus - embeddings become out of sync
Guardrails System¶
Architecture¶
Purpose: Content moderation and topic restrictions for chatbot responses
3-Level Hierarchy:
1. User-Specific Guardrails (user_id + project_id)
↓ (if not found)
2. System Default Guardrails (_system_default_user_ + _system_default_project_)
↓ (if not found)
3. No Guardrails (empty response)
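The three-level fallback can be modeled as a small standalone function (illustrative sketch; `resolve_guardrails` and the dict-backed `store` are invented for this example, while the actual endpoint logic appears under GET /v2/guardrails):

```python
def resolve_guardrails(store, user_id, project_id):
    """Model of the 3-level fallback: user-specific doc wins, then the
    system default, then nothing. `store` maps (user_id, project_id)
    tuples to guardrail documents."""
    doc = store.get((user_id, project_id))
    if doc is not None:
        # Level 1: user-specific guardrails found
        return {**doc, "is_system_default_source": False}
    default = store.get(("_system_default_user_", "_system_default_project_"))
    if default is not None:
        # Level 2: fall back to the system default
        return {**default, "is_system_default_source": True}
    # Level 3: nothing configured (the real endpoint raises a 404 here)
    return None

store = {
    ("_system_default_user_", "_system_default_project_"): {"name": "Default"},
    ("User-123", "User-123_Project_1"): {"name": "Custom"},
}
print(resolve_guardrails(store, "User-123", "User-123_Project_1"))  # user-specific doc
print(resolve_guardrails(store, "User-999", "Other_Project"))       # system default doc
```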
Data Model¶
Pydantic Models:
class GuardrailBase(BaseModel):
name: str
description: Optional[str] = None
categories: Dict[str, str] = Field(default_factory=dict)
class GuardrailCreateUpdateWithCategories(BaseModel):
name: str
description: Optional[str] = None
categories: Dict[str, Optional[str]] = Field(default_factory=dict)
class GuardrailAPIResponse(GuardrailBase):
id: str = Field(alias="_id")
user_id: str
project_id: str
created_at: datetime
updated_at: datetime
is_system_default_source: bool = False
Key Difference:
- Input (GuardrailCreateUpdateWithCategories): categories values are Optional[str] (null = delete)
- Output (GuardrailAPIResponse): categories values are always str
Category Update Logic¶
Helper Function:
def prepare_category_updates(input_categories: Dict[str, Optional[str]]) -> Tuple[Dict[str, str], List[str]]:
categories_to_set_op: Dict[str, str] = {}
categories_to_unset_op: List[str] = []
for category_name, instruction in input_categories.items():
dot_notation_key = f"categories.{category_name.strip()}"
if instruction is None:
# Mark for removal
categories_to_unset_op.append(dot_notation_key)
elif isinstance(instruction, str) and instruction.strip():
# Mark for set/update
categories_to_set_op[dot_notation_key] = instruction.strip()
else:
# Skip invalid
pass
return categories_to_set_op, categories_to_unset_op
MongoDB Operations:
- Add/Update: $set: {"categories.Politics": "Avoid political discussions"}
- Delete: $unset: {"categories.Finance": ""}
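A quick standalone check of the helper above (the function is reproduced here so the snippet runs on its own; no MongoDB needed):

```python
from typing import Dict, List, Optional, Tuple

def prepare_category_updates(
    input_categories: Dict[str, Optional[str]]
) -> Tuple[Dict[str, str], List[str]]:
    # Reproduction of the helper above so this snippet is self-contained.
    categories_to_set_op: Dict[str, str] = {}
    categories_to_unset_op: List[str] = []
    for category_name, instruction in input_categories.items():
        dot_notation_key = f"categories.{category_name.strip()}"
        if instruction is None:
            categories_to_unset_op.append(dot_notation_key)  # null -> $unset
        elif isinstance(instruction, str) and instruction.strip():
            categories_to_set_op[dot_notation_key] = instruction.strip()  # -> $set
        # blank strings are skipped: neither set nor unset
    return categories_to_set_op, categories_to_unset_op

sets, unsets = prepare_category_updates({
    "Politics": "Avoid political discussions",
    "Finance": None,   # marked for removal
    "Empty": "   ",    # skipped entirely
})
print(sets)    # {'categories.Politics': 'Avoid political discussions'}
print(unsets)  # ['categories.Finance']
```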
POST /v2/guardrails/default¶
Purpose: Create or update system-wide default guardrails
Request:
POST /v2/guardrails/default
Content-Type: application/json
{
"name": "Default System Guardrails",
"description": "Standard content restrictions.",
"categories": {
"Politics": "Please refrain from discussing politics.",
"Religion": "Discussions on religion are not permitted.",
"Profanity": "Avoid using profanity or offensive language."
}
}
To Remove a Category:
{
"name": "Default System Guardrails",
"categories": {
"Politics": "Strictly no political discussions.",
"Religion": null
}
}
Implementation:
now = datetime.utcnow()
query = {
"user_id": "_system_default_user_",
"project_id": "_system_default_project_"
}
categories_to_set, categories_to_unset = prepare_category_updates(input_categories)
update_payload = {
"$set": {
"name": name,
"description": description,
"updated_at": now,
**categories_to_set # e.g., {"categories.Politics": "..."}
},
"$setOnInsert": {
"user_id": "_system_default_user_",
"project_id": "_system_default_project_",
"created_at": now
}
}
if categories_to_unset:
update_payload["$unset"] = {key: "" for key in categories_to_unset}
result = guardrails_collection.update_one(query, update_payload, upsert=True)
Response:
{
"_id": "507f1f77bcf86cd799439011",
"user_id": "_system_default_user_",
"project_id": "_system_default_project_",
"name": "Default System Guardrails",
"description": "Standard content restrictions.",
"categories": {
"Politics": "Please refrain from discussing politics.",
"Profanity": "Avoid using profanity or offensive language."
},
"created_at": "2024-01-15T10:00:00.000Z",
"updated_at": "2024-01-15T10:30:00.000Z",
"is_system_default_source": true
}
PUT /v2/guardrails/user/{user_id}/project/{project_id}¶
Purpose: Create/update user-specific guardrails
Request:
PUT /v2/guardrails/user/User-123/project/User-123_Project_1
Content-Type: application/json
{
"name": "My Custom Guardrails",
"description": "Additional restrictions for my chatbot",
"categories": {
"Crypto": "Do not discuss cryptocurrency investments.",
"Health": "Avoid providing medical advice."
}
}
Validation:
if user_id == "_system_default_user_" and project_id == "_system_default_project_":
raise HTTPException(
status_code=400,
detail="Cannot modify default guardrails via this endpoint. Use /v2/guardrails/default."
)
Implementation: Same as default guardrails but with real user_id/project_id
Response:
{
"_id": "507f1f77bcf86cd799439012",
"user_id": "User-123",
"project_id": "User-123_Project_1",
"name": "My Custom Guardrails",
"categories": {
"Crypto": "Do not discuss cryptocurrency investments.",
"Health": "Avoid providing medical advice."
},
"created_at": "2024-01-15T11:00:00.000Z",
"updated_at": "2024-01-15T11:00:00.000Z",
"is_system_default_source": false
}
GET /v2/guardrails¶
Purpose: Retrieve guardrails with automatic fallback
Request:
Flow:
- Try User-Specific:
user_specific_doc = guardrails_collection.find_one({
"user_id": user_id,
"project_id": project_id
})
if user_specific_doc:
return user_specific_doc
- Fallback to Default:
default_doc = guardrails_collection.find_one({
"user_id": "_system_default_user_",
"project_id": "_system_default_project_"
})
if default_doc:
return default_doc
- No Guardrails Found:
raise HTTPException(
status_code=404,
detail="No guardrails configured for this user/project, and no system default guardrails found."
)
Response Indicator:
{
"_id": "507f1f77bcf86cd799439011",
"user_id": "_system_default_user_",
"project_id": "_system_default_project_",
"name": "Default System Guardrails",
"categories": {...},
"is_system_default_source": true
}
is_system_default_source Flag:
- true → Returned default guardrails (user-specific not found)
- false → Returned user-specific guardrails
Empty Categories Handling:
if "categories" not in doc_to_return or not isinstance(doc_to_return.get("categories"), dict):
doc_to_return["categories"] = {}
Screenshot Management¶
Purpose¶
Store and serve mobile/desktop screenshots for chatbot UI simulation/demo purposes.
Azure Blob Storage Configuration¶
STORAGE_ACCOUNT_NAME = os.getenv("AZURE_STORAGE_ACCOUNT_NAME", "qablobmachineagents")
STORAGE_ACCOUNT_KEY = os.getenv("AZURE_STORAGE_ACCOUNT_KEY", "kRXPNm77...") # ⚠️ HARDCODED FALLBACK
SCREENSHOTS_CONTAINER_NAME = os.getenv("AZURE_SCREENSHOTS_CONTAINER_NAME", "screenshots")
blob_service_client = BlobServiceClient(
account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
credential=STORAGE_ACCOUNT_KEY
)
Container Auto-Creation:
try:
container_client.create_container()
except Exception as ex:
if 'ContainerAlreadyExists' not in str(ex):
raise
POST /screenshots/upload¶
Purpose: Upload mobile or desktop screenshot
Request:
POST /screenshots/upload
Content-Type: multipart/form-data
project_id=User-123_Project_1
user_id=User-123
type=mobile
file=<binary image data>
Parameters:
- project_id (Form) - Project identifier
- user_id (Form) - User identifier
- type (Form) - Enum: mobile | desktop
- file (File) - Image file
Blob Naming:
sanitized_filename = "".join(
c for c in file.filename
if c.isalnum() or c in (' ', '.', '_')
).rstrip().replace(' ', '_')
blob_name = f"{project_id}/{user_id}/{type.value}/{sanitized_filename}"
# Example: User-123_Project_1/User-123/mobile/homepage_screenshot.png
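The sanitization rule can be checked standalone (the filename is an invented example):

```python
# Keep only alphanumerics, spaces, dots, and underscores, then
# strip trailing whitespace and replace spaces with underscores.
filename = "Home Page (v2)!.png"
sanitized = "".join(
    c for c in filename if c.isalnum() or c in (' ', '.', '_')
).rstrip().replace(' ', '_')
print(sanitized)  # Home_Page_v2.png
```

Note that parentheses and punctuation are dropped silently, so two distinct filenames can collide on the same blob name; combined with `overwrite=True` on upload, the later file wins.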
Upload:
blob_client = container_client.get_blob_client(blob_name)
file.file.seek(0)
blob_client.upload_blob(file.file, overwrite=True)
Response:
{
"message": "Screenshot uploaded successfully",
"url": "https://qablobmachineagents.blob.core.windows.net/screenshots/User-123_Project_1/User-123/mobile/homepage_screenshot.png"
}
GET /screenshots¶
Purpose: Retrieve screenshot with device-based fallback
Request:
Parameters:
- project_id (Query) - Project identifier
- user_id (Query) - User identifier
- device (Query, Optional) - mobile | desktop
Device Detection Logic:
- From Query Param: an explicit device=mobile|desktop value takes precedence
- From User-Agent Header:
if requested_type is None:
user_agent = request.headers.get("User-Agent", "").lower()
if "mobile" in user_agent:
requested_type = ScreenshotType.mobile
else:
requested_type = ScreenshotType.desktop
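The detection order can be modeled as a tiny standalone helper (illustrative; `detect_device` is invented here, and the service uses a ScreenshotType enum rather than plain strings):

```python
def detect_device(requested, user_agent):
    """Explicit query param wins; otherwise sniff the User-Agent,
    defaulting to desktop when 'mobile' does not appear."""
    if requested in ("mobile", "desktop"):
        return requested
    return "mobile" if "mobile" in user_agent.lower() else "desktop"

print(detect_device(None, "Mozilla/5.0 (iPhone; Mobile) Safari"))  # mobile
print(detect_device("desktop", "Mozilla/5.0 (iPhone; Mobile)"))    # desktop
```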
Retrieval Flow:
- Search for Requested Type:
blob_name_requested = f"{project_id}/{user_id}/{requested_type.value}/"
blob_list = list(container_client.list_blobs(name_starts_with=blob_name_requested))
if blob_list:
first_blob = blob_list[0]
blob_url = f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net/{SCREENSHOTS_CONTAINER_NAME}/{first_blob.name}"
return {"url": blob_url, "type": requested_type.value}
- Fallback to Opposite Type:
fallback_type = ScreenshotType.desktop if requested_type == ScreenshotType.mobile else ScreenshotType.mobile
blob_name_fallback = f"{project_id}/{user_id}/{fallback_type.value}/"
blob_list_fallback = list(container_client.list_blobs(name_starts_with=blob_name_fallback))
if blob_list_fallback:
# Return fallback screenshot
return {"url": blob_url, "type": fallback_type.value}
- No Screenshots Found:
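The full retrieval order can be modeled in memory (illustrative only; `find_screenshot` is invented for this sketch and returns None where the service presumably responds with a 404):

```python
def find_screenshot(blob_names, project_id, user_id, requested):
    """Search the requested device prefix first, then the opposite device;
    return (blob_name, device) for the first match, or None."""
    fallback = "desktop" if requested == "mobile" else "mobile"
    for device in (requested, fallback):
        prefix = f"{project_id}/{user_id}/{device}/"
        matches = [n for n in blob_names if n.startswith(prefix)]
        if matches:
            return matches[0], device
    return None  # no screenshot for either device type

blobs = ["P1/U1/desktop/home.png"]
print(find_screenshot(blobs, "P1", "U1", "mobile"))
# ('P1/U1/desktop/home.png', 'desktop') -- fallback kicked in
```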
Response:
{
"url": "https://qablobmachineagents.blob.core.windows.net/screenshots/User-123_Project_1/User-123/mobile/homepage.png",
"type": "mobile"
}
Use Case: Frontend displays chatbot UI preview using uploaded screenshots
Security Analysis¶
🔴 CRITICAL: Hardcoded Azure Storage Credentials¶
Lines 985-986:
STORAGE_ACCOUNT_NAME = os.getenv("AZURE_STORAGE_ACCOUNT_NAME", "qablobmachineagents")
STORAGE_ACCOUNT_KEY = os.getenv("AZURE_STORAGE_ACCOUNT_KEY", "kRXPNm77OyebjhfvHSchOHE1KwpkQefdjZLt4k/Nwajf3xUO+HIts2+hoBmF1iiO9Gv8Z9JbYH/v+ASt1ubG5w==")
Risk: Full access to Azure Blob Storage account + all containers
Impact:
- Upload malicious files
- Delete all screenshots
- Access other containers
- High storage cost (upload spam)
Fix:
STORAGE_ACCOUNT_KEY = os.getenv("AZURE_STORAGE_ACCOUNT_KEY")
if not STORAGE_ACCOUNT_KEY:
raise RuntimeError("AZURE_STORAGE_ACCOUNT_KEY environment variable not set")
🟠 SECURITY: Overly Permissive CORS¶
Lines 29-35:
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
Risk: Cross-Origin Resource Sharing from ANY domain
Fix:
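One possible fix, sketched with placeholder origins (the real frontend domains must be substituted):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    # Placeholder: replace with the actual frontend origins
    allow_origins=["https://app.example.com"],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["Content-Type", "Authorization"],
)
```

Note that `allow_credentials=True` combined with `allow_origins=["*"]` is explicitly disallowed by the CORS spec, which is another reason the current configuration needs tightening.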
🟡 CODE QUALITY: Same Collection for Different Trash Types¶
Lines 69, 75:
trash_chatbot_collection = db["trash_collection_name"]
trash_project_collection = db["trash_collection_name"]
Issue: Both variables point to same collection - differentiation only by document structure
Better Design:
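A possible split, using hypothetical collection names (any migration would also need to move existing trash documents):

```python
# Hypothetical: dedicated collections per trash type, so documents no
# longer need to be told apart by their shape
trash_chatbot_collection = db["trash_chatbots"]
trash_project_collection = db["trash_projects"]
```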
🟡 DATA INTEGRITY: Milvus/Blob Not Restored¶
Lines 154-265: Deletion flow hard-deletes Milvus + Blob data
Issue: Trash restore only recovers CosmosDB metadata - embeddings/files gone forever
User Impact:
- Restored chatbot has no knowledge base
- RAG system returns no results
- User must re-upload all documents
Solution: Consider not hard-deleting Milvus/Blob immediately, or warn user prominently
🟢 GOOD PRACTICE: Shared Service Integration¶
Lines 23-24:
from database.milvus_embeddings_service import get_milvus_embeddings_service
from storage.azure_blob_service import get_azure_blob_service
Benefit: Centralized Milvus/Blob logic across all services
🟢 GOOD PRACTICE: Resilient Deletion¶
Lines 183-185, 209-210:
except Exception as milvus_error:
logger.error(f"❌ Error deleting Milvus embeddings: {milvus_error}")
# Continue with deletion even if Milvus fails
Benefit: Partial deletion better than complete failure
Integration Points¶
1. Response Chatbot Services Integration¶
Deleted Chatbot Check:
Response services (3D/Text/Voice) check isDeleted flag:
chatbot_config = chatbot_collection.find_one({
"user_id": user_id,
"project_id": project_id,
"$or": [
{"isDeleted": {"$exists": False}},
{"isDeleted": False}
]
})
if not chatbot_config or chatbot_config.get("isDeleted") is True:
return {"error": "This chatbot is no longer available."}
Soft Delete Advantage: Response services immediately stop serving deleted chatbots
2. Data Crawling Service Integration¶
Blob Storage Cleanup:
When chatbot deleted, this service removes blobs crawled by Data Crawling Service:
blob_prefix = f"{user_id}/{project_id}/"
# Deletes all: PDFs, extracted text, images uploaded via crawling
3. Shared Service Dependencies¶
Milvus Embeddings Service:
Azure Blob Service:
4. Frontend Integration¶
Trash UI Flow:
- User clicks "Delete Chatbot" → Frontend calls DELETE /v2/delete-chatbot-selection
- Chatbot moves to trash (7-day window)
- User views trash → Frontend calls GET /v2/get-trashed-chatbot-sevendays
- User restores → Frontend calls GET /v2/get-trashed-chatbots
- User permanently deletes → Frontend calls DELETE /v2/delete-trashed-chatbots
Guardrails UI Flow:
- Admin sets defaults → POST /v2/guardrails/default
- User customizes → PUT /v2/guardrails/user/{user_id}/project/{project_id}
- Response services fetch → GET /v2/guardrails
- Response services apply restrictions based on categories
Screenshot UI Flow:
- User uploads chatbot UI preview → POST /screenshots/upload
- Marketing page displays preview → GET /screenshots
- Mobile users auto-detected via User-Agent
5. Guardrails Consumer Services¶
Response 3D Chatbot Service:
guardrails = requests.get(f"{MAINTENANCE_SERVICE_URL}/v2/guardrails?user_id={user_id}&project_id={project_id}")
# Apply guardrails.categories to LLM system prompt
Integration Pattern: Same for Text/Voice chatbot services
Summary¶
Service Statistics¶
- Total Lines: 1,129
- Total Endpoints: 12
- Total Collections: 11 (MongoDB) + 1 (Milvus) + 1 (Azure Blob)
- Shared Services: 2 (Milvus, Azure Blob)
- Security Issues: 2 critical (hardcoded key, CORS)
Key Capabilities¶
- ✅ Tri-Storage Deletion - Coordinated Milvus + Blob + CosmosDB cleanup
- ✅ 7-Day Trash Recovery - Soft delete with restore window
- ✅ Document Content Updates - Re-embed and update knowledge base
- ✅ Hierarchical Guardrails - User-specific with system fallback
- ✅ Screenshot Management - Device-specific with auto-fallback
Critical Fixes Needed¶
- 🔴 Externalize Azure Storage credentials
- 🟠 Restrict CORS to known origins
- 🟡 Split trash collections (chatbots vs projects)
- 🟡 Warn users about Milvus/Blob permanent deletion
- 🟡 Sync Milvus when updating document content
Deployment Notes¶
Docker Compose (Port 8010):
chatbot-maintenance-service:
build: ./chatbot-maintence-service
container_name: chatbot-maintenance-service
ports:
- "8010:8010"
volumes:
- ./shared:/app/shared:ro
environment:
- MONGO_URI=...
- AZURE_STORAGE_ACCOUNT_NAME=qablobmachineagents
- AZURE_STORAGE_ACCOUNT_KEY=***
- AZURE_SCREENSHOTS_CONTAINER_NAME=screenshots-dev
Shared Volume Required: /app/shared for Milvus/Blob services
Documentation Complete: Chatbot Maintenance Service (Port 8010)