
Chatbot Maintenance Service (Port 8010)

Service Path: machineagents-be/chatbot-maintence-service/
Port: 8010
Total Lines: 1,129
Purpose: Comprehensive chatbot lifecycle management including deletion (soft/hard), trash recovery, document content management, guardrails configuration, and screenshot storage for chatbot UI simulation.


Table of Contents

  1. Service Overview
  2. Architecture & Dependencies
  3. Database Collections
  4. Core Features
  5. API Endpoints
  6. Chatbot Deletion System
  7. Trash Management System
  8. Document Content Management
  9. Guardrails System
  10. Screenshot Management
  11. Security Analysis
  12. Integration Points

Service Overview

Primary Responsibilities

  1. Chatbot Lifecycle Management:
    • Soft delete with isDeleted flag (CosmosDB)
    • Hard delete of embeddings (Milvus)
    • Hard delete of files (Azure Blob Storage)
    • Complete cleanup across 11 collections
  2. Trash & Recovery:
    • 7-day trash retention window
    • Restore deleted chatbots
    • Permanent deletion from trash
  3. Document Content Updates:
    • Modify extracted text
    • Regenerate embeddings
    • Update chunks
    • Create new documents
  4. Guardrails Configuration:
    • System-wide default guardrails
    • User/project-specific guardrails
    • Category-based content restrictions
    • Hierarchical fallback system
  5. Screenshot Management:
    • Upload mobile/desktop screenshots
    • Azure Blob Storage integration
    • Device-specific retrieval with fallback
    • User-Agent detection

Architecture & Dependencies

Technology Stack

Framework:

  • FastAPI (web framework)
  • Uvicorn (ASGI server)

Databases:

  • MongoDB (CosmosDB) - 13 collections (11 of them cleaned up on chatbot deletion)
  • Milvus - Vector embeddings (via shared service)

Storage:

  • Azure Blob Storage - Screenshots & files

AI/ML:

  • FastEmbed (BAAI/bge-small-en-v1.5) - Embeddings

Shared Services:

  • database/milvus_embeddings_service - Milvus operations
  • storage/azure_blob_service - Azure Blob operations

Key Imports

import os
import sys

from fastapi import FastAPI, HTTPException, Query, UploadFile, File, Form, Request, Body
from pymongo import MongoClient
from datetime import datetime, timedelta
from bson import ObjectId
from pydantic import BaseModel, Field
from typing import List, Dict, Any, Optional, Tuple
from azure.storage.blob import BlobServiceClient
from fastembed.embedding import FlagEmbedding as Embedding

# Shared services
sys.path.insert(0, '/app/shared')
from database.milvus_embeddings_service import get_milvus_embeddings_service
from storage.azure_blob_service import get_azure_blob_service

Environment Variables

MONGO_URI=mongodb://...
MONGO_DB_NAME=Machine_agent_dev
AZURE_STORAGE_ACCOUNT_NAME=qablobmachineagents
AZURE_STORAGE_ACCOUNT_KEY=kRXPNm77...  # ⚠️ HARDCODED FALLBACK
AZURE_SCREENSHOTS_CONTAINER_NAME=screenshots-dev

Embedding Model Configuration

embedder = Embedding(model_name="BAAI/bge-small-en-v1.5", max_length=512)

Same model as:

  • Data Crawling Service
  • Response 3D/Text/Voice Chatbot Services
  • Selection Chatbot Service

Database Collections

MongoDB Collections (14 handles, 13 distinct collections; 11 are soft-deleted on chatbot deletion)

users_collection = db["users_multichatbot_v2"]              # User accounts
chatbot_collection = db["chatbot_selections"]               # Chatbot configs
trash_chatbot_collection = db["trash_collection_name"]      # Deleted chatbots (7 days)
selection_collection = db["selection_history"]              # Avatar/voice/model selections
files_collection = db["files"]                              # Uploaded documents
guardrails_collection = db["chatbot_guardrails"]            # Content restrictions
user_system_prompt_collection = db["system_prompts_user"]  # Custom prompts
projectid_collection = db["projectid_creation"]             # Project metadata
trash_project_collection = db["trash_collection_name"]      # Deleted projects
generate_greeting_collection = db["generate_greeting"]      # Avatar greetings
files_collection2 = db["organisation_data"]                 # Organization data
files_secondary_collection = db["files_secondary"]          # Secondary files
history_collection = db["chatbot_history"]                  # Chat logs
lead_collection = db["LEAD_COLLECTION"]                     # Lead forms

Note: trash_chatbot_collection and trash_project_collection both use the same collection "trash_collection_name" - documents are differentiated by having project_id field or not.


Core Features

1. Tri-Storage Deletion Architecture

The Problem: Data exists in 3 different storage systems
The Solution: Coordinated deletion across all three

| Storage System | Deletion Type | Strategy |
|---|---|---|
| Milvus | Hard Delete | Immediate permanent deletion of embeddings |
| Azure Blob | Hard Delete | Immediate permanent deletion of files |
| CosmosDB | Soft Delete | Set isDeleted: true + deleted_at timestamp |

Why Different Strategies?

  • Milvus: No soft delete feature - embeddings must be purged
  • Azure Blob: Storage cost optimization - delete immediately
  • CosmosDB: Business logic - 7-day recovery window for users

2. 7-Day Trash System

Retention Policy:

  • Deleted chatbots move to trash_collection_name
  • Available for restore within 7 days
  • Auto-purge after 7 days is not implemented in code: expired entries are only hidden by the 7-day query filter and must be purged manually

Recovery Flow:

  1. User requests restore
  2. System finds trash docs deleted within the last 7 days (deleted_at >= now - 7 days)
  3. Strips the _id, deleted_at and selection_model fields
  4. Re-inserts into main collections
  5. Deletes from trash

Limitation: Milvus/Blob data is GONE - only metadata recovers
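The recovery rules above (7-day window, stripped fields) can be modelled as a pure function over plain dicts, which makes the window boundary easy to test. This is a sketch, not the service code: select_restorable and its list-of-dicts input are hypothetical stand-ins for the MongoDB query and cursor.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

RETENTION = timedelta(days=7)
# Fields dropped before re-insertion (mirrors the restore endpoint)
STRIPPED_FIELDS = {"_id", "deleted_at", "selection_model"}


def select_restorable(
    trash_docs: List[Dict[str, Any]],
    user_id: str,
    project_id: str,
    now: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Return cleaned copies of trash docs deleted within the retention window."""
    now = now or datetime.utcnow()
    cutoff = now - RETENTION  # equivalent to deleted_at >= seven_days_ago
    restorable = []
    for doc in trash_docs:
        if doc.get("user_id") != user_id or doc.get("project_id") != project_id:
            continue
        deleted_at = doc.get("deleted_at")
        if deleted_at is None or deleted_at < cutoff:
            continue  # outside the 7-day window: no longer restorable
        # Build a new dict so the trash document itself is left untouched
        restorable.append({k: v for k, v in doc.items() if k not in STRIPPED_FIELDS})
    return restorable
```

Because the filter is $gte (inclusive), a chatbot deleted exactly seven days ago is still restorable.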

3. Document Content Update System

Use Case: User wants to edit pre-crawled content without re-uploading

Capabilities:

  • Update extracted_text
  • Regenerate embeddings with FastEmbed
  • Recreate chunks (single chunk per document)
  • Update last_modified timestamp
  • Create new documents if none exist

Upsert Logic:

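if existing_doc:
    files_collection.update_one({"_id": existing_doc["_id"]}, {"$set": ...})  # Update existing (fields in step 4 below)
else:
    files_collection.insert_one(new_doc)  # Create new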

4. Hierarchical Guardrails System

3-Level Priority:

  1. User-specific (user_id + project_id) - Highest priority
  2. System default (_system_default_user_ + _system_default_project_)
  3. None - No guardrails

Category Management:

  • Add/Update: Set category instruction to string
  • Delete: Set category instruction to null
  • Dynamic topics: Unlimited custom categories
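To illustrate the add/update/delete rules, the sketch below applies a change set to a plain dict and returns what the stored categories would look like afterwards. apply_category_changes is a hypothetical helper: the service itself builds MongoDB $set/$unset operators rather than editing a dict, but the resulting state should be the same.

```python
from typing import Dict, Optional


def apply_category_changes(
    current: Dict[str, str], changes: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Return the categories a stored guardrail would hold after an update."""
    updated = dict(current)  # never mutate the caller's dict
    for name, instruction in changes.items():
        key = name.strip()
        if instruction is None:
            updated.pop(key, None)  # null -> delete the category (like $unset)
        elif instruction.strip():
            updated[key] = instruction.strip()  # text -> add/update (like $set)
        # empty or whitespace-only strings are skipped, leaving the category as-is
    return updated
```

Categories not mentioned in the change set are preserved, which is why a client only needs to send the categories it wants to change.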

5. Screenshot Upload & Retrieval

Device Types:

  • mobile - Mobile device screenshots
  • desktop - Desktop screenshots

Storage Path:

project_id/user_id/device_type/filename

Retrieval Logic:

  1. Detect device from query param OR User-Agent header
  2. Search for requested device type
  3. Fallback: If not found, try opposite device type
  4. Return public blob URL

API Endpoints

Endpoint Summary

| Endpoint | Method | Purpose |
|---|---|---|
| /v2/get-chatbot-selection | GET | Fetch chatbot config (non-deleted) |
| /v2/delete-chatbot-selection | DELETE | Tri-storage deletion |
| /v2/get-trashed-chatbot-sevendays | GET | List all trashed chatbots (7 days) |
| /v2/get-trashed-chatbots | GET | Restore specific chatbot from trash |
| /v2/delete-trashed-chatbots | DELETE | Permanently delete from trash |
| /v2/get_extracted_text/{user_id}/{project_id} | GET | Retrieve categorized documents |
| /v2/update_document_content/{user_id}/{project_id} | POST | Update/create documents |
| /v2/guardrails/default | POST | Create/update system default guardrails |
| /v2/guardrails/user/{user_id}/project/{project_id} | PUT | Upsert user-specific guardrails |
| /v2/guardrails | GET | Get guardrails with fallback |
| /screenshots/upload | POST | Upload screenshot to Azure Blob |
| /screenshots | GET | Retrieve screenshot with device fallback |

Total: 12 endpoints


Chatbot Deletion System

DELETE /v2/delete-chatbot-selection

Purpose: Complete chatbot deletion across all storage systems

Request:

DELETE /v2/delete-chatbot-selection?user_id=User-123&project_id=User-123_Project_1

Flow:

Step 1: Delete Milvus Embeddings (Hard Delete)

milvus_service = get_milvus_embeddings_service()
deleted = milvus_service.delete_embeddings_by_user_project(
    collection_name="embeddings",
    user_id=user_id,
    project_id=project_id
)

What Gets Deleted:

  • All vector embeddings for this project
  • Indexed documents in Milvus
  • NOT RECOVERABLE

Step 2: Delete Azure Blob Files (Hard Delete)

azure_blob = get_azure_blob_service()
blob_prefix = f"{user_id}/{project_id}/"
blobs = azure_blob.list_blobs(prefix=blob_prefix)

for blob in blobs:
    azure_blob.delete_blob(blob.get('name'))

What Gets Deleted:

  • All uploaded files (PDFs, docs, images)
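  • Screenshot images are probably NOT covered: uploads are stored under {project_id}/{user_id}/... in the screenshots container, which a prefix starting with user_id does not match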
  • NOT RECOVERABLE

Step 3: Soft Delete CosmosDB Documents

11 Collections Updated:

collections_to_check = [
    ("chatbot_selections", chatbot_collection),
    ("selection_history", selection_collection),
    ("organisation_data", files_collection2),
    ("files", files_collection),
    ("chatbot_guardrails", guardrails_collection),
    ("system_prompts_user", user_system_prompt_collection),
    ("files_secondary", files_secondary_collection),
    ("generate_greeting", generate_greeting_collection),
    ("chatbot_history", history_collection),
    ("LEAD_COLLECTION", lead_collection),
    ("projectid_creation", projectid_collection)
]

deletion_timestamp = datetime.utcnow()  # assumed: one UTC timestamp shared by every collection
for name, collection in collections_to_check:
    collection.update_many(
        {"user_id": user_id, "project_id": project_id},
        {"$set": {"isDeleted": True, "deleted_at": deletion_timestamp}}
    )

Response:

{
  "message": "Chatbot deleted successfully",
  "details": {
    "milvus_embeddings_deleted": true,
    "azure_blobs_deleted": 5,
    "cosmos_documents_soft_deleted": 23,
    "deleted_at": "2024-01-15T10:30:00.000Z"
  }
}

Error Handling

Resilient Design:

  • Milvus failure → Log error, continue deletion
  • Blob failure → Log error, continue deletion
  • CosmosDB failure → Raises HTTPException

Why? Partial deletion is better than no deletion - user can retry
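The resilience policy above separates best-effort steps (Milvus, Blob) from a step whose failure must reach the caller (CosmosDB). A minimal, database-free sketch of that policy follows; run_deletion and the step callables are hypothetical, and the real endpoint turns the final exception into an HTTPException rather than letting it propagate raw.

```python
import logging
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger("chatbot_deletion_sketch")


def run_deletion(
    best_effort: List[Tuple[str, Callable[[], object]]],
    required: Tuple[str, Callable[[], object]],
) -> Dict[str, object]:
    """Run best-effort steps (errors logged, not raised), then one required step."""
    results: Dict[str, object] = {}
    for name, step in best_effort:
        try:
            results[name] = step()
        except Exception as exc:  # mirror the service: log and keep going
            logger.error("Error in %s: %s", name, exc)
            results[name] = None
    name, step = required
    # The required step is not wrapped, so its exception propagates to the caller
    results[name] = step()
    return results
```

Ordering matters here: the irreversible hard deletes run first, so a CosmosDB failure leaves the chatbot visible but already stripped of its knowledge base.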


Trash Management System

GET /v2/get-trashed-chatbot-sevendays

Purpose: List all trashed chatbots for a user

Request:

GET /v2/get-trashed-chatbot-sevendays?user_id=User-123

Query Logic:

seven_days_ago = datetime.utcnow() - timedelta(days=7)
trashed_chatbots = trash_chatbot_collection.find({
    "user_id": user_id,
    "deleted_at": {"$gte": seven_days_ago}
})

Response:

{
  "user_id": "User-123",
  "trashed_chatbots": [
    {
      "_id": "507f1f77bcf86cd799439011",
      "project_id": "User-123_Project_1",
      "selection_avatar": "Avatar_Lisa",
      "selection_voice": "Female_1",
      "deleted_at": "2024-01-10T10:30:00.000Z"
    }
  ]
}

GET /v2/get-trashed-chatbots - RESTORE Function

Purpose: Recover deleted chatbot from trash

Request:

GET /v2/get-trashed-chatbots?user_id=User-123&project_id=User-123_Project_1

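Flow:

  1. Find Trash Documents:

seven_days_ago = datetime.utcnow() - timedelta(days=7)
trash_filter = {
    "user_id": user_id,
    "project_id": project_id,
    "deleted_at": {"$gte": seven_days_ago}
}
# Materialize the cursors: the documents are read again in steps 2 and 3
trashed_chatbots = list(trash_chatbot_collection.find(trash_filter))
trashed_projects = list(trash_project_collection.find(trash_filter))

  2. Clean & Restore:

# Remove deletion metadata (and _id, so the main collections assign fresh ones)
excluded = {"_id", "deleted_at", "selection_model"}
restored_chatbots = [{k: v for k, v in cb.items() if k not in excluded} for cb in trashed_chatbots]
# Assumed: project documents are cleaned the same way
restored_projects = [{k: v for k, v in p.items() if k not in excluded} for p in trashed_projects]

# Re-insert into main collections (insert_many rejects an empty list)
if restored_chatbots:
    chatbot_collection.insert_many(restored_chatbots)
if restored_projects:
    projectid_collection.insert_many(restored_projects)

  3. Delete from Trash:

chatbot_object_ids = [cb["_id"] for cb in trashed_chatbots]
project_object_ids = [p["_id"] for p in trashed_projects]
trash_chatbot_collection.delete_many({"_id": {"$in": chatbot_object_ids}})
trash_project_collection.delete_many({"_id": {"$in": project_object_ids}})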

Response:

{
    "trashed_chatbots": [...],
    "trashed_projects": [...],
    "restored_chatbots_count": 1,
    "restored_projects_count": 1
}

⚠️ LIMITATION: Milvus embeddings and Azure Blobs are NOT restored (hard deleted)

DELETE /v2/delete-trashed-chatbots

Purpose: Permanently delete from trash (no recovery)

Request:

DELETE /v2/delete-trashed-chatbots?user_id=User-123&project_id=User-123_Project_1

Implementation:

chatbot_result = trash_chatbot_collection.delete_many({
    "user_id": user_id,
    "project_id": project_id
})

project_result = trash_project_collection.delete_many({
    "user_id": user_id,
    "project_id": project_id
})

Response:

{
  "message": "2 trashed chatbot(s) and 1 trashed project(s) permanently deleted.",
  "chatbots_deleted": 2,
  "projects_deleted": 1,
  "total_deleted": 3
}

Document Content Management

GET /v2/get_extracted_text/{user_id}/{project_id}

Purpose: Retrieve all extracted text categorized by file type

Request:

GET /v2/get_extracted_text/User-123/User-123_Project_1

Query:

documents = files_collection.find(
    {"user_id": user_id, "project_id": project_id},
    {"file_blob": 0, "embeddings": 0}  # Exclude large fields
)

Categorization Logic:

categorized_data = {
    "files": [],    # PDF, DOCX, JSON, CSV, etc.
    "qna": [],      # Q&A pairs
    "text": [],     # Plain text/TXT files
    "url": []       # Crawled web content
}

for doc in documents:
    doc_id = str(doc["_id"])  # ObjectId -> JSON-friendly string
    file_type = doc.get("file_type", "unknown").lower()
    text_content = doc.get("extracted_text")
    entry = {"id": doc_id, "file_type": file_type, "content": text_content}  # shape as in the response below

    if file_type == "url" and isinstance(text_content, dict):
        # URL content: separate entries per URL
        for url, content in text_content.items():
            categorized_data["url"].append({
                "id": doc_id,
                "file_type": file_type,
                "url": url,
                "content": content
            })
    elif file_type == "qna":
        categorized_data["qna"].append(entry)
    elif file_type in ("text", "txt"):
        categorized_data["text"].append(entry)
    else:
        categorized_data["files"].append(entry)

Response:

{
  "user_id": "User-123",
  "project_id": "User-123_Project_1",
  "categorized_texts": {
    "files": [
      {
        "id": "507f1f77bcf86cd799439011",
        "file_type": "pdf",
        "content": "Financial report Q4 2023..."
      }
    ],
    "qna": [
      {
        "id": "507f1f77bcf86cd799439012",
        "file_type": "qna",
        "content": "Q: What are your hours? A: 9 AM - 5 PM"
      }
    ],
    "url": [
      {
        "id": "507f1f77bcf86cd799439013",
        "file_type": "url",
        "url": "https://example.com/about",
        "content": "About us page content..."
      }
    ]
  }
}

Empty Category Removal:

final_result = {
    key: value for key, value in categorized_data.items() if value
}

POST /v2/update_document_content/{user_id}/{project_id}

Purpose: Update or create document content with new embeddings

Request:

POST /v2/update_document_content/User-123/User-123_Project_1
Content-Type: application/json

{
    "documents": [
        {
            "content": "Updated company information...",
            "file_type": "text"
        }
    ]
}

Flow:

1. Find Existing Document

existing_doc = files_collection.find_one({
    "user_id": user_id,
    "project_id": project_id
})

2. Generate New Embedding

embedding_generator = embedder.embed([new_content])
embedding_list = list(embedding_generator)
new_embedding = list(map(float, embedding_list[0]))

Model: BAAI/bge-small-en-v1.5 (384 dimensions)

3. Create Chunk

new_chunks = [{
    "chunk_index": 0,
    "content": new_content,
    "start_pos": 0,
    "end_pos": len(new_content),
    "length": len(new_content)
}]

Note: Single chunk per document (no chunking strategy)
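If a chunking strategy were added, a fixed-size sliding window is one common option. The sketch below is a hypothetical alternative, not current service behavior: chunk_text and its size/overlap defaults are illustrative assumptions, but it emits chunks with the same fields (chunk_index, content, start_pos, end_pos, length), and for short text it reproduces the current single-chunk result.

```python
from typing import Dict, List


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> List[Dict[str, object]]:
    """Split text into overlapping fixed-size character windows."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    chunks: List[Dict[str, object]] = []
    step = size - overlap  # how far each window advances
    start = 0
    while True:
        end = min(start + size, len(text))
        chunks.append({
            "chunk_index": len(chunks),
            "content": text[start:end],
            "start_pos": start,
            "end_pos": end,
            "length": end - start,
        })
        if end >= len(text):
            break  # the last window reached the end of the text
        start += step
    return chunks
```

For example, chunk_text("abcdefghij", size=4, overlap=1) yields the windows "abcd", "defg" and "ghij". Each chunk would then need its own embedding for retrieval to benefit.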

4. Update or Insert

If Document Exists:

files_collection.update_one(
    {"_id": existing_doc["_id"]},
    {"$set": {
        "extracted_text": new_content,
        "embeddings": new_embedding,
        "chunks": new_chunks,
        "last_modified": datetime.utcnow()
    }}
)

If Document Does NOT Exist:

new_doc = {
    "user_id": user_id,
    "project_id": project_id,
    "extracted_text": new_content,
    "embeddings": new_embedding,
    "chunks": new_chunks,
    "file_type": file_type,
    "created_at": datetime.utcnow(),
    "last_modified": datetime.utcnow()
}
files_collection.insert_one(new_doc)

Response:

{
  "user_id": "User-123",
  "project_id": "User-123_Project_1",
  "total_documents": 2,
  "updated_documents": 2,
  "failed_documents": 0,
  "results": [
    {
      "doc_id": "507f1f77bcf86cd799439011",
      "status": "success",
      "message": "Document updated successfully"
    },
    {
      "doc_id": "507f1f77bcf86cd799439014",
      "status": "created",
      "message": "Document created successfully"
    }
  ]
}

⚠️ LIMITATION: Only updates files collection, NOT Milvus - embeddings become out of sync


Guardrails System

Architecture

Purpose: Content moderation and topic restrictions for chatbot responses

3-Level Hierarchy:

1. User-Specific Guardrails (user_id + project_id)
        ↓ (if not found)
2. System Default Guardrails (_system_default_user_ + _system_default_project_)
        ↓ (if not found)
3. No Guardrails (404 response)
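The fallback chain can be expressed as a small pure function over a list of guardrail documents, which is convenient for testing without MongoDB. resolve_guardrails is a hypothetical helper that follows the GET endpoint's documented behavior: user-specific first, then the system default, then nothing (the endpoint answers 404), with missing or non-dict categories normalized to an empty dict.

```python
from typing import Any, Dict, List, Optional

DEFAULT_USER = "_system_default_user_"
DEFAULT_PROJECT = "_system_default_project_"


def resolve_guardrails(
    docs: List[Dict[str, Any]], user_id: str, project_id: str
) -> Optional[Dict[str, Any]]:
    def find(u: str, p: str) -> Optional[Dict[str, Any]]:
        return next((d for d in docs if d.get("user_id") == u and d.get("project_id") == p), None)

    doc = find(user_id, project_id)  # level 1: user-specific
    is_default = False
    if doc is None:
        doc = find(DEFAULT_USER, DEFAULT_PROJECT)  # level 2: system default
        is_default = True
    if doc is None:
        return None  # level 3: nothing configured (the endpoint raises 404)
    result = dict(doc)  # copy so the stored document is not modified
    if not isinstance(result.get("categories"), dict):
        result["categories"] = {}
    result["is_system_default_source"] = is_default
    return result
```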

Data Model

Pydantic Models:

class GuardrailBase(BaseModel):
    name: str
    description: Optional[str] = None
    categories: Dict[str, str] = Field(default_factory=dict)

class GuardrailCreateUpdateWithCategories(BaseModel):
    name: str
    description: Optional[str] = None
    categories: Dict[str, Optional[str]] = Field(default_factory=dict)

class GuardrailAPIResponse(GuardrailBase):
    id: str = Field(alias="_id")
    user_id: str
    project_id: str
    created_at: datetime
    updated_at: datetime
    is_system_default_source: bool = False

Key Difference:

  • Input (GuardrailCreateUpdateWithCategories): categories values can be Optional[str] (null = delete)
  • Output (GuardrailAPIResponse): categories values are always str

Category Update Logic

Helper Function:

def prepare_category_updates(input_categories: Dict[str, Optional[str]]) -> Tuple[Dict[str, str], List[str]]:
    categories_to_set_op: Dict[str, str] = {}
    categories_to_unset_op: List[str] = []

    for category_name, instruction in input_categories.items():
        dot_notation_key = f"categories.{category_name.strip()}"

        if instruction is None:
            # Mark for removal
            categories_to_unset_op.append(dot_notation_key)
        elif isinstance(instruction, str) and instruction.strip():
            # Mark for set/update
            categories_to_set_op[dot_notation_key] = instruction.strip()
        else:
            # Skip invalid
            pass

    return categories_to_set_op, categories_to_unset_op

MongoDB Operations:

  • Add/Update: $set: {"categories.Politics": "Avoid political discussions"}
  • Delete: $unset: {"categories.Finance": ""}

POST /v2/guardrails/default

Purpose: Create or update system-wide default guardrails

Request:

POST /v2/guardrails/default
Content-Type: application/json

{
    "name": "Default System Guardrails",
    "description": "Standard content restrictions.",
    "categories": {
        "Politics": "Please refrain from discussing politics.",
        "Religion": "Discussions on religion are not permitted.",
        "Profanity": "Avoid using profanity or offensive language."
    }
}

To Remove a Category:

{
  "name": "Default System Guardrails",
  "categories": {
    "Politics": "Strictly no political discussions.",
    "Religion": null
  }
}

Implementation:

now = datetime.utcnow()
query = {
    "user_id": "_system_default_user_",
    "project_id": "_system_default_project_"
}

categories_to_set, categories_to_unset = prepare_category_updates(input_categories)

update_payload = {
    "$set": {
        "name": name,
        "description": description,
        "updated_at": now,
        **categories_to_set  # e.g., {"categories.Politics": "..."}
    },
    "$setOnInsert": {
        "user_id": "_system_default_user_",
        "project_id": "_system_default_project_",
        "created_at": now
    }
}

if categories_to_unset:
    update_payload["$unset"] = {key: "" for key in categories_to_unset}

result = guardrails_collection.update_one(query, update_payload, upsert=True)
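Since the default and user-specific endpoints differ only in the user_id/project_id pair, the payload construction could be factored into one pure function that both call and that can be unit-tested without a database. The sketch below is such a refactoring; build_guardrail_update is a hypothetical name, and it inlines the prepare_category_updates rules so it stays self-contained.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def build_guardrail_update(
    user_id: str,
    project_id: str,
    name: str,
    description: Optional[str],
    categories: Dict[str, Optional[str]],
    now: datetime,
) -> Dict[str, Any]:
    set_ops: Dict[str, Any] = {"name": name, "description": description, "updated_at": now}
    unset_ops: Dict[str, str] = {}
    for category, instruction in categories.items():
        key = f"categories.{category.strip()}"  # dot notation targets one category
        if instruction is None:
            unset_ops[key] = ""  # remove this category
        elif instruction.strip():
            set_ops[key] = instruction.strip()  # add or overwrite this category
    payload: Dict[str, Any] = {
        "$set": set_ops,
        # Identity fields and created_at are written only when the upsert inserts
        "$setOnInsert": {"user_id": user_id, "project_id": project_id, "created_at": now},
    }
    if unset_ops:
        payload["$unset"] = unset_ops
    return payload
```

As in the original, description is always included in $set, so omitting it from a request overwrites any stored description with null.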

Response:

{
  "_id": "507f1f77bcf86cd799439011",
  "user_id": "_system_default_user_",
  "project_id": "_system_default_project_",
  "name": "Default System Guardrails",
  "description": "Standard content restrictions.",
  "categories": {
    "Politics": "Please refrain from discussing politics.",
    "Profanity": "Avoid using profanity or offensive language."
  },
  "created_at": "2024-01-15T10:00:00.000Z",
  "updated_at": "2024-01-15T10:30:00.000Z",
  "is_system_default_source": true
}

PUT /v2/guardrails/user/{user_id}/project/{project_id}

Purpose: Create/update user-specific guardrails

Request:

PUT /v2/guardrails/user/User-123/project/User-123_Project_1
Content-Type: application/json

{
    "name": "My Custom Guardrails",
    "description": "Additional restrictions for my chatbot",
    "categories": {
        "Crypto": "Do not discuss cryptocurrency investments.",
        "Health": "Avoid providing medical advice."
    }
}

Validation:

if user_id == "_system_default_user_" and project_id == "_system_default_project_":
    raise HTTPException(
        status_code=400,
        detail="Cannot modify default guardrails via this endpoint. Use /v2/guardrails/default."
    )

Implementation: Same as default guardrails but with real user_id/project_id

Response:

{
  "_id": "507f1f77bcf86cd799439012",
  "user_id": "User-123",
  "project_id": "User-123_Project_1",
  "name": "My Custom Guardrails",
  "categories": {
    "Crypto": "Do not discuss cryptocurrency investments.",
    "Health": "Avoid providing medical advice."
  },
  "created_at": "2024-01-15T11:00:00.000Z",
  "updated_at": "2024-01-15T11:00:00.000Z",
  "is_system_default_source": false
}

GET /v2/guardrails

Purpose: Retrieve guardrails with automatic fallback

Request:

GET /v2/guardrails?user_id=User-123&project_id=User-123_Project_1

Flow:

  1. Try User-Specific:

user_specific_doc = guardrails_collection.find_one({
    "user_id": user_id,
    "project_id": project_id
})

if user_specific_doc:
    return user_specific_doc

  2. Fallback to Default:

default_doc = guardrails_collection.find_one({
    "user_id": "_system_default_user_",
    "project_id": "_system_default_project_"
})

if default_doc:
    return default_doc

  3. No Guardrails Found:

raise HTTPException(
    status_code=404,
    detail="No guardrails configured for this user/project, and no system default guardrails found."
)

Response Indicator:

{
    "_id": "507f1f77bcf86cd799439011",
    "user_id": "_system_default_user_",
    "project_id": "_system_default_project_",
    "name": "Default System Guardrails",
    "categories": {...},
    "is_system_default_source": true
}

is_system_default_source Flag:

  • true → Returned default guardrails (user-specific not found)
  • false → Returned user-specific guardrails

Empty Categories Handling:

if "categories" not in doc_to_return or not isinstance(doc_to_return.get("categories"), dict):
    doc_to_return["categories"] = {}

Screenshot Management

Purpose

Store and serve mobile/desktop screenshots for chatbot UI simulation/demo purposes.

Azure Blob Storage Configuration

STORAGE_ACCOUNT_NAME = os.getenv("AZURE_STORAGE_ACCOUNT_NAME", "qablobmachineagents")
STORAGE_ACCOUNT_KEY = os.getenv("AZURE_STORAGE_ACCOUNT_KEY", "kRXPNm77...")  # ⚠️ HARDCODED FALLBACK
SCREENSHOTS_CONTAINER_NAME = os.getenv("AZURE_SCREENSHOTS_CONTAINER_NAME", "screenshots")

blob_service_client = BlobServiceClient(
    account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
    credential=STORAGE_ACCOUNT_KEY
)

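Container Auto-Creation:

from azure.core.exceptions import ResourceExistsError

container_client = blob_service_client.get_container_client(SCREENSHOTS_CONTAINER_NAME)
try:
    container_client.create_container()
except ResourceExistsError:
    pass  # container already exists; any other error propagates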

POST /screenshots/upload

Purpose: Upload mobile or desktop screenshot

Request:

POST /screenshots/upload
Content-Type: multipart/form-data

project_id=User-123_Project_1
user_id=User-123
type=mobile
file=<binary image data>

Parameters:

  • project_id (Form) - Project identifier
  • user_id (Form) - User identifier
  • type (Form) - Enum: mobile | desktop
  • file (File) - Image file

Blob Naming:

sanitized_filename = "".join(
    c for c in file.filename
    if c.isalnum() or c in (' ', '.', '_')
).rstrip().replace(' ', '_')

blob_name = f"{project_id}/{user_id}/{type.value}/{sanitized_filename}"
# Example: User-123_Project_1/User-123/mobile/homepage_screenshot.png
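Wrapping the naming rule in a function makes its effect on real filenames visible. screenshot_blob_name is a hypothetical helper that reproduces the expression above. Only the filename is sanitized, so hyphens are stripped from the filename but pass through unchanged in project_id and user_id.

```python
def screenshot_blob_name(project_id: str, user_id: str, device: str, filename: str) -> str:
    """Build the blob path the upload endpoint uses for a screenshot."""
    # Keep letters, digits, spaces, dots and underscores; everything else
    # (including '-' and '/') is dropped
    sanitized = "".join(c for c in filename if c.isalnum() or c in (" ", ".", "_"))
    # Trim trailing spaces, then turn the remaining spaces into underscores
    sanitized = sanitized.rstrip().replace(" ", "_")
    return f"{project_id}/{user_id}/{device}/{sanitized}"
```

For example, screenshot_blob_name("User-123_Project_1", "User-123", "mobile", "home page-v2.png") returns "User-123_Project_1/User-123/mobile/home_pagev2.png". Because "/" is removed, a filename such as "../x.png" cannot escape the device folder.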

Upload:

blob_client = container_client.get_blob_client(blob_name)
file.file.seek(0)
blob_client.upload_blob(file.file, overwrite=True)

Response:

{
  "message": "Screenshot uploaded successfully",
  "url": "https://qablobmachineagents.blob.core.windows.net/screenshots/User-123_Project_1/User-123/mobile/homepage_screenshot.png"
}

GET /screenshots

Purpose: Retrieve screenshot with device-based fallback

Request:

GET /screenshots?project_id=User-123_Project_1&user_id=User-123&device=mobile

Parameters:

  • project_id (Query) - Project identifier
  • user_id (Query) - User identifier
  • device (Query, Optional) - mobile | desktop

Device Detection Logic:

  1. From Query Param:

requested_type = device  # If provided

  2. From User-Agent Header:

if requested_type is None:
    user_agent = request.headers.get("User-Agent", "").lower()
    if "mobile" in user_agent:
        requested_type = ScreenshotType.mobile
    else:
        requested_type = ScreenshotType.desktop
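The detection rule can be isolated into a testable function. The ScreenshotType definition below is an assumption (a str-valued Enum); the document only shows its members and their .value. detect_device is a hypothetical helper name.

```python
from enum import Enum
from typing import Optional


class ScreenshotType(str, Enum):
    # Assumed definition: only ScreenshotType.mobile/.desktop and .value are shown
    mobile = "mobile"
    desktop = "desktop"


def detect_device(device: Optional[ScreenshotType], user_agent: Optional[str]) -> ScreenshotType:
    """Explicit query parameter first, then a case-insensitive User-Agent check."""
    if device is not None:
        return device
    ua = (user_agent or "").lower()
    return ScreenshotType.mobile if "mobile" in ua else ScreenshotType.desktop
```

The substring check is coarse: Android tablets usually omit the "Mobile" token from their User-Agent, so they fall into the desktop branch.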

Retrieval Flow:

  1. Search for Requested Type:

blob_name_requested = f"{project_id}/{user_id}/{requested_type.value}/"
blob_list = list(container_client.list_blobs(name_starts_with=blob_name_requested))

if blob_list:
    first_blob = blob_list[0]
    blob_url = f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net/{SCREENSHOTS_CONTAINER_NAME}/{first_blob.name}"
    return {"url": blob_url, "type": requested_type.value}

  2. Fallback to Opposite Type:

fallback_type = ScreenshotType.desktop if requested_type == ScreenshotType.mobile else ScreenshotType.mobile
blob_name_fallback = f"{project_id}/{user_id}/{fallback_type.value}/"
blob_list_fallback = list(container_client.list_blobs(name_starts_with=blob_name_fallback))

if blob_list_fallback:
    # Build the URL from the fallback blob, not the (missing) requested one
    blob_url = f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net/{SCREENSHOTS_CONTAINER_NAME}/{blob_list_fallback[0].name}"
    return {"url": blob_url, "type": fallback_type.value}

  3. No Screenshots Found:

raise HTTPException(status_code=404, detail="No screenshots found for this user and project.")
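The same retrieval order can be sketched over a plain list of blob names, which makes the fallback testable without Azure. pick_screenshot and its base_url parameter are hypothetical; the sorting step assumes Azure's documented behavior of listing blobs in name order, so "first blob" means the lexicographically smallest name under the prefix.

```python
from typing import Dict, List, Optional


def pick_screenshot(
    blob_names: List[str], project_id: str, user_id: str, requested: str, base_url: str
) -> Optional[Dict[str, str]]:
    """Return the URL of the first matching screenshot, trying the other device second."""
    fallback = "desktop" if requested == "mobile" else "mobile"
    for device in (requested, fallback):
        prefix = f"{project_id}/{user_id}/{device}/"
        # Sorting reproduces the name order in which Azure lists blobs
        matches = sorted(name for name in blob_names if name.startswith(prefix))
        if matches:
            return {"url": f"{base_url}/{matches[0]}", "type": device}
    return None  # the endpoint raises a 404 at this point
```

The trailing "/" in the prefix keeps user "U" from matching blobs that belong to user "U2".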

Response:

{
  "url": "https://qablobmachineagents.blob.core.windows.net/screenshots/User-123_Project_1/User-123/mobile/homepage.png",
  "type": "mobile"
}

Use Case: Frontend displays chatbot UI preview using uploaded screenshots


Security Analysis

🔴 CRITICAL: Hardcoded Azure Storage Credentials

Lines 985-986:

STORAGE_ACCOUNT_NAME = os.getenv("AZURE_STORAGE_ACCOUNT_NAME", "qablobmachineagents")
STORAGE_ACCOUNT_KEY = os.getenv("AZURE_STORAGE_ACCOUNT_KEY", "kRXPNm77...")  # full key redacted here; it is present in plaintext in the source

Risk: Full access to Azure Blob Storage account + all containers

Impact:

  • Upload malicious files
  • Delete all screenshots
  • Access other containers
  • High storage cost (upload spam)

Fix:

import os

STORAGE_ACCOUNT_KEY = os.getenv("AZURE_STORAGE_ACCOUNT_KEY")
if not STORAGE_ACCOUNT_KEY:
    raise RuntimeError("AZURE_STORAGE_ACCOUNT_KEY environment variable not set")
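The same fail-fast check can cover every secret the service needs, so a misconfigured deployment stops at startup instead of silently falling back. load_required_env is a hypothetical helper; which variables count as required (for example MONGO_URI, AZURE_STORAGE_ACCOUNT_NAME, AZURE_STORAGE_ACCOUNT_KEY) is an assumption to adjust per deployment.

```python
import os
from typing import Dict, Iterable


def load_required_env(names: Iterable[str]) -> Dict[str, str]:
    """Fail fast at startup if any required variable is unset or empty."""
    required = list(names)  # allow generators; the names are walked twice
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in required}
```

Typical use would be settings = load_required_env(["MONGO_URI", "AZURE_STORAGE_ACCOUNT_KEY"]) at module import time, so the error reports every missing variable at once.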

🟠 SECURITY: Overly Permissive CORS

Lines 29-35:

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

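Risk: Cross-origin requests are accepted from ANY domain. Because allow_credentials=True is also set, Starlette's CORSMiddleware reflects the caller's Origin instead of sending a literal "*", so credentialed requests from arbitrary sites are allowed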

Fix:

allow_origins=[
    "https://app.machineagents.ai",
    "https://admin.machineagents.ai"
]

🟡 CODE QUALITY: Same Collection for Different Trash Types

Lines 69, 75:

trash_chatbot_collection = db["trash_collection_name"]
trash_project_collection = db["trash_collection_name"]

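Issue: Both variables point to the same collection, and the restore and permanent-delete queries apply identical user_id/project_id filters through both handles, so the two queries match the same documents instead of separating chatbot entries from project entries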

Better Design:

trash_chatbot_collection = db["trash_chatbots"]
trash_project_collection = db["trash_projects"]

🟡 DATA INTEGRITY: Milvus/Blob Not Restored

Lines 154-265: Deletion flow hard-deletes Milvus + Blob data

Issue: Trash restore only recovers CosmosDB metadata - embeddings/files gone forever

User Impact:

  • Restored chatbot has no knowledge base
  • RAG system returns no results
  • User must re-upload all documents

Solution: Consider not hard-deleting Milvus/Blob immediately, or warn user prominently

🟢 GOOD PRACTICE: Shared Service Integration

Lines 23-24:

from database.milvus_embeddings_service import get_milvus_embeddings_service
from storage.azure_blob_service import get_azure_blob_service

Benefit: Centralized Milvus/Blob logic across all services

🟢 GOOD PRACTICE: Resilient Deletion

Lines 183-185, 209-210:

except Exception as milvus_error:
    logger.error(f"❌ Error deleting Milvus embeddings: {milvus_error}")
    # Continue with deletion even if Milvus fails

Benefit: Partial deletion better than complete failure


Integration Points

1. Response Chatbot Services Integration

Deleted Chatbot Check:

Response services (3D/Text/Voice) check isDeleted flag:

chatbot_config = chatbot_collection.find_one({
    "user_id": user_id,
    "project_id": project_id,
    "$or": [
        {"isDeleted": {"$exists": False}},
        {"isDeleted": False}
    ]
})

if not chatbot_config or chatbot_config.get("isDeleted") is True:
    return {"error": "This chatbot is no longer available."}

Soft Delete Advantage: Response services immediately stop serving deleted chatbots
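The $or filter accepts documents that never had the flag and documents with isDeleted explicitly false. Its matching rule can be mirrored in Python for unit-testing consumers against fixtures. matches_active_filter is a hypothetical helper, and it assumes MongoDB's type-strict equality, under which a null or 0 flag does not match false. Given this filter, the follow-up isDeleted is True check can never fire; it is a harmless safeguard.

```python
from typing import Any, Dict, Optional


def matches_active_filter(doc: Optional[Dict[str, Any]]) -> bool:
    """Mirror {"$or": [{"isDeleted": {"$exists": False}}, {"isDeleted": False}]}."""
    if doc is None:
        return False  # find_one found nothing
    # Field absent (legacy document) or exactly the boolean False
    return "isDeleted" not in doc or doc["isDeleted"] is False
```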

2. Data Crawling Service Integration

Blob Storage Cleanup:

When chatbot deleted, this service removes blobs crawled by Data Crawling Service:

blob_prefix = f"{user_id}/{project_id}/"
# Deletes all: PDFs, extracted text, images uploaded via crawling

3. Shared Service Dependencies

Milvus Embeddings Service:

milvus_service.delete_embeddings_by_user_project(collection_name, user_id, project_id)

Azure Blob Service:

azure_blob.list_blobs(prefix=blob_prefix)
azure_blob.delete_blob(blob_name)

4. Frontend Integration

Trash UI Flow:

  1. User clicks "Delete Chatbot" → Frontend calls DELETE /v2/delete-chatbot-selection
  2. Chatbot moves to trash (7-day window)
  3. User views trash → Frontend calls GET /v2/get-trashed-chatbot-sevendays
  4. User restores → Frontend calls GET /v2/get-trashed-chatbots
  5. User permanently deletes → Frontend calls DELETE /v2/delete-trashed-chatbots

Guardrails UI Flow:

  1. Admin sets defaults → POST /v2/guardrails/default
  2. User customizes → PUT /v2/guardrails/user/{user_id}/project/{project_id}
  3. Response services fetch → GET /v2/guardrails
  4. Response services apply restrictions based on categories

Screenshot UI Flow:

  1. User uploads chatbot UI preview → POST /screenshots/upload
  2. Marketing page displays preview → GET /screenshots
  3. Mobile users auto-detected via User-Agent

5. Guardrails Consumer Services

Response 3D Chatbot Service:

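resp = requests.get(f"{MAINTENANCE_SERVICE_URL}/v2/guardrails", params={"user_id": user_id, "project_id": project_id})
guardrails = resp.json() if resp.ok else {"categories": {}}  # 404: no guardrails configured
# Apply guardrails["categories"] to the LLM system prompt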

Integration Pattern: Same for Text/Voice chatbot services
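The exact way the Response services turn categories into prompt text is not shown in this document. One plausible approach, sketched with a hypothetical guardrails_to_prompt helper and made-up wording, renders the categories as a sorted bullet list that could be appended to a system prompt.

```python
from typing import Dict, Optional


def guardrails_to_prompt(categories: Optional[Dict[str, str]]) -> str:
    """Render guardrail categories as prompt text; empty string when there are none."""
    if not categories:
        return ""
    lines = ["Follow these content restrictions:"]
    for name in sorted(categories):  # stable order keeps prompts reproducible
        lines.append(f"- {name}: {categories[name]}")
    return "\n".join(lines)
```

Sorting the category names means the same guardrails always produce the same prompt text, which keeps prompts stable across requests.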


Summary

Service Statistics

  • Total Lines: 1,129
  • Total Endpoints: 12
  • Total Collections: 13 distinct MongoDB collections (11 soft-deleted on chatbot deletion) + 1 (Milvus) + 1 (Azure Blob)
  • Shared Services: 2 (Milvus, Azure Blob)
  • Security Issues: 2 (1 critical: hardcoded key; 1 high: permissive CORS)

Key Capabilities

  1. Tri-Storage Deletion - Coordinated Milvus + Blob + CosmosDB cleanup
  2. 7-Day Trash Recovery - Soft delete with restore window
  3. Document Content Updates - Re-embed and update knowledge base
  4. Hierarchical Guardrails - User-specific with system fallback
  5. Screenshot Management - Device-specific with auto-fallback

Critical Fixes Needed

  1. 🔴 Externalize Azure Storage credentials
  2. 🟠 Restrict CORS to known origins
  3. 🟡 Split trash collections (chatbots vs projects)
  4. 🟡 Warn users about Milvus/Blob permanent deletion
  5. 🟡 Sync Milvus when updating document content

Deployment Notes

Docker Compose (Port 8010):

chatbot-maintenance-service:
  build: ./chatbot-maintence-service
  container_name: chatbot-maintenance-service
  ports:
    - "8010:8010"
  volumes:
    - ./shared:/app/shared:ro
  environment:
    - MONGO_URI=...
    - AZURE_STORAGE_ACCOUNT_NAME=qablobmachineagents
    - AZURE_STORAGE_ACCOUNT_KEY=***
    - AZURE_SCREENSHOTS_CONTAINER_NAME=screenshots-dev

Shared Volume Required: /app/shared for Milvus/Blob services

