Database Operations¶
Section: 8-deployment-operations
Document: MongoDB & Milvus Operations Guide
Audience: Database Administrators, DevOps Engineers
Last Updated: 2025-12-30
🎯 Overview¶
A complete guide to MongoDB (Cosmos DB) and Milvus database operations, covering migrations, backups, restores, and routine maintenance.
💾 MongoDB Operations¶
Database Migrations¶
Migration Framework: Custom Python migration scripts
Location: migrations/ directory in backend services
Migration File Naming:
migrations/
├── 001_add_password_hash.py
├── 002_add_subscription_fields.py
├── 003_create_organisation_collection.py
└── ...
Migration Template:
"""
Migration: [Description]
Date: 2025-12-30
Author: DevOps Team
"""
def up(db):
"""Apply migration"""
# Forward migration logic
pass
def down(db):
"""Rollback migration"""
# Rollback logic
pass
def validate(db):
"""Validate migration applied correctly"""
# Validation logic
return True
Example Migration - Add password_hash field:
import bcrypt
from pymongo import MongoClient
def up(db):
    """Add password_hash field to all users"""
    users = db.users_multichatbot_v2

    # Find users without password_hash
    cursor = users.find({"password_hash": {"$exists": False}})

    count = 0
    for user in cursor:
        # Hash existing password
        if "password" in user:
            hashed = bcrypt.hashpw(
                user["password"].encode('utf-8'),
                bcrypt.gensalt(rounds=12)
            )
            users.update_one(
                {"_id": user["_id"]},
                {
                    "$set": {"password_hash": hashed.decode('utf-8')},
                    "$unset": {"password": ""}  # Remove plain text
                }
            )
            count += 1

    print(f"Updated {count} users with password_hash")

def down(db):
    """Remove password_hash field"""
    db.users_multichatbot_v2.update_many(
        {},
        {"$unset": {"password_hash": ""}}
    )

def validate(db):
    """Verify all users have password_hash"""
    total = db.users_multichatbot_v2.count_documents({})
    with_hash = db.users_multichatbot_v2.count_documents(
        {"password_hash": {"$exists": True}}
    )
    return total == with_hash
Run Migration:
# Run single migration
python run_migration.py --migration 001_add_password_hash
# Run all pending migrations
python run_migration.py --all
# Rollback last migration
python run_migration.py --rollback
# Dry run (no changes)
python run_migration.py --migration 001_add_password_hash --dry-run
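run_migration.py itself belongs to the custom framework and is not reproduced here. A minimal sketch of what such a runner might look like, assuming applied migrations are tracked in a schema_migrations collection and migrations/ is importable as a Python package (both are assumptions; --rollback handling is omitted):
# Sketch of a migration runner (illustrative, not the production run_migration.py).
# Assumes: MONGO_URI env var, an importable migrations/ package, and a
# schema_migrations tracking collection. All of these are assumptions.
import argparse
import importlib
import os
from pathlib import Path

from pymongo import MongoClient

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--migration", help="Run a single migration by name")
    parser.add_argument("--all", action="store_true", help="Run all pending migrations")
    parser.add_argument("--dry-run", action="store_true", help="Report what would run, apply nothing")
    args = parser.parse_args()

    db = MongoClient(os.environ["MONGO_URI"])["Machine_agent_prod"]
    applied = {m["name"] for m in db.schema_migrations.find({}, {"name": 1})}

    names = sorted(p.stem for p in Path("migrations").glob("[0-9]*.py"))
    if args.migration:
        names = [args.migration]
    elif args.all:
        names = [n for n in names if n not in applied]

    for name in names:
        module = importlib.import_module(f"migrations.{name}")
        if args.dry_run:
            print(f"[dry-run] would apply {name}")
            continue
        module.up(db)
        if not module.validate(db):
            raise RuntimeError(f"Validation failed for {name}")
        db.schema_migrations.insert_one({"name": name})
        print(f"Applied {name}")

if __name__ == "__main__":
    main()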
Backup Strategies¶
Automated Backups (Cosmos DB):
- Frequency: Continuous
- Retention: 30 days (point-in-time restore)
- Type: Automatic, no configuration needed
- Cost: Included in Cosmos DB pricing
Manual Backups:
# Export single collection
mongoexport --uri="$MONGO_URI" \
--db=Machine_agent_prod \
--collection=users_multichatbot_v2 \
--out=users_$(date +%Y%m%d_%H%M%S).json
# Export all collections with script
#!/bin/bash
BACKUP_DIR="backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p $BACKUP_DIR
collections=(
"users_multichatbot_v2"
"chatbot_selections"
"chatbot_history"
"files"
"files_secondary"
"system_prompts_user"
"projectid_creation"
"organisation_data"
"trash_collection_name"
)
for collection in "${collections[@]}"; do
echo "Backing up $collection..."
mongoexport --uri="$MONGO_URI" \
--db=Machine_agent_prod \
--collection=$collection \
--out=$BACKUP_DIR/${collection}.json
done
# Upload to Azure Blob
az storage blob upload-batch \
--destination backups \
--source $BACKUP_DIR \
--account-name qablobmachineagents
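If the MongoDB database tools (mongoexport) are not available on the backup host, a rough pymongo-based equivalent of the export loop can be used instead. This is a sketch, not the project's tooling; the shortened collection list and output layout simply mirror the script above:
# Sketch: export collections to JSON with pymongo + bson as a mongoexport fallback.
import os
from datetime import datetime
from pathlib import Path

from bson.json_util import dumps
from pymongo import MongoClient

db = MongoClient(os.environ["MONGO_URI"])["Machine_agent_prod"]
backup_dir = Path("backups") / datetime.now().strftime("%Y%m%d_%H%M%S")
backup_dir.mkdir(parents=True, exist_ok=True)

for name in ["users_multichatbot_v2", "chatbot_history", "files"]:  # extend as needed
    with open(backup_dir / f"{name}.json", "w") as fh:
        for doc in db[name].find():
            fh.write(dumps(doc) + "\n")  # one JSON document per line, like mongoexport
    print(f"Backed up {name}")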
Restore Procedures¶
Point-in-Time Restore (Last 30 days):
# Via Azure CLI
az cosmosdb mongodb database restore \
--account-name machineagents-cosmosdb-prod \
--resource-group machineagents-data-rg \
--database-name Machine_agent_prod \
--restore-timestamp "2025-12-30T10:00:00Z" \
--target-database-name Machine_agent_prod_restored
# Verify restored database
mongosh "$MONGO_URI/Machine_agent_prod_restored" \
--eval "db.getCollectionNames()"
# Switch applications to restored database
# Update Key Vault secret with new database name
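The Key Vault update can be scripted as well; a sketch using the azure-keyvault-secrets SDK, where the vault URL and secret name are placeholders rather than the actual values:
# Sketch: point the application at the restored database by updating the
# connection secret in Key Vault. Vault URL and secret name are placeholders.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(
    vault_url="https://<your-keyvault>.vault.azure.net",
    credential=DefaultAzureCredential(),
)
client.set_secret("mongo-database-name", "Machine_agent_prod_restored")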
Import from Manual Backup:
# Download backup from Azure Blob
az storage blob download-batch \
--destination ./restore \
--source backups/20251230_100000 \
--account-name qablobmachineagents
# Import collections
for file in restore/*.json; do
collection=$(basename $file .json)
echo "Importing $collection..."
mongoimport --uri="$MONGO_URI" \
--db=Machine_agent_prod \
--collection=$collection \
--file=$file \
--mode=upsert
done
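After the import, it is worth confirming that document counts line up with the backup files; a quick sketch, assuming the restore/ layout and database name used above:
# Sketch: compare line counts in the exported JSON files (one document per line)
# against document counts in the restored collections.
import os
from pathlib import Path

from pymongo import MongoClient

db = MongoClient(os.environ["MONGO_URI"])["Machine_agent_prod"]

for path in Path("restore").glob("*.json"):
    with path.open() as fh:
        expected = sum(1 for _ in fh)
    actual = db[path.stem].count_documents({})
    status = "OK" if actual >= expected else "MISSING DOCS"
    print(f"{path.stem}: backup={expected} restored={actual} [{status}]")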
Performance Tuning¶
Monitor RU/s Consumption:
# Check current throughput
az cosmosdb mongodb collection throughput show \
--account-name machineagents-cosmosdb-prod \
--database-name Machine_agent_prod \
--name chatbot_history \
--resource-group machineagents-data-rg
# Check if throttled
az monitor metrics list \
--resource /subscriptions/{sub}/resourceGroups/machineagents-data-rg/providers/Microsoft.DocumentDB/databaseAccounts/machineagents-cosmosdb-prod \
--metric "TotalRequestUnits" \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
| jq '.value[].timeseries[].data[] | select(.maximum > 4000)'
Optimize Queries:
// Before: Slow query (full collection scan / COLLSCAN)
db.chatbot_history.find({ user_id: "User-123" });
// After: Use index
db.chatbot_history.createIndex({ user_id: 1, created_at: -1 });
db.chatbot_history.find({ user_id: "User-123" }).sort({ created_at: -1 });
// Check query performance
db.chatbot_history.find({ user_id: "User-123" }).explain("executionStats");
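The same index can also be created from application code, for example inside a migration, rather than from the shell; a minimal pymongo sketch:
# Sketch: create the same compound index from pymongo, e.g. inside a migration's up().
from pymongo import ASCENDING, DESCENDING

def up(db):
    db.chatbot_history.create_index(
        [("user_id", ASCENDING), ("created_at", DESCENDING)]
    )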
🔍 Milvus Operations¶
Collection Management¶
Create Collection:
from pymilvus import Collection, FieldSchema, CollectionSchema, DataType

# Assumes a connection has already been established via connections.connect()
# (see Connection Pooling below).

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=2000),
    FieldSchema(name="user_id", dtype=DataType.VARCHAR, max_length=100),
    FieldSchema(name="project_id", dtype=DataType.VARCHAR, max_length=100),
]
schema = CollectionSchema(fields=fields, description="Chatbot vectors")

# Create collection
collection = Collection(name="chatbot_vectors_User123_Project1", schema=schema)

# Create IVF_FLAT index
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 128}
}
collection.create_index(field_name="embedding", index_params=index_params)
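A collection must be loaded into memory before it can be searched. A short usage sketch against the collection created above (the query vector is a placeholder):
# Sketch: load the collection and run a similarity search against the new index.
collection.load()

query_vector = [0.0] * 384  # placeholder; use a real 384-dim embedding
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=5,
    output_fields=["text", "user_id"],
)
for hit in results[0]:
    print(hit.id, hit.distance, hit.entity.get("text"))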
Drop Collection:
from pymilvus import utility
# Drop collection (PERMANENT!)
utility.drop_collection("chatbot_vectors_User123_Project1")
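Since the drop cannot be undone, a small existence check in front of it is cheap insurance; a sketch:
# Sketch: verify the collection exists before dropping it.
from pymilvus import utility

name = "chatbot_vectors_User123_Project1"
if utility.has_collection(name):
    utility.drop_collection(name)
    print(f"Dropped {name}")
else:
    print(f"{name} not found; nothing to drop")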
Backup & Restore¶
Backup Milvus Data:
# Stop writes (set read-only mode)
# Via application feature flag
# Create snapshot
docker exec milvus-standalone bash -c "
cd /var/lib/milvus &&
tar -czf /tmp/milvus_snapshot_$(date +%Y%m%d).tar.gz db/ wal/
"
# Copy to host
docker cp milvus-standalone:/tmp/milvus_snapshot_$(date +%Y%m%d).tar.gz ./
# Upload to Azure Blob
az storage blob upload \
--container-name milvus-backups \
--file milvus_snapshot_$(date +%Y%m%d).tar.gz \
--name $(date +%Y%m%d)/milvus_snapshot.tar.gz \
--account-name qablobmachineagents
Restore from Backup:
# Download backup
az storage blob download \
--container-name milvus-backups \
--name 20251230/milvus_snapshot.tar.gz \
--file milvus_restore.tar.gz \
--account-name qablobmachineagents
# Stop Milvus (keep the container so its volumes remain accessible)
docker stop milvus-standalone
# Clear existing data and unpack the snapshot from a temporary container that
# shares the Milvus volumes (docker exec cannot run inside a stopped container).
# Assumes /var/lib/milvus is volume- or bind-mounted, as in the standard
# standalone deployment.
docker run --rm --volumes-from milvus-standalone -v "$(pwd)":/backup alpine sh -c "
  rm -rf /var/lib/milvus/db /var/lib/milvus/wal &&
  tar -xzf /backup/milvus_restore.tar.gz -C /var/lib/milvus/
"
# Restart Milvus
docker start milvus-standalone
# Verify collections
python -c "
from pymilvus import connections, utility
connections.connect(host='localhost', port='19530')
print(utility.list_collections())
"
Performance Optimization¶
Index Tuning:
# IVF_FLAT (current) - Good for small-medium datasets
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 128}  # Number of clusters
}

# HNSW - Better for large datasets, faster search
index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {
        "M": 16,               # Number of bi-directional links
        "efConstruction": 200  # Build-time search scope
    }
}

# Search parameters (ef applies to HNSW; for IVF_FLAT use "nprobe" instead)
search_params = {
    "metric_type": "COSINE",
    "params": {"ef": 100}  # Search-time scope (higher = more accurate, slower)
}
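Moving an existing collection from IVF_FLAT to HNSW means dropping and rebuilding the vector index; a sketch of that sequence, using the example collection name from above:
# Sketch: rebuild the vector index on an existing collection with new parameters.
from pymilvus import Collection

collection = Collection("chatbot_vectors_User123_Project1")

collection.release()     # unload from memory before touching the index
collection.drop_index()  # remove the existing IVF_FLAT index

collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "params": {"M": 16, "efConstruction": 200},
    },
)
collection.load()        # reload so searches use the new index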
Connection Pooling:
from pymilvus import connections

# Configure connection pool
connections.connect(
    alias="default",
    host="milvus-prod.eastus.azurecontainer.io",
    port="19530",
    pool_size=10,  # Connection pool size
    timeout=30
)
🔗 Related Documentation¶
"Backups are only good if you test restores." 💾✅