Infrastructure Architecture
Section: 3-product-architecture
Document: Azure Infrastructure & Cloud Resources
Audience: DevOps, Cloud Architects, Infrastructure Engineers
Last Updated: 2025-12-30
🎯 Overview
Complete documentation of the Azure cloud infrastructure powering the MachineAvatars platform, including compute, storage, networking, and security resources.
Infrastructure Highlights:
- 23 containerized backend services (Azure Container Apps)
- Databases: multi-region Cosmos DB plus a Milvus vector database
- Secure networking (VNet, NSG, Private Endpoints)
- Centralized secrets management (Azure Key Vault)
🏢 Azure Resources Organization
Resource Groups
| Resource Group | Purpose | Region | Resources |
|---|---|---|---|
| machineagents-prod-rg | Production workloads | East US | Container Apps, Databases |
| machineagents-data-rg | Data layer | East US | Cosmos DB, Milvus, Blob Storage |
| machineagents-network-rg | Networking | East US | VNet, NSG, Load Balancer |
| machineagents-security-rg | Security | East US | Key Vault, Firewall |
| machineagents-monitoring-rg | Observability | East US | Log Analytics, App Insights |
Naming Convention:
Tags:
- Environment: prod, staging, dev
- CostCenter: engineering, data, infrastructure
- Owner: DevOps team
- Project: MachineAvatars
💻 Compute Resources
Azure Container Apps
Environment: machineagents-prod-env
23 Backend Services:
| Service | Port | CPU | Memory | Replicas | Auto-Scale |
|---|---|---|---|---|---|
| gateway-service | 8000 | 0.5 | 1GB | 2-5 | ✅ CPU>70% |
| auth-service | 8001 | 0.25 | 512MB | 2-4 | ✅ CPU>60% |
| user-service | 8002 | 0.25 | 512MB | 1-3 | ✅ CPU>60% |
| create-chatbot | 8003 | 0.5 | 1GB | 1-3 | ✅ Requests>100/min |
| selection-service | 8004 | 0.25 | 512MB | 1-3 | ✅ CPU>60% |
| data-crawling | 8005 | 1.0 | 2GB | 1-5 | ✅ CPU>80% |
| 3d-state | 8006 | 0.25 | 512MB | 2-4 | ✅ Requests>50/min |
| text-state | 8007 | 0.25 | 512MB | 1-3 | ✅ Requests>50/min |
| voice-state | 8008 | 0.25 | 512MB | 1-3 | ✅ Requests>50/min |
| system-prompts | 8009 | 0.25 | 512MB | 1-2 | ✅ CPU>60% |
| chatbot-maintenance | 8010 | 0.5 | 1GB | 1-3 | ✅ CPU>70% |
| response-3d | 8011 | 1.0 | 2GB | 2-10 | ✅ CPU>75% |
| response-text | 8012 | 0.5 | 1GB | 2-8 | ✅ CPU>75% |
| response-voice | 8013 | 1.0 | 2GB | 1-5 | ✅ CPU>75% |
| chat-history | 8014 | 0.25 | 512MB | 1-3 | ✅ CPU>60% |
| analytics | 8015 | 0.5 | 1GB | 1-3 | ✅ CPU>65% |
| llm-model-service | 8016 | 0.5 | 1GB | 2-6 | ✅ CPU>70% |
| embedding-service | 8017 | 1.0 | 2GB | 2-6 | ✅ CPU>80% |
| tts-service | 8018 | 0.5 | 1GB | 2-5 | ✅ CPU>70% |
| payment-service | 8019 | 0.25 | 512MB | 2-4 | ✅ CPU>60% |
| notification-service | 8020 | 0.25 | 512MB | 1-2 | ✅ CPU>60% |
| feature-service | 8021 | 0.25 | 512MB | 1-2 | ✅ CPU>60% |
| admin-service | 8022 | 0.25 | 512MB | 1-2 | ✅ CPU>60% |
Total Resources:
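- Min Replicas: 32 (idle state, sum of the per-service minimums in the table)
- Max Replicas: 94 (full scale)
- Total CPU: 15.25-51.25 vCPU (per-replica CPU multiplied by min and max replicas)
- Total Memory: 30.5-102.5 GB (2 GB per vCPU, as in every table row)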
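Totals of this kind can be recomputed from the service table instead of being maintained by hand. The following is a minimal sketch: the `SERVICES` dictionary is a manual transcription of the table above (not an export from Azure), and the 2 GB per vCPU ratio is an assumption read off the table rows.

```python
# name: (vCPU per replica, min replicas, max replicas), transcribed from the table
SERVICES = {
    "gateway-service": (0.5, 2, 5),
    "auth-service": (0.25, 2, 4),
    "user-service": (0.25, 1, 3),
    "create-chatbot": (0.5, 1, 3),
    "selection-service": (0.25, 1, 3),
    "data-crawling": (1.0, 1, 5),
    "3d-state": (0.25, 2, 4),
    "text-state": (0.25, 1, 3),
    "voice-state": (0.25, 1, 3),
    "system-prompts": (0.25, 1, 2),
    "chatbot-maintenance": (0.5, 1, 3),
    "response-3d": (1.0, 2, 10),
    "response-text": (0.5, 2, 8),
    "response-voice": (1.0, 1, 5),
    "chat-history": (0.25, 1, 3),
    "analytics": (0.5, 1, 3),
    "llm-model-service": (0.5, 2, 6),
    "embedding-service": (1.0, 2, 6),
    "tts-service": (0.5, 2, 5),
    "payment-service": (0.25, 2, 4),
    "notification-service": (0.25, 1, 2),
    "feature-service": (0.25, 1, 2),
    "admin-service": (0.25, 1, 2),
}

GB_PER_VCPU = 2  # assumption: 0.25 vCPU pairs with 512MB, 0.5 with 1GB, 1.0 with 2GB


def totals(services):
    """Sum replicas, vCPU and memory at minimum and maximum scale."""
    min_cpu = sum(cpu * lo for cpu, lo, _ in services.values())
    max_cpu = sum(cpu * hi for cpu, _, hi in services.values())
    return {
        "min_replicas": sum(lo for _, lo, _ in services.values()),
        "max_replicas": sum(hi for _, _, hi in services.values()),
        "min_cpu": min_cpu,
        "max_cpu": max_cpu,
        "min_memory_gb": min_cpu * GB_PER_VCPU,
        "max_memory_gb": max_cpu * GB_PER_VCPU,
    }
```

Calling `totals(SERVICES)` after each change to the table keeps the summary figures and the per-service rows in agreement.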
Container Configuration:

    # Example: response-3d-service
    containerApp:
      name: response-3d-prod
      image: machineagentsacr.azurecr.io/response-3d:v1.2.3
      resources:
        cpu: 1.0
        memory: 2Gi
      replicas:
        min: 2
        max: 10
      scale:
        rules:
          - type: cpu
            metadata:
              type: Utilization
              value: "75"
      env:
        - name: MONGO_URI
          secretRef: mongo-connection-string
        - name: MILVUS_HOST
          value: milvus-prod.eastus.azurecontainer.io
      probes:
        liveness:
          httpGet:
            path: /health
            port: 8011
          initialDelaySeconds: 30
          periodSeconds: 10
        readiness:
          httpGet:
            path: /ready
            port: 8011
          initialDelaySeconds: 5
          periodSeconds: 5
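To make the CPU scale rule in the example concrete, the sketch below approximates how a utilization target of 75% translates into a replica count. It assumes the Horizontal Pod Autoscaler formula that KEDA-based CPU scaling follows, and it deliberately ignores tolerance bands, cooldown periods and stabilization windows, so treat it as an illustration rather than the platform's exact behavior. The function name and defaults are hypothetical.

```python
import math


def desired_replicas(current, cpu_utilization, target=75,
                     min_replicas=2, max_replicas=10):
    """Estimate the replica count for an average CPU utilization (percent).

    Uses desired = ceil(current * utilization / target), then clamps the
    result to the configured min/max replica bounds.
    """
    current = max(current, 1)  # avoid multiplying by zero in this sketch
    wanted = math.ceil(current * cpu_utilization / target)
    return max(min_replicas, min(max_replicas, wanted))
```

For example, two replicas averaging 150% of their requested CPU would be scaled to `desired_replicas(2, 150)`, which is 4, while a quiet period never drops below the configured minimum of 2.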
Azure Container Registry (ACR)
Registry: machineagentsacr.azurecr.io
Features:
- Geo-replication: East US (primary), Southeast Asia (replica)
- Vulnerability Scanning: Microsoft Defender for Containers
- Image Retention: 30 days for dev tags, unlimited for prod
- Webhooks: Trigger deployments on image push
Image Naming:
Tags:
- latest: Latest stable
- v{major}.{minor}.{patch}: Semantic versioning
- dev-{commit-sha}: Development builds
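The tag scheme and the retention rule (30 days for dev tags, unlimited otherwise) can be expressed as a small classifier, for instance in a cleanup job. This is a sketch under assumptions: the commit SHA is taken to be 7 to 40 lowercase hex characters, and `classify_tag` and `retention_days` are hypothetical helper names, not part of any ACR tooling.

```python
import re

SEMVER_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")
DEV_TAG = re.compile(r"^dev-[0-9a-f]{7,40}$")  # assumed: short or full Git SHA


def classify_tag(tag):
    """Return 'latest', 'release', 'dev' or 'unknown' for an image tag."""
    if tag == "latest":
        return "latest"
    if SEMVER_TAG.match(tag):
        return "release"
    if DEV_TAG.match(tag):
        return "dev"
    return "unknown"


def retention_days(tag):
    """Return 30 for dev tags and None (keep indefinitely) for other known tags."""
    kind = classify_tag(tag)
    if kind == "unknown":
        raise ValueError(f"tag does not follow the naming scheme: {tag!r}")
    return 30 if kind == "dev" else None
```

Rejecting unknown tags, rather than silently keeping them, makes naming drift visible early.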
💾 Database Infrastructure
Azure Cosmos DB (MongoDB API)
Account: machineagents-cosmosdb-prod
Configuration:
- API: MongoDB (4.2 compatible)
- Consistency: Session (default)
- Multi-region: Primary (East US), Secondary (Southeast Asia)
- Throughput: Autoscale, configured per collection (from 400 RU/s up to 8000 RU/s; see the collections table)
- Backup: Continuous (30-day point-in-time restore)
- Encryption: Azure-managed keys (AES-256)
Database: Machine_agent_prod
Collections (9 primary):
| Collection | Throughput | Partition Key | TTL | Size |
|---|---|---|---|---|
| users_multichatbot_v2 | 1000-4000 RU/s | /user_id | None | ~10K docs |
| chatbot_selections | 800-3000 RU/s | /project_id | None | ~5K docs |
| chatbot_history | 2000-8000 RU/s | /project_id | 90 days | ~500K docs |
| files | 1000-4000 RU/s | /project_id | None | ~2K docs |
| files_secondary | 400-2000 RU/s | /project_id | None | ~1K docs |
| system_prompts_user | 400-2000 RU/s | /project_id | None | ~3K docs |
| projectid_creation | 800-3000 RU/s | /user_id | None | ~5K docs |
| organisation_data | 400-1000 RU/s | /organization_id | None | ~50 docs |
| trash_collection_name | 400-1000 RU/s | /user_id | 7 days | ~500 docs |
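The TTL column translates into index settings expressed in seconds. In Azure Cosmos DB for MongoDB, collection-level expiry is configured with a TTL index on the system field `_ts`. The sketch below builds the corresponding `createIndexes` command document; the index name `ttl_ts` and the helper name are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# TTL values from the table, in days; collections with TTL "None" are omitted.
TTL_DAYS = {
    "chatbot_history": 90,
    "trash_collection_name": 7,
}


def ttl_index_command(collection, days):
    """Build a createIndexes command that expires documents after `days` days."""
    return {
        "createIndexes": collection,
        "indexes": [
            {
                "key": {"_ts": 1},       # Cosmos DB's last-modified timestamp field
                "name": "ttl_ts",        # hypothetical index name
                "expireAfterSeconds": days * SECONDS_PER_DAY,
            }
        ],
    }
```

A document like this could be passed to a driver's generic command call (for example pymongo's `Database.command`), which keeps the TTL values in one reviewable place.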
Indexes:
- Default: All fields indexed
- Custom composite indexes for common queries
- Unique indexes: email (users), project_id (chatbot_selections)
Connection String:
- Stored in Azure Key Vault
- Referenced via Managed Identity
Milvus Vector Database
Deployment: Docker containers on Azure Container Instances
Components:

    services:
      milvus-standalone:
        image: milvusdb/milvus:v2.3.4
        container: machineagents-milvus-prod
        cpu: 4 vCPU
        memory: 16 GB
        volumes:
          - /milvus/conf:/milvus/conf
          - /milvus/db:/var/lib/milvus
        ports:
          - "19530:19530"   # gRPC / SDK port
          - "9091:9091"     # metrics and health port
      etcd:
        image: quay.io/coreos/etcd:v3.5.5
        cpu: 1 vCPU
        memory: 2 GB
        volumes:
          - /etcd:/etcd
      minio:
        image: minio/minio:RELEASE.2023-03-20T20-16-18Z
        cpu: 2 vCPU
        memory: 4 GB
        volumes:
          - /minio:/minio_data
Configuration:

    # milvus.yaml
    common:
      defaultPartitionName: _default
      indexSliceSize: 16
    etcd:
      endpoints:
        - etcd:2379
    minio:
      address: minio
      port: 9000
      bucketName: milvus-bucket
    rootCoord:
      dmlChannelNum: 16
    dataNode:
      dataSync:
        parallelism: 4
Collections Configuration:
- Dynamic collections per project: chatbot_vectors_{project_id}
- Vector dimensions: 1536 (OpenAI) or 384 (bge-small)
- Index type: IVF_FLAT
- Metric: COSINE similarity
Persistence: Azure Blob Storage (minio backend)
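The per-project collection settings above can be captured in a few helpers. This is a sketch under assumptions: Milvus collection names may contain only letters, digits and underscores, so the helper replaces any other character (for example hyphens, if project IDs are UUIDs); the dictionary keys and the `nlist` value of 128 are illustrative and do not come from this document.

```python
import re

# Embedding sizes from the list above; the keys are illustrative labels.
EMBEDDING_DIMS = {"openai": 1536, "bge-small": 384}

_INVALID_CHARS = re.compile(r"[^0-9A-Za-z_]")


def collection_name(project_id):
    """Return chatbot_vectors_{project_id} with unsupported characters as '_'."""
    return "chatbot_vectors_" + _INVALID_CHARS.sub("_", str(project_id))


def index_params(nlist=128):
    """Index settings for IVF_FLAT with cosine similarity.

    nlist (the number of IVF clusters) is an assumed value for illustration.
    """
    return {
        "index_type": "IVF_FLAT",
        "metric_type": "COSINE",
        "params": {"nlist": nlist},
    }
```

A dictionary shaped like the one returned by `index_params()` is what a Milvus client would typically receive when an index is created on the vector field.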
📦 Storage Infrastructure
Azure Blob Storage
Account: qablobmachineagents
Configuration:
- Access Tier: Hot by default (frequent access), Cool for archives
- Redundancy: LRS (Locally Redundant Storage)
- Encryption: AES-256 (automatic)
- Lifecycle Management:
  - Move audio files to Cool tier after 30 days
  - Delete audio files after 90 days
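The two lifecycle rules for audio files amount to a simple age-based decision. The sketch below mirrors them in code, assuming age is measured from the blob's last modification time and that the thresholds are strict ("more than N days"), as lifecycle policies are usually phrased; the function name and return labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

COOL_AFTER = timedelta(days=30)
DELETE_AFTER = timedelta(days=90)


def lifecycle_action(last_modified, now=None):
    """Return the action the audio-file rules would take for a blob."""
    now = now or datetime.now(timezone.utc)
    age = now - last_modified
    if age > DELETE_AFTER:
        return "delete"
    if age > COOL_AFTER:
        return "tier-to-cool"
    return "keep-hot"
```

Checking the delete threshold first matters: a blob older than 90 days also satisfies the 30-day rule, and deletion should win.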
Containers:
| Container | Purpose | Size | Public Access |
|---|---|---|---|
| audio-files | TTS audio (WAV) | ~50GB | Private |
| documents | Uploaded PDFs, DOCX | ~10GB | Private |
| avatars | 3D models (GLB) | ~500MB | CDN |
| screenshots | Chatbot screenshots | ~2GB | Private |
| milvus-data | Milvus persistence | ~20GB | Private |
CDN Integration:
- Azure CDN Standard
- Endpoint: cdn.machineavatars.com
- Cache avatars and static assets
- TTL: 7 days
🌐 Networking Infrastructure
Virtual Network (VNet)
VNet: machineagents-prod-vnet
Address Space: 10.0.0.0/16
Subnets:
| Subnet | CIDR | Purpose | NSG |
|---|---|---|---|
| container-apps | 10.0.1.0/24 | Backend services | container-apps-nsg |
| databases | 10.0.2.0/24 | Cosmos DB, Milvus | databases-nsg |
| storage | 10.0.3.0/24 | Blob Storage | storage-nsg |
| management | 10.0.4.0/24 | Bastion, VPN | management-nsg |
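An address plan like this is easy to sanity-check with the standard `ipaddress` module. The sketch below verifies that each subnet sits inside the VNet address space and that no two subnets overlap; the helper names are hypothetical, and the usable-address count assumes Azure's reservation of five addresses per subnet.

```python
import ipaddress
from itertools import combinations

VNET = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "container-apps": ipaddress.ip_network("10.0.1.0/24"),
    "databases": ipaddress.ip_network("10.0.2.0/24"),
    "storage": ipaddress.ip_network("10.0.3.0/24"),
    "management": ipaddress.ip_network("10.0.4.0/24"),
}


def validate(vnet, subnets):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    for name, net in subnets.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} ({net}) is outside {vnet}")
    for (a, net_a), (b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} overlaps {b}")
    return problems


def usable_addresses(net):
    """Addresses left after the five that Azure reserves in every subnet."""
    return net.num_addresses - 5
```

Running `validate(VNET, SUBNETS)` in CI before applying network changes catches overlapping ranges before they reach a deployment.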
Network Security Groups (NSGs):
container-apps-nsg:
Inbound Rules:
- Allow HTTPS (443) from Internet (priority 100)
- Allow HTTP (80) from Internet (priority 110)
- Allow Internal (ports 8000-8022) from VNet (priority 200)
- Deny All (priority 4096)
Outbound Rules:
- Allow HTTPS (443) to Internet (OpenAI, external APIs)
- Allow databases subnet (1433, 27017, 19530)
- Allow storage subnet (443)
databases-nsg:
Inbound Rules:
- Allow MongoDB (27017) from container-apps subnet
- Allow Milvus (19530) from container-apps subnet
- Deny All from Internet
Outbound Rules:
- Allow responses to container-apps subnet
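NSG rules are evaluated in priority order, lowest number first, and the first matching rule decides the outcome. The sketch below models the inbound rules of container-apps-nsg under simplifying assumptions: "Internet" is treated as any source outside the VNet, "VNet" as any source inside 10.0.0.0/16, and Azure's built-in default rules are left out. The `Rule` type and helper names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

VNET = ipaddress.ip_network("10.0.0.0/16")


@dataclass(frozen=True)
class Rule:
    priority: int
    action: str    # "Allow" or "Deny"
    source: str    # simplified tags: "Internet", "VNet" or "Any"
    ports: range


INBOUND = [
    Rule(100, "Allow", "Internet", range(443, 444)),
    Rule(110, "Allow", "Internet", range(80, 81)),
    Rule(200, "Allow", "VNet", range(8000, 8023)),   # ports 8000-8022
    Rule(4096, "Deny", "Any", range(0, 65536)),
]


def source_matches(tag, ip):
    """Match a source IP against a simplified service tag."""
    inside = ipaddress.ip_address(ip) in VNET
    if tag == "Any":
        return True
    if tag == "VNet":
        return inside
    return not inside  # "Internet": anything outside the VNet (simplification)


def evaluate(rules, src_ip, port):
    """Return the action of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if port in rule.ports and source_matches(rule.source, src_ip):
            return rule.action
    return "Deny"
```

With these rules, `evaluate(INBOUND, "203.0.113.7", 8005)` yields "Deny" while the same port from `10.0.1.15` yields "Allow", which matches the intent that service ports are reachable only from inside the VNet.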
Private Endpoints
Cosmos DB Private Endpoint:
- Name: cosmosdb-private-endpoint
- Subnet: databases
- Private IP: 10.0.2.10
- DNS: privatelink.mongo.cosmos.azure.com
Blob Storage Private Endpoint:
- Name: blob-private-endpoint
- Subnet: storage
- Private IP: 10.0.3.10
- DNS: privatelink.blob.core.windows.net
Key Vault Private Endpoint:
- Name: keyvault-private-endpoint
- Subnet: management
- Private IP: 10.0.4.10
- DNS: privatelink.vaultcore.azure.net
Load Balancing
Azure Application Gateway (planned):
- WAF enabled (Web Application Firewall)
- SSL termination
- URL-based routing
- Health probes
Current: Container Apps built-in load balancing
🔐 Security Infrastructure
Azure Key Vault
Vault: machineagents-keyvault-prod
Secrets Organization:
| Secret Name | Type | Rotation |
|---|---|---|
| mongo-connection-string | Database | Manual |
| openai-api-key | API Key | 90 days |
| anthropic-api-key | API Key | 90 days |
| azure-openai-key | API Key | 90 days |
| razorpay-key | API Key | Manual |
| jwt-secret | Auth | 180 days |
| azure-storage-key | Storage | 90 days |
Access Policies:
- Managed Identities: Container Apps use managed identities
- Service Principals: CI/CD pipelines
- RBAC: Role-based access for humans
Audit Logging: All secret access logged for 90 days
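The rotation column in the secrets table can drive a simple reminder check. The sketch below is one way to flag secrets whose interval has elapsed; it assumes the last rotation date is known from elsewhere (for example secret metadata), and `rotation_due` is a hypothetical helper, not a Key Vault feature.

```python
from datetime import date, timedelta

# Rotation intervals in days from the table; None means manual rotation.
ROTATION_DAYS = {
    "mongo-connection-string": None,
    "openai-api-key": 90,
    "anthropic-api-key": 90,
    "azure-openai-key": 90,
    "razorpay-key": None,
    "jwt-secret": 180,
    "azure-storage-key": 90,
}


def rotation_due(name, last_rotated, today):
    """Return True when a secret's rotation interval has elapsed."""
    interval = ROTATION_DAYS[name]
    if interval is None:
        return False  # manually rotated secrets are never flagged automatically
    return today >= last_rotated + timedelta(days=interval)
```

Manually rotated secrets are skipped here by design; a separate review process would be needed to keep them from aging indefinitely.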
Azure Firewall (Planned Q2 2025)
Current: NSG-based security
Planned: Azure Firewall Premium
Rules:
- Allow outbound HTTPS to approved LLM providers only
- Block all other outbound except Azure services
- DDoS protection integration
DDoS Protection
Plan: Azure DDoS Protection Standard
Coverage:
- All public IP addresses
- Application Gateway (when deployed)
- Real-time attack metrics
- Auto-mitigation
📊 Monitoring Infrastructure
Azure Monitor
Log Analytics Workspace: machineagents-logs-prod
Data Sources:
- Container Apps logs
- Azure resources (diagnostics)
- Custom application logs
Retention: 90 days (standard), 1 year (compliance data)
Application Insights
Instance: machineagents-appinsights-prod
Instrumentation:
- Python SDK in backend services
- Automatic dependency tracking
- Custom events and metrics
Features:
- Live metrics stream
- Application map
- Performance profiling
- Failure analysis
💰 Cost Management
Monthly Cost Breakdown (Estimated)
| Service | Cost/Month | Percentage |
|---|---|---|
| Azure Container Apps | $800 | 35% |
| Cosmos DB | $600 | 26% |
| Azure OpenAI API | $400 | 17% |
| Blob Storage + CDN | $200 | 9% |
| Milvus (Container Instance) | $150 | 7% |
| Networking (VNet, NSG) | $80 | 3% |
| Key Vault | $20 | 1% |
| Monitoring | $50 | 2% |
| Total | ~$2,300 | 100% |
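The percentage column follows directly from the monthly figures, so it can be regenerated whenever an estimate changes. A minimal sketch, with the `COSTS` dictionary transcribed by hand from the table and `shares` as a hypothetical helper:

```python
# Estimated monthly cost in USD, transcribed from the table above
COSTS = {
    "Azure Container Apps": 800,
    "Cosmos DB": 600,
    "Azure OpenAI API": 400,
    "Blob Storage + CDN": 200,
    "Milvus (Container Instance)": 150,
    "Networking (VNet, NSG)": 80,
    "Key Vault": 20,
    "Monitoring": 50,
}


def shares(costs):
    """Return the total and each service's share as a whole-number percentage."""
    total = sum(costs.values())
    percentages = {name: round(100 * cost / total) for name, cost in costs.items()}
    return total, percentages
```

Because each share is rounded independently, the percentages are not guaranteed to sum to exactly 100 for other inputs, so the total is best reported from the raw figures.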
Cost Optimization Strategies:
- Auto-scaling: Scale down during low traffic (nights, weekends)
- Reserved Instances: Planned for stable services (save 30-40%)
- Blob Lifecycle: Move old data to Cool/Archive tiers
- Right-sizing: Monitor actual usage, adjust CPU/memory allocations
🔗 Related Documentation
- System Architecture Index
"Infrastructure as Code is Infrastructure as Documentation." ☁️✅