
Infrastructure Architecture

Section: 3-product-architecture
Document: Azure Infrastructure & Cloud Resources
Audience: DevOps, Cloud Architects, Infrastructure Engineers
Last Updated: 2025-12-30


🎯 Overview

Complete documentation of the Azure cloud infrastructure that powers the MachineAvatars platform, covering compute, storage, networking, and security resources.

Infrastructure Highlights:

  • 23 containerized backend services (Azure Container Apps)
  • Multi-region document database (Cosmos DB) plus a Milvus vector database
  • Secure networking (VNet, NSG, Private Endpoints)
  • Centralized secrets management (Azure Key Vault)

🏢 Azure Resources Organization

Resource Groups

| Resource Group | Purpose | Region | Resources |
|---|---|---|---|
| machineagents-prod-rg | Production workloads | East US | Container Apps, Databases |
| machineagents-data-rg | Data layer | East US | Cosmos DB, Milvus, Blob Storage |
| machineagents-network-rg | Networking | East US | VNet, NSG, Load Balancer |
| machineagents-security-rg | Security | East US | Key Vault, Firewall |
| machineagents-monitoring-rg | Observability | East US | Log Analytics, App Insights |

Naming Convention:

{service}-{environment}-{region}-{resource-type}
Example: machineagents-prod-eastus-webapp

Tags:

  • Environment: prod, staging, dev
  • CostCenter: engineering, data, infrastructure
  • Owner: DevOps team
  • Project: MachineAvatars
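A small validator can keep new resources aligned with the naming convention and tag set above. The sketch below is hypothetical: the function names are illustrative, and the allowed-value sets are read off the lists above rather than taken from an existing tool.

```python
import re

# Tag keys every resource must carry, and allowed values where the list above fixes them.
REQUIRED_TAGS = {"Environment", "CostCenter", "Owner", "Project"}
ALLOWED_TAG_VALUES = {
    "Environment": {"prod", "staging", "dev"},
    "CostCenter": {"engineering", "data", "infrastructure"},
}

# One name segment: lowercase letters and digits, optionally joined by single hyphens.
SEGMENT = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def build_resource_name(service: str, environment: str, region: str, resource_type: str) -> str:
    """Return {service}-{environment}-{region}-{resource-type}, validating each segment."""
    parts = [service, environment, region, resource_type]
    for part in parts:
        if not SEGMENT.fullmatch(part):
            raise ValueError(f"invalid name segment: {part!r}")
    if environment not in ALLOWED_TAG_VALUES["Environment"]:
        raise ValueError(f"unknown environment: {environment!r}")
    return "-".join(parts)


def tag_problems(tags: dict) -> list:
    """Return a list of tag problems; an empty list means the tags are compliant."""
    problems = [f"missing tag: {key}" for key in sorted(REQUIRED_TAGS - set(tags))]
    for key, allowed in ALLOWED_TAG_VALUES.items():
        value = tags.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    return problems


if __name__ == "__main__":
    print(build_resource_name("machineagents", "prod", "eastus", "webapp"))
    print(tag_problems({"Environment": "prod", "CostCenter": "marketing", "Owner": "DevOps team"}))
```

Some resource types cannot follow the hyphenated pattern at all: storage account and container registry names allow only lowercase letters and digits, which is why names such as qablobmachineagents and machineagentsacr have no separators.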

💻 Compute Resources

Azure Container Apps

Environment: machineagents-prod-env

23 Backend Services:

| Service | Port | CPU (vCPU) | Memory | Replicas (min-max) | Auto-Scale |
|---|---|---|---|---|---|
| gateway-service | 8000 | 0.5 | 1GB | 2-5 | ✅ CPU>70% |
| auth-service | 8001 | 0.25 | 512MB | 2-4 | ✅ CPU>60% |
| user-service | 8002 | 0.25 | 512MB | 1-3 | ✅ CPU>60% |
| create-chatbot | 8003 | 0.5 | 1GB | 1-3 | ✅ Requests>100/min |
| selection-service | 8004 | 0.25 | 512MB | 1-3 | ✅ CPU>60% |
| data-crawling | 8005 | 1.0 | 2GB | 1-5 | ✅ CPU>80% |
| 3d-state | 8006 | 0.25 | 512MB | 2-4 | ✅ Requests>50/min |
| text-state | 8007 | 0.25 | 512MB | 1-3 | ✅ Requests>50/min |
| voice-state | 8008 | 0.25 | 512MB | 1-3 | ✅ Requests>50/min |
| system-prompts | 8009 | 0.25 | 512MB | 1-2 | ✅ CPU>60% |
| chatbot-maintenance | 8010 | 0.5 | 1GB | 1-3 | ✅ CPU>70% |
| response-3d | 8011 | 1.0 | 2GB | 2-10 | ✅ CPU>75% |
| response-text | 8012 | 0.5 | 1GB | 2-8 | ✅ CPU>75% |
| response-voice | 8013 | 1.0 | 2GB | 1-5 | ✅ CPU>75% |
| chat-history | 8014 | 0.25 | 512MB | 1-3 | ✅ CPU>60% |
| analytics | 8015 | 0.5 | 1GB | 1-3 | ✅ CPU>65% |
| llm-model-service | 8016 | 0.5 | 1GB | 2-6 | ✅ CPU>70% |
| embedding-service | 8017 | 1.0 | 2GB | 2-6 | ✅ CPU>80% |
| tts-service | 8018 | 0.5 | 1GB | 2-5 | ✅ CPU>70% |
| payment-service | 8019 | 0.25 | 512MB | 2-4 | ✅ CPU>60% |
| notification-service | 8020 | 0.25 | 512MB | 1-2 | ✅ CPU>60% |
| feature-service | 8021 | 0.25 | 512MB | 1-2 | ✅ CPU>60% |
| admin-service | 8022 | 0.25 | 512MB | 1-2 | ✅ CPU>60% |

Total Resources (summed from the table above):

  • Min Replicas: 32 (idle state)
  • Max Replicas: 94 (full scale)
  • Total CPU: 15.25-51.25 vCPU
  • Total Memory: 30.5-102.5 GB
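These totals are replica-weighted sums over the service table: at idle every service runs its minimum replica count, at full scale its maximum. A sketch that recomputes them from the table rows, so they can be rechecked whenever a row changes (the SERVICES dictionary is a hand copy of the table, not an existing config file):

```python
# (vCPU, memory in GB, min replicas, max replicas) per service, copied from the table above.
SERVICES = {
    "gateway-service": (0.5, 1.0, 2, 5),
    "auth-service": (0.25, 0.5, 2, 4),
    "user-service": (0.25, 0.5, 1, 3),
    "create-chatbot": (0.5, 1.0, 1, 3),
    "selection-service": (0.25, 0.5, 1, 3),
    "data-crawling": (1.0, 2.0, 1, 5),
    "3d-state": (0.25, 0.5, 2, 4),
    "text-state": (0.25, 0.5, 1, 3),
    "voice-state": (0.25, 0.5, 1, 3),
    "system-prompts": (0.25, 0.5, 1, 2),
    "chatbot-maintenance": (0.5, 1.0, 1, 3),
    "response-3d": (1.0, 2.0, 2, 10),
    "response-text": (0.5, 1.0, 2, 8),
    "response-voice": (1.0, 2.0, 1, 5),
    "chat-history": (0.25, 0.5, 1, 3),
    "analytics": (0.5, 1.0, 1, 3),
    "llm-model-service": (0.5, 1.0, 2, 6),
    "embedding-service": (1.0, 2.0, 2, 6),
    "tts-service": (0.5, 1.0, 2, 5),
    "payment-service": (0.25, 0.5, 2, 4),
    "notification-service": (0.25, 0.5, 1, 2),
    "feature-service": (0.25, 0.5, 1, 2),
    "admin-service": (0.25, 0.5, 1, 2),
}


def capacity_totals(services: dict) -> dict:
    """Sum replicas, plus per-replica CPU and memory weighted by replica count, at min and max scale."""
    rows = services.values()
    return {
        "min_replicas": sum(r[2] for r in rows),
        "max_replicas": sum(r[3] for r in rows),
        "min_vcpu": sum(r[0] * r[2] for r in rows),
        "max_vcpu": sum(r[0] * r[3] for r in rows),
        "min_memory_gb": sum(r[1] * r[2] for r in rows),
        "max_memory_gb": sum(r[1] * r[3] for r in rows),
    }


if __name__ == "__main__":
    for key, value in capacity_totals(SERVICES).items():
        print(f"{key}: {value}")
```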

Container Configuration:

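# Example: response-3d-service
# Fragment of the Azure Container Apps resource YAML (properties only;
# managed environment ID and ingress omitted)
properties:
  configuration:
    secrets:
      # Key Vault reference, resolved through the app's system-assigned managed identity
      - name: mongo-connection-string
        keyVaultUrl: https://machineagents-keyvault-prod.vault.azure.net/secrets/mongo-connection-string
        identity: system
  template:
    containers:
      - name: response-3d-prod
        image: machineagentsacr.azurecr.io/response-3d:v1.2.3
        resources:
          cpu: 1.0
          memory: 2Gi
        env:
          - name: MONGO_URI
            secretRef: mongo-connection-string
          - name: MILVUS_HOST
            value: milvus-prod.eastus.azurecontainer.io
        probes:
          - type: Liveness
            httpGet:
              path: /health
              port: 8011
            initialDelaySeconds: 30
            periodSeconds: 10
          - type: Readiness
            httpGet:
              path: /ready
              port: 8011
            initialDelaySeconds: 5
            periodSeconds: 5
    scale:
      minReplicas: 2
      maxReplicas: 10
      rules:
        - name: cpu-utilization
          custom:
            type: cpu
            metadata:
              type: Utilization
              value: "75"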
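Container Apps scale rules are evaluated by KEDA, and CPU-based scaling is ultimately carried out by the Kubernetes Horizontal Pod Autoscaler. Assuming the standard HPA rule (desired = ceil(current × observed / target), no change while the ratio stays within a 10% tolerance, result clamped to min/max), the 75% target for response-3d behaves roughly as modeled below. This is a reasoning aid, not Azure's implementation; `desired_replicas` is a hypothetical helper.

```python
import math


def desired_replicas(current: int, utilization: float, target: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Model of the HPA rule: ceil(current * utilization / target), clamped to [min, max].

    No change is made while utilization / target stays within +/- tolerance of 1.0.
    """
    ratio = utilization / target
    if abs(ratio - 1.0) <= tolerance:
        desired = current
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # response-3d: 75% CPU target, 2-10 replicas.
    for current, cpu in [(2, 78), (4, 90), (8, 95), (3, 20)]:
        print(current, cpu, "->", desired_replicas(current, cpu, 75, 2, 10))
```

Real autoscalers also delay scale-in (cooldown and stabilization windows), which this model ignores, so replica counts fall more slowly than they rise.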

Azure Container Registry (ACR)

Registry: machineagentsacr.azurecr.io

Features:

  • Geo-replication: East US (primary), Southeast Asia (replica)
  • Vulnerability Scanning: Microsoft Defender for Containers
  • Image Retention: 30 days for dev tags, unlimited for prod
  • Webhooks: Trigger deployments on image push

Image Naming:

{service}:{version}
Example: response-3d:v1.2.3, auth-service:v2.0.1

Tags:

  • latest - Latest stable
  • v{major}.{minor}.{patch} - Semantic versioning
  • dev-{commit-sha} - Development builds
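The tag scheme and the retention rule above (dev tags kept 30 days, prod tags kept indefinitely) can be expressed as a small classifier. A sketch under the assumption that commit SHAs are 7-40 lowercase hex characters; the function names are hypothetical:

```python
import re
from datetime import datetime, timedelta, timezone

SEMVER_TAG = re.compile(r"v\d+\.\d+\.\d+")   # v{major}.{minor}.{patch}
DEV_TAG = re.compile(r"dev-[0-9a-f]{7,40}")   # dev-{commit-sha}, short or full SHA-1
DEV_RETENTION = timedelta(days=30)


def classify_tag(tag: str) -> str:
    """Map an image tag onto the tagging scheme above."""
    if tag == "latest":
        return "latest"
    if SEMVER_TAG.fullmatch(tag):
        return "release"
    if DEV_TAG.fullmatch(tag):
        return "dev"
    return "unknown"


def should_purge(tag: str, pushed_at: datetime, now: datetime) -> bool:
    """Dev builds expire after 30 days; release, latest, and unrecognized tags are kept."""
    return classify_tag(tag) == "dev" and now - pushed_at > DEV_RETENTION


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for tag, age_days in [("v1.2.3", 400), ("latest", 1), ("dev-4f2a9c1", 45), ("dev-4f2a9c1", 3)]:
        print(tag, age_days, should_purge(tag, now - timedelta(days=age_days), now))
```

Inside ACR itself, tag-based cleanup is usually run with the `acr purge` task, for example `az acr run --cmd "acr purge --filter 'response-3d:^dev-.*' --ago 30d" /dev/null`, since the built-in retention policy only covers untagged manifests.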

💾 Database Infrastructure

Azure Cosmos DB (MongoDB API)

Account: machineagents-cosmosdb-prod

Configuration:

  • API: MongoDB (4.2 compatible)
  • Consistency: Session (default)
  • Multi-region: Primary (East US), Secondary (Southeast Asia)
  • Throughput: Autoscale, configured per collection (ranges in the table below)
  • Backup: Continuous (30-day point-in-time restore)
  • Encryption: Azure-managed keys (AES-256)

Database: Machine_agent_prod

Collections (9 primary):

| Collection | Throughput (autoscale) | Partition Key | TTL | Size |
|---|---|---|---|---|
| users_multichatbot_v2 | 1000-4000 RU/s | /user_id | None | ~10K docs |
| chatbot_selections | 800-3000 RU/s | /project_id | None | ~5K docs |
| chatbot_history | 2000-8000 RU/s | /project_id | 90 days | ~500K docs |
| files | 1000-4000 RU/s | /project_id | None | ~2K docs |
| files_secondary | 400-2000 RU/s | /project_id | None | ~1K docs |
| system_prompts_user | 400-2000 RU/s | /project_id | None | ~3K docs |
| projectid_creation | 800-3000 RU/s | /user_id | None | ~5K docs |
| organisation_data | 400-1000 RU/s | /organization_id | None | ~50 docs |
| trash_collection_name | 400-1000 RU/s | /user_id | 7 days | ~500 docs |
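On Cosmos DB's MongoDB API, a collection-wide TTL is expressed as a TTL index on the system `_ts` field (the document's last-modified timestamp). A sketch that derives those indexes from the TTL column, assuming pymongo for the actual call; the helper names are hypothetical:

```python
# Collection-wide TTLs (days) from the table above; collections not listed never expire.
TTL_DAYS = {"chatbot_history": 90, "trash_collection_name": 7}
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_index_specs(ttl_days: dict) -> list:
    """Return (collection, index keys, expireAfterSeconds) for each TTL-enabled collection."""
    return [
        (name, [("_ts", 1)], days * SECONDS_PER_DAY)
        for name, days in sorted(ttl_days.items())
    ]


def apply_ttl_indexes(db, specs: list) -> None:
    """Create each TTL index; `db` is a pymongo Database (anything supporting db[name].create_index)."""
    for name, keys, seconds in specs:
        db[name].create_index(keys, expireAfterSeconds=seconds)


# Usage (requires pymongo and network access to the account):
#   from pymongo import MongoClient
#   db = MongoClient(os.environ["MONGO_URI"])["Machine_agent_prod"]
#   apply_ttl_indexes(db, ttl_index_specs(TTL_DAYS))
```

Because expiry is measured from `_ts`, any update to a chat history document restarts its 90-day clock.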

Indexes:

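  • Default: All fields indexed (on MongoDB API 3.6 and later, Cosmos DB automatically indexes only _id, so this requires an explicit wildcard $** index)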
  • Custom composite indexes for common queries
  • Unique indexes: email (users), project_id (chatbot_selections)

Connection String:

  • Stored in Azure Key Vault
  • Referenced via Managed Identity

Milvus Vector Database

Deployment: Docker containers on Azure Container Instances (ACI)

Components:

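# docker-compose form; CPU and memory are expressed as Compose resource limits
services:
  milvus-standalone:
    container_name: machineagents-milvus-prod
    image: milvusdb/milvus:v2.3.4
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - /milvus/conf/milvus.yaml:/milvus/configs/milvus.yaml   # custom config (see Configuration)
      - /milvus/db:/var/lib/milvus
    ports:
      - "19530:19530"   # gRPC endpoint used by the SDKs
      - "9091:9091"     # health check and metrics
    depends_on:
      - etcd
      - minio
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: 16G

  etcd:
    image: quay.io/coreos/etcd:v3.5.5
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    volumes:
      - /etcd:/etcd
    deploy:
      resources:
        limits:
          cpus: "1"
          memory: 2G

  minio:
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    command: minio server /minio_data
    volumes:
      - /minio:/minio_data
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 4G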

Configuration:

# milvus.yaml
common:
  defaultPartitionName: _default
  indexSliceSize: 16

etcd:
  endpoints:
    - etcd:2379

minio:
  address: minio
  port: 9000
  bucketName: milvus-bucket

rootCoord:
  dmlChannelNum: 16

dataNode:
  dataSync:
    parallelism: 4

Collections Configuration:

  • Dynamic collections per project: chatbot_vectors_{project_id}
  • Vector dimensions: 1536 (OpenAI) or 384 (bge-small)
  • Index type: IVF_FLAT
  • Metric: COSINE similarity
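Per-project collection names must satisfy Milvus naming rules (letters, digits, and underscores only), so project IDs containing hyphens or dots need sanitizing. A sketch of the naming and index settings listed above; the nlist value of 1024 and the "embedding" field name are assumptions, not values from this deployment:

```python
import re

EMBEDDING_DIM = {"openai": 1536, "bge-small": 384}  # from the list above
MAX_NAME_LENGTH = 255  # Milvus's default maximum name length


def collection_name(project_id: str) -> str:
    """Build chatbot_vectors_{project_id}, replacing characters Milvus rejects with '_'."""
    name = "chatbot_vectors_" + re.sub(r"[^0-9A-Za-z_]", "_", project_id)
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"collection name is {len(name)} chars, limit is {MAX_NAME_LENGTH}")
    return name


def ivf_flat_cosine(nlist: int = 1024) -> dict:
    """Index parameters in the dict form passed to pymilvus Collection.create_index."""
    return {"index_type": "IVF_FLAT", "metric_type": "COSINE", "params": {"nlist": nlist}}


# Usage (requires pymilvus; "embedding" is a hypothetical vector field name):
#   from pymilvus import connections, Collection
#   connections.connect(host=os.environ["MILVUS_HOST"], port="19530")
#   Collection(collection_name(project_id)).create_index("embedding", ivf_flat_cosine())
```

Character substitution can map two distinct IDs (for example "a-b" and "a_b") to the same collection, so IDs that can collide would need a hash-based suffix instead.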

Persistence: MinIO object storage, backed by Azure Blob Storage (milvus-data container)


📦 Storage Infrastructure

Azure Blob Storage

Account: qablobmachineagents

Configuration:

  • Redundancy: LRS (Locally Redundant Storage)
  • Encryption: AES-256 (automatic)
  • Access Tiers: Hot (default, frequent access) for active data, Cool for archives
  • Lifecycle Management:
      ◦ Move audio files to Cool tier after 30 days
      ◦ Delete audio files after 90 days
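The two lifecycle steps map onto a single Azure Storage lifecycle management rule. A sketch that builds the policy document; the rule name is hypothetical, and it assumes audio blobs are block blobs in the audio-files container:

```python
import json


def audio_lifecycle_policy(cool_after_days: int = 30, delete_after_days: int = 90) -> dict:
    """Lifecycle management policy: tier audio-files blobs to Cool, then delete them."""
    if delete_after_days <= cool_after_days:
        raise ValueError("delete_after_days should be later than cool_after_days")  # sanity check
    return {
        "rules": [
            {
                "enabled": True,
                "name": "audio-files-cool-then-delete",
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": ["audio-files/"],  # prefix starts with the container name
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                            "delete": {"daysAfterModificationGreaterThan": delete_after_days},
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(audio_lifecycle_policy(), indent=2))
```

Saved as policy.json, the output can be applied with `az storage account management-policy create --account-name qablobmachineagents --resource-group machineagents-data-rg --policy @policy.json`.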

Containers:

| Container | Purpose | Size | Public Access |
|---|---|---|---|
| audio-files | TTS audio (WAV) | ~50GB | Private |
| documents | Uploaded PDFs, DOCX | ~10GB | Private |
| avatars | 3D models (GLB) | ~500MB | CDN |
| screenshots | Chatbot screenshots | ~2GB | Private |
| milvus-data | Milvus persistence | ~20GB | Private |

CDN Integration:

  • Azure CDN Standard
  • Endpoint: cdn.machineavatars.com
  • Cache avatars and static assets
  • TTL: 7 days
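One way to get the 7-day TTL is to set a Cache-Control header on avatar blobs at upload time, which Azure CDN typically honors by default; a CDN caching rule can achieve the same result. A sketch assuming the azure-storage-blob package; `upload_avatar` is a hypothetical helper:

```python
from datetime import timedelta

CDN_TTL = timedelta(days=7)


def cache_control(ttl: timedelta = CDN_TTL) -> str:
    """Cache-Control value that lets the CDN (and browsers) cache an asset for `ttl`."""
    return f"public, max-age={int(ttl.total_seconds())}"


def upload_avatar(container_client, blob_name: str, data: bytes) -> None:
    """Upload a GLB model with the 7-day cache header.

    `container_client` is an azure.storage.blob.ContainerClient for the avatars container.
    """
    from azure.storage.blob import ContentSettings  # imported lazily; needs azure-storage-blob

    container_client.upload_blob(
        blob_name,
        data,
        overwrite=True,
        content_settings=ContentSettings(
            content_type="model/gltf-binary", cache_control=cache_control()
        ),
    )
```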

🌐 Networking Infrastructure

Virtual Network (VNet)

VNet: machineagents-prod-vnet
Address Space: 10.0.0.0/16

Subnets:

| Subnet | CIDR | Purpose | NSG |
|---|---|---|---|
| container-apps | 10.0.1.0/24 | Backend services | container-apps-nsg |
| databases | 10.0.2.0/24 | Cosmos DB, Milvus | databases-nsg |
| storage | 10.0.3.0/24 | Blob Storage | storage-nsg |
| management | 10.0.4.0/24 | Bastion, VPN | management-nsg |
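The subnet plan can be checked mechanically: every subnet must sit inside the VNet address space and no two may overlap. A sketch using Python's standard ipaddress module (the helper names are hypothetical):

```python
import ipaddress
from itertools import combinations

VNET = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "container-apps": ipaddress.ip_network("10.0.1.0/24"),
    "databases": ipaddress.ip_network("10.0.2.0/24"),
    "storage": ipaddress.ip_network("10.0.3.0/24"),
    "management": ipaddress.ip_network("10.0.4.0/24"),
}


def subnet_problems(vnet, subnets: dict) -> list:
    """Report subnets outside the VNet and pairs of overlapping subnets."""
    problems = []
    for name, net in subnets.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} {net} is outside {vnet}")
    for (a, net_a), (b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} {net_a} overlaps {b} {net_b}")
    return problems


def usable_ips(net) -> int:
    """Azure reserves 5 addresses per subnet (network, gateway, 2 DNS, broadcast)."""
    return net.num_addresses - 5


if __name__ == "__main__":
    print(subnet_problems(VNET, SUBNETS) or "subnet plan OK")
    for name, net in SUBNETS.items():
        print(name, usable_ips(net))
```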

Network Security Groups (NSGs):

container-apps-nsg:

Inbound Rules:
- Allow HTTPS (443) from Internet (priority 100)
- Allow HTTP (80) from Internet (priority 110)
- Allow Internal (ports 8000-8022) from VNet (priority 200)
- Deny All (priority 4096)

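Outbound Rules:
- Allow HTTPS (443) to Internet (OpenAI, external APIs)
- Allow databases subnet (1433, 10255, 19530)
- Allow storage subnet (443)

databases-nsg:

Inbound Rules:
- Allow Cosmos DB for MongoDB (10255; Cosmos DB does not use the MongoDB default port 27017) from container-apps subnet
- Allow Milvus (19530) from container-apps subnet
- Deny All from Internet

Outbound Rules:
- No explicit rule needed for responses to the container-apps subnet: NSGs are stateful, so return traffic for an allowed inbound connection is permitted automatically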
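NSG rules are evaluated in ascending priority order and processing stops at the first rule that matches. A simplified model of the container-apps-nsg inbound rules listed above (source is reduced to a service-tag string, and the Rule type and `evaluate` helper are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int   # lower number = evaluated first
    action: str     # "Allow" or "Deny"
    source: str     # "Internet", "VirtualNetwork", or "*" for any
    ports: range    # destination ports covered by the rule


# Inbound rules of container-apps-nsg, as listed above.
CONTAINER_APPS_INBOUND = [
    Rule(100, "Allow", "Internet", range(443, 444)),
    Rule(110, "Allow", "Internet", range(80, 81)),
    Rule(200, "Allow", "VirtualNetwork", range(8000, 8023)),
    Rule(4096, "Deny", "*", range(0, 65536)),
]


def evaluate(rules: list, source: str, port: int) -> tuple:
    """Return (action, priority) of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source in ("*", source) and port in rule.ports:
            return rule.action, rule.priority
    # Simplification: real NSGs fall through to Azure's built-in default rules here.
    return "Deny", None


if __name__ == "__main__":
    for source, port in [("Internet", 443), ("Internet", 8011), ("VirtualNetwork", 8011)]:
        print(source, port, evaluate(CONTAINER_APPS_INBOUND, source, port))
```

Because the Deny All rule sits at 4096, internal ports 8000-8022 are reachable from the VNet but never directly from the Internet.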

Private Endpoints

Cosmos DB Private Endpoint:

  • Name: cosmosdb-private-endpoint
  • Subnet: databases
  • Private IP: 10.0.2.10
  • DNS: privatelink.mongo.cosmos.azure.com

Blob Storage Private Endpoint:

  • Name: blob-private-endpoint
  • Subnet: storage
  • Private IP: 10.0.3.10
  • DNS: privatelink.blob.core.windows.net

Key Vault Private Endpoint:

  • Name: keyvault-private-endpoint
  • Subnet: management
  • Private IP: 10.0.4.10
  • DNS: privatelink.vaultcore.azure.net
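From a host inside the VNet, each service's public hostname should resolve (via a CNAME into its privatelink zone) to the private endpoint IP listed above. A check sketch; the public hostnames are assumptions built from the resource names in this document plus the standard Azure DNS suffixes:

```python
import ipaddress
import socket

# Public hostname -> (expected private endpoint IP, subnet), from the lists above.
ENDPOINTS = {
    "machineagents-cosmosdb-prod.mongo.cosmos.azure.com": ("10.0.2.10", "10.0.2.0/24"),
    "qablobmachineagents.blob.core.windows.net": ("10.0.3.10", "10.0.3.0/24"),
    "machineagents-keyvault-prod.vault.azure.net": ("10.0.4.10", "10.0.4.0/24"),
}


def resolves_to_private_endpoint(host, expected_ip, subnet, resolve=socket.gethostbyname) -> bool:
    """True if `host` resolves to the expected private IP inside the expected subnet."""
    try:
        address = ipaddress.ip_address(resolve(host))
    except OSError:  # DNS failure (socket.gaierror is a subclass of OSError)
        return False
    return address == ipaddress.ip_address(expected_ip) and address in ipaddress.ip_network(subnet)


def check_all(endpoints=ENDPOINTS, resolve=socket.gethostbyname) -> dict:
    """Run the check for every endpoint; meaningful only from inside machineagents-prod-vnet."""
    return {
        host: resolves_to_private_endpoint(host, ip, subnet, resolve)
        for host, (ip, subnet) in endpoints.items()
    }


# Usage from a VM or container inside the VNet:
#   for host, ok in check_all().items():
#       print("OK  " if ok else "FAIL", host)
```

A FAIL usually means the privatelink DNS zone is not linked to the VNet, so the name still resolves to the service's public IP.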

Load Balancing

Azure Application Gateway (planned):

  • WAF enabled (Web Application Firewall)
  • SSL termination
  • URL-based routing
  • Health probes

Current: Container Apps built-in load balancing


🔐 Security Infrastructure

Azure Key Vault

Vault: machineagents-keyvault-prod

Secrets Organization:

| Secret Name | Type | Rotation |
|---|---|---|
| mongo-connection-string | Database | Manual |
| openai-api-key | API Key | 90 days |
| anthropic-api-key | API Key | 90 days |
| azure-openai-key | API Key | 90 days |
| razorpay-key | API Key | Manual |
| jwt-secret | Auth | 180 days |
| azure-storage-key | Storage | 90 days |

Access Policies:

  • Managed Identities: Container Apps use managed identities
  • Service Principals: CI/CD pipelines
  • RBAC: Role-based access for humans

Audit Logging: All secret access logged for 90 days
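Key Vault records when each secret was last updated but does not rotate these secrets by itself, so a periodic check against the Rotation column is one way to catch overdue keys. A sketch assuming the azure-identity and azure-keyvault-secrets packages and an identity allowed to list secret properties; the function names are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Rotation periods (days) from the table above; None means rotated manually.
ROTATION_DAYS = {
    "mongo-connection-string": None,
    "openai-api-key": 90,
    "anthropic-api-key": 90,
    "azure-openai-key": 90,
    "razorpay-key": None,
    "jwt-secret": 180,
    "azure-storage-key": 90,
}


def rotation_due(updated_on: datetime, days: Optional[int], now: datetime) -> bool:
    """True when a secret with a rotation period has not been updated within it."""
    return days is not None and now - updated_on >= timedelta(days=days)


def overdue_secrets(client, now: Optional[datetime] = None) -> list:
    """Names of secrets past their rotation period.

    `client` is an azure.keyvault.secrets.SecretClient, or anything whose
    list_properties_of_secrets() yields objects with .name and .updated_on.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        props.name
        for props in client.list_properties_of_secrets()
        if props.name in ROTATION_DAYS
        and props.updated_on is not None
        and rotation_due(props.updated_on, ROTATION_DAYS[props.name], now)
    )


def make_client(vault_name: str = "machineagents-keyvault-prod"):
    """Build a SecretClient; DefaultAzureCredential picks up the managed identity in Container Apps."""
    # Imported lazily so the helpers above work without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    return SecretClient(vault_url=f"https://{vault_name}.vault.azure.net",
                        credential=DefaultAzureCredential())


# Usage: print(overdue_secrets(make_client()))
```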


Azure Firewall (Planned Q2 2025)

Current: NSG-based security
Planned: Azure Firewall Premium

Rules:

  • Allow outbound HTTPS to approved LLM providers only
  • Block all other outbound except Azure services
  • DDoS protection integration

DDoS Protection

Plan: Azure DDoS Protection Standard

Coverage:

  • All public IP addresses
  • Application Gateway (when deployed)
  • Real-time attack metrics
  • Auto-mitigation

📊 Monitoring Infrastructure

Azure Monitor

Log Analytics Workspace: machineagents-logs-prod

Data Sources:

  • Container Apps logs
  • Azure resources (diagnostics)
  • Custom application logs

Retention: 90 days (standard), 1 year (compliance data)


Application Insights

Instance: machineagents-appinsights-prod

Instrumentation:

  • Python SDK in backend services
  • Automatic dependency tracking
  • Custom events and metrics

Features:

  • Live metrics stream
  • Application map
  • Performance profiling
  • Failure analysis

💰 Cost Management

Monthly Cost Breakdown (Estimated)

| Service | Cost/Month | Percentage |
|---|---|---|
| Azure Container Apps | $800 | 35% |
| Cosmos DB | $600 | 26% |
| Azure OpenAI API | $400 | 17% |
| Blob Storage + CDN | $200 | 9% |
| Milvus (Container Instance) | $150 | 7% |
| Networking (VNet, NSG) | $80 | 3% |
| Key Vault | $20 | 1% |
| Monitoring | $50 | 2% |
| Total | ~$2,300 | 100% |
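The percentages are each line's share of the total, rounded to whole percent. A sketch that recomputes them and runs a what-if for reservation discounts; the discount rates passed in are hypothetical inputs, with the 30-40% planning range from the strategies below as a guide:

```python
# Estimated monthly cost (USD) per service, from the table above.
MONTHLY_COST = {
    "Azure Container Apps": 800,
    "Cosmos DB": 600,
    "Azure OpenAI API": 400,
    "Blob Storage + CDN": 200,
    "Milvus (Container Instance)": 150,
    "Networking (VNet, NSG)": 80,
    "Key Vault": 20,
    "Monitoring": 50,
}


def breakdown(costs: dict) -> tuple:
    """Return (total, {service: share of total in whole percent})."""
    total = sum(costs.values())
    return total, {name: round(100 * cost / total) for name, cost in costs.items()}


def projected_total(costs: dict, discounts: dict) -> float:
    """Total after fractional discounts, e.g. {"Cosmos DB": 0.30} for a 30% reservation."""
    return round(sum(cost * (1 - discounts.get(name, 0.0)) for name, cost in costs.items()), 2)


if __name__ == "__main__":
    total, shares = breakdown(MONTHLY_COST)
    print(total, shares)
    # Hypothetical what-if: 30% off Container Apps and Cosmos DB.
    print(projected_total(MONTHLY_COST, {"Azure Container Apps": 0.30, "Cosmos DB": 0.30}))
```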

Cost Optimization Strategies:

  1. Auto-scaling: Scale down during low traffic (nights, weekends)
  2. Reserved Instances: Planned for stable services (save 30-40%)
  3. Blob Lifecycle: Move old data to Cool/Archive tiers
  4. Right-sizing: Monitor actual usage, adjust CPU/memory allocations



"Infrastructure as Code is Infrastructure as Documentation." ☁️✅