Incident Response

Section: 5-security-architecture
Document: Security Incident Response
Status: Incident Response Plan
Audience: Security teams, DevOps, management

🎯 Overview

MachineAvatars maintains a comprehensive incident response plan to quickly detect, contain, and recover from security incidents while minimizing impact to customers and business operations.

Response Team: Security incident response team (SIRT)
Availability: 24/7 for CRITICAL incidents
Escalation: Defined path for each severity level (P0 to P3)

🚨 Incident Classification

Severity Levels

Level	Description	Examples	Response Time	Escalation
P0 - CRITICAL	Data breach, system compromise	Database breach, ransomware	Immediate	CTO, CEO
P1 - HIGH	Major security violation	API key compromise, DDoS attack	1 hour	Security Lead
P2 - MEDIUM	Minor security issue	Failed login spike, suspicious activity	4 hours	Security Team
P3 - LOW	Non-urgent security concern	Vulnerability disclosure, patch available	24 hours	Security Team
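The matrix above can be encoded so that tooling computes a response deadline and an escalation target as soon as an incident is classified. The following is a minimal sketch that treats "Immediate" as a zero-minute deadline. The `SEVERITY_MATRIX` name and the `respond_by` helper are illustrative and are not part of any existing MachineAvatars tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SeverityPolicy:
    label: str
    response_time: timedelta          # time allowed before response must begin
    escalate_to: tuple[str, ...]      # who gets escalated to


# Values copied from the Severity Levels table; "Immediate" is modelled as zero.
SEVERITY_MATRIX = {
    "P0": SeverityPolicy("CRITICAL", timedelta(0), ("CTO", "CEO")),
    "P1": SeverityPolicy("HIGH", timedelta(hours=1), ("Security Lead",)),
    "P2": SeverityPolicy("MEDIUM", timedelta(hours=4), ("Security Team",)),
    "P3": SeverityPolicy("LOW", timedelta(hours=24), ("Security Team",)),
}


def respond_by(severity: str, detected_at: datetime) -> datetime:
    """Return the latest time by which response must start for this severity."""
    return detected_at + SEVERITY_MATRIX[severity].response_time


if __name__ == "__main__":
    detected = datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc)
    print(respond_by("P1", detected).isoformat())  # 2025-01-15T11:30:00+00:00
    print(SEVERITY_MATRIX["P0"].escalate_to)       # ('CTO', 'CEO')
```

Keeping the table and the code in sync is the main maintenance cost. If the matrix changes, this mapping has to change with it.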

📋 6-Step Incident Response Process

graph LR
    A[1. Detection] --> B[2. Triage]
    B --> C[3. Containment]
    C --> D[4. Eradication]
    D --> E[5. Recovery]
    E --> F[6. Post-Incident]

    style A fill:#FFE082
    style C fill:#FFCDD2
    style E fill:#C8E6C9

Step 1: Detection

Detection Sources:

Azure Security Center alerts
Monitoring system alerts
User reports
Third-party disclosures
Automated vulnerability scans

Immediate Actions:

Document incident details
Assign incident ID
Notify security team (Slack #security-incidents)
Start incident log

Incident Template:

# Incident: INC-2025-001

**Detected:** 2025-01-15 10:30 UTC
**Severity:** P0 - CRITICAL
**Type:** Data Breach
**Affected Systems:** MongoDB production database
**Reported By:** Azure Security Center
**Status:** Triaging

## Initial Assessment:

- Suspicious database queries detected
- Unknown IP address accessing user collection
- ~1000 user records queried in 5 minutes

## Actions Taken:

1. [ ] Security team notified
2. [ ] CTO escalated
3. [ ] Logs collected
4. [ ] Containment initiated
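Assigning IDs and opening the log can be automated so that the template above is filled in consistently. The following is a minimal sketch that assumes IDs follow the `INC-YYYY-NNN` pattern seen in the template and use a per-year counter. `Incident`, `IncidentLog`, `open_incident` and the in-memory storage are hypothetical stand-ins for whatever ticketing system the team actually uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Incident:
    incident_id: str
    detected: datetime
    severity: str
    incident_type: str
    affected_systems: list[str]
    reported_by: str
    status: str = "Triaging"
    log: list[str] = field(default_factory=list)

    def note(self, message: str) -> None:
        """Append a timestamped entry to the incident log."""
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        self.log.append(f"{stamp} {message}")


class IncidentLog:
    """Hypothetical in-memory registry; replace with the real ticketing system."""

    def __init__(self) -> None:
        self._counters: dict[int, int] = {}
        self.incidents: dict[str, Incident] = {}

    def next_id(self, year: int) -> str:
        """Issue the next sequential ID for the given year, e.g. INC-2025-001."""
        self._counters[year] = self._counters.get(year, 0) + 1
        return f"INC-{year}-{self._counters[year]:03d}"

    def open_incident(self, detected: datetime, severity: str, incident_type: str,
                      affected_systems: list[str], reported_by: str) -> Incident:
        incident = Incident(self.next_id(detected.year), detected, severity,
                            incident_type, affected_systems, reported_by)
        incident.note("Incident opened")
        self.incidents[incident.incident_id] = incident
        return incident


if __name__ == "__main__":
    registry = IncidentLog()
    inc = registry.open_incident(
        datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc),
        "P0", "Data Breach", ["MongoDB production database"], "Azure Security Center",
    )
    print(inc.incident_id)  # INC-2025-001
```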

Step 2: Triage

Triage Checklist:

Verify incident authenticity (not false positive)
Assess severity using classification matrix
Identify affected systems/data
Estimate impact (users, data, business)
Assign incident commander
Form response team

Incident Commander Responsibilities:

Lead response efforts
Coordinate team
Make containment decisions
Communicate with stakeholders

Response Team:

Incident Commander
Security Engineers (2)
DevOps Engineer
Backend Developer (if needed)
Legal/Compliance (for data breaches)
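The team-composition rules above can be captured in a small helper so that nobody is missed under pressure. The rules are: backend developer only when needed, and legal/compliance for data breaches. The following is a minimal sketch. The role strings mirror the list, while `assemble_team` and its flags are hypothetical. In particular, it reads "if needed" as "the fix requires code changes", which is an assumption.

```python
def assemble_team(is_data_breach: bool, needs_code_changes: bool) -> list[str]:
    """Return the roles to page for an incident, following the Response Team list."""
    team = [
        "Incident Commander",
        "Security Engineer",
        "Security Engineer",  # the plan calls for two security engineers
        "DevOps Engineer",
    ]
    if needs_code_changes:  # assumption: this is what "if needed" means
        team.append("Backend Developer")
    if is_data_breach:
        team.append("Legal/Compliance")
    return team


if __name__ == "__main__":
    print(assemble_team(is_data_breach=True, needs_code_changes=False))
```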

Step 3: Containment

Goal: Stop the bleeding, prevent further damage

Short-term Containment:

Isolate affected systems

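# Block malicious IP in Azure Firewall (classic rules, azure-firewall CLI extension)
az network firewall network-rule create \
  --resource-group <resource-group> \
  --firewall-name <firewall-name> \
  --collection-name BlockMaliciousIPs \
  --name BlockMaliciousIP \
  --priority 100 \
  --action Deny \
  --protocols Any \
  --source-addresses 203.0.113.50 \
  --destination-addresses '*' \
  --destination-ports '*'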

Revoke compromised credentials

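# Regenerate the compromised key at the source (key1 or key2, whichever leaked);
# this is what invalidates the old key
NEW_KEY=$(az cognitiveservices account keys regenerate \
  --name <openai-resource-name> --resource-group <resource-group> \
  --key-name key1 --query key1 -o tsv)

# Store the new key in Key Vault
az keyvault secret set \
  --vault-name machineavatars-kv \
  --name azure-openai-gpt4-key \
  --value "$NEW_KEY"

# Restart affected services so they load the new secret
kubectl rollout restart deployment/response-3d-chatbot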

Enable enhanced logging

import logging

# Increase log verbosity on the root logger
logging.getLogger().setLevel(logging.DEBUG)

Long-term Containment:

Patch vulnerable systems
Update firewall rules permanently
Implement additional monitoring
Review and strengthen access controls
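Before adding a Deny rule, responders need the list of source addresses to block. The following is a minimal sketch that assumes access-log entries have already been parsed into `(timestamp, source_ip)` pairs sorted by time, and that known-good networks are kept in an allowlist. The 5-minute window and 500-request threshold are illustrative values, not tuned figures from this plan. The `ips_to_block` helper is hypothetical.

```python
import ipaddress
from collections import defaultdict, deque
from datetime import datetime, timedelta


def ips_to_block(events, allowlist, window=timedelta(minutes=5), threshold=500):
    """Return source IPs that made more than `threshold` requests within any
    sliding `window`, excluding addresses inside allowlisted networks.

    `events` is an iterable of (timestamp, source_ip) tuples sorted by time.
    """
    allowed = [ipaddress.ip_network(net) for net in allowlist]
    recent: dict[str, deque] = defaultdict(deque)
    flagged: set[str] = set()
    for ts, ip in events:
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in allowed):
            continue  # never block known-good infrastructure
        q = recent[ip]
        q.append(ts)
        # Drop timestamps that have slid out of the window.
        while ts - q[0] > window:
            q.popleft()
        if len(q) > threshold:
            flagged.add(ip)
    return sorted(flagged)
```

The returned addresses are candidates for `--source-addresses` in the firewall rule. A human should still confirm each one before blocking, because a shared NAT or proxy address can look like an attacker.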

Step 4: Eradication

Goal: Remove threat completely

Actions:

Identify root cause
Analyze logs
Review attack vectors
Determine entry point
Remove malicious artifacts
Delete backdoors
Remove malware
Clean compromised accounts
Patch vulnerabilities
Apply security updates
Fix code vulnerabilities
Harden configurations

Example (API Key Compromise):

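# 1. Revoke old key: delete it from Key Vault and also disable it wherever the API
#    validates keys (deleting the secret does not invalidate copies already leaked)
az keyvault secret delete --vault-name machineavatars-kv --name compromised-api-key

# 2. Generate new key (32 random bytes, base64-encoded)
NEW_KEY=$(openssl rand -base64 32)

# 3. Store in Key Vault
az keyvault secret set --vault-name machineavatars-kv --name new-api-key --value "$NEW_KEY"

# 4. Update all services (config must point at new-api-key), then wait for rollout
kubectl rollout restart deployment/response-3d-chatbot
kubectl rollout status deployment/response-3d-chatbot

# 5. Verify old key no longer works (replace OLD_KEY with the compromised value)
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer OLD_KEY" https://api.example.com
# Expected: 401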

Step 5: Recovery

Goal: Restore normal operations safely

Recovery Steps:

Verify threat eliminated
Re-scan systems
Monitor for suspicious activity
Confirm patches applied
Restore services incrementally

# Staged rollout (blue-green with a gradual traffic shift)
# 1. Deploy fixed version to staging
# 2. Run security tests
# 3. Gradually shift traffic to the new version
# 4. Monitor for 24 hours
# 5. Complete cutover

Enhanced monitoring

# Add temporary alerts
alert_on_suspicious_patterns = True
alert_threshold_lowered = True

Validate data integrity
Check for data tampering
Verify backups
Test critical functionality
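One way to check for tampering is to compare a content hash of every live record against the same record in a known-good backup. The following is a minimal sketch. It assumes both sides can be loaded as dictionaries keyed by record ID, and non-JSON values such as ObjectIds or datetimes are hashed via `str()`. `record_digest` and `integrity_diff` are hypothetical helpers. For a large MongoDB collection you would stream documents rather than hold them all in memory.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """Stable SHA-256 of a record: keys are sorted so field order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def integrity_diff(backup: dict[str, dict], live: dict[str, dict]) -> dict[str, list[str]]:
    """Compare live records against a backup snapshot, keyed by record ID."""
    backup_hashes = {rid: record_digest(r) for rid, r in backup.items()}
    live_hashes = {rid: record_digest(r) for rid, r in live.items()}
    return {
        "missing": sorted(backup_hashes.keys() - live_hashes.keys()),
        "added": sorted(live_hashes.keys() - backup_hashes.keys()),
        "modified": sorted(
            rid for rid in backup_hashes.keys() & live_hashes.keys()
            if backup_hashes[rid] != live_hashes[rid]
        ),
    }
```

"Added" and "modified" entries are not automatically malicious, because legitimate writes also happen after a backup is taken. Cross-check them against the incident window and the application's audit log.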

Recovery Validation Checklist:

All services healthy
No suspicious activity detected (24 hours)
Vulnerability patched and verified
Monitoring alerts configured
Backups verified
Customer notification sent (if required)
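The checklist can double as a go/no-go gate before the incident is closed. The following is a minimal sketch. It assumes each item has already been reduced to a boolean by whoever verified it, and that the time of the last suspicious event is known (or `None` if there was none). The 24-hour quiet period comes from the checklist. Customer notification only blocks closure when it was required. `ready_to_close` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

QUIET_PERIOD = timedelta(hours=24)


def ready_to_close(checks: dict[str, bool], last_suspicious: Optional[datetime],
                   now: datetime, notification_required: bool,
                   notification_sent: bool) -> tuple[bool, list[str]]:
    """Return (ok, blockers) for the Recovery Validation Checklist."""
    blockers = [name for name, passed in checks.items() if not passed]
    if last_suspicious is not None and now - last_suspicious < QUIET_PERIOD:
        blockers.append("suspicious activity within the last 24 hours")
    if notification_required and not notification_sent:
        blockers.append("customer notification not sent")
    return (not blockers, blockers)


if __name__ == "__main__":
    now = datetime(2025, 1, 16, 13, 15, tzinfo=timezone.utc)
    checks = {"all services healthy": True, "vulnerability patched and verified": True,
              "monitoring alerts configured": True, "backups verified": False}
    print(ready_to_close(checks, now - timedelta(hours=30), now, True, True))
    # (False, ['backups verified'])
```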

Step 6: Post-Incident Review

Timeline: Within 5 business days of resolution

Post-Incident Report:

# Post-Incident Report: INC-2025-001

## Executive Summary

- **Incident:** Unauthorized database access
- **Duration:** 2 hours 45 minutes
- **Impact:** 1,250 user records accessed (no evidence of exfiltration found)
- **Root Cause:** Hardcoded MongoDB connection string in public GitHub repository

## Timeline

- 10:30 - Initial detection
- 10:35 - Triage completed, P0 declared
- 10:40 - Containment: IP blocked, connection string rotated
- 11:15 - Eradication: GitHub repository made private, secrets removed
- 12:30 - Recovery: New connection string deployed, monitoring enhanced
- 13:15 - Incident resolved

## Impact Assessment

- **Users Affected:** 1,250
- **Data Accessed:** Email addresses, usernames (no passwords accessed - stored separately)
- **Business Impact:** Minimal service disruption (5 minutes downtime)
- **Financial Impact:** $0 (no ransom, no fines)

## Root Cause Analysis

**What Happened:**

1. Developer accidentally committed connection string to public GitHub repo
2. Attacker found exposed credentials via GitHub scanning
3. Attacker accessed database and queried user collection
4. Azure Security Center detected unusual access pattern

**Why It Happened:**

1. No pre-commit hooks to detect secrets
2. Developer not trained on secret management
3. Code review did not catch the issue
4. Repository set to public by mistake

## Actions Taken

**Immediate:**

- ✅ IP blocked
- ✅ Connection string rotated
- ✅ Repository made private
- ✅ All secrets removed from Git history
- ✅ Users notified (data breach notification)

**Short-term (1 week):**

- ✅ Pre-commit hooks installed (detect-secrets)
- ✅ All developers trained on secret management
- ✅ Code review process enhanced
- ✅ All repositories audited for secrets

**Long-term (1-3 months):**

- 🔄 Migrate all secrets to Azure Key Vault
- 🔄 Implement SAST in CI/CD
- 🔄 Quarterly security training
- 🔄 Annual penetration testing

## Lessons Learned

**What Went Well:**

- Quick detection (Azure Security Center)
- Rapid response (2h 45m total)
- Good communication (war room effective)
- No data exfiltration

**What Needs Improvement:**

- Secret management (hardcoded credentials)
- Developer training (security awareness)
- Code review (missed the secret)
- Repository permissions (shouldn't be public)

## Metrics

- **MTTD (Mean Time To Detect):** 5 minutes
- **MTTR (Mean Time To Respond):** 165 minutes (detection to resolution)
- **MTTC (Mean Time To Contain):** 10 minutes (detection to containment)
- **MTTE (Mean Time To Eradicate):** 35 minutes (containment to eradication)
- **Time To Recover:** 75 minutes (eradication to recovery)

## Sign-off

- **Incident Commander:** John Doe
- **Security Lead:** Jane Smith
- **CTO:** Bob Johnson
- **Date:** 2025-01-20
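The per-incident durations in the report's Metrics list can be derived from its own Timeline, so the two never drift apart. The following is a minimal sketch. It assumes each metric is the interval between the two timeline events named in `PHASES`: containment measured from detection, eradication from containment, recovery from eradication, and response from detection to resolution. Those definitions are an assumption and should follow whatever the team standardizes on. MTTD is left out because the timeline does not record when the attacker first gained access.

```python
from datetime import datetime

# Timeline from the INC-2025-001 report (all times on 2025-01-15, UTC).
TIMELINE = {
    "detected": "10:30",
    "triaged": "10:35",
    "contained": "10:40",
    "eradicated": "11:15",
    "recovered": "12:30",
    "resolved": "13:15",
}

# Assumed metric definitions: metric name -> (start event, end event).
PHASES = {
    "respond": ("detected", "resolved"),
    "contain": ("detected", "contained"),
    "eradicate": ("contained", "eradicated"),
    "recover": ("eradicated", "recovered"),
}


def minutes_between(start: str, end: str) -> int:
    """Minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def phase_minutes(timeline: dict[str, str]) -> dict[str, int]:
    return {name: minutes_between(timeline[a], timeline[b])
            for name, (a, b) in PHASES.items()}


if __name__ == "__main__":
    print(phase_minutes(TIMELINE))
    # {'respond': 165, 'contain': 10, 'eradicate': 35, 'recover': 75}
```

Averaging these per-incident values each month gives the MTTx figures tracked under Incident Metrics & KPIs.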

📘 Incident Playbooks

Playbook 1: Data Breach

Triggers:

Unauthorized database access
Data exfiltration detected
Customer data exposed

Response:

Contain: Block attacker access
Assess: Determine data accessed/exfiltrated
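Notify: Legal, compliance, and affected users. Under GDPR, users must be told without undue delay when the risk to them is high; the 72-hour deadline applies to the supervisory authority.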
Investigate: How did breach occur?
Remediate: Fix vulnerability
Report: Regulatory authorities (if required)

Legal Requirements:

GDPR: 72-hour notification to supervisory authority
DPDPA 2023: ASAP notification to the Data Protection Board and to each affected Data Principal
HIPAA: 60-day notification (if PHI involved)
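Because these clocks start when the breach is discovered, it helps to compute the hard deadlines as soon as this playbook is triggered. The following is a minimal sketch covering the two fixed windows listed above: GDPR's 72 hours to the supervisory authority, and HIPAA's 60 days when PHI is involved. DPDPA is left out because the plan only says "ASAP". `notification_deadlines` is a hypothetical helper. Deciding which regimes apply remains a legal/compliance call.

```python
from datetime import datetime, timedelta, timezone

# Fixed windows from the Legal Requirements list.
WINDOWS = {
    "GDPR (supervisory authority)": timedelta(hours=72),
    "HIPAA (if PHI involved)": timedelta(days=60),
}


def notification_deadlines(discovered_at: datetime, gdpr_applies: bool,
                           phi_involved: bool) -> dict[str, datetime]:
    """Return the latest notification time for each regime that applies."""
    deadlines = {}
    if gdpr_applies:
        key = "GDPR (supervisory authority)"
        deadlines[key] = discovered_at + WINDOWS[key]
    if phi_involved:
        key = "HIPAA (if PHI involved)"
        deadlines[key] = discovered_at + WINDOWS[key]
    return deadlines


if __name__ == "__main__":
    found = datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc)
    for regime, due in notification_deadlines(found, True, False).items():
        print(regime, due.isoformat())
    # GDPR (supervisory authority) 2025-01-18T10:30:00+00:00
```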

Playbook 2: DDoS Attack

Triggers:

Massive traffic spike
Service degradation
Azure DDoS Protection activated

Response:

Verify: Confirm DDoS (not legitimate traffic spike)
Azure DDoS: Ensure mitigation active
Scale: Increase infrastructure capacity
CDN: Enable Azure Front Door (if not already)
Monitor: Track attack patterns
Communicate: Status page updates
Post-Mortem: Analyze attack, improve defenses
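The "Verify" step separates an attack from a legitimate surge, and that is ultimately a human judgment. The following is a minimal sketch of two signals a responder might look at. It assumes per-minute request counts and per-client request counts are available from monitoring. The signals are how far current traffic exceeds the recent median, and how concentrated it is among the busiest clients. `traffic_profile` and the default `top_n` are hypothetical. They are not Azure DDoS Protection settings.

```python
from statistics import median


def traffic_profile(baseline_rpm: list[int], current_rpm: int,
                    requests_by_client: dict[str, int], top_n: int = 10) -> dict[str, float]:
    """Summarise a traffic spike for the DDoS 'Verify' step.

    spike_ratio: current requests/minute divided by the median baseline.
    top_share:   fraction of current requests sent by the top_n busiest clients.
    A high spike_ratio with a high top_share points toward abuse from a limited
    set of sources; a high spike_ratio spread evenly across many clients may be
    a legitimate surge or a widely distributed attack and needs more context.
    """
    base = median(baseline_rpm)
    spike_ratio = current_rpm / base if base else float("inf")
    total = sum(requests_by_client.values())
    top = sorted(requests_by_client.values(), reverse=True)[:top_n]
    top_share = sum(top) / total if total else 0.0
    return {"spike_ratio": spike_ratio, "top_share": top_share}


if __name__ == "__main__":
    print(traffic_profile([100, 120, 80], 1000, {"a": 900, "b": 50, "c": 50}, top_n=1))
```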

Playbook 3: Ransomware

Triggers:

Files encrypted
Ransom note detected
System locked

Response:

DO NOT PAY RANSOM
Isolate: Disconnect affected systems
Identify: Determine ransomware variant
Restore: From clean backups
Scan: Full malware scan
Patch: Fix entry point
Report: Law enforcement (FBI, local authorities)
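"Restore: From clean backups" means picking a restore point taken before the earliest sign of compromise, not simply the most recent backup. Recent backups may already contain encrypted files or the attacker's foothold. The following is a minimal sketch. It assumes each backup is described by a timestamp and a flag recording whether its integrity or restore test passed. The `Backup` type and the `choose_restore_point` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Backup:
    backup_id: str
    taken_at: datetime
    verified: bool  # integrity/restore test passed


def choose_restore_point(backups: list[Backup],
                         earliest_compromise: datetime) -> Optional[Backup]:
    """Return the newest verified backup taken strictly before the compromise,
    or None if no such backup exists."""
    candidates = [b for b in backups
                  if b.verified and b.taken_at < earliest_compromise]
    return max(candidates, key=lambda b: b.taken_at, default=None)


if __name__ == "__main__":
    backups = [Backup("b1", datetime(2025, 1, 10), True),
               Backup("b2", datetime(2025, 1, 12), False),
               Backup("b3", datetime(2025, 1, 13), True)]
    print(choose_restore_point(backups, datetime(2025, 1, 14)))
```

The hard input is `earliest_compromise`. Ransomware operators often sit in a network well before encrypting, so this should come from the investigation, not from when the ransom note appeared.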

Playbook 4: API Key Compromise

Triggers:

Unexpected API usage spike
API key found in public repository
Third-party reports compromised key

Response:

Revoke: Immediate key rotation
Audit: Review API logs for abuse
Assess: Financial impact (API bill)
Notify: Affected customers/partners
Investigate: How was key exposed?
Prevent: Implement secret scanning

Example:

# Automated response
if detect_api_key_in_public_repo():
    revoke_api_key_immediately()
    generate_new_key()
    notify_security_team()
    create_incident_ticket()
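The detection side of that automation can start as a plain pattern scan of a checked-out repository. The following is a minimal sketch with two illustrative patterns. One matches MongoDB connection strings with embedded credentials, the root cause of INC-2025-001. The other matches generic `api_key = "..."` style assignments. `scan_text` and `scan_repo` are hypothetical helpers. Dedicated tools such as detect-secrets, which the post-incident actions adopted, cover far more cases and should be preferred in CI and pre-commit hooks.

```python
import re
from pathlib import Path

# Illustrative patterns only; real scanners ship many more plus entropy checks.
PATTERNS = {
    "mongodb_uri_with_credentials": re.compile(r"mongodb(?:\+srv)?://[^\s:/@]+:[^\s@]+@"),
    "generic_api_key": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*[\"'][A-Za-z0-9_\-]{16,}[\"']"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspected secret in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_repo(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every text file under root (skipping .git) and report suspected secrets."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        hits = scan_text(text)
        if hits:
            findings[str(path)] = hits
    return findings
```

This only inspects the working tree. A leaked secret can survive in Git history, which is why the post-incident actions also removed secrets from history.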

Playbook 5: Account Takeover

Triggers:

Multiple failed login attempts
Login from unusual location
User report of unauthorized access

Response:

Lock: Suspend compromised account
Reset: Force password reset
Revoke: Invalidate all sessions/tokens
Audit: Review account activity
Notify: User via verified email/phone
Enable: MFA requirement
Investigate: Phishing? Credential stuffing?
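The "multiple failed login attempts" trigger is usually implemented as a sliding-window counter per account. The following is a minimal sketch that uses an in-memory store and illustrative limits of 5 failures in 15 minutes. `FailedLoginTracker` is hypothetical. A production version would keep this state in a shared store so that every API instance sees the same counts.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class FailedLoginTracker:
    """Flag accounts with too many failed logins inside a sliding window."""

    def __init__(self, max_failures: int = 5,
                 window: timedelta = timedelta(minutes=15)) -> None:
        self.max_failures = max_failures
        self.window = window
        self._failures: dict[str, deque] = defaultdict(deque)

    def record_failure(self, account: str, at: datetime) -> bool:
        """Record a failed login; return True if the account should be locked."""
        attempts = self._failures[account]
        attempts.append(at)
        # Forget failures that are older than the window.
        while at - attempts[0] > self.window:
            attempts.popleft()
        return len(attempts) >= self.max_failures


if __name__ == "__main__":
    tracker = FailedLoginTracker()
    start = datetime(2025, 1, 15, 10, 0)
    for i in range(5):
        locked = tracker.record_failure("alice", start + timedelta(minutes=i))
    print(locked)  # True
```

When `record_failure` returns True, the "Lock" and "Revoke" steps above take over.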

📞 Contact Information

Internal

Security Team:

Email: security@machineavatars.com
Slack: #security-incidents
Phone: +91-98765-43210 (24/7)

Escalation:

P0/P1: Notify CTO immediately
P2: Security Lead within 4 hours
P3: Email security team

External

Regulatory Authorities:

GDPR (EU): Lead supervisory authority of the relevant EU member state
DPDPA (India): Data Protection Board of India (dataprotection@india.gov)

Law Enforcement:

Cyber Crime: Indian Cyber Crime Coordination Centre
FBI (if international): ic3.gov

📊 Incident Metrics & KPIs

Track Monthly:

Number of incidents by severity
MTTD, MTTR, MTTC, MTTE
False positive rate
Recurring incident types

Goals:

MTTD: <15 minutes
MTTR (P0): <4 hours
MTTR (P1): <24 hours
False positive rate: <10%

🔗 Related Documentation

Security:

Secret Management - API key compromise response
Authentication & Authorization - Account takeover response
Security Testing - Vulnerability disclosure

"Hope for the best, prepare for the worst, respond with confidence." 🚨🔐
