Incident Response

Section: 5-security-architecture
Document: Security Incident Response
Status: Incident Response Plan
Audience: Security teams, DevOps, management


🎯 Overview

MachineAvatars maintains a comprehensive incident response plan to quickly detect, contain, and recover from security incidents while minimizing impact to customers and business operations.

Response Team: Security incident response team (SIRT)
Availability: 24/7 for CRITICAL incidents
Escalation: Defined escalation paths


🚨 Incident Classification

Severity Levels

| Level | Description | Examples | Response Time | Escalation |
|-------|-------------|----------|---------------|------------|
| P0 - CRITICAL | Data breach, system compromise | Database breach, ransomware | Immediate | CTO, CEO |
| P1 - HIGH | Major security violation | API key compromise, DDoS attack | 1 hour | Security Lead |
| P2 - MEDIUM | Minor security issue | Failed login spike, suspicious activity | 4 hours | Security Team |
| P3 - LOW | Non-urgent security concern | Vulnerability disclosure, patch available | 24 hours | Security Team |
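
Where triage tooling wants this matrix as data, it can be encoded directly; a minimal Python sketch (the `SeverityPolicy` dataclass and `SEVERITY_MATRIX` are illustrative names, not an existing MachineAvatars module):

# Severity matrix as data; names here are illustrative, not an existing module.
from dataclasses import dataclass

@dataclass(frozen=True)
class SeverityPolicy:
    response_time: str
    escalation: str

SEVERITY_MATRIX = {
    "P0": SeverityPolicy("Immediate", "CTO, CEO"),
    "P1": SeverityPolicy("1 hour", "Security Lead"),
    "P2": SeverityPolicy("4 hours", "Security Team"),
    "P3": SeverityPolicy("24 hours", "Security Team"),
}

# Example: SEVERITY_MATRIX["P0"].escalation -> "CTO, CEO"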

📋 6-Step Incident Response Process

graph LR
    A[1. Detection] --> B[2. Triage]
    B --> C[3. Containment]
    C --> D[4. Eradication]
    D --> E[5. Recovery]
    E --> F[6. Post-Incident]

    style A fill:#FFE082
    style C fill:#FFCDD2
    style E fill:#C8E6C9

Step 1: Detection

Detection Sources:

  • Azure Security Center alerts
  • Monitoring system alerts
  • User reports
  • Third-party disclosures
  • Automated vulnerability scans

Immediate Actions:

  1. Document incident details
  2. Assign incident ID
  3. Notify security team (Slack #security-incidents)
  4. Start incident log (see the sketch below)
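
The notification sketch referenced above, assuming a Slack incoming webhook (the webhook URL, log file layout, and `open_incident` helper are placeholders, not production values):

# Detection hand-off sketch; webhook URL and file layout are assumptions.
import json
import urllib.request
from datetime import datetime, timezone

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def open_incident(summary: str, severity: str, sequence: int) -> str:
    """Assign an incident ID, notify #security-incidents, start the incident log."""
    now = datetime.now(timezone.utc)
    incident_id = f"INC-{now:%Y}-{sequence:03d}"
    payload = {"text": f":rotating_light: {incident_id} [{severity}] {summary}"}
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)  # POST to the Slack incoming webhook
    with open(f"{incident_id}.md", "w") as log:  # start the incident log
        log.write(f"# Incident: {incident_id}\n\n**Detected:** {now:%Y-%m-%d %H:%M} UTC\n")
    return incident_id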

Incident Template:

# Incident: INC-2025-001

**Detected:** 2025-01-15 10:30 UTC
**Severity:** P0 - CRITICAL
**Type:** Data Breach
**Affected Systems:** MongoDB production database
**Reported By:** Azure Security Center
**Status:** Triaging

## Initial Assessment:

- Suspicious database queries detected
- Unknown IP address accessing user collection
- ~1000 user records queried in 5 minutes

## Actions Taken:

1. [ ] Security team notified
2. [ ] CTO escalated
3. [ ] Logs collected
4. [ ] Containment initiated

Step 2: Triage

Triage Checklist:

  • Verify incident authenticity (not false positive)
  • Assess severity using classification matrix
  • Identify affected systems/data
  • Estimate impact (users, data, business)
  • Assign incident commander
  • Form response team

Incident Commander Responsibilities:

  • Lead response efforts
  • Coordinate team
  • Make containment decisions
  • Communicate with stakeholders

Response Team:

  • Incident Commander
  • Security Engineers (2)
  • DevOps Engineer
  • Backend Developer (if needed)
  • Legal/Compliance (for data breaches)

Step 3: Containment

Goal: Stop the bleeding, prevent further damage

Short-term Containment:

  1. Isolate affected systems:

# Block the malicious IP at the Azure Firewall
# (network rule via the azure-firewall CLI extension; resource group and
# firewall names below are placeholders)
az network firewall network-rule create \
  --resource-group machineavatars-rg \
  --firewall-name machineavatars-fw \
  --collection-name EmergencyBlocks \
  --name BlockMaliciousIP \
  --priority 100 \
  --action Deny \
  --protocols Any \
  --source-addresses 203.0.113.50 \
  --destination-addresses '*' \
  --destination-ports '*'

  2. Revoke compromised credentials:

# Rotate the API key immediately
az keyvault secret set \
  --vault-name machineavatars-kv \
  --name azure-openai-gpt4-key \
  --value "NEW_SECURE_KEY"

# Restart affected services so they pick up the new secret

  3. Enable enhanced logging:

# Increase log verbosity (Python services)
import logging
logging.getLogger().setLevel(logging.DEBUG)

Long-term Containment:

  • Patch vulnerable systems
  • Update firewall rules permanently
  • Implement additional monitoring
  • Review and strengthen access controls (an application-level backstop is sketched below)
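
One way to back up the firewall rule at the application layer; a minimal sketch assuming a FastAPI service (the `DENYLIST` set and the service itself are assumptions):

# Application-level IP denylist; complements, not replaces, the firewall rule.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
DENYLIST = {"203.0.113.50"}  # populated during containment

@app.middleware("http")
async def reject_denylisted_ips(request: Request, call_next):
    # Reject contained IPs even if a firewall change lags or is rolled back.
    if request.client and request.client.host in DENYLIST:
        return JSONResponse(status_code=403, content={"detail": "Forbidden"})
    return await call_next(request)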

Step 4: Eradication

Goal: Remove threat completely

Actions:

  1. Identify root cause
     • Analyze logs
     • Review attack vectors
     • Determine entry point
  2. Remove malicious artifacts
     • Delete backdoors
     • Remove malware
     • Clean compromised accounts
  3. Patch vulnerabilities
     • Apply security updates
     • Fix code vulnerabilities
     • Harden configurations

Example (API Key Compromise):

# 1. Revoke the old key (soft-deletes the secret)
az keyvault secret delete --vault-name machineavatars-kv --name compromised-api-key

# 2. Generate a new key (generate_secure_key is a placeholder for your generator)
NEW_KEY=$(generate_secure_key)

# 3. Store it in Key Vault
az keyvault secret set --vault-name machineavatars-kv --name new-api-key --value "$NEW_KEY"

# 4. Update all services
kubectl rollout restart deployment/response-3d-chatbot

# 5. Verify old key no longer works
curl -H "Authorization: Bearer OLD_KEY" https://api.example.com
# Expected: 401 Unauthorized

Step 5: Recovery

Goal: Restore normal operations safely

Recovery Steps:

  1. Verify threat eliminated
     • Re-scan systems
     • Monitor for suspicious activity
     • Confirm patches applied
  2. Restore services incrementally

# Blue-green deployment
# 1. Deploy fixed version to staging
# 2. Run security tests
# 3. Gradually shift traffic to new version
# 4. Monitor for 24 hours
# 5. Complete cutover

  3. Enable enhanced monitoring

# Add temporary alerts (illustrative flags)
alert_on_suspicious_patterns = True
alert_threshold_lowered = True

  4. Validate data integrity
     • Check for data tampering
     • Verify backups
     • Test critical functionality

Recovery Validation Checklist (a health-probe sketch follows the list):

  • All services healthy
  • No suspicious activity detected (24 hours)
  • Vulnerability patched and verified
  • Monitoring alerts configured
  • Backups verified
  • Customer notification sent (if required)
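
The health-probe sketch referenced above; the endpoint URLs are placeholders:

# Recovery validation probe; the endpoint list is an assumption.
import urllib.request

HEALTH_ENDPOINTS = [
    "https://api.example.com/health",
    "https://app.example.com/health",
]

def all_services_healthy(timeout: float = 5.0) -> bool:
    """Return True only if every endpoint answers HTTP 200."""
    for url in HEALTH_ENDPOINTS:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status != 200:
                    return False
        except OSError:  # connection refused, timeout, DNS failure, etc.
            return False
    return True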

Step 6: Post-Incident Review

Timeline: Within 5 business days of resolution

Post-Incident Report:

# Post-Incident Report: INC-2025-001

## Executive Summary

- **Incident:** Unauthorized database access
- **Duration:** 2 hours 45 minutes
- **Impact:** 1,250 user records accessed (no exfiltration confirmed)
- **Root Cause:** Hardcoded MongoDB connection string in public GitHub repository

## Timeline

- 10:30 - Initial detection
- 10:35 - Triage completed, P0 declared
- 10:40 - Containment: IP blocked, connection string rotated
- 11:15 - Eradication: GitHub repository made private, secrets removed
- 12:30 - Recovery: New connection string deployed, monitoring enhanced
- 13:15 - Incident resolved

## Impact Assessment

- **Users Affected:** 1,250
- **Data Accessed:** Email addresses, usernames (no passwords accessed - stored separately)
- **Business Impact:** Minimal service disruption (5 minutes downtime)
- **Financial Impact:** $0 (no ransom, no fines)

## Root Cause Analysis

**What Happened:**

1. Developer accidentally committed connection string to public GitHub repo
2. Attacker found exposed credentials via GitHub scanning
3. Attacker accessed database and queried user collection
4. Azure Security Center detected unusual access pattern

**Why It Happened:**

1. No pre-commit hooks to detect secrets
2. Developer not trained on secret management
3. No code review caught the issue
4. Repository set to public by mistake

## Actions Taken

**Immediate:**

- ✅ IP blocked
- ✅ Connection string rotated
- ✅ Repository made private
- ✅ All secrets removed from Git history
- ✅ Users notified (data breach notification)

**Short-term (1 week):**

- ✅ Pre-commit hooks installed (detect-secrets)
- ✅ All developers trained on secret management
- ✅ Code review process enhanced
- ✅ All repositories audited for secrets

**Long-term (1-3 months):**

- 🔄 Migrate all secrets to Azure Key Vault
- 🔄 Implement SAST in CI/CD
- 🔄 Quarterly security training
- 🔄 Annual penetration testing

## Lessons Learned

**What Went Well:**

- Quick detection (Azure Security Center)
- Rapid response (2h 45m total)
- Good communication (war room effective)
- No data exfiltration

**What Needs Improvement:**

- Secret management (hardcoded credentials)
- Developer training (security awareness)
- Code review (missed the secret)
- Repository permissions (shouldn't be public)

## Metrics

- **MTTD (Mean Time To Detect):** 5 minutes
- **MTTC (Mean Time To Contain):** 10 minutes
- **MTTE (Mean Time To Eradicate):** 95 minutes
- **Time To Recover:** 75 minutes
- **MTTR (Mean Time To Resolve, detection to resolution):** 165 minutes

## Sign-off

- **Incident Commander:** John Doe
- **Security Lead:** Jane Smith
- **CTO:** Bob Johnson
- **Date:** 2025-01-20
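
The headline durations follow directly from the timeline above; a quick arithmetic check (a sketch with timestamps hardcoded from this report):

# Recompute durations from the INC-2025-001 timeline.
from datetime import datetime

fmt = "%Y-%m-%d %H:%M"
detected  = datetime.strptime("2025-01-15 10:30", fmt)
contained = datetime.strptime("2025-01-15 10:40", fmt)
resolved  = datetime.strptime("2025-01-15 13:15", fmt)

print((contained - detected).total_seconds() / 60)  # 10.0 minutes to contain
print((resolved - detected).total_seconds() / 60)   # 165.0 minutes, end to end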

📘 Incident Playbooks

Playbook 1: Data Breach

Triggers:

  • Unauthorized database access
  • Data exfiltration detected
  • Customer data exposed

Response:

  1. Contain: Block attacker access
  2. Assess: Determine data accessed/exfiltrated
  3. Notify: Legal, compliance, affected users (within 72 hours for GDPR)
  4. Investigate: How did breach occur?
  5. Remediate: Fix vulnerability
  6. Report: Regulatory authorities (if required)

Legal Requirements:

  • GDPR: 72-hour notification to supervisory authority (deadline arithmetic sketched below)
  • DPDPA 2023: notification to the Data Protection Board of India as soon as possible
  • HIPAA: 60-day notification (if PHI involved)
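
A trivial deadline calculator for the GDPR clock (a sketch; the 72-hour window runs from when the controller becomes aware of the breach):

# GDPR notification deadline: 72 hours from awareness of the breach.
from datetime import datetime, timedelta, timezone

def gdpr_notification_deadline(aware_at: datetime) -> datetime:
    return aware_at + timedelta(hours=72)

print(gdpr_notification_deadline(datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc)))
# 2025-01-18 10:30:00+00:00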

Playbook 2: DDoS Attack

Triggers:

  • Massive traffic spike
  • Service degradation
  • Azure DDoS Protection activated

Response:

  1. Verify: Confirm DDoS (not legitimate traffic spike)
  2. Azure DDoS: Ensure mitigation active
  3. Scale: Increase infrastructure capacity
  4. CDN: Enable Azure Front Door (if not already)
  5. Monitor: Track attack patterns
  6. Communicate: Status page updates
  7. Post-Mortem: Analyze attack, improve defenses

Playbook 3: Ransomware

Triggers:

  • Files encrypted
  • Ransom note detected
  • System locked

Response:

  1. DO NOT PAY RANSOM
  2. Isolate: Disconnect affected systems
  3. Identify: Determine ransomware variant
  4. Restore: From clean backups
  5. Scan: Full malware scan
  6. Patch: Fix entry point
  7. Report: Law enforcement (FBI, local authorities)

Playbook 4: API Key Compromise

Triggers:

  • Unexpected API usage spike
  • API key found in public repository
  • Third-party reports compromised key

Response:

  1. Revoke: Immediate key rotation
  2. Audit: Review API logs for abuse
  3. Assess: Financial impact (API bill)
  4. Notify: Affected customers/partners
  5. Investigate: How was key exposed?
  6. Prevent: Implement secret scanning

Example:

# Automated response sketch; these four handlers are organization-specific
# hooks, not a library API.
if detect_api_key_in_public_repo():
    revoke_api_key_immediately()
    generate_new_key()
    notify_security_team()
    create_incident_ticket()

Playbook 5: Account Takeover

Triggers:

  • Multiple failed login attempts
  • Login from unusual location
  • User report of unauthorized access

Response:

  1. Lock: Suspend compromised account
  2. Reset: Force password reset
  3. Revoke: Invalidate all sessions/tokens (see the sketch after this list)
  4. Audit: Review account activity
  5. Notify: User via verified email/phone
  6. Enable: MFA requirement
  7. Investigate: Phishing? Credential stuffing?
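
The session-revocation sketch referenced in step 3, assuming sessions are stored in Redis under `session:<user_id>:<session_id>` (the key layout and Redis host are assumptions about the session store):

# Bulk session revocation; key layout and Redis host are assumptions.
import redis

r = redis.Redis(host="localhost", port=6379)

def revoke_all_sessions(user_id: str) -> int:
    """Delete every active session for a compromised account; returns the count."""
    revoked = 0
    for key in r.scan_iter(match=f"session:{user_id}:*"):
        r.delete(key)
        revoked += 1
    return revoked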

📞 Contact Information

Internal

Security Team:

  • Email: security@machineavatars.com
  • Slack: #security-incidents
  • Phone: +91-98765-43210 (24/7)

Escalation:

  • P0/P1: Notify CTO immediately
  • P2: Security Lead within 4 hours
  • P3: Email security team

External

Regulatory Authorities:

  • GDPR (EU data subjects): the relevant supervisory authority in the affected member state
  • DPDPA 2023: Data Protection Board of India

Law Enforcement:

  • Cyber Crime: Indian Cyber Crime Coordination Centre
  • FBI (if international): ic3.gov

📊 Incident Metrics & KPIs

Track Monthly (see the roll-up sketch below):

  • Number of incidents by severity
  • MTTD, MTTR, MTTC, MTTE
  • False positive rate
  • Recurring incident types

Goals:

  • MTTD: <15 minutes
  • MTTR (P0): <4 hours
  • MTTR (P1): <24 hours
  • False positive rate: <10%
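
The roll-up sketch referenced above; the record fields are assumptions chosen to match these KPIs:

# Monthly KPI roll-up over incident records (illustrative data).
from statistics import mean

incidents = [
    {"severity": "P0", "mttd_min": 5,  "mttr_min": 165, "false_positive": False},
    {"severity": "P2", "mttd_min": 12, "mttr_min": 240, "false_positive": True},
]

confirmed = [i for i in incidents if not i["false_positive"]]
print("MTTD:", mean(i["mttd_min"] for i in confirmed), "min")
print("MTTR:", mean(i["mttr_min"] for i in confirmed), "min")
print("False positive rate:", sum(i["false_positive"] for i in incidents) / len(incidents))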

"Hope for the best, prepare for the worst, respond with confidence." πŸš¨πŸ”