Incident Response
Section: 5-security-architecture
Document: Security Incident Response
Status: Incident Response Plan
Audience: Security teams, DevOps, management
🎯 Overview
MachineAvatars maintains a comprehensive incident response plan to quickly detect, contain, and recover from security incidents while minimizing impact to customers and business operations.
Response Team: Security incident response team (SIRT)
Availability: 24/7 for CRITICAL incidents
Escalation: Defined per severity level (see Severity Levels and Contact Information)
🚨 Incident Classification
Severity Levels
| Level | Description | Examples | Response Time | Escalation |
|---|---|---|---|---|
| P0 - CRITICAL | Data breach, system compromise | Database breach, ransomware | Immediate | CTO, CEO |
| P1 - HIGH | Major security violation | API key compromise, DDoS attack | 1 hour | Security Lead |
| P2 - MEDIUM | Minor security issue | Failed login spike, suspicious activity | 4 hours | Security Team |
| P3 - LOW | Non-urgent security concern | Vulnerability disclosure, patch available | 24 hours | Security Team |
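The severity matrix can also be kept as data, so that alerting and ticketing tools apply the same response deadlines and escalation targets. The following is a minimal Python sketch. `Severity`, `SEVERITY_MATRIX` and `response_deadline` are illustrative names and not part of an existing MachineAvatars tool. "Immediate" is modelled as a zero-length deadline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Severity:
    name: str
    response_time: timedelta  # timedelta(0) models "Immediate"
    escalate_to: tuple[str, ...]


# Mirrors the Severity Levels table; names are illustrative.
SEVERITY_MATRIX = {
    "P0": Severity("CRITICAL", timedelta(0), ("CTO", "CEO")),
    "P1": Severity("HIGH", timedelta(hours=1), ("Security Lead",)),
    "P2": Severity("MEDIUM", timedelta(hours=4), ("Security Team",)),
    "P3": Severity("LOW", timedelta(hours=24), ("Security Team",)),
}


def response_deadline(priority: str, detected_at: datetime) -> datetime:
    """Latest time by which the response must start for this priority."""
    return detected_at + SEVERITY_MATRIX[priority].response_time
```

For example, a P1 detected at 10:30 UTC must be picked up by 11:30 UTC and escalated to the Security Lead.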
6-Step Incident Response Process
graph LR
A[1. Detection] --> B[2. Triage]
B --> C[3. Containment]
C --> D[4. Eradication]
D --> E[5. Recovery]
E --> F[6. Post-Incident]
style A fill:#FFE082
style C fill:#FFCDD2
style E fill:#C8E6C9
Step 1: Detection
Detection Sources:
- Azure Security Center alerts
- Monitoring system alerts
- User reports
- Third-party disclosures
- Automated vulnerability scans
Immediate Actions:
- Document incident details
- Assign incident ID
- Notify security team (Slack #security-incidents)
- Start incident log
Incident Template:
# Incident: INC-2025-001
**Detected:** 2025-01-15 10:30 UTC
**Severity:** P0 - CRITICAL
**Type:** Data Breach
**Affected Systems:** MongoDB production database
**Reported By:** Azure Security Center
**Status:** Triaging
## Initial Assessment:
- Suspicious database queries detected
- Unknown IP address accessing user collection
- ~1000 user records queried in 5 minutes
## Actions Taken:
1. [ ] Security team notified
2. [ ] CTO escalated
3. [ ] Logs collected
4. [ ] Containment initiated
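Incident IDs follow the `INC-YYYY-NNN` pattern shown in the template. The sketch below allocates those IDs and renders the template header. It assumes an in-memory counter (a real tracker would persist it) and UTC timestamps. The class names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


class IncidentIdAllocator:
    """Hands out INC-YYYY-NNN IDs; the in-memory counter is an assumption."""

    def __init__(self) -> None:
        self._last_seq: dict[int, int] = {}

    def next_id(self, detected: datetime) -> str:
        # Sequence numbers restart at 001 each calendar year.
        seq = self._last_seq.get(detected.year, 0) + 1
        self._last_seq[detected.year] = seq
        return f"INC-{detected.year}-{seq:03d}"


@dataclass
class Incident:
    incident_id: str
    detected: datetime  # assumed to be UTC
    severity: str
    incident_type: str
    affected_systems: list[str]
    reported_by: str
    status: str = "Triaging"

    def header(self) -> str:
        """Render the header lines of the incident template."""
        return "\n".join([
            f"# Incident: {self.incident_id}",
            f"**Detected:** {self.detected:%Y-%m-%d %H:%M} UTC",
            f"**Severity:** {self.severity}",
            f"**Type:** {self.incident_type}",
            f"**Affected Systems:** {', '.join(self.affected_systems)}",
            f"**Reported By:** {self.reported_by}",
            f"**Status:** {self.status}",
        ])
```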
Step 2: Triage
Triage Checklist:
- Verify incident authenticity (not false positive)
- Assess severity using classification matrix
- Identify affected systems/data
- Estimate impact (users, data, business)
- Assign incident commander
- Form response team
Incident Commander Responsibilities:
- Lead response efforts
- Coordinate team
- Make containment decisions
- Communicate with stakeholders
Response Team:
- Incident Commander
- Security Engineers (2)
- DevOps Engineer
- Backend Developer (if needed)
- Legal/Compliance (for data breaches)
Step 3: Containment
Goal: Stop the bleeding, prevent further damage
Short-term Containment:
- Isolate affected systems
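# Block malicious IP in Azure Firewall (azure-firewall CLI extension; <...> values are placeholders)
az network firewall network-rule create \
--resource-group <resource-group> --firewall-name <firewall-name> \
--collection-name BlockMaliciousIPs --name BlockMaliciousIP \
--priority 100 --action Deny --protocols Any \
--source-addresses 203.0.113.50 \
--destination-addresses '*' --destination-ports '*'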
- Revoke compromised credentials
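# Rotate API key immediately: regenerate it at the provider first so the old key stops working
# (Azure OpenAI shown; <...> values are placeholders)
az cognitiveservices account keys regenerate \
--name <openai-resource> --resource-group <resource-group> --key-name key1
# Then store the new key in Key Vault
az keyvault secret set \
--vault-name machineavatars-kv \
--name azure-openai-gpt4-key \
--value "NEW_SECURE_KEY"
# Restart affected services so they load the new secret
kubectl rollout restart deployment/response-3d-chatbot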
- Enable enhanced logging
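Database access logs can help decide which addresses to pass to `--source-addresses`. The sketch below flags candidates automatically. It makes three assumptions that should be tuned and that are not values from this plan:

- Log events arrive as `(timestamp, source_ip, records_read)` tuples.
- `10.0.0.0/8` is the internal network and must never be blocked.
- A source is suspicious if it reads more than 500 records within five minutes.

```python
import ipaddress
from collections import Counter
from datetime import datetime, timedelta

# Assumption: internal and application networks that must never be blocked.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def ips_to_block(events, now: datetime,
                 window: timedelta = timedelta(minutes=5),
                 threshold: int = 500) -> list[str]:
    """Return non-allowlisted source IPs that read more than `threshold`
    records in the `window` ending at `now`.

    `events` is an iterable of (timestamp, source_ip, records_read) tuples.
    """
    records_per_ip = Counter()
    for ts, ip, records in events:
        if now - window <= ts <= now:
            records_per_ip[ip] += records
    flagged = []
    for ip, total in records_per_ip.items():
        addr = ipaddress.ip_address(ip)
        if total > threshold and not any(addr in net for net in ALLOWED_NETWORKS):
            flagged.append(ip)
    return sorted(flagged)
```

Review the output by hand before blocking anything. A single shared egress IP, such as a corporate NAT or a partner integration, can also exceed a volume threshold.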
Long-term Containment:
- Patch vulnerable systems
- Update firewall rules permanently
- Implement additional monitoring
- Review and strengthen access controls
Step 4: Eradication
Goal: Remove threat completely
Actions:
- Identify root cause
  - Analyze logs
  - Review attack vectors
  - Determine entry point
- Remove malicious artifacts
  - Delete backdoors
  - Remove malware
  - Clean compromised accounts
- Patch vulnerabilities
  - Apply security updates
  - Fix code vulnerabilities
  - Harden configurations
Example (API Key Compromise):
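# 1. Revoke old key at the service that issued it (deleting the Key Vault
#    secret alone does not invalidate the key), then remove it from Key Vault
az keyvault secret delete --vault-name machineavatars-kv --name compromised-api-key
# 2. Generate new key (for self-issued keys; provider keys come from the provider)
NEW_KEY=$(openssl rand -base64 32)
# 3. Store in Key Vault
az keyvault secret set --vault-name machineavatars-kv --name new-api-key --value "$NEW_KEY"
# 4. Update all services (point them at new-api-key), then restart
kubectl rollout restart deployment/response-3d-chatbot
# 5. Verify old key no longer works (OLD_KEY holds the revoked value)
curl -s -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $OLD_KEY" https://api.example.com
# Expected: 401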
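For keys the platform issues itself, the Python standard library can handle generation and safe logging. In this sketch `generate_api_key` and `key_fingerprint` are hypothetical helpers. The fingerprint lets the incident log refer to a key without storing the key itself.

```python
import hashlib
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random key carrying num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)


def key_fingerprint(key: str) -> str:
    """Short SHA-256 fingerprint for referring to a key in incident logs."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


new_key = generate_api_key()
print(f"Generated key {key_fingerprint(new_key)}")  # log the fingerprint, never the key
```

Provider-issued keys, such as Azure OpenAI keys, cannot be created this way. They must be regenerated at the provider.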
Step 5: Recovery
Goal: Restore normal operations safely
Recovery Steps:
- Verify threat eliminated
  - Re-scan systems
  - Monitor for suspicious activity
  - Confirm patches applied
- Restore services incrementally
  # Blue-green deployment
  # 1. Deploy fixed version to staging
  # 2. Run security tests
  # 3. Gradually shift traffic to new version
  # 4. Monitor for 24 hours
  # 5. Complete cutover
- Enhanced monitoring
- Validate data integrity
  - Check for data tampering
  - Verify backups
  - Test critical functionality
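The incremental restore can be written as a loop that raises the new version's traffic share only while it stays healthy. This is a sketch under several assumptions:

- `error_rate` is a hypothetical, caller-supplied probe. It applies the given traffic split and then reports the new version's error rate.
- The step sizes and the 1% error threshold are illustrative.
- The 24-hour monitoring window is not modelled.

```python
from typing import Callable, Sequence


def shift_traffic(error_rate: Callable[[int], float],
                  steps: Sequence[int] = (10, 25, 50, 100),
                  max_error_rate: float = 0.01) -> tuple[list[int], bool]:
    """Move traffic to the new (green) version one step at a time.

    `error_rate(weight)` returns the green version's error rate while it
    receives `weight` percent of traffic. Returns (weights_tried, completed).
    If a step is unhealthy the rollout stops with completed=False, and
    traffic should go back to the old (blue) version.
    """
    tried: list[int] = []
    for weight in steps:
        tried.append(weight)
        if error_rate(weight) > max_error_rate:
            return tried, False
    return tried, True
```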
Recovery Validation Checklist:
- All services healthy
- No suspicious activity detected (24 hours)
- Vulnerability patched and verified
- Monitoring alerts configured
- Backups verified
- Customer notification sent (if required)
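The 24-hour item is easy to misread. The quiet period should start at the later of two times: when monitoring began, or the last suspicious event. It should not be counted back from the moment of the check. The sketch below uses a hypothetical function name and passes the checklist in as a dictionary mapping item names to their status.

```python
from datetime import datetime, timedelta
from typing import Optional

QUIET_ITEM = "No suspicious activity detected (24 hours)"


def unmet_recovery_checks(checks: dict[str, bool],
                          monitoring_started: datetime,
                          last_suspicious_event: Optional[datetime],
                          now: datetime,
                          quiet_period: timedelta = timedelta(hours=24)) -> list[str]:
    """Return checklist items that still block closing the incident."""
    unmet = [item for item, done in checks.items() if not done]
    # The quiet period restarts at any suspicious event seen after monitoring began.
    quiet_since = monitoring_started
    if last_suspicious_event is not None:
        quiet_since = max(quiet_since, last_suspicious_event)
    if now - quiet_since < quiet_period:
        unmet.append(QUIET_ITEM)
    return unmet
```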
Step 6: Post-Incident Review
Timeline: Within 5 business days of resolution
Post-Incident Report:
# Post-Incident Report: INC-2025-001
## Executive Summary
- **Incident:** Unauthorized database access
- **Duration:** 2 hours 45 minutes
- **Impact:** 1,250 user records accessed (no exfiltration confirmed)
- **Root Cause:** Hardcoded MongoDB connection string in public GitHub repository
## Timeline
- 10:30 - Initial detection
- 10:35 - Triage completed, P0 declared
- 10:40 - Containment: IP blocked, connection string rotated
- 11:15 - Eradication: GitHub repository made private, secrets removed
- 12:30 - Recovery: New connection string deployed, monitoring enhanced
- 13:15 - Incident resolved
## Impact Assessment
- **Users Affected:** 1,250
- **Data Accessed:** Email addresses and usernames (no passwords accessed; passwords are stored separately)
- **Business Impact:** Minimal service disruption (5 minutes downtime)
- **Financial Impact:** $0 (no ransom, no fines)
## Root Cause Analysis
**What Happened:**
1. Developer accidentally committed connection string to public GitHub repo
2. Attacker found exposed credentials via GitHub scanning
3. Attacker accessed database and queried user collection
4. Azure Security Center detected unusual access pattern
**Why It Happened:**
1. No pre-commit hooks to detect secrets
2. Developer not trained on secret management
3. Code review did not catch the issue
4. Repository set to public by mistake
## Actions Taken
**Immediate:**
- ✅ IP blocked
- ✅ Connection string rotated
- ✅ Repository made private
- ✅ All secrets removed from Git history
- ✅ Users notified (data breach notification)
**Short-term (1 week):**
- ✅ Pre-commit hooks installed (detect-secrets)
- ✅ All developers trained on secret management
- ✅ Code review process enhanced
- ✅ All repositories audited for secrets
**Long-term (1-3 months):**
- Migrate all secrets to Azure Key Vault
- Implement SAST in CI/CD
- Quarterly security training
- Annual penetration testing
## Lessons Learned
**What Went Well:**
- Quick detection (Azure Security Center)
- Rapid response (2h 45m total)
- Good communication (war room effective)
- No data exfiltration
**What Needs Improvement:**
- Secret management (hardcoded credentials)
- Developer training (security awareness)
- Code review (missed the secret)
- Repository permissions (shouldn't be public)
## Metrics
- **MTTD (Mean Time To Detect):** 5 minutes
- **MTTR (Mean Time To Respond):** 165 minutes (detection 10:30 to resolution 13:15)
- **MTTC (Mean Time To Contain):** 10 minutes (10:30 to 10:40)
- **MTTE (Mean Time To Eradicate):** 35 minutes (10:40 to 11:15)
- **Mean Time To Recover:** 75 minutes (11:15 to 12:30)
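The phase durations can be recomputed from the timeline above. This sketch assumes each phase is measured from the end of the previous one: detection to containment, containment to eradication, and eradication to recovery. Total response time runs from detection to resolution.

```python
from datetime import datetime

# Timeline from INC-2025-001 (all times UTC, same day).
TIMELINE = {
    "detected": "10:30",
    "contained": "10:40",
    "eradicated": "11:15",
    "recovered": "12:30",
    "resolved": "13:15",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two timeline milestones."""
    fmt = "%H:%M"
    delta = datetime.strptime(TIMELINE[end], fmt) - datetime.strptime(TIMELINE[start], fmt)
    return int(delta.total_seconds() // 60)


metrics = {
    "contain": minutes_between("detected", "contained"),
    "eradicate": minutes_between("contained", "eradicated"),
    "recover": minutes_between("eradicated", "recovered"),
    "respond": minutes_between("detected", "resolved"),
}
print(metrics)
```

Under these conventions the timeline gives 10, 35, 75 and 165 minutes respectively. Deriving the metrics from timestamps, rather than typing them in by hand, keeps the report internally consistent.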
## Sign-off
- **Incident Commander:** John Doe
- **Security Lead:** Jane Smith
- **CTO:** Bob Johnson
- **Date:** 2025-01-20
Incident Playbooks
Playbook 1: Data Breach
Triggers:
- Unauthorized database access
- Data exfiltration detected
- Customer data exposed
Response:
- Contain: Block attacker access
- Assess: Determine data accessed/exfiltrated
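- Notify: Legal and compliance immediately; under GDPR, the supervisory authority within 72 hours and affected users without undue delay when the breach is likely to pose a high risk to them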
- Investigate: How did breach occur?
- Remediate: Fix vulnerability
- Report: Regulatory authorities (if required)
Legal Requirements:
- GDPR: 72-hour notification to supervisory authority
- DPDPA 2023: Notification to the Data Protection Board of India and to each affected Data Principal, in the form and manner prescribed under the Act
- HIPAA: 60-day notification (if PHI involved)
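The limits above can be turned into concrete timestamps as soon as the team becomes aware of a breach. This sketch covers only GDPR (72 hours to the supervisory authority) and HIPAA (60 days). DPDPA timing is set by rules under the Act, so it is left out rather than guessed. These are outer limits: GDPR expects notification without undue delay. The names are hypothetical.

```python
from datetime import datetime, timedelta

# Outer limits only; both regimes expect notification as early as possible.
# DPDPA is omitted because its timing is set by rules under the Act.
NOTIFICATION_LIMITS = {
    "GDPR": timedelta(hours=72),   # to the supervisory authority, from awareness
    "HIPAA": timedelta(days=60),   # to affected individuals, from discovery
}


def notification_deadlines(aware_at: datetime, regimes: list[str]) -> dict[str, datetime]:
    """Latest permissible notification time for each applicable regime."""
    return {regime: aware_at + NOTIFICATION_LIMITS[regime] for regime in regimes}
```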
Playbook 2: DDoS Attack
Triggers:
- Massive traffic spike
- Service degradation
- Azure DDoS Protection activated
Response:
- Verify: Confirm DDoS (not legitimate traffic spike)
- Azure DDoS: Ensure mitigation active
- Scale: Increase infrastructure capacity
- CDN: Enable Azure Front Door (if not already)
- Monitor: Track attack patterns
- Communicate: Status page updates
- Post-Mortem: Analyze attack, improve defenses
Playbook 3: Ransomware
Triggers:
- Files encrypted
- Ransom note detected
- System locked
Response:
- DO NOT PAY RANSOM
- Isolate: Disconnect affected systems
- Identify: Determine ransomware variant
- Restore: From clean backups
- Scan: Full malware scan
- Patch: Fix entry point
- Report: Law enforcement (FBI, local authorities)
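Restoring "from clean backups" means picking a backup taken before the intrusion began, which can be well before the files were encrypted. The sketch below assumes a hypothetical backup inventory of `(taken_at, backup_id, verified)` records and an estimated compromise time from the investigation.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Backup(NamedTuple):
    taken_at: datetime
    backup_id: str
    verified: bool  # restore-tested and scanned clean


def latest_clean_backup(backups: list[Backup], compromised_at: datetime) -> Optional[str]:
    """Newest verified backup taken strictly before the estimated compromise time."""
    candidates = [b for b in backups if b.verified and b.taken_at < compromised_at]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b.taken_at).backup_id
```

If the compromise time is uncertain, pass the earliest plausible time. Restoring slightly older data is usually safer than restoring an attacker's foothold.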
Playbook 4: API Key Compromise
Triggers:
- Unexpected API usage spike
- API key found in public repository
- Third-party reports compromised key
Response:
- Revoke: Immediate key rotation
- Audit: Review API logs for abuse
- Assess: Financial impact (API bill)
- Notify: Affected customers/partners
- Investigate: How was key exposed?
- Prevent: Implement secret scanning
Example:
# Automated response (pseudocode; the helper functions are placeholders)
if detect_api_key_in_public_repo():
    revoke_api_key_immediately()
    generate_new_key()
    notify_security_team()
    create_incident_ticket()
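`detect_api_key_in_public_repo()` could be backed by a simple pattern scanner. The sketch below checks text against two illustrative patterns: a MongoDB URI with embedded credentials (the root cause in INC-2025-001) and a quoted `api_key`/`secret` assignment. The rule names and regexes are assumptions. Fetching repository contents is out of scope, so the wrapper takes a path-to-contents mapping. Dedicated tools such as detect-secrets cover far more cases.

```python
import re

# Illustrative rules only; dedicated scanners ship much larger rule sets.
SECRET_PATTERNS = {
    # MongoDB URI with a username:password pair before the host
    "mongodb_uri_with_password": re.compile(r"mongodb(?:\+srv)?://[^:\s/@]+:[^@\s]+@"),
    # Quoted api_key / secret assignment with a long token-like value
    "generic_api_key": re.compile(
        r"""(?i)\b(?:api[_-]?key|secret)\b\s*[:=]\s*['"][A-Za-z0-9_\-]{16,}['"]"""
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for each line that matches a rule."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits


def detect_api_key_in_public_repo(files: dict[str, str]) -> bool:
    """True if any file (path -> contents) contains a likely secret."""
    return any(find_secrets(contents) for contents in files.values())
```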
Playbook 5: Account Takeover
Triggers:
- Multiple failed login attempts
- Login from unusual location
- User report of unauthorized access
Response:
- Lock: Suspend compromised account
- Reset: Force password reset
- Revoke: Invalidate all sessions/tokens
- Audit: Review account activity
- Notify: User via verified email/phone
- Enable: MFA requirement
- Investigate: Phishing? Credential stuffing?
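The "multiple failed login attempts" trigger is usually a sliding-window counter per account. The thresholds in this sketch are assumptions (5 failures in 15 minutes by default), and the class name is hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


class FailedLoginTracker:
    """Per-account sliding window of failed logins (thresholds are assumptions).

    Events are assumed to arrive in chronological order.
    """

    def __init__(self, max_failures: int = 5, window: timedelta = timedelta(minutes=15)):
        self.max_failures = max_failures
        self.window = window
        self._failures: dict[str, deque] = {}

    def record_failure(self, account: str, at: datetime) -> bool:
        """Record a failed login; return True if the account should be locked."""
        recent = self._failures.setdefault(account, deque())
        recent.append(at)
        # Drop failures that have fallen out of the window.
        while at - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.max_failures

    def record_success(self, account: str) -> None:
        """A successful login clears the account's failure history."""
        self._failures.pop(account, None)
```

A per-account counter catches brute force against a single account. It does not catch credential stuffing, which typically tries a few passwords against many accounts. Detecting that pattern also needs counts per source IP or across accounts.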
Contact Information
Internal
Security Team:
- Email: security@machineavatars.com
- Slack: #security-incidents
- Phone: +91-98765-43210 (24/7)
Escalation:
- P0/P1: Notify CTO immediately
- P2: Security Lead within 4 hours
- P3: Email security team
External
Regulatory Authorities:
- GDPR (EU/EEA data subjects): competent EU supervisory authority
- DPDPA: Data Protection Board of India
Law Enforcement:
- Cyber Crime: Indian Cyber Crime Coordination Centre (I4C), cybercrime.gov.in
- FBI (if international): ic3.gov
Incident Metrics & KPIs
Track Monthly:
- Number of incidents by severity
- MTTD, MTTR, MTTC, MTTE
- False positive rate
- Recurring incident types
Goals:
- MTTD: <15 minutes
- MTTR (P0): <4 hours
- MTTR (P1): <24 hours
- False positive rate: <10%
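The monthly KPIs and goals above can be computed from closed incidents. This sketch counts false positives toward the false positive rate but excludes them from the MTTD and MTTR averages. That split is an assumption. The `ClosedIncident` record and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class ClosedIncident:
    severity: str            # "P0".."P3"
    time_to_detect: timedelta
    time_to_resolve: timedelta
    false_positive: bool


# Goals from the list above; each value is an exclusive upper bound.
GOALS = {
    "mttd": timedelta(minutes=15),
    "mttr_P0": timedelta(hours=4),
    "mttr_P1": timedelta(hours=24),
    "fp_rate": 0.10,
}


def _mean(values: list[timedelta]) -> Optional[timedelta]:
    return sum(values, timedelta()) / len(values) if values else None


def monthly_kpis(incidents: list[ClosedIncident]) -> dict:
    """False positives count toward fp_rate but not toward MTTD/MTTR."""
    real = [i for i in incidents if not i.false_positive]
    report = {
        "fp_rate": (len(incidents) - len(real)) / len(incidents) if incidents else 0.0,
        "mttd": _mean([i.time_to_detect for i in real]),
    }
    for sev in ("P0", "P1"):
        report[f"mttr_{sev}"] = _mean([i.time_to_resolve for i in real if i.severity == sev])
    return report


def missed_goals(report: dict) -> list[str]:
    """KPIs that did not stay below their goal (KPIs with no data are skipped)."""
    return [k for k, goal in GOALS.items() if report.get(k) is not None and report[k] >= goal]
```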
Related Documentation
Security:
- Secret Management - API key compromise response
- Authentication & Authorization - Account takeover response
- Security Testing - Vulnerability disclosure
"Hope for the best, prepare for the worst, respond with confidence." 🚨