UNPKG

@sebastienrousseau/dotfiles

Version:

The Trusted Shell Platform — Universal dotfiles managed by Chezmoi. Features Bash & Zsh for macOS, Linux & WSL. Rust modern tooling & enterprise-grade security.

580 lines (413 loc) 18.1 kB
--- render_with_liquid: false --- # Incident Response Plan Incident response procedures for the dotfiles repository. Covers supply chain compromise, secrets exposure, configuration drift, tool tampering, and CI pipeline attacks. Based on NIST SP 800-61 Rev. 2 (Computer Security Incident Handling Guide). ## Purpose and Scope This plan defines the detection, triage, containment, eradication, recovery, and post-mortem procedures for security incidents affecting the dotfiles distribution. ### In Scope | Domain | Description | |--------|-------------| | Supply chain | Dependency tampering in Homebrew, Nix, Zinit, Neovim plugins, npm, pip | | Secrets exposure | API keys, SSH keys, tokens leaked to git history, shell history, or logs | | Configuration drift | Unauthorized changes to deployed dotfiles, chezmoi apply failures | | Tool compromise | Malicious updates to mise, Nix, or Homebrew-managed binaries | | CI pipeline | GitHub Actions workflow tampering, compromised action dependencies | ### Out of Scope | Domain | Reason | |--------|--------| | Operating system vulnerabilities | Managed by OS vendor security updates | | Hardware compromise | Physical security is outside repository scope | | Third-party SaaS breaches | Upstream provider responsibility (1Password, GitHub) | --- ## Severity Classification Matrix Severity levels and response times align with the Vulnerability Response SLA defined in [COMPLIANCE.md](COMPLIANCE.md). | Severity | Definition | Examples | Initial Response | Resolution Target | |----------|------------|----------|------------------|-------------------| | **Critical** | Active exploitation or secrets exposed in public repository | Leaked API key in git history; compromised CI workflow pushing malicious code; active credential abuse | 24 hours | 48 hours | | **High** | Confirmed compromise with no evidence of active exploitation | Unsigned commits merged to main; supply chain dependency with known CVE; tampered binary in PATH | 72 hours | 7 days | | **Medium** | Policy violation or misconfiguration with limited blast radius | Chezmoi apply drift on non-sensitive config; TLS bypass pattern in script; stale Nix flake lock | 5 business days | 30 days | | **Low** | Informational finding, hardening opportunity, or cosmetic policy gap | Missing CODEOWNERS entry; advisory shellcheck warning; outdated pinned version | 10 business days | 90 days | --- ## Incident Response Phases ### Phase 1: Detection Identify the incident through automated or manual signals. **Automated detection sources:** | Source | Signal | Tool | |--------|--------|------| | Pre-commit hooks | Blocked secret, insecure pattern | Gitleaks, detect-secrets, compliance-guard | | CI pipeline | Failed security scan, unsigned commit | `security-enhanced.yml`, `compliance-guard.yml` | | Nightly checks | Dependency version drift, CVE match | `nightly.yml` | | CodeQL | Static analysis finding | `codeql.yml` | | Audit log | Unexpected operation | `dot audit` (`~/.local/share/dotfiles.log`) | **Manual detection signals:** - Unexpected `chezmoi diff` output on a clean system - Unrecognized entries in `git log --show-signature` - Binary hash mismatch for tools in `~/.local/bin` - Shell startup latency spike (possible injected sourcing) ### Phase 2: Triage Classify the incident severity and assign ownership. ```bash # Gather initial evidence dot audit | tail -50 git log --oneline --show-signature -20 chezmoi diff chezmoi verify # Check for secrets in recent history gitleaks detect --source . --log-opts="-20" # Validate binary integrity mise ls --current nix flake metadata ``` **Triage decision tree:** 1. Are credentials or secrets exposed publicly? -> **Critical** 2. Is a signed commit chain broken? -> **High** 3. Is the CI pipeline producing unexpected artifacts? -> **High** 4. Is configuration drift limited to non-sensitive files? -> **Medium** 5. Is the finding advisory with no active risk? -> **Low** ### Phase 3: Containment Stop the incident from spreading. Actions depend on severity. **Immediate containment (Critical/High):** ```bash # Revoke exposed credentials dot secrets rotate --all-exposed # Revoke SSH certificates dot ssh-cert revoke # Lock configuration files to prevent further modification lock-configs # Disable compromised CI workflow gh workflow disable <workflow-name> # Force-protect the main branch gh api repos/{owner}/{repo}/branches/master/protection \ --method PUT \ --field required_status_checks='{"strict":true,"contexts":["ci"]}' \ --field enforce_admins=true ``` **Short-term containment (Medium/Low):** ```bash # Pin the affected dependency to a known-good version # In .chezmoidata.toml or flake.nix, revert to last verified version # Re-apply known-good configuration chezmoi apply --force # Clear potentially tainted caches rm -rf ~/.cache/shell/ ``` ### Phase 4: Eradication Remove the root cause of the incident. | Incident Type | Eradication Action | |---------------|-------------------| | Secrets in git history | Rewrite history with `git filter-repo`; rotate all exposed credentials | | Compromised dependency | Pin to patched version; update `flake.lock` or `lazy-lock.json` | | Unsigned commits | Rebase and re-sign the commit chain; enforce branch protection | | Tampered binary | Reinstall from verified source; validate checksums | | CI workflow compromise | Audit workflow diff; re-pin actions to verified SHA; rotate `GITHUB_TOKEN` | ### Phase 5: Recovery Restore normal operations and verify integrity. ```bash # Re-apply dotfiles from clean source chezmoi init --apply # Verify deployed state matches source chezmoi verify # Run full test suite ./tests/framework/test_runner.sh # Run compliance checks pre-commit run --all-files # Validate system health dot health dot doctor # Confirm audit log captures recovery dot audit | tail -20 ``` ### Phase 6: Post-Mortem Conduct a structured review within 5 business days of resolution. Use the [Post-Incident Review Template](#post-incident-review-template) below. --- ## Runbooks ### Runbook 1: Secrets Leaked to Git History **Detection:** Gitleaks pre-commit hook, TruffleHog CI scan, or manual discovery. ```bash # Step 1: Identify exposed secrets gitleaks detect --source . --verbose --report-path /tmp/gitleaks-report.json # Step 2: Determine exposure window git log --all --oneline --diff-filter=A -- '**/.*env*' '**/*key*' '**/*token*' # Step 3: Rotate all exposed credentials immediately # API keys: regenerate in provider dashboard # SSH keys: generate new keypair and update authorized_keys ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_new # Tokens: revoke and reissue via provider # Step 4: Remove secrets from git history git filter-repo --invert-paths --path <file-containing-secret> # Step 5: Force-push cleaned history (requires branch protection override) git push origin --force --all # Step 6: Notify GitHub to purge cached views # Open a support ticket at https://support.github.com for cache invalidation # Step 7: Update Gitleaks baseline gitleaks detect --source . --baseline-path config/gitleaks-baseline.json # Step 8: Verify clean state gitleaks detect --source . --verbose ``` **Post-action:** Add the leaked pattern to `config/gitleaks.toml` allowlist if it was a false positive, or add a new rule if the pattern was not previously covered. ### Runbook 2: Supply Chain Compromise **Detection:** CVE advisory, unexpected binary behavior, hash mismatch, or `nightly.yml` alert. ```bash # Step 1: Identify the compromised package mise ls --current brew list --versions nix flake metadata # Step 2: Check upstream advisories gh api /advisories --jq '.[] | select(.package.name == "<package>")' # Step 3: Pin to last known-good version # For Nix: revert flake.lock to known-good commit git checkout <known-good-commit> -- flake.lock nix flake lock --update-input nixpkgs # For Homebrew: pin the formula brew pin <formula> # For mise: set explicit version mise use <tool>@<safe-version> # Step 4: Verify integrity of installed binary sha256sum "$(which <tool>)" # Compare against published checksums from upstream release # Step 5: Re-apply configuration with clean dependencies chezmoi apply --force # Step 6: Run full test suite to detect behavioral changes ./tests/framework/test_runner.sh ``` ### Runbook 3: Configuration Drift **Detection:** `chezmoi diff` shows unexpected changes, `chezmoi verify` fails, or `dot health` reports drift. ```bash # Step 1: Identify drift scope chezmoi diff chezmoi managed --include=files | wc -l # Step 2: Capture current deployed state for forensic comparison chezmoi dump --format=json > /tmp/chezmoi-state-$(date +%s).json # Step 3: Check audit log for unauthorized operations dot audit | grep -E "(apply|edit|add)" | tail -20 # Step 4: Determine root cause # Option A: Local manual edit (benign) # Option B: External tool modified config (investigate) # Option C: Compromised apply (escalate to Critical) # Step 5: Restore to source-of-truth state chezmoi apply --force # Step 6: Verify restoration chezmoi verify chezmoi diff # Should produce no output # Step 7: Lock critical configs to prevent recurrence lock-configs ``` ### Runbook 4: Tool Compromise (mise/Nix Managed) **Detection:** Unexpected binary behavior, hash mismatch, or CVE disclosure for a managed tool. ```bash # Step 1: Quarantine the affected tool chmod 000 "$(which <tool>)" # Step 2: Record forensic evidence sha256sum "$(which <tool>)" > /tmp/quarantine-evidence.txt ls -la "$(which <tool>)" >> /tmp/quarantine-evidence.txt file "$(which <tool>)" >> /tmp/quarantine-evidence.txt # Step 3: Check if the tool executed during shell startup grep "<tool>" ~/.cache/shell/* grep "<tool>" ~/.local/share/dotfiles.log # Step 4: Reinstall from verified source mise install <tool>@<verified-version> --force # Or for Nix: nix profile remove <package> nix profile install nixpkgs#<package> # Step 5: Verify replacement binary sha256sum "$(which <tool>)" # Step 6: Clear cached eval output that may reference the compromised tool rm -rf ~/.cache/shell/ # Step 7: Restart shell and verify clean startup exec "$SHELL" -l dot health ``` ### Runbook 5: CI Pipeline Compromise **Detection:** Unexpected workflow behavior, modified workflow files, compromised action dependency. ```bash # Step 1: Disable the affected workflow gh workflow disable <workflow-name> # Step 2: Audit recent workflow changes git log --oneline --all -- '.github/workflows/' git diff HEAD~10 -- '.github/workflows/' # Step 3: Check action pinning integrity grep -r "uses:" .github/workflows/ | grep -v "@" # Find unpinned actions # Step 4: Verify no unauthorized secrets access gh api repos/{owner}/{repo}/actions/runs \ --jq '.workflow_runs[] | {id, name, conclusion, head_sha}' | head -20 # Step 5: Review workflow permissions grep -r "permissions:" .github/workflows/ # Step 6: Restore workflows from known-good state git checkout <known-good-commit> -- .github/workflows/ # Step 7: Re-pin all actions to verified SHA # Replace tag references with full commit SHA # Example: actions/checkout@v4 -> actions/checkout@<full-sha> # Step 8: Re-enable workflow and verify gh workflow enable <workflow-name> gh workflow run <workflow-name> gh run list --workflow=<workflow-name> --limit=1 ``` --- ## Communication Protocol ### Notification Matrix | Severity | Notify | Channel | Timeframe | |----------|--------|---------|-----------| | Critical | Repository owner, all contributors | GitHub Security Advisory, direct message | Immediate | | High | Repository owner | GitHub Issue (private), direct message | Within 24 hours | | Medium | Repository owner | GitHub Issue | Within 5 business days | | Low | Tracked in backlog | GitHub Issue with `security` label | Next review cycle | ### Escalation Path ```text 1. Incident detected └── Automated: CI/hook blocks and logs └── Manual: Reporter opens private advisory 2. Triage (within Initial Response SLA) └── Classify severity └── Assign owner 3. Escalation triggers └── No response within SLA -> escalate to next severity level └── Scope expansion -> reclassify severity upward └── Active exploitation confirmed -> immediate Critical classification ``` ### External Notification | Condition | Action | |-----------|--------| | Leaked credentials for third-party service | Notify the service provider to revoke/rotate | | Compromised upstream dependency | Open issue on upstream repository | | GitHub Actions vulnerability | Report via GitHub security advisory | --- ## Evidence Preservation Preserve all forensic evidence before performing eradication or recovery actions. ### Evidence Collection Checklist | Evidence | Command | Storage | |----------|---------|---------| | Dotfiles audit log | `cp ~/.local/share/dotfiles.log /tmp/incident-$(date +%s)/` | Local archive | | Git reflog | `git reflog > /tmp/incident-$(date +%s)/reflog.txt` | Local archive | | Git signatures | `git log --show-signature -50 > /tmp/incident-$(date +%s)/signatures.txt` | Local archive | | Chezmoi state | `chezmoi dump --format=json > /tmp/incident-$(date +%s)/chezmoi-state.json` | Local archive | | Binary hashes | `sha256sum ~/.local/bin/* > /tmp/incident-$(date +%s)/binary-hashes.txt` | Local archive | | CI run logs | `gh run view <run-id> --log > /tmp/incident-$(date +%s)/ci-log.txt` | Local archive | | Shell cache | `cp -r ~/.cache/shell/ /tmp/incident-$(date +%s)/shell-cache/` | Local archive | ### Structured Log Format Incident evidence is recorded in JSONL format for automated processing: ```bash # Append structured incident event to log log_incident() { local severity="$1" type="$2" description="$3" printf '{"timestamp":"%s","severity":"%s","type":"%s","description":"%s","user":"%s","hostname":"%s"}\n' \ "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \ "$severity" \ "$type" \ "$description" \ "$(whoami)" \ "$(hostname)" \ >> ~/.local/share/incident-log.jsonl } # Usage log_incident "critical" "secrets-exposure" "API key found in commit abc1234" ``` ### Retention Policy | Evidence Type | Retention Period | |---------------|-----------------| | Incident logs (JSONL) | 1 year | | Git reflog snapshots | 90 days | | Binary hash records | Until next verified release | | CI run logs | 90 days (GitHub default) | --- ## Recovery Procedures ### Standard Recovery ```bash # Roll back to a known-good dotfiles state dot rollback # Restore specific configuration files dot restore <target> # Full re-apply from source of truth chezmoi init --apply --force # Verify system health dot health dot doctor ``` ### Full System Recovery For incidents requiring complete re-provisioning: ```bash # Step 1: Export current secrets (if not compromised) dot secrets export > /tmp/secrets-backup.age # Step 2: Clear all deployed dotfiles chezmoi purge # Step 3: Clear all caches rm -rf ~/.cache/shell/ rm -rf ~/.cache/chezmoi/ # Step 4: Re-initialize from clean clone git clone <repo-url> ~/.dotfiles cd ~/.dotfiles git verify-commit HEAD # Verify signature chain # Step 5: Re-apply chezmoi init --apply # Step 6: Re-import secrets (if exported) dot secrets import /tmp/secrets-backup.age # Step 7: Verify chezmoi verify dot health ./tests/framework/test_runner.sh ``` ### Recovery Verification Checklist | Check | Command | Expected | |-------|---------|----------| | Chezmoi state clean | `chezmoi diff` | No output | | All managed files present | `chezmoi verify` | Exit code 0 | | Test suite passes | `./tests/framework/test_runner.sh` | All assertions pass | | Health dashboard green | `dot health` | No errors | | Signatures valid | `git log --show-signature -5` | All commits signed | | Pre-commit hooks active | `pre-commit run --all-files` | All hooks pass | --- ## Post-Incident Review Template Conduct a review within 5 business days of resolution. Copy the template below into a new file under `docs/security/incidents/`. ````markdown # Post-Incident Review: [INCIDENT-YYYY-NNN] ## Metadata | Field | Value | |-------|-------| | **Date detected** | YYYY-MM-DD HH:MM UTC | | **Date resolved** | YYYY-MM-DD HH:MM UTC | | **Severity** | Critical / High / Medium / Low | | **Incident type** | Secrets exposure / Supply chain / Drift / Tool compromise / CI compromise | | **Responder** | @handle | ## Timeline | Time (UTC) | Event | |------------|-------| | HH:MM | Incident detected by [source] | | HH:MM | Triage completed, classified as [severity] | | HH:MM | Containment action taken: [description] | | HH:MM | Root cause identified: [description] | | HH:MM | Eradication completed | | HH:MM | Recovery verified | ## Root Cause [Describe the root cause. Include the specific technical failure, misconfiguration, or external event.] ## Impact | Dimension | Assessment | |-----------|------------| | Data exposed | [What data, if any, was exposed] | | Systems affected | [Which machines, configs, or pipelines] | | Duration of exposure | [Time between introduction and remediation] | ## Response Evaluation | Metric | Target | Actual | |--------|--------|--------| | Detection time | Automated / < 1 hour | [actual] | | Initial response | Per severity SLA | [actual] | | Resolution | Per severity SLA | [actual] | ## Lessons Learned ### What went well - [Item] ### What needs improvement - [Item] ## Action Items | Action | Owner | Due Date | Status | |--------|-------|----------|--------| | [Preventive measure] | @handle | YYYY-MM-DD | Open | | [Detection improvement] | @handle | YYYY-MM-DD | Open | | [Process update] | @handle | YYYY-MM-DD | Open | ```` --- ## References - [NIST SP 800-61 Rev. 2 — Computer Security Incident Handling Guide](https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final) - [COMPLIANCE.md — Vulnerability Response SLA](COMPLIANCE.md) - [THREAT_MODEL.md — Attack Surfaces and Mitigations](THREAT_MODEL.md) - [SECURITY.md — Security Controls and Hardening](SECURITY.md) - [SLSA Framework](https://slsa.dev/) - [Gitleaks](https://github.com/gitleaks/gitleaks)