# Architecture Decision Record: JWT vs Server-Side Sessions (Complete Example)

## Document Metadata

**ADR ID**: ADR-003
**Date**: 2026-01-15
**Status**: Accepted
**Authors**: Software Architect, Security Auditor
**Reviewers**: Technical Lead, DevOps Engineer, Security Team
**Supersedes**: None
**Superseded By**: None

---

## Title

Use JWT (JSON Web Tokens) with Server-Side Session Storage for Authentication

---

## Status

**Accepted** (2026-01-20)

**Decision History**:
- 2026-01-15: Proposed
- 2026-01-18: Reviewed by security team
- 2026-01-20: Accepted by architecture review board
- 2026-01-25: Implementation started

---

## Context

Our e-commerce platform requires a secure, scalable authentication mechanism to support user login, session management, and access control for protected resources. We need to decide on a session management architecture for our microservices-based system.

### Business Drivers

1. **Scale Requirements**: Support 100,000+ concurrent users during peak traffic (Black Friday, product launches)
2. **Security Requirements**: Prevent session hijacking and CSRF attacks, and comply with PCI DSS 3.2.1
3. **User Experience**: Fast authentication (<2s) and seamless navigation across services
4. **Cost Constraints**: Minimize infrastructure costs while maintaining performance

### Technical Context

**Current System**:
- Microservices architecture with 12 backend services
- API Gateway (Kong) routing requests to services
- Existing PostgreSQL database for user accounts
- Redis available for caching

**Constraints**:
- Must support both web (SPA) and mobile app clients
- Services deployed across multiple AWS regions (us-east-1, eu-west-1)
- Cannot share session state via the database (too slow)
- Must support horizontal scaling of all services

### Forces in Tension

| Force | JWT | Server-Side Sessions |
|-------|-----|----------------------|
| **Scalability** | Stateless, no session storage needed | Requires shared session store (Redis) |
| **Security** | Cannot revoke tokens before expiry | Can immediately revoke sessions |
| **Performance** | No database/cache lookup per request | Redis lookup on every authenticated request |
| **Simplicity** | No infrastructure for session storage | Requires Redis cluster |
| **Token Size** | Large tokens (~1KB) increase bandwidth | Small session ID (~32 bytes) |
| **Logout** | Complex (requires blacklist) | Simple (delete session) |
| **Multi-Region** | Works without cross-region state | Requires replicated session store |

---

## Decision

We will use **JWT (JSON Web Tokens) signed with RS256** for authentication, **combined with server-side session storage in Redis** to track active sessions.

### Hybrid Approach

This hybrid approach combines the benefits of both patterns:

1. **JWT as the authentication token** (stateless verification)
2. **Redis session store** to track active sessions (revocation capability)

### Specific Design

**Token Format**:

```json
{
  "header": {
    "alg": "RS256",
    "typ": "JWT"
  },
  "payload": {
    "sub": "user_id_123",
    "email": "alice@example.com",
    "iat": 1706515200,
    "exp": 1706601600,
    "session_id": "sess_abc123",
    "roles": ["customer"]
  },
  "signature": "..."
}
```

**Session Storage** (Redis):

```
Key: session:{session_id}
Value: {user_id, email, ip_address, user_agent, created_at}
TTL: 24 hours (86400 seconds)
```

**Verification Flow**:

1. Service receives a request with the JWT in the `Authorization: Bearer` header or the `session_token` cookie
2. Service verifies the JWT signature using the public key (cached locally) and rejects tokens whose `exp` has passed
3. If the signature is valid, extract `session_id` from the JWT payload
4. Check Redis for session existence: `EXISTS session:{session_id}`
5. If the session exists, the request is authenticated
6. If the session is missing, reject the request (session revoked)

**Session Revocation**:
- Logout: `DEL session:{session_id}` in Redis
- The JWT is rejected immediately, even though its signature is still valid

---

## Rationale

### Why This Decision

1. **Performance**: JWT verification is fast (signature check with cached public key); identity and roles come from the token, so the only per-request store access is a single `EXISTS` check rather than a full session fetch
2. **Scalability**: Services verify JWTs independently and read no session data from the central store; the only shared dependency is the lightweight existence check
3. **Security**: Sessions can be revoked immediately via Redis deletion (addresses JWT's main weakness)
4. **Simplicity**: Redis is already deployed for caching, so we reuse existing infrastructure
5. **Cost**: Redis session storage is cheap (32-byte session keys × 100K users = 3.2MB for the keys; the metadata values add a few hundred bytes per session, still only tens of MB), minimal cost impact

### Why Not Pure JWT (Stateless Only)

**Rejected because**:
- Cannot immediately revoke sessions on logout, password change, or account compromise
- Would need a JWT blacklist, which negates the statelessness benefits
- Token size increases with more user claims (1KB+ impacts bandwidth)
- Security team requires immediate revocation capability (PCI DSS control)

### Why Not Pure Server-Side Sessions (Stateful Only)

**Rejected because**:
- Every authenticated request requires a Redis read of the full session record (adds latency)
- Session data grows with user attributes and must be fetched on every request (only the client-side session ID stays small, 30-50 bytes versus ~1KB for a claims-heavy JWT)
- Harder to scale horizontally (all services need Redis access)
- More complex multi-region setup (Redis replication complexity)

### Trade-Offs We're Accepting

| Trade-Off | Impact | Mitigation |
|-----------|--------|------------|
| **Complexity** | Two-step verification (JWT + Redis check) | Encapsulate in auth middleware library |
| **Redis Dependency** | Services depend on Redis availability | Redis cluster with replication, circuit breaker on failure |
| **Dual Storage** | Session data in both JWT and Redis | Keep JWT minimal; Redis stores only session metadata |
| **Token Size** | JWT is still ~500 bytes (HTTP overhead) | Use HTTP compression; consider a separate mobile token format |

---

## Consequences

### Positive Consequences

**Performance**:
- JWT verification: ~5ms (signature check)
- Redis session check: ~2ms (simple `EXISTS` query)
- Total: ~7ms authentication overhead per request

**Security**:
- Immediate session revocation on logout (fixes JWT's main weakness)
- Token forgery prevented by RS256 signature
- CSRF protection via `SameSite=Strict` cookies

**Scalability**:
- Services scale horizontally without session affinity
- Redis cluster scales to 1M+ sessions
- Multi-region support via Redis replication

**Developer Experience**:
- Standard JWT libraries available in all major languages
- Middleware abstracts complexity
- Easy to test (mock Redis, use test JWTs)

**Monitoring**:
- Track active sessions via Redis key count
- Detect anomalies (a sudden session spike can indicate credential stuffing)
- Audit logout events

### Negative Consequences

**Infrastructure Dependency**:
- Services depend on Redis availability
- **Mitigation**: Redis cluster with 99.95% uptime SLA, circuit breaker fallback

**Complexity**:
- Two-step verification adds cognitive load
- **Mitigation**: Auth middleware library hides complexity; comprehensive documentation

**Token Size**:
- JWT ~500 bytes vs session ID ~32 bytes (~15x larger)
- **Impact**: 500 bytes × 1M requests/day = 500MB/day extra bandwidth
- **Mitigation**: HTTP compression reduces this to ~200 bytes, an acceptable cost

**Eventual Consistency**:
- Multi-region Redis replication has ~50ms lag
- **Impact**: A revoked session might be accepted for a brief window in other regions
- **Mitigation**: Use regional logout (redirect to the home region for logout)

### Operational Impacts

**Infrastructure**:
- Requires a Redis cluster (3 nodes minimum for high availability)
- Estimated cost: $150/month (AWS ElastiCache r6g.medium × 3)
- Public/private key pair for JWT signing (rotate quarterly)

**Monitoring**:
- New metrics: jwt_verification_time, session_check_time, active_session_count
- New alerts: redis_session_store_unavailable, jwt_signature_verification_failed

**Deployment**:
- All services must update their auth middleware to hybrid verification
- Rollout plan: canary deployment, monitor error rates
- Rollback plan: revert to the old session-only middleware

---

## Alternatives Considered

### Alternative 1: Pure JWT (Stateless Only)

**Description**: Use JWT without server-side session storage. Services verify the JWT signature only.
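
The weakness that rules out this alternative is mechanical: a signature-only check cannot tell a logged-out token from a live one. The sketch below is a minimal illustration assuming Node.js 16+ and only the built-in `node:crypto` module (no JWT library). It signs and verifies RS256 tokens, then contrasts signature-only verification with the hybrid design's session check. The names `signRS256`, `verifyStateless`, `verifyHybrid`, and `InMemorySessionStore` are hypothetical, and the in-memory map merely stands in for Redis `SET ... EX`, `EXISTS`, and `DEL`; production code would use a vetted JWT library and a real Redis client.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Claim shape mirroring the Token Format example (email and roles omitted for brevity).
interface Claims {
  sub: string;
  session_id: string;
  iat: number;
  exp: number;
}

const b64url = (s: string): string => Buffer.from(s, "utf8").toString("base64url");

// RS256 = RSASSA-PKCS1-v1_5 with SHA-256 over "<header>.<payload>" (RFC 7518, section 3.3).
function signRS256(claims: Claims, privateKey: KeyObject): string {
  const header = b64url(JSON.stringify({ alg: "RS256", typ: "JWT" }));
  const payload = b64url(JSON.stringify(claims));
  const sig = sign("sha256", Buffer.from(`${header}.${payload}`), privateKey);
  return `${header}.${payload}.${sig.toString("base64url")}`;
}

// Alternative 1: signature and expiry only, no server-side state.
function verifyStateless(token: string, publicKey: KeyObject, nowSec: number): Claims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, sig] = parts;
  try {
    const { alg } = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
    if (alg !== "RS256") return null; // never let the token pick its own algorithm
    const data = Buffer.from(`${header}.${payload}`);
    if (!verify("sha256", data, publicKey, Buffer.from(sig, "base64url"))) return null;
    const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as Claims;
    return claims.exp > nowSec ? claims : null;
  } catch {
    return null; // malformed header or payload JSON
  }
}

// Hypothetical in-memory stand-in for Redis (SET with EX, EXISTS, DEL).
class InMemorySessionStore {
  private expiries = new Map<string, number>(); // session_id -> expiry (epoch seconds)
  set(sessionId: string, ttlSec: number, nowSec: number): void {
    this.expiries.set(sessionId, nowSec + ttlSec);
  }
  exists(sessionId: string, nowSec: number): boolean {
    const expiry = this.expiries.get(sessionId);
    return expiry !== undefined && expiry > nowSec;
  }
  del(sessionId: string): void {
    this.expiries.delete(sessionId);
  }
}

// Chosen hybrid design: a valid signature AND a live session entry.
function verifyHybrid(token: string, publicKey: KeyObject, store: InMemorySessionStore, nowSec: number): Claims | null {
  const claims = verifyStateless(token, publicKey, nowSec);
  return claims && store.exists(claims.session_id, nowSec) ? claims : null;
}

// Demo: log in, log out, then present the same token again.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
const now = 1706515200;
const store = new InMemorySessionStore();
store.set("sess_abc123", 86400, now);
const token = signRS256({ sub: "user_id_123", session_id: "sess_abc123", iat: now, exp: now + 86400 }, privateKey);

store.del("sess_abc123"); // logout
console.log("stateless accepts after logout:", verifyStateless(token, publicKey, now + 60) !== null);
console.log("hybrid accepts after logout:", verifyHybrid(token, publicKey, store, now + 60) !== null);
```

Running it logs `true` for the stateless check and `false` for the hybrid check: the token's signature stays valid after logout, and only the session lookup rejects it. Both checks still reject a token whose `exp` has passed or whose payload was altered after signing.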

**Pros**:
- Simplest architecture (no session store needed)
- Best performance (no Redis query)
- Natural fit for microservices (fully stateless)

**Cons**:
- **Cannot revoke sessions immediately** (deal-breaker for security team)
- Logout requires a JWT blacklist (negates statelessness)
- Larger tokens (1KB+) if storing user attributes
- Token refresh complexity (refresh tokens need storage anyway)

**Why Rejected**: The security requirements for immediate session revocation and server-enforced idle timeout (PCI DSS 8.1.8) cannot be met without server-side state.

---

### Alternative 2: Pure Server-Side Sessions (Stateful Only)

**Description**: Traditional session storage in Redis. The client receives an opaque session ID (32-byte random string), and the server looks up the full session data on every request.

**Pros**:
- Simple session revocation (delete Redis key)
- Small session IDs (32 bytes)
- Centralized session management

**Cons**:
- **Redis query on every request** (adds 2-5ms latency)
- All services tightly coupled to Redis
- Scaling challenge: Redis becomes a single point of contention
- Multi-region complexity: session replication across regions

**Why Rejected**: A full session fetch on every request adds measurable latency (5ms × 1M requests/day = 5,000 seconds, about 1.4 hours of cumulative latency per day). The performance requirement is <2s for p95 requests, and we prefer to limit per-request store access to a single `EXISTS` check (~2ms) rather than a full session fetch (up to 5ms).

---

### Alternative 3: OAuth 2.0 + OpenID Connect (OIDC)

**Description**: Delegate authentication to an external identity provider (Auth0, Okta, Keycloak).

**Pros**:
- Industry-standard protocol
- Offload authentication complexity
- Built-in MFA, social login, SSO

**Cons**:
- **Vendor lock-in** (Auth0 pricing scales with MAU)
- External dependency (Auth0 downtime = our downtime)
- Cost: $150/month → $1500/month at 100K MAU
- Latency: External redirect adds 200-500ms

**Why Rejected**: Cost and vendor lock-in. We have the competency to build auth in-house. May revisit for enterprise SSO in the future.

---

### Alternative 4: Session Cookies Only (Traditional)

**Description**: Server sets an encrypted session cookie, no JWT. The cookie contains an encrypted session ID.

**Pros**:
- Simple, proven pattern
- Small cookies (~100 bytes encrypted)
- Native browser support

**Cons**:
- **Mobile app support harder** (cookies work differently on mobile)
- CSRF vulnerability (requires additional CSRF tokens)
- Cross-domain complications (API on api.example.com, app on example.com)

**Why Rejected**: The mobile app requires an API token pattern (`Authorization` header), not cookies. The hybrid JWT+cookie approach supports both web and mobile.

---

## Implementation Notes

### Migration Plan

**Phase 1**: Deploy JWT auth middleware (2 weeks)
- Services support both old session-only and new JWT+session verification
- Feature flag: `jwt_auth_enabled=false` (default)

**Phase 2**: Enable JWT for 5% of traffic (1 week)
- Monitor error rates and latency
- Collect feedback

**Phase 3**: Ramp to 100% (2 weeks)
- Gradual rollout: 5% → 25% → 50% → 100%
- Each ramp step waits 3 days while metrics are monitored

**Phase 4**: Remove old session-only code (1 week)
- Cleanup, documentation

**Total Timeline**: 6 weeks

### Key Implementation Decisions

**JWT Signing Algorithm**: RS256 (asymmetric)
- Services only need the public key (no secret distribution)
- Key rotation is easier (rotate the private key, publish the new public key)

**Token Expiry**: 24 hours
- Balances security (shorter = better) and UX (longer = fewer re-logins)
- Refresh tokens not needed (user re-authenticates after 24h)

**Session Storage Schema**:

```
Key: session:{session_id}
Value: JSON {user_id, email, ip_address, user_agent, created_at, last_active_at}
TTL: 86400 seconds (24 hours)
```

**Auth Middleware Library** (Node.js example):

```typescript
// @src/middleware/auth.ts
// Import paths and helper names below are illustrative.
import { extractToken, verifyJWT } from '../auth/JWTService';
import { redis } from '../auth/SessionManager';
import { UnauthorizedError, type User } from '../auth/types';

export async function authenticateRequest(req: Request): Promise<User> {
  const token = extractToken(req); // Authorization: Bearer header or session_token cookie
  const claims = verifyJWT(token); // Verify RS256 signature and exp with the cached public key
  // A valid signature is not enough: logout deletes the Redis key, so check it still exists
  const sessionExists = await redis.exists(`session:${claims.session_id}`);
  if (!sessionExists) {
    throw new UnauthorizedError('Session revoked');
  }
  return { id: claims.sub, email: claims.email };
}
```

### Testing Strategy

**Unit Tests**:
- JWT signature verification
- Session Redis operations
- Token extraction from headers/cookies

**Integration Tests**:
- Full auth flow (login → JWT verify → logout)
- Session revocation scenarios
- Expired token handling

**Load Tests**:
- 10,000 concurrent authenticated requests
- Target: p95 < 50ms for auth middleware

---

## Security Considerations

### Threat Model

**T1: Token Theft (XSS)**
- **Mitigation**: HttpOnly cookies (JavaScript cannot access them)
- **Mitigation**: Content Security Policy (CSP) headers

**T2: Token Theft (Man-in-the-Middle)**
- **Mitigation**: HTTPS only (`Strict-Transport-Security` header)
- **Mitigation**: `Secure` cookie attribute (only sent over HTTPS)

**T3: Session Fixation**
- **Mitigation**: Generate a new session_id on login (invalidate the old session)

**T4: Session Hijacking**
- **Mitigation**: Bind session to IP address (check in Redis)
- **Mitigation**: Short TTL (24 hours)

**T5: CSRF (Cross-Site Request Forgery)**
- **Mitigation**: `SameSite=Strict` cookie attribute
- **Mitigation**: Check `Origin`/`Referer` headers

**T6: Token Replay**
- **Mitigation**: Short TTL (24 hours)
- **Mitigation**: Session revocation on logout

### Compliance

**PCI DSS 3.2.1**:
- Requirement 8.1.8: "Session timeout after 15 minutes of inactivity"
  - **Our implementation**: Session TTL refreshed on activity (sliding idle timeout), with a 24-hour absolute timeout enforced by the JWT `exp`; meeting the 15-minute limit requires an idle TTL of at most 900 seconds
- Requirement 8.2.1: "Strong cryptography for authentication credentials"
  - **Our implementation**: RS256 (2048-bit RSA keys), bcrypt password hashing

**GDPR**:
- Article 32: "Appropriate technical measures"
  - **Our implementation**: RS256-signed tokens sent only over HTTPS (the payload is signed, not encrypted), secure session storage

---

## Monitoring & Observability

### Metrics

```
# Authentication performance
auth.jwt_verification_time (histogram, p50/p95/p99)
auth.session_check_time (histogram, p50/p95/p99)
auth.total_time (histogram, p50/p95/p99)

# Session metrics
auth.active_sessions (gauge)
auth.sessions_created_total (counter)
auth.sessions_revoked_total (counter)

# Error metrics
auth.jwt_signature_invalid_total (counter)
auth.session_not_found_total (counter)
auth.redis_unavailable_total (counter)
```

### Alerts

```yaml
- name: RedisSessionStoreUnavailable
  condition: auth.redis_unavailable_total > 10 in 5m
  severity: critical
  action: Page on-call engineer

- name: HighJWTSignatureFailures
  condition: auth.jwt_signature_invalid_total > 100 in 5m
  severity: warning
  action: Notify security team (possible attack)

- name: AuthLatencyHigh
  condition: auth.total_time p95 > 50ms
  severity: warning
  action: Notify engineering team
```

---

## Future Considerations

### Items Out of Scope (For Now)

1. **Refresh Tokens**: The current design requires re-login after 24 hours. If user feedback demands longer sessions, we'll add refresh tokens (stored in Redis with a 30-day TTL).
2. **Multi-Factor Authentication (MFA)**: Not addressed in this ADR. A separate ADR is planned for MFA (likely TOTP-based).
3. **OAuth 2.0 / OpenID Connect**: Not implementing external identity providers now. May revisit for enterprise customers needing SSO.
4. **Biometric Authentication**: WebAuthn support for mobile apps. Future enhancement.
5. **Anonymous/Guest Sessions**: The current design requires authentication. Guest checkout may need a separate mechanism.

### Review Triggers

This decision should be reviewed if:
- Active session count exceeds 1M (Redis scaling required)
- Users complain about re-login frequency (consider refresh tokens)
- A security audit identifies weaknesses (immediate review)
- New authentication standards emerge (e.g., Passkeys become mainstream)

**Scheduled Review Date**: 2027-01-15 (1 year from the ADR date)

---

## References

**Requirements**:
- @.aiwg/requirements/use-cases/UC-AUTH-001-user-authentication.md - User authentication use case
- @.aiwg/requirements/nfr-modules/security.md - Security requirements (PCI DSS, GDPR)
- @.aiwg/requirements/nfr-modules/performance.md - Performance requirements (<2s login)

**Architecture**:
- @.aiwg/architecture/software-architecture-doc.md#authentication-service - Authentication service design
- @.aiwg/architecture/decisions/ADR-005-bcrypt-password-hashing.md - Password hashing decision
- @.aiwg/architecture/diagrams/authentication-flow.md - Authentication flow diagram

**Security**:
- @.aiwg/security/threat-models/authentication-threat-model.md - Authentication threat model
- @.aiwg/security/controls/CTRL-002-session-management.md - Session management control

**Implementation**:
- @src/auth/JWTService.ts - JWT generation and verification
- @src/auth/SessionManager.ts - Redis session storage
- @src/middleware/AuthMiddleware.ts - Authentication middleware

**Research**:
- [RFC 7519 - JSON Web Token (JWT)](https://tools.ietf.org/html/rfc7519)
- [OWASP Session Management Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html)
- [Auth0: JWT vs Server-Side Sessions](https://auth0.com/blog/stateless-auth-for-stateful-minds/)

**Standards**:
- PCI DSS 3.2.1 - Payment Card Industry Data Security Standard
- GDPR Article 32 - Security of processing

---

## Why This Example is Effective

### Comprehensive Context
- Explains business drivers, technical constraints, and forces in tension
- Shows we understood the problem space deeply before deciding

### Rigorous Alternative Analysis
- 4 alternatives considered with specific pros/cons
- Clear rejection rationale for each alternative
- Shows we didn't jump to the first solution

### Concrete Consequences
- Quantified impacts: 500MB/day bandwidth, $150/month cost, 7ms latency
- Specific monitoring, deployment, and operational plans
- Both positive and negative consequences documented honestly

### Actionable Implementation
- Migration plan with timeline (6 weeks, 4 phases)
- Code examples showing the actual implementation
- Testing strategy covering unit, integration, and load tests

### Security Rigor
- Threat model with 6 specific threats and mitigations
- Compliance mapping (PCI DSS, GDPR)
- Consideration of attack vectors (XSS, MITM, CSRF, etc.)

### Future-Aware
- Explicit items out of scope with rationale
- Review triggers and scheduled review date
- Acknowledged what we don't know yet

---

## Anti-Patterns to Avoid

### ❌ Decision Without Context

**Bad**: "We're using JWT because it's modern."
**Good**: "JWT addresses scalability (stateless), but we need session storage for revocation (security requirement)."

### ❌ No Alternatives Considered

**Bad**: Only explaining the chosen solution
**Good**: 4 alternatives analyzed, each with rejection rationale

### ❌ Vague Consequences

**Bad**: "This will improve performance."
**Good**: "JWT verification: ~5ms, Redis check: ~2ms, total: ~7ms per request (quantified performance impact)."

### ❌ Missing Trade-Offs

**Bad**: Only listing benefits
**Good**: "We accept 15x larger tokens (500 bytes vs 32) for immediate revocation capability."

### ❌ No Implementation Guidance

**Bad**: Stopping at the high-level decision
**Good**: Code examples, migration plan, testing strategy, monitoring setup

---

**ADR Version**: 1.0
**Template Version**: 1.0
**Example Author**: Software Architect
**Last Updated**: 2026-01-28
**Quality Review**: Passed (Security Auditor, Technical Lead)