aiwg
Version:
Cognitive architecture for AI-augmented software development with structured memory, ensemble validation, and closed-loop correction. FAIR-aligned artifacts, 84% cost reduction via human-in-the-loop, standards adopted by 100+ organizations.
579 lines (421 loc) • 19.8 kB
Markdown
# Architecture Decision Record: JWT vs Server-Side Sessions (Complete Example)
## Document Metadata
**ADR ID**: ADR-003
**Date**: 2026-01-15
**Status**: Accepted
**Authors**: Software Architect, Security Auditor
**Reviewers**: Technical Lead, DevOps Engineer, Security Team
**Supersedes**: None
**Superseded By**: None
## Title
Use JWT (JSON Web Tokens) with Server-Side Session Storage for Authentication
## Status
**Accepted** (2026-01-20)
**Decision History**:
- 2026-01-15: Proposed
- 2026-01-18: Reviewed by security team
- 2026-01-20: Accepted by architecture review board
- 2026-01-25: Implementation started
## Context
Our e-commerce platform requires a secure, scalable authentication mechanism to support user login, session management, and access control for protected resources. We need to make a decision on session management architecture for our microservices-based system.
### Business Drivers
1. **Scale Requirements**: Support 100,000+ concurrent users during peak traffic (Black Friday, product launches)
2. **Security Requirements**: Prevent session hijacking, CSRF attacks, and comply with PCI DSS 3.2.1
3. **User Experience**: Fast authentication (<2s), seamless navigation across services
4. **Cost Constraints**: Minimize infrastructure costs while maintaining performance
### Technical Context
**Current System**:
- Microservices architecture with 12 backend services
- API Gateway (Kong) routing requests to services
- Existing PostgreSQL database for user accounts
- Redis available for caching
**Constraints**:
- Must support both web (SPA) and mobile app clients
- Services deployed across multiple AWS regions (us-east-1, eu-west-1)
- Cannot share session state via database (too slow)
- Must support horizontal scaling of all services
### Forces in Tension
| Force | JWT Favors | Server-Side Sessions Favor |
|-------|------------|---------------------------|
| **Scalability** | ✅ Stateless, no session storage needed | ❌ Requires shared session store (Redis) |
| **Security** | ❌ Cannot revoke tokens before expiry | ✅ Can immediately revoke sessions |
| **Performance** | ✅ No database/cache lookup per request | ❌ Redis lookup on every authenticated request |
| **Simplicity** | ✅ No infrastructure for session storage | ❌ Requires Redis cluster |
| **Token Size** | ❌ Large tokens (~1KB) increase bandwidth | ✅ Small session ID (~32 bytes) |
| **Logout** | ❌ Complex (requires blacklist) | ✅ Simple (delete session) |
| **Multi-Region** | ✅ Works without cross-region state | ❌ Requires replicated session store |
## Decision
We will use **JWT (JSON Web Tokens) signed with RS256** for authentication **combined with server-side session storage in Redis** to track active sessions.
### Hybrid Approach
This hybrid approach combines the benefits of both patterns:
1. **JWT as the authentication token** (stateless verification)
2. **Redis session store** to track active sessions (revocation capability)
### Specific Design
**Token Format**:
```json
{
"header": {
"alg": "RS256",
"typ": "JWT"
},
"payload": {
"sub": "user_id_123",
"email": "alice@example.com",
"iat": 1706515200,
"exp": 1706601600,
"session_id": "sess_abc123",
"roles": ["customer"]
},
"signature": "..."
}
```
**Session Storage** (Redis):
```
Key: session:{session_id}
Value: {user_id, email, ip_address, user_agent, created_at}
TTL: 24 hours (86400 seconds)
```
**Verification Flow**:
1. Service receives request with JWT in `Authorization: Bearer` header or `session_token` cookie
2. Service verifies JWT signature using public key (cached locally)
3. If signature valid, extract `session_id` from JWT payload
4. Check Redis for session existence: `EXISTS session:{session_id}`
5. If session exists, request is authenticated
6. If session missing, reject request (session revoked)
**Session Revocation**:
- Logout: `DEL session:{session_id}` in Redis
- JWT becomes invalid immediately even though signature is still valid
## Rationale
### Why This Decision
1. **Performance**: JWT verification is fast (signature check with cached public key), no database/cache query for authentication
2. **Scalability**: Services can verify JWTs independently without central session store query on every request
3. **Security**: Can revoke sessions immediately via Redis deletion (addresses JWT's main weakness)
4. **Simplicity**: Redis is already deployed for caching; reuse existing infrastructure
5. **Cost**: Redis session storage is cheap (32 bytes × 100K users = 3.2MB), minimal cost impact
### Why Not Pure JWT (Stateless Only)
**Rejected because**:
- Cannot immediately revoke sessions on logout, password change, or account compromise
- Would need JWT blacklist, which negates statelessness benefits
- Token size increases with more user claims (1KB+ impacts bandwidth)
- Security team requires immediate revocation capability (PCI DSS control)
### Why Not Pure Server-Side Sessions (Stateful Only)
**Rejected because**:
- Every authenticated request requires Redis query (adds latency)
- Session data grows with user attributes (30-50 bytes vs 1KB JWT)
- Harder to scale horizontally (all services need Redis access)
- More complex multi-region setup (Redis replication complexity)
### Trade-Offs We're Accepting
| Trade-Off | Impact | Mitigation |
|-----------|--------|------------|
| **Complexity** | Two-step verification (JWT + Redis check) | Encapsulate in auth middleware library |
| **Redis Dependency** | Services depend on Redis availability | Redis cluster with replication, circuit breaker on failure |
| **Dual Storage** | Session data in both JWT and Redis | Keep JWT minimal, Redis stores only session metadata |
| **Token Size** | JWT is still ~500 bytes (HTTP overhead) | Use HTTP compression, consider separate mobile token format |
## Consequences
### Positive Consequences
✅ **Performance**:
- JWT verification: ~5ms (signature check)
- Redis session check: ~2ms (simple EXISTS query)
- Total: ~7ms authentication overhead per request
✅ **Security**:
- Immediate session revocation on logout (fixes JWT's main weakness)
- Token forgery prevented by RS256 signature
- CSRF protection via `SameSite=Strict` cookies
✅ **Scalability**:
- Services scale horizontally without session affinity
- Redis cluster scales to 1M+ sessions
- Multi-region support via Redis replication
✅ **Developer Experience**:
- Standard JWT libraries available in all languages
- Middleware abstracts complexity
- Easy to test (mock Redis, use test JWTs)
✅ **Monitoring**:
- Track active sessions via Redis key count
- Detect anomalies (sudden session spike = credential stuffing)
- Audit logout events
### Negative Consequences
❌ **Infrastructure Dependency**:
- Services depend on Redis availability
- **Mitigation**: Redis cluster with 99.95% uptime SLA, circuit breaker fallback
❌ **Complexity**:
- Two-step verification adds cognitive load
- **Mitigation**: Auth middleware library hides complexity, comprehensive documentation
❌ **Token Size**:
- JWT ~500 bytes vs session ID ~32 bytes (15x larger)
- **Impact**: 500 bytes × 1M requests/day = 500MB/day extra bandwidth
- **Mitigation**: HTTP compression reduces to ~200 bytes, acceptable cost
❌ **Eventual Consistency**:
- Multi-region Redis replication has ~50ms lag
- **Impact**: Revoked session might be accepted for brief window in other regions
- **Mitigation**: Use regional logout (redirect to home region for logout)
### Operational Impacts
**Infrastructure**:
- Requires Redis cluster (3 nodes minimum for high availability)
- Estimated cost: $150/month (AWS ElastiCache r6g.medium × 3)
- Public/private key pair for JWT signing (rotate quarterly)
**Monitoring**:
- New metrics: jwt_verification_time, session_check_time, active_session_count
- New alerts: redis_session_store_unavailable, jwt_signature_verification_failed
**Deployment**:
- All services must update auth middleware to hybrid verification
- Rollout plan: canary deployment, monitor error rates
- Rollback plan: revert to old session-only middleware
## Alternatives Considered
### Alternative 1: Pure JWT (Stateless Only)
**Description**: Use JWT without server-side session storage. Services verify JWT signature only.
**Pros**:
- Simplest architecture (no session store needed)
- Best performance (no Redis query)
- Natural fit for microservices (fully stateless)
**Cons**:
- ❌ **Cannot revoke sessions immediately** (deal-breaker for security team)
- ❌ Logout requires JWT blacklist (negates statelessness)
- ❌ Larger tokens (1KB+) if storing user attributes
- ❌ Token refresh complexity (refresh tokens need storage anyway)
**Why Rejected**: Security requirement for immediate session revocation (PCI DSS 8.1.8) cannot be met.
### Alternative 2: Pure Server-Side Sessions (Stateful Only)
**Description**: Traditional session storage in Redis. Client receives opaque session ID (32-byte random string), server looks up full session data on every request.
**Pros**:
- Simple session revocation (delete Redis key)
- Small session IDs (32 bytes)
- Centralized session management
**Cons**:
- ❌ **Redis query on every request** (adds 2-5ms latency)
- ❌ All services tightly coupled to Redis
- ❌ Scaling challenge: Redis becomes single point of contention
- ❌ Multi-region complexity: session replication across regions
**Why Rejected**: Redis query per request adds measurable latency (5ms × 1M requests/day = 1.4 hours of cumulative latency). Performance requirement is <2s for p95 requests; cannot afford 5ms auth overhead.
### Alternative 3: OAuth 2.0 + OpenID Connect (OIDC)
**Description**: Delegate authentication to external identity provider (Auth0, Okta, Keycloak).
**Pros**:
- Industry-standard protocol
- Offload authentication complexity
- Built-in MFA, social login, SSO
**Cons**:
- ❌ **Vendor lock-in** (Auth0 pricing scales with MAU)
- ❌ External dependency (Auth0 downtime = our downtime)
- ❌ Cost: $150/month → $1500/month at 100K MAU
- ❌ Latency: External redirect adds 200-500ms
**Why Rejected**: Cost and vendor lock-in. We have competency to build auth in-house. May revisit for enterprise SSO in future.
### Alternative 4: Session Cookies Only (Traditional)
**Description**: Server sets encrypted session cookie, no JWT. Cookie contains encrypted session ID.
**Pros**:
- Simple, proven pattern
- Small cookies (~100 bytes encrypted)
- Native browser support
**Cons**:
- ❌ **Mobile app support harder** (cookies work differently on mobile)
- ❌ CSRF vulnerability (requires additional CSRF tokens)
- ❌ Cross-domain complications (API on api.example.com, app on example.com)
**Why Rejected**: Mobile app requires API token pattern (Authorization header), not cookies. Hybrid JWT+cookie approach supports both web and mobile.
## Implementation Notes
### Migration Plan
**Phase 1**: Deploy JWT auth middleware (2 weeks)
- Services support both old session-only and new JWT+session
- Feature flag: `jwt_auth_enabled=false` (default)
**Phase 2**: Enable JWT for 5% traffic (1 week)
- Monitor error rates, latency
- Collect feedback
**Phase 3**: Ramp to 100% (2 weeks)
- Gradual rollout: 5% → 25% → 50% → 100%
- Each ramp waits 3 days, monitors metrics
**Phase 4**: Remove old session-only code (1 week)
- Cleanup, documentation
**Total Timeline**: 6 weeks
### Key Implementation Decisions
**JWT Signing Algorithm**: RS256 (asymmetric)
- Services only need public key (no secret distribution)
- Key rotation easier (rotate private key, publish new public key)
**Token Expiry**: 24 hours
- Balances security (shorter = better) and UX (longer = fewer re-logins)
- Refresh tokens not needed (user re-authenticates after 24h)
**Session Storage Schema**:
```
Key: session:{session_id}
Value: JSON {user_id, email, ip_address, user_agent, created_at, last_active_at}
TTL: 86400 seconds (24 hours)
```
**Auth Middleware Library** (Node.js example):
```typescript
// @src/middleware/auth.ts
export async function authenticateRequest(req: Request): Promise<User> {
const token = extractToken(req); // From header or cookie
const claims = verifyJWT(token); // Verify signature
const sessionExists = await redis.exists(`session:${claims.session_id}`);
if (!sessionExists) {
throw new UnauthorizedError('Session revoked');
}
return { id: claims.sub, email: claims.email };
}
```
### Testing Strategy
**Unit Tests**:
- JWT signature verification
- Session Redis operations
- Token extraction from headers/cookies
**Integration Tests**:
- Full auth flow (login → JWT → verify → logout)
- Session revocation scenarios
- Expired token handling
**Load Tests**:
- 10,000 concurrent authenticated requests
- Target: p95 < 50ms for auth middleware
## Security Considerations
### Threat Model
**T1: Token Theft (XSS)**
- **Mitigation**: HttpOnly cookies (JavaScript cannot access)
- **Mitigation**: Content Security Policy (CSP) headers
**T2: Token Theft (Man-in-the-Middle)**
- **Mitigation**: HTTPS only (Strict-Transport-Security header)
- **Mitigation**: Secure cookie attribute (only sent over HTTPS)
**T3: Session Fixation**
- **Mitigation**: Generate new session_id on login (invalidate old session)
**T4: Session Hijacking**
- **Mitigation**: Bind session to IP address (check in Redis)
- **Mitigation**: Short TTL (24 hours)
**T5: CSRF (Cross-Site Request Forgery)**
- **Mitigation**: SameSite=Strict cookie attribute
- **Mitigation**: Check Origin/Referer headers
**T6: Token Replay**
- **Mitigation**: Short TTL (24 hours)
- **Mitigation**: Session revocation on logout
### Compliance
**PCI DSS 3.2.1**:
- ✅ Requirement 8.1.8: "Session timeout after 15 minutes of inactivity"
- **Our implementation**: Session TTL refreshed on activity, absolute timeout 24 hours
- ✅ Requirement 8.2.1: "Strong cryptography for authentication credentials"
- **Our implementation**: RS256 (2048-bit RSA keys), bcrypt password hashing
**GDPR**:
- ✅ Article 32: "Appropriate technical measures"
- **Our implementation**: Encrypted tokens, secure session storage
## Monitoring & Observability
### Metrics
```
# Authentication performance
auth.jwt_verification_time (histogram, p50/p95/p99)
auth.session_check_time (histogram, p50/p95/p99)
auth.total_time (histogram, p50/p95/p99)
# Session metrics
auth.active_sessions (gauge)
auth.sessions_created_total (counter)
auth.sessions_revoked_total (counter)
# Error metrics
auth.jwt_signature_invalid_total (counter)
auth.session_not_found_total (counter)
auth.redis_unavailable_total (counter)
```
### Alerts
```yaml
- name: RedisSessionStoreUnavailable
condition: auth.redis_unavailable_total > 10 in 5m
severity: critical
action: Page on-call engineer
- name: HighJWTSignatureFailures
condition: auth.jwt_signature_invalid_total > 100 in 5m
severity: warning
action: Notify security team (possible attack)
- name: AuthLatencyHigh
condition: auth.total_time p95 > 50ms
severity: warning
action: Notify engineering team
```
## Future Considerations
### Items Out of Scope (For Now)
1. **Refresh Tokens**: Current design requires re-login after 24 hours. If user feedback demands longer sessions, we'll add refresh tokens (stored in Redis with 30-day TTL).
2. **Multi-Factor Authentication (MFA)**: Not addressed in this ADR. Separate ADR planned for MFA (likely TOTP-based).
3. **OAuth 2.0 / OpenID Connect**: Not implementing external identity providers now. May revisit for enterprise customers needing SSO.
4. **Biometric Authentication**: WebAuthn support for mobile apps. Future enhancement.
5. **Anonymous/Guest Sessions**: Current design requires authentication. Guest checkout may need separate mechanism.
### Review Triggers
This decision should be reviewed if:
- Active session count exceeds 1M (Redis scaling required)
- User complaints about re-login frequency (consider refresh tokens)
- Security audit identifies weaknesses (immediate review)
- New authentication standards emerge (e.g., Passkeys become mainstream)
**Scheduled Review Date**: 2027-01-15 (1 year from acceptance)
## References
**Requirements**:
- @.aiwg/requirements/use-cases/UC-AUTH-001-user-authentication.md - User authentication use case
- @.aiwg/requirements/nfr-modules/security.md - Security requirements (PCI DSS, GDPR)
- @.aiwg/requirements/nfr-modules/performance.md - Performance requirements (<2s login)
**Architecture**:
- @.aiwg/architecture/software-architecture-doc.md#authentication-service - Authentication service design
- @.aiwg/architecture/decisions/ADR-005-bcrypt-password-hashing.md - Password hashing decision
- @.aiwg/architecture/diagrams/authentication-flow.md - Authentication flow diagram
**Security**:
- @.aiwg/security/threat-models/authentication-threat-model.md - Authentication threat model
- @.aiwg/security/controls/CTRL-002-session-management.md - Session management control
**Implementation**:
- @src/auth/JWTService.ts - JWT generation and verification
- @src/auth/SessionManager.ts - Redis session storage
- @src/middleware/AuthMiddleware.ts - Authentication middleware
**Research**:
- [RFC 7519 - JSON Web Token (JWT)](https://tools.ietf.org/html/rfc7519)
- [OWASP Session Management Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html)
- [Auth0: JWT vs Server-Side Sessions](https://auth0.com/blog/stateless-auth-for-stateful-minds/)
**Standards**:
- PCI DSS 3.2.1 - Payment Card Industry Data Security Standard
- GDPR Article 32 - Security of processing
## Why This Example is Effective
### Comprehensive Context
- Explains business drivers, technical constraints, and forces in tension
- Shows we understood the problem space deeply before deciding
### Rigorous Alternative Analysis
- 4 alternatives considered with specific pros/cons
- Clear rejection rationale for each alternative
- Shows we didn't jump to first solution
### Concrete Consequences
- Quantified impacts: 500MB/day bandwidth, $150/month cost, 7ms latency
- Specific monitoring, deployment, and operational plans
- Both positive and negative consequences documented honestly
### Actionable Implementation
- Migration plan with timeline (6 weeks, 4 phases)
- Code examples showing actual implementation
- Testing strategy covering unit, integration, load tests
### Security Rigor
- Threat model with 6 specific threats and mitigations
- Compliance mapping (PCI DSS, GDPR)
- Consideration of attack vectors (XSS, MITM, CSRF, etc.)
### Future-Aware
- Explicit items out of scope with rationale
- Review triggers and scheduled review date
- Acknowledged what we don't know yet
## Anti-Patterns to Avoid
### ❌ Decision Without Context
**Bad**: "We're using JWT because it's modern."
**Good**: "JWT addresses scalability (stateless) but we need session storage for revocation (security requirement)."
### ❌ No Alternatives Considered
**Bad**: Only explaining chosen solution
**Good**: 4 alternatives analyzed, each with rejection rationale
### ❌ Vague Consequences
**Bad**: "This will improve performance."
**Good**: "JWT verification: ~5ms, Redis check: ~2ms, total: ~7ms per request (quantified performance impact)."
### ❌ Missing Trade-Offs
**Bad**: Only listing benefits
**Good**: "We accept 15x larger tokens (500 bytes vs 32) for immediate revocation capability."
### ❌ No Implementation Guidance
**Bad**: Stopping at high-level decision
**Good**: Code examples, migration plan, testing strategy, monitoring setup
**ADR Version**: 1.0
**Template Version**: 1.0
**Example Author**: Software Architect
**Last Updated**: 2026-01-28
**Quality Review**: Passed (Security Auditor, Technical Lead)