# @gati-framework/observability
Complete observability stack for Gati applications with Prometheus, Grafana, Loki, and distributed tracing.
## Features
- ✅ **Prometheus Metrics** - Automatic metrics collection and exposure
- ✅ **Grafana Dashboards** - Pre-built dashboards for monitoring
- ✅ **Loki Integration** - Structured log aggregation
- ✅ **Distributed Tracing** - OpenTelemetry-based request tracing
- ✅ **Custom Metrics** - Easy custom metric creation
- ✅ **Auto-instrumentation** - Zero-code instrumentation
## Installation
```bash
pnpm add @gati-framework/observability
```
## Quick Start
```typescript
import express from 'express';
import { ObservabilityStack } from '@gati-framework/observability';

const app = express();

const observability = new ObservabilityStack({
  serviceName: 'my-gati-app',
  serviceVersion: '1.0.0',
  environment: 'production',
  prometheus: true,
  loki: {
    host: 'http://loki:3100',
    labels: { team: 'platform' },
  },
  tracing: {
    serviceName: 'my-gati-app',
    autoInstrument: true,
    metricsPort: 9464,
  },
});

// Apply middleware to your Express/HTTP server
const middleware = observability.getMiddleware();
app.use(...middleware);

// Expose metrics endpoint (top-level await requires an ES module)
app.get('/metrics', await observability.getMetricsHandler());
```
## Prometheus Metrics
### Built-in Metrics
- `gati_http_requests_total` - Total HTTP requests
- `gati_http_request_duration_seconds` - Request duration histogram
- `gati_active_connections` - Active connection count
- `gati_errors_total` - Total errors by type and route
- Process metrics (CPU, memory, etc.)
### Custom Metrics
```typescript
const { metrics } = observability;

// Create custom counter
const orderCounter = metrics.createCounter(
  'orders_total',
  'Total number of orders',
  ['status']
);
orderCounter.inc({ status: 'completed' });

// Create custom gauge
const queueSize = metrics.createGauge(
  'queue_size',
  'Current queue size',
  ['queue_name']
);
queueSize.set({ queue_name: 'emails' }, 42);

// Create custom histogram
const queryDuration = metrics.createHistogram(
  'db_query_duration_seconds',
  'Database query duration',
  ['query_type'],
  [0.01, 0.05, 0.1, 0.5, 1]
);
queryDuration.observe({ query_type: 'select' }, 0.043);
```
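Choosing the bucket array passed as the last argument to `createHistogram` can be easier with a small generator. The helper below is a hypothetical, standalone sketch that is not part of this package; the name `exponentialBuckets` and its parameters are assumptions made for illustration. It produces exponentially spaced upper bounds, which tends to suit latency distributions with a long tail.

```typescript
// Hypothetical helper, not part of @gati-framework/observability.
// Returns `count` bucket upper bounds: start, start*factor, start*factor^2, ...
function exponentialBuckets(start: number, factor: number, count: number): number[] {
  if (start <= 0 || factor <= 1 || count < 1) {
    throw new RangeError('start must be > 0, factor > 1, and count >= 1');
  }
  const buckets: number[] = [];
  let bound = start;
  for (let i = 0; i < count; i++) {
    buckets.push(bound);
    bound *= factor;
  }
  return buckets;
}

// Five buckets from 5 ms to 80 ms, doubling at each step.
const fastQueryBuckets = exponentialBuckets(0.005, 2, 5);
console.log(fastQueryBuckets);
```

The resulting array could then be passed in place of the literal `[0.01, 0.05, 0.1, 0.5, 1]` used for `db_query_duration_seconds`.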
## Loki Logging
### Structured Logging
```typescript
const { logger } = observability;

logger.info('User logged in', {
  userId: '123',
  email: 'user@example.com',
});

logger.warn('Rate limit exceeded', {
  ip: '192.168.1.1',
  limit: 100,
});

logger.error('Database connection failed', new Error('Connection timeout'), {
  database: 'users',
  retries: 3,
});
```
### Child Loggers
```typescript
const requestLogger = logger.child({
  requestId: 'abc-123',
  userId: '456',
});

requestLogger.info('Processing request');
requestLogger.debug('Query executed', { query: 'SELECT ...' });
```
## Distributed Tracing
### Automatic Instrumentation
When `autoInstrument` is enabled in the `tracing` config, HTTP requests are traced automatically with no changes to your handlers.
### Manual Spans
```typescript
const { tracing } = observability;

// Create and manage spans manually.
// `orderId`, `amount`, and `processPayment` come from your application code.
await tracing.withSpan('process-order', async (span) => {
  span.setAttribute('order.id', orderId);
  span.setAttribute('order.amount', amount);

  // Your business logic here
  const result = await processPayment(orderId);
  tracing.addEvent('payment-processed', { orderId });
  return result;
}, {
  'order.type': 'standard',
});
```
### Trace Context
```typescript
// Get current trace ID for correlation
const traceId = tracing.getTraceContext();
console.log('Trace ID:', traceId);
```
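To correlate logs with traces, the trace ID can be attached to the structured fields of every log entry for a request. The sketch below assumes that `getTraceContext()` yields the trace ID as a string, or `undefined` when no span is active; that is inferred from the snippet above and not confirmed against the package's type definitions. `withTraceId` and `LogFields` are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper: merges a trace ID into structured log fields.
// Assumption: the trace ID is a string, or undefined when no span is active.
type LogFields = Record<string, unknown>;

function withTraceId(fields: LogFields, traceId: string | undefined): LogFields {
  // Return an unchanged copy when there is no active trace.
  if (!traceId) return { ...fields };
  return { ...fields, traceId };
}

// Example with a made-up trace ID.
const fields = withTraceId({ userId: '123' }, '4bf92f3577b34da6a3ce929d0e0e4736');
console.log(fields);
```

In an application this might be combined with a child logger, for example `logger.child(withTraceId({ requestId }, tracing.getTraceContext()))`, provided the return type matches the assumption above.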
## Grafana Dashboards
A pre-built Grafana dashboard ships with the package at:
`node_modules/@gati-framework/observability/dashboards/gati-overview.json`
### Import Dashboard
1. Open Grafana
2. Go to Dashboards → Import
3. Upload `gati-overview.json`
4. Select your Prometheus data source
5. Click Import
### Dashboard Panels
- HTTP Request Rate
- Request Duration (p95, p99)
- Active Connections
- Error Rate by Type
- Memory Usage
- CPU Usage
- Status Code Distribution
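The p95 and p99 panels are derived from the buckets of a histogram such as `gati_http_request_duration_seconds` rather than from raw samples, so their precision depends on the bucket boundaries. The following is a rough, standalone sketch of that estimation using linear interpolation within a bucket, similar in spirit to Prometheus's `histogram_quantile`. The `Bucket` type, the `estimateQuantile` function, and the sample counts are all made up for illustration and are not taken from the bundled dashboard.

```typescript
// One cumulative histogram bucket: `count` observations were <= `le` seconds.
interface Bucket {
  le: number;
  count: number;
}

// Estimates quantile q (0..1) by linear interpolation inside the bucket
// that contains the target rank. Assumes observations are non-negative,
// so the first bucket's lower bound is taken to be 0.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted.length > 0 ? sorted[sorted.length - 1].count : 0;
  if (total === 0) return NaN;

  const rank = q * total;
  let lowerBound = 0;
  let lowerCount = 0;
  for (const bucket of sorted) {
    if (bucket.count >= rank && bucket.count > lowerCount) {
      // The +Inf bucket has no upper bound to interpolate towards.
      if (!Number.isFinite(bucket.le)) return lowerBound;
      const fraction = (rank - lowerCount) / (bucket.count - lowerCount);
      return lowerBound + (bucket.le - lowerBound) * fraction;
    }
    lowerBound = bucket.le;
    lowerCount = bucket.count;
  }
  return lowerBound;
}

// Made-up sample data: 100 requests in total.
const sampleBuckets: Bucket[] = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 100 },
  { le: Infinity, count: 100 },
];
const p95 = estimateQuantile(0.95, sampleBuckets);
console.log(p95);
```

With this sample the estimate lands about halfway between the 0.5 s and 1 s bounds, which illustrates why buckets that are too coarse around your typical latency make the percentile panels imprecise.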
## Configuration
### Prometheus Configuration
Create `prometheus.yml`:
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'gati-app'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics'
```
### Loki Configuration
```yaml
# loki-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/index
    cache_location: /tmp/loki/cache
  filesystem:
    directory: /tmp/loki/chunks
```
## Docker Compose Example
```yaml
version: '3'

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
```
## Best Practices
1. **Use structured logging** - Always include relevant context in logs
2. **Set appropriate metric buckets** - Tailor histogram buckets to your latency profile
3. **Create custom metrics** - Track business KPIs alongside technical metrics
4. **Use trace context** - Include trace IDs in logs for correlation
5. **Monitor cardinality** - Avoid high-cardinality label values
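As an illustration of the cardinality point, raw request paths such as `/users/123` create one time series per distinct ID if used directly as a label value. A minimal sketch of one way to avoid that is to collapse variable path segments before labelling. `normalizeRoute` is a hypothetical helper, not part of this package, and the placeholder names `:id` and `:uuid` are arbitrary choices.

```typescript
// Hypothetical helper: collapses variable path segments so that
// `/users/123` and `/users/456` share one `route` label value.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function normalizeRoute(path: string): string {
  return path
    .split('?')[0] // drop the query string
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id';
      if (UUID_PATTERN.test(segment)) return ':uuid';
      return segment;
    })
    .join('/');
}

console.log(normalizeRoute('/users/123/orders/987?page=2')); // prints "/users/:id/orders/:id"
```

If your framework already exposes the matched route pattern, using that pattern as the label is usually simpler than rewriting paths.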
## API Reference
See the [TypeScript definitions](./src/index.ts) for complete API documentation.
## License
MIT © Krishna Paul