@clduab11/gemini-flow

# Service Startup and Shutdown Procedures

## Overview

This document provides step-by-step procedures for safely starting up and shutting down Google Services components in the Gemini-Flow platform.

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Startup Procedures](#startup-procedures)
3. [Shutdown Procedures](#shutdown-procedures)
4. [Health Verification](#health-verification)
5. [Rollback Procedures](#rollback-procedures)
6. [Automation Scripts](#automation-scripts)

## Prerequisites

### Required Access

- [ ] Google Cloud Console access with Project Editor role
- [ ] Kubernetes cluster access (if using GKE)
- [ ] Monitoring dashboard access (Grafana/Cloud Monitoring)
- [ ] PagerDuty/alerting system access

### Pre-Startup Checklist

- [ ] Verify all dependencies are healthy
- [ ] Check service account credentials
- [ ] Confirm database connectivity
- [ ] Validate configuration files
- [ ] Review recent error logs

## Startup Procedures

### 1. Core Services Startup

#### 1.1 Vertex AI Connector

```bash
# Set up Vertex AI credentials (interactive login, or a service account key file)
gcloud auth application-default login
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# Start Vertex AI connector
npm run start:vertex-ai

# Verify health
curl -f http://localhost:8080/health/vertex-ai || exit 1
```

#### 1.2 Google Workspace Integration

```bash
# Configure OAuth2 credentials
export GOOGLE_WORKSPACE_CLIENT_ID="your-client-id"
export GOOGLE_WORKSPACE_CLIENT_SECRET="your-client-secret"

# Start workspace service
npm run start:workspace

# Health check
curl -f http://localhost:8081/health/workspace || exit 1
```

#### 1.3 Streaming API Services

```bash
# Start multimodal streaming
npm run start:streaming

# Verify WebRTC connectivity
npm run test:webrtc-connectivity

# Check streaming endpoints
curl -f http://localhost:8082/health/streaming || exit 1
```

### 2. Agent Services Startup

#### 2.1 AgentSpace Manager

```bash
# Initialize agent memory
npm run init:agent-memory

# Start AgentSpace service
npm run start:agentspace

# Verify agent spawning capability
npm run test:agent-spawn || exit 1
```

#### 2.2 A2A Protocol Manager

```bash
# Start A2A message router
npm run start:a2a-router

# Initialize consensus protocols
npm run init:consensus

# Health verification
curl -f http://localhost:8083/health/a2a || exit 1
```

### 3. Media Services Startup

#### 3.1 Veo3 Video Generation

```bash
# Start video generation service
npm run start:veo3

# Verify GPU allocation
nvidia-smi
kubectl get nodes -l accelerator=nvidia-tesla-v100

# Test video generation endpoint
curl -f http://localhost:8084/health/veo3 || exit 1
```

#### 3.2 Imagen4 Image Generation

```bash
# Start image generation service
npm run start:imagen4

# Check storage connectivity
gsutil ls gs://your-media-bucket

# Health check
curl -f http://localhost:8085/health/imagen4 || exit 1
```

### 4. Research Services Startup

#### 4.1 Co-Scientist Integration

```bash
# Start research coordinator
npm run start:co-scientist

# Verify academic database connections
npm run test:academic-db-connectivity

# Health verification
curl -f http://localhost:8086/health/co-scientist || exit 1
```

#### 4.2 Project Mariner

```bash
# Start browser automation service
npm run start:mariner

# Verify Puppeteer/Selenium grid
npm run test:browser-grid

# Health check
curl -f http://localhost:8087/health/mariner || exit 1
```

### 5. Post-Startup Verification

```bash
#!/bin/bash
# Complete system health check

echo "=== Gemini-Flow Startup Verification ==="

services=(
    "vertex-ai:8080"
    "workspace:8081"
    "streaming:8082"
    "agentspace:8083"
    "a2a-router:8083"
    "veo3:8084"
    "imagen4:8085"
    "co-scientist:8086"
    "mariner:8087"
)

for service in "${services[@]}"; do
    name=${service%%:*}   # text before the colon
    port=${service##*:}   # text after the colon

    if curl -f --max-time 10 "http://localhost:$port/health" > /dev/null 2>&1; then
        echo "✅ $name service healthy"
    else
        echo "❌ $name service unhealthy"
        exit 1
    fi
done

echo "=== All services started successfully ==="
```

## Shutdown Procedures

### 1. Graceful Shutdown Sequence

#### Step 1: Stop accepting new requests

```bash
# Drain load balancer traffic
kubectl scale deployment gemini-flow-lb --replicas=0

# Wait for connection draining (30 seconds)
sleep 30
```

#### Step 2: Complete in-flight requests

```bash
# Monitor active connections (established TCP connections on port 8080;
# listening sockets are excluded because they stay open until the service stops)
ss -tn state established '( sport = :8080 )'

# Wait for completion (max 5 minutes)
timeout 300 bash -c 'while [[ $(ss -Htn state established "( sport = :8080 )" | wc -l) -gt 0 ]]; do sleep 5; done'
```

#### Step 3: Shutdown services in reverse dependency order

```bash
# Stop research services first
npm run stop:co-scientist
npm run stop:mariner

# Stop media generation services
npm run stop:veo3
npm run stop:imagen4

# Stop agent coordination
npm run stop:agentspace
npm run stop:a2a-router

# Stop streaming services
npm run stop:streaming

# Stop core services
npm run stop:workspace
npm run stop:vertex-ai
```

### 2. Emergency Shutdown

```bash
#!/bin/bash
# Emergency shutdown - immediate termination

echo "=== EMERGENCY SHUTDOWN INITIATED ==="

# Kill all gemini-flow processes
# Caution: pkill -f matches the full command line, so a broad pattern such as
# "workspace" can also terminate unrelated processes on a shared host
pkill -f "gemini-flow"
pkill -f "vertex-ai"
pkill -f "workspace"

# Stop Docker containers (xargs -r skips docker stop when no container matches)
docker ps -q --filter "label=app=gemini-flow" | xargs -r docker stop

# Kubernetes emergency shutdown
kubectl delete pods -l app=gemini-flow --force --grace-period=0

echo "=== Emergency shutdown complete ==="
```

## Health Verification

### Automated Health Checks

```bash
#!/bin/bash
# health-check.sh - Comprehensive health verification

check_service_health() {
    local service_name=$1
    local endpoint=$2
    local expected_status=${3:-200}
    local response

    # --max-time keeps a hung service from stalling the whole check
    response=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$endpoint")

    if [[ "$response" == "$expected_status" ]]; then
        echo "✅ $service_name: Healthy (HTTP $response)"
        return 0
    else
        echo "❌ $service_name: Unhealthy (HTTP $response)"
        return 1
    fi
}

check_database_connectivity() {
    if npm run test:db-connection > /dev/null 2>&1; then
        echo "✅ Database: Connected"
        return 0
    else
        echo "❌ Database: Connection failed"
        return 1
    fi
}

check_external_apis() {
    # Test Google Cloud APIs
    if gcloud auth application-default print-access-token > /dev/null 2>&1; then
        echo "✅ Google Cloud Auth: Valid"
    else
        echo "❌ Google Cloud Auth: Invalid"
        return 1
    fi

    # Test Vertex AI (requires PROJECT_ID to be set in the environment)
    if curl -f --max-time 10 "https://aiplatform.googleapis.com/v1/projects/$PROJECT_ID/locations" \
        -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" > /dev/null 2>&1; then
        echo "✅ Vertex AI API: Accessible"
    else
        echo "❌ Vertex AI API: Inaccessible"
        return 1
    fi
}

main() {
    echo "=== Gemini-Flow Health Check ==="
    echo "Timestamp: $(date)"

    failed_checks=0

    # Core service health checks
    check_service_health "Vertex AI" "http://localhost:8080/health" || failed_checks=$((failed_checks + 1))
    check_service_health "Workspace" "http://localhost:8081/health" || failed_checks=$((failed_checks + 1))
    check_service_health "Streaming" "http://localhost:8082/health" || failed_checks=$((failed_checks + 1))
    check_service_health "AgentSpace" "http://localhost:8083/health" || failed_checks=$((failed_checks + 1))

    # Database and external API checks
    check_database_connectivity || failed_checks=$((failed_checks + 1))
    check_external_apis || failed_checks=$((failed_checks + 1))

    if [[ $failed_checks -eq 0 ]]; then
        echo "=== All health checks passed ==="
        exit 0
    else
        echo "=== $failed_checks health checks failed ==="
        exit 1
    fi
}

main "$@"
```

## Rollback Procedures

### Automated Rollback Script

```bash
#!/bin/bash
# rollback.sh - Automated service rollback
# Usage: ./rollback.sh [previous|<revision-number>] [namespace]

ROLLBACK_VERSION=${1:-"previous"}
NAMESPACE=${2:-"default"}
DEPLOYMENTS=(gemini-flow-api gemini-flow-agents gemini-flow-streaming)

echo "=== Initiating rollback to $ROLLBACK_VERSION ==="

# Kubernetes rollback: "previous" undoes the latest rollout; any other value is
# passed to kubectl as a revision number. Revisions are tracked per deployment,
# so confirm the number with `kubectl rollout history` first.
for deployment in "${DEPLOYMENTS[@]}"; do
    if [[ "$ROLLBACK_VERSION" == "previous" ]]; then
        kubectl rollout undo "deployment/$deployment" -n "$NAMESPACE"
    else
        kubectl rollout undo "deployment/$deployment" -n "$NAMESPACE" --to-revision="$ROLLBACK_VERSION"
    fi
done

# Wait for rollback completion
for deployment in "${DEPLOYMENTS[@]}"; do
    kubectl rollout status "deployment/$deployment" -n "$NAMESPACE" --timeout=300s
done

# Verify rollback success
if ./health-check.sh; then
    echo "✅ Rollback successful"
else
    echo "❌ Rollback failed - manual intervention required"
    exit 1
fi
```

## Automation Scripts

### Service Management Script

```bash
#!/bin/bash
# service-manager.sh - Unified service management

ACTION=$1
SERVICE=$2

usage() {
    echo "Usage: $0 {start|stop|restart|status} {service_name|all}"
    exit 1
}

# Every action except status needs a service name
if [[ "$ACTION" != "status" && -z "$SERVICE" ]]; then
    usage
fi

case $ACTION in
    "start")
        case $SERVICE in
            "all") ./startup-all.sh ;;
            *) npm run "start:$SERVICE" ;;
        esac
        ;;
    "stop")
        case $SERVICE in
            "all") ./shutdown-all.sh ;;
            *) npm run "stop:$SERVICE" ;;
        esac
        ;;
    "restart")
        # Reuse the stop and start branches so that "all" is handled consistently
        "$0" stop "$SERVICE"
        sleep 10
        "$0" start "$SERVICE"
        ;;
    "status")
        ./health-check.sh
        ;;
    *)
        usage
        ;;
esac
```

## Monitoring Integration

### Startup Metrics Collection

```bash
# Collect startup metrics (assumes STARTUP_BEGIN=$(date +%s) was recorded before startup began)
startup_duration=$(( $(date +%s) - STARTUP_BEGIN ))
echo "startup_duration_seconds $startup_duration" | curl --data-binary @- http://pushgateway:9091/metrics/job/gemini-flow-startup
```
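As a minimal sketch of the metrics step above, the helpers below time a command and push the result to a Pushgateway. The function names (`format_metric`, `time_command`, `push_metric`) are hypothetical and not part of Gemini-Flow; the gateway URL and job name in the usage note are taken from the example above.

```bash
#!/bin/bash
# Sketch only: helper names are illustrative, not part of Gemini-Flow.

# Print one sample in the Prometheus text format: "<name> <value>".
format_metric() {
    local name=$1 value=$2
    printf '%s %s\n' "$name" "$value"
}

# Run a command and print how many whole seconds it took.
# The command's output is discarded so that only the duration is printed;
# its exit status is not propagated.
time_command() {
    local begin end
    begin=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $(( end - begin ))
}

# Push a single sample to a Pushgateway job.
# Arguments: gateway base URL, job name, metric name, metric value.
push_metric() {
    local gateway_url=$1 job=$2 name=$3 value=$4
    format_metric "$name" "$value" |
        curl -sS --max-time 10 --data-binary @- "$gateway_url/metrics/job/$job"
}
```

For example, `duration=$(time_command ./startup-all.sh)` followed by `push_metric http://pushgateway:9091 gemini-flow-startup startup_duration_seconds "$duration"` would report how long a full startup took. Because `time_command` discards the command's output and exit status, it is best paired with the health checks to confirm that the startup actually succeeded.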
### Health Check Automation

```yaml
# docker-compose.monitoring.yml
# Assumes a gemini-flow service is defined in the same Compose project
version: '3.8'
services:
  healthcheck:
    image: curlimages/curl:latest
    command: |
      sh -c "
      while true; do
        if curl -f http://gemini-flow:8080/health; then
          echo 'health_check{service=\"gemini-flow\"} 1' | curl --data-binary @- http://pushgateway:9091/metrics/job/health_check/instance/gemini-flow
        else
          echo 'health_check{service=\"gemini-flow\"} 0' | curl --data-binary @- http://pushgateway:9091/metrics/job/health_check/instance/gemini-flow
        fi
        sleep 30
      done
      "
    depends_on:
      - gemini-flow
    restart: unless-stopped
```

## Troubleshooting Common Issues

### Service Won't Start

1. **Check logs**: `kubectl logs deployment/gemini-flow-api`
2. **Verify credentials**: `gcloud auth application-default print-access-token`
3. **Check resource availability**: `kubectl top nodes`
4. **Validate configuration**: `npm run config:validate`

### Service Fails Health Check

1. **Check service logs**: `kubectl logs -l app=gemini-flow --tail=100`
2. **Verify database connection**: `npm run test:db-connection`
3. **Check external API access**: `curl -sS -o /dev/null -w "%{http_code}\n" https://aiplatform.googleapis.com` (any HTTP status code confirms the endpoint is reachable)
4. **Restart unhealthy service**: `kubectl rollout restart deployment/<service-name>`

### Performance Degradation During Startup

1. **Increase resource limits**: Edit Kubernetes resource definitions
2. **Stagger service startup**: Add delays between service starts
3. **Check CPU/Memory usage**: `kubectl top pods`
4. **Review startup logs**: Look for initialization bottlenecks

---

**Document Owner**: SRE Team
**Last Updated**: August 14, 2025
**Next Review**: November 14, 2025
**Version**: 1.0