oussamadouhou/ai-stack-deployer


Oussama Douhou 80e54ce578 docs: add logging infrastructure section to CLAUDE.md

2026-01-10 14:17:31 +01:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨

YOU ARE FORCED TO FOLLOW THESE PRINCIPLE RULES

MUST USE SKILL TODO
MUST FOLLOW YOUR TODO
MUST USE DOCUMENTATION/REPOSITORIES AFTER 3 FAILED TRIES
MUST PROPERLY TEST WHAT YOU ARE DOING
NEVER NEVER ASSUME
MUST BE SURE
MUST DOCUMENT YOUR FINDINGS FOR THE NEXT TIME
MUST CLEAN UP PROPERLY
MUST USE/UPDATE YOUR TEST DOCUMENT

🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨

Project Overview

AI Stack Deployer is a self-service portal that deploys personal OpenCode AI coding assistant stacks. Users enter a name, and the system provisions containers via Dokploy to create a fully functional AI stack at {name}.ai.flexinit.nl. Wildcard DNS and SSL are pre-configured, so deployments only need to create the Dokploy project and application.

Core Architecture

Deployment Flow

The system orchestrates deployments through Dokploy, leveraging pre-configured infrastructure:

Dokploy API - Manages projects, applications, and container deployments
Traefik - Handles SSL termination and routing (pre-configured wildcard DNS and SSL)

Each deployment creates:

Dokploy project: ai-stack-{name}
Application with OpenCode server + ttyd terminal
Domain configuration with automatic HTTPS (Traefik handles SSL via wildcard cert)

Note: DNS is pre-configured with wildcard *.ai.flexinit.nl → 144.76.116.169. Individual DNS records are NOT created per deployment - Traefik routes based on hostname matching.

Two Runtime Modes

HTTP Server (src/index.ts) - Hono-based API for web portal
- Fully implemented production-ready web application
- REST API endpoints for deployment management
- Server-Sent Events (SSE) for real-time progress streaming
- Static frontend serving (HTML/CSS/JS)
- In-memory deployment state tracking
- CORS and logging middleware
MCP Server (src/mcp-server.ts) - Model Context Protocol server
- Development tool for Claude Code integration
- Exposes deployment tools via stdio transport
- Same deployment logic as HTTP server
- Useful for testing and automation

API Clients

HetznerDNSClient (src/api/hetzner.ts):

Available for DNS management via Hetzner Cloud API
NOT used in deployment flow (wildcard DNS already configured)
Key methods: createARecord(), recordExists(), findRRSetByName()
Could be used for manual DNS operations or testing

DokployClient (src/api/dokploy.ts):

Orchestrates container deployments (primary deployment mechanism)
Key flow: createProject() → createApplication() → createDomain() → deployApplication()
Communicates with internal Dokploy at http://10.100.0.20:3000
Traefik on Dokploy automatically handles SSL via pre-configured wildcard certificate
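
The key flow above can be sketched as follows. This is a minimal illustration, not the code in src/api/dokploy.ts: the `DokployLike` interface, its argument lists, and its return shapes are assumptions, and only the four method names and their order come from this document.

```typescript
// Hypothetical client shape; the real DokployClient signatures may differ.
interface DokployLike {
  createProject(name: string): Promise<{ projectId: string }>;
  createApplication(projectId: string, name: string): Promise<{ applicationId: string }>;
  createDomain(applicationId: string, host: string): Promise<void>;
  deployApplication(applicationId: string): Promise<void>;
}

interface DeployResult {
  projectId: string;
  applicationId: string;
  url: string;
}

// Runs the four steps in order; a thrown error aborts the remaining steps.
async function deployStack(
  client: DokployLike,
  name: string,
  domainSuffix = "ai.flexinit.nl",
): Promise<DeployResult> {
  const host = `${name}.${domainSuffix}`;
  const { projectId } = await client.createProject(`ai-stack-${name}`);
  const { applicationId } = await client.createApplication(projectId, name);
  // No DNS call here: the wildcard record already covers the host.
  await client.createDomain(applicationId, host);
  await client.deployApplication(applicationId);
  return { projectId, applicationId, url: `https://${host}` };
}
```

Because the steps run sequentially without cleanup, a failure after `createProject()` leaves the project in place, which matches the "no automatic cleanup on partial failures" note later in this file.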

HTTP API Endpoints

The HTTP server exposes the following endpoints:

Health Check:

GET /health - Returns service health status

Deployment:

POST /api/deploy - Start a new deployment
- Body: { "name": "stack-name" }
- Returns: { deploymentId, url, statusEndpoint }
GET /api/status/:deploymentId - SSE stream for deployment progress
- Events: progress, complete, error
GET /api/check/:name - Check if a name is available
- Returns: { available, valid, error? }
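
A client of the status endpoint receives standard SSE frames. The sketch below parses a single frame into a typed event. The event names come from the list above; the JSON payload shape and the `parseSseFrame` helper are assumptions for illustration.

```typescript
// Event names are taken from the endpoint list; payload contents are assumed.
type DeployEventName = "progress" | "complete" | "error";

interface DeployEvent {
  event: DeployEventName;
  data: unknown;
}

const KNOWN_EVENTS: readonly DeployEventName[] = ["progress", "complete", "error"];

// Parses one SSE frame (the text between two blank lines).
// Returns null for comment-only frames and for unknown event names.
function parseSseFrame(frame: string): DeployEvent | null {
  let eventName = "message"; // SSE default when no "event:" line is present
  const dataLines: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line.startsWith("event:")) eventName = line.slice(6).trim();
    else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
  }
  if (dataLines.length === 0) return null;
  const event = KNOWN_EVENTS.find((known) => known === eventName);
  if (event === undefined) return null;
  const raw = dataLines.join("\n");
  try {
    return { event, data: JSON.parse(raw) };
  } catch {
    return { event, data: raw }; // keep non-JSON payloads as plain text
  }
}
```

In the browser, `EventSource` does this parsing already; a helper like this is mainly useful for scripts or tests that read the stream with `fetch`.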

Frontend:

GET / - Serves the web UI (src/frontend/index.html)
GET /static/* - Serves static assets (CSS, JS)

State Management

Both servers track deployments in-memory using a Map:

interface DeploymentState {
  id: string;              // dep_{timestamp}_{random}
  name: string;            // normalized stack name
  status: 'initializing' | 'creating_project' | 'creating_application' |
          'deploying' | 'completed' | 'failed';
  url?: string;            // https://{name}.ai.flexinit.nl
  error?: string;
  projectId?: string;
  applicationId?: string;
  progress: number;        // 0-100 (HTTP server only)
  currentStep: string;     // Human-readable step (HTTP server only)
}

Note: State is held in memory only and is lost on server restart. Persisting deployments across restarts requires database storage, which is not implemented.
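
A minimal sketch of such a Map-based store is shown below. The helper names (`newDeploymentId`, `createDeployment`, `updateDeployment`) and the six-character random suffix are assumptions; only the `dep_{timestamp}_{random}` ID form and the field names come from the interface above.

```typescript
type DeploymentStatus =
  | "initializing" | "creating_project" | "creating_application"
  | "deploying" | "completed" | "failed";

interface DeploymentState {
  id: string;
  name: string;
  status: DeploymentStatus;
  url?: string;
  error?: string;
  progress: number;
  currentStep: string;
}

// Process-local store: its contents disappear when the server restarts.
const deployments = new Map<string, DeploymentState>();

// Builds an ID of the form dep_{timestamp}_{random}; the suffix length is assumed.
function newDeploymentId(now: number = Date.now()): string {
  const random = (Math.random().toString(36).slice(2) + "000000").slice(0, 6);
  return `dep_${now}_${random}`;
}

function createDeployment(name: string): DeploymentState {
  const state: DeploymentState = {
    id: newDeploymentId(),
    name,
    status: "initializing",
    progress: 0,
    currentStep: "Initializing",
  };
  deployments.set(state.id, state);
  return state;
}

// Applies a partial update; returns undefined when the ID is unknown.
function updateDeployment(
  id: string,
  patch: Partial<DeploymentState>,
): DeploymentState | undefined {
  const current = deployments.get(id);
  if (!current) return undefined;
  const next: DeploymentState = { ...current, ...patch, id }; // the ID is never overwritten
  deployments.set(id, next);
  return next;
}
```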

Name Validation

Stack names must be:

3-20 characters
Lowercase alphanumeric with hyphens
Cannot start/end with hyphen
Not in reserved list (admin, api, www, root, system, test, demo, portal)
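
The rules above can be expressed as a small validator. This is a sketch under the stated rules, not the project's actual implementation; the function name, the error messages, and the choice to reject uppercase input instead of normalizing it are assumptions.

```typescript
// Reserved names as listed above; RESERVED_NAMES in the environment may change them.
const DEFAULT_RESERVED = ["admin", "api", "www", "root", "system", "test", "demo", "portal"];

// 3-20 characters, lowercase alphanumerics or hyphens, no hyphen at either end.
const NAME_PATTERN = /^[a-z0-9][a-z0-9-]{1,18}[a-z0-9]$/;

interface NameCheck {
  valid: boolean;
  error?: string;
}

function validateStackName(
  name: string,
  reserved: readonly string[] = DEFAULT_RESERVED,
): NameCheck {
  if (name.length < 3 || name.length > 20) {
    return { valid: false, error: "Name must be 3-20 characters long" };
  }
  if (name.startsWith("-") || name.endsWith("-")) {
    return { valid: false, error: "Name cannot start or end with a hyphen" };
  }
  if (!NAME_PATTERN.test(name)) {
    return { valid: false, error: "Name may only contain lowercase letters, digits, and hyphens" };
  }
  if (reserved.indexOf(name) !== -1) {
    return { valid: false, error: "Name is reserved" };
  }
  return { valid: true };
}
```

Validity is separate from availability: a valid name can still be taken, which is why `GET /api/check/:name` reports `valid` and `available` as distinct fields.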

Frontend Architecture

Location: src/frontend/

index.html - Main UI with state machine (form, progress, success, error)
style.css - Modern gradient design with animations
app.js - Vanilla JavaScript with SSE client and real-time validation

Features:

Real-time name availability checking
Client-side validation with server verification
SSE-powered live deployment progress tracking
State machine: Form → Progress → Success/Error
Responsive design (mobile-friendly)
No framework dependencies (vanilla JS)
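
The Form → Progress → Success/Error state machine can be modeled as a transition table. The sketch below is written in TypeScript for type checking (app.js itself is vanilla JavaScript); the event names and the `reset` transition back to the form are assumptions.

```typescript
type UiState = "form" | "progress" | "success" | "error";
type UiEvent = "submit" | "complete" | "fail" | "reset";

// Allowed transitions; any event not listed for a state is ignored.
const TRANSITIONS: Record<UiState, Partial<Record<UiEvent, UiState>>> = {
  form: { submit: "progress" },
  progress: { complete: "success", fail: "error" },
  success: { reset: "form" }, // assumed: "deploy another" returns to the form
  error: { reset: "form" },   // assumed: "try again" returns to the form
};

// Returns the next state, or the current state when the event does not apply.
function nextState(current: UiState, event: UiEvent): UiState {
  return TRANSITIONS[current][event] ?? current;
}
```

Keeping the transitions in one table makes it straightforward to map SSE events onto UI changes: `complete` and `error` events from the status stream would trigger the `complete` and `fail` transitions.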

Development Commands

# Development server (HTTP API with hot reload)
bun run dev

# Production server (HTTP API)
bun run start

# MCP server (for Claude Code integration)
bun run mcp

# Type checking
bun run typecheck

# Build for production
bun run build

# Test API clients (requires valid credentials)
bun run src/test-clients.ts

# Docker commands
docker build -t ai-stack-deployer .
docker-compose up -d
docker-compose logs -f
docker-compose down

Session Management

The project supports two types of Claude Code sessions:

🤖 Built-in Sessions (Automatic)

Created automatically by Claude Code for every conversation
Stored in ~/.claude/projects/.../
Resume with: claude --resume {uuid} or claude --continue

📁 Custom Sessions (Optional, for organization)

Created explicitly via ./scripts/claude-start.sh {name}
Enable named sessions and Graphiti Memory auto-integration
Best for: feature development, bug fixes, multi-day work

Commands

# List ALL sessions (both built-in and custom)
bash scripts/claude-session.sh list

# Create/resume custom named session
./scripts/claude-start.sh feature-http-api

# Delete a custom session
bash scripts/claude-session.sh delete feature-http-api

# Override permission mode (default: bypassPermissions)
CLAUDE_PERMISSION_MODE=prompt ./scripts/claude-start.sh feature-name

Custom Session Benefits

Automatic Configuration:

Permission mode: bypassPermissions (no permission prompts for file operations)
Session ID: Persistent UUID throughout work session
Environment variables: Auto-set for Graphiti Memory integration

Environment Variables (Set Automatically):

CLAUDE_SESSION_ID=550e8400-e29b-41d4-a716-446655440000
CLAUDE_SESSION_NAME=feature-http-api
CLAUDE_SESSION_START=2026-01-09 20:16:00
CLAUDE_SESSION_PROJECT=ai-stack-deployer
CLAUDE_SESSION_MCP_GROUP=project_ai_stack_deployer

Graphiti Memory Integration:

// At session end, store learnings
graphiti-memory_add_memory({
  name: "Session: feature-http-api - 2026-01-09",
  episode_body: "Session ID: 550e8400. Implemented HTTP server endpoints for deploy API. Added SSE for progress updates. Tests passing.",
  group_id: "project_ai_stack_deployer"  // Auto-set from CLAUDE_SESSION_MCP_GROUP
})

Storage:

Custom sessions: $HOME/.claude/sessions/ai-stack-deployer/*.session
Built-in sessions: ~/.claude/projects/-home-odouhou-locale-projects-ai-stack-deployer/*.jsonl

Environment Variables

Required for deployment operations:

DOKPLOY_URL - Dokploy API URL (http://10.100.0.20:3000)
DOKPLOY_API_TOKEN - Dokploy API authentication token

Optional configuration:

PORT - HTTP server port (default: 3000)
HOST - HTTP server bind address (default: 0.0.0.0)
STACK_DOMAIN_SUFFIX - Domain suffix for stacks (default: ai.flexinit.nl)
STACK_IMAGE - Docker image for user stacks
RESERVED_NAMES - Comma-separated list of forbidden names

Not used in deployment (available for testing/manual operations):

HETZNER_API_TOKEN - Hetzner Cloud API token
HETZNER_ZONE_ID - DNS zone ID (343733 for flexinit.nl)
TRAEFIK_IP - Public IP (144.76.116.169) - only for reference

See .env.example for complete configuration template.
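
A minimal sketch of loading these variables with the documented defaults follows. The `AppConfig` shape and the `loadConfig` helper are hypothetical, and how RESERVED_NAMES combines with the built-in reserved list is not specified here, so the sketch only parses it.

```typescript
interface AppConfig {
  dokployUrl: string;
  dokployApiToken: string;
  port: number;
  host: string;
  stackDomainSuffix: string;
  stackImage: string | undefined;
  reservedNames: string[];
}

type Env = Record<string, string | undefined>;

// Fails fast when a required variable is missing or empty.
function requireEnv(env: Env, key: string): string {
  const value = env[key];
  if (!value) throw new Error(`Missing required environment variable: ${key}`);
  return value;
}

function loadConfig(env: Env): AppConfig {
  const port = Number(env["PORT"] ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }
  return {
    dokployUrl: requireEnv(env, "DOKPLOY_URL"),
    dokployApiToken: requireEnv(env, "DOKPLOY_API_TOKEN"),
    port,
    host: env["HOST"] ?? "0.0.0.0",
    stackDomainSuffix: env["STACK_DOMAIN_SUFFIX"] ?? "ai.flexinit.nl",
    stackImage: env["STACK_IMAGE"],
    // Comma-separated list; empty entries and surrounding spaces are dropped.
    reservedNames: (env["RESERVED_NAMES"] ?? "")
      .split(",")
      .map((entry) => entry.trim())
      .filter((entry) => entry.length > 0),
  };
}
```

At startup this would typically be called once as `loadConfig(process.env)`, so a missing token fails at boot instead of during the first deployment.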

MCP Server Integration

The MCP server is configured in .mcp.json and provides these tools:

deploy_stack - Deploys a new AI stack (Dokploy orchestration only, no DNS creation)
check_deployment_status - Query deployment progress by ID
list_deployments - List all deployments in current session
check_name_availability - Validate name before deployment
test_api_connections - Verify Hetzner and Dokploy connectivity (both clients available for testing)

To test MCP functionality:

# Start MCP server
bun run mcp

# Test API connections
bun run src/test-clients.ts

Key Implementation Details

Error Handling

Both API clients throw errors on failure. The MCP server catches these and returns structured error responses. No automatic retry logic exists yet.

Deployment Idempotency

Dokploy projects: the deployer searches for an existing project by name and creates a new one only if none is found
No automatic cleanup on partial failures
DNS is wildcard-based, so no per-deployment DNS operations needed

Concurrency

The MCP server handles one request at a time per invocation. No rate limiting or queue management exists yet.

Security Notes

All tokens in environment variables (never in code)
Dokploy URL is internal-only (10.100.0.x network)
No authentication on HTTP endpoints (portal will need auth)
Name validation restricts stack names to lowercase alphanumerics and hyphens, which limits injection risk in hostnames and project names

Testing Strategy

Currently implemented:

src/test-clients.ts - Manual testing of Hetzner and Dokploy clients
Requires real API credentials in .env
Note: Only Dokploy client is used in actual deployments

Missing (needs implementation):

Unit tests for validation logic
Integration tests for deployment flow
Mock API clients for testing without credentials
Health check monitoring
Rollback on failures

Common Patterns

Adding a New MCP Tool

Define tool schema in tools array (src/mcp-server.ts:178)
Add case to switch statement in CallToolRequestSchema handler (src/mcp-server.ts:249)
Extract typed arguments: const { arg } = args as { arg: Type }
Return structured response with content: [{ type: 'text', text: JSON.stringify(...) }]
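
The four steps can be sketched without the MCP SDK as shown below. The `get_stack_url` tool is hypothetical, and `handleToolCall` stands in for the switch statement inside the real CallToolRequestSchema handler; flagging failures with `isError` is an assumption about how the server reports errors.

```typescript
// Step 1: tool schema entry for the tools array ("get_stack_url" is a made-up tool).
const getStackUrlTool = {
  name: "get_stack_url",
  description: "Return the public URL for a stack name",
  inputSchema: {
    type: "object",
    properties: { name: { type: "string", description: "Stack name" } },
    required: ["name"],
  },
};

interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Step 4: wrap any payload in the structured text response.
function textResult(payload: unknown): ToolResult {
  return { content: [{ type: "text", text: JSON.stringify(payload) }] };
}

// Steps 2 and 3: one case per tool, with typed argument extraction.
function handleToolCall(toolName: string, args: unknown): ToolResult {
  try {
    switch (toolName) {
      case "get_stack_url": {
        const { name } = args as { name: string };
        if (typeof name !== "string" || name.length === 0) {
          throw new Error("name is required");
        }
        return textResult({ success: true, url: `https://${name}.ai.flexinit.nl` });
      }
      default:
        throw new Error(`Unknown tool: ${toolName}`);
    }
  } catch (err) {
    // Thrown errors become structured error responses instead of crashing the server.
    const message = err instanceof Error ? err.message : String(err);
    return { ...textResult({ success: false, error: message }), isError: true };
  }
}
```

Note that `args as { name: string }` is only a compile-time cast, so the runtime `typeof` check is what actually guards against malformed input.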

Adding HTTP Endpoints

Add route to Hono app in src/index.ts
Use API clients from src/api/ directory
Return JSON with consistent error format
Consider adding SSE for long-running operations
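
This document does not specify the error format, so the envelope below is an assumption. It shows one way to keep error responses consistent across routes by building every error body through a single helper.

```typescript
// Assumed error envelope; adjust to whatever src/index.ts already returns.
interface ApiError {
  success: false;
  error: { code: string; message: string };
}

type ErrorStatus = 400 | 404 | 409 | 500;

// Builds the HTTP status and JSON body together so routes cannot mix formats.
function apiError(
  status: ErrorStatus,
  code: string,
  message: string,
): { status: ErrorStatus; body: ApiError } {
  return { status, body: { success: false, error: { code, message } } };
}

// Inside a Hono route handler the result could be returned as:
//   const { status, body } = apiError(409, "NAME_TAKEN", "Name is already in use");
//   return c.json(body, status);
```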

Extending API Clients

Keep TypeScript interfaces at top of file
Use satisfies for type-safe request bodies
Throw descriptive errors (include API status codes)
Add methods to client class, use private request() helper
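
These conventions are illustrated below on a hypothetical `ExampleClient`. The endpoint path, the Bearer header, and the request and response types are placeholders, not the Dokploy or Hetzner API. The fetch function is injected so the sketch runs without network access.

```typescript
// Minimal fetch-compatible types so the sketch does not depend on DOM typings.
interface HttpResponse {
  ok: boolean;
  status: number;
  statusText: string;
  text(): Promise<string>;
}
type HttpFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<HttpResponse>;

// Interfaces stay at the top of the file (hypothetical request/response types).
interface CreateThingRequest { name: string; description?: string }
interface CreateThingResponse { id: string }

class ExampleClient {
  private readonly baseUrl: string;
  private readonly token: string;
  private readonly httpFetch: HttpFetch;

  constructor(baseUrl: string, token: string, httpFetch: HttpFetch) {
    this.baseUrl = baseUrl;
    this.token = token;
    this.httpFetch = httpFetch;
  }

  // New public methods delegate to the private request() helper.
  async createThing(name: string): Promise<CreateThingResponse> {
    // `satisfies` checks the literal against the request type at compile time.
    const body = { name } satisfies CreateThingRequest;
    return this.request<CreateThingResponse>("POST", "/example/things", body);
  }

  private async request<T>(method: string, path: string, body?: unknown): Promise<T> {
    const res = await this.httpFetch(`${this.baseUrl}${path}`, {
      method,
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${this.token}` },
      ...(body === undefined ? {} : { body: JSON.stringify(body) }),
    });
    const text = await res.text();
    if (!res.ok) {
      // Descriptive error: method, path, status code, and response body.
      throw new Error(`${method} ${path} failed: ${res.status} ${res.statusText} ${text}`);
    }
    return (text ? JSON.parse(text) : undefined) as T;
  }
}
```

In application code the global `fetch` can be passed as the third constructor argument; tests can pass a stub instead, which also covers the "mock API clients" gap listed under Testing Strategy.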

Production Deployment

Docker Build and Run

# Build the Docker image
docker build -t ai-stack-deployer:latest .

# Run with docker-compose (recommended)
docker-compose up -d

# Or run manually
docker run -d \
  --name ai-stack-deployer \
  -p 3000:3000 \
  --env-file .env \
  ai-stack-deployer:latest

Deploying to Dokploy

Prepare Environment:
- Ensure .env file has valid DOKPLOY_API_TOKEN
- Verify DOKPLOY_URL points to internal Dokploy instance

Build and Push Image (if using custom registry):

docker build -t your-registry/ai-stack-deployer:latest .
docker push your-registry/ai-stack-deployer:latest

Deploy via Dokploy UI:
- Create new project: ai-stack-deployer-portal
- Create application from Docker image
- Configure domain (e.g., portal.ai.flexinit.nl)
- Set environment variables from .env
- Deploy

Verify Deployment:

curl https://portal.ai.flexinit.nl/health

Health Monitoring

The application includes a /health endpoint that returns:

{
  "status": "healthy",
  "timestamp": "2026-01-09T...",
  "version": "0.1.0",
  "service": "ai-stack-deployer",
  "activeDeployments": 0
}

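The Docker health check runs every 30 seconds and marks the container unhealthy when it fails. Plain Docker does not restart unhealthy containers by itself, so an automatic restart depends on the orchestrator (for example Docker Swarm) or a separate watchdog.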

Infrastructure Dependencies

Wildcard DNS - *.ai.flexinit.nl → 144.76.116.169 (pre-configured in Hetzner DNS)
Traefik at 144.76.116.169 - Pre-configured wildcard SSL certificate for *.ai.flexinit.nl
Dokploy at 10.100.0.20:3000 - Container orchestration platform (handles all deployments)
Docker image - oh-my-opencode-free (OpenCode + ttyd terminal)

Key Point: Individual DNS records are NOT created per deployment. The wildcard DNS and SSL are already configured, so Traefik automatically routes {name}.ai.flexinit.nl to the correct container based on hostname matching.

Logging Infrastructure

AI Stack logging integrates with the existing monitoring stack at logs.intra.flexinit.nl.

Components

Component	Location	Purpose
Log-ingest	`http://ai-stack-log-ingest:3000` (dokploy-network)	Receives events from AI stacks, pushes to Loki
Loki	`monitor-grafanaloki-qkj16i-loki-1`	Log storage
Grafana	https://logs.intra.flexinit.nl	Visualization
Dashboard	`/d/ai-stack-overview`	AI Stack metrics and logs

Datasource UIDs (Grafana)

Loki: af9a823s6iku8b
Prometheus: cf9r1fmfw9xxcf

Configuration

AI stacks send logs via environment variable:

LOG_INGEST_URL=http://ai-stack-log-ingest:3000/ingest

Local Development

The logging-stack/ directory contains a standalone docker-compose for local testing:

cd logging-stack && docker-compose up -d

Credentials

Grafana service account token stored in BWS:

Key: GRAFANA_OPENCODE_ACCESS_TOKEN
BWS ID: c77e58e3-fb34-41dc-9824-b3ce00da18a0

Project Status

✅ Completed:

HTTP Server with REST API and SSE streaming
Frontend UI with real-time deployment tracking
MCP Server for Claude Code integration
Docker configuration for production deployment
Full deployment orchestration via Dokploy API
Name validation and availability checking
Error handling and progress reporting

Ready for Production Deployment
