# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨


***YOU ARE FORCED TO FOLLOW THESE PRINCIPAL RULES***
- ***MUST*** USE SKILL TODO
- ***MUST*** FOLLOW YOUR TODO
- ***MUST*** USE DOCUMENTATION/REPOSITORIES AFTER 3 FAILED TRIES
- ***MUST*** PROPERLY TEST WHAT YOU ARE DOING
- ***NEVER*** ASSUME
- ***MUST*** BE SURE
- ***MUST*** DOCUMENT YOUR FINDINGS FOR NEXT TIME
- ***MUST*** CLEAN UP PROPERLY
- ***MUST*** USE/UPDATE YOUR TEST DOCUMENT

🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨

## Project Overview

AI Stack Deployer is a self-service portal that deploys personal OpenCode AI coding assistant stacks. Users enter a name, and the system provisions containers via Dokploy to create a fully functional AI stack at `{name}.ai.flexinit.nl`. Wildcard DNS and SSL are pre-configured, so deployments only need to create the Dokploy project and application.

## Core Architecture

### Deployment Flow
The system orchestrates deployments through Dokploy, leveraging pre-configured infrastructure:

1. **Dokploy API** - Manages projects, applications, and container deployments
2. **Traefik** - Handles SSL termination and routing (pre-configured wildcard DNS and SSL)

Each deployment creates:
- Dokploy project: `ai-stack-{name}`
- Application with OpenCode server + ttyd terminal
- Domain configuration with automatic HTTPS (Traefik handles SSL via wildcard cert)

**Note**: DNS is pre-configured with wildcard `*.ai.flexinit.nl` → `144.76.116.169`. Individual DNS records are NOT created per deployment - Traefik routes based on hostname matching.

### Two Runtime Modes

1. **HTTP Server** (`src/index.ts`) - Hono-based API for web portal
   - **Fully implemented** production-ready web application
   - REST API endpoints for deployment management
   - Server-Sent Events (SSE) for real-time progress streaming
   - Static frontend serving (HTML/CSS/JS)
   - In-memory deployment state tracking
   - CORS and logging middleware

2. **MCP Server** (`src/mcp-server.ts`) - Model Context Protocol server
   - Development tool for Claude Code integration
   - Exposes deployment tools via stdio transport
   - Same deployment logic as HTTP server
   - Useful for testing and automation

### API Clients

**HetznerDNSClient** (`src/api/hetzner.ts`):
- Available for DNS management via Hetzner Cloud API
- **NOT used in deployment flow** (wildcard DNS already configured)
- Key methods: `createARecord()`, `recordExists()`, `findRRSetByName()`
- Could be used for manual DNS operations or testing

**DokployClient** (`src/api/dokploy.ts`):
- Orchestrates container deployments (primary deployment mechanism)
- Key flow: `createProject()` → `createApplication()` → `createDomain()` → `deployApplication()`
- Communicates with internal Dokploy at `http://10.100.0.20:3000`
- Traefik on Dokploy automatically handles SSL via pre-configured wildcard certificate
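
The key flow above can be sketched as a single function. This is a minimal sketch, not the actual `DokployClient`: the `DokployLike` interface, its argument and return shapes, and the `deployStack` helper are assumptions for illustration. Only the four method names and the `ai-stack-{name}` / `{name}.ai.flexinit.nl` naming come from this document.

```typescript
// Hypothetical, simplified view of the client. The real signatures in
// src/api/dokploy.ts may take different arguments and return richer objects.
interface DokployLike {
  createProject(name: string): Promise<{ projectId: string }>;
  createApplication(projectId: string, name: string): Promise<{ applicationId: string }>;
  createDomain(applicationId: string, host: string): Promise<void>;
  deployApplication(applicationId: string): Promise<void>;
}

// Runs the four steps in order and returns the public URL of the stack.
async function deployStack(
  client: DokployLike,
  name: string,
  domainSuffix: string = "ai.flexinit.nl",
): Promise<string> {
  const host = `${name}.${domainSuffix}`;
  const { projectId } = await client.createProject(`ai-stack-${name}`);
  const { applicationId } = await client.createApplication(projectId, name);
  // No DNS call here: the wildcard record already covers the host.
  await client.createDomain(applicationId, host);
  await client.deployApplication(applicationId);
  return `https://${host}`;
}
```

Because each step depends on an ID returned by the previous one, the calls run sequentially; a failure in a later step leaves the earlier resources in place.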

### HTTP API Endpoints

The HTTP server exposes the following endpoints:

**Health Check**:
- `GET /health` - Returns service health status

**Deployment**:
- `POST /api/deploy` - Start a new deployment
  - Body: `{ "name": "stack-name" }`
  - Returns: `{ deploymentId, url, statusEndpoint }`
- `GET /api/status/:deploymentId` - SSE stream for deployment progress
  - Events: `progress`, `complete`, `error`
- `GET /api/check/:name` - Check if a name is available
  - Returns: `{ available, valid, error? }`

**Frontend**:
- `GET /` - Serves the web UI (`src/frontend/index.html`)
- `GET /static/*` - Serves static assets (CSS, JS)
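
The `progress`, `complete`, and `error` events on the status endpoint travel as Server-Sent Event frames. The helper below sketches the wire format of one frame; the function name and the payload fields are illustrative assumptions, not the server's confirmed schema.

```typescript
type DeploymentEvent = "progress" | "complete" | "error";

// Serializes one Server-Sent Event frame: an `event:` line, a `data:` line,
// and a blank line that terminates the frame.
function formatSseEvent(event: DeploymentEvent, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Example frame for a deployment that is halfway done (payload is illustrative).
const frame = formatSseEvent("progress", { progress: 50, currentStep: "Deploying" });
```

`JSON.stringify` emits no raw newlines, so the payload always fits on a single `data:` line.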

### State Management

Both servers track deployments in memory using a `Map`:
```typescript
interface DeploymentState {
  id: string;              // dep_{timestamp}_{random}
  name: string;            // normalized username
  status: 'initializing' | 'creating_project' | 'creating_application' |
          'deploying' | 'completed' | 'failed';
  url?: string;            // https://{name}.ai.flexinit.nl
  error?: string;
  projectId?: string;
  applicationId?: string;
  progress: number;        // 0-100 (HTTP server only)
  currentStep: string;     // Human-readable step (HTTP server only)
}
```

**Note**: State is held in memory only and is lost on server restart. Keeping deployments across restarts would require database storage.
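
A minimal sketch of how such a store could be created and updated, assuming the interface shown above. The helper names `createDeployment` and `updateDeployment`, the base-36 random suffix, and the initial `currentStep` text are assumptions for illustration.

```typescript
type DeploymentStatus =
  | "initializing" | "creating_project" | "creating_application"
  | "deploying" | "completed" | "failed";

interface DeploymentState {
  id: string;
  name: string;
  status: DeploymentStatus;
  url?: string;
  error?: string;
  projectId?: string;
  applicationId?: string;
  progress: number;
  currentStep: string;
}

// In-memory store keyed by deployment ID; contents vanish on restart.
const deployments = new Map<string, DeploymentState>();

// Registers a new deployment with an ID of the form dep_{timestamp}_{random}.
function createDeployment(name: string, now: number = Date.now()): DeploymentState {
  const random = Math.random().toString(36).slice(2, 8); // assumed suffix format
  const state: DeploymentState = {
    id: `dep_${now}_${random}`,
    name,
    status: "initializing",
    progress: 0,
    currentStep: "Initializing",
  };
  deployments.set(state.id, state);
  return state;
}

// Merges a partial update into the stored state and returns the new state.
function updateDeployment(
  id: string,
  patch: Partial<Omit<DeploymentState, "id">>,
): DeploymentState {
  const current = deployments.get(id);
  if (!current) throw new Error(`Unknown deployment: ${id}`);
  const next: DeploymentState = { ...current, ...patch };
  deployments.set(id, next);
  return next;
}
```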

### Name Validation

Stack names must meet all of the following rules:
- Be 3-20 characters long
- Contain only lowercase letters, digits, and hyphens
- Not start or end with a hyphen
- Not appear in the reserved list (admin, api, www, root, system, test, demo, portal)
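
The rules above can be expressed as one small function. This is a sketch under stated assumptions: the name `validateStackName`, the error messages, and the decision to reject uppercase input rather than normalize it are illustrative and may differ from the project's validator.

```typescript
const DEFAULT_RESERVED_NAMES: readonly string[] = [
  "admin", "api", "www", "root", "system", "test", "demo", "portal",
];

// Lowercase letters, digits, and hyphens; first and last character alphanumeric.
const NAME_PATTERN = /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/;

interface NameCheck {
  valid: boolean;
  error?: string;
}

function validateStackName(
  name: string,
  reserved: readonly string[] = DEFAULT_RESERVED_NAMES,
): NameCheck {
  if (name.length < 3 || name.length > 20) {
    return { valid: false, error: "Name must be 3-20 characters long" };
  }
  if (!NAME_PATTERN.test(name)) {
    return {
      valid: false,
      error: "Use lowercase letters, digits, and hyphens; no leading or trailing hyphen",
    };
  }
  if (reserved.indexOf(name) !== -1) {
    return { valid: false, error: "Name is reserved" };
  }
  return { valid: true };
}
```

Availability (whether a stack with that name already exists) is a separate check against Dokploy and is not covered by this function.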

### Frontend Architecture

**Location**: `src/frontend/`
- `index.html` - Main UI with state machine (form, progress, success, error)
- `style.css` - Modern gradient design with animations
- `app.js` - Vanilla JavaScript with SSE client and real-time validation

**Features**:
- Real-time name availability checking
- Client-side validation with server verification
- SSE-powered live deployment progress tracking
- State machine: Form → Progress → Success/Error
- Responsive design (mobile-friendly)
- No framework dependencies (vanilla JS)
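
The Form → Progress → Success/Error state machine can be sketched as a transition table. The event names (`submit`, `complete`, `fail`, `reset`) and the `reset` transition back to the form are assumptions for illustration; `app.js` may model this differently.

```typescript
type UiState = "form" | "progress" | "success" | "error";
type UiEvent = "submit" | "complete" | "fail" | "reset";

// Allowed transitions; any event not listed for a state is ignored.
const transitions: Record<UiState, Partial<Record<UiEvent, UiState>>> = {
  form: { submit: "progress" },
  progress: { complete: "success", fail: "error" },
  success: { reset: "form" },
  error: { reset: "form" },
};

// Returns the next state, or the current one when the event does not apply.
function nextUiState(state: UiState, event: UiEvent): UiState {
  return transitions[state][event] ?? state;
}
```

Keeping the transitions in one table makes it hard for a late SSE event to move the UI out of a terminal state by accident.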

## Development Commands

```bash
# Development server (HTTP API with hot reload)
bun run dev

# Production server (HTTP API)
bun run start

# MCP server (for Claude Code integration)
bun run mcp

# Type checking
bun run typecheck

# Build for production
bun run build

# Test API clients (requires valid credentials)
bun run src/test-clients.ts

# Docker commands
docker build -t ai-stack-deployer .
docker-compose up -d
docker-compose logs -f
docker-compose down
```

## Session Management

The project supports two types of Claude Code sessions:

**🤖 Built-in Sessions** (Automatic)
- Created automatically by Claude Code for every conversation
- Stored in `~/.claude/projects/.../`
- Resume with: `claude --session-id {uuid}` or `claude --continue`

**📁 Custom Sessions** (Optional, for organization)
- Created explicitly via `./scripts/claude-start.sh {name}`
- Enable named sessions and Graphiti Memory auto-integration
- Best for: feature development, bug fixes, multi-day work

### Commands

```bash
# List ALL sessions (both built-in and custom)
bash scripts/claude-session.sh list

# Create/resume custom named session
./scripts/claude-start.sh feature-http-api

# Delete a custom session
bash scripts/claude-session.sh delete feature-http-api

# Override permission mode (default: bypassPermissions)
CLAUDE_PERMISSION_MODE=prompt ./scripts/claude-start.sh feature-name
```

### Custom Session Benefits

**Automatic Configuration:**
- Permission mode: `bypassPermissions` (no permission prompts for file operations)
- Session ID: Persistent UUID throughout work session
- Environment variables: Auto-set for Graphiti Memory integration

**Environment Variables (Set Automatically):**
```bash
CLAUDE_SESSION_ID=550e8400-e29b-41d4-a716-446655440000
CLAUDE_SESSION_NAME=feature-http-api
CLAUDE_SESSION_START="2026-01-09 20:16:00"
CLAUDE_SESSION_PROJECT=ai-stack-deployer
CLAUDE_SESSION_MCP_GROUP=project_ai_stack_deployer
```

**Graphiti Memory Integration:**
```javascript
// At session end, store learnings
graphiti-memory_add_memory({
  name: "Session: feature-http-api - 2026-01-09",
  episode_body: "Session ID: 550e8400. Implemented HTTP server endpoints for deploy API. Added SSE for progress updates. Tests passing.",
  group_id: "project_ai_stack_deployer"  // Auto-set from CLAUDE_SESSION_MCP_GROUP
})
```

**Storage:**
- Custom sessions: `$HOME/.claude/sessions/ai-stack-deployer/*.session`
- Built-in sessions: `~/.claude/projects/-home-odouhou-locale-projects-ai-stack-deployer/*.jsonl`

## Environment Variables

Required for deployment operations:
- `DOKPLOY_URL` - Dokploy API URL (http://10.100.0.20:3000)
- `DOKPLOY_API_TOKEN` - Dokploy API authentication token

Optional configuration:
- `PORT` - HTTP server port (default: 3000)
- `HOST` - HTTP server bind address (default: 0.0.0.0)
- `STACK_DOMAIN_SUFFIX` - Domain suffix for stacks (default: ai.flexinit.nl)
- `STACK_IMAGE` - Docker image for user stacks
- `RESERVED_NAMES` - Comma-separated list of forbidden names

Not used in deployment (available for testing/manual operations):
- `HETZNER_API_TOKEN` - Hetzner Cloud API token
- `HETZNER_ZONE_ID` - DNS zone ID (343733 for flexinit.nl)
- `TRAEFIK_IP` - Public IP (144.76.116.169) - only for reference

See `.env.example` for complete configuration template.
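
A minimal sketch of reading these variables with the documented defaults. The `AppConfig` shape and the `loadConfig` name are assumptions, and the sketch returns an empty reserved-name list when `RESERVED_NAMES` is unset because the real default is not stated here.

```typescript
interface AppConfig {
  dokployUrl: string;
  dokployApiToken: string;
  port: number;
  host: string;
  stackDomainSuffix: string;
  stackImage?: string;
  reservedNames: string[];
}

type Env = Record<string, string | undefined>;

// Builds a typed config from an environment-like object (e.g. process.env).
function loadConfig(env: Env): AppConfig {
  const required = (key: string): string => {
    const value = env[key];
    if (!value) throw new Error(`Missing required environment variable: ${key}`);
    return value;
  };

  const port = Number(env["PORT"] ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  return {
    dokployUrl: required("DOKPLOY_URL"),
    dokployApiToken: required("DOKPLOY_API_TOKEN"),
    port,
    host: env["HOST"] ?? "0.0.0.0",
    stackDomainSuffix: env["STACK_DOMAIN_SUFFIX"] ?? "ai.flexinit.nl",
    stackImage: env["STACK_IMAGE"],
    // Comma-separated list; blank entries are dropped.
    reservedNames: (env["RESERVED_NAMES"] ?? "")
      .split(",")
      .map((entry) => entry.trim())
      .filter((entry) => entry.length > 0),
  };
}
```

Failing fast on a missing `DOKPLOY_URL` or `DOKPLOY_API_TOKEN` surfaces configuration problems at startup rather than on the first deployment.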

## MCP Server Integration

The MCP server is configured in `.mcp.json` and provides these tools:

- `deploy_stack` - Deploys a new AI stack (Dokploy orchestration only, no DNS creation)
- `check_deployment_status` - Query deployment progress by ID
- `list_deployments` - List all deployments in current session
- `check_name_availability` - Validate name before deployment
- `test_api_connections` - Verify Hetzner and Dokploy connectivity (both clients available for testing)

To test MCP functionality:
```bash
# Start MCP server
bun run mcp

# Test API connections
bun run src/test-clients.ts
```

## Key Implementation Details

### Error Handling
Both API clients throw errors on failure. The MCP server catches these and returns structured error responses. No automatic retry logic exists yet.

### Deployment Idempotency
- Dokploy projects: the deployer searches for an existing project by name and creates one only if none is found
- Partial failures are not cleaned up automatically
- DNS is wildcard-based, so no per-deployment DNS operations are needed
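
The search-before-create behaviour can be sketched as follows. The `ProjectApi` interface, the `listProjects` method, and the `findOrCreateProject` helper are hypothetical names for illustration; the real client may look projects up differently.

```typescript
interface Project {
  projectId: string;
  name: string;
}

// Hypothetical subset of a client, reduced to what this sketch needs.
interface ProjectApi {
  listProjects(): Promise<Project[]>;
  createProject(name: string): Promise<Project>;
}

// Returns the existing project with this name, or creates it if absent.
async function findOrCreateProject(
  api: ProjectApi,
  name: string,
): Promise<{ project: Project; created: boolean }> {
  const existing = (await api.listProjects()).find((p) => p.name === name);
  if (existing) return { project: existing, created: false };
  return { project: await api.createProject(name), created: true };
}
```

Two concurrent requests for the same name could both miss the lookup and both create a project, so this pattern is idempotent only for sequential retries.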

### Concurrency
The MCP server handles one request at a time per invocation. No rate limiting or queue management exists yet.

### Security Notes
- All tokens in environment variables (never in code)
- Dokploy URL is internal-only (10.100.0.x network)
- No authentication on HTTP endpoints (portal will need auth)
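- Name validation restricts stack names to lowercase letters, digits, and hyphens, which limits injection risk where names are used in hostnames and project names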

## Testing Strategy

Currently implemented:
- `src/test-clients.ts` - Manual testing of Hetzner and Dokploy clients
- Requires real API credentials in `.env`
- Note: Only Dokploy client is used in actual deployments

Missing (needs implementation):
- Unit tests for validation logic
- Integration tests for deployment flow
- Mock API clients for testing without credentials
- Health check monitoring
- Rollback on failures

## Common Patterns

### Adding a New MCP Tool
1. Define tool schema in `tools` array (src/mcp-server.ts:178)
2. Add case to switch statement in `CallToolRequestSchema` handler (src/mcp-server.ts:249)
3. Extract typed arguments: `const { arg } = args as { arg: Type }`
4. Return structured response with `content: [{ type: 'text', text: JSON.stringify(...) }]`
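
Steps 3 and 4 can be sketched without the MCP SDK as plain functions. The tool name `echo_name` and the helpers `toolResult` and `toolError` are hypothetical. The `isError` flag is part of the MCP tool result schema, but whether this project sets it is an assumption.

```typescript
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Wraps a successful payload in the structured response shape.
function toolResult(payload: unknown): ToolResult {
  return { content: [{ type: "text", text: JSON.stringify(payload, null, 2) }] };
}

// Wraps a thrown value in the same shape and marks it as an error.
function toolError(err: unknown): ToolResult {
  const message = err instanceof Error ? err.message : String(err);
  return {
    content: [{ type: "text", text: JSON.stringify({ success: false, error: message }) }],
    isError: true,
  };
}

// Hypothetical handler body for one `case` in the switch statement.
function handleEchoName(args: unknown): ToolResult {
  try {
    const { name } = args as { name: string };
    if (typeof name !== "string") throw new Error("name must be a string");
    return toolResult({ success: true, name });
  } catch (err) {
    return toolError(err);
  }
}
```

The `as` cast performs no runtime check, so the explicit `typeof` guard is what actually rejects malformed arguments.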

### Adding HTTP Endpoints
1. Add route to Hono app in `src/index.ts`
2. Use API clients from `src/api/` directory
3. Return JSON with consistent error format
4. Consider adding SSE for long-running operations

### Extending API Clients
- Keep TypeScript interfaces at top of file
- Use `satisfies` for type-safe request bodies
- Throw descriptive errors (include API status codes)
- Add methods to client class, use `private request()` helper
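
A minimal sketch of these conventions in one file. Everything here is illustrative: `ExampleClient`, the `/example/project` path, the `Bearer` authorization scheme, and the injected `HttpFn` (which the global `fetch` can satisfy) are assumptions, not the Dokploy API. The `satisfies` operator requires TypeScript 4.9 or later.

```typescript
// Interfaces at the top of the file.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type HttpFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>;

interface CreateProjectRequest {
  name: string;
  description?: string;
}
interface CreateProjectResponse {
  projectId: string;
}

class ExampleClient {
  private readonly baseUrl: string;
  private readonly token: string;
  private readonly http: HttpFn;

  constructor(baseUrl: string, token: string, http: HttpFn) {
    this.baseUrl = baseUrl;
    this.token = token;
    this.http = http;
  }

  // Shared helper: sends JSON and throws a descriptive error on non-2xx.
  private async request<T>(path: string, body: unknown): Promise<T> {
    const res = await this.http(`${this.baseUrl}${path}`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${this.token}`, // assumed auth scheme
      },
      body: JSON.stringify(body),
    });
    if (!res.ok) {
      throw new Error(`API request to ${path} failed with status ${res.status}`);
    }
    return (await res.json()) as T;
  }

  createProject(name: string): Promise<CreateProjectResponse> {
    // `satisfies` checks the literal against the request type without widening it.
    const body = { name } satisfies CreateProjectRequest;
    return this.request<CreateProjectResponse>("/example/project", body);
  }
}
```

Injecting the HTTP function keeps the client testable without credentials, which also addresses the missing mock clients noted under Testing Strategy.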

## Production Deployment

### Docker Build and Run

```bash
# Build the Docker image
docker build -t ai-stack-deployer:latest .

# Run with docker-compose (recommended)
docker-compose up -d

# Or run manually
docker run -d \
  --name ai-stack-deployer \
  -p 3000:3000 \
  --env-file .env \
  ai-stack-deployer:latest
```

### Deploying to Dokploy

1. **Prepare Environment**:
   - Ensure `.env` file has valid `DOKPLOY_API_TOKEN`
   - Verify `DOKPLOY_URL` points to internal Dokploy instance

2. **Build and Push Image** (if using custom registry):
   ```bash
   docker build -t your-registry/ai-stack-deployer:latest .
   docker push your-registry/ai-stack-deployer:latest
   ```

3. **Deploy via Dokploy UI**:
   - Create new project: `ai-stack-deployer-portal`
   - Create application from Docker image
   - Configure domain (e.g., `portal.ai.flexinit.nl`)
   - Set environment variables from `.env`
   - Deploy

4. **Verify Deployment**:
   ```bash
   curl https://portal.ai.flexinit.nl/health
   ```

### Health Monitoring

The application includes a `/health` endpoint that returns:
```json
{
  "status": "healthy",
  "timestamp": "2026-01-09T...",
  "version": "0.1.0",
  "service": "ai-stack-deployer",
  "activeDeployments": 0
}
```
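
The response above could be typed and built as follows. This is a sketch: the `buildHealthResponse` name is hypothetical, and how `activeDeployments` is counted (for example, entries in the in-memory map that are neither `completed` nor `failed`) is an assumption.

```typescript
interface HealthResponse {
  status: "healthy";
  timestamp: string; // ISO 8601
  version: string;
  service: string;
  activeDeployments: number;
}

// Builds the health payload; `now` is injectable to keep the function testable.
function buildHealthResponse(
  activeDeployments: number,
  now: Date = new Date(),
): HealthResponse {
  return {
    status: "healthy",
    timestamp: now.toISOString(),
    version: "0.1.0",
    service: "ai-stack-deployer",
    activeDeployments,
  };
}
```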

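The Docker health check runs every 30 seconds. Plain `docker run` and docker-compose only mark a failing container as unhealthy; replacing or restarting it depends on the orchestrator (for example, Docker Swarm).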

## Infrastructure Dependencies

- **Wildcard DNS** - `*.ai.flexinit.nl` → `144.76.116.169` (pre-configured in Hetzner DNS)
- **Traefik** at 144.76.116.169 - Pre-configured wildcard SSL certificate for `*.ai.flexinit.nl`
- **Dokploy** at 10.100.0.20:3000 - Container orchestration platform (handles all deployments)
- **Docker image** - oh-my-opencode-free (OpenCode + ttyd terminal)

**Key Point**: Individual DNS records are NOT created per deployment. The wildcard DNS and SSL are already configured, so Traefik automatically routes `{name}.ai.flexinit.nl` to the correct container based on hostname matching.

## Logging Infrastructure

AI Stack logging integrates with the existing monitoring stack at `logs.intra.flexinit.nl`.

### Components

| Component | Location | Purpose |
|-----------|----------|---------|
| Log-ingest | `http://ai-stack-log-ingest:3000` (dokploy-network) | Receives events from AI stacks, pushes to Loki |
| Loki | `monitor-grafanaloki-qkj16i-loki-1` | Log storage |
| Grafana | https://logs.intra.flexinit.nl | Visualization |
| Dashboard | `/d/ai-stack-overview` | AI Stack metrics and logs |

### Datasource UIDs (Grafana)
- Loki: `af9a823s6iku8b`
- Prometheus: `cf9r1fmfw9xxcf`

### Configuration

AI stacks send logs via environment variable:
```
LOG_INGEST_URL=http://ai-stack-log-ingest:3000/ingest
```

### Local Development

The `logging-stack/` directory contains a standalone docker-compose for local testing:
```bash
cd logging-stack && docker-compose up -d
```

### Credentials

Grafana service account token stored in BWS:
- Key: `GRAFANA_OPENCODE_ACCESS_TOKEN`
- BWS ID: `c77e58e3-fb34-41dc-9824-b3ce00da18a0`

## CI/CD - Gitea Actions

The `oh-my-opencode-free` Docker image is built automatically via Gitea Actions on push to main.

### Check Workflow Status

**Web UI:**
```
https://git.app.flexinit.nl/oussamadouhou/oh-my-opencode-free/actions
```

**API:**
```bash
# Get token from BWS (key: GITEA_API_TOKEN)
GITEA_TOKEN="<token>"

# List recent runs with status
curl -s -H "Authorization: token $GITEA_TOKEN" \
  "https://git.app.flexinit.nl/api/v1/repos/oussamadouhou/oh-my-opencode-free/actions/runs?limit=5" | \
  jq '.workflow_runs[] | {run_number, status, conclusion, display_title, head_sha: .head_sha[0:7]}'
```

### API Response Fields

| Field | Values |
|-------|--------|
| `status` | `queued`, `in_progress`, `completed` |
| `conclusion` | `success`, `failure`, `cancelled`, `skipped` |

### Credentials

- **GITEA_API_TOKEN** - Gitea API access (stored in BWS)

## Project Status

✅ **Completed**:
- HTTP Server with REST API and SSE streaming
- Frontend UI with real-time deployment tracking
- MCP Server for Claude Code integration
- Docker configuration for production deployment
- Full deployment orchestration via Dokploy API
- Name validation and availability checking
- Error handling and progress reporting

**Ready for Production Deployment**