ai-stack-deployer/docs/PRODUCTION_API_SPEC.md

# Dokploy API - Production Specification
**Date**: 2026-01-09
**Status**: ENTERPRISE GRADE - PRODUCTION READY

## API Authentication
- **Header**: `x-api-key: {token}`
- **Base URL**: `https://app.flexinit.nl` (public) or `http://10.100.0.20:3000` (internal)
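
As a minimal sketch of how a client might attach this header, the helper below builds a request description for either base URL. The names `DokployRequest` and `buildDokployRequest` are hypothetical and used only for illustration; only the `x-api-key` header and the base URLs come from this spec.

```typescript
// Hypothetical helper (not part of Dokploy): describes one authenticated request.
interface DokployRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

function buildDokployRequest(
  baseUrl: string,
  token: string,
  method: "GET" | "POST",
  path: string,
  payload?: Record<string, unknown>,
): DokployRequest {
  // Strip trailing slashes so "https://host/" + "/api/x" does not double up.
  const url = baseUrl.replace(/\/+$/, "") + path;
  const headers: Record<string, string> = { "x-api-key": token };
  if (method === "POST") {
    headers["Content-Type"] = "application/json";
    return { url, method, headers, body: JSON.stringify(payload ?? {}) };
  }
  return { url, method, headers };
}
```

The returned object has the shape that `fetch(req.url, req)` accepts, so the same helper can serve both the public and the internal base URL.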

## Production Deployment Flow

### Phase 1: Project & Environment Creation
```typescript
POST /api/project.create
Body: {
  name: string,        // "ai-stack-{username}"
  description?: string // "AI Stack for {username}"
}

Response: {
  projectId: string,
  name: string,
  description: string,
  createdAt: string,
  organizationId: string,
  env: string
}

// Note: a default "production" environment is created automatically with the project.
// Its environment ID must be retrieved separately (see Phase 2).
```

### Phase 2: Get Environment ID
```typescript
GET /api/environment.byProjectId?projectId={projectId}

Response: Array<{
  environmentId: string,
  name: string,        // "production"
  projectId: string,
  isDefault: boolean,
  env: string,
  createdAt: string
}>
```

### Phase 3: Create Application
```typescript
POST /api/application.create
Body: {
  name: string,           // "opencode-{username}"
  environmentId: string   // From Phase 2
}

Response: {
  applicationId: string,
  name: string,
  environmentId: string,
  applicationStatus: 'idle' | 'running' | 'done' | 'error',
  createdAt: string,
  // ... other fields
}
```

### Phase 4: Configure Application (Docker Image)
```typescript
POST /api/application.update
Body: {
  applicationId: string,
  dockerImage: string,    // "git.app.flexinit.nl/..."
  sourceType: 'docker'
}

Response: {
  applicationId: string,
  // ... updated fields
}
```

### Phase 5: Create Domain
```typescript
POST /api/domain.create
Body: {
  host: string,           // "{username}.ai.flexinit.nl"
  applicationId: string,
  https: boolean,         // true
  port: number            // 8080
}

Response: {
  domainId: string,
  host: string,
  applicationId: string,
  https: boolean,
  port: number
}
```

### Phase 6: Deploy Application
```typescript
POST /api/application.deploy
Body: {
  applicationId: string
}

Response: void | { deploymentId?: string }
```
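
The six phases above can be chained roughly as follows. This is a sketch under stated assumptions, not the deployer's actual implementation: `ApiCall`, `StackIds`, and `deployStack` are hypothetical names, the transport is injected so that retries and authentication stay outside the flow, and picking the default environment (falling back to the first one returned) is an assumed selection rule.

```typescript
// Hypothetical transport: sends one request and resolves with the parsed JSON body.
type ApiCall = (
  method: "GET" | "POST",
  path: string,
  body?: Record<string, unknown>,
) => Promise<any>;

interface StackIds {
  projectId: string;
  environmentId: string;
  applicationId: string;
  domainId: string;
}

async function deployStack(call: ApiCall, username: string, dockerImage: string): Promise<StackIds> {
  // Phase 1: create the project.
  const project = await call("POST", "/api/project.create", {
    name: `ai-stack-${username}`,
    description: `AI Stack for ${username}`,
  });

  // Phase 2: fetch the environments that were created with the project.
  const environments: Array<{ environmentId: string; isDefault: boolean }> = await call(
    "GET",
    `/api/environment.byProjectId?projectId=${encodeURIComponent(project.projectId)}`,
  );
  // Assumption: prefer the default environment, else the first one returned.
  const environment = environments.find((e) => e.isDefault) ?? environments[0];
  if (!environment) throw new Error(`No environment found for project ${project.projectId}`);

  // Phase 3: create the application inside that environment.
  const application = await call("POST", "/api/application.create", {
    name: `opencode-${username}`,
    environmentId: environment.environmentId,
  });

  // Phase 4: point the application at its Docker image.
  await call("POST", "/api/application.update", {
    applicationId: application.applicationId,
    dockerImage,
    sourceType: "docker",
  });

  // Phase 5: attach the public domain.
  const domain = await call("POST", "/api/domain.create", {
    host: `${username}.ai.flexinit.nl`,
    applicationId: application.applicationId,
    https: true,
    port: 8080,
  });

  // Phase 6: trigger the deployment.
  await call("POST", "/api/application.deploy", { applicationId: application.applicationId });

  return {
    projectId: project.projectId,
    environmentId: environment.environmentId,
    applicationId: application.applicationId,
    domainId: domain.domainId,
  };
}
```

Injecting `call` keeps the flow testable with a fake transport and leaves error handling to the layer that wraps it.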

## Error Handling - Enterprise Grade

### Retry Strategy
- **Transient errors** (5xx, network): Retry with exponential backoff (delays of 1s, 2s, 4s, 8s, 16s, so at most 5 retries)
- **Rate limiting** (429): Wait for the duration given in the `Retry-After` header before retrying
- **Authentication** (401): Fail immediately, no retry
- **Validation** (400): Fail immediately, no retry; log and report the error
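
The retry rules above can be captured in one pure decision function. This is a sketch: `nextRetryDelayMs` is a hypothetical name, `retryAfterSeconds` is assumed to be the already-parsed `Retry-After` value, and capping 429 retries at the same five attempts is an assumption the spec does not state.

```typescript
// Delays for retries 1..5, taken from the backoff schedule in this spec.
const BACKOFF_DELAYS_MS = [1000, 2000, 4000, 8000, 16000];

// Returns the delay before the next attempt, or null when the call must not be retried.
// `status` is the HTTP status, or undefined for a network-level failure.
// `attempt` counts retries already made (0 for the first failure).
function nextRetryDelayMs(
  status: number | undefined,
  attempt: number,
  retryAfterSeconds?: number,
): number | null {
  if (attempt >= BACKOFF_DELAYS_MS.length) return null; // retries exhausted
  if (status === undefined || status >= 500) return BACKOFF_DELAYS_MS[attempt];
  if (status === 429) {
    // Respect Retry-After when present; fall back to backoff otherwise (assumption).
    return retryAfterSeconds !== undefined ? retryAfterSeconds * 1000 : BACKOFF_DELAYS_MS[attempt];
  }
  return null; // 400, 401 and other client errors fail immediately
}
```

Keeping the decision separate from the sleeping and the HTTP call makes the policy easy to unit test.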

### Rollback Strategy
On any phase failure:
1. Log failure point and error details
2. Execute cleanup in reverse order:
   - Delete domain (if created)
   - Delete application (if created)
   - Delete project (if no other resources)
3. Report detailed failure to user
4. Store failure record for analysis
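
One way to get cleanup in reverse order is to register an undo step as each resource is created and unwind the stack on failure. The sketch below assumes this approach; `RollbackStack` is a hypothetical class, and the actual delete calls (including the check that the project has no other resources) are left to the registered callbacks.

```typescript
interface CleanupStep {
  label: string;
  run: () => Promise<void>;
}

class RollbackStack {
  private readonly steps: CleanupStep[] = [];

  // Call right after a resource is created successfully.
  register(label: string, run: () => Promise<void>): void {
    this.steps.push({ label, run });
  }

  // Runs cleanups newest-first. A failing cleanup does not stop the rest;
  // the labels of failed cleanups are returned for the failure record.
  async unwind(): Promise<string[]> {
    const failed: string[] = [];
    while (this.steps.length > 0) {
      const step = this.steps.pop()!;
      try {
        await step.run();
      } catch {
        failed.push(step.label);
      }
    }
    return failed;
  }
}
```

Registering `project`, then `application`, then `domain` yields the delete order listed above (domain, application, project) without hard-coding it.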

### Circuit Breaker
- **Threshold**: Open the circuit after 5 consecutive failures
- **Timeout**: Keep the circuit open (reject requests) for 60 seconds
- **Half-open**: After the timeout, allow 1 test request at a time
- **Reset**: Close the circuit after 3 consecutive successes
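
A minimal state-machine sketch of these four rules follows. `CircuitBreaker` and its method names are hypothetical; the current time is passed in explicitly so the behavior can be tested without real delays, and reopening on any failure during half-open is an assumption.

```typescript
const FAILURE_THRESHOLD = 5;
const OPEN_TIMEOUT_MS = 60000;
const SUCCESSES_TO_CLOSE = 3;

type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private successes = 0;
  private openedAt = 0;
  private probeInFlight = false;

  // Returns true if a request may be sent at time `now` (milliseconds).
  allowRequest(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= OPEN_TIMEOUT_MS) {
      this.state = "half-open";
      this.successes = 0;
      this.probeInFlight = false;
    }
    if (this.state === "closed") return true;
    if (this.state === "open") return false;
    // Half-open: allow a single test request at a time.
    if (this.probeInFlight) return false;
    this.probeInFlight = true;
    return true;
  }

  recordSuccess(): void {
    if (this.state === "half-open") {
      this.probeInFlight = false;
      this.successes += 1;
      if (this.successes >= SUCCESSES_TO_CLOSE) {
        this.state = "closed";
        this.failures = 0;
      }
    } else {
      this.failures = 0; // a success breaks the run of consecutive failures
    }
  }

  recordFailure(now: number): void {
    this.probeInFlight = false;
    this.failures += 1;
    // Assumption: any failure while half-open reopens the circuit.
    if (this.state === "half-open" || this.failures >= FAILURE_THRESHOLD) {
      this.state = "open";
      this.openedAt = now;
      this.failures = 0;
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```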

## Idempotency

### Project Creation
- Check if project exists by name before creating
- If exists, use existing projectId
- Store creation timestamp for audit

### Application Creation
- Query existing applications by name in environment
- If exists and in valid state, reuse
- If exists but failed, delete and recreate

### Domain Creation
- Query existing domains for application
- If exists with same config, skip creation
- If exists with different config, update
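
The application and domain rules above reduce to two small decision functions. This sketch assumes that every application status except `'error'` counts as a "valid state" and that "same config" means identical `host`, `https`, and `port`; the function and type names are hypothetical.

```typescript
type ApplicationStatus = "idle" | "running" | "done" | "error";
type ApplicationAction = "create" | "reuse" | "recreate";
type DomainAction = "create" | "skip" | "update";

interface DomainConfig {
  host: string;
  https: boolean;
  port: number;
}

// `existingStatus` is undefined when no application with that name exists.
function decideApplicationAction(existingStatus: ApplicationStatus | undefined): ApplicationAction {
  if (existingStatus === undefined) return "create";
  // Assumption: only "error" is treated as a failed, non-reusable state.
  return existingStatus === "error" ? "recreate" : "reuse";
}

// `existing` is undefined when the application has no domain yet.
function decideDomainAction(existing: DomainConfig | undefined, desired: DomainConfig): DomainAction {
  if (existing === undefined) return "create";
  const same =
    existing.host === desired.host &&
    existing.https === desired.https &&
    existing.port === desired.port;
  return same ? "skip" : "update";
}
```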

### Deployment
- Check current deployment status before triggering
- If deployment in progress, poll status instead of re-triggering
- If deployment failed, analyze logs before retry

## Monitoring & Observability

### Structured Logging
```typescript
{
  timestamp: ISO8601,
  level: 'info' | 'warn' | 'error',
  phase: 'project' | 'environment' | 'application' | 'domain' | 'deploy',
  action: 'create' | 'update' | 'delete' | 'query',
  deploymentId: string,
  username: string,
  duration_ms: number,
  status: 'success' | 'failure',
  error?: {
    code: string,
    message: string,
    stack?: string,
    apiResponse?: unknown
  }
}
```
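
A sketch of a builder that produces entries in this shape is shown below. `buildLogEntry` and the `LogEntry`, `Phase`, and `Action` type names are hypothetical; the field names and allowed values follow the schema above. Timestamps are passed in as epoch milliseconds so the output is deterministic.

```typescript
type Phase = "project" | "environment" | "application" | "domain" | "deploy";
type Action = "create" | "update" | "delete" | "query";

interface LogEntry {
  timestamp: string;
  level: "info" | "warn" | "error";
  phase: Phase;
  action: Action;
  deploymentId: string;
  username: string;
  duration_ms: number;
  status: "success" | "failure";
  error?: { code: string; message: string; stack?: string; apiResponse?: unknown };
}

function buildLogEntry(
  ctx: { deploymentId: string; username: string; phase: Phase; action: Action },
  startedAtMs: number,
  finishedAtMs: number,
  failure?: { code: string; cause: Error; apiResponse?: unknown },
): LogEntry {
  const entry: LogEntry = {
    timestamp: new Date(finishedAtMs).toISOString(), // ISO 8601, UTC
    level: failure ? "error" : "info",
    phase: ctx.phase,
    action: ctx.action,
    deploymentId: ctx.deploymentId,
    username: ctx.username,
    duration_ms: finishedAtMs - startedAtMs,
    status: failure ? "failure" : "success",
  };
  if (failure) {
    entry.error = {
      code: failure.code,
      message: failure.cause.message,
      stack: failure.cause.stack,
      apiResponse: failure.apiResponse,
    };
  }
  return entry;
}
```

Writing each entry with `console.log(JSON.stringify(entry))` gives one JSON object per line, which most log collectors can parse directly.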

### Health Checks
- **Application health**: `GET /health` every 10s for up to 2 minutes
- **Container status**: Query application status via API
- **Domain resolution**: Verify DNS + HTTPS connectivity
- **Service availability**: Check if ttyd terminal is accessible
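
The application health poll can be sketched as a loop over an injected probe. `waitForHealthy` is a hypothetical helper; it assumes that "every 10s for 2 minutes" means at most 12 probes and that a probe which throws (for example on a network error) counts as "not healthy yet".

```typescript
async function waitForHealthy(
  probe: () => Promise<boolean>,          // e.g. resolves true when GET /health returns 2xx
  sleep: (ms: number) => Promise<void>,   // injected so tests do not wait in real time
  intervalMs = 10000,
  timeoutMs = 120000,
): Promise<boolean> {
  const maxAttempts = Math.floor(timeoutMs / intervalMs);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      if (await probe()) return true;
    } catch {
      // Treat a failed probe as "not healthy yet" and keep polling.
    }
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  return false; // never became healthy within the timeout
}
```

In production the `sleep` argument could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`.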

### Metrics
- Deployment success rate
- Average deployment time
- Failure reasons histogram
- API latency percentiles (p50, p95, p99)
- Retry counts per phase
- Rollback occurrences
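
For the success rate and latency percentiles, a small in-process calculation might look like the sketch below. It uses the nearest-rank percentile definition, which is one of several common conventions and is an assumption here; `percentile` and `successRate` are hypothetical names.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of empty sample set");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Fraction of deployments that succeeded, in the range 0..1.
function successRate(outcomes: boolean[]): number {
  if (outcomes.length === 0) return 0;
  return outcomes.filter((ok) => ok).length / outcomes.length;
}
```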

## Security

### Input Validation
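- Sanitize all user inputs before passing them to API calls
- Reject inputs containing characters that could enable injection (for example into URLs, headers, or shell commands)
- Enforce a strict regex on names; usernames become hostname labels (`{username}.ai.flexinit.nl`), so they must be valid DNS labels
- Reject names that appear on the reserved names list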
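
As an illustration only, a username check along these lines might look like the following. The regex (3 to 20 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen) and the reserved names are assumptions chosen for the sketch; this spec does not define either, so substitute the project's real rules.

```typescript
// ASSUMED pattern: a conservative subset of valid DNS labels.
const USERNAME_PATTERN = /^[a-z][a-z0-9-]{1,18}[a-z0-9]$/;

// ASSUMED reserved names, for illustration only.
const RESERVED_NAMES = new Set(["admin", "api", "www", "root"]);

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateUsername(input: string): ValidationResult {
  if (!USERNAME_PATTERN.test(input)) {
    return { ok: false, reason: "name must match " + USERNAME_PATTERN.source };
  }
  if (RESERVED_NAMES.has(input)) {
    return { ok: false, reason: "name is reserved" };
  }
  return { ok: true };
}
```

Because the pattern is an allow-list, inputs containing spaces, slashes, quotes, or shell metacharacters are rejected without needing a separate injection check.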

### Secrets Management
- Never log API tokens
- Redact sensitive data in error messages
- Use environment variables for all credentials
- Rotate tokens periodically
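
Redaction before logging can be sketched as a recursive copy that masks known sensitive keys. The key list below is an assumption for illustration (only `x-api-key` appears in this spec), and `redact` is a hypothetical helper name.

```typescript
// ASSUMED list of sensitive key names, compared case-insensitively.
const SENSITIVE_KEYS = new Set(["x-api-key", "authorization", "token", "password"]);

// Returns a deep copy of `value` with sensitive fields replaced by a placeholder.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Applying `redact` to request headers and to any `apiResponse` payload before it reaches the logger keeps tokens out of error records.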

### Rate Limiting
- Client-side: Max 10 deployments per user per hour
- Per-phase rate limiting to prevent API abuse
- Queue requests if limit exceeded
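
The per-user limit can be sketched as a sliding-window counter. `DeploymentRateLimiter` is a hypothetical class; it returns the number of milliseconds the caller should wait, so a queue can reschedule the request instead of dropping it. The current time is passed in to keep the logic testable.

```typescript
const MAX_DEPLOYMENTS_PER_WINDOW = 10;
const WINDOW_MS = 60 * 60 * 1000; // one hour

class DeploymentRateLimiter {
  // Start times of each user's deployments within the current window.
  private readonly history = new Map<string, number[]>();

  // Returns 0 and records the deployment if it may start at `now` (ms);
  // otherwise returns how long to wait until the oldest entry leaves the window.
  tryAcquire(username: string, now: number): number {
    const recent = (this.history.get(username) ?? []).filter((t) => now - t < WINDOW_MS);
    if (recent.length >= MAX_DEPLOYMENTS_PER_WINDOW) {
      this.history.set(username, recent);
      return recent[0] + WINDOW_MS - now;
    }
    recent.push(now);
    this.history.set(username, recent);
    return 0;
  }
}
```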

## Production Checklist

- [ ] All API calls use correct parameter names
- [ ] Environment ID retrieved and used for application creation
- [ ] Retry logic with exponential backoff implemented
- [ ] Circuit breaker pattern implemented
- [ ] Complete rollback on any failure
- [ ] Idempotency checks for all operations
- [ ] Structured logging with deployment tracking
- [ ] Health checks with timeout
- [ ] Input validation and sanitization
- [ ] Integration tests with real API
- [ ] Load testing (10 concurrent deployments)
- [ ] Failure scenario testing (network, auth, validation)
- [ ] Documentation and runbook complete
- [ ] Monitoring and alerting configured