ai-stack-deployer/docs/PRODUCTION_API_SPEC.md
Oussama Douhou 19845880e3 fix(ci): trigger workflow on main branch to enable :latest tag
2026-01-09 23:33:39 +01:00

# Dokploy API - Production Specification
**Date**: 2026-01-09
**Status**: ENTERPRISE GRADE - PRODUCTION READY
## API Authentication
- **Header**: `x-api-key: {token}`
- **Base URL**: `https://app.flexinit.nl` (public) or `http://10.100.0.20:3000` (internal)
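
As a minimal sketch of the authentication rule above, the helper below assembles the URL and headers for one call. The names `buildRequestSpec` and `RequestSpec` are hypothetical, and the `Content-Type: application/json` header is an assumption that this specification does not state.

```typescript
// Base URLs taken from this specification.
const PUBLIC_BASE_URL = "https://app.flexinit.nl";
const INTERNAL_BASE_URL = "http://10.100.0.20:3000";

// Hypothetical shape describing one outgoing request.
interface RequestSpec {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: builds the URL and headers for a single API call.
function buildRequestSpec(path: string, token: string, internal = false): RequestSpec {
  if (token.length === 0) {
    throw new Error("API token is empty");
  }
  const base = internal ? INTERNAL_BASE_URL : PUBLIC_BASE_URL;
  // Make sure exactly one slash separates the base URL and the path.
  const normalisedPath = path.startsWith("/") ? path : `/${path}`;
  return {
    url: `${base}${normalisedPath}`,
    headers: {
      "x-api-key": token,
      // Assumption: request bodies are sent as JSON.
      "Content-Type": "application/json",
    },
  };
}
```

The returned object can be handed to whatever HTTP client the deployer uses, which keeps the token handling in one place.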
## Production Deployment Flow
### Phase 1: Project & Environment Creation
```typescript
POST /api/project.create
Body: {
  name: string,         // "ai-stack-{username}"
  description?: string  // "AI Stack for {username}"
}
Response: {
  projectId: string,
  name: string,
  description: string,
  createdAt: string,
  organizationId: string,
  env: string
}
// Note: a default "production" environment is created automatically with the project.
// Its environment ID must be retrieved separately (see Phase 2).
```
### Phase 2: Get Environment ID
```typescript
GET /api/environment.byProjectId?projectId={projectId}
Response: Array<{
  environmentId: string,
  name: string,          // "production"
  projectId: string,
  isDefault: boolean,
  env: string,
  createdAt: string
}>
```
### Phase 3: Create Application
```typescript
POST /api/application.create
Body: {
  name: string,          // "opencode-{username}"
  environmentId: string  // From Phase 2
}
Response: {
  applicationId: string,
  name: string,
  environmentId: string,
  applicationStatus: 'idle' | 'running' | 'done' | 'error',
  createdAt: string,
  // ... other fields
}
```
### Phase 4: Configure Application (Docker Image)
```typescript
POST /api/application.update
Body: {
  applicationId: string,
  dockerImage: string,   // "git.app.flexinit.nl/..."
  sourceType: 'docker'
}
Response: {
  applicationId: string,
  // ... updated fields
}
```
### Phase 5: Create Domain
```typescript
POST /api/domain.create
Body: {
  host: string,          // "{username}.ai.flexinit.nl"
  applicationId: string,
  https: boolean,        // true
  port: number           // 8080
}
Response: {
  domainId: string,
  host: string,
  applicationId: string,
  https: boolean,
  port: number
}
```
### Phase 6: Deploy Application
```typescript
POST /api/application.deploy
Body: {
  applicationId: string
}
Response: void | { deploymentId?: string }
```
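
The six phases can be chained roughly as follows. This is a sketch under stated assumptions, not the deployer's actual implementation: `ApiCall` is a hypothetical injected transport that performs one authenticated request and returns the parsed JSON, and picking the environment flagged `isDefault` (falling back to the first entry) is an assumption about how to select the production environment.

```typescript
// Hypothetical transport: one authenticated API call returning parsed JSON.
type ApiCall = <T>(method: "GET" | "POST", path: string, body?: unknown) => Promise<T>;

interface Environment { environmentId: string; isDefault: boolean }

interface DeployResult {
  projectId: string;
  environmentId: string;
  applicationId: string;
  domainId: string;
}

async function deployStack(call: ApiCall, username: string, dockerImage: string): Promise<DeployResult> {
  // Phase 1: create the project.
  const project = await call<{ projectId: string }>("POST", "/api/project.create", {
    name: `ai-stack-${username}`,
    description: `AI Stack for ${username}`,
  });

  // Phase 2: look up the automatically created environment.
  const environments = await call<Environment[]>(
    "GET",
    `/api/environment.byProjectId?projectId=${encodeURIComponent(project.projectId)}`,
  );
  // Assumption: prefer the default environment, else take the first one.
  const environment = environments.find((e) => e.isDefault) ?? environments[0];
  if (environment === undefined) {
    throw new Error("project has no environment");
  }

  // Phase 3: create the application inside that environment.
  const app = await call<{ applicationId: string }>("POST", "/api/application.create", {
    name: `opencode-${username}`,
    environmentId: environment.environmentId,
  });

  // Phase 4: point the application at the Docker image.
  await call<unknown>("POST", "/api/application.update", {
    applicationId: app.applicationId,
    dockerImage,
    sourceType: "docker",
  });

  // Phase 5: attach the public domain.
  const domain = await call<{ domainId: string }>("POST", "/api/domain.create", {
    host: `${username}.ai.flexinit.nl`,
    applicationId: app.applicationId,
    https: true,
    port: 8080,
  });

  // Phase 6: trigger the deployment.
  await call<unknown>("POST", "/api/application.deploy", { applicationId: app.applicationId });

  return {
    projectId: project.projectId,
    environmentId: environment.environmentId,
    applicationId: app.applicationId,
    domainId: domain.domainId,
  };
}
```

Injecting the transport keeps the flow testable without a live Dokploy instance and gives retry, circuit-breaker, and logging wrappers a single place to hook in.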
## Error Handling - Enterprise Grade
### Retry Strategy
- **Transient errors** (5xx, network): Exponential backoff (1s, 2s, 4s, 8s, 16s)
- **Rate limiting** (429): Respect Retry-After header
- **Authentication** (401): Fail immediately, no retry
- **Validation** (400): Fail immediately, log and report
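
The retry rules above can be captured in one pure decision function. This is a sketch: `decideRetry` and `RetryDecision` are hypothetical names, `attempt` counts retries already made (starting at 0), `Retry-After` is assumed to arrive as a number of seconds (the header may also carry an HTTP date, which is not handled here), and applying the five-retry cap to 429 responses as well is an assumption.

```typescript
type RetryDecision =
  | { retry: false }
  | { retry: true; delayMs: number };

const BACKOFF_BASE_MS = 1000;
const MAX_RETRIES = 5; // yields delays of 1s, 2s, 4s, 8s, 16s

// status is the HTTP status code, or "network" when no response was received.
function decideRetry(
  status: number | "network",
  attempt: number,
  retryAfterSeconds?: number,
): RetryDecision {
  if (attempt >= MAX_RETRIES) {
    return { retry: false };
  }
  const backoffMs = BACKOFF_BASE_MS * 2 ** attempt;
  if (status === 429) {
    // Rate limited: honour Retry-After when present, else fall back to backoff.
    const delayMs = retryAfterSeconds !== undefined ? retryAfterSeconds * 1000 : backoffMs;
    return { retry: true, delayMs };
  }
  if (status === "network" || status >= 500) {
    // Transient failure: exponential backoff.
    return { retry: true, delayMs: backoffMs };
  }
  // 400, 401 and any other client error: fail immediately.
  return { retry: false };
}
```

Keeping the decision separate from the sleeping and sending makes the policy easy to unit test and to reuse for every phase.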
### Rollback Strategy
On any phase failure:
1. Log failure point and error details
2. Execute cleanup in reverse order:
   - Delete domain (if created)
   - Delete application (if created)
   - Delete project (if no other resources)
3. Report detailed failure to user
4. Store failure record for analysis
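
One way to get reverse-order cleanup is to register an undo step each time a phase succeeds. The sketch below assumes a hypothetical `RollbackStack` class; the cleanup callbacks stand in for the real delete calls, which this specification does not define, and the "if no other resources" check for the project would live inside its callback.

```typescript
interface CleanupStep {
  label: string;
  run: () => Promise<void>;
}

// Hypothetical helper: collects undo steps and replays them newest-first.
class RollbackStack {
  private steps: CleanupStep[] = [];

  // Call after each phase succeeds, passing the action that undoes it.
  register(label: string, run: () => Promise<void>): void {
    this.steps.push({ label, run });
  }

  // Runs every cleanup in reverse registration order.
  // A failing cleanup is recorded and does not stop the remaining ones.
  async rollback(): Promise<string[]> {
    const failures: string[] = [];
    const pending = [...this.steps].reverse();
    this.steps = [];
    for (const step of pending) {
      try {
        await step.run();
      } catch (err) {
        failures.push(`${step.label}: ${String(err)}`);
      }
    }
    return failures;
  }
}
```

The returned list of cleanup failures can be attached to the failure record so that orphaned resources are visible during later analysis.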
### Circuit Breaker
- **Threshold**: 5 consecutive failures
- **Timeout**: 60 seconds
- **Half-open**: After timeout, allow 1 test request
- **Reset**: After 3 consecutive successes
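
A small state machine with the thresholds listed above might look like this. It is a sketch: the class and method names are hypothetical, the clock is injected so the behaviour can be tested, and allowing only one in-flight probe at a time while half-open is one interpretation of "allow 1 test request".

```typescript
type BreakerState = "closed" | "open" | "half-open";

const FAILURE_THRESHOLD = 5;   // consecutive failures before opening
const OPEN_TIMEOUT_MS = 60_000; // time spent open before probing
const SUCCESS_THRESHOLD = 3;   // consecutive successes before closing

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private halfOpenSuccesses = 0;
  private openedAtMs = 0;
  private probeInFlight = false;
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Returns true when a request may be sent right now.
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAtMs >= OPEN_TIMEOUT_MS) {
      this.state = "half-open";
      this.halfOpenSuccesses = 0;
      this.probeInFlight = false;
    }
    if (this.state === "closed") return true;
    if (this.state === "open") return false;
    // Half-open: let a single probe through at a time.
    if (this.probeInFlight) return false;
    this.probeInFlight = true;
    return true;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    if (this.state === "half-open") {
      this.probeInFlight = false;
      this.halfOpenSuccesses += 1;
      if (this.halfOpenSuccesses >= SUCCESS_THRESHOLD) this.state = "closed";
    }
  }

  recordFailure(): void {
    if (this.state === "half-open") {
      this.trip(); // a failed probe reopens the circuit
      return;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= FAILURE_THRESHOLD) this.trip();
  }

  private trip(): void {
    this.state = "open";
    this.openedAtMs = this.now();
    this.consecutiveFailures = 0;
    this.probeInFlight = false;
  }
}
```

Callers check `canRequest()` before each API call and report the outcome with `recordSuccess()` or `recordFailure()`.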
## Idempotency
### Project Creation
- Check if project exists by name before creating
- If exists, use existing projectId
- Store creation timestamp for audit
### Application Creation
- Query existing applications by name in environment
- If exists and in valid state, reuse
- If exists but failed, delete and recreate
### Domain Creation
- Query existing domains for application
- If exists with same config, skip creation
- If exists with different config, update
### Deployment
- Check current deployment status before triggering
- If deployment in progress, poll status instead of re-triggering
- If deployment failed, analyze logs before retry
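
The application and domain rules above reduce to small decision functions once the existing state has been queried. This is a sketch with hypothetical names; treating every status other than `'error'` as a valid state, and matching domains by `host`, are assumptions.

```typescript
type ApplicationStatus = "idle" | "running" | "done" | "error";
type ApplicationAction = "create" | "reuse" | "recreate";

// existingStatus is undefined when no application with that name was found.
function planApplicationAction(existingStatus: ApplicationStatus | undefined): ApplicationAction {
  if (existingStatus === undefined) return "create";
  // Assumption: only "error" counts as a failed state.
  return existingStatus === "error" ? "recreate" : "reuse";
}

interface DomainConfig {
  host: string;
  https: boolean;
  port: number;
}
type DomainAction = "create" | "skip" | "update";

// existing holds the domains already attached to the application.
function planDomainAction(existing: DomainConfig[], desired: DomainConfig): DomainAction {
  // Assumption: a domain is "the same" when its host matches.
  const match = existing.find((d) => d.host === desired.host);
  if (match === undefined) return "create";
  const sameConfig = match.https === desired.https && match.port === desired.port;
  return sameConfig ? "skip" : "update";
}
```

Separating the decision from the API calls means a retried deployment can be re-planned from freshly queried state without side effects.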
## Monitoring & Observability
### Structured Logging
```typescript
{
  timestamp: ISO8601,
  level: 'info' | 'warn' | 'error',
  phase: 'project' | 'environment' | 'application' | 'domain' | 'deploy',
  action: 'create' | 'update' | 'delete' | 'query',
  deploymentId: string,
  username: string,
  duration_ms: number,
  status: 'success' | 'failure',
  error?: {
    code: string,
    message: string,
    stack?: string,
    apiResponse?: unknown
  }
}
```
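
The log shape above can be expressed as a TypeScript interface with a small builder. This is a sketch: `buildLogEntry` and `LogContext` are hypothetical names, and mapping failures to level `'error'` and successes to `'info'` is an assumption.

```typescript
type Phase = "project" | "environment" | "application" | "domain" | "deploy";
type Action = "create" | "update" | "delete" | "query";

interface LogError {
  code: string;
  message: string;
  stack?: string;
  apiResponse?: unknown;
}

interface DeploymentLogEntry {
  timestamp: string; // ISO 8601
  level: "info" | "warn" | "error";
  phase: Phase;
  action: Action;
  deploymentId: string;
  username: string;
  duration_ms: number;
  status: "success" | "failure";
  error?: LogError;
}

// Hypothetical grouping of the fields that stay constant for one operation.
interface LogContext {
  deploymentId: string;
  username: string;
  phase: Phase;
  action: Action;
}

function buildLogEntry(
  ctx: LogContext,
  startedAt: Date,
  finishedAt: Date,
  error?: LogError,
): DeploymentLogEntry {
  const entry: DeploymentLogEntry = {
    timestamp: finishedAt.toISOString(),
    level: error ? "error" : "info",
    phase: ctx.phase,
    action: ctx.action,
    deploymentId: ctx.deploymentId,
    username: ctx.username,
    duration_ms: finishedAt.getTime() - startedAt.getTime(),
    status: error ? "failure" : "success",
  };
  if (error) {
    entry.error = error;
  }
  return entry;
}
```

Each entry can be written as one line with `JSON.stringify(entry)`, which keeps the output easy to filter by `deploymentId`.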
### Health Checks
- **Application health**: Poll `GET /health` every 10s for up to 2 minutes
- **Container status**: Query application status via API
- **Domain resolution**: Verify DNS + HTTPS connectivity
- **Service availability**: Check if ttyd terminal is accessible
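
The application health check amounts to a bounded polling loop. In this sketch, `waitUntilHealthy` is a hypothetical name, and both the probe and the sleep function are injected, so the loop itself makes no assumption about the HTTP client. Treating a thrown probe error as "not healthy yet" is an assumption.

```typescript
// check: resolves to true once the service responds as healthy.
// sleep: waits for the given number of milliseconds.
async function waitUntilHealthy(
  check: () => Promise<boolean>,
  sleep: (ms: number) => Promise<void>,
  intervalMs = 10_000,
  timeoutMs = 120_000,
): Promise<boolean> {
  // 2 minutes at 10s intervals gives 12 attempts.
  const maxAttempts = Math.ceil(timeoutMs / intervalMs);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let healthy = false;
    try {
      healthy = await check();
    } catch {
      // Assumption: connection errors mean the service is still starting.
      healthy = false;
    }
    if (healthy) return true;
    // No need to wait after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  return false;
}
```

In production code the `sleep` argument could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`, while tests can pass a function that resolves immediately.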
### Metrics
- Deployment success rate
- Average deployment time
- Failure reasons histogram
- API latency percentiles (p50, p95, p99)
- Retry counts per phase
- Rollback occurrences
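
For the latency percentiles, a nearest-rank calculation over collected samples is often enough. This is a sketch: `percentile` is a hypothetical helper, and nearest-rank is one of several percentile definitions, chosen here for simplicity rather than taken from this specification.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("no samples recorded");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError("p must be in the range (0, 100]");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  const value = sorted[rank - 1];
  if (value === undefined) {
    throw new Error("rank out of bounds");
  }
  return value;
}

// Example latencies in milliseconds (made-up values for illustration).
const latenciesMs = [120, 80, 200, 95, 150, 300, 110, 90, 500, 130];
const p50 = percentile(latenciesMs, 50); // 120
const p95 = percentile(latenciesMs, 95); // 500
const p99 = percentile(latenciesMs, 99); // 500
```

With only ten samples, p95 and p99 both land on the slowest request, which is a reminder that tail percentiles need a larger sample window to be meaningful.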
## Security
### Input Validation
- Sanitize all user inputs before API calls
- Validate against injection attacks
- Enforce strict name regex
- Check reserved names list
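
A validation step along these lines could run before any API call. This is a sketch only: the regex and the reserved list below are hypothetical placeholders, since this specification does not give the actual pattern or names. The pattern is chosen so the result is safe to embed in the `{username}.ai.flexinit.nl` hostname.

```typescript
// Hypothetical pattern: 3-20 characters, lowercase letters, digits and hyphens,
// starting with a letter and not ending with a hyphen.
const USERNAME_PATTERN = /^[a-z][a-z0-9-]{1,18}[a-z0-9]$/;

// Hypothetical reserved names; the real list is not part of this specification.
const RESERVED_NAMES: ReadonlySet<string> = new Set(["admin", "api", "www", "root"]);

type ValidationResult =
  | { ok: true; username: string }
  | { ok: false; reason: string };

function validateUsername(raw: string): ValidationResult {
  // Normalise first so that "Alice " and "alice" map to the same stack.
  const username = raw.trim().toLowerCase();
  if (!USERNAME_PATTERN.test(username)) {
    return { ok: false, reason: "username does not match the allowed pattern" };
  }
  if (RESERVED_NAMES.has(username)) {
    return { ok: false, reason: "username is reserved" };
  }
  return { ok: true, username };
}
```

Using an allow-list pattern rather than stripping bad characters means injection attempts are rejected outright instead of being silently rewritten.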
### Secrets Management
- Never log API tokens
- Redact sensitive data in error messages
- Use environment variables for all credentials
- Rotate tokens periodically
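
Redaction can be applied to any value before it reaches a log line or an error message. In this sketch, `redact` is a hypothetical helper, and the list of key names treated as sensitive is an assumption. It handles plain JSON-like data (objects, arrays, primitives) and does not scan string values for embedded secrets.

```typescript
// Assumption: keys matching this pattern hold credentials.
const SENSITIVE_KEY = /(api[-_]?key|token|secret|password|authorization)/i;

// Returns a deep copy with the values of sensitive keys replaced.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```

Applying `redact` to the `apiResponse` field and to request metadata before logging keeps the `x-api-key` header out of stored records.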
### Rate Limiting
- Client-side: Max 10 deployments per user per hour
- Per-phase rate limiting to prevent API abuse
- Queue requests if limit exceeded
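
The per-user limit can be enforced with a sliding-window counter. This is a sketch: `DeploymentRateLimiter` is a hypothetical class, state is kept in memory (so it resets on restart), and queueing of rejected requests is left to the caller.

```typescript
class DeploymentRateLimiter {
  // Timestamps (ms) of accepted deployments, per user.
  private readonly history = new Map<string, number[]>();
  private readonly maxPerWindow: number;
  private readonly windowMs: number;

  constructor(maxPerWindow = 10, windowMs = 60 * 60 * 1000) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  // Returns true and records the deployment when the user is under the limit.
  tryAcquire(username: string, nowMs: number): boolean {
    const previous = this.history.get(username) ?? [];
    // Keep only the deployments that are still inside the window.
    const recent = previous.filter((t) => nowMs - t < this.windowMs);
    if (recent.length >= this.maxPerWindow) {
      this.history.set(username, recent);
      return false;
    }
    recent.push(nowMs);
    this.history.set(username, recent);
    return true;
  }
}
```

A caller would invoke `tryAcquire(username, Date.now())` before Phase 1 and place the request on a queue when it returns `false`.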
## Production Checklist
- [ ] All API calls use correct parameter names
- [ ] Environment ID retrieved and used for application creation
- [ ] Retry logic with exponential backoff implemented
- [ ] Circuit breaker pattern implemented
- [ ] Complete rollback on any failure
- [ ] Idempotency checks for all operations
- [ ] Structured logging with deployment tracking
- [ ] Health checks with timeout
- [ ] Input validation and sanitization
- [ ] Integration tests with real API
- [ ] Load testing (10 concurrent deployments)
- [ ] Failure scenario testing (network, auth, validation)
- [ ] Documentation and runbook complete
- [ ] Monitoring and alerting configured