# Argonix — Full Documentation

> Sovereign AI Ops Copilot — AI agent + monitoring + testing + infrastructure automation, powered by a local LLM with zero data egress.

## Overview

Argonix is a SaaS platform that combines an autonomous AI agent (Argos) with full-stack monitoring, synthetic testing, test management, alerting, incident management, and public status pages. It supports multi-tenant organizations with role-based access control and Stripe-powered billing. Argos connects to 30+ cloud and infrastructure services via 310+ tools to observe, decide, and act autonomously — with optional sovereign deployment running a local LLM (Qwen 3.5 9B via Ollama) for zero data egress.

- **Website**: https://argonix.io
- **App**: https://app.argonix.io
- **API Base**: https://api.argonix.io/api/0.1/
- **Status Pages**: https://status.argonix.io
- **Contact**: hello@argonix.io
- **Terraform Provider**: https://github.com/argonix-io/terraform-provider-argonix
- **Kubernetes CRD Operator**: https://github.com/argonix-io/kubernetes-crd

---

## Argos AI Agent

### Overview
Argos is an autonomous AI ops agent built into Argonix. It connects to your entire infrastructure stack and can observe, investigate, decide, and act on your behalf. Each conversation uses a specialized persona and a curated set of connectors and tools.

### Personas
11 built-in personas, each with a system prompt, default connectors, and recommended workflows:
- **SRE**: Site Reliability — incident response, SLO monitoring, capacity planning
- **DevOps**: CI/CD, deployments, infrastructure provisioning
- **Security**: Vulnerability scanning, access audit, compliance checks
- **Cloud Architect**: Multi-cloud design, cost optimization, migration planning
- **DBA**: Database performance, query optimization, backup management
- **Network**: Connectivity, DNS, firewall rules, load balancing
- **Compliance**: Regulatory audit, policy enforcement, SOC2/GDPR checks
- **FinOps**: Cloud cost analysis, rightsizing, reserved instance planning
- **Performance**: APM, bottleneck identification, load testing analysis
- **Chaos**: Resilience testing, failure injection planning, blast radius analysis
- **Data**: Data pipeline monitoring, ETL health, data quality checks

### Connectors (30 total, 310+ tools)
Each connector provides typed tools with `read`, `write`, or `execute` capabilities:

| Connector | Tools | Description |
|-----------|-------|-------------|
| AWS | 18 | EC2, S3, Lambda, RDS, CloudWatch, IAM, ECS, Route53 |
| GCP | 26 | Compute, IAM, Monitoring, Logging, GKE clusters + K8s ops |
| Azure | 12 | VMs, Storage, Monitor, Resource Groups |
| Kubernetes | 30 | Pods, Deployments, StatefulSets, DaemonSets, Services, Nodes, ConfigMaps, Secrets, Metrics |
| Docker | 10 | Containers, Images, Networks, Volumes |
| SSH/VM | 30 | Remote command execution, file operations, system info, script execution |
| GitHub | 12 | Repos, Issues, PRs, Actions, Branches |
| GitLab | 12 | Projects, Issues, MRs, Pipelines, Branches |
| Jira | 8 | Issues, Projects, Boards, Sprints |
| Slack | 6 | Messages, Channels, Users |
| PagerDuty | 8 | Incidents, Services, Escalation Policies |
| Datadog | 10 | Monitors, Events, Metrics, Dashboards |
| Prometheus | 6 | Queries, Alerts, Targets, Rules |
| Grafana | 8 | Dashboards, Alerts, Datasources |
| Elasticsearch | 10 | Indices, Search, Cluster Health, Snapshots |
| PostgreSQL | 8 | Queries, Tables, Connections, Performance |
| MySQL | 8 | Queries, Tables, Connections, Performance |
| Redis | 6 | Info, Keys, Memory, Slowlog |
| MongoDB | 6 | Collections, Queries, Stats |
| Terraform | 6 | State, Plan, Workspaces |
| Ansible | 6 | Playbooks, Inventory, Roles |
| Cloudflare | 8 | DNS, Zones, Firewall, Cache |
| New Relic | 6 | APM, Alerts, Dashboards |
| Splunk | 6 | Search, Alerts, Indexes |
| OpsGenie | 6 | Alerts, Schedules, Teams |
| Sentry | 6 | Issues, Events, Releases |
| Vault | 6 | Secrets, Policies, Auth |
| Consul | 6 | Services, KV, Health |
| Nginx | 4 | Config, Status, Upstream |
| Argonix | 12 | Monitors, Incidents, Status Pages, Alerts (self-referential) |

### Workflows (42 pre-built)
Ready-to-use workflow templates organized by category:
- **Incident**: Triage, root cause, post-mortem, escalation, communication
- **Infrastructure**: Health check, capacity audit, scaling, provisioning
- **Security**: Vulnerability scan, access review, secret rotation, firewall audit
- **Cost**: Cloud cost analysis, rightsizing, unused resource cleanup
- **Performance**: APM review, query optimization, load test analysis
- **Compliance**: SOC2 check, GDPR audit, policy validation
- **Chaos**: Failure scenario planning, blast radius analysis
- **Data**: Pipeline monitoring, migration validation

### RAG Knowledge Base
Organizations can upload documents (PDF, Markdown, text) that are embedded and indexed for retrieval:
- **Hybrid search**: pgvector dense embeddings (BGE-M3 1024d) + PostgreSQL tsvector (BM25-style)
- **Reciprocal Rank Fusion**: Merges the vector and keyword rankings into a single result list, so documents ranked highly by either method surface first
- **Context injection**: Relevant knowledge is automatically injected into AI conversations
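
The fusion step described above can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion in general, not Argonix's implementation: the function name, the sample document IDs, and the constant `k=60` (a commonly used default) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a documented
    Argonix setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the dense (pgvector) and keyword (tsvector) searches.
vector_hits = ["runbook-db", "postmortem-42", "oncall-guide"]
keyword_hits = ["postmortem-42", "slo-policy", "runbook-db"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear in both lists accumulate two terms, which is why `postmortem-42` and `runbook-db` end up ahead of the single-list hits.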

### LLM Providers
4 supported providers with automatic cost tracking:
- **Local** (default): Qwen 3.5 9B via Ollama — zero cost, zero data egress, ~2s latency
- **Google**: Gemini 2.5 Flash / Pro via Vertex AI
- **Anthropic**: Claude Sonnet 4 / Opus
- **OpenAI**: GPT-4o / GPT-4o-mini

---

## Features

### Multi-Protocol Monitoring
Argonix supports 9 monitor types:
- **HTTP(S)**: GET/POST/PUT/PATCH/DELETE with headers, body, auth, keyword matching, follow redirects, SSL verification
- **Ping (ICMP)**: Host reachability
- **TCP**: Port connectivity checks
- **DNS**: Record type queries (A, AAAA, CNAME, MX, TXT, NS, SOA, SRV) with expected value assertions
- **SSL Certificate**: Expiration monitoring with configurable warning threshold (days)
- **Keyword**: Check for presence/absence of a keyword on a page
- **gRPC**: gRPC service health checks
- **Heartbeat/Push**: Cron job monitoring — endpoint receives pings, alerts if silent beyond grace period
- **Multi-step HTTP**: Ordered sequence of HTTP requests with variable extraction between steps
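
As an illustration of the SSL Certificate type listed above, the sketch below turns a certificate's `notAfter` field into a status using a warning threshold in days. The status names (`ok`, `warning`, `expired`) and the 14-day default are assumptions for the example; only the standard-library `ssl.cert_time_to_seconds` helper is real.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now):
    """Return whole days between `now` and a certificate's notAfter value.

    `not_after` uses the format found in ssl.SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". `now` must be timezone-aware.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def ssl_status(not_after, now, warning_days=14):
    """Classify a certificate as expired, inside the warning window, or ok."""
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return "expired"
    return "warning" if remaining <= warning_days else "ok"
```

Passing `now` explicitly keeps the function deterministic, which makes the threshold logic easy to test without a live TLS connection.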

Each monitor supports:
- Configurable check intervals (down to 30s)
- Timeout and retry settings
- Multi-region checks
- Tags for organization
- Custom assertions (status code, response body, headers, timing thresholds)
- Remediation scripts (Python scripts that run automatically on failure, before the final verdict)

### Synthetic Testing
Two types of synthetic tests:

**API Tests**: Chain multiple HTTP requests in sequence. Extract variables from responses and use them in subsequent steps. Assert on status codes, response bodies, and timing.
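
To make the variable-passing idea concrete, here is a small sketch of extracting values from one response and substituting them into the next request. The `{{name}}` placeholder syntax, the dotted-path notation, and both helper names are hypothetical; the actual Argonix step format is not shown in this document.

```python
import re

# Hypothetical placeholder syntax: {{name}} or {{ name }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, variables):
    """Replace {{name}} placeholders with values captured in earlier steps."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


def extract(response_json, paths):
    """Pull values out of a JSON response using dotted paths like 'data.token'."""
    captured = {}
    for name, path in paths.items():
        value = response_json
        for key in path.split("."):
            value = value[key]
        captured[name] = value
    return captured


# Step 1 response (sample data), then step 2 built from the captured values.
login_response = {"data": {"token": "abc123", "user": {"id": 7}}}
variables = extract(login_response, {"token": "data.token", "user_id": "data.user.id"})
next_url = render("https://api.example.com/users/{{ user_id }}/orders", variables)
auth_header = render("Bearer {{token}}", variables)
```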

**Browser Tests** (Playwright): Headless Chromium browser scenarios with step-by-step actions:
- Navigate to URL
- Click elements (CSS selectors)
- Type text into inputs
- Wait for elements/navigation
- Take screenshots at each step
- Assert on page content and element visibility

Test runs capture:
- Total duration
- Per-step results (pass/fail, duration, error messages)
- Screenshots (stored as binary, served via API)
- Region/location of execution

**Notification Channels**: Each synthetic test can be linked to notification channels (Slack, Discord, Jira, Email, Webhook). After every run, results are dispatched to all linked channels, which makes them a natural fit for CI/CD feedback loops.

**CI/CD Integration**: Trigger tests via `POST /organizations/{id}/synthetic-tests/{id}/run/`. Webhook callbacks include a rich JSON payload with event type, test status, step-by-step results, total duration, and error details. Use from GitHub Actions, GitLab CI, Jenkins, or any pipeline.
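
A pipeline step that triggers a run might look like the sketch below. The endpoint path and the `Api-Key` header format come from this document; the empty JSON body, the assumption that the response is JSON, and the function names are illustrative guesses.

```python
import json
import urllib.request

API_BASE = "https://api.argonix.io/api/0.1"


def build_run_request(org_id, test_id, api_key):
    """Build (but do not send) the POST request that triggers a test run."""
    url = f"{API_BASE}/organizations/{org_id}/synthetic-tests/{test_id}/run/"
    return urllib.request.Request(
        url,
        data=b"{}",  # assumed empty JSON body; run options are not documented here
        method="POST",
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )


def trigger_run(org_id, test_id, api_key, timeout=30):
    """Send the request and decode the response, assuming it is JSON."""
    request = build_run_request(org_id, test_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

In a CI job, `api_key` would typically come from a secret environment variable rather than being hard-coded.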

### Test Management
Organize and track testing efforts beyond individual synthetic tests:

**Test Suites**: Group multiple synthetic tests and manual test cases into named suites. Run all synthetic tests in a suite with one click, creating a suite run snapshot. Each run tracks per-test results (pass/fail/blocked) with progress percentage. Tag suites for filtering.

**Manual Test Cases**: Create manual test cases with step-by-step instructions. Each step has a description and expected result. Prioritize as critical, high, medium, or low. Manual test results are recorded within suite runs.

**Suite Runs**: Each "Run All" action creates a suite run that snapshots all tests in the suite. Synthetic tests run automatically; manual tests can be marked pass/fail/blocked within each run. Full run history with per-test detail.

**Test Plans**: Combine multiple suites into a campaign. Track overall progress based on the latest suite run for each suite — with pass/fail/untested counts and completion percentage. Click any suite to navigate directly to its detail. Use for release sign-off, sprint testing, or regression campaigns.
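
The progress numbers for a plan can be derived from the latest result of each test, roughly as follows. This is a sketch under assumptions: the input shape is invented for the example, and counting `blocked` results as executed is a design guess rather than documented behavior.

```python
from collections import Counter


def plan_progress(latest_results):
    """Summarize a test plan from the latest suite-run result of each test.

    `latest_results` maps a test name to "pass", "fail", "blocked", or None,
    where None means the test has not been executed yet.
    """
    counts = Counter(status or "untested" for status in latest_results.values())
    total = len(latest_results)
    executed = total - counts["untested"]
    completion = round(100 * executed / total, 1) if total else 0.0
    return {
        "pass": counts["pass"],
        "fail": counts["fail"],
        "blocked": counts["blocked"],
        "untested": counts["untested"],
        "completion_pct": completion,
    }


summary = plan_progress(
    {"login": "pass", "checkout": "fail", "search": None, "export": "pass"}
)
```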

**AI Test Generation**: Provide a URL and select a test type (Browser, API, or Manual). Argonix uses GPT-4o to analyze the page and generate ready-to-use test steps. Browser tests generate Playwright actions (navigate, click, type, assert). API tests generate HTTP request chains with assertions. Manual tests generate step-by-step cases with expected results.

### Alerting System
9 notification channel types:
- **Email**: Send to configured addresses
- **Slack**: Webhook integration
- **Webhook**: HTTP POST to any URL (ideal for CI/CD callbacks)
- **PagerDuty**: Integration key
- **OpsGenie**: API key integration
- **Telegram**: Bot token + chat ID
- **Discord**: Webhook URL
- **Microsoft Teams**: Webhook URL
- **Jira**: Create issues on failures, add recovery comments on resolution (Jira Cloud REST API v3)

Alert rules support:
- Trigger conditions: status_change, goes_down, goes_up, degraded, ssl_expiry
- Target: specific monitors, monitors with matching tags, or all monitors
- Consecutive failure threshold before alerting
- Cooldown period between repeated alerts
- Multiple channels per rule
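
The interaction between the consecutive failure threshold and the cooldown period is easiest to see in code. The class below is a minimal sketch with hypothetical names, not the Argonix rule engine: it alerts once the threshold is reached and then stays quiet until the cooldown has elapsed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRuleState:
    failure_threshold: int  # consecutive failures required before alerting
    cooldown_seconds: int   # minimum gap between repeated alerts
    consecutive_failures: int = 0
    last_alert_at: Optional[float] = None

    def record_check(self, ok: bool, now: float) -> bool:
        """Record one check result; return True if an alert should be sent."""
        if ok:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures < self.failure_threshold:
            return False  # not enough failures in a row yet
        if (
            self.last_alert_at is not None
            and now - self.last_alert_at < self.cooldown_seconds
        ):
            return False  # still inside the cooldown window
        self.last_alert_at = now
        return True


# Threshold of 3 failures, 300-second cooldown, failing checks at these times.
rule = AlertRuleState(failure_threshold=3, cooldown_seconds=300)
decisions = [rule.record_check(False, t) for t in (0, 60, 120, 180, 480)]
```

With these inputs the first alert fires on the third failure (t=120), the fourth failure is suppressed by the cooldown, and the fifth (t=480) alerts again.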

### Incident Management
- Auto-created when monitors go down
- Status flow: ongoing → acknowledged → resolved
- Timeline events: triggered, acknowledged, resolved, escalated, comment
- Duration tracking
- Manual creation supported

### Maintenance Windows
- One-time (start/end datetime)
- Recurring: daily, weekly (select weekdays), monthly (day of month)
- Cron expression support
- Optional: scope to a group of monitors
- Alerts are suppressed during active maintenance
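
Alert suppression for a weekly recurring window can be sketched as a simple time check. The window representation below (a set of weekdays plus start and end times) is an assumption for illustration, and the sketch ignores time zones and windows that cross midnight.

```python
from datetime import datetime, time


def in_weekly_window(now, weekdays, start, end):
    """Return True if `now` falls inside a weekly recurring window.

    `weekdays` uses datetime.weekday() numbering (Monday is 0). The end time
    is exclusive, and the window is assumed not to cross midnight.
    """
    return now.weekday() in weekdays and start <= now.time() < end


def should_suppress(now, windows):
    """Return True if any configured window is active at `now`."""
    return any(in_weekly_window(now, **window) for window in windows)


# Hypothetical window: Saturdays and Sundays, 02:00 to 04:00.
windows = [{"weekdays": {5, 6}, "start": time(2, 0), "end": time(4, 0)}]
```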

### Public Status Pages
Each status page has:
- Unique slug (e.g., status.argonix.io/my-company)
- Custom domain support with DNS verification (CNAME + TXT record)
- Branding: logo, favicon, accent color, custom CSS
- SEO: meta title, meta description
- Content: header text, footer text
- Visibility: public or private (password-protected)
- Health graph toggle
- Monitor display: select monitors, custom display names, ordering, grouping

Status page incidents:
- Severity: minor, major, critical
- Status: investigating, identified, monitoring, resolved, maintenance
- Markdown updates with author tracking
- Affected monitors linking

Subscriber notifications:
- Email and webhook subscribers
- Double opt-in (confirmation token)
- Unsubscribe token
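
The double opt-in flow can be pictured with the in-memory sketch below: a subscriber receives notifications only after presenting the confirmation token, and a separate token removes them. The class, its field names, and the token length are assumptions, not the Argonix data model.

```python
import secrets


class Subscribers:
    """In-memory stand-in for a status page's subscriber table."""

    def __init__(self):
        self._subs = []

    def subscribe(self, email):
        """Create an unconfirmed subscriber with two independent tokens."""
        sub = {
            "email": email,
            "confirmed": False,
            "confirm_token": secrets.token_urlsafe(32),
            "unsubscribe_token": secrets.token_urlsafe(32),
        }
        self._subs.append(sub)
        return sub

    def confirm(self, token):
        """Mark the matching subscriber as confirmed; return True on success."""
        for sub in self._subs:
            # Constant-time comparison; tokens are ASCII strings.
            if secrets.compare_digest(sub["confirm_token"], token):
                sub["confirmed"] = True
                return True
        return False

    def unsubscribe(self, token):
        """Remove the subscriber holding this token; return True if removed."""
        before = len(self._subs)
        self._subs = [
            s for s in self._subs
            if not secrets.compare_digest(s["unsubscribe_token"], token)
        ]
        return len(self._subs) < before

    def recipients(self):
        """Only confirmed subscribers receive notifications."""
        return [s["email"] for s in self._subs if s["confirmed"]]
```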

### Metrics & Observability
Aggregated metrics at three granularities:
- **5 minutes**: High-resolution recent data
- **1 hour**: Medium-term trends
- **1 day**: Long-term historical data

Each data point includes:
- Average, min, max response time
- Percentiles: p50, p95, p99
- Total and successful check counts
- Uptime percentage
- DNS/TCP/TLS time breakdown
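
As a rough picture of how one data point could be computed from raw checks, the sketch below aggregates a bucket of `(response_ms, ok)` pairs. The nearest-rank percentile method and the field names are assumptions; the document does not say which percentile definition Argonix uses, and the DNS/TCP/TLS breakdown is omitted.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty ascending list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def aggregate(checks):
    """Aggregate one non-empty bucket of (response_ms, ok) check results."""
    times = sorted(ms for ms, _ in checks)
    successful = sum(1 for _, ok in checks if ok)
    return {
        "avg_ms": sum(times) / len(times),
        "min_ms": times[0],
        "max_ms": times[-1],
        "p50_ms": percentile(times, 50),
        "p95_ms": percentile(times, 95),
        "p99_ms": percentile(times, 99),
        "total_checks": len(checks),
        "successful_checks": successful,
        "uptime_pct": 100 * successful / len(checks),
    }


# Sample bucket: ten checks, one slow failure.
bucket = [
    (100, True), (120, True), (110, True), (900, False), (130, True),
    (105, True), (115, True), (125, True), (95, True), (140, True),
]
point = aggregate(bucket)
```

The single 900 ms outlier barely moves the median but dominates p95 and p99, which is why the percentiles are reported alongside the average.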

**Prometheus endpoint**: Export metrics in Prometheus format at `/api/0.1/metrics/prometheus/`

**Uptime summary**: Organization-wide uptime aggregation

### Deployment Tracking
- Record deployments with version, description, commit SHA, service name
- Optional: link to a specific monitor
- Correlate deployment events with performance changes

### Terraform Provider
Official Terraform provider on the HashiCorp Registry for managing Argonix resources as infrastructure-as-code.

- **Registry**: https://registry.terraform.io/modules/argonix-io/argonix/provider/latest
- **GitHub**: https://github.com/argonix-io/terraform-provider-argonix
- 9 resource types: `argonix_monitor`, `argonix_synthetic_test`, `argonix_alert_channel`, `argonix_notification_rule`, `argonix_status_page`, `argonix_group`, `argonix_test_suite`, `argonix_test_plan`, `argonix_manual_test_case`
- All resources have corresponding data sources for reading existing configuration
- Provider configuration requires only an API key

Example:
```hcl
resource "argonix_monitor" "api" {
  name           = "Production API"
  monitor_type   = "http"
  url            = "https://api.example.com/health"
  check_interval = 60
  regions        = jsonencode(["eu-france"])
}
```

### Kubernetes CRD Operator
Deploy and manage Argonix resources natively in Kubernetes using Custom Resource Definitions.

- **GitHub**: https://github.com/argonix-io/kubernetes-crd
- Install via Helm chart
- API group: `argonix.io`, API version: `v1alpha1`
- 9 CRD types: Monitor, SyntheticTest, AlertChannel, NotificationRule, StatusPage, Group, TestSuite, TestPlan, ManualTestCase
- Automatic reconciliation with the Argonix API (create, update, delete)
- Works with GitOps tools: ArgoCD, Flux, kubectl apply

Example:
```yaml
apiVersion: argonix.io/v1alpha1
kind: Monitor
metadata:
  name: production-api
spec:
  name: Production API
  monitorType: http
  url: https://api.example.com/health
  checkInterval: 60
  regions:
    - eu-france
```

### GitOps / Monitoring as Code
- Version-control your entire monitoring stack alongside application code
- Review monitoring changes in pull requests
- Apply via CI/CD pipelines (`terraform apply` or `kubectl apply`)
- Drift detection and auto-reconciliation
- Use Terraform workspaces or Kustomize overlays for multi-environment management (staging vs production)
- Integrates with GitHub Actions, GitLab CI, and Jenkins

### Billing (Stripe)

| Feature | Free | Startup (€10/mo) | Pro (€200/mo) | Custom |
|---------|------|-------------------|---------------|--------|
| Argos AI runs/mo | 20 | 100 | 1,000 | Unlimited |
| Monitors | 1 | 5 | 100 | Unlimited |
| Synthetic runs/mo | 20 | 100 | 1,000 | Unlimited |
| SSO | Yes | Yes | Yes | Yes |
| API access | No | No | Yes | Yes |
| Prometheus export | No | No | Yes | Yes |
| Self-hosted | No | No | No | Yes |
| Dedicated support | No | No | No | Yes |

---

## API Reference

All endpoints below are relative to the API base `https://api.argonix.io/api/0.1/`, so each path is prefixed with `/api/0.1/`.

### Authentication
- `POST /accounts/token/` — Obtain JWT access + refresh tokens
- `POST /accounts/logout/` — Blacklist refresh token
- `POST /token/refresh/` — Refresh access token
- `POST /token/verify/` — Verify token validity
- API Key authentication: `Authorization: Api-Key ax_...` header

### Account Management
- `POST /accounts/register/` — Create new account
- `GET/PATCH /accounts/profile/` — View/update profile
- `POST /accounts/change-password/` — Change password
- `POST /accounts/verify/email/request/` — Request email verification
- `POST /accounts/verify/email/confirm/` — Confirm email verification code
- `GET /accounts/social/providers/` — List social login providers
- `POST /accounts/social/login/` — Social login
- `DELETE /accounts/delete/` — Delete account
- `POST /accounts/reset-password/email/` — Password reset

### Organizations
- `GET/POST /organizations/` — List/create organizations
- `GET/PATCH/DELETE /organizations/{id}/` — Retrieve/update/delete
- `GET/POST /organizations/{id}/members/` — List/add members
- `GET/POST /organizations/{id}/invitations/` — List/send invitations
- `GET /invitations/` — List user's pending invitations

### Groups
- `GET/POST /organizations/{id}/groups/` — List/create groups
- `GET/PATCH/DELETE /organizations/{id}/groups/{id}/` — CRUD group
- `GET/POST /organizations/{id}/groups/{id}/members/` — Group members

### Monitors
- `GET/POST /organizations/{id}/monitors/` — List/create monitors
- `GET/PATCH/DELETE /organizations/{id}/monitors/{id}/` — CRUD monitor
- `GET /organizations/{id}/monitors/{id}/results/` — Check results history
- `GET /organizations/{id}/monitors/{id}/metrics/` — Monitor metrics

### Synthetic Tests
- `GET/POST /organizations/{id}/synthetic-tests/` — List/create
- `GET/PATCH/DELETE /organizations/{id}/synthetic-tests/{id}/` — CRUD test
- `POST /organizations/{id}/synthetic-tests/{id}/run/` — Trigger a test run (CI/CD)
- `GET /organizations/{id}/synthetic-tests/{id}/runs/` — Test run history
- `GET /organizations/{id}/synthetic-tests/{id}/runs/{id}/screenshots/{step}/` — Step screenshot

### Test Management
- `GET/POST /organizations/{id}/test-suites/` — List/create test suites
- `GET/PATCH/DELETE /organizations/{id}/test-suites/{id}/` — CRUD suite
- `POST /organizations/{id}/test-suites/{id}/run_all/` — Run all tests in suite (creates suite run)
- `GET/DELETE /organizations/{id}/suite-runs/` — List/delete suite runs (?suite= filter)
- `GET /organizations/{id}/suite-runs/{id}/` — Suite run detail with per-test results
- `GET/PATCH /organizations/{id}/suite-run-results/` — List/update individual test results within a run
- `GET/POST /organizations/{id}/manual-test-cases/` — List/create manual test cases
- `GET/PATCH/DELETE /organizations/{id}/manual-test-cases/{id}/` — CRUD test case
- `GET/POST /organizations/{id}/test-plans/` — List/create test plans
- `GET/PATCH/DELETE /organizations/{id}/test-plans/{id}/` — CRUD plan
- `POST /organizations/{id}/testing/ai/generate/` — AI-generate test steps from URL

### Alerting
- `GET/POST /organizations/{id}/alert-channels/` — List/create channels
- `GET/PATCH/DELETE /organizations/{id}/alert-channels/{id}/` — CRUD channel
- `GET/POST /organizations/{id}/alert-rules/` — List/create rules
- `GET/PATCH/DELETE /organizations/{id}/alert-rules/{id}/` — CRUD rule

### Incidents
- `GET/POST /organizations/{id}/incidents/` — List/create incidents
- `GET/PATCH /organizations/{id}/incidents/{id}/` — View/update incident

### Maintenance Windows
- `GET/POST /organizations/{id}/maintenance-windows/` — List/create
- `GET/PATCH/DELETE /organizations/{id}/maintenance-windows/{id}/` — CRUD window

### Status Pages
- `GET/POST /organizations/{id}/status-pages/` — List/create
- `GET/PATCH/DELETE /organizations/{id}/status-pages/{id}/` — CRUD page
- `POST /organizations/{id}/status-pages/{id}/verify-domain/` — Verify custom domain
- `GET/POST /organizations/{id}/status-pages/{id}/monitors/` — Page monitors
- `GET/POST /organizations/{id}/status-pages/{id}/incidents/` — Page incidents
- `GET/POST /organizations/{id}/status-pages/{id}/incidents/{id}/updates/` — Incident updates

### Public Status Page (no auth)
- `GET /status/{slug}/` — Public status page data
- `POST /status/{slug}/subscribe/` — Subscribe (email/webhook)
- `GET /status/confirm/{token}/` — Confirm subscription
- `GET /status/unsubscribe/{token}/` — Unsubscribe

### Billing
- `GET /pricing/plans/` — List pricing plans
- `GET /organizations/{id}/usage/` — Current usage & limits
- `POST /organizations/{id}/billing/checkout/` — Create Stripe checkout session
- `POST /organizations/{id}/billing/portal/` — Create Stripe billing portal session
- `POST /organizations/{id}/billing/cancel/` — Cancel subscription
- `POST /billing/webhook/` — Stripe webhook handler

### Deployments
- `GET/POST /organizations/{id}/deployments/` — List/create deployments
- `GET/PATCH/DELETE /organizations/{id}/deployments/{id}/` — CRUD deployment

### API Keys
- `GET/POST /organizations/{id}/api-keys/` — List/create API keys
- `GET/PATCH/DELETE /organizations/{id}/api-keys/{id}/` — CRUD API key

### Metrics
- `GET /organizations/{id}/uptime-summary/` — Org-wide uptime summary
- `GET /metrics/prometheus/` — Prometheus-format metrics export

### Heartbeat
- `GET/POST /heartbeat/{monitor_id}/{token}/` — Push heartbeat (public, no auth)
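
A cron job can report in by calling this endpoint after it finishes. The sketch below is one possible wrapper, with hypothetical function names: it builds the URL from the path above and pings only when the job completes without raising, so a crashed job stays silent and eventually trips the grace period.

```python
import urllib.request

API_BASE = "https://api.argonix.io/api/0.1"


def heartbeat_url(monitor_id, token):
    """Build the public heartbeat URL for one monitor."""
    return f"{API_BASE}/heartbeat/{monitor_id}/{token}/"


def send_heartbeat(monitor_id, token, timeout=10):
    """Ping the heartbeat endpoint with a GET and return the HTTP status code."""
    url = heartbeat_url(monitor_id, token)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_with_heartbeat(job, monitor_id, token, ping=send_heartbeat):
    """Run `job` and ping only if it finishes without raising."""
    result = job()  # an exception here propagates, so no ping is sent
    ping(monitor_id, token)
    return result
```

The `ping` parameter exists so the wrapper can be exercised in tests without network access.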

### Announcements & Feedback
- `GET /announcements/active/` — Active system announcements
- `GET /organizations/{id}/feedback/question/` — Get feedback question
- `POST /organizations/{id}/feedback/respond/` — Submit feedback response

### Argos AI Agent
- `GET/POST /organizations/{id}/argos/conversations/` — List/create AI conversations
- `GET/DELETE /organizations/{id}/argos/conversations/{id}/` — Retrieve/delete conversation
- `POST /organizations/{id}/argos/conversations/{id}/message/` — Send message (triggers AI response)
- `GET /organizations/{id}/argos/personas/` — List available personas
- `GET /organizations/{id}/argos/connectors/` — List available connectors and their tools
- `GET /organizations/{id}/argos/workflows/` — List pre-built workflows
- `GET/POST /organizations/{id}/argos/knowledge/` — List/upload knowledge documents
- `GET/DELETE /organizations/{id}/argos/knowledge/{id}/` — Retrieve/delete document
- `GET /organizations/{id}/argos/dashboard/` — AI usage dashboard (runs, costs, top personas)

---

## Data Models

### Account
User model with email-based auth, phone verification, referral system, and preferences (language, timezone).

### Organization
Multi-tenant workspace with UUID-based ID, slug, roles (owner/admin/editor/viewer), invitation system, and billing subscription.

### Monitor
Core entity representing a monitored endpoint. Supports 9 types with extensive configuration (headers, auth, assertions, remediation, regions, tags).

### SyntheticTest
Multi-step test scenario (API or browser-based) with configurable intervals, assertions, Playwright browser automation, and optional notification channels for CI/CD feedback.

### TestSuite, ManualTestCase, TestPlan, SuiteRun, SuiteRunResult
Test management entities for organizing, executing, and tracking tests. Suites group synthetic tests and manual test cases; suite runs snapshot test execution with per-test pass/fail/blocked results; manual test cases have step-by-step instructions; test plans combine suites into campaigns with progress tracking based on latest suite runs.

### CheckResult
Individual check execution result with full timing breakdown (DNS, TCP, TLS, transfer), status code, SSL info, and assertion results.

### Incident
Tracks service disruptions from detection to resolution with timeline events and duration metrics.

### AlertChannel & AlertRule
Configurable notification pipeline with 9 channel types (including Jira) and rule-based targeting (by monitor, tags, or all monitors).

### StatusPage
Public-facing status dashboard with custom branding, domain, and subscriber notification system.

### MetricDataPoint
Aggregated performance data at 5-minute, 1-hour, and 1-day granularities with response time percentiles and uptime tracking.

---

## Tech Stack

- **Backend**: Python 3.13, Django 5, Django REST Framework, Celery + Redis
- **Database**: PostgreSQL + pgvector (RAG embeddings)
- **AI**: LangChain, Ollama (Qwen 3.5 9B), Google Vertex AI, Anthropic, OpenAI
- **Frontend**: Vue 3 (Composition API), Vite, Tailwind CSS, vue-i18n
- **Browser Testing**: Playwright (headless Chromium)
- **Billing**: Stripe (subscriptions, checkout sessions, billing portal)
- **Auth**: JWT (SimpleJWT) + custom API key authentication
- **Monitoring**: Multi-region check execution, 30s minimum intervals
- **Observability**: Prometheus metrics export, Sentry error tracking
- **Infrastructure as Code**: Official Terraform provider, Kubernetes CRD operator (Helm chart)
- **i18n**: English and French (route-based locale switching)
- **Deployment**: Docker (multi-stage builds), uWSGI