# Observability
Colony provides built-in distributed observability for long-running multi-agent sessions. Both traces (structured span data) and logs (standard Python logging) are durably persisted via Kafka and PostgreSQL, enabling post-mortem debugging even after agents have stopped.
## Architecture

### Traces
Traces capture structured execution spans (LLM calls, agent steps, tool invocations) with parent-child relationships, token counts, and timing data. See the `AgentTracingFacility` for details.

Pipeline: `AgentTracingFacility` → `SpanProducer` → Kafka (`colony.spans`) → `SpanConsumer` → PostgreSQL `spans` table → `SpanQueryStore` → Dashboard **Traces** tab.
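For orientation, a span record flowing through this pipeline might look roughly like the following Python dict. The field names here are assumptions made for illustration, not the actual `colony.spans` schema (see `AgentTracingFacility` for the real one):

```python
# Illustrative only: field names are guesses based on the description above,
# not the actual colony.spans schema.
span = {
    "trace_id": "tr-7f3a",         # groups all spans of one execution
    "span_id": "sp-002",
    "parent_span_id": "sp-001",    # parent-child relationship
    "kind": "llm_call",            # e.g. llm_call, agent_step, tool_invocation
    "started_at": 1712000000.125,  # timing data
    "ended_at": 1712000001.850,
    "tokens_in": 512,              # token counts
    "tokens_out": 128,
    "session_id": "sess-42",
}
```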
### Logs
Every Python log record emitted under the `polymathera.colony` namespace is captured, enriched with execution context, and durably stored.
#### How it works
- `KafkaLogHandler` — A standard Python `logging.Handler` attached to the `polymathera.colony` root logger during deployment initialization. It intercepts all log records without requiring changes to existing log calls (see the sketch after this list).
- **Context enrichment** — Each log record is enriched with the current `ExecutionContext` (`tenant_id`, `colony_id`, `session_id`, `run_id`, `trace_id`) when available. This enables filtering logs by session or correlating them with traces.
- **Async batching** — Log records are queued in-process and flushed to Kafka asynchronously in batches (default: 50 records every 2 seconds). The `emit()` method never blocks the caller.
- `LogConsumer` — Runs in the dashboard backend container. Reads from the `colony.logs` Kafka topic and batch-inserts into PostgreSQL.
- `LogQueryStore` — Provides filtered, paginated queries over persisted logs:
    - Filter by session, run, trace, actor class, log level
    - Full-text search in messages (case-insensitive)
    - Time range queries
    - Aggregate statistics (error counts, actor summaries)
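The actual handler lives in the Colony codebase; the snippet below is a minimal sketch of the pattern the list above describes (a non-blocking `logging.Handler` that batches records to Kafka), assuming `confluent-kafka` as the producer client. The class name, defaults, and record format are illustrative, not the real API.

```python
import json
import logging
import queue
import threading

from confluent_kafka import Producer  # assumed client library


class KafkaLogHandlerSketch(logging.Handler):
    """Minimal illustration of the KafkaLogHandler pattern; not Colony's code."""

    def __init__(self, bootstrap, topic="colony.logs",
                 batch_size=50, flush_interval=2.0):
        super().__init__()
        self._queue = queue.SimpleQueue()
        self._producer = Producer({"bootstrap.servers": bootstrap})
        self._topic = topic
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        threading.Thread(target=self._drain, daemon=True).start()

    def emit(self, record):
        # Never blocks the caller: just enqueue a dict for the background thread.
        # The real handler also enriches this with ExecutionContext fields
        # (tenant_id, colony_id, session_id, run_id, trace_id) when available.
        self._queue.put({
            "level": record.levelname,
            "logger_name": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "func_name": record.funcName,
            "line_no": record.lineno,
        })

    def _drain(self):
        # Flush up to batch_size records every flush_interval seconds.
        while True:
            batch = []
            try:
                batch.append(self._queue.get(timeout=self._flush_interval))
                while len(batch) < self._batch_size:
                    batch.append(self._queue.get_nowait())
            except queue.Empty:
                pass
            for item in batch:
                self._producer.produce(self._topic, json.dumps(item).encode())
            self._producer.flush()
```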
## Querying logs
The dashboard exposes persistent log endpoints:
```
GET /api/v1/logs/persistent?session_id=X&level=WARNING&limit=100
GET /api/v1/logs/persistent?run_id=Y&search=timeout
GET /api/v1/logs/persistent?actor_class=StandaloneAgentDeployment&since=1712000000
GET /api/v1/logs/persistent/stats?session_id=X
GET /api/v1/logs/persistent/actors
```
These work even after the application stops — logs are durably stored in PostgreSQL as long as the dashboard and database containers are running.
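For example, pulling the most recent warnings for one session from a locally running dashboard (the base URL and the response shape are assumptions; adjust to your deployment):

```python
import requests

BASE = "http://localhost:8000"  # assumed dashboard address

# WARNING-level logs for one session, newest first.
resp = requests.get(
    f"{BASE}/api/v1/logs/persistent",
    params={"session_id": "sess-42", "level": "WARNING", "limit": 100},
)
resp.raise_for_status()
for rec in resp.json():  # assuming the endpoint returns a JSON list of records
    print(rec["timestamp"], rec["level"], rec["message"])
```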
## Log record schema
Each log record stored in PostgreSQL contains:
| Field | Type | Description |
|---|---|---|
| `log_id` | TEXT | Unique identifier |
| `timestamp` | TIMESTAMPTZ | When the log was emitted |
| `level` | TEXT | DEBUG, INFO, WARNING, ERROR, CRITICAL |
| `logger_name` | TEXT | Python logger name (e.g., `polymathera.colony.agents.base`) |
| `message` | TEXT | Log message |
| `module` | TEXT | Python module name |
| `func_name` | TEXT | Function that emitted the log |
| `line_no` | INTEGER | Source line number |
| `pid` | INTEGER | Process ID |
| `actor_class` | TEXT | Deployment class name (e.g., `StandaloneAgentDeployment`) |
| `node_id` | TEXT | Ray node ID |
| `tenant_id` | TEXT | Tenant ID (from execution context) |
| `colony_id` | TEXT | Colony ID (from execution context) |
| `session_id` | TEXT | Session ID (from execution context) |
| `run_id` | TEXT | Run ID (from execution context) |
| `trace_id` | TEXT | Trace ID (for correlation with spans) |
| `exc_info` | TEXT | Exception traceback, if present |
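Put together, a single persisted record might look like this (every value below is invented for illustration):

```python
{
    "log_id": "log-01HTX9Z3",
    "timestamp": "2024-04-01T12:00:00.125Z",
    "level": "WARNING",
    "logger_name": "polymathera.colony.agents.base",
    "message": "tool call timed out after 30s",
    "module": "base",
    "func_name": "run_step",
    "line_no": 214,
    "pid": 4711,
    "actor_class": "StandaloneAgentDeployment",
    "node_id": "ray-node-abc123",
    "tenant_id": "acme",
    "colony_id": "col-7",
    "session_id": "sess-42",
    "run_id": "run-9",
    "trace_id": "tr-7f3a",
    "exc_info": None,
}
```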
### Indexes
Logs are indexed for fast queries on common access patterns:
- `(session_id, timestamp DESC)` — all logs for a session, newest first
- `(run_id, timestamp DESC)` — all logs for a specific run
- `(actor_class, timestamp DESC)` — all logs from a deployment type
- `(level, timestamp DESC)` — find errors/warnings quickly
- `(trace_id, timestamp DESC)` — correlate logs with traces
- `(timestamp DESC)` — global time-ordered access
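In practice, a `LogQueryStore`-style query maps directly onto one of these indexes. A minimal sketch using `psycopg2`, assuming the table is named `logs` (the real table name and store API may differ):

```python
import psycopg2

conn = psycopg2.connect("dbname=colony user=colony")  # assumed DSN
with conn, conn.cursor() as cur:
    # Served by the (session_id, timestamp DESC) index: one session's logs,
    # newest first, with the level filter applied on top.
    cur.execute(
        """
        SELECT timestamp, level, message
        FROM logs
        WHERE session_id = %s AND level = ANY(%s)
        ORDER BY timestamp DESC
        LIMIT 100
        """,
        ("sess-42", ["ERROR", "CRITICAL"]),
    )
    for ts, level, message in cur.fetchall():
        print(ts, level, message)
```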
## Setup
The log pipeline is automatic. When `KAFKA_BOOTSTRAP` is set in the environment (which it is in all Docker containers), every deployment attaches the `KafkaLogHandler` during initialization. No configuration is required.

To disable the log pipeline, unset the `KAFKA_BOOTSTRAP` environment variable.
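In sketch form, initialization behaves roughly like this (a paraphrase of the behavior described above, not Colony's actual code):

```python
import logging
import os

bootstrap = os.environ.get("KAFKA_BOOTSTRAP")
if bootstrap:
    # Attaching at the namespace root captures every polymathera.colony.* logger.
    logging.getLogger("polymathera.colony").addHandler(
        KafkaLogHandlerSketch(bootstrap)  # sketch class from "How it works"
    )
```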
### Infrastructure requirements
Both traces and logs require:
- Kafka — Message broker for reliable delivery and replay
- PostgreSQL — Durable storage and indexed queries
- Dashboard container — Runs the Kafka consumers that sink to PostgreSQL
All three are included in the default `colony-env` Docker Compose setup.