Observability

Colony provides built-in distributed observability for long-running multi-agent sessions. Both traces (structured span data) and logs (standard Python logging) are durably persisted via Kafka and PostgreSQL, enabling post-mortem debugging even after agents have stopped.

Architecture

[Architecture diagram] Traces and logs flow through two parallel pipelines from agents to durable storage:

  • Traces: Agent / Deployment → AgentTracingFacility → SpanProducer → Kafka topic colony.spans → SpanConsumer (dashboard backend) → PostgreSQL spans table (trace_id, agent_id, tokens, duration) → SpanQueryStore
  • Logs: Python logging.* → KafkaLogHandler → Kafka topic colony.logs → LogConsumer (dashboard backend) → PostgreSQL logs table (session_id, level, message, context) → LogQueryStore

Traces

Traces capture structured execution spans (LLM calls, agent steps, tool invocations) with parent-child relationships, token counts, and timing data. See the AgentTracingFacility for details.

Pipeline: AgentTracingFacility → SpanProducer → Kafka (colony.spans) → SpanConsumer → PostgreSQL spans table → SpanQueryStore → Dashboard Traces tab.
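
As a rough sketch of what moves through this pipeline, the snippet below publishes a span-like record to the colony.spans topic using the kafka-python client. The payload field names follow the architecture overview above; the actual SpanProducer schema and client library are assumptions for illustration.

# Hypothetical sketch of a span payload published to colony.spans.
# The schema and the use of kafka-python are assumptions, not the actual SpanProducer.
import json
import os
import time
import uuid

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=os.environ["KAFKA_BOOTSTRAP"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

span = {
    "span_id": str(uuid.uuid4()),
    "trace_id": str(uuid.uuid4()),
    "parent_span_id": None,        # parent-child relationships link spans into a trace
    "agent_id": "agent-123",
    "name": "llm.call",
    "tokens": 512,
    "duration_ms": 840,
    "started_at": time.time(),
}
producer.send("colony.spans", span)
producer.flush()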

Logs

Every Python log record emitted under the polymathera.colony namespace is captured, enriched with execution context, and durably stored.

How it works

  1. KafkaLogHandler — A standard Python logging.Handler attached to the polymathera.colony root logger during deployment initialization. It intercepts all log records without requiring changes to existing log calls (a sketch of a handler in this style follows this list).

  2. Context enrichment — Each log record is enriched with the current ExecutionContext (tenant_id, colony_id, session_id, run_id, trace_id) when available. This enables filtering logs by session or correlating them with traces.

  3. Async batching — Log records are queued in-process and flushed to Kafka asynchronously in batches (default: 50 records every 2 seconds). The emit() method never blocks the caller.

  4. LogConsumer — Runs in the dashboard backend container. Reads from the colony.logs Kafka topic and batch-inserts into PostgreSQL.

  5. LogQueryStore — Provides filtered, paginated queries over persisted logs:

       • Filter by session, run, trace, actor class, log level
       • Full-text search in messages (case-insensitive)
       • Time range queries
       • Aggregate statistics (error counts, actor summaries)
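
The following is a minimal sketch of a handler in the style described in steps 1–3. The class name, the ExecutionContext lookup, and the thread-based flush loop are assumptions for illustration; the real KafkaLogHandler may be wired differently.

# Minimal sketch of a KafkaLogHandler-style handler; not the actual implementation.
import json
import logging
import os
import queue
import threading

from kafka import KafkaProducer


class SketchKafkaLogHandler(logging.Handler):
    def __init__(self, topic="colony.logs", batch_size=50, flush_interval=2.0):
        super().__init__()
        self.topic = topic
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self._queue = queue.Queue()
        self._producer = KafkaProducer(
            bootstrap_servers=os.environ["KAFKA_BOOTSTRAP"],
            value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        )
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def emit(self, record):
        # Never blocks the caller: build an enriched dict and enqueue it.
        entry = {
            "level": record.levelname,
            "logger_name": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "func_name": record.funcName,
            "line_no": record.lineno,
            "pid": record.process,
        }
        # Enrich with the current ExecutionContext when available; reading it off
        # the record is a placeholder for however Colony actually exposes it.
        ctx = getattr(record, "execution_context", None)
        if ctx is not None:
            entry.update(
                tenant_id=ctx.tenant_id,
                colony_id=ctx.colony_id,
                session_id=ctx.session_id,
                run_id=ctx.run_id,
                trace_id=ctx.trace_id,
            )
        self._queue.put(entry)

    def _flush_loop(self):
        # Flush queued records in batches (default: 50 records every 2 seconds).
        while True:
            batch = []
            try:
                batch.append(self._queue.get(timeout=self.flush_interval))
                while len(batch) < self.batch_size:
                    batch.append(self._queue.get_nowait())
            except queue.Empty:
                pass
            for entry in batch:
                self._producer.send(self.topic, entry)
            if batch:
                self._producer.flush()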

Querying logs

The dashboard exposes persistent log endpoints:

GET /api/v1/logs/persistent?session_id=X&level=WARNING&limit=100
GET /api/v1/logs/persistent?run_id=Y&search=timeout
GET /api/v1/logs/persistent?actor_class=StandaloneAgentDeployment&since=1712000000
GET /api/v1/logs/persistent/stats?session_id=X
GET /api/v1/logs/persistent/actors

These work even after the application stops — logs are durably stored in PostgreSQL as long as the dashboard and database containers are running.
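
As a usage sketch, the snippet below queries two of these endpoints with the requests library; the dashboard address and the exact response shape are assumptions.

# Query the persistent log endpoints; host, port, and response shape are assumed.
import requests

BASE = "http://localhost:8080"  # illustrative dashboard address

resp = requests.get(
    f"{BASE}/api/v1/logs/persistent",
    params={"session_id": "sess-123", "level": "WARNING", "limit": 100},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # inspect the returned records directly

stats = requests.get(
    f"{BASE}/api/v1/logs/persistent/stats",
    params={"session_id": "sess-123"},
    timeout=10,
)
print(stats.json())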

Log record schema

Each log record stored in PostgreSQL contains:

Field         Type          Description
log_id        TEXT          Unique identifier
timestamp     TIMESTAMPTZ   When the log was emitted
level         TEXT          DEBUG, INFO, WARNING, ERROR, CRITICAL
logger_name   TEXT          Python logger name (e.g., polymathera.colony.agents.base)
message       TEXT          Log message
module        TEXT          Python module name
func_name     TEXT          Function that emitted the log
line_no       INTEGER       Source line number
pid           INTEGER       Process ID
actor_class   TEXT          Deployment class name (e.g., StandaloneAgentDeployment)
node_id       TEXT          Ray node ID
tenant_id     TEXT          Tenant ID (from execution context)
colony_id     TEXT          Colony ID (from execution context)
session_id    TEXT          Session ID (from execution context)
run_id        TEXT          Run ID (from execution context)
trace_id      TEXT          Trace ID (for correlation with spans)
exc_info      TEXT          Exception traceback, if present
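
For illustration only, a stored record might look roughly like this (every value below is invented):

example_record = {
    "log_id": "log-0001",
    "timestamp": "2025-04-01T12:34:56.789Z",
    "level": "WARNING",
    "logger_name": "polymathera.colony.agents.base",
    "message": "Tool call timed out after 30s",
    "module": "base",
    "func_name": "run_step",
    "line_no": 214,
    "pid": 87,
    "actor_class": "StandaloneAgentDeployment",
    "node_id": "ray-node-abc",
    "tenant_id": "tenant-1",
    "colony_id": "colony-1",
    "session_id": "sess-123",
    "run_id": "run-456",
    "trace_id": "trace-789",
    "exc_info": None,
}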

Indexes

Logs are indexed for fast queries on common access patterns:

  • (session_id, timestamp DESC) — all logs for a session, newest first
  • (run_id, timestamp DESC) — all logs for a specific run
  • (actor_class, timestamp DESC) — all logs from a deployment type
  • (level, timestamp DESC) — find errors/warnings quickly
  • (trace_id, timestamp DESC) — correlate logs with traces
  • (timestamp DESC) — global time-ordered access
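
A minimal sketch of the corresponding DDL, issued here from Python with psycopg2; the index names, table definition, and connection string are assumptions.

# Illustrative index DDL matching the access patterns above; names and DSN are assumed.
import psycopg2

DDL = [
    "CREATE INDEX IF NOT EXISTS idx_logs_session ON logs (session_id, timestamp DESC)",
    "CREATE INDEX IF NOT EXISTS idx_logs_run     ON logs (run_id, timestamp DESC)",
    "CREATE INDEX IF NOT EXISTS idx_logs_actor   ON logs (actor_class, timestamp DESC)",
    "CREATE INDEX IF NOT EXISTS idx_logs_level   ON logs (level, timestamp DESC)",
    "CREATE INDEX IF NOT EXISTS idx_logs_trace   ON logs (trace_id, timestamp DESC)",
    "CREATE INDEX IF NOT EXISTS idx_logs_ts      ON logs (timestamp DESC)",
]

with psycopg2.connect("postgresql://colony:colony@localhost:5432/colony") as conn:
    with conn.cursor() as cur:
        for stmt in DDL:
            cur.execute(stmt)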

Setup

The log pipeline is automatic. When KAFKA_BOOTSTRAP is set in the environment (which it is in all Docker containers), every deployment attaches the KafkaLogHandler during initialization. No configuration required.

To disable the log pipeline, unset the KAFKA_BOOTSTRAP environment variable.
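
Continuing the handler sketch from above, the attachment logic implied by this behavior might look like the following; the actual wiring inside Colony deployments is not shown here.

# Attach the handler only when KAFKA_BOOTSTRAP is configured (sketch).
import logging
import os

if os.environ.get("KAFKA_BOOTSTRAP"):
    handler = SketchKafkaLogHandler()  # illustrative class defined under "How it works"
    logging.getLogger("polymathera.colony").addHandler(handler)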

Infrastructure requirements

Both traces and logs require:

  • Kafka — Message broker for reliable delivery and replay
  • PostgreSQL — Durable storage and indexed queries
  • Dashboard container — Runs the Kafka consumers that sink to PostgreSQL

All three are included in the default colony-env Docker Compose setup.