
Execution Context

Colony is a shared-nothing distributed system where multiple tenants run workloads on the same cluster. The execution context is the ambient credential that every operation carries, answering three questions: who is running, what scope they're in, and what privilege they have.

OS analogy

Think of the execution context as a CPU's privilege ring register combined with a process's UID/GID. Every operation (in Colony, every @serving.endpoint call) runs at a specific ring level, and the system enforces access control based on that level.

The ExecutionContext Object

from polymathera.colony.distributed.ray_utils.serving.context import (
    Ring, ExecutionContext, execution_context,
)

# User-mode context (tenant-scoped)
with execution_context(
    ring=Ring.USER,
    colony_id="acme-monorepo",
    tenant_id="acme-corp",
    session_id="sess-abc123",
    run_id="run-def456",
    origin="cli",
):
    result = await handle.start_analysis(config)

# Kernel-mode context (infrastructure)
with execution_context(ring=Ring.KERNEL, origin="vcm_reconciler"):
    names = await handle.get_all_deployment_names()

The context is a frozen dataclass — immutable once created. You don't mutate it; you create a new one for a new scope (e.g., when a kernel task iterates over tenants).

| Field | Type | Description |
| --- | --- | --- |
| `ring` | `Ring` | Privilege level: `KERNEL` (0) or `USER` (3) |
| `colony_id` | `str \| None` | Colony instance (required for `Ring.USER`) |
| `tenant_id` | `str \| None` | Organization/tenant (required for `Ring.USER`) |
| `session_id` | `str \| None` | User session (optional) |
| `run_id` | `str \| None` | Analysis run (optional) |
| `trace_id` | `str \| None` | Distributed tracing ID (optional) |
| `origin` | `str \| None` | What created this context (audit) |

Privilege Rings

[Diagram: privilege rings. Ring 3 (USER) requires colony_id + tenant_id. Ring 0 (KERNEL) requires no tenant context and hosts endpoints such as get_deployment_names(), reconcile_page_state(), health_check(), and get_replica_count(); user code reaches them via the syscall pattern.]

Colony uses two privilege rings, modeled after CPU protection rings:

| Ring | Value | Tenant context | Use case |
| --- | --- | --- | --- |
| `KERNEL` | 0 | Not required | VCM reconciliation, health checks, autoscaling, deployment management |
| `USER` | 3 | Required (`colony_id` + `tenant_id`) | Agent execution, analysis runs, session management |

Calling Rules

| Caller | Target | Allowed? |
| --- | --- | --- |
| USER | USER | Yes (tenant context validated) |
| USER | KERNEL | Yes (syscall pattern) |
| KERNEL | KERNEL | Yes |
| KERNEL | USER | No — must enter USER context first |
| No context | Any | Error |

Syscall pattern

User-mode code can call kernel-mode endpoints freely — just like user processes making system calls. The kernel endpoint simply doesn't validate tenant fields. The execution context is still propagated for audit and tracing.


Context Propagation

Python contextvars don't survive Ray .remote() calls. The serving framework bridges this gap by serializing the ExecutionContext into DeploymentRequest and restoring it on the other side.

[Diagram: request lifecycle, context propagation. Caller process (A): the DeploymentHandle reads the ExecutionContext from contextvars and captures it into the DeploymentRequest. Ray .remote() to the proxy actor (B): the context is restored from the request and the request is routed to the replica queue. Ray .remote() to the replica actor (C): the context is restored and enforced (caller ring validated against the endpoint requirement), then method(*args) is called, and the context is reset on exit.]

Injection Points

| Point | Location | What happens |
| --- | --- | --- |
| A | `DeploymentHandle.__getattr__` | Reads `ExecutionContext` from contextvars, serializes into `DeploymentRequest` |
| B | `proxy.handle_request` | Restores `ExecutionContext` from request (for proxy-local logging/metrics) |
| C | `__handle_request__` on replica | Restores context, validates caller ring vs endpoint ring, calls method |

Why asyncio.create_task Works

Python's asyncio.create_task copies the current contextvars.Context at task creation time (PEP 567). So when start_agent (running inside an ExecutionContext) does asyncio.create_task(self._run_agent_loop(...)), the agent loop inherits the full context — including ring, tenant, and session — for its entire lifetime.
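This behavior is standard-library Python and easy to verify in isolation: changes made to the parent's context after `create_task` do not leak into the task's copy.

```python
import asyncio
import contextvars

tenant: contextvars.ContextVar[str] = contextvars.ContextVar("tenant", default="none")


async def agent_loop(results: list[str]) -> None:
    # Runs in the copy of the context captured at create_task() time.
    await asyncio.sleep(0)
    results.append(tenant.get())


async def main() -> list[str]:
    results: list[str] = []
    tenant.set("acme-corp")
    task = asyncio.create_task(agent_loop(results))
    tenant.set("other-corp")  # later change: invisible to the task's copy
    await task
    return results

# asyncio.run(main()) → ["acme-corp"]
```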


Endpoint Ring Declaration

Every @serving.endpoint declares its ring level:

from polymathera.colony.distributed.ray_utils.serving import context as ctx

@serving.endpoint(ring=ctx.Ring.KERNEL)
async def get_all_deployment_names(self) -> list[str]:
    """Infrastructure endpoint -- no tenant context needed."""
    ...

@serving.endpoint  # defaults to Ring.USER
async def start_agent(self, blueprint: AgentBlueprint) -> str:
    """Tenant-scoped endpoint -- requires colony_id + tenant_id."""
    ...

The ring is stored as func.__endpoint_ring__ by the decorator and collected into self._endpoint_rings during replica initialization. The __handle_request__ method reads it to enforce access control before calling the actual method.


Background Tasks

Background tasks that run outside @serving.endpoint (periodic reconciliation, health checks, resource monitors) must set their own execution context. These are kernel-mode by definition:

from polymathera.colony.distributed.ray_utils.serving.context import (
    Ring, execution_context,
)

async def _periodic_reconciliation_loop(self):
    while True:
        await asyncio.sleep(30)
        with execution_context(ring=Ring.KERNEL, origin="vcm_reconciler"):
            await self._reconcile_page_state()

@periodic_health_check Methods

Periodic health checks are routed through __handle_request__ with a kernel-mode ExecutionContext. This means the health check method runs with proper context and can make downstream deployment calls without crashing.


Ring Transitions

USER to KERNEL (automatic)

When user-mode code calls a kernel endpoint, the existing ExecutionContext is propagated as-is. The kernel endpoint simply doesn't validate tenant fields. No explicit transition needed.

KERNEL to USER (explicit)

A kernel task that needs to operate per-tenant must explicitly create a USER context:

# Inside a kernel-mode background task
with execution_context(ring=Ring.KERNEL, origin="garbage_collector"):
    tenants = await admin_handle.list_all_tenants()
    for tenant in tenants:
        with execution_context(
            ring=Ring.USER,
            colony_id=tenant.colony_id,
            tenant_id=tenant.tenant_id,
            origin="garbage_collector:per_tenant",
        ):
            await handle.cleanup_expired_caches()

The execution_context manager uses token-based reset, so nesting works correctly — the inner context fully replaces the outer one, and the outer is restored on exit.
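The token mechanics are plain `contextvars`; a self-contained illustration of the set/reset behavior the manager relies on:

```python
import contextvars

ring: contextvars.ContextVar[str] = contextvars.ContextVar("ring")

outer = ring.set("KERNEL")     # kernel background task enters its context
inner = ring.set("USER")       # per-tenant scope fully replaces it
assert ring.get() == "USER"
ring.reset(inner)              # leaving the inner scope...
assert ring.get() == "KERNEL"  # ...restores exactly the outer value
ring.reset(outer)
```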


Reading Context in Downstream Code

Any code running inside an execution context can read it:

from polymathera.colony.distributed.ray_utils import serving

# Full context object
ctx = serving.require_execution_context()
print(ctx.ring, ctx.colony_id, ctx.tenant_id)

# Shorthand accessors
colony_id = serving.get_colony_id()
tenant_id = serving.get_tenant_id()
session_id = serving.get_session_id()

# Strict accessors (raise if None)
colony_id = serving.require_colony_id()
tenant_id = serving.require_tenant_id()

The check_isolation Decorator

Agents use @check_isolation to verify that the ambient execution context matches the agent's identity — catching context leaks or cross-tenant contamination:

@check_isolation
async def run_step(self) -> None:
    # This will raise RuntimeError if:
    # - ctx.tenant_id != self.tenant_id
    # - ctx.colony_id != self.colony_id
    ...