
Execution Context

Colony is a shared-nothing distributed system where multiple tenants run workloads on the same cluster. The execution context is the ambient credential that every operation carries, answering three questions: who is running, what scope they're in, and what privilege they have.

OS analogy

Think of the execution context as a CPU's privilege ring register combined with a process's UID/GID. Every operation (in Colony, every @serving.endpoint call) runs at a specific ring level, and the system enforces access control based on that level.

The ExecutionContext Object

from polymathera.colony.distributed.ray_utils.serving.context import (
    Ring, ExecutionContext, execution_context,
)

# User-mode context (tenant-scoped)
with execution_context(
    ring=Ring.USER,
    colony_id="acme-monorepo",
    tenant_id="acme-corp",
    session_id="sess-abc123",
    run_id="run-def456",
    origin="cli",
):
    result = await handle.start_analysis(config)

# Kernel-mode context (infrastructure)
with execution_context(ring=Ring.KERNEL, origin="vcm_reconciler"):
    names = await handle.get_all_deployment_names()

The context is a frozen dataclass — immutable once created. You don't mutate it; you create a new one for a new scope (e.g., when a kernel task iterates over tenants).

| Field | Type | Description |
| --- | --- | --- |
| `ring` | `Ring` | Privilege level: `KERNEL` (0) or `USER` (3) |
| `colony_id` | `str \| None` | Colony instance (required for `Ring.USER`) |
| `tenant_id` | `str \| None` | Organization/tenant (required for `Ring.USER`) |
| `session_id` | `str \| None` | User session (optional) |
| `run_id` | `str \| None` | Analysis run (optional) |
| `trace_id` | `str \| None` | Distributed tracing ID (optional) |
| `origin` | `str \| None` | What created this context (audit) |

Privilege Rings

[Diagram: privilege rings. Ring 3 (USER) requires colony_id + tenant_id. Ring 0 (KERNEL) requires no tenant context and hosts endpoints such as get_deployment_names(), reconcile_page_state(), health_check(), and get_replica_count(); user code reaches them via the syscall pattern.]

Colony uses two privilege rings, modeled after CPU protection rings:

| Ring | Value | Tenant context | Use case |
| --- | --- | --- | --- |
| `KERNEL` | 0 | Not required | VCM reconciliation, health checks, autoscaling, deployment management |
| `USER` | 3 | Required (`colony_id` + `tenant_id`) | Agent execution, analysis runs, session management |

Calling Rules

| Caller | Target | Allowed? |
| --- | --- | --- |
| USER | USER | Yes (tenant context validated) |
| USER | KERNEL | Yes (syscall pattern) |
| KERNEL | KERNEL | Yes |
| KERNEL | USER | No — must enter USER context first |
| No context | Any | Error |

Syscall pattern

User-mode code can call kernel-mode endpoints freely — just like user processes making system calls. The kernel endpoint simply doesn't validate tenant fields. The execution context is still propagated for audit and tracing.


Context Propagation

Python contextvars don't survive Ray .remote() calls. The serving framework bridges this gap by serializing the ExecutionContext into DeploymentRequest and restoring it on the other side.

[Diagram: request lifecycle, context propagation. Caller process (A): the DeploymentHandle reads the ExecutionContext from contextvars and captures it into the DeploymentRequest. Ray .remote() to the proxy actor (B): the context is restored from the request and the request is routed to the replica queue. Ray .remote() to the replica actor (C): the context is restored and enforced (caller ring validated against the endpoint requirement), then method(*args) is called, and the context is reset on exit.]

Injection Points

| Point | Location | What happens |
| --- | --- | --- |
| A | `DeploymentHandle.__getattr__` | Reads `ExecutionContext` from contextvars, serializes into `DeploymentRequest` |
| B | `proxy.handle_request` | Restores `ExecutionContext` from request (for proxy-local logging/metrics) |
| C | `__handle_request__` on replica | Restores context, validates caller ring vs endpoint ring, calls method |

Why asyncio.create_task Works

Python's asyncio.create_task copies the current contextvars.Context at task creation time (PEP 567). So when start_agent (running inside an ExecutionContext) does asyncio.create_task(self._run_agent_loop(...)), the agent loop inherits the full context — including ring, tenant, and session — for its entire lifetime.
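This behavior is standard-library Python and easy to verify in isolation: changes made to the parent's context after `create_task` do not leak into the task's copy.

```python
import asyncio
import contextvars

tenant: contextvars.ContextVar[str] = contextvars.ContextVar("tenant", default="none")


async def agent_loop(results: list[str]) -> None:
    # Runs in the copy of the context captured at create_task() time.
    await asyncio.sleep(0)
    results.append(tenant.get())


async def main() -> list[str]:
    results: list[str] = []
    tenant.set("acme-corp")
    task = asyncio.create_task(agent_loop(results))
    tenant.set("other-corp")  # later change: invisible to the task's copy
    await task
    return results

# asyncio.run(main()) → ["acme-corp"]
```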


Endpoint Ring Declaration

Every @serving.endpoint declares its ring level:

from polymathera.colony.distributed.ray_utils.serving import context as ctx

@serving.endpoint(ring=ctx.Ring.KERNEL)
async def get_all_deployment_names(self) -> list[str]:
    """Infrastructure endpoint -- no tenant context needed."""
    ...

@serving.endpoint  # defaults to Ring.USER
async def start_agent(self, blueprint: AgentBlueprint) -> str:
    """Tenant-scoped endpoint -- requires colony_id + tenant_id."""
    ...

The ring is stored as func.__endpoint_ring__ by the decorator and collected into self._endpoint_rings during replica initialization. The __handle_request__ method reads it to enforce access control before calling the actual method.


Background Tasks

Background tasks that run outside @serving.endpoint (periodic reconciliation, health checks, resource monitors) must set their own execution context. These are kernel-mode by definition:

from polymathera.colony.distributed.ray_utils.serving.context import (
    Ring, execution_context,
)

async def _periodic_reconciliation_loop(self):
    while True:
        await asyncio.sleep(30)
        with execution_context(ring=Ring.KERNEL, origin="vcm_reconciler"):
            await self._reconcile_page_state()

@periodic_health_check Methods

Periodic health checks are routed through __handle_request__ with a kernel-mode ExecutionContext. This means the health check method runs with proper context and can make downstream deployment calls without crashing.


Ring Transitions

USER to KERNEL (automatic)

When user-mode code calls a kernel endpoint, the existing ExecutionContext is propagated as-is. The kernel endpoint simply doesn't validate tenant fields. No explicit transition needed.

KERNEL to USER (explicit)

A kernel task that needs to operate per-tenant must explicitly create a USER context:

# Inside a kernel-mode background task
with execution_context(ring=Ring.KERNEL, origin="garbage_collector"):
    tenants = await admin_handle.list_all_tenants()
    for tenant in tenants:
        with execution_context(
            ring=Ring.USER,
            colony_id=tenant.colony_id,
            tenant_id=tenant.tenant_id,
            origin="garbage_collector:per_tenant",
        ):
            await handle.cleanup_expired_caches()

The execution_context manager uses token-based reset, so nesting works correctly — the inner context fully replaces the outer one, and the outer is restored on exit.
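The token mechanics are plain `contextvars`; a self-contained illustration of the set/reset behavior the manager relies on:

```python
import contextvars

ring: contextvars.ContextVar[str] = contextvars.ContextVar("ring")

outer = ring.set("KERNEL")     # kernel background task enters its context
inner = ring.set("USER")       # per-tenant scope fully replaces it
assert ring.get() == "USER"
ring.reset(inner)              # leaving the inner scope...
assert ring.get() == "KERNEL"  # ...restores exactly the outer value
ring.reset(outer)
```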


Reading Context in Downstream Code

Any code running inside an execution context can read it:

from polymathera.colony.distributed.ray_utils import serving

# Full context object
ctx = serving.require_execution_context()
print(ctx.ring, ctx.colony_id, ctx.tenant_id)

# Shorthand accessors
colony_id = serving.get_colony_id()
tenant_id = serving.get_tenant_id()
session_id = serving.get_session_id()

# Strict accessors (raise if None)
colony_id = serving.require_colony_id()
tenant_id = serving.require_tenant_id()

The check_isolation Decorator

Agents use @check_isolation to verify that the ambient execution context matches the agent's identity — catching context leaks or cross-tenant contamination:

@check_isolation
async def run_step(self) -> None:
    # This will raise RuntimeError if:
    # - ctx.tenant_id != self.tenant_id
    # - ctx.colony_id != self.colony_id
    ...