Configuration System¶
Colony's configuration system is a single centralized, typed, layered store, defined
by Pydantic ConfigComponent classes. Every operator-tunable value Colony
exposes — LLM topology, the agent system, sandbox images, custom deployments,
observability, capability secrets — flows through it. Extensions register
new components from outside the public colony source tree without modifying
it. Operators override values at deploy time via a single --config
YAML file; tenants and sessions layer their own overrides on top at runtime.
```mermaid
graph LR
    Defaults["Pydantic field defaults<br/>(in each ConfigComponent class)"]
    YAML["Operator YAML<br/>(--config /path/to/file.yaml)"]
    Env["Environment variables<br/>(json_schema_extra: env)"]
    L2["L2 — tenant overlay"]
    L3["L3 — session overlay"]
    L4["L4 — runtime overlay<br/>(custom deployments)"]

    Defaults --> Loaded["Loaded config (L1)"]
    YAML --> Loaded
    Env --> Loaded
    Loaded --> Composed["get_component_for(path,<br/>tenant_id, session_id)"]
    L2 --> Composed
    L3 --> Composed
    L4 --> Composed
```
What the system gives you¶
| Need | Surface |
|---|---|
| Discover what's tunable | cm.get_schema(format="yaml") — the JSON schema of every registered component. |
| Read a typed value | cm.get_component("agent_system") returns a typed AgentSystemConfig. |
| Override at deploy time | One YAML file, passed via colony-env up --config <path>. |
| Override per environment | Field-level POLYMATHERA_<PATH>_<FIELD> env-var binding. |
| Override per tenant / session at runtime | cm.update_overlay(path, updates, scope=OverlayScope.tenant(tid)). |
| Push runtime values from a custom deployment | await ctx.write_runtime_overlay(path, updates) after provision(). |
| Add a new tunable from an external package | @register_polymathera_config(path="my_ext.thing") on a ConfigComponent subclass; declare in [tool.poetry.plugins."polymathera.config_components"]. |
Resolution chain¶
For every field of every registered ConfigComponent, the precedence (lowest
→ highest) is:
1. Pydantic field default declared on the component class.
2. Operator YAML at `--config`, loaded by `ConfigurationManager` once per startup.
3. Environment variable declared on the field via `json_schema_extra={"env": "FOO_BAR"}`; the catch-all `POLYMATHERA_<dotted_path>_<field>` always works as well.
4. Tier overlays (L2 tenant / L3 session / L4 runtime) layer on top at read time when the caller asks for a scoped view.
There is no fallback location search. --config is the only file path. The
resolution is deterministic — no ambiguity about which file produced a value.
A typed component, end to end¶
A ConfigComponent is a Pydantic model with a registered path and per-field
metadata. Defaults live in the class; env bindings live in
json_schema_extra; tier metadata is added via the tier_metadata helper
so the loader and overlay store know who is allowed to write what.
```python
from pydantic import Field

from polymathera.colony.distributed.config import (
    ConfigComponent, Mutability, Tier,
    register_polymathera_config, tier_metadata,
)

@register_polymathera_config(path="capabilities.web_search")
class WebSearchConfig(ConfigComponent):
    api_key: str = Field(
        default="",
        json_schema_extra={
            "env": "TAVILY_API_KEY", "optional": True,
            **tier_metadata(tier=Tier.L1_OPERATOR, mutability=Mutability.RELOADABLE),
        },
    )
```
Operators can set it in three equivalent ways: through the operator YAML, the declared env binding, or the catch-all env variable.
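For example, all three of the following land on the same api_key field (the key value here is a placeholder):

```yaml
# my-config.yaml, passed via --config — operator YAML override
capabilities:
  web_search:
    api_key: "tvly-placeholder"
```

Equivalently, export the declared binding (`export TAVILY_API_KEY=...`) or the catch-all (`export POLYMATHERA_CAPABILITIES_WEB_SEARCH_API_KEY=...`).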
Capabilities read it via a sync helper, not by re-reading env vars themselves:
```python
from polymathera.colony.agents.configs import get_web_search_config

class WebSearchCapability(WebSearchAdapter):
    def __init__(self, *, api_key: str | None = None, ...):
        self._api_key = api_key or get_web_search_config().api_key
```
The helper degrades to defaults when the global manager has not initialized
yet (e.g. inside unit tests that build a capability directly). All capability
secrets that previously read os.environ directly use this pattern now —
WebSearchCapability, GitHubCapability, ChromaMemoryBackend, the
two cluster tracing call sites, the web-UI backend.
Tier-aware overlays¶
L2/L3/L4 overlays let you change values after deploy without touching the operator YAML. Each layer has a meaning:
| Tier | Scope key | Persistence | Use case |
|---|---|---|---|
| `L1_OPERATOR` | global | YAML / env vars | Cluster topology, default LLM, capability secrets — anything the operator owns at deploy time. |
| `L2_TENANT` | `tenant_id` | StateManager (CAS) | Per-tenant quota raises, per-tenant API key overrides, per-tenant analysis selection. |
| `L3_SESSION` | `session_id` | StateManager (CAS) | Per-session timeouts, budgets, repo selection. |
| `L4_RUNTIME` | deployment name | StateManager (CAS) | Values produced by a custom deployment's provision() (e.g. an HPC stack returning its scheduler URL). |
A field's declared tier is the highest layer permitted to write it. A tenant
overlay cannot override a field marked L1_OPERATOR; a session cannot override
an L2_TENANT field. Writes that violate this raise PermissionError before
they touch the state.
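A minimal sketch of that guard, assuming tiers order as integers from L1 (lowest) to L4 (highest) — the names mirror the table above, but the check itself is illustrative, not the real overlay-store code:

```python
from enum import IntEnum

class Tier(IntEnum):
    # Higher value = higher layer in the overlay stack.
    L1_OPERATOR = 1
    L2_TENANT = 2
    L3_SESSION = 3
    L4_RUNTIME = 4

def check_write_allowed(field_tier: Tier, write_tier: Tier) -> None:
    """A field's declared tier is the highest layer permitted to write it."""
    if write_tier > field_tier:
        raise PermissionError(
            f"{write_tier.name} may not override a {field_tier.name} field"
        )

check_write_allowed(Tier.L2_TENANT, Tier.L2_TENANT)   # allowed
# check_write_allowed(Tier.L1_OPERATOR, Tier.L2_TENANT) raises PermissionError
```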
```python
from polymathera.colony.distributed.config import OverlayScope

# Read the tenant-and-session-composed view of a component.
quotas = await cm.get_component_for(
    "tenant_quotas", tenant_id="acme", session_id="s-42",
)

# Write a tenant overlay.
await cm.update_overlay(
    "tenant_quotas",
    {"max_concurrent_agents": 200},
    scope=OverlayScope.tenant("acme"),
)
```
L1 is in-process. L2/L3/L4 share one ConfigOverlayState document
persisted via the existing colony StateManager (the same primitive VCM and
the convergence runtime use). Cross-replica consistency rides on the
StateManager's CAS — no etcd, no separate config-update protocol.
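The cross-replica write path behaves like a classic compare-and-swap loop. The sketch below is a toy stand-in — the actual StateManager API differs — but it shows why no separate config-update protocol is needed: a replica that loses the race simply re-reads and retries:

```python
class InMemoryStore:
    """Toy versioned document store standing in for the StateManager."""
    def __init__(self):
        self._doc, self._version = {}, 0

    def read(self, key):
        return dict(self._doc), self._version

    def compare_and_swap(self, key, new_doc, expected_version):
        if expected_version != self._version:
            return False  # another replica committed first; caller retries
        self._doc, self._version = new_doc, self._version + 1
        return True

def cas_update(store, key, mutate, max_retries=10):
    """Optimistic update: re-read and retry if another replica won the race."""
    for _ in range(max_retries):
        doc, version = store.read(key)
        if store.compare_and_swap(key, mutate(doc), expected_version=version):
            return True
    return False

store = InMemoryStore()
cas_update(store, "overlays", lambda d: {**d, "max_concurrent_agents": 200})
```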
Custom deployments¶
Custom deployments are the single most important extension surface for operators: a typed mechanism to plug in externally-managed resources (HPC stacks, AWS-CDK stacks, Slurm clusters, MQTT bridges, …) without coupling Colony to the implementation. Colony defines the contract; an extension package implements concrete handlers.
```mermaid
sequenceDiagram
    autonumber
    participant Op as Operator YAML
    participant CM as ConfigurationManager
    participant H as CustomDeployment handler<br/>(extension)
    participant OS as OverlayStore (L4)
    participant Reader as Any consumer

    Op->>CM: custom_deployments.<name>.handler = "aws_cdk_hpc"
    CM->>H: provision(ctx) (when auto_provision=true)
    H->>H: bring up the resource
    H->>OS: ctx.write_runtime_overlay("hpc.endpoints", {...})
    Reader->>CM: get_component_for("hpc.endpoints")
    CM-->>Reader: defaults ⊕ L4 overlay
```
Implementing a handler¶
```python
from polymathera.colony.deployments import (
    DeploymentContext, register_custom_deployment,
)

@register_custom_deployment("aws_cdk_hpc")
class AwsCdkHpc:
    name = "aws_cdk_hpc"

    async def provision(self, ctx: DeploymentContext) -> None:
        endpoint, token = await _stack_up(ctx.config_manager.get_component("aws"))
        await ctx.write_runtime_overlay(
            "hpc.endpoints",
            {"scheduler_url": endpoint, "auth_token": token},
        )

    async def query_state(self, ctx):
        return await _describe_stack()

    async def tear_down(self, ctx):
        await _stack_destroy()
```
Wiring it in the operator YAML¶
```yaml
custom_deployments:
  deployments:
    cps_hpc_aero:
      handler: aws_cdk_hpc   # registered name
      enabled: true
      auto_provision: true
      params: { stack_name: my-stack, region: us-west-2 }
```
The instance name (cps_hpc_aero) doubles as the L4 overlay scope key. After
provision(), every consumer reading hpc.endpoints via
cm.get_component_for("hpc.endpoints") observes the new values — no
restart, no signal handling, no manual reload.
Pluggable LLM providers¶
Adding a new remote LLM backend follows the same registration pattern,
through a dedicated registry in cluster/remote_registry.py. Built-ins
(Anthropic, OpenRouter) register themselves at module-import time and are
lazy-loaded on first lookup so CPU-only environments without the optional
vllm extra never pay for unused module loads.
```python
from polymathera.colony.cluster.remote_registry import register_remote_llm_provider
from polymathera.colony.cluster.remote_deployment import RemoteLLMDeployment

@register_remote_llm_provider("my_provider")
class MyProviderDeployment(RemoteLLMDeployment):
    async def _initialize_client(self) -> None: ...
    async def _call_api(self, messages, **kw): ...
```
Operator YAML:
```yaml
cluster:
  remote_deployments:
    - model_name: "my-org/my-model"
      provider: "my_provider"
      api_key_env_var: "MY_API_KEY"
```
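The lazy-load behaviour described above can be sketched as a registry that stores import paths and only resolves them on first lookup. This is an illustrative toy, not the actual cluster/remote_registry.py code; the stdlib `json:dumps` path stands in for a real provider class:

```python
import importlib

class LazyProviderRegistry:
    """Map provider names to 'module:attr' import paths; import on first use."""
    def __init__(self):
        self._paths: dict[str, str] = {}
        self._loaded: dict[str, object] = {}

    def register(self, name: str, import_path: str) -> None:
        self._paths[name] = import_path        # no import happens here

    def get(self, name: str):
        if name not in self._loaded:           # import deferred to first lookup
            module_name, attr = self._paths[name].split(":")
            self._loaded[name] = getattr(importlib.import_module(module_name), attr)
        return self._loaded[name]

registry = LazyProviderRegistry()
registry.register("json_provider", "json:dumps")  # stand-in for a provider class
```

Registering is cheap; environments that never look a provider up never pay its import cost — which is how a CPU-only deployment avoids loading the optional vllm extra.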
Shipping config from an external package¶
Extensions like polymathera-cps register their components without
patching public colony files. Two entry points connect them:
```toml
# In your package's pyproject.toml
[tool.poetry.plugins."polymathera.config_components"]
my_extension = "polymathera.cps.config:register_components"
```

```python
# polymathera/cps/config.py
def register_components() -> None:
    """Side effect: importing these modules triggers their
    @register_polymathera_config / @register_custom_deployment /
    @register_remote_llm_provider decorators."""
    from . import deployments  # noqa: F401
    from . import analysis_types  # noqa: F401
    from . import remote_providers  # noqa: F401
```
ConfigurationManager.initialize() walks the
polymathera.config_components group at startup, calls each registered
function once, and isolates failures (one broken extension is logged and
skipped — it does not block the rest).
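The failure-isolation walk can be sketched as follows. The function name and the fake entry-point class are illustrative, not Colony's actual internals; the point is the per-extension try/except:

```python
import logging

log = logging.getLogger(__name__)

def run_registrations(entry_points) -> list[str]:
    """Invoke each extension's registration hook once; log and skip
    failures so one broken extension cannot block the rest."""
    loaded = []
    for ep in entry_points:
        try:
            ep.load()()  # resolve the hook, then call it
            loaded.append(ep.name)
        except Exception:
            log.exception("config extension %s failed; skipping", ep.name)
    return loaded

class FakeEntryPoint:
    """Stand-in for importlib.metadata.EntryPoint in this sketch."""
    def __init__(self, name, fn):
        self.name, self._fn = name, fn
    def load(self):
        return self._fn

def boom():
    raise RuntimeError("broken extension")

ok = run_registrations([FakeEntryPoint("bad", boom), FakeEntryPoint("good", lambda: None)])
# "bad" is logged and skipped; only "good" registers
```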
Use cases¶
1. Swap the default LLM for a local OpenRouter model¶
```yaml
# my-config.yaml
cluster:
  remote_deployments:
    - model_name: "deepseek/deepseek-v3.2"
      provider: "openrouter"
      api_key_env_var: "OPENROUTER_API_KEY"
      num_replicas: 2
```
2. Raise a tenant's quota at runtime — no restart¶
```python
async def raise_quota(cm, tenant_id: str, new_max: int):
    await cm.update_overlay(
        "tenant_quotas",
        {"max_concurrent_agents": new_max},
        scope=OverlayScope.tenant(tenant_id),
    )
```
The next call to cm.get_component_for("tenant_quotas",
tenant_id=tenant_id) from any replica observes the new ceiling.
3. Inject HPC scheduler endpoints into agents after stack-up¶
A CPS-side custom deployment provisions an AWS-CDK HPC stack and writes the returned scheduler URL + credentials into the L4 overlay:
```python
await ctx.write_runtime_overlay(
    "hpc.endpoints",
    {"scheduler_url": "https://cdk-stack-42.elb...", "auth_token": "..."},
)
```
A CFD-analysis agent capability reads it in its action body:
```python
endpoints = await self.cm.get_component_for("hpc.endpoints")
result = await self._submit_job(endpoints.scheduler_url, endpoints.auth_token, job)
```
The capability code is unchanged whether the operator runs against the local stub backend, the test sandbox, or production HPC — only the overlay differs.
4. Override a capability secret per environment without editing YAML¶
```shell
# CI environment
export TAVILY_API_KEY=ci-test-key

# Staging
export POLYMATHERA_CAPABILITIES_WEB_SEARCH_API_KEY=staging-key
```
Both paths land on the same Pydantic field. The first uses the field's
declared env binding (TAVILY_API_KEY); the second uses the
catch-all (POLYMATHERA_<PATH>_<FIELD>). Use the declared one when you want
a stable name; use the catch-all when reaching into a third-party
ConfigComponent whose author didn't pre-declare a name you like.
5. Disable tracing globally without code changes¶
Tracing reads from ObservabilityConfig. Every tracing call site (agent
base, both cluster facilities) resolves through get_observability_config(),
so flipping one field reaches every consumer.
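For example, in the operator YAML (the exact field name under observability is illustrative here — check the registered schema via cm.get_schema() for the real one):

```yaml
# my-config.yaml — assumed field name for illustration
observability:
  tracing_enabled: false
```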
File layout¶
```text
colony/configs/
├── README.md       # operator-facing overview + registered-component table
└── example.yaml    # documented template — copy + customize

colony/src/polymathera/colony/distributed/config/
├── manager.py      # ConfigurationManager — load, set_config_path, update_*
├── configs.py      # ConfigComponent base, registry, env-var application
├── extensions.py   # discover_config_components (entry-point walker)
├── overlays.py     # ConfigOverlayState, OverlayScope, OverlayStore
└── tiers.py        # Tier / Mutability / Persistence enums + tier_metadata
```
See also¶
- `colony/configs/README.md` — full table of registered components and field-by-field reference.
- `colony/configs/example.yaml` — annotated template ready to copy and edit.
- colony-env guide — how `colony-env up --config <file>` plumbs the YAML into the cluster.
- Sandboxed Shell capability — `SandboxImagesConfig` is its operator surface.
- User Plugin capability — `PluginsConfig` is its operator surface.