`UserPluginCapability`¶

Lets users drop a directory into a workspace to add a new tool the agent can use. Discovers SKILL.md skill bundles and plugin.json plugin packages, validates parameters, and runs each skill inside a SandboxedShellCapability container.

Code: polymathera.colony.agents.patterns.capabilities.UserPluginCapability. Subpackage: _plugin/{schema,discovery}.py. Sample plugin: colony/src/polymathera/colony/samples/plugins/colony-samples/.

The on-disk layout deliberately overlaps with Claude Code's Skills/Plugins so users can move directories between Colony and Claude Code with minimal translation.

Why it exists¶

Colony will frequently meet domain-specific tools the framework knows nothing about: a particular CFD simulator, a CAD tool, a domain-specific verifier, an in-house data pipeline. Hardcoding adapters for every one is a losing battle. Instead, users add a directory and the agent discovers it. The capability is a registry + adapter, never a runner: execution always goes through the sandbox.

Layout¶

<plugin_root>/
└── colony-samples/
    ├── .claude-plugin/
    │   └── plugin.json
    └── skills/
        ├── code-complexity/
        │   ├── SKILL.md
        │   └── scripts/
        │       ├── run.sh
        │       └── complexity.py
        ├── scientific-debugging/
        │   ├── SKILL.md
        │   └── scripts/{run.sh, worksheet.py}
        └── systemic-vulnerability-scan/
            ├── SKILL.md
            └── scripts/{run.sh, scan.py}

A standalone skill (no plugin) sits at <skill_root>/<name>/SKILL.md. A plugin namespaces its skills under <plugin>/<skill> so two plugins can ship skills with the same local name.

`SKILL.md` frontmatter¶

---
name: code-complexity
description: |
  Compute per-function cyclomatic complexity for Python source files…
when_to_use: |
  Triggered when Python files are involved and the user wants a
  quantitative read on code complexity, refactor targets, or
  maintenance risk hotspots.
sandbox_image_role: default      # picks the Docker image for execution
script: scripts/run.sh           # path inside the skill dir
params:
  path: { type: string, required: true }
  threshold: { type: integer, required: false }
  top_n: { type: integer, required: false }
timeout_seconds: 120
paths: "**/*.py"                  # optional activation hint
disable-model-invocation: false   # default; true → user-only skill
---

# Skill body

Markdown body shown to the LLM when it `get_skill`-s the skill in detail.

The skill ships its own executable scripts under scripts/. They run inside the sandbox image declared by sandbox_image_role (defaults to the capability's default_sandbox_image_role, which is "default").
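As a minimal sketch of the first parsing step, the function below separates the `---`-delimited frontmatter from the Markdown body. It is not the parser in `_plugin/schema.py`; `split_frontmatter` is a hypothetical name, and the returned frontmatter text would still need to go through a YAML loader. The three outcomes (no frontmatter, valid, malformed) are an assumption based on the cases the test suite is said to cover.

```python
from typing import Optional, Tuple


def split_frontmatter(text: str) -> Tuple[Optional[str], str]:
    """Split a SKILL.md document into (frontmatter, body).

    Returns (None, text) when there is no frontmatter block and raises
    ValueError when an opening '---' line is never closed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, text  # no frontmatter at all
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return frontmatter, body
    raise ValueError("unterminated frontmatter block")


doc = """---
name: code-complexity
script: scripts/run.sh
timeout_seconds: 120
---

# Skill body
"""
frontmatter, body = split_frontmatter(doc)
print(frontmatter)
print(body)
```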

Discovery roots¶

Default search order (highest priority first):

<workspace_root>/.colony/skills — session/project (when the workspace is mounted)
~/.colony/skills — per-user
/etc/colony/skills — operator-managed shared

The same scheme applies to plugins/. On a name collision the higher-priority source wins, and the shadowed entry is logged at INFO. Plugin-namespaced skills (<plugin>/<name>) cannot collide, because the plugin name is part of the skill name.

extra_plugin_roots / extra_skill_roots add paths at SYSTEM priority — used by the session agent to ship the bundled colony-samples plugin without shadowing what the user installed locally.
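The priority rule for standalone skills can be sketched as a first-match-wins scan over the roots. This is an illustration under stated assumptions, not the code in `_plugin/discovery.py`: `discover_skills` and `make_skill` are hypothetical helpers, the skill name is taken from the directory name, and temporary directories stand in for the real roots.

```python
import logging
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

log = logging.getLogger("skill_discovery")


def discover_skills(roots: List[Tuple[str, Path]]) -> Dict[str, Tuple[str, Path]]:
    """Map skill name -> (source, directory); `roots` is ordered highest priority first."""
    found: Dict[str, Tuple[str, Path]] = {}
    for source, root in roots:
        if not root.is_dir():
            continue  # e.g. workspace not mounted, or no per-user directory
        for skill_md in sorted(root.glob("*/SKILL.md")):
            name = skill_md.parent.name
            if name in found:
                # An earlier (higher-priority) root already provides this name.
                log.info("skill %r from %s shadowed by %s", name, source, found[name][0])
                continue
            found[name] = (source, skill_md.parent)
    return found


def make_skill(root: Path, name: str) -> None:
    """Create a minimal skill directory for the demonstration."""
    (root / name).mkdir(parents=True)
    (root / name / "SKILL.md").write_text("---\nname: %s\n---\n" % name)


with tempfile.TemporaryDirectory() as tmp:
    session, user = Path(tmp) / "session", Path(tmp) / "user"
    make_skill(session, "lint")
    make_skill(user, "lint")      # shadowed by the session copy
    make_skill(user, "format")
    skills = discover_skills(
        [("session", session), ("user", user), ("system", Path(tmp) / "missing")]
    )
    sources = {name: source for name, (source, _) in skills.items()}
    print(sources)
```

Roots that do not exist are skipped rather than treated as errors, which matches the note that the session root only applies when the workspace is mounted.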

Action surface¶

Action	Purpose
`list_skills(source=…)`	Every loaded skill, optionally filtered by source (`session` / `user` / `system` / `plugin`).
`get_skill(name)`	Full metadata + body markdown.
`search_skills(query, max_results=10)`	Substring match against name / description / `when_to_use`, ranked.
`list_plugins`	Loaded plugins with their skill counts.
`reload_skills`	Force a rescan — useful when the user just installed something.
`run_skill(name, params=…, container_id=…, …)`	Validate + execute. Launches its own container if `container_id` is None and stops it on exit; otherwise runs in the caller's container.

get_action_group_description enumerates every loaded skill with a one-liner so the LLM sees them in the planning prompt without a separate list_skills call.
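A ranked substring search such as `search_skills` could look roughly like the sketch below. The source only says that matches against name, description and `when_to_use` are ranked; the field weights (3 / 2 / 1), the alphabetical tie-break, the `SkillInfo` type and the catalog entries are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SkillInfo:
    name: str
    description: str = ""
    when_to_use: str = ""


def search_skills(skills: List[SkillInfo], query: str, max_results: int = 10) -> List[SkillInfo]:
    """Case-insensitive substring search; the field weights are an assumed ranking."""
    q = query.lower()
    scored = []
    for sk in skills:
        score = 0
        if q in sk.name.lower():
            score += 3
        if q in sk.description.lower():
            score += 2
        if q in sk.when_to_use.lower():
            score += 1
        if score:
            scored.append((score, sk))
    # Highest score first; ties broken alphabetically so results are stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [sk for _, sk in scored[:max_results]]


# Hypothetical catalog, not the bundled skills.
catalog = [
    SkillInfo("secret-scan", "Finds hard-coded secrets"),
    SkillInfo("debug-worksheet", "RCA worksheet", "Use when complexity hides a bug"),
    SkillInfo("complexity-report", "Per-function complexity numbers"),
]
hits = [sk.name for sk in search_skills(catalog, "complexity")]
print(hits)
```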

Execution¶

async def run_skill(self, name, *, params=None, container_id=None, …):
    sk = self._resolve_skill(name)
    self._validate_params(sk, params)         # required + type check

    sandbox = self.agent.get_capability_by_type(SandboxedShellCapability)
    owned = False                             # set to True only if we launch the container here
    if container_id is None:
        launched = await sandbox.launch_container(
            image_role=sk.sandbox_image_role or self._default_sandbox_image_role,
            extra_volumes=[{"src": str(sk.directory), "dst": "/skill", "mode": "ro"}],
            …,
        )
        container_id = launched["container_id"]
        owned = True

    try:
        return await sandbox.execute_command(
            container_id=container_id,
            command=self._render_script_command(sk, params),  # bash -lc 'cd /skill && bash <script> --p1 v1 …'
            timeout_seconds=sk.timeout_seconds,
        )
    finally:
        if owned:
            await sandbox.stop_container(container_id)

Param validation:

required: true enforced.
Type strings (string, integer, number, boolean, array, object) checked. bool is excluded from integer to avoid silent acceptance of True.
Unknown params are passed through (the LLM may add extras).
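The three rules above can be sketched as follows. This is an illustration, not the body of `_validate_params`: the function name, the exception types and the `_TYPE_CHECKS` table are assumptions, and excluding `bool` from `number` is a guess (the source states the exclusion only for `integer`).

```python
from typing import Any, Dict

_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so it must be rejected explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    # Assumption: bool is rejected for "number" as well.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def validate_params(spec: Dict[str, Dict[str, Any]], params: Dict[str, Any]) -> None:
    """Raise ValueError for a missing required param, TypeError for a type mismatch."""
    for name, rule in spec.items():
        if name not in params:
            if rule.get("required", False):
                raise ValueError(f"missing required param: {name}")
            continue
        check = _TYPE_CHECKS.get(rule.get("type", ""))
        if check is not None and not check(params[name]):
            raise TypeError(f"param {name!r} must be of type {rule['type']}")
    # Names absent from the spec are deliberately left unchecked (passed through).


# The params block from the code-complexity frontmatter.
spec = {
    "path": {"type": "string", "required": True},
    "threshold": {"type": "integer", "required": False},
    "top_n": {"type": "integer", "required": False},
}
validate_params(spec, {"path": "src", "threshold": 10, "extra": "passed through"})
```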

Command rendering:

The script path itself supports {name} placeholders (rare).
All other params become --name value flag arguments, with each value individually shell-quoted.
The whole thing is wrapped as bash -lc 'cd /skill && bash <script> <args>'.
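A rough sketch of this rendering, using `shlex.quote` from the standard library, is shown below. It is not the body of `_render_script_command`: the function name is hypothetical, and converting values with `str()` and quoting the flag names and script path are assumptions about details the text does not specify.

```python
import shlex
from string import Formatter
from typing import Any, Dict


def render_script_command(script: str, params: Dict[str, Any]) -> str:
    """Render a `bash -lc` command line for a skill mounted at /skill."""
    # Params referenced as {name} in the script path are substituted there.
    used = {field for _, field, _, _ in Formatter().parse(script) if field}
    script_path = script.format(**{k: params[k] for k in used})
    # Every other param becomes a "--name value" pair, each part quoted on its own.
    args = []
    for name, value in params.items():
        if name not in used:
            args += [shlex.quote(f"--{name}"), shlex.quote(str(value))]
    inner = " ".join(["cd /skill", "&&", "bash", shlex.quote(script_path), *args])
    # Quote the inner command once more so bash -lc receives it as a single word.
    return "bash -lc " + shlex.quote(inner)


cmd = render_script_command("scripts/run.sh", {"path": "src/my file.py", "threshold": 10})
print(cmd)
```

Because each value is quoted before the outer wrap, a value such as `x; rm -rf /` reaches the script as one argument instead of being interpreted by the shell.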

The bundled `colony-samples` plugin¶

Three skills ship with the package and are auto-discovered by the session agent:

Skill	What it does
`colony-samples/code-complexity`	McCabe cyclomatic complexity for Python files (stdlib only).
`colony-samples/scientific-debugging`	Emits a structured RCA worksheet — observe → hypothesise → predict → experiment → conclude.
`colony-samples/systemic-vulnerability-scan`	Heuristic scanner for `bare_except`, `shell=True`, mutable defaults, hard-coded secrets, the documented `write_transaction` return-bug pattern, and asserts in non-test files.

Wired into the session agent via:

UserPluginCapability.bind(
    scope=BlackboardScope.SESSION,
    extra_plugin_roots=[_bundled_samples_plugins_root()],
)

The helper resolves to polymathera.colony.samples.__file__ / .. / plugins, so the plugin ships with the wheel and is discovered without extra mounts.

Compatibility with Claude Code¶

Field	Claude Code	Colony
`name`, `description`, `when_to_use`, `paths`, `disable-model-invocation`	✓	✓
`sandbox_image_role`	—	Colony-only
`script`	optional (inline shell injection allowed)	required
`params` type validation	loose	strict (rejects type mismatches)
Auto-discovery	`~/.claude/skills`, `.claude/skills`	`~/.colony/skills`, `<workspace>/.colony/skills`, `/etc/colony/skills`

Skills can be copied between the two ecosystems. Inline shell injection (Claude Code's !... syntax) is not supported — every Colony skill must declare a script, because every skill execution is containerised.

Security¶

Sandbox boundary: every skill runs inside SandboxedShellCapability. The capability never executes code in the host process.
disable-model-invocation: true: skills with this flag are skipped by search_skills and refused by run_skill (override with the allow_model_invocation_override blueprint kwarg for testing).
Image role: a skill that requests an unknown sandbox_image_role fails at launch_container time.
Param validation and quoting: run_skill checks required params and types before execution, and every substituted argument is individually shell-quoted, so a parameter value cannot be reinterpreted as shell syntax.
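The disable-model-invocation gate can be pictured with the small sketch below. `SkillFlags`, `searchable` and `ensure_runnable` are hypothetical names, `PermissionError` is an assumed error type, and applying the override only to execution (not to search) is one reading of the rule rather than confirmed behaviour.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SkillFlags:
    name: str
    disable_model_invocation: bool = False


def searchable(skills: Iterable[SkillFlags]) -> List[SkillFlags]:
    """Skills the model may see in search results."""
    return [sk for sk in skills if not sk.disable_model_invocation]


def ensure_runnable(skill: SkillFlags, allow_override: bool = False) -> None:
    """Refuse user-only skills unless the testing override is set."""
    if skill.disable_model_invocation and not allow_override:
        raise PermissionError(f"skill {skill.name!r} is user-only")


loaded = [SkillFlags("open-skill"), SkillFlags("user-only-skill", disable_model_invocation=True)]
visible = [sk.name for sk in searchable(loaded)]
print(visible)
```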

Test surface¶

tests/test_user_plugin_capability.py (27 tests). Uses real filesystem tmp_path for discovery + a stub SandboxedShellCapability for execution. Covers: frontmatter parser (no FM / valid / malformed), source-priority collisions, per-skill error isolation, plugin namespacing, action surface, end-to-end run_skill with launch/exec/stop wiring, param required/type validation, disable-model-invocation gating, missing-sandbox error, launch failure surfacing, shell-quoting helper.

Open follow-ups¶

Dynamic per-skill @action_executor: the design proposes registering each discovered skill as its own action key (e.g., skill.colony-samples.code-complexity). The dispatcher walks cls.__dict__, not instance attrs, so this requires a dispatcher refactor; v1 ships run_skill(name) instead.
Procedural-memory sync: when the memory system supports skill storage, add _sync_to_procedural_memory() so feedback can refine skills over time.
Plugin agents/ directories: a plugin could ship its own child-agent classes; out of scope for v1.
Settings UI: a Skills tab listing discovered skills with enable/disable toggles.
