Planning Architecture¶
Merge this article with action-policies.md
The planning system and action policies are deeply intertwined. Consider merging this article with action-policies.md to unify the discussion of how the LLM planner generates actions and how the framework executes them.
Colony does not emphasize rigid plan graphs, state machines, or rule-based orchestration engines, although these approaches can be provided as ActionPolicy implementations. Instead, the framework provides context and available actions to an LLM planner, which decides what to do; the framework executes and feeds back results.
Why LLM-Centric Planning?¶
Traditional agent frameworks constrain planning through explicit structures: DAGs of tasks, state machines with fixed transitions, or Standard Operating Procedures that prescribe step sequences. These work for well-defined workflows but fail for Colony's target domain -- deep reasoning over extremely long context -- where the task structure is not known in advance. The optimal plan or strategy changes mid-execution depending on:

- What the LLM discovers during execution
- Cache state, page availability, and other agents' actions
Colony's approach: a plan is the LLM's current thinking plus execution history -- not a fixed sequence of steps. The LLM can revise, extend, or abandon its plan at any point based on new information, and the reasoning process may require arbitrary depth of sub-planning.
Plan and Action Models¶
Actions¶
Composite actions no longer supported
The original plan was to have composite actions that contain sub-plans. This added complexity without clear benefit. The current design is that all actions are atomic. Recursively nested planning is implemented indirectly via child agent spawning.
An Action represents a single decision made by the LLM planner:
```python
class Action(BaseModel):
    action_id: str
    action_type: ActionType        # ANALYZE, PLAN_CREATE, TOOL_USE, etc.
    parameters: dict[str, Any]     # Action-specific parameters
    reasoning: str                 # LLM's reasoning for this action
    status: ActionStatus           # PENDING, RUNNING, COMPLETED, FAILED
    result: ActionResult | None    # Execution result
    sub_plan: ActionPlan | None    # For composite actions
    parent_action_id: str | None   # Hierarchical reference
```
All actions are atomic, executed directly by an @action_executor. Hierarchical planning is achieved indirectly via child agent spawning rather than composite actions with sub-plans, so the LLM still does not need to plan everything upfront.
ActionPlan¶
Execution And Planning Context
Is execution_context actually populated and executed? Add details of using AgentContextEngine.gather_context() by the planner.
An ActionPlan is a container for the LLM's current strategy:
```python
class ActionPlan(BaseModel):
    plan_id: str
    goal: str                            # What this plan aims to achieve
    actions: list[Action]                # Ordered actions at this level
    current_action_index: int            # Execution cursor
    abstraction_level: int               # 0=top-level, 1=sub-plan, ...
    revision_history: list[str]          # Version tracking
    cache_context: CacheContext          # Working set, access patterns, prefetch
    execution_context: dict[str, Any]    # Accumulated results and state
```
The plan maintains a full execution_context -- all completed actions, their results, and accumulated findings. This context is passed to the LLM when replanning, so the planner can generate informed continuations.
ActionPlanner¶
The ActionPlanner (in polymathera.colony.agents.patterns.planning.planner) is the abstract base for plan generation:
```python
class ActionPlanner(ABC):
    @abstractmethod
    async def create_plan(self, planning_context: PlanningContext) -> ActionPlan: ...

    @abstractmethod
    async def revise_plan(
        self,
        current_plan: ActionPlan,
        planning_context: PlanningContext,
        revision_reason: str,
    ) -> ActionPlan: ...
```
Implementations:
- SequentialPlanner: Manually-specified linear sequence of actions
- CacheAwareActionPlanner: LLM-driven planning via a pluggable PlanningStrategyPolicy
Explain and Consolidate Planner Parameters
These constructor parameters are confusing with seemingly overlapping responsibilities. Perhaps rename them to clarify their roles, and add a diagram showing how they interact.
```python
class CacheAwareActionPlanner(ActionPlanner):
    """One planner class, customized via pluggable policies."""

    def __init__(
        self,
        agent: Agent,
        planning_strategy: PlanningStrategyPolicy,
        planning_params: PlanningParameters,
        cache_policy: CacheAwarePlanningPolicy | None = None,
        learning_policy: LearningPlanningPolicy | None = None,
        coordination_policy: CoordinationPlanningPolicy | None = None,
    ): ...
```
The Reasoning Loop¶
Composite actions no longer supported
Update this diagram to remove composite actions and show how hierarchical planning is implemented via child agent spawning instead.
The core execution cycle is straightforward:
```mermaid
graph TD
    Start[Agent receives goal] --> Context[Gather context:<br/>goals, constraints, history,<br/>available actions, cache state]
    Context --> Ask["Ask LLM: what's next?"]
    Ask --> Execute[Execute chosen action]
    Execute --> Update[Update execution context<br/>with results]
    Update --> Check{Plan complete?}
    Check -->|"No"| Context
    Check -->|"Yes"| Done[Return results]
    Execute -->|"Composite action"| SubPlan[Generate sub-plan JIT]
    SubPlan --> SubExec[Execute sub-plan recursively]
    SubExec --> Update
```
At each iteration:
- The framework gathers current context: goals, constraints, execution history, available actions from active capabilities, and cache state
- The LLM reasons about what to do next (or creates/revises a plan)
- The framework executes the chosen action via the appropriate @action_executor
- Results feed back into context for the next iteration
This is not a chatbot loop. The LLM receives structured input (ActionPolicyInput) with typed descriptions of every available action, and produces structured output (action type + parameters). The two-phase action selection (choose action type, then parameterize) prevents the LLM from being overwhelmed by the full parameter space.
Model-Predictive Control¶
CacheAwareActionPolicy uses a Model-Predictive Control (MPC) approach rather than plan-then-execute:
- Create plan: LLM generates a plan with a finite horizon (not trying to plan the entire task)
- Execute partially: Execute the next few actions
- Observe: Check results against expectations, observe cache state changes
- Revise: If conditions changed, revise the remaining plan; if on track, continue
```mermaid
sequenceDiagram
    participant P as LLM Planner
    participant E as Action Executor
    participant V as VCM
    participant B as Blackboard
    P->>V: Query cache state
    V-->>P: Available pages, capacity
    P->>B: Query execution history
    B-->>P: Past actions, results
    P->>P: Generate plan (horizon=5 actions)
    Note over P: Plan includes CacheContext:<br/>working set, prefetch hints,<br/>access sequence
    loop MPC Cycle
        P->>E: Execute next action
        E-->>P: Result + resource usage
        P->>P: Evaluate: on track?
        alt Conditions changed
            P->>P: Revise remaining plan
        end
    end
    P->>B: Store completed plan for learning
```
MPC is essential for cache-aware planning because:
- Cache state is nonstationary: Other agents load and evict pages, invalidating assumptions
- Page graph evolves: New relationships are discovered during execution, changing optimal access patterns
- Working set drifts: The reasoning process may discover that different pages are more relevant than initially predicted
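The MPC cycle can be reduced to a short sketch. All of the function names and the horizon handling below are toy stand-ins, not the CacheAwareActionPolicy implementation:

```python
from typing import Any, Callable

def mpc_loop(
    make_plan: Callable[[dict], list[str]],
    execute: Callable[[str], Any],
    conditions_changed: Callable[[dict], bool],
    horizon: int = 5,
    max_cycles: int = 10,
) -> list[Any]:
    """Illustrative MPC cycle: plan a finite horizon, execute partially, observe, revise."""
    state: dict = {"results": []}
    for _ in range(max_cycles):
        plan = make_plan(state)            # finite horizon, not the whole task
        if not plan:
            break
        for action in plan[:horizon]:
            state["results"].append(execute(action))
            if conditions_changed(state):  # e.g. cache state invalidated an assumption
                break                      # discard the plan's tail and replan
    return state["results"]

# Toy usage: the "plan" is whatever work remains.
tasks = ["a", "b", "c", "d"]
results = mpc_loop(
    make_plan=lambda s: tasks[len(s["results"]):],
    execute=str.upper,
    conditions_changed=lambda s: False,
    horizon=2,
)
print(results)  # ['A', 'B', 'C', 'D']
```

Note that replanning is the default path here, not an error path: every cycle regenerates the remaining plan from current state, which is what makes nonstationary cache state tolerable.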
Hierarchical Planning¶
Composite actions no longer supported
Update this section to explain how hierarchical planning is implemented via child agent spawning instead of composite actions with sub-plans. Explain how sub-agents are described to the parent agent planner, how they are coordinated and how they feed results back to the parent agent.
Replanning¶
Replanning is triggered by several conditions:
| Trigger | Strategy | Example |
|---|---|---|
| Plan exhaustion | ADD_ACTIONS -- extend with new actions | All planned actions executed, but goal not yet satisfied |
| Action failure | REVISE or BACKTRACK | An action produces unexpected results |
| New information | REVISE -- adjust remaining actions | Blackboard events from other agents invalidate assumptions |
| Resource changes | REVISE with new cache context | VCM page availability changes |
| Periodic | REVISE | Configurable re-evaluation interval |
The replanning mechanism preserves the full execution history. When revise_plan() is called, the planner sees all completed actions and their results, and generates a continuation that builds on what has already been accomplished.
Replanning decisions are made by the ReplanningPolicy (in polymathera.colony.agents.patterns.planning.replanning):
```python
class ReplanningPolicy(ABC):
    """Decides WHEN to replan and WHAT revision strategy to use.

    Separation of concerns:
    - ReplanningPolicy decides WHEN to replan (this class)
    - ActionPlanner.revise_plan() decides HOW to replan
    - CacheAwareActionPolicy orchestrates the flow
    """

    @abstractmethod
    async def evaluate_replanning_need(
        self,
        plan: ActionPlan,
        last_result: ActionResult | None,
        state: ActionPolicyExecutionState,
    ) -> ReplanningDecision: ...
```
Built-in implementations compose via CompositeReplanningPolicy:
- PlanExhaustionReplanningPolicy: Triggers when all actions are executed
- PeriodicReplanningPolicy: Re-evaluates at configurable intervals
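A simplified sketch of how such policies might compose. The real interface is async and returns a ReplanningDecision; this version uses a synchronous boolean and a made-up `PlanState` slice purely for illustration:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class PlanState:
    """Hypothetical slice of plan state that the policies inspect."""
    current_action_index: int
    total_actions: int
    steps_since_replan: int

class SimpleReplanningPolicy(ABC):
    @abstractmethod
    def should_replan(self, state: PlanState) -> bool: ...

class ExhaustionPolicy(SimpleReplanningPolicy):
    """Triggers when the execution cursor has passed the last planned action."""
    def should_replan(self, state: PlanState) -> bool:
        return state.current_action_index >= state.total_actions

class PeriodicPolicy(SimpleReplanningPolicy):
    """Triggers every `interval` executed actions."""
    def __init__(self, interval: int = 5):
        self.interval = interval
    def should_replan(self, state: PlanState) -> bool:
        return state.steps_since_replan >= self.interval

class CompositePolicy(SimpleReplanningPolicy):
    """Triggers when any child policy triggers."""
    def __init__(self, *policies: SimpleReplanningPolicy):
        self.policies = policies
    def should_replan(self, state: PlanState) -> bool:
        return any(p.should_replan(state) for p in self.policies)

policy = CompositePolicy(ExhaustionPolicy(), PeriodicPolicy(interval=3))
print(policy.should_replan(PlanState(2, 5, 3)))  # True (periodic interval reached)
```

The composite's any-of semantics keeps each trigger condition independent, so new triggers (resource changes, blackboard events) can be added without touching existing policies.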
Plan Exhaustion¶
When all actions in the current plan have been executed, this does not automatically mean the agent should stop. Plan exhaustion triggers replanning via the existing MPC mechanism:
- The planner receives the full execution context (all completed actions + results)
- The planner decides: is the goal satisfied, or does more work need to be done?
- If more work: new actions are appended to the existing plan (preserving history)
- If goal satisfied: the agent completes (or transitions to IDLE in continuous mode)
A configurable max_replan_cycles prevents infinite replanning loops.
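Under these assumptions, the exhaustion-then-extend flow with a cycle cap looks roughly like the following; `run_until_satisfied` and its callback parameters are hypothetical names, not framework API:

```python
from typing import Any, Callable

def run_until_satisfied(
    actions: list[str],
    execute: Callable[[str], Any],
    extend_plan: Callable[[list[Any]], list[str]],
    goal_satisfied: Callable[[list[Any]], bool],
    max_replan_cycles: int = 3,
) -> list[Any]:
    """Illustrative plan-exhaustion loop, capped by max_replan_cycles."""
    results: list[Any] = []
    for action in actions:
        results.append(execute(action))
    cycles = 0
    while not goal_satisfied(results) and cycles < max_replan_cycles:
        cycles += 1
        new_actions = extend_plan(results)  # ADD_ACTIONS: append, preserving history
        actions.extend(new_actions)
        for action in new_actions:
            results.append(execute(action))
    return results

# Toy usage: keep extending until three results have accumulated.
results = run_until_satisfied(
    actions=["step"],
    execute=len,
    extend_plan=lambda res: ["another"],
    goal_satisfied=lambda res: len(res) >= 3,
)
print(len(results))  # 3
```

Appending rather than replacing means the planner's full history survives each cycle, and the cap guarantees termination even if the goal check never passes.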
Cache-Aware Planning Context¶
Every plan includes a CacheContext that the LLM planner reasons about:
```python
class CacheContext(BaseModel):
    working_set: list[str]                       # Pages this plan needs
    working_set_priority: dict[str, float]       # page_id -> importance (0-1)
    estimated_access_pattern: dict[str, int]     # page_id -> expected access count
    access_sequence: list[str]                   # Expected order of access
    page_graph_summary: dict[str, Any]           # Cluster info, relationships
    min_cache_size: int                          # Minimum for viable execution
    ideal_cache_size: int                        # For optimal performance
    shareable_pages: list[str]                   # Safe for concurrent access
    exclusive_pages: list[str]                   # Must not be evicted
    prefetch_pages: list[str]                    # Load before execution begins
    prefetch_priority: dict[str, float]          # Prefetch ordering
```
The planner uses this to:
- Order actions to maximize cache locality (group accesses to the same pages)
- Declare prefetch needs so the VCM can load pages before they are needed
- Size the plan to fit available cache capacity
- Preserve cache locality when revising plans mid-execution
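Two of these uses are simple enough to sketch directly. Both helpers below are illustrative, assuming an action-to-pages mapping that the document does not specify:

```python
from collections import defaultdict

def order_for_locality(action_pages: dict[str, list[str]]) -> list[str]:
    """Illustrative: group actions that touch the same page so accesses cluster."""
    by_page: dict[str, list[str]] = defaultdict(list)
    for action, pages in action_pages.items():
        by_page[pages[0] if pages else ""].append(action)  # bucket by primary page
    ordered: list[str] = []
    for page in sorted(by_page):
        ordered.extend(sorted(by_page[page]))
    return ordered

def prefetch_order(prefetch_pages: list[str],
                   prefetch_priority: dict[str, float]) -> list[str]:
    """Highest-priority pages first, so the VCM can load them before execution."""
    return sorted(prefetch_pages, key=lambda p: -prefetch_priority.get(p, 0.0))

acts = {"a1": ["p1"], "a2": ["p2"], "a3": ["p1"]}
print(order_for_locality(acts))                              # ['a1', 'a3', 'a2']
print(prefetch_order(["p2", "p1"], {"p1": 0.9, "p2": 0.4}))  # ['p1', 'p2']
```

In the real system the LLM planner performs this reasoning over the CacheContext itself; the helpers just make the locality and prefetch objectives concrete.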
Checkpointing¶
Long-running actions support checkpointing for fault tolerance:
- The checkpoint policy saves state after expensive actions (page analysis, synthesis, agent spawning)
- State includes the full plan with execution context, current action index, and accumulated results
- On failure, the agent can resume from the last checkpoint rather than starting over
- Checkpoint frequency is configurable per action type
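A minimal sketch of the checkpoint payload described above. The `PlanCheckpoint` type, JSON serialization, and helper names are assumptions; the framework's actual checkpoint format is not specified here:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any

@dataclass
class PlanCheckpoint:
    """Illustrative checkpoint: plan identity, cursor, and accumulated state."""
    plan_id: str
    current_action_index: int
    execution_context: dict[str, Any] = field(default_factory=dict)

def save_checkpoint(cp: PlanCheckpoint) -> str:
    # In practice this would be persisted durably, not returned as a string.
    return json.dumps(asdict(cp))

def resume(serialized: str) -> PlanCheckpoint:
    # On failure, reload the last saved state and continue from its cursor.
    return PlanCheckpoint(**json.loads(serialized))

saved = save_checkpoint(PlanCheckpoint("plan-1", 3, {"findings": ["x"]}))
restored = resume(saved)
print(restored.current_action_index)  # 3
```

Because the checkpoint carries the execution cursor and context together, resuming is just re-entering the reasoning loop at `current_action_index` instead of at zero.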
What This Means in Practice¶
A concrete example of planning in action:
- Agent receives goal: "Understand the dependency injection patterns in this codebase"
- LLM creates initial plan: scan package structure, identify DI frameworks, trace injection points
- First action discovers the codebase uses a custom DI system (not a standard framework)
- LLM revises plan: shift from framework-specific analysis to tracing the custom DI implementation
- During tracing, the agent discovers the custom DI interacts with an ORM in unexpected ways
- LLM generates a sub-plan to analyze the ORM interaction, spawning child agents for the ORM modules
- Child agents report findings; the parent synthesizes across the full dependency chain
- Plan exhaustion triggers replanning; the LLM determines the goal is satisfied and completes
At no point did the framework prescribe the plan structure. The LLM adapted its strategy as it learned about the codebase. The MPC approach ensured that each planning decision was based on the most current information.