Colony

Polymathera's no-RAG, cache-aware multi-agent framework for extremely long, dense contexts (1B+ tokens).

Colony is a framework for building tightly coupled, self-evolving, self-improving, self-aware multi-agent systems (agent colonies) that reason over extremely long contexts without retrieval-augmented generation (RAG). Instead of fragmenting context into chunks and retrieving snippets, Colony keeps the entire context "live" across a cluster of one or more LLMs. A cluster-level virtual memory system manages LLM KV caches the way an operating system maps a large (nearly unlimited) virtual address space onto finite physical memory.
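To make the operating-system analogy concrete, here is a minimal sketch of demand paging with least-recently-used (LRU) eviction over a fixed number of resident pages. It is a toy model, not Colony's implementation or API: `ToyPageCache`, its methods, and the page names are hypothetical, and LRU is only one of many possible eviction policies.

```python
from collections import OrderedDict


class ToyPageCache:
    """Toy LRU model of a fixed number of resident 'KV cache pages' (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.resident = OrderedDict()  # page_id -> None, least recently used first
        self.hits = 0
        self.faults = 0

    def access(self, page_id: str) -> bool:
        """Touch a page; return True on a hit, False on a page fault."""
        if page_id in self.resident:
            self.resident.move_to_end(page_id)  # mark as most recently used
            self.hits += 1
            return True
        self.faults += 1
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used page
        self.resident[page_id] = None  # "load" the page into the cache
        return False


cache = ToyPageCache(capacity=2)
for page in ["repo/a", "repo/b", "repo/a", "repo/c", "repo/b"]:
    cache.access(page)

print(cache.hits, cache.faults, list(cache.resident))
# prints: 1 4 ['c' and 'b' pages remain resident, i.e. ['repo/c', 'repo/b']]
```

In this toy run only the second access to `repo/a` is a hit. The other four accesses fault, which illustrates why the order in which pages are touched matters when the resident set is much smaller than the full context.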

Colony's Vision

Colony's goal is to be the most efficient country of geniuses in a datacenter — the ideal substrate for civilization-building AI.

Use Cases

Colony is designed to extract or synthesize novel insights from a large, established body of knowledge, where input context size, rather than expected output length, dominates cost. Examples include incrementally editing a large code monorepo, detecting systemic vulnerabilities in a billion-line codebase, reverse-engineering advanced proprietary designs from public knowledge, and discovering novel connections or plausible conjectures across thousands of scientific papers.

Pre-Alpha Early Access

Colony is still in pre-alpha early access. The API is not stable and the framework is under active development. We welcome feedback and contributions, but be aware that breaking changes may occur.

Who should use Colony?

Colony is designed for engineers building complex multi-agent systems that require reasoning over extremely long contexts. It is not a general-purpose agent framework or a consumer product. If you are looking for a simple agent orchestration tool or a way to add tool use to an LLM, Colony may not be the right fit. It runs on a Ray cluster (local or in the cloud) and can be resource-intensive and expensive.

Key Ideas

NoRAG: Colony keeps the full context live and accessible, not filtered through retrieval. Colony manages all kinds of context (code, text, data) through distributed KV cache paging, not vector search.
Cache-Aware Agents: Agents are aware of what's in GPU memory (at the cluster level) and consciously plan their work to maximize cache reuse.
Agents All the Way Down: General intelligence emerges from the right composition of agent capabilities and multi-agent patterns. Every cognitive process -- attention, memory, planning, confidence tracking -- is a pluggable policy with a default implementation.
Distributed Reasoning Patterns: Multi-agent game protocols (hypothesis games, contract nets, negotiation) combat specific LLM failure modes like hallucination, laziness, and goal drift.
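As a rough illustration of the cache-aware planning idea above, the sketch below greedily orders tasks so that each one shares as many context pages as possible with the task that ran just before it. This is an assumption-laden toy, not Colony's scheduler: `order_for_reuse`, `reused_pages`, the task names, and the page names are all hypothetical, and a real planner would also account for cache capacity and placement across the cluster.

```python
from __future__ import annotations


def order_for_reuse(tasks: dict[str, set[str]]) -> list[str]:
    """Greedily order tasks to maximize page overlap between consecutive tasks."""
    remaining = dict(tasks)
    order: list[str] = []
    current: set[str] = set()  # pages assumed resident after the previous task
    while remaining:
        # Largest overlap wins; iterating in sorted order breaks ties by name.
        name = max(sorted(remaining), key=lambda n: len(remaining[n] & current))
        order.append(name)
        current = remaining.pop(name)
    return order


def reused_pages(order: list[str], tasks: dict[str, set[str]]) -> int:
    """Count pages shared between each task and the one that ran just before it."""
    return sum(len(tasks[a] & tasks[b]) for a, b in zip(order, order[1:]))


tasks = {
    "audit_auth": {"auth.py", "db.py"},
    "audit_ui": {"ui.py"},
    "audit_sessions": {"auth.py", "session.py"},
    "audit_forms": {"ui.py", "forms.py"},
}

naive = list(tasks)  # submission order
planned = order_for_reuse(tasks)

print(planned)
print(reused_pages(naive, tasks), reused_pages(planned, tasks))
```

With these inputs the submission order never reuses a page between consecutive tasks, while the greedy order groups the two `auth.py` tasks and the two `ui.py` tasks, so two page loads are avoided.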

Getting Started

pip install polymathera-colony

See the Installation guide and Quick Start tutorial.

Architecture at a Glance

Documentation

Section	Description
Getting Started	Installation and initial setup instructions
Examples Gallery	Collection of example use cases and applications
Philosophy	Why Colony exists and what makes it different
Architecture	Technical architecture of each subsystem
Design Insights	Deep dives into novel design decisions
Guides	Practical how-to guides
API Reference	Detailed API documentation
Contributing	How to contribute to Colony