# GitRepoContextPageSource
Pages a git repository (or a subtree of one) into VCM. The backing
implementation is `GitRepoShardingStrategy` plus `FileGrouperWithGraph`,
both of which the source composes; the source itself is responsible
for cloning, subtree restriction, ignore-file handling, and watcher
composition.
## Minimal example
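```python
from polymathera.colony.samples.paging import GitRepoContextPageSource
from polymathera.colony.vcm.models import MmapConfig

source = GitRepoContextPageSource(
    scope_id="my-repo",
    mmap_config=MmapConfig(),
    origin_url="https://github.com/example/code.git",
    branch="main",
    commit="HEAD",
)
# initialize() is a coroutine: call it from an async function,
# or from an asyncio-aware REPL (python -m asyncio) that allows top-level await.
await source.initialize()
```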
This is equivalent to the historical default: the whole tree, no glob
filtering, and only `GitRemoteWatcher` (no `LocalFsWatcher`).
## Map a single subtree, skip the build directory
```python
GitRepoContextPageSource(
    scope_id="my-repo:src",
    mmap_config=MmapConfig(),
    origin_url="https://github.com/example/code.git",
    start_dir="src/",
    exclude_globs=["**/build/**", "**/__pycache__/**"],
)
```
Patterns are gitignore-style (matched with `pathspec`'s `GitWildMatchPattern`).
A `.colonyignore` (or `.gitignore`) at the subtree root augments
`exclude_globs` automatically.
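To make the selection rule concrete, here is a stdlib-only sketch of how exclude patterns, ignore-file lines, and include patterns might combine into one path predicate. It is not the source's implementation: the source uses `pathspec`'s `GitWildMatchPattern`, while this sketch swaps in a hand-written regex translation that covers only a subset of gitignore syntax. `compile_glob` and `build_selector` are hypothetical names, and checking excludes before includes is an assumption drawn from the reference table ("`None` includes everything not excluded").

```python
from __future__ import annotations

import re
from typing import Callable, Iterable, Optional


def compile_glob(pattern: str) -> re.Pattern[str]:
    """Translate a gitignore-style glob into a regex (simplified subset).

    Handles *, ?, **/ (any number of directories) and a trailing /**.
    A pattern without a leading or inner slash matches at any depth, and a
    match on a directory also covers everything below it. Negation (!),
    [...] classes, escapes and trailing-slash "directory only" rules are
    not handled.
    """
    anchored = "/" in pattern.rstrip("/")
    pat = pattern.strip("/")
    out: list[str] = []
    i = 0
    while i < len(pat):
        if pat.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pat.startswith("/**", i) and i + 3 == len(pat):
            out.append("/.*")  # everything below this directory
            i += 3
        elif pat[i] == "*":
            out.append("[^/]*")  # any run of characters within one segment
            i += 1
        elif pat[i] == "?":
            out.append("[^/]")  # exactly one character within one segment
            i += 1
        else:
            out.append(re.escape(pat[i]))
            i += 1
    prefix = "" if anchored else "(?:.*/)?"
    return re.compile(prefix + "".join(out) + "(?:/.*)?")


def build_selector(
    include_globs: Optional[Iterable[str]],
    exclude_globs: Optional[Iterable[str]],
    ignore_file_lines: Iterable[str] = (),
) -> Callable[[str], bool]:
    """Return a predicate over repo-relative POSIX paths (hypothetical helper).

    Ignore-file lines are merged into the exclude set; blank lines and
    comments are dropped, and negations are skipped by this sketch.
    """
    excludes = list(exclude_globs or [])
    for line in ignore_file_lines:
        line = line.strip()
        if line and not line.startswith(("#", "!")):
            excludes.append(line)
    exclude_rx = [compile_glob(p) for p in excludes]
    include_rx = None if include_globs is None else [compile_glob(p) for p in include_globs]

    def selected(path: str) -> bool:
        if any(rx.fullmatch(path) for rx in exclude_rx):
            return False
        return include_rx is None or any(rx.fullmatch(path) for rx in include_rx)

    return selected
```

With the exclude list from the example above, `build_selector(None, ["**/build/**", "**/__pycache__/**"])` rejects `src/build/out.o` at any depth but keeps `src/builder.py`, because `**/build/**` only matches a whole `build` path segment.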
## Frozen-commit context (no watcher)
```python
GitRepoContextPageSource(
    scope_id="external-foo",
    mmap_config=MmapConfig(),
    origin_url="https://github.com/upstream/foo.git",
    commit="<sha>",
    static=True,
    watch_remote=False,
)
```
`static=True` tells the convergence runtime that the page graph is
frozen at this commit; `watch_remote=False` skips the `GitRemoteWatcher`.
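The watch flags can be read as a small decision rule. A minimal sketch, assuming `static=True` overrides both watch flags as the reference table states ("no events, no watchers"); `WatchFlags` and `select_watchers` are hypothetical names used only for illustration, and watchers are represented by their class names as strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchFlags:
    """Mirror of the three constructor flags that control change events."""

    static: bool = False
    watch_remote: bool = True
    watch_local: bool = False


def select_watchers(flags: WatchFlags) -> list:
    """Return the watcher names a source with these flags would subscribe to."""
    if flags.static:
        # Frozen-commit instance: the page graph never changes, so no watchers.
        return []
    watchers = []
    if flags.watch_remote:
        watchers.append("GitRemoteWatcher")
    if flags.watch_local:
        watchers.append("LocalFsWatcher")
    return watchers
```

Under this reading, `watch_remote=False` in the frozen example is redundant once `static=True` is set, but it states the intent explicitly.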
## Reference
| Argument | Default | Effect |
|---|---|---|
| `scope_id` | required | VCM scope identifier; `mmap_application_scope` keys on this. |
| `mmap_config` | required | Memory-mapped page-graph config (flush thresholds, locality policy). |
| `origin_url` | required | `https://`, `file://`, or any URL `git clone` accepts. |
| `branch` | `"main"` | Branch tracked by the watcher. |
| `commit` | `"HEAD"` | Commit pinned at clone time; combine with `static=True` for a frozen snapshot. |
| `start_dir` | `None` | Repo-relative directory to walk; `None` walks the repo root. |
| `include_globs` | `None` | Gitignore-style include patterns; `None` includes everything not excluded. |
| `exclude_globs` | `None` | Gitignore-style exclude patterns. |
| `ignore_files` | `(".gitignore", ".colonyignore")` | Filenames inside the repo whose patterns are merged into the exclude set. Pass `()` to disable. |
| `binary_policy` | `"skip"` | `"skip"` drops blobs whose first 8 KB contain a NUL byte; `"include"` keeps them (the literature source uses this). |
| `watch_remote` | `True` | Subscribe to `GitRemoteWatcher` events. |
| `watch_local` | `False` | Subscribe to `LocalFsWatcher` events. Off by default: the VCM mapping is the global read-only view of `branch`. |
| `static` | `False` | `True` produces a frozen-commit instance: no events, no watchers. |
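The `binary_policy` row describes a NUL-byte heuristic. A minimal sketch of that check, assuming "8 KB" means 8 × 1024 bytes; `looks_binary` and `keep_blob` are illustrative names, not the source's API.

```python
SNIFF_BYTES = 8 * 1024  # "first 8 KB"; assumed here to mean 8192 bytes


def looks_binary(blob: bytes) -> bool:
    """NUL-byte heuristic: any NUL in the first SNIFF_BYTES marks the blob as binary."""
    return b"\x00" in blob[:SNIFF_BYTES]


def keep_blob(blob: bytes, binary_policy: str = "skip") -> bool:
    """Decide whether a blob is paged under the two documented policies."""
    if binary_policy == "include":
        return True
    if binary_policy == "skip":
        return not looks_binary(blob)
    raise ValueError(f"unknown binary_policy: {binary_policy!r}")
```

Only the sniff window is inspected, so a NUL byte that first appears after 8 KB does not cause the blob to be dropped.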
## Failure modes
- Subtree empty: when `start_dir` resolves to an empty directory, the source yields zero pages and logs a warning. Subsequent reads return empty mappings.
- Binary files in a code subtree: with the default `binary_policy="skip"`, PDFs and images are silently dropped. Use `LiteratureContextPageSource` for those.
- Replica without a working tree: when `git clone` fails on a replica, `_repo_path` stays `None` and `watch()` returns immediately. The convergence runtime handles the empty iterator cleanly.
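The empty-subtree and clone-failure cases both reduce to simple guards. The sketch below uses a hypothetical stand-in class (`SketchRepoSource` is not part of the library) and assumes `watch()` is an async generator; it shows where the empty-subtree warning would be logged and why a `None` `_repo_path` gives a consumer an iterator that ends without yielding.

```python
from __future__ import annotations

import asyncio
import logging
from typing import AsyncIterator, Optional

logger = logging.getLogger("gitrepo_sketch")


class SketchRepoSource:
    """Hypothetical stand-in illustrating the two degenerate cases."""

    def __init__(self, repo_path: Optional[str], files: list[str]) -> None:
        self._repo_path = repo_path  # None when `git clone` failed on this replica
        self._files = files

    def pages(self) -> list[str]:
        if not self._files:
            logger.warning("start_dir resolved to an empty directory; no pages produced")
        return list(self._files)

    async def watch(self) -> AsyncIterator[str]:
        if self._repo_path is None:
            return  # no working tree: the iterator ends without yielding
        for path in self._files:
            yield path  # stand-in for real change events


async def drain(source: SketchRepoSource) -> list[str]:
    # A consumer sees an empty list, not an exception, for a clone-less replica.
    return [event async for event in source.watch()]
```

For example, `asyncio.run(drain(SketchRepoSource(None, ["a.py"])))` returns an empty list.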