Why this matters
Two scenarios make provenance load-bearing:
- “Was this review against my latest push?” A reviewer reads a Sigilix comment, accepts the verdict, and merges. If the review was actually against an old SHA, the verdict applies to old code. The stale-marker system catches this.
- “Which specialist produced this finding, and did it earn its place?” When a critical finding is unexpected, the human reviewer needs to know what produced it and how confident the system is. The provenance manifest records which specialist ran, whether it used its primary model or a cross-provider fallback, and the proof tier the finding earned through the believability pipeline.
The manifest
Every review comment includes a hidden manifest at the bottom. The sigilix-meta marker lets tooling parse it programmatically.
The four domain specialists — Metis (logic/architecture), Argus (security), Iris (performance), Eunomia (tests) — run in parallel and are unified by Harmonia, the synthesizer that runs after them. The manifest records the role key (logic / security / performance / tests / synthesis) alongside each specialist’s display name.
The manifest deliberately does not pin specific model IDs. Each specialist runs a model tuned to its role — a reasoning-heavy model for logic, a faster high-volume model for security — and each has a cross-provider fallback so one provider’s outage can’t silence a specialist. Which underlying model served a given run is an implementation detail that churns; what the manifest commits to is the durable contract (which specialist ran, whether it fell back, and the proof tier each finding earned).
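As a rough illustration of that contract, the sketch below models one specialist entry in Python. The role keys and display names are the ones listed above; the class name `SpecialistEntry` and the field name `fell_back` are hypothetical and are not taken from the real manifest.

```python
from dataclasses import dataclass

# Role key -> display name, as listed in the text above.
DISPLAY_NAMES = {
    "logic": "Metis",
    "security": "Argus",
    "performance": "Iris",
    "tests": "Eunomia",
    "synthesis": "Harmonia",
}


@dataclass(frozen=True)
class SpecialistEntry:
    """One specialist's record. Field names are illustrative assumptions."""

    role: str        # one of the keys in DISPLAY_NAMES
    fell_back: bool  # True if the cross-provider fallback served the run
    # Deliberately no model ID: the manifest does not pin one.

    @property
    def display_name(self) -> str:
        return DISPLAY_NAMES[self.role]


entry = SpecialistEntry(role="security", fell_back=True)
print(entry.display_name, entry.fell_back)  # prints: Argus True
```

Leaving the model ID out of the record mirrors the design choice described above: consumers depend on the role and the fallback flag, not on whichever model happened to serve the run.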
Outcomes per specialist
| outcome | Meaning |
|---|---|
| ok | Primary model succeeded on first attempt |
| retry | Primary model needed a retry (transient 503/429/timeout) but ultimately produced findings |
| fallback | Primary failed; cross-provider fallback succeeded |
| skipped | Both primary and fallback failed; this specialist contributed no findings |
| gated | Router decided this specialist shouldn’t fire on this PR (e.g., docs-only) |
A review with a skipped specialist is posted with the _3 of 4 domain specialists succeeded_ footnote in the visible body. A review with a fallback is posted normally, because the fallback is meant to handle these cases silently.
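The footnote rule can be sketched as a small function over per-specialist outcomes. This is a minimal sketch, not Sigilix's implementation: the function name `visible_footnote` is hypothetical, and leaving gated specialists out of the denominator is an assumption the text above does not confirm.

```python
from typing import Dict, Optional

# Outcomes that still yield findings, per the outcomes table.
PRODUCED_FINDINGS = {"ok", "retry", "fallback"}


def visible_footnote(outcomes: Dict[str, str]) -> Optional[str]:
    """Return the footnote text, or None when no specialist was skipped.

    `outcomes` maps a domain role key to its outcome string.
    Assumption: gated specialists are excluded from the denominator.
    """
    fired = [o for o in outcomes.values() if o != "gated"]
    if "skipped" not in fired:
        return None
    succeeded = sum(1 for o in fired if o in PRODUCED_FINDINGS)
    return f"_{succeeded} of {len(fired)} domain specialists succeeded_"


print(visible_footnote({
    "logic": "ok",
    "security": "fallback",
    "performance": "skipped",
    "tests": "retry",
}))
# prints: _3 of 4 domain specialists succeeded_
```

Note that a fallback counts as a success here, which is why a review with only fallbacks gets no footnote.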
Stale-marker detection
When a pull_request.synchronize event arrives (a new commit pushed), the new review’s pipeline reads the most recent prior review’s sigilix-meta block and compares the manifest’s headSha against the current head.
If they differ, meaning the prior review was against an older SHA, Sigilix updates the prior comment with a stale marker.
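The comparison itself is simple enough to sketch. The example below assumes the prior manifest has already been parsed into a dict that exposes the `headSha` field named above; the function name and the sample SHAs are made up for illustration, and treating a manifest with no `headSha` as stale is this sketch's choice rather than documented behavior.

```python
from typing import Optional


def prior_review_is_stale(prior_manifest: Optional[dict], current_head: str) -> bool:
    """True when the prior review's manifest names a different head SHA."""
    if prior_manifest is None:
        return False  # no earlier review, so there is nothing to mark
    # A missing headSha never equals current_head, so it counts as stale here.
    return prior_manifest.get("headSha") != current_head


prior = {"headSha": "a1b2c3d"}  # made-up SHA for illustration
print(prior_review_is_stale(prior, "a1b2c3d"))  # prints: False
print(prior_review_is_stale(prior, "e4f5a6b"))  # prints: True
```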
Provenance for findings, not just reviews
Each individual finding also carries provenance metadata:
- Which specialist produced it (logic/security/performance/tests)
- Whether it was sourced from an evidence channel (SARIF, depVulns, secrets, deterministicChecks)
- Whether it was the result of agreement (multiple specialists flagged the same path:line)
This metadata surfaces as a [Metis] / [Argus] / [Iris] / [Eunomia] prefix on each inline finding, plus an evidence badge ([Trivy via SARIF], [Secret scanner], etc.) when the finding came from an external channel.
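Agreement, as described above, means more than one specialist flagged the same path:line. One way to detect it is to group findings by location and count distinct specialists. The dict keys used here (`specialist`, `path`, `line`, `agreement`) and the sample file paths are hypothetical; the real finding schema is not shown in this page.

```python
from collections import defaultdict
from typing import Dict, List


def mark_agreement(findings: List[Dict]) -> List[Dict]:
    """Return copies of the findings with an 'agreement' flag added.

    A finding is in agreement when at least two distinct specialists
    flagged its (path, line) location.
    """
    specialists_at = defaultdict(set)
    for f in findings:
        specialists_at[(f["path"], f["line"])].add(f["specialist"])
    return [
        {**f, "agreement": len(specialists_at[(f["path"], f["line"])]) > 1}
        for f in findings
    ]


sample = [
    {"specialist": "logic", "path": "src/auth.py", "line": 42},
    {"specialist": "security", "path": "src/auth.py", "line": 42},
    {"specialist": "tests", "path": "src/cache.py", "line": 7},
]
for f in mark_agreement(sample):
    print(f["specialist"], f["agreement"])
# prints:
# logic True
# security True
# tests False
```

Using a set per location means the same specialist flagging a line twice does not count as agreement.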
Proof-tier receipts
Provenance answers what ran. The proof tier answers the harder question: did this finding earn the right to post? Every posted finding carries a tier pill: a receipt for how it was validated, not a numeric confidence guess.
| Tier | Pill | Meaning |
|---|---|---|
| Verified | VERIFIED | The claim was checked by execution or carries a signed receipt — the strongest tier. A finding that says “this throws” was confirmed to throw. |
| Grounded | GROUNDED | The finding is anchored to cited code or external evidence (a SARIF row, a CVE record, a quoted diff span). The claim is tied to something verifiable in the PR. |
| Model | MODEL | Model judgment that passed the grounding gate but wasn’t independently executed or receipt-backed. Still anchored to the diff, but the validation is reasoning, not a check. |
The refute / execute stage
Proof tiers are not self-assigned by the specialist that raised the finding. Before synthesis, each candidate finding passes through a refute/execute gate:
- Grounding check. The finding must anchor to cited code or evidence in the diff. A claim that can’t point at the line it’s about is demoted out of the inline channel and cannot earn GROUNDED or above.
- Refutation pass. The system actively tries to disprove the finding against the surrounding context (callers, dependencies, the rest of the file). A finding that the refutation pass overturns is dropped or downgraded rather than posted as a false alarm.
- Execution / receipt. Where a claim is checkable — and a check is available — it is executed or matched against a signed receipt. Surviving that promotes the finding to VERIFIED.
The manifest’s proofTiers counts let tooling audit the tier distribution of a review at a glance.
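The gate order and the tier counts can be sketched together. This is a simplified model under stated assumptions, not the actual pipeline: the `Candidate` fields are hypothetical, a refuted finding is always dropped here (the text allows a downgrade as well), and the `evidence_cited` flag is an invented stand-in for however the real system separates GROUNDED from MODEL.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Gate results for one candidate finding. All fields are hypothetical."""

    anchored: bool        # points at cited code or evidence in the diff
    refuted: bool         # overturned by the refutation pass
    evidence_cited: bool  # assumed flag: tied to a SARIF row, CVE record, or quoted span
    check_passed: bool    # executed, or matched against a signed receipt


def assign_tier(c: Candidate) -> Optional[str]:
    """Return the proof tier, or None if the finding should not post inline."""
    if not c.anchored:
        return None  # fails the grounding check
    if c.refuted:
        return None  # this sketch drops refuted findings outright
    if c.check_passed:
        return "VERIFIED"
    return "GROUNDED" if c.evidence_cited else "MODEL"


def proof_tier_counts(candidates: List[Candidate]) -> Counter:
    """Count posted findings per tier, in the spirit of the proofTiers counts."""
    tiers = (assign_tier(c) for c in candidates)
    return Counter(t for t in tiers if t is not None)


counts = proof_tier_counts([
    Candidate(anchored=True, refuted=False, evidence_cited=True, check_passed=True),
    Candidate(anchored=True, refuted=False, evidence_cited=True, check_passed=False),
    Candidate(anchored=True, refuted=False, evidence_cited=False, check_passed=False),
    Candidate(anchored=True, refuted=True, evidence_cited=True, check_passed=False),
    Candidate(anchored=False, refuted=False, evidence_cited=False, check_passed=False),
])
print(dict(counts))  # prints: {'VERIFIED': 1, 'GROUNDED': 1, 'MODEL': 1}
```

The key property the sketch preserves is that the tier is computed from gate results rather than supplied by the specialist that raised the finding.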
See the believability pipeline for the full five gates — evidence, provenance contracts, refute/execute, proof-tier receipts, and memory — that every finding traverses.
Tooling integration
The sigilix-meta manifest is intended to be parsed by tooling. Two known integrations:
- GitHub Actions consumers that aggregate Sigilix manifests across PRs for trend analysis.
- Internal dashboards that surface “which specialists fell back this week” for capacity planning.
The manifest is versioned: the schema version (the schema: 1 field) is bumped when the manifest changes in a breaking way. Forward-compatible additions don’t bump it.
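A consumer can lean on that versioning rule by rejecting unknown schema versions while tolerating unknown fields. The sketch below assumes the manifest has already been decoded into a dict (how it is encoded inside the comment is not assumed here); `check_schema` and the `someFutureField` key are hypothetical names used only for illustration.

```python
SUPPORTED_SCHEMA = 1  # the version this consumer was written against


def check_schema(manifest: dict) -> dict:
    """Return the manifest if its schema version is supported, else raise.

    Unknown extra keys are left alone, since forward-compatible
    additions do not bump the schema version.
    """
    version = manifest.get("schema")
    if version != SUPPORTED_SCHEMA:
        raise ValueError(f"unsupported sigilix-meta schema: {version!r}")
    return manifest


# An added field under the same schema version is accepted.
ok = check_schema({"schema": 1, "someFutureField": "ignored by this consumer"})
print(ok["schema"])  # prints: 1

# A breaking version is refused rather than misread.
try:
    check_schema({"schema": 2})
except ValueError as err:
    print(err)  # prints: unsupported sigilix-meta schema: 2
```

Failing loudly on a version mismatch keeps dashboards from silently aggregating manifests they no longer understand.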
What the manifest does not include
- Customer code or diff content (privacy)
- Specialist prompts (those are Sigilix’s IP)
- API keys or internal-service identifiers
- Per-finding internal scores (only severities and counts)
Read next
Believability Pipeline
The five gates — including refute/execute — that earn a finding its proof tier.
SARIF Evidence
Push external scanner output into Sigilix reviews.
Review Lifecycle
Where the manifest is built in the pipeline.

