A review is only as trustworthy as your ability to verify the inputs that produced it. Review provenance (ARC-184) is the manifest that ships with every Sigilix review, recording what ran and against what.

Why this matters

Two scenarios make provenance load-bearing:
  1. “Was this review against my latest push?” — A reviewer reads a Sigilix comment, accepts the verdict, and merges. If the review was actually against an old SHA, the verdict applies to old code. The stale-marker system catches this.
  2. “Which specialist produced this finding, and did it earn its place?” — When a critical finding is unexpected, the human reviewer needs to know what produced it and how confident the system is. The provenance manifest records which specialist ran, whether it used its primary model or a cross-provider fallback, and the proof tier the finding earned through the believability pipeline.

The manifest

Every review comment includes a hidden manifest at the bottom:
<!-- sigilix-meta: {
  "headSha": "abc1234567890",
  "specialists": {
    "logic":       { "specialist": "Metis", "outcome": "ok",       "ms": 21300 },
    "security":    { "specialist": "Argus",  "outcome": "fallback", "ms": 28100 },
    "performance": { "specialist": "Iris", "outcome": "ok",       "ms": 19800 },
    "tests":       { "specialist": "Eunomia", "outcome": "ok",       "ms": 16400 }
  },
  "synthesis":   { "specialist": "Harmonia", "outcome": "ok", "ms": 9800 },
  "evidence":    { "sarif": 3, "depVulns": 1, "secrets": 0 },
  "proofTiers":  { "verified": 1, "grounded": 4, "model": 0 },
  "deterministicChecks": 2,
  "incremental": true,
  "schema": 1
} -->
The block is invisible in GitHub’s rendered view but readable in the raw comment source. The sigilix-meta marker lets tooling parse it programmatically. The four domain specialists — Metis (logic/architecture), Argus (security), Iris (performance), Eunomia (tests) — run in parallel and are unified by Harmonia, the synthesizer that runs after them. The manifest records the role key (logic / security / performance / tests / synthesis) alongside each specialist’s display name.
The manifest deliberately does not pin specific model IDs. Each specialist runs a model tuned to its role — a reasoning-heavy model for logic, a faster high-volume model for security — and each has a cross-provider fallback so one provider’s outage can’t silence a specialist. Which underlying model served a given run is an implementation detail that churns; what the manifest commits to is the durable contract (which specialist ran, whether it fell back, and the proof tier each finding earned).
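As a minimal sketch of how tooling might read the manifest out of a raw comment body: the regular expression and the `extract_manifest` helper below are assumptions inferred from the example block, not a published grammar or Sigilix API.

```python
import json
import re
from typing import Optional

# Matches the hidden block: <!-- sigilix-meta: { ... } -->
# Assumption: the pattern is inferred from the example manifest above.
_META_RE = re.compile(r"<!--\s*sigilix-meta:\s*(\{.*?\})\s*-->", re.DOTALL)


def extract_manifest(comment_body: str) -> Optional[dict]:
    """Return the parsed manifest, or None if it is absent or malformed."""
    match = _META_RE.search(comment_body)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


# A trimmed-down comment body for illustration.
raw = """Review body...
<!-- sigilix-meta: {
  "headSha": "abc1234567890",
  "specialists": {
    "security": { "specialist": "Argus", "outcome": "fallback", "ms": 28100 }
  },
  "schema": 1
} -->"""

manifest = extract_manifest(raw)
```

Returning `None` rather than raising keeps callers simple when a comment was not written by Sigilix or was edited by hand.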

Outcomes per specialist

| outcome | Meaning |
| --- | --- |
| ok | Primary model succeeded on first attempt |
| retry | Primary model needed a retry (transient 503/429/timeout) but ultimately produced findings |
| fallback | Primary failed; cross-provider fallback succeeded |
| skipped | Both primary and fallback failed; this specialist contributed no findings |
| gated | Router decided this specialist shouldn’t fire on this PR (e.g., docs-only) |
A review with one skipped specialist is posted with the _3 of 4 domain specialists succeeded_ footnote in the visible body. A review with a fallback is posted normally, because the fallback is meant to handle these cases silently.
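The footnote rule can be sketched from the outcome values listed above. The `success_footnote` helper is hypothetical, and its handling of `gated` specialists (left out of the denominator) is an assumption the source does not confirm.

```python
from typing import Optional

# Outcomes that still contribute findings, per the outcome list above.
PRODUCED_FINDINGS = {"ok", "retry", "fallback"}


def success_footnote(specialists: dict) -> Optional[str]:
    """Return the visible footnote when any specialist was skipped, else None.

    Assumption: gated specialists are excluded from the denominator because
    the router never asked them to run.
    """
    ran = {role: s for role, s in specialists.items() if s["outcome"] != "gated"}
    skipped = [role for role, s in ran.items() if s["outcome"] == "skipped"]
    if not skipped:
        return None  # ok, retry, and fallback all post without a footnote
    succeeded = sum(1 for s in ran.values() if s["outcome"] in PRODUCED_FINDINGS)
    return f"_{succeeded} of {len(ran)} domain specialists succeeded_"


example = {
    "logic": {"specialist": "Metis", "outcome": "ok"},
    "security": {"specialist": "Argus", "outcome": "skipped"},
    "performance": {"specialist": "Iris", "outcome": "ok"},
    "tests": {"specialist": "Eunomia", "outcome": "retry"},
}
footnote = success_footnote(example)
```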

Stale-marker detection

When a pull_request.synchronize event arrives (a new commit pushed), the new review’s pipeline reads the most recent prior review’s sigilix-meta block and compares the manifest’s headSha against the current head. If they differ — i.e., the prior review was against an older SHA — Sigilix updates the prior comment with a stale marker:
> _⚠️ This review was on abc1234. The current head is def5678. See the [new review](...) for the latest findings._
The original review content is preserved (so historical context isn’t lost) but the marker tells readers not to take the old verdict at face value. This matters because stale reviews accumulate in long PR threads. Without the marker, a reviewer scanning the conversation might read an old “Approved” and miss that the latest push regressed.
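The comparison described above can be sketched as a small function. The name `stale_marker`, the 7-character SHA abbreviation (taken from the example marker), and the `new_review_url` parameter are illustrative assumptions; the link target is supplied by the caller.

```python
from typing import Optional

SHORT_SHA_LEN = 7  # matches the abbreviated SHAs in the example marker


def stale_marker(prior_manifest: dict, current_head: str, new_review_url: str) -> Optional[str]:
    """Return the stale-marker line if the prior review targeted an older SHA."""
    prior_head = prior_manifest.get("headSha")
    if prior_head is None or prior_head == current_head:
        # Assumption: with no recorded SHA there is nothing to compare, so no marker.
        return None
    return (
        f"> _⚠️ This review was on {prior_head[:SHORT_SHA_LEN]}. "
        f"The current head is {current_head[:SHORT_SHA_LEN]}. "
        f"See the [new review]({new_review_url}) for the latest findings._"
    )


marker = stale_marker(
    {"headSha": "abc1234567890"},
    "def5678901234",
    "https://example.invalid/new-review",  # placeholder URL for illustration
)
```

Because the function only produces the marker text, the original review content stays untouched; the caller decides how to add the marker to the prior comment.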

Provenance for findings, not just reviews

Each individual finding also carries provenance metadata:
  • Which specialist produced it (logic / security / performance / tests)
  • Whether it was sourced from an evidence channel (SARIF, depVulns, secrets, deterministicChecks)
  • Whether it was the result of agreement (multiple specialists flagged the same path:line)
In the rendered review, this is the [Metis] / [Argus] / [Iris] / [Eunomia] prefix on each inline finding, plus an evidence badge ([Trivy via SARIF], [Secret scanner], etc.) when the finding came from an external channel.
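To make the per-finding metadata concrete, here is a rough sketch of a finding record, its rendered prefix, and agreement detection by path and line. The `Finding` class, its field names, and the file path in the example are hypothetical and do not reflect Sigilix's internal types.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Finding:
    specialist: str                       # display name, e.g. "Metis"
    path: str
    line: int
    evidence_badge: Optional[str] = None  # e.g. "Trivy via SARIF"; None if no external channel


def render_prefix(finding: Finding) -> str:
    """Build the specialist prefix, plus an evidence badge when one applies."""
    prefix = f"[{finding.specialist}]"
    if finding.evidence_badge:
        prefix += f" [{finding.evidence_badge}]"
    return prefix


def agreed_locations(findings: List[Finding]) -> Dict[Tuple[str, int], Set[str]]:
    """Return locations flagged by more than one distinct specialist."""
    by_location: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for f in findings:
        by_location[(f.path, f.line)].add(f.specialist)
    return {loc: names for loc, names in by_location.items() if len(names) > 1}


findings = [
    Finding("Metis", "src/auth.py", 42),
    Finding("Argus", "src/auth.py", 42, evidence_badge="Trivy via SARIF"),
    Finding("Iris", "src/cache.py", 10),
]
agreement = agreed_locations(findings)
```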

Proof-tier receipts

Provenance answers what ran. The proof tier answers the harder question: did this finding earn the right to post? Every posted finding carries a tier pill — a receipt for how it was validated, not a numeric confidence guess.
| Tier | Pill | Meaning |
| --- | --- | --- |
| Verified | VERIFIED | The claim was checked by execution or carries a signed receipt; this is the strongest tier. A finding that says “this throws” was confirmed to throw. |
| Grounded | GROUNDED | The finding is anchored to cited code or external evidence (a SARIF row, a CVE record, a quoted diff span). The claim is tied to something verifiable in the PR. |
| Model | MODEL | Model judgment that passed the grounding gate but wasn’t independently executed or receipt-backed. Still anchored to the diff, but the validation is reasoning, not a check. |
This replaces any older “1–5 confidence score” framing. A number invites the question “1–5 according to whom?”; a proof tier answers “validated how?” — which is the question a reviewer actually has when deciding whether to act on a finding.

The refute / execute stage

Proof tiers are not self-assigned by the specialist that raised the finding. Before synthesis, each candidate finding passes through a refute/execute gate:
  1. Grounding check. The finding must anchor to cited code or evidence in the diff. A claim that can’t point at the line it’s about is demoted out of the inline channel — it cannot earn GROUNDED or above.
  2. Refutation pass. The system actively tries to disprove the finding against the surrounding context (callers, dependencies, the rest of the file). A finding that the refutation pass overturns is dropped or downgraded rather than posted as a false alarm.
  3. Execution / receipt. Where a claim is checkable — and a check is available — it is executed or matched against a signed receipt. Surviving that promotes the finding to VERIFIED.
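The three steps above can be sketched as a single decision function. This is an illustration under stated assumptions, not the actual gate: `GateResult`, `assign_tier`, and the `evidence_backed` flag used to separate GROUNDED from MODEL are all hypothetical, and the sketch drops overturned findings although the source also allows downgrading them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    VERIFIED = "VERIFIED"
    GROUNDED = "GROUNDED"
    MODEL = "MODEL"


@dataclass
class GateResult:
    anchored: bool                 # step 1: points at cited code or evidence in the diff
    refuted: bool                  # step 2: the refutation pass overturned the finding
    check_passed: Optional[bool]   # step 3: None when no check or receipt is available
    evidence_backed: bool = False  # hypothetical: tied to a SARIF row, CVE record, or quoted span


def assign_tier(result: GateResult) -> Optional[Tier]:
    """Return the tier for an inline finding, or None if it must not post inline."""
    if not result.anchored:
        return None  # fails grounding: demoted out of the inline channel
    if result.refuted or result.check_passed is False:
        return None  # overturned: this sketch drops it rather than downgrading
    if result.check_passed:
        return Tier.VERIFIED
    return Tier.GROUNDED if result.evidence_backed else Tier.MODEL
```

The point of the structure is that the specialist never sets the tier itself; the tier falls out of which checks the finding survived.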
A finding cannot post unless it clears these gates. This is what makes believability architectural rather than a matter of prompt wording: the proof tier on the pill is the receipt of which gates the finding actually passed. The manifest’s proofTiers counts let tooling audit the tier distribution of a review at a glance. See the believability pipeline for the full five gates — evidence, provenance contracts, refute/execute, proof-tier receipts, and memory — that every finding traverses.
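As a small example of auditing the tier distribution, the sketch below turns the manifest's proofTiers counts into shares. The `tier_shares` helper is hypothetical; only the proofTiers field and its keys come from the example manifest.

```python
def tier_shares(manifest: dict) -> dict:
    """Return each proof tier's share of the findings counted in the manifest."""
    counts = manifest.get("proofTiers", {})
    total = sum(counts.values())
    if total == 0:
        return {tier: 0.0 for tier in counts}  # no findings: avoid dividing by zero
    return {tier: n / total for tier, n in counts.items()}


shares = tier_shares({"proofTiers": {"verified": 1, "grounded": 4, "model": 0}})
# For the example manifest: verified 0.2, grounded 0.8, model 0.0
```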

Tooling integration

The sigilix-meta manifest is intended to be parsed by tooling. Two known integrations:
  • GitHub Actions consumers that aggregate Sigilix manifests across PRs for trend analysis.
  • Internal dashboards that surface “which specialists fell back this week” for capacity planning.
The schema version (the schema field, currently 1) is bumped only when the manifest changes in a breaking way. Forward-compatible additions don’t bump it.
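A dashboard-style aggregation might look like the following sketch, which counts fallbacks per specialist across already-parsed manifests and skips any manifest whose schema version it does not recognize. The `count_fallbacks` helper and the skip-on-unknown-schema policy are assumptions, not documented Sigilix behavior.

```python
from collections import Counter
from typing import Iterable

SUPPORTED_SCHEMA = 1  # the version shown in the example manifest


def count_fallbacks(manifests: Iterable[dict]) -> Counter:
    """Count how often each specialist fell back across a set of manifests."""
    counts: Counter = Counter()
    for manifest in manifests:
        if manifest.get("schema") != SUPPORTED_SCHEMA:
            continue  # a bumped schema signals a breaking change; skip rather than misread
        entries = dict(manifest.get("specialists", {}))
        if "synthesis" in manifest:
            entries["synthesis"] = manifest["synthesis"]
        for entry in entries.values():
            if entry.get("outcome") == "fallback":
                counts[entry["specialist"]] += 1
    return counts


week = [
    {"schema": 1, "specialists": {"security": {"specialist": "Argus", "outcome": "fallback"}}},
    {"schema": 1, "specialists": {"security": {"specialist": "Argus", "outcome": "fallback"}},
     "synthesis": {"specialist": "Harmonia", "outcome": "fallback"}},
    {"schema": 2, "specialists": {"logic": {"specialist": "Metis", "outcome": "fallback"}}},
]
fallbacks = count_fallbacks(week)
```

Reading fields with `.get` means forward-compatible additions to the manifest are ignored without breaking the aggregation.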

What the manifest does not include

  • Customer code or diff content (privacy)
  • Specialist prompts (those are Sigilix’s IP)
  • API keys or internal-service identifiers
  • Per-finding internal scores (only severities and counts)
The manifest is meant to be auditable, not exhaustive. If you need deeper provenance for a compliance audit, contact support — Sigilix retains review-level telemetry that can be exported.

Believability Pipeline

The five gates — including refute/execute — that earn a finding its proof tier.

SARIF Evidence

Push external scanner output into Sigilix reviews.

Review Lifecycle

Where the manifest is built in the pipeline.