Morten Magnusson · Symbiose Research, Sandnes, Norway · ORCID: 0009-0002-4860-5095 · April 2026 · CC-BY-4.0
Status: This ledger closes the global closure gap identified in the 2026-04-23 system audit. Per-test outcomes are aggregated into a single quantitative verdict via frozen thresholds and a documented score function. Verdicts are computed, not paraphrased. Falsification status and model preference are reported as separate fields, because conflating them was an explicit audit finding.
Scope today: thresholds frozen, aggregation formula defined, weight registry seeded with defaults; live current_state computation begins once the Likelihood Ledger pipelines execute.
The Validation Ledger records per-test results. The Likelihood Ledger records how those results were computed. This ledger computes the global verdict: how results combine, where the falsification line is, and what the current state of the framework is.
Without this layer, the system can produce a wall of test outcomes without ever answering two questions: is EFC (Energy-Flow Cosmology) currently falsified, in tension, or acceptable? And how does it compare overall against ΛCDM?
The following thresholds are applied, before aggregation, to every test that has a chi2 or delta_chi2 entry in the Likelihood Ledger:
| Verdict | Criterion | Interpretation |
|---|---|---|
| PASS | \|Δχ²\| < 1 | Indistinguishable from reference at current precision |
| TENSION | 1 ≤ \|Δχ²\| < 9 (~1–3σ) | Notable deviation; not decisive |
| FAIL | \|Δχ²\| ≥ 9 (>3σ) | Decisive deviation against expectation |
Validation Ledger tests that carry the vocabulary value COLLAPSED map to FAIL; MARGINAL maps to TENSION when |Δχ²| is unavailable.
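The threshold table and the vocabulary fallback can be read as a single classification function. The sketch below is a minimal Python illustration, not the ledger's own code; the function and constant names are hypothetical, and it assumes the literal reading that COLLAPSED always maps to FAIL, while MARGINAL is only consulted when Δχ² is missing.

```python
from typing import Optional

# Frozen per-test thresholds on |Δχ²| (from the table above).
PASS_BELOW = 1.0   # |Δχ²| < 1        -> PASS
FAIL_AT = 9.0      # |Δχ²| >= 9 (>3σ) -> FAIL; anything in between -> TENSION


def classify_verdict(delta_chi2: Optional[float], vocab: Optional[str] = None) -> str:
    """Map one test to PASS / TENSION / FAIL.

    delta_chi2: Δχ² from the Likelihood Ledger, or None if unavailable.
    vocab: Validation Ledger vocabulary value (e.g. "COLLAPSED", "MARGINAL"), if any.
    """
    # Reading assumed here: COLLAPSED maps to FAIL regardless of Δχ².
    if vocab == "COLLAPSED":
        return "FAIL"
    if delta_chi2 is None:
        # MARGINAL only acts as a fallback when |Δχ²| is unavailable.
        if vocab == "MARGINAL":
            return "TENSION"
        raise ValueError(f"no Δχ² and no mappable vocabulary value: {vocab!r}")
    d = abs(delta_chi2)
    if d < PASS_BELOW:
        return "PASS"
    if d < FAIL_AT:
        return "TENSION"
    return "FAIL"
```

For example, `classify_verdict(-3.2)` returns `"TENSION"` because only the magnitude of Δχ² is compared against the thresholds, and `classify_verdict(None, "MARGINAL")` also returns `"TENSION"`.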
Frequentist form (always computed):
EFC_score = Σᵢ wᵢ · Δχ²ᵢ
The sum runs over inference-grade tests only, i.e. those with a Likelihood Ledger row in status executed or frozen. Weights wᵢ are declared in the weight registry (default 1.0).
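A minimal Python sketch of the frequentist aggregation follows. It assumes a hypothetical `LedgerRow` record carrying each test's Δχ² and the status of its Likelihood Ledger row; the registry is modelled as a plain `test_id → weight` dict, with unlisted tests falling back to the default weight. Returning `None` when nothing is inference-grade mirrors the `efc_score: <number or null>` field of the current_state block.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Likelihood Ledger statuses that count as inference-grade.
INFERENCE_GRADE = {"executed", "frozen"}


@dataclass
class LedgerRow:
    test_id: str
    delta_chi2: float
    likelihood_status: str  # status of the test's Likelihood Ledger row


def efc_score(rows: List[LedgerRow], weights: Dict[str, float]) -> Optional[float]:
    """EFC_score = Σᵢ wᵢ · Δχ²ᵢ over inference-grade rows only.

    Tests missing from `weights` get the default weight 1.0.
    Returns None when no row is inference-grade ("no pipeline, no score").
    """
    eligible = [r for r in rows if r.likelihood_status in INFERENCE_GRADE]
    if not eligible:
        return None
    return sum(weights.get(r.test_id, 1.0) * r.delta_chi2 for r in eligible)
```

With hypothetical rows `a` (Δχ² = 2.0, executed), `b` (Δχ² = −1.0, frozen) and `c` (Δχ² = 50.0, in an illustrative non-inference status such as `"draft"`), and a registry weight of 0.5 for `a`, the score is 0.5 · 2.0 + 1.0 · (−1.0) = 0.0; `c` is ignored entirely.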
Bayesian form (when evidence is available):
ln K_total = Σᵢ ln Kᵢ (log-Bayes factor, EFC vs reference, per test)
Reported alongside the frequentist score whenever the Likelihood Ledger row includes evidence in its outputs.
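The Bayesian total is a plain sum of per-test log-Bayes factors. The Python sketch below assumes that tests whose Likelihood Ledger row has no evidence are passed as `None` and skipped; the function name is hypothetical. Note that adding ln Kᵢ is the same as multiplying the Kᵢ, which is only valid if the tests' data sets are statistically independent.

```python
import math
from typing import Optional, Sequence


def ln_k_total(per_test_ln_k: Sequence[Optional[float]]) -> Optional[float]:
    """ln K_total = Σᵢ ln Kᵢ (EFC vs reference).

    Entries are None for tests whose Likelihood Ledger row carries no evidence;
    those are skipped. Returns None if no test has evidence.
    Summing log-Bayes factors multiplies the Kᵢ, which assumes the tests'
    data sets are statistically independent.
    """
    available = [x for x in per_test_ln_k if x is not None]
    if not available:
        return None
    # fsum avoids accumulating rounding error over many small terms.
    return math.fsum(available)
```

K_total itself is recovered as `math.exp(ln_k_total(...))`; working in log space keeps the sum numerically stable when individual factors are very large or very small.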
EFC is falsified at the system level if either:
- a kill-criterion test returns FAIL; or
- the combined Bayes factor K_total favours ΛCDM by more than 100 : 1 (K_total < 1/100, i.e. ln K_total < −ln 100 ≈ −4.61; "decisive" on Jeffreys' scale).

A non-kill-criterion test returning FAIL produces TENSION at the system level: it is flagged for review but is not falsifying. The kill-criterion registry is pulled from validation-ledger/data/grav.json and mirrored into evaluation-ledger/data/evaluation.json["thresholds"]["kill_criterion_ids"].
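The falsification rule can be sketched in Python as below. This is an illustration, not the ledger's implementation: `kill_ids` stands for the list mirrored at `evaluation.json["thresholds"]["kill_criterion_ids"]`, the function name is hypothetical, and the treatment of per-test TENSION verdicts as system-level tension is an assumption the text does not state explicitly.

```python
import math
from typing import Dict, Iterable, Optional

# "K_total against ΛCDM exceeds 1 : 100": K_total < 1/100, i.e. ln K_total < -ln 100.
DECISIVE_LN_K = -math.log(100.0)


def falsification_status(
    verdicts: Dict[str, str],
    kill_ids: Iterable[str],
    ln_k_total: Optional[float] = None,
) -> str:
    """Return "falsified", "tension" or "non-falsified" for the whole system."""
    kill_ids = set(kill_ids)
    # Rule 1: any kill-criterion test returning FAIL falsifies EFC.
    if any(verdicts.get(t) == "FAIL" for t in kill_ids):
        return "falsified"
    # Rule 2: decisive Bayesian evidence against EFC.
    if ln_k_total is not None and ln_k_total < DECISIVE_LN_K:
        return "falsified"
    # Non-kill FAILs (and, as assumed here, per-test TENSIONs) give system-level tension.
    if any(v in ("FAIL", "TENSION") for v in verdicts.values()):
        return "tension"
    return "non-falsified"
```

For instance, a FAIL on a test outside the kill-criterion list yields `"tension"`, whereas ln K_total = −5.0 (below −4.61) yields `"falsified"` even if every individual test passes.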
An explicit finding of the 2026-04-23 audit: "non-rejectable" and "best model" were conflated. This ledger reports them as two separate fields.
| Field | Values | Source |
|---|---|---|
| falsification_status | non-falsified · tension · falsified | Per rule in §4 |
| model_preference | prefers_efc · neutral · prefers_lcdm · insufficient_data | Per ΔAIC / ln K from Model Comparison |
Both are reported every release. "ΛCDM is currently preferred" and "EFC is currently not falsified" can both be true simultaneously — this ledger refuses to paper over that.
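One way to keep the two fields from collapsing into each other is to model them as independent, separately validated attributes. The Python sketch below is a hypothetical illustration of that schema rule; the class name is invented, and only the allowed values come from the table above.

```python
from dataclasses import dataclass

FALSIFICATION_VALUES = {"non-falsified", "tension", "falsified"}
PREFERENCE_VALUES = {"prefers_efc", "neutral", "prefers_lcdm", "insufficient_data"}


@dataclass(frozen=True)
class SystemVerdict:
    """Two independent fields: neither is derived from the other."""
    falsification_status: str
    model_preference: str

    def __post_init__(self) -> None:
        # Each field is checked only against its own vocabulary.
        if self.falsification_status not in FALSIFICATION_VALUES:
            raise ValueError(f"bad falsification_status: {self.falsification_status!r}")
        if self.model_preference not in PREFERENCE_VALUES:
            raise ValueError(f"bad model_preference: {self.model_preference!r}")
```

`SystemVerdict("non-falsified", "prefers_lcdm")` is a valid value, which is exactly the "both true simultaneously" case; a single merged verdict string could not express it.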
Non-unit weights must carry a rationale. Default is weight = 1.0.
```json
{
  "test_id": "s8_growth_fsigma8_2026",
  "weight": 1.0,
  "rationale": "Primary observable for μ < 1 prediction"
}
```
Down-weighting without a rationale string is rejected by the sync script. This blocks silent suppression of inconvenient tensions.
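The check attributed to the sync script can be sketched in a few lines of Python. This is a hypothetical stand-in, not the sync script itself: it assumes the registry is a JSON list of entries shaped like the example above, and treats a missing, non-string, or blank `rationale` as absent.

```python
import json
from typing import List

DEFAULT_WEIGHT = 1.0


def weight_errors(entries: List[dict]) -> List[str]:
    """Return one error per entry whose non-unit weight lacks a rationale string."""
    errors = []
    for entry in entries:
        weight = float(entry.get("weight", DEFAULT_WEIGHT))
        rationale = entry.get("rationale")
        has_rationale = isinstance(rationale, str) and rationale.strip() != ""
        if weight != DEFAULT_WEIGHT and not has_rationale:
            errors.append(f"{entry.get('test_id', '?')}: weight {weight} has no rationale")
    return errors


registry = json.loads("""[
  {"test_id": "s8_growth_fsigma8_2026", "weight": 1.0,
   "rationale": "Primary observable for μ < 1 prediction"}
]""")
```

`weight_errors(registry)` is empty, while an entry such as `{"test_id": "t", "weight": 0.2}` produces one error; a sync step would refuse to proceed whenever the list is non-empty.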
The Atlas defines the model space. This ledger reads from the Atlas via data/atlas-link.json to project per-test posteriors back into Atlas viability regions for reporting.
| Field | Value / meaning |
|---|---|
| atlas_ref | Pointer to the Atlas resource being consumed |
| consumed_fields | e.g. allowed_region, excluded_region |
| modifies_atlas | false (schema-enforced constant) |
This ledger does not write to the Atlas. Downstream projection (posteriors → Atlas regions) is a separate step; Atlas remains the source of truth for its own content.
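The read-only contract can be checked when the link file is loaded. The Python sketch below is illustrative only: the function name is hypothetical, the field names come from the table above, and any `atlas_ref` value used with it is a placeholder rather than a real Atlas pointer.

```python
import json

REQUIRED_FIELDS = {"atlas_ref", "consumed_fields", "modifies_atlas"}


def load_atlas_link(text: str) -> dict:
    """Parse atlas-link JSON and enforce the read-only contract."""
    link = json.loads(text)
    missing = REQUIRED_FIELDS - link.keys()
    if missing:
        raise ValueError(f"atlas-link missing fields: {sorted(missing)}")
    # `is not False` also rejects look-alikes such as 0 or the string "false".
    if link["modifies_atlas"] is not False:
        raise ValueError("modifies_atlas must be the constant false")
    return link
```

In a JSON Schema the same constraint would be expressed with `"const": false` on `modifies_atlas`; the Python check simply makes the rule visible at load time.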
Every generation of the ledger publishes a current_state block:
```
{
  "falsification_status": "non-falsified | tension | falsified",
  "model_preference": "prefers_efc | neutral | prefers_lcdm | insufficient_data",
  "efc_score": <number or null>,
  "ln_k_total": <number or null>,
  "tension_tests": ["test_id", ...],
  "fail_tests": ["test_id", ...]
}
```
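Assembling the block from per-test verdicts is mostly bookkeeping. The Python sketch below is a hypothetical helper that assumes the two status fields and the two scores have already been computed elsewhere; sorting the test lists is an illustrative choice, not something the ledger specifies.

```python
import json
from typing import Dict, Optional


def build_current_state(
    verdicts: Dict[str, str],
    falsification_status: str,
    model_preference: str,
    efc_score: Optional[float] = None,
    ln_k_total: Optional[float] = None,
) -> dict:
    """Assemble the current_state block; None serialises to JSON null."""
    return {
        "falsification_status": falsification_status,
        "model_preference": model_preference,
        "efc_score": efc_score,
        "ln_k_total": ln_k_total,
        # Sorted so that successive generations diff cleanly.
        "tension_tests": sorted(t for t, v in verdicts.items() if v == "TENSION"),
        "fail_tests": sorted(t for t, v in verdicts.items() if v == "FAIL"),
    }
```

`json.dumps(build_current_state(...), indent=2)` then yields the published block; until a Likelihood Ledger pipeline executes, `efc_score` and `ln_k_total` stay `null`.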
grav.json? Proposal: pulled from grav.json (single source of truth).

| Aspect | Summary |
|---|---|
| What this ledger fixes | Turns per-test results into a single quantitative system verdict with frozen thresholds — no more semantic verdicts. |
| Hard rule | Falsification status and model preference are reported separately, always. Conflation is blocked by schema. |
| Hard dependency | Likelihood Ledger rows in status executed or frozen. No pipeline, no score. |
| Atlas safety | Read-only linkage via atlas-link.json; modifies_atlas pinned to false in schema. |
| Status | Thresholds and formulas frozen; weight registry seeded; current_state computation blocked until the first Likelihood Ledger pipeline executes. |
© 2026 Energy-Flow Cosmology Initiative · Evaluation Ledger — frozen thresholds, documented aggregation, separated verdicts