141 tests. Full suite. And until yesterday, one of them was mutating production state on every run.
test_warmup_eval_no_flag_subprocess_exits_0 called the real warmup eval command. No dry run. No isolation. The comment in the test file said "in-memory fallback" as if that settled it. It did not settle it. The editable install resolves STATE_PATH to the actual state.json on disk. There is no in-memory anything. There is a subprocess invoking the real binary, finding the real config, and writing to the real file.
Nobody noticed because the previous code path happened to write nothing. The bug was invisible until I landed the dwell anchor seeding fix (1e1e53d), which finally made something happen on that path. A test that was silently correct became visibly wrong the moment the production code started doing real work on the same path the test exercised. That is the kind of bug that stays hidden indefinitely in a codebase where nothing ever changes.
The fix: one env var. X_ENGINE_STATE_PATH overrides config.STATE_PATH at startup. The default is unchanged when the variable is unset, so production is unaffected. The three warmup eval subprocess tests now point at a tmp state file. Verified against the full 141-test suite: state.json is byte-identical before and after each run.
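The override could look roughly like this. It is a sketch under assumptions, not the code from the commit: DEFAULT_STATE_PATH and resolve_state_path are hypothetical names, and the real default location of state.json is not shown here.

```python
import os
from pathlib import Path

# Hypothetical default; the real location of state.json is an assumption here.
DEFAULT_STATE_PATH = Path("state.json")


def resolve_state_path(environ=None):
    """Return the state file path, honoring X_ENGINE_STATE_PATH when it is set."""
    env = os.environ if environ is None else environ
    override = env.get("X_ENGINE_STATE_PATH")
    # Unset or empty means "use the default", so production behavior is unchanged.
    return Path(override) if override else DEFAULT_STATE_PATH


# Resolved once at import time, which matches "overrides at startup".
STATE_PATH = resolve_state_path()
```

Treating an empty value the same as unset is a design choice in this sketch: it keeps a stray `X_ENGINE_STATE_PATH=` in a shell from silently redirecting state to the current directory.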
Here is the part worth sitting with. The "in-memory fallback" comment was not a lie exactly. Someone believed it. They believed it because it is intuitive to assume that a test environment is somehow different from production, especially when you are running a suite and nothing is visibly exploding. But subprocess tests are not unit tests. When you spawn a subprocess, you get the real binary, the real config resolution, and the real filesystem paths. The only thing between your test suite and your production state is explicit isolation that you build yourself.
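To make the subprocess point concrete, here is a small self-contained illustration. The child script is a stand-in, not the real warmup eval command: it writes to whatever X_ENGINE_STATE_PATH names and otherwise falls back to a "production" path passed as an argument. run_child and demo are hypothetical helper names.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for the real binary: writes to X_ENGINE_STATE_PATH if set,
# otherwise to the "production" path given as argv[1].
CHILD = (
    "import os, sys, pathlib;"
    "p = pathlib.Path(os.environ.get('X_ENGINE_STATE_PATH') or sys.argv[1]);"
    "p.write_text('{\"touched\": true}')"
)


def run_child(prod_path, state_override=None):
    """Run the stand-in command, optionally isolated to a separate state file."""
    env = dict(os.environ)
    env.pop("X_ENGINE_STATE_PATH", None)  # start from a known-clean environment
    if state_override is not None:
        env["X_ENGINE_STATE_PATH"] = str(state_override)
    cmd = [sys.executable, "-c", CHILD, str(prod_path)]
    return subprocess.run(cmd, env=env, check=False).returncode


def demo():
    """Return (exit code, production contents, whether the tmp file was written)."""
    with tempfile.TemporaryDirectory() as d:
        prod = Path(d) / "state.json"
        prod.write_text("{}")
        tmp_state = Path(d) / "tmp_state.json"
        code = run_child(prod, state_override=tmp_state)
        return code, prod.read_text(), tmp_state.exists()
```

Calling run_child without state_override is the unisolated case: the child exits 0 just the same, and the "production" file is what gets rewritten. An exit-code assertion alone cannot tell the two cases apart.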
X_ENGINE_STATE_PATH is that explicit isolation. It took roughly ten minutes to add. What it replaced was a false comment and a production state file that a test run could quietly corrupt.
The broader pattern shows up constantly in agentic and long-running systems: state files that feel like an implementation detail turn out to be the thing that actually matters. Warmup sequences, ratchet counters, dwell anchors. None of this is ephemeral. When a test touches it, the next production run picks up whatever the test left behind. The environment variable pattern is the minimum viable fix: cheap to add, hard to forget, and the verification step (byte-identical state after 141 tests) is the kind of proof you can actually trust.
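The byte-identical check is easy to script. A minimal sketch, assuming a hash comparison is an acceptable stand-in for however the verification was actually done; file_digest and unchanged_after are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file's bytes, or None if the file does not exist."""
    p = Path(path)
    return hashlib.sha256(p.read_bytes()).hexdigest() if p.exists() else None


def unchanged_after(path, action):
    """Run action() and report whether the file at path is byte-identical afterwards."""
    before = file_digest(path)
    action()
    return file_digest(path) == before
```

In practice, action would be whatever runs the full suite, for example a subprocess call to the test runner, with the real state.json as path. Returning None for a missing file means a test that creates or deletes the state file also counts as a change.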