olivier/markovian-rsa-mlx

Files

transcrilive b65bf91e37 release: v0.1.1 — enable_thinking=False default + corrected bench gold + CHANGELOG

2026-05-10 14:38:27 +02:00

951 B

Raw Permalink Blame History

Changelog

v0.1.1 — 2026-05-10

Added

RSAConfig.enable_thinking field (default False). Toggling <think> mode in the chat template substantially affects output quality on math problems.
Bench scripts/bench_hmmt.py now uses corrected gold answers for the placeholder HMMT-1 (66, was 100) and HMMT-5 (1, was 76).

Changed

Default enable_thinking flipped to False. Empirical testing shows <think> mode causes the model to narrate the aggregation prompt ("We have a user message: ...") instead of solving. Direct mode produces math reasoning immediately.
_render_chat(messages, *, enable_thinking) signature now takes an explicit kwarg (was hardcoded to True).

Bench results

5/5 vanilla + 5/5 RSA on corrected HMMT subset. lift_pp +0.00pp (ceiling effect — vanilla already at 100%).

v0.1.0 — 2026-05-10

Initial public release. T=2 N=4 RSA orchestrator with audit JSONL + CLI + HMMT bench harness.