Changelog
v0.1.1 — 2026-05-10
Added
- RSAConfig.enable_thinking field (default False). Toggling <think> mode in the chat template substantially affects output quality on math problems.
- Bench: scripts/bench_hmmt.py now uses corrected gold answers for placeholder problems HMMT-1 (66, was 100) and HMMT-5 (1, was 76).
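A minimal sketch of how a config object carrying this field might look, assuming RSAConfig is a frozen dataclass. Only enable_thinking is named in this changelog; the n and t fields are hypothetical, chosen to mirror the T=2 N=4 setup described for v0.1.0.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RSAConfig:
    # Hypothetical fields: only enable_thinking is named in the changelog.
    n: int = 4                      # candidates per step (assumed meaning of N)
    t: int = 2                      # aggregation rounds (assumed meaning of T)
    enable_thinking: bool = False   # v0.1.1: direct (non-<think>) mode by default


default_cfg = RSAConfig()
# Opt back in to <think> mode without mutating the shared default config.
thinking_cfg = replace(default_cfg, enable_thinking=True)
print(default_cfg.enable_thinking, thinking_cfg.enable_thinking)  # False True
```

Freezing the dataclass means a run's settings cannot drift after construction, and dataclasses.replace gives a cheap way to build a variant for A/B comparisons of the two modes.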
Changed
- Default enable_thinking flipped to False. Empirical testing shows that <think> mode causes the model to narrate the aggregation prompt ("We have a user message: ...") instead of solving the problem. Direct mode produces math reasoning immediately.
- _render_chat(messages, *, enable_thinking) now takes an explicit keyword-only argument (previously hardcoded to True).
Bench results
- 5/5 vanilla and 5/5 RSA on the corrected HMMT subset. lift_pp +0.00pp (ceiling effect: vanilla is already at 100%, so there is no headroom for RSA to show a gain).
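A small sketch of the arithmetic behind the result, assuming lift_pp is the RSA accuracy minus the vanilla accuracy in percentage points; headroom_pp is a hypothetical helper showing why a perfect vanilla score caps the measurable lift at zero.

```python
def lift_pp(rsa_correct: int, vanilla_correct: int, total: int) -> float:
    # Difference in accuracy, expressed in percentage points (assumed definition).
    return 100.0 * (rsa_correct - vanilla_correct) / total


def headroom_pp(vanilla_correct: int, total: int) -> float:
    # Largest lift any method could show: 100% minus the vanilla accuracy.
    return 100.0 * (total - vanilla_correct) / total


print(f"lift_pp {lift_pp(5, 5, 5):+.2f}pp, headroom {headroom_pp(5, 5):.2f}pp")
# lift_pp +0.00pp, headroom 0.00pp
```

With zero headroom, a +0.00pp lift says nothing about whether RSA helps; a harder subset on which vanilla scores below 100% would be needed to measure it.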
v0.1.0 — 2026-05-10
Initial public release: RSA orchestrator (T=2, N=4) with a JSONL audit log, a CLI, and an HMMT bench harness.
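For orientation, a hedged sketch of what a T=2, N=4 orchestrator with a JSONL audit trail could look like. It assumes RSA here means a recursive self-aggregation loop: N candidates are sampled, then each of T rounds replaces every candidate with the model's aggregation of K randomly chosen candidates from the previous round. The function name run_rsa, the subset size k, the prompt wording, and the audit record fields are all hypothetical, and the model call is a stub.

```python
import io
import json
import random
from typing import IO, Callable, List

Generate = Callable[[str], str]  # prompt -> completion; stands in for a real model call


def run_rsa(problem: str, generate: Generate, audit: IO[str], *,
            n: int = 4, t: int = 2, k: int = 2, seed: int = 0) -> List[str]:
    """Hypothetical T-round, N-candidate self-aggregation loop (k <= n assumed)."""
    rng = random.Random(seed)

    def call(step: int, idx: int, prompt: str) -> str:
        out = generate(prompt)
        # One JSON object per model call, one per line: the audit trail.
        record = {"step": step, "idx": idx, "prompt": prompt, "output": out}
        audit.write(json.dumps(record) + "\n")
        return out

    # Step 0: sample N independent candidate solutions.
    population = [call(0, i, problem) for i in range(n)]
    # Steps 1..T: each new candidate aggregates K candidates from the previous step.
    for step in range(1, t + 1):
        next_population = []
        for i in range(n):
            subset = rng.sample(population, k)
            prompt = (problem + "\n\nCandidate solutions:\n" + "\n---\n".join(subset)
                      + "\n\nCombine them into one improved solution.")
            next_population.append(call(step, i, prompt))
        population = next_population
    return population


buf = io.StringIO()
final = run_rsa("What is 2+3?", lambda prompt: "5", buf)
print(len(final), len(buf.getvalue().splitlines()))  # 4 12
```

With N=4 and T=2 the loop makes N x (T + 1) = 12 model calls, so the audit log holds 12 lines; writing one self-contained JSON object per call keeps the log appendable and easy to replay or grep.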