release: v0.1.1 — enable_thinking=False default + corrected bench gold + CHANGELOG
18
CHANGELOG.md
Normal file
@@ -0,0 +1,18 @@
# Changelog
## v0.1.1 — 2026-05-10
### Added
- `RSAConfig.enable_thinking` field (default `False`). Toggling `<think>` mode in the chat template substantially affects output quality on math problems.
- The bench script `scripts/bench_hmmt.py` now uses corrected gold answers for the placeholder problems HMMT-1 (66, was 100) and HMMT-5 (1, was 76).
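
As a rough illustration of the new field, a config object with this default might look like the sketch below. Only the names `RSAConfig` and `enable_thinking` come from this changelog. The `rounds` and `population` fields are hypothetical stand-ins for the T and N settings and are not confirmed by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSAConfig:
    # Hypothetical fields standing in for T (rounds) and N (candidates per round).
    rounds: int = 2
    population: int = 4
    # From the changelog: thinking mode stays off unless explicitly requested.
    enable_thinking: bool = False


default_cfg = RSAConfig()                      # enable_thinking is False
thinking_cfg = RSAConfig(enable_thinking=True)  # opt back in explicitly
```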
### Changed
- Default `enable_thinking` flipped to `False`. Empirical testing shows `<think>` mode causes the model to narrate the aggregation prompt (`"We have a user message: ..."`) instead of solving. Direct mode produces math reasoning immediately.
- `_render_chat(messages, *, enable_thinking)` signature now takes an explicit kwarg (was hardcoded to `True`).
### Bench results
- Vanilla 5/5 and RSA 5/5 on the corrected HMMT subset, so `lift_pp` is +0.00pp (ceiling effect: vanilla is already at 100%).
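
The lift figure can be reproduced with a small calculation, assuming `lift_pp` means RSA accuracy minus vanilla accuracy in percentage points. The function name and signature below are illustrative, not taken from the bench script.

```python
def lift_pp(vanilla_correct: int, rsa_correct: int, total: int) -> float:
    """Accuracy difference (RSA minus vanilla) in percentage points."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (rsa_correct - vanilla_correct) / total


result = lift_pp(vanilla_correct=5, rsa_correct=5, total=5)
print(f"lift_pp {result:+.2f}pp")  # prints "lift_pp +0.00pp"
```

With vanilla already at 5/5, no method can show a positive lift on this subset, which is why the result says little about RSA itself.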
## v0.1.0 — 2026-05-10
Initial public release: RSA orchestrator (T=2, N=4) with audit JSONL, CLI, and HMMT bench harness.