release: v0.1.1 — enable_thinking=False default + corrected bench gold + CHANGELOG
CHANGELOG.md (18 lines changed)
@@ -0,0 +1,18 @@
+# Changelog
+
+## v0.1.1 — 2026-05-10
+
+### Added
+- `RSAConfig.enable_thinking` field (default `False`). Toggling `<think>` mode in the chat template substantially affects output quality on math problems.
+- Bench `scripts/bench_hmmt.py` now uses corrected gold answers for placeholder problems HMMT-1 (66, was 100) and HMMT-5 (1, was 76).
+
+### Changed
+- Default `enable_thinking` flipped to `False`. In empirical testing, `<think>` mode caused the model to narrate the aggregation prompt (`"We have a user message: ..."`) instead of solving the problem; direct mode starts the math reasoning immediately.
+- `_render_chat(messages, *, enable_thinking)` now takes `enable_thinking` as an explicit keyword-only argument (previously hardcoded to `True`).
+
+### Bench results
+- 5/5 vanilla and 5/5 RSA on the corrected HMMT subset. lift_pp +0.00pp (ceiling effect: vanilla is already at 100%).
+
+## v0.1.0 — 2026-05-10
+
+Initial public release: T=2 N=4 RSA orchestrator with audit JSONL, CLI, and HMMT bench harness.
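The `enable_thinking` switch described above ultimately has to reach the chat template. Below is a minimal sketch of how a renderer could forward it, assuming a Hugging Face-style tokenizer whose `apply_chat_template` passes extra keyword arguments through to the Jinja template (as `transformers` does) and a chat template that actually reads `enable_thinking`. `render_chat` and `ChatTokenizer` are hypothetical names; unlike the project's `_render_chat(messages, *, enable_thinking)`, this version takes the tokenizer as an explicit parameter and defaults the flag to `False`.

```python
from typing import Any, Protocol


class ChatTokenizer(Protocol):
    """Anything with a Hugging Face-style apply_chat_template method."""

    def apply_chat_template(self, conversation: list[dict[str, str]], **kwargs: Any) -> Any: ...


def render_chat(
    tokenizer: ChatTokenizer,
    messages: list[dict[str, str]],
    *,
    enable_thinking: bool = False,
) -> str:
    """Render messages to a prompt string, forwarding the <think> toggle to the template."""
    # Keyword-only flag (the bare `*`), mirroring _render_chat(messages, *, enable_thinking).
    # transformers forwards unknown kwargs to the Jinja chat template; the flag only has an
    # effect if the model's template reads `enable_thinking` (assumed here, not verified).
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=enable_thinking,
    )
```

Keeping the flag keyword-only means a positional `True`/`False` cannot be passed by accident, which matters when the flag changes model behavior as much as the changelog reports.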
README.md (13 lines changed)
@@ -2,7 +2,7 @@
 
 First MLX implementation of Zyphra's **Markovian RSA** test-time compute methodology, targeting **ZAYA1-8B** on Apple Silicon. Boosts reasoning accuracy by sampling N parallel reasoning traces, extracting their tails, and feeding aggregation prompts back to the model.
 
-> **Status:** v0.1.0. Aggregation prompt is `zaya_v1` (reverse-engineered; the paper does not publish the co-trained format). HMMT'25 5-problem smoke test shows ≥ 0 pp lift on M2 Pro.
+> **Status:** v0.1.1. `enable_thinking=False` by default; aggregation uses the `zaya_v1` template (reverse-engineered; the paper does not publish the co-trained format). Both vanilla and RSA score 100% on the corrected 5-problem HMMT subset (ceiling effect; a harder set is needed to measure real lift).
 
 ## Install
 
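The README's one-line description (sample N traces, keep their tails, feed aggregation prompts back to the model) can be sketched as a plain loop. This is an illustrative sketch, not the project's orchestrator: `markovian_rsa`, `tail`, `build_aggregation_prompt`, the subset size `k`, and the character-based `tail_chars` cutoff are all hypothetical, and the prompt format is a placeholder rather than the reverse-engineered `zaya_v1` template. It assumes T counts total rounds, so T=1 with N=1 degenerates to a single vanilla generation, matching how the bench table labels its vanilla baseline.

```python
import random
from typing import Callable

# A generator maps a prompt string to a completion string (e.g. a wrapped model call).
Generate = Callable[[str], str]


def tail(trace: str, tail_chars: int) -> str:
    """Keep only the last `tail_chars` characters of a reasoning trace."""
    return trace[-tail_chars:] if tail_chars > 0 else ""


def build_aggregation_prompt(problem: str, tails: list[str]) -> str:
    """Placeholder aggregation prompt; NOT the project's zaya_v1 template."""
    parts = [f"Problem:\n{problem}"]
    for i, t in enumerate(tails, start=1):
        parts.append(f"Approach {i}:\n{t}")
    parts.append("Using the approaches above, solve the problem.")
    return "\n\n".join(parts)


def markovian_rsa(
    problem: str,
    generate: Generate,
    *,
    T: int = 2,
    N: int = 4,
    k: int = 2,
    tail_chars: int = 2000,
    seed: int = 0,
) -> list[str]:
    """Run T rounds over a population of N traces; return the final population."""
    rng = random.Random(seed)
    # Round 1: N independent traces generated straight from the problem.
    population = [generate(problem) for _ in range(N)]
    # Rounds 2..T: each new trace sees only the tails of k sampled traces,
    # so the prompt stays bounded no matter how long earlier traces were.
    for _ in range(T - 1):
        next_population = []
        for _ in range(N):
            subset = rng.sample(population, k=min(k, len(population)))
            tails = [tail(t, tail_chars) for t in subset]
            next_population.append(generate(build_aggregation_prompt(problem, tails)))
        population = next_population
    return population
```

Passing `generate` as a plain callable keeps the control flow testable without loading a model; in practice it would wrap an MLX generation call.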
@@ -43,6 +43,17 @@ markovian-rsa-mlx solve "Compute the integral of x^2 from 0 to 5" \
 | `paper-16k` | 2 | 4 | 16 K | ~ 16-24 GB | paper "deployment" profile |
 | `paper-headline-40k` | 2 | 16 | 40 K | 32+ GB | paper headline (HMMT'25 89.6) |
 
+## Bench results (HMMT'25 5-problem subset)
+
+With the corrected placeholder dataset and the `enable_thinking=False` default:
+
+| Backend | Score | Wall time | Per-problem avg |
+|---|---:|---:|---:|
+| Vanilla (T=1 N=1) | 5/5 = 100% | 1085 s | 217 s |
+| RSA T=2 N=2 (default-16gb) | 5/5 = 100% | 3974 s | 795 s |
+
+`lift_pp = +0.00pp` on this subset because of a ceiling effect (vanilla already scores 100%). Larger HMMT'25 / AIME'26 datasets are needed to measure the real lift. The system is mechanically correct (RSA outputs reference "Approach 1, Approach 2" from the aggregation prompts); it just needs harder problems to differentiate the two backends.
+
 ## Audit JSONL
 
 Every event of the run is one line. Schema in
@@ -1,6 +1,6 @@
 [project]
 name = "markovian-rsa-mlx"
-version = "0.1.0"
+version = "0.1.1"
 description = "Markovian RSA test-time compute methodology on MLX for ZAYA1-8B and future co-trained models"
 readme = "README.md"
 requires-python = ">=3.12,<3.14"
@@ -1,5 +1,5 @@
 """Markovian RSA test-time compute methodology on MLX."""
-__version__ = "0.1.0"
+__version__ = "0.1.1"
 
 from markovian_rsa_mlx.config import RSAConfig
 from markovian_rsa_mlx.loader import load_zaya_model
@@ -7,7 +7,7 @@ runner = CliRunner()
 def test_version_command_prints_version():
     result = runner.invoke(app, ["version"])
     assert result.exit_code == 0
-    assert "0.1.0" in result.stdout
+    assert "0.1.1" in result.stdout
 
 
 def test_solve_help_shows_required_flags():
uv.lock (2 lines changed, generated)
@@ -421,7 +421,7 @@ wheels = [
 
 [[package]]
 name = "markovian-rsa-mlx"
-version = "0.1.0"
+version = "0.1.1"
 source = { editable = "." }
 dependencies = [
     { name = "huggingface-hub" },