release: v0.1.1 — enable_thinking=False default + corrected bench gold + CHANGELOG

transcrilive
2026-05-10 14:38:27 +02:00
parent 81e8ac88cc
commit b65bf91e37
6 changed files with 34 additions and 5 deletions

@@ -2,7 +2,7 @@
First MLX implementation of Zyphra's **Markovian RSA** test-time compute methodology, targeting **ZAYA1-8B** on Apple Silicon. Boosts reasoning accuracy by sampling N parallel reasoning traces, extracting their tails, and feeding aggregation prompts back to the model.
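The loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation. `generate` is a hypothetical stand-in for one sampled MLX completion from ZAYA1-8B. Tails are cut by character count rather than by tokens. Every aggregation prompt sees all N tails, so any subset sampling the method may use is omitted. `build_aggregation_prompt` is a placeholder and not the `zaya_v1` template. `T` counts steps including the initial sampling, so `T=1, N=1` reduces to a single vanilla generation, matching the "Vanilla (T=1 N=1)" row of the bench table.

```python
from typing import Callable, List


def tail(trace: str, max_chars: int) -> str:
    """Return the last `max_chars` characters of a reasoning trace."""
    # Guard: trace[-0:] would return the whole string.
    return trace[-max_chars:] if max_chars > 0 else ""


def build_aggregation_prompt(question: str, tails: List[str]) -> str:
    """Hypothetical aggregation prompt (NOT the project's zaya_v1 template)."""
    parts = [f"Problem:\n{question}\n"]
    for i, t in enumerate(tails, start=1):
        parts.append(f"Approach {i} (end of reasoning):\n{t}\n")
    parts.append("Using the approaches above, reason again and give a final answer.")
    return "\n".join(parts)


def markovian_rsa(
    question: str,
    generate: Callable[[str], str],  # hypothetical: one sampled completion per call
    T: int,
    N: int,
    tail_chars: int,
) -> List[str]:
    """Run T steps with a population of N traces and return the last population."""
    if T < 1 or N < 1:
        raise ValueError("T and N must be >= 1")
    # Step 1: N independent traces sampled from the bare question.
    population = [generate(question) for _ in range(N)]
    # Steps 2..T: only the tails of the previous step are carried forward
    # (the Markovian part), so the prompt length does not grow with T.
    for _ in range(T - 1):
        tails = [tail(trace, tail_chars) for trace in population]
        prompt = build_aggregation_prompt(question, tails)
        population = [generate(prompt) for _ in range(N)]
    return population
```

With a real sampler, each `generate` call would need a temperature above 0 so that the N traces differ. This sketch also leaves out how a single final answer is picked from the returned population.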
> **Status:** v0.1.0. Aggregation prompt is `zaya_v1` (reverse-engineered; the paper does not publish the co-trained format). A 5-problem HMMT'25 smoke test shows ≥ 0 pp lift on M2 Pro.
> **Status:** v0.1.1. `enable_thinking=False` is the default; the aggregation prompt is the `zaya_v1` template (reverse-engineered; the paper does not publish the co-trained format). Vanilla and RSA both score 100% on the corrected 5-problem HMMT subset. This is a ceiling effect, so a harder set is needed to measure the real lift.
## Install
@@ -43,6 +43,17 @@ markovian-rsa-mlx solve "Compute the integral of x^2 from 0 to 5" \
| `paper-16k` | 2 | 4 | 16 K | ~ 16-24 GB | paper "deployment" profile |
| `paper-headline-40k` | 2 | 16 | 40 K | 32+ GB | paper headline (HMMT'25 89.6) |
## Bench results (HMMT'25 5-problem subset)
With the corrected placeholder dataset and the `enable_thinking=False` default:
| Backend | Score | Wall time | Per-problem avg |
|---|---:|---:|---:|
| Vanilla (T=1 N=1) | 5/5 = 100% | 1085 s | 217 s |
| RSA T=2 N=2 (default-16gb) | 5/5 = 100% | 3974 s | 795 s |
`lift_pp = +0.00pp` on this subset because of a ceiling effect: vanilla already scores 100%. Larger HMMT'25 / AIME'26 datasets are needed to measure the real lift. The pipeline is mechanically correct (RSA outputs reference "Approach 1", "Approach 2" from the aggregation prompts); it just needs harder problems to differentiate the two backends.
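The headline numbers follow from simple arithmetic on the table. The sketch below assumes that `lift_pp` is the accuracy difference in percentage points (RSA minus vanilla). The project's exact formula is not shown here. The sketch also makes the cost side explicit: on this subset, RSA T=2 N=2 takes about 3.7× the vanilla wall time for the same score.

```python
N_PROBLEMS = 5  # size of the HMMT'25 subset in the table

# (correct answers, total wall time in seconds), copied from the table
vanilla_correct, vanilla_wall_s = 5, 1085
rsa_correct, rsa_wall_s = 5, 3974

vanilla_acc = vanilla_correct / N_PROBLEMS
rsa_acc = rsa_correct / N_PROBLEMS

# Assumed definition: lift in percentage points = (RSA acc - vanilla acc) * 100.
lift_pp = (rsa_acc - vanilla_acc) * 100

per_problem_vanilla_s = vanilla_wall_s / N_PROBLEMS  # 217.0
per_problem_rsa_s = rsa_wall_s / N_PROBLEMS  # 794.8, rounded to 795 in the table
slowdown = rsa_wall_s / vanilla_wall_s  # about 3.66x

print(f"lift_pp = {lift_pp:+.2f}pp, slowdown = {slowdown:.2f}x")
# prints: lift_pp = +0.00pp, slowdown = 3.66x
```

Vanilla already scores 5/5, so under this definition `lift_pp` cannot be positive on this subset, however well RSA performs.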
## Audit JSONL
Every event of the run is one line. Schema in