supertonic-3-mlx

olivier/supertonic-3-mlx

Commit Graph

Author	SHA1	Message	Date

ambassadia

d9f43c2531

docs: add multi-machine bench (M3 Ultra 45.8ms / M4 86.7ms / CoreML 303ms / ONNX 1200ms)

Adds the Newton-sentence benchmark numbers measured on two real Macs +
the upstream CoreML and ONNX baselines. Highlights:

- Mac Studio M3 Ultra: 45.8 ms wall median (best 39 ms), RTF x88
- MacBook Air M4:      86.7 ms wall median,               RTF x47
- M4 + CoreML:        303.5 ms wall median,               RTF x27
- M4 + ONNX SDK:     ~1200 ms wall median,               RTF ~x3

Same French utterance, same warm-up protocol, 5 warm runs each. The
ms-per-second-of-audio column is the fairest backend comparison, since
the MLX and CoreML paths produce slightly different audio durations (the
DurationPredictor and CoreML's speed=1.05 yield different timing). MLX is
1.78× faster than the CoreML build on identical M4 hardware, and ~35-40×
faster than the upstream ONNX SDK.

GPU memory footprint on the Ultra: 750 MB active, 844 MB peak.

2026-05-20 09:48:20 +02:00
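The protocol described in this commit (warm-up, 5 warm runs, wall-clock median, RTF, and ms per second of audio) can be sketched roughly as below. This is an illustration only, not the repository's benchmark script: `bench`, its parameters, and the `synthesize(text)` callable returning a sequence of samples are all hypothetical names.

```python
import statistics
import time


def bench(synthesize, text, sample_rate, warmup=1, runs=5):
    """Time warm calls of a hypothetical synthesize(text) -> sequence of samples."""
    for _ in range(warmup):
        synthesize(text)  # discard warm-up runs (compilation, cache fill)

    wall_ms = []
    n_samples = 0
    for _ in range(runs):
        start = time.perf_counter()
        audio = synthesize(text)
        wall_ms.append((time.perf_counter() - start) * 1000.0)
        n_samples = len(audio)

    median_ms = statistics.median(wall_ms)
    audio_s = n_samples / sample_rate  # duration of the generated audio
    return {
        "median_ms": median_ms,
        "best_ms": min(wall_ms),
        "audio_s": audio_s,
        # RTF: seconds of audio produced per second of wall time
        "rtf": audio_s / (median_ms / 1000.0),
        # duration-normalised cost, comparable across backends
        "ms_per_audio_s": median_ms / audio_s,
    }
```

Because MLX evaluates lazily, a real `synthesize` would need to force evaluation (for example with `mx.eval`) before the timer stops. Normalising by audio duration is what makes backends comparable when they emit clips of different lengths; as a sanity check on the reported figures, 45.8 ms at RTF x88 corresponds to roughly 4.0 s of audio.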

transcrilive

12dbf4a821

v0.1.0 — initial release

MLX-native port of Supertone's Supertonic 3 multilingual TTS. Runs the
full flow-matching + classifier-free-guidance pipeline at ~x100 realtime
on Apple Silicon, with audio cosine 1.0 vs the cached MLX path and
cosine 0.98 vs the upstream ONNX Runtime reference.

Weights are hosted at https://huggingface.co/ambassadia/supertonic-3-mlx
and auto-downloaded on first use; this repository ships the port code,
the model card, audio samples, and a zero-config setup_and_test.sh.

Install:
    pip install git+https://gitea.tavportal.com/olivier/supertonic-3-mlx.git

Quick test:
    git clone https://gitea.tavportal.com/olivier/supertonic-3-mlx.git
    cd supertonic-3-mlx && ./setup_and_test.sh

Licenses (dual): model weights = BigScience Open RAIL-M (Section 4
propagation), port code = Apache-2.0. See LICENSE, LICENSE-CODE, NOTICE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 09:17:05 +02:00
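The "audio cosine" figures quoted in this commit (1.0 against the cached MLX path, 0.98 against the ONNX Runtime reference) are cosine similarities between two outputs. The commit does not say which representation is compared; the sketch below assumes raw waveform samples and truncates to the shorter clip, and `audio_cosine` is a hypothetical helper name.

```python
import math


def audio_cosine(a, b):
    """Cosine similarity between two sample sequences.

    Assumption: the comparison is made on raw samples, over the
    overlapping prefix when the two clips differ in length.
    """
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero signal")
    return dot / (norm_a * norm_b)
```

A value of 1.0 means the two signals are identical up to a positive scale factor, so it says nothing about loudness differences; a sample-wise comparison is also sensitive to small timing offsets between the clips.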

2 Commits