v0.1.0 — initial release

MLX-native port of Supertone's Supertonic 3 multilingual TTS. Runs the full flow-matching + classifier-free-guidance pipeline at ~100x realtime on Apple Silicon, with audio cosine 1.0 vs the cached MLX path and cosine 0.98 vs the upstream ONNX Runtime reference.

Weights are hosted at https://huggingface.co/ambassadia/supertonic-3-mlx and auto-downloaded on first use; this repository ships the port code, the model card, audio samples, and a zero-config setup_and_test.sh.

Install:
    pip install git+https://gitea.tavportal.com/olivier/supertonic-3-mlx.git

Quick test:
    git clone https://gitea.tavportal.com/olivier/supertonic-3-mlx.git
    cd supertonic-3-mlx && ./setup_and_test.sh

Licenses (dual): model weights = BigScience Open RAIL-M (Section 4 propagation), port code = Apache-2.0. See LICENSE, LICENSE-CODE, NOTICE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 09:17:05 +02:00
commit 12dbf4a821
36 changed files with 3812 additions and 0 deletions
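
The commit message reports "audio cosine" scores against two reference paths but does not say how they are computed. A minimal sketch of one plausible reading follows: plain cosine similarity between two time-aligned mono waveforms. The helper name `audio_cosine` and the truncate-to-shorter-length rule are assumptions for illustration, not code from this repository.

```python
import numpy as np


def audio_cosine(a, b):
    """Cosine similarity between two mono waveforms (hypothetical helper).

    Assumes both signals share a sample rate and are already time-aligned;
    the longer one is truncated to the length of the shorter one.
    """
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    n = min(a.size, b.size)
    a, b = a[:n], b[:n]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # At least one signal is silent (or empty); similarity is undefined,
        # so report 0.0 rather than dividing by zero.
        return 0.0
    return float(np.dot(a, b) / denom)
```

Under this reading, a score of 1.0 means the two waveforms match up to a positive scale factor, while a score such as 0.98 leaves room for small numerical differences between runtimes.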
--- a/examples/quickstart.py
+++ b/examples/quickstart.py
@@ -0,0 +1,23 @@
+"""Minimal Supertonic 3 MLX usage — 5 lines, no fluff.
+
+Run from anywhere AFTER ``pip install supertonic-3-mlx`` (or from inside
+this directory after ``pip install ./``):
+
+    python examples/quickstart.py
+"""
+from supertonic_3_mlx import Pipeline
+import soundfile as sf
+
+# When the package has been pip-installed, this auto-downloads from the Hub
+# (~ 400 MB) into the standard Hugging Face cache. After the first run, the
+# weights are reused from cache and cold start is ~ 11 ms on M4.
+pipe = Pipeline.from_pretrained("ambassadia/supertonic-3-mlx")
+
+wav = pipe.generate(
+    "Hello world from Apple Silicon. Supertonic 3 runs at one hundred times realtime.",
+    voice="F1",  # one of F1..F5, M1..M5
+    lang="en",   # ISO 639-1
+)
+
+sf.write("hello.wav", wav, pipe.sample_rate)
+print(f"wrote hello.wav — {len(wav) / pipe.sample_rate:.2f}s of audio")
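
The "times realtime" figure quoted above can be checked locally by dividing the duration of the generated audio by the wall-clock time spent generating it. The sketch below is a generic timing helper, not part of this commit; `realtime_factor` and its `synthesize` callback are hypothetical names, and the result will vary with hardware, text length, and warm-up.

```python
import time


def realtime_factor(synthesize, sample_rate):
    """Return seconds of audio produced per second of wall-clock time.

    `synthesize` is any zero-argument callable returning a 1-D sequence of
    samples (hypothetical interface, chosen for illustration).
    """
    start = time.perf_counter()
    wav = synthesize()
    elapsed = time.perf_counter() - start
    audio_seconds = len(wav) / sample_rate
    return audio_seconds / elapsed


# Possible usage with the objects from quickstart.py (not executed here):
#   rtf = realtime_factor(
#       lambda: pipe.generate("Hello world.", voice="F1", lang="en"),
#       pipe.sample_rate,
#   )
#   print(f"{rtf:.0f}x realtime")
```

Because MLX evaluates arrays lazily, a measurement like this is only meaningful if the returned audio has actually been computed before the timer stops; whether `pipe.generate` already guarantees that is not stated in the source. Running one untimed warm-up call first also keeps weight loading out of the measurement.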