MLX-native port of Supertone's Supertonic 3 multilingual TTS.

Runs the full flow-matching + classifier-free-guidance pipeline at ~x100 realtime on Apple Silicon, with audio cosine 1.0 vs the cached MLX path and cosine 0.98 vs the upstream ONNX Runtime reference.

Weights are hosted at https://huggingface.co/ambassadia/supertonic-3-mlx and auto-downloaded on first use; this repository ships the port code, the model card, audio samples, and a zero-config setup_and_test.sh.

Install:

    pip install git+https://gitea.tavportal.com/olivier/supertonic-3-mlx.git

Quick test:

    git clone https://gitea.tavportal.com/olivier/supertonic-3-mlx.git
    cd supertonic-3-mlx && ./setup_and_test.sh

Licenses (dual): model weights = BigScience Open RAIL-M (Section 4 propagation), port code = Apache-2.0. See LICENSE, LICENSE-CODE, NOTICE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| filename | language | voice | text | duration_s | mlx_ms_median | rtf_mlx | onnx_ms_median | rtf_onnx | speedup_mlx_over_onnx |
|---|---|---|---|---|---|---|---|---|---|
| samples/en_F1_short.wav | en | F1 | Hello world from Apple Silicon. Supertonic 3 runs at one hundred times real time. | 2.786 | 36.6 | 76.2 | 1004.7 | 2.8 | 27.5 |
| samples/en_M1_long.wav | en | M1 | A gentle breeze moved through the open window while the children, still half-asleep, listened to the distant sound of the harbour bells. | 3.901 | 38.4 | 101.7 | 1356.0 | 2.9 | 35.3 |
| samples/fr_F2.wav | fr | F2 | Bonjour, ceci est un test de synthèse vocale en français. Le modèle gère trente-et-une langues sur une puce M4. | 3.413 | 37.9 | 90.1 | 1195.6 | 2.9 | 31.6 |
| samples/de_M2.wav | de | M2 | Guten Morgen. Dieses Modell läuft komplett auf Apple Silicon, ohne ONNX und ohne CoreML, in reinem MLX. | 3.692 | 38.1 | 96.9 | 1313.9 | 2.8 | 34.5 |
| samples/ja_F3.wav | ja | F3 | こんにちは。これはアップルシリコン上でMLXを使ったテストです。 | 1.463 | 32.1 | 45.6 | 848.4 | 1.7 | 26.4 |
| samples/es_M3.wav | es | M3 | Hola, esto es una prueba de síntesis de voz en español ejecutada en tiempo real sobre Apple Silicon. | 2.856 | 37.0 | 77.2 | 1002.1 | 2.9 | 27.1 |
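
The derived columns in the table above appear to follow from the raw measurements: `rtf_*` looks like audio duration divided by median synthesis latency (so higher means faster than realtime), and `speedup_mlx_over_onnx` looks like the ratio of the two latencies. These definitions are inferred from the numbers, not stated by the repository, and the helper names below are hypothetical. A minimal Python sketch that recomputes them:

```python
# Sketch only: recompute the derived benchmark columns from the raw ones.
# Assumption (inferred from the table, not documented by the repo):
#   rtf_x   = duration_s / (x_ms_median / 1000)   -> audio seconds per compute second
#   speedup = onnx_ms_median / mlx_ms_median


def realtime_factor(duration_s: float, latency_ms: float) -> float:
    """Seconds of audio produced per second of compute (higher is faster)."""
    return duration_s / (latency_ms / 1000.0)


def speedup(onnx_ms: float, mlx_ms: float) -> float:
    """How many times lower the MLX latency is than the ONNX latency."""
    return onnx_ms / mlx_ms


# (filename, duration_s, mlx_ms_median, onnx_ms_median), copied from the table.
ROWS = [
    ("samples/en_F1_short.wav", 2.786, 36.6, 1004.7),
    ("samples/en_M1_long.wav", 3.901, 38.4, 1356.0),
    ("samples/fr_F2.wav", 3.413, 37.9, 1195.6),
    ("samples/de_M2.wav", 3.692, 38.1, 1313.9),
    ("samples/ja_F3.wav", 1.463, 32.1, 848.4),
    ("samples/es_M3.wav", 2.856, 37.0, 1002.1),
]


def derive(rows):
    """Return one dict per row with the recomputed derived columns."""
    return [
        {
            "filename": name,
            "rtf_mlx": realtime_factor(duration_s, mlx_ms),
            "rtf_onnx": realtime_factor(duration_s, onnx_ms),
            "speedup_mlx_over_onnx": speedup(onnx_ms, mlx_ms),
        }
        for name, duration_s, mlx_ms, onnx_ms in rows
    ]


if __name__ == "__main__":
    for row in derive(ROWS):
        print(
            f"{row['filename']}: rtf_mlx={row['rtf_mlx']:.1f} "
            f"rtf_onnx={row['rtf_onnx']:.1f} "
            f"speedup={row['speedup_mlx_over_onnx']:.1f}"
        )
```

Recomputed this way, every value lands within about 0.2 of the published column. The small residual is presumably because the table shows rounded inputs. Under this reading, the MLX path ranges from roughly 45x realtime on the shortest clip to roughly 100x on the longest.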