supertonic-3-mlx

olivier/supertonic-3-mlx

Commit Graph

Author	SHA1	Message	Date

ambassadia

a3f44d0661

feat: ship 3 user-selected custom blended voices as presets

After listening to the 10-voice comparison MP3 sent on 2026-05-20, the
user picked voices 4 / 6 / 7 as their favourites. They are now first-class
presets alongside F1..F5 / M1..M5 and can be used directly:

    wav = pipe.generate("Bonjour", voice="voix_sombre", lang="fr")
    wav = pipe.generate("Bonjour", voice="homme_moyen", lang="fr")
    wav = pipe.generate("Bonjour", voice="homme_clair", lang="fr")

Blends (created via Pipeline.create_voice with slerp):

  voix_sombre   F4 60 % + M3 40 %                  dark androgynous, velvety and deep
  homme_moyen   {M1, M2, M3, M4, M5} equal weight  standard masculine
  homme_clair   M1 50 % + M5 50 %                  bright, expressive masculine

Same JSON schema as the upstream Supertone presets (style_ttl 1×50×256,
style_dp 1×8×16, both float32, metadata block recording the blend
recipe so the file is self-describing).
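
The commit message does not show how Pipeline.create_voice interpolates internally. The following is only a minimal sketch of a two-voice slerp blend, assuming the interpolation runs over the flattened style tensor; the `slerp` helper and the random stand-in tensors for F4 and M3 are hypothetical and are not taken from the repository.

```python
import numpy as np


def slerp(a, b, t, eps=1e-8):
    """Spherically interpolate from a (t=0) to b (t=1), treating each array as one flat vector."""
    a_flat = a.ravel().astype(np.float64)
    b_flat = b.ravel().astype(np.float64)
    # Angle between the two style vectors.
    cos_omega = np.dot(a_flat, b_flat) / (
        np.linalg.norm(a_flat) * np.linalg.norm(b_flat) + eps
    )
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if omega < 1e-6:
        # Nearly parallel vectors: fall back to linear interpolation.
        out = (1.0 - t) * a_flat + t * b_flat
    else:
        s = np.sin(omega)
        out = (np.sin((1.0 - t) * omega) / s) * a_flat + (np.sin(t * omega) / s) * b_flat
    return out.reshape(a.shape).astype(np.float32)


# Hypothetical stand-ins with the style_ttl shape quoted above (1x50x256, float32).
rng = np.random.default_rng(0)
f4_ttl = rng.standard_normal((1, 50, 256)).astype(np.float32)
m3_ttl = rng.standard_normal((1, 50, 256)).astype(np.float32)

# F4 60 % + M3 40 % corresponds to moving 40 % of the way from F4 towards M3.
blend_ttl = slerp(f4_ttl, m3_ttl, 0.4)
```

Under this assumption a 60 % / 40 % recipe such as voix_sombre maps to t = 0.4, and the same call would apply to the 1×8×16 style_dp tensor. A five-way equal-weight blend such as homme_moyen needs a different scheme (for example a normalised average), which this sketch does not cover.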

2026-05-20 12:48:05 +02:00

transcrilive

12dbf4a821

v0.1.0 — initial release

MLX-native port of Supertone's Supertonic 3 multilingual TTS. Runs the
full flow-matching + classifier-free-guidance pipeline at ~100x realtime
on Apple Silicon, with audio cosine similarity 1.0 vs the cached MLX path
and 0.98 vs the upstream ONNX Runtime reference.
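
The message does not say how the audio cosine was measured (raw samples or features, with or without alignment). As one possible reading, here is a minimal sketch that compares two waveforms sample by sample after truncating them to a common length; the `audio_cosine` helper and the synthetic test signal are hypothetical.

```python
import numpy as np


def audio_cosine(x, y):
    """Cosine similarity between two 1-D waveforms, truncated to the shorter length."""
    n = min(len(x), len(y))
    x = np.asarray(x[:n], dtype=np.float64)
    y = np.asarray(y[:n], dtype=np.float64)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        # At least one signal is silent; similarity is undefined, report 0.
        return 0.0
    return float(np.dot(x, y) / denom)


# Synthetic stand-in for a generated waveform: one second of a 220 Hz tone.
t = np.linspace(0.0, 1.0, 44100, endpoint=False)
reference = np.sin(2.0 * np.pi * 220.0 * t)
same_score = audio_cosine(reference, reference)
flipped_score = audio_cosine(reference, -reference)
```

A raw-sample cosine like this is sensitive to small timing offsets, so a score below 1.0 between two runtimes does not by itself indicate an audible difference.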

Weights are hosted at https://huggingface.co/ambassadia/supertonic-3-mlx
and auto-downloaded on first use; this repository ships the port code,
the model card, audio samples, and a zero-config setup_and_test.sh.

Install:
    pip install git+https://gitea.tavportal.com/olivier/supertonic-3-mlx.git

Quick test:
    git clone https://gitea.tavportal.com/olivier/supertonic-3-mlx.git
    cd supertonic-3-mlx && ./setup_and_test.sh

Licenses (dual): model weights = BigScience Open RAIL-M (Section 4
propagation), port code = Apache-2.0. See LICENSE, LICENSE-CODE, NOTICE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-20 09:17:05 +02:00

2 Commits