v0.1.0 — initial release
MLX-native port of Supertone's Supertonic 3 multilingual TTS. Runs the full flow-matching + classifier-free-guidance pipeline at ~100x realtime on Apple Silicon, with an audio cosine similarity of 1.0 against the cached MLX path and 0.98 against the upstream ONNX Runtime reference.

Weights are hosted at https://huggingface.co/ambassadia/supertonic-3-mlx and auto-downloaded on first use; this repository ships the port code, the model card, audio samples, and a zero-config setup_and_test.sh.

Install:

    pip install git+https://gitea.tavportal.com/olivier/supertonic-3-mlx.git

Quick test:

    git clone https://gitea.tavportal.com/olivier/supertonic-3-mlx.git
    cd supertonic-3-mlx && ./setup_and_test.sh

Licenses (dual): model weights = BigScience Open RAIL-M (Section 4 propagation), port code = Apache-2.0. See LICENSE, LICENSE-CODE, NOTICE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
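For readers unfamiliar with the sampling scheme named in the commit message, the following is a minimal NumPy sketch of a flow-matching Euler loop with classifier-free guidance. It is not the port's code: the toy velocity function, the function names, the tensor shapes, and the guidance scale of 2.0 are all assumptions made for illustration. The only detail taken from this commit is the step count of five, which the NOTICE file cites, along with the idea that cross-attention keys and values depend on the text alone and can therefore be computed once and reused at every step.

```python
import numpy as np


def project_kv(text_emb, w_k, w_v):
    """Cross-attention keys/values depend only on the text, so compute them once."""
    return text_emb @ w_k, text_emb @ w_v


def velocity(x, t, kv):
    """Toy stand-in for the velocity network (hypothetical, ignores t)."""
    k, v = kv
    scores = x @ k.T / np.sqrt(k.shape[-1])        # (T, S) attention logits
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v - x                            # drift toward attended values


def sample(x0, kv_cond, kv_uncond, steps=5, guidance=2.0):
    """Euler integration from t=0 to t=1 with classifier-free guidance."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_c = velocity(x, t, kv_cond)      # conditioned on the text
        v_u = velocity(x, t, kv_uncond)    # unconditional branch
        v = v_u + guidance * (v_c - v_u)   # classifier-free guidance mix
        x = x + dt * v                     # one Euler step
    return x


rng = np.random.default_rng(0)
D, S, T = 8, 6, 10                         # feature dim, text length, latent length
w_k = rng.standard_normal((D, D))
w_v = rng.standard_normal((D, D))
text_emb = rng.standard_normal((S, D))

# Both K/V pairs are built once, outside the loop, and reused at every step.
kv_cond = project_kv(text_emb, w_k, w_v)
kv_uncond = project_kv(np.zeros((S, D)), w_k, w_v)

x0 = rng.standard_normal((T, D))           # initial noise
out = sample(x0, kv_cond, kv_uncond)
```

With `guidance=1.0` the mix reduces to the conditional velocity, and with `guidance=0.0` to the unconditional one, which gives two easy sanity checks for any implementation of this loop.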
NOTICE (new file, 39 lines)
@@ -0,0 +1,39 @@
supertonic-3-mlx
================

This release is a derivative of the upstream Supertone Supertonic 3
text-to-speech model and consists of two artefact classes governed by
two different licenses:

1. The model weights (under ./weights/*.safetensors) are released under
   the BigScience Open RAIL-M License. The full text is in ./LICENSE and
   was copied verbatim from
   https://huggingface.co/Supertone/supertonic-3/blob/main/LICENSE
   The Attachment A use restrictions (Section 5 + Attachment A clauses
   (a)–(m)) apply to all downstream use of the model and of any output
   generated by the model.

2. The MLX port code (under ./src/supertonic_3_mlx/) is released under
   the Apache License, Version 2.0. The full text is in ./LICENSE-CODE.

Attribution and modifications statement (BigScience Open RAIL-M Section 4.c):

Copyright (c) 2026 Supertone Inc. - original model weights and reference
Python/ONNX implementation. Distributed at
https://huggingface.co/Supertone/supertonic-3
Copyright (c) 2026 Olivier Dupont - MLX-native port code, weight format
conversion (ONNX → safetensors via the 3-stage extractor in
``src/supertonic_3_mlx/pipeline.py:_convert_onnx``), and pipeline
optimisations (``mx.compile`` of the CFG Euler loop, cross-attention
K/V cache shared across the 5 Euler steps). Distributed at
https://huggingface.co/ambassadia/supertonic-3-mlx

The MLX port does not modify the model's learned parameters in any
semantic sense; the only weight-level transformation is a tensor-shape
re-layout to match the MLX memory model (e.g. depthwise Conv1d
``(C, 1, K)`` → ``(C, K, 1)``). Audio output matches the upstream ONNX
Runtime reference up to FP32 accumulation noise (cosine ≥ 0.98 on the
full pipeline, cosine = 1.00 on the vocoder).

No use of the Supertone trademarks, logos, or trade dress is asserted or
permitted by this release (BigScience Open RAIL-M Section 8).
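The depthwise Conv1d re-layout and the cosine figures quoted in the NOTICE can be pictured with a short NumPy sketch. It is an illustration only, not the port's extractor: the helper names are hypothetical, the tensors are random, and the two convolution routines are plain reference loops written for this example. It assumes the usual channels-first kernel layout `(C, 1, K)` on the ONNX side and a channels-last layout `(C, K, 1)` on the MLX side, and checks that moving the axes changes only where values sit, not what the convolution computes.

```python
import numpy as np


def relayout_depthwise(w_onnx):
    """Move a depthwise Conv1d kernel from (C, 1, K) to (C, K, 1)."""
    assert w_onnx.ndim == 3 and w_onnx.shape[1] == 1
    return np.transpose(w_onnx, (0, 2, 1))


def cosine(a, b):
    """Cosine similarity between two arrays, flattened to vectors."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def depthwise_conv_channels_first(x, w):
    """x: (C, T), w: (C, 1, K) -> (C, T-K+1); one correlation per channel."""
    return np.stack(
        [np.correlate(x[c], w[c, 0], mode="valid") for c in range(w.shape[0])]
    )


def depthwise_conv_channels_last(x, w):
    """x: (T, C), w: (C, K, 1) -> (T-K+1, C); same arithmetic, other layout."""
    k = w.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=0)  # (T-K+1, C, K)
    return np.einsum("tck,ck->tc", windows, w[:, :, 0])


rng = np.random.default_rng(0)
C, K, T = 4, 3, 16
w_onnx = rng.standard_normal((C, 1, K)).astype(np.float32)
x = rng.standard_normal((C, T)).astype(np.float32)

w_mlx = relayout_depthwise(w_onnx)                       # shape (C, K, 1)
y_ref = depthwise_conv_channels_first(x, w_onnx)         # (C, T-K+1)
y_new = depthwise_conv_channels_last(x.T, w_mlx).T       # back to (C, T-K+1)
similarity = cosine(y_ref, y_new)                        # expected to be ~1
```

The same `cosine` helper applied to two full waveforms is one way to reproduce a pipeline-level comparison like the one reported here, provided both waveforms have the same length and sample rate.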