supertonic-3-mlx

Files

ambassadia d32aaae32d feat: create_voice() — mix presets to synthesise custom voices

The 10 preset voices lie on a hypersphere of radius ≈ 7.1 in the
12 800-D style-token space (verified empirically: pairwise cosines
are 0.86-0.97, and an SVD shows that 7 axes cover 99 % of the variance).
Because the presets sit close together on that sphere, linear (lerp) or
spherical (slerp) interpolation between them stays within the trained
distribution and produces new, intelligible voices.
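To reproduce this geometric check on the real presets, here is a minimal NumPy sketch. It assumes the 10 preset style tensors have already been flattened into a (10, 12800) array (how they are loaded from the model is not shown). It also assumes the SVD figure refers to mean-centred data, PCA-style. The random matrix at the bottom is only a synthetic stand-in, not the real presets, and `style_space_stats` is a hypothetical helper name.

```python
import numpy as np


def style_space_stats(styles: np.ndarray, var_threshold: float = 0.99):
    """Summarise the geometry of a (n_voices, dim) matrix of flattened style vectors."""
    # Euclidean norm of each preset: near-constant norms mean the presets share one sphere
    norms = np.linalg.norm(styles, axis=1)
    # Pairwise cosine similarities, upper triangle only (excludes self-similarity)
    unit = styles / norms[:, None]
    cos = unit @ unit.T
    pairwise = cos[np.triu_indices(len(styles), k=1)]
    # Assumption: PCA-style SVD on mean-centred data; s**2 is proportional to variance per axis
    centred = styles - styles.mean(axis=0)
    s = np.linalg.svd(centred, compute_uv=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of axes whose cumulative explained variance reaches the threshold
    n_axes = min(int(np.searchsorted(explained, var_threshold)) + 1, len(s))
    return norms, float(pairwise.min()), float(pairwise.max()), n_axes


# Synthetic stand-in for 10 presets: a shared direction plus noise, rescaled to radius 7.1
rng = np.random.default_rng(0)
base = rng.standard_normal(12_800)
presets = base + 0.4 * rng.standard_normal((10, 12_800))
presets *= 7.1 / np.linalg.norm(presets, axis=1, keepdims=True)

norms, cos_min, cos_max, n_axes = style_space_stats(presets)
print(norms.round(2), round(cos_min, 2), round(cos_max, 2), n_axes)
```

With only 10 voices the centred matrix has rank at most 9, so the axis count is bounded by 9 regardless of the embedding dimension.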

API:
    voice = pipe.create_voice({'F2': 0.7, 'M1': 0.3})   # slerp by default
    voice = pipe.create_voice({'F2': 0.5, 'M1': 0.5}, interp='lerp')
    wav   = pipe.generate('Bonjour', voice=voice, lang='fr')
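The commit does not show how create_voice() combines the presets internally, so the following is only a hypothetical sketch of the two interpolation modes on raw style vectors. `mix_styles`, `lerp` and `slerp` are illustrative names, not the library's API. Two-voice slerp follows the standard great-circle formula. For three or more voices there is no single closed-form slerp, so this sketch swaps in normalised lerp (nlerp): it takes the weighted sum and rescales it to the weighted mean radius. That may differ from what create_voice() actually does.

```python
import numpy as np


def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Straight-line interpolation; the path cuts through the inside of the sphere."""
    return (1.0 - t) * a + t * b


def slerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-7) -> np.ndarray:
    """Spherical interpolation along the great-circle arc between a and b."""
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if omega < eps:  # nearly parallel vectors: slerp degenerates to lerp
        return lerp(a, b, t)
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)


def mix_styles(presets, weights, interp="slerp"):
    """Hypothetical mixer: presets maps name -> style vector, weights maps name -> weight."""
    names = list(weights)
    w = np.array([weights[n] for n in names], dtype=float)
    w /= w.sum()  # normalise so the weights sum to 1
    vecs = np.stack([presets[n] for n in names])
    if interp == "lerp":
        return w @ vecs
    if interp != "slerp":
        raise ValueError(f"unknown interp: {interp!r}")
    if len(names) == 2:
        # e.g. {'F2': 0.7, 'M1': 0.3} -> slerp(F2, M1, t=0.3)
        return slerp(vecs[0], vecs[1], w[1])
    # Three or more voices: nlerp, rescaled to the weighted mean radius of the inputs
    blend = w @ vecs
    radius = w @ np.linalg.norm(vecs, axis=1)
    return blend * (radius / np.linalg.norm(blend))


# Synthetic stand-in presets on a sphere of radius 7.1 (not the real model's voices)
rng = np.random.default_rng(1)
base = rng.standard_normal(12_800)
presets = {n: base + 0.4 * rng.standard_normal(12_800) for n in ["F2", "M1", "F4", "F5"]}
presets = {n: v * (7.1 / np.linalg.norm(v)) for n, v in presets.items()}

half_lerp = mix_styles(presets, {"F2": 0.5, "M1": 0.5}, interp="lerp")
half_slerp = mix_styles(presets, {"F2": 0.5, "M1": 0.5})
print(np.linalg.norm(half_lerp), np.linalg.norm(half_slerp))
```

The printed norms show why slerp is a sensible default. A 50/50 lerp between two vectors of equal radius r with cosine c has norm r·sqrt((1 + c)/2). For the reported cosines of 0.86-0.97, that shrinks the blend by roughly 0.7 % to 3.6 %, while slerp keeps it on the sphere.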

The voice argument of pipe.generate() now accepts either a preset
name (str) or a custom voice descriptor (the dict returned by create_voice()).
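The str-or-dict dispatch described above could be handled by a small resolver like the one below. This is a hypothetical sketch. `resolve_style` is an illustrative name, and the descriptor layout (the blended vector stored under a `"style"` key) is an assumption, since the commit does not document what the dict returned by create_voice() contains.

```python
from collections.abc import Mapping
from typing import Union

import numpy as np

VoiceArg = Union[str, Mapping]


def resolve_style(voice: VoiceArg, presets: Mapping) -> np.ndarray:
    """Hypothetical dispatcher: preset name -> stored vector, descriptor dict -> its style."""
    if isinstance(voice, str):
        if voice not in presets:
            raise KeyError(f"unknown preset {voice!r}; available: {sorted(presets)}")
        return presets[voice]
    if isinstance(voice, Mapping):
        # Assumed descriptor layout: the blended style vector stored under "style"
        return np.asarray(voice["style"])
    raise TypeError(f"voice must be a preset name or a descriptor dict, got {type(voice).__name__}")
```

Checking `str` before `Mapping` keeps the common preset-name path cheap and gives a clear error for any other type.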

Whisper validation on 6 custom blends (French test phrase):
    F2 70 / M1 30          → 100 % (slightly androgynous female voice)
    F2 50 / M1 50          →  91 % (fully androgynous)
    avg of 5 F voices      → 100 % (mean feminine timbre)
    avg of 5 M voices      →  91 % (mean masculine timbre)
    warm fem (F4+F5)       →  91 %
    bright masc (M1+M5)    → 100 %
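The commit does not say how these percentages are computed. One plausible reading is word accuracy, 1 - WER, between Whisper's transcript and the reference phrase. The sketch below computes that under this assumption. `word_accuracy` and its normalisation rules (lower-casing, dropping punctuation but keeping apostrophes for French elisions) are illustrative choices, not the project's actual scoring code.

```python
import re


def normalise(text: str) -> list[str]:
    """Lower-case, drop punctuation (apostrophes kept for French elisions), split on whitespace."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER, where WER = word-level Levenshtein distance / reference length."""
    ref, hyp = normalise(reference), normalise(hypothesis)
    # prev holds the previous DP row: distance from a reference prefix to each hypothesis prefix
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if the words match)
            )
        prev = curr
    wer = prev[-1] / max(len(ref), 1)
    return max(0.0, 1.0 - wer)
```

For example, `word_accuracy("a b c d", "a x c d")` is 0.75: one substituted word out of four reference words.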

All six blends remain intelligible (91-100 %): the region spanned by the
presets is convex enough in practice that interpolations don't fall out
of the model's distribution.

Example script in examples/custom_voice_demo.py.

2026-05-20 12:25:15 +02:00

custom_voice_demo.py

feat: create_voice() — mix presets to synthesise custom voices

2026-05-20 12:25:15 +02:00

quickstart.py

v0.1.0 — initial release

2026-05-20 09:17:05 +02:00

streaming_demo.py

feat: streaming generate_stream() with sub-100ms TTFB

2026-05-20 12:23:17 +02:00