transcrilive 2b1a3c1312 feat: initial public release v0.1.0 — MLX port of pyannote-speaker-diarization-3.1
Byte-parity with pyannote-PyTorch reference (cosine 0.763718 identical
at 6 decimals on 200 cross-window slot pairs). 2.5x faster than
pyannote-MPS on Apple Silicon native.

Extracted from gitea.tavportal.com/olivier/MLX_CONVERTOR commit 5f9eafa.
2026-05-09 16:05:39 +02:00

pyannote-speaker-diarization-3.1-mlx

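First MLX port of pyannote-speaker-diarization-3.1, with numerical parity against the PyTorch reference (mean cosine distance identical to 6 decimals) and about 2.5x faster than pyannote-MPS (pyannote running on the PyTorch MPS backend) on Apple Silicon.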

Install

uv add "pyannote-speaker-diarization-3.1-mlx @ git+https://gitea.tavportal.com/olivier/pyannote-speaker-diarization-3.1-mlx.git"

Quickstart

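from pyannote_diarization_3_1_mlx import MlxDiarizationPipeline

# Load the pipeline by its pyannote model identifier.
pipeline = MlxDiarizationPipeline.from_pretrained("pyannote/speaker-diarization-3.1")
# Run diarization on a local audio file.
diarization = pipeline("audio.wav")

# Each track yields a time segment, a track identifier (unused here), and a speaker label.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s {speaker}")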
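Once the turns are available, a common next step is to total the speaking time per speaker. The sketch below assumes the loop above has been used to collect plain `(start, end, speaker)` tuples; the `talk_time` helper and the sample values are hypothetical and are not part of this package.

```python
from collections import defaultdict


def talk_time(tracks):
    """Sum speaking time in seconds per speaker label.

    `tracks` is an iterable of (start, end, speaker) tuples, a shape
    assumed here for illustration.
    """
    totals = defaultdict(float)
    for start, end, speaker in tracks:
        totals[speaker] += end - start
    return dict(totals)


# Illustrative values only, not output from the pipeline.
tracks = [
    (0.0, 2.5, "SPEAKER_00"),
    (2.5, 4.0, "SPEAKER_01"),
    (4.0, 5.5, "SPEAKER_00"),
]

for speaker, seconds in sorted(talk_time(tracks).items()):
    print(f"{speaker}: {seconds:.1f}s")
```

Overlapping turns are counted once per speaker, so the totals can add up to more than the audio duration.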

Parity

| Evidence | MLX | Reference | Result |
|---|---|---|---|
| Cosine distance (200 cross-window pairs) | mean=0.763718 | pyannote-PyTorch mean=0.763718 | identical at 6 decimals |
| 5h10 bench | 173s / 44 speakers / 1.27 GB | pyannote-MPS 431s / 43 speakers / 1.72 GB | Cross-DER 0.076 |
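The parity figures can be read as follows: a mean cosine distance is computed over pairs of embedding vectors for each implementation, and the two means are compared after rounding to 6 decimals. The sketch below illustrates that comparison under stated assumptions; how the 200 pairs were selected is not described here, and the helper names and toy vectors are hypothetical. The final lines recompute the speedup from the benchmark timings above.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_pairwise_distance(pairs):
    """Average the cosine distance over a list of (u, v) vector pairs."""
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def same_at_decimals(x, y, decimals=6):
    """Check whether two values agree after rounding to `decimals` places."""
    return round(x, decimals) == round(y, decimals)


# Toy vectors for illustration, not real speaker embeddings.
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal vectors, distance 1
    ([1.0, 0.0], [1.0, 0.0]),  # identical vectors, distance 0
]
print(mean_pairwise_distance(pairs))

# Speedup from the benchmark timings: 431 s (pyannote-MPS) vs 173 s (MLX).
speedup = 431 / 173
print(f"{speedup:.2f}x")
```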

Architecture

Segmentation: SincNet → BiLSTM → Powerset(3,2) head. Speaker embedding: WeSpeaker ResNet34. Clustering: AgglomerativeClustering wrapper.
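To make the Powerset(3,2) head concrete: assuming the two numbers mean up to 3 speakers per window with at most 2 active at once, as in the powerset formulation of the paper cited below, the head classifies each frame into one of 7 speaker subsets. The sketch enumerates them; the function name and the class ordering are illustrative and may not match the ordering used in the model.

```python
from itertools import combinations


def powerset_classes(num_speakers=3, max_simultaneous=2):
    """List every speaker subset with at most `max_simultaneous` members."""
    classes = []
    for size in range(max_simultaneous + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


classes = powerset_classes()
for index, subset in enumerate(classes):
    label = "non-speech" if not subset else "+".join(f"spk{i}" for i in subset)
    print(index, label)
```

The count is 1 (non-speech) + 3 (single speakers) + 3 (speaker pairs) = 7 classes.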

Module Naming

The repository name is pyannote-speaker-diarization-3.1-mlx; the Python import is pyannote_diarization_3_1_mlx. The import name follows PEP 8 and embeds the pyannote model version so future 4.0 ports can co-install.
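Because each port carries its model version in the import name, code can check which ports are installed without importing them. This is a minimal sketch using the standard library; `installed_ports` is a hypothetical helper, and `pyannote_diarization_4_0_mlx` is a made-up name for a future port that does not exist today.

```python
from importlib.util import find_spec


def installed_ports(candidates):
    """Return the top-level module names from `candidates` that can be imported."""
    return [name for name in candidates if find_spec(name) is not None]


candidates = [
    "pyannote_diarization_3_1_mlx",
    "pyannote_diarization_4_0_mlx",  # hypothetical future port
]
print(installed_ports(candidates))
```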

Citation

This project ports the pyannote speaker diarization 3.1 pipeline architecture to MLX. Please cite the original pyannote.audio work when using this package:

@inproceedings{Plaquet23,
  author = {Alexis Plaquet and Hervé Bredin},
  title = {{Powerset multi-class cross entropy loss for neural speaker diarization}},
  booktitle = {Proc. INTERSPEECH 2023},
  year = {2023},
}

Provenance

Extracted from MLX_CONVERTOR/src/mlxconv/diar at commit 5f9eafa. Maintained at https://gitea.tavportal.com/olivier/pyannote-speaker-diarization-3.1-mlx.

License

MIT