# pyannote-speaker-diarization-3.1-mlx

First MLX port of pyannote-speaker-diarization-3.1 with byte-parity to the PyTorch reference. About 2.5x faster than pyannote-MPS, running natively on Apple Silicon.

## Install

```bash
uv add "pyannote-speaker-diarization-3.1-mlx @ git+https://gitea.tavportal.com/olivier/pyannote-speaker-diarization-3.1-mlx.git"
```

## Quickstart

```python
from pyannote_diarization_3_1_mlx import MlxDiarizationPipeline

pipeline = MlxDiarizationPipeline.from_pretrained("pyannote/speaker-diarization-3.1")
diarization = pipeline("audio.wav")

# Print one line per speaker turn: start time, end time, speaker label.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s {speaker}")
```

## Parity

| Evidence | MLX | Reference | Result |
| --- | --- | --- | --- |
| Cosine distance (200 cross-window pairs) | mean=0.763718 | pyannote-PyTorch mean=0.763718 | identical at 6 decimals |
| 5h10 bench | 173s / 44 speakers / 1.27 GB | pyannote-MPS 431s / 43 speakers / 1.72 GB | Cross-DER 0.076 |

## Architecture

SincNet → BiLSTM → Powerset(3,2) head + WeSpeaker ResNet34 speaker embedding + AgglomerativeClustering wrapper.

## Module Naming

The repository name is `pyannote-speaker-diarization-3.1-mlx`; the Python import is `pyannote_diarization_3_1_mlx`. The import name follows PEP 8 and embeds the pyannote model version so that future 4.0 ports can be installed alongside it.

## Citation

This project ports the pyannote speaker diarization 3.1 pipeline architecture to MLX. Please cite the original pyannote.audio work when using this package:

```bibtex
@inproceedings{Plaquet23,
  author = {Alexis Plaquet and Hervé Bredin},
  title = {{Powerset multi-class cross entropy loss for neural speaker diarization}},
  booktitle = {Proc. INTERSPEECH 2023},
  year = {2023},
}
```

## Provenance

Extracted from MLX_CONVERTOR/src/mlxconv/diar at commit 5f9eafa. Maintained at https://gitea.tavportal.com/olivier/pyannote-speaker-diarization-3.1-mlx.

## License

MIT
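
## Example: Powerset(3,2) classes

The Powerset(3,2) head named in the Architecture section predicts, for each frame, one class per subset of at most 2 active speakers out of 3 local speakers, which gives 7 classes. The sketch below enumerates those classes and expands a class index into per-speaker activity flags. It is illustrative only: `powerset_classes` and `to_multilabel` are hypothetical helper names, not part of this package's API, and the class ordering (by subset size, then lexicographic) is an assumption.

```python
from itertools import combinations


def powerset_classes(num_speakers: int, max_simultaneous: int) -> list[tuple[int, ...]]:
    """List speaker subsets of size 0..max_simultaneous, smallest subsets first."""
    classes = []
    for size in range(max_simultaneous + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def to_multilabel(class_index: int, num_speakers: int, classes) -> list[int]:
    """Expand one powerset class index into a 0/1 activity flag per speaker."""
    active = classes[class_index]
    return [1 if speaker in active else 0 for speaker in range(num_speakers)]


# Assumed configuration: 3 local speakers, at most 2 speaking at once.
CLASSES = powerset_classes(3, 2)
print(len(CLASSES))                  # 7
print(to_multilabel(4, 3, CLASSES))  # [1, 1, 0]
```

Class 0 is the empty subset (no speech), classes 1 to 3 are single speakers, and classes 4 to 6 are overlapping pairs.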