feat: initial public release v0.1.0 — MLX port of pyannote-speaker-diarization-3.1
Byte-parity with pyannote-PyTorch reference (cosine 0.763718 identical at 6 decimals on 200 cross-window slot pairs). 2.5x faster than pyannote-MPS on Apple Silicon native. Extracted from gitea.tavportal.com/olivier/MLX_CONVERTOR commit 5f9eafa.
This commit is contained in:
57
README.md
Normal file
57
README.md
Normal file
@@ -0,0 +1,57 @@
|
||||
# pyannote-speaker-diarization-3.1-mlx
|
||||
|
||||
First MLX port of pyannote-speaker-diarization-3.1 with byte-parity to the PyTorch reference. 2.5x faster than pyannote-MPS on Apple Silicon native.
|
||||
|
||||
## Install
|
||||
|
||||
```bash
|
||||
uv add "pyannote-speaker-diarization-3.1-mlx @ git+https://gitea.tavportal.com/olivier/pyannote-speaker-diarization-3.1-mlx.git"
|
||||
```
|
||||
|
||||
## Quickstart
|
||||
|
||||
```python
|
||||
from pyannote_diarization_3_1_mlx import MlxDiarizationPipeline
|
||||
|
||||
pipeline = MlxDiarizationPipeline.from_pretrained("pyannote/speaker-diarization-3.1")
|
||||
diarization = pipeline("audio.wav")
|
||||
|
||||
for turn, _, speaker in diarization.itertracks(yield_label=True):
|
||||
print(f"{turn.start:.1f}s - {turn.end:.1f}s {speaker}")
|
||||
```
|
||||
|
||||
## Parity
|
||||
|
||||
| Evidence | MLX | Reference | Result |
|
||||
| --- | --- | --- | --- |
|
||||
| Cosine distance (200 cross-window pairs) | mean=0.763718 | pyannote-PyTorch mean=0.763718 | identical at 6 decimals |
|
||||
| 5h10 bench | 173s / 44 speakers / 1.27 GB | pyannote-MPS 431s / 43 speakers / 1.72 GB | Cross-DER 0.076 |
|
||||
|
||||
## Architecture
|
||||
|
||||
SincNet → BiLSTM → Powerset(3,2) head + WeSpeaker ResNet34 speaker embedding + AgglomerativeClustering wrapper.
|
||||
|
||||
## Module Naming
|
||||
|
||||
The repository name is `pyannote-speaker-diarization-3.1-mlx`; the Python import is `pyannote_diarization_3_1_mlx`. The import name follows PEP 8 and embeds the pyannote model version so future 4.0 ports can co-install.
|
||||
|
||||
## Citation
|
||||
|
||||
This project ports the pyannote speaker diarization 3.1 pipeline architecture to MLX. Please cite the original pyannote.audio work when using this package:
|
||||
|
||||
```bibtex
|
||||
@inproceedings{Plaquet23,
|
||||
author = {Alexis Plaquet and Hervé Bredin},
|
||||
title = {{Powerset multi-class cross entropy loss for neural speaker diarization}},
|
||||
booktitle = {Proc. INTERSPEECH 2023},
|
||||
year = {2023},
|
||||
}
|
||||
```
|
||||
|
||||
## Provenance
|
||||
|
||||
Extracted from MLX_CONVERTOR/src/mlxconv/diar at commit 5f9eafa. Maintained at https://gitea.tavportal.com/olivier/pyannote-speaker-diarization-3.1-mlx.
|
||||
|
||||
## License
|
||||
|
||||
MIT
|
||||
Reference in New Issue
Block a user