Initial Granite Speech Plus MLX package

transcrilive
2026-05-09 20:00:57 +02:00
commit c6a20cb79f
21 changed files with 2002 additions and 0 deletions

docs/prompt-modes.md Normal file

@@ -0,0 +1,37 @@
# Prompt Modes
This package exposes three prompt modes for Granite Speech Plus: `asr`, `saa`, and `ts`.
## `asr`
Standard speech transcription.
```python
from granite_speech_plus_mlx import GraniteSpeechPlusPipeline
pipe = GraniteSpeechPlusPipeline.from_pretrained()
text = pipe.transcribe("audio.wav", prompt_mode="asr")
```
## `saa`
Speaker-attributed ASR. The prompt asks the model to add speaker turn labels
such as `[Speaker 1]:` and `[Speaker 2]:`.
```python
# Reuses the `pipe` object created in the `asr` example.
text = pipe.transcribe("meeting.wav", prompt_mode="saa")
```
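The returned string can be split into per-speaker turns afterwards. The following is a minimal sketch, not part of this package: `split_speaker_turns` is a hypothetical helper, and it assumes the model emits labels exactly in the `[Speaker N]:` form shown above.
```python
import re

# Assumption: labels look exactly like "[Speaker 1]:", as in the example above.
SPEAKER_LABEL = re.compile(r"\[Speaker (\d+)\]:")


def split_speaker_turns(text):
    """Split an saa transcript into (speaker_number, utterance) pairs."""
    # With one capture group, re.split returns
    # [preamble, id1, utterance1, id2, utterance2, ...].
    parts = SPEAKER_LABEL.split(text)
    turns = []
    for i in range(1, len(parts) - 1, 2):
        utterance = parts[i + 1].strip()
        if utterance:  # skip labels that are not followed by any text
            turns.append((int(parts[i]), utterance))
    return turns
```
Text that appears before the first label is dropped, so a transcript without any labels yields an empty list. Callers may prefer to fall back to the raw string in that case.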
## `ts`
Word-level timestamps. The prompt asks the model to append centisecond tags
after words, for example `hello [T:45] world [T:82]`, where `[T:45]` means
0.45 s.
```python
# Reuses the `pipe` object created in the `asr` example.
text = pipe.transcribe("clip.wav", prompt_mode="ts")
```
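To work with the tags programmatically, the output can be parsed into word and time pairs. This is a rough sketch under stated assumptions: `parse_timestamps` is a hypothetical helper rather than a function of this package, and it assumes every tag follows the `[T:<centiseconds>]` form from the example. Whether a tag marks the start or the end of its word is not stated here, so the sketch only converts units.
```python
import re

# Assumption: a tag is "[T:<integer centiseconds>]" placed after its word.
# The word pattern excludes brackets so that a tag is never read as a word.
WORD_WITH_TAG = re.compile(r"([^\s\[\]]+)\s*\[T:(\d+)\]")


def parse_timestamps(text):
    """Return (word, seconds) pairs from a ts-mode transcript."""
    return [(word, int(cs) / 100.0) for word, cs in WORD_WITH_TAG.findall(text)]
```
Words that carry no tag are skipped by this pattern, which may or may not be what a caller wants.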
For long audio, the pipeline chunks the waveform and feeds a short previous
transcript prefix into later chunks for continuity. The prefix is context only;
the model is instructed not to repeat it.
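The chunk-and-prefix idea can be sketched roughly as follows. This is an illustration only, not the pipeline's actual code: `transcribe_long`, the `transcribe_chunk` callable, fixed-size non-overlapping chunks, and a word-count-based prefix are all assumptions, since the real chunk length and prefix size are not described here.
```python
def transcribe_long(samples, chunk_size, transcribe_chunk, prefix_words=20):
    """Transcribe a long waveform chunk by chunk, carrying a short text prefix.

    `transcribe_chunk(chunk, prefix)` is a hypothetical callable that returns
    the transcript of `chunk` only; `prefix` is passed purely as context.
    """
    pieces = []
    prefix = ""
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        text = transcribe_chunk(chunk, prefix).strip()
        if text:
            pieces.append(text)
        # Keep only the last few words of the transcript so far as context
        # for the next chunk.
        words = " ".join(pieces).split()
        prefix = " ".join(words[-prefix_words:])
    return " ".join(pieces)
```
Because the prefix is context only, the sketch never adds it to the output; it relies on the per-chunk transcriber not repeating it.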