Initial Granite Speech Plus MLX package

2026-05-09 20:00:57 +02:00
commit c6a20cb79f
21 changed files with 2002 additions and 0 deletions
--- a/docs/prompt-modes.md
+++ b/docs/prompt-modes.md
@@ -0,0 +1,37 @@
+# Prompt Modes
+
+This package supports three Granite Speech Plus prompt modes, selected with
+the `prompt_mode` argument of `transcribe()`.
+
+## `asr`
+
+Standard speech transcription.
+
+```python
+from granite_speech_plus_mlx import GraniteSpeechPlusPipeline
+
+pipe = GraniteSpeechPlusPipeline.from_pretrained()
+text = pipe.transcribe("audio.wav", prompt_mode="asr")
+```
+
+## `saa`
+
+Speaker-attributed ASR. The prompt asks the model to add speaker turn labels
+such as `[Speaker 1]:` and `[Speaker 2]:`.
+
+```python
+# Reuses `pipe` from the `asr` example; later examples do the same.
+text = pipe.transcribe("meeting.wav", prompt_mode="saa")
+```
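+
+If you need the turns as structured data, you can post-process the output.
+A minimal sketch, assuming the model emits labels exactly in the
+`[Speaker N]:` form shown above. `split_speaker_turns` is a hypothetical
+helper, not part of this package, and any text before the first label is
+dropped.
+
+```python
+import re
+
+# Matches a label such as "[Speaker 1]:" and captures the speaker number.
+# Assumption: the model emits labels exactly in this "[Speaker N]:" form.
+SPEAKER_LABEL = re.compile(r"\[Speaker (\d+)\]:")
+
+
+def split_speaker_turns(text: str) -> list[tuple[int, str]]:
+    """Split speaker-attributed text into (speaker_number, utterance) pairs."""
+    turns = []
+    matches = list(SPEAKER_LABEL.finditer(text))
+    for i, match in enumerate(matches):
+        # Each turn runs from the end of its label to the start of the next label.
+        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
+        utterance = text[match.end():end].strip()
+        turns.append((int(match.group(1)), utterance))
+    return turns
+
+
+example = "[Speaker 1]: Shall we start? [Speaker 2]: Yes, go ahead."
+print(split_speaker_turns(example))
+# [(1, 'Shall we start?'), (2, 'Yes, go ahead.')]
+```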
+
+## `ts`
+
+Word-level timestamps. The prompt asks the model to append a centisecond tag
+after each word, for example `hello [T:45] world [T:82]`, where `[T:45]`
+means 0.45 s.
+
+```python
+text = pipe.transcribe("clip.wav", prompt_mode="ts")
+```
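+
+The tags can be converted into numeric times. A minimal sketch, assuming each
+tagged word is a single whitespace-free token followed by exactly one
+`[T:<integer>]` tag. `parse_word_timestamps` is a hypothetical helper; words
+without a tag are skipped, and the sketch only converts centiseconds to
+seconds because this page does not say whether a tag marks the start or the
+end of its word.
+
+```python
+import re
+
+# Matches a word followed by a centisecond tag, e.g. "hello [T:45]".
+# Assumption: every timestamped word is a single whitespace-free token
+# immediately followed by one "[T:<integer>]" tag.
+WORD_TAG = re.compile(r"(\S+)\s*\[T:(\d+)\]")
+
+
+def parse_word_timestamps(text: str) -> list[tuple[str, float]]:
+    """Return (word, seconds) pairs; tags are centiseconds, so divide by 100."""
+    return [(word, int(cs) / 100) for word, cs in WORD_TAG.findall(text)]
+
+
+print(parse_word_timestamps("hello [T:45] world [T:82]"))
+# [('hello', 0.45), ('world', 0.82)]
+```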
+
+For long audio, the pipeline splits the waveform into chunks and, for every
+chunk after the first, adds a short excerpt of the previous transcript to the
+prompt for continuity. This excerpt is context only; the model is instructed
+not to repeat it.
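+
+The chunking loop can be sketched as follows. This is an illustration under
+stated assumptions, not this package's implementation: `transcribe_long`, the
+`transcribe_chunk` callback, the 30 s chunk length, and the 20-word prefix are
+all hypothetical, and the prefix is assumed to be the last few words
+transcribed so far.
+
+```python
+from typing import Callable, Sequence
+
+# Hypothetical callback: (chunk_samples, context_prefix) -> transcript text.
+# In a real pipeline this would prompt the model in the chosen mode.
+TranscribeFn = Callable[[Sequence[float], str], str]
+
+
+def transcribe_long(
+    waveform: Sequence[float],
+    sample_rate: int,
+    transcribe_chunk: TranscribeFn,
+    chunk_seconds: float = 30.0,  # assumed chunk length, not the package default
+    prefix_words: int = 20,       # assumed size of the context prefix
+) -> str:
+    """Transcribe audio chunk by chunk, passing recent text as context only."""
+    chunk_len = int(chunk_seconds * sample_rate)
+    if chunk_len <= 0:
+        raise ValueError("chunk_seconds * sample_rate must be at least 1 sample")
+    pieces: list[str] = []
+    for start in range(0, len(waveform), chunk_len):
+        chunk = waveform[start:start + chunk_len]
+        # Context: the last `prefix_words` words so far (empty for chunk 1).
+        so_far = " ".join(pieces).split()
+        prefix = " ".join(so_far[max(0, len(so_far) - prefix_words):])
+        # The prefix is passed in but never appended to the output; only the
+        # chunk's own transcript is kept.
+        pieces.append(transcribe_chunk(chunk, prefix).strip())
+    return " ".join(p for p in pieces if p)
+
+
+# Demo with a stand-in transcriber that records the prefixes it receives.
+seen_prefixes = []
+
+
+def fake_transcribe(chunk, prefix):
+    seen_prefixes.append(prefix)
+    return f"chunk of {len(chunk)} samples"
+
+
+print(transcribe_long([0.0] * 25, sample_rate=10, transcribe_chunk=fake_transcribe,
+                      chunk_seconds=1.0, prefix_words=2))
+# chunk of 10 samples chunk of 10 samples chunk of 5 samples
+```
+
+Keeping the prefix out of the returned text mirrors the context-only rule
+above: each chunk contributes only its own new transcript.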
+