granite-speech-4.1-2b-plus-mlx/docs/prompt-modes.md

# Prompt Modes

This package exposes three prompt modes for Granite Speech Plus: `asr`, `saa`,
and `ts`. Select one with the `prompt_mode` argument of `transcribe`.

## `asr`

Standard speech transcription.

```python
from granite_speech_plus_mlx import GraniteSpeechPlusPipeline

pipe = GraniteSpeechPlusPipeline.from_pretrained()
text = pipe.transcribe("audio.wav", prompt_mode="asr")
```

## `saa`

Speaker-attributed ASR. The prompt asks the model to add speaker turn labels
such as `[Speaker 1]:` and `[Speaker 2]:`.

```python
# Reuses the `pipe` object created in the `asr` example above.
text = pipe.transcribe("meeting.wav", prompt_mode="saa")
```
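
To work with the turns individually, the labelled transcript can be split on its speaker labels. The following is a minimal sketch, assuming the output follows the `[Speaker N]:` format shown above exactly; `split_speaker_turns` is a hypothetical helper written for illustration and is not part of this package.

```python
import re

# Assumed label format, taken from the example above: "[Speaker 1]:"
SPEAKER_LABEL = re.compile(r"\[Speaker (\d+)\]:")


def split_speaker_turns(transcript: str) -> list[tuple[int, str]]:
    """Split a speaker-attributed transcript into (speaker_id, text) turns."""
    matches = list(SPEAKER_LABEL.finditer(transcript))
    turns = []
    for i, match in enumerate(matches):
        # A turn runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcript)
        turns.append((int(match.group(1)), transcript[match.end():end].strip()))
    return turns


# Hand-written sample string, not real model output.
sample = "[Speaker 1]: Good morning. [Speaker 2]: Hi, shall we start?"
turns = split_speaker_turns(sample)
```

Any text that precedes the first label is dropped by this sketch; keep it separately if your recordings can start before a label is emitted.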

## `ts`

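Word-level timestamps. The prompt asks the model to append a timestamp tag
after each word, expressed in centiseconds (hundredths of a second). For
example, in `hello [T:45] world [T:82]` the tags correspond to 0.45 s and
0.82 s.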

```python
# Reuses the `pipe` object created in the `asr` example above.
text = pipe.transcribe("clip.wav", prompt_mode="ts")
```
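
The tagged string can be converted into word and time pairs with a small regular expression. This is a sketch that assumes every tag has the form `[T:<integer>]` and directly follows the word it belongs to, as in the example above; `parse_word_timestamps` is a hypothetical helper, not part of this package.

```python
import re

# Assumed tag format, taken from the example above: a word, then "[T:<centiseconds>]".
WORD_WITH_TAG = re.compile(r"(\S+)\s*\[T:(\d+)\]")


def parse_word_timestamps(transcript: str) -> list[tuple[str, float]]:
    """Return (word, seconds) pairs from a transcript with centisecond tags."""
    return [
        (word, int(centiseconds) / 100.0)  # centiseconds -> seconds
        for word, centiseconds in WORD_WITH_TAG.findall(transcript)
    ]


words = parse_word_timestamps("hello [T:45] world [T:82]")
```

Words that carry no tag are skipped by this pattern, so the result may be shorter than the transcript if the model omits a tag.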

For long audio, the pipeline splits the waveform into chunks and feeds a short
prefix taken from the previous transcript into each later chunk for continuity.
The prefix is context only; the model is instructed not to repeat it.
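
The chunk-and-prefix idea can be sketched as follows. This is an illustration only, not the pipeline's actual implementation: the fixed chunk length, the choice of the last few transcript words as the prefix, and the names `transcribe_long`, `transcribe_chunk`, `chunk_size`, and `prefix_words` are all assumptions made for the example.

```python
from typing import Callable, Sequence


def transcribe_long(
    samples: Sequence[float],
    transcribe_chunk: Callable[[Sequence[float], str], str],  # hypothetical stand-in for the model call
    chunk_size: int,        # samples per chunk (assumes fixed-length chunking)
    prefix_words: int = 8,  # assumed number of trailing words carried forward
) -> str:
    """Transcribe chunk by chunk, passing recent transcript text as context."""
    pieces: list[str] = []
    prefix = ""  # the first chunk has no previous transcript
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        text = transcribe_chunk(chunk, prefix).strip()
        if text:
            pieces.append(text)
            # Context for the next chunk: the last few words transcribed so far.
            prefix = " ".join(" ".join(pieces).split()[-prefix_words:])
    return " ".join(pieces)


# Demonstration with a fake transcriber that records the prefixes it receives.
seen_prefixes: list[str] = []


def fake_transcribe(chunk: Sequence[float], prefix: str) -> str:
    seen_prefixes.append(prefix)
    return f"chunk of {len(chunk)} samples"


result = transcribe_long([0.0] * 10, fake_transcribe, chunk_size=4, prefix_words=2)
```

The prefix is only passed in as context and is never appended to the output, which mirrors the "context only" behaviour described above.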