Initial Granite Speech Plus MLX package
37
docs/prompt-modes.md
Normal file
@@ -0,0 +1,37 @@
# Prompt Modes

Granite Speech Plus supports three prompt modes in this package: `asr`, `saa`, and `ts`.

## `asr`

Standard speech transcription.

```python
from granite_speech_plus_mlx import GraniteSpeechPlusPipeline

pipe = GraniteSpeechPlusPipeline.from_pretrained()
text = pipe.transcribe("audio.wav", prompt_mode="asr")
```

## `saa`

Speaker-attributed ASR. The prompt asks the model to add speaker turn labels
such as `[Speaker 1]:` and `[Speaker 2]:`.

```python
text = pipe.transcribe("meeting.wav", prompt_mode="saa")
```
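
The labelled output can be split into turns with a regular expression. The following is a minimal sketch, assuming the labels appear exactly as in the `[Speaker 1]:` example above; `split_speaker_turns` and the sample string are hypothetical and are not part of the package.

```python
import re

# Assumed label shape, taken from the `[Speaker 1]:` example in this document.
SPEAKER_LABEL = re.compile(r"\[Speaker (\d+)\]:")


def split_speaker_turns(text):
    """Return a list of (speaker_number, utterance) tuples."""
    # re.split with one capture group yields
    # [preamble, id1, body1, id2, body2, ...].
    parts = SPEAKER_LABEL.split(text)
    turns = []
    for i in range(1, len(parts) - 1, 2):
        utterance = parts[i + 1].strip()
        if utterance:  # skip labels that are not followed by any text
            turns.append((int(parts[i]), utterance))
    return turns


# Hand-written sample, not real model output.
sample = "[Speaker 1]: Good morning. [Speaker 2]: Hi, shall we start? [Speaker 1]: Yes."
turns = split_speaker_turns(sample)
```

Text that appears before the first label is dropped by this sketch; keep `parts[0]` if the model can emit unlabelled leading text.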

## `ts`

Word-level timestamps. The prompt asks the model to append centisecond tags
after words, for example `hello [T:45] world [T:82]`.

```python
text = pipe.transcribe("clip.wav", prompt_mode="ts")
```
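
To work with the timestamps, each word can be paired with the tag that follows it and the centisecond value converted to seconds. This is a minimal sketch, assuming the tags look exactly like the `[T:45]` example above; `parse_word_times` is a hypothetical helper, and the document does not say whether offsets in long audio are relative to the clip or to a chunk.

```python
import re

# Assumed tag shape, taken from the `hello [T:45] world [T:82]` example.
WORD_WITH_TAG = re.compile(r"(\S+)\s*\[T:(\d+)\]")


def parse_word_times(text):
    """Return (word, seconds) pairs; each tag is read as centiseconds."""
    return [(word, int(cs) / 100) for word, cs in WORD_WITH_TAG.findall(text)]


pairs = parse_word_times("hello [T:45] world [T:82]")
```

Words that are not followed by a tag are ignored by this pattern, so the result may be shorter than the transcript.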

For long audio, the pipeline splits the waveform into chunks and passes a short
excerpt of the preceding transcript to each later chunk as a prefix for
continuity. The prefix is context only; the model is instructed not to repeat it.
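
The chunk-and-prefix idea can be illustrated as follows. This is only a sketch under stated assumptions, not the package's implementation: the chunk length, the absence of overlap, the prefix length, and every name here (`chunk_bounds`, `transcribe_long`, `transcribe_chunk`, `prefix_words`) are hypothetical.

```python
def chunk_bounds(num_samples, chunk_samples):
    """Split [0, num_samples) into consecutive, non-overlapping windows."""
    return [
        (start, min(start + chunk_samples, num_samples))
        for start in range(0, num_samples, chunk_samples)
    ]


def transcribe_long(samples, transcribe_chunk, chunk_samples, prefix_words=8):
    """Transcribe chunk by chunk, passing the tail of the text so far as context.

    `transcribe_chunk(chunk, prefix)` is a hypothetical callable standing in
    for the model call; it is expected to return only the new text.
    """
    pieces = []
    prefix = ""  # the first chunk has no previous transcript
    for start, end in chunk_bounds(len(samples), chunk_samples):
        piece = transcribe_chunk(samples[start:end], prefix).strip()
        if piece:
            pieces.append(piece)
        # Keep only the last few words as context for the next chunk.
        prefix = " ".join(" ".join(pieces).split()[-prefix_words:])
    return " ".join(pieces)


# Toy stand-in for the model: records the prefix it receives.
seen_prefixes = []


def fake_transcribe(chunk, prefix):
    seen_prefixes.append(prefix)
    return f"words{len(chunk)}"


result = transcribe_long(list(range(10)), fake_transcribe, chunk_samples=4, prefix_words=1)
```

Passing the model call in as a function keeps the chunking logic testable without loading any weights.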