Initial Granite Speech Plus MLX package
42
README.md
Normal file
@@ -0,0 +1,42 @@
# granite-speech-4.1-2b-plus-mlx

Standalone Python package for the MLX port of IBM Granite Speech 4.1-2b-plus.
The default model is
[`mlx-community/granite-speech-4.1-2b-plus-mlx`](https://huggingface.co/mlx-community/granite-speech-4.1-2b-plus-mlx).

## Quickstart

```bash
# Install the package from the Git repository
uv add "granite-speech-4.1-2b-plus-mlx @ git+https://gitea.tavportal.com/olivier/granite-speech-4.1-2b-plus-mlx.git"

# Transcribe a file from Python using the default model
python -c "from granite_speech_plus_mlx import GraniteSpeechPlusPipeline as P; p=P.from_pretrained(); print(p.transcribe('audio.wav'))"

# Command-line scripts shipped in this repository
python scripts/transcribe.py audio.wav --prompt-mode asr --output transcript.txt
python scripts/transcribe.py meeting.wav --prompt-mode saa
python scripts/benchmark.py audio.wav --results bench
```

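The Python one-liner in the Quickstart can be unpacked into a small helper for readability. This is a minimal sketch that uses only the two calls shown above (`from_pretrained()` with no arguments and `transcribe(path)`); the helper name `transcribe_file` is hypothetical, and the return value is passed through unchanged because its exact type is not documented here.

```python
from __future__ import annotations


def transcribe_file(path: str = "audio.wav"):
    """Load the default model and transcribe a single audio file.

    `transcribe_file` is an illustrative wrapper, not part of the package.
    """
    # Imported lazily so this module can be loaded before the package
    # (and its MLX dependencies) are installed.
    from granite_speech_plus_mlx import GraniteSpeechPlusPipeline

    # With no arguments, from_pretrained() loads the default model.
    pipeline = GraniteSpeechPlusPipeline.from_pretrained()

    # Return whatever transcribe() produces; the Quickstart simply prints it.
    return pipeline.transcribe(path)
```

Calling `print(transcribe_file("audio.wav"))` should then behave like the one-liner above.
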
## Prompt Modes

- `asr`: standard transcription.
- `saa`: speaker-attributed ASR with `[Speaker N]:` turn labels.
- `ts`: word-level timestamp tags like `word [T:45]`.

See [docs/prompt-modes.md](docs/prompt-modes.md) for examples.

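The `saa` and `ts` output formats listed above are regular enough to post-process with the standard library. The following sketch assumes the output is a plain string containing exactly the `[Speaker N]:` and `word [T:45]` patterns described; the function names are hypothetical, and the unit of the number inside `[T:...]` is not specified here, so it is kept as a raw integer.

```python
from __future__ import annotations

import re

# Assumed patterns, taken from the mode descriptions above.
_SPEAKER_TURN = re.compile(
    r"\[Speaker (\d+)\]:\s*(.*?)(?=\[Speaker \d+\]:|\Z)", re.S
)
_TIMESTAMPED_WORD = re.compile(r"(\S+)\s*\[T:(\d+)\]")


def parse_speaker_turns(text: str) -> list[tuple[int, str]]:
    """Split `saa` output into (speaker number, utterance) pairs."""
    return [
        (int(m.group(1)), m.group(2).strip())
        for m in _SPEAKER_TURN.finditer(text)
    ]


def parse_timestamps(text: str) -> list[tuple[str, int]]:
    """Split `ts` output into (word, raw timestamp value) pairs."""
    return [(word, int(tag)) for word, tag in _TIMESTAMPED_WORD.findall(text)]
```

For example, `parse_speaker_turns("[Speaker 1]: hello there [Speaker 2]: hi")` yields one pair per turn, which is convenient for building a per-speaker transcript.
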
## Benchmark Hints

Granite Speech 4.1 allocates substantial encoder memory for long audio. Start
with `--chunk-seconds 300 --repetition-penalty 1.2` for ASR, and reduce the
chunk length to 180 or 60 seconds if memory is tight. Timestamp mode (`ts`)
often needs a larger `--max-tokens` budget because every word carries a
timestamp tag.

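To get a feel for how a chunk length translates into the number of encoder passes, the sketch below computes fixed-length, non-overlapping chunk boundaries for a recording. It is an illustration only: the helper `chunk_bounds` is hypothetical, and the package's own chunking behind `--chunk-seconds` may differ (for example, it might overlap chunks or split on silence).

```python
from __future__ import annotations


def chunk_bounds(
    total_seconds: float, chunk_seconds: float
) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for consecutive chunks.

    Assumes simple non-overlapping chunks; the last chunk may be shorter.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")

    bounds: list[tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        # Clamp the final chunk to the end of the recording.
        end = min(start + chunk_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds
```

Under these assumptions, a one-hour recording gives 12 chunks at 300 seconds, 20 at 180 seconds, and 60 at 60 seconds, so shorter chunks trade lower peak memory for more passes.
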
## Provenance

This package was extracted from the local `MLX_CONVERTOR` project, including
the Granite Speech patch bundle at
`external/patches/granite-speech-idempotent-sanitize.patch`. The vendored
Granite implementation is based on `mlx-audio` commit
`f7c11556eda88731be5cc75ddbdf4a4cb9eeaafc` plus that local patch.

Package code is MIT licensed. Model weights remain under the IBM Granite model
license; review the model card and license terms before redistribution or use.