granite-speech-4.1-2b-plus-mlx

Standalone Python package for the MLX port of IBM Granite Speech 4.1-2b-plus. The default model is olivius/granite-speech-4.1-2b-plus-mlx.

Quickstart

uv add "granite-speech-4.1-2b-plus-mlx @ git+https://gitea.tavportal.com/olivier/granite-speech-4.1-2b-plus-mlx.git"
python -c "from granite_speech_plus_mlx import GraniteSpeechPlusPipeline as P; p=P.from_pretrained(); print(p.transcribe('audio.wav'))"
python scripts/transcribe.py audio.wav --prompt-mode asr --output transcript.txt
python scripts/transcribe.py meeting.wav --prompt-mode saa
python scripts/benchmark.py audio.wav --results bench

Prompt Modes

  • asr: standard transcription.
  • saa: speaker-attributed ASR with [Speaker N]: turn labels.
  • ts: word-level timestamp tags like word [T:45].

See docs/prompt-modes.md for examples.
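The saa and ts outputs can be post-processed with a couple of regular expressions. The following is a minimal sketch, assuming the transcript text looks exactly like the descriptions above ([Speaker N]: labels and word [T:45] tags). The helper names split_speaker_turns and extract_word_timestamps are hypothetical and not part of the package, and the unit of the T value is not stated here, so it is kept as a raw integer.

```python
import re

# Assumed formats, inferred from the mode descriptions above:
#   saa: "[Speaker 1]: hello there [Speaker 2]: hi"
#   ts:  "hello [T:45] world [T:52]"
SPEAKER_RE = re.compile(r"\[Speaker (\d+)\]:")
TIMESTAMP_RE = re.compile(r"(\S+)\s*\[T:(\d+)\]")


def split_speaker_turns(text):
    """Return (speaker_number, utterance) tuples from saa-style text."""
    # With one capture group, re.split yields:
    # [prefix, id1, text1, id2, text2, ...]
    parts = SPEAKER_RE.split(text)
    turns = []
    for i in range(1, len(parts) - 1, 2):
        turns.append((int(parts[i]), parts[i + 1].strip()))
    return turns


def extract_word_timestamps(text):
    """Return (word, raw_tag_value) tuples from ts-style text."""
    return [(word, int(tag)) for word, tag in TIMESTAMP_RE.findall(text)]


if __name__ == "__main__":
    print(split_speaker_turns("[Speaker 1]: hello there [Speaker 2]: hi"))
    print(extract_word_timestamps("hello [T:45] world [T:52]"))
```

Check docs/prompt-modes.md for the real output shape before relying on these patterns; punctuation or spacing around the tags may differ.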

Benchmark Hints

Granite Speech 4.1 allocates substantial encoder memory for long audio. For ASR, start with --chunk-seconds 300 --repetition-penalty 1.2; if memory is tight, reduce the chunk length to 180 or 60 seconds. Timestamp mode (ts) often needs a larger --max-tokens budget because every word carries a timestamp tag.
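To get a feel for how the chunk length affects the number of encoder passes, the sketch below splits a recording into fixed-length, non-overlapping windows. It only illustrates the arithmetic behind --chunk-seconds; the chunk_bounds helper is hypothetical, and the package's actual chunking (for example, whether chunks overlap) may differ.

```python
def chunk_bounds(total_seconds, chunk_seconds=300.0):
    """Return (start, end) pairs covering [0, total_seconds] without overlap."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    bounds = []
    start = 0.0
    while start < total_seconds:
        # The last chunk is shortened so it ends at the end of the audio.
        end = min(start + chunk_seconds, float(total_seconds))
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    # A 700-second recording at the suggested starting point of 300 seconds.
    for start, end in chunk_bounds(700, 300):
        print(f"{start:.0f}s - {end:.0f}s")
```

Under this assumption, dropping from 300 to 60 seconds on the same 700-second file raises the chunk count from 3 to 12, which trades peak memory for more passes.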

Provenance

This package was extracted from the local MLX_CONVERTOR project, including the Granite Speech patch bundle at external/patches/granite-speech-idempotent-sanitize.patch. The vendored Granite implementation is based on mlx-audio commit f7c11556eda88731be5cc75ddbdf4a4cb9eeaafc plus that local patch.

Package code is MIT licensed. Model weights remain under the IBM Granite model license; review the model card and license terms before redistribution or use.