Whisper.cpp is an open-source C/C++ implementation of inference for Whisper, an automatic speech recognition (ASR) model developed by OpenAI that can transcribe speech in many languages and translate it into English.
It is lightweight and highly optimized, has no external dependencies, and runs on ordinary CPUs (with optional GPU acceleration on some platforms). This makes it suitable for deployment on edge devices such as smartphones, tablets, and single-board computers.
As a result, developers can build applications that understand and interact with human speech directly on the device.
Some key features of Whisper.cpp include:
- High accuracy: Whisper.cpp runs OpenAI's Whisper model weights, so its transcription quality closely tracks that of the original models.
- Multi-language support: Whisper.cpp supports transcription in many languages, including English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, and Korean, and translation from those languages into English.
- Low latency: Whisper.cpp is optimized for real-time transcription and can process audio streams with low latency.
- Small footprint: Whisper.cpp has a small binary size, making it suitable for deployment on resource-constrained devices.
How to set it up
1) Clone the repository:
git clone https://github.com/ggerganov/whisper.cpp.git
2) Navigate into the directory:
cd whisper.cpp
3) Download one of the Whisper models converted to ggml format. For example:
sh ./models/download-ggml-model.sh base.en
4) Now build the whisper-cli example and transcribe an audio file like this:
# build the project
cmake -B build
cmake --build build --config Release
# transcribe an audio file
./build/bin/whisper-cli -f samples/jfk.wav
For a quick demo, simply run make base.en.
The command downloads the base.en model converted to the custom ggml format and runs inference on every .wav file in the samples folder.
For detailed usage instructions, run: ./build/bin/whisper-cli -h
Note that the whisper-cli example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool. For example, you can use ffmpeg to resample to 16 kHz mono (the sample rate Whisper models expect) and encode as 16-bit PCM:
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
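Before handing a file to whisper-cli, you can check its header. The sketch below is a hypothetical helper, not part of whisper.cpp: it walks the RIFF chunks of a WAV file held in memory, reads the "fmt " chunk, and reports whether it describes 16-bit integer PCM at 16 kHz mono, which is what the ffmpeg command above produces. It assumes a plain PCM header (format tag 1), so WAVE_FORMAT_EXTENSIBLE files are reported as non-matching.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Fields of interest from a WAV file's "fmt " chunk.
struct WavFormat {
    uint16_t audio_format;     // 1 = integer PCM
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
};

// Little-endian readers; callers guarantee the offsets are in bounds.
static uint16_t read_u16(const std::vector<uint8_t>& b, size_t off) {
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}
static uint32_t read_u32(const std::vector<uint8_t>& b, size_t off) {
    return static_cast<uint32_t>(b[off]) |
           (static_cast<uint32_t>(b[off + 1]) << 8) |
           (static_cast<uint32_t>(b[off + 2]) << 16) |
           (static_cast<uint32_t>(b[off + 3]) << 24);
}

// Return the "fmt " fields, or std::nullopt if the buffer is not RIFF/WAVE
// or contains no complete "fmt " chunk.
std::optional<WavFormat> parse_wav_format(const std::vector<uint8_t>& bytes) {
    if (bytes.size() < 12) return std::nullopt;
    if (std::string(bytes.begin(), bytes.begin() + 4) != "RIFF" ||
        std::string(bytes.begin() + 8, bytes.begin() + 12) != "WAVE") {
        return std::nullopt;
    }
    size_t pos = 12;  // first chunk follows the 12-byte RIFF header
    while (pos + 8 <= bytes.size()) {
        std::string id(bytes.begin() + pos, bytes.begin() + pos + 4);
        uint32_t size = read_u32(bytes, pos + 4);
        size_t body = pos + 8;
        if (id == "fmt ") {
            if (size < 16 || body + 16 > bytes.size()) return std::nullopt;
            // Layout: format(0) channels(2) rate(4) byte_rate(8) align(12) bits(14)
            return WavFormat{read_u16(bytes, body), read_u16(bytes, body + 2),
                             read_u32(bytes, body + 4), read_u16(bytes, body + 14)};
        }
        pos = body + size + (size & 1);  // chunks are padded to even length
    }
    return std::nullopt;
}

// True for 16-bit integer PCM, 16 kHz, mono.
bool matches_whisper_cli_input(const WavFormat& f) {
    return f.audio_format == 1 && f.bits_per_sample == 16 &&
           f.sample_rate == 16000 && f.channels == 1;
}
```

To check a real file, read it into a std::vector<uint8_t> (for example with std::ifstream opened in binary mode) and pass the buffer to parse_wav_format. Walking chunks rather than assuming a fixed 44-byte header tolerates files that carry LIST or other metadata chunks before "fmt ".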
If you want some extra audio samples to play with, simply run:
make -j samples
This will download a few more audio files from Wikipedia and convert them to 16-bit WAV format via ffmpeg.
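Whisper.cpp's C API function whisper_full (declared in whisper.h) consumes mono 32-bit float samples at 16 kHz, so the 16-bit integers stored in these WAV files have to be scaled to floats before inference. The sketch below shows one way to do that; the helper name is hypothetical, and both the 1/32768 scale factor and the averaging downmix for stereo are common conventions assumed here rather than taken from the whisper.cpp sources.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved 16-bit PCM samples to mono floats in [-1, 1).
// Mono input is scaled by 1/32768; stereo input is downmixed by averaging
// each left/right pair. Other channel counts yield an empty vector.
std::vector<float> pcm16_to_float_mono(const std::vector<int16_t>& pcm, int channels) {
    std::vector<float> out;
    if (channels == 1) {
        out.reserve(pcm.size());
        for (int16_t s : pcm) {
            out.push_back(static_cast<float>(s) / 32768.0f);
        }
    } else if (channels == 2) {
        out.reserve(pcm.size() / 2);
        for (std::size_t i = 0; i + 1 < pcm.size(); i += 2) {
            // (l + r) / 2 / 32768 == (l + r) / 65536
            out.push_back((static_cast<float>(pcm[i]) +
                           static_cast<float>(pcm[i + 1])) / 65536.0f);
        }
    }
    return out;
}
```

Dividing by 32768 rather than 32767 maps the most negative sample exactly to -1.0 and keeps every output strictly below +1.0.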
You can download and run the other models as follows:
make -j tiny.en
make -j tiny
make -j base.en
make -j base
make -j small.en
make -j small
make -j medium.en
make -j medium
make -j large-v1
make -j large-v2
make -j large-v3
make -j large-v3-turbo
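Once several models are downloaded, a program that lets users pick one needs to map a model name to a file path. The sketch below validates a name against the make targets listed above and builds the path, assuming the download script's models/ggml-<name>.bin naming convention (the same path used for base.en elsewhere). The function names are hypothetical. It also flags the .en variants, which are English-only, while the other models are multilingual.

```cpp
#include <array>
#include <optional>
#include <string>
#include <string_view>

// Model names taken from the make targets listed above.
constexpr std::array<std::string_view, 12> kModels = {
    "tiny.en",   "tiny",     "base.en",  "base",     "small.en", "small",
    "medium.en", "medium",   "large-v1", "large-v2", "large-v3", "large-v3-turbo"};

// Path of a downloaded model, assuming the models/ggml-<name>.bin convention.
// Returns std::nullopt for names that are not in the list.
std::optional<std::string> model_path(std::string_view name) {
    for (std::string_view m : kModels) {
        if (m == name) {
            return "models/ggml-" + std::string(name) + ".bin";
        }
    }
    return std::nullopt;
}

// Models whose name ends in ".en" are English-only.
bool is_english_only(std::string_view name) {
    return name.size() >= 3 && name.substr(name.size() - 3) == ".en";
}
```

The resulting path can be passed to whisper-cli with -m. Rejecting unknown names up front gives a clearer error than letting the tool fail on a missing file.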
Links