Whisper.cpp on M3 Ultra: 80× real-time — and what it means for dictation
6 min read · Benchmark · Apple Silicon · Whisper
If you're paying for local dictation software, you deserve an honest answer to one question: how fast is it, really? Cloud providers dodge the question with "as fast as our servers are right now." Local tools hide behind "depends on your hardware." Both are true and neither is useful.
So I measured. Reproducibly, with open data, with code anyone can run. Here are the numbers.
TL;DR
- tiny model on M3 Ultra: 80.9× real-time — 11 seconds of audio in 136 milliseconds
- large-v3-turbo beats medium: 17.7× vs 15.4× — at comparable model size on disk
- Even the largest tested model runs 17.7× faster than real-time — dictation latency is not a bottleneck
- Full data + methodology + reproduction script: github.com/mundwerk-app/whisper-metal-benchmark
Why this test exists
Honest Whisper latency measurements are scarce. Tech magazines say "runs smoothly." Vendors say "lightning fast." Both are right and both say nothing.
Mundwerk is a local dictation app for macOS that uses whisper.cpp in production. If I'm asking customers €14.99 for a one-time purchase, they should know what their Mac actually delivers — not "it depends," but in numbers.
Setup
What ran under the hood:
- Hardware: Mac Studio with Apple M3 Ultra, 24 performance + 8 efficiency cores, 80 GPU cores, 512 GB unified RAM
- OS: macOS 26.4.1 (build 25E253)
- whisper.cpp: v1.8.4, built with `-DGGML_METAL=ON -DGGML_ACCELERATE=ON -DCMAKE_BUILD_TYPE=Release`
- Sample: JFK inaugural address excerpt, 11 seconds, English, 22 words (publicly available in the whisper.cpp repository)
- Method: 3 measurement runs per model, median reported rather than the mean (more robust against outliers); a minimal measurement loop is sketched at the end of this section
Tested: tiny, base, small, medium, large-v3-turbo. Models downloaded via the official download-ggml-model.sh script from whisper.cpp upstream.
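To make the method concrete, here is a minimal sketch of that measurement loop, assuming the whisper-cli build and the jfk.wav sample from the "Reproduce it yourself" section further down; the parsing of the `total time` line may need adjusting for other whisper.cpp versions.

```bash
# Minimal sketch: 3 runs for one model, report the median "total time"
MODEL=../models/ggml-tiny.bin
SAMPLE=../samples/jfk.wav

times=()
for run in 1 2 3; do
  # whisper.cpp prints "... total time = <value> ms"; grab the number before "ms"
  t=$(./bin/whisper-cli -m "$MODEL" -f "$SAMPLE" -l en 2>&1 \
        | grep "total time" | awk '{print $(NF-1)}')
  times+=("$t")
done

# Median of three values = the middle one after a numeric sort
printf '%s\n' "${times[@]}" | sort -n | sed -n '2p'
```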
Results
| Model | Size (MB) | Median inference (ms) | Real-time factor | ms/word |
|---|---|---|---|---|
| tiny | 74 | 136.1 | 80.9× | 6.2 |
| base | 141 | 161.5 | 68.1× | 7.3 |
| small | 465 | 309.5 | 35.5× | 14.1 |
| medium | 1,463 | 713.9 | 15.4× | 32.4 |
| large-v3-turbo | 1,549 | 622.9 | 17.7× | 28.3 |
Real-time factor = how many seconds of audio are processed per second of wall-clock time. Higher is faster.
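As a worked example, the two derived columns fall straight out of the raw numbers; here they are for the large-v3-turbo row, using the 11-second duration and 22-word count of the sample listed under Setup.

```bash
# Derived columns for the large-v3-turbo row (11,000 ms of audio, 22 words)
awk 'BEGIN { printf "real-time factor: %.1f\n", 11000 / 622.9 }'   # 17.7
awk 'BEGIN { printf "ms per word:      %.1f\n", 622.9 / 22 }'      # 28.3
```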
Three findings
1. Even tiny is "instant"
11 seconds of audio in 136 milliseconds. For comparison, an average key press takes about 100 milliseconds. Between releasing the dictation key and seeing the text, roughly a blink of an eye passes. For longer dictations this scales roughly linearly: 60 seconds of audio would transcribe in under a second.
In practice: on an M3 Ultra, inference is never the bottleneck. What actually takes time is the one-time model load at startup (~2 s for medium), and that has nothing to do with the dictation itself.
2. large-v3-turbo beats medium
This is the most interesting number in the whole series. large-v3-turbo is 86 MB larger on disk than medium, yet about 15% faster at inference. How?
The turbo variant is a Whisper optimisation by OpenAI: the full encoder is kept (so quality stays close to large-v3), but the decoder is heavily reduced to only 4 layers instead of 32. Since the decoder runs iteratively, once per generated token, it accounts for most of the inference time. Fewer decoder layers mean faster decoding, with barely any quality loss for most dictation use cases.
For Mundwerk users, this matters: large-v3-turbo is the sweet spot between quality and speed. The default model gives near-large-v3 quality at better-than-medium speed.
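If you want to check the turbo result on your own Mac, the build from the "Reproduce it yourself" section below works unchanged; the upstream download script accepts large-v3-turbo as a model name, and the measurement command mirrors the one used there.

```bash
# Same measurement as in the reproduction section, with the turbo model
bash ../models/download-ggml-model.sh large-v3-turbo
./bin/whisper-cli -m ../models/ggml-large-v3-turbo.bin \
    -f ../samples/jfk.wav -l en 2>&1 | grep "total time"
```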
3. Cloud streaming would be slower
A cloud Whisper API (OpenAI, Replicate, self-hosted on AWS) has to upload the audio, process it on a server GPU, and send the text back. Even with an optimal connection, that adds at least 200–500 ms of network round-trip. On an M3 Ultra with the tiny model, the entire local transcription finishes before the audio upload to the cloud would even complete.
This is not just true for top-end Macs. Even a base M1 runs the medium model in whisper.cpp at roughly 3× real-time, still faster than any cloud round-trip plus server inference.
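To put the round-trip figure in perspective, curl can report the network-only latency to any HTTPS endpoint. The URL below is a neutral placeholder, not a specific provider's API, and a real transcription request would add audio upload and server-side inference on top of these numbers.

```bash
# Network-only round-trip to a placeholder endpoint; a real API adds upload + inference
curl -o /dev/null -s \
     -w 'DNS: %{time_namelookup}s  TLS: %{time_appconnect}s  total: %{time_total}s\n' \
     https://example.com/
```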
What was NOT measured
Disclaimer: these numbers are a baseline, not a complete test. Here is what is missing:
- Word Error Rate. Latency is not everything — accuracy matters at least as much. WER measurements come in run 2 with German-English mixed samples.
- Other hardware. M3 Ultra with 80 GPU cores is the top of the M3 family. M1 base, M2 base, M3 Pro/Max, etc. will be added in subsequent runs (pull requests welcome).
- Real-world audio. The sample is studio-clean with no background noise. Mundwerk in everyday use with a MacBook microphone in a café will look different: VAD and noise don't change latency, but they do affect recognition quality.
- Energy footprint. How much battery does a dictation cost? A `powermetrics` capture is planned for a later iteration; a rough sketch of what that could look like follows below.
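For the energy question, this is roughly what such a capture could look like, assuming the whisper-cli build from the next section. `powermetrics` ships with macOS but needs root, and the exact sampler output format varies between macOS releases.

```bash
# Sample CPU/GPU power while a transcription runs (powermetrics needs sudo)
sudo powermetrics --samplers cpu_power,gpu_power -i 500 > power.log &
PM_PID=$!

./bin/whisper-cli -m ../models/ggml-medium.bin -f ../samples/jfk.wav -l en

sudo kill "$PM_PID"
grep -iE "cpu power|gpu power" power.log | head
```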
Reproduce it yourself
No one has to take my word for it. Anyone with an Apple Silicon Mac and 4 GB of free disk space can reproduce this in under 5 minutes:
```bash
# 1. Clone + build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && mkdir -p build-bench && cd build-bench
cmake .. -DCMAKE_BUILD_TYPE=Release \
    -DGGML_METAL=ON -DGGML_ACCELERATE=ON \
    -DBUILD_SHARED_LIBS=OFF -DWHISPER_BUILD_EXAMPLES=ON
make -j$(sysctl -n hw.ncpu) whisper-cli

# 2. Download model
bash ../models/download-ggml-model.sh tiny

# 3. Measure
./bin/whisper-cli -m ../models/ggml-tiny.bin \
    -f ../samples/jfk.wav -l en 2>&1 | grep "total time"
```
Output should land between 130 and 1500 milliseconds depending on hardware. M1 Air around 800 ms, M2 Max around 250 ms, M3 Ultra 130 ms. If your numbers differ and you'd like to share them, you're welcome to — the repository accepts pull requests.
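To regenerate the whole table instead of a single run, a small loop over the model names is enough. This is a convenience sketch on top of the commands above, not the repository's own benchmark script; downloading all five models uses most of the 4 GB of disk space mentioned earlier.

```bash
# Measure all five models back to back (run from the build-bench directory)
for m in tiny base small medium large-v3-turbo; do
  bash ../models/download-ggml-model.sh "$m"
  echo -n "$m: "
  ./bin/whisper-cli -m "../models/ggml-$m.bin" \
      -f ../samples/jfk.wav -l en 2>&1 | grep "total time"
done
```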
Where this fits
Mundwerk is not "the fastest dictation software in the world." It is an honest dictation app: local, no cloud, one-time purchase, transparent about performance. This benchmark series is one piece of that promise — and it grows as more hardware gets tested.
If you'd like to cite the data (e.g. in a technical review or comparison test), please go ahead. The licence is CC BY 4.0, and the attribution format is in the repository.
Mundwerk Dictation — Local speech recognition for macOS
One-time purchase €14.99 (€8.99 with code LAUNCH until 2026-06-19) · Fully offline · 7-day trial
Try free for 7 days →
Sources & references
- Raw data + JSON: github.com/mundwerk-app/whisper-metal-benchmark
- Methodology: methodology.md
- whisper.cpp upstream: github.com/ggerganov/whisper.cpp (MIT license)
- Pillar article: Dictation on Mac — what counts (German)
- Article + data licence: CC BY 4.0