Parakeet vs Whisper: the two open speech models compared
Short answer: NVIDIA’s Parakeet and OpenAI’s Whisper are the two open speech-recognition model families that most Mac dictation apps are built on. Parakeet TDT v3 is faster and slightly more accurate on the English benchmark; Whisper covers far more languages. This is a model-level comparison, not an app review, and it sits inside the wider landscape of local speech-to-text on Mac. Below: accuracy, speed and architecture, language coverage, licenses, which apps run which model, and when to pick each.
The two models at a glance
Both are open-weight models you can download and run on your own machine. They differ on architecture, which drives almost everything else: how fast they run, how they behave in silences, and how many languages they cover.
| | Parakeet TDT 0.6B v3 | Whisper Large V3 |
|---|---|---|
| Maker | NVIDIA | OpenAI |
| Word error rate | 6.32% | 7.44% |
| Architecture | Transducer (Token-and-Duration) | Encoder-decoder |
| Parameters | ~600 million | ~1.55 billion |
| Languages | 25 European | ~100 |
| License | CC-BY-4.0 | MIT |
Accuracy: the leaderboard numbers
The standard reference for English accuracy is the Hugging Face Open ASR Leaderboard, which scores models on a shared set of English test data and reports an average word error rate. On that benchmark Parakeet TDT 0.6B v3 posts 6.32% against Whisper Large V3’s 7.44%. Lower is better, so Parakeet wins on the headline number, but the margin is roughly one word in a hundred. For dictation, both are already past the threshold where the model is the limiting factor.
The number to keep in proportion: word error rate is an aggregate over a fixed English corpus. Your own accuracy depends on your accent, your microphone, background noise and how much domain vocabulary you use. The leaderboard is a fair way to rank models against each other, not a promise about any one person’s transcripts. The fuller explanation of what the figure does and does not tell you sits in the primer on how word error rate is actually measured.
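To make the metric concrete, here is a minimal sketch of how a word error rate can be computed for a single utterance: the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference. The function name and the lowercase-only normalization are assumptions for illustration; real evaluations typically normalize punctuation and spelling more thoroughly before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Crude normalization (an assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of five reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))  # 0.2
```

A benchmark figure like 6.32% is this kind of ratio aggregated over a whole test corpus, which is why a single speaker's transcripts can land well above or below it.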
Speed and architecture: transducer vs encoder-decoder
The architectures are the real story. Whisper is an encoder-decoder model: it encodes a chunk of audio, then a decoder generates text autoregressively, one token at a time, attending back over the whole chunk. Parakeet is a transducer, specifically a Token-and-Duration Transducer, which predicts tokens against the audio stream incrementally and models duration explicitly.
Two consequences follow. First, speed. NVIDIA’s published benchmarks put Parakeet TDT 0.6B v3 at around 3,333 times real time on Apple Silicon, against around 146 times for Whisper Large V3. That is a gap of roughly 23x, more than an order of magnitude: at those rates a ten-minute recording transcribes in about 0.2 seconds on Parakeet and about 4 seconds on Whisper. For batch work the gap matters less; for push-to-talk dictation, where you want the words on screen the instant you release the key, it dominates how the tool feels.
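The arithmetic behind those speed figures is easy to check. The sketch below takes the two quoted multipliers at face value and divides audio length by them; the function name is made up for illustration, and real latency also includes model loading and audio capture, which this ignores.

```python
def transcription_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Processing time for audio transcribed at `speed_factor` times real time."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


TEN_MINUTES = 10 * 60  # seconds of audio

# Speed multipliers as quoted in the text above.
parakeet = transcription_seconds(TEN_MINUTES, 3333)
whisper = transcription_seconds(TEN_MINUTES, 146)
ratio = 3333 / 146

print(f"Parakeet: {parakeet:.2f} s")  # Parakeet: 0.18 s
print(f"Whisper:  {whisper:.2f} s")   # Whisper:  4.11 s
print(f"Ratio:    {ratio:.1f}x")      # Ratio:    22.8x
```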
Second, behavior in silence. Because an autoregressive decoder keeps generating the next likely token, Whisper-based systems can hallucinate words during long pauses or near-silent audio. A transducer can instead emit a blank symbol for stretches with no speech, so a long pause in the audio becomes a gap in the transcript, not an invented sentence. For dictation, where you stop and think mid-thought constantly, that difference shows up more often than the leaderboard gap does.
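The token-and-duration idea can be illustrated with a toy greedy decoding loop. This is a sketch under stated assumptions, not NVIDIA's implementation: `predict` is a hypothetical stand-in for the model's networks, returning a token (or a blank) plus how many audio frames to skip, and `toy_predict` is a hand-written script rather than a trained model.

```python
from typing import Callable, List, Optional, Tuple

BLANK = None  # the "no token here" symbol a transducer can emit

# (token or BLANK, number of frames to advance)
Prediction = Tuple[Optional[str], int]


def greedy_tdt_decode(
    num_frames: int,
    predict: Callable[[int, List[str]], Prediction],
    max_symbols_per_frame: int = 5,
) -> List[Tuple[int, str]]:
    """Return (frame_index, token) pairs from a greedy token-and-duration loop."""
    emitted: List[Tuple[int, str]] = []
    history: List[str] = []
    t = 0
    symbols_at_t = 0
    while t < num_frames:
        token, duration = predict(t, history)
        if token is not BLANK:
            emitted.append((t, token))
            history.append(token)
            symbols_at_t += 1
        # Never stall: a blank, or too many tokens on one frame, forces a step.
        if duration == 0 and (token is BLANK or symbols_at_t >= max_symbols_per_frame):
            duration = 1
        if duration > 0:
            t += duration
            symbols_at_t = 0
    return emitted


def toy_predict(t: int, history: List[str]) -> Prediction:
    """Hypothetical scripted predictor: two words, then silence."""
    script = {0: ("hello", 2), 2: ("world", 1)}
    # Any frame not in the script is treated as silence: blank, skip 4 frames.
    return script.get(t, (BLANK, 4))


print(greedy_tdt_decode(20, toy_predict))  # [(0, 'hello'), (2, 'world')]
```

In this toy run the 17 frames of silence after "world" produce only blanks, so nothing is added to the transcript, and the predicted durations let the loop skip through the silence in a handful of steps instead of visiting every frame.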
Whisper has a faster variant worth naming. Whisper Large V3 Turbo cuts the decoder from 32 layers to 4 and drops to around 809 million parameters, which makes it several times faster than full Large V3 at close to the same accuracy. It is still slower than Parakeet on the same hardware, and it keeps Whisper’s encoder-decoder behavior in silences, but it is the variant to compare against if speed is the concern and you need Whisper’s language coverage.
Language coverage: where Whisper wins
This is the axis where Whisper is clearly ahead. Whisper Large V3 covers about a hundred languages. Parakeet TDT v3 covers 25 European languages: English, French, German, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Slovak, Hungarian, Romanian, Bulgarian, Croatian, Slovenian, Greek, Swedish, Danish, Finnish, Estonian, Latvian, Lithuanian, Maltese, Russian and Ukrainian, with automatic language detection across them.
The practical rule is simple. If you dictate in any of those 25 European languages, Parakeet is the model to want for its speed and silence behavior. If you dictate in Mandarin, Japanese, Korean, Arabic, Hindi or anything outside that European list, Whisper is the right model, because Parakeet does not cover it at all. There is no accuracy trade to weigh there; it is a coverage question. People working across languages can read the role-specific take in the primer on dictation for translators.
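The coverage rule above can be written as a small lookup. This is a minimal sketch: the ISO 639-1 codes correspond to the 25 languages listed for Parakeet TDT v3, while the function name and the returned labels are illustrative strings, not real model identifiers.

```python
# ISO 639-1 codes for the 25 European languages listed for Parakeet TDT v3.
PARAKEET_V3_LANGUAGES = {
    "en", "fr", "de", "es", "it", "pt", "nl", "pl", "cs", "sk",
    "hu", "ro", "bg", "hr", "sl", "el", "sv", "da", "fi", "et",
    "lv", "lt", "mt", "ru", "uk",
}


def pick_model(language_code: str) -> str:
    """Pick a model family from a language tag such as 'en', 'en-US' or 'pt_BR'."""
    # Keep only the primary language subtag, ignoring region suffixes.
    code = language_code.strip().lower().replace("_", "-").split("-")[0]
    if code in PARAKEET_V3_LANGUAGES:
        return "parakeet-tdt-v3"
    # Fallback only: this sketch does not verify that Whisper covers the language.
    return "whisper-large-v3"


print(pick_model("de"))     # parakeet-tdt-v3
print(pick_model("ja"))     # whisper-large-v3
print(pick_model("pt_BR"))  # parakeet-tdt-v3
```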
Licenses: both are open
Both models ship open weights, which is why they turn up inside third-party apps rather than only behind a paid API. Whisper’s code and weights are released by OpenAI under the MIT license, one of the most permissive there is. Parakeet TDT v3’s weights are released by NVIDIA under CC-BY-4.0, a Creative Commons license that permits commercial use with attribution.
For an end user the licensing rarely matters directly; it matters because it is the reason a small developer can ship either model inside a Mac app without licensing a closed engine. The work of actually getting an NVIDIA model running on the Apple Neural Engine is its own job, which we wrote up in the build note on shipping a 600 MB speech model inside a 2 MB app.
Which Mac apps run which model
Because both are open, the lines between apps are blurrier than they used to be. Most of the Whisper-based Mac apps added Parakeet support during 2025, so several apps now offer both families and let you pick.
| App | Models it runs | Shape |
|---|---|---|
| Parakeety | Parakeet TDT v3 only | One model, on-device, push-to-talk, $30 once |
| MacWhisper | Whisper models plus Parakeet | File transcription, model picker |
| SuperWhisper | Whisper models plus Parakeet, and cloud models | Multi-engine dictation with AI post-processing |
The distinction is what each app is built around. MacWhisper is strongest at transcribing pre-recorded audio and video files and gives you a menu of models to choose from. SuperWhisper is a multi-engine dictation app with AI cleanup baked in and the widest model menu, including cloud options. Parakeety is the narrowest of the three on purpose: it runs Parakeet TDT v3 as its only engine, fully on-device, so there is no model picker and nothing to configure. The deeper background on the model itself is in the primer on running NVIDIA Parakeet on a Mac.
When to pick each
- You dictate in a European language and want it instant. Parakeet TDT v3 is the model: fastest on Apple Silicon, slightly ahead on the English benchmark, and it does not hallucinate in your pauses.
- You dictate in a language outside the 25 European ones. Whisper is the only one of the two that covers you. Whisper Large V3, or Turbo if speed matters more than the last point of accuracy.
- You want one model that just runs, no picker. An app built around a single on-device model fits better than a multi-engine one. That is the bet Parakeety makes.
- You want to switch between models inside one app. A multi-engine app that ships both families, like SuperWhisper, is the right shape.
A note on names
Parakeet is the NVIDIA model family; Whisper is the OpenAI one. Parakeety, with the extra letter, is this app: a Mac dictation tool that runs the Parakeet TDT v3 model on-device. It is not affiliated with NVIDIA. The comparison above is between the two models, so it applies to any app running them, not only to ours.
FAQ
- Which is more accurate, Parakeet or Whisper?
- On the Hugging Face Open ASR Leaderboard, Parakeet TDT 0.6B v3 posts a 6.32% word error rate against Whisper Large V3’s 7.44%, so Parakeet edges it on the aggregate English benchmark. Accuracy is a property of the model, not of where it runs, and the gap is small enough that for most dictation either model is more accurate than you need. The larger practical difference is that Parakeet’s transducer architecture stays silent during silences, where Whisper’s encoder-decoder design can invent text.
- How does Whisper Turbo compare to Parakeet v3?
- Whisper Large V3 Turbo trims the decoder from 32 layers to 4 and drops to around 809 million parameters, which makes it several times faster than full Large V3 at close to the same accuracy. It is still slower than Parakeet TDT v3 on the same Apple Silicon, and it inherits Whisper’s encoder-decoder behavior in silences. Turbo’s advantage is language coverage: it keeps Whisper’s roughly one hundred languages, where Parakeet v3 covers 25 European ones.
- Are Parakeet and Whisper open source?
- Both ship open weights you can download and run yourself. Whisper’s code and weights are released by OpenAI under the MIT license. NVIDIA’s Parakeet TDT 0.6B v3 weights are released under CC-BY-4.0, which permits commercial use with attribution. Open weights are why both models turn up inside third-party Mac apps rather than only behind a paid API.
- Do SuperWhisper and MacWhisper support the Parakeet model?
- Yes, both added Parakeet support during 2025 alongside their existing Whisper models, so the same model families now show up across several Mac apps. Parakeety is different in that it runs Parakeet TDT v3 as its only engine, fully on-device, with no model picker and no subscription. If you want to choose between Whisper and Parakeet inside one app, SuperWhisper or MacWhisper fit; if you want one fast model that just works, Parakeety is the simpler path.
Try it
Parakeety is a Mac menu-bar app that runs Parakeet TDT v3 on the Apple Neural Engine. Hold the section key, talk, release; your words paste at the cursor in whichever app you were typing into. Audio never leaves the machine. It needs Apple Silicon and macOS 14 or later. There is a free 7-day trial with no card required. After that it is $30 once.