NVIDIA Parakeet on Mac
Short answer: Parakeet is NVIDIA's open speech recognition model. It ranks near the top of the Hugging Face Open ASR Leaderboard, ahead of Whisper, and the weights are public. Running it on a Mac is a separate question from training it or downloading it: the upstream release is built for NVIDIA GPUs, so the practical Mac path is either a wrapper app that ships a converted build (Parakeety does this) or doing the CoreML / MLX conversion work yourself.
What Parakeet is
Parakeet is a family of speech recognition models published by NVIDIA. The current flagship is Parakeet TDT 0.6B v3: roughly 600 million parameters, transducer architecture, trained on a mix of public and licensed speech data, released with open weights on Hugging Face.
"TDT" stands for Token-and-Duration Transducer. The architectural detail that matters in everyday use is that transducer models predict tokens against an audio stream incrementally, with explicit duration modeling. That makes them far less prone to inventing text during silences than Whisper's encoder-decoder architecture. A long pause in the audio produces a gap in the transcript instead of an invented sentence.
The model is the model. It is not an app, it is not a service, it is not a UI. To use it you need an inference runtime, a microphone capture pipeline, audio resampling to the model's expected sample rate, decoding of the model's token output into text, and whatever you want to do with the resulting text. The work of dictation, file transcription or live captioning sits on top of those primitives.
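To make the token-and-duration idea concrete, here is a minimal sketch of TDT-style greedy decoding in plain Python. It is a toy, not NVIDIA's implementation: the `joint` callable stands in for the model's neural network, and `BLANK`, `tdt_greedy_decode` and `toy_joint` are hypothetical names made up for illustration. The point is the control flow: every step yields a token and a duration, and silent stretches yield blanks that skip frames rather than text.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # hypothetical id for the "no token here" symbol

# Maps (frame index, tokens emitted so far) to (token id, duration in frames).
# In a real TDT model this would be the argmax over the joint network's token
# and duration outputs; here it is whatever callable the caller supplies.
JointFn = Callable[[int, Sequence[int]], Tuple[int, int]]


def tdt_greedy_decode(
    num_frames: int, joint: JointFn, max_symbols_per_frame: int = 10
) -> List[Tuple[int, int]]:
    """Return (token, frame) pairs from a TDT-style greedy decoding loop."""
    emitted: List[Tuple[int, int]] = []
    history: List[int] = []
    t = 0
    symbols_on_frame = 0
    while t < num_frames:
        token, duration = joint(t, history)
        if token != BLANK:
            emitted.append((token, t))
            history.append(token)
            symbols_on_frame += 1
        # A blank with duration 0, or too many tokens on one frame, would
        # stall the loop, so force a one-frame advance in those cases.
        if duration == 0 and (token == BLANK or symbols_on_frame >= max_symbols_per_frame):
            duration = 1
        if duration > 0:
            symbols_on_frame = 0
        t += duration  # jump ahead by the predicted duration
    return emitted


def toy_joint(t: int, history: Sequence[int]) -> Tuple[int, int]:
    """Stub model: two tokens at the start, then silence."""
    script = {0: (7, 2), 2: (9, 1)}
    # Anything not in the script is silence: a blank that skips 4 frames.
    return script.get(t, (BLANK, 4))


if __name__ == "__main__":
    print(tdt_greedy_decode(12, toy_joint))
```

With the stub above, twelve frames of input produce two tokens (at frames 0 and 2) and nothing for the remaining silent frames, which are covered in three blank steps.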
Parakeet vs Whisper at the model level
Whisper is the better-known open ASR model, released by OpenAI in 2022. Parakeet TDT v3 differs from Whisper Large V3 on three axes that matter in practice: accuracy, speed, and behavior in silences.
| | Parakeet TDT 0.6B v3 | Whisper Large V3 |
|---|---|---|
| Word error rate (Open ASR Leaderboard) | 6.32% | 7.44% |
| Architecture | Transducer (TDT) | Encoder-decoder |
| Parameters | ~600M | ~1.55B |
| Behavior in silence | Silent output | Can hallucinate text |
| Languages | 25 European | ~100 |
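For readers unfamiliar with the metric in the first row, word error rate is the word-level edit distance between the reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal version of that formula; `word_error_rate` is an illustrative name, and the whitespace tokenization is an assumption. Benchmarks typically normalize casing and punctuation before scoring, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, kept one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word and one dropped word against a five-word reference.
    print(word_error_rate("the quick brown fox jumps", "the quick brown box"))
```

A WER of 6.32% therefore means roughly six word-level mistakes per hundred reference words, averaged over the benchmark's test sets.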
The speed gap is the part you feel most. The published benchmark numbers put Parakeet TDT 0.6B v3 at a throughput of around 3,333x real time, against around 146x for Whisper Large V3, a gap of roughly 23x. Those figures come from GPU benchmarks rather than Apple Silicon measurements, so absolute speeds on a Mac will differ. For batch transcription the gap matters less; for push-to-talk dictation, where you want the words on screen before you have finished thinking, it dominates the felt experience.
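To put those throughput figures in everyday terms: a speed factor of N means N seconds of audio transcribed per second of compute. The arithmetic below takes the two figures quoted above as given and works out how long an hour of audio would take at each. The function and constant names are illustrative, and real timings depend on the hardware.

```python
def seconds_to_transcribe(audio_seconds: float, speed_factor: float) -> float:
    """Compute time for a clip, given throughput in multiples of real time."""
    return audio_seconds / speed_factor


PARAKEET_SPEED = 3333.0  # figure quoted in the text
WHISPER_SPEED = 146.0    # figure quoted in the text
ONE_HOUR = 3600.0

if __name__ == "__main__":
    parakeet_s = seconds_to_transcribe(ONE_HOUR, PARAKEET_SPEED)
    whisper_s = seconds_to_transcribe(ONE_HOUR, WHISPER_SPEED)
    print(f"Parakeet: {parakeet_s:.2f} s per hour of audio")
    print(f"Whisper:  {whisper_s:.2f} s per hour of audio")
    print(f"Ratio:    {PARAKEET_SPEED / WHISPER_SPEED:.1f}x")
```

At those rates an hour of audio takes about one second on Parakeet and about 25 seconds on Whisper Large V3.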
The trade is language coverage. Whisper covers about a hundred languages; Parakeet TDT v3 covers 25 European languages (English, French, German, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Slovak, Hungarian, Romanian, Bulgarian, Croatian, Slovenian, Greek, Swedish, Danish, Finnish, Estonian, Latvian, Lithuanian, Maltese, Russian and Ukrainian). If you dictate in Mandarin, Japanese, Arabic, Hindi or anything outside that European list, Whisper is the right model.
Running Parakeet on a Mac, the from-scratch path
The honest version is that the upstream release is not built for Macs. NVIDIA ships Parakeet through the NeMo toolkit, which expects an NVIDIA GPU and a CUDA stack. None of that exists on Apple Silicon. The path from "open weights on Hugging Face" to "running on a Mac" is real engineering work.
The rough shape of the from-scratch path looks like this:
- Set up a Python environment (Conda or uv), pull down the Hugging Face weights and the NeMo dependencies.
- Get the model running on CPU first as a sanity check. CPU inference works but is slow, and not what you want for interactive dictation.
- Convert the model to a runtime that targets the Apple Neural Engine or the GPU on Apple Silicon. The two viable paths are CoreML (Apple's first-party runtime) and MLX (Apple's open-source ML framework). Both require taking the PyTorch weights, tracing the graph, exporting to the target format and verifying numerical parity.
- Build a microphone capture pipeline, resample audio to 16 kHz mono, feed it to the model, deal with VAD (voice activity detection), surface the transcript somewhere the rest of the system can use it.
- Wrap all of that in something that does not break when the user opens it from /Applications: code signing, notarization, hardened runtime, microphone permission flow.
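The first two steps in the list can be sketched as follows. This assumes NeMo is installed with its ASR extras (for example `pip install -U "nemo_toolkit[asr]"`) and that the input files are already 16 kHz mono WAV. The `from_pretrained` and `transcribe` calls follow the usage shown on the Hugging Face model card; `transcribe_on_cpu` is a hypothetical wrapper name, and the `.text` attribute on results is an assumption that holds for recent NeMo versions (older ones may return plain strings).

```python
MODEL_NAME = "nvidia/parakeet-tdt-0.6b-v3"  # model id from the Hugging Face model card


def transcribe_on_cpu(wav_paths):
    """Load Parakeet through NeMo on the CPU and transcribe the given WAV files."""
    # Imported lazily: NeMo and PyTorch are heavy, optional dependencies.
    import torch
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(
        model_name=MODEL_NAME,
        map_location=torch.device("cpu"),  # force CPU, there is no CUDA on a Mac
    )
    model.eval()
    results = model.transcribe(list(wav_paths))
    # Assumption: each result exposes the transcript as `.text`.
    return [result.text for result in results]
```

Calling `transcribe_on_cpu(["sample.wav"])` on a short clip is enough to confirm the weights load and produce sensible text before any conversion work starts.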
None of those steps are blocked by anything fundamental. They are all standard ML and Mac platform work. They add up to roughly the kind of effort that a small company has to put in to ship a polished product, which is why the wrappers exist.
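As one example of that standard work, the audio-preparation step from the list (downmix to mono, resample to 16 kHz, crude voice activity detection) can be sketched with the standard library alone. Everything here is a simplification under stated assumptions: samples are floats in the range -1 to 1, the resampler is plain linear interpolation with no anti-aliasing filter, and the VAD is a fixed energy threshold. All function names and the threshold value are illustrative; a shipping app would use a proper resampler and a trained VAD.

```python
import math
from typing import List, Sequence

TARGET_RATE = 16_000  # sample rate named in the text


def downmix_to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each multi-channel frame."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Resample by linear interpolation (no low-pass filter, sketch only)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def speech_flags(samples: Sequence[float], rate: int = TARGET_RATE,
                 frame_ms: int = 30, threshold: float = 0.01) -> List[bool]:
    """Mark each full frame as speech when its RMS energy reaches the threshold."""
    size = rate * frame_ms // 1000
    flags = []
    for start in range(0, len(samples) - size + 1, size):
        frame = samples[start:start + size]
        rms = math.sqrt(sum(s * s for s in frame) / size)
        flags.append(rms >= threshold)
    return flags
```

Chained together, `speech_flags(resample_linear(downmix_to_mono(frames), 48_000))` turns 48 kHz stereo capture into per-frame speech decisions that can gate what gets sent to the model.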
Running Parakeet on a Mac, the wrapper path
Parakeety is one of the simpler ways to run Parakeet TDT v3 on a Mac. The app is a small menu-bar utility that ships a converted build of the model and runs inference on the Apple Neural Engine. The dictation flow is hold-to-talk: press and hold the section key, speak, release, and the transcript pastes at the cursor in whichever app you were typing into.
The honest scope of what Parakeety does and does not do:
- Does: push-to-talk dictation across any Mac app that accepts keyboard input. The model runs on-device. Audio never leaves the Mac. 25 European languages with auto-detection that follows mid-paragraph language switches. $30 once, lifetime updates included.
- Does not: file transcription of pre-recorded audio. Live captioning. Speaker diarization. AI post-processing of transcripts into summaries or emails. Hands-free computer control. Cloud-based features, by design.
SuperWhisper added optional Parakeet support in 2025 as part of its multi-engine offering; its Parakeet path is a proprietary build from Argmax rather than the upstream model. MacWhisper is Whisper-only at the time of writing. The deeper comparisons are in Parakeety vs SuperWhisper and Parakeety vs MacWhisper.
Parakeet, Parakeety, Parakeet Chat, Parakeet Systems
The naming has gotten cluttered, so it is worth pinning down what is what.
- Parakeet is the NVIDIA speech recognition model family. It is a model, not a product. The current version is Parakeet TDT 0.6B v3.
- Parakeety is this app. A Mac dictation tool that runs the Parakeet TDT v3 model on-device. Not affiliated with NVIDIA.
- Parakeet Chat and Parakeet Systems are products from a different company (Parakeet AI) building autonomous voice agents for customer service. Unrelated to the NVIDIA speech model and unrelated to Parakeety.
The shared word is the cause of most of the confusion. The NVIDIA model is the only thing referred to as "Parakeet" on the Hugging Face leaderboard or in ASR research papers; everything else is a different product that happens to share a piece of vocabulary.
Picking the right path
- If you want push-to-talk dictation on a Mac and do not want to build anything: Parakeety is the most direct route. $30 once, 7-day free trial.
- If you want file transcription or batch jobs: the from-scratch NeMo path on CPU works, slowly. A faster route is a Whisper-based file-transcription app like MacWhisper, since that category is already well served by Whisper. Parakeet's speed advantage matters most for interactive use.
- If you want to embed Parakeet in your own Mac app: the from-scratch CoreML / MLX conversion path is real work but tractable, and the model weights are openly licensed. Expect to spend time on inference plumbing and on the audio capture side of things.
- If you need a language Parakeet does not cover: use a Whisper-based tool instead. Whisper Large V3 covers around a hundred languages versus Parakeet v3's 25.
FAQ
- Is Parakeet the same thing as Parakeety?
- No. Parakeet is NVIDIA's open speech recognition model. Parakeety is a Mac dictation app that uses that model. The model is the engine; Parakeety is the car wrapped around it. There are also Parakeet Chat and Parakeet Systems, which are unrelated products from a different company. The Hugging Face model card at nvidia/parakeet-tdt-0.6b-v3 is the authoritative source for the model itself.
- Can I run NVIDIA Parakeet on a Mac without writing any code?
- Yes. Parakeety ships the Parakeet TDT v3 model as a Mac menu-bar app. Install, hold the section key to dictate, release to paste at the cursor. No Python environment, no CUDA drivers, no model conversion. The trade is that you get push-to-talk dictation specifically, not a general-purpose ASR pipeline.
- Is Parakeet better than Whisper?
- On the public benchmarks, yes. Parakeet TDT 0.6B v3 posts a 6.32% word error rate against Whisper Large V3's 7.44% on the Hugging Face Open ASR Leaderboard, runs more than 20 times faster by the published throughput figures (around 3,333x versus 146x real time), and uses a transducer architecture that is far less prone to hallucinating during silences than Whisper's encoder-decoder architecture. The exception is language coverage: Whisper handles around a hundred languages, Parakeet v3 handles 25.
- Does Parakeet run on Apple Silicon natively?
- The upstream NVIDIA release is built for NVIDIA GPUs through the NeMo toolkit. Running it on Apple Silicon means either using a wrapper app that has done the model conversion work (Parakeety runs it on the Apple Neural Engine), or doing that conversion yourself through CoreML or MLX. The model weights are open, but the path from those weights to something that runs efficiently on a Mac is the work.
Try it
Parakeety is a Mac menu-bar app that ships NVIDIA's Parakeet TDT v3 model converted for Apple Silicon. Hold the section key, talk, release; your words paste at the cursor in whichever app you were typing into. Audio never leaves the machine. There is a free 7-day trial with no card required. After that it is $30 once.