Speech to Text (Whisper)

Transcribe spoken audio or video to text or subtitles, fully local.

Whisper is installed. The first run with a given model size downloads that model's weights from openai.com (a one-time download of ~75 MB to ~3 GB, depending on size).

Model size guide (smaller models run faster but transcribe less accurately):

  • tiny — ~75 MB, very fast, ok for clear English
  • base — ~150 MB, recommended starting point
  • small — ~500 MB, good multilingual quality
  • medium — ~1.5 GB, near-best quality, slow on CPU
  • large — ~3 GB, best quality, very slow on CPU
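
As a rough illustration of how a size from the guide above might be used, the sketch below assumes the tool wraps the reference openai-whisper Python package. That is an assumption: this page does not say which Whisper implementation is installed. `whisper.load_model` and `model.transcribe` are that package's calls; `MODEL_SIZES`, `normalize_size`, and `transcribe_file` are hypothetical names introduced only for this example.

```python
# Sizes from the guide above, smallest to largest.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def normalize_size(size: str = "base") -> str:
    """Return a lower-cased, validated model size ("base" is the suggested start)."""
    name = size.strip().lower()
    if name not in MODEL_SIZES:
        raise ValueError(f"unknown model size {size!r}; expected one of {MODEL_SIZES}")
    return name


def transcribe_file(path: str, size: str = "base") -> str:
    """Transcribe one audio or video file and return the plain text.

    Assumption: the openai-whisper package and ffmpeg are installed. The first
    call for a given size downloads that model's weights.
    """
    import whisper  # imported lazily so normalize_size works without the package

    model = whisper.load_model(normalize_size(size))
    result = model.transcribe(path)
    return result["text"].strip()
```

For example, `transcribe_file("interview.mp3", size="small")` would trade some speed for better multilingual quality, where "interview.mp3" is a placeholder file name.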

Without a GPU, expect processing to take roughly 0.5×–2× the audio's duration for tiny/base/small, and 5×–20× the audio's duration for medium/large. For example, a 10-minute audio file at medium on CPU can take 50 minutes or more.
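
The arithmetic behind these estimates can be sketched as follows. The multipliers are the rough ranges quoted above, not measurements, and `CPU_SLOWDOWN` and `estimate_cpu_minutes` are hypothetical names used only for illustration.

```python
from typing import Tuple

# Rough CPU-only slowdown ranges from the text above:
# (processing time) / (audio duration), as (low, high).
CPU_SLOWDOWN = {
    "tiny": (0.5, 2.0),
    "base": (0.5, 2.0),
    "small": (0.5, 2.0),
    "medium": (5.0, 20.0),
    "large": (5.0, 20.0),
}


def estimate_cpu_minutes(audio_minutes: float, size: str) -> Tuple[float, float]:
    """Return a (low, high) estimate of CPU processing time in minutes."""
    low, high = CPU_SLOWDOWN[size]
    return audio_minutes * low, audio_minutes * high


low, high = estimate_cpu_minutes(10, "medium")
print(f"medium, 10 min of audio: about {low:.0f} to {high:.0f} minutes on CPU")
# prints "medium, 10 min of audio: about 50 to 200 minutes on CPU"
```

The same 10-minute file at base would come out at roughly 5 to 20 minutes, which is why base is a sensible starting point on machines without a GPU.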

Accepted file types: .mp3, .wav, .ogg, .flac, .aac, .m4a, .opus, .mp4, .webm, .mkv, .mov
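
When preparing files in a script before uploading, the accepted extensions listed above can be checked with a small helper like the one below. This is only a sketch: `ACCEPTED_EXTENSIONS` and `is_accepted` are hypothetical names, and the case-insensitive match is an assumption, since the page does not say how the uploader treats upper-case extensions.

```python
from pathlib import Path

# Extensions copied from the accepted list above.
ACCEPTED_EXTENSIONS = frozenset({
    ".mp3", ".wav", ".ogg", ".flac", ".aac", ".m4a",
    ".opus", ".mp4", ".webm", ".mkv", ".mov",
})


def is_accepted(filename: str) -> bool:
    """Return True if the file's final extension is in the accepted list.

    Assumption: matching is case-insensitive, so "TALK.MP3" counts as .mp3.
    """
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS
```

Only the final suffix is checked, so a name such as "lecture.final.mkv" passes while "notes.txt" and files with no extension do not.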