fix(voice): in-process Whisper STT works without an external binary on macOS (#3425)#3915
fix(voice): in-process Whisper STT works without an external binary on macOS (#3425)#3915YellowSnnowmann wants to merge 9 commits into
Conversation
Local Whisper and Piper fail to work on a fresh macOS install even when the binaries are present: GUI apps launched from Finder inherit a minimal launchd PATH that omits Homebrew dirs, and Piper's bundled espeak-ng engine is never located at spawn time. - paths: after the $PATH scan misses, resolve whisper/piper from standard Unix bin dirs (/opt/homebrew/bin, /usr/local/bin, …) so Finder-launched apps and brew installs both resolve. Precedence preserved: workspace install > env var > $PATH > standard dirs. - voice: spawn Piper with its install dir as cwd + ESPEAK_DATA_PATH at the bundled espeak-ng-data, and chmod 0755 the extracted binary, so TTS synthesizes with no external setup. Softened the espeak error message. - stt factory: route WhisperSttProvider to the in-process whisper-rs engine for 16 kHz WAV input (model-agnostic), falling back to the subprocess path for other containers. - settings: poll install status live so download progress advances, and gate the Test STT/TTS buttons until the selected local model finishes downloading. Tests cover binary-resolution precedence, espeak data-dir detection, in-process WAV routing + rejects, the chmod pass, and the install-gated Test buttons. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds in-process Whisper transcription and routing, improves Whisper/Piper binary resolution, updates the silent WAV fixture to 16kHz, changes MicComposer fallback handling for missing binaries, and disables VoicePanel STT/TTS tests until local models are installed. ChangesLocal Voice: in-process transcription, binary resolution, and UI gating
Sequence Diagram(s)sequenceDiagram
participant Caller
participant MicComposer
participant WhisperSttProvider
participant WhisperEngine
participant whisper_cli as whisper-cli subprocess
Caller->>MicComposer: submit recorded audio
MicComposer->>MicComposer: transcribeWithRetry
alt missing local binary
MicComposer->>MicComposer: stop retry loop
MicComposer->>WhisperSttProvider: fallback transcription path
end
WhisperSttProvider->>WhisperSttProvider: chooseWhisperRoute
alt in-process WAV route
WhisperSttProvider->>WhisperEngine: load model + transcribe_wav_bytes
WhisperEngine-->>WhisperSttProvider: transcription result
else subprocess route
WhisperSttProvider->>whisper_cli: transcribe_whisper
whisper_cli-->>WhisperSttProvider: transcription result
end
WhisperSttProvider-->>MicComposer: transcript
MicComposer-->>Caller: submit result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint install failed: private package registry requires authentication. Disable ESLint in CodeRabbit settings or use public packages. Comment |
…hange Runtime verification (core standalone + real 16 kHz WAV) confirmed the in-process whisper-rs route transcribes with no external binary. The same run disproved Component C's premise: the rhasspy piper `latest` macOS tarball ships NO `libespeak-ng.1.dylib` / `libonnxruntime.dylib` (only a .dSYM), so the piper binary aborts with `dyld: Library not loaded: @rpath/libespeak-ng.1.dylib` regardless of `current_dir`/`ESPEAK_DATA_PATH` — dyld resolves @rpath via the binary's LC_RPATH, not cwd. cwd/env cannot load a dylib that was never bundled. Revert the Piper TTS spawn changes (local_speech cwd/env + softened error, install_piper chmod, resolve_piper_dir_with_config) and keep this PR scoped to the verified STT work: standard-dir binary resolution (B), in-process factory routing (A2), and the install-aware Test buttons / live progress. Local Piper TTS needs a separate change to bundle the dylibs + patch rpath. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The live-poll useEffect (interval body that re-reads whisper/piper install status while a download is in flight) had no test, leaving its lines below the 80% diff-coverage gate. Add a fake-timer test that mounts in an `installing` state, advances past one 2s tick, and asserts the poller re-queries both engines and observes the completed status. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e-macos # Conflicts: # app/src/components/settings/panels/VoicePanel.tsx # app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
…ary) The voice_test_provider STT fixture was an 8kHz silent WAV. looks_like_wav accepts it (RIFF header), so it routed to the in-process whisper-rs engine, whose decoder only accepts 16kHz and rejected it — falling through to the whisper-cli subprocess, which errors 'binary not found' on a machine with no external binary (exactly the tinyhumansai#3425 case). Generate the fixture at 16kHz so Test STT exercises the real binary-free in-process path.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1940bee532
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/voice/factory/stt_providers.rs`:
- Around line 52-85: The shared whisper engine is only pinned during
ensure_engine_loaded, so a later concurrent request can unload or swap
`service.whisper` before the transcription work runs in the `spawn_blocking`
path. Fix this by holding the same load/inference guard across the entire
request flow in `ensure_engine_loaded` and the transcription call site, or by
switching `LocalAiService::whisper` to per-model cached instances so one request
cannot mutate another request’s engine selection. Use the existing symbols
`ensure_engine_loaded`, `service.whisper_load_lock`, and
`whisper_engine::load_engine` to locate and adjust the locking/model ownership
logic.
- Around line 40-45: `choose_whisper_route()` currently routes non-WAV inputs to
`WhisperRoute::Subprocess`, but the readiness logic still appears to treat
`whisper_in_process + model` as sufficient for local STT. Update the ready-state
check around `choose_whisper_route` and the related STT provider setup in
`stt_providers.rs` so local STT is only advertised as ready when `whisper-cli`
is also available for native `MediaRecorder` blobs, or ensure those inputs are
normalized to 16 kHz WAV before reaching this routing decision. Keep the
condition aligned with the actual runtime path used by `choose_whisper_route()`
and the subprocess fallback.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 83db93ca-3484-4e79-bb9e-ca35664a534d
📒 Files selected for processing (8)
app/src/components/settings/panels/VoicePanel.tsxapp/src/components/settings/panels/__tests__/VoicePanel.test.tsxsrc/openhuman/inference/local/service/whisper_engine.rssrc/openhuman/inference/paths.rssrc/openhuman/voice/factory/stt_providers.rssrc/openhuman/voice/schemas/handlers/provider_server.rssrc/openhuman/voice/schemas/helpers.rssrc/openhuman/voice/schemas_tests.rs
…nsai#3425) MicComposer sends the native webm/mp4 blob first for speed, re-encoding to 16kHz WAV only as a fallback. On a no-binary macOS install the native codec routes to the whisper-cli subprocess and errors 'binary not found' — which was classified transient, so every dictation burned 2 backoff retries before the WAV/in-process fallback. A missing binary never reappears on retry and the native codec can't use the in-process engine, so bail the native retry loop immediately while keeping it eligible for the WAV fallback (NOT a permanent error, which would skip the fallback entirely). Addresses CodeRabbit review.
…inyhumansai#3425) ensure_engine_loaded released the load lock before transcription started, so a concurrent dispatch for a different model size could unload/reload the single-model engine mid-flight — transcribing with the wrong weights or dropping the request onto the subprocess path. Hold the load lock across both the load check and the inference so they form one critical section. Lock order (load_lock -> handle) is unchanged, so no deadlock. Addresses Codex P2 review.
Summary
/opt/homebrew/bin,/usr/local/bin, …) after the$PATHscan misses, so Finder-launched macOS apps (which inherit a minimal launchdPATH) andbrew-installed binaries both resolve. Precedence preserved: workspace install → env var →$PATH→ standard dirs.WhisperSttProviderto the in-process, model-agnosticwhisper-rsengine for 16 kHz WAV input, falling back to the unchanged subprocess path for other audio containers / errors.Problem
On a fresh macOS install, local Whisper STT "doesn't work" even when the binary is installed and runs fine in a terminal (#3425): macOS GUI apps launched from Finder inherit a minimal launchd
PATHthat omits Homebrew dirs, so the resolver can't findwhisper-cli. Separately, the factory STT provider was subprocess-only, so it depended on an external binary even though a model-agnostic in-process engine was already available.Solution
inference/paths.rs: add astandard_unix_bin_dirs()fallback scan (Unix-gated; no-op on Windows) used after the$PATHmiss.voice/factory/stt_providers.rs+whisper_engine.rs:choose_whisper_routesends 16 kHz WAV to the in-process engine viatranscribe_wav_bytes, with reload-on-model-mismatch; any miss (non-WAV, flag off, load/inference error) falls back to the existing subprocess path unchanged.components/settings/panels/VoicePanel.tsx: live install-status polling (only while a download is in flight) + Test buttons gated on local-engineinstalledstate.Verification (runtime, not just unit tests)
Verified by running
openhuman-corestandalone and driving the voice JSON-RPCs directly with a real 16 kHz speech WAV — no microphone, nowhisper-clionPATH.Repro:
Result:
{"provider":"whisper","text":"testing 1, 2, 3, 4."}voice_statusreportedwhisper_binary: null, so the subprocess path would have errored — a correct transcript proves the in-process whisper-rs route ran end-to-end with no external binary, which is the fix.Also confirmed: install progress advances through the live status table (
downloading → extracting → install complete), and the Test buttons enable only once the model reportsinstalled.Scope note — local Piper TTS deliberately excluded
An earlier revision of this branch tried to make local Piper TTS work by spawning piper with its install dir as
current_dir+ESPEAK_DATA_PATH. Runtime verification disproved that approach and it has been reverted:rhasspy/piperlatestmacOS tarball ships nolibespeak-ng.1.dylib/libonnxruntime.dylib(only a.dSYM). The piper binary aborts withdyld: Library not loaded: @rpath/libespeak-ng.1.dylib.@rpathdylibs via the binary'sLC_RPATH, not the working directory, socurrent_dir/ESPEAK_DATA_PATHcannot fix a dylib that was never bundled.Making local Piper TTS work on macOS needs a separate change (bundle the dylibs at install + patch the rpath via
install_name_tool, sinceDYLD_*is SIP-stripped in a signed app). Tracked as follow-up; this PR is STT-only.Submission Checklist
VoicePanel.test.tsx44/44).diff-coverruns in CI.N/A: additive routing/UX change.## Related.Impact
GGML_NATIVE=OFFworkaround.Related
AI Authored PR Metadata
Linear Issue
Commit & Branch
fix/3425-local-voice-macosValidation Run
pnpm --filter openhuman-app format:checkpnpm typecheckVoicePanel.test.tsx(44/44);cargo test --lib -- paths:: voice::factory::stt_providers whisper_engine(131 passed)cargo fmt --check,GGML_NATIVE=OFF cargo checkGGML_NATIVE=OFF pnpm rust:checkvoice_stt_dispatchwith a real 16 kHz WAV → correct in-process transcript, no external binary.Behavior Changes
Parity Contract
Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Bug Fixes
Tests