fix(voice): in-process Whisper STT works without an external binary on macOS (#3425) by YellowSnnowmann · Pull Request #3915 · tinyhumansai/openhuman

YellowSnnowmann · 2026-06-22T12:48:22Z

Summary

Resolve the local Whisper binary from standard Unix bin dirs (/opt/homebrew/bin, /usr/local/bin, …) after the $PATH scan misses, so Finder-launched macOS apps (which inherit a minimal launchd PATH) and brew-installed binaries both resolve. Precedence preserved: workspace install → env var → $PATH → standard dirs.
Route the STT factory's WhisperSttProvider to the in-process, model-agnostic whisper-rs engine for 16 kHz WAV input, falling back to the unchanged subprocess path for other audio containers / errors.
Settings → Voice: poll install status live so download progress advances, and gate the Test STT / Test TTS buttons until the selected local model has finished downloading.

Problem

On a fresh macOS install, local Whisper STT "doesn't work" even when the binary is installed and runs fine in a terminal (#3425): macOS GUI apps launched from Finder inherit a minimal launchd PATH that omits Homebrew dirs, so the resolver can't find whisper-cli. Separately, the factory STT provider was subprocess-only, so it depended on an external binary even though a model-agnostic in-process engine was already available.

Solution

inference/paths.rs: add a standard_unix_bin_dirs() fallback scan (Unix-gated; no-op on Windows) used after the $PATH miss.
voice/factory/stt_providers.rs + whisper_engine.rs: choose_whisper_route sends 16 kHz WAV to the in-process engine via transcribe_wav_bytes, with reload-on-model-mismatch; any miss (non-WAV, flag off, load/inference error) falls back to the existing subprocess path unchanged.
components/settings/panels/VoicePanel.tsx: live install-status polling (only while a download is in flight) + Test buttons gated on local-engine installed state.
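
The resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the code in inference/paths.rs: the function names `standard_unix_bin_dirs` and `resolve_binary_in_dirs` come from this PR, but their bodies, the `resolve_binary` wrapper, and its parameters are assumptions, and the directory list only contains the two directories named in the summary.

```rust
use std::path::{Path, PathBuf};

/// Fallback directories scanned after the $PATH scan misses.
/// Only the two directories named in the PR summary are listed here;
/// the real list is longer. Empty on non-Unix targets.
fn standard_unix_bin_dirs() -> Vec<PathBuf> {
    if cfg!(unix) {
        vec![
            PathBuf::from("/opt/homebrew/bin"),
            PathBuf::from("/usr/local/bin"),
        ]
    } else {
        Vec::new()
    }
}

/// Return the first `dir/name` that exists as a regular file.
/// (Existence only; a real resolver may also check the executable bit.)
fn resolve_binary_in_dirs(name: &str, dirs: &[PathBuf]) -> Option<PathBuf> {
    dirs.iter()
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Hypothetical wrapper showing the precedence:
/// workspace install -> env var -> $PATH dirs -> standard dirs.
fn resolve_binary(
    name: &str,
    workspace_install: Option<&Path>,
    env_override: Option<&Path>,
    path_dirs: &[PathBuf],
    fallback_dirs: &[PathBuf],
) -> Option<PathBuf> {
    workspace_install
        .filter(|p| p.is_file())
        .or(env_override.filter(|p| p.is_file()))
        .map(Path::to_path_buf)
        .or_else(|| resolve_binary_in_dirs(name, path_dirs))
        .or_else(|| resolve_binary_in_dirs(name, fallback_dirs))
}
```

A caller would typically build `path_dirs` with `std::env::split_paths` over `std::env::var_os("PATH")` and pass `standard_unix_bin_dirs()` as `fallback_dirs`. Because the fallback is consulted last, an existing install that already resolves through an earlier step keeps resolving to the same binary.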

Verification (runtime, not just unit tests)

Verified by running openhuman-core standalone and driving the voice JSON-RPCs directly with a real 16 kHz speech WAV — no microphone, no whisper-cli on PATH.

Repro:

# real 16 kHz mono speech clip
say "testing one two three four" -o /tmp/s.aiff
afconvert -f WAVE -d LEI16@16000 -c 1 /tmp/s.aiff /tmp/s.wav

GGML_NATIVE=OFF cargo build --bin openhuman-core
OPENHUMAN_CORE_TOKEN=t OPENHUMAN_CORE_PORT=7799 OPENHUMAN_WORKSPACE=/tmp/oh \
  ./target/debug/openhuman-core run --jsonrpc-only &

# install tiny model, then dispatch the WAV through the whisper factory provider
# POST openhuman.inference_install_whisper {"model_size":"tiny"}
# POST openhuman.voice_stt_dispatch {"provider":"whisper","model":"tiny","audio_base64":"<b64 of s.wav>","mime_type":"audio/wav"}

Result:

{"provider":"whisper","text":"testing 1, 2, 3, 4."}

voice_status reported whisper_binary: null, so the subprocess path would have errored — a correct transcript proves the in-process whisper-rs route ran end-to-end with no external binary, which is the fix.

Also confirmed: install progress advances through the live status table (downloading → extracting → install complete), and the Test buttons enable only once the model reports installed.

Scope note — local Piper TTS deliberately excluded

An earlier revision of this branch tried to make local Piper TTS work by spawning piper with its install dir as current_dir + ESPEAK_DATA_PATH. Runtime verification disproved that approach and it has been reverted:

The official rhasspy/piper latest macOS tarball ships no libespeak-ng.1.dylib / libonnxruntime.dylib (only a .dSYM). The piper binary aborts with dyld: Library not loaded: @rpath/libespeak-ng.1.dylib.
macOS dyld resolves @rpath dylibs via the binary's LC_RPATH, not the working directory, so current_dir/ESPEAK_DATA_PATH cannot fix a dylib that was never bundled.

Making local Piper TTS work on macOS needs a separate change (bundle the dylibs at install + patch the rpath via install_name_tool, since DYLD_* is SIP-stripped in a signed app). Tracked as follow-up; this PR is STT-only.

Submission Checklist

Tests added or updated — binary-resolution precedence, in-process WAV routing + non-WAV/non-16 kHz rejects, and the install-gated Test buttons (VoicePanel.test.tsx 44/44).
Diff coverage ≥ 80% — focused Rust + Vitest tests added for every changed unit; merged diff-cover runs in CI.
Coverage matrix updated — N/A: additive routing/UX change.
No new external network dependencies introduced.
Linked issue referenced under ## Related.

Impact

Platform: desktop (macOS primary; the binary-resolution fallback is Unix-gated, no-op on Windows). Apple Silicon Rust builds use the documented GGML_NATIVE=OFF workaround.
Compatibility: binary-resolution precedence unchanged for existing installs; STT falls back to the prior subprocess path for any non-WAV input, so there is no path where a previously-working case regresses.

AI Authored PR Metadata

Linear Issue

Key: N/A
URL: N/A

Commit & Branch

Branch: fix/3425-local-voice-macos
Commit SHA: 33c6a00

Validation Run

pnpm --filter openhuman-app format:check
pnpm typecheck
Focused tests: VoicePanel.test.tsx (44/44); cargo test --lib -- paths:: voice::factory::stt_providers whisper_engine (131 passed)
Rust fmt/check: cargo fmt --check, GGML_NATIVE=OFF cargo check
Tauri fmt/check: GGML_NATIVE=OFF pnpm rust:check
Runtime: core standalone + voice_stt_dispatch with a real 16 kHz WAV → correct in-process transcript, no external binary.

Behavior Changes

Intended behavior change: in-process Whisper transcription for all model sizes with no external binary (16 kHz WAV input only; other containers still use the subprocess path); Test buttons enable once a model finishes downloading.
User-visible effect: local STT works on a fresh macOS install; live download progress.

Parity Contract

Legacy behavior preserved: binary-resolution precedence order unchanged; non-WAV STT path unchanged.
Guard/fallback/dispatch parity checks: in-process route only for 16 kHz WAV + flag on; subprocess fallback otherwise.
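
The guard and fallback described here can be pictured with a small sketch. It assumes a header-only WAV sniff and a boolean feature flag; `choose_whisper_route` and `WhisperRoute::Subprocess` are names from this PR, while the `InProcess` variant name, the `transcribe_with_fallback` helper, and the closure-based signatures are hypothetical stand-ins for the real engine calls.

```rust
#[derive(Debug, PartialEq, Eq)]
enum WhisperRoute {
    InProcess,
    Subprocess,
}

/// Header sniff only: a RIFF container whose form type is WAVE.
/// The sample rate is not inspected at this stage.
fn looks_like_wav(bytes: &[u8]) -> bool {
    bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WAVE"
}

/// In-process only when the flag is on and the input looks like WAV.
fn choose_whisper_route(in_process_enabled: bool, audio: &[u8]) -> WhisperRoute {
    if in_process_enabled && looks_like_wav(audio) {
        WhisperRoute::InProcess
    } else {
        WhisperRoute::Subprocess
    }
}

/// Hypothetical dispatcher: any in-process miss (decode, load, or
/// inference error) falls through to the subprocess path.
fn transcribe_with_fallback<I, S>(
    in_process_enabled: bool,
    audio: &[u8],
    in_process: I,
    subprocess: S,
) -> Result<String, String>
where
    I: FnOnce(&[u8]) -> Result<String, String>,
    S: FnOnce(&[u8]) -> Result<String, String>,
{
    if choose_whisper_route(in_process_enabled, audio) == WhisperRoute::InProcess {
        if let Ok(text) = in_process(audio) {
            return Ok(text);
        }
    }
    subprocess(audio)
}
```

The point of this shape is that the in-process route is purely additive: turning the flag off, or feeding any non-WAV container, yields exactly the call the subprocess path received before.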

Duplicate / Superseded PR Handling

Duplicate PR(s): None
Canonical PR: This PR
Resolution: N/A

Summary by CodeRabbit

New Features
- Local STT/TTS “Test” actions now reflect model readiness by disabling when the selected local Whisper/Piper provider isn’t installed and showing an explanatory message.
Bug Fixes
- Improved Whisper STT routing by detecting valid WAV input and using the in-process engine when eligible, otherwise falling back reliably.
- Better local binary discovery on desktop by extending Whisper/Piper path resolution.
- Updated self-test voice audio fixtures to 16kHz to match expectations.
- Mic input flow now skips retry backoff when the local STT binary is missing, moving directly to the in-process path.
Tests
- Added/expanded coverage for the new gating, WAV routing, path resolution, and 16kHz fixtures.

Local Whisper and Piper fail to work on a fresh macOS install even when the binaries are present: GUI apps launched from Finder inherit a minimal launchd PATH that omits Homebrew dirs, and Piper's bundled espeak-ng engine is never located at spawn time.

- paths: after the $PATH scan misses, resolve whisper/piper from standard Unix bin dirs (/opt/homebrew/bin, /usr/local/bin, …) so Finder-launched apps and brew installs both resolve. Precedence preserved: workspace install > env var > $PATH > standard dirs.
- voice: spawn Piper with its install dir as cwd + ESPEAK_DATA_PATH at the bundled espeak-ng-data, and chmod 0755 the extracted binary, so TTS synthesizes with no external setup. Softened the espeak error message.
- stt factory: route WhisperSttProvider to the in-process whisper-rs engine for 16 kHz WAV input (model-agnostic), falling back to the subprocess path for other containers.
- settings: poll install status live so download progress advances, and gate the Test STT/TTS buttons until the selected local model finishes downloading.

Tests cover binary-resolution precedence, espeak data-dir detection, in-process WAV routing + rejects, the chmod pass, and the install-gated Test buttons.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-22T12:48:32Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 437e8ddf-2b9e-457a-99ab-21b3e13b7f79

📥 Commits

Reviewing files that changed from the base of the PR and between 1940bee and 0e98891.

📒 Files selected for processing (3)

app/src/features/human/MicComposer.test.tsx
app/src/features/human/MicComposer.tsx
src/openhuman/voice/factory/stt_providers.rs

🚧 Files skipped from review as they are similar to previous changes (1)

src/openhuman/voice/factory/stt_providers.rs

Walkthrough

Adds in-process Whisper transcription and routing, improves Whisper/Piper binary resolution, updates the silent WAV fixture to 16kHz, changes MicComposer fallback handling for missing binaries, and disables VoicePanel STT/TTS tests until local models are installed.

Changes

Local Voice: in-process transcription, binary resolution, and UI gating

| Layer / File(s) | Summary |
|---|---|
| WAV detection and in-process transcription helpers: `src/openhuman/inference/local/service/whisper_engine.rs` | Adds `looks_like_wav` and `transcribe_wav_bytes`; unit tests cover header sniffing, decode failures, and non-16 kHz rejection with a PCM16 WAV test helper. |
| Whisper/Piper binary resolution with Unix fallback dirs: `src/openhuman/inference/paths.rs` | Adds `standard_unix_bin_dirs()` and `resolve_binary_in_dirs()`; updates Whisper and Piper resolution to scan PATH and then standard Unix directories; adds tests for ordering, platform membership, and workspace precedence. |
| Whisper routing and silent WAV fixture: `src/openhuman/voice/factory/stt_providers.rs`, `src/openhuman/voice/schemas/helpers.rs`, `src/openhuman/voice/schemas/handlers/provider_server.rs`, `src/openhuman/voice/schemas_tests.rs` | Adds local Whisper route selection, model loading, and in-process fallback handling; updates the silent WAV fixture to 16 kHz with matching comment and test expectations; adds route-selection tests. |
| MicComposer fallback and VoicePanel test gating: `app/src/features/human/MicComposer.tsx`, `app/src/features/human/MicComposer.test.tsx`, `app/src/components/settings/panels/VoicePanel.tsx`, `app/src/components/settings/panels/__tests__/VoicePanel.test.tsx` | Stops MicComposer native retries on missing local binaries and adds coverage; disables VoicePanel Test STT/TTS when local Whisper/Piper installs are not ready and adds coverage. |

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant MicComposer
  participant WhisperSttProvider
  participant WhisperEngine
  participant whisper_cli as whisper-cli subprocess

  Caller->>MicComposer: submit recorded audio
  MicComposer->>MicComposer: transcribeWithRetry
  alt missing local binary
    MicComposer->>MicComposer: stop retry loop
    MicComposer->>WhisperSttProvider: fallback transcription path
  end
  WhisperSttProvider->>WhisperSttProvider: choose_whisper_route
  alt in-process WAV route
    WhisperSttProvider->>WhisperEngine: load model + transcribe_wav_bytes
    WhisperEngine-->>WhisperSttProvider: transcription result
  else subprocess route
    WhisperSttProvider->>whisper_cli: transcribe_whisper
    whisper_cli-->>WhisperSttProvider: transcription result
  end
  WhisperSttProvider-->>MicComposer: transcript
  MicComposer-->>Caller: submit result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

tinyhumansai/openhuman#3861: Modifies VoicePanel.tsx to gate provider actions on Whisper/Piper install readiness, touching the same install-status logic extended here.

Suggested reviewers

oxoxDev

Poem

🐇 I hopped through WAVs so quiet and small,
16kHz silence now answers the call.
If binaries hide, the retries stand down,
Then in-process whispers can wear the crown.
A bunny cheers softly: the voice path is clear!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Out of Scope Changes check | ⚠️ Warning | The PR also changes Piper/TTS resolution and TTS test-button gating, which are outside the stated Whisper STT scope. | Remove the Piper/TTS-related changes or split them into a separate follow-up PR focused on Whisper STT only. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main macOS Whisper STT fix and matches the changed code. |
| Linked Issues check | ✅ Passed | The STT route, WAV handling, model loading, and 16 kHz fixture changes address the local voice failure on macOS. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed: private package registry requires authentication. Disable ESLint in CodeRabbit settings or use public packages.

@rpath

…hange

Runtime verification (core standalone + real 16 kHz WAV) confirmed the in-process whisper-rs route transcribes with no external binary. The same run disproved Component C's premise: the rhasspy piper `latest` macOS tarball ships NO `libespeak-ng.1.dylib` / `libonnxruntime.dylib` (only a .dSYM), so the piper binary aborts with `dyld: Library not loaded: @rpath/libespeak-ng.1.dylib` regardless of `current_dir`/`ESPEAK_DATA_PATH`, because dyld resolves @rpath via the binary's LC_RPATH, not cwd. cwd/env cannot load a dylib that was never bundled.

Revert the Piper TTS spawn changes (local_speech cwd/env + softened error, install_piper chmod, resolve_piper_dir_with_config) and keep this PR scoped to the verified STT work: standard-dir binary resolution (B), in-process factory routing (A2), and the install-aware Test buttons / live progress. Local Piper TTS needs a separate change to bundle the dylibs + patch rpath.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The live-poll useEffect (interval body that re-reads whisper/piper install status while a download is in flight) had no test, leaving its lines below the 80% diff-coverage gate. Add a fake-timer test that mounts in an `installing` state, advances past one 2s tick, and asserts the poller re-queries both engines and observes the completed status. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e-macos # Conflicts: # app/src/components/settings/panels/VoicePanel.tsx # app/src/components/settings/panels/__tests__/VoicePanel.test.tsx

…ary) The voice_test_provider STT fixture was an 8kHz silent WAV. looks_like_wav accepts it (RIFF header), so it routed to the in-process whisper-rs engine, whose decoder only accepts 16kHz and rejected it — falling through to the whisper-cli subprocess, which errors 'binary not found' on a machine with no external binary (exactly the tinyhumansai#3425 case). Generate the fixture at 16kHz so Test STT exercises the real binary-free in-process path.
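
To make the failure mode concrete: a header sniff accepts any RIFF/WAVE file, while the sample rate lives inside the `fmt ` chunk and has to be read separately. The sketch below is an assumption about how such a check could look, not the decoder used by whisper_engine.rs; `wav_sample_rate`, `is_16khz_wav`, `le_u32`, and `pcm16_mono_wav` are hypothetical helper names.

```rust
/// Decode a little-endian u32 from exactly four bytes.
fn le_u32(b: &[u8]) -> Option<u32> {
    if b.len() == 4 {
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    } else {
        None
    }
}

/// Walk the RIFF chunks and return the sample rate from the `fmt ` chunk.
fn wav_sample_rate(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12usize;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = le_u32(&bytes[pos + 4..pos + 8])? as usize;
        let body = pos + 8;
        if id == b"fmt " {
            // fmt body layout: format tag (2), channels (2), sample rate (4), ...
            return le_u32(bytes.get(body + 4..body + 8)?);
        }
        // Chunks are word-aligned: an odd-sized chunk carries one pad byte.
        pos = body.checked_add(size)?.checked_add(size & 1)?;
    }
    None
}

/// True only for WAV input whose header declares 16 kHz.
fn is_16khz_wav(bytes: &[u8]) -> bool {
    wav_sample_rate(bytes) == Some(16_000)
}

/// Build a minimal PCM16 mono WAV, e.g. a silent test fixture.
fn pcm16_mono_wav(sample_rate: u32, samples: &[i16]) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let mut out = Vec::with_capacity(44 + data_len as usize);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag: PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // channels: mono
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // byte rate
    out.extend_from_slice(&2u16.to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

With these helpers, `pcm16_mono_wav(8_000, ..)` still starts with a valid RIFF/WAVE header but fails `is_16khz_wav`, which mirrors why the old 8 kHz fixture was routed in-process and then rejected.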

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1940bee532

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/voice/factory/stt_providers.rs`:
- Around line 52-85: The shared whisper engine is only pinned during
ensure_engine_loaded, so a later concurrent request can unload or swap
`service.whisper` before the transcription work runs in the `spawn_blocking`
path. Fix this by holding the same load/inference guard across the entire
request flow in `ensure_engine_loaded` and the transcription call site, or by
switching `LocalAiService::whisper` to per-model cached instances so one request
cannot mutate another request’s engine selection. Use the existing symbols
`ensure_engine_loaded`, `service.whisper_load_lock`, and
`whisper_engine::load_engine` to locate and adjust the locking/model ownership
logic.
- Around line 40-45: `choose_whisper_route()` currently routes non-WAV inputs to
`WhisperRoute::Subprocess`, but the readiness logic still appears to treat
`whisper_in_process + model` as sufficient for local STT. Update the ready-state
check around `choose_whisper_route` and the related STT provider setup in
`stt_providers.rs` so local STT is only advertised as ready when `whisper-cli`
is also available for native `MediaRecorder` blobs, or ensure those inputs are
normalized to 16 kHz WAV before reaching this routing decision. Keep the
condition aligned with the actual runtime path used by `choose_whisper_route()`
and the subprocess fallback.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 83db93ca-3484-4e79-bb9e-ca35664a534d

📥 Commits

Reviewing files that changed from the base of the PR and between 0d6acfd and 1940bee.

📒 Files selected for processing (8)

app/src/components/settings/panels/VoicePanel.tsx
app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
src/openhuman/inference/local/service/whisper_engine.rs
src/openhuman/inference/paths.rs
src/openhuman/voice/factory/stt_providers.rs
src/openhuman/voice/schemas/handlers/provider_server.rs
src/openhuman/voice/schemas/helpers.rs
src/openhuman/voice/schemas_tests.rs

…nsai#3425) MicComposer sends the native webm/mp4 blob first for speed, re-encoding to 16kHz WAV only as a fallback. On a no-binary macOS install the native codec routes to the whisper-cli subprocess and errors 'binary not found' — which was classified transient, so every dictation burned 2 backoff retries before the WAV/in-process fallback. A missing binary never reappears on retry and the native codec can't use the in-process engine, so bail the native retry loop immediately while keeping it eligible for the WAV fallback (NOT a permanent error, which would skip the fallback entirely). Addresses CodeRabbit review.

…inyhumansai#3425) ensure_engine_loaded released the load lock before transcription started, so a concurrent dispatch for a different model size could unload/reload the single-model engine mid-flight — transcribing with the wrong weights or dropping the request onto the subprocess path. Hold the load lock across both the load check and the inference so they form one critical section. Lock order (load_lock -> handle) is unchanged, so no deadlock. Addresses Codex P2 review.
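
The critical section described in this commit can be illustrated with a toy model. Everything here is a simplified assumption: `Service`, `Engine`, and `run_concurrently` are hypothetical, the "engine" only records which model it was loaded with, and the real code runs inference inside `spawn_blocking` rather than plain threads.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for a loaded single-model engine.
struct Engine {
    model: String,
}

impl Engine {
    fn transcribe(&self, _audio: &[u8]) -> String {
        format!("transcribed with {}", self.model)
    }
}

struct Service {
    load_lock: Mutex<()>,
    whisper: Mutex<Option<Engine>>,
}

impl Service {
    fn new() -> Self {
        Service {
            load_lock: Mutex::new(()),
            whisper: Mutex::new(None),
        }
    }

    /// Load check and inference form one critical section: the load
    /// guard is held until the transcript has been produced.
    /// Lock order is always load_lock -> whisper handle.
    fn transcribe(&self, model: &str, audio: &[u8]) -> String {
        let _load_guard = self.load_lock.lock().unwrap();
        let mut slot = self.whisper.lock().unwrap();
        if slot.as_ref().map_or(true, |e| e.model != model) {
            // Reload on model mismatch.
            *slot = Some(Engine { model: model.to_string() });
        }
        let text = slot.as_ref().expect("engine loaded above").transcribe(audio);
        text
    }
}

/// Fire one request per entry in `models` from separate threads and
/// collect (requested model, transcript) pairs.
fn run_concurrently(service: Arc<Service>, models: &[&'static str]) -> Vec<(String, String)> {
    let handles: Vec<_> = models
        .iter()
        .map(|&model| {
            let service = Arc::clone(&service);
            thread::spawn(move || (model.to_string(), service.transcribe(model, b"")))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Under this structure a request for `tiny` cannot observe weights loaded for `base`, at the cost of serializing requests that target different model sizes. The per-model cache suggested in the review would trade memory for that lost concurrency.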

…e-macos

YellowSnnowmann changed the title ~~fix(voice): local STT/TTS work without an external binary on macOS (#3425)~~ fix(voice): in-process Whisper STT works without an external binary on macOS (#3425) Jun 22, 2026

YellowSnnowmann and others added 3 commits June 22, 2026 21:14

Merge remote-tracking branch 'upstream/main' into fix/3425-local-voic…

82ddc9a

…e-macos # Conflicts: # app/src/components/settings/panels/VoicePanel.tsx # app/src/components/settings/panels/__tests__/VoicePanel.test.tsx

YellowSnnowmann marked this pull request as ready for review June 24, 2026 14:05

YellowSnnowmann requested a review from a team June 24, 2026 14:05

chatgpt-codex-connector Bot reviewed Jun 24, 2026

Comment thread src/openhuman/voice/factory/stt_providers.rs Outdated

Comment thread app/src/components/settings/panels/VoicePanel.tsx

coderabbitai Bot added the bug and rust-core (Core Rust runtime in src/: CLI, core_server, shared infrastructure) labels Jun 24, 2026

coderabbitai Bot requested changes Jun 24, 2026

Comment thread src/openhuman/voice/factory/stt_providers.rs

Comment thread src/openhuman/voice/factory/stt_providers.rs Outdated

YellowSnnowmann added 2 commits June 24, 2026 20:39

coderabbitai Bot approved these changes Jun 24, 2026

YellowSnnowmann added 2 commits June 25, 2026 12:36

Merge remote-tracking branch 'upstream/main' into fix/3425-local-voic…

7eee932

…e-macos

Merge branch main into fix/3425-local-voice-macos

8624967

YellowSnnowmann wants to merge 9 commits into tinyhumansai:main from YellowSnnowmann:fix/3425-local-voice-macos
