
fix(voice): in-process Whisper STT works without an external binary on macOS (#3425) #3915

Open
YellowSnnowmann wants to merge 9 commits into tinyhumansai:main from YellowSnnowmann:fix/3425-local-voice-macos


Conversation

@YellowSnnowmann YellowSnnowmann commented Jun 22, 2026

Contributor

Summary

  • Resolve the local Whisper binary from standard Unix bin dirs (/opt/homebrew/bin, /usr/local/bin, …) after the $PATH scan misses, so Finder-launched macOS apps (which inherit a minimal launchd PATH) and brew-installed binaries both resolve. Precedence preserved: workspace install → env var → $PATH → standard dirs.
  • Route the STT factory's WhisperSttProvider to the in-process, model-agnostic whisper-rs engine for 16 kHz WAV input, falling back to the unchanged subprocess path for other audio containers / errors.
  • Settings → Voice: poll install status live so download progress advances, and gate the Test STT / Test TTS buttons until the selected local model has finished downloading.

Problem

On a fresh macOS install, local Whisper STT "doesn't work" even when the binary is installed and runs fine in a terminal (#3425): macOS GUI apps launched from Finder inherit a minimal launchd PATH that omits Homebrew dirs, so the resolver can't find whisper-cli. Separately, the factory STT provider was subprocess-only, so it depended on an external binary even though a model-agnostic in-process engine was already available.

Solution

  • inference/paths.rs: add a standard_unix_bin_dirs() fallback scan (Unix-gated; no-op on Windows) used after the $PATH miss.
  • voice/factory/stt_providers.rs + whisper_engine.rs: choose_whisper_route sends 16 kHz WAV to the in-process engine via transcribe_wav_bytes, with reload-on-model-mismatch; any miss (non-WAV, flag off, load/inference error) falls back to the existing subprocess path unchanged.
  • components/settings/panels/VoicePanel.tsx: live install-status polling (only while a download is in flight) + Test buttons gated on local-engine installed state.
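The resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the code in `inference/paths.rs`: the function names `standard_unix_bin_dirs` and `resolve_binary_in_dirs` come from this PR, but their signatures, the `resolve_binary` wrapper, and the injected `exists` predicate are assumptions, and only the two directories named in the summary are listed (the real list is elided there).

```rust
use std::path::{Path, PathBuf};

/// Fallback dirs scanned after `$PATH` misses. Only the two dirs named in the
/// PR summary are listed here; the real list is longer and is not reproduced.
fn standard_unix_bin_dirs() -> Vec<PathBuf> {
    if cfg!(unix) {
        vec![
            PathBuf::from("/opt/homebrew/bin"),
            PathBuf::from("/usr/local/bin"),
        ]
    } else {
        Vec::new() // no-op on Windows
    }
}

/// Return the first `dir/name` for which `exists` reports true.
/// (Signature is an assumption; `exists` is injected so the logic is testable.)
fn resolve_binary_in_dirs(
    name: &str,
    dirs: &[PathBuf],
    exists: &dyn Fn(&Path) -> bool,
) -> Option<PathBuf> {
    dirs.iter().map(|d| d.join(name)).find(|p| exists(p.as_path()))
}

/// Hypothetical wrapper showing the precedence:
/// workspace install -> env var -> $PATH -> standard dirs.
fn resolve_binary(
    name: &str,
    workspace_install: Option<PathBuf>,
    env_override: Option<PathBuf>,
    path_dirs: &[PathBuf],
    exists: &dyn Fn(&Path) -> bool,
) -> Option<PathBuf> {
    workspace_install
        .filter(|p| exists(p.as_path()))
        .or_else(|| env_override.filter(|p| exists(p.as_path())))
        .or_else(|| resolve_binary_in_dirs(name, path_dirs, exists))
        .or_else(|| resolve_binary_in_dirs(name, &standard_unix_bin_dirs(), exists))
}
```

Because the standard dirs are consulted last, an existing workspace install, env override, or `$PATH` hit still wins, which is what keeps the precedence unchanged for existing installs.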

Verification (runtime, not just unit tests)

Verified by running openhuman-core standalone and driving the voice JSON-RPCs directly with a real 16 kHz speech WAV — no microphone, no whisper-cli on PATH.

Repro:

# real 16 kHz mono speech clip
say "testing one two three four" -o /tmp/s.aiff
afconvert -f WAVE -d LEI16@16000 -c 1 /tmp/s.aiff /tmp/s.wav

GGML_NATIVE=OFF cargo build --bin openhuman-core
OPENHUMAN_CORE_TOKEN=t OPENHUMAN_CORE_PORT=7799 OPENHUMAN_WORKSPACE=/tmp/oh \
  ./target/debug/openhuman-core run --jsonrpc-only &

# install tiny model, then dispatch the WAV through the whisper factory provider
# POST openhuman.inference_install_whisper {"model_size":"tiny"}
# POST openhuman.voice_stt_dispatch {"provider":"whisper","model":"tiny","audio_base64":"<b64 of s.wav>","mime_type":"audio/wav"}

Result:

{"provider":"whisper","text":"testing 1, 2, 3, 4."}

voice_status reported whisper_binary: null, so the subprocess path would have errored — a correct transcript proves the in-process whisper-rs route ran end-to-end with no external binary, which is the fix.

Also confirmed: install progress advances through the live status table (downloading → extracting → install complete), and the Test buttons enable only once the model reports installed.

Scope note — local Piper TTS deliberately excluded

An earlier revision of this branch tried to make local Piper TTS work by spawning piper with its install dir as current_dir and with ESPEAK_DATA_PATH pointing at the bundled espeak-ng-data. Runtime verification disproved that approach, and it has been reverted:

  • The official rhasspy/piper latest macOS tarball ships no libespeak-ng.1.dylib / libonnxruntime.dylib (only a .dSYM). The piper binary aborts with dyld: Library not loaded: @rpath/libespeak-ng.1.dylib.
  • macOS dyld resolves @rpath dylibs via the binary's LC_RPATH, not the working directory, so current_dir/ESPEAK_DATA_PATH cannot fix a dylib that was never bundled.

Making local Piper TTS work on macOS needs a separate change (bundle the dylibs at install + patch the rpath via install_name_tool, since DYLD_* is SIP-stripped in a signed app). Tracked as follow-up; this PR is STT-only.

Submission Checklist

  • Tests added or updated — binary-resolution precedence, in-process WAV routing + non-WAV/non-16 kHz rejects, and the install-gated Test buttons (VoicePanel.test.tsx 44/44).
  • Diff coverage ≥ 80% — focused Rust + Vitest tests added for every changed unit; merged diff-cover runs in CI.
  • Coverage matrix updated — N/A: additive routing/UX change.
  • No new external network dependencies introduced.
  • Linked issue referenced under ## Related.

Impact

  • Platform: desktop (macOS primary; the binary-resolution fallback is Unix-gated, no-op on Windows). Apple Silicon Rust builds use the documented GGML_NATIVE=OFF workaround.
  • Compatibility: binary-resolution precedence unchanged for existing installs; STT falls back to the prior subprocess path for any non-WAV input, so there is no path where a previously-working case regresses.

Related


AI Authored PR Metadata

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: fix/3425-local-voice-macos
  • Commit SHA: 33c6a00

Validation Run

  • pnpm --filter openhuman-app format:check
  • pnpm typecheck
  • Focused tests: VoicePanel.test.tsx (44/44); cargo test --lib -- paths:: voice::factory::stt_providers whisper_engine (131 passed)
  • Rust fmt/check: cargo fmt --check, GGML_NATIVE=OFF cargo check
  • Tauri fmt/check: GGML_NATIVE=OFF pnpm rust:check
  • Runtime: core standalone + voice_stt_dispatch with a real 16 kHz WAV → correct in-process transcript, no external binary.

Behavior Changes

  • Intended behavior change: in-process Whisper transcription for all model sizes with no external binary; Test buttons enable once a model finishes downloading.
  • User-visible effect: local STT works on a fresh macOS install; live download progress.

Parity Contract

  • Legacy behavior preserved: binary-resolution precedence order unchanged; non-WAV STT path unchanged.
  • Guard/fallback/dispatch parity checks: in-process route only for 16 kHz WAV + flag on; subprocess fallback otherwise.
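The guard/fallback contract above can be illustrated with a small sketch. The names `choose_whisper_route`, `looks_like_wav`, and `WhisperRoute::Subprocess` appear in this PR; the `InProcess` variant name, the function signatures, and the `transcribe` wrapper with its injected engine closures are assumptions made for illustration only.

```rust
/// Which engine handles a transcription request.
/// (`InProcess` is an assumed variant name.)
#[derive(Debug, PartialEq, Eq)]
enum WhisperRoute {
    InProcess,
    Subprocess,
}

/// Cheap header sniff: a RIFF container whose form type is WAVE.
fn looks_like_wav(bytes: &[u8]) -> bool {
    bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WAVE"
}

/// In-process only when the flag is on and the input sniffs as WAV.
fn choose_whisper_route(in_process_enabled: bool, audio: &[u8]) -> WhisperRoute {
    if in_process_enabled && looks_like_wav(audio) {
        WhisperRoute::InProcess
    } else {
        WhisperRoute::Subprocess
    }
}

/// Hypothetical dispatcher: any in-process miss (decode, load, or inference
/// error) falls through to the subprocess path instead of failing the request.
fn transcribe<I, S>(
    in_process_enabled: bool,
    audio: &[u8],
    in_process: I,
    subprocess: S,
) -> Result<String, String>
where
    I: Fn(&[u8]) -> Result<String, String>,
    S: Fn(&[u8]) -> Result<String, String>,
{
    if choose_whisper_route(in_process_enabled, audio) == WhisperRoute::InProcess {
        if let Ok(text) = in_process(audio) {
            return Ok(text);
        }
    }
    subprocess(audio)
}
```

Note that the sniff only checks the container; the sample-rate requirement is enforced later by the in-process decoder, so a non-16 kHz WAV is routed in-process first and then falls back.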

Duplicate / Superseded PR Handling

  • Duplicate PR(s): None
  • Canonical PR: This PR
  • Resolution: N/A

Summary by CodeRabbit

  • New Features

    • Local STT/TTS “Test” actions now reflect model readiness by disabling when the selected local Whisper/Piper provider isn’t installed and showing an explanatory message.
  • Bug Fixes

    • Improved Whisper STT routing by detecting valid WAV input and using the in-process engine when eligible, otherwise falling back reliably.
    • Better local binary discovery on desktop by extending Whisper/Piper path resolution.
    • Updated self-test voice audio fixtures to 16kHz to match expectations.
    • Mic input flow now skips retry backoff when the local STT binary is missing, moving directly to the in-process path.
  • Tests

    • Added/expanded coverage for the new gating, WAV routing, path resolution, and 16kHz fixtures.

Local Whisper and Piper fail to work on a fresh macOS install even when
the binaries are present: GUI apps launched from Finder inherit a minimal
launchd PATH that omits Homebrew dirs, and Piper's bundled espeak-ng
engine is never located at spawn time.

- paths: after the $PATH scan misses, resolve whisper/piper from standard
  Unix bin dirs (/opt/homebrew/bin, /usr/local/bin, …) so Finder-launched
  apps and brew installs both resolve. Precedence preserved:
  workspace install > env var > $PATH > standard dirs.
- voice: spawn Piper with its install dir as cwd + ESPEAK_DATA_PATH at the
  bundled espeak-ng-data, and chmod 0755 the extracted binary, so TTS
  synthesizes with no external setup. Softened the espeak error message.
- stt factory: route WhisperSttProvider to the in-process whisper-rs
  engine for 16 kHz WAV input (model-agnostic), falling back to the
  subprocess path for other containers.
- settings: poll install status live so download progress advances, and
  gate the Test STT/TTS buttons until the selected local model finishes
  downloading.

Tests cover binary-resolution precedence, espeak data-dir detection,
in-process WAV routing + rejects, the chmod pass, and the install-gated
Test buttons.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jun 22, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 437e8ddf-2b9e-457a-99ab-21b3e13b7f79

📥 Commits

Reviewing files that changed from the base of the PR and between 1940bee and 0e98891.

📒 Files selected for processing (3)
  • app/src/features/human/MicComposer.test.tsx
  • app/src/features/human/MicComposer.tsx
  • src/openhuman/voice/factory/stt_providers.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/voice/factory/stt_providers.rs

📝 Walkthrough

Adds in-process Whisper transcription and routing, improves Whisper/Piper binary resolution, updates the silent WAV fixture to 16kHz, changes MicComposer fallback handling for missing binaries, and disables VoicePanel STT/TTS tests until local models are installed.

Changes

Local Voice: in-process transcription, binary resolution, and UI gating

| Layer / File(s) | Summary |
| --- | --- |
| WAV detection and in-process transcription helpers: src/openhuman/inference/local/service/whisper_engine.rs | Adds looks_like_wav and transcribe_wav_bytes; unit tests cover header sniffing, decode failures, and non-16kHz rejection with a PCM16 WAV test helper. |
| Whisper/Piper binary resolution with Unix fallback dirs: src/openhuman/inference/paths.rs | Adds standard_unix_bin_dirs() and resolve_binary_in_dirs(); updates Whisper and Piper resolution to scan PATH and then standard Unix directories; adds tests for ordering, platform membership, and workspace precedence. |
| Whisper routing and silent WAV fixture: src/openhuman/voice/factory/stt_providers.rs, src/openhuman/voice/schemas/helpers.rs, src/openhuman/voice/schemas/handlers/provider_server.rs, src/openhuman/voice/schemas_tests.rs | Adds local Whisper route selection, model loading, and in-process fallback handling; updates the silent WAV fixture to 16kHz with matching comment and test expectations; adds route-selection tests. |
| MicComposer fallback and VoicePanel test gating: app/src/features/human/MicComposer.tsx, app/src/features/human/MicComposer.test.tsx, app/src/components/settings/panels/VoicePanel.tsx, app/src/components/settings/panels/__tests__/VoicePanel.test.tsx | Stops MicComposer native retries on missing local binaries and adds coverage; disables VoicePanel Test STT/TTS when local Whisper/Piper installs are not ready and adds coverage. |

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant MicComposer
  participant WhisperSttProvider
  participant WhisperEngine
  participant whisper_cli as whisper-cli subprocess

  Caller->>MicComposer: submit recorded audio
  MicComposer->>MicComposer: transcribeWithRetry
  alt missing local binary
    MicComposer->>MicComposer: stop retry loop
    MicComposer->>WhisperSttProvider: fallback transcription path
  end
  WhisperSttProvider->>WhisperSttProvider: choose_whisper_route
  alt in-process WAV route
    WhisperSttProvider->>WhisperEngine: load model + transcribe_wav_bytes
    WhisperEngine-->>WhisperSttProvider: transcription result
  else subprocess route
    WhisperSttProvider->>whisper_cli: transcribe_whisper
    whisper_cli-->>WhisperSttProvider: transcription result
  end
  WhisperSttProvider-->>MicComposer: transcript
  MicComposer-->>Caller: submit result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • tinyhumansai/openhuman#3861: Modifies VoicePanel.tsx to gate provider actions on Whisper/Piper install readiness, touching the same install-status logic extended here.

Suggested reviewers

  • oxoxDev

Poem

🐇 I hopped through WAVs so quiet and small,
16kHz silence now answers the call.
If binaries hide, the retries stand down,
Then in-process whispers can wear the crown.
A bunny cheers softly: the voice path is clear!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Out of Scope Changes check | ⚠️ Warning | The PR also changes Piper/TTS resolution and TTS test-button gating, which are outside the stated Whisper STT scope. | Remove the Piper/TTS-related changes or split them into a separate follow-up PR focused on Whisper STT only. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main macOS Whisper STT fix and matches the changed code. |
| Linked Issues check | ✅ Passed | The STT route, WAV handling, model loading, and 16 kHz fixture changes address the local voice failure on macOS. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed: private package registry requires authentication. Disable ESLint in CodeRabbit settings or use public packages.



…hange

Runtime verification (core standalone + real 16 kHz WAV) confirmed the
in-process whisper-rs route transcribes with no external binary. The same
run disproved Component C's premise: the rhasspy piper `latest` macOS
tarball ships NO `libespeak-ng.1.dylib` / `libonnxruntime.dylib` (only a
.dSYM), so the piper binary aborts with
`dyld: Library not loaded: @rpath/libespeak-ng.1.dylib` regardless of
`current_dir`/`ESPEAK_DATA_PATH` — dyld resolves @rpath via the binary's
LC_RPATH, not cwd. cwd/env cannot load a dylib that was never bundled.

Revert the Piper TTS spawn changes (local_speech cwd/env + softened error,
install_piper chmod, resolve_piper_dir_with_config) and keep this PR scoped
to the verified STT work: standard-dir binary resolution (B), in-process
factory routing (A2), and the install-aware Test buttons / live progress.
Local Piper TTS needs a separate change to bundle the dylibs + patch rpath.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@YellowSnnowmann YellowSnnowmann changed the title from "fix(voice): local STT/TTS work without an external binary on macOS (#3425)" to "fix(voice): in-process Whisper STT works without an external binary on macOS (#3425)" Jun 22, 2026
YellowSnnowmann and others added 3 commits June 22, 2026 21:14
The live-poll useEffect (interval body that re-reads whisper/piper install
status while a download is in flight) had no test, leaving its lines below
the 80% diff-coverage gate. Add a fake-timer test that mounts in an
`installing` state, advances past one 2s tick, and asserts the poller
re-queries both engines and observes the completed status.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e-macos

# Conflicts:
#	app/src/components/settings/panels/VoicePanel.tsx
#	app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
…ary)

The voice_test_provider STT fixture was an 8kHz silent WAV. looks_like_wav
accepts it (RIFF header), so it routed to the in-process whisper-rs engine,
whose decoder only accepts 16kHz and rejected it — falling through to the
whisper-cli subprocess, which errors 'binary not found' on a machine with no
external binary (exactly the tinyhumansai#3425 case). Generate the fixture at 16kHz so
Test STT exercises the real binary-free in-process path.
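
To make the fixture problem concrete, here is a minimal sketch of a silent PCM16 WAV generator plus a sample-rate check. All three function names (`silent_wav`, `wav_sample_rate`, `accepts_in_process`) are hypothetical and are not the helpers in `voice/schemas/helpers.rs` or `whisper_engine.rs`; the sketch also assumes the canonical 44-byte header layout with `fmt ` as the first chunk.

```rust
/// Build a mono PCM16 WAV with a canonical 44-byte header and silent samples.
fn silent_wav(sample_rate: u32, num_samples: u32) -> Vec<u8> {
    let data_len = num_samples * 2; // 2 bytes per 16-bit mono sample
    let mut out = Vec::with_capacity(44 + data_len as usize);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // RIFF chunk size
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size for PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // channels = mono
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // byte rate
    out.extend_from_slice(&2u16.to_le_bytes()); // block align
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    out.resize(44 + data_len as usize, 0); // zeroed samples = silence
    out
}

/// Read the sample rate, assuming `fmt ` is the first chunk after the header.
fn wav_sample_rate(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 28
        || &bytes[0..4] != b"RIFF"
        || &bytes[8..12] != b"WAVE"
        || &bytes[12..16] != b"fmt "
    {
        return None;
    }
    Some(u32::from_le_bytes([bytes[24], bytes[25], bytes[26], bytes[27]]))
}

/// Stand-in for the decoder's gate: only 16 kHz input is accepted in-process.
fn accepts_in_process(bytes: &[u8]) -> bool {
    wav_sample_rate(bytes) == Some(16_000)
}
```

An 8 kHz fixture built this way has a valid RIFF/WAVE header, so a header sniff alone accepts it, yet it fails the 16 kHz gate and ends up on the subprocess path.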
@YellowSnnowmann YellowSnnowmann marked this pull request as ready for review June 24, 2026 14:05
@YellowSnnowmann YellowSnnowmann requested a review from a team June 24, 2026 14:05

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1940bee532


Comment thread src/openhuman/voice/factory/stt_providers.rs Outdated
Comment thread app/src/components/settings/panels/VoicePanel.tsx
@coderabbitai coderabbitai Bot added the bug and rust-core (Core Rust runtime in src/: CLI, core_server, shared infrastructure) labels Jun 24, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/voice/factory/stt_providers.rs`:
- Around line 52-85: The shared whisper engine is only pinned during
ensure_engine_loaded, so a later concurrent request can unload or swap
`service.whisper` before the transcription work runs in the `spawn_blocking`
path. Fix this by holding the same load/inference guard across the entire
request flow in `ensure_engine_loaded` and the transcription call site, or by
switching `LocalAiService::whisper` to per-model cached instances so one request
cannot mutate another request’s engine selection. Use the existing symbols
`ensure_engine_loaded`, `service.whisper_load_lock`, and
`whisper_engine::load_engine` to locate and adjust the locking/model ownership
logic.
- Around line 40-45: `choose_whisper_route()` currently routes non-WAV inputs to
`WhisperRoute::Subprocess`, but the readiness logic still appears to treat
`whisper_in_process + model` as sufficient for local STT. Update the ready-state
check around `choose_whisper_route` and the related STT provider setup in
`stt_providers.rs` so local STT is only advertised as ready when `whisper-cli`
is also available for native `MediaRecorder` blobs, or ensure those inputs are
normalized to 16 kHz WAV before reaching this routing decision. Keep the
condition aligned with the actual runtime path used by `choose_whisper_route()`
and the subprocess fallback.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 83db93ca-3484-4e79-bb9e-ca35664a534d

📥 Commits

Reviewing files that changed from the base of the PR and between 0d6acfd and 1940bee.

📒 Files selected for processing (8)
  • app/src/components/settings/panels/VoicePanel.tsx
  • app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
  • src/openhuman/inference/local/service/whisper_engine.rs
  • src/openhuman/inference/paths.rs
  • src/openhuman/voice/factory/stt_providers.rs
  • src/openhuman/voice/schemas/handlers/provider_server.rs
  • src/openhuman/voice/schemas/helpers.rs
  • src/openhuman/voice/schemas_tests.rs

Comment thread src/openhuman/voice/factory/stt_providers.rs
Comment thread src/openhuman/voice/factory/stt_providers.rs Outdated
…nsai#3425)

MicComposer sends the native webm/mp4 blob first for speed, re-encoding to
16kHz WAV only as a fallback. On a no-binary macOS install the native codec
routes to the whisper-cli subprocess and errors 'binary not found' — which was
classified transient, so every dictation burned 2 backoff retries before the
WAV/in-process fallback. A missing binary never reappears on retry and the
native codec can't use the in-process engine, so bail the native retry loop
immediately while keeping it eligible for the WAV fallback (NOT a permanent
error, which would skip the fallback entirely). Addresses CodeRabbit review.
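
The three-way classification this commit describes can be sketched as below. The actual change lives in TypeScript (`MicComposer.tsx`); the sketch is written in Rust purely for illustration, and every name in it (`Failure`, `Outcome`, `transcribe_native_with_retry`) is hypothetical. Backoff sleeps are omitted.

```rust
/// How a failed native-codec attempt is classified.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Failure {
    Transient,     // worth retrying with backoff
    MissingBinary, // retrying cannot help, but the WAV fallback can
    Permanent,     // skip the fallback entirely
}

/// What the caller should do after the native attempt loop finishes.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Transcript(String),
    TryWavFallback,
    GiveUp,
}

/// Run the native-codec attempt with up to `max_retries` retries.
fn transcribe_native_with_retry<F>(max_retries: u32, mut attempt: F) -> Outcome
where
    F: FnMut() -> Result<String, Failure>,
{
    let mut retries = 0;
    loop {
        match attempt() {
            Ok(text) => return Outcome::Transcript(text),
            Err(Failure::Permanent) => return Outcome::GiveUp,
            // Bail out of the retry loop at once, but stay fallback-eligible.
            Err(Failure::MissingBinary) => return Outcome::TryWavFallback,
            Err(Failure::Transient) if retries < max_retries => retries += 1,
            Err(Failure::Transient) => return Outcome::TryWavFallback,
        }
    }
}
```

With `max_retries = 2`, a missing binary costs one attempt instead of three before the 16 kHz WAV fallback runs.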
…inyhumansai#3425)

ensure_engine_loaded released the load lock before transcription started, so a
concurrent dispatch for a different model size could unload/reload the
single-model engine mid-flight — transcribing with the wrong weights or
dropping the request onto the subprocess path. Hold the load lock across both
the load check and the inference so they form one critical section. Lock order
(load_lock -> handle) is unchanged, so no deadlock. Addresses Codex P2 review.
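
A minimal sketch of the "one critical section" idea follows. It uses synchronous `std::sync::Mutex` and toy types; the real code is async and goes through `spawn_blocking`, so the `Engine` and `Service` shapes, the `load_lock` and `whisper` field types, and `transcribe_with` are all assumptions made for illustration.

```rust
use std::sync::Mutex;

/// Stand-in for the single loaded model.
struct Engine {
    model: String,
}

impl Engine {
    fn transcribe(&self, _audio: &[u8]) -> String {
        format!("transcribed with {}", self.model)
    }
}

/// Toy service: one load lock plus one engine slot (the "handle").
struct Service {
    load_lock: Mutex<()>,
    whisper: Mutex<Option<Engine>>,
}

/// Hold `load_lock` from the load check through inference, so a concurrent
/// request for another model cannot swap the engine in between.
fn transcribe_with(service: &Service, model: &str, audio: &[u8]) -> String {
    // Lock order: load_lock first, then the engine handle.
    let _load_guard = service.load_lock.lock().unwrap();
    {
        let mut slot = service.whisper.lock().unwrap();
        if slot.as_ref().map_or(true, |e| e.model != model) {
            // Reload on model mismatch (or first use).
            *slot = Some(Engine { model: model.to_string() });
        }
    }
    // `_load_guard` is still alive here, so the slot cannot have changed.
    let slot = service.whisper.lock().unwrap();
    slot.as_ref().expect("engine loaded above").transcribe(audio)
}
```

The cost of this design is that requests for different model sizes serialize; the per-model cache suggested in the review would avoid that at the price of more memory.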

Labels

bug, rust-core (Core Rust runtime in src/: CLI, core_server, shared infrastructure)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Local voice not working

1 participant