edge-tier=2 WDT storm on ESP32-S3 N16R8 — DSP task starves UDP sender (v0.6.5-esp32)
Summary
On a clean v0.6.5-esp32 flash to an ESP32-S3 N16R8 (16 MB flash / 8 MB PSRAM), provisioning with --edge-tier 2 produces a sustained task_wdt storm on edge_dsp (CPU 1) within ~30 seconds of boot. The DSP task monopolizes core 1 and starves the UDP sender — measured 0 packets/s to a host UDP listener on the configured --target-ip:5005. The README and release notes for v0.6.5 state "boots cleanly at --edge-tier 2 with full vitals + edge DSP active," so this looks like an undocumented regression on the 16 MB / 8 MB PSRAM (N16R8) variant.
Switching to --edge-tier 1 is stable (~1.5 pps, vitals + presence work) but does not send raw CSI amplitudes, so the server-side pose model emits keypoints with confidence: 0.0 and the observatory pose view is empty.
Hardware
- Board: ESP32-S3 dev board labelled "Gold Edition N16R8" (Waveshare-style, AMOLED SH8601 1.8" 368×448 display detected on boot; no FT3168 touch, no TCA9554)
- Chip: ESP32-S3 (QFN56) rev v0.2, 8 MB embedded PSRAM, 16 MB flash
- MAC: e8:f6:0a:a4:e1:ac
- USB-Serial/JTAG, COM5 on Windows
- WiFi: 2.4 GHz WPA2-PSK, RSSI -25 dBm to AP (Pakedge AN-810-AP-I-AC), channel 1/6
Firmware
- Release: v0.6.5-esp32 (binaries flashed verbatim from firmware/esp32-csi-node/release_bins/: bootloader.bin, partition-table.bin, ota_data_initial.bin, esp32-csi-node.bin)
- Standard 16 MB partition variant (not -4mb)
Reproduction
# 1. Erase + flash (clean state)
python -m esptool --chip esp32s3 --port COM5 erase-flash
python -m esptool --chip esp32s3 --port COM5 --baud 460800 write-flash \
0x0 bootloader.bin 0x8000 partition-table.bin \
0xf000 ota_data_initial.bin 0x20000 esp32-csi-node.bin
# 2. Provision with --edge-tier 2
python provision.py --port COM5 --ssid "<my-2.4ghz-ssid>" --password "<pw>" \
--target-ip <my-server-ip> --edge-tier 2
# 3. Observe serial — WDT storm begins within ~30s
# 4. Listen for UDP packets on <my-server-ip>:5005 → ~0 pps
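Step 4 can be measured without the Docker stack. The sketch below is a hypothetical stand-alone counter, not the tool that produced the figures under Server-Side Evidence. It only mirrors their TOTAL line format. It binds UDP 5005 itself, so it assumes nothing else (such as the container) already holds that port.

```python
"""Hypothetical stand-alone UDP rate counter for step 4 (not part of the project)."""
import socket
import time


def format_summary(packets: int, nbytes: int, elapsed_s: float) -> str:
    # Same shape as the TOTAL lines quoted under Server-Side Evidence.
    pps = packets / elapsed_s if elapsed_s > 0 else 0.0
    return f"TOTAL: {packets} packets, {nbytes} bytes in {elapsed_s:.1f}s = {pps:.1f} pps"


def count_udp(sock: socket.socket, duration_s: float) -> tuple[int, int, float]:
    """Count datagrams arriving on an already-bound socket for duration_s seconds."""
    packets = nbytes = 0
    start = time.monotonic()
    deadline = start + duration_s
    while (remaining := deadline - time.monotonic()) > 0:
        sock.settimeout(remaining)  # never block past the deadline
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break
        packets += 1
        nbytes += len(data)
    return packets, nbytes, time.monotonic() - start


def listen(port: int = 5005, duration_s: float = 10.0) -> str:
    """Bind UDP `port` on all interfaces and return a TOTAL summary line."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx:
        rx.bind(("0.0.0.0", port))  # fails if another process already holds the port
        return format_summary(*count_udp(rx, duration_s))
```

On the server host, `print(listen())` runs a 10 s capture on port 5005. The --edge-tier 2 result reported here would show up as a 0.0 pps line.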
Observed Serial Output
Clean boot succeeds and the chip enters streaming state, then within ~30 seconds the watchdog begins firing repeatedly:
I (7706) main: CSI streaming active → 192.168.5.11:5005 (edge_tier=2, OTA=ready, WASM=ready, mmWave=off)
E (98661) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (98661) task_wdt: - IDLE1 (CPU 1)
E (98661) task_wdt: Tasks currently running:
E (98661) task_wdt: CPU 0: IDLE0
E (98661) task_wdt: CPU 1: edge_dsp
E (98661) task_wdt: Print CPU 1 backtrace
Backtrace: 0x4037890F:0x3FC9D6A0 0x4037746D:0x3FC9D6C0 0x4200D225:0x3FCC9C60
The same three-frame backtrace recurs every ~5 s and is reproducible across multiple reset cycles.
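A small parser can confirm the storm pattern from a captured log: how many triggers occurred, how far apart they were, and whether the backtrace is identical each time. It also shows which frame is worth symbolizing. This is a sketch under assumptions. The helper names are hypothetical. Log lines are assumed to follow the standard ESP-IDF `E (ms) tag: ...` form shown above. The address ranges are the ESP32-S3 internal IRAM and flash-mapped instruction windows, as I understand them from ESP-IDF's soc.h for that chip.

```python
"""Hypothetical helper: summarize task_wdt events in a captured serial log."""
import re

# ESP-IDF log prefix: level, (milliseconds since boot), tag.
WDT_RE = re.compile(r"^E \((\d+)\) task_wdt: Task watchdog got triggered")
BT_RE = re.compile(r"^Backtrace:((?:\s+0x[0-9A-Fa-f]{8}:0x[0-9A-Fa-f]{8})+)")

# Assumed ESP32-S3 instruction-address windows (SOC_IRAM_* / SOC_IROM_*).
REGIONS = [
    (0x40370000, 0x403E0000, "IRAM"),
    (0x42000000, 0x44000000, "flash"),
]


def region(pc: int) -> str:
    for lo, hi, name in REGIONS:
        if lo <= pc < hi:
            return name
    return "other"


def summarize(log: str) -> dict:
    trigger_ms, backtraces = [], []
    for line in log.splitlines():
        line = line.strip()
        if m := WDT_RE.match(line):
            trigger_ms.append(int(m.group(1)))
        elif m := BT_RE.match(line):
            # Each frame is "PC:SP"; keep only the program counters.
            pcs = [int(pair.split(":")[0], 16) for pair in m.group(1).split()]
            backtraces.append(tuple(pcs))
    intervals = [b - a for a, b in zip(trigger_ms, trigger_ms[1:])]
    return {
        "triggers": len(trigger_ms),
        "intervals_ms": intervals,
        "distinct_backtraces": len(set(backtraces)),
        "frames": [(hex(pc), region(pc)) for pc in backtraces[0]] if backtraces else [],
    }
```

For the backtrace above, the two 0x4037xxxx frames fall in IRAM and 0x4200D225 falls in flash-mapped code. That makes the flash frame the likelier pointer into edge_dsp's own code. Symbolizing it requires the ELF that matches the release build, and none of the four binaries listed under Firmware is an ELF. With that ELF, `xtensa-esp32s3-elf-addr2line -pfiaC -e <app>.elf 0x4200D225` should name the function. `idf.py monitor` also decodes backtraces automatically when run from a matching build.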
Workarounds Attempted (All Tier-2 Variants Still WDT)
| Provision args | Result |
|---|---|
| --edge-tier 2 (defaults: subk 32, vital_win 300, vital_int 1000) | WDT storm, 0 pps |
| --edge-tier 2 --subk-count 8 | WDT storm, 0 pps |
| --edge-tier 2 --subk-count 8 --vital-win 100 | WDT storm, 0 pps |
| --edge-tier 2 --subk-count 8 --vital-win 100 --vital-int 5000 | WDT storm, 0 pps |
| --edge-tier 1 | OK, ~1.5 pps stats; vitals + presence work server-side |
| --edge-tier 0 | Mislabeled in --help as "raw passthrough" but actually sends 0 packets: stream_sender_send is not invoked when tier=0 (edge_processing.c:1049-1052) |
Source Reading
In firmware/esp32-csi-node/main/edge_processing.c, edge_task() (lines 904-939) looks correct. It calls vTaskDelay(1) between frames in a batch and yields for 20 ms after each batch, with comments explicitly referencing prior watchdog fixes (#266, #321). So the hang appears to be deeper, inside process_frame() (lines 710-898) or one of its callees, on this board variant. Suspect candidates (not yet confirmed by instrumentation):
- update_multi_person_vitals() (lines 476-550), with EDGE_PHASE_HISTORY_LEN-sized inner loops over up to EDGE_MAX_PERSONS groups
- estimate_bpm_zero_crossing() on full 300-sample histories
- The WASM dispatch path in process_frame() (lines 879-897): wasm_runtime_on_frame() is called on every frame when tier >= 2 and s_pkt_valid is set. Could this block?
I haven't built from source to instrument this; flagging in case the maintainer recognizes it immediately.
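A toy timing model makes the inference above concrete. The per-frame yields in edge_task() should let IDLE1 run between frames, so the missing time must be spent inside a single process_frame() call. This is a hypothetical sketch, not firmware code, and it rests on three assumptions. First, IDLE1 gets to feed the watchdog every time edge_dsp blocks. Second, the vTaskDelay(1) between frames always blocks. Third, the timeout is ESP-IDF's default of 5 s (CONFIG_ESP_TASK_WDT_TIMEOUT_S). That default matches the ~5 s spacing of the repeated backtraces, but this build may not use it. The frame costs mentioned after the block are invented.

```python
"""Toy model (hypothetical, not firmware code) of IDLE1 starvation on core 1."""


def longest_busy_ms(frame_costs_ms: list[float], yield_after_each_frame: bool) -> float:
    """Longest stretch during which edge_dsp never blocks, so IDLE1 cannot run."""
    if yield_after_each_frame:
        # A blocking delay after every frame lets IDLE1 run, resetting the stretch.
        return max(frame_costs_ms, default=0.0)
    # Without per-frame yields the whole batch runs back to back.
    return sum(frame_costs_ms)


def trips_task_wdt(frame_costs_ms: list[float], yield_after_each_frame: bool = True,
                   timeout_ms: float = 5000.0) -> bool:
    # Approximation: the task watchdog fires roughly timeout_ms after IDLE1 last ran.
    return longest_busy_ms(frame_costs_ms, yield_after_each_frame) >= timeout_ms
```

With per-frame yields, a batch of 200 frames at a hypothetical 40 ms each never starves IDLE1 for more than 40 ms. The same batch without yields would run 8 s straight and trip the watchdog. Because edge_task() does yield, only a single frame that takes seconds (or never returns) can trip it. That conclusion, together with 0 pps and a backtrace that recurs unchanged, fits a non-terminating loop somewhere in the tier-2 path better than a DSP load that is merely too heavy.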
Server-Side Evidence
ruvnet/wifi-densepose:latest Docker container, CSI_SOURCE=esp32. Listening on UDP 5005:
TOTAL: 0 packets, 0 bytes in 10.0s = 0.0 pps # --edge-tier 2
TOTAL: 18 packets, 2212 bytes in 12.1s = 1.5 pps # --edge-tier 1
With --edge-tier 1, /api/v1/nodes reports the node as motion_level: present_moving, person_count: 1, but status: stale (1.5 pps is below the freshness threshold). /api/v1/models/load succeeds for the bundled wifi-densepose-v1.rvf (13 KB, found at docker/wifi-densepose-v1.rvf), but pose keypoints in the WebSocket sensing_update stream all report confidence: 0.0 because nodes[].amplitude = [] and subcarrier_count = 0 in tier=1 payloads.
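The tier=1 diagnosis above can be checked against any captured sensing_update frame. This is a minimal sketch. It assumes each entry in the message's nodes array carries the amplitude and subcarrier_count fields quoted above. The node_id key and the overall message layout are guesses, not the server's documented schema.

```python
"""Hypothetical check of a sensing_update message for missing raw CSI."""
import json


def nodes_without_csi(message_json: str) -> list[str]:
    """Return ids of nodes whose payload has no amplitudes or zero subcarriers.

    Assumes each entry in "nodes" has "amplitude" (list) and "subcarrier_count"
    (int); the "node_id" key is a guess at how nodes are identified.
    """
    msg = json.loads(message_json)
    missing = []
    for node in msg.get("nodes", []):
        if not node.get("amplitude") or node.get("subcarrier_count", 0) == 0:
            missing.append(str(node.get("node_id", "?")))
    return missing
```

A non-empty result for a tier=1 node would match the empty pose view described here. The pose model has no amplitudes to work from.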
What Would Help
- Confirm whether v0.6.5-esp32 tier=2 was validated on the 16 MB / 8 MB PSRAM (N16R8) variant or only on a different board (e.g. 4 MB). The release_bins/ directory ships both a regular and a -4mb set, so the 16 MB binary may have a config divergence.
- If a fix lands, a release_bins rebuild with that diff would let people on N16R8 boards (a very common, cheap board on Amazon/AliExpress) use the project as documented.
Happy to provide additional logs, run instrumented builds, or test pre-release binaries against this exact board if helpful.