# Bucky Arm — EMG Gesture Control: Master Implementation Reference

> Version: 2026-03-01 | Target: ESP32-S3 N32R16V (Xtensa LX7 @ 240 MHz, 512 KB SRAM, 16 MB OPI PSRAM)
> Supersedes: META_EMG_RESEARCH_NOTES.md + BUCKY_ARM_IMPROVEMENT_PLAN.md
> Source paper: doi:10.1038/s41586-025-09255-w (PDF: C:/VSCode/Marvel_Projects/s41586-025-09255-w.pdf)

---

## TABLE OF CONTENTS

- [PART 0 — SYSTEM ARCHITECTURE & RESPONSIBILITY ASSIGNMENT](#part-0--system-architecture--responsibility-assignment)
  - [0.1 Who Does What](#01-who-does-what)
  - [0.2 Operating Modes](#02-operating-modes)
  - [0.3 FSM Reference (EMG_MAIN mode)](#03-fsm-reference-emg_main-mode)
  - [0.4 EMG_STANDALONE Boot Sequence](#04-emg_standalone-boot-sequence)
  - [0.5 New Firmware Changes for Architecture](#05-new-firmware-changes-for-architecture)
  - [0.6 New Python Script: live_predict.py](#06-new-python-script-live_predictpy)
  - [0.7 Firmware Cleanup: system_mode_t Removal](#07-firmware-cleanup-system_mode_t-removal)
- [PART I — SYSTEM FOUNDATIONS](#part-i--system-foundations)
  - [1. Hardware Specification](#1-hardware-specification)
  - [2. Current System Snapshot](#2-current-system-snapshot)
    - [2.1 Confirmed Firmware Architecture](#21--confirmed-firmware-architecture-from-codebase-exploration)
    - [2.2 Bicep Channel Subsystem](#22--bicep-channel-subsystem-ch3--adc_channel_9--gpio-10)
  - [3. What Meta Built — Filtered for ESP32](#3-what-meta-built--filtered-for-esp32)
  - [4. Current Code State + Known Bugs](#4-current-code-state--known-bugs)
- [PART II — TARGET ARCHITECTURE](#part-ii--target-architecture)
  - [5. Full Recommended Multi-Model Stack](#5-full-recommended-multi-model-stack)
  - [6. Compute Budget for Full Stack](#6-compute-budget-for-full-stack)
  - [7. Why This Architecture Works for 3-Channel EMG](#7-why-this-architecture-works-for-3-channel-emg)
- [PART III — GESTURE EXTENSIBILITY](#part-iii--gesture-extensibility)
  - [8. What Changes When Adding or Removing a Gesture](#8-what-changes-when-adding-or-removing-a-gesture)
  - [9. Practical Limits of 3-Channel EMG](#9-practical-limits-of-3-channel-emg)
  - [10. Specific Gesture Considerations](#10-specific-gesture-considerations)
- [PART IV — CHANGE REFERENCE](#part-iv--change-reference)
  - [11. Change Classification Matrix](#11-change-classification-matrix)
- [PART V — FIRMWARE CHANGES](#part-v--firmware-changes)
  - [Change A — DMA-Driven ADC Sampling](#change-a--dma-driven-adc-sampling)
  - [Change B — IIR Biquad Bandpass Filter](#change-b--iir-biquad-bandpass-filter)
  - [Change C — Confidence Rejection](#change-c--confidence-rejection)
  - [Change D — On-Device NVS Calibration](#change-d--on-device-nvs-calibration)
  - [Change E — int8 MLP via TFLM](#change-e--int8-mlp-via-tflm)
  - [Change F — Ensemble Inference Pipeline](#change-f--ensemble-inference-pipeline)
- [PART VI — PYTHON/TRAINING CHANGES](#part-vi--pythontraining-changes)
  - [Change 0 — Forward Label Shift](#change-0--forward-label-shift)
  - [Change 1 — Expanded Feature Set](#change-1--expanded-feature-set)
  - [Change 2 — Electrode Repositioning](#change-2--electrode-repositioning)
  - [Change 3 — Data Augmentation](#change-3--data-augmentation)
  - [Change 4 — Reinhard Compression](#change-4--reinhard-compression)
  - [Change 5 — Classifier Benchmark](#change-5--classifier-benchmark)
  - [Change 6 — Simplified MPF Features](#change-6--simplified-mpf-features)
  - [Change 7 — Ensemble Training](#change-7--ensemble-training)
- [PART VII — FEATURE SELECTION FOR ESP32 PORTING](#part-vii--feature-selection-for-esp32-porting)
- [PART VIII — MEASUREMENT AND VALIDATION](#part-viii--measurement-and-validation)
- [PART IX — EXPORT WORKFLOW](#part-ix--export-workflow)
- [PART X — REFERENCES](#part-x--references)

---

# PART 0 — SYSTEM ARCHITECTURE & RESPONSIBILITY ASSIGNMENT

> This section is the authoritative reference for what runs where.
> All implementation decisions in later parts should be consistent with this partition.

## 0.1 Who Does What

| Responsibility | Laptop (Python) | ESP32 |
|----------------|-----------------|-------|
| EMG sensor reading | — | ✓ `emg_sensor_read()` always |
| Raw data streaming (for collection) | Receives CSV, saves to HDF5 | Streams CSV over UART |
| Model training | ✓ `learning_data_collection.py` | — |
| Model export | ✓ `export_to_header()` → `model_weights.h` | Compiled into firmware |
| On-device inference | — | ✓ `inference_predict()` |
| Laptop-side live inference | ✓ `live_predict.py` (new script) | Streams ADC + executes received cmd |
| Arm actuation | — (sends gesture string back to ESP32) | ✓ `gestures_execute()` |
| Autonomous operation (no laptop) | Not needed | ✓ `EMG_STANDALONE` mode |
| Bicep flex detection | — | ✓ `bicep_detect()` (new, Section 2.2) |
| NVS calibration | — | ✓ `calibration.c` (Change D) |

**Key rule**: The laptop is never required for real-time arm control in production. The laptop's role is: collect data → train model → export → flash firmware → done. After that, the ESP32 operates completely independently.

---

## 0.2 Operating Modes

Controlled by `#define MAIN_MODE` in `config/config.h`. The enum currently reads `enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER}`. A new value `EMG_STANDALONE` must be added.

| `MAIN_MODE` | When to use | Laptop required? | Entry point |
|-------------|-------------|-----------------|-------------|
| `EMG_MAIN` | Development sessions, data collection, monitored operation | Yes — UART handshake to start any mode | `appConnector()` in `main.c` |
| `EMG_STANDALONE` | **Fully autonomous deployment** — no laptop | **No** — boots directly into predict+control | `run_standalone_loop()` (new function in `main.c`) |
| `SERVO_CALIBRATOR` | Hardware setup, testing servo range of motion | Yes (serial input) | Inline in `app_main()` |
| `GESTURE_TESTER` | Testing gesture→servo mapping via keyboard | Yes (serial input) | Inline in `app_main()` |

**How to switch mode**: change `#define MAIN_MODE` in `config.h` and reflash.

**To add `EMG_STANDALONE` to `config.h`** (1-line change):

```c
// config.h line 19 — current:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER};

// Update to:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER, EMG_STANDALONE};
```

---

## 0.3 FSM Reference (EMG_MAIN mode)

The `device_state_t` enum in `main.c` and the `command_t` enum control all transitions. Currently: `{STATE_IDLE, STATE_CONNECTED, STATE_STREAMING, STATE_PREDICTING}`. A new state `STATE_LAPTOP_PREDICT` must be added (see Section 0.5).
```
STATE_IDLE
  └─ {"cmd":"connect"} ──────────────────────────► STATE_CONNECTED
                                                         │
                  {"cmd":"start"} ──────────────────────┤
                                                         │
    STATE_STREAMING                                      │
      ESP32 sends raw ADC CSV at 1kHz                    │
      Laptop: saves to HDF5 (data collection)            │
      Laptop: trains model → exports model_weights.h     │
      ◄──── {"cmd":"stop"} ─────────────────────────────┘
                                                         │
                  {"cmd":"start_predict"} ──────────────┤
                                                         │
    STATE_PREDICTING                                     │
      ESP32: inference_predict() on-device               │
      ESP32: gestures_execute()                          │
      Laptop: optional UART monitor only                 │
      ◄──── {"cmd":"stop"} ─────────────────────────────┘
                                                         │
                  {"cmd":"start_laptop_predict"} ───────┘

    STATE_LAPTOP_PREDICT [NEW]
      ESP32: streams raw ADC CSV (same as STREAMING)
      Laptop: runs live_predict.py inference
      Laptop: sends {"gesture":"fist"} back
      ESP32: executes received gesture command
      ◄──── {"cmd":"stop"} ─────────────────────────────┘

All active states:
  {"cmd":"stop"}       → STATE_CONNECTED
  {"cmd":"disconnect"} → STATE_IDLE
  {"cmd":"connect"}    → STATE_CONNECTED (from any state — reconnect)
```

**Convenience table of commands and their effects:**

| JSON command | Valid from state | Result |
|---|---|---|
| `{"cmd":"connect"}` | Any | → `STATE_CONNECTED` |
| `{"cmd":"start"}` | `STATE_CONNECTED` | → `STATE_STREAMING` |
| `{"cmd":"start_predict"}` | `STATE_CONNECTED` | → `STATE_PREDICTING` |
| `{"cmd":"start_laptop_predict"}` | `STATE_CONNECTED` | → `STATE_LAPTOP_PREDICT` (new) |
| `{"cmd":"stop"}` | `STREAMING/PREDICTING/LAPTOP_PREDICT` | → `STATE_CONNECTED` |
| `{"cmd":"disconnect"}` | Any active state | → `STATE_IDLE` |

---

## 0.4 EMG_STANDALONE Boot Sequence

No UART handshake. No laptop required. Powers on → predicts → controls arm.
```
app_main() switch MAIN_MODE == EMG_STANDALONE:
  │
  ├── hand_init()                  // servos
  ├── emg_sensor_init()            // ADC setup
  ├── inference_init()             // clear window buffer, reset smoothing state
  ├── calibration_init()           // load NVS z-score params (Change D)
  │     └── if not found in NVS:
  │           collect 120 REST windows (~3s at 25ms hop)
  │           call calibration_update() to compute and store stats
  ├── bicep_load_threshold()       // load NVS bicep threshold (Section 2.2)
  │     └── if not found:
  │           collect 3s of still bicep data
  │           call bicep_calibrate() and bicep_save_threshold()
  │
  └── run_standalone_loop()        ← NEW function (added to main.c)
        while (1):
          emg_sensor_read(&sample)
          inference_add_sample(sample.channels)
          if ++stride_counter >= INFERENCE_HOP_SIZE:   // pre-increment: fires every 25 samples, not 26
            stride_counter = 0
            gesture_t g = inference_get_gesture_enum(inference_predict(&conf))
            gestures_execute(g)
            bicep_state_t b = bicep_detect()
            // (future: bicep_actuate(b))
          vTaskDelay(1)
```

`run_standalone_loop()` is structurally identical to `run_inference_loop()` in `EMG_MAIN`, minus all UART state-change checking and telemetry prints. It runs forever until power-off.

**Where to add**: New function `run_standalone_loop()` in `app/main.c`, plus a new case in the `app_main()` switch block:

```c
case EMG_STANDALONE:
    run_standalone_loop();
    break;
```

---

## 0.5 New Firmware Changes for Architecture

These changes are needed to implement the architecture above. They are **structural** (not accuracy improvements) and should be done before any other changes.
### S1 — Add `EMG_STANDALONE` to `config.h`

**File**: `EMG_Arm/src/config/config.h`, line 19

```c
// Change:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER};
// To:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER, EMG_STANDALONE};
```

### S2 — Add `STATE_LAPTOP_PREDICT` to FSM (`main.c`)

**File**: `EMG_Arm/src/app/main.c`

```c
// In device_state_t enum — add new state:
typedef enum {
    STATE_IDLE = 0,
    STATE_CONNECTED,
    STATE_STREAMING,
    STATE_PREDICTING,
    STATE_LAPTOP_PREDICT,      // ← ADD: streams ADC to laptop, executes laptop's gesture commands
} device_state_t;

// In command_t enum — add new command:
typedef enum {
    CMD_NONE = 0,
    CMD_CONNECT,
    CMD_START,
    CMD_START_PREDICT,
    CMD_START_LAPTOP_PREDICT,  // ← ADD
    CMD_STOP,
    CMD_DISCONNECT,
} command_t;
```

**In `parse_command()`** — add detection (place BEFORE the `"start"` check to avoid prefix collision):

```c
} else if (strncmp(value_start, "start_laptop_predict", 20) == 0) {
    return CMD_START_LAPTOP_PREDICT;
} else if (strncmp(value_start, "start_predict", 13) == 0) {
    return CMD_START_PREDICT;
} else if (strncmp(value_start, "start", 5) == 0) {
    return CMD_START;
```

**In `serial_input_task()` FSM switch** — add to `STATE_CONNECTED` block:

```c
} else if (cmd == CMD_START_LAPTOP_PREDICT) {
    g_device_state = STATE_LAPTOP_PREDICT;
    printf("[STATE] CONNECTED -> LAPTOP_PREDICT\n");
    xQueueSend(g_cmd_queue, &cmd, 0);
}
```

**Add to the active-state check** in `serial_input_task()`:

```c
case STATE_STREAMING:
case STATE_PREDICTING:
case STATE_LAPTOP_PREDICT:   // ← ADD to the case list
    if (cmd == CMD_STOP) { ... }
```

**New function `run_laptop_predict_loop()`** (add alongside `stream_emg_data()` and `run_inference_loop()`):

```c
/**
 * @brief Laptop-mediated prediction loop (STATE_LAPTOP_PREDICT).
 *
 * Streams raw ADC CSV to laptop for inference.
 * Simultaneously reads gesture commands sent back by laptop.
 * Executes received gesture immediately.
 *
 * Laptop sends: {"gesture":"fist"}\n OR {"gesture":"rest"}\n etc.
 * ESP32 parses the "gesture" field and calls inference_get_gesture_enum() + gestures_execute().
 */
static void run_laptop_predict_loop(void) {
    emg_sample_t sample;
    char cmd_buf[64];
    int cmd_idx = 0;

    printf("{\"status\":\"info\",\"msg\":\"Laptop-predict mode started\"}\n");

    while (g_device_state == STATE_LAPTOP_PREDICT) {
        // 1. Send raw ADC sample (same format as STATE_STREAMING)
        emg_sensor_read(&sample);
        printf("%u,%u,%u,%u\n",
               sample.channels[0], sample.channels[1],
               sample.channels[2], sample.channels[3]);

        // 2. Non-blocking read of any incoming gesture command from laptop
        //    (serial_input_task already handles FSM commands; this handles gesture commands)
        //    Note: getchar() is non-blocking when there is no data (returns EOF).
        //    Caveat: if serial_input_task reads the same stdin, the two readers may
        //    compete for bytes; routing gesture lines through that task avoids this.
        //    Gesture messages from laptop look like: {"gesture":"fist"}\n
        int c = getchar();
        if (c != EOF && c != 0xFF) {
            if (c == '\n' || c == '\r') {
                if (cmd_idx > 0) {
                    cmd_buf[cmd_idx] = '\0';
                    // Parse {"gesture":"<name>"} — look for "gesture" field
                    const char *g = strstr(cmd_buf, "\"gesture\"");
                    if (g) {
                        const char *v = strchr(g, ':');
                        if (v) {
                            v++;
                            while (*v == ' ' || *v == '"') v++;
                            // Extract gesture name up to closing quote
                            char name[32] = {0};
                            int ni = 0;
                            while (*v && *v != '"' && ni < 31) name[ni++] = *v++;
                            name[ni] = '\0';
                            // Map name to enum and execute (reuse inference mapping)
                            gesture_t gesture = (gesture_t)inference_get_gesture_enum_by_name(name);
                            if (gesture != GESTURE_NONE) {
                                gestures_execute(gesture);
                            }
                        }
                    }
                    cmd_idx = 0;
                }
            } else if (cmd_idx < (int)sizeof(cmd_buf) - 1) {
                cmd_buf[cmd_idx++] = (char)c;
            } else {
                cmd_idx = 0;  // overflow: drop the partial line
            }
        }
        vTaskDelay(1);
    }
}
```

**Note**: `inference_get_gesture_enum_by_name(const char *name)` is just the existing `inference_get_gesture_enum(int class_idx)` refactored to accept a string directly (bypassing the class_idx lookup).
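As a rough illustration of what such a by-name lookup might look like, here is a minimal table-driven sketch. The `gesture_t` values other than `GESTURE_NONE`, the `gesture_name_entry_t` type, and the `k_gesture_names` table are assumptions made for this example; the real enum and the existing strcmp chain live in the firmware and may differ. Matching is exact and case-sensitive, and unknown names fall back to `GESTURE_NONE` so that a malformed laptop message never moves the arm.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the firmware's gesture_t (only GESTURE_NONE is
 * taken from the text above; the other enumerator names are assumed). */
typedef enum {
    GESTURE_NONE = 0,
    GESTURE_REST,
    GESTURE_FIST,
    GESTURE_OPEN,
    GESTURE_HOOK_EM,
    GESTURE_THUMBS_UP,
} gesture_t;

/* Hypothetical lookup table pairing each class name with its enum value. */
typedef struct {
    const char *name;
    gesture_t   gesture;
} gesture_name_entry_t;

static const gesture_name_entry_t k_gesture_names[] = {
    {"rest",      GESTURE_REST},
    {"fist",      GESTURE_FIST},
    {"open",      GESTURE_OPEN},
    {"hook_em",   GESTURE_HOOK_EM},
    {"thumbs_up", GESTURE_THUMBS_UP},
};

/* Returns the gesture for an exact name match, or GESTURE_NONE otherwise. */
gesture_t inference_get_gesture_enum_by_name(const char *name)
{
    if (name == NULL) {
        return GESTURE_NONE;
    }
    const size_t n = sizeof(k_gesture_names) / sizeof(k_gesture_names[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, k_gesture_names[i].name) == 0) {
            return k_gesture_names[i].gesture;
        }
    }
    return GESTURE_NONE;
}
```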
Alternatively, keep the existing function and add a simple wrapper — the string matching logic already exists in `inference.c`:

```c
// Simpler: reuse the existing strcmp chain in inference_get_gesture_enum()
// by passing the name through a helper that returns the gesture_t directly.
// Add to inference.c / inference.h:
gesture_t inference_get_gesture_by_name(const char *name);
// (same strcmp logic as inference_get_gesture_enum, but returns gesture_t directly)
```

**In `state_machine_loop()`** — add the new state:

```c
static void state_machine_loop(void) {
    command_t cmd;
    const TickType_t poll_interval = pdMS_TO_TICKS(50);
    while (1) {
        if (g_device_state == STATE_STREAMING) stream_emg_data();
        else if (g_device_state == STATE_PREDICTING) run_inference_loop();
        else if (g_device_state == STATE_LAPTOP_PREDICT) run_laptop_predict_loop();  // ← ADD
        xQueueReceive(g_cmd_queue, &cmd, poll_interval);
    }
}
```

**In `app_main()` switch** — add the standalone case:

```c
case EMG_STANDALONE:
    run_standalone_loop();   // new function — see Section 0.4
    break;
```

---

## 0.6 New Python Script: `live_predict.py`

**Location**: `C:/VSCode/Marvel_Projects/Bucky_Arm/live_predict.py` (new file)

**Purpose**: Laptop-side live inference. Reads raw ADC stream from ESP32, runs the Python classifier, sends gesture commands back to ESP32 for arm control.

**When to use**: `EMG_MAIN` + `STATE_LAPTOP_PREDICT` — useful for debugging and comparing laptop accuracy vs on-device accuracy before flashing a new model.

```python
"""
live_predict.py — Laptop-side live EMG inference for Bucky Arm.

Connects to ESP32, requests STATE_LAPTOP_PREDICT, reads raw ADC CSV,
runs the trained Python classifier, sends gesture commands back to ESP32.

Usage:
    python live_predict.py --port COM3 --model path/to/saved_model/
"""
import argparse
import sys
import time
from pathlib import Path

import numpy as np
import serial

sys.path.insert(0, str(Path(__file__).parent))
from learning_data_collection import (
    EMGClassifier, EMGFeatureExtractor, SessionStorage, HAND_CHANNELS,
    WINDOW_SIZE_SAMPLES, HOP_SIZE_SAMPLES, NUM_CHANNELS,
)

BAUD_RATE = 921600
CALIB_SEC = 3.0        # seconds of REST to collect for normalization at startup
CALIB_LABEL = "rest"   # label used during calibration window


def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("--port", required=True, help="Serial port, e.g. COM3 or /dev/ttyUSB0")
    p.add_argument("--model", required=True, help="Path to saved EMGClassifier model directory")
    return p.parse_args()


def handshake(ser):
    """Send connect command, wait for ack."""
    ser.write(b'{"cmd":"connect"}\n')
    deadline = time.time() + 5.0
    while time.time() < deadline:
        line = ser.readline().decode("utf-8", errors="ignore").strip()
        if "ack_connect" in line:
            print(f"[Handshake] Connected: {line}")
            return True
    raise RuntimeError("No ack_connect received within 5s")


def collect_calibration_windows(ser, n_windows, window_size, hop_size, n_channels):
    """Collect n_windows worth of REST data for normalization calibration."""
    print(f"[Calib] Collecting {n_windows} REST windows — hold arm still...")
    raw_buffer = np.zeros((window_size, n_channels), dtype=np.float32)
    windows = []
    sample_count = 0
    while len(windows) < n_windows:
        line = ser.readline().decode("utf-8", errors="ignore").strip()
        try:
            # Non-CSV lines (e.g. JSON telemetry) fail float() and are skipped
            vals = [float(v) for v in line.split(",")]
            if len(vals) != n_channels:
                continue
        except ValueError:
            continue
        raw_buffer = np.roll(raw_buffer, -1, axis=0)
        raw_buffer[-1] = vals
        sample_count += 1
        if sample_count >= window_size and sample_count % hop_size == 0:
            windows.append(raw_buffer.copy())
    print(f"[Calib] Collected {len(windows)} windows. Computing normalization stats...")
    return np.array(windows)  # (n_windows, window_size, n_channels)


def main():
    args = parse_args()

    # Load trained classifier
    print(f"[Init] Loading classifier from {args.model}...")
    classifier = EMGClassifier()
    classifier.load(Path(args.model))
    extractor = classifier.feature_extractor

    ser = serial.Serial(args.port, BAUD_RATE, timeout=1.0)
    time.sleep(0.5)
    ser.reset_input_buffer()

    handshake(ser)

    # Request laptop-predict mode
    ser.write(b'{"cmd":"start_laptop_predict"}\n')
    print("[Control] Entered STATE_LAPTOP_PREDICT")

    # Calibration: collect 3s of REST for session normalization
    # (assumes 1 kHz sampling, so one sample corresponds to 1 ms)
    n_calib_windows = max(10, int(CALIB_SEC * 1000 / HOP_SIZE_SAMPLES))
    calib_raw = collect_calibration_windows(
        ser, n_calib_windows, WINDOW_SIZE_SAMPLES, HOP_SIZE_SAMPLES, NUM_CHANNELS
    )
    calib_features = extractor.extract_features_batch(calib_raw)
    calib_mean = calib_features.mean(axis=0)
    feature_std = calib_features.std(axis=0)
    calib_std = np.where(feature_std > 1e-6, feature_std, 1e-6)  # guard against divide-by-zero
    print("[Calib] Done. Starting live prediction...")

    # Live prediction loop
    raw_buffer = np.zeros((WINDOW_SIZE_SAMPLES, NUM_CHANNELS), dtype=np.float32)
    sample_count = 0
    last_gesture = None

    try:
        while True:
            line = ser.readline().decode("utf-8", errors="ignore").strip()

            # Skip JSON telemetry lines from ESP32
            if line.startswith("{"):
                continue

            try:
                vals = [float(v) for v in line.split(",")]
                if len(vals) != NUM_CHANNELS:
                    continue
            except ValueError:
                continue

            # Slide window
            raw_buffer = np.roll(raw_buffer, -1, axis=0)
            raw_buffer[-1] = vals
            sample_count += 1

            if sample_count >= WINDOW_SIZE_SAMPLES and sample_count % HOP_SIZE_SAMPLES == 0:
                # Extract features and normalize with session stats
                feat = extractor.extract_features_window(raw_buffer)
                feat = (feat - calib_mean) / calib_std

                proba = classifier.model.predict_proba([feat])[0]
                class_idx = int(np.argmax(proba))
                gesture_name = classifier.label_names[class_idx]
                confidence = float(proba[class_idx])

                # Send gesture command to ESP32
                cmd = f'{{"gesture":"{gesture_name}"}}\n'
                ser.write(cmd.encode("utf-8"))

                if gesture_name != last_gesture:
                    print(f"[Predict] {gesture_name:12s}  conf={confidence:.2f}")
                    last_gesture = gesture_name

    except KeyboardInterrupt:
        print("\n[Stop] Sending stop command...")
        ser.write(b'{"cmd":"stop"}\n')
    finally:
        ser.close()  # release the port on any exit path


if __name__ == "__main__":
    main()
```

**Dependencies** (add to a `requirements.txt` in `Bucky_Arm/` if not already there):

```
pyserial
numpy
scikit-learn
```

---

## 0.7 Firmware Cleanup: `system_mode_t` Removal

`config.h` lines 93–100 define a `system_mode_t` typedef (with its doc comment) that is **not referenced anywhere** in the firmware. It predates the current `device_state_t` FSM in `main.c` and conflicts conceptually with it. Remove before starting implementation work.

**File**: `EMG_Arm/src/config/config.h`

**Remove** (lines 93–100):

```c
/**
 * @brief System operating modes.
 */
typedef enum {
    MODE_IDLE = 0,      /**< Waiting for commands */
    MODE_DATA_STREAM,   /**< Streaming EMG data to laptop */
    MODE_COMMAND,       /**< Executing gesture commands from laptop */
    MODE_DEMO,          /**< Running demo sequence */
    MODE_COUNT
} system_mode_t;
```

No other file references `system_mode_t` — the deletion is safe and requires no other changes.

---

# PART I — SYSTEM FOUNDATIONS

## 1. Hardware Specification

### ESP32-S3 N32R16V — Confirmed Hardware

| Resource | Spec | Implication |
|----------|------|-------------|
| CPU | Dual-core Xtensa LX7 @ 240 MHz | Pin inference to Core 1, sampling to Core 0 |
| SIMD | PIE 128-bit vector extension | esp-dsp exploits this for FFT, biquad, dot-product |
| Internal SRAM | ~512 KB | All hot-path buffers, model weights, inference state |
| OPI PSRAM | 16 MB (~80 MB/s) | ADC ring buffer, raw window storage — not hot path |
| Flash | 32 MB | Code + read-only model flatbuffers (TFLM path) |
| ADC | 2× SAR ADC, 12-bit, continuous DMA mode | Change A: use `adc_continuous` driver |

**Memory rules**:

- Tag inference code: `IRAM_ATTR` — prevents cache miss stalls
- Tag large ring buffers: `EXT_RAM_BSS_ATTR` — pushes to PSRAM automatically
- Never run hot-path loops from PSRAM (latency varies; ~10× slower than SRAM)

### Espressif Acceleration Libraries

| Library | Accelerates | Key Functions |
|---------|-------------|---------------|
| **esp-dsp** | IIR biquad, FFT (up to 4096-pt), vector dot-product, matrix ops — PIE SIMD | `dsps_biquad_f32`, `dsps_fft2r_fc32`, `dsps_dotprod_f32` |
| **esp-nn** | int8 FC, depthwise/pointwise Conv, activations — SIMD optimized | Used internally by esp-dl |
| **esp-dl** | High-level int8 inference: MLP, Conv1D, LSTM; activation buffer management | Small MLP / tiny CNN deployment |
| **TFLite Micro** | Standard int8 flatbuffer inference, tensor arena (static alloc) | Keras → TFLite → int8 workflow |

### Real-Time Budget (1000 Hz, 25ms hop)

| Stage | Cost | Notes |
|-------|------|-------|
| ADC DMA sampling | ~0 µs | Hardware; CPU-free |
| IIR biquad (3 ch, 2 stages) | <100 µs | `dsps_biquad_f32` |
| Feature extraction (69 feat) | ~1,200 µs | FFT-based features dominate |
| 3 specialist LDAs | ~150 µs | `dsps_dotprod_f32` per class |
| Meta-LDA (15 inputs) | ~10 µs | 75 MACs total |
| int8 MLP fallback [69→32→16→5] | ~250 µs | esp-nn FC kernels |
| Post-processing | <50 µs | EMA, vote, debounce |
| **Total (full ensemble)** | **~1,760 µs** | **14× margin within 25ms** |

### Hard No-Gos

| Technique | Why |
|-----------|-----|
| Full MPF with matrix logarithm | Eigendecomposition per window; fragile float32; no SIMD path |
| Conv1D(16→512) + 3×LSTM(512) | ~4 MB weights; LSTM sequential dependency — impossible |
| Any transformer / attention | O(n²); no int8 transformer kernels for MCU |
| On-device gradient updates | Inference only — no training infrastructure |
| Heap allocations on hot path | FreeRTOS heap fragmentation kills determinism |

---

## 2. Current System Snapshot

| Aspect | Current State |
|--------|--------------|
| Channels | 4 total; ch0–ch2 forearm (FCR, extensors, FCU), ch3 bicep (excluded from hand classifier) |
| Sampling | 1000 Hz, timer/polling (jitter — fix with Change A) |
| Window | 150 samples (150ms), 25-sample hop (25ms) |
| Features | 12: RMS, WL, ZC, SSC × 3 channels |
| Classifier | Single LDA, float32 weights in C header |
| Label alignment | RMS onset detection — missing +100ms forward shift (Change 0) |
| Normalization | Per-session z-score in Python; no on-device equivalent (Change D) |
| Smoothing | EMA (α=0.7) + majority vote (5) + debounce (3 counts) |
| Confidence rejection | None — always outputs a class (Change C) |
| Signal filtering | Analogue only via MyoWare (Change B adds software IIR) |
| Gestures | 5: fist, hook\_em, open, rest, thumbs\_up |
| Training data | 15 HDF5 sessions, 1 user |

---

## 2.1 — Confirmed Firmware Architecture (From Codebase Exploration)

> Confirmed by direct codebase inspection 2026-02-24.
> All file paths are relative to `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/`

### ADC Pin Mapping (`drivers/emg_sensor.c`)

| Channel | ADC Channel | GPIO | Muscle Location | Role in Classifier |
|---------|-------------|------|-----------------|-------------------|
| ch0 | `ADC_CHANNEL_1` | GPIO 2 | Forearm Belly (FCR) | Primary flexion signal |
| ch1 | `ADC_CHANNEL_2` | GPIO 3 | Forearm Extensors | Extension signal |
| ch2 | `ADC_CHANNEL_8` | GPIO 9 | Forearm Contractors (FCU) | Ulnar flexion signal |
| ch3 | `ADC_CHANNEL_9` | GPIO 10 | Bicep | Independent — see Section 2.2 |

**Current ADC driver**: `adc_oneshot` (polling — **NOT DMA continuous yet**; Change A migrates this)

- Attenuation: `ADC_ATTEN_DB_12` (usable input range roughly 0–3.1 V on ESP32-S3)
- Calibration: `adc_cali_curve_fitting` scheme
- Output: calibrated millivolts as `uint16_t` packed into `emg_sample_t.channels[4]`
- Timing: `vTaskDelay(1)` in `run_inference_loop()` provides the ~1ms sample interval

### Current Task Structure (`app/main.c`)

| Task | Priority | Stack | Core Pinning | Role |
|------|----------|-------|--------------|------|
| `app_main` (implicit) | Default | Default | None | Runs inference loop + state machine |
| `serial_input_task` | 5 | 4096 B | **None** | Parses UART JSON commands |

**No other tasks exist.** Change A will add `adc_sampling_task` pinned to Core 0. The inference loop runs on `app_main`'s default task — no explicit core affinity.

### State Machine (`app/main.c`)

```
STATE_IDLE ─(BLE/UART connect)─► STATE_CONNECTED
                                      │
            {"cmd": "start_stream"}   ▼
                                 STATE_STREAMING   (sends raw ADC over UART for Python)
                                      │
            {"cmd": "start_predict"}  ▼
                                 STATE_PREDICTING  (runs run_inference_loop())
```

Communication: UART at 921600 baud, JSON framing.
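The transition rules can also be written down as a small pure function, which is convenient for host-side unit tests. The following is only a sketch under stated assumptions: it covers the four states that exist today, follows the command table in Section 0.3 (start and start_predict are accepted only from `STATE_CONNECTED`), and silently ignores commands that are not valid in the current state. The function name `fsm_next_state` is hypothetical; the actual logic lives inline in `serial_input_task()` and may handle invalid commands differently.

```c
/* Enumerator names follow the device_state_t / command_t enums described in
 * Section 0.5, without the new laptop-predict additions. */
typedef enum {
    STATE_IDLE = 0,
    STATE_CONNECTED,
    STATE_STREAMING,
    STATE_PREDICTING,
} device_state_t;

typedef enum {
    CMD_NONE = 0,
    CMD_CONNECT,
    CMD_START,
    CMD_START_PREDICT,
    CMD_STOP,
    CMD_DISCONNECT,
} command_t;

/* Hypothetical helper: returns the state that follows `state` after `cmd`.
 * Commands that are not valid in the current state leave it unchanged
 * (an assumption of this sketch, not confirmed firmware behavior). */
device_state_t fsm_next_state(device_state_t state, command_t cmd)
{
    switch (cmd) {
    case CMD_CONNECT:
        return STATE_CONNECTED;                 /* allowed from any state (reconnect) */
    case CMD_DISCONNECT:
        return STATE_IDLE;                      /* any active state drops to idle */
    case CMD_START:
        return (state == STATE_CONNECTED) ? STATE_STREAMING : state;
    case CMD_START_PREDICT:
        return (state == STATE_CONNECTED) ? STATE_PREDICTING : state;
    case CMD_STOP:
        return (state == STATE_STREAMING || state == STATE_PREDICTING)
                   ? STATE_CONNECTED
                   : state;
    case CMD_NONE:
    default:
        return state;
    }
}
```

Keeping the rules in one side-effect-free function would let the firmware task apply `g_device_state = fsm_next_state(g_device_state, cmd)` and only then perform prints or queue sends, though whether that refactor is worthwhile is a judgment call.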
### Complete Data Flow (Exact Function Names)

```
emg_sensor_read(&sample)
  │  drivers/emg_sensor.c
  │  adc_oneshot_read() × 4 channels → adc_cali_raw_to_voltage() → uint16_t mV
  │  Result: sample.channels[4] = {ch0_mV, ch1_mV, ch2_mV, ch3_mV}
  │
  ▼  Called every ~1ms (vTaskDelay(1) in run_inference_loop)
inference_add_sample(sample.channels)
  │  core/inference.c
  │  Writes to circular window_buffer[150][4]
  │  Returns true when buffer is full (after first 150 samples)
  │
  ▼  Called every 25 samples (stride_counter % INFERENCE_HOP_SIZE == 0)
inference_predict(&confidence)
  │  core/inference.c
  │  compute_features() → LDA scores → softmax → EMA → majority vote → debounce
  │  Returns: gesture class index (int), fills confidence (float)
  │
  ▼
inference_get_gesture_enum(class_idx)
  │  core/inference.c
  │  String match on MODEL_CLASS_NAMES[] → gesture_t enum value
  │
  ▼
gestures_execute(gesture)
     core/gestures.c
     switch(gesture) → servo PWM via LEDC driver
     Servo pins: GPIO 1,4,5,6,7 (Thumb, Index, Middle, Ring, Pinky)
```

### Current Buffer State

```c
// core/inference.c line 19:
static uint16_t window_buffer[INFERENCE_WINDOW_SIZE][NUM_CHANNELS];
//     ^^^^^^^^ MUST change to float when adding IIR filter (Change B)
//
// uint16_t: 150 × 4 × 2 = 1,200 bytes in internal SRAM
// float:    150 × 4 × 4 = 2,400 bytes in internal SRAM (still trivially small)
//
// Reason for change: IIR filter outputs float; casting back to uint16_t loses
// sub-mV precision and re-introduces the quantization noise we just filtered out.
```

### `platformio.ini` Current State (`EMG_Arm/platformio.ini`)

**Current `lib_deps`**: **None** — completely empty, no external library dependencies.

Required additions per change tier:

| Change | Library | `platformio.ini` `lib_deps` entry |
|--------|---------|----------------------------------|
| B (IIR biquad) | esp-dsp | `espressif/esp-dsp @ ^2.0.0` |
| 1 (FFT features) | esp-dsp | (same — add once for both B and 1) |
| E (int8 MLP) | TFLite Micro | `tensorflow/tflite-micro` |
| F (ensemble) | esp-dsp | (same as B) |

Add to `platformio.ini` under `[env:esp32-s3-devkitc1-n16r16]`:

```ini
lib_deps =
    espressif/esp-dsp @ ^2.0.0
    ; tensorflow/tflite-micro   ← add this only when implementing Change E
```

---

## 2.2 — Bicep Channel Subsystem (ch3 / ADC_CHANNEL_9 / GPIO 10)

### Current Status

The bicep channel is:

- **Sampled**: `emg_sensor_read()` reads all 4 channels; `sample.channels[3]` holds bicep data
- **Excluded from hand classifier**: `HAND_NUM_CHANNELS = 3`; `compute_features()` explicitly loops `ch = 0` to `ch < HAND_NUM_CHANNELS` (i.e., ch0, ch1, ch2 only)
- **Not yet independently processed**: the comment in `inference.c` line 68 (`"ch3 (bicep) is excluded — it will be processed independently"`) is aspirational — the independent processing is not yet implemented

### Phase 1 — Binary Flex/Unflex (Current Target)

Implement a simple RMS threshold detector as a new subsystem:

**New files:**

```
EMG_Arm/src/core/bicep.h
EMG_Arm/src/core/bicep.c
```

**bicep.h:**

```c
#pragma once
#include <stdint.h>
#include <stdbool.h>

typedef enum {
    BICEP_STATE_REST = 0,
    BICEP_STATE_FLEX = 1,
} bicep_state_t;

// Call once at session start with ~3s of relaxed bicep data.
// Returns the computed threshold (also stored internally).
float bicep_calibrate(const uint16_t *ch3_samples, int n_samples);

// Call every 25ms (same hop as hand gesture inference).
// Computes RMS on the last BICEP_WINDOW_SAMPLES from the ch3 circular buffer.
bicep_state_t bicep_detect(void);

// Load/save threshold to NVS (reuse calibration.c infrastructure from Change D)
bool bicep_save_threshold(float threshold_mv);
bool bicep_load_threshold(float *threshold_mv_out);
```

**Core logic (`bicep.c`):**

```c
#include <math.h>    // sqrtf
#include <stdio.h>   // printf
#include "bicep.h"

#define BICEP_WINDOW_SAMPLES  50     // 50ms window at 1000Hz
#define BICEP_FLEX_MULTIPLIER 2.5f   // threshold = rest_rms × 2.5
#define BICEP_HYSTERESIS      1.3f   // prevents rapid toggling at threshold boundary

static float s_threshold_mv = 0.0f;
static bicep_state_t s_state = BICEP_STATE_REST;

float bicep_calibrate(const uint16_t *ch3_samples, int n_samples) {
    float rms_sq = 0.0f;
    for (int i = 0; i < n_samples; i++)
        rms_sq += (float)ch3_samples[i] * ch3_samples[i];
    float rest_rms = sqrtf(rms_sq / n_samples);
    s_threshold_mv = rest_rms * BICEP_FLEX_MULTIPLIER;
    printf("[Bicep] Calibrated: rest_rms=%.1f mV, threshold=%.1f mV\n",
           rest_rms, s_threshold_mv);
    return s_threshold_mv;
}

bicep_state_t bicep_detect(void) {
    // Compute RMS on the most recent BICEP_WINDOW_SAMPLES from the ch3 circular buffer
    // (ch3 values are stored in window_buffer[][3] alongside hand channels).
    // window_buffer is static in inference.c, so it must be exposed via an accessor.
    // Assumption: buffer_head is the next write index, so the newest
    // BICEP_WINDOW_SAMPLES samples end just before it.
    float rms_sq = 0.0f;
    int idx = (buffer_head + INFERENCE_WINDOW_SIZE - BICEP_WINDOW_SAMPLES)
              % INFERENCE_WINDOW_SIZE;
    for (int i = 0; i < BICEP_WINDOW_SAMPLES; i++) {
        float v = (float)window_buffer[idx][3];  // ch3 = bicep
        rms_sq += v * v;
        idx = (idx + 1) % INFERENCE_WINDOW_SIZE;
    }
    float rms = sqrtf(rms_sq / BICEP_WINDOW_SAMPLES);

    // Hysteresis: require threshold × BICEP_HYSTERESIS to enter flex, threshold × 1.0 to exit
    if (s_state == BICEP_STATE_REST && rms > s_threshold_mv * BICEP_HYSTERESIS)
        s_state = BICEP_STATE_FLEX;
    else if (s_state == BICEP_STATE_FLEX && rms < s_threshold_mv)
        s_state = BICEP_STATE_REST;

    return s_state;
}
```

**Integration in `main.c` `run_inference_loop()`:**

```c
// Call alongside inference_predict() every 25ms:
if (stride_counter % INFERENCE_HOP_SIZE == 0) {
    float confidence;
    int class_idx = inference_predict(&confidence);
    gesture_t gesture = inference_get_gesture_enum(class_idx);
    bicep_state_t bicep = bicep_detect();

    // Combined actuation: hand gesture + bicep state
    // Example: bicep flex can enable/disable certain gestures,
    // or control a separate elbow/wrist joint.
    gestures_execute(gesture);
    // bicep_actuate(bicep);  ← add when elbow motor is wired
}
```

**Calibration trigger (add to serial_input_task command parsing):**

```c
// {"cmd": "calibrate_bicep"} → collect 3s of rest data, call bicep_calibrate()
```

### Phase 2 — Continuous Angle/Velocity Prediction (Future)

When ready to move beyond binary flex/unflex:

1. **Collect angle-labeled data**: hold arm at 0°, 15°, 30°, 45°, 60°, 75°, 90°; log RMS at each; collect 5+ reps per angle.
2. **Fit polynomial**: `angle = a0 + a1*rms + a2*rms²` (degree-2 usually sufficient); use `numpy.polyfit(rms_values, angles, deg=2)`.
3. **Store coefficients in NVS**: 3 floats via `nvs_set_blob()`.
4. **On-device evaluation**: `angle = a0 + rms*(a1 + rms*a2)` — 2 MACs per inference.
5. **Velocity**: `velocity = (angle_now - angle_prev) / HOP_MS` with low-pass smoothing.

### Including ch3 in Hand Gesture Classifier (for Wrist Rotation)

If/when wrist rotation or supination gestures are added:

```python
# learning_data_collection.py — change this constant:
HAND_CHANNELS = [0, 1, 2, 3]  # was [0, 1, 2]; include bicep for rotation gestures
```

Feature count becomes: 4 channels × 20 per-ch + 10 cross-ch covariances + 6 correlations = **96 total**. The bicep subsystem is then retired and ch3 becomes part of the main gesture classifier.

---

## 3. What Meta Built — Filtered for ESP32

Meta's Nature 2025 paper (doi:10.1038/s41586-025-09255-w) describes a 16-channel wristband running Conv1D(16→512)+3×LSTM(512). **That exact model is not portable to ESP32-S3** (~4 MB weights). What IS transferable:

| Meta Technique | Transferability | Where Used |
|----------------|-----------------|-----------|
| +100ms forward label shift after onset detection | ✓ Direct copy | Change 0 |
| Frequency features > amplitude features (Extended Data Fig. 6) | ✓ Core insight | Change 1, Change 6 |
| Deliberate electrode repositioning between sessions | ✓ Protocol | Change 2 |
| Window jitter + amplitude augmentation | ✓ Training | Change 3 |
| Reinhard compression `64x/(32+\|x\|)` | ✓ Optional flag | Change 4 |
| EMA α=0.7, threshold=0.35, debounce=50ms | ✓ EMA and debounce already implemented; confidence threshold pending | Change C |
| Specialist features → meta-learner stacking | ✓ Adapted | Change 7 + F |
| Conv1D+LSTM architecture | ✗ Too large | Not implementable |
| Full MPF with matrix logarithm | ✗ Eigendecomp too costly | Not implementable |

---

## 4. Current Code State + Known Bugs

**All Python changes**: `C:/VSCode/Marvel_Projects/Bucky_Arm/learning_data_collection.py`
**Firmware**: `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.c`
**Config**: `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/config/config.h`
**Weights**: `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights.h`

### Key Symbol Locations

| Symbol | Line | Notes |
|--------|------|-------|
| Constants block | 49–94 | `NUM_CHANNELS`, `SAMPLING_RATE_HZ`, `WINDOW_SIZE_MS`, etc. |
| `align_labels_with_onset()` | 442 | RMS onset detection |
| `filter_transition_windows()` | 529 | Removes onset/offset ambiguity windows |
| `SessionStorage.save_session()` | 643 | Calls onset alignment, saves HDF5 |
| `SessionStorage.load_all_for_training()` | 871 | Returns 6 values (see bug below) |
| `EMGFeatureExtractor` class | 1404 | Current: RMS, WL, ZC, SSC only |
| `extract_features_single_channel()` | 1448 | Per-channel feature dict |
| `extract_features_window()` | 1482 | Flat array + cross-channel |
| `extract_features_batch()` | 1520 | Batch wrapper |
| `get_feature_names()` | 1545 | String names for features |
| `CalibrationTransform` class | 1562 | z-score at Python-side inference |
| `EMGClassifier` class | 1713 | LDA/QDA wrapper |
| `EMGClassifier.__init__()` | 1722 | Creates `EMGFeatureExtractor` |
| `EMGClassifier.train()` | 1735 | Feature extraction + model fit |
| `EMGClassifier._apply_session_normalization()` | 1774 | Per-session z-score |
| `EMGClassifier.cross_validate()` | 1822 | GroupKFold, trial-level |
| `EMGClassifier.export_to_header()` | 1956 | Writes `model_weights.h` |
| `EMGClassifier.save()` | 1910 | Persists model params |
| `EMGClassifier.load()` | 2089 | Reconstructs from saved params |
| `run_training_demo()` | 2333 | Main training entry point |
| `inference.c` `compute_features()` | 68 | C feature extraction |
| `inference.c` `inference_predict()` | 158 | C LDA + smoothing pipeline |

### Pending Cleanups (Do Before Any Other Code Changes)

| Item | File | Action |
|------|------|--------|
| Remove `system_mode_t` | `config/config.h` lines 93–100 | Delete the unused typedef (see Part 0, Section 0.7) |
| Add `EMG_STANDALONE` to enum | `config/config.h` line 19 | Add value to the existing MAIN_MODE enum |
| Add `STATE_LAPTOP_PREDICT` + `CMD_START_LAPTOP_PREDICT` | `app/main.c` | See Part 0, Section 0.5 for exact diffs |
| Add `run_standalone_loop()` | `app/main.c` | New function — see Part 0, Section 0.4 |
| Add
`run_laptop_predict_loop()` | `app/main.c` | New function — see Part 0, Section 0.5 | | Add `inference_get_gesture_by_name()` | `core/inference.c` + `core/inference.h` | Small helper — extracts existing strcmp logic | ### Known Bug — Line 2382 ```python # BUG: load_all_for_training() returns 6 values; this call unpacks only 5. # session_indices_combined is silently dropped — breaks per-session normalization. X, y, trial_ids, label_names, loaded_sessions = storage.load_all_for_training() # FIX (apply with Change 1): X, y, trial_ids, session_indices, label_names, loaded_sessions = storage.load_all_for_training() ``` ### Current `model_weights.h` State (as of 2026-02-14 training run) | Constant | Value | Note | |----------|-------|------| | `MODEL_NUM_CLASSES` | 5 | fist, hook_em, open, rest, thumbs_up | | `MODEL_NUM_FEATURES` | 12 | RMS, WL, ZC, SSC × 3 forearm channels | | `MODEL_CLASS_NAMES` | `{"fist","hook_em","open","rest","thumbs_up"}` | Alphabetical order | | `MODEL_NORMALIZE_FEATURES` | *not defined yet* | Add when enabling cross-ch norm (Change B) | | `MODEL_USE_REINHARD` | *not defined yet* | Add when enabling Reinhard compression (Change 4) | | `FEAT_ZC_THRESH` | `0.1f` | Fraction of RMS for zero-crossing threshold | | `FEAT_SSC_THRESH` | `0.1f` | Fraction of RMS for slope sign change threshold | The LDA_WEIGHTS and LDA_INTERCEPTS arrays are current trained values — do not modify manually. They are regenerated by `EMGClassifier.export_to_header()` after each training run. 
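As a rough illustration of how the exported constants are consumed, the sketch below scores one feature vector with a linear discriminant and a softmax, then maps the winning index to a class name. It is a minimal Python mirror of the general technique, not the firmware code: `lda_predict`, `toy_weights`, and `toy_intercepts` are hypothetical names, and the toy values use 3 features instead of the 12 in the current header.

```python
import math

# Same alphabetical ordering as MODEL_CLASS_NAMES in model_weights.h
MODEL_CLASS_NAMES = ["fist", "hook_em", "open", "rest", "thumbs_up"]


def lda_predict(features, weights, intercepts):
    """Return (class_index, probabilities) for one feature vector.

    weights is [n_classes][n_features]; intercepts is [n_classes].
    """
    # Linear score per class: b_c + w_c . x
    scores = [b + sum(w * x for w, x in zip(row, features))
              for row, b in zip(weights, intercepts)]
    peak = max(scores)  # subtract the peak so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs


# Toy values for illustration only (5 classes x 3 features), NOT trained weights.
toy_weights = [
    [2.0, 0.0, 0.0],     # fist
    [0.0, 2.0, 0.0],     # hook_em
    [0.0, 0.0, 2.0],     # open
    [-1.0, -1.0, -1.0],  # rest
    [1.0, 1.0, 0.0],     # thumbs_up
]
toy_intercepts = [0.0, 0.0, 0.0, 0.5, -1.0]

idx, probs = lda_predict([1.5, 0.1, 0.2], toy_weights, toy_intercepts)
print(MODEL_CLASS_NAMES[idx], [round(p, 3) for p in probs])
```

Because the class index is only meaningful through `MODEL_CLASS_NAMES`, any hand edit that reorders the names without retraining would silently relabel predictions, which is one reason the header should only be regenerated, never edited.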
### Current Feature Vector (12 features — firmware contract) ``` ch0: [0]=rms [1]=wl [2]=zc [3]=ssc ch1: [4]=rms [5]=wl [6]=zc [7]=ssc ch2: [8]=rms [9]=wl [10]=zc [11]=ssc ``` ### Target Feature Vector (69 features after Change 1) ``` Per channel (×3 channels, 20 features each): [0] rms [1] wl [2] zc [3] ssc [4] mav [5] var [6] iemg [7] wamp [8] ar1 [9] ar2 [10] ar3 [11] ar4 [12] mnf [13] mdf [14] pkf [15] mnp [16] bp0 [17] bp1 [18] bp2 [19] bp3 ch0: indices 0–19 ch1: indices 20–39 ch2: indices 40–59 Cross-channel (9 features): [60] cov_ch0_ch0 [61] cov_ch0_ch1 [62] cov_ch0_ch2 [63] cov_ch1_ch1 [64] cov_ch1_ch2 [65] cov_ch2_ch2 [66] cor_ch0_ch1 [67] cor_ch0_ch2 [68] cor_ch1_ch2 ``` ### Specialist Feature Subset Indices (for Change F + Change 7) ``` TD (time-domain, 36 feat): indices [0–11, 20–31, 40–51] FD (frequency-domain, 24 feat): indices [12–19, 32–39, 52–59] CC (cross-channel, 9 feat): indices [60–68] ``` --- # PART II — TARGET ARCHITECTURE ## 5. Full Recommended Multi-Model Stack ``` ADC (DMA, Change A) └── IIR Biquad filter per channel (Change B) └── 150-sample circular window buffer │ ▼ [every 25ms] compute_features() → 69-feature vector │ ▼ calibration_apply() (Change D — NVS z-score) │ ├─── Stage 1: Activity Gate ──────────────────────────────────┐ │ total_rms < REST_THRESHOLD? → return GESTURE_REST │ │ (skips all inference during obvious idle) │ │ │ ▼ (only reached when gesture is active) │ Stage 2: Parallel Specialist LDAs (Change F) │ ├── LDA_TD [TD features, 36-dim] → prob_td[5] │ ├── LDA_FD [FD features, 24-dim] → prob_fd[5] │ └── LDA_CC [CC features, 9-dim] → prob_cc[5] │ │ ▼ │ Stage 3: Meta-LDA stacker (Change F) │ input: [prob_td | prob_fd | prob_cc] (15-dim) │ output: meta_probs[5] │ │ ▼ │ EMA smoothing (α=0.7) on meta_probs │ │ │ ├── max smoothed prob ≥ 0.50? 
────── Yes ──────────────────┐ │ │ │ │ └── No: Stage 4 Confidence Cascade (Change E) │ │ run int8 MLP on full 69-feat vector │ │ use higher-confidence winner │ │ │ │ │ └────────────────────────────────────────────►│ │ │ │ ◄────────────────────────────────────────────────────────── │ │ │ ◄─┘ ▼ Stage 5: Confidence rejection (Change C) max_prob < 0.40? → return current_output (hold / GESTURE_NONE) │ ▼ Majority vote (window=5) + Debounce (count=3) │ ▼ final gesture → actuation
```

### Model Weight Footprint

| Model | Input Dim | Weights | Memory (float32) |
|-------|-----------|---------|-----------------|
| LDA_TD | 36 | 5×36 = 180 | 720 B |
| LDA_FD | 24 | 5×24 = 120 | 480 B |
| LDA_CC | 9 | 5×9 = 45 | 180 B |
| Meta-LDA | 15 | 5×15 = 75 | 300 B |
| int8 MLP [69→32→16→5] | 69 | ~2,900 | ~2.9 KB int8 |
| **Total** | | | **~4.6 KB** |

All model weights fit comfortably in internal SRAM.

---

## 6. Compute Budget for Full Stack

| Stage | Cost | Cumulative |
|-------|------|-----------|
| Feature extraction (69 feat, 128-pt FFT ×3) | 1,200 µs | 1,200 µs |
| NVS calibration apply | 10 µs | 1,210 µs |
| Activity gate (RMS check) | 5 µs | 1,215 µs |
| LDA_TD (36 feat × 5 classes) | 50 µs | 1,265 µs |
| LDA_FD (24 feat × 5 classes) | 35 µs | 1,300 µs |
| LDA_CC (9 feat × 5 classes) | 15 µs | 1,315 µs |
| Meta-LDA (15 feat × 5 classes) | 10 µs | 1,325 µs |
| EMA + confidence check | 10 µs | 1,335 µs |
| int8 MLP (worst case, ~30% of hops) | 250 µs | 1,585 µs |
| Vote + debounce | 20 µs | 1,605 µs |
| **Worst-case total** | **1,605 µs** | **≈6.4% of 25ms budget** |

---

## 7. Why This Architecture Works for 3-Channel EMG

Three channels means limited spatial information.
The ensemble compensates by extracting **maximum diversity from the temporal and spectral dimensions**:

- **LDA_TD** specializes in muscle activation *intensity and dynamics* (how hard and fast is each muscle firing)
- **LDA_FD** specializes in muscle activation *frequency content* (motor unit recruitment patterns — slow vs. fast twitch fibres fire at different frequencies)
- **LDA_CC** specializes in *inter-muscle coordination* (which muscles co-activate — the spatial "fingerprint" of each gesture)

These three signal aspects are partially uncorrelated. A gesture that confuses LDA_TD (similar amplitude patterns) may be distinguishable by LDA_FD (different frequency recruitment) or LDA_CC (different co-activation pattern). The meta-LDA learns which specialist to trust for each gesture boundary.

The int8 MLP fallback handles the residual nonlinear cases: gesture pairs where the decision boundary is curved in feature space, which LDA (linear boundary only) cannot resolve.

---

# PART III — GESTURE EXTENSIBILITY

## 8. What Changes When Adding or Removing a Gesture

The system is designed for extensibility. Adding a gesture requires **a few lines in three firmware files, one new actuation function, and a retrain**.

### What Changes Automatically (No Manual Code Edits)

| Component | How it adapts |
|-----------|--------------|
| `MODEL_NUM_CLASSES` in `model_weights.h` | Auto-computed from training data label count |
| LDA weight array dimensions | `[MODEL_NUM_CLASSES][MODEL_NUM_FEATURES]` — regenerated by `export_to_header()` |
| `MODEL_CLASS_NAMES` array | Regenerated by `export_to_header()` |
| All ensemble LDA weight arrays | Regenerated by `export_ensemble_header()` (Change 7) |
| int8 MLP output layer | Retrained with new class count; re-exported to TFLite |
| Meta-LDA input/output dims | `META_NUM_INPUTS = 3 × MODEL_NUM_CLASSES` — auto from Python |

### What Requires Manual Code Changes

**Python side** (`learning_data_collection.py`):

```python
# 1.
#    Add gesture name to the gesture list (1 line)
# Find where GESTURES or similar list is defined (near constants block ~line 49)
GESTURES = ['fist', 'hook_em', 'open', 'rest', 'thumbs_up', 'wrist_flex']  # example
```

**Firmware — `config.h`** (1 line per gesture):

```c
// Add enum value
typedef enum {
    GESTURE_NONE = 0,
    GESTURE_REST = 1,
    GESTURE_FIST = 2,
    GESTURE_OPEN = 3,
    GESTURE_HOOK_EM = 4,
    GESTURE_THUMBS_UP = 5,
    GESTURE_WRIST_FLEX = 6, // ← add this line
    GESTURE_COUNT           // keep last; sizes gesture_names[]
} gesture_t;
```

**Firmware — `inference.c`** `inference_get_gesture_enum()` (2–3 lines per gesture):

```c
if (strcmp(name, "wrist_flex") == 0 || strcmp(name, "WRIST_FLEX") == 0)
    return GESTURE_WRIST_FLEX;
```

**Firmware — `gestures.c`** (2 changes — these are easy to miss):

```c
// 1. Add to gesture_names[] static array — index MUST match gesture_t enum value:
static const char *gesture_names[GESTURE_COUNT] = {
    "NONE",       // GESTURE_NONE = 0
    "REST",       // GESTURE_REST = 1
    "FIST",       // GESTURE_FIST = 2
    "OPEN",       // GESTURE_OPEN = 3
    "HOOK_EM",    // GESTURE_HOOK_EM = 4
    "THUMBS_UP",  // GESTURE_THUMBS_UP = 5
    "WRIST_FLEX", // GESTURE_WRIST_FLEX = 6 ← add here
};

// 2. Add case to gestures_execute() switch statement:
case GESTURE_WRIST_FLEX:
    gesture_wrist_flex(); // implement the actuation function
    break;
```

**Critical**: `GESTURE_COUNT` at the end of the `gesture_t` enum in `config.h` is used as the array size for `gesture_names[]`. It updates automatically when new enum values are added before it. Both `gesture_names[GESTURE_COUNT]` and the switch statement must be kept in sync with `GESTURE_COUNT`. A mismatch causes an out-of-bounds access or silent misclassification.

### Complete Workflow for Adding a Gesture

```
1. Python: add gesture string to GESTURES list in learning_data_collection.py (1 line)
2. Data: collect ≥10 sessions × ≥30 reps of new gesture
   (follow Change 2 protocol: vary electrode placement between sessions)
3.
Train: python learning_data_collection.py → option 3
   OR: python train_ensemble.py (after Change 7 is implemented)
4. Export: export_to_header() OR export_ensemble_header()
   → overwrites model_weights.h / model_weights_ensemble.h with new class count
5. config.h: add enum value before GESTURE_COUNT (1 line):
       GESTURE_WRIST_FLEX = 6, // ← insert before GESTURE_COUNT
       GESTURE_COUNT // stays last — auto-counts
6. inference.c: add string mapping in inference_get_gesture_enum() (2 lines)
7. gestures.c: add name to gesture_names[] array at correct index (1 line)
8. gestures.c: add case to gestures_execute() switch statement (3 lines)
9. Implement actuation function for new gesture (servo angles)
10. Reflash and validate: pio run -t upload
```

**Exact files touched per new gesture (summary):**

| File | What to change |
|------|---------------|
| `learning_data_collection.py` | Add string to GESTURES list |
| `config/config.h` | Add enum value before `GESTURE_COUNT` |
| `core/inference.c` | Add `strcmp` case in `inference_get_gesture_enum()` |
| `core/gestures.c` | Add to `gesture_names[]` array + add switch case |
| `core/gestures.c` | Implement `gesture_<name>()` function with servo angles |
| `core/model_weights.h` | Auto-generated — do not edit manually |

### Removing a Gesture

Removing is the same process in reverse, with one additional step: filter the HDF5 training data so that samples labeled with the removed gesture are excluded. The simplest approach is to pass a label whitelist to `load_all_for_training()`:

```python
# Proposed addition to load_all_for_training() — add include_labels parameter
X, y, trial_ids, session_indices, label_names, sessions = \
    storage.load_all_for_training(include_labels=['fist', 'open', 'rest', 'thumbs_up'])
# hook_em removed — existing session files are not modified
```

---

## 9.
Practical Limits of 3-Channel EMG This is the most important constraint for gesture count: | Gesture Count | Expected Accuracy | Notes | |--------------|-------------------|-------| | 3–5 gestures | >90% achievable | Current baseline target | | 6–8 gestures | 80–90% achievable | Requires richer features + ensemble | | 9–12 gestures | 65–80% achievable | Diminishing returns; some pairs will be confused | | 13+ gestures | <65% | Surface EMG with 3 channels cannot reliably separate this many | **Why 3 channels limits gesture count**: Surface EMG captures the summed electrical activity of many motor units under each electrode. With only 3 spatial locations, gestures that recruit overlapping muscle groups (e.g., all finger-flexion gestures recruit FCR) produce similar signals. The frequency and coordination features from Change 1 help, but there's a hard information-theoretic limit imposed by channel count. **Rule of thumb**: aim for ≤8 gestures with the current 3-channel setup. For more, add the bicep channel (ch3, currently excluded) to get 4 channels — see Section 10. --- ## 10. Specific Gesture Considerations ### Wrist Flexion / Extension - **Feasibility**: High — FCR (ch0) activates strongly for flexion; extensor group (ch2) for extension - **Differentiation from finger gestures**: frequency content differs (wrist involves slower motor units) - **Recommendation**: Add these before wrist rotation — more reliable with surface EMG ### Wrist Rotation (Supination / Pronation) - **Feasibility**: Medium — the primary supinator is a deep muscle; surface electrodes capture it weakly - **Key helper**: the bicep activates strongly during supination → **include ch3** (`HAND_CHANNELS = [0, 1, 2, 3]`) - **Code change for 4 channels**: Python: `HAND_CHANNELS = [0, 1, 2, 3]`; firmware: `HAND_NUM_CHANNELS` auto-updates from the exported header since `MODEL_NUM_FEATURES` is recalculated - **Caveat**: pronation vs. rest may be harder to distinguish than supination vs. 
rest ### Pinch / Precision Grasp - **Feasibility**: Medium — involves intrinsic hand muscles poorly captured by forearm electrodes - Likely confused with open hand depending on electrode placement - Collect with careful placement; validate cross-session accuracy before relying on it ### Including ch3 (Bicep) for Wrist Gestures To include the bicep channel in the hand gesture classifier: ```python # learning_data_collection.py — change this constant HAND_CHANNELS = [0, 1, 2, 3] # was [0, 1, 2] — add bicep channel ``` Feature count: 4 channels × 20 per-channel features + 10 cross-channel covariances + 6 correlations = **96 total features**. The ensemble architecture handles this automatically — specialist LDA weight dimensions recalculate at training time. --- # PART IV — CHANGE REFERENCE ## 11. Change Classification Matrix | Change | Category | Priority | Files | ESP32 Reflash? | Retrain? | Risk | |--------|----------|----------|-------|----------------|----------|------| | **C** | Firmware | **Tier 1** | inference.c | ✓ | No | **Very Low** | | **B** | Firmware | **Tier 1** | inference.c / filter.c | ✓ | No | Low | | **A** | Firmware | **Tier 1** | adc_sampling.c | ✓ | No | Medium | | **0** | Python | **Tier 1** | learning_data_collection.py | No | ✓ | Low | | **1** | Python+C | **Tier 2** | learning_data_collection.py + inference.c | ✓ after | ✓ | Medium | | **D** | Firmware | **Tier 2** | calibration.c/.h | ✓ | No | Medium | | **2** | Protocol | **Tier 2** | None | No | ✓ new data | None | | **3** | Python | **Tier 2** | learning_data_collection.py | No | ✓ | Low | | **E** | Python+FW | **Tier 3** | train_mlp_tflite.py + firmware | ✓ | ✓ | High | | **4** | Python+C | **Tier 3** | learning_data_collection.py + inference.c | ✓ if enabled | ✓ | Low | | **5** | Python | **Tier 3** | learning_data_collection.py | No | No | None | | **6** | Python | **Tier 3** | learning_data_collection.py | No | ✓ | Low | | **7** | Python | **Tier 3** | new: train_ensemble.py | No | ✓ 
| Medium |
| **F** | Firmware | **Tier 3** | new: inference_ensemble.c | ✓ | No (needs 7 first) | Medium |

**Recommended implementation order**: C → B → A → 0 → 1 → D → 2 → 3 → 5 (benchmark) → 7+F → E

---

# PART V — FIRMWARE CHANGES

## Change A — DMA-Driven ADC Sampling

(Migration from `adc_oneshot` to `adc_continuous`)

**Priority**: Tier 1

**Current driver**: `adc_oneshot_read()` polling in `drivers/emg_sensor.c`. Timing is controlled by `vTaskDelay(1)` in `run_inference_loop()` — subject to FreeRTOS scheduler jitter of ±0.5–1ms, which corrupts frequency-domain features and ADC burst grouping.

**Why**: `adc_continuous` runs entirely in hardware DMA. Sample-to-sample jitter drops from ±1ms to <10µs. CPU overhead between samples is zero. Required for frequency features (Change 1).

**Effort**: 2–4 hours (replace `emg_sensor_read()` internals; keep public API the same)

### ESP-IDF ADC Continuous API

```c
// --- Initialize (call once at startup) ---
adc_continuous_handle_t adc_handle = NULL;
adc_continuous_handle_cfg_t adc_cfg = {
    .max_store_buf_size = 4096, // driver ring buffer size (bytes)
    .conv_frame_size = 256,     // bytes per conversion frame
};
adc_continuous_new_handle(&adc_cfg, &adc_handle);

// Actual hardware channel mapping (from emg_sensor.c):
// ch0 = ADC_CHANNEL_1 / GPIO 2 (Forearm Belly / FCR)
// ch1 = ADC_CHANNEL_2 / GPIO 3 (Forearm Extensors)
// ch2 = ADC_CHANNEL_8 / GPIO 9 (Forearm Contractors / FCU)
// ch3 = ADC_CHANNEL_9 / GPIO 10 (Bicep — independent subsystem)
adc_digi_pattern_config_t chan_cfg[4] = {
    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_1, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_2, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_8, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_9, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
};
adc_continuous_config_t cont_cfg = {
    .sample_freq_hz = 4000, // 4 channels × 1000 Hz = 4000 total samples/sec
    .conv_mode = ADC_CONV_SINGLE_UNIT_1,
    .format = ADC_DIGI_OUTPUT_FORMAT_TYPE2,
    .pattern_num = 4,
    .adc_pattern = chan_cfg,
};
adc_continuous_config(adc_handle, &cont_cfg);

// --- ISR callback (fires each frame) ---
static SemaphoreHandle_t s_adc_sem;

static bool IRAM_ATTR adc_conv_done_cb(
    adc_continuous_handle_t handle,
    const adc_continuous_evt_data_t *edata,
    void *user_data) {
    BaseType_t hp_woken = pdFALSE;
    xSemaphoreGiveFromISR(s_adc_sem, &hp_woken);
    return hp_woken == pdTRUE;
}

s_adc_sem = xSemaphoreCreateBinary(); // must exist before the first callback fires
adc_continuous_evt_cbs_t cbs = { .on_conv_done = adc_conv_done_cb };
adc_continuous_register_event_callbacks(adc_handle, &cbs, NULL);
adc_continuous_start(adc_handle);

// --- ADC calibration (apply per sample) ---
adc_cali_handle_t cali_handle;
adc_cali_curve_fitting_config_t cali_cfg = {
    .unit_id = ADC_UNIT_1,
    .atten = ADC_ATTEN_DB_12, // matches ADC_ATTEN_DB_12 used in current emg_sensor.c
    .bitwidth = ADC_BITWIDTH_12,
};
adc_cali_create_scheme_curve_fitting(&cali_cfg, &cali_handle);

// --- Sampling task (pin to Core 0) ---
void adc_sampling_task(void *arg) {
    uint8_t result_buf[256];
    uint32_t out_len = 0;
    while (1) {
        xSemaphoreTake(s_adc_sem, portMAX_DELAY);
        if (adc_continuous_read(adc_handle, result_buf, sizeof(result_buf), &out_len, 0) != ESP_OK)
            continue; // nothing ready yet; wait for the next frame
        // Parse: each entry is adc_digi_output_data_t
        // Apply adc_cali_raw_to_voltage() for each sample
        // Apply IIR filter (Change B) → post to inference ring buffer
    }
}
```

**Verify**: log consecutive sample timestamps via `esp_timer_get_time()`; spacing should be 1.0ms ± 0.05ms.

---

## Change B — IIR Biquad Bandpass Filter

**Priority**: Tier 1

**Why**: MyoWare analogue filters are not tunable. A software IIR bandpass attenuates sub-20 Hz motion artifact and high-frequency noise near the Nyquist limit, both of which inflate ZC, WL, and other features computed at rest. Powerline interference (50/60 Hz) falls inside the passband, so removing it requires an additional notch biquad.
**Effort**: 2 hours

### Step 1 — Compute Coefficients in Python (one-time, offline)

```python
from scipy.signal import butter
import numpy as np

fs = 1000.0
# The upper edge must stay strictly below Nyquist (fs/2 = 500 Hz); butter() raises
# ValueError for Wn = 500. 450 Hz is used here as a placeholder upper edge.
sos = butter(N=2, Wn=[20.0, 450.0], btype='bandpass', fs=fs, output='sos')
# sos[i] = [b0, b1, b2, a0, a1, a2] with a0 == 1
# esp-dsp dsps_biquad_f32 takes coeffs = [b0, b1, b2, a1, a2]; its reference
# implementation subtracts the a1/a2 terms itself, so do NOT negate them here.
# (Confirm against the esp-dsp version in use.)
for i, s in enumerate(sos):
    b0, b1, b2, a0, a1, a2 = s
    print(f"Section {i}: {b0:.8f}f, {b1:.8f}f, {b2:.8f}f, {a1:.8f}f, {a2:.8f}f")
# Run this and paste the printed values into the C constants below
```

### Step 2 — Add to inference.c (after includes, before `// --- State ---`)

```c
#include "dsps_biquad.h"

// Butterworth bandpass 20–450 Hz @ 1000 Hz (butter N=2 yields two cascaded biquad sections)
// Coefficients: [b0, b1, b2, a1, a2] — Direct Form II, esp-dsp convention
// Regenerate with: scipy.signal.butter(N=2, Wn=[20,450], btype='bandpass', fs=1000, output='sos')
static const float BIQUAD_HP_COEFFS[5] = { /* paste section 0 output here */ };
static const float BIQUAD_LP_COEFFS[5] = { /* paste section 1 output here */ };

// Filter delay state: 3 channels × 2 stages × 2 delay elements = 12 floats (48 bytes)
static float biquad_hp_w[HAND_NUM_CHANNELS][2];
static float biquad_lp_w[HAND_NUM_CHANNELS][2];
```

Add to `inference_init()`:

```c
memset(biquad_hp_w, 0, sizeof(biquad_hp_w));
memset(biquad_lp_w, 0, sizeof(biquad_lp_w));
```

### Step 3 — Apply Per Sample (called before writing to window_buffer)

```c
// Apply to each channel before posting to the window buffer.
// Must be called IN ORDER for each sample (IIR has memory across calls).
static float IRAM_ATTR apply_bandpass(int ch, float raw) {
    float hp_out, lp_out;
    dsps_biquad_f32(&raw, &hp_out, 1, (float *)BIQUAD_HP_COEFFS, biquad_hp_w[ch]);
    dsps_biquad_f32(&hp_out, &lp_out, 1, (float *)BIQUAD_LP_COEFFS, biquad_lp_w[ch]);
    return lp_out;
}
```

**Note**: `window_buffer` stores `uint16_t` — change to `float` when adding this filter, so filtered values are stored directly without lossy integer round-trip.
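The reason for the `float` buffer can be seen offline with a small pure-Python sketch. It is illustrative only: `highpass_coeffs` and `biquad_df2` are hypothetical helpers, a single 20 Hz high-pass section (RBJ cookbook formula) stands in for the full bandpass, and the recurrence is the standard Direct Form II that esp-dsp's reference implementation is understood to use. The input is a synthetic 100 Hz tone riding on a 1500 mV offset.

```python
import math


def highpass_coeffs(fc_hz, fs_hz, q=1 / math.sqrt(2)):
    """2nd-order high-pass (RBJ cookbook), normalised so a0 == 1.

    Returns (b0, b1, b2, a1, a2).
    """
    w0 = 2.0 * math.pi * fc_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = (1.0 + cos_w0) / 2.0 / a0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    return b0, b1, b2, a1, a2


def biquad_df2(samples, coeffs):
    """Direct Form II: d0 = x - a1*w0 - a2*w1; y = b0*d0 + b1*w0 + b2*w1."""
    b0, b1, b2, a1, a2 = coeffs
    w0 = w1 = 0.0
    out = []
    for x in samples:
        d0 = x - a1 * w0 - a2 * w1
        out.append(b0 * d0 + b1 * w0 + b2 * w1)
        w1, w0 = w0, d0  # shift the delay line
    return out


fs = 1000.0
# Synthetic input: 100 Hz tone (100 mV amplitude) on a 1500 mV DC offset
raw = [1500.0 + 100.0 * math.sin(2 * math.pi * 100.0 * n / fs) for n in range(2000)]
filtered = biquad_df2(raw, highpass_coeffs(20.0, fs))
tail = filtered[1000:]  # skip the start-up transient

negatives = sum(1 for v in tail if v < 0.0)
# Saturating store, for illustration only
as_uint16 = [min(65535, max(0, int(round(v)))) for v in tail]
print(f"min={min(tail):.1f} mean={sum(tail) / len(tail):.3f} negative samples={negatives}")
print(f"samples forced to 0 by a uint16 store: {as_uint16.count(0)}")
```

Once the DC offset is filtered out, the signal is centred on zero and roughly half the samples are negative. A `uint16_t` buffer cannot represent those values (the saturating store above is the best case; an out-of-range float-to-unsigned conversion in C is undefined behaviour), which is why the buffer type has to change together with the filter.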
**Verify**: log ZC count at rest before and after — filtered ZC should be substantially lower (less spurious noise crossings). --- ## Change C — Confidence Rejection **Priority**: Tier 1 — **implement this first, lowest risk of all changes** **Why**: Without a rejection threshold, ambiguous EMG (rest-to-gesture transition, mid-gesture fatigue, electrode lift) always produces a false actuation. **Effort**: 15 minutes ### Step 1 — Add Constant (top of inference.c with other constants) ```c #define CONFIDENCE_THRESHOLD 0.40f // Reject when max smoothed prob < this. // Meta paper uses 0.35; 0.40 adds prosthetic safety margin. // Tune: lower to 0.35 if real gestures are being rejected. ``` ### Step 2 — Insert After EMA Block in `inference_predict()` (after line 214) ```c // Confidence rejection: if the peak smoothed probability is below threshold, // hold the last confirmed output rather than outputting an uncertain prediction. // Prevents false actuations during gesture transitions and electrode artifacts. if (max_smoothed_prob < CONFIDENCE_THRESHOLD) { *confidence = max_smoothed_prob; return current_output; // -1 (GESTURE_NONE) until first confident prediction } ``` **Verify**: arm at complete rest → confirm output stays at GESTURE_NONE and confidence logs below 0.40. Deliberate fist → confidence rises above 0.40 within 1–3 inference cycles. --- ## Change D — On-Device NVS Calibration **Priority**: Tier 2 **Why**: Python `CalibrationTransform` only runs during training. On-device NVS calibration lets the ESP32 recalibrate z-score normalization at startup (3 seconds of REST) without retraining — solving placement drift and day-to-day impedance variation. 
**Effort**: 3–4 hours

### New Files

```
EMG_Arm/src/core/calibration.h
EMG_Arm/src/core/calibration.c
```

### calibration.h

```c
#pragma once
#include <stdbool.h>
#include "config/config.h"

#define CALIB_MAX_FEATURES 96 // supports up to 4-channel expansion

bool calibration_init(void);         // load from NVS at startup
void calibration_apply(float *feat); // z-score in-place; no-op if not calibrated
bool calibration_update(const float X[][CALIB_MAX_FEATURES], int n_windows, int n_feat);
void calibration_reset(void);
bool calibration_is_valid(void);
```

### calibration.c

```c
#include "calibration.h"
#include "nvs_flash.h"
#include "nvs.h"
#include <math.h>   // sqrtf
#include <stdio.h>  // printf
#include <string.h> // memset

#define NVS_NAMESPACE "emg_calib"
#define NVS_KEY_MEAN  "feat_mean"
#define NVS_KEY_STD   "feat_std"
#define NVS_KEY_NFEAT "n_feat"
#define NVS_KEY_VALID "calib_ok"

static float s_mean[CALIB_MAX_FEATURES];
static float s_std[CALIB_MAX_FEATURES];
static int s_n_feat = 0;
static bool s_valid = false;

bool calibration_init(void) {
    esp_err_t err = nvs_flash_init();
    if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        nvs_flash_erase();
        nvs_flash_init();
    }
    nvs_handle_t h;
    if (nvs_open(NVS_NAMESPACE, NVS_READONLY, &h) != ESP_OK) return false;
    uint8_t valid = 0;
    size_t mean_sz = sizeof(s_mean), std_sz = sizeof(s_std);
    bool ok = (nvs_get_u8(h, NVS_KEY_VALID, &valid) == ESP_OK) && (valid == 1) &&
              (nvs_get_i32(h, NVS_KEY_NFEAT, (int32_t*)&s_n_feat) == ESP_OK) &&
              (nvs_get_blob(h, NVS_KEY_MEAN, s_mean, &mean_sz) == ESP_OK) &&
              (nvs_get_blob(h, NVS_KEY_STD, s_std, &std_sz) == ESP_OK);
    nvs_close(h);
    s_valid = ok;
    printf("[Calib] %s (%d features)\n", ok ?
"Loaded from NVS" : "Not found — identity", s_n_feat); return ok; } void calibration_apply(float *feat) { if (!s_valid) return; for (int i = 0; i < s_n_feat; i++) feat[i] = (feat[i] - s_mean[i]) / s_std[i]; } bool calibration_update(const float X[][CALIB_MAX_FEATURES], int n_windows, int n_feat) { if (n_windows < 10 || n_feat > CALIB_MAX_FEATURES) return false; s_n_feat = n_feat; memset(s_mean, 0, sizeof(s_mean)); for (int w = 0; w < n_windows; w++) for (int f = 0; f < n_feat; f++) s_mean[f] += X[w][f]; for (int f = 0; f < n_feat; f++) s_mean[f] /= n_windows; memset(s_std, 0, sizeof(s_std)); for (int w = 0; w < n_windows; w++) for (int f = 0; f < n_feat; f++) { float d = X[w][f] - s_mean[f]; s_std[f] += d * d; } for (int f = 0; f < n_feat; f++) { s_std[f] = sqrtf(s_std[f] / n_windows); if (s_std[f] < 1e-6f) s_std[f] = 1e-6f; } nvs_handle_t h; if (nvs_open(NVS_NAMESPACE, NVS_READWRITE, &h) != ESP_OK) return false; nvs_set_blob(h, NVS_KEY_MEAN, s_mean, sizeof(s_mean)); nvs_set_blob(h, NVS_KEY_STD, s_std, sizeof(s_std)); nvs_set_i32(h, NVS_KEY_NFEAT, n_feat); nvs_set_u8(h, NVS_KEY_VALID, 1); nvs_commit(h); nvs_close(h); s_valid = true; printf("[Calib] Updated from %d REST windows, %d features\n", n_windows, n_feat); return true; } ``` ### Integration in inference.c In `inference_predict()`, after `compute_features(features)`, before LDA: ```c calibration_apply(features); // z-score using NVS-stored mean/std ``` ### Startup Flow ```c // In main application startup sequence: calibration_init(); // load from NVS; no-op if not present yet // When user triggers recalibration (button press or serial command): // Collect ~120 REST windows (~3 seconds at 25ms hop) // Call calibration_update(rest_feature_buffer, 120, MODEL_NUM_FEATURES) ``` --- ## Change E — int8 MLP via TFLite Micro **Priority**: Tier 3 — implement after Tier 1+2 changes and benchmark (Change 5) shows LDA plateauing **Why**: LDA finds only linear decision boundaries. 
A 2-layer int8 MLP adds nonlinear boundaries for gesture pairs that overlap in feature space. **Effort**: 4–6 hours ### Python Training (new file: `train_mlp_tflite.py`) ```python """ Train int8 MLP for ESP32-S3 deployment via TFLite Micro. Run AFTER Change 0 (label shift) + Change 1 (expanded features). """ import numpy as np import tensorflow as tf from pathlib import Path import sys sys.path.insert(0, str(Path(__file__).parent)) from learning_data_collection import SessionStorage, EMGFeatureExtractor, HAND_CHANNELS storage = SessionStorage() X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training() extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True) X = extractor.extract_features_batch(X_raw).astype(np.float32) from sklearn.preprocessing import StandardScaler scaler = StandardScaler() X = scaler.fit_transform(X) n_feat, n_cls = X.shape[1], len(np.unique(y)) model = tf.keras.Sequential([ tf.keras.layers.Input(shape=(n_feat,)), tf.keras.layers.Dense(32, activation='relu'), tf.keras.layers.Dropout(0.2), tf.keras.layers.Dense(16, activation='relu'), tf.keras.layers.Dense(n_cls, activation='softmax'), ]) model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) model.fit(X, y, epochs=150, batch_size=64, validation_split=0.1, verbose=1) def representative_dataset(): for i in range(0, len(X), 10): yield [X[i:i+1]] converter = tf.lite.TFLiteConverter.from_keras_model(model) converter.optimizations = [tf.lite.Optimize.DEFAULT] converter.representative_dataset = representative_dataset converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8] converter.inference_input_type = tf.int8 converter.inference_output_type = tf.int8 tflite_model = converter.convert() out = Path('EMG_Arm/src/core/emg_model_data.cc') with open(out, 'w') as f: f.write('#include "emg_model_data.h"\n') f.write(f'const int g_model_len = {len(tflite_model)};\n') f.write('const unsigned char g_model[] 
= {\n ') f.write(', '.join(f'0x{b:02x}' for b in tflite_model)) f.write('\n};\n') print(f"Wrote {out} ({len(tflite_model)} bytes)") ``` ### Firmware (inference_mlp.cc) ```cpp #include "inference_mlp.h" #include "emg_model_data.h" #include "tensorflow/lite/micro/micro_interpreter.h" #include "tensorflow/lite/micro/micro_mutable_op_resolver.h" #include "tensorflow/lite/schema/schema_generated.h" static uint8_t tensor_arena[48 * 1024]; // 48 KB — tune down if memory is tight static tflite::MicroInterpreter *interpreter = nullptr; static TfLiteTensor *input = nullptr, *output = nullptr; void inference_mlp_init(void) { const tflite::Model *model = tflite::GetModel(g_model); static tflite::MicroMutableOpResolver<4> resolver; resolver.AddFullyConnected(); resolver.AddRelu(); resolver.AddSoftmax(); resolver.AddDequantize(); static tflite::MicroInterpreter interp(model, resolver, tensor_arena, sizeof(tensor_arena)); interpreter = &interp; interpreter->AllocateTensors(); input = interpreter->input(0); output = interpreter->output(0); } int inference_mlp_predict(const float *features, int n_feat, float *conf_out) { float iscale = input->params.scale; int izp = input->params.zero_point; for (int i = 0; i < n_feat; i++) { int q = (int)roundf(features[i] / iscale) + izp; input->data.int8[i] = (int8_t)(q < -128 ? -128 : q > 127 ? 127 : q); } interpreter->Invoke(); float oscale = output->params.scale; int ozp = output->params.zero_point; float max_p = -1e9f; int max_c = 0; for (int c = 0; c < MODEL_NUM_CLASSES; c++) { float p = (output->data.int8[c] - ozp) * oscale; if (p > max_p) { max_p = p; max_c = c; } } *conf_out = max_p; return max_c; } ``` **platformio.ini addition**: ```ini lib_deps = tensorflow/tflite-micro ``` --- ## Change F — Ensemble Inference Pipeline **Priority**: Tier 3 (requires Change 1 features + Change 7 training + Change E MLP) **Why**: This is the full recommended architecture from Part II. 
**Effort**: 3–4 hours firmware (after Python ensemble is trained and exported)

### New Files

```
EMG_Arm/src/core/inference_ensemble.c
EMG_Arm/src/core/inference_ensemble.h
EMG_Arm/src/core/model_weights_ensemble.h   (generated by Change 7 Python script)
```

### inference_ensemble.h

```c
#pragma once
#include <stdint.h>

void inference_ensemble_init(void);
int  inference_ensemble_predict(float *confidence);
```

### inference_ensemble.c

```c
#include "inference_ensemble.h"
#include "inference.h"       // for compute_features(), calibration_apply()
#include "inference_mlp.h"   // for inference_mlp_predict()
#include "model_weights_ensemble.h"
#include "config/config.h"
#include "dsps_dotprod.h"
#include <math.h>     // expf, sqrtf
#include <stdint.h>
#include <string.h>   // memcpy

#define ENSEMBLE_EMA_ALPHA       0.70f
#define ENSEMBLE_CONF_THRESHOLD  0.50f  // below this: escalate to MLP fallback
#define REJECT_THRESHOLD         0.40f  // below this even after MLP: hold output
#define REST_ACTIVITY_THRESHOLD  0.05f  // total_rms below this → skip inference, return REST

// EMA state
static float s_smoothed[MODEL_NUM_CLASSES];

// Vote + debounce (reuse existing pattern from inference.c)
static int s_vote_history[5];
static int s_vote_head      = 0;
static int s_current_output = -1;
static int s_pending_output = -1;
static int s_pending_count  = 0;

// --- Generic LDA softmax predict ---
// weights: [n_classes][n_feat], intercepts: [n_classes]
// proba_out: [n_classes], caller-provided output
static void lda_softmax(const float *feat, int n_feat,
                        const float *weights_flat, const float *intercepts,
                        int n_classes, float *proba_out) {
    float raw[MODEL_NUM_CLASSES];
    float max_raw = -1e9f, sum_exp = 0.0f;

    for (int c = 0; c < n_classes; c++) {
        raw[c] = intercepts[c];
        // dsps_dotprod_f32 requires 4-byte aligned arrays and length multiple of 4;
        // for safety use a plain loop (the compiler will auto-vectorize with -O2)
        const float *w = weights_flat + c * n_feat;
        for (int f = 0; f < n_feat; f++) raw[c] += feat[f] * w[f];
        if (raw[c] > max_raw) max_raw = raw[c];
    }
    // Subtract the max before expf() for numerical stability
    for (int c = 0; c < n_classes; c++) {
        proba_out[c] = expf(raw[c] - max_raw);
        sum_exp += proba_out[c];
    }
    for (int c = 0; c < n_classes; c++) proba_out[c] /= sum_exp;
}

void inference_ensemble_init(void) {
    for (int c = 0; c < MODEL_NUM_CLASSES; c++)
        s_smoothed[c] = 1.0f / MODEL_NUM_CLASSES;
    for (int i = 0; i < 5; i++) s_vote_history[i] = -1;
    s_vote_head      = 0;
    s_current_output = -1;
    s_pending_output = -1;
    s_pending_count  = 0;
}

int inference_ensemble_predict(float *confidence) {
    // 1. Extract features (shared with single-model path)
    float features[MODEL_NUM_FEATURES];
    compute_features(features);
    calibration_apply(features);

    // 2. Activity gate: skip inference during obvious REST
    float total_rms_sq = 0.0f;
    for (int ch = 0; ch < HAND_NUM_CHANNELS; ch++) {
        float r = features[ch * ENSEMBLE_PER_CH_FEATURES];  // RMS is index 0 per channel
        total_rms_sq += r * r;
    }
    if (sqrtf(total_rms_sq) < REST_ACTIVITY_THRESHOLD) {
        *confidence = 1.0f;
        return GESTURE_REST;
    }

    // 3. Specialist LDAs
    // TD and FD features are interleaved per channel in features[], so gather
    // each subset into a contiguous buffer first. CC features are already contiguous.
    float td_buf[TD_NUM_FEATURES];
    float fd_buf[FD_NUM_FEATURES];
    const int td_per_ch = TD_NUM_FEATURES / HAND_NUM_CHANNELS;  // 12
    const int fd_per_ch = FD_NUM_FEATURES / HAND_NUM_CHANNELS;  // 8
    for (int ch = 0; ch < HAND_NUM_CHANNELS; ch++) {
        const float *src = features + ch * ENSEMBLE_PER_CH_FEATURES;
        memcpy(td_buf + ch * td_per_ch, src,             td_per_ch * sizeof(float));
        memcpy(fd_buf + ch * fd_per_ch, src + td_per_ch, fd_per_ch * sizeof(float));
    }

    float prob_td[MODEL_NUM_CLASSES];
    float prob_fd[MODEL_NUM_CLASSES];
    float prob_cc[MODEL_NUM_CLASSES];

    lda_softmax(td_buf, TD_NUM_FEATURES,
                (const float *)LDA_TD_WEIGHTS, LDA_TD_INTERCEPTS,
                MODEL_NUM_CLASSES, prob_td);
    lda_softmax(fd_buf, FD_NUM_FEATURES,
                (const float *)LDA_FD_WEIGHTS, LDA_FD_INTERCEPTS,
                MODEL_NUM_CLASSES, prob_fd);
    lda_softmax(features + CC_FEAT_OFFSET, CC_NUM_FEATURES,
                (const float *)LDA_CC_WEIGHTS, LDA_CC_INTERCEPTS,
                MODEL_NUM_CLASSES, prob_cc);

    // 4. Meta-LDA stacker
    float meta_in[META_NUM_INPUTS];  // = 3 * MODEL_NUM_CLASSES
    memcpy(meta_in,                       prob_td, MODEL_NUM_CLASSES * sizeof(float));
    memcpy(meta_in + MODEL_NUM_CLASSES,   prob_fd, MODEL_NUM_CLASSES * sizeof(float));
    memcpy(meta_in + 2*MODEL_NUM_CLASSES, prob_cc, MODEL_NUM_CLASSES * sizeof(float));

    float meta_probs[MODEL_NUM_CLASSES];
    lda_softmax(meta_in, META_NUM_INPUTS,
                (const float *)META_LDA_WEIGHTS, META_LDA_INTERCEPTS,
                MODEL_NUM_CLASSES, meta_probs);

    // 5. EMA smoothing on meta output
    float max_smooth = 0.0f;
    int winner = 0;
    for (int c = 0; c < MODEL_NUM_CLASSES; c++) {
        s_smoothed[c] = ENSEMBLE_EMA_ALPHA * s_smoothed[c] +
                        (1.0f - ENSEMBLE_EMA_ALPHA) * meta_probs[c];
        if (s_smoothed[c] > max_smooth) { max_smooth = s_smoothed[c]; winner = c; }
    }

    // 6. Confidence cascade: escalate to MLP if meta-LDA is uncertain
    if (max_smooth < ENSEMBLE_CONF_THRESHOLD) {
        float mlp_conf = 0.0f;
        int mlp_winner = inference_mlp_predict(features, MODEL_NUM_FEATURES, &mlp_conf);
        if (mlp_conf > max_smooth) { winner = mlp_winner; max_smooth = mlp_conf; }
    }

    // 7. Reject if still uncertain: hold the previous output
    if (max_smooth < REJECT_THRESHOLD) {
        *confidence = max_smooth;
        return s_current_output;
    }
    *confidence = max_smooth;

    // 8. Majority vote (window = 5)
    s_vote_history[s_vote_head] = winner;
    s_vote_head = (s_vote_head + 1) % 5;
    int counts[MODEL_NUM_CLASSES] = {0};
    for (int i = 0; i < 5; i++)
        if (s_vote_history[i] >= 0) counts[s_vote_history[i]]++;
    int majority = 0, majority_cnt = 0;
    for (int c = 0; c < MODEL_NUM_CLASSES; c++)
        if (counts[c] > majority_cnt) { majority_cnt = counts[c]; majority = c; }

    // 9. Debounce (3 consecutive predictions to change output)
    int final = s_current_output;
    if (s_current_output == -1) {
        s_current_output = majority;
        final = majority;
    } else if (majority == s_current_output) {
        s_pending_output = majority;
        s_pending_count  = 1;
    } else if (majority == s_pending_output) {
        if (++s_pending_count >= 3) { s_current_output = majority; final = majority; }
    } else {
        s_pending_output = majority;
        s_pending_count  = 1;
    }
    return final;
}
```

### model_weights_ensemble.h Layout (generated by Change 7)

```c
// Auto-generated by train_ensemble.py — do not edit manually
#pragma once

#define MODEL_NUM_CLASSES         5   // auto-computed from training data
#define MODEL_NUM_FEATURES        69  // total feature count (after Change 1)
#define ENSEMBLE_PER_CH_FEATURES  20  // features per channel

// Specialist feature subset offsets and sizes
#define TD_FEAT_OFFSET   0
#define TD_NUM_FEATURES  36  // time-domain: indices 0–11, 20–31, 40–51
#define FD_FEAT_OFFSET   12  // NOTE: FD features are interleaved per-channel
#define FD_NUM_FEATURES  24  // freq-domain: indices 12–19, 32–39, 52–59
#define CC_FEAT_OFFSET   60
#define CC_NUM_FEATURES  9   // cross-channel: indices 60–68

#define META_NUM_INPUTS  (3 * MODEL_NUM_CLASSES)  // = 15

// Specialist LDA weights (flat row-major: [n_classes][n_feat])
extern const float LDA_TD_WEIGHTS[MODEL_NUM_CLASSES][TD_NUM_FEATURES];
extern const float LDA_TD_INTERCEPTS[MODEL_NUM_CLASSES];
extern const float LDA_FD_WEIGHTS[MODEL_NUM_CLASSES][FD_NUM_FEATURES];
extern const float LDA_FD_INTERCEPTS[MODEL_NUM_CLASSES];
extern const float LDA_CC_WEIGHTS[MODEL_NUM_CLASSES][CC_NUM_FEATURES];
extern const float LDA_CC_INTERCEPTS[MODEL_NUM_CLASSES];

// Meta-LDA weights
extern const float META_LDA_WEIGHTS[MODEL_NUM_CLASSES][META_NUM_INPUTS];
extern const float META_LDA_INTERCEPTS[MODEL_NUM_CLASSES];

// Class names (for inference_get_gesture_enum)
extern const char *MODEL_CLASS_NAMES[MODEL_NUM_CLASSES];
```

**Important note on FD features**: the frequency-domain features are interleaved at indices [12–19] for ch0, [32–39] for ch1, [52–59] for ch2, so `features + FD_FEAT_OFFSET` does not point at a contiguous FD vector. The `lda_softmax` call for LDA_FD must therefore receive a **gathered** copy: collect the three non-contiguous blocks into one contiguous buffer before calling `lda_softmax`:

```c
// Gather FD features into contiguous buffer before LDA_FD
float fd_buf[FD_NUM_FEATURES];
for (int ch = 0; ch < HAND_NUM_CHANNELS; ch++)
    memcpy(fd_buf + ch*8, features + ch*20 + 12, 8 * sizeof(float));
lda_softmax(fd_buf, FD_NUM_FEATURES, ...);
```

The same applies to the TD features (indices 0–11, 20–31, 40–51). The gather costs <5 µs, which is negligible.

---

# PART VI — PYTHON/TRAINING CHANGES

## Change 0 — Forward Label Shift

**Priority**: Tier 1
**Source**: Meta Nature 2025, Methods: "Discrete-gesture time alignment"
**Why**: Delaying each label boundary by 100 ms after onset detection means the first window labeled with a new gesture already contains about 100 ms of that gesture's build-up signal, which cleans up the decision boundary near gesture onset.
**ESP32 impact**: None.

### Step 1 — Add Constant After Line 94

```python
# After: TRANSITION_END_MS = 150
LABEL_FORWARD_SHIFT_MS = 100  # shift label boundaries +100ms after onset alignment
# Source: Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
```

### Step 2 — Apply Shift in `SessionStorage.save_session()` (after line ~704)

Find and insert after:

```python
print(f"[Storage] Labels aligned: {changed}/{len(labels)} windows shifted")
```

Insert:

```python
if LABEL_FORWARD_SHIFT_MS > 0:
    shift_windows = max(1, round(LABEL_FORWARD_SHIFT_MS / HOP_SIZE_MS))
    shifted = list(aligned_labels)
    for i in range(1, len(aligned_labels)):
        # At each label transition, relabel the first shift_windows windows
        # of the new segment with the previous label.
        if aligned_labels[i] != aligned_labels[i - 1]:
            for j in range(i, min(i + shift_windows, len(aligned_labels))):
                if shifted[j] == aligned_labels[i]:
                    shifted[j] = aligned_labels[i - 1]
    n_shifted = sum(1 for a, b in zip(aligned_labels, shifted) if a != b)
    aligned_labels = shifted
    print(f"[Storage] Forward label shift (+{LABEL_FORWARD_SHIFT_MS}ms): {n_shifted} windows adjusted")
```

### Step 3 — Reduce TRANSITION_START_MS

```python
TRANSITION_START_MS = 200  # was 300; reduced because the 100ms shift already adds pre-event context
```

**Verify**: printout shows `N windows adjusted` where N is 5–20% of total windows per session.

---

## Change 1 — Expanded Feature Set

**Priority**: Tier 2
**Why**: 12 → 69 features; adds frequency-domain and cross-channel information that is structurally more informative than amplitude alone (Meta Extended Data Fig. 6).
**ESP32 impact**: retrain → export new `model_weights.h`; port selected features to C.
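As a sanity check on the 69-feature figure, the total can be reproduced from the channel count alone. The sketch below assumes three hand channels and 20 per-channel features, as described in this change; `expected_feature_count` is a hypothetical helper for illustration and is not part of the codebase.

```python
def expected_feature_count(n_channels: int, per_ch: int = 20,
                           cross_channel: bool = True) -> int:
    """Feature-vector length for a given channel count (hypothetical helper)."""
    total = n_channels * per_ch
    if cross_channel and n_channels >= 2:
        total += n_channels * (n_channels + 1) // 2  # covariance: upper triangle incl. diagonal
        total += n_channels * (n_channels - 1) // 2  # correlation: strict upper triangle
    return total


# 3 channels: 3*20 per-channel + 6 covariance + 3 correlation
print(expected_feature_count(3))
```

The result should agree with `MODEL_NUM_FEATURES` in the exported header; a mismatch usually means the Python extractor and the firmware disagree on which features are enabled.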
### Sub-change 1A — Expand `extract_features_single_channel()` (line 1448)

Replace the entire function body:

```python
def extract_features_single_channel(self, signal: np.ndarray) -> dict:
    if getattr(self, 'reinhard', False):
        signal = 64.0 * signal / (32.0 + np.abs(signal))
    signal = signal - np.mean(signal)
    N = len(signal)

    # --- Time domain ---
    rms = np.sqrt(np.mean(signal ** 2))
    diff = np.diff(signal)
    wl = np.sum(np.abs(diff))
    zc_thresh = self.zc_threshold_percent * rms
    ssc_thresh = (self.ssc_threshold_percent * rms) ** 2
    sign_ch = signal[:-1] * signal[1:] < 0
    zc = int(np.sum(sign_ch & (np.abs(diff) > zc_thresh)))
    d_l = signal[1:-1] - signal[:-2]
    d_r = signal[1:-1] - signal[2:]
    ssc = int(np.sum((d_l * d_r) > ssc_thresh))
    mav = np.mean(np.abs(signal))
    var = np.mean(signal ** 2)
    iemg = np.sum(np.abs(signal))
    wamp = int(np.sum(np.abs(diff) > 0.15 * rms))

    # AR(4) via Yule-Walker
    ar = np.zeros(4)
    if rms > 1e-6:
        try:
            from scipy.linalg import solve_toeplitz
            r = np.array([np.dot(signal[i:], signal[:N-i]) / N for i in range(5)])
            if r[0] > 1e-10:
                ar = solve_toeplitz(r[:4], -r[1:5])
        except Exception:
            pass  # singular autocorrelation matrix: keep zeros

    # --- Frequency domain (20–500 Hz) ---
    freqs = np.fft.rfftfreq(N, d=1.0 / SAMPLING_RATE_HZ)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / N
    m = (freqs >= 20) & (freqs <= 500)
    f_m, p_m = freqs[m], psd[m]
    tp = np.sum(p_m) + 1e-10
    mnf = float(np.sum(f_m * p_m) / tp)
    cum = np.cumsum(p_m)
    # Guard the empty-band case the same way as pkf
    mdf = float(f_m[min(np.searchsorted(cum, tp / 2), len(f_m) - 1)]) if len(f_m) > 0 else 0.0
    pkf = float(f_m[np.argmax(p_m)]) if len(p_m) > 0 else 0.0
    mnp = float(tp / max(len(p_m), 1))

    # Bandpower in 4 physiological bands (mirrors firmware esp-dsp FFT bands)
    bands = [(20, 80), (80, 150), (150, 300), (300, 500)]
    bp = [float(np.sum(psd[(freqs >= lo) & (freqs < hi)])) for lo, hi in bands]

    return {
        'rms': rms, 'wl': wl, 'zc': zc, 'ssc': ssc,
        'mav': mav, 'var': var, 'iemg': iemg, 'wamp': wamp,
        'ar1': float(ar[0]), 'ar2': float(ar[1]),
        'ar3': float(ar[2]), 'ar4': float(ar[3]),
        'mnf': mnf, 'mdf': mdf, 'pkf': pkf, 'mnp': mnp,
        'bp0': bp[0], 'bp1': bp[1], 'bp2': bp[2], 'bp3': bp[3],
    }
```

### Sub-change 1B — Update `extract_features_window()` Return Block (line 1482)

Replace the return section:

```python
FEATURE_ORDER = ['rms', 'wl', 'zc', 'ssc', 'mav', 'var', 'iemg', 'wamp',
                 'ar1', 'ar2', 'ar3', 'ar4', 'mnf', 'mdf', 'pkf', 'mnp',
                 'bp0', 'bp1', 'bp2', 'bp3']
NORMALIZE_KEYS = {'rms', 'wl', 'mav', 'iemg'}

features = []
for ch_features in all_ch_features:
    for key in FEATURE_ORDER:
        val = ch_features.get(key, 0.0)
        if self.normalize and key in NORMALIZE_KEYS:
            val = val / norm_factor
        features.append(float(val))

if self.cross_channel and len(channel_indices) >= 2:
    sel = window[:, channel_indices].astype(np.float32)
    wc = sel - sel.mean(axis=0)
    cov = (wc.T @ wc) / len(wc)
    # Covariance: upper triangle including the diagonal
    ri, ci = np.triu_indices(len(channel_indices))
    features.extend(cov[ri, ci].tolist())
    # Correlation: strict upper triangle
    stds = np.sqrt(np.diag(cov)) + 1e-10
    cor = cov / np.outer(stds, stds)
    ro, co = np.triu_indices(len(channel_indices), k=1)
    features.extend(cor[ro, co].tolist())

return np.array(features, dtype=np.float32)
```

### Sub-change 1C — Update `EMGFeatureExtractor.__init__()` (line 1430)

```python
def __init__(self, zc_threshold_percent=0.1, ssc_threshold_percent=0.1,
             channels=None, normalize=True, cross_channel=True, reinhard=False):
    self.zc_threshold_percent = zc_threshold_percent
    self.ssc_threshold_percent = ssc_threshold_percent
    self.channels = channels
    self.normalize = normalize
    self.cross_channel = cross_channel
    self.reinhard = reinhard
```

### Sub-change 1D — Update Feature Count in `extract_features_batch()` (line 1520)

Replace `n_features = n_channels * 4`:

```python
per_ch = 20
if self.cross_channel and n_channels >= 2:
    n_features = n_channels * per_ch + \
                 n_channels*(n_channels+1)//2 + n_channels*(n_channels-1)//2
else:
    n_features = n_channels * per_ch
```

### Sub-change 1E — Update `get_feature_names()` (line 1545)

```python
def get_feature_names(self, n_channels=0):
    ch_idx = self.channels if self.channels is not None else list(range(n_channels))
    ORDER = ['rms','wl','zc','ssc','mav','var','iemg','wamp',
             'ar1','ar2','ar3','ar4','mnf','mdf','pkf','mnp','bp0','bp1','bp2','bp3']
    names = [f'ch{ch}_{f}' for ch in ch_idx for f in ORDER]
    if self.cross_channel and len(ch_idx) >= 2:
        n = len(ch_idx)
        names += [f'cov_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i, n)]
        names += [f'cor_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i+1, n)]
    return names
```

### Sub-change 1F — Update `EMGClassifier.__init__()` (line 1722)

```python
self.feature_extractor = EMGFeatureExtractor(
    channels=HAND_CHANNELS, cross_channel=True, reinhard=False)
```

### Sub-change 1G — Update `save()` (line 1910) and `load()` (line 2089)

In `save()`, add to `feature_extractor_params` dict:

```python
'cross_channel': getattr(self.feature_extractor, 'cross_channel', True),
'reinhard': getattr(self.feature_extractor, 'reinhard', False),
```

In `load()`, update `EMGFeatureExtractor(...)` constructor:

```python
classifier.feature_extractor = EMGFeatureExtractor(
    zc_threshold_percent  = params.get('zc_threshold_percent', 0.1),
    ssc_threshold_percent = params.get('ssc_threshold_percent', 0.1),
    channels              = params.get('channels', HAND_CHANNELS),
    normalize             = params.get('normalize', False),
    cross_channel         = params.get('cross_channel', True),
    reinhard              = params.get('reinhard', False),
)
```

### Also Fix Bug at Line 2382

The call must unpack all six values returned by `load_all_for_training()`:

```python
X, y, trial_ids, session_indices, label_names, loaded_sessions = storage.load_all_for_training()
```

---

## Change 2 — Electrode Repositioning Protocol

**Protocol**: no code changes.

> *"Between sessions within a single day, the participants remove and slightly reposition the
> sEMG wristband to enable generalization across different recording positions."*
> — Meta Nature 2025 Methods

- Session 1: standard placement
- Session 2: band 1–2 cm up the forearm
- Session 3: band 1–2 cm down the forearm
- Session 4+: slight axial rotation or return to any above position

The per-session z-score normalization in `_apply_session_normalization()` handles the resulting amplitude shifts. Perform **fast, natural** gestures rather than slow, deliberate ones.

---

## Change 3 — Data Augmentation

**Priority**: Tier 2. Apply to **raw windows BEFORE feature extraction**.

Insert before the `# === LDA CLASSIFIER ===` comment (~line 1709):

```python
def augment_emg_batch(X, y, multiplier=3, seed=42):
    """
    Augment raw EMG windows for training robustness.
    Must be called on raw windows (n_windows, n_samples, n_channels),
    not on pre-computed features.
    Source (window jitter): Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
    """
    rng = np.random.default_rng(seed)
    aug_X, aug_y = [X], [y]
    for _ in range(multiplier - 1):
        Xc = X.astype(np.float32, copy=True)
        Xc *= rng.uniform(0.80, 1.20, (len(X), 1, 1)).astype(np.float32)            # amplitude
        rms = np.sqrt(np.mean(Xc**2, axis=(1,2), keepdims=True)) + 1e-8
        Xc += rng.standard_normal(Xc.shape).astype(np.float32) * (0.05 * rms)       # noise
        Xc += rng.uniform(-20., 20., (len(X), 1, X.shape[2])).astype(np.float32)    # DC jitter
        shifts = rng.integers(-5, 6, size=len(X))
        for i in range(len(Xc)):
            if shifts[i]:
                Xc[i] = np.roll(Xc[i], shifts[i], axis=0)                           # time jitter (circular)
        aug_X.append(Xc)
        aug_y.append(y)
    return np.concatenate(aug_X), np.concatenate(aug_y)
```

In `EMGClassifier.train()`, replace the start of the function's feature extraction block:

```python
if getattr(self, 'use_augmentation', True):
    X_aug, y_aug = augment_emg_batch(X, y, multiplier=3)
    print(f"[Classifier] Augmented: {len(X)} → {len(X_aug)} windows")
else:
    X_aug, y_aug = X, y
# Any other per-window arrays (trial_ids, session_indices) must be repeated to the
# same length, e.g. np.tile(trial_ids, 3), because copies are appended block by block.
X_features = self.feature_extractor.extract_features_batch(X_aug)
# ... then use y_aug instead of y for model.fit()
```

---

## Change 4 — Reinhard Compression (Optional)

**Formula**: `output = 64 × x / (32 + |x|)`

**Enable in Python**: set `reinhard=True` in `EMGFeatureExtractor` constructor (Change 1F).

**Enable in firmware** (`inference.c` `compute_features()`, after signal copy loop, before mean calc):

```c
#if MODEL_USE_REINHARD
for (int i = 0; i < INFERENCE_WINDOW_SIZE; i++) {
    float x = signal[i];
    signal[i] = 64.0f * x / (32.0f + fabsf(x));
}
#endif
```

Add `#define MODEL_USE_REINHARD 0` to `model_weights.h` (set to `1` when Python uses `reinhard=True`).
**Python and firmware MUST match.** Mismatch silently corrupts all predictions.

---

## Change 5 — Classifier Benchmark

**Purpose**: shows whether an LDA accuracy plateau is a features problem (all classifiers score similarly → add features) or a model complexity problem (SVM/MLP >> LDA → implement Change E/F).

Add after `run_training_demo()`:

```python
def run_classifier_benchmark():
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score, GroupKFold
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis

    storage = SessionStorage()
    X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()
    extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
    X = extractor.extract_features_batch(X_raw)
    X = EMGClassifier()._apply_session_normalization(X, session_indices, y=y)

    clfs = {
        'LDA (ESP32 model)': LinearDiscriminantAnalysis(),
        'QDA': QuadraticDiscriminantAnalysis(reg_param=0.1),
        'SVM-RBF': Pipeline([('s', StandardScaler()), ('m', SVC(kernel='rbf', C=10))]),
        'MLP-128-64': Pipeline([('s', StandardScaler()),
                                ('m', MLPClassifier(hidden_layer_sizes=(128,64),
                                                    max_iter=1000, early_stopping=True))]),
    }

    # Group by trial so windows from one trial never straddle train and test folds
    gkf = GroupKFold(n_splits=5)
    print(f"\n{'Classifier':<22} {'Mean CV':>8} {'Std':>6}")
    print("-" * 40)
    for name, clf in clfs.items():
        sc = cross_val_score(clf, X, y, cv=gkf, groups=trial_ids, scoring='accuracy')
        print(f" {name:<20} {sc.mean()*100:>7.1f}% ±{sc.std()*100:.1f}%")

    print("\n → If LDA ≈ SVM: features are the bottleneck (add Change 1 features)")
    print(" → If SVM >> LDA: model complexity bottleneck (implement Change E/F)")
```

---

## Change 6 — Simplified MPF Features

**Python training only**: not worth porting to ESP32 directly (use bandpower bp0–bp3 from Change 1 as the firmware-side approximation).

Add after `EMGFeatureExtractor` class:

```python
class MPFFeatureExtractor:
    """
    Simplified 3-channel MPF: CSD upper triangle per 6 frequency bands = 36 features.
    Python training only. Omits matrix logarithm (not needed for 3 channels).
    Source: Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
    ESP32 approximation: use bp0–bp3 from EMGFeatureExtractor (Change 1).
    """
    BANDS = [(0,62),(62,125),(125,187),(187,250),(250,375),(375,500)]

    def __init__(self, channels=None, log_diagonal=True):
        self.channels = channels or HAND_CHANNELS
        self.log_diag = log_diagonal
        self.n_ch = len(self.channels)
        self._r, self._c = np.triu_indices(self.n_ch)
        self.n_features = len(self.BANDS) * len(self._r)

    def extract_window(self, window):
        sig = window[:, self.channels].astype(np.float64)
        N = len(sig)
        freqs = np.fft.rfftfreq(N, d=1.0/SAMPLING_RATE_HZ)
        Xf = np.fft.rfft(sig, axis=0)
        feats = []
        for lo, hi in self.BANDS:
            mask = (freqs >= lo) & (freqs < hi)
            if not mask.any():
                feats.extend([0.0] * len(self._r))
                continue
            # Cross-spectral density matrix (real part) summed over the band
            CSD = (Xf[mask].conj().T @ Xf[mask]).real / N
            if self.log_diag:
                for k in range(self.n_ch):
                    CSD[k,k] = np.log(max(CSD[k,k], 1e-10))
            feats.extend(CSD[self._r, self._c].tolist())
        return np.array(feats, dtype=np.float32)

    def extract_batch(self, X):
        out = np.zeros((len(X), self.n_features), dtype=np.float32)
        for i in range(len(X)):
            out[i] = self.extract_window(X[i])
        return out
```

In `EMGClassifier.train()`, after standard feature extraction:

```python
if getattr(self, 'use_mpf', False):
    mpf = MPFFeatureExtractor(channels=HAND_CHANNELS)
    X_features = np.hstack([X_features, mpf.extract_batch(X_aug)])
```

---

## Change 7 — Ensemble Training

**Priority**: Tier 3 (implements Change F's training side)
**New file**: `C:/VSCode/Marvel_Projects/Bucky_Arm/train_ensemble.py`

```python
"""
Train the full 3-specialist-LDA + meta-LDA ensemble.
Requires Change 1 (expanded features) to be implemented first.
Exports model_weights_ensemble.h for firmware Change F.

Architecture:
  LDA_TD (36 time-domain feat)  ─┐
  LDA_FD (24 freq-domain feat)   ├─ 15 probs ─► Meta-LDA ─► final class
  LDA_CC (9 cross-ch feat)      ─┘
"""
import sys
from pathlib import Path

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict, GroupKFold, cross_val_score
from sklearn.preprocessing import StandardScaler

sys.path.insert(0, str(Path(__file__).parent))
from learning_data_collection import (
    SessionStorage, EMGFeatureExtractor, HAND_CHANNELS
)

# ─── Load and extract features ───────────────────────────────────────────────
storage = SessionStorage()
X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()

extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
X = extractor.extract_features_batch(X_raw).astype(np.float64)

# Per-session normalization (same as EMGClassifier._apply_session_normalization)
for sid in np.unique(session_indices):
    mask = session_indices == sid
    sc = StandardScaler()
    X[mask] = sc.fit_transform(X[mask])

feat_names = extractor.get_feature_names(n_channels=len(HAND_CHANNELS))
n_cls = len(np.unique(y))

# ─── Feature subset indices ───────────────────────────────────────────────────
TD_FEAT = ['rms','wl','zc','ssc','mav','var','iemg','wamp','ar1','ar2','ar3','ar4']
FD_FEAT = ['mnf','mdf','pkf','mnp','bp0','bp1','bp2','bp3']

td_idx = [i for i,n in enumerate(feat_names) if any(n.endswith(f'_{f}') for f in TD_FEAT)]
fd_idx = [i for i,n in enumerate(feat_names) if any(n.endswith(f'_{f}') for f in FD_FEAT)]
cc_idx = [i for i,n in enumerate(feat_names) if n.startswith('cov_') or n.startswith('cor_')]

print(f"Feature subsets — TD: {len(td_idx)}, FD: {len(fd_idx)}, CC: {len(cc_idx)}")

X_td = X[:, td_idx]
X_fd = X[:, fd_idx]
X_cc = X[:, cc_idx]

# ─── Train specialist LDAs with out-of-fold stacking ─────────────────────────
gkf = GroupKFold(n_splits=5)

print("Training specialist LDAs (out-of-fold for stacking)...")
lda_td = LinearDiscriminantAnalysis()
lda_fd = LinearDiscriminantAnalysis()
lda_cc = LinearDiscriminantAnalysis()

oof_td = cross_val_predict(lda_td, X_td, y, cv=gkf, groups=trial_ids, method='predict_proba')
oof_fd = cross_val_predict(lda_fd, X_fd, y, cv=gkf, groups=trial_ids, method='predict_proba')
oof_cc = cross_val_predict(lda_cc, X_cc, y, cv=gkf, groups=trial_ids, method='predict_proba')

# Specialist CV accuracy (for diagnostics)
for name, mdl, Xs in [('LDA_TD', lda_td, X_td), ('LDA_FD', lda_fd, X_fd), ('LDA_CC', lda_cc, X_cc)]:
    sc = cross_val_score(mdl, Xs, y, cv=gkf, groups=trial_ids)
    print(f"  {name}: {sc.mean()*100:.1f}% ± {sc.std()*100:.1f}%")

# ─── Train meta-LDA on out-of-fold outputs ───────────────────────────────────
X_meta = np.hstack([oof_td, oof_fd, oof_cc])  # (n_samples, 3*n_cls = 15)
meta_lda = LinearDiscriminantAnalysis()
meta_sc = cross_val_score(meta_lda, X_meta, y, cv=gkf, groups=trial_ids)
print(f"  Meta-LDA: {meta_sc.mean()*100:.1f}% ± {meta_sc.std()*100:.1f}%")

# Fit all models on full dataset for deployment
lda_td.fit(X_td, y)
lda_fd.fit(X_fd, y)
lda_cc.fit(X_cc, y)
meta_lda.fit(X_meta, y)

# ─── Export all weights to C header ──────────────────────────────────────────
def lda_to_c_arrays(lda, name, feat_dim, n_cls, label_names, class_order):
    """Generate C array strings for LDA weights and intercepts."""
    # Rows are emitted in class_order so they line up with label_names
    coef = lda.coef_  # shape (n_cls, feat_dim) when n_cls >= 3
    intercept = lda.intercept_
    lines = []
    lines.append(f"const float {name}_WEIGHTS[{n_cls}][{feat_dim}] = {{")
    for c in class_order:
        row = ', '.join(f'{v:.8f}f' for v in coef[c])
        lines.append(f"  {{{row}}},  // {label_names[c]}")
    lines.append("};")
    lines.append(f"const float {name}_INTERCEPTS[{n_cls}] = {{")
    intercept_str = ', '.join(f'{intercept[c]:.8f}f' for c in class_order)
    lines.append(f"  {intercept_str}")
    lines.append("};")
    return '\n'.join(lines)

class_order = list(range(n_cls))
out_path = Path('EMG_Arm/src/core/model_weights_ensemble.h')

with open(out_path, 'w') as f:
    f.write("// Auto-generated by train_ensemble.py — do not edit\n")
    f.write("#pragma once\n\n")
    f.write(f"#define MODEL_NUM_CLASSES {n_cls}\n")
    f.write(f"#define MODEL_NUM_FEATURES {X.shape[1]}\n")
    f.write("#define ENSEMBLE_PER_CH_FEATURES 20\n\n")
    f.write(f"#define TD_FEAT_OFFSET {min(td_idx)}\n")
    f.write(f"#define TD_NUM_FEATURES {len(td_idx)}\n")
    f.write(f"#define FD_FEAT_OFFSET {min(fd_idx)}\n")
    f.write(f"#define FD_NUM_FEATURES {len(fd_idx)}\n")
    f.write(f"#define CC_FEAT_OFFSET {min(cc_idx)}\n")
    f.write(f"#define CC_NUM_FEATURES {len(cc_idx)}\n")
    f.write("#define META_NUM_INPUTS (3 * MODEL_NUM_CLASSES)\n\n")

    f.write(lda_to_c_arrays(lda_td, 'LDA_TD', len(td_idx), n_cls, label_names, class_order))
    f.write('\n\n')
    f.write(lda_to_c_arrays(lda_fd, 'LDA_FD', len(fd_idx), n_cls, label_names, class_order))
    f.write('\n\n')
    f.write(lda_to_c_arrays(lda_cc, 'LDA_CC', len(cc_idx), n_cls, label_names, class_order))
    f.write('\n\n')
    f.write(lda_to_c_arrays(meta_lda, 'META_LDA', 3*n_cls, n_cls, label_names, class_order))
    f.write('\n\n')

    names_str = ', '.join(f'"{label_names[c]}"' for c in class_order)
    f.write(f"const char *MODEL_CLASS_NAMES[MODEL_NUM_CLASSES] = {{{names_str}}};\n")

print(f"Exported ensemble weights to {out_path}")
print(f"Total weight storage: {(len(td_idx)+len(fd_idx)+len(cc_idx)+3*n_cls)*n_cls*4} bytes float32")
```

**Note on LinearDiscriminantAnalysis with multi-class**: for three or more classes, scikit-learn's LDA exposes `coef_` with shape `(n_classes, n_features)`, which is what the export above assumes. Only in the two-class case does `coef_` collapse to a single row of shape `(1, n_features)`. Verify `lda.coef_.shape` after fitting; if it is not `(n_cls, n_feat)`, use the `decision_function()` output structure and adjust the export accordingly.

---

# PART VII — FEATURE SELECTION FOR ESP32 PORTING

After Change 1 is trained, use this to decide what to port to C firmware.
### Step 1 — Get Feature Importance

```python
importance = np.abs(classifier.model.coef_).mean(axis=0)
feat_names = classifier.feature_extractor.get_feature_names(n_channels=len(HAND_CHANNELS))
ranked = sorted(zip(feat_names, importance), key=lambda x: -x[1])
print("Top 20 features by LDA discriminative weight:")
for name, score in ranked[:20]:
    print(f"  {name:<35} {score:.4f}")
```

### Step 2 — Port Decision Matrix

| Feature | C Complexity | Prereq | Port? |
|---------|-------------|--------|-------|
| RMS, WL, ZC, SSC | ✓ Already in C | — | Keep |
| MAV, VAR, IEMG | Very easy (1 loop) | None | ✓ Yes |
| WAMP | Very easy (threshold on diff) | None | ✓ Yes |
| Cross-ch covariance | Easy (3×3 outer product) | None | ✓ Yes |
| Cross-ch correlation | Easy (normalize covariance) | Covariance | ✓ Yes |
| Bandpower bp0–bp3 | Medium (128-pt FFT via esp-dsp) | Add FFT call | ✓ Yes — highest ROI |
| MNF, MDF, PKF, MNP | Easy after FFT | Bandpower FFT | ✓ Free once FFT added |
| AR(4) | Medium (Levinson-Durbin in C) | None | Only if top-8 importance |

Once `dsps_fft2r_fc32()` is added for bandpower, MNF/MDF/PKF/MNP come almost for free, because they reuse the same power spectrum.
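The reason MNF, MDF, PKF and MNP add so little cost is that each one is a simple reduction over the band-limited power spectrum that the bandpower FFT already produces. The sketch below illustrates this in Python with the same definitions as Sub-change 1A; `spectral_descriptors` is a hypothetical helper, and the 1000 Hz sampling rate and 20–500 Hz band are assumptions taken from the surrounding text.

```python
import numpy as np


def spectral_descriptors(signal, fs=1000.0, lo=20.0, hi=500.0):
    """Return (mnf, mdf, pkf, mnp) computed from a single FFT of one window."""
    x = np.asarray(signal, dtype=np.float64)
    x = x - x.mean()                               # remove DC before the FFT
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / n          # one FFT, shared by all four features
    band = (freqs >= lo) & (freqs <= hi)
    f_b, p_b = freqs[band], psd[band]
    if len(f_b) == 0:
        return 0.0, 0.0, 0.0, 0.0
    total = p_b.sum() + 1e-10
    mnf = float((f_b * p_b).sum() / total)         # mean frequency: power-weighted average
    cum = np.cumsum(p_b)
    mdf = float(f_b[min(np.searchsorted(cum, total / 2), len(f_b) - 1)])  # median frequency
    pkf = float(f_b[np.argmax(p_b)])               # peak frequency
    mnp = float(total / len(f_b))                  # mean power per bin
    return mnf, mdf, pkf, mnp


# A pure 100 Hz tone, 200 samples at 1 kHz, sits exactly on an FFT bin
t = np.arange(200) / 1000.0
mnf, mdf, pkf, mnp = spectral_descriptors(np.sin(2 * np.pi * 100.0 * t))
print(f"MNF={mnf:.1f} Hz  MDF={mdf:.1f} Hz  PKF={pkf:.1f} Hz")
```

For a single-tone input all three frequency descriptors collapse onto the tone frequency, which makes this a convenient self-test when porting the same reductions to C.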
### Step 3 — Adding FFT-Based Features to inference.c

Add inside the per-channel loop in `compute_features()`, after the time-domain features:

```c
// 128-point FFT for frequency-domain features (runs once per channel).
// A radix-2 FFT needs a power-of-two length, so only the first 128 of the
// INFERENCE_WINDOW_SIZE (150) samples are used; the remaining samples are dropped.
// dsps_fft2r_init_fc32() must have been called once at startup.
float fft_buf[256] = {0};  // 128 complex values, interleaved re/im
for (int i = 0; i < 128 && i < INFERENCE_WINDOW_SIZE; i++) {
    fft_buf[2*i]   = signal[i];  // real
    fft_buf[2*i+1] = 0.0f;       // imag
}
dsps_fft2r_fc32(fft_buf, 128);
dsps_bit_rev_fc32(fft_buf, 128);

// Bandpower: bin k → freq = k * 1000/128 = k * 7.8125 Hz (1 kHz sample rate)
// Band b sums bins band_bins[b] .. band_bins[b+1]-1 (upper edge exclusive).
// Band 0: 20–80 Hz   → bins 3–9
// Band 1: 80–150 Hz  → bins 10–18
// Band 2: 150–300 Hz → bins 19–37
// Band 3: 300–500 Hz → bins 38–63
int band_bins[5] = {3, 10, 19, 38, 64};
float bp[4] = {0, 0, 0, 0};
for (int b = 0; b < 4; b++) {
    for (int k = band_bins[b]; k < band_bins[b+1]; k++) {
        float re = fft_buf[2*k], im = fft_buf[2*k+1];
        bp[b] += re*re + im*im;  // squared magnitude of bin k
    }
}

// Store at correct indices (base = ch * 20)
int base = ch * 20;
features_out[base+16] = bp[0];
features_out[base+17] = bp[1];
features_out[base+18] = bp[2];
features_out[base+19] = bp[3];
```

---

# PART VIII — MEASUREMENT AND VALIDATION

## Baseline Protocol

**Run this BEFORE any change and after EACH change.**

```
1. python learning_data_collection.py → option 3 (Train Classifier)
2. Record:
   - "Mean CV accuracy: XX.X% ± Y.Y%" (cross-validation)
   - Confusion matrix (which gesture pairs are most confused)
   - Per-gesture accuracy breakdown
3. On-device test:
   - Put on sensors, perform 10 reps of each gesture
   - Log classification output (UART or Python serial monitor)
   - Compute per-gesture accuracy manually
4. Record REST false-trigger rate: hold arm at rest for 30 seconds,
   count number of non-REST outputs
```

## Results Log

| Change | CV Acc Before | CV Acc After | Delta | On-Device Acc | False Triggers/30s | Keep? |
|--------|--------------|-------------|-------|---------------|-------------------|-------|
| Baseline | — | — | — | — | — | — |
| Change C (reject) | — | — | — | — | — | — |
| Change B (filter) | — | — | — | — | — | — |
| Change 0 (label shift) | — | — | — | — | — | — |
| Change 1 (features) | — | — | — | — | — | — |
| Change D (NVS calib) | — | — | — | — | — | — |
| Change 3 (augment) | — | — | — | — | — | — |
| Change 5 (benchmark) | — | — | — | — | — | — |
| Change 7+F (ensemble) | — | — | — | — | — | — |
| Change E (MLP) | — | — | — | — | — | — |

## When to Add More Gestures

| CV Accuracy | Recommendation |
|-------------|----------------|
| <80% | Do NOT add gestures; fix the existing 5 first |
| 80–90% | Adding 1–2 gestures is reasonable; expect 5–8% drop per new gesture |
| 90–95% | Good baseline; can add gestures; target staying above 85% |
| >95% | Excellent; can be ambitious with gesture count |

---

# PART IX — EXPORT WORKFLOW

## Path 1 — LDA / Ensemble (Changes 0–4, 7+F)

```
1. Train:   python learning_data_collection.py → option 3  (single LDA)
   OR:      python train_ensemble.py                       (full ensemble)
2. Export:
   Single LDA: classifier.export_to_header(Path('EMG_Arm/src/core/model_weights.h'))
   Ensemble:   export_ensemble_header() in train_ensemble.py → writes model_weights_ensemble.h
3. Port new features to inference.c (if Change 1 features added):
   - Follow feature selection decision matrix (Part VII)
   - CRITICAL: C feature index order MUST match Python FEATURE_ORDER exactly
4. Build + flash: pio run -t upload
```

## Path 2 — int8 MLP via TFLM (Change E)

```
1. python train_mlp_tflite.py → emg_model_data.cc
2. Add TFLM to platformio.ini lib_deps
3. Replace LDA inference call with inference_mlp_predict() in inference.c
   OR use inference_ensemble_predict() which calls MLP as fallback (Change F)
4. pio run -t upload
```

## Feature Index Contract (Critical)

The order of values written to `features_out[]` by `compute_features()` in C **must exactly match** `FEATURE_ORDER` in `extract_features_window()` in Python, index for index. A mismatch does not raise an error; the classifier simply receives the wrong feature in each slot and on-device accuracy degrades silently.

To verify before flashing: print the C feature names (from `MODEL_FEATURE_NAMES`, if added to the header) and the Python `extractor.get_feature_names()` output, then diff them.

---

# PART X — REFERENCES

**Primary paper**: Kaifosh, P., Reardon, T., et al. "A generic non-invasive neuromotor interface for human-computer interaction." *Nature* (2025). doi:10.1038/s41586-025-09255-w

**Meta codebase** (label alignment, CLER metric, model architectures): `C:/VSCode/Marvel_Projects/Meta_Emg_Stuff/generic-neuromotor-interface/`
- `data.py`: onset detection, `searchsorted` alignment, window jitter
- `cler.py`: threshold=0.35, debounce=50ms, tolerance=±50/250ms
- `networks.py`: model architectures, left_context=20, stride=10
- `lightning.py`: `targets[..., left_context::stride]` label shift

**Barachant et al. 2012**: "Multiclass brain–computer interface classification by Riemannian geometry." Matrix logarithm reference (MPF features).
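
The `lightning.py` entry above cites the slice `targets[..., left_context::stride]`. As a rough illustration of what that slice does (a sketch only, not code taken from the Meta repository; `shift_labels` and the toy label track are hypothetical), the snippet below applies the same slice to a plain Python list using the `left_context=20` and `stride=10` values listed for `networks.py`. The first 20 per-sample labels are dropped and every 10th label after that is kept, which is presumably how a per-sample label track is brought to the same length as the output of a network that consumes 20 samples of history and emits one prediction every 10 samples.

```python
def shift_labels(targets, left_context=20, stride=10):
    """Keep one label per model output step (hypothetical helper).

    Mirrors the slice targets[..., left_context::stride] on a 1-D list.
    """
    return targets[left_context::stride]


# Toy label track: 100 samples, gesture onset at sample 45 (made-up data)
targets = [0] * 45 + [1] * 55

shifted = shift_labels(targets)
kept_indices = list(range(len(targets)))[20::10]

print(kept_indices)  # sample indices whose labels survive the slice
print(shifted)       # labels at those indices
```

With 100 input samples the slice keeps 8 labels (indices 20, 30, ..., 90), so the onset at sample 45 first appears in the fourth retained label.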

**Espressif libraries**:
- esp-dsp: `github.com/espressif/esp-dsp` (biquad, FFT, dot-product)
- esp-dl: `github.com/espressif/esp-dl` (quantized MLP/CNN inference)
- TFLite Micro: `github.com/tensorflow/tflite-micro`

**All project files** (existing + planned):

```
── Laptop / Python ─────────────────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/learning_data_collection.py   ← main: data collection + training
C:/VSCode/Marvel_Projects/Bucky_Arm/live_predict.py               ← NEW (Part 0.6): laptop-side live inference
C:/VSCode/Marvel_Projects/Bucky_Arm/train_ensemble.py             ← NEW (Change 7): ensemble training
C:/VSCode/Marvel_Projects/Bucky_Arm/train_mlp_tflite.py           ← NEW (Change E): int8 MLP export

── ESP32 Firmware — Existing ───────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/platformio.ini
  └─ ADD lib_deps: espressif/esp-dsp (Changes B,1,F), tensorflow/tflite-micro (Change E)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/config/config.h
  └─ MODIFY: remove system_mode_t; add EMG_STANDALONE to MAIN_MODE enum (Part 0.7, S1)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/app/main.c
  └─ MODIFY: add STATE_LAPTOP_PREDICT, CMD_START_LAPTOP_PREDICT,
             run_laptop_predict_loop(), run_standalone_loop() (Part 0.5)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/drivers/emg_sensor.c
  └─ MODIFY (Change A): migrate from adc_oneshot to adc_continuous driver
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.c
  └─ MODIFY: add inference_get_gesture_by_name(), IIR filter (B), features (1), confidence rejection (C)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.h
  └─ MODIFY: add inference_get_gesture_by_name() declaration
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/gestures.c
  └─ MODIFY: update gesture_names[] and gestures_execute() when adding gestures
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights.h
  └─ AUTO-GENERATED by export_to_header(); do not edit manually

── ESP32 Firmware — New Files ──────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/bicep.h/.c                ← Part 0 / Section 2.2
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/calibration.h/.c          ← Change D (NVS z-score)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference_ensemble.h/.c   ← Change F
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference_mlp.h/.cc       ← Change E
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights_ensemble.h  ← AUTO-GENERATED (Change 7)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/emg_model_data.h/.cc      ← AUTO-GENERATED (Change E)
```
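
Because `model_weights.h` in the list above is auto-generated, a feature-order mismatch between it and the Python extractor is easy to miss. The Feature Index Contract in Part IX asks for a diff of the two name lists before flashing; a minimal sketch of such a check follows. It assumes the exported header carries a `MODEL_FEATURE_NAMES` string array in the layout shown in the docstring (the source only says "if added to header", so that layout is an assumption), and the helper names and toy feature names are hypothetical.

```python
import re


def parse_c_feature_names(header_text):
    """Extract the quoted strings from a MODEL_FEATURE_NAMES initializer.

    Assumed layout (not confirmed by the exporter):
        static const char *MODEL_FEATURE_NAMES[] = { "a", "b", ... };
    """
    match = re.search(
        r"MODEL_FEATURE_NAMES\s*\[[^\]]*\]\s*=\s*\{(.*?)\}\s*;",
        header_text,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("MODEL_FEATURE_NAMES not found in header")
    return re.findall(r'"([^"]*)"', match.group(1))


def diff_feature_order(c_names, py_names):
    """Return (index, c_name, py_name) for every position that disagrees."""
    mismatches = []
    for i in range(max(len(c_names), len(py_names))):
        c = c_names[i] if i < len(c_names) else None
        p = py_names[i] if i < len(py_names) else None
        if c != p:
            mismatches.append((i, c, p))
    return mismatches


# Toy example: hypothetical names, with two entries swapped on the C side
header = '''
static const char *MODEL_FEATURE_NAMES[] = {
    "ch0_rms", "ch0_zc", "ch0_wl",
};
'''
py_names = ["ch0_rms", "ch0_wl", "ch0_zc"]  # stand-in for extractor.get_feature_names()

c_names = parse_c_feature_names(header)
mismatches = diff_feature_order(c_names, py_names)
for idx, c, p in mismatches:
    print(f"index {idx}: C={c!r} Python={p!r}")
```

In practice `header` would be read from the exported file and `py_names` taken from the extractor; an empty `mismatches` list means the two orders agree index for index.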