Bucky Arm — EMG Gesture Control: Master Implementation Reference
Version: 2026-03-01 | Target: ESP32-S3 N32R16V (Xtensa LX7 @ 240 MHz, 512 KB SRAM, 16 MB OPI PSRAM)
Supersedes: META_EMG_RESEARCH_NOTES.md + BUCKY_ARM_IMPROVEMENT_PLAN.md
Source paper: doi:10.1038/s41586-025-09255-w (PDF: C:/VSCode/Marvel_Projects/s41586-025-09255-w.pdf)
TABLE OF CONTENTS
- PART 0 — SYSTEM ARCHITECTURE & RESPONSIBILITY ASSIGNMENT
- PART I — SYSTEM FOUNDATIONS
- PART II — TARGET ARCHITECTURE
- PART III — GESTURE EXTENSIBILITY
- PART IV — CHANGE REFERENCE
- PART V — FIRMWARE CHANGES
- PART VI — PYTHON/TRAINING CHANGES
- PART VII — FEATURE SELECTION FOR ESP32 PORTING
- PART VIII — MEASUREMENT AND VALIDATION
- PART IX — EXPORT WORKFLOW
- PART X — REFERENCES
PART 0 — SYSTEM ARCHITECTURE & RESPONSIBILITY ASSIGNMENT
This section is the authoritative reference for what runs where. All implementation decisions in later parts should be consistent with this partition.
0.1 Who Does What
| Responsibility | Laptop (Python) | ESP32 |
|---|---|---|
| EMG sensor reading | — | ✓ `emg_sensor_read()` always |
| Raw data streaming (for collection) | Receives CSV, saves to HDF5 | Streams CSV over UART |
| Model training | ✓ `learning_data_collection.py` | — |
| Model export | ✓ `export_to_header()` → `model_weights.h` | Compiled into firmware |
| On-device inference | — | ✓ `inference_predict()` |
| Laptop-side live inference | ✓ `live_predict.py` (new script) | Streams ADC + executes received cmd |
| Arm actuation | — (sends gesture string back to ESP32) | ✓ `gestures_execute()` |
| Autonomous operation (no laptop) | Not needed | ✓ `EMG_STANDALONE` mode |
| Bicep flex detection | — | ✓ `bicep_detect()` (new, Section 2.2) |
| NVS calibration | — | ✓ `calibration.c` (Change D) |
Key rule: The laptop is never required for real-time arm control in production. The laptop's role is: collect data → train model → export → flash firmware → done. After that, the ESP32 operates completely independently.
0.2 Operating Modes
Controlled by #define MAIN_MODE in config/config.h.
The enum currently reads enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER}.
A new value EMG_STANDALONE must be added.
| `MAIN_MODE` | When to use | Laptop required? | Entry point |
|---|---|---|---|
| `EMG_MAIN` | Development sessions, data collection, monitored operation | Yes — UART handshake to start any mode | `appConnector()` in main.c |
| `EMG_STANDALONE` | Fully autonomous deployment — no laptop | No — boots directly into predict+control | `run_standalone_loop()` (new function in main.c) |
| `SERVO_CALIBRATOR` | Hardware setup, testing servo range of motion | Yes (serial input) | Inline in `app_main()` |
| `GESTURE_TESTER` | Testing gesture→servo mapping via keyboard | Yes (serial input) | Inline in `app_main()` |
How to switch mode: change #define MAIN_MODE in config.h and reflash.
To add EMG_STANDALONE to config.h (1-line change):
// config.h line 19 — current:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER};
// Update to:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER, EMG_STANDALONE};
0.3 FSM Reference (EMG_MAIN mode)
The device_state_t enum in main.c and the command_t enum control all transitions.
Currently: {STATE_IDLE, STATE_CONNECTED, STATE_STREAMING, STATE_PREDICTING}.
A new state STATE_LAPTOP_PREDICT must be added (see Section 0.5).
STATE_IDLE
└─ {"cmd":"connect"} ──────────────────────────► STATE_CONNECTED
│
{"cmd":"start"} ──────────┤
│ STATE_STREAMING
│ ESP32 sends raw ADC CSV at 1kHz
│ Laptop: saves to HDF5 (data collection)
│ Laptop: trains model → exports model_weights.h
│ ◄──── {"cmd":"stop"} ────────────────────┘
│
{"cmd":"start_predict"} ─────────┤
│ STATE_PREDICTING
│ ESP32: inference_predict() on-device
│ ESP32: gestures_execute()
│ Laptop: optional UART monitor only
│ ◄──── {"cmd":"stop"} ────────────────────┘
│
{"cmd":"start_laptop_predict"} ───────┘
STATE_LAPTOP_PREDICT [NEW]
ESP32: streams raw ADC CSV (same as STREAMING)
Laptop: runs live_predict.py inference
Laptop: sends {"gesture":"fist"} back
ESP32: executes received gesture command
◄──── {"cmd":"stop"} ────────────────────┘
All active states:
{"cmd":"stop"} → STATE_CONNECTED
{"cmd":"disconnect"} → STATE_IDLE
{"cmd":"connect"} → STATE_CONNECTED (from any state — reconnect)
Convenience table of commands and their effects:
| JSON command | Valid from state | Result |
|---|---|---|
| `{"cmd":"connect"}` | Any | → STATE_CONNECTED |
| `{"cmd":"start"}` | STATE_CONNECTED | → STATE_STREAMING |
| `{"cmd":"start_predict"}` | STATE_CONNECTED | → STATE_PREDICTING |
| `{"cmd":"start_laptop_predict"}` | STATE_CONNECTED | → STATE_LAPTOP_PREDICT (new) |
| `{"cmd":"stop"}` | STREAMING/PREDICTING/LAPTOP_PREDICT | → STATE_CONNECTED |
| `{"cmd":"disconnect"}` | Any active state | → STATE_IDLE |
0.4 EMG_STANDALONE Boot Sequence
No UART handshake. No laptop required. Powers on → predicts → controls arm.
app_main() switch MAIN_MODE == EMG_STANDALONE:
│
├── hand_init() // servos
├── emg_sensor_init() // ADC setup
├── inference_init() // clear window buffer, reset smoothing state
├── calibration_init() // load NVS z-score params (Change D)
│ └── if not found in NVS:
│ collect 120 REST windows (~3s at 25ms hop)
│ call calibration_update() to compute and store stats
├── bicep_load_threshold() // load NVS bicep threshold (Section 2.2)
│ └── if not found:
│ collect 3s of still bicep data
│ call bicep_calibrate() and bicep_save_threshold()
│
└── run_standalone_loop() ← NEW function (added to main.c)
while (1):
emg_sensor_read(&sample)
inference_add_sample(sample.channels)
if stride_counter++ >= INFERENCE_HOP_SIZE:
stride_counter = 0
gesture_t g = inference_get_gesture_enum(inference_predict(&conf))
gestures_execute(g)
bicep_state_t b = bicep_detect()
// (future: bicep_actuate(b))
vTaskDelay(1)
run_standalone_loop() is structurally identical to run_inference_loop() in EMG_MAIN,
minus all UART state-change checking and telemetry prints. It runs forever until power-off.
Where to add: New function run_standalone_loop() in app/main.c, plus a new case
in the app_main() switch block:
case EMG_STANDALONE:
run_standalone_loop();
break;
0.5 New Firmware Changes for Architecture
These changes are needed to implement the architecture above. They are structural (not accuracy improvements) and should be done before any other changes.
S1 — Add EMG_STANDALONE to config.h
File: EMG_Arm/src/config/config.h, line 19
// Change:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER};
// To:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER, EMG_STANDALONE};
S2 — Add STATE_LAPTOP_PREDICT to FSM (main.c)
File: EMG_Arm/src/app/main.c
// In device_state_t enum — add new state:
typedef enum {
STATE_IDLE = 0,
STATE_CONNECTED,
STATE_STREAMING,
STATE_PREDICTING,
STATE_LAPTOP_PREDICT, // ← ADD: streams ADC to laptop, executes laptop's gesture commands
} device_state_t;
// In command_t enum — add new command:
typedef enum {
CMD_NONE = 0,
CMD_CONNECT,
CMD_START,
CMD_START_PREDICT,
CMD_START_LAPTOP_PREDICT, // ← ADD
CMD_STOP,
CMD_DISCONNECT,
} command_t;
In parse_command() — add detection (place BEFORE the "start" check to avoid prefix collision):
} else if (strncmp(value_start, "start_laptop_predict", 20) == 0) {
return CMD_START_LAPTOP_PREDICT;
} else if (strncmp(value_start, "start_predict", 13) == 0) {
return CMD_START_PREDICT;
} else if (strncmp(value_start, "start", 5) == 0) {
return CMD_START;
In serial_input_task() FSM switch — add to STATE_CONNECTED block:
} else if (cmd == CMD_START_LAPTOP_PREDICT) {
g_device_state = STATE_LAPTOP_PREDICT;
printf("[STATE] CONNECTED -> LAPTOP_PREDICT\n");
xQueueSend(g_cmd_queue, &cmd, 0);
}
Add to the active-state check in serial_input_task():
case STATE_STREAMING:
case STATE_PREDICTING:
case STATE_LAPTOP_PREDICT: // ← ADD to the case list
if (cmd == CMD_STOP) { ... }
New function run_laptop_predict_loop() (add alongside stream_emg_data() and run_inference_loop()):
/**
* @brief Laptop-mediated prediction loop (STATE_LAPTOP_PREDICT).
*
* Streams raw ADC CSV to laptop for inference.
* Simultaneously reads gesture commands sent back by laptop.
* Executes received gesture immediately.
*
* Laptop sends: {"gesture":"fist"}\n OR {"gesture":"rest"}\n etc.
* ESP32 parses the "gesture" field and calls inference_get_gesture_enum() + gestures_execute().
*/
static void run_laptop_predict_loop(void) {
emg_sample_t sample;
char cmd_buf[64];
int cmd_idx = 0;
printf("{\"status\":\"info\",\"msg\":\"Laptop-predict mode started\"}\n");
while (g_device_state == STATE_LAPTOP_PREDICT) {
// 1. Send raw ADC sample (same format as STATE_STREAMING)
emg_sensor_read(&sample);
printf("%u,%u,%u,%u\n", sample.channels[0], sample.channels[1],
sample.channels[2], sample.channels[3]);
// 2. Non-blocking read of any incoming gesture command from laptop
// (serial_input_task already handles FSM commands; this handles gesture commands)
// Note: getchar() is non-blocking when there is no data (returns EOF).
// Gesture messages from laptop look like: {"gesture":"fist"}\n
int c = getchar();
if (c != EOF && c != 0xFF) {
if (c == '\n' || c == '\r') {
if (cmd_idx > 0) {
cmd_buf[cmd_idx] = '\0';
// Parse {"gesture":"<name>"} — look for "gesture" field
const char *g = strstr(cmd_buf, "\"gesture\"");
if (g) {
const char *v = strchr(g, ':');
if (v) {
v++;
while (*v == ' ' || *v == '"') v++;
// Extract gesture name up to closing quote
char name[32] = {0};
int ni = 0;
while (*v && *v != '"' && ni < 31) name[ni++] = *v++;
name[ni] = '\0';
// Map name to enum and execute (reuse inference mapping)
gesture_t gesture = inference_get_gesture_by_name(name);
if (gesture != GESTURE_NONE) {
gestures_execute(gesture);
}
}
}
cmd_idx = 0;
}
} else if (cmd_idx < (int)sizeof(cmd_buf) - 1) {
cmd_buf[cmd_idx++] = (char)c;
} else {
cmd_idx = 0;
}
}
vTaskDelay(1);
}
}
Note: inference_get_gesture_by_name(const char *name) is the existing
inference_get_gesture_enum(int class_idx) logic refactored to accept a string directly
(bypassing the class_idx lookup). The string-matching logic already exists in
inference.c, so the helper is small:
// Add to inference.c / inference.h:
gesture_t inference_get_gesture_by_name(const char *name);
// (same strcmp logic as inference_get_gesture_enum, but returns gesture_t directly)
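A host-testable sketch of that helper. The `gesture_t` values here are placeholders (the real enum lives in the firmware's gesture header); only the five trained class names from model_weights.h are assumed:

```c
#include <string.h>

/* Placeholder enum — substitute the project's real gesture_t. */
typedef enum {
    GESTURE_NONE = -1,
    GESTURE_FIST,
    GESTURE_HOOK_EM,
    GESTURE_OPEN,
    GESTURE_REST,
    GESTURE_THUMBS_UP,
} gesture_t;

/* Same strcmp chain as inference_get_gesture_enum(), keyed by name. */
gesture_t inference_get_gesture_by_name(const char *name) {
    if (strcmp(name, "fist") == 0)      return GESTURE_FIST;
    if (strcmp(name, "hook_em") == 0)   return GESTURE_HOOK_EM;
    if (strcmp(name, "open") == 0)      return GESTURE_OPEN;
    if (strcmp(name, "rest") == 0)      return GESTURE_REST;
    if (strcmp(name, "thumbs_up") == 0) return GESTURE_THUMBS_UP;
    return GESTURE_NONE;   /* unknown name → caller skips actuation */
}
```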
In state_machine_loop() — add the new state:
static void state_machine_loop(void) {
command_t cmd;
const TickType_t poll_interval = pdMS_TO_TICKS(50);
while (1) {
if (g_device_state == STATE_STREAMING) stream_emg_data();
else if (g_device_state == STATE_PREDICTING) run_inference_loop();
else if (g_device_state == STATE_LAPTOP_PREDICT) run_laptop_predict_loop(); // ← ADD
xQueueReceive(g_cmd_queue, &cmd, poll_interval);
}
}
In app_main() switch — add the standalone case:
case EMG_STANDALONE:
run_standalone_loop(); // new function — see Section 0.4
break;
0.6 New Python Script: live_predict.py
Location: C:/VSCode/Marvel_Projects/Bucky_Arm/live_predict.py (new file)
Purpose: Laptop-side live inference. Reads raw ADC stream from ESP32, runs the Python
classifier, sends gesture commands back to ESP32 for arm control.
When to use: EMG_MAIN + STATE_LAPTOP_PREDICT — useful for debugging and comparing
laptop accuracy vs on-device accuracy before flashing a new model.
"""
live_predict.py — Laptop-side live EMG inference for Bucky Arm.
Connects to ESP32, requests STATE_LAPTOP_PREDICT, reads raw ADC CSV,
runs the trained Python classifier, sends gesture commands back to ESP32.
Usage:
python live_predict.py --port COM3 --model path/to/saved_model/
"""
import argparse
import time
import numpy as np
import serial
from pathlib import Path
import sys
sys.path.insert(0, str(Path(__file__).parent))
from learning_data_collection import (
EMGClassifier, EMGFeatureExtractor, SessionStorage, HAND_CHANNELS,
WINDOW_SIZE_SAMPLES, HOP_SIZE_SAMPLES, NUM_CHANNELS,
)
BAUD_RATE = 921600
CALIB_SEC = 3.0 # seconds of REST to collect for normalization at startup
CALIB_LABEL = "rest" # label used during calibration window
def parse_args():
p = argparse.ArgumentParser()
p.add_argument("--port", required=True, help="Serial port, e.g. COM3 or /dev/ttyUSB0")
p.add_argument("--model", required=True, help="Path to saved EMGClassifier model directory")
return p.parse_args()
def handshake(ser):
"""Send connect command, wait for ack."""
ser.write(b'{"cmd":"connect"}\n')
deadline = time.time() + 5.0
while time.time() < deadline:
line = ser.readline().decode("utf-8", errors="ignore").strip()
if "ack_connect" in line:
print(f"[Handshake] Connected: {line}")
return True
raise RuntimeError("No ack_connect received within 5s")
def collect_calibration_windows(ser, n_windows, window_size, hop_size, n_channels):
"""Collect n_windows worth of REST data for normalization calibration."""
print(f"[Calib] Collecting {n_windows} REST windows — hold arm still...")
raw_buffer = np.zeros((window_size, n_channels), dtype=np.float32)
windows = []
sample_count = 0
while len(windows) < n_windows:
line = ser.readline().decode("utf-8", errors="ignore").strip()
try:
vals = [float(v) for v in line.split(",")]
if len(vals) != n_channels:
continue
except ValueError:
continue
raw_buffer = np.roll(raw_buffer, -1, axis=0)
raw_buffer[-1] = vals
sample_count += 1
if sample_count >= window_size and sample_count % hop_size == 0:
windows.append(raw_buffer.copy())
print(f"[Calib] Collected {len(windows)} windows. Computing normalization stats...")
return np.array(windows) # (n_windows, window_size, n_channels)
def main():
args = parse_args()
# Load trained classifier
print(f"[Init] Loading classifier from {args.model}...")
classifier = EMGClassifier()
classifier.load(Path(args.model))
extractor = classifier.feature_extractor
ser = serial.Serial(args.port, BAUD_RATE, timeout=1.0)
time.sleep(0.5)
ser.reset_input_buffer()
handshake(ser)
# Request laptop-predict mode
ser.write(b'{"cmd":"start_laptop_predict"}\n')
print("[Control] Entered STATE_LAPTOP_PREDICT")
# Calibration: collect 3s of REST for session normalization
n_calib_windows = max(10, int(CALIB_SEC * 1000 / (HOP_SIZE_SAMPLES)))
calib_raw = collect_calibration_windows(
ser, n_calib_windows, WINDOW_SIZE_SAMPLES, HOP_SIZE_SAMPLES, NUM_CHANNELS
)
calib_features = extractor.extract_features_batch(calib_raw)
calib_mean = calib_features.mean(axis=0)
calib_std = np.where(calib_features.std(axis=0) > 1e-6,
calib_features.std(axis=0), 1e-6)
print("[Calib] Done. Starting live prediction...")
# Live prediction loop
raw_buffer = np.zeros((WINDOW_SIZE_SAMPLES, NUM_CHANNELS), dtype=np.float32)
sample_count = 0
last_gesture = None
try:
while True:
line = ser.readline().decode("utf-8", errors="ignore").strip()
# Skip JSON telemetry lines from ESP32
if line.startswith("{"):
continue
try:
vals = [float(v) for v in line.split(",")]
if len(vals) != NUM_CHANNELS:
continue
except ValueError:
continue
# Slide window
raw_buffer = np.roll(raw_buffer, -1, axis=0)
raw_buffer[-1] = vals
sample_count += 1
if sample_count >= WINDOW_SIZE_SAMPLES and sample_count % HOP_SIZE_SAMPLES == 0:
# Extract features and normalize with session stats
feat = extractor.extract_features_window(raw_buffer)
feat = (feat - calib_mean) / calib_std
proba = classifier.model.predict_proba([feat])[0]
class_idx = int(np.argmax(proba))
gesture_name = classifier.label_names[class_idx]
confidence = float(proba[class_idx])
# Send gesture command to ESP32
cmd = f'{{"gesture":"{gesture_name}"}}\n'
ser.write(cmd.encode("utf-8"))
if gesture_name != last_gesture:
print(f"[Predict] {gesture_name:12s} conf={confidence:.2f}")
last_gesture = gesture_name
except KeyboardInterrupt:
print("\n[Stop] Sending stop command...")
ser.write(b'{"cmd":"stop"}\n')
ser.close()
if __name__ == "__main__":
main()
Dependencies (add to a requirements.txt in Bucky_Arm/ if not already there):
pyserial
numpy
scikit-learn
0.7 Firmware Cleanup: system_mode_t Removal
config.h lines 94–100 define a system_mode_t typedef that is not referenced anywhere
in the firmware. It predates the current device_state_t FSM in main.c and conflicts
conceptually with it. Remove before starting implementation work.
File: EMG_Arm/src/config/config.h
Remove (lines 93–100):
/**
* @brief System operating modes.
*/
typedef enum {
MODE_IDLE = 0, /**< Waiting for commands */
MODE_DATA_STREAM, /**< Streaming EMG data to laptop */
MODE_COMMAND, /**< Executing gesture commands from laptop */
MODE_DEMO, /**< Running demo sequence */
MODE_COUNT
} system_mode_t;
No other file references system_mode_t — the deletion is safe and requires no other changes.
PART I — SYSTEM FOUNDATIONS
1. Hardware Specification
ESP32-S3 N32R16V — Confirmed Hardware
| Resource | Spec | Implication |
|---|---|---|
| CPU | Dual-core Xtensa LX7 @ 240 MHz | Pin inference to Core 1, sampling to Core 0 |
| SIMD | PIE 128-bit vector extension | esp-dsp exploits this for FFT, biquad, dot-product |
| Internal SRAM | ~512 KB | All hot-path buffers, model weights, inference state |
| OPI PSRAM | 16 MB (~80 MB/s) | ADC ring buffer, raw window storage — not hot path |
| Flash | 32 MB | Code + read-only model flatbuffers (TFLM path) |
| ADC | 2× SAR ADC, 12-bit, continuous DMA mode | Change A: use adc_continuous driver |
Memory rules:
- Tag inference code with `IRAM_ATTR` — prevents cache-miss stalls
- Tag large ring buffers with `EXT_RAM_BSS_ATTR` — pushes them to PSRAM automatically
- Never run hot-path loops from PSRAM (latency varies; ~10× slower than SRAM)
Espressif Acceleration Libraries
| Library | Accelerates | Key Functions |
|---|---|---|
| esp-dsp | IIR biquad, FFT (up to 4096-pt), vector dot-product, matrix ops — PIE SIMD | dsps_biquad_f32, dsps_fft2r_fc32, dsps_dotprod_f32 |
| esp-nn | int8 FC, depthwise/pointwise Conv, activations — SIMD optimized | Used internally by esp-dl |
| esp-dl | High-level int8 inference: MLP, Conv1D, LSTM; activation buffer management | Small MLP / tiny CNN deployment |
| TFLite Micro | Standard int8 flatbuffer inference, tensor arena (static alloc) | Keras → TFLite → int8 workflow |
Real-Time Budget (1000 Hz, 25ms hop)
| Stage | Cost | Notes |
|---|---|---|
| ADC DMA sampling | ~0 µs | Hardware; CPU-free |
| IIR biquad (3 ch, 2 stages) | <100 µs | dsps_biquad_f32 |
| Feature extraction (69 feat) | ~1,200 µs | FFT-based features dominate |
| 3 specialist LDAs | ~150 µs | dsps_dotprod_f32 per class |
| Meta-LDA (15 inputs) | ~10 µs | 75 MACs total |
| int8 MLP fallback [69→32→16→5] | ~250 µs | esp-nn FC kernels |
| Post-processing | <50 µs | EMA, vote, debounce |
| Total (full ensemble) | ~1,760 µs | 14× margin within 25ms |
Hard No-Gos
| Technique | Why |
|---|---|
| Full MPF with matrix logarithm | Eigendecomposition per window; fragile float32; no SIMD path |
| Conv1D(16→512) + 3×LSTM(512) | ~4 MB weights; LSTM sequential dependency — impossible |
| Any transformer / attention | O(n²); no int8 transformer kernels for MCU |
| On-device gradient updates | Inference only — no training infrastructure |
| Heap allocations on hot path | FreeRTOS heap fragmentation kills determinism |
2. Current System Snapshot
| Aspect | Current State |
|---|---|
| Channels | 4 total; ch0–ch2 forearm (FCR, FCU, extensor), ch3 bicep (excluded from hand classifier) |
| Sampling | 1000 Hz, timer/polling (jitter — fix with Change A) |
| Window | 150 samples (150ms), 25-sample hop (25ms) |
| Features | 12: RMS, WL, ZC, SSC × 3 channels |
| Classifier | Single LDA, float32 weights in C header |
| Label alignment | RMS onset detection — missing +100ms forward shift (Change 0) |
| Normalization | Per-session z-score in Python; no on-device equivalent (Change D) |
| Smoothing | EMA (α=0.7) + majority vote (5) + debounce (3 counts) |
| Confidence rejection | None — always outputs a class (Change C) |
| Signal filtering | Analogue only via MyoWare (Change B adds software IIR) |
| Gestures | 5: fist, hook_em, open, rest, thumbs_up |
| Training data | 15 HDF5 sessions, 1 user |
2.1 — Confirmed Firmware Architecture (From Codebase Exploration)
Confirmed by direct codebase inspection 2026-02-24. All file paths relative to
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/
ADC Pin Mapping (drivers/emg_sensor.c)
| Channel | ADC Channel | GPIO | Muscle Location | Role in Classifier |
|---|---|---|---|---|
| ch0 | `ADC_CHANNEL_1` | GPIO 2 | Forearm Belly (FCR) | Primary flexion signal |
| ch1 | `ADC_CHANNEL_2` | GPIO 3 | Forearm Extensors | Extension signal |
| ch2 | `ADC_CHANNEL_8` | GPIO 9 | Forearm Contractors (FCU) | Ulnar flexion signal |
| ch3 | `ADC_CHANNEL_9` | GPIO 10 | Bicep | Independent — see Section 2.2 |
Current ADC driver: adc_oneshot (polling — NOT DMA continuous yet; Change A migrates this)
- Attenuation: `ADC_ATTEN_DB_12` (0–3.9V full-scale range)
- Calibration: `adc_cali_curve_fitting` scheme
- Output: calibrated millivolts as `uint16_t` packed into `emg_sample_t.channels[4]`
- Timing: `vTaskDelay(1)` in `run_inference_loop()` provides the ~1ms sample interval
Current Task Structure (app/main.c)
| Task | Priority | Stack | Core Pinning | Role |
|---|---|---|---|---|
| `app_main` (implicit) | Default | Default | None | Runs inference loop + state machine |
| `serial_input_task` | 5 | 4096 B | None | Parses UART JSON commands |
No other tasks exist. Change A will add adc_sampling_task pinned to Core 0.
The inference loop runs on app_main's default task — no explicit core affinity.
State Machine (app/main.c)
STATE_IDLE ─(BLE/UART connect)─► STATE_CONNECTED
│
{"cmd": "start"}▼
STATE_STREAMING (sends raw ADC over UART for Python)
│
{"cmd": "start_predict"}▼
STATE_PREDICTING (runs run_inference_loop())
Communication: UART at 921600 baud, JSON framing.
Complete Data Flow (Exact Function Names)
emg_sensor_read(&sample)
│ drivers/emg_sensor.c
│ adc_oneshot_read() × 4 channels → adc_cali_raw_to_voltage() → uint16_t mV
│ Result: sample.channels[4] = {ch0_mV, ch1_mV, ch2_mV, ch3_mV}
│
▼ Called every ~1ms (vTaskDelay(1) in run_inference_loop)
inference_add_sample(sample.channels)
│ core/inference.c
│ Writes to circular window_buffer[150][4]
│ Returns true when buffer is full (after first 150 samples)
│
▼ Called every 25 samples (stride_counter % INFERENCE_HOP_SIZE == 0)
inference_predict(&confidence)
│ core/inference.c
│ compute_features() → LDA scores → softmax → EMA → majority vote → debounce
│ Returns: gesture class index (int), fills confidence (float)
│
▼
inference_get_gesture_enum(class_idx)
│ core/inference.c
│ String match on MODEL_CLASS_NAMES[] → gesture_t enum value
│
▼
gestures_execute(gesture)
core/gestures.c
switch(gesture) → servo PWM via LEDC driver
Servo pins: GPIO 1,4,5,6,7 (Thumb, Index, Middle, Ring, Pinky)
Current Buffer State
// core/inference.c line 19:
static uint16_t window_buffer[INFERENCE_WINDOW_SIZE][NUM_CHANNELS];
// ^^^^^^^^ MUST change to float when adding IIR filter (Change B)
//
// uint16_t: 150 × 4 × 2 = 1,200 bytes in internal SRAM
// float: 150 × 4 × 4 = 2,400 bytes in internal SRAM (still trivially small)
//
// Reason for change: IIR filter outputs float; casting back to uint16_t loses
// sub-mV precision and re-introduces the quantization noise we just filtered out.
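For reference, the circular-window bookkeeping that `inference_add_sample()` performs (per the data flow above) reduces to a few lines. This is a simplified stand-in for illustration, not the actual inference.c source:

```c
#include <stdbool.h>
#include <stdint.h>

#define WIN 150   /* INFERENCE_WINDOW_SIZE */
#define NCH 4     /* NUM_CHANNELS */

static uint16_t window_buffer[WIN][NCH];
static int buffer_head  = 0;   /* next write slot = oldest sample */
static int samples_seen = 0;

/* Returns true once the first full 150-sample window has been collected. */
static bool add_sample_sketch(const uint16_t ch[NCH]) {
    for (int i = 0; i < NCH; i++) window_buffer[buffer_head][i] = ch[i];
    buffer_head = (buffer_head + 1) % WIN;
    if (samples_seen < WIN) samples_seen++;
    return samples_seen == WIN;
}
```

After the switch to float storage (Change B), only the buffer's element type changes; the head/seen bookkeeping stays identical.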
platformio.ini Current State (EMG_Arm/platformio.ini)
Current lib_deps: None — completely empty, no external library dependencies.
Required additions per change tier:
| Change | Library | platformio.ini lib_deps entry |
|---|---|---|
| B (IIR biquad) | esp-dsp | espressif/esp-dsp @ ^2.0.0 |
| 1 (FFT features) | esp-dsp | (same — add once for both B and 1) |
| E (int8 MLP) | TFLite Micro | tensorflow/tflite-micro |
| F (ensemble) | esp-dsp | (same as B) |
Add to platformio.ini under [env:esp32-s3-devkitc1-n16r16]:
lib_deps =
espressif/esp-dsp @ ^2.0.0
; tensorflow/tflite-micro ← add this only when implementing Change E
2.2 — Bicep Channel Subsystem (ch3 / ADC_CHANNEL_9 / GPIO 10)
Current Status
The bicep channel is:
- Sampled: `emg_sensor_read()` reads all 4 channels; `sample.channels[3]` holds bicep data
- Excluded from the hand classifier: `HAND_NUM_CHANNELS = 3`; `compute_features()` explicitly loops `ch = 0` to `ch < HAND_NUM_CHANNELS` (i.e., ch0, ch1, ch2 only)
- Not yet independently processed: the comment in `inference.c` line 68 ("ch3 (bicep) is excluded — it will be processed independently") is aspirational — the independent processing is not yet implemented
Phase 1 — Binary Flex/Unflex (Current Target)
Implement a simple RMS threshold detector as a new subsystem:
New files:
EMG_Arm/src/core/bicep.h
EMG_Arm/src/core/bicep.c
bicep.h:
#pragma once
#include <stdint.h>
#include <stdbool.h>
typedef enum {
BICEP_STATE_REST = 0,
BICEP_STATE_FLEX = 1,
} bicep_state_t;
// Call once at session start with ~3s of relaxed bicep data.
// Returns the computed threshold (also stored internally).
float bicep_calibrate(const uint16_t *ch3_samples, int n_samples);
// Call every 25ms (same hop as hand gesture inference).
// Computes RMS on the last BICEP_WINDOW_SAMPLES from the ch3 circular buffer.
bicep_state_t bicep_detect(void);
// Load/save threshold to NVS (reuse calibration.c infrastructure from Change D)
bool bicep_save_threshold(float threshold_mv);
bool bicep_load_threshold(float *threshold_mv_out);
Core logic (bicep.c):
#define BICEP_WINDOW_SAMPLES 50 // 50ms window at 1000Hz
#define BICEP_FLEX_MULTIPLIER 2.5f // threshold = rest_rms × 2.5
#define BICEP_HYSTERESIS 1.3f // prevents rapid toggling at threshold boundary
static float s_threshold_mv = 0.0f;
static bicep_state_t s_state = BICEP_STATE_REST;
float bicep_calibrate(const uint16_t *ch3_samples, int n_samples) {
float rms_sq = 0.0f;
for (int i = 0; i < n_samples; i++)
rms_sq += (float)ch3_samples[i] * ch3_samples[i];
float rest_rms = sqrtf(rms_sq / n_samples);
s_threshold_mv = rest_rms * BICEP_FLEX_MULTIPLIER;
printf("[Bicep] Calibrated: rest_rms=%.1f mV, threshold=%.1f mV\n",
rest_rms, s_threshold_mv);
return s_threshold_mv;
}
bicep_state_t bicep_detect(void) {
// Compute RMS on last BICEP_WINDOW_SAMPLES from ch3 circular buffer
// (ch3 values are stored in window_buffer[][3] alongside hand channels)
float rms_sq = 0.0f;
// buffer_head is the next write slot (the oldest sample), so the most
// recent BICEP_WINDOW_SAMPLES start BICEP_WINDOW_SAMPLES before it.
int idx = (buffer_head + INFERENCE_WINDOW_SIZE - BICEP_WINDOW_SAMPLES) % INFERENCE_WINDOW_SIZE;
for (int i = 0; i < BICEP_WINDOW_SAMPLES; i++) {
float v = (float)window_buffer[idx][3]; // ch3 = bicep
rms_sq += v * v;
idx = (idx + 1) % INFERENCE_WINDOW_SIZE;
}
float rms = sqrtf(rms_sq / BICEP_WINDOW_SAMPLES);
// Hysteresis: enter FLEX above threshold × BICEP_HYSTERESIS, drop back to REST below threshold × 1.0
if (s_state == BICEP_STATE_REST && rms > s_threshold_mv * BICEP_HYSTERESIS)
s_state = BICEP_STATE_FLEX;
else if (s_state == BICEP_STATE_FLEX && rms < s_threshold_mv)
s_state = BICEP_STATE_REST;
return s_state;
}
Integration in main.c run_inference_loop():
// Call alongside inference_predict() every 25ms:
if (stride_counter % INFERENCE_HOP_SIZE == 0) {
float confidence;
int class_idx = inference_predict(&confidence);
gesture_t gesture = inference_get_gesture_enum(class_idx);
bicep_state_t bicep = bicep_detect();
// Combined actuation: hand gesture + bicep state
// Example: bicep flex can enable/disable certain gestures,
// or control a separate elbow/wrist joint.
gestures_execute(gesture);
// bicep_actuate(bicep); ← add when elbow motor is wired
}
Calibration trigger (add to serial_input_task command parsing):
// {"cmd": "calibrate_bicep"} → collect 3s of rest data, call bicep_calibrate()
Phase 2 — Continuous Angle/Velocity Prediction (Future)
When ready to move beyond binary flex/unflex:
- Collect angle-labeled data: hold arm at 0°, 15°, 30°, 45°, 60°, 75°, 90°; log RMS at each; collect 5+ reps per angle.
- Fit polynomial: `angle = a0 + a1*rms + a2*rms²` (degree 2 is usually sufficient); use `numpy.polyfit(rms_values, angles, deg=2)` (note: polyfit returns coefficients highest degree first, i.e. [a2, a1, a0]).
- Store coefficients in NVS: 3 floats via `nvs_set_blob()`.
- On-device evaluation: `angle = a0 + rms*(a1 + rms*a2)` — 2 MACs per inference.
- Velocity: `velocity = (angle_now - angle_prev) / HOP_MS` with low-pass smoothing.
Including ch3 in Hand Gesture Classifier (for Wrist Rotation)
If/when wrist rotation or supination gestures are added:
# learning_data_collection.py — change this constant:
HAND_CHANNELS = [0, 1, 2, 3] # was [0, 1, 2]; include bicep for rotation gestures
Feature count becomes: 4 channels × 20 per-ch + 10 cross-ch covariances + 6 correlations = 96 total. The bicep subsystem is then retired and ch3 becomes part of the main gesture classifier.
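The feature-count arithmetic generalizes to any channel count; a small sketch that reproduces both the 3-channel (69) and 4-channel (96) totals:

```c
/* 20 per-channel features, upper-triangular covariances (incl. diagonal),
 * and off-diagonal correlations — the layout from Part I, Section 4. */
static int feature_count(int n_ch) {
    int per_ch = 20 * n_ch;
    int cov = n_ch * (n_ch + 1) / 2;   /* 6 for 3 ch, 10 for 4 ch */
    int cor = n_ch * (n_ch - 1) / 2;   /* 3 for 3 ch,  6 for 4 ch */
    return per_ch + cov + cor;
}
```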
3. What Meta Built — Filtered for ESP32
Meta's Nature 2025 paper (doi:10.1038/s41586-025-09255-w) describes a 16-channel wristband running Conv1D(16→512)+3×LSTM(512). That exact model is not portable to ESP32-S3 (~4 MB weights). What IS transferable:
| Meta Technique | Transferability | Where Used |
|---|---|---|
| +100ms forward label shift after onset detection | ✓ Direct copy | Change 0 |
| Frequency features > amplitude features (Extended Data Fig. 6) | ✓ Core insight | Change 1, Change 6 |
| Deliberate electrode repositioning between sessions | ✓ Protocol | Change 2 |
| Window jitter + amplitude augmentation | ✓ Training | Change 3 |
| Reinhard compression `64x/(32+\|x\|)` | ✓ | Change 4 |
| EMA α=0.7, threshold=0.35, debounce=50ms | ✓ Already implemented | Change C |
| Specialist features → meta-learner stacking | ✓ Adapted | Change 7 + F |
| Conv1D+LSTM architecture | ✗ Too large | Not implementable |
| Full MPF with matrix logarithm | ✗ Eigendecomp too costly | Not implementable |
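Of the transferable items, the Reinhard compression is a one-liner. A sketch of the `64x/(32+|x|)` mapping — the function name is illustrative, and wiring it into the feature path is Change 4:

```c
#include <math.h>

/* Reinhard-style range compression: near-identity around zero,
 * saturating toward ±64 for large |x|. */
static float reinhard_compress(float x) {
    return 64.0f * x / (32.0f + fabsf(x));
}
```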
4. Current Code State + Known Bugs
All Python changes: C:/VSCode/Marvel_Projects/Bucky_Arm/learning_data_collection.py
Firmware: C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.c
Config: C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/config/config.h
Weights: C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights.h
Key Symbol Locations
| Symbol | Line | Notes |
|---|---|---|
| Constants block | 49–94 | NUM_CHANNELS, SAMPLING_RATE_HZ, WINDOW_SIZE_MS, etc. |
| `align_labels_with_onset()` | 442 | RMS onset detection |
| `filter_transition_windows()` | 529 | Removes onset/offset ambiguity windows |
| `SessionStorage.save_session()` | 643 | Calls onset alignment, saves HDF5 |
| `SessionStorage.load_all_for_training()` | 871 | Returns 6 values (see bug below) |
| `EMGFeatureExtractor` class | 1404 | Current: RMS, WL, ZC, SSC only |
| `extract_features_single_channel()` | 1448 | Per-channel feature dict |
| `extract_features_window()` | 1482 | Flat array + cross-channel |
| `extract_features_batch()` | 1520 | Batch wrapper |
| `get_feature_names()` | 1545 | String names for features |
| `CalibrationTransform` class | 1562 | z-score at Python-side inference |
| `EMGClassifier` class | 1713 | LDA/QDA wrapper |
| `EMGClassifier.__init__()` | 1722 | Creates EMGFeatureExtractor |
| `EMGClassifier.train()` | 1735 | Feature extraction + model fit |
| `EMGClassifier._apply_session_normalization()` | 1774 | Per-session z-score |
| `EMGClassifier.cross_validate()` | 1822 | GroupKFold, trial-level |
| `EMGClassifier.save()` | 1910 | Persists model params |
| `EMGClassifier.export_to_header()` | 1956 | Writes model_weights.h |
| `EMGClassifier.load()` | 2089 | Reconstructs from saved params |
| `run_training_demo()` | 2333 | Main training entry point |
| inference.c `compute_features()` | 68 | C feature extraction |
| inference.c `inference_predict()` | 158 | C LDA + smoothing pipeline |
Pending Cleanups (Do Before Any Other Code Changes)
| Item | File | Action |
|---|---|---|
| Remove `system_mode_t` | config/config.h lines 93–100 | Delete the unused typedef (see Part 0, Section 0.7) |
| Add `EMG_STANDALONE` to enum | config/config.h line 19 | Add value to the existing MAIN_MODE enum |
| Add `STATE_LAPTOP_PREDICT` + `CMD_START_LAPTOP_PREDICT` | app/main.c | See Part 0, Section 0.5 for exact diffs |
| Add `run_standalone_loop()` | app/main.c | New function — see Part 0, Section 0.4 |
| Add `run_laptop_predict_loop()` | app/main.c | New function — see Part 0, Section 0.5 |
| Add `inference_get_gesture_by_name()` | core/inference.c + core/inference.h | Small helper — extracts existing strcmp logic |
Known Bug — Line 2382
# BUG: load_all_for_training() returns 6 values; this call unpacks only 5.
# session_indices_combined is silently dropped — breaks per-session normalization.
X, y, trial_ids, label_names, loaded_sessions = storage.load_all_for_training()
# FIX (apply with Change 1):
X, y, trial_ids, session_indices, label_names, loaded_sessions = storage.load_all_for_training()
Current model_weights.h State (as of 2026-02-14 training run)
| Constant | Value | Note |
|---|---|---|
| `MODEL_NUM_CLASSES` | 5 | fist, hook_em, open, rest, thumbs_up |
| `MODEL_NUM_FEATURES` | 12 | RMS, WL, ZC, SSC × 3 forearm channels |
| `MODEL_CLASS_NAMES` | `{"fist","hook_em","open","rest","thumbs_up"}` | Alphabetical order |
| `MODEL_NORMALIZE_FEATURES` | not defined yet | Add when enabling cross-ch norm (Change B) |
| `MODEL_USE_REINHARD` | not defined yet | Add when enabling Reinhard compression (Change 4) |
| `FEAT_ZC_THRESH` | `0.1f` | Fraction of RMS for zero-crossing threshold |
| `FEAT_SSC_THRESH` | `0.1f` | Fraction of RMS for slope sign change threshold |
The LDA_WEIGHTS and LDA_INTERCEPTS arrays are current trained values — do not modify manually.
They are regenerated by EMGClassifier.export_to_header() after each training run.
Current Feature Vector (12 features — firmware contract)
ch0: [0]=rms [1]=wl [2]=zc [3]=ssc
ch1: [4]=rms [5]=wl [6]=zc [7]=ssc
ch2: [8]=rms [9]=wl [10]=zc [11]=ssc
Target Feature Vector (69 features after Change 1)
Per channel (×3 channels, 20 features each):
[0] rms [1] wl [2] zc [3] ssc [4] mav [5] var
[6] iemg [7] wamp [8] ar1 [9] ar2 [10] ar3 [11] ar4
[12] mnf [13] mdf [14] pkf [15] mnp [16] bp0 [17] bp1
[18] bp2 [19] bp3
ch0: indices 0–19
ch1: indices 20–39
ch2: indices 40–59
Cross-channel (9 features):
[60] cov_ch0_ch0 [61] cov_ch0_ch1 [62] cov_ch0_ch2
[63] cov_ch1_ch1 [64] cov_ch1_ch2 [65] cov_ch2_ch2
[66] cor_ch0_ch1 [67] cor_ch0_ch2 [68] cor_ch1_ch2
Specialist Feature Subset Indices (for Change F + Change 7)
TD (time-domain, 36 feat): indices [0–11, 20–31, 40–51]
FD (frequency-domain, 24 feat): indices [12–19, 32–39, 52–59]
CC (cross-channel, 9 feat): indices [60–68]
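The layouts above can be generated and checked programmatically. A minimal sketch (feature-name strings are illustrative — the authoritative list comes from `get_feature_names()`):

```python
# Sketch: reproduce the 69-feature index layout and the TD/FD/CC subsets.
PER_CH = ["rms", "wl", "zc", "ssc", "mav", "var", "iemg", "wamp",
          "ar1", "ar2", "ar3", "ar4",                              # time-domain (12)
          "mnf", "mdf", "pkf", "mnp", "bp0", "bp1", "bp2", "bp3"]  # freq-domain (8)
N_CH = 3

names = [f"ch{c}_{f}" for c in range(N_CH) for f in PER_CH]
names += ["cov_ch0_ch0", "cov_ch0_ch1", "cov_ch0_ch2",
          "cov_ch1_ch1", "cov_ch1_ch2", "cov_ch2_ch2",
          "cor_ch0_ch1", "cor_ch0_ch2", "cor_ch1_ch2"]

# Specialist subsets (Change F + Change 7)
TD = [c * len(PER_CH) + i for c in range(N_CH) for i in range(12)]
FD = [c * len(PER_CH) + i for c in range(N_CH) for i in range(12, 20)]
CC = list(range(N_CH * len(PER_CH), len(names)))

assert len(names) == 69
assert (len(TD), len(FD), len(CC)) == (36, 24, 9)
assert TD[:3] == [0, 1, 2] and TD[12] == 20 and FD[0] == 12 and CC[0] == 60
```

Generating the subsets this way (rather than hard-coding index lists) keeps them correct if the per-channel feature order or channel count ever changes.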
PART II — TARGET ARCHITECTURE
5. Full Recommended Multi-Model Stack
ADC (DMA, Change A)
└── IIR Biquad filter per channel (Change B)
└── 150-sample circular window buffer
│
▼ [every 25ms]
compute_features() → 69-feature vector
│
▼
calibration_apply() (Change D — NVS z-score)
│
├─── Stage 1: Activity Gate ──────────────────────────────────┐
│ total_rms < REST_THRESHOLD? → return GESTURE_REST │
│ (skips all inference during obvious idle) │
│ │
▼ (only reached when gesture is active) │
Stage 2: Parallel Specialist LDAs (Change F) │
├── LDA_TD [TD features, 36-dim] → prob_td[5] │
├── LDA_FD [FD features, 24-dim] → prob_fd[5] │
└── LDA_CC [CC features, 9-dim] → prob_cc[5] │
│
▼ │
Stage 3: Meta-LDA stacker (Change F) │
input: [prob_td | prob_fd | prob_cc] (15-dim) │
output: meta_probs[5] │
│
▼ │
EMA smoothing (α=0.7) on meta_probs │
│ │
├── max smoothed prob ≥ 0.50? ────── Yes ──────────────────┐ │
│ │ │
└── No: Stage 4 Confidence Cascade (Change E) │ │
run int8 MLP on full 69-feat vector │ │
use higher-confidence winner │ │
│ │ │
└────────────────────────────────────────────►│ │
│ │
◄────────────────────────────────────────────────────────── │ │
│ ◄─┘
▼
Stage 5: Confidence rejection (Change C)
max_prob < 0.40? → return current_output (hold / GESTURE_NONE)
│
▼
Majority vote (window=5) + Debounce (count=3)
│
▼
final gesture → actuation
Model Weight Footprint
| Model | Input Dim | Weights | Memory (float32) |
|---|---|---|---|
| LDA_TD | 36 | 5×36 = 180 | 720 B |
| LDA_FD | 24 | 5×24 = 120 | 480 B |
| LDA_CC | 9 | 5×9 = 45 | 180 B |
| Meta-LDA | 15 | 5×15 = 75 | 300 B |
| int8 MLP [69→32→16→5] | 69 | ~2,900 | ~2.9 KB int8 |
| Total | | | ~4.6 KB |
All model weights fit comfortably in internal SRAM.
6. Compute Budget for Full Stack
| Stage | Cost | Cumulative |
|---|---|---|
| Feature extraction (69 feat, 128-pt FFT ×3) | 1,200 µs | 1,200 µs |
| NVS calibration apply | 10 µs | 1,210 µs |
| Activity gate (RMS check) | 5 µs | 1,215 µs |
| LDA_TD (36 feat × 5 classes) | 50 µs | 1,265 µs |
| LDA_FD (24 feat × 5 classes) | 35 µs | 1,300 µs |
| LDA_CC (9 feat × 5 classes) | 15 µs | 1,315 µs |
| Meta-LDA (15 feat × 5 classes) | 10 µs | 1,325 µs |
| EMA + confidence check | 10 µs | 1,335 µs |
| int8 MLP (worst case, ~30% of hops) | 250 µs | 1,585 µs |
| Vote + debounce | 20 µs | 1,605 µs |
| Worst-case total | 1,605 µs | ~6% of 25ms budget |
7. Why This Architecture Works for 3-Channel EMG
Three channels means limited spatial information. The ensemble compensates by extracting maximum diversity from the temporal and spectral dimensions:
- LDA_TD specializes in muscle activation intensity and dynamics (how hard and fast is each muscle firing)
- LDA_FD specializes in muscle activation frequency content (motor unit recruitment patterns — slow vs. fast twitch fibres fire at different frequencies)
- LDA_CC specializes in inter-muscle coordination (which muscles co-activate — the spatial "fingerprint" of each gesture)
These three signal aspects are partially uncorrelated. A gesture that confuses LDA_TD (similar amplitude patterns) may be distinguishable by LDA_FD (different frequency recruitment) or LDA_CC (different co-activation pattern). The meta-LDA learns which specialist to trust for each gesture boundary.
The int8 MLP fallback handles the residual nonlinear cases: gesture pairs where the decision boundary is curved in feature space, which LDA (linear boundary only) cannot resolve.
PART III — GESTURE EXTENSIBILITY
8. What Changes When Adding or Removing a Gesture
The system is designed for extensibility. Adding a gesture requires a retrain plus a handful of small firmware edits: one enum value, one name string, a strcmp mapping, a switch case, and the actuation function.
What Changes Automatically (No Manual Code Edits)
| Component | How it adapts |
|---|---|
| `MODEL_NUM_CLASSES` in `model_weights.h` | Auto-computed from training data label count |
| LDA weight array dimensions | `[MODEL_NUM_CLASSES][MODEL_NUM_FEATURES]` — regenerated by `export_to_header()` |
| `MODEL_CLASS_NAMES` array | Regenerated by `export_to_header()` |
| All ensemble LDA weight arrays | Regenerated by `export_ensemble_header()` (Change 7) |
| int8 MLP output layer | Retrained with new class count; re-exported to TFLite |
| Meta-LDA input/output dims | `META_NUM_INPUTS = 3 × MODEL_NUM_CLASSES` — auto from Python |
What Requires Manual Code Changes
Python side (learning_data_collection.py):
# 1. Add gesture name to the gesture list (1 line)
# Find where GESTURES or similar list is defined (near constants block ~line 49)
GESTURES = ['fist', 'hook_em', 'open', 'rest', 'thumbs_up', 'wrist_flex'] # example
Firmware — config.h (1 line per gesture):
// Add enum value
typedef enum {
GESTURE_NONE = 0,
GESTURE_REST = 1,
GESTURE_FIST = 2,
GESTURE_OPEN = 3,
GESTURE_HOOK_EM = 4,
GESTURE_THUMBS_UP = 5,
GESTURE_WRIST_FLEX = 6, // ← add this line
} gesture_t;
Firmware — inference.c inference_get_gesture_enum() (2–3 lines per gesture):
if (strcmp(name, "wrist_flex") == 0 || strcmp(name, "WRIST_FLEX") == 0)
return GESTURE_WRIST_FLEX;
Firmware — gestures.c (2 changes — these are easy to miss):
// 1. Add to gesture_names[] static array — index MUST match gesture_t enum value:
static const char *gesture_names[GESTURE_COUNT] = {
"NONE", // GESTURE_NONE = 0
"REST", // GESTURE_REST = 1
"FIST", // GESTURE_FIST = 2
"OPEN", // GESTURE_OPEN = 3
"HOOK_EM", // GESTURE_HOOK_EM = 4
"THUMBS_UP", // GESTURE_THUMBS_UP = 5
"WRIST_FLEX", // GESTURE_WRIST_FLEX = 6 ← add here
};
// 2. Add case to gestures_execute() switch statement:
case GESTURE_WRIST_FLEX:
gesture_wrist_flex(); // implement the actuation function
break;
Critical: GESTURE_COUNT at the end of the gesture_t enum in config.h is used as the
array size for gesture_names[]. It updates automatically when new enum values are added before
it. Both gesture_names[GESTURE_COUNT] and the switch statement must be kept in sync with
GESTURE_COUNT. Mismatch causes a bounds-overrun or silent misclassification.
Complete Workflow for Adding a Gesture
1. Python: add gesture string to GESTURES list in learning_data_collection.py (1 line)
2. Data: collect ≥10 sessions × ≥30 reps of new gesture
(follow Change 2 protocol: vary electrode placement between sessions)
3. Train: python learning_data_collection.py → option 3
OR: python train_ensemble.py (after Change 7 is implemented)
4. Export: export_to_header() OR export_ensemble_header()
→ overwrites model_weights.h / model_weights_ensemble.h with new class count
5. config.h: add enum value before GESTURE_COUNT (1 line):
GESTURE_WRIST_FLEX = 6, // ← insert before GESTURE_COUNT
GESTURE_COUNT // stays last — auto-counts
6. inference.c: add string mapping in inference_get_gesture_enum() (2 lines)
7. gestures.c: add name to gesture_names[] array at correct index (1 line)
8. gestures.c: add case to gestures_execute() switch statement (3 lines)
9. Implement actuation function for new gesture (servo angles)
10. Reflash and validate: pio run -t upload
Exact files touched per new gesture (summary):
| File | What to change |
|---|---|
| `learning_data_collection.py` | Add string to `GESTURES` list |
| `config/config.h` | Add enum value before `GESTURE_COUNT` |
| `core/inference.c` | Add strcmp case in `inference_get_gesture_enum()` |
| `core/gestures.c` | Add to `gesture_names[]` array + add switch case |
| `core/gestures.c` | Implement `gesture_<name>()` function with servo angles |
| `core/model_weights.h` | Auto-generated — do not edit manually |
Removing a Gesture
Removing is the same process in reverse, with one additional step: filter the HDF5 training
data to exclude sessions that contain the removed gesture's label. The simplest approach is
to pass a label whitelist to load_all_for_training():
# Proposed addition to load_all_for_training() — add include_labels parameter
X, y, trial_ids, session_indices, label_names, sessions = \
storage.load_all_for_training(include_labels=['fist', 'open', 'rest', 'thumbs_up'])
# hook_em removed — existing session files are not modified
9. Practical Limits of 3-Channel EMG
This is the most important constraint for gesture count:
| Gesture Count | Expected Accuracy | Notes |
|---|---|---|
| 3–5 gestures | >90% achievable | Current baseline target |
| 6–8 gestures | 80–90% achievable | Requires richer features + ensemble |
| 9–12 gestures | 65–80% achievable | Diminishing returns; some pairs will be confused |
| 13+ gestures | <65% | Surface EMG with 3 channels cannot reliably separate this many |
Why 3 channels limits gesture count: Surface EMG captures the summed electrical activity of many motor units under each electrode. With only 3 spatial locations, gestures that recruit overlapping muscle groups (e.g., all finger-flexion gestures recruit FCR) produce similar signals. The frequency and coordination features from Change 1 help, but there's a hard information-theoretic limit imposed by channel count.
Rule of thumb: aim for ≤8 gestures with the current 3-channel setup. For more, add the bicep channel (ch3, currently excluded) to get 4 channels — see Section 10.
10. Specific Gesture Considerations
Wrist Flexion / Extension
- Feasibility: High — FCR (ch0) activates strongly for flexion; extensor group (ch2) for extension
- Differentiation from finger gestures: frequency content differs (wrist involves slower motor units)
- Recommendation: Add these before wrist rotation — more reliable with surface EMG
Wrist Rotation (Supination / Pronation)
- Feasibility: Medium — the primary supinator is a deep muscle; surface electrodes capture it weakly
- Key helper: the bicep activates strongly during supination → include ch3 (`HAND_CHANNELS = [0, 1, 2, 3]`)
- Code change for 4 channels: Python: `HAND_CHANNELS = [0, 1, 2, 3]`; firmware: `HAND_NUM_CHANNELS` auto-updates from the exported header since `MODEL_NUM_FEATURES` is recalculated
- Caveat: pronation vs. rest may be harder to distinguish than supination vs. rest
Pinch / Precision Grasp
- Feasibility: Medium — involves intrinsic hand muscles poorly captured by forearm electrodes
- Likely confused with open hand depending on electrode placement
- Collect with careful placement; validate cross-session accuracy before relying on it
Including ch3 (Bicep) for Wrist Gestures
To include the bicep channel in the hand gesture classifier:
# learning_data_collection.py — change this constant
HAND_CHANNELS = [0, 1, 2, 3] # was [0, 1, 2] — add bicep channel
Feature count: 4 channels × 20 per-channel features + 10 cross-channel covariances + 6 correlations = 96 total features. The ensemble architecture handles this automatically — specialist LDA weight dimensions recalculate at training time.
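The feature-count arithmetic generalizes to any channel count. A quick sketch of the formula (constants follow the Change 1 layout above):

```python
def total_features(n_ch: int, per_ch: int = 20) -> int:
    """Per-channel features + unique covariance entries (upper triangle,
    including the diagonal) + unique correlation pairs."""
    cov = n_ch * (n_ch + 1) // 2   # covariance entries, e.g. 6 for 3 channels
    cor = n_ch * (n_ch - 1) // 2   # correlation pairs, e.g. 3 for 3 channels
    return n_ch * per_ch + cov + cor

assert total_features(3) == 69   # current 3-channel layout
assert total_features(4) == 96   # with bicep channel ch3 included
```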
PART IV — CHANGE REFERENCE
11. Change Classification Matrix
| Change | Category | Priority | Files | ESP32 Reflash? | Retrain? | Risk |
|---|---|---|---|---|---|---|
| C | Firmware | Tier 1 | inference.c | ✓ | No | Very Low |
| B | Firmware | Tier 1 | inference.c / filter.c | ✓ | No | Low |
| A | Firmware | Tier 1 | adc_sampling.c | ✓ | No | Medium |
| 0 | Python | Tier 1 | learning_data_collection.py | No | ✓ | Low |
| 1 | Python+C | Tier 2 | learning_data_collection.py + inference.c | ✓ after | ✓ | Medium |
| D | Firmware | Tier 2 | calibration.c/.h | ✓ | No | Medium |
| 2 | Protocol | Tier 2 | None | No | ✓ new data | None |
| 3 | Python | Tier 2 | learning_data_collection.py | No | ✓ | Low |
| E | Python+FW | Tier 3 | train_mlp_tflite.py + firmware | ✓ | ✓ | High |
| 4 | Python+C | Tier 3 | learning_data_collection.py + inference.c | ✓ if enabled | ✓ | Low |
| 5 | Python | Tier 3 | learning_data_collection.py | No | No | None |
| 6 | Python | Tier 3 | learning_data_collection.py | No | ✓ | Low |
| 7 | Python | Tier 3 | new: train_ensemble.py | No | ✓ | Medium |
| F | Firmware | Tier 3 | new: inference_ensemble.c | ✓ | No (needs 7 first) | Medium |
Recommended implementation order: C → B → A → 0 → 1 → D → 2 → 3 → 5 (benchmark) → 7+F → E
PART V — FIRMWARE CHANGES
Change A — DMA-Driven ADC Sampling (Migration from adc_oneshot to adc_continuous)
Priority: Tier 1
Current driver: adc_oneshot_read() polling in drivers/emg_sensor.c. Timing is
controlled by vTaskDelay(1) in run_inference_loop() — subject to FreeRTOS scheduler
jitter of ±0.5–1ms, which corrupts frequency-domain features and ADC burst grouping.
Why: adc_continuous runs entirely in hardware DMA. Sample-to-sample jitter drops from
±1ms to <10µs. CPU overhead between samples is zero. Required for frequency features (Change 1).
Effort: 2–4 hours (replace emg_sensor_read() internals; keep public API the same)
ESP-IDF ADC Continuous API
// --- Initialize (call once at startup) ---
adc_continuous_handle_t adc_handle = NULL;
adc_continuous_handle_cfg_t adc_cfg = {
.max_store_buf_size = 4096, // PSRAM ring buffer size (bytes)
.conv_frame_size = 256, // bytes per conversion frame
};
adc_continuous_new_handle(&adc_cfg, &adc_handle);
// Actual hardware channel mapping (from emg_sensor.c):
// ch0 = ADC_CHANNEL_1 / GPIO 2 (Forearm Belly / FCR)
// ch1 = ADC_CHANNEL_2 / GPIO 3 (Forearm Extensors)
// ch2 = ADC_CHANNEL_8 / GPIO 9 (Forearm Contractors / FCU)
// ch3 = ADC_CHANNEL_9 / GPIO 10 (Bicep — independent subsystem)
adc_digi_pattern_config_t chan_cfg[4] = {
{.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_1, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
{.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_2, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
{.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_8, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
{.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_9, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
};
adc_continuous_config_t cont_cfg = {
.sample_freq_hz = 4000, // 4 channels × 1000 Hz = 4000 total samples/sec
.conv_mode = ADC_CONV_SINGLE_UNIT_1,
.format = ADC_DIGI_OUTPUT_FORMAT_TYPE2,
.pattern_num = 4,
.adc_pattern = chan_cfg,
};
adc_continuous_config(adc_handle, &cont_cfg);
// --- ISR callback (fires each frame) ---
static SemaphoreHandle_t s_adc_sem;
static bool IRAM_ATTR adc_conv_done_cb(
adc_continuous_handle_t handle,
const adc_continuous_evt_data_t *edata, void *user_data) {
BaseType_t hp_woken = pdFALSE;
xSemaphoreGiveFromISR(s_adc_sem, &hp_woken);
return hp_woken == pdTRUE;
}
adc_continuous_evt_cbs_t cbs = { .on_conv_done = adc_conv_done_cb };
adc_continuous_register_event_callbacks(adc_handle, &cbs, NULL);
adc_continuous_start(adc_handle);
// --- ADC calibration (apply per sample) ---
adc_cali_handle_t cali_handle;
adc_cali_curve_fitting_config_t cali_cfg = {
.unit_id = ADC_UNIT_1,
.atten = ADC_ATTEN_DB_12, // matches ADC_ATTEN_DB_12 used in current emg_sensor.c
.bitwidth = ADC_BITWIDTH_12,
};
adc_cali_create_scheme_curve_fitting(&cali_cfg, &cali_handle);
// --- Sampling task (pin to Core 0) ---
void adc_sampling_task(void *arg) {
uint8_t result_buf[256];
uint32_t out_len = 0;
while (1) {
xSemaphoreTake(s_adc_sem, portMAX_DELAY);
adc_continuous_read(adc_handle, result_buf, sizeof(result_buf), &out_len, 0);
// Parse: each entry is adc_digi_output_data_t
// Apply adc_cali_raw_to_voltage() for each sample
// Apply IIR filter (Change B) → post to inference ring buffer
}
}
Verify: log consecutive sample timestamps via esp_timer_get_time(); spacing should be 1.0ms ± 0.05ms.
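The verification step can be scripted on the laptop side. A minimal sketch, assuming `timestamps_us` holds `esp_timer_get_time()` values (microseconds) captured over serial — the function name is illustrative:

```python
def check_sample_spacing(timestamps_us, target_us=1000.0, tol_us=50.0):
    """Compute inter-sample gaps and check them against the
    1.0 ms ± 0.05 ms criterion. Returns (mean gap, worst deviation, pass)."""
    gaps = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    mean = sum(gaps) / len(gaps)
    worst = max(abs(g - target_us) for g in gaps)
    return mean, worst, worst <= tol_us

# Example: a DMA-paced log with at most 2 µs of deviation passes easily
mean, worst, ok = check_sample_spacing([0, 1000, 2001, 2999, 4000])
# mean == 1000.0, worst == 2, ok is True
```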
Change B — IIR Biquad Bandpass Filter
Priority: Tier 1
Why: MyoWare analogue filters are not tunable. A software IIR bandpass removes sub-20 Hz motion artifact and high-frequency noise above the EMG band — both of which inflate ZC, WL, and other features computed at rest. (Note that 50/60 Hz powerline interference sits inside the EMG passband, so this bandpass does not remove it; add a dedicated notch section if mains hum proves problematic.)
Effort: 2 hours
Step 1 — Compute Coefficients in Python (one-time, offline)
from scipy.signal import butter
import numpy as np
fs = 1000.0
# NOTE: the upper band edge must be strictly below Nyquist (fs/2 = 500 Hz),
# or scipy raises ValueError — use 450 Hz for headroom.
sos = butter(N=2, Wn=[20.0, 450.0], btype='bandpass', fs=fs, output='sos')
# sos[i] = [b0, b1, b2, a0, a1, a2]
# esp-dsp Direct Form II convention: coeffs = [b0, b1, b2, -a1, -a2]
for i, s in enumerate(sos):
    b0, b1, b2, a0, a1, a2 = s
    print(f"Section {i}: {b0:.8f}f, {b1:.8f}f, {b2:.8f}f, {-a1:.8f}f, {-a2:.8f}f")
# Run this and paste the printed values into the C constants below
Step 2 — Add to inference.c (after includes, before // --- State ---)
#include "dsps_biquad.h"
// 2nd-order Butterworth bandpass 20–450 Hz @ 1000 Hz (upper edge kept below Nyquist)
// Coefficients: [b0, b1, b2, -a1, -a2] — Direct Form II, esp-dsp sign convention
// Regenerate with: scipy.signal.butter(N=2, Wn=[20,450], btype='bandpass', fs=1000, output='sos')
static const float BIQUAD_HP_COEFFS[5] = { /* paste section 0 output here */ };
static const float BIQUAD_LP_COEFFS[5] = { /* paste section 1 output here */ };
// Filter delay state: 3 channels × 2 stages × 2 delay elements = 12 floats (48 bytes)
static float biquad_hp_w[HAND_NUM_CHANNELS][2];
static float biquad_lp_w[HAND_NUM_CHANNELS][2];
Add to inference_init():
memset(biquad_hp_w, 0, sizeof(biquad_hp_w));
memset(biquad_lp_w, 0, sizeof(biquad_lp_w));
Step 3 — Apply Per Sample (called before writing to window_buffer)
// Apply to each channel before posting to the window buffer.
// Must be called IN ORDER for each sample (IIR has memory across calls).
static float IRAM_ATTR apply_bandpass(int ch, float raw) {
float hp_out, lp_out;
dsps_biquad_f32(&raw, &hp_out, 1, (float *)BIQUAD_HP_COEFFS, biquad_hp_w[ch]);
dsps_biquad_f32(&hp_out, &lp_out, 1, (float *)BIQUAD_LP_COEFFS, biquad_lp_w[ch]);
return lp_out;
}
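For offline validation of the filter path, the per-sample stateful update that `apply_bandpass()` performs can be mirrored in pure Python and compared against `scipy.signal.sosfilt`. A sketch using the textbook Direct Form II update with scipy's sign convention (a1/a2 subtracted — the esp-dsp coefficient array stores them pre-negated, per the convention above):

```python
def biquad_df2(x, b0, b1, b2, a1, a2, w):
    """One Direct Form II biquad step (scipy convention: a1/a2 subtracted).
    `w` is the 2-element delay state, mutated in place — calls must be made
    in sample order, exactly like the C apply_bandpass() helper."""
    w0 = x - a1 * w[0] - a2 * w[1]
    y = b0 * w0 + b1 * w[0] + b2 * w[1]
    w[1] = w[0]
    w[0] = w0
    return y

# Simple smoother y[n] = 0.5*x[n] + 0.5*y[n-1]  (b = [0.5, 0, 0], a1 = -0.5)
state = [0.0, 0.0]
out = [biquad_df2(x, 0.5, 0.0, 0.0, -0.5, 0.0, state) for x in [1.0, 0.0, 0.0]]
# out == [0.5, 0.25, 0.125] — impulse decays geometrically as expected
```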
Note: window_buffer stores uint16_t — change to float when adding this filter, so
filtered values are stored directly without lossy integer round-trip.
Verify: log ZC count at rest before and after — filtered ZC should be substantially lower (less spurious noise crossings).
Change C — Confidence Rejection
Priority: Tier 1 — implement this first, lowest risk of all changes
Why: Without a rejection threshold, ambiguous EMG (rest-to-gesture transition, mid-gesture fatigue, electrode lift) always produces a false actuation.
Effort: 15 minutes
Step 1 — Add Constant (top of inference.c with other constants)
#define CONFIDENCE_THRESHOLD 0.40f // Reject when max smoothed prob < this.
// Meta paper uses 0.35; 0.40 adds prosthetic safety margin.
// Tune: lower to 0.35 if real gestures are being rejected.
Step 2 — Insert After EMA Block in inference_predict() (after line 214)
// Confidence rejection: if the peak smoothed probability is below threshold,
// hold the last confirmed output rather than outputting an uncertain prediction.
// Prevents false actuations during gesture transitions and electrode artifacts.
if (max_smoothed_prob < CONFIDENCE_THRESHOLD) {
*confidence = max_smoothed_prob;
return current_output; // -1 (GESTURE_NONE) until first confident prediction
}
Verify: arm at complete rest → confirm output stays at GESTURE_NONE and confidence logs below 0.40. Deliberate fist → confidence rises above 0.40 within 1–3 inference cycles.
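The hold-last-output behaviour is easy to sanity-check off-device. A small pure-Python sketch of the same rule (names illustrative):

```python
CONFIDENCE_THRESHOLD = 0.40

def reject_filter(stream, threshold=CONFIDENCE_THRESHOLD):
    """Mirror of the Change C rule: latch a new prediction only when its
    confidence clears the threshold; otherwise hold the last confirmed one
    (-1 == GESTURE_NONE before any confident prediction)."""
    current = -1
    out = []
    for pred, conf in stream:
        if conf >= threshold:
            current = pred
        out.append(current)
    return out

# transition noise (conf 0.30) is held; a confident fist (0.70) latches
assert reject_filter([(2, 0.30), (2, 0.70), (0, 0.35), (0, 0.55)]) == [-1, 2, 2, 0]
```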
Change D — On-Device NVS Calibration
Priority: Tier 2
Why: Python CalibrationTransform only runs during training. On-device NVS calibration
lets the ESP32 recalibrate z-score normalization at startup (3 seconds of REST) without
retraining — solving placement drift and day-to-day impedance variation.
Effort: 3–4 hours
New Files
EMG_Arm/src/core/calibration.h
EMG_Arm/src/core/calibration.c
calibration.h
#pragma once
#include <stdbool.h>
#include "config/config.h"
#define CALIB_MAX_FEATURES 96 // supports up to 4-channel expansion
bool calibration_init(void); // load from NVS at startup
void calibration_apply(float *feat); // z-score in-place; no-op if not calibrated
bool calibration_update(const float X[][CALIB_MAX_FEATURES], int n_windows, int n_feat);
void calibration_reset(void);
bool calibration_is_valid(void);
calibration.c
#include "calibration.h"
#include "nvs_flash.h"
#include "nvs.h"
#include <math.h>
#include <string.h>
#include <stdio.h>
#define NVS_NAMESPACE "emg_calib"
#define NVS_KEY_MEAN "feat_mean"
#define NVS_KEY_STD "feat_std"
#define NVS_KEY_NFEAT "n_feat"
#define NVS_KEY_VALID "calib_ok"
static float s_mean[CALIB_MAX_FEATURES];
static float s_std[CALIB_MAX_FEATURES];
static int s_n_feat = 0;
static bool s_valid = false;
bool calibration_init(void) {
esp_err_t err = nvs_flash_init();
if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
nvs_flash_erase();
nvs_flash_init();
}
nvs_handle_t h;
if (nvs_open(NVS_NAMESPACE, NVS_READONLY, &h) != ESP_OK) return false;
uint8_t valid = 0;
size_t mean_sz = sizeof(s_mean), std_sz = sizeof(s_std);
bool ok = (nvs_get_u8(h, NVS_KEY_VALID, &valid) == ESP_OK) && (valid == 1) &&
(nvs_get_i32(h, NVS_KEY_NFEAT, (int32_t*)&s_n_feat) == ESP_OK) &&
(nvs_get_blob(h, NVS_KEY_MEAN, s_mean, &mean_sz) == ESP_OK) &&
(nvs_get_blob(h, NVS_KEY_STD, s_std, &std_sz) == ESP_OK);
nvs_close(h);
s_valid = ok;
printf("[Calib] %s (%d features)\n", ok ? "Loaded from NVS" : "Not found — identity", s_n_feat);
return ok;
}
void calibration_apply(float *feat) {
if (!s_valid) return;
for (int i = 0; i < s_n_feat; i++)
feat[i] = (feat[i] - s_mean[i]) / s_std[i];
}
bool calibration_update(const float X[][CALIB_MAX_FEATURES], int n_windows, int n_feat) {
if (n_windows < 10 || n_feat > CALIB_MAX_FEATURES) return false;
s_n_feat = n_feat;
memset(s_mean, 0, sizeof(s_mean));
for (int w = 0; w < n_windows; w++)
for (int f = 0; f < n_feat; f++)
s_mean[f] += X[w][f];
for (int f = 0; f < n_feat; f++) s_mean[f] /= n_windows;
memset(s_std, 0, sizeof(s_std));
for (int w = 0; w < n_windows; w++)
for (int f = 0; f < n_feat; f++) {
float d = X[w][f] - s_mean[f];
s_std[f] += d * d;
}
for (int f = 0; f < n_feat; f++) {
s_std[f] = sqrtf(s_std[f] / n_windows);
if (s_std[f] < 1e-6f) s_std[f] = 1e-6f;
}
nvs_handle_t h;
if (nvs_open(NVS_NAMESPACE, NVS_READWRITE, &h) != ESP_OK) return false;
nvs_set_blob(h, NVS_KEY_MEAN, s_mean, sizeof(s_mean));
nvs_set_blob(h, NVS_KEY_STD, s_std, sizeof(s_std));
nvs_set_i32(h, NVS_KEY_NFEAT, n_feat);
nvs_set_u8(h, NVS_KEY_VALID, 1);
nvs_commit(h);
nvs_close(h);
s_valid = true;
printf("[Calib] Updated from %d REST windows, %d features\n", n_windows, n_feat);
return true;
}
Integration in inference.c
In inference_predict(), after compute_features(features), before LDA:
calibration_apply(features); // z-score using NVS-stored mean/std
Startup Flow
// In main application startup sequence:
calibration_init(); // load from NVS; no-op if not present yet
// When user triggers recalibration (button press or serial command):
// Collect ~120 REST windows (~3 seconds at 25ms hop)
// Call calibration_update(rest_feature_buffer, 120, MODEL_NUM_FEATURES)
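The statistics in `calibration_update()` can be cross-checked against a Python mirror before trusting on-device values. A sketch with the same population-std formula and `1e-6` floor:

```python
import math

def fit_zscore(windows):
    """Mirror of calibration_update(): per-feature mean and population std
    over REST windows, with the same 1e-6 floor on std."""
    n = len(windows)
    n_feat = len(windows[0])
    mean = [sum(w[f] for w in windows) / n for f in range(n_feat)]
    std = [max(math.sqrt(sum((w[f] - mean[f]) ** 2 for w in windows) / n), 1e-6)
           for f in range(n_feat)]
    return mean, std

def apply_zscore(feat, mean, std):
    """Mirror of calibration_apply(): elementwise z-score."""
    return [(x - m) / s for x, m, s in zip(feat, mean, std)]

mean, std = fit_zscore([[1.0, 10.0], [3.0, 10.0]])
# mean == [2.0, 10.0]; std == [1.0, 1e-6] (constant feature hits the floor)
assert apply_zscore([3.0, 10.0], mean, std)[0] == 1.0
```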
Change E — int8 MLP via TFLite Micro
Priority: Tier 3 — implement after Tier 1+2 changes and the benchmark (Change 5) shows LDA plateauing
Why: LDA finds only linear decision boundaries. A 2-layer int8 MLP adds nonlinear boundaries for gesture pairs that overlap in feature space.
Effort: 4–6 hours
Python Training (new file: train_mlp_tflite.py)
"""
Train int8 MLP for ESP32-S3 deployment via TFLite Micro.
Run AFTER Change 0 (label shift) + Change 1 (expanded features).
"""
import numpy as np
import tensorflow as tf
from pathlib import Path
import sys
sys.path.insert(0, str(Path(__file__).parent))
from learning_data_collection import SessionStorage, EMGFeatureExtractor, HAND_CHANNELS
storage = SessionStorage()
X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()
extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
X = extractor.extract_features_batch(X_raw).astype(np.float32)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
n_feat, n_cls = X.shape[1], len(np.unique(y))
model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(n_feat,)),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(16, activation='relu'),
tf.keras.layers.Dense(n_cls, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=150, batch_size=64, validation_split=0.1, verbose=1)
def representative_dataset():
for i in range(0, len(X), 10):
yield [X[i:i+1]]
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
out = Path('EMG_Arm/src/core/emg_model_data.cc')
with open(out, 'w') as f:
f.write('#include "emg_model_data.h"\n')
f.write(f'const int g_model_len = {len(tflite_model)};\n')
f.write('const unsigned char g_model[] = {\n ')
f.write(', '.join(f'0x{b:02x}' for b in tflite_model))
f.write('\n};\n')
print(f"Wrote {out} ({len(tflite_model)} bytes)")
Firmware (inference_mlp.cc)
#include "inference_mlp.h"
#include "emg_model_data.h"
#include "model_weights.h"   // MODEL_NUM_CLASSES
#include <math.h>            // roundf
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"
static uint8_t tensor_arena[48 * 1024]; // 48 KB — tune down if memory is tight
static tflite::MicroInterpreter *interpreter = nullptr;
static TfLiteTensor *input = nullptr, *output = nullptr;
void inference_mlp_init(void) {
const tflite::Model *model = tflite::GetModel(g_model);
static tflite::MicroMutableOpResolver<4> resolver;
resolver.AddFullyConnected();
resolver.AddRelu();
resolver.AddSoftmax();
resolver.AddDequantize();
static tflite::MicroInterpreter interp(model, resolver, tensor_arena, sizeof(tensor_arena));
interpreter = &interp;
interpreter->AllocateTensors();
input = interpreter->input(0);
output = interpreter->output(0);
}
int inference_mlp_predict(const float *features, int n_feat, float *conf_out) {
float iscale = input->params.scale;
int izp = input->params.zero_point;
for (int i = 0; i < n_feat; i++) {
int q = (int)roundf(features[i] / iscale) + izp;
input->data.int8[i] = (int8_t)(q < -128 ? -128 : q > 127 ? 127 : q);
}
interpreter->Invoke();
float oscale = output->params.scale;
int ozp = output->params.zero_point;
float max_p = -1e9f;
int max_c = 0;
for (int c = 0; c < MODEL_NUM_CLASSES; c++) {
float p = (output->data.int8[c] - ozp) * oscale;
if (p > max_p) { max_p = p; max_c = c; }
}
*conf_out = max_p;
return max_c;
}
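The quantize/clamp arithmetic in `inference_mlp_predict()` can be verified offline. A minimal Python mirror (the scale/zero-point values here are illustrative, not from a real model):

```python
def quantize_int8(x, scale, zero_point):
    """Mirror of the firmware input quantization: scale, shift, clamp to int8."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize_int8(q, scale, zero_point):
    """Mirror of the firmware output dequantization."""
    return (q - zero_point) * scale

scale, zp = 0.05, -10                   # illustrative quantization parameters
q = quantize_int8(1.0, scale, zp)       # round(20) - 10 = 10
x = dequantize_int8(q, scale, zp)       # (10 + 10) * 0.05 ≈ 1.0
assert q == 10 and abs(x - 1.0) < 1e-9
assert quantize_int8(100.0, scale, zp) == 127   # saturates at int8 max
```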
platformio.ini addition:
lib_deps =
tensorflow/tflite-micro
Change F — Ensemble Inference Pipeline
Priority: Tier 3 (requires Change 1 features + Change 7 training + Change E MLP)
Why: This is the full recommended architecture from Part II.
Effort: 3–4 hours firmware (after the Python ensemble is trained and exported)
New Files
EMG_Arm/src/core/inference_ensemble.c
EMG_Arm/src/core/inference_ensemble.h
EMG_Arm/src/core/model_weights_ensemble.h (generated by Change 7 Python script)
inference_ensemble.h
#pragma once
#include <stdbool.h>
void inference_ensemble_init(void);
int inference_ensemble_predict(float *confidence);
inference_ensemble.c
#include "inference_ensemble.h"
#include "inference.h"      // for compute_features()
#include "calibration.h"    // for calibration_apply() (Change D)
#include "inference_mlp.h" // for inference_mlp_predict()
#include "model_weights_ensemble.h"
#include "config/config.h"
#include "dsps_dotprod.h"
#include <math.h>
#include <string.h>
#include <stdio.h>
#define ENSEMBLE_EMA_ALPHA 0.70f
#define ENSEMBLE_CONF_THRESHOLD 0.50f // below this: escalate to MLP fallback
#define REJECT_THRESHOLD 0.40f // below this even after MLP: hold output
#define REST_ACTIVITY_THRESHOLD 0.05f // total_rms below this → skip inference, return REST
// EMA state
static float s_smoothed[MODEL_NUM_CLASSES];
// Vote + debounce (reuse existing pattern from inference.c)
static int s_vote_history[5];
static int s_vote_head = 0;
static int s_current_output = -1;
static int s_pending_output = -1;
static int s_pending_count = 0;
// --- Generic LDA softmax predict ---
// weights: [n_classes][n_feat], intercepts: [n_classes]
// proba_out: [n_classes] — caller-provided output
static void lda_softmax(const float *feat, int n_feat,
const float *weights_flat, const float *intercepts,
int n_classes, float *proba_out) {
float raw[MODEL_NUM_CLASSES];
float max_raw = -1e9f, sum_exp = 0.0f;
for (int c = 0; c < n_classes; c++) {
raw[c] = intercepts[c];
// dsps_dotprod_f32 requires 4-byte aligned arrays and length multiple of 4;
// for safety use plain loop — compiler will auto-vectorize with -O2
const float *w = weights_flat + c * n_feat;
for (int f = 0; f < n_feat; f++) raw[c] += feat[f] * w[f];
if (raw[c] > max_raw) max_raw = raw[c];
}
for (int c = 0; c < n_classes; c++) {
proba_out[c] = expf(raw[c] - max_raw);
sum_exp += proba_out[c];
}
for (int c = 0; c < n_classes; c++) proba_out[c] /= sum_exp;
}
void inference_ensemble_init(void) {
for (int c = 0; c < MODEL_NUM_CLASSES; c++)
s_smoothed[c] = 1.0f / MODEL_NUM_CLASSES;
for (int i = 0; i < 5; i++) s_vote_history[i] = -1;
s_vote_head = 0;
s_current_output = -1;
s_pending_output = -1;
s_pending_count = 0;
}
int inference_ensemble_predict(float *confidence) {
// 1. Extract features (shared with single-model path)
float features[MODEL_NUM_FEATURES];
compute_features(features);
calibration_apply(features);
// 2. Activity gate — skip inference during obvious REST
float total_rms_sq = 0.0f;
for (int ch = 0; ch < HAND_NUM_CHANNELS; ch++) {
float r = features[ch * ENSEMBLE_PER_CH_FEATURES]; // RMS is index 0 per channel
total_rms_sq += r * r;
}
if (sqrtf(total_rms_sq) < REST_ACTIVITY_THRESHOLD) {
*confidence = 1.0f;
return GESTURE_REST;
}
// 3. Specialist LDAs
float prob_td[MODEL_NUM_CLASSES];
float prob_fd[MODEL_NUM_CLASSES];
float prob_cc[MODEL_NUM_CLASSES];
lda_softmax(features + TD_FEAT_OFFSET, TD_NUM_FEATURES,
(const float *)LDA_TD_WEIGHTS, LDA_TD_INTERCEPTS,
MODEL_NUM_CLASSES, prob_td);
lda_softmax(features + FD_FEAT_OFFSET, FD_NUM_FEATURES,
(const float *)LDA_FD_WEIGHTS, LDA_FD_INTERCEPTS,
MODEL_NUM_CLASSES, prob_fd);
lda_softmax(features + CC_FEAT_OFFSET, CC_NUM_FEATURES,
(const float *)LDA_CC_WEIGHTS, LDA_CC_INTERCEPTS,
MODEL_NUM_CLASSES, prob_cc);
// 4. Meta-LDA stacker
float meta_in[META_NUM_INPUTS]; // = 3 * MODEL_NUM_CLASSES
memcpy(meta_in, prob_td, MODEL_NUM_CLASSES * sizeof(float));
memcpy(meta_in + MODEL_NUM_CLASSES, prob_fd, MODEL_NUM_CLASSES * sizeof(float));
memcpy(meta_in + 2*MODEL_NUM_CLASSES, prob_cc, MODEL_NUM_CLASSES * sizeof(float));
float meta_probs[MODEL_NUM_CLASSES];
lda_softmax(meta_in, META_NUM_INPUTS,
(const float *)META_LDA_WEIGHTS, META_LDA_INTERCEPTS,
MODEL_NUM_CLASSES, meta_probs);
// 5. EMA smoothing on meta output
float max_smooth = 0.0f;
int winner = 0;
for (int c = 0; c < MODEL_NUM_CLASSES; c++) {
s_smoothed[c] = ENSEMBLE_EMA_ALPHA * s_smoothed[c] +
(1.0f - ENSEMBLE_EMA_ALPHA) * meta_probs[c];
if (s_smoothed[c] > max_smooth) { max_smooth = s_smoothed[c]; winner = c; }
}
// 6. Confidence cascade: escalate to MLP if meta-LDA is uncertain
if (max_smooth < ENSEMBLE_CONF_THRESHOLD) {
float mlp_conf = 0.0f;
int mlp_winner = inference_mlp_predict(features, MODEL_NUM_FEATURES, &mlp_conf);
if (mlp_conf > max_smooth) { winner = mlp_winner; max_smooth = mlp_conf; }
}
// 7. Reject if still uncertain
if (max_smooth < REJECT_THRESHOLD) {
*confidence = max_smooth;
return s_current_output;
}
*confidence = max_smooth;
// 8. Majority vote (window = 5)
s_vote_history[s_vote_head] = winner;
s_vote_head = (s_vote_head + 1) % 5;
int counts[MODEL_NUM_CLASSES] = {0};
for (int i = 0; i < 5; i++)
if (s_vote_history[i] >= 0) counts[s_vote_history[i]]++;
int majority = 0, majority_cnt = 0;
for (int c = 0; c < MODEL_NUM_CLASSES; c++)
if (counts[c] > majority_cnt) { majority_cnt = counts[c]; majority = c; }
// 9. Debounce (3 consecutive predictions to change output)
int final = s_current_output;
if (s_current_output == -1) {
s_current_output = majority; final = majority;
} else if (majority == s_current_output) {
s_pending_output = majority; s_pending_count = 1;
} else if (majority == s_pending_output) {
if (++s_pending_count >= 3) { s_current_output = majority; final = majority; }
} else {
s_pending_output = majority; s_pending_count = 1;
}
return final;
}
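For off-device unit testing, the vote and debounce stages (steps 8–9) can be mirrored in Python. A sketch, not the C itself; tie-breaking differs trivially (the C prefers the lowest class index, `max` here prefers insertion order):

```python
class VoteDebounce:
    """Python mirror of the 5-window majority vote + 3-count debounce."""
    def __init__(self, window=5, debounce=3):
        self.history = [-1] * window
        self.head = 0
        self.current = -1
        self.pending = -1
        self.pending_count = 0
        self.debounce = debounce

    def update(self, winner):
        # Step 8: circular vote history + majority count
        self.history[self.head] = winner
        self.head = (self.head + 1) % len(self.history)
        counts = {}
        for v in self.history:
            if v >= 0:
                counts[v] = counts.get(v, 0) + 1
        majority = max(counts, key=counts.get)
        # Step 9: require `debounce` consecutive majorities to change output
        if self.current == -1:
            self.current = majority
        elif majority == self.current:
            self.pending, self.pending_count = majority, 1
        elif majority == self.pending:
            self.pending_count += 1
            if self.pending_count >= self.debounce:
                self.current = majority
        else:
            self.pending, self.pending_count = majority, 1
        return self.current

vd = VoteDebounce()
out = [vd.update(w) for w in [0] * 6 + [1] * 10]
print(out == [0] * 10 + [1] * 6)  # True
```

Note the implied latency: after the input flips from class 0 to class 1, the output changes only on the 5th new sample (vote window must tip, then debounce must count up).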
model_weights_ensemble.h Layout (generated by Change 7)
// Auto-generated by train_ensemble.py — do not edit manually
#pragma once
#define MODEL_NUM_CLASSES 5 // auto-computed from training data
#define MODEL_NUM_FEATURES 69 // total feature count (after Change 1)
#define ENSEMBLE_PER_CH_FEATURES 20 // features per channel
// Specialist feature subset offsets and sizes
#define TD_FEAT_OFFSET 0
#define TD_NUM_FEATURES 36 // time-domain: indices 0–11, 20–31, 40–51
#define FD_FEAT_OFFSET 12 // NOTE: FD features are interleaved per-channel
#define FD_NUM_FEATURES 24 // freq-domain: indices 12–19, 32–39, 52–59
#define CC_FEAT_OFFSET 60
#define CC_NUM_FEATURES 9 // cross-channel: indices 60–68
#define META_NUM_INPUTS (3 * MODEL_NUM_CLASSES) // = 15
// Specialist LDA weights (flat row-major: [n_classes][n_feat])
extern const float LDA_TD_WEIGHTS[MODEL_NUM_CLASSES][TD_NUM_FEATURES];
extern const float LDA_TD_INTERCEPTS[MODEL_NUM_CLASSES];
extern const float LDA_FD_WEIGHTS[MODEL_NUM_CLASSES][FD_NUM_FEATURES];
extern const float LDA_FD_INTERCEPTS[MODEL_NUM_CLASSES];
extern const float LDA_CC_WEIGHTS[MODEL_NUM_CLASSES][CC_NUM_FEATURES];
extern const float LDA_CC_INTERCEPTS[MODEL_NUM_CLASSES];
// Meta-LDA weights
extern const float META_LDA_WEIGHTS[MODEL_NUM_CLASSES][META_NUM_INPUTS];
extern const float META_LDA_INTERCEPTS[MODEL_NUM_CLASSES];
// Class names (for inference_get_gesture_enum)
extern const char *MODEL_CLASS_NAMES[MODEL_NUM_CLASSES];
Important note on FD features: the frequency-domain features are interleaved at indices
[12–19] for ch0, [32–39] for ch1, [52–59] for ch2. The lda_softmax call for LDA_FD must
pass a gathered (non-contiguous) sub-vector. The cleanest approach is to gather them into
a contiguous buffer before calling lda_softmax:
// Gather FD features into contiguous buffer before LDA_FD
float fd_buf[FD_NUM_FEATURES];
for (int ch = 0; ch < HAND_NUM_CHANNELS; ch++)
memcpy(fd_buf + ch*8, features + ch*20 + 12, 8 * sizeof(float));
lda_softmax(fd_buf, FD_NUM_FEATURES, ...);
Similarly for TD features. This gather costs <5 µs — negligible.
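The interleaving can be verified numerically: gathering 12 TD and 8 FD values per channel from base `ch*20` reproduces exactly the documented index sets.

```python
# Sanity check of the interleaved layout (3 channels, 20 features/channel).
per_ch, n_ch = 20, 3
td_idx = [ch * per_ch + k for ch in range(n_ch) for k in range(12)]
fd_idx = [ch * per_ch + 12 + k for ch in range(n_ch) for k in range(8)]
assert td_idx == list(range(0, 12)) + list(range(20, 32)) + list(range(40, 52))
assert fd_idx == list(range(12, 20)) + list(range(32, 40)) + list(range(52, 60))
print(len(td_idx), len(fd_idx))  # 36 24
```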
PART VI — PYTHON/TRAINING CHANGES
Change 0 — Forward Label Shift
Priority: Tier 1 Source: Meta Nature 2025, Methods: "Discrete-gesture time alignment" Why: +100ms shift after onset detection gives the classifier 100ms of pre-event "building" signal, dramatically cleaning the decision boundary near gesture onset. ESP32 impact: None.
Step 1 — Add Constant After Line 94
# After: TRANSITION_END_MS = 150
LABEL_FORWARD_SHIFT_MS = 100 # shift label boundaries +100ms after onset alignment
# Source: Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
Step 2 — Apply Shift in SessionStorage.save_session() (after line ~704)
Find and insert after:
print(f"[Storage] Labels aligned: {changed}/{len(labels)} windows shifted")
Insert:
if LABEL_FORWARD_SHIFT_MS > 0:
shift_windows = max(1, round(LABEL_FORWARD_SHIFT_MS / HOP_SIZE_MS))
shifted = list(aligned_labels)
for i in range(1, len(aligned_labels)):
if aligned_labels[i] != aligned_labels[i - 1]:
for j in range(i, min(i + shift_windows, len(aligned_labels))):
if shifted[j] == aligned_labels[i]:
shifted[j] = aligned_labels[i - 1]
n_shifted = sum(1 for a, b in zip(aligned_labels, shifted) if a != b)
aligned_labels = shifted
print(f"[Storage] Forward label shift (+{LABEL_FORWARD_SHIFT_MS}ms): {n_shifted} windows adjusted")
Step 3 — Reduce TRANSITION_START_MS
TRANSITION_START_MS = 200 # was 300 — reduce because 100ms shift already adds pre-event context
Verify: printout shows N windows adjusted where N is 5–20% of total windows per session.
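The shift logic in Step 2 can be sanity-checked standalone on a toy label sequence (toy values: HOP_SIZE_MS = 50, so a 100 ms shift moves each boundary by 2 windows):

```python
HOP_SIZE_MS = 50                      # toy value for this sketch
LABEL_FORWARD_SHIFT_MS = 100
aligned_labels = ['rest', 'rest', 'rest', 'fist', 'fist', 'fist', 'fist']

shift_windows = max(1, round(LABEL_FORWARD_SHIFT_MS / HOP_SIZE_MS))
shifted = list(aligned_labels)
for i in range(1, len(aligned_labels)):
    if aligned_labels[i] != aligned_labels[i - 1]:
        # Extend the previous label shift_windows past the transition
        for j in range(i, min(i + shift_windows, len(aligned_labels))):
            if shifted[j] == aligned_labels[i]:
                shifted[j] = aligned_labels[i - 1]

print(shifted)
# ['rest', 'rest', 'rest', 'rest', 'rest', 'fist', 'fist']  (boundary moved 2 windows later)
```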
Change 1 — Expanded Feature Set
Priority: Tier 2
Why: 12 → 69 features; adds frequency-domain and cross-channel information that is
structurally more informative than amplitude alone (Meta Extended Data Fig. 6).
ESP32 impact: retrain → export new model_weights.h; port selected features to C.
Sub-change 1A — Expand extract_features_single_channel() (line 1448)
Replace the entire function body:
def extract_features_single_channel(self, signal: np.ndarray) -> dict:
if getattr(self, 'reinhard', False):
signal = 64.0 * signal / (32.0 + np.abs(signal))
signal = signal - np.mean(signal)
N = len(signal)
# --- Time domain ---
rms = np.sqrt(np.mean(signal ** 2))
diff = np.diff(signal)
wl = np.sum(np.abs(diff))
zc_thresh = self.zc_threshold_percent * rms
ssc_thresh = (self.ssc_threshold_percent * rms) ** 2
sign_ch = signal[:-1] * signal[1:] < 0
zc = int(np.sum(sign_ch & (np.abs(diff) > zc_thresh)))
d_l = signal[1:-1] - signal[:-2]
d_r = signal[1:-1] - signal[2:]
ssc = int(np.sum((d_l * d_r) > ssc_thresh))
mav = np.mean(np.abs(signal))
var = np.mean(signal ** 2)
iemg = np.sum(np.abs(signal))
wamp = int(np.sum(np.abs(diff) > 0.15 * rms))
# AR(4) via Yule-Walker
ar = np.zeros(4)
if rms > 1e-6:
try:
from scipy.linalg import solve_toeplitz
r = np.array([np.dot(signal[i:], signal[:N-i]) / N for i in range(5)])
if r[0] > 1e-10:
ar = solve_toeplitz(r[:4], -r[1:5])
except Exception:
pass
# --- Frequency domain (20–500 Hz) ---
freqs = np.fft.rfftfreq(N, d=1.0 / SAMPLING_RATE_HZ)
psd = np.abs(np.fft.rfft(signal)) ** 2 / N
m = (freqs >= 20) & (freqs <= 500)
f_m, p_m = freqs[m], psd[m]
tp = np.sum(p_m) + 1e-10
mnf = float(np.sum(f_m * p_m) / tp)
cum = np.cumsum(p_m)
mdf = float(f_m[min(np.searchsorted(cum, tp / 2), len(f_m) - 1)])
pkf = float(f_m[np.argmax(p_m)]) if len(p_m) > 0 else 0.0
mnp = float(tp / max(len(p_m), 1))
# Bandpower in 4 physiological bands (mirrors firmware esp-dsp FFT bands)
bands = [(20, 80), (80, 150), (150, 300), (300, 500)]
bp = [float(np.sum(psd[(freqs >= lo) & (freqs < hi)])) for lo, hi in bands]
return {
'rms': rms, 'wl': wl, 'zc': zc, 'ssc': ssc,
'mav': mav, 'var': var, 'iemg': iemg, 'wamp': wamp,
'ar1': float(ar[0]), 'ar2': float(ar[1]),
'ar3': float(ar[2]), 'ar4': float(ar[3]),
'mnf': mnf, 'mdf': mdf, 'pkf': pkf, 'mnp': mnp,
'bp0': bp[0], 'bp1': bp[1], 'bp2': bp[2], 'bp3': bp[3],
}
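A quick sanity check of the frequency-domain block above: a pure tone placed on an exact FFT bin (200 Hz is bin 30 of a 150-sample window at 1 kHz) should come back as both the peak and mean frequency. A sketch assuming SAMPLING_RATE_HZ = 1000, mirroring the function body:

```python
import numpy as np

SAMPLING_RATE_HZ = 1000
N = 150
t = np.arange(N) / SAMPLING_RATE_HZ
signal = np.sin(2 * np.pi * 200.0 * t)          # exactly 30 cycles per window

freqs = np.fft.rfftfreq(N, d=1.0 / SAMPLING_RATE_HZ)
psd = np.abs(np.fft.rfft(signal)) ** 2 / N
m = (freqs >= 20) & (freqs <= 500)
f_m, p_m = freqs[m], psd[m]
tp = np.sum(p_m) + 1e-10
mnf = float(np.sum(f_m * p_m) / tp)             # mean frequency
pkf = float(f_m[np.argmax(p_m)])                # peak frequency
print(round(pkf, 1), round(mnf, 1))             # 200.0 200.0
```

An off-bin tone (e.g. 150 Hz with this window length) will instead show spectral leakage and land on the nearest 6.67 Hz bin, which is expected.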
Sub-change 1B — Update extract_features_window() Return Block (line 1482)
Replace the return section:
FEATURE_ORDER = ['rms', 'wl', 'zc', 'ssc', 'mav', 'var', 'iemg', 'wamp',
'ar1', 'ar2', 'ar3', 'ar4', 'mnf', 'mdf', 'pkf', 'mnp',
'bp0', 'bp1', 'bp2', 'bp3']
NORMALIZE_KEYS = {'rms', 'wl', 'mav', 'iemg'}
features = []
for ch_features in all_ch_features:
for key in FEATURE_ORDER:
val = ch_features.get(key, 0.0)
if self.normalize and key in NORMALIZE_KEYS:
val = val / norm_factor
features.append(float(val))
if self.cross_channel and window.shape[1] >= 2:
sel = window[:, channel_indices].astype(np.float32)
wc = sel - sel.mean(axis=0)
cov = (wc.T @ wc) / len(wc)
ri, ci = np.triu_indices(len(channel_indices))
features.extend(cov[ri, ci].tolist())
stds = np.sqrt(np.diag(cov)) + 1e-10
cor = cov / np.outer(stds, stds)
ro, co = np.triu_indices(len(channel_indices), k=1)
features.extend(cor[ro, co].tolist())
return np.array(features, dtype=np.float32)
Sub-change 1C — Update EMGFeatureExtractor.__init__() (line 1430)
def __init__(self, zc_threshold_percent=0.1, ssc_threshold_percent=0.1,
channels=None, normalize=True, cross_channel=True, reinhard=False):
self.zc_threshold_percent = zc_threshold_percent
self.ssc_threshold_percent = ssc_threshold_percent
self.channels = channels
self.normalize = normalize
self.cross_channel = cross_channel
self.reinhard = reinhard
Sub-change 1D — Update Feature Count in extract_features_batch() (line 1520)
Replace n_features = n_channels * 4:
per_ch = 20
if self.cross_channel and n_channels >= 2:
n_features = n_channels * per_ch + \
n_channels*(n_channels+1)//2 + n_channels*(n_channels-1)//2
else:
n_features = n_channels * per_ch
Sub-change 1E — Update get_feature_names() (line 1545)
def get_feature_names(self, n_channels=0):
ch_idx = self.channels if self.channels is not None else list(range(n_channels))
ORDER = ['rms','wl','zc','ssc','mav','var','iemg','wamp',
'ar1','ar2','ar3','ar4','mnf','mdf','pkf','mnp','bp0','bp1','bp2','bp3']
names = [f'ch{ch}_{f}' for ch in ch_idx for f in ORDER]
if self.cross_channel and len(ch_idx) >= 2:
n = len(ch_idx)
names += [f'cov_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i, n)]
names += [f'cor_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i+1, n)]
return names
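A standalone check that the name list logic above agrees with the count formula in Sub-change 1D for the 3-channel case (3 × 20 per-channel + 6 covariance + 3 correlation = 69 = MODEL_NUM_FEATURES):

```python
ORDER = ['rms', 'wl', 'zc', 'ssc', 'mav', 'var', 'iemg', 'wamp',
         'ar1', 'ar2', 'ar3', 'ar4', 'mnf', 'mdf', 'pkf', 'mnp',
         'bp0', 'bp1', 'bp2', 'bp3']
ch_idx = [0, 1, 2]
names = [f'ch{ch}_{f}' for ch in ch_idx for f in ORDER]
n = len(ch_idx)
names += [f'cov_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i, n)]
names += [f'cor_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i + 1, n)]
print(len(names), names[0], names[60], names[-1])
# 69 ch0_rms cov_ch0_ch0 cor_ch1_ch2
```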
Sub-change 1F — Update EMGClassifier.__init__() (line 1722)
self.feature_extractor = EMGFeatureExtractor(
channels=HAND_CHANNELS, cross_channel=True, reinhard=False)
Sub-change 1G — Update save() (line 1910) and load() (line 2089)
In save(), add to feature_extractor_params dict:
'cross_channel': getattr(self.feature_extractor, 'cross_channel', True),
'reinhard': getattr(self.feature_extractor, 'reinhard', False),
In load(), update EMGFeatureExtractor(...) constructor:
classifier.feature_extractor = EMGFeatureExtractor(
zc_threshold_percent = params.get('zc_threshold_percent', 0.1),
ssc_threshold_percent = params.get('ssc_threshold_percent', 0.1),
channels = params.get('channels', HAND_CHANNELS),
normalize = params.get('normalize', False),
cross_channel = params.get('cross_channel', True),
reinhard = params.get('reinhard', False),
)
Also Fix Bug at Line 2382
X, y, trial_ids, session_indices, label_names, loaded_sessions = storage.load_all_for_training()
Change 2 — Electrode Repositioning Protocol
Protocol: no code changes.
"Between sessions within a single day, the participants remove and slightly reposition the sEMG wristband to enable generalization across different recording positions." — Meta Nature 2025 Methods
- Session 1: standard placement
- Session 2: band 1–2 cm up the forearm
- Session 3: band 1–2 cm down the forearm
- Session 4+: slight axial rotation or return to any above position
The per-session z-score normalization in _apply_session_normalization() handles the
resulting amplitude shifts. Perform fast, natural gestures — not slow/deliberate.
Change 3 — Data Augmentation
Priority: Tier 2. Apply to raw windows BEFORE feature extraction.
Insert before the # === LDA CLASSIFIER === comment (~line 1709):
def augment_emg_batch(X, y, multiplier=3, seed=42):
"""
Augment raw EMG windows for training robustness.
Must be called on raw windows (n_windows, n_samples, n_channels),
not on pre-computed features.
Source (window jitter): Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
"""
rng = np.random.default_rng(seed)
aug_X, aug_y = [X], [y]
for _ in range(multiplier - 1):
Xc = X.copy().astype(np.float32)
Xc *= rng.uniform(0.80, 1.20, (len(X), 1, 1)).astype(np.float32) # amplitude
rms = np.sqrt(np.mean(Xc**2, axis=(1,2), keepdims=True)) + 1e-8
Xc += rng.standard_normal(Xc.shape).astype(np.float32) * (0.05 * rms) # noise
Xc += rng.uniform(-20., 20., (len(X), 1, X.shape[2])).astype(np.float32) # DC jitter
shifts = rng.integers(-5, 6, size=len(X))
for i in range(len(Xc)):
if shifts[i]: Xc[i] = np.roll(Xc[i], shifts[i], axis=0) # jitter
aug_X.append(Xc); aug_y.append(y)
return np.concatenate(aug_X), np.concatenate(aug_y)
In EMGClassifier.train(), replace the start of the function's feature extraction block:
if getattr(self, 'use_augmentation', True):
X_aug, y_aug = augment_emg_batch(X, y, multiplier=3)
print(f"[Classifier] Augmented: {len(X)} → {len(X_aug)} windows")
else:
X_aug, y_aug = X, y
X_features = self.feature_extractor.extract_features_batch(X_aug)
# ... then use y_aug instead of y for model.fit()
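A condensed, self-contained copy of augment_emg_batch run on a toy batch confirms the output shapes and that the original windows pass through unmodified at the front of the array:

```python
import numpy as np

def augment_emg_batch(X, y, multiplier=3, seed=42):
    # Condensed copy of the function above, for a shape smoke test.
    rng = np.random.default_rng(seed)
    aug_X, aug_y = [X], [y]
    for _ in range(multiplier - 1):
        Xc = X.copy().astype(np.float32)
        Xc *= rng.uniform(0.80, 1.20, (len(X), 1, 1)).astype(np.float32)
        rms = np.sqrt(np.mean(Xc**2, axis=(1, 2), keepdims=True)) + 1e-8
        Xc += rng.standard_normal(Xc.shape).astype(np.float32) * (0.05 * rms)
        Xc += rng.uniform(-20., 20., (len(X), 1, X.shape[2])).astype(np.float32)
        shifts = rng.integers(-5, 6, size=len(X))
        for i in range(len(Xc)):
            if shifts[i]:
                Xc[i] = np.roll(Xc[i], shifts[i], axis=0)
        aug_X.append(Xc)
        aug_y.append(y)
    return np.concatenate(aug_X), np.concatenate(aug_y)

X = np.random.default_rng(0).normal(size=(10, 150, 3)).astype(np.float32)
y = np.arange(10)
X_aug, y_aug = augment_emg_batch(X, y, multiplier=3)
print(X_aug.shape, y_aug.shape)          # (30, 150, 3) (30,)
assert np.array_equal(X_aug[:10], X)     # originals preserved verbatim
```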
Change 4 — Reinhard Compression (Optional)
Formula: output = 64 × x / (32 + |x|)
Enable in Python: set reinhard=True in EMGFeatureExtractor constructor (Change 1F).
Enable in firmware (inference.c compute_features(), after signal copy loop, before mean calc):
#if MODEL_USE_REINHARD
for (int i = 0; i < INFERENCE_WINDOW_SIZE; i++) {
float x = signal[i];
signal[i] = 64.0f * x / (32.0f + fabsf(x));
}
#endif
Add #define MODEL_USE_REINHARD 0 to model_weights.h (set to 1 when Python uses reinhard=True).
Python and firmware MUST match. Mismatch silently corrupts all predictions.
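As a quick feel for the compression curve: x = 32 maps to exactly half the ceiling, and large inputs saturate toward 64. A minimal sketch of the same formula:

```python
def reinhard(x):
    # output = 64 * x / (32 + |x|), same formula as the Python/firmware paths above
    return 64.0 * x / (32.0 + abs(x))

print(reinhard(32.0), reinhard(-32.0), round(reinhard(1000.0), 2))
# 32.0 -32.0 62.02
```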
Change 5 — Classifier Benchmark
Purpose: tells you whether the LDA accuracy plateau is a feature problem (all classifiers score similarly → add features) or a model-complexity problem (SVM/MLP ≫ LDA → implement Change E/F).
Add after run_training_demo():
def run_classifier_benchmark():
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, GroupKFold
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
storage = SessionStorage()
X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()
extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
X = extractor.extract_features_batch(X_raw)
X = EMGClassifier()._apply_session_normalization(X, session_indices, y=y)
clfs = {
'LDA (ESP32 model)': LinearDiscriminantAnalysis(),
'QDA': QuadraticDiscriminantAnalysis(reg_param=0.1),
'SVM-RBF': Pipeline([('s', StandardScaler()), ('m', SVC(kernel='rbf', C=10))]),
'MLP-128-64': Pipeline([('s', StandardScaler()),
('m', MLPClassifier(hidden_layer_sizes=(128,64),
max_iter=1000, early_stopping=True))]),
}
gkf = GroupKFold(n_splits=5)
print(f"\n{'Classifier':<22} {'Mean CV':>8} {'Std':>6}")
print("-" * 40)
for name, clf in clfs.items():
sc = cross_val_score(clf, X, y, cv=gkf, groups=trial_ids, scoring='accuracy')
print(f" {name:<20} {sc.mean()*100:>7.1f}% ±{sc.std()*100:.1f}%")
print("\n → If LDA ≈ SVM: features are the bottleneck (add Change 1 features)")
print(" → If SVM >> LDA: model complexity bottleneck (implement Change F ensemble)")
Change 6 — Simplified MPF Features
Python training only — not worth porting to ESP32 directly (use bandpower bp0–bp3 from Change 1 as the firmware-side approximation).
Add after EMGFeatureExtractor class:
class MPFFeatureExtractor:
"""
Simplified 3-channel MPF: CSD upper triangle per 6 frequency bands = 36 features.
Python training only. Omits matrix logarithm (not needed for 3 channels).
Source: Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
ESP32 approximation: use bp0–bp3 from EMGFeatureExtractor (Change 1).
"""
BANDS = [(0,62),(62,125),(125,187),(187,250),(250,375),(375,500)]
def __init__(self, channels=None, log_diagonal=True):
self.channels = channels or HAND_CHANNELS
self.log_diag = log_diagonal
self.n_ch = len(self.channels)
self._r, self._c = np.triu_indices(self.n_ch)
self.n_features = len(self.BANDS) * len(self._r)
def extract_window(self, window):
sig = window[:, self.channels].astype(np.float64)
N = len(sig)
freqs = np.fft.rfftfreq(N, d=1.0/SAMPLING_RATE_HZ)
Xf = np.fft.rfft(sig, axis=0)
feats = []
for lo, hi in self.BANDS:
mask = (freqs >= lo) & (freqs < hi)
if not mask.any():
feats.extend([0.0] * len(self._r)); continue
CSD = (Xf[mask].conj().T @ Xf[mask]).real / N
if self.log_diag:
for k in range(self.n_ch): CSD[k,k] = np.log(max(CSD[k,k], 1e-10))
feats.extend(CSD[self._r, self._c].tolist())
return np.array(feats, dtype=np.float32)
def extract_batch(self, X):
out = np.zeros((len(X), self.n_features), dtype=np.float32)
for i in range(len(X)): out[i] = self.extract_window(X[i])
return out
In EMGClassifier.train(), after standard feature extraction:
if getattr(self, 'use_mpf', False):
mpf = MPFFeatureExtractor(channels=HAND_CHANNELS)
X_features = np.hstack([X_features, mpf.extract_batch(X_aug)])
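The per-band CSD construction used by MPFFeatureExtractor can be checked on a toy window: the CSD of a real multichannel signal is Hermitian, so its real part is symmetric and the diagonal is non-negative per-channel band power. A sketch assuming SAMPLING_RATE_HZ = 1000 and a 150-sample window:

```python
import numpy as np

SAMPLING_RATE_HZ = 1000
rng = np.random.default_rng(0)
sig = rng.normal(size=(150, 3))            # one 150-sample, 3-channel window
N = len(sig)

freqs = np.fft.rfftfreq(N, d=1.0 / SAMPLING_RATE_HZ)
mask = (freqs >= 62) & (freqs < 125)       # band 1 from BANDS above
Xf = np.fft.rfft(sig, axis=0)
CSD = (Xf[mask].conj().T @ Xf[mask]).real / N

assert CSD.shape == (3, 3)
assert np.allclose(CSD, CSD.T)             # real part of Hermitian → symmetric
assert (np.diag(CSD) >= 0).all()           # diagonal = band power per channel
```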
Change 7 — Ensemble Training
Priority: Tier 3 (implements Change F's training side)
New file: C:/VSCode/Marvel_Projects/Bucky_Arm/train_ensemble.py
"""
Train the full 3-specialist-LDA + meta-LDA ensemble.
Requires Change 1 (expanded features) to be implemented first.
Exports model_weights_ensemble.h for firmware Change F.
Architecture:
LDA_TD (36 time-domain feat) ─┐
LDA_FD (24 freq-domain feat) ├─ 15 probs ─► Meta-LDA ─► final class
LDA_CC (9 cross-ch feat) ─┘
"""
import numpy as np
from pathlib import Path
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict, GroupKFold, cross_val_score
import sys
sys.path.insert(0, str(Path(__file__).parent))
from learning_data_collection import (
SessionStorage, EMGFeatureExtractor, HAND_CHANNELS
)
# ─── Load and extract features ───────────────────────────────────────────────
storage = SessionStorage()
X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()
extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
X = extractor.extract_features_batch(X_raw).astype(np.float64)
# Per-session normalization (same as EMGClassifier._apply_session_normalization)
from sklearn.preprocessing import StandardScaler
for sid in np.unique(session_indices):
mask = session_indices == sid
sc = StandardScaler()
X[mask] = sc.fit_transform(X[mask])
feat_names = extractor.get_feature_names(n_channels=len(HAND_CHANNELS))
n_cls = len(np.unique(y))
# ─── Feature subset indices ───────────────────────────────────────────────────
TD_FEAT = ['rms','wl','zc','ssc','mav','var','iemg','wamp','ar1','ar2','ar3','ar4']
FD_FEAT = ['mnf','mdf','pkf','mnp','bp0','bp1','bp2','bp3']
td_idx = [i for i,n in enumerate(feat_names) if any(n.endswith(f'_{f}') for f in TD_FEAT)]
fd_idx = [i for i,n in enumerate(feat_names) if any(n.endswith(f'_{f}') for f in FD_FEAT)]
cc_idx = [i for i,n in enumerate(feat_names) if n.startswith('cov_') or n.startswith('cor_')]
print(f"Feature subsets — TD: {len(td_idx)}, FD: {len(fd_idx)}, CC: {len(cc_idx)}")
X_td = X[:, td_idx]
X_fd = X[:, fd_idx]
X_cc = X[:, cc_idx]
# ─── Train specialist LDAs with out-of-fold stacking ─────────────────────────
gkf = GroupKFold(n_splits=5)
print("Training specialist LDAs (out-of-fold for stacking)...")
lda_td = LinearDiscriminantAnalysis()
lda_fd = LinearDiscriminantAnalysis()
lda_cc = LinearDiscriminantAnalysis()
oof_td = cross_val_predict(lda_td, X_td, y, cv=gkf, groups=trial_ids, method='predict_proba')
oof_fd = cross_val_predict(lda_fd, X_fd, y, cv=gkf, groups=trial_ids, method='predict_proba')
oof_cc = cross_val_predict(lda_cc, X_cc, y, cv=gkf, groups=trial_ids, method='predict_proba')
# Specialist CV accuracy (for diagnostics)
for name, mdl, Xs in [('LDA_TD', lda_td, X_td), ('LDA_FD', lda_fd, X_fd), ('LDA_CC', lda_cc, X_cc)]:
sc = cross_val_score(mdl, Xs, y, cv=gkf, groups=trial_ids)
print(f" {name}: {sc.mean()*100:.1f}% ± {sc.std()*100:.1f}%")
# ─── Train meta-LDA on out-of-fold outputs ───────────────────────────────────
X_meta = np.hstack([oof_td, oof_fd, oof_cc]) # (n_samples, 3*n_cls = 15)
meta_lda = LinearDiscriminantAnalysis()
meta_sc = cross_val_score(meta_lda, X_meta, y, cv=gkf, groups=trial_ids)
print(f" Meta-LDA: {meta_sc.mean()*100:.1f}% ± {meta_sc.std()*100:.1f}%")
# Fit all models on full dataset for deployment
lda_td.fit(X_td, y); lda_fd.fit(X_fd, y); lda_cc.fit(X_cc, y)
meta_lda.fit(X_meta, y)
# ─── Export all weights to C header ──────────────────────────────────────────
def lda_to_c_arrays(lda, name, feat_dim, n_cls, label_names, class_order):
"""Generate C array strings for LDA weights and intercepts."""
# Reorder classes to match label_names order
coef = lda.coef_ # shape (n_cls, feat_dim) for LinearDiscriminantAnalysis
intercept = lda.intercept_
lines = []
lines.append(f"const float {name}_WEIGHTS[{n_cls}][{feat_dim}] = {{")
for c in class_order:
row = ', '.join(f'{v:.8f}f' for v in coef[c])
lines.append(f" {{{row}}}, // {label_names[c]}")
lines.append("};")
lines.append(f"const float {name}_INTERCEPTS[{n_cls}] = {{")
intercept_str = ', '.join(f'{intercept[c]:.8f}f' for c in class_order)
lines.append(f" {intercept_str}")
lines.append("};")
return '\n'.join(lines)
class_order = list(range(n_cls))
out_path = Path('EMG_Arm/src/core/model_weights_ensemble.h')
with open(out_path, 'w') as f:
f.write("// Auto-generated by train_ensemble.py — do not edit\n")
f.write("#pragma once\n\n")
f.write(f"#define MODEL_NUM_CLASSES {n_cls}\n")
f.write(f"#define MODEL_NUM_FEATURES {X.shape[1]}\n")
f.write(f"#define ENSEMBLE_PER_CH_FEATURES 20\n\n")
f.write(f"#define TD_FEAT_OFFSET {min(td_idx)}\n")
f.write(f"#define TD_NUM_FEATURES {len(td_idx)}\n")
f.write(f"#define FD_FEAT_OFFSET {min(fd_idx)}\n")
f.write(f"#define FD_NUM_FEATURES {len(fd_idx)}\n")
f.write(f"#define CC_FEAT_OFFSET {min(cc_idx)}\n")
f.write(f"#define CC_NUM_FEATURES {len(cc_idx)}\n")
    f.write("#define META_NUM_INPUTS (3 * MODEL_NUM_CLASSES)\n\n")
f.write(lda_to_c_arrays(lda_td, 'LDA_TD', len(td_idx), n_cls, label_names, class_order))
f.write('\n\n')
f.write(lda_to_c_arrays(lda_fd, 'LDA_FD', len(fd_idx), n_cls, label_names, class_order))
f.write('\n\n')
f.write(lda_to_c_arrays(lda_cc, 'LDA_CC', len(cc_idx), n_cls, label_names, class_order))
f.write('\n\n')
f.write(lda_to_c_arrays(meta_lda, 'META_LDA', 3*n_cls, n_cls, label_names, class_order))
f.write('\n\n')
names_str = ', '.join(f'"{label_names[c]}"' for c in class_order)
f.write(f"const char *MODEL_CLASS_NAMES[MODEL_NUM_CLASSES] = {{{names_str}}};\n")
print(f"Exported ensemble weights to {out_path}")
print(f"Total weight storage: {(len(td_idx)+len(fd_idx)+len(cc_idx)+3*n_cls)*n_cls*4} bytes float32")
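A smoke test of the lda_to_c_arrays helper: a condensed copy of the function paired with a stand-in "fitted LDA" (a SimpleNamespace carrying coef_/intercept_, so no training data is needed; the real call site passes a fitted LinearDiscriminantAnalysis):

```python
import numpy as np
from types import SimpleNamespace

def lda_to_c_arrays(lda, name, feat_dim, n_cls, label_names, class_order):
    # Condensed copy of the exporter above.
    coef, intercept = lda.coef_, lda.intercept_
    lines = [f"const float {name}_WEIGHTS[{n_cls}][{feat_dim}] = {{"]
    for c in class_order:
        row = ', '.join(f'{v:.8f}f' for v in coef[c])
        lines.append(f"  {{{row}}}, // {label_names[c]}")
    lines.append("};")
    lines.append(f"const float {name}_INTERCEPTS[{n_cls}] = {{")
    lines.append('  ' + ', '.join(f'{intercept[c]:.8f}f' for c in class_order))
    lines.append("};")
    return '\n'.join(lines)

fake = SimpleNamespace(coef_=np.arange(6.0).reshape(3, 2),
                       intercept_=np.array([0.1, 0.2, 0.3]))
src = lda_to_c_arrays(fake, 'LDA_TOY', 2, 3, ['rest', 'fist', 'open'], [0, 1, 2])
print(src.splitlines()[0])   # const float LDA_TOY_WEIGHTS[3][2] = {
assert '// fist' in src      # one commented row per class
```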
Note on LinearDiscriminantAnalysis with multi-class: for three or more classes, scikit-learn's LDA exposes coef_ with shape (n_classes, n_features), one decision_function row per class, which is what lda_to_c_arrays assumes. For a binary problem, coef_ collapses to shape (1, n_features). Verify lda.coef_.shape after fitting; if you ever train a 2-class model, expand the single row into two-class form (±coef_, ±intercept_) before export.
PART VII — FEATURE SELECTION FOR ESP32 PORTING
After Change 1 is trained, use this to decide what to port to C firmware.
Step 1 — Get Feature Importance
importance = np.abs(classifier.model.coef_).mean(axis=0)
feat_names = classifier.feature_extractor.get_feature_names(n_channels=len(HAND_CHANNELS))
ranked = sorted(zip(feat_names, importance), key=lambda x: -x[1])
print("Top 20 features by LDA discriminative weight:")
for name, score in ranked[:20]:
print(f" {name:<35} {score:.4f}")
Step 2 — Port Decision Matrix
| Feature | C Complexity | Prereq | Port? |
|---|---|---|---|
| RMS, WL, ZC, SSC | ✓ Already in C | — | Keep |
| MAV, VAR, IEMG | Very easy (1 loop) | None | ✓ Yes |
| WAMP | Very easy (threshold on diff) | None | ✓ Yes |
| Cross-ch covariance | Easy (3×3 outer product) | None | ✓ Yes |
| Cross-ch correlation | Easy (normalize covariance) | Covariance | ✓ Yes |
| Bandpower bp0–bp3 | Medium (128-pt FFT via esp-dsp) | Add FFT call | ✓ Yes — highest ROI |
| MNF, MDF, PKF, MNP | Easy after FFT | Bandpower FFT | ✓ Free once FFT added |
| AR(4) | Medium (Levinson-Durbin in C) | None | Only if top-8 importance |
Once dsps_fft2r_fc32() is added for bandpower, MNF/MDF/PKF/MNP come free.
Step 3 — Adding FFT-Based Features to inference.c
Add inside compute_features() loop, after time-domain features per channel:
// 128-pt FFT for frequency-domain features per channel
// Truncate the 150-sample window (INFERENCE_WINDOW_SIZE) to its first
// 128 samples; the = {0} initializer zero-fills any unused tail.
float fft_buf[256] = {0}; // 128 complex values: [re0, im0, re1, im1, ...]
for (int i = 0; i < 128 && i < INFERENCE_WINDOW_SIZE; i++) {
fft_buf[2*i] = signal[i]; // real
fft_buf[2*i+1] = 0.0f; // imag
}
dsps_fft2r_fc32(fft_buf, 128);
dsps_bit_rev_fc32(fft_buf, 128);
// Bandpower: bin k → freq = k * 1000/128 = k * 7.8125 Hz
// Band 0: 20–80 Hz → bins 3–10
// Band 1: 80–150 Hz → bins 10–19
// Band 2: 150–300 Hz→ bins 19–38
// Band 3: 300–500 Hz→ bins 38–64
int band_bins[5] = {3, 10, 19, 38, 64};
float bp[4] = {0,0,0,0};
for (int b = 0; b < 4; b++)
for (int k = band_bins[b]; k < band_bins[b+1]; k++) {
float re = fft_buf[2*k], im = fft_buf[2*k+1];
bp[b] += re*re + im*im;
}
// Store at correct indices (base = ch * 20)
int base = ch * 20;
features_out[base+16] = bp[0]; features_out[base+17] = bp[1];
features_out[base+18] = bp[2]; features_out[base+19] = bp[3];
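The bin table above can be cross-checked in Python (a 128-point FFT at 1 kHz gives 7.8125 Hz per bin):

```python
# Cross-check of the bin→frequency mapping used in the C snippet above.
band_edges_hz = [20, 80, 150, 300, 500]
bin_hz = 1000 / 128                       # 7.8125 Hz per bin
band_bins = [round(f / bin_hz) for f in band_edges_hz]
print(band_bins)   # [3, 10, 19, 38, 64] matches band_bins[] in the C code
```

Note the Python-side bandpower (Change 1) uses the full 150-sample FFT while the firmware truncates to 128, so the two are close approximations rather than bit-identical; per-session normalization absorbs the scale difference.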
PART VIII — MEASUREMENT AND VALIDATION
Baseline Protocol
Run this BEFORE any change and after EACH change.
1. python learning_data_collection.py → option 3 (Train Classifier)
2. Record:
- "Mean CV accuracy: XX.X% ± Y.Y%" (cross-validation)
- Confusion matrix (which gesture pairs are most confused)
- Per-gesture accuracy breakdown
3. On-device test:
- Put on sensors, perform 10 reps of each gesture
- Log classification output (UART or Python serial monitor)
- Compute per-gesture accuracy manually
4. Record REST false-trigger rate: hold arm at rest for 30 seconds,
count number of non-REST outputs
Results Log
| Change | CV Acc Before | CV Acc After | Delta | On-Device Acc | False Triggers/30s | Keep? |
|---|---|---|---|---|---|---|
| Baseline | — | — | — | — | — | — |
| Change C (reject) | — | — | — | — | — | — |
| Change B (filter) | — | — | — | — | — | — |
| Change 0 (label shift) | — | — | — | — | — | — |
| Change 1 (features) | — | — | — | — | — | — |
| Change D (NVS calib) | — | — | — | — | — | — |
| Change 3 (augment) | — | — | — | — | — | — |
| Change 5 (benchmark) | — | — | — | — | — | — |
| Change 7+F (ensemble) | — | — | — | — | — | — |
| Change E (MLP) | — | — | — | — | — | — |
When to Add More Gestures
| CV Accuracy | Recommendation |
|---|---|
| <80% | Do NOT add gestures — fix the existing 5 first |
| 80–90% | Adding 1–2 gestures is reasonable; expect 5–8% drop per new gesture |
| >90% | Good baseline; can add gestures; target staying above 85% |
| >95% | Excellent; can be ambitious with gesture count |
PART IX — EXPORT WORKFLOW
Path 1 — LDA / Ensemble (Changes 0–4, 7+F)
1. Train: python learning_data_collection.py → option 3 (single LDA)
OR: python train_ensemble.py (full ensemble)
2. Export:
Single LDA: classifier.export_to_header(Path('EMG_Arm/src/core/model_weights.h'))
   Ensemble: python train_ensemble.py (the export block at the end of the script)
             → writes model_weights_ensemble.h
3. Port new features to inference.c (if Change 1 features added):
- Follow feature selection decision matrix (Part VII)
- CRITICAL: C feature index order MUST match Python FEATURE_ORDER exactly
4. Build + flash: pio run -t upload
Path 2 — int8 MLP via TFLM (Change E)
1. python train_mlp_tflite.py → emg_model_data.cc
2. Add TFLM to platformio.ini lib_deps
3. Replace LDA inference call with inference_mlp_predict() in inference.c
OR use inference_ensemble_predict() which calls MLP as fallback (Change F)
4. pio run -t upload
Feature Index Contract (Critical)
The order of values written to features_out[] in compute_features() in C must exactly
match FEATURE_ORDER in extract_features_window() in Python, index for index.
To verify before flashing: print both the C feature names (from MODEL_FEATURE_NAMES if
added to header) and Python extractor.get_feature_names() and diff them.
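A toy illustration of that index-for-index diff (the names here are hypothetical placeholders; substitute the real lists parsed from the header and returned by get_feature_names()):

```python
# Any non-empty mismatch list means the C/Python feature contract is broken.
c_names  = ['ch0_rms', 'ch0_wl', 'ch0_zc']   # hypothetical C-side order
py_names = ['ch0_rms', 'ch0_zc', 'ch0_wl']   # hypothetical Python-side order
mismatches = [(i, a, b) for i, (a, b) in enumerate(zip(c_names, py_names)) if a != b]
print(mismatches)   # [(1, 'ch0_wl', 'ch0_zc'), (2, 'ch0_zc', 'ch0_wl')]
```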
PART X — REFERENCES
Primary paper: Kaifosh, P., Reardon, T., et al. "A high-bandwidth neuromotor prosthesis enabled by implicit information in intrinsic motor neurons." Nature (2025). doi:10.1038/s41586-025-09255-w
Meta codebase (label alignment, CLER metric, model architectures):
C:/VSCode/Marvel_Projects/Meta_Emg_Stuff/generic-neuromotor-interface/
- data.py: onset detection, searchsorted alignment, window jitter
- cler.py: threshold=0.35, debounce=50ms, tolerance=±50/250ms
- networks.py: model architectures, left_context=20, stride=10
- lightning.py: targets[..., left_context::stride] label shift
Barachant et al. 2012: "Multiclass brain–computer interface classification by Riemannian geometry." — matrix logarithm reference (MPF features).
Espressif libraries:
- esp-dsp: github.com/espressif/esp-dsp — biquad, FFT, dot-product
- esp-dl: github.com/espressif/esp-dl — quantized MLP/CNN inference
- TFLite Micro: github.com/tensorflow/tflite-micro
All project files (existing + planned):
── Laptop / Python ─────────────────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/learning_data_collection.py ← main: data collection + training
C:/VSCode/Marvel_Projects/Bucky_Arm/live_predict.py ← NEW (Part 0.6): laptop-side live inference
C:/VSCode/Marvel_Projects/Bucky_Arm/train_ensemble.py ← NEW (Change 7): ensemble training
C:/VSCode/Marvel_Projects/Bucky_Arm/train_mlp_tflite.py ← NEW (Change E): int8 MLP export
── ESP32 Firmware — Existing ───────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/platformio.ini
└─ ADD lib_deps: espressif/esp-dsp (Changes B,1,F), tensorflow/tflite-micro (Change E)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/config/config.h
└─ MODIFY: remove system_mode_t; add EMG_STANDALONE to MAIN_MODE enum (Part 0.7, S1)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/app/main.c
└─ MODIFY: add STATE_LAPTOP_PREDICT, CMD_START_LAPTOP_PREDICT, run_laptop_predict_loop(),
run_standalone_loop() (Part 0.5)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/drivers/emg_sensor.c
└─ MODIFY (Change A): migrate from adc_oneshot to adc_continuous driver
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.c
└─ MODIFY: add inference_get_gesture_by_name(), IIR filter (B), features (1), confidence rejection (C)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.h
└─ MODIFY: add inference_get_gesture_by_name() declaration
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/gestures.c
└─ MODIFY: update gesture_names[] and gestures_execute() when adding gestures
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights.h
└─ AUTO-GENERATED by export_to_header() — do not edit manually
── ESP32 Firmware — New Files ──────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/bicep.h/.c ← Part 0 / Section 2.2
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/calibration.h/.c ← Change D (NVS z-score)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference_ensemble.h/.c ← Change F
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference_mlp.h/.cc ← Change E
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights_ensemble.h ← AUTO-GENERATED (Change 7)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/emg_model_data.h/.cc ← AUTO-GENERATED (Change E)