Bucky Arm — EMG Gesture Control: Master Implementation Reference
Version: 2026-03-01 | Target: ESP32-S3 N32R16V (Xtensa LX7 @ 240 MHz, 512 KB SRAM, 16 MB OPI PSRAM)
Supersedes: META_EMG_RESEARCH_NOTES.md + BUCKY_ARM_IMPROVEMENT_PLAN.md
Source paper: doi:10.1038/s41586-025-09255-w (PDF: C:/VSCode/Marvel_Projects/s41586-025-09255-w.pdf)
TABLE OF CONTENTS
- PART 0 — SYSTEM ARCHITECTURE & RESPONSIBILITY ASSIGNMENT
- PART I — SYSTEM FOUNDATIONS
- PART II — TARGET ARCHITECTURE
- PART III — GESTURE EXTENSIBILITY
- PART IV — CHANGE REFERENCE
- PART V — FIRMWARE CHANGES
- PART VI — PYTHON/TRAINING CHANGES
- PART VII — FEATURE SELECTION FOR ESP32 PORTING
- PART VIII — MEASUREMENT AND VALIDATION
- PART IX — EXPORT WORKFLOW
- PART X — REFERENCES
PART 0 — SYSTEM ARCHITECTURE & RESPONSIBILITY ASSIGNMENT
This section is the authoritative reference for what runs where. All implementation decisions in later parts should be consistent with this partition.
0.1 Who Does What
| Responsibility | Laptop (Python) | ESP32 |
|---|---|---|
| EMG sensor reading | — | ✓ `emg_sensor_read()` always |
| Raw data streaming (for collection) | Receives CSV, saves to HDF5 | Streams CSV over UART |
| Model training | ✓ `learning_data_collection.py` | — |
| Model export | ✓ `export_to_header()` → `model_weights.h` | Compiled into firmware |
| On-device inference | — | ✓ `inference_predict()` |
| Laptop-side live inference | ✓ `live_predict.py` (new script) | Streams ADC + executes received cmd |
| Arm actuation | — (sends gesture string back to ESP32) | ✓ `gestures_execute()` |
| Autonomous operation (no laptop) | Not needed | ✓ `EMG_STANDALONE` mode |
| Bicep flex detection | — | ✓ `bicep_detect()` (new, Section 2.2) |
| NVS calibration | — | ✓ `calibration.c` (Change D) |
Key rule: The laptop is never required for real-time arm control in production. The laptop's role is: collect data → train model → export → flash firmware → done. After that, the ESP32 operates completely independently.
0.2 Operating Modes
Controlled by #define MAIN_MODE in config/config.h.
The enum currently reads enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER}.
A new value EMG_STANDALONE must be added.
| `MAIN_MODE` | When to use | Laptop required? | Entry point |
|---|---|---|---|
| `EMG_MAIN` | Development sessions, data collection, monitored operation | Yes — UART handshake to start any mode | `appConnector()` in main.c |
| `EMG_STANDALONE` | Fully autonomous deployment — no laptop | No — boots directly into predict+control | `run_standalone_loop()` (new function in main.c) |
| `SERVO_CALIBRATOR` | Hardware setup, testing servo range of motion | Yes (serial input) | Inline in `app_main()` |
| `GESTURE_TESTER` | Testing gesture→servo mapping via keyboard | Yes (serial input) | Inline in `app_main()` |
How to switch mode: change #define MAIN_MODE in config.h and reflash.
To add EMG_STANDALONE to config.h (1-line change):
// config.h line 19 — current:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER};
// Update to:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER, EMG_STANDALONE};
0.3 FSM Reference (EMG_MAIN mode)
The device_state_t enum in main.c and the command_t enum control all transitions.
Currently: {STATE_IDLE, STATE_CONNECTED, STATE_STREAMING, STATE_PREDICTING}.
A new state STATE_LAPTOP_PREDICT must be added (see Section 0.5).
STATE_IDLE
└─ {"cmd":"connect"} ──────────────────────────► STATE_CONNECTED
│
{"cmd":"start"} ──────────┤
│ STATE_STREAMING
│ ESP32 sends raw ADC CSV at 1kHz
│ Laptop: saves to HDF5 (data collection)
│ Laptop: trains model → exports model_weights.h
│ ◄──── {"cmd":"stop"} ────────────────────┘
│
{"cmd":"start_predict"} ─────────┤
│ STATE_PREDICTING
│ ESP32: inference_predict() on-device
│ ESP32: gestures_execute()
│ Laptop: optional UART monitor only
│ ◄──── {"cmd":"stop"} ────────────────────┘
│
{"cmd":"start_laptop_predict"} ───────┘
STATE_LAPTOP_PREDICT [NEW]
ESP32: streams raw ADC CSV (same as STREAMING)
Laptop: runs live_predict.py inference
Laptop: sends {"gesture":"fist"} back
ESP32: executes received gesture command
◄──── {"cmd":"stop"} ────────────────────┘
All active states:
{"cmd":"stop"} → STATE_CONNECTED
{"cmd":"disconnect"} → STATE_IDLE
{"cmd":"connect"} → STATE_CONNECTED (from any state — reconnect)
Convenience table of commands and their effects:
| JSON command | Valid from state | Result |
|---|---|---|
| `{"cmd":"connect"}` | Any | → STATE_CONNECTED |
| `{"cmd":"start"}` | STATE_CONNECTED | → STATE_STREAMING |
| `{"cmd":"start_predict"}` | STATE_CONNECTED | → STATE_PREDICTING |
| `{"cmd":"start_laptop_predict"}` | STATE_CONNECTED | → STATE_LAPTOP_PREDICT (new) |
| `{"cmd":"stop"}` | STREAMING/PREDICTING/LAPTOP_PREDICT | → STATE_CONNECTED |
| `{"cmd":"disconnect"}` | Any active state | → STATE_IDLE |
0.4 EMG_STANDALONE Boot Sequence
No UART handshake. No laptop required. Powers on → predicts → controls arm.
app_main() switch MAIN_MODE == EMG_STANDALONE:
│
├── hand_init() // servos
├── emg_sensor_init() // ADC setup
├── inference_init() // clear window buffer, reset smoothing state
├── calibration_init() // load NVS z-score params (Change D)
│ └── if not found in NVS:
│ collect 120 REST windows (~3s at 25ms hop)
│ call calibration_update() to compute and store stats
├── bicep_load_threshold() // load NVS bicep threshold (Section 2.2)
│ └── if not found:
│ collect 3s of still bicep data
│ call bicep_calibrate() and bicep_save_threshold()
│
└── run_standalone_loop() ← NEW function (added to main.c)
while (1):
emg_sensor_read(&sample)
inference_add_sample(sample.channels)
if stride_counter++ >= INFERENCE_HOP_SIZE:
stride_counter = 0
gesture_t g = inference_get_gesture_enum(inference_predict(&conf))
gestures_execute(g)
bicep_state_t b = bicep_detect()
// (future: bicep_actuate(b))
vTaskDelay(1)
run_standalone_loop() is structurally identical to run_inference_loop() in EMG_MAIN,
minus all UART state-change checking and telemetry prints. It runs forever until power-off.
Where to add: New function run_standalone_loop() in app/main.c, plus a new case
in the app_main() switch block:
case EMG_STANDALONE:
run_standalone_loop();
break;
0.5 New Firmware Changes for Architecture
These changes are needed to implement the architecture above. They are structural (not accuracy improvements) and should be done before any other changes.
S1 — Add EMG_STANDALONE to config.h
File: EMG_Arm/src/config/config.h, line 19
// Change:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER};
// To:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER, EMG_STANDALONE};
S2 — Add STATE_LAPTOP_PREDICT to FSM (main.c)
File: EMG_Arm/src/app/main.c
// In device_state_t enum — add new state:
typedef enum {
STATE_IDLE = 0,
STATE_CONNECTED,
STATE_STREAMING,
STATE_PREDICTING,
STATE_LAPTOP_PREDICT, // ← ADD: streams ADC to laptop, executes laptop's gesture commands
} device_state_t;
// In command_t enum — add new command:
typedef enum {
CMD_NONE = 0,
CMD_CONNECT,
CMD_START,
CMD_START_PREDICT,
CMD_START_LAPTOP_PREDICT, // ← ADD
CMD_STOP,
CMD_DISCONNECT,
} command_t;
In parse_command() — add detection (place BEFORE the "start" check to avoid prefix collision):
} else if (strncmp(value_start, "start_laptop_predict", 20) == 0) {
return CMD_START_LAPTOP_PREDICT;
} else if (strncmp(value_start, "start_predict", 13) == 0) {
return CMD_START_PREDICT;
} else if (strncmp(value_start, "start", 5) == 0) {
return CMD_START;
In serial_input_task() FSM switch — add to STATE_CONNECTED block:
} else if (cmd == CMD_START_LAPTOP_PREDICT) {
g_device_state = STATE_LAPTOP_PREDICT;
printf("[STATE] CONNECTED -> LAPTOP_PREDICT\n");
xQueueSend(g_cmd_queue, &cmd, 0);
}
Add to the active-state check in serial_input_task():
case STATE_STREAMING:
case STATE_PREDICTING:
case STATE_LAPTOP_PREDICT: // ← ADD to the case list
if (cmd == CMD_STOP) { ... }
New function run_laptop_predict_loop() (add alongside stream_emg_data() and run_inference_loop()):
/**
* @brief Laptop-mediated prediction loop (STATE_LAPTOP_PREDICT).
*
* Streams raw ADC CSV to laptop for inference.
* Simultaneously reads gesture commands sent back by laptop.
* Executes received gesture immediately.
*
* Laptop sends: {"gesture":"fist"}\n OR {"gesture":"rest"}\n etc.
* ESP32 parses the "gesture" field and calls inference_get_gesture_enum() + gestures_execute().
*/
static void run_laptop_predict_loop(void) {
emg_sample_t sample;
char cmd_buf[64];
int cmd_idx = 0;
printf("{\"status\":\"info\",\"msg\":\"Laptop-predict mode started\"}\n");
while (g_device_state == STATE_LAPTOP_PREDICT) {
// 1. Send raw ADC sample (same format as STATE_STREAMING)
emg_sensor_read(&sample);
printf("%u,%u,%u,%u\n", sample.channels[0], sample.channels[1],
sample.channels[2], sample.channels[3]);
// 2. Non-blocking read of any incoming gesture command from laptop
// (serial_input_task already handles FSM commands; this handles gesture commands)
// Note: getchar() is non-blocking when there is no data (returns EOF).
// Gesture messages from laptop look like: {"gesture":"fist"}\n
int c = getchar();
if (c != EOF && c != 0xFF) {
if (c == '\n' || c == '\r') {
if (cmd_idx > 0) {
cmd_buf[cmd_idx] = '\0';
// Parse {"gesture":"<name>"} — look for "gesture" field
const char *g = strstr(cmd_buf, "\"gesture\"");
if (g) {
const char *v = strchr(g, ':');
if (v) {
v++;
while (*v == ' ' || *v == '"') v++;
// Extract gesture name up to closing quote
char name[32] = {0};
int ni = 0;
while (*v && *v != '"' && ni < 31) name[ni++] = *v++;
name[ni] = '\0';
// Map name to enum and execute (reuse inference mapping)
gesture_t gesture = inference_get_gesture_by_name(name);
if (gesture != GESTURE_NONE) {
gestures_execute(gesture);
}
}
}
cmd_idx = 0;
}
} else if (cmd_idx < (int)sizeof(cmd_buf) - 1) {
cmd_buf[cmd_idx++] = (char)c;
} else {
cmd_idx = 0;
}
}
vTaskDelay(1);
}
}
Note: inference_get_gesture_by_name(const char *name) is the existing
inference_get_gesture_enum(int class_idx) logic refactored to accept a string directly
(bypassing the class_idx lookup). The string-matching logic already exists in
inference.c, so the helper is small:
// Add to inference.c / inference.h:
gesture_t inference_get_gesture_by_name(const char *name);
// (same strcmp logic as inference_get_gesture_enum, but returns gesture_t directly)
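A host-testable sketch of that helper. The `gesture_t` values here are placeholders (the real enum lives in the firmware's gesture header); only the five trained class names from model_weights.h are assumed:

```c
#include <string.h>

/* Placeholder enum — substitute the project's real gesture_t. */
typedef enum {
    GESTURE_NONE = -1,
    GESTURE_FIST,
    GESTURE_HOOK_EM,
    GESTURE_OPEN,
    GESTURE_REST,
    GESTURE_THUMBS_UP,
} gesture_t;

/* Same strcmp chain as inference_get_gesture_enum(), keyed by name. */
gesture_t inference_get_gesture_by_name(const char *name) {
    if (strcmp(name, "fist") == 0)      return GESTURE_FIST;
    if (strcmp(name, "hook_em") == 0)   return GESTURE_HOOK_EM;
    if (strcmp(name, "open") == 0)      return GESTURE_OPEN;
    if (strcmp(name, "rest") == 0)      return GESTURE_REST;
    if (strcmp(name, "thumbs_up") == 0) return GESTURE_THUMBS_UP;
    return GESTURE_NONE;   /* unknown name → caller skips actuation */
}
```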
In state_machine_loop() — add the new state:
static void state_machine_loop(void) {
command_t cmd;
const TickType_t poll_interval = pdMS_TO_TICKS(50);
while (1) {
if (g_device_state == STATE_STREAMING) stream_emg_data();
else if (g_device_state == STATE_PREDICTING) run_inference_loop();
else if (g_device_state == STATE_LAPTOP_PREDICT) run_laptop_predict_loop(); // ← ADD
xQueueReceive(g_cmd_queue, &cmd, poll_interval);
}
}
In app_main() switch — add the standalone case:
case EMG_STANDALONE:
run_standalone_loop(); // new function — see Section 0.4
break;
0.6 New Python Script: live_predict.py
Location: C:/VSCode/Marvel_Projects/Bucky_Arm/live_predict.py (new file)
Purpose: Laptop-side live inference. Reads raw ADC stream from ESP32, runs the Python
classifier, sends gesture commands back to ESP32 for arm control.
When to use: EMG_MAIN + STATE_LAPTOP_PREDICT — useful for debugging and comparing
laptop accuracy vs on-device accuracy before flashing a new model.
"""
live_predict.py — Laptop-side live EMG inference for Bucky Arm.
Connects to ESP32, requests STATE_LAPTOP_PREDICT, reads raw ADC CSV,
runs the trained Python classifier, sends gesture commands back to ESP32.
Usage:
python live_predict.py --port COM3 --model path/to/saved_model/
"""
import argparse
import time
import numpy as np
import serial
from pathlib import Path
import sys
sys.path.insert(0, str(Path(__file__).parent))
from learning_data_collection import (
EMGClassifier, EMGFeatureExtractor, SessionStorage, HAND_CHANNELS,
WINDOW_SIZE_SAMPLES, HOP_SIZE_SAMPLES, NUM_CHANNELS,
)
BAUD_RATE = 921600
CALIB_SEC = 3.0 # seconds of REST to collect for normalization at startup
CALIB_LABEL = "rest" # label used during calibration window
def parse_args():
p = argparse.ArgumentParser()
p.add_argument("--port", required=True, help="Serial port, e.g. COM3 or /dev/ttyUSB0")
p.add_argument("--model", required=True, help="Path to saved EMGClassifier model directory")
return p.parse_args()
def handshake(ser):
"""Send connect command, wait for ack."""
ser.write(b'{"cmd":"connect"}\n')
deadline = time.time() + 5.0
while time.time() < deadline:
line = ser.readline().decode("utf-8", errors="ignore").strip()
if "ack_connect" in line:
print(f"[Handshake] Connected: {line}")
return True
raise RuntimeError("No ack_connect received within 5s")
def collect_calibration_windows(ser, n_windows, window_size, hop_size, n_channels):
"""Collect n_windows worth of REST data for normalization calibration."""
print(f"[Calib] Collecting {n_windows} REST windows — hold arm still...")
raw_buffer = np.zeros((window_size, n_channels), dtype=np.float32)
windows = []
sample_count = 0
while len(windows) < n_windows:
line = ser.readline().decode("utf-8", errors="ignore").strip()
try:
vals = [float(v) for v in line.split(",")]
if len(vals) != n_channels:
continue
except ValueError:
continue
raw_buffer = np.roll(raw_buffer, -1, axis=0)
raw_buffer[-1] = vals
sample_count += 1
if sample_count >= window_size and sample_count % hop_size == 0:
windows.append(raw_buffer.copy())
print(f"[Calib] Collected {len(windows)} windows. Computing normalization stats...")
return np.array(windows) # (n_windows, window_size, n_channels)
def main():
args = parse_args()
# Load trained classifier
print(f"[Init] Loading classifier from {args.model}...")
classifier = EMGClassifier()
classifier.load(Path(args.model))
extractor = classifier.feature_extractor
ser = serial.Serial(args.port, BAUD_RATE, timeout=1.0)
time.sleep(0.5)
ser.reset_input_buffer()
handshake(ser)
# Request laptop-predict mode
ser.write(b'{"cmd":"start_laptop_predict"}\n')
print("[Control] Entered STATE_LAPTOP_PREDICT")
# Calibration: collect 3s of REST for session normalization
n_calib_windows = max(10, int(CALIB_SEC * 1000 / (HOP_SIZE_SAMPLES)))
calib_raw = collect_calibration_windows(
ser, n_calib_windows, WINDOW_SIZE_SAMPLES, HOP_SIZE_SAMPLES, NUM_CHANNELS
)
calib_features = extractor.extract_features_batch(calib_raw)
calib_mean = calib_features.mean(axis=0)
calib_std = np.where(calib_features.std(axis=0) > 1e-6,
calib_features.std(axis=0), 1e-6)
print("[Calib] Done. Starting live prediction...")
# Live prediction loop
raw_buffer = np.zeros((WINDOW_SIZE_SAMPLES, NUM_CHANNELS), dtype=np.float32)
sample_count = 0
last_gesture = None
try:
while True:
line = ser.readline().decode("utf-8", errors="ignore").strip()
# Skip JSON telemetry lines from ESP32
if line.startswith("{"):
continue
try:
vals = [float(v) for v in line.split(",")]
if len(vals) != NUM_CHANNELS:
continue
except ValueError:
continue
# Slide window
raw_buffer = np.roll(raw_buffer, -1, axis=0)
raw_buffer[-1] = vals
sample_count += 1
if sample_count >= WINDOW_SIZE_SAMPLES and sample_count % HOP_SIZE_SAMPLES == 0:
# Extract features and normalize with session stats
feat = extractor.extract_features_window(raw_buffer)
feat = (feat - calib_mean) / calib_std
proba = classifier.model.predict_proba([feat])[0]
class_idx = int(np.argmax(proba))
gesture_name = classifier.label_names[class_idx]
confidence = float(proba[class_idx])
# Send gesture command to ESP32
cmd = f'{{"gesture":"{gesture_name}"}}\n'
ser.write(cmd.encode("utf-8"))
if gesture_name != last_gesture:
print(f"[Predict] {gesture_name:12s} conf={confidence:.2f}")
last_gesture = gesture_name
except KeyboardInterrupt:
print("\n[Stop] Sending stop command...")
ser.write(b'{"cmd":"stop"}\n')
ser.close()
if __name__ == "__main__":
main()
Dependencies (add to a requirements.txt in Bucky_Arm/ if not already there):
pyserial
numpy
scikit-learn
0.7 Firmware Cleanup: system_mode_t Removal
config.h lines 94–100 define a system_mode_t typedef that is not referenced anywhere
in the firmware. It predates the current device_state_t FSM in main.c and conflicts
conceptually with it. Remove before starting implementation work.
File: EMG_Arm/src/config/config.h
Remove (lines 93–100):
/**
* @brief System operating modes.
*/
typedef enum {
MODE_IDLE = 0, /**< Waiting for commands */
MODE_DATA_STREAM, /**< Streaming EMG data to laptop */
MODE_COMMAND, /**< Executing gesture commands from laptop */
MODE_DEMO, /**< Running demo sequence */
MODE_COUNT
} system_mode_t;
No other file references system_mode_t — the deletion is safe and requires no other changes.
PART I — SYSTEM FOUNDATIONS
1. Hardware Specification
ESP32-S3 N32R16V — Confirmed Hardware
| Resource | Spec | Implication |
|---|---|---|
| CPU | Dual-core Xtensa LX7 @ 240 MHz | Pin inference to Core 1, sampling to Core 0 |
| SIMD | PIE 128-bit vector extension | esp-dsp exploits this for FFT, biquad, dot-product |
| Internal SRAM | ~512 KB | All hot-path buffers, model weights, inference state |
| OPI PSRAM | 16 MB (~80 MB/s) | ADC ring buffer, raw window storage — not hot path |
| Flash | 32 MB | Code + read-only model flatbuffers (TFLM path) |
| ADC | 2× SAR ADC, 12-bit, continuous DMA mode | Change A: use adc_continuous driver |
Memory rules:
- Tag inference code with `IRAM_ATTR` — prevents cache-miss stalls
- Tag large ring buffers with `EXT_RAM_BSS_ATTR` — pushes them to PSRAM automatically
- Never run hot-path loops from PSRAM (latency varies; ~10× slower than SRAM)
Espressif Acceleration Libraries
| Library | Accelerates | Key Functions |
|---|---|---|
| esp-dsp | IIR biquad, FFT (up to 4096-pt), vector dot-product, matrix ops — PIE SIMD | dsps_biquad_f32, dsps_fft2r_fc32, dsps_dotprod_f32 |
| esp-nn | int8 FC, depthwise/pointwise Conv, activations — SIMD optimized | Used internally by esp-dl |
| esp-dl | High-level int8 inference: MLP, Conv1D, LSTM; activation buffer management | Small MLP / tiny CNN deployment |
| TFLite Micro | Standard int8 flatbuffer inference, tensor arena (static alloc) | Keras → TFLite → int8 workflow |
Real-Time Budget (1000 Hz, 25ms hop)
| Stage | Cost | Notes |
|---|---|---|
| ADC DMA sampling | ~0 µs | Hardware; CPU-free |
| IIR biquad (3 ch, 2 stages) | <100 µs | dsps_biquad_f32 |
| Feature extraction (69 feat) | ~1,200 µs | FFT-based features dominate |
| 3 specialist LDAs | ~150 µs | dsps_dotprod_f32 per class |
| Meta-LDA (15 inputs) | ~10 µs | 75 MACs total |
| int8 MLP fallback [69→32→16→5] | ~250 µs | esp-nn FC kernels |
| Post-processing | <50 µs | EMA, vote, debounce |
| Total (full ensemble) | ~1,760 µs | 14× margin within 25ms |
Hard No-Gos
| Technique | Why |
|---|---|
| Full MPF with matrix logarithm | Eigendecomposition per window; fragile float32; no SIMD path |
| Conv1D(16→512) + 3×LSTM(512) | ~4 MB weights; LSTM sequential dependency — impossible |
| Any transformer / attention | O(n²); no int8 transformer kernels for MCU |
| On-device gradient updates | Inference only — no training infrastructure |
| Heap allocations on hot path | FreeRTOS heap fragmentation kills determinism |
2. Current System Snapshot
| Aspect | Current State |
|---|---|
| Channels | 4 total; ch0–ch2 forearm (FCR, FCU, extensor), ch3 bicep (excluded from hand classifier) |
| Sampling | 1000 Hz, timer/polling (jitter — fix with Change A) |
| Window | 150 samples (150ms), 25-sample hop (25ms) |
| Features | 12: RMS, WL, ZC, SSC × 3 channels |
| Classifier | Single LDA, float32 weights in C header |
| Label alignment | RMS onset detection — missing +100ms forward shift (Change 0) |
| Normalization | Per-session z-score in Python; no on-device equivalent (Change D) |
| Smoothing | EMA (α=0.7) + majority vote (5) + debounce (3 counts) |
| Confidence rejection | None — always outputs a class (Change C) |
| Signal filtering | Analogue only via MyoWare (Change B adds software IIR) |
| Gestures | 5: fist, hook_em, open, rest, thumbs_up |
| Training data | 15 HDF5 sessions, 1 user |
2.1 — Confirmed Firmware Architecture (From Codebase Exploration)
Confirmed by direct codebase inspection 2026-02-24. All file paths relative to
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/
ADC Pin Mapping (drivers/emg_sensor.c)
| Channel | ADC Channel | GPIO | Muscle Location | Role in Classifier |
|---|---|---|---|---|
| ch0 | `ADC_CHANNEL_1` | GPIO 2 | Forearm Belly (FCR) | Primary flexion signal |
| ch1 | `ADC_CHANNEL_2` | GPIO 3 | Forearm Extensors | Extension signal |
| ch2 | `ADC_CHANNEL_8` | GPIO 9 | Forearm Contractors (FCU) | Ulnar flexion signal |
| ch3 | `ADC_CHANNEL_9` | GPIO 10 | Bicep | Independent — see Section 2.2 |
Current ADC driver: adc_oneshot (polling — NOT DMA continuous yet; Change A migrates this)
- Attenuation: `ADC_ATTEN_DB_12` (0–3.9V full-scale range)
- Calibration: `adc_cali_curve_fitting` scheme
- Output: calibrated millivolts as `uint16_t` packed into `emg_sample_t.channels[4]`
- Timing: `vTaskDelay(1)` in `run_inference_loop()` provides the ~1ms sample interval
Current Task Structure (app/main.c)
| Task | Priority | Stack | Core Pinning | Role |
|---|---|---|---|---|
| `app_main` (implicit) | Default | Default | None | Runs inference loop + state machine |
| `serial_input_task` | 5 | 4096 B | None | Parses UART JSON commands |
No other tasks exist. Change A will add adc_sampling_task pinned to Core 0.
The inference loop runs on app_main's default task — no explicit core affinity.
State Machine (app/main.c)
STATE_IDLE ─(BLE/UART connect)─► STATE_CONNECTED
│
{"cmd": "start"}▼
STATE_STREAMING (sends raw ADC over UART for Python)
│
{"cmd": "start_predict"}▼
STATE_PREDICTING (runs run_inference_loop())
Communication: UART at 921600 baud, JSON framing.
Complete Data Flow (Exact Function Names)
emg_sensor_read(&sample)
│ drivers/emg_sensor.c
│ adc_oneshot_read() × 4 channels → adc_cali_raw_to_voltage() → uint16_t mV
│ Result: sample.channels[4] = {ch0_mV, ch1_mV, ch2_mV, ch3_mV}
│
▼ Called every ~1ms (vTaskDelay(1) in run_inference_loop)
inference_add_sample(sample.channels)
│ core/inference.c
│ Writes to circular window_buffer[150][4]
│ Returns true when buffer is full (after first 150 samples)
│
▼ Called every 25 samples (stride_counter % INFERENCE_HOP_SIZE == 0)
inference_predict(&confidence)
│ core/inference.c
│ compute_features() → LDA scores → softmax → EMA → majority vote → debounce
│ Returns: gesture class index (int), fills confidence (float)
│
▼
inference_get_gesture_enum(class_idx)
│ core/inference.c
│ String match on MODEL_CLASS_NAMES[] → gesture_t enum value
│
▼
gestures_execute(gesture)
core/gestures.c
switch(gesture) → servo PWM via LEDC driver
Servo pins: GPIO 1,4,5,6,7 (Thumb, Index, Middle, Ring, Pinky)
Current Buffer State
// core/inference.c line 19:
static uint16_t window_buffer[INFERENCE_WINDOW_SIZE][NUM_CHANNELS];
// ^^^^^^^^ MUST change to float when adding IIR filter (Change B)
//
// uint16_t: 150 × 4 × 2 = 1,200 bytes in internal SRAM
// float: 150 × 4 × 4 = 2,400 bytes in internal SRAM (still trivially small)
//
// Reason for change: IIR filter outputs float; casting back to uint16_t loses
// sub-mV precision and re-introduces the quantization noise we just filtered out.
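For reference, the circular-window bookkeeping that `inference_add_sample()` performs (per the data flow above) reduces to a few lines. This is a simplified stand-in for illustration, not the actual inference.c source:

```c
#include <stdbool.h>
#include <stdint.h>

#define WIN 150   /* INFERENCE_WINDOW_SIZE */
#define NCH 4     /* NUM_CHANNELS */

static uint16_t window_buffer[WIN][NCH];
static int buffer_head  = 0;   /* next write slot = oldest sample */
static int samples_seen = 0;

/* Returns true once the first full 150-sample window has been collected. */
static bool add_sample_sketch(const uint16_t ch[NCH]) {
    for (int i = 0; i < NCH; i++) window_buffer[buffer_head][i] = ch[i];
    buffer_head = (buffer_head + 1) % WIN;
    if (samples_seen < WIN) samples_seen++;
    return samples_seen == WIN;
}
```

After the switch to float storage (Change B), only the buffer's element type changes; the head/seen bookkeeping stays identical.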
platformio.ini Current State (EMG_Arm/platformio.ini)
Current lib_deps: None — completely empty, no external library dependencies.
Required additions per change tier:
| Change | Library | platformio.ini lib_deps entry |
|---|---|---|
| B (IIR biquad) | esp-dsp | espressif/esp-dsp @ ^2.0.0 |
| 1 (FFT features) | esp-dsp | (same — add once for both B and 1) |
| E (int8 MLP) | TFLite Micro | tensorflow/tflite-micro |
| F (ensemble) | esp-dsp | (same as B) |
Add to platformio.ini under [env:esp32-s3-devkitc1-n16r16]:
lib_deps =
espressif/esp-dsp @ ^2.0.0
; tensorflow/tflite-micro ← add this only when implementing Change E
2.2 — Bicep Channel Subsystem (ch3 / ADC_CHANNEL_9 / GPIO 10)
Current Status
The bicep channel is:
- Sampled: `emg_sensor_read()` reads all 4 channels; `sample.channels[3]` holds bicep data
- Excluded from the hand classifier: `HAND_NUM_CHANNELS = 3`; `compute_features()` explicitly loops `ch = 0` to `ch < HAND_NUM_CHANNELS` (i.e., ch0, ch1, ch2 only)
- Not yet independently processed: the comment in `inference.c` line 68 ("ch3 (bicep) is excluded — it will be processed independently") is aspirational — the independent processing is not yet implemented
Phase 1 — Binary Flex/Unflex (Current Target)
Implement a simple RMS threshold detector as a new subsystem:
New files:
EMG_Arm/src/core/bicep.h
EMG_Arm/src/core/bicep.c
bicep.h:
#pragma once
#include <stdint.h>
#include <stdbool.h>
typedef enum {
BICEP_STATE_REST = 0,
BICEP_STATE_FLEX = 1,
} bicep_state_t;
// Call once at session start with ~3s of relaxed bicep data.
// Returns the computed threshold (also stored internally).
float bicep_calibrate(const uint16_t *ch3_samples, int n_samples);
// Call every 25ms (same hop as hand gesture inference).
// Computes RMS on the last BICEP_WINDOW_SAMPLES from the ch3 circular buffer.
bicep_state_t bicep_detect(void);
// Load/save threshold to NVS (reuse calibration.c infrastructure from Change D)
bool bicep_save_threshold(float threshold_mv);
bool bicep_load_threshold(float *threshold_mv_out);
Core logic (bicep.c):
#define BICEP_WINDOW_SAMPLES 50 // 50ms window at 1000Hz
#define BICEP_FLEX_MULTIPLIER 2.5f // threshold = rest_rms × 2.5
#define BICEP_HYSTERESIS 1.3f // prevents rapid toggling at threshold boundary
static float s_threshold_mv = 0.0f;
static bicep_state_t s_state = BICEP_STATE_REST;
float bicep_calibrate(const uint16_t *ch3_samples, int n_samples) {
float rms_sq = 0.0f;
for (int i = 0; i < n_samples; i++)
rms_sq += (float)ch3_samples[i] * ch3_samples[i];
float rest_rms = sqrtf(rms_sq / n_samples);
s_threshold_mv = rest_rms * BICEP_FLEX_MULTIPLIER;
printf("[Bicep] Calibrated: rest_rms=%.1f mV, threshold=%.1f mV\n",
rest_rms, s_threshold_mv);
return s_threshold_mv;
}
bicep_state_t bicep_detect(void) {
// Compute RMS on last BICEP_WINDOW_SAMPLES from ch3 circular buffer
// (ch3 values are stored in window_buffer[][3] alongside hand channels)
float rms_sq = 0.0f;
// buffer_head is the next write slot (the oldest sample), so the most
// recent BICEP_WINDOW_SAMPLES start BICEP_WINDOW_SAMPLES before it.
int idx = (buffer_head + INFERENCE_WINDOW_SIZE - BICEP_WINDOW_SAMPLES) % INFERENCE_WINDOW_SIZE;
for (int i = 0; i < BICEP_WINDOW_SAMPLES; i++) {
float v = (float)window_buffer[idx][3]; // ch3 = bicep
rms_sq += v * v;
idx = (idx + 1) % INFERENCE_WINDOW_SIZE;
}
float rms = sqrtf(rms_sq / BICEP_WINDOW_SAMPLES);
// Hysteresis: enter FLEX above threshold × BICEP_HYSTERESIS, drop back to REST below threshold × 1.0
if (s_state == BICEP_STATE_REST && rms > s_threshold_mv * BICEP_HYSTERESIS)
s_state = BICEP_STATE_FLEX;
else if (s_state == BICEP_STATE_FLEX && rms < s_threshold_mv)
s_state = BICEP_STATE_REST;
return s_state;
}
Integration in main.c run_inference_loop():
// Call alongside inference_predict() every 25ms:
if (stride_counter % INFERENCE_HOP_SIZE == 0) {
float confidence;
int class_idx = inference_predict(&confidence);
gesture_t gesture = inference_get_gesture_enum(class_idx);
bicep_state_t bicep = bicep_detect();
// Combined actuation: hand gesture + bicep state
// Example: bicep flex can enable/disable certain gestures,
// or control a separate elbow/wrist joint.
gestures_execute(gesture);
// bicep_actuate(bicep); ← add when elbow motor is wired
}
Calibration trigger (add to serial_input_task command parsing):
// {"cmd": "calibrate_bicep"} → collect 3s of rest data, call bicep_calibrate()
Phase 2 — Continuous Angle/Velocity Prediction (Future)
When ready to move beyond binary flex/unflex:
- Collect angle-labeled data: hold arm at 0°, 15°, 30°, 45°, 60°, 75°, 90°; log RMS at each; collect 5+ reps per angle.
- Fit polynomial: `angle = a0 + a1*rms + a2*rms²` (degree 2 is usually sufficient); use `numpy.polyfit(rms_values, angles, deg=2)` (note: polyfit returns coefficients highest degree first, i.e. [a2, a1, a0]).
- Store coefficients in NVS: 3 floats via `nvs_set_blob()`.
- On-device evaluation: `angle = a0 + rms*(a1 + rms*a2)` — 2 MACs per inference.
- Velocity: `velocity = (angle_now - angle_prev) / HOP_MS` with low-pass smoothing.
Including ch3 in Hand Gesture Classifier (for Wrist Rotation)
If/when wrist rotation or supination gestures are added:
# learning_data_collection.py — change this constant:
HAND_CHANNELS = [0, 1, 2, 3] # was [0, 1, 2]; include bicep for rotation gestures
Feature count becomes: 4 channels × 20 per-ch + 10 cross-ch covariances + 6 correlations = 96 total. The bicep subsystem is then retired and ch3 becomes part of the main gesture classifier.
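The feature-count arithmetic generalizes to any channel count; a small sketch that reproduces both the 3-channel (69) and 4-channel (96) totals:

```c
/* 20 per-channel features, upper-triangular covariances (incl. diagonal),
 * and off-diagonal correlations — the layout from Part I, Section 4. */
static int feature_count(int n_ch) {
    int per_ch = 20 * n_ch;
    int cov = n_ch * (n_ch + 1) / 2;   /* 6 for 3 ch, 10 for 4 ch */
    int cor = n_ch * (n_ch - 1) / 2;   /* 3 for 3 ch,  6 for 4 ch */
    return per_ch + cov + cor;
}
```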
3. What Meta Built — Filtered for ESP32
Meta's Nature 2025 paper (doi:10.1038/s41586-025-09255-w) describes a 16-channel wristband running Conv1D(16→512)+3×LSTM(512). That exact model is not portable to ESP32-S3 (~4 MB weights). What IS transferable:
| Meta Technique | Transferability | Where Used |
|---|---|---|
| +100ms forward label shift after onset detection | ✓ Direct copy | Change 0 |
| Frequency features > amplitude features (Extended Data Fig. 6) | ✓ Core insight | Change 1, Change 6 |
| Deliberate electrode repositioning between sessions | ✓ Protocol | Change 2 |
| Window jitter + amplitude augmentation | ✓ Training | Change 3 |
| Reinhard compression `64x/(32+\|x\|)` | ✓ | Change 4 |
| EMA α=0.7, threshold=0.35, debounce=50ms | ✓ Already implemented | Change C |
| Specialist features → meta-learner stacking | ✓ Adapted | Change 7 + F |
| Conv1D+LSTM architecture | ✗ Too large | Not implementable |
| Full MPF with matrix logarithm | ✗ Eigendecomp too costly | Not implementable |
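Of the transferable items, the Reinhard compression is a one-liner. A sketch of the `64x/(32+|x|)` mapping — the function name is illustrative, and wiring it into the feature path is Change 4:

```c
#include <math.h>

/* Reinhard-style range compression: near-identity around zero,
 * saturating toward ±64 for large |x|. */
static float reinhard_compress(float x) {
    return 64.0f * x / (32.0f + fabsf(x));
}
```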
4. Current Code State + Known Bugs
All Python changes: C:/VSCode/Marvel_Projects/Bucky_Arm/learning_data_collection.py
Firmware: C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.c
Config: C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/config/config.h
Weights: C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights.h
Key Symbol Locations
| Symbol | Line | Notes |
|---|---|---|
| Constants block | 49–94 | NUM_CHANNELS, SAMPLING_RATE_HZ, WINDOW_SIZE_MS, etc. |
| `align_labels_with_onset()` | 442 | RMS onset detection |
| `filter_transition_windows()` | 529 | Removes onset/offset ambiguity windows |
| `SessionStorage.save_session()` | 643 | Calls onset alignment, saves HDF5 |
| `SessionStorage.load_all_for_training()` | 871 | Returns 6 values (see bug below) |
| `EMGFeatureExtractor` class | 1404 | Current: RMS, WL, ZC, SSC only |
| `extract_features_single_channel()` | 1448 | Per-channel feature dict |
| `extract_features_window()` | 1482 | Flat array + cross-channel |
| `extract_features_batch()` | 1520 | Batch wrapper |
| `get_feature_names()` | 1545 | String names for features |
| `CalibrationTransform` class | 1562 | z-score at Python-side inference |
| `EMGClassifier` class | 1713 | LDA/QDA wrapper |
| `EMGClassifier.__init__()` | 1722 | Creates EMGFeatureExtractor |
| `EMGClassifier.train()` | 1735 | Feature extraction + model fit |
| `EMGClassifier._apply_session_normalization()` | 1774 | Per-session z-score |
| `EMGClassifier.cross_validate()` | 1822 | GroupKFold, trial-level |
| `EMGClassifier.save()` | 1910 | Persists model params |
| `EMGClassifier.export_to_header()` | 1956 | Writes model_weights.h |
| `EMGClassifier.load()` | 2089 | Reconstructs from saved params |
| `run_training_demo()` | 2333 | Main training entry point |
| inference.c `compute_features()` | 68 | C feature extraction |
| inference.c `inference_predict()` | 158 | C LDA + smoothing pipeline |
Pending Cleanups (Do Before Any Other Code Changes)
| Item | File | Action |
|---|---|---|
| Remove `system_mode_t` | config/config.h lines 93–100 | Delete the unused typedef (see Part 0, Section 0.7) |
| Add `EMG_STANDALONE` to enum | config/config.h line 19 | Add value to the existing MAIN_MODE enum |
| Add `STATE_LAPTOP_PREDICT` + `CMD_START_LAPTOP_PREDICT` | app/main.c | See Part 0, Section 0.5 for exact diffs |
| Add `run_standalone_loop()` | app/main.c | New function — see Part 0, Section 0.4 |
| Add `run_laptop_predict_loop()` | app/main.c | New function — see Part 0, Section 0.5 |
| Add `inference_get_gesture_by_name()` | core/inference.c + core/inference.h | Small helper — extracts existing strcmp logic |
Known Bug — Line 2382
# BUG: load_all_for_training() returns 6 values; this call unpacks only 5.
# session_indices_combined is silently dropped — breaks per-session normalization.
X, y, trial_ids, label_names, loaded_sessions = storage.load_all_for_training()
# FIX (apply with Change 1):
X, y, trial_ids, session_indices, label_names, loaded_sessions = storage.load_all_for_training()
Current model_weights.h State (as of 2026-02-14 training run)
| Constant | Value | Note |
|---|---|---|
| `MODEL_NUM_CLASSES` | 5 | fist, hook_em, open, rest, thumbs_up |
| `MODEL_NUM_FEATURES` | 12 | RMS, WL, ZC, SSC × 3 forearm channels |
| `MODEL_CLASS_NAMES` | `{"fist","hook_em","open","rest","thumbs_up"}` | Alphabetical order |
| `MODEL_NORMALIZE_FEATURES` | not defined yet | Add when enabling cross-ch norm (Change B) |
| `MODEL_USE_REINHARD` | not defined yet | Add when enabling Reinhard compression (Change 4) |
| `FEAT_ZC_THRESH` | `0.1f` | Fraction of RMS for zero-crossing threshold |
| `FEAT_SSC_THRESH` | `0.1f` | Fraction of RMS for slope sign change threshold |
The LDA_WEIGHTS and LDA_INTERCEPTS arrays are current trained values — do not modify manually.
They are regenerated by EMGClassifier.export_to_header() after each training run.
Current Feature Vector (12 features — firmware contract)
ch0: [0]=rms [1]=wl [2]=zc [3]=ssc
ch1: [4]=rms [5]=wl [6]=zc [7]=ssc
ch2: [8]=rms [9]=wl [10]=zc [11]=ssc
Target Feature Vector (69 features after Change 1)
Per channel (×3 channels, 20 features each):
[0] rms [1] wl [2] zc [3] ssc [4] mav [5] var
[6] iemg [7] wamp [8] ar1 [9] ar2 [10] ar3 [11] ar4
[12] mnf [13] mdf [14] pkf [15] mnp [16] bp0 [17] bp1
[18] bp2 [19] bp3
ch0: indices 0–19
ch1: indices 20–39
ch2: indices 40–59
Cross-channel (9 features):
[60] cov_ch0_ch0 [61] cov_ch0_ch1 [62] cov_ch0_ch2
[63] cov_ch1_ch1 [64] cov_ch1_ch2 [65] cov_ch2_ch2
[66] cor_ch0_ch1 [67] cor_ch0_ch2 [68] cor_ch1_ch2
Specialist Feature Subset Indices (for Change F + Change 7)
TD (time-domain, 36 feat): indices [0–11, 20–31, 40–51]
FD (frequency-domain, 24 feat): indices [12–19, 32–39, 52–59]
CC (cross-channel, 9 feat): indices [60–68]
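The layouts above can be generated and checked programmatically. A minimal sketch (feature-name strings are illustrative — the authoritative list comes from `get_feature_names()`):

```python
# Sketch: reproduce the 69-feature index layout and the TD/FD/CC subsets.
PER_CH = ["rms", "wl", "zc", "ssc", "mav", "var", "iemg", "wamp",
          "ar1", "ar2", "ar3", "ar4",                              # time-domain (12)
          "mnf", "mdf", "pkf", "mnp", "bp0", "bp1", "bp2", "bp3"]  # freq-domain (8)
N_CH = 3

names = [f"ch{c}_{f}" for c in range(N_CH) for f in PER_CH]
names += ["cov_ch0_ch0", "cov_ch0_ch1", "cov_ch0_ch2",
          "cov_ch1_ch1", "cov_ch1_ch2", "cov_ch2_ch2",
          "cor_ch0_ch1", "cor_ch0_ch2", "cor_ch1_ch2"]

# Specialist subsets (Change F + Change 7)
TD = [c * len(PER_CH) + i for c in range(N_CH) for i in range(12)]
FD = [c * len(PER_CH) + i for c in range(N_CH) for i in range(12, 20)]
CC = list(range(N_CH * len(PER_CH), len(names)))

assert len(names) == 69
assert (len(TD), len(FD), len(CC)) == (36, 24, 9)
assert TD[:3] == [0, 1, 2] and TD[12] == 20 and FD[0] == 12 and CC[0] == 60
```

Generating the subsets this way (rather than hard-coding index lists) keeps them correct if the per-channel feature order or channel count ever changes.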
PART II — TARGET ARCHITECTURE
5. Full Recommended Multi-Model Stack
ADC (DMA, Change A)
└── IIR Biquad filter per channel (Change B)
└── 150-sample circular window buffer
│
▼ [every 25ms]
compute_features() → 69-feature vector
│
▼
calibration_apply() (Change D — NVS z-score)
│
├─── Stage 1: Activity Gate ──────────────────────────────────┐
│ total_rms < REST_THRESHOLD? → return GESTURE_REST │
│ (skips all inference during obvious idle) │
│ │
▼ (only reached when gesture is active) │
Stage 2: Parallel Specialist LDAs (Change F) │
├── LDA_TD [TD features, 36-dim] → prob_td[5] │
├── LDA_FD [FD features, 24-dim] → prob_fd[5] │
└── LDA_CC [CC features, 9-dim] → prob_cc[5] │
│
▼ │
Stage 3: Meta-LDA stacker (Change F) │
input: [prob_td | prob_fd | prob_cc] (15-dim) │
output: meta_probs[5] │
│
▼ │
EMA smoothing (α=0.7) on meta_probs │
│ │
├── max smoothed prob ≥ 0.50? ────── Yes ──────────────────┐ │
│ │ │
└── No: Stage 4 Confidence Cascade (Change E) │ │
run int8 MLP on full 69-feat vector │ │
use higher-confidence winner │ │
│ │ │
└────────────────────────────────────────────►│ │
│ │
◄────────────────────────────────────────────────────────── │ │
│ ◄─┘
▼
Stage 5: Confidence rejection (Change C)
max_prob < 0.40? → return current_output (hold / GESTURE_NONE)
│
▼
Majority vote (window=5) + Debounce (count=3)
│
▼
final gesture → actuation
Model Weight Footprint
| Model | Input Dim | Weights | Memory (float32) |
|---|---|---|---|
| LDA_TD | 36 | 5×36 = 180 | 720 B |
| LDA_FD | 24 | 5×24 = 120 | 480 B |
| LDA_CC | 9 | 5×9 = 45 | 180 B |
| Meta-LDA | 15 | 5×15 = 75 | 300 B |
| int8 MLP [69→32→16→5] | 69 | ~2,900 | ~2.9 KB int8 |
| Total | | | ~4.6 KB |
All model weights fit comfortably in internal SRAM.
6. Compute Budget for Full Stack
| Stage | Cost | Cumulative |
|---|---|---|
| Feature extraction (69 feat, 128-pt FFT ×3) | 1,200 µs | 1,200 µs |
| NVS calibration apply | 10 µs | 1,210 µs |
| Activity gate (RMS check) | 5 µs | 1,215 µs |
| LDA_TD (36 feat × 5 classes) | 50 µs | 1,265 µs |
| LDA_FD (24 feat × 5 classes) | 35 µs | 1,300 µs |
| LDA_CC (9 feat × 5 classes) | 15 µs | 1,315 µs |
| Meta-LDA (15 feat × 5 classes) | 10 µs | 1,325 µs |
| EMA + confidence check | 10 µs | 1,335 µs |
| int8 MLP (worst case, ~30% of hops) | 250 µs | 1,585 µs |
| Vote + debounce | 20 µs | 1,605 µs |
| Worst-case total | 1,605 µs | ~6% of 25ms budget |
7. Why This Architecture Works for 3-Channel EMG
Three channels means limited spatial information. The ensemble compensates by extracting maximum diversity from the temporal and spectral dimensions:
- LDA_TD specializes in muscle activation intensity and dynamics (how hard and fast is each muscle firing)
- LDA_FD specializes in muscle activation frequency content (motor unit recruitment patterns — slow vs. fast twitch fibres fire at different frequencies)
- LDA_CC specializes in inter-muscle coordination (which muscles co-activate — the spatial "fingerprint" of each gesture)
These three signal aspects are partially uncorrelated. A gesture that confuses LDA_TD (similar amplitude patterns) may be distinguishable by LDA_FD (different frequency recruitment) or LDA_CC (different co-activation pattern). The meta-LDA learns which specialist to trust for each gesture boundary.
The int8 MLP fallback handles the residual nonlinear cases: gesture pairs where the decision boundary is curved in feature space, which LDA (linear boundary only) cannot resolve.
PART III — GESTURE EXTENSIBILITY
8. What Changes When Adding or Removing a Gesture
The system is designed for extensibility. Adding a gesture requires a retrain plus a handful of small firmware edits: one enum value, one name string, a strcmp mapping, a switch case, and the actuation function.
What Changes Automatically (No Manual Code Edits)
| Component | How it adapts |
|---|---|
| `MODEL_NUM_CLASSES` in `model_weights.h` | Auto-computed from training data label count |
| LDA weight array dimensions | `[MODEL_NUM_CLASSES][MODEL_NUM_FEATURES]` — regenerated by `export_to_header()` |
| `MODEL_CLASS_NAMES` array | Regenerated by `export_to_header()` |
| All ensemble LDA weight arrays | Regenerated by `export_ensemble_header()` (Change 7) |
| int8 MLP output layer | Retrained with new class count; re-exported to TFLite |
| Meta-LDA input/output dims | `META_NUM_INPUTS = 3 × MODEL_NUM_CLASSES` — auto from Python |
What Requires Manual Code Changes
Python side (learning_data_collection.py):
# 1. Add gesture name to the gesture list (1 line)
# Find where GESTURES or similar list is defined (near constants block ~line 49)
GESTURES = ['fist', 'hook_em', 'open', 'rest', 'thumbs_up', 'wrist_flex'] # example
Firmware — config.h (1 line per gesture):
// Add enum value
typedef enum {
GESTURE_NONE = 0,
GESTURE_REST = 1,
GESTURE_FIST = 2,
GESTURE_OPEN = 3,
GESTURE_HOOK_EM = 4,
GESTURE_THUMBS_UP = 5,
GESTURE_WRIST_FLEX = 6, // ← add this line
} gesture_t;
Firmware — inference.c inference_get_gesture_enum() (2–3 lines per gesture):
if (strcmp(name, "wrist_flex") == 0 || strcmp(name, "WRIST_FLEX") == 0)
return GESTURE_WRIST_FLEX;
Firmware — gestures.c (2 changes — these are easy to miss):
// 1. Add to gesture_names[] static array — index MUST match gesture_t enum value:
static const char *gesture_names[GESTURE_COUNT] = {
"NONE", // GESTURE_NONE = 0
"REST", // GESTURE_REST = 1
"FIST", // GESTURE_FIST = 2
"OPEN", // GESTURE_OPEN = 3
"HOOK_EM", // GESTURE_HOOK_EM = 4
"THUMBS_UP", // GESTURE_THUMBS_UP = 5
"WRIST_FLEX", // GESTURE_WRIST_FLEX = 6 ← add here
};
// 2. Add case to gestures_execute() switch statement:
case GESTURE_WRIST_FLEX:
gesture_wrist_flex(); // implement the actuation function
break;
Critical: GESTURE_COUNT at the end of the gesture_t enum in config.h is used as the
array size for gesture_names[]. It updates automatically when new enum values are added before
it. Both gesture_names[GESTURE_COUNT] and the switch statement must be kept in sync with
GESTURE_COUNT. Mismatch causes a bounds-overrun or silent misclassification.
Complete Workflow for Adding a Gesture
1. Python: add gesture string to GESTURES list in learning_data_collection.py (1 line)
2. Data: collect ≥10 sessions × ≥30 reps of new gesture
(follow Change 2 protocol: vary electrode placement between sessions)
3. Train: python learning_data_collection.py → option 3
OR: python train_ensemble.py (after Change 7 is implemented)
4. Export: export_to_header() OR export_ensemble_header()
→ overwrites model_weights.h / model_weights_ensemble.h with new class count
5. config.h: add enum value before GESTURE_COUNT (1 line):
GESTURE_WRIST_FLEX = 6, // ← insert before GESTURE_COUNT
GESTURE_COUNT // stays last — auto-counts
6. inference.c: add string mapping in inference_get_gesture_enum() (2 lines)
7. gestures.c: add name to gesture_names[] array at correct index (1 line)
8. gestures.c: add case to gestures_execute() switch statement (3 lines)
9. Implement actuation function for new gesture (servo angles)
10. Reflash and validate: pio run -t upload
Exact files touched per new gesture (summary):
| File | What to change |
|---|---|
| `learning_data_collection.py` | Add string to `GESTURES` list |
| `config/config.h` | Add enum value before `GESTURE_COUNT` |
| `core/inference.c` | Add strcmp case in `inference_get_gesture_enum()` |
| `core/gestures.c` | Add to `gesture_names[]` array + add switch case |
| `core/gestures.c` | Implement `gesture_<name>()` function with servo angles |
| `core/model_weights.h` | Auto-generated — do not edit manually |
Removing a Gesture
Removing is the same process in reverse, with one additional step: filter the HDF5 training
data to exclude sessions that contain the removed gesture's label. The simplest approach is
to pass a label whitelist to load_all_for_training():
# Proposed addition to load_all_for_training() — add include_labels parameter
X, y, trial_ids, session_indices, label_names, sessions = \
storage.load_all_for_training(include_labels=['fist', 'open', 'rest', 'thumbs_up'])
# hook_em removed — existing session files are not modified
9. Practical Limits of 3-Channel EMG
This is the most important constraint for gesture count:
| Gesture Count | Expected Accuracy | Notes |
|---|---|---|
| 3–5 gestures | >90% achievable | Current baseline target |
| 6–8 gestures | 80–90% achievable | Requires richer features + ensemble |
| 9–12 gestures | 65–80% achievable | Diminishing returns; some pairs will be confused |
| 13+ gestures | <65% | Surface EMG with 3 channels cannot reliably separate this many |
Why 3 channels limits gesture count: Surface EMG captures the summed electrical activity of many motor units under each electrode. With only 3 spatial locations, gestures that recruit overlapping muscle groups (e.g., all finger-flexion gestures recruit FCR) produce similar signals. The frequency and coordination features from Change 1 help, but there's a hard information-theoretic limit imposed by channel count.
Rule of thumb: aim for ≤8 gestures with the current 3-channel setup. For more, add the bicep channel (ch3, currently excluded) to get 4 channels — see Section 10.
10. Specific Gesture Considerations
Wrist Flexion / Extension
- Feasibility: High — FCR (ch0) activates strongly for flexion; extensor group (ch2) for extension
- Differentiation from finger gestures: frequency content differs (wrist involves slower motor units)
- Recommendation: Add these before wrist rotation — more reliable with surface EMG
Wrist Rotation (Supination / Pronation)
- Feasibility: Medium — the primary supinator is a deep muscle; surface electrodes capture it weakly
- Key helper: the bicep activates strongly during supination → include ch3 (`HAND_CHANNELS = [0, 1, 2, 3]`)
- Code change for 4 channels: Python: `HAND_CHANNELS = [0, 1, 2, 3]`; firmware: `HAND_NUM_CHANNELS` auto-updates from the exported header since `MODEL_NUM_FEATURES` is recalculated
- Caveat: pronation vs. rest may be harder to distinguish than supination vs. rest
Pinch / Precision Grasp
- Feasibility: Medium — involves intrinsic hand muscles poorly captured by forearm electrodes
- Likely confused with open hand depending on electrode placement
- Collect with careful placement; validate cross-session accuracy before relying on it
Including ch3 (Bicep) for Wrist Gestures
To include the bicep channel in the hand gesture classifier:
# learning_data_collection.py — change this constant
HAND_CHANNELS = [0, 1, 2, 3] # was [0, 1, 2] — add bicep channel
Feature count: 4 channels × 20 per-channel features + 10 cross-channel covariances + 6 correlations = 96 total features. The ensemble architecture handles this automatically — specialist LDA weight dimensions recalculate at training time.
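The feature-count arithmetic generalizes to any channel count. A quick sketch of the formula (constants follow the Change 1 layout above):

```python
def total_features(n_ch: int, per_ch: int = 20) -> int:
    """Per-channel features + unique covariance entries (upper triangle,
    including the diagonal) + unique correlation pairs."""
    cov = n_ch * (n_ch + 1) // 2   # covariance entries, e.g. 6 for 3 channels
    cor = n_ch * (n_ch - 1) // 2   # correlation pairs, e.g. 3 for 3 channels
    return n_ch * per_ch + cov + cor

assert total_features(3) == 69   # current 3-channel layout
assert total_features(4) == 96   # with bicep channel ch3 included
```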
PART IV — CHANGE REFERENCE
11. Change Classification Matrix
| Change | Category | Priority | Files | ESP32 Reflash? | Retrain? | Risk |
|---|---|---|---|---|---|---|
| C | Firmware | Tier 1 | inference.c | ✓ | No | Very Low |
| B | Firmware | Tier 1 | inference.c / filter.c | ✓ | No | Low |
| A | Firmware | Tier 1 | adc_sampling.c | ✓ | No | Medium |
| 0 | Python | Tier 1 | learning_data_collection.py | No | ✓ | Low |
| 1 | Python+C | Tier 2 | learning_data_collection.py + inference.c | ✓ after | ✓ | Medium |
| D | Firmware | Tier 2 | calibration.c/.h | ✓ | No | Medium |
| 2 | Protocol | Tier 2 | None | No | ✓ new data | None |
| 3 | Python | Tier 2 | learning_data_collection.py | No | ✓ | Low |
| E | Python+FW | Tier 3 | train_mlp_tflite.py + firmware | ✓ | ✓ | High |
| 4 | Python+C | Tier 3 | learning_data_collection.py + inference.c | ✓ if enabled | ✓ | Low |
| 5 | Python | Tier 3 | learning_data_collection.py | No | No | None |
| 6 | Python | Tier 3 | learning_data_collection.py | No | ✓ | Low |
| 7 | Python | Tier 3 | new: train_ensemble.py | No | ✓ | Medium |
| F | Firmware | Tier 3 | new: inference_ensemble.c | ✓ | No (needs 7 first) | Medium |
Recommended implementation order: C → B → A → 0 → 1 → D → 2 → 3 → 5 (benchmark) → 7+F → E
PART V — FIRMWARE CHANGES
Change A — DMA-Driven ADC Sampling (Migration from adc_oneshot to adc_continuous)
Priority: Tier 1
Current driver: adc_oneshot_read() polling in drivers/emg_sensor.c. Timing is
controlled by vTaskDelay(1) in run_inference_loop() — subject to FreeRTOS scheduler
jitter of ±0.5–1ms, which corrupts frequency-domain features and ADC burst grouping.
Why: adc_continuous runs entirely in hardware DMA. Sample-to-sample jitter drops from
±1ms to <10µs. CPU overhead between samples is zero. Required for frequency features (Change 1).
Effort: 2–4 hours (replace emg_sensor_read() internals; keep public API the same)
ESP-IDF ADC Continuous API
// --- Initialize (call once at startup) ---
adc_continuous_handle_t adc_handle = NULL;
adc_continuous_handle_cfg_t adc_cfg = {
.max_store_buf_size = 4096, // PSRAM ring buffer size (bytes)
.conv_frame_size = 256, // bytes per conversion frame
};
adc_continuous_new_handle(&adc_cfg, &adc_handle);
// Actual hardware channel mapping (from emg_sensor.c):
// ch0 = ADC_CHANNEL_1 / GPIO 2 (Forearm Belly / FCR)
// ch1 = ADC_CHANNEL_2 / GPIO 3 (Forearm Extensors)
// ch2 = ADC_CHANNEL_8 / GPIO 9 (Forearm Contractors / FCU)
// ch3 = ADC_CHANNEL_9 / GPIO 10 (Bicep — independent subsystem)
adc_digi_pattern_config_t chan_cfg[4] = {
{.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_1, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
{.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_2, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
{.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_8, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
{.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_9, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
};
adc_continuous_config_t cont_cfg = {
.sample_freq_hz = 4000, // 4 channels × 1000 Hz = 4000 total samples/sec
.conv_mode = ADC_CONV_SINGLE_UNIT_1,
.format = ADC_DIGI_OUTPUT_FORMAT_TYPE2,
.pattern_num = 4,
.adc_pattern = chan_cfg,
};
adc_continuous_config(adc_handle, &cont_cfg);
// --- ISR callback (fires each frame) ---
static SemaphoreHandle_t s_adc_sem;
static bool IRAM_ATTR adc_conv_done_cb(
adc_continuous_handle_t handle,
const adc_continuous_evt_data_t *edata, void *user_data) {
BaseType_t hp_woken = pdFALSE;
xSemaphoreGiveFromISR(s_adc_sem, &hp_woken);
return hp_woken == pdTRUE;
}
adc_continuous_evt_cbs_t cbs = { .on_conv_done = adc_conv_done_cb };
adc_continuous_register_event_callbacks(adc_handle, &cbs, NULL);
adc_continuous_start(adc_handle);
// --- ADC calibration (apply per sample) ---
adc_cali_handle_t cali_handle;
adc_cali_curve_fitting_config_t cali_cfg = {
.unit_id = ADC_UNIT_1,
.atten = ADC_ATTEN_DB_12, // matches ADC_ATTEN_DB_12 used in current emg_sensor.c
.bitwidth = ADC_BITWIDTH_12,
};
adc_cali_create_scheme_curve_fitting(&cali_cfg, &cali_handle);
// --- Sampling task (pin to Core 0) ---
void adc_sampling_task(void *arg) {
uint8_t result_buf[256];
uint32_t out_len = 0;
while (1) {
xSemaphoreTake(s_adc_sem, portMAX_DELAY);
adc_continuous_read(adc_handle, result_buf, sizeof(result_buf), &out_len, 0);
// Parse: each entry is adc_digi_output_data_t
// Apply adc_cali_raw_to_voltage() for each sample
// Apply IIR filter (Change B) → post to inference ring buffer
}
}
Verify: log consecutive sample timestamps via esp_timer_get_time(); spacing should be 1.0ms ± 0.05ms.
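The verification step can be scripted on the laptop side. A minimal sketch, assuming `timestamps_us` holds `esp_timer_get_time()` values (microseconds) captured over serial — the function name is illustrative:

```python
def check_sample_spacing(timestamps_us, target_us=1000.0, tol_us=50.0):
    """Compute inter-sample gaps and check them against the
    1.0 ms ± 0.05 ms criterion. Returns (mean gap, worst deviation, pass)."""
    gaps = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    mean = sum(gaps) / len(gaps)
    worst = max(abs(g - target_us) for g in gaps)
    return mean, worst, worst <= tol_us

# Example: a DMA-paced log with at most 2 µs of deviation passes easily
mean, worst, ok = check_sample_spacing([0, 1000, 2001, 2999, 4000])
# mean == 1000.0, worst == 2, ok is True
```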
Change B — IIR Biquad Bandpass Filter
Priority: Tier 1
Why: MyoWare analogue filters are not tunable. A software IIR bandpass removes sub-20 Hz motion artifact and high-frequency noise above the EMG band — both of which inflate ZC, WL, and other features computed at rest. (Note that 50/60 Hz powerline interference sits inside the EMG passband, so this bandpass does not remove it; add a dedicated notch section if mains hum proves problematic.)
Effort: 2 hours
Step 1 — Compute Coefficients in Python (one-time, offline)
from scipy.signal import butter
import numpy as np
fs = 1000.0
# NOTE: the upper band edge must be strictly below Nyquist (fs/2 = 500 Hz),
# or scipy raises ValueError — use 450 Hz for headroom.
sos = butter(N=2, Wn=[20.0, 450.0], btype='bandpass', fs=fs, output='sos')
# sos[i] = [b0, b1, b2, a0, a1, a2]
# esp-dsp Direct Form II convention: coeffs = [b0, b1, b2, -a1, -a2]
for i, s in enumerate(sos):
    b0, b1, b2, a0, a1, a2 = s
    print(f"Section {i}: {b0:.8f}f, {b1:.8f}f, {b2:.8f}f, {-a1:.8f}f, {-a2:.8f}f")
# Run this and paste the printed values into the C constants below
Step 2 — Add to inference.c (after includes, before // --- State ---)
#include "dsps_biquad.h"
// 2nd-order Butterworth bandpass 20–450 Hz @ 1000 Hz (upper edge kept below Nyquist)
// Coefficients: [b0, b1, b2, -a1, -a2] — Direct Form II, esp-dsp sign convention
// Regenerate with: scipy.signal.butter(N=2, Wn=[20,450], btype='bandpass', fs=1000, output='sos')
static const float BIQUAD_HP_COEFFS[5] = { /* paste section 0 output here */ };
static const float BIQUAD_LP_COEFFS[5] = { /* paste section 1 output here */ };
// Filter delay state: 3 channels × 2 stages × 2 delay elements = 12 floats (48 bytes)
static float biquad_hp_w[HAND_NUM_CHANNELS][2];
static float biquad_lp_w[HAND_NUM_CHANNELS][2];
Add to inference_init():
memset(biquad_hp_w, 0, sizeof(biquad_hp_w));
memset(biquad_lp_w, 0, sizeof(biquad_lp_w));
Step 3 — Apply Per Sample (called before writing to window_buffer)
// Apply to each channel before posting to the window buffer.
// Must be called IN ORDER for each sample (IIR has memory across calls).
static float IRAM_ATTR apply_bandpass(int ch, float raw) {
float hp_out, lp_out;
dsps_biquad_f32(&raw, &hp_out, 1, (float *)BIQUAD_HP_COEFFS, biquad_hp_w[ch]);
dsps_biquad_f32(&hp_out, &lp_out, 1, (float *)BIQUAD_LP_COEFFS, biquad_lp_w[ch]);
return lp_out;
}
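For offline validation of the filter path, the per-sample stateful update that `apply_bandpass()` performs can be mirrored in pure Python and compared against `scipy.signal.sosfilt`. A sketch using the textbook Direct Form II update with scipy's sign convention (a1/a2 subtracted — the esp-dsp coefficient array stores them pre-negated, per the convention above):

```python
def biquad_df2(x, b0, b1, b2, a1, a2, w):
    """One Direct Form II biquad step (scipy convention: a1/a2 subtracted).
    `w` is the 2-element delay state, mutated in place — calls must be made
    in sample order, exactly like the C apply_bandpass() helper."""
    w0 = x - a1 * w[0] - a2 * w[1]
    y = b0 * w0 + b1 * w[0] + b2 * w[1]
    w[1] = w[0]
    w[0] = w0
    return y

# Simple smoother y[n] = 0.5*x[n] + 0.5*y[n-1]  (b = [0.5, 0, 0], a1 = -0.5)
state = [0.0, 0.0]
out = [biquad_df2(x, 0.5, 0.0, 0.0, -0.5, 0.0, state) for x in [1.0, 0.0, 0.0]]
# out == [0.5, 0.25, 0.125] — impulse decays geometrically as expected
```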
Note: window_buffer stores uint16_t — change to float when adding this filter, so
filtered values are stored directly without lossy integer round-trip.
Verify: log ZC count at rest before and after — filtered ZC should be substantially lower (less spurious noise crossings).
Change C — Confidence Rejection
Priority: Tier 1 — implement this first, lowest risk of all changes
Why: Without a rejection threshold, ambiguous EMG (rest-to-gesture transition, mid-gesture fatigue, electrode lift) always produces a false actuation.
Effort: 15 minutes
Step 1 — Add Constant (top of inference.c with other constants)
#define CONFIDENCE_THRESHOLD 0.40f // Reject when max smoothed prob < this.
// Meta paper uses 0.35; 0.40 adds prosthetic safety margin.
// Tune: lower to 0.35 if real gestures are being rejected.
Step 2 — Insert After EMA Block in inference_predict() (after line 214)
// Confidence rejection: if the peak smoothed probability is below threshold,
// hold the last confirmed output rather than outputting an uncertain prediction.
// Prevents false actuations during gesture transitions and electrode artifacts.
if (max_smoothed_prob < CONFIDENCE_THRESHOLD) {
*confidence = max_smoothed_prob;
return current_output; // -1 (GESTURE_NONE) until first confident prediction
}
Verify: arm at complete rest → confirm output stays at GESTURE_NONE and confidence logs below 0.40. Deliberate fist → confidence rises above 0.40 within 1–3 inference cycles.
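The hold-last-output behaviour is easy to sanity-check off-device. A small pure-Python sketch of the same rule (names illustrative):

```python
CONFIDENCE_THRESHOLD = 0.40

def reject_filter(stream, threshold=CONFIDENCE_THRESHOLD):
    """Mirror of the Change C rule: latch a new prediction only when its
    confidence clears the threshold; otherwise hold the last confirmed one
    (-1 == GESTURE_NONE before any confident prediction)."""
    current = -1
    out = []
    for pred, conf in stream:
        if conf >= threshold:
            current = pred
        out.append(current)
    return out

# transition noise (conf 0.30) is held; a confident fist (0.70) latches
assert reject_filter([(2, 0.30), (2, 0.70), (0, 0.35), (0, 0.55)]) == [-1, 2, 2, 0]
```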
Change D — On-Device NVS Calibration
Priority: Tier 2
Why: Python CalibrationTransform only runs during training. On-device NVS calibration
lets the ESP32 recalibrate z-score normalization at startup (3 seconds of REST) without
retraining — solving placement drift and day-to-day impedance variation.
Effort: 3–4 hours
New Files
EMG_Arm/src/core/calibration.h
EMG_Arm/src/core/calibration.c
calibration.h
#pragma once
#include <stdbool.h>
#include "config/config.h"
#define CALIB_MAX_FEATURES 96 // supports up to 4-channel expansion
bool calibration_init(void); // load from NVS at startup
void calibration_apply(float *feat); // z-score in-place; no-op if not calibrated
bool calibration_update(const float X[][CALIB_MAX_FEATURES], int n_windows, int n_feat);
void calibration_reset(void);
bool calibration_is_valid(void);
calibration.c
#include "calibration.h"
#include "nvs_flash.h"
#include "nvs.h"
#include <math.h>
#include <string.h>
#include <stdio.h>
#define NVS_NAMESPACE "emg_calib"
#define NVS_KEY_MEAN "feat_mean"
#define NVS_KEY_STD "feat_std"
#define NVS_KEY_NFEAT "n_feat"
#define NVS_KEY_VALID "calib_ok"
static float s_mean[CALIB_MAX_FEATURES];
static float s_std[CALIB_MAX_FEATURES];
static int s_n_feat = 0;
static bool s_valid = false;
bool calibration_init(void) {
esp_err_t err = nvs_flash_init();
if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
nvs_flash_erase();
nvs_flash_init();
}
nvs_handle_t h;
if (nvs_open(NVS_NAMESPACE, NVS_READONLY, &h) != ESP_OK) return false;
uint8_t valid = 0;
size_t mean_sz = sizeof(s_mean), std_sz = sizeof(s_std);
bool ok = (nvs_get_u8(h, NVS_KEY_VALID, &valid) == ESP_OK) && (valid == 1) &&
(nvs_get_i32(h, NVS_KEY_NFEAT, (int32_t*)&s_n_feat) == ESP_OK) &&
(nvs_get_blob(h, NVS_KEY_MEAN, s_mean, &mean_sz) == ESP_OK) &&
(nvs_get_blob(h, NVS_KEY_STD, s_std, &std_sz) == ESP_OK);
nvs_close(h);
s_valid = ok;
printf("[Calib] %s (%d features)\n", ok ? "Loaded from NVS" : "Not found — identity", s_n_feat);
return ok;
}
void calibration_apply(float *feat) {
if (!s_valid) return;
for (int i = 0; i < s_n_feat; i++)
feat[i] = (feat[i] - s_mean[i]) / s_std[i];
}
bool calibration_update(const float X[][CALIB_MAX_FEATURES], int n_windows, int n_feat) {
if (n_windows < 10 || n_feat > CALIB_MAX_FEATURES) return false;
s_n_feat = n_feat;
memset(s_mean, 0, sizeof(s_mean));
for (int w = 0; w < n_windows; w++)
for (int f = 0; f < n_feat; f++)
s_mean[f] += X[w][f];
for (int f = 0; f < n_feat; f++) s_mean[f] /= n_windows;
memset(s_std, 0, sizeof(s_std));
for (int w = 0; w < n_windows; w++)
for (int f = 0; f < n_feat; f++) {
float d = X[w][f] - s_mean[f];
s_std[f] += d * d;
}
for (int f = 0; f < n_feat; f++) {
s_std[f] = sqrtf(s_std[f] / n_windows);
if (s_std[f] < 1e-6f) s_std[f] = 1e-6f;
}
nvs_handle_t h;
if (nvs_open(NVS_NAMESPACE, NVS_READWRITE, &h) != ESP_OK) return false;
nvs_set_blob(h, NVS_KEY_MEAN, s_mean, sizeof(s_mean));
nvs_set_blob(h, NVS_KEY_STD, s_std, sizeof(s_std));
nvs_set_i32(h, NVS_KEY_NFEAT, n_feat);
nvs_set_u8(h, NVS_KEY_VALID, 1);
nvs_commit(h);
nvs_close(h);
s_valid = true;
printf("[Calib] Updated from %d REST windows, %d features\n", n_windows, n_feat);
return true;
}
Integration in inference.c
In inference_predict(), after compute_features(features), before LDA:
calibration_apply(features); // z-score using NVS-stored mean/std
Startup Flow
// In main application startup sequence:
calibration_init(); // load from NVS; no-op if not present yet
// When user triggers recalibration (button press or serial command):
// Collect ~120 REST windows (~3 seconds at 25ms hop)
// Call calibration_update(rest_feature_buffer, 120, MODEL_NUM_FEATURES)
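The statistics in `calibration_update()` can be cross-checked against a Python mirror before trusting on-device values. A sketch with the same population-std formula and `1e-6` floor:

```python
import math

def fit_zscore(windows):
    """Mirror of calibration_update(): per-feature mean and population std
    over REST windows, with the same 1e-6 floor on std."""
    n = len(windows)
    n_feat = len(windows[0])
    mean = [sum(w[f] for w in windows) / n for f in range(n_feat)]
    std = [max(math.sqrt(sum((w[f] - mean[f]) ** 2 for w in windows) / n), 1e-6)
           for f in range(n_feat)]
    return mean, std

def apply_zscore(feat, mean, std):
    """Mirror of calibration_apply(): elementwise z-score."""
    return [(x - m) / s for x, m, s in zip(feat, mean, std)]

mean, std = fit_zscore([[1.0, 10.0], [3.0, 10.0]])
# mean == [2.0, 10.0]; std == [1.0, 1e-6] (constant feature hits the floor)
assert apply_zscore([3.0, 10.0], mean, std)[0] == 1.0
```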
Change E — int8 MLP via TFLite Micro
Priority: Tier 3 — implement after Tier 1+2 changes and the benchmark (Change 5) shows LDA plateauing
Why: LDA finds only linear decision boundaries. A 2-layer int8 MLP adds nonlinear boundaries for gesture pairs that overlap in feature space.
Effort: 4–6 hours
Python Training (new file: train_mlp_tflite.py)
"""
Train int8 MLP for ESP32-S3 deployment via TFLite Micro.
Run AFTER Change 0 (label shift) + Change 1 (expanded features).
"""
import numpy as np
import tensorflow as tf
from pathlib import Path
import sys
sys.path.insert(0, str(Path(__file__).parent))
from learning_data_collection import SessionStorage, EMGFeatureExtractor, HAND_CHANNELS
storage = SessionStorage()
X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()
extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
X = extractor.extract_features_batch(X_raw).astype(np.float32)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
n_feat, n_cls = X.shape[1], len(np.unique(y))
model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(n_feat,)),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(16, activation='relu'),
tf.keras.layers.Dense(n_cls, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=150, batch_size=64, validation_split=0.1, verbose=1)
def representative_dataset():
for i in range(0, len(X), 10):
yield [X[i:i+1]]
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
out = Path('EMG_Arm/src/core/emg_model_data.cc')
with open(out, 'w') as f:
f.write('#include "emg_model_data.h"\n')
f.write(f'const int g_model_len = {len(tflite_model)};\n')
f.write('const unsigned char g_model[] = {\n ')
f.write(', '.join(f'0x{b:02x}' for b in tflite_model))
f.write('\n};\n')
print(f"Wrote {out} ({len(tflite_model)} bytes)")
Firmware (inference_mlp.cc)
#include "inference_mlp.h"
#include "emg_model_data.h"
#include "model_weights.h"   // MODEL_NUM_CLASSES
#include <math.h>            // roundf
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"
static uint8_t tensor_arena[48 * 1024]; // 48 KB — tune down if memory is tight
static tflite::MicroInterpreter *interpreter = nullptr;
static TfLiteTensor *input = nullptr, *output = nullptr;
void inference_mlp_init(void) {
const tflite::Model *model = tflite::GetModel(g_model);
static tflite::MicroMutableOpResolver<4> resolver;
resolver.AddFullyConnected();
resolver.AddRelu();
resolver.AddSoftmax();
resolver.AddDequantize();
static tflite::MicroInterpreter interp(model, resolver, tensor_arena, sizeof(tensor_arena));
interpreter = &interp;
interpreter->AllocateTensors();
input = interpreter->input(0);
output = interpreter->output(0);
}
int inference_mlp_predict(const float *features, int n_feat, float *conf_out) {
float iscale = input->params.scale;
int izp = input->params.zero_point;
for (int i = 0; i < n_feat; i++) {
int q = (int)roundf(features[i] / iscale) + izp;
input->data.int8[i] = (int8_t)(q < -128 ? -128 : q > 127 ? 127 : q);
}
interpreter->Invoke();
float oscale = output->params.scale;
int ozp = output->params.zero_point;
float max_p = -1e9f;
int max_c = 0;
for (int c = 0; c < MODEL_NUM_CLASSES; c++) {
float p = (output->data.int8[c] - ozp) * oscale;
if (p > max_p) { max_p = p; max_c = c; }
}
*conf_out = max_p;
return max_c;
}
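The quantize/clamp arithmetic in `inference_mlp_predict()` can be verified offline. A minimal Python mirror (the scale/zero-point values here are illustrative, not from a real model):

```python
def quantize_int8(x, scale, zero_point):
    """Mirror of the firmware input quantization: scale, shift, clamp to int8."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize_int8(q, scale, zero_point):
    """Mirror of the firmware output dequantization."""
    return (q - zero_point) * scale

scale, zp = 0.05, -10                   # illustrative quantization parameters
q = quantize_int8(1.0, scale, zp)       # round(20) - 10 = 10
x = dequantize_int8(q, scale, zp)       # (10 + 10) * 0.05 ≈ 1.0
assert q == 10 and abs(x - 1.0) < 1e-9
assert quantize_int8(100.0, scale, zp) == 127   # saturates at int8 max
```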
platformio.ini addition:
lib_deps =
tensorflow/tflite-micro
Change F — Ensemble Inference Pipeline
Priority: Tier 3 (requires Change 1 features + Change 7 training + Change E MLP)
Why: This is the full recommended architecture from Part II.
Effort: 3–4 hours firmware (after the Python ensemble is trained and exported)
New Files
EMG_Arm/src/core/inference_ensemble.c
EMG_Arm/src/core/inference_ensemble.h
EMG_Arm/src/core/model_weights_ensemble.h (generated by Change 7 Python script)
inference_ensemble.h
#pragma once
#include <stdbool.h>
void inference_ensemble_init(void);
int inference_ensemble_predict(float *confidence);
inference_ensemble.c
#include "inference_ensemble.h"
#include "inference.h"      // for compute_features()
#include "calibration.h"    // for calibration_apply() (Change D)
#include "inference_mlp.h" // for inference_mlp_predict()
#include "model_weights_ensemble.h"
#include "config/config.h"
#include "dsps_dotprod.h"
#include <math.h>
#include <string.h>
#include <stdio.h>
#define ENSEMBLE_EMA_ALPHA 0.70f
#define ENSEMBLE_CONF_THRESHOLD 0.50f // below this: escalate to MLP fallback
#define REJECT_THRESHOLD 0.40f // below this even after MLP: hold output
#define REST_ACTIVITY_THRESHOLD 0.05f // total_rms below this → skip inference, return REST
// EMA state
static float s_smoothed[MODEL_NUM_CLASSES];
// Vote + debounce (reuse existing pattern from inference.c)
static int s_vote_history[5];
static int s_vote_head = 0;
static int s_current_output = -1;
static int s_pending_output = -1;
static int s_pending_count = 0;
// --- Generic LDA softmax predict ---
// weights: [n_classes][n_feat], intercepts: [n_classes]
// proba_out: [n_classes] — caller-provided output
static void lda_softmax(const float *feat, int n_feat,
const float *weights_flat, const float *intercepts,
int n_classes, float *proba_out) {
float raw[MODEL_NUM_CLASSES];
float max_raw = -1e9f, sum_exp = 0.0f;
for (int c = 0; c < n_classes; c++) {
raw[c] = intercepts[c];
// dsps_dotprod_f32 requires 4-byte aligned arrays and length multiple of 4;
// for safety use plain loop — compiler will auto-vectorize with -O2
const float *w = weights_flat + c * n_feat;
for (int f = 0; f < n_feat; f++) raw[c] += feat[f] * w[f];
if (raw[c] > max_raw) max_raw = raw[c];
}
for (int c = 0; c < n_classes; c++) {
proba_out[c] = expf(raw[c] - max_raw);
sum_exp += proba_out[c];
}
for (int c = 0; c < n_classes; c++) proba_out[c] /= sum_exp;
}
void inference_ensemble_init(void) {
for (int c = 0; c < MODEL_NUM_CLASSES; c++)
s_smoothed[c] = 1.0f / MODEL_NUM_CLASSES;
for (int i = 0; i < 5; i++) s_vote_history[i] = -1;
s_vote_head = 0;
s_current_output = -1;
s_pending_output = -1;
s_pending_count = 0;
}
int inference_ensemble_predict(float *confidence) {
// 1. Extract features (shared with single-model path)
float features[MODEL_NUM_FEATURES];
compute_features(features);
calibration_apply(features);
// 2. Activity gate — skip inference during obvious REST
float total_rms_sq = 0.0f;
for (int ch = 0; ch < HAND_NUM_CHANNELS; ch++) {
float r = features[ch * ENSEMBLE_PER_CH_FEATURES]; // RMS is index 0 per channel
total_rms_sq += r * r;
}
if (sqrtf(total_rms_sq) < REST_ACTIVITY_THRESHOLD) {
*confidence = 1.0f;
return GESTURE_REST;
}
// 3. Specialist LDAs
float prob_td[MODEL_NUM_CLASSES];
float prob_fd[MODEL_NUM_CLASSES];
float prob_cc[MODEL_NUM_CLASSES];
lda_softmax(features + TD_FEAT_OFFSET, TD_NUM_FEATURES,
(const float *)LDA_TD_WEIGHTS, LDA_TD_INTERCEPTS,
MODEL_NUM_CLASSES, prob_td);
lda_softmax(features + FD_FEAT_OFFSET, FD_NUM_FEATURES,
(const float *)LDA_FD_WEIGHTS, LDA_FD_INTERCEPTS,
MODEL_NUM_CLASSES, prob_fd);
lda_softmax(features + CC_FEAT_OFFSET, CC_NUM_FEATURES,
(const float *)LDA_CC_WEIGHTS, LDA_CC_INTERCEPTS,
MODEL_NUM_CLASSES, prob_cc);
// 4. Meta-LDA stacker
float meta_in[META_NUM_INPUTS]; // = 3 * MODEL_NUM_CLASSES
memcpy(meta_in, prob_td, MODEL_NUM_CLASSES * sizeof(float));
memcpy(meta_in + MODEL_NUM_CLASSES, prob_fd, MODEL_NUM_CLASSES * sizeof(float));
memcpy(meta_in + 2*MODEL_NUM_CLASSES, prob_cc, MODEL_NUM_CLASSES * sizeof(float));
float meta_probs[MODEL_NUM_CLASSES];
lda_softmax(meta_in, META_NUM_INPUTS,
(const float *)META_LDA_WEIGHTS, META_LDA_INTERCEPTS,
MODEL_NUM_CLASSES, meta_probs);
// 5. EMA smoothing on meta output
float max_smooth = 0.0f;
int winner = 0;
for (int c = 0; c < MODEL_NUM_CLASSES; c++) {
s_smoothed[c] = ENSEMBLE_EMA_ALPHA * s_smoothed[c] +
(1.0f - ENSEMBLE_EMA_ALPHA) * meta_probs[c];
if (s_smoothed[c] > max_smooth) { max_smooth = s_smoothed[c]; winner = c; }
}
// 6. Confidence cascade: escalate to MLP if meta-LDA is uncertain
if (max_smooth < ENSEMBLE_CONF_THRESHOLD) {
float mlp_conf = 0.0f;
int mlp_winner = inference_mlp_predict(features, MODEL_NUM_FEATURES, &mlp_conf);
if (mlp_conf > max_smooth) { winner = mlp_winner; max_smooth = mlp_conf; }
}
// 7. Reject if still uncertain
if (max_smooth < REJECT_THRESHOLD) {
*confidence = max_smooth;
return s_current_output;
}
*confidence = max_smooth;
// 8. Majority vote (window = 5)
s_vote_history[s_vote_head] = winner;
s_vote_head = (s_vote_head + 1) % 5;
int counts[MODEL_NUM_CLASSES] = {0};
for (int i = 0; i < 5; i++)
if (s_vote_history[i] >= 0) counts[s_vote_history[i]]++;
int majority = 0, majority_cnt = 0;
for (int c = 0; c < MODEL_NUM_CLASSES; c++)
if (counts[c] > majority_cnt) { majority_cnt = counts[c]; majority = c; }
// 9. Debounce (3 consecutive predictions to change output)
int final = s_current_output;
if (s_current_output == -1) {
s_current_output = majority; final = majority;
} else if (majority == s_current_output) {
s_pending_output = majority; s_pending_count = 1;
} else if (majority == s_pending_output) {
if (++s_pending_count >= 3) { s_current_output = majority; final = majority; }
} else {
s_pending_output = majority; s_pending_count = 1;
}
return final;
}
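For off-device unit testing, the vote and debounce stages (steps 8–9) can be mirrored in Python. A sketch, not the C itself; tie-breaking differs trivially (the C prefers the lowest class index, `max` here prefers insertion order):

```python
class VoteDebounce:
    """Python mirror of the 5-window majority vote + 3-count debounce."""
    def __init__(self, window=5, debounce=3):
        self.history = [-1] * window
        self.head = 0
        self.current = -1
        self.pending = -1
        self.pending_count = 0
        self.debounce = debounce

    def update(self, winner):
        # Step 8: circular vote history + majority count
        self.history[self.head] = winner
        self.head = (self.head + 1) % len(self.history)
        counts = {}
        for v in self.history:
            if v >= 0:
                counts[v] = counts.get(v, 0) + 1
        majority = max(counts, key=counts.get)
        # Step 9: require `debounce` consecutive majorities to change output
        if self.current == -1:
            self.current = majority
        elif majority == self.current:
            self.pending, self.pending_count = majority, 1
        elif majority == self.pending:
            self.pending_count += 1
            if self.pending_count >= self.debounce:
                self.current = majority
        else:
            self.pending, self.pending_count = majority, 1
        return self.current

vd = VoteDebounce()
out = [vd.update(w) for w in [0] * 6 + [1] * 10]
print(out == [0] * 10 + [1] * 6)  # True
```

Note the implied latency: after the input flips from class 0 to class 1, the output changes only on the 5th new sample (vote window must tip, then debounce must count up).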
model_weights_ensemble.h Layout (generated by Change 7)
// Auto-generated by train_ensemble.py — do not edit manually
#pragma once
#define MODEL_NUM_CLASSES 5 // auto-computed from training data
#define MODEL_NUM_FEATURES 69 // total feature count (after Change 1)
#define ENSEMBLE_PER_CH_FEATURES 20 // features per channel
// Specialist feature subset offsets and sizes
#define TD_FEAT_OFFSET 0
#define TD_NUM_FEATURES 36 // time-domain: indices 0–11, 20–31, 40–51
#define FD_FEAT_OFFSET 12 // NOTE: FD features are interleaved per-channel
#define FD_NUM_FEATURES 24 // freq-domain: indices 12–19, 32–39, 52–59
#define CC_FEAT_OFFSET 60
#define CC_NUM_FEATURES 9 // cross-channel: indices 60–68
#define META_NUM_INPUTS (3 * MODEL_NUM_CLASSES) // = 15
// Specialist LDA weights (flat row-major: [n_classes][n_feat])
extern const float LDA_TD_WEIGHTS[MODEL_NUM_CLASSES][TD_NUM_FEATURES];
extern const float LDA_TD_INTERCEPTS[MODEL_NUM_CLASSES];
extern const float LDA_FD_WEIGHTS[MODEL_NUM_CLASSES][FD_NUM_FEATURES];
extern const float LDA_FD_INTERCEPTS[MODEL_NUM_CLASSES];
extern const float LDA_CC_WEIGHTS[MODEL_NUM_CLASSES][CC_NUM_FEATURES];
extern const float LDA_CC_INTERCEPTS[MODEL_NUM_CLASSES];
// Meta-LDA weights
extern const float META_LDA_WEIGHTS[MODEL_NUM_CLASSES][META_NUM_INPUTS];
extern const float META_LDA_INTERCEPTS[MODEL_NUM_CLASSES];
// Class names (for inference_get_gesture_enum)
extern const char *MODEL_CLASS_NAMES[MODEL_NUM_CLASSES];
Important note on FD features: the frequency-domain features are interleaved at indices
[12–19] for ch0, [32–39] for ch1, [52–59] for ch2. The lda_softmax call for LDA_FD must
pass a gathered (non-contiguous) sub-vector. The cleanest approach is to gather them into
a contiguous buffer before calling lda_softmax:
// Gather FD features into contiguous buffer before LDA_FD
float fd_buf[FD_NUM_FEATURES];
for (int ch = 0; ch < HAND_NUM_CHANNELS; ch++)
memcpy(fd_buf + ch*8, features + ch*20 + 12, 8 * sizeof(float));
lda_softmax(fd_buf, FD_NUM_FEATURES, ...);
Similarly for TD features. This gather costs <5 µs — negligible.
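The interleaving can be verified numerically: gathering 12 TD and 8 FD values per channel from base `ch*20` reproduces exactly the documented index sets.

```python
# Sanity check of the interleaved layout (3 channels, 20 features/channel).
per_ch, n_ch = 20, 3
td_idx = [ch * per_ch + k for ch in range(n_ch) for k in range(12)]
fd_idx = [ch * per_ch + 12 + k for ch in range(n_ch) for k in range(8)]
assert td_idx == list(range(0, 12)) + list(range(20, 32)) + list(range(40, 52))
assert fd_idx == list(range(12, 20)) + list(range(32, 40)) + list(range(52, 60))
print(len(td_idx), len(fd_idx))  # 36 24
```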
PART VI — PYTHON/TRAINING CHANGES
Change 0 — Forward Label Shift
Priority: Tier 1 Source: Meta Nature 2025, Methods: "Discrete-gesture time alignment" Why: +100ms shift after onset detection gives the classifier 100ms of pre-event "building" signal, dramatically cleaning the decision boundary near gesture onset. ESP32 impact: None.
Step 1 — Add Constant After Line 94
# After: TRANSITION_END_MS = 150
LABEL_FORWARD_SHIFT_MS = 100 # shift label boundaries +100ms after onset alignment
# Source: Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
Step 2 — Apply Shift in SessionStorage.save_session() (after line ~704)
Find and insert after:
print(f"[Storage] Labels aligned: {changed}/{len(labels)} windows shifted")
Insert:
if LABEL_FORWARD_SHIFT_MS > 0:
shift_windows = max(1, round(LABEL_FORWARD_SHIFT_MS / HOP_SIZE_MS))
shifted = list(aligned_labels)
for i in range(1, len(aligned_labels)):
if aligned_labels[i] != aligned_labels[i - 1]:
for j in range(i, min(i + shift_windows, len(aligned_labels))):
if shifted[j] == aligned_labels[i]:
shifted[j] = aligned_labels[i - 1]
n_shifted = sum(1 for a, b in zip(aligned_labels, shifted) if a != b)
aligned_labels = shifted
print(f"[Storage] Forward label shift (+{LABEL_FORWARD_SHIFT_MS}ms): {n_shifted} windows adjusted")
Step 3 — Reduce TRANSITION_START_MS
TRANSITION_START_MS = 200 # was 300 — reduce because 100ms shift already adds pre-event context
Verify: printout shows N windows adjusted where N is 5–20% of total windows per session.
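The shift logic in Step 2 can be sanity-checked standalone on a toy label sequence (toy values: HOP_SIZE_MS = 50, so a 100 ms shift moves each boundary by 2 windows):

```python
HOP_SIZE_MS = 50                      # toy value for this sketch
LABEL_FORWARD_SHIFT_MS = 100
aligned_labels = ['rest', 'rest', 'rest', 'fist', 'fist', 'fist', 'fist']

shift_windows = max(1, round(LABEL_FORWARD_SHIFT_MS / HOP_SIZE_MS))
shifted = list(aligned_labels)
for i in range(1, len(aligned_labels)):
    if aligned_labels[i] != aligned_labels[i - 1]:
        # Extend the previous label shift_windows past the transition
        for j in range(i, min(i + shift_windows, len(aligned_labels))):
            if shifted[j] == aligned_labels[i]:
                shifted[j] = aligned_labels[i - 1]

print(shifted)
# ['rest', 'rest', 'rest', 'rest', 'rest', 'fist', 'fist']  (boundary moved 2 windows later)
```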
Change 1 — Expanded Feature Set
Priority: Tier 2
Why: 12 → 69 features; adds frequency-domain and cross-channel information that is
structurally more informative than amplitude alone (Meta Extended Data Fig. 6).
ESP32 impact: retrain → export new model_weights.h; port selected features to C.
Sub-change 1A — Expand extract_features_single_channel() (line 1448)
Replace the entire function body:
def extract_features_single_channel(self, signal: np.ndarray) -> dict:
if getattr(self, 'reinhard', False):
signal = 64.0 * signal / (32.0 + np.abs(signal))
signal = signal - np.mean(signal)
N = len(signal)
# --- Time domain ---
rms = np.sqrt(np.mean(signal ** 2))
diff = np.diff(signal)
wl = np.sum(np.abs(diff))
zc_thresh = self.zc_threshold_percent * rms
ssc_thresh = (self.ssc_threshold_percent * rms) ** 2
sign_ch = signal[:-1] * signal[1:] < 0
zc = int(np.sum(sign_ch & (np.abs(diff) > zc_thresh)))
d_l = signal[1:-1] - signal[:-2]
d_r = signal[1:-1] - signal[2:]
ssc = int(np.sum((d_l * d_r) > ssc_thresh))
mav = np.mean(np.abs(signal))
var = np.mean(signal ** 2)
iemg = np.sum(np.abs(signal))
wamp = int(np.sum(np.abs(diff) > 0.15 * rms))
# AR(4) via Yule-Walker
ar = np.zeros(4)
if rms > 1e-6:
try:
from scipy.linalg import solve_toeplitz
r = np.array([np.dot(signal[i:], signal[:N-i]) / N for i in range(5)])
if r[0] > 1e-10:
ar = solve_toeplitz(r[:4], -r[1:5])
except Exception:
pass
# --- Frequency domain (20–500 Hz) ---
freqs = np.fft.rfftfreq(N, d=1.0 / SAMPLING_RATE_HZ)
psd = np.abs(np.fft.rfft(signal)) ** 2 / N
m = (freqs >= 20) & (freqs <= 500)
f_m, p_m = freqs[m], psd[m]
tp = np.sum(p_m) + 1e-10
mnf = float(np.sum(f_m * p_m) / tp)
cum = np.cumsum(p_m)
mdf = float(f_m[min(np.searchsorted(cum, tp / 2), len(f_m) - 1)])
pkf = float(f_m[np.argmax(p_m)]) if len(p_m) > 0 else 0.0
mnp = float(tp / max(len(p_m), 1))
# Bandpower in 4 physiological bands (mirrors firmware esp-dsp FFT bands)
bands = [(20, 80), (80, 150), (150, 300), (300, 500)]
bp = [float(np.sum(psd[(freqs >= lo) & (freqs < hi)])) for lo, hi in bands]
return {
'rms': rms, 'wl': wl, 'zc': zc, 'ssc': ssc,
'mav': mav, 'var': var, 'iemg': iemg, 'wamp': wamp,
'ar1': float(ar[0]), 'ar2': float(ar[1]),
'ar3': float(ar[2]), 'ar4': float(ar[3]),
'mnf': mnf, 'mdf': mdf, 'pkf': pkf, 'mnp': mnp,
'bp0': bp[0], 'bp1': bp[1], 'bp2': bp[2], 'bp3': bp[3],
}
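A quick sanity check of the frequency-domain block above: a pure tone placed on an exact FFT bin (200 Hz is bin 30 of a 150-sample window at 1 kHz) should come back as both the peak and mean frequency. A sketch assuming SAMPLING_RATE_HZ = 1000, mirroring the function body:

```python
import numpy as np

SAMPLING_RATE_HZ = 1000
N = 150
t = np.arange(N) / SAMPLING_RATE_HZ
signal = np.sin(2 * np.pi * 200.0 * t)          # exactly 30 cycles per window

freqs = np.fft.rfftfreq(N, d=1.0 / SAMPLING_RATE_HZ)
psd = np.abs(np.fft.rfft(signal)) ** 2 / N
m = (freqs >= 20) & (freqs <= 500)
f_m, p_m = freqs[m], psd[m]
tp = np.sum(p_m) + 1e-10
mnf = float(np.sum(f_m * p_m) / tp)             # mean frequency
pkf = float(f_m[np.argmax(p_m)])                # peak frequency
print(round(pkf, 1), round(mnf, 1))             # 200.0 200.0
```

An off-bin tone (e.g. 150 Hz with this window length) will instead show spectral leakage and land on the nearest 6.67 Hz bin, which is expected.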
Sub-change 1B — Update extract_features_window() Return Block (line 1482)
Replace the return section:
FEATURE_ORDER = ['rms', 'wl', 'zc', 'ssc', 'mav', 'var', 'iemg', 'wamp',
'ar1', 'ar2', 'ar3', 'ar4', 'mnf', 'mdf', 'pkf', 'mnp',
'bp0', 'bp1', 'bp2', 'bp3']
NORMALIZE_KEYS = {'rms', 'wl', 'mav', 'iemg'}
features = []
for ch_features in all_ch_features:
for key in FEATURE_ORDER:
val = ch_features.get(key, 0.0)
if self.normalize and key in NORMALIZE_KEYS:
val = val / norm_factor
features.append(float(val))
if self.cross_channel and window.shape[1] >= 2:
sel = window[:, channel_indices].astype(np.float32)
wc = sel - sel.mean(axis=0)
cov = (wc.T @ wc) / len(wc)
ri, ci = np.triu_indices(len(channel_indices))
features.extend(cov[ri, ci].tolist())
stds = np.sqrt(np.diag(cov)) + 1e-10
cor = cov / np.outer(stds, stds)
ro, co = np.triu_indices(len(channel_indices), k=1)
features.extend(cor[ro, co].tolist())
return np.array(features, dtype=np.float32)
Sub-change 1C — Update EMGFeatureExtractor.__init__() (line 1430)
def __init__(self, zc_threshold_percent=0.1, ssc_threshold_percent=0.1,
channels=None, normalize=True, cross_channel=True, reinhard=False):
self.zc_threshold_percent = zc_threshold_percent
self.ssc_threshold_percent = ssc_threshold_percent
self.channels = channels
self.normalize = normalize
self.cross_channel = cross_channel
self.reinhard = reinhard
Sub-change 1D — Update Feature Count in extract_features_batch() (line 1520)
Replace n_features = n_channels * 4:
per_ch = 20
if self.cross_channel and n_channels >= 2:
n_features = n_channels * per_ch + \
n_channels*(n_channels+1)//2 + n_channels*(n_channels-1)//2
else:
n_features = n_channels * per_ch
Sub-change 1E — Update get_feature_names() (line 1545)
def get_feature_names(self, n_channels=0):
ch_idx = self.channels if self.channels is not None else list(range(n_channels))
ORDER = ['rms','wl','zc','ssc','mav','var','iemg','wamp',
'ar1','ar2','ar3','ar4','mnf','mdf','pkf','mnp','bp0','bp1','bp2','bp3']
names = [f'ch{ch}_{f}' for ch in ch_idx for f in ORDER]
if self.cross_channel and len(ch_idx) >= 2:
n = len(ch_idx)
names += [f'cov_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i, n)]
names += [f'cor_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i+1, n)]
return names
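A standalone check that the name list logic above agrees with the count formula in Sub-change 1D for the 3-channel case (3 × 20 per-channel + 6 covariance + 3 correlation = 69 = MODEL_NUM_FEATURES):

```python
ORDER = ['rms', 'wl', 'zc', 'ssc', 'mav', 'var', 'iemg', 'wamp',
         'ar1', 'ar2', 'ar3', 'ar4', 'mnf', 'mdf', 'pkf', 'mnp',
         'bp0', 'bp1', 'bp2', 'bp3']
ch_idx = [0, 1, 2]
names = [f'ch{ch}_{f}' for ch in ch_idx for f in ORDER]
n = len(ch_idx)
names += [f'cov_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i, n)]
names += [f'cor_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i + 1, n)]
print(len(names), names[0], names[60], names[-1])
# 69 ch0_rms cov_ch0_ch0 cor_ch1_ch2
```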
Sub-change 1F — Update EMGClassifier.__init__() (line 1722)
self.feature_extractor = EMGFeatureExtractor(
channels=HAND_CHANNELS, cross_channel=True, reinhard=False)
Sub-change 1G — Update save() (line 1910) and load() (line 2089)
In save(), add to feature_extractor_params dict:
'cross_channel': getattr(self.feature_extractor, 'cross_channel', True),
'reinhard': getattr(self.feature_extractor, 'reinhard', False),
In load(), update EMGFeatureExtractor(...) constructor:
classifier.feature_extractor = EMGFeatureExtractor(
zc_threshold_percent = params.get('zc_threshold_percent', 0.1),
ssc_threshold_percent = params.get('ssc_threshold_percent', 0.1),
channels = params.get('channels', HAND_CHANNELS),
normalize = params.get('normalize', False),
cross_channel = params.get('cross_channel', True),
reinhard = params.get('reinhard', False),
)
Also Fix Bug at Line 2382
X, y, trial_ids, session_indices, label_names, loaded_sessions = storage.load_all_for_training()
Change 2 — Electrode Repositioning Protocol
Protocol: no code changes.
"Between sessions within a single day, the participants remove and slightly reposition the sEMG wristband to enable generalization across different recording positions." — Meta Nature 2025 Methods
- Session 1: standard placement
- Session 2: band 1–2 cm up the forearm
- Session 3: band 1–2 cm down the forearm
- Session 4+: slight axial rotation or return to any above position
The per-session z-score normalization in _apply_session_normalization() handles the
resulting amplitude shifts. Perform fast, natural gestures — not slow/deliberate.
Change 3 — Data Augmentation
Priority: Tier 2. Apply to raw windows BEFORE feature extraction.
Insert before the # === LDA CLASSIFIER === comment (~line 1709):
def augment_emg_batch(X, y, multiplier=3, seed=42):
"""
Augment raw EMG windows for training robustness.
Must be called on raw windows (n_windows, n_samples, n_channels),
not on pre-computed features.
Source (window jitter): Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
"""
rng = np.random.default_rng(seed)
aug_X, aug_y = [X], [y]
for _ in range(multiplier - 1):
Xc = X.copy().astype(np.float32)
Xc *= rng.uniform(0.80, 1.20, (len(X), 1, 1)).astype(np.float32) # amplitude
rms = np.sqrt(np.mean(Xc**2, axis=(1,2), keepdims=True)) + 1e-8
Xc += rng.standard_normal(Xc.shape).astype(np.float32) * (0.05 * rms) # noise
Xc += rng.uniform(-20., 20., (len(X), 1, X.shape[2])).astype(np.float32) # DC jitter
shifts = rng.integers(-5, 6, size=len(X))
for i in range(len(Xc)):
if shifts[i]: Xc[i] = np.roll(Xc[i], shifts[i], axis=0) # jitter
aug_X.append(Xc); aug_y.append(y)
return np.concatenate(aug_X), np.concatenate(aug_y)
In EMGClassifier.train(), replace the start of the function's feature extraction block:
if getattr(self, 'use_augmentation', True):
X_aug, y_aug = augment_emg_batch(X, y, multiplier=3)
print(f"[Classifier] Augmented: {len(X)} → {len(X_aug)} windows")
else:
X_aug, y_aug = X, y
X_features = self.feature_extractor.extract_features_batch(X_aug)
# ... then use y_aug instead of y for model.fit()
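A condensed, self-contained copy of augment_emg_batch run on a toy batch confirms the output shapes and that the original windows pass through unmodified at the front of the array:

```python
import numpy as np

def augment_emg_batch(X, y, multiplier=3, seed=42):
    # Condensed copy of the function above, for a shape smoke test.
    rng = np.random.default_rng(seed)
    aug_X, aug_y = [X], [y]
    for _ in range(multiplier - 1):
        Xc = X.copy().astype(np.float32)
        Xc *= rng.uniform(0.80, 1.20, (len(X), 1, 1)).astype(np.float32)
        rms = np.sqrt(np.mean(Xc**2, axis=(1, 2), keepdims=True)) + 1e-8
        Xc += rng.standard_normal(Xc.shape).astype(np.float32) * (0.05 * rms)
        Xc += rng.uniform(-20., 20., (len(X), 1, X.shape[2])).astype(np.float32)
        shifts = rng.integers(-5, 6, size=len(X))
        for i in range(len(Xc)):
            if shifts[i]:
                Xc[i] = np.roll(Xc[i], shifts[i], axis=0)
        aug_X.append(Xc)
        aug_y.append(y)
    return np.concatenate(aug_X), np.concatenate(aug_y)

X = np.random.default_rng(0).normal(size=(10, 150, 3)).astype(np.float32)
y = np.arange(10)
X_aug, y_aug = augment_emg_batch(X, y, multiplier=3)
print(X_aug.shape, y_aug.shape)          # (30, 150, 3) (30,)
assert np.array_equal(X_aug[:10], X)     # originals preserved verbatim
```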
Change 4 — Reinhard Compression (Optional)
Formula: output = 64 × x / (32 + |x|)
Enable in Python: set reinhard=True in EMGFeatureExtractor constructor (Change 1F).
Enable in firmware (inference.c compute_features(), after signal copy loop, before mean calc):
#if MODEL_USE_REINHARD
for (int i = 0; i < INFERENCE_WINDOW_SIZE; i++) {
float x = signal[i];
signal[i] = 64.0f * x / (32.0f + fabsf(x));
}
#endif
Add #define MODEL_USE_REINHARD 0 to model_weights.h (set to 1 when Python uses reinhard=True).
Python and firmware MUST match. Mismatch silently corrupts all predictions.
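As a quick feel for the compression curve: x = 32 maps to exactly half the ceiling, and large inputs saturate toward 64. A minimal sketch of the same formula:

```python
def reinhard(x):
    # output = 64 * x / (32 + |x|), same formula as the Python/firmware paths above
    return 64.0 * x / (32.0 + abs(x))

print(reinhard(32.0), reinhard(-32.0), round(reinhard(1000.0), 2))
# 32.0 -32.0 62.02
```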
Change 5 — Classifier Benchmark
Purpose: tells you whether the LDA accuracy plateau is a feature problem (all classifiers score similarly → add features) or a model-complexity problem (SVM/MLP ≫ LDA → implement Change E/F).
Add after run_training_demo():
def run_classifier_benchmark():
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, GroupKFold
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
storage = SessionStorage()
X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()
extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
X = extractor.extract_features_batch(X_raw)
X = EMGClassifier()._apply_session_normalization(X, session_indices, y=y)
clfs = {
'LDA (ESP32 model)': LinearDiscriminantAnalysis(),
'QDA': QuadraticDiscriminantAnalysis(reg_param=0.1),
'SVM-RBF': Pipeline([('s', StandardScaler()), ('m', SVC(kernel='rbf', C=10))]),
'MLP-128-64': Pipeline([('s', StandardScaler()),
('m', MLPClassifier(hidden_layer_sizes=(128,64),
max_iter=1000, early_stopping=True))]),
}
gkf = GroupKFold(n_splits=5)
print(f"\n{'Classifier':<22} {'Mean CV':>8} {'Std':>6}")
print("-" * 40)
for name, clf in clfs.items():
sc = cross_val_score(clf, X, y, cv=gkf, groups=trial_ids, scoring='accuracy')
print(f" {name:<20} {sc.mean()*100:>7.1f}% ±{sc.std()*100:.1f}%")
print("\n → If LDA ≈ SVM: features are the bottleneck (add Change 1 features)")
print(" → If SVM >> LDA: model complexity bottleneck (implement Change F ensemble)")
Change 6 — Simplified MPF Features
Python training only — not worth porting to ESP32 directly (use bandpower bp0–bp3 from Change 1 as the firmware-side approximation).
Add after EMGFeatureExtractor class:
class MPFFeatureExtractor:
"""
Simplified 3-channel MPF: CSD upper triangle per 6 frequency bands = 36 features.
Python training only. Omits matrix logarithm (not needed for 3 channels).
Source: Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
ESP32 approximation: use bp0–bp3 from EMGFeatureExtractor (Change 1).
"""
BANDS = [(0,62),(62,125),(125,187),(187,250),(250,375),(375,500)]
def __init__(self, channels=None, log_diagonal=True):
self.channels = channels or HAND_CHANNELS
self.log_diag = log_diagonal
self.n_ch = len(self.channels)
self._r, self._c = np.triu_indices(self.n_ch)
self.n_features = len(self.BANDS) * len(self._r)
def extract_window(self, window):
sig = window[:, self.channels].astype(np.float64)
N = len(sig)
freqs = np.fft.rfftfreq(N, d=1.0/SAMPLING_RATE_HZ)
Xf = np.fft.rfft(sig, axis=0)
feats = []
for lo, hi in self.BANDS:
mask = (freqs >= lo) & (freqs < hi)
if not mask.any():
feats.extend([0.0] * len(self._r)); continue
CSD = (Xf[mask].conj().T @ Xf[mask]).real / N
if self.log_diag:
for k in range(self.n_ch): CSD[k,k] = np.log(max(CSD[k,k], 1e-10))
feats.extend(CSD[self._r, self._c].tolist())
return np.array(feats, dtype=np.float32)
def extract_batch(self, X):
out = np.zeros((len(X), self.n_features), dtype=np.float32)
for i in range(len(X)): out[i] = self.extract_window(X[i])
return out
In EMGClassifier.train(), after standard feature extraction:
if getattr(self, 'use_mpf', False):
mpf = MPFFeatureExtractor(channels=HAND_CHANNELS)
X_features = np.hstack([X_features, mpf.extract_batch(X_aug)])
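The per-band CSD construction used by MPFFeatureExtractor can be checked on a toy window: the CSD of a real multichannel signal is Hermitian, so its real part is symmetric and the diagonal is non-negative per-channel band power. A sketch assuming SAMPLING_RATE_HZ = 1000 and a 150-sample window:

```python
import numpy as np

SAMPLING_RATE_HZ = 1000
rng = np.random.default_rng(0)
sig = rng.normal(size=(150, 3))            # one 150-sample, 3-channel window
N = len(sig)

freqs = np.fft.rfftfreq(N, d=1.0 / SAMPLING_RATE_HZ)
mask = (freqs >= 62) & (freqs < 125)       # band 1 from BANDS above
Xf = np.fft.rfft(sig, axis=0)
CSD = (Xf[mask].conj().T @ Xf[mask]).real / N

assert CSD.shape == (3, 3)
assert np.allclose(CSD, CSD.T)             # real part of Hermitian → symmetric
assert (np.diag(CSD) >= 0).all()           # diagonal = band power per channel
```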
Change 7 — Ensemble Training
Priority: Tier 3 (implements Change F's training side)
New file: C:/VSCode/Marvel_Projects/Bucky_Arm/train_ensemble.py
"""
Train the full 3-specialist-LDA + meta-LDA ensemble.
Requires Change 1 (expanded features) to be implemented first.
Exports model_weights_ensemble.h for firmware Change F.
Architecture:
LDA_TD (36 time-domain feat) ─┐
LDA_FD (24 freq-domain feat) ├─ 15 probs ─► Meta-LDA ─► final class
LDA_CC (9 cross-ch feat) ─┘
"""
import numpy as np
from pathlib import Path
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict, GroupKFold, cross_val_score
import sys
sys.path.insert(0, str(Path(__file__).parent))
from learning_data_collection import (
SessionStorage, EMGFeatureExtractor, HAND_CHANNELS
)
# ─── Load and extract features ───────────────────────────────────────────────
storage = SessionStorage()
X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()
extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
X = extractor.extract_features_batch(X_raw).astype(np.float64)
# Per-session normalization (same as EMGClassifier._apply_session_normalization)
from sklearn.preprocessing import StandardScaler
for sid in np.unique(session_indices):
mask = session_indices == sid
sc = StandardScaler()
X[mask] = sc.fit_transform(X[mask])
feat_names = extractor.get_feature_names(n_channels=len(HAND_CHANNELS))
n_cls = len(np.unique(y))
# ─── Feature subset indices ───────────────────────────────────────────────────
TD_FEAT = ['rms','wl','zc','ssc','mav','var','iemg','wamp','ar1','ar2','ar3','ar4']
FD_FEAT = ['mnf','mdf','pkf','mnp','bp0','bp1','bp2','bp3']
td_idx = [i for i,n in enumerate(feat_names) if any(n.endswith(f'_{f}') for f in TD_FEAT)]
fd_idx = [i for i,n in enumerate(feat_names) if any(n.endswith(f'_{f}') for f in FD_FEAT)]
cc_idx = [i for i,n in enumerate(feat_names) if n.startswith('cov_') or n.startswith('cor_')]
print(f"Feature subsets — TD: {len(td_idx)}, FD: {len(fd_idx)}, CC: {len(cc_idx)}")
X_td = X[:, td_idx]
X_fd = X[:, fd_idx]
X_cc = X[:, cc_idx]
# ─── Train specialist LDAs with out-of-fold stacking ─────────────────────────
gkf = GroupKFold(n_splits=5)
print("Training specialist LDAs (out-of-fold for stacking)...")
lda_td = LinearDiscriminantAnalysis()
lda_fd = LinearDiscriminantAnalysis()
lda_cc = LinearDiscriminantAnalysis()
oof_td = cross_val_predict(lda_td, X_td, y, cv=gkf, groups=trial_ids, method='predict_proba')
oof_fd = cross_val_predict(lda_fd, X_fd, y, cv=gkf, groups=trial_ids, method='predict_proba')
oof_cc = cross_val_predict(lda_cc, X_cc, y, cv=gkf, groups=trial_ids, method='predict_proba')
# Specialist CV accuracy (for diagnostics)
for name, mdl, Xs in [('LDA_TD', lda_td, X_td), ('LDA_FD', lda_fd, X_fd), ('LDA_CC', lda_cc, X_cc)]:
sc = cross_val_score(mdl, Xs, y, cv=gkf, groups=trial_ids)
print(f" {name}: {sc.mean()*100:.1f}% ± {sc.std()*100:.1f}%")
# ─── Train meta-LDA on out-of-fold outputs ───────────────────────────────────
X_meta = np.hstack([oof_td, oof_fd, oof_cc]) # (n_samples, 3*n_cls = 15)
meta_lda = LinearDiscriminantAnalysis()
meta_sc = cross_val_score(meta_lda, X_meta, y, cv=gkf, groups=trial_ids)
print(f" Meta-LDA: {meta_sc.mean()*100:.1f}% ± {meta_sc.std()*100:.1f}%")
# Fit all models on full dataset for deployment
lda_td.fit(X_td, y); lda_fd.fit(X_fd, y); lda_cc.fit(X_cc, y)
meta_lda.fit(X_meta, y)
# ─── Export all weights to C header ──────────────────────────────────────────
def lda_to_c_arrays(lda, name, feat_dim, n_cls, label_names, class_order):
"""Generate C array strings for LDA weights and intercepts."""
# Reorder classes to match label_names order
coef = lda.coef_ # shape (n_cls, feat_dim) for LinearDiscriminantAnalysis
intercept = lda.intercept_
lines = []
lines.append(f"const float {name}_WEIGHTS[{n_cls}][{feat_dim}] = {{")
for c in class_order:
row = ', '.join(f'{v:.8f}f' for v in coef[c])
lines.append(f" {{{row}}}, // {label_names[c]}")
lines.append("};")
lines.append(f"const float {name}_INTERCEPTS[{n_cls}] = {{")
intercept_str = ', '.join(f'{intercept[c]:.8f}f' for c in class_order)
lines.append(f" {intercept_str}")
lines.append("};")
return '\n'.join(lines)
class_order = list(range(n_cls))
out_path = Path('EMG_Arm/src/core/model_weights_ensemble.h')
with open(out_path, 'w') as f:
f.write("// Auto-generated by train_ensemble.py — do not edit\n")
f.write("#pragma once\n\n")
f.write(f"#define MODEL_NUM_CLASSES {n_cls}\n")
f.write(f"#define MODEL_NUM_FEATURES {X.shape[1]}\n")
f.write(f"#define ENSEMBLE_PER_CH_FEATURES 20\n\n")
f.write(f"#define TD_FEAT_OFFSET {min(td_idx)}\n")
f.write(f"#define TD_NUM_FEATURES {len(td_idx)}\n")
f.write(f"#define FD_FEAT_OFFSET {min(fd_idx)}\n")
f.write(f"#define FD_NUM_FEATURES {len(fd_idx)}\n")
f.write(f"#define CC_FEAT_OFFSET {min(cc_idx)}\n")
f.write(f"#define CC_NUM_FEATURES {len(cc_idx)}\n")
    f.write("#define META_NUM_INPUTS (3 * MODEL_NUM_CLASSES)\n\n")
f.write(lda_to_c_arrays(lda_td, 'LDA_TD', len(td_idx), n_cls, label_names, class_order))
f.write('\n\n')
f.write(lda_to_c_arrays(lda_fd, 'LDA_FD', len(fd_idx), n_cls, label_names, class_order))
f.write('\n\n')
f.write(lda_to_c_arrays(lda_cc, 'LDA_CC', len(cc_idx), n_cls, label_names, class_order))
f.write('\n\n')
f.write(lda_to_c_arrays(meta_lda, 'META_LDA', 3*n_cls, n_cls, label_names, class_order))
f.write('\n\n')
names_str = ', '.join(f'"{label_names[c]}"' for c in class_order)
f.write(f"const char *MODEL_CLASS_NAMES[MODEL_NUM_CLASSES] = {{{names_str}}};\n")
print(f"Exported ensemble weights to {out_path}")
print(f"Total weight storage: {(len(td_idx)+len(fd_idx)+len(cc_idx)+3*n_cls)*n_cls*4} bytes float32")
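A smoke test of the lda_to_c_arrays helper: a condensed copy of the function paired with a stand-in "fitted LDA" (a SimpleNamespace carrying coef_/intercept_, so no training data is needed; the real call site passes a fitted LinearDiscriminantAnalysis):

```python
import numpy as np
from types import SimpleNamespace

def lda_to_c_arrays(lda, name, feat_dim, n_cls, label_names, class_order):
    # Condensed copy of the exporter above.
    coef, intercept = lda.coef_, lda.intercept_
    lines = [f"const float {name}_WEIGHTS[{n_cls}][{feat_dim}] = {{"]
    for c in class_order:
        row = ', '.join(f'{v:.8f}f' for v in coef[c])
        lines.append(f"  {{{row}}}, // {label_names[c]}")
    lines.append("};")
    lines.append(f"const float {name}_INTERCEPTS[{n_cls}] = {{")
    lines.append('  ' + ', '.join(f'{intercept[c]:.8f}f' for c in class_order))
    lines.append("};")
    return '\n'.join(lines)

fake = SimpleNamespace(coef_=np.arange(6.0).reshape(3, 2),
                       intercept_=np.array([0.1, 0.2, 0.3]))
src = lda_to_c_arrays(fake, 'LDA_TOY', 2, 3, ['rest', 'fist', 'open'], [0, 1, 2])
print(src.splitlines()[0])   # const float LDA_TOY_WEIGHTS[3][2] = {
assert '// fist' in src      # one commented row per class
```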
Note on LinearDiscriminantAnalysis with multi-class: for three or more classes, scikit-learn's LDA exposes coef_ with shape (n_classes, n_features), one decision_function row per class, which is what lda_to_c_arrays assumes. For a binary problem, coef_ collapses to shape (1, n_features). Verify lda.coef_.shape after fitting; if you ever train a 2-class model, expand the single row into two-class form (±coef_, ±intercept_) before export.
PART VII — FEATURE SELECTION FOR ESP32 PORTING
After Change 1 is trained, use this to decide what to port to C firmware.
Step 1 — Get Feature Importance
importance = np.abs(classifier.model.coef_).mean(axis=0)
feat_names = classifier.feature_extractor.get_feature_names(n_channels=len(HAND_CHANNELS))
ranked = sorted(zip(feat_names, importance), key=lambda x: -x[1])
print("Top 20 features by LDA discriminative weight:")
for name, score in ranked[:20]:
print(f" {name:<35} {score:.4f}")
Step 2 — Port Decision Matrix
| Feature | C Complexity | Prereq | Port? |
|---|---|---|---|
| RMS, WL, ZC, SSC | ✓ Already in C | — | Keep |
| MAV, VAR, IEMG | Very easy (1 loop) | None | ✓ Yes |
| WAMP | Very easy (threshold on diff) | None | ✓ Yes |
| Cross-ch covariance | Easy (3×3 outer product) | None | ✓ Yes |
| Cross-ch correlation | Easy (normalize covariance) | Covariance | ✓ Yes |
| Bandpower bp0–bp3 | Medium (128-pt FFT via esp-dsp) | Add FFT call | ✓ Yes — highest ROI |
| MNF, MDF, PKF, MNP | Easy after FFT | Bandpower FFT | ✓ Free once FFT added |
| AR(4) | Medium (Levinson-Durbin in C) | None | Only if top-8 importance |
Once dsps_fft2r_fc32() is added for bandpower, MNF/MDF/PKF/MNP come free.
Step 3 — Adding FFT-Based Features to inference.c
Add inside compute_features() loop, after time-domain features per channel:
// 128-pt FFT for frequency-domain features per channel
// Truncate the 150-sample window (INFERENCE_WINDOW_SIZE) to its first
// 128 samples; the = {0} initializer zero-fills any unused tail.
float fft_buf[256] = {0}; // 128 complex values: [re0, im0, re1, im1, ...]
for (int i = 0; i < 128 && i < INFERENCE_WINDOW_SIZE; i++) {
fft_buf[2*i] = signal[i]; // real
fft_buf[2*i+1] = 0.0f; // imag
}
dsps_fft2r_fc32(fft_buf, 128);
dsps_bit_rev_fc32(fft_buf, 128);
// Bandpower: bin k → freq = k * 1000/128 = k * 7.8125 Hz
// Band 0: 20–80 Hz → bins 3–10
// Band 1: 80–150 Hz → bins 10–19
// Band 2: 150–300 Hz→ bins 19–38
// Band 3: 300–500 Hz→ bins 38–64
int band_bins[5] = {3, 10, 19, 38, 64};
float bp[4] = {0,0,0,0};
for (int b = 0; b < 4; b++)
for (int k = band_bins[b]; k < band_bins[b+1]; k++) {
float re = fft_buf[2*k], im = fft_buf[2*k+1];
bp[b] += re*re + im*im;
}
// Store at correct indices (base = ch * 20)
int base = ch * 20;
features_out[base+16] = bp[0]; features_out[base+17] = bp[1];
features_out[base+18] = bp[2]; features_out[base+19] = bp[3];
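The bin table above can be cross-checked in Python (a 128-point FFT at 1 kHz gives 7.8125 Hz per bin):

```python
# Cross-check of the bin→frequency mapping used in the C snippet above.
band_edges_hz = [20, 80, 150, 300, 500]
bin_hz = 1000 / 128                       # 7.8125 Hz per bin
band_bins = [round(f / bin_hz) for f in band_edges_hz]
print(band_bins)   # [3, 10, 19, 38, 64] matches band_bins[] in the C code
```

Note the Python-side bandpower (Change 1) uses the full 150-sample FFT while the firmware truncates to 128, so the two are close approximations rather than bit-identical; per-session normalization absorbs the scale difference.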
PART VIII — MEASUREMENT AND VALIDATION
Baseline Protocol
Run this BEFORE any change and after EACH change.
1. python learning_data_collection.py → option 3 (Train Classifier)
2. Record:
- "Mean CV accuracy: XX.X% ± Y.Y%" (cross-validation)
- Confusion matrix (which gesture pairs are most confused)
- Per-gesture accuracy breakdown
3. On-device test:
- Put on sensors, perform 10 reps of each gesture
- Log classification output (UART or Python serial monitor)
- Compute per-gesture accuracy manually
4. Record REST false-trigger rate: hold arm at rest for 30 seconds,
count number of non-REST outputs
Results Log
| Change | CV Acc Before | CV Acc After | Delta | On-Device Acc | False Triggers/30s | Keep? |
|---|---|---|---|---|---|---|
| Baseline | — | — | — | — | — | — |
| Change C (reject) | — | — | — | — | — | — |
| Change B (filter) | — | — | — | — | — | — |
| Change 0 (label shift) | — | — | — | — | — | — |
| Change 1 (features) | — | — | — | — | — | — |
| Change D (NVS calib) | — | — | — | — | — | — |
| Change 3 (augment) | — | — | — | — | — | — |
| Change 5 (benchmark) | — | — | — | — | — | — |
| Change 7+F (ensemble) | — | — | — | — | — | — |
| Change E (MLP) | — | — | — | — | — | — |
When to Add More Gestures
| CV Accuracy | Recommendation |
|---|---|
| <80% | Do NOT add gestures — fix the existing 5 first |
| 80–90% | Adding 1–2 gestures is reasonable; expect 5–8% drop per new gesture |
| >90% | Good baseline; can add gestures; target staying above 85% |
| >95% | Excellent; can be ambitious with gesture count |
PART IX — EXPORT WORKFLOW
Path 1 — LDA / Ensemble (Changes 0–4, 7+F)
1. Train: python learning_data_collection.py → option 3 (single LDA)
OR: python train_ensemble.py (full ensemble)
2. Export:
Single LDA: classifier.export_to_header(Path('EMG_Arm/src/core/model_weights.h'))
   Ensemble: python train_ensemble.py (the export block at the end of the script)
             → writes model_weights_ensemble.h
3. Port new features to inference.c (if Change 1 features added):
- Follow feature selection decision matrix (Part VII)
- CRITICAL: C feature index order MUST match Python FEATURE_ORDER exactly
4. Build + flash: pio run -t upload
Path 2 — int8 MLP via TFLM (Change E)
1. python train_mlp_tflite.py → emg_model_data.cc
2. Add TFLM to platformio.ini lib_deps
3. Replace LDA inference call with inference_mlp_predict() in inference.c
OR use inference_ensemble_predict() which calls MLP as fallback (Change F)
4. pio run -t upload
Feature Index Contract (Critical)
The order of values written to features_out[] in compute_features() in C must exactly
match FEATURE_ORDER in extract_features_window() in Python, index for index.
To verify before flashing: print both the C feature names (from MODEL_FEATURE_NAMES if
added to header) and Python extractor.get_feature_names() and diff them.
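A toy illustration of that index-for-index diff (the names here are hypothetical placeholders; substitute the real lists parsed from the header and returned by get_feature_names()):

```python
# Any non-empty mismatch list means the C/Python feature contract is broken.
c_names  = ['ch0_rms', 'ch0_wl', 'ch0_zc']   # hypothetical C-side order
py_names = ['ch0_rms', 'ch0_zc', 'ch0_wl']   # hypothetical Python-side order
mismatches = [(i, a, b) for i, (a, b) in enumerate(zip(c_names, py_names)) if a != b]
print(mismatches)   # [(1, 'ch0_wl', 'ch0_zc'), (2, 'ch0_zc', 'ch0_wl')]
```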
PART X — REFERENCES
Primary paper: Kaifosh, P., Reardon, T., et al. "A high-bandwidth neuromotor prosthesis enabled by implicit information in intrinsic motor neurons." Nature (2025). doi:10.1038/s41586-025-09255-w
Meta codebase (label alignment, CLER metric, model architectures):
C:/VSCode/Marvel_Projects/Meta_Emg_Stuff/generic-neuromotor-interface/
- data.py: onset detection, searchsorted alignment, window jitter
- cler.py: threshold=0.35, debounce=50ms, tolerance=±50/250ms
- networks.py: model architectures, left_context=20, stride=10
- lightning.py: targets[..., left_context::stride] label shift
Barachant et al. 2012: "Multiclass brain–computer interface classification by Riemannian geometry." — matrix logarithm reference (MPF features).
Espressif libraries:
- esp-dsp: github.com/espressif/esp-dsp — biquad, FFT, dot-product
- esp-dl: github.com/espressif/esp-dl — quantized MLP/CNN inference
- TFLite Micro: github.com/tensorflow/tflite-micro
All project files (existing + planned):
── Laptop / Python ─────────────────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/learning_data_collection.py ← main: data collection + training
C:/VSCode/Marvel_Projects/Bucky_Arm/live_predict.py ← NEW (Part 0.6): laptop-side live inference
C:/VSCode/Marvel_Projects/Bucky_Arm/train_ensemble.py ← NEW (Change 7): ensemble training
C:/VSCode/Marvel_Projects/Bucky_Arm/train_mlp_tflite.py ← NEW (Change E): int8 MLP export
── ESP32 Firmware — Existing ───────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/platformio.ini
└─ ADD lib_deps: espressif/esp-dsp (Changes B,1,F), tensorflow/tflite-micro (Change E)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/config/config.h
└─ MODIFY: remove system_mode_t; add EMG_STANDALONE to MAIN_MODE enum (Part 0.7, S1)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/app/main.c
└─ MODIFY: add STATE_LAPTOP_PREDICT, CMD_START_LAPTOP_PREDICT, run_laptop_predict_loop(),
run_standalone_loop() (Part 0.5)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/drivers/emg_sensor.c
└─ MODIFY (Change A): migrate from adc_oneshot to adc_continuous driver
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.c
└─ MODIFY: add inference_get_gesture_by_name(), IIR filter (B), features (1), confidence rejection (C)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.h
└─ MODIFY: add inference_get_gesture_by_name() declaration
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/gestures.c
└─ MODIFY: update gesture_names[] and gestures_execute() when adding gestures
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights.h
└─ AUTO-GENERATED by export_to_header() — do not edit manually
── ESP32 Firmware — New Files ──────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/bicep.h/.c ← Part 0 / Section 2.2
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/calibration.h/.c ← Change D (NVS z-score)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference_ensemble.h/.c ← Change F
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference_mlp.h/.cc ← Change E
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights_ensemble.h ← AUTO-GENERATED (Change 7)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/emg_model_data.h/.cc ← AUTO-GENERATED (Change E)