BUCKY_ARM_MASTER_PLAN.md

# Bucky Arm — EMG Gesture Control: Master Implementation Reference
> Version: 2026-03-01 | Target: ESP32-S3 N32R16V (Xtensa LX7 @ 240 MHz, 512 KB SRAM, 16 MB OPI PSRAM)
> Supersedes: META_EMG_RESEARCH_NOTES.md + BUCKY_ARM_IMPROVEMENT_PLAN.md
> Source paper: doi:10.1038/s41586-025-09255-w (PDF: C:/VSCode/Marvel_Projects/s41586-025-09255-w.pdf)

---

## TABLE OF CONTENTS

- [PART 0 — SYSTEM ARCHITECTURE & RESPONSIBILITY ASSIGNMENT](#part-0--system-architecture--responsibility-assignment)
  - [0.1 Who Does What](#01-who-does-what)
  - [0.2 Operating Modes](#02-operating-modes)
  - [0.3 FSM Reference (EMG_MAIN mode)](#03-fsm-reference-emg_main-mode)
  - [0.4 EMG_STANDALONE Boot Sequence](#04-emg_standalone-boot-sequence)
  - [0.5 New Firmware Changes for Architecture](#05-new-firmware-changes-for-architecture)
  - [0.6 New Python Script: live_predict.py](#06-new-python-script-live_predictpy)
  - [0.7 Firmware Cleanup: system_mode_t Removal](#07-firmware-cleanup-system_mode_t-removal)
- [PART I — SYSTEM FOUNDATIONS](#part-i--system-foundations)
  - [1. Hardware Specification](#1-hardware-specification)
  - [2. Current System Snapshot](#2-current-system-snapshot)
  - [2.1 Confirmed Firmware Architecture](#21--confirmed-firmware-architecture-from-codebase-exploration)
  - [2.2 Bicep Channel Subsystem](#22--bicep-channel-subsystem-ch3--adc_channel_9--gpio-10)
  - [3. What Meta Built — Filtered for ESP32](#3-what-meta-built--filtered-for-esp32)
  - [4. Current Code State + Known Bugs](#4-current-code-state--known-bugs)
- [PART II — TARGET ARCHITECTURE](#part-ii--target-architecture)
  - [5. Full Recommended Multi-Model Stack](#5-full-recommended-multi-model-stack)
  - [6. Compute Budget for Full Stack](#6-compute-budget-for-full-stack)
  - [7. Why This Architecture Works for 3-Channel EMG](#7-why-this-architecture-works-for-3-channel-emg)
- [PART III — GESTURE EXTENSIBILITY](#part-iii--gesture-extensibility)
  - [8. What Changes When Adding or Removing a Gesture](#8-what-changes-when-adding-or-removing-a-gesture)
  - [9. Practical Limits of 3-Channel EMG](#9-practical-limits-of-3-channel-emg)
  - [10. Specific Gesture Considerations](#10-specific-gesture-considerations)
- [PART IV — CHANGE REFERENCE](#part-iv--change-reference)
  - [11. Change Classification Matrix](#11-change-classification-matrix)
- [PART V — FIRMWARE CHANGES](#part-v--firmware-changes)
  - [Change A — DMA-Driven ADC Sampling](#change-a--dma-driven-adc-sampling)
  - [Change B — IIR Biquad Bandpass Filter](#change-b--iir-biquad-bandpass-filter)
  - [Change C — Confidence Rejection](#change-c--confidence-rejection)
  - [Change D — On-Device NVS Calibration](#change-d--on-device-nvs-calibration)
  - [Change E — int8 MLP via TFLM](#change-e--int8-mlp-via-tflm)
  - [Change F — Ensemble Inference Pipeline](#change-f--ensemble-inference-pipeline)
- [PART VI — PYTHON/TRAINING CHANGES](#part-vi--pythontraining-changes)
  - [Change 0 — Forward Label Shift](#change-0--forward-label-shift)
  - [Change 1 — Expanded Feature Set](#change-1--expanded-feature-set)
  - [Change 2 — Electrode Repositioning](#change-2--electrode-repositioning)
  - [Change 3 — Data Augmentation](#change-3--data-augmentation)
  - [Change 4 — Reinhard Compression](#change-4--reinhard-compression)
  - [Change 5 — Classifier Benchmark](#change-5--classifier-benchmark)
  - [Change 6 — Simplified MPF Features](#change-6--simplified-mpf-features)
  - [Change 7 — Ensemble Training](#change-7--ensemble-training)
- [PART VII — FEATURE SELECTION FOR ESP32 PORTING](#part-vii--feature-selection-for-esp32-porting)
- [PART VIII — MEASUREMENT AND VALIDATION](#part-viii--measurement-and-validation)
- [PART IX — EXPORT WORKFLOW](#part-ix--export-workflow)
- [PART X — REFERENCES](#part-x--references)

---

# PART 0 — SYSTEM ARCHITECTURE & RESPONSIBILITY ASSIGNMENT

> This section is the authoritative reference for what runs where. All implementation
> decisions in later parts should be consistent with this partition.

## 0.1 Who Does What

| Responsibility | Laptop (Python) | ESP32 |
|----------------|-----------------|-------|
| EMG sensor reading | — | ✓ `emg_sensor_read()` always |
| Raw data streaming (for collection) | Receives CSV, saves to HDF5 | Streams CSV over UART |
| Model training | ✓ `learning_data_collection.py` | — |
| Model export | ✓ `export_to_header()` → `model_weights.h` | Compiled into firmware |
| On-device inference | — | ✓ `inference_predict()` |
| Laptop-side live inference | ✓ `live_predict.py` (new script) | Streams ADC + executes received cmd |
| Arm actuation | — (sends gesture string back to ESP32) | ✓ `gestures_execute()` |
| Autonomous operation (no laptop) | Not needed | ✓ `EMG_STANDALONE` mode |
| Bicep flex detection | — | ✓ `bicep_detect()` (new, Section 2.2) |
| NVS calibration | — | ✓ `calibration.c` (Change D) |

**Key rule**: The laptop is never required for real-time arm control in production.
The laptop's role is: collect data → train model → export → flash firmware → done.
After that, the ESP32 operates completely independently.

---

## 0.2 Operating Modes

Controlled by `#define MAIN_MODE` in `config/config.h`.
The enum currently reads `enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER}`.
A new value `EMG_STANDALONE` must be added.

| `MAIN_MODE` | When to use | Laptop required? | Entry point |
|-------------|-------------|-----------------|-------------|
| `EMG_MAIN` | Development sessions, data collection, monitored operation | Yes — UART handshake to start any mode | `appConnector()` in `main.c` |
| `EMG_STANDALONE` | **Fully autonomous deployment** — no laptop | **No** — boots directly into predict+control | `run_standalone_loop()` (new function in `main.c`) |
| `SERVO_CALIBRATOR` | Hardware setup, testing servo range of motion | Yes (serial input) | Inline in `app_main()` |
| `GESTURE_TESTER` | Testing gesture→servo mapping via keyboard | Yes (serial input) | Inline in `app_main()` |

**How to switch mode**: change `#define MAIN_MODE` in `config.h` and reflash.

**To add `EMG_STANDALONE` to `config.h`** (1-line change):
```c
// config.h line 19 — current:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER};

// Update to:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER, EMG_STANDALONE};
```

---

## 0.3 FSM Reference (EMG_MAIN mode)

The `device_state_t` enum in `main.c` and the `command_t` enum control all transitions.
Currently: `{STATE_IDLE, STATE_CONNECTED, STATE_STREAMING, STATE_PREDICTING}`.
A new state `STATE_LAPTOP_PREDICT` must be added (see Section 0.5).

```
STATE_IDLE
  └─ {"cmd":"connect"} ──────────────────────────► STATE_CONNECTED
                                                         │
                               {"cmd":"start"} ──────────┤
                                                         │    STATE_STREAMING
                                                         │    ESP32 sends raw ADC CSV at 1kHz
                                                         │    Laptop: saves to HDF5 (data collection)
                                                         │    Laptop: trains model → exports model_weights.h
                                                         │    ◄──── {"cmd":"stop"} ────────────────────┘
                                                         │
                        {"cmd":"start_predict"} ─────────┤
                                                         │    STATE_PREDICTING
                                                         │    ESP32: inference_predict() on-device
                                                         │    ESP32: gestures_execute()
                                                         │    Laptop: optional UART monitor only
                                                         │    ◄──── {"cmd":"stop"} ────────────────────┘
                                                         │
                   {"cmd":"start_laptop_predict"} ───────┘
                                                              STATE_LAPTOP_PREDICT  [NEW]
                                                              ESP32: streams raw ADC CSV (same as STREAMING)
                                                              Laptop: runs live_predict.py inference
                                                              Laptop: sends {"gesture":"fist"} back
                                                              ESP32: executes received gesture command
                                                              ◄──── {"cmd":"stop"} ────────────────────┘

All active states:
  {"cmd":"stop"}       → STATE_CONNECTED
  {"cmd":"disconnect"} → STATE_IDLE
  {"cmd":"connect"}    → STATE_CONNECTED  (from any state — reconnect)
```

**Convenience table of commands and their effects:**

| JSON command | Valid from state | Result |
|---|---|---|
| `{"cmd":"connect"}` | Any | → `STATE_CONNECTED` |
| `{"cmd":"start"}` | `STATE_CONNECTED` | → `STATE_STREAMING` |
| `{"cmd":"start_predict"}` | `STATE_CONNECTED` | → `STATE_PREDICTING` |
| `{"cmd":"start_laptop_predict"}` | `STATE_CONNECTED` | → `STATE_LAPTOP_PREDICT` (new) |
| `{"cmd":"stop"}` | `STREAMING/PREDICTING/LAPTOP_PREDICT` | → `STATE_CONNECTED` |
| `{"cmd":"disconnect"}` | Any active state | → `STATE_IDLE` |

---

## 0.4 EMG_STANDALONE Boot Sequence

No UART handshake. No laptop required. Powers on → predicts → controls arm.

```
app_main() switch MAIN_MODE == EMG_STANDALONE:
  │
  ├── hand_init()            // servos
  ├── emg_sensor_init()      // ADC setup
  ├── inference_init()       // clear window buffer, reset smoothing state
  ├── calibration_init()     // load NVS z-score params (Change D)
  │       └── if not found in NVS:
  │               collect 120 REST windows (~3s at 25ms hop)
  │               call calibration_update() to compute and store stats
  ├── bicep_load_threshold() // load NVS bicep threshold (Section 2.2)
  │       └── if not found:
  │               collect 3s of still bicep data
  │               call bicep_calibrate() and bicep_save_threshold()
  │
  └── run_standalone_loop()  ← NEW function (added to main.c)
        while (1):
          emg_sensor_read(&sample)
          inference_add_sample(sample.channels)
          if stride_counter++ >= INFERENCE_HOP_SIZE:
            stride_counter = 0
            gesture_t g = inference_get_gesture_enum(inference_predict(&conf))
            gestures_execute(g)
            bicep_state_t b = bicep_detect()
            // (future: bicep_actuate(b))
          vTaskDelay(1)
```

`run_standalone_loop()` is structurally identical to `run_inference_loop()` in `EMG_MAIN`,
minus all UART state-change checking and telemetry prints. It runs forever until power-off.

**Where to add**: New function `run_standalone_loop()` in `app/main.c`, plus a new case
in the `app_main()` switch block:
```c
case EMG_STANDALONE:
    run_standalone_loop();
    break;
```

---

## 0.5 New Firmware Changes for Architecture

These changes are needed to implement the architecture above. They are **structural**
(not accuracy improvements) and should be done before any other changes.

### S1 — Add `EMG_STANDALONE` to `config.h`

**File**: `EMG_Arm/src/config/config.h`, line 19
```c
// Change:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER};
// To:
enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER, EMG_STANDALONE};
```

### S2 — Add `STATE_LAPTOP_PREDICT` to FSM (`main.c`)

**File**: `EMG_Arm/src/app/main.c`

```c
// In device_state_t enum — add new state:
typedef enum {
  STATE_IDLE = 0,
  STATE_CONNECTED,
  STATE_STREAMING,
  STATE_PREDICTING,
  STATE_LAPTOP_PREDICT,  // ← ADD: streams ADC to laptop, executes laptop's gesture commands
} device_state_t;

// In command_t enum — add new command:
typedef enum {
  CMD_NONE = 0,
  CMD_CONNECT,
  CMD_START,
  CMD_START_PREDICT,
  CMD_START_LAPTOP_PREDICT,  // ← ADD
  CMD_STOP,
  CMD_DISCONNECT,
} command_t;
```

**In `parse_command()`** — add detection (place BEFORE the `"start"` check to avoid prefix collision):
```c
} else if (strncmp(value_start, "start_laptop_predict", 20) == 0) {
    return CMD_START_LAPTOP_PREDICT;
} else if (strncmp(value_start, "start_predict", 13) == 0) {
    return CMD_START_PREDICT;
} else if (strncmp(value_start, "start", 5) == 0) {
    return CMD_START;
```

**In `serial_input_task()` FSM switch** — add to `STATE_CONNECTED` block:
```c
} else if (cmd == CMD_START_LAPTOP_PREDICT) {
    g_device_state = STATE_LAPTOP_PREDICT;
    printf("[STATE] CONNECTED -> LAPTOP_PREDICT\n");
    xQueueSend(g_cmd_queue, &cmd, 0);
}
```

**Add to the active-state check** in `serial_input_task()`:
```c
case STATE_STREAMING:
case STATE_PREDICTING:
case STATE_LAPTOP_PREDICT:  // ← ADD to the case list
    if (cmd == CMD_STOP) { ... }
```

**New function `run_laptop_predict_loop()`** (add alongside `stream_emg_data()` and `run_inference_loop()`):
```c
/**
 * @brief Laptop-mediated prediction loop (STATE_LAPTOP_PREDICT).
 *
 * Streams raw ADC CSV to laptop for inference.
 * Simultaneously reads gesture commands sent back by laptop.
 * Executes received gesture immediately.
 *
 * Laptop sends: {"gesture":"fist"}\n  OR  {"gesture":"rest"}\n  etc.
 * ESP32 parses the "gesture" field and calls inference_get_gesture_enum() + gestures_execute().
 */
static void run_laptop_predict_loop(void) {
    emg_sample_t sample;
    char cmd_buf[64];
    int cmd_idx = 0;

    printf("{\"status\":\"info\",\"msg\":\"Laptop-predict mode started\"}\n");

    while (g_device_state == STATE_LAPTOP_PREDICT) {
        // 1. Send raw ADC sample (same format as STATE_STREAMING)
        emg_sensor_read(&sample);
        printf("%u,%u,%u,%u\n", sample.channels[0], sample.channels[1],
               sample.channels[2], sample.channels[3]);

        // 2. Non-blocking read of any incoming gesture command from laptop
        //    (serial_input_task already handles FSM commands; this handles gesture commands)
        //    Note: getchar() is non-blocking when there is no data (returns EOF).
        //    Gesture messages from laptop look like: {"gesture":"fist"}\n
        int c = getchar();
        if (c != EOF && c != 0xFF) {
            if (c == '\n' || c == '\r') {
                if (cmd_idx > 0) {
                    cmd_buf[cmd_idx] = '\0';
                    // Parse {"gesture":"<name>"} — look for "gesture" field
                    const char *g = strstr(cmd_buf, "\"gesture\"");
                    if (g) {
                        const char *v = strchr(g, ':');
                        if (v) {
                            v++;
                            while (*v == ' ' || *v == '"') v++;
                            // Extract gesture name up to closing quote
                            char name[32] = {0};
                            int ni = 0;
                            while (*v && *v != '"' && ni < 31) name[ni++] = *v++;
                            name[ni] = '\0';
                            // Map name to enum and execute (reuse inference mapping)
                            gesture_t gesture = (gesture_t)inference_get_gesture_enum_by_name(name);
                            if (gesture != GESTURE_NONE) {
                                gestures_execute(gesture);
                            }
                        }
                    }
                    cmd_idx = 0;
                }
            } else if (cmd_idx < (int)sizeof(cmd_buf) - 1) {
                cmd_buf[cmd_idx++] = (char)c;
            } else {
                cmd_idx = 0;
            }
        }

        vTaskDelay(1);
    }
}
```

**Note**: `inference_get_gesture_enum_by_name(const char *name)` is just the existing
`inference_get_gesture_enum(int class_idx)` refactored to accept a string directly
(bypassing the class_idx lookup). Alternatively, keep the existing function and add a
simple wrapper — the string matching logic already exists in `inference.c`:
```c
// Simpler: reuse the existing strcmp chain in inference_get_gesture_enum()
// by passing the name through a helper that returns the gesture_t directly.
// Add to inference.c / inference.h:
gesture_t inference_get_gesture_by_name(const char *name);
// (same strcmp logic as inference_get_gesture_enum, but returns gesture_t directly)
```

**In `state_machine_loop()`** — add the new state:
```c
static void state_machine_loop(void) {
    command_t cmd;
    const TickType_t poll_interval = pdMS_TO_TICKS(50);
    while (1) {
        if      (g_device_state == STATE_STREAMING)        stream_emg_data();
        else if (g_device_state == STATE_PREDICTING)       run_inference_loop();
        else if (g_device_state == STATE_LAPTOP_PREDICT)   run_laptop_predict_loop();  // ← ADD
        xQueueReceive(g_cmd_queue, &cmd, poll_interval);
    }
}
```

**In `app_main()` switch** — add the standalone case:
```c
case EMG_STANDALONE:
    run_standalone_loop();  // new function — see Section 0.4
    break;
```

---

## 0.6 New Python Script: `live_predict.py`

**Location**: `C:/VSCode/Marvel_Projects/Bucky_Arm/live_predict.py` (new file)
**Purpose**: Laptop-side live inference. Reads raw ADC stream from ESP32, runs the Python
classifier, sends gesture commands back to ESP32 for arm control.
**When to use**: `EMG_MAIN` + `STATE_LAPTOP_PREDICT` — useful for debugging and comparing
laptop accuracy vs on-device accuracy before flashing a new model.

```python
"""
live_predict.py — Laptop-side live EMG inference for Bucky Arm.

Connects to ESP32, requests STATE_LAPTOP_PREDICT, reads raw ADC CSV,
runs the trained Python classifier, sends gesture commands back to ESP32.

Usage:
    python live_predict.py --port COM3 --model path/to/saved_model/
"""
import argparse
import time
import numpy as np
import serial
from pathlib import Path
import sys
sys.path.insert(0, str(Path(__file__).parent))
from learning_data_collection import (
    EMGClassifier, EMGFeatureExtractor, SessionStorage, HAND_CHANNELS,
    WINDOW_SIZE_SAMPLES, HOP_SIZE_SAMPLES, NUM_CHANNELS,
)

BAUD_RATE    = 921600
CALIB_SEC    = 3.0          # seconds of REST to collect for normalization at startup
CALIB_LABEL  = "rest"       # label used during calibration window

def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("--port",  required=True, help="Serial port, e.g. COM3 or /dev/ttyUSB0")
    p.add_argument("--model", required=True, help="Path to saved EMGClassifier model directory")
    return p.parse_args()

def handshake(ser):
    """Send connect command, wait for ack."""
    ser.write(b'{"cmd":"connect"}\n')
    deadline = time.time() + 5.0
    while time.time() < deadline:
        line = ser.readline().decode("utf-8", errors="ignore").strip()
        if "ack_connect" in line:
            print(f"[Handshake] Connected: {line}")
            return True
    raise RuntimeError("No ack_connect received within 5s")

def collect_calibration_windows(ser, n_windows, window_size, hop_size, n_channels):
    """Collect n_windows worth of REST data for normalization calibration."""
    print(f"[Calib] Collecting {n_windows} REST windows — hold arm still...")
    raw_buffer = np.zeros((window_size, n_channels), dtype=np.float32)
    windows = []
    sample_count = 0
    while len(windows) < n_windows:
        line = ser.readline().decode("utf-8", errors="ignore").strip()
        try:
            vals = [float(v) for v in line.split(",")]
            if len(vals) != n_channels:
                continue
        except ValueError:
            continue
        raw_buffer = np.roll(raw_buffer, -1, axis=0)
        raw_buffer[-1] = vals
        sample_count += 1
        if sample_count >= window_size and sample_count % hop_size == 0:
            windows.append(raw_buffer.copy())
    print(f"[Calib] Collected {len(windows)} windows. Computing normalization stats...")
    return np.array(windows)  # (n_windows, window_size, n_channels)

def main():
    args = parse_args()

    # Load trained classifier
    print(f"[Init] Loading classifier from {args.model}...")
    classifier = EMGClassifier()
    classifier.load(Path(args.model))
    extractor = classifier.feature_extractor

    ser = serial.Serial(args.port, BAUD_RATE, timeout=1.0)
    time.sleep(0.5)
    ser.reset_input_buffer()

    handshake(ser)

    # Request laptop-predict mode
    ser.write(b'{"cmd":"start_laptop_predict"}\n')
    print("[Control] Entered STATE_LAPTOP_PREDICT")

    # Calibration: collect 3s of REST for session normalization
    n_calib_windows = max(10, int(CALIB_SEC * 1000 / (HOP_SIZE_SAMPLES)))
    calib_raw = collect_calibration_windows(
        ser, n_calib_windows, WINDOW_SIZE_SAMPLES, HOP_SIZE_SAMPLES, NUM_CHANNELS
    )
    calib_features = extractor.extract_features_batch(calib_raw)
    calib_mean = calib_features.mean(axis=0)
    calib_std  = np.where(calib_features.std(axis=0) > 1e-6,
                          calib_features.std(axis=0), 1e-6)
    print("[Calib] Done. Starting live prediction...")

    # Live prediction loop
    raw_buffer   = np.zeros((WINDOW_SIZE_SAMPLES, NUM_CHANNELS), dtype=np.float32)
    sample_count = 0
    last_gesture = None

    try:
        while True:
            line = ser.readline().decode("utf-8", errors="ignore").strip()

            # Skip JSON telemetry lines from ESP32
            if line.startswith("{"):
                continue

            try:
                vals = [float(v) for v in line.split(",")]
                if len(vals) != NUM_CHANNELS:
                    continue
            except ValueError:
                continue

            # Slide window
            raw_buffer = np.roll(raw_buffer, -1, axis=0)
            raw_buffer[-1] = vals
            sample_count += 1

            if sample_count >= WINDOW_SIZE_SAMPLES and sample_count % HOP_SIZE_SAMPLES == 0:
                # Extract features and normalize with session stats
                feat = extractor.extract_features_window(raw_buffer)
                feat = (feat - calib_mean) / calib_std

                proba = classifier.model.predict_proba([feat])[0]
                class_idx = int(np.argmax(proba))
                gesture_name = classifier.label_names[class_idx]
                confidence   = float(proba[class_idx])

                # Send gesture command to ESP32
                cmd = f'{{"gesture":"{gesture_name}"}}\n'
                ser.write(cmd.encode("utf-8"))

                if gesture_name != last_gesture:
                    print(f"[Predict] {gesture_name:12s}  conf={confidence:.2f}")
                    last_gesture = gesture_name

    except KeyboardInterrupt:
        print("\n[Stop] Sending stop command...")
        ser.write(b'{"cmd":"stop"}\n')
        ser.close()

if __name__ == "__main__":
    main()
```

**Dependencies** (add to a `requirements.txt` in `Bucky_Arm/` if not already there):
```
pyserial
numpy
scikit-learn
```

---

## 0.7 Firmware Cleanup: `system_mode_t` Removal

`config.h` lines 94–100 define a `system_mode_t` typedef that is **not referenced anywhere**
in the firmware. It predates the current `device_state_t` FSM in `main.c` and conflicts
conceptually with it. Remove before starting implementation work.

**File**: `EMG_Arm/src/config/config.h`
**Remove** (lines 93–100):
```c
/**
 * @brief System operating modes.
 */
typedef enum {
    MODE_IDLE = 0,      /**< Waiting for commands */
    MODE_DATA_STREAM,   /**< Streaming EMG data to laptop */
    MODE_COMMAND,       /**< Executing gesture commands from laptop */
    MODE_DEMO,          /**< Running demo sequence */
    MODE_COUNT
} system_mode_t;
```
No other file references `system_mode_t` — the deletion is safe and requires no other changes.

---

# PART I — SYSTEM FOUNDATIONS

## 1. Hardware Specification

### ESP32-S3 N32R16V — Confirmed Hardware

| Resource | Spec | Implication |
|----------|------|-------------|
| CPU | Dual-core Xtensa LX7 @ 240 MHz | Pin inference to Core 1, sampling to Core 0 |
| SIMD | PIE 128-bit vector extension | esp-dsp exploits this for FFT, biquad, dot-product |
| Internal SRAM | ~512 KB | All hot-path buffers, model weights, inference state |
| OPI PSRAM | 16 MB (~80 MB/s) | ADC ring buffer, raw window storage — not hot path |
| Flash | 32 MB | Code + read-only model flatbuffers (TFLM path) |
| ADC | 2× SAR ADC, 12-bit, continuous DMA mode | Change A: use `adc_continuous` driver |

**Memory rules**:
- Tag inference code: `IRAM_ATTR` — prevents cache miss stalls
- Tag large ring buffers: `EXT_RAM_BSS_ATTR` — pushes to PSRAM automatically
- Never run hot-path loops from PSRAM (latency varies; ~10× slower than SRAM)

### Espressif Acceleration Libraries

| Library | Accelerates | Key Functions |
|---------|-------------|---------------|
| **esp-dsp** | IIR biquad, FFT (up to 4096-pt), vector dot-product, matrix ops — PIE SIMD | `dsps_biquad_f32`, `dsps_fft2r_fc32`, `dsps_dotprod_f32` |
| **esp-nn** | int8 FC, depthwise/pointwise Conv, activations — SIMD optimized | Used internally by esp-dl |
| **esp-dl** | High-level int8 inference: MLP, Conv1D, LSTM; activation buffer management | Small MLP / tiny CNN deployment |
| **TFLite Micro** | Standard int8 flatbuffer inference, tensor arena (static alloc) | Keras → TFLite → int8 workflow |

### Real-Time Budget (1000 Hz, 25ms hop)

| Stage | Cost | Notes |
|-------|------|-------|
| ADC DMA sampling | ~0 µs | Hardware; CPU-free |
| IIR biquad (3 ch, 2 stages) | <100 µs | `dsps_biquad_f32` |
| Feature extraction (69 feat) | ~1,200 µs | FFT-based features dominate |
| 3 specialist LDAs | ~150 µs | `dsps_dotprod_f32` per class |
| Meta-LDA (15 inputs) | ~10 µs | 75 MACs total |
| int8 MLP fallback [69→32→16→5] | ~250 µs | esp-nn FC kernels |
| Post-processing | <50 µs | EMA, vote, debounce |
| **Total (full ensemble)** | **~1,760 µs** | **14× margin within 25ms** |

### Hard No-Gos

| Technique | Why |
|-----------|-----|
| Full MPF with matrix logarithm | Eigendecomposition per window; fragile float32; no SIMD path |
| Conv1D(16→512) + 3×LSTM(512) | ~4 MB weights; LSTM sequential dependency — impossible |
| Any transformer / attention | O(n²); no int8 transformer kernels for MCU |
| On-device gradient updates | Inference only — no training infrastructure |
| Heap allocations on hot path | FreeRTOS heap fragmentation kills determinism |

---

## 2. Current System Snapshot

| Aspect | Current State |
|--------|--------------|
| Channels | 4 total; ch0–ch2 forearm (FCR, FCU, extensor), ch3 bicep (excluded from hand classifier) |
| Sampling | 1000 Hz, timer/polling (jitter — fix with Change A) |
| Window | 150 samples (150ms), 25-sample hop (25ms) |
| Features | 12: RMS, WL, ZC, SSC × 3 channels |
| Classifier | Single LDA, float32 weights in C header |
| Label alignment | RMS onset detection — missing +100ms forward shift (Change 0) |
| Normalization | Per-session z-score in Python; no on-device equivalent (Change D) |
| Smoothing | EMA (α=0.7) + majority vote (5) + debounce (3 counts) |
| Confidence rejection | None — always outputs a class (Change C) |
| Signal filtering | Analogue only via MyoWare (Change B adds software IIR) |
| Gestures | 5: fist, hook\_em, open, rest, thumbs\_up |
| Training data | 15 HDF5 sessions, 1 user |

---

## 2.1 — Confirmed Firmware Architecture (From Codebase Exploration)

> Confirmed by direct codebase inspection 2026-02-24. All file paths relative to
> `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/`

### ADC Pin Mapping (`drivers/emg_sensor.c`)

| Channel | ADC Channel | GPIO | Muscle Location | Role in Classifier |
|---------|-------------|------|-----------------|-------------------|
| ch0 | `ADC_CHANNEL_1` | GPIO 2 | Forearm Belly (FCR) | Primary flexion signal |
| ch1 | `ADC_CHANNEL_2` | GPIO 3 | Forearm Extensors | Extension signal |
| ch2 | `ADC_CHANNEL_8` | GPIO 9 | Forearm Contractors (FCU) | Ulnar flexion signal |
| ch3 | `ADC_CHANNEL_9` | GPIO 10 | Bicep | Independent — see Section 2.2 |

**Current ADC driver**: `adc_oneshot` (polling — **NOT DMA continuous yet**; Change A migrates this)
- Attenuation: `ADC_ATTEN_DB_12` (0–3.9V full-scale range)
- Calibration: `adc_cali_curve_fitting` scheme
- Output: calibrated millivolts as `uint16_t` packed into `emg_sample_t.channels[4]`
- Timing: `vTaskDelay(1)` in `run_inference_loop()` provides the ~1ms sample interval

### Current Task Structure (`app/main.c`)

| Task | Priority | Stack | Core Pinning | Role |
|------|----------|-------|--------------|------|
| `app_main` (implicit) | Default | Default | None | Runs inference loop + state machine |
| `serial_input_task` | 5 | 4096 B | **None** | Parses UART JSON commands |

**No other tasks exist.** Change A will add `adc_sampling_task` pinned to Core 0.
The inference loop runs on `app_main`'s default task — no explicit core affinity.

### State Machine (`app/main.c`)

```
STATE_IDLE  ─(BLE/UART connect)─►  STATE_CONNECTED
                                         │
                   {"cmd": "start_stream"}▼
                                  STATE_STREAMING    (sends raw ADC over UART for Python)
                                         │
                  {"cmd": "start_predict"}▼
                                  STATE_PREDICTING   (runs run_inference_loop())
```
Communication: UART at 921600 baud, JSON framing.

### Complete Data Flow (Exact Function Names)

```
emg_sensor_read(&sample)
  │  drivers/emg_sensor.c
  │  adc_oneshot_read() × 4 channels → adc_cali_raw_to_voltage() → uint16_t mV
  │  Result: sample.channels[4] = {ch0_mV, ch1_mV, ch2_mV, ch3_mV}
  │
  ▼  Called every ~1ms (vTaskDelay(1) in run_inference_loop)
inference_add_sample(sample.channels)
  │  core/inference.c
  │  Writes to circular window_buffer[150][4]
  │  Returns true when buffer is full (after first 150 samples)
  │
  ▼  Called every 25 samples (stride_counter % INFERENCE_HOP_SIZE == 0)
inference_predict(&confidence)
  │  core/inference.c
  │  compute_features() → LDA scores → softmax → EMA → majority vote → debounce
  │  Returns: gesture class index (int), fills confidence (float)
  │
  ▼
inference_get_gesture_enum(class_idx)
  │  core/inference.c
  │  String match on MODEL_CLASS_NAMES[] → gesture_t enum value
  │
  ▼
gestures_execute(gesture)
     core/gestures.c
     switch(gesture) → servo PWM via LEDC driver
     Servo pins: GPIO 1,4,5,6,7 (Thumb, Index, Middle, Ring, Pinky)
```

### Current Buffer State

```c
// core/inference.c line 19:
static uint16_t window_buffer[INFERENCE_WINDOW_SIZE][NUM_CHANNELS];
//       ^^^^^^^^ MUST change to float when adding IIR filter (Change B)
//
// uint16_t: 150 × 4 × 2 = 1,200 bytes in internal SRAM
// float:    150 × 4 × 4 = 2,400 bytes in internal SRAM  (still trivially small)
//
// Reason for change: IIR filter outputs float; casting back to uint16_t loses
// sub-mV precision and re-introduces the quantization noise we just filtered out.
```

### `platformio.ini` Current State (`EMG_Arm/platformio.ini`)

**Current `lib_deps`**: **None** — completely empty, no external library dependencies.

Required additions per change tier:

| Change | Library | `platformio.ini` `lib_deps` entry |
|--------|---------|----------------------------------|
| B (IIR biquad) | esp-dsp | `espressif/esp-dsp @ ^2.0.0` |
| 1 (FFT features) | esp-dsp | (same — add once for both B and 1) |
| E (int8 MLP) | TFLite Micro | `tensorflow/tflite-micro` |
| F (ensemble) | esp-dsp | (same as B) |

Add to `platformio.ini` under `[env:esp32-s3-devkitc1-n16r16]`:
```ini
lib_deps =
    espressif/esp-dsp @ ^2.0.0
    ; tensorflow/tflite-micro   ← add this only when implementing Change E
```

---

## 2.2 — Bicep Channel Subsystem (ch3 / ADC_CHANNEL_9 / GPIO 10)

### Current Status

The bicep channel is:
- **Sampled**: `emg_sensor_read()` reads all 4 channels; `sample.channels[3]` holds bicep data
- **Excluded from hand classifier**: `HAND_NUM_CHANNELS = 3`; `compute_features()` explicitly
  loops `ch = 0` to `ch < HAND_NUM_CHANNELS` (i.e., ch0, ch1, ch2 only)
- **Not yet independently processed**: the comment in `inference.c` line 68
  (`"ch3 (bicep) is excluded — it will be processed independently"`) is aspirational —
  the independent processing is not yet implemented

### Phase 1 — Binary Flex/Unflex (Current Target)

Implement a simple RMS threshold detector as a new subsystem:

**New files:**
```
EMG_Arm/src/core/bicep.h
EMG_Arm/src/core/bicep.c
```

**bicep.h:**
```c
#pragma once
#include <stdint.h>
#include <stdbool.h>

typedef enum {
    BICEP_STATE_REST = 0,
    BICEP_STATE_FLEX = 1,
} bicep_state_t;

// Call once at session start with ~3s of relaxed bicep data.
// Returns the computed threshold (also stored internally).
float bicep_calibrate(const uint16_t *ch3_samples, int n_samples);

// Call every 25ms (same hop as hand gesture inference).
// Computes RMS on the last BICEP_WINDOW_SAMPLES from the ch3 circular buffer.
bicep_state_t bicep_detect(void);

// Load/save threshold to NVS (reuse calibration.c infrastructure from Change D)
bool bicep_save_threshold(float threshold_mv);
bool bicep_load_threshold(float *threshold_mv_out);
```

**Core logic (`bicep.c`):**
```c
#define BICEP_WINDOW_SAMPLES  50     // 50ms window at 1000Hz
#define BICEP_FLEX_MULTIPLIER 2.5f   // threshold = rest_rms × 2.5
#define BICEP_HYSTERESIS      1.3f   // prevents rapid toggling at threshold boundary

static float s_threshold_mv = 0.0f;
static bicep_state_t s_state = BICEP_STATE_REST;

float bicep_calibrate(const uint16_t *ch3_samples, int n_samples) {
    float rms_sq = 0.0f;
    for (int i = 0; i < n_samples; i++)
        rms_sq += (float)ch3_samples[i] * ch3_samples[i];
    float rest_rms = sqrtf(rms_sq / n_samples);
    s_threshold_mv = rest_rms * BICEP_FLEX_MULTIPLIER;
    printf("[Bicep] Calibrated: rest_rms=%.1f mV, threshold=%.1f mV\n",
           rest_rms, s_threshold_mv);
    return s_threshold_mv;
}

bicep_state_t bicep_detect(void) {
    // Compute RMS on last BICEP_WINDOW_SAMPLES from ch3 circular buffer
    // (ch3 values are stored in window_buffer[][3] alongside hand channels)
    float rms_sq = 0.0f;
    int idx = buffer_head;
    for (int i = 0; i < BICEP_WINDOW_SAMPLES; i++) {
        float v = (float)window_buffer[idx][3];  // ch3 = bicep
        rms_sq += v * v;
        idx = (idx + 1) % INFERENCE_WINDOW_SIZE;
    }
    float rms = sqrtf(rms_sq / BICEP_WINDOW_SAMPLES);

    // Hysteresis: require FLEX_MULTIPLIER to enter flex, 1.0× to exit
    if (s_state == BICEP_STATE_REST && rms > s_threshold_mv * BICEP_HYSTERESIS)
        s_state = BICEP_STATE_FLEX;
    else if (s_state == BICEP_STATE_FLEX && rms < s_threshold_mv)
        s_state = BICEP_STATE_REST;

    return s_state;
}
```

**Integration in `main.c` `run_inference_loop()`:**
```c
// Call alongside inference_predict() every 25ms:
if (stride_counter % INFERENCE_HOP_SIZE == 0) {
    float confidence;
    int class_idx     = inference_predict(&confidence);
    gesture_t gesture = inference_get_gesture_enum(class_idx);
    bicep_state_t bicep = bicep_detect();

    // Combined actuation: hand gesture + bicep state
    // Example: bicep flex can enable/disable certain gestures,
    // or control a separate elbow/wrist joint.
    gestures_execute(gesture);
    // bicep_actuate(bicep);  ← add when elbow motor is wired
}
```

**Calibration trigger (add to serial_input_task command parsing):**
```c
// {"cmd": "calibrate_bicep"}  → collect 3s of rest data, call bicep_calibrate()
```

### Phase 2 — Continuous Angle/Velocity Prediction (Future)

When ready to move beyond binary flex/unflex:

1. **Collect angle-labeled data**: hold arm at 0°, 15°, 30°, 45°, 60°, 75°, 90°;
   log RMS at each; collect 5+ reps per angle.
2. **Fit polynomial**: `angle = a0 + a1*rms + a2*rms²` (degree-2 usually sufficient);
   use `numpy.polyfit(rms_values, angles, deg=2)`.
3. **Store coefficients in NVS**: 3 floats via `nvs_set_blob()`.
4. **On-device evaluation**: `angle = a0 + rms*(a1 + rms*a2)` — 2 MACs per inference.
5. **Velocity**: `velocity = (angle_now - angle_prev) / HOP_MS` with low-pass smoothing.

### Including ch3 in Hand Gesture Classifier (for Wrist Rotation)

If/when wrist rotation or supination gestures are added:
```python
# learning_data_collection.py — change this constant:
HAND_CHANNELS = [0, 1, 2, 3]  # was [0, 1, 2]; include bicep for rotation gestures
```
Feature count becomes: 4 channels × 20 per-ch + 10 cross-ch covariances + 6 correlations = **96 total**.
The bicep subsystem is then retired and ch3 becomes part of the main gesture classifier.

---

## 3. What Meta Built — Filtered for ESP32

Meta's Nature 2025 paper (doi:10.1038/s41586-025-09255-w) describes a 16-channel wristband
running Conv1D(16→512)+3×LSTM(512). **That exact model is not portable to ESP32-S3** (~4 MB
weights). What IS transferable:

| Meta Technique | Transferability | Where Used |
|----------------|-----------------|-----------|
| +100ms forward label shift after onset detection | ✓ Direct copy | Change 0 |
| Frequency features > amplitude features (Extended Data Fig. 6) | ✓ Core insight | Change 1, Change 6 |
| Deliberate electrode repositioning between sessions | ✓ Protocol | Change 2 |
| Window jitter + amplitude augmentation | ✓ Training | Change 3 |
| Reinhard compression `64x/(32+|x|)` | ✓ Optional flag | Change 4 |
| EMA α=0.7, threshold=0.35, debounce=50ms | ✓ Already implemented | Change C |
| Specialist features → meta-learner stacking | ✓ Adapted | Change 7 + F |
| Conv1D+LSTM architecture | ✗ Too large | Not implementable |
| Full MPF with matrix logarithm | ✗ Eigendecomp too costly | Not implementable |

---

## 4. Current Code State + Known Bugs

**All Python changes**: `C:/VSCode/Marvel_Projects/Bucky_Arm/learning_data_collection.py`
**Firmware**: `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.c`
**Config**: `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/config/config.h`
**Weights**: `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights.h`

### Key Symbol Locations

| Symbol | Line | Notes |
|--------|------|-------|
| Constants block | 49–94 | `NUM_CHANNELS`, `SAMPLING_RATE_HZ`, `WINDOW_SIZE_MS`, etc. |
| `align_labels_with_onset()` | 442 | RMS onset detection |
| `filter_transition_windows()` | 529 | Removes onset/offset ambiguity windows |
| `SessionStorage.save_session()` | 643 | Calls onset alignment, saves HDF5 |
| `SessionStorage.load_all_for_training()` | 871 | Returns 6 values (see bug below) |
| `EMGFeatureExtractor` class | 1404 | Current: RMS, WL, ZC, SSC only |
| `extract_features_single_channel()` | 1448 | Per-channel feature dict |
| `extract_features_window()` | 1482 | Flat array + cross-channel |
| `extract_features_batch()` | 1520 | Batch wrapper |
| `get_feature_names()` | 1545 | String names for features |
| `CalibrationTransform` class | 1562 | z-score at Python-side inference |
| `EMGClassifier` class | 1713 | LDA/QDA wrapper |
| `EMGClassifier.__init__()` | 1722 | Creates `EMGFeatureExtractor` |
| `EMGClassifier.train()` | 1735 | Feature extraction + model fit |
| `EMGClassifier._apply_session_normalization()` | 1774 | Per-session z-score |
| `EMGClassifier.cross_validate()` | 1822 | GroupKFold, trial-level |
| `EMGClassifier.export_to_header()` | 1956 | Writes `model_weights.h` |
| `EMGClassifier.save()` | 1910 | Persists model params |
| `EMGClassifier.load()` | 2089 | Reconstructs from saved params |
| `run_training_demo()` | 2333 | Main training entry point |
| `inference.c` `compute_features()` | 68 | C feature extraction |
| `inference.c` `inference_predict()` | 158 | C LDA + smoothing pipeline |

### Pending Cleanups (Do Before Any Other Code Changes)

| Item | File | Action |
|------|------|--------|
| Remove `system_mode_t` | `config/config.h` lines 93–100 | Delete the unused typedef (see Part 0, Section 0.7) |
| Add `EMG_STANDALONE` to enum | `config/config.h` line 19 | Add value to the existing MAIN_MODE enum |
| Add `STATE_LAPTOP_PREDICT` + `CMD_START_LAPTOP_PREDICT` | `app/main.c` | See Part 0, Section 0.5 for exact diffs |
| Add `run_standalone_loop()` | `app/main.c` | New function — see Part 0, Section 0.4 |
| Add `run_laptop_predict_loop()` | `app/main.c` | New function — see Part 0, Section 0.5 |
| Add `inference_get_gesture_by_name()` | `core/inference.c` + `core/inference.h` | Small helper — extracts existing strcmp logic |

### Known Bug — Line 2382

```python
# BUG: load_all_for_training() returns 6 values; this call unpacks only 5.
# session_indices_combined is silently dropped — breaks per-session normalization.
X, y, trial_ids, label_names, loaded_sessions = storage.load_all_for_training()

# FIX (apply with Change 1):
X, y, trial_ids, session_indices, label_names, loaded_sessions = storage.load_all_for_training()
```

### Current `model_weights.h` State (as of 2026-02-14 training run)

| Constant | Value | Note |
|----------|-------|------|
| `MODEL_NUM_CLASSES` | 5 | fist, hook_em, open, rest, thumbs_up |
| `MODEL_NUM_FEATURES` | 12 | RMS, WL, ZC, SSC × 3 forearm channels |
| `MODEL_CLASS_NAMES` | `{"fist","hook_em","open","rest","thumbs_up"}` | Alphabetical order |
| `MODEL_NORMALIZE_FEATURES` | *not defined yet* | Add when enabling cross-ch norm (Change B) |
| `MODEL_USE_REINHARD` | *not defined yet* | Add when enabling Reinhard compression (Change 4) |
| `FEAT_ZC_THRESH` | `0.1f` | Fraction of RMS for zero-crossing threshold |
| `FEAT_SSC_THRESH` | `0.1f` | Fraction of RMS for slope sign change threshold |

The LDA_WEIGHTS and LDA_INTERCEPTS arrays are current trained values — do not modify manually.
They are regenerated by `EMGClassifier.export_to_header()` after each training run.

### Current Feature Vector (12 features — firmware contract)

```
ch0: [0]=rms  [1]=wl  [2]=zc  [3]=ssc
ch1: [4]=rms  [5]=wl  [6]=zc  [7]=ssc
ch2: [8]=rms  [9]=wl [10]=zc [11]=ssc
```

### Target Feature Vector (69 features after Change 1)

```
Per channel (×3 channels, 20 features each):
  [0] rms  [1] wl   [2] zc   [3] ssc   [4] mav   [5] var
  [6] iemg [7] wamp [8] ar1  [9] ar2  [10] ar3  [11] ar4
 [12] mnf [13] mdf  [14] pkf [15] mnp  [16] bp0  [17] bp1
 [18] bp2 [19] bp3

ch0: indices  0–19
ch1: indices 20–39
ch2: indices 40–59

Cross-channel (9 features):
 [60] cov_ch0_ch0  [61] cov_ch0_ch1  [62] cov_ch0_ch2
 [63] cov_ch1_ch1  [64] cov_ch1_ch2  [65] cov_ch2_ch2
 [66] cor_ch0_ch1  [67] cor_ch0_ch2  [68] cor_ch1_ch2
```

### Specialist Feature Subset Indices (for Change F + Change 7)

```
TD (time-domain, 36 feat): indices [0–11, 20–31, 40–51]
FD (frequency-domain, 24 feat): indices [12–19, 32–39, 52–59]
CC (cross-channel, 9 feat): indices [60–68]
```

---

# PART II — TARGET ARCHITECTURE

## 5. Full Recommended Multi-Model Stack

```
ADC (DMA, Change A)
  └── IIR Biquad filter per channel (Change B)
        └── 150-sample circular window buffer
              │
              ▼  [every 25ms]
        compute_features()  →  69-feature vector
              │
              ▼
        calibration_apply()  (Change D — NVS z-score)
              │
              ├─── Stage 1: Activity Gate ──────────────────────────────────┐
              │    total_rms < REST_THRESHOLD?  →  return GESTURE_REST      │
              │    (skips all inference during obvious idle)                 │
              │                                                              │
              ▼  (only reached when gesture is active)                      │
        Stage 2: Parallel Specialist LDAs (Change F)                        │
              ├── LDA_TD  [TD features, 36-dim]  →  prob_td[5]             │
              ├── LDA_FD  [FD features, 24-dim]  →  prob_fd[5]             │
              └── LDA_CC  [CC features,  9-dim]  →  prob_cc[5]             │
                                                                            │
              ▼                                                             │
        Stage 3: Meta-LDA stacker (Change F)                               │
              input: [prob_td | prob_fd | prob_cc]  (15-dim)               │
              output: meta_probs[5]                                         │
                                                                            │
              ▼                                                             │
        EMA smoothing (α=0.7) on meta_probs                                │
              │                                                             │
              ├── max smoothed prob ≥ 0.50? ────── Yes ──────────────────┐ │
              │                                                           │ │
              └── No: Stage 4 Confidence Cascade (Change E)              │ │
                    run int8 MLP on full 69-feat vector                  │ │
                    use higher-confidence winner                         │ │
                          │                                              │ │
                          └────────────────────────────────────────────►│ │
                                                                         │ │
              ◄────────────────────────────────────────────────────────── │ │
              │                                                            ◄─┘
              ▼
        Stage 5: Confidence rejection (Change C)
              max_prob < 0.40?  →  return current_output (hold / GESTURE_NONE)
              │
              ▼
        Majority vote (window=5) + Debounce (count=3)
              │
              ▼
        final gesture → actuation
```

### Model Weight Footprint

| Model | Input Dim | Weights | Memory (float32) |
|-------|-----------|---------|-----------------|
| LDA_TD | 36 | 5×36 = 180 | 720 B |
| LDA_FD | 24 | 5×24 = 120 | 480 B |
| LDA_CC | 9 | 5×9 = 45 | 180 B |
| Meta-LDA | 15 | 5×15 = 75 | 300 B |
| int8 MLP [69→32→16→5] | 69 | ~2,900 | ~2.9 KB int8 |
| **Total** | | | **~4.6 KB** |

All model weights fit comfortably in internal SRAM.

---

## 6. Compute Budget for Full Stack

| Stage | Cost | Cumulative |
|-------|------|-----------|
| Feature extraction (69 feat, 128-pt FFT ×3) | 1,200 µs | 1,200 µs |
| NVS calibration apply | 10 µs | 1,210 µs |
| Activity gate (RMS check) | 5 µs | 1,215 µs |
| LDA_TD (36 feat × 5 classes) | 50 µs | 1,265 µs |
| LDA_FD (24 feat × 5 classes) | 35 µs | 1,300 µs |
| LDA_CC (9 feat × 5 classes) | 15 µs | 1,315 µs |
| Meta-LDA (15 feat × 5 classes) | 10 µs | 1,325 µs |
| EMA + confidence check | 10 µs | 1,335 µs |
| int8 MLP (worst case, ~30% of hops) | 250 µs | 1,585 µs |
| Vote + debounce | 20 µs | 1,605 µs |
| **Worst-case total** | **1,760 µs** | **7% of 25ms budget** |

---

## 7. Why This Architecture Works for 3-Channel EMG

Three channels means limited spatial information. The ensemble compensates by extracting
**maximum diversity from the temporal and spectral dimensions**:

- **LDA_TD** specializes in muscle activation *intensity and dynamics* (how hard and fast is each muscle firing)
- **LDA_FD** specializes in muscle activation *frequency content* (motor unit recruitment patterns — slow vs. fast twitch fibres fire at different frequencies)
- **LDA_CC** specializes in *inter-muscle coordination* (which muscles co-activate — the spatial "fingerprint" of each gesture)

These three signal aspects are partially uncorrelated. A gesture that confuses LDA_TD (similar amplitude patterns) may be distinguishable by LDA_FD (different frequency recruitment) or LDA_CC (different co-activation pattern). The meta-LDA learns which specialist to trust for each gesture boundary.

The int8 MLP fallback handles the residual nonlinear cases: gesture pairs where the decision boundary is curved in feature space, which LDA (linear boundary only) cannot resolve.

---

# PART III — GESTURE EXTENSIBILITY

## 8. What Changes When Adding or Removing a Gesture

The system is designed for extensibility. Adding a gesture requires **3 firmware lines and a retrain**.

### What Changes Automatically (No Manual Code Edits)

| Component | How it adapts |
|-----------|--------------|
| `MODEL_NUM_CLASSES` in `model_weights.h` | Auto-computed from training data label count |
| LDA weight array dimensions | `[MODEL_NUM_CLASSES][MODEL_NUM_FEATURES]` — regenerated by `export_to_header()` |
| `MODEL_CLASS_NAMES` array | Regenerated by `export_to_header()` |
| All ensemble LDA weight arrays | Regenerated by `export_ensemble_header()` (Change 7) |
| int8 MLP output layer | Retrained with new class count; re-exported to TFLite |
| Meta-LDA input/output dims | `META_NUM_INPUTS = 3 × MODEL_NUM_CLASSES` — auto from Python |

### What Requires Manual Code Changes

**Python side** (`learning_data_collection.py`):
```python
# 1. Add gesture name to the gesture list (1 line)
# Find where GESTURES or similar list is defined (near constants block ~line 49)
GESTURES = ['fist', 'hook_em', 'open', 'rest', 'thumbs_up', 'wrist_flex']  # example
```

**Firmware — `config.h`** (1 line per gesture):
```c
// Add enum value
typedef enum {
    GESTURE_NONE     = 0,
    GESTURE_REST     = 1,
    GESTURE_FIST     = 2,
    GESTURE_OPEN     = 3,
    GESTURE_HOOK_EM  = 4,
    GESTURE_THUMBS_UP = 5,
    GESTURE_WRIST_FLEX = 6,  // ← add this line
} gesture_t;
```

**Firmware — `inference.c`** `inference_get_gesture_enum()` (2–3 lines per gesture):
```c
if (strcmp(name, "wrist_flex") == 0 || strcmp(name, "WRIST_FLEX") == 0)
    return GESTURE_WRIST_FLEX;
```

**Firmware — `gestures.c`** (2 changes — these are easy to miss):
```c
// 1. Add to gesture_names[] static array — index MUST match gesture_t enum value:
static const char *gesture_names[GESTURE_COUNT] = {
    "NONE",       // GESTURE_NONE = 0
    "REST",       // GESTURE_REST = 1
    "FIST",       // GESTURE_FIST = 2
    "OPEN",       // GESTURE_OPEN = 3
    "HOOK_EM",    // GESTURE_HOOK_EM = 4
    "THUMBS_UP",  // GESTURE_THUMBS_UP = 5
    "WRIST_FLEX", // GESTURE_WRIST_FLEX = 6  ← add here
};

// 2. Add case to gestures_execute() switch statement:
case GESTURE_WRIST_FLEX:
    gesture_wrist_flex();   // implement the actuation function
    break;
```

**Critical**: `GESTURE_COUNT` at the end of the `gesture_t` enum in `config.h` is used as the
array size for `gesture_names[]`. It updates automatically when new enum values are added before
it. Both `gesture_names[GESTURE_COUNT]` and the switch statement must be kept in sync with
`GESTURE_COUNT`. Mismatch causes a bounds-overrun or silent misclassification.

### Complete Workflow for Adding a Gesture

```
1. Python: add gesture string to GESTURES list in learning_data_collection.py (1 line)

2. Data: collect ≥10 sessions × ≥30 reps of new gesture
   (follow Change 2 protocol: vary electrode placement between sessions)

3. Train: python learning_data_collection.py → option 3
         OR: python train_ensemble.py (after Change 7 is implemented)

4. Export: export_to_header() OR export_ensemble_header()
   → overwrites model_weights.h / model_weights_ensemble.h with new class count

5. config.h: add enum value before GESTURE_COUNT (1 line):
       GESTURE_WRIST_FLEX = 6,   // ← insert before GESTURE_COUNT
       GESTURE_COUNT             // stays last — auto-counts

6. inference.c: add string mapping in inference_get_gesture_enum() (2 lines)

7. gestures.c: add name to gesture_names[] array at correct index (1 line)

8. gestures.c: add case to gestures_execute() switch statement (3 lines)

9. Implement actuation function for new gesture (servo angles)

10. Reflash and validate: pio run -t upload
```

**Exact files touched per new gesture (summary):**
| File | What to change |
|------|---------------|
| `learning_data_collection.py` | Add string to GESTURES list |
| `config/config.h` | Add enum value before `GESTURE_COUNT` |
| `core/inference.c` | Add `strcmp` case in `inference_get_gesture_enum()` |
| `core/gestures.c` | Add to `gesture_names[]` array + add switch case |
| `core/gestures.c` | Implement `gesture_<name>()` function with servo angles |
| `core/model_weights.h` | Auto-generated — do not edit manually |

### Removing a Gesture

Removing is the same process in reverse, with one additional step: filter the HDF5 training
data to exclude sessions that contain the removed gesture's label. The simplest approach is
to pass a label whitelist to `load_all_for_training()`:

```python
# Proposed addition to load_all_for_training() — add include_labels parameter
X, y, trial_ids, session_indices, label_names, sessions = \
    storage.load_all_for_training(include_labels=['fist', 'open', 'rest', 'thumbs_up'])
    # hook_em removed — existing session files are not modified
```

---

## 9. Practical Limits of 3-Channel EMG

This is the most important constraint for gesture count:

| Gesture Count | Expected Accuracy | Notes |
|--------------|-------------------|-------|
| 3–5 gestures | >90% achievable | Current baseline target |
| 6–8 gestures | 80–90% achievable | Requires richer features + ensemble |
| 9–12 gestures | 65–80% achievable | Diminishing returns; some pairs will be confused |
| 13+ gestures | <65% | Surface EMG with 3 channels cannot reliably separate this many |

**Why 3 channels limits gesture count**: Surface EMG captures the summed electrical activity of
many motor units under each electrode. With only 3 spatial locations, gestures that recruit
overlapping muscle groups (e.g., all finger-flexion gestures recruit FCR) produce similar
signals. The frequency and coordination features from Change 1 help, but there's a hard
information-theoretic limit imposed by channel count.

**Rule of thumb**: aim for ≤8 gestures with the current 3-channel setup. For more, add the
bicep channel (ch3, currently excluded) to get 4 channels — see Section 10.

---

## 10. Specific Gesture Considerations

### Wrist Flexion / Extension
- **Feasibility**: High — FCR (ch0) activates strongly for flexion; extensor group (ch2) for extension
- **Differentiation from finger gestures**: frequency content differs (wrist involves slower motor units)
- **Recommendation**: Add these before wrist rotation — more reliable with surface EMG

### Wrist Rotation (Supination / Pronation)
- **Feasibility**: Medium — the primary supinator is a deep muscle; surface electrodes capture it weakly
- **Key helper**: the bicep activates strongly during supination → **include ch3** (`HAND_CHANNELS = [0, 1, 2, 3]`)
- **Code change for 4 channels**: Python: `HAND_CHANNELS = [0, 1, 2, 3]`; firmware: `HAND_NUM_CHANNELS` auto-updates from the exported header since `MODEL_NUM_FEATURES` is recalculated
- **Caveat**: pronation vs. rest may be harder to distinguish than supination vs. rest

### Pinch / Precision Grasp
- **Feasibility**: Medium — involves intrinsic hand muscles poorly captured by forearm electrodes
- Likely confused with open hand depending on electrode placement
- Collect with careful placement; validate cross-session accuracy before relying on it

### Including ch3 (Bicep) for Wrist Gestures

To include the bicep channel in the hand gesture classifier:
```python
# learning_data_collection.py — change this constant
HAND_CHANNELS = [0, 1, 2, 3]  # was [0, 1, 2] — add bicep channel
```
Feature count: 4 channels × 20 per-channel features + 10 cross-channel covariances + 6 correlations = **96 total features**.
The ensemble architecture handles this automatically — specialist LDA weight dimensions
recalculate at training time.

---

# PART IV — CHANGE REFERENCE

## 11. Change Classification Matrix

| Change | Category | Priority | Files | ESP32 Reflash? | Retrain? | Risk |
|--------|----------|----------|-------|----------------|----------|------|
| **C** | Firmware | **Tier 1** | inference.c | ✓ | No | **Very Low** |
| **B** | Firmware | **Tier 1** | inference.c / filter.c | ✓ | No | Low |
| **A** | Firmware | **Tier 1** | adc_sampling.c | ✓ | No | Medium |
| **0** | Python | **Tier 1** | learning_data_collection.py | No | ✓ | Low |
| **1** | Python+C | **Tier 2** | learning_data_collection.py + inference.c | ✓ after | ✓ | Medium |
| **D** | Firmware | **Tier 2** | calibration.c/.h | ✓ | No | Medium |
| **2** | Protocol | **Tier 2** | None | No | ✓ new data | None |
| **3** | Python | **Tier 2** | learning_data_collection.py | No | ✓ | Low |
| **E** | Python+FW | **Tier 3** | train_mlp_tflite.py + firmware | ✓ | ✓ | High |
| **4** | Python+C | **Tier 3** | learning_data_collection.py + inference.c | ✓ if enabled | ✓ | Low |
| **5** | Python | **Tier 3** | learning_data_collection.py | No | No | None |
| **6** | Python | **Tier 3** | learning_data_collection.py | No | ✓ | Low |
| **7** | Python | **Tier 3** | new: train_ensemble.py | No | ✓ | Medium |
| **F** | Firmware | **Tier 3** | new: inference_ensemble.c | ✓ | No (needs 7 first) | Medium |

**Recommended implementation order**: C → B → A → 0 → 1 → D → 2 → 3 → 5 (benchmark) → 7+F → E

---

# PART V — FIRMWARE CHANGES

## Change A — DMA-Driven ADC Sampling (Migration from `adc_oneshot` to `adc_continuous`)

**Priority**: Tier 1
**Current driver**: `adc_oneshot_read()` polling in `drivers/emg_sensor.c`. Timing is
controlled by `vTaskDelay(1)` in `run_inference_loop()` — subject to FreeRTOS scheduler
jitter of ±0.5–1ms, which corrupts frequency-domain features and ADC burst grouping.
**Why**: `adc_continuous` runs entirely in hardware DMA. Sample-to-sample jitter drops from
±1ms to <10µs. CPU overhead between samples is zero. Required for frequency features (Change 1).
**Effort**: 2–4 hours (replace `emg_sensor_read()` internals; keep public API the same)

### ESP-IDF ADC Continuous API

```c
// --- Initialize (call once at startup) ---
adc_continuous_handle_t adc_handle = NULL;
adc_continuous_handle_cfg_t adc_cfg = {
    .max_store_buf_size = 4096,    // PSRAM ring buffer size (bytes)
    .conv_frame_size    = 256,     // bytes per conversion frame
};
adc_continuous_new_handle(&adc_cfg, &adc_handle);

// Actual hardware channel mapping (from emg_sensor.c):
// ch0 = ADC_CHANNEL_1 / GPIO 2  (Forearm Belly / FCR)
// ch1 = ADC_CHANNEL_2 / GPIO 3  (Forearm Extensors)
// ch2 = ADC_CHANNEL_8 / GPIO 9  (Forearm Contractors / FCU)
// ch3 = ADC_CHANNEL_9 / GPIO 10 (Bicep — independent subsystem)
adc_digi_pattern_config_t chan_cfg[4] = {
    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_1, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_2, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_8, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_9, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
};
adc_continuous_config_t cont_cfg = {
    .sample_freq_hz = 4000,        // 4 channels × 1000 Hz = 4000 total samples/sec
    .conv_mode      = ADC_CONV_SINGLE_UNIT_1,
    .format         = ADC_DIGI_OUTPUT_FORMAT_TYPE2,
    .pattern_num    = 4,
    .adc_pattern    = chan_cfg,
};
adc_continuous_config(adc_handle, &cont_cfg);

// --- ISR callback (fires each frame) ---
static SemaphoreHandle_t s_adc_sem;
static bool IRAM_ATTR adc_conv_done_cb(
        adc_continuous_handle_t handle,
        const adc_continuous_evt_data_t *edata, void *user_data) {
    BaseType_t hp_woken = pdFALSE;
    xSemaphoreGiveFromISR(s_adc_sem, &hp_woken);
    return hp_woken == pdTRUE;
}
adc_continuous_evt_cbs_t cbs = { .on_conv_done = adc_conv_done_cb };
adc_continuous_register_event_callbacks(adc_handle, &cbs, NULL);
adc_continuous_start(adc_handle);

// --- ADC calibration (apply per sample) ---
adc_cali_handle_t cali_handle;
adc_cali_curve_fitting_config_t cali_cfg = {
    .unit_id  = ADC_UNIT_1,
    .atten    = ADC_ATTEN_DB_12,   // matches ADC_ATTEN_DB_12 used in current emg_sensor.c
    .bitwidth = ADC_BITWIDTH_12,
};
adc_cali_create_scheme_curve_fitting(&cali_cfg, &cali_handle);

// --- Sampling task (pin to Core 0) ---
void adc_sampling_task(void *arg) {
    uint8_t result_buf[256];
    uint32_t out_len = 0;
    while (1) {
        xSemaphoreTake(s_adc_sem, portMAX_DELAY);
        adc_continuous_read(adc_handle, result_buf, sizeof(result_buf), &out_len, 0);
        // Parse: each entry is adc_digi_output_data_t
        // Apply adc_cali_raw_to_voltage() for each sample
        // Apply IIR filter (Change B) → post to inference ring buffer
    }
}
```

**Verify**: log consecutive sample timestamps via `esp_timer_get_time()`; spacing should be 1.0ms ± 0.05ms.

---

## Change B — IIR Biquad Bandpass Filter

**Priority**: Tier 1
**Why**: MyoWare analogue filters are not tunable. Software IIR removes powerline interference
(50/60 Hz), sub-20 Hz motion artifact, and >500 Hz noise — all of which inflate ZC, WL, and
other features computed at rest.
**Effort**: 2 hours

### Step 1 — Compute Coefficients in Python (one-time, offline)

```python
from scipy.signal import butter
import numpy as np

fs = 1000.0
sos = butter(N=2, Wn=[20.0, 500.0], btype='bandpass', fs=fs, output='sos')
# sos[i] = [b0, b1, b2, a0, a1, a2]
# esp-dsp Direct Form II convention: coeffs = [b0, b1, b2, -a1, -a2]
for i, s in enumerate(sos):
    b0, b1, b2, a0, a1, a2 = s
    print(f"Section {i}: {b0:.8f}f, {b1:.8f}f, {b2:.8f}f, {-a1:.8f}f, {-a2:.8f}f")
# Run this and paste the printed values into the C constants below
```

### Step 2 — Add to inference.c (after includes, before `// --- State ---`)

```c
#include "dsps_biquad.h"

// 2nd-order Butterworth bandpass 20–500 Hz @ 1000 Hz
// Coefficients: [b0, b1, b2, -a1, -a2] — Direct Form II, esp-dsp sign convention
// Regenerate with: scipy.signal.butter(N=2, Wn=[20,500], btype='bandpass', fs=1000, output='sos')
static const float BIQUAD_HP_COEFFS[5] = { /* paste section 0 output here */ };
static const float BIQUAD_LP_COEFFS[5] = { /* paste section 1 output here */ };

// Filter delay state: 3 channels × 2 stages × 2 delay elements = 12 floats (48 bytes)
static float biquad_hp_w[HAND_NUM_CHANNELS][2];
static float biquad_lp_w[HAND_NUM_CHANNELS][2];
```

Add to `inference_init()`:
```c
    memset(biquad_hp_w, 0, sizeof(biquad_hp_w));
    memset(biquad_lp_w, 0, sizeof(biquad_lp_w));
```

### Step 3 — Apply Per Sample (called before writing to window_buffer)

```c
// Apply to each channel before posting to the window buffer.
// Must be called IN ORDER for each sample (IIR has memory across calls).
static float IRAM_ATTR apply_bandpass(int ch, float raw) {
    float hp_out, lp_out;
    dsps_biquad_f32(&raw,   &hp_out, 1, (float *)BIQUAD_HP_COEFFS, biquad_hp_w[ch]);
    dsps_biquad_f32(&hp_out, &lp_out, 1, (float *)BIQUAD_LP_COEFFS, biquad_lp_w[ch]);
    return lp_out;
}
```

**Note**: `window_buffer` stores `uint16_t` — change to `float` when adding this filter, so
filtered values are stored directly without lossy integer round-trip.

**Verify**: log ZC count at rest before and after — filtered ZC should be substantially lower
(less spurious noise crossings).

---

## Change C — Confidence Rejection

**Priority**: Tier 1 — **implement this first, lowest risk of all changes**
**Why**: Without a rejection threshold, ambiguous EMG (rest-to-gesture transition,
mid-gesture fatigue, electrode lift) always produces a false actuation.
**Effort**: 15 minutes

### Step 1 — Add Constant (top of inference.c with other constants)

```c
#define CONFIDENCE_THRESHOLD 0.40f  // Reject when max smoothed prob < this.
                                    // Meta paper uses 0.35; 0.40 adds prosthetic safety margin.
                                    // Tune: lower to 0.35 if real gestures are being rejected.
```

### Step 2 — Insert After EMA Block in `inference_predict()` (after line 214)

```c
  // Confidence rejection: if the peak smoothed probability is below threshold,
  // hold the last confirmed output rather than outputting an uncertain prediction.
  // Prevents false actuations during gesture transitions and electrode artifacts.
  if (max_smoothed_prob < CONFIDENCE_THRESHOLD) {
    *confidence = max_smoothed_prob;
    return current_output;  // -1 (GESTURE_NONE) until first confident prediction
  }
```

**Verify**: arm at complete rest → confirm output stays at GESTURE_NONE and confidence logs
below 0.40. Deliberate fist → confidence rises above 0.40 within 1–3 inference cycles.

---

## Change D — On-Device NVS Calibration

**Priority**: Tier 2
**Why**: Python `CalibrationTransform` only runs during training. On-device NVS calibration
lets the ESP32 recalibrate z-score normalization at startup (3 seconds of REST) without
retraining — solving placement drift and day-to-day impedance variation.
**Effort**: 3–4 hours

### New Files

```
EMG_Arm/src/core/calibration.h
EMG_Arm/src/core/calibration.c
```

### calibration.h

```c
#pragma once
#include <stdbool.h>
#include "config/config.h"

#define CALIB_MAX_FEATURES 96  // supports up to 4-channel expansion

bool calibration_init(void);          // load from NVS at startup
void calibration_apply(float *feat);  // z-score in-place; no-op if not calibrated
bool calibration_update(const float X[][CALIB_MAX_FEATURES], int n_windows, int n_feat);
void calibration_reset(void);
bool calibration_is_valid(void);
```

### calibration.c

```c
#include "calibration.h"
#include "nvs_flash.h"
#include "nvs.h"
#include <math.h>
#include <string.h>
#include <stdio.h>

#define NVS_NAMESPACE "emg_calib"
#define NVS_KEY_MEAN  "feat_mean"
#define NVS_KEY_STD   "feat_std"
#define NVS_KEY_NFEAT "n_feat"
#define NVS_KEY_VALID "calib_ok"

static float s_mean[CALIB_MAX_FEATURES];
static float s_std[CALIB_MAX_FEATURES];
static int   s_n_feat = 0;
static bool  s_valid  = false;

bool calibration_init(void) {
    esp_err_t err = nvs_flash_init();
    if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        nvs_flash_erase();
        nvs_flash_init();
    }
    nvs_handle_t h;
    if (nvs_open(NVS_NAMESPACE, NVS_READONLY, &h) != ESP_OK) return false;

    uint8_t valid = 0;
    size_t mean_sz = sizeof(s_mean), std_sz = sizeof(s_std);
    bool ok = (nvs_get_u8(h, NVS_KEY_VALID, &valid)         == ESP_OK) && (valid == 1) &&
              (nvs_get_i32(h, NVS_KEY_NFEAT, (int32_t*)&s_n_feat) == ESP_OK) &&
              (nvs_get_blob(h, NVS_KEY_MEAN, s_mean, &mean_sz) == ESP_OK) &&
              (nvs_get_blob(h, NVS_KEY_STD,  s_std,  &std_sz)  == ESP_OK);
    nvs_close(h);
    s_valid = ok;
    printf("[Calib] %s (%d features)\n", ok ? "Loaded from NVS" : "Not found — identity", s_n_feat);
    return ok;
}

void calibration_apply(float *feat) {
    if (!s_valid) return;
    for (int i = 0; i < s_n_feat; i++)
        feat[i] = (feat[i] - s_mean[i]) / s_std[i];
}

bool calibration_update(const float X[][CALIB_MAX_FEATURES], int n_windows, int n_feat) {
    if (n_windows < 10 || n_feat > CALIB_MAX_FEATURES) return false;
    s_n_feat = n_feat;
    memset(s_mean, 0, sizeof(s_mean));
    for (int w = 0; w < n_windows; w++)
        for (int f = 0; f < n_feat; f++)
            s_mean[f] += X[w][f];
    for (int f = 0; f < n_feat; f++) s_mean[f] /= n_windows;

    memset(s_std, 0, sizeof(s_std));
    for (int w = 0; w < n_windows; w++)
        for (int f = 0; f < n_feat; f++) {
            float d = X[w][f] - s_mean[f];
            s_std[f] += d * d;
        }
    for (int f = 0; f < n_feat; f++) {
        s_std[f] = sqrtf(s_std[f] / n_windows);
        if (s_std[f] < 1e-6f) s_std[f] = 1e-6f;
    }

    nvs_handle_t h;
    if (nvs_open(NVS_NAMESPACE, NVS_READWRITE, &h) != ESP_OK) return false;
    nvs_set_blob(h, NVS_KEY_MEAN, s_mean, sizeof(s_mean));
    nvs_set_blob(h, NVS_KEY_STD,  s_std,  sizeof(s_std));
    nvs_set_i32(h,  NVS_KEY_NFEAT, n_feat);
    nvs_set_u8(h,   NVS_KEY_VALID, 1);
    nvs_commit(h);
    nvs_close(h);
    s_valid = true;
    printf("[Calib] Updated from %d REST windows, %d features\n", n_windows, n_feat);
    return true;
}
```

### Integration in inference.c

In `inference_predict()`, after `compute_features(features)`, before LDA:
```c
    calibration_apply(features);  // z-score using NVS-stored mean/std
```

### Startup Flow

```c
// In main application startup sequence:
calibration_init();  // load from NVS; no-op if not present yet

// When user triggers recalibration (button press or serial command):
// Collect ~120 REST windows (~3 seconds at 25ms hop)
// Call calibration_update(rest_feature_buffer, 120, MODEL_NUM_FEATURES)
```

---

## Change E — int8 MLP via TFLite Micro

**Priority**: Tier 3 — implement after Tier 1+2 changes and benchmark (Change 5) shows LDA plateauing
**Why**: LDA finds only linear decision boundaries. A 2-layer int8 MLP adds nonlinear
boundaries for gesture pairs that overlap in feature space.
**Effort**: 4–6 hours

### Python Training (new file: `train_mlp_tflite.py`)

```python
"""
Train int8 MLP for ESP32-S3 deployment via TFLite Micro.
Run AFTER Change 0 (label shift) + Change 1 (expanded features).
"""
import numpy as np
import tensorflow as tf
from pathlib import Path
import sys
sys.path.insert(0, str(Path(__file__).parent))
from learning_data_collection import SessionStorage, EMGFeatureExtractor, HAND_CHANNELS

storage = SessionStorage()
X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()

extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
X = extractor.extract_features_batch(X_raw).astype(np.float32)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X)

n_feat, n_cls = X.shape[1], len(np.unique(y))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_feat,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(n_cls, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=150, batch_size=64, validation_split=0.1, verbose=1)

def representative_dataset():
    for i in range(0, len(X), 10):
        yield [X[i:i+1]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type  = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

out = Path('EMG_Arm/src/core/emg_model_data.cc')
with open(out, 'w') as f:
    f.write('#include "emg_model_data.h"\n')
    f.write(f'const int g_model_len = {len(tflite_model)};\n')
    f.write('const unsigned char g_model[] = {\n  ')
    f.write(', '.join(f'0x{b:02x}' for b in tflite_model))
    f.write('\n};\n')
print(f"Wrote {out} ({len(tflite_model)} bytes)")
```

### Firmware (inference_mlp.cc)

```cpp
#include "inference_mlp.h"
#include "emg_model_data.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

static uint8_t tensor_arena[48 * 1024];  // 48 KB — tune down if memory is tight
static tflite::MicroInterpreter *interpreter = nullptr;
static TfLiteTensor *input = nullptr, *output = nullptr;

void inference_mlp_init(void) {
    const tflite::Model *model = tflite::GetModel(g_model);
    static tflite::MicroMutableOpResolver<4> resolver;
    resolver.AddFullyConnected();
    resolver.AddRelu();
    resolver.AddSoftmax();
    resolver.AddDequantize();
    static tflite::MicroInterpreter interp(model, resolver, tensor_arena, sizeof(tensor_arena));
    interpreter = &interp;
    interpreter->AllocateTensors();
    input  = interpreter->input(0);
    output = interpreter->output(0);
}

int inference_mlp_predict(const float *features, int n_feat, float *conf_out) {
    float iscale = input->params.scale;
    int   izp    = input->params.zero_point;
    for (int i = 0; i < n_feat; i++) {
        int q = (int)roundf(features[i] / iscale) + izp;
        input->data.int8[i] = (int8_t)(q < -128 ? -128 : q > 127 ? 127 : q);
    }
    interpreter->Invoke();

    float oscale = output->params.scale;
    int   ozp    = output->params.zero_point;
    float max_p = -1e9f;
    int max_c = 0;
    for (int c = 0; c < MODEL_NUM_CLASSES; c++) {
        float p = (output->data.int8[c] - ozp) * oscale;
        if (p > max_p) { max_p = p; max_c = c; }
    }
    *conf_out = max_p;
    return max_c;
}
```

**platformio.ini addition**:
```ini
lib_deps =
    tensorflow/tflite-micro
```

---

## Change F — Ensemble Inference Pipeline

**Priority**: Tier 3 (requires Change 1 features + Change 7 training + Change E MLP)
**Why**: This is the full recommended architecture from Part II.
**Effort**: 3–4 hours firmware (after Python ensemble is trained and exported)

### New Files

```
EMG_Arm/src/core/inference_ensemble.c
EMG_Arm/src/core/inference_ensemble.h
EMG_Arm/src/core/model_weights_ensemble.h   (generated by Change 7 Python script)
```

### inference_ensemble.h

```c
#pragma once
#include <stdbool.h>

void inference_ensemble_init(void);
int  inference_ensemble_predict(float *confidence);
```

### inference_ensemble.c

```c
#include "inference_ensemble.h"
#include "inference.h"          // for compute_features(), calibration_apply()
#include "inference_mlp.h"      // for inference_mlp_predict()
#include "model_weights_ensemble.h"
#include "config/config.h"
#include "dsps_dotprod.h"
#include <math.h>
#include <string.h>
#include <stdio.h>

#define ENSEMBLE_EMA_ALPHA      0.70f
#define ENSEMBLE_CONF_THRESHOLD 0.50f  // below this: escalate to MLP fallback
#define REJECT_THRESHOLD        0.40f  // below this even after MLP: hold output
#define REST_ACTIVITY_THRESHOLD 0.05f  // total_rms below this → skip inference, return REST

// EMA state
static float s_smoothed[MODEL_NUM_CLASSES];
// Vote + debounce (reuse existing pattern from inference.c)
static int s_vote_history[5];
static int s_vote_head = 0;
static int s_current_output = -1;
static int s_pending_output = -1;
static int s_pending_count  = 0;

// --- Generic LDA softmax predict ---
// weights: [n_classes][n_feat], intercepts: [n_classes]
// proba_out: [n_classes] — caller-provided output
static void lda_softmax(const float *feat, int n_feat,
                         const float *weights_flat, const float *intercepts,
                         int n_classes, float *proba_out) {
    float raw[MODEL_NUM_CLASSES];
    float max_raw = -1e9f, sum_exp = 0.0f;

    for (int c = 0; c < n_classes; c++) {
        raw[c] = intercepts[c];
        // dsps_dotprod_f32 requires 4-byte aligned arrays and length multiple of 4;
        // for safety use plain loop — compiler will auto-vectorize with -O2
        const float *w = weights_flat + c * n_feat;
        for (int f = 0; f < n_feat; f++) raw[c] += feat[f] * w[f];
        if (raw[c] > max_raw) max_raw = raw[c];
    }
    for (int c = 0; c < n_classes; c++) {
        proba_out[c] = expf(raw[c] - max_raw);
        sum_exp += proba_out[c];
    }
    for (int c = 0; c < n_classes; c++) proba_out[c] /= sum_exp;
}

void inference_ensemble_init(void) {
    for (int c = 0; c < MODEL_NUM_CLASSES; c++)
        s_smoothed[c] = 1.0f / MODEL_NUM_CLASSES;
    for (int i = 0; i < 5; i++) s_vote_history[i] = -1;
    s_vote_head = 0;
    s_current_output = -1;
    s_pending_output = -1;
    s_pending_count  = 0;
}

int inference_ensemble_predict(float *confidence) {
    // 1. Extract features (shared with single-model path)
    float features[MODEL_NUM_FEATURES];
    compute_features(features);
    calibration_apply(features);

    // 2. Activity gate — skip inference during obvious REST
    float total_rms_sq = 0.0f;
    for (int ch = 0; ch < HAND_NUM_CHANNELS; ch++) {
        float r = features[ch * ENSEMBLE_PER_CH_FEATURES]; // RMS is index 0 per channel
        total_rms_sq += r * r;
    }
    if (sqrtf(total_rms_sq) < REST_ACTIVITY_THRESHOLD) {
        *confidence = 1.0f;
        return GESTURE_REST;
    }

    // 3. Specialist LDAs
    float prob_td[MODEL_NUM_CLASSES];
    float prob_fd[MODEL_NUM_CLASSES];
    float prob_cc[MODEL_NUM_CLASSES];

    lda_softmax(features + TD_FEAT_OFFSET, TD_NUM_FEATURES,
                (const float *)LDA_TD_WEIGHTS, LDA_TD_INTERCEPTS,
                MODEL_NUM_CLASSES, prob_td);
    lda_softmax(features + FD_FEAT_OFFSET, FD_NUM_FEATURES,
                (const float *)LDA_FD_WEIGHTS, LDA_FD_INTERCEPTS,
                MODEL_NUM_CLASSES, prob_fd);
    lda_softmax(features + CC_FEAT_OFFSET, CC_NUM_FEATURES,
                (const float *)LDA_CC_WEIGHTS, LDA_CC_INTERCEPTS,
                MODEL_NUM_CLASSES, prob_cc);

    // 4. Meta-LDA stacker
    float meta_in[META_NUM_INPUTS];  // = 3 * MODEL_NUM_CLASSES
    memcpy(meta_in,                        prob_td, MODEL_NUM_CLASSES * sizeof(float));
    memcpy(meta_in +   MODEL_NUM_CLASSES,  prob_fd, MODEL_NUM_CLASSES * sizeof(float));
    memcpy(meta_in + 2*MODEL_NUM_CLASSES,  prob_cc, MODEL_NUM_CLASSES * sizeof(float));

    float meta_probs[MODEL_NUM_CLASSES];
    lda_softmax(meta_in, META_NUM_INPUTS,
                (const float *)META_LDA_WEIGHTS, META_LDA_INTERCEPTS,
                MODEL_NUM_CLASSES, meta_probs);

    // 5. EMA smoothing on meta output
    float max_smooth = 0.0f;
    int winner = 0;
    for (int c = 0; c < MODEL_NUM_CLASSES; c++) {
        s_smoothed[c] = ENSEMBLE_EMA_ALPHA * s_smoothed[c] +
                        (1.0f - ENSEMBLE_EMA_ALPHA) * meta_probs[c];
        if (s_smoothed[c] > max_smooth) { max_smooth = s_smoothed[c]; winner = c; }
    }

    // 6. Confidence cascade: escalate to MLP if meta-LDA is uncertain
    if (max_smooth < ENSEMBLE_CONF_THRESHOLD) {
        float mlp_conf = 0.0f;
        int mlp_winner = inference_mlp_predict(features, MODEL_NUM_FEATURES, &mlp_conf);
        if (mlp_conf > max_smooth) { winner = mlp_winner; max_smooth = mlp_conf; }
    }

    // 7. Reject if still uncertain
    if (max_smooth < REJECT_THRESHOLD) {
        *confidence = max_smooth;
        return s_current_output;
    }

    *confidence = max_smooth;

    // 8. Majority vote (window = 5)
    s_vote_history[s_vote_head] = winner;
    s_vote_head = (s_vote_head + 1) % 5;
    int counts[MODEL_NUM_CLASSES] = {0};
    for (int i = 0; i < 5; i++)
        if (s_vote_history[i] >= 0) counts[s_vote_history[i]]++;
    int majority = 0, majority_cnt = 0;
    for (int c = 0; c < MODEL_NUM_CLASSES; c++)
        if (counts[c] > majority_cnt) { majority_cnt = counts[c]; majority = c; }

    // 9. Debounce (3 consecutive predictions to change output)
    int final = s_current_output;
    if (s_current_output == -1) {
        s_current_output = majority; final = majority;
    } else if (majority == s_current_output) {
        s_pending_output = majority; s_pending_count = 1;
    } else if (majority == s_pending_output) {
        if (++s_pending_count >= 3) { s_current_output = majority; final = majority; }
    } else {
        s_pending_output = majority; s_pending_count = 1;
    }

    return final;
}
```

### model_weights_ensemble.h Layout (generated by Change 7)

```c
// Auto-generated by train_ensemble.py — do not edit manually
#pragma once

#define MODEL_NUM_CLASSES    5       // auto-computed from training data
#define MODEL_NUM_FEATURES   69      // total feature count (after Change 1)
#define ENSEMBLE_PER_CH_FEATURES 20  // features per channel

// Specialist feature subset offsets and sizes
#define TD_FEAT_OFFSET  0
#define TD_NUM_FEATURES 36   // time-domain: indices 0–11, 20–31, 40–51
#define FD_FEAT_OFFSET  12   // NOTE: FD features are interleaved per-channel
#define FD_NUM_FEATURES 24   // freq-domain: indices 12–19, 32–39, 52–59
#define CC_FEAT_OFFSET  60
#define CC_NUM_FEATURES 9    // cross-channel: indices 60–68

#define META_NUM_INPUTS (3 * MODEL_NUM_CLASSES)  // = 15

// Specialist LDA weights (flat row-major: [n_classes][n_feat])
extern const float LDA_TD_WEIGHTS[MODEL_NUM_CLASSES][TD_NUM_FEATURES];
extern const float LDA_TD_INTERCEPTS[MODEL_NUM_CLASSES];

extern const float LDA_FD_WEIGHTS[MODEL_NUM_CLASSES][FD_NUM_FEATURES];
extern const float LDA_FD_INTERCEPTS[MODEL_NUM_CLASSES];

extern const float LDA_CC_WEIGHTS[MODEL_NUM_CLASSES][CC_NUM_FEATURES];
extern const float LDA_CC_INTERCEPTS[MODEL_NUM_CLASSES];

// Meta-LDA weights
extern const float META_LDA_WEIGHTS[MODEL_NUM_CLASSES][META_NUM_INPUTS];
extern const float META_LDA_INTERCEPTS[MODEL_NUM_CLASSES];

// Class names (for inference_get_gesture_enum)
extern const char *MODEL_CLASS_NAMES[MODEL_NUM_CLASSES];
```

**Important note on FD features**: the frequency-domain features are interleaved at indices
[12–19] for ch0, [32–39] for ch1, [52–59] for ch2. The `lda_softmax` call for LDA_FD must
pass a **gathered** (non-contiguous) sub-vector. The cleanest approach is to gather them into
a contiguous buffer before calling lda_softmax:

```c
// Gather FD features into contiguous buffer before LDA_FD
float fd_buf[FD_NUM_FEATURES];
for (int ch = 0; ch < HAND_NUM_CHANNELS; ch++)
    memcpy(fd_buf + ch*8, features + ch*20 + 12, 8 * sizeof(float));
lda_softmax(fd_buf, FD_NUM_FEATURES, ...);
```

Similarly for TD features. This gather costs <5 µs — negligible.

---

# PART VI — PYTHON/TRAINING CHANGES

## Change 0 — Forward Label Shift

**Priority**: Tier 1
**Source**: Meta Nature 2025, Methods: "Discrete-gesture time alignment"
**Why**: +100ms shift after onset detection gives the classifier 100ms of pre-event "building"
signal, dramatically cleaning the decision boundary near gesture onset.
**ESP32 impact**: None.

### Step 1 — Add Constant After Line 94

```python
# After: TRANSITION_END_MS = 150
LABEL_FORWARD_SHIFT_MS = 100  # shift label boundaries +100ms after onset alignment
                               # Source: Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
```

### Step 2 — Apply Shift in `SessionStorage.save_session()` (after line ~704)

Find and insert after:
```python
            print(f"[Storage] Labels aligned: {changed}/{len(labels)} windows shifted")
```

Insert:
```python
        if LABEL_FORWARD_SHIFT_MS > 0:
            shift_windows = max(1, round(LABEL_FORWARD_SHIFT_MS / HOP_SIZE_MS))
            shifted = list(aligned_labels)
            for i in range(1, len(aligned_labels)):
                if aligned_labels[i] != aligned_labels[i - 1]:
                    for j in range(i, min(i + shift_windows, len(aligned_labels))):
                        if shifted[j] == aligned_labels[i]:
                            shifted[j] = aligned_labels[i - 1]
            n_shifted = sum(1 for a, b in zip(aligned_labels, shifted) if a != b)
            aligned_labels = shifted
            print(f"[Storage] Forward label shift (+{LABEL_FORWARD_SHIFT_MS}ms): {n_shifted} windows adjusted")
```

### Step 3 — Reduce TRANSITION_START_MS

```python
TRANSITION_START_MS = 200   # was 300 — reduce because 100ms shift already adds pre-event context
```

**Verify**: printout shows `N windows adjusted` where N is 5–20% of total windows per session.

---

## Change 1 — Expanded Feature Set

**Priority**: Tier 2
**Why**: 12 → 69 features; adds frequency-domain and cross-channel information that is
structurally more informative than amplitude alone (Meta Extended Data Fig. 6).
**ESP32 impact**: retrain → export new `model_weights.h`; port selected features to C.

### Sub-change 1A — Expand `extract_features_single_channel()` (line 1448)

Replace the entire function body:

```python
    def extract_features_single_channel(self, signal: np.ndarray) -> dict:
        if getattr(self, 'reinhard', False):
            signal = 64.0 * signal / (32.0 + np.abs(signal))

        signal = signal - np.mean(signal)
        N = len(signal)

        # --- Time domain ---
        rms  = np.sqrt(np.mean(signal ** 2))
        diff = np.diff(signal)
        wl   = np.sum(np.abs(diff))
        zc_thresh  = self.zc_threshold_percent * rms
        ssc_thresh = (self.ssc_threshold_percent * rms) ** 2
        sign_ch = signal[:-1] * signal[1:] < 0
        zc  = int(np.sum(sign_ch & (np.abs(diff) > zc_thresh)))
        d_l = signal[1:-1] - signal[:-2]
        d_r = signal[1:-1] - signal[2:]
        ssc = int(np.sum((d_l * d_r) > ssc_thresh))
        mav  = np.mean(np.abs(signal))
        var  = np.mean(signal ** 2)
        iemg = np.sum(np.abs(signal))
        wamp = int(np.sum(np.abs(diff) > 0.15 * rms))

        # AR(4) via Yule-Walker
        ar = np.zeros(4)
        if rms > 1e-6:
            try:
                from scipy.linalg import solve_toeplitz
                r = np.array([np.dot(signal[i:], signal[:N-i]) / N for i in range(5)])
                if r[0] > 1e-10:
                    ar = solve_toeplitz(r[:4], -r[1:5])
            except Exception:
                pass

        # --- Frequency domain (20–500 Hz) ---
        freqs = np.fft.rfftfreq(N, d=1.0 / SAMPLING_RATE_HZ)
        psd   = np.abs(np.fft.rfft(signal)) ** 2 / N
        m     = (freqs >= 20) & (freqs <= 500)
        f_m, p_m = freqs[m], psd[m]
        tp = np.sum(p_m) + 1e-10
        mnf = float(np.sum(f_m * p_m) / tp)
        cum = np.cumsum(p_m)
        mdf = float(f_m[min(np.searchsorted(cum, tp / 2), len(f_m) - 1)])
        pkf = float(f_m[np.argmax(p_m)]) if len(p_m) > 0 else 0.0
        mnp = float(tp / max(len(p_m), 1))

        # Bandpower in 4 physiological bands (mirrors firmware esp-dsp FFT bands)
        bands = [(20, 80), (80, 150), (150, 300), (300, 500)]
        bp = [float(np.sum(psd[(freqs >= lo) & (freqs < hi)])) for lo, hi in bands]

        return {
            'rms': rms, 'wl': wl, 'zc': zc, 'ssc': ssc,
            'mav': mav, 'var': var, 'iemg': iemg, 'wamp': wamp,
            'ar1': float(ar[0]), 'ar2': float(ar[1]),
            'ar3': float(ar[2]), 'ar4': float(ar[3]),
            'mnf': mnf, 'mdf': mdf, 'pkf': pkf, 'mnp': mnp,
            'bp0': bp[0], 'bp1': bp[1], 'bp2': bp[2], 'bp3': bp[3],
        }
```

### Sub-change 1B — Update `extract_features_window()` Return Block (line 1482)

Replace the return section:

```python
        FEATURE_ORDER = ['rms', 'wl', 'zc', 'ssc', 'mav', 'var', 'iemg', 'wamp',
                         'ar1', 'ar2', 'ar3', 'ar4', 'mnf', 'mdf', 'pkf', 'mnp',
                         'bp0', 'bp1', 'bp2', 'bp3']
        NORMALIZE_KEYS = {'rms', 'wl', 'mav', 'iemg'}

        features = []
        for ch_features in all_ch_features:
            for key in FEATURE_ORDER:
                val = ch_features.get(key, 0.0)
                if self.normalize and key in NORMALIZE_KEYS:
                    val = val / norm_factor
                features.append(float(val))

        if self.cross_channel and window.shape[1] >= 2:
            sel   = window[:, channel_indices].astype(np.float32)
            wc    = sel - sel.mean(axis=0)
            cov   = (wc.T @ wc) / len(wc)
            ri, ci = np.triu_indices(len(channel_indices))
            features.extend(cov[ri, ci].tolist())
            stds = np.sqrt(np.diag(cov)) + 1e-10
            cor  = cov / np.outer(stds, stds)
            ro, co = np.triu_indices(len(channel_indices), k=1)
            features.extend(cor[ro, co].tolist())

        return np.array(features, dtype=np.float32)
```

### Sub-change 1C — Update `EMGFeatureExtractor.__init__()` (line 1430)

```python
    def __init__(self, zc_threshold_percent=0.1, ssc_threshold_percent=0.1,
                 channels=None, normalize=True, cross_channel=True, reinhard=False):
        self.zc_threshold_percent  = zc_threshold_percent
        self.ssc_threshold_percent = ssc_threshold_percent
        self.channels      = channels
        self.normalize     = normalize
        self.cross_channel = cross_channel
        self.reinhard      = reinhard
```

### Sub-change 1D — Update Feature Count in `extract_features_batch()` (line 1520)

Replace `n_features = n_channels * 4`:
```python
        per_ch = 20
        if self.cross_channel and n_channels >= 2:
            n_features = n_channels * per_ch + \
                         n_channels*(n_channels+1)//2 + n_channels*(n_channels-1)//2
        else:
            n_features = n_channels * per_ch
```

### Sub-change 1E — Update `get_feature_names()` (line 1545)

```python
    def get_feature_names(self, n_channels=0):
        ch_idx = self.channels if self.channels is not None else list(range(n_channels))
        ORDER = ['rms','wl','zc','ssc','mav','var','iemg','wamp',
                 'ar1','ar2','ar3','ar4','mnf','mdf','pkf','mnp','bp0','bp1','bp2','bp3']
        names = [f'ch{ch}_{f}' for ch in ch_idx for f in ORDER]
        if self.cross_channel and len(ch_idx) >= 2:
            n = len(ch_idx)
            names += [f'cov_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i, n)]
            names += [f'cor_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i+1, n)]
        return names
```

### Sub-change 1F — Update `EMGClassifier.__init__()` (line 1722)

```python
        self.feature_extractor = EMGFeatureExtractor(
            channels=HAND_CHANNELS, cross_channel=True, reinhard=False)
```

### Sub-change 1G — Update `save()` (line 1910) and `load()` (line 2089)

In `save()`, add to `feature_extractor_params` dict:
```python
                'cross_channel': getattr(self.feature_extractor, 'cross_channel', True),
                'reinhard':      getattr(self.feature_extractor, 'reinhard', False),
```

In `load()`, update `EMGFeatureExtractor(...)` constructor:
```python
        classifier.feature_extractor = EMGFeatureExtractor(
            zc_threshold_percent  = params.get('zc_threshold_percent', 0.1),
            ssc_threshold_percent = params.get('ssc_threshold_percent', 0.1),
            channels              = params.get('channels', HAND_CHANNELS),
            normalize             = params.get('normalize', False),
            cross_channel         = params.get('cross_channel', True),
            reinhard              = params.get('reinhard', False),
        )
```

### Also Fix Bug at Line 2382

```python
X, y, trial_ids, session_indices, label_names, loaded_sessions = storage.load_all_for_training()
```

---

## Change 2 — Electrode Repositioning Protocol

**Protocol**: no code changes.
> *"Between sessions within a single day, the participants remove and slightly reposition the
> sEMG wristband to enable generalization across different recording positions."*
> — Meta Nature 2025 Methods

- Session 1: standard placement
- Session 2: band 1–2 cm up the forearm
- Session 3: band 1–2 cm down the forearm
- Session 4+: slight axial rotation or return to any above position

The per-session z-score normalization in `_apply_session_normalization()` handles the
resulting amplitude shifts. Perform **fast, natural** gestures — not slow/deliberate.

---

## Change 3 — Data Augmentation

**Priority**: Tier 2. Apply to **raw windows BEFORE feature extraction**.

Insert before the `# === LDA CLASSIFIER ===` comment (~line 1709):

```python
def augment_emg_batch(X, y, multiplier=3, seed=42):
    """
    Augment raw EMG windows for training robustness.
    Must be called on raw windows (n_windows, n_samples, n_channels),
    not on pre-computed features.
    Source (window jitter): Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
    """
    rng = np.random.default_rng(seed)
    aug_X, aug_y = [X], [y]
    for _ in range(multiplier - 1):
        Xc = X.copy().astype(np.float32)
        Xc *= rng.uniform(0.80, 1.20, (len(X), 1, 1)).astype(np.float32)          # amplitude
        rms = np.sqrt(np.mean(Xc**2, axis=(1,2), keepdims=True)) + 1e-8
        Xc += rng.standard_normal(Xc.shape).astype(np.float32) * (0.05 * rms)     # noise
        Xc += rng.uniform(-20., 20., (len(X), 1, X.shape[2])).astype(np.float32)  # DC jitter
        shifts = rng.integers(-5, 6, size=len(X))
        for i in range(len(Xc)):
            if shifts[i]: Xc[i] = np.roll(Xc[i], shifts[i], axis=0)              # jitter
        aug_X.append(Xc); aug_y.append(y)
    return np.concatenate(aug_X), np.concatenate(aug_y)
```

In `EMGClassifier.train()`, replace the start of the function's feature extraction block:

```python
        if getattr(self, 'use_augmentation', True):
            X_aug, y_aug = augment_emg_batch(X, y, multiplier=3)
            print(f"[Classifier] Augmented: {len(X)} → {len(X_aug)} windows")
        else:
            X_aug, y_aug = X, y
        X_features = self.feature_extractor.extract_features_batch(X_aug)
        # ... then use y_aug instead of y for model.fit()
```

---

## Change 4 — Reinhard Compression (Optional)

**Formula**: `output = 64 × x / (32 + |x|)`
**Enable in Python**: set `reinhard=True` in `EMGFeatureExtractor` constructor (Change 1F).

**Enable in firmware** (`inference.c` `compute_features()`, after signal copy loop, before mean calc):
```c
#if MODEL_USE_REINHARD
    for (int i = 0; i < INFERENCE_WINDOW_SIZE; i++) {
        float x = signal[i];
        signal[i] = 64.0f * x / (32.0f + fabsf(x));
    }
#endif
```
Add `#define MODEL_USE_REINHARD 0` to `model_weights.h` (set to `1` when Python uses `reinhard=True`).
**Python and firmware MUST match.** Mismatch silently corrupts all predictions.

---

## Change 5 — Classifier Benchmark

**Purpose**: tells you whether LDA accuracy plateau is a features problem (all classifiers similar → add features) or a model complexity problem (SVM/MLP >> LDA → implement Change E/F).

Add after `run_training_demo()`:

```python
def run_classifier_benchmark():
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score, GroupKFold
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis

    storage = SessionStorage()
    X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()
    extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
    X = extractor.extract_features_batch(X_raw)
    X = EMGClassifier()._apply_session_normalization(X, session_indices, y=y)

    clfs = {
        'LDA (ESP32 model)':  LinearDiscriminantAnalysis(),
        'QDA':                QuadraticDiscriminantAnalysis(reg_param=0.1),
        'SVM-RBF':            Pipeline([('s', StandardScaler()), ('m', SVC(kernel='rbf', C=10))]),
        'MLP-128-64':         Pipeline([('s', StandardScaler()),
                                         ('m', MLPClassifier(hidden_layer_sizes=(128,64),
                                                             max_iter=1000, early_stopping=True))]),
    }
    gkf = GroupKFold(n_splits=5)
    print(f"\n{'Classifier':<22} {'Mean CV':>8} {'Std':>6}")
    print("-" * 40)
    for name, clf in clfs.items():
        sc = cross_val_score(clf, X, y, cv=gkf, groups=trial_ids, scoring='accuracy')
        print(f"  {name:<20} {sc.mean()*100:>7.1f}%  ±{sc.std()*100:.1f}%")
    print("\n  → If LDA ≈ SVM: features are the bottleneck (add Change 1 features)")
    print("  → If SVM >> LDA: model complexity bottleneck (implement Change F ensemble)")
```

---

## Change 6 — Simplified MPF Features

**Python training only** — not worth porting to ESP32 directly (use bandpower bp0–bp3 from Change 1 as the firmware-side approximation).

Add after `EMGFeatureExtractor` class:

```python
class MPFFeatureExtractor:
    """
    Simplified 3-channel MPF: CSD upper triangle per 6 frequency bands = 36 features.
    Python training only. Omits matrix logarithm (not needed for 3 channels).
    Source: Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
    ESP32 approximation: use bp0–bp3 from EMGFeatureExtractor (Change 1).
    """
    BANDS = [(0,62),(62,125),(125,187),(187,250),(250,375),(375,500)]

    def __init__(self, channels=None, log_diagonal=True):
        self.channels = channels or HAND_CHANNELS
        self.log_diag = log_diagonal
        self.n_ch = len(self.channels)
        self._r, self._c = np.triu_indices(self.n_ch)
        self.n_features = len(self.BANDS) * len(self._r)

    def extract_window(self, window):
        sig   = window[:, self.channels].astype(np.float64)
        N     = len(sig)
        freqs = np.fft.rfftfreq(N, d=1.0/SAMPLING_RATE_HZ)
        Xf    = np.fft.rfft(sig, axis=0)
        feats = []
        for lo, hi in self.BANDS:
            mask = (freqs >= lo) & (freqs < hi)
            if not mask.any():
                feats.extend([0.0] * len(self._r)); continue
            CSD = (Xf[mask].conj().T @ Xf[mask]).real / N
            if self.log_diag:
                for k in range(self.n_ch): CSD[k,k] = np.log(max(CSD[k,k], 1e-10))
            feats.extend(CSD[self._r, self._c].tolist())
        return np.array(feats, dtype=np.float32)

    def extract_batch(self, X):
        out = np.zeros((len(X), self.n_features), dtype=np.float32)
        for i in range(len(X)): out[i] = self.extract_window(X[i])
        return out
```

In `EMGClassifier.train()`, after standard feature extraction:
```python
        if getattr(self, 'use_mpf', False):
            mpf = MPFFeatureExtractor(channels=HAND_CHANNELS)
            X_features = np.hstack([X_features, mpf.extract_batch(X_aug)])
```

---

## Change 7 — Ensemble Training

**Priority**: Tier 3 (implements Change F's training side)
**New file**: `C:/VSCode/Marvel_Projects/Bucky_Arm/train_ensemble.py`

```python
"""
Train the full 3-specialist-LDA + meta-LDA ensemble.
Requires Change 1 (expanded features) to be implemented first.
Exports model_weights_ensemble.h for firmware Change F.

Architecture:
  LDA_TD (36 time-domain feat) ─┐
  LDA_FD (24 freq-domain feat)  ├─ 15 probs ─► Meta-LDA ─► final class
  LDA_CC (9  cross-ch feat)     ─┘
"""
import numpy as np
from pathlib import Path
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict, GroupKFold, cross_val_score
import sys
sys.path.insert(0, str(Path(__file__).parent))
from learning_data_collection import (
    SessionStorage, EMGFeatureExtractor, HAND_CHANNELS
)

# ─── Load and extract features ───────────────────────────────────────────────
storage = SessionStorage()
X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()

extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
X = extractor.extract_features_batch(X_raw).astype(np.float64)

# Per-session normalization (same as EMGClassifier._apply_session_normalization)
from sklearn.preprocessing import StandardScaler
for sid in np.unique(session_indices):
    mask = session_indices == sid
    sc = StandardScaler()
    X[mask] = sc.fit_transform(X[mask])

feat_names = extractor.get_feature_names(n_channels=len(HAND_CHANNELS))
n_cls = len(np.unique(y))

# ─── Feature subset indices ───────────────────────────────────────────────────
TD_FEAT = ['rms','wl','zc','ssc','mav','var','iemg','wamp','ar1','ar2','ar3','ar4']
FD_FEAT = ['mnf','mdf','pkf','mnp','bp0','bp1','bp2','bp3']

td_idx = [i for i,n in enumerate(feat_names) if any(n.endswith(f'_{f}') for f in TD_FEAT)]
fd_idx = [i for i,n in enumerate(feat_names) if any(n.endswith(f'_{f}') for f in FD_FEAT)]
cc_idx = [i for i,n in enumerate(feat_names) if n.startswith('cov_') or n.startswith('cor_')]

print(f"Feature subsets — TD: {len(td_idx)}, FD: {len(fd_idx)}, CC: {len(cc_idx)}")

X_td = X[:, td_idx]
X_fd = X[:, fd_idx]
X_cc = X[:, cc_idx]

# ─── Train specialist LDAs with out-of-fold stacking ─────────────────────────
gkf = GroupKFold(n_splits=5)

print("Training specialist LDAs (out-of-fold for stacking)...")
lda_td = LinearDiscriminantAnalysis()
lda_fd = LinearDiscriminantAnalysis()
lda_cc = LinearDiscriminantAnalysis()

oof_td = cross_val_predict(lda_td, X_td, y, cv=gkf, groups=trial_ids, method='predict_proba')
oof_fd = cross_val_predict(lda_fd, X_fd, y, cv=gkf, groups=trial_ids, method='predict_proba')
oof_cc = cross_val_predict(lda_cc, X_cc, y, cv=gkf, groups=trial_ids, method='predict_proba')

# Specialist CV accuracy (for diagnostics)
for name, mdl, Xs in [('LDA_TD', lda_td, X_td), ('LDA_FD', lda_fd, X_fd), ('LDA_CC', lda_cc, X_cc)]:
    sc = cross_val_score(mdl, Xs, y, cv=gkf, groups=trial_ids)
    print(f"  {name}: {sc.mean()*100:.1f}% ± {sc.std()*100:.1f}%")

# ─── Train meta-LDA on out-of-fold outputs ───────────────────────────────────
X_meta = np.hstack([oof_td, oof_fd, oof_cc])   # (n_samples, 3*n_cls = 15)
meta_lda = LinearDiscriminantAnalysis()
meta_sc = cross_val_score(meta_lda, X_meta, y, cv=gkf, groups=trial_ids)
print(f"  Meta-LDA: {meta_sc.mean()*100:.1f}% ± {meta_sc.std()*100:.1f}%")

# Fit all models on full dataset for deployment
lda_td.fit(X_td, y); lda_fd.fit(X_fd, y); lda_cc.fit(X_cc, y)
meta_lda.fit(X_meta, y)

# ─── Export all weights to C header ──────────────────────────────────────────
def lda_to_c_arrays(lda, name, feat_dim, n_cls, label_names, class_order):
    """Generate C array strings for LDA weights and intercepts."""
    # Reorder classes to match label_names order
    coef = lda.coef_    # shape (n_cls, feat_dim) for LinearDiscriminantAnalysis
    intercept = lda.intercept_
    lines = []
    lines.append(f"const float {name}_WEIGHTS[{n_cls}][{feat_dim}] = {{")
    for c in class_order:
        row = ', '.join(f'{v:.8f}f' for v in coef[c])
        lines.append(f"    {{{row}}},  // {label_names[c]}")
    lines.append("};")
    lines.append(f"const float {name}_INTERCEPTS[{n_cls}] = {{")
    intercept_str = ', '.join(f'{intercept[c]:.8f}f' for c in class_order)
    lines.append(f"    {intercept_str}")
    lines.append("};")
    return '\n'.join(lines)

class_order = list(range(n_cls))
out_path = Path('EMG_Arm/src/core/model_weights_ensemble.h')

with open(out_path, 'w') as f:
    f.write("// Auto-generated by train_ensemble.py — do not edit\n")
    f.write("#pragma once\n\n")
    f.write(f"#define MODEL_NUM_CLASSES    {n_cls}\n")
    f.write(f"#define MODEL_NUM_FEATURES   {X.shape[1]}\n")
    f.write(f"#define ENSEMBLE_PER_CH_FEATURES 20\n\n")
    f.write(f"#define TD_FEAT_OFFSET  {min(td_idx)}\n")
    f.write(f"#define TD_NUM_FEATURES {len(td_idx)}\n")
    f.write(f"#define FD_FEAT_OFFSET  {min(fd_idx)}\n")
    f.write(f"#define FD_NUM_FEATURES {len(fd_idx)}\n")
    f.write(f"#define CC_FEAT_OFFSET  {min(cc_idx)}\n")
    f.write(f"#define CC_NUM_FEATURES {len(cc_idx)}\n")
    f.write(f"#define META_NUM_INPUTS ({3} * MODEL_NUM_CLASSES)\n\n")

    f.write(lda_to_c_arrays(lda_td,  'LDA_TD',  len(td_idx), n_cls, label_names, class_order))
    f.write('\n\n')
    f.write(lda_to_c_arrays(lda_fd,  'LDA_FD',  len(fd_idx), n_cls, label_names, class_order))
    f.write('\n\n')
    f.write(lda_to_c_arrays(lda_cc,  'LDA_CC',  len(cc_idx), n_cls, label_names, class_order))
    f.write('\n\n')
    f.write(lda_to_c_arrays(meta_lda, 'META_LDA', 3*n_cls,   n_cls, label_names, class_order))
    f.write('\n\n')

    names_str = ', '.join(f'"{label_names[c]}"' for c in class_order)
    f.write(f"const char *MODEL_CLASS_NAMES[MODEL_NUM_CLASSES] = {{{names_str}}};\n")

print(f"Exported ensemble weights to {out_path}")
print(f"Total weight storage: {(len(td_idx)+len(fd_idx)+len(cc_idx)+3*n_cls)*n_cls*4} bytes float32")
```

**Note on LinearDiscriminantAnalysis with multi-class**: scikit-learn's LDA uses a
`(n_classes-1, n_features)` coef matrix for multi-class. Verify `lda.coef_.shape` after
fitting — if it is `(n_cls-1, n_feat)` rather than `(n_cls, n_feat)`, use the
`decision_function()` output structure and adjust the export accordingly.

---

# PART VII — FEATURE SELECTION FOR ESP32 PORTING

After Change 1 is trained, use this to decide what to port to C firmware.

### Step 1 — Get Feature Importance

```python
importance = np.abs(classifier.model.coef_).mean(axis=0)
feat_names  = classifier.feature_extractor.get_feature_names(n_channels=len(HAND_CHANNELS))
ranked = sorted(zip(feat_names, importance), key=lambda x: -x[1])
print("Top 20 features by LDA discriminative weight:")
for name, score in ranked[:20]:
    print(f"  {name:<35} {score:.4f}")
```

### Step 2 — Port Decision Matrix

| Feature | C Complexity | Prereq | Port? |
|---------|-------------|--------|-------|
| RMS, WL, ZC, SSC | ✓ Already in C | — | Keep |
| MAV, VAR, IEMG | Very easy (1 loop) | None | ✓ Yes |
| WAMP | Very easy (threshold on diff) | None | ✓ Yes |
| Cross-ch covariance | Easy (3×3 outer product) | None | ✓ Yes |
| Cross-ch correlation | Easy (normalize covariance) | Covariance | ✓ Yes |
| Bandpower bp0–bp3 | Medium (128-pt FFT via esp-dsp) | Add FFT call | ✓ Yes — highest ROI |
| MNF, MDF, PKF, MNP | Easy after FFT | Bandpower FFT | ✓ Free once FFT added |
| AR(4) | Medium (Levinson-Durbin in C) | None | Only if top-8 importance |

Once `dsps_fft2r_fc32()` is added for bandpower, MNF/MDF/PKF/MNP come free.

### Step 3 — Adding FFT-Based Features to inference.c

Add inside `compute_features()` loop, after time-domain features per channel:

```c
// 128-pt FFT for frequency-domain features per channel
// Zero-pad signal from INFERENCE_WINDOW_SIZE (150) to 128 by truncating
float fft_buf[256] = {0};  // 128 complex floats
for (int i = 0; i < 128 && i < INFERENCE_WINDOW_SIZE; i++) {
    fft_buf[2*i]   = signal[i];  // real
    fft_buf[2*i+1] = 0.0f;       // imag
}
dsps_fft2r_fc32(fft_buf, 128);
dsps_bit_rev_fc32(fft_buf, 128);

// Bandpower: bin k → freq = k * 1000/128 ≈ k * 7.8125 Hz
// Band 0: 20–80 Hz  → bins  3–10
// Band 1: 80–150 Hz → bins 10–19
// Band 2: 150–300 Hz→ bins 19–38
// Band 3: 300–500 Hz→ bins 38–64
int band_bins[5] = {3, 10, 19, 38, 64};
float bp[4] = {0,0,0,0};
for (int b = 0; b < 4; b++)
    for (int k = band_bins[b]; k < band_bins[b+1]; k++) {
        float re = fft_buf[2*k], im = fft_buf[2*k+1];
        bp[b] += re*re + im*im;
    }
// Store at correct indices (base = ch * 20)
int base = ch * 20;
features_out[base+16] = bp[0]; features_out[base+17] = bp[1];
features_out[base+18] = bp[2]; features_out[base+19] = bp[3];
```

---

# PART VIII — MEASUREMENT AND VALIDATION

## Baseline Protocol

**Run this BEFORE any change and after EACH change.**

```
1. python learning_data_collection.py → option 3 (Train Classifier)
2. Record:
   - "Mean CV accuracy: XX.X% ± Y.Y%"  (cross-validation)
   - Confusion matrix (which gesture pairs are most confused)
   - Per-gesture accuracy breakdown
3. On-device test:
   - Put on sensors, perform 10 reps of each gesture
   - Log classification output (UART or Python serial monitor)
   - Compute per-gesture accuracy manually
4. Record REST false-trigger rate: hold arm at rest for 30 seconds,
   count number of non-REST outputs
```

## Results Log

| Change | CV Acc Before | CV Acc After | Delta | On-Device Acc | False Triggers/30s | Keep? |
|--------|--------------|-------------|-------|---------------|-------------------|-------|
| Baseline | — | — | — | — | — | — |
| Change C (reject) | — | — | — | — | — | — |
| Change B (filter) | — | — | — | — | — | — |
| Change 0 (label shift) | — | — | — | — | — | — |
| Change 1 (features) | — | — | — | — | — | — |
| Change D (NVS calib) | — | — | — | — | — | — |
| Change 3 (augment) | — | — | — | — | — | — |
| Change 5 (benchmark) | — | — | — | — | — | — |
| Change 7+F (ensemble) | — | — | — | — | — | — |
| Change E (MLP) | — | — | — | — | — | — |

## When to Add More Gestures

| CV Accuracy | Recommendation |
|-------------|----------------|
| <80% | Do NOT add gestures — fix the existing 5 first |
| 80–90% | Adding 1–2 gestures is reasonable; expect 5–8% drop per new gesture |
| >90% | Good baseline; can add gestures; target staying above 85% |
| >95% | Excellent; can be ambitious with gesture count |

---

# PART IX — EXPORT WORKFLOW

## Path 1 — LDA / Ensemble (Changes 0–4, 7+F)

```
1. Train: python learning_data_collection.py → option 3  (single LDA)
         OR: python train_ensemble.py                     (full ensemble)

2. Export:
   Single LDA:  classifier.export_to_header(Path('EMG_Arm/src/core/model_weights.h'))
   Ensemble:    export_ensemble_header() in train_ensemble.py
                → writes model_weights_ensemble.h

3. Port new features to inference.c (if Change 1 features added):
   - Follow feature selection decision matrix (Part VII)
   - CRITICAL: C feature index order MUST match Python FEATURE_ORDER exactly

4. Build + flash: pio run -t upload
```

## Path 2 — int8 MLP via TFLM (Change E)

```
1. python train_mlp_tflite.py  → emg_model_data.cc
2. Add TFLM to platformio.ini lib_deps
3. Replace LDA inference call with inference_mlp_predict() in inference.c
   OR use inference_ensemble_predict() which calls MLP as fallback (Change F)
4. pio run -t upload
```

## Feature Index Contract (Critical)

The order of values written to `features_out[]` in `compute_features()` in C **must exactly
match** `FEATURE_ORDER` in `extract_features_window()` in Python, index for index.

To verify before flashing: print both the C feature names (from `MODEL_FEATURE_NAMES` if
added to header) and Python `extractor.get_feature_names()` and diff them.

---

# PART X — REFERENCES

**Primary paper**: Kaifosh, P., Reardon, T., et al. "A high-bandwidth neuromotor prosthesis
enabled by implicit information in intrinsic motor neurons." *Nature* (2025).
doi:10.1038/s41586-025-09255-w

**Meta codebase** (label alignment, CLER metric, model architectures):
`C:/VSCode/Marvel_Projects/Meta_Emg_Stuff/generic-neuromotor-interface/`
- `data.py`: onset detection, `searchsorted` alignment, window jitter
- `cler.py`: threshold=0.35, debounce=50ms, tolerance=±50/250ms
- `networks.py`: model architectures, left_context=20, stride=10
- `lightning.py`: `targets[..., left_context::stride]` label shift

**Barachant et al. 2012**: "Multiclass brain–computer interface classification by
Riemannian geometry." — matrix logarithm reference (MPF features).

**Espressif libraries**:
- esp-dsp: `github.com/espressif/esp-dsp` — biquad, FFT, dot-product
- esp-dl: `github.com/espressif/esp-dl` — quantized MLP/CNN inference
- TFLite Micro: `github.com/tensorflow/tflite-micro`

**All project files** (existing + planned):

```
── Laptop / Python ─────────────────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/learning_data_collection.py  ← main: data collection + training
C:/VSCode/Marvel_Projects/Bucky_Arm/live_predict.py              ← NEW (Part 0.6): laptop-side live inference
C:/VSCode/Marvel_Projects/Bucky_Arm/train_ensemble.py            ← NEW (Change 7): ensemble training
C:/VSCode/Marvel_Projects/Bucky_Arm/train_mlp_tflite.py          ← NEW (Change E): int8 MLP export

── ESP32 Firmware — Existing ───────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/platformio.ini
  └─ ADD lib_deps: espressif/esp-dsp (Changes B,1,F), tensorflow/tflite-micro (Change E)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/config/config.h
  └─ MODIFY: remove system_mode_t; add EMG_STANDALONE to MAIN_MODE enum (Part 0.7, S1)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/app/main.c
  └─ MODIFY: add STATE_LAPTOP_PREDICT, CMD_START_LAPTOP_PREDICT, run_laptop_predict_loop(),
             run_standalone_loop() (Part 0.5)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/drivers/emg_sensor.c
  └─ MODIFY (Change A): migrate from adc_oneshot to adc_continuous driver
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.c
  └─ MODIFY: add inference_get_gesture_by_name(), IIR filter (B), features (1), confidence rejection (C)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.h
  └─ MODIFY: add inference_get_gesture_by_name() declaration
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/gestures.c
  └─ MODIFY: update gesture_names[] and gestures_execute() when adding gestures
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights.h
  └─ AUTO-GENERATED by export_to_header() — do not edit manually

── ESP32 Firmware — New Files ──────────────────────────────────────────────────────────────
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/bicep.h/.c        ← Part 0 / Section 2.2
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/calibration.h/.c  ← Change D (NVS z-score)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference_ensemble.h/.c  ← Change F
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference_mlp.h/.cc      ← Change E
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights_ensemble.h ← AUTO-GENERATED (Change 7)
C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/emg_model_data.h/.cc     ← AUTO-GENERATED (Change E)
```
-												Multi-model prediction on laptop | need to move to on-board prediction

											
										
										
											2026-03-10 11:39:02 -05:00
+								# Bucky Arm — EMG Gesture Control: Master Implementation Reference
 								> Version: 2026-03-01 | Target: ESP32-S3 N32R16V (Xtensa LX7 @ 240 MHz, 512 KB SRAM, 16 MB OPI PSRAM)
 								> Supersedes: META_EMG_RESEARCH_NOTES.md + BUCKY_ARM_IMPROVEMENT_PLAN.md
 								> Source paper: doi:10.1038/s41586-025-09255-w (PDF: C:/VSCode/Marvel_Projects/s41586-025-09255-w.pdf)
 								---
 								## TABLE OF CONTENTS
 								- [PART 0 — SYSTEM ARCHITECTURE & RESPONSIBILITY ASSIGNMENT](#part-0--system-architecture--responsibility-assignment)
 								  - [0.1 Who Does What](#01-who-does-what)
 								  - [0.2 Operating Modes](#02-operating-modes)
 								  - [0.3 FSM Reference (EMG_MAIN mode)](#03-fsm-reference-emg_main-mode)
 								  - [0.4 EMG_STANDALONE Boot Sequence](#04-emg_standalone-boot-sequence)
 								  - [0.5 New Firmware Changes for Architecture](#05-new-firmware-changes-for-architecture)
 								  - [0.6 New Python Script: live_predict.py](#06-new-python-script-live_predictpy)
 								  - [0.7 Firmware Cleanup: system_mode_t Removal](#07-firmware-cleanup-system_mode_t-removal)
 								- [PART I — SYSTEM FOUNDATIONS](#part-i--system-foundations)
 								  - [1. Hardware Specification](#1-hardware-specification)
 								  - [2. Current System Snapshot](#2-current-system-snapshot)
 								  - [2.1 Confirmed Firmware Architecture](#21--confirmed-firmware-architecture-from-codebase-exploration)
 								  - [2.2 Bicep Channel Subsystem](#22--bicep-channel-subsystem-ch3--adc_channel_9--gpio-10)
 								  - [3. What Meta Built — Filtered for ESP32](#3-what-meta-built--filtered-for-esp32)
 								  - [4. Current Code State + Known Bugs](#4-current-code-state--known-bugs)
 								- [PART II — TARGET ARCHITECTURE](#part-ii--target-architecture)
 								  - [5. Full Recommended Multi-Model Stack](#5-full-recommended-multi-model-stack)
 								  - [6. Compute Budget for Full Stack](#6-compute-budget-for-full-stack)
 								  - [7. Why This Architecture Works for 3-Channel EMG](#7-why-this-architecture-works-for-3-channel-emg)
 								- [PART III — GESTURE EXTENSIBILITY](#part-iii--gesture-extensibility)
 								  - [8. What Changes When Adding or Removing a Gesture](#8-what-changes-when-adding-or-removing-a-gesture)
 								  - [9. Practical Limits of 3-Channel EMG](#9-practical-limits-of-3-channel-emg)
 								  - [10. Specific Gesture Considerations](#10-specific-gesture-considerations)
 								- [PART IV — CHANGE REFERENCE](#part-iv--change-reference)
 								  - [11. Change Classification Matrix](#11-change-classification-matrix)
 								- [PART V — FIRMWARE CHANGES](#part-v--firmware-changes)
 								  - [Change A — DMA-Driven ADC Sampling](#change-a--dma-driven-adc-sampling)
 								  - [Change B — IIR Biquad Bandpass Filter](#change-b--iir-biquad-bandpass-filter)
 								  - [Change C — Confidence Rejection](#change-c--confidence-rejection)
 								  - [Change D — On-Device NVS Calibration](#change-d--on-device-nvs-calibration)
 								  - [Change E — int8 MLP via TFLM](#change-e--int8-mlp-via-tflm)
 								  - [Change F — Ensemble Inference Pipeline](#change-f--ensemble-inference-pipeline)
 								- [PART VI — PYTHON/TRAINING CHANGES](#part-vi--pythontraining-changes)
 								  - [Change 0 — Forward Label Shift](#change-0--forward-label-shift)
 								  - [Change 1 — Expanded Feature Set](#change-1--expanded-feature-set)
 								  - [Change 2 — Electrode Repositioning](#change-2--electrode-repositioning)
 								  - [Change 3 — Data Augmentation](#change-3--data-augmentation)
 								  - [Change 4 — Reinhard Compression](#change-4--reinhard-compression)
 								  - [Change 5 — Classifier Benchmark](#change-5--classifier-benchmark)
 								  - [Change 6 — Simplified MPF Features](#change-6--simplified-mpf-features)
 								  - [Change 7 — Ensemble Training](#change-7--ensemble-training)
 								- [PART VII — FEATURE SELECTION FOR ESP32 PORTING](#part-vii--feature-selection-for-esp32-porting)
 								- [PART VIII — MEASUREMENT AND VALIDATION](#part-viii--measurement-and-validation)
 								- [PART IX — EXPORT WORKFLOW](#part-ix--export-workflow)
 								- [PART X — REFERENCES](#part-x--references)
 								---
 								# PART 0 — SYSTEM ARCHITECTURE & RESPONSIBILITY ASSIGNMENT
 								> This section is the authoritative reference for what runs where. All implementation
 								> decisions in later parts should be consistent with this partition.
 								## 0.1 Who Does What
 								| Responsibility | Laptop (Python) | ESP32 |
 								|----------------|-----------------|-------|
 								| EMG sensor reading | — | ✓ `emg_sensor_read()` always |
 								| Raw data streaming (for collection) | Receives CSV, saves to HDF5 | Streams CSV over UART |
 								| Model training | ✓ `learning_data_collection.py` | — |
 								| Model export | ✓ `export_to_header()` → `model_weights.h` | Compiled into firmware |
 								| On-device inference | — | ✓ `inference_predict()` |
 								| Laptop-side live inference | ✓ `live_predict.py` (new script) | Streams ADC + executes received cmd |
 								| Arm actuation | — (sends gesture string back to ESP32) | ✓ `gestures_execute()` |
 								| Autonomous operation (no laptop) | Not needed | ✓ `EMG_STANDALONE` mode |
 								| Bicep flex detection | — | ✓ `bicep_detect()` (new, Section 2.2) |
 								| NVS calibration | — | ✓ `calibration.c` (Change D) |
 								**Key rule**: The laptop is never required for real-time arm control in production.
 								The laptop's role is: collect data → train model → export → flash firmware → done.
 								After that, the ESP32 operates completely independently.
 								---
 								## 0.2 Operating Modes
 								Controlled by `#define MAIN_MODE` in `config/config.h`.
 								The enum currently reads `enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER}`.
 								A new value `EMG_STANDALONE` must be added.
 								| `MAIN_MODE` | When to use | Laptop required? | Entry point |
 								|-------------|-------------|-----------------|-------------|
 								| `EMG_MAIN` | Development sessions, data collection, monitored operation | Yes — UART handshake to start any mode | `appConnector()` in `main.c` |
 								| `EMG_STANDALONE` | **Fully autonomous deployment** — no laptop | **No** — boots directly into predict+control | `run_standalone_loop()` (new function in `main.c`) |
 								| `SERVO_CALIBRATOR` | Hardware setup, testing servo range of motion | Yes (serial input) | Inline in `app_main()` |
 								| `GESTURE_TESTER` | Testing gesture→servo mapping via keyboard | Yes (serial input) | Inline in `app_main()` |
 								**How to switch mode**: change `#define MAIN_MODE` in `config.h` and reflash.
 								**To add `EMG_STANDALONE` to `config.h`** (1-line change):
 								```c
 								// config.h line 19 — current:
 								enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER};
 								// Update to:
 								enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER, EMG_STANDALONE};
 								```
 								---
 								## 0.3 FSM Reference (EMG_MAIN mode)
 								The `device_state_t` enum in `main.c` and the `command_t` enum control all transitions.
 								Currently: `{STATE_IDLE, STATE_CONNECTED, STATE_STREAMING, STATE_PREDICTING}`.
 								A new state `STATE_LAPTOP_PREDICT` must be added (see Section 0.5).
 								```
 								STATE_IDLE
 								  └─ {"cmd":"connect"} ──────────────────────────► STATE_CONNECTED
 								                                                         │
 								                               {"cmd":"start"} ──────────┤
 								                                                         │    STATE_STREAMING
 								                                                         │    ESP32 sends raw ADC CSV at 1kHz
 								                                                         │    Laptop: saves to HDF5 (data collection)
 								                                                         │    Laptop: trains model → exports model_weights.h
 								                                                         │    ◄──── {"cmd":"stop"} ────────────────────┘
 								                                                         │
 								                        {"cmd":"start_predict"} ─────────┤
 								                                                         │    STATE_PREDICTING
 								                                                         │    ESP32: inference_predict() on-device
 								                                                         │    ESP32: gestures_execute()
 								                                                         │    Laptop: optional UART monitor only
 								                                                         │    ◄──── {"cmd":"stop"} ────────────────────┘
 								                                                         │
 								                   {"cmd":"start_laptop_predict"} ───────┘
 								                                                              STATE_LAPTOP_PREDICT  [NEW]
 								                                                              ESP32: streams raw ADC CSV (same as STREAMING)
 								                                                              Laptop: runs live_predict.py inference
 								                                                              Laptop: sends {"gesture":"fist"} back
 								                                                              ESP32: executes received gesture command
 								                                                              ◄──── {"cmd":"stop"} ────────────────────┘
 								All active states:
 								  {"cmd":"stop"}       → STATE_CONNECTED
 								  {"cmd":"disconnect"} → STATE_IDLE
 								  {"cmd":"connect"}    → STATE_CONNECTED  (from any state — reconnect)
 								```
 								**Convenience table of commands and their effects:**
 								| JSON command | Valid from state | Result |
 								|---|---|---|
 								| `{"cmd":"connect"}` | Any | → `STATE_CONNECTED` |
 								| `{"cmd":"start"}` | `STATE_CONNECTED` | → `STATE_STREAMING` |
 								| `{"cmd":"start_predict"}` | `STATE_CONNECTED` | → `STATE_PREDICTING` |
 								| `{"cmd":"start_laptop_predict"}` | `STATE_CONNECTED` | → `STATE_LAPTOP_PREDICT` (new) |
 								| `{"cmd":"stop"}` | `STREAMING/PREDICTING/LAPTOP_PREDICT` | → `STATE_CONNECTED` |
 								| `{"cmd":"disconnect"}` | Any active state | → `STATE_IDLE` |
 								---
 								## 0.4 EMG_STANDALONE Boot Sequence
 								No UART handshake. No laptop required. Powers on → predicts → controls arm.
 								```
 								app_main() switch MAIN_MODE == EMG_STANDALONE:
 								  │
 								  ├── hand_init()            // servos
 								  ├── emg_sensor_init()      // ADC setup
 								  ├── inference_init()       // clear window buffer, reset smoothing state
 								  ├── calibration_init()     // load NVS z-score params (Change D)
 								  │       └── if not found in NVS:
 								  │               collect 120 REST windows (~3s at 25ms hop)
 								  │               call calibration_update() to compute and store stats
 								  ├── bicep_load_threshold() // load NVS bicep threshold (Section 2.2)
 								  │       └── if not found:
 								  │               collect 3s of still bicep data
 								  │               call bicep_calibrate() and bicep_save_threshold()
 								  │
 								  └── run_standalone_loop()  ← NEW function (added to main.c)
 								        while (1):
 								          emg_sensor_read(&sample)
 								          inference_add_sample(sample.channels)
 								          if stride_counter++ >= INFERENCE_HOP_SIZE:
 								            stride_counter = 0
 								            gesture_t g = inference_get_gesture_enum(inference_predict(&conf))
 								            gestures_execute(g)
 								            bicep_state_t b = bicep_detect()
 								            // (future: bicep_actuate(b))
 								          vTaskDelay(1)
 								```
 								`run_standalone_loop()` is structurally identical to `run_inference_loop()` in `EMG_MAIN`,
 								minus all UART state-change checking and telemetry prints. It runs forever until power-off.
 								**Where to add**: New function `run_standalone_loop()` in `app/main.c`, plus a new case
 								in the `app_main()` switch block:
 								```c
 								case EMG_STANDALONE:
 								    run_standalone_loop();
 								    break;
 								```
 								---
 								## 0.5 New Firmware Changes for Architecture
 								These changes are needed to implement the architecture above. They are **structural**
 								(not accuracy improvements) and should be done before any other changes.
 								### S1 — Add `EMG_STANDALONE` to `config.h`
 								**File**: `EMG_Arm/src/config/config.h`, line 19
 								```c
 								// Change:
 								enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER};
 								// To:
 								enum {EMG_MAIN, SERVO_CALIBRATOR, GESTURE_TESTER, EMG_STANDALONE};
 								```
 								### S2 — Add `STATE_LAPTOP_PREDICT` to FSM (`main.c`)
 								**File**: `EMG_Arm/src/app/main.c`
 								```c
 								// In device_state_t enum — add new state:
 								typedef enum {
 								  STATE_IDLE = 0,
 								  STATE_CONNECTED,
 								  STATE_STREAMING,
 								  STATE_PREDICTING,
 								  STATE_LAPTOP_PREDICT,  // ← ADD: streams ADC to laptop, executes laptop's gesture commands
 								} device_state_t;
 								// In command_t enum — add new command:
 								typedef enum {
 								  CMD_NONE = 0,
 								  CMD_CONNECT,
 								  CMD_START,
 								  CMD_START_PREDICT,
 								  CMD_START_LAPTOP_PREDICT,  // ← ADD
 								  CMD_STOP,
 								  CMD_DISCONNECT,
 								} command_t;
 								```
 								**In `parse_command()`** — add detection (place BEFORE the `"start"` check to avoid prefix collision):
 								```c
 								} else if (strncmp(value_start, "start_laptop_predict", 20) == 0) {
 								    return CMD_START_LAPTOP_PREDICT;
 								} else if (strncmp(value_start, "start_predict", 13) == 0) {
 								    return CMD_START_PREDICT;
 								} else if (strncmp(value_start, "start", 5) == 0) {
 								    return CMD_START;
 								```
 								**In `serial_input_task()` FSM switch** — add to `STATE_CONNECTED` block:
 								```c
 								} else if (cmd == CMD_START_LAPTOP_PREDICT) {
 								    g_device_state = STATE_LAPTOP_PREDICT;
 								    printf("[STATE] CONNECTED -> LAPTOP_PREDICT\n");
 								    xQueueSend(g_cmd_queue, &cmd, 0);
 								}
 								```
 								**Add to the active-state check** in `serial_input_task()`:
 								```c
 								case STATE_STREAMING:
 								case STATE_PREDICTING:
 								case STATE_LAPTOP_PREDICT:  // ← ADD to the case list
 								    if (cmd == CMD_STOP) { ... }
 								```
 								**New function `run_laptop_predict_loop()`** (add alongside `stream_emg_data()` and `run_inference_loop()`):
 								```c
 								/**
 								 * @brief Laptop-mediated prediction loop (STATE_LAPTOP_PREDICT).
 								 *
 								 * Streams raw ADC CSV to laptop for inference.
 								 * Simultaneously reads gesture commands sent back by laptop.
 								 * Executes received gesture immediately.
 								 *
 								 * Laptop sends: {"gesture":"fist"}\n  OR  {"gesture":"rest"}\n  etc.
 								 * ESP32 parses the "gesture" field and calls inference_get_gesture_enum() + gestures_execute().
 								 */
 								static void run_laptop_predict_loop(void) {
 								    emg_sample_t sample;
 								    char cmd_buf[64];
 								    int cmd_idx = 0;
 								    printf("{\"status\":\"info\",\"msg\":\"Laptop-predict mode started\"}\n");
 								    while (g_device_state == STATE_LAPTOP_PREDICT) {
 								        // 1. Send raw ADC sample (same format as STATE_STREAMING)
 								        emg_sensor_read(&sample);
 								        printf("%u,%u,%u,%u\n", sample.channels[0], sample.channels[1],
 								               sample.channels[2], sample.channels[3]);
 								        // 2. Non-blocking read of any incoming gesture command from laptop
 								        //    (serial_input_task already handles FSM commands; this handles gesture commands)
 								        //    Note: getchar() is non-blocking when there is no data (returns EOF).
 								        //    Gesture messages from laptop look like: {"gesture":"fist"}\n
 								        int c = getchar();
 								        if (c != EOF && c != 0xFF) {
 								            if (c == '\n' || c == '\r') {
 								                if (cmd_idx > 0) {
 								                    cmd_buf[cmd_idx] = '\0';
 								                    // Parse {"gesture":"<name>"} — look for "gesture" field
 								                    const char *g = strstr(cmd_buf, "\"gesture\"");
 								                    if (g) {
 								                        const char *v = strchr(g, ':');
 								                        if (v) {
 								                            v++;
 								                            while (*v == ' ' || *v == '"') v++;
 								                            // Extract gesture name up to closing quote
 								                            char name[32] = {0};
 								                            int ni = 0;
 								                            while (*v && *v != '"' && ni < 31) name[ni++] = *v++;
 								                            name[ni] = '\0';
 								                            // Map name to enum and execute (reuse inference mapping)
 								                            gesture_t gesture = (gesture_t)inference_get_gesture_enum_by_name(name);
 								                            if (gesture != GESTURE_NONE) {
 								                                gestures_execute(gesture);
 								                            }
 								                        }
 								                    }
 								                    cmd_idx = 0;
 								                }
 								            } else if (cmd_idx < (int)sizeof(cmd_buf) - 1) {
 								                cmd_buf[cmd_idx++] = (char)c;
 								            } else {
 								                cmd_idx = 0;
 								            }
 								        }
 								        vTaskDelay(1);
 								    }
 								}
 								```
 								**Note**: `inference_get_gesture_enum_by_name(const char *name)` is just the existing
 								`inference_get_gesture_enum(int class_idx)` refactored to accept a string directly
 								(bypassing the class_idx lookup). Alternatively, keep the existing function and add a
 								simple wrapper — the string matching logic already exists in `inference.c`:
 								```c
 								// Simpler: reuse the existing strcmp chain in inference_get_gesture_enum()
 								// by passing the name through a helper that returns the gesture_t directly.
 								// Add to inference.c / inference.h:
 								gesture_t inference_get_gesture_by_name(const char *name);
 								// (same strcmp logic as inference_get_gesture_enum, but returns gesture_t directly)
 								```
 								**In `state_machine_loop()`** — add the new state:
 								```c
 								static void state_machine_loop(void) {
 								    command_t cmd;
 								    const TickType_t poll_interval = pdMS_TO_TICKS(50);
 								    while (1) {
 								        if      (g_device_state == STATE_STREAMING)        stream_emg_data();
 								        else if (g_device_state == STATE_PREDICTING)       run_inference_loop();
 								        else if (g_device_state == STATE_LAPTOP_PREDICT)   run_laptop_predict_loop();  // ← ADD
 								        xQueueReceive(g_cmd_queue, &cmd, poll_interval);
 								    }
 								}
 								```
 								**In `app_main()` switch** — add the standalone case:
 								```c
 								case EMG_STANDALONE:
 								    run_standalone_loop();  // new function — see Section 0.4
 								    break;
 								```
 								---
 								## 0.6 New Python Script: `live_predict.py`
 								**Location**: `C:/VSCode/Marvel_Projects/Bucky_Arm/live_predict.py` (new file)
 								**Purpose**: Laptop-side live inference. Reads raw ADC stream from ESP32, runs the Python
 								classifier, sends gesture commands back to ESP32 for arm control.
 								**When to use**: `EMG_MAIN` + `STATE_LAPTOP_PREDICT` — useful for debugging and comparing
 								laptop accuracy vs on-device accuracy before flashing a new model.
 								```python
 								"""
 								live_predict.py — Laptop-side live EMG inference for Bucky Arm.
 								Connects to ESP32, requests STATE_LAPTOP_PREDICT, reads raw ADC CSV,
 								runs the trained Python classifier, sends gesture commands back to ESP32.
 								Usage:
 								    python live_predict.py --port COM3 --model path/to/saved_model/
 								"""
 								import argparse
 								import time
 								import numpy as np
 								import serial
 								from pathlib import Path
 								import sys
 								sys.path.insert(0, str(Path(__file__).parent))
 								from learning_data_collection import (
 								    EMGClassifier, EMGFeatureExtractor, SessionStorage, HAND_CHANNELS,
 								    WINDOW_SIZE_SAMPLES, HOP_SIZE_SAMPLES, NUM_CHANNELS,
 								)
 								BAUD_RATE    = 921600
 								CALIB_SEC    = 3.0          # seconds of REST to collect for normalization at startup
 								CALIB_LABEL  = "rest"       # label used during calibration window
 								def parse_args():
 								    p = argparse.ArgumentParser()
 								    p.add_argument("--port",  required=True, help="Serial port, e.g. COM3 or /dev/ttyUSB0")
 								    p.add_argument("--model", required=True, help="Path to saved EMGClassifier model directory")
 								    return p.parse_args()
 								def handshake(ser):
 								    """Send connect command, wait for ack."""
 								    ser.write(b'{"cmd":"connect"}\n')
 								    deadline = time.time() + 5.0
 								    while time.time() < deadline:
 								        line = ser.readline().decode("utf-8", errors="ignore").strip()
 								        if "ack_connect" in line:
 								            print(f"[Handshake] Connected: {line}")
 								            return True
 								    raise RuntimeError("No ack_connect received within 5s")
 								def collect_calibration_windows(ser, n_windows, window_size, hop_size, n_channels):
 								    """Collect n_windows worth of REST data for normalization calibration."""
 								    print(f"[Calib] Collecting {n_windows} REST windows — hold arm still...")
 								    raw_buffer = np.zeros((window_size, n_channels), dtype=np.float32)
 								    windows = []
 								    sample_count = 0
 								    while len(windows) < n_windows:
 								        line = ser.readline().decode("utf-8", errors="ignore").strip()
 								        try:
 								            vals = [float(v) for v in line.split(",")]
 								            if len(vals) != n_channels:
 								                continue
 								        except ValueError:
 								            continue
 								        raw_buffer = np.roll(raw_buffer, -1, axis=0)
 								        raw_buffer[-1] = vals
 								        sample_count += 1
 								        if sample_count >= window_size and sample_count % hop_size == 0:
 								            windows.append(raw_buffer.copy())
 								    print(f"[Calib] Collected {len(windows)} windows. Computing normalization stats...")
 								    return np.array(windows)  # (n_windows, window_size, n_channels)
 								def main():
 								    args = parse_args()
 								    # Load trained classifier
 								    print(f"[Init] Loading classifier from {args.model}...")
 								    classifier = EMGClassifier()
 								    classifier.load(Path(args.model))
 								    extractor = classifier.feature_extractor
 								    ser = serial.Serial(args.port, BAUD_RATE, timeout=1.0)
 								    time.sleep(0.5)
 								    ser.reset_input_buffer()
 								    handshake(ser)
 								    # Request laptop-predict mode
 								    ser.write(b'{"cmd":"start_laptop_predict"}\n')
 								    print("[Control] Entered STATE_LAPTOP_PREDICT")
 								    # Calibration: collect 3s of REST for session normalization
 								    n_calib_windows = max(10, int(CALIB_SEC * 1000 / (HOP_SIZE_SAMPLES)))
 								    calib_raw = collect_calibration_windows(
 								        ser, n_calib_windows, WINDOW_SIZE_SAMPLES, HOP_SIZE_SAMPLES, NUM_CHANNELS
 								    )
 								    calib_features = extractor.extract_features_batch(calib_raw)
 								    calib_mean = calib_features.mean(axis=0)
 								    calib_std  = np.where(calib_features.std(axis=0) > 1e-6,
 								                          calib_features.std(axis=0), 1e-6)
 								    print("[Calib] Done. Starting live prediction...")
 								    # Live prediction loop
 								    raw_buffer   = np.zeros((WINDOW_SIZE_SAMPLES, NUM_CHANNELS), dtype=np.float32)
 								    sample_count = 0
 								    last_gesture = None
 								    try:
 								        while True:
 								            line = ser.readline().decode("utf-8", errors="ignore").strip()
 								            # Skip JSON telemetry lines from ESP32
 								            if line.startswith("{"):
 								                continue
 								            try:
 								                vals = [float(v) for v in line.split(",")]
 								                if len(vals) != NUM_CHANNELS:
 								                    continue
 								            except ValueError:
 								                continue
 								            # Slide window
 								            raw_buffer = np.roll(raw_buffer, -1, axis=0)
 								            raw_buffer[-1] = vals
 								            sample_count += 1
 								            if sample_count >= WINDOW_SIZE_SAMPLES and sample_count % HOP_SIZE_SAMPLES == 0:
 								                # Extract features and normalize with session stats
 								                feat = extractor.extract_features_window(raw_buffer)
 								                feat = (feat - calib_mean) / calib_std
 								                proba = classifier.model.predict_proba([feat])[0]
 								                class_idx = int(np.argmax(proba))
 								                gesture_name = classifier.label_names[class_idx]
 								                confidence   = float(proba[class_idx])
 								                # Send gesture command to ESP32
 								                cmd = f'{{"gesture":"{gesture_name}"}}\n'
 								                ser.write(cmd.encode("utf-8"))
 								                if gesture_name != last_gesture:
 								                    print(f"[Predict] {gesture_name:12s}  conf={confidence:.2f}")
 								                    last_gesture = gesture_name
 								    except KeyboardInterrupt:
 								        print("\n[Stop] Sending stop command...")
 								        ser.write(b'{"cmd":"stop"}\n')
 								        ser.close()
 								if __name__ == "__main__":
 								    main()
 								```
 								**Dependencies** (add to a `requirements.txt` in `Bucky_Arm/` if not already there):
 								```
 								pyserial
 								numpy
 								scikit-learn
 								```
 								---
 								## 0.7 Firmware Cleanup: `system_mode_t` Removal
 								`config.h` lines 94–100 define a `system_mode_t` typedef that is **not referenced anywhere**
 								in the firmware. It predates the current `device_state_t` FSM in `main.c` and conflicts
 								conceptually with it. Remove before starting implementation work.
 								**File**: `EMG_Arm/src/config/config.h`
 								**Remove** (lines 93–100):
 								```c
 								/**
 								 * @brief System operating modes.
 								 */
 								typedef enum {
 								    MODE_IDLE = 0,      /**< Waiting for commands */
 								    MODE_DATA_STREAM,   /**< Streaming EMG data to laptop */
 								    MODE_COMMAND,       /**< Executing gesture commands from laptop */
 								    MODE_DEMO,          /**< Running demo sequence */
 								    MODE_COUNT
 								} system_mode_t;
 								```
 								No other file references `system_mode_t` — the deletion is safe and requires no other changes.
 								---
 								# PART I — SYSTEM FOUNDATIONS
 								## 1. Hardware Specification
 								### ESP32-S3 N32R16V — Confirmed Hardware
 								| Resource | Spec | Implication |
 								|----------|------|-------------|
 								| CPU | Dual-core Xtensa LX7 @ 240 MHz | Pin inference to Core 1, sampling to Core 0 |
 								| SIMD | PIE 128-bit vector extension | esp-dsp exploits this for FFT, biquad, dot-product |
 								| Internal SRAM | ~512 KB | All hot-path buffers, model weights, inference state |
 								| OPI PSRAM | 16 MB (~80 MB/s) | ADC ring buffer, raw window storage — not hot path |
 								| Flash | 32 MB | Code + read-only model flatbuffers (TFLM path) |
 								| ADC | 2× SAR ADC, 12-bit, continuous DMA mode | Change A: use `adc_continuous` driver |
 								**Memory rules**:
 								- Tag inference code: `IRAM_ATTR` — prevents cache miss stalls
 								- Tag large ring buffers: `EXT_RAM_BSS_ATTR` — pushes to PSRAM automatically
 								- Never run hot-path loops from PSRAM (latency varies; ~10× slower than SRAM)
 								### Espressif Acceleration Libraries
 								| Library | Accelerates | Key Functions |
 								|---------|-------------|---------------|
 								| **esp-dsp** | IIR biquad, FFT (up to 4096-pt), vector dot-product, matrix ops — PIE SIMD | `dsps_biquad_f32`, `dsps_fft2r_fc32`, `dsps_dotprod_f32` |
 								| **esp-nn** | int8 FC, depthwise/pointwise Conv, activations — SIMD optimized | Used internally by esp-dl |
 								| **esp-dl** | High-level int8 inference: MLP, Conv1D, LSTM; activation buffer management | Small MLP / tiny CNN deployment |
 								| **TFLite Micro** | Standard int8 flatbuffer inference, tensor arena (static alloc) | Keras → TFLite → int8 workflow |
 								### Real-Time Budget (1000 Hz, 25ms hop)
 								| Stage | Cost | Notes |
 								|-------|------|-------|
 								| ADC DMA sampling | ~0 µs | Hardware; CPU-free |
 								| IIR biquad (3 ch, 2 stages) | <100 µs | `dsps_biquad_f32` |
 								| Feature extraction (69 feat) | ~1,200 µs | FFT-based features dominate |
 								| 3 specialist LDAs | ~150 µs | `dsps_dotprod_f32` per class |
 								| Meta-LDA (15 inputs) | ~10 µs | 75 MACs total |
 								| int8 MLP fallback [69→32→16→5] | ~250 µs | esp-nn FC kernels |
 								| Post-processing | <50 µs | EMA, vote, debounce |
 								| **Total (full ensemble)** | **~1,760 µs** | **14× margin within 25ms** |
 								### Hard No-Gos
 								| Technique | Why |
 								|-----------|-----|
 								| Full MPF with matrix logarithm | Eigendecomposition per window; fragile float32; no SIMD path |
 								| Conv1D(16→512) + 3×LSTM(512) | ~4 MB weights; LSTM sequential dependency — impossible |
 								| Any transformer / attention | O(n²); no int8 transformer kernels for MCU |
 								| On-device gradient updates | Inference only — no training infrastructure |
 								| Heap allocations on hot path | FreeRTOS heap fragmentation kills determinism |
 								---
 								## 2. Current System Snapshot
 								| Aspect | Current State |
 								|--------|--------------|
 								| Channels | 4 total; ch0–ch2 forearm (FCR, FCU, extensor), ch3 bicep (excluded from hand classifier) |
 								| Sampling | 1000 Hz, timer/polling (jitter — fix with Change A) |
 								| Window | 150 samples (150ms), 25-sample hop (25ms) |
 								| Features | 12: RMS, WL, ZC, SSC × 3 channels |
 								| Classifier | Single LDA, float32 weights in C header |
 								| Label alignment | RMS onset detection — missing +100ms forward shift (Change 0) |
 								| Normalization | Per-session z-score in Python; no on-device equivalent (Change D) |
 								| Smoothing | EMA (α=0.7) + majority vote (5) + debounce (3 counts) |
 								| Confidence rejection | None — always outputs a class (Change C) |
 								| Signal filtering | Analogue only via MyoWare (Change B adds software IIR) |
 								| Gestures | 5: fist, hook\_em, open, rest, thumbs\_up |
 								| Training data | 15 HDF5 sessions, 1 user |
 								---
 								## 2.1 — Confirmed Firmware Architecture (From Codebase Exploration)
 								> Confirmed by direct codebase inspection 2026-02-24. All file paths relative to
 								> `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/`
 								### ADC Pin Mapping (`drivers/emg_sensor.c`)
 								| Channel | ADC Channel | GPIO | Muscle Location | Role in Classifier |
 								|---------|-------------|------|-----------------|-------------------|
 								| ch0 | `ADC_CHANNEL_1` | GPIO 2 | Forearm Belly (FCR) | Primary flexion signal |
 								| ch1 | `ADC_CHANNEL_2` | GPIO 3 | Forearm Extensors | Extension signal |
 								| ch2 | `ADC_CHANNEL_8` | GPIO 9 | Forearm Contractors (FCU) | Ulnar flexion signal |
 								| ch3 | `ADC_CHANNEL_9` | GPIO 10 | Bicep | Independent — see Section 2.2 |
 								**Current ADC driver**: `adc_oneshot` (polling — **NOT DMA continuous yet**; Change A migrates this)
 								- Attenuation: `ADC_ATTEN_DB_12` (0–3.9V full-scale range)
 								- Calibration: `adc_cali_curve_fitting` scheme
 								- Output: calibrated millivolts as `uint16_t` packed into `emg_sample_t.channels[4]`
 								- Timing: `vTaskDelay(1)` in `run_inference_loop()` provides the ~1ms sample interval
 								### Current Task Structure (`app/main.c`)
 								| Task | Priority | Stack | Core Pinning | Role |
 								|------|----------|-------|--------------|------|
 								| `app_main` (implicit) | Default | Default | None | Runs inference loop + state machine |
 								| `serial_input_task` | 5 | 4096 B | **None** | Parses UART JSON commands |
 								**No other tasks exist.** Change A will add `adc_sampling_task` pinned to Core 0.
 								The inference loop runs on `app_main`'s default task — no explicit core affinity.
 								### State Machine (`app/main.c`)
 								```
 								STATE_IDLE  ─(BLE/UART connect)─►  STATE_CONNECTED
 								                                         │
 								                   {"cmd": "start_stream"}▼
 								                                  STATE_STREAMING    (sends raw ADC over UART for Python)
 								                                         │
 								                  {"cmd": "start_predict"}▼
 								                                  STATE_PREDICTING   (runs run_inference_loop())
 								```
 								Communication: UART at 921600 baud, JSON framing.
 								### Complete Data Flow (Exact Function Names)
 								```
 								emg_sensor_read(&sample)
 								  │  drivers/emg_sensor.c
 								  │  adc_oneshot_read() × 4 channels → adc_cali_raw_to_voltage() → uint16_t mV
 								  │  Result: sample.channels[4] = {ch0_mV, ch1_mV, ch2_mV, ch3_mV}
 								  │
 								  ▼  Called every ~1ms (vTaskDelay(1) in run_inference_loop)
 								inference_add_sample(sample.channels)
 								  │  core/inference.c
 								  │  Writes to circular window_buffer[150][4]
 								  │  Returns true when buffer is full (after first 150 samples)
 								  │
 								  ▼  Called every 25 samples (stride_counter % INFERENCE_HOP_SIZE == 0)
 								inference_predict(&confidence)
 								  │  core/inference.c
 								  │  compute_features() → LDA scores → softmax → EMA → majority vote → debounce
 								  │  Returns: gesture class index (int), fills confidence (float)
 								  │
 								  ▼
 								inference_get_gesture_enum(class_idx)
 								  │  core/inference.c
 								  │  String match on MODEL_CLASS_NAMES[] → gesture_t enum value
 								  │
 								  ▼
 								gestures_execute(gesture)
 								     core/gestures.c
 								     switch(gesture) → servo PWM via LEDC driver
 								     Servo pins: GPIO 1,4,5,6,7 (Thumb, Index, Middle, Ring, Pinky)
 								```
 								### Current Buffer State
 								```c
 								// core/inference.c line 19:
 								static uint16_t window_buffer[INFERENCE_WINDOW_SIZE][NUM_CHANNELS];
 								//       ^^^^^^^^ MUST change to float when adding IIR filter (Change B)
 								//
 								// uint16_t: 150 × 4 × 2 = 1,200 bytes in internal SRAM
 								// float:    150 × 4 × 4 = 2,400 bytes in internal SRAM  (still trivially small)
 								//
 								// Reason for change: IIR filter outputs float; casting back to uint16_t loses
 								// sub-mV precision and re-introduces the quantization noise we just filtered out.
 								```
 								### `platformio.ini` Current State (`EMG_Arm/platformio.ini`)
 								**Current `lib_deps`**: **None** — completely empty, no external library dependencies.
 								Required additions per change tier:
 								| Change | Library | `platformio.ini` `lib_deps` entry |
 								|--------|---------|----------------------------------|
 								| B (IIR biquad) | esp-dsp | `espressif/esp-dsp @ ^2.0.0` |
 								| 1 (FFT features) | esp-dsp | (same — add once for both B and 1) |
 								| E (int8 MLP) | TFLite Micro | `tensorflow/tflite-micro` |
 								| F (ensemble) | esp-dsp | (same as B) |
 								Add to `platformio.ini` under `[env:esp32-s3-devkitc1-n16r16]`:
 								```ini
 								lib_deps =
 								    espressif/esp-dsp @ ^2.0.0
 								    ; tensorflow/tflite-micro   ← add this only when implementing Change E
 								```
 								---
 								## 2.2 — Bicep Channel Subsystem (ch3 / ADC_CHANNEL_9 / GPIO 10)
 								### Current Status
 								The bicep channel is:
 								- **Sampled**: `emg_sensor_read()` reads all 4 channels; `sample.channels[3]` holds bicep data
 								- **Excluded from hand classifier**: `HAND_NUM_CHANNELS = 3`; `compute_features()` explicitly
 								  loops `ch = 0` to `ch < HAND_NUM_CHANNELS` (i.e., ch0, ch1, ch2 only)
 								- **Not yet independently processed**: the comment in `inference.c` line 68
 								  (`"ch3 (bicep) is excluded — it will be processed independently"`) is aspirational —
 								  the independent processing is not yet implemented
 								### Phase 1 — Binary Flex/Unflex (Current Target)
 								Implement a simple RMS threshold detector as a new subsystem:
 								**New files:**
 								```
 								EMG_Arm/src/core/bicep.h
 								EMG_Arm/src/core/bicep.c
 								```
 								**bicep.h:**
 								```c
 								#pragma once
 								#include <stdint.h>
 								#include <stdbool.h>
 								typedef enum {
 								    BICEP_STATE_REST = 0,
 								    BICEP_STATE_FLEX = 1,
 								} bicep_state_t;
 								// Call once at session start with ~3s of relaxed bicep data.
 								// Returns the computed threshold (also stored internally).
 								float bicep_calibrate(const uint16_t *ch3_samples, int n_samples);
 								// Call every 25ms (same hop as hand gesture inference).
 								// Computes RMS on the last BICEP_WINDOW_SAMPLES from the ch3 circular buffer.
 								bicep_state_t bicep_detect(void);
 								// Load/save threshold to NVS (reuse calibration.c infrastructure from Change D)
 								bool bicep_save_threshold(float threshold_mv);
 								bool bicep_load_threshold(float *threshold_mv_out);
 								```
 								**Core logic (`bicep.c`):**
 								```c
 								#define BICEP_WINDOW_SAMPLES  50     // 50ms window at 1000Hz
 								#define BICEP_FLEX_MULTIPLIER 2.5f   // threshold = rest_rms × 2.5
 								#define BICEP_HYSTERESIS      1.3f   // prevents rapid toggling at threshold boundary
 								static float s_threshold_mv = 0.0f;
 								static bicep_state_t s_state = BICEP_STATE_REST;
 								float bicep_calibrate(const uint16_t *ch3_samples, int n_samples) {
 								    float rms_sq = 0.0f;
 								    for (int i = 0; i < n_samples; i++)
 								        rms_sq += (float)ch3_samples[i] * ch3_samples[i];
 								    float rest_rms = sqrtf(rms_sq / n_samples);
 								    s_threshold_mv = rest_rms * BICEP_FLEX_MULTIPLIER;
 								    printf("[Bicep] Calibrated: rest_rms=%.1f mV, threshold=%.1f mV\n",
 								           rest_rms, s_threshold_mv);
 								    return s_threshold_mv;
 								}
 								bicep_state_t bicep_detect(void) {
 								    // Compute RMS on last BICEP_WINDOW_SAMPLES from ch3 circular buffer
 								    // (ch3 values are stored in window_buffer[][3] alongside hand channels)
 								    float rms_sq = 0.0f;
 								    int idx = buffer_head;
 								    for (int i = 0; i < BICEP_WINDOW_SAMPLES; i++) {
 								        float v = (float)window_buffer[idx][3];  // ch3 = bicep
 								        rms_sq += v * v;
 								        idx = (idx + 1) % INFERENCE_WINDOW_SIZE;
 								    }
 								    float rms = sqrtf(rms_sq / BICEP_WINDOW_SAMPLES);
 								    // Hysteresis: require FLEX_MULTIPLIER to enter flex, 1.0× to exit
 								    if (s_state == BICEP_STATE_REST && rms > s_threshold_mv * BICEP_HYSTERESIS)
 								        s_state = BICEP_STATE_FLEX;
 								    else if (s_state == BICEP_STATE_FLEX && rms < s_threshold_mv)
 								        s_state = BICEP_STATE_REST;
 								    return s_state;
 								}
 								```
 								**Integration in `main.c` `run_inference_loop()`:**
 								```c
 								// Call alongside inference_predict() every 25ms:
 								if (stride_counter % INFERENCE_HOP_SIZE == 0) {
 								    float confidence;
 								    int class_idx     = inference_predict(&confidence);
 								    gesture_t gesture = inference_get_gesture_enum(class_idx);
 								    bicep_state_t bicep = bicep_detect();
 								    // Combined actuation: hand gesture + bicep state
 								    // Example: bicep flex can enable/disable certain gestures,
 								    // or control a separate elbow/wrist joint.
 								    gestures_execute(gesture);
 								    // bicep_actuate(bicep);  ← add when elbow motor is wired
 								}
 								```
 								**Calibration trigger (add to serial_input_task command parsing):**
 								```c
 								// {"cmd": "calibrate_bicep"}  → collect 3s of rest data, call bicep_calibrate()
 								```
 								### Phase 2 — Continuous Angle/Velocity Prediction (Future)
 								When ready to move beyond binary flex/unflex:
 . **Collect angle-labeled data**: hold arm at 0°, 15°, 30°, 45°, 60°, 75°, 90°;
 								   log RMS at each; collect 5+ reps per angle.
 . **Fit polynomial**: `angle = a0 + a1*rms + a2*rms²` (degree-2 usually sufficient);
 								   use `numpy.polyfit(rms_values, angles, deg=2)`.
 . **Store coefficients in NVS**: 3 floats via `nvs_set_blob()`.
 . **On-device evaluation**: `angle = a0 + rms*(a1 + rms*a2)` — 2 MACs per inference.
 . **Velocity**: `velocity = (angle_now - angle_prev) / HOP_MS` with low-pass smoothing.
 								### Including ch3 in Hand Gesture Classifier (for Wrist Rotation)
 								If/when wrist rotation or supination gestures are added:
 								```python
 								# learning_data_collection.py — change this constant:
 								HAND_CHANNELS = [0, 1, 2, 3]  # was [0, 1, 2]; include bicep for rotation gestures
 								```
 								Feature count becomes: 4 channels × 20 per-ch + 10 cross-ch covariances + 6 correlations = **96 total**.
 								The bicep subsystem is then retired and ch3 becomes part of the main gesture classifier.
 								---
 								## 3. What Meta Built — Filtered for ESP32
 								Meta's Nature 2025 paper (doi:10.1038/s41586-025-09255-w) describes a 16-channel wristband
 								running Conv1D(16→512)+3×LSTM(512). **That exact model is not portable to ESP32-S3** (~4 MB
 								weights). What IS transferable:
 								| Meta Technique | Transferability | Where Used |
 								|----------------|-----------------|-----------|
 								| +100ms forward label shift after onset detection | ✓ Direct copy | Change 0 |
 								| Frequency features > amplitude features (Extended Data Fig. 6) | ✓ Core insight | Change 1, Change 6 |
 								| Deliberate electrode repositioning between sessions | ✓ Protocol | Change 2 |
 								| Window jitter + amplitude augmentation | ✓ Training | Change 3 |
 								| Reinhard compression `64x/(32+|x|)` | ✓ Optional flag | Change 4 |
 								| EMA α=0.7, threshold=0.35, debounce=50ms | ✓ Already implemented | Change C |
 								| Specialist features → meta-learner stacking | ✓ Adapted | Change 7 + F |
 								| Conv1D+LSTM architecture | ✗ Too large | Not implementable |
 								| Full MPF with matrix logarithm | ✗ Eigendecomp too costly | Not implementable |
 								---
 								## 4. Current Code State + Known Bugs
 								**All Python changes**: `C:/VSCode/Marvel_Projects/Bucky_Arm/learning_data_collection.py`
 								**Firmware**: `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.c`
 								**Config**: `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/config/config.h`
 								**Weights**: `C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights.h`
 								### Key Symbol Locations
 								| Symbol | Line | Notes |
 								|--------|------|-------|
 								| Constants block | 49–94 | `NUM_CHANNELS`, `SAMPLING_RATE_HZ`, `WINDOW_SIZE_MS`, etc. |
 								| `align_labels_with_onset()` | 442 | RMS onset detection |
 								| `filter_transition_windows()` | 529 | Removes onset/offset ambiguity windows |
 								| `SessionStorage.save_session()` | 643 | Calls onset alignment, saves HDF5 |
 								| `SessionStorage.load_all_for_training()` | 871 | Returns 6 values (see bug below) |
 								| `EMGFeatureExtractor` class | 1404 | Current: RMS, WL, ZC, SSC only |
 								| `extract_features_single_channel()` | 1448 | Per-channel feature dict |
 								| `extract_features_window()` | 1482 | Flat array + cross-channel |
 								| `extract_features_batch()` | 1520 | Batch wrapper |
 								| `get_feature_names()` | 1545 | String names for features |
 								| `CalibrationTransform` class | 1562 | z-score at Python-side inference |
 								| `EMGClassifier` class | 1713 | LDA/QDA wrapper |
 								| `EMGClassifier.__init__()` | 1722 | Creates `EMGFeatureExtractor` |
 								| `EMGClassifier.train()` | 1735 | Feature extraction + model fit |
 								| `EMGClassifier._apply_session_normalization()` | 1774 | Per-session z-score |
 								| `EMGClassifier.cross_validate()` | 1822 | GroupKFold, trial-level |
 								| `EMGClassifier.export_to_header()` | 1956 | Writes `model_weights.h` |
 								| `EMGClassifier.save()` | 1910 | Persists model params |
 								| `EMGClassifier.load()` | 2089 | Reconstructs from saved params |
 								| `run_training_demo()` | 2333 | Main training entry point |
 								| `inference.c` `compute_features()` | 68 | C feature extraction |
 								| `inference.c` `inference_predict()` | 158 | C LDA + smoothing pipeline |
 								### Pending Cleanups (Do Before Any Other Code Changes)
 								| Item | File | Action |
 								|------|------|--------|
 								| Remove `system_mode_t` | `config/config.h` lines 93–100 | Delete the unused typedef (see Part 0, Section 0.7) |
 								| Add `EMG_STANDALONE` to enum | `config/config.h` line 19 | Add value to the existing MAIN_MODE enum |
 								| Add `STATE_LAPTOP_PREDICT` + `CMD_START_LAPTOP_PREDICT` | `app/main.c` | See Part 0, Section 0.5 for exact diffs |
 								| Add `run_standalone_loop()` | `app/main.c` | New function — see Part 0, Section 0.4 |
 								| Add `run_laptop_predict_loop()` | `app/main.c` | New function — see Part 0, Section 0.5 |
 								| Add `inference_get_gesture_by_name()` | `core/inference.c` + `core/inference.h` | Small helper — extracts existing strcmp logic |
 								### Known Bug — Line 2382
 								```python
 								# BUG: load_all_for_training() returns 6 values; this call unpacks only 5.
 								# session_indices_combined is silently dropped — breaks per-session normalization.
 								X, y, trial_ids, label_names, loaded_sessions = storage.load_all_for_training()
 								# FIX (apply with Change 1):
 								X, y, trial_ids, session_indices, label_names, loaded_sessions = storage.load_all_for_training()
 								```
 								### Current `model_weights.h` State (as of 2026-02-14 training run)
 								| Constant | Value | Note |
 								|----------|-------|------|
 								| `MODEL_NUM_CLASSES` | 5 | fist, hook_em, open, rest, thumbs_up |
 								| `MODEL_NUM_FEATURES` | 12 | RMS, WL, ZC, SSC × 3 forearm channels |
 								| `MODEL_CLASS_NAMES` | `{"fist","hook_em","open","rest","thumbs_up"}` | Alphabetical order |
 								| `MODEL_NORMALIZE_FEATURES` | *not defined yet* | Add when enabling cross-ch norm (Change B) |
 								| `MODEL_USE_REINHARD` | *not defined yet* | Add when enabling Reinhard compression (Change 4) |
 								| `FEAT_ZC_THRESH` | `0.1f` | Fraction of RMS for zero-crossing threshold |
 								| `FEAT_SSC_THRESH` | `0.1f` | Fraction of RMS for slope sign change threshold |
 								The LDA_WEIGHTS and LDA_INTERCEPTS arrays are current trained values — do not modify manually.
 								They are regenerated by `EMGClassifier.export_to_header()` after each training run.
 								### Current Feature Vector (12 features — firmware contract)
 								```
 								ch0: [0]=rms  [1]=wl  [2]=zc  [3]=ssc
 								ch1: [4]=rms  [5]=wl  [6]=zc  [7]=ssc
 								ch2: [8]=rms  [9]=wl [10]=zc [11]=ssc
 								```
 								### Target Feature Vector (69 features after Change 1)
 								```
 								Per channel (×3 channels, 20 features each):
 								  [0] rms  [1] wl   [2] zc   [3] ssc   [4] mav   [5] var
 								  [6] iemg [7] wamp [8] ar1  [9] ar2  [10] ar3  [11] ar4
 								 [12] mnf [13] mdf  [14] pkf [15] mnp  [16] bp0  [17] bp1
 								 [18] bp2 [19] bp3
 								ch0: indices  0–19
 								ch1: indices 20–39
 								ch2: indices 40–59
 								Cross-channel (9 features):
 								 [60] cov_ch0_ch0  [61] cov_ch0_ch1  [62] cov_ch0_ch2
 								 [63] cov_ch1_ch1  [64] cov_ch1_ch2  [65] cov_ch2_ch2
 								 [66] cor_ch0_ch1  [67] cor_ch0_ch2  [68] cor_ch1_ch2
 								```
 								### Specialist Feature Subset Indices (for Change F + Change 7)
 								```
 								TD (time-domain, 36 feat): indices [0–11, 20–31, 40–51]
 								FD (frequency-domain, 24 feat): indices [12–19, 32–39, 52–59]
 								CC (cross-channel, 9 feat): indices [60–68]
 								```
 								---
 								# PART II — TARGET ARCHITECTURE
 								## 5. Full Recommended Multi-Model Stack
 								```
 								ADC (DMA, Change A)
 								  └── IIR Biquad filter per channel (Change B)
 								        └── 150-sample circular window buffer
 								              │
 								              ▼  [every 25ms]
 								        compute_features()  →  69-feature vector
 								              │
 								              ▼
 								        calibration_apply()  (Change D — NVS z-score)
 								              │
 								              ├─── Stage 1: Activity Gate ──────────────────────────────────┐
 								              │    total_rms < REST_THRESHOLD?  →  return GESTURE_REST      │
 								              │    (skips all inference during obvious idle)                 │
 								              │                                                              │
 								              ▼  (only reached when gesture is active)                      │
 								        Stage 2: Parallel Specialist LDAs (Change F)                        │
 								              ├── LDA_TD  [TD features, 36-dim]  →  prob_td[5]             │
 								              ├── LDA_FD  [FD features, 24-dim]  →  prob_fd[5]             │
 								              └── LDA_CC  [CC features,  9-dim]  →  prob_cc[5]             │
 								                                                                            │
 								              ▼                                                             │
 								        Stage 3: Meta-LDA stacker (Change F)                               │
 								              input: [prob_td | prob_fd | prob_cc]  (15-dim)               │
 								              output: meta_probs[5]                                         │
 								                                                                            │
 								              ▼                                                             │
 								        EMA smoothing (α=0.7) on meta_probs                                │
 								              │                                                             │
 								              ├── max smoothed prob ≥ 0.50? ────── Yes ──────────────────┐ │
 								              │                                                           │ │
 								              └── No: Stage 4 Confidence Cascade (Change E)              │ │
 								                    run int8 MLP on full 69-feat vector                  │ │
 								                    use higher-confidence winner                         │ │
 								                          │                                              │ │
 								                          └────────────────────────────────────────────►│ │
 								                                                                         │ │
 								              ◄────────────────────────────────────────────────────────── │ │
 								              │                                                            ◄─┘
 								              ▼
 								        Stage 5: Confidence rejection (Change C)
 								              max_prob < 0.40?  →  return current_output (hold / GESTURE_NONE)
 								              │
 								              ▼
 								        Majority vote (window=5) + Debounce (count=3)
 								              │
 								              ▼
 								        final gesture → actuation
 								```
 								### Model Weight Footprint
 								| Model | Input Dim | Weights | Memory (float32) |
 								|-------|-----------|---------|-----------------|
 								| LDA_TD | 36 | 5×36 = 180 | 720 B |
 								| LDA_FD | 24 | 5×24 = 120 | 480 B |
 								| LDA_CC | 9 | 5×9 = 45 | 180 B |
 								| Meta-LDA | 15 | 5×15 = 75 | 300 B |
 								| int8 MLP [69→32→16→5] | 69 | ~2,900 | ~2.9 KB int8 |
 								| **Total** | | | **~4.6 KB** |
 								All model weights fit comfortably in internal SRAM.
 								---
 								## 6. Compute Budget for Full Stack
 								| Stage | Cost | Cumulative |
 								|-------|------|-----------|
 								| Feature extraction (69 feat, 128-pt FFT ×3) | 1,200 µs | 1,200 µs |
 								| NVS calibration apply | 10 µs | 1,210 µs |
 								| Activity gate (RMS check) | 5 µs | 1,215 µs |
 								| LDA_TD (36 feat × 5 classes) | 50 µs | 1,265 µs |
 								| LDA_FD (24 feat × 5 classes) | 35 µs | 1,300 µs |
 								| LDA_CC (9 feat × 5 classes) | 15 µs | 1,315 µs |
 								| Meta-LDA (15 feat × 5 classes) | 10 µs | 1,325 µs |
 								| EMA + confidence check | 10 µs | 1,335 µs |
 								| int8 MLP (worst case, ~30% of hops) | 250 µs | 1,585 µs |
 								| Vote + debounce | 20 µs | 1,605 µs |
 								| **Worst-case total** | **1,760 µs** | **7% of 25ms budget** |
 								---
 								## 7. Why This Architecture Works for 3-Channel EMG
 								Three channels means limited spatial information. The ensemble compensates by extracting
 								**maximum diversity from the temporal and spectral dimensions**:
 								- **LDA_TD** specializes in muscle activation *intensity and dynamics* (how hard and fast is each muscle firing)
 								- **LDA_FD** specializes in muscle activation *frequency content* (motor unit recruitment patterns — slow vs. fast twitch fibres fire at different frequencies)
 								- **LDA_CC** specializes in *inter-muscle coordination* (which muscles co-activate — the spatial "fingerprint" of each gesture)
 								These three signal aspects are partially uncorrelated. A gesture that confuses LDA_TD (similar amplitude patterns) may be distinguishable by LDA_FD (different frequency recruitment) or LDA_CC (different co-activation pattern). The meta-LDA learns which specialist to trust for each gesture boundary.
 								The int8 MLP fallback handles the residual nonlinear cases: gesture pairs where the decision boundary is curved in feature space, which LDA (linear boundary only) cannot resolve.
 								---
 								# PART III — GESTURE EXTENSIBILITY
 								## 8. What Changes When Adding or Removing a Gesture
 								The system is designed for extensibility. Adding a gesture requires **3 firmware lines and a retrain**.
 								### What Changes Automatically (No Manual Code Edits)
 								| Component | How it adapts |
 								|-----------|--------------|
 								| `MODEL_NUM_CLASSES` in `model_weights.h` | Auto-computed from training data label count |
 								| LDA weight array dimensions | `[MODEL_NUM_CLASSES][MODEL_NUM_FEATURES]` — regenerated by `export_to_header()` |
 								| `MODEL_CLASS_NAMES` array | Regenerated by `export_to_header()` |
 								| All ensemble LDA weight arrays | Regenerated by `export_ensemble_header()` (Change 7) |
 								| int8 MLP output layer | Retrained with new class count; re-exported to TFLite |
 								| Meta-LDA input/output dims | `META_NUM_INPUTS = 3 × MODEL_NUM_CLASSES` — auto from Python |
 								### What Requires Manual Code Changes
 								**Python side** (`learning_data_collection.py`):
 								```python
 								# 1. Add gesture name to the gesture list (1 line)
 								# Find where GESTURES or similar list is defined (near constants block ~line 49)
 								GESTURES = ['fist', 'hook_em', 'open', 'rest', 'thumbs_up', 'wrist_flex']  # example
 								```
 								**Firmware — `config.h`** (1 line per gesture):
 								```c
 								// Add enum value
 								typedef enum {
 								    GESTURE_NONE     = 0,
 								    GESTURE_REST     = 1,
 								    GESTURE_FIST     = 2,
 								    GESTURE_OPEN     = 3,
 								    GESTURE_HOOK_EM  = 4,
 								    GESTURE_THUMBS_UP = 5,
 								    GESTURE_WRIST_FLEX = 6,  // ← add this line
 								} gesture_t;
 								```
 								**Firmware — `inference.c`** `inference_get_gesture_enum()` (2–3 lines per gesture):
 								```c
 								if (strcmp(name, "wrist_flex") == 0 || strcmp(name, "WRIST_FLEX") == 0)
 								    return GESTURE_WRIST_FLEX;
 								```
 								**Firmware — `gestures.c`** (2 changes — these are easy to miss):
 								```c
 								// 1. Add to gesture_names[] static array — index MUST match gesture_t enum value:
 								static const char *gesture_names[GESTURE_COUNT] = {
 								    "NONE",       // GESTURE_NONE = 0
 								    "REST",       // GESTURE_REST = 1
 								    "FIST",       // GESTURE_FIST = 2
 								    "OPEN",       // GESTURE_OPEN = 3
 								    "HOOK_EM",    // GESTURE_HOOK_EM = 4
 								    "THUMBS_UP",  // GESTURE_THUMBS_UP = 5
 								    "WRIST_FLEX", // GESTURE_WRIST_FLEX = 6  ← add here
 								};
 								// 2. Add case to gestures_execute() switch statement:
 								case GESTURE_WRIST_FLEX:
 								    gesture_wrist_flex();   // implement the actuation function
 								    break;
 								```
 								**Critical**: `GESTURE_COUNT` at the end of the `gesture_t` enum in `config.h` is used as the
 								array size for `gesture_names[]`. It updates automatically when new enum values are added before
 								it. Both `gesture_names[GESTURE_COUNT]` and the switch statement must be kept in sync with
 								`GESTURE_COUNT`. Mismatch causes a bounds-overrun or silent misclassification.
 								### Complete Workflow for Adding a Gesture
 								```
 . Python: add gesture string to GESTURES list in learning_data_collection.py (1 line)
 . Data: collect ≥10 sessions × ≥30 reps of new gesture
 								   (follow Change 2 protocol: vary electrode placement between sessions)
 . Train: python learning_data_collection.py → option 3
 								         OR: python train_ensemble.py (after Change 7 is implemented)
 . Export: export_to_header() OR export_ensemble_header()
 								   → overwrites model_weights.h / model_weights_ensemble.h with new class count
 . config.h: add enum value before GESTURE_COUNT (1 line):
 								       GESTURE_WRIST_FLEX = 6,   // ← insert before GESTURE_COUNT
 								       GESTURE_COUNT             // stays last — auto-counts
 . inference.c: add string mapping in inference_get_gesture_enum() (2 lines)
 . gestures.c: add name to gesture_names[] array at correct index (1 line)
 . gestures.c: add case to gestures_execute() switch statement (3 lines)
 . Implement actuation function for new gesture (servo angles)
 . Reflash and validate: pio run -t upload
 								```
 								**Exact files touched per new gesture (summary):**
 								| File | What to change |
 								|------|---------------|
 								| `learning_data_collection.py` | Add string to GESTURES list |
 								| `config/config.h` | Add enum value before `GESTURE_COUNT` |
 								| `core/inference.c` | Add `strcmp` case in `inference_get_gesture_enum()` |
 								| `core/gestures.c` | Add to `gesture_names[]` array + add switch case |
 								| `core/gestures.c` | Implement `gesture_<name>()` function with servo angles |
 								| `core/model_weights.h` | Auto-generated — do not edit manually |
 								### Removing a Gesture
 								Removing is the same process in reverse, with one additional step: filter the HDF5 training
 								data to exclude sessions that contain the removed gesture's label. The simplest approach is
 								to pass a label whitelist to `load_all_for_training()`:
 								```python
 								# Proposed addition to load_all_for_training() — add include_labels parameter
 								X, y, trial_ids, session_indices, label_names, sessions = \
 								    storage.load_all_for_training(include_labels=['fist', 'open', 'rest', 'thumbs_up'])
 								    # hook_em removed — existing session files are not modified
 								```
 								---
 								## 9. Practical Limits of 3-Channel EMG
 								This is the most important constraint for gesture count:
 								| Gesture Count | Expected Accuracy | Notes |
 								|--------------|-------------------|-------|
 								| 3–5 gestures | >90% achievable | Current baseline target |
 								| 6–8 gestures | 80–90% achievable | Requires richer features + ensemble |
 								| 9–12 gestures | 65–80% achievable | Diminishing returns; some pairs will be confused |
 								| 13+ gestures | <65% | Surface EMG with 3 channels cannot reliably separate this many |
 								**Why 3 channels limits gesture count**: Surface EMG captures the summed electrical activity of
 								many motor units under each electrode. With only 3 spatial locations, gestures that recruit
 								overlapping muscle groups (e.g., all finger-flexion gestures recruit FCR) produce similar
 								signals. The frequency and coordination features from Change 1 help, but there's a hard
 								information-theoretic limit imposed by channel count.
 								**Rule of thumb**: aim for ≤8 gestures with the current 3-channel setup. For more, add the
 								bicep channel (ch3, currently excluded) to get 4 channels — see Section 10.
 								---
 								## 10. Specific Gesture Considerations
 								### Wrist Flexion / Extension
 								- **Feasibility**: High — FCR (ch0) activates strongly for flexion; extensor group (ch2) for extension
 								- **Differentiation from finger gestures**: frequency content differs (wrist involves slower motor units)
 								- **Recommendation**: Add these before wrist rotation — more reliable with surface EMG
 								### Wrist Rotation (Supination / Pronation)
 								- **Feasibility**: Medium — the primary supinator is a deep muscle; surface electrodes capture it weakly
 								- **Key helper**: the bicep activates strongly during supination → **include ch3** (`HAND_CHANNELS = [0, 1, 2, 3]`)
 								- **Code change for 4 channels**: Python: `HAND_CHANNELS = [0, 1, 2, 3]`; firmware: `HAND_NUM_CHANNELS` auto-updates from the exported header since `MODEL_NUM_FEATURES` is recalculated
 								- **Caveat**: pronation vs. rest may be harder to distinguish than supination vs. rest
 								### Pinch / Precision Grasp
 								- **Feasibility**: Medium — involves intrinsic hand muscles poorly captured by forearm electrodes
 								- Likely confused with open hand depending on electrode placement
 								- Collect with careful placement; validate cross-session accuracy before relying on it
 								### Including ch3 (Bicep) for Wrist Gestures
 								To include the bicep channel in the hand gesture classifier:
 								```python
 								# learning_data_collection.py — change this constant
 								HAND_CHANNELS = [0, 1, 2, 3]  # was [0, 1, 2] — add bicep channel
 								```
 								Feature count: 4 channels × 20 per-channel features + 10 cross-channel covariances + 6 correlations = **96 total features**.
 								The ensemble architecture handles this automatically — specialist LDA weight dimensions
 								recalculate at training time.
 								---
 								# PART IV — CHANGE REFERENCE
 								## 11. Change Classification Matrix
 								| Change | Category | Priority | Files | ESP32 Reflash? | Retrain? | Risk |
 								|--------|----------|----------|-------|----------------|----------|------|
 								| **C** | Firmware | **Tier 1** | inference.c | ✓ | No | **Very Low** |
 								| **B** | Firmware | **Tier 1** | inference.c / filter.c | ✓ | No | Low |
 								| **A** | Firmware | **Tier 1** | adc_sampling.c | ✓ | No | Medium |
 								| **0** | Python | **Tier 1** | learning_data_collection.py | No | ✓ | Low |
 								| **1** | Python+C | **Tier 2** | learning_data_collection.py + inference.c | ✓ after | ✓ | Medium |
 								| **D** | Firmware | **Tier 2** | calibration.c/.h | ✓ | No | Medium |
 								| **2** | Protocol | **Tier 2** | None | No | ✓ new data | None |
 								| **3** | Python | **Tier 2** | learning_data_collection.py | No | ✓ | Low |
 								| **E** | Python+FW | **Tier 3** | train_mlp_tflite.py + firmware | ✓ | ✓ | High |
 								| **4** | Python+C | **Tier 3** | learning_data_collection.py + inference.c | ✓ if enabled | ✓ | Low |
 								| **5** | Python | **Tier 3** | learning_data_collection.py | No | No | None |
 								| **6** | Python | **Tier 3** | learning_data_collection.py | No | ✓ | Low |
 								| **7** | Python | **Tier 3** | new: train_ensemble.py | No | ✓ | Medium |
 								| **F** | Firmware | **Tier 3** | new: inference_ensemble.c | ✓ | No (needs 7 first) | Medium |
 								**Recommended implementation order**: C → B → A → 0 → 1 → D → 2 → 3 → 5 (benchmark) → 7+F → E
 								---
 								# PART V — FIRMWARE CHANGES
 								## Change A — DMA-Driven ADC Sampling (Migration from `adc_oneshot` to `adc_continuous`)
 								**Priority**: Tier 1
 								**Current driver**: `adc_oneshot_read()` polling in `drivers/emg_sensor.c`. Timing is
 								controlled by `vTaskDelay(1)` in `run_inference_loop()` — subject to FreeRTOS scheduler
 								jitter of ±0.5–1ms, which corrupts frequency-domain features and ADC burst grouping.
 								**Why**: `adc_continuous` runs entirely in hardware DMA. Sample-to-sample jitter drops from
 								±1ms to <10µs. CPU overhead between samples is zero. Required for frequency features (Change 1).
 								**Effort**: 2–4 hours (replace `emg_sensor_read()` internals; keep public API the same)
 								### ESP-IDF ADC Continuous API
 								```c
 								// --- Initialize (call once at startup) ---
 								adc_continuous_handle_t adc_handle = NULL;
 								adc_continuous_handle_cfg_t adc_cfg = {
 								    .max_store_buf_size = 4096,    // PSRAM ring buffer size (bytes)
 								    .conv_frame_size    = 256,     // bytes per conversion frame
 								};
 								adc_continuous_new_handle(&adc_cfg, &adc_handle);
 								// Actual hardware channel mapping (from emg_sensor.c):
 								// ch0 = ADC_CHANNEL_1 / GPIO 2  (Forearm Belly / FCR)
 								// ch1 = ADC_CHANNEL_2 / GPIO 3  (Forearm Extensors)
 								// ch2 = ADC_CHANNEL_8 / GPIO 9  (Forearm Contractors / FCU)
 								// ch3 = ADC_CHANNEL_9 / GPIO 10 (Bicep — independent subsystem)
 								adc_digi_pattern_config_t chan_cfg[4] = {
 								    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_1, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
 								    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_2, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
 								    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_8, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
 								    {.atten = ADC_ATTEN_DB_12, .channel = ADC_CHANNEL_9, .unit = ADC_UNIT_1, .bit_width = ADC_BITWIDTH_12},
 								};
 								adc_continuous_config_t cont_cfg = {
 								    .sample_freq_hz = 4000,        // 4 channels × 1000 Hz = 4000 total samples/sec
 								    .conv_mode      = ADC_CONV_SINGLE_UNIT_1,
 								    .format         = ADC_DIGI_OUTPUT_FORMAT_TYPE2,
 								    .pattern_num    = 4,
 								    .adc_pattern    = chan_cfg,
 								};
 								adc_continuous_config(adc_handle, &cont_cfg);
 								// --- ISR callback (fires each frame) ---
 								static SemaphoreHandle_t s_adc_sem;
 								static bool IRAM_ATTR adc_conv_done_cb(
 								        adc_continuous_handle_t handle,
 								        const adc_continuous_evt_data_t *edata, void *user_data) {
 								    BaseType_t hp_woken = pdFALSE;
 								    xSemaphoreGiveFromISR(s_adc_sem, &hp_woken);
 								    return hp_woken == pdTRUE;
 								}
 								adc_continuous_evt_cbs_t cbs = { .on_conv_done = adc_conv_done_cb };
 								adc_continuous_register_event_callbacks(adc_handle, &cbs, NULL);
 								adc_continuous_start(adc_handle);
 								// --- ADC calibration (apply per sample) ---
 								adc_cali_handle_t cali_handle;
 								adc_cali_curve_fitting_config_t cali_cfg = {
 								    .unit_id  = ADC_UNIT_1,
 								    .atten    = ADC_ATTEN_DB_12,   // matches ADC_ATTEN_DB_12 used in current emg_sensor.c
 								    .bitwidth = ADC_BITWIDTH_12,
 								};
 								adc_cali_create_scheme_curve_fitting(&cali_cfg, &cali_handle);
 								// --- Sampling task (pin to Core 0) ---
 								void adc_sampling_task(void *arg) {
 								    uint8_t result_buf[256];
 								    uint32_t out_len = 0;
 								    while (1) {
 								        xSemaphoreTake(s_adc_sem, portMAX_DELAY);
 								        adc_continuous_read(adc_handle, result_buf, sizeof(result_buf), &out_len, 0);
 								        // Parse: each entry is adc_digi_output_data_t
 								        // Apply adc_cali_raw_to_voltage() for each sample
 								        // Apply IIR filter (Change B) → post to inference ring buffer
 								    }
 								}
 								```
 								**Verify**: log consecutive sample timestamps via `esp_timer_get_time()`; spacing should be 1.0ms ± 0.05ms.
 								---
 								## Change B — IIR Biquad Bandpass Filter
 								**Priority**: Tier 1
 								**Why**: MyoWare analogue filters are not tunable. Software IIR removes powerline interference
 								(50/60 Hz), sub-20 Hz motion artifact, and >500 Hz noise — all of which inflate ZC, WL, and
 								other features computed at rest.
 								**Effort**: 2 hours
 								### Step 1 — Compute Coefficients in Python (one-time, offline)
 								```python
 								from scipy.signal import butter
 								import numpy as np
 								fs = 1000.0
 								sos = butter(N=2, Wn=[20.0, 500.0], btype='bandpass', fs=fs, output='sos')
 								# sos[i] = [b0, b1, b2, a0, a1, a2]
 								# esp-dsp Direct Form II convention: coeffs = [b0, b1, b2, -a1, -a2]
 								for i, s in enumerate(sos):
 								    b0, b1, b2, a0, a1, a2 = s
 								    print(f"Section {i}: {b0:.8f}f, {b1:.8f}f, {b2:.8f}f, {-a1:.8f}f, {-a2:.8f}f")
 								# Run this and paste the printed values into the C constants below
 								```
 								### Step 2 — Add to inference.c (after includes, before `// --- State ---`)
 								```c
 								#include "dsps_biquad.h"
 								// 2nd-order Butterworth bandpass 20–500 Hz @ 1000 Hz
 								// Coefficients: [b0, b1, b2, -a1, -a2] — Direct Form II, esp-dsp sign convention
 								// Regenerate with: scipy.signal.butter(N=2, Wn=[20,500], btype='bandpass', fs=1000, output='sos')
 								static const float BIQUAD_HP_COEFFS[5] = { /* paste section 0 output here */ };
 								static const float BIQUAD_LP_COEFFS[5] = { /* paste section 1 output here */ };
 								// Filter delay state: 3 channels × 2 stages × 2 delay elements = 12 floats (48 bytes)
 								static float biquad_hp_w[HAND_NUM_CHANNELS][2];
 								static float biquad_lp_w[HAND_NUM_CHANNELS][2];
 								```
 								Add to `inference_init()`:
 								```c
 								    memset(biquad_hp_w, 0, sizeof(biquad_hp_w));
 								    memset(biquad_lp_w, 0, sizeof(biquad_lp_w));
 								```
 								### Step 3 — Apply Per Sample (called before writing to window_buffer)
 								```c
 								// Apply to each channel before posting to the window buffer.
 								// Must be called IN ORDER for each sample (IIR has memory across calls).
 								static float IRAM_ATTR apply_bandpass(int ch, float raw) {
 								    float hp_out, lp_out;
 								    dsps_biquad_f32(&raw,   &hp_out, 1, (float *)BIQUAD_HP_COEFFS, biquad_hp_w[ch]);
 								    dsps_biquad_f32(&hp_out, &lp_out, 1, (float *)BIQUAD_LP_COEFFS, biquad_lp_w[ch]);
 								    return lp_out;
 								}
 								```
 								**Note**: `window_buffer` stores `uint16_t` — change to `float` when adding this filter, so
 								filtered values are stored directly without lossy integer round-trip.
 								**Verify**: log ZC count at rest before and after — filtered ZC should be substantially lower
 								(less spurious noise crossings).
 								---
 								## Change C — Confidence Rejection
 								**Priority**: Tier 1 — **implement this first, lowest risk of all changes**
 								**Why**: Without a rejection threshold, ambiguous EMG (rest-to-gesture transition,
 								mid-gesture fatigue, electrode lift) always produces a false actuation.
 								**Effort**: 15 minutes
 								### Step 1 — Add Constant (top of inference.c with other constants)
 								```c
 								#define CONFIDENCE_THRESHOLD 0.40f  // Reject when max smoothed prob < this.
 								                                    // Meta paper uses 0.35; 0.40 adds prosthetic safety margin.
 								                                    // Tune: lower to 0.35 if real gestures are being rejected.
 								```
 								### Step 2 — Insert After EMA Block in `inference_predict()` (after line 214)
 								```c
 								  // Confidence rejection: if the peak smoothed probability is below threshold,
 								  // hold the last confirmed output rather than outputting an uncertain prediction.
 								  // Prevents false actuations during gesture transitions and electrode artifacts.
 								  if (max_smoothed_prob < CONFIDENCE_THRESHOLD) {
 								    *confidence = max_smoothed_prob;
 								    return current_output;  // -1 (GESTURE_NONE) until first confident prediction
 								  }
 								```
 								**Verify**: arm at complete rest → confirm output stays at GESTURE_NONE and confidence logs
 								below 0.40. Deliberate fist → confidence rises above 0.40 within 1–3 inference cycles.
 								---
 								## Change D — On-Device NVS Calibration
 								**Priority**: Tier 2
 								**Why**: Python `CalibrationTransform` only runs during training. On-device NVS calibration
 								lets the ESP32 recalibrate z-score normalization at startup (3 seconds of REST) without
 								retraining — solving placement drift and day-to-day impedance variation.
 								**Effort**: 3–4 hours
 								### New Files
 								```
 								EMG_Arm/src/core/calibration.h
 								EMG_Arm/src/core/calibration.c
 								```
 								### calibration.h
 								```c
 								#pragma once
 								#include <stdbool.h>
 								#include "config/config.h"
 								#define CALIB_MAX_FEATURES 96  // supports up to 4-channel expansion
 								bool calibration_init(void);          // load from NVS at startup
 								void calibration_apply(float *feat);  // z-score in-place; no-op if not calibrated
 								bool calibration_update(const float X[][CALIB_MAX_FEATURES], int n_windows, int n_feat);
 								void calibration_reset(void);
 								bool calibration_is_valid(void);
 								```
 								### calibration.c
 								```c
 								#include "calibration.h"
 								#include "nvs_flash.h"
 								#include "nvs.h"
 								#include <math.h>
 								#include <string.h>
 								#include <stdio.h>
 								#define NVS_NAMESPACE "emg_calib"
 								#define NVS_KEY_MEAN  "feat_mean"
 								#define NVS_KEY_STD   "feat_std"
 								#define NVS_KEY_NFEAT "n_feat"
 								#define NVS_KEY_VALID "calib_ok"
 								static float s_mean[CALIB_MAX_FEATURES];
 								static float s_std[CALIB_MAX_FEATURES];
 								static int   s_n_feat = 0;
 								static bool  s_valid  = false;
 								bool calibration_init(void) {
 								    esp_err_t err = nvs_flash_init();
 								    if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
 								        nvs_flash_erase();
 								        nvs_flash_init();
 								    }
 								    nvs_handle_t h;
 								    if (nvs_open(NVS_NAMESPACE, NVS_READONLY, &h) != ESP_OK) return false;
 								    uint8_t valid = 0;
 								    size_t mean_sz = sizeof(s_mean), std_sz = sizeof(s_std);
 								    bool ok = (nvs_get_u8(h, NVS_KEY_VALID, &valid)         == ESP_OK) && (valid == 1) &&
 								              (nvs_get_i32(h, NVS_KEY_NFEAT, (int32_t*)&s_n_feat) == ESP_OK) &&
 								              (nvs_get_blob(h, NVS_KEY_MEAN, s_mean, &mean_sz) == ESP_OK) &&
 								              (nvs_get_blob(h, NVS_KEY_STD,  s_std,  &std_sz)  == ESP_OK);
 								    nvs_close(h);
 								    s_valid = ok;
 								    printf("[Calib] %s (%d features)\n", ok ? "Loaded from NVS" : "Not found — identity", s_n_feat);
 								    return ok;
 								}
 								void calibration_apply(float *feat) {
 								    if (!s_valid) return;
 								    for (int i = 0; i < s_n_feat; i++)
 								        feat[i] = (feat[i] - s_mean[i]) / s_std[i];
 								}
 								bool calibration_update(const float X[][CALIB_MAX_FEATURES], int n_windows, int n_feat) {
 								    if (n_windows < 10 || n_feat > CALIB_MAX_FEATURES) return false;
 								    s_n_feat = n_feat;
 								    memset(s_mean, 0, sizeof(s_mean));
 								    for (int w = 0; w < n_windows; w++)
 								        for (int f = 0; f < n_feat; f++)
 								            s_mean[f] += X[w][f];
 								    for (int f = 0; f < n_feat; f++) s_mean[f] /= n_windows;
 								    memset(s_std, 0, sizeof(s_std));
 								    for (int w = 0; w < n_windows; w++)
 								        for (int f = 0; f < n_feat; f++) {
 								            float d = X[w][f] - s_mean[f];
 								            s_std[f] += d * d;
 								        }
 								    for (int f = 0; f < n_feat; f++) {
 								        s_std[f] = sqrtf(s_std[f] / n_windows);
 								        if (s_std[f] < 1e-6f) s_std[f] = 1e-6f;
 								    }
 								    nvs_handle_t h;
 								    if (nvs_open(NVS_NAMESPACE, NVS_READWRITE, &h) != ESP_OK) return false;
 								    nvs_set_blob(h, NVS_KEY_MEAN, s_mean, sizeof(s_mean));
 								    nvs_set_blob(h, NVS_KEY_STD,  s_std,  sizeof(s_std));
 								    nvs_set_i32(h,  NVS_KEY_NFEAT, n_feat);
 								    nvs_set_u8(h,   NVS_KEY_VALID, 1);
 								    nvs_commit(h);
 								    nvs_close(h);
 								    s_valid = true;
 								    printf("[Calib] Updated from %d REST windows, %d features\n", n_windows, n_feat);
 								    return true;
 								}
 								```
 								### Integration in inference.c
 								In `inference_predict()`, after `compute_features(features)`, before LDA:
 								```c
 								    calibration_apply(features);  // z-score using NVS-stored mean/std
 								```
 								### Startup Flow
 								```c
 								// In main application startup sequence:
 								calibration_init();  // load from NVS; no-op if not present yet
 								// When user triggers recalibration (button press or serial command):
 								// Collect ~120 REST windows (~3 seconds at 25ms hop)
 								// Call calibration_update(rest_feature_buffer, 120, MODEL_NUM_FEATURES)
 								```
 								---
 								## Change E — int8 MLP via TFLite Micro
 								**Priority**: Tier 3 — implement after Tier 1+2 changes and benchmark (Change 5) shows LDA plateauing
 								**Why**: LDA finds only linear decision boundaries. A 2-layer int8 MLP adds nonlinear
 								boundaries for gesture pairs that overlap in feature space.
 								**Effort**: 4–6 hours
 								### Python Training (new file: `train_mlp_tflite.py`)
 								```python
 								"""
 								Train int8 MLP for ESP32-S3 deployment via TFLite Micro.
 								Run AFTER Change 0 (label shift) + Change 1 (expanded features).
 								"""
 								import numpy as np
 								import tensorflow as tf
 								from pathlib import Path
 								import sys
 								sys.path.insert(0, str(Path(__file__).parent))
 								from learning_data_collection import SessionStorage, EMGFeatureExtractor, HAND_CHANNELS
 								storage = SessionStorage()
 								X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()
 								extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
 								X = extractor.extract_features_batch(X_raw).astype(np.float32)
 								from sklearn.preprocessing import StandardScaler
 								scaler = StandardScaler()
 								X = scaler.fit_transform(X)
 								n_feat, n_cls = X.shape[1], len(np.unique(y))
 								model = tf.keras.Sequential([
 								    tf.keras.layers.Input(shape=(n_feat,)),
 								    tf.keras.layers.Dense(32, activation='relu'),
 								    tf.keras.layers.Dropout(0.2),
 								    tf.keras.layers.Dense(16, activation='relu'),
 								    tf.keras.layers.Dense(n_cls, activation='softmax'),
 								])
 								model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
 								model.fit(X, y, epochs=150, batch_size=64, validation_split=0.1, verbose=1)
 								def representative_dataset():
 								    for i in range(0, len(X), 10):
 								        yield [X[i:i+1]]
 								converter = tf.lite.TFLiteConverter.from_keras_model(model)
 								converter.optimizations = [tf.lite.Optimize.DEFAULT]
 								converter.representative_dataset = representative_dataset
 								converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
 								converter.inference_input_type  = tf.int8
 								converter.inference_output_type = tf.int8
 								tflite_model = converter.convert()
 								out = Path('EMG_Arm/src/core/emg_model_data.cc')
 								with open(out, 'w') as f:
 								    f.write('#include "emg_model_data.h"\n')
 								    f.write(f'const int g_model_len = {len(tflite_model)};\n')
 								    f.write('const unsigned char g_model[] = {\n  ')
 								    f.write(', '.join(f'0x{b:02x}' for b in tflite_model))
 								    f.write('\n};\n')
 								print(f"Wrote {out} ({len(tflite_model)} bytes)")
 								```
 								### Firmware (inference_mlp.cc)
 								```cpp
 								#include "inference_mlp.h"
 								#include "emg_model_data.h"
 								#include "tensorflow/lite/micro/micro_interpreter.h"
 								#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
 								#include "tensorflow/lite/schema/schema_generated.h"
 								static uint8_t tensor_arena[48 * 1024];  // 48 KB — tune down if memory is tight
 								static tflite::MicroInterpreter *interpreter = nullptr;
 								static TfLiteTensor *input = nullptr, *output = nullptr;
 								void inference_mlp_init(void) {
 								    const tflite::Model *model = tflite::GetModel(g_model);
 								    static tflite::MicroMutableOpResolver<4> resolver;
 								    resolver.AddFullyConnected();
 								    resolver.AddRelu();
 								    resolver.AddSoftmax();
 								    resolver.AddDequantize();
 								    static tflite::MicroInterpreter interp(model, resolver, tensor_arena, sizeof(tensor_arena));
 								    interpreter = &interp;
 								    interpreter->AllocateTensors();
 								    input  = interpreter->input(0);
 								    output = interpreter->output(0);
 								}
 								int inference_mlp_predict(const float *features, int n_feat, float *conf_out) {
 								    float iscale = input->params.scale;
 								    int   izp    = input->params.zero_point;
 								    for (int i = 0; i < n_feat; i++) {
 								        int q = (int)roundf(features[i] / iscale) + izp;
 								        input->data.int8[i] = (int8_t)(q < -128 ? -128 : q > 127 ? 127 : q);
 								    }
 								    interpreter->Invoke();
 								    float oscale = output->params.scale;
 								    int   ozp    = output->params.zero_point;
 								    float max_p = -1e9f;
 								    int max_c = 0;
 								    for (int c = 0; c < MODEL_NUM_CLASSES; c++) {
 								        float p = (output->data.int8[c] - ozp) * oscale;
 								        if (p > max_p) { max_p = p; max_c = c; }
 								    }
 								    *conf_out = max_p;
 								    return max_c;
 								}
 								```
 								**platformio.ini addition**:
 								```ini
 								lib_deps =
 								    tensorflow/tflite-micro
 								```
 								---
 								## Change F — Ensemble Inference Pipeline
 								**Priority**: Tier 3 (requires Change 1 features + Change 7 training + Change E MLP)
 								**Why**: This is the full recommended architecture from Part II.
 								**Effort**: 3–4 hours firmware (after Python ensemble is trained and exported)
 								### New Files
 								```
 								EMG_Arm/src/core/inference_ensemble.c
 								EMG_Arm/src/core/inference_ensemble.h
 								EMG_Arm/src/core/model_weights_ensemble.h   (generated by Change 7 Python script)
 								```
 								### inference_ensemble.h
 								```c
 								#pragma once
 								#include <stdbool.h>
 								void inference_ensemble_init(void);
 								int  inference_ensemble_predict(float *confidence);
 								```
 								### inference_ensemble.c
 								```c
 								#include "inference_ensemble.h"
 								#include "inference.h"          // for compute_features(), calibration_apply()
 								#include "inference_mlp.h"      // for inference_mlp_predict()
 								#include "model_weights_ensemble.h"
 								#include "config/config.h"
 								#include "dsps_dotprod.h"
 								#include <math.h>
 								#include <string.h>
 								#include <stdio.h>
 								#define ENSEMBLE_EMA_ALPHA      0.70f
 								#define ENSEMBLE_CONF_THRESHOLD 0.50f  // below this: escalate to MLP fallback
 								#define REJECT_THRESHOLD        0.40f  // below this even after MLP: hold output
 								#define REST_ACTIVITY_THRESHOLD 0.05f  // total_rms below this → skip inference, return REST
 								// EMA state
 								static float s_smoothed[MODEL_NUM_CLASSES];
 								// Vote + debounce (reuse existing pattern from inference.c)
 								static int s_vote_history[5];
 								static int s_vote_head = 0;
 								static int s_current_output = -1;
 								static int s_pending_output = -1;
 								static int s_pending_count  = 0;
 								// --- Generic LDA softmax predict ---
 								// weights: [n_classes][n_feat], intercepts: [n_classes]
 								// proba_out: [n_classes] — caller-provided output
 								static void lda_softmax(const float *feat, int n_feat,
 								                         const float *weights_flat, const float *intercepts,
 								                         int n_classes, float *proba_out) {
 								    float raw[MODEL_NUM_CLASSES];
 								    float max_raw = -1e9f, sum_exp = 0.0f;
 								    for (int c = 0; c < n_classes; c++) {
 								        raw[c] = intercepts[c];
 								        // dsps_dotprod_f32 requires 4-byte aligned arrays and length multiple of 4;
 								        // for safety use plain loop — compiler will auto-vectorize with -O2
 								        const float *w = weights_flat + c * n_feat;
 								        for (int f = 0; f < n_feat; f++) raw[c] += feat[f] * w[f];
 								        if (raw[c] > max_raw) max_raw = raw[c];
 								    }
 								    for (int c = 0; c < n_classes; c++) {
 								        proba_out[c] = expf(raw[c] - max_raw);
 								        sum_exp += proba_out[c];
 								    }
 								    for (int c = 0; c < n_classes; c++) proba_out[c] /= sum_exp;
 								}
 								void inference_ensemble_init(void) {
 								    for (int c = 0; c < MODEL_NUM_CLASSES; c++)
 								        s_smoothed[c] = 1.0f / MODEL_NUM_CLASSES;
 								    for (int i = 0; i < 5; i++) s_vote_history[i] = -1;
 								    s_vote_head = 0;
 								    s_current_output = -1;
 								    s_pending_output = -1;
 								    s_pending_count  = 0;
 								}
 								int inference_ensemble_predict(float *confidence) {
 								    // 1. Extract features (shared with single-model path)
 								    float features[MODEL_NUM_FEATURES];
 								    compute_features(features);
 								    calibration_apply(features);
 								    // 2. Activity gate — skip inference during obvious REST
 								    float total_rms_sq = 0.0f;
 								    for (int ch = 0; ch < HAND_NUM_CHANNELS; ch++) {
 								        float r = features[ch * ENSEMBLE_PER_CH_FEATURES]; // RMS is index 0 per channel
 								        total_rms_sq += r * r;
 								    }
 								    if (sqrtf(total_rms_sq) < REST_ACTIVITY_THRESHOLD) {
 								        *confidence = 1.0f;
 								        return GESTURE_REST;
 								    }
 								    // 3. Specialist LDAs
 								    float prob_td[MODEL_NUM_CLASSES];
 								    float prob_fd[MODEL_NUM_CLASSES];
 								    float prob_cc[MODEL_NUM_CLASSES];
 								    lda_softmax(features + TD_FEAT_OFFSET, TD_NUM_FEATURES,
 								                (const float *)LDA_TD_WEIGHTS, LDA_TD_INTERCEPTS,
 								                MODEL_NUM_CLASSES, prob_td);
 								    lda_softmax(features + FD_FEAT_OFFSET, FD_NUM_FEATURES,
 								                (const float *)LDA_FD_WEIGHTS, LDA_FD_INTERCEPTS,
 								                MODEL_NUM_CLASSES, prob_fd);
 								    lda_softmax(features + CC_FEAT_OFFSET, CC_NUM_FEATURES,
 								                (const float *)LDA_CC_WEIGHTS, LDA_CC_INTERCEPTS,
 								                MODEL_NUM_CLASSES, prob_cc);
 								    // 4. Meta-LDA stacker
 								    float meta_in[META_NUM_INPUTS];  // = 3 * MODEL_NUM_CLASSES
 								    memcpy(meta_in,                        prob_td, MODEL_NUM_CLASSES * sizeof(float));
 								    memcpy(meta_in +   MODEL_NUM_CLASSES,  prob_fd, MODEL_NUM_CLASSES * sizeof(float));
 								    memcpy(meta_in + 2*MODEL_NUM_CLASSES,  prob_cc, MODEL_NUM_CLASSES * sizeof(float));
 								    float meta_probs[MODEL_NUM_CLASSES];
 								    lda_softmax(meta_in, META_NUM_INPUTS,
 								                (const float *)META_LDA_WEIGHTS, META_LDA_INTERCEPTS,
 								                MODEL_NUM_CLASSES, meta_probs);
 								    // 5. EMA smoothing on meta output
 								    float max_smooth = 0.0f;
 								    int winner = 0;
 								    for (int c = 0; c < MODEL_NUM_CLASSES; c++) {
 								        s_smoothed[c] = ENSEMBLE_EMA_ALPHA * s_smoothed[c] +
 								                        (1.0f - ENSEMBLE_EMA_ALPHA) * meta_probs[c];
 								        if (s_smoothed[c] > max_smooth) { max_smooth = s_smoothed[c]; winner = c; }
 								    }
 								    // 6. Confidence cascade: escalate to MLP if meta-LDA is uncertain
 								    if (max_smooth < ENSEMBLE_CONF_THRESHOLD) {
 								        float mlp_conf = 0.0f;
 								        int mlp_winner = inference_mlp_predict(features, MODEL_NUM_FEATURES, &mlp_conf);
 								        if (mlp_conf > max_smooth) { winner = mlp_winner; max_smooth = mlp_conf; }
 								    }
 								    // 7. Reject if still uncertain
 								    if (max_smooth < REJECT_THRESHOLD) {
 								        *confidence = max_smooth;
 								        return s_current_output;
 								    }
 								    *confidence = max_smooth;
 								    // 8. Majority vote (window = 5)
 								    s_vote_history[s_vote_head] = winner;
 								    s_vote_head = (s_vote_head + 1) % 5;
 								    int counts[MODEL_NUM_CLASSES] = {0};
 								    for (int i = 0; i < 5; i++)
 								        if (s_vote_history[i] >= 0) counts[s_vote_history[i]]++;
 								    int majority = 0, majority_cnt = 0;
 								    for (int c = 0; c < MODEL_NUM_CLASSES; c++)
 								        if (counts[c] > majority_cnt) { majority_cnt = counts[c]; majority = c; }
 								    // 9. Debounce (3 consecutive predictions to change output)
 								    int final = s_current_output;
 								    if (s_current_output == -1) {
 								        s_current_output = majority; final = majority;
 								    } else if (majority == s_current_output) {
 								        s_pending_output = majority; s_pending_count = 1;
 								    } else if (majority == s_pending_output) {
 								        if (++s_pending_count >= 3) { s_current_output = majority; final = majority; }
 								    } else {
 								        s_pending_output = majority; s_pending_count = 1;
 								    }
 								    return final;
 								}
 								```
 								### model_weights_ensemble.h Layout (generated by Change 7)
 								```c
 								// Auto-generated by train_ensemble.py — do not edit manually
 								#pragma once
 								#define MODEL_NUM_CLASSES    5       // auto-computed from training data
 								#define MODEL_NUM_FEATURES   69      // total feature count (after Change 1)
 								#define ENSEMBLE_PER_CH_FEATURES 20  // features per channel
 								// Specialist feature subset offsets and sizes
 								#define TD_FEAT_OFFSET  0
 								#define TD_NUM_FEATURES 36   // time-domain: indices 0–11, 20–31, 40–51
 								#define FD_FEAT_OFFSET  12   // NOTE: FD features are interleaved per-channel
 								#define FD_NUM_FEATURES 24   // freq-domain: indices 12–19, 32–39, 52–59
 								#define CC_FEAT_OFFSET  60
 								#define CC_NUM_FEATURES 9    // cross-channel: indices 60–68
 								#define META_NUM_INPUTS (3 * MODEL_NUM_CLASSES)  // = 15
 								// Specialist LDA weights (flat row-major: [n_classes][n_feat])
 								extern const float LDA_TD_WEIGHTS[MODEL_NUM_CLASSES][TD_NUM_FEATURES];
 								extern const float LDA_TD_INTERCEPTS[MODEL_NUM_CLASSES];
 								extern const float LDA_FD_WEIGHTS[MODEL_NUM_CLASSES][FD_NUM_FEATURES];
 								extern const float LDA_FD_INTERCEPTS[MODEL_NUM_CLASSES];
 								extern const float LDA_CC_WEIGHTS[MODEL_NUM_CLASSES][CC_NUM_FEATURES];
 								extern const float LDA_CC_INTERCEPTS[MODEL_NUM_CLASSES];
 								// Meta-LDA weights
 								extern const float META_LDA_WEIGHTS[MODEL_NUM_CLASSES][META_NUM_INPUTS];
 								extern const float META_LDA_INTERCEPTS[MODEL_NUM_CLASSES];
 								// Class names (for inference_get_gesture_enum)
 								extern const char *MODEL_CLASS_NAMES[MODEL_NUM_CLASSES];
 								```
 								**Important note on FD features**: the frequency-domain features are interleaved at indices
 								[12–19] for ch0, [32–39] for ch1, [52–59] for ch2. The `lda_softmax` call for LDA_FD must
 								pass a **gathered** (non-contiguous) sub-vector. The cleanest approach is to gather them into
 								a contiguous buffer before calling lda_softmax:
 								```c
 								// Gather FD features into contiguous buffer before LDA_FD
 								float fd_buf[FD_NUM_FEATURES];
 								for (int ch = 0; ch < HAND_NUM_CHANNELS; ch++)
 								    memcpy(fd_buf + ch*8, features + ch*20 + 12, 8 * sizeof(float));
 								lda_softmax(fd_buf, FD_NUM_FEATURES, ...);
 								```
 								Similarly for TD features. This gather costs <5 µs — negligible.
 								---
 								# PART VI — PYTHON/TRAINING CHANGES
 								## Change 0 — Forward Label Shift
 								**Priority**: Tier 1
 								**Source**: Meta Nature 2025, Methods: "Discrete-gesture time alignment"
 								**Why**: +100ms shift after onset detection gives the classifier 100ms of pre-event "building"
 								signal, dramatically cleaning the decision boundary near gesture onset.
 								**ESP32 impact**: None.
 								### Step 1 — Add Constant After Line 94
 								```python
 								# After: TRANSITION_END_MS = 150
 								LABEL_FORWARD_SHIFT_MS = 100  # shift label boundaries +100ms after onset alignment
 								                               # Source: Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
 								```
 								### Step 2 — Apply Shift in `SessionStorage.save_session()` (after line ~704)
 								Find and insert after:
 								```python
 								            print(f"[Storage] Labels aligned: {changed}/{len(labels)} windows shifted")
 								```
 								Insert:
 								```python
 								        if LABEL_FORWARD_SHIFT_MS > 0:
 								            shift_windows = max(1, round(LABEL_FORWARD_SHIFT_MS / HOP_SIZE_MS))
 								            shifted = list(aligned_labels)
 								            for i in range(1, len(aligned_labels)):
 								                if aligned_labels[i] != aligned_labels[i - 1]:
 								                    for j in range(i, min(i + shift_windows, len(aligned_labels))):
 								                        if shifted[j] == aligned_labels[i]:
 								                            shifted[j] = aligned_labels[i - 1]
 								            n_shifted = sum(1 for a, b in zip(aligned_labels, shifted) if a != b)
 								            aligned_labels = shifted
 								            print(f"[Storage] Forward label shift (+{LABEL_FORWARD_SHIFT_MS}ms): {n_shifted} windows adjusted")
 								```
 								### Step 3 — Reduce TRANSITION_START_MS
 								```python
 								TRANSITION_START_MS = 200   # was 300 — reduce because 100ms shift already adds pre-event context
 								```
 								**Verify**: printout shows `N windows adjusted` where N is 5–20% of total windows per session.
 								---
 								## Change 1 — Expanded Feature Set
 								**Priority**: Tier 2
 								**Why**: 12 → 69 features; adds frequency-domain and cross-channel information that is
 								structurally more informative than amplitude alone (Meta Extended Data Fig. 6).
 								**ESP32 impact**: retrain → export new `model_weights.h`; port selected features to C.
 								### Sub-change 1A — Expand `extract_features_single_channel()` (line 1448)
 								Replace the entire function body:
 								```python
 								    def extract_features_single_channel(self, signal: np.ndarray) -> dict:
 								        if getattr(self, 'reinhard', False):
 								            signal = 64.0 * signal / (32.0 + np.abs(signal))
 								        signal = signal - np.mean(signal)
 								        N = len(signal)
 								        # --- Time domain ---
 								        rms  = np.sqrt(np.mean(signal ** 2))
 								        diff = np.diff(signal)
 								        wl   = np.sum(np.abs(diff))
 								        zc_thresh  = self.zc_threshold_percent * rms
 								        ssc_thresh = (self.ssc_threshold_percent * rms) ** 2
 								        sign_ch = signal[:-1] * signal[1:] < 0
 								        zc  = int(np.sum(sign_ch & (np.abs(diff) > zc_thresh)))
 								        d_l = signal[1:-1] - signal[:-2]
 								        d_r = signal[1:-1] - signal[2:]
 								        ssc = int(np.sum((d_l * d_r) > ssc_thresh))
 								        mav  = np.mean(np.abs(signal))
 								        var  = np.mean(signal ** 2)
 								        iemg = np.sum(np.abs(signal))
 								        wamp = int(np.sum(np.abs(diff) > 0.15 * rms))
 								        # AR(4) via Yule-Walker
 								        ar = np.zeros(4)
 								        if rms > 1e-6:
 								            try:
 								                from scipy.linalg import solve_toeplitz
 								                r = np.array([np.dot(signal[i:], signal[:N-i]) / N for i in range(5)])
 								                if r[0] > 1e-10:
 								                    ar = solve_toeplitz(r[:4], -r[1:5])
 								            except Exception:
 								                pass
 								        # --- Frequency domain (20–500 Hz) ---
 								        freqs = np.fft.rfftfreq(N, d=1.0 / SAMPLING_RATE_HZ)
 								        psd   = np.abs(np.fft.rfft(signal)) ** 2 / N
 								        m     = (freqs >= 20) & (freqs <= 500)
 								        f_m, p_m = freqs[m], psd[m]
 								        tp = np.sum(p_m) + 1e-10
 								        mnf = float(np.sum(f_m * p_m) / tp)
 								        cum = np.cumsum(p_m)
 								        mdf = float(f_m[min(np.searchsorted(cum, tp / 2), len(f_m) - 1)])
 								        pkf = float(f_m[np.argmax(p_m)]) if len(p_m) > 0 else 0.0
 								        mnp = float(tp / max(len(p_m), 1))
 								        # Bandpower in 4 physiological bands (mirrors firmware esp-dsp FFT bands)
 								        bands = [(20, 80), (80, 150), (150, 300), (300, 500)]
 								        bp = [float(np.sum(psd[(freqs >= lo) & (freqs < hi)])) for lo, hi in bands]
 								        return {
 								            'rms': rms, 'wl': wl, 'zc': zc, 'ssc': ssc,
 								            'mav': mav, 'var': var, 'iemg': iemg, 'wamp': wamp,
 								            'ar1': float(ar[0]), 'ar2': float(ar[1]),
 								            'ar3': float(ar[2]), 'ar4': float(ar[3]),
 								            'mnf': mnf, 'mdf': mdf, 'pkf': pkf, 'mnp': mnp,
 								            'bp0': bp[0], 'bp1': bp[1], 'bp2': bp[2], 'bp3': bp[3],
 								        }
 								```
 								### Sub-change 1B — Update `extract_features_window()` Return Block (line 1482)
 								Replace the return section:
 								```python
 								        FEATURE_ORDER = ['rms', 'wl', 'zc', 'ssc', 'mav', 'var', 'iemg', 'wamp',
 								                         'ar1', 'ar2', 'ar3', 'ar4', 'mnf', 'mdf', 'pkf', 'mnp',
 								                         'bp0', 'bp1', 'bp2', 'bp3']
 								        NORMALIZE_KEYS = {'rms', 'wl', 'mav', 'iemg'}
 								        features = []
 								        for ch_features in all_ch_features:
 								            for key in FEATURE_ORDER:
 								                val = ch_features.get(key, 0.0)
 								                if self.normalize and key in NORMALIZE_KEYS:
 								                    val = val / norm_factor
 								                features.append(float(val))
 								        if self.cross_channel and window.shape[1] >= 2:
 								            sel   = window[:, channel_indices].astype(np.float32)
 								            wc    = sel - sel.mean(axis=0)
 								            cov   = (wc.T @ wc) / len(wc)
 								            ri, ci = np.triu_indices(len(channel_indices))
 								            features.extend(cov[ri, ci].tolist())
 								            stds = np.sqrt(np.diag(cov)) + 1e-10
 								            cor  = cov / np.outer(stds, stds)
 								            ro, co = np.triu_indices(len(channel_indices), k=1)
 								            features.extend(cor[ro, co].tolist())
 								        return np.array(features, dtype=np.float32)
 								```
 								### Sub-change 1C — Update `EMGFeatureExtractor.__init__()` (line 1430)
 								```python
 								    def __init__(self, zc_threshold_percent=0.1, ssc_threshold_percent=0.1,
 								                 channels=None, normalize=True, cross_channel=True, reinhard=False):
 								        self.zc_threshold_percent  = zc_threshold_percent
 								        self.ssc_threshold_percent = ssc_threshold_percent
 								        self.channels      = channels
 								        self.normalize     = normalize
 								        self.cross_channel = cross_channel
 								        self.reinhard      = reinhard
 								```
 								### Sub-change 1D — Update Feature Count in `extract_features_batch()` (line 1520)
 								Replace `n_features = n_channels * 4`:
 								```python
 								        per_ch = 20
 								        if self.cross_channel and n_channels >= 2:
 								            n_features = n_channels * per_ch + \
 								                         n_channels*(n_channels+1)//2 + n_channels*(n_channels-1)//2
 								        else:
 								            n_features = n_channels * per_ch
 								```
 								### Sub-change 1E — Update `get_feature_names()` (line 1545)
 								```python
 								    def get_feature_names(self, n_channels=0):
 								        ch_idx = self.channels if self.channels is not None else list(range(n_channels))
 								        ORDER = ['rms','wl','zc','ssc','mav','var','iemg','wamp',
 								                 'ar1','ar2','ar3','ar4','mnf','mdf','pkf','mnp','bp0','bp1','bp2','bp3']
 								        names = [f'ch{ch}_{f}' for ch in ch_idx for f in ORDER]
 								        if self.cross_channel and len(ch_idx) >= 2:
 								            n = len(ch_idx)
 								            names += [f'cov_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i, n)]
 								            names += [f'cor_ch{ch_idx[i]}_ch{ch_idx[j]}' for i in range(n) for j in range(i+1, n)]
 								        return names
 								```
 								### Sub-change 1F — Update `EMGClassifier.__init__()` (line 1722)
 								```python
 								        self.feature_extractor = EMGFeatureExtractor(
 								            channels=HAND_CHANNELS, cross_channel=True, reinhard=False)
 								```
 								### Sub-change 1G — Update `save()` (line 1910) and `load()` (line 2089)
 								In `save()`, add to `feature_extractor_params` dict:
 								```python
 								                'cross_channel': getattr(self.feature_extractor, 'cross_channel', True),
 								                'reinhard':      getattr(self.feature_extractor, 'reinhard', False),
 								```
 								In `load()`, update `EMGFeatureExtractor(...)` constructor:
 								```python
 								        classifier.feature_extractor = EMGFeatureExtractor(
 								            zc_threshold_percent  = params.get('zc_threshold_percent', 0.1),
 								            ssc_threshold_percent = params.get('ssc_threshold_percent', 0.1),
 								            channels              = params.get('channels', HAND_CHANNELS),
 								            normalize             = params.get('normalize', False),
 								            cross_channel         = params.get('cross_channel', True),
 								            reinhard              = params.get('reinhard', False),
 								        )
 								```
 								### Also Fix Bug at Line 2382
 								```python
 								X, y, trial_ids, session_indices, label_names, loaded_sessions = storage.load_all_for_training()
 								```
 								---
 								## Change 2 — Electrode Repositioning Protocol
 								**Protocol**: no code changes.
 								> *"Between sessions within a single day, the participants remove and slightly reposition the
 								> sEMG wristband to enable generalization across different recording positions."*
 								> — Meta Nature 2025 Methods
 								- Session 1: standard placement
 								- Session 2: band 1–2 cm up the forearm
 								- Session 3: band 1–2 cm down the forearm
 								- Session 4+: slight axial rotation or return to any above position
 								The per-session z-score normalization in `_apply_session_normalization()` handles the
 								resulting amplitude shifts. Perform **fast, natural** gestures — not slow/deliberate.
 								---
 								## Change 3 — Data Augmentation
 								**Priority**: Tier 2. Apply to **raw windows BEFORE feature extraction**.
 								Insert before the `# === LDA CLASSIFIER ===` comment (~line 1709):
 								```python
 								def augment_emg_batch(X, y, multiplier=3, seed=42):
 								    """
 								    Augment raw EMG windows for training robustness.
 								    Must be called on raw windows (n_windows, n_samples, n_channels),
 								    not on pre-computed features.
 								    Source (window jitter): Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
 								    """
 								    rng = np.random.default_rng(seed)
 								    aug_X, aug_y = [X], [y]
 								    for _ in range(multiplier - 1):
 								        Xc = X.copy().astype(np.float32)
 								        Xc *= rng.uniform(0.80, 1.20, (len(X), 1, 1)).astype(np.float32)          # amplitude
 								        rms = np.sqrt(np.mean(Xc**2, axis=(1,2), keepdims=True)) + 1e-8
 								        Xc += rng.standard_normal(Xc.shape).astype(np.float32) * (0.05 * rms)     # noise
 								        Xc += rng.uniform(-20., 20., (len(X), 1, X.shape[2])).astype(np.float32)  # DC jitter
 								        shifts = rng.integers(-5, 6, size=len(X))
 								        for i in range(len(Xc)):
 								            if shifts[i]: Xc[i] = np.roll(Xc[i], shifts[i], axis=0)              # jitter
 								        aug_X.append(Xc); aug_y.append(y)
 								    return np.concatenate(aug_X), np.concatenate(aug_y)
 								```
 								In `EMGClassifier.train()`, replace the start of the function's feature extraction block:
 								```python
 								        if getattr(self, 'use_augmentation', True):
 								            X_aug, y_aug = augment_emg_batch(X, y, multiplier=3)
 								            print(f"[Classifier] Augmented: {len(X)} → {len(X_aug)} windows")
 								        else:
 								            X_aug, y_aug = X, y
 								        X_features = self.feature_extractor.extract_features_batch(X_aug)
 								        # ... then use y_aug instead of y for model.fit()
 								```
 								---
 								## Change 4 — Reinhard Compression (Optional)
 								**Formula**: `output = 64 × x / (32 + |x|)`
 								**Enable in Python**: set `reinhard=True` in `EMGFeatureExtractor` constructor (Change 1F).
 								**Enable in firmware** (`inference.c` `compute_features()`, after signal copy loop, before mean calc):
 								```c
 								#if MODEL_USE_REINHARD
 								    for (int i = 0; i < INFERENCE_WINDOW_SIZE; i++) {
 								        float x = signal[i];
 								        signal[i] = 64.0f * x / (32.0f + fabsf(x));
 								    }
 								#endif
 								```
 								Add `#define MODEL_USE_REINHARD 0` to `model_weights.h` (set to `1` when Python uses `reinhard=True`).
 								**Python and firmware MUST match.** Mismatch silently corrupts all predictions.
 								---
 								## Change 5 — Classifier Benchmark
 								**Purpose**: tells you whether LDA accuracy plateau is a features problem (all classifiers similar → add features) or a model complexity problem (SVM/MLP >> LDA → implement Change E/F).
 								Add after `run_training_demo()`:
 								```python
 								def run_classifier_benchmark():
 								    from sklearn.svm import SVC
 								    from sklearn.neural_network import MLPClassifier
 								    from sklearn.pipeline import Pipeline
 								    from sklearn.preprocessing import StandardScaler
 								    from sklearn.model_selection import cross_val_score, GroupKFold
 								    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
 								    storage = SessionStorage()
 								    X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()
 								    extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
 								    X = extractor.extract_features_batch(X_raw)
 								    X = EMGClassifier()._apply_session_normalization(X, session_indices, y=y)
 								    clfs = {
 								        'LDA (ESP32 model)':  LinearDiscriminantAnalysis(),
 								        'QDA':                QuadraticDiscriminantAnalysis(reg_param=0.1),
 								        'SVM-RBF':            Pipeline([('s', StandardScaler()), ('m', SVC(kernel='rbf', C=10))]),
 								        'MLP-128-64':         Pipeline([('s', StandardScaler()),
 								                                         ('m', MLPClassifier(hidden_layer_sizes=(128,64),
 								                                                             max_iter=1000, early_stopping=True))]),
 								    }
 								    gkf = GroupKFold(n_splits=5)
 								    print(f"\n{'Classifier':<22} {'Mean CV':>8} {'Std':>6}")
 								    print("-" * 40)
 								    for name, clf in clfs.items():
 								        sc = cross_val_score(clf, X, y, cv=gkf, groups=trial_ids, scoring='accuracy')
 								        print(f"  {name:<20} {sc.mean()*100:>7.1f}%  ±{sc.std()*100:.1f}%")
 								    print("\n  → If LDA ≈ SVM: features are the bottleneck (add Change 1 features)")
 								    print("  → If SVM >> LDA: model complexity bottleneck (implement Change F ensemble)")
 								```
 								---
 								## Change 6 — Simplified MPF Features
 								**Python training only** — not worth porting to ESP32 directly (use bandpower bp0–bp3 from Change 1 as the firmware-side approximation).
 								Add after `EMGFeatureExtractor` class:
 								```python
 								class MPFFeatureExtractor:
 								    """
 								    Simplified 3-channel MPF: CSD upper triangle per 6 frequency bands = 36 features.
 								    Python training only. Omits matrix logarithm (not needed for 3 channels).
 								    Source: Kaifosh et al. Nature 2025. doi:10.1038/s41586-025-09255-w
 								    ESP32 approximation: use bp0–bp3 from EMGFeatureExtractor (Change 1).
 								    """
 								    BANDS = [(0,62),(62,125),(125,187),(187,250),(250,375),(375,500)]
 								    def __init__(self, channels=None, log_diagonal=True):
 								        self.channels = channels or HAND_CHANNELS
 								        self.log_diag = log_diagonal
 								        self.n_ch = len(self.channels)
 								        self._r, self._c = np.triu_indices(self.n_ch)
 								        self.n_features = len(self.BANDS) * len(self._r)
 								    def extract_window(self, window):
 								        sig   = window[:, self.channels].astype(np.float64)
 								        N     = len(sig)
 								        freqs = np.fft.rfftfreq(N, d=1.0/SAMPLING_RATE_HZ)
 								        Xf    = np.fft.rfft(sig, axis=0)
 								        feats = []
 								        for lo, hi in self.BANDS:
 								            mask = (freqs >= lo) & (freqs < hi)
 								            if not mask.any():
 								                feats.extend([0.0] * len(self._r)); continue
 								            CSD = (Xf[mask].conj().T @ Xf[mask]).real / N
 								            if self.log_diag:
 								                for k in range(self.n_ch): CSD[k,k] = np.log(max(CSD[k,k], 1e-10))
 								            feats.extend(CSD[self._r, self._c].tolist())
 								        return np.array(feats, dtype=np.float32)
 								    def extract_batch(self, X):
 								        out = np.zeros((len(X), self.n_features), dtype=np.float32)
 								        for i in range(len(X)): out[i] = self.extract_window(X[i])
 								        return out
 								```
 								In `EMGClassifier.train()`, after standard feature extraction:
 								```python
 								        if getattr(self, 'use_mpf', False):
 								            mpf = MPFFeatureExtractor(channels=HAND_CHANNELS)
 								            X_features = np.hstack([X_features, mpf.extract_batch(X_aug)])
 								```
 								---
 								## Change 7 — Ensemble Training
 								**Priority**: Tier 3 (implements Change F's training side)
 								**New file**: `C:/VSCode/Marvel_Projects/Bucky_Arm/train_ensemble.py`
 								```python
 								"""
 								Train the full 3-specialist-LDA + meta-LDA ensemble.
 								Requires Change 1 (expanded features) to be implemented first.
 								Exports model_weights_ensemble.h for firmware Change F.
 								Architecture:
 								  LDA_TD (36 time-domain feat) ─┐
 								  LDA_FD (24 freq-domain feat)  ├─ 15 probs ─► Meta-LDA ─► final class
 								  LDA_CC (9  cross-ch feat)     ─┘
 								"""
 								import numpy as np
 								from pathlib import Path
 								from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
 								from sklearn.model_selection import cross_val_predict, GroupKFold, cross_val_score
 								import sys
 								sys.path.insert(0, str(Path(__file__).parent))
 								from learning_data_collection import (
 								    SessionStorage, EMGFeatureExtractor, HAND_CHANNELS
 								)
 								# ─── Load and extract features ───────────────────────────────────────────────
 								storage = SessionStorage()
 								X_raw, y, trial_ids, session_indices, label_names, _ = storage.load_all_for_training()
 								extractor = EMGFeatureExtractor(channels=HAND_CHANNELS, cross_channel=True)
 								X = extractor.extract_features_batch(X_raw).astype(np.float64)
 								# Per-session normalization (same as EMGClassifier._apply_session_normalization)
 								from sklearn.preprocessing import StandardScaler
 								for sid in np.unique(session_indices):
 								    mask = session_indices == sid
 								    sc = StandardScaler()
 								    X[mask] = sc.fit_transform(X[mask])
 								feat_names = extractor.get_feature_names(n_channels=len(HAND_CHANNELS))
 								n_cls = len(np.unique(y))
 								# ─── Feature subset indices ───────────────────────────────────────────────────
 								TD_FEAT = ['rms','wl','zc','ssc','mav','var','iemg','wamp','ar1','ar2','ar3','ar4']
 								FD_FEAT = ['mnf','mdf','pkf','mnp','bp0','bp1','bp2','bp3']
 								td_idx = [i for i,n in enumerate(feat_names) if any(n.endswith(f'_{f}') for f in TD_FEAT)]
 								fd_idx = [i for i,n in enumerate(feat_names) if any(n.endswith(f'_{f}') for f in FD_FEAT)]
 								cc_idx = [i for i,n in enumerate(feat_names) if n.startswith('cov_') or n.startswith('cor_')]
 								print(f"Feature subsets — TD: {len(td_idx)}, FD: {len(fd_idx)}, CC: {len(cc_idx)}")
 								X_td = X[:, td_idx]
 								X_fd = X[:, fd_idx]
 								X_cc = X[:, cc_idx]
 								# ─── Train specialist LDAs with out-of-fold stacking ─────────────────────────
 								gkf = GroupKFold(n_splits=5)
 								print("Training specialist LDAs (out-of-fold for stacking)...")
 								lda_td = LinearDiscriminantAnalysis()
 								lda_fd = LinearDiscriminantAnalysis()
 								lda_cc = LinearDiscriminantAnalysis()
 								oof_td = cross_val_predict(lda_td, X_td, y, cv=gkf, groups=trial_ids, method='predict_proba')
 								oof_fd = cross_val_predict(lda_fd, X_fd, y, cv=gkf, groups=trial_ids, method='predict_proba')
 								oof_cc = cross_val_predict(lda_cc, X_cc, y, cv=gkf, groups=trial_ids, method='predict_proba')
 								# Specialist CV accuracy (for diagnostics)
 								for name, mdl, Xs in [('LDA_TD', lda_td, X_td), ('LDA_FD', lda_fd, X_fd), ('LDA_CC', lda_cc, X_cc)]:
 								    sc = cross_val_score(mdl, Xs, y, cv=gkf, groups=trial_ids)
 								    print(f"  {name}: {sc.mean()*100:.1f}% ± {sc.std()*100:.1f}%")
 								# ─── Train meta-LDA on out-of-fold outputs ───────────────────────────────────
 								X_meta = np.hstack([oof_td, oof_fd, oof_cc])   # (n_samples, 3*n_cls = 15)
 								meta_lda = LinearDiscriminantAnalysis()
 								meta_sc = cross_val_score(meta_lda, X_meta, y, cv=gkf, groups=trial_ids)
 								print(f"  Meta-LDA: {meta_sc.mean()*100:.1f}% ± {meta_sc.std()*100:.1f}%")
 								# Fit all models on full dataset for deployment
 								lda_td.fit(X_td, y); lda_fd.fit(X_fd, y); lda_cc.fit(X_cc, y)
 								meta_lda.fit(X_meta, y)
 								# ─── Export all weights to C header ──────────────────────────────────────────
 								def lda_to_c_arrays(lda, name, feat_dim, n_cls, label_names, class_order):
 								    """Generate C array strings for LDA weights and intercepts."""
 								    # Reorder classes to match label_names order
 								    coef = lda.coef_    # shape (n_cls, feat_dim) for LinearDiscriminantAnalysis
 								    intercept = lda.intercept_
 								    lines = []
 								    lines.append(f"const float {name}_WEIGHTS[{n_cls}][{feat_dim}] = {{")
 								    for c in class_order:
 								        row = ', '.join(f'{v:.8f}f' for v in coef[c])
 								        lines.append(f"    {{{row}}},  // {label_names[c]}")
 								    lines.append("};")
 								    lines.append(f"const float {name}_INTERCEPTS[{n_cls}] = {{")
 								    intercept_str = ', '.join(f'{intercept[c]:.8f}f' for c in class_order)
 								    lines.append(f"    {intercept_str}")
 								    lines.append("};")
 								    return '\n'.join(lines)
 								class_order = list(range(n_cls))
 								out_path = Path('EMG_Arm/src/core/model_weights_ensemble.h')
 								with open(out_path, 'w') as f:
 								    f.write("// Auto-generated by train_ensemble.py — do not edit\n")
 								    f.write("#pragma once\n\n")
 								    f.write(f"#define MODEL_NUM_CLASSES    {n_cls}\n")
 								    f.write(f"#define MODEL_NUM_FEATURES   {X.shape[1]}\n")
 								    f.write(f"#define ENSEMBLE_PER_CH_FEATURES 20\n\n")
 								    f.write(f"#define TD_FEAT_OFFSET  {min(td_idx)}\n")
 								    f.write(f"#define TD_NUM_FEATURES {len(td_idx)}\n")
 								    f.write(f"#define FD_FEAT_OFFSET  {min(fd_idx)}\n")
 								    f.write(f"#define FD_NUM_FEATURES {len(fd_idx)}\n")
 								    f.write(f"#define CC_FEAT_OFFSET  {min(cc_idx)}\n")
 								    f.write(f"#define CC_NUM_FEATURES {len(cc_idx)}\n")
 								    f.write(f"#define META_NUM_INPUTS ({3} * MODEL_NUM_CLASSES)\n\n")
 								    f.write(lda_to_c_arrays(lda_td,  'LDA_TD',  len(td_idx), n_cls, label_names, class_order))
 								    f.write('\n\n')
 								    f.write(lda_to_c_arrays(lda_fd,  'LDA_FD',  len(fd_idx), n_cls, label_names, class_order))
 								    f.write('\n\n')
 								    f.write(lda_to_c_arrays(lda_cc,  'LDA_CC',  len(cc_idx), n_cls, label_names, class_order))
 								    f.write('\n\n')
 								    f.write(lda_to_c_arrays(meta_lda, 'META_LDA', 3*n_cls,   n_cls, label_names, class_order))
 								    f.write('\n\n')
 								    names_str = ', '.join(f'"{label_names[c]}"' for c in class_order)
 								    f.write(f"const char *MODEL_CLASS_NAMES[MODEL_NUM_CLASSES] = {{{names_str}}};\n")
 								print(f"Exported ensemble weights to {out_path}")
 								print(f"Total weight storage: {(len(td_idx)+len(fd_idx)+len(cc_idx)+3*n_cls)*n_cls*4} bytes float32")
 								```
 								**Note on LinearDiscriminantAnalysis with multi-class**: scikit-learn's LDA uses a
 								`(n_classes-1, n_features)` coef matrix for multi-class. Verify `lda.coef_.shape` after
 								fitting — if it is `(n_cls-1, n_feat)` rather than `(n_cls, n_feat)`, use the
 								`decision_function()` output structure and adjust the export accordingly.
 								---
 								# PART VII — FEATURE SELECTION FOR ESP32 PORTING
 								After Change 1 is trained, use this to decide what to port to C firmware.
 								### Step 1 — Get Feature Importance
 								```python
 								importance = np.abs(classifier.model.coef_).mean(axis=0)
 								feat_names  = classifier.feature_extractor.get_feature_names(n_channels=len(HAND_CHANNELS))
 								ranked = sorted(zip(feat_names, importance), key=lambda x: -x[1])
 								print("Top 20 features by LDA discriminative weight:")
 								for name, score in ranked[:20]:
 								    print(f"  {name:<35} {score:.4f}")
 								```
 								### Step 2 — Port Decision Matrix
 								| Feature | C Complexity | Prereq | Port? |
 								|---------|-------------|--------|-------|
 								| RMS, WL, ZC, SSC | ✓ Already in C | — | Keep |
 								| MAV, VAR, IEMG | Very easy (1 loop) | None | ✓ Yes |
 								| WAMP | Very easy (threshold on diff) | None | ✓ Yes |
 								| Cross-ch covariance | Easy (3×3 outer product) | None | ✓ Yes |
 								| Cross-ch correlation | Easy (normalize covariance) | Covariance | ✓ Yes |
 								| Bandpower bp0–bp3 | Medium (128-pt FFT via esp-dsp) | Add FFT call | ✓ Yes — highest ROI |
 								| MNF, MDF, PKF, MNP | Easy after FFT | Bandpower FFT | ✓ Free once FFT added |
 								| AR(4) | Medium (Levinson-Durbin in C) | None | Only if top-8 importance |
 								Once `dsps_fft2r_fc32()` is added for bandpower, MNF/MDF/PKF/MNP come free.
 								### Step 3 — Adding FFT-Based Features to inference.c
 								Add inside `compute_features()` loop, after time-domain features per channel:
 								```c
 								// 128-pt FFT for frequency-domain features per channel
 								// Zero-pad signal from INFERENCE_WINDOW_SIZE (150) to 128 by truncating
 								float fft_buf[256] = {0};  // 128 complex floats
 								for (int i = 0; i < 128 && i < INFERENCE_WINDOW_SIZE; i++) {
 								    fft_buf[2*i]   = signal[i];  // real
 								    fft_buf[2*i+1] = 0.0f;       // imag
 								}
 								dsps_fft2r_fc32(fft_buf, 128);
 								dsps_bit_rev_fc32(fft_buf, 128);
 								// Bandpower: bin k → freq = k * 1000/128 ≈ k * 7.8125 Hz
 								// Band 0: 20–80 Hz  → bins  3–10
 								// Band 1: 80–150 Hz → bins 10–19
 								// Band 2: 150–300 Hz→ bins 19–38
 								// Band 3: 300–500 Hz→ bins 38–64
 								int band_bins[5] = {3, 10, 19, 38, 64};
 								float bp[4] = {0,0,0,0};
 								for (int b = 0; b < 4; b++)
 								    for (int k = band_bins[b]; k < band_bins[b+1]; k++) {
 								        float re = fft_buf[2*k], im = fft_buf[2*k+1];
 								        bp[b] += re*re + im*im;
 								    }
 								// Store at correct indices (base = ch * 20)
 								int base = ch * 20;
 								features_out[base+16] = bp[0]; features_out[base+17] = bp[1];
 								features_out[base+18] = bp[2]; features_out[base+19] = bp[3];
 								```
 								---
 								# PART VIII — MEASUREMENT AND VALIDATION
 								## Baseline Protocol
 								**Run this BEFORE any change and after EACH change.**
 								```
 . python learning_data_collection.py → option 3 (Train Classifier)
 . Record:
 								   - "Mean CV accuracy: XX.X% ± Y.Y%"  (cross-validation)
 								   - Confusion matrix (which gesture pairs are most confused)
 								   - Per-gesture accuracy breakdown
 . On-device test:
 								   - Put on sensors, perform 10 reps of each gesture
 								   - Log classification output (UART or Python serial monitor)
 								   - Compute per-gesture accuracy manually
 . Record REST false-trigger rate: hold arm at rest for 30 seconds,
 								   count number of non-REST outputs
 								```
 								## Results Log
 								| Change | CV Acc Before | CV Acc After | Delta | On-Device Acc | False Triggers/30s | Keep? |
 								|--------|--------------|-------------|-------|---------------|-------------------|-------|
 								| Baseline | — | — | — | — | — | — |
 								| Change C (reject) | — | — | — | — | — | — |
 								| Change B (filter) | — | — | — | — | — | — |
 								| Change 0 (label shift) | — | — | — | — | — | — |
 								| Change 1 (features) | — | — | — | — | — | — |
 								| Change D (NVS calib) | — | — | — | — | — | — |
 								| Change 3 (augment) | — | — | — | — | — | — |
 								| Change 5 (benchmark) | — | — | — | — | — | — |
 								| Change 7+F (ensemble) | — | — | — | — | — | — |
 								| Change E (MLP) | — | — | — | — | — | — |
 								## When to Add More Gestures
 								| CV Accuracy | Recommendation |
 								|-------------|----------------|
 								| <80% | Do NOT add gestures — fix the existing 5 first |
 								| 80–90% | Adding 1–2 gestures is reasonable; expect 5–8% drop per new gesture |
 								| >90% | Good baseline; can add gestures; target staying above 85% |
 								| >95% | Excellent; can be ambitious with gesture count |
 								---
 								# PART IX — EXPORT WORKFLOW
 								## Path 1 — LDA / Ensemble (Changes 0–4, 7+F)
 								```
 . Train: python learning_data_collection.py → option 3  (single LDA)
 								         OR: python train_ensemble.py                     (full ensemble)
 . Export:
 								   Single LDA:  classifier.export_to_header(Path('EMG_Arm/src/core/model_weights.h'))
 								   Ensemble:    export_ensemble_header() in train_ensemble.py
 								                → writes model_weights_ensemble.h
 . Port new features to inference.c (if Change 1 features added):
 								   - Follow feature selection decision matrix (Part VII)
 								   - CRITICAL: C feature index order MUST match Python FEATURE_ORDER exactly
 . Build + flash: pio run -t upload
 								```
 								## Path 2 — int8 MLP via TFLM (Change E)
 								```
 . python train_mlp_tflite.py  → emg_model_data.cc
 . Add TFLM to platformio.ini lib_deps
 . Replace LDA inference call with inference_mlp_predict() in inference.c
 								   OR use inference_ensemble_predict() which calls MLP as fallback (Change F)
 . pio run -t upload
 								```
 								## Feature Index Contract (Critical)
 								The order of values written to `features_out[]` in `compute_features()` in C **must exactly
 								match** `FEATURE_ORDER` in `extract_features_window()` in Python, index for index.
 								To verify before flashing: print both the C feature names (from `MODEL_FEATURE_NAMES` if
 								added to header) and Python `extractor.get_feature_names()` and diff them.
 								---
 								# PART X — REFERENCES
 								**Primary paper**: Kaifosh, P., Reardon, T., et al. "A high-bandwidth neuromotor prosthesis
 								enabled by implicit information in intrinsic motor neurons." *Nature* (2025).
 								doi:10.1038/s41586-025-09255-w
 								**Meta codebase** (label alignment, CLER metric, model architectures):
 								`C:/VSCode/Marvel_Projects/Meta_Emg_Stuff/generic-neuromotor-interface/`
 								- `data.py`: onset detection, `searchsorted` alignment, window jitter
 								- `cler.py`: threshold=0.35, debounce=50ms, tolerance=±50/250ms
 								- `networks.py`: model architectures, left_context=20, stride=10
 								- `lightning.py`: `targets[..., left_context::stride]` label shift
 								**Barachant et al. 2012**: "Multiclass brain–computer interface classification by
 								Riemannian geometry." — matrix logarithm reference (MPF features).
 								**Espressif libraries**:
 								- esp-dsp: `github.com/espressif/esp-dsp` — biquad, FFT, dot-product
 								- esp-dl: `github.com/espressif/esp-dl` — quantized MLP/CNN inference
 								- TFLite Micro: `github.com/tensorflow/tflite-micro`
 								**All project files** (existing + planned):
 								```
 								── Laptop / Python ─────────────────────────────────────────────────────────────────────────
 								C:/VSCode/Marvel_Projects/Bucky_Arm/learning_data_collection.py  ← main: data collection + training
 								C:/VSCode/Marvel_Projects/Bucky_Arm/live_predict.py              ← NEW (Part 0.6): laptop-side live inference
 								C:/VSCode/Marvel_Projects/Bucky_Arm/train_ensemble.py            ← NEW (Change 7): ensemble training
 								C:/VSCode/Marvel_Projects/Bucky_Arm/train_mlp_tflite.py          ← NEW (Change E): int8 MLP export
 								── ESP32 Firmware — Existing ───────────────────────────────────────────────────────────────
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/platformio.ini
 								  └─ ADD lib_deps: espressif/esp-dsp (Changes B,1,F), tensorflow/tflite-micro (Change E)
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/config/config.h
 								  └─ MODIFY: remove system_mode_t; add EMG_STANDALONE to MAIN_MODE enum (Part 0.7, S1)
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/app/main.c
 								  └─ MODIFY: add STATE_LAPTOP_PREDICT, CMD_START_LAPTOP_PREDICT, run_laptop_predict_loop(),
 								             run_standalone_loop() (Part 0.5)
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/drivers/emg_sensor.c
 								  └─ MODIFY (Change A): migrate from adc_oneshot to adc_continuous driver
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.c
 								  └─ MODIFY: add inference_get_gesture_by_name(), IIR filter (B), features (1), confidence rejection (C)
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference.h
 								  └─ MODIFY: add inference_get_gesture_by_name() declaration
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/gestures.c
 								  └─ MODIFY: update gesture_names[] and gestures_execute() when adding gestures
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights.h
 								  └─ AUTO-GENERATED by export_to_header() — do not edit manually
 								── ESP32 Firmware — New Files ──────────────────────────────────────────────────────────────
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/bicep.h/.c        ← Part 0 / Section 2.2
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/calibration.h/.c  ← Change D (NVS z-score)
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference_ensemble.h/.c  ← Change F
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/inference_mlp.h/.cc      ← Change E
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/model_weights_ensemble.h ← AUTO-GENERATED (Change 7)
 								C:/VSCode/Marvel_Projects/Bucky_Arm/EMG_Arm/src/core/emg_model_data.h/.cc     ← AUTO-GENERATED (Change E)
 								```