Files

2026-06-12 02:55:04 -07:00

22 KiB

Raw Permalink Blame History

Model Interface Specification

This document specifies exactly how the trained policy network interfaces with the onboard controller running on the MSPM0 sensor board. It exists so the simulation used for training can produce weights and biases that transfer directly to the robot without any additional tuning on the firmware side.

The controller is a residual architecture. A PD baseline controller computes an action every iteration. A small linear policy network produces a correction (delta) that is added to the PD action and clamped to safe limits. The simulation must train the network to output these corrections, not full actions.

1. Architecture

Network type: single-layer linear regression. No hidden layers, no activations.
Shape: output[3] = W[3][13] * input[13] + b[3]
Numeric format: Q16 signed fixed point throughout. Unity = 65536. Internal accumulator uses int64_t to prevent overflow during multiply.
Parameter count: 42 (39 weights + 3 biases).

Because the network is linear, training can treat it as a 39-weight multivariate regression or as a tiny policy network for reinforcement learning. Either is acceptable.

2. Constants (authoritative)

All constants live in Model.h. Sim must use these exact values. A shared header or a copy kept in lock-step is mandatory — any mismatch silently produces incorrect behavior on hardware.

Constant	Value	Units	Purpose
`CAP_IR`	800	mm	Maximum usable IR distance; inputs above this saturate to Q16 = 65536
`CAP_TFLUNA`	1000	mm	Maximum usable TF-Luna distance for normalization
`CAP_THROTTLE`	9000	PWM units	Maximum throttle command magnitude
`CAP_STEERING`	30	degrees	Maximum steering magnitude (±30°)
`CAP_ANGLE`	64	degrees	Wall-angle clamp magnitude for model input
`CAP_ANGLE_POW`	6	—	`log2(CAP_ANGLE)`, used to turn the angle normalize into a pure shift
`CAP_DELTA_STEERING`	10	degrees	Max magnitude of residual correction to steering
`CAP_DELTA_THROTTLE`	2000	PWM units	Max magnitude of residual correction to each throttle
`CAP_YAW`	16384	raw LSB	Yaw-rate clamp; 16384 LSB ≈ 125 deg/s at the MPU-6050 default ±250 deg/s range (131 LSB/(deg/s))
`CAP_ACCEL`	8192	raw LSB	Accel clamp (lateral and longitudinal); 8192 LSB ≈ 0.5 g at the MPU-6050 default ±2 g range (16384 LSB/g)

The delta caps determine how much authority the model has over the PD controller. Small delta caps keep the controller close to the known-safe PD baseline. Larger delta caps allow the model to override more aggressively. Start with the values above.

3. Input Vector Layout

The input vector has 13 elements in this exact order (matches input_t enum in Model.h):

Index	Name	Source	Raw Range	Normalization	Q16 Encoding
0	`ir_right`	Right IR sensor, mm	[0, ≥800]	`Model_Normalize(x, CAP_IR)`	unsigned [0, 65536]
1	`ir_left`	Left IR sensor, mm	[0, ≥800]	`Model_Normalize(x, CAP_IR)`	unsigned [0, 65536]
2	`tf_left`	Left TF-Luna, mm	[0, ≥1000]	`Model_Normalize(x, CAP_TFLUNA)`	unsigned [0, 65536]
3	`tf_middle`	Front TF-Luna, mm	[0, ≥1000]	`Model_Normalize(x, CAP_TFLUNA)`	unsigned [0, 65536]
4	`tf_right`	Right TF-Luna, mm	[0, ≥1000]	`Model_Normalize(x, CAP_TFLUNA)`	unsigned [0, 65536]
5	`throttle_left_prev`	Applied left throttle, previous iter	[0, 9000]	`Model_Normalize(x, CAP_THROTTLE)`	unsigned [0, 65536]
6	`throttle_right_prev`	Applied right throttle, previous iter	[0, 9000]	`Model_Normalize(x, CAP_THROTTLE)`	unsigned [0, 65536]
7	`steering_prev`	Applied steering angle, previous iter	[-30, +30]	`Model_NormalizeSigned(x, CAP_STEERING)`	offset [0, 65536], center = 32768
8	`angle_left`	Wall angle from left sensors, deg	clamped to [-64, +64]	`(x + 64) << 9`	offset [0, 65536], center = 32768
9	`angle_right`	Wall angle from right sensors, deg	clamped to [-64, +64]	`(x + 64) << 9`	offset [0, 65536], center = 32768
10	`yaw_rate`	`-IMU_GyroZ` (sign flipped so + = right turn), raw LSB	clamped to [-16384, +16384]	`Model_NormalizeSigned(x, CAP_YAW)`	offset [0, 65536], center = 32768
11	`accel_lat`	`IMU_AccelX` (+ = chassis accel to right), raw LSB	clamped to [-8192, +8192]	`Model_NormalizeSigned(x, CAP_ACCEL)`	offset [0, 65536], center = 32768
12	`accel_long`	`IMU_AccelY` (+ = forward accel), raw LSB	clamped to [-8192, +8192]	`Model_NormalizeSigned(x, CAP_ACCEL)`	offset [0, 65536], center = 32768

Normalization formulas

Unsigned sensor inputs (distance, throttle):

out = (input << 16) / cap
if out > 65536: out = 65536          # saturate, do not wrap

Signed inputs centered on zero (steering, wall angle):

clamp input to [-cap, +cap]
out = ((input + cap) << 16) / (2 * cap)

This maps -cap → 0, 0 → 32768, +cap → 65536.

Wall angle (special case using power-of-two cap):

clamp angle to [-64, +64]
out = (angle + 64) << 9         # equivalent to ((angle+64) << 16) / 128

IMU inputs (indices 10-12)

The three IMU inputs come from the GY-521 / MPU-6050 and use raw 16-bit LSB from the chip, not converted physical units. Scale factors (default MPU-6050 ranges): 131 LSB/(deg/s) for gyro, 16384 LSB/g for accel. Full documentation of axis mapping, sign conventions, and scale factors lives in IMU_behavior.md.

Sign conventions used throughout the project:

steeringAngle > 0 = right turn.
yaw_rate > 0 = right turn. This requires negating IMU_GyroZ when reading from the driver, because the raw gyro uses right-hand-rule (left turn = positive). The firmware does this negation at the input-packing site.
accel_lat > 0 = chassis accelerating to the right. Matches IMU_AccelX directly; no sign flip.
accel_long > 0 = chassis accelerating forward. Matches IMU_AccelY directly; no sign flip.

Simulation must produce equivalent raw-LSB values (either by simulating the chip quantization explicitly, or by converting physical units: raw_gyro = round(dps * 131), raw_accel = round(g * 16384)). The sim must also apply the negation on the gyro-Z channel before normalization so that the model sees a consistent "+ = right turn" convention across yaw rate and steering.

Saturation note: Model_NormalizeSigned clamps out-of-range inputs to [-CAP, +CAP] internally. Hard corners that exceed 125 deg/s yaw or 0.5 g lateral produce saturated inputs; the model cannot distinguish "at limit" from "beyond limit." Raise the caps if the log shows frequent saturation.

Previous-action inputs

Inputs 5, 6, 7 are the actual applied action from the previous iteration, meaning the PD baseline plus residual delta plus any clamp. They are not the raw model output before clamping. The firmware writes these three inputs at the end of each control iteration, after Model_ApplyResidual runs.

On the first iteration, the firmware explicitly initializes steering_prev = 32768 (center). The two throttle prev values are zero-initialized (motors off at boot).

The simulation must replicate this exactly: after applying and clamping the action, record the clamped values as the prev-inputs for the next timestep.

4. Output Vector Layout

The output vector has 3 elements in this exact order (matches output_t enum in Model.h):

Index	Name	Encoding	Semantic
0	`throttle_left`	offset Q16 [0, 65536], center = 32768	delta added to PD left throttle
1	`throttle_right`	offset Q16 [0, 65536], center = 32768	delta added to PD right throttle
2	`steering`	offset Q16 [0, 65536], center = 32768	delta added to PD steering

Output 32768 = zero delta = pure PD behavior. This is the key property that makes untrained models safe to flash: with all weights and biases equal to zero, the model outputs zero. To make the default output 32768 (not 0), the bias vector is initialized to {32768, 32768, 32768}, meaning an untrained or zero-weighted model produces exactly the PD baseline action.

Denormalization (what firmware does with outputs)

delta_steer = ((steering       - 32768) * CAP_DELTA_STEERING) >> 15
delta_thr_l = ((throttle_left  - 32768) * CAP_DELTA_THROTTLE) >> 15
delta_thr_r = ((throttle_right - 32768) * CAP_DELTA_THROTTLE) >> 15

Note the >> 15 (not >> 16). Subtracting the offset halves the usable range, so one less shift reaches full ±CAP_DELTA_*.

Final applied action:

steering_applied = clamp(PD_steering + delta_steer, -CAP_STEERING, +CAP_STEERING)
throttle_l_applied = clamp(PD_throttle_l + delta_thr_l, 0, CAP_THROTTLE)
throttle_r_applied = clamp(PD_throttle_r + delta_thr_r, 0, CAP_THROTTLE)

5. Inference

Pure matrix multiply plus bias. Pseudocode:

for r in 0..NUM_OUTPUTS:
  Model_Outputs[r] = 0
  for c in 0..NUM_INPUTS:
    Model_Outputs[r] += fixed_mul(Model_Weights[r][c], Model_Inputs[c])
  Model_Outputs[r] += Model_Bias[r]

where fixed_mul(x, y) = ((int64_t)x * y) >> 16

Weights and biases are int32_t (signed). Inputs are effectively unsigned but stored as int32_t. Output accumulator is int32_t; individual multiplies use int64_t intermediate to prevent Q16×Q16 overflow.

No clamping is applied to the raw output before denormalization. If the linear math produces values outside [0, 65536], the final clamp in Model_ApplyResidual catches them. This keeps inference lightweight.

6. PD Baseline (what sim must replicate before training)

The residual model only learns corrections on top of PD. The simulation must implement the exact PD controller used on-device, otherwise residuals learned in sim will not generalize to the robot. The controller is in RTOS_SensorBoard.c:Robot().

Step-by-step

Given sensor readings d_ir, ld_ir (right/left IR, mm), d2, ld2 (right/left TF-Luna, mm), front (front TF-Luna, mm):

# Wall geometry
angle   = arctan((d_ir*1414  - d2*1000)  / (224 + d2))  - 5        # right-side wall angle, deg
L_angle = arctan((ld_ir*1414 - ld2*1000) / (224 + ld2))            # left-side wall angle, deg

realDist   = d_ir  * cos(angle)   / 1000        # perpendicular right-wall distance
L_realDist = ld_ir * cos(L_angle) / 1000        # perpendicular left-wall distance

# Distance PD
e_d = realDist - L_realDist - 0                                   # dist_ref = 0
intend_angle = (kp_d * e_d) / 10 + (kd_d * (e_d - prevError)) / 10
prevError = e_d

# Angle PD
e_a = intend_angle - angle
steering = (e_a * kp_a) / 10 + ((e_a - prevE_A) * kd_a) / 10
prevE_A = e_a

# Base throttle
throttle_base = 9000

# Corner override
if front < 600:
  throttle_base -= 2000
  urgency = (600 - front) >> 4
  steering = -urgency if (ld2 > d2) else +urgency

# Clamp steering
steering = clamp(steering, -30, +30)

# Differential steering
throttle_l = throttle_base
throttle_r = throttle_base
if steering <= -15: throttle_r -= 2000
elif steering >= 15: throttle_l -= 2000

PD gain constants

Constant	Value
`kp_d`	1
`kd_d`	2
`kp_a`	5
`kd_a`	2
`angle_ref`	5
`dist_ref`	0

All PD math uses integer division by 10 in two places.

Integer arithmetic

The firmware does everything in signed integer math. The sim may use floats internally, but the PD path should behave identically if arithmetic precision is sufficient. Integer truncation in the real PD means tiny error terms can round to zero — if the sim is too precise, residual training may chase rounding artifacts that do not exist on the robot.

7. Full Simulation Control Loop (per timestep)

# 1. Read simulated sensors
(d_ir, ld_ir, d2, ld2, front) = sim_sensors()
(raw_gyro_z, raw_accel_x, raw_accel_y) = sim_imu_raw_lsb()     # see Section 3 for scale

# 2. Run PD baseline (see Section 6)
(pd_throttle_l, pd_throttle_r, pd_steering) = pd_controller(sensors, prev_pd_state)

# 3. Build model input vector (see Section 3)
inputs[0]  = normalize(d_ir,  CAP_IR)
inputs[1]  = normalize(ld_ir, CAP_IR)
inputs[2]  = normalize(ld2,   CAP_TFLUNA)
inputs[3]  = normalize(front, CAP_TFLUNA)
inputs[4]  = normalize(d2,    CAP_TFLUNA)
inputs[5]  = prev_applied_throttle_l_normalized      # from previous timestep
inputs[6]  = prev_applied_throttle_r_normalized
inputs[7]  = prev_applied_steering_normalized
inputs[8]  = encode_angle(L_angle)
inputs[9]  = encode_angle(angle)
inputs[10] = normalize_signed(-raw_gyro_z, CAP_YAW)            # sign flip: + = right turn
inputs[11] = normalize_signed( raw_accel_x, CAP_ACCEL)          # + = right
inputs[12] = normalize_signed( raw_accel_y, CAP_ACCEL)          # + = forward

# 4. Run inference
raw_outputs = W @ inputs + b

# 5. Compute deltas (see Section 4)
delta_steer = ((raw_outputs[2] - 32768) * CAP_DELTA_STEERING) >> 15
delta_thr_l = ((raw_outputs[0] - 32768) * CAP_DELTA_THROTTLE) >> 15
delta_thr_r = ((raw_outputs[1] - 32768) * CAP_DELTA_THROTTLE) >> 15

# 6. Apply and clamp
applied_steering   = clamp(pd_steering   + delta_steer, -CAP_STEERING, +CAP_STEERING)
applied_throttle_l = clamp(pd_throttle_l + delta_thr_l, 0, CAP_THROTTLE)
applied_throttle_r = clamp(pd_throttle_r + delta_thr_r, 0, CAP_THROTTLE)

# 7. Update prev-inputs for next timestep
prev_applied_throttle_l_normalized = normalize(applied_throttle_l, CAP_THROTTLE)
prev_applied_throttle_r_normalized = normalize(applied_throttle_r, CAP_THROTTLE)
prev_applied_steering_normalized   = normalize_signed(applied_steering, CAP_STEERING)

# 8. Step physics with (applied_throttle_l, applied_throttle_r, applied_steering)

Initial conditions

At the start of each simulated episode:

prev_applied_throttle_l_normalized = 0
prev_applied_throttle_r_normalized = 0
prev_applied_steering_normalized = 32768
PD state (prevError, prevE_A) = 0

Control rate

The firmware loop runs at approximately 12.5 Hz (one control iteration per sensor update cycle). Simulation timestep should target the same rate so trained weights assume the correct latency and discretization.

8. Captured Log Format (`robot0`)

The robot writes a CSV of every control iteration to the onboard SD card under the filename robot0 (no extension — the custom eFile filesystem stores raw blocks, not FAT). The file is rewritten from scratch at the start of every run (eFile_Delete then eFile_Create).

Header line

time_ms,ir_r,ir_l,tf_r,tf_l,tf_front,throttle_l,throttle_r,steering,gyro_z,accel_x,accel_y

Column definitions

Column	Units	Notes
`time_ms`	ms since `OS_ClearMsTime()` at Robot() start	monotonic per run
`ir_r`	mm	right IR reading (`d_ir` in firmware, already doubled from raw `Distance`)
`ir_l`	mm	left IR reading (`ld_ir`, already doubled from raw `L_Distance`)
`tf_r`	mm	right TF-Luna (`d2`, 8-sample mean unless `USE_MEDIAN_FILTER`)
`tf_l`	mm	left TF-Luna (`ld2`, same filter)
`tf_front`	mm	front TF-Luna (`FrontDist`)
`throttle_l`	PWM units, [0, 9000]	applied left throttle sent to CAN
`throttle_r`	PWM units, [0, 9000]	applied right throttle sent to CAN
`steering`	degrees, [-30, +30]	applied steering angle sent to CAN, signed decimal
`gyro_z`	raw LSB, signed	`IMU_GyroZ` as read from the chip (NOT sign-flipped). 131 LSB per deg/s. Negate at parse time to match the project's "+ = right turn" convention.
`accel_x`	raw LSB, signed	`IMU_AccelX` as read. 16384 LSB per g. + = chassis accel to right (matches steering sign).
`accel_y`	raw LSB, signed	`IMU_AccelY` as read. 16384 LSB per g. + = forward accel (speeding up).

All integer decimals. One row per control iteration (~12.5 Hz). No trailing comma. Newline is \n. IMU values are snapshotted from the IMU globals at the moment of logging; the IMU_Task thread updates them at ~50 Hz, so consecutive rows may repeat identical IMU samples.

Capture workflow (host side)

Flash the sensor-board firmware with Robot() as the active main and logging enabled (uncomment the FileDumpRow block in Robot()).
Drive a lap. Firmware stops logging on disk-full; stop conditions are handled in-loop.
Reflash with DumpRobotFileMain() as the active main (trampoline in main()).
On the host: ./log_serial.sh robot0.csv — the script auto-detects /dev/ttyACM0/1 or /dev/ttyUSB0/1, configures 115200 8N1 raw, and streams UART to the file.
Press S2 (PB21) on the LaunchPad to trigger the dump.
Stop the host capture with Ctrl+C once streaming ends.
Open robot0.csv in a text editor; delete any non-CSV preamble lines (e.g. the "Ready. Press S2..." prompt) if present, then load in Excel or pandas.

Simulation uses of the log

The log is the ground truth for aligning the sim to the real robot. Three uses:

Sensor noise fit. Compute statistics (variance, autocorrelation, bias) on each sensor column during straight-line segments. Inject matching noise into the sim so trained residuals do not chase noise patterns that differ between sim and hardware.
PD replay validation. Feed (ir_r, ir_l, tf_r, tf_l, tf_front) through the sim's PD controller (Section 6). Compare the sim's computed throttle_l, throttle_r, steering against the logged columns. They should match up to integer rounding. Any systematic divergence indicates a sim-side implementation bug in the PD path that must be fixed before any RL training runs.
Behavioral cloning warm start. Feed the log through the sim pipeline, train the residual network to predict near-zero deltas (because PD alone produced the logged actions). This gives a safe starting point for RL fine-tuning with the correct sim plumbing verified.

All three uses require the sim's PD controller, normalization, and sensor saturation to match Model_Interface.md exactly before the log data becomes meaningful.

9. Training Recommendations

Objective

The model must learn small corrections that improve lap performance beyond the PD baseline. Suggested reward components:

Positive: forward progress along track centerline; lap completion.
Negative: wall contact; excessive yaw rate; large delta magnitude (discourages the model from fighting PD when PD is already good).

Apply a small L2 penalty on weights to keep the residual conservative. A large magnitude residual means the model is overriding PD instead of correcting it — usually a sign of training instability or misaligned sim physics.

Method choices

The network is small enough (33 parameters) that several methods work:

Evolution strategies — spawn N weight vectors, evaluate each on one or more tracks, mutate the best, iterate. Gradient-free, handles the clamp non-differentiability, works well at this scale.
REINFORCE / PPO — standard policy gradient; treat the linear model as a deterministic policy with Gaussian exploration noise added to outputs during training.
Behavioral cloning from logs — record laps with the PD-only controller, then train the residual to predict zero everywhere (producing the PD behavior). This is a cheap sanity check: a correctly trained residual on PD-optimal data should produce near-zero deltas. Use the robot-log CSV captured via the DumpRobotFile path.
Residual DDPG — works for continuous actions; requires care to prevent large residuals.

Sim fidelity requirements

Items the sim must model for trained weights to transfer:

PD controller matching the pseudocode in Section 6 bit-for-bit where integer arithmetic is visible.
Sensor saturation at the cap values (IR >800mm and TF-Luna >1000mm produce Q16 = 65536).
Corner override triggered at front < 600, with the exact urgency = (600-front) >> 4 formula.
Differential steering thresholds at ±15°.
One-timestep latency between action and next observation (prev-inputs are from the previous iteration).
A measurement-noise model on the sensor values. Use the captured robot log to fit noise statistics.

10. Weight Export

After training, export weights as C array literals in Q16:

int32_t q_weight = round(float_weight * 65536)
int32_t q_bias   = round(float_bias   * 65536)

The final artifact is a block of C that replaces the current initializers in Model.c:

const fixed_t Model_Weights[NUM_OUTPUTS][NUM_INPUTS] = {
  { w_00, w_01, w_02, w_03, w_04, w_05, w_06, w_07, w_08, w_09, w_0A, w_0B, w_0C },  // throttle_left
  { w_10, w_11, w_12, w_13, w_14, w_15, w_16, w_17, w_18, w_19, w_1A, w_1B, w_1C },  // throttle_right
  { w_20, w_21, w_22, w_23, w_24, w_25, w_26, w_27, w_28, w_29, w_2A, w_2B, w_2C },  // steering
};

const fixed_t Model_Bias[NUM_OUTPUTS] = { b_0, b_1, b_2 };

Column order within each row must match the input_t enum order listed in Section 3. Row order must match the output_t enum order listed in Section 4.

Numeric range checks before export

Each weight's absolute value should stay well below 2^15 in Q16 (i.e., raw float magnitude below ~0.5), or intermediate fixed_mul products can be large enough that the accumulator approaches int32_t limits when summed across 13 inputs. If weights grow larger, enforce an L2 penalty during training.
Bias magnitudes typically stay within [0, 65536] (the output range). Values far outside this range indicate the model is trying to shift the nominal output far from center — usually a training issue.

Validation

Before deploying to hardware:

Replay the robot's captured log CSV through the simulated pipeline and confirm the sim produces the same PD outputs as were logged on the robot (up to rounding).
Run the trained model in the sim for full laps; verify no crashes, no throttle saturation, and modest delta magnitudes.
Flash, start in low-risk mode (e.g., reduced CAP_DELTA_* for the first on-track run) and scale up if behavior is stable.

22 KiB Raw Permalink Blame History Unescape Escape