# Updated LevPodEnv - Physical System Clarification

## System Architecture

### Physical Configuration

**Two U-Shaped Magnetic Yokes:**
- **Front Yoke**: Located at X = +0.1259m
  - Has two ends: Left (Y = +0.0508m) and Right (Y = -0.0508m)
  - Force is applied at center: X = +0.1259m, Y = 0m

- **Back Yoke**: Located at X = -0.1259m
  - Has two ends: Left (Y = +0.0508m) and Right (Y = -0.0508m)
  - Force is applied at center: X = -0.1259m, Y = 0m

**Four Independent Coil Currents:**
1. `curr_front_L`: Current around front yoke's left (+Y) end
2. `curr_front_R`: Current around front yoke's right (-Y) end
3. `curr_back_L`: Current around back yoke's left (+Y) end
4. `curr_back_R`: Current around back yoke's right (-Y) end

**Current Range:** -15A to +15A (from Ansys CSV data)
- Negative current: Strengthens permanent magnet field → stronger attraction
- Positive current: Weakens permanent magnet field → weaker attraction

### Collision Geometry in URDF

**Yoke Ends (4 boxes):** Represent the tips of the U-yokes where gap is measured
- Front Left: (+0.1259m, +0.0508m, +0.08585m)
- Front Right: (+0.1259m, -0.0508m, +0.08585m)
- Back Left: (-0.1259m, +0.0508m, +0.08585m)
- Back Right: (-0.1259m, -0.0508m, +0.08585m)

**Sensors (4 cylinders):** Physical gap sensors at different locations
- Center Right: (0m, +0.0508m, +0.08585m)
- Center Left: (0m, -0.0508m, +0.08585m)
- Front: (+0.2366m, 0m, +0.08585m)
- Back: (-0.2366m, 0m, +0.08585m)

## RL Environment Interface

### Action Space
**Type:** `Box(4)`, Range: [-1, 1]

**Actions:** `[pwm_front_L, pwm_front_R, pwm_back_L, pwm_back_R]`
- PWM duty cycles for the 4 independent coils
- Converted to currents via RL circuit model: `dI/dt = (V_pwm - I*R) / L`
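
The first-order coil model above can be integrated with a forward-Euler step. The sketch below is illustrative only: `V_SUPPLY`, `R_COIL`, `L_COIL`, and `DT` are placeholder assumptions rather than values taken from the environment, and `step_currents` is a hypothetical helper name. The clip to ±15A mirrors the Ansys data range noted earlier.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
V_SUPPLY = 12.0   # supply voltage (V), assumed
R_COIL = 1.0      # coil resistance (ohm), assumed
L_COIL = 0.01     # coil inductance (H), assumed
DT = 1e-4         # integration timestep (s), assumed
I_MAX = 15.0      # current limit (A), from the Ansys data range


def step_currents(currents, pwm, dt=DT):
    """Advance the 4 coil currents by one forward-Euler step.

    currents: array of 4 coil currents (A)
    pwm: array of 4 duty cycles in [-1, 1]
    """
    pwm = np.clip(np.asarray(pwm, dtype=float), -1.0, 1.0)
    v_pwm = pwm * V_SUPPLY                         # average applied voltage
    di_dt = (v_pwm - currents * R_COIL) / L_COIL   # dI/dt = (V_pwm - I*R) / L
    return np.clip(currents + di_dt * dt, -I_MAX, I_MAX)


currents = np.zeros(4)
for _ in range(2000):  # 0.2 s, i.e. 20 time constants (L/R = 10 ms here)
    currents = step_currents(currents, [0.5, 0.5, -0.5, -0.5])
print(currents)  # settles near pwm * V_SUPPLY / R_COIL
```

With these placeholder values the steady-state current is `pwm * V_SUPPLY / R_COIL`, so the sign of the duty cycle directly sets the sign of the coil current.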

### Observation Space
**Type:** `Box(4)`, Range: [-inf, inf]

**Observations:** `[sensor_center_right, sensor_center_left, sensor_front, sensor_back]`
- **Noisy sensor readings** (not direct yoke measurements)
- Noise: Gaussian with σ = 0.1mm (0.0001m)
- Agent must learn system dynamics from sensor data alone
- Velocities are not provided directly; the agent can infer them from the temporal sequence of observations if needed
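
As a rough illustration of how the noisy observation could be formed, the sketch below adds zero-mean Gaussian noise with σ = 0.1mm to four true sensor gaps. The function name `make_observation` and the example gap values are assumptions for illustration; only the noise level and the sensor ordering come from the description above.

```python
import numpy as np

SENSOR_NOISE_STD = 1e-4  # 0.1 mm, as stated above


def make_observation(true_gaps, rng):
    """Return noisy gap readings (m) for the 4 sensors.

    Order: [center_right, center_left, front, back].
    """
    true_gaps = np.asarray(true_gaps, dtype=np.float32)
    noise = rng.normal(0.0, SENSOR_NOISE_STD, size=true_gaps.shape)
    return (true_gaps + noise).astype(np.float32)


rng = np.random.default_rng(seed=0)
obs = make_observation([0.010, 0.010, 0.010, 0.010], rng)  # example 10 mm gaps
print(obs)
```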

### Force Application Physics

For each timestep:

1. **Measure yoke end gap heights** (from 4 yoke collision boxes)
2. **Average left/right ends** for each U-yoke:
   - `avg_gap_front = (gap_front_L + gap_front_R) / 2`
   - `avg_gap_back = (gap_back_L + gap_back_R) / 2`

3. **Calculate roll angle** from yoke end positions:
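   ```python
   import numpy as np

   # y_distance: spacing between left and right yoke ends (2 × 0.0508 m)
   roll_front = np.arctan((gap_front_R - gap_front_L) / y_distance)
   roll_back = np.arctan((gap_back_R - gap_back_L) / y_distance)
   roll = (roll_front + roll_back) / 2
   ```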

4. **Predict forces** using maglev_predictor (roll passed in degrees, gap in mm):
   ```python
   force_front, torque_front = predictor.predict(
       curr_front_L, curr_front_R, roll_deg, gap_front_mm
   )
   force_back, torque_back = predictor.predict(
       curr_back_L, curr_back_R, roll_deg, gap_back_mm
   )
   ```

5. **Apply forces at Y=0** (center of each U-yoke):
   - Front force at: `[+0.1259, 0, 0.08585]`
   - Back force at: `[-0.1259, 0, 0.08585]`

6. **Apply roll torques** from each yoke independently
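
Steps 1 to 3 above, plus the unit conversion implied by the predictor's `roll_deg` and `gap_*_mm` arguments, can be sketched as follows. This is a minimal sketch, not the environment's actual code: `yoke_state` is a hypothetical helper, and the metres-to-millimetres and radians-to-degrees conversions are an assumption inferred from the argument names.

```python
import numpy as np

# Spacing between the left (+Y) and right (-Y) yoke ends, from the geometry above.
Y_DISTANCE = 2 * 0.0508  # m


def yoke_state(gap_front_L, gap_front_R, gap_back_L, gap_back_R):
    """Return (gap_front_mm, gap_back_mm, roll_deg) from 4 yoke-end gaps in metres."""
    # Step 2: average the left/right ends of each U-yoke.
    avg_gap_front = (gap_front_L + gap_front_R) / 2
    avg_gap_back = (gap_back_L + gap_back_R) / 2

    # Step 3: roll per yoke, then averaged (positive when the right gap is larger).
    roll_front = np.arctan((gap_front_R - gap_front_L) / Y_DISTANCE)
    roll_back = np.arctan((gap_back_R - gap_back_L) / Y_DISTANCE)
    roll = (roll_front + roll_back) / 2

    # Assumed conversion to the units suggested by the predictor's argument names.
    return avg_gap_front * 1e3, avg_gap_back * 1e3, np.degrees(roll)


# Example: right ends 2 mm further from the track than the left ends.
gap_front_mm, gap_back_mm, roll_deg = yoke_state(0.009, 0.011, 0.009, 0.011)
print(gap_front_mm, gap_back_mm, roll_deg)
```

In this example both average gaps come out at 10mm and the roll is a little over one degree, which gives a feel for how small the roll angles are for millimetre-scale gap differences.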

### Key Design Decisions

**Why 4 actions instead of 2?**
- Physical system has 4 independent electromagnets (one per yoke end)
- Allows fine control of roll torque
- Left/right current imbalance on each yoke creates torque

**Why sensor observations instead of yoke measurements?**
- Realistic: sensors are at different positions than yokes
- Adds partial observability challenge
- Agent must learn system dynamics to infer unmeasured states
- Sensor noise simulates real measurement uncertainty

**Why not include velocities in observation?**
- Agent can learn velocities from temporal sequence (frame stacking)
- Reduces observation dimensionality
- Tests if agent can learn dynamic behavior from gap measurements alone

**Current sign convention:**
- No conversion needed - currents fed directly to predictor
- Range: -15A to +15A (from Ansys model)
- Coil RL circuit naturally produces currents in this range

### Comparison with Original Design

| Feature | Original | Updated |
|---------|----------|---------|
| **Actions** | 2 (left/right coils) | 4 (front_L, front_R, back_L, back_R) |
| **Observations** | 5 (gaps, roll, velocities) | 4 (noisy sensor gaps) |
| **Gap Measurement** | Direct yoke positions | Noisy sensor positions |
| **Force Application** | Front & back yoke centers | Front & back yoke centers ✓ |
| **Current Range** | Assumed negative only | -15A to +15A |
| **Roll Calculation** | From yoke end heights | From yoke end heights ✓ |

## Physics Pipeline (Per Timestep)

1. **Action → Currents**
   ```
   PWM[4] → RL Circuit Model → Currents[4]
   ```

2. **State Measurement**
   ```
   Yoke End Positions[4] → Gap Heights[4] → Average per Yoke[2]
   ```

3. **Roll Calculation**
   ```
   (Gap_Right - Gap_Left) / Y_distance → Roll Angle
   ```

4. **Force Prediction**
   ```
   (currL, currR, roll, gap) → Maglev Predictor → (force, torque)
   Applied separately for front and back yokes
   ```

5. **Force Application**
   ```
   Forces at Y=0 for each yoke + Roll torques
   ```

6. **Observation Generation**
   ```
   Sensor Positions[4] → Gap Heights[4] → Add Noise → Observation[4]
   ```

## Info Dictionary

Each `env.step()` call returns the following diagnostics in its `info` dictionary:

```python
{
    'curr_front_L': float,      # Front left coil current (A)
    'curr_front_R': float,      # Front right coil current (A)
    'curr_back_L': float,       # Back left coil current (A)
    'curr_back_R': float,       # Back right coil current (A)
    'gap_front_yoke': float,    # Front yoke average gap (m)
    'gap_back_yoke': float,     # Back yoke average gap (m)
    'roll': float,              # Roll angle (rad)
    'force_front': float,       # Front yoke force (N)
    'force_back': float,        # Back yoke force (N)
    'torque_front': float,      # Front yoke torque (mN·m)
    'torque_back': float        # Back yoke torque (mN·m)
}
```
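
When logging or debugging, it can help to reduce these fields to a couple of aggregate quantities. The sketch below is one possible way to do that; `summarize_info` is a hypothetical helper, and the numbers in `example_info` are made up purely to exercise it. Note the torque fields are in mN·m, so they are scaled to N·m here.

```python
def summarize_info(info):
    """Derive aggregate quantities from one step's info dict."""
    total_force_n = info["force_front"] + info["force_back"]               # N
    net_torque_nm = (info["torque_front"] + info["torque_back"]) * 1e-3    # mN·m -> N·m
    return {"total_force_n": total_force_n, "net_torque_nm": net_torque_nm}


# Made-up values for illustration only (not real simulation output).
example_info = {
    "force_front": 30.0,
    "force_back": 28.0,
    "torque_front": 5.0,
    "torque_back": -2.0,
}
summary = summarize_info(example_info)
print(summary)
```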

## Testing

Run the updated test script:
```bash
cd "/Users/adipu/Documents/lev_control_4pt_small/RL Testing"
/opt/miniconda3/envs/RLenv/bin/python test_env.py
```

Expected behavior:
- 4 sensors report gap heights with small noise variations
- Yoke gaps (in info) match sensor gaps approximately
- All 4 coils build up current over time (RL circuit dynamics)
- Forces should be ~50-100N upward at 10mm gap with moderate currents
- Pod should begin to levitate once the total upward force exceeds its weight (5.8kg × 9.81 m/s² ≈ 56.9N)
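
The levitation threshold can be checked with a few lines of arithmetic. The equal split between front and back yokes is an assumption that holds only if the centre of mass sits midway between them.

```python
MASS_KG = 5.8   # pod mass, from the figure quoted above
G = 9.81        # gravitational acceleration (m/s^2)

weight_n = MASS_KG * G       # total lift needed to hover (N)
per_yoke_n = weight_n / 2    # assumes front and back yokes share the load equally
print(f"{weight_n:.3f} N total, {per_yoke_n:.3f} N per yoke")
# prints "56.898 N total, 28.449 N per yoke"
```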

## Next Steps for RL Training

1. **Frame Stacking**: Use 3-5 consecutive observations to give agent velocity information
   ```python
   from stable_baselines3.common.vec_env import VecFrameStack
   # env must already be a vectorized env (e.g. wrapped in DummyVecEnv)
   env = VecFrameStack(env, n_stack=4)
   ```

2. **Algorithm Selection**: PPO or SAC recommended
   - PPO: Good for continuous control, stable training
   - SAC: Better sample efficiency, handles stochastic dynamics

3. **Reward Tuning**: Current reward weights may need adjustment based on training performance

4. **Curriculum Learning**: Start with smaller gap errors, gradually increase difficulty

5. **Domain Randomization**: Vary sensor noise, mass, etc. for robust policy
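
To make the frame-stacking idea in step 1 concrete, the sketch below shows how gap velocity can be recovered from consecutive observations with a finite difference. It is illustrative only: `estimate_velocity` is a hypothetical helper, the `(n_stack, 4)` layout and the timestep `dt` are assumptions, and a stacking wrapper may arrange frames differently.

```python
import numpy as np


def estimate_velocity(stacked_obs, dt):
    """Finite-difference gap velocity (m/s) from the two most recent frames.

    stacked_obs: array of shape (n_stack, 4), oldest frame first.
    dt: time between consecutive observations (s).
    """
    stacked_obs = np.asarray(stacked_obs, dtype=float)
    return (stacked_obs[-1] - stacked_obs[-2]) / dt


# Example: all four gaps closing by 0.1 mm per step, with an assumed dt of 10 ms.
frames = np.array([[0.0103] * 4, [0.0102] * 4, [0.0101] * 4, [0.0100] * 4])
velocity = estimate_velocity(frames, dt=0.01)
print(velocity)
```

Differencing amplifies sensor noise: with independent noise of σ = 0.1mm on each frame, a two-point difference has a velocity noise standard deviation of √2·σ/dt, which is one reason to stack several frames rather than two.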