guadaloop_lev_control/RL Testing/ENV_UPDATE.md
2025-12-10 15:50:20 -06:00

# Updated LevPodEnv - Physical System Clarification
## System Architecture
### Physical Configuration
**Two U-Shaped Magnetic Yokes:**
- **Front Yoke**: Located at X = +0.1259m
  - Has two ends: Left (+Y = +0.0508m) and Right (-Y = -0.0508m)
  - Force is applied at center: X = +0.1259m, Y = 0m
- **Back Yoke**: Located at X = -0.1259m
  - Has two ends: Left (+Y = +0.0508m) and Right (-Y = -0.0508m)
  - Force is applied at center: X = -0.1259m, Y = 0m
**Four Independent Coil Currents:**
1. `curr_front_L`: Current around front yoke's left (+Y) end
2. `curr_front_R`: Current around front yoke's right (-Y) end
3. `curr_back_L`: Current around back yoke's left (+Y) end
4. `curr_back_R`: Current around back yoke's right (-Y) end
**Current Range:** -15A to +15A (from Ansys CSV data)
- Negative current: Strengthens permanent magnet field → stronger attraction
- Positive current: Weakens permanent magnet field → weaker attraction
### Collision Geometry in URDF
**Yoke Ends (4 boxes):** Represent the tips of the U-yokes where gap is measured
- Front Left: (+0.1259m, +0.0508m, +0.08585m)
- Front Right: (+0.1259m, -0.0508m, +0.08585m)
- Back Left: (-0.1259m, +0.0508m, +0.08585m)
- Back Right: (-0.1259m, -0.0508m, +0.08585m)
**Sensors (4 cylinders):** Physical gap sensors at different locations
- Center Right: (0m, +0.0508m, +0.08585m)
- Center Left: (0m, -0.0508m, +0.08585m)
- Front: (+0.2366m, 0m, +0.08585m)
- Back: (-0.2366m, 0m, +0.08585m)
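The sensors and the yoke ends sit at different (x, y) positions, so the yoke gaps have to be inferred from the sensor gaps. Below is a minimal sketch of one way to do that. It assumes the pod underside behaves as a rigid flat plate at small angles. It fits the plane gap(x, y) = h + a·x + b·y to the four sensor gaps by least squares and then evaluates that plane at the four yoke ends. `estimate_yoke_gaps` and the array names are hypothetical and are not part of LevPodEnv. The positions are copied from the lists above, and the sensor labels are kept exactly as listed.

```python
import numpy as np

# (x, y) positions in metres, copied from the URDF listings above
SENSOR_XY = np.array([
    [0.0, 0.0508],    # "Center Right" (as listed)
    [0.0, -0.0508],   # "Center Left" (as listed)
    [0.2366, 0.0],    # Front
    [-0.2366, 0.0],   # Back
])
YOKE_END_XY = np.array([
    [0.1259, 0.0508],    # Front Left
    [0.1259, -0.0508],   # Front Right
    [-0.1259, 0.0508],   # Back Left
    [-0.1259, -0.0508],  # Back Right
])


def estimate_yoke_gaps(sensor_gaps):
    """Estimate the 4 yoke-end gaps (m) from the 4 sensor gaps (m).

    Assumes a rigid, flat underside: gap(x, y) = h + a*x + b*y.
    """
    A = np.column_stack([np.ones(len(SENSOR_XY)), SENSOR_XY])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(sensor_gaps, dtype=float), rcond=None)
    B = np.column_stack([np.ones(len(YOKE_END_XY)), YOKE_END_XY])
    return B @ coeffs  # order: front L, front R, back L, back R
```

There are four sensors and only three plane parameters, so noise is partly averaged out. A policy trained on the raw observations has to learn an equivalent mapping implicitly.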
## RL Environment Interface
### Action Space
**Type:** `Box(4)`, Range: [-1, 1]
**Actions:** `[pwm_front_L, pwm_front_R, pwm_back_L, pwm_back_R]`
- PWM duty cycles for the 4 independent coils
- Converted to coil currents via a resistor-inductor (RL) circuit model: `di/dt = (V_pwm - I*R) / L`
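Below is a minimal sketch of the current update. It assumes forward-Euler integration and `V_pwm = pwm * v_supply`, with the signed duty cycle in [-1, 1]. The supply voltage, coil resistance, inductance and timestep are hypothetical parameters. The actual values and integrator in LevPodEnv may differ.

```python
import numpy as np


def rl_current_step(currents, pwm, dt, v_supply, r, l):
    """One forward-Euler step of di/dt = (V_pwm - I*R) / L for each coil.

    currents: the 4 coil currents (A)
    pwm: action vector in [-1, 1]; assumes V_pwm = pwm * v_supply
    dt (s), v_supply (V), r (ohm), l (H): hypothetical parameters
    """
    currents = np.asarray(currents, dtype=float)
    pwm = np.clip(np.asarray(pwm, dtype=float), -1.0, 1.0)
    v_pwm = pwm * v_supply
    di_dt = (v_pwm - currents * r) / l
    return currents + dt * di_dt
```

Under this model the steady-state current is `pwm * v_supply / r`, so those two parameters decide whether the coils can reach the ±15 A range. Forward Euler decays stably only for `dt < 2 * l / r`, and it avoids overshoot only for `dt < l / r`.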
### Observation Space
**Type:** `Box(4)`, Range: [-inf, inf]
**Observations:** `[sensor_center_right, sensor_center_left, sensor_front, sensor_back]`
- **Noisy sensor readings** (not direct yoke measurements)
- Noise: Gaussian with σ = 0.1mm (0.0001m)
- Agent must learn system dynamics from sensor data alone
- Velocities not directly provided - agent can learn from temporal sequence if needed
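Below is a minimal sketch of the observation step. It assumes the true sensor gaps are already available from the simulator and that a NumPy `Generator` supplies the noise. `make_observation` is a hypothetical helper.

```python
import numpy as np

SENSOR_NOISE_STD = 0.0001  # 0.1 mm, as stated above


def make_observation(true_sensor_gaps, rng):
    """Return noisy sensor gaps (m) in the order
    [center_right, center_left, front, back]."""
    true_sensor_gaps = np.asarray(true_sensor_gaps, dtype=np.float64)
    noise = rng.normal(0.0, SENSOR_NOISE_STD, size=true_sensor_gaps.shape)
    return (true_sensor_gaps + noise).astype(np.float32)


rng = np.random.default_rng(seed=0)
obs = make_observation([0.010, 0.010, 0.0105, 0.0095], rng)
```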
### Force Application Physics
For each timestep:
1. **Measure yoke end gap heights** (from 4 yoke collision boxes)
2. **Average left/right ends** for each U-yoke:
   - `avg_gap_front = (gap_front_L + gap_front_R) / 2`
   - `avg_gap_back = (gap_back_L + gap_back_R) / 2`
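3. **Calculate roll angle** from yoke end positions. Each yoke uses its own right/left gaps, and the two angles are then averaged:
   ```python
   import numpy as np
   y_distance = 2 * 0.0508  # spacing between left (+Y) and right (-Y) yoke ends (m)
   roll_front = np.arctan((gap_front_R - gap_front_L) / y_distance)
   roll_back = np.arctan((gap_back_R - gap_back_L) / y_distance)
   roll = (roll_front + roll_back) / 2  # rad
   ```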
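4. **Predict forces** using maglev_predictor. As its argument names indicate, it takes roll in degrees and gap in mm:
   ```python
   import numpy as np

   roll_deg = np.degrees(roll)            # rad -> deg
   gap_front_mm = avg_gap_front * 1000.0  # m -> mm
   gap_back_mm = avg_gap_back * 1000.0
   force_front, torque_front = predictor.predict(
       curr_front_L, curr_front_R, roll_deg, gap_front_mm
   )
   force_back, torque_back = predictor.predict(
       curr_back_L, curr_back_R, roll_deg, gap_back_mm
   )
   ```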
5. **Apply forces at Y=0** (center of each U-yoke):
   - Front force at: `[+0.1259, 0, 0.08585]`
   - Back force at: `[-0.1259, 0, 0.08585]`
6. **Apply roll torques** from each yoke independently
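The two predicted forces and torques combine into a net wrench on the pod body. Below is a minimal sketch that makes several assumptions:
- Both forces act along +Z at their yoke centers, and the origin is the pod's reference point.
- The predictor torques are roll torques about the body X axis, in mN·m (the unit given in the info dictionary below).

`net_wrench` is hypothetical. For illustration it also subtracts the pod weight (5.8 kg, from the testing notes); a physics engine would normally apply gravity itself.

```python
X_FRONT, X_BACK = 0.1259, -0.1259  # yoke center x positions (m)


def net_wrench(force_front, force_back, torque_front_mNm, torque_back_mNm,
               mass=5.8, g=9.81):
    """Return (net vertical force N, pitch torque N·m, roll torque N·m)."""
    fz = force_front + force_back - mass * g
    # r x F with r = (x, 0, z) and F = (0, 0, Fz) gives a Y torque of -x * Fz
    pitch_torque = -(X_FRONT * force_front + X_BACK * force_back)
    roll_torque = (torque_front_mNm + torque_back_mNm) / 1000.0  # mN·m -> N·m
    return fz, pitch_torque, roll_torque
```

Equal front and back forces give zero pitch torque, so any front/back force imbalance shows up as pitch.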
### Key Design Decisions
**Why 4 actions instead of 2?**
- Physical system has 4 independent electromagnets (one per yoke end)
- Allows fine control of roll torque
- Left/right current imbalance on each yoke creates torque
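One way to picture the left/right imbalance is to split each yoke's coil pair into two parts:
- a common-mode part, the average current;
- a differential part, half the difference.

This decomposition is an illustrative sketch, not something LevPodEnv is known to do. Roughly, the common mode sets a yoke's lift and the differential mode is the imbalance that produces roll torque. The predictor's actual mapping is nonlinear.

```python
def split_common_differential(curr_L, curr_R):
    """Decompose one yoke's coil pair: curr_L = c + d, curr_R = c - d."""
    common = (curr_L + curr_R) / 2.0
    diff = (curr_L - curr_R) / 2.0
    return common, diff


def combine_common_differential(common, diff):
    """Inverse of split_common_differential: returns (curr_L, curr_R)."""
    return common + diff, common - diff
```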
**Why sensor observations instead of yoke measurements?**
- Realistic: sensors are at different positions than yokes
- Adds partial observability challenge
- Agent must learn system dynamics to infer unmeasured states
- Sensor noise simulates real measurement uncertainty
**Why not include velocities in observation?**
- Agent can learn velocities from temporal sequence (frame stacking)
- Reduces observation dimensionality
- Tests if agent can learn dynamic behavior from gap measurements alone
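The simplest velocity estimate from stacked frames is a finite difference of consecutive observations. If each reading carries independent Gaussian noise of std σ, the difference has noise std √2·σ/dt. For example, with σ = 0.1 mm and a hypothetical control period of dt = 10 ms, that is about 0.014 m/s. This is one reason a policy or filter that uses several stacked frames can do better than a single difference. The helper names below are hypothetical.

```python
import numpy as np


def finite_difference_velocity(obs_prev, obs_curr, dt):
    """Gap rates (m/s) from two consecutive 4-sensor observations."""
    return (np.asarray(obs_curr, dtype=float) - np.asarray(obs_prev, dtype=float)) / dt


def velocity_noise_std(sensor_std, dt):
    """Noise std of that estimate if each reading carries independent
    Gaussian noise of std sensor_std: sqrt(2) * sensor_std / dt."""
    return np.sqrt(2.0) * sensor_std / dt


print(velocity_noise_std(0.0001, 0.01))  # about 0.0141 m/s
```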
**Current sign convention:**
- No conversion needed - currents fed directly to predictor
- Range: -15A to +15A (from Ansys model)
- Coil RL circuit naturally produces currents in this range
### Comparison with Original Design
| Feature | Original | Updated |
|---------|----------|---------|
| **Actions** | 2 (left/right coils) | 4 (front_L, front_R, back_L, back_R) |
| **Observations** | 5 (gaps, roll, velocities) | 4 (noisy sensor gaps) |
| **Gap Measurement** | Direct yoke positions | Noisy sensor positions |
| **Force Application** | Front & back yoke centers | Front & back yoke centers ✓ |
| **Current Range** | Assumed negative only | -15A to +15A |
| **Roll Calculation** | From yoke end heights | From yoke end heights ✓ |
## Physics Pipeline (Per Timestep)
1. **Action → Currents**
```
PWM[4] → RL Circuit Model → Currents[4]
```
2. **State Measurement**
```
Yoke End Positions[4] → Gap Heights[4] → Average per Yoke[2]
```
3. **Roll Calculation**
```
arctan((Gap_Right - Gap_Left) / Y_distance), averaged over both yokes → Roll Angle
```
4. **Force Prediction**
```
(currL, currR, roll, gap) → Maglev Predictor → (force, torque)
Applied separately for front and back yokes
```
5. **Force Application**
```
Forces at Y=0 for each yoke + Roll torques
```
6. **Observation Generation**
```
Sensor Positions[4] → Gap Heights[4] → Add Noise → Observation[4]
```
## Info Dictionary
The `info` dict returned by each `env.step()` call contains these diagnostics:
```python
{
'curr_front_L': float, # Front left coil current (A)
'curr_front_R': float, # Front right coil current (A)
'curr_back_L': float, # Back left coil current (A)
'curr_back_R': float, # Back right coil current (A)
'gap_front_yoke': float, # Front yoke average gap (m)
'gap_back_yoke': float, # Back yoke average gap (m)
'roll': float, # Roll angle (rad)
'force_front': float, # Front yoke force (N)
'force_back': float, # Back yoke force (N)
'torque_front': float, # Front yoke torque (mN·m)
'torque_back': float # Back yoke torque (mN·m)
}
```
## Testing
Run the updated test script:
```bash
cd "RL Testing"  # from the repository root
conda run -n RLenv python test_env.py
```
Expected behavior:
- 4 sensors report gap heights with small noise variations
- Yoke gaps (in info) match sensor gaps approximately
- All 4 coils build up current over time (RL circuit dynamics)
- Forces should be ~50-100N upward at 10mm gap with moderate currents
- Pod should begin to levitate once the combined force from both yokes exceeds its weight (5.8 kg × 9.81 m/s² = 56.898 N)
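Below is a minimal sketch for checking a rollout against these expectations. It assumes the `info` dicts returned by `env.step()` have been collected into a list. `summarize_rollout` is a hypothetical helper that uses the keys from the info dictionary above.

```python
WEIGHT_N = 5.8 * 9.81  # ≈ 56.898 N
COIL_KEYS = ("curr_front_L", "curr_front_R", "curr_back_L", "curr_back_R")


def summarize_rollout(infos):
    """Summarize a list of env.step() info dicts against the expectations above."""
    max_abs_current = max(abs(info[k]) for info in infos for k in COIL_KEYS)
    max_total_force = max(info["force_front"] + info["force_back"] for info in infos)
    return {
        "max_abs_current_A": max_abs_current,
        "currents_in_ansys_range": max_abs_current <= 15.0,
        "max_total_force_N": max_total_force,
        "lift_exceeds_weight": max_total_force > WEIGHT_N,
    }
```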
## Next Steps for RL Training
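1. **Frame Stacking**: Use 3-5 consecutive observations to give the agent velocity information
   ```python
   from stable_baselines3.common.env_util import make_vec_env
   from stable_baselines3.common.vec_env import VecFrameStack
   vec_env = make_vec_env(LevPodEnv, n_envs=1)  # LevPodEnv: this project's env class
   env = VecFrameStack(vec_env, n_stack=4)  # needs a VecEnv, not a raw gym.Env
   ```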
2. **Algorithm Selection**: PPO or SAC recommended
- PPO: Good for continuous control, stable training
- SAC: Better sample efficiency, handles stochastic dynamics
3. **Reward Tuning**: The existing reward weights may need adjustment based on training performance
4. **Curriculum Learning**: Start with small initial gap errors and widen them gradually as training progresses
5. **Domain Randomization**: Vary sensor noise, mass, etc. for robust policy
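Below is a minimal sketch of how items 4 and 5 could be combined at episode reset. Every range here is a hypothetical choice: a 3 mm maximum initial gap offset, ±10 % mass and ±50 % sensor noise. `sample_episode_params` is not part of LevPodEnv. The nominal mass and noise come from this document.

```python
import numpy as np

NOMINAL_MASS_KG = 5.8          # from this document
NOMINAL_SENSOR_STD_M = 0.0001  # 0.1 mm, from this document


def sample_episode_params(progress, rng, max_gap_offset_m=0.003,
                          mass_spread=0.10, noise_spread=0.5):
    """Hypothetical reset-time sampler.

    progress in [0, 1] widens the initial gap-offset range (curriculum);
    mass and sensor noise are drawn uniformly around nominal values
    (domain randomization).
    """
    progress = float(np.clip(progress, 0.0, 1.0))
    gap_range = max_gap_offset_m * progress
    return {
        "initial_gap_offset_m": rng.uniform(-gap_range, gap_range),
        "mass_kg": NOMINAL_MASS_KG * rng.uniform(1 - mass_spread, 1 + mass_spread),
        "sensor_std_m": NOMINAL_SENSOR_STD_M * rng.uniform(1 - noise_spread, 1 + noise_spread),
    }
```

`progress` could, for example, be the fraction of total training timesteps completed so far.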