built environment
This commit is contained in:
208
RL Testing/ENV_UPDATE.md
Normal file
208
RL Testing/ENV_UPDATE.md
Normal file
@@ -0,0 +1,208 @@
|
||||
# Updated LevPodEnv - Physical System Clarification
|
||||
|
||||
## System Architecture
|
||||
|
||||
### Physical Configuration
|
||||
|
||||
**Two U-Shaped Magnetic Yokes:**
|
||||
- **Front Yoke**: Located at X = +0.1259m
|
||||
- Has two ends: Left (+Y = +0.0508m) and Right (-Y = -0.0508m)
|
||||
- Force is applied at center: X = +0.1259m, Y = 0m
|
||||
|
||||
- **Back Yoke**: Located at X = -0.1259m
|
||||
- Has two ends: Left (+Y = +0.0508m) and Right (-Y = -0.0508m)
|
||||
- Force is applied at center: X = -0.1259m, Y = 0m
|
||||
|
||||
**Four Independent Coil Currents:**
|
||||
1. `curr_front_L`: Current around front yoke's left (+Y) end
|
||||
2. `curr_front_R`: Current around front yoke's right (-Y) end
|
||||
3. `curr_back_L`: Current around back yoke's left (+Y) end
|
||||
4. `curr_back_R`: Current around back yoke's right (-Y) end
|
||||
|
||||
**Current Range:** -15A to +15A (from Ansys CSV data)
|
||||
- Negative current: Strengthens permanent magnet field → stronger attraction
|
||||
- Positive current: Weakens permanent magnet field → weaker attraction
|
||||
|
||||
### Collision Geometry in URDF
|
||||
|
||||
**Yoke Ends (4 boxes):** Represent the tips of the U-yokes where gap is measured
|
||||
- Front Left: (+0.1259m, +0.0508m, +0.08585m)
|
||||
- Front Right: (+0.1259m, -0.0508m, +0.08585m)
|
||||
- Back Left: (-0.1259m, +0.0508m, +0.08585m)
|
||||
- Back Right: (-0.1259m, -0.0508m, +0.08585m)
|
||||
|
||||
**Sensors (4 cylinders):** Physical gap sensors at different locations
|
||||
- Center Right: (0m, +0.0508m, +0.08585m)
|
||||
- Center Left: (0m, -0.0508m, +0.08585m)
|
||||
- Front: (+0.2366m, 0m, +0.08585m)
|
||||
- Back: (-0.2366m, 0m, +0.08585m)
|
||||
|
||||
## RL Environment Interface
|
||||
|
||||
### Action Space
|
||||
**Type:** `Box(4)`, Range: [-1, 1]
|
||||
|
||||
**Actions:** `[pwm_front_L, pwm_front_R, pwm_back_L, pwm_back_R]`
|
||||
- PWM duty cycles for the 4 independent coils
|
||||
- Converted to currents via RL circuit model: `di/dt = (V_pwm - I*R) / L`
|
||||
|
||||
### Observation Space
|
||||
**Type:** `Box(4)`, Range: [-inf, inf]
|
||||
|
||||
**Observations:** `[sensor_center_right, sensor_center_left, sensor_front, sensor_back]`
|
||||
- **Noisy sensor readings** (not direct yoke measurements)
|
||||
- Noise: Gaussian with σ = 0.1mm (0.0001m)
|
||||
- Agent must learn system dynamics from sensor data alone
|
||||
- Velocities not directly provided - agent can learn from temporal sequence if needed
|
||||
|
||||
### Force Application Physics
|
||||
|
||||
For each timestep:
|
||||
|
||||
1. **Measure yoke end gap heights** (from 4 yoke collision boxes)
|
||||
2. **Average left/right ends** for each U-yoke:
|
||||
- `avg_gap_front = (gap_front_L + gap_front_R) / 2`
|
||||
- `avg_gap_back = (gap_back_L + gap_back_R) / 2`
|
||||
|
||||
3. **Calculate roll angle** from yoke end positions:
|
||||
```python
|
||||
roll_front = arctan((gap_right - gap_left) / y_distance)
|
||||
roll_back = arctan((gap_right - gap_left) / y_distance)
|
||||
roll = (roll_front + roll_back) / 2
|
||||
```
|
||||
|
||||
4. **Predict forces** using maglev_predictor:
|
||||
```python
|
||||
force_front, torque_front = predictor.predict(
|
||||
curr_front_L, curr_front_R, roll_deg, gap_front_mm
|
||||
)
|
||||
force_back, torque_back = predictor.predict(
|
||||
curr_back_L, curr_back_R, roll_deg, gap_back_mm
|
||||
)
|
||||
```
|
||||
|
||||
5. **Apply forces at Y=0** (center of each U-yoke):
|
||||
- Front force at: `[+0.1259, 0, 0.08585]`
|
||||
- Back force at: `[-0.1259, 0, 0.08585]`
|
||||
|
||||
6. **Apply roll torques** from each yoke independently
|
||||
|
||||
### Key Design Decisions
|
||||
|
||||
**Why 4 actions instead of 2?**
|
||||
- Physical system has 4 independent electromagnets (one per yoke end)
|
||||
- Allows fine control of roll torque
|
||||
- Left/right current imbalance on each yoke creates torque
|
||||
|
||||
**Why sensor observations instead of yoke measurements?**
|
||||
- Realistic: sensors are at different positions than yokes
|
||||
- Adds partial observability challenge
|
||||
- Agent must learn system dynamics to infer unmeasured states
|
||||
- Sensor noise simulates real measurement uncertainty
|
||||
|
||||
**Why not include velocities in observation?**
|
||||
- Agent can learn velocities from temporal sequence (frame stacking)
|
||||
- Reduces observation dimensionality
|
||||
- Tests if agent can learn dynamic behavior from gap measurements alone
|
||||
|
||||
**Current sign convention:**
|
||||
- No conversion needed - currents fed directly to predictor
|
||||
- Range: -15A to +15A (from Ansys model)
|
||||
- Coil RL circuit naturally produces currents in this range
|
||||
|
||||
### Comparison with Original Design
|
||||
|
||||
| Feature | Original | Updated |
|
||||
|---------|----------|---------|
|
||||
| **Actions** | 2 (left/right coils) | 4 (front_L, front_R, back_L, back_R) |
|
||||
| **Observations** | 5 (gaps, roll, velocities) | 4 (noisy sensor gaps) |
|
||||
| **Gap Measurement** | Direct yoke positions | Noisy sensor positions |
|
||||
| **Force Application** | Front & back yoke centers | Front & back yoke centers ✓ |
|
||||
| **Current Range** | Assumed negative only | -15A to +15A |
|
||||
| **Roll Calculation** | From yoke end heights | From yoke end heights ✓ |
|
||||
|
||||
## Physics Pipeline (Per Timestep)
|
||||
|
||||
1. **Action → Currents**
|
||||
```
|
||||
PWM[4] → RL Circuit Model → Currents[4]
|
||||
```
|
||||
|
||||
2. **State Measurement**
|
||||
```
|
||||
Yoke End Positions[4] → Gap Heights[4] → Average per Yoke[2]
|
||||
```
|
||||
|
||||
3. **Roll Calculation**
|
||||
```
|
||||
(Gap_Right - Gap_Left) / Y_distance → Roll Angle
|
||||
```
|
||||
|
||||
4. **Force Prediction**
|
||||
```
|
||||
(currL, currR, roll, gap) → Maglev Predictor → (force, torque)
|
||||
Applied separately for front and back yokes
|
||||
```
|
||||
|
||||
5. **Force Application**
|
||||
```
|
||||
Forces at Y=0 for each yoke + Roll torques
|
||||
```
|
||||
|
||||
6. **Observation Generation**
|
||||
```
|
||||
Sensor Positions[4] → Gap Heights[4] → Add Noise → Observation[4]
|
||||
```
|
||||
|
||||
## Info Dictionary
|
||||
|
||||
Each `env.step()` returns comprehensive diagnostics:
|
||||
|
||||
```python
|
||||
{
|
||||
'curr_front_L': float, # Front left coil current (A)
|
||||
'curr_front_R': float, # Front right coil current (A)
|
||||
'curr_back_L': float, # Back left coil current (A)
|
||||
'curr_back_R': float, # Back right coil current (A)
|
||||
'gap_front_yoke': float, # Front yoke average gap (m)
|
||||
'gap_back_yoke': float, # Back yoke average gap (m)
|
||||
'roll': float, # Roll angle (rad)
|
||||
'force_front': float, # Front yoke force (N)
|
||||
'force_back': float, # Back yoke force (N)
|
||||
'torque_front': float, # Front yoke torque (mN·m)
|
||||
'torque_back': float # Back yoke torque (mN·m)
|
||||
}
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
Run the updated test script:
|
||||
```bash
|
||||
cd "/Users/adipu/Documents/lev_control_4pt_small/RL Testing"
|
||||
/opt/miniconda3/envs/RLenv/bin/python test_env.py
|
||||
```
|
||||
|
||||
Expected behavior:
|
||||
- 4 sensors report gap heights with small noise variations
|
||||
- Yoke gaps (in info) match sensor gaps approximately
|
||||
- All 4 coils build up current over time (RL circuit dynamics)
|
||||
- Forces should be ~50-100N upward at 10mm gap with moderate currents
|
||||
- Pod should begin to levitate if forces overcome gravity (5.8kg × 9.81 = 56.898 N needed)
|
||||
|
||||
## Next Steps for RL Training
|
||||
|
||||
1. **Frame Stacking**: Use 3-5 consecutive observations to give agent velocity information
|
||||
```python
|
||||
from stable_baselines3.common.vec_env import VecFrameStack
|
||||
env = VecFrameStack(env, n_stack=4)
|
||||
```
|
||||
|
||||
2. **Algorithm Selection**: PPO or SAC recommended
|
||||
- PPO: Good for continuous control, stable training
|
||||
- SAC: Better sample efficiency, handles stochastic dynamics
|
||||
|
||||
3. **Reward Tuning**: Current reward weights may need adjustment based on training performance
|
||||
|
||||
4. **Curriculum Learning**: Start with smaller gap errors, gradually increase difficulty
|
||||
|
||||
5. **Domain Randomization**: Vary sensor noise, mass, etc. for robust policy
|
||||
Reference in New Issue
Block a user