209 lines
7.1 KiB
Markdown
209 lines
7.1 KiB
Markdown
|
|
# Updated LevPodEnv - Physical System Clarification
|
|||
|
|
|
|||
|
|
## System Architecture
|
|||
|
|
|
|||
|
|
### Physical Configuration
|
|||
|
|
|
|||
|
|
**Two U-Shaped Magnetic Yokes:**
|
|||
|
|
- **Front Yoke**: Located at X = +0.1259m
|
|||
|
|
- Has two ends: Left (+Y = +0.0508m) and Right (-Y = -0.0508m)
|
|||
|
|
- Force is applied at center: X = +0.1259m, Y = 0m
|
|||
|
|
|
|||
|
|
- **Back Yoke**: Located at X = -0.1259m
|
|||
|
|
- Has two ends: Left (+Y = +0.0508m) and Right (-Y = -0.0508m)
|
|||
|
|
- Force is applied at center: X = -0.1259m, Y = 0m
|
|||
|
|
|
|||
|
|
**Four Independent Coil Currents:**
|
|||
|
|
1. `curr_front_L`: Current around front yoke's left (+Y) end
|
|||
|
|
2. `curr_front_R`: Current around front yoke's right (-Y) end
|
|||
|
|
3. `curr_back_L`: Current around back yoke's left (+Y) end
|
|||
|
|
4. `curr_back_R`: Current around back yoke's right (-Y) end
|
|||
|
|
|
|||
|
|
**Current Range:** -15A to +15A (from Ansys CSV data)
|
|||
|
|
- Negative current: Strengthens permanent magnet field → stronger attraction
|
|||
|
|
- Positive current: Weakens permanent magnet field → weaker attraction
|
|||
|
|
|
|||
|
|
### Collision Geometry in URDF
|
|||
|
|
|
|||
|
|
**Yoke Ends (4 boxes):** Represent the tips of the U-yokes where gap is measured
|
|||
|
|
- Front Left: (+0.1259m, +0.0508m, +0.08585m)
|
|||
|
|
- Front Right: (+0.1259m, -0.0508m, +0.08585m)
|
|||
|
|
- Back Left: (-0.1259m, +0.0508m, +0.08585m)
|
|||
|
|
- Back Right: (-0.1259m, -0.0508m, +0.08585m)
|
|||
|
|
|
|||
|
|
**Sensors (4 cylinders):** Physical gap sensors at different locations
|
|||
|
|
- Center Right: (0m, +0.0508m, +0.08585m)
|
|||
|
|
- Center Left: (0m, -0.0508m, +0.08585m)
|
|||
|
|
- Front: (+0.2366m, 0m, +0.08585m)
|
|||
|
|
- Back: (-0.2366m, 0m, +0.08585m)
|
|||
|
|
|
|||
|
|
## RL Environment Interface
|
|||
|
|
|
|||
|
|
### Action Space
|
|||
|
|
**Type:** `Box(4)`, Range: [-1, 1]
|
|||
|
|
|
|||
|
|
**Actions:** `[pwm_front_L, pwm_front_R, pwm_back_L, pwm_back_R]`
|
|||
|
|
- PWM duty cycles for the 4 independent coils
|
|||
|
|
- Converted to currents via RL circuit model: `di/dt = (V_pwm - I*R) / L`
|
|||
|
|
|
|||
|
|
### Observation Space
|
|||
|
|
**Type:** `Box(4)`, Range: [-inf, inf]
|
|||
|
|
|
|||
|
|
**Observations:** `[sensor_center_right, sensor_center_left, sensor_front, sensor_back]`
|
|||
|
|
- **Noisy sensor readings** (not direct yoke measurements)
|
|||
|
|
- Noise: Gaussian with σ = 0.1mm (0.0001m)
|
|||
|
|
- Agent must learn system dynamics from sensor data alone
|
|||
|
|
- Velocities not directly provided - agent can learn from temporal sequence if needed
|
|||
|
|
|
|||
|
|
### Force Application Physics
|
|||
|
|
|
|||
|
|
For each timestep:
|
|||
|
|
|
|||
|
|
1. **Measure yoke end gap heights** (from 4 yoke collision boxes)
|
|||
|
|
2. **Average left/right ends** for each U-yoke:
|
|||
|
|
- `avg_gap_front = (gap_front_L + gap_front_R) / 2`
|
|||
|
|
- `avg_gap_back = (gap_back_L + gap_back_R) / 2`
|
|||
|
|
|
|||
|
|
3. **Calculate roll angle** from yoke end positions:
|
|||
|
|
```python
|
|||
|
|
roll_front = arctan((gap_right - gap_left) / y_distance)
|
|||
|
|
roll_back = arctan((gap_right - gap_left) / y_distance)
|
|||
|
|
roll = (roll_front + roll_back) / 2
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
4. **Predict forces** using maglev_predictor:
|
|||
|
|
```python
|
|||
|
|
force_front, torque_front = predictor.predict(
|
|||
|
|
curr_front_L, curr_front_R, roll_deg, gap_front_mm
|
|||
|
|
)
|
|||
|
|
force_back, torque_back = predictor.predict(
|
|||
|
|
curr_back_L, curr_back_R, roll_deg, gap_back_mm
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
5. **Apply forces at Y=0** (center of each U-yoke):
|
|||
|
|
- Front force at: `[+0.1259, 0, 0.08585]`
|
|||
|
|
- Back force at: `[-0.1259, 0, 0.08585]`
|
|||
|
|
|
|||
|
|
6. **Apply roll torques** from each yoke independently
|
|||
|
|
|
|||
|
|
### Key Design Decisions
|
|||
|
|
|
|||
|
|
**Why 4 actions instead of 2?**
|
|||
|
|
- Physical system has 4 independent electromagnets (one per yoke end)
|
|||
|
|
- Allows fine control of roll torque
|
|||
|
|
- Left/right current imbalance on each yoke creates torque
|
|||
|
|
|
|||
|
|
**Why sensor observations instead of yoke measurements?**
|
|||
|
|
- Realistic: sensors are at different positions than yokes
|
|||
|
|
- Adds partial observability challenge
|
|||
|
|
- Agent must learn system dynamics to infer unmeasured states
|
|||
|
|
- Sensor noise simulates real measurement uncertainty
|
|||
|
|
|
|||
|
|
**Why not include velocities in observation?**
|
|||
|
|
- Agent can learn velocities from temporal sequence (frame stacking)
|
|||
|
|
- Reduces observation dimensionality
|
|||
|
|
- Tests if agent can learn dynamic behavior from gap measurements alone
|
|||
|
|
|
|||
|
|
**Current sign convention:**
|
|||
|
|
- No conversion needed - currents fed directly to predictor
|
|||
|
|
- Range: -15A to +15A (from Ansys model)
|
|||
|
|
- Coil RL circuit naturally produces currents in this range
|
|||
|
|
|
|||
|
|
### Comparison with Original Design
|
|||
|
|
|
|||
|
|
| Feature | Original | Updated |
|
|||
|
|
|---------|----------|---------|
|
|||
|
|
| **Actions** | 2 (left/right coils) | 4 (front_L, front_R, back_L, back_R) |
|
|||
|
|
| **Observations** | 5 (gaps, roll, velocities) | 4 (noisy sensor gaps) |
|
|||
|
|
| **Gap Measurement** | Direct yoke positions | Noisy sensor positions |
|
|||
|
|
| **Force Application** | Front & back yoke centers | Front & back yoke centers ✓ |
|
|||
|
|
| **Current Range** | Assumed negative only | -15A to +15A |
|
|||
|
|
| **Roll Calculation** | From yoke end heights | From yoke end heights ✓ |
|
|||
|
|
|
|||
|
|
## Physics Pipeline (Per Timestep)
|
|||
|
|
|
|||
|
|
1. **Action → Currents**
|
|||
|
|
```
|
|||
|
|
PWM[4] → RL Circuit Model → Currents[4]
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
2. **State Measurement**
|
|||
|
|
```
|
|||
|
|
Yoke End Positions[4] → Gap Heights[4] → Average per Yoke[2]
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
3. **Roll Calculation**
|
|||
|
|
```
|
|||
|
|
(Gap_Right - Gap_Left) / Y_distance → Roll Angle
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
4. **Force Prediction**
|
|||
|
|
```
|
|||
|
|
(currL, currR, roll, gap) → Maglev Predictor → (force, torque)
|
|||
|
|
Applied separately for front and back yokes
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
5. **Force Application**
|
|||
|
|
```
|
|||
|
|
Forces at Y=0 for each yoke + Roll torques
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
6. **Observation Generation**
|
|||
|
|
```
|
|||
|
|
Sensor Positions[4] → Gap Heights[4] → Add Noise → Observation[4]
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## Info Dictionary
|
|||
|
|
|
|||
|
|
Each `env.step()` returns comprehensive diagnostics:
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
{
|
|||
|
|
'curr_front_L': float, # Front left coil current (A)
|
|||
|
|
'curr_front_R': float, # Front right coil current (A)
|
|||
|
|
'curr_back_L': float, # Back left coil current (A)
|
|||
|
|
'curr_back_R': float, # Back right coil current (A)
|
|||
|
|
'gap_front_yoke': float, # Front yoke average gap (m)
|
|||
|
|
'gap_back_yoke': float, # Back yoke average gap (m)
|
|||
|
|
'roll': float, # Roll angle (rad)
|
|||
|
|
'force_front': float, # Front yoke force (N)
|
|||
|
|
'force_back': float, # Back yoke force (N)
|
|||
|
|
'torque_front': float, # Front yoke torque (mN·m)
|
|||
|
|
'torque_back': float # Back yoke torque (mN·m)
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## Testing
|
|||
|
|
|
|||
|
|
Run the updated test script:
|
|||
|
|
```bash
|
|||
|
|
cd "/Users/adipu/Documents/lev_control_4pt_small/RL Testing"
|
|||
|
|
/opt/miniconda3/envs/RLenv/bin/python test_env.py
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Expected behavior:
|
|||
|
|
- 4 sensors report gap heights with small noise variations
|
|||
|
|
- Yoke gaps (in info) match sensor gaps approximately
|
|||
|
|
- All 4 coils build up current over time (RL circuit dynamics)
|
|||
|
|
- Forces should be ~50-100N upward at 10mm gap with moderate currents
|
|||
|
|
- Pod should begin to levitate if forces overcome gravity (5.8kg × 9.81 = 56.898 N needed)
|
|||
|
|
|
|||
|
|
## Next Steps for RL Training
|
|||
|
|
|
|||
|
|
1. **Frame Stacking**: Use 3-5 consecutive observations to give agent velocity information
|
|||
|
|
```python
|
|||
|
|
from stable_baselines3.common.vec_env import VecFrameStack
|
|||
|
|
env = VecFrameStack(env, n_stack=4)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
2. **Algorithm Selection**: PPO or SAC recommended
|
|||
|
|
- PPO: Good for continuous control, stable training
|
|||
|
|
- SAC: Better sample efficiency, handles stochastic dynamics
|
|||
|
|
|
|||
|
|
3. **Reward Tuning**: Current reward weights may need adjustment based on training performance
|
|||
|
|
|
|||
|
|
4. **Curriculum Learning**: Start with smaller gap errors, gradually increase difficulty
|
|||
|
|
|
|||
|
|
5. **Domain Randomization**: Vary sensor noise, mass, etc. for robust policy
|