Updated LevPodEnv - Physical System Clarification
System Architecture
Physical Configuration
Two U-Shaped Magnetic Yokes:
- Front Yoke: Located at X = +0.1259m
  - Has two ends: Left (+Y = +0.0508m) and Right (-Y = -0.0508m)
  - Force is applied at center: X = +0.1259m, Y = 0m
- Back Yoke: Located at X = -0.1259m
  - Has two ends: Left (+Y = +0.0508m) and Right (-Y = -0.0508m)
  - Force is applied at center: X = -0.1259m, Y = 0m
Four Independent Coil Currents:
- curr_front_L: Current around front yoke's left (+Y) end
- curr_front_R: Current around front yoke's right (-Y) end
- curr_back_L: Current around back yoke's left (+Y) end
- curr_back_R: Current around back yoke's right (-Y) end
Current Range: -15A to +15A (from Ansys CSV data)
- Negative current: Strengthens permanent magnet field → stronger attraction
- Positive current: Weakens permanent magnet field → weaker attraction
Collision Geometry in URDF
Yoke Ends (4 boxes): Represent the tips of the U-yokes where gap is measured
- Front Left: (+0.1259m, +0.0508m, +0.08585m)
- Front Right: (+0.1259m, -0.0508m, +0.08585m)
- Back Left: (-0.1259m, +0.0508m, +0.08585m)
- Back Right: (-0.1259m, -0.0508m, +0.08585m)
Sensors (4 cylinders): Physical gap sensors at different locations
- Center Right: (0m, +0.0508m, +0.08585m)
- Center Left: (0m, -0.0508m, +0.08585m)
- Front: (+0.2366m, 0m, +0.08585m)
- Back: (-0.2366m, 0m, +0.08585m)
RL Environment Interface
Action Space
Type: Box(4), Range: [-1, 1]
Actions: [pwm_front_L, pwm_front_R, pwm_back_L, pwm_back_R]
- PWM duty cycles for the 4 independent coils
- Converted to currents via a resistor-inductor (R-L) circuit model of each coil:
di/dt = (V_pwm - I*R) / L
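The circuit equation above can be integrated numerically once per simulation step. The following is a minimal sketch using forward Euler, not the environment's actual code. The function name `step_coil_current` and the values for `v_supply`, `resistance`, `inductance` and `dt` are placeholders chosen for illustration, and the assumption that the duty cycle scales the supply voltage linearly is not confirmed by this document.

```python
def step_coil_current(current, pwm, dt, v_supply=12.0, resistance=1.0, inductance=0.01):
    """Advance one coil current by one forward-Euler step of di/dt = (V_pwm - I*R) / L."""
    pwm = max(-1.0, min(1.0, pwm))      # actions are limited to [-1, 1]
    v_pwm = pwm * v_supply              # assumed: duty cycle scales the supply voltage
    di_dt = (v_pwm - current * resistance) / inductance
    return current + di_dt * dt


# Hold a constant 50% duty cycle for 0.1 s (1000 steps of 0.1 ms).
current = 0.0
for _ in range(1000):
    current = step_coil_current(current, pwm=0.5, dt=1e-4)
```

With these placeholder values the current settles toward V_pwm / R, and the time constant L / R sets how quickly it gets there. Forward Euler is only stable when dt is well below L / R, so a real implementation may prefer the exact exponential update.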
Observation Space
Type: Box(4), Range: [-inf, inf]
Observations: [sensor_center_right, sensor_center_left, sensor_front, sensor_back]
- Noisy sensor readings (not direct yoke measurements)
- Noise: Gaussian with σ = 0.1mm (0.0001m)
- Agent must learn system dynamics from sensor data alone
- Velocities not directly provided - agent can learn from temporal sequence if needed
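As an illustration of the noise model described above, the sketch below adds zero-mean Gaussian noise with σ = 0.0001 m to four true sensor gaps. The helper name `noisy_observation` and the example gap of 10 mm are hypothetical; how the environment actually draws its noise is not shown in this document.

```python
import numpy as np

SENSOR_NOISE_STD_M = 1e-4  # sigma = 0.1 mm, as stated above


def noisy_observation(true_gaps_m, rng):
    """Return the 4 sensor gaps (m) with independent Gaussian noise added."""
    true_gaps_m = np.asarray(true_gaps_m, dtype=np.float64)
    noise = rng.normal(0.0, SENSOR_NOISE_STD_M, size=true_gaps_m.shape)
    return (true_gaps_m + noise).astype(np.float32)


rng = np.random.default_rng(0)          # seeded so runs are repeatable
obs = noisy_observation([0.010, 0.010, 0.010, 0.010], rng)
```

The order of the four entries would follow the observation vector: center right, center left, front, back.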
Force Application Physics
For each timestep:
- Measure yoke end gap heights (from 4 yoke collision boxes)
- Average left/right ends for each U-yoke:
  avg_gap_front = (gap_front_L + gap_front_R) / 2
  avg_gap_back = (gap_back_L + gap_back_R) / 2
- Calculate roll angle from yoke end positions:
  roll_front = arctan((gap_front_R - gap_front_L) / y_distance)
  roll_back = arctan((gap_back_R - gap_back_L) / y_distance)
  roll = (roll_front + roll_back) / 2
- Predict forces using maglev_predictor (roll in degrees, average gap in mm):
  force_front, torque_front = predictor.predict(curr_front_L, curr_front_R, roll_deg, gap_front_mm)
  force_back, torque_back = predictor.predict(curr_back_L, curr_back_R, roll_deg, gap_back_mm)
- Apply forces at Y=0 (center of each U-yoke):
  - Front force at: [+0.1259, 0, 0.08585]
  - Back force at: [-0.1259, 0, 0.08585]
- Apply roll torques from each yoke independently
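The gap averaging and roll steps above can be sketched in a few lines. This is an illustration only: the helper `yoke_gap_and_roll` is hypothetical, `Y_DISTANCE_M` is assumed to be the spacing between the two yoke ends (2 × 0.0508 m, taken from the geometry section), and the example gaps are made up. The call to the predictor is left out because its implementation is not part of this document.

```python
import math

Y_DISTANCE_M = 2 * 0.0508  # assumed: spacing between the left and right yoke ends


def yoke_gap_and_roll(gap_left_m, gap_right_m, y_distance_m=Y_DISTANCE_M):
    """Return (average gap in m, roll angle in rad) for one U-yoke."""
    avg_gap = (gap_left_m + gap_right_m) / 2
    roll = math.atan((gap_right_m - gap_left_m) / y_distance_m)
    return avg_gap, roll


# Made-up example: left ends 0.5 mm higher than nominal, right ends 0.5 mm lower.
avg_front, roll_front = yoke_gap_and_roll(0.0105, 0.0095)
avg_back, roll_back = yoke_gap_and_roll(0.0105, 0.0095)
roll = (roll_front + roll_back) / 2

# Unit conversions implied by the predictor's argument names.
roll_deg = math.degrees(roll)
gap_front_mm = avg_front * 1000.0
gap_back_mm = avg_back * 1000.0
```

Gaps are kept in metres and roll in radians until the last step, matching the units listed in the info dictionary, and are converted only for the predictor inputs.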
Key Design Decisions
Why 4 actions instead of 2?
- Physical system has 4 independent electromagnets (one per yoke end)
- Allows fine control of roll torque
- Left/right current imbalance on each yoke creates torque
Why sensor observations instead of yoke measurements?
- Realistic: sensors are at different positions than yokes
- Adds partial observability challenge
- Agent must learn system dynamics to infer unmeasured states
- Sensor noise simulates real measurement uncertainty
Why not include velocities in observation?
- Agent can learn velocities from temporal sequence (frame stacking)
- Reduces observation dimensionality
- Tests if agent can learn dynamic behavior from gap measurements alone
Current sign convention:
- No conversion needed - currents fed directly to predictor
- Range: -15A to +15A (from Ansys model)
- Coil RL circuit naturally produces currents in this range
Comparison with Original Design
| Feature | Original | Updated |
|---|---|---|
| Actions | 2 (left/right coils) | 4 (front_L, front_R, back_L, back_R) |
| Observations | 5 (gaps, roll, velocities) | 4 (noisy sensor gaps) |
| Gap Measurement | Direct yoke positions | Noisy sensor positions |
| Force Application | Front & back yoke centers | Front & back yoke centers ✓ |
| Current Range | Assumed negative only | -15A to +15A |
| Roll Calculation | From yoke end heights | From yoke end heights ✓ |
Physics Pipeline (Per Timestep)
- Action → Currents
  PWM[4] → RL Circuit Model → Currents[4]
- State Measurement
  Yoke End Positions[4] → Gap Heights[4] → Average per Yoke[2]
- Roll Calculation
  arctan((Gap_Right - Gap_Left) / Y_distance) → Roll Angle
- Force Prediction
  (currL, currR, roll, gap) → Maglev Predictor → (force, torque)
  Applied separately for front and back yokes
- Force Application
  Forces at Y=0 for each yoke + Roll torques
- Observation Generation
  Sensor Positions[4] → Gap Heights[4] → Add Noise → Observation[4]
Info Dictionary
Each env.step() returns comprehensive diagnostics:
{
'curr_front_L': float, # Front left coil current (A)
'curr_front_R': float, # Front right coil current (A)
'curr_back_L': float, # Back left coil current (A)
'curr_back_R': float, # Back right coil current (A)
'gap_front_yoke': float, # Front yoke average gap (m)
'gap_back_yoke': float, # Back yoke average gap (m)
'roll': float, # Roll angle (rad)
'force_front': float, # Front yoke force (N)
'force_back': float, # Back yoke force (N)
'torque_front': float, # Front yoke torque (mN·m)
'torque_back': float # Back yoke torque (mN·m)
}
Testing
Run the updated test script:
cd "/Users/adipu/Documents/lev_control_4pt_small/RL Testing"
/opt/miniconda3/envs/RLenv/bin/python test_env.py
Expected behavior:
- 4 sensors report gap heights with small noise variations
- Yoke gaps (in info) match sensor gaps approximately
- All 4 coils build up current over time (RL circuit dynamics)
- Forces should be ~50-100N upward at 10mm gap with moderate currents
- Pod should begin to levitate once the combined upward force exceeds its weight (5.8 kg × 9.81 m/s² = 56.898 N)
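The levitation condition in the last item can be checked directly from the info dictionary. The sketch below assumes that `force_front` and `force_back` are reported as positive when they act upward, which this document does not state explicitly. The helper `net_vertical_force` and the example force values are hypothetical.

```python
G = 9.81             # m/s^2
POD_MASS_KG = 5.8    # pod mass quoted above


def net_vertical_force(info, mass_kg=POD_MASS_KG):
    """Return total yoke force minus weight (N); positive means the pod accelerates upward."""
    weight = mass_kg * G
    return info["force_front"] + info["force_back"] - weight


# Made-up values standing in for the info dict returned by env.step().
example_info = {"force_front": 30.0, "force_back": 30.0}
net = net_vertical_force(example_info)
```

Logging this quantity during the test run would give a quick indication of whether the coils are producing enough force before looking at the gap traces.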
Next Steps for RL Training
- Frame Stacking: Use 3-5 consecutive observations to give agent velocity information
  from stable_baselines3.common.vec_env import VecFrameStack
  env = VecFrameStack(env, n_stack=4)  # env must already be a VecEnv (e.g. wrapped in DummyVecEnv)
- Algorithm Selection: PPO or SAC recommended
  - PPO: Good for continuous control, stable training
  - SAC: Better sample efficiency, handles stochastic dynamics
- Reward Tuning: Current reward weights may need adjustment based on training performance
- Curriculum Learning: Start with smaller gap errors, gradually increase difficulty
- Domain Randomization: Vary sensor noise, mass, etc. for robust policy
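To make the frame-stacking item concrete, the sketch below shows the kind of velocity information a policy could recover from consecutive gap observations by finite differences. It is an illustration rather than part of the environment: the helper `gap_velocity_from_stack` is hypothetical, the control period `DT` is a placeholder, and the stacked array is assumed to be arranged as (n_stack, 4) with the oldest frame first. A wrapper that concatenates frames into one flat vector would need a reshape first.

```python
import numpy as np


def gap_velocity_from_stack(stacked_obs, dt):
    """Estimate per-sensor gap velocity (m/s) from the two newest stacked frames."""
    frames = np.asarray(stacked_obs, dtype=np.float64)  # shape (n_stack, 4), oldest first
    return (frames[-1] - frames[-2]) / dt


DT = 0.01  # hypothetical control period in seconds

# Made-up noise-free frames: every sensor gap grows by 0.1 mm per step.
frames = [[0.0100] * 4, [0.0101] * 4, [0.0102] * 4, [0.0103] * 4]
velocity = gap_velocity_from_stack(frames, DT)
```

With real observations the difference of two readings carries noise with standard deviation √2 × σ, which is then divided by dt, so a two-frame estimate is noisy. That is one reason to stack more than two frames and let the policy learn its own filter.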