Files
guadaloop_lev_control/RL Testing/ENV_UPDATE.md
2025-12-10 15:50:20 -06:00

7.1 KiB
Raw Blame History

Updated LevPodEnv - Physical System Clarification

System Architecture

Physical Configuration

Two U-Shaped Magnetic Yokes:

  • Front Yoke: Located at X = +0.1259m

    • Has two ends: Left (+Y = +0.0508m) and Right (-Y = -0.0508m)
    • Force is applied at center: X = +0.1259m, Y = 0m
  • Back Yoke: Located at X = -0.1259m

    • Has two ends: Left (+Y = +0.0508m) and Right (-Y = -0.0508m)
    • Force is applied at center: X = -0.1259m, Y = 0m

Four Independent Coil Currents:

  1. curr_front_L: Current around front yoke's left (+Y) end
  2. curr_front_R: Current around front yoke's right (-Y) end
  3. curr_back_L: Current around back yoke's left (+Y) end
  4. curr_back_R: Current around back yoke's right (-Y) end

Current Range: -15A to +15A (from Ansys CSV data)

  • Negative current: Strengthens permanent magnet field → stronger attraction
  • Positive current: Weakens permanent magnet field → weaker attraction

Collision Geometry in URDF

Yoke Ends (4 boxes): Represent the tips of the U-yokes where gap is measured

  • Front Left: (+0.1259m, +0.0508m, +0.08585m)
  • Front Right: (+0.1259m, -0.0508m, +0.08585m)
  • Back Left: (-0.1259m, +0.0508m, +0.08585m)
  • Back Right: (-0.1259m, -0.0508m, +0.08585m)

Sensors (4 cylinders): Physical gap sensors at different locations

  • Center Right: (0m, +0.0508m, +0.08585m)
  • Center Left: (0m, -0.0508m, +0.08585m)
  • Front: (+0.2366m, 0m, +0.08585m)
  • Back: (-0.2366m, 0m, +0.08585m)

RL Environment Interface

Action Space

Type: Box(4), Range: [-1, 1]

Actions: [pwm_front_L, pwm_front_R, pwm_back_L, pwm_back_R]

  • PWM duty cycles for the 4 independent coils
  • Converted to currents via RL circuit model: di/dt = (V_pwm - I*R) / L

Observation Space

Type: Box(4), Range: [-inf, inf]

Observations: [sensor_center_right, sensor_center_left, sensor_front, sensor_back]

  • Noisy sensor readings (not direct yoke measurements)
  • Noise: Gaussian with σ = 0.1mm (0.0001m)
  • Agent must learn system dynamics from sensor data alone
  • Velocities not directly provided - agent can learn from temporal sequence if needed

Force Application Physics

For each timestep:

  1. Measure yoke end gap heights (from 4 yoke collision boxes)

  2. Average left/right ends for each U-yoke:

    • avg_gap_front = (gap_front_L + gap_front_R) / 2
    • avg_gap_back = (gap_back_L + gap_back_R) / 2
  3. Calculate roll angle from yoke end positions:

    roll_front = arctan((gap_right - gap_left) / y_distance)
    roll_back = arctan((gap_right - gap_left) / y_distance)
    roll = (roll_front + roll_back) / 2
    
  4. Predict forces using maglev_predictor:

    force_front, torque_front = predictor.predict(
        curr_front_L, curr_front_R, roll_deg, gap_front_mm
    )
    force_back, torque_back = predictor.predict(
        curr_back_L, curr_back_R, roll_deg, gap_back_mm
    )
    
  5. Apply forces at Y=0 (center of each U-yoke):

    • Front force at: [+0.1259, 0, 0.08585]
    • Back force at: [-0.1259, 0, 0.08585]
  6. Apply roll torques from each yoke independently

Key Design Decisions

Why 4 actions instead of 2?

  • Physical system has 4 independent electromagnets (one per yoke end)
  • Allows fine control of roll torque
  • Left/right current imbalance on each yoke creates torque

Why sensor observations instead of yoke measurements?

  • Realistic: sensors are at different positions than yokes
  • Adds partial observability challenge
  • Agent must learn system dynamics to infer unmeasured states
  • Sensor noise simulates real measurement uncertainty

Why not include velocities in observation?

  • Agent can learn velocities from temporal sequence (frame stacking)
  • Reduces observation dimensionality
  • Tests if agent can learn dynamic behavior from gap measurements alone

Current sign convention:

  • No conversion needed - currents fed directly to predictor
  • Range: -15A to +15A (from Ansys model)
  • Coil RL circuit naturally produces currents in this range

Comparison with Original Design

Feature Original Updated
Actions 2 (left/right coils) 4 (front_L, front_R, back_L, back_R)
Observations 5 (gaps, roll, velocities) 4 (noisy sensor gaps)
Gap Measurement Direct yoke positions Noisy sensor positions
Force Application Front & back yoke centers Front & back yoke centers ✓
Current Range Assumed negative only -15A to +15A
Roll Calculation From yoke end heights From yoke end heights ✓

Physics Pipeline (Per Timestep)

  1. Action → Currents

    PWM[4] → RL Circuit Model → Currents[4]
    
  2. State Measurement

    Yoke End Positions[4] → Gap Heights[4] → Average per Yoke[2]
    
  3. Roll Calculation

    (Gap_Right - Gap_Left) / Y_distance → Roll Angle
    
  4. Force Prediction

    (currL, currR, roll, gap) → Maglev Predictor → (force, torque)
    Applied separately for front and back yokes
    
  5. Force Application

    Forces at Y=0 for each yoke + Roll torques
    
  6. Observation Generation

    Sensor Positions[4] → Gap Heights[4] → Add Noise → Observation[4]
    

Info Dictionary

Each env.step() returns comprehensive diagnostics:

{
    'curr_front_L': float,      # Front left coil current (A)
    'curr_front_R': float,      # Front right coil current (A)
    'curr_back_L': float,       # Back left coil current (A)
    'curr_back_R': float,       # Back right coil current (A)
    'gap_front_yoke': float,    # Front yoke average gap (m)
    'gap_back_yoke': float,     # Back yoke average gap (m)
    'roll': float,              # Roll angle (rad)
    'force_front': float,       # Front yoke force (N)
    'force_back': float,        # Back yoke force (N)
    'torque_front': float,      # Front yoke torque (mN·m)
    'torque_back': float        # Back yoke torque (mN·m)
}

Testing

Run the updated test script:

cd "/Users/adipu/Documents/lev_control_4pt_small/RL Testing"
/opt/miniconda3/envs/RLenv/bin/python test_env.py

Expected behavior:

  • 4 sensors report gap heights with small noise variations
  • Yoke gaps (in info) match sensor gaps approximately
  • All 4 coils build up current over time (RL circuit dynamics)
  • Forces should be ~50-100N upward at 10mm gap with moderate currents
  • Pod should begin to levitate if forces overcome gravity (5.8kg × 9.81 = 56.898 N needed)

Next Steps for RL Training

  1. Frame Stacking: Use 3-5 consecutive observations to give agent velocity information

    from stable_baselines3.common.vec_env import VecFrameStack
    env = VecFrameStack(env, n_stack=4)
    
  2. Algorithm Selection: PPO or SAC recommended

    • PPO: Good for continuous control, stable training
    • SAC: Better sample efficiency, handles stochastic dynamics
  3. Reward Tuning: Current reward weights may need adjustment based on training performance

  4. Curriculum Learning: Start with smaller gap errors, gradually increase difficulty

  5. Domain Randomization: Vary sensor noise, mass, etc. for robust policy