Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments

Junwoo Chang, Minwoo Park, Joohwan Seo, Roberto Horowitz, Jongmin Lee, Jongeun Choi
¹Yonsei University   ²UC Berkeley
(Equal advising)
International Conference on Learning Representations (ICLR) 2026

Abstract

Group symmetries provide a powerful inductive bias for reinforcement learning (RL), enabling efficient generalization across symmetric states and actions via group-invariant Markov Decision Processes (MDPs). However, real-world environments rarely realize fully group-invariant MDPs: dynamics, actuation limits, and reward design usually break symmetries, often only locally. Under group-invariant Bellman backups, such local symmetry-breaking introduces errors that propagate across the entire state-action space, resulting in global value estimation errors. To address this, we introduce the Partially Group-Invariant MDP (PI-MDP), which selectively applies group-invariant or standard Bellman backups depending on where symmetry holds. This framework mitigates error propagation from locally broken symmetries while retaining the benefits of equivariance, thereby enhancing sample efficiency and generalizability. Building on this framework, we present practical deep RL algorithms, Partially Equivariant DQN (PE-DQN) for discrete control and Partially Equivariant SAC (PE-SAC) for continuous control, that combine the benefits of equivariance with robustness to symmetry-breaking. Experiments across grid-world, locomotion, and manipulation benchmarks show that PE-DQN and PE-SAC significantly outperform baselines, highlighting the importance of selective symmetry exploitation for robust and sample-efficient RL.


Problem: Local symmetry-breaking induces global value errors

  • Equivariant RL premise. If an MDP is group-invariant (rewards and transitions are invariant under every group element $g \in G$), then the optimal value function satisfies $V^*(g \cdot s) = V^*(s)$, and there exists an optimal policy that is equivariant.
  • Symmetry-breaking in practice. In real-world settings, the environment is rarely exactly group-invariant, due to factors such as obstacles, joint limits, and reward shaping.
  • Practical impact. Under such mismatch, enforcing strict equivariance produces value errors that extend beyond the region where symmetry is actually broken.
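For reference, group invariance of an MDP $\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, \gamma)$ under a group $G$ amounts to the standard pair of conditions (notation ours, following the usual definition):

```latex
r(g \cdot s,\, g \cdot a) = r(s, a),
\qquad
P(g \cdot s' \mid g \cdot s,\, g \cdot a) = P(s' \mid s, a),
\qquad \forall g \in G,\ \forall (s, a, s').
```

Under these conditions the Bellman optimality operator commutes with the group action, which is what yields $V^*(g \cdot s) = V^*(s)$.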

Mismatch in reward/transition amplifies over horizon

  • Local mismatch enters the backup. When the MDP violates group symmetry, the symmetry-aware Bellman backup differs from the standard Bellman operator wherever the reward or transition model is mismatched.
  • Why errors spread. Because Bellman backups propagate information through multi-step transitions, a local mismatch accumulates over the effective planning horizon, producing global value estimation error. The error bound formalizes this amplification through a $\frac{1}{1-\gamma}$ horizon factor.
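The paper's exact statement is not reproduced here, but a standard simulation-lemma-style bound illustrates the amplification. Writing $\epsilon_r$ and $\epsilon_p$ for the worst-case reward and transition mismatch introduced by enforcing symmetry, and $R_{\max}$ for the reward bound, a bound of this form (constants illustrative) holds:

```latex
\bigl\| V^{*} - V^{*}_{E} \bigr\|_{\infty}
\;\le\; \frac{1}{1-\gamma}
\left( \epsilon_r + \frac{\gamma\, \epsilon_p\, R_{\max}}{1-\gamma} \right),
\qquad
\epsilon_r = \max_{s,a} \bigl| r(s,a) - r(g \cdot s,\, g \cdot a) \bigr|.
```

Note that even a mismatch confined to a few state-action pairs enters through the max, so the resulting bound on the value error is global.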

Solution: Partially Group-Invariant MDP (PI-MDP)

  • PI-MDP. We introduce PI-MDP, which uses a state-action gate $\lambda(s,a)$ to apply equivariant backups where symmetry is valid and switch to the standard model where symmetry-breaking occurs.
  • Partially Equivariant RL. Building on PI-MDP, we derive two practical deep RL algorithms: PE-DQN and PE-SAC, which preserve equivariant inductive bias where symmetry is valid while falling back to the standard backup in symmetry-breaking regions.
  • Learning the gate. We train $\lambda(s,a)$ as an auxiliary task from the disagreement between two reward/transition predictors: one based on the group-invariant model $\mathcal{M}_E$ and the other based on the unconstrained model $\mathcal{M}_N$. Larger disagreement indicates local symmetry-breaking and encourages switching to $\mathcal{M}_N$.
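A minimal sketch of this disagreement signal, assuming the two models expose reward predictions and next-state distributions. The function name, the sigmoid form, and the hyperparameters `k` and `tau` are illustrative choices, not the paper's implementation:

```python
import numpy as np

def gate_probability(r_equi, r_free, p_equi, p_free, k=10.0, tau=0.1):
    """Hypothetical gate: map predictor disagreement to Pr(lambda=1 | s, a).

    r_equi / r_free : reward predicted by the group-invariant model M_E
                      and by the unconstrained model M_N.
    p_equi / p_free : next-state distributions predicted by the two models.
    k, tau          : illustrative sharpness / threshold hyperparameters.
    """
    reward_gap = abs(r_equi - r_free)
    # Total-variation distance between the two predicted next-state distributions.
    trans_gap = 0.5 * np.abs(np.asarray(p_equi) - np.asarray(p_free)).sum()
    d = reward_gap + trans_gap
    # Large disagreement -> probability near 1 -> switch to the standard model M_N.
    return 1.0 / (1.0 + np.exp(-k * (d - tau)))

# When the models agree, the gate stays low; disagreement pushes it toward 1.
low = gate_probability(1.0, 1.0, [0.5, 0.5], [0.5, 0.5])
high = gate_probability(1.0, -1.0, [0.5, 0.5], [0.5, 0.5])
```

In practice the gate would be a trained classifier with the disagreement as its supervision target; the closed-form sigmoid above only conveys the monotone mapping from disagreement to switching probability.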

PI-MDP recovers the true optimum under correct gating

  • PI-MDP alleviates equivariance error through selective gating. If $\lambda(s,a)=1$ on all symmetry-breaking state-action pairs, the value-error bound vanishes and PI-MDP recovers the true optimal value function.
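This claim can be checked on a toy chain MDP with a mirror symmetry broken at one end. The MDP, the oracle gate, and all constants below are our illustrative construction, not the paper's benchmark: value iteration under a fully equivariant model mis-values states outside the broken region, while gating on the broken transitions recovers the true optimum exactly.

```python
import numpy as np

gamma = 0.9
n = 4                 # chain of states 0-1-2-3
actions = (-1, +1)    # move left / move right

def step(s, a):
    return min(max(s + a, 0), n - 1)

def true_reward(s, a):
    s2 = step(s, a)
    if s2 == n - 1: return 1.0   # goal at the right end
    if s2 == 0:     return -1.0  # obstacle at the left end: breaks the mirror symmetry
    return 0.0

def equivariant_reward(s, a):
    # A model forced to respect the mirror symmetry s -> 3 - s, a -> -a:
    # it wrongly mirrors the right-end goal reward onto the left end.
    return 1.0 if step(s, a) in (0, n - 1) else 0.0

def gate(s, a):
    # Oracle gate: lambda = 1 exactly on transitions entering the broken region.
    return 1.0 if step(s, a) == 0 else 0.0

def gated_reward(s, a):
    lam = gate(s, a)
    return lam * true_reward(s, a) + (1 - lam) * equivariant_reward(s, a)

def value_iteration(reward_fn, iters=200):
    V = np.zeros(n)
    for _ in range(iters):
        V = np.array([max(reward_fn(s, a) + gamma * V[step(s, a)]
                          for a in actions) for s in range(n)])
    return V

V_true = value_iteration(true_reward)
V_equi = value_iteration(equivariant_reward)
V_gated = value_iteration(gated_reward)
```

Here `V_equi` is wrong not only at the broken state 0 but also at state 1, where the symmetry itself holds: the mirrored reward leaks in through the backup, exactly the global error propagation described above. `V_gated` matches `V_true` because the gate replaces the equivariant model precisely where it is mismatched.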

Continuous control results

  • We evaluate PE-SAC on continuous-control tasks. PE-SAC achieves strong sample efficiency across tasks and remains robust under symmetry-breaking.

Learned gate visualization

The videos below demonstrate how the learned gate responds to local symmetry-breaking. The background color and the moving circle indicate the predicted gate probability $\Pr(\lambda=1 \mid s,a)$. Green ($\lambda \approx 0$) indicates the equivariant approximation is reliable, while Red ($\lambda \approx 1$) indicates local symmetry-breaking, triggering a switch to the standard model.

UR5e Reach

Task: The robot arm aligns its end-effector with a target SE(3) pose, exhibiting an approximate SO(3) symmetry around the workspace center.
What to look for: Notice how the gate increases (turns red) near configurations where joint limits and kinematic singularities locally violate this assumed symmetry.

MuJoCo Ant

Task: Moving forward safely. The leg configuration exhibits an approximate 90° rotational symmetry.
What to look for: The gate increases (turns red) when contact-driven dynamics, friction, or joint limits make the equivariant approximation unreliable.


Citation


    @inproceedings{chang2026partially,
      title={Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments},
      author={Junwoo Chang and Minwoo Park and Joohwan Seo and Roberto Horowitz and Jongmin Lee and Jongeun Choi},
      booktitle={International Conference on Learning Representations},
      year={2026},
      url={https://openreview.net/forum?id=dRDcVyobhH}
    }