Plan in Sandbox, Navigate in Open Worlds: Learning Physics-Grounded Abstracted Experience for Embodied Navigation

ICML 2026
Figure: SAGE pipeline overview. SAGE uses a physics-grounded sandbox for self-evolving data generation and policy optimization, enabling the agent to bridge the gap between the sandbox and the open world.

Abstract

Vision-Language Models (VLMs) exhibit strong reasoning abilities, yet their application to embodied navigation is limited by a shortage of paired data linking visual perception to robot control. Simulators offer cost-effective data generation, but policies trained in photorealistic environments often transfer poorly to real-world deployment.

We propose SAGE (Sandbox-Abstracted Grounded Experience), a framework that lets agents learn from physics-constrained semantic environments rather than photorealistic simulations, much as humans mentally rehearse plans with simplified physical models before acting. The approach comprises three phases: Genesis constructs diverse physics-constrained semantic environments to provide initial experience; Evolution refines this experience through reinforcement learning with a novel asymmetric adaptive clipping mechanism; and Navigation transfers the abstract policy to real-world control.

SAGE achieves a 53.21% LLM-Match Success Rate on A-EQA (+9.7% over the baseline) and shows promising transfer to deployment on a physical indoor robot.

Method

The SAGE framework proceeds in three stages.

🌱 Stage I: Genesis

Constructs diverse physics-constrained semantic environments from HM3D and InteriorGS scenes, synthesizing sandbox navigation tasks and grounded experience without photorealistic rendering.
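The page does not specify how the Genesis sandbox represents scenes. As a purely illustrative sketch, assume a hypothetical 2D semantic grid in which "physics" is reduced to collision checks and objects are labels on free cells; a task generator then keeps only goals that are physically reachable. `SemanticSandbox`, `sample_tasks`, and the grid encoding are assumptions for illustration, not SAGE's implementation.

```python
import random
from collections import deque
from dataclasses import dataclass, field

WALL = "#"  # any other character is free floor
Cell = tuple[int, int]


@dataclass
class SemanticSandbox:
    """Hypothetical abstract scene: a wall grid plus object labels on free cells."""
    grid: list[str]
    labels: dict[Cell, str] = field(default_factory=dict)

    def passable(self, r: int, c: int) -> bool:
        # The only "physics" in this sketch: the agent cannot enter walls or leave the map.
        return 0 <= r < len(self.grid) and 0 <= c < len(self.grid[r]) and self.grid[r][c] != WALL

    def shortest_path_len(self, start: Cell, goal: Cell):
        """Breadth-first search over 4-connected free cells; None if the goal is unreachable."""
        frontier = deque([(start, 0)])
        seen = {start}
        while frontier:
            (r, c), d = frontier.popleft()
            if (r, c) == goal:
                return d
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (r + dr, c + dc)
                if nxt not in seen and self.passable(*nxt):
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
        return None


def sample_tasks(env: SemanticSandbox, n: int, seed: int = 0) -> list[tuple[Cell, str, int]]:
    """Sample up to n (start, target label, optimal steps) tasks, keeping only reachable goals."""
    rng = random.Random(seed)
    free = [(r, c) for r, row in enumerate(env.grid) for c, ch in enumerate(row) if ch != WALL]
    goals = sorted(env.labels.items())
    tasks = []
    for _ in range(10 * n):  # bounded number of attempts
        if len(tasks) == n:
            break
        start = rng.choice(free)
        goal, label = rng.choice(goals)
        steps = env.shortest_path_len(start, goal)
        if steps:  # skip unreachable (None) and trivial (0) tasks
            tasks.append((start, label, steps))
    return tasks


env = SemanticSandbox(
    grid=["#######",
          "#...#.#",
          "#.#.#.#",
          "#...#.#",
          "#######"],
    labels={(3, 1): "sofa", (1, 5): "fridge"},
)
print(env.shortest_path_len((1, 1), (3, 1)))  # 2
print(env.shortest_path_len((1, 1), (1, 5)))  # None: the fridge is walled off
print(sample_tasks(env, n=3))
```

Reachability filtering is the point of the sketch: a task is emitted only when a collision-free path exists, and the BFS path length doubles as an optimal-step reference for scoring an agent.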

⚗️ Stage II: Evolution

Refines navigation policies through reinforcement learning with an asymmetric adaptive clipping (AAC) mechanism, distilling physics-grounded priors into a compact VLM-based agent.
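The exact form of asymmetric adaptive clipping (AAC) is not described on this page. The sketch below assumes one plausible reading: a PPO-style clipped surrogate whose lower and upper bounds differ (`eps_low` vs. `eps_high`), paired with a hypothetical rule that adapts `eps_high` toward a target clip fraction. Function names, default values, and the adaptation rule are illustrative assumptions, not the authors' method.

```python
import math


def asymmetric_clipped_objective(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped surrogate with an asymmetric trust region [1 - eps_low, 1 + eps_high].

    Returns (mean objective to maximize, fraction of samples whose term was clipped).
    """
    total, n_clipped = 0.0, 0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                        # pi_new / pi_old
        bounded = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
        unclipped_term, clipped_term = ratio * adv, bounded * adv
        term = min(unclipped_term, clipped_term)                 # pessimistic bound, as in PPO
        if term < unclipped_term:
            n_clipped += 1
        total += term
    n = len(advantages)
    return total / n, n_clipped / n


def adapt_eps_high(eps_high, clip_fraction, target=0.2, rate=0.1, lo=0.1, hi=0.4):
    """Hypothetical adaptive rule: widen the upper bound when many samples are clipped, else shrink it."""
    eps_high *= (1.0 + rate) if clip_fraction > target else (1.0 - rate)
    return min(max(eps_high, lo), hi)


# A sample whose probability doubled under a positive advantage is capped at 1 + eps_high.
obj, frac = asymmetric_clipped_objective([math.log(2.0)], [0.0], [1.0])
print(round(obj, 2), frac)  # 1.28 1.0
eps_high = adapt_eps_high(0.28, frac)
```

A larger upper bound than lower bound lets actions with positive advantage gain more probability per update, while the pessimistic `min` keeps the PPO property that clipping never makes the objective more optimistic than the unclipped term.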

🧭 Stage III: Navigation

Bridges the abstract sandbox policy to open-world robot control, enabling zero-shot transfer to A-EQA, GOAT-Bench, and physical indoor robot deployment.
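How SAGE maps sandbox actions to robot control is not detailed here. As a hypothetical illustration only, assume the sandbox policy emits discrete abstract actions (`forward` by some meters, `turn` by some degrees, `stop`) and a differential-drive base accepts constant-velocity commands; the bridge then converts each action into a speed and a duration under assumed speed limits (`max_lin`, `max_ang`). All names and limits are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class VelocityCommand:
    linear: float    # forward speed, m/s
    angular: float   # yaw rate, rad/s (positive = counter-clockwise)
    duration: float  # how long to hold the command, s


def to_velocity(action: str, amount: float = 0.0,
                max_lin: float = 0.3, max_ang: float = 0.8) -> VelocityCommand:
    """Convert a hypothetical abstract action into one constant-velocity command."""
    if action == "forward":          # amount: distance in meters
        if amount < 0:
            raise ValueError("forward distance must be non-negative")
        return VelocityCommand(max_lin, 0.0, amount / max_lin)
    if action == "turn":             # amount: heading change in degrees
        rad = math.radians(amount)
        return VelocityCommand(0.0, math.copysign(max_ang, rad), abs(rad) / max_ang)
    if action == "stop":
        return VelocityCommand(0.0, 0.0, 0.0)
    raise ValueError(f"unknown abstract action: {action!r}")


plan = [("turn", -90.0), ("forward", 0.6), ("stop", 0.0)]
for act, amt in plan:
    print(act, to_velocity(act, amt))
```

A real deployment would close the loop with odometry and obstacle checks rather than rely on open-loop timing; the sketch only shows the unit conversion from abstract steps to velocity commands.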

Results

A-EQA LLM-Match Success Rate: 53.21% (↑ +9.7% over baseline)
Policy model size: 2B parameters (Qwen3-VL backbone)

SAGE is evaluated on A-EQA (Active Embodied Question Answering) and GOAT-Bench, two challenging embodied navigation benchmarks that together cover object-goal navigation, spatial reasoning, and multi-step question answering. Beyond simulation, SAGE transfers to a physical indoor robot, supporting the sim-to-real potential of physics-grounded abstracted experience.
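The page does not define how the LLM-Match Success Rate is aggregated. Assuming it follows the LLM-Match metric introduced with OpenEQA, where an LLM judge rates each answer on a 1 to 5 scale, the score is $C = \frac{1}{N}\sum_{i=1}^{N}\frac{\sigma_i - 1}{4} \times 100\%$, with $\sigma_i$ the rating for question $i$. A minimal sketch of that aggregation, with `llm_match_score` as a hypothetical helper:

```python
def llm_match_score(ratings: list[int]) -> float:
    """Aggregate per-question LLM-judge ratings (1-5) into an OpenEQA-style LLM-Match score in percent."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(s not in (1, 2, 3, 4, 5) for s in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    # Rating 1 maps to 0 (wrong), 5 maps to 1 (fully correct); intermediate ratings earn partial credit.
    return 100.0 * sum((s - 1) / 4 for s in ratings) / len(ratings)


print(llm_match_score([5, 5, 1, 3]))  # 62.5
```

Because intermediate ratings earn partial credit, this score is not a binary success rate; if SAGE thresholds ratings instead, the aggregation would differ.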

BibTeX

@inproceedings{shen2026sage,
  title     = {Plan in Sandbox, Navigate in Open Worlds: Learning
               Physics-Grounded Abstracted Experience for Embodied Navigation},
  author    = {Shen, Zhixuan and Du, Jiawei and Guo, Ziyu and Luo, Han
               and Peng, Lilan and Zhou, Joey Tianyi and Luo, Haonan
               and Li, Tianrui},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  year      = {2026},
  note      = {ICML 2026}
}