Vision-Language Models exhibit strong reasoning abilities, yet their application to embodied navigation is limited by insufficient paired data linking visual perception to robot control. While simulators offer cost-effective data generation, photorealistic environments often fail to transfer well to real-world deployment.
We propose SAGE (Sandbox-Abstracted Grounded Experience), a framework in which agents learn from physics-constrained semantic environments rather than photorealistic simulations, mirroring how humans mentally rehearse plans with simplified physical models before acting. The approach comprises three phases: Genesis constructs diverse physics-constrained semantic environments for initial experience; Evolution refines that experience through reinforcement learning with a novel asymmetric adaptive clipping mechanism; and Navigation bridges the abstract policy to real-world control.
SAGE achieves a 53.21% LLM-Match Success Rate on A-EQA (+9.7% over the baseline) and shows promising transfer to physical indoor robot deployment.
SAGE uses a physics-grounded sandbox for self-evolving data generation and policy optimization, enabling the agent to bridge the gap between the sandbox and the open world.
Genesis: Constructs diverse physics-constrained semantic environments from HM3D and InteriorGS scenes, synthesizing sandbox navigation tasks and grounded experience without photorealistic rendering.
Evolution: Refines navigation policies through reinforcement learning with an asymmetric adaptive clipping (AAC) mechanism, distilling physics-grounded priors into a compact VLM-based agent.
Navigation: Bridges the abstract sandbox policy to open-world robot control, enabling zero-shot transfer across A-EQA, GOAT-Bench, and physical indoor robot deployments.
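The asymmetric clipping used in the Evolution phase is not specified in detail here, so the following is only a minimal sketch of the general idea: a PPO-style clipped surrogate whose lower and upper clip bounds differ. The function name, the parameters `eps_low` and `eps_high`, and their default values are all hypothetical, and the adaptive rule that would adjust the bounds during training is deliberately left out because this text does not describe it.

```python
import math


def asymmetric_clipped_surrogate(logp_new, logp_old, advantage,
                                 eps_low=0.2, eps_high=0.3):
    """Per-sample PPO-style surrogate with separate lower/upper clip bounds.

    Assumption: eps_low and eps_high are fixed illustrative values. The
    adaptive schedule of SAGE's AAC mechanism is not reproduced here.
    """
    # Probability ratio between the updated policy and the sampling policy.
    ratio = math.exp(logp_new - logp_old)
    # Clip the ratio into the asymmetric interval [1 - eps_low, 1 + eps_high].
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Pessimistic (minimum) choice, as in the standard PPO objective.
    return min(ratio * advantage, clipped * advantage)


def mean_surrogate(samples, eps_low=0.2, eps_high=0.3):
    """Average the surrogate over (logp_new, logp_old, advantage) triples."""
    values = [
        asymmetric_clipped_surrogate(ln, lo, adv, eps_low, eps_high)
        for ln, lo, adv in samples
    ]
    return sum(values) / len(values)
```

With a wider upper bound than lower bound, actions with positive advantage can gain more probability mass per update than actions with negative advantage can lose. Whether SAGE's mechanism widens the upper or the lower side, and how it adapts the bounds, should be taken from the paper rather than from this sketch.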
SAGE is evaluated on A-EQA (Active Embodied Question Answering) and GOAT-Bench, two challenging embodied navigation benchmarks covering object-goal navigation, spatial reasoning, and multi-step question answering. Beyond simulation, SAGE transfers to physical indoor robot deployment, supporting the sim-to-real potential of physics-grounded abstracted experience.
@inproceedings{shen2026sage,
  title     = {Plan in Sandbox, Navigate in Open Worlds: Learning
               Physics-Grounded Abstracted Experience for Embodied Navigation},
  author    = {Shen, Zhixuan and Du, Jiawei and Guo, Ziyu and Luo, Han
               and Peng, Lilan and Zhou, Joey Tianyi and Luo, Haonan
               and Li, Tianrui},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  year      = {2026},
  note      = {ICML 2026}
}