We propose CaPE (Code as Path Editor), a safe and interpretable multimodal path planning framework for multi-agent cooperation. CaPE enables robots to adapt their motion plans in response to natural language communication by generating structured, human-readable path-editing programs that are validated by a model-based planner.
Core idea: Use a vision-language model to synthesize structured path-editing programs, then apply planner-based verification to ensure safety. This enables open-ended language-driven coordination while preserving robustness and interpretability.
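The verify-then-apply loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names (`EditProgram`, `validate_path`, `apply_edit`) and the simple clearance check standing in for the model-based planner are all hypothetical, and a real system would obtain the edit program from a vision-language model rather than define it by hand.

```python
# Minimal sketch of a CaPE-style verify-then-apply loop.
# All names are illustrative assumptions, not the authors' API; the
# clearance check below stands in for a full model-based planner.
from typing import Callable, List, Tuple

Waypoint = Tuple[float, float]
Path = List[Waypoint]
EditProgram = Callable[[Path], Path]  # a structured path-editing program

def validate_path(path: Path, obstacles: List[Waypoint],
                  clearance: float = 0.5) -> bool:
    """Planner-style safety check: every waypoint keeps a minimum
    clearance from every known obstacle."""
    return all(
        (wx - ox) ** 2 + (wy - oy) ** 2 >= clearance ** 2
        for wx, wy in path
        for ox, oy in obstacles
    )

def apply_edit(path: Path, program: EditProgram,
               obstacles: List[Waypoint]) -> Path:
    """Apply a language-derived path edit only if the result is verified."""
    candidate = program(path)
    if validate_path(candidate, obstacles):
        return candidate  # accept the edited, planner-verified path
    return path           # reject unsafe edits; keep the original plan

# Edit program for "move aside to the right": shift all waypoints +1.0 m in y.
shift_right: EditProgram = lambda p: [(x, y + 1.0) for x, y in p]

original = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
obstacles = [(1.0, 0.2)]  # the original path passes too close to this point

edited = apply_edit(original, shift_right, obstacles)
```

Because the edit is expressed as a program over waypoints rather than raw trajectory deltas, both the requested change and the planner's accept/reject decision remain inspectable, which is the interpretability property the framework targets.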
CaPE enables safe and interpretable language-guided coordination across multi-robot interaction, household human-robot interaction, and real-world human-robot joint-carrying tasks.
Human-Robot Teaming: Without CaPE, the human and robot collide at the doorway because they cannot communicate to coordinate their movements. With CaPE, the robot understands the human instruction to move aside and allow passage, successfully avoiding the collision.
Human-Robot Joint Lifting: With CaPE, the robot interprets the human’s verbal instruction and adapts its motion, taking the rightmost path while avoiding obstacles.