RealWebAssist is the first sequential instruction-following benchmark that evaluates long-horizon web assistance with real-world users. It features:
RealWebAssist includes tasks collected from real users across shopping, food, entertainment, and travel websites, ranging from booking flights to ordering dinner or buying a gift.
RealWebAssist features multiple challenges that can emerge in long-horizon web assistance with real-world users. These include the spatial and temporal reasoning needed to understand ambiguous and context-dependent user instructions, planning multi-step action sequences to reach the goal communicated by an instruction, and learning user-specific routines.
@article{ye2025realwebassist,
title={RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users},
author={Ye, Suyu and Shi, Haojun and Shih, Darren and Yun, Hyokun and Roosta, Tanya and Shu, Tianmin},
journal={arXiv preprint arXiv:2504.10445},
year={2025}
}