ICML 2026

MindZero

Learning Online Mental Reasoning With Zero Annotations

Shunchi Zhang¹*, Jin Lu¹*, Chuanyang Jin¹*, Yichao Zhou²*, Zhining Zhang², Tianmin Shu¹

¹ Johns Hopkins University ² Peking University * Equal contribution

Project Page

Code and models are open source

https://scai.cs.jhu.edu/MindZero

01 / Online Mental Reasoning

Infer mental states from a partial behavior stream

At every time step, the assistant maintains mental-state hypotheses over latent human goals.

Uncertainty: robust uncertainty over multiple hypotheses
Efficiency: fast inference for real-time assistance
Zero annotations: learning with zero ground-truth annotations

02 / Bayesian Theory of Mind

A standard target, but expensive to run online

P(m_t | s_{1:t}, a_{1:t}) ∝ P(a_{1:t} | m_t, s_{1:t}) · P(m_t)

Posterior: the distribution over mental states on the left-hand side

Likelihood: how likely the observed actions are under this mental state

Prior: how plausible the mental state is on its own

Model-based ToM (e.g., AutoToM) estimates this posterior through Bayesian networks, but each edge may require an LLM call, which makes inference slow and expensive.
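As a rough illustration of the posterior above, the sketch below normalizes likelihood times prior over a small, enumerated set of goal hypotheses. The goal names and probabilities are hypothetical placeholders, and the likelihood values stand in for whatever a planner or LLM would return; this is not the AutoToM or MindZero implementation.

```python
import math


def posterior(log_likelihoods, log_priors):
    """Return P(m | s, a) for a finite hypothesis set.

    log_likelihoods[m] stands for log P(a_{1:t} | m, s_{1:t}),
    log_priors[m] stands for log P(m).
    """
    joint = {m: log_likelihoods[m] + log_priors[m] for m in log_likelihoods}
    # Log-sum-exp for a numerically stable normalizer.
    top = max(joint.values())
    log_z = top + math.log(sum(math.exp(v - top) for v in joint.values()))
    return {m: math.exp(v - log_z) for m, v in joint.items()}


# Hypothetical numbers: three candidate goals with a uniform prior.
log_lik = {
    "get_apple": math.log(0.6),
    "get_cup": math.log(0.3),
    "get_book": math.log(0.1),
}
log_prior = {m: math.log(1.0 / 3.0) for m in log_lik}

post = posterior(log_lik, log_prior)
for goal, prob in post.items():
    print(f"{goal}: {prob:.3f}")
```

In a model-based pipeline each likelihood entry may cost a separate planner or LLM call at every time step, which is the expense that amortization is meant to avoid.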

03 / Self-Supervised Reinforcement Learning

Amortize model-based ToM into one forward pass

Self-supervised reinforcement learning pipeline

Model-based reasoning can be used as the training signal for amortization.

Proposal: a full particle set of candidate mental states
Scoring: a planner or frozen LLM scorer checks action likelihood
Optimization: reinforcement learning for non-differentiable scoring

At test time, the model can produce hypotheses in a single pass.
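The propose, score, and optimize steps can be sketched with a toy policy-gradient loop. This is a minimal sketch under strong assumptions: the policy is a softmax over three fixed hypotheses rather than a language model, `black_box_score` is a hypothetical stand-in for the planner or frozen LLM scorer, and the reward is the raw score without the prior and entropy terms. It only shows why reinforcement learning fits a scorer that cannot be differentiated.

```python
import math
import random


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def black_box_score(hypothesis):
    # Hypothetical stand-in for a planner or frozen LLM scorer:
    # returns a log-likelihood-like score with no gradient available.
    scores = {"get_apple": -0.5, "get_cup": -1.2, "get_book": -2.3}
    return scores[hypothesis]


def train(hypotheses, steps=500, particles=4, lr=0.2, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(hypotheses)
    for _ in range(steps):
        probs = softmax(logits)
        # Proposal: sample a particle set of candidate mental states.
        idx = rng.choices(range(len(hypotheses)), weights=probs, k=particles)
        # Scoring: query the non-differentiable scorer for each particle.
        rewards = [black_box_score(hypotheses[i]) for i in idx]
        baseline = sum(rewards) / len(rewards)
        # Optimization: REINFORCE update with a mean-reward baseline.
        grad = [0.0] * len(logits)
        for i, reward in zip(idx, rewards):
            advantage = reward - baseline
            for j in range(len(logits)):
                indicator = 1.0 if j == i else 0.0
                grad[j] += advantage * (indicator - probs[j]) / particles
        logits = [l + lr * g for l, g in zip(logits, grad)]
    return softmax(logits)


hypotheses = ["get_apple", "get_cup", "get_book"]
probs = train(hypotheses)
print(dict(zip(hypotheses, probs)))
```

After training, sampling from the learned policy needs no scorer calls, which mirrors the single-pass inference described above.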

04 / Objective

ELBO as the reward

The ELBO rewards hypotheses that explain the observed actions while preserving uncertainty.

Likelihood: high action likelihood
Prior: prior plausibility
Entropy: discourages early collapse
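The three terms can be made concrete with the standard ELBO for a discrete hypothesis distribution q: E_q[log P(a | m)] + E_q[log P(m)] + H(q). The sketch below assumes this textbook form and uses hypothetical numbers; the exact reward used by MindZero may differ. It checks that the ELBO equals the log evidence when q is the true posterior and drops when q collapses onto a single hypothesis.

```python
import math


def elbo(q, log_lik, log_prior):
    """ELBO of a discrete distribution q over mental-state hypotheses."""
    expected_lik = sum(q[m] * log_lik[m] for m in q)      # likelihood term
    expected_prior = sum(q[m] * log_prior[m] for m in q)  # prior term
    entropy = -sum(p * math.log(p) for p in q.values() if p > 0)  # entropy term
    return expected_lik + expected_prior + entropy


# Hypothetical likelihoods and a uniform prior over three goals.
likelihood = {"get_apple": 0.6, "get_cup": 0.3, "get_book": 0.1}
prior = {m: 1.0 / 3.0 for m in likelihood}
log_lik = {m: math.log(v) for m, v in likelihood.items()}
log_prior = {m: math.log(v) for m, v in prior.items()}

evidence = sum(likelihood[m] * prior[m] for m in likelihood)
true_posterior = {m: likelihood[m] * prior[m] / evidence for m in likelihood}
collapsed = {"get_apple": 1.0, "get_cup": 0.0, "get_book": 0.0}

elbo_posterior = elbo(true_posterior, log_lik, log_prior)
elbo_collapsed = elbo(collapsed, log_lik, log_prior)
print("log evidence:", math.log(evidence))
print("ELBO at posterior:", elbo_posterior)
print("ELBO at collapsed q:", elbo_collapsed)
```

The gap between the two ELBO values comes from the entropy and expectation terms together: a collapsed q gains a little likelihood but loses all entropy, so it scores lower than a q that keeps the plausible alternatives.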

05 / Setup

Domains and tasks for evaluation

Domains

GridWorld: visual map input
Household: text-converted scenarios

Tasks

Story-based Theory-of-Mind question answering
Proactive assistance

06 / Question Answering

MindZero improves story-based QA accuracy

MindZero improves significantly over the pretrained checkpoint and is competitive with strong commercial models while using much less computation.

07 / Proactive Assistance

Proactive assistance tests online inference

MindZero obtains the best speedup, collaborating efficiently with both simulated and real humans.

08 / Analysis

Mode seeking does not become mode collapse

Prediction quality over task progress: prediction quality sharpens as more actions are observed.

Ablation study on diversity controls: diversity depends on prior design, multiple hypotheses, and the entropy bonus.

09 / Takeaways

Mental reasoning can be learned with self-supervision

Problem: online mental reasoning with uncertainty, efficiency, and zero annotations

Method: online mental reasoning can be learned with self-supervision using RL

Result: efficient single-pass inference at test time and strong results across domains and tasks
