Part II: Frontier Robot Manipulation

Chapter 5: Jim Fan and GEAR — A Map of Generalist Embodied Agents

Written: 2026-06-08 Last updated: 2026-06-08

Jim Fan's research arc offers a useful map of NVIDIA's physical AI work. MineDojo [3] and Voyager [4] studied open-ended agents in simulated worlds; Eureka and DrEureka [5] showed how LLMs can generate reward functions and domain-randomization settings; GEAR [1] extends this direction toward robotics, dexterous manipulation, world models, and synthetic data.

Figure 5.1: Code-as-Policies style language-model-generated robot skills. source: S3 reused figure

5.1 The GEAR Question

GEAR asks more than whether one robot can solve one task: it asks whether embodied agents can learn across embodiments, scenes, tasks, and data sources. GR00T, DreamGen, DreamZero, EgoScale, DexMimicGen, DexUMI, and RealDexUMI each address a different part of that question.

For manufacturing, egocentric video and dexterous teleoperation matter because they may convert tacit manual-work knowledge into training data.

Figure 5.2: Robot manipulation as an execute-debug loop inspired by coding agents. source: S3 reused figure

5.2 From LLM Agents to Robot Agents

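Voyager's skill-library loop (propose a skill program, execute it, refine it from environment feedback, store it once it works) cannot be copied directly into robotics because physical failures are costly. In a simulated world a failed attempt costs only a retry; on a production line it can damage parts, tooling, or the robot itself. A manufacturing robot agent therefore needs three safeguards around each attempt: pre-execution simulation, in-execution anomaly detection, and post-execution quality feedback.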
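
The three safeguards can be illustrated with a minimal Python sketch. It is not taken from Voyager or any GEAR codebase: the names Skill, GuardedExecutor, simulate, is_anomalous, and passes_inspection are hypothetical, and the three checks are plain callables standing in for a real simulator, a runtime monitor, and a quality-inspection step. The sketch only shows the control flow, under the assumption that a skill is a list of named steps and that a skill enters the library only after passing all three gates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Skill:
    """Hypothetical skill: a name plus an ordered list of step labels."""
    name: str
    steps: List[str]


@dataclass
class GuardedExecutor:
    """Illustrative wrapper that gates a skill-library loop with three checks."""
    simulate: Callable[[Skill], bool]           # pre-execution: True if the dry run is safe
    is_anomalous: Callable[[str], bool]         # in-execution: True if a step looks wrong
    passes_inspection: Callable[[Skill], bool]  # post-execution: True if quality is acceptable
    library: Dict[str, Skill] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def run(self, skill: Skill) -> str:
        # Gate 1: never touch hardware if the simulated run fails.
        if not self.simulate(skill):
            self.log.append(f"{skill.name}: rejected in simulation")
            return "rejected_in_simulation"

        # Gate 2: check every step and stop at the first anomaly.
        for step in skill.steps:
            if self.is_anomalous(step):
                self.log.append(f"{skill.name}: aborted at step '{step}'")
                return "aborted"
            self.log.append(f"{skill.name}: executed '{step}'")

        # Gate 3: only skills whose result passes inspection are stored.
        if not self.passes_inspection(skill):
            self.log.append(f"{skill.name}: failed inspection")
            return "failed_inspection"

        self.library[skill.name] = skill
        return "stored"


# Stub checks for demonstration only; real checks would call a simulator,
# read sensor streams, and query an inspection station.
executor = GuardedExecutor(
    simulate=lambda s: "collide" not in s.steps,
    is_anomalous=lambda step: step == "slip",
    passes_inspection=lambda s: len(s.steps) > 0,
)

result_ok = executor.run(Skill("insert_peg", ["reach", "grasp", "insert"]))
result_sim = executor.run(Skill("bad_path", ["reach", "collide"]))
result_abort = executor.run(Skill("slippery", ["reach", "slip", "place"]))
```

With these stubs, only insert_peg reaches the library; bad_path is stopped before execution and slippery is stopped partway through. The design choice worth noting is that the library write happens last, so a skill that merely ran without error is not treated as a reusable skill until its outcome has been inspected.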

References

  1. NVIDIA GEAR (2026). GEAR Publications. NVIDIA Research.
  2. Jim Fan (2026). Jim Fan Homepage. Personal Research Page.
  3. Linxi Fan et al. (2022). MineDojo. arXiv.
  4. Guanzhi Wang et al. (2023). Voyager. arXiv.
  5. Yecheng Jason Ma et al. (2024). DrEureka. arXiv.