Part III: Manufacturing Deployment

Chapter 8: Verification and Safety — When Sim-to-Real Meets Production Liability

Written: 2026-06-08 Last updated: 2026-06-08

The core lesson from S3 is that robot policies are harder to verify than coding agents. A bad code change can be reverted; a robot failure may damage products or injure people, and neither can be undone. Physical AI in manufacturing must therefore be governed by explicit release gates, not by model scores alone.

Figure 8.1: AutoRT-style fleet orchestration and safety constitution. Source: reused figure from AutoRT [2].

8.1 From SIMPLER to Factory Validation

SIMPLER framed the problem of evaluating real-robot manipulation policies in simulation [1]. Factories need a production counterpart: validation that tracks not only task success rate but also cycle time, defect rate, recovery rate, safe-stop rate, and human override rate.
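
As a rough illustration of how these six metrics could feed a release gate, the sketch below computes them from a list of trial records and compares them with thresholds. Everything here is an assumption made for illustration: the `Trial` fields, the metric definitions (for example, safe-stop rate as the share of unrecovered faults that ended in a safe stop), and the threshold values in `GATE` are hypothetical and would have to come from the plant's own quality and safety requirements.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    """One production-like trial. All field names are hypothetical."""
    success: bool                 # task completed to specification
    cycle_time_s: float           # wall-clock duration of the cycle
    defect: bool                  # part failed quality inspection
    fault: bool = False           # an anomaly occurred during the cycle
    recovered: bool = False       # robot recovered from the fault unaided
    safe_stopped: bool = False    # robot reached a safe stop after the fault
    human_override: bool = False  # an operator took over


def rate(flags):
    """Fraction of True values, or None when there is no evidence."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else None


def summarize(trials):
    """Compute the six validation metrics (assumed definitions)."""
    if not trials:
        raise ValueError("need at least one trial")
    faults = [t for t in trials if t.fault]
    unrecovered = [t for t in faults if not t.recovered]
    return {
        "success_rate": rate(t.success for t in trials),
        "mean_cycle_time_s": mean(t.cycle_time_s for t in trials),
        "defect_rate": rate(t.defect for t in trials),
        "recovery_rate": rate(t.recovered for t in faults),
        "safe_stop_rate": rate(t.safe_stopped for t in unrecovered),
        "human_override_rate": rate(t.human_override for t in trials),
    }


# Illustrative thresholds only; real limits are set by the plant.
GATE = {
    "success_rate": (">=", 0.95),
    "mean_cycle_time_s": ("<=", 15.0),
    "defect_rate": ("<=", 0.01),
    "recovery_rate": (">=", 0.90),
    "safe_stop_rate": (">=", 1.0),
    "human_override_rate": ("<=", 0.05),
}


def check_gate(metrics, gate=GATE):
    """Return the names of metrics that block release.

    A metric with no evidence (None) blocks release: the gate fails closed.
    """
    failures = []
    for name, (op, limit) in gate.items():
        value = metrics.get(name)
        if value is None:
            failures.append(name)
        elif op == ">=" and value < limit:
            failures.append(name)
        elif op == "<=" and value > limit:
            failures.append(name)
    return failures


if __name__ == "__main__":
    trials = [
        Trial(success=True, cycle_time_s=10.0, defect=False),
        Trial(success=False, cycle_time_s=14.0, defect=True,
              fault=True, safe_stopped=True, human_override=True),
    ]
    print(check_gate(summarize(trials)))
```

The gate returns the list of blocking metrics instead of a single pass/fail flag, so a failed release names what has to improve. Treating a metric with no evidence as a failure is a deliberate choice in this sketch: a validation run in which no fault ever occurred says nothing about recovery or safe-stop behavior.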

8.2 A Manufacturing Robot Constitution

AutoRT constrains the tasks proposed by its LLM/VLM through a Robot Constitution and affordance filters [2]. In manufacturing, the equivalent constraints are standard operating procedures (SOPs), quality documentation, safety PLCs, Good Manufacturing Practice (GMP) rules, and line-stop protocols.
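
One way to picture a manufacturing constitution is as a set of rules that every proposed task must pass before it reaches the robot. The sketch below is a minimal stand-in under stated assumptions, not AutoRT's mechanism: it uses hand-written predicate rules, and the `TaskProposal` fields, the rule names, the payload limit, and the SOP identifiers are all hypothetical. In a real cell, hard safety limits would be enforced by the safety PLC and not by application code like this.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class TaskProposal:
    """A task proposed by a planner. All fields are hypothetical."""
    description: str
    payload_kg: float
    human_in_cell: bool
    sop_id: Optional[str]  # SOP the task claims to follow, if any


# A rule returns a rejection reason, or None if the proposal passes.
Rule = Callable[[TaskProposal], Optional[str]]

MAX_PAYLOAD_KG = 5.0  # illustrative limit
APPROVED_SOPS = frozenset({"SOP-PICK-001", "SOP-PLACE-002"})  # illustrative


def requires_approved_sop(p: TaskProposal) -> Optional[str]:
    if p.sop_id not in APPROVED_SOPS:
        return "task is not covered by an approved SOP"
    return None


def within_payload(p: TaskProposal) -> Optional[str]:
    if p.payload_kg > MAX_PAYLOAD_KG:
        return "payload exceeds the rated limit"
    return None


def no_motion_with_human_present(p: TaskProposal) -> Optional[str]:
    if p.human_in_cell:
        return "a person is inside the cell"
    return None


CONSTITUTION: List[Rule] = [
    requires_approved_sop,
    within_payload,
    no_motion_with_human_present,
]


def review(proposal: TaskProposal, rules: List[Rule] = CONSTITUTION) -> List[str]:
    """Collect every rejection reason; an empty list means approved."""
    reasons = []
    for rule in rules:
        reason = rule(proposal)
        if reason is not None:
            reasons.append(reason)
    return reasons


def filter_proposals(proposals: List[TaskProposal]) -> List[TaskProposal]:
    """Keep only proposals that violate no rule."""
    return [p for p in proposals if not review(p)]


if __name__ == "__main__":
    proposal = TaskProposal("lift pallet", 20.0, True, None)
    for reason in review(proposal):
        print("rejected:", reason)
```

Collecting all rejection reasons, instead of stopping at the first, gives the planner and the operator a complete picture of why a proposal was blocked.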

Figure 8.2: Failure explanation from multisensory robot logs. Source: reused figure from S3.

References

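  1. Xuanlin Li et al. (2024). Evaluating Real-World Robot Manipulation Policies in Simulation. arXiv.
  2. Michael Ahn et al. (2024). AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents. arXiv.
  3. Zeyi Liu et al. (2023). REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction. arXiv.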