#2
★ Gold
The embodied AI sector confronts a massive data bottleneck: training general-purpose robots requires hundreds of billions of interaction data points, but the industry currently holds only a few million—a 100,000x gap. In response, companies are abandoning proprietary data silos. JD.com plans to collect 5 million hours of human video and 1 million hours of robot data. Government-backed open-source communities are forming, and competitors including Unitree, AgiBot, and Leju are collaborating on shared datasets. The industry is converging on a layered training approach combining simulation, teleoperation, UMI (universal manipulation interface), and video-based learning.
#3
★ Gold
Bessemer Venture Partners published a deep analysis of world models—a new class of AI that learns physical intuition from video rather than expensive robot teleoperation data. Models like NVIDIA Cosmos (7-14B parameters), DeepMind Genie 3, and OpenAI Sora are demonstrating emergent physical understanding at scale. The key insight: by pre-training on abundant internet video and fine-tuning with minimal robot-specific data for action conditioning, these approaches could dramatically reduce the cost and data requirements of robot learning. However, challenges remain: spatiotemporal consistency over long horizons, tactile sensing gaps, and inference costs (~$100/hour for Genie 3).
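The pre-train-then-fine-tune paradigm above can be sketched in a toy form. Everything here is illustrative: the "backbone" is a fixed random projection standing in for a video-pretrained world model, and the dimensions and names are assumptions, not any real Cosmos/Genie/Sora API. The point it shows is that fine-tuning touches only a tiny action head while the bulk of the parameters stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
FRAME_DIM, LATENT_DIM, ACTION_DIM = 64, 16, 4

# "Pre-trained" video backbone: in practice billions of parameters learned
# from internet video; here a fixed random projection stands in for it and
# stays frozen during robot-specific fine-tuning.
W_backbone = rng.normal(size=(FRAME_DIM, LATENT_DIM))

def encode(frames):
    """Map raw frames to latent states with the frozen backbone."""
    return frames @ W_backbone

# Minimal robot-specific dataset: (frame, action) pairs for action conditioning.
frames = rng.normal(size=(32, FRAME_DIM))
actions = rng.normal(size=(32, ACTION_DIM))

# Fine-tuning fits only the small action head (LATENT_DIM x ACTION_DIM
# parameters), here in closed form via least squares.
latents = encode(frames)
W_head, *_ = np.linalg.lstsq(latents, actions, rcond=None)

def predict_action(frame):
    """Predict an action for one frame: frozen encoder + tuned head."""
    return encode(frame[None]) @ W_head

pred = predict_action(frames[0])
print(pred.shape)  # (1, 4)
```

The trainable head here has 64 parameters versus 1,024 in the frozen backbone; the real-world ratio is vastly more extreme, which is what makes the data savings plausible.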
#12
★ Silver
Engineered Arts revealed its new Tritium AI platform for the 61-DoF Ameca humanoid robot, enabling users to define robot behaviors entirely in plain text, supplemented by knowledge documents and custom abilities. The system integrates NLP, speech recognition, and text-to-speech, with support for 55+ languages and voice cloning. Rather than writing code, operators describe desired behaviors in natural language, and Tritium translates them into robot actions—a dramatic abstraction layer over traditional robot programming.
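Tritium's internals are proprietary, but the general pattern of such an abstraction layer can be sketched: plain-text behavior descriptions are mapped onto a fixed vocabulary of robot action primitives. The keyword matcher, primitive names, and behaviors below are all hypothetical stand-ins for the platform's NLP layer.

```python
# Hypothetical behavior vocabulary: each plain-text keyword maps to a
# sequence of low-level action primitives (names are illustrative only).
PRIMITIVES = {
    "wave": ["raise_arm", "oscillate_wrist", "lower_arm"],
    "greet": ["face_user", "say('Hello!')"],
    "nod": ["tilt_head_down", "tilt_head_up"],
}

def compile_behavior(description: str) -> list[str]:
    """Translate a plain-text description into an action sequence by
    matching known behavior keywords (a toy stand-in for the NLP layer)."""
    actions = []
    for word in description.lower().split():
        actions.extend(PRIMITIVES.get(word.strip(".,!"), []))
    return actions

print(compile_behavior("Greet the visitor, then wave."))
# → ['face_user', "say('Hello!')", 'raise_arm', 'oscillate_wrist', 'lower_arm']
```

A production system would replace the keyword lookup with a language model, but the design choice is the same: operators author behaviors as text, and a compiler-like layer grounds that text in the robot's action API.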