Jensen Huang describes the evolution of AI in stages, moving from perception and generation to agentic AI and finally to Physical AI, which understands and interacts with the physical world. He presents World Foundation Models (WFMs) as the core tool for this new era.
NVIDIA Cosmos is an integrated platform that uses text and video inputs to create virtual worlds governed by the laws of physics. It generates the massive amounts of data required for training autonomous vehicles and robots. By using Cosmos, companies can run sophisticated simulations without real-world constraints, which can improve both the safety of self-driving cars and the efficiency of manufacturing automation.
NVIDIA’s Cosmos WFM is a world-understanding model developed to open the age of Physical AI. Its core definitions and features are as follows:
1. Definition of Cosmos WFM (World Foundation Model)
While traditional Large Language Models (LLMs) learn from text, Cosmos WFM learns from video data of the physical world to build a "digital twin."
This allows Physical AI, such as robots and autonomous vehicles, to learn and train safely in a virtual space before being deployed in the real world.
2. Key Technical Components
The Cosmos platform provides two core modeling methods (recipes) that developers can choose based on their objectives:
Diffusion Models: Starting from random noise, these models refine the whole image step by step into high-definition, realistic video that matches a text prompt. The process is like a sculptor carving a precise virtual world out of a rough block.
Autoregressive Models: Similar to how language models predict the next word, these models predict the next frame of a video. This enables the prediction of future scenarios with fast inference speeds.
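The difference between the two recipes can be sketched with a toy example. This is only an illustration and not the Cosmos API: the frames are short lists of numbers, and both toy_next_frame and toy_denoise_step are hypothetical hand-written stand-ins for what would be large learned neural networks in a real system.

```python
import random

Frame = list[float]  # a toy 1-D "frame": a short row of pixel intensities


def toy_next_frame(prev: Frame, last: Frame) -> Frame:
    """Hypothetical stand-in for a learned predictor: constant-velocity extrapolation."""
    return [2 * b - a for a, b in zip(prev, last)]


def autoregressive_rollout(context: list[Frame], steps: int) -> list[Frame]:
    """Generate frames one at a time, feeding each prediction back in as context."""
    frames = list(context)
    for _ in range(steps):
        frames.append(toy_next_frame(frames[-2], frames[-1]))
    return frames[len(context):]


def toy_denoise_step(sample: Frame, target: Frame, strength: float) -> Frame:
    """Hypothetical stand-in for a learned denoiser: nudge the sample toward the target."""
    return [s + strength * (t - s) for s, t in zip(sample, target)]


def diffusion_style_sample(target: Frame, steps: int, seed: int = 0) -> Frame:
    """Start from pure random noise and refine the whole frame over several steps."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in target]
    for _ in range(steps):
        sample = toy_denoise_step(sample, target, strength=0.5)
    return sample


# Autoregressive: two context frames in, three predicted frames out.
print(autoregressive_rollout([[0.0, 1.0], [1.0, 2.0]], steps=3))
# Diffusion-style: noise is refined until it is close to the frame the "prompt" asks for.
print(diffusion_style_sample([0.2, 0.8, 0.5], steps=20))
```

The autoregressive loop builds the video sequentially, so each new frame depends on the ones before it. The diffusion-style loop works on the entire output at once and improves it over repeated passes. In this toy version the target frame is handed to the denoiser directly, whereas a trained model would infer it from the text prompt.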
Additionally, to handle vast amounts of video data efficiently, NVIDIA uses a Video Tokenizer that is up to 12 times faster than previous technologies. This allows driving data that previously took a month to process to be completed in just two days.
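As a rough sanity check on those two figures, the arithmetic can be worked through directly. The sketch below assumes a 30-day month and assumes that tokenization accounts for the whole workload. Both are simplifications introduced here, not figures from NVIDIA.

```python
def days_after_speedup(baseline_days: float, speedup: float) -> float:
    """Time needed once a job runs `speedup` times faster than the baseline."""
    return baseline_days / speedup


# Assumed 30-day month at the quoted "up to 12 times" speedup.
print(days_after_speedup(30, 12))  # 2.5
```

Under these assumptions the result is about two and a half days, which is in line with the quoted reduction from a month to roughly two days.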
3. Primary Applications
Cosmos WFM plays a critical role in fields where real-world data is difficult or dangerous to obtain.
Autonomous Driving: It trains AI by virtually generating "edge cases," such as dangerous near-miss accidents or extreme weather conditions like heavy rain and snow. Companies like Mercedes-Benz, Hyundai, and Uber are using it to create autonomous driving test scenarios.
Robotics: Robots are trained by simulating how they perceive objects or perform actions in unfamiliar environments.
Future Prediction: Based on the current state, it simulates various future possibilities to help the AI choose the optimal path.
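The future-prediction idea in the last item can be sketched as a small planning loop: imagine what happens under each candidate action, score the imagined outcomes, and keep the best one. Everything here is a hypothetical illustration. toy_world_model uses simple kinematics in place of a learned world model, and the State class, the scoring rule, and all the numbers are assumptions made for this example rather than anything from Cosmos.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    position: float  # metres along the lane
    speed: float     # metres per second


def toy_world_model(state: State, accel: float, dt: float = 0.5) -> State:
    """Hypothetical stand-in for a learned world model: simple kinematics."""
    speed = max(0.0, state.speed + accel * dt)  # the car never reverses
    return State(state.position + speed * dt, speed)


def rollout(state: State, accel: float, steps: int) -> State:
    """Imagine the future that follows from holding one action for `steps` ticks."""
    for _ in range(steps):
        state = toy_world_model(state, accel)
    return state


def score(final: State, obstacle_at: float) -> float:
    """Reward progress, but rule out any future that reaches the obstacle."""
    if final.position >= obstacle_at:
        return float("-inf")
    return final.position


def choose_action(state: State, candidates: list[float],
                  obstacle_at: float, steps: int = 6) -> float:
    """Simulate every candidate action and keep the one with the best outcome."""
    return max(candidates, key=lambda a: score(rollout(state, a, steps), obstacle_at))


# A car at 10 m/s with an obstacle 25 m ahead: only braking avoids it.
print(choose_action(State(0.0, 10.0), [-4.0, 0.0, 2.0], obstacle_at=25.0))  # -4.0
```

Because the toy car never moves backward, checking only the final position is enough to know whether it reached the obstacle at any point during the rollout. A real system would replace the kinematics with a learned model and would typically compare many more candidate futures.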

Autonomous Vehicle Training: NVIDIA Cosmos

Robot Learning: NVIDIA Cosmos


Summary
NVIDIA’s Cosmos WFM can be described as a "massive simulation engine" or a "Matrix for AI" that teaches the laws of physics to artificial intelligence. It allows AI to go through countless rounds of trial and error in a virtual world, preparing it to respond reliably in reality.
To help you intuitively understand WFM, I have included a video below of Doctor Strange from the Avengers, who simulates millions of potential futures to find the one winning move.
[Photo Credit] https://www.nvidia.com/en-us/ai/cosmos/
[Video Credit] https://www.youtube.com/watch?v=eGKPfZTXHsc
