As the development of Video Generation Models (VGMs) progresses, many are claimed to possess world modeling capabilities—the ability to generate arbitrary realistic videos. These models have significant potential for applications in fields such as robotics and autonomous driving as well as in scientific simulation and medical data augmentation. However, a critical question arises: to what extent do these models adhere to physical laws? To address this, we introduce MORPHEUS, a novel benchmark designed to evaluate the intrinsic physical reasoning capabilities of VGMs. MORPHEUS provides a curated dataset comprising 80 real-world videos of experiments that capture physical phenomena, guided by specific physical invariances such as energy or momentum conservation. These invariances reveal the specific physical phenomena missed by the models, enabling a fine-grained evaluation and forming the foundation of our interpretable Physical Score measure. We also propose a Statistical Score, based on Physics-Informed Neural Networks (PINNs), to provide a complementary evaluation across a broader range of physical scenarios. Our findings reveal that even with advanced prompting techniques, such as multi-frame prompting and enhanced textual descriptions, current VGMs demonstrate substantial limitations in their ability to model and understand physical phenomena.
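To make the idea of a physical invariance concrete, the following is a minimal sketch of an energy-conservation check for a vertically moving ball. It is not the benchmark's Physical Score: the function name `energy_drift`, the normalization, and the assumption that the trajectory has already been tracked and calibrated to metres are all illustrative choices that do not come from the source.

```python
import numpy as np

G = 9.81  # m/s^2, assumed gravitational acceleration


def energy_drift(y, fps, g=G):
    """Relative spread of mechanical energy per unit mass along a vertical track.

    y   : heights in metres, one per frame (assumed already tracked and calibrated)
    fps : frame rate of the video
    Returns 0 for perfectly conserved energy; larger values mean larger violations.
    """
    y = np.asarray(y, dtype=float)
    if y.size < 3:
        raise ValueError("need at least 3 samples")
    v = np.gradient(y, 1.0 / fps)   # finite-difference velocity
    e = 0.5 * v**2 + g * y          # kinetic + potential energy per unit mass
    e = e[1:-1]                     # drop endpoints (one-sided differences there)
    swing = g * (y.max() - y.min()) # potential energy exchanged over the clip
    if swing == 0.0:
        # A motionless object conserves energy trivially; this check cannot flag it.
        return 0.0
    return (e.max() - e.min()) / swing


# Hypothetical trajectories, 30 frames at 30 fps.
t = np.arange(30) / 30.0
free_fall = 5.0 - 0.5 * G * t**2   # physically plausible drop from 5 m
constant_speed = 5.0 - 2.0 * t     # "floating" descent with no acceleration

print(energy_drift(free_fall, fps=30))
print(energy_drift(constant_speed, fps=30))
```

Under these assumptions the free-fall track yields a drift near zero, while the constant-speed descent loses potential energy without gaining kinetic energy and yields a drift close to 1. A static object passes this check trivially, so a single invariant of this kind would need to be combined with other checks to catch failure modes such as stillness.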
Experiments: Falling Ball, Bouncing Ball, Projectile, Holonomic Pendulum, Non-holonomic Pendulum, Double Pendulum.
Model Name | Experiment Type | Conditioning | Prompt Type | Discard Rate | Physical Invariance Score | Dynamical Score |
---|---|---|---|---|---|---|
An example of the disappearance of the object (the projectile ball). Model: PyramidalFlow, multi-frame conditioning, enhanced text prompt.
An example of the duplication of the object (non-holonomic pendulum). Model: COSMOS, multi-frame conditioning, plain text prompt.
An example of the stillness of the object (double pendulum). Model: LTX, single-frame conditioning, plain text prompt.