Structured, coupled motion is fundamentally different from natural scene motion. Current video models have never been systematically tested on it — until now.
21,156 video clips from 1,357 mechanical assemblies across 28 categories — with systematic variation in speed, camera viewpoint, and motion direction.
We evaluate 15 state-of-the-art image-to-video models on mechanically consistent video synthesis using VBench-I2V metrics and expert human evaluation across 12 dimensions.
Click any model to highlight its profile. Scores are mean Likert ratings (1–5) across 12 dimensions from expert human annotators.
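As a rough illustration of how a per-model profile of mean Likert ratings could be computed, the sketch below averages annotator scores per model and dimension. The record shape, the function name `meanLikertProfiles`, and the model and dimension names are all hypothetical placeholders, not the benchmark's actual data or pipeline.

```javascript
// Hypothetical rating records: one per (model, dimension, annotator).
// Names and scores are placeholders, not benchmark results.
const ratings = [
  { model: "model-a", dimension: "dim-1", score: 4 },
  { model: "model-a", dimension: "dim-1", score: 5 },
  { model: "model-a", dimension: "dim-2", score: 2 },
  { model: "model-b", dimension: "dim-1", score: 3 },
];

// Returns { [model]: { [dimension]: meanScore } }.
function meanLikertProfiles(records) {
  const totals = {};
  for (const { model, dimension, score } of records) {
    // Likert ratings on this page are on a 1-5 scale.
    if (!(score >= 1 && score <= 5)) {
      throw new RangeError(`Likert score out of range: ${score}`);
    }
    if (!totals[model]) totals[model] = {};
    if (!totals[model][dimension]) totals[model][dimension] = { sum: 0, count: 0 };
    totals[model][dimension].sum += score;
    totals[model][dimension].count += 1;
  }

  const profiles = {};
  for (const [model, dims] of Object.entries(totals)) {
    profiles[model] = {};
    for (const [dimension, { sum, count }] of Object.entries(dims)) {
      profiles[model][dimension] = sum / count;
    }
  }
  return profiles;
}

const profiles = meanLikertProfiles(ratings);
console.log(profiles["model-a"]);
```

Each model's resulting object has one mean per dimension, which is the shape a radar-style profile chart would typically consume.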
Figures and charts as they appear in the publication. Add more figures to MECHVERSE_CONFIG.figures in config.js.
The same input image and prompt are fed to every model. Ground truth is shown first, followed by all models side by side.