Why Mechanical Motion Breaks Video Models

Structured, coupled motion is fundamentally different from natural scene motion. Current video models have never been systematically tested on it — until now.

MechVerse Dataset

21,156 video clips from 1,357 mechanical assemblies across 28 categories — with systematic variation in speed, camera viewpoint, and motion direction.

Category Showcase: Cam1 · Mid Speed · Forward

Speed Variants: Same assembly · Slow / Mid / Fast

Camera Angles: Same assembly · Cam1 / Cam2 / Cam3

Direction Variation: Same clip · Forward vs Reversed
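The variation axes shown above can be pictured as a small grid per assembly. The sketch below enumerates that grid. It assumes the three axes and labels named in the gallery captions (Slow/Mid/Fast, Cam1/Cam2/Cam3, Forward/Reversed). The `ClipVariant` record, `variants_for`, and `reverse_frames` are hypothetical illustrations, not the dataset's actual metadata schema or tooling. Treating a "Reversed" clip as the forward frames played backward is likewise an assumption drawn from the "Same clip" caption.

```python
from dataclasses import dataclass
from itertools import product

# Variation axes named in the gallery captions; labels mirror those captions.
SPEEDS = ("Slow", "Mid", "Fast")
CAMERAS = ("Cam1", "Cam2", "Cam3")
DIRECTIONS = ("Forward", "Reversed")


@dataclass(frozen=True)
class ClipVariant:
    # Hypothetical record for illustration; not the dataset's real schema.
    assembly_id: str
    speed: str
    camera: str
    direction: str


def variants_for(assembly_id: str) -> list:
    """Enumerate every speed x camera x direction combination for one assembly."""
    return [
        ClipVariant(assembly_id, s, c, d)
        for s, c, d in product(SPEEDS, CAMERAS, DIRECTIONS)
    ]


def reverse_frames(frames: list) -> list:
    """Assumed derivation of a 'Reversed' clip: the forward frames in reverse order."""
    return frames[::-1]


if __name__ == "__main__":
    grid = variants_for("assembly_0001")  # hypothetical assembly ID
    print(len(grid))  # 3 speeds x 3 cameras x 2 directions = 18 combinations
    print(reverse_frames(["f0", "f1", "f2"]))
```

The full grid is only an upper bound per assembly. This page does not state whether every assembly is rendered under every combination.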

Dataset Statistics

Evaluation Protocol

We evaluate 15 state-of-the-art image-to-video models on mechanically consistent video synthesis, using automatic VBench-I2V metrics and expert human evaluation across 12 dimensions.

Human Evaluation Scores

Scores are mean Likert ratings (1–5) across 12 dimensions from expert human annotators.
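A mean Likert score is the average of the individual 1–5 ratings that annotators gave one model on one dimension. The minimal sketch below shows that aggregation. The `mean_likert` helper, the tuple input format, and the model and dimension names in the example are hypothetical placeholders, not the paper's annotation pipeline or its 12 actual dimensions.

```python
from collections import defaultdict
from statistics import mean


def mean_likert(ratings):
    """Average Likert ratings per model and dimension.

    ratings: iterable of (model, dimension, score) tuples, score an integer in 1..5.
    Returns a nested dict {model: {dimension: mean score}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for model, dimension, score in ratings:
        # Reject values outside the 1-5 Likert scale rather than silently averaging them.
        if not 1 <= score <= 5:
            raise ValueError(f"Likert score out of range: {score}")
        buckets[model][dimension].append(score)
    return {
        model: {dim: mean(scores) for dim, scores in dims.items()}
        for model, dims in buckets.items()
    }


if __name__ == "__main__":
    # Placeholder models and dimensions, for illustration only.
    example = [
        ("ModelA", "Motion", 4),
        ("ModelA", "Motion", 5),
        ("ModelA", "Physics", 2),
        ("ModelB", "Motion", 3),
    ]
    print(mean_likert(example))
```

Keeping per-dimension means, rather than a single overall average, preserves the per-model profile across dimensions that the chart displays.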


From the Paper

Figures and charts as they appear in the publication.

Cross-Model Comparison

The same input image and prompt are fed to every model. The ground-truth clip is shown first, followed by all models side by side.
