MechVerse — Physical Motion Consistency Benchmark

The Problem

Why Mechanical Motion Breaks Video Models

Structured, coupled motion is fundamentally different from natural scene motion. Current video models have never been systematically tested on it — until now.

The Dataset

MechVerse Dataset

21,156 video clips from 1,357 mechanical assemblies across 28 categories — with systematic variation in speed, camera viewpoint, and motion direction.
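As a rough illustration of the variation axes, the sketch below enumerates the conditions for a single assembly. The axis values (Slow/Mid/Fast, Cam1 to Cam3, Forward/Reversed) come from the showcase labels on this page. The clip naming scheme and the `gear_0001` identifier are hypothetical, and this page does not state whether every assembly is rendered under every combination.

```python
from itertools import product

# Axis values taken from the showcase labels on this page.
SPEEDS = ("slow", "mid", "fast")
CAMERAS = ("cam1", "cam2", "cam3")
DIRECTIONS = ("forward", "reversed")


def clip_variants(assembly_id: str) -> list[str]:
    """Return one hypothetical clip name per (camera, speed, direction) condition.

    The naming pattern is an assumption made for illustration only;
    it is not the dataset's actual file layout.
    """
    return [
        f"{assembly_id}_{cam}_{speed}_{direction}"
        for cam, speed, direction in product(CAMERAS, SPEEDS, DIRECTIONS)
    ]


variants = clip_variants("gear_0001")  # "gear_0001" is a made-up assembly id
print(len(variants))  # 18 conditions on the full grid
print(variants[0])    # gear_0001_cam1_slow_forward
```

Holding two axes fixed and varying the third gives the paired comparisons shown in the showcase, for example the same assembly and camera at three speeds.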

Category Showcase: Cam1 · Mid Speed · Forward

Speed Variants: Same assembly · Slow / Mid / Fast

Camera Angles: Same assembly · Cam1 / Cam2 / Cam3

Direction Variation: Same clip · Forward vs Reversed

Dataset Statistics

Evaluation

Evaluation Protocol

We evaluate 15 state-of-the-art image-to-video models on mechanically consistent video synthesis, using VBench-I2V metrics and expert human evaluation across 12 dimensions.

Results

Human Evaluation Scores

Scores are mean Likert ratings (1–5) across 12 dimensions from expert human annotators.
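A minimal sketch of how a per-model score profile could be aggregated from raw annotator ratings, assuming each rating is a `(model, dimension, score)` tuple. The model and dimension names below are placeholders, not the benchmark's actual labels, and this is not necessarily the authors' aggregation code.

```python
from collections import defaultdict
from statistics import mean


def mean_likert(ratings):
    """Average 1-5 Likert scores per model and per dimension.

    ratings: iterable of (model, dimension, score) tuples.
    Returns a nested dict {model: {dimension: mean score}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for model, dimension, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"Likert score out of range: {score}")
        buckets[model][dimension].append(score)
    return {
        model: {dim: mean(scores) for dim, scores in dims.items()}
        for model, dims in buckets.items()
    }


# Placeholder ratings from two hypothetical annotators.
ratings = [
    ("model_a", "motion_consistency", 4),
    ("model_a", "motion_consistency", 2),
    ("model_a", "part_coupling", 5),
    ("model_b", "motion_consistency", 1),
]
profile = mean_likert(ratings)
print(profile["model_a"])
```

Each inner dictionary is one model's profile: one mean score per evaluated dimension.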


Visual Comparison

Cross-Model Comparison

The same input image and prompt are fed to every model. The ground truth is shown first, followed by all models side by side.

Citation

If you find MechVerse useful in your work, please consider citing:


@article{jain2026mechverse,
  title={MechVerse: Evaluating Physical Motion Consistency in Video Generation Models},
  author={Jain, Rahul and Patel, Mayank and Unmesh, Asim and Ramani, Karthik},
  journal={arXiv preprint arXiv:2605.14843},
  year={2026}
}