ViDiHand

The Surprising Effectiveness of Video Diffusion Models for Hand Motion Reconstruction

Yuxi Wang1, Chengkai Jin1, Yufei Liu2, Wenqi Ouyang1, Tianyi Wei1, Zhiwei Zeng1, Siyuan Huang2, Zhiqi Shen1, Xingang Pan1
1Nanyang Technological University   2Shanghai Jiao Tong University

Performance Overview

ViDiHand satisfies the target properties of occlusion robustness, accuracy, and temporal smoothness for 4D hand recovery.

Reconstruction Results

Interactive viewer: synchronized multi-view playback of each case, switchable between our results and a method comparison. Labels shown: Source, Joint Mesh, 3D Mesh, Ego View, Front View, Master View.

Method Overview

Figure: ViDiHand method overview.

Top: The VACE branch is finetuned with hand-overlay rendering while the base DiT remains frozen, yielding a hand-aware video diffusion model.

Middle: A lightweight dual-branch decoder extracts MANO pose, 2D joints, and translation from a single intermediate VACE feature.

Bottom: At inference, the same feature is decoded in a single VACE pass.
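To make the decoder description concrete, here is a minimal NumPy sketch of a dual-branch head that maps one intermediate feature to MANO pose, 2D joints, and translation. It is not the authors' implementation: the class name `DualBranchDecoder`, the feature width, the hidden size, the token pooling, and the way outputs are split across the two branches (pose in one, 2D joints and translation in the other) are all assumptions made for illustration. The weights are random, so the outputs only demonstrate shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 64     # hypothetical channel width of the intermediate feature
NUM_JOINTS = 21   # assumed: the common 21-joint hand convention
POSE_DIM = 48     # assumed: MANO axis-angle pose (3 global + 45 articulation)


def linear(in_dim, out_dim):
    """Create a randomly initialised linear layer (illustrative only)."""
    return {"w": rng.normal(0.0, 0.02, (in_dim, out_dim)), "b": np.zeros(out_dim)}


def apply(layer, x):
    """Apply a linear layer to the last axis of x."""
    return x @ layer["w"] + layer["b"]


class DualBranchDecoder:
    """Hypothetical two-branch head sharing a single input feature."""

    def __init__(self, feat_dim=FEAT_DIM, hidden=32):
        # Branch 1 (assumed): MANO pose parameters.
        self.pose_fc = linear(feat_dim, hidden)
        self.pose_out = linear(hidden, POSE_DIM)
        # Branch 2 (assumed): 2D joints and translation.
        self.loc_fc = linear(feat_dim, hidden)
        self.joints_out = linear(hidden, NUM_JOINTS * 2)
        self.transl_out = linear(hidden, 3)

    def __call__(self, feat):
        # feat: (frames, tokens, channels); average the tokens of each frame.
        pooled = feat.mean(axis=1)
        h_pose = np.maximum(apply(self.pose_fc, pooled), 0.0)  # ReLU
        h_loc = np.maximum(apply(self.loc_fc, pooled), 0.0)    # ReLU
        return {
            "mano_pose": apply(self.pose_out, h_pose),
            "joints_2d": apply(self.joints_out, h_loc).reshape(-1, NUM_JOINTS, 2),
            "translation": apply(self.transl_out, h_loc),
        }


decoder = DualBranchDecoder()
feature = rng.normal(size=(8, 16, FEAT_DIM))  # stand-in for one VACE feature
outputs = decoder(feature)
```

Because both branches read the same `feature`, the backbone only needs to run once per clip in this sketch, which is in the spirit of the single-pass inference described above.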

Quantitative Results

Comparison on three egocentric benchmarks. ARCTIC and HOT3D are in-distribution; HOI4D is held out, giving a cross-dataset comparison that is fair to all methods. Arrows indicate whether higher (↑) or lower (↓) is better.

                           Detection                 3D Pose             Orient. & Position          Temporal
Dataset  Method            FAcc ↑  Recall ↑  F1 ↑    MPJPE-p ↓  PA-p ↓   EPE-p ↓   GO-p ↓   CT-p ↓   Jitter ↓
ARCTIC   InterWild         0.878   0.943     0.959   30.817     15.952   53.888    25.386   0.097    46.577
         HaMeR             0.876   0.943     0.957   39.976     29.325   67.816    24.924   0.094    18.094
         Hamba             0.833   0.912     0.942   40.965     31.300   88.839    27.822   0.110    15.026
         WildHands         0.879   0.946     0.960   25.704     13.941   50.517    22.320   0.058    12.972
         WiLoR             0.919   0.951     0.974   37.173     26.646   71.216    17.358   0.075    23.978
         Dyn-HaMR          0.841   0.917     0.951   40.172     30.849   98.813    26.006   0.121    12.506
         HaWoR             0.700   0.818     0.895   55.677     37.182   160.912   43.320   0.149    19.735
         OmniHands         0.866   0.949     0.954   29.674     14.203   51.505    24.580   0.087    45.312
         ViDiHand (Ours)   0.997   0.999     0.999   21.668     9.821    12.407    14.642   0.047    3.183
HOT3D    InterWild         0.669   0.881     0.868   77.168     24.811   71.482    58.501   0.213    101.164
         HaMeR             0.692   0.904     0.883   67.593     36.241   60.801    49.633   0.102    23.206
         Hamba             0.632   0.829     0.853   71.054     43.342   108.502   56.535   0.128    18.111
         WildHands         0.655   0.863     0.844   52.791     28.946   111.438   53.933   0.157    22.885
         WiLoR             0.827   0.898     0.937   44.825     35.079   69.881    25.750   0.098    17.784
         Dyn-HaMR          0.558   0.761     0.755   82.865     51.921   205.428   50.700   0.583    47.483
         HaWoR             0.348   0.499     0.655   80.146     74.957   332.311   79.350   0.262    23.806
         OmniHands         0.649   0.895     0.868   63.281     22.682   68.437    49.120   0.133    69.510
         ViDiHand (Ours)   0.948   0.974     0.983   21.514     11.383   14.953    15.829   0.040    3.741
HOI4D    InterWild         0.731   0.922     0.864   53.072     22.909   80.549    41.743   0.228    98.866
         HaMeR             0.730   0.923     0.864   48.875     33.215   81.717    33.636   0.187    20.197
         Hamba             0.709   0.885     0.849   51.698     37.395   117.801   37.466   0.204    21.740
         WildHands         0.730   0.924     0.864   45.623     23.601   82.246    45.654   0.159    18.615
         WiLoR             0.962   0.965     0.972   41.603     27.767   43.335    25.603   0.116    17.735
         Dyn-HaMR          0.749   0.861     0.844   52.826     40.172   151.902   40.306   0.258    18.146
         HaWoR             0.868   0.863     0.918   58.442     39.665   144.229   43.209   0.140    28.098
         OmniHands         0.655   0.937     0.834   44.255     18.689   70.662    34.392   0.108    24.212
         ViDiHand (Ours)   0.984   0.991     0.990   30.090     13.960   24.460    23.420   0.117    4.010
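
For readers unfamiliar with the pose and temporal columns, the sketch below shows how metrics of this family are commonly computed with NumPy: mean per-joint position error (MPJPE), its Procrustes-aligned variant (PA), and a jerk-based jitter score. These are the standard textbook definitions, not necessarily the exact protocol behind the table. The page does not define the "-p" suffix or its jitter formula, so the function names, the third-difference jitter definition, and the `fps` parameter are assumptions.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error; pred and gt have shape (T, J, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Similarity-align one frame of joints, pred (J, 3), onto gt (J, 3)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Orthogonal Procrustes via SVD of the cross-covariance matrix.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last singular direction if needed so the result is a
    # proper rotation rather than a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    flip = np.array([1.0, 1.0, d])
    rot = u @ np.diag(flip) @ vt
    scale = (s * flip).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after per-frame Procrustes alignment of pred onto gt."""
    aligned = np.stack([procrustes_align(p, g) for p, g in zip(pred, gt)])
    return mpjpe(aligned, gt)


def jitter(joints, fps=30.0):
    """Mean jerk magnitude (third finite difference) of joints (T, J, 3).

    Assumed definition; the source does not state how Jitter is computed.
    """
    jerk = np.diff(joints, n=3, axis=0) * fps ** 3
    return float(np.linalg.norm(jerk, axis=-1).mean())
```

Under these definitions, a prediction that differs from the ground truth only by a rotation, uniform scale, and translation has zero PA error but non-zero MPJPE, which is why the two columns can rank methods differently.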