Generating the Past, Present and Future from a Motion-Blurred Image

SaiKiran Tedla1      Kelly Zhu2,3      Trevor Canham1      Felix Taubner2,3
Michael S. Brown1      Kiriakos N. Kutulakos2,3      David B. Lindell2,3
1York University      2University of Toronto      3Vector Institute
ACM Transactions on Graphics (SIGGRAPH Asia 2025)


In-the-Wild Results

We showcase our method on a variety of in-the-wild scenes. Our model can predict both the present (1st tab) and the past and future (2nd tab). These scenes are sourced from various cameras, settings, and blur types. We find our model is able to reconstruct videos from a wide range of scenes, including those with complex motion and occlusions. Below we visualize the results of our method alongside MotionETR and Jin et al. for comparison. We also show the motion tracks of the predicted frames along with dynamic 3D reconstructions recovered by MegaSAM.

[Interactive viewer: scene selection; blurry image; recovered video and motion tracks for Ours, MotionETR, and Jin et al.; frame control over past, present (within exposure time), and future predictions; 3D reconstruction from the past, present, and future predictions (MegaSAM on our output, with camera poses).]


Bringing Historical Photos to Life

We find our method can recover video from historical motion-blurred photos. Again, we test our model on predicting both the present (1st tab) and the past and future (2nd tab). We also show the motion tracks of the predicted frames along with dynamic 3D reconstructions recovered by MegaSAM.

[Interactive viewer: scene selection; blurry image; recovered video and motion tracks (Ours); frame control over past, present (within exposure time), and future predictions; 3D reconstruction from the past, present, and future predictions (MegaSAM on our output, with camera poses).]

Recovering 3D Human Pose

We find our method can recover 3D pose from a blurry image containing human motion. From this blurred image, we recover a video sequence of a man shaking his head. We visualize the recovered video sequence as well as the estimated head pose.

[Viewer: blurry image, reconstructed video, and pose estimation.]

Simulated Results

We test our model on the simulated dataset generated in our paper. It recovers a diverse range of motions while predicting the past, present, and future. We show motion tracks to visualize the predicted motions.

[Interactive viewer: scene selection; video reconstruction; blurry image; recovered video and motion tracks (Ours); frame control over past, present (within exposure time), and future predictions.]

GoPro Results

We test our model on the GoPro dataset and compare against MotionETR and Jin et al. Our method recovers the videos more faithfully and predicts more realistic motion tracks. MotionETR warps the scene content in many cases, and Jin et al. fails to recover many subtle motions and produces numerous artifacts. For blur caused by global motion, our motion tracks are smooth, while the baselines latch onto local features and produce jittery tracks.

[Interactive viewer: scene selection; blurry image; recovered video and motion tracks for Ours, MotionETR, Jin et al., and the ground truth; frame control over past, present (within exposure time), and future predictions.]


B-AIST++ Results

We compare our method against Animation from Blur on the B-AIST++ dataset. We find our model recovers the dancer's poses and motions more faithfully, and it is able to handle the occlusions and disocclusions that occur as the dancer moves.

[Interactive viewer: scene selection; blurry image; recovered video and motion tracks for Ours, Animation from Blur, and the ground truth; frame control over past, present (within exposure time), and future predictions.]


Limitations

Our model struggles with artistic motion blur, where the photo is synthesized from images captured at different time instances. Additionally, our model fails in scenarios with complex camera and scene motion.

[Interactive viewer: scene selection; blurry image; recovered video and motion tracks for Ours, MotionETR, and Jin et al.; frame control over past, present (within exposure time), and future predictions.]


Multimodal Generation

Many different videos could potentially be consistent with an input motion-blurred image because the direction of motion is inherently ambiguous. We find that our model captures this ambiguity through a multimodal distribution over generated motions, which we assess by sampling multiple output videos for a given input image.
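
The sketch below is a minimal, hypothetical illustration of this procedure: it draws several videos for the same blurred input by re-running a conditional sampler with different random seeds. The `BlurToVideoModel` class and its `sample` signature are stand-ins assumed for illustration only, not our released interface.

```python
# Minimal sketch (not the actual model code): drawing several video samples for
# one blurry input by re-running a conditional sampler with different seeds.
import torch

class BlurToVideoModel:
    """Placeholder for a conditional video diffusion model (assumed interface)."""
    def sample(self, blurry_image: torch.Tensor, num_frames: int,
               generator: torch.Generator) -> torch.Tensor:
        # Stand-in: a real model would run iterative denoising conditioned on the image.
        return torch.randn(num_frames, *blurry_image.shape, generator=generator)

model = BlurToVideoModel()
blurry = torch.rand(3, 256, 256)  # dummy motion-blurred input

# Same conditioning, different seeds -> different plausible motion directions.
samples = [
    model.sample(blurry, num_frames=7, generator=torch.Generator().manual_seed(s))
    for s in (0, 1, 2, 3)
]
print([v.shape for v in samples])
```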

[Interactive viewer: scene selection; blurry image; four sampled videos (Video 1-4); frame control over past, present (within exposure time), and future predictions.]

Exposure Interval Control

The exposure interval encoding enables video generation with control over the start and end times of the generated frames. To test the effectiveness of this mechanism, we evaluate the accuracy of the predicted video frames under varying exposure interval configurations. Specifically, we test two scenarios: one with varying frame durations relative to the input motion-blurred image, and another with dead time between generated frames.
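
The snippet below is a minimal sketch of how such per-frame exposure intervals can be constructed. The normalization of the input exposure to [0, 1], the `make_exposure_intervals` helper, and its arguments are assumptions made for illustration rather than the exact encoding consumed by the model.

```python
# Minimal sketch (assumed conventions): per-frame exposure intervals, with the
# input blur exposure normalized to [0, 1]. Frames can lie before 0 (past),
# inside [0, 1] (present), or after 1 (future); a nonzero `dead_time` leaves
# gaps between consecutive frames.
import numpy as np

def make_exposure_intervals(t_start: float, num_frames: int,
                            frame_duration: float, dead_time: float = 0.0) -> np.ndarray:
    """Return an array of shape (num_frames, 2) with [start, end] per frame."""
    starts = t_start + np.arange(num_frames) * (frame_duration + dead_time)
    ends = starts + frame_duration
    return np.stack([starts, ends], axis=1)

# Present only: 8 back-to-back frames that tile the blur exposure exactly.
present = make_exposure_intervals(0.0, 8, frame_duration=1.0 / 8)

# Past, present, and future: 8 frames starting before the exposure, with dead time.
extended = make_exposure_intervals(-0.5, 8, frame_duration=0.2, dead_time=0.05)
print(present.round(3))
print(extended.round(3))
```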

[Interactive viewer: scene selection; blurry image; ground truth and recovered video (Ours); frame control over past, present (within exposure time), and future predictions.]

Proposed vs Alternate Embedding

We also compare against an alternative exposure interval encoding scheme. Instead of explicitly providing the per-frame exposure intervals as input, we fine-tuned a new model whose temporal conditioning signal consists of (1) the start time of the first output frame, (2) the end time of the last output frame, (3) the (uniform) duration of individual frames, and (4) the number of frames. This scheme contains equivalent information, but encodes the intervals implicitly.
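
The sketch below illustrates the relationship between the two schemes under the assumption of uniformly spaced frames: the compact four-value conditioning can be expanded back into explicit per-frame intervals. The helper names and time conventions are hypothetical and shown only to make the equivalence concrete.

```python
# Minimal sketch (hypothetical helpers): the alternate conditioning packs
# (first_start, last_end, frame_duration, num_frames) into one vector, while the
# proposed encoding lists each frame's [start, end] explicitly. With uniformly
# spaced frames the compact form expands to the same per-frame intervals.
import numpy as np

def alternate_encoding(first_start, last_end, frame_duration, num_frames):
    return np.array([first_start, last_end, frame_duration, num_frames], dtype=float)

def expand_to_intervals(enc):
    first_start, last_end, frame_duration, num_frames = enc
    n = int(num_frames)
    stride = (last_end - frame_duration - first_start) / (n - 1) if n > 1 else 0.0
    starts = first_start + np.arange(n) * stride
    return np.stack([starts, starts + frame_duration], axis=1)

enc = alternate_encoding(first_start=-0.5, last_end=1.5, frame_duration=0.2, num_frames=8)
print(expand_to_intervals(enc))  # recovers the explicit per-frame intervals
```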

[Interactive viewer: scene selection; blurry image and ground truth; results with the alternate and proposed embeddings; frame control over past, present (within exposure time), and future predictions.]