Track Everything Everywhere Fast and Robustly

1 University of Pennsylvania
2 Archimedes, Athena RC

*Indicates Equal Contribution

Our optimization-based approach leverages depth priors and achieves fast and robust long-term dense tracking

Abstract

We propose a novel test-time optimization approach for efficiently and robustly tracking any pixel at any time in a video. The latest state-of-the-art optimization-based tracking technique, OmniMotion, requires a prohibitively long optimization time, rendering it impractical for downstream applications. OmniMotion is also sensitive to the choice of random seed, leading to unstable convergence. To improve efficiency and robustness, we introduce a novel invertible deformation network, CaDeX++, which factorizes the function representation into a local spatial-temporal feature grid and enhances the expressivity of the coupling blocks with non-linear functions. While CaDeX++ incorporates a stronger geometric bias within its architectural design, it also takes advantage of the inductive bias provided by vision foundation models. Our system uses monocular depth estimation to represent scene geometry and augments the objective with DINOv2 long-term semantics to regularize the optimization process. Our experiments demonstrate a substantial improvement in training speed (more than 10 times faster), robustness, and tracking accuracy over the state-of-the-art optimization-based method OmniMotion.

5 min video intro and results (with narration)



Method Overview

We avoid a NeRF-like reconstruction process by explicitly exploiting recent advances in foundational monocular metric depth estimation, specifically ZoeDepth, which estimates reasonably accurate and consistent geometry for each frame. We factorize the invertible deformation field into a local spatial-temporal feature grid and use non-linear functions in the coupling blocks to increase expressivity.
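
To make the idea concrete, the following is a minimal NumPy sketch of tracking through an invertible deformation, not the CaDeX++ implementation. It lifts pixels to 3D with a depth value, maps them to a shared canonical space with coupling blocks conditioned on the source frame, and maps back with the inverse blocks conditioned on the target frame. Everything named here is an assumption made for illustration: the pinhole intrinsics `fx, fy, cx, cy`, the single per-frame latent code (a stand-in for the local spatial-temporal feature grid), the random untrained weights, and the use of plain affine coupling instead of the non-linear coupling functions described above.

```python
import numpy as np

def backproject(uv, depth, fx, fy, cx, cy):
    """Lift pixels (N, 2) with depth (N,) to camera-space points (N, 3), assuming a pinhole camera."""
    x = (uv[:, 0] - cx) / fx * depth
    y = (uv[:, 1] - cy) / fy * depth
    return np.stack([x, y, depth], axis=1)

class AffineCoupling:
    """Toy coupling block: rescales and shifts one coordinate, conditioned on
    the other two coordinates and a per-frame latent code (hypothetical)."""

    def __init__(self, axis, latent_dim, hidden=16, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.axis = axis
        self.keep = [i for i in range(3) if i != axis]
        self.w1 = rng.normal(scale=0.5, size=(2 + latent_dim, hidden))
        self.w2 = rng.normal(scale=0.5, size=(hidden, 2))

    def _scale_shift(self, pts, code):
        tiled = np.broadcast_to(code, (len(pts), code.shape[0]))
        h = np.concatenate([pts[:, self.keep], tiled], axis=1)
        out = np.tanh(h @ self.w1) @ self.w2
        return np.tanh(out[:, 0]), out[:, 1]  # bounded log-scale, shift

    def forward(self, pts, code):
        log_s, t = self._scale_shift(pts, code)
        out = pts.copy()
        out[:, self.axis] = pts[:, self.axis] * np.exp(log_s) + t
        return out

    def inverse(self, pts, code):
        # The conditioning coordinates are untouched by forward(), so the
        # same scale and shift can be recomputed and undone exactly.
        log_s, t = self._scale_shift(pts, code)
        out = pts.copy()
        out[:, self.axis] = (pts[:, self.axis] - t) * np.exp(-log_s)
        return out

def to_canonical(blocks, pts, code):
    for block in blocks:
        pts = block.forward(pts, code)
    return pts

def from_canonical(blocks, pts, code):
    for block in reversed(blocks):
        pts = block.inverse(pts, code)
    return pts

rng = np.random.default_rng(42)
blocks = [AffineCoupling(axis, latent_dim=4, rng=rng) for axis in (0, 1, 2)]
code_i, code_j = rng.normal(size=4), rng.normal(size=4)  # latent codes of frames i and j

uv = np.array([[320.0, 240.0], [100.0, 50.0]])  # query pixels in frame i
depth = np.array([2.0, 3.5])                    # e.g. from a monocular depth estimator
pts_i = backproject(uv, depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)

canon = to_canonical(blocks, pts_i, code_i)     # frame i -> canonical space
pts_j = from_canonical(blocks, canon, code_j)   # canonical space -> frame j
```

Because every block is invertible by construction, mapping `pts_j` back to canonical space with `code_j` recovers `canon` up to floating-point error, so correspondences stay cycle-consistent across frames regardless of the weights. In a real system the weights and latent features would be optimized per video; here they are random and only demonstrate the mechanics.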

BibTeX


@misc{song2024track,
  title={Track Everything Everywhere Fast and Robustly},
  author={Yunzhou Song and Jiahui Lei and Ziyun Wang and Lingjie Liu and Kostas Daniilidis},
  year={2024},
  eprint={2403.17931},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}