Fan, K., Bai, Z., Xiao, T., Zietlow, D., Horn, M., Zhao, Z., et al. (2023). Unsupervised
Open-Vocabulary Object Localization in Videos. In IEEE/CVF International Conference on Computer Vision
(pp. 13701-13709). Piscataway, NJ: IEEE. doi:10.1109/ICCV51070.2023.01264.