Chen, Y., Xian, Y., Koepke, A. S., & Akata, Z. (2021). Distilling Audio-Visual Knowledge by Compositional Contrastive Learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE.