This document provides instructions for preparing the datasets used in CodeBind.
CodeBind supports the following datasets across multiple modalities:
- Vision: Places365, K400, MSR-VTT, etc.
- Audio: AudioSet, AudioCaps, VGGSOUND, etc.
- Depth: SUN Depth, NYU Depth.
- Thermal: LLVIP, FLIR_v2.
- Tactile: Touch and Go.
- EEG: ImageNet-EEG.
- Point Cloud: ModelNet40.
Download the datasets and add a symbolic link to them at ~/.datasets.
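
The linking step can be sketched in Python as below. This is a minimal sketch, not part of CodeBind: the helper name `link_datasets` and the example download location are hypothetical, and it assumes that ~/.datasets is itself a single symbolic link pointing at the directory that holds all downloaded datasets. The expected layout inside that directory is not specified here, so adjust it to whatever the data loaders require.

```python
from pathlib import Path


def link_datasets(download_root, link_path="~/.datasets"):
    """Create a symbolic link at link_path that points to download_root.

    Hypothetical helper: both arguments are assumptions, not CodeBind APIs.
    """
    download_root = Path(download_root).expanduser().resolve()
    link_path = Path(link_path).expanduser()

    if not download_root.is_dir():
        raise FileNotFoundError(f"{download_root} is not a directory")

    if link_path.is_symlink():
        # Replace an existing (possibly stale) link.
        link_path.unlink()
    elif link_path.exists():
        # Refuse to overwrite a real file or directory.
        raise FileExistsError(f"{link_path} exists and is not a symlink")

    link_path.parent.mkdir(parents=True, exist_ok=True)
    link_path.symlink_to(download_root, target_is_directory=True)
    return link_path


# Example call (hypothetical download location):
# link_datasets("/data/codebind_datasets")
```

The helper refuses to overwrite a real directory at the link location, so an existing ~/.datasets folder with data in it is left untouched.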
We follow preprocessing methods similar to those of ImageBind and ViT-Lens. Detailed dataset information is given in the Appendix of our paper.