## Audio Dataset

## Stage 1: Pretraining

We mainly use the [WavCaps](https://github.com/XinhaoMei/WavCaps) dataset for pre-training.

### Download

```bash
# install git-lfs
sudo apt update
sudo apt-get install git-lfs

# clone the dataset repository and fetch the large files
git clone https://huggingface.co/datasets/cvssp/WavCaps
cd WavCaps
git lfs pull --include "*"
```

### Processing

1. Extract the zip file

```bash
# merge shards first
zip -s- FILE_NAME.zip -O COMBINED_FILE.zip
unzip COMBINED_FILE.zip
```

2. Process the data

Extract the raw audio data:

```bash
unzip COMBINED_FILE.zip -d /target/dir
```

Create JSON files (annotations) for each example. Before processing, modify `dataset/audio/process.py` to set the data and JSON paths.

```bash
python3 dataset/audio/process.py --dataset test --data_dir /path/to/data --json_path /path/to/json
```

3. Pack with tar

```bash
python3 dataset/audio/make_tar.py --input /path/to/data --output /path/to/web_dataset \
    --dataclass none --filename filename --num_element 500
```

To view the contents of a tar file:

```bash
tar tf filename.tar | sed 10q
```

**To set up in one line:**

```bash
# DATASET is one of: soundbible, bbc, audioset, freesound
DATASET=soundbible bash dataset/audio/setup.sh
```
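
For readers who want to see roughly what the "Pack with tar" step does, here is a minimal sketch using only the Python standard library. It is not the implementation of `dataset/audio/make_tar.py`. The function names `pack_shards` and `list_members`, the shard naming pattern, and the reading of `num_element` as "files per shard" are all assumptions made for illustration.

```python
import tarfile
from pathlib import Path


def pack_shards(input_dir, output_dir, filename="shard", num_element=500):
    """Pack files under input_dir into tar shards of at most num_element files.

    Hypothetical helper: mirrors the --input/--output/--filename/--num_element
    flags shown above, but is NOT the repository's make_tar.py.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Sort so that files sharing a stem (e.g. audio + json) end up adjacent.
    files = sorted(p for p in input_dir.rglob("*") if p.is_file())
    shards = []
    for start in range(0, len(files), num_element):
        # Assumed naming scheme: <filename>-000000.tar, <filename>-000001.tar, ...
        shard_path = output_dir / f"{filename}-{start // num_element:06d}.tar"
        with tarfile.open(shard_path, "w") as tar:
            for path in files[start:start + num_element]:
                # Store paths relative to input_dir inside the archive.
                tar.add(path, arcname=path.relative_to(input_dir).as_posix())
        shards.append(shard_path)
    return shards


def list_members(tar_path, n=10):
    """Return the first n member names, similar to `tar tf file.tar | sed 10q`."""
    with tarfile.open(tar_path) as tar:
        return tar.getnames()[:n]
```

Calling `pack_shards("/path/to/data", "/path/to/web_dataset", filename="filename", num_element=500)` would write one tar file per 500 input files. In this sketch an audio file and its annotation can fall on either side of a shard boundary, so an even `num_element` is the safer choice when every example consists of exactly two files.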