## Audio Dataset
## Stage 1: Pretraining
We mainly use the [WavCaps](https://github.com/XinhaoMei/WavCaps) dataset for pre-training.
### Download
```bash
# install git-lfs
sudo apt update
sudo apt-get install git-lfs

git clone https://huggingface.co/datasets/cvssp/WavCaps
cd WavCaps
git lfs pull --include "*"
```
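
Once the pull finishes, it is worth checking that the large archives were actually fetched rather than left as git-lfs pointer stubs. A quick check (the directory names inside WavCaps are illustrative, adjust as needed):

```bash
# list files tracked by git-lfs and their on-disk sizes;
# pointer stubs are only ~130 bytes, real shards are hundreds of MB
git lfs ls-files
du -sh ./*  # directory names inside WavCaps may differ
```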
### Processing
1. Extract the zip files
```bash
# merge shards first
zip -s- FILE_NAME.zip -O COMBINED_FILE.zip
unzip COMBINED_FILE.zip
```
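
If several subsets ship as split archives, the same merge-and-extract step can be looped over them. A rough sketch, assuming each subset sits in its own directory as `NAME.zip` plus `NAME.z01`, `NAME.z02`, ... shards (adjust the glob to the actual layout):

```bash
# merge and extract every split archive found one directory down
for f in */*.zip; do
    combined="${f%.zip}_combined.zip"
    zip -s- "$f" -O "$combined"        # merge shards into a single archive
    unzip -o "$combined" -d "$(dirname "$f")"
done
```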
2. Processing

Extract the raw audio data:
```bash
unzip COMBINED_FILE.zip -d /target/dir
```
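
A quick way to confirm the extraction is to count the audio files in the target directory (WavCaps audio is typically stored as `.flac`; change the extension if your copy differs):

```bash
# count extracted audio clips (extension is an assumption)
find /target/dir -type f -name '*.flac' | wc -l
```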
Create json files (annotations) for each example. Before processing, modify `dataset/audio/process.py` to set the data and json paths.
```bash
python3 dataset/audio/process.py --dataset test --data_dir /path/to/data --json_path /path/to/json
```
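
To sanity-check the output, count the generated annotation files and pretty-print one of them (the json path is the same placeholder as above):

```bash
# count annotation files and inspect the first one
ls /path/to/json/*.json | wc -l
python3 -m json.tool "$(ls /path/to/json/*.json | head -1)"
```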
3. Pack with tar
```bash
python3 dataset/audio/make_tar.py --input /path/to/data --output /path/to/web_dataset \
    --dataclass none --filename filename --num_element 500
```
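
If each subset was processed into its own directory, the packing step can be repeated per subset. A sketch assuming a `/path/to/data/<subset>` layout, with the subset names used by `setup.sh` below:

```bash
# pack each subset into its own set of webdataset shards
for name in soundbible bbc audioset freesound; do
    python3 dataset/audio/make_tar.py --input /path/to/data/$name \
        --output /path/to/web_dataset/$name \
        --dataclass none --filename $name --num_element 500
done
```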
To view the contents of a tar file:
```bash
tar tf filename.tar | sed 10q
```
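
Since each packed example should pair an audio file with its json annotation, counting the `.json` entries gives the number of samples per shard (assuming `make_tar.py` writes one annotation per example):

```bash
# count samples in a shard, assuming one .json annotation per example
tar tf filename.tar | grep -c '\.json$'
```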
**To set up in one line:**
```bash
# DATASET can be one of: soundbible, bbc, audioset, freesound
DATASET=soundbible bash dataset/audio/setup.sh
```
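
To run the setup for all four subsets in sequence:

```bash
# invoke the setup script once per subset
for DATASET in soundbible bbc audioset freesound; do
    DATASET=$DATASET bash dataset/audio/setup.sh
done
```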