diff --git a/dataset/README_MINIGPTv2_FINETUNE.md b/dataset/README_MINIGPTv2_FINETUNE.md
index be1e09b..c582acc 100644
--- a/dataset/README_MINIGPTv2_FINETUNE.md
+++ b/dataset/README_MINIGPTv2_FINETUNE.md
@@ -1,9 +1,29 @@
 ## Download the COCO captions, RefCOCO, RefCOCO+, RefCOCOg, visual genome, textcaps, LLaVA, gqa, AOK-VQA, OK-VQA, OCR-VQA, filtered Flickr-30k, multi-task conversation, and Unnatural instruction datasets
+After downloading all of them, organize the data as follows in `./playground/data`:
+
+```
+├── coco
+│   └── train2017
+├── gqa
+│   └── images
+├── ocr_vqa
+│   └── images
+├── textvqa
+│   └── train_images
+└── vg
+    ├── VG_100K
+    └── VG_100K_2
+```
+
 ### COCO captions
+- [train2017](http://images.cocodataset.org/zips/train2017.zip)
 ### RefCOCO, RefCOCO+, RefCOCOg
+### Visual genome
+- [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)
+### TextCaps
 Make sure you have the COCO 2014 images first. Then,
@@ -48,6 +68,11 @@ Similarly, set **ann_path** in all the following configs to the above folder (Lo
 ### LLaVA
+### TextVQA
+- [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)
+### GQA
+- [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)
+- [Annotations](https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/gqa/testdev_balanced_questions.json)
@@ -58,6 +83,7 @@ Similarly, set **ann_path** in all the following configs to the above folder (Lo
 ### AOK-VQA

 ### OCR-VQA
+- [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing), **we save all files as `.jpg`**

 ### filtered Flickr-30k
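
The `./playground/data` layout added in this diff can be pre-created with a short shell sketch before unzipping the archives. This is only an illustration, not part of the PR: the directory names come from the tree above, the `wget` targets from the links in the diff, and the commented download step is left for the reader since the archives are large.

```shell
# Create the directory skeleton from the README's tree.
set -e
root=playground/data
mkdir -p "$root"/coco/train2017 \
         "$root"/gqa/images \
         "$root"/ocr_vqa/images \
         "$root"/textvqa/train_images \
         "$root"/vg/VG_100K \
         "$root"/vg/VG_100K_2

# Example fetch-and-unzip for one archive (uncomment to actually download):
# wget -P "$root"/coco http://images.cocodataset.org/zips/train2017.zip
# unzip -q "$root"/coco/train2017.zip -d "$root"/coco
```

Each archive unpacks into the matching subdirectory; the same `wget`/`unzip` pattern applies to the GQA, TextVQA, and Visual Genome links above.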