diff --git a/dataset/Evaluation.md b/dataset/Evaluation.md
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/dataset/Evaluation.md
@@ -0,0 +1 @@
+
diff --git a/dataset/README_MINIGPTv2_FINETUNE.md b/dataset/README_MINIGPTv2_FINETUNE.md
index 512b1ff..070fbf5 100644
--- a/dataset/README_MINIGPTv2_FINETUNE.md
+++ b/dataset/README_MINIGPTv2_FINETUNE.md
@@ -1,33 +1,17 @@
 ## Download the COCO captions, RefCOCO, RefCOCO+. RefCOCOg, visual genome, textcaps, LLaVA, gqa, AOK-VQA, OK-VQA, OCR-VQA, filtered Flickr-30k, multi-task conversation, and Unnatural instruction datasets
-After downloading all of them, organize the data as follows in `./playground/data`,
-
-```
-├── coco
-│   └── train2017
-├── gqa
-│   └── images
-├── ocr_vqa
-│   └── images
-├── textvqa
-│   └── train_images
-└── vg
-    ├── VG_100K
-    └── VG_100K_2
-```
 
 ### COCO captions
 - [train2017](http://images.cocodataset.org/zips/train2017.zip)
-
 
 ### Visual genome
 - [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)
 
 
 ### TextCaps
 
 ### RefCOCO, RefCOCO+, RefCOCOg
-Makesure you have the COCO 2014 images first.
+Make sure you have the COCO 2014 images first.
 
 Then, download RefCOCO, RefCOCO+, and RefCOCOg annotation files in the following links.
 
@@ -88,16 +72,6 @@ detail_23k.json, and complex_reasoning_77k.json in conversation.yaml, detail.yam
 - [minigpt4/configs/datasets/llava/reason.yaml](../minigpt4/configs/datasets/llava/reason.yaml)
 
-### TextVQA
-- [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)
-### GQA
-- [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)
-- [Annotations](https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/gqa/testdev_balanced_questions.json)
-
-
-
-### GQA
-
 ### OKVQA
 
 ### AOK-VQA
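For reference, the first hunk deletes the `./playground/data` directory layout from the README. The removed tree can be summarized as a minimal shell sketch that recreates the empty structure (directory names are taken verbatim from the deleted block; nothing here downloads any data):

```shell
# Recreate the empty dataset layout described by the removed README tree.
# Names come from the deleted block; no files are downloaded.
mkdir -p ./playground/data/coco/train2017
mkdir -p ./playground/data/gqa/images
mkdir -p ./playground/data/ocr_vqa/images
mkdir -p ./playground/data/textvqa/train_images
mkdir -p ./playground/data/vg/VG_100K ./playground/data/vg/VG_100K_2
```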