Commit b78053681a (Deyao Zhu, 2023-10-23 21:33:10 +03:00)

## Download the COCO captions, RefCOCO, RefCOCO+, RefCOCOg, Visual Genome, TextCaps, LLaVA, GQA, AOK-VQA, OK-VQA, OCR-VQA, filtered Flickr-30k, multi-task conversation, and Unnatural Instructions datasets
After downloading all of them, organize the data as follows in `./playground/data`:
```
├── coco
│   └── train2017
├── gqa
│   └── images
├── ocr_vqa
│   └── images
├── textvqa
│   └── train_images
└── vg
    ├── VG_100K
    └── VG_100K_2
```
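The skeleton above can be created up front so each archive has a place to land; a minimal sketch (the `./playground/data` root and subfolder names are taken directly from the tree above):

```python
from pathlib import Path

# Subdirectories matching the layout shown above.
LAYOUT = [
    "coco/train2017",
    "gqa/images",
    "ocr_vqa/images",
    "textvqa/train_images",
    "vg/VG_100K",
    "vg/VG_100K_2",
]

def make_layout(root: str = "./playground/data") -> list[Path]:
    """Create the expected dataset directory skeleton and return the paths."""
    paths = [Path(root) / sub for sub in LAYOUT]
    for p in paths:
        p.mkdir(parents=True, exist_ok=True)
    return paths
```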
### COCO captions
- [train2017](http://images.cocodataset.org/zips/train2017.zip)
### RefCOCO, RefCOCO+, RefCOCOg
### Visual Genome
- [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)
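The two Visual Genome archives (and the other zip links on this page) can be fetched with a small stdlib helper; this is only a sketch, and the skip-if-present behavior is an added convenience, not part of the original instructions:

```python
import urllib.request
from pathlib import Path

# Visual Genome image archive URLs, copied from the list above.
VG_PARTS = [
    "https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip",
    "https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip",
]

def fetch(url: str, dest_dir: str = "./playground/data/vg") -> Path:
    """Download url into dest_dir, skipping files that already exist."""
    dest = Path(dest_dir) / Path(url).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest
```

After downloading, unzip each part so the images end up under `vg/VG_100K` and `vg/VG_100K_2` as shown in the layout.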
### TextCaps
Make sure you have the COCO 2014 images first.
Then,
### LLaVA
### TextVQA
- [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)
### GQA
- [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)
- [Annotations](https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/gqa/testdev_balanced_questions.json)
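The GQA annotation file is a JSON dict keyed by question id. Assuming the standard GQA record fields (`question`, `imageId`, `answer` — field names assumed here, verify against the downloaded file), a quick inspection sketch:

```python
import json

def summarize_gqa(ann_path: str) -> dict:
    """Count questions and distinct images in a GQA annotation JSON file."""
    with open(ann_path) as f:
        anns = json.load(f)  # dict: question id -> annotation record
    images = {rec["imageId"] for rec in anns.values()}
    return {"num_questions": len(anns), "num_images": len(images)}
```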
### AOK-VQA
### OCR-VQA
- [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing), **we save all files as `.jpg`**
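Since every OCR-VQA file is expected to be saved as `.jpg`, a quick consistency check after running the download script can catch stragglers; this sketch only inspects file extensions, it does not convert images:

```python
from pathlib import Path

def non_jpg_files(image_dir: str) -> list[Path]:
    """Return files under image_dir whose extension is not .jpg."""
    return [p for p in Path(image_dir).rglob("*")
            if p.is_file() and p.suffix.lower() != ".jpg"]
```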
### filtered Flickr-30k