Download the COCO captions, RefCOCO, RefCOCO+, RefCOCOg, Visual Genome, TextCaps, LLaVA, GQA, AOK-VQA, OK-VQA, OCR-VQA, filtered Flickr-30k, multi-task conversation, and Unnatural Instructions datasets
Download the dataset
| Image source | Download path |
| --- | --- |
| COCO 2014 images | images, captions |
| COCO VQA | vqa train, vqa val |
| Visual Genome | images part1, images part2 |
| TextCaps | images, annotations |
| RefCOCO | annotations |
| RefCOCO+ | annotations |
| RefCOCOg | annotations |
| LLaVA | Complex reasoning, Detailed description, Conversation |
| OKVQA | annotations |
| AOK-VQA | annotations |
| OCR-VQA | annotations |
| Filtered Flickr-30k | annotations |
| Multi-task conversation | annotations |
| Filtered Unnatural Instruction | annotations |
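All of the sections below place files under a common root referred to as ${MINIGPTv2_DATASET}. A minimal setup sketch (the path is a placeholder; substitute your own storage location):

```bash
# Placeholder path: substitute wherever you keep large datasets
export MINIGPTv2_DATASET=/path/to/minigptv2_dataset
mkdir -p ${MINIGPTv2_DATASET}
```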
COCO captions
Download the COCO 2014 images and captions
├── ${MINIGPTv2_DATASET}
│   ├── coco_captions
│   │   ├── coco_images
│   │   ├── annotations
│   │   │   ├── coco_karpathy_train.json
Set image_path to the COCO 2014 image folder. Similarly, set ann_path to the coco_karpathy_train.json path
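Every dataset below repeats this same two-key edit, so here is one hedged sketch of the pattern, using the COCO captions paths above (the config filename is a stand-in; edit the corresponding yaml under minigpt4/configs/datasets/ for each dataset):

```bash
# Hypothetical config filename; use the dataset's actual yaml instead.
cfg=minigpt4/configs/datasets/some_dataset.yaml
# sed matches from the key name to the end of the line, so indentation is kept.
sed -i "s|image_path:.*|image_path: ${MINIGPTv2_DATASET}/coco_captions/coco_images|" "$cfg"
sed -i "s|ann_path:.*|ann_path: ${MINIGPTv2_DATASET}/coco_captions/annotations/coco_karpathy_train.json|" "$cfg"
```

Editing the yaml files by hand works just as well; the loops in later sections simply script the same substitution.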
COCO VQA
Download the VQA v2 train and validation JSON files
├── ${MINIGPTv2_DATASET}
│   ├── vqav2
│   │   ├── vqa_train.json
│   │   ├── vqa_val.json
Set image_path to the COCO 2014 image folder. Similarly, set ann_path to the vqa_train.json and vqa_val.json paths
Visual Genome
Download the Visual Genome images and annotation files
├── ${MINIGPTv2_DATASET}
│   ├── visual_genome
│   │   ├── VG_100K
│   │   ├── VG_100K_2
│   │   ├── region_descriptions.json
Set image_path to the visual_genome folder. Similarly, set ann_path to the visual_genome folder, which contains region_descriptions.json.
TextCaps
Download the TextCaps images and annotation files
├── ${MINIGPTv2_DATASET}
│   ├── TextCaps
│   │   ├── train_images
│   │   ├── TextCaps_0.1_train.json
Set image_path to the TextCaps train_images folder. Similarly, set ann_path to the TextCaps_0.1_train.json path
RefCOCO, RefCOCO+, RefCOCOg
Download the RefCOCO, RefCOCO+, RefCOCOg annotation files
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── refcoco_annotations
│   │   ├── refcoco
│   │   │   ├── instances.json
│   │   │   ├── refs(google).p
│   │   │   ├── refs(unc).p
│   │   ├── refcoco+
│   │   │   ├── instances.json
│   │   │   ├── refs(unc).p
│   │   ├── refcocog
│   │   │   ├── instances.json
│   │   │   ├── refs(google).p
│   │   │   ├── refs(umd).p
Set image_path to the COCO 2014 image folder. Similarly, set ann_path in all the following configs to the above folder (Location_you_like) that contains refcoco, refcoco+, and refcocog (a scripted sketch follows the list).
- minigpt4/configs/datasets/coco_bbox/refcoco.yaml
- minigpt4/configs/datasets/coco_bbox/refcocog.yaml
- minigpt4/configs/datasets/coco_bbox/refcocop.yaml
- minigpt4/configs/datasets/coco_bbox/invrefcoco.yaml
- minigpt4/configs/datasets/coco_bbox/invrefcocog.yaml
- minigpt4/configs/datasets/coco_bbox/invrefcocop.yaml
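A hedged sketch that applies the edit to all six configs at once (it assumes each yaml carries image_path and ann_path lines, as the instructions above indicate):

```bash
# Point all six grounding configs at the COCO 2014 images and the
# refcoco_annotations folder downloaded above.
for name in refcoco refcocog refcocop invrefcoco invrefcocog invrefcocop; do
  f=minigpt4/configs/datasets/coco_bbox/${name}.yaml
  sed -i "s|image_path:.*|image_path: ${MINIGPTv2_DATASET}/coco_captions/coco_images|" "$f"
  sed -i "s|ann_path:.*|ann_path: ${MINIGPTv2_DATASET}/refcoco_annotations|" "$f"
done
```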
LLaVA
Download the LLaVA annotation files
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── llava
│   │   ├── conversation_58k.json
│   │   ├── detail_23k.json
│   │   ├── complex_reasoning_77k.json
Set image_path to the COCO 2014 image folder. Similarly, set ann_path to the locations of the previously downloaded conversation_58k.json, detail_23k.json, and complex_reasoning_77k.json in conversation.yaml, detail.yaml, and reason.yaml, respectively (a scripted sketch follows the list).
- minigpt4/configs/datasets/llava/conversation.yaml
- minigpt4/configs/datasets/llava/detail.yaml
- minigpt4/configs/datasets/llava/reason.yaml
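A hedged sketch of the same edit for the three LLaVA configs, using the file-to-config mapping described above:

```bash
# Each config gets its matching annotation file; COCO 2014 images are shared.
declare -A ann=(
  [conversation]=conversation_58k.json
  [detail]=detail_23k.json
  [reason]=complex_reasoning_77k.json
)
for name in "${!ann[@]}"; do
  f=minigpt4/configs/datasets/llava/${name}.yaml
  sed -i "s|image_path:.*|image_path: ${MINIGPTv2_DATASET}/coco_captions/coco_images|" "$f"
  sed -i "s|ann_path:.*|ann_path: ${MINIGPTv2_DATASET}/llava/${ann[$name]}|" "$f"
done
```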
OKVQA
Download the OK-VQA annotation files
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── OKVQA
│   │   ├── okvqa_train.json
Set image_path to the COCO 2014 image folder. Similarly, set ann_path to the okvqa_train.json path
AOK-VQA
Download the AOK-VQA annotation dataset
export AOKVQA_DIR=YOUR_DATASET_PATH
mkdir -p ${AOKVQA_DIR}
curl -fsSL https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz | tar xvz -C ${AOKVQA_DIR}
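To place the extracted train split where the tree below expects it, one option (assuming the tarball unpacks aokvqa_v1p0_train.json at its top level):

```bash
# Assumption: aokvqa_v1p0.tar.gz unpacks its json files at the top level
mkdir -p ${MINIGPTv2_DATASET}/AOKVQA
cp ${AOKVQA_DIR}/aokvqa_v1p0_train.json ${MINIGPTv2_DATASET}/AOKVQA/
```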
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── AOKVQA
│   │   ├── aokvqa_v1p0_train.json
Set image_path to the COCO 2014 image folder. Similarly, set ann_path to the aokvqa_v1p0_train.json path
OCR-VQA
Download the OCR-VQA annotation files
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── OCR-VQA
│   │   ├── images
│   │   ├── dataset.json
Set image_path to the OCR-VQA images folder. Similarly, set ann_path to the OCR-VQA dataset.json path
Filtered Flickr-30k
Download the filtered Flickr-30k images and annotation files
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── filtered_flickr
│   │   ├── images
│   │   ├── captiontobbox.json
│   │   ├── groundedcaption.json
│   │   ├── phrasetobbox.json
Set image_path to the Flickr-30k images folder. Similarly, set ann_path to groundedcaption.json, captiontobbox.json, and phrasetobbox.json for the grounded image caption, caption-to-bbox, and phrase-to-bbox datasets, respectively, in the following configs (a scripted sketch follows the list).
- minigpt4/configs/datasets/flickr/default.yaml
- minigpt4/configs/datasets/flickr/caption_to_phrase.yaml
- minigpt4/configs/datasets/flickr/object_to_phrase.yaml
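A hedged sketch for the three Flickr configs, pairing each with its annotation file in the order described above:

```bash
# Mapping assumed from the description: default -> grounded caption,
# caption_to_phrase -> caption-to-bbox, object_to_phrase -> phrase-to-bbox.
declare -A ann=(
  [default]=groundedcaption.json
  [caption_to_phrase]=captiontobbox.json
  [object_to_phrase]=phrasetobbox.json
)
for name in "${!ann[@]}"; do
  f=minigpt4/configs/datasets/flickr/${name}.yaml
  sed -i "s|image_path:.*|image_path: ${MINIGPTv2_DATASET}/filtered_flickr/images|" "$f"
  sed -i "s|ann_path:.*|ann_path: ${MINIGPTv2_DATASET}/filtered_flickr/${ann[$name]}|" "$f"
done
```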
Multi-task conversation
Download the multi-task conversation dataset
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── multitask_conversation
│   │   ├── multitask_conversation.json
Set image_path to the COCO 2014 images folder. Similarly, set ann_path to the multitask_conversation.json file path
Unnatural instruction
Download the filtered Unnatural Instructions annotation file (we removed the very long sentences from the original Unnatural Instructions dataset)
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── unnatural-instructions
│   │   ├── filtered_unnatural_instruction.json
This dataset has no image path. Set ann_path to the filtered_unnatural_instruction.json file path
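Once everything is in place, a quick sanity check over the annotation files from the trees above can catch a missing download before finetuning starts:

```bash
# Report any annotation file missing from the expected layout
for f in \
  coco_captions/annotations/coco_karpathy_train.json \
  vqav2/vqa_train.json vqav2/vqa_val.json \
  visual_genome/region_descriptions.json \
  TextCaps/TextCaps_0.1_train.json \
  refcoco_annotations/refcoco/instances.json \
  llava/conversation_58k.json llava/detail_23k.json llava/complex_reasoning_77k.json \
  OKVQA/okvqa_train.json \
  AOKVQA/aokvqa_v1p0_train.json \
  OCR-VQA/dataset.json \
  filtered_flickr/groundedcaption.json \
  multitask_conversation/multitask_conversation.json \
  unnatural-instructions/filtered_unnatural_instruction.json; do
  [ -f "${MINIGPTv2_DATASET}/${f}" ] || echo "missing: ${f}"
done
```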