MiniGPT-4/dataset/README_MINIGPTv2_FINETUNE.md

Download the COCO captions, RefCOCO, RefCOCO+, RefCOCOg, Visual Genome, TextCaps, LLaVA, GQA, AOK-VQA, OK-VQA, OCR-VQA, filtered Flickr-30k, multi-task conversation, and filtered Unnatural Instruction datasets.

### Download the dataset

| Image source | Download path |
| --- | --- |
| COCO 2014 images | images, captions |
| COCO VQA | vqa train, vqa val |
| Visual Genome | images part1, images part2 |
| TextCaps | images, annotations |
| RefCOCO | annotations |
| RefCOCO+ | annotations |
| RefCOCOg | annotations |
| LLaVA | Complex reasoning, Detailed description, Conversation |
| OKVQA | annotations |
| AOK-VQA | annotations |
| OCR-VQA | annotations |
| Filtered Flickr-30k | annotations |
| Multi-task conversation | annotations |
| Filtered unnatural instruction | annotations |

### COCO captions

Download the COCO 2014 images and captions.

```
├── ${MINIGPTv2_DATASET}
│   ├── coco_captions
│       ├── coco_images
│       ├── annotations
│           ├── coco_karpathy_train.json
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` to the coco_karpathy_train.json path.
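The resulting dataset config entry could look like the following sketch; the `datasets`/`build_info` nesting and the `coco_caption` key are assumptions for illustration, since only `image_path` and `ann_path` are named by this README:

```yaml
# Sketch only: the key nesting and the coco_caption key are assumptions.
datasets:
  coco_caption:
    build_info:
      image_path: /path/to/MINIGPTv2_DATASET/coco_captions/coco_images
      ann_path: /path/to/MINIGPTv2_DATASET/coco_captions/annotations/coco_karpathy_train.json
```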

### COCO VQA

Download the VQA v2 train and validation json files.

```
├── ${MINIGPTv2_DATASET}
│   ├── vqav2
│       ├── vqa_train.json
│       ├── vqa_val.json
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` to the vqa_train.json and vqa_val.json paths.

### Visual Genome

Download the Visual Genome images and annotation files.

```
├── ${MINIGPTv2_DATASET}
│   ├── visual_genome
│       ├── VG_100K
│       ├── VG_100K_2
│       ├── region_descriptions.json
```

Set `image_path` to the visual_genome folder. Similarly, set `ann_path` to the visual_genome folder as well (the folder containing region_descriptions.json).
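Here both paths point at the same folder, so a config sketch could look like this (the nesting and the dataset key are hypothetical; only the two path values follow from the instructions above):

```yaml
# Sketch only: the visual_genome key and nesting are assumptions.
datasets:
  visual_genome:
    build_info:
      image_path: /path/to/MINIGPTv2_DATASET/visual_genome
      ann_path: /path/to/MINIGPTv2_DATASET/visual_genome
```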

### TextCaps

Download the TextCaps images and annotation files.

```
├── ${MINIGPTv2_DATASET}
│   ├── TextCaps
│       ├── train_images
│       ├── TextCaps_0.1_train.json
```

Set `image_path` to the TextCaps train_images folder. Similarly, set `ann_path` to the TextCaps_0.1_train.json path.

### RefCOCO, RefCOCO+, RefCOCOg

Download the RefCOCO, RefCOCO+, and RefCOCOg annotation files.

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── refcoco_annotations
│       ├── refcoco
│           ├── instances.json
│           ├── refs(google).p
│           ├── refs(unc).p
│       ├── refcoco+
│           ├── instances.json
│           ├── refs(unc).p
│       ├── refcocog
│           ├── instances.json
│           ├── refs(google).p
│           ├── refs(umd).p
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` in all the following configs to the above folder (Location_you_like) that contains refcoco, refcoco+, and refcocog.
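Note that `ann_path` points at the parent refcoco_annotations folder rather than a single json file. A sketch of one of the configs (the `refcoco` key and nesting are hypothetical; the refcoco+ and refcocog configs would reuse the same parent folder):

```yaml
# Sketch only: ann_path is the folder containing refcoco, refcoco+, refcocog.
datasets:
  refcoco:
    build_info:
      image_path: /path/to/coco2014_images
      ann_path: /path/to/MINIGPTv2_DATASET/refcoco_annotations
```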

### LLaVA

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── llava
│       ├── conversation_58k.json
│       ├── detail_23k.json
│       ├── complex_reasoning_77k.json
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` to the location of the previously downloaded conversation_58k.json, detail_23k.json, and complex_reasoning_77k.json in conversation.yaml, detail.yaml, and reason.yaml, respectively.
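Since each of the three LLaVA subsets has its own config file, the same edit is repeated three times. A sketch of conversation.yaml (key names and nesting are assumptions; only the file names and the two path options come from this README):

```yaml
# conversation.yaml (sketch); detail.yaml and reason.yaml are analogous,
# pointing ann_path at detail_23k.json and complex_reasoning_77k.json.
datasets:
  llava_conversation:   # hypothetical key name
    build_info:
      image_path: /path/to/coco2014_images
      ann_path: /path/to/MINIGPTv2_DATASET/llava/conversation_58k.json
```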

### OKVQA

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── OKVQA
│       ├── okvqa_train.json
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` to the location of the OKVQA dataset.

### AOK-VQA

Download the AOK-VQA annotation dataset:

```
export AOKVQA_DIR=YOUR_DATASET_PATH
mkdir -p ${AOKVQA_DIR}
curl -fsSL https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz | tar xvz -C ${AOKVQA_DIR}
```

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── AOKVQA
│       ├── aokvqa_v1p0_train.json
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` to the location of the AOKVQA dataset.

### OCR-VQA

Download the OCR-VQA annotation files.

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── OCR-VQA
│       ├── images
│       ├── dataset.json
```

Set `image_path` to the OCR-VQA image folder. Similarly, set `ann_path` to the OCR-VQA dataset.json path.

### Filtered Flickr-30k

Download the filtered Flickr-30k images and annotation files.

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── filtered_flickr
│       ├── images
│       ├── captiontobbox.json
│       ├── groundedcaption.json
│       ├── phrasetobbox.json
```

Set `image_path` to the Flickr-30k images folder. Similarly, set `ann_path` to groundedcaption.json, captiontobbox.json, and phrasetobbox.json for the grounded image caption, caption-to-bbox, and phrase-to-bbox datasets, respectively.
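All three Flickr-30k configs share the same image folder and differ only in `ann_path`. A sketch of the grounded image caption config (key names and nesting are assumptions):

```yaml
# Grounded image caption config (sketch); the caption-to-bbox and
# phrase-to-bbox configs point ann_path at captiontobbox.json and
# phrasetobbox.json instead.
datasets:
  flickr_grounded_caption:   # hypothetical key name
    build_info:
      image_path: /path/to/MINIGPTv2_DATASET/filtered_flickr/images
      ann_path: /path/to/MINIGPTv2_DATASET/filtered_flickr/groundedcaption.json
```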

### Multi-task conversation

Download the multi-task conversation dataset.

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── multitask_conversation
│       ├── multitask_conversation.json
```

Set `image_path` to the COCO 2014 images folder. Similarly, set `ann_path` to the multitask_conversation.json file path.

### Unnatural instruction

Download the filtered unnatural instruction annotation files (we removed the very long sentences from the original unnatural instruction dataset).

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── unnatural-instructions
│       ├── filtered_unnatural_instruction.json
```

This dataset is text-only, so there is no image path. Set `ann_path` to the filtered_unnatural_instruction.json file path.
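For a text-only dataset the config simply omits the image path; a sketch (the key name and nesting are assumptions):

```yaml
# Sketch only: no image_path key is needed for this text-only dataset.
datasets:
  unnatural_instruction:
    build_info:
      ann_path: /path/to/MINIGPTv2_DATASET/unnatural-instructions/filtered_unnatural_instruction.json
```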