MiniGPT-4/dataset/README_MINIGPTv2_FINETUNE.md

Download the COCO captions, RefCOCO, RefCOCO+, RefCOCOg, Visual Genome, TextCaps, LLaVA, GQA, AOK-VQA, OK-VQA, OCR-VQA, filtered Flickr-30k, multi-task conversation, and filtered Unnatural Instruction datasets.

### Download the dataset

| Image source | Download path |
| --- | --- |
| COCO 2014 images | images, captions |
| COCO VQA | vqa train, vqa val |
| Visual Genome | images part1, images part2 |
| TextCaps | images, annotations |
| RefCOCO | annotations |
| RefCOCO+ | annotations |
| RefCOCOg | annotations |
| LLaVA | Complex reasoning, Detailed description, Conversation |
| OKVQA | annotations |
| AOK-VQA | annotations |
| OCR-VQA | annotations |
| Filtered Flickr-30k | annotations |
| Multi-task conversation | annotations |
| Filtered unnatural instruction | annotations |

### COCO captions

Download the COCO 2014 images and captions.

```
├── ${MINIGPTv2_DATASET}
│   ├── coco_captions
│       ├── coco_images
│       ├── annotations
│           ├── coco_karpathy_train.json
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` to the coco_karpathy_train.json path.
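The resulting dataset config entry could look like the following sketch; the `datasets`/`build_info` nesting and the `coco_caption` key are assumptions for illustration, since only `image_path` and `ann_path` are named by this README:

```yaml
# Sketch only: the key nesting and the coco_caption key are assumptions.
datasets:
  coco_caption:
    build_info:
      image_path: /path/to/MINIGPTv2_DATASET/coco_captions/coco_images
      ann_path: /path/to/MINIGPTv2_DATASET/coco_captions/annotations/coco_karpathy_train.json
```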

### COCO VQA

Download the VQA v2 train and validation json files.

```
├── ${MINIGPTv2_DATASET}
│   ├── vqav2
│       ├── vqa_train.json
│       ├── vqa_val.json
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` to the vqa_train.json and vqa_val.json paths.

### Visual Genome

Download the Visual Genome images and annotation files.

```
├── ${MINIGPTv2_DATASET}
│   ├── visual_genome
│       ├── VG_100K
│       ├── VG_100K_2
│       ├── region_descriptions.json
```

Set `image_path` to the visual_genome folder. Similarly, set `ann_path` to the visual_genome folder as well (the folder containing region_descriptions.json).
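Here both paths point at the same folder, so a config sketch could look like this (the nesting and the dataset key are hypothetical; only the two path values follow from the instructions above):

```yaml
# Sketch only: the visual_genome key and nesting are assumptions.
datasets:
  visual_genome:
    build_info:
      image_path: /path/to/MINIGPTv2_DATASET/visual_genome
      ann_path: /path/to/MINIGPTv2_DATASET/visual_genome
```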

### TextCaps

Download the TextCaps images and annotation files.

```
├── ${MINIGPTv2_DATASET}
│   ├── TextCaps
│       ├── train_images
│       ├── TextCaps_0.1_train.json
```

Set `image_path` to the TextCaps train_images folder. Similarly, set `ann_path` to the TextCaps_0.1_train.json path.

### RefCOCO, RefCOCO+, RefCOCOg

Download the RefCOCO, RefCOCO+, and RefCOCOg annotation files.

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── refcoco_annotations
│       ├── refcoco
│           ├── instances.json
│           ├── refs(google).p
│           ├── refs(unc).p
│       ├── refcoco+
│           ├── instances.json
│           ├── refs(unc).p
│       ├── refcocog
│           ├── instances.json
│           ├── refs(google).p
│           ├── refs(umd).p
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` in all the following configs to the above folder (Location_you_like) that contains refcoco, refcoco+, and refcocog.
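Note that `ann_path` points at the parent refcoco_annotations folder rather than a single json file. A sketch of one of the configs (the `refcoco` key and nesting are hypothetical; the refcoco+ and refcocog configs would reuse the same parent folder):

```yaml
# Sketch only: ann_path is the folder containing refcoco, refcoco+, refcocog.
datasets:
  refcoco:
    build_info:
      image_path: /path/to/coco2014_images
      ann_path: /path/to/MINIGPTv2_DATASET/refcoco_annotations
```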

### LLaVA

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── llava
│       ├── conversation_58k.json
│       ├── detail_23k.json
│       ├── complex_reasoning_77k.json
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` to the location of the previously downloaded conversation_58k.json, detail_23k.json, and complex_reasoning_77k.json in conversation.yaml, detail.yaml, and reason.yaml, respectively.
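Since each of the three LLaVA subsets has its own config file, the same edit is repeated three times. A sketch of conversation.yaml (key names and nesting are assumptions; only the file names and the two path options come from this README):

```yaml
# conversation.yaml (sketch); detail.yaml and reason.yaml are analogous,
# pointing ann_path at detail_23k.json and complex_reasoning_77k.json.
datasets:
  llava_conversation:   # hypothetical key name
    build_info:
      image_path: /path/to/coco2014_images
      ann_path: /path/to/MINIGPTv2_DATASET/llava/conversation_58k.json
```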

### OKVQA

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── OKVQA
│       ├── okvqa_train.json
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` to the location of the OKVQA dataset.

### AOK-VQA

Download the AOK-VQA annotation dataset:

```
export AOKVQA_DIR=YOUR_DATASET_PATH
mkdir -p ${AOKVQA_DIR}
curl -fsSL https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz | tar xvz -C ${AOKVQA_DIR}
```

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── AOKVQA
│       ├── aokvqa_v1p0_train.json
```

Set `image_path` to the COCO 2014 image folder. Similarly, set `ann_path` to the location of the AOKVQA dataset.

### OCR-VQA

Download the OCR-VQA annotation files.

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── OCR-VQA
│       ├── images
│       ├── dataset.json
```

Set `image_path` to the OCR-VQA image folder. Similarly, set `ann_path` to the OCR-VQA dataset.json path.

### Filtered Flickr-30k

Download the filtered Flickr-30k images and annotation files.

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── filtered_flickr
│       ├── images
│       ├── captiontobbox.json
│       ├── groundedcaption.json
│       ├── phrasetobbox.json
```

Set `image_path` to the Flickr-30k images folder. Similarly, set `ann_path` to groundedcaption.json, captiontobbox.json, and phrasetobbox.json for the grounded image caption, caption-to-bbox, and phrase-to-bbox datasets, respectively.
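All three Flickr-30k configs share the same image folder and differ only in `ann_path`. A sketch of the grounded image caption config (key names and nesting are assumptions):

```yaml
# Grounded image caption config (sketch); the caption-to-bbox and
# phrase-to-bbox configs point ann_path at captiontobbox.json and
# phrasetobbox.json instead.
datasets:
  flickr_grounded_caption:   # hypothetical key name
    build_info:
      image_path: /path/to/MINIGPTv2_DATASET/filtered_flickr/images
      ann_path: /path/to/MINIGPTv2_DATASET/filtered_flickr/groundedcaption.json
```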

### Multi-task conversation

Download the multi-task conversation dataset.

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── multitask_conversation
│       ├── multitask_conversation.json
```

Set `image_path` to the COCO 2014 images folder. Similarly, set `ann_path` to the multitask_conversation.json file path.

### Unnatural instruction

Download the filtered unnatural instruction annotation files (we removed the very long sentences from the original unnatural instruction dataset).

```
Location_you_like
├── ${MINIGPTv2_DATASET}
│   ├── unnatural-instructions
│       ├── filtered_unnatural_instruction.json
```

This dataset is text-only, so there is no image path. Set `ann_path` to the filtered_unnatural_instruction.json file path.
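For a text-only dataset the config simply omits the image path; a sketch (the key name and nesting are assumptions):

```yaml
# Sketch only: no image_path key is needed for this text-only dataset.
datasets:
  unnatural_instruction:
    build_info:
      ann_path: /path/to/MINIGPTv2_DATASET/unnatural-instructions/filtered_unnatural_instruction.json
```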