Download the COCO captions, RefCOCO, RefCOCO+, RefCOCOg, Visual Genome, TextCaps, LLaVA, GQA, AOK-VQA, OK-VQA, OCR-VQA, filtered Flickr-30k, multi-task conversation, and Unnatural Instruction datasets.

### Download the dataset

| Image source | Download path |
| --- | --- |
| COCO 2014 images | images, captions |
| Visual Genome | images part1, images part2 |
| TextCaps | images, annotations |
| RefCOCO | annotations |
| RefCOCO+ | annotations |
| RefCOCOg | annotations |
| LLaVA | Complex reasoning, Detailed description, Conversation |
| OKVQA | annotations |
| AOK-VQA | annotations |
| OCR-VQA | annotations |
| Filtered Flickr-30k | images, annotations |
| Multi-task conversation | annotations |
| Filtered unnatural instruction | annotations |

Organize the downloaded data as follows:

```
.
├── ${MINIGPTv2_DATASET}
│   ├── coco_captions
│   │   ├── coco_images
│   │   ├── annotations
│   │   │   ├── coco_karpathy_train.json
```
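
As a minimal sketch, the layout above can be created like this (the dataset root path is a placeholder; pick any folder you like):

```bash
# Choose a root folder for all fine-tuning data (path is a placeholder).
export MINIGPTv2_DATASET=/path/to/minigptv2_dataset

# Create the COCO caption subtree shown above.
mkdir -p ${MINIGPTv2_DATASET}/coco_captions/coco_images
mkdir -p ${MINIGPTv2_DATASET}/coco_captions/annotations
```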

### COCO captions

Download the COCO 2014 images and captions from the links in the table above.
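
A hedged sketch of placing the downloads into that layout; the archive name below is an assumption, so adjust it to whatever the image link actually gives you (coco_karpathy_train.json is the caption file shown in the tree above):

```bash
# Unpack the COCO 2014 images into the layout above.
# train2014.zip is an assumed archive name -- replace it with your actual download.
unzip train2014.zip -d ${MINIGPTv2_DATASET}/coco_captions/coco_images

# Place the Karpathy caption file shown in the tree above.
mv coco_karpathy_train.json ${MINIGPTv2_DATASET}/coco_captions/annotations/
```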

### Visual Genome

### TextCaps

### RefCOCO, RefCOCO+, RefCOCOg

Make sure you have the COCO 2014 images first.

Then, download the RefCOCO, RefCOCO+, and RefCOCOg annotation files from the links in the table above.

Unzip these files to a location of your choice. That folder should then have the following structure:

```
Location_you_like
├── refcoco
│   ├── instances.json
│   ├── refs(google).p
│   └── refs(unc).p
├── refcoco+
│   ├── instances.json
│   └── refs(unc).p
└── refcocog
    ├── instances.json
    ├── refs(google).p
    └── refs(umd).p
```
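
A sketch of the unzip step, assuming the three downloads are named refcoco.zip, refcoco+.zip, and refcocog.zip (adjust the names to your actual files):

```bash
# Any folder works as "Location_you_like"; it only needs to hold the three annotation sets.
export REFCOCO_ANN_ROOT=/path/to/Location_you_like

# Archive names are assumptions -- adjust them to match your downloads.
unzip refcoco.zip  -d ${REFCOCO_ANN_ROOT}
unzip refcoco+.zip -d ${REFCOCO_ANN_ROOT}
unzip refcocog.zip -d ${REFCOCO_ANN_ROOT}
```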

Set image_path in each of the RefCOCO, RefCOCO+, and RefCOCOg dataset configuration files to the COCO 2014 image folder. Similarly, set ann_path in those configs to the folder above (Location_you_like) that contains refcoco, refcoco+, and refcocog.
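
A quick sanity check before editing the configs; the two shell variables stand in for the values you will paste into image_path and ann_path (paths are hypothetical):

```bash
# image_path value: the COCO 2014 image folder (hypothetical path).
COCO2014_IMAGES=/path/to/coco_captions/coco_images
# ann_path value: the folder that contains refcoco, refcoco+, and refcocog.
REFCOCO_ANN_ROOT=/path/to/Location_you_like

# Both checks should succeed if the layout matches the tree above.
ls ${COCO2014_IMAGES} | head -n 3
ls ${REFCOCO_ANN_ROOT}/refcoco/instances.json \
   ${REFCOCO_ANN_ROOT}/refcoco+/instances.json \
   ${REFCOCO_ANN_ROOT}/refcocog/instances.json
```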

### LLaVA

Make sure you have the COCO 2014 images first.

Download the LLaVA annotation files from the links in the table above to a location of your choice.

Set image_path in the LLaVA dataset configuration files to the COCO 2014 image folder. Similarly, set ann_path to the locations of the previously downloaded conversation_58k.json, detail_23k.json, and complex_reasoning_77k.json in conversation.yaml, detail.yaml, and reason.yaml, respectively.
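
As a sketch, this is the mapping between the three annotation files and the configs they feed; the directory variable is a placeholder for wherever you saved the downloads:

```bash
# Placeholder for the folder holding the downloaded LLaVA annotation files.
LLAVA_ANN_DIR=/path/to/llava_annotations

ls ${LLAVA_ANN_DIR}/conversation_58k.json       # -> ann_path in conversation.yaml
ls ${LLAVA_ANN_DIR}/detail_23k.json             # -> ann_path in detail.yaml
ls ${LLAVA_ANN_DIR}/complex_reasoning_77k.json  # -> ann_path in reason.yaml
```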

### OKVQA

### AOK-VQA

```bash
export AOKVQA_DIR=YOUR_DATASET_PATH
mkdir -p ${AOKVQA_DIR}
curl -fsSL https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz | tar xvz -C ${AOKVQA_DIR}
```

### OCR-VQA

### Filtered Flickr-30k

### Multi-task conversation

### Unnatural instruction