junchen14 2023-10-23 23:56:58 +03:00
commit 3fdb6f35c9
10 changed files with 123 additions and 131 deletions

dataset/Evaluation.md (new file)

@@ -0,0 +1,26 @@
### OKVQA
### GQA
Images and question-answer pairs will be loaded during the evaluation.
```
python run_eval.py xxxx
```
### VSR
Images and question-answer pairs will be loaded during the evaluation.
```
python run_eval.py xxxx
```
### IconVQA
### VizWiz
1. Download [`test.json`](https://vizwiz.cs.colorado.edu/VizWiz_final/vqa_data/Annotations.zip) and extract [`test.zip`](https://vizwiz.cs.colorado.edu/VizWiz_final/images/test.zip) to `test`. Put them under `your_path/vizwiz`.
2. Single-GPU inference.
```
python run_eval.py xxxx
```
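Before running the evaluation, it can help to confirm the layout matches the steps above. A minimal check, assuming `test.json` is a JSON list of question entries and using the illustrative `your_path/vizwiz` root from step 1:
```python
import json
from pathlib import Path

vizwiz_root = Path("your_path/vizwiz")   # adjust to where you placed the files
image_dir = vizwiz_root / "test"         # extracted from test.zip

# test.json is assumed to be a JSON list of question entries for the test split
entries = json.load(open(vizwiz_root / "test.json"))
print(f"{len(entries)} questions, {len(list(image_dir.glob('*.jpg')))} test images")
```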
### HM


@@ -1,133 +1,94 @@
## Download the COCO captions, RefCOCO, RefCOCO+, RefCOCOg, visual genome, textcaps, LLaVA, gqa, AOK-VQA, OK-VQA, OCR-VQA, filtered Flickr-30k, multi-task conversation, and Unnatural instruction datasets
### COCO captions
- [train2017](http://images.cocodataset.org/zips/train2017.zip)
### RefCOCO, RefCOCO+, RefCOCOg
### Visual genome
- [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)
### TextCaps
- [TextCaps_0.1_train](https://dl.fbaipublicfiles.com/textvqa/data/textcaps/TextCaps_0.1_train.json)
- [Images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)
### RefCOCO, RefCOCO+, RefCOCOg
Make sure you have the COCO 2014 images first.
Then download the RefCOCO, RefCOCO+, and RefCOCOg annotation files from the following links.
- https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco.zip
- https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco+.zip
- https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcocog.zip
Unzip these files to a location of your choice. The folder should then have the following structure:
```
Location_you_like
├── refcoco
│ ├── instances.json
│ ├── refs(google).p
│ └── refs(unc).p
├── refcoco+
│ ├── instances.json
│ └── refs(unc).p
└── refcocog
├── instances.json
├── refs(google).p
└── refs(umd).p
```
Set **image_path** in all the following dataset configuration files to the COCO 2014 image folder.
Similarly, set **ann_path** in all the following configs to the above folder (Location_you_like) that contains refcoco, refcoco+, and refcocog.
- [minigpt4/configs/datasets/coco_bbox/refcoco.yaml](../minigpt4/configs/datasets/coco_bbox/refcoco.yaml)
- [minigpt4/configs/datasets/coco_bbox/refcocog.yaml](../minigpt4/configs/datasets/coco_bbox/refcocog.yaml)
- [minigpt4/configs/datasets/coco_bbox/refcocop.yaml](../minigpt4/configs/datasets/coco_bbox/refcocop.yaml)
- [minigpt4/configs/datasets/coco_bbox/invrefcoco.yaml](../minigpt4/configs/datasets/coco_bbox/invrefcoco.yaml)
- [minigpt4/configs/datasets/coco_bbox/invrefcocog.yaml](../minigpt4/configs/datasets/coco_bbox/invrefcocog.yaml)
- [minigpt4/configs/datasets/coco_bbox/invrefcocop.yaml](../minigpt4/configs/datasets/coco_bbox/invrefcocop.yaml)
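Once the annotations are unpacked and the configs point at them, a quick sanity check can confirm the paths resolve. This is a minimal sketch, assuming the standard ReferIt layout in which the `refs(*.p)` files are Python pickles and `instances.json` follows the COCO annotation format:
```python
import json
import pickle
from pathlib import Path

ann_root = Path("Location_you_like")   # the folder you set as ann_path in the configs above

for name, ref_file in [("refcoco", "refs(unc).p"),
                       ("refcoco+", "refs(unc).p"),
                       ("refcocog", "refs(umd).p")]:
    refs = pickle.load(open(ann_root / name / ref_file, "rb"))
    instances = json.load(open(ann_root / name / "instances.json"))
    print(f"{name}: {len(refs)} referring expressions over "
          f"{len(instances['annotations'])} annotated boxes")
```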
### LLaVA
Make sure you have the COCO 2014 images first.
Download the LLaVA annotation files from the following links to a location of your choice.
- https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/resolve/main/conversation_58k.json
- https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/resolve/main/detail_23k.json
- https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/resolve/main/complex_reasoning_77k.json
Set **image_path** in all the following dataset configuration files to the COCO 2014 image folder.
Similarly, set **ann_path** to the location of the previously downloaded conversation_58k.json,
detail_23k.json, and complex_reasoning_77k.json in conversation.yaml, detail.yaml, and reason.yaml, respectively.
- [minigpt4/configs/datasets/llava/conversation.yaml](../minigpt4/configs/datasets/llava/conversation.yaml)
- [minigpt4/configs/datasets/llava/detail.yaml](../minigpt4/configs/datasets/llava/detail.yaml)
- [minigpt4/configs/datasets/llava/reason.yaml](../minigpt4/configs/datasets/llava/reason.yaml)
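To verify that **ann_path** points at the right files, a small check is sketched below; it assumes each annotation file is a JSON list of samples with an `image` and a `conversations` field (confirm against the dataset loader if in doubt):
```python
import json

# assumed layout: each file is a JSON list of samples with an "image" field
# naming a COCO image and a "conversations" field with the dialogue turns
for ann in ["conversation_58k.json", "detail_23k.json", "complex_reasoning_77k.json"]:
    with open(ann) as f:
        samples = json.load(f)
    print(ann, len(samples), "samples; first image field:", samples[0]["image"])
```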
### gqa
### OKVQA
- [OK-VQA Input Questions](https://okvqa.allenai.org/static/data/OpenEnded_mscoco_train2014_questions.json.zip)
- [OK-VQA Annotations](https://okvqa.allenai.org/static/data/mscoco_train2014_annotations.json.zip)
- [okvqa_train](https://storage.googleapis.com/sfr-vision-language-research/LAVIS/datasets/okvqa/okvqa_train.json)
- Images are from COCO
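Since the images come from COCO, each OK-VQA entry can be matched to its COCO 2014 train image by `image_id`. A minimal sketch, assuming the standard VQA question-file layout (a top-level `questions` list) and the usual COCO 2014 file naming:
```python
import json

# the standard VQA question file wraps the entries in a "questions" list
questions = json.load(open("OpenEnded_mscoco_train2014_questions.json"))["questions"]
for q in questions[:3]:
    # COCO 2014 train images follow the COCO_train2014_<12-digit id>.jpg naming scheme
    print(q["question_id"], f"COCO_train2014_{q['image_id']:012d}.jpg", q["question"])
```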
### AOK-VQA
```
export AOKVQA_DIR=YOUR_DATASET_PATH
mkdir -p ${AOKVQA_DIR}
curl -fsSL https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz | tar xvz -C ${AOKVQA_DIR}
```
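After extraction, the split annotation files should sit directly under `${AOKVQA_DIR}`. A quick check, with the `aokvqa_v1p0_*.json` file names assumed from the v1p0 release:
```python
import json
import os

aokvqa_dir = os.environ.get("AOKVQA_DIR", "YOUR_DATASET_PATH")
for split in ["train", "val"]:
    path = os.path.join(aokvqa_dir, f"aokvqa_v1p0_{split}.json")   # assumed file names
    if os.path.exists(path):
        print(split, len(json.load(open(path))), "questions")
    else:
        print(split, "annotations not found at", path)
```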
### OCR-VQA
- [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing), **we save all files as `.jpg`**
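The `.jpg` note matters because the source images come in mixed formats; a hedged sketch of the kind of normalization this implies (the `ocrvqa/images` folder name is illustrative, and Pillow is assumed to be installed):
```python
from pathlib import Path
from PIL import Image

image_dir = Path("ocrvqa/images")   # wherever the download script stored the images

for img_path in list(image_dir.iterdir()):
    if img_path.suffix.lower() in (".jpg", ".jpeg"):
        continue
    # re-encode png/gif/... as RGB JPEG so every file ends in .jpg
    Image.open(img_path).convert("RGB").save(img_path.with_suffix(".jpg"), "JPEG")
    img_path.unlink()
```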
### filtered Flickr-30k
### Multi-task conversation
### Unnatural instruction
### Pre-training datasets download:
We use the filtered synthetic captions prepared by BLIP. For more details about the dataset, please refer to [BLIP](https://github.com/salesforce/BLIP).
It requires roughly 2.3 TB of storage for the LAION and CC3M+CC12M+SBU datasets.
Image source | Filtered synthetic caption by ViT-L
--- | :---:
CC3M+CC12M+SBU | <a href="https://storage.googleapis.com/sfr-vision-language-research/BLIP/datasets/ccs_synthetic_filtered_large.json">Download</a>
LAION115M | <a href="https://storage.googleapis.com/sfr-vision-language-research/BLIP/datasets/laion_synthetic_filtered_large.json">Download</a>
This will download two JSON files:
```
ccs_synthetic_filtered_large.json
laion_synthetic_filtered_large.json
```
## Prepare the data step by step
### Set up the dataset folder and move the annotation files to the data storage folder
```
export MINIGPT4_DATASET=/YOUR/PATH/FOR/LARGE/DATASET/
mkdir ${MINIGPT4_DATASET}/cc_sbu
mkdir ${MINIGPT4_DATASET}/laion
mv ccs_synthetic_filtered_large.json ${MINIGPT4_DATASET}/cc_sbu
mv laion_synthetic_filtered_large.json ${MINIGPT4_DATASET}/laion
```
### Copy the conversion and download scripts to the data storage folder
```
cp convert_cc_sbu.py ${MINIGPT4_DATASET}/cc_sbu
cp download_cc_sbu.sh ${MINIGPT4_DATASET}/cc_sbu
cp convert_laion.py ${MINIGPT4_DATASET}/laion
cp download_laion.sh ${MINIGPT4_DATASET}/laion
```
### Convert the laion and cc_sbu annotation files to the img2dataset format
```
cd ${MINIGPT4_DATASET}/cc_sbu
python convert_cc_sbu.py
cd ${MINIGPT4_DATASET}/laion
python convert_laion.py
```
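The convert scripts turn the BLIP-style JSON (a list of entries carrying an image URL and a caption) into the tabular input that img2dataset reads; the `.tsv` files in the final structure below are that output. A rough sketch of the idea, assuming `url` and `caption` keys in the JSON (check the actual scripts for the exact columns):
```python
import csv
import json

# e.g. ccs_synthetic_filtered_large.json -> ccs_synthetic_filtered_large.tsv
with open("ccs_synthetic_filtered_large.json") as f:
    entries = json.load(f)

with open("ccs_synthetic_filtered_large.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["url", "caption"])
    for entry in entries:
        writer.writerow([entry["url"], entry["caption"]])
```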
### Download the datasets with img2dataset
```
cd ${MINIGPT4_DATASET}/cc_sbu
sh download_cc_sbu.sh
cd ${MINIGPT4_DATASET}/laion
sh download_laion.sh
```
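The download scripts wrap the img2dataset tool, which reads those `.tsv` files and writes webdataset shards (the `00000.tar` / `00000.parquet` pairs shown below). For reference, a roughly equivalent call through its Python API might look like the following; the worker counts and image size here are illustrative, so keep whatever the provided scripts specify:
```python
from img2dataset import download

download(
    url_list="ccs_synthetic_filtered_large.tsv",
    input_format="tsv",
    url_col="url",
    caption_col="caption",
    output_format="webdataset",   # yields the 00000.tar / 00000.parquet shards below
    output_folder="cc_sbu_dataset",
    image_size=256,               # illustrative; keep whatever the provided script uses
    processes_count=16,
    thread_count=64,
)
```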
The final dataset structure
```
.
├── ${MINIGPT4_DATASET}
│   ├── cc_sbu
│   │   ├── convert_cc_sbu.py
│   │   ├── download_cc_sbu.sh
│   │   ├── ccs_synthetic_filtered_large.json
│   │   ├── ccs_synthetic_filtered_large.tsv
│   │   └── cc_sbu_dataset
│   │       ├── 00000.tar
│   │       ├── 00000.parquet
│   │       ...
│   └── laion
│       ├── convert_laion.py
│       ├── download_laion.sh
│       ├── laion_synthetic_filtered_large.json
│       ├── laion_synthetic_filtered_large.tsv
│       └── laion_dataset
│           ├── 00000.tar
│           ├── 00000.parquet
│           ...
...
```
## Set up the dataset configuration files
Then, set the LAION dataset loading path in
[minigpt4/configs/datasets/laion/defaults.yaml](../minigpt4/configs/datasets/laion/defaults.yaml#L5) at Line 5 to
`${MINIGPT4_DATASET}/laion/laion_dataset/{00000..10488}.tar`
and the Conceptual Caption and SBU dataset loading path in
[minigpt4/configs/datasets/cc_sbu/defaults.yaml](../minigpt4/configs/datasets/cc_sbu/defaults.yaml#L5) at Line 5 to
`${MINIGPT4_DATASET}/cc_sbu/cc_sbu_dataset/{00000..01255}.tar`.
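The brace ranges `{00000..10488}` and `{00000..01255}` must match the shards that actually landed on disk (downloads can stop short of the full set). A small helper to print the correct range for each dataset, following the folder structure above:
```python
import os
from pathlib import Path

root = Path(os.environ["MINIGPT4_DATASET"])
for sub in ("laion/laion_dataset", "cc_sbu/cc_sbu_dataset"):
    shards = sorted((root / sub).glob("*.tar"))
    if shards:
        # prints e.g. .../laion_dataset/{00000..10488}.tar for the config file
        print(f"{root / sub}/{{{shards[0].stem}..{shards[-1].stem}}}.tar")
```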


@@ -178,7 +178,6 @@ class MiniGPTBase(BaseModel):
        answers = [self.llama_tokenizer(a + self.end_sym,
                                        return_tensors="pt",
                                        add_special_tokens=False).to(self.device) for a in answers]
        cur_id = []
        cur_target = []
        for i in range(len(questions)):
@@ -226,8 +225,6 @@ class MiniGPTBase(BaseModel):
        conv_q = [[self.prompt_template.format(item) for item in items] for items in conv_q]
        cond_embeds, cond_atts = self.prompt_wrap(img_embeds, img_atts, [q[0] for q in conv_q])
        regress_token_ids, regress_atts, part_targets = self.tokenize_conversation(conv_q, conv_a)


@@ -75,7 +75,7 @@ class LlamaForCausalLM(LlamaForCausalLMOrig):
        )
        hidden_states = outputs[0]
-        if self.config.pretraining_tp > 1:
+        if hasattr(self.config, 'pretraining_tp') and self.config.pretraining_tp > 1:
            lm_head_slices = self.lm_head.weight.split(self.vocab_size // self.config.pretraining_tp, dim=0)
            logits = [F.linear(hidden_states, lm_head_slices[i]) for i in range(self.config.pretraining_tp)]
            logits = torch.cat(logits, dim=-1)


@@ -12,6 +12,7 @@ import random
import numpy as np
import torch
import torch.backends.cudnn as cudnn
+import wandb

import minigpt4.tasks as tasks
from minigpt4.common.config import Config
@@ -30,7 +31,6 @@ from minigpt4.models import *
from minigpt4.processors import *
from minigpt4.runners import *
from minigpt4.tasks import *
-import wandb


def parse_args():
@@ -44,12 +44,10 @@ def parse_args():
        "in xxx=yyy format will be merged into config file (deprecate), "
        "change to --cfg-options instead.",
    )
-    parser.add_argument("--wandb_log", default=False)
-    parser.add_argument("--job_name",default="minigpt_v2",type=str)
+    parser.add_argument("--job_name", default="minigpt_v2",type=str)
    args = parser.parse_args()

    return args
@@ -80,16 +78,13 @@ def main():
    # set before init_distributed_mode() to ensure the same job_id shared across all ranks.
    job_id = now()
    args = parse_args()
-    cfg = Config(parse_args())
+    cfg = Config(args)
    init_distributed_mode(cfg.run_cfg)
    setup_seeds(cfg)
    # set after init_distributed_mode() to only log on master.
    setup_logger()
    cfg.pretty_print()
    task = tasks.setup_task(cfg)
@@ -98,10 +93,9 @@ def main():
    if cfg.run_cfg.wandb_log:
        wandb.login()
-        wandb.init(project="minigptv2",name=args.job_name)
+        wandb.init(project="minigptv", name=cfg.run_cfg.job_name)
        wandb.watch(model)

    runner = get_runner_class(cfg)(
        cfg=cfg, job_id=job_id, task=task, model=model, datasets=datasets
    )


@@ -53,3 +53,6 @@ run:
  world_size: 1
  dist_url: "env://"
  distributed: True
+
+  wandb_log: True
+  job_name: minigpt4_llama2_pretrain


@@ -47,3 +47,6 @@ run:
  world_size: 1
  dist_url: "env://"
  distributed: True
+
+  wandb_log: True
+  job_name: minigpt4_llama2_finetune


@@ -53,3 +53,6 @@ run:
  world_size: 1
  dist_url: "env://"
  distributed: True
+
+  wandb_log: True
+  job_name: minigpt4_pretrain


@@ -47,3 +47,6 @@ run:
  world_size: 1
  dist_url: "env://"
  distributed: True
+
+  wandb_log: True
+  job_name: minigpt4_finetune


@@ -276,7 +276,6 @@ run:
  init_lr: 1e-5
  min_lr: 8e-5
  warmup_lr: 1e-6
-  wandb_log: True

  weight_decay: 0.05
  max_epoch: 50
@@ -297,3 +296,6 @@ run:
  world_size: 1
  dist_url: "env://"
  distributed: True
+
+  wandb_log: True
+  job_name: minigptv2_finetune