From b700afd18b28d8c3b049a587aa63ed819e09c643 Mon Sep 17 00:00:00 2001
From: Deyao Zhu
Date: Fri, 13 Oct 2023 17:07:31 +0300
Subject: [PATCH] update readme

---
 MiniGPT4_Train.md | 41 ++++++++++++++++++++++++++++++
 PrepareVicuna.md  | 35 --------------------------
 README.md         | 64 +++++++++++++----------------------------------
 3 files changed, 58 insertions(+), 82 deletions(-)
 create mode 100644 MiniGPT4_Train.md
 delete mode 100644 PrepareVicuna.md

diff --git a/MiniGPT4_Train.md b/MiniGPT4_Train.md
new file mode 100644
index 0000000..f9e8a5c
--- /dev/null
+++ b/MiniGPT4_Train.md
@@ -0,0 +1,41 @@
+## Training of MiniGPT-4
+
+The training of MiniGPT-4 contains two alignment stages.
+
+**1. First pretraining stage**
+
+In the first pretraining stage, the model is trained on image-text pairs from the Laion and CC datasets
+to align the vision model with the language model. To download and prepare the datasets, please check
+our [first stage dataset preparation instruction](dataset/README_1_STAGE.md).
+After the first stage, the visual features are mapped so that the language
+model can understand them.
+To launch the first stage training, run the following command. In our experiments, we use 4 A100 GPUs.
+You can change the save path in the config file
+[train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage1_pretrain.yaml).
+
+```bash
+torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
+```
+
+A MiniGPT-4 checkpoint with only stage-one training can be downloaded
+[here (13B)](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link) or [here (7B)](https://drive.google.com/file/d/1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5/view?usp=share_link).
+Compared to the model after stage two, this checkpoint frequently generates incomplete and repeated sentences.
+
+
+**2. Second finetuning stage**
+
+In the second stage, we use a small, high-quality image-text pair dataset that we created ourselves
+and convert it to a conversation format to further align MiniGPT-4.
+To download and prepare our second stage dataset, please check our
+[second stage dataset preparation instruction](dataset/README_2_STAGE.md).
+To launch the second stage alignment,
+first specify the path to the checkpoint file trained in stage 1 in
+[train_configs/minigpt4_stage2_finetune.yaml](train_configs/minigpt4_stage2_finetune.yaml).
+You can also specify the output path there.
+Then, run the following command. In our experiments, we use 1 A100 GPU.
+
+```bash
+torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml
+```
+
+After the second stage alignment, MiniGPT-4 is able to talk about images coherently and in a user-friendly way.
diff --git a/PrepareVicuna.md b/PrepareVicuna.md
deleted file mode 100644
index 0585e62..0000000
--- a/PrepareVicuna.md
+++ /dev/null
@@ -1,35 +0,0 @@
-## How to Prepare Vicuna Weight
-Vicuna is an open-source LLAMA-based LLM that has a performance close to ChatGPT.
-We currently use the v0 version of Vicuna-13B.
-
-To prepare Vicuna’s weight, first download Vicuna’s **delta** weight from [https://huggingface.co/lmsys/vicuna-13b-delta-v0](https://huggingface.co/lmsys/vicuna-13b-delta-v0).
-In case you have git-lfs installed (https://git-lfs.com), this can be done by
-
-```
-git lfs install
-git clone https://huggingface.co/lmsys/vicuna-13b-delta-v0 # more powerful, need at least 24G gpu memory
-# or
-git clone https://huggingface.co/lmsys/vicuna-7b-delta-v0 # smaller, need 12G gpu memory
-```
-
-Note that this is not directly the working weight, but the difference between the working weight and the original weight of LLAMA-13B. (Due to LLAMA’s rules, we cannot distribute the weight of LLAMA.)
-
-Then, you need to obtain the original LLAMA-7B or LLAMA-13B weights in the HuggingFace format
-either following the instruction provided by HuggingFace
-[here](https://huggingface.co/docs/transformers/main/model_doc/llama) or from the Internet.
-
-When these two weights are ready, we can use tools from Vicuna’s team to create the real working weight.
-First, Install their library that is compatible with v0 Vicuna by
-
-```
-pip install git+https://github.com/lm-sys/FastChat.git@v0.1.10
-```
-
-Then, run the following command to create the final working weight
-
-```
-python -m fastchat.model.apply_delta --base /path/to/llama-13bOR7b-hf/ --target /path/to/save/working/vicuna/weight/ --delta /path/to/vicuna-13bOR7b-delta-v0/
-```
-
-Now you are good to go!
-
diff --git a/README.md b/README.md
index 8b3689c..aa34310 100644
--- a/README.md
+++ b/README.md
@@ -63,14 +63,20 @@ Download the corresponding LLM weights from the following huggingface space via
[Download](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna-7b/tree/main)

-Then, set the path to the vicuna weight in the model config file
-[here](minigpt4/configs/models/minigpt4_vicuna0.yaml#L18) at Line 18
-and/or the path to the llama2 weight in the model config file
+Then, set the variable *llama_model* in the model config file to the LLM weight path.
+
+* For MiniGPT-v2, set the LLM path
+[here](minigpt4/configs/models/minigpt_v2.yaml#L14) at Line 14.
+
+* For MiniGPT-4 (Llama2), set the LLM path
[here](minigpt4/configs/models/minigpt4_llama2.yaml#L15) at Line 15.

+* For MiniGPT-4 (Vicuna), set the LLM path
+[here](minigpt4/configs/models/minigpt4_vicuna0.yaml#L18) at Line 18.
+
**3. Prepare the pretrained model checkpoints**

-Download the pretrained checkpoints
+Download the pretrained model checkpoints

| MiniGPT-v2 (LLaMA-2 Chat 7B) |

@@ -114,53 +120,17 @@ python demo.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml --gpu-id 0

To save GPU memory, LLMs load in 8 bit by default, with a beam search width of 1.
This configuration requires about 23G of GPU memory for the 13B LLM and 11.5G of GPU memory for the 7B LLM.
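As a rough sketch (the surrounding keys are illustrative, not the verbatim file contents), the `model` section of these eval configs exposes this switch through the `low_resource` flag:

```yaml
model:
  # ... vision encoder and LLM settings ...
  low_resource: True   # load the LLM in 8 bit to save GPU memory (default)
```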
For more powerful GPUs, you can run the model
-in 16 bit by setting `low_resource` to `False` in the relevant config file
-(**MiniGPT-v2**: [minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#6); **MiniGPT-4 (Llama2)**: [minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#6); **MiniGPT-4 (Vicuna)**: [minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#6))
+in 16 bit by setting `low_resource` to `False` in the relevant config file:
+
+* MiniGPT-v2: [minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#6)
+* MiniGPT-4 (Llama2): [minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#6)
+* MiniGPT-4 (Vicuna): [minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#6)

Thanks to [@WangRongsheng](https://github.com/WangRongsheng), you can also run MiniGPT-4 on [Colab](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing)


### Training

-For training details of MiniGPT-4, check [here]().
-The training of MiniGPT-4 contains two alignment stages.
-
-**1. First pretraining stage**
-
-In the first pretrained stage, the model is trained using image-text pairs from Laion and CC datasets
-to align the vision and language model. To download and prepare the datasets, please check
-our [first stage dataset preparation instruction](dataset/README_1_STAGE.md).
-After the first stage, the visual features are mapped and can be understood by the language
-model.
-To launch the first stage training, run the following command. In our experiments, we use 4 A100.
-You can change the save path in the config file
-[train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage1_pretrain.yaml)
-
-```bash
-torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
-```
-
-A MiniGPT-4 checkpoint with only stage one training can be downloaded
-[here (13B)](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link) or [here (7B)](https://drive.google.com/file/d/1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5/view?usp=share_link).
-Compared to the model after stage two, this checkpoint generate incomplete and repeated sentences frequently.
-
-
-**2. Second finetuning stage**
-
-In the second stage, we use a small high quality image-text pair dataset created by ourselves
-and convert it to a conversation format to further align MiniGPT-4.
-To download and prepare our second stage dataset, please check our
-[second stage dataset preparation instruction](dataset/README_2_STAGE.md).
-To launch the second stage alignment,
-first specify the path to the checkpoint file trained in stage 1 in
-[train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage2_finetune.yaml).
-You can also specify the output path there.
-Then, run the following command. In our experiments, we use 1 A100.
-
-```bash
-torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml
-```
-
-After the second stage alignment, MiniGPT-4 is able to talk about the image coherently and user-friendly.
+For training details of MiniGPT-4, check [here](MiniGPT4_Train.md).
@@ -173,7 +143,7 @@ After the second stage alignment, MiniGPT-4 is able to talk about the image cohe
+ [LLaMA](https://github.com/facebookresearch/llama) The strong open-sourced LLaMA 2 language model.

-If you're using MiniGPT-4 in your research or applications, please cite using this BibTeX:
+If you're using MiniGPT-4/MiniGPT-v2 in your research or applications, please cite using this BibTeX:

```bibtex
@article{Chen2023minigpt,