Mirror of https://github.com/Vision-CAIR/MiniGPT-4.git
update readme
commit b700afd18b (parent 3000873dcc)
MiniGPT4_Train.md (new file, 41 lines)

@@ -0,0 +1,41 @@
## Training of MiniGPT-4

The training of MiniGPT-4 consists of two alignment stages.

**1. First pretraining stage**

In the first pretraining stage, the model is trained on image-text pairs from the LAION and CC datasets
to align the vision module with the language model. To download and prepare the datasets, please check
our [first stage dataset preparation instruction](dataset/README_1_STAGE.md).

After the first stage, the visual features are mapped into a representation the language
model can understand.

To launch the first stage training, run the following command. In our experiments, we use 4 A100 GPUs.
You can change the save path in the config file
[train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage1_pretrain.yaml).

```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
```
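
For example, with the 4 A100 setup used in our experiments, `NUM_GPU` is simply set to 4:

```bash
# Stage-1 pretraining on a single node with 4 GPUs.
torchrun --nproc-per-node 4 train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
```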

A MiniGPT-4 checkpoint trained with only the first stage can be downloaded
[here (13B)](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link) or [here (7B)](https://drive.google.com/file/d/1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5/view?usp=share_link).
Compared to the model after stage two, this checkpoint frequently generates incomplete and repetitive sentences.
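
If you prefer the command line, the same stage-1 checkpoints can be fetched with the third-party `gdown` tool (an assumption: `gdown` is not part of this repo and must be installed separately; the file IDs below are taken from the Google Drive links above):

```bash
# Optional: download the stage-1-only checkpoints from Google Drive via gdown.
pip install gdown
gdown 1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8   # stage-1-only checkpoint (13B)
gdown 1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5   # stage-1-only checkpoint (7B)
```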
**2. Second finetuning stage**

In the second stage, we use a small, high-quality image-text dataset created by ourselves
and converted into a conversation format to further align MiniGPT-4.
To download and prepare our second stage dataset, please check our
[second stage dataset preparation instruction](dataset/README_2_STAGE.md).

To launch the second stage alignment,
first specify the path to the checkpoint file trained in stage 1 in
[train_configs/minigpt4_stage2_finetune.yaml](train_configs/minigpt4_stage2_finetune.yaml).
You can also specify the output path there.
Then, run the following command. In our experiments, we use 1 A100 GPU.

```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml
```
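
A minimal sketch of the two steps above from the command line (the `ckpt` key name and the exact yaml layout are assumptions; open train_configs/minigpt4_stage2_finetune.yaml and verify before relying on it):

```bash
# Point the stage-2 config at the stage-1 checkpoint (key name assumed; verify in the yaml),
# then launch finetuning on a single GPU, as in our experiments. GNU sed syntax.
sed -i 's#ckpt:.*#ckpt: "/path/to/stage1/checkpoint.pth"#' train_configs/minigpt4_stage2_finetune.yaml
torchrun --nproc-per-node 1 train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml
```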

After the second stage alignment, MiniGPT-4 is able to talk about images coherently and in a user-friendly way.

Deleted file (35 lines)

@@ -1,35 +0,0 @@
## How to Prepare Vicuna Weight

Vicuna is an open-source LLaMA-based LLM with performance close to ChatGPT.
We currently use the v0 version of Vicuna-13B.

To prepare Vicuna's weights, first download Vicuna's **delta** weights from [https://huggingface.co/lmsys/vicuna-13b-delta-v0](https://huggingface.co/lmsys/vicuna-13b-delta-v0).
If you have git-lfs installed (https://git-lfs.com), this can be done by:

```bash
git lfs install
git clone https://huggingface.co/lmsys/vicuna-13b-delta-v0  # more powerful, needs at least 24 GB of GPU memory
# or
git clone https://huggingface.co/lmsys/vicuna-7b-delta-v0   # smaller, needs about 12 GB of GPU memory
```

Note that these are not the working weights themselves, but the difference between the working weights and the original LLaMA-13B weights. (Because of LLaMA's license, we cannot distribute the LLaMA weights directly.)

Then, you need to obtain the original LLaMA-7B or LLaMA-13B weights in the Hugging Face format,
either by following the instructions provided by Hugging Face
[here](https://huggingface.co/docs/transformers/main/model_doc/llama) or from the Internet.
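
If you start from Meta's original checkpoint files, the Hugging Face documentation linked above describes a conversion script shipped with the `transformers` source tree. A hedged sketch of its usage (the paths are placeholders, and the script location and options may differ across transformers versions, so check the linked documentation):

```bash
# Convert the original LLaMA weights to the Hugging Face format.
# Script path and flags follow the Hugging Face LLaMA docs; verify for your transformers version.
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights \
    --model_size 13B \
    --output_dir /path/to/llama-13b-hf
```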

When these two sets of weights are ready, we can use tools from Vicuna's team to create the real working weights.
First, install their library at the version that is compatible with Vicuna v0:

```bash
pip install git+https://github.com/lm-sys/FastChat.git@v0.1.10
```

Then, run the following command to create the final working weights:

```bash
python -m fastchat.model.apply_delta --base /path/to/llama-13bOR7b-hf/ --target /path/to/save/working/vicuna/weight/ --delta /path/to/vicuna-13bOR7b-delta-v0/
```
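
For example, a 13B run might look like the following (all three paths are placeholders for your local directories):

```bash
# Concrete 13B instantiation of the command above; replace the paths with your own.
python -m fastchat.model.apply_delta \
    --base /path/to/llama-13b-hf/ \
    --target /path/to/vicuna-13b-v0/ \
    --delta /path/to/vicuna-13b-delta-v0/
```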

Now you are good to go!

README.md (64 changed lines)

@@ -63,14 +63,20 @@ Download the corresponding LLM weights from the following huggingface space via

[Download](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna-7b/tree/main)

Then, set the variable *llama_model* in the model config file to the LLM weight path (a command-line sketch follows the list below).

* For MiniGPT-v2, set the LLM path
[here](minigpt4/configs/models/minigpt_v2.yaml#L15) at Line 14.

* For MiniGPT-4 (Llama2), set the LLM path
[here](minigpt4/configs/models/minigpt4_llama2.yaml#L15) at Line 15.

* For MiniGPT-4 (Vicuna), set the LLM path
[here](minigpt4/configs/models/minigpt4_vicuna0.yaml#L18) at Line 18.
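
As a minimal sketch of that edit from the command line (the *llama_model* key name comes from the text above; the path is a placeholder and the exact yaml formatting is an assumption, so double-check the file afterwards):

```bash
# Point MiniGPT-4 (Llama2) at a locally downloaded LLM checkpoint (GNU sed syntax).
# Adjust the config file and path for the other model variants.
sed -i 's#llama_model:.*#llama_model: "/path/to/Llama-2-7b-chat-hf"#' minigpt4/configs/models/minigpt4_llama2.yaml
```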

**3. Prepare the pretrained model checkpoints**

Download the pretrained model checkpoints

| MiniGPT-v2 (LLaMA-2 Chat 7B) |

@@ -114,53 +120,17 @@ python demo.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml --gpu-id 0

To save GPU memory, the LLM is loaded in 8 bit by default, with a beam search width of 1.
This configuration requires about 23 GB of GPU memory for the 13B LLM and 11.5 GB for the 7B LLM.
For more powerful GPUs, you can run the model
in 16 bit by setting `low_resource` to `False` in the relevant config file (a command-line sketch follows the list below):

* MiniGPT-v2: [minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#6)
* MiniGPT-4 (Llama2): [minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#6)
* MiniGPT-4 (Vicuna): [minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#6)
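
A minimal sketch of that edit (it assumes the eval config literally contains a `low_resource: True` line; open the yaml and check first), shown here for MiniGPT-v2:

```bash
# Switch MiniGPT-v2 evaluation to 16-bit loading on a GPU with enough memory (GNU sed syntax).
sed -i 's/low_resource: True/low_resource: False/' eval_configs/minigptv2_eval.yaml
```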

Thanks to [@WangRongsheng](https://github.com/WangRongsheng), you can also run MiniGPT-4 on [Colab](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing).


### Training

For training details of MiniGPT-4, check [here](MiniGPT4_Train.md).

*(The two-stage training instructions that previously followed here were removed from README.md in this commit; they match the new MiniGPT4_Train.md shown above.)*

@@ -173,7 +143,7 @@ After the second stage alignment, MiniGPT-4 is able to talk about the image cohe

+ [LLaMA](https://github.com/facebookresearch/llama) The strong open-sourced LLaMA 2 language model.

If you're using MiniGPT-4/MiniGPT-v2 in your research or applications, please cite using this BibTeX:

```bibtex
@article{Chen2023minigpt,