# MiniGPT-4 and MiniGPT-v2
**King Abdullah University of Science and Technology**
<a href='https://minigpt-4.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2304.10592'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/spaces/Vision-CAIR/minigpt4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> <a href='https://huggingface.co/Vision-CAIR/MiniGPT-4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a> [Colab](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [YouTube](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)
## 💡 Get help - [Q&A](https://github.com/Vision-CAIR/MiniGPT-4/discussions/categories/q-a) or [Discord 💬](https://discord.gg/5WdJkjbAeE)
## News
[Oct.13 2023] Breaking! We release the first major update with our MiniGPT-v2.

[Aug.28 2023] We now provide a Llama 2 version of MiniGPT-4.
## Online Demo
Chat with MiniGPT-v2 about your own images:
[MiniGPT-v2 demo](https://minigpt-v2.github.io/)

Chat with MiniGPT-4 about your own images:
[MiniGPT-4 demo](https://minigpt-4.github.io)
## MiniGPT-v2 Examples
## MiniGPT-4 Examples
More examples can be found on the [project page](https://minigpt-4.github.io).
## Getting Started
### Installation
**1. Prepare the code and the environment**
Clone our repository, create a Python environment, and activate it with the following commands:
```bash
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigpt4
```
**2. Prepare the pretrained LLM weights**
**MiniGPT-v2** is based on Llama 2 Chat 7B. For **MiniGPT-4**, we provide both Vicuna V0 and Llama 2 versions.
Download the corresponding LLM weights from the following Hugging Face repositories by cloning them with git-lfs.

| Llama 2 Chat 7B | Vicuna V0 13B | Vicuna V0 7B |
|:---------------:|:-------------:|:------------:|
| [Download](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna-7b/tree/main) |

Then, set the path to the Vicuna weights in the model config file
[here](minigpt4/configs/models/minigpt4_vicuna0.yaml#L18) at Line 18
and/or the path to the Llama 2 weights in the model config file
[here](minigpt4/configs/models/minigpt4_llama2.yaml#L15) at Line 15.
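
As a rough illustration of this step, the sketch below prints the git-lfs commands for one set of weights and then rewrites a path entry in a config. It makes several assumptions: the dictionary labels, the `weights/vicuna-7b` directory, and the `llama_model` key name are placeholders chosen for the example, so check the actual config file for the real field. The repository URLs are the ones from the table, without the `/tree/main` suffix; the `meta-llama` repository may additionally require authentication.

```python
import re
import shlex

# Repository URLs from the download table, without the /tree/main suffix.
# The dictionary keys are labels made up for this example.
LLM_REPOS = {
    "llama2-chat-7b": "https://huggingface.co/meta-llama/Llama-2-7b-chat-hf",
    "vicuna-v0-13b": "https://huggingface.co/Vision-CAIR/vicuna",
    "vicuna-v0-7b": "https://huggingface.co/Vision-CAIR/vicuna-7b",
}


def clone_commands(name, dest):
    """Return the git commands that fetch one set of weights into dest."""
    return [
        ["git", "lfs", "install"],                # enable the git-lfs filters
        ["git", "clone", LLM_REPOS[name], dest],  # weight files come through LFS
    ]


def set_yaml_value(text, key, value):
    """Rewrite the first `key: ...` line in YAML text, keeping its indentation."""
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f'{m.group(1)}"{value}"', text, count=1)
    if count == 0:
        raise KeyError(f"{key!r} not found in config")
    return new_text


if __name__ == "__main__":
    dest = "weights/vicuna-7b"  # hypothetical local directory
    for cmd in clone_commands("vicuna-v0-7b", dest):
        print(shlex.join(cmd))  # print only; pass cmd to subprocess.run to execute

    # Stand-in for the model config; `llama_model` is an assumed key name.
    sample = 'model:\n  arch: minigpt4\n  llama_model: "please set this value"\n'
    print(set_yaml_value(sample, "llama_model", dest))
```

Editing the line with a regular expression rather than re-serializing the YAML keeps the rest of the file byte-for-byte unchanged, at the cost of dropping any trailing comment on the edited line.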
**3. Prepare the pretrained model checkpoints**
Download the pretrained checkpoint for the model you want to run.

| MiniGPT-v2 (LLaMA-2 Chat 7B) |
|------------------------------|
| [Download](https://drive.google.com/file/d/1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl/view?usp=sharing) |

For **MiniGPT-v2**, set the path to the pretrained checkpoint in the evaluation config file
in [eval_configs/minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#L8) at Line 8.

| MiniGPT-4 (Vicuna 13B) | MiniGPT-4 (Vicuna 7B) | MiniGPT-4 (LLaMA-2 Chat 7B) |
|----------------------------|---------------------------|---------------------------------|
| [Download](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link) | [Download](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view?usp=sharing) | [Download](https://drive.google.com/file/d/11nAPjEok8eAGGEG1N2vXo3kBLCg0WgUk/view?usp=sharing) |

For **MiniGPT-4**, set the path to the pretrained checkpoint in the evaluation config file
in [eval_configs/minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L8) at Line 8 for the Vicuna version or [eval_configs/minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#L10) for the Llama 2 version.
### Launching Demo Locally
For MiniGPT-v2, run
```
python demo_v2.py --cfg-path eval_configs/minigptv2_eval.yaml --gpu-id 0
```
For MiniGPT-4 (Vicuna version), run
```
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
```
For MiniGPT-4 (Llama2 version), run
```
python demo.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml --gpu-id 0
```
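
The three launch commands differ only in the demo script and the evaluation config. A small lookup table makes that pairing explicit. This is only a sketch: the variant labels are invented for the example, and the MiniGPT-v2 entry uses `eval_configs/minigptv2_eval.yaml`, the file name used for that config elsewhere in this README.

```python
import shlex

# (demo script, evaluation config) per variant; the keys are example labels.
DEMOS = {
    "minigpt-v2": ("demo_v2.py", "eval_configs/minigptv2_eval.yaml"),
    "minigpt4-vicuna": ("demo.py", "eval_configs/minigpt4_eval.yaml"),
    "minigpt4-llama2": ("demo.py", "eval_configs/minigpt4_llama2_eval.yaml"),
}


def demo_command(variant, gpu_id=0):
    """Build the argument list that launches the local demo for one variant."""
    script, cfg = DEMOS[variant]
    return ["python", script, "--cfg-path", cfg, "--gpu-id", str(gpu_id)]


if __name__ == "__main__":
    for variant in DEMOS:
        print(shlex.join(demo_command(variant)))
```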
To save GPU memory, the LLM is loaded in 8-bit by default, with a beam search width of 1.
This configuration requires about 23G of GPU memory for a 13B LLM and 11.5G for a 7B LLM.
On more powerful GPUs, you can run the model
in 16-bit by setting `low_resource` to `False` in the relevant config file
(**MiniGPT-v2**: [minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#L6); **MiniGPT-4 (Llama2)**: [minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#L6); **MiniGPT-4 (Vicuna)**: [minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L6)).
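
For a feel of what switching from 8-bit to 16-bit means, the back-of-the-envelope sketch below computes the memory taken by the LLM weights alone. It assumes nominal parameter counts of 7e9 and 13e9 and counts nothing else, so it is a lower bound. The totals quoted above are higher, presumably because they also cover the vision encoder, activations, and runtime overhead (an assumption, not a measurement).

```python
def weight_gib(num_params, bits_per_param):
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    # Nominal parameter counts; real checkpoints differ slightly.
    for name, params in [("7B", 7e9), ("13B", 13e9)]:
        for bits in (8, 16):
            gib = weight_gib(params, bits)
            print(f"{name} LLM at {bits}-bit: {gib:.1f} GiB of weights")
```

Doubling the bits per weight doubles this figure, so a GPU that just fits the 8-bit configuration is unlikely to fit the 16-bit one.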
Thanks to [@WangRongsheng](https://github.com/WangRongsheng), you can also run MiniGPT-4 on [Colab](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing).
### Training
For training details of MiniGPT-4, check [here]().
The training of MiniGPT-4 contains two alignment stages.
**1. First pretraining stage**
In the first pretraining stage, the model is trained on image-text pairs from the LAION and CC datasets
to align the vision and language models. To download and prepare the datasets, please check
our [first stage dataset preparation instruction](dataset/README_1_STAGE.md).
After the first stage, the visual features are mapped into a form that the language
model can understand.
To launch the first stage training, run the following command. In our experiments, we used 4 A100 GPUs.
You can change the save path in the config file
[train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage1_pretrain.yaml).
```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
```
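
`NUM_GPU` in the command above is a placeholder for the number of GPUs on the machine. As a minimal sketch, a helper (the name `torchrun_command` is invented here) can fill it in and reject values that make no sense. It only builds the argument list and does not start training.

```python
import shlex


def torchrun_command(cfg_path, num_gpus):
    """Build the single-node torchrun invocation for train.py."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun",
        "--nproc-per-node", str(num_gpus),  # one worker process per GPU
        "train.py",
        "--cfg-path", cfg_path,
    ]


if __name__ == "__main__":
    # 4 GPUs matches the setup reported for the first stage.
    cmd = torchrun_command("train_configs/minigpt4_stage1_pretrain.yaml", 4)
    print(shlex.join(cmd))
```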
A MiniGPT-4 checkpoint with only stage one training can be downloaded
[here (13B)](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link) or [here (7B)](https://drive.google.com/file/d/1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5/view?usp=share_link).
Compared to the model after stage two, this checkpoint frequently generates incomplete and repeated sentences.
**2. Second finetuning stage**
In the second stage, we use a small, high-quality image-text pair dataset that we created ourselves
and convert it to a conversation format to further align MiniGPT-4.
To download and prepare our second stage dataset, please check our
[second stage dataset preparation instruction](dataset/README_2_STAGE.md).
To launch the second stage alignment,
first specify the path to the checkpoint file trained in stage 1 in
[train_configs/minigpt4_stage2_finetune.yaml](train_configs/minigpt4_stage2_finetune.yaml).
You can also specify the output path there.
Then, run the following command. In our experiments, we used 1 A100 GPU.
```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml
```
After the second stage alignment, MiniGPT-4 is able to talk about the image coherently and in a user-friendly way.
## Acknowledgement
+ [BLIP2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) The model architecture of MiniGPT-4 follows BLIP-2. Don't forget to check out this great open-source work if you haven't seen it before!
+ [Lavis](https://github.com/salesforce/LAVIS) This repository is built upon Lavis!
+ [Vicuna](https://github.com/lm-sys/FastChat) The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!
+ [LLaMA](https://github.com/facebookresearch/llama) The strong open-sourced LLaMA 2 language model.
If you're using MiniGPT-4 in your research or applications, please cite using this BibTeX:
```bibtex
@article{Chen2023minigpt,
  title={MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning},
  author={Chen, Jun and Zhu, Deyao and Shen, Xiaoqian and Li, Xiang and Liu, Zechu and Zhang, Pengchuan and Krishnamoorthi, Raghuraman and Chandra, Vikas and Xiong, Yunyang and Elhoseiny, Mohamed},
  journal={github},
  year={2023}
}

@article{zhu2023minigpt,
  title={MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models},
  author={Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2304.10592},
  year={2023}
}
```
## License
This repository is under the [BSD 3-Clause License](LICENSE.md).
Much of the code is based on [Lavis](https://github.com/salesforce/LAVIS), which is released under the
BSD 3-Clause License [here](LICENSE_Lavis.md).