# MiniGPT-V
<font size='5'>**MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning**</font>

Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong☨, Mohamed Elhoseiny☨

☨equal last author

<a href='https://minigpt-v2.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2310.09478.pdf'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/spaces/Vision-CAIR/MiniGPT-v2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> <a href='https://minigpt-v2.github.io'><img src='https://img.shields.io/badge/Gradio-Demo-blue'></a> [YouTube demo](https://www.youtube.com/watch?v=atFCwV2hSY4)

<font size='5'>**MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models**</font>
Deyao Zhu*, Jun Chen*, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
*equal contribution

<a href='https://minigpt-4.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2304.10592'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/spaces/Vision-CAIR/minigpt4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> <a href='https://huggingface.co/Vision-CAIR/MiniGPT-4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a> [Colab demo](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [YouTube video](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)

*King Abdullah University of Science and Technology*
## 💡 Get help - [Q&A](https://github.com/Vision-CAIR/MiniGPT-4/discussions/categories/q-a) or [Discord 💬](https://discord.gg/5WdJkjbAeE)
## News
[Oct.31 2023] We release the evaluation code of our MiniGPT-v2.

[Oct.24 2023] We release the finetuning code of our MiniGPT-v2.

[Oct.13 2023] Breaking! We release the first major update with our MiniGPT-v2.

[Aug.28 2023] We now provide a Llama 2 version of MiniGPT-4.

## Online Demo
Chat with MiniGPT-v2 about your images in the [MiniGPT-v2 online demo](https://minigpt-v2.github.io/).

Chat with MiniGPT-4 about your images in the [MiniGPT-4 online demo](https://minigpt-4.github.io).

## MiniGPT-v2 Examples
Example results of MiniGPT-v2 are shown on the [project page](https://minigpt-v2.github.io/).

## MiniGPT-4 Examples
Examples can be found on the [project page](https://minigpt-4.github.io).
## Getting Started
### Installation
**1. Prepare the code and the environment**

Clone our repository, create a Python environment, and activate it with the following commands:

```bash
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigptv
```
**2. Prepare the pretrained LLM weights**

**MiniGPT-v2** is based on Llama 2 Chat 7B. For **MiniGPT-4**, we provide both Vicuna V0 and Llama 2 versions.

Download the corresponding LLM weights from the following Hugging Face repositories by cloning them with git-lfs (see the example after the table).

| Llama 2 Chat 7B | Vicuna V0 13B | Vicuna V0 7B |
:------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:
[Download](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna-7b/tree/main)
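
For example, the Llama 2 Chat 7B weights can be pulled with git-lfs (a minimal sketch; access to the Llama 2 repository requires accepting Meta's license on Hugging Face, and the target directory is up to you):

```bash
# Install the git-lfs hooks once, then clone the weight repository
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
```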

Then, set the variable *llama_model* in the model config file to the LLM weight path (an illustrative one-line edit follows the list).

* For MiniGPT-v2, set the LLM path
[here](minigpt4/configs/models/minigpt_v2.yaml#L15) at Line 14.

* For MiniGPT-4 (Llama2), set the LLM path
[here](minigpt4/configs/models/minigpt4_llama2.yaml#L15) at Line 15.

* For MiniGPT-4 (Vicuna), set the LLM path
[here](minigpt4/configs/models/minigpt4_vicuna0.yaml#L18) at Line 18.
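
As a concrete sketch for MiniGPT-v2 (the weight path below is a placeholder; you can equally edit the YAML by hand):

```bash
# Point llama_model at your local Llama 2 Chat 7B weights (placeholder path)
sed -i 's|llama_model:.*|llama_model: "/path/to/Llama-2-7b-chat-hf"|' \
  minigpt4/configs/models/minigpt_v2.yaml
```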
**3. Prepare the pretrained model checkpoints**

Download the pretrained model checkpoints (a command-line alternative is sketched after the table).

| MiniGPT-v2 (after stage-2) | MiniGPT-v2 (after stage-3) | MiniGPT-v2 (online developing demo) |
|------------------------------|------------------------------|------------------------------|
| [Download](https://drive.google.com/file/d/1Vi_E7ZtZXRAQcyz4f8E6LtLh2UXABCmu/view?usp=sharing) | [Download](https://drive.google.com/file/d/1HkoUUrjzFGn33cSiUkI-KcT-zysCynAz/view?usp=sharing) | [Download](https://drive.google.com/file/d/1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl/view?usp=sharing) |
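
The checkpoints are hosted on Google Drive, so they can also be fetched from the command line with the third-party `gdown` tool (an assumption, not part of this repository; the file ID below comes from the stage-2 link above and the output name is illustrative):

```bash
# Download a checkpoint by its Google Drive file ID
pip install gdown
gdown 1Vi_E7ZtZXRAQcyz4f8E6LtLh2UXABCmu -O minigptv2_stage2_checkpoint.pth
```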
For **MiniGPT-v2**, set the path to the pretrained checkpoint in the evaluation config file
in [eval_configs/minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#L10) at Line 8.

| MiniGPT-4 (Vicuna 13B) | MiniGPT-4 (Vicuna 7B) | MiniGPT-4 (LLaMA-2 Chat 7B) |
|----------------------------|---------------------------|---------------------------------|
| [Download](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link) | [Download](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view?usp=sharing) | [Download](https://drive.google.com/file/d/11nAPjEok8eAGGEG1N2vXo3kBLCg0WgUk/view?usp=sharing) |

For **MiniGPT-4**, set the path to the pretrained checkpoint in the evaluation config file
in [eval_configs/minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L10) at Line 8 for the Vicuna version, or in [eval_configs/minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#L10) for the Llama 2 version.
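
For example, for the Vicuna version (a sketch assuming the checkpoint key in the evaluation config is named `ckpt`; the path is a placeholder):

```bash
# Point the evaluation config at the downloaded checkpoint (placeholder path)
sed -i 's|ckpt:.*|ckpt: "/path/to/pretrained_minigpt4.pth"|' \
  eval_configs/minigpt4_eval.yaml
```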
### Launching Demo Locally
For MiniGPT-v2, run
```
python demo_v2.py --cfg-path eval_configs/minigptv2_eval.yaml --gpu-id 0
```

For MiniGPT-4 (Vicuna version), run
```
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
```

For MiniGPT-4 (Llama2 version), run
```
python demo.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml --gpu-id 0
```

To save GPU memory, the LLM is loaded in 8-bit by default, with a beam search width of 1.
This configuration requires about 23 GB of GPU memory for the 13B LLM and 11.5 GB for the 7B LLM.

For more powerful GPUs, you can run the model
in 16-bit by setting `low_resource` to `False` in the relevant config file (see the example after this list):

* MiniGPT-v2: [minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#6)
* MiniGPT-4 (Llama2): [minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#6)
* MiniGPT-4 (Vicuna): [minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#6)
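
For instance, to switch the Vicuna version of MiniGPT-4 to 16-bit and relaunch the demo (a sketch assuming the flag appears as `low_resource: True` in that config):

```bash
# Disable 8-bit loading, then start the demo in 16-bit
sed -i 's/low_resource: True/low_resource: False/' eval_configs/minigpt4_eval.yaml
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
```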

Thanks to [@WangRongsheng](https://github.com/WangRongsheng), you can also run MiniGPT-4 on [Colab](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing).

### Training

For training details of MiniGPT-4, check [here](MiniGPT4_Train.md).

For finetuning details of MiniGPT-v2, check [here](MiniGPTv2_Train.md).

### Evaluation
For evaluation details of MiniGPT-v2, check [here](eval_scripts/EVAL_README.md).

## Acknowledgement
+ [BLIP2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) The model architecture of MiniGPT-4 follows BLIP-2. Don't forget to check out this great open-source work if you don't know it already!
+ [Lavis](https://github.com/salesforce/LAVIS) This repository is built upon Lavis!
+ [Vicuna](https://github.com/lm-sys/FastChat) The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!
+ [LLaMA](https://github.com/facebookresearch/llama) The strong open-source LLaMA 2 language model.

If you're using MiniGPT-4/MiniGPT-v2 in your research or applications, please cite using this BibTeX:
```bibtex
@article{chen2023minigptv2,
  title={MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning},
  author={Chen, Jun and Zhu, Deyao and Shen, Xiaoqian and Li, Xiang and Liu, Zechun and Zhang, Pengchuan and Krishnamoorthi, Raghuraman and Chandra, Vikas and Xiong, Yunyang and Elhoseiny, Mohamed},
  year={2023},
  journal={arXiv preprint arXiv:2310.09478},
}

@article{zhu2023minigpt,
  title={MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models},
  author={Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2304.10592},
  year={2023}
}
```
## License
This repository is under the [BSD 3-Clause License](LICENSE.md).
Much of the code is based on [Lavis](https://github.com/salesforce/LAVIS), which is licensed under the BSD 3-Clause License [here](LICENSE_Lavis.md).