# MiniGPT-4 and MiniGPT-v2
**King Abdullah University of Science and Technology**

<a href='https://minigpt-4.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2304.10592'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/spaces/Vision-CAIR/minigpt4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> <a href='https://huggingface.co/Vision-CAIR/MiniGPT-4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a> [Colab Demo](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [YouTube Video](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)
## 💡 Get help - [Q&A](https://github.com/Vision-CAIR/MiniGPT-4/discussions/categories/q-a) or [Discord 💬](https://discord.gg/5WdJkjbAeE)
## News
[Oct.13 2023] Breaking! We release the first major update with our MiniGPT-v2.

[Aug.28 2023] We now provide a Llama 2 version of MiniGPT-4.
## Online Demo
Chat with MiniGPT-v2 about your images in the [MiniGPT-v2 online demo](https://minigpt-v2.github.io/).

Chat with MiniGPT-4 about your images in the [MiniGPT-4 online demo](https://minigpt-4.github.io).
## MiniGPT-v2 Examples
## MiniGPT-4 Examples
More examples can be found on the [project page](https://minigpt-4.github.io).
## Getting Started
### Installation
**1. Prepare the code and the environment**

Git clone our repository, then create a Python environment and activate it with the following commands:
```bash
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigpt4
```
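As an optional sanity check (assuming `environment.yml` installs PyTorch, which the demos require), you can confirm that the new environment sees your GPU:

```bash
# Should print the PyTorch version and True on a machine with a usable CUDA GPU.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```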
**2. Prepare the pretrained LLM weights**

**MiniGPT-v2** is based on Llama 2 Chat 7B. For **MiniGPT-4**, we provide both a Vicuna V0 and a Llama 2 version.
Download the corresponding LLM weights from one of the following Hugging Face repositories by cloning it with git-lfs.

| Llama 2 Chat 7B | Vicuna V0 13B | Vicuna V0 7B |
|:---------------:|:-------------:|:------------:|
| [Download](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna/tree/main) | [Download](https://huggingface.co/Vision-CAIR/vicuna-7b/tree/main) |
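For example, a minimal sketch of fetching the Llama 2 Chat 7B weights with git-lfs (the Llama 2 repository is gated, so you need to accept the license on Hugging Face and be logged in; the Vicuna repositories can be cloned the same way):

```bash
# Make sure the git-lfs hooks are installed so the large weight files are actually downloaded.
git lfs install

# Clone the weight repository; swap in one of the Vicuna URLs above if needed.
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
```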
Then, set the variable *llama_model* in the model config file to the LLM weight path.
* For MiniGPT-v2, set the LLM path [here](minigpt4/configs/models/minigpt_v2.yaml#L15) at Line 14.
* For MiniGPT-4 (Llama2), set the LLM path [here](minigpt4/configs/models/minigpt4_llama2.yaml#L15) at Line 15.
* For MiniGPT-4 (Vicuna), set the LLM path [here](minigpt4/configs/models/minigpt4_vicuna0.yaml#L18) at Line 18.
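For example, a minimal sketch for MiniGPT-v2 (GNU sed shown; editing the YAML by hand works just as well, and the path below is a placeholder):

```bash
# Placeholder: replace with the directory where you downloaded the Llama 2 Chat 7B weights.
LLAMA_DIR=/path/to/Llama-2-7b-chat-hf

# Point the llama_model entry of the MiniGPT-v2 model config at that directory.
# The same pattern applies to the two MiniGPT-4 configs listed above.
sed -i "s|^\([[:space:]]*llama_model:\).*|\1 \"${LLAMA_DIR}\"|" minigpt4/configs/models/minigpt_v2.yaml
```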
**3. Prepare the pretrained model checkpoints**

Download the pretrained model checkpoints:

| MiniGPT-v2 (LLaMA-2 Chat 7B) |
|------------------------------|
| [Download](https://drive.google.com/file/d/1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl/view?usp=sharing) |
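The checkpoints are hosted on Google Drive, so one option is to fetch them from the command line with the `gdown` tool; a sketch for the MiniGPT-v2 checkpoint (the output filename is arbitrary, as long as the config below points at it):

```bash
# gdown is a small helper for downloading Google Drive files.
pip install gdown

# The file id comes from the MiniGPT-v2 link above; save it under any name you like.
gdown "https://drive.google.com/uc?id=1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl" -O minigptv2_checkpoint.pth
```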
For **MiniGPT-v2**, set the path to the pretrained checkpoint in the evaluation config file [eval_configs/minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#L10) at Line 8.

| MiniGPT-4 (Vicuna 13B) | MiniGPT-4 (Vicuna 7B) | MiniGPT-4 (LLaMA-2 Chat 7B) |
|----------------------------|---------------------------|---------------------------------|
| [Download](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link) | [Download](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view?usp=sharing) | [Download](https://drive.google.com/file/d/11nAPjEok8eAGGEG1N2vXo3kBLCg0WgUk/view?usp=sharing) |
For **MiniGPT-4**, set the path to the pretrained checkpoint in the evaluation config file [eval_configs/minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L10) at Line 8 for the Vicuna version, or in [eval_configs/minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#L10) for the Llama 2 version.
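As a sketch, the checkpoint path can be set the same way as the LLM path above. This assumes the evaluation configs store it under a `ckpt:` key; verify the key name in your copy of the file, and editing by hand is equally fine.

```bash
# Placeholder: replace with the path of the checkpoint you downloaded above.
CKPT=/path/to/pretrained_checkpoint.pth

# Rewrite the assumed ckpt entry in the Vicuna eval config; use the matching
# eval config file for the Llama 2 or MiniGPT-v2 checkpoints instead.
sed -i "s|^\([[:space:]]*ckpt:\).*|\1 \"${CKPT}\"|" eval_configs/minigpt4_eval.yaml
```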
### Launching Demo Locally
For MiniGPT-v2, run
```
python demo_v2.py --cfg-path eval_configs/minigptv2_eval.yaml --gpu-id 0
```
For MiniGPT-4 (Vicuna version), run
```
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
```
For MiniGPT-4 (Llama2 version), run
```
python demo.py --cfg-path eval_configs/minigpt4_llama2_eval.yaml --gpu-id 0
```
To save GPU memory, the LLM loads in 8 bit by default, with a beam search width of 1.
This configuration requires about 23G of GPU memory for the 13B LLM and 11.5G for the 7B LLM.
For more powerful GPUs, you can run the model in 16 bit by setting `low_resource` to `False` in the relevant config file:
* MiniGPT-v2: [minigptv2_eval.yaml](eval_configs/minigptv2_eval.yaml#L6)
* MiniGPT-4 (Llama2): [minigpt4_llama2_eval.yaml](eval_configs/minigpt4_llama2_eval.yaml#L6)
* MiniGPT-4 (Vicuna): [minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L6)
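For example, a minimal sketch of switching the MiniGPT-v2 eval config to 16 bit (again GNU sed; editing the file directly is equivalent, and the same one-liner works on the two MiniGPT-4 eval configs):

```bash
# Turn off the 8-bit low-resource mode; the model then loads in 16 bit and uses more GPU memory.
sed -i "s|^\([[:space:]]*low_resource:\).*|\1 False|" eval_configs/minigptv2_eval.yaml
```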
Thanks to [@WangRongsheng](https://github.com/WangRongsheng), you can also run MiniGPT-4 on [Colab](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing).
### Training
For training details of MiniGPT-4, check [here](MiniGPT4_Train.md).
## Acknowledgement
+ [BLIP2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) The model architecture of MiniGPT-4 follows BLIP-2. Don't forget to check out this great open-source work if you haven't seen it before!
+ [Lavis](https://github.com/salesforce/LAVIS) This repository is built upon Lavis!
+ [Vicuna](https://github.com/lm-sys/FastChat) The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!
+ [LLaMA](https://github.com/facebookresearch/llama) The strong open-source LLaMA 2 language model.

If you're using MiniGPT-4/MiniGPT-v2 in your research or applications, please cite using this BibTeX:
```bibtex
@article{Chen2023minigpt,
  title={MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning},
  author={Chen, Jun and Zhu, Deyao and Shen, Xiaoqian and Li, Xiang and Liu, Zechu and Zhang, Pengchuan and Krishnamoorthi, Raghuraman and Chandra, Vikas and Xiong, Yunyang and Elhoseiny, Mohamed},
  journal={github},
  year={2023}
}

@article{zhu2023minigpt,
  title={MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models},
  author={Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed},
  journal={arXiv preprint arXiv:2304.10592},
  year={2023}
}
```
## License
This repository is under [BSD 3-Clause License](LICENSE.md).
Much of the code is based on [Lavis](https://github.com/salesforce/LAVIS), which carries its own BSD 3-Clause License [here](LICENSE_Lavis.md).