Merge pull request #60 from N4RMA/typos

corrected typos in README.md
Mohamed F. Ahmed 2023-08-15 07:47:13 -07:00 committed by GitHub
commit 81d8278bd1
GPG Key ID: 4AEE18F83AFDEB23

@@ -28,7 +28,7 @@ More examples can be found in the [project page](https://minigpt-4.github.io).
 ## Introduction
 - MiniGPT-4 aligns a frozen visual encoder from BLIP-2 with a frozen LLM, Vicuna, using just one projection layer.
-- We train MiniGPT-4 with two stages. The first traditional pretraining stage is trained using roughly 5 million aligned image-text pairs in 10 hours using 4 A100s. After the first stage, Vicuna is able to understand the image. But the generation ability of Vicuna is heavilly impacted.
+- We train MiniGPT-4 with two stages. The first traditional pretraining stage is trained using roughly 5 million aligned image-text pairs in 10 hours using 4 A100s. After the first stage, Vicuna is able to understand the image. But the generation ability of Vicuna is heavily impacted.
 - To address this issue and improve usability, we propose a novel way to create high-quality image-text pairs by the model itself and ChatGPT together. Based on this, we then create a small (3500 pairs in total) yet high-quality dataset.
 - The second finetuning stage is trained on this dataset in a conversation template to significantly improve its generation reliability and overall usability. To our surprise, this stage is computationally efficient and takes only around 7 minutes with a single A100.
 - MiniGPT-4 yields many emerging vision-language capabilities similar to those demonstrated in GPT-4.
@@ -42,7 +42,7 @@ More examples can be found in the [project page](https://minigpt-4.github.io).
 **1. Prepare the code and the environment**
-Git clone our repository, creating a python environment and ativate it via the following command
+Git clone our repository, creating a python environment and activate it via the following command
 ```bash
 git clone https://github.com/Vision-CAIR/MiniGPT-4.git