From 051fe3981cd1568fdda44d761d82b1e79213006f Mon Sep 17 00:00:00 2001
From: N4RMA
Date: Wed, 19 Apr 2023 13:28:28 -0400
Subject: [PATCH 01/12] corrected typos

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 063af4f..600457a 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ More examples can be found in the [project page](https://minigpt-4.github.io).
 
 ## Introduction
 - MiniGPT-4 aligns a frozen visual encoder from BLIP-2 with a frozen LLM, Vicuna, using just one projection layer.
-- We train MiniGPT-4 with two stages. The first traditional pretraining stage is trained using roughly 5 million aligned image-text pairs in 10 hours using 4 A100s. After the first stage, Vicuna is able to understand the image. But the generation ability of Vicuna is heavilly impacted.
+- We train MiniGPT-4 with two stages. The first traditional pretraining stage is trained using roughly 5 million aligned image-text pairs in 10 hours using 4 A100s. After the first stage, Vicuna is able to understand the image. But the generation ability of Vicuna is heavily impacted.
 - To address this issue and improve usability, we propose a novel way to create high-quality image-text pairs by the model itself and ChatGPT together. Based on this, we then create a small (3500 pairs in total) yet high-quality dataset.
 - The second finetuning stage is trained on this dataset in a conversation template to significantly improve its generation reliability and overall usability. To our surprise, this stage is computationally efficient and takes only around 7 minutes with a single A100.
 - MiniGPT-4 yields many emerging vision-language capabilities similar to those demonstrated in GPT-4.
@@ -38,7 +38,7 @@ More examples can be found in the [project page](https://minigpt-4.github.io).
 
 **1. Prepare the code and the environment**
 
-Git clone our repository, creating a python environment and ativate it via the following command
+Git clone our repository, creating a python environment and activate it via the following command
 
 ```bash
 git clone https://github.com/Vision-CAIR/MiniGPT-4.git
@@ -50,7 +50,7 @@ conda activate minigpt4
 
 **2. Prepare the pretrained Vicuna weights**
 
-The current version of MiniGPT-4 is built on the v0 versoin of Vicuna-13B.
+The current version of MiniGPT-4 is built on the v0 version of Vicuna-13B.
 Please refer to our instruction [here](PrepareVicuna.md) to prepare the Vicuna weights.
 The final weights would be in a single folder with the following structure:
 

From 3bd99950f0ebcbbc7ee7b54aa33f332feeccef09 Mon Sep 17 00:00:00 2001
From: Jun Chen
Date: Sun, 23 Apr 2023 15:49:37 +0300
Subject: [PATCH 02/12] Update runner_base.py

---
 minigpt4/runners/runner_base.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/minigpt4/runners/runner_base.py b/minigpt4/runners/runner_base.py
index d794ed0..ccb5706 100644
--- a/minigpt4/runners/runner_base.py
+++ b/minigpt4/runners/runner_base.py
@@ -627,14 +627,14 @@ class RunnerBase:
             cached_file = download_cached_file(
                 url_or_filename, check_hash=False, progress=True
             )
-            checkpoint = torch.load(cached_file, map_location=self.device, strict=False)
+            checkpoint = torch.load(cached_file, map_location=self.device)
         elif os.path.isfile(url_or_filename):
-            checkpoint = torch.load(url_or_filename, map_location=self.device, strict=False)
+            checkpoint = torch.load(url_or_filename, map_location=self.device)
         else:
             raise RuntimeError("checkpoint url or path is invalid")
 
         state_dict = checkpoint["model"]
-        self.unwrap_dist_model(self.model).load_state_dict(state_dict)
+        self.unwrap_dist_model(self.model).load_state_dict(state_dict,strict=False)
 
         self.optimizer.load_state_dict(checkpoint["optimizer"])
         if self.scaler and "scaler" in checkpoint:

From d596970812f1aeb53ccb8bf28090c96c4d320a23 Mon Sep 17 00:00:00 2001
From: XiaoqianShen <64844805+xiaoqian-shen@users.noreply.github.com>
Date: Mon, 24 Apr 2023 14:47:49 +0300
Subject: [PATCH 03/12] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3950690..184ce8c 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
 
 **King Abdullah University of Science and Technology**
 
- [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)
+ [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)
 
 
 ## News

From d9dc75dfc423f8d9955572e69491f0d6840c8c1e Mon Sep 17 00:00:00 2001
From: XiaoqianShen <64844805+xiaoqian-shen@users.noreply.github.com>
Date: Mon, 24 Apr 2023 14:48:21 +0300
Subject: [PATCH 04/12] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 184ce8c..6c66fad 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
 
 **King Abdullah University of Science and Technology**
 
- [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)
+ [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be)
 
 
 ## News

From 31b3b9078afa5b357f972c6b256eb65bc46ba90f Mon Sep 17 00:00:00 2001
From: XiaoqianShen <64844805+xiaoqian-shen@users.noreply.github.com>
Date: Mon, 24 Apr 2023 15:35:29 +0300
Subject: [PATCH 05/12] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 6c66fad..4452e9f 100644
--- a/README.md
+++ b/README.md
@@ -158,6 +158,7 @@ If you're using MiniGPT-4 in your research or applications, please cite using th
 @misc{zhu2022minigpt4,
   title={MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models},
   author={Deyao Zhu and Jun Chen and Xiaoqian Shen and xiang Li and Mohamed Elhoseiny},
+  journal={arXiv preprint arXiv:2304.10592},
   year={2023},
 }
 ```

From a0b7c126c060700e97a5b518e210e89ca836b929 Mon Sep 17 00:00:00 2001
From: XiaoqianShen <64844805+xiaoqian-shen@users.noreply.github.com>
Date: Mon, 24 Apr 2023 15:35:56 +0300
Subject: [PATCH 06/12] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4452e9f..519c217 100644
--- a/README.md
+++ b/README.md
@@ -157,7 +157,7 @@ If you're using MiniGPT-4 in your research or applications, please cite using th
 ```bibtex
 @misc{zhu2022minigpt4,
   title={MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models},
-  author={Deyao Zhu and Jun Chen and Xiaoqian Shen and xiang Li and Mohamed Elhoseiny},
+  author={Deyao Zhu and Jun Chen and Xiaoqian Shen and Xiang Li and Mohamed Elhoseiny},
   journal={arXiv preprint arXiv:2304.10592},
   year={2023},
 }

From c6acff2d8d7ea6c89cfe79b35a85b2ceea5ccfb2 Mon Sep 17 00:00:00 2001
From: ZhuDeyao
Date: Thu, 27 Apr 2023 13:25:54 +0300
Subject: [PATCH 07/12] Update README.md add stage 1 7b link

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 519c217..16de690 100644
--- a/README.md
+++ b/README.md
@@ -121,7 +121,7 @@ torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_sta
 ```
 
 A MiniGPT-4 checkpoint with only stage one training can be downloaded
-[here](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link).
+[here (13B)](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link) or [here (7B)](https://drive.google.com/file/d/1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5/view?usp=share_link).
 Compared to the model after stage two, this checkpoint generate incomplete and repeated sentences frequently.
 
 

From 22d8888ca2cf0aac862f537e7d22ef5830036808 Mon Sep 17 00:00:00 2001
From: XiaoqianShen <64844805+xiaoqian-shen@users.noreply.github.com>
Date: Mon, 1 May 2023 14:02:12 +0300
Subject: [PATCH 08/12] Update README.md

---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 16de690..7aa29f2 100644
--- a/README.md
+++ b/README.md
@@ -155,11 +155,11 @@ After the second stage alignment, MiniGPT-4 is able to talk about the image cohe
 
 If you're using MiniGPT-4 in your research or applications, please cite using this BibTeX:
 ```bibtex
-@misc{zhu2022minigpt4,
-  title={MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models},
-  author={Deyao Zhu and Jun Chen and Xiaoqian Shen and Xiang Li and Mohamed Elhoseiny},
-  journal={arXiv preprint arXiv:2304.10592},
-  year={2023},
+@article{zhu2023minigpt,
+  title={MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models},
+  author={Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed},
+  journal={arXiv preprint arXiv:2304.10592},
+  year={2023}
 }
 ```
 

From 7a0eec4a3810c3f6b58bd09274018612419f99e5 Mon Sep 17 00:00:00 2001
From: Amal Chandran
Date: Fri, 28 Jul 2023 08:38:36 +0530
Subject: [PATCH 09/12] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7aa29f2..a5d9084 100644
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ conda activate minigpt4
 
 **2. Prepare the pretrained Vicuna weights**
 
-The current version of MiniGPT-4 is built on the v0 versoin of Vicuna-13B.
+The current version of MiniGPT-4 is built on the v0 version of Vicuna-13B.
 Please refer to our instruction [here](PrepareVicuna.md) to prepare the Vicuna weights.
 The final weights would be in a single folder in a structure similar to the following:
 

From dbe715bd2a49c50e6641dd5592503389697da28d Mon Sep 17 00:00:00 2001
From: "Mohamed F. Ahmed"
Date: Tue, 15 Aug 2023 07:39:49 -0700
Subject: [PATCH 10/12] Create CODE_OF_CONDUCT.md

---
 CODE_OF_CONDUCT.md | 128 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100644 CODE_OF_CONDUCT.md

diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
new file mode 100644
index 0000000..1ee61a1
--- /dev/null
+++ b/CODE_OF_CONDUCT.md
@@ -0,0 +1,128 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, religion, or sexual identity
+and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the
+  overall community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or
+  advances of any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email
+  address, without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at
+https://discord.gg/2aNvvYVv.
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series
+of actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or
+permanent ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within
+the community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.0, available at
+https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
+
+Community Impact Guidelines were inspired by [Mozilla's code of conduct
+enforcement ladder](https://github.com/mozilla/diversity).
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see the FAQ at
+https://www.contributor-covenant.org/faq. Translations are available at
+https://www.contributor-covenant.org/translations.

From faac01de6b027479f1d64244494e353c79322116 Mon Sep 17 00:00:00 2001
From: "Mohamed F. Ahmed"
Date: Tue, 15 Aug 2023 07:40:45 -0700
Subject: [PATCH 11/12] Create SECURITY.md

---
 SECURITY.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 SECURITY.md

diff --git a/SECURITY.md b/SECURITY.md
new file mode 100644
index 0000000..034e848
--- /dev/null
+++ b/SECURITY.md
@@ -0,0 +1,21 @@
+# Security Policy
+
+## Supported Versions
+
+Use this section to tell people about which versions of your project are
+currently being supported with security updates.
+
+| Version | Supported          |
+| ------- | ------------------ |
+| 5.1.x   | :white_check_mark: |
+| 5.0.x   | :x:                |
+| 4.0.x   | :white_check_mark: |
+| < 4.0   | :x:                |
+
+## Reporting a Vulnerability
+
+Use this section to tell people how to report a vulnerability.
+
+Tell them where to go, how often they can expect to get an update on a
+reported vulnerability, what to expect if the vulnerability is accepted or
+declined, etc.
From 6e6fa5fc4b445bf395dea6faea031c4ea15c7433 Mon Sep 17 00:00:00 2001
From: "Mohamed F. Ahmed"
Date: Tue, 15 Aug 2023 07:41:32 -0700
Subject: [PATCH 12/12] Update issue templates

---
 .github/ISSUE_TEMPLATE/bug_report.md      | 38 +++++++++++++++++++++++
 .github/ISSUE_TEMPLATE/feature_request.md | 20 ++++++++++++
 2 files changed, 58 insertions(+)
 create mode 100644 .github/ISSUE_TEMPLATE/bug_report.md
 create mode 100644 .github/ISSUE_TEMPLATE/feature_request.md

diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
new file mode 100644
index 0000000..dd84ea7
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,38 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior:
+1. Go to '...'
+2. Click on '....'
+3. Scroll down to '....'
+4. See error
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Screenshots**
+If applicable, add screenshots to help explain your problem.
+
+**Desktop (please complete the following information):**
+ - OS: [e.g. iOS]
+ - Browser [e.g. chrome, safari]
+ - Version [e.g. 22]
+
+**Smartphone (please complete the following information):**
+ - Device: [e.g. iPhone6]
+ - OS: [e.g. iOS8.1]
+ - Browser [e.g. stock browser, safari]
+ - Version [e.g. 22]
+
+**Additional context**
+Add any other context about the problem here.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
new file mode 100644
index 0000000..bbcbbe7
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context or screenshots about the feature request here.