add minigptv2_train md

junchen14 2023-10-25 08:30:47 +03:00
parent eb560920e0
commit 7073c19bb3
4 changed files with 15 additions and 50 deletions

.gitignore

@@ -180,4 +180,5 @@ jobs/
slurm*
sbatch_generate*
eval_data/
dataset/Evaluation.md
jupyter_notebook.slurm

MiniGPTv2_Train.md

@@ -1,22 +1,21 @@
## Finetune of MiniGPT-v2
You first need to prepare the dataset. To do so, follow
our [dataset preparation guide](dataset/README_MINIGPTv2_FINETUNE.md).
In train_configs/minigptv2_finetune.yaml, you need to set the following paths:
- llama_model checkpoint path: "/path/to/llama_checkpoint"
- ckpt (the pretrained checkpoint to finetune from): "/path/to/pretrained_checkpoint"
- ckpt save path: "/path/to/save_checkpoint"
For ckpt, you may load from our pretrained model checkpoints:
| MiniGPT-v2 (after stage-2) | MiniGPT-v2 (after stage-3) | MiniGPT-v2 (developing model, online demo) |
|------------------------------|------------------------------|------------------------------|
| [Download](https://drive.google.com/file/d/1Vi_E7ZtZXRAQcyz4f8E6LtLh2UXABCmu/view?usp=sharing) |[Download](https://drive.google.com/file/d/1jAbxUiyl04SFJMN4sF1vvUU69Etuz4qa/view?usp=sharing) | [Download](https://drive.google.com/file/d/1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl/view?usp=sharing) |
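Putting the three paths together, the relevant entries of train_configs/minigptv2_finetune.yaml would look roughly like the sketch below. The exact key names and nesting are assumptions inferred from the paths listed above; check them against the config file shipped with the repo.

```yaml
model:
  arch: minigpt_v2
  # LLaMA checkpoint path (first path above; assumed key name)
  llama_model: "/path/to/llama_checkpoint"
  # pretrained MiniGPT-v2 checkpoint to finetune from, e.g. the
  # stage-2 download from the table above (assumed key name)
  ckpt: "/path/to/pretrained_checkpoint"

run:
  # where finetuned checkpoints are saved (third path above; assumed key name)
  output_dir: "/path/to/save_checkpoint"
```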
The training of MiniGPT-4 contains two alignment stages.
**1. First pretraining stage**
In the first pretraining stage, the model is trained on image-text pairs from the Laion and CC datasets
to align the vision and language models. To download and prepare these datasets, please check
our [first stage dataset preparation instructions](dataset/README_1_STAGE.md).
After the first stage, the visual features are mapped into a space the language model can understand.
To launch the first stage training (or the MiniGPT-v2 finetuning), run the corresponding command below. In our experiments, we use 4 A100 GPUs.
You can change the save path in the config file
[train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage1_pretrain.yaml).
```bash
# first-stage pretraining of MiniGPT-4
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
# finetuning of MiniGPT-v2
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigptv2_finetune.yaml
```
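As a concrete example, on a single node with the 4 A100 GPUs mentioned above, NUM_GPU becomes 4 (shown here for the finetuning config; the pretraining config works the same way):

```bash
torchrun --nproc-per-node 4 train.py --cfg-path train_configs/minigptv2_finetune.yaml
```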
A MiniGPT-4 checkpoint trained with only stage one can be downloaded
[here (13B)](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link) or [here (7B)](https://drive.google.com/file/d/1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5/view?usp=share_link).
Compared to the model after stage two, this checkpoint frequently generates incomplete and repeated sentences.

jupyter_notebook.slurm

@ -1,35 +0,0 @@
#!/bin/bash -l
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=6
#SBATCH --gres=gpu:1
#SBATCH --reservation=A100
#SBATCH --mem=32GB
#SBATCH --time=4:00:00
#SBATCH --partition=batch
##SBATCH --account=conf-iclr-2023.09.29-elhosemh
# Load environment which has Jupyter installed. It can be one of the following:
# - Machine Learning module installed on the system (module load machine_learning)
# - your own conda environment on Ibex
# - a singularity container with python environment (conda or otherwise)
module load machine_learning
# get tunneling info
export XDG_RUNTIME_DIR=""
node=$(hostname -s)
user=$(whoami)
submit_host=${SLURM_SUBMIT_HOST}
port=10035
echo $node pinned to port $port
# print tunneling instructions
echo -e "
To connect to the compute node ${node} on IBEX running your jupyter notebook server, you need to run the following two commands in a terminal:
1. Command to create an ssh tunnel from your workstation/laptop to glogin:
ssh -L ${port}:${node}:${port} ${user}@glogin.ibex.kaust.edu.sa
2. Copy the link provided below by the jupyter server and replace NODENAME with localhost before pasting it into the browser on your workstation/laptop.
"
# Run Jupyter
jupyter notebook --no-browser --port=${port} --port-retries=50 --ip=${node}
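
As a concrete example of the tunneling instructions this script prints, with a hypothetical compute node gpu214-14 and user jdoe (use the values your job actually reports), the two commands on your workstation would be:

```bash
# 1. create the ssh tunnel from your laptop to the glogin node,
#    forwarding the port the script chose (10035)
ssh -L 10035:gpu214-14:10035 jdoe@glogin.ibex.kaust.edu.sa

# 2. open the URL printed by jupyter, replacing the node name with
#    localhost, e.g. http://localhost:10035/?token=...
```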