diff --git a/.gitignore b/.gitignore
index 1dc019b..d143271 100755
--- a/.gitignore
+++ b/.gitignore
@@ -180,4 +180,5 @@ jobs/
 slurm*
 sbatch_generate*
 eval_data/
-dataset/Evaluation.md
\ No newline at end of file
+dataset/Evaluation.md
+jupyter_notebook.slurm
\ No newline at end of file
diff --git a/MiniGPTv2_Train .md b/MiniGPTv2_Train .md
index bd62ef2..db30220 100644
--- a/MiniGPTv2_Train .md
+++ b/MiniGPTv2_Train .md
@@ -1,22 +1,21 @@
 ## Finetune of MiniGPT-4
 
-The training of MiniGPT-4 contains two alignment stages.
-**1. First pretraining stage**
+First, you need to prepare the dataset. To do so, please follow
+our [dataset preparation instructions](dataset/README_MINIGPTv2_FINETUNE.md).
+
+In train_configs/minigptv2_finetune.yaml, you need to set up the following paths:
+llama_model checkpoint path: "/path/to/llama_checkpoint"
+ckpt: "/path/to/pretrained_checkpoint"
+ckpt save path: "/path/to/save_checkpoint"
+
+For ckpt, you may load one of our pretrained model checkpoints:
+| MiniGPT-v2 (after stage-2) | MiniGPT-v2 (after stage-3) | MiniGPT-v2 (developing model, online demo) |
+|------------------------------|------------------------------|------------------------------|
+| [Download](https://drive.google.com/file/d/1Vi_E7ZtZXRAQcyz4f8E6LtLh2UXABCmu/view?usp=sharing) | [Download](https://drive.google.com/file/d/1jAbxUiyl04SFJMN4sF1vvUU69Etuz4qa/view?usp=sharing) | [Download](https://drive.google.com/file/d/1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl/view?usp=sharing) |
-In the first pretrained stage, the model is trained using image-text pairs from Laion and CC datasets
-to align the vision and language model. To download and prepare the datasets, please check
-our [first stage dataset preparation instruction](dataset/README_1_STAGE.md).
-After the first stage, the visual features are mapped and can be understood by the language
-model.
-To launch the first stage training, run the following command. In our experiments, we use 4 A100.
-You can change the save path in the config file
-[train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage1_pretrain.yaml)
 ```bash
-torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
+torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigptv2_finetune.yaml
 ```
-A MiniGPT-4 checkpoint with only stage one training can be downloaded
-[here (13B)](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link) or [here (7B)](https://drive.google.com/file/d/1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5/view?usp=share_link).
-Compared to the model after stage two, this checkpoint generate incomplete and repeated sentences frequently.
diff --git a/jupyter_notebook.slurm b/jupyter_notebook.slurm
deleted file mode 100755
index 87277dd..0000000
--- a/jupyter_notebook.slurm
+++ /dev/null
@@ -1,35 +0,0 @@
-#!/bin/bash -l
-#SBATCH --ntasks=1
-#SBATCH --cpus-per-task=6
-#SBATCH --gres=gpu:1
-#SBATCH --reservation=A100
-#SBATCH --mem=32GB
-#SBATCH --time=4:00:00
-#SBATCH --partition=batch
-##SBATCH --account=conf-iclr-2023.09.29-elhosemh
-
-# Load environment which has Jupyter installed. It can be one of the following:
-# - Machine Learning module installed on the system (module load machine_learning)
-# - your own conda environment on Ibex
-# - a singularity container with python environment (conda or otherwise)
-
-module load machine_learning
-
-# get tunneling info
-export XDG_RUNTIME_DIR="" node=$(hostname -s)
-user=$(whoami)
-submit_host=${SLURM_SUBMIT_HOST}
-port=10035
-echo $node pinned to port $port
-# print tunneling instructions
-
-echo -e "
-To connect to the compute node ${node} on IBEX running your jupyter notebook server, you need to run following two commands in a terminal 1.
-Command to create ssh tunnel from you workstation/laptop to glogin:
-
-ssh -L ${port}:${node}:${port} ${user}@glogin.ibex.kaust.edu.sa
-
-Copy the link provided below by jupyter-server and replace the NODENAME with localhost before pasting it in your browser on your workstation/laptop "
-
-# Run Jupyter
-jupyter notebook --no-browser --port=${port} --port-retries=50 --ip=${node}
diff --git a/train_configs/minigpt_v2_finetune.yaml b/train_configs/minigptv2_finetune.yaml
similarity index 100%
rename from train_configs/minigpt_v2_finetune.yaml
rename to train_configs/minigptv2_finetune.yaml
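Note: the three paths that the updated MiniGPTv2_Train .md asks users to set live in the renamed train_configs/minigptv2_finetune.yaml. The following is a minimal sketch of how those fields might be laid out; the exact key names, nesting, and the `arch` value are assumptions and should be checked against the actual shipped config, not copied verbatim.

```yaml
# Sketch of the path settings referenced by the updated MiniGPTv2_Train .md.
# Key names and nesting are assumptions; verify against train_configs/minigptv2_finetune.yaml.
model:
  arch: minigpt_v2                            # assumed architecture tag
  llama_model: "/path/to/llama_checkpoint"    # local LLaMA weights (the language backbone)
  ckpt: "/path/to/pretrained_checkpoint"      # stage-2 or stage-3 checkpoint from the table above

run:
  output_dir: "/path/to/save_checkpoint"      # where finetuned checkpoints are saved
```

Once these are filled in, the torchrun command shown in the diff picks the file up through its --cfg-path argument.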