Mirror of https://github.com/Vision-CAIR/MiniGPT-4.git, synced 2025-04-05 02:20:47 +00:00
Commit 7073c19bb3 (parent eb560920e0): add minigptv2_train md
.gitignore (vendored)

@@ -180,4 +180,5 @@ jobs/
 slurm*
 sbatch_generate*
 eval_data/
 dataset/Evaluation.md
+jupyter_notebook.slurm
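As a quick local sanity check (not part of the commit itself), standard git can confirm that the new pattern takes effect; a minimal sketch, assuming it is run from the repository root:

```bash
# Ask git which .gitignore rule (if any) would ignore the file;
# with -v it prints the source file, line number, and matching pattern.
git check-ignore -v jupyter_notebook.slurm
```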
@@ -1,22 +1,21 @@
 ## Finetune of MiniGPT-4

-The training of MiniGPT-4 contains two alignment stages.

-**1. First pretraining stage**
+You first need to prepare the dataset. To do so, please follow
+our [dataset preparation](dataset/README_MINIGPTv2_FINETUNE.md).

+In train_configs/minigptv2_finetune.yaml, you need to set up the following paths:
+llama_model checkpoint path: "/path/to/llama_checkpoint"
+ckpt: "/path/to/pretrained_checkpoint"
+ckpt save path: "/path/to/save_checkpoint"

+For ckpt, you may load from our pretrained model checkpoints:
+| MiniGPT-v2 (after stage-2) | MiniGPT-v2 (after stage-3) | MiniGPT-v2 (developing model, online demo) |
+|------------------------------|------------------------------|------------------------------|
+| [Download](https://drive.google.com/file/d/1Vi_E7ZtZXRAQcyz4f8E6LtLh2UXABCmu/view?usp=sharing) | [Download](https://drive.google.com/file/d/1jAbxUiyl04SFJMN4sF1vvUU69Etuz4qa/view?usp=sharing) | [Download](https://drive.google.com/file/d/1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl/view?usp=sharing) |

-In the first pretrained stage, the model is trained using image-text pairs from Laion and CC datasets
-to align the vision and language model. To download and prepare the datasets, please check
-our [first stage dataset preparation instruction](dataset/README_1_STAGE.md).
-After the first stage, the visual features are mapped and can be understood by the language
-model.
-To launch the first stage training, run the following command. In our experiments, we use 4 A100.
-You can change the save path in the config file
-[train_configs/minigpt4_stage1_pretrain.yaml](train_configs/minigpt4_stage1_pretrain.yaml)

 ```bash
-torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigpt4_stage1_pretrain.yaml
+torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/minigptv2_finetune.yaml
 ```

-A MiniGPT-4 checkpoint with only stage one training can be downloaded
-[here (13B)](https://drive.google.com/file/d/1u9FRRBB3VovP1HxCAlpD9Lw4t4P6-Yq8/view?usp=share_link) or [here (7B)](https://drive.google.com/file/d/1HihQtCEXUyBM1i9DQbaK934wW3TZi-h5/view?usp=share_link).
-Compared to the model after stage two, this checkpoint generate incomplete and repeated sentences frequently.
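Taken together, the new instructions amount to editing three paths in the config file and launching with torchrun. A minimal launch sketch follows; the 4-GPU count echoes the 4×A100 setup mentioned above, and the paths are the placeholders from the config, not verified values:

```bash
# Sketch: finetune MiniGPT-v2 once train_configs/minigptv2_finetune.yaml
# points at the LLaMA checkpoint, the pretrained checkpoint (ckpt), and
# the checkpoint save directory described above.
NUM_GPU=4  # e.g. 4 A100s, as in the original experiments; adjust to your machine

torchrun --nproc-per-node "${NUM_GPU}" train.py \
  --cfg-path train_configs/minigptv2_finetune.yaml
```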
@@ -1,35 +0,0 @@
-#!/bin/bash -l
-#SBATCH --ntasks=1
-#SBATCH --cpus-per-task=6
-#SBATCH --gres=gpu:1
-#SBATCH --reservation=A100
-#SBATCH --mem=32GB
-#SBATCH --time=4:00:00
-#SBATCH --partition=batch
-##SBATCH --account=conf-iclr-2023.09.29-elhosemh
-
-# Load an environment which has Jupyter installed. It can be one of the following:
-# - the Machine Learning module installed on the system (module load machine_learning)
-# - your own conda environment on Ibex
-# - a singularity container with a python environment (conda or otherwise)
-
-module load machine_learning
-
-# Get tunneling info
-export XDG_RUNTIME_DIR=""
-node=$(hostname -s)
-user=$(whoami)
-submit_host=${SLURM_SUBMIT_HOST}
-port=10035
-echo "$node pinned to port $port"
-
-# Print tunneling instructions
-echo -e "
-To connect to the compute node ${node} on IBEX running your jupyter notebook server, you need to run the following two commands in a terminal:
-1. Command to create an ssh tunnel from your workstation/laptop to glogin:
-
-   ssh -L ${port}:${node}:${port} ${user}@glogin.ibex.kaust.edu.sa
-
-2. Copy the link provided below by jupyter-server and replace NODENAME with localhost before pasting it in your browser on your workstation/laptop.
-"
-
-# Run Jupyter
-jupyter notebook --no-browser --port=${port} --port-retries=50 --ip=${node}
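For context, a typical way this removed script would have been used on the Ibex cluster, sketched from the instructions embedded in the script itself; the sbatch call and the assumption that the file matches the jupyter_notebook.slurm entry added to .gitignore are inferences, not taken from the repository:

```bash
# Submit the notebook job from a login node (assumes the script is saved
# as jupyter_notebook.slurm in the current directory).
sbatch jupyter_notebook.slurm

# Once the job is running, open the tunnel from your workstation/laptop,
# using the node name and port echoed by the job (port 10035 above):
ssh -L 10035:NODENAME:10035 USERNAME@glogin.ibex.kaust.edu.sa

# Then paste the Jupyter URL from the job output into your browser,
# replacing NODENAME with localhost.
```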