Fine-tuning the llama-7b-4bit model with LLMTune on Colab
Colab is Google's AI compute platform; it offers developers free and paid compute resources, accessed through notebooks.
LLaMA is Meta's open-source large language model (LLM), released in variants with 7B, 13B, 33B, and 65B parameters.
LLMTune is a research project from Cornell Tech and Cornell University that aims to provide an easy-to-use platform for innovative experiments with large language models.
Colab's free environment provides 12.7 GB of RAM and a Tesla T4 GPU with 15 GB of VRAM.
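The assigned GPU can be checked directly from a notebook cell; the report below is the output of nvidia-smi:

# Show the GPU model, driver/CUDA version, and current memory usage
!nvidia-smi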
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 62C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
This configuration is sufficient for inference and fine-tuning of the llama-7b-4bit model with LLMTune, but the memory is too small to support inference or fine-tuning of llama-13b-4bit.
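Before committing to a model size, the available host RAM and GPU memory can be checked from a cell. This is a minimal sketch using psutil and torch, both of which come preinstalled on Colab:

import psutil, torch

# Host RAM in GiB (the free tier reports about 12.7 GB)
print(f"RAM:  {psutil.virtual_memory().total / 1024**3:.1f} GiB")

# Total memory of GPU 0 in GiB (a Tesla T4 reports about 15 GB)
if torch.cuda.is_available():
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GiB")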
Clone LLMTune in Colab:
!git clone https://github.com/kuleshov-group/llmtune.git
Install the dependencies and then install llmtune itself. In Colab, entering %cd xxx switches the working directory to xxx:
%cd llmtune
!pip install -r requirements.txt
!python setup.py install
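To verify the installation before moving on, a quick check can be run. Note that the importable module name llmtune is an assumption based on the repository name; the llmtune command itself is the one used later for fine-tuning:

# Confirm the package imports and the CLI entry point is on PATH
!python -c "import llmtune; print('llmtune imported OK')"
!which llmtune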
Download the pre-trained base model llama-7b-4bit.pt:
!wget https://huggingface.co/decapoda-research/llama-7b-hf-int4/resolve/main/llama-7b-4bit.pt
Download the dataset used for fine-tuning:
!wget https://huggingface.co/datasets/kuleshov/alpaca-data/resolve/main/dataset.json
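It can be useful to inspect the data before training. The sketch below assumes dataset.json is a JSON array of Alpaca-style records with instruction/input/output fields; print the first record to confirm the actual schema:

import json

# Load the fine-tuning dataset and look at its size and first record
with open("dataset.json") as f:
    data = json.load(f)

print(len(data), "records")
print(data[0])  # expected Alpaca-style keys: instruction, input, output (assumption)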
Create a directory to hold the fine-tuned adapter:
!mkdir alpaca-adapter-folder-7b-4bit
Run the fine-tuning command:
!llmtune finetune --model llama-7b-4bit --weights llama-7b-4bit.pt --adapter alpaca-adapter-folder-7b-4bit --dataset dataset.json
The training output looks like this:
Num examples = 41,601
Num Epochs = 3
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 2
Gradient Accumulation steps = 2
Total optimization steps = 62,400
Number of trainable parameters = 4,194,304
{'loss': 2.2349, 'learning_rate': 4e-05, 'epoch': 0.0}
{'loss': 2.0549, 'learning_rate': 8e-05, 'epoch': 0.0}
{'loss': 1.8113, 'learning_rate': 0.00012, 'epoch': 0.0}
{'loss': 1.1134, 'learning_rate': 0.00016, 'epoch': 0.0}
{'loss': 0.9847, 'learning_rate': 0.0002, 'epoch': 0.0}
0% 50/62400 [00:56<18:12:34, 1.05s/it]Saving model checkpoint to alpaca-adapter-folder-7b-4bit/checkpoint-50
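Colab runtimes are ephemeral, so it is worth copying the adapter checkpoints to persistent storage during or after training. A minimal sketch using Google Drive (the MyDrive destination folder is just an example):

from google.colab import drive

# Mount Google Drive and copy the adapter folder so checkpoints survive the session
drive.mount('/content/drive')
!cp -r alpaca-adapter-folder-7b-4bit /content/drive/MyDrive/

The saved folder can later be copied back into a new runtime to resume fine-tuning or to run inference with the adapter.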
References
https://colab.research.google.com/