Colab is Google's AI compute platform. It offers developers free and paid compute resources, accessed through notebooks.

LLaMA is Meta's open-source large language model (LLM), released as a family of sub-models with 7B, 13B, 33B, and 65B parameters.

LLMTune is a research project from Cornell Tech and Cornell University that aims to provide an easy-to-use platform for innovative experiments with large language models.

The free Colab environment provides 12.7 GB of system RAM and a Tesla T4 GPU with 15 GB of VRAM:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   62C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
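
The table above is the output of nvidia-smi run in a Colab cell. The same information can be checked from Python before starting; a minimal sketch, assuming the default Colab image where psutil and torch come preinstalled:

import psutil, torch

# System RAM -- about 12.7 GB on the free tier
print(f"RAM: {psutil.virtual_memory().total / 1024**3:.1f} GB")

# GPU name and VRAM -- a Tesla T4 with ~15 GB when a GPU runtime is attached
if torch.cuda.is_available():
    p = torch.cuda.get_device_properties(0)
    print(f"GPU: {p.name}, {p.total_memory / 1024**3:.1f} GB")
else:
    print("No GPU attached -- set Runtime > Change runtime type > GPU")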

This configuration is enough to run inference and fine-tuning of the llama-7b-4bit model with LLMTune, but the 12.7 GB of system RAM is too small to support inference or fine-tuning of llama-13b-4bit.
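
A rough back-of-the-envelope estimate of the 4-bit weight sizes helps put these limits in context. This is an approximation that counts packed weight storage only; the extra headroom needed for loading the checkpoint, activations, and adapter state during training is what makes the 13B model a tight fit on this runtime:

# Approximate size of the packed 4-bit weights alone (ignores everything else)
def weights_gb(params_billion, bits=4):
    return params_billion * 1e9 * bits / 8 / 1024**3

print(f"llama-7b-4bit:  ~{weights_gb(7):.1f} GB")    # ~3.3 GB
print(f"llama-13b-4bit: ~{weights_gb(13):.1f} GB")   # ~6.1 GB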

Clone LLMTune in Colab:

!git clone https://github.com/kuleshov-group/llmtune.git

Install the dependencies and then install llmtune itself. In Colab, entering %cd xxx changes the working directory to xxx:

%cd llmtune
!pip install -r requirements.txt
!python setup.py install
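
If the install succeeded, the llmtune command-line entry point should now be on the PATH. A quick smoke test (the --help flag is an assumption here, based on the usual behavior of Python CLIs):

!which llmtune
!llmtune --help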

Download the pre-trained base model llama-7b-4bit.pt:

!wget https://huggingface.co/decapoda-research/llama-7b-hf-int4/resolve/main/llama-7b-4bit.pt
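
The checkpoint is several GB, so it is worth confirming the download completed before moving on:

!ls -lh llama-7b-4bit.pt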

Download the dataset used for fine-tuning:

!wget https://huggingface.co/datasets/kuleshov/alpaca-data/resolve/main/dataset.json
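
It can help to peek at the data before fine-tuning. A minimal sketch, assuming dataset.json is a JSON array of records (the field names are whatever the file actually contains):

import json

with open("dataset.json") as f:
    data = json.load(f)

# Number of records and the structure of the first one
print(f"{len(data)} examples")
print(json.dumps(data[0], indent=2, ensure_ascii=False))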

Create a directory to hold the fine-tuned adapter:

!mkdir alpaca-adapter-folder-7b-4bit

Run the fine-tuning command:

!llmtune finetune --model llama-7b-4bit --weights llama-7b-4bit.pt --adapter alpaca-adapter-folder-7b-4bit --dataset dataset.json

The training output looks like this:

  Num examples = 41,601
  Num Epochs = 3
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 2
  Total optimization steps = 62,400
  Number of trainable parameters = 4,194,304
{'loss': 2.2349, 'learning_rate': 4e-05, 'epoch': 0.0}
{'loss': 2.0549, 'learning_rate': 8e-05, 'epoch': 0.0}
{'loss': 1.8113, 'learning_rate': 0.00012, 'epoch': 0.0}
{'loss': 1.1134, 'learning_rate': 0.00016, 'epoch': 0.0}
{'loss': 0.9847, 'learning_rate': 0.0002, 'epoch': 0.0}
  0% 50/62400 [00:56<18:12:34,  1.05s/it]Saving model checkpoint to alpaca-adapter-folder-7b-4bit/checkpoint-50
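
The step count and time estimate in the log follow directly from the numbers it reports, and the 4,194,304 trainable parameters correspond to the LoRA adapter rather than the full model, which is why fine-tuning fits on this GPU. As a quick check of the arithmetic:

examples   = 41_601
epochs     = 3
batch_size = 2                      # 1 per device x 2 gradient-accumulation steps

steps_per_epoch = examples // batch_size       # 20,800
total_steps     = steps_per_epoch * epochs     # 62,400 -- matches the log
eta_hours       = total_steps * 1.05 / 3600    # ~18.2 h at ~1.05 s/it
print(total_steps, f"{eta_hours:.1f} h")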

References

colab.research.google.com

github.com/kuleshov-group/llmtune

ai.facebook.com/blog/la