Full fine-tuning versus (Q)LoRA
Plus rank-stabilized LoRA (rsLoRA) and LoRA-aware quantization (LoftQ)
[NEW] Trelis Livestream
Starting this Thursday, 18th April 2024, from 5 to 6 pm Irish time, there'll be a livestream on X (twitter.com/@TrelisResearch) and on YouTube (youtube.com/@TrelisResearch) from Trelis. Submit questions in advance by commenting here on Substack or on X.
🔍 Comparing Full Fine-Tuning, LoRA, and Quantized LoRA 🔍
➡️ I compare full fine-tuning, LoRA fine-tuning, and quantized LoRA fine-tuning.
➡️ I focus on how to choose the optimal hyperparameters like learning rate, rank, and alpha when using LoRA.
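For reference, the LoRA-specific hyperparameters sit in the adapter config, while the learning rate is set on the trainer. Here's a minimal sketch using Hugging Face peft and transformers; the model name and the specific values are placeholders, not the exact settings used in the video:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Placeholder base model; swap in whatever model you are fine-tuning.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                 # LoRA rank: width of the low-rank update matrices
    lora_alpha=16,        # scaling numerator; standard LoRA scales updates by alpha / r
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # layers that get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# The learning rate is set on the trainer, typically higher than for full fine-tuning
# because only the small adapter matrices are being updated.
training_args = TrainingArguments(
    output_dir="lora-out",
    learning_rate=1e-4,   # placeholder; tune alongside rank and alpha
    num_train_epochs=1,
    per_device_train_batch_size=4,
)
```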
Shout out to Damjan Kalajdzievski for his work on rsLoRA and Daniel Han for the phenomenal Unsloth library for fine-tuning.
A few highlights from the video:
➡️ LoRA can converge faster and achieve lower loss than full fine-tuning, with the right hyperparameters
➡️ Quantized LoRA (QLoRA) reduces VRAM usage by 3-4x but can slightly degrade performance (see the 4-bit loading sketch after this list)
➡️ Rank-stabilized LoRA (rsLoRA) scales the adapter's contribution by alpha/sqrt(rank) rather than the standard alpha/rank, which improves stability at higher ranks and provides a more systematic way to set LoRA alpha (see the scaling sketch below)
➡️ Quantization-aware LoRA initialization (LoftQ) didn't improve results in my experiments
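On the QLoRA point above: the VRAM saving comes from loading the frozen base weights in 4-bit while the LoRA adapters and compute stay in higher precision. A minimal sketch with bitsandbytes via transformers (the model name is again a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base weights (the QLoRA recipe);
# adapters and compute stay in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)
# Attach LoRA adapters as in the earlier sketch; only the adapters are trained.
```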
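On the rsLoRA point: standard LoRA scales the adapter's contribution by alpha/rank, so the update shrinks quickly as rank grows; rsLoRA scales it by alpha/sqrt(rank) instead. Recent versions of peft expose this as a single flag; a sketch, assuming a peft release that supports use_rslora (rank and alpha values are placeholders):

```python
import math
from peft import LoraConfig

rank, alpha = 64, 64

# Standard LoRA scaling vs rank-stabilized scaling
standard_scale = alpha / rank            # shrinks linearly as rank grows
rs_scale = alpha / math.sqrt(rank)       # shrinks only with the square root of rank
print(f"standard: {standard_scale:.2f}, rsLoRA: {rs_scale:.2f}")  # 1.00 vs 8.00

# In peft, rsLoRA is a flag on the adapter config
rs_lora_config = LoraConfig(
    r=rank,
    lora_alpha=alpha,
    use_rslora=True,   # scale adapters by alpha / sqrt(r) instead of alpha / r
    task_type="CAUSAL_LM",
)
```

With use_rslora=True, alpha becomes much less sensitive to the rank you pick, which is the more systematic way to set alpha mentioned above.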
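And on the LoftQ point: the idea is to initialize the LoRA adapters so that the quantized weights plus adapters better approximate the original full-precision weights, instead of starting the adapters near zero. A sketch of how this is wired up in peft, assuming a version that ships LoftQConfig (bit width and LoRA settings are placeholders):

```python
from peft import LoftQConfig, LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the base model in full precision; LoftQ handles the quantization
# as part of initializing the adapters.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder

loftq_config = LoftQConfig(loftq_bits=4)   # target 4-bit quantization

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    init_lora_weights="loftq",   # initialize adapters to offset quantization error
    loftq_config=loftq_config,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```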
Cheers, Ronan
Ronan McGovern, Trelis Research