🎥 Trelis Livestream
Join the live Q&A Thursday at 5 pm Irish time on YouTube and on X. Drop any questions in the YouTube live chat in advance.
🔍 This week’s video - ReFT
It's possible to fine-tune language models by updating only a tiny number of parameters. This can be done with LoRA, or by fine-tuning activations via ReFT (Representation Fine-Tuning).
➡️ Two examples covered:
1. Guardrailing a model so it refuses to give financial advice
2. Causing a model to always respond in French, regardless of the question asked
➡️ Benefits of few parameter fine-tuning:
* Allows you to fine-tune quickly
* Enables you to create different behaviors that can be snapped together at inference time
* May result in more robust performance than doing a broader fine-tune on a larger number of parameters
➡️ How it works:
* Target a specific layer (usually a middle layer) within the model
* Use a very low rank (e.g. rank 4) for the adapters
* Apply the intervention at the last input position of the prompt
* Train to overfit on a small number of examples
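The steps above can be sketched as a low-rank edit to one layer's hidden state. This is a minimal numpy illustration of a LoReFT-style intervention, h + Rᵀ(Wh + b − Rh), applied at a single position; the variable names, sizes, and initialisation here are illustrative assumptions, not code from the video:

```python
import numpy as np

def loreft_intervention(h, R, W, b):
    """LoReFT-style edit of a hidden state: h + R^T (W h + b - R h).

    h: hidden state at the intervened position, shape (d,)
    R: low-rank projection with orthonormal rows, shape (r, d)
    W, b: learned linear map, shapes (r, d) and (r,)
    Only R, W, b are trained; the base model stays frozen.
    """
    return h + R.T @ (W @ h + b - R @ h)

rng = np.random.default_rng(0)
d, r = 64, 4  # hidden size and rank (rank 4, as in the recipe above)
h = rng.standard_normal(d)

# Orthonormal rows for R via a QR decomposition (illustrative init)
R = np.linalg.qr(rng.standard_normal((d, r)))[0].T
W = rng.standard_normal((r, d)) * 0.01
b = np.zeros(r)

h_edited = loreft_intervention(h, R, W, b)
print(h_edited.shape)  # (64,)
```

Because the edit is Rᵀ(…), the change to the hidden state lives entirely in the rank-4 subspace spanned by R's rows, which is why so few parameters suffice.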
➡️ Comparing LoRA and representation fine-tuning (ReFT):
* Both methods can achieve good performance with a small number of trainable parameters
* ReFT allows targeting activations rather than weights, but the two approaches may end up being somewhat equivalent
* LoRA has more mature library support (PEFT), while ReFT is a newer technique
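A quick back-of-the-envelope count shows why the two approaches can end up in similar territory: for one target layer, both train on the order of 2·d·r parameters. The hidden size and rank below are illustrative assumptions:

```python
# Trainable-parameter counts for one target layer,
# assuming hidden size d and rank r (illustrative numbers).
d, r = 4096, 4

# LoRA: W + B @ A, with A of shape (r, d) and B of shape (d, r)
lora_params = d * r + r * d

# LoReFT: R of shape (r, d), W of shape (r, d), b of shape (r,)
reft_params = r * d + r * d + r

print(lora_params, reft_params)  # → 32768 32772
```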
And that’s it for this week, cheers, Ronan
Ronan McGovern, Trelis Research