How Low-Rank Adaptation Works

LoRA (Low-Rank Adaptation) – Detailed Explanation

1. What is LoRA?

LoRA is a method to fine-tune large neural networks without changing most of their original weights.
Instead of updating a huge weight matrix directly, LoRA adds a small, trainable "adapter" that learns the change.

Key idea:
Keep original weights frozen, and only learn a small correction.

Why this matters:
  • Much faster training
  • Much less memory usage
  • Easy to switch between different tasks

2. The Core Mathematical Idea

In a normal neural network layer, we have a weight matrix:
W (size: d x k)
Normally during training, we update W directly.

With LoRA, we do NOT change W. Instead, we approximate the update like this:
W' = W + ΔW
But instead of learning ΔW as a full matrix, we break it into two smaller matrices:
ΔW = A × B
Where:
  • A is size (d x r)
  • B is size (r x k)
  • r is a small number (rank), like 4, 8, or 16
So instead of learning d×k parameters, we only learn r×(d+k).

This is the "low-rank" idea.
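The factorization above is easy to check numerically. A minimal NumPy sketch (matrix sizes are illustrative): the product A × B has the full d×k shape, but its rank can never exceed r, and it costs only r×(d+k) trainable parameters.

```python
import numpy as np

d, k, r = 64, 32, 4  # illustrative sizes

A = np.random.randn(d, r)
B = np.random.randn(r, k)

delta_W = A @ B  # full-size update, built from two small factors

print(delta_W.shape)                    # (64, 32): same shape as W
print(np.linalg.matrix_rank(delta_W))   # at most r = 4
print(A.size + B.size, d * k)           # 384 trainable params vs 2048
```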

3. Why Low-Rank Works

In practice, large neural network updates often have redundancy.
This means the important changes can be represented in a lower-dimensional space.

Think of it like:
  • Full matrix = full detail image
  • Low-rank = compressed version with key features
Example:
If W is 4096×4096, that is ~16.8 million parameters.
If r = 8:
  • A = 4096×8 → ~32K params
  • B = 8×4096 → ~32K params
  • Total = ~64K params
That is roughly 256× fewer trainable parameters than full fine-tuning.
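The arithmetic in this example can be verified directly:

```python
d = k = 4096
r = 8

full = d * k        # parameters in a full update of W
lora = r * (d + k)  # parameters in A and B combined

print(full)          # 16777216  (~16.8M)
print(lora)          # 65536     (~64K)
print(full // lora)  # 256
```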

4. How LoRA is Applied in Transformers

LoRA is usually applied to attention layers, especially:
  • Query (Q)
  • Key (K)
  • Value (V)
  • Output projection
Example for a linear layer:
Original:
y = W x
With LoRA:
y = W x + (A × B × x)
Or equivalently:
y = (W + A×B) x
During training:
  • W is frozen
  • A and B are trained
During inference:
  • You can merge A×B into W
  • Or keep them separate
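The two inference options above give identical outputs, since W x + A B x = (W + A B) x by distributivity. A small NumPy sketch (sizes illustrative) that checks the merged and separate forms agree:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 16, 8, 2  # illustrative sizes

W = rng.standard_normal((d, k))
A = rng.standard_normal((d, r))
B = rng.standard_normal((r, k))
x = rng.standard_normal(k)

# Training-time form: frozen W plus a separate adapter path
y_separate = W @ x + A @ (B @ x)

# Inference-time form: fold the adapter into one merged matrix
W_merged = W + A @ B
y_merged = W_merged @ x

print(np.allclose(y_separate, y_merged))  # True
```

Merging removes the extra matrix multiplies at inference time; keeping A and B separate lets you swap adapters without touching W.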

5. Step-by-Step Training Flow

Step 1: Load pretrained model
Step 2: Freeze all original weights
Step 3: Insert LoRA layers (A and B)
Step 4: Train only A and B
Step 5: Save LoRA weights

Example (conceptual):
for each batch:
    output = model_with_lora(input)
    loss = compute_loss(output, target)
    backward(loss)     # gradients flow only into A and B
    update(A, B)       # W stays frozen
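The conceptual loop can be made concrete with plain NumPy gradient descent. This is a toy sketch, not a real framework integration: the "model" is a single linear layer, the "task" is matching a shifted target output, and all sizes and names are illustrative. Note the common LoRA initialization: A random, B zero, so the adapter starts as a no-op.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 8, 2

W = rng.standard_normal((d, k))   # pretrained weight, frozen throughout
A = rng.standard_normal((d, r))   # random init
B = np.zeros((r, k))              # zero init: delta_W starts at 0

X = rng.standard_normal((k, 100))                       # toy input batch
target = (W + 0.3 * rng.standard_normal((d, k))) @ X    # desired new behavior

init_loss = np.mean((W @ X - target) ** 2)

lr = 0.01
for step in range(1000):
    pred = W @ X + A @ (B @ X)    # forward: frozen path + adapter path
    err = pred - target
    # gradients of the squared error w.r.t. A and B only; W gets no update
    grad_A = err @ X.T @ B.T / X.shape[1]
    grad_B = A.T @ err @ X.T / X.shape[1]
    A -= lr * grad_A
    B -= lr * grad_B

final_loss = np.mean((W @ X + A @ (B @ X) - target) ** 2)
print(final_loss < init_loss)  # True: only A and B moved, yet the loss dropped
```

Because the true change here is full-rank and the adapter is rank 2, the loss will not reach zero; the adapter finds the best low-rank correction it can.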

6. Intuition with a Simple Example

Imagine a model trained to describe images.

Original model output:
"A fruit on a table"
After LoRA fine-tuning on a dataset of apples:
"Two red apples on a wooden table"
LoRA didn't relearn everything. It only learned:
  • How to count
  • How to be more specific

7. Real Use Case: Chatbot Personalization

Base model:
User: Hello
Model: Hello! How can I help you?
After LoRA trained on company tone:
User: Hello
Model: Welcome! How may we assist you today?
Only small style changes were learned.

8. Choosing Rank (r)

Rank controls capacity.

  • Small r (e.g. 4): very efficient, less expressive
  • Medium r (e.g. 8–16): good balance
  • Large r (e.g. 64+): closer to full fine-tuning
Rule of thumb:
  • Simple task → small r
  • Complex domain → larger r

9. Scaling Factor (Alpha)

LoRA often uses a scaling factor:
W' = W + (α / r) × (A × B)
Where:
  • α controls how strong the LoRA update is
Example:
  • α = 16, r = 8 → scale = 2
This stabilizes training.
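The multiplier is simply α divided by r, so for a fixed α, a larger rank gets a proportionally smaller scale, which keeps the update magnitude roughly comparable across ranks:

```python
alpha = 16
for r in (4, 8, 16):
    print(r, alpha / r)
# 4  -> 4.0
# 8  -> 2.0  (the example above)
# 16 -> 1.0
```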

10. Advantages of LoRA

  • Very memory efficient
  • Fast training
  • Can store many task-specific adapters
  • No need to retrain full model

11. Limitations

  • May not capture very complex changes if rank is too small
  • Requires careful tuning of r and α
  • Not always as powerful as full fine-tuning

12. LoRA vs Full Fine-Tuning

Full fine-tuning:
  • Updates all weights
  • Very expensive
  • Best performance (sometimes)
LoRA:
  • Updates small matrices
  • Cheap and fast
  • Slightly less flexible

13. Practical Example (Pseudo Code)

class LoRALinear:
    def __init__(self, W, r):
        d, k = W.shape
        self.W = freeze(W)            # pretrained weight, never updated
        self.A = random_matrix(d, r)  # small random init
        self.B = zero_matrix(r, k)    # zero init: the adapter starts as a no-op

    def forward(self, x):
        # frozen path plus low-rank correction
        return self.W @ x + self.A @ (self.B @ x)

14. Where LoRA is Used

  • Large language models (LLMs)
  • Image models (diffusion, captioning)
  • Speech models
  • Recommendation systems

15. Final Intuition

Think of LoRA as:

  • Original model = knowledge base
  • LoRA = small "patch" or "skill"
Instead of rewriting the whole brain, you just attach a small module that adjusts behavior.

This is why LoRA is powerful: it adds new abilities without rebuilding everything.