How Low-Rank Adaptation Works - Take 2
Question
Using your example in "3. Why Low-Rank Works"
in this post, my understanding is this:
Without LoRA:
# of params during training = 4096 × 4096
# of params used during inference = 4096 × 4096
With LoRA:
# of params during training = 4096 × 8 + 8 × 4096
# of params used during inference = 4096 × 4096 + 4096 × 8 + 8 × 4096
Am I correct?
Answer
Parameter Count: Training vs Inference
This is a very important point, and your understanding is almost correct. Let’s clarify it carefully.
Without LoRA (Full Fine-Tuning):
- Training parameters = 4096 × 4096 = 16,777,216 (≈ 16.8 million)
- Inference parameters = 4096 × 4096 = 16,777,216 (≈ 16.8 million)
In this case:
- All parameters are updated during training
- The same full matrix is used during inference
With LoRA:
We split into two parts:
- Original matrix W (frozen)
- LoRA matrices A and B (trainable)
During Training:
- Trainable parameters = 4096 × 8 + 8 × 4096 = 65,536
- Frozen parameters = 4096 × 4096 (not updated)
So:
- Total parameters in model = still ~16.8 million (the frozen W plus the LoRA factors)
- But only ~65K parameters are actually trained (about 0.4% of the total)
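The counts above can be sanity-checked with a few lines of arithmetic (a minimal sketch; `d` and `r` are just shorthand for the 4096 hidden size and rank 8 used in this example):

```python
# Sanity-check the LoRA parameter counts from the example above.
d, r = 4096, 8  # hidden size and LoRA rank from the post

frozen = d * d               # original weight W, not updated
trainable = d * r + r * d    # LoRA factors A (d x r) and B (r x d)

print(f"frozen:    {frozen:,}")       # 16,777,216
print(f"trainable: {trainable:,}")    # 65,536
print(f"trainable fraction: {trainable / frozen:.2%}")  # 0.39%
```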
During Inference (Two Options):
Option 1: Keep LoRA Separate
- Parameters used = W + A + B
- = 4096×4096 + 4096×8 + 8×4096 = 16,842,752
This matches your understanding exactly.
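Here is a minimal numpy sketch of the unmerged forward pass. The dimensions are scaled down (16 and 2 standing in for 4096 and 8) so it runs instantly; initializing B to zero follows the common LoRA convention, so the adapted layer starts out identical to the frozen one:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2  # scaled-down stand-ins for 4096 and 8

W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((d, r)) * 0.01  # LoRA down-projection (trainable)
B = np.zeros((r, d))                    # LoRA up-projection, initialized to zero

x = rng.standard_normal((1, d))         # a single input row

# Option 1: keep LoRA separate -- the input goes through W
# and through the low-rank path A, B, and the results are summed.
y = x @ W + (x @ A) @ B
print(y.shape)  # (1, 16)
```

Because B starts at zero, the LoRA path contributes nothing until training updates it.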
Option 2: Merge LoRA into W (Common in practice)
- Compute: W' = W + A×B
- Use only W' during inference
After merging:
- Parameters used = 4096 × 4096
- No extra computation from A and B
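The merge can be verified numerically: the separate-path output and the merged-weight output agree, since x·W + (x·A)·B = x·(W + A·B). A small numpy sketch (scaled-down dimensions, random weights for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2  # scaled-down stand-ins for 4096 and 8
W = rng.standard_normal((d, d))
A = rng.standard_normal((d, r))
B = rng.standard_normal((r, d))
x = rng.standard_normal((3, d))

# Option 2: fold the low-rank update into W once, before serving.
W_merged = W + A @ B

y_separate = x @ W + (x @ A) @ B   # option 1 at inference time
y_merged = x @ W_merged            # option 2: a single matmul
print(np.allclose(y_separate, y_merged))  # True
```

After the merge, inference uses one 4096×4096 matrix, exactly as in the unadapted model.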
Key Insight:
- LoRA reduces training cost, not necessarily total model size
- Inference cost can stay the same (after merging)
- The main savings come from not updating the full matrix
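One way to make the training savings concrete: optimizers like Adam keep extra state (roughly two values, momentum and variance) per trainable parameter, so shrinking the trainable set shrinks optimizer memory too. A rough back-of-the-envelope sketch (the factor of 2 is an assumption about Adam-style optimizers, not something from the post):

```python
# Rough optimizer-memory comparison, assuming an Adam-style optimizer
# that stores two extra values per trainable parameter.
d, r = 4096, 8

full_ft_states = 2 * d * d             # full fine-tuning: states for all of W
lora_states = 2 * (d * r + r * d)      # LoRA: states only for A and B

print(f"full fine-tuning: {full_ft_states:,} extra values")  # 33,554,432
print(f"LoRA:             {lora_states:,} extra values")     # 131,072
print(f"ratio: {full_ft_states // lora_states}x")            # 256x
```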
Simple Analogy:
Think of W as a large book.
- Full fine-tuning = rewriting the entire book
- LoRA = writing a small note that modifies parts of the book
During inference:
- You can either read the book + note together
- Or rewrite the book once with the note applied
Both give the same result, but LoRA made training much cheaper.