How Low-Rank Adaptation Works - Take 3
Question
Are you saying both "Option 1: Keep LoRA Separate" and "Option 2: Merge LoRA into W" referenced
in this post are used in practice?
Answer
Yes, both are used in practice, but they serve slightly different purposes, and which one you see depends on the scenario.
When "Keep LoRA Separate" is Used
This is very common during development and in flexible systems.
Typical situations:
- Serving multiple LoRA adapters on the same base model
- Switching tasks dynamically (e.g., different customers, styles, domains)
- Experimentation and evaluation
Why keep them separate?
- You can load/unload LoRA weights quickly
- Storage is very small (only A and B)
- No need to duplicate the large base model
Example:
One base LLM:
- LoRA #1 → legal domain
- LoRA #2 → medical domain
- LoRA #3 → customer support tone
You simply swap LoRA weights depending on the request.
Trade-off:
- Slightly more computation during inference: the extra (A×B)x term is computed on every forward pass (see the sketch below)
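To make this concrete, here is a minimal NumPy sketch (toy dimensions and random matrices standing in for trained weights) of one frozen base weight W shared across several swappable (A, B) adapter pairs. Swapping adapters is just picking a different key, and the extra (A×B)x work shows up inside the forward call.

```python
import numpy as np

# Toy dimensions for a single linear layer of the base model.
d_out, d_in, rank = 8, 16, 2

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))          # frozen base weight, shared by all tasks

# One small (A, B) pair per adapter; only these are stored per task.
adapters = {
    "legal":   (rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in))),
    "medical": (rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in))),
    "support": (rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in))),
}

def forward(x, domain):
    """Base output plus the low-rank correction for the requested domain."""
    A, B = adapters[domain]                  # swap adapters by picking a key
    return W @ x + A @ (B @ x)               # extra (A×B)x work on every call

x = rng.normal(size=d_in)
y_legal = forward(x, "legal")
y_medical = forward(x, "medical")            # same base W, different adapter
```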
When "Merge LoRA into W" is Used
This is common in production deployments where performance matters.
Typical situations:
- Single fixed task model
- Latency-sensitive applications
- Edge deployment or optimized inference pipelines
Why merge?
- No extra computation during inference
- Simpler model graph
- Better compatibility with optimized runtimes
What happens technically:
W' = W + A×B
After this:
- A and B are no longer needed
- The model behaves like a fully fine-tuned model
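A minimal NumPy sketch of the merge itself (again with toy, random matrices in place of trained weights): once W' = W + A×B has been formed, inference is a single matrix multiply, the outputs are identical to the separate-adapter formulation, and A and B can be discarded.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 16, 2

W = rng.normal(size=(d_out, d_in))           # frozen base weight
A = rng.normal(size=(d_out, rank))           # trained LoRA factors
B = rng.normal(size=(rank, d_in))

# Fold the low-rank update into the base weight: W' = W + A×B.
W_merged = W + A @ B

x = rng.normal(size=d_in)
y_separate = W @ x + A @ (B @ x)             # adapter kept separate
y_merged = W_merged @ x                      # single matmul, A and B discarded

assert np.allclose(y_separate, y_merged)     # same output, less work per call
```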
How Practitioners Usually Work
In real workflows, people often do both:
- Train LoRA → keep adapters separate (flexibility)
- Evaluate multiple LoRAs
- Choose the best one
- Merge it into W for final deployment (a rough library-level sketch follows below)
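As a rough sketch of that workflow using the Hugging Face peft library (the model name and adapter paths below are placeholders, not real checkpoints, and the evaluation loop is omitted), the flexibility-then-merge pattern might look like this:

```python
# Hypothetical workflow with Transformers + PEFT; names/paths are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-name")

# Flexibility phase: load a candidate adapter on top of the frozen base model
# and evaluate it; repeat with other adapters to pick the best one.
candidate = PeftModel.from_pretrained(base, "path/to/best-lora-adapter")

# Deployment phase: fold the winning adapter into the base weights so the
# model runs as a plain network with no extra LoRA computation at inference.
merged = candidate.merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```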
Key Takeaway
- Keeping LoRA separate = flexibility and modularity
- Merging LoRA = speed and simplicity
So your understanding is right, and the important nuance is this: LoRA is not just about reducing trainable parameters; it also enables a modular way to adapt models.