[Tech] How Does SGLang Support LoRA?
I recently started looking into LoRA + RL, and as a side project I dug into how SGLang implements LoRA serving. This post is a concise walkthrough.

TL;DR: At a high level, SGLang's LoRA support follows the design of S-LoRA (https://arxiv.org/abs/2311.03285): it separates the base-model computation from the LoRA-adapter computation, runs them independently, and adds the results together. This lets a single base model serve many adapters at once. To batch requests that target different adapters, SGLang uses SGMV (Segmented Gather Matrix-Vector multiplication), introduced in Punica (https://arxiv.org/pdf/2310.18547), so a mixed-adapter batch can still be handled efficiently with shared kernels. ...
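To make the idea concrete, here is a minimal sketch (my own illustration, not SGLang's actual kernels) of the two pieces described above: the shared base matmul for the whole batch, plus a naive reference implementation of segmented gather matrix-vector multiply that applies each adapter's low-rank path only to the rows that requested it. All names and shapes here are made up for the example.

```python
import numpy as np

d_in, d_out, rank = 8, 6, 2
rng = np.random.default_rng(0)

# Frozen base weight, shared by every request in the batch.
W_base = rng.standard_normal((d_in, d_out))

# Two hypothetical adapters, each a low-rank pair (A: d_in x r, B: r x d_out).
adapters = [
    (rng.standard_normal((d_in, rank)), rng.standard_normal((rank, d_out)))
    for _ in range(2)
]

def sgmv_reference(X, adapter_ids):
    """Naive reference of SGMV semantics: compute the base output once for
    the whole batch, then for each adapter id gather its segment of rows,
    apply that adapter's low-rank matmul, and scatter the delta back."""
    out = X @ W_base                      # shared base computation
    delta = np.zeros_like(out)
    for aid in set(adapter_ids):
        rows = [i for i, a in enumerate(adapter_ids) if a == aid]
        A, B = adapters[aid]
        delta[rows] = (X[rows] @ A) @ B   # per-adapter LoRA path
    return out + delta                    # base + adapter, summed

# A batch of 4 requests, alternating between the two adapters.
X = rng.standard_normal((4, d_in))
Y = sgmv_reference(X, [0, 1, 0, 1])
```

A real SGMV kernel fuses the gather/scatter and the low-rank matmuls on the GPU; the loop above only captures the semantics, which is enough to see why one set of kernels can serve a batch mixing many adapters.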