What Is VRAM and Why Does It Matter for AI Development?
VRAM — Video Random Access Memory — is the dedicated memory on your graphics card. If you’re planning to run AI models locally on your laptop, VRAM is the single most important specification you need to understand. It determines what models you can run, how fast they run, and whether certain tasks are possible at all.
This guide explains what VRAM actually is, how it differs from regular RAM, exactly how much you need for different AI workloads, and the common traps to avoid when shopping for a used laptop.
VRAM vs RAM — What’s the Difference?
Think of your laptop as a kitchen. System RAM is your countertop — it’s where the CPU does its general work, holding your browser tabs, code editor, operating system, and whatever else is running. VRAM is a separate, smaller countertop inside a specialised appliance (the GPU) that’s designed exclusively for graphics and parallel computation.
When you run an AI model on your GPU, the entire model needs to fit onto that GPU countertop — the VRAM. If it doesn’t fit, one of two things happens: either the model refuses to load, or it partially spills over onto the CPU’s countertop (system RAM), which is dramatically slower.
Here’s the critical distinction:
| | System RAM | VRAM |
|---|---|---|
| Located on | Motherboard (SO-DIMM slots or soldered) | Graphics card (soldered, never upgradeable) |
| Typical laptop sizes | 8–64 GB | 0–16 GB |
| Bandwidth | ~50 GB/s (DDR5) | ~200–600 GB/s (GDDR6) |
| Used by | CPU, operating system, applications | GPU, AI model weights, image generation |
| Upgradeable? | Often yes (SO-DIMM slots) | Never |
The bandwidth difference is crucial. VRAM can shuttle data 4–10x faster than system RAM. When an AI model generates tokens or renders an image, it needs to read and write billions of numbers per second. Fast VRAM makes this feasible; slow system RAM makes it painful.
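A rough way to see why bandwidth matters: when a language model generates text, each new token requires reading roughly all of the model's weights once, so memory bandwidth divided by model size gives an approximate ceiling on tokens per second. The sketch below applies that rule of thumb to the bandwidth figures in the table. The 5 GB model size and the function name are illustrative assumptions, and real throughput will land below these ceilings.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on decode speed for a memory-bound model.

    Assumes every weight is read once per generated token and ignores
    compute time and context-cache traffic, so real speeds are lower.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Illustrative assumption: a 7B model quantised to about 5 GB.
MODEL_GB = 5.0

for label, bandwidth in [
    ("System RAM (~50 GB/s)", 50.0),
    ("VRAM, low end (~200 GB/s)", 200.0),
    ("VRAM, high end (~600 GB/s)", 600.0),
]:
    ceiling = max_tokens_per_second(bandwidth, MODEL_GB)
    print(f"{label}: at most ~{ceiling:.0f} tokens/s")
```

The absolute numbers are only ceilings, but the ratio between them is the point: the same model has several times more headroom when its weights sit in VRAM.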
A laptop with 32 GB of system RAM and no dedicated VRAM cannot run Stable Diffusion at a practical speed. A laptop with 16 GB of system RAM and 6 GB of VRAM can. The numbers on the spec sheet that matter most are the ones next to “VRAM” or “GPU Memory.”
Why VRAM Is Critical for AI
Every major local AI task is fundamentally limited by VRAM.
Loading LLMs into GPU Memory
When you run a language model through Ollama or LM Studio, the model weights need to sit in memory. Quantisation (compressing the model from 16-bit to 4-bit precision) reduces the memory footprint dramatically, but even quantised models are large:
- 7–8B parameter model (Mistral 7B, Llama 3 8B) in Q4 quantisation: ~4–6 GB
- 13B parameter model (Llama 2 13B, Code Llama 13B) in Q4: ~8–10 GB
- 70B parameter model (Llama 3 70B) in Q4: ~35–40 GB, which exceeds any laptop GPU and only runs with CPU offloading
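The sizes in this list follow from simple arithmetic: parameter count multiplied by bits per weight, divided by eight to get bytes. Here is a minimal sketch of that calculation. The 4.5 bits-per-weight figure is an assumption standing in for a typical 4-bit format (quantised files also store scaling factors, so the average is a little above 4), and the function name is hypothetical.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 10**9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumption: ~4.5 bits per weight on average for a 4-bit quantised model.
Q4_BITS = 4.5

for size in (7, 13, 70):
    fp16 = weight_memory_gb(size, 16)
    q4 = weight_memory_gb(size, Q4_BITS)
    print(f"{size}B: {fp16:.1f} GB at 16-bit, {q4:.1f} GB at ~{Q4_BITS}-bit")
```

The context cache and runtime buffers come on top of the weights, which is why practical requirements sit somewhat above the raw weight figure.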
If the model fits entirely in VRAM, you get fast GPU-accelerated inference — typically 20–40 tokens per second. If it doesn’t fit, Ollama offloads layers to system RAM, and speed drops to 3–5 tokens per second. That’s the difference between a usable chat experience and watching paint dry.
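To get a feel for partial offloading, the sketch below estimates how many layers of a model would fit in a given amount of VRAM. It is a simplified illustration, not how Ollama actually decides the split: it assumes every layer is the same size and reserves a fixed 1 GB for the context cache and buffers. The function name, the reserve, and the example of an 8 GB model with 40 layers are all assumptions.

```python
def gpu_layer_split(model_size_gb: float, n_layers: int,
                    vram_gb: float, reserve_gb: float = 1.0) -> tuple[int, int]:
    """Return (layers_on_gpu, layers_on_cpu) under an even-split assumption.

    Treats all layers as equally sized and keeps `reserve_gb` of VRAM
    free for the context cache and other buffers.
    """
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Multiply before dividing to avoid rounding trouble with per-layer sizes.
    fits = int(usable_gb * n_layers / model_size_gb)
    on_gpu = min(n_layers, fits)
    return on_gpu, n_layers - on_gpu


# Illustrative assumption: a 13B model at ~8 GB with 40 layers.
for vram in (6, 8, 12):
    on_gpu, on_cpu = gpu_layer_split(8.0, 40, vram)
    print(f"{vram} GB VRAM: {on_gpu} layers on GPU, {on_cpu} on CPU")
```

Any layers left on the CPU side run at system RAM speed, so even a small remainder drags down the overall token rate.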
Image Generation
Stable Diffusion and its successors need VRAM for the model weights, the image being generated, and intermediate computation:
- Stable Diffusion 1.5: ~4 GB VRAM minimum, 6 GB comfortable
- Stable Diffusion XL (SDXL): ~6 GB minimum, 8 GB comfortable
- FLUX.1: ~8 GB minimum, 10–12 GB comfortable
Running out of VRAM during image generation usually means an out-of-memory crash — not a graceful slowdown. You either have enough or you don’t.
Fine-Tuning
Fine-tuning a model on your own data is the most VRAM-hungry task. Even efficient techniques like LoRA (Low-Rank Adaptation) need to hold the model, the training data batch, gradients, and optimiser state in memory simultaneously:
- LoRA fine-tuning of a 7B model: ~6–8 GB VRAM minimum
- QLoRA (quantised LoRA): ~4–6 GB — the most memory-efficient option
- Full fine-tuning: impractical on laptops — use cloud compute
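The gap between full fine-tuning and LoRA can be sketched with two back-of-the-envelope formulas. The first uses a common rule of thumb of about 16 bytes per parameter for mixed-precision training with the Adam optimiser. The second counts the extra parameters LoRA trains: each adapted matrix gains two small factors of rank r. The function names and the example model shape (32 layers, hidden size 4096, rank 8, two adapted projections per layer) are assumptions for illustration, and activation memory is left out of both.

```python
def full_finetune_gb(params_billions: float, bytes_per_param: float = 16.0) -> float:
    """Weights + gradients + Adam state for mixed-precision training, in GB.

    16 bytes/param is a rule of thumb: 2 (fp16 weights) + 2 (fp16 gradients)
    + 12 (fp32 master weights, momentum, variance). Activations are excluded.
    """
    return params_billions * bytes_per_param  # billions of params * bytes = GB


def lora_trainable_params(rank: int, d_in: int, d_out: int, n_matrices: int) -> int:
    """Each adapted d_out x d_in matrix gains factors of shape rank x d_in and d_out x rank."""
    return rank * (d_in + d_out) * n_matrices


# Illustrative assumptions: 7B model, 32 layers, hidden size 4096,
# rank-8 adapters on two attention projections per layer.
print(f"Full fine-tune, 7B: ~{full_finetune_gb(7):.0f} GB before activations")

adapters = lora_trainable_params(rank=8, d_in=4096, d_out=4096, n_matrices=2 * 32)
print(f"LoRA trainable parameters: {adapters:,}")
```

Because LoRA only stores gradients and optimiser state for the small adapter matrices, most of its footprint is the frozen base model, and QLoRA shrinks that further by keeping the base model quantised.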
How Much VRAM Do You Actually Need?
This table covers the most common AI tasks and their real-world VRAM requirements:
| Task | Min. VRAM | Comfortable | Notes |
|---|---|---|---|
| Ollama 7B (Q4_K_M) | 4 GB | 6 GB | Leaves room for system overhead |
| Ollama 13B (Q4_K_M) | 8 GB | 10 GB | Tight at 8 GB — close to limit |
| Stable Diffusion 1.5 | 4 GB | 6 GB | 512x512 images, 20–30 steps |
| SDXL | 6 GB | 8 GB | 1024x1024 images |
| FLUX.1 | 8 GB | 12 GB | Latest generation, memory-hungry |
| LoRA fine-tuning (7B) | 6 GB | 8 GB | Using QLoRA drops to ~4 GB |
| ComfyUI workflows | 6 GB | 8–12 GB | Depends on workflow complexity |
| Whisper transcription | 2 GB | 4 GB | Runs fine on modest GPUs |
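If you are comparing several listings, the table can be turned into a quick lookup. The sketch below copies the table's minimum and comfortable figures and labels each task for a given VRAM size. The function name and the three labels are hypothetical, and for the ComfyUI range of 8–12 GB the lower bound is used.

```python
# (task, minimum VRAM in GB, comfortable VRAM in GB), copied from the table above.
# For the "8-12 GB" ComfyUI range, the lower bound is used.
REQUIREMENTS = [
    ("Ollama 7B (Q4_K_M)", 4, 6),
    ("Ollama 13B (Q4_K_M)", 8, 10),
    ("Stable Diffusion 1.5", 4, 6),
    ("SDXL", 6, 8),
    ("FLUX.1", 8, 12),
    ("LoRA fine-tuning (7B)", 6, 8),
    ("ComfyUI workflows", 6, 8),
    ("Whisper transcription", 2, 4),
]


def classify_tasks(vram_gb: float) -> dict[str, str]:
    """Label each task 'comfortable', 'tight', or 'not enough' for a VRAM size."""
    result = {}
    for task, minimum, comfortable in REQUIREMENTS:
        if vram_gb >= comfortable:
            result[task] = "comfortable"
        elif vram_gb >= minimum:
            result[task] = "tight"
        else:
            result[task] = "not enough"
    return result


# Example: a laptop with 6 GB of dedicated VRAM.
for task, verdict in classify_tasks(6).items():
    print(f"{task}: {verdict}")
```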
VRAM in Laptops — Common Traps
Shopping for a used laptop with “good VRAM” is full of potential mistakes. Here’s what to watch for.
Mobile GPU VRAM Is Not Desktop VRAM
The RTX 4090 desktop card has 24 GB of VRAM. The RTX 4090 Laptop GPU has 16 GB. Same name, different chip, different memory. Always check the specific mobile variant — don’t assume laptop specs match desktop specs.
Similarly, the desktop RTX 3060 launched with 12 GB (an 8 GB variant followed later), but the mobile RTX 3060 has 6 GB. The naming is confusing by design.
Shared Memory vs Dedicated VRAM
Laptops with integrated GPUs (Intel Iris Xe, AMD Radeon 680M/780M) have no dedicated VRAM. They share system RAM with the GPU, which means:
- The GPU “borrows” 2–4 GB from your system RAM
- This shared memory runs at system RAM speed (~50 GB/s), not VRAM speed (~200+ GB/s)
- It’s 4–10x slower for AI workloads than dedicated VRAM
- Listings that say “up to 16 GB GPU memory” on an integrated GPU are misleading — it’s just your system RAM being shared
The ThinkPad T14 Gen 3 and Dell Latitude 5540 both have integrated GPUs with 0 GB dedicated VRAM. They can run LLMs on CPU, but image generation and fast GPU-accelerated inference are not practical on them.
“16 GB GPU” Doesn’t Always Mean What You Think
Some laptop listings advertise “16 GB GPU Memory” for machines with integrated graphics. This is technically the maximum amount of system RAM the integrated GPU can address — not dedicated VRAM. The actual AI performance of 16 GB shared memory is nowhere near 16 GB of dedicated GDDR6 VRAM.
If a listing doesn’t specify “dedicated” or mention an NVIDIA/AMD discrete GPU model, assume it’s shared memory and treat the effective VRAM as 0.
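If you can get hands-on with the machine, or the seller is willing to run a command, you can check the dedicated VRAM directly instead of trusting the listing. The sketch below asks `nvidia-smi` for each GPU's name and total memory and parses the result. It only covers NVIDIA GPUs, the function names are hypothetical, and the sample line in the comment is an assumed example of the output format rather than a captured log.

```python
import subprocess


def parse_gpu_memory(csv_text: str) -> list[tuple[str, int]]:
    """Parse lines of 'name, memory_in_MiB' into (name, MiB) pairs.

    Assumed example line: "NVIDIA GeForce RTX 3060 Laptop GPU, 6144"
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        name, memory_mib = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(memory_mib)))
    return gpus


def query_nvidia_vram() -> list[tuple[str, int]]:
    """Ask nvidia-smi for each GPU's name and total memory in MiB.

    Returns an empty list when nvidia-smi is missing or fails, which is
    what you would expect on a machine with only integrated graphics.
    """
    cmd = ["nvidia-smi", "--query-gpu=name,memory.total",
           "--format=csv,noheader,nounits"]
    try:
        completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_memory(completed.stdout)


if __name__ == "__main__":
    found = query_nvidia_vram()
    if not found:
        print("No NVIDIA GPU detected: treat dedicated VRAM as 0 GB")
    for name, mib in found:
        print(f"{name}: {mib / 1024:.1f} GiB dedicated VRAM")
```

An empty result does not rule out a discrete AMD GPU, so treat this as a quick NVIDIA-only check rather than a complete test.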
How to Choose: VRAM Tiers for 2026
Tier 1: No Dedicated VRAM (Integrated GPU Only)
What you can do: CPU-only LLM inference (slow — 3–5 tok/s), API-based AI tools (Copilot, ChatGPT, Claude), Whisper transcription on CPU.
What you can’t do: Stable Diffusion, SDXL, FLUX, GPU-accelerated inference, fine-tuning.
Example laptops: ThinkPad T14 Gen 3 (AI Score: 42, £320–£480), Dell Latitude 5540 (AI Score: 38, £280–£420).
Best for: Students on a tight budget who want to learn AI fundamentals and run small models while spending under £500.
Tier 2: 4–6 GB VRAM — The Entry Point
What you can do: Ollama 7B at full GPU speed (20–30 tok/s), Stable Diffusion 1.5, SDXL (tight at 6 GB), basic LoRA with QLoRA.
What you can’t do: FLUX, 13B models on GPU, serious fine-tuning.
Example laptops: Dell Precision 5560 (4 GB VRAM, AI Score: 62, £480–£680), Legion 5 Gen 6 (6 GB VRAM, AI Score: 71, £550–£750).
Best for: Anyone who wants GPU-accelerated AI without spending over £800.
Tier 3: 8–16 GB VRAM — Serious AI Work
What you can do: 13B models comfortably, SDXL and FLUX, LoRA fine-tuning, ComfyUI workflows, multiple models simultaneously.
What you can’t do: 70B models fully on GPU (still need CPU offload), full fine-tuning.
Example laptops: Used RTX 3070/3080 gaming laptops (8–16 GB), ThinkPad P-series with RTX A4000/A5000.
Best for: Professionals and serious hobbyists who need reliable, fast AI inference and image generation.
VRAM and Our Reviewed Laptops
Here’s how the laptops we’ve reviewed stack up on VRAM:
| Laptop | GPU | Dedicated VRAM | AI Score | Price (UK) | Best AI Use Case |
|---|---|---|---|---|---|
| Dell Latitude 5540 | Intel Iris Xe | 0 GB (shared) | 38 | £280–£420 | CPU inference only |
| ThinkPad T14 Gen 3 | AMD Radeon 660M | 0 GB (shared) | 42 | £320–£480 | CPU inference, API tools |
| ThinkPad T14s Gen 4 | AMD Radeon 780M | 0 GB (shared) | 48 | £420–£580 | Faster CPU/iGPU inference |
| Dell Precision 5560 | NVIDIA RTX A2000 | 4 GB GDDR6 | 62 | £480–£680 | SD 1.5, GPU-accelerated LLMs |
| Legion 5 Gen 6 | NVIDIA RTX 3060 | 6 GB GDDR6 | 71 | £550–£750 | SD 1.5, SDXL (tight), 7B models on GPU, ComfyUI |
The jump from 0 GB to 4 GB VRAM takes the AI Score from the 38–48 range to 62. The jump from 4 GB to 6 GB pushes it to 71. That’s because even a small amount of dedicated VRAM unlocks an entirely different category of AI workloads.
Summary
- VRAM is the GPU’s dedicated memory — separate from system RAM, much faster, and never upgradeable in laptops
- It’s the #1 bottleneck for local AI — your model must fit in VRAM for fast GPU inference
- 4 GB is the bare minimum for meaningful GPU-accelerated AI; 6–8 GB is the sweet spot for 2026
- Integrated GPUs have 0 GB dedicated VRAM — they can only do CPU inference, which is 5–10x slower
- Always check for dedicated VRAM when shopping — ignore “shared GPU memory” marketing claims
- For more context on choosing the right used laptop for AI, read our complete buyer’s guide