
What Is VRAM and Why Does It Matter for AI Development?

VRAM — Video Random Access Memory — is the dedicated memory on your graphics card. If you’re planning to run AI models locally on your laptop, VRAM is the single most important specification you need to understand. It determines what models you can run, how fast they run, and whether certain tasks are possible at all.

This guide explains what VRAM actually is, how it differs from regular RAM, exactly how much you need for different AI workloads, and the common traps to avoid when shopping for a used laptop.

VRAM vs RAM — What’s the Difference?

Think of your laptop as a kitchen. System RAM is your countertop — it’s where the CPU does its general work, holding your browser tabs, code editor, operating system, and whatever else is running. VRAM is a separate, smaller countertop inside a specialised appliance (the GPU) that’s designed exclusively for graphics and parallel computation.

When you run an AI model on your GPU, the entire model needs to fit onto that GPU countertop — the VRAM. If it doesn’t fit, one of two things happens: either the model refuses to load, or it partially spills over onto the CPU’s countertop (system RAM), which is dramatically slower.

Here’s the critical distinction:

| | System RAM | VRAM |
|---|---|---|
| Located on | Motherboard (SO-DIMM slots or soldered) | Graphics card (soldered, never upgradeable) |
| Typical laptop sizes | 8–64 GB | 0–16 GB |
| Bandwidth | ~50 GB/s (DDR5) | ~200–600 GB/s (GDDR6) |
| Used by | CPU, operating system, applications | GPU, AI model weights, image generation |
| Upgradeable? | Often yes (SO-DIMM slots) | Never |

The bandwidth difference is crucial. Going by the figures above, VRAM can move data roughly 4–12x faster than system RAM. When an AI model generates tokens or renders an image, it needs to read and write billions of numbers per second. Fast VRAM makes this feasible; slow system RAM makes it painful.
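
A rough way to see why bandwidth matters: generating each token means reading approximately all of the model's weights from memory once, so bandwidth divided by model size gives a ceiling on tokens per second. The sketch below applies that rule of thumb. The function name and the 4 GB model size are illustrative assumptions, and real throughput will sit below this ceiling.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on generation speed if every token reads all weights once."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Bandwidth figures come from the comparison table above.
# 4 GB is an assumed size for a 7B model in Q4 quantisation.
MODEL_SIZE_GB = 4.0
MEMORY_TYPES = [
    ("System RAM (DDR5)", 50),
    ("VRAM, low end (GDDR6)", 200),
    ("VRAM, high end (GDDR6)", 600),
]

for label, bandwidth in MEMORY_TYPES:
    ceiling = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{label}: at most ~{ceiling:.1f} tokens/s")
```

Treat the output as an order-of-magnitude guide rather than a benchmark: it ignores compute time, caching, and everything else competing for memory.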

A laptop with 32 GB of system RAM and 0 GB of VRAM cannot run Stable Diffusion. A laptop with 16 GB of system RAM and 6 GB of VRAM can. The numbers on the spec sheet that matter most are the ones next to “VRAM” or “GPU Memory.”

Why VRAM Is Critical for AI

Every major local AI task is fundamentally limited by VRAM.

Loading LLMs into GPU Memory

When you run a language model through Ollama or LM Studio, the model weights need to sit in memory. Quantisation (compressing the model from 16-bit to 4-bit precision) reduces the memory footprint dramatically, but even quantised models are large:

  • 7–8B parameter model (Mistral 7B, Llama 3 8B) in Q4 quantisation: ~4–6 GB
  • 13B parameter model (e.g. Llama 2 13B) in Q4: ~8–10 GB
  • 70B parameter model (Llama 3 70B) in Q4: ~35–40 GB — laptop territory only with CPU offloading
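
The sizes above follow from simple arithmetic: parameter count times bits per weight, divided by eight to get bytes. The sketch below works that out. The function name is ours, and the 4.5 bits per weight is an assumed average for 4-bit formats, which store scaling factors alongside the weights. It covers the weights only, so real usage is higher once context and runtime overhead are added.

```python
def weights_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumption: roughly 4.5 bits per weight for a Q4-style quantisation.
Q4_BITS = 4.5

for name, params in [("7B", 7), ("13B", 13), ("70B", 70)]:
    quantised = weights_size_gb(params, Q4_BITS)
    full = weights_size_gb(params, 16)
    print(f"{name}: ~{quantised:.1f} GB at Q4 vs ~{full:.0f} GB at 16-bit")
```

Under these assumptions the weights come to about 3.9 GB, 7.3 GB and 39.4 GB, which is why the practical requirements listed above sit a little higher than the raw weight sizes.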

If the model fits entirely in VRAM, you get fast GPU-accelerated inference — typically 20–40 tokens per second. If it doesn’t fit, Ollama offloads layers to system RAM, and speed drops to 3–5 tokens per second. That’s the difference between a usable chat experience and watching paint dry.

Image Generation

Stable Diffusion and its successors need VRAM for the model weights, the image being generated, and intermediate computation:

  • Stable Diffusion 1.5: ~4 GB VRAM minimum, 6 GB comfortable
  • Stable Diffusion XL (SDXL): ~6 GB minimum, 8 GB comfortable
  • FLUX.1: ~8 GB minimum, 10–12 GB comfortable

Running out of VRAM during image generation usually means an out-of-memory crash — not a graceful slowdown. You either have enough or you don’t.

Fine-Tuning

Fine-tuning a model on your own data is the most VRAM-hungry task. Even efficient techniques like LoRA (Low-Rank Adaptation) need to hold the model, the training data batch, gradients, and optimiser state in memory simultaneously:

  • LoRA fine-tuning of a 7B model: ~6–8 GB VRAM minimum (this assumes an 8-bit quantised base model; at 16-bit, the 7B weights alone take about 14 GB)
  • QLoRA (quantised LoRA): ~4–6 GB — the most memory-efficient option
  • Full fine-tuning: impractical on laptops — use cloud compute
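
To see where the memory goes, it helps to add up the fixed costs: the frozen base model, the small set of trainable adapter weights, their gradients, and the optimiser state. The sketch below is a back-of-envelope estimate, not a measurement. Every figure in it is an assumption made for illustration: the 8-bit base for plain LoRA (chosen because it lands in the range quoted above), about 4.2 million trainable adapter weights, 16-bit gradients, and an Adam-style optimiser keeping two 32-bit values per trainable weight. Activations for the data batch are left out because they depend on batch size and sequence length.

```python
from dataclasses import dataclass


@dataclass
class TrainingMemory:
    base_weights_gb: float  # frozen base model
    adapter_gb: float       # trainable LoRA weights (16-bit)
    gradients_gb: float     # one 16-bit gradient per trainable weight
    optimiser_gb: float     # two 32-bit values per trainable weight

    @property
    def total_gb(self) -> float:
        return (self.base_weights_gb + self.adapter_gb
                + self.gradients_gb + self.optimiser_gb)


def estimate_training_memory(params_billions: float, base_bits: float,
                             adapter_params_millions: float) -> TrainingMemory:
    """Fixed memory costs only; activations for the data batch are excluded."""
    trainable = adapter_params_millions * 1e6
    return TrainingMemory(
        base_weights_gb=params_billions * base_bits / 8,
        adapter_gb=trainable * 2 / 1e9,
        gradients_gb=trainable * 2 / 1e9,
        optimiser_gb=trainable * 8 / 1e9,
    )


# Assumed figures: a 7B base model and ~4.2 million trainable adapter weights.
for label, bits in [("LoRA, 8-bit base (assumed)", 8), ("QLoRA, 4-bit base", 4)]:
    estimate = estimate_training_memory(7, bits, 4.2)
    print(f"{label}: ~{estimate.total_gb:.2f} GB before activations")
```

The base model dominates in both cases, which is why quantising it, as QLoRA does, saves far more memory than shrinking the adapter.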

How Much VRAM Do You Actually Need?

This table covers the most common AI tasks and their real-world VRAM requirements:

| Task | Min. VRAM | Comfortable | Notes |
|---|---|---|---|
| Ollama 7B (Q4_K_M) | 4 GB | 6 GB | Leaves room for system overhead |
| Ollama 13B (Q4_K_M) | 8 GB | 10 GB | Tight at 8 GB; close to limit |
| Stable Diffusion 1.5 | 4 GB | 6 GB | 512x512 images, 20–30 steps |
| SDXL | 6 GB | 8 GB | 1024x1024 images |
| FLUX.1 | 8 GB | 12 GB | Latest generation, memory-hungry |
| LoRA fine-tuning (7B) | 6 GB | 8 GB | Using QLoRA drops to ~4 GB |
| ComfyUI workflows | 6 GB | 8–12 GB | Depends on workflow complexity |
| Whisper transcription | 2 GB | 4 GB | Runs fine on modest GPUs |
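
The table above can also be read the other way round: given a laptop's VRAM, which tasks are comfortable, which are tight, and which are out of reach? The sketch below encodes the table's figures and sorts tasks into those three groups. The function and variable names are ours, and for ComfyUI we use the lower end (8 GB) of the comfortable range.

```python
# (task, minimum GB, comfortable GB), copied from the table above.
REQUIREMENTS = [
    ("Ollama 7B (Q4_K_M)", 4, 6),
    ("Ollama 13B (Q4_K_M)", 8, 10),
    ("Stable Diffusion 1.5", 4, 6),
    ("SDXL", 6, 8),
    ("FLUX.1", 8, 12),
    ("LoRA fine-tuning (7B)", 6, 8),
    ("ComfyUI workflows", 6, 8),  # comfortable range is 8-12 GB; lower end used
    ("Whisper transcription", 2, 4),
]


def classify_tasks(vram_gb: float) -> dict[str, list[str]]:
    """Group tasks by how well they fit in the given amount of dedicated VRAM."""
    groups: dict[str, list[str]] = {"comfortable": [], "tight": [], "too small": []}
    for task, minimum, comfortable in REQUIREMENTS:
        if vram_gb >= comfortable:
            groups["comfortable"].append(task)
        elif vram_gb >= minimum:
            groups["tight"].append(task)
        else:
            groups["too small"].append(task)
    return groups


for group, tasks in classify_tasks(6).items():
    print(f"{group}: {', '.join(tasks) or 'none'}")
```

With 6 GB, for example, SDXL, LoRA fine-tuning and ComfyUI land in the "tight" group, while 13B models and FLUX.1 do not fit.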

VRAM in Laptops — Common Traps

Shopping for a used laptop with “good VRAM” is full of potential mistakes. Here’s what to watch for.

Mobile GPU VRAM Is Not Desktop VRAM

The RTX 4090 desktop card has 24 GB of VRAM. The RTX 4090 Laptop GPU has 16 GB. Same name, different chip, different memory. Always check the specific mobile variant — don’t assume laptop specs match desktop specs.

Similarly, the desktop RTX 3060 exists in both 8 GB and 12 GB versions, but the mobile RTX 3060 is always 6 GB. The naming is confusing by design.

Shared Memory vs Dedicated VRAM

Laptops with integrated GPUs (Intel Iris Xe, AMD Radeon 680M/780M) have no dedicated VRAM. They share system RAM with the GPU, which means:

  • The GPU “borrows” 2–4 GB from your system RAM
  • This shared memory runs at system RAM speed (~50 GB/s), not VRAM speed (~200+ GB/s)
  • It’s 4–10x slower for AI workloads than dedicated VRAM
  • Listings that say “up to 16 GB GPU memory” on an integrated GPU are misleading — it’s just your system RAM being shared

The ThinkPad T14 Gen 3 and Dell Latitude 5540 both have integrated GPUs with 0 GB dedicated VRAM. They can run LLMs on CPU, but image generation and fast GPU-accelerated inference are not practical on them.

“16 GB GPU” Doesn’t Always Mean What You Think

Some laptop listings advertise “16 GB GPU Memory” for machines with integrated graphics. This is technically the maximum amount of system RAM the integrated GPU can address — not dedicated VRAM. The actual AI performance of 16 GB shared memory is nowhere near 16 GB of dedicated GDDR6 VRAM.

If a listing doesn’t specify “dedicated” or mention an NVIDIA/AMD discrete GPU model, assume it’s shared memory and treat the effective VRAM as 0.
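
If you have the machine in hand (or a seller willing to run a command), you can check dedicated VRAM directly instead of trusting the listing. The sketch below shells out to `nvidia-smi`, the command-line tool that ships with NVIDIA's drivers, so it only detects NVIDIA GPUs. The function names are ours, and an empty result means only that no NVIDIA GPU was found through this route, not that the laptop has no discrete GPU.

```python
import shutil
import subprocess


def parse_nvidia_smi(output: str) -> list[tuple[str, float]]:
    """Turn 'name, memory in MiB' lines into (name, GiB) pairs."""
    gpus = []
    for line in output.strip().splitlines():
        if "," not in line:
            continue
        name, memory_mib = line.rsplit(",", 1)
        try:
            gpus.append((name.strip(), int(memory_mib) / 1024))
        except ValueError:
            continue  # skip rows where memory is not reported as a number
    return gpus


def dedicated_vram() -> list[tuple[str, float]]:
    """Return NVIDIA GPUs and their memory, or an empty list if none are found."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_nvidia_smi(result.stdout)


if __name__ == "__main__":
    found = dedicated_vram()
    if not found:
        print("No NVIDIA GPU detected via nvidia-smi")
    for name, vram_gib in found:
        print(f"{name}: {vram_gib:.1f} GiB dedicated VRAM")
```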

How to Choose: VRAM Tiers for 2026

Tier 1: No Dedicated VRAM (Integrated GPU Only)

What you can do: CPU-only LLM inference (slow — 3–5 tok/s), API-based AI tools (Copilot, ChatGPT, Claude), Whisper transcription on CPU.

What you can’t do: Stable Diffusion, SDXL, FLUX, GPU-accelerated inference, fine-tuning.

Example laptops: ThinkPad T14 Gen 3 (AI Score: 42, £320–£480), Dell Latitude 5540 (AI Score: 38, £280–£420).

Best for: Students on a tight budget who want to learn AI fundamentals and run small models while spending under £500.

Tier 2: 4–6 GB VRAM — The Entry Point

What you can do: Ollama 7B at full GPU speed (20–30 tok/s), Stable Diffusion 1.5, SDXL (tight at 6 GB), basic LoRA with QLoRA.

What you can’t do: FLUX, 13B models on GPU, serious fine-tuning.

Example laptops: Dell Precision 5560 (4 GB VRAM, AI Score: 62, £480–£680), Legion 5 Gen 6 (6 GB VRAM, AI Score: 71, £550–£750).

Best for: Anyone who wants GPU-accelerated AI without spending over £800.

Tier 3: 8–16 GB VRAM — Serious AI Work

What you can do: 13B models comfortably, SDXL and FLUX, LoRA fine-tuning, ComfyUI workflows, multiple models simultaneously.

What you can’t do: 70B models fully on GPU (still need CPU offload), full fine-tuning.

Example laptops: Used RTX 3070/3080 gaming laptops (8–16 GB), ThinkPad P-series with RTX A4000/A5000.

Best for: Professionals and serious hobbyists who need reliable, fast AI inference and image generation.

VRAM and Our Reviewed Laptops

Here’s how the laptops we’ve reviewed stack up on VRAM:

| Laptop | GPU | Dedicated VRAM | AI Score | Price (UK) | Best AI Use Case |
|---|---|---|---|---|---|
| Dell Latitude 5540 | Intel Iris Xe | 0 GB (shared) | 38 | £280–£420 | CPU inference only |
| ThinkPad T14 Gen 3 | AMD Radeon 660M | 0 GB (shared) | 42 | £320–£480 | CPU inference, API tools |
| ThinkPad T14s Gen 4 | AMD Radeon 780M | 0 GB (shared) | 48 | £420–£580 | Faster CPU/iGPU inference |
| Dell Precision 5560 | NVIDIA RTX A2000 | 4 GB GDDR6 | 62 | £480–£680 | SD 1.5, GPU-accelerated LLMs |
| Legion 5 Gen 6 | NVIDIA RTX 3060 | 6 GB GDDR6 | 71 | £550–£750 | SDXL, 13B models (partial CPU offload), ComfyUI |

The jump from 0 GB to 4 GB VRAM takes the AI Score from the 38–48 range to 62. The jump from 4 GB to 6 GB pushes it to 71. That’s because even a small amount of dedicated VRAM unlocks an entirely different category of AI workloads.

Summary

  • VRAM is the GPU’s dedicated memory — separate from system RAM, much faster, and never upgradeable in laptops
  • It’s the #1 bottleneck for local AI — your model must fit in VRAM for fast GPU inference
  • 4 GB is the bare minimum for meaningful GPU-accelerated AI; 6–8 GB is the sweet spot for 2026
  • Integrated GPUs have 0 GB dedicated VRAM — they can only do CPU inference, which is 5–10x slower
  • Always check for dedicated VRAM when shopping — ignore “shared GPU memory” marketing claims
  • For more context on choosing the right used laptop for AI, read our complete buyer’s guide
