Can I use system RAM instead of VRAM for AI?

Technically, yes. Tools like Ollama can offload model layers to system RAM when a model doesn't fit in VRAM. But system RAM has several times less bandwidth than VRAM (roughly 4–10x less in typical laptops), so inference speed drops dramatically. A 7B model that runs at 30+ tokens per second on the GPU might manage 4–5 tok/s on the CPU with system RAM.

Is 4 GB VRAM enough for AI in 2026?

Barely. 4 GB lets you run 7B models in Q4 quantisation through Ollama, plus Stable Diffusion 1.5, but you'll hit the ceiling quickly. SDXL, FLUX, and 13B models all need more. If you're buying now, 6 GB is the realistic minimum.

Does AMD VRAM work for AI the same as NVIDIA?

Not in practice. Most AI software is built first for NVIDIA's CUDA platform. AMD's ROCm alternative exists but has patchy support and frequent compatibility issues. For AI work, stick to NVIDIA GPUs.

Can I add more VRAM to my laptop later?

No. Laptop VRAM is soldered onto the board next to the GPU and cannot be upgraded. Unlike system RAM, which uses replaceable SO-DIMM slots in many laptops, VRAM is fixed at purchase. This makes it the single most important spec to get right when buying.

What Is VRAM and Why Does It Matter for AI Development?

VRAM — Video Random Access Memory — is the dedicated memory on your graphics card. If you’re planning to run AI models locally on your laptop, VRAM is the single most important specification you need to understand. It determines what models you can run, how fast they run, and whether certain tasks are possible at all.

This guide explains what VRAM actually is, how it differs from regular RAM, exactly how much you need for different AI workloads, and the common traps to avoid when shopping for a used laptop.

VRAM vs RAM — What’s the Difference?

Think of your laptop as a kitchen. System RAM is your countertop — it’s where the CPU does its general work, holding your browser tabs, code editor, operating system, and whatever else is running. VRAM is a separate, smaller countertop inside a specialised appliance (the GPU) that’s designed exclusively for graphics and parallel computation.

When you run an AI model on your GPU, the entire model needs to fit onto that GPU countertop — the VRAM. If it doesn’t fit, one of two things happens: either the model refuses to load, or it partially spills over onto the CPU’s countertop (system RAM), which is dramatically slower.

Here’s the critical distinction:

	System RAM	VRAM
Located on	Motherboard (SO-DIMM slots or soldered)	Soldered beside the GPU chip (never upgradeable)
Typical laptop sizes	8–64 GB	0–16 GB
Bandwidth	~50–90 GB/s (dual-channel DDR4/DDR5)	~200–600 GB/s (GDDR6)
Used by	CPU, operating system, applications	GPU, AI model weights, image generation
Upgradeable?	Often yes (SO-DIMM slots)	Never

The bandwidth difference is crucial. VRAM can shuttle data 4–10x faster than system RAM. When an AI model generates tokens or renders an image, it needs to read and write billions of numbers per second. Fast VRAM makes this feasible; slow system RAM makes it painful.
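Where do those bandwidth figures come from? Peak memory bandwidth is roughly the memory bus width multiplied by the per-pin data rate, divided by 8 to convert bits to bytes. The short sketch below applies that formula to a few assumed configurations: a dual-channel DDR4-3200 laptop, and two GDDR6 buses running at 14 Gbps. Real laptops vary, so treat the output as illustrative rather than as the spec of any particular model.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gtps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bus width (bits) x transfers/s per pin / 8."""
    return bus_width_bits * data_rate_gtps / 8


# Assumed example configurations: (bus width in bits, data rate in GT/s per pin).
CONFIGS = {
    "System RAM: dual-channel DDR4-3200": (128, 3.2),
    "VRAM: 128-bit GDDR6 at 14 Gbps": (128, 14.0),
    "VRAM: 192-bit GDDR6 at 14 Gbps": (192, 14.0),
}

ram_bw = peak_bandwidth_gbs(*CONFIGS["System RAM: dual-channel DDR4-3200"])
for name, (width, rate) in CONFIGS.items():
    bw = peak_bandwidth_gbs(width, rate)
    # Show each configuration's peak and its ratio to the system RAM baseline.
    print(f"{name}: {bw:.1f} GB/s ({bw / ram_bw:.1f}x system RAM)")
```

These are theoretical peaks. Sustained bandwidth in real workloads is lower for both kinds of memory, but the gap between them stays roughly the same.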

A laptop with 32 GB of system RAM and 0 GB of VRAM cannot run Stable Diffusion at a usable speed. A laptop with 16 GB of system RAM and 6 GB of VRAM can. The numbers on the spec sheet that matter most are the ones next to “VRAM” or “Dedicated GPU Memory.”

Why VRAM Is Critical for AI

Every major local AI task is fundamentally limited by VRAM.

Loading LLMs into GPU Memory

When you run a language model through Ollama or LM Studio, the model weights need to sit in memory. Quantisation (compressing the model from 16-bit to 4-bit precision) reduces the memory footprint dramatically, but even quantised models are large:

7–8B parameter model (Llama 3 8B, Mistral 7B) in Q4 quantisation: ~4–6 GB
13B parameter model (Llama 2 13B, CodeLlama 13B) in Q4: ~8–10 GB
70B parameter model (Llama 3 70B) in Q4: ~40–43 GB, which on a laptop is only workable with heavy CPU offloading
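As a rough rule, weight memory is the parameter count × bits per weight ÷ 8. The sketch below assumes about 4.8 bits per weight as an average for Q4_K_M-style quantisation. That is an approximation, and actual file sizes vary by model and format. It also counts the weights only. The ranges above add headroom for the context (KV) cache and runtime buffers, which this sketch leaves out.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 10^9 bytes)."""
    # (params_billion * 1e9 weights) * (bits / 8 bytes each) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


Q4_K_M_BITS = 4.8  # assumed average bits per weight for Q4_K_M-style quantisation

for label, params in [("7B", 7), ("13B", 13), ("70B", 70)]:
    fp16 = weight_memory_gb(params, 16)
    q4 = weight_memory_gb(params, Q4_K_M_BITS)
    print(f"{label}: ~{fp16:.0f} GB at 16-bit, ~{q4:.1f} GB at Q4")
```

This also explains why a 13B model is tight at 8 GB. Roughly 7.8 GB of weights leaves almost nothing for the context cache.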

If the model fits entirely in VRAM, you get fast GPU-accelerated inference — typically 20–40 tokens per second. If it doesn’t fit, Ollama offloads layers to system RAM, and speed drops to 3–5 tokens per second. That’s the difference between a usable chat experience and watching paint dry.
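Why does spilling over hurt so much? Generating each token requires reading essentially all of the model's weights, so decode speed is largely limited by memory bandwidth. The sketch below is a simplified upper-bound model built on that assumption. It ignores compute time, KV-cache reads and PCIe transfers. The bandwidth figures (336 GB/s for VRAM, 51.2 GB/s for system RAM) and the model sizes are illustrative assumptions. Real speeds land well below these bounds, but the result follows the pattern described above: the slow portion held in system RAM dominates the time per token.

```python
def max_tokens_per_second(model_gb: float, vram_for_weights_gb: float,
                          gpu_bw_gbs: float, cpu_bw_gbs: float) -> float:
    """Upper bound on decode speed when weights are split between VRAM and system RAM.

    Assumes every weight is read once per token and nothing else costs time,
    so real-world speeds will be noticeably lower.
    """
    on_gpu = min(model_gb, vram_for_weights_gb)
    on_cpu = model_gb - on_gpu  # layers offloaded to system RAM
    seconds_per_token = on_gpu / gpu_bw_gbs + on_cpu / cpu_bw_gbs
    return 1.0 / seconds_per_token


GPU_BW, CPU_BW = 336.0, 51.2  # assumed GDDR6 VRAM and dual-channel DDR4 bandwidths (GB/s)

scenarios = [
    ("7B Q4 (~4.2 GB), all in VRAM", 4.2, 6.0),
    ("7B Q4 (~4.2 GB), all in system RAM", 4.2, 0.0),
    ("13B Q4 (~7.8 GB), 5 GB in VRAM", 7.8, 5.0),
    ("13B Q4 (~7.8 GB), all in VRAM", 7.8, 10.0),
]
for label, size_gb, vram_gb in scenarios:
    bound = max_tokens_per_second(size_gb, vram_gb, GPU_BW, CPU_BW)
    print(f"{label}: at most ~{bound:.0f} tok/s")
```

In the partial-offload case, about a third of the weights sits in system RAM. Even so, that portion accounts for most of the time per token.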

Image Generation

Stable Diffusion and its successors need VRAM for the model weights, the image being generated, and intermediate computation:

Stable Diffusion 1.5: ~4 GB VRAM minimum, 6 GB comfortable
Stable Diffusion XL (SDXL): ~6 GB minimum, 8 GB comfortable
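FLUX.1: ~8 GB minimum with quantised weights (such as NF4 or GGUF variants), 10–12 GB comfortable; at 16-bit the 12B-parameter model alone needs ~24 GB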

Running out of VRAM during image generation usually means an out-of-memory crash — not a graceful slowdown. You either have enough or you don’t.

Fine-Tuning

Fine-tuning a model on your own data is the most VRAM-hungry task. Even efficient techniques like LoRA (Low-Rank Adaptation), which train only small adapter matrices, must hold the frozen base model, the activations for the current training batch, and the adapters' gradients and optimiser state in memory simultaneously:

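LoRA fine-tuning of a 7B model with a 16-bit base: 16 GB+, since the frozen weights alone take ~14 GB
QLoRA (LoRA on a 4-bit quantised base): ~4–6 GB for a 7B model, the most memory-efficient option and the realistic choice on a laptop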
Full fine-tuning: impractical on laptops — use cloud compute
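LoRA is feasible at all because it trains only small low-rank adapter matrices while the base model stays frozen. The sketch below counts the adapter parameters and compares them with the frozen weights. It assumes a Llama-2-7B-like shape (32 layers, hidden size 4096) with rank-8 adapters on the query and value projections. It also assumes roughly 16 bytes of training state per adapter parameter: FP32 weights, gradients, and two Adam moment buffers. Activation memory is not included, and it grows with batch size and sequence length.

```python
def lora_trainable_params(n_layers: int, adapted_shapes: list[tuple[int, int]],
                          rank: int) -> int:
    """Count LoRA parameters: each adapted (d_in -> d_out) matrix gains
    factors A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in adapted_shapes)
    return n_layers * per_layer


# Assumed Llama-2-7B-like shape: 32 layers, 4096x4096 query and value projections.
trainable = lora_trainable_params(32, [(4096, 4096), (4096, 4096)], rank=8)

base_params = 7e9                          # "7B" base model, kept frozen
frozen_fp16_gb = base_params * 2 / 1e9     # 16-bit base (plain LoRA)
frozen_4bit_gb = base_params * 0.5 / 1e9   # 4-bit base (QLoRA), ignoring quantisation metadata
adapter_state_gb = trainable * 16 / 1e9    # FP32 weights + grads + two Adam moments

print(f"Trainable LoRA params: {trainable:,}")
print(f"Frozen base: {frozen_fp16_gb:.1f} GB at 16-bit, {frozen_4bit_gb:.1f} GB at 4-bit")
print(f"Adapter training state: {adapter_state_gb:.3f} GB")
```

Under these assumptions the adapters and their optimiser state need well under 0.1 GB. Most of the budget goes on the frozen base and the activations, which is why quantising the base (QLoRA) makes the biggest difference.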

How Much VRAM Do You Actually Need?

This table covers the most common AI tasks and their real-world VRAM requirements:

Task	Min. VRAM	Comfortable	Notes
Ollama 7B (Q4_K_M)	4 GB	6 GB	Leaves room for system overhead
Ollama 13B (Q4_K_M)	8 GB	10 GB	Tight at 8 GB — close to limit
Stable Diffusion 1.5	4 GB	6 GB	512x512 images, 20–30 steps
SDXL	6 GB	8 GB	1024x1024 images
FLUX.1	8 GB	12 GB	Latest generation, memory-hungry
LoRA fine-tuning (7B)	6 GB	8 GB	Assumes QLoRA (4-bit base); 16-bit LoRA needs 16 GB+
ComfyUI workflows	6 GB	8–12 GB	Depends on workflow complexity
Whisper transcription	2 GB	4 GB	Small model; medium needs ~5 GB, large ~10 GB
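To check a specific laptop against the table, you can encode the figures as a lookup. This sketch uses the table's numbers, taking 8 GB as the comfortable figure for ComfyUI (the low end of its 8–12 GB range). For a given amount of dedicated VRAM, it sorts tasks into comfortable, tight and out of reach. It is only as good as the rough thresholds it encodes.

```python
# VRAM requirements from the table above: task -> (minimum GB, comfortable GB).
VRAM_NEEDS = {
    "Ollama 7B (Q4_K_M)": (4, 6),
    "Ollama 13B (Q4_K_M)": (8, 10),
    "Stable Diffusion 1.5": (4, 6),
    "SDXL": (6, 8),
    "FLUX.1": (8, 12),
    "LoRA fine-tuning (7B)": (6, 8),
    "ComfyUI workflows": (6, 8),
    "Whisper transcription": (2, 4),
}


def classify(vram_gb: float) -> dict[str, list[str]]:
    """Group tasks by how well they fit in the given amount of dedicated VRAM."""
    groups: dict[str, list[str]] = {"comfortable": [], "tight": [], "out of reach": []}
    for task, (minimum, comfortable) in VRAM_NEEDS.items():
        if vram_gb >= comfortable:
            groups["comfortable"].append(task)
        elif vram_gb >= minimum:
            groups["tight"].append(task)
        else:
            groups["out of reach"].append(task)
    return groups


for vram in (0, 4, 6, 8):
    print(f"{vram} GB:", classify(vram))
```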

VRAM in Laptops — Common Traps

Shopping for a used laptop with “good VRAM” is full of potential mistakes. Here’s what to watch for.

Mobile GPU VRAM Is Not Desktop VRAM

The RTX 4090 desktop card has 24 GB of VRAM. The RTX 4090 Laptop GPU has 16 GB. Same name, different chip, different memory. Always check the specific mobile variant — don’t assume laptop specs match desktop specs.

Similarly, the desktop RTX 3060 ships with 12 GB (plus a later 8 GB variant), but the mobile RTX 3060 is always 6 GB. The naming makes this easy to miss.

Shared Memory vs Dedicated VRAM

Laptops with integrated GPUs (Intel Iris Xe, AMD Radeon 680M/780M) have no dedicated VRAM. They share system RAM with the GPU, which means:

The GPU “borrows” 2–4 GB from your system RAM
This shared memory runs at system RAM speed (~50 GB/s), not VRAM speed (~200+ GB/s)
It’s 4–10x slower for AI workloads than dedicated VRAM
Listings that say “up to 16 GB GPU memory” on an integrated GPU are misleading — it’s just your system RAM being shared

The ThinkPad T14 Gen 3 and Dell Latitude 5540 both have integrated GPUs with 0 GB dedicated VRAM. They can run LLMs on the CPU, but image generation is impractically slow and CUDA-based GPU acceleration isn't available.

“16 GB GPU” Doesn’t Always Mean What You Think

Some laptop listings advertise “16 GB GPU Memory” for machines with integrated graphics. This is technically the maximum amount of system RAM the integrated GPU can address — not dedicated VRAM. The actual AI performance of 16 GB shared memory is nowhere near 16 GB of dedicated GDDR6 VRAM.

If a listing doesn’t specify “dedicated” or mention an NVIDIA/AMD discrete GPU model, assume it’s shared memory and treat the effective VRAM as 0.
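If you can test a laptop in person, one quick check for an NVIDIA discrete GPU is nvidia-smi. It ships with NVIDIA's driver and can report each GPU's name and total dedicated memory in MiB. The Python sketch below wraps that query. It assumes the NVIDIA driver is installed and only covers NVIDIA hardware. An empty result therefore means no NVIDIA GPU was found, not that the machine has no discrete GPU of any kind.

```python
import shutil
import subprocess

# nvidia-smi query: GPU name and total memory; "nounits" makes memory a bare MiB number.
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpus(csv_text: str) -> list[tuple[str, float]]:
    """Turn 'name, memory_MiB' lines into (name, GiB) pairs."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mib = line.rsplit(",", 1)  # split on the last comma in case a name contains one
        gpus.append((name.strip(), int(mib) / 1024))
    return gpus


def dedicated_nvidia_vram() -> list[tuple[str, float]]:
    """Return dedicated VRAM per NVIDIA GPU, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver: integrated graphics, or a non-NVIDIA GPU
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpus(result.stdout)


if __name__ == "__main__":
    found = dedicated_nvidia_vram()
    if not found:
        print("No NVIDIA GPU found (AMD discrete GPUs need a different tool).")
    for gpu_name, gib in found:
        print(f"{gpu_name}: {gib:.1f} GiB dedicated VRAM")
```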

How to Choose: VRAM Tiers for 2026

Tier 1: No Dedicated VRAM (Integrated GPU Only)

What you can do: CPU-only LLM inference (slow — 3–5 tok/s), API-based AI tools (Copilot, ChatGPT, Claude), Whisper transcription on CPU.

What you can’t do: Stable Diffusion, SDXL, FLUX, GPU-accelerated inference, fine-tuning.

Example laptops: ThinkPad T14 Gen 3 (AI Score: 42, £320–£480), Dell Latitude 5540 (AI Score: 38, £280–£420).

Best for: Students on a tight budget who want to learn AI fundamentals and run small models while spending under £500.

Tier 2: 4–6 GB VRAM — The Entry Point

What you can do: 7B models in Ollama at full GPU speed (20–30 tok/s; at 4 GB some layers may still spill to system RAM), Stable Diffusion 1.5, SDXL (tight at 6 GB), basic fine-tuning with QLoRA.

What you can’t do: FLUX, 13B models on GPU, serious fine-tuning.

Example laptops: Dell Precision 5560 (4 GB VRAM, AI Score: 62, £480–£680), Legion 5 Gen 6 (6 GB VRAM, AI Score: 71, £550–£750).

Best for: Anyone who wants GPU-accelerated AI without spending over £800.

Tier 3: 8–16 GB VRAM — Serious AI Work

What you can do: 13B models comfortably, SDXL and FLUX, LoRA fine-tuning, ComfyUI workflows, multiple models simultaneously.

What you can’t do: 70B models fully on GPU (still need CPU offload), full fine-tuning.

Example laptops: Used RTX 3070/3080 gaming laptops (8–16 GB), ThinkPad P-series with RTX A4000/A5000.

Best for: Professionals and serious hobbyists who need reliable, fast AI inference and image generation.

VRAM and Our Reviewed Laptops

Here’s how the laptops we’ve reviewed stack up on VRAM:

Laptop	GPU	Dedicated VRAM	AI Score	Price (UK)	Best AI Use Case
Dell Latitude 5540	Intel Iris Xe	0 GB (shared)	38	£280–£420	CPU inference only
ThinkPad T14 Gen 3	AMD Radeon 660M	0 GB (shared)	42	£320–£480	CPU inference, API tools
ThinkPad T14s Gen 4	AMD Radeon 780M	0 GB (shared)	48	£420–£580	Faster CPU/iGPU inference
Dell Precision 5560	NVIDIA RTX A2000	4 GB GDDR6	62	£480–£680	SD 1.5, GPU-accelerated LLMs
Legion 5 Gen 6	NVIDIA RTX 3060	6 GB GDDR6	71	£550–£750	SDXL, ComfyUI, 7B models (13B with CPU offload)

The jump from 0 GB to 4 GB VRAM takes the AI Score from the 38–48 range to 62. The jump from 4 GB to 6 GB pushes it to 71. That’s because even a small amount of dedicated VRAM unlocks an entirely different category of AI workloads.

Summary

VRAM is the GPU’s dedicated memory — separate from system RAM, much faster, and never upgradeable in laptops
It’s the #1 bottleneck for local AI — your model must fit in VRAM for fast GPU inference
4 GB is the bare minimum for meaningful GPU-accelerated AI; 6–8 GB is the sweet spot for 2026
Integrated GPUs have 0 GB dedicated VRAM, so they're limited to CPU (or modest iGPU) inference, which is 4–10x slower
Always check for dedicated VRAM when shopping — ignore “shared GPU memory” marketing claims
For more context on choosing the right used laptop for AI, read our complete buyer’s guide