Ollama Laptop Requirements: VRAM & RAM for Local LLMs (2026)

Q: How much RAM do I need for local LLMs?

At least 16 GB for 7B models, and 32 GB to run 13B models comfortably or to keep other apps open. For CPU-only inference of larger models (30B+), 64 GB is the practical floor. RAM matters most when the model does not fit in VRAM and spills to system memory.

Q: What does quantisation (Q4, Q8) mean for requirements?

Quantisation shrinks a model by storing weights at lower precision. Q4 roughly halves the memory of Q8, so a 13B model needs about 8 GB at Q4 versus 14 GB at Q8, with a small quality drop. On a used laptop, Q4_K_M is the usual sweet spot between size and output quality.
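The sizes above come from simple arithmetic: parameters × bits per weight ÷ 8. Below is a minimal sketch that assumes approximate effective bit-widths for the GGUF formats Ollama uses: about 4.85 bits for Q4_K_M and 8.5 bits for Q8_0, which is the format the "Q8" label usually refers to. Real files differ slightly by model, and the result covers the weights only, not the context cache or runtime buffers.

```python
# Approximate effective bits per weight for common GGUF quantisation types.
# Assumption: K-quants mix block formats, so real files vary slightly by model.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,   # 32 int8 weights plus one 16-bit scale per block
    "F16": 16.0,
}


def weight_memory_gb(params_billion: float, quant: str) -> float:
    """Estimate the size of the model weights in GB (10**9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    total_bytes = params_billion * 1e9 * bits / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for quant in ("Q4_K_M", "Q8_0"):
        print(f"13B at {quant}: ~{weight_memory_gb(13, quant):.1f} GB")
```

For a 13B model this gives roughly 7.9 GB at Q4_K_M and 13.8 GB at Q8_0, in line with the ~8 GB and ~14 GB figures quoted here.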

Ollama is the easiest way to run large language models locally, and "will it run on my laptop?" has a fairly precise answer: it depends on VRAM first, then RAM, then CPU. This guide gives realistic numbers for each model size, so you can match a used laptop to the models you actually want to run.

If you are still deciding whether a dedicated GPU is worth it, start with our explainer on what VRAM is and why it matters for AI — it underpins everything below.

The one rule: the model must fit in memory

An LLM has to load its weights into memory before it can generate a single token. Where those weights live decides your speed:

In VRAM (dedicated GPU memory): fastest by far. On a 6–8 GB laptop GPU, a 7B model at Q4 runs at 25–38 tokens/second.
In system RAM (CPU inference): works, but slow, at 3–6 tokens/second for a 7B model.
Split across VRAM and RAM: when a model is too big for VRAM, Ollama offloads as many layers as fit to the GPU and runs the rest on the CPU. Speed lands between the two and falls quickly as more layers stay on the CPU.

So the question is always: does the quantised model fit in my VRAM? If yes, you get fast inference. If no, you fall back to RAM and accept lower speed.
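The fit-or-spill decision can be sketched in a few lines of Python. This is a simplified illustration, not how Ollama's scheduler actually works. It treats every layer as the same size and reserves a hypothetical 1 GB of VRAM for the context cache and buffers. It models split speed by adding the time each part spends per token. The 40-layer count matches Llama 2 13B; the 20 and 2 tok/s rates are hypothetical.

```python
import math


def plan_offload(model_gb: float, n_layers: int, vram_gb: float,
                 overhead_gb: float = 1.0) -> tuple[str, int]:
    """Decide where a model runs and how many layers go to the GPU.

    Simplifying assumptions: all layers are the same size, and overhead_gb of
    VRAM is reserved for the context cache and runtime buffers.
    """
    usable_gb = max(vram_gb - overhead_gb, 0.0)
    # Whole layers that fit in usable VRAM (layer size = model_gb / n_layers).
    layers_on_gpu = min(n_layers, math.floor(usable_gb * n_layers / model_gb))
    if layers_on_gpu == n_layers:
        return "gpu", layers_on_gpu
    if layers_on_gpu == 0:
        return "cpu", 0
    return "split", layers_on_gpu


def split_tokens_per_second(gpu_fraction: float, gpu_tps: float,
                            cpu_tps: float) -> float:
    """Estimate split speed: each token pays for its GPU share plus its CPU share."""
    seconds_per_token = gpu_fraction / gpu_tps + (1 - gpu_fraction) / cpu_tps
    return 1 / seconds_per_token


if __name__ == "__main__":
    # 13B at Q4 (~8 GB, 40 layers) on a 6 GB laptop GPU
    placement, layers = plan_offload(8.0, 40, 6.0)
    print(placement, layers)  # split 25
    # Hypothetical full-GPU and CPU-only rates for the same model
    print(f"~{split_tokens_per_second(layers / 40, 20.0, 2.0):.1f} tok/s")  # ~4.6 tok/s
```

The split estimate lands much closer to the CPU rate than the GPU rate. This is why a model that only just misses fitting in VRAM still feels slow.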

Requirements by model size and quantisation

The table below shows the memory each model needs at common quantisation levels, plus realistic throughput on three hardware tiers. Tokens/second are estimates for interactive single-user use.

Model	Quant	Memory needed (weights)	CPU only (RAM)	iGPU (780M)	dGPU 6–8 GB
7B/8B (e.g. Llama 3 8B)	Q4_K_M	~5 GB	3–6 tok/s	6–10 tok/s	25–38 tok/s
7B/8B (e.g. Llama 3 8B)	Q8	~8 GB	2–4 tok/s	4–7 tok/s	18–26 tok/s
13B	Q4_K_M	~8 GB	1.5–3 tok/s	3–5 tok/s	12–20 tok/s
13B	Q8	~14 GB	1–2 tok/s	n/a	needs 16 GB VRAM
34B	Q4_K_M	~19 GB	0.5–1 tok/s	n/a	split only
70B	Q4_K_M	~40 GB	needs 64 GB RAM	n/a	n/a (laptop)
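The throughput columns are estimates. To check what your own laptop achieves, you can time a generation through Ollama's local REST API. With `"stream": false`, the `/api/generate` response includes `eval_count` (tokens generated) and `eval_duration` (generation time in nanoseconds). The sketch below assumes Ollama is running on its default port 11434 and that the example model tag has already been pulled. Running `ollama run <model> --verbose` prints a similar "eval rate" without any code.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a default install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(resp: dict) -> float:
    """Generation speed from a non-streaming /api/generate response."""
    # eval_duration is reported in nanoseconds.
    return resp["eval_count"] * 1e9 / resp["eval_duration"]


def benchmark(model: str, prompt: str) -> float:
    """Send one prompt and return the measured tokens/second."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return tokens_per_second(json.load(r))


# Example (requires a running Ollama server):
#   print(f"{benchmark('llama3:8b', 'Explain VRAM in one paragraph.'):.1f} tok/s")
```

Run it a few times with a prompt of realistic length. The first call also includes model load time in its total duration, although `eval_duration` itself covers only generation.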

Key takeaways:

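7B at Q4 is the universal baseline: it runs on almost anything with 8 GB of RAM, though 16 GB leaves room for the OS and other apps.
13B at Q4 needs ~8 GB: it runs partly on the GPU with a 6 GB dGPU and mostly on the GPU with an 8 GB dGPU, where a few layers may spill to RAM once the context cache is added.
13B at Q8 realistically needs 16 GB VRAM, the territory of the ThinkPad P15 Gen 2. 34B at Q4 (~19 GB) exceeds even that, so it still runs split between VRAM and RAM.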
70B is not a laptop GPU workload; it only runs slowly on 64 GB of system RAM.

Matching hardware to your needs

CPU-only laptops (no dedicated GPU). Machines like the ThinkPad T14 Gen 3 run 7B models on CPU at 3–6 tok/s — fine for occasional questions, drafting and learning. Get 16 GB RAM minimum, 32 GB to run 13B. A modern iGPU (Radeon 780M) adds a modest boost over older integrated graphics.

6–8 GB dGPU laptops. This is the sweet spot for most people. A 6 GB card runs 7B fully on the GPU and 13B partially; an 8 GB card, as in some Lenovo Legion 5 Gen 7 configurations, runs 13B at Q4 mostly on the GPU and leaves room for longer context on 7B. Expect 25–38 tok/s on 7B, which is faster than you can read.

16 GB dGPU laptops. Only needed if you want 13B at high quality (Q8), long-context work, or 34B models with most of their layers on the GPU. The ThinkPad P15 Gen 2, configured with a 16 GB GPU, is the used-market option here.

RAM still matters even with a GPU

VRAM runs the model, but system RAM holds everything else: the OS, your editor, the browser, and any model layers that spill out of VRAM. For GPU inference, 16 GB RAM is the floor and 32 GB is comfortable. For CPU inference of larger models, RAM is the model store: 64 GB gives 30B-class models comfortable headroom and unlocks 70B at Q4 (~40 GB), far more than any laptop GPU can hold.
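As a rough budget, RAM has to cover a fixed baseline for the OS and apps, plus whatever weights do not fit in VRAM. The sketch below assumes a hypothetical 10 GB baseline (OS, browser, editor) and 1 GB of VRAM reserved for the context cache. It then rounds up to a common laptop RAM size. Both figures are assumptions to adjust for your own workload.

```python
# Common laptop RAM configurations, in GB.
STANDARD_RAM_GB = (8, 16, 32, 64, 96, 128)


def ram_needed_gb(model_gb: float, vram_gb: float, base_gb: float = 10.0,
                  vram_reserve_gb: float = 1.0) -> float:
    """Baseline for the OS and apps plus the model weights that spill out of VRAM."""
    usable_vram = max(vram_gb - vram_reserve_gb, 0.0)
    spilled = max(model_gb - usable_vram, 0.0)
    return base_gb + spilled


def smallest_standard_ram(needed_gb: float) -> int:
    """Round a requirement up to the next common laptop RAM size."""
    for size in STANDARD_RAM_GB:
        if size >= needed_gb:
            return size
    raise ValueError(f"{needed_gb:.0f} GB is beyond typical laptop RAM sizes")


if __name__ == "__main__":
    cases = [
        ("13B Q4 on a 6 GB GPU", 8.0, 6.0),
        ("13B Q4, CPU only", 8.0, 0.0),
        ("70B Q4, CPU only", 40.0, 0.0),
    ]
    for label, model_gb, vram_gb in cases:
        need = ram_needed_gb(model_gb, vram_gb)
        print(f"{label}: ~{need:.0f} GB -> buy {smallest_standard_ram(need)} GB")
```

With these assumptions the three cases come out at 16 GB, 32 GB and 64 GB respectively, matching the recommendations in this guide.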

Practical tips for used laptops

Buy on VRAM, not GPU name. A “laptop” GPU often has less VRAM than its desktop namesake. Verify the actual VRAM in GPU-Z before buying.
Start at Q4_K_M. It is the best size-versus-quality trade-off for local use; only move to Q8 if you have spare VRAM.
Watch context length. Long prompts and large context windows consume extra memory on top of the model weights — leave headroom.
NVMe matters for load time, not speed. A fast SSD loads the model into memory quicker but does not change tokens/second once it is running.
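The context headroom mentioned above can be estimated from the model's architecture. For every layer, KV head and token, the key/value cache stores one key vector and one value vector. This minimal sketch assumes an unquantised 16-bit cache (2 bytes per value). The layer and head counts are those of Llama 3 8B (32 layers, 8 KV heads, head size 128) and Llama 2 13B (40 layers, 40 KV heads, head size 128). In Ollama, the context window is set with the `num_ctx` parameter.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Size of the key/value cache in GiB for a full context window."""
    # Factor 2: one key vector and one value vector per head, layer and token.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 2**30


if __name__ == "__main__":
    print(f"Llama 3 8B, 8192 tokens: {kv_cache_gib(32, 8, 128, 8192):.3f} GiB")
    print(f"Llama 2 13B, 4096 tokens: {kv_cache_gib(40, 40, 128, 4096):.3f} GiB")
```

This works out to 1 GiB for Llama 3 8B at 8,192 tokens and about 3.1 GiB for Llama 2 13B at only 4,096 tokens. The 13B model has no grouped-query attention, so every head keeps its own keys and values. That extra cache is one reason a 13B model that "fits" at ~8 GB can still spill out of an 8 GB card once the context grows.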

Frequently Asked Questions

How much VRAM do I need to run Ollama? For a smooth experience, 6–8 GB of VRAM runs 7B models fully on the GPU and 13B models at Q4 partly (6 GB) or mostly (8 GB) on the GPU. You can run Ollama with no dedicated VRAM at all, since it falls back to CPU and system RAM, but expect 3–6 tokens per second on a 7B model instead of 25–38 on a dGPU.

Can I run Ollama without a dedicated GPU? Yes. Ollama runs on CPU using system RAM, and a modern integrated GPU gives a small boost. A 7B model at Q4 takes about 5 GB, so 8 GB of RAM is the bare minimum (16 GB is more comfortable), and it runs at 3–6 tokens per second. It works fine for light use; it is just slower.

For a hand-picked list of machines that hit these targets, see our best used laptops for local LLMs roundup.