Tech | Hacker News | 5 days ago

A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)

1 min read

A developer ran Gemma 4's 26B-A4B MTP drafters on a recycled server with a single Intel Xeon E5-2620 v4 from 2016, 128 GB of DDR3 RAM, and no GPU. According to the post, the DDR3 RAM is 5-6 times slower than current laptop RAM, and the CPU is about 5 times slower than a modern laptop CPU. Neither Ollama nor standard llama.cpp supported the required model or offered enough configuration knobs for this workload, so the author built a custom inference pipeline to work around those limitations and reach usable performance on the decade-old hardware. The post details how the MTP drafters were quantized and paired with a verifier from a previous guide. The result demonstrates that specialized model architectures can run on hardware far below typical requirements.
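
For readers unfamiliar with the drafter/verifier pairing, here is a minimal sketch of greedy draft-and-verify decoding. It is not the author's pipeline and not Gemma's MTP implementation: `toy_verifier`, `toy_drafter`, `greedy_decode`, and `speculative_decode` are hypothetical names, and the "models" are toy functions over integers. The point it illustrates is that a cheap drafter proposes several tokens, the verifier keeps only the prefix it agrees with, and the final output matches what the verifier would have produced on its own.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def toy_verifier(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for the large model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for a cheap drafter: agrees except after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def greedy_decode(model: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Plain one-token-at-a-time greedy decoding, used as the baseline."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out


def speculative_decode(verifier: NextToken, drafter: NextToken,
                       prompt: List[Token], max_new_tokens: int,
                       draft_len: int = 4) -> List[Token]:
    out = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(out) < target_len:
        # 1. The drafter cheaply proposes draft_len tokens.
        draft: List[Token] = []
        for _ in range(draft_len):
            draft.append(drafter(out + draft))

        # 2. The verifier checks each proposed position. A real engine would
        #    score all positions in one batched forward pass; this toy loop
        #    calls the verifier once per position for clarity.
        accepted: List[Token] = []
        for tok in draft:
            expected = verifier(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First disagreement: keep the verifier's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token was accepted, so the verifier's next token
            # comes along as a bonus.
            accepted.append(verifier(out + accepted))

        out.extend(accepted)
    return out[:target_len]


if __name__ == "__main__":
    print(speculative_decode(toy_verifier, toy_drafter, [0], 12, draft_len=3))
```

Each round yields at least one verifier-approved token, so the loop always makes progress; any speedup depends on how often the drafter's guesses are accepted.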


#xeon #gemma4 #inference

