Raymond Chen recalled a Windows team that built an x86-32 emulator using binary translation. A compiler unrolled a 64 KB memory initialization loop into 65,536 individual instructions, consuming 256 KB of code. The engineers added special translator code to detect and replace that bloated function with a tight loop.
Summary by ByteBrief