GEPA improves prompt performance on arithmetic word problems by evolving multi-component prompts under structured feedback. Starting from a weak seed prompt, GPT-4o-mini solves the tasks and GPT-4.1 serves as the reflection model. Feedback from failed attempts is passed to GEPA, which uses it to guide prompt evolution. The optimized prompt outperforms the baseline on a held-out validation set of 100 problems. The benchmark is deterministic, with problem types including discount, travel, wallet, and chain scenarios.
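To make the deterministic benchmark and the failure feedback concrete, here is a minimal Python sketch. It does not use the GEPA library or reproduce the original benchmark: the templates, number ranges, and the names `make_problem`, `build_benchmark`, and `score_with_feedback` are all hypothetical. It only illustrates how a seeded generator can yield the same problems on every run, and how a scorer can return a short feedback message alongside the numeric score for a reflection model to read.

```python
import random
import re

# Problem types named in the summary; the templates below are invented.
KINDS = ("discount", "travel", "wallet", "chain")


def make_problem(kind, rng):
    """Return (question, integer answer) for one hypothetical template."""
    if kind == "discount":
        price = rng.randrange(20, 200, 10)  # multiple of 10 keeps the answer whole
        pct = rng.choice([10, 20, 50])
        question = f"A jacket costs ${price} and is {pct}% off. What is the sale price?"
        return question, price * (100 - pct) // 100
    if kind == "travel":
        speed, hours = rng.randint(30, 80), rng.randint(2, 6)
        question = f"A car drives {speed} km/h for {hours} hours. How many km does it cover?"
        return question, speed * hours
    if kind == "wallet":
        start = rng.randint(50, 100)
        spent, earned = rng.randint(5, 40), rng.randint(5, 40)
        question = (f"Sam has ${start}, spends ${spent}, then earns ${earned}. "
                    "How many dollars does Sam have?")
        return question, start - spent + earned
    if kind == "chain":
        x, a, b, c = (rng.randint(2, 9) for _ in range(4))
        question = f"Start with {x}, multiply by {a}, add {b}, then subtract {c}. What is the result?"
        return question, x * a + b - c
    raise ValueError(f"unknown problem kind: {kind}")


def build_benchmark(n, seed=0):
    """Build n problems; the same seed always yields the same list."""
    rng = random.Random(seed)  # private RNG, independent of global state
    problems = []
    for i in range(n):
        kind = KINDS[i % len(KINDS)]  # cycle through the problem types
        question, answer = make_problem(kind, rng)
        problems.append({"kind": kind, "question": question, "answer": answer})
    return problems


def score_with_feedback(example, model_output):
    """Score one attempt and describe any failure in plain text."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0, f"No numeric answer found for a {example['kind']} problem."
    predicted = float(numbers[-1])  # treat the last number as the final answer
    if predicted == example["answer"]:
        return 1.0, "Correct."
    feedback = (f"Wrong on a {example['kind']} problem: "
                f"expected {example['answer']}, got {numbers[-1]}.")
    return 0.0, feedback


if __name__ == "__main__":
    validation_set = build_benchmark(100, seed=0)
    first = validation_set[0]
    print(first["question"])
    print(score_with_feedback(first, "I think the answer is 7"))
```

In an actual optimization loop, the feedback string would be collected for failed attempts and handed to the reflection model together with the current prompt; how GEPA consumes it is specific to the library and is not shown here.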
Summary by ByteBrief