Allen AI released olmo-eval, an evaluation workbench built on OLMES for the LLM development loop. It reduces the work needed to implement new evaluations, offers flexible run configurations, and makes components easier to compose. The tool targets a recurring challenge in model development: re-evaluating models after each change to data, architecture, or hyperparameters.