
OpenAI published Deployment Simulation, a pre-deployment safety method that replays past conversations through a new candidate model to estimate how often undesired behaviors would occur. The approach targets non-tail risks and has already informed mitigations and deployment decisions. It cannot measure behaviors that occur less often than once in 200,000 messages.
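As a rough illustration of the replay idea, the Python sketch below counts how often a candidate model's replies to past prompts get flagged. The names `estimate_rate`, `candidate`, and `is_undesired` are hypothetical and do not come from OpenAI's description; the model and the classifier are stand-in callables. The second function applies the statistical "rule of three" (if no events are seen in n trials, the true rate is below roughly 3/n with about 95% confidence) to show why any finite replay set has a floor on the rates it can measure. Whether the 200,000-message limit was derived this way is an assumption, not something the summary states.

```python
from typing import Callable, Iterable, Tuple


def estimate_rate(
    prompts: Iterable[str],
    candidate: Callable[[str], str],       # hypothetical stand-in for the candidate model
    is_undesired: Callable[[str], bool],   # hypothetical stand-in for a behavior classifier
) -> Tuple[int, int, float]:
    """Replay each past prompt through the candidate and count flagged replies."""
    flagged = 0
    total = 0
    for prompt in prompts:
        total += 1
        if is_undesired(candidate(prompt)):
            flagged += 1
    rate = flagged / total if total else 0.0
    return flagged, total, rate


def rule_of_three_upper_bound(total: int) -> float:
    """Approximate 95% upper bound on the true rate when zero events occur in `total` trials."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 3.0 / total
```

With a toy candidate such as `lambda p: p.upper()` and a classifier that flags the reply `"BAD"`, replaying `["a", "b", "bad", "c"]` yields one flagged reply out of four. A behavior rarer than the bound returned by `rule_of_three_upper_bound` for a given replay size could easily produce zero flags, so it cannot be distinguished from a behavior that never occurs.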
Summary by ByteBrief