OpenAI's o3 AI model scores lower in independent benchmark tests than initially claimed