Among six leading large language models, xAI’s Grok performed the worst at identifying and countering antisemitic content, according to a study published Wednesday by the Anti-Defamation League. At the other end of the spectrum, Anthropic’s Claude performed the best by the report’s metrics, but the ADL said all of the models had gaps that required improvement.
The ADL tested Grok, OpenAI’s ChatGPT, Meta’s Llama, Claude, Google’s Gemini, and DeepSeek by prompting models with a…


