News

To further explore the impact of using descriptive data, the researchers trained two models to judge rule violations: one on descriptive data and the other on normative data.
Although models like Google’s Gemma-2 9b and OpenAI’s GPT-4o achieve near-perfect scores on DiscrimEval, the Stanford team found that these models performed poorly on their descriptive and ...