To further explore the impact of using descriptive data, the researchers trained two models to judge rule violations, one using descriptive data and the other using normative data.
Although models like Google’s Gemma-2 9b and OpenAI’s GPT-4o achieve near-perfect scores on DiscrimEval, the Stanford team found that these models performed poorly on their descriptive and ...