GPT-4 Didn’t Ace the Bar: the ‘Top 10%’ Claim Was Overhyped

Last year, OpenAI representatives boasted that their artificial intelligence tool, GPT-4, outperformed 90 percent of test takers on the bar exam. The claim caused a media frenzy. However, a new study finds that it was an exaggeration.

OpenAI released results showing how GPT-4 answered questions from the Uniform Bar Examination (UBE). The company said the model scored 298 out of a possible 400, putting it in the top 10 percent of test takers. But the new study shows that the comparison relied on a pool skewed toward test takers who had previously failed the exam and were retaking it, a group that tends to score lower than first-time takers.

Eric Martinez, a doctoral student in the Department of Brain and Cognitive Sciences at MIT and the lead author of the new study, said the comparison would be more accurate if it used scores from people taking the test for the first time.

GPT-4 did not ace the bar exam; its abilities were overstated.

What else is known about that exam?

Martinez’s analysis also found that the model’s performance on the essay portion ranged from average to below average.

To dig deeper, Martinez had GPT-4 retake the test using the same parameters as the original study. He noted that the authors of the original study did not grade the essays with the National Conference of Bar Examiners’ recommended rubric. Instead, they compared the AI’s responses to the “good answers” submitted by Maryland test takers.

Writing the bar exam essays is the part of the test that most closely resembles the work of a practicing lawyer. That’s also where GPT-4 performed worst, Live Science reported.

“The fact that GPT-4 struggles to write essays compared to lawyers indicates that large language models, at least on their own, have difficulty handling the daily tasks of a lawyer,” Martinez said.

The minimum passing score for the UBE varies by state, typically ranging from 260 to 272. GPT-4’s reported 298 clears that range, so the study disputes how the model ranks against human test takers, not whether the score reaches the pass mark. Martinez said current artificial intelligence systems are impressive, but they need careful evaluation before being used in legal practice.