GPT-4 did not pass the bar exam: the merits of AI are exaggerated.

by Gaby

Last year, OpenAI, the developer of GPT, boasted that its artificial intelligence tool had outperformed 90 percent of test takers on the bar exam. This caused a media frenzy. However, according to a new study, the claim was an exaggeration.

At the time, OpenAI released the results of a study in which GPT-4 answered questions from the Uniform Bar Examination (UBE). According to the developers, their AI language model scored 298 points out of a possible 400, which supposedly placed it in the top ten percent of test takers. However, it has now become clear that the chatbot only reached the top 10 percent among those who had previously failed the exam (one or more times) and were retaking it.

According to Eric Martinez, a doctoral student in the Department of Cognitive Sciences at MIT and the lead author of the new study, a more accurate comparison would be against the scores of those taking the test for the first time.

What else is known about that exam?

Eric Martinez’s conclusions also indicate that the model’s results ranged from average to below average during the essay writing stage.

To study the results further, Mr. Martinez had GPT-4 retake the test according to the parameters set by the authors of the original research. He noted that the original study did not use the essay-grading guidelines established by the National Conference of Bar Examiners, which administers the exams. Instead, its authors simply compared the AI's responses to "good answers" from residents of the state of Maryland.

Meanwhile, the essay portion of the bar exam is the part closest to the tasks performed by a practicing lawyer. It is at this stage that the GPT-4 model showed its worst results, Live Science reported.

“The fact that GPT-4 struggles to write essays compared to lawyers indicates that large language models, at least on their own, have difficulty handling the tasks that a lawyer performs daily,” noted the researcher.

The minimum passing score for this exam varies from state to state, ranging from 260 to 272. Therefore, the GPT-4 score for the essay would not allow the model to pass the overall exam. According to Eric Martinez, while current artificial intelligence systems are undoubtedly impressive, they should be carefully evaluated before being used in legal practice.
