Solving Hallucinations
Solving hallucinations in legal briefs is playing on easy mode, and it is still too hard.

All LLMs ever do is hallucinate. Sometimes we like their hallucinations, sometimes we don’t. And a lot of people are claiming to have solved LLM hallucinations, because they’re a clear impediment to business adoption.
Those people are hallucinating, along with their machines.
Solving hallucinations in legal briefs is playing on easy mode. You chain your main LLM output through a thing which detects legal citations. That detection is not a trivial thing, but you can find a lot just by looking for /v\./, as in Brown v. Board of Education. That gets you a lot, and it may not be enough. That’s ok! A quick search reveals open source code that matches a case name and rewrites it to be a link to Cornell’s law site. Looking up the case was going to be my step 2: check the citation you’ve matched against a database of cases. There are lots of those. You can chain that code with your LLM today, and if there are too many failures, toss the whole response. I could have ChatGPT write the code for me in an afternoon.
Sure, my afternoon’s code would be imperfect, and there are certainly edge cases that make this hard. There are probably more sophisticated approaches available... but even the simple check isn’t getting done. An article in Lawnext, “Two More Cases, Just This Week, of Hallucinated Citations in Court Filings Leading to Sanctions,” documents a brief in which 22 of the 24 cases cited were inaccurate. That’s not a result of edge cases slipping through. It’s a result of not trying. In fact, in that case, 20 of the 22 bad citations would be identified by a check for ‘v.’ Of those, 5 are real case names, what the court refers to as ‘Fictitious citation using a real case name.’ Those would be distinguished by my second check, the database lookup. So really trivial validation could have revealed that 17 of the 22 cases didn’t even exist.
Perhaps we can attribute this to LLMs not being fine tuned (in the English sense, not the technical one) for legal cases. But all the public chatbots are wrapped in layers of ‘safety,’ so it’s hard to see this as an oversight. Either that, or the many, many ways in which LLMs might need to be tuned mean that general purpose chatbots are going to remain dangerous. It’s more reasonable to see it as an indication that preventing damaging hallucinations is not just a complex task or a hard bit of engineering: it’s unclear if it’s even possible once you get beyond easy mode.
The observation that all LLMs do is hallucinate is by Andrej Karpathy. Image by Midjourney.