3 Comments
author

"Some people claimed that it’s because the AI was only trained on data from 2021 or before, but I was asking about events that occurred between 2016 and 2019. No, I think the answer is that the AI became convinced of its own written answer, and somehow either constructed or found fake links to support the answer. The answer itself turned out to be half true, and half false. Bing’s Sidney didn’t fall into that trap. I didn’t try Google’s Bard."

The problem is that the AI knows what links look like and builds new ones, since at its core it's still an autocomplete system. Because each link is unique, you don't see the same kind of convergence in link styles and patterns that you get in human English. It has nothing to do with the AI convincing itself of anything. An identical phenomenon is at work when you ask it to write papers and it invents a COMPLETELY fictitious reference section (mimicking what other reference sections look like, NOT treating each reference as an atomic token):

https://teche.mq.edu.au/2023/02/why-does-chatgpt-generate-fake-references/
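The same failure can be shown in miniature. The toy sketch below (my own illustration, nothing like ChatGPT's actual architecture) trains a character-level bigram model on a few made-up URL strings, then samples from it. The output is shaped like a URL but points nowhere, because the model learns surface patterns, not which links actually exist:

```python
import random

# Hypothetical training data: a few URL-shaped strings (not real pages).
training_urls = [
    "https://example.com/articles/fake-references",
    "https://example.org/blog/why-models-hallucinate",
    "https://example.net/papers/language-models",
]

# Count which character follows which: a character-level bigram model.
transitions = {}
for url in training_urls:
    for a, b in zip(url, url[1:]):
        transitions.setdefault(a, []).append(b)

def sample_url(max_len=60, seed=0):
    """Generate a URL-shaped string one character at a time."""
    rng = random.Random(seed)
    out = ["h"]  # every training string starts with "h"
    while len(out) < max_len:
        followers = transitions.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return "".join(out)

print(sample_url())  # looks like a link, resolves to nothing
```

A real language model works on (sub)word tokens with vastly more context, but the principle is the same: each character or token is chosen because it plausibly follows the previous ones, with no lookup against an index of real pages.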

author

From the nerdy nerd land, your answer is most correct. I apologize for anthropomorphizing an autocomplete routine. "Convinced" is not the right word, but it's the only word I could think of to apply to an algorithm that bases its reference material on the text of the generated answer instead of actual indexed links.

author

It's not a bad explanation - but links and references are one of those cases where it's useful to know how it works so that expectations can be calibrated accordingly. I suspect (but haven't tested) that it will make up similar kinds of data "nuggets" consisting of aggregates of the tokens (words) it's dealing with: phone numbers, e-mail addresses, physical addresses, etc.

Didn't mean to be pedantic, and when I get some more time later, I'll find a link I wanted to share from a professor dealing with requests for papers that he didn't actually write, since people are writing papers with this technology and not realizing that the plausible-looking reference section is likely entirely fictional.
