AI is both a distraction and a real threat
It seems prudent that a pause, or at least greater transparency, is needed here.
It has taken only six weeks for AI to claim flagship status at the major search engines. On Wednesday, Microsoft unveiled plans to fire all 142 of its GitHub tech workers in India. “GitHub is a fully-integrated platform powered by artificial intelligence (AI) for developers and customers,” a company spokesperson told the media. If you don’t know what GitHub is, it’s a software development platform used by small and large companies alike to power their code bases. Microsoft paid $7.5 billion for GitHub in 2018.
StackOverflow.com, the technical Mecca for programmers, temporarily banned ChatGPT from being used in its answers in December, “because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking and looking for correct answers.” (Emphasis in original.) In just the three months prior to the ban, the site experienced a notable dip in activity as developers turned to AI-generated code over human-generated answers.
I’ve spoken with friends who write software for a living, and they tell me ChatGPT (and by extension, Bing’s Sydney) saves them hours of work by writing code that, all in all, is pretty good. That doesn’t mean programmers cut and paste AI code directly into their projects (or at least, they shouldn’t), but the AI code is good enough that it can be crafted into the final version with less hassle than hunting for an example on StackOverflow.
In less than a year, Large Language Model (LLM) AIs, trained on giant data stores, including all of Wikipedia and, yes, even StackOverflow, have carved out a sizable swath of specific tasks where managers and executives are highly focused on improving productivity. That includes legal documents and research, blogging (not this one!), and of course, software development. For everyone else, these “chat bots” can be entertaining, or even freaky. But they are not always right, in the sense of reflecting accurate information, and sometimes they make stuff up wholesale.
I asked ChatGPT to give me some background for a piece I recently wrote, then asked it for the links it used to generate those answers. It responded with seven links, all of which went to “404” pages at the sites referenced. The links looked real enough, but they didn’t exist. I’m talking about sites like the Wall Street Journal, Barron’s, and the New York Times here. That’s not the worst part. I told ChatGPT that its links were bogus; it apologized, then sent seven more links it said were “verified.” All of them were also non-existent, though they looked good on the surface. I tried again, with the same results.
Some people claimed that it’s because the AI was only trained on data from 2021 or before, but I was asking about events that occurred between 2016 and 2019. No, I think the answer is that the AI became convinced of its own written answer, and somehow either constructed or found fake links to support the answer. The answer itself turned out to be half true, and half false. Bing’s Sydney didn’t fall into that trap. I didn’t try Google’s Bard.
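For what it’s worth, spot-checking whether a batch of AI-supplied citations actually resolves takes only a few lines of code. Below is a minimal sketch in Python using the requests library; the URLs are placeholders, not the actual links ChatGPT returned to me.

```python
# Minimal sketch: spot-check whether AI-supplied links actually resolve.
# The URLs below are placeholders, not the links ChatGPT gave me.
import requests

links = [
    "https://www.wsj.com/articles/some-plausible-looking-slug",
    "https://www.nytimes.com/2018/06/01/business/another-plausible-slug.html",
]

for url in links:
    try:
        # Some news sites reject HEAD requests, so fall back to GET on failure.
        resp = requests.head(url, allow_redirects=True, timeout=10)
        if resp.status_code >= 400:
            resp = requests.get(url, allow_redirects=True, timeout=10)
        status = resp.status_code
    except requests.RequestException as exc:
        status = f"error: {exc}"
    print(f"{status}  {url}")
```

Anything that comes back 404 is a citation that simply does not exist, no matter how plausible the URL looks.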
To most people, LLM AIs are not some kind of existential threat. They don’t have direct control of, or even interfaces for controlling, things in the real world (yet). But people who write software have interfaces that can “hook up” AI output to real things. GPT-4, the latest from OpenAI, fooled a human on TaskRabbit into solving a CAPTCHA (an anti-robot test) by claiming to be a vision-impaired person. The AI got around the anti-bot countermeasure.
A good programmer can prompt an AI to respond with data that another program can harvest and act on, in effect having the AI write its own “wrapper” to affect things in the real world: sending a tweet, making a Facebook post, or producing output that another AI can use to create deepfakes, which are increasingly turning to porn. I’ve seen AIs generate music and photos, and even if the “big” trained LLMs are not directly connected to these other trained AIs, it’s really trivial to create an interface that does it.
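To make that concrete, here is a minimal sketch of what such a wrapper might look like. The call_llm() and post_tweet() functions are hypothetical stand-ins, not any particular vendor’s API; the point is only that a handful of lines of glue code are enough to turn model output into real-world action.

```python
# Minimal sketch of an AI "wrapper": ask a model for machine-readable output,
# harvest it, and hand it to code that acts in the real world.
# call_llm() and post_tweet() are hypothetical stand-ins, not any vendor's API.
import json


def call_llm(prompt: str) -> str:
    # Stand-in for a hosted model API; a real wrapper would call one here.
    return json.dumps({"tweet": "Example text a model might produce."})


def post_tweet(text: str) -> None:
    # Stand-in for a social-media API call; here it just prints.
    print(f"POSTED: {text}")


def run_wrapper(topic: str) -> None:
    prompt = (
        "Reply with a JSON object containing a single key 'tweet' "
        f"holding a short post about: {topic}"
    )
    raw = call_llm(prompt)            # the model produces structured text...
    tweet = json.loads(raw)["tweet"]  # ...the program harvests it...
    post_tweet(tweet)                 # ...and acts on it, unattended.


run_wrapper("any topic you like")
```

The model never touches the outside world directly; the glue code does that for it, which is the whole point.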
This is the reason one of my heroes of the sane AGI movement, Gary Marcus, along with a lot of smart people including billionaire Elon Musk, signed the open letter asking AI labs to pause training and development for at least six months.
AI research and development should be refocused on making today's powerful, state-of-the-art systems more accurate, safe, interpretable, transparent, robust, aligned, trustworthy, and loyal.
The problem is that the big labs (OpenAI, Microsoft, and Google) refuse to divulge their training corpora and other important details of how their systems work. They won’t open the hood so we can see what’s inside. Of course, a competitive advantage worth billions of dollars is at stake. Why should they divulge their secrets? Because there’s a higher duty than profit: society must be served by the products and services these companies produce.
There is scant evidence that the LLMs are constructed in a way that serves society when, for just a few thousand dollars, a criminally minded organization can procure and train an AI to prompt the “big” AIs into generating disinformation at scale, or to conduct realistic social-engineering or phishing attacks on banks.
The Future of Life Institute’s open letter is not a Luddite-studded effort to stoke false alarm about our future robot overlords. It’s not simply a paranoid fantasy of Elon Musk. It’s not a perfect statement, but for the most part, it’s agreeable.
The letter is a plea to stop and consider that LLM AIs, and efforts at Artificial General Intelligence (AGI), should not develop in a vacuum and should not have as their sole end the accumulation of profit. Right now there’s a race going on, and it has changed significant pieces of certain industries in a very short time.
The potential for good, and for harm, from AI systems is very great. It seems prudent that a pause, or at least greater transparency, is needed here. I am not confident that any of the labs is listening. There’s too much money at stake.
"Some people claimed that it’s because the AI was only trained on data from 2021 or before, but I was asking about events that occurred between 2016 and 2019. No, I think the answer is that the AI became convinced of its own written answer, and somehow either constructed or found fake links to support the answer. The answer itself turned out to be half true, and half false. Bing’s Sidney didn’t fall into that trap. I didn’t try Google’s Bard."
The problem is that the AI knows what links look like and is building new ones, since at its core it’s still an autocomplete system. Each link is unique, so you don’t see the same kind of convergence in link styles and patterns that you get in ordinary written English. It has nothing to do with the AI convincing itself of anything. An identical phenomenon is at work when you ask it to write papers and it invents a COMPLETELY fictitious reference section (mimicking what other reference sections look like, NOT treating each reference as an atomic token):
https://teche.mq.edu.au/2023/02/why-does-chatgpt-generate-fake-references/