ChatGPT debuted in the autumn of 2022, and within months experts were warning that artificial intelligence more broadly could represent an existential threat to humankind. In an open letter published in March 2023, more than 33,000 experts and technologists asked Big AI to “pause” further development and testing until that threat could be gauged, prevented and mitigated. The U.S. Senate held hearings with AI developers in May 2023 and called for strict government regulation.

It’s been almost a year now, and it’s time to admit our lawmakers are not up to the challenge. While the European Union has begun to implement regulations on AI, there’s very little that Congress or the executive branch can show for all the justifiable hand-wringing on the issue. President Joe Biden issued an executive order to federal agencies in late 2023, and the result has been a lot of what the administration itself characterizes as “convening,” “reporting” and “proposing.”

The states have begun, in inconsistent and piecemeal fashion, to try to step into the breach left by federal inaction. For example, several states have now criminalized deepfake porn and given victims the right to sue both its makers and its distributors.

While regulation is struggling to get off the ground, Big AI has been plowing ahead with little restraint. The only example of restraint involving Big AI that I’ve been able to identify since ChatGPT’s debut is OpenAI’s recent decision not to release its voice imitation software to the public. I suspect it justifiably anticipated many lawsuits.

But Big AI is not pausing its efforts, and it’s not waiting for proposed regulation. Indeed, Big AI has been using this period of regulatory impotence to dream big and to break down or circumvent any existing fences meant to keep it from its grandiose visions.

Consider what investigative reporters at The New York Times recently uncovered. In a shocking exposé, they found that Google, OpenAI and Meta are doing things they wanted no one to know about. Large language models (LLMs) must be constantly fed more text, not just as a form of updating, but also as a way to become “smarter.” These models are based on probability: specifically, the probability that certain words follow or appear alongside other words.

The smaller the training sample, the less reliable those probabilities are likely to be; the larger the sample, the more accurate they are believed to become, and hence the “smarter” the AI will act. It is also hoped that larger samples might mitigate the “hallucination” problem that has plagued LLMs from the start.
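To make that concrete, here is a toy sketch in Python. It is my own illustration, not code from the Times report or from any AI lab: a bigram model estimates the probability that one word follows another simply by counting adjacent word pairs in its training text. With a tiny corpus the estimates are crude and overconfident; with more text they spread out and better reflect real usage.

```python
from collections import Counter, defaultdict

def bigram_probs(corpus: str) -> dict:
    """Estimate P(next word | current word) by counting adjacent word pairs."""
    words = corpus.lower().split()
    pair_counts = defaultdict(Counter)
    for w1, w2 in zip(words, words[1:]):
        pair_counts[w1][w2] += 1
    # Convert raw counts into conditional probabilities.
    return {
        w1: {w2: n / sum(nexts.values()) for w2, n in nexts.items()}
        for w1, nexts in pair_counts.items()
    }

tiny = "the cat sat"
bigger = "the cat sat . the cat slept . the dog sat . the cat ran"

# With only three words of training text, the model is falsely certain:
# it concludes that "cat" always follows "the".
print(bigram_probs(tiny)["the"])     # {'cat': 1.0}

# With more text, the estimates spread out and look more like real usage.
print(bigram_probs(bigger)["the"])   # {'cat': 0.75, 'dog': 0.25}
```

Real LLMs condition on long stretches of context rather than a single preceding word, but the underlying logic is the same, which is why every additional trove of text is so valuable to them.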

But Big AI has already hoovered up every freely available English-language digitized text. That is astounding, given how much digitized text the world has produced in the internet age. But it’s not enough. Big AI wants more, and it has simply gone ahead and taken it, according to The New York Times’ report.

One example: OpenAI created software to transcribe a million hours of YouTube videos, in violation of the policies of Google, which owns YouTube. Apparently that nicety did not deter OpenAI at all, and it did not stop Google either, which was doing the very same thing, even though copyright in YouTube videos is held by their creators, not by Google. Google is even harvesting text from your publicly available Google documents. Quizlet, Flickr, Instagram, GitHub, podcasts, audiobooks and many other sources have already been harvested.

Meta, on the other hand, considered using its immense wealth to simply buy up publishing companies, such as Simon & Schuster, in order to plunder their copyrighted works for more text to feed its LLMs. No mention was made of what would happen to such a publishing house once it was run by Meta. After all, the future of publishing was not the point. Feeding the machine was all that mattered.

In fact, according to the Times, Meta “also conferred on gathering copyrighted data from across the internet, even if that meant facing lawsuits. Negotiating licenses with publishers, artists, musicians and the news industry would take too long, they said.” In other words, steal it, but offer to pay settlements if anyone objected. Meta also hired contractors to summarize fiction and nonfiction works in order to circumvent copyright law. As one artist put it, “This is the largest theft in the United States, period.”

Even with all of this piracy, experts predict LLMs will have run out of untapped English-language texts by 2026. The great AI machines will then hit a probabilistic ceiling, and their progress will stall. What to do?

If you think Big AI’s piracy approach is morally wrong, consider the alternative path Big AI is weighing to deal with that looming ceiling. This path won’t have the same moral issues, but it will be insane. The second path is that of “synthetic data,” which I’ve written about elsewhere:

“To provide an analogy, imagine there are 10 pieces of information online about a certain subject; AI systems will analyze all 10 to answer questions about the topic. But now let us suppose that the AI system’s outputs are also added to the digital information; perhaps now 3 of the 10 available pieces of information were produced by the very AI system that draws upon that information base for its output. You can see the problem: over time, the corpus of human knowledge becomes self-referential in radical fashion. AI outputs increasingly become the foundation for other AI outputs, and human knowledge is lost.”

Of course, the synthetic data path heads straight to model degeneracy, what researchers have begun calling “model collapse,” as unrealism from hallucinations and spurious homogeneity from AI-generated material become ever more deeply entrenched in the models over time. The LLMs then become progressively stupider or more insane, neither of which is the aim.
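For readers who want to see the mechanism, the decay is easy to simulate. The Python sketch below is again my own toy illustration, and it assumes the extreme case in which each “generation” of a model trains only on the previous generation’s output. Because a word that drops out of the training pool can never return, the vocabulary can only shrink, and rare words vanish first; that is the homogenization problem in miniature.

```python
import random
from collections import Counter

random.seed(0)  # reproducible run

def train_and_sample(corpus, n):
    """'Train' by counting word frequencies, then emit n new words by
    sampling from those frequencies: a crude stand-in for an LLM's output."""
    counts = Counter(corpus)
    words = list(counts)
    weights = [counts[w] for w in words]
    return random.choices(words, weights=weights, k=n)

# Start with 100 words of "human" text spread evenly over 50 distinct words.
corpus = [f"word{i}" for i in range(50)] * 2
print("human corpus:", len(set(corpus)), "distinct words")

# Worst case: each generation trains only on the previous generation's
# output, so vocabulary diversity can only ratchet downward.
for gen in range(1, 8):
    corpus = train_and_sample(corpus, len(corpus))
    print(f"generation {gen}: {len(set(corpus))} distinct words remain")
```

In the real world the mix is gentler, as in the 3-of-10 example quoted above, so the decay is slower, but the direction of travel is the same.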

Relying on increasingly stupid AI is one thing; relying on increasingly insane AI is quite another. The larger question is: where is our government as we hurtle toward this perilous future? The world did not allow private corporations to develop nuclear bombs, and even the construction and maintenance of nuclear reactors are placed under intense government oversight. We need to take AI just as seriously, for the threat to humankind may be just as existential. These rogue companies (and that is what they are) must be restrained by those entrusted with the authority to represent the common good. It is clear they will not restrain themselves, and it’s clear the stakes could not be higher.

Valerie M. Hudson is a university distinguished professor at the Bush School of Government and Public Service at Texas A&M University and a Deseret News contributor. Her views are her own.
