Former OpenAI Staffer Says the Company Is Breaking Copyright Law and Destroying the Internet
A former researcher at OpenAI has come out against the company’s business model, writing in a personal blog post that he believes the company is not complying with U.S. copyright law. That makes him one of a growing chorus of voices arguing that the tech giant’s data-hoovering business rests on shaky (if not plainly illegitimate) legal ground.
“If you believe what I believe, you have to just leave the company,” Suchir Balaji recently told the New York Times. Balaji, a 25-year-old UC Berkeley graduate who joined OpenAI in 2020 and went on to work on GPT-4, said he originally became interested in pursuing a career in the AI industry because he felt the technology could “be used to solve unsolvable problems, like curing diseases and stopping aging.” Balaji worked for OpenAI for four years before leaving the company this summer. Now, Balaji says he sees the technology being used for things he doesn’t agree with, and believes that AI companies are “destroying the commercial viability of the individuals, businesses and internet services that created the digital data used to train these A.I. systems,” the Times writes.
This week, Balaji posted an essay on his personal website in which he argued that OpenAI was breaking copyright law. In the essay, he attempted to show “how much copyrighted information” from an AI system’s training dataset ultimately “makes its way to the outputs of a model.” Balaji’s conclusion from his analysis was that ChatGPT’s outputs do not qualify as “fair use,” the legal doctrine that allows limited use of copyrighted material without the copyright holder’s permission.
“The only way out of all this is regulation,” Balaji later told the Times, in reference to the legal issues created by AI’s business model.
Gizmodo reached out to OpenAI for comment. In a statement provided to the Times, the tech company offered the following rebuttal to Balaji’s criticism: “We build our A.I. models using publicly available data, in a manner protected by fair use and related principles, and supported by longstanding and widely accepted legal precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.”
It should be noted that the New York Times is currently suing OpenAI for unlicensed use of its copyrighted material. The Times claims that the company and its partner, Microsoft, used millions of the newspaper’s articles to train their models, which have since come to compete with the paper in the same market.
The newspaper is not alone. OpenAI is currently being sued by a broad variety of celebrities, artists, authors, and coders, all of whom claim to have had their work ripped off by the company’s data-hoovering algorithms. Other well-known people and organizations who have sued OpenAI include Sarah Silverman, Ta-Nehisi Coates, George R. R. Martin, Jonathan Franzen, John Grisham, the Center for Investigative Reporting, The Intercept, a variety of newspapers (including The Denver Post and the Chicago Tribune), and a variety of YouTubers, among others.
Despite a mixture of confusion and indifference from the general public, the list of people who have come out to criticize the AI industry’s business model keeps growing. Celebrities, tech ethicists, and legal experts are all skeptical of an industry that continues to amass power and influence while introducing troublesome new legal and social dilemmas to the world.