Apple, NVIDIA and Anthropic reportedly used YouTube transcripts without permission to train AI models

21.07.2024 09:00 AM
According to a recent Proof News investigation, some of the world's biggest tech companies trained their AI models on a dataset containing transcripts of more than 173,000 YouTube videos, all without the creators' consent.
Apple, NVIDIA, Anthropic and other companies used the dataset, which was created by the nonprofit EleutherAI and contains transcripts of YouTube videos from more than 48,000 channels. The investigation's findings highlight an uncomfortable reality of artificial intelligence: most of the technology is built on data taken from creators without their knowledge or permission.

The collection includes transcripts of videos from major news organizations such as The New York Times, the BBC and ABC News, as well as from popular YouTube creators such as Marques Brownlee and MrBeast. It does not contain any of YouTube's videos or images, only transcripts. Subtitles from some of Engadget's videos are also part of the collection.

Brownlee wrote on X: "Apple has sourced data for their AI from numerous organizations. One of them stole a ton of information and transcripts from YouTube videos, including mine." He added, "This is going to be a long-term, growing problem."

A Google spokesperson confirmed to Engadget that YouTube CEO Neal Mohan has previously said that companies using YouTube's data to train AI models would be violating the platform's terms of service. Apple, NVIDIA, Anthropic and EleutherAI did not respond to Engadget's requests for comment.

AI companies have so far been reluctant to disclose the data they use to train their models. Earlier this month, artists and photographers criticized Apple for not revealing the source of the training data behind Apple Intelligence, the company's take on generative AI, which will come to millions of Apple devices this year.

YouTube in particular, the largest video repository on the planet, is a trove of audio, video and image content in addition to transcripts, which makes it an attractive source of data for training AI models. Earlier this year, when The Wall Street Journal asked OpenAI's chief technology officer, Mira Murati, whether the company had used YouTube videos to train Sora, its upcoming AI video-generation tool, she declined to give a direct answer. "I am not going to go into the details of the data that was used, but it was publicly available or licensed data," Murati said at the time. Alphabet CEO Sundar Pichai has also said that companies using YouTube's data to train their AI models would be violating the platform's terms of service.

You can use Proof News' lookup tool to check whether subtitles from your favorite YouTube channels or videos are included in the dataset.