The era of artificial intelligence (AI) has arrived fast, and we have seen this in the many tech firms now racing to create their best version of a technology that is still considered quite new. One of the main processes in developing AI is machine learning, which usually requires vast amounts of data and content to teach AI how to respond to users in real time.
Multiple firms, including Apple, Nvidia, Anthropic and Salesforce, have now been reported to have used data taken from YouTube videos without consent to train their AI models.
This was revealed in an investigative report by Proof News, co-published with Wired, which found that YouTube subtitle data had been ripped from the video-sharing platform without permission. The data does not include actual images from the respective videos.
The data was allegedly used to train large language models (LLMs), the type of model behind ChatGPT, whose maker OpenAI recently revealed the new GPT-4o model.
This has been a growing issue in the tech world, with many wanting to know where these companies get the data to build their AI models.
YouTube itself had previously stated that using videos to train AI in this way is a violation of its terms of service. However, the platform is widely regarded as a goldmine of data useful for generative AI.
According to the report, up to 180,000 videos were found to have been used by Apple. The data was reportedly part of a dataset called The Pile, compiled by the nonprofit organisation EleutherAI. The dataset also contains sources such as Wikipedia articles and books, among others.
The report was echoed by YouTubers such as Marques Brownlee (MKBHD), who acknowledged that Apple had obtained data from his own videos through other companies. This allows companies such as Apple to deny any fault because they don't actually scrape the data themselves.
As it stands, none of the companies mentioned have responded to these allegations or revealed where they get their AI training data. Even YouTube has yet to comment.