The New York Times has taken steps to stop its material from being used to develop and train artificial intelligence models.
On August 3, the NYT updated its Terms of Service to forbid using its content in the development of any software programme, including, but not limited to, training a machine learning or artificial intelligence system. The restriction covers text, pictures, audio and video clips, look and feel, metadata, and compilations.
The revised terms also prohibit using automated tools, such as website crawlers, to access, use, or collect such content without the publication's express written consent. According to the NYT, violating these new terms may result in unspecified fines or penalties.
Although the publication added these rules to its terms, it does not appear to have changed its robots.txt file, which tells search engine crawlers which URLs they may access. The move may be a response to Google's recent privacy policy update, which disclosed that the search giant may use publicly available data from the internet to train its AI services, such as Bard and Cloud AI.
However, the New York Times also signed a $100 million deal with Google in February that lets Google feature some of the Times' content on its platforms for the next three years. Because the two companies will collaborate on tools for content distribution, subscriptions, marketing, advertising, and "experimentation," the changes to the NYT's terms of service are likely aimed at rivals such as OpenAI or Microsoft.
According to a recent OpenAI announcement, website owners can now prevent the company's GPTBot web crawler from scraping their sites by adding a rule to their robots.txt files. Many of the large language models behind well-known AI systems such as OpenAI's ChatGPT are trained on large datasets that may include copyrighted material or content scraped from the internet without permission.
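To illustrate how robots.txt-based blocking works, the sketch below uses Python's standard `urllib.robotparser` module to evaluate a hypothetical robots.txt. That file disallows the `GPTBot` user agent across the whole site and allows all other crawlers. The file contents and the `example.com` URL are assumptions made for illustration. They are not the NYT's actual configuration. Note that robots.txt is advisory: it only restrains crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption, not any real
# publisher's file): block OpenAI's GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Parse the rules the same way a compliant crawler would before fetching pages.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/some-story"
for agent in ("GPTBot", "Googlebot"):
    # can_fetch() reports whether this user agent is permitted to crawl the URL.
    print(agent, parser.can_fetch(agent, url))
```

Run as-is, the script reports that GPTBot may not fetch the page while Googlebot may, because only the `GPTBot` group contains a `Disallow: /` rule.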