Monday, November 25, 2024

OpenAI Introduces New Web Crawler GPTBot to Consume more Open Web

The web crawler will gather information from websites that is freely accessible to the public while avoiding content that is paywalled, sensitive, or illegal.

To expand the dataset for training its next generation of AI systems, OpenAI has introduced a new web crawling bot called GPTBot. According to OpenAI, the web crawler will gather information from websites that are freely accessible to the public while avoiding content that is paywalled, sensitive, or illegal.

However, the system is opt-out. By default, GPTBot will presume that available information is open for use, much as the crawlers of search engines like Google, Bing, and Yandex do. To stop the OpenAI web crawler from ingesting a site's pages, the website's owner must add a “disallow” rule for GPTBot to the site's robots.txt file, the standard file that tells crawlers which parts of a site they may visit.
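As a rough illustration of the opt-out, the sketch below uses Python's standard `urllib.robotparser` module to check how a robots.txt rule applies to a crawler. The robots.txt content, the `example.com` URL, and the helper name `is_allowed` are assumptions made for this example. The `GPTBot` user-agent token is the crawler name reported above, but how OpenAI's crawler interprets the file is up to OpenAI, so treat this as a sketch of the general robots.txt convention rather than a description of GPTBot's internals.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block the crawler that identifies
# itself as "GPTBot" from the whole site, and say nothing about others.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/some-page"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "GPTBot", page))     # False: blocked by the rule
    print(is_allowed(ROBOTS_TXT, "Googlebot", page))  # True: no rule applies to it
```

A site with no rule for GPTBot is treated as open to it under this convention, which is what makes the scheme opt-out rather than opt-in.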

Additionally, according to OpenAI, GPTBot will filter scraped material in advance to weed out personally identifiable information (PII) and anything that violates its policies. However, some technology ethicists believe that the opt-out approach still raises consent-related concerns.

Some commenters on Hacker News defended OpenAI’s move, arguing that the company needs to amass as much information as possible if people want a powerful generative AI tool in the future. Another commenter, who was more concerned with privacy, complained that “OpenAI isn’t even quoting in moderation. It obscures the original by creating a derivative work without citing it.”

The launch of GPTBot comes in response to recent criticism of OpenAI for allegedly collecting data illegally to train large language models (LLMs) such as ChatGPT. The company changed its privacy policy in April to address these concerns.

Meanwhile, a recent GPT-5 trademark filing suggests that OpenAI may be working on the next version of its GPT model. The new system would probably rely on large-scale web scraping to refresh and expand its training data. However, there has been no official announcement about GPT-5 so far.

Sahil Pawar
I am a graduate with a bachelor's degree in statistics, mathematics, and physics. I have been working as a content writer for almost 3 years and have written for a plethora of domains. Besides, I have a vested interest in fashion and music.
