OpenAI launches GPTBot to enrich ChatGPT
OpenAI, a leading artificial intelligence company, has created GPTBot, an advanced web crawler designed for targeted online data collection, with the aim of further enhancing its models, including the popular ChatGPT.
A New Era of Transparency and Controlled Access
GPTBot stands out for its transparency. Unlike crawlers that conceal their origin, GPTBot identifies itself unambiguously through the “GPTBot” token in its User Agent and a full identification string that marks it as belonging to OpenAI. This allows website owners to exercise more precise control over access.
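As a minimal sketch of how a site might recognise the crawler from that token, the Python function below checks a request's User Agent header for “GPTBot”. The function name `is_gptbot` is hypothetical, and the full header string used as an example follows the form OpenAI documented at launch; treat it as an assumption, since it may change.

```python
import re

# Token the crawler announces in its User-Agent header.
GPTBOT_TOKEN = "GPTBot"

# Assumed example of the full header value (illustrative only).
EXAMPLE_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)

# Match the token as a whole word, ignoring case.
_GPTBOT_RE = re.compile(r"\b" + re.escape(GPTBOT_TOKEN) + r"\b", re.IGNORECASE)


def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the GPTBot token."""
    return _GPTBOT_RE.search(user_agent) is not None


if __name__ == "__main__":
    print(is_gptbot(EXAMPLE_UA))
    print(is_gptbot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Because any client can send an arbitrary User Agent header, a check like this is only a first filter and does not prove that a request really comes from OpenAI.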
Content Selection and Access Criteria
Data selection is targeted. According to OpenAI, pages crawled by GPTBot are filtered to exclude sources that require login or paywall access, that are known to gather personal information, or that contain text violating OpenAI's policies. OpenAI's primary goal is to enrich its artificial intelligence systems by improving their accuracy and capabilities.
Responsibilities of Website Owners
Website owners retain control over GPTBot. They can block the crawler entirely by adding a rule for its User Agent token to their “robots.txt” file, or they can allow access only to specific directories. OpenAI has also published the IP addresses used by the crawler, so that site owners can identify and track its traffic more precisely.
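The robots.txt rules described above can be sketched and checked with Python's standard `urllib.robotparser` module. The rules below are an illustrative assumption, not a recommended policy: the `/public/` directory and the `example.com` URLs are hypothetical, and the helper name `gptbot_allowed` is introduced only for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may read /public/ and nothing else.
# The standard-library parser applies the first matching rule,
# so the Allow line is placed before the blanket Disallow.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /public/
Disallow: /
"""


def gptbot_allowed(robots_txt: str, url: str, agent: str = "GPTBot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(gptbot_allowed(ROBOTS_TXT, "https://example.com/public/page.html"))
    print(gptbot_allowed(ROBOTS_TXT, "https://example.com/private/data.html"))
```

To block the crawler from the whole site instead, the `Allow` line can be dropped so that only `User-agent: GPTBot` and `Disallow: /` remain. Rules of this kind are a request that well-behaved crawlers honour; they do not enforce access control on their own.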
Addressing Criticisms with Concrete Actions
The launch of GPTBot is a direct response to concerns about how large language models such as GPT-4 use data. Although the content in question is publicly available, there is debate over whether explicit consent is required before it is used in AI systems. OpenAI seeks to address these criticisms with concrete initiatives.
Future Planning
If OpenAI continues to source data from third parties, blocking the company's own crawler may have limited impact, because the same content can still reach it through other sources. Even so, the initiative reflects OpenAI's stated commitment to handling transparency and data use responsibly and ethically.
In conclusion, the launch of GPTBot to enrich ChatGPT represents a significant step towards more responsible and controlled management of online data by OpenAI. The company seeks to balance access to publicly available content with respect for privacy and website policies, while refining its artificial intelligence models in line with ethical principles and transparency.