OpenAI, the artificial intelligence company, has recently unveiled its latest web crawling tool called “GPTBot,” which holds the potential to enhance future iterations of ChatGPT models.
The company believes that data collected by crawling web pages can be used to improve the accuracy and broaden the capabilities of its upcoming AI models.
Web crawlers, also known as web spiders, are bots that index website content across the internet. Search engines like Google and Bing employ these crawlers to ensure websites appear in search results.
OpenAI clarified that GPTBot will only gather publicly available data from the world wide web, avoiding sources with paywalled content, personally identifiable information, or text that violates its policies.
Website owners can prevent GPTBot from crawling their sites by adding a “disallow” rule for it to robots.txt, the standard file on their servers that tells crawlers which pages they may access.
This feature allows them to control whether their web content is included in the data collection process.
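As a rough illustration of how such a rule behaves, the sketch below uses Python's standard urllib.robotparser module to evaluate a sample robots.txt that disallows the "GPTBot" user agent across a whole site. The file contents, the is_allowed helper, and the example.com URL are illustrative assumptions, not taken from any real site, and an actual crawler may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block GPTBot from the entire site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

Because the rule names only GPTBot, other crawlers that are not listed in the file remain free to fetch the same pages.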
Interestingly, OpenAI filed a trademark application for “GPT-5,” the anticipated successor to their current GPT-4 model.
However, the CEO, Sam Altman, clarified that GPT-5’s training is not imminent, as the company needs to conduct several safety audits before starting the process.
Recent concerns have been raised about OpenAI’s data collection practices, specifically regarding copyright and consent.
In June, Japan’s privacy watchdog issued a warning to OpenAI for collecting sensitive data without proper authorization.
Similarly, Italy temporarily banned the use of ChatGPT due to alleged breaches of European Union privacy laws.
Additionally, 16 plaintiffs filed a class-action lawsuit against OpenAI, accusing the company of accessing private information from ChatGPT user interactions. Microsoft is also named as a defendant in the lawsuit.
If these allegations are proven true, OpenAI and Microsoft could be found in violation of the Computer Fraud and Abuse Act, a law with a history of addressing web-scraping cases.
In conclusion, OpenAI’s new web crawling tool, GPTBot, offers promising potential for improving future ChatGPT models.
However, concerns regarding data collection practices must be addressed to ensure compliance with privacy laws and prevent potential legal repercussions.
As the company gears up for the development of GPT-5, it is essential to prioritize safety audits and adhere to ethical standards in AI research and development.