In a recent study, researchers from Stanford and UC Berkeley found that OpenAI’s ChatGPT, the artificial intelligence-powered chatbot, appears to be getting worse over time. Within just a few months, the newest models had become less capable of answering an identical set of questions accurately.
The study’s authors could not pinpoint a clear reason for the decline in the chatbot’s capabilities. To assess the reliability of the different models, the researchers tested GPT-3.5 and GPT-4 on a range of tasks, including solving math problems, answering sensitive questions, writing code, and visual reasoning.
The results revealed a significant drop in GPT-4’s accuracy compared with its performance in March. Its ability to identify prime numbers, for instance, plunged from 97.6% accuracy in March to just 2.4% in June. The deterioration was also evident in the models’ code generation capabilities.
In addition, ChatGPT’s responses to sensitive questions changed noticeably. While earlier versions of the chatbot explained at length why they would not answer certain sensitive questions, the newer models simply refused, often more curtly. Several of the example prompts in the study concerned ethnicity and gender.
The researchers highlighted the importance of continuously monitoring AI model quality, since the behavior of large language models (LLMs) like ChatGPT can change significantly in a short period. They recommended that users and companies relying on LLM services implement some form of ongoing monitoring to ensure the chatbot’s performance remains up to standard.
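The kind of monitoring the researchers recommend can be approximated with a small evaluation harness: run a fixed set of questions with known answers against the model at regular intervals and track accuracy over time. The sketch below is a hypothetical illustration, not the study’s actual methodology; `ask_model` stands in for a real LLM API call, and the prime-number questions merely echo the task reported in the study.

```python
# Minimal drift-monitoring sketch (hypothetical): score a model on a
# fixed eval set and keep a timestamped accuracy history, warning on
# large drops. `ask_model` is a placeholder for a real LLM call.

from datetime import datetime

def is_prime(n: int) -> bool:
    """Ground-truth prime check used to build the eval set."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def build_eval_set(lo: int, hi: int):
    """Fixed question/answer pairs: 'Is N prime?' -> 'yes'/'no'."""
    return [(f"Is {n} prime? Answer yes or no.",
             "yes" if is_prime(n) else "no")
            for n in range(lo, hi)]

def run_eval(ask_model, eval_set) -> float:
    """Fraction of questions the model answers correctly."""
    correct = sum(1 for q, a in eval_set
                  if ask_model(q).strip().lower() == a)
    return correct / len(eval_set)

def log_accuracy(history, model_name, ask_model, eval_set) -> float:
    """Append a timestamped accuracy record; warn on a >10-point drop."""
    acc = run_eval(ask_model, eval_set)
    if history and acc < history[-1]["accuracy"] - 0.10:
        print(f"WARNING: {model_name} accuracy dropped "
              f"{history[-1]['accuracy']:.1%} -> {acc:.1%}")
    history.append({"when": datetime.now().isoformat(),
                    "model": model_name, "accuracy": acc})
    return acc
```

Scheduled runs of `log_accuracy` (for example, weekly, with `ask_model` wired to the production model) would surface the kind of month-over-month regressions the study describes.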
In response to concerns about potential risks from superintelligent AI systems, OpenAI announced plans to create a team to manage such risks, expecting the arrival of superintelligent AI within the next decade.