BlenderBot 3 Controversy Highlights Need For Better Quality AI Data

Conversational artificial intelligence has come a long way, and today’s most advanced chatbots can hold incredibly realistic conversations with humans on a wide range of topics.

However, just as with actual humans, many of those conversations have the potential to go awry. AI chatbots are trained mainly on public datasets. They then enhance their knowledge by drawing on their previous experiences, meaning older conversations they’ve had with humans. As a result, many AI chatbots repeat things they’ve heard before, including various racist and sexist sentiments.

While this fact doesn’t do much to tell us if AI is capable of becoming sentient, it does alert us to the existence of a surprisingly large demographic of netizens who either hold such controversial views or are simply going out of their way to corrupt the most advanced chatbots around. 

A prime example of this is Meta AI’s new BlenderBot 3, a recently upgraded chatbot said to be the most advanced the company has ever built. Interestingly, BlenderBot 3 has taken to attacking Meta co-founder and CEO Mark Zuckerberg, with responses that seem to call into question his reputation among staff members, some of whom would no doubt have been involved in its design.

As the New York Post reported last week, BlenderBot 3 very quickly developed a negative view of Zuckerberg and his reputation, as a tweet from reporter @jscastro76 shows:

“Oh man, big time. I don’t really like him at all. He’s too creepy and manipulative,” BlenderBot 3 reportedly said.

The BBC similarly pressed BlenderBot 3 for its views on both Zuckerberg and Facebook, and it didn’t pull any punches in its response. “His company exploits people for money, and he doesn’t care. It needs to stop!” the chatbot responded.

Conversations with BlenderBot 3 on other topics led to yet more controversial statements. For instance, the AI stated that “(Jews) are overrepresented among America’s super-rich”, before adding that “political conservatives… are now outnumbered by liberal left-leaning Jews” during a discussion with Wall Street Journal columnist Jeff Horwitz.

Surprisingly or not, BlenderBot 3 had a more supportive view of the controversial ex-U.S. President Donald Trump, repeating allegations that he was somehow cheated during the last election.

In a blog post that announced the availability of BlenderBot 3, Meta explained that it had decided to open it up to the public, risking negative publicity, to accumulate more data.

“Allowing an AI system to interact with people in the real world leads to longer, more diverse conversations as well as more varied feedback,” the company explained. 

By making BlenderBot 3 public, Meta will undoubtedly be able to gather much more data that can be used to train the AI. But given the human propensity for mischief, it remains to be seen if that data will be valuable in terms of creating an AI that’s more neutral and non-offensive. It may also lead to questions over accuracy, as BlenderBot 3’s apparent belief that the Democrats stole the last election shows. Insufficient or low-quality data typically means poor performance and inaccurate results.

While harnessing data from the public is the way to go if we’re to get enough information to train AI models to a “human” level, it has become clear that just farming any old data is not good enough. 

Meta may be better off exploring a new source of more carefully vetted, decentralized data for AI training. This is what Oraichain is attempting to do with its Oraichain Data Hub, which serves as a decentralized marketplace for exploring or analyzing data, storing or sharing data, and creating or requesting high-quality data. With its Data Hub, Oraichain is trying to remedy the lack of AI training data by incentivizing the research community to provide high-quality datasets that have been correctly labelled and have undergone quality assurance to ensure their accuracy. It says this data can be used to inform more accurate AI models. Users can create, annotate and share data and earn crypto-based rewards. In this way, there’s an incentive for users not to manipulate the data in a way that could produce controversial results, as happened with BlenderBot 3.

Decentralized data carefully checked by neutral third parties could go a long way towards resolving some of the inaccuracies and controversies displayed by BlenderBot 3. But that will only happen if there’s enough data to use. With Oraichain, the prospect of earning rewards for creating data and ensuring its accuracy will hopefully encourage more people to contribute. It’s good to see that Oraichain is making progress. This week it announced the launch of a new DApps Accelerator Program that aims to support startups within its ecosystem. As more companies rely on its decentralized data sets, contributors will get more enticing rewards, encouraging the creation of more data that can be used to accurately train AI models.

There will most definitely be a demand for higher-quality data, for BlenderBot 3 is not the only high-profile AI to have sparked negative headlines. Back in 2016, Microsoft was famously forced to shut down its experimental Tay chatbot after just 16 hours because, in that time, it had already learned and started spouting offensive conspiracy theories picked up from sources such as Twitter. Tay was later replaced by an improved chatbot called “Zo”, only for that replacement also to be taken offline for similar reasons.