OpenAI says it is dedicating a fifth of its computational resources to developing machine learning techniques to stop superintelligent systems “going rogue.”

Founded in 2015, the San Francisco AI startup’s stated goal has always been to develop artificial general intelligence safely. The technology doesn’t exist yet – and experts are divided over what exactly it would look like or when it may arrive.

Nevertheless, OpenAI intends to carve out 20 percent of its processing capacity and launch a new unit – led by co-founder and chief scientist Ilya Sutskever – to in some way, somehow, prevent future-gen machines from endangering humanity. It’s a subject OpenAI has brought up before.

“Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems,” the would-be savior of the species opined this week.

“But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.”

OpenAI believes computer systems capable of surpassing human intelligence and overpowering the human race could be developed this decade [Before or after fusion? Or quantum computing? – Ed.].

“Managing these risks will require, among other things, new institutions for governance and solving the problem of superintelligence alignment: how do we ensure AI systems much smarter than humans follow human intent?” the biz added. 

Speaking of OpenAI …

  • The startup, bankrolled by Microsoft, has made its GPT-4 API generally available to paying developers.
  • CompSci professor and ML expert Emily Bender has penned an essay on the real threats from AI models versus the fear of superhuman AI that certain corners have been pushing.

Methods already exist to align – or at least attempt to align – models to human values. One such technique is Reinforcement Learning from Human Feedback, or RLHF. With that approach, people rate or rank a model’s outputs, and those judgments are used to steer the model toward the kind of responses humans prefer.

Although RLHF has helped make systems such as ChatGPT less prone to generating toxic language, it can still introduce biases, and it is difficult to scale. It typically involves having to recruit a big load of people on not-very-high wages to provide feedback on a model’s outputs – a practice which has its own set of problems.
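
For the curious, the human-feedback step can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not OpenAI’s method: the two features (politeness, toxicity), the preference pairs, and the function names are all hypothetical, and a linear scorer stands in for what would in practice be a large neural network. It fits a reward model to pairwise human preferences using a standard logistic (Bradley-Terry style) update, so that preferred responses end up with higher scores.

```python
import math


def reward(weights, features):
    """Score a response as a weighted sum of its (hypothetical) features."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(preferences, n_features, lr=0.1, epochs=200):
    """Fit weights so that human-preferred responses outscore rejected ones.

    Each item in `preferences` is a (chosen, rejected) pair of feature lists.
    The update is gradient ascent on log(sigmoid(r_chosen - r_rejected)).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in preferences:
            margin = reward(weights, chosen) - reward(weights, rejected)
            p_agree = 1.0 / (1.0 + math.exp(-margin))  # model's agreement with the rater
            step = lr * (1.0 - p_agree)                # larger step when the model disagrees
            for i in range(n_features):
                weights[i] += step * (chosen[i] - rejected[i])
    return weights


# Made-up rater judgments; features are [politeness, toxicity].
preferences = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.7]),
]

weights = train_reward_model(preferences, n_features=2)
polite_score = reward(weights, [0.9, 0.0])
toxic_score = reward(weights, [0.1, 0.9])
print(weights, polite_score > toxic_score)
```

In a full RLHF pipeline, a learned scorer like this would then be used as the reward signal when fine-tuning the language model itself. The scaling problem is visible even here: every pair in `preferences` is a judgment some person had to make.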

Developers cannot rely on a few people to police a technology that will affect many, it’s claimed. OpenAI’s alignment team is attempting to solve this problem by building “a roughly human-level automated alignment researcher.” Rather than lean on human supervisors, OpenAI wants to build an AI system that can align other machines to human values.

That would be artificial intelligence training artificial intelligence to be more like non-artificial intelligence, it seems to us. It feels a bit chicken and egg.

Such a system could, for example, search for problematic behavior and provide feedback, or take some other steps to correct it. To test that system’s performance, OpenAI said it could deliberately train misaligned models and see how well the alignment AI cleans up bad behavior. The new team has set a target of solving the alignment problem in four years. 

“While this is an incredibly ambitious goal and we’re not guaranteed to succeed, we are optimistic that a focused, concerted effort can solve this problem. There are many ideas that have shown promise in preliminary experiments, we have increasingly useful metrics for progress, and we can use today’s models to study many of these problems empirically,” the outfit concluded.

“Solving the problem includes providing evidence and arguments that convince the machine learning and safety community that it has been solved. If we fail to have a very high level of confidence in our solutions, we hope our findings let us and the community plan appropriately.”

We’ll start building our bunker now. ®