Truly Intelligent AI Could Play by the Rules, No Matter How Strange
To build safe but powerful AI models, start by testing their ability to play games on the fly
A proposed game-playing challenge would evaluate AIs on how well they can adapt to and follow new rules
Tic-tac-toe is about as simple as games get—but as Scientific American’s legendary contributor Martin Gardner pointed out almost 70 years ago, it has complex variations and strategic aspects. They range from “reverse” games—where the first player to make three in a row loses—to three-dimensional versions played on cubes and beyond. Gardner’s games, even if they boggle a typical human mind, might point us to a way to make artificial intelligence more humanlike.
That’s because games in their endless variety—with rules that must be imagined, understood and followed—are part of what makes us human. Navigating rules is also a key challenge for AI models as they start to approximate human thought. And as things stand, it’s a challenge where most of these models fall short.
That’s a big deal because if there’s a path to artificial general intelligence, the ultimate goal of machine-learning and AI research, it can only come through building AIs that are capable of interpreting, adapting to and rigidly following the rules we set for them.
To drive the development of such AI, we need a new test—let’s call it the Gardner test—in which an AI is surprised with the rules of a game and is then expected to play by those rules without human intervention. One simple way to achieve the surprise is to disclose the rules only when the game begins.
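To make the setup concrete, here is a minimal sketch, in Python, of how such a test harness might work. The Agent interface, the rule format and the referee's checks are all hypothetical illustrations rather than part of any existing benchmark. The essential move is that the rules reach the players only when play begins, and a single violation ends the game.

```python
# Minimal sketch of a Gardner-test referee loop (hypothetical interface).
# The agents never see the rules before play begins: the referee hands
# over the natural-language rule text only at the first call.

from dataclasses import dataclass, field


@dataclass
class Match:
    rules_text: str  # rules in plain English, revealed at game time
    history: list = field(default_factory=list)


class Agent:
    """Hypothetical player interface: must read the rules at game time."""

    def receive_rules(self, rules_text: str) -> None:
        raise NotImplementedError

    def choose_move(self, history: list) -> str:
        raise NotImplementedError


def referee(match: Match, players: list[Agent], is_legal, is_over) -> None:
    # The surprise step: rules are disclosed only when the game begins.
    for player in players:
        player.receive_rules(match.rules_text)

    turn = 0
    while not is_over(match.history):
        mover = players[turn % len(players)]
        move = mover.choose_move(match.history)
        # The precision requirement: a single illegal move fails the test.
        if not is_legal(match.history, move):
            raise RuntimeError(f"Player {turn % len(players)} broke the rules: {move!r}")
        match.history.append(move)
        turn += 1
```

A Gardner-test agent would sit behind the Agent interface and have to derive its own model of legality from the English rule text; the referee's is_legal check is the ground truth it is measured against.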
The Gardner test, with apologies to the Turing test, is inspired by and builds on the pioneering work in AI on general game playing (GGP), a field largely shaped by Stanford University professor Michael Genesereth. In GGP competitions, AIs running on standard laptops face off against other AIs in games whose rules—written in a formal mathematical language—are revealed only at the start. The test proposed here advances a new frontier: accepting game rules expressed in a natural language such as English. Once a distant goal, this is now within reach because of recent breakthroughs in large language models (LLMs) such as those that power ChatGPT and the Claude and Llama families of models.
The proposed challenge should include a battery of tests that could be initially focused on games that have been staples of GGP competitions such as Connect Four, Hex and Pentago. It should also leverage an impressive array of games that Gardner wrote about. Test design could benefit from the involvement of the vibrant international GGP research community, developers of frontier AI models and, of course, diehard Martin Gardner fans.
But to pass the new test, it isn’t enough to create an AI system that’s good at playing one specific predetermined game or even many. Instead, an AI must be designed to master any strategy game on the fly. Strategy games demand a humanlike ability to plan several moves ahead, deal with unpredictable responses, adapt to changing objectives and still conform to a strict rule set.
That’s a big leap from today’s top game-playing AI models, which rely on knowing the rules in advance to train their algorithms. Consider, for instance, AlphaZero, the revolutionary AI model capable of playing three games—chess, Go and shogi (Japanese chess)—at a superhuman level. AlphaZero learns through a technique known as “self-play”: it repeatedly plays against a copy of itself and gets better from that experience. Self-play, however, requires the rules of each game to be fixed before training. AlphaZero’s mastery of complex games is undoubtedly impressive, but it’s a brittle system: present it with a game different from the ones it has learned, and it will be completely flummoxed. In contrast, an AI model that performed well on the proposed new test would adapt to new rules even in the absence of training data; it would play any game and follow any novel rule set with power and precision.
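In outline, self-play looks something like the loop below, shown as a deliberately stripped-down Python sketch. The game and model objects and their methods are placeholders invented for illustration; the real AlphaZero pairs a deep neural network with Monte Carlo tree search. Note that the loop cannot even start without a complete, fixed rule set.

```python
# Stripped-down sketch of self-play training. All names are placeholder
# assumptions; the real AlphaZero couples a neural network with Monte
# Carlo tree search rather than this bare loop.

def self_play_training(game, model, num_games: int):
    for _ in range(num_games):
        state = game.initial_state()        # the rules must be known up front
        trajectory = []
        while not game.is_terminal(state):
            move = model.select_move(state)  # the model plays both sides
            trajectory.append((state, move))
            state = game.apply(state, move)  # legality comes from the fixed rules
        outcome = game.winner(state)
        model.update(trajectory, outcome)    # learn from who won
    return model
```

Every line of that loop leans on the fixed rule set, which is exactly why a system trained this way has nothing to fall back on when the rules change.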
That last point—precision—is an important one. You can prompt many generative AI systems to execute variants on simple games, and they’ll play along: ChatGPT can play a 4×4 or 5×5 variant of tic-tac-toe, for instance. But an LLM prompt is best thought of as a suggestion rather than a binding set of rules—that’s why we often have to coax, wheedle and prompt-tune LLMs into doing exactly what we want. A general intelligence that would pass the Gardner test, by contrast, would by definition be able to follow the rules perfectly: not following a rule exactly would mean failing the test.
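The gap between suggestion and rule is easy to demonstrate in code. The sketch below, which uses the OpenAI Python client, asks a model for a move in a 4×4 tic-tac-toe variant and then checks that move against the rules externally. The model name, prompts and board format are illustrative assumptions; the point is that the legality check has to live outside the model, because the prompt alone guarantees nothing.

```python
# Rough sketch: asking an LLM to play 4x4 tic-tac-toe, then checking its
# move externally. The model name and prompts are illustrative assumptions.

from openai import OpenAI

client = OpenAI()

RULES = (
    "We are playing tic-tac-toe on a 4x4 board. You are X. "
    "Reply with exactly one move as 'row,col' (rows and columns are 0-3)."
)


def legal_moves(board):
    # Empty squares (marked ".") are the only legal moves.
    return {(r, c) for r in range(4) for c in range(4) if board[r][c] == "."}


def ask_for_move(board):
    board_text = "\n".join(" ".join(row) for row in board)
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption: any capable chat model
        messages=[
            {"role": "system", "content": RULES},
            {"role": "user", "content": f"Board:\n{board_text}\nYour move?"},
        ],
    )
    reply = resp.choices[0].message.content.strip()
    r, c = (int(x) for x in reply.split(","))
    # The prompt is only a suggestion: the harness must enforce the rules.
    if (r, c) not in legal_moves(board):
        raise ValueError(f"Illegal move: {reply!r}")
    return r, c
```

In practice the model might answer in prose, ignore the format or pick an occupied square; it is the harness, not the prompt, that enforces the rules.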
Specialized tools that operate without truly understanding the rules tend to color outside the lines, reproducing past errors from training data rather than adhering to the rules we set. It’s easy to imagine real-world scenarios in which such errors could be catastrophic: in a national security context, for instance, AI capabilities are needed that can accurately apply rules of engagement dynamically or negotiate subtle but crucial differences in legal and command authorities. In finance, programmable money is emerging as a new form of currency that can obey rules of ownership and transferability—and misapplying these rules could lead to financial disaster.
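To see what exact rule-following means in that last case, consider a toy sketch of programmable money. The account fields and transfer rules below are invented for illustration, but they show why “close enough” is not good enough: every condition must hold exactly on every transfer.

```python
# Hypothetical sketch of programmable money enforcing transfer rules.
# The account fields and rules are invented for illustration.

from dataclasses import dataclass


@dataclass
class Account:
    owner: str
    balance: int
    transferable: bool  # some balances may be locked by policy


def transfer(src: Account, dst: Account, amount: int, requester: str) -> None:
    # Each rule must hold exactly; "close enough" is a financial bug.
    if requester != src.owner:
        raise PermissionError("Only the owner may move these funds.")
    if not src.transferable:
        raise PermissionError("This balance is not transferable.")
    if not 0 < amount <= src.balance:
        raise ValueError("Amount must be positive and within the balance.")
    src.balance -= amount
    dst.balance += amount
```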
Ironically, building AI systems that can follow rules rigorously would ultimately make it possible to create machine intelligences that are far more humanlike in their flexibility and ability to adapt to uncertain and novel situations. When we think of human game players, we tend to think of specialists: Magnus Carlsen is a great chess player but might not be so hot at Texas Hold ’em. The point, though, is that humans are capable of generalizing; if Carlsen ever gave up chess, he could be a decent contender for the Pentamind World Championship, which celebrates the best all-round games player.
Game playing with a novel set of rules is crucial to the next evolution of AI because it could let us create AIs that are capable of almost anything—but that also meticulously and reliably follow the rules we set for them. If we want powerful but safe AI, testing its ability to play games on the fly might be the best path forward.