“This is the true story of 25 video game characters picked to live in a town and have their lives taped…to find out what happens when computers stop being polite…and start getting real.”
Researchers at Google and Stanford recently created a new reality show of sorts—with AI agents instead of people.
Using OpenAI’s viral chatbot ChatGPT and some custom code, they generated 25 AI characters with back stories, personalities, memories, and motivations. Then the researchers dropped these characters into a 16-bit video game town—and let them get on with their lives. So, what does happen when computers start getting real?
“Generative agents wake up, cook breakfast, and head to work,” the researchers wrote in a preprint paper posted to the arXiv outlining the project. “Artists paint, while authors write; they form opinions, and notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day.”
Not exactly riveting television, but surprisingly lifelike for what boils down to an enormous machine learning algorithm…talking to itself.
The AI town, Smallville, is just the latest development in a fascinating moment for AI. While the basic version of ChatGPT takes interactions one at a time—write a prompt, get a reply—a number of offshoot projects are combining ChatGPT with other programs to automatically complete a cascade of tasks. These might include making a to-do list and checking off items on the list one by one, Googling information and summarizing the results, writing and debugging code, even critiquing and correcting ChatGPT’s own output.
It’s these kinds of cascading interactions that make Smallville work too. The researchers have crafted a series of companion algorithms that, together, power simple AI agents that can store memories and then reflect, plan, and act based on those memories.
The first step is to create a character. To do this, the researchers write a foundational memory in the form of a detailed prompt describing that character’s personality, motivations, and situation. Here’s an abbreviated example from the paper: “John Lin is a pharmacy shopkeeper at the Willow Market and Pharmacy who loves to help people. He is always looking for ways to make the process of getting medication easier for his customers; John Lin is living with his wife, Mei Lin, who is a college professor, and son, Eddy Lin, who is a student studying music theory.”
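To make the idea concrete, here is a minimal Python sketch of seeding a character from a description like the one above. The `Character` class and `create_character` function are hypothetical names invented for illustration, and splitting the description on semicolons is an assumption made here to show how one paragraph could become several starting memories.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """Hypothetical container for a character and its starting memories."""
    name: str
    seed_memories: list[str] = field(default_factory=list)


def create_character(name: str, description: str) -> Character:
    # Assumption for this sketch: each semicolon-separated statement
    # in the description becomes one starting memory.
    statements = [part.strip() for part in description.split(";") if part.strip()]
    return Character(name=name, seed_memories=statements)


description = (
    "John Lin is a pharmacy shopkeeper at the Willow Market and Pharmacy "
    "who loves to help people; "
    "John Lin is living with his wife, Mei Lin, who is a college professor, "
    "and son, Eddy Lin, who is a student studying music theory"
)

john = create_character("John Lin", description)
for memory in john.seed_memories:
    print(memory)
```

Each printed line is one statement the character "knows" about itself before the simulation starts.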
But characterization isn’t enough. Each character also needs a memory. So, the team created a database called the “memory stream” that logs an agent’s experiences in everyday language.
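A memory stream of this kind can be pictured as an append-only log of plain-language entries. The sketch below is one possible shape for it, not the researchers' code: the `Memory` and `MemoryStream` classes, the `kind` labels, and the timestamps are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Memory:
    text: str                   # the experience, written in everyday language
    created_at: datetime        # when the agent recorded it
    kind: str = "observation"   # hypothetical labels: observation, reflection, plan


@dataclass
class MemoryStream:
    entries: list[Memory] = field(default_factory=list)

    def add(self, text: str, created_at: datetime, kind: str = "observation") -> Memory:
        # Entries are only ever appended, so the list stays in time order.
        memory = Memory(text, created_at, kind)
        self.entries.append(memory)
        return memory

    def latest(self, n: int) -> list[Memory]:
        # Return the n most recent entries, oldest first.
        return self.entries[-n:] if n > 0 else []


# Illustrative entries; the date and times are made up for this example.
start = datetime(2023, 2, 13, 8, 0)
stream = MemoryStream()
stream.add("John Lin woke up", start)
stream.add("John Lin is cooking breakfast", start + timedelta(minutes=30))
stream.add("John Lin plans to open the pharmacy at 9 am",
           start + timedelta(minutes=45), kind="plan")

for memory in stream.latest(2):
    print(memory.kind, "|", memory.text)
```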
When accessing the memory stream, an agent surfaces the most recent, important, and relevant memories. The agent also periodically synthesizes its memories into higher-level conclusions, which are recorded as separate memories the researchers call “reflections.” Finally, the agent creates plans using a nest of increasingly detailed prompts that break the day into smaller and smaller increments of time, so each high-level plan is broken down into smaller steps. These plans are also added to the memory stream for retrieval.
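The retrieval step, which weighs how recent, important, and relevant each memory is, can be sketched as a simple scoring function. Everything specific below is an assumption for illustration: the equal weighting of the three factors, the hourly decay rate, the 1-to-10 importance scale, and the word-overlap measure standing in for a real relevance model. The sample memories are invented as well.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    hours_ago: float   # time since the memory was recorded
    importance: int    # assumed scale: 1 (mundane) to 10 (significant)


def recency(memory: Memory, decay: float = 0.99) -> float:
    # Exponential decay: older memories score lower. The rate is an assumption.
    return decay ** memory.hours_ago


def relevance(memory: Memory, query: str) -> float:
    # Crude stand-in for a relevance model: word overlap (Jaccard similarity)
    # between the memory text and the current situation.
    memory_words = set(memory.text.lower().split())
    query_words = set(query.lower().split())
    if not memory_words or not query_words:
        return 0.0
    return len(memory_words & query_words) / len(memory_words | query_words)


def score(memory: Memory, query: str) -> float:
    # Equal weighting of the three factors is an assumption of this sketch.
    return recency(memory) + memory.importance / 10 + relevance(memory, query)


def retrieve(memories: list[Memory], query: str, k: int = 2) -> list[Memory]:
    # Return the k highest-scoring memories for the current situation.
    return sorted(memories, key=lambda m: score(m, query), reverse=True)[:k]


memories = [
    Memory("ate breakfast at home", hours_ago=1, importance=1),
    Memory("Isabella is planning a Valentine's Day party at the cafe",
           hours_ago=20, importance=8),
    Memory("restocked shelves at the pharmacy", hours_ago=5, importance=3),
]

top = retrieve(memories, "who is planning the party", k=1)
print(top[0].text)
```

In this toy example the party memory wins even though it is the oldest, because its importance and relevance outweigh the recency of the other two.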
As the agent goes about its day—translating text prompts into actions and conversations with other characters in the game—it taps its memory stream of experiences, reflections, and plans to inform each action and conversation. Meanwhile, new experiences feed back into the stream. The process is fairly simple, but when combined with OpenAI’s large language models by way of the ChatGPT interface, the output is surprisingly complex, even emergent.
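That loop of observing, consulting memory, acting, and recording the result can be sketched in a few lines. This is a deliberately simplified illustration: `agent_step` is a hypothetical function, `scripted_llm` is a stand-in that replaces a real language model call so the example runs on its own, and retrieval here just takes the most recent entries rather than scoring them.

```python
from typing import Callable


def agent_step(
    name: str,
    observation: str,
    memory_stream: list[str],
    llm: Callable[[str], str],
    k: int = 3,
) -> str:
    # 1. New experiences feed back into the stream.
    memory_stream.append(f"{name} observed: {observation}")

    # 2. Pull context from memory (simplified here: the k most recent entries).
    context = memory_stream[-k:]

    # 3. Ask the language model what the character does next.
    prompt = "\n".join(context) + f"\nWhat does {name} do next?"
    action = llm(prompt)

    # 4. The chosen action becomes a memory too.
    memory_stream.append(f"{name} did: {action}")
    return action


def scripted_llm(prompt: str) -> str:
    # Stand-in for a real language model call, so the sketch is self-contained.
    return "greet the customer" if "customer" in prompt else "keep working"


stream = ["John Lin is a pharmacy shopkeeper"]
action = agent_step(
    "John Lin", "a customer walks into the pharmacy", stream, scripted_llm
)
print(action)
print(stream)
```

Swapping `scripted_llm` for a call to an actual model is where the complex behavior would come from; the loop itself stays this simple.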
In a test, the team prompted one character, Isabella, to plan a Valentine’s Day party and another, Maria, to have a crush on a third, Klaus. Isabella went on to invite friends and customers to the party, decorate the cafe, and recruit her friend Maria to help. Maria mentioned the party to Klaus and invited him to go with her. Five agents attended the party but, in an equally human touch, several others flaked or simply failed to show up.
Beyond the initial seeds—the party plan and the crush—the rest emerged of its own accord. “The social behaviors of spreading the word, decorating, asking each other out, arriving at the party, and interacting with each other at the party, were initiated by the agent architecture,” the authors wrote.
It’s remarkable this can be accomplished, for the most part, by simply splitting ChatGPT into a number of functional parts and personalities and playing them off one another.
Video games are the most obvious application of this kind of believable, open-ended interaction, especially when combined with high-fidelity avatars. Non-player characters could evolve from scripted interactions to conversations with convincing personalities.
The researchers warn that people may be tempted to form relationships with realistic characters, a trend that’s already here, and that designers should take care to add content guardrails and always disclose when a character is an agent. Other risks include those applicable to generative AI at large, such as the spread of misinformation and over-reliance on agents.
This approach may not be practical enough to work in mainstream video games just yet, but it does suggest such a future is likely coming soon.
The same is true of the larger trend in agents. Current implementations are still limited, despite the hype. But connecting multiple algorithms, complete with plugins and internet access, may allow for the creation of capable, assistant-like agents that can carry out multistep tasks at a prompt. Longer term, such automated AI could be quite useful, but it could also pose the risk of misaligned algorithms causing unanticipated problems at scale.
For now, what’s most obvious is how the dance between generative AI and a community of developers and researchers continues to surface surprising new directions and capabilities—a feedback loop that’s showing no signs of slowing just yet.
Image Credit: “Generative Agents: Interactive Simulacra of Human Behavior,” Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein