When ChatGPT Meets the Sims
A piece of AI-related news once again made headlines back in April 2023, this time concerning a research paper titled "Generative Agents: Interactive Simulacra of Human Behavior", published by researchers from Stanford University and Google Research.
How It All Works
The study placed 25 generative agents in a small, simulated sandbox town called Smallville that resembles The Sims. Users can observe and intervene as the agents "plan their days, share news, form relationships, and coordinate group activities." It was interesting to watch the agents perform the daily tasks we normally would: waking up, cooking breakfast, heading to work, forming opinions, and noticing one another as they strike up conversations, all while remembering and reflecting on previous days as they plan the next ones.

In a nutshell, the research team enables the generative agents through an architecture that uses "a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behaviour." By using natural language to facilitate interactions with one another, the generative agents demonstrate believable individual and emergent social behaviours. For example, from a single user-specified notion that one agent wants to throw a Valentine's Day party, the other agents autonomously spread the word, coordinate with each other to show up, and even go as far as asking one another out on dates, much as humans would in such a real-life scenario.
Unlike traditional in-game environments, which would require manually scripting the behaviour of dozens of characters if we wanted Smallville to host such an in-game Valentine's Day party, the game-changer here is that all of the above actions can be set in motion by a single user-provided seed suggestion.
The research can be broken down into the following segments:
Generative Agent Behavior and Interaction
Inter-agent communication: The agents interact with the world through their actions and with each other through natural language. For example, an agent's status might read "(Agent-xxx) is getting ready for bed"; this is translated into movement within the sandbox world and summarised as a set of emojis by a language model. As agents communicate amongst themselves, they are also cognizant of other agents in the same area, and the generative agent architecture determines whether they are simply walking by or actually engaging in conversation with one another.
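As a rough illustration of the emoji summary, here is a minimal Python sketch. The `chat()` helper and the prompt wording are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch: condensing an agent's action description into emojis.
# chat() is a placeholder for a real chat-completion API call, and the
# prompt wording is an assumption, not the paper's exact prompt.

def chat(prompt: str) -> str:
    raise NotImplementedError("swap in a real language model call here")

def action_to_emoji(action_description: str) -> str:
    prompt = (
        "Summarise the following action in at most two emojis.\n"
        f"Action: {action_description}\nEmojis:"
    )
    return chat(prompt).strip()

# e.g. action_to_emoji("(Agent-xxx) is getting ready for bed")
# might yield something like a bed or sleeping emoji.
```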
Environmental Interaction: The Smallville environment resembles a small village, containing the usual places (a bar, houses, a store, a café, a park, a school) as well as specific functional objects such as beds and stoves. Just as in a video game, agents move around Smallville by entering buildings, approaching other agents, and navigating the map. They can also influence the state of objects on the map: for example, a refrigerator is displayed as empty once an agent uses up all of its ingredients.
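As a toy illustration of stateful objects in the environment, here is a minimal sketch; the area and object names are taken loosely from the examples above, and the nested-dictionary structure is an assumption rather than the paper's actual representation.

```python
# Toy sketch of the sandbox environment as nested areas containing
# stateful objects. The structure and names are illustrative only.

world = {
    "cafe": {
        "kitchen": {
            "refrigerator": {"state": "stocked"},
            "stove": {"state": "off"},
        }
    },
    "house": {
        "bedroom": {"bed": {"state": "made"}},
    },
}

def set_state(area: str, room: str, obj: str, state: str) -> None:
    """Agents' actions update object state as a side effect."""
    world[area][room][obj]["state"] = state

# An agent cooking with the last of the ingredients:
set_state("cafe", "kitchen", "refrigerator", "empty")
```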
Emergent Social Behaviours
Social behaviours that allow agents to exchange information, form relationships, and coordinate activities are emergent rather than programmed. This can be explained through the following three segments:
Information Diffusion: Information spreads through dialogue as agents converse. Example: Agent Sam tells Agent Tom about his candidacy in the local election, and the word spreads until it becomes the talk of the town.
Relationship Memory: As agents form relationships over time, they remember their interactions with other agents. Example: Agent Sam meets Agent Latoya, who mentions she is working on a photography project; in their subsequent interaction, Sam asks Latoya how her project is going.
Coordination: This coordinative behaviour alludes to the Valentine's Day example mentioned earlier, where one agent plans a party and the other agents coordinate with each other to attend. Example: The user specifies only two things: Agent Isabella's initial intent to throw a party and Agent Maria's crush on Agent Klaus. The following scenario then unfolds: Isabella plans a party from 5–7pm on February 14th at a café and, while decorating the café, sees Maria (a close friend) and asks her for help, which she agrees to. Maria invites Klaus (her crush) to the party, which he accepts. Five agents in total (including Klaus and Maria) show up. Putting up decorations, asking and inviting others out, and the interactions in between were all initiated by the agent architecture.
Generative Agent Architecture
In the authors' words, "generative agents take their current environment and past experience as input and generate behaviour as output". The 'magic' that allows this to happen lies within the agent architecture itself, which combines a "large language model with mechanisms for synthesizing and retrieving relevant information to condition the language model's output on." Because the agents produce large streams of events and memories, the architecture must ensure that only the most important parts of an agent's memory are retrieved and synthesized when required. The core of the architecture is the memory stream, a database that stores records of an agent's experience. From there, relevant records are extracted to plan the agent's actions and to let the agent react appropriately to its environment, and records are "recursively synthesized into higher- and higher-level observations that guide behaviour." In the current implementation, everything in the architecture is recorded in natural language and powered by a large language model (LLM), namely ChatGPT.
1. Memory and Retrieval: Generating believable agents requires reasoning over a stream of experience that far exceeds what can be described in a prompt. The memory stream is therefore designed to maintain a comprehensive record of the agent's experience, consisting of a "list of memory objects, where each object contains a natural language description, a creation timestamp and a most recent access timestamp."
The most fundamental element of the memory stream is an observation: an event directly perceived by an agent, whether a behaviour performed by the agent itself or one perceived from others. Since the full stream cannot be passed to the model, the architecture implements a retrieval function that takes the agent's current situation as input and returns a subset of the memory stream. Among the many possible implementations of a retrieval function, the following three components influence what the agent deems important when deciding how to act:
Recency: A higher score is assigned to memory objects that were recently accessed. Example: Events from a moment ago or this morning would remain in the agent’s attentional sphere.
Importance: Distinguishes core memories from mundane ones. Memory objects the agent perceives as important are assigned a higher score, mundane ones a lower score. Example: eating breakfast = low score; a break-up = high score.
Relevance: Memory objects related to the current situation are assigned a higher score, where relevance to the 'current situation' is conditioned on a query memory. Example: a query about a student's discussion of a test would have low relevance to memory objects about their breakfast and high relevance to those about their teacher or schoolwork.
With the three components normalized to the range [0, 1] by min-max scaling, the retrieval function scores each memory as a weighted combination of all three (see the sketch below): score = α_recency · recency + α_importance · importance + α_relevance · relevance.
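To make the scoring concrete, here is a minimal Python sketch of memory objects and the retrieval function. The exponential recency decay and its 0.99 rate, the equal default weights, and embedding-based cosine similarity for relevance are illustrative assumptions consistent with the description above, not the paper's exact settings.

```python
import math
from dataclasses import dataclass

# Minimal sketch of the memory stream and weighted retrieval scoring.
# Decay rate, default weights and the relevance measure are assumptions.

@dataclass
class MemoryObject:
    description: str
    created_at: float      # sandbox time of creation
    last_accessed: float   # sandbox time of most recent access
    importance: float      # e.g. rated by the language model
    embedding: list        # vector used for relevance comparison

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

def retrieve(memories, query_embedding, now, k=5,
             a_recency=1.0, a_importance=1.0, a_relevance=1.0):
    # Recency: exponential decay in the time since last access
    # (0.99 is an illustrative decay rate, not the paper's value).
    recency = min_max([0.99 ** (now - m.last_accessed) for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(m.embedding, query_embedding)
                         for m in memories])

    # Weighted combination of the three normalized components.
    scores = [a_recency * r + a_importance * i + a_relevance * v
              for r, i, v in zip(recency, importance, relevance)]
    ranked = sorted(range(len(memories)), key=lambda j: scores[j],
                    reverse=True)
    return [memories[j] for j in ranked[:k]]
```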
2. Reflection: A second type of memory, reflections are "higher-level, abstract thoughts generated by the agent", and are "generated when the sum of the importance scores for the latest events perceived by the agents exceeds a certain threshold." When retrieval occurs, reflections are included alongside other observations. The reflection process begins with the agent determining what to reflect on, by identifying questions that can be asked given its recent experiences. For example, the most recent records extracted from an agent's memory stream might describe what Agent X has been doing; from these statements, questions that the experiences can answer are generated and then used as retrieval queries to gather other relevant memories, with the resulting insights stored back into the memory stream.
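A minimal sketch of the reflection trigger might look as follows; the threshold value, prompt wording, and the `chat()`, `agent.retrieve()` and `agent.add_memory()` helpers are all assumptions for illustration.

```python
# Sketch of reflection: when accumulated importance crosses a threshold,
# generate salient questions, retrieve evidence for each, and store the
# synthesized insights back into the memory stream as reflections.

REFLECTION_THRESHOLD = 100  # illustrative value, not the paper's setting

def chat(prompt: str) -> str:
    raise NotImplementedError("swap in a real language model call here")

def maybe_reflect(agent, recent_memories):
    if sum(m.importance for m in recent_memories) < REFLECTION_THRESHOLD:
        return
    statements = "\n".join(m.description for m in recent_memories)
    questions = chat(
        "Given only the statements below, what are 3 high-level "
        f"questions we can answer about the subjects?\n{statements}"
    ).splitlines()
    for question in questions:
        evidence = agent.retrieve(query=question)   # reuse retrieval
        insight = chat(
            "What high-level insight can you infer from these "
            "statements?\n"
            + "\n".join(m.description for m in evidence)
        )
        agent.add_memory(insight, kind="reflection")
```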
3. Planning and Reacting: Plans describe an agent's future sequence of actions and help keep the agent's behaviour coherent and believable over time. Plans, just like reflections, are stored in the memory stream and included in the retrieval process, so when deciding how to behave or react, the agent can consider observations, reflections and plans as a whole.
In terms of reacting, "generative agents operate in an action loop where, at each time step, they perceive the world around them and those perceived observations are stored in their memory stream." The perceived observations are then fed to the language model to determine whether the agent should proceed with its existing plan or react to the situation at hand, as sketched below.
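A minimal sketch of that loop, assuming hypothetical `world`, `agent` and `chat()` helpers, might look like this:

```python
# Sketch of the action loop: perceive, store observations, then ask the
# language model whether to continue the current plan or react. The
# helpers and the naive "react" parsing are illustrative assumptions.

def chat(prompt: str) -> str:
    raise NotImplementedError("swap in a real language model call here")

def step(agent, world, now):
    # 1. Perceive nearby events and store them in the memory stream.
    observations = world.perceive(agent)
    for obs in observations:
        agent.add_memory(obs, kind="observation")

    # 2. Ask the model whether the existing plan still makes sense.
    decision = chat(
        f"{agent.summary}\nCurrent plan: {agent.current_plan}\n"
        f"New observations: {observations}\n"
        "Should the agent continue the plan or react? "
        "Answer 'continue' or 'react: <how>'."
    )

    # 3. Either carry on or replace the remainder of the plan.
    if decision.startswith("react"):
        agent.current_plan = agent.replan(decision, now)
    return agent.current_plan
```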
This extends to dialogue: when agents interact, they converse with one another, and the dialogue is generated by conditioning each utterance on the agents' memories of each other, as sketched below.
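A minimal sketch of such memory-conditioned dialogue, again with hypothetical helpers and prompt wording:

```python
# Sketch of generating an utterance conditioned on the speaker's
# memories about the listener. Prompt wording and the chat(),
# speaker.retrieve() and speaker.summary helpers are assumptions.

def chat(prompt: str) -> str:
    raise NotImplementedError("swap in a real language model call here")

def next_utterance(speaker, listener, dialogue_history: str) -> str:
    memories = speaker.retrieve(query=listener.name)
    remembered = "\n".join(m.description for m in memories)
    prompt = (
        f"{speaker.summary}\n"
        f"What {speaker.name} remembers about {listener.name}:\n"
        f"{remembered}\n"
        f"Conversation so far:\n{dialogue_history}\n"
        f"What does {speaker.name} say next?"
    )
    return chat(prompt)
```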
Implications
This experiment presents a breakthrough in the areas of social simulacra and social prototyping, and raises questions about the implications such technological advancements will have on society. The enhancement of large language model-based simulacra, alongside the introduction of autonomous AI agents that display interactive, human-like behaviour, is anticipated to have profound impacts on industries and areas such as the following:
- Gaming: Non-player characters (NPCs) in games could provide a more interactive, immersive and realistic in-game experience for players.
- Organisations: Corporations could construct closed, simulated systems to analyse how humans might behave with one another, and then design incentive structures likely to produce optimal results for the company.
- Virtual Assistants: Imagine generative agents tailored to users' needs and preferences handling daily tasks, leading to more personalised and effective experiences.
- Healthcare: Autonomous generative agents could work alongside medical and health professionals, supporting daily tasks such as medical diagnosis and even palliative care.
- Education: Educational tools and learning methods could be personalised to cater to the specific needs of individual students, including those with special needs.
Other areas include any that involve social aspects or elements. That said, despite the potential that generative agents bring, there are also ethical concerns and societal risks at the other end of the spectrum. One of the predominant risks outlined in the research is people forming parasocial relationships with generative agents: individuals tend to anthropomorphize such agents and attach human emotions to them even though the agents are computational entities. This is already happening; the dating industry has seen an influx of individuals turning to AI chatbots for romantic companionship, as well as an uptick in general users adopting AI companions as a form of socialization. It is therefore imperative to put guardrails in place that mitigate such risks and address these concerns, so that AI can be used ethically and responsibly given its inevitable rise in today's world.
Conclusion
Overall, it feels surreal to fathom how 'artificial' beings are able to learn, memorize, reflect, plan, communicate and interact much as we humans do. This exemplifies the notion of emergent behaviour, brings us one step closer to demystifying complexity theory, and perhaps even marks progress towards artificial general intelligence (AGI). As AI research intensifies, expect the lines between what we perceive as 'real' and 'artificial' to blur, as society finds ways to harness the power of such technology to enhance productivity and improve standards of living without overstepping 'boundaries' along the way.