At this point in the evolution of machine-learning AI, we are used to specially trained agents that can take complete control of everything from Atari games to complex board games such as Go. But what if an AI agent could be trained not just to play one specific game, but to interact with a 3D environment in general? And what if that agent focused on responding to natural-language commands within that environment?
These are the kinds of questions driving Google's DeepMind research group as it develops SIMA, a "scalable, instructable, multi-world agent" that, as research engineer Tim Hurley put it in a presentation attended by Ars Technica, is "not trained to win, but trained to do what you're told." "And not just one game, but a bunch of different games running at the same time," he added.
Hurley emphasized that SIMA is still "mostly a research project," and the results in the project's first technical report show there is a long way to go before SIMA approaches human-level ability to follow instructions. Still, Hurley said he hopes SIMA can eventually provide the basis for AI agents that players can direct and talk to in cooperative gameplay situations: less a "superhuman enemy" than a "trusted partner."
“This study was not about achieving high scores in games,” Google said in a blog post announcing the research. “Learning to play even one video game is a technical feat for an AI system, but learning to follow instructions in a variety of game settings could unlock AI agents that are more helpful in any environment.”
Learning how to learn
To train SIMA, the DeepMind team focused on three-dimensional games and test environments controlled from a first-person or over-the-shoulder third-person perspective. All nine games in the test suite, provided by Google's development partners, prioritize “open-ended interactions,” avoid “extreme violence,” and offer a wide range of environments, from “space exploration” to “wacky goat mayhem.”

To make SIMA as generalizable as possible, the agents are given no privileged access to a game's internal data or control APIs. The system accepts only on-screen pixels as input and produces only keyboard-and-mouse controls as output, mirroring the interface humans have used to play video games. The team also designed the agent to work in games running in real time (i.e., at 30 frames per second), rather than slowing down the simulation to accommodate extra processing time, as many other interactive machine-learning projects have done.
Although these limitations increase the difficulty of SIMA's task, they also mean agents can be integrated into new games or environments "off the shelf," with minimal setup and without special training on the "ground truth" of the game world. They also make it relatively easy to test whether SIMA can "transfer" what it has learned from training on previous games to games it has never seen before, which could be an important step toward artificial general intelligence.
SIMA's training data consists of videos of human gameplay (and the associated timecoded inputs) in the provided games, annotated with natural-language descriptions of what is happening in the footage. As the researchers note in the technical report, these clips focus on "instructions that can be completed in less than approximately 10 seconds" to avoid the complications that can arise from "the wide range of instructions possible over long timescales." Integration with pre-trained models such as SPARC and Phenaki means the SIMA model does not have to learn how to interpret language and visual data from scratch.
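A training example of this kind can be pictured as a simple record pairing footage with timecoded inputs and a language annotation, plus a filter enforcing the roughly-10-second budget the report describes. This is a hypothetical sketch for illustration only; the field names and `filter_short_clips` helper are assumptions, not SIMA's actual data schema.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedClip:
    """One training example: gameplay footage plus human annotation."""
    frames: list       # screen pixels, one entry per timestep
    inputs: list       # timecoded keyboard/mouse events from the player
    instruction: str   # natural-language description of the behavior
    duration_s: float  # clip length in seconds


def filter_short_clips(clips, max_duration_s=10.0):
    """Keep only clips whose instruction completes within the time budget,
    avoiding the complications of instructions over long timescales."""
    return [c for c in clips if c.duration_s <= max_duration_s]
```

In practice such a filter would run over the annotated corpus before training, so the model only ever sees short, self-contained instruction-following episodes.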