Microsoft’s Muse AI Edits Video Games on the Fly (2025)

So far, AI has only nibbled at the edges of the games industry, with tools for art, music, writing, coding, and other elements that make up video games. But what if an AI model could generate examples of gameplay from a single screenshot?

That’s the idea behind Microsoft’s Muse, a transformer model with 1.6 billion parameters trained on 500,000 hours of player data. The result is a model that, when prompted with a screenshot of the game, can generate multiple examples of gameplay, each extending up to several minutes in length.

“They have trained what’s essentially a neural game engine that has unprecedented temporal coherence and fidelity,” says Julian Togelius, an associate professor of computer science at New York University and co-founder of AI game testing company Modl.ai. “That has wide implications and is something I could see being used in the future as part of game development more generally.”

How Microsoft’s Muse Works

Muse (also known as the World and Human Action Model, or WHAM) was trained on human gameplay data from the multiplayer action game Bleeding Edge. The researchers trained a series of models on that data, which varied from 15 million to 1.6 billion parameters; the largest, which performed best, is the focus of a paper published in February in Nature.

Though innovative, Muse isn’t the first AI model capable of generating gameplay. Notable predecessors include Google DeepMind’s Genie, Tencent’s GameGen-X, and GameNGen. These earlier models generate visually attractive gameplay and, in many cases, do so at higher frame rates and resolutions than Muse.

However, Microsoft’s approach to developing Muse offers several unique advantages.

Unlike prior models, Muse was trained on real-world human gameplay data that includes image data from gameplay and corresponding controller inputs. Microsoft was able to access this data through Ninja Theory, a game developer owned by Microsoft’s Xbox Game Studios. Genie and GameGen-X, by contrast, didn’t have access to controller inputs and instead trained on publicly available image data from various games.

Muse also uses an autoregressive transformer architecture, which is uncommon for a model that generates images (gameplay, like video, is a series of images in sequence). Muse generates gameplay as a sequence of discrete tokens that weaves together image tokens and controller actions. While Genie uses a transformer architecture, it doesn’t model controller input. GameNGen and GameGen-X, meanwhile, use specialized diffusion models to generate gameplay, and likewise don’t model controller input.
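To make the idea of an interleaved sequence concrete, here is a minimal sketch of that generation loop. Everything here is invented for illustration: the token counts, the `dummy_next_token` stand-in for the transformer, and the exact sequence layout are assumptions, not Muse’s actual implementation.

```python
# Hypothetical sketch of autoregressive generation over a sequence that
# interleaves image tokens and controller-action tokens, in the spirit of
# WHAM. Vocabulary, layout, and predictor are illustrative assumptions.

IMAGE_TOKENS_PER_FRAME = 4   # real models use hundreds of tokens per frame
ACTION_TOKENS_PER_STEP = 1   # one discrete controller-input token per step

def dummy_next_token(context):
    """Stand-in for the trained transformer: maps the token context so far
    to the next token, deterministically, so the sketch is runnable."""
    return (sum(context) + len(context)) % 256

def generate(prompt_tokens, num_steps):
    """Autoregressively extend a prompt, one timestep at a time.

    Each timestep appends one frame's worth of image tokens followed by
    that step's controller-action tokens, so both modalities live in a
    single token stream that the same model predicts."""
    seq = list(prompt_tokens)
    for _ in range(num_steps):
        for _ in range(IMAGE_TOKENS_PER_FRAME):   # predict the next frame
            seq.append(dummy_next_token(seq))
        for _ in range(ACTION_TOKENS_PER_STEP):   # predict controller input
            seq.append(dummy_next_token(seq))
    return seq

# 3 prompt tokens + 2 steps * (4 image + 1 action) tokens = 13 tokens
continuation = generate([1, 2, 3], num_steps=2)
print(len(continuation))
```

Because every token, whether image or action, is predicted from the full mixed history, conditioning gameplay on a player’s inputs falls out of the same next-token machinery rather than requiring a separate control pathway.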

“What we’ve seen so far is we haven’t been able to get the consistency with diffusion models that we have with autoregressive models,” says Katja Hofmann, a senior principal research manager at Microsoft Research.

The researchers built a frontend called the WHAM Demonstrator to show off the model’s consistency. It can be used to prompt Muse with a screenshot; the model then generates multiple “continuations” of gameplay, each providing a different prediction of what might happen. Muse and the WHAM Demonstrator are available for download from Hugging Face.

Once a continuation has been generated, users can explore it with a game controller. It’s even possible to drag and drop objects the model is familiar with straight into the gameplay; the footage updates to include the object, which becomes part of the game world. Inserted objects persisted with a success rate of 85 to 98 percent, depending on the object.

Muse users are able to visually tweak the behavior of non-player characters (NPCs) and the environment by drawing directly onto a frame. Image or video references can also be used to influence, and subsequently choose from, scene generations. Anssi Kanervisto, Dave Bignell et al.

Building World Models

Microsoft’s announcement was careful to avoid calling Muse a complete AI game generator, and for good reason. While its generated gameplay clips are remarkably consistent even across several minutes of gameplay, the clips are generated at a resolution of just 380 by 180 pixels and 10 frames per second, which is far too low for an enjoyable gameplay experience. Muse is also limited to generating gameplay similar to Bleeding Edge.

These choices were made to keep Muse manageable; Hofmann says the team aimed to train “the smallest possible models we can get away with to show what’s possible.” Because of that, she believes there’s considerable room to improve the model’s quality.

Instead of pitching itself as a replacement for games, Muse is meant as a tool for developers looking to iterate on gameplay ideas. “You can create a sort of iterative loop. You can create multiple branches of predictions. You can go back, you can make modifications on the fly,” says Hofmann.

Muse also represents progress toward creating advanced “world models” that capture the dynamics of a real or simulated environment.

Models that generate gameplay, such as Muse and Genie, learn to predict gameplay across multiple modalities spanning 3D graphics, 2D graphics, physics, and audio, to name a few. That suggests AI models can be trained to form a more general understanding of a complex environment, yielding a more holistic world model rather than an assembly of disparate parts.

“In the past, to train a model on something specific, like jazz music, you would need to train to understand music theory, to have many rules and insights,” says Hofmann. “We now have a recipe for training generative AI models on this very complex structured data without a lot of handcrafting of the rules that underlie these systems.”

Togelius sees similar possibilities. He says a model like Muse could be used to iterate gameplay not only by generating gameplay, but also by creating world models that simulate an environment. That could in turn open new possibilities for probing and testing that environment, like turning AI agents loose to interact with and learn within the world model.

“This has a lot of implications for games, and also for things outside of games,” he says.
