Usability testing is the most powerful tool in the toolbox for game design improvement.
Getting real players interacting with your prototypes early and regularly in development can be uncomfortable. Yet it ensures teams are never out-of-touch with where design implementation is falling short, or headed for a design dead-end.
Usability testing uncovers design flaws and actionable improvements in a game’s ease of use.
Maintaining a laser focus on what players do and understand leads to game-changing insight into foundational aspects of the play experience: controls, UI, instructions, audiovisual feedback, learning curve, and so on.
What makes a usability test work?
Decades of usability testing have established a standardised process for how tests are run. The approach is adaptable enough for any genre of game, yet standardised enough that teams know what types of data to expect, and how accurate that data will be.
Here’s a breakdown of our usability testing standard:
A usability test always involves a member of the public — a ‘playtester’ — playing a prototype game.
Typically between 6 and 12 playtesters are invited for perhaps 90 minutes per person, one-at-a-time.
Playtesters are invited to “play as you would normally” and directed to a game mode that’s in design flux. They might be asked to ‘think out loud’ while playing.
A trained researcher moderates the session to gather behavioural data describing what the player did, in context. The researcher will also run a question-and-answer session with the player to establish why they did that.
These observations and questions are in service of design improvement:
Does this prove players will comprehend our game?
Does our design reliably nudge players toward the fun?
Can players use the controls and UI in every scenario?
Each aspect of the prototype design has some desired outcome in players’ comprehension of game scenarios and players’ behaviours. Afterwards, with all notes from all 6-to-12 sessions in hand, teams ask themselves:
Are we OK with our designs generating these observed outcomes in players? If not, what’s the design fix?
Each usability test is short, focused, and iterative, usually taking 3-to-6 workdays. These rapid quality-checks are repeated sprint-to-sprint to re-test redesigns and new content.
Perfecting your usability testing
While the logistics of a usability test can sound simple, there are tough strategic choices to make to get the most from each test:
What do we test first? Which playtesters should we invite? How often do we repeat tests?
Let’s dive into some tips for running insightful and rigorous sessions, and get the most from usability testing.
Formalise your ‘success factors’ for the first-time player
Teams will have objectives for what a new player will quickly understand.
Most video games act on these objectives by including a tutorial or pre-formulated play. But even games where players are ‘thrown into the deep end’ will have expectations for what’s easy or hard to learn, even if they are unspoken: WASD to move; zombies come out at night; starvation kills you quickly; hitting a tree gives wood…
Usability testing helps make these learnability expectations explicit, and then tests against them.
Before your usability tests, forge a list of ‘Things the player is supposed to understand in the first minute’, then in the first 5 minutes, then the first 15 minutes, and so on.
Against each thing, note if that knowledge will be explicitly taught or hoped to be somehow intuitive.
Against each thing again, consider whether there are distinctive in-game behaviours that would indicate players have understood it correctly.
“Inventory: In the first five minutes players will be explicitly told that they can hold 2 items maximum in their Quick Item Bar; we’d expect to see them ‘trading up’ as they find more valuable items later.”
This becomes a checklist for observing your usability testing session.
If playtesters cannot explain to the researchers how these mechanics work, or they don’t exhibit your target behaviours: your teaching isn’t working and it needs a fix.
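One way to keep such a checklist is as a small structured record per success factor. The sketch below is only an illustration: the `SuccessFactor` class, its field names, and the sample observations are assumptions made for this example, not a format the article prescribes.

```python
from dataclasses import dataclass


@dataclass
class SuccessFactor:
    """One thing a first-time player is supposed to understand (hypothetical format)."""
    mechanic: str
    deadline_minutes: int      # player should understand it within this much playtime
    taught_explicitly: bool    # True if taught outright, False if hoped to be intuitive
    target_behaviour: str      # observable behaviour suggesting the player understands
    could_explain: bool = False        # filled in after the interview
    behaviour_observed: bool = False   # filled in while watching play

    def passed(self) -> bool:
        # A missing explanation or a missing behaviour both count as a fail.
        return self.could_explain and self.behaviour_observed


def needs_fix(checklist: list[SuccessFactor]) -> list[str]:
    """Return the mechanics whose teaching did not land for this playtester."""
    return [factor.mechanic for factor in checklist if not factor.passed()]


checklist = [
    SuccessFactor("Inventory: Quick Item Bar holds 2 items", 5, True,
                  "trades up to more valuable items"),
    SuccessFactor("WASD to move", 1, False, "moves around without prompting"),
]

# Hypothetical observations from one session.
checklist[1].could_explain = True
checklist[1].behaviour_observed = True

print(needs_fix(checklist))
```

Filling in one copy per playtester makes it easy to see which mechanics fail repeatedly across the round.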
Identify which mechanics are unclear by design
Following on from the above, tutorials usually aren’t intended to teach the player everything.
Every title has a unique threshold between teaching enough to cover the basics, and not teaching so much that it ruins players’ fun of learning through experimentation.
Defining that threshold is valuable. It requires teams to decide on mechanics, systems or strategies that are OK to be misunderstood or unrecognised by players, despite being exposed to them.
Learnability is ‘under the microscope’ in the usability test lab. We must have a sense of what’s a pass or a fail.
The prime example here is anything ‘metagame’ or associated with longer-term progression. Players need more playtime to grok the ‘loop’.
Teams will need to be comfortable with playtesters demonstrating a weak understanding of these elements, and resilient to feedback that they “need more tutorials” or that “this is confusing”.
The value of a newcomer
Playtesters are selected anew for each round from the general public. But they should never be just some random person or volunteer. Choosing the ‘profile’ of the playtester is paramount to avoiding wacky and misleading insights.
The most important factors in inviting the right people to playtest are their genre experience, platform experience, and recent game purchases.
Why these factors?
When testing the quality of your game controls, you don’t want to be handing the player a gamepad they’re unfamiliar with. Imagine the misleading conclusions one might come to as your Xbox-savvy playtester repeatedly fails to complete a task when, unbeknownst to you, they are simply unsure which gamepad button is square or circle. Imagine the same scenario for a VR novice, or on a keyboard and mouse for a player used to a laptop trackpad.
We have to know playtesters’ platform experience for context.
The same is true of genre experience.
It’s not hard to imagine a genre-newcomer being utterly bewildered, compared to a genre-super-fan breezing through early game content. Parsing out what’s their pre-existing genre knowledge, versus what the game successfully taught the player afresh, is extremely difficult if you don’t control the player’s profile.
The balance to find is simple: select playtesters with just enough knowledge to be at ease, but not so much that your tutorials are redundant. Seek out those members of the public who’ll really put your usability to the test, not those who’ll breeze through regardless. Simply ask about their most recent purchases or favourite games in related genres. There won’t always be a total overlap between your ideal test participants and players in the centre of your game’s target audience.
Bringing in genre-newcomers, folks who have played fewer games than your average fan, means specifically searching outside your community fanbase, whose members would likely volunteer to test for free. That also means paying playtesters for their time.
“Is it Fun?”
Of course this is the number one question, top of every team’s mind.
Closely followed by:
“Is our game better than [genre leader]?”
“Would they buy it?”
One of the major drawbacks of usability testing is how unsuitable it is for establishing a top-down, holistic view of the game’s ‘fun factor’.
Four reasons why: playtesters, sample size, environment, and timing.
With only 6 to 12 players involved, there simply aren’t enough people to get a strong read. If there’s an overwhelming consensus (all 12 players say “level 3 is terribly dull”) then that’s a good indication there’s a problem, but opinions are rarely so stark. Rigorously polling for opinions requires at least double the usability test norm: 25 to 35 players minimum.
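To get a feel for why small groups give a weak read on opinion, one common statistical tool is a confidence interval around the share of players who agree. The sketch below uses the Wilson score interval at roughly 95% confidence. This choice of method, and the example vote counts, are assumptions for illustration and do not come from the article.

```python
import math


def wilson_interval(agree: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the proportion agree/total."""
    p = agree / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical polls: two-thirds of players agree in both groups.
for agree, total in [(8, 12), (20, 30)]:
    low, high = wilson_interval(agree, total)
    print(f"{agree}/{total} agree: plausible range {low:.0%} to {high:.0%}")
```

With the same two-thirds agreement, the plausible range is noticeably wider for 12 players than for 30, which is the sense in which a split opinion from a usability-sized group says little.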
The usability testing environment itself isn’t suited to these types of ‘make or break’ questions either.
A skilled interviewer will quickly build rapport and draw the truth from playtesters. But if you ask: “would you consider buying what we’re selling, once it’s finished?” don’t expect the truth in reply. Out of courtesy or naivety, players will imagine the game being a thousand times better “once it’s finished”.
And that leads us to the most important factor: how usability tests should fit into the devcycle.
These tests improve the foundations of UI, controls, HUD, feedback — these are all typically solidified long before there’s much fun to be had. Certainly long before the build will compare favourably against the released titles that your playtesters will be using as their reference point.
Tackling ‘fun factor’ head-on is ill-advised during usability tests — wrong people, wrong place, wrong time and wrong question. Yet, a game with poor learnability or usability is rarely pleasant to play. Usability tests make games more enjoyable, even if seemingly indirectly.
Less “was it fun?”, more “could players get to the fun?”
Have a ‘next test’ question bucket
Each usability testing session tends to open a Pandora’s Box of new questions, concerns, insights and opportunities.
Watching user testing sessions as a group is incredibly valuable for this. Purpose-built playtest labs will include a soundproof observation room to keep devteam conversations flowing. A video livestream and a group chat are the next best thing during social distancing.
A 90 minute session is typically 10 minutes of briefing the playtester and paperwork, 50 minutes of uninterrupted play, 25 minutes of 1:1 interview, and 5 minutes of wrap-up.
With sessions booked back-to-back, every interview question becomes precious use of time.
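The 90-minute breakdown can be laid out as a simple timeline, which makes it plain how little interview time each session offers. This is a minimal sketch: the `session_timeline` helper and the 10:00 start time are made up for the example.

```python
from datetime import datetime, timedelta

# Phase lengths in minutes, taken from the 90-minute breakdown.
PHASES = [
    ("briefing and paperwork", 10),
    ("uninterrupted play", 50),
    ("1:1 interview", 25),
    ("wrap-up", 5),
]


def session_timeline(start: datetime) -> list[tuple[str, datetime, datetime]]:
    """Return (phase, start, end) for each phase of one session."""
    timeline = []
    cursor = start
    for name, minutes in PHASES:
        end = cursor + timedelta(minutes=minutes)
        timeline.append((name, cursor, end))
        cursor = end
    return timeline


# Hypothetical 10:00 start; the date itself is arbitrary.
for name, begin, end in session_timeline(datetime(2024, 1, 8, 10, 0)):
    print(f"{begin:%H:%M}-{end:%H:%M}  {name}")
```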
There is always a request for the moderator to “ask this player [XYZ]!” in the moment. A skilled facilitator will strike a balance between mainly following the interview script, to capture answers comparable across all playtesters, and going off script: formulating new questions specific to this player or exploring unforeseen issues.
To help with this balance, ensure there’s a way to capture questions for the subsequent round of research.
A spreadsheet or chat channel to dump questions in works well: everyone involved can feel heard and have their concerns addressed in time, without derailing the testing in progress.
And because usability testing can be scary and occasionally uncomfortable for the team, these ‘next test’ questions become a welcome reminder of a gain that’s worth the pain of doing this all again.
Design some representative scenarios
Usability tests are designed to demonstrate natural play. However, some UI design challenges only occur in uncommon or extreme situations that don’t arise in short playtest sessions.
Only letting the usability testers roam freely might not expose nuanced design weaknesses.
To ensure playtesters are exposed to sensitive designs, it’s common to generate some scenarios for them to complete. For example, during usability testing on a third-person building game, we asked the playtesters:
“Could you try and build a big McDonald’s ‘M’, on some flat ground?”
This activity forced players into a series of known-difficult steps: making curves, choosing a specific colour to build with, building above the characters’ head height, and so on. All in one elegant and intuitive instruction.
Build a basic research repository
If you can find the brainspace between iterations of your usability testing (!) spend some time capturing some metrics on how playtesters performed.
For example, if you’re building a series of ‘session goals’ or scenarios as suggested above, it’s worth tracking how game design pass/fails or errors change round-to-round.
Tracking the time taken to reach a major in-game milestone, or number of failed attempts, or number of total ‘high priority’ issues encountered on average: all valuable metrics. In doing so you’re plotting the game’s gradual improvement — and the positive impact of the research studies.
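A research repository can start as nothing more than a table of these numbers per round. The sketch below averages each metric across a round's playtesters so rounds can be compared. All figures, field names, and the `summarise` helper are hypothetical, invented purely to show the shape of the data.

```python
from statistics import mean

# Hypothetical per-playtester results for two rounds of testing.
# Each record: minutes to reach a milestone, failed attempts, high-priority issues.
rounds = {
    "round 1": [
        {"minutes_to_milestone": 14.0, "failed_attempts": 5, "high_priority_issues": 4},
        {"minutes_to_milestone": 11.5, "failed_attempts": 3, "high_priority_issues": 3},
    ],
    "round 2": [
        {"minutes_to_milestone": 9.0, "failed_attempts": 2, "high_priority_issues": 2},
        {"minutes_to_milestone": 8.5, "failed_attempts": 1, "high_priority_issues": 1},
    ],
}


def summarise(sessions: list[dict]) -> dict:
    """Average each metric across the playtesters in one round."""
    return {key: mean(session[key] for session in sessions) for key in sessions[0]}


summaries = {name: summarise(sessions) for name, sessions in rounds.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Because each round uses fresh playtesters, small shifts in these averages are best read as a rough trend rather than a precise measurement.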
Video recording the usability testing sessions is a staple action, but taking the extra step of storing and categorising the recordings in SharePoint, Confluence, Trello (or whatever you use) is worth it for the time saved digging through old files, should you want to review them again.
There is an emerging field of expertise in the building of these ‘research repositories’.
Getting new players in front of your video game prototypes is a powerful and inspiring exercise. Setting goals. Observing players at play. Iterating design.
There’s a true craft to streamlining every aspect of the process: minimising bias, ensuring playtester comfort, and maximising the value of the insights you gain.
If you’d like to read more about perfecting testing, check out Player Research’s article on how to make participants comfortable when you invite them to your studio: What Should a Playtest Smell Like?
For even more, here’s a fantastic talk on ‘success factors’ from Glu Senior Researcher Sara Romoslawski, and ArenaNet’s John Hopson penned an article on exactly why it’s troublesome to ask players “is it fun”.