
Ah, game review scores: you love ’em, and you’ve most likely hated them. Every other month, there always seems to be new fuel for the echo chamber flames; whether ignited by the latest controversial IGN review or by another case of Metacritic bombing, most of the discussion generally centers around one number: the score. Plenty of opinions have been aired about these numeric summaries, but one particular idea graced my mind while scrolling through the hellish depths of social media takes, one which has stuck with me ever since: the idea that, in our attempt to review games with the same system “initially designed for movies”, we unintentionally fostered the production of “movie-like” games.
Unfortunately, I’ve lost the original source for this quote. What I’ve stated here is a paraphrasing at best, perhaps a complete misremembrance altogether, and I would not like to argue about whether games are becoming more “movie-like” either. I do, however, want to discuss the provocative first part of the statement, which claims that the current system we use to score games was simply never designed for games. It made me question whether or not the way we score the medium is the “right” or “wrong” approach, where this system works, and where it inevitably falls apart. Specifically, it made me ask:
- How are video games traditionally scored in reviews?
- In what ways does this traditional system work? In what ways does it break?
- What are the alternatives we could use in reviewing games? What are the shortcomings of these alternatives?
- Finally, is a systematic change in the way we score games worth pursuing?
Throughout this article, I aim to answer these questions, gather insights, and form an argument against traditional, single-numeric review scoring practices that hinder the way we discuss and view games today.
Game Review Scores: An Overview
Game review scores, by and large, are singular numbers, usually on a scale of 0-5, 0-10, or 0-100. These scores are oft accompanied by review text and are meant to act as a representation of the game’s overall quality. Scores are ascribed even more value when paired with external, non-numeric meaning, like adjectives. IGN’s overview of their review practices highlights the meaning of each of their discrete possible ratings by likening them to adjectives, wherein a 5 signifies the game is “Mediocre”, an 8 signifies the game is “Great”, a 10 signifies the game is a “Masterpiece”, and so on.
The purpose of the review score, as described in this Lotus Eater blog post, is threefold. One, review scores are meant to reflect a game’s level of quality in a way that is accessible to the general public. Two, review scores are meant to reflect the critic’s individual taste and experiences with the game. Finally, review scores serve as quantifiable measures that can be used for aggregation and general comparison (though, notably, not micro-comparisons), in that we can generally call a 10/10 game better than a 6/10 game, but not necessarily better than a 9/10 game.

How reviewers actually arrive at their score varies greatly from reviewer to reviewer and platform to platform. Some critics settle on one unifying, summarizing number that captures the brunt of their feelings toward the game, often all at once, usually after they’ve already written their review. Other critics may follow some set of criteria unknown to the reader, which they then aggregate. Noclip founder Danny O’Dwyer described in a tweet GameSpot’s early method of calculating review scores, in which critics had to punch numbers into a mysterious black-box form that spat out a score without anyone knowing the algorithm behind it. The existence of these differences in scoring should not be surprising; criticism is, after all, a subjective art. One thing, however, remains common to all of them: the same outcome, a single, all-important number.
For the most part, this critic scoring system seemingly works well. Along with short review summaries, review scores distill a game’s overall quality into an easily understandable metric, aiding consumers looking for the most bang for their buck. They’re especially effective when viewed in a vacuum, as pointed out by Anderson in his blog about game review scores, where an 8/10 would simply mean that a game achieved 80% of what it set out to do, with the goal of reducing comparative bias.
And yet, it may come as no surprise that this system is still deeply flawed, not only carrying the same issues traditional scores already have but also bearing problems entirely unique to gaming.
Where Problems Arise
Many have pointed out plenty of shortcomings underlying video game review scores, some already inherent to the system itself. One example is the “bastardization” of the review: the phenomenon in which consumers focus solely on review scores in their decision-making and even in their discussions, neglecting to read the actual review. In fact, a study by Kasper et al. (2019) found that a significant portion of gamers based their perception of review helpfulness on nothing but the score, leading the researchers to believe a good chunk of consumers no longer read review text. But this is a problem seen everywhere the same system is used.
What fascinates me more are the problems specific to the gaming industry, whether it be the rampant score inflation publications seemingly indulge in, like IGN giving three-quarters of their game reviews a score greater than 6 (see: Josh George’s “IGN’s review scale makes no sense”), or the general inaccuracy people feel these scores portray, which often results in critics receiving heavy backlash. As others have already shouted in years past, like Carl in his 2017 blog post “Do Game Review Scores have any Meaning?” or Matt Edwards in his 2020 SUPERJUMP article “Review Scores and Toxicity”, and as I reiterate here: game review scores do not adequately reflect player experience.
However, I believe the truth behind these issues lies deeper, at the heart of the very medium these scores attempt to attach themselves to. In fact, I would like to go even further and discuss why I believe gaming scores do not act as a good mirror for the general consumer, which boils down to two inherent facts about the medium, facts that simple, traditional, single-numeric review scoring systems do not and cannot fully encapsulate: gaming is multifactorial, and gaming is personalized.
Gaming as a Multifactorial, Multidimensional Medium
Merriam-Webster defines multifactorial as an adjective meaning “having, involving, or produced by a variety of elements or causes.” I first encountered the idea of multifactorial analysis while reading about the captivating processes behind survey questionnaire creation. Christiaan Verwijs, in his article “How (Not) To Construct A Proper Questionnaire”, showcased the concept of “latent factors”: factors unmeasured, but not unmeasurable. Say you’re surveying folks to gauge their productivity; how would you even begin to measure it? Human behavior tends to be complex, involving a myriad of causes and effects, and in the case of behavioral studies, a single metric is often not enough. Thus, many latent factors must be taken into consideration to create an accurate profile of a population.
In much the same way that complicated respondent behavior is measured through various latent factors, video games are composed of many different elements, too. Gaming is an inherently interdisciplinary medium, in ways incomparable to, and in forms incompatible with, those of other media. Whereas music is audio and TV is audiovisual, gaming can be described as “audiovisualtactile”, and while these media have over the years started to merge in direction and discipline, this fundamental truth remains wholly intact. One could even say that game development is the ultimate combination of all the digital arts. No matter which way you spin it, games do require more effort, they do require broader skillsets, and, certainly, they do require more nuance.
It becomes even more complicated, then, when you realize much of that nuance is inapplicable to a vast majority of the medium; in other words, the criteria required in game critique are not as set in stone as people perhaps realize. Sure, games do have elements common to most or all of them. Gameplay, story, art, visuals, sound, and music are only some of the many aspects this medium can include, and these are by and large what you hear in related conversations, but the inclusion and concentration of these aspects remain largely selective and context-dependent. Do you think every game is built on these same principles? Do you think every game should be judged on the same gameplay basis? Do you think every game has the same class of storytelling, if it even has a story at all? Indeed, it would not be very wise to critique visual novels based on their gameplay, nor would it be very wise to critique open sandbox games based on their narrative.
Art should always be viewed with no less substance than it takes to make it. But instead of attempting to capture that substance, gaming publications have widely constrained themselves to a one-number system, one from the times of old, and one which could only ever hope to circumscribe the complexities of the medium. Otherwise, it would need to be able to answer uncomfortable questions like, “How much did gameplay contribute to the experience? How much did the game’s mechanics affect the score? What about the story? The music?” These questions can certainly be answered by review text with words of incalculable depth, but they could never be answered by a single, lonely score.
Moreover, scoring games in this way can feel rather restrictive and reductive. A game may be as enjoyable as a modern classic, but if the story is less interesting than drying paint, does it deserve a failing score? Many reviewers would tell you: no! An enjoyable game with an uninteresting plot would not deserve a bad score, because it is still enjoyable, and if you extend this logic to other factors of a video game, you might just arrive at one of the unspoken reasons for score inflation. Apart from critics not wanting to review games they know already sit below a 6 (i.e., games so bad they’re not even worth critiquing), and apart from their hesitation to give games brutally low scores that could well affect the livelihoods of their developers, critics are also generally hesitant to give games low scores because of the multitude of factors that can, at the very least, provide enjoyment in the absence of others. Besides, if all games contain something valuable, do any of them really deserve a low score? Maybe review publishers believe a game has to suck in every single aspect before they could have the gall to give it anything below a 3. It’s akin to what my friend told me when I asked him for his opinion on the matter,
“Sometimes, […] games can have the same score but be wildly different in enjoyability because one part of it carries the score.”
And, look, even if using one metric to quantify a game’s caliber makes sense (despite the system never having been built for the medium), and even if it is the intuitive, simple approach we’ve been using for decades, single-numeric scores still lack so much descriptive ability that, to any consumer with above-average quality-consciousness, they’re practically worthless.
This is even more true when you consider that the sheer interactivity and freedom gaming allows create so many unique experiences that really no other form of media can provide, making games feel far more personalized.
Gaming as a Boundlessly Interactive, Personalized Medium
In my journey to find light within this dark conundrum, I went ahead and asked my friends for their thoughts on the matter. Specifically, I asked them all a simple question: “What do you think about game review scores?” Some didn’t hold a strong stance, and those who did provide their insights I’ve quoted throughout this article, but one answer in particular stood out to me,
“Game review scores just kind of feel disingenuous.”
I felt shocked when I first heard this. Disingenuous? Really? Where did that distrust come from? Undoubtedly, much of it can be attributed to the bad rep many gaming publications have received over the years, with reviews constantly being published that a sizeable portion of players disagree with. Still, I believe that at the heart of this player-critic disagreement lies the second fundamental truth about gaming which single-numeric scores are not equipped to handle. Games are personal.
When you think about the sentence, “everyone experiences art differently,” you often think of it as rhetoric: an expression highlighting the cognitive uniqueness of man underlying his subjectivity. We experience art differently in the sense that we perceive it differently, that we analyze it differently, and that we derive different messages from it depending on who we are and what our role is in society. But when we say, “everyone plays games differently,” that’s just literally true. Not only are there perceptive, emotional differences in the way we experience games, but there are also tangible differences in the way we play them.
This can easily be seen in, say, the bugs a player encounters. A critic may find a whole post-rainy season’s worth of bugs in one game, severely ruining their otherwise incredible experience, while those bugs may never surface in another critic’s playthrough, allowing them to find far more value in the work. That’s just one example of many, and to the critic tasked with reviewing procedurally generated sandbox games like Minecraft, or random dungeon roguelikes like The Binding of Isaac, where every outcome is to a great extent decided by personal choice and contingencies: good luck giving those games scores.
We haven’t even considered the differences that player skill, individual taste, or socioeconomic circumstances make to a player’s experience, because honestly, we don’t need to. It should be easily understood at this point that video games, by their very nature, are different for everyone and anyone, in ways no other form of media can match. TV shows and films, in their linear nature, can at least rely on everyone’s tangible experiences being the same, even if their perceived experiences are not. You watch shows like everyone else, read books like everyone else, and listen to music like everyone else, but you don’t play games like anyone else. Games don’t have such luxury and must continuously design themselves around player decisions, and single-numeric, unidimensional scores cannot capture the whole host of decisions individual consumers can make. Arguably, no metric can. Because while cinephiles are at the mercy of the film, games are at the mercy of the gamer.
Potential Alternative Methods of Scoring Video Games
So, if single-numeric, unidimensional scores don’t suffice for the medium, what are the alternatives?
The first thing we could naturally attempt to address is the dimensionality of the score: we could opt instead for multidimensional scoring systems, which are surprisingly commonplace and are still practiced by plenty of lesser-known review publications to this day. Such a system involves looking at individual aspects or criteria of a game rather than judging the game heuristically as a whole. This not only captures a lot of the nuance I referred to in the previous sections, but it is also generally more informative to consumers (especially those looking for specific qualities in games), and it has the added bonus of making the critic’s job easier by allowing them to separate their feelings about the game into its constituent factors. Hearkening back to my discussions with my friends, one of them agrees with this particular practice, stating,
“[Game review scores] should have definite criteria, personally.”
However, this modus operandi has its fair share of glaring flaws. For one, as stated by the aforementioned Lotus Eater blog, it has the potential to complicate the scoring process in a way that makes it less functional as a heuristic. I believe this cuts both ways, though: if the individual criteria are themselves treated as no more than heuristics in their own right, multidimensional scoring can still work and can, in fact, vastly improve the system’s descriptive ability.
The more glaring issue for me is one mercilessly portrayed by this satirical tweet from user HotCyder: the issue of arbitrariness. Choosing which metrics and which dimensions to include is not founded upon any strong theoretical precedent. Some sites, like RPGFan, divide their score into Graphics, Sound, Gameplay, Control, and Story. But, like, these criteria are arbitrary, are they not? We could have just as easily included, say, Accessibility, or we could have removed Control and folded it into Gameplay. When I asked my friend exactly what criteria they would like to implement, they suggested a “50/50” divide: 50% of the score is for the game by itself, and 50% is for the game in relation to its genre.
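To make the arithmetic concrete, here is a minimal sketch of how such a multidimensional score might be collapsed back into a single headline number. The criteria, weights, and example numbers are purely hypothetical, not any publication’s actual formula, and the “50/50” split is my friend’s suggestion rather than an established practice.

```python
# A minimal, hypothetical sketch of multidimensional scoring.
# Criteria, weights, and numbers are illustrative only, not any outlet's real rubric.

CRITERIA_WEIGHTS = {
    "graphics": 0.15,
    "sound": 0.15,
    "gameplay": 0.30,
    "control": 0.15,
    "story": 0.25,
}  # weights sum to 1.0

def weighted_score(sub_scores):
    """Collapse per-criterion scores (0-10) into one weighted headline number."""
    return sum(CRITERIA_WEIGHTS[name] * score for name, score in sub_scores.items())

def fifty_fifty(standalone, versus_genre):
    """The suggested split: half for the game on its own, half for the game within its genre."""
    return 0.5 * standalone + 0.5 * versus_genre

review = {"graphics": 8, "sound": 7, "gameplay": 9, "control": 8, "story": 5}
print(round(weighted_score(review), 1))                    # 7.4
print(round(fifty_fifty(weighted_score(review), 8.6), 1))  # 8.0
```

Even in this toy version, the weights alone smuggle in a judgment about which aspects matter most, which is exactly the arbitrariness that satirical tweet pokes at.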

A more extreme example of multidimensional scoring is the “metric scoresheet” used by The Longhouse Podcast as seen above, featuring droves of criteria grouped into many categories. I attempted to reach out to them to ask about their thoughts on traditional, unidimensional review scores and their reasoning for choosing such criteria, but unfortunately, I was not able to get a response.
Choosing criteria that are (a) broad enough to encompass the vast majority of games out there and (b) specific enough to communicate subtleties is hard. I myself probably prefer RPGFan’s criteria most, but the same does not go for everyone. Likewise, the rationale behind the titan that is the metric scoresheet above is one I can respect, but I can’t help feeling that such hyper-specific criteria will only lead to checklist-style critique rather than real appreciation of the art.
One may also consider the idea of “dynamic criteria”, in which only the factors applicable to the game are assessed; an entirely linear narrative game, for instance, may ditch the Gameplay criterion altogether. In terms of showcasing the differences between games, this is great, but it also makes life a nightmare for review publishers and consumers alike, who must now sort through every review’s individual set of criteria instead of relying on one global set. However, if we instead choose to keep one set of criteria for all games, we run into the problem of it being too rigid, especially for a medium that constantly breaks its own boundaries.
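Here is an equally hypothetical sketch of what dynamic criteria might look like in practice: each review scores only the factors that apply, and the headline number is averaged over whatever was actually assessed. The criteria names and numbers are, again, illustrative assumptions.

```python
from statistics import mean

# Hypothetical sketch of "dynamic criteria": score only what applies to the game.
# None means the criterion simply does not apply and is excluded from the average.

def dynamic_score(sub_scores):
    """Average only the criteria that were actually assessed (non-None), on a 0-10 scale."""
    applicable = [s for s in sub_scores.values() if s is not None]
    return mean(applicable)

# A linear narrative game might skip the Gameplay criterion altogether...
visual_novel = {"gameplay": None, "story": 9, "visuals": 8, "sound": 7}
# ...while an open sandbox game might skip Story instead.
sandbox = {"gameplay": 9, "story": None, "visuals": 7, "sound": 8}

print(round(dynamic_score(visual_novel), 1))  # 8
print(round(dynamic_score(sandbox), 1))       # 8, the same number for two very different games
```

Note how the two games land on the same number despite being scored on different criteria, which is precisely the comparison headache described above.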
Alternatively, instead of prescribing a numeric score to video games, we could directly prescribe adjectives or verbs to them, like how Anderson in his blog describes a world where we simply categorize games based on how much we recommend them, using the labels “Avoid at all costs”, “Try if you like the series/genre”, “Try”, and “Buy”. This system does alleviate the pressure of having to prescribe arbitrary numeric scores to games, and it does retain the purpose of informing the general public about a game’s inherent quality. However, it loses a lot of communicative power and the ability to make finer comparisons about game quality. If you do not care so much about such discussions and merely want a summary of whether or not you should purchase what you perceive to be products, this system certainly works, but for those looking to engage with the conversation at a deeper level, it may not exactly suffice.
One must also consider the fourth, secret purpose of review scores that I have yet to discuss: their use in marketing. Game publishers love to flaunt how their releases got a 10/10 from various media outlets, and such consumer-focused methods may not be nearly as marketable. The claim that “most critics say you should buy the game” is sadly not as easy to sell as a simple “100/100”. But you know what is easily marketable? Words.
The last alternative system, and the one I perhaps agree with most, is to have no system at all: a complete departure from scoring, featuring nothing but review text. Plenty of publications have lived off numeric silence, and it’s not exactly an impossible change to implement, though it will unquestionably prove difficult. Scores drive discussion, and they especially drive clicks. Not only that, but truthfully, scoring games can also be fun. Reaching into that level of abstraction to take what is a deeply complex medium and turn it into a single score is one hell of a challenge, but it is an exercise quite rewarding for critic and gamer alike. Yet our hyperfocus on scores, scores that don’t even make sense in the context of the medium, has twisted what could’ve been insightful conversations about that very medium, and I believe that after all this time, we should finally strive for change.
“The focus needs to shift to listening to what is actually said.”
– Matt Edwards, 2020, in Review Scores and Toxicity.
The Implications of Change
Throughout this article, I’ve answered three of the four questions I set out in the beginning. I’ve answered how video games are traditionally scored, in what ways this system breaks, and how we could shift to alternatives that may introduce new issues of their own. But there remains one unanswered question: “Is a systematic change in the way we score games worth it?”
To be honest…
I don’t know. As much as I can sit here and point out the issues with the current review practices that plague the gaming industry, and as much as I have the freedom to suggest alternative approaches it could take, I can only speculate about its future with said alternatives, and about whether those imagined futures truly are brighter than the now. I can’t deny that, as flawed as single-numeric, unidimensional scores are for gaming, they still carry critical merit, enough that we could even continue using them to this day. But critical merit does not translate to communicative power, and it especially does not translate to an accurate representation of a game’s “quality”, as abstract and complex a concept as that truly is.
Admittedly, I’ve likely failed to consider many things too, like the implications of such changes for the publications themselves, or the consumer response, or how such systems would feel for critics to actually use.
Certainly, I don’t claim to have the answers. I don’t. And I also don’t mean for this article to jeopardize the already quite endangered gaming critic, who even now has to suffer in such a dangerously hostile space. This is not meant to start a finger-pointing blame game, because I don’t believe this to be the fault of any one person or group.
If anything, the continued use of the current practices today shows our human stubbornness and the burden we must carry with it. The best I can do is analyze, argue, and inform, in the meek hope that things will change for the better. Because if nothing changes, then nothing will.
Conclusion
“At the end of the day, review scores are only worth as much value as you give them.”
– Carl, 2017, in Do Game Review Scores have any Meaning?
As video games become more and more varied and boundary-breaking over the years, it becomes clearer and clearer still that this single-numeric scoring system was never built for the medium. We could never have expected just how different games would become from each other, and the inadequacy of these metrics is slowly being felt.
When we began the adventure of the video game some four to five decades ago, we never could have expected where we would lead the medium. Gaming used to be seeds on the ground that we once drew on crusty paper with jet-black ink. Over its many years, gaming has blossomed into a beautiful rose of curling, velvet petals, opening in ever-so-slightly unique ways to every person who touches it. But did we choose to capture that rose in high-resolution video? Did we choose to seize the beauty of that rose in poignant prose? No. We chose. We chose to continue drawing it, on crusty paper, with jet-black ink.
Gaming has evolved so much since its birth, yet our scoring practices have not aged with it. Maybe we need to fundamentally change the way we view these numbers. Or maybe we just need to take a break.
- Lotus Eater’s blog post on review scores is a highly recommended read alongside this article; it engages with the current review practices seen today and views them under a critical light, rather than the outright rejection of the system that I’ve argued for.