Game Evaluation


Overview

Once SetupBooster is complete, it could be tested by recruiting volunteers to play on one or more demonstration instances. After volunteers create accounts and log on, the site would direct them to begin the tutorial. To fully test the system's gamification elements, playtesting would also need to cover the creation and rating of articles.

Normal User Testing

Most game actions are part of the tutorial. Usability and gameplay testing would focus mostly on assessing users' experience with the playability of the tutorial, asking them to rate the extent to which they felt it helped them to understand the SetupBooster wiki system. A Google Forms survey would be distributed so that users could report the total time taken to complete each tutorial section and describe issues that were encountered.

Testing the full ratings and points system would be difficult due to the reliance on user-generated content. Adding pre-made, pre-rated articles to the database before the beginning of the gameplay testing and tracking how their ratings changed could be a way of achieving consistency between trials. All other data would vary greatly between trials depending on the interests of the users.
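
The seeded-article comparison described above could be sketched as follows. This is only an illustration under stated assumptions: ratings are taken to be plain numbers, and the function name, article titles, and dictionary layout are hypothetical rather than part of SetupBooster.

```python
def rating_changes(baseline, after_trial):
    """Return (post-trial rating - baseline rating) for each seeded article.

    Both arguments are hypothetical dicts mapping an article title to a
    numeric rating. Articles created during the trial (present only in
    after_trial) are ignored, since only seeded articles are comparable
    between trials.
    """
    missing = baseline.keys() - after_trial.keys()
    if missing:
        raise KeyError(f"no post-trial rating for: {sorted(missing)}")
    return {title: after_trial[title] - start for title, start in baseline.items()}


# Hypothetical example data: two seeded articles plus one user-created article.
seeded = {"Seed A": 4, "Seed B": 2}
post_trial = {"Seed A": 3, "Seed B": 5, "New article": 1}
changes = rating_changes(seeded, post_trial)
```

Comparing the same seeded titles across trials would give one consistent measure even when all other content differs.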

These trials would take place with preset administrator users (likely the people managing the trial) and volunteer users.

Administrator Testing

Like any points-based system, this one can and probably will be gamed. As the plan for the final implementation currently stands, all new users, once they have earned the Basic Training badge, will start their wiki experience with 155 points, plus 20 more for the badge, for a total of 175. Creating an article earns 10 points, which negative ratings can cancel out.
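
The arithmetic above can be written out as a short sketch. The three constants come from the text; the function names and the rating_penalty parameter are hypothetical, since the plan does not say how many points a negative rating removes.

```python
# Values taken from the plan described above.
STARTING_POINTS = 155          # points every new user starts with
BASIC_TRAINING_BONUS = 20      # awarded with the Basic Training badge
ARTICLE_CREATION_POINTS = 10   # awarded for each article created


def points_after_tutorial():
    """Points a new user holds once Basic Training is complete."""
    return STARTING_POINTS + BASIC_TRAINING_BONUS


def points_after_articles(articles_created, rating_penalty=0):
    """Total points after creating some number of articles.

    rating_penalty is a hypothetical stand-in for points lost to negative
    ratings; the plan does not specify how that loss is calculated.
    """
    if articles_created < 0 or rating_penalty < 0:
        raise ValueError("counts and penalties must be non-negative")
    earned = articles_created * ARTICLE_CREATION_POINTS
    return points_after_tutorial() + earned - rating_penalty
```

Under these assumptions, a user who creates three unrated articles would hold 205 points, and losing 30 points to negative ratings would return that user to the post-tutorial total of 175.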

Three of the possible issues are likely to involve behavior that an administrator would be expected to handle. Testing them would probably require a separate trial in which test users are granted administrator rights over a pool of user accounts that are actually being "played" by one or more of the people conducting the research.

In this scenario, the researchers conducting the trial would play the ordinary users and attempt to manipulate the system, while the volunteers would act as administrators.

There are several potential issues:

  • It is theoretically possible to spam short articles that are never read or rated by anyone, earning 10 points for each one. There is nothing intrinsically wrong with creating many unread articles: not every article will be popular. However, an editor or group of editors could deliberately create large numbers of blank or nearly blank articles that they never intend anyone to read. This would be difficult in the early days of a wiki instance, when there are fewer articles and the behavior stands out more, but it would be less obvious on a high-activity instance. Badges (with points) will be awarded for creating certain multiples of five articles (5, 10, 20, etc.), which would give a spammer an additional cushion against some of the low-quality articles being found and downvoted. The problem is compounded by a current technical limitation: all articles are created blank with only a title, then edited later.
    • This could be tested by nominating administrators, then arranging for someone to pose as a normal user while creating many spam articles. The goal would be to see how quickly administrators would detect something like this against the background noise of normal wiki activity.
  • A spambot could become the highest-ranked user. This would be caught by the administrators once it happened, but might reflect badly on the perceived quality of the user base.
    • This could be tested by creating an actual bot, or selecting a group of people to pose as one, posting at regular intervals, and measuring how long it took for this activity to be detected.
  • Groups of editors could form alliances to upvote each other's content and downvote that of others. This would be difficult for administrators to detect, and difficult to deal with fairly - especially if administrators were part of the groups, or if the groups included some of the most frequent contributors to the site. However, this is more of a people problem than a technical problem and would need to be dealt with by administrators.
    • This would be difficult to test, as it would require a large number of subjects (or a few subjects posing as many users) to be using the demonstration instance in the first place.
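
As a point of comparison for the article-spam trial, a simple automated check could be run alongside the human administrators. The sketch below is one possible heuristic, not part of the planned system: the Article fields, the function name, and both thresholds are hypothetical. It flags authors who have created several articles that are blank or nearly blank and have never been rated.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record layout; the real schema may differ.
    author: str
    body: str
    rating_count: int


def flag_possible_spammers(articles, max_body_chars=50, min_suspect_articles=5):
    """Return a sorted list of authors with many short, unrated articles.

    An article counts as suspect when its stripped body has at most
    max_body_chars characters and it has received no ratings. Both
    thresholds are illustrative guesses and would need tuning per instance.
    """
    suspect_counts = Counter(
        a.author
        for a in articles
        if len(a.body.strip()) <= max_body_chars and a.rating_count == 0
    )
    return sorted(
        author for author, n in suspect_counts.items() if n >= min_suspect_articles
    )
```

Because every article starts out blank with only a title, a check like this would also catch legitimate articles that have not yet been edited, so its output would be a prompt for administrator review rather than evidence of abuse.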