Game Evaluation
Overview
Once SetupBooster is complete, it could be tested by recruiting volunteers to
play on one or more demonstration instances. After volunteers created accounts and logged on,
the site would direct them to begin the tutorial. To fully test the system's use of
gamification elements, the playtesting would also need to cover the creation and
rating of articles.
Normal User Testing
Most game actions are covered by the tutorial, so usability and gameplay testing
would focus mostly on the playability of the tutorial. Users would be asked
to rate the extent to which they felt it helped
them to understand the SetupBooster wiki system. A Google Forms survey would
be distributed so that users could report the time taken to complete
each tutorial section and describe any issues they encountered.
Testing the full ratings and points system would be difficult because
it relies on user-generated content. One way to achieve some
consistency between trials would be to add pre-made, pre-rated
articles to the database before gameplay testing begins
and track how their ratings change. All other data would vary
greatly between trials depending on the interests of the users.
These trials would take place with preset administrator users
(likely the people managing the trial) and volunteer users.
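The idea of seeding pre-rated articles and tracking how their ratings change could be sketched as follows. This is only a minimal illustration: the `SeedArticle` type, the numeric rating scale, and the article titles are all assumptions made for the example, not part of the SetupBooster design.

```python
from dataclasses import dataclass


@dataclass
class SeedArticle:
    title: str
    baseline_rating: float  # rating assigned before the trial (scale is an assumption)


def rating_drift(seeds, final_ratings):
    """Return {title: final rating - baseline rating} for each seeded article.

    Seeded articles with no entry in final_ratings are skipped.
    """
    drift = {}
    for seed in seeds:
        if seed.title in final_ratings:
            drift[seed.title] = final_ratings[seed.title] - seed.baseline_rating
    return drift


# Hypothetical seed articles and end-of-trial ratings.
seeds = [SeedArticle("Seed article A", 4.0), SeedArticle("Seed article B", 2.0)]
trial_ratings = {"Seed article A": 4.5, "Seed article B": 1.5}

print(rating_drift(seeds, trial_ratings))
```

Because the same seed articles would be loaded before every trial, the per-article drift gives one figure that can be compared between trials even though the user-generated content differs.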
Administrator Testing
Like any points-based system, this one can and probably will be
gamed. As the plan for the final implementation of the system currently stands, all
new users, once they have earned the Basic Training badge, will start their wiki
experience with 155 points, plus 20 more for the badge, for a total of 175. Creating
an article earns 10 points, which can be negated by negative ratings.
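The arithmetic above can be written out as a short sketch. The three constants come from the plan as described here; the `points_lost_to_ratings` parameter is a placeholder, since the size of the penalty for a negative rating is not specified, and milestone badges are left out.

```python
STARTING_POINTS = 155       # from the plan
BASIC_TRAINING_BADGE = 20   # from the plan
ARTICLE_POINTS = 10         # from the plan


def points_after(articles_created, points_lost_to_ratings=0):
    """Estimate a user's total after creating some number of articles.

    points_lost_to_ratings is a hypothetical placeholder: the plan does not
    say how many points a negative rating removes.
    """
    base = STARTING_POINTS + BASIC_TRAINING_BADGE
    return base + ARTICLE_POINTS * articles_created - points_lost_to_ratings


print(points_after(0))    # 175: a new user with the Basic Training badge
print(points_after(50))   # 675: fifty articles that nobody ever rates
```

The second call shows why unrated articles matter: with no ratings to offset them, the points from article creation grow linearly with the number of articles.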
Three of the possible issues are likely to involve behavior which it would be an
administrator's job to handle. Testing them would likely require separate trials
in which the usual roles are reversed: volunteers would be granted administrator
rights over a pool of users that are actually being "played" by one or more of the
researchers conducting the trial, who would attempt to manipulate
the system.
There are several potential issues:
- It is theoretically possible to spam short articles that are never read and
never rated by anyone, earning 10 points for each one. There is
nothing intrinsically wrong with creating many unread articles: not every article
will be popular. However, an editor or group of editors could deliberately create
large numbers of blank or nearly blank articles with the intention of
their never being read. This would be difficult to do
in the early days of a wiki instance, when there would be fewer articles and the
behavior would stand out more, but on a high-activity instance it would be less
obvious. There will be badges (with points) awarded for creating certain multiples
of five articles (5, 10, 20, etc.), which would provide an additional cushion against
the points lost when some of the low-quality spam articles are found and downvoted. The problem is compounded
by a current technical limitation: all articles are created as blank with a title,
then edited later.
  - This could be tested by nominating administrators, then arranging for
  someone to pose as a normal user while creating many spam articles. The
  goal would be to see how quickly administrators would detect something
  like this against the background noise of normal wiki activity.
- A spambot could become the highest-ranked user. Administrators would
catch this once it happened, but it might still reflect badly on the perceived
quality of the user base.
  - This could be tested by creating an actual bot, or selecting a group of
  people to pose as one, posting at regular intervals, and measuring how
  long it took for this activity to be detected.
- Groups of editors could form alliances to upvote each other's content and
downvote that of others. This would be difficult for administrators to detect and
difficult to deal with fairly, especially if administrators were part of the groups
or the groups involved were some of the most frequent contributors to the site. However,
this is more of a people problem than a technical problem and would need to be dealt with
by administrators.
  - This would be difficult to test, as it would require a large number of
  subjects (or a few subjects posing as many users) to be using the demonstration
  instance in the first place.
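For the spam-article and spambot tests above, the quantity being measured is how long administrators take to detect the activity. A minimal sketch of how that could be recorded and summarized is shown below; the function names, the use of minutes, and the sample timestamps are all assumptions made for illustration, and no trial data exists yet.

```python
from datetime import datetime, timedelta
from statistics import median


def latency_minutes(first_post_at, flagged_at):
    """Minutes from the first staged spam post to an administrator flagging it.

    Returns None when the activity was never flagged during the trial.
    """
    if flagged_at is None:
        return None
    return (flagged_at - first_post_at).total_seconds() / 60


def summarize(latencies):
    """Count detected trials and report the median latency of those detected."""
    detected = [m for m in latencies if m is not None]
    return {
        "trials": len(latencies),
        "detected": len(detected),
        "median_minutes": median(detected) if detected else None,
    }


# Made-up timestamps for three staged trials; the third was never detected.
start = datetime(2024, 1, 1, 9, 0)
trials = [
    (start, start + timedelta(minutes=30)),
    (start, start + timedelta(minutes=90)),
    (start, None),
]

summary = summarize([latency_minutes(a, b) for a, b in trials])
print(summary)
```

Keeping undetected trials as `None` rather than dropping them means the summary reports both how often the activity was caught and how quickly.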