User:TeaDrinker/Welcome study

From Wikipedia, the free encyclopedia

Abstract

This pilot study investigates the effect of welcoming new users to Wikipedia before they edit. A randomly selected subset of the 214 newly registered editors in the study received a standard welcome template, and editing behavior was monitored for one week following registration. The welcome template appeared to increase the probability that a user would make a non-vandalism edit from 0.200 for non-welcomed users to 0.298 for welcomed users (one-sided p-value: 0.059). Several other variables were also measured.

Introduction

Wikipedia has the primary goal of writing an encyclopedia, and around this work a community of editors has developed. The community exists to better the encyclopedia, although little research has been done to quantify this contribution. This study looks at the effectiveness of welcoming new users prior to their first edit. The primary measured outcome is whether or not the user makes an edit.

Wikipedia's founder, Jimbo Wales, included in his Statement of principles the point that "New users should always be welcomed," although this was probably intended in a broader sense. Welcoming new users with a welcome template provides a contact within the community, a short collection of links intended to help find information on how to contribute, and is intended to provide positive reinforcement for contributing. This study examines the effect of this particular form of welcoming on the subsequent editing of the new user.

Materials and methods

From 02:27 on 4 May, 2007 to 19:59 on 10 May, 2007, 214 newly registered editors were selected from the top of Special:log/newusers. After each selection, a pseudo-random number generator in R determined, with equal probability, whether the user would be welcomed. A total of 113 users were welcomed using the {{welcome}} template, under a welcome headline, and signed by User:TeaDrinker. The intent was to welcome each user before they made any edits, although it is perhaps inevitable that some users did not see the message until after they had completed their first edit.
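The assignment step described above can be sketched as follows. The study used R; this sketch uses Python's standard `random` module instead, and the function name `assign_welcome`, the placeholder usernames, and the seed are all hypothetical, chosen only for illustration.

```python
import random

def assign_welcome(usernames, seed=None):
    """Independently assign each user to the welcome group with probability 0.5."""
    # The seed is a hypothetical addition; it only makes the sketch repeatable.
    rng = random.Random(seed)
    return {name: rng.random() < 0.5 for name in usernames}

# Placeholder names standing in for the 214 accounts taken from the new-user log.
users = [f"user_{i}" for i in range(214)]
assignment = assign_welcome(users, seed=2007)

n_welcomed = sum(assignment.values())
print(n_welcomed, len(users) - n_welcomed)
```

Because each user is assigned by an independent coin flip, the two groups need not be the same size, which is consistent with 113 of the 214 users being welcomed.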

The time of the editor's first edit was recorded as soon as it was seen, and the edit was subjectively classified into one of thirteen types. In cases of vandalism or creation of pages which fit the criteria for speedy deletion, the appropriate action was taken, including warning the user. One week after the user created an account, the number of (non-deleted) contributions was tallied (although in most cases, the other edits were not examined or classified).

The time a first edit was made was recorded even if the edit was subsequently deleted. Deleted pages, however, were not included in the end-of-week count. Since Wikipedia does not keep an accessible record of deleted edits indexed by user, some first edits were likely missed. An examination of the talk pages of editors in the study showed one instance of a notification of a speedy deletion which was not recorded.

Several users were excluded from the study at various stages. One account was an obvious attempt to impersonate an administrator; the username was reported to administrator intervention against vandalism and the account was subsequently blocked. No further data were collected on this user. Two users were welcomed by other editors, and one new editor made an edit before they were welcomed. These have all been excluded from analysis.

Although users were selected for inclusion in the study in a non-selective manner, the time of day at which they were selected was not random. Most editors were selected in the 9 hours between 1800 and 0400 (UTC) due to difficulties in acquiring a representative sample.

Results

Of the 211 editors included in the study, 112 were welcomed in the manner described above, while 99 did not receive a welcome. Of the 211 editors, 66 made an edit in the first week. The breakdown of first edits by type is shown in the table.

Number of first edits by type

Type | Total | Description
Vandalism | 17 | Vandalism was broadly defined, and included traditional vandalism, clear advertising (even when the latter may have been done in good faith), and test editing on articles.
Possible vandalism | 2 | Edits which were reverted, but were likely the result of an error rather than intentional mischief or a misunderstanding of policy.
Small edits | 12 | Edits of less than approximately two sentences, excluding edits which created a new page or reverted vandalism.
Large edits | 3 | Substantive contributions of two sentences or more, excluding edits which created a new page or reverted vandalism.
Reverting vandalism | 1 | Edits which reverted or removed vandalism.
Image uploads | 6 | Uploading an image, whether or not it was marked for speedy deletion.
Userspace | 5 | Edits to the user's own userpage or talk page.
Discussion | 5 | Edits to other users' talk pages or article talk pages.
Sandbox/Editing help | 4 | Edits to the sandbox, pages for test editing, or requests for editing help.
New pages, not CSD | 2 | New pages which did not fit the criteria for speedy deletion.
New pages, deleted | 9 | Pages created which met the criteria for speedy deletion.
Move pages | 0 | Page moves other than vandalism.
Other | 0 | Any edit which did not fall into one of the above categories.

The principal aim of the study is to determine whether welcoming users has an effect on their behavior. With this goal in mind, the principal metric is the number of users who make at least one non-vandalism edit. The proportion of users making non-vandalism edits was higher in the welcomed group, as shown in the table.

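Number of editors by welcome status

Measure | Welcomed | Not welcomed | Total
Users in group | 112 | 99 | 211
Users whose first edit was vandalism | 8 | 9 | 17
Users making a non-vandalism edit | 31 | 18 | 49
Proportion making a non-vandalism edit (users whose first edit was vandalism excluded from the denominator) | 0.2981 | 0.2000 | 0.2526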

Under a null hypothesis that the proportions in the welcomed and non-welcomed groups were both equal to their combined proportion, the p-value for the hypothesis that the welcomed group was more likely to edit is 0.05845 (using a normal approximation with no continuity correction).
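The proportions and p-value above can be reproduced with a pooled two-proportion z-test. The sketch below is a reconstruction rather than the author's original R code: it assumes the denominators exclude users whose first edit was vandalism (112 - 8 = 104 and 99 - 9 = 90), which is inferred from the reported proportions and not stated explicitly. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test of H1: p1 > p2.

    Uses the normal approximation with no continuity correction.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return p1, p2, pooled, z, p_value

# Assumption: denominators exclude users whose first edit was vandalism.
p_w, p_n, pooled, z, p_value = one_sided_two_proportion_z(31, 112 - 8, 18, 99 - 9)
print(f"welcomed={p_w:.4f} not_welcomed={p_n:.4f} pooled={pooled:.4f}")
print(f"z={z:.3f} one-sided p={p_value:.5f}")
```

Under that assumption the three proportions match the table to four decimal places, and the one-sided p-value comes out at approximately 0.058.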

Other measures

There are other metrics which can be calculated from the data. In most cases, multiple hypotheses can be constructed for how each measure should be affected by the welcome message. Since many such hypotheses can be generated for any number of possible outcomes of the data, it would perhaps be misleading to do a formal hypothesis test on each of them, at least without careful correction for multiple comparisons. As such, these data are best viewed as exploratory and perhaps indicative of directions for future research, rather than as definitive conclusions.

Time of edits

One measure which can be calculated is the time to the first edit. This measure seemed to have two distinct behaviors; many people edited within an hour or so of registering. If an editor did not make their first edit within roughly an hour, however, they tended to wait for hours or days before making their first edit. There was some variability in the outcomes by whether the editor received a welcome, as shown in the table.

Waiting time for non-vandalism first edits

Measure | Welcomed | Not welcomed | Total
Users whose first edit came within 60 min | 28 | 14 | 42
Proportion of users making a non-vandalism edit whose first edit came within 60 min | 0.9032 | 0.7778 | 0.8571

These data suggest that welcoming users increases the probability they make an edit in that session.

Total number of edits

Total number of edits is a somewhat complicated measure since it is dependent on how the user edits (see m:Edit counting). Some users tend to make many small edits, while others make multiple substantive changes in one edit. Among editors who edited at least once (although there may be some zeros due to deleted pages) and whose first edit was not vandalism, the median number of edits was higher in the welcomed group. The two editors with the highest counts were also both in the welcomed group, standing out at 44 and 46 edits.

Since not every edit was checked, it is possible that some of these edit counts include vandalism made after the user's first edit.

Discussion

The trend in the principal measure of behavior seems to indicate that users probably do respond positively to receiving a welcome, although the p-value does not rule out chance as an explanation. Approximately 20-30 percent of users make a non-vandalism edit in the first week after registering, with most of these edits occurring immediately following registration.

Using the estimates found in this study, a welcomed user is approximately 10 percentage points more likely to edit within the first week than a non-welcomed user. Since this was a randomized, controlled study, the difference in proportions between the welcomed and non-welcomed groups is probably caused by the welcome itself. As such, if 100 welcomes are given, roughly 10 more editors are likely to make a contribution to the project.
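The figure of roughly 10 additional editors per 100 welcomes follows directly from the counts in the results table. As a minimal sketch (the function name is hypothetical, and the denominators again assume that users whose first edit was vandalism are excluded), the absolute and relative effects can be separated as follows:

```python
def welcome_effect(x_w, n_w, x_n, n_n):
    """Summarize the difference between welcomed and non-welcomed proportions."""
    p_w, p_n = x_w / n_w, x_n / n_n
    risk_diff = p_w - p_n
    return {
        "risk_difference": risk_diff,             # absolute difference in proportions
        "extra_editors_per_100": 100 * risk_diff,  # expected extra editors per 100 welcomes
        "relative_increase": risk_diff / p_n,      # increase relative to the non-welcomed rate
    }

# Assumption: denominators exclude users whose first edit was vandalism (104 and 90).
effect = welcome_effect(31, 104, 18, 90)
for name, value in effect.items():
    print(f"{name}: {value:.3f}")
```

The absolute difference is just under 0.10, which is the sense in which 100 welcomes correspond to about 10 more editors; relative to the non-welcomed rate of 0.20, the increase is considerably larger than 10 percent.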

Further work, however, is needed to confirm this result and possibly investigate other questions, including:

  • Which welcome template is most effective;
  • What effect a welcome template has if given after the user's first edit;
  • Whether other measures are more effective in capturing the welcome message's effect.