User:Durin/Admin nominee charts

From Wikipedia, the free encyclopedia

Edit count analysis chart example
Edit count analysis chart example

I frequent WP:RFA. When I see a nominee with <2000 edits I will often create a chart (see example on the right) showing the nominee's edits per day and total number of edits over time. If meaningful, I will also add an average # of edits per day line. From time to time, I get asked "How did you make that?". I've created this particular little sub page to explain how I made those charts.


Contents

[edit] Simple answer

I collect data on the user's edits by looking at their user contributons (example) and put this into a Microsoft Excel spreadsheet. I then parse the data into a usable form, and summarize by day the number of edits. This then provides a lookup table to use to generate a complete table across time for the user's edits, beginning with their very first edit through the current day. I then generate a chart based on this complete table, and upload the chart to Wikipedia.


[edit] Complex answer

You're either really bored, or really geeky if you want to read this section! If you fall asleep and whack your forehead on the keyboard, don't blame me! :)

To do the graphs, I've created a template which processes the raw data into a form that is usable for the purposes of the chart. How this chart is craft could be better; the state it is in now is really just an interim step, and works fine for now. If I become motivated enough, I may improve it. Nevertheless, here's the guts of the template spreadsheet that I dump the data into:

There are 12 columns in the chart. These are, along with the contents of the cells in each column:

Column Titled Contents
A Raw Data from edit contributions cut/paste
B Edit Data =IF(A2<>"",RIGHT(A2,LEN(A2)-FIND("--",A2)-2),"")
C Raw Date =IF(A2<>"",RIGHT(LEFT(A2,FIND("--",A2)-2),LEN(LEFT(A2,FIND("--",A2)-2))-7),"")
D Time =IF(A2<>"",TIME(LEFT(A2,2),RIGHT(LEFT(A2,5),2),0),"")
E Day =IF(C2<>"",LEFT(C2,FIND(" ",C2)-1),"")
F Month =IF(C2<>"",RIGHT(LEFT(C2,LEN(C2)-5),LEN(LEFT(C2,LEN(C2)-5))-LEN(E2)-1),"")
G Year =IF(C2<>"",RIGHT(C2,4),"")
H Month # =IF(C2<>"",VLOOKUP(F2,$O$2:$P$13,2,FALSE),"")
I Date =IF(C2<>"",DATE(G2,H2,E2),"")
J Raw Count =IF(C2<>"",IF(I2=I1,1+J1,1),"")
K Day Count =IF(C2<>"",IF(J2>J3,J2,""),"")
L Sum Date =IF(C2<>"",IF(K2<>"",I2,""),"")

(note: the "If(C2<>"",..." notation is used to not generate results if there is no data to work on. I have populated my template all the way up to row 10,000 with these equations, and having those cells filled with errors is not desirable. Checking for the existence of data to work on prevents this)

To collect the data on the nominee, I view their user contributions using a show limit of 2000 (since I don't normally do this for people with >2000 edits). Here's an example of this, using myself (though I have more than 2000 edits).

Column H uses a lookup table in columns O and P starting at row 2 with the following contents:

January 1
February 2
March 3
April 4
May 5
June 6
July 7
August 8
September 9
October 10
November 11
December 12

After I cut/paste the data, I then replace "(hist) (diff)" with "--". This is currently necessary to the process, but does not need to be; the equations in columns B and C could be modified to not use the "--" to find it's place when parsing the data. Once this is done, the other columns then correctly parse the data, leaving columns K and L as the two columns with the summary data that I want to use, with one exception. The last date in the list will not have the # of edits for that day; but I do not use that day, since it is not 'complete'; the day isn't over.

I then select columns K and L, and copy and "paste special" (values) that data into a second worksheet within the spreadsheet. I then format the date column to be in date format (since I only copied values), and sort by date.

Next, I create a new column at column F, row 2, with first entry being equivalent to the first date for which there is an edit count for the user. For subsequent values in column F, I add one to the prior row's data (for example, =F2+1) and copy this to sufficient subsequent rows to cover the entire time frame of the user's edits from first edit through yesterday's edits. Then, in column G I add in this function: "=IF(ISNA(VLOOKUP(F2,$A$2:$B$175,2,FALSE)),0,VLOOKUP(F2,$A$2:$B$175,2,FALSE)) , and copy it to every subsequent row used in column F. I then create columns H and I. Cell H2 is equivalent to G2. H3 is "=G3+H2", and I copy H3 to subsequent rows like column G. I2 is =G2. I3 is "=H3/((F3-$F$2)+1)". The F3-$F$2 simply counts the number of days between the two dates in the range to that point in the row list, and the starting date. The +1 makes it inclusive of the starting date. I3 is then copied similar to columns H and G (you can copy them all together after all of row 3 is done). This column then shows the average # of edits per day.

With this done, I then create a chart using chart wizard based on columns F, G, and H. If average # of edits per day is significant (more than say, 10) then I'll include that column as well. Once the chart is created, I'll right click on the data series for total edits, select "Chart Type" and in the resulting dialogue box click "Line" and the chart sub-type that does not have dots for each data point, then click OK. Next, I right click on the data series for total edits again, click on "format data series", click on the "Axis" tab, and select "Secondary axis" under "Plot series on". I'll also click on the "Patterns" tab and change the line thickness and color, clicking OK when done. Then, I'll right click anywhere in the chart, select Chart Options, and under the "Titles" tab and put in names for the primary value Y axis and seconary Y axis, clicking OK when done.

Once I have the chart looking the way I want it to look, I then click on "View" in the toolbar, select "Full Screen" and then hit the "Print Screen" button. I then load up Adobe Photoshop, do "File", "New", click OK, then hit ctrl-v. I then have a screen shot. I crop the image to show just the graph, export to web using GIF format, and upload the result to Wikipedia.

I may have glossed over a step or two above, but if you have some skill with Microsoft Excel and some image editing software (you don't have to have Photoshop) you should be able to follow along well enough to do this yourself.

[edit] Frequently asked questions

[edit] "Isn't this just a more advanced form of Editcountitis?"

No. I have observed that people in RfA tend to be far too focused on the # of edits a user has, rather than the distribution and/or quality of the edits. Kate's tools have improved recently to show the distribution of edits over namespaces. There is no tool to evaluate quality of edits; this is purely subjective. What Kate's tools do not do is show contribution patterns over time.

[edit] "Why do you make these charts?"

I provide them as a tool to evaluate a user's contribution patterns over time. This provides a better basis on which to judge a nominee who by some people's measures does not have enough edits to be an admin.

For example; in the RfA for Ixfd64, it was noted that the user did not have very many edits considering the total amount of time that the user had been a member of Wikipedia. While I didn't create a chart for this particular nominee (they had more than 2,000 edits), I did use the underlying template to generate some information about the nominee's editing history and found that the user had been making strong contributions over the preceeding 5 months. Thus, the user's level of contribution and dedication had signficantly increased during the prior six months. This was a useful bit of information in evaluating the nominee, and resulted in a person who voted neutral shifting their vote to support. This example shows why edit counting can be a flawed way of looking at a nominee's potential value as an administrator.

As noted in my admin voting measures, 2000 edits is not a magic number at which a person is suddenly imbued with the talents that would make them a good admin. They came to Wikipedia with those talents. Everything from their first view of Wikipedia to their admin nomination is simply training. The training one gives themselves is mostly about reading existing policy and observing how other people perform various types of work on Wikipedia. Using edit counting to evaluate how much training a person has given themselves on Wikipedia is a horrible way of evaluating the person. It is akin to evaluating a person's ability to run fast by counting how many times they have run; i.e., absurd. The number of edits a person has may or may not have a vague connection to how well that person knows Wikipedia policy. Yet, edit counting and registration date on Wikipedia are used to judge a person's knowledge level of Wikipedia.

By providing these charts, I feel I am offering another way in which to evaluate nominees who have edit counts that are typically viewed as perhaps too low to be an admin. I hope that by making edit analysis charts available for <2000 edit count admin nominees, that people will turn away from using edit counts and time since registration as a measure of talent and knowledge and focus instead on how active the nominee is and may likely be as an admin. I hope.

[edit] "Isn't 2000 edits pretty arbitrary? Why not 3000?"

Chart showing # of edits vs. RfA success rate
Chart showing # of edits vs. RfA success rate

The less than 2000 edit count criteria for me to make a chart is not arbitrary. The following table shows admin nominations and their success/failure for the period from late June to mid September of 2005. Update: I've now added figures from June 05 to January 06, and June 05 to April 06.

Noms 9/05 % Succ. 9/05 Noms 1/06 % Succ. 1/06 Noms 4/06 % Succ. 4/06
All nominations 126 75% 469 66% 683 60%
0-1000 edits 2 0% 51 8% 89 4%
1001-2000 edits 25 48% 82 51% 100 45%
2001-3000 edits 29 83% 105 78% 134 69%
3001-4000 edits 18 78% 61 79% 93 74%
4001-5000 edits 8 100% 42 79% 70 76%
>5000 edits 44 82% 128 80% 197 76%

You can see from the above table that if a nominee has less than 2,000 edits there's a very good chance they will not be successful in their admin nomination. All other groups of nominees have greater than 2/3rds success rate. It is fairly clear than 2,000 edits is a magic number after which a nominee begins to be judged based on other criteria, and not on the number of edits they have accumulated. Get to 2000 edits, and you'll get a ~25% increase in support votes for your admin nomination.

[edit] "How long does it take you to make one of these charts?"

In total, doing one of these graphs takes at most 10 minutes to do. The first one I created took the better part of an hour to get the process and resulting graph massaged into a form I thought useful. Subsequent charts have been easy to do.

[edit] "Will you do a chart for me?"

If you are an admin nominee with less than 2,000 edits I've probably already done it. If not, feel free to request that I do so. If you are not an admin nominee, the chances I will do a graph are pretty low and very case dependent. You can still ask though!

[edit] "Does it make any difference?"

Good question. Right now, I don't know. I've done just eight charts as of this writing, and that's not a statistically relevant sample yet. For what it's worth, of the eight charts I have done for nominees, two of the nominees are still pending, five were unsuccessful and one was successful. However, judging by the success rate may not be a good measure of productive outcome. Have people who vote on RfAs begun thinking more in terms other than edit counts and time on Wikipedia? I don't have the answer to that question. We might be able to judge this by collecting data on voting habits of every voter who votes on nominees with less than 2,000 edits and see how voting behaviors per user change (or not change) on nominees who have charts done for them. Personally, I don't want to go to that level of effort :)

People seem to like the charts and find them useful. As long as that remains the case and I remain interested in doing them, I'll keep doing them. If they help, then I'm happy.

[edit] "Are you still doing the charts?"

In fall of 2005 I came under concerted attack (I'm not using the term lightly) from a vocal minority that did not like the charts. The vast majority of comments I'd received were positive. I did receive criticism for supposedly advocating higher standards and a new form of editcountitis on RfA. This was not the case. Most people that I explained this to recanted their statements. Some did not. Some of the ones who did not engaged in frequent attacks against me both on Wikipedia and the #wikipedia IRC channel. While I'm not here to make any friends, I'm also not here to regularly be the focus of irrational attacks. Thus, I no longer make the charts. Sadly, this didn't seem to have the effect that I hoped for; even two months later, I am still occasionally attacked for making the charts. C'est la vie. The charts did have a (to me) positive effect; I observed an 8% increase in the success rate of <2,000 edit nominations that I made a chart for vs. those that I did not as compared to before I began making the charts. Though I no longer make the charts, they've been at least partially replaced by a new tool that Interiot developed (example). Though not as detailed a graph, the histogram does at least a partial job of demonstrating activity over time which I think is useful to people considering an RfA candidate.

[edit] Other uses

Using the same template that generates the underlying data for the chart, I can also parse out whether an edit summary was used or not. I like to see admin nominees use edit summaries at least 80% of the time if not 90%. This data provides a quick way for me to get a picture of their average use of edit summaries as a percent of total edits, their use over the first 500 edits and the most recent 500 edits to see if the nominee is getting better at using edit summaries. See the Rfa for Ixfd64 again for a demonstration of the outcome of this.


[edit] See also