Wikipedia talk:CheckUser

Archives:

  • /Archive1 - to April 2008 (old page, prior to creation of policy page)

Adoption note

See [1] and [2] for discussions. FT2 (Talk | email) 22:04, 12 April 2008 (UTC)

Query

The policy says: "Complaints of abuse of CheckUser or privacy policy breaches may also be brought to the Ombudsman committee."

The Foundation has said the ombudsmen may only deal with breaches of the privacy policy. Has this been changed? SlimVirgin talk|edits 12:07, 13 April 2008 (UTC)

The Ombudsman commission deals with violations of the privacy policy; that is to say, inappropriate release of information. Complaints about misuse of the tool that do not result in release of information should be referred to Arbcom, which, on this Wikipedia at least, is the body that has the authority to grant or revoke access. Thatcher 15:50, 13 April 2008 (UTC)
Okay, thanks, Thatcher. The policy should probably make clear that the ombudsmen currently aren't allowed to look into checkuser misuse alone. SlimVirgin talk|edits 03:42, 14 April 2008 (UTC)
I wouldn't say not allowed, exactly. The members of the commission have said (on the CU-L mailing list) that they do not believe use of the tool falls into their scope unless it involves release of information. For one thing, community norms are very different on different wikis. I believe on huwiki all requests must be public, are publicly logged, and require at least 5 votes in favor of running the check with no more than 2 opposed. The privacy policy is a Foundation-wide standard; the commission is not really set up to judge whether just checking (without release) violates local standards for checking. Thatcher 17:34, 14 April 2008 (UTC)
I'm not sure I follow. We have a situation where we have two ombudsmen for the English Wikipedia — Rebecca and Mackensen, and until last year we had Uninvited Company. They are tasked by the Foundation with investigating privacy policy violations. Whenever anyone has tried to raise a checkuser violation with them, we have been told by various people, including Florence, that ombudsmen may not investigate checkuser violations in the absence of privacy policy violations.
No explanation for this has ever been given, except that to allow them to investigate checkuser misuse would significantly increase their workload. But if the number of complaints about checkuser is such that to allow the ombudsmen to deal with them would increase their workload to the point of making it unreasonable, we need to appoint more ombudsmen. It is also not clear how anyone can know it would significantly increase their workload if the complaints are currently not being made, because there is no one to make them to.
As things stand, if anyone wants to complain about a checkuser on the English Wikipedia, they have to initiate a full ArbCom hearing, which a lot of people will not want to do for reasons of privacy. If someone feels that Checkuser X should not have obtained their IP address, they're not likely to want to make a complaint that might involve sharing it with the ArbCom mailing list of over 30 people, and drawing even more people's attention to it in the checkuser log.
It's therefore a real Catch 22 situation, and it's puzzling that it's allowed to continue, because it's completely unnecessary. SlimVirgin talk|edits 22:00, 14 April 2008 (UTC)

The "privacy policy" section contains the statement "Requests should not be accepted on the basis of 'fishing'". Should "fishing" be "phishing" in this context? 203.162.3.155 (talk) 13:48, 25 April 2008 (UTC)

Comments

  • I guess I share this concern. Where or how does one ask about possible checkuser abuse? There is some anecdotal evidence that it goes on (CharlottesWeb arbcom hearing); what is the review mechanism? I don't have a problem with a quiet (not on wiki) review mechanism, but having one that is described on wiki, and a place to ask for review of activity on wiki, would I think alleviate a lot of concerns. --Rocksanddirt (talk) 22:51, 14 April 2008 (UTC)
    • What's the definition of "fishing" anyway? I'm sure I can find a post to the mailing list where someone was saying half the times checkuser is used it's a private request. Everyone knows that CUs hang out in IRC and you can get hold of them quickly if you want them there. So how do we define checkuser abuse, precisely? Does it merely mean that someone else other than the CU has to ask? It's OK if so, but it should be stated up front. In these off-wiki requests, is the name of the other person asking listed as well? --Relata refero (disp.) 23:29, 14 April 2008 (UTC)
      • yes! these are all important questions! (but not rush ones) I'm sure whatever abuse there is is very limited and mostly abusive requests are ignored, but a fuller discussion of some of these kinds of potentials is important. --Rocksanddirt (talk) 23:53, 14 April 2008 (UTC)

Role of the ombudsman commission

There was extensive discussion of this issue on checkuser-L in the last couple of weeks, with participation by Rebecca and Mackensen (from the ombudsman commission) as well as many other checkusers from other projects. The question was framed as "Who do we complain to when a checkuser runs a check for no reason." Both Rebecca and Mackensen were of the opinion that the Foundation privacy and checkuser policies prohibit inappropriate release of information, but that checks which do not result in release of private information do not violate the privacy policy, by definition, therefore they do not have the authority to investigate claims of inappropriate checking that do not involve data release. This was also the consensus of non-enwiki checkusers because, as I said above, different local wikis have different standards of privacy and what constitutes "sufficient reason." And of course, there may be differences of opinion even among checkusers on the same project as to what constitutes sufficient reason for a check. So the outcome of that discussion was that the ombudsman commission will deal with complaints about inappropriate release of results but not inappropriate checking. If that bothers you, take it to m:Talk:Ombudsman_commission. There is nothing that can be done about it here.

Exactly how to deal with accusations of inappropriate checking was never really dealt with conclusively. At one point I suggested forming a grand jury of other checkusers, but that didn't go over very well. The discussion ended without a satisfactory resolution; the consensus was that on projects that have a mechanism for granting checkuser, that same mechanism has the power to take it away. (Some large wikis have Arbcoms, some smaller wikis have elections.) The Foundation does not appear to want a role micromanaging checkusers on individual projects, either. So the issue was sort of left hanging.

It is, of course, practically impossible to talk about this issue without having a specific complaint to look at, and obviously someone who is concerned that their privacy has been violated will not want to make a public complaint. That leaves either checkuser-L, or Arbcom. Arbcom is theoretically the body that has the power to investigate misuse of checkuser and revoke access--I really don't know what to say about editors who mistrust one checkuser or the other, and are afraid that making a complaint will draw the attention of untrustworthy checkusers. Certainly the arbcom-L mailing list is not secure at the present time. I suspect that if a complaint was made to checkuser-L, with sufficient details to allow an investigation, and if a majority of checkusers felt that the check had been run without sufficient cause, that Arbcom would take the results seriously. A jury of non-enwiki checkusers would avoid the catch-22 posited above, but I'm not sure you could get enough volunteers to take the job, and they certainly couldn't do it without full cooperation from all parties. Perhaps a complaint could be heard by a small subset of checkusers who were trusted by both sides, and a recommendation made to Arbcom.

Ultimately, this is a situation that can arise only rarely, when one checkuser looks in the log and tells a friend, "I saw you were checkusered." The majority of checks are carried out quietly and without drama and result in no action, or result in blocks or warnings or other action that makes the fact of the check obvious. It is rarely a good idea to try to write policy to fit rare, unusual and difficult circumstances; better to try and work it out one way or the other. In any event, it is presently the view of the ombudsman commission that no privacy violation occurs unless the results are released, and the place to discuss that is m:Talk:Ombudsman_commission. Thatcher 00:57, 15 April 2008 (UTC)

Thanks for that, Thatcher. I'll reply properly when I have more time, but I just wanted to say that the privacy policy issue is a red herring. The issue is purely who is checking that checkuser is not misused; who should be complained to if it is; and how are users supposed to know whether it has been. Also, the point about misuse happening rarely — we have no idea whether it happens rarely. All we know is that misuse rarely comes to light, because there is no mechanism for checking that. The cases I know of where it has come to light have been because, in one case (when I was checked in 2006), I asked someone whether I had been (and checkusers are allowed under this policy to answer that question); and on another occasion where the checkuser discussed his check of an admin on IRC; and in yet another example, where the checkuser himself told one of the people he checked that he had done it.
When you say, "this is a situation that can arise only rarely" — that is actually part of the problem. SlimVirgin talk|edits 01:16, 15 April 2008 (UTC)
In part, you are correct. However, even if you empowered the ombuds or some other agency to investigate "checkuser abuse", it would only come to light under the same rare circumstances. Or do you propose a proactive and continual investigation of all checkusers? That's awfully big-brotherish. In any event, the ombuds have stated that if there is no release, there is no violation, and that can't be changed here. Thatcher 01:29, 15 April 2008 (UTC)
So, just to get this absolutely clear: we aren't going to define checkuser abuse, and we aren't going to get the ombuds involved unless there is a provable public release of info? I'm not objecting per se to those principles - since almost all checkusers are trusted by the community, one could assume that it's not a problem - but is there any reason that cannot be stated directly, so people know exactly how much we must trust them? --Relata refero (disp.) 12:05, 15 April 2008 (UTC)
There is no single definition of "checkuser abuse" just like there is no single definition of "administrator abuse." Much depends on the circumstances. Arbcom has occasionally revoked admin access for cause, after investigating and considering all the facts, and the page already says that Arbcom can remove checkuser access for misuse. It also happens that there is an additional process to consider a subset of allegations of misuse (allegations involving inappropriate release). Thatcher 12:26, 15 April 2008 (UTC)
I wouldn't expect a perfect definition. But WP:ADMIN does, for example, make some effort, though it is necessarily incomplete. The point is: if we have no idea at all what abuse is short of actually releasing information, then that should be stated, saying that all other actions have their propriety determined by ArbCom. (I actually am quite uncomfortable with that, but whatever.) At least the page should be open about it, because it makes it even more clear why all checkusers need to be trusted to a quite extraordinary degree. --Relata refero (disp.) 12:31, 15 April 2008 (UTC)

Logging

Someone removed this — "if the IP addresses returned by a check of a user account are also checked, the log will retain those IPs, and the timing of the check is likely to link the IP addresses to the user account" — which I've returned because it's important not to give the impression that IP addresses aren't retained at all. SlimVirgin talk|edits 03:42, 14 April 2008 (UTC)

Perplexed

For some reason FT2 and 1+2 keep reverting this from FT2's talk page: All I want is a straight and public answer.

I think I would like an explanation "on-wiki" please [51]. Thanks Giano (talk) 19:14, 14 April 2008 (UTC)
Sorry, FT2, I won't have this message quietly removed like this [52]. I received your email, and quite frankly that is not the way Wikipedia can do business. Everything, but the most important has to be upfront, otherwise none of us ever know where we stand. It is my suspicion that checkusers are used secretly on many editors for no substantial reason. So please don't force this issue into a dark corner. There was nothing in your email that could not be said here, so why this clandestine behaviour? Giano (talk) 21:50, 14 April 2008 (UTC)
Giano - I'm waiting for other input on this, to check if what I was given other's concerns over is actually a reasonable concern, or an unnecessary one, and if so its implications. And to address your other points. I've already spent some time speaking to the other person involved. Please bear with me while some Q&A's go back and forward a bit. FT2 (Talk | email) 22:07, 14 April 2008 (UTC)
...and this silly behaviour [53] is getting beyond a joke. Giano (talk) 22:14, 14 April 2008 (UTC) (Giano (talk) 22:32, 14 April 2008 (UTC))

Reply

I have moved Giano's comment because it seems to refer to this. This is a WP:BEANS issue; a clever person already knows that IPs may be inferred from the log without direct checking, but non-clever people do not know it unless it is spelled out. And this is something I would prefer that non-clever people should not know about. The Foundation is frequently contacted by lawyers seeking information on editors; the Foundation resists such attempts to the best of its ability but must comply with lawful subpoenas. One of the reasons that the checkuser table itself expires (there is no technical reason why it must, of course) is to limit the Foundation's exposure to subpoenas. Knowing that IPs can sometimes be associated with names via the log gives a smart attorney another avenue of discovering an editor's IP address even after the main checkuser data vanishes. Given the reluctance of the Foundation to disclose editors' information, this could conceivably lead to a shortening of the log itself, which would make it that much harder to use checkuser to find and stop disruptive sockpuppets and banned users. Of course, whether or not the log is explained in detail has no effect on what is actually in the log, and because the log can only be seen by checkusers, the format of the log will only ever be a concern to an editor who does not trust one or more of the current (or future) checkusers. Thatcher 01:21, 15 April 2008 (UTC)

Thank you. Now why could that not have been explained, on wiki, before, without all this silly fuss? Giano (talk) 06:29, 15 April 2008 (UTC)
That's rather disingenuous, Giano. You had it (fully) by email, with exact and precise details of the concerns raised. It read (in part, my words):
"That may lead to speculation of a dangerous and unfair kind to users, in legal cases. It may be speculated that user A is IP B, because of various searches in close proximity, even though no information is held one way or the other. It may lead to allegations that are incorrect and speculative."
which is the flip side of Thatcher's take on the problem. Further, when you said you wanted it on-wiki, you were also (fully) told without holding back[3]:
"I'm waiting for other input on this, to check if what I was given other's concerns over is actually a reasonable concern, or an unnecessary one, and if so its implications. And to address your other points. I've already spent some time speaking to the other person involved. Please bear with me while some Q&A's go back and forward a bit."
In light of this, I feel most reasonable people will read that you were in fact given appropriate and reasonable answers. User privacy issues are not something to be ignored just because a single user is highly impatient, nor something to trivialize for "politics" as you do. You were told, by a user whose communal role includes insight on serious issues, that there was a genuine concern, you had this explained, you were then told discussion is in progress. Unfortunately you were apparently incapable of seeing this answer in terms other than "must.. be.. bad.. motive.." and decided to edit war, and then insult administrator/s who told you to stop it. That's not intelligent or witty, it's not what I'd expect from an intelligent or clued in user. It was a blinkered, narrow, view that ignored all input except the input it wanted.
Don't in future. FT2 (Talk | email) 09:37, 15 April 2008 (UTC)
Giano, the reason FT2 wanted to keep the discussion private is for precisely the reason I stated above, we do not want to help non-clever people figure this out. And I still hope to keep the information out of the page and hide this discussion in the archives. And FT2 is correct as well, using this information could lead not only to true conclusions that editors would prefer to keep quiet, but also to false conclusions. I am disappointed that you were blocked over this issue but I am also disappointed in your conduct. Preferring to keep quiet about some technical aspects is not a sign of dirty deeds being hidden. Thatcher 10:45, 15 April 2008 (UTC)
Thatcher, I don't mean to bang on about this unnecessarily, but I'm having difficulty imagining a lawyer drawing up a legal document requesting IPs who, assuming he knows checkuser exists, would not word the document in such a way as to include the checkuser log, regardless of anything this policy says. Could you address that issue?
The difficulty here is that, while it might be desirable not to say too much, we can't actively mislead either, and the wording as it was did mislead, and on an issue that many editors might care about. SlimVirgin talk|edits 05:50, 16 April 2008 (UTC)

<--I don't know what the typical legal threat looks like; I do know the Foundation has received some pretty clueless requests, like for all the identifying information of an IP editor. When I emailed FT2 about this I cc'd Mike Godwin but he has not replied. I disagree that not stating explicitly that inferences may be drawn from the log is intentionally misleading. It could also be argued that it is still misleading in its current form, since many inferences that could be drawn from the log will be wrong or incomplete. (If I check 5 users and 2 IPs, for example, there is no way to really know which belonged to whom or what the findings were.) Thatcher 12:19, 16 April 2008 (UTC)

This sounds like a classic software bug that exposes the Foundation to legal risk. Why not just change the software to not show IPs here? If a Checkuser thinks they need to have an easy to access historical record of an IP, that's what your local computer is for. Lawrence Cohen § t/e 14:48, 19 April 2008 (UTC)

IP checks will appear because Checkusers need the ability to check each others' work - and tracking IP to username is very definitely something we want to be able to check if there's ever a question (or even if there isn't). WMF policy takes that seriously enough that it forbids having checkuser at all on a wiki, unless there are at least two checkusers at all times, to review each others' tool use. FT2 (Talk | email) 14:46, 25 April 2008 (UTC)
Logs that don't log are not logs. Would you have a block log that did not list unblocks? What this boils down to is SlimVirgin being concerned (on behalf of herself and other editors) that a checkuser could look at the log and guess her IP (assuming she had ever been checked in the first place) without actually running a logged check him or herself. While this is true, it is only relevant if the checkuser goes on to share that information inappropriately. If one is concerned that a checkuser is not trustworthy, the more reasonable approach seems to be to file a complaint with the ombudsman commission or Arbcom, rather than to think of ways to cripple the tools used by all the checkusers. Thatcher 14:56, 25 April 2008 (UTC)

Is there a reason why we don't bug this? (FT2's edit)

See here. If this is a problem, it needs to be on bugzilla. Is there any reason not to put in a bug to have this changed? Lawrence Cohen § t/e 14:44, 25 April 2008 (UTC)

It's not got a bug report because at a technical level, it's not a bug, and there's no obvious technical enhancement which developers can code via a request or bug report. Thatcher's comment above covers it I think. FT2 (Talk | email) 12:41, 28 April 2008 (UTC)

Addition of m:Steward requests/Checkuser

I've added m:Steward requests/Checkuser, the meta requests page for multi-project checkusering. Any comments or opposition to this? Anthøny 23:12, 15 April 2008 (UTC)

Activity levels of individual Checkusers

Posting this here from an ANI thread:

We currently have 29 Checkusers. As far as I'm aware, logs are kept of every Checkuser action. The current situation is that the Checkusers can see those logs and keep tabs on each other's activities. My impression (and it may only be an impression) is that some Checkusers are more active than others, or to put it another way, I have seen some Checkusers popping into discussions to point out socks, or carrying out blocks, in many cases without a suspected sockpuppet or request for checkuser being filed (I'm not naming names here, but I did at the ANI thread, and have notified the people I named). I'm not saying that anything untoward is going on, but I do fear that some Checkusers are more willing to use Checkuser than others (off their own bat and without being asked), and that does worry me a bit. I am aware that sometimes checkusers can and do need to be run without a formal request being made, but what I would like to see made available, to provide some sort of public check on this, is the activity levels of each individual checkuser. Simply a publication of the number of checkuser actions made each month by each checkuser. That would also help answer another editor's question about whether some checkusers are overloaded and trying to do too much, while others are mostly inactive (some only need it now and again). I'm going to notify six Checkusers and see whether they are prepared to say exactly how much Checkuser activity they engage in. I don't want to spam all the other 23 Checkusers, so I'll wait and see what response I get to this first. I think most of the CheckUsers are current or former arbitrators as well, so maybe someone could e-mail that list? Or is there a Checkuser mailing list? Ah, I see there is checkuser-l-at-lists.wikimedia.org. If I send something to that, will it get through?

There is also the general principle of being able to publicly track the activity levels of other user rights groups (eg. oversight and IP block exemptions), while for obvious reasons keeping the contents of the logs private. To reiterate, I realise the contents of the logs need to be kept private in most cases, but it is the level of activity of people with these user rights levels that I think the Wikipedia community should be able to see as a way of having a public check on the activity levels, and also simply to see if people are using the tools they have been entrusted with. Even if individual activity levels are not possible, it would be good to have overall activity levels tracked. Any thoughts on whether any of this is feasible or desirable? Can this be done technically (or is the necessary information removed after a period of time?), and even if it can't be done publicly in an automated fashion, could it be done manually by a group of checkusers (or maybe the ombudsman committee?) preparing an annual report on activity levels during the past year? Carcharoth (talk) 11:43, 8 June 2008 (UTC)
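For illustration only, the kind of summary proposed above (a count of checkuser actions per checkuser per month, with the contents of the log kept private) might be sketched as follows. This is a rough sketch under stated assumptions: the real CheckUser log is private and its format is not described in this discussion, so the entry layout, the names CheckuserX and CheckuserY, and the function monthly_activity are all hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log entries: (checkuser name, UTC timestamp of the check).
# The layout is an assumption for illustration; it is not the real log format.
SAMPLE_LOG = [
    ("CheckuserX", "2008-05-03T10:15:00"),
    ("CheckuserX", "2008-05-03T10:16:00"),
    ("CheckuserY", "2008-05-20T18:40:00"),
    ("CheckuserX", "2008-06-01T09:00:00"),
]


def monthly_activity(entries):
    """Count log entries per (month, checkuser).

    Only the checkuser's name and the month are used, so the summary
    says nothing about which accounts or IPs were checked.
    """
    counts = Counter()
    for checkuser, timestamp in entries:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        counts[(month, checkuser)] += 1
    return counts


if __name__ == "__main__":
    for (month, checkuser), total in sorted(monthly_activity(SAMPLE_LOG).items()):
        print(f"{month}  {checkuser}: {total} logged actions")
```

A raw count like this treats every log entry equally, so one investigation that needs many individual checks would look the same as many separate investigations.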

Note: I've tweaked your message, to have the CU list email address displayed with {{NonSpamEmail}}. Anthøny 12:04, 8 June 2008 (UTC)
As a brief note, checks that are not run further to a 'formal check' are supposed to be documented at Wikipedia:Requests for checkuser/Case/Unsorted results. I'm not sure if that is used very often, however. Anthøny 12:04, 8 June 2008 (UTC)
Carcharoth, what problem are you trying to solve? I do not agree with making busy-work. Jehochman Talk 12:34, 8 June 2008 (UTC)
If you read what I posted, it should be clear: "I do fear that some Checkusers are more willing to use Checkuser than others (off their own bat and without being asked), and that does worry me a bit" As a general principle, I think there should be a public level of visibility about these matters. More transparency, in a word, and nothing to do with busy-work. If making things more transparent requires more work to be done, that is something that I think should be debated. Jehochman's response is short and avoids most of what I said, which is his right, but I would appreciate if people would address what I said and not reject it out of hand. Carcharoth (talk) 13:07, 8 June 2008 (UTC)

I've had a look at Wikipedia:Requests for checkuser/Case/Unsorted results. I was unaware of that page. If I notice checkusers commenting in discussions and it is clear that they are talking about checkuser results that are not as a result of a request, would it be acceptable for me to note it there or ask them to note it there? Or is there an unwritten rule that in fact background checkusers do need to be carried out without on-wiki reports other than maybe in a block log or by the application of a sockpuppet tag to a user page? Carcharoth (talk) 13:11, 8 June 2008 (UTC)

It is absolutely not the rule that checkusers need to be noted on-wiki. Some are, some are not -- depending on circumstances. Wherever useful, I make detailed comments. Often, however, what you say in a block summary ("sockpuppet of X, per CU") is all you can usefully say. Frankly, I don't know what you expect this information to tell you. Volume of work per the log is pretty much indicative of the volume of work as seen on-wiki, as far as I can tell -- but plain numbers won't reveal that, as some checks require fifty IPs to be checked, while others require about three. You aren't going to get any more transparency because there is nothing that can usefully be revealed. I don't really care about releasing this information -- but you wouldn't care about receiving it either. Checkuser-l, incidentally, would reach all enwiki checkusers, but also all Wikimedia project checkusers.
Similar sentiments apply to oversight. When it comes to IP-block exemption, the user list is public and contributions are public -- it's going to make precisely zero difference.
Go ahead, if you care. I don't, because there's nothing to care about.
Sam Korn (smoddy) 13:42, 8 June 2008 (UTC)
"but plain numbers won't reveal that, as some checks require fifty IPs to be checked, while others require about three" - I hadn't thought of that. Thanks for pointing that out, though if the 50 checks were bundled under one request, that would help show what was happening. I also wasn't aware the IP-block exemption contributions logs were public - where would I go to find those? It seems that there is some resistance to even raising these issues or asking questions. Please, I'm not trying to find problems, just asking for more transparency - there is a difference, and the tone of "Frankly, I don't know what you expect" and "I don't really care" and "you wouldn't care" and "Go ahead, if you care. I don't, because there's nothing to care about." doesn't really enthuse me. I was hoping for calm, helpful answers. BTW, I can't go ahead, as you say, I'm asking here if it can be done. The fact that at least one checkuser was unaware of that page indicates that some tidying up is needed if nothing else. That is the sort of thing that emerges when discussions like this are started, and I would hope that alone would justify starting such discussions, and that people would accept that and not react with an abrupt "why are you asking these questions?" response. Carcharoth (talk) 14:16, 8 June 2008 (UTC)
(ec) I had no idea of the existence of that page, nor any knowledge of any requirement to document checks run there. (I share the concern that it might be busywork to ask CUs to use it, although if someone else wants to document requests they observe that's fine.) If someone privately asks for a check, or pops up on my user page asking for a check, I'm going to evaluate the rationale and decide what to do; often I decline and that is that. In the former case, the only publicly visible evidence of my running the check would be if I found something untoward and took action. I don't typically share details with those who privately ask for checks. In the latter case, again, my user page is where I'll respond. I tend to try to give as little detail as possible, to preserve the privacy of the innocent. I would say there is an unwritten rule that some checks very much ought to be kept private. The negative result (no evidence of any issue) ones, especially. To the original question of gathering statistics... that ultimately seems ArbCom's remit; if they decide there are CUs that are not using, and do not need, the tools, they presumably would act to remedy that. If individual CUs are overloaded they ought to speak out on the CU mailing list; I'm sure some of the other CUs would pick up the slack. ++Lar: t/c 13:44, 8 June 2008 (UTC)
What I'm talking about here is checkusers running checks on their own initiative without anyone asking them to run the checks. That sort of activity is only visible in the logs, as there is not even someone making a private request who is aware of the checkuser being run. It is excessive "let's see if there are any socks here" checkusers (ie. fishing) that I am concerned about. I don't think any individual checkuser can be objective enough to decide whether a checkuser needs running without the added input of someone else requesting it. I feel that if a checkuser runs a check on their own initiative, with no-one asking for the check, they should document it somewhere, or at the least have some misgivings if they consistently run checkusers on their own initiative that were not needed. Does no-one else share my concerns about this? Carcharoth (talk) 14:16, 8 June 2008 (UTC)
I don't, for what it's worth. Requests for CU checks, made on the page or elsewhere, usually aren't made with an eye towards whether or not the check would uphold or violate the checkuser policy. Generally speaking, the checkuser is the only person involved in deciding whether a particular check is appropriate or not. So running a check without a request is essentially no different than running one with a request, and I don't see a problem with it. The only change is the absence of an on-wiki record of the request and its result - which is also, to my mind, acceptable. As far as determining whether some checkusers are inactive - as Lar says, maintaining the ranks there is the responsibility of ArbCom. Sorry to butt in, happened to follow this link from Lar's talkpage. AvruchT * ER 14:40, 8 June 2008 (UTC)
My question about activity works both ways. Do you think it is possible for a checkuser to be too active? Just as those working in other areas can lose sight of the bigger picture, is it not possible for a checkuser to get too caught up in being able to run checkusers, and lose sight of the bigger picture? Currently the only existing check is for individual checkusers to check each other's activities. Sorry for the touch of cynicism here (it is in response to the tone others have taken), but I do hope it is not "busy-work" to ask whether any of the checkusers do actually use the logs to look at the activity of the other checkusers, or whether they only do that if a specific complaint is made. I shouldn't have to point out the obvious that specific complaints are rather difficult to make when logs are (correctly) not publicly visible. If there were activity logs, I could ask whether checkuser X, who has done 10 times the amount of checkusering that the other checkusers together have done, might possibly be doing lots of "own initiative" checkusering, and whether the amount of that checkusering is justified. But having no information like that, I can't ask that. Similarly, checkuser Y could be spending most of his time dealing with requests for checkuser, and only doing one or two "own initiative" ones, but checkuser Z could be only doing a few at WP:RFCU and could mostly be focusing on pre-emptive checkusers to protect a few pages from sockpuppets, which doesn't seem to me to be the purpose of checkuser. We can't protect individual pages by aggressive use of checkuser, as that doesn't scale across the whole encyclopedia. At the root of this is a perception I have (and am unable to check) that different checkusers have different standards as to what they are prepared to use the tool for. The only people I can ask whether this impression is correct or not are the checkusers themselves.
I would hope they would respond openly to such questioning, even if it is conducted among themselves and not on-wiki. I suppose the shortest way to put this is:
  • (1) Is it desirable for the checkusers to be consistent in their checkuser activities?
  • (2) If so, are the checkusers as a group consistent in their use of checkuser?
  • (3) Finally, do the checkusers talk to each other about their activities to ensure that their use of checkuser is consistent?
Apologies for taking a while to work out what I was trying to ask (I can see that my previous lines of argument might have seemed a bit off kilter), but I hope my three questions above sum this up. Carcharoth (talk) 15:15, 8 June 2008 (UTC)
I would suggest, if there is any question of busywork affecting the collation of allowable information regarding CU activity, that (yet another set of) clerks could be recruited to do the scut work - pretty much like Arb Clerks, being the only persons other than CUs privy to sensitive information and not being allowed to divulge same. It also seems, pending response to Carcharoth's questions, there might be a need to confirm what the SOP regarding recording CU activity is. If nothing else, sharing CU knowledge and practice may improve the effectiveness of the tool(s). LessHeard vanU Talk 15:23, 8 June 2008 (UTC)
(ecx2)It strikes me from my observations that checkusers "follow" certain situations or cases, and may well be doing CUs without formal requests identified by other Wikipedians, and that this isn't inherently a bad thing. I don't think it would be of benefit to anyone to point out those specific cases; sometimes the people behind the accounts are seeking attention, and in other cases these quiet CUs help to build cases that must be developed over time. The key point is that they are all logged, and that the information revealed is held confidentially unless used for blocking; even then, only information that *must* be revealed to support an administrative action is brought out into the open. It's my personal opinion, completely unsupported by any facts, that information about editors in good standing has been discovered during a CU investigation, but that (generally speaking) such information is not acted upon except if a genuine problem has been revealed, and that it remains confidential. I am aware that at one time I was active on a policy page where there were genuine concerns about sockpuppets trying to alter WP policy, and it was at a time when I was not well-known to the community, so it wouldn't surprise me if my username got checked back then. The only concern I would have is if the information went any further than the CU log. On the other hand, I'm in favour of having some sort of audit process in place to ensure only appropriate use is made of the tool. Risker (talk) 15:28, 8 June 2008 (UTC)

Well, the question of whether there is effective monitoring of checkuser use is pretty different from the questions originally posed, although I can see how requests for activity levels and log-style data are a way to get at that question indirectly. A review or audit of the checkuser activity from a specific period might not be a bad idea, something perhaps the ombudsmen should be able to perform. If you have a specific concern, Carcharoth (and it seems like you might), then you can ask to have it investigated directly, but in the absence of a specific concern the only method to determine if checkuser use is in line with the policy is to sample the activity and look for violations. I'm not sure really how you would find violations using check activity logs, but I'd be interested to see how checkuser activity is monitored and whether that monitoring is effective. I have no reason to distrust any individual granted checkuser, but even in a situation of complete trust some internal auditing still makes sense. AvruchT * ER 15:38, 8 June 2008 (UTC)

I did meander a bit to get there, yes, but what you've said above is essentially what I am asking for (thanks for putting it so well!): a regular (eg. annual) internal audit on general principles. I do have specific concerns, as I said at the ANI thread, but if there have been inconsistencies in the past, I would prefer that things are tightened up from the inside, rather than harp on what has or may have been happening. As LessHeard vanU said "If nothing else, sharing CU knowledge and practice may improve the effectiveness of the tool". If done between checkusers on a mailing list, I agree absolutely. Carcharoth (talk) 16:00, 8 June 2008 (UTC)

Now that some discussion has taken place, could someone notify the other checkusers? Is there an internal en-checkuser mailing list, or would the wikimedia one be best? Carcharoth (talk) 16:00, 8 June 2008 (UTC)

  • The raw numbers would be meaningless. As far as checkusers doing their own checks without being prompted: damned right. We know more about abusive sockpuppeteers than anyone else. When we notice, for example, a request for unblock from someone who looks like a DavidYork71 sock, we're not going to wait for someone else to verify that this looks like a DavidYork71 sock; what would be the point? We also frequently double-check other checkuser operators' results, without being asked; if I note that Ali has said some user is a sock, and the user is requesting unblock denying it, I'll run checkuser again. I hope and assume other checkuser ops do the same for me. Comparisons of numbers between the various checkuser people aren't going to say much other than, say, "Josh is more active than Ali". One of us might spend an hour a day on Wikipedia, another might spend ten. Our levels of obsessiveness vary dramatically. Further, our techniques differ. I might run checkuser a half dozen times on the same editor if I keep finding interesting details -- since I rarely keep records of the results of my searches. Other checkusers keep more records than I do, I'm pretty sure. But running checkuser once on someone has exactly the same impact as running it a half dozen times on the same person. If you have specific issues with specific checkuser actions, it would make sense to bring that up in the proper context, but analyzing the logs isn't going to shed any light on anything. (Oh, by the way, there's no en-only checkuser mailing list, just the global one; we're used to most of the mail there being about enwiki.) --jpgordon∇∆∇∆ 16:40, 8 June 2008 (UTC)
    • Thanks for this. It makes things a lot clearer, and in a good way. I see now that the raw numbers may not mean much, but I would say that some overall level of activity might mean something. How has checkuser activity scaled over the years, for example? As far as checking other checkusers' activities, it is more the possibility of, or definitely, incorrect actions that I am talking about. Surely there are sometimes actions that are controversial or that cross a line, or where checkuser data ends up giving the wrong result in an investigation. Do the checkusers as a group ever talk about that internally, or one-to-one by e-mail, and not just in response to an external query or a dispute on-wiki? Also, do checkusers discuss best practice amongst themselves or not? From what you've said about not being sure what other checkusers do, I get the impression that maybe not much internal discussion takes place. Even just a little bit of reassurance on these matters would go a long way. Thanks. Carcharoth (talk) 16:53, 8 June 2008 (UTC)
  • Sure, we discuss results and puzzles on the mailing list all the time. Though the most common mail topic seems to be "Cross-wiki vandalism from some IP", there's also a lot of "hey, these results puzzle me, can you take a look?", and sometimes policy discussion. "Best practices" as far as policy issues are concerned is highly project-specific, as local rules may not be more permissive than Foundation requirements, but they can certainly be stricter. We do sometimes discuss that on the arbcom list. And we also sometimes do it by individual emails; of course I don't know how much. Maybe one email a week for me. Oh, and there's also a checkuser IRC channel, but it wasn't getting much action when last I looked in on it (a year or so ago.) As far as "not being sure what other checkusers do", there's no sort of orientation for newly hatched checkuser (or at least there wasn't when I got into it); so we each kinda figured out our own way to use the tools, and we each have our own methodology. --jpgordon∇∆∇∆ 17:37, 8 June 2008 (UTC)
    • Thanks for that. Just one more question, and then I will leave it. Do all checkusers participate in such discussions, or are some checkusers more independent, doing their own thing and developing their own way of dealing with things without much input or oversight from others? If the checkusers (and there are only 29 of them) work as a team and talk to each other, then that is fine. If there are lone guns who do stuff on their own and don't discuss things with other checkusers, then that is not so good, in my opinion. I know there is no requirement for the checkusers to work with each other and in consultation with each other, but I don't see any harm in that either. Carcharoth (talk) 20:09, 8 June 2008 (UTC)
      • That's kind of an unanswerable question. I can't tell who is actually reading the discussions on the checkuser list. Some people are taciturn, some people actively participate in the discussion, but there aren't, like, checkuser meetings where all of us are required to discuss our caseload and results for the week. --jpgordon∇∆∇∆ 00:43, 9 June 2008 (UTC)

What Sam Korn said. Detailing which checkusers I run on-wiki would be time-consuming, laborious, and is absolutely not required. Thus, I will not be doing it. As to whether or not a checkuser can be overly active - I find the proposition absurd on its face. Raul654 (talk) 18:55, 8 June 2008 (UTC)

Raul, I should have asked that question the following way: do you think a checkuser needs to take a minimal amount of time with each checkuser result to make sure the results are being interpreted correctly? If that is the case, then there is a natural limit to how much any one checkuser can do. You said elsewhere that you had done 54 checkusers this week. The impression I got was that this was not normal. But then only checkusers know what is a normal level of activity or not (depending of course on their relative level of activity to begin with). 10 a week, 20 a week, 50 a week? You were dealing with a persistent creator of sockpuppets - have you discussed the situation with other checkusers to get their advice on what to do, or not? Carcharoth (talk) 20:09, 8 June 2008 (UTC)
Having an internal auditing/recording process may also help the above; what may be commonplace (and considered un-noteworthy) by one CU may be an area of unfamiliarity - although not enough to be raised ordinarily in the other channels - for another. I find what appears to me to be the dismissive tone of Raul654 indicative of the reasons why I think there are perhaps some areas of concern by members of the wider community in how CU is used by those with the relevant permissions; I would be extremely concerned if Raul's seeming attitude is an example of that of the other CUs. LessHeard vanU (talk) 21:08, 8 June 2008 (UTC)
Certainly, some checkusers will miss known miscreants, especially when they're new at the job. That's why new checkusers tend to hang out at RFCU, where everything is visible and results are quite often double- and triple-checked. When false positives occur, the usual result is the victim saying "Hey! Not so!", and then another checkuser analyzes the data and either confirms or casts doubt on the result. It's pretty unlikely that false positives are going unnoticed. As far as what to do or not with a persistent creator of sockpuppets, yes, we discuss that, but those sorts of creeps tend to become known to more than just the CU team pretty quickly, so those discussions aren't just for us -- especially since some of the possible cures can be pretty drastic (like blocking big IP ranges.) Most individual cases are dealt with without any discussion. Most individual cases don't yield any useful results. I do agree with Raul -- if I were expected to log on-wiki all my checkuser activity, I'd just find some other way to help Wikipedia. And I don't think that would be a good policy, anyway, even if I weren't the person who'd have the increased workload. Think about it for a sec: as I said, most checkuser results are negative, but even being listed means that suspicion was cast upon a possibly innocent user. --jpgordon∇∆∇∆ 00:43, 9 June 2008 (UTC)

(edit conflict) As an aside, I had no idea until now that Wikipedia:Requests for checkuser/Case/Unsorted results existed. I intend to use it where and when I can, but may forget. Re. logging, I think there may be a case for some statistical logging - Alison 07:10, 9 June 2008 (UTC)

I figured it was pretty unknown, hence why I threw a link out there. :) Anthøny 15:57, 9 June 2008 (UTC)

Checkuser usage

I got asked this a while ago, and checked it back then. It might be surprising that some users who are busy as checkusers are rarely mentioned as such on-wiki. Here's the last 3 complete months' data, sorted in order of average usage March - May:

The small print -

  1. Please don't ask "who was checking what" as the actual check logs themselves have a big disclaimer about privacy policy, but in the aggregate I think it's fine to post here.
  2. CheckUser, like edits, isn't a league table. Many checkusers take requests from the community, but some use it only "as needed", for example to recheck the findings on more serious cases presented to arbitration. Low usage usually means, for users such as Kirill, they are focussed on other aspects of the project, such as RFAR pages.
  3. Cases vary in depth and complexity - number of checks doesn't represent number of cases, or amount of work required. Everyday cases can routinely need multiple checks by the time they're done (eg suspect usernames/IPs -> list of IPs -> check each IP -> recheck behaviors of relevant accounts on that IP -> wash, rinse, repeat -> identify socks). Some more, some less.
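Arbitration committee checkusers

| Checkuser | March 2008 | April 2008 | May 2008 |
|---|---|---|---|
| Jpgordon | 524 | 381 | 555 |
| FT2 | 165 | 64 | 607 |
| Blnguyen | 55 | 66 | 82 |
| Morven | 41 | 66 | 46 |
| FayssalF | 18 | 5 | 42 |
| Deskana | | 34 | 29 |
| FloNight | | 49 | 4 |
| Jdforrester | 1 | | 21 |
| Thebainer | 2 | | 8 |
| Kirill Lokshin | | 4 | |
| UninvitedCompany | 1 | | |

Ex-arbitration committee checkusers

| Checkuser | March 2008 | April 2008 | May 2008 |
|---|---|---|---|
| Dmcdevit | 339 | 527 | 552 |
| Raul654 | 473 | 271 | 306 |
| Jayjg | 11 | 92 | 85 |
| Sam Korn | | | 162 |
| David Gerard | 16 | 75 | 27 |
| Fred Bauder | 11 | | 3 |
| The Epopt | 3 | 2 | |
| Mackensen | | | 2 |
| Rebecca | | | |

Other active checkusers

| Checkuser | March 2008 | April 2008 | May 2008 |
|---|---|---|---|
| Thatcher | 2216 | 1733 | 1561 |
| Alison | 843 | 452 | 1141 |
| Lar | 39 | 35 | 82 |
| Voice of All | 9 | 22 | 24 |
| Cary Bass | 11 | 17 | 12 |
| Redux | | | |
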
Interpretation

Unsurprisingly, well-known checkusers such as the two most recent Arbcom appointees, Alison and Thatcher, bear the brunt of the workload, as do well-known arb-l checkusers such as jpgordon and dmcdevit. Roughly speaking, around 8 checkusers between them handle most of the community's requests. But in fact, almost all the community's checkusers are fairly active - those who show as less active above tend to be active when specific cases (usually arbitration cases, well-known sock-operators, or admin socks) require it, rather than also doing other requests (eg RFCU) as well.

The only genuinely low usage checkuser is Redux (last usage November 2007). (Other low usage are not expected to be necessarily that active: the WMF developers Brion Vibber and Tim Starling and the WMF Ombudsmen Rebecca and Hei ber). All others shown are active, even if not every month.

Many ex-arbitrators who have to an extent faded from high profile in the community, such as The Epopt, in fact are still fairly active behind the scenes backing up casework for the community. The ex-arbitrators do a lot of the checkuser work, and sock-spotting for others to verify, as witness Raul654, Jayjg, Sam Korn and David Gerard. (For example, part of my May 2008 CU activity was a request to recheck from scratch, a specific likely reincarnator, as an uninvolved checkuser. Not that it needed it, but it did confirm his work and also identified three further more subtle socks being set up as some kind of "not me at all" accounts, so it was worthwhile.)

A lot of caution is needed in interpreting these kinds of figures. Roughly speaking I'd say that you can't assume much from this kind of raw count. As noted above, a "case" may need anything from one check, to 50 or more. For example - the Zippycup case (40 socks) required 166 checkuser actions from me and isn't yet completed; a further request to recheck a second case from scratch, for "due diligence", took another 66 plus behavioral checking and IP tracing.

Likewise, the volume of checks is also no guide to the time needed or how likely a connection is.

Example - User:Query has used some 15 IPs, so one may well have to checkuser all 15 of the IPs (15 clicks x 2) and search on each for User:SuspectedSock's name to appear (CTRL-F). One then looks closer at the timings and frequency of overlap and whatever other evidence shows up. That's one case, 1-15 checks, and anything from 3 minutes to an hour. Maybe there's a completely blatant likely connection (3 minutes and {{likely/confirmed}}). On the other hand maybe User:Query has no obvious connection to SuspectedSock, but you notice that Query happens to be using an IP used heavily by User:Blocked, a well-known repeat sock-master, and one goes and digs up Blocked's socks' contribs and those old results too, finding that one sock of Blocked used the identical 2 IPs and edited the same topics as Query, and a second sock of Blocked used the same 3 IPs and edited the same topics as SuspectedSock, and you've found yourself in fact a blatant attempt to restart the User:Blocked sock-farm and half an evening's work for two checkusers to identify the other new socks, plus a request to others to re-check it. If it's a case with multiple suspected socks, one might work through it from check to check a dozen or more times (total = 3 mins to 3 hours), before finding the one or two checks that show clearly there's a likely technical connection.

It varies immensely depending "how deep the rabbit hole goes". Basically you do what you have to, to check the case out and form a view on the technical evidence.
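The overlap search in the example above can be sketched in a few lines. This is only an illustration under stated assumptions: the `ip_log` mapping, the helper names, and the data are all hypothetical (documentation-range IPs and the placeholder account names from the example), and it says nothing about how the actual CheckUser tool stores or presents its results.

```python
def shared_ips(ip_log, account_a, account_b):
    """Return the IPs on which both accounts appear, sorted."""
    return sorted(
        ip for ip, accounts in ip_log.items()
        if account_a in accounts and account_b in accounts
    )


def accounts_on_ips(ip_log, account):
    """Return every other account seen on any IP the given account used."""
    others = set()
    for accounts in ip_log.values():
        if account in accounts:
            others |= accounts - {account}
    return others


# Hypothetical data: documentation-range IPs mapped to placeholder accounts.
ip_log = {
    "192.0.2.1": {"Query", "Blocked"},
    "192.0.2.2": {"Query"},
    "198.51.100.7": {"SuspectedSock", "BlockedSock"},
}

# No direct overlap between the two accounts under suspicion...
print(shared_ips(ip_log, "Query", "SuspectedSock"))
# ...but each one shares an IP with a different, already-known account.
print(accounts_on_ips(ip_log, "Query"))
print(accounts_on_ips(ip_log, "SuspectedSock"))
```

A direct intersection is the "3 minutes" case; following the other accounts on each IP is what turns one check into a dozen. Timings, edit topics and behavioural evidence, which the example also relies on, are outside what this sketch models.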

If it's a contentious or high profile case, or non-trivial, many times a case will be independently checked by others too. As with any admin work, some stuff one just does (and doesn't need to announce it), other stuff needs more discussion. Checkusers will also tend to have a sense (or pick up) what are the "higher profile cases" or cases needing more eyeballs, at any given time. On well-known higher profile cases in the community, you can often trace multiple checkusers each independently verifying the results are accurately and fairly interpreted.

Last observation - yes, there are checkusers who will check the logs and bring up checks they have questions over. At least one non-arbcom checkuser for example, has told me explicitly that as part of their role they generally like to review the log daily. Mostly, the role is trusted and would be removed for misuse; I'm not aware myself of any actual abuse cases or allegations since becoming a checkuser. To underline this, I've seen in the 5 months since January, just one single inquiry of the form "why would X be checking Y". (I gather 3rd or 4th hand, that there may have been concerns raised on some matters in the past, but if so I wasn't around for those and get the impression they got discussed if so.) Two classic interpretations exist - either minimal or virtually no misuse happens, or it happens all the time and nobody notices or says. I'm inclined to the former, if for no other reason than that checkusers tend to be in it for the long haul... if a checkuser did have others that they wished to target improperly, they'd be known for it long ago, the log entries seen, or a breach of privacy would have been commented on by someone in all that time. (And because it's like adminship - most admins know the ropes and do follow them, it would be an immediate dead-end on wiki if caught, and the log is 100% viewable by many others indefinitely into the future who wouldn't be very forgiving if a genuine case/complaint came up.) For example, in my work I'm extremely careful that I will decline a check by anybody, unless I have good evidence it's appropriate and within both WMF policy and communal norms. Usual response to a request is "evidence or wiki link?" and I probably decline around 1/2 of all requests for insufficient cause, evidence, or necessity. Those I do accept are thoroughly documented as to reason in the checkuser log, for others to verify.
And a thank-you to Deskana who explained "never, ever accept a case without checking yourself for good cause, whoever might ask" back in December/January. Being told once was enough.

FT2 (Talk | email) 07:06, 9 June 2008 (UTC)

Wow. I think that has more than answered my questions! Thanks very much for that, FT2. It certainly does give a little insight into what checkusers do, and where the brunt of the workload lies (remembering of course that the raw numbers don't tell the whole story). It also gives some insight into what people mean when they say they do a lot of work behind the scenes. I would normally have more to say, but this will take a little while to digest. Carcharoth (talk) 11:20, 9 June 2008 (UTC)
That is excellent - and it goes to show that CU is a workload type job, rather than "waving a magic wand" that some uninitiated may regard it as. I believe that if such information was available on a CU page available to the community that there would be a greater appreciation of the work done by CU's, and possibly a greater effort in providing the relevant information to allow CU's to do their work. As I said previously, since no CU wants or needs to get bogged down in collating this information on a regular basis, there may be a role for a clerk to facilitate communication between the CU's and the rest of the community (and to act as a conduit between CU's if there needs to be one). Or have it as a "Signpost" feature? LessHeard vanU (talk) 20:04, 9 June 2008 (UTC)
FT2, thank you for publishing this. Anything that brings more openness and transparency to the Checkuser process has to be welcomed. This is excellent progress. Can we possibly produce these stats on an ongoing basis somewhere? - Alison 22:25, 9 June 2008 (UTC)
I just hope he did it the easy way and wrote a script to put it together! But the problem is that the numbers indeed are meaningless other than saying who the busy core is. --jpgordon∇∆∇∆ 01:16, 10 June 2008 (UTC)
I did it the same way I do all the usual kinds of data crunching. CU log x 5000 edits x last 6 screenfuls -> copy/paste -> text editor (cleanup) -> excel or other spreadsheet (text rendered into tabular data and magic done) -> access or other database (data collate and summarize) -> spreadsheet again (automate wikitable markup from raw data). 20 mins start to end.
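For anyone wanting to script those steps rather than pass the data through a text editor, spreadsheet and database by hand, a minimal sketch might look like the following. It assumes the log has already been cleaned into `(checkuser, month)` pairs; the function names, the input shape and the sample names are hypothetical, and this is not the method actually used above.

```python
from collections import Counter


def tally_checks(entries):
    """Count checks per (checkuser, month) from (checkuser, month) pairs."""
    return Counter(entries)


def to_wikitable(counts, months):
    """Render the counts as wikitable markup, busiest checkuser first.

    Months with no logged checks are left as blank cells.
    """
    users = {user for user, _ in counts}

    def total(user):
        return sum(counts.get((user, m), 0) for m in months)

    # Sort by total over the listed months, descending; ties by name.
    ordered = sorted(users, key=lambda u: (-total(u), u))
    lines = ['{| class="wikitable"', "! Checkuser !! " + " !! ".join(months)]
    for user in ordered:
        cells = [
            str(counts[(user, m)]) if (user, m) in counts else ""
            for m in months
        ]
        lines.append("|-")
        lines.append("| " + user + " || " + " || ".join(cells))
    lines.append("|}")
    return "\n".join(lines)


# Hypothetical cleaned log entries, not real data.
entries = [("UserA", "2008-03"), ("UserA", "2008-03"), ("UserB", "2008-04")]
print(to_wikitable(tally_checks(entries), ["2008-03", "2008-04"]))
```

Sorting by total over a fixed set of months gives the same order as sorting by monthly average, which is how the tables above are ordered.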
LHvU - That assumes checkusers (or others of like trust) want to display their workloads and get fussed over. Most don't, they do a job and I guess, don't really feel a need to parade it. Those who want to assume they do a job will do so, or ask, those who don't won't. High profile appreciation isn't needed per se; the users doing it know it helps, as those who do anti-vandalism patrols know their work's appreciated without a big deal being made of it. Same as how most admins would not really want to have a deal made out of how many admin actions they do a month, or how many edits they have done this year.
Alison - The raw table alone could be done easily as needed. But I'm very reluctant to second doing it regularly at all, since it would get to be a bit like a list of "editors with most edits" or "usage of blocking and protection by admin" ... a distraction and would eventually get MFD'ed or perennially listed for encouraging an unhelpful view. (Eg, Wikipedia:Miscellany for deletion/Wikipedia:List of administrators by edit count, and Wikipedia:Miscellany for deletion/Wikipedia:List of Wikipedians by number of edits (second nomination), first listed for VfD in 2005, and still regularly hitting MFD.) We know now roughly what usage we have, on the 3 most recent months. I wouldn't honestly bother doing it again for a long time, at least another year. Apart from anything else we don't need our checkusers watching who will think it's too high, or too low. Just let them do their jobs. Any actual analysis of CU function, we have enough data for, now.
Last observation. Checkuser requests have changed in character somewhat, as the more experienced sockmasters have got more sophisticated. Whilst the volume of cases grows relatively slowly, the depth (hence volume) of CU actions on a case probably grow much more quickly. In many past cases a finding of "no connection" was reached when more checking might have found a connection; the limiting factor was ease of finding connections given the long pages of data, more than anything. The community's seeing more demanding checks being done more often now, as skills develop and sockmasters have escalated their activities. FT2 (Talk | email) 04:05, 10 June 2008 (UTC)
Thanks again, FT2. Awesome work. The reason I see this as a welcome step forward, and why I was hoping it was a (semi-)regular occurrence was because I see it more as being a step towards possibly making the entire process more open, at least within the realms of privacy. It would be interesting to see, for example, how many cases were driven from WP:RFCU and how many were not. It would also be interesting - heh - to show metrics similar to filling in edit summaries while editing, on how many checkusers are actually filling in the 'reason:' field :) My personal bête noire - Alison 04:59, 10 June 2008 (UTC)
Interesting stats. I do agree with Alison about the reason field, probably something to raise on the internal CU list... It does highlight that Alison and Thatcher are working really hard at this task and should be commended. I knew they did, but I had no idea. ++Lar: t/c 10:35, 10 June 2008 (UTC)
Last comment; "appreciation by" rather than "appreciation of". ;~) LessHeard vanU (talk) 12:35, 10 June 2008 (UTC)

I didn't have any idea of the scale of work either. I don't think anyone did - they just did it.

I've also done some work on the "reason" field too as requested. This by nature has to be much more tentative, since there's no way I can get a computer to comprehend if a reason might be obvious to a competent reader or not. But in the same way that RFA tends to look at edit summary usage % and uses that as a rule of thumb, here's some raw stats on usage and "reason" field, based on data from Dec 2007 - May 2008:

  • Checks are: 35% = checks on named accounts, 65% = checks on IPs and IP ranges. This makes sense since the most common activity for checkusers is to check the named account(s) and then their IPs. That's a "one to many" mapping, ie, each account will use at least one IP and often multiple IPs, whereas checkuser actions on IPs will less often also involve checking even one account. So you'd expect many more checks to be on IPs or IP ranges, than on accounts. I'm a bit surprised it isn't more extreme - I would have thought a ratio of 3:1 (25% accounts, 75% IPs) would be quite common.
  • 12 of 25 active checkusers (corrected from 10) provide a reason at or very close to 100% of the time (98% - 100%). In more detail -
  • Reason field used 93% - 100% of the time = 16 checkusers (64% of 25)
  • Reason field used 80% - 100% of the time = 19 checkusers (76% of 25)
  • Reason field used 65% + of the time = 22 checkusers (88% of 25)
  • 3 of 25 active checkusers do not habitually use the reason field.
  • Reasons are provided for around 78% of checks overall (corrected from 70%). The figures are extremely consistent month to month - eg, for the last four months Feb-May 2008, 80 - 84% of account checks and 75 - 80% of IP checks have a reason given in every month. (That said, a lot of the time it's obvious anyway from the CU log context, user name, or user's block log/contribs/talk page, what the check was about, and if there is doubt, any checkuser wishing to verify it, usually can for a considerable time afterwards, or can ask.)

| Month | Checks with a reason | Account checks with a reason | IP checks with a reason |
|---|---|---|---|
| December 2007 | 76% | 81% | 73% |
| January 2008 | 73% | 78% | 70% |
| February 2008 | 78% | 80% | 78% |
| March 2008 | 77% | 81% | 75% |
| April 2008 | 79% | 81% | 78% |
| May 2008 | 82% | 84% | 80% |

  • I also did a quick check for common strings that tend to indicate a good reason and traceable cause is provided. This isn't exhaustive by any means, as many cases have good obvious reasons but a reason text that didn't match any of these. In detail -
  • I searched all checks performed for Dec 2007 - May 2008, for the following common strings which generally tend to indicate a good checkable reason was provided if the full reason was checked: ("vandal" or "sock" or "rfcu" or "checkuser" or "case/" or "arb" or "evasion" or "evade" or "ban" or "disrupt" or "spa" or "threat" or "harass" or "sockcheck" or <list of 7 prolific sock-users' names> or "ANI" or "pedo" or "otrs" or "http"). This is rough only.
  • In every month, a very consistent 64% - 72% of account checks, and 57% - 64% of IP (or IP range) checks, had a reason with one of these word fragments in it.
  • Most of the rest, where a reason was provided, the reason was eg, the specific users name, or a specific page name, or an OTRS ticket ID, ie, some other descriptive string or comment that was too individual to be picked up by a coarse test like this.
  • Comparing the 78% of checks with a reason (corrected from 70%), with the 64-72% of account checks and 57-64% of IP checks that had a clearly checkable reason with one of these strings, suggests that when a reason is provided, it is most usually checkable, and that one of the above strings or reasons will commonly apply.
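The coarse string test described above can be sketched as follows. This is a hedged illustration, not the procedure actually run: the function names are hypothetical, case-insensitive substring matching is an assumption, and the seven sock-user names in the original search are not given here, so they are left out.

```python
# Word fragments quoted in the discussion above (the 7 sock-user names
# are not listed in the source and are therefore omitted).
MARKERS = [
    "vandal", "sock", "rfcu", "checkuser", "case/", "arb", "evasion",
    "evade", "ban", "disrupt", "spa", "threat", "harass", "sockcheck",
    "ani", "pedo", "otrs", "http",
]


def has_marker(reason):
    """True if the reason text contains any marker (case-insensitive)."""
    text = reason.lower()
    return any(marker in text for marker in MARKERS)


def reason_stats(reasons):
    """Return (% of checks with any reason, % with a marker string).

    `reasons` holds one string per check; an empty string means the
    reason field was left blank.
    """
    total = len(reasons)
    if total == 0:
        return 0.0, 0.0
    with_reason = sum(1 for r in reasons if r.strip())
    with_marker = sum(1 for r in reasons if has_marker(r))
    return 100.0 * with_reason / total, 100.0 * with_marker / total


# Hypothetical reason-field contents, not real log data.
sample = ["sock of X", "", "RFCU case", "Example page"]
print(reason_stats(sample))
```

Plain substring matching is deliberately crude: a short fragment such as "spa" or "ban" will also match inside unrelated words, and a perfectly good reason such as a page name matches nothing, which is why the figures above are described as rough only.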


FT2 (Talk | email) 12:49, 10 June 2008 (UTC)

  • This kind of analysis of logs sometimes throws up interesting patterns. An analysis of blocks and in particular indefinite block reasons, and one of image deletion reasons, can be seen here and here. I think there is, or was, a "wikipedia stats" wikiproject that used to co-ordinate this kind of thing, but it is difficult to keep under control. People tend to analyse the wrong things, or lose interest. Still, I'm filing away these checkuser stats in some remote part of my memory, and I'll mention them as appropriate if I see the issue raised in the future. Carcharoth (talk) 13:00, 10 June 2008 (UTC)
  • FT2 said: "I did it the same way I do all the usual kinds of data crunching. CU log x 5000 edits x last 6 screenfuls -> copy/paste -> text editor (cleanup) -> excel or other spreadsheet (text rendered into tabular data and magic done) -> access or other database (data collate and summarize) -> spreadsheet again (automate wikitable markup from raw data). 20 mins start to end." - that sounds scarily like how I number crunch Wikipedia stats. Except I haven't fully got the hang of automating some of the steps. Going the whole hog and automating some regular requests has been done for ages, but some of the more obscure areas lack these kinds of stats. Technically, scraping data like this does put more load on the servers than direct database dump analysis or API querying does, but I think checkuser analysis would be limited in those regards anyway. Oh, and I fully agree that this sort of analysis is not needed on any frequent sort of basis. Just an annual report would be good, and maybe something that the ombudsman committee could do (I'm not 100% sure what their remit is), or failing that, one of the checkusers. Doing this sort of thing once a year and publishing it would, I think, be very helpful. If plans do go ahead to do any annual report, could it be documented on the main page to bring some closure to this thread? Thanks. Carcharoth (talk) 13:00, 10 June 2008 (UTC)

I'd like to thank FT2 for providing this analysis, particularly his last paragraph, which partly addresses a question I raise above. --Relata refero (disp.) 13:23, 10 June 2008 (UTC)

(Minor correction - some figures for previous months in 2007 were included, leading to a slight understatement in reason usage in a number of cases. Struck out and fixed above. Extra table added showing reason usage by month.) FT2 (Talk | email) 15:57, 10 June 2008 (UTC)
While I think this is an interesting snapshot, I'm not sure it has any diagnostic value. The number of raw checks a person runs, and the use or non-use of a logged reason, has nothing to do with whether any particular check meets some minimum standard of reasonableness. Thatcher 16:17, 10 June 2008 (UTC)
I agree the analysis of the "reasons" is not that helpful. But if, say, the ombudsman committee were to produce an annual report on checkuser activity, or, say, a checkuser were to produce such a report, what would you think would be a useful way to present such a report, and what do you think the advantages and disadvantages of such a report would be? To me, the main advantage is communicating something to the general community, rather than the effective "nothing" that non-checkusers currently get told about what goes on. I've gleaned bits here and there, but the volume of work (both in raw number of checkusers, and in actual hours put in) is something that I was completely unaware of. Carcharoth (talk) 16:49, 10 June 2008 (UTC)
I just don't see what the advantage would be; we could also say simply "Thatcher and Ali did the most, Josh and FT2 did a bunch too." The numbers really are meaningless; they say nothing about time (an awful lot of mine are check, nope, discard, three clicks and a better interface would have just one, but when I was doing a lot of RFCU work, each one took a lot more time); nor anything about the nature of the requests (RFCU? Known socks? Checking up on supposedly reformed puppeteers?). We spend a disproportionate amount of time chasing down a small number of annoyances (and there's been a lot of WP:DENYing going on in this discussion). The raw numbers are mostly fodder for the ignorant. --jpgordon∇∆∇∆ 21:25, 10 June 2008 (UTC)
Your last edit mangled things a bit. I've repaired it. My point was that there are misunderstandings about checkuser, and a bit more openness could avoid that. It would also raise new questions, but that isn't always a bad thing, as it should be simple to explain things like you are doing here. The numbers are only meaningless if they are presented without qualifiers, which you have made a start providing. I'm not sure what your point about WP:DENY is. If you mean checkusers spending "a disproportionate amount of time chasing down a small number of annoyances", then you here, and Raul elsewhere, have already made that point without any help from anyone else. If you mean something like 'checkusers often carry out checks on their own initiative', well, that's been obvious for a while (in my case, it was Thatcher popping up in discussions and going "sockpuppet" to help clear things up, without any formal RFCU). From my point of view, an annual report, which sounds like it wouldn't take long to prepare, would keep people informed as to what is going on (while still maintaining privacy as regards IP data, obviously), while having nothing of this sort would just continue to leave people in the dark as to basic questions about checkuser. At the moment, people know who can use it, and where to ask for a checkuser to be run, and (if WP:RFCU is done) they usually know why a checkuser has been requested. What they don't know is to what extent checkuser is used, or not, and what goes on in the background. I understand that you and many others see checkuser as a key weapon against sockpuppets, but there are other issues involved as well that it is important not to lose sight of, namely the need for as much openness and transparency as possible as regards the use of a tool that involves (in some, not all, cases) data that can breach an individual Wikipedian's privacy. 
It's a general principle that those using such tools should be open to transparent processes, rather than talking about "fodder for the ignorant".
Anyway, based on what has been discussed here, a bit more could be added to the policy page to make things clearer. For example "and monitor and crosscheck each other's use of the function" - something could be added to indicate that this does indeed happen (ie. not just saying how this should work in theory, but how it does work, and is working, in practice). I also get the impression that the ombudsman committee is reactive only, ie. it reacts to complaints, but doesn't do any monitoring or tracking of what happens. Finally, I think documenting the history is an end in itself. For historical purposes alone, an annual report doesn't seem too much to ask for. If, say, FT2 did prepare such a report, would you object to it in principle? Carcharoth (talk) 22:05, 10 June 2008 (UTC)
Not really anticipating that would be useful or desirable, or something I'd really want to be doing. I'm not really a report writing type. My interest is more writing content myself, and removing obstacles so others can write content. I'm glad this is helpful, and as a once-off exercise it probably is useful, but I'm not going to tie myself down to any kind of "editorial statistics table production". FT2 (Talk | email) 22:34, 10 June 2008 (UTC)
I probably did go a bit far there, didn't I? :-) Rule number whatever would be "never volunteer others for work they might not want"! Still, I think some people at least have found this useful, so, unless anyone else wants to take this up, I'll leave it at that for now and see what people think in a year's time or so. Looking back on these stats after a few months to a year, it will be interesting to see what desire there will be for an updated version or not. Thanks again, FT2, for producing those stats (I think you said you had done them earlier following a previous request?), and thanks to those who replied to my questions (which weren't initially as clear as they could have been). I was a bit worried about what reaction there would be, but this was more than I was expecting in many ways. Thanks. Carcharoth (talk) 04:05, 11 June 2008 (UTC)

A fantastic first step!

Thanks to FT2 for the above - I absolutely believe it's a step in the right direction. Further:

  • I would like to know when I have been 'checked', and if possible by whom, and with what rationale
  • I currently have no means of assessing whether or not there has been 'wiki-political' abuse of checkuser tools in my case, but I would like it openly discussed / investigated - any advice about how to go about this would be much appreciated.

cheers, Privatemusings (talk) 00:41, 11 June 2008 (UTC)

  • Of course you've been checked; you were abusing multiple accounts (even if you didn't think it was abuse) and the whole purpose of checkuser is to detect people doing exactly that. We do not reveal specific information from the logs; if you believe you have been checked inappropriately, you may petition the checkuser ombudsman to investigate the case, and they will take the appropriate actions. --jpgordon∇∆∇∆ 04:29, 11 June 2008 (UTC)
I didn't mean to ask if I was checked (and I don't really think I did, but it's no big deal that you've answered a question which wasn't asked). Further - you can read above some chat about the role of the ombud.s - it's been made very very clear to me that because no violation of the privacy policy has occurred, my case lies outside their purview. Can checkuser problems occur without a violation of the privacy policy? If so, what do they look like, and where can we discuss / examine it? In my view, it's been clearly established that a checkuser minded to release the information I've asked for to me would not be contravening any policies - they'd just have to perceive a sensible reason.... I'd hope that it's no big deal, and should that not be reason enough, I'd like to discuss it further... cheers. Privatemusings (talk) 06:04, 11 June 2008 (UTC)
There is, as you say, no reason in the privacy policy or indeed any other policy to prevent us from releasing this information. As a matter of practice, however, this information is generally not released and will not be without a good reason. It would be an enormous burden on those with the CheckUser permission to have to respond to these questions. If you have a serious allegation, you are welcome to bring it up and it would be seriously considered. We are not minded to enter into these kinds of games. Sam Korn (smoddy) 09:43, 11 June 2008 (UTC)

This was a post of aggregate data in response to a request by Carcharoth and others, for information of a general nature on the checkuser role and workload, and general practices. As said, "Please don't ask 'who was checking what'..." Checkusers check all sorts of cases where there is evidence that multiple-account-based disruption may be occurring. Sometimes it is; sometimes it isn't. Apologies, PM; but as I said right up front, details of that kind aren't given out except to the Ombudsmen in the rare case of investigating an actual complaint or other checkusers crosschecking each other's work. FT2 (Talk | email) 09:13, 11 June 2008 (UTC)

many thanks to Sam and FT for their responses. To FT, I'd draw attention to my reply to jp - namely that there seems to be some confusion over the role of the ombuds (as in, if it is your view that they should investigate complaints that do not contravene the privacy policy, then that does not seem to be shared by them). To Sam - I totally understand that this sort of information - and requests for it - need to be sensitively handled, but yes - I do wish to make serious allegations. I believe there may have been an abuse of checkuser tools in my case, and wish to discuss it in a sane and sensible manner, without gameplay, and am happy to do so in as drama free a manner as possible (email, IRC, here, somewhere else?) - thanks heaps for the substantive responses though - I really really appreciate it. Privatemusings (talk) 12:38, 11 June 2008 (UTC)
Will contact you on that then, when I'm next free a few moments. Understand it'll be simply to see and advise yes/no if I can see a serious concern or not, and not for digging or fishing. FT2 (Talk | email) 12:56, 11 June 2008 (UTC)
(Presuming that FT2 hasn't been in touch yet) On English Wikipedia, complaints about misuse that do not result in privacy breaches should therefore be referred to the Arbitration Committee initially. I can't see any occasion when you have actually made the allegations -- though, of course, I may have missed something -- so that would be the appropriate place to start. Sam Korn (smoddy) 16:59, 11 June 2008 (UTC)
thanks very much guys - there's really no rush about any of this, but you'll gather that it's a subject I do care about, and would like discussed. It's interesting (to me anyways!) that in this short thread I've been pointed at the ombud.s by two sitting arb.s, and at the arbcom by an ex-arb - I think (per this, and the thread above) that there's still confusion over who deals with what - and maybe clearing that up is a fringe benefit?! I'd hope that some 'on-wiki' process might be possible, to discuss not just checkuser abuse, but also 'best practice' (things like when to make the check known etc. - for example, if a checkuser has performed a check, should they avoid reviewing an unblock request? etc. etc.) - I would be very interested to gauge community views on this sort of thing in some way... does this sound a bit like an RfC? I'll hold my horses until I've had the chance to catch up with FT2 as well.... cheers, Privatemusings (talk) 00:38, 12 June 2008 (UTC)