Wikipedia:Mediation Cabal/Cases/2006-04-27 Elo rating system

Mediation Case: 2006-04-27 Elo rating system

Please observe Wikipedia:Etiquette and Talk Page Etiquette in disputes. If you submit complaints or insults your edits are likely to be removed by the mediator and refactoring of the mediation case by anybody but the mediator is likely to be reverted. If you are not satisfied with the mediation procedure please submit your complaints to Mediation Cabal: Coordination Desk.

Request Information

Request made by: Dionyseus 23:55, 27 April 2006 (UTC)

Where is the issue taking place?: Elo_rating_system

Who's involved?: User:Wolfkeeper, and myself

What's going on?: Wolfkeeper insists on including two uncited and unfounded claims in the article. He claims that centaurs (humans who use the aid of computer chess programs) regularly outperform Hydra (a chess machine that has never been beaten by a human in a standard time control chess game, and has dominated the computer chess scene since 2004), and that centaurs perform at the 3200 rating level. Both of these claims are false, unfounded, and uncited. Furthermore, Hydra won the 2006 PAL/CSS Freestyle tournament this month, a tournament in which centaurs, teams of computers, grandmasters, and even teams of grandmasters can participate, a full point ahead of the strong field. This was not luck; it follows the pattern of dominance Hydra has held since 2004, when it completely humiliated the former computer world champion, Shredder 8 [1], and humiliated Michael Adams in 2005 [2]. Michael Adams himself called Hydra the "Kasparov of computers" [3]. Keep in mind that the 2004 version used only 16 nodes, the 2005 version used 32 nodes, and the 2006 version is using 64 nodes.

What would you like to change about that?: I'd like for Wolfkeeper to understand that he cannot include these claims in the article because they are false, unfounded, and uncited.

If you'd prefer we work discreetly, how can we reach you?: No discreetness needed. I just would like this to be resolved.

Would you be willing to be a mediator yourself, and accept a mediation assignment in a different case?

 Sorry, don't have the time.

This is, following the Categorical Imperative, the idea that you might want to do what you expect others to do. You don't have to, of course, that's why it's a question.

...

Mediator response

Analysis of the dispute

Please, you both, read Wikipedia:Verifiability and Wikipedia:No original research. The encyclopedia can't have any claims that don't have reliable sources. BTW, the one who added the centaur claim was this anonymous user.

So, the major problem here -- you only claim, and don't cite. Dion, can you remember where you read that the games Hydra has played were sufficient to determine its status as the strongest (I can't find that info on the hydrachess page now)? Wolf, did you see the centaur claim somewhere else, or can you provide us some games between centaurs and Hydra?

The issue of Hydra's ability to play is only a minor one; you can just find a middle ground (provided "strong evidence", for example), as I don't think you will find that discussion anywhere else on the net.

As for the Freestyle chess tournament note, it can be used to partially prove that Hydra is the strongest, or that centaurs are the strongest. First, you need to define if the Hydra team did assist the computer or not.

This notice on chessbase doesn't make things any clearer; they don't mention if Hydra was a centaur or an engine player. F e tofs ^Hello! 16:47, 30 April 2006 (UTC)

Thanks for taking this case. I agree, but the article does not claim that Hydra is the strongest; the article says "Hydra is probably the strongest," and the keyword is "probably." In other words, the article is claiming that perhaps Hydra is the strongest, but that it has not played enough games to prove it. As for the question of whether or not Hydra received any assistance in the 2006 PAL/CSS Freestyle tournament, I strongly believe it did not, because the entire goal of the Hydra team is to prove that Hydra is the strongest chess entity in the world, but because I have been unable to source it I've decided to keep it out of the article. Either way, I don't see how this tournament has any relevance to the article. Perhaps it can be used as further evidence that it is the strongest, I don't know; I personally think that Hydra's match wins against Shredder 8 and Michael Adams were more impressive. Dionyseus 20:31, 30 April 2006 (UTC)

I agree that WolfKeeper needs to find sources for the claims that centaurs regularly outperform Hydra, and that centaurs perform at a 3200 rating level. I personally know for a fact that the first claim is entirely false, and as for the second claim I've tried searching for it with Yahoo and Google and have not found anything resembling the 3200 claim. Dionyseus 20:31, 30 April 2006 (UTC)

Sorry, that was not what I meant. I meant the "provided evidence for"/"demonstrated" issue. Look at my last edit on Elo rating system, I guess this problem is solved by applying a bit of a neutral point of view. However, I still need a source for it down below so we could safely say it's not original research. F e tofs ^Hello! 21:16, 30 April 2006 (UTC)

Suggestion

So I think we should reword the article to say "some computer scientists believe that the greatest level of play is achieved by centaurs (insert my last reference here), but at the 2006 PAL/CSS tournament (that involved both engines and centaurs), Hydra won a full point ahead of the runner-up, Vasik Rajlich with Rybka." (that is, if you both agree). F e tofs ^Hello! 13:33, 1 May 2006 (UTC)

That way, we're holding to facts, and not speculations or opinions. F e tofs ^Hello! 13:36, 1 May 2006 (UTC)

I disagree. Vasik Rajlich's statement that 'Computer chess experts as well as human chess masters have traditionally agreed on one thing: a human chess master, assisted by a top chess engine, is capable of producing the highest level of chess known to mankind' says that a 'human's' play increases when assisted by a computer, it does not say at all that an engine's play may increase when assisted by a human. At the bottom of that report, after he had witnessed the conclusion of the first round of the 2006 tournament, Rajlich says 'the extent to which an intelligent centaur is better than a pure engine may be a bit less than I had previously thought.' He goes on to say 'A number of chess themes – things that many humans would even consider to be positional principles – have turned out to simply not work. How much of what we think is good chess play is in fact good chess play, and how much is just myth?' That's my point, it is entirely possible that humans, including Garry Kasparov, are simply too flawed to even assist something as powerful as Hydra. Dionyseus 19:48, 1 May 2006 (UTC)

"That's my point, it is entirely possible that humans, including Garry Kasparov, are simply too flawed to even assist something as powerful as Hydra." This seems to be your opinion, I want to hold to the sources. F e tofs ^Hello! 20:39, 1 May 2006 (UTC)

Alright, I'll support the inclusion of your sentence. Dionyseus 21:25, 1 May 2006 (UTC)

I agree with this bit: "some computer scientists believe that the greatest level of play is achieved by centaurs (insert my last reference here)". But the problem with the rest of it is that the only hard evidence we have says that Zor_champ including Hydra won, not just Hydra. If we added a bit about the Freestyle tournament being a way of establishing it or something it might be ok, if suitably phrased.WolfKeeper 01:27, 2 May 2006 (UTC)

That was my main concern. Should we exclude this part from the sentence, then? I'm going to try to find somewhere that says this, but most sites seem to tell Hydra did play alone, including Rajlich's report. On to google! F e tofs ^Hello! 12:30, 2 May 2006 (UTC)

Can you provide any further evidence that Hydra didn't play alone? Look at this, for example. I think your "evidence" is a lack of proper wording. F e tofs ^Hello! 12:45, 2 May 2006 (UTC)

Recommendations on solving this issue

So, what you will be discussing here is not how the other has a great POV, etc. I recommend that you take your time to find sources that support all of your claims somewhere that is not Wikipedia, and then come back to discuss and present them to the other party, and make sure your sources are reliable. If you don't succeed in providing sources that support any of your claims you should give up trying to put that one into Wikipedia, as the policy as of now doesn't accept that kind of action. I have even created a "source" section below; feel free to add anything you find. Once you think you have searched enough (and only then), restart the discussion. F e tofs ^Hello! 16:47, 30 April 2006 (UTC)

Sources to consider

The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section.

This seems to be agreed upon. The centaur issue is greater, however. F e tofs ^Hello! 23:41, 30 April 2006 (UTC)

The creators of Hydra have evaluated it as having 3000 ELO

Lorent's Hydra strength claim: "We think we have crossed the 3,000 ELO line,"

He doesn't state how he calculated this though; it's quite plausible that they could have played it against other engines during the development of Hydra, for example Fritz against any of the other commercial engines. They may also have estimated it from extrapolating the strength as they add nodes to Hydra.WolfKeeper 23:01, 30 April 2006 (UTC)

I'm certain that the games against Adams are not sufficient to determine a 3000 rating. 6 games against a ~2750 isn't enough.WolfKeeper 23:01, 30 April 2006 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.

Centaurs perform (or not) at 3200 ELO

Centaurs frequently outperform (or not) Hydra

Well the claim WolfKeeper is trying to put into the article is that centaurs frequently outperform Hydra. Therefore I am not required to prove the claim wrong, rather WolfKeeper is required to provide a source for the claim. Regardless, Hydra winning the 2006 PAL/CSS Freestyle tournament a full point ahead of the field of centaurs, grandmasters, and other engines is strong evidence against the claim that centaurs regularly outperform Hydra. Dionyseus 21:34, 30 April 2006 (UTC)

Please provide evidence below of this fact. F e tofs ^Hello! 21:41, 30 April 2006 (UTC)

Computer chess experts as well as human chess masters have traditionally agreed on one thing: a human chess master, assisted by a top chess engine, is capable of producing the highest level of chess known to mankind. This is too WP:WEASEL, however... F e tofs ^Hello! 21:41, 30 April 2006 (UTC)

There's also the point that whatever strength Hydra plays at, a centaur containing Hydra plays even better; there's certain positions that humans are better at handling than machines.WolfKeeper 21:51, 30 April 2006 (UTC)

That statement would be true only if one were to assume that Hydra was assisted by grandmasters.

The Zor_champ team had at least one grandmaster on the team; but my point is a more general one than whether Zor_champ was or wasn't running in centaur or cyborg mode. And I think you're agreeing that grandmaster(s) + Hydra > Hydra; hence the centaur comment seems to be quite supportable.WolfKeeper 22:22, 30 April 2006 (UTC)

Also, it is not known whether or not humans are better at handling certain positions than Hydra. The former FIDE world champion Ruslan Ponomariov tried anti-computer chess strategies against Hydra in 2004 and 2005 and failed horribly, losing all three games game 1 game 2 game 3. Dionyseus 22:05, 30 April 2006 (UTC)

That's a human against Hydra. That doesn't speak to whether a human could help Hydra to beat another player by giving it hints in certain situations.WolfKeeper 22:26, 30 April 2006 (UTC)

"That statement would be true only if one were to assume that Hydra was assisted by grandmasters." (Huh?). F e tofs ^Hello! 22:16, 30 April 2006 (UTC)

I was referring to WolfKeeper's statement that Hydra would play better than centaurs if it was a centaur itself. Dionyseus 22:18, 30 April 2006 (UTC)

But why can't a human giving hints to Hydra be considered better advice than only Hydra's purely calculating technique? F e tofs ^Hello! 22:32, 30 April 2006 (UTC)

That's possible, but it's not proven. It's also possible that a human cannot give helpful advice to Hydra. The strongest human player ever, Kasparov, was only 2851 at his peak after all. Dionyseus 22:35, 30 April 2006 (UTC)

You seem to be saying that less strong humans cannot assist a stronger player. Considering that most champions train with weaker players in preparation for a match, that is not a usual point of view.WolfKeeper 23:07, 30 April 2006 (UTC)

No, what I am saying is that there may be a ceiling to what humans can achieve. In other words, it is possible that humans are incapable of assisting Hydra because Hydra is too powerful. It would be like a student trying to assist a master. Dionyseus 23:18, 30 April 2006 (UTC)

Also, the claim in question is the claim that 'Centaurs regularly outperform Hydra in standard and correspondence play.' The key words in that claim are 'standard' and 'correspondence' play. First of all, the only times Hydra has played against centaurs were in the 2005 and 2006 PAL/CSS Freestyle tournaments; it won the 2006 tournament a full point ahead of the field, and these tournaments were played at rapid time control, not standard. Second, the only correspondence games Hydra has played were the three games it played against Arno Nickel, and the 32 node version of Hydra managed to score a draw in its third game. So clearly this claim is false for several reasons: 1) Hydra is not 'regularly' beaten by centaurs. 2) It has never played centaurs in standard time control. 3) Hydra has not been beaten in correspondence play 'regularly.' Dionyseus 01:28, 1 May 2006 (UTC)

Merely doubling the speed doesn't help as much as you think it does, besides, computers are speeding up all the time anyway; so it needs to speed up to track Moore's law. I think the general preponderance of evidence is that humans can outperform engines over longer time schedules; particularly if they have access to engines to help them. At the end of the day Hydra is just another chess engine. Sure, it's got hardware speed ups, but it's not playing significantly differently than other chess computers, it's just searching further. To discount the fact that we know the properties of the algorithms it uses to search is not helpful.WolfKeeper 22:07, 8 May 2006 (UTC)

As to the fact that it isn't winning the correspondence match, its performance seems to be consistent with how I think it would play, based on its construction, and the likely strengths of engines available to Arno Nickel.WolfKeeper 22:07, 8 May 2006 (UTC)

The Hydra Freestyle team helped (or not) the engine

The hydrachess webpage [4] describes it as:

'In a fierce last round battle, when the leading team 'Zor_champ' (UAE) with Hydra had to defend itself against the German centaur team 'Ciron', it finally gained the upper hand and took the title same as the biggest money prize: $8'000.'

Absolutely no claim that Hydra played all the moves is made. That is evidence against; since they would have been expected to mention it.WolfKeeper 21:08, 30 April 2006 (UTC)

Notice the keyword "itself," indicating that Hydra was defending itself, not themselves. Dionyseus 21:15, 30 April 2006 (UTC)

Um, no actually, 'itself' obviously refers to the subject of the sentence, 'zor_champ'. Seriously, why do you even bother with this bullshit?WolfKeeper 21:19, 30 April 2006 (UTC)

You are mistaken, 'itself' refers to Hydra. The subject is 'last round battle,' with the action being 'Hydra had to defend itself.' Dionyseus 21:22, 30 April 2006 (UTC)

Oh really? So a 'last round battle' gained the upper hand and won the prize money, did it? No. Battles don't collect money. And Hydra wasn't even entered into the competition; the Zor_champ team was. Zor_champ won the prize and gained the upper hand; it's unambiguous unless you're living in bizarro land.WolfKeeper 21:43, 30 April 2006 (UTC)

According to the wikipedia Hydra_(chess):

'Due to human handler errors and program errors, Hydra did not fare well in the June 2005 PAL/CSS Freestyle Chess Tournament, an online tournament where players are allowed to access any and all resources to them, including computer engines, databases, as well as human grandmasters. Two versions of Hydra participated in the tournament- Hydra Chimera (without human intervention) scored 3.5/8, and Hydra Scylla (with human intervention) scored 4/8. Neither version of Hydra qualified for the quarter-finals.'

I haven't checked this, but if it's correct, in the 2005 competition, the human assisted version got more points than the unassisted one, and more to the point, the humans could and did assist Hydra.WolfKeeper 23:35, 30 April 2006 (UTC)

But the assisted one had 32 nodes, while the unassisted 16. F e tofs ^Hello! 23:59, 30 April 2006 (UTC)

Nevertheless, my main point is that historically they have helped their engine, so they must think that there is some benefit to doing that.WolfKeeper 00:11, 1 May 2006 (UTC)

Fetos you are right, the unassisted version, which played under the nick of Zor_Champ (which by the way is further evidence that when it plays under the nick of Zor_Champ it plays unassisted), used only 16 nodes, while the assisted version, which played under the nick Ares01, had 32 nodes. The one with the 32 nodes managed to get half a point more than the 16 node version. Here's the Chessbase report. Dionyseus 00:47, 1 May 2006 (UTC)

the hardware monster Hydra, plays under the handle Zor_champ. F e tofs ^Hello! 12:14, 1 May 2006 (UTC)

Thanks, that sentence 'Apart from the hardware monster Hydra, which is running on 64 parallel processors and plays under the handle "Zor_champ" out of Abu Dhabi, some other engine players...' indicates that Hydra, under the handle of Zor_Champ, was unassisted in the 2006 PAL/CSS Freestyle tournament. The unassisted version of Hydra in the 2005 PAL/CSS Freestyle tournament also played under the handle of Zor_Champ. I think the reason they had an unassisted version and an assisted version in the 2005 tournament was because the Hydra team was probably wondering whether or not it would be beneficial to assist it. Apparently they concluded that it was not beneficial. Dionyseus 12:28, 1 May 2006 (UTC)

The fact is that Hydra has been beaten two games out of three, and only managed to draw the third game against Arno Nickel + engine(s). This is strong evidence that Centaurs/Cyborgs are better than Hydra at longer times. Increasing the strength of the engine by doubling nodes would not significantly help with this; at most it will make the engine twice as fast for each doubling of nodes. In addition, the program did not win in the previous Freestyle events, only the 2006 event which was run on shorter times. The commentators believe that this favoured the engines. I don't believe that these facts are controversial; the weakness of engines at long times is well known. Hydra is just a chess engine when all is said and done.WolfKeeper 22:00, 7 May 2006 (UTC)

3 games isn't strong evidence at all; you yourself said that Hydra's 5.5-0.5 victory over Michael Adams was not enough evidence to support the Hydra team's claim that it plays at the 3000 Elo rating level. I don't understand your argument that doubling the nodes wouldn't help. Of course it helps; all evidence says that doubling the speed increases Elo rating, and I think double speed would increase strength by about 50 Elo. And look at the result itself: the 16 node version lost two correspondence chess games to Arno, and the 32 node version managed to draw. When you say "that program did not win in the previous Freestyle events" you make it sound as if there had been many events, which is simply not the case; there have only been two Freestyle events, the one in 2005 and this year's. By the way, what does all of this have to do with whether or not Hydra was unassisted? Dionyseus 21:52, 8 May 2006 (UTC)

Evidence

Please report evidence in this section with {{Wikipedia:Mediation_Cabal/Evidence}} for misconduct and {{Wikipedia:Mediation_Cabal/Evidence3RR}} for 3RR violations. If you need help ask a mediator or an advocate. Evidence is of limited use in mediation as the mediator has no authority. Providing some evidence may, however, be useful in making both sides act more civil.
Wikipedia:Etiquette: Although it's understandably difficult in a heated argument, if the other party is not as civil as you'd like them to be, make sure to be more civil than him or her, not less.

Compromise offers

This section is for listing and discussing compromise offers.

Comments by others

While using the talk page of the article in question to solve a dispute is encouraged to involve a larger audience, feel free to discuss the case below if that is not possible. Other mediators are also encouraged to join in on the discussion as Wikipedia is based on consensus.

Discussion

What's actually happening is that he is removing information from the wikipedia that he doesn't agree with. That's wrong under NPOV. NPOV is about collecting views in the wikipedia, not imposing views.

There is a point of factual contention: I contend that zor_champ was actually a centaur in the 2006 championship (in other words a team of chess players that included Hydra), and there is evidence from the hydra web page (that I referenced and was repeatedly removed) that supports my contention. It was even admitted that Zor_champ was a centaur, but he then removed all the paragraphs again anyway.

Some of the information I can neither confirm nor deny- it had been claimed that centaurs might have achieved 3200 elo; but that wasn't added by me, it sounds plausible, perhaps a little high.

Anyway, he hasn't provided any references to back up any of his claims, and I've been quite willing to let his views be included in the wikipedia. It's just that the only way he knows to express them is by removing contrary opinions.

He's also asserting things that can't be proved; that Hydra is stronger than the other players; that's likely, but not been statistically proven as yet- it hasn't played enough matches. Again I don't have a problem with including this view, provided he doesn't remove opposing views, and provided he doesn't overstate the case.WolfKeeper 05:28, 28 April 2006 (UTC)

No, I removed two claims that you kept insisting on including in the article without providing references. I listed the two unfounded and unsourced claims above. I do not contend that Zor_Champ, Hydra's longtime handle on the Playchess server, played in centaur mode. I am, however, contending on the talk page that the Hydra team only used Hydra's suggestions to decide what moves to play, for it is their long stated goal to prove to the world that Hydra is the strongest entity. They have some Grandmasters in the team, but they are for improving the opening book and providing practice, testing, and fine-tuning for the engine. What claims are you saying that I am not backing up with sources? -Dionyseus

You have no source for the claim that the team didn't help the program, and the web page doesn't claim that they didn't which I would have expected them to mention.WolfKeeper 07:28, 29 April 2006 (UTC)

The whole point of the Hydra project is to create the strongest chess playing entity. The Hydra team mentions this on their Hydra-Adams match page FAQ: http://tournament.hydrachess.com/faq.php . In this other article it is stated that their goal is to dominate the chess world and to become a worthy successor of Deep Blue [5] . Dionyseus 08:08, 29 April 2006 (UTC)

That simply doesn't prove that Hydra ran unassisted in the Freestyle competition. They might simply have wanted to win the competition; there's absolutely nothing in the rules against outside assistance in that.WolfKeeper 21:00, 30 April 2006 (UTC)

Also I did not claim Hydra is the strongest, the text says "Hydra is probably the strongest" because of its ongoing and demonstrated success, -Dionyseus

No, the text says that, because I wrote that, and kept putting it back in, every time somebody took it out. And you'll note that it doesn't say 'Hydra is the strongest' because that isn't true- it hasn't been proven to be true, yet anyway.WolfKeeper 07:28, 29 April 2006 (UTC)

no human has ever beaten Hydra in a standard time control game, that is an incredible feat. -Dionyseus

It has never played the absolute top rank players though. And it would have been expected to beat Adams. It's easy to always win if you're always against players with significantly lower ranking than you.WolfKeeper 07:28, 29 April 2006 (UTC)

With that comment you are asserting that Hydra is much higher rated than Adams, who was rated 2737 at the time and ranked 7th in the world.

No, as the article says when you're not vandalising it Lorentz claims that. See: [6] for example.WolfKeeper 21:00, 30 April 2006 (UTC)

Hydra's 5.5-0.5 victory against Adams is a rating performance of over 3000. If your assertion is correct, Hydra would have no trouble beating anyone! No human has ever been rated higher than 2851. Dionyseus 08:10, 29 April 2006 (UTC)
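The "performance of over 3000" figure in the comment above can be checked with a rough sketch. This assumes the common logistic form of the Elo model and inverts it to get a performance rating; official rating bodies use lookup tables that give slightly different numbers, and the function name performance_rating is purely illustrative. The opponent rating of 2737 is the figure quoted for Adams elsewhere in this discussion.

```python
import math


def performance_rating(opponent_rating: float, score: float, games: int) -> float:
    """Estimate a performance rating by inverting the logistic Elo curve.

    Assumption: all games were played against one opponent (or a field
    whose average rating is opponent_rating).
    """
    p = score / games
    if not 0.0 < p < 1.0:
        # A 0% or 100% score has no finite rating difference in this model.
        raise ValueError("score fraction must be strictly between 0 and 1")
    # Expected score p = 1 / (1 + 10 ** (-d / 400)), solved for d.
    rating_diff = -400.0 * math.log10(1.0 / p - 1.0)
    return opponent_rating + rating_diff


if __name__ == "__main__":
    # The actual match score, and the same match with one more draw,
    # to show how far a single half point moves a six-game estimate.
    for score in (5.5, 5.0):
        print(score, round(performance_rating(2737, score, 6)))
```

Under these assumptions the six-game result does land above 3000, but the estimate shifts by well over 100 points if one win is swapped for a draw, which is why a short match pins the rating down only loosely.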

No. An ELO rating is statistical. Every 100 points means a 63% chance of winning a match, but the other side still has a good chance. That's what you're missing; it actually takes quite a few games to establish a rating, and the closer the other players are, the more accurately this can be done. Adams's rating is too low for this match to establish Hydra's rating with any guarantee. Hydra hasn't really played enough games to give it a good rating based on games played.WolfKeeper 21:00, 30 April 2006 (UTC)
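For reference, the relationship between a rating gap and the expected result can be sketched as follows. This assumes the widely used logistic form of the Elo model with a 400-point scale; the function name expected_score is illustrative. Note that the quantity is an expected score (a draw counts as half a point) rather than strictly a chance of winning, and under this form a 100-point gap comes out at roughly 64%.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score for the player who is rating_diff points higher.

    Assumption: logistic Elo curve with the conventional 400-point scale.
    A win counts 1, a draw 0.5, a loss 0.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    # Print the expected score for a few rating gaps.
    for diff in (0, 50, 100, 200, 400):
        print(diff, round(expected_score(diff), 3))
```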

You do not even know who came up with the "centaurs play at the 3200 level" claim, and you have not even provided a single piece of evidence that Hydra is regularly outperformed by centaurs, yet you insist that these two unfounded claims be included in the article, why? Dionyseus 06:22, 28 April 2006 (UTC)

No, I'm insisting you don't remove an entire paragraph, which contains several claims. And it doesn't say that that is true, it says that some people claim that. WolfKeeper 07:28, 29 April 2006 (UTC)

The paragraph that you are referring to consists of two unsourced and unfounded claims, and a statement that Hydra won the 2006 PAL/CSS Freestyle tournament, which has zero relevance to the article and actually makes the paragraph completely contradictory. If you cannot source the claims, do not insist on including them in the article. Dionyseus 08:08, 29 April 2006 (UTC)

No, that is the question isn't it? And you haven't been able to establish your claims at all, you were dumb enough to want me to do it for you. You've also repeatedly been deleting the reference section. That's pure and simple vandalism. In fact, it strongly suggests you don't want any references that would prove you wrong doesn't it?WolfKeeper 21:00, 30 April 2006 (UTC)
