Wikipedia talk:WikiProject Fix common mistakes
Spell checker bug
Your spell checker flagged Herosé, but it showed up in the misspellings list as simply "hero".
- Thanks for the bug report. The problem seems to be that it doesn't handle special characters (like the acute accent) properly. I have removed these false positives from the list and will fix it with the next version of the spellchecker. Sietse 14:23, 27 Nov 2004 (UTC)
Extra \
Page titles containing an apostrophe had extra backslash characters in the links. I removed them, and you should correct this in the future. Eric119 03:48, 29 Nov 2004 (UTC)
- Thanks, I didn't notice those entries; I have a greyscale monitor, so red links don't really stand out on my screen. I'll take care of this in the next listing. Sietse 08:52, 29 Nov 2004 (UTC)
Ideas for future checks
Here I would like to exchange ideas about possible error patterns to search for in the article space. I would like to check for patterns that:
- Are easy to search for using regular expression matching (e.g. with grep or something similar)
- Have a low false alarm rate
- Occur frequently in the article space
Sietse 13:50, 29 Nov 2004 (UTC)
These are the future ideas for the project that I have seen on related pages, or heard from other people. If you have any other ideas, please add them here:
- Other repeated words:
- 'a a'
- 'an an'
- 'by by'
- 'it it'
- 'of of'
- 'to to'
- 'is is', 'are are', 'was was', 'were were'
- 'has has', 'have have'
- Frequent misspellings, especially
- recieve, recieved, recieves, recieving
- percieve, percieved, percieves, percieving
- acheive, acheived, acheives, acheiving, acheivement, acheivements
- Multiple interwiki tags for the same language
- Level three headings without preceding level two headings
- How about articles that contain the same link more than once? I remember reading a style guide (can't find it now) that said you should only mark up a link on its first occurrence unless there's a good reason to repeat the markup for readability (for example after starting a new major section). For extra credit, find pages that link to both a disambiguation page and a specific page on the same topic (until yesterday, Thimerosal did that). I realize these would require some heavy SQL. DavidBrooks 23:04, 8 Dec 2004 (UTC)
- BTW no relation to DavidWBrooks! DavidBrooks
- Word misuses, such as 'comprised of' -- Smjg 17:44, 15 Dec 2004 (UTC)
- There are some legitimate cases for repeating words, but I presume that for most words, this can NEVER be the case. Thus, we could make or obtain a list of words where this can be the case, and check for any word not on the list being repeated. Brianjd | Why restrict HTML? | 06:43, 2005 Mar 20 (UTC)
- You won't find much. I've already done that at User:R3m0t/Reports2. r3m0t talk 11:19, Mar 20, 2005 (UTC)
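Brianjd's allow-list idea combines naturally with the whole-word regular-expression matching suggested at the top of this section. The Python sketch below is hypothetical and is not the script the project actually ran. It assumes a tiny allow-list (`had`, `that`) of words whose doubling can be legitimate, such as "he had had enough", and flags every other doubled word.

```python
import re

# Hypothetical allow-list: words whose doubling can be legitimate English.
# This list is an assumption for illustration; a real run would need a longer one.
ALLOWED_REPEATS = {"had", "that"}

# [^\W\d_]+ matches a run of Unicode letters (so accented words like "Herosé"
# are handled and numbers like "10 10" are skipped). \1 repeats the captured
# word, and the \b anchors make it a whole-word test, so "the theory" and
# "the then" do not match. IGNORECASE lets "The the" match too.
REPEAT_RE = re.compile(r"\b([^\W\d_]+)\s+\1\b", re.IGNORECASE)


def find_repeated_words(text: str) -> list[str]:
    """Return doubled-word phrases in text whose word is not on the allow-list."""
    hits = []
    for match in REPEAT_RE.finditer(text):
        if match.group(1).lower() not in ALLOWED_REPEATS:
            hits.append(match.group(0))
    return hits
```

For example, `find_repeated_words("it it possible")` returns `["it it"]`, while "He had had enough." returns an empty list because `had` is on the allow-list.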
The the
Seems to me that the "the the" list includes a lot of false hits because the test doesn't do a whole word test -- in other words, it's getting things like "The then governor..." or "The theory" or "the thenar eminence". --jpgordon{gab} 18:45, 29 Nov 2004 (UTC)
I've seen some pages like that, but if I continued searching there was an incidence of "the the" as well. If you didn't look further, you may wish to try that. If you did look, and didn't find it, perhaps the test should be updated. Examples would be nice. --Brian A. Sayrs (talk) 18:52, 29 Nov 2004 (UTC)
- Jpgordon: I have replied on your talkpage. Everybody else: if you see any entries in which the version of November 26th (including comments!) did not contain 'the the', please notify me and tell me the title of the article. That should not happen. As Brian says, it it possible that an article also contains instances of 'the theory' or other similar patterns, or that entries have already been fixed by someone else. Sietse 21:00, 29 Nov 2004 (UTC)
- Sietse: Based on your last comment, "it it" may need to be checked, too! :) Brian
- ;-) Oh irony! Well, I'll add it to the list, maybe I'm not the only one. Sietse 22:24, 29 Nov 2004 (UTC)
User:p3d0 wrote in an edit summary of the project page: ETA contains two instances of "the then" but no "the the"
- That is because the checks are against the database copy of November 26th. This instance of 'the the' was fixed on November 29th. Please see this diff. Sietse 22:52, 29 Nov 2004 (UTC)
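The difference jpgordon describes, a plain substring test versus a whole-word test, can be shown with a small Python sketch. Both function names are hypothetical illustrations and are not the project's actual checker.

```python
import re


def naive_hit(text: str) -> bool:
    # Plain substring test: it also fires on "the then", "the theory"
    # and "bathe the", which is the kind of false hit reported above.
    return "the the" in text.lower()


def whole_word_hit(text: str) -> bool:
    # \b on both sides requires "the the" to be two complete words.
    return re.search(r"\bthe the\b", text, re.IGNORECASE) is not None
```

With these definitions, "The then governor" trips `naive_hit` but not `whole_word_hit`, while "in the the end" trips both.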
Aside - repetitive 'and'
"In this scene, the King, and, I think, the Queen are present". As shown, there should be a comma between the King and and, and and and I.
-- Solipsist 20:34, 29 Nov 2004 (UTC)
- Shouldn't that be either
- There should be a comma between the King and and, and and and I think.
- There should be a comma between King and and, and and and I.
- ...? :-) -- Smjg 11:18, 30 Nov 2004 (UTC)
(P3d0) No, it should be this:
- As shown, there should be a comma between "the King" and "and", and "and" and "I".
Checking miscapitalized adjectives
Very often, names of languages and ethnic groups are not capitalized when they should be, e.g., french, english instead of French, English. That could be checked too. Danny 14:26, 4 Dec 2004 (UTC)
- It turns out that there is indeed a very large number of such errors in the article space. I have put a selection of the articles that contain such words on the project page. Thanks for the idea! Sietse 17:55, 4 Dec 2004 (UTC)
- I broke the list into manageable pieces. I THINK it's less daunting, but I KNOW it lessens the burden on the server. I apologize if this causes a problem, but I was attempting to be bold! Make sure that if it is a problem, you take a moment to slap me around a little! Brian Sayrs 01:22, 2004 Dec 5 (UTC)
- No need to slap anyone around a little :) Good idea! It certainly looks less overwhelming this way. I'll keep this in mind when putting the next list on-line.
- Wasn't that list a bit small in fact? I am doing the same fixes, sorted by article creation date, and the last 300 articles I fixed were not in your list (Zazou, Saramaka, Rubem Fonseca are the latest), despite having the exact same miscapitalised words. Any idea why they are missing? Sam Hocevar 08:58, 6 Dec 2004 (UTC)
- I deliberately kept the list incomplete to make it look less daunting. I thought it would discourage potential contributors if they were faced with a list of a few thousand articles that contain mistakes. My intention was to split the list of problem articles into batches of about 1000 entries and try to fix those over a period of a few weeks. By the way, 300 articles? Great job! -- Sietse 09:15, 6 Dec 2004 (UTC)
I think you should make it clear when these should be capitalized and when not, because I believe that in many, possibly most, cases it is correct not to capitalize. [comment by anonymous user]
- Which cases do you mean? As far as I know, adjectives that are derived from proper nouns should always be capitalized in English. I have learned it this way; the capitalization article and the grammar guide in my dictionary say this too. But please tell me if I'm wrong or if this is not correct in all variants of English. Sietse 17:12, 5 Dec 2004 (UTC)
- "french fries" should be left uncapitalised, because the "french" there comes from the verb "to french". Also, "cousin-german", "brother-german". Sam Hocevar 17:54, 5 Dec 2004 (UTC)
- Sam makes a great point...especially how often "french fries" shows up in the 'pedia. It makes you wonder a little! Brian Sayrs 18:50, 2004 Dec 5 (UTC)
- Thanks guys, I forgot to filter for those words. Anyway, it seems we'll have to include 'French fries' occurrences that are not at the beginning of a sentence in the next run :) Sietse 19:10, 5 Dec 2004 (UTC)
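A check that skips the legitimate lowercase uses raised in this thread can be sketched in Python. Everything here is an assumption for illustration: the four-word adjective list stands in for the project's real list of fourteen, and the exception phrases are taken from Sam's examples. Cases such as "soviet", which depend on meaning, still need a human to decide.

```python
import re

# Hypothetical subset of adjectives to check (the real run used fourteen).
ADJ_RE = re.compile(r"\b(?:english|french|german|italian)\b")

# Legitimate lowercase uses mentioned in the discussion: "french fries"
# and "cousin-german" / "brother-german". They are blanked out before
# the adjective search so they are never reported.
EXCEPTION_RE = re.compile(r"french fries|(?:cousin|brother)-german")


def find_miscapitalized(text: str) -> list[str]:
    """Return lowercase adjectives that probably should be capitalized."""
    cleaned = EXCEPTION_RE.sub(" ", text)
    # ADJ_RE is case-sensitive, so correctly capitalized words never match.
    return ADJ_RE.findall(cleaned)
```

For instance, "He spoke french and ate french fries." yields `["french"]`: only the first occurrence is flagged.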
TODO:
- italian
- peruvian
- estonian
- algerian
- wikipedian ;)
- Yes, I've only checked for fourteen such adjectives. The next run will contain words which I thought would be less common (italian, peruvian, estonian, algerian, ...) Sietse 10:32, 5 Dec 2004 (UTC)
- And fourteen made quite a list in itself, didn't it!
- Don't forget soviet... --Dryazan 22:49, 5 Dec 2004 (UTC)
- Beware this one: when it means "pertaining to the Soviet Union", it's Soviet; however, when talking about the council, it's soviet. Sam Hocevar 11:01, 8 Dec 2004 (UTC)
I noticed a link in entheogen was changed from Aborigine to Aborigine. This seems pedantic to me ... is there really any good reason why we should care about the capitalization of text the reader will never see? (If there is, it should be Aborigine.) (For those who are wondering what I'm on about, edit this text to see the differences) Rkundalini 00:39, 16 Dec 2004 (UTC)
- The idea is that it avoids going through the same false positives again and again, while being completely harmless. Also, I dream of a day when Wikipedia article names are fully case sensitive :-) Sam Hocevar 01:47, 16 Dec 2004 (UTC)
Bug in pages with slashes?
In miscapitalized words, List of people by name was listed a number of times, yet none of the listed errors were present. I suspect the actual errors are on "List of people by name/something" but the page name got truncated at the slash. However, there are so many sub-pages to that particular page that I can't confirm this theory. --P3d0 00:01, Dec 7, 2004 (UTC)
- Thanks for the report. I've looked into it, and it turns out that the miscapitalization-finding script actually does what it should do. The culprit is the program that adds wiki-syntax to the output of that script: it doesn't handle colons in titles properly. I'll fix it with the next run of the script. Sietse 11:23, 8 Dec 2004 (UTC)
Capitalizing web addresses
Please be careful when running your script (or whatever it is you are running) over URLs. Microsoft servers won't care if you capitalize a letter in the URL, but *nix servers (the majority) will care, and will send you to a possibly non-existent page. Please try not to edit URLs. Thanks. --Cantus…☎ 01:47, Dec 8, 2004 (UTC)
That is true for paths of URLs, but the hostname portion of a URL is case insensitive, according to internet protocols for hostnames. While hostnames are traditionally presented in lowercase, people will sometimes capitalize them for emphasis or readability, such as IMDb.com or ThinkGeek.com. However, it would be very difficult to automate capitalization checks on these strings because the words are all run together. Basically, I agree that you should probably not edit the capitalization of URLs, but your statement isn't quite accurate. Podkayne 14:47, 14 January 2006 (UTC)
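Podkayne's distinction, a case-insensitive hostname versus a possibly case-sensitive path, can be checked with Python's standard `urllib.parse.urlsplit`, whose `hostname` attribute is already lowercased. The function name below is hypothetical. The sketch conservatively treats path, query and fragment case as significant, because whether a given server ignores path case cannot be known in advance.

```python
from urllib.parse import urlsplit


def same_resource(url_a: str, url_b: str) -> bool:
    """Compare two URLs, ignoring case only in the scheme and hostname."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    # Scheme and hostname are case-insensitive; .hostname is returned
    # in lowercase, so comparing it directly ignores case.
    # Path, query and fragment may be case-sensitive on the server,
    # so they are compared exactly.
    return (
        a.scheme.lower() == b.scheme.lower()
        and a.hostname == b.hostname
        and a.port == b.port
        and a.path == b.path
        and a.query == b.query
        and a.fragment == b.fragment
    )
```

So changing "www.IMDb.com" to "www.imdb.com" is harmless, but changing "/Page" to "/page" may break the link.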
Suggested edit summary
This mechanical bug fixing, along with various robots, interferes with monitoring anon vandals. I suggest that in the future you propose two edit summaries: one as usual, and another indicating that the previous contributor was an anon. E.g.,
- fix miscapitalisation; please help us fix common mistakes in the article space
- fix miscapitalisation (after anon edit); please help us fix common mistakes in the article space
Mikkalai 03:12, 9 Dec 2004 (UTC)
- How does it interfere? Doesn't the javascript-enhanced RC listing avoid this problem? The wikis are so awfully slow these days that I am a bit reluctant to spend even more time waiting for the history page to display. But if this is really a serious problem for you, I'm willing to make an effort. Sam Hocevar 00:31, 11 Dec 2004 (UTC)
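Mikkalai's two-summary proposal could be automated by checking whether the previous editor's name is an IP address, since anonymous edits are attributed to IPs. The Python sketch below uses the standard `ipaddress` module. The function names are hypothetical, and fetching the previous editor's name from the page history is assumed and not shown.

```python
import ipaddress

BASE = "fix miscapitalisation"
TAIL = "please help us fix common mistakes in the article space"


def is_anon(editor: str) -> bool:
    """Anonymous edits are attributed to an IPv4 or IPv6 address, not a username."""
    try:
        ipaddress.ip_address(editor)
        return True
    except ValueError:
        return False


def edit_summary(previous_editor: str) -> str:
    """Pick one of the two suggested summaries based on the previous editor."""
    note = " (after anon edit)" if is_anon(previous_editor) else ""
    return f"{BASE}{note}; {TAIL}"
```

For a previous editor shown as "192.0.2.1", this produces the "(after anon edit)" variant. For a registered username, it produces the usual summary.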
WikiProject
Why not make this a proper WikiProject so that someone else can upload the lists? Brianjd | Why restrict HTML? | 03:11, 2005 Mar 30 (UTC)
- I have stopped working on this project for at least a while. I'm not working on Wikipedia stuff, except for occasional minor fixes, until my master's thesis is finished. If anyone else wants to add lists to the page in the meantime, or move/copy the page, or convert it to a WikiProject, that's fine with me (of course). Sietse 12:47, 31 Mar 2005 (UTC)
Using Google
Google can be a useful tool for finding simple errors of various sorts such as repeated words. For example, the following Google search string:
"with with" site:en.wikipedia.org -"talk:" -"user:"
This will find duplicates of "with" while skipping talk and user pages.
A significant downside of using Google is that it searches a cached version of Wikipedia that is not up to date. Nonetheless, I have been able to make good use of it. Gaius Cornelius 17:32, 20 September 2005 (UTC)
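To run the same search for every word on the repeated-word list, the query strings can be generated. The Python sketch below copies the shape of the example query. It assumes the standard `https://www.google.com/search?q=` URL form and uses `urllib.parse.urlencode` to escape the quotes and spaces. Both function names are hypothetical.

```python
from urllib.parse import urlencode


def repeated_word_query(word: str) -> str:
    # Same shape as the example: the doubled word in quotes, restricted to
    # en.wikipedia.org, with talk and user pages excluded.
    return f'"{word} {word}" site:en.wikipedia.org -"talk:" -"user:"'


def google_search_url(query: str) -> str:
    # urlencode percent-escapes the query and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


for word in ["a", "an", "by", "it", "of", "to"]:
    print(google_search_url(repeated_word_query(word)))
```

The results still come from Google's cached copy, so each hit should be checked against the live article before editing.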
{ednote}
For those of us who tag problem sections while reading and don't want to switch to edit mode, but still want to flag an issue, use Template:Ednote, usage {{ednote|[[problem]]}}, where "problem" is anything like WP:SELF, etc. -Ste|vertigo 21:41, 16 June 2006 (UTC)
Project directory
Hello. The WikiProject Council has recently updated the Wikipedia:WikiProject Council/Directory. This new directory includes a variety of categories and subcategories which will, with luck, potentially draw new members to the projects who are interested in those specific subjects. Please review the directory and make any changes to the entries for your project that you see fit. There is also a directory of portals, at User:B2T2/Portal, listing all the existing portals. Feel free to add any of them to the portals or comments section of your entries in the directory. The three columns regarding assessment, peer review, and collaboration are included in the directory for both the use of the projects themselves and for that of others. Having such departments will allow a project to more quickly and easily identify its most important articles and its articles in greatest need of improvement. If you have not already done so, please consider whether your project would benefit from having departments which deal in these matters. It is my hope that all the changes to the directory can be finished by the first of next month. Please feel free to make any changes you see fit to the entries for your project before then. If you should have any questions regarding this matter, please do not hesitate to contact me. Thank you. B2T2 13:41, 26 October 2006 (UTC)