Wikipedia talk:WikiProject Red Link Recovery

Before posting questions here, remember to check the FAQ in case they are answered there already.

A more intelligent search engine?

Not sure how to list this on the main page, so here's my discussion idea: How about some sort of semi-intelligent search engine on Wikipedia that will find alternative spellings or a part of a searched phrase, or suggest alternate spellings, etc? For example, "Breslov" (a group within Hasidic Judaism) is also spelled "Breslev" and "Breslav" because it is originally Hebrew, which has a different alphabet. Similarly, "Rabbi Nachman of Breslov" is the same person as "Nachman of Breslov" or "Rebbe Nachman" -- all three of which might be used by searchers. Plus, Yiddish and modern Hebrew differ in pronunciations of the same thing, hence differing transliterations. For example, a yechidus is the same thing as a yechidut. (No page on that yet -- I plan to create one.) I've been working on Hasidic Judaism pages and have had a heck of a time trying to figure out how Wikipedia is spelling Hebrew and Yiddish terms and names and whether or not an appropriate page already exists. rooster613

  • If there are any rules that can be derived I can certainly pull out lists of suggested changes to links. For example if "Rabbe Nachman" and "Rabbi Nachman of Breslov" are used inconsistently, I can detect red links to "Rabbi Nachman of Breslov" and suggest "Rabbe Nachman" as a link. If you have trouble picking out rules, I'm happy to run a set of examples through a pattern matcher and see if it produces anything useful. - TB 22:27, 2005 Jun 23 (UTC)
  • I would like to implement some type of nearest-neighbour matching and query reformulation for Wikipedia URLs and search terms. The idea derives from search engines that look at user queries and match the terms against what other users typed, in addition to the correction lists, which we may already have. The assumption is that people type correct URLs 90-95% of the time, since they want to retrieve the right pages and won't waste time. Thus we may capitalize on Wikipedia's query logs: if a particular search term is already present in the query logs, it is probably right; if closely matching words are found in the logs, we may present them as suggestions to the user (like Google's suggestions). I have posted a bug on this at the MediaWiki servers and am doing background research on it. Muthu CDT 9:47.00pm Oct 20th 2005.

There's no such spelling as "Rabbe" -- the word is "Rebbe" -- with an "RE" not an "RA." A rabbi is a scholar of Jewish law and teachings. A Rebbe is a charismatic saintly leader of a group of Hasidic Jews. These words are usually not a problem. But if you can disambiguate "Rabbi Nachman of Breslov", "Rabbi Nachman" and "Rebbe Nachman" and point them to the Nachman of Breslov page, it would be a great help. Ditto with pointing "Breslev" and "Bratzlav" to the Breslov (Hasidic dynasty). (Although Bratzlav is also a town in Germany.) Also, "Reb Nosson" and "Nosson of Nemirov" should point to Nathan of Nemirov. Thank you! rooster613
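
The nearest-neighbour idea discussed above can be sketched with Python's standard difflib module. This is only a minimal illustration, not what the project or MediaWiki actually runs: the title list, the helper names and the 0.6 cutoff are all assumptions made for the example.

```python
import difflib
import re

# Hypothetical sample of existing article titles; a real run would load
# titles from a database dump or terms from the query logs.
KNOWN_TITLES = [
    "Nachman of Breslov",
    "Breslov (Hasidic dynasty)",
    "Nathan of Nemirov",
    "Hasidic Judaism",
]


def _key(title):
    """Lower-case a title and drop a trailing "(disambiguator)"."""
    return re.sub(r"\s*\([^)]*\)$", "", title).lower()


def suggest_titles(query, known_titles=KNOWN_TITLES, limit=3, cutoff=0.6):
    """Return known titles that closely resemble the query, best match first."""
    by_key = {}
    for title in known_titles:
        by_key.setdefault(_key(title), title)
    # get_close_matches ranks candidates by a similarity ratio in [0, 1]
    # and drops anything scoring below the cutoff.
    matches = difflib.get_close_matches(
        _key(query), list(by_key), n=limit, cutoff=cutoff
    )
    return [by_key[m] for m in matches]
```

Transliteration variants such as "Breslev" score well against "Breslov", but shortened forms such as "Rebbe Nachman" may fall below the cutoff, so explicit redirects would still be needed for those.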

How often are the lists updated?

Just wondering if there is any point in going out of my way to remove listed links that are no longer red (a list update will fix those anyway, right?). I did remove some today, but then it occurred to me that while cleaning the list is well and good, that hour could have been better spent fixing actual red links instead. --Sherool 28 June 2005 15:50 (UTC)

I'd hope to regenerate the lists every month or two, time and database dumps depending. The current lists are based on the 15th May database dump, so they're at least 40 days out of date - the chances are that a number of entries have already been fixed in this time; these'll be the ones you're seeing. All in all it's probably not worth deleting them unless you're editing a section anyway. Do however mark up false positives - I'll filter these out of future editions of the reports to save everyone time and effort. - TB June 28, 2005 21:33 (UTC)

Regarding the numeric list

How about implementing Roman numerals into this list at some point? Granted I haven't done any research, but I would imagine there are a number of misspelled links where people have used Roman numerals instead of regular numbers (or written numbers) or vice versa. This can apply to anything from game and movie titles (Doom II <-> Doom 2, Episode IV <-> Episode Four etc), to royalty and popes (John Paul II <-> John Paul the second etc etc.), or even Olympic games. Does add a fair bit of complexity to the code though... --Sherool 1 July 2005 10:17 (UTC)

An excellent idea - I'll give it a go and see what comes out. - TB July 1, 2005 13:46 (UTC)
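
A minimal sketch of the Roman numeral idea, assuming only the last word of a title is a number and covering 1 to 50 (with written words for 1 to 10). The function names and limits are made up for this illustration; ordinal forms such as "John Paul the second" are not handled.

```python
def to_roman(n):
    """Convert a positive integer to a Roman numeral string."""
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


NUMBER_WORDS = ["One", "Two", "Three", "Four", "Five",
                "Six", "Seven", "Eight", "Nine", "Ten"]

# Map every spelling of a number to the full set of equivalent spellings.
FORMS = {}
for n in range(1, 51):
    forms = {str(n), to_roman(n)}
    if n <= len(NUMBER_WORDS):
        forms.add(NUMBER_WORDS[n - 1])
    for form in forms:
        FORMS[form] = forms


def numeral_variants(title):
    """Return alternative titles with the trailing number respelled."""
    head, sep, last = title.rpartition(" ")
    if not sep or last not in FORMS:
        return set()
    return {f"{head} {alt}" for alt in FORMS[last] if alt != last}
```

Each variant of a red link could then be checked against the list of existing titles; for instance numeral_variants("Doom II") yields "Doom 2" and "Doom Two".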

Redirects

I realise this is almost certainly a lesson in sucking eggs but it seems to me that the fastest way to turn red links blue is to make as many appropriate redirects as possible. One good redirect can turn a whole stack of red links blue without the need to individually edit each one.

That's a very good approach. These current links won't be the last ones to use the red link instead of the (only slightly different) blue link. - Tεxτurε 20:25, 13 July 2005 (UTC)

TLAs

What are we doing with pages such as TLAs from AAA to DZZ? These pages are designed, it seems, to utilise the red links, and I can't see a method of removing them that won't cause more problems. --me_and 06:48, 14 July 2005 (UTC)

I think they should be left alone. As you say, they contain red links for a reason. Those are some of the few pages where red links are a good thing. – Quadell (talk) (sleuth) 13:55, July 14, 2005 (UTC)

French names nightmare :)

Hi. I have a suggestion: one should make exceptions for some pages related to French names. Take a look at Communes of the Gironde département. I guess this list of place names was copied from somewhere like a governmental site and is unlikely to contain common typing mistakes. For etymological reasons, a great many place names in French end in s, and many of them contain a common word. Take the place name Coutures, for instance: it probably has very little to do with couture. As a great many of the "plural suggestions" are in "Communes of the XXX departement" pages, I would suggest not checking these pages next time. French names are a nightmare, even for French people like me: many times I have no idea how to pronounce them! gbog 04:54, 15 July 2005 (UTC)

I agree, in this case the suggestions are, in my experience, 0% effective. These lists just have a TON of links, and there are correspondingly a TON of little French villages without Wikipedia articles. (Note that I'm working through them anyway, because I have way too much free time ^^; So they'll probably end up on the exception list eventually) Junkyard prince 05:01, 31 July 2005 (UTC)
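
The exception idea above could look roughly like this. It is a sketch under assumptions: the real report scripts are not shown on this page, so the input shape (pairs of source page and red link target), the function name and the page-name pattern are all hypothetical.

```python
import re

# Hypothetical exception rule: skip source pages that are lists of French
# communes, where a trailing "s" is rarely a plural.
EXCEPTION_SOURCES = re.compile(r"^Communes of the .+ d[ée]partement$")


def plural_suggestions(red_links, existing_titles):
    """Suggest singular/plural fixes for (source_page, target) red links."""
    suggestions = []
    for source, target in red_links:
        if EXCEPTION_SOURCES.match(source):
            continue  # excluded page: suggestions here are almost always wrong
        if target.endswith("s") and target[:-1] in existing_titles:
            suggestions.append((source, target, target[:-1]))
        elif target + "s" in existing_titles:
            suggestions.append((source, target, target + "s"))
    return suggestions
```

With this rule a link to Coutures from "Communes of the Gironde département" produces no suggestion, while the same pluralisation check still runs for every other page.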

tip

FYI- the "tabbed browsing" feature of the Mozilla Firefox browser makes this really fast. Stainless steel 18:48, 27 July 2005 (UTC)

Yep, tabs are neat. I prefer Opera myself though; its "notes" feature is also extremely helpful when editing. I use it to manage often-used edit summaries, templates, categories and such. Way better than copying and pasting from some external document or whatever: just right click → insert note, and pick the one you want. I'm sure there is a Firefox extension for similar functionality too though. --Sherool 00:18, 22 August 2005 (UTC)

How successful were we?

This project's current iteration seems to be nearing completion, with most sections done and Part 6 of the Pluralisation section likely to be finished within mere minutes. How successful was this effort? What percentage of suggestions was struck through, and what percentage was not? NatusRoma 05:15, August 11, 2005 (UTC)

Indeed, it would be nice to see some stats here now that the project is finished. -- Rune Welsh ταλκ 12:56, August 21, 2005 (UTC)
I've not been the most active participant, but if memory serves I think capitalisation and the numerical lists worked out quite well, while the pluralisation list resulted in a significant number of exceptions. At least that's my impression. But it depends on how you measure success now, doesn't it? I don't see an exception as a "failure": if no relevant article exists then there is nothing to fix, and someone might eventually make an article to fill the "gap". As I understand it, the main aim was to fix links that could be made to point at existing articles, and I think we did pretty well in that regard. Though naturally between the database dump the lists were generated from and now there are probably a million new red links waiting to be fixed :P --Sherool 00:33, 22 August 2005 (UTC)
I've yet to compile accurate stats, but believe that more than 50,000 red links were recovered in this iteration of the project. A hearty well done to us all! The automatically generated suggestions were on average about 90% correct. Of course, this was a first pass and concentrated on all the easy wins - hopefully the list of 5000 exceptions will help keep the quality of the next iteration up. - TB 22:08, August 22, 2005 (UTC)

Any clue when we'll get the next batch?

I'm glad that we appear done for now...but does anyone know when they'll be ready for us to go back at it again?

--Kell 23:52, 14 September 2005 (UTC)

Topbanana seems to be on a hiatus or something (no edits since early September). Maybe someone else could cobble together some lists; the script and exception lists are all available on subpages here if I'm not mistaken. --Sherool 18:10, 22 September 2005 (UTC)
I just tried to do it. However, it seems that the June 23 database dumps of the two dump files we need are the most recent files available from the Wikimedia download site. So we're working with the most recent information right now, it seems.
Plus, it looks like they've started using XML for one thing or another, making the directions given on the subpage of this WikiProject completely useless. Shame.
ArmadniGeneral (talk · contribs) 20:15, 2 October 2005 (UTC)
Yes, I'm on hiatus (usual story .. got married .. business took off dramatically .. moved to a non-broadband house .. first child on the way .. etc etc). Apologies to all who are awaiting the next round of this project. If anyone has been able to get a recent database dump downloaded and available in mysql and is willing to have a go at generating more suggestions, give me a shout and I'll try to lend what assistance I can. - TB 22:01, 17 November 2005 (UTC)

Bot for WikiProjects?

I can't seem to find an answer for this, but is there a bot that can be run against a category that would then list redlinks? I'd like to be able to transclude a page into our Project page of redlinks so editors can go in and either unlink or create stubs....plange 07:24, 1 July 2006 (UTC)

Am I right in thinking that you want lists of all red-links in pages listed in a given category? I'm sure I can manage this if you want, name your category :) - TB 22:57, 23 July 2006 (UTC)
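
For anyone wanting to try this without a database dump, here is a rough sketch against the present-day MediaWiki Action API (query module with formatversion=2), where pages that do not exist carry a "missing" flag. The helper names are made up for this example, the HTTP request itself is left out, and result continuation is not handled.

```python
def category_members_params(category):
    """Query parameters listing the pages in a category."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": "max",
        "format": "json",
        "formatversion": "2",
    }


def page_links_params(title):
    """Query parameters returning every page linked from one article."""
    return {
        "action": "query",
        "generator": "links",
        "titles": title,
        "gpllimit": "max",
        "format": "json",
        "formatversion": "2",
    }


def red_links(response):
    """Pick the titles flagged as missing out of a decoded JSON response."""
    pages = response.get("query", {}).get("pages", [])
    return sorted(page["title"] for page in pages if page.get("missing"))
```

The parameter dictionaries would be sent to the wiki's api.php endpoint with any HTTP client: one request for the category, then one request per member page, collecting the result of red_links each time.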

Wikipedia Integration

I've identified this project as a candidate for material to be analyzed by Wikipedia Integration methodology. Please feel welcome to offer suggestions and feedback. WP:ʃ Cwolfsheep 16:22, 22 July 2006 (UTC)

Award needed?

Does the red link recovery project need a barnstar or similar to award to its most active bluelinkers? I've always given out janitorial medals in the past, but I guess they're not quite appropriate here. Any suggestions? - TB 08:35, 5 August 2006 (UTC)

If you can get enough support from the WikiProject, I'd recommend an image of a star that's half blue, half red, with an arrow pointing to the blue side. --Gray Porpoise 19:20, 16 August 2006 (UTC)

Is it easy to find who is the most active bluelinker? -- Magioladitis 10:09, 29 September 2006 (UTC)

Anyone know of a project that ...

The Red Link Recovery project is, as a side effect of its normal activities, generating lists of articles that don't yet exist but that have names very similar to articles that do already exist (for example, Christmas pudding and Christmas Pudding). I can't help but think that it would be useful to create the missing articles in stub form and check that links to the pre-existing one aren't in fact intended for the newly created stub. The problem is that I've no idea who might be interested in carrying out such a task; does this fall within the scope of any existing cleanup projects? - TB 20:04, 30 August 2006 (UTC)

The problem you are describing sounds similar to what Redirect is trying to do. --Everchanging02 05:39, 31 October 2006 (UTC)
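
As an illustration of how such near-duplicate names can be found, the sketch below pairs missing titles with existing titles that differ only in capitalisation. The function name and inputs are hypothetical; the project's own scripts may work quite differently.

```python
from collections import defaultdict


def case_variants(red_titles, existing_titles):
    """Map each missing title to existing titles differing only in case."""
    by_key = defaultdict(list)
    for title in existing_titles:
        by_key[title.casefold()].append(title)
    return {
        red: sorted(by_key[red.casefold()])
        for red in red_titles
        if red.casefold() in by_key
    }
```

Each pair found this way is a candidate for either a redirect or a new stub, depending on whether the two names really refer to the same subject.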

Project directory

Hello. The WikiProject Council has recently updated the Wikipedia:WikiProject Council/Directory. This new directory includes a variety of categories and subcategories which will, with luck, potentially draw new members to the projects who are interested in those specific subjects. Please review the directory and make any changes to the entries for your project that you see fit. There is also a directory of portals, at User:B2T2/Portal, listing all the existing portals. Feel free to add any of them to the portals or comments section of your entries in the directory. The three columns regarding assessment, peer review, and collaboration are included in the directory for both the use of the projects themselves and for that of others. Having such departments will allow a project to more quickly and easily identify its most important articles and its articles in greatest need of improvement. If you have not already done so, please consider whether your project would benefit from having departments which deal in these matters. It is my hope that all the changes to the directory can be finished by the first of next month. Please feel free to make any changes you see fit to the entries for your project before then. If you should have any questions regarding this matter, please do not hesitate to contact me. Thank you. B2T2 14:18, 26 October 2006 (UTC)


To do list

The work on the to-do list table has all been completed. The table should either be updated with new work, or removed. raining_girl 20:35, 13 November 2006 (UTC)

Yes, it would be nice for someone to update the to-do list. What is needed to do this: a DB dump? And is there a bot to look for certain occurrences of red links? Djanvk 07:36, 21 December 2006 (UTC)

Topbanana generates a new list every 6-10 months according to this post on his talk page. ~ BigrTex 14:53, 21 December 2006 (UTC)

comment on Wikipedia: Red link

I left the following comment on Wikipedia_talk:Red_link#warning_not_to_arbitrarily_remove_red_links; please continue discussion there. --C S (Talk) 21:06, 28 December 2006 (UTC)

I think it would be useful to have somewhere in the lead, bold text that states do not arbitrarily remove red links just because the article doesn't exist. The reason being that several times I've run across editors just removing clearly important red links. In the most recent case, I pointed the editor toward this page, but then he cited part 2 of "dealing with existing red links" which says to remove broken links. Apparently, "broken" link can be easily confused with "red" link. --Chan-Ho (Talk) 21:04, 28 December 2006 (UTC)

Wikipedia Day Awards

Hello, all. It was initially my hope to try to have this done as part of Esperanza's proposal for an appreciation week to end on Wikipedia Day, January 15. However, several people have once again proposed the entirety of Esperanza for deletion, so that might not work. It was the intention of the Appreciation Week proposal to set aside a given time when the various individuals who have made significant, valuable contributions to the encyclopedia would be recognized and honored. I believe that, with some effort, this could still be done. My proposal is to, with luck, try to organize the various WikiProjects and other entities of Wikipedia to take part in a larger celebration of its contributors to take place in January, probably beginning January 15, 2007. I have created yet another new subpage for myself (a weakness of mine, I'm afraid) at User talk:Badbilltucker/Appreciation Week where I would greatly appreciate any indications from the members of this project as to whether and how they might be willing and/or able to assist in recognizing the contributions of our editors. Thank you for your attention. Badbilltucker 19:30, 30 December 2006 (UTC)

Suggestion

May I suggest that lists of all the red link articles are created, so that when the link is clicked the user is redirected to the page with the list in. —The preceding unsigned comment was added by Ajuk (talk · contribs) 19:41, 14 January 2007 (UTC).

List of redlinks to categories that do not exist?

Is there a list anywhere of redlinks to categories that do not exist? Thanks! --Ling.Nut 19:45, 21 March 2007 (UTC)

Any ideas?

There is a perennial situation where a list becomes a veritable link farm, including links to non-RS and advertising for dubious organizations and groups. This article has been plagued by this problem for some time, and the solution that has been used is to convert them all to red links, resulting in an awful lot of red links for articles unlikely to be created in the near future. Here's the latest attempt. Any suggestions? -- Fyslee/talk 07:14, 8 April 2007 (UTC)

Redlink / bluelink report required

I am slowly populating Wikipedia with Royal Navy ships. There are 15000+, so this may take some time! It would be good to be able to generate a list of all current links starting "HMS " with an indication of whether they are red or blue. If possible I'd like to run the report myself from time to time, but I have no access to offline dumps etc. Does this facility exist anywhere? Can anyone help? Thanks, Welsh 20:19, 10 April 2007 (UTC)
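
The report asked for here reduces to a small classification step once two inputs are available: the targets of all links, and the set of titles that exist. Both would have to come from a dump or from the API; the function name and arguments below are assumptions made for this sketch.

```python
def hms_report(link_targets, existing_titles, prefix="HMS "):
    """Classify every link target starting with the prefix as red or blue."""
    report = {}
    for target in link_targets:
        if target.startswith(prefix):
            report[target] = "blue" if target in existing_titles else "red"
    # Sort by title so repeated runs are easy to compare.
    return dict(sorted(report.items()))
```

Running the same function on fresh inputs from time to time would show which ship articles have turned blue since the previous run.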

Wikipedia:Requests for verification

Please see: Wikipedia:Requests for verification

A proposal designed as a process similar to {{prod}} to delete articles without sources if no sources are provided in 30 days.

It reads:

This page has been listed in Category:Requests for verification.
It has been suggested that this article might not meet Wikipedia's core content policies Verifiability and/or No original research. If references are not cited within a month, the disputed information will be removed.

If you can address this concern by sourcing please edit this page and do so. You may remove this message if you reference the article.

The article may be deleted if this message remains in place for 30 days. (This message was added: 12 June 2008.)

If you created the article, please don't take offense. Instead, improve the article so that it is acceptable according to Verifiability and/or No original research.


Please help improve this article by adding citations to reliable sources. (help, get involved!)

Some editors see this as necessary to improve Wikipedia as a whole and assert that this idea is supported by policy, and others see this as a negative thing for the project with the potential of loss of articles that could be easily sourced.

I would encourage your comments in that page's talk or Mailing list thread on this proposal WikiEN-l: Proposed "prod" for articles with no sources

Signed Jeepday (talk) 14:06, 18 July 2007 (UTC)

Bot suggests dab

For the bot-generated lists, it says that you should remove the entry if the suggestion was correct (and you fixed it), and if it's incorrect, you should mark it with strike-through and leave a brief comment explaining why it was incorrect. What about the case where the bot suggests a disambiguation page? That's not exactly correct since you don't actually want the link to point there, but it's not exactly incorrect either, since bots can't (generally) disambiguate. So what should I do? I tend to think that I should treat them as correct, but I want to make sure, since I'm not really familiar with the history of this project or how the data will be used. Xtifr tälk 12:00, 3 August 2007 (UTC)

You're right, there's a hole in the logic there. As these lists are to do with 'red links', the intention is that if the red link is 'fixed' - that is, removed or made blue by any means - then the entry should be deleted. The non-deleted (struck-through) entries are used to filter future lists and ensure they are not erroneously suggested again. So, to go back to your case, if a report suggested that a link to Jonh Smith should in fact be to John Smith (almost certainly a disambig page) I'd suggest deleting the entry once the red link is 'fixed' on the basis that it was useful. In contrast, a suggestion that John G Kennedy be changed to John F Kennedy, where the former happens to be a distinct, notable person who doesn't happen to have an article yet, should be struck through, even if it inspires you to write the missing article to 'fix' the red link. I'll see if I can't clarify the instructions a bit, although you're very welcome to do so yourself - be bold. TB 08:22, 4 August 2007 (UTC)
Cool, thanks. About what I expected, but it never hurts to check. :) Cheers, Xtifr tälk 00:13, 5 August 2007 (UTC)
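
The filtering step described above (struck-through entries keep a bad suggestion out of future lists) might be sketched as follows. The entry layout is an assumption: each list line is taken to contain the red link first and the suggested target second, with rejected entries wrapped in <s>...</s>. The real report format may differ.

```python
import re

STRUCK = re.compile(r"<s>(.*?)</s>", re.DOTALL)
LINK = re.compile(r"\[\[([^\]|]+)")


def rejected_pairs(wikitext):
    """Collect (red link, suggestion) pairs from struck-through entries."""
    rejected = set()
    for struck in STRUCK.findall(wikitext):
        links = LINK.findall(struck)
        if len(links) >= 2:
            rejected.add((links[0], links[1]))
    return rejected


def filter_suggestions(suggestions, rejected):
    """Drop suggestions that editors have already marked as incorrect."""
    return [pair for pair in suggestions if pair not in rejected]
```

Entries that were simply deleted leave no trace in the wikitext, so they can be suggested again if the same red link reappears; only struck-through entries are suppressed.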


I participated again last year. I think we should leave only the cases in which the red link is something that can be covered by the suggestion and/or another solution the user suggests. We need the remaining red links in order to unlink them or create new articles. This project is not a bot test but a way to make Wikipedia better. Btw, I think the special characters were dumped incorrectly. Moreover, I think there must be regular checks by a bot to delete entries in which the article containing the red links doesn't exist (it's a red link itself). This'll save us a lot of work. -- Magioladitis 16:29, 6 August 2007 (UTC)

Red links created by deletions

Does anybody agree that the deletion process ought to mop up the red links that result? There are many reasons for deletion, and sometimes red links are left.

Maybe some of these could be solved by an automated process. When pages are transwikied to Wiktionary (etc) then PRODded, could a bot trace incoming links and change the internal link into an interwiki link? See e.g. Special:Whatlinkshere/Taffy (ethnic slur).

Transwiki-ing also creates red links in the new wiki, e.g. this one at Wiktionary which used to link to other articles in Wikipedia.

Other cases: (i) Spam articles - often when an author creates a promotional article, they also insert links to it in other articles and lists. These remain as a form of free advertising after the article itself is deleted. (ii) Spoof articles - likewise there will be nonsense links left behind. Generally, these links should be deleted.

There are more reasons for deletion, and sometimes the link should be manually changed to something else rather than deleted.

The Deletion Log lists all deletions. Unfortunately you can't use the Popups tool to scan it quickly for incoming links ("What links here"), as it cannot detect these for deleted pages.

Suggestions? - Fayenatic london (talk) 19:37, 9 August 2007 (UTC)

The problem that I see is that of a valid redlink future article where a vandal chooses to create a nonsense entry. When the nonsense article was deleted, we'd lose all of the redlinks which might encourage another editor to write a real article. ~ BigrTex 19:44, 9 August 2007 (UTC)

Diacritics again?

Is there any plan to create a list of red links with wrong diacritics? --Amir E. Aharoni (talk) 15:16, 25 February 2008 (UTC)
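
One way such a list could be produced, sketched with Python's standard unicodedata module: strip combining marks from both the red link and the existing titles, then compare. The function names are made up for this example, and letters that have no decomposed form (such as "ø" or "ł") are not covered.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritic_suggestions(red_titles, existing_titles):
    """Map red links to existing titles that differ only in diacritics."""
    index = {}
    for title in existing_titles:
        index.setdefault(strip_diacritics(title), []).append(title)
    suggestions = {}
    for red in red_titles:
        candidates = [t for t in index.get(strip_diacritics(red), []) if t != red]
        if candidates:
            suggestions[red] = candidates
    return suggestions
```

Because both sides are normalised, the check works in either direction: a red link with missing diacritics and a red link with superfluous ones are both matched to the existing title.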

Hogenakkal_drinking_water_project

Something must have gone wrong with the parse--there are entries all over the place that claim the bad source is in Hogenakkal_drinking_water_project, but it's clearly not. Looking at what links here tells you the correct article, though.
A few examples:
Wikipedia:WikiProject_Red_Link_Recovery/Capitalisation/52, #25510, 25513
Wikipedia:WikiProject_Red_Link_Recovery/Unlikely/images, #45-69
(and probably several more, but those are the two I've noticed.)

(Incidentally, there also seems to have been a problem with special characters. On the images page, every item with question marks is not actually linked from anywhere.)
Noliver (talk) 09:21, 7 June 2008 (UTC)