Wikipedia talk:WikiProject Red Link Recovery

A more intelligent search engine?

Not sure how to list this on the main page, so here's my discussion idea: how about some sort of semi-intelligent search engine on Wikipedia that will find alternative spellings or parts of a searched phrase and suggest them? For example, "Breslov" (a group within Hasidic Judaism) is also spelled "Breslev" and "Breslav" because it is originally written in Hebrew, which has a different alphabet. Similarly, "Rabbi Nachman of Breslov" is the same person as "Nachman of Breslov" or "Rebbe Nachman", and searchers might use any of the three. Plus, Yiddish and modern Hebrew pronounce the same words differently, hence the differing transliterations. For example, a yechidus is the same thing as a yechidut. (No page on that yet; I plan to create one.) I've been working on Hasidic Judaism pages and have had a heck of a time figuring out how Wikipedia spells Hebrew and Yiddish terms and names, and whether an appropriate page already exists. rooster613

  • If there are any rules that can be derived I can certainly pull out lists of suggested changes to links. For example if "Rabbe Nachman" and "Rabbi Nachman of Breslov" are used inconsistently, I can detect red links to "Rabbi Nachman of Breslov" and suggest "Rabbe Nachman" as a link. If you have trouble picking out rules, I'm happy to run a set of examples through a pattern matcher and see if it produces anything useful. - TB 22:27, 2005 Jun 23 (UTC)
  • I would like to implement some type of nearest-neighbour matching and query reformulation for Wikipedia URLs, to match what already happens with Wikipedia search terms. The idea derives from search engines that look at user queries and match the terms to what other users typed, in addition to the correction lists, which we may already have. This approach assumes that people type correct URLs 90-95% of the time, since they want to retrieve the right pages and won't waste time. Thus we may capitalize on Wikipedia's query logs: we can check whether a particular search term is already present in the logs [meaning it's right!], and if closely matching words are found in the logs instead, present them to the user as suggestions [like Google's suggestions]. I have filed a bug on this with MediaWiki and am doing background research on it. Muthu CDT 9:47.00pm Oct 20th 2005.

There's no such spelling as "Rabbe" -- the word is "Rebbe" -- with an "RE" not an "RA." A rabbi is a scholar of Jewish law and teachings. A Rebbe is a charismatic, saintly leader of a group of Hasidic Jews. These words are usually not a problem. But if you can disambiguate "Rabbi Nachman of Breslov", "Rabbi Nachman" and "Rebbe Nachman" and point them to the Nachman of Breslov page, it would be a great help. Ditto with pointing "Breslev" and "Bratzlav" to the Breslov (Hasidic dynasty) page. (Although Bratzlav is also a town in Ukraine.) Also, "Reb Nosson" and "Nosson of Nemirov" should point to Nathan of Nemirov. Thank you! rooster613
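The nearest-neighbour suggestion idea above can be sketched with Python's standard `difflib` module. This is a minimal illustration only. `suggest_titles` is a hypothetical helper, the title list is made-up sample data standing in for a query log or title index, and the 0.8 similarity cutoff is an arbitrary assumption, not a value MediaWiki uses.

```python
import difflib


def suggest_titles(query, known_titles, n=3, cutoff=0.8):
    """Return up to n known titles that closely resemble the query.

    An exact match is returned on its own, on the assumption that a
    title already present in the log/index is "right".
    """
    if query in known_titles:
        return [query]
    # get_close_matches scores candidates with SequenceMatcher.ratio(),
    # drops those below the cutoff and returns the best n, best first.
    return difflib.get_close_matches(query, known_titles, n=n, cutoff=cutoff)


# Hypothetical sample data standing in for existing titles or logged queries.
known_titles = [
    "Breslov (Hasidic dynasty)",
    "Nachman of Breslov",
    "Nathan of Nemirov",
]
```

With this data, `suggest_titles("Nachman of Breslev", known_titles)` should return `["Nachman of Breslov"]`. Plain string similarity only catches small spelling differences. Variants such as "Rebbe Nachman" versus "Nachman of Breslov" differ too much, and for those the hand-made rules and redirects discussed above are still needed.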

How often are the lists updated?

Just wondering if there is any point in going out of my way to remove listed links that are no longer red (a list update will fix those anyway, right?). I did remove some today, but then it occurred to me that while cleaning the list is well and good, that hour could have been better spent fixing actual red links instead. --Sherool 28 June 2005 15:50 (UTC)

I'd hope to regenerate the lists every month or two, time and database dumps permitting. The current lists are based on the 15th May database dump, so they're at least 40 days out of date. Chances are that a number of entries have already been fixed in that time, and those are the ones you're seeing. All in all it's probably not worth deleting them unless you're editing a section anyway. Do, however, mark up false positives; I'll filter these out of future editions of the reports to save everyone time and effort. - TB June 28, 2005 21:33 (UTC)

Regarding the numeric list

How about adding Roman numerals to this list at some point? Granted, I haven't done any research, but I would imagine there are a number of misspelled links where people have used Roman numerals instead of regular numbers (or written-out numbers), or vice versa. This can apply to anything from game and movie titles (Doom II <-> Doom 2, Episode IV <-> Episode Four etc.) to royalty and popes (John Paul II <-> John Paul the second etc.), or even the Olympic Games. It does add a fair bit of complexity to the code though... --Sherool 1 July 2005 10:17 (UTC)

An excellent idea - I'll give it a go and see what comes out. - TB July 1, 2005 13:46 (UTC)
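Sherool's Roman numeral idea could be handled by normalising titles before comparing them. This is a minimal sketch, assuming a hypothetical `numeral_key` helper that is not part of the project's actual script. It converts whole words that are valid uppercase Roman numerals into Arabic digits, so "Doom II" and "Doom 2" produce the same key. Spelled-out forms such as "Episode Four" or "the second" are not handled. Words like "I" or "DC" would also be converted, so any matches would still need a human check.

```python
import re

# Canonical Roman numerals 1-3999; rejects malformed forms such as "IIII" or "VX".
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(token):
    """Convert a canonical uppercase Roman numeral to an int, or return None."""
    if not token or not ROMAN_RE.fullmatch(token):
        return None
    total = 0
    for i, ch in enumerate(token):
        value = VALUES[ch]
        # Subtractive notation: a smaller symbol before a larger one is subtracted.
        if i + 1 < len(token) and value < VALUES[token[i + 1]]:
            total -= value
        else:
            total += value
    return total


def numeral_key(title):
    """Build a comparison key in which Roman numeral words become Arabic digits."""
    words = []
    for word in title.split():
        value = roman_to_int(word)
        words.append(str(value) if value is not None else word)
    return " ".join(words)
```

Two titles whose keys are equal, such as `numeral_key("Doom II")` and `numeral_key("Doom 2")` (both `"Doom 2"`), would then be reported as a candidate pair.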

Redirects

I realise this is almost certainly a lesson in sucking eggs but it seems to me that the fastest way to turn red links blue is to make as many appropriate redirects as possible. One good redirect can turn a whole stack of red links blue without the need to individually edit each one.

That's a very good approach. These current links won't be the last ones to use the red link instead of the (only slightly different) blue link. - Tεxτurε 20:25, 13 July 2005 (UTC)
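One way to find the redirects that would turn the most red links blue is to count how many pages link to each missing title. This is a minimal sketch, assuming the red links have already been extracted from a database dump as hypothetical `(source_page, missing_target)` pairs. It only ranks candidates. A human still has to decide which existing article, if any, each redirect should point to.

```python
from collections import Counter


def rank_redirect_candidates(red_links, top=10):
    """Rank missing link targets by how many distinct pages link to them.

    red_links is an iterable of (source_page, missing_target) pairs.
    One redirect created at a highly ranked target fixes all of its
    incoming red links without editing the linking pages.
    """
    unique_pairs = set(red_links)  # count each linking page once per target
    counts = Counter(target for _source, target in unique_pairs)
    return counts.most_common(top)
```

For example, if "Page A" and "Page B" both link to a missing "Rebbe Nachman" and "Page C" links to a missing "Breslev", the ranking would put "Rebbe Nachman" (2 pages) ahead of "Breslev" (1 page).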

TLAs

What are we doing with pages such as TLAs from AAA to DZZ? These pages are designed, it seems, to utilise the red links, and I can't see a method of removing them that won't cause more problems. --me_and 06:48, 14 July 2005 (UTC)

I think they should be left alone. As you say, they contain red links for a reason. Those are some of the few pages where red links are a good thing. – Quadell (talk) (sleuth) 13:55, July 14, 2005 (UTC)

French names nightmare :)

Hi. I have a suggestion: we should make exceptions for some pages related to French names. Take a look at Communes of the Gironde département. I guess this list of place names was copied from somewhere like a government site, so it is unlikely to contain common typing mistakes. For etymological reasons, a great many French place names end in an s, and many of them contain a common word. Take the place name Coutures, for instance: it probably has very little to do with couture. As a great many of the "plural suggestions" come from the "Communes of the XXX département" pages, I would suggest not checking these pages next time. French names are a nightmare, even for French people like me: many times I have no idea how to pronounce them! gbog 04:54, 15 July 2005 (UTC)

I agree, in this case the suggestions are, in my experience, 0% effective. These lists just have a TON of links, and there are correspondingly a TON of little French villages without Wikipedia articles. (Note that I'm working through them anyway, because I have way too much free time ^^; So they'll probably end up on the exception list eventually) Junkyard prince 05:01, 31 July 2005 (UTC)
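The exception gbog suggests could be applied while the pluralisation suggestions are being generated. This is a minimal sketch, assuming hypothetical `(source_page, missing_target)` red-link pairs and a set of existing titles. The plural rule (add or drop a final "s") and the excluded "Communes of the" prefix are illustrative assumptions, not the project's actual rules or exception list.

```python
def plural_suggestions(red_links, existing_titles, excluded_prefixes=("Communes of the",)):
    """Suggest singular/plural fixes for red links, skipping excluded source pages.

    red_links: iterable of (source_page, missing_target) pairs.
    Returns a list of (source_page, missing_target, suggested_title) triples.
    """
    existing = set(existing_titles)
    suggestions = []
    for source, target in red_links:
        # str.startswith accepts a tuple of prefixes.
        if source.startswith(excluded_prefixes):
            continue  # e.g. French commune lists, where a final "s" is usually genuine
        if target.endswith("s") and target[:-1] in existing:
            suggestions.append((source, target, target[:-1]))
        elif target + "s" in existing:
            suggestions.append((source, target, target + "s"))
    return suggestions
```

With this rule, a link to "Coutures" from a commune list is skipped, but the same red link on an unrelated page would still be offered "Couture" as a suggestion for someone to review.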

Tip

FYI, the "tabbed browsing" feature of the Mozilla Firefox browser makes this really fast. Stainless steel 18:48, 27 July 2005 (UTC)

Yep, tabs are neat. I prefer Opera myself though; its "notes" feature is also extremely helpful when editing. I use it to manage often-used edit summaries, templates, categories and such. It's way better than copying and pasting from some external document: just right-click → insert note, and pick the one you want. I'm sure there is a Firefox extension with similar functionality too. --Sherool 00:18, 22 August 2005 (UTC)

How successful were we?

This project's current iteration seems to be coming to a completion, with most sections done, Part 6 of the Pluralisation section likely to be finished within mere minutes. How successful was this effort? What percentage of suggestions was struck through, and what percentage was not? NatusRoma 05:15, August 11, 2005 (UTC)

Indeed, it would be nice to see some stats here now that the project is finished. -- Rune Welsh ταλκ 12:56, August 21, 2005 (UTC)
I've not been the most active participant, but if memory serves, capitalisation and the numerical lists worked out quite well, while the pluralisation list resulted in a significant number of exceptions. At least that's my impression. But it depends on how you measure success, doesn't it? I don't see an exception as a "failure": if no relevant article exists, then there is nothing to fix, and someone might eventually make an article to fill the "gap". As I understand it, the main aim was to fix links that could be made to point at existing articles, and I think we did pretty well in that regard. Though naturally, between the database dump the lists were generated from and now, there are probably a million new red links waiting to be fixed :P --Sherool 00:33, 22 August 2005 (UTC)
I've yet to compile accurate stats, but believe that more than 50,000 red links were recovered in this iteration of the project. A hearty well done to us all! The automatically generated suggestions were on average about 90% correct. Of course, this was a first pass and concentrated on all the easy wins - hopefully the list of 5000 exceptions will help keep the quality of the next iteration up. - TB 22:08, August 22, 2005 (UTC)

Any clue when we'll get the next batch?

I'm glad that we appear done for now...but anyone know when they'll be ready for us to go back at it again?

--Kell 23:52, 14 September 2005 (UTC)

Topbanana seems to be on a hiatus or something (no edits since early September). Maybe someone else could cobble together some lists; the script and exception lists are all available on subpages here, if I'm not mistaken. --Sherool 18:10, 22 September 2005 (UTC)
I just tried to do it. However, it seems that the June 23 dumps of the two files we need are the most recent ones available from the Wikimedia download site. So it seems we're already working with the most recent information.
Plus, it looks like they've started using XML for one thing or another, which makes the directions given on the subpage of this WikiProject completely useless. Shame.
ArmadniGeneral (talk • contribs) 20:15, 2 October 2005 (UTC)
Yes, I'm on hiatus (usual story .. got married .. business took off dramatically .. moved to a non-broadband house .. first child on the way .. etc etc). Apologies to all who are awaiting the next round of this project. If anyone has managed to download a recent database dump, load it into MySQL, and is willing to have a go at generating more suggestions, give me a shout and I'll try to lend what assistance I can. - TB 22:01, 17 November 2005 (UTC)

Bot for WikiProjects?

I can't seem to find an answer for this, but is there a bot that can be run against a category that would then list redlinks? I'd like to be able to transclude a page into our Project page of redlinks so editors can go in and either unlink or create stubs....plange 07:24, 1 July 2006 (UTC)

Am I right in thinking that you want lists of all red-links in pages listed in a given category? I'm sure I can manage this if you want, name your category :) - TB 22:57, 23 July 2006 (UTC)

Wikipedia Integration

I've identified this project as a candidate for material to be analyzed by Wikipedia Integration methodology. Please feel welcome to offer suggestions and feedback. WP:ʃ Cwolfsheep 16:22, 22 July 2006 (UTC)

Award needed?

Does the red link recovery project need a barnstar or something similar to award to its most active bluelinkers? I've always given out janitorial medals in the past, but I guess they're not quite appropriate here. Any suggestions? - TB 08:35, 5 August 2006 (UTC)

If you can get enough support from the WikiProject, I'd recommend an image of a star that's half blue, half red, with an arrow pointing to the blue side. --Gray Porpoise 19:20, 16 August 2006 (UTC)

Is it easy to find out who the most active bluelinker is? -- Magioladitis 10:09, 29 September 2006 (UTC)

Anyone know of a project that ...

The Red Link Recovery project is, as a side effect of its normal activities, generating lists of articles that don't yet exist but whose names are very similar to articles that already do (for example, Christmas pudding and Christmas Pudding). I can't help thinking it would be useful to create the missing articles in stub form and check that links to the pre-existing one aren't in fact intended for the newly created stub. The problem is that I've no idea who might be interested in carrying out such a task; does this fall within the scope of any existing cleanup projects? - TB 20:04, 30 August 2006 (UTC)

The problem you are describing sounds similar to what Redirect is trying to do. --Everchanging02 05:39, 31 October 2006 (UTC)
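Pairs like Christmas pudding / Christmas Pudding can be found by comparing titles case-insensitively. This is a minimal sketch, assuming a hypothetical `case_variant_suggestions` helper and plain lists of missing and existing titles. `str.casefold()` is a standard Python method, but using it here is an illustrative choice, not how the project's script actually worked.

```python
def case_variant_suggestions(missing_titles, existing_titles):
    """Map each missing title to the existing titles that differ only in letter case."""
    by_key = {}
    for title in existing_titles:
        # casefold() is a more aggressive lower() intended for caseless matching.
        by_key.setdefault(title.casefold(), []).append(title)
    return {
        title: by_key[title.casefold()]
        for title in missing_titles
        if title.casefold() in by_key
    }
```

Each missing title in the result is a candidate either for a redirect or, as suggested above, for a new stub followed by a check of the links to its existing look-alike.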

Project directory

Hello. The WikiProject Council has recently updated the Wikipedia:WikiProject Council/Directory. This new directory includes a variety of categories and subcategories which will, with luck, potentially draw new members to the projects who are interested in those specific subjects. Please review the directory and make any changes to the entries for your project that you see fit. There is also a directory of portals, at User:B2T2/Portal, listing all the existing portals. Feel free to add any of them to the portals or comments section of your entries in the directory. The three columns regarding assessment, peer review, and collaboration are included in the directory for both the use of the projects themselves and for that of others. Having such departments will allow a project to more quickly and easily identify its most important articles and its articles in greatest need of improvement. If you have not already done so, please consider whether your project would benefit from having departments which deal in these matters. It is my hope that all the changes to the directory can be finished by the first of next month. Please feel free to make any changes you see fit to the entries for your project before then. If you should have any questions regarding this matter, please do not hesitate to contact me. Thank you. B2T2 14:18, 26 October 2006 (UTC)


To do list

The work in the To Do list table has all been completed. The table should either be updated with new work or removed. raining_girl 20:35, 13 November 2006 (UTC)