User:DumZiBoT/refLinks
What is DumZiBoT doing?
He is converting bare external links in references into named external links.
Here are some examples of his work: [2], [3], [4] and here is what he is doing now.
He usually runs every time a new XML dump is available. Processing a dump takes days: handling enwiki's March 15th dump took more than 120 hours, i.e. 5 days of uninterrupted work.
His owner is NicDumZ.
The idea
References like these:
<ref>http://www.google.fr</ref> or <ref>[http://www.google.fr]</ref>
are converted into this:
<ref>[http://www.google.fr Google<!-- Bot generated title -->]</ref>
Once rendered, the reference then shows the page title instead of a bare URL.
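As a rough picture, the rewrite amounts to a regular-expression substitution. The following Python sketch uses a simplified pattern and a hypothetical fetch_title helper; it illustrates the idea and is not the bot's actual code.

import re

# Simplified stand-in pattern: a <ref> that holds only a URL,
# optionally wrapped in single brackets.
BARE_REF = re.compile(r'<ref>\s*\[?(https?://[^\s\]<]+)\]?\s*</ref>')

def name_references(wikitext, fetch_title):
    """Replace each bare reference with a named external link.

    `fetch_title` is a hypothetical helper mapping a URL to its
    (already cleaned-up) HTML title, or None when no title is found.
    """
    def repl(match):
        url = match.group(1)
        title = fetch_title(url)
        if title is None:  # non-HTML content: keep a bare link
            return '<ref>%s</ref>' % url
        return '<ref>[%s %s<!-- Bot generated title -->]</ref>' % (url, title)
    return BARE_REF.sub(repl, wikitext)

# Example: turns <ref>http://www.google.fr</ref> into
# <ref>[http://www.google.fr Google<!-- Bot generated title -->]</ref>
print(name_references('<ref>http://www.google.fr</ref>', lambda url: 'Google'))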
- The title used as the link label is the HTML title of the linked page, taken from its <title> tag.
- Newlines, linefeeds, and tabs in titles are converted into a single space, and extra spaces are removed, to avoid overly long titles.
- Titles containing ], several consecutive }, or ' are handled correctly by converting some of the offending characters to their HTML entities (This title encloses brackets [here]).
- When the content type is not text/html (media files, .doc, etc.), a title cannot be determined automatically, so the reference is only converted to <ref>http://lien.org/doc.pdf</ref>.
- Lengthy titles are arbitrarily truncated to 250 characters. When this happens, "..." is appended to the title. (A sketch of this title clean-up follows this list.)
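Putting those rules together, the title clean-up can be sketched in Python as below. The function name and the exact choice of which character in each problematic run gets escaped are illustrative assumptions, not the bot's actual code.

import re

MAX_TITLE_LEN = 250  # truncation threshold mentioned above

def sanitize_title(raw_title):
    """Make an HTML <title> safe inside [url title] wiki syntax.

    Minimal sketch of the rules above; the exact substitutions are
    illustrative assumptions.
    """
    # Collapse newlines, linefeeds, tabs and runs of spaces to one space.
    title = re.sub(r'\s+', ' ', raw_title).strip()
    # ']' would close the external link early.
    title = title.replace(']', '&#93;')
    # Consecutive '}' could close a surrounding template.
    title = title.replace('}}', '&#125;}')
    # Consecutive quotes would start wiki italics or bold.
    title = title.replace("''", "&#39;'")
    # Arbitrary truncation of very long titles.
    if len(title) > MAX_TITLE_LEN:
        title = title[:MAX_TITLE_LEN] + '...'
    return title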
Features
- Reads the titles from PDF files
- If a dead link is found, it is tagged using {{dead link}}
- When neither <references/> nor {{reflist}} is present on the page, <references/> is appended (sketched below).
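A minimal sketch of that last feature, assuming detection by plain substring tests and appending the tag at the very end of the page:

def ensure_references_tag(wikitext):
    """Append <references/> when the page has no references list.

    Simplified: detection is a substring test and the tag goes at the
    end of the page; the bot's real placement logic may differ.
    """
    lowered = wikitext.lower()
    if '<references' in lowered or '{{reflist' in lowered:
        return wikitext
    return wikitext.rstrip() + '\n\n<references/>\n'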
Hey, you forgot some links!
If a link is unchanged after an edit by DumZiBoT, please check that you can access the linked page. If you think that DumZiBoT skipped a particular link because of some quirk, please poke me.
Some links may not be changed, even after DumZiBoT's run. The following may have occurred:
- The HTML linked page has no title (rare, but happens).
- DumZiBoT got an HTTP error while trying to fetch the page (see 4xx Client Error and 5xx Client Error). The link may be invalid, the page may no longer be available, or it may be protected. These links should be repaired or removed, but chances are that the error is temporary. Also, some pages, such as Google cache links and Google Books pages, return a 401/403 error to bots even though they are accessible from a browser. You may wish to try the Link checker tool to correct the problem. (A sketch of this fetch-and-check step follows this list.)
- Either the link or the html title is blacklisted.
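For illustration, the fetch-and-check step could look like the following, written with Python's urllib as a stand-in for whatever HTTP handling the bot actually uses:

import urllib.request
import urllib.error

def fetch_html(url, timeout=30):
    """Return the page body, or None when the link must be skipped."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            if 'text/html' not in response.headers.get('Content-Type', ''):
                return None  # media, .doc, etc.: no title to extract
            return response.read()
    except urllib.error.HTTPError as error:
        # 4xx/5xx: candidate for {{dead link}} tagging. Note that some
        # sites (e.g. Google cache) answer 401/403 to bots although the
        # page opens fine in a browser.
        print('HTTP %d on %s' % (error.code, url))
        return None
    except urllib.error.URLError:
        # Could not connect at all: another {{dead link}} candidate.
        return None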
Blacklists
- Link blacklist: for now, only JSTOR links are ignored, since for non-registered users JSTOR serves pages titled "JSTOR: Accessing JSTOR". Please contact me if you think that a particular domain should be blacklisted.
- Title blacklist: based on an original idea from Dispenser, I exclude links whose titles contain register, sign up, 404 not found, and so on. (Both checks are sketched below.)
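Both blacklists boil down to simple matching; a sketch, with illustrative entries only:

import re

LINK_BLACKLIST = ('jstor.org',)  # illustrative; the real list may differ
TITLE_BLACKLIST = re.compile(r'register|sign ?up|404 not found',
                             re.IGNORECASE)

def is_blacklisted(url, title):
    """True when either the link or its HTML title should be ignored."""
    if any(domain in url for domain in LINK_BLACKLIST):
        return True
    return TITLE_BLACKLIST.search(title) is not None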
Meta-data
Why doesn't DumZiBoT include extra information, like access date, author, or publication, or use citation templates? Changing the citation style in an article cannot be done without gaining consensus. SEWilcoBot and RefBot were blocked for changing the citation style in articles.
And what about server load?
The search for pages containing invalid references is made from the last XML dump. DumZiBoT fetches from the servers only those pages that needed modifications at the time of the dump. (Some pages are downloaded but end up unchanged, because their references were fixed between the dump and the fetch.)
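Conceptually, the dump scan streams through the XML and keeps only the titles of pages whose wikitext contains a bare reference. In this sketch, the export-schema namespace and the bare-reference pattern are assumptions:

import re
from xml.etree import ElementTree

NS = '{http://www.mediawiki.org/xml/export-0.3/}'  # assumed schema version
BARE_REF = re.compile(r'<ref>\s*\[?https?://')

def pages_needing_fix(dump_path):
    """Yield titles of pages whose wikitext contains a bare <ref> link."""
    for _, page in ElementTree.iterparse(dump_path):
        if page.tag == NS + 'page':
            text = page.findtext('%srevision/%stext' % (NS, NS)) or ''
            if BARE_REF.search(text):
                yield page.findtext(NS + 'title')
            page.clear()  # keep memory bounded while streaming the dump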
Could you use {{Cite web}} instead of the standard link syntax?
No. Read this talk page archive for further explanations.
Where do I request DumZiBoT to go through a specific page?
Nowhere. Just wait: DumZiBoT goes through every page that needs a fix whenever a new dump is available.
Online tool
Thanks to Dispenser, you can manually run DumZiBoT's script on a page, or use a modified script which makes more assumptions about references and formatting.
Where should I report a problem?
- User Talk:NicDumZ, and not elsewhere =)