Talk:List of people by name
From Wikipedia, the free encyclopedia
Note: The talk area for List of people by name is divided into multiple subpages. See the index to subpages below before posting a new topic and then place your comment on the appropriate subpage.
It is recommended that if you have a comment, put a title on it so people can find it in the table of contents. If your comment is related to a prior one, add it to that one so the comments stay together.
Organization of this Suite of Talk Pages and Sub-pages for LoPbN
Short Index to Subpages
(updated 06:47, 15 December 2005 (UTC))
- Individual entries, or their relationships to near neighbors
- LoPbN list as a whole
- The LoPbN page -- Intro, & root of the LoPbN tree
- Subdivisions within pages
- "Sequential list of Edits and each one's Affected Section(s)" remains a sub-section (on this page) of this section, awaiting updating.
- "Use of MediaWiki msg and subst calls" remains a section on this page, awaiting more attention.
Full List of VfD/AfD Debates
- In addition, there have been VfD/AfD debates (and a speedy nom) affecting the tree or its infrastructure:
- 2003 June VfD of List of people by name: Mn
- 2003 Nov, VfD of (rdr) List of people by name: Ha-Hd, mentioned at Talk:List of people by name/Whole list#Structure that Integrates Pages into List
- 2004 Oct, Wikipedia:Votes for deletion/List of people by name: Db-Dd
- 2005 December, Wikipedia:Articles for deletion/List of people by name
- 2005 December, Speedy Deletion re Bm-Bn: nom & direct response; individual response
- 2006 February, Wikipedia:Articles for deletion/List of people by name: Huh-Hum (perhaps a classic, in view of comments on the amusingly named range!)
Index to Subpages, with Their Section Headings
- Moved Sections
- 1 Individual entries, or their relationships to near neighbors
- 1.1 Got a Name for the List but There's no Page to put it on??
- 1.1.1 People Waiting for a Page to be Listed on
- 1.2 Name Formats & Order
- 1.2.1 Name Formats & Order 1
- 1.2.2 Name Formats & Order 2
- 1.2.3 Alphabetizing these names
- 1.2.4 Separating Surnames from Identical Given Names
- 1.3 Multi-version names
- 2 LoPbN list as a whole
- 2.1 Name of page
- 2.2 Structure that Integrates Pages into List
- 2.3 Imputation of Design Rules for Structure
- 2.4 Toward Exhaustive Enumeration of Title Formats
- 2.5 Statistics about this article's data structure
- 2.5.1 Large pages
- 3 The LoPbN page
- 3.1 Table Structure
- 3.1.1 Code Saved for Row J of Table
- 3.1.2 General Template for One-Page-Approach Table Row
- 3.2 About missing letters
- 3.3 Layout of 26x26 table
- 3.4 Discussion of Implementation in the Article?
- 3.5 Possible Reunification of Inter-Page Design
- 3.5.1 Pilot Project: Rework of B Row of the table
- 3.5.2 Ba/Be Outcome & J Row
- 3.5.3 What Next
- For details on the following sections that remain on this page, see this page's current ToC
- 4 Sequential list of Edits and each one's Affected Section(s)
- 5 Use of MediaWiki msg and subst calls
Sequential list of Edits and each one's Affected Section(s)
- 22:22, 2003 Jul 8 . . SGBailey (Two questions) -
- 00:21, 2003 Jul 9 . . Amillar (Simple duplicates are errors; remove them.) - #Multi-version names
- 11:37, 2003 Aug 3 . . Patrick (Name of page) - #Name of page
- 13:42, 2003 Aug 3 . . Docu - #Name of page
- 09:02, 2003 Aug 6 . . 217.24.129.50 (Order of names) - (closes with "-- SGBailey 2003-08-06") #Name Formats & Order 2
- 14:15, 2003 Aug 6 . . MyRedDice (agree with SGBailey) - #Name Formats & Order 2
- 23:56, 2003 Aug 7 . . Amillar (automated sorting?) - #Name Formats & Order 2
- 14:39, 2003 Nov 12 . . Jerzy (Sorting, Structure (& + sec-headings))
- 04:27, 2003 Dec 13 . . Jerzy (Sec'ns "Got a Name ...?" & "Got a Name")
- 04:29, 2003 Dec 13 . . Jerzy (Whoops, other section should be "Table Sructure") - (Heading change only)
- 10:27, 2003 Dec 17 . . Jerzy (Add "(as far as i know)" in Table Structure & correct that hdg's spelling) - #Table Structure
- 18:58, 2003 Dec 17 . . Rfc1394 - #About missing letters
- 19:00, 2003 Dec 17 . . Rfc1394 - Intro Section
- 17:52, 2003 Dec 18 . . Docu - #Layout of 26x26 table
- 20:06, 2003 Dec 18 . . Docu (fmt) - (formatting change affects: #Layout of 26x26 table)
- 21:11, 2003 Dec 18 . . Jerzy - (response to off-topic request in: #About missing letters)
- 01:43, 2003 Dec 19 . . Jerzy (Statistics; comment on going back to the no-hyphens table)
- 06:42, 2003 Dec 19 . . Docu
- (modification of previous contrib in: #Layout of 26x26 table)
- #Layout of 26x26 table
- 16:17, 2003 Dec 19 . . Jerzy (Stein & Bennett)
- 19:19, 2003 Dec 19 . . 204.60.225.77 (clarify absence of irony; grammar quibble w/ myself) - (modification of previous contrib in: #Layout of 26x26 table)
- 07:19, 2003 Dec 20 . . Docu (fmt my previous post) - (modification of previous contrib in: #Layout of 26x26 table)
- 19:49, 2004 Jan 15 . . Jerzy (Document split of Ma under secn Large pages) - (modification of previous contrib in: #Large pages)
- (3 edits by Jerzy, on 2004 Jan 20-21, all apply to #People Waiting for a Page to be Listed on)
- 04:29, 2004 Feb 11 . . Iam (repairing redirect link) - (modification of previous link in: #Layout of 26x26 table)
- 08:xx, 2004 Feb 24 ... (Jerzy Refactoring by sections) - (not specific to any few sections)
I believe the preceding list of edits is up to date including the last entry. --Jerzy 08:33, 2004 Feb 24 (UTC)
Refactoring of this set of talk pages
This page refactored by reordering sections, and subordinating most of them to new headings.
- (The last non-minor edit preceding refactor was 07:44, 2004 Jan 21)
--Jerzy 08:33, 2004 Feb 24 (UTC)
I'm not sure i'd realized before now how large this page has become: over 50 kB. My previous refactoring may not be perfect, but i'm using it to parcel out the top level sections into subpages.
- - - -
OK, we're down to 17-18 kB, before i start adding more.
--Jerzy(t) 04:26, 2004 Sep 24 (UTC)
I have moved the section about commas to Talk:List of people by name/Individual Entries --jni 08:06, 17 Jan 2005 (UTC)
Use of MediaWiki msg and subst calls
This section is a very rough & stubby draft, but may be of some immediate interest, since documentation has been lacking.
Two important developments for the List of people by name tree are
- The introduction of MediaWiki msg calls and
- the pending enhancements of that facility when MW 1.3 is introduced.
Jerzy(t) 18:57, 2004 Apr 23 [missing sig added Jerzy(t) 03:43, 2004 Jun 20 (UTC)]
Introduction of MediaWiki msg Calls
I am referring to the introduction of MediaWiki msg calls for providing the links from a given page to others in the tree. User:Timwi initiated this, and he and User:Angela did nearly all the dog-work of installing it on every page of the tree except the master-table page.
One MediaWiki page, for use in such calls, now exists for each LoPbN page that has children within the tree; it links to
- that LoPbN page's siblings in the tree (lateral links),
- the siblings of its ancestors (upward links), and
- its children.
And of course those links appear on any (non-MW name-space) page that has a msg call to that MW page. That MW page is in fact the target of an msg call
- on that LoPbN page, and
- on each of its children in the tree that don't themselves have children. (The reason for this exception is that the MW page doesn't link to the children's children:
- it shouldn't, since this MW page will be called not only from the page with the children, but also from "aunts and uncles" of those grandchildren, where links to the grandchildren would function as "niece/nephew" links that would be of little use but offer significant confusion, and
- it can't, since in many cases, there would be multiple "families" of niece/nephew links, stacked one above another in an intolerable clutter.)
- An important change this implies is that it eliminates the (IMO) most important of the former reasons for using redirects between the 26x26 master table on LoPbN (the root page) and the pages that "span" several entries in the same row of that table. I became an avid supporter of the redirects when i realized that linking the table entries and "lateral links" (see above for definition) meant that merging or splitting LoPbN-tree pages that are linked from the table would require editing not only the table but each of the other pages linked from the same row.
- In contrast, such a change now requires a change to the table and a change to any of the MW pages that mention the merged or split page(s): usually only one, and three would be exceptional until the list gets at least several times larger.
Jerzy(t) 18:57, 2004 Apr 23 [missing sig added Jerzy(t) 03:43, 2004 Jun 20 (UTC)]
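A minimal sketch of the link set described above, in Python and purely for illustration. The tree is modeled as a hypothetical dict mapping each page to its parent, and `nav_links` is an invented helper, not anything MediaWiki provides. For a page that has children it collects lateral links (siblings), upward links (siblings of ancestors), and child links, and it deliberately leaves out grandchildren.

```python
# Hypothetical toy model of part of the LoPbN tree: child -> parent.
PARENT = {
    "H": "LoPbN", "J": "LoPbN",
    "Ha": "H", "He": "H", "Hp-Hq": "H",
    "Har": "Ha", "Has": "Ha",
}

def children(page):
    """Pages whose parent is `page`, in sorted order."""
    return sorted(c for c, p in PARENT.items() if p == page)

def siblings(page):
    """Other pages sharing `page`'s parent (empty for the root)."""
    parent = PARENT.get(page)
    if parent is None:
        return []
    return [c for c in children(parent) if c != page]

def nav_links(page):
    """Links carried by the index page for `page`, which is assumed to have children."""
    upward = []
    ancestor = PARENT.get(page)
    while ancestor is not None:
        upward.extend(siblings(ancestor))
        ancestor = PARENT.get(ancestor)
    return {
        "lateral": siblings(page),
        "upward": upward,
        "children": children(page),  # grandchildren are deliberately left out
    }

links = nav_links("Ha")
print(links)
```

In this toy tree the same link set would be shown on Ha and on its childless children Har and Has, which is the sharing arrangement the comment describes.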
Pending Enhancements of msg facility in MW 1.3
A meta article describes these. The nested calls will make it possible for a single edit of one MW page (not among the ones described above) to cause immediate update of all of the needed changes in the MW pages described above, and thus of all the affected pages (other than the master table, where relevant). The parameter facility will make it easier for non-experts to create the MW pages and calls needed when LoPbN pages become "first-time parents". More to follow.
Jerzy(t) 18:57, 2004 Apr 23 [missing sig added Jerzy(t) 03:43, 2004 Jun 20 (UTC)]
MW 1.3 Features in Use Here
I did a pilot use of the MW 1.3 Template facility with Template:List of people Z; the restrictions imposed by the exact specs of the facility disappointed me, but the enhancements are still very welcome.
It may not be obvious why that example represents an advance -- especially since Z was specifically chosen as a low-impact test bed, and since it followed from that choice that the main benefit of the new capability does not accrue in that case. Briefly, and barely less cryptically, Template:List of people Z is the only page that needs to reference Template:List of people Z Links, and much of the point is that a change to, say, Template:List of people H Links, reflecting, say, the breaking up into multiple pages of List of people by name: Ho will (once the technique used on Template:List of people Z is extended to its logical limits) affect the appearance of 4 templates beyond the new Template:List of people Ho that the breakup would require.
Based on further thought about the problem, i am about to move forward with a more thorough approach to it (which will be easier to describe clearly when i can refer to specific examples). But barring surprises, i will rewrite all of the existing templates whose names are of the form
- Template:List of people #
where # is a string of 1 to 3 letters.
--Jerzy(t) 04:56, 2004 Jun 20 (UTC)
The Promised "specific examples"
I have just merged two pages, namely List of people by name: Hp and List of people by name: Hq; the merged page is List of people by name: Hp-Hq. Besides
- the two redirects involved,
such a change has traditionally required editing
- List of people by name (which i also did),
- List of people by name: H and
- the 17 pages linked from List of people by name's H row, from List of people by name: Ha to List of people by name: Hz.
The introduction of indexes based on (un-nested) template calls in recent months reduced those 18 edits to one, in this case namely Template:List of people H. Nevertheless, a certain amount of dog-work remained: in this case, Template:List of people Ha, Template:List of people He, and Template:List of people Har also linked to List of people by name: Hp and List of people by name: Hq, and thus, until this weekend, each would have required an edit to replace the two links with a single link to the replacement page.
(I hasten to mention that while reducing the time spent editing is nice, a greater concern is reducing the opportunity for typos and errors of omission. It is desirable that those who may make such merges, and more often splits of pages, not need to track down the pages needing changes.)
Because i had converted all four of the templates just mentioned to in turn call Template:List of people H Links, the only changes i had to make to fix links to Hp and Hq were the one to List of people by name (mentioned above) and another to Template:List of people H Links. (The name of the page in question will become apparent to any editor who understands that double-braces imply reference to the Template: namespace; examining the markup for any of the pages formerly needing changes leads to a template whose markup includes a call to one of the templates already mentioned, and its markup includes a call to Template:List of people H Links.)
--Jerzy(t) 06:13, 2004 Jun 21 (UTC)
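The effect of routing every index through one shared links template can be sketched with a toy expander, written in Python for illustration only. The template names are borrowed from the comment above, but their bodies are invented, and `expand` is a hypothetical stand-in for what MediaWiki does with nested double-brace calls, not its actual implementation.

```python
import re

# Toy template store; the names come from the discussion, the bodies are invented.
templates = {
    "List of people H Links": "[[Ha]] [[He]] [[Hp]] [[Hq]]",
    "List of people H": "H index: {{List of people H Links}}",
    "List of people Ha": "Ha index: {{List of people H Links}}",
}

# A double-brace call such as {{List of people H Links}}.
CALL = re.compile(r"\{\{([^{}]+)\}\}")

def expand(text, depth=0):
    """Recursively replace {{Name}} with the expanded body of template Name."""
    if depth > 10:  # guard against templates that call each other in a loop
        raise RecursionError("template nesting too deep")

    def substitute(match):
        return expand(templates[match.group(1)], depth + 1)

    return CALL.sub(substitute, text)

# Merging Hp and Hq then needs an edit to the shared links template only.
templates["List of people H Links"] = "[[Ha]] [[He]] [[Hp-Hq]]"

for name in ("List of people H", "List of people Ha"):
    print(expand("{{" + name + "}}"))
```

Both indexes pick up the merged Hp-Hq link after the single edit, which is the saving described above; the master table on the root page would still need its own edit.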
Implications of Category Tags for LoPbN
I am enthusiastic about the jobs that tags using the
- Category:
namespace can take over from LoPbN; i gather users will be able to refer to [[Category:People]] for a list of narrower categories, and, by steps progressively narrow further and further to the point where they have a small list of names of people who have a lot in common, so that for instance Cohen, Cohn, Cohan, and Kohane (all of them American actors, or all of them British political leaders, or what have you) will be isolated in close proximity to each other, instead of being separated by lots of obviously irrelevant names and spread across many headings or even pages.
It may be that Category:People can do most of the jobs that people use LoPbN for now.
Nevertheless, i hope to see a continuation of this list (enhanced by the addition of names added from a master list of people built automatically using Category: -based info), and of more specialized lists (that i have paid little attention to), similarly enhanced by smaller lists generated similarly. I hope for that continuation for the sake of jobs that categories alone cannot fulfill. These are some that occur to me:
- Listing people not deserving full bio articles, but well enough known to occasionally cause confusion with similarly named people with articles.
- Listing likely mis-spellings or mis-filings (e.g., Thich Nhat Hanh under T, Japanese names with the surname and given name misidentified, or people with a "von" or an Arabic article in their name (where do those belong, BTW? Our practice is inconsistent at present)), whose redirects, as i understand, don't fit into the category system.
- Listing people who should have articles, when someone is adding enough of them at once that starting a page and putting even a minimally adequate set of Category: tags on each of them would be a burden that would limit the number of names added, or when the adder is clueless about Category: tags.
- Searching for an existing article, in order to pursue a reference to a person whose relevant field is unclear from context, or too minor among their minor talents for their article to deserve (or at least to yet have) a category tag referring to it, especially when the reference is misspelled or is less than a full name.
--Jerzy(t) 04:56, 2004 Jun 20 (UTC)
Removing Boxes around Indexes
Until the box-describing facilities can be made to work better in conjunction with a new means of implementing the indexes at the tops of LoPbN pages, i am implementing a new version of the indexes that avoids some major problems by just leaving the boxes off. I favor boxes in the long run, but for reasons i will explicate here in the next week (largely by showing how the new implementation works), they are a serious problem for now.
The return of boxes will be a very minor editing task, under the new implementation, once a solution to the problems emerges. (Someone may already know that solution, or it may require MediaWiki changes.)
--Jerzy(t) 17:59, 2004 Jun 19 (UTC) [, w/ touch-ups Jerzy(t) 04:56, 2004 Jun 20 (UTC)]
Info on Moved-away Sections
Toward Formal Documentation of Established Practice on the LoPbN Tree
(Multi-sub-section text, an abandoned effort, moved to Wikipedia talk:LoPbN Meta-structure, under the same heading.
--Jerzy•t 08:46, 6 January 2006 (UTC))
A Relevant Deletion Discussion
[At some point (and no later than if & when it starts to get long) this discussion will be moved to Talk:List of people by name/Individual Entries. (I propose to subordinate both it and "Got a Name for the List but There's no Page to put it on??" under a new heading "The Future-Entry Problem".) Those interested would be wise to Watch-list that page now.] --Jerzy(t) 07:36, 2004 Oct 21 (UTC)
I think the vote may already be shifting to a clear Keep, but some readers of this page may be interested by my long defense of that action, at Wikipedia:Votes for deletion/List of people by name: Db-Dd. --Jerzy(t) 22:43, 2004 Oct 6 (UTC)
The VfD process ended in a vote of 10 for Keep and 4 for Delete or Merge; details have been moved to Talk:List of people by name: Db-Dd --Jerzy(t) 07:36, 2004 Oct 21 (UTC)
Alphabetizing
[At some point (and no later than if & when it starts to get long) this discussion will be moved to Talk:List of people by name/Individual Entries; those interested would be wise to Watch-list that page now.]
Now moved. --Jerzy•t 23:11, 15 December 2005 (UTC)
Replacing groups of names by surname links on LoPbN
(Content moved to Talk:List of people by name/Replacing groups of names by surname links on LoPbN)
What Belongs in an Entry besides the Link?
Material formerly on a user-talk subpage and formerly linked from here, has been moved to join related material on a sub-page of this talk page, namely at Talk:List of people by name/Individual Entries#What Belongs in an Entry besides the Link?. --Jerzy·t 05:23, 25 Jun 2005 (UTC)
List of people??
(The material formerly here was 3 'graphs by 2 editors. It is completely duplicated at Wikipedia:Articles for deletion/List of people by name, under the 7th top-level-bullet-point, where all but one response (the 2nd & 3rd 'graphs) originated. It no longer served any purpose here.
--Jerzy•t 09:10, 6 January 2006 (UTC))
- This discussion has been copied to Wikipedia:Articles for deletion/List of people by name, where it has continued with additional participants. Discussion on that page will be cut off, probably in less than a week, so this section may become an appropriate place to go further after that point. But at present, please comment there (even if you don't choose to vote on the deletion proposal) if this needs further discussion before the AfD closes.
--Jerzy•t 20:05, 6 December 2005 (UTC)
- The AfD has concluded with three delete votes and 8 for keep. Unless further discussion appears here, i will review my impression that this section is redundant to the AfD sub-page (which is referenced more accessibly elsewhere on this page), and probably delete this section.
--Jerzy•t 17:19, 13 December 2005 (UTC)
Recently Active Discussions
Purpose of name lists
Discussion, in progress, is moving to Talk:List of people by name/Whole list#Purpose of LoPbN Tree --Jerzy·t 01:07, 2005 July 14 (UTC)
Off-topic discussions contributed in the same edits are moving to User talk:Jerzy/Jerzy & Superm401. --Jerzy·t 01:07, 2005 July 14 (UTC)
Commas
The commas before the vital-stats dates will be disappearing: see Talk:List of people by name/Individual Entries#Commas.
--Jerzy•t 17:31, 13 December 2005 (UTC)
The immediately preceding entry is a followup to the following "Commas and Dates" subsection, which previously appeared above at the top of this "Recently Active Discussions" section:
- In 2005 May and June, on Talk:List of people by name/Individual Entries, the following have been discussed (and IMO deserve wider input):
- commas adjacent to the parenthesized vital-dates info, under Commas, and
- what kinds of links and descriptive info to include in entries, under What Belongs in an Entry besides the Link?.
- Please weigh in, if only to say "Yeah, that's OK."
- --Jerzy·t 05:23, 25 Jun 2005 (UTC)
Within-section subdivision
See Talk:List of people by name/Intra-page structure#Within-section subdivision, where the use of multi-level bullet headings to subdivide sections is described for comment. (And the other sections of the page, which include some retrospective background.)
--Jerzy•t 09:08, 15 December 2005 & 05:08, 15 June 2006 (UTC)
- I erred: only this page ever had a section with that title. The entire page is relevant, but the focused discussion begins at Talk:List of people by name/Intra-page structure#Experimental Bullet-list Hierarchy; at this time its ToC reads (below a section-heading intended merely to amplify upon the page title)
- Theory
- Section Hierarchy
- Experimental Bullet-list Hierarchy
- Discussion of how to use bullets in subdivisions
- --Jerzy•t 05:08, 15 June 2006 (UTC)
Inter-page Indexes below ToCs
A wiser designer than myself would have realized two years ago that having the index to other pages of the tree at the top of pages had no reason behind it except the weight of habit. I should have moved the index on each page below the ToC, when i became aware of pages starting to have ToCs due to the creation of sections.
All the names-pages listed on User:Jerzy/LoPbN Pilot Pages have this new approach, which seems to work fine, and slightly more conveniently than the old version. (Most of those pages also feature the within-section subdivisions using multi-level bullet headings, as mentioned in the preceding section. However, the last two, List of people by name: Brb-Brd and List of people by name: name P are examples of the trivial or degenerate cases where there is too little content of names for multiple levels of bullets. They can serve as demonstrations of only what will change in introducing them, without the distraction of the bullet-list innovations i have proposed.)
I don't consider this change controversial enough to bother urging comment, but comment is as ever welcome, and if someone perceives a problem, i'll hold off on completing that conversion pending completion of discussion.
--Jerzy•t 08:23, 6 January 2006 (UTC)
Jerzy's Grand Tour of LoPbN
In conjunction with the elimination of the commas before the vital stats of entries, i intend to visit and edit every page other than the "Index-only" pages. A suggestion was made that a bot make this tour, but IMO it's an opportunity to do some other things at the same time that i don't care to entrust to a bot.
Here are the tasks i have in mind for each such page:
- Remove the commas between the lk & any parenthesized vital stats that may appear immediately after it
- Convert to inx-after-Toc format, with these specific steps:
- Add a __TOC__ directive at the top (producing a ToC at that point, even on the pages with the minimal 2 sections)
- Place a heading reading "Access to rest of list"
- Leave the existing index template call as the next line, which will thus fall as the body of the 1st headed section.
- If there are no existing headings, add a second one, whose range roughly reflects the alphabetic range of the actual entries (or, if there are no entries, echoes, as an interim expedient, the exact range specified in the page's title)
- Gather data:
- Note sections likely to occupy more than a single screen on users' monitors, and whether the ToC is likely to do the same, for either immediate action or for later attention
- Note entries containing lks other than the bio for the entry's person, probably for immediate elimination if there are a handful or fewer, and for later action if numerous
- Note any presence on the page of the character Ð (Icelandic uppercase Edh, and i think sometimes a Vietnamese character), since it may be worth duplicate entries under D, Dh, and Th for the sake of accessibility to those confused by the odd character.
Let me know if you think of a short task that would be worth attending to on each of these pages, and i should be able to include it.
I'll probably start within the next 24 hours, but it should be a long process, so don't be worried abt it being too late for making suggestions -- or for stopping the ToC/Inx change.
--Jerzy•t 08:36, 6 January 2006 (UTC)
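Two of the per-page tasks listed above are mechanical enough to sketch. The Python below is an illustration only, under the assumption that an entry is a line holding a double-bracket wiki link followed by a comma and a parenthesized group; the function names and the sample entry are invented.

```python
import re

# A comma (with optional spaces) sitting between a closing "]]" and an opening "(".
COMMA_BEFORE_STATS = re.compile(r"\]\]\s*,\s*\(")

def strip_stats_comma(line):
    """Drop the comma between a wiki link and the parenthesized dates after it."""
    return COMMA_BEFORE_STATS.sub("]] (", line)

def has_eth(text):
    """True if the page text contains the character U+00D0 (uppercase Eth)."""
    return "\u00d0" in text

# Invented sample entry, not taken from any real LoPbN page.
sample = "* [[Jane Example]], (1900-1980), Ruritanian painter"
print(strip_stats_comma(sample))
```

Only the comma directly after the link is touched; the comma after the closing parenthesis is left alone, and entries that already lack the comma pass through unchanged.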
Just for the record, my "LoPbN Grand Tour" now includes removal of vulture hyphens. They are so named bcz "Wikipedia (2001-)" is unseemly in appearing as if we are waiting for the subject to kick the bucket. The accepted formats are "(born 2001)" and (where the birth date is unknown) "(died 2001)" on LoPbN (and dates precise to the calendar day in bios). And i'm just starting Ak.
--Jerzy•t 21:51, 10 January 2006 (UTC)
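The vulture-hyphen fix can be sketched as a single substitution. This is a hedged Python illustration, assuming the open-ended form is always a bare year followed by a hyphen or en dash inside parentheses; `fix_vulture_hyphen` is an invented name.

```python
import re

# "(2001-)" or the same with an en dash: a year followed by a dangling dash.
VULTURE = re.compile(r"\((\d{3,4})\s*[-\u2013]\s*\)")

def fix_vulture_hyphen(line):
    """Rewrite an open-ended "(YYYY-)" as "(born YYYY)"."""
    return VULTURE.sub(r"(born \1)", line)

print(fix_vulture_hyphen("* [[Wikipedia]] (2001-)"))
```

Closed ranges such as "(1900-1980)" do not match the pattern and are left as they are.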
Thoughts on semi-automated additions?
Being an addicted Perl hacker, I thought about scripting additions to LoPbN. It was relatively easy to whip up a script that would pull a series of names from Category:Living people and merge it with an existing LoPbN page, so that I could paste it into the page. Test run at List of people by name: Taa-Tax, under "Tam" (still some Unicode problems). But - these entries will be a little "naked" - no years, no occupations. And getting that out of the pages themselves is a somewhat bigger programming project. What's the thought of others - is it best to add them, or not to add them? --Alvestrand 03:36, 11 February 2006 (UTC)
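The merge step described above can be sketched roughly as follows. This is Python rather than the Perl of the actual script, which is not shown on this page, so every name here is hypothetical. It assumes category members arrive as "Given Surname" page titles and that existing entries are lines containing a double-bracket link; the sample names are invented.

```python
import re
import unicodedata

# Target of the first wiki link on a line, up to "]]" or a pipe.
LINK = re.compile(r"\[\[([^\]|]+)")

def sort_key(title):
    """Surname-first key with accents folded; assumes 'Given Surname' titles."""
    plain = unicodedata.normalize("NFKD", title)
    plain = "".join(c for c in plain if not unicodedata.combining(c)).casefold()
    given, _, surname = plain.rpartition(" ")
    return (surname, given)

def merge(existing_lines, category_titles):
    """Add a naked entry for each title not already linked; keep old entries as-is."""
    entries = {}
    for line in existing_lines:
        m = LINK.search(line)
        if m:
            entries[m.group(1).strip()] = line
    for title in category_titles:
        entries.setdefault(title, "* [[%s]]" % title)
    return [entries[t] for t in sorted(entries, key=sort_key)]

page = ["* [[Lee Tam]] (born 1950), Ruritanian painter"]
merged = merge(page, ["Lee Tam", "Rosa T\u00e1mez", "Ana Tamayo"])
print("\n".join(merged))
```

Existing entries keep their dates and descriptions, and new ones arrive "naked". Titles with a parenthesized disambiguator, multi-word surnames, and surname-first name orders would all need a better key than this one.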
- _ _ Since adjusting the section headings on Taa - Tax after your test, i've been waiting by the phone wondering whether that was a one night stand!
- _ _ A naked entry is something like 90% as valuable as a full standardized one, in terms of work saved. And tho i'll bet the human editors will lag far behind you in following up, most entries don't need the dates, nationality, and occupation-of-notability data, to do their job of getting users to bio articles -- even if no one were following up. But why not program, at the first sign of deviating from the prescribed bio-article lead-sentence format, to slap the rest of the lead sentence into a comment and go on to the next name? You'd be saving the editors who do follow-up all the mechanics of switching windows and cutting and pasting, so what's to complain about?
- _ _ BTW, the names with troublesome letters would also be worth including hidden in comments. I copied the garbled Méndez name from the history of your two edits on that page, and pasted it into the search box; even without considering the context of the surrounding entries, i guessed the bad characters represented a form of E, and overtyped them. There was a rdr to the article from the Mendez version, so even the garbled version was pretty handy for adding that bio article to the list.
- _ _ On a blue-sky engineering basis, i'll mention, in case this goes on long enough, that i'm slowly working toward a more formal version of my "low-resolution terminology" for LoPbN occupations, and if your software got good enough at extracting occupations suitable for article leads, it wouldn't be crazy to give you a list of terms that, for instance, i manually convert to "soldier" for use on LoPbN.
- _ _ BTW, would you consider keeping a record on the talk pages of Categories that you use as input? An indication of the dates when you harvested specific alpha ranges could at the least guide others as to when it was worthwhile repeating the process, even if harvesting just new Cat members never becomes practical.
--Jerzy•t 06:43, 11 February 2006 (UTC)
- The year of birth could be gathered from the subcategory of Category:20th century births.
-- User:Docu
- Good to see that it's thought useful!
- The nice thing about gathering from Category:Living people is that I can do all that WITHOUT touching the user's page - if I touch the user page, I can gather much more info, but it takes another couple of seconds per user. Probably definitely worth it.... but another SMOP. Luckily I also found the page yesterday with Perl modules that take the guesswork out of parsing Wikipedia pages..... so that I can program the perl to deal with the content, not the framework. It's likely to be some time until I get time to work on this again - but it was fun!
--Alvestrand 10:20, 11 February 2006 (UTC)
- This is overall a very good idea, in fact I have been recently toying with the pywikipedia framework (sorry, no Perl for me!) with the intent of writing a LoPbN populating bot. But don't wait for me if you want to implement this on your own.
- I think that for the general case (any people category as input) the easiest way to get the dates is to parse category tags from page and detect 'xxxx births' and 'xxxx deaths' and get the dates from there. And if you give, say, Category:German biologists as input, it should be trivial for the script to know at least one occupation without operator needing to re-type it for all entries derived from that category. Or better yet, build a list of nationalities and occupations to some wikipage by hand (could do a recursive scan starting from Category:People by nationality to initialize it), read the cats from biopage and compare them to pre-built list and construct the occupation entry from that.
- Where is your bot account BTW? Remember that you need permission from the community (see Wikipedia talk:bots) to run your script, especially if you are loading pages at a speed nearing the robots.txt crawling rate (currently 1 page per second). Most robots should use a much lower rate anyway, given that websucking Wikipedia is generally forbidden (with certain exceptions). And I think that just crawling the 743 LoPbN pages and a few categories and their content will be deemed a valid exception.
jni 11:38, 12 February 2006 (UTC)
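The category-parsing idea above might look roughly like this. It is a Python sketch under stated assumptions: the page's wikitext is already in hand, year categories have the form "xxxx births" or "xxxx deaths", and `KNOWN` is a hypothetical hand-built lookup of the kind proposed, here with a single invented entry.

```python
import re

# [[Category:Name]] or [[Category:Name|sort key]]; captures Name only.
CATEGORY = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")
YEAR_CAT = re.compile(r"^(\d{1,4}) (births|deaths)$")

# Hypothetical hand-built lookup from category name to (nationality, occupation).
KNOWN = {"German biologists": ("German", "biologist")}

def vital_stats(wikitext):
    """Extract birth/death years and one known occupation from category tags."""
    info = {}
    for name in CATEGORY.findall(wikitext):
        name = name.strip()
        m = YEAR_CAT.match(name)
        if m:
            info["born" if m.group(2) == "births" else "died"] = int(m.group(1))
        elif name in KNOWN and "occupation" not in info:
            info["nationality"], info["occupation"] = KNOWN[name]
    return info

page = ("Text. [[Category:1920 births]] [[Category:1999 deaths]] "
        "[[Category:German biologists|Example, Jane]]")
print(vital_stats(page))
```

Categories that match neither pattern nor the lookup are simply ignored, so an entry degrades to a naked one rather than failing.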
- No bot account yet - that should be done. The current script pulls 2 pages and presents a text I can copy/paste into a browser - this seems to me to fall somewhere below the limit for a "bot" as defined by Wikipedia. If I pull the people pages, and automatically update (both are very tempting), I should get a bot account.
--Alvestrand 17:48, 13 February 2006 (UTC)
It took a few months for me to return to this idea, but I did not forget.... you can find the output of an experimental run against the "Han" page on User:Alvestrand/LoP Experiment. Compare with List of people by name: Han.
I took the short bios from the first sentence of the page (first paragraph was too long). The format is quite variable. Also, there are some problems in determining the end of the sentence (people using . within anchors, for instance). Suggestions? (BTW, I asked on the bots page about a bot account - the response was "show me your code". Since I have to write the code first, I went back to coding...)
Thoughts?
--Alvestrand 18:46, 23 April 2006 (UTC)
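One possible approach to the end-of-sentence problem with periods inside anchors, sketched in Python as a suggestion and not as a description of the script discussed above: track square-bracket depth and accept a period as the sentence end only when it lies outside all brackets and is followed by whitespace or the end of the text. `first_sentence` and the sample lead are invented.

```python
def first_sentence(wikitext):
    """Return text up to the first period that lies outside [...] brackets."""
    depth = 0
    for i, ch in enumerate(wikitext):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth = max(depth - 1, 0)
        elif ch == "." and depth == 0:
            nxt = wikitext[i + 1:i + 2]
            if nxt == "" or nxt.isspace():
                return wikitext[:i + 1]
    return wikitext  # no sentence end found; return everything

# Invented sample lead with a period inside the link target.
lead = ("'''Jane Example''' (born 1950) is a [[St. Example's College|college]] "
        "lecturer. She lives in Ruritania.")
print(first_sentence(lead))
```

This does nothing for abbreviations outside links (initials, "Jr.", and so on), so a follow-up check would still be needed for those.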
- _ _ I think this is valuable, and IMO it is not out of the question to have whole pages replaced by such output, tho i would not want to see more than say 10 to 30 pages (about 2% to 5%) replaced without waiting for manual editors (&/or other types of specialized bots) to catch up. Despite some fairly systematic contributors to LoPbN (who may do a lot of biographical work, and routinely check to see if a bio article that gets their attention is already listed), i think most fall on a spectrum between putting down the first thing that comes into their head, and looking for comparable cases on the same page for guidance as to format and style of the entries they add.
- _ _ My biggest misgiving about the prospect of this bot rewriting the whole of the list in one pass is that it would exacerbate what is (and always will be) a tendency toward two complementary practices that i consider antithetical to the role of LoPbN as a navigational device:
- Taking wiki articles as a model, and making virtually all nouns and most adjectives links to articles.
- Writing prose or near-prose, rather than strictly structured name/vital-stats/nationality/category-of-notability entries.
- The pages from A through people named Lo are virtually free of lks other than
- those to one bio article per entry,
- the "access to rest of list" box of lks on each page, and
- a few "cross-reference"-type within-LoPbN lks. (IMO, more of these cross-refs would be good, but they are not a current priority for me.)
- The valuable additions, until domesticated, could be expected to encourage internal "link farms" and unstructured miniature (but almost-a-full-long-sentence) biographies in subsequent new entries on the same pages. (I'll be glad to go into why i find such content harmful to the navigational role, but that is largely separate from Alvestrand's question, and IMO should be a separate topic in another section on this page.)
- _ _ I note that apparently only living people have been added, that existing entries have been folded in rather than being replaced or omitted, and that nothing has been added into existing entries. If i'm mistaken, let me know; i'd undoubtedly have other comments after looking harder in response to having my impression corrected. I doubt there is a policy (since it would IMO be seriously at odds with WP traditions!) of gratuitously holding bot applicants "for ransom" while fuller feature-sets are negotiated, and even if there were, i'd probably advocate "licensing" this one as it stands, subject to Alvestrand's recognition of the hazard i've already discussed. Of course that won't stop me from publishing my "wish list" of possibly practical enhancements. (I do so with the understanding that i haven't seen the code, and may in any case not be in a position to evaluate it, so these are more blue-sky engineering on my part. I don't even have to hear what the reasons for ignoring my suggestions are, before resigning myself to their going unimplemented.) Its first draft follows:
- The <!-- added by mergenames --> comments are a necessary touch that had not occurred to me. Bravo. A hidden virtue of this is that a period within a comment (which causes an open-comment markup to be copied, without its end-comment markup) can't hide anything in subsequent entries. (Comments in the lead sentence of bios are probably much rarer than comments in LoPbN entries, but you could easily occasionally hide a big chunk of new and existing entries, without these fine automated comments.)
- The best single additional feature i can imagine would be tagging bios that don't begin with the MoS-prescribed name/parenthesized-vital-stats/is-or-was-a/nationality/more-stuff/full-stop format. Lots of bios do and should break out of that, but there are enough that respect it and enough that don't but should, that it would be worth having a Cat (akin to stub-cats) for those that break out, so that editors could then go thru and convert Alvebot-0.2-noncompliant tags to either Mosbio-noncompliant or Alvebot-0.2-false-positive tags (without having to pay attention to the bot-detected compliant ones).
- Almost never should a bio end its first sentence with a blank-letter-period-blank sequence. In fact, pathologically undecidable cases like
-
- Frank is in the St. Charles races today.
- are rare enough to be handled by following up, with our standard enough-eyeballs methods, any automated-tagging false positives that result. In light of that, it might be worth a suite of features that include ignoring periods inside any form of bracketing. (Idly, i'm picturing ''', '', ', {, [, (, <, __, and &1 thru &9 as forms of open-bracket.)
-
- Living-person-Cat'd bios with death dates (e.g. "-2000)") could be flagged.
- All (square-)bracketed constructs could be stripped down to the piped text, or article-title if unpiped; unpiped external lks can have interim value (until they are used in writing a missing article) to editors, but IMO should be hidden from non-editors in inside comments.
- Everything inside the first set of paren, except years, hyphen, and vital stats words (born, died, flourished, floruit, circa, their abbreviations, and ~, *, †) could be stripped, and the v-s words could be standardized to the local standard re abbreviation. (E.g., de: uses *, †, and um where en: uses born, died, and c. I favor b., d., and c., but never mind.)
- Where LoPbN has an entry lacking YoB or nationality, or ends immediately after the nat'ty, the existing entry could be enhanced with info from the bio article.
- Living people are absolutely first priority and dead (or even uncertain ones) far behind. Of course at some point, it would be great for this bot or a closely related one to be able to either do one death-year at a time, and/or to collect (partially?) "flattened" versions of the Category:Deaths by year hierarchy, and then fold them into the existing LoPbN entries.
- What follows occupations (or other sources of notability) could be trimmed off, usually, once half a dozen or a few dozen transitional markers like ", who" or a semicolon are identified.
- "Vulture hyphens", e.g. in
-
- *Hansen, Rolf Arthur (1920-) ...
- could be fixed. (I'm reminded of Zotz!, where the main character, after fiddling with a pentagram, takes the next person he encounters (a reporter who merely says he's from "the Star") to be an occult agent, and is especially frightened by the apparent prophecy implicit in "we have a photo of you in the morgue". (See final 'graph of the morgue article's first section.)) All our living subjects are going to die, but we don't have to be seen preparing for their deaths.
-
- The entries for
- and
- Hansen, Søren (born 21 March 1974), Danish golfer
- involve a couple of levels of complication. Their order (preserved here) probably reflects a tag coded as
- [[Category:Living people|Hansen, Søren]]
- and at one level "the problem" is its being coded with a simple inversion of the title, rather than as
- [[Category:Living people|Hansen, Soren]]
- At this level, there are in theory three compatible approaches to "the fix":
- Going "by hand" thru the Ha section of the Cat, looking for bios listed "out of order", and editing the corresponding bio articles, changing their Cat tags accordingly, before the running of mergenames. (In practice, it almost always suffices to check only the bios with non-English characters in their titles. However, Hans Hoogervorst is one of the exceptions: his Cat tags are (all) unpiped, a problem unrelated to character sets.)
- Enhancing mergenames to do the same kind of inspection; hardly a realistic option!
- Doing a check similar to the first, but only on the completed output of mergenames. This has an advantage over the first: some alpha anomalies become visible only in the context of old names in LoPbN that have no Living-people Cat tags.
- At another level, some editors would, i think, argue that there is no problem: that the collating sequence MediaWiki provides (all non-English characters following Z somewhere, i think) is a standard, and that users of en: just need to learn to get used to that counter-intuitive standard.
- The instance i saw of duplicating an old entry reflects the old entry lk'g to a joint bio, and the new one lk'g a new individual bio. This seems to me a general maintenance issue, & not anything inviting a mergenames solution.
- --Jerzy•t 05:25, 25 April 2006 (UTC)
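(Illustrative sketch, not the mergenames code, which has not been posted here.) The suggestion above about ignoring periods inside any form of bracketing, and about not ending a sentence on a blank-letter-period sequence, could be approximated roughly as follows. The function name `first_sentence` and the choice of bracket characters are assumptions made for illustration; the depth counter does not check that bracket types match, and cases like the "St. Charles" one remain undecided by it.

```python
# Hypothetical helper: find the end of a bio's first sentence while
# skipping periods inside brackets and periods that follow an initial.
OPENERS = "([{<"
CLOSERS = ")]}>"


def first_sentence(text):
    depth = 0  # how many brackets are currently open (types not matched)
    for i, ch in enumerate(text):
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            if depth > 0:
                depth -= 1
        elif ch == "." and depth == 0:
            # A sentence end must be followed by whitespace or end of text.
            at_end = i + 1 == len(text) or text[i + 1].isspace()
            # "blank-letter-period": a lone letter such as the Q in " Q."
            is_initial = (
                i >= 2 and text[i - 1].isalpha() and text[i - 2].isspace()
            )
            if at_end and not is_initial:
                return text[: i + 1]
    return text  # no usable period found: keep everything


lead = "John Q. Public (b. 1950) is a [[U.S. politician|politician]]. He ran."
print(first_sentence(lead))
```

The periods in "Q.", "(b. 1950)" and "[[U.S. politician|...]]" are all skipped, so the cut falls after the closing link brackets.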
Thanks for the feedback! Some issue thoughts:
- on source: I use "Category: Living people" for testing - it seems that approx 8% of the people listed in "living people" are already listed on LoPBN - so we should expect the LoPBN page to grow a lot longer!
- on sort order: "Hansen, Søren" sorts after "Hansen, Svein Roald" because "ø" is an Unicode codepoint with a larger value than "o". I converted to lowercase and used the perl "cmp" operator - I think it's possible to make that sort in locale order, but pretty sure it sorts by Unicode codepoint by default. The nice thing about this sort order is that it's unique and predefined; it's got no other redeeming features.
- on bio string: I wonder if a better approach is to synthesize the bio string from the categories, now that I'm fetching them anyway. So something labeled with Category: 1996 births and Category: English people would get " (born 1996), English" as his bio. I could add the first sentence of their bio inside comments, for ease of editing when the manual editor comes along. That gets rid of the vultures too :-)
- on what is added: What I did was to look at a block of 200 living people, and assume that the article name was unique. If I found the article name on the page, I skipped it; if it wasn't there, I added the person. This seems to work well, apart from the twins murder case - but I think *no* automated process could have caught that, so I'm not worried.
- on existing entries: I haven't touched them at all. I think that's a reasonable approach for a bot to take: Add new information, but don't do anything to damage information entered by hand.
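(Illustrative sketch in Python rather than the perl actually used.) The sort-order point above can be reproduced with a plain lowercase comparison, which orders by Unicode codepoint, next to an accent-folded key. The `fold_key` function and the small `EXTRA` table are assumptions for illustration only; letters such as "ø" have no Unicode decomposition, so they need an explicit mapping, and a real locale-aware collation would be more thorough.

```python
import unicodedata

# Hypothetical mapping for letters that NFKD does not decompose.
EXTRA = str.maketrans({"ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE", "ß": "ss"})


def fold_key(name):
    """Sort key that treats accented letters like their base letters."""
    decomposed = unicodedata.normalize("NFKD", name.translate(EXTRA))
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.lower()


names = ["Hansen, Søren", "Hansen, Svein Roald", "Hansen, Rolf Arthur"]

# Plain string comparison: "ø" (U+00F8) sorts after "v" (U+0076).
by_codepoint = sorted(names, key=str.lower)
# Folded comparison: "Soren" sorts before "Svein".
by_folded = sorted(names, key=fold_key)

print(by_codepoint)
print(by_folded)
```

The codepoint order is unique and needs no configuration, as noted above; the folded order is closer to what an English-speaking reader would expect.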
I'll code up a version with a synthesized bio string and make a test page that uses that. Test pages are good! --Alvestrand 13:55, 25 April 2006 (UTC)
I created Yet Another Sample Output - I think I found a reasonably scalable way to generate bios from categories (with the help of some offline edit). Can we take the discussion to User:Alvestrand/LoP Experiment? --Alvestrand 20:42, 25 April 2006 (UTC)
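(Illustrative sketch, not the actual mergenames code.) Synthesizing a bio string from categories, as described above, might look roughly like this. The `NATIONALITY` table is a hypothetical hand-maintained mapping (standing in for the "offline edit" mentioned), and the function name is invented for the example; only the "1996 births" plus "English people" case comes from the discussion.

```python
import re

# Matches category names such as "1996 births".
BIRTH = re.compile(r"(\d{1,4}) births")

# Hypothetical, hand-edited table from category name to adjective.
NATIONALITY = {
    "English people": "English",
    "Danish people": "Danish",
}


def bio_from_categories(categories):
    """Build a short ' (born YYYY), Nationality' string from category names."""
    born = None
    nationality = None
    for cat in categories:
        match = BIRTH.fullmatch(cat)
        if match:
            born = match.group(1)
        elif cat in NATIONALITY:
            nationality = NATIONALITY[cat]
    bio = ""
    if born:
        bio += f" (born {born})"
    if nationality:
        bio += f", {nationality}"
    return bio


print(repr(bio_from_categories(["1996 births", "English people", "Living people"])))
```

Because the birth year is written as "born YYYY", no open-ended "(1920-)" range is produced. Categories not in the table, such as "Living people", are simply ignored.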
[edit] Mergenames outputs
I've done one live insertion: Taa-Tax is now filled in with the names from Category:Living People.
The result was 136 Kbytes long. I've done some prettying (and found 1 bug in the program). Check it out.
I'll be away on holidays for a week, so nothing more will happen for a while. --Alvestrand 18:30, 27 April 2006 (UTC)
- I'm continuing through the T sections with mergenames - still debugging the code, and manually entering the results. Currently around Thomas, and worrying about the practice of sorting Thomas Aquinas differently from August Thomas.... --Alvestrand 17:33, 26 May 2006 (UTC)
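(Illustrative sketch, not the actual mergenames code.) The add-only rule described earlier in this thread (skip a person whose article name already appears on the page, and leave hand-entered lines untouched) could be expressed roughly as below. The wikitext line format, the `LINK` pattern and the function name `new_entries` are assumptions for illustration; only the `<!-- added by mergenames -->` marker is taken from the discussion. Placing the new lines in the right spot on the page is a separate step not shown.

```python
import re

# Captures the article title of each [[Title]] or [[Title|label]] link.
LINK = re.compile(r"\[\[([^\]|]+)")


def new_entries(page_lines, candidates):
    """Return entry lines for candidates not yet linked from the page.

    page_lines: existing wikitext lines of one LoPbN page (never modified).
    candidates: (article_title, sort_name) pairs, e.g. from a category.
    """
    existing = {
        match.group(1).strip()
        for line in page_lines
        for match in LINK.finditer(line)
    }
    added = []
    for title, sort_name in candidates:
        if title in existing:
            continue  # already listed by hand: do not touch or duplicate
        added.append(f"*[[{title}|{sort_name}]] <!-- added by mergenames -->")
    return added


page = ["*[[Rolf Arthur Hansen|Hansen, Rolf Arthur]] (1920-)"]
people = [
    ("Rolf Arthur Hansen", "Hansen, Rolf Arthur"),
    ("Søren Hansen", "Hansen, Søren"),
]
print(new_entries(page, people))
```

Matching on the article title alone is what lets a joint-bio entry and a later individual bio both appear, as in the duplicate noted above.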
[edit] Thomas as given name
- [That heading was the initial one on Talk:List of people by name: Thom; in the absence of an automatic rdr'n facility, it is retained here to remain a target for links. The section's content now appears in #People well known without reference to a Surname.]
[edit] People well known without reference to a Surname
- [This discussion began at Talk:List of people by name: Thom but its implications make it more appropriate for discussion here.]
[Former heading: === Thomas as given name ===]
I dislike this category. It sorts badly, and I believe it confuses people. People would expect to find Thomas a Kempis either with the rest of the "Thomas, A"s or with the "Thomas, K" people.
It also tempts people to add people that are normally alphabetized by their surname, such as Thomas Ravelli, who I just deleted. But for now, I've just reformatted it to sort better.
--Alvestrand 16:58, 26 May 2006 (UTC)
- It's gone.
--Alvestrand 06:09, 29 May 2006 (UTC)
- _ _ It's unfortunate that i became aware only late last week that this had finally (after 20 some months) become controversial, and couldn't take time to begin putting down a coherent comment on it until now. But (i assume) only the single section, and any subsections it may have contained, have been merged in among the people with surname Thomas. Thus what we have is not so much the fait accompli that your tone suggests, but a page at odds with a fundamental design decision that users are used to and that is still in effect on the many other applicable pages that exist among these roughly 700 LoPbN-tree pages.
- _ _ I think that for now it is best to not restore the separation of given- and sur-names Thomas (which Alv eliminated): it furthers consideration of the overall design decision. I would suggest, however, that Alv forego extending the intermingled approach for now, and that one of us pick out another page with a decent-size "People named ..." section that we can use to do a mock-up of an approach that can at least respond to the bot difficulty (which obviously raised the issue). Much of what i will have to say about the issue will be easier and clearer via concrete examples (the Thom page and the page to be named ASAP) than in the abstract.
- _ _ The user-friendliness issues also deserve to be addressed, but they really are separate; i'll try to get a branched-off thread for that started, elsewhere within this section.
- _ _ My time available for WP will unfortunately be much more restricted in (at least) the next two months than usual, but i will move this design decision to the top of my WP priorities for now. I hope colleagues will bear with me.
--Jerzy•t 14:15, 5 & 16:27, 9 June 2006 (UTC)
- Note - I have no problem with "People named Thomas" sections. The Thomas page was the only one in all the T pages where names were placed differently on the page based on some thought about whether the name was a surname or a given name. On all the other pages, the names were simply sorted on what would be natural lexicographical order when written as they would normally be presented for sorting - that is, "surname, givenname title" for people with both types of name, and "name title" for others. Just to be clear on what the problem was. --Alvestrand 14:59, 5 June 2006 (UTC)
- Yes, i rushed in replying, & proofread inadequately. (Conceptually, an enclosing "People named..." section (or bullet heading) is a prerequisite for an "... as surname" or "Surname..." one, setting a trap for me.)
--Jerzy•t 03:41, 6 June 2006 (UTC)
- At the risk of quibbling, something like 75% to 99% of the hundreds of pages are in what we might as well call "this modified lexicographic order" (MLO). But people known by a given name are rare, and not all of their given names are also common surnames. (The Separating Surnames from Identical Given Names section i just moved happens to point out how surprisingly rare the surname "John" is.) Thus even a list this large will quite seldom make it apparent whether the NLO or MLO is in effect in a given stretch. There are probably at least a few pages, primarily in the last half of the alphabet, where i've never made sure that MLO is adhered to; on the other hand, it is far from certain that it would make a difference on any of them.
--Jerzy•t 16:27, 9 June 2006 (UTC)
- _ _ There are a thousand things that tempt putting Tom K Smith under T, starting with vanity ("i'm known by my first name, just like Madonna", plus the desire to be listed as many places as possible, e.g., under S, To, Th, and K, and on 7 other lists.) This one could be ameliorated by instruction creep: instead of "Thomas as given name", use "People well known by given name Thomas without reference to a surname", but it's unnecessary for careful editors and ineffective for careless ones. WP relies on lots of eyeballs, which happens to be a more effective solution than any design decision.
- _ _ WP (as is easy for us to forget) exists for the sake of readers, not of editors, and i will make some time to state why dumping both kinds of Thomases together is bad for readers even if it makes it easier on contributors and maintainers of entries.
--Jerzy•t 14:10, 6 June 2006 (UTC)
-
- _ _ It's possible, but not obvious, that
- Thomas as given name
- encouraged the addition of
- *Thomas Ravelli, Swedish goalkeeper
- or of (to take just one more out of at least four recent examples) Saresto's addition of [sic!]
- *[[Thomas Aresto (born 1969)Husband and hockey fan
- That more bizarre example suggests that your reorganization is far from likely to eliminate such misplacements -- as does the fact that the Ravelli insertion is the sole LoPbN edit of a 600-and-some-edit contributor over the last 4 months, and that that contributor showed no sign of interest in the absence (which i remedied the other day) of Ravelli on List of people by name: Ra.
- _ _ I doubt interviewing the editors in question would be either feasible or credible. (And i think the spirit (whether or not the letter) of WP:POINT rules out a controlled experiment.) Barring that, even approaching verification of your assertion would mean going back even further than the period preceding my putting (17 January 2004) a bold heading (but not a section heading) in front of 11 given-name-Thomas entries. At the time the page was created (and named List of people by name: Tf-Th) -- presumably by subdividing List of people by name: T, tho i could be guessing wrong -- it had 8 given-name-Thomas entries, of which four were grouped immediately before Thomas, Alma, and three were grouped immediately before two surname-Thomas entries that lay in the midst of the Thu entries. (At that time, there was also a "see"-lk to Thompson, between the two "Thomas, R..." entries that immediately preceded the Thompson entries. Should i just have said "things were still pretty chaotic"?) My real point is that WP was then much smaller, and had a much smaller and plausibly very different editor corps, so the old data may have little relevance. (You would of course have to examine misplacements of the given-name-Thomases as a fraction of some other collection of additions, and not the time-rate of such misplacements.)
- _ _ What i will argue (more specifically, as occasion arises) is that the problem of human-misplaced entries requires ongoing human attention, and even if it didn't, its solution would not be worth the costs of the change you advocate.
- _ _ More to the point, your proposal-by-implementation to change the order of this section implies doing the same for the other dozens of such sections. There may be even more dozens of non-section ranges for names too rare to have separate given- and sur- name sections, or even too rare to be more than a range with a "names beginning with the string ..." section. Only "...as given name" sections "reach out and grab" your attention, and looking at tens of thousands of entries is the only way to find the instances that don't lie within a "people named..." section (or such a bullet heading). My first reaction to your change was to go look at Talk:List of people by name/Individual Entries, in order to cite some relevant material there. (And to move into it some unrelated material i'd intended to add. That helped make clear how badly the page was organized. Today, i've refactored it, mostly in terms of new headings and shuffling of the levels of headings.) Specifically, what is now People known by Given Name without Reference to a Surname. I think the start of that section is the earliest implicit justification (and perhaps the source), exactly a month short of three years ago, of this sequencing. (BTW, IMO it is the greatest single source of the need for sections, and in some cases bullet headings, as additional structure within pages.)
- _ _ In a line: how would you treat User:SGBailey's examples without handling principal-key surnames separately from principal-key given names? (You may have an answer to that; if so we will have a model specific enuf that we can start discussing the negative impact of your model in light of the high multiplicity of variant name-versions of medieval and ancient given-named people.)
- _ _ All of that being said, i grant that the separation of surname- and given-name-sorted entries poses problems for your bot. And i look forward to working with you to find automatable solutions to those problems.
--Jerzy•t 20:31, 8 & 16:27, 9 June 2006 (UTC)
- I have worked out a change to LoPbN, limited in one sense but radical in another, that i think is an important part of responding to this concern; i think it stops short of fully solving the problem you're concerned abt, but points the way to shifting the unreasonable decision burden that the current approach leaves resting on the bot. You may be heartened by my cryptically saying that while i continue to believe that seekers of people not known by surnames cannot be adequately served by mixing those people's entries in with entries for the same name as surname, i now think separate given-name stretches on the respective pages are probably not the best way to avoid that. I'm contemplating an explanation and a mockup (of probably the Th subtree) to illustrate it.
--Jerzy•t 04:30, 19 June 2006 (UTC)
-
- Not sure what your suggested proposal is.... I think it is hard to defend the viewpoint that there is a straight division between surnames, given names and "the name people are known by". In many cases, people will not know (and will not care) what the name they know a certain historical figure by is. (Examples: Dante, Vivaldi, Cicero, Leonardo, Michelangelo - which ones of these are surnames, and what's the logic by which their "known name" is picked?) In such a case, I believe listing the people alphabetically by the name we expect people to look for them by, without care for what the name is supposed to represent, is a more sensible strategy than to try for a division into classes of names that a) isn't formally defensible, and b) is certainly not *mechanically* findable. --Alvestrand 05:44, 19 June 2006 (UTC)
My refactoring of Talk:List of people by name/Individual Entries in the last 24 hours omitted a needed relocation of the section, relevant here, of Separating Surnames from Identical Given Names. If you read it earlier, you missed part of what should be there.
--Jerzy•t 16:27, 9 June 2006 (UTC)
[edit] LoPbN page census
Note - the following Google search string will find all 53 examples of this type of section, I think:
- "as given name" "list of people by name" site:en.wikipedia.org
The search "list of people by name" site:en.wikipedia.org claims to return 59.300 pages - that seems like a lot...
--Alvestrand 15:04, 5 June 2006 (UTC)
- You were searching there for pages containing the string, so that
- allintitle:list-of-people-by-name site:en.wikipedia.org
- would be closer to what you were after; it gets rid of abt 75% of that number. Some of the inflation reflects, i think, a bot that created a rdr for each pageless cell of the 26x26 table. But even inclusion of templates and rdrs doesn't explain to me the 15K that the tighter search leaves. It might be interesting to extract the page titles of the 1000 available hits, sort them into alpha order, and see what that 3% sample looks like.
- My 700 figure reflects what a bot run (IIRC) by User:Jni came up with -- in the 660s -- and allows for some growth since. However, that includes index-only pages. The number of interpage lks on {{List of people by name exhaustive page-index (sectioned)}} + 26 - 3 (for Q, U, & X) + 1 (for the root) usually provides an exact number for pages of the tree, and subtracting the number of bullet points on that page leaves just the pages for actual and potential names -- a more relevant figure. (I just verified that the lines of that page match up with all the mostly-blue-lk'd ones of the 6 alpha sections of User:Jerzy/Argus for LoPbN Templates#List of Templates for particular LoPbN letter combos, so the present moment is part of that "usually" -- and should remain so, until i get to work further subdividing descendants of the T page.)
--Jerzy•t 03:41, 6 June 2006 (UTC)
- A look at Special:Allpages/list of people by name and Special:Allpages/list of people by name: Rf suggests that, contrary to my understanding, the 700-ish figure reflects rdrs in addition to the index-only and names pages. That makes sense out of my recollection that i had a 350-ish figure in mind, based on the calculation that i describe in my last preceding paragraph, and was surprised by the growth implied by the figure i attribute to Jni. I can't remember when i last did the calculation; if i haven't since then, everything but the Google figure seems explained.
--Jerzy•t 04:28, 6 June 2006 (UTC)
- Special:Whatlinkshere/Template:Index only and Category:Lists of people by name indicate about 110 index-only pages; the Cat puts the tree just short of 800, and thus the name-entry-eligible pages in the high 600s; so far this looks to me like a discrepancy. Is Allpages way behind real time?
--Jerzy•t 05:10, 6 June 2006 (UTC)
- My uncritical trust in the bot-generated 660's figure was unwise, since the use of rdrs in the classic (about 3-year-old) rectangular table invites tree-traversing algorithms that would count some pages and entries more than once. While it relies on adherence to design decisions that are not stated in any structured fashion, Template:List of people by name exhaustive page-index (sectioned) and/or the implicit extracts from it within each page's "Access points to page-tree of List of people by name" box are the basis of the simplest reliable algorithms.
--Jerzy•t 14:10, 6 June 2006 (UTC)
strictly BTW.... some other research I did with a reasonably fresh Wikipedia dump leads me to guess that there are approximately 182.000 person articles in the database. If I'm able to verify that, I'd like to use that list as the input for my next pass at LoPBN.... it's about twice the number of the "living people" category, so it shouldn't be that much worse.... but we should expect the number of names in LoPBN to increase roughly by a factor of 5. The number of pages may have to increase that much as well. --Alvestrand 21:00, 9 June 2006 (UTC)
[edit] LoPbN Great Leap Forward
[edit] Census Growth
_ _ In #LoPbN page census (and the 'graph preceding this heading, as of this writing), Alvestrand discusses the prospect of an automated expansion of LoPbN by about a factor of 5. They express the expectation of the number of pages perhaps "increas[ing] that much as well."
_ _ I don't know if that expectation rests on more than my own intuition, which i presume will coincide with most people's first impression: "The various aspects of things increase in proportion to each other."
_ _ Of course, actually that's true for one-dimensional things. Tho it's tempting to think that lists are one dimensional, i suspect that the fact that some pages need splitting because the ToC gets too many entries (too wide a tree) and others because of too many levels (too deep a tree) indicates otherwise. I'm not sure whether that means 2-D behavior (or something between 1-D and 2-D), with page count growing slower than entry count.
_ _ Other complications include
- Many pages don't "fill" even their first section, and only a few pages (the ones that are over 80% "full") are likely to be split 5 ways;
- If pages were 1-D and the statistical distribution of free percentage were uniform, we'd expect a page-count expansion of 2.5, but (while we have very little idea about that distribution) a great many splits produce a big page (which sometimes is nearly as wide but permits a lot of expansion before it gets as deep again) and two empty or very small ones, so that distribution may strongly peak below 20%-full.
- On the other hand, many "half-full" pages will split into three rather than two pages when the number of entries doubles.
_ _ Perhaps Alv. or someone else knows theorems about such info structures that are over my head, and can draw reliable conclusions that i cannot.
--Jerzy•t 03:43, 12 June 2006 (UTC)
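(Illustrative sketch.) One way to read the 2.5 figure above is growth times mean fill, 5 x 0.5, under the stated uniform assumption. The snippet below checks that reading numerically, and also shows what the same assumption gives if each page must end up as a whole number of pages of the original capacity. The fixed-capacity model and the function name `expected_pages` are assumptions made for this example, not something established in the discussion.

```python
import math


def expected_pages(growth, steps=1000):
    """Average resulting page count per existing page.

    Assumes page fill is uniform on (0, 1) of a fixed capacity and that
    every page's entry count is multiplied by `growth`.
    """
    # Midpoints of equal-width fill intervals: 0.0005, 0.0015, ...
    fills = [(k + 0.5) / steps for k in range(steps)]
    # Fractional pages: simply growth times fill.
    fractional = sum(growth * f for f in fills) / steps
    # Whole pages: a page holding 2.3 capacities needs 3 pages.
    whole = sum(math.ceil(growth * f) for f in fills) / steps
    return fractional, whole


fractional, whole = expected_pages(5)
print(round(fractional, 6), whole)
```

Under these assumptions the fractional estimate is 2.5 and the whole-page estimate is 3; the other complications listed above (uneven splits, a distribution peaked at low fill) are not modelled.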
[edit] Making the Leap
Besides the issue user:Alvestrand has raised about separation of given-name entries from surname ones, it may be worth examining the 182K names to anticipate almost exactly what new pages are needed, before inserting the entries. It's hard to be sure what the cost to users is if the navigation to existing entries suddenly becomes dependent on lots of scrolling as in 2003, so pre-building some structure for the new entries to expand into may be the only reasonable decision. I'll think about that process and write more here.
--Jerzy•t 03:43, 12 June 2006 (UTC)
Here's a straw-man plan for avoiding a long period of disruption by the LoPbN GLF. It's intended to elicit a better plan, that takes into account all the issues i haven't thot thru. It has two phases: Steps 1-6 primarily find out fairly precisely the shape the revised LoPbN tree should have, while steps 7 & 8 make the revisions available to users.
- Alv modifies mergenames to read the existing LoPbN page as now, but
- (instead of rewriting the existing page) to put the results into either (LoPbN-GLF-wide decision to be made) a GLF subpage of the existing page, or of its talk page, and
- to transcribe the template for the "access to rest of list" array of links, inside a comment, to avoid accidental linking out to the working LoPbN pages when seeking related GLF sub-pages.
- Mergenames reads Alv's grand list of people, and puts the resulting output into the corresponding GLF pages. (IMO, it's worth omitting the (normally valuable) comments, which are an editing burden and not needed for the first phase.)
- Various editors, certainly including me, further subdivide each GLF page's section structure into a model of a structure with hierarchical sections (and sub-sections) of 25 or fewer entries each. This will inflate many of the ToCs beyond the 25-line size. In many cases it will also require more than the 5 levels of sections and subsections that we normally use. (In fact, it will, probably often, require more than the 6 levels that MediaWiki supports. I suggest blithely (i.e., as if we didn't realize MediaWiki can't render unlimited depths of section nesting) coding section headings with 7 or more equal-signs at each end, since
- the rendered result is reasonably useful to editors, and
- the later use of global replacement, to de-subordinate such headings within newly subordinated pages that no longer need their former top-level headings, is made especially efficient by this practice.)
- Editors comfortable with the process involved break up the GLF subpages to achieve ToCs with 25 lines or fewer, and 5 levels or fewer, omitting the "access to rest of list" link arrays or placing the templates for them inside comments.
- Editors, preferably those comfortable with the use of subst-ed templates for the process of generating the needed markup, create the other needed templates. Note that modification of pre-GLF templates, and changing of which template is called by a pre-GLF page after being subdivided for the first time, are tasks that must be put off for later in the process. (Jerzy should write a tutorial on the general process and this variation on it, and solicit editing of it and demands for clarification.)
- As GLF subtrees complete the preceding steps, editors enter the pages' statuses into a chart similar to Template:List of people by name exhaustive page-index (sectioned) (as to layout and perhaps lks, but not coding).
- When all GLF subtrees have completed the preceding steps, time has passed, and a small fraction of the one-fifth million bios will have been added or deleted since they were placed in the GLF sub-pages. Adding the out-of-date entries would be an improvement, but we can do better by "refreshing" them, via a further modification of mergenames that uses four classes of input:
- The freshest, most complete version of the all-people list,
- the existing entries of the LoPbN tree,
- the structure among the GLF sub-pages, and
- the GLF sub-pages' respective section structures.
- It should use these inputs to do mergenames's normal job (including the familiar mergename comments, which will be valuable in hand-building the descriptive portion of each entry). Especially if mergenames can carry out page-moves, it makes sense to rename the subpages from (main-namespace or talk) subpages to main-namespace LoPbN-tree pages, after overwriting them with the composite content just described. (The alternatives are to leave the no-longer-useful GLF subpages lying around, manually merge them, or delete them and lose their histories.) Subtrees should be added to the LoPbN tree by processing their nodes, leaves first and moving toward the root; each page's processing includes de-commenting the transclusion of the "access to rest of list" template.
- For each GLF subtree whose parent is an existing mainspace LoPbN-tree page, the step that grafts the subtree into the mainspace tree is replacement of the corresponding mainspace page with an index-only page. (This may be done following either of two distinct policies, which to my knowledge have never been discussed; perhaps it's time for me to describe these two policies in some detail - but not in this msg. The difference between them is whether the earlier history of a subdivided page ends up as the history
- of the index-only page or
- of that page's most populous descendant pages.)
--Jerzy•t 18:38, 12 June 2006 (UTC)
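A minimal sketch of the subdivision step described above, assuming the entries for one page are available as a plain list of name strings. The function name `subdivide`, the prefix-based splitting rule, and the `* [[...]]` entry format are illustrative assumptions, not part of mergenames or of any existing LoPbN tooling. It shows only the idea of splitting until every section holds 25 or fewer entries, and of letting heading depth run past the 6 levels that MediaWiki renders.

```python
from itertools import groupby

MAX_ENTRIES = 25  # section-size limit taken from the plan above


def subdivide(names, prefix_len=1, level=2, max_entries=MAX_ENTRIES):
    """Return wikitext lines nesting `names` under prefix headings.

    Hypothetical helper: any section with more than `max_entries` names
    is split again on a one-character-longer prefix. Heading depth is
    deliberately not capped, so 7 or more equal-signs can appear.
    """
    names = sorted(names)  # plain lexicographic order keeps prefixes contiguous
    lines = []
    for prefix, group in groupby(names, key=lambda n: n[:prefix_len]):
        group = list(group)
        marks = "=" * level
        lines.append(f"{marks} {prefix} {marks}")
        # Stop when the section is small enough, or when a longer prefix
        # cannot separate the names any further (exact duplicates).
        can_split = any(len(n) > prefix_len for n in group)
        if len(group) <= max_entries or not can_split:
            lines.extend(f"* [[{n}]]" for n in group)
        else:
            lines.extend(subdivide(group, prefix_len + 1, level + 1, max_entries))
    return lines


if __name__ == "__main__":
    sample = [f"Aa{i:02d}" for i in range(30)] + ["Bob"]
    print("\n".join(subdivide(sample)))
```

Because over-deep headings are emitted with the same pattern as shallow ones, a later global replacement that removes one equal-sign from each end would promote every heading on a newly split-off page uniformly.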
Format Barriers to Expansion
_ _ It's quite possible that i am the predominant barrier to Alvestrand's expansion project. There are two format issues whose prior resolution i have heretofore urged: the exclusion from strict alphabetization of most entries for #People well known without reference to a Surname, and the extension of the subdivision of pages beyond the section-heading structure (using multi-level bullet-headings). I continue to defend the long-term utility of both of these measures, but i am now convinced that their short-term value is not great enuf to justify further delaying the very welcome "Great Leap Forward" (as i term it).
_ _ I support whatever plan Alvestrand settles on for doing the expansion, provided the existing entries are preserved. I presume that means manually or automatically relocating the non-surname entries (automatically recognizable either by their being out of alpha order, or by the wording of their headings), and manually or automatically discarding the bullet headings (automatically recognizable either by their lacking links or by their having (typical) entries subordinate to them). (It would sure be nice to keep, in the course of those deletions, a list of the names, e.g. Thomas, that currently have headings reflecting their being both surnames and given names, with the potential rebuilding of the headings in mind.)
_ _ (And, in the long term:
- It remains my opinion that
- given names that are used as principal sort keys become needlessly hard to locate when interleaved among people with that name as surname, and
- quick searching is facilitated by subdivision of sections sufficiently to justify the extra space consumed.
- And i look forward, following the GLF (and any lesser leaps deemed necessary for adequately reflecting the existing bios that belong to descendants of Category:People), to exploring how to
- get Category:People-descendant tags into LoPbN bios that lack them, and
- keep LoPbN up to date with future additions to Category:People descendants,
- while introducing or restoring aids to access including or like the separate as-given-name headings and the multi-level bullet-headings.)
- The main barrier is, as always, my time and energy.... I am worried about departing from a mechanical alphabetization because the process of automatically adding new people should continue - we'll have the problem of manual re-sorting every time. And I continue to maintain that the distinction between "given name" and "surname" is a local convention of limited applicability. --Alvestrand 19:29, 27 July 2006 (UTC)
- _ _ Your rate is the right one until someone can do it faster, or somehow can help you.
- _ _ Pretend for now that i agree with you, and explain to me later what "local convention of limited applicability" means and how it matters. (But a std comment whose contents begin
- **mergenames** Known by given name
- and an optionally coded (default-No) field named ALPHBYGIVENNAME in the Persondata tag would be a beginning toward making the division mechanizable. But i really gotta run.)
--Jerzy•t 19:58, 27 July 2006 (UTC)
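A rough sketch of how a flag like the proposed ALPHBYGIVENNAME field could make the division mechanical. The `Entry` class, the `divide` function, and the sample names are hypothetical; nothing here reflects how Persondata or mergenames actually stores or reads its fields. Entries without the flag are alphabetized by surname as usual, and flagged entries are collected separately and alphabetized by given name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    given: str
    surname: str
    # Mirrors the proposed optional field; the default corresponds to "No".
    alph_by_given_name: bool = False


def divide(entries):
    """Split entries into (sorted by surname, sorted by given name)."""
    by_surname = sorted(
        (e for e in entries if not e.alph_by_given_name),
        key=lambda e: (e.surname.casefold(), e.given.casefold()),
    )
    by_given = sorted(
        (e for e in entries if e.alph_by_given_name),
        key=lambda e: (e.given.casefold(), e.surname.casefold()),
    )
    return by_surname, by_given


if __name__ == "__main__":
    # Illustrative sample data only.
    sample = [
        Entry("Dylan", "Thomas"),
        Entry("Thomas", "Zed", alph_by_given_name=True),
        Entry("Ann", "Abbot"),
    ]
    surnames, givens = divide(sample)
    print([f"{e.surname}, {e.given}" for e in surnames])
    print([f"{e.given} {e.surname}" for e in givens])
```

Since the flag travels with each entry, an automated run that adds new people would not need manual re-sorting: unflagged entries keep falling into strict alphabetical order, and flagged ones go to their own list.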
Can we also cater this article for the likes of people with numbers in their name??
--Greasysteve13 07:26, 12 July 2006 (UTC)
- These are easy bcz everyone knows how to pronounce them:
- Thirteen X, Clarence (or perhaps, in recognition of the lack of blank, Thirteen-X, Clarence, but that's a trivial issue in light of the small number of Thirteen... surnames).
- Fifty Cent (and of course Fiddy Cent)
- (There's a list of even weirder ones, that IMO deserve less attention than these "near outlying" formats, for more notable people.)
- The answer to the obvious objection "but their names aren't spelled that way" is that we provide many, many misspelled entries (normally duplicating correctly spelled ones, since the correctly spelled ones fit neatly into the alphabet, which is the fundamental finding mechanism for this tree of pages whose reason for existence is finding): An example that comes to mind is Janus, Byron. LoPbN is not where you find the correct spellings, it's where you find the bio article that has the correct spelling(s), the ranks of someone listed as "Israeli soldier", whether they play ice hockey, field hockey, or (for all i know) roller hockey (is that what they play on the roof in Clerks?), and which two years might be Michael Cimino's "c. 1940" year of birth.
--Jerzy•t 17:11, 27 July 2006 (UTC)
- (Uh, LoPbN and the tree it heads are in the main namespace, most of whose members are articles. But Rdrs, Dabs, and lists of articles, including LoPbN and the tree and the individual pages of the tree, should probably not be thought of as articles, any more than Dabs or Rdrs are.)
--Jerzy•t 19:10, 27 July 2006 (UTC)
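A small sketch of the "spell the number the way it is pronounced" idea from this thread, for producing an extra alphabetizable form of a name. The function names and the choice to cover only 0 to 99 are assumptions made for illustration; larger numbers are left untouched rather than guessed at. A hyphen is added when a letter follows the digits directly, matching the "Thirteen-X" variant mentioned above.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def spell(n):
    """Spell an integer in the range 0..99 in English words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def spelled_out(name):
    """Replace each run of digits below 100 in `name` with its spelling."""
    def repl(match):
        n = int(match.group())
        if n >= 100:
            return match.group()  # out of scope for this sketch
        word = spell(n).capitalize()
        next_char = name[match.end():match.end() + 1]
        # Hyphenate when the digits run straight into a letter ("13X").
        return word + "-" if next_char.isalpha() else word

    return re.sub(r"\d+", repl, name)


if __name__ == "__main__":
    for raw in ["50 Cent", "13X"]:
        print(raw, "->", spelled_out(raw))
```

The spelled-out form would serve only as an additional finding aid alongside the correctly spelled entry, in the same way the misspelled duplicates described above do.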
AFD over
The deletion discussion has been closed and the consensus was to keep; however, please do not let that make you complacent. Many comments in the discussion deal with the unwieldiness of the list(s), the benefits of converting this to {{persondata}}, the incompleteness when it comes to existing biographical articles on Wikipedia, and similar concerns. I encourage you to use this input and, if possible, to solicit discussion with those persons, so as to make this list more useful and avoid future nominations. One question I would have is whether this list truly belongs in articlespace, or if it is more useful/less contentious in Wikipedia space. -- nae'blis 22:43, 5 January 2007 (UTC)
Little question
Hey, can any person in the world be posted up there? --Zouavman Le Zouave (Talk to me! • See my edits!) 17:12, 6 January 2007 (UTC)
- No. People who are Notable by Wikipedia standards can be. --Alvestrand 00:14, 7 January 2007 (UTC)
Oh okay... Thanks ^^ I'll try to add as many people as I can find. --Zouavman Le Zouave (Talk to me! • See my edits!) 12:10, 7 January 2007 (UTC)