Wikipedia:LoPbN Meta-structure

From Wikipedia, the free encyclopedia

The list of people by name provides users with means of access to biographical articles on real people. The bulk of the list's multi-page content consists of entries (on pages other than List of people by name), with one link to a bio per entry, and occasionally multiple entries linking to the same bio.

This page concerns the relationships among the list's multiple structures; it addresses specific structures only to the extent useful in helping distinguish among them and describe their relationships.


Overview

The Entry

The structures of this list are assemblages of entries, or higher-order assemblages of structures ultimately made up of entries.

This account of entries is intended to clarify LoPbN entries to the point where the reader can recognize and analyze concrete examples of LoPbN structures. (Most of the remainder of this section will later be moved to another Wikipedia:LoPbN page, with a link to it appearing here in its place.)

Each entry for a biography includes a link bearing the name of (or one of the names for) the person. To further distinguish among people with the same or similar names, or otherwise easily confusable names, nearly every entry includes a very small amount of additional information. It is limited to avoid clutter that could distract and thus obstruct navigation:

  • vital statistics (partial or complete) or a year or period when they "flourished"
  • a nationality, and/or in special cases, national or ethnic origin
  • a very broad description of the area (usually livelihood or calling) that led to the person's notability.

For any given person on the list, there may be more than one entry if the person is known by multiple names: any name that distinguishes them from nearly all other people can label a useful access point, including:

  • changes of name (as by marriage or adoption),
  • pseudonyms and very well known nicknames,
  • variant transliterations into English (e.g. Goedel vs. Gödel), and
  • common or natural misspellings, such as interchanging Holliday and Holiday, or Marden and Martin.
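A minimal sketch of how one bio can be reached from several entries, assuming a hypothetical `Entry` record and a simple wikitext layout (a link, then parenthesized dates and a brief description); the list's actual layout may differ in detail. Each distinguishing name yields its own entry, and every entry links to the same article, piping the displayed name when it differs from the title.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One list entry: a displayed name, the bio it links to, and a short note."""
    shown_name: str  # name as it appears in the list, e.g. "Goedel, Kurt"
    bio_title: str   # title of the biography article the entry links to
    note: str        # distinguishing info: dates, nationality, field

    def wikitext(self) -> str:
        # Pipe the link only when the displayed name differs from the bio title.
        if self.shown_name == self.bio_title:
            link = f"[[{self.bio_title}]]"
        else:
            link = f"[[{self.bio_title}|{self.shown_name}]]"
        return f"{link} {self.note}"


def entries_for(bio_title: str, names: list[str], note: str) -> list[Entry]:
    """Hypothetical helper: one entry per distinguishing name, all pointing at one bio."""
    return [Entry(name, bio_title, note) for name in names]


for e in entries_for("Kurt Gödel", ["Gödel, Kurt", "Goedel, Kurt"],
                     "(1906-1978), Austrian-American mathematician"):
    print(e.wikitext())
```

Keeping the note identical across a person's entries reflects the point that the note only distinguishes people; it is not a place for biographical detail.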

A point that cannot be over-emphasized: the purpose of the list is to locate biographies. Information that appears in the bio is redundant in the list (and information missing from the bio, that could conceivably belong on the list, is always at least as important to add to the bio as to add to the list). Such redundancy is justified only when it aids, as briefly as feasible, the goal of distinguishing the person whose bio the user seeks from another who might be confused with them, in spite of the following crucial given: this list is (and must remain) of significant use only to a user who can rule out almost all entries using their knowledge of either

  1. the spelling of the most important part of the name (usually the surname), or
  2. a handful of reasonable spellings that could have led to their understanding of that part's sound.

Occasionally, an entry on LoPbN has a "red-link" on the name, showing (barring a typo) that at least one editor thought they knew the spelling, and thought a bio would be justified, but the bio article does not yet exist. Such an editor has made a useful contribution by asserting the spelling as correct (or, by piping that spelling to a title with the correct spelling, as being plausible for users to search under) and the notability of the person. This can be regarded as an entry-and-bio suite whose creation is in progress.

On the other hand, it is of course entirely possible that the editors who considered the red-linked entry were all mistaken:

  1. In a "good" mistake, the bio article being sought may exist under a title other than the red-linked one, reflecting
    • a different spelling,
    • another version of the name (transliteration, more or less complete name, or pseudonym), or
    • a different interpretation of where the surname starts.
    In such cases, once the article for the intended person is tracked down, the link needs correction; the mistaken name probably suggests a needed redirect, but the entry needs removal if the error is too idiosyncratic to provide a useful "workaround" link from that name.
  2. In a bad mistake (an entry for a non-notable person), the entry impedes the purpose of the list and needs removal.

Logical vs. Physical

These two terms make more sense seen as relative than as absolute, and more sense metaphorically than literally. The physical level of abstraction here is a view forced on the wiki editor, who edits specific Wikipedia pages in placing an entry, and positions the entry relative to section and bullet markup. The logical view is the least detailed model that can help a user efficiently choose which page and section links to follow, and move her eye up or down in response first to bullet headings (where they exist) and eventually to entries near the one sought.

Logical Structure of the list

The basic logical structure of the LoPbN list of bios is, to say the obvious, that of a list: one entry after another in the fixed order described below. There are two kinds of reasons for providing additional structure:

  1. implementation under the MediaWiki facilities, and
  2. ease of use (visual comprehension and navigation).

Logical issues that arise at the three "physical" levels are similar, and current approaches at these levels follow common principles, resulting in closely analogous structures. (See ---.)

Scheme of Ordering

Significance of Punctuation

Phonebooks and dictionaries in English tend to differ somewhat in alphabetizing: dictionaries usually treat only letters as significant, while phonebooks also give a role to spaces, hyphens, commas, and the like. This phonebook role falls just short of the role given to the end of the text whose spelling determines the entry's position. (More specifically, phone books, and LoPbN, can be said to treat these characters as "coming before the letters in the alphabet", while the end of the text comes before even them. The simplest rigorous account, beyond the scope of this page, takes two complementary forms. One describes a sequence of consecutive sorts of the whole list. The other describes a recursive process: items are first sent, on the basis of a coarse subdivision, to a relatively short series of "holding areas" (in alphabetical order relative to each other); each holding area is then repeatedly subdivided, with its contents sorted more finely into the new subdivisions, until every holding area holds either one item or a group of items whose order relative to each other is of no concern.)

LoPbN follows a version of this phone-book alphabetizing.
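The phone-book rule can be sketched as a sort key. This is a minimal illustration, not LoPbN's exact specification: it assumes that spaces, commas, hyphens, periods and apostrophes all count as one interchangeable "break", that a run of them (such as ", ") collapses into a single break, and that names have already been reduced to plain letters. A dictionary-style key is included for contrast.

```python
# Characters assumed to act as "breaks" (the page names spaces, hyphens and commas "and the like").
SEPARATORS = set(" ,-.'")
BREAK = (0, "")  # ranks below every letter


def phonebook_key(name: str) -> tuple:
    """Phone-book key: end of text < separator < letters (case-insensitive)."""
    key = []
    for ch in name:
        if ch in SEPARATORS:
            if key and key[-1] == BREAK:
                continue  # a run such as ", " counts as one break
            key.append(BREAK)
        else:
            key.append((1, ch.lower()))
    # Python compares tuples element by element, and a tuple that is a prefix of
    # another sorts first, which gives "end of text" the lowest rank of all.
    return tuple(key)


def dictionary_key(name: str) -> str:
    """Dictionary-style key: only letters count."""
    return "".join(ch.lower() for ch in name if ch.isalpha())


names = ["Vanderbilt", "Vance", "Van Dyke", "Van"]
print(sorted(names, key=phonebook_key))   # Van, Van Dyke, Vance, Vanderbilt
print(sorted(names, key=dictionary_key))  # Van, Vance, Vanderbilt, Van Dyke
```

The space in "Van Dyke" sorts before any letter, so the phone-book order puts it right after "Van"; the dictionary order ignores the space and files it under "vandyke".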

Alphabets

Most languages with Latin alphabets include some letters outside the 26 of modern English. A native speaker of such a language may reasonably hold that each of those "extra" letters

  • is always considered, in that alphabet, to follow a similar letter (e.g. n, then ñ) or to follow Z;
  • is still a letter of that alphabet when it appears in LoPbN; and
  • therefore forces words including it to be alphabetized according to the rule of the letter's home alphabet.

At the cost of offending some such colleagues and other users, the alphabetizing of LoPbN is based strictly on the 26 letters of the modern English alphabet. Diacritics are treated as not affecting alphabetization. A handful of further letters of the Latin alphabets that cannot reasonably be treated as "plus diacritic" variations are treated as if they were two-letter combinations.

The logic of this is that

  1. it is unlikely that a scheme for alphabetizing all of these letters on the basis of their native languages' rules could be formulated in terms other than separate rules about, e.g., what Ä goes after when it occurs in a German name, and what it goes after instead when it occurs in a Swedish name;
  2. even if a simpler system than that is possible, it would be so specialized that no editor would pick it up just by their experience editing; and
  3. virtually no non-editing reader would make the effort to study it, and none would grasp it solely by observation.

The full scheme goes far enough to handle troublesome cases, in the long run, as well as is feasible:

  1. An entry whose rendering (whether via unpiped article title or piped rendering) includes a character in a Latin-based alphabet, other than one of the English alphabet's 26 or one of them modified with a diacritical mark, is positioned where it would belong if its special character(s) had been replaced as follows:
    • Æ/æ (ash) by AE/ae
    • Œ/œ (ethel) by OE/oe
    • ß (eszett, a letter that lacks a conventional upper case form) by ss
    • Þ/þ (thorn) by Th/th
    • Ð/ð (edh, or eth) by Th/th
  2. Since
    1. Ð/ð is sometimes transliterated as dh rather than th,
    2. the upper case Ð seems clearly derived from the letter D modified by a diacritic (specifically, crossed on the left with a short horizontal bar), and
    3. this uppercase Ð is for most readers indistinguishable, perhaps even by context, from the Vietnamese (and Finnish?) Đ/đ, which seem clearly to be a consistent pair of diacritic-modified versions of D and d. (This is in contrast with ð, which clearly has undergone further evolution, separate from that affecting d, since the time when it was simply d with a diacritical mark.)
    a list of LoPbN entries involving Ð/ð is being built. The intention is to make it practical to add duplicate entries, e.g. one for Broðir alphabetized as if spelled Brodhir, and (hypothetically!) two for Ðatgai, as if spelled Datgai and Dhatgai (perhaps in addition to an entry spelled That-guy). These parallel the practice of adding a duplicate entry for Hermann Goering besides the direct link to Hermann Göring, to reflect both the native spelling with the diacritic and a common English transliteration.

(In fact, the letters edh (AKA eth) and thorn are so little known among native English speakers that in a WP context no system can in practice deal well with entries where they appear early enough in a name to make a difference. Treating them both as TH is lame and tacky, and "the worst possible system, except for all the others". The only excuse for it is that the cases where it presents problems are a minuscule portion of the list, and that most people trying to find a name they recall as containing Ð/ð, Þ/þ, or ß will probably know enough either to realize that TH or SS is a plausible transliteration, or to consult articles on the language or alphabet in question and learn of the connection.)
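The replacement scheme can be sketched as a folding function that produces the spelling an entry is alphabetized under. This is a minimal sketch under stated assumptions: it spells out the letters from the table, then uses Unicode NFD decomposition to strip combining diacritics. Some letters (Ð, Đ, and also Ø and Ł, which the scheme does not mention) have no Unicode decomposition, so this sketch lists Đ/đ explicitly as D/d, following the text's reading of Đ as D plus a diacritic; Ø and Ł are left untouched. The `duplicate_spellings` helper is hypothetical and only illustrates the proposed Brodhir/Datgai/Dhatgai duplicates.

```python
import unicodedata

# Replacements from the page's table. The two eth forms and the Vietnamese
# d-with-stroke look alike but are distinct code points, so escapes are used.
SPECIAL_LETTERS = {
    "Æ": "AE", "æ": "ae",
    "Œ": "OE", "œ": "oe",
    "ß": "ss",
    "Þ": "Th", "þ": "th",
    "\u00d0": "Th", "\u00f0": "th",  # Ð/ð (eth)
    # Assumption: Đ/đ is treated as D plus a diacritic; NFD does not decompose it,
    # so it has to be listed explicitly.
    "\u0110": "D", "\u0111": "d",    # Đ/đ (d with stroke)
}


def fold_for_sorting(name: str) -> str:
    """Spell out special letters, then drop combining diacritical marks."""
    spelled = "".join(SPECIAL_LETTERS.get(ch, ch) for ch in name)
    decomposed = unicodedata.normalize("NFD", spelled)  # "ö" -> "o" + combining diaeresis
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def duplicate_spellings(name: str) -> list[str]:
    """Hypothetical helper: extra sort spellings for names containing eth.

    Lower-case ð also gets "dh"; capital Ð also gets "Dh" and "D".
    """
    dh = name.replace("\u00f0", "dh")
    candidates = {dh.replace("\u00d0", "Dh"), dh.replace("\u00d0", "D")}
    primary = fold_for_sorting(name)
    return sorted(fold_for_sorting(c) for c in candidates if fold_for_sorting(c) != primary)


print(fold_for_sorting("Gödel"), fold_for_sorting("Æthelred"), fold_for_sorting("Straße"))
print(duplicate_spellings("Bro\u00f0ir"), duplicate_spellings("\u00d0atgai"))
```

Note that folding "Göring" yields "Goring", not the transliteration "Goering"; that gap is exactly why a duplicate entry is added for the common English spelling.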

Personal-Name Models

The vast majority of names on LoPbN are either of the West European <given name> <surname> style or the East and Southeast Asian (and traditional Hungarian) <surname> <given name> style, so that listing the names in <surname> <given name> order (with a comma for names inverted into that order, and without any otherwise) is a convention whose omnipresent examples outside WP obviate any thought of explaining it to users. It is the deviations from those conventions that call for thought about minimizing confusion and misunderstanding. The common deviations in LoPbN are:

  1. People known primarily by names that include titles: monarchs, other nobility, hierarchs (popes, patriarchs, and a few other variations)
  2. Other names including "the" or "of"
  3. Arabic names (which may supplement a given name with one or more phrases consisting of the equivalents of "son of <father's given name>", "father of <first son's given name>", and "from <place name>")
  4. Miscellaneous, including
    1. names like Javier Perez de Cuellar and Per Brinch Hansen where the division between given name and surname is ambiguous,
    2. names like Rose Marie that are ambiguous between being a compound given name or a given-name surname combination,
    3. nicknames and pseudonyms that are more like phrases than conventional names, such as Meat Loaf and Method Man, and
    4. names chosen to intentionally violate perceived naming conventions, such as FM-2030.

The major strategies are

  • adapting some deviations into quasi-European form (e.g. separate entries for "Meat Loaf" and "Loaf, Meat"), and
  • where chaos threatens (e.g. lists of monarchs known by given name, Roman numeral, and country ruled, which could invite the question of whether people surnamed James can appear between kings named James), segregating the "troublemakers" under labels that help to document the different approach:
      • Simon as given name
          • Simon IV
          • Simon V
      • Simon as surname
          • Simon, Neil
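Both strategies can be sketched in a few lines. This is a minimal illustration under assumptions not stated by the page: a phrase-like name is inverted at its last word, and an entry is treated as surname-first exactly when the shared word is followed by a comma. Both function names are hypothetical.

```python
def quasi_european_forms(phrase_name: str) -> list[str]:
    """Entries for a phrase-like name: as written, and inverted at the last word."""
    words = phrase_name.split()
    if len(words) < 2:
        return [phrase_name]
    return [phrase_name, f"{words[-1]}, {' '.join(words[:-1])}"]


def segregate(word: str, shown_names: list[str]) -> dict[str, list[str]]:
    """Split names starting with `word` into "given name" and "surname" groups.

    Assumption: "<word>, ..." marks an inverted (surname-first) entry; "<word>"
    followed by a space, as in "Simon IV", marks a given-name entry.
    """
    groups = {f"{word} as given name": [], f"{word} as surname": []}
    for shown in sorted(shown_names):
        if shown.startswith(word + ","):
            groups[f"{word} as surname"].append(shown)
        elif shown == word or shown.startswith(word + " "):
            groups[f"{word} as given name"].append(shown)
        # Anything else (e.g. "Simonov, ...") belongs to a neighbouring subdivision.
    return groups


print(quasi_european_forms("Meat Loaf"))
print(segregate("Simon", ["Simon, Neil", "Simon V", "Simonov, Konstantin", "Simon IV"]))
```

Plain string sorting puts "Simon IX" before "Simon V"; a real list of monarchs would need to order regnal numbers numerically.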

Scheme of Subdivision into Sub-lists

Entries are grouped in a hierarchy of subdivisions of various types, for various related purposes.

A fundamental concept of LoPbN's logical structure is that of a sublist, a portion of a list consisting of all its elements that lie within one range of that list. More formally,

a smaller list is a sublist of a bigger one
if and only if
the rest of the bigger list consists of one or two lists that could
be joined end to end to the smaller one,
and thus reproduce the bigger list.

(For instance, a list created by including every second element from a parent list would not be a sublist of the parent list, because getting back to the parent list requires inserting elements.)

The most basic principle practiced in subdividing LoPbN is that nothing is subdivided except into sub-lists of it: the whole list is subdivided only into sub-lists of itself, and each sub-list is further subdivided only into sub-lists of itself (each of which is of course a sublist of LoPbN).
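The sublist definition and the subdivision principle translate directly into two checks. This is a minimal sketch that treats lists as Python sequences; the function names are hypothetical. The first test mirrors the page's own example: taking every second element does not give a sublist.

```python
from typing import Sequence


def is_sublist(small: Sequence, big: Sequence) -> bool:
    """True if `small` is one contiguous range of `big`, so that the rest of `big`
    is the (possibly empty) part before it plus the part after it."""
    small, big = list(small), list(big)
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def is_valid_subdivision(parts: Sequence[Sequence], whole: Sequence) -> bool:
    """True if the parts, joined end to end in order, reproduce `whole`.

    That makes every part a sublist of `whole`; empty parts are allowed, like
    pages that exist only to receive future entries.
    """
    joined = [item for part in parts for item in part]
    return joined == list(whole)


whole = ["Aa", "Ab", "Ac", "Ad"]
print(is_sublist(["Ab", "Ac"], whole), is_sublist(whole[::2], whole))
print(is_valid_subdivision([["Aa", "Ab"], ["Ac", "Ad"]], whole))
```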

Types of Subdivision

Beyond the sub-list restriction, groups are kept as intuitive as feasible, via simplicity and naturalness:

  • The fundamental kind of subdivision ("Type I") consists of all names of the list that begin with the subdivision's common string; it can be said to have a depth equal to the length of that common string. An example is the set of all names beginning with BA (or Ba, bA, or ba), called "Ba" for short; it has depth 2.
  • Another kind of subdivision ("Type II") is a specific kind of alphabetic range within a Type-I subdivision X of depth n: namely, a range that is the union of consecutive Type-I subdivisions of depth n+1 within X, each determined by a string one letter longer than the one that determines X. Its depth is n+1, the common length of those strings. An example is all the names from the first name beginning with BAD to the last name beginning with BAG, called "Bad-Bag" or "Bad - Bag" for short (depending on context); it has depth 3.
  • Type III subdivisions are special variations on Type I, based mostly on the presence of blanks in a name, and discussed elsewhere.
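Membership in the first two subdivision types can be sketched as follows. This is a minimal illustration with hypothetical function names; it compares case-insensitively and ignores the separator handling of the full ordering scheme. For a Type-II range the depth is the common length of its bounds, so only the first that many characters of a name matter.

```python
def in_type1(name: str, prefix: str) -> bool:
    """Type I: every name beginning with `prefix`, case-insensitively (depth = len(prefix))."""
    return name.lower().startswith(prefix.lower())


def in_type2(name: str, first: str, last: str) -> bool:
    """Type II, e.g. "Bad-Bag": names whose first len(first) letters fall between
    `first` and `last` (depth = len(first))."""
    if len(first) != len(last) or first[:-1].lower() != last[:-1].lower():
        raise ValueError("bounds must share all but their last letter")
    head = name.lower()[:len(first)]
    return len(head) == len(first) and first.lower() <= head <= last.lower()


print(in_type1("Badger, Jim", "Ba"))          # in "Ba"
print(in_type2("Bagzzz, Ann", "Bad", "Bag"))  # "Bagz..." lies beyond "Bag" but is in range
print(in_type2("Bah, Ann", "Bad", "Bag"))     # outside the range
```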

These restrictions are tighter than may be obvious. For instance, the practice in some reference books of division into volumes of equal size will inevitably produce some volumes like the "HOW to HUG" encyclopedia volume in the old "I wanted to learn how to hug" joke. LoPbN could have a Ho-Hu subdivision, but if ranges within it grew and forced further subdivision, the Type-II Ho-Hu would be abolished in favor of, say, Ho, Hp-Ht, and Hu (Types I, II, and I, respectively), and the Type-I subdivisions Ho and Hu could have further subdivisions. Similarly, these rules preclude all "unbalanced" subdivisions such as Ho-Hug.
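The balance rule, and the replacement of Ho-Hu by Ho, Hp-Ht, and Hu, can be sketched as label checks. This is a minimal sketch under the assumption that labels are written "First-Last" or "First - Last" (a single bound meaning a Type-I label), and it only covers replacing a subdivision by others of the same depth; all function names are hypothetical.

```python
def parse_label(label: str) -> tuple[str, str]:
    """Parse "Hp-Ht" or "Hp - Ht" into ("hp", "ht"); a Type-I label "Ho" gives ("ho", "ho")."""
    first, _, last = label.partition("-")
    first, last = first.strip().lower(), last.strip().lower()
    return first, (last or first)


def is_balanced(label: str) -> bool:
    """Both ends have the same length and differ at most in their last letter."""
    first, last = parse_label(label)
    return (len(first) == len(last) >= 1
            and first[:-1] == last[:-1]
            and first[-1] <= last[-1])


def last_letters(label: str) -> list[str]:
    """The final letters a balanced label covers, e.g. "Hp-Ht" -> p, q, r, s, t."""
    first, last = parse_label(label)
    return [chr(c) for c in range(ord(first[-1]), ord(last[-1]) + 1)]


def can_replace(old: str, new: list[str]) -> bool:
    """Can `old` be split, in order, into the same-depth subdivisions in `new`?"""
    if not all(is_balanced(lab) for lab in [old, *new]):
        return False
    parent = parse_label(old)[0][:-1]
    if any(parse_label(lab)[0][:-1] != parent for lab in new):
        return False
    covered = [c for lab in new for c in last_letters(lab)]
    return covered == last_letters(old)


print(can_replace("Ho-Hu", ["Ho", "Hp-Ht", "Hu"]))  # the page's example
print(is_balanced("Ho-Hug"))                        # an "unbalanced" label
```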

(It may be worth noting that "Bad - Bag" is likely to be mis-called "Bad through Bag": it includes names beginning Baga... and Bagz..., all of which are beyond "Bag". Strict logic would call for names like "List of people by name: Ba..." and "List of people by name: Bad - Bag...", but there have been neither complaints about that lapse of logic, nor reasons to infer errors caused by misunderstanding its intent.)

Conventions Applicable to Naming Sublists

Even without causing errors, the "Bad-Bag" style implies awkward language. The following examples show that and another convention at work. The convention used from 2005 on distinguishes page titles from section and bullet-heading titles: as indicated below, a page title would end (following the colon) with "name Ho", while a corresponding section, bullet-point, or abstract subdivision would be labeled "People named Ho".

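Each example gives the names near the start and end of a subdivision, followed by its name under the 2005 convention and under an alternative convention:

  • Albert Ho to Robert Hoyzer: 2005 name "Ho"; alternative name "Ho..."
  • Albert Ho to Tommy Ho: 2005 name "name Ho" (for a page) or "People named Ho" (otherwise); alternative name "Ho"
  • Lew Hoad to Robert Hoyzer: 2005 name "Hoa-Hoz"; alternative name "Hoa - Hoz..."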

(Note that the spaces around the hyphen in the alternative form "Hoa - Hoz..." are intentional. Besides improving readability, they make clear that the ellipsis applies to "Hoz" rather than to the range that Hoz ends.)
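The two naming conventions in the examples can be sketched as one function. This is a hypothetical helper: the three `kind` values ("prefix" for all names beginning with a string, "range" for a Type-II range, "exact" for people whose name is exactly that string) and the `style` switch are assumptions made for illustration.

```python
def subdivision_name(kind: str, first: str, last: str = "", *,
                     for_page: bool = False, style: str = "2005") -> str:
    """Label a subdivision under the 2005 convention or the alternative one."""
    if kind not in {"prefix", "range", "exact"}:
        raise ValueError(f"unknown kind: {kind}")
    if style == "2005":
        if kind == "prefix":
            return first
        if kind == "range":
            return f"{first}-{last}"
        # Exact-name subdivisions read differently in page titles and headings.
        return f"name {first}" if for_page else f"People named {first}"
    # Alternative convention: an ellipsis marks "and longer names beginning so";
    # spaces around the hyphen keep the ellipsis attached to the range's end.
    if kind == "prefix":
        return f"{first}..."
    if kind == "range":
        return f"{first} - {last}..."
    return first


print(subdivision_name("exact", "Ho", for_page=True))        # name Ho
print(subdivision_name("range", "Hoa", "Hoz", style="alt"))  # Hoa - Hoz...
```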

The "People named..." heading is also imperfect in expressing its scope: arguably Javier Perez de Cuellar is among people who are "named Perez", since Perez is the first unit of his surname, and is sometimes used alone for him. But "named Al" (or even "Named al") doesn't express at all why there might be an entry for al Tikriti, Saddam Hussein before Alanson, James: "surname" does not describe "al Tikriti" very well, and (even more clearly) "al" is not a name in the context "Saddam Hussein al Tikriti". (It is roughly a grammatical article, perhaps "the", with that name for him meaning roughly "Saddam Hussein, the one of that name who comes from Tikrit".)

Subdivision Levels

Subdivisions paralleling LoPbN logical sublists are used at three different levels.

At the highest of the three levels, each of the main (article) name-space pages associated with the list corresponds to a Type-I, -II, or -III subdivision. (Those that are parents of other pages in the tree of such pages are all Type-I pages.)

At the middle level, all of those pages that are leaves of the tree of pages contain entries (or exist to be ready to receive the entries for names that may in theory eventually be added). Those that have more than about a screenful of entries are divided into multiple wiki sections of entries; each such section (including sub-...sections) corresponds to a Type-I, -II, or -III subdivision.

At the bottom level, some pages have been converted from having each section a "flat" bulleted list to a multi-level bulleted list in each section. Each hierarchical group of entries subordinate to a bullet-heading at any level is a subdivision.

"Physical" Structures of the List

The structures discussed in the following sections are "physical" in that each amounts to a separate mechanism that supports presenting the content of entries to the user in the logical structure already described.

These structures developed in stages, driven largely by needs that have emerged as the size of the list has increased, but also by availability (and awareness) of appropriate MediaWiki facilities.

Realizations of Sub-list Structure

This is the primary physical structure of LoPbN, all others being secondary in the sense of being infrastructure for support of it.

The sections are presented in order from larger structures to finer ones.

Inter-page Structure

People Pages

People pages are the indispensable mechanism implementing the sublists of the logical structure. Their sizes are limited by the policy of keeping pages smaller than about 32 K-bytes.

Human factors and manageability also enter into page splits. As of January 2006, most pages are limited by keeping the depth of their section tree to 5 or fewer, and the number of lines in the ToC to around 20. (A first careful observation of a public-use Internet PC in a library suggests that 14 ToC lines may be a more inclusive figure that does not go too far into diminishing returns.) It must be borne in mind that this restricts not just the number of sections but thereby the number of entries, since the screen size is similarly treated as limiting the number of entries (and bullet headings among them) per section; in the case of the observation already cited, a screen held 16 bullet points in total, plus a normal top-level section heading (e.g. == Baa - Bab == on List of people by name: Baa-Baj).
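The limits above can be collected into a rough "does this page need splitting?" check. This is a hypothetical sketch: the `PageStats` record is invented for illustration, "about 32 K-bytes" is taken as 32 × 1024 bytes, and the 16-bullet screenful is applied per section as an assumption drawn from the single observation cited.

```python
from dataclasses import dataclass, field

MAX_BYTES = 32 * 1024       # "about 32 K-bytes"
MAX_SECTION_DEPTH = 5
DEFAULT_TOC_LINES = 20      # the library-PC observation suggests 14
BULLETS_PER_SCREEN = 16     # bullet points visible alongside one section heading


@dataclass
class PageStats:
    size_bytes: int
    section_depth: int
    toc_lines: int
    bullets_per_section: list = field(default_factory=list)


def split_reasons(page: PageStats, toc_limit: int = DEFAULT_TOC_LINES) -> list[str]:
    """List every limit the page exceeds; an empty list means no split is needed."""
    reasons = []
    if page.size_bytes > MAX_BYTES:
        reasons.append("size")
    if page.section_depth > MAX_SECTION_DEPTH:
        reasons.append("section depth")
    if page.toc_lines > toc_limit:
        reasons.append("ToC length")
    if any(n > BULLETS_PER_SCREEN for n in page.bullets_per_section):
        reasons.append("section taller than a screen")
    return reasons


print(split_reasons(PageStats(20_000, 3, 18, [10, 17]), toc_limit=14))
```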

Page Indexes

A page index on a LoPbN page is a set of links to other such pages; indexes have appeared on sub-list pages since at least October 2001.

From the time when index-only pages appeared, the indexes on both kinds of main-space LoPbN pages have adhered to common formats.

Index-Only Pages

In terms of its development over time, an index-only page is, virtually without exception, a page that was previously a people page but lost all its name entries when it was subdivided into new people pages.

While a people page has an index for navigation to other pages, an index-only page has the same index it would have had if it had stayed a people page, plus links to the child pages it lost its entries to: the index can still be used to navigate to its ancestor, sibling, uncle, great-uncle, etc. pages, but it is most used to navigate deeper into the tree, to one of its child pages.
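The set of pages an index reaches can be sketched from the tree of pages. This is a minimal sketch, not the list's actual templates: it assumes a hypothetical `children` mapping with shortened page titles, and lists, for each page on the path from the root down to the current page, that page's children. That produces siblings, uncles, great-uncles and so on, plus the page's own children when it is index-only.

```python
def index_links(page: str, children: dict[str, list[str]], root: str) -> list[str]:
    """Pages an index on `page` links to, in outline order."""
    parent = {kid: p for p, kids in children.items() for kid in kids}
    path = [page]
    while path[-1] != root:
        path.append(parent[path[-1]])
    path.reverse()  # root first, `page` last
    links = [root]
    for node in path:
        links.extend(children.get(node, []))  # a people page (leaf) adds nothing
    return links


# Hypothetical, abbreviated tree: "B" and "Ba" are index-only pages.
tree = {
    "LoPbN": ["A", "B"],
    "B": ["Ba", "Bb-Bz"],
    "Ba": ["Baa-Baj", "Bak-Baz"],
}
print(index_links("Ba", tree, "LoPbN"))
```

The index-only page "Ba" and its child people page "Baa-Baj" get the same list of links, and each list includes the page itself, so a shared index only works well if links from a page to itself are handled gracefully.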

Section Structure

Intra-section Structure

The structure within a section, a multi-level bulleted list, is at this writing a pilot project implemented on a few dozen pages.

Index Infra-structure

The minimal infrastructure to implement the people and index pages is an array, on each page, of hand-coded link markup for each page, seen as worth the space and clutter it entails. In an environment of constantly increasing numbers of entries, and thus of newly created people pages, maintenance of these indexes is both labor-intensive and error-prone. A series of mechanisms has served to progressively ameliorate those problems.

Index Templates

The crucial element of the index infrastructure since early 2004 is the index template. Its introduction may have been sparked by the self-link rendering mechanism. Links from a page to itself (especially where piped, and especially where the user reached the page via a redirect) are at best ugly, often surprising, and at worst a source of incomprehensible, repeated frustration. Suppressing self-links alleviated this, and made it possible for identical markup to implement the indexes.
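Why self-link suppression lets one piece of index markup serve many pages can be sketched by emulating it. MediaWiki displays a link to the current page as bold text rather than as a link; this hypothetical `render_index` function models that result in wikitext form, with the title prefix and the " · " separator as assumptions.

```python
def render_index(titles: list[str], current: str,
                 prefix: str = "List of people by name: ") -> str:
    """Show how shared index markup would look on page `current`."""
    parts = []
    for title in titles:
        target = prefix + title
        if target == current:
            parts.append(f"'''{title}'''")  # self-link: bold text, not a link
        else:
            parts.append(f"[[{target}|{title}]]")
    return " · ".join(parts)


shared = ["Ba", "Bb-Bz"]  # the same list is used on every page in the group
print(render_index(shared, "List of people by name: Ba"))
print(render_index(shared, "List of people by name: Bb-Bz"))
```

Each page in the group shows its own entry in bold and links to the others, without any per-page editing of the index.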

Index Meta-Templates & Subdivision Templates

Subdivision-Template-Generating Templates

(Proposed) Whole-List Infra-structure

(Proposed) Inter-Wiki Entry Structure

Progressive Development of Inter-page Structure Features

  1. 26 sublists, each with index to the rest
  2. Additional sublists in 2nd level, and corresponding index-only pages at 1st level
  3. Rectangular table (and 2-level indexes?)
  4. Unlimited levels
  5. Indexes supported via shared templates
  6. Meta-template-supported indexes
  7. Outline (or exhaustive) index
  8. Generated templates
  9. (Proposed) global template