Talk:List of people by name/Intra-page structure

From Wikipedia, the free encyclopedia

Subdivisions within an Individual Page of the LoPbN Tree

Theory

The purpose of subdividing the list is accessibility, which consists of reducing two main burdens: time and mental effort. Random choices from an unstructured pool, and linear search in a randomly ordered list, are of theoretical interest only; the fundamental tool needed for avoiding these is binary search, which requires at least a means, such as an alphabetical list, of knowing whether the data found is earlier or later in the collection than what is sought. A perfect binary search of a list of 16K items (LoPbN is larger) usually requires the maximum 14 stages or close to it, but one of the strengths of alphabetical order is that not all binary searches are equal: one requiring several stages of visually performed binary search without intervening changes of the visible data would probably be insignificantly slower than a single stage ending with such a change. The heavy practice given the tasks of remembering the alphabet and visually homing in on a specific letter can make a hierarchical organization, with each level based on the alphabet, highly efficient in both time and conscious mental effort (which is probably many times more boring and fatiguing than equivalent unconscious effort). This seems to be valuable enough to make up for giving up, repeatedly for most real searches, the opportunity for each choice (whether binary or 26-ary) to divide the list into equal parts, as the theoretically optimum performance requires.
--Jerzy•t 09:03, 15 December 2005 (UTC)
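
A quick check of the figures in the comment above, as a minimal sketch rather than anything LoPbN actually runs. It takes "16K" to mean 2^14 - 1 = 16,383 entries (an assumption, chosen so that the deepest search is exactly 14 probes), counts the probes a textbook binary search needs for each entry, and compares that with the number of levels a 26-way alphabetical hierarchy would need. The function names are invented for illustration.

```python
from collections import Counter


def probes_needed(n: int, target: int) -> int:
    """Count the probes a textbook binary search makes to find index `target` among n sorted items."""
    lo, hi = 0, n - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if mid == target:
            return probes
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    raise ValueError("target is outside the list")


def levels_needed(n: int, fanout: int) -> int:
    """Smallest number of levels at which a fanout-way hierarchy can address n items."""
    levels, capacity = 0, 1
    while capacity < n:
        capacity *= fanout
        levels += 1
    return levels


N = 2**14 - 1  # assumed reading of "16K"
histogram = Counter(probes_needed(N, t) for t in range(N))
worst = max(histogram)
mean = sum(k * v for k, v in histogram.items()) / N

print(f"worst case: {worst} probes; share needing the worst case: {histogram[worst] / N:.0%}")
print(f"mean probes: {mean:.2f}")
print(f"levels for a 26-way alphabetical hierarchy: {levels_needed(N, 26)}")
```

On these assumptions about half of the entries need the full 14 probes and the average is about 13, while three alphabetical levels (26 x 26 x 26 = 17,576) would be enough to address every entry, provided the names were spread evenly across the letters, which real names are not.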

Section Hierarchy

In practice, sections (including sub-sections and deeper-level sections) on LoPbN, dating back at least to September 2003, have been used to subdivide pages, making smaller stretches of names lk-able via the table of contents (ToC). However, each new section increases the length of the ToC, and once the ToC is long enough that for some users it fills the height of their screen, having to maneuver the needed ToC portion onto the screen starts to erode the gains sought. (We are not in a position to identify each user's screen height and take it into account, so estimates and perhaps safety margins have to shape the one-structure-fits-all plan.)
--Jerzy•t 09:03, 15 December 2005 (UTC)

Let's make a specific recommendation. I suggest:

Lists of people more than 25 lines long SHOULD be broken up by intermediate sections
Pages with TOCs of more than 25 entries SHOULD be broken up into multiple pages

Then you only have growth of the master index to deal with :-) --Alvestrand 12:09, 28 April 2006 (UTC)
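
The two thresholds above can be sketched as a simple check. This is only an illustration: the data layout (a mapping from section title to its list of entries), the function name, and the idea that each section contributes one ToC entry are all assumptions, and 25 is the figure suggested above rather than a settled rule.

```python
LIMIT = 25  # the figure suggested above, not a settled rule


def review_page(sections: dict[str, list[str]], limit: int = LIMIT) -> dict[str, object]:
    """Report which sections are over the limit and whether the ToC itself is."""
    long_sections = [title for title, entries in sections.items() if len(entries) > limit]
    return {
        "sections_to_subdivide": long_sections,
        "split_page": len(sections) > limit,  # assumes one ToC entry per section
    }


# Made-up entry counts, purely to exercise the check.
page = {
    "Donk": ["entry"] * 3,
    "Donl": ["entry"] * 30,
}
print(review_page(page))
```

A check like this only flags candidates; where to cut is still left to human judgment.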

Experimental Bullet-list Hierarchy

In any case, further subdivision within the roughly screen-high sections is desirable. These pilot pages (listed on User:Jerzy/LoPbN Pilot Pages) permit sampling of the effect of imposing further hierarchy within sections via multi-level bullet-lists:
'Multi-level bulleting', w/ ToC abv Inx
(The superscript in each case indicates the date of conversion to this formatting (within the last quarter of 2005 and early 2006);
pages in bold are those that have been worked on, from their conversion thru '06 Apr 20, by editors other than Jerzy, the originator on 2005 Oct 25 of this pilot scheme.)

Ae^J9 | | | Bera-Berm^D19 | Bern-Berz^D20
Caa-Cal^N4 | | Chb-Che^J12 | Chf-Chq^J18
Cle^F9 | Clf-Clz^O25 | | Consa-Conss ^O25 | Const^O25
Doa-Dom^N4 | Don-Doz^N3 | | | Fis^J20 | | | Haa^D9
Ly^N17 | | | Sf-Sg^J18 | | Sl^J4 | | | Ts-Tt^J11 | | Ty^J4
Web^N11 | | Wu^N4 | Wv-Wz^N4

Other 'ToC abv Inx' (not exhaustive)

All names pages from Aa thru Har... (mainstream of LoPbN GT)
name P
Sf-Sg | | Sma-Smh | Smi | Smj-Smz
Zu

Dab lists & maint dates for editors' use:

Talk:List of people by name: Bau-Baz

Table of Editors since Conversions.
Feedback, preferably on this talk page, will be welcome, now and as further pages are gradually converted. Barring reversal if problems are identified, i anticipate a gradual conversion, page by page, of the entire list.
--Jerzy•t 09:03, 15 December 2005 (UTC)

Discussion of how to use bullets in subdivisions

(the long table of editors made it hard to navigate to this section, so I inserted a subsection... --Alvestrand 14:17, 7 May 2006 (UTC))

I don't like this sub-bulleting scheme much, for two reasons:

It mixes up data (the names) and navigation (the markers) in one class of entity: bulleted items
It requires the script that inserts new entries to keep track of whether it's in a 1-bulleted or a 2-bulleted list.

Of course, I've got a rather "scriptish" bias.... --Alvestrand 11:59, 28 April 2006 (UTC)

_ _ A solution to the mixing might be to make one or the other indented rather than bulleted. Or to use indentation for both, but begin each data or nav line with a character, rather than relying on MediaWiki to supply the bullets. E.g., in place of

Donk
- Donkin, Rufane Shaw (1773-1841), British politician and geographer
Donl
- Donla - Donle
  - Donlavey, Junie (born 1924), American race car owner
  - Donleavy, James Patrick (J.P.) (born 1926), Irish American author
  - Donlevy, Brian (1901-1972), American actor
- Donli - Donlo
  - Donlin, Mike (1878-1933), American baseball player
  - Donlon, Andrea, American scientist

we could have [rewritten by Jerzy fixing my hasty msg]

Donk
    Donkin, Rufane Shaw (1773-1841), British politician and geographer
Donl
    Donla - Donle
        Donlavey, Junie (born 1924), American race car owner
        Donleavy, James Patrick (J.P.) (born 1926), Irish American author
        Donlevy, Brian (1901-1972), American actor
    Donli - Donlo
        Donlin, Mike (1878-1933), American baseball player
        Donlon, Andrea, American scientist

or [end of rewritten portion]

‣ Donk
    • Donkin, Rufane Shaw (1773-1841), British politician and geographer
‣ Donl
    ‣ Donla - Donle
        • Donlavey, Junie (born 1924), American race car owner
        • Donleavy, James Patrick (J.P.) (born 1926), Irish American author
        • Donlevy, Brian (1901-1972), American actor
    ‣ Donli - Donlo
        • Donlin, Mike (1878-1933), American baseball player
        • Donlon, Andrea, American scientist

--Jerzy•t 03:50, 29 April & 23:02, 8 May 2006 (UTC)

_ _ I hope my lack of charity is not too great, regarding the scripts: if the additional navigation serves the user as well as i suppose, neither the effort of editors in maintaining it as new items are added manually, nor the extra programming for the mergenames bot, should weigh too much in our decision. But i suggest there's no need to maintain such flexible code:

If it were clear that we're going to adopt a within-section hierarchy, i'd say write the script to assume it, and i'll start converting to it as rapidly as my sanity permits. (Or i'd describe my current algorithm, we could settle what the algorithm really should be, and presumably a script could do the conversion.)
In the real case, i assume we are not clear that we're going to adopt one, in light of the sparse discussion of it: my quick description, your comment, and one editor (who has never responded to urging to contribute to Talk:List of people by name or its subpages) who removed the bulleting from a page with a summary "reduce insanity" (or maybe one word more). Write the bot for flat sections only; stick to those numerous pages, or we can even strip out the multi-level bulleting on the samples until the torrent of additions i anticipate ebbs. In the meantime, we work out the within-section hierarchy scheme, apply it thruout while suspending mergenames use and working up the new version of mergenames, which assumes within-section hierarchy, and start using the new version when the conversion is completed.
In fact, expansion of within-section-hierarchy pages on the scale Category:Living people/mergenames implies is probably really a task of stripping the multi-level bullets, adding entries, and starting over on the task of building the within-section hierarchy. Maybe the long-term design of mergenames should be more like this:
1. Set a template included on every names page to say "Automated editing in progress; editors who reorder or remove entries are likely to find those changes get lost, and are urged to check them after this message is removed."
2. Generate a flat list from the tree.
3. Let the bot make additions as mergenames now does, to the flat list, using some specification of the Cats to be used as a source of entries.
4. Scan for manually added or modified entries made during the preceding; apply those changes to the flat list.
5. Build a model of the tree structure from the flat list.
6. Lock all the pages of the tree.
7. Replace all the pages of the tree, including splitting pages as needed and creating new pages to hold the contents of the split pages.
8. Unlock the pages of the tree.
9. Make minor changes to reflect additions and entry-modifications made while the model was being built.

Just a thought of mine.

--Jerzy•t 03:50, 29 April 2006 (UTC)
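
Steps 2, 3 and 5 of the list above might look roughly like this. It is a sketch under stated assumptions, not how mergenames works: the tree is modelled as a plain mapping of page title to section title to entries, the function names are invented, and the rebuild simply cuts the flat list into runs of at most `limit` entries instead of choosing alphabetical boundaries or headings. The names are taken from the Don example earlier on this page.

```python
Tree = dict[str, dict[str, list[str]]]  # page -> section -> entries (assumed model)


def flatten(tree: Tree) -> list[str]:
    """Step 2: collapse the tree into one alphabetical flat list."""
    return sorted(
        entry
        for sections in tree.values()
        for entries in sections.values()
        for entry in entries
    )


def merge(flat: list[str], additions: list[str]) -> list[str]:
    """Step 3: add new entries, dropping exact duplicates."""
    return sorted(set(flat) | set(additions))


def rebuild(flat: list[str], limit: int = 25) -> list[list[str]]:
    """Step 5: cut the flat list into consecutive sections of at most `limit` entries."""
    return [flat[i:i + limit] for i in range(0, len(flat), limit)]


tree: Tree = {
    "Don": {
        "Donk": ["Donkin, Rufane Shaw"],
        "Donl": ["Donlavey, Junie", "Donleavy, James Patrick", "Donlevy, Brian", "Donlon, Andrea"],
    }
}
additions = ["Donlin, Mike", "Donlevy, Brian"]  # one new entry, one duplicate

# A limit of 4 is used here only so that the tiny example actually splits.
sections = rebuild(merge(flatten(tree), additions), limit=4)
for section in sections:
    print(section[0], "...", section[-1], f"({len(section)} entries)")
```

The locking, the manual-edit reconciliation (steps 4 and 9) and the writing of pages are left out; they depend on bot and wiki details not settled in this discussion.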

I think the mergenames script can adapt to most things - the most important thing for it is consistency. As a user, I prefer to see consistency too. As an information theorist, I like using different constructs for different things - in this case, the last point is what leads me to suggest that we stick to == (section names) for separating sections, and * (bullets) for marking person names. That gives a structure that I think both people and scripts will find easy to understand.

WRT locking, reformatting etc: The time required for a mergenames run is on the order of 10 mins for a page - I don't think the time is great enough to warrant a heavy-duty locking scheme. What to do about the reformatting is another problem - mergenames COULD insert new sections when detecting that sections become overlong, but Jerzy, you know better how much time it takes you to do page splits. --Alvestrand 14:17, 7 May 2006 (UTC) (back from a week's vacation)

_ _ Sorry about the bulky table (which i added for my own use re getting clear how much various m-l bullet pages have been edited, as a possible gauge on whether others have been impeded by it, without recalling that i'd transcluded here the page it was on). Also sorry about starting to offer two examples, but leaving one example & pointlessly duplicated markup instead; the first example now exists above.

_ _ My only real argument with your "different purposes" objection is that user convenience should trump such considerations. But i may have been too vague about my purposes, especially if you think i am using sections and bullet headings (by which i mean bulleted items with children) for the same thing.

_ _ Actually, in articles in WP, pages and sections each serve essentially the same two purposes, differentiated largely by considerations of size (page big, section small):

navigation (in terms of controlling which information is displayed), and
structuring the text, alerting the user to what's coming next and promoting a mental model of what is being read, which hopefully is part of the knowledge the user seeks.

_ _ In contrast, navigational pages in the main namespace (i don't consider them articles) such as Dabs & LoPbN pages, are seldom read like prose, but rather searched for one link that is followed without any thought of coming back soon for more. (This contrasts with articles, where the user is likely to read most of the sections of a given page.)

_ _ Bullet points are used in articles mostly for references, ext lks, and occasionally for internal lists.

_ _ On big (LoPbN but not generally Dab) nav pages, three levels of scale exist and IMO need to be exploited, always in the service of navigation, because the only structure is navigational, not any structure of knowledge the user wants to carry away. Despite the similarity of the labels on pages, sections, and bullet hdgs, the section lks provide access to an approximate screenful, the page lks provide access to an approximate screenful of section lks, and the bullet hdgs provide access, within one screenful, for the visual navigation that takes place without touching the input devices. The section lks are far more like the page lks than like the bullet points.
Jerzy•t 00:03, 9 May 2006 (UTC)

If we adopt a reasonably fixed guideline (I suggested "25" elsewhere on this page) for section size, that's fairly easy to navigate. The thing with more navigation aids is that they use up screen real estate - the alphabetical ordering is a fairly strong navigational aid in itself, and my feeling is that bullets add to the clutter. In your "Don" example earlier in this section, you have 6 lines of data and 4 lines of clutter - granted, that's just an example, but it illustrates a point. --Alvestrand 11:32, 10 May 2006 (UTC)

_ _ I don't remember your original wording, but rather than "a reasonably fixed guideline", i'd hope you were talking about a threshold for splitting in response to size. The former could invite speculation that there is something wrong with the small pages and sections, near larger ones, that are required to preserve, when a split is made, what i'd call "clean hierarchies" of pages and sections. "Reasonably fixed guideline" also could be taken as support for breaking a section into several sections that push as close to a "target" threshold as possible, often leaving a quite small final section -- rather than sharing the "slack" as equally as possible among sections. (For the same number of sections, big-...-big-small means the large sections, which will probably get additions at a rate in proportion to their sizes, will overflow soon and force redivision, whereas similar-sized sections can usually tolerate the longest series of additions before readjustment.)
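
The difference between filling sections up to the threshold and sharing the slack can be put in numbers. A minimal sketch, assuming a section of 55 entries and a threshold of 25 (both figures are only for illustration, and the function names are invented):

```python
def greedy_sizes(total: int, limit: int) -> list[int]:
    """Fill each section to the limit; the last one takes whatever is left."""
    full, rest = divmod(total, limit)
    return [limit] * full + ([rest] if rest else [])


def balanced_sizes(total: int, limit: int) -> list[int]:
    """Use the same number of sections, but spread the entries as evenly as possible."""
    if total <= 0:
        return []
    count = -(-total // limit)  # ceiling division
    base, extra = divmod(total, count)
    return [base + 1] * extra + [base] * (count - extra)


def headroom(sizes: list[int], limit: int) -> int:
    """Additions the fullest section can absorb before it passes the limit."""
    return limit - max(sizes)


for name, sizes in (("greedy", greedy_sizes(55, 25)), ("balanced", balanced_sizes(55, 25))):
    print(name, sizes, "headroom:", headroom(sizes, 25))
```

With these figures the greedy split gives 25, 25 and 5, so the very next addition to either large section forces a redivision, while the balanced split gives 19, 18 and 18 and leaves room for at least six additions anywhere.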

I meant a guideline for when to split a section, yes. It should be human judgment when and where to split, but 25 seems like a reasonable guess. --Alvestrand 12:53, 13 May 2006 (UTC)

_ _ You characterize the 4 navigational lines as "clutter". Their purpose is to permit the user to do what i think is described as an indexed search rather than a sequential or a purely visual binary search of the dozen or two entries. (Each level of bulleted hdgs is an equivalent to the thumb index on most dictionaries, except that using one hdg gets you to another level of hdgs.) I don't mind your using "clutter" as a rhetorical flourish in place of saying "an indexed search is a waste", but i'd also like to hear why you think that. Would you also say that the indentation of the ToC is wasted info? In fact, take a look at the ToC of List of people by name: Don. According to my quick count, there are 25 lines (excluding "Access to rest of list"), and 8 of them are "clutter" by your standard. I'd conclude you'd probably be happier with a short ToC that saved lines by looking like this:

Donah
People named Donald
Donaldson, A-P
Donaldson, R-W
Donas - Donat
Done
People named Dong
Donga - Dongs
Donh
Doni
Donk - Donl
People named Donn
Donna
People named Donne
Donnel
Donner
Dono - Donu

I wasn't really talking about the ToC when I mentioned "clutter" - in this particular example, the subheadings 2.2 "Donal" and 2.2.1.2 "Donalds" irritate me, but for a different reason - they seem to imply that there are people named "Donal*" that aren't named "Donald*", or people named "Donalds*" that aren't named "Donaldson" - but that's not a big deal for me. And there's only 2 of those. I like the sub-bulleting, for the most part. --Alvestrand 12:53, 13 May 2006 (UTC)

_ _ Also, this is another place where it is relevant that "WP is not paper". Saving "screen real estate" is valuable per se only when using it in a profligate fashion forces the info the user wants off screen; that is not the case here, bcz the users arrive at the right page via lks, and at the right section via a lk in the page's ToC. Even if i were wrong in saying that the bulleted hdgs save the user time, the indirect cost to the user due to wasted space would be far less than the 40% your "wasted space" statistic suggests. The only possible costs to the user would be the response time for downloading of one (or, rarely, more) additional index-only pages (transmission time is not significant, bcz those pages are so small), if the space overhead from bulleted hdgs forces redivision of the page into enough additional sections on the page that the ToC is pushed to or past the splitting threshold, and even then only if that pushes the entry of interest deeper in the tree (rather than just causing that page to be split into two or three pages). And even then, an extra page is downloaded only if the user selects a page from the square table (dropping down two levels in the tree by clicking on lk) and then links down one level at a time to the required depth: using the "exhaustive LoPbN index", the user suffers only from the inflation of the size of that index as a result of deepening of the tree. Bearing in mind that

many sections and many ToCs are far short of the threshold and thus do not split in response to wasting "real estate",
the depth of the tree increases only as a logarithm of the number of pages (rather than proportionally),
the number of lines in the exhaustive index is increased only by one line for each page that is divided into deeper pages (not by splittings at the same depth),
the longest of the 17 sections of the exhaustive index is presently 10 lines (and only 4 more sections are close), so it is far from needing more sections, and
a square index could handle an enormous increase without strain, and without substantially longer access time.

_ _ As to that 4:6 ratio, IIRC i specifically chose a real section that compactly illustrated the principle, i.e. one with 2 levels of hdgs and few entries. The whole page has something like 131 entries, and 45 bullet hdgs, or about half the overhead rate of the sample on this page, i.e. 1:3.
--Jerzy•t 04:17, 13 May 2006 (UTC)
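
For reference, the figures in this exchange work out as follows. A small sketch using only the counts quoted above: 4 heading lines and 6 entries in the sample, and the approximate 45 bullet hdgs and 131 entries given for the whole page (those two are the rough counts stated, not verified ones). The function name is invented.

```python
def overhead(headings: int, entries: int) -> tuple[float, float]:
    """Return (headings per entry, headings as a share of all lines)."""
    return headings / entries, headings / (headings + entries)


sample_ratio, sample_share = overhead(4, 6)
page_ratio, page_share = overhead(45, 131)

print(f"sample: {sample_ratio:.2f} headings per entry, {sample_share:.0%} of lines")
print(f"page:   {page_ratio:.2f} headings per entry, {page_share:.0%} of lines")
print(f"page ratio relative to sample: {page_ratio / sample_ratio:.2f}")
```

On those counts the whole page carries about one heading per three entries, roughly half the rate of the sample, and headings make up about a quarter of its lines rather than 40%.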

Oh, i meant to say that 25 is a figure that works pretty well for my configuration (MS IE; Text size = Medium; Std, Links, Address, & Google tool bars), but i'd like to find the time to see how typical library computers are configured before getting too committed to 25. i also wonder if there's a significant low-vision population using Text size = Larger; if so, maybe we should adopt that as the greatest common denominator. (Ideally, there should be a Prefs switch to use a version with a lower threshold, and on/off tags for "render only for std users" and "render only for low-vision users"! But that's well beyond the scope of this decision.)
--Jerzy•t 04:43, 13 May 2006 (UTC)

My preference for 25 as a limit actually comes from the time of working on text-only terminals back in the 80s :-) - mobiles and PDAs seem to be the only devices that aren't able to display text formatted for those ancient devices reasonably. Blind people tend to like that kind of display because it makes them able to read line-by-line; I don't know much about how people with weak eyesight work. But I like better adopting a number and then adjusting it based on experience than having no number and having case-by-case discussions. And as I said - I like guidelines instead of rules. --Alvestrand 12:53, 13 May 2006 (UTC)

Splitting pages is actually fairly quick work, due to the templated intra-LoPbN nav system and the template generating mechanism that most users are unaware of. But my mention of locking is not about the 10 minute-scale work you are talking about for adding to a single page. Rather, adjusting bullet hdgs is slow work, especially since when splitting a page results from additions, some bullet hdgs would become section hdgs, or be split into several section hdgs, and the remaining bullet hdgs would become less deeply subordinated. Doing that task right may ultimately be something that should be done to the whole tree at once, which is why i implied (or tried to imply) turning all the hierarchy modification over to a bot, which would modify by rebuilding from scratch. I'll know better what else to say if you ask some questions.
Jerzy•t 00:03, 9 May 2006 (UTC)

Issues and ideas about subdivision

This topic has been placed here to invite comments or questions about how the subdivision is being done, or if you have a better idea of how it should be done. It should allow new editors to learn and become more skilled at improving LoPbN pages, and perhaps allow the most experienced editors to pass on their knowledge, which might eventually become a beautiful document that we can all understand. Chris the speller 18:53, 12 March 2007 (UTC)

Large sections

Sections with more than 25 entries should be subdivided. Please add more detail here, or ask how. If you see a section that is too large, and don't have the time, the skill or the interest, please list it in the section below ("Requests for section subdivision"). Chris the speller 18:53, 12 March 2007 (UTC)

Large pages

Large pages should be split into multiple pages. Please add more detail here, or ask how. Chris the speller 18:53, 12 March 2007 (UTC)

How to tell that a page is ready to be split up

I do not know what is too large; having to scroll to see all of the TOC? This would vary, depending on the browser and type size and maybe more. If I see a page that seems too large, I will list it in the section below ("Requests for page subdivision"). Chris the speller 18:53, 12 March 2007 (UTC)

Miscellaneous subdivision questions and ideas

Requests for section subdivision

Please use five tildes (~) followed by a link to the page, and, optionally, a # and the section name, then sign with three tildes. Strike out and sign with four tildes when the work is finished.

18:56, 12 March 2007 (UTC) List of people by name: Char#Charles Chris the speller

02:44, 13 March 2007 (UTC) List of people by name: Cr#Cru Chris the speller

17:24, 16 March 2007 (UTC) List of people by name: Hen#Monarchs given-named Henry (might need a page split as well) Chris the speller

04:08, 19 March 2007 (UTC) List of people by name: Lou#Louis Chris the speller