[Stable draft; last amended 11:04, 27 March 2008 (UTC). Development page: WT:NOWRAP.]
Better markup for the hard space
A proposal to implement ,, as markup for the hard space (&nbsp;)
- 1. Overview
- 2. Technical details
- 3. Objections and replies
- 4. Implementation
1. Overview
The hard space is an important but neglected element in good Wikipedia editing. It prevents an unwanted line break, so it is also called a no-break space or non-breaking space. One example among very many: no line break should occur within "17 sq ft". At present there are two ways to achieve this: first, with the raw HTML code (17&nbsp;sq&nbsp;ft); second, with the {{nowrap}} template ({{nowrap|17 sq ft}}). These options are hard to remember, hard to input, and hard to interpret on the screen. Some cases are far more complex.
The solution? Introduce simple new Wikipedia markup, similar to the existing markup for italic (''italic text'' ) and bold ('''bold text''' ). Although these are converted by the system into HTML code (<i>italic text</i> and <b>bold text</b> respectively), the text always appears in the edit box with the markup '' or ''' .
The proposal simply adapts this useful and accepted idea to the hard space. Extensive discussion among interested editors, followed by a poll, shows that ,, (two ordinary commas) is the best markup. When it is implemented, one could type 17,,sq,,ft in the edit box, which would be converted internally to 17&nbsp;sq&nbsp;ft, so that the reader of the article always sees an unbroken "17 sq ft". In the edit box, editors would still see 17,,sq,,ft. This innovation is easy for experienced editors and welcoming to Wikipedia newcomers, since it follows the same style as the markup for bold and italics.
Analysis shows that comma-based markup could be extended for other formatting and punctuation; but that is beyond the present simple proposal.
2. Technical details
The system's existing parsing of markup for italics ('') or bold (''') is somewhat complex. Those markups are "dual", requiring distinct interpretations as either beginnings (<i>, <b>) or ends (</i>, </b>). They also have to coexist with the use of ' as a single quote mark and as an apostrophe. Italics and bold are often applied together: sometimes overlapping, but more often nested like 'this' (markup: ''nested '''like 'this''''''). The WP community accepts the occasional ambiguity, and where the system fails to parse as an editor intends, workarounds are available.
The proposed markup for the hard space will be much more straightforward – for originating editors, subsequent editors, and the system itself. There are no beginnings and ends, just single applications. Still, some slight complexities will arise, and they are easily dealt with. We start with the simplest case:
The case of ,,
- Exactly two adjacent commas will always be parsed as a hard space, yielding the HTML code &nbsp;. Inadvertent typing of ,, instead of , will cause no serious damage: its effect can easily be detected and repaired.
The case of ,,,
- The markup ,, must coexist with the use of , as an ordinary comma in the text, though the two will rarely be adjacent. If this ever does happen, the natural parsing of ,,, would be as comma + &nbsp;. This could conceivably be needed in a complex subscript, like this: W1, 2, 3, ..., n (markup: W<sub>1,,,2,,,3,,,...,,,n</sub>).
It is hard to think of a case in which the reverse would be needed (&nbsp; + comma); but if that ever did arise, <nowiki></nowiki>, &nbsp;, and {{nowrap}} would always be available, as they are now.
The case of ,,,, (and higher even numbers of commas)
- Sometimes more than one hard space is called for: in fine-tuning spacing in tabular work, for example. (This is deprecated by some, but often a convenient solution.) The natural parsing of ,,,, would be as &nbsp; + &nbsp;. A similar interpretation would apply to any string of 2·n commas: it would be parsed as n instances of &nbsp;. If the editor intended comma + &nbsp; + comma, the existing alternative resources should be used.
The case of ,,,,, (and higher odd numbers of commas)
- The sequence ,,,,, would be most unlikely to occur, and would more often arise as an error than as meaning anything specific. But the natural parsing would be comma + &nbsp; + &nbsp;, and similarly for longer odd-numbered strings of commas: a string of 2·n + 1 commas yields comma + n instances of &nbsp;. If the editor intended &nbsp; + comma + &nbsp;, the existing alternative resources should be used.
In short, there is a single rational parsing for any comma-based markup that could arise; and in the rare cases in which no comma-based markup will yield the desired non-breaking HTML code, alternative resources will meet the need as they do now.
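The parsing rules above reduce to one operation on each maximal run of commas: every pair of commas becomes a hard space, and an odd-length run keeps one ordinary comma at the front. The following is a minimal sketch in Python under stated assumptions, not the MediaWiki implementation. The function name expand_hard_spaces is hypothetical, and the sketch assumes plain wikitext with no <nowiki>, <pre>, template, or other escaped regions, which a real parser would have to skip.

```python
import re

# Entity emitted for each hard space (the HTML code named in the proposal).
NBSP = "&nbsp;"

# Matches a maximal run of one or more adjacent commas.
COMMA_RUN = re.compile(r",+")


def _expand_run(match):
    """Translate one run of commas according to the rules above."""
    pairs, leftover = divmod(len(match.group(0)), 2)
    # An odd-length run keeps a single ordinary comma at the front
    # (so ",,," is comma + hard space); every remaining pair of commas
    # becomes one hard space. A lone comma is 0 pairs + 1 leftover,
    # so it passes through unchanged.
    return "," * leftover + NBSP * pairs


def expand_hard_spaces(wikitext):
    """Hypothetical converter: replace ,, markup with &nbsp; entities.

    Assumes plain wikitext; it does not skip <nowiki>, <pre>,
    templates, or other regions a real parser would leave alone.
    """
    return COMMA_RUN.sub(_expand_run, wikitext)


if __name__ == "__main__":
    print(expand_hard_spaces("17,,sq,,ft"))
```

Applied to the examples on this page, 17,,sq,,ft becomes 17&nbsp;sq&nbsp;ft, and the subscript markup 1,,,2,,,3 becomes 1,&nbsp;2,&nbsp;3. Because a single comma is simply a run of length one, ordinary punctuation passes through unchanged, which is why the rule needs no special case for it.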
3. Objections and replies
Objection 1
Hard spaces? I never use them! Why should I care?
- Good editing requires hard spaces, even if many editors know nothing about them. Wikipedia's Manual of Style (MOS) explains some of their uses; and more uses would be added there, if only the markup were simple enough.
Objection 2
The no-break space can already be input with &nbsp;. Recently this code has been attached to an insert button below the edit box. Isn't that enough?
- There are three points to make.
- First, the code &nbsp; is hard to remember, hard to type, and hard to interpret on the screen, especially for those who are unfamiliar with HTML. One simple example: an en dash often needs a hard space before it. Consider "89 sq in – 3 sq ft". To keep this from breaking improperly you need to type, or later edit, this code: 89&nbsp;sq&nbsp;in&nbsp;– 3&nbsp;sq&nbsp;ft. Under the proposal, you would simply type this instead: 89,,sq,,in,,– 3,,sq,,ft.
- Second, the goal of wikitext is to abstract HTML codes away from the user, especially for common markup such as ''italics'' and '''bold'''. If the hard space is to be used as often as MOS advocates, it also needs this treatment.
- Third, the new insert button does not help with interpreting the code seen in the edit box. It helps with inserting, but you still have to find the button, then find your place in the text again. Do you really want to do all that five times for the example just given?
Objection 3
Why not use the Unicode character for the no-break space, attached to an insert button?
- Insert buttons still do not solve the problem. In contrast to &nbsp;, the Unicode non-breaking space is visually indistinguishable from the ordinary space. This is unacceptable, since the editor needs to be able to see the difference in the edit box.
Objection 4
There is also a {{nowrap}} template. Why not use that?
- This template may be useful for longer phrases that shouldn't be wrapped at any point; but again it is too visually intrusive to be used extensively for single no-break spaces. With the example above, the code would be {{nowrap|89 sq in –}} {{nowrap|3 sq ft}}. And many cases are more complex than that. Currently, {{nowrap}} does not behave well, since a space at the start or end of the enclosed text is rendered in HTML outside that text, leading to unexpected breaks. There are other templates similar to {{nowrap}} that cater for specialised requirements; but all of these come up against technical inadequacies in the template system itself. And they all use the HTML <span> tag, which is interpreted differently by different browsers with unpredictable results. Using &nbsp; instead is simpler, more intuitive, and more reliable in its results on the reader's screen, whichever browser is used. That is what the proposal for ,, achieves.
Objection 5
I use a non-standard editor with aliasing, so when I want a hard space I type /h to get &nbsp;. Easy! So what's the problem?
- There are two problems. The code you make is still hard to read and edit; and not everyone can do what you can do.
Objection 6
I'm used to &nbsp;. That's what I'll always type!
- You could do that, of course. You'd still make markup text that's hard to read and hard to edit. Some people (very few!) type <i> and </i> in the edit box. But they'd make life easier for themselves and others if they just typed '' instead.
Objection 7
The proposal is technically too hard to implement, and is not standard anywhere.
- It is in fact easier to implement than '', ''', and other non-standard markup used at Wikipedia (see Technical details). Wikipedia is the leader in these matters. Our developers have the capacity to innovate, and others would almost certainly respect their precedent and follow it.
Objection 8
The new markup is unintuitive, and unlike anything we have already.
- In fact it is quite intuitive, at least for editors who are already used to '' and '''. Beginners have to learn that markup anyway: the new markup simply uses , instead of '. These are similar but distinct characters, each with its own stand-alone use (apostrophe or single quote mark, and comma), and both also used in markup.
Objection 9
Markup with commas is impractical, and a dead end in development.
- The comma markup has potential to be extended in all sorts of useful ways. Because a comma hardly ever occurs without a space or a digit after it, it is readily available for alternative use. We might in future consider markup like this: ,., (to force a break: equivalent to <br>); ,--, (for an en dash: equivalent to &ndash; or &#8211;); and so on. Such markup could be combined: ,,,--, (for an en dash with a hard space before it: equivalent to &nbsp;&ndash; or &nbsp;&#8211;). But any such extensions would be negotiable. Accepting the present proposal does not commit anyone to them.
Objection 10
German and some other European languages use the character „ as an opening quote. Markup with ,, could easily be confused with that; and that markup might be needed in future for „ or for some other purpose.
- In fact there is little likelihood of the two being confused. In common fonts they look less alike on the screen than '' and " do, and we all live with those quite happily. As for reserving ,, for some other purpose, the present proposal makes its own strong case. If others want to make their claim as well, well and good. But we see no evidence of any such claim, and the hard space is so important that it is unlikely to be trumped by anything more pressing. After all, those who use „ already have their own established ways of inputting it, just as we do for ".
Objection 11
Other changes in the wiki-markup system would be more sweeping and more "wiki-like".
- Certainly the suggestion is unusual in that it proposes substituting the single HTML entity &nbsp; for the single marker ,,. But the simplicity of this substitution is not a principled objection, and certainly not a convincing technical one. The logic of the substitution is laid out in full above. As for the possibility of overall changes that would make this one redundant, that can be no serious objection either. We have long been promised solutions to problems like the one addressed here; but they have not been forthcoming, and there seems to be no consultation about them beyond the technical development community within Wikipedia. The present problem has been identified by serious, active editors concerned with style and ease of use, and we find that the proposed solution does all that is needed. If a better solution eventually comes along, no harm will have been done in the meantime.
Objection 12
It's all too hard! How can editors bring about a change like this?
- It is hard work to bring about such a change. But the rewards for this simple innovation are significant. And Wikipedia is made up of editors with good ideas that they pursue energetically. That's really what it's all about! Developers and decision-makers will take seriously a proposal that grows out of the concerns of competent and committed editors.
4. Implementation
After appropriate discussion at WT:NOWRAP, and possibly at MetaWiki, the coding and system changes for the new markup are a matter for developers. This proposal simply outlines the desired behaviour of the markup (see full specification in Technical details, above), and responds to predicted and actual objections so far.
Once the new feature is in place, it would remain harmless and unnoticed by most editors, since they would rarely if ever input two adjacent commas. (This is a virtue of the proposed markup, in fact.) The community will therefore need to be informed of the change. This can be done at a variety of forums, including:
The details can be determined once the new markup has been accepted for incorporation. In the meantime, nothing is lost through uncertainty or confusion, since the proposal adds a feature but takes nothing away. In this respect it is safer than almost any alternative markup. For example, :: has meanings in mathematics (and : has meaning in current wiki markup); ;; may occur when HTML entities fall adjacent to a semicolon (and ; has meaning in wiki markup); `` has a meaning in some existing systems of wiki markup; and so on.
By notifications as outlined above, and through general community interactions, the new ,, feature should soon become a part of accepted practice, just as '' and ''' are already accepted and appreciated.