Lightweight markup language
A lightweight markup language (LML), also termed a simple or humane markup language, is a markup language with simple, unobtrusive syntax. It is designed to be easy to create using any generic text editor, as well as easy to read in its raw form. Lightweight markup languages are used in applications where it may be necessary to read the raw document as well as the final rendered output.
For instance, a person downloading a software library might prefer to read the documentation in a text editor rather than a web browser. Another application for such languages is to provide for data entry in web-based publishing, such as weblogs and wikis, where the input interface is a simple text box. The server software then converts the input into a common document markup language like HTML.
History
Lightweight markup languages were originally used on text-only displays which could not display characters in italics or bold, so informal methods to convey this information had to be developed. This formatting choice was naturally carried forth to plain-text email communications. Console browsers may also resort to similar display conventions.
In 1986, the international standard SGML provided facilities to define and parse lightweight markup languages using grammars and tag implication. XML, published by the W3C in 1998, is a profile of SGML that omits these facilities. However, no SGML DTD for any of the languages listed below is known.
Types
Lightweight markup languages can be categorized by their tag types. Like HTML (<b>bold</b>), some languages use named elements that share a common format for start and end tags (e.g. BBCode [B]bold[/B]), whereas proper lightweight markup languages are restricted to punctuation marks and other non-letter symbols for tags. Some also mix both styles (e.g. Textile bq.) or allow embedded HTML (e.g. Markdown), possibly extended with custom elements (e.g. Wikitext <ref>source</ref>).
Most languages distinguish between markup for lines or blocks and for shorter spans of texts, but some only support inline markup.
Some markup languages are tailored for a specific purpose, such as documenting computer code (e.g. POD, RD) or being converted to a certain output format (usually HTML) and nothing else; others are more general in application. Languages also differ in whether they are oriented toward textual presentation or toward data serialization.
Presentation-oriented languages include AsciiDoc, AFT, atx, BBCode, Creole, Crossmark, Deplate, Epytext, EtText, Haml, JsonML, MakeDoc, Markdown, Org-mode, POD, reST, RD, Setext, SiSU, SPIP, Xupl, Texy!, Textile, txt2tags, UDO and Wikitext.
Data-serialization-oriented languages include Curl (homoiconic, but also reads JSON; every object serializes), JSON, OGDL, SDL and YAML.
Comparison of language features
Language | HTML export tool | HTML import tool | Tables | Link titles | class attribute | id attribute | Release date |
---|---|---|---|---|---|---|---|
AsciiDoc | Yes | Yes | Yes | Yes | No | No | |
AFT | Yes | No | Yes | Yes | No | No | |
BBCode | No | No | Yes | No | No | No | |
Creole | No | No | Yes | No | No | No | |
deplate | Yes | No | Yes | No | Yes | Yes | |
GitHub Flavored Markdown | Yes | No | Yes | Yes | No | No | |
Jemdoc[1] | Yes | No | Yes | Yes | No | No | |
KARAS | Yes | No | Yes | Yes | Yes/No | Yes/No | |
Markdown | Yes | Yes | Yes/No | Yes | Yes/No | Yes/No | |
Markdown Extra | Yes | Yes | Yes[2] | Yes | Yes | Yes | |
MediaWiki | Yes | Yes | Yes | Yes | Yes | Yes | |
MultiMarkdown | Yes | No | Yes | Yes | No | No | |
Org-mode | Yes | Yes[3] | Yes | Yes | Yes | Yes | |
PmWiki | No | Yes | Yes | Yes | Yes | Yes | |
POD | Yes | ? | No | Yes | ? | ? | |
reStructuredText | Yes | Yes[3] | Yes | Yes | Yes | auto | |
Textile | Yes | No | Yes | Yes | Yes | Yes | |
Texy! | Yes | Yes | Yes | Yes | Yes | Yes | |
txt2tags | Yes | Yes[4] | Yes[5] | Yes | ? | ? |
Markdown's own syntax does not support tables, class attributes, or id attributes; however, since Markdown supports inclusion of native HTML code, these features can be implemented using direct HTML. (Note that some extensions may support these features.)
Comparison of implementation features
Language | Implementations | XHTML | Con/LaTeX | DocBook | ODF | EPUB | DOC(X) | LMLs | Other | License | |
---|---|---|---|---|---|---|---|---|---|---|---|
AsciiDoc | Python, Ruby, JavaScript | XHTML | LaTeX | DocBook | ODF | EPUB | No | — | Man page etc. | GNU GPL, MIT | |
AFT | Perl | HTML | LaTeX | No | No | No | No | RTF | — | extensible | not stated[6] |
BBCode | Perl, PHP, C#, Python, Ruby | (X)HTML | No | No | No | No | No | No | — | — | Public Domain |
Creole | PHP, Python, Ruby, JavaScript [7] | Depends on implementation | CC_BY-SA 1.0 | ||||||||
deplate | Ruby | HTML | LaTeX | No | DocBook | No | No | No | — | plain text | GPL |
GitHub Flavored Markdown | Haskell (Pandoc) | HTML | LaTeX, ConTeXt | DocBook | ODF | EPUB | DOC | AsciiDoc, reST | OPML | GPL | |
Java,[8] JavaScript,[9][10][11] PHP,[12][13] Python,[14] Ruby[15] | HTML[9][10][11][13][14] | No | No | No | No | No | No | — | — | Proprietary | |
Jemdoc[1] | Python | XHTML 1.1 | No | No | No | No | No | No | — | — | GPL |
KARAS | PHP, C#, JavaScript, Ruby | (X)HTML5 | No | No | No | No | No | No | — | — | BSD-style |
Markdown | Perl (originally), C,[16][17] Python,[18] JavaScript, Haskell,[3] Ruby,[19] C#, Java, PHP | HTML | LaTeX, ConTeXt | DocBook | ODF | EPUB | RTF | MediaWiki, reST | Man page, S5 etc. | BSD-style & GPL (both) | |
Markdown Extra | PHP (originally), Python, Ruby | XHTML | No | No | No | No | No | No | — | — | BSD-style & GPL (both) |
MediaWiki | Perl, PHP, Haskell | XHTML | No | No | No | No | No | No | — | — | GNU GPL |
MultiMarkdown | C, Perl | (X)HTML | LaTeX | No | ODF | No | DOC, RTF | — | OPML | GPL, MIT | |
Org-mode | Emacs Lisp, Ruby (parser only), Perl, OCaml | XHTML | LaTeX | DocBook | ODF | EPUB[20] | DOCX [20] | Markdown | TXT, XOXO, iCalendar, Texinfo, man, contrib: groff, s5, deck.js, Confluence Wiki Markup, TaskJuggler, RSS, FreeMind | GPL | |
PmWiki | PHP | XHTML 1.0 Transitional | No | No | No | No | No | No | — | — | GNU GPL |
POD | Perl | (X)HTML, XML | LaTeX | No | DocBook | No | No | RTF | — | Man page, plain text | Artistic License, Perl's license |
reStructuredText | Python,[21][22] Haskell, Java, | HTML, XML | LaTeX | DocBook | ODF | EPUB | DOC | — | man, S5, Devhelp, QT Help, CHM, JSON | Public Domain | |
Textile | PHP, Javascript, Java, Perl, Python, Ruby, ASP, C#, Haskell | XHTML | No | No | No | No | No | No | — | — | Textile License |
Texy! | PHP, C# | (X)HTML | No | No | No | No | No | No | — | — | GNU GPL v2 License |
txt2tags | Python,[23] PHP[24] | (X)HTML, SGML | LaTeX | DocBook | ODF | EPUB | DOC | Creole, AsciiDoc, MediaWiki, MoinMoin, PmWiki, DokuWiki, Google Code Wiki | roff, man, MagicPoint, Lout, PageMaker, ASCII Art, TXT | GPL |
Comparison of lightweight markup language syntax
Text formatting
Although usually documented as yielding italic and bold text, most lightweight markup processors output semantic HTML elements em and strong instead. Monospaced text may either result in semantic code or presentational tt elements. Few languages make a distinction, e.g. Textile, or allow the user to configure the output easily, e.g. Texy.
LMLs sometimes differ in how they handle multi-word markup: some require the markup characters to replace the inter-word spaces (infix). Some languages require a single character as prefix and suffix, others need doubled or even tripled ones, or support both with slightly different meanings, e.g. different levels of emphasis.
Language | Bold | Italic | Monospace | Notes |
---|---|---|---|---|
AsciiDoc | *bold text* | 'italic text' | +monospace text+ | Can double operators to apply formatting where there is no word boundary (for example **b**old t**ex**t yields bold text). |
| | _italic text_ | `monospace text` | |
AFT | _bold text_ | ''italic text'' | |monospace text| | ~small text~ |
Creole | **bold text** | //italic text// | {{{monospace text}}} | Triple curly braces are for nowiki which is optionally monospace in Creole (the choice of the implementor). |
Deplate | {text style=bold: bold text} | __emphasized text__ | ''monospace text'' | Deplate discourages visual formatting. Users who want to format text in a particular style have to define style classes in the given output format (CSS, LaTeX). |
Jemdoc[1] | *bold text* | /italic text/ | +monospace text+ | Supports inline LaTeX equations, such as $ f(x) = \frac{1}{x} $ |
KARAS | **bold text** | //italic text// | ```monospace text``` | ***strong text*** , ///em text/// , __ u text__ , ___ ins text___ and the others. |
Markdown[25] | **bold text** | *italic text* | `monospace text` | Markdown doesn't use bold and italic tags, but rather em (typically italic) and strong (typically bold) tags. |
| __bold text__ | _italic text_ | `monospace text` | |
MediaWiki | '''bold text''' | ''italic text'' | <code>monospace text</code> | MediaWiki mostly resorts to inline HTML |
Org-mode | *bold text* | /italic text/ | =code= | _underlined_ |
| | | ~verbatim~ | +strike-through+ |
PmWiki | '''bold text''' | ''italic text'' | @@monospace text@@ | |
reST | **bold text** | *italic text* | ``monospace text`` | |
Setext | **bold text** | ~italic text~ | | _underlined_text_ |
Textile[26] | *strong* | _emphasis_ | @monospace text@ | Semantic strong and em HTML tags |
| **bold text** | __italic text__ | | Presentational b and i HTML tags |
Texy! | **bold text** | *italic text* | `monospace text` | Texy uses semantic tags by default, but can be configured to use presentational tags. |
| | //italic text// | | |
txt2tags | **bold text** | //italic text// | ``monospace text`` | __underlined__ --strike-through-- |
POD | B<bold text> | I<italic text> | C<monospace text> | Indented text is also shown as monospaced code. |
BBCode | [b]bold text[/b] | [i]italic text[/i] | [code]monospace text[/code] | Formatting works across line breaks. |
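The mapping from punctuation-based inline markup to semantic HTML elements can be illustrated with a minimal Python sketch. It assumes a small Markdown-like subset (doubled asterisks for strong, single asterisks for em, backticks for code); the name `render_inline` and the rule list are hypothetical and do not reproduce any particular processor, which would also handle nesting, escaping of markup characters, and word-boundary rules.

```python
import html
import re

# Order matters: the doubled-asterisk rule runs before the single-asterisk
# rule, otherwise the single-asterisk pattern would match inside "**bold**"
# first and produce mis-nested output.
INLINE_RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
    (re.compile(r"`(.+?)`"), r"<code>\1</code>"),
]


def render_inline(text: str) -> str:
    """Convert a small Markdown-like subset of inline markup to semantic HTML."""
    # Escape &, < and > first so raw text cannot inject HTML.
    out = html.escape(text, quote=False)
    for pattern, replacement in INLINE_RULES:
        out = pattern.sub(replacement, out)
    return out


if __name__ == "__main__":
    print(render_inline("**bold text**, *italic text* and `monospace text`"))
```

Swapping the replacement strings for b, i and tt would give the presentational variant that some languages, such as Textile, expose through separate markup.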
Headings
Headings are usually available in up to six levels, but the top one is often reserved to contain the same as the document title, which may be set externally. Some documentation may associate levels with divisional types, e.g. part, chapter, section, article or paragraph.
Most LMLs follow one of two styles for headings, Setext-like underlines or line markers, or they support both.
Underline
    Level 1 Heading
    ===============

    Level 2 Heading
    ---------------

    Level 3 Heading
    ~~~~~~~~~~~~~~~
The first style uses underlines, i.e. repeated characters (e.g. equals =, hyphen - or tilde ~, usually at least two or four times) in the line below the heading text.
Chars: | = | - | ~ | * | + | # | ^ | _ | : | " | ' | ` | < | > | min |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AsciiDoc | 1 | 2 | 3 | No | No | No | No | No | No | No | No | No | No | No | ? |
Markdown | 1 | 2 | No | No | No | No | No | No | No | No | No | No | No | No | ? |
reStructuredText | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | ? |
Setext | 1 | 2 | No | No | No | No | No | No | No | No | No | No | No | No | ? |
Texy! | Yes | Yes | No | Yes | No | Yes | No | No | No | No | No | No | No | No | ? |
reStructuredText and Texy! can determine heading levels dynamically, which gives authors more freedom but complicates merging content from external sources.
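Dynamic level assignment can be sketched as follows: each underline character receives the next free level the first time it appears, in the way reStructuredText orders its section adornments. This is a simplified illustration, not the Docutils algorithm; the function names are hypothetical, and overlines, minimum underline lengths and error reporting are left out.

```python
# Punctuation characters accepted as underlines in this sketch.
ADORNMENT_CHARS = set("=-~*+#^_:\"'`<>")


def is_underline(line: str) -> bool:
    """True if the line is one adornment character repeated at least twice."""
    stripped = line.rstrip()
    return (
        len(stripped) >= 2
        and stripped[0] in ADORNMENT_CHARS
        and stripped == stripped[0] * len(stripped)
    )


def underlined_headings(text: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs; a character's level is fixed on first use."""
    levels: dict[str, int] = {}
    headings = []
    lines = text.splitlines()
    # Look at every pair of consecutive lines: candidate title, then underline.
    for title, underline in zip(lines, lines[1:]):
        if not is_underline(underline):
            continue
        if not title.strip() or is_underline(title):
            continue
        char = underline[0]
        level = levels.setdefault(char, len(levels) + 1)
        headings.append((level, title.strip()))
    return headings


if __name__ == "__main__":
    sample = "Title\n=====\n\nSection\n-------\n\nSub\n~~~\n\nOther\n-----\n"
    print(underlined_headings(sample))
```

Because the levels depend on order of appearance, pasting in a fragment that uses the same characters in a different order changes the resulting hierarchy, which is the merge problem mentioned above.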
Prefix
    # Level 1 Heading
    ## Level 2 Heading ##
    ### Level 3 Heading ###
The second style is based on repeated markers (e.g. hash #, equals = or asterisk *) at the start of the heading itself, where the number of repetitions indicates the (sometimes inverse) heading level. Most languages also support the reduplication of the markers at the end of the line, but whereas some make them mandatory, others do not even expect their numbers to match.
Char: | = | # | * | ! | + | Suffix | Levels | Indentation |
---|---|---|---|---|---|---|---|---|
AsciiDoc | Yes | No | No | No | No | Optional | 1–6 | No |
AFT | No | No | Yes | No | No | No | 1–7 | No |
Creole | Yes | No | No | No | No | Optional | 1–6 | No |
deplate | No | No | Yes | No | No | No | 1–6 | No |
Jemdoc[1] | Yes | No | No | No | No | No | 1–6 | No |
Markdown | No | Yes | No | No | No | Optional | 1–6 | No |
MediaWiki | Yes | No | No | No | No | Yes | 1–6 | No |
Org-mode | No | No | Yes | No | No | No | 1– +∞ | alternative [27][28] |
PmWiki | No | No | No | Yes | No | Optional | 1–6 | No |
Texy! | No | Yes | No | No | No | Optional | 6–1 or 1–6, dynamic | No |
txt2tags | Yes | No | No | No | Yes | Yes | 1–6 | No |
POD and Textile choose the HTML convention of numbered heading levels instead. Org-mode supports indentation as a means of indicating the level. BBCode does not support section headings at all.
Language | Format |
---|---|
POD | =head1 Level 1 Heading =head2 Level 2 Heading |
Textile[26] | h1. Level 1 Heading h2. Level 2 Heading h3. Level 3 Heading h4. Level 4 Heading h5. Level 5 Heading h6. Level 6 Heading |
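The prefix style described above lends itself to a short pattern match. The sketch below assumes Markdown-like rules (one to six hash markers, an optional trailing run of markers whose count need not match); the function name `parse_prefix_heading` is hypothetical, and languages with other marker characters or mandatory suffixes would need a different pattern.

```python
import re

# One to six leading "#" markers, whitespace, the title, then an optional
# trailing run of "#" whose length is not required to match the opening run.
PREFIX_HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def parse_prefix_heading(line: str):
    """Return (level, title) for a prefix-style heading line, else None."""
    match = PREFIX_HEADING.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2)


if __name__ == "__main__":
    for line in ("# Level 1 Heading", "## Level 2 Heading ##", "plain text"):
        print(repr(line), "->", parse_prefix_heading(line))
```

A known limitation of this simple pattern is that a title ending in a hash character loses it, since the trailing run is always treated as markup.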
Link syntax
Hyperlinks can either be added inline, which may clutter the code because of long URLs, or with named alias or numbered id references to lines that contain nothing but the address and related attributes; such lines can often be placed anywhere in the document.
Most languages allow the author to specify text Text to be displayed instead of the plain address http://example.com and some also provide methods to set a different link title Title which may contain more information about the destination.
LMLs that are tailored for special setups, e.g. wikis or code documentation, may automatically generate named anchors (for headings, functions etc.) inside the document, link to related pages (possibly in a different namespace) or provide a textual search for linked keywords.
Most languages employ (double) square or angular brackets to surround links, but hardly any two languages are completely compatible. Many can automatically recognize and parse absolute URLs inside the text without further markup.
Basic syntax | Text syntax | Title syntax | Languages |
---|---|---|---|
http://example.com |
AFT, BBCode, Creole, Deplate, MediaWiki, PmWiki | ||
"Text":http://example.com |
"Text (Title)":http://example.com |
Textile | |
"Text .(Title)":http://example.com |
Texy! | ||
http://example.com[Text] |
AsciiDoc | ||
[http://example.com] |
[Text http://example.com] |
txt2tags | |
[http://example.com Text] |
Jemdoc, MediaWiki | ||
[Name] |
[Text (Name)] |
AFT | |
[Text (http://example.com)] |
|||
[[Name]] |
[[Name|Text]] |
Creole, MediaWiki, PmWiki | |
[[Name][Text]] |
Deplate, Org-mode | ||
[[Namespace:Name]] |
[[Namespace:Name|Text]] |
Creole | |
[[Namespace:Name][Text]] |
Deplate, Org-mode | ||
[[http://example.com]] |
[[http://example.com|Text]] |
Creole, PmWiki | |
[[http://example.com][Text]] |
Deplate, Org-mode | ||
[url]http://example.com[/url] |
[url=http://example.com]Text[/url] |
BBCode | |
<http://example.com> |
[Text](http://example.com) |
[Text](http://example.com "Title") |
Markdown |
`Text <http://example.com/>`_ |
reStructuredText | ||
L</Name> |
POD | ||
L<http://example.com/> |
POD | ||
Text syntax | Title syntax | Languages |
---|---|---|
... Name_ ...
.. _Name: http://example.com
|
reStructuredText | |
... [Text][id] ... [id]: http://example.com |
... [Text][id] ... [id]: http://example.com "Title" |
Markdown |
... "Text":alias ... [alias]http://example.com |
... "Text":alias ... [alias (Title)]http://example.com |
Textile |
... "Text":alias ... [alias]: http://example.com |
... "Text":alias ... [alias]: http://example.com .(Title) |
Texy! |
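How reference-style links keep long URLs out of the running text can be shown with a small resolver for the Markdown form listed above, `[Text][id]` plus a definition line `[id]: http://example.com "Title"`. This is a sketch under simplifying assumptions: the function name `resolve_reference_links` is hypothetical, definitions must sit on their own line, and the link text is inserted without further processing.

```python
import html
import re

# A definition line: [id]: URL, optionally followed by a quoted title.
DEFINITION = re.compile(r'^\[([^\]]+)\]:\s*(\S+)(?:\s+"([^"]*)")?\s*$')
# A reference in running text: [Text][id]
REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]")


def resolve_reference_links(text: str) -> str:
    """Replace [Text][id] with HTML anchors and drop the definition lines."""
    definitions = {}
    body = []
    # First pass: collect definitions, keep every other line as body text.
    for line in text.splitlines():
        match = DEFINITION.match(line)
        if match:
            label, url, title = match.groups()
            definitions[label.lower()] = (url, title)  # ids compared case-insensitively
        else:
            body.append(line)

    def to_anchor(match):
        label_text, ref_id = match.groups()
        target = definitions.get(ref_id.lower())
        if target is None:
            return match.group(0)  # leave unresolved references untouched
        url, title = target
        title_attr = f' title="{html.escape(title)}"' if title else ""
        return f'<a href="{html.escape(url)}"{title_attr}>{label_text}</a>'

    # Second pass: substitute references now that all definitions are known.
    return "\n".join(REFERENCE.sub(to_anchor, line) for line in body)


if __name__ == "__main__":
    source = 'See [Text][id] for details.\n\n[id]: http://example.com "Title"\n'
    print(resolve_reference_links(source))
```

Collecting all definitions before substituting is what allows the address lines to be located anywhere in the document, before or after the references that use them.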
References
- [1] Mattingley, Jacob (2012-11-27). "jemdoc – cheatsheet". jemdoc.jaboc.net. Retrieved 2015-01-27.
- [2] "PHP Markdown Extra". Michelf.com. Retrieved 2013-10-08.
- [3] Pandoc, which is written in Haskell, parses Markdown (in two forms) and ReStructuredText, as well as HTML and LaTeX; it writes from any of these formats to HTML, RTF, LaTeX, ConTeXt, OpenDocument, EPUB and several other formats, including (via LaTeX) PDF.
- [4] "Html2wiki txt2tags module". cpan.org. Retrieved 2014-01-30.
- [5] "Txt2tags User Guide". Txt2tags.sourceforge.net. Retrieved 2013-10-08.
- [6] Todd A. Coram (2010-09-09). "Almost Free Text (AFT) Reference Manual". 5.98.
- [7] "Converters". WikiCreole. Retrieved 2013-10-08.
- [8] pegdown: A Java library for Markdown processing
- [9] gfms: Github Flavored Markdown Server
- [10] marked: A full-featured markdown parser and compiler, written in JavaScript. Built for speed.
- [11] node-gfm: GitHub flavored markdown to html converter
- [12] Parsedown: Markdown parser written in PHP
- [13] Ciconia: Markdown parser written in PHP
- [14] Grip: GitHub Readme Instant Preview
- [15] github-markdown: Self-contained Markdown parser for GitHub
- [16] peg-markdown is an implementation of markdown in C.
- [17] Discount is also an implementation of markdown in C.
- [18] "Python-Markdown". Github.com. Retrieved 2013-10-08.
- [19] Bruce Williams <http://codefluency.com>, for Ruby Central <http://rubycentral.org>. "kramdown: Project Info". RubyForge. Retrieved 2013-10-08.
- [20] "Via ox-pandoc and pandoc itself".
- [21] Docutils is an implementation of ReStructuredText in Python
- [22] Sphinx is an implementation of ReStructuredText in Python and Docutils with a number of output format Builders
- [23] Aurelio Jargas www.aurelio.net (2012-01-11). "txt2tags". txt2tags. Retrieved 2013-10-08.
- [24] "txt2tags.class.php - online convertor". Txt2tags.org. Retrieved 2013-10-08.
- [25] "Markdown Syntax". Daringfireball.net. Retrieved 2013-10-08.
- [26] Textile Syntax
- [27] "using org-adapt-indentation".
- [28] "using org-indent-mode or org-indent".
External links
- C2's list
- Curl Markup to replace HTML+CSS+JS
- Inhabitants of the authoring ecosphere
- List at otl website
- Humane Text Formats - A comparison (obsolete)
- Pandoc: a versatile inter-format converter