User:Krauss/WTF


This article details some of the formalism behind the Web template system and Template system articles, for the benefit of readers who desire a more rigorous treatment of the material than is accessible to a general audience.

System characterization

Motivations and methodological background

There are many systems promoted as template systems. We need objective criteria to decide what is and what is not a template system.

A "candidate system" (one to be characterized as a template system) is a black box for us, so a practical selection methodology is black box testing (with "use case testing"). It is suitable because it takes an external perspective on the system under test.

A simple illustration of a template system shows what we need to model as a black box, and what its inputs and outputs are:

  • The template engine, a process, is the Black box;
  • The template and contents are inputs;
  • The output documents are outputs.

Now, to answer the question

What is and what is not a template system?

we need, under the black box testing methodology, to fix a set of use case tests. Positive results on these tests must reflect the essential properties of a template system.

We use a mathematical system model to specify this "check list" of properties precisely.

Informal check-list of essential properties

  1. An empty template supplies an empty document: the template engine cannot add information to the empty template.
  2. Content independence for a template that is "empty of instructions": in the usual template languages, an output document file is also a valid template (for example, an HTML file used as a PHP file). In many template languages it is not enough to copy and paste the document into the template for this purpose; we need to add hooks (and/or headers) to express a "template container", see XSLT. To generalize the idea we can use a kind of "clean function" that eliminates the script language from the template. This clean operation can be exemplified by a little PHP code:
    PHP source: <html><? if ($flag){?> Hello <?= $x; ?>! <? } else { ?> Bye <? }?> </html> then
    Cleaned PHP: <html> Hello ! Bye </html>.
  3. The output document is always a "cleaned document", without any trace of the script language. All the "template language code" is processed by the template engine in one step.
  4. No information is generated by the template engine.
  5. The output documents of templates that vary only in presentation also vary only in presentation, independently of the content. If we compare them, they carry the same information.
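
The "clean function" of item 2 can be sketched in Python. This is only an illustration under stated assumptions: script fragments are taken to be delimited by PHP-style short tags (<? ... ?>), the names HOOK and clean are introduced here for the example, and runs of whitespace are collapsed so the result matches the cleaned PHP shown above.

```python
import re

# Assumption: script fragments are delimited by PHP-style short tags <? ... ?>.
HOOK = re.compile(r"<\?.*?\?>", re.DOTALL)

def clean(template):
    """Remove every script fragment (with its hooks), then collapse whitespace."""
    without_script = HOOK.sub("", template)
    return re.sub(r"\s+", " ", without_script).strip()

source = "<html><? if ($flag){?> Hello <?= $x; ?>! <? } else { ?> Bye <? }?> </html>"
print(clean(source))  # <html> Hello ! Bye </html>
```

A real clean function would need the tokenizer of the script language itself (for example, to ignore a "?>" inside a string literal); the regular expression is a simplification.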

See also FAQ about use of the definitions.

Formal characterization

Elements (C,T,P,R) on the dataflow representation.

To define it formally, we can model the (dataflow) template system as a function and its parameters:

  • Inputs:
    • Template library, L = {T1, T2, … Ti, … TN}
      There are N templates, identified by the index i. The template T1 is also used as the "default root template" when the engine needs a predefined root. To express an input with a single template (N=1), the alternative notation L = {T1} = {T} can be used.
    • Content, C ⊆ D
      C is a set (or a sequence) of incoming data values from the content resource. C is in principle read-only, but the engine can assign values to simplify the variable declaration feature. An alternative notation expresses its elements (attributes): C = {c1, c2, … cj, … cM}
      D is the data model, formally the (universe) set of all possible contents. It can also specify a data structure, for when C is not a simple set of values but, for example, a sequence (ordered set) or an XML input.
  • Process, P(L,C)
    It is a black box model of the template engine. P is treated as overloaded to express the single-template case: P(T,C) = P({T},C). Systems that only work as a P(T,C) process (perhaps using implicit sub-templates, but not a template library) can be called "lib-less" systems.
  • Process output, R = P(L,C)
    In web template systems, R is a web document. In (generic) template systems, R is any kind of document.
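
The signature R = P(L,C), with its single-template overloading, can be illustrated by a toy engine. Everything here is an assumption made for the example: the hook syntax ({{name}} for a variable, {{> name}} for a sub-template reference), the dictionary representation of L and C, and the one-level (non-recursive) expansion of sub-templates.

```python
import re

VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")       # hypothetical variable hook
INCLUDE = re.compile(r"\{\{>\s*(\w+)\s*\}\}")  # hypothetical sub-template hook

def P(L, C):
    """Toy engine. L: one template string, or a dict name -> template; C: dict of values."""
    if isinstance(L, str):
        L = {"T1": L}                  # overloading: P(T, C) = P({T}, C)
    root = next(iter(L.values()))      # the first template plays the role of T1, the root
    expanded = INCLUDE.sub(lambda m: L[m.group(1)], root)  # one level of expansion only
    return VAR.sub(lambda m: str(C.get(m.group(1), "")), expanded)

library = {"T1": "<h1>{{title}}</h1>{{> body}}", "body": "<p>{{text}}</p>"}
R = P(library, {"title": "Hi", "text": "There"})
```

With a single template string the engine behaves as a "lib-less" P(T,C); with a dictionary it uses the first entry as the default root template.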

Essential properties

Let:

  • ø: an empty content, empty template or empty document.
  • Tnop: a template with no programming operation (without template instructions).
    Lnop = { T | T is a Tnop}
  • Ta, Tb are "presentation variants".
    (La, Lb) is a library relationship in which every Ta,i from La has a corresponding Tb,j from Lb.
  • Clean(T) is a "clean function" that removes all fragments of the script language (with their respective hooks) from T.
    For template languages like Haml, which specify the output language in an "alternative syntax", this syntax must be reversible[1] to the output language; the clean operator then also embodies this reversion.
  • I(X) is an "information content set" function, like the set of words produced by a text converter. If X is a template, the I(X) process starts with Clean(X). If X is a library, it is applied to all templates of the library.
    For T in the example above, I(T) = {Hello, Bye}.
  • P' is a process that returns an "expanded equivalent template": an engine (or manual procedure) that only finds and expands sub-template references.
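
A minimal sketch of the I(X) function, reproducing I(T) = {Hello, Bye} for the PHP example. The assumptions are the same as for any such sketch: PHP-style <? ... ?> hooks, an HTML-like output language whose tags carry no information, and "words" defined as runs of alphanumeric characters. The name info is hypothetical.

```python
import re

HOOK = re.compile(r"<\?.*?\?>", re.DOTALL)  # assumed PHP-style script hooks
TAG = re.compile(r"<[^>]+>")                # assumed markup of the output language

def info(template):
    """I(X): the set of words left after Clean(X) and after dropping markup."""
    cleaned = HOOK.sub(" ", template)       # the Clean(X) step comes first
    text = TAG.sub(" ", cleaned)            # crude stand-in for a text converter
    return set(re.findall(r"\w+", text))

T = "<html><? if ($flag){?> Hello <?= $x; ?>! <? } else { ?> Bye <? }?> </html>"
print(info(T) == {"Hello", "Bye"})  # True
```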

Essential properties that template systems satisfy (for all L, La, Lb, Lnop, and C):

# | Property | Notes
1 | P({ø},C) = ø | ø bypass (equivalence between the cleaned empty template and the empty output).
2 | P(Lnop,ø) = P(Lnop,C) = Clean(P'(Lnop,ø)) | Content independence for the "no instructions" template (Tnop bypass). If L={T}, P(Tnop,ø) = Clean(Tnop).
3 | P(L,C) = Clean(P(L,C)) | R has nothing left to process if it is used as a template; it can only be used as a Tnop.
4 | I(P(L,C)) ⊆ I(P'(L,C)) ∪ I(C) | No information is generated by P. If L={T}, I(P(T,C)) ⊆ I(T) ∪ I(C).
5 | I(P(La,C)) = I(P(Lb,C)) | Content conservation across "presentation variants". Obs.: this implies I(P'(La,C)) = I(P'(Lb,C)) and, if L={T}, I(Ta) = I(Tb).
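
The five properties can be exercised as black box tests against a toy, lib-less substitution engine. This is a sketch under assumptions, not a general validator: the {{name}} hook syntax, the helper names (P, clean, info, info_content) and the sample templates are all invented for the example, and since L={T} the expansion P' is the identity.

```python
import re

HOOK = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # hypothetical hook syntax: {{name}}
TAG = re.compile(r"<[^>]+>")               # markup is treated as presentation only

def P(T, C):
    """Toy engine: replace each {{name}} with C[name] ('' if missing)."""
    return HOOK.sub(lambda m: str(C.get(m.group(1), "")), T)

def clean(T):
    """Clean(T): drop every hook."""
    return HOOK.sub("", T)

def info(text):
    """I(X): words left after Clean and after removing markup."""
    return set(re.findall(r"\w+", TAG.sub(" ", clean(text))))

def info_content(C):
    """I(C): words found in the content values."""
    return set(w for v in C.values() for w in re.findall(r"\w+", str(v)))

C = {"name": "World"}
T = "<p>Hello {{name}}!</p>"
T_nop = "<p>Static text</p>"
Ta, Tb = "<b>Hello {{name}}</b>", "<i>Hello {{name}}</i>"

assert P("", C) == ""                               # property 1
assert P(T_nop, {}) == P(T_nop, C) == clean(T_nop)  # property 2
assert P(T, C) == clean(P(T, C))                    # property 3
assert info(P(T, C)) <= info(T) | info_content(C)   # property 4
assert info(P(Ta, C)) == info(P(Tb, C))             # property 5
```

Passing these assertions for one content and a few templates does not prove the properties "for all L and C"; it only shows the shape a use case test suite could take.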

Notes:

  • See the corresponding informal properties.
  • Template systems modeled as complex dataflows must adopt simplifying hypotheses for the characterization.
  • Modeling the template system is the first and fundamental step of the characterization. Example:
    to characterize a documentation generator as a template system, the source code (commented or not) is modeled as (structured) content, the templates are usually internal to the system (not customizable), and the configuration files are modeled as content.
  • A composition of P processes (a pipeline) is possible if the content, C, and the output, R, are in the same format, such as XML. Example: the composition   P_XSLT(L, P_XQuery(T,C))   in a Cocoon pipeline (see also XML transformation languages).
  • If the template has no content, I(T)=ø, and the script language is regular, then T is like a schema. There are cases where the template language is like a "programmable schema specification" (compare Haml with the RELAX NG compact syntax).
  • To validate a "candidate system" we need the corresponding Clean and P' as specific checking tools, and I as a generic checking tool.

Important conclusions from definitions:

  • The simplest "string substitution system" can be characterized as a template system.
  • Every template language needs clear rules and "syntax facilities" (simple for both human and machine) to evaluate the Clean(T) function.
  • Template files from languages like XSLT or XQuery are templates, but files like a Perl script, evaluated by the usual P_perl interpreter, which need an output instruction like print and have no hook notation, are not.
  • The process cannot be arbitrary.
  • The possibility of using sub-templates is a feature of some template languages, not a general characteristic.
  • The possibility of a sub-template recurrence relation is also a feature (it may be formalized by P' but is not modeled, to preserve the simplicity of the template system definition).

Referential "driven types"

To specify projects and divide tasks, designers and programmers need to adopt an objective point of view from which to see and organize a template set; this results in a certain dichotomy of "referential types of systems". The types of system strategy correspond to the respective P process and to how engines make decisions about sub-template choices:

  1. Script-driven template systems:
    • Designer's perception: the template engine "selects template fragments and fills them with content".
    • Programmer's perception: all the logic about sub-template choices (typically if/then or switch/case logic) is explicit in the script.
    • Examples: SSI (simplest lib-less), XQuery (sophisticated lib-less), Smarty (sophisticated).
    • Notes: a "root template" is needed to express the logic for selecting the "first level sub-templates".
  2. Content-driven template systems:
    • Designer's perception: the template engine "selects the desired content and frames it with template fragments".
    • Programmer's perception: part of the logic is implicit (not expressed in the script) and resides in the engine as predefined rules, which let the content choose (select) which sub-template will be used.
    • Examples: XSLT (lib), attribute languages like Zope (lib-less in typical uses).
    • Notes: to supply the content-driven sub-template reference feature (content choice by a dynamic context), the engine uses a dispatcher or another event-driven algorithm.

There are also mixed types: a script-driven template system augmented with a kind of "match and refer to sub-template by ID" (a simple hash dispatcher can do it) instead of direct control. On the other hand, a content-driven template system can express traditional logic in a root template, used as a script-driven template.
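
The contrast between the two strategies can be sketched with two toy renderers that produce the same output. The content model (a list of items with "kind" and "text" keys), the rule table and the function names are assumptions made for illustration; they are not taken from any of the systems named above.

```python
def render_script_driven(content):
    """Script-driven: the root 'script' holds explicit if/else logic
    that selects the sub-template for each item."""
    out = []
    for item in content:
        if item["kind"] == "title":
            out.append("<h1>%s</h1>" % item["text"])
        else:
            out.append("<p>%s</p>" % item["text"])
    return "".join(out)

# Content-driven: the sub-templates are registered as rules, and a generic
# hash dispatcher lets each content item select the rule that matches it.
RULES = {
    "title": "<h1>{text}</h1>",
    "para": "<p>{text}</p>",
}

def render_content_driven(content, rules=RULES):
    return "".join(rules[item["kind"]].format(text=item["text"]) for item in content)

content = [{"kind": "title", "text": "Hi"}, {"kind": "para", "text": "There"}]
```

In the second renderer the selection logic lives in the engine (the dictionary lookup), so adding a new kind of content means adding a rule rather than editing a root script.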

Architecture characterization

The architecture of a template system, within a client-server reference model, is the main criterion for grouping such systems. Three groups are illustrated (see the links): outside server systems, server-side systems, and distributed systems. A formal characterization avoids mistakes about systems with cache strategies and remote references.

Using the system notation (the definitions of R, T, C, and P above) and adding:

Notation | Definition
A@X | "A is at X", or "the resource for A is at the X machine or on the same LAN". @X can be:
  • @C - On the client.
  • @S - On the server (or on the same high-performance LAN).
  • @O - On another LAN, outside the server (and outside the client).
A@X ← B@Y | "Information A, at X, is information B received from Y". Assignment with communication.

Outside server systems (or "local systems")

R ← P@O(L@O,C@O)

The system acts only on the local transfer process. The "global transfer process" needs two stages:

  1. R@O ← P@O(L@O,C@O)    Output production, with the system.
  2. R@C ← R@S ← R@O    Publication (using another system or something like manual FTP) and distribution (e.g. HTTP browsing).

Server-side systems

There is no flow between networks; everything is on the server-side network (or on the server machine).

R@C ← P@S(L@S,C@S)    Output "on-demand production", "on-the-fly publication" and receiving (over the distribution method).

Or caching on server:

  1. R@C1 ← Rcache@S ← P@S(L@S,C@S)    Output "on-demand production" (first request) and caching.
  2. R@C2 ← Rcache@S    (next request), using the cache.
Note: systems that generate meta-templates (like a CMS generating PHP output) also use the cache for the first request.
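
The two caching steps can be sketched as a small Python class. It is only a model of the dataflow under stated assumptions: the class name, the single-entry cache, the toy engine and the run counter are all hypothetical, and no real networking or invalidation is involved.

```python
def toy_engine(L, C):
    """Stand-in for P@S: a single {{name}} substitution (assumed syntax)."""
    return L.replace("{{name}}", C["name"])

class ServerSideSystem:
    def __init__(self, engine, library, content):
        self.engine = engine      # P@S
        self.library = library    # L@S
        self.content = content    # C@S
        self.cache = None         # Rcache@S
        self.runs = 0             # how many times P@S actually ran

    def request(self):
        """Return R for a client, producing it only on the first request."""
        if self.cache is None:
            self.cache = self.engine(self.library, self.content)
            self.runs += 1
        return self.cache

server = ServerSideSystem(toy_engine, "Hello {{name}}", {"name": "World"})
r1 = server.request()   # step 1: production and caching
r2 = server.request()   # step 2: served from the cache
```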

Distributed systems

All other combinations, with one or more elements, but not all, on the server:

R@S1 ← P@S2(L@S3,C@S4)    Generic case.
R@C ← P@C(L@S1,C@S2)    Typical case.

In distributed systems it is also possible to use a "distributed library", L, whose templates are not all at the same resource.

Note: when a system with an outside server engine also does the publication, it is characterized as a distributed system:

R@S ← R@O ← P@O(L@O,C@O)
Typical server-side systems, when producing for a predetermined demand, can also use a similar strategy to "cache by publication".

Language characterization

Informally, a simple template T is a "document with holes", where the holes are placeholders or macro references. A template T, in the template system reference model, is an input in itself, or an element of the library.

Template characterization

A template T is a string that can be split (using "hook criteria") into 2 distinct, non-empty token types:

  • t: contiguous fragments of the output document.
  • s: contiguous script fragments, like expressions or instructions (simple instructions, statements, directives, or blocks of them).
    Note: a sequence of repeated s, as occurs with XSLT or ColdFusion, is transformed into a single "contiguous s" block.

The resulting sequence of tokens is not arbitrary and, theoretically, the "contiguous hypothesis" enforces a pattern that makes validation unnecessary. Technically, it is validated by a regular expression: /^((t(st)*s?)|(s(ts)*t?))$/.

Formally, it is given by a generative grammar, G = (N,Σ,Q,A), with N = {A,X,Y}, Σ = {t,s}, A the start symbol, and Q the following production rules:   A → X | Y | s | t;   X → tsX | ts | tst;   Y → stY | st | sts.
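
The split and its validation can be sketched as follows. The hook criteria are an assumption (PHP-style <? ... ?> tags), and the function name token_types is hypothetical; adjacent script fragments are merged into one s, as the note on "contiguous s" blocks requires.

```python
import re

HOOK = re.compile(r"<\?.*?\?>", re.DOTALL)          # assumed hook criteria
PATTERN = re.compile(r"^((t(st)*s?)|(s(ts)*t?))$")  # the regular expression from the text

def token_types(template):
    """Return the template as a string over {t, s}, merging adjacent s fragments."""
    types, pos = [], 0
    for m in HOOK.finditer(template):
        if m.start() > pos:
            types.append("t")                 # output text before this hook
        if not types or types[-1] != "s":
            types.append("s")                 # start a new contiguous s block
        pos = m.end()
    if pos < len(template):
        types.append("t")                     # trailing output text
    return "".join(types)

T = "<html><? if ($flag){?> Hello <?= $x; ?>! <? } else { ?> Bye <? }?> </html>"
print(token_types(T))  # tstststst
```

Because the tokenizer merges adjacent s fragments and never emits two t tokens in a row, every non-empty template it splits matches the pattern; a string such as "tt" or "ss" does not.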

Notes:

About the convention for "embed" terminology: if the template T is generated by the X productions (it starts with t), it is a template with "output language embedded with the script"; otherwise (it starts with s) it is a template with a "script embedded with the output language". Languages like XQuery permit both of these "template embedding modes".
About the point of view: designers see the script fragments as "holes", so designers always see (through a background effect or the choice of viewer/editor) a template as an "output language embedded with a script".
About Parr's definition: this definition is a generalization of the "Parr split model" [2], which must start with t and is not subject to system context considerations.

Level of affinity between script and output languages

The resulting pattern (an [st]+ sequence) need not reflect a well-balanced XML structure, or a script with nested loops. But this kind of behavior reflects the level of affinity between the languages.

The paramount characteristic of a template scripting language "is whether it operates at the lexical or syntactical level" [3].

Conceptually, lexical P processing precedes "output language parsing", and is thus ignorant of the syntax of the underlying output language. The t fragments are "transparent" to s and vice versa; they have no affinity.

Typically, "lexically embedded scripts" like ASP, PHP and JSP can be lexically transformed into a full script: the output language fragments (t) are wrapped in invocations of print-like instructions.
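
This lexical transformation can be sketched in Python for PHP-style templates. It is an illustration under assumptions, not how any real interpreter is implemented: hooks are taken to be <? ... ?> and <?= ... ?>, each t fragment becomes an echo of a single-quoted literal, and the helper names are hypothetical. Note that the transformation never inspects the HTML itself, which is what "no affinity" means here.

```python
import re

HOOK = re.compile(r"<\?(=?)(.*?)\?>", re.DOTALL)  # assumed PHP-style hooks

def php_quote(text):
    """Quote text as a PHP single-quoted string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

def to_full_script(template):
    """Return the lines of a script in which every t fragment is echoed."""
    lines, pos = [], 0
    for m in HOOK.finditer(template):
        if m.start() > pos:
            lines.append("echo " + php_quote(template[pos:m.start()]) + ";")
        is_expression, code = m.group(1), m.group(2).strip()
        if is_expression:                     # <?= expr ?> prints the expression
            lines.append("echo " + code.rstrip("; ") + ";")
        else:                                 # <? code ?> is copied as it is
            lines.append(code)
        pos = m.end()
    if pos < len(template):
        lines.append("echo " + php_quote(template[pos:]) + ";")
    return lines

T = "<html><? if ($flag){?> Hello <?= $x; ?>! <? } else { ?> Bye <? }?> </html>"
script = to_full_script(T)
```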

"In contrast, syntactical languages operate on parse trees (...) which of course requires knowledge of the host language and its grammar. (...) the syntax may help convey the meaning of and reflect the nature of the abstraction."[3].

There are also hybrid levels. XSLT, XQuery, the TeX macro language, and Haml offer more affinity than lexical languages, recognizing the basic rules of the output language: s and t can be balanced and/or complemented.

Template types

T is a string generated by the grammar G, where the s script fragments are specifications to the engine for generating output from the content, C, and the output fragments, t.

The simplest script type only makes scalar variable references. Parr defined[2] another 4 types of templates:

  • Regular (Parr's def. 2): has an "internal grammar" restricted to 2 sub-token types, a (scalar or multi-valued) variable reference and a sub-template reference. Both references are side-effect free, and may iterate over a set of multi-valued variables (from the content C) or literals (t).
  • Context-free (Parr's def. 3): limited to referencing scalar variables and sub-templates, but more general than a regular language "(...) since it can handle balanced tree structures"[4].
  • Context-sensitive (Parr's def. 4): a context-free template augmented to allow predicated template application; that is, template references or the inclusion of sub-templates are allowed only in certain grammatical contexts. Predicates operate on variables and on the template tree structure itself. Actions and predicates are side-effect free, and predicates are limited to operations on cj and the surrounding template (t).
  • Unrestricted (Parr's def. 1): like context-sensitive, but unrestricted computationally and syntactically. Script fragments behave as Turing machines.

Template Languages hierarchy

There are two levels of abstraction for the template language definition:

  1. Template instance grammar: the template text is split into elements of a generative grammar, and the output is analyzed. T is a string that was characterized by a template grammar and, by analyzing its output behaviour, it can also be characterized as an instance of a grammar type.
  2. Template language: a language generated by a "split model grammar", with a predefined standard meta-grammar characterized by the specific "script language". The same scheme of types may be used to define generic language types (groups of standards).

Template languages can be grouped in a hierarchy:

Language class | Template type (formal name) | Notes
Recursively enumerable | Complex (unrestricted template) |
Recursive | "Near Complex" (recursive) | Grouped with Programmable. Does not exist in either the Parr or the Chomsky scheme.
Context-sensitive | Programmable (context-sensitive) | Uses IFs.
Context-free | Iterable (context-free) |
Regular | (regular template) | Grouped with Iterable.
(not a language) | Simple | Does not exist in the Parr scheme.
A Venn diagram showing the template language types as sets of features.

The main dividing line, from the perspective of good separation principles, is the one between Programmable and Iterable (context-sensitive / context-free). From the algorithmic perspective, there are an upper line (complex languages have the power to express any algorithm) and a lower line (simple languages contain no algorithm).

These template language groups,

Level 3 - Complex template language,
Level 2 - Programmable template language,
Level 1 - Iterable template language,
Level 0 - Simple template language;

are also a hierarchy of feature sets. The logic of the hierarchy is about minimal features: if a language has the "minimal features" of a level N > 0, the language must also have all the minimal features of level N-1.
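
The "minimal features" rule can be sketched as a classification over feature sets. The feature names below are illustrative assumptions chosen to echo the table (scalar references, iteration, IFs, unrestricted computation); the text does not fix a precise feature list for each level.

```python
# One assumed minimal feature per level, from Level 0 up to Level 3.
MINIMAL_FEATURES = [
    "scalar_variables",   # Level 0 - Simple
    "iteration",          # Level 1 - Iterable
    "conditionals",       # Level 2 - Programmable ("uses IFs")
    "unrestricted_code",  # Level 3 - Complex
]
LEVEL_NAMES = ["Simple", "Iterable", "Programmable", "Complex"]

def level_of(features):
    """Highest level N whose minimal features, and those of every lower level,
    are all present; -1 if even Level 0 is not reached."""
    level = -1
    for n, feature in enumerate(MINIMAL_FEATURES):
        if feature not in features:
            break
        level = n
    return level

print(LEVEL_NAMES[level_of({"scalar_variables", "iteration"})])  # Iterable
```

A language that offers conditionals but no iteration is classified at Level 0 by this sketch, which is how it encodes the requirement that level N includes the minimal features of level N-1.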

References

  1. "Dual Syntax for XML Languages", C. Brabrand, A. Moller, M. I. Schwartzbach (2006). University of Aarhus, Denmark.
  2. "Enforcing Strict Model-View Separation in Template Engines", T. Parr (2004). In Proceedings International WWW Conference, New York, USA. Also available at USFCA University.
  3. "Growing Languages with Metamorphic Syntax Macros", C. Brabrand & M. I. Schwartzbach (2000). University of Aarhus, Denmark.
  4. "Domain Specific Languages for Interactive Web Services", C. Brabrand (2002). PhD Dissertation, University of Aarhus, Denmark.