Syntax of programming languages
From Wikipedia, the free encyclopedia
In computer science, the syntax of a programming language is the set of rules that a sequence of characters in a source code file must follow to be considered a syntactically conforming program in that language.
The rules specify how character sequences are grouped into tokens (the lexical grammar), which sequences of those tokens are permissible, and some of the meaning attributed to those permissible token sequences (additional meaning is assigned by the semantics of the language).
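As an illustration of the lexical step, the following is a minimal sketch of a tokenizer for a tiny arithmetic language. The token classes (NUMBER, IDENT, OP) and the function name `tokenize` are assumptions made for this example and are not drawn from any particular language specification.

```python
import re

# Hypothetical token classes for a toy language (assumed for illustration).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"\s+"),  # whitespace separates tokens but is not itself a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Group the characters of `text` into (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # No lexical rule applies at this position.
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x = 3 + 42"))
```

A character that matches no rule, such as `$` in this sketch, is rejected at the lexical level before any parsing takes place.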
The syntactic analysis of source code usually entails transforming the linear sequence of tokens into a hierarchical syntax tree (abstract syntax trees are one convenient form of syntax tree). This process is called parsing, the same term used for syntactic analysis in linguistics. Tools have been written that automatically generate parsers from a specification of a language grammar written in Backus–Naur form, e.g., Yacc (Yet Another Compiler-Compiler).
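The transformation from a linear token sequence to a tree can be sketched with a small hand-written recursive-descent parser. This is only one of several parsing techniques and is not what Yacc generates; the grammar in the docstring and the nested-tuple tree representation are assumptions chosen for illustration.

```python
import re


def parse(text):
    """Parse an arithmetic expression into a nested-tuple syntax tree.

    Assumed grammar (hypothetical, for illustration):
        expr   ::= term (("+" | "-") term)*
        term   ::= factor (("*" | "/") factor)*
        factor ::= NUMBER | "(" expr ")"
    """
    # Simplistic lexing: characters matching neither pattern are silently skipped.
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("1 + 2 * 3"))    # ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))  # ('*', ('+', 1, 2), 3)
```

The nesting of the tuples records the hierarchy that the flat token sequence leaves implicit: in the first call the multiplication is a subtree of the addition, while the parentheses in the second call reverse that relationship.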
The syntax of many computer languages can be described by a Type-2 grammar (i.e., a context-free grammar) in the Chomsky hierarchy, whereas constructs such as regular expressions correspond to Type-3 (regular) grammars.
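The difference between regular and context-free constructs can be made concrete with a small sketch. An identifier-like token can be described by a regular expression, whereas arbitrarily nested parentheses form a context-free language that is not regular, so recognizing them requires some form of unbounded memory (here a depth counter). The function names `is_identifier` and `is_balanced` are hypothetical.

```python
import re

# Regular: identifier-like tokens are describable by a regular expression.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """Return True if the whole string is an identifier-like token."""
    return IDENTIFIER.fullmatch(text) is not None


def is_balanced(text):
    """Recognize strings of balanced parentheses (a context-free language).

    The depth counter plays the role of a stack holding a single symbol.
    """
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared with no matching "("
                return False
        else:
            return False  # only parentheses belong to this toy language
    return depth == 0


print(is_identifier("foo_1"), is_identifier("1foo"))
print(is_balanced("(()())"), is_balanced("(()"))
```

This division of labour mirrors common practice, in which token shapes are specified with regular expressions and nesting structure is left to a context-free grammar.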