Dyck language

Lattice of the 14 Dyck words of length 8 - [ and ] interpreted as up and down

In the theory of formal languages of computer science, mathematics, and linguistics, the Dyck language is the language consisting of balanced strings of square brackets [ and ]. It is important in the parsing of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions. It is named after the mathematician Walther von Dyck.

Formal definition

Let $\Sigma =\{[,]\}$ be the alphabet consisting of the symbols [ and ] and let $\Sigma ^{{*}}$ denote its Kleene closure. For any element $u\in \Sigma ^{{*}}$ with length $|u|$ we define partial functions $\operatorname {insert} :\Sigma ^{*}\times \mathbb {N} \cup \{0\}\rightarrow \Sigma ^{*}$ and $\operatorname {delete} :\Sigma ^{*}\times \mathbb {N} \rightarrow \Sigma ^{*}$ by

\operatorname {insert} (u,j)

u

with "

[]

" inserted into the

j

th position

\operatorname {delete} (u,j)

u

with "

[]

" deleted from the

j

th position

with the understanding that $\operatorname {insert} (u,j)$ is undefined for $j>|u|$ and $\operatorname {delete} (u,j)$ is undefined if $j>|u|-2$ . We define an equivalence relation $R$ on $\Sigma ^{{*}}$ as follows: for elements $a,b\in \Sigma ^{{*}}$ we have $(a,b)\in R$ if and only if there exists a finite sequence of applications of the $\operatorname {insert}$ and $\operatorname {delete}$ functions starting with $a$ and ending with $b$ , where the empty sequence is allowed. That the empty sequence is allowed accounts for the reflexivity of $R$ . Symmetry follows from the observation that any finite sequence of applications of $\operatorname {insert}$ to a string can be undone with a finite sequence of applications of $\operatorname {delete}$ . Transitivity is clear from the definition.

The equivalence relation partitions the language $\Sigma ^{{*}}$ into equivalence classes. If we take $\epsilon$ to denote the empty string, then the language corresponding to the equivalence class $\operatorname {Cl}(\epsilon )$ is called the Dyck language.

Alternative definition

An alternative definition of the Dyck language can be formulated when we introduce the $imbalance:\Sigma ^{*}\rightarrow \mathbb {Z}$ function.

\operatorname {imbalance} (u)=|u|_{[}-|u|_{]}

for any

u\in \Sigma ^{{*}}

where $|u|_{{[}}$ and $|u|_{{]}}$ are respectively the number of [ and ] in $u$ , i.e. $\operatorname {imbalance}$ counts the imbalance of [ over ]. If $\operatorname {imbalance} (u)$ is positive then $u$ has more [ than ].

Now, the Dyck language can be defined as the language

\{u\in \Sigma ^{*}\vert \operatorname {imbalance} (u)=0{\text{ and }}\operatorname {imbalance} (v)\geq 0{\text{ for all prefixes }}v{\text{ of }}u\}

Properties

The Dyck language is closed under the operation of concatenation.
By treating $\Sigma ^{{*}}$ as an algebraic monoid under concatenation we see that the monoid structure transfers onto the quotient $\Sigma ^{{*}}/R$ , resulting in the syntactic monoid of the Dyck language. The class $\operatorname {Cl}(\epsilon )$ will be denoted $1$ .
The syntactic monoid of the Dyck language is not commutative: if $u=\operatorname {Cl}([)$ and $v=\operatorname {Cl}(])$ then $uv=\operatorname {Cl}([])=1\neq \operatorname {Cl}(][)=vu$ .
With the notation above, $uv=1$ but neither $u$ nor $v$ are invertible in $\Sigma ^{{*}}/R$ .
The syntactic monoid of the Dyck language is isomorphic to the bicyclic semigroup by virtue of the properties of $\operatorname {Cl}([)$ and $\operatorname {Cl}(])$ described above.
By the Chomsky–Schützenberger representation theorem, any context-free language is a homomorphic image of the intersection of some regular language with a homomorphic preimage of the Dyck language on two brackets.^[1]
The Dyck language with two distinct types of brackets can be recognized in the complexity class $TC^{{0}}$ .^[2]

Examples

We can define an equivalence relation $L$ on the Dyck language ${\mathcal {D}}$ . For $u,v\in\mathcal{D}$ we have $(u,v)\in L$ if and only if $|u| = |v|$ , i.e. $u$ and $v$ have the same length. This relation partitions the Dyck language $\mathcal{D} / L = \mathcal{D}_{0} \cup \mathcal{D}_{2} \cup \mathcal{D}_{4} \cup \ldots = \bigcup_{n=0}^{\infty} \mathcal{D}_{n}$ where $\mathcal{D}_{n} = \{ u\in\mathcal{D} \mid |u| = n\}$ . Note that $\mathcal{D}_{n}$ is empty for odd $n$ . E.g. there are 14 words in $\mathcal{D}_{8}$ , i.e. [[[[]]]], [[[][]]], [[[]][]], [[][[]]], [[[]]][], [[][][]], [][[[]]], [[][]][], [[]][[]], [][[][]], [[]][][], [][[]][], [][][[]], [][][][].

Having introduced the Dyck words of length $n$ , we can introduce a relationship on them. For every $n\in \mathbb {N}$ we define a relation $S_{n}$ on $\mathcal{D}_{n}$ ; for $u,v\in\mathcal{D}_{n}$ we have $(u,v)\in S_{n}$ if and only if $v$ can be reached from $u$ by a series of proper swaps. A proper swap in a word $u\in\mathcal{D}_{n}$ swaps an occurrence of '][' with '[]'. For each $n\in \mathbb {N}$ the relation $S_{n}$ makes $\mathcal{D}_{n}$ into a partially ordered set. The relation $S_{n}$ is reflexive because an empty sequence of proper swaps takes $u$ to $u$ . Transitivity follows because we can extend a sequence of proper swaps that takes $u$ to $v$ by concatenating it with a sequence of proper swaps that takes $v$ to $w$ forming a sequence that takes $u$ into $w$ . To see that $S_{n}$ is also antisymmetric we introduce an auxiliary function $\sigma_{n}:\mathcal{D}_{n}\rightarrow\mathbb{N}:u\mapsto\sum_{vw=u} imbalance(v)$ where $v$ ranges over all prefixes of $u$ . The following table illustrates that $\sigma_{n}$ is strictly monotonic with respect to proper swaps.

Strict monotonicity of $\sigma_{n}$
partial sums of $\sigma _{n}(u)$	$P$	$P-1$	$P$	$Q$
$u$	$\ldots$	]	[	$\ldots$
$u'$	$\ldots$	[	]	$\ldots$
partial sums of $\sigma _{n}(u')$	$P$	$P+1$	$P$	$Q$
Difference of partial sums	0	2	0	0

Hence $\sigma_{n}(u') - \sigma_{n}(u) = 2 > 0$ so $\sigma_{n}(u) < \sigma_{n}(u')$ when there is a proper swap that takes $u$ into $u'$ . Now if we assume that both $(u,v), (v,u)\in S_{n}$ and $u\ne v$ , then there are non-empty sequences of proper swaps such $u$ is taken into $v$ and vice versa. But then $\sigma_{n}(u) < \sigma_{n}(v) < \sigma_{n}(u)$ which is nonsensical. Therefore, whenever both $(u,v)$ and $(v,u)$ are in $S_{n}$ , we have $u = v$ , hence $S_{n}$ is antisymmetric.

The partial ordered set $D_{8}$ is shown in the illustration accompanying the introduction if we interpret a [ as going up and ] as going down.

Notes

↑ Kambites, Communications in Algebra Volume 37 Issue 1 (2009) 193-208
↑ Barrington and Corbett, Information Processing Letters 32 (1989) 251-256

References

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.