Brzozowski derivative

Brzozowski derivative (on red background) of a dictionary string set with respect to "con"

In theoretical computer science, in particular in formal language theory, the Brzozowski derivative u−1S of a set S of strings and a string u is defined as the set of all rest-strings obtainable from a string in S by cutting off its prefix u (if possible), formally: u−1S = { v ∈ Σ*: uvS }, cf. picture.[1] It is named after the computer scientist Janusz Brzozowski who investigated their properties and gave an algorithm to compute the derivative of a generalized regular expression.

Derivative of a regular expression

Given a finite alphabet A of symbols,[2] a generalized regular expression denotes a possibly infinite set of finite-length strings of symbols from A. It may be built of:

In an ordinary regular expression, neither ∧ nor ¬ is allowed. The string set denoted by a generalized regular expression R is called its language, denoted as L(R).

As an auxiliary function, δ(R) yields the empty string ε if the language corresponding to R contains ε; otherwise, δ(R) yields ∅. This function can be computed by the following rules:[3]

δ(ε) = ε
δ(∅) = ∅
δ(R*) = ε
δ(RS) = δ(R) ∧ δ(S)
δ(RS) = δ(R) ∧ δ(S)
δ(RS) = δ(R) ∨ δ(S)
δ(¬R) = ε if δ(R) = ∅
δ(¬R) = ∅ if δ(R) = ε

Based on that, the derivative of a generalized regular expression with respect to a single-symbol string a can be computed as follows:[4]

a−1a = ε
a−1b = ∅ for each symbol ba
a−1ε = ∅
a−1 = ∅
a−1(R*) = a−1RR*
a−1(RS) = (a−1R)S ∨ δ(R)a−1S
a−1(RS) = (a−1R) ∧ (a−1S)
a−1(RS) = (a−1R) ∨ (a−1S)
a−1R) = ¬(a−1R)

Finally, for a symbol a, an arbitrary string u, and a generalized regular expression R, the derivative (ua)−1R can be computed recursively as a−1(u−1R); and ε−1R simply equals R.[5] This way, for any given generalized regular expression R and any string u, the derivative u−1R may be computed as another generalized regular expression.[6]

Properties

A string u is a member of the string set denoted by a generalized regular expression R if and only if ε is a member of the string set denoted by the derivative u−1R.[7]

Considering all the derivatives of a fixed generalized regular expression R results in only finitely many different languages. If their number is denoted by dR, all these languages can be obtained as derivatives of R with respect to string of length below dR.[8]

References

  1. Janusz A. Brzozowski (1964). "Derivatives of Regular Expressions". JACM 11: 481–494. doi:10.1145/321239.321249.
  2. Brzozowski (1964), p.481, required A to consist of the 2n combinations of n bits, for some n.
  3. Brzozowski (1964), p.482, Definition 3.2
  4. Brzozowski (1964), p.483, Theorem 3.1
  5. Brzozowski (1964), p.483, Theorem 3.2
  6. Brzozowski (1964), p.483, Theorem 4.1
  7. Brzozowski (1964), p.483, Theorem 4.2
  8. Brzozowski (1964), p.484, Theorem 4.3