Shunting-yard algorithm

In computer science, the shunting-yard algorithm is a method for parsing mathematical expressions specified in infix notation. It can produce either a postfix notation string, also known as Reverse Polish notation (RPN), or an abstract syntax tree (AST). The algorithm was invented by Edsger Dijkstra and named the "shunting yard" algorithm because its operation resembles that of a railroad shunting yard. Dijkstra first described the shunting-yard algorithm in the Mathematisch Centrum report MR 34/61.

Like the evaluation of RPN, the shunting-yard algorithm is stack-based. Infix expressions are the form of mathematical notation most people are used to, for instance "3 + 4" or "3 + 4 × (2 − 1)". The conversion works on two text variables (strings), the input and the output, together with a stack that holds operators not yet added to the output queue. To convert, the program reads each symbol in order and acts according to the kind of symbol. The results for the above examples would be "3 4 +" and "3 4 2 1 − × +", respectively.
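Since the paragraph above compares the algorithm to RPN evaluation, a minimal sketch of such an evaluator may help show what "stack-based" means here. The function name eval_rpn, the OPS table, and the ASCII aliases are assumptions made for illustration; tokens are assumed to be separated by spaces.

```python
import operator

# Binary operators keyed by the symbols used in this article, plus ASCII
# aliases (the aliases are an assumption added for convenience).
OPS = {
    "+": operator.add,
    "−": operator.sub, "-": operator.sub,
    "×": operator.mul, "*": operator.mul,
    "÷": operator.truediv, "/": operator.truediv,
    "^": operator.pow,
}

def eval_rpn(expr):
    """Evaluate a space-separated RPN string using a single operand stack."""
    stack = []
    for token in expr.split():
        if token in OPS:
            right = stack.pop()   # the most recently pushed operand is the right-hand one
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

print(eval_rpn("3 4 +"))           # 7.0
print(eval_rpn("3 4 2 1 − × +"))   # 7.0
```

Both RPN strings from the examples above evaluate to 7, matching the infix expressions they were converted from.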

The shunting-yard algorithm was later generalized into operator-precedence parsing.

A simple conversion

Input: 3 + 4
  1. Push 3 to the output queue (whenever a number is read it is pushed to the output)
  2. Push + (or its ID) onto the operator stack
  3. Push 4 to the output queue
  4. After reading the expression, pop the operators off the stack and add them to the output.
    In this case there is only one, "+".
  5. Output: 3 4 +

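This already shows a couple of rules:
  - All numbers are pushed to the output when they are read.
  - At the end of reading the expression, all operators are popped off the stack and added to the output.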

Graphical illustration

Graphical illustration of the algorithm, using a three-way railroad junction. The input is processed one symbol at a time: if a variable or number is found, it is copied directly to the output (steps a, c, e, h). If the symbol is an operator, it is pushed onto the operator stack (steps b, d, f). If the operator's precedence is lower than that of the operator at the top of the stack, or the precedences are equal and the operator is left associative, then the operator at the top of the stack is popped off and added to the output (step g). Finally, any remaining operators are popped off the stack and added to the output (step i).

The algorithm in detail

Important terms: Token, Function, Operator associativity, Precedence

while there are tokens to be read:
	read a token.
	if the token is a number, then push it to the output queue.
	if the token is a function, then push it onto the operator stack.
	if the token is an operator o1, then:
		while there is an operator o2 (not a left bracket) at the top of the operator stack, and
			either o2 has greater precedence than o1,
			or o1 and o2 have equal precedence and o1 is left associative:
				pop o2 from the operator stack onto the output queue.
		push o1 onto the operator stack.
	if the token is an argument separator (i.e. ","), then:
		while the operator at the top of the operator stack is not a left bracket:
			pop the operator from the operator stack onto the output queue.
	if the token is a left bracket (i.e. "("), then:
		push it onto the operator stack.
	if the token is a right bracket (i.e. ")"), then:
		while the operator at the top of the operator stack is not a left bracket:
			pop the operator from the operator stack onto the output queue.
		/* if the stack runs out without finding a left bracket, then there are
		mismatched parentheses. */
		pop the left bracket from the stack and discard it.
		if there is a function at the top of the operator stack, then:
			pop the function from the operator stack onto the output queue.
after all tokens have been read:
	while there are still operator tokens on the stack:
		/* if the operator token on the top of the stack is a bracket, then
		there are mismatched parentheses. */
		pop the operator onto the output queue.
exit.
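
The pseudocode above can be sketched in Python roughly as follows. This is an illustrative sketch, not a reference implementation: it handles numbers, binary operators, and brackets only, and it assumes the input has already been split into tokens. The names to_rpn and OPERATORS are hypothetical, and the precedence values follow the operator table used in the detailed example.

```python
# Precedence and associativity, following the operator table in this article.
OPERATORS = {
    "^": (4, "right"),
    "×": (3, "left"),
    "÷": (3, "left"),
    "+": (2, "left"),
    "−": (2, "left"),
}

def to_rpn(tokens):
    """Convert a list of infix tokens to a list of RPN tokens."""
    output = []   # output queue
    stack = []    # operator stack; the top is the end of the list
    for token in tokens:
        if token in OPERATORS:
            prec, assoc = OPERATORS[token]
            # Pop operators that bind tighter, or equally tight when the
            # incoming operator is left associative.
            while stack and stack[-1] in OPERATORS:
                top_prec = OPERATORS[stack[-1]][0]
                if top_prec > prec or (top_prec == prec and assoc == "left"):
                    output.append(stack.pop())
                else:
                    break
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()   # discard the left bracket
        else:
            output.append(token)   # anything else is treated as a number
    while stack:
        if stack[-1] == "(":
            raise ValueError("mismatched parentheses")
        output.append(stack.pop())
    return output

print(" ".join(to_rpn("3 + 4 × ( 2 − 1 )".split())))   # 3 4 2 1 − × +
```

Using a plain list as the stack keeps the sketch short; the output is also a list, so joining it with spaces gives the RPN string.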

To analyze the running time of this algorithm, note that each token is read once, each number, function, or operator is printed once, and each function, operator, or parenthesis is pushed onto the stack and popped off the stack once. There are therefore at most a constant number of operations executed per token, and the running time is O(n), that is, linear in the size of the input.
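
The counting argument above can be checked on a concrete input with an instrumented sketch that only tracks the operator stack. The function name count_stack_operations and the two lookup tables are hypothetical, and the sketch assumes a well-formed expression with balanced brackets.

```python
PRECEDENCE = {"^": 4, "×": 3, "÷": 3, "+": 2, "−": 2}
RIGHT_ASSOCIATIVE = {"^"}

def count_stack_operations(tokens):
    """Run the conversion and return (pushes, pops) on the operator stack.

    Assumes balanced brackets; the output queue itself is not built.
    """
    stack, pushes, pops = [], 0, 0
    for token in tokens:
        if token in PRECEDENCE:
            while stack and stack[-1] in PRECEDENCE and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[token]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[token]
                    and token not in RIGHT_ASSOCIATIVE)
            ):
                stack.pop()
                pops += 1
            stack.append(token)
            pushes += 1
        elif token == "(":
            stack.append(token)
            pushes += 1
        elif token == ")":
            while stack[-1] != "(":
                stack.pop()
                pops += 1
            stack.pop()   # the matching left bracket
            pops += 1
        # numbers go straight to the output and never touch the stack
    pops += len(stack)    # final flush of the remaining operators
    return pushes, pops

tokens = "3 + 4 × 2 ÷ ( 1 − 5 ) ^ 2 ^ 3".split()
pushes, pops = count_stack_operations(tokens)
print(len(tokens), pushes, pops)   # 15 7 7
```

Every pushed token is popped exactly once, and neither count can exceed the number of tokens, which is the linear bound stated above.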

The shunting-yard algorithm can also be applied to produce prefix notation (also known as Polish notation). To do this, one would start from the end of the string of tokens and work backwards, reverse the output queue (making the output queue an output stack), and flip the left and right parenthesis behavior (remembering that the now-left parenthesis behavior should pop until it finds a now-right parenthesis). Because the scan runs backwards, the associativity test must be mirrored as well: an operator of equal precedence is popped only when the incoming operator is right associative.
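
A possible sketch of this right-to-left variant is shown below. The names to_prefix, PRECEDENCE, and RIGHT_ASSOCIATIVE are hypothetical, and the input is assumed to be tokenized already. The sketch mirrors the associativity test as well as the parenthesis roles, since a backwards scan would otherwise group a − b − c as a − (b − c).

```python
PRECEDENCE = {"^": 4, "×": 3, "÷": 3, "+": 2, "−": 2}
RIGHT_ASSOCIATIVE = {"^"}

def to_prefix(tokens):
    """Convert infix tokens to prefix tokens by scanning right to left."""
    output = []   # built back to front, reversed at the end
    stack = []
    for token in reversed(tokens):
        if token in PRECEDENCE:
            # Scanning backwards mirrors associativity: equal precedence
            # triggers a pop only for right-associative operators.
            while stack and stack[-1] in PRECEDENCE and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[token]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[token]
                    and token in RIGHT_ASSOCIATIVE)
            ):
                output.append(stack.pop())
            stack.append(token)
        elif token == ")":
            stack.append(token)   # plays the role of "(" in the forward scan
        elif token == "(":
            while stack and stack[-1] != ")":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()           # discard the matching ")"
        else:
            output.append(token)  # anything else is treated as a number
    while stack:
        if stack[-1] == ")":
            raise ValueError("mismatched parentheses")
        output.append(stack.pop())
    return output[::-1]

print(" ".join(to_prefix("3 + 4 × ( 2 − 1 )".split())))   # + 3 × 4 − 2 1
print(" ".join(to_prefix("8 − 3 − 2".split())))           # − − 8 3 2
```

The second call shows why the associativity flip matters: the result reads as (8 − 3) − 2, which is the left-associative grouping.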

Detailed example

Input: 3 + 4 × 2 ÷ ( 1 − 5 ) ^ 2 ^ 3

Operator  Precedence  Associativity
^         4           Right
×         3           Left
÷         3           Left
+         2           Left
−         2           Left

The symbol ^ represents the power operator.

Token  Action                      Output (in RPN)            Operator stack  Notes
3      Add token to output         3
+      Push token to stack         3                          +
4      Add token to output         3 4                        +
×      Push token to stack         3 4                        × +             × has higher precedence than +
2      Add token to output         3 4 2                      × +
÷      Pop stack to output         3 4 2 ×                    +               ÷ and × have same precedence
       Push token to stack         3 4 2 ×                    ÷ +             ÷ has higher precedence than +
(      Push token to stack         3 4 2 ×                    ( ÷ +
1      Add token to output         3 4 2 × 1                  ( ÷ +
−      Push token to stack         3 4 2 × 1                  − ( ÷ +
5      Add token to output         3 4 2 × 1 5                − ( ÷ +
)      Pop stack to output         3 4 2 × 1 5 −              ( ÷ +           Repeated until "(" found
       Pop stack                   3 4 2 × 1 5 −              ÷ +             Discard matching parenthesis
^      Push token to stack         3 4 2 × 1 5 −              ^ ÷ +           ^ has higher precedence than ÷
2      Add token to output         3 4 2 × 1 5 − 2            ^ ÷ +
^      Push token to stack         3 4 2 × 1 5 − 2            ^ ^ ÷ +         ^ is evaluated right-to-left
3      Add token to output         3 4 2 × 1 5 − 2 3          ^ ^ ÷ +
end    Pop entire stack to output  3 4 2 × 1 5 − 2 3 ^ ^ ÷ +

Input: sin ( max ( 2, 3 ) ÷ 3 × 3.1415 )

Token   Action                      Output (in RPN)           Operator stack
sin     Push token to stack                                   sin
max     Push token to stack                                   max sin
2       Add token to output         2                         max sin
3       Add token to output         2 3                       max sin
÷       Pop stack to output         2 3 max                   sin
        Push token to stack         2 3 max                   ÷ sin
3       Add token to output         2 3 max 3                 ÷ sin
×       Pop stack to output         2 3 max 3 ÷               sin
        Push token to stack         2 3 max 3 ÷               × sin
3.1415  Add token to output         2 3 max 3 ÷ 3.1415        × sin
end     Pop entire stack to output  2 3 max 3 ÷ 3.1415 × sin

The trace omits the rows for the parentheses and the comma.
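
The function example above can be reproduced with a sketch that adds two rules to the basic conversion: a function name is pushed onto the operator stack, and it is moved to the output once its closing bracket has been matched. The names to_rpn_with_functions and FUNCTIONS are hypothetical, the set of known functions is an assumption, and the comma is assumed to arrive as its own token.

```python
PRECEDENCE = {"^": 4, "×": 3, "÷": 3, "+": 2, "−": 2}
RIGHT_ASSOCIATIVE = {"^"}
FUNCTIONS = {"sin", "max"}   # hypothetical set of known function names

def to_rpn_with_functions(tokens):
    """Convert infix tokens, including function calls, to RPN tokens."""
    output, stack = [], []
    for token in tokens:
        if token in FUNCTIONS:
            stack.append(token)
        elif token == ",":
            # An argument separator flushes operators back to the enclosing "(".
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("misplaced separator or mismatched parentheses")
        elif token in PRECEDENCE:
            while stack and stack[-1] in PRECEDENCE and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[token]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[token]
                    and token not in RIGHT_ASSOCIATIVE)
            ):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()   # discard the left bracket
            if stack and stack[-1] in FUNCTIONS:
                output.append(stack.pop())   # the call is now complete
        else:
            output.append(token)   # anything else is treated as a number
    while stack:
        if stack[-1] == "(":
            raise ValueError("mismatched parentheses")
        output.append(stack.pop())
    return output

tokens = "sin ( max ( 2 , 3 ) ÷ 3 × 3.1415 )".split()
print(" ".join(to_rpn_with_functions(tokens)))   # 2 3 max 3 ÷ 3.1415 × sin
```

The printed result matches the final output row of the table. The intermediate steps differ slightly because the sketch processes the bracket and comma tokens explicitly.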

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.