Parsing Math Expressions in C#. Shunting-Yard Algorithm

Winter is the coldest time of the year, but even it is not as cold as the interviewer’s reaction to your implementation of the calculator for the postfix notation. Nevertheless, we do not have time for emotions, therefore, we’re proceeding to analyze the shunting-yard algorithm mentioned in the previous article.

This algorithm is used for transforming the math expressions written in the infix notation to the postfix notation. We’re not going to give a complete proof of this algorithm’s correctness but only partial explanations to make the implementation more intuitive.

Let’s start with the simplest examples: applying a binary operator to operands. For example, for the expression 2 + 3 a similar expression in the postfix notation would be 2 3 +. The same is true for other binary operators. Let’s try to create a simple model in C# for working with such expressions:

  • While an expression has tokens take a token:
    • If the token is an operand then add it to the final sequence;
    • If the token is an operator then put it to some temporary storage.
  • Take the previously saved operator from the storage and add it to the final sequence.
public class ShuntingYardAlgorithm
{
    public ShuntingYardAlgorithm()
    {
        _operatorStorage = null;
        _postfixNotationTokens = new List<IToken>();
    }

    public IEnumerable<IToken> Apply(IEnumerable<IToken> infixNotationTokens)
    {
        foreach (var token in infixNotationTokens)
        {
            ProcessToken(token);
        }
        return GetResult();
    }

    private void ProcessToken(IToken token)
    {
        switch (token)
        {
            case OperandToken operandToken:
                StoreOperand(operandToken);
                break;
            case OperatorToken operatorToken:
                PushOperator(operatorToken);
                break;
            default:
                var exMessage = $"An unknown token type: {token.GetType().ToString()}.";
                throw new SyntaxException(exMessage);
        }
    }

    private void StoreOperand(OperandToken operandToken)
    {
        _postfixNotationTokens.Add(operandToken);
    }

    private void PushOperator(OperatorToken operatorToken)
    {
        _operatorStorage = operatorToken;
    }

    private List<IToken> GetResult()
    {
        if (_operatorStorage.HasValue)
        {
            _postfixNotationTokens.Add(_operatorStorage.Value);
        }
        return _postfixNotationTokens;
    }

    private OperatorToken? operatorStorage;
    private readonly List<IToken> _postfixNotationTokens;
}

All this works great until we want to complicate the original expression and add another operator 2 + 3 - 4. For the following case, a similar expression in the postfix notation would be 2 3 + 4 -. Note that the previous algorithm cannot store more than one operator in the temporary storage, so we’re losing one operator in the calculation process and as a result, we would get 2 3 4 -. This cannot but upset, therefore, let’s try to add an improvement to our conversion algorithm.

  • While an expression has tokens take a token:
    • If the token is an operand then add it to the final sequence;
    • If the token is an operator then put it to some temporary storage. If the storage contains another operator, then we move the previously saved operator to the resulting sequence, and the new operator is saved to the storage.
  • Take the last saved operator from the storage and add it to the final sequence.
public class ShuntingYardAlgorithm
{
    public ShuntingYardAlgorithm()
    {
        _operatorStorage = null;
        _postfixNotationTokens = new List<IToken>();
    }

    public IEnumerable<IToken> Apply(IEnumerable<IToken> infixNotationTokens)
    {
        foreach (var token in infixNotationTokens)
        {
            ProcessToken(token);
        }
        return GetResult();
    }

    private void ProcessToken(IToken token)
    {
        switch (token)
        {
            case OperandToken operandToken:
                StoreOperand(operandToken);
                break;
            case OperatorToken operatorToken:
                PushOperator(operatorToken);
                break;
            default:
                var exMessage = $"An unknown token type: {token.GetType().ToString()}.";
                throw new SyntaxException(exMessage);
        }
    }

    private void StoreOperand(OperandToken operandToken)
    {
        _postfixNotationTokens.Add(operandToken);
    }

    private void PushOperator(OperatorToken operatorToken)
    {
        if (_operatorStorage.HasValue)
        {
            _postfixNotationTokens.Add(_operatorStorage.Value);
        }
        _operatorStorage = operatorToken;
    }

    private List<IToken> GetResult()
    {
        if (_operatorStorage.HasValue)
        {
            _postfixNotationTokens.Add(_operatorStorage.Value);
        }
        return _postfixNotationTokens;
    }

    private OperatorToken? operatorStorage;
    private readonly List<IToken> _postfixNotationTokens;
}

Beautiful! But what to do if operators have different priorities? 2 + 3 * 4 this expression should be converted into the following one 2 3 4 * +, but not to the expression 2 3 + 4 *. To further improve the algorithm, we also need to consider the expression 2 * 3 + 4 and its counterpart in the postfix notation 2 3 * 4 +. From where we can make a conclusion that our algorithm does not work properly only for the cases when the next operator has a higher priority than the previous one. Moreover, in the case when the next operator has a higher priority, the operators should, as it were, swap places in the resulting expression. So it must be inferred that our temporary storage should become a stack.

  • While an expression has tokens take a token:
    • If the token is an operand then add it to the final sequence;
    • If the token is an operator then put it to some temporary storage. If the storage contains another operator, then we move the previously saved operator to the resulting sequence, and the new operator is saved to the storage.
    • If the token is an operator:
      • While the stack is not empty and the top of the stack contains an operator with an equal or greater priority (>=), move the stored operator from the stack to the resulting sequence;
      • Store this new operator to the stack.
  • Take the last saved operator from the storage and add it the saved operators from the stack and move them to the final sequence.
public class ShuntingYardAlgorithm
{
    public ShuntingYardAlgorithm()
    {
        _operatorsStack = new Stack<OperatorToken>();
        _postfixNotationTokens = new List<IToken>();
    }

    public IEnumerable<IToken> Apply(IEnumerable<IToken> infixNotationTokens)
    {
        foreach (var token in infixNotationTokens)
        {
            ProcessToken(token);
        }
        return GetResult();
    }

    private void ProcessToken(IToken token)
    {
        switch (token)
        {
            case OperandToken operandToken:
                StoreOperand(operandToken);
                break;
            case OperatorToken operatorToken:
                PushOperator(operatorToken);
                break;
            default:
                var exMessage = $"An unknown token type: {token.GetType().ToString()}.";
                throw new SyntaxException(exMessage);
        }
    }

    private void StoreOperand(OperandToken operandToken)
    {
        _postfixNotationTokens.Add(operandToken);
    }

    private void PushOperator(OperatorToken operatorToken)
    {
        var operatorPriority = GetOperatorPriority(operatorToken);

        while (_operatorsStack.Count > 0)
        {
            var stackOperatorToken = _operatorsStack.Peek();
            var stackOperatorPriority = GetOperatorPriority(stackOperatorToken);
            if (stackOperatorPriority < operatorPriority)
            {
                break;
            }

            _postfixNotationTokens.Add(_operatorsStack.Pop());
        }

        _operatorsStack.Push(operatorToken);
    }

    private int GetOperatorPriority(OperatorToken operatorToken)
    {
        switch (operatorToken.OperatorType)
        {
            case OperatorType.Addition:
            case OperatorType.Subtraction:
                return 1;
            case OperatorType.Multiplication:
            case OperatorType.Division:
                return 2;
            default:
                var exMessage = "An unexpected action for the operator: " +
                    $"{operatorToken.OperatorType}.";
                throw new SyntaxException(exMessage);
        }
    }

    private List<IToken> GetResult()
    {
        while (_operatorsStack.Count > 0)
        {
            var token = _operatorsStack.Pop();
            _postfixNotationTokens.Add(token);
        }
        return _postfixNotationTokens.ToList();
    }

    private readonly Stack<OperatorToken> _operatorsStack;
    private readonly List<IToken> _postfixNotationTokens;
}

And as the final touch is to add support for brackets that can change the priority of the operators execution. Let’s consider the following expression (2 + 3 * 4) / 5 and a counterpart for the postfix notation 2 3 4 * + 5 /. In other words, the expression in the brackets must be calculated before division. Let’s extend the operator types from the previous article with the bracket operators:

public enum OperatorType
{
    Addition,
    Subtraction,
    Multiplication,
    Division,
    OpeningBracket,
    ClosingBracket
}

The closing bracket token is essentially a signal that notifies that a calculation is necessary at this point. Since the postfix calculator is sequential, the later an operator appears in the postfix notation, the later its execution occurs. Therefore, we extend the algorithm with the following rules:

  • While an expression has tokens take a token:
    • If the token is an operand then add it to the final sequence;
    • If the token is an operator (but not a bracket):
      • While the stack is not empty and the top of the stack contains an operator with an equal or greater priority (>=), move the stored operator from the stack to the resulting sequence;
      • Store this new operator to the stack.
    • If the token is an opening bracket then push it to the stack;
    • If the token is a closing bracket, then we take the operators from the stack and save them in the resulting sequence until we meet an opening bracket. If an opening bracket is not found, then we signal about the error in the expression.
  • Take the saved operators from the stack and move them to the final sequence. If there is an opening bracket among the operators, then we signal about the error in the expression.
public class ShuntingYardAlgorithm
{
    public ShuntingYardAlgorithm()
    {
        _operatorsStack = new Stack<OperatorToken>();
        _postfixNotationTokens = new List<IToken>();
    }

    public IEnumerable<IToken> Apply(IEnumerable<IToken> infixNotationTokens)
    {
        Reset();
        foreach (var token in infixNotationTokens)
        {
            ProcessToken(token);
        }
        return GetResult();
    }

    private void Reset()
    {
        _operatorsStack.Clear();
        _postfixNotationTokens.Clear();
    }


    private void ProcessToken(IToken token)
    {
        switch (token)
        {
            case OperandToken operandToken:
                StoreOperand(operandToken);
                break;
            case OperatorToken operatorToken:
                ProcessOperator(operatorToken);
                break;
            default:
                var exMessage = $"An unknown token type: {token.GetType().ToString()}.";
                throw new SyntaxException(exMessage);
        }
    }

    private void StoreOperand(OperandToken operandToken)
    {
        _postfixNotationTokens.Add(operandToken);
    }

    private void ProcessOperator(OperatorToken operatorToken)
    {
        switch (operatorToken.OperatorType)
        {
            case OperatorType.OpeningBracket:
                PushOpeningBracket(operatorToken);
                break;
            case OperatorType.ClosingBracket:
                PushClosingBracket(operatorToken);
                break;
            default:
                PushOperator(operatorToken);
                break;
        }
    }

    private void PushOpeningBracket(OperatorToken operatorToken)
    {
        _operatorsStack.Push(operatorToken);
    }

    private void PushClosingBracket(OperatorToken operatorToken)
    {
        bool openingBracketFound = false;

        while (_operatorsStack.Count > 0)
        {
            var stackOperatorToken = _operatorsStack.Pop();
            if (stackOperatorToken.OperatorType == OperatorType.OpeningBracket)
            {
                openingBracketFound = true;
                break;
            }

            _postfixNotationTokens.Add(stackOperatorToken);
        }

        if (!openingBracketFound)
        {
            throw new SyntaxException("An unexpected closing bracket.");
        }
    }

    private void PushOperator(OperatorToken operatorToken)
    {
        var operatorPriority = GetOperatorPriority(operatorToken);

        while (_operatorsStack.Count > 0)
        {
            var stackOperatorToken = _operatorsStack.Peek();
            if (stackOperatorToken.OperatorType == OperatorType.OpeningBracket)
            {
                break;
            }

            var stackOperatorPriority = GetOperatorPriority(stackOperatorToken);
            if (stackOperatorPriority < operatorPriority)
            {
                break;
            }

            _postfixNotationTokens.Add(_operatorsStack.Pop());
        }

        _operatorsStack.Push(operatorToken);
    }

    private int GetOperatorPriority(OperatorToken operatorToken)
    {
        switch (operatorToken.OperatorType)
        {
            case OperatorType.Addition:
            case OperatorType.Subtraction:
                return 1;
            case OperatorType.Multiplication:
            case OperatorType.Division:
                return 2;
            default:
                var exMessage = "An unexpected action for the operator: " +
                    $"{operatorToken.OperatorType}.";
                throw new SyntaxException(exMessage);
        }
    }

    private List<IToken> GetResult()
    {
        while (_operatorsStack.Count > 0)
        {
            var token = _operatorsStack.Pop();
            if (token.OperatorType == OperatorType.OpeningBracket)
            {
                throw new SyntaxException("A redundant opening bracket.");
            }
            _postfixNotationTokens.Add(token);
        }
        return _postfixNotationTokens.ToList();
    }

    private readonly Stack<OperatorToken> _operatorsStack;
    private readonly List<IToken> _postfixNotationTokens;
}

The full implementation can be found on GitHub (the coding-interview project). This will finish the review of the shunting-yard algorithm, and without wasting time, we’re moving on to the tokenization algorithm described in the next post.

Leave a Reply