Parsing Math Expressions in C#. Postfix Notation

Well, hello, dear reader!

Who else but we, software engineers, that burned out by the life and burned down by the light of the screen, knows how contradictory, difficult and unjust this world of technical interviews is! Whether your solutions launch spaceships or your service keeps millions of requests, an experienced interviewer will always bring you to light. Aren’t you able to immediately recall the difference between the merge sort and insertion sort, don’t you know all the SOLID principles by heart? Oh, my friend, unlike you, the interviewer has prepared more seriously and in the morning has repeated these important topics. Your incompetence is obvious. See you! Bye!

For sure, next time these questions will not take you by surprise, you are amazed yourself if a person cannot immediately implement the heapsort or does not remember by heart all the rules for balancing red-black trees. You are invited to the interview again and isn’t it just an amazing coincidence that here is the same interviewer? It is the moment of truth, now you know everything about sorting, you’re ready to start, while suddenly you see the problem statement: “Implement a parser for math expressions”.

Wholly unexpectable? Sure. So, I suggest right here and now to deal with the theme. Let’s begin!

Problem statement: implement a parser for mathematical expressions.

Of course, it’s quite inconvenient and difficult to work with the formulation, so it’s desirable to follow the rule: discuss constraints first, then proceed with an implementation. If you want to put the interviewer in an ecstatic state then it’s worth proposing to register the constraints as a set of tests.

We’re introducing the following set of constraints:

  • An expression cannot contain variables or named constants; it consists only of numbers, arithmetic signs, and brackets;
  • The following arithmetic signs are allowed: +, -, *, /;
  • Only round brackets are allowed;
  • If the expression contains a syntax or computational error, then it is necessary to return an error message.

The difficulty of parsing math expressions lies in the fact that mathematical operations have different priorities, which can be also changed by using brackets. Does it sound complicated? In fact, it’s enough to convert the expression from the infix notation to the postfix notation using the shunting-yard algorithm, do the calculation and that’s it! It would seem that we can end the post at this, but most likely the evil interviewer will want to delve into the details, or even worse, will require implementation. Therefore, we continue.

Infix notation is a form of writing mathematical expressions where operators (in our case, arithmetic signs) are written between the operands (in our case, numbers) to which they are applied. The order of calculations is determined by the priority of the operators, but can be adjusted by brackets. Thus, the infix notation is the standard form of expression writing that we were taught back in school, for example: 1 + (2 - 3) * 4.

Postfix notation is a form of writing mathematical expressions where operators are written after the operands to which they are applied. For this notation, there is no priority for operations, and therefore there are no brackets. For example, the expression above corresponds to one of the following expressions in the postfix notation: 2 3 - 4 * 1 + or 4 2 3 - * 1 +.

The full implementation in C# can be found on GitHub (the coding-interview project), but for now, let’s analyze the algorithm for evaluating an expression written in the postfix notation. Let’s create a class that receives a sequential set of tokens (operators and operands) as the input, and as the output, it returns a result in the form of a calculated operand.

public class PostfixNotationCalculator
{
    public OperandToken Calculate(IEnumerable<IToken> tokens)
    {
        foreach (var token in tokens)
        {
            ProcessToken(token);
        }
        return GetResult();
    }

    private void ProcessToken(IToken token)
    {
        switch (token)
        {
            case OperandToken operandToken:
                StoreOperand(operandToken);
                break;
            case OperatorToken operatorToken:
                ApplyOperator(operatorToken);
                break;
            default:
                var exMessage = $"An unknown token type: {token.GetType().ToString()}";
                throw new SyntaxException(exMessage);
        }
    }

    private void StoreOperand(OperandToken operandToken)
    {
        // Discussed below
    }

    private void ApplyOperator(OperatorToken operatorToken)
    {
        // Discussed below
    }

    private OperandToken GetResult()
    {
        // Discussed below
    }
}

public interface IToken { }

public class OperandToken : IToken
{
    public double Value { get { return _value; } }

    public OperandToken(double value)
    {
        _value = value;
    }

    private readonly double _value;
}

public enum OperatorType
{
    Addition,
    Subtraction,
    Multiplication,
    Division
}

public class OperatorToken : IToken
{
    public OperatorType OperatorType { get { return _operatorType; } }

    public OperatorToken(OperatorType operatorType)
    {
        _operatorType = operatorType;
    }

    private OperatorType _operatorType;
}

public class MathExpressionException : Exception
{
    public MathExpressionException(string message) : base(message)
    {
    }
}

public class SyntaxException : MathExpressionException
{
    public SyntaxException(string message) : base(message)
    {
    }
}

If a token is an operand then we add it to the operands stack.

public class PostfixNotationCalculator
{
    public PostfixNotationCalculator()
    {
        _operandTokensStack = new Stack<OperandToken>();
    }

    // ...

    private void StoreOperand(OperandToken operandToken)
    {
        _operandTokensStack.Push(operandToken);
    }

    // ...

    private Stack<OperandToken> _operandTokensStack;
}

If the token is an operator, then we get the necessary number of operands from the stack, perform the calculation and push the result of the calculation back to the operands stack. If the stack does not have the necessary number of elements for applying the operator, then we return an error.

public class PostfixNotationCalculator
{
    // ...

    private void ApplyOperator(OperatorToken operatorToken)
    {
        switch (operatorToken.OperatorType)
        {
            case OperatorType.Addition:
                ApplyAdditionOperator();
                break;
            case OperatorType.Subtraction:
                ApplySubtractionOperator();
                break;
            case OperatorType.Multiplication:
                ApplyMultiplicationOperator();
                break;
            case OperatorType.Division:
                ApplyDivisionOperator();
                break;
            default:
                var exMessage = $"An unknown operator type: {operatorToken.OperatorType}.";
                throw new SyntaxException(exMessage);
        }
    }

    private void ApplyAdditionOperator()
    {
        var operands = GetBinaryOperatorArguments();
        var result = new OperandToken(operands.Item1.Value + operands.Item2.Value);
        _operandTokensStack.Push(result);
    }

    private void ApplySubtractionOperator()
    {
        var operands = GetBinaryOperatorArguments();
        var result = new OperandToken(operands.Item1.Value - operands.Item2.Value);
        _operandTokensStack.Push(result);
    }

    private void ApplyMultiplicationOperator()
    {
        var operands = GetBinaryOperatorArguments();
        var result = new OperandToken(operands.Item1.Value * operands.Item2.Value);
        _operandTokensStack.Push(result);
    }

    private void ApplyDivisionOperator()
    {
        var operands = GetBinaryOperatorArguments();
        var result = new OperandToken(operands.Item1.Value / operands.Item2.Value);
        _operandTokensStack.Push(result);
    }

    private Tuple<OperandToken, OperandToken> GetBinaryOperatorArguments()
    {
        if (_operandTokensStack.Count < 2)
        {
            var exMessage = "Not enough arguments for applying a binary operator.";
            throw new SyntaxException(exMessage);
        }

        var right = _operandTokensStack.Pop();
        var left = _operandTokensStack.Pop();

        return Tuple.Create(left, right);
    }

    // ...
}

It remains only to add the returning of a result, i.e. the only left element in the stack. If after the processing of all the tokens there is more than one element in the stack, this says about the presence of a mismatch between operators and operands. If there are no elements in the stack, then this means that an empty sequence has been fed to the input.

class PostfixNotationCalculator
{
    // ...

    private OperandToken GetResult()
    {
        if (_operandTokensStack.Count == 0)
        {
            var exMessage = "The expression is invalid." +
                " Check, please, that the expression is not empty.";
            throw new SyntaxException(exMessage);
        }

        if (_operandTokensStack.Count != 1)
        {
            var exMessage = "The expression is invalid." +
                " Check, please, that you're providing the full expression and" +
                " the tokens have a correct order.";
            throw new SyntaxException(exMessage);
        }

        return _operandTokensStack.Pop();
    }

    // ...
}

At this, I need to say goodbye while in the next post we will analyze the implementation of the shunting-yard algorithm.

Leave a Reply