Parsing Math Expressions in C#. Tokenization

The science is constantly evolving, the technology is rapidly breaking into our lives, new tools and opportunities are appearing, the only one thing that is permanent is the gloomy expression of the face of your interviewer. And since we’ve figured out how to read the emotions of the interviewer, then we still have problems with reading mathematical expressions. Therefore, in this post, we will analyze the simplest topic in the context of this task – splitting a source string into separate tokens.

As has been mentioned in the previous posts about the postfix notation and shunting-yard algorithm. We have the following types of tokens:

Arithmetic signs: +, -, *, /;
Round brackets (, );
Real numbers; for simplicity let’s use only integer numbers and the numbers with a dot delimiter;
Space is the only delimiter for tokens.

Let’s create a body for the main method in C#. We’re going to read an expression character by character and process them in-place. For our case, it’s enough while other lexical analyzers can require more complex approaches. In the very book, this topic is described quite well.

public class Tokenizer
{
    public IEnumerable<IToken> Parse(string expression)
    {
        foreach (char next in expression)
        {
            FeedCharacter(next);
        }
        return GetResult();
    }
}

First, we’re implementing the simplest case: an input line contains only one number, there are neither spaces no operators:

While there are characters read a character and add it to special storage;
If the storage is not empty then create a number based on the characters and add the number to the resulting sequence.

public class Tokenizer
{
    public Tokenizer()
    {
        _valueTokenBuilder = new StringBuilder();
        _infixNotationTokens = new List<IToken>();
    }

    public IEnumerable<IToken> Parse(string expression)
    {
        foreach (char next in expression)
        {
            FeedCharacter(next);
        }
        return GetResult();
    }

    private void FeedCharacter(char next)
    {
        _valueTokenBuilder.Append(next);
    }

    private IToken CreateOperandToken(string raw)
    {
        if (double.TryParse(raw, out double result))
        {
            return new OperandToken(result);
        }

        throw new SyntaxException($"The operand {raw} has an invalid format.");
    }

    private IEnumerable<IToken> GetResult()
    {
        if (_valueTokenBuilder.Length > 0)
        {
            var token = CreateOperandToken(_valueTokenBuilder.ToString());
            _valueTokenBuilder.Clear();
            _infixNotationTokens.Add(token);
        }

        return _infixNotationTokens.ToList();
    }

    private readonly StringBuilder _valueTokenBuilder;
    private readonly List<IToken> _infixNotationTokens;
}

Let’s complicate the task a bit by allowing the use of several numbers separated by spaces. Then the conversion into tokens can be represented as follows:

While there are characters take a character:
- If the character is a separator (a space in our case), then convert the previously saved characters in the storage to a number and add the number to the resulting sequence;
- Otherwise, add the character to the numbers storage.
If the storage contains characters then convert the characters to a number and add the number to the resulting sequence.

public class Tokenizer
{
    public Tokenizer()
    {
        _valueTokenBuilder = new StringBuilder();
        _infixNotationTokens = new List<IToken>();
    }

    public IEnumerable<IToken> Parse(string expression)
    {
        foreach (char next in expression)
        {
            FeedCharacter(next);
        }
        return GetResult();
    }

    private void FeedCharacter(char next)
    {
        if (IsSpacingCharacter(next))
        {
            if (_valueTokenBuilder.Length > 0)
            {
                var token = CreateOperandToken(_valueTokenBuilder.ToString());
                _valueTokenBuilder.Clear();
                _infixNotationTokens.Add(token);
            }
        }
        else
        {
            _valueTokenBuilder.Append(next);
        }
    }

    private bool IsSpacingCharacter(char c)
    {
        switch (c)
        {
            case ' ':
                return true;
            default:
                return false;
        }
    }

    private IToken CreateOperandToken(string raw)
    {
        if (double.TryParse(raw, out double result))
        {
            return new OperandToken(result);
        }

        throw new SyntaxException($"The operand {raw} has an invalid format.");
    }

    private IEnumerable<IToken> GetResult()
    {
        if (_valueTokenBuilder.Length > 0)
        {
            var token = CreateOperandToken(_valueTokenBuilder.ToString());
            _valueTokenBuilder.Clear();
            _infixNotationTokens.Add(token);
        }

        return _infixNotationTokens.ToList();
    }

    private readonly StringBuilder _valueTokenBuilder;
    private readonly List<IToken> _infixNotationTokens;
}

The last thing we need to add is a processor for operators. As in our problem statement, every operator can be represented by only a single character then when we meet an operator character during the processing we can convert it and put it to the resulting sequence at once. Then the final algorithm looks like this:

While there are characters take a character:
- If the character is a separator (a space in our case), then convert the previously saved characters in the numbers storage to a number and add the number to the resulting sequence;
- If the character is an operator then convert the previously saved characters in the numbers storage to a number and add the number to the resulting sequence, after which add the operator token to the resulting sequence;
- Otherwise, add the character to the numbers storage.
If the storage contains characters then convert the characters to a number and add the number to the resulting sequence.

public class Tokenizer
{
    public Tokenizer()
    {
        _valueTokenBuilder = new StringBuilder();
        _infixNotationTokens = new List<IToken>();
    }

    public IEnumerable<IToken> Parse(string expression)
    {
        foreach (char next in expression)
        {
            FeedCharacter(next);
        }
        return GetResult();
    }

    private void FeedCharacter(char next)
    {
        if (IsSpacingCharacter(next))
        {
            if (_valueTokenBuilder.Length > 0)
            {
                var token = CreateOperandToken(_valueTokenBuilder.ToString());
                _valueTokenBuilder.Clear();
                _infixNotationTokens.Add(token);
            }
        }
        else if (IsOperatorCharacter(next))
        {
            if (_valueTokenBuilder.Length > 0)
            {
                var token = CreateOperandToken(_valueTokenBuilder.ToString());
                _valueTokenBuilder.Clear();
                _infixNotationTokens.Add(token);
            }

            var operatorToken = CreateOperatorToken(next);
            _infixNotationTokens.Add(operatorToken);
        }
        else
        {
            _valueTokenBuilder.Append(next);
        }
    }

    private bool IsOperatorCharacter(char c)
    {
        switch (c)
        {
            case '(':
            case ')':
            case '+':
            case '-':
            case '*':
            case '/':
                return true;
            default:
                return false;
        }
    }

    private bool IsSpacingCharacter(char c)
    {
        switch (c)
        {
            case ' ':
                return true;
            default:
                return false;
        }
    }

    private IToken CreateOperandToken(string raw)
    {
        if (double.TryParse(raw, out double result))
        {
            return new OperandToken(result);
        }

        throw new SyntaxException($"The operand {raw} has an invalid format.");
    }

    private OperatorToken CreateOperatorToken(char c)
    {
        switch (c)
        {
            case '(': return new OperatorToken(OperatorType.OpeningBracket);
            case ')': return new OperatorToken(OperatorType.ClosingBracket);
            case '+': return new OperatorToken(OperatorType.Addition);
            case '-': return new OperatorToken(OperatorType.Subtraction);
            case '*': return new OperatorToken(OperatorType.Multiplication);
            case '/': return new OperatorToken(OperatorType.Division);
            default:
                throw new SyntaxException($"There's no a suitable operator for the char {c}");
        }
    }

    private IEnumerable<IToken> GetResult()
    {
        if (_valueTokenBuilder.Length > 0)
        {
            var token = CreateOperandToken(_valueTokenBuilder.ToString());
            _valueTokenBuilder.Clear();
            _infixNotationTokens.Add(token);
        }

        return _infixNotationTokens.ToList();
    }

    private readonly StringBuilder _valueTokenBuilder;
    private readonly List<IToken> _infixNotationTokens;
}

As always, the full implementation can be found on the GitHub of the coding-interview project, while with the post we are finishing the topic of parsing mathematical expressions. I hope that everything has been informative and interesting. See you!

Categories:Coding Interview

Tags:C#Mathematical Expressions Parser

itdranik

Cookie	Duration	Description
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category .
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_WSLJD2DY1N	2 years	This cookie is installed by Google Analytics.

Cookie	Duration	Description
wp_api	past	Cookies used to run and administer the WordPress blog, including session and time.
wp_api_sec	past	Cookies used to run and administer the WordPress blog, including session and time.

IT Dranik

Parsing Math Expressions in C#. Tokenization

Like this:

Leave a ReplyCancel reply

Share this:

Like this:

Leave a ReplyCancel reply