Simple Parsers for Simple Problems

work safe

One minor detail I’ve noticed about working on the Advent of Code problems this year is the ubiquity of parsing. Every problem has an input string. It’s actually a lesson I learn every year as I build up my library of methods of processing text.

Some years, it’s fairly straight forward. I understand the text handling in C# fairly well. In others, it is more complex. Swift seems to want you to build an application were the input comes from a structured source like JSON or a database as it uses more complex APIs for splitting strings and such. Elm, while lovely, doesn’t have a clean mechanism for just reading in a static file.

Do I need a parser?

This year, I’m working in C#, but I still noticed there was an anti-pattern I saw that kept cropping up. I was reading input by counting lines or repeated splits. Look at the example data for the Day 5 problem.

0,9 -> 5,9
1,2 -> 1,6

Each line has a definition of a vent line. The vent lines are each pairs of points separated by the ” ” string. I was using string.Split(string n) to pry the data apart. Another option is using Regular Expressions to match the parts of each row. I wanted a more structured method of handling this.

Parser Combinator

In other languages, I’ve worked Monadic Parser Combinators. They are libraries that let you write small parsers that can be combined with other small parsers to make larger and more complex parsers.

What is a parser? It is just a function or object that turns a string in to a more structured or typed piece of data.

I found Sprache. Sprache is a parser combinator library that lets you build a definition of your expected string input and how those pieces get translated in to data.

Building Our Parser

Let’s start with a point parser that generates an instance of Point from the input string x,y.

static Parser<Point> _pointParser =
    from x in Parse.Decimal.Select(int.Parse)
  from _ in Parse.Char(',')
  from y in Parse.Decimal.Select(int.Parse))
  select new Point(x,y);

You see that we can use a number of built in helpers and now I have a parser that I can use for any point string to build a valid type.

Point point = _pointParser.Parse("5,4");
point.X.Should().Be(5);
point.Y.Should().Be(4);

Since Parsers can be combined, we can build out a vent line definition.

static Parser<VentLine> _lineParser =
    from p1 in _pointParser
  from _ in Parse.String(" -> ");
  from p2 in _pointParser
  select new VentLine(p1,p2);

Now each line can be parsed in to a vent line. Sweet!

But each file specifies a list of vent lines separated by new line characters

static Parser<IEnumerable<VentLine>> _listParser =
    (from ventLine in _lineParser
   from _ = Parse.String("\n").Maybe()
   select ventLine).Many();

This is all we need to read the file.

var vents = _listParser.Parse(File.ReadAllText(filename))
                    .ToArray();

Solving the problem is now an exercise left to the reader. But at least the parser was easy!

More Parsers to Check Out

One of my coworkers who checked this post out had a couple of things he wanted me to add about choosing your parser library in the .NET ecosystem. For C# users, he suggested Sprache might not be the most performant. Superpower is based on Sprache and focuses on better error messages and better performance. Pidgin is a different lineage of parser library focusing on speed. I’ve been told I should really check out Pidgin once I have some spare time. And as a final note, If I was using F#, I would definitely be looking at FParsec, a port of the Haskell Parsec library that defined the genre of Parser Combinators.

No Comments

Leave a Reply

Allowed tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>