Category Theory and Software Engineering

February 5, 2014

This is a write-up for the book Categories for the Working Mathematician written by Saunders Mac Lane.  Someone indicated to me that they would be interested in seeing the write-up in blog format.  So here we go.

Initially, I didn’t think that category theory was necessarily something that really fits in well with your typically expected professional development.  I only started researching it for personal reasons because I stumbled across it while investigating type systems.  Unfortunately, certain developments[1] hint that category theory may end up being a lot more important for computer science, and by extension software engineering, than the seemingly endless waves of “monads are a burrito” blog postings, oh so popular a couple years ago, suggested.

First up.  Can we ignore category theory if we avoid esoteric programming languages that no serious, professional software engineer would be caught dead using in production?  After all we use respectful languages like C# that have the backing of massive multinational corporations like Microsoft.  With synergies like this who needs categories right?  Well, I have some bad news for you.  LINQ has category theory embedded into its core[2], so unless you’re interested in using archaic versions of C# it’s already too late.

If category theory really is so vital to LINQ, why haven’t we already started running into problems?  Well, things get kind of weird here.  Even though there are some very interesting parallels between math and programming, I believe that mathematicians and programmers have their interests piqued in radically different ways.  Mathematicians, far as I can tell, are very interested in proofs, properties, and relationships.  Things get very abstract.  As a result there were a bunch of proofs lying around in category theory that were minding their own business when some rather clever researchers showed up and realized they could exploit them in programming languages.  You can get surprisingly far by simply checking the documentation and then believing that the thing actually works correctly.  However, the reason why they built LINQ in the first place was because they had category theory guiding them.

Now, I think there is a very good reason why we need to have at least a base level of understanding of category theory.  We write high quality custom software.  This puts us in at least an implicit advisory role for our clients.  Part of our professional duty is to be able to differentiate between the usage of legitimate software engineering techniques and baseless hype.  Category theory is new enough to and different enough from typical computer science that we should expect a mix of quality improvements and malformed anti-patterns, which are difficult to validate to the inexperienced.  Our job is to utilize advancements that provide our clients with real value while avoiding the pitfalls born from excess excitement and lack of experience.

Also consider that we are currently in an era where there exists well known and powerful organizations that have a multinational influence, which have downright embarrassing failures due to software.  Software correctness and security is going to be much more important moving forward and category theory has some interesting avenues to achieve this with experimentation already in progress.  I would much rather be one of the people ushering in a new era of software reliability as opposed to being one of the people watching it from the sidelines.

With that massive justification out of the way, what does this book actually offer to the software engineer interested in checking out category theory.  To date this is my favorite source of information of category theory.  I believe this is because the goal of this book is describing category theory to mathematicians.  They’re not trying to dumb down the material or only choosing to talk about the aspects as they relate to computer science and/or software engineering.  So in other words, this book really feels like it’s treating you like an adult.  Giving you category theory as it actually is.  Of course the down side of this is that they expect you to be a mathematician.  And with respect to that, they’re pretty unforgiving.

Before stepping into this book you should probably checkout group theory[3] and ring theory[4].  Not to mention you should also have some basic set theory knowledge and be willing to look up what different mathematical notation means.  Category theory started off as a generalization of algebraic topology[5], so it might be useful to check that out as well.  Although I was able to get what I was interested in out of this book without a lot of topology knowledge, so your mileage may vary.  As a side note, algebraic topology might also have important consequences for software engineering (specifically homotopy).  However, these developments are very new, so advantages to software engineering may not be forthcoming.  On the other hand, if you happen to like algebraic topology, you might consider spending some time with it because there’s a chance it might become very relevant.

Now you can wade into the category theory book itself.  A lot of it is going to be very difficult to get up to speed on although I suspect that almost definitely doesn’t matter.  I mentioned earlier that mathematicians are interested in different things than programmers are.  A category theorist is going to be interested in pushing the bounds of what’s possible in the field of category theory.  Showing a new relation, finding a new lemma, proving a new property.  If you’re only interested in utilizing preexisting results of category theory in software engineering, then you can get away with surface level knowledge and just the gist of things in a lot of cases.  Trust that the original mathematicians’ proofs hold.  At the very least I’m a hobbyist beginner to this topic and I can already see why category theory is being utilized in the world of software.  By the way, save yourself a lot of time and confusion and read this wiki page [6] before you try to figure out your first diagram.

As far as I can tell, the interesting software developments are making use of functors, monads, and F-algebras[8][9].  So if you can manage to get the gist of these constructs with your initial foray into category theory, that’s good progress.  Personally, I found Kleisli categories useful for understanding why monads actually work in programming languages, but that’s really a proof type of thing.  If you are willing to just believe that they work out, then you can skip this part at first.  Similarly, certain people from Microsoft research[7] believe that monoidal categories are important for writing correct software.  However again, if you know how to use complicated generic functions (like the ones provided with LINQ) and you’re willing to trust that Microsoft is providing a functionally correct programming language then you can skip some of this initially.

[1] – https://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf

[2] – https://www.infoq.com/interviews/LINQ-Erik-Meijer

[3] – https://en.wikipedia.org/wiki/Group_theory

[4] – https://en.wikipedia.org/wiki/Ring_theory

[5] – https://en.wikipedia.org/wiki/Algebraic_topology

[6] – https://en.wikipedia.org/wiki/Commutative_diagram

[7] – https://www.youtube.com/watch?v=ZhuHCtR3xq8

[8] – https://en.wikipedia.org/wiki/F-algebra