Unit Measurement Types

The Mars Climate Orbiter met an unfortunate fate in 1999 when it unintentionally crashed into the surface of Mars. The cause? Misinterpreting the unit of measurement. This is hardly an isolated incident. Many software projects of non-trivial size struggle with a good solution for dealing with the cognitive overhead of keeping track of what unit our data is measured in. From the simple example of having to remember what time unit Thread.Sleep takes to the more complex issue of the Orbiter that sent United States customary units to a software module that expected SI units.

I'm sure we've all imagined solving the problem manually, but the traditional Object Oriented solution leaves something to be desired:

class Second {
    public double Value { get; set; }
    public override +( Second a, Second b ) { 
        return new Second( a.Value + b.Value ); 
    }
    ...
}
  • There can be size issues with replacing a double with a class.
  • There can be memory issues with replacing a double with a class.
  • There can be allocation issues with overriding your operators with new ones that allocate a new object with every operation.
  • And it's not easy to stop the dreaded OnlyTakesSeconds(new Second(minute.Value)) workaround from those developers who insist that they "know what they're doing".

F# is one of the few languages that supports built in Measurement types.

[<Measure>] type second
[<Measure>] type minute

let onlyTakesSeconds (s : float<second>) = ...

onlyTakesSeconds( 5.0<second> ) // yes
onlyTakesSeconds( 5.0<minute> ) // compilation failure

But it doesn't stop there. F# also supports common measurement operators at compile time. Consider what happens if you end up doing physics operations:

class Second { ... }
class Meter { ... }
...
public ? CalculateVelocity( Meter distance, Second timeElapsed ) { ... }
...
var x = new Second( 5 );
var y = new Meter( 5 );

So, we're going to have to manually add a velocity class. And we're going to have to make sure that all calculations of velocity happen in a controlled location so that they aren't done incorrectly and mistyped and then sent into the system. Are we also going to need to create an acceleration class? And then I guess also a velocity type for kilometers per second. And maybe seconds squared.

Consider F#:

[<Measure>] type second
[<Measure>] type meter
[<Measure>] type kilometero

let dist = 5.0<meter>
let time = 10.0<second>
let vel = dist / time

let takeVelocity (v : float<meter/second>) = ...
...
takeVelocity vel

F# automatically creates the type meter/seconds. Similarly, you can perform operations on your unit bearing types and get the corresponding units automatically applied to the resulting output like: meter * second or meter / second ^ 2.

While writing this blog post, I wanted to make sure that I was getting all of the F# components correct, so I was typing my examples up in an editor and compiling the results. I made several physics mistakes and the compiler was quick to correct me.

compiler: hey, velocity divided by time is meters per seconds squared not meters
me: "Right, duh, I was supposed to do velocity * time to get back to meters."

This feature is the real deal and you want it if you need to write code in any domain that requires you to think about the units of your data.

Computational Expressions

Computational expressions can be thought of as a programmable semicolon.

class Option<T> { 
    bool HasItem;
    T Item;
}

Option<int> Retrieve(string address) { ... }

Option<int> Compute() {
    var item1 = Retrieve( 'location 1' );
    
    if ( !item1.HasItem ) { return Option<int> { HasItem = false }; }
    
    var item2 = Retrieve( 'location 2' );
    
    if ( !item2.HasItem ) { return Option<int> { HasItem = false }; }
    
    var item3 = Retrieve( 'location 3' );
    
    if ( !item3.HasItem ) { return Option<int> { HasItem = false }; }

    return item1.Item + item2.Item + item3.Item;
}

Each time we call Retrieve we need to verify that the item returned is actually an item that we can use. The repeated pattern of checking, returning early if no item is present, and then only using the item is something that we could encode to occur at the end of each statement. Or you could say that you're programming the semicolon.

Option<T> Return<T>( T t ) { return Option( t ); }
Option<S> Bind<T, S>( Option<T> ma, Func<T, Option<S>> f ) {
    if ( !ma.HasItem ) {
        return Option<S> { HasItem = false };
    }
    return f( ma.Item );
}
Option<int> Compute() {
    return Bind( Retrieve( 'location 1' ), item1 => 
           Bind( Retrieve( 'location 2' ), item2 =>
           Bind( Retrieve( 'location 3' ), item3 => 
           Return( item1 + item2 + item3 ) ) ) );
}

The Bind pattern can also abstracted. That's what the computation expression is all about.

type OptionBuilder() =
    member this.Bind( ma, f ) = 
        match ma with
        | Some(a) -> f a
        | None -> None
    member this.Return t = Some(t)

let optionBuilder = OptionBuilder()
    
let compute = optionBuilder { let! item1 = retrieve( 'location 1' )
                              let! item2 = retrieve( 'location 2' )
                              let! item3 = retrieve( 'location 3' )
                              return item1 + item2 + item3
                            }

Automatically deciding if the option type that you have contains a legitimate item is useful, but it's also hardly the only thing you can do with a computational expression.

  • Asynchronous
  • Nondeterministic
  • Database queries
  • Continuations
  • Logging

Computational expressions provide a powerful way to abstract patterns in your code and can allow you to introduce powerful concepts that would otherwise be awkward to work with.

Algebraic Data Types

In the battle for program correctness the two best tools that we have are 1) type systems and 2) algebraic data types with pattern matching.

F# Algebraic Data Types

The first thing to consider is the OR case.

type OrCase = 
    | FirstCase
    | SecondCase

This allows us to represent any data that has multiple possible modes. You can think of it like an enum type. As with enums we can use match to create a sort of switch statement.

let takeOrCase (o : OrCase) =
    match o with
    | FirstCase -> "first case encountered"
    | SecondCase -> "second case encountered"

You can also parameterize constructors for the type.

type ParamCase =
    | Box of int

And with pattern matching you can destructure this value.

let takeParamCase (p : ParamCase) = 
    match p with
    | Box(v) -> v

You can also pattern match on both the inner value as well as the outer value.

let takeParamCase (p : ParamCase) =
    match p with
    | Box(5) -> "five"
    | Box(v) -> "other value"

We saw parameterizing with one value, but we can also parameterize with multiple values.

type AndCase =
    | Case of int * float * char

let takeAndCase (a : AndCase) =
    match a with
    | Case(i, f, c) -> ...

F# also supports using generic types in an algebraic data type.

type Generic<'a> = 
    | Box of 'a

let takeGeneric (g : Generic<'a>) =
    match g with
    | Box(v) -> v

And you can also combine everything we've done previously including parameterizing your constructors with types you created even recursively.

type List<'a> =
    | Cons of 'a * List<'a>
    | Nil
    
type Option<'a> =
    | Some of 'a
    | None
    
type MyType<'a, 'b> =
    | FirstCase of 'a * Option<'b>
    | SecondCase of List<'b>
    | ThirdCase of int
    | FourthCase

Accurate Data Description

We've seen what algebraic data types are, but what about that claim that this is one of the best tools for program correctness. The first thing to consider is that algebraic data types provide accurate data description. Consider what regular expressions provide:

ab          // 'a' and 'b'
a|b         // 'a' or 'b'
a*          // zero or more of 'a'
b?          // zero or one of 'a'

Consider what you can get out of predicate logic:

∧           // conjunction
∨           // disjunction
→           // implication
∀           // forall

Set theory:

∩           // intersection
∪           // union

Type theory:

+           // sum 
*           // product
->          // function
()          // unit
type x<'a>  // generic 

There are a lot of mathematical domains and they all have ways of representing and constructing objects inside of them. What you'll often see, though, is a way to represent multiple things together (and), a way to represent one thing or some other thing (or), a way to represent some sort of value to be filled in (generic), and a way to represent transformation (function, implication).

Algebraic data types provide mathematical like primitives (the 'algebraic' in the name isn't just for show) that allow for a direct encoding of domain objects for a wide variety of data. Direct encoding is important because it makes clear what part of the program is representing domain objects and what part of the program is setting up infrastructure to represent the domain objects.

Case Analysis

We already talked about how Algebraic Data Types have some very useful mathematical properties with how you construct data. However, they also have another useful property that reflects what we find in mathematics. Proof by cases.

A fairly widely used proof technique is to prove multiple cases individually and then conclude that something is true if you can show that all possible cases are handled. Algebraic Data Types provide this with the accompanying feature: pattern matching.

match x with
| Case1 -> ...
| Case2 -> ...
| Case3(y, InnerCase1) -> ...
| Case3(y, InnerCase2) -> ...

F# has exhaustive pattern matching, so it will determine that your patterns handle all possible cases. The result is that if you represent all of your data with sufficiently accurate Algebraic Data Types, then the code that uses them will always have a compelling argument that it properly addresses all possible states of the data.