Unions and sums are different

An important category of types is those that offer a choice. Given two types A and B, we can form a new type whose values come from either A or B.

There are two major ways of doing this, however, and it’s critical to not confuse them. We’ll call the two ways “sum” (X + Y) and “union” (X \cup Y).

The easy way to tell the two apart

The simplest way to tell them apart is to see what happens when you combine a type with itself using each of them. With union, the type T \cup T is exactly equivalent to T. With sum, the type T + T is a different type from T: when we examine a value of type T + T we see a value of type T, but we also know which “side” of the sum it came from.
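The code later in this article reads like Scala 3, which has both union types (`A | B`) and ordinary sum types such as `Either`, so the heuristic can be tried out directly. A minimal sketch; the names `collapsed`, `asInt`, `fromLeft`, `fromRight` and `side` are made up for illustration:

```scala
// Union: Int | Int is the same type as Int, so a value of it can be
// used as a plain Int without any unwrapping.
val collapsed: Int | Int = 3
val asInt: Int = collapsed

// Sum: Either[Int, Int] keeps track of which side a value came from.
val fromLeft: Either[Int, Int]  = Left(3)
val fromRight: Either[Int, Int] = Right(3)

// Recover the "side" information that the union throws away.
def side(e: Either[Int, Int]): String = e match {
  case Left(_)  => "left"
  case Right(_) => "right"
}
```

A function like `side` can only be written for the sum; a function taking `Int | Int` receives a bare `Int` and has nothing else to inspect.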

Both kinds of choice are meaningful

At first blush, there’s no reason to prefer either sum or union. Both are natural ideas.

More than that, if you had only one, you could easily replicate the other. You get unions from sums by ignoring the “sidedness” information. You get sums from unions by unioning together “tagged” versions of the types:

X + Y = \{ (x, 0) \mid x : X \} \cup \{ (y, 1) \mid y : Y \}
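A minimal Scala 3 sketch of that construction, specialized to X = Y = Int so that the overlap is total. The wrapper classes `Tag0` and `Tag1` are hypothetical and play the role of the 0 and 1 tags; `forgetSide` goes the other way, recovering a union from the sum by dropping the tag:

```scala
// Hypothetical tag wrappers standing in for the 0 and 1 in the equation.
final case class Tag0(value: Int)
final case class Tag1(value: Int)

// Int + Int, built as the union of two tagged copies of Int.
type IntPlusInt = Tag0 | Tag1

val a: IntPlusInt = Tag0(7)
val b: IntPlusInt = Tag1(7)

// The other direction: a union from a sum by forgetting the tag.
// The result type Int | Int is simply Int.
def forgetSide(s: IntPlusInt): Int | Int = s match {
  case Tag0(n) => n
  case Tag1(n) => n
}
```

Here `a` and `b` are different values even though they carry the same `Int`, and `forgetSide` maps both to `7`.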

The intuition which best supports union is the one you get when you think of types as “sets of values”. It’s very easy to think of types like int | bool as the set of integers joined up with the set of booleans. We simply want to state the idea that we’re considering values of either form.
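In Scala 3 this "set of values" reading is quite literal: an `Int` or a `Boolean` is already a member of `Int | Boolean`, with no wrapper involved, and code that consumes the union simply asks which kind of value it got. A minimal sketch with a hypothetical `describe` function:

```scala
// A value of Int | Boolean is just an Int or a Boolean; no wrapper is involved.
def describe(v: Int | Boolean): String = v match {
  case n: Int     => s"integer $n"
  case b: Boolean => s"boolean $b"
}

val fromInt: Int | Boolean  = 42   // an Int is already a member of the union
val fromBool: Int | Boolean = true // and so is a Boolean
```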

On the other hand, sums make algebraic sense: the size of the type A + B is the sum of the sizes of A and B. This holds because a sum keeps the values of A and B apart even where the two types overlap, so no value from one side is ever merged with a value from the other.
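For finite types, writing |T| for the number of values of T, the contrast is the inclusion-exclusion formula: a union loses exactly the overlap. Taking A = B = Bool (two values, complete overlap) as a worked example:

```latex
\begin{aligned}
|A + B|    &= |A| + |B| \\
|A \cup B| &= |A| + |B| - |A \cap B| \\
|\mathrm{Bool} + \mathrm{Bool}|    &= 2 + 2 = 4 \\
|\mathrm{Bool} \cup \mathrm{Bool}| &= 2 + 2 - 2 = 2
\end{aligned}
```

The sum has four values (each boolean tagged as left or right), while the union has only the original two.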

Overlaps between types

In many practical examples the types involved don’t overlap at all: int and bool share no values. This is one reason it’s sometimes hard to tell unions and sums apart, and it can be a little difficult to come up with an example where two types do overlap.

This is the genesis of my heuristic at the top of the article. It’s clear that a type overlaps entirely with itself.
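Distinct types can overlap too, most commonly when one is a subtype of the other. As an illustrative Scala 3 sketch (the value names are hypothetical): every `String` is also a `CharSequence`, so their union collapses to `CharSequence`, while a sum keeps both readings of the very same string apart:

```scala
// String is a subtype of CharSequence, so the two types overlap:
// every String is also a CharSequence.
val overlapping: String | CharSequence = "hello"

// The union collapses to CharSequence: it is usable as one directly.
val asCharSeq: CharSequence = overlapping

// A sum keeps the two readings apart even for the very same string.
val asLeft: Either[String, CharSequence]  = Left("hello")
val asRight: Either[String, CharSequence] = Right("hello")
```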

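Overlaps become a much harder subject with parametric types, though. A type parameter can be instantiated to a type that overlaps with the other side of a choice, or even to that very same type, so a generic definition cannot know in advance whether its union will collapse.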
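A small Scala 3 sketch of the problem, with hypothetical function names `branch` and `branchSum`. `branch` tries to tell the two sides of `A | Int` apart with a type test; that works until a caller instantiates `A` as `Int`, at which point a value meant for the `A` side is reported as coming from the `Int` side. The `Either`-based version does not have this problem:

```scala
// Union version: the branches can only be told apart by a runtime type test.
def branch[A](v: A | Int): String = v match {
  case _: Int => "Int branch"
  case _      => "A branch"
}

// Sum version: the side is recorded explicitly, whatever A turns out to be.
def branchSum[A](v: Either[A, Int]): String = v match {
  case Left(_)  => "A branch"
  case Right(_) => "Int branch"
}
```

With `A = String` both functions agree, but `branch[Int](3)` reports `"Int branch"` for a value the caller intended as an `A`, whereas `branchSum[Int](Left(3))` still reports `"A branch"`.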

Sums are better for abstraction

Consider the standard Option type: a value of type Option[A] is either a value of type A or a sentinel value signalling “missingness”. It’s clearly a form of choice, so we could build it with either a sum or a union.

The right choice is a sum. To see why, consider what happens when we (a) implement Option with a union and (b) write generic functions over it. Take the function getOrElse, which strips off the optionality by substituting a default value for a missing one.

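def getOrElse[A](opt: Option[A], default: A): A = opt match {
  case Missing => default           // the sentinel: fall back to the default
  case a       => a.asInstanceOf[A] // under a union, anything else is a bare A
}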
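The function above relies on a union-based `Option` and a missingness sentinel that it never defines. Below is a self-contained Scala 3 sketch under that assumption; `Missing` and `UnionOption` are hypothetical names, and `UnionOption` is chosen to avoid clashing with Scala's built-in `Option`, which is itself a sum (`Some`/`None`):

```scala
// Hypothetical sentinel representing "missingness".
case object Missing

// A union-based option: either a bare A or the Missing sentinel.
// Named UnionOption so it does not clash with Scala's own Option.
type UnionOption[A] = A | Missing.type

def getOrElse[A](opt: UnionOption[A], default: A): A = opt match {
  case Missing => default           // the sentinel: fall back to the default
  case a       => a.asInstanceOf[A] // anything else is already a bare A
}
```

For a non-nested option this behaves as expected: `getOrElse[Int](Missing, 0)` gives `0` and `getOrElse[Int](5, 0)` gives `5`.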

The point of this function is that after we’ve called it we know for certain that the optionality has been handled. This code seems fine, but it can quickly become hairy if we try to call it on a nested Option.

getOrElse[Option[Int]](Missing, default)

What does that Missing value entail? Do we want to return Missing or whatever the default value is?

It’s actually totally unclear. Since the union collapses the overlap between types, Option[Option[A]] is indistinguishable from Option[A]: both are just the values of A plus the one Missing value. We can’t tell whether the inner or the outer layer caused the missingness.
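The collapse can be observed directly. This Scala 3 sketch repeats the hypothetical union-based `UnionOption` so it stands alone, then contrasts it with Scala's built-in `Option`, a sum type in which `Some(None)` and `None` remain different values:

```scala
// Hypothetical union-based option, repeated so this block is self-contained.
case object Missing
type UnionOption[A] = A | Missing.type

def getOrElse[A](opt: UnionOption[A], default: A): A = opt match {
  case Missing => default
  case a       => a.asInstanceOf[A]
}

// Union encoding: UnionOption[UnionOption[Int]] is Int | Missing.type | Missing.type,
// which is the same type as UnionOption[Int].
val outerMissing: UnionOption[UnionOption[Int]] = Missing // no value at all
val innerMissing: UnionOption[UnionOption[Int]] = Missing // meant as: present, but itself missing

// Sum encoding with Scala's built-in Option: the two layers stay distinct.
val sumOuter: Option[Option[Int]] = None
val sumInner: Option[Option[Int]] = Some(None)
```

With the union encoding, `outerMissing == innerMissing` holds and `getOrElse[UnionOption[Int]](innerMissing, 42)` returns `42`, silently discarding the inner missingness. With the built-in `Option`, `sumInner.getOrElse(Some(42))` returns `None` while `sumOuter.getOrElse(Some(42))` returns `Some(42)`.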