Category theory: abstracting mathematical construction (essay)
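Definition. A category $\mathcal{C}$ consists[note 1] of the following data:
- A class[note 2] $\mathrm{Ob}(\mathcal{C})$, called the "objects" of the category
- For any two objects $X, Y \in \mathrm{Ob}(\mathcal{C})$, a set $\mathrm{Hom}(X, Y)$, called the "morphisms from $X$ to $Y$." We take $f : X \to Y$ to be a statement which means the exact same thing as the statement $f \in \mathrm{Hom}(X, Y)$.
- For any three objects $X, Y, Z$, a function $\mathrm{Hom}(Y, Z) \times \mathrm{Hom}(X, Y) \to \mathrm{Hom}(X, Z)$ which we call "composition." That is, for any two morphisms $f : X \to Y$ and $g : Y \to Z$, we can compose $f$ and $g$ to get an element $g \circ f \in \mathrm{Hom}(X, Z)$,
subject to the following conditions:
- For any object $X$, there exists an "identity" morphism $\mathrm{id}_X : X \to X$, satisfying $f \circ \mathrm{id}_X = f$ for any $f : X \to Y$ and $\mathrm{id}_X \circ g = g$ for any $g : W \to X$.
- Composition is associative: $(h \circ g) \circ f = h \circ (g \circ f)$.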
♦
One example of a category (the example, in fact) is $\mathbf{Set}$, the category of sets. The objects of $\mathbf{Set}$ are sets, the morphisms of $\mathbf{Set}$ are functions, and composition of morphisms of $\mathbf{Set}$ is composition of functions: $(g \circ f)(x) = g(f(x))$. The identity morphism of a set is the function which does nothing.
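Another example of a category is "the discrete category with 2 elements," which I shall call $\mathbf{2}$. The objects are a set consisting of 2 elements (for concreteness we could take $\mathrm{Ob}(\mathbf{2}) = \{0, 1\}$ where $0 = \emptyset$ and $1 = \{\emptyset\}$).[note 3] The morphisms of $\mathbf{2}$ are the identity elements, and nothing else. More formally, we have $\mathrm{Hom}(0, 0) = \{\mathrm{id}_0\}$, $\mathrm{Hom}(1, 1) = \{\mathrm{id}_1\}$, and $\mathrm{Hom}(0, 1) = \mathrm{Hom}(1, 0) = \emptyset$.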
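To make the axioms concrete, here is a minimal Python sketch that writes down a two-object discrete category as plain data and checks the identity and associativity conditions by brute force. The encoding is an assumption of the sketch, not notation from the essay: the object labels `0` and `1`, the morphism names `"id0"` and `"id1"`, and the dictionaries `hom`, `identity`, and `compose` are all illustrative.

```python
from itertools import product

# Illustrative encoding (an assumption of this sketch, not the essay's notation).
objects = [0, 1]
hom = {(0, 0): {"id0"}, (1, 1): {"id1"}, (0, 1): set(), (1, 0): set()}
identity = {0: "id0", 1: "id1"}
# compose[(g, f)] is the name of "g after f"; only composable pairs are listed.
compose = {("id0", "id0"): "id0", ("id1", "id1"): "id1"}


def is_category(objects, hom, identity, compose):
    """Brute-force check of the identity and associativity conditions."""
    # Each identity must live in the right hom-set.
    if any(identity[x] not in hom[(x, x)] for x in objects):
        return False
    # Identity laws: id_Y after f equals f, and f after id_X equals f.
    for x, y in product(objects, repeat=2):
        for f in hom[(x, y)]:
            if compose.get((identity[y], f)) != f:
                return False
            if compose.get((f, identity[x])) != f:
                return False
    # Composites must exist, land in the right hom-set, and be associative.
    for w, x, y, z in product(objects, repeat=4):
        for f, g, h in product(hom[(w, x)], hom[(x, y)], hom[(y, z)]):
            gf = compose.get((g, f))
            hg = compose.get((h, g))
            if gf not in hom[(w, y)] or hg not in hom[(x, z)]:
                return False
            left = compose.get((hg, f))
            right = compose.get((h, gf))
            if left is None or left != right or left not in hom[(w, z)]:
                return False
    return True


print(is_category(objects, hom, identity, compose))  # prints True
```

A check like this only works for categories with finitely many objects and morphisms; for something like the category of sets, the axioms have to be proved rather than enumerated.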
Another example of a category is the category of groups. Here, the objects are groups, and the morphisms are group homomorphisms.
Another example of a category is the (naive) category of topological spaces. Here, the objects are topological spaces, and the morphisms are continuous maps.
We could go on like this ad infinitum. There is a huge variety of mathematical objects that form categories: rings, graphs, vector spaces, representations, algebraic varieties, the points of the sphere, the open subsets of the plane, the natural numbers, compact Hausdorff totally disconnected spaces, etc. There is even a category whose objects are categories themselves. In fact, every mathematical object forms a category,[note 4] though perhaps not in a useful way. Part of modern mathematical folklore is that you should always "categorify" whatever mathematical objects you are working with, i.e. you should find a way to define everything in a category-theoretic way.
Note that the definition I gave of categories is set theoretic. Shea has reminded me that category theory can be taken as an alternative foundation to set theory (see here for Lawvere's attempt at that). I have barely studied Lawvere's ideas, but I am skeptical that it is useful as a foundation. One reason why is that when it comes to actually proving things, set theory is much easier than category theory, which wouldn't be the case if the latter was truly foundational (and hence integral to human thought). It has been my experience that even the most routine, "obvious" things can become very difficult to prove when phrased categorically. I don't think this is just my own ineptitude; I think there is a deep reason why category theory is difficult, which I will explain below. It's not just me, either: When working mathematicians want to understand something category theoretic, they often do things that allow them to think about it in a set theoretic manner, e.g. they embed their category into the category of sets (see "Yoneda embedding"), or choose a concrete set-theoretic model for their category.
Functors
I mentioned earlier that there is a category $\mathbf{Cat}$ of categories. Clearly the objects of $\mathbf{Cat}$ are ... categories, but what are the morphisms of $\mathbf{Cat}$? They are called functors, and they are defined as follows.
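Definition. A functor $F : \mathcal{C} \to \mathcal{D}$ is a function which sends any $X \in \mathrm{Ob}(\mathcal{C})$ to an object $F(X) \in \mathrm{Ob}(\mathcal{D})$, along with a function which sends any $f : X \to Y$ to a morphism $F(f) : F(X) \to F(Y)$. This assignment respects composition (meaning $F(g \circ f) = F(g) \circ F(f)$ for any morphisms $f : X \to Y$, $g : Y \to Z$), and respects identity (meaning $F(\mathrm{id}_X) = \mathrm{id}_{F(X)}$ for any object $X$). ♦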
If you want examples of functors, you can find many of them in a category theory textbook. I will only give one example, one which I think demonstrates the essence and purpose of functors.
Let's suppose that we are mathematicians who wish to do some set-theoretic "construction."[note 5] For example, one construction is to take any set $X$ and replace it with a set $D(X)$, where $D(X)$ is a set which has exactly two copies of every element of $X$. For example, if $X = \{a, b\}$ then we might have $D(X) = \{(a, 1), (a, 2), (b, 1), (b, 2)\}$.
Now, let's suppose that we change $X$ in a "trivial" way. For example, instead of considering $\{a, b\}$, what if we considered $\{c, d\}$? Since $\{a, b\}$ and $\{c, d\}$ are basically the same (there is an isomorphism $\phi : \{a, b\} \to \{c, d\}$ sending $a \mapsto c$ and $b \mapsto d$), nothing important would change. Instead of getting $\{(a, 1), (a, 2), (b, 1), (b, 2)\}$, we would get $\{(c, 1), (c, 2), (d, 1), (d, 2)\}$, and there is an obvious isomorphism between the two, sending $(a, i) \mapsto (c, i)$ and $(b, i) \mapsto (d, i)$.
It is clear that $a$ and $b$ didn't play any special role in our construction; we could have done the exact same thing for any set $X$. Thus, for an arbitrary set $X$, let's define $D(X) = X \times \{1, 2\}$. Furthermore, let's call the latter isomorphism of the above paragraph $D(\phi)$. It's also clear that if $\phi : X \to X'$ is any isomorphism of sets, then we get an isomorphism $D(X) \to D(X')$, sending $(x, i) \mapsto (\phi(x), i)$, which we shall call $D(\phi)$. Thus our construction defines a functor $D : \mathbf{Set}^{\cong} \to \mathbf{Set}^{\cong}$, where $\mathbf{Set}^{\cong}$ is that category whose objects are sets and whose morphisms are isomorphisms of sets. (Exercise: verify this last statement.)
Let's take a step back and observe what happened. We have a set-theoretic construction $X \mapsto D(X)$. This set-theoretic construction is robust against arbitrary choices, in the sense that if we were to relabel all the elements of $X$ (that is, if we were to change $X$ to an isomorphic set $X'$), then there is an obvious way to relabel all the elements of $D(X)$. This robustness means precisely that our construction defines a functor.
This, in my view, is the essence of functors.[note 6] Functors represent mathematical constructions, such that when you make an unimportant change to the input of the construction, it induces an unimportant change on the output of the construction.
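The "two copies of every element" construction can be sketched in a few lines of Python, which makes the two functor laws something you can run. Everything named here is an assumption of the sketch: the tags `1` and `2` used to tell the copies apart, the helper names `double`, `double_map`, `compose`, and `identity`, and the encoding of an isomorphism of finite sets as a dictionary.

```python
def double(xs):
    """Two tagged copies of every element of xs (the tags 1 and 2 are arbitrary)."""
    return {(x, i) for x in xs for i in (1, 2)}


def double_map(phi):
    """Relabel both copies along the bijection phi (given as a dict)."""
    return {(x, i): (y, i) for x, y in phi.items() for i in (1, 2)}


def compose(g, f):
    """Composite 'g after f' of two dict-encoded functions."""
    return {x: g[y] for x, y in f.items()}


def identity(xs):
    """The function on xs which does nothing."""
    return {x: x for x in xs}


X = {"a", "b"}
phi = {"a": "c", "b": "d"}   # a relabelling of X
psi = {"c": 10, "d": 20}     # a further relabelling

# Respects composition: doubling a composite equals composing the doubles.
assert double_map(compose(psi, phi)) == compose(double_map(psi), double_map(phi))
# Respects identity: doubling "do nothing" does nothing to the doubled set.
assert double_map(identity(X)) == identity(double(X))
```

The assertions only test one pair of relabellings, so they illustrate the functor laws rather than prove them; the proof is the exercise mentioned above.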
As any mathematician (or computer programmer) knows, whenever you are constructing something (or programming something), there are tons of details which are "arbitrary," in the sense that they could have been other than the way they are. We even faced this issue in the previous section, when I defined the category $\mathbf{2}$. To define the objects of $\mathbf{2}$, I needed a set with two elements, and I chose $\{0, 1\}$. But I could just as well have chosen $\{1, 2\}$, or $\{a, b\}$, or $\{\text{cat}, \text{dog}\}$, or anything else, and it wouldn't have made a big difference: The resulting category would have been essentially the same. (Likewise, when programming a computer, it doesn't matter if we begin our for-loops at 0 and end at n-1, or begin them at 1 and end them at n: The resulting computer program will be essentially the same.)
Functors are a way of talking about constructions that abstracts away the arbitrary choices that went into the construction. Functors gain their "independence" from arbitrary choices because to build one, you have to say how it would handle any possible arbitrary choice. Here we also see why category theory is difficult: saying how you would handle any possibility is much more difficult than just saying how you would do the construction in one specific case.
Universal properties
Besides functors, there is another way in which categories abstract away from arbitrary choices, called "universal properties."
The construction (or as we now know, the functor) $D$ of the previous section suggests a more general construction.
Given any set $X$ and any set $Y$, we can construct a set whose elements are the elements of $X$ and the elements of $Y$. More formally,
Definition. The disjoint union of a set $X$ and a set $Y$ is the set $X \sqcup Y = \{(x, 1) : x \in X\} \cup \{(y, 2) : y \in Y\}$.
It is possible to get at the exact same idea in a way which differs in mere technical details. We could instead have defined the disjoint union as
Definition. The disjoint sum of a set $X$ and a set $Y$ is the set $X + Y = \{(1, 2, x) : x \in X\} \cup \{(3, 4, y) : y \in Y\}$.
It is clear that the thing I'm calling "disjoint union" and the thing I'm calling "disjoint sum" are not different in an important way. Both of them capture the idea that, given two sets, there exists a set which has all the elements of the first one and all the elements of the second one, and which keeps the elements separate. The "disjoint union" construction keeps $X$ separated from $Y$ by putting the former into a tuple with "1", and putting the latter into a tuple with "2". The "disjoint sum" construction reverses the order, and keeps $X$ separated from $Y$ by putting the former into a 3-tuple with "1" and "2", and putting the latter into a 3-tuple with "3" and "4".
We see that there is something arbitrary about the details of these constructions. What is the common essence that disjoint union and disjoint sum share, which we are trying to capture with our formal definitions? Is it even possible to talk about such a thing mathematically? The answer to the latter question is yes(!!), and the answer to the former question is that these two constructions possess the same universal property. I will not try to define this concept yet, but rather I will proceed inductively by studying the disjoint union and disjoint sum in greater depth.
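The first thing to note about the disjoint union is that there is always an "inclusion" function $i_X : X \to X \sqcup Y$ sending $x \mapsto (x, 1)$, and an "inclusion" function $i_Y : Y \to X \sqcup Y$ sending $y \mapsto (y, 2)$. These data possess the following property:
Proposition 1 (universal property of the coproduct). For any set $Z$, and any functions $f : X \to Z$, $g : Y \to Z$, there exists a function $h : X \sqcup Y \to Z$ such that $h \circ i_X = f$ and $h \circ i_Y = g$, and $h$ is the unique function with this property. More formally, uniqueness means that if there is any function $h' : X \sqcup Y \to Z$ such that $h' \circ i_X = f$ and $h' \circ i_Y = g$, then necessarily $h' = h$. ♦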
This proposition is logically somewhat complex ($\forall \exists \forall$), but the reader will find that it is very easy to prove once he gets a grip on what it is saying.
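The more standard and succinct (but less clear) way to state the above proposition would be to say "for any set $Z$, and any functions $f : X \to Z$, $g : Y \to Z$, there exists a unique function $h : X \sqcup Y \to Z$ such that $h \circ i_X = f$ and $h \circ i_Y = g$." But rather than writing out that long sentence every time, a category theorist would draw the following diagram:
[Diagram: $X \xrightarrow{i_X} X \sqcup Y \xleftarrow{i_Y} Y$, with $f : X \to Z$, $g : Y \to Z$, and a dashed arrow $h : X \sqcup Y \to Z$.]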
The universal property of the coproduct is also satisfied by the disjoint sum. That is,
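Proposition 2. Let $j_X : X \to X + Y$ and $j_Y : Y \to X + Y$ be the inclusions sending $x \mapsto (1, 2, x)$ and $y \mapsto (3, 4, y)$. For any set $Z$, and any functions $f : X \to Z$, $g : Y \to Z$, there exists a function $h : X + Y \to Z$ such that $h \circ j_X = f$ and $h \circ j_Y = g$, and $h$ is the unique function with this property. More formally, uniqueness means that if there is any function $h' : X + Y \to Z$ such that $h' \circ j_X = f$ and $h' \circ j_Y = g$, then necessarily $h' = h$. ♦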
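The existence half of both propositions can be checked on small examples with a short Python sketch. The names are assumptions of the sketch rather than anything fixed by the essay: `i_X`, `i_Y` tag elements as pairs, `j_X`, `j_Y` tag them as 3-tuples, and `copair_union` and `copair_sum` build the function $h$ from $f$ and $g$.

```python
# Disjoint union: the tag sits in the second slot of a pair.
def i_X(x):
    return (x, 1)


def i_Y(y):
    return (y, 2)


def copair_union(f, g):
    """The map h out of the disjoint union with h(i_X(x)) = f(x), h(i_Y(y)) = g(y)."""
    def h(p):
        value, tag = p
        return f(value) if tag == 1 else g(value)
    return h


# Disjoint sum: the tags sit in the first two slots of a 3-tuple.
def j_X(x):
    return (1, 2, x)


def j_Y(y):
    return (3, 4, y)


def copair_sum(f, g):
    """The map h out of the disjoint sum with h(j_X(x)) = f(x), h(j_Y(y)) = g(y)."""
    def h(t):
        return f(t[2]) if t[:2] == (1, 2) else g(t[2])
    return h


X = {1, 2, 3}
Y = {1, 2}  # overlaps X on purpose, to show the copies stay separate


def f(x):
    return x * 10


def g(y):
    return -y


for inc_X, inc_Y, copair in [(i_X, i_Y, copair_union), (j_X, j_Y, copair_sum)]:
    h = copair(f, g)
    assert all(h(inc_X(x)) == f(x) for x in X)
    assert all(h(inc_Y(y)) == g(y) for y in Y)
    # The tagged copies never collide, so no element is lost.
    assert len({inc_X(x) for x in X} | {inc_Y(y) for y in Y}) == len(X) + len(Y)
```

Uniqueness is not something a finite test can establish, but the reason for it is visible in the code: every element of either construction is the image of exactly one inclusion, so the two equations leave no freedom in defining `h`.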
Note that this "universal property of the coproduct" is entirely category theoretic: to state the property, you don't have to know any of the details about what's inside $X$, $Y$, or $X \sqcup Y$, and you don't have to know anything about what the functions $i_X$, $i_Y$, etc. are doing. You don't even have to know that $X$, $Y$, and $X \sqcup Y$ are sets, and that $i_X$, $i_Y$, etc. are functions. For the statement to make sense, we just have to know that $X$, $Y$, and $X \sqcup Y$ are objects in some category, and that $i_X$, $i_Y$, etc. are morphisms in that category.
A consequence of this generality is that we could have formulated the exact same universal property in any other category, whether it be groups, topological spaces, natural numbers, or anything else. There's no guarantee that any object of a given category will actually possess this property, but one could state it nonetheless. In particular, if there were a different category of sets from the standard one (and I alluded to the fact that I think there ought to be a better one), then this property would still make sense. Another consequence of this generality is the following:
Proposition 3. Let $C$ and $C'$ be two objects of any(!) category, and suppose that $C$ and $C'$ both satisfy the universal property of the coproduct of $X$ and $Y$. Then $C$ and $C'$ are uniquely isomorphic(!).
As an exercise, the reader should give a more precise statement of proposition 3 (i.e. state precisely what "uniquely isomorphic" means), and prove it.
Corollary of proposition 3. The disjoint sum and disjoint union of $X$ and $Y$ are uniquely isomorphic.
This is the sense in which the disjoint sum and the disjoint union are really "the same" construction! They satisfy the same universal property, and therefore are isomorphic in a canonical way.
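For the two concrete constructions, the canonical isomorphism can be written down directly, as in the Python sketch below. It assumes the pair tagging `(x, 1)`, `(y, 2)` for the disjoint union and the 3-tuple tagging `(1, 2, x)`, `(3, 4, y)` for the disjoint sum; the function names are illustrative. Each direction is built the way a universal property dictates: feed one construction's inclusions into the other construction's copairing.

```python
def i_X(x):
    return (x, 1)


def i_Y(y):
    return (y, 2)


def j_X(x):
    return (1, 2, x)


def j_Y(y):
    return (3, 4, y)


def copair_union(f, g):
    """Map out of the disjoint union determined by f on X and g on Y."""
    return lambda p: f(p[0]) if p[1] == 1 else g(p[0])


def copair_sum(f, g):
    """Map out of the disjoint sum determined by f on X and g on Y."""
    return lambda t: f(t[2]) if t[:2] == (1, 2) else g(t[2])


# The comparison maps: each is forced by the source's universal property.
to_sum = copair_union(j_X, j_Y)    # disjoint union -> disjoint sum
to_union = copair_sum(i_X, i_Y)    # disjoint sum -> disjoint union

X = {"a", "b"}
Y = {"a", "c"}
union = {i_X(x) for x in X} | {i_Y(y) for y in Y}
total = {j_X(x) for x in X} | {j_Y(y) for y in Y}

# The two maps are mutually inverse, so each is an isomorphism of sets.
assert all(to_union(to_sum(p)) == p for p in union)
assert all(to_sum(to_union(t)) == t for t in total)
assert {to_sum(p) for p in union} == total
```

Nothing in the definitions of `to_sum` and `to_union` involved a choice beyond the constructions themselves, which is what "canonical" means here.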
I can now give a definition of a universal property. A universal property is a property possessed by an object of a category, which characterizes it up to a canonical/unique isomorphism. Besides the coproduct, there are many other familiar sets and set-theoretic constructions that possess universal properties, for example, the product of two sets $X \times Y$, a set containing a single element $\{*\}$, a coproduct of three sets $X \sqcup Y \sqcup Z$, and anything else that you can dream up. For an example of the flavor of universal properties in other categories, I shall note that in many geometrical categories (where the objects are shapes / spaces of some sort), the (transverse) intersection of two objects satisfies a universal property (the universal property of the pullback), and the gluing together of two objects satisfies a universal property (the universal property of the pushout).
In conclusion, with universal properties we observe the same phenomenon that we observed with functors. There are some arbitrary details in our math constructions, but this concept helps us talk about our math constructions in a way which guarantees that those arbitrary details won't matter. If the only properties of $X \sqcup Y$ that we ever use are those that follow from its universal property, then it is completely indistinguishable from $X + Y$. Of course, just like for functors, this robustness against arbitrary choices comes at the cost of making category theory more difficult than set theory: instead of just writing down a set and chugging along, we have to learn and automatize a complicated statement.
Conclusion (general relativity and beyond)
This theme of canonical math constructions, i.e. those which can be performed without making any choices, is very powerful. I believe that the ideas of category theory will have implications that go far beyond the ridiculously abstract fields (algebraic geometry, homotopy theory) in which they are presently employed. I will conclude with just one tantalizing example:
The laws of classical mechanics are formulated with respect to coordinate systems. A coordinate system is a choice, which could have been otherwise: the 0˚ longitude line could have been just as easily defined to run through Paris as it was defined to run through Greenwich. If your coordinate system is changed drastically, then the form of the laws of classical mechanics changes drastically in turn. For example, if your coordinate system is rotating, then there is a Coriolis force.
One might ask the question: How do we formulate the laws of physics in a way so that they look the same, regardless of what coordinate system we are using? I suspect that in answering that question, you will arrive at something like General Relativity.
Notes
- ↑ In the literature this is called a "locally small" category.
- ↑ A class is just another word for a set. In conventional mathematics, we aren't allowed to use the word "set" here, because for many categories it would give rise to a Russell's paradox. I think that Russell's paradox only arises in situations where we aren't being careful about what our math refers to in reality, but that discussion would take us too far afield.
- ↑ I think sets contain actual existents. One type of existent is indeed a set which is empty. However, I don't think such things should play any sort of foundational role in a proper theory of sets. Another place where I disagree with the mainstream is that I don't think there is a unique empty set. The place where any set exists is within the mind of an individual; a set is one of his forms of awareness. The empty set is a man's awareness of nothing, i.e. of some existent being other than what context suggests. Thus there is one empty set in Bob's mind when he identifies that his pantry is empty, another empty set in Mary's mind when she identifies that she doesn't have any meetings today, etc.
- ↑ Indeed, suppose that some "thing" in mathematics is defined as $(S, \sigma)$, where $S$ is a set and $\sigma$ is a structure on $S$. (A structure is an order, or a sigma algebra, or an involution, or a multiplication operation, or something like that. I won't try to define precisely what a "structure" is (see Bourbaki for that), but I claim that everything in standard mathematics can be thought of as some structure on some set.) We then get a category $\mathcal{T}$ of "things" for free. The objects of $\mathcal{T}$ are "thing"s, and a morphism $(S, \sigma) \to (S', \sigma')$ is an isomorphism of sets $\phi : S \to S'$, such that: the structure living over $S'$ that you get when you transport $\sigma$ by $\phi$, is equal to $\sigma'$, or in symbols $\phi_* \sigma = \sigma'$.
- ↑ One might ask what I mean by a "construction." What exactly are set-theoretic constructions in reality? A set-theoretic construction is a method of forming one set, given some other sets. To "form a set" is to make the identification that some existents are a set; it is to consider the existents as belonging together, as part of a single collection; it is to take the unit perspective on some existents.
- ↑ Technical caveat: it is the essence of functors on groupoids.