Category theory: abstracting mathematical construction (essay)
Definition. A category consists[note 1] of the following data:
- A class[note 2] , called the "objects" of the category
- For any two objects , a set , called the "morphisms from to ." We take to be a statement which means the exact same thing as the statement .
- For any two morphisms and , a function which we call "composition." That is, we can compose and to get an element ,
subject to the following conditions:
- For any object , there exists an "identity" morphism , satisfying for any and for any .
- Composition is associative: .
♦
One example of a category---the example, in fact---is , the category of sets. The objects of are sets, the morphisms of are functions, and composition of morphisms of is composition of functions: The identity morphism of a set is the function which does nothing.
Another example of a category is "the discrete category with 2 elements," which I shall call . The objects are a set consisting of 2 elements (for concreteness we could take where and ).[note 3] The morphisms of are the identity elements, and nothing else. More formally, we have, , and .
Another example of a category is the category of groups. Here, the objects are groups, and the morphisms are group homomorphisms.
Another example of a category is the (naive) category of topological spaces. Here, the objects are topological spaces, and the morphisms are continuous maps.
We could go on like this ad infinitum. There are a huge variety of mathematical objects that form categories. Rings, graphs, vector spaces, representations, algebraic varieties, the points of the sphere, the open subsets of the plane, the natural numbers, compact Hausdorff totally disconnected spaces, etc etc. There is even a category of categories themselves. In fact, every mathematical object forms a category,[note 4] though perhaps not in a useful way. Part of modern mathematical folklore is that you should always "categorify" whatever mathematical objects it is that you are working with, i.e. you should find a way to define everything in a category-theoretic way.
Note that the definition I gave of categories is set theoretic. Shea has reminded me that category theory can be taken as an alternative foundation to set theory (see here for Lawvere's attempt at that). I have barely studied Lawvere's ideas, but I am skeptical that it is useful as a foundation. One big reason why is that when it comes to actually proving things, set theory is much easier than category theory. It has been my experience that even the most routine, "obvious" things can become very difficult to prove when phrased categorically. I don't think this is just my own ineptitude; I think there is a deep reason why category theory is difficult, which I will explain later. It's not just me, either: When working mathematicians want to understand something category theoretic, they often do things that allow them to think about it in a set theoretic manner, e.g. they embed their category into the category of sets (see "Yoneda embedding").
Functors
I mentioned earlier that there is a category of categories. Clearly the objects of are ... categories, but what are the morphisms of ? They are called functors, and they are defined as follows.
Definition. A functor is a function which sends any to an object , along with a function which sends any to an object . This assignment respects composition (meaning for any morphisms ), and which respects identity (meaning for any object ). ♦
If you want examples of functors, you can find many of them in a category theory textbook. I won't give the standard examples here. I will only give one example, which demonstrates the essence and purpose of functors.
Let's suppose that we are mathematicians who wish to do some set-theoretic "construction."[note 5] For example, one construction we could do is we could take any set , and we could replace it with a set , where is a set which has exactly two copies of every element of . For example, if then we might have .
Now, let's suppose that we change in a "trivial" way. For example, instead of considering , what if we considered . Since and are basically the same, nothing important would change. Instead of getting
Now, why am I entitled to say that and are "basically the same," and that replacing with was trivial? Because all I did was I took the contents of , and I added a symbol ' to both of them! That's not interesting. More formally, I think that and are "the same" because I have in mind a function , which sends and , and this function is an isomorphism; it has an inverse function which sends and . Likewise, I think that and are "basically the same" because there is an isomorphism sending , , etc.
It is clear that and didn't play any special role in our construction; we could have done the exact same thing for any set . Thus, for an arbitrary set , let's call . Furthermore, let's call the latter isomorphism of the above paragraph . It's also clear (exercise!) that is any isomorphism of sets, then we get an isomorphism , which we shall call . Thus our construction defines a functor , where is that category whose objects are sets and whose morphisms are isomorphisms of sets.
Let's take a step back and observe what happened. We have a set-theoretic construction . This set-theoretic construction is robust against arbitrary choices, in the sense that if we were to relabel all the elements of (that is, if we changed to an isomorphic set ), then there is an obvious way to relabel all the elements of . This robustness means precisely that our construction defines a functor.
This, in my view, is the essence of functors.[note 6] Functors represent mathematical constructions, such that when you make an unimportant change to the input of the construction, it induces an unimportant change on the output of the construction.
As any mathematician (or computer programmer) knows, whenever you are constructing something (or programming something), there are tons of details which are "arbitrary," in the sense that they that could have been other than the way they are. We even faced this issue in the previous section, when I defined the category . To define the objects of , I needed a set with two elements, and I chose , but I could just as well have chosen , or , or , or anything else, and it wouldn't have made a big difference. (Likewise, when programming a computer, it doesn't matter if we begin our for-loops at 0 and end at n-1, or begin them at 1 and end them at n.)
Functors are a way of talking about constructions, without making any arbitrary choices. Functors gain their "independence" from arbitrary choices because to build one, you have to say how it would handle any possible arbitrary choices. (Here we also see why category theory is difficult: to define a functor, you have to say how you would handle any possible choice, which is much more difficult than just saying how you would do the construction in some specific case).
Universal properties
There is another way in which categories handle arbitrary choices. [TODO explain set theoretic union?]
The construction (or as we now know, the functor) of the previous section suggests a more general construction. Given two sets and , we can construct their disjoint union
Definition. The disjoint union of a set and a set is the set
However, we can get at the exact same idea in a way which is technically different. We could instead have defined the disjoint union as
Definition. The disjoint sum of a set and a set is the set
It is clear that the thing I'm calling "disjoint union" and the thing I'm calling "disjoint sum" are not different in an important way. Both of them capture the idea that, given two sets, there exists a set which has all the elements of the first one and all the elements of the second one, and which keeps the elements separate. The "disjoint union" construction keeps separated from by putting the former into a tuple with a "1", and putting the latter into a tuple with a "2". The "disjoint sum" construction keeps separated from by doing nothing to the former, and putting the latter into a set . We see that there is something arbitrary about the details of these constructions.
What is the common essence that disjoint union and disjoint sum share, which we are trying to capture with our formal definitions? Is it even possible to talk about such a thing mathematically? The answer to the latter question is yes(!!), and the answer to the former question is that these two constructions possess the same universal property. I will not try to define this concept yet, but rather I will proceed inductively by studying the disjoint union and disjoint sum in greater depth.
The first thing to note about the disjoint union is that there always exists an "inclusion" function sending , and an "inclusion" function sending . This data possesses the following property:
Proposition 1 (universal property of the coproduct). For any set , and any functions , , there exists a function such that and , and is the unique function with this property. More formally, uniqueness means that if there is any function such that and , then necessarily . ♦
This proposition is logically somewhat complex (), but the reader will find that it is very easy to prove once he gets a grip on what it is saying.
The more standard and succinct (but less clear) way to state the above proposition would be to say "for any set , and any functions , , there exists a unique function such that and ." But rather than writing out that long sentence every time, a category theorist would draw the following diagram [TODO].
Ex
The universal property of the coproduct is also satisfied by the disjoint sum. That is,
Proposition 2. For any set , and any functions , , there exists a function such that and , and is the unique function with this property. More formally, uniqueness means that if there is any function such that and , then necessarily . ♦
Note that this "universal property of the coproduct" is entirely category theoretic: to state the property, you don't have to know any of the details about what's inside , , or , and you don't have to know anything about what the functions , , etc. are doing. To make the statement about rather than , all we had to do was change the words. We didn't have to dig into the details of the way that these two sets are constructed. In other words, all that this universal property cares about is the properties of these things qua elements of some category.
A consequence of this generality is that we could have formulated the exact same universal property in any other category, whether it be groups, topological spaces, natural numbers, or anything else. There's no guarantee that any object of a given category will actually possess this property, but could state it nonetheless. Another consequence of this generality is the following:
Proposition 3. Let and be two objects of any(!) category, and suppose that and both satisfy the universal properties of the coproduct. Then and are uniquely isomorphic(!). [TODO]
Corollary of proposition 3. The disjoint sum and disjoint union of and are uniquely isomorphic.
As an exercise, the reader should give a more precise statement of proposition 3 (i.e. state precisely what "uniquely isomorphic" means), and prove it.
I can now give a definition of a universal property. A universal property is a property possessed by an object of a category, which characterizes it up to a unique isomorphism. Besides the coproduct, there are many other familiar sets and set-theoretic constructions that possess universal properties, for example, the product of two sets , a set containing a single element , a coproduct of three sets , and anything else that you can dream up. For an example of the flavor of universal products in other categories, I shall note that in many geometrical categories (where the objects are shapes / spaces of some sort), the (transverse) intersection of two objects satisfies a universal property (the universal property of the pullback), and the gluing together of two objects satisfies a universal property (the universal property of the pushforward).
In conclusion, with universal properties we observe the same phenomenon that we observed with functors. There are some arbitrary details in our math constructions, but this concept helps us talk about our math constructions in a way which guarantees that those arbitrary details won't matter. If the only properties of that we ever use are those that follow from its universal property, then it is completely indistinguishable from . Of course, just like for functors, this robustness against arbitrary choices comes at the cost of making category theory more difficult than set theory: instead of just writing down a set and chugging along, we have to learn and automatize a complicated statement.
Notes
- ↑ In the literature this is called a "locally small" category.
- ↑ A class is just another word for a set. In conventional mathematics, we aren't allowed to use the word "set" here, because for many categories it would give rise to a Russell's paradox. I think that Russell's paradox only arises in situations where we aren't being careful about our what math refers to in reality, but that discussion would take us too far afield.
- ↑ I think sets contain actual existents. One type of existent is indeed a set which is empty. However, I don't think such things should play any sort of foundational role in a proper theory of sets. Another place where I disagree with the mainstream is that I don't think there is a unique empty set. The place where any set exists is within the mind of an individual; a set is one of his forms of awareness. The empty set is a man's awareness of nothing, i.e. of some existent being other than what context suggests. Thus there is one empty set in Bob's mind when he identifies that his pantry is empty, another empty set in Mary's mind when she identifies that she doesn't have any meetings today, etc.
- ↑ Indeed, suppose that some "thing" in mathematics is defined as , where is a set and is a structure (I won't define what a "structure" is; see Bourbaki for that. But think about it as an additional set. [TODO look up]). Then we get a category of "things" for free. The objects of are "thing"s, and a morphism is an isomorphism of sets, such that: the structure living over that you get when you transport by , is equal to , or in symbols
- ↑ One might ask what I mean by a "construction." What exactly are set-theoretic constructions in reality? A construction is a method of forming one set, given some other sets. To "form a set" is to make the identification that some existents are a set; it is to consider the existents as belonging together, as part of a single collection; it is to take the unit perspective on some existents.
- ↑ Or at least, it is the essence of functors on groupoids.