How a Doodler’s Problem Sparked a Controversy in Math

The twisty history and surprise ending of the four-color-theorem saga

A map of the U.S. with states in yellow, blue, pink and purple. The states that share a border are in different colors. — In this map, no two adjacent states are colored in the same hue.

June Kim

In 1852 mathematician Francis Guthrie asked a seemingly simple question that triggered endless dispute, left a trail of overturned publications in its wake and culminated in a resolution that has stretched the very tenets of math. The conundrum that stirred up so much trouble was: What is the minimum number of colors needed to color a map so that no neighboring states or other designated areas have the same hue? Here’s how it works. Check out the black and-white map of the contiguous U.S. below. It looks a little bare-bones. To make maps more vivid and clearly highlight their borders, cartographers tend to color in the regions.

A map of the U.S. with state borders. — June Kim

Naturally, we don’t want neighboring states to have the same color, because that would make the boundaries more confusing. Under this constraint, we used four colors to fill in the black-and-white map. Could we have done it with only three? Might other maps require five or six?

On supporting science journalism

If you're enjoying this article, consider supporting our award-winning journalism by subscribing. By purchasing a subscription you are helping to ensure the future of impactful stories about the discoveries and ideas shaping our world today.

The map in this problem doesn’t need to correspond to real geography—any partitioning of a flat surface into distinct regions qualifies. The question, given any such map, is how many colors are required to fill in each region so that no two adjacent regions have the same shade. Some ground rules: Each distinct region must be contiguous (technically Michigan violates this rule in U.S. maps because Lake Michigan severs the state into two disconnected parts). For two regions to count as adjacent, they must share some length of contiguous border; touching at a single point (or discrete set of points) doesn’t qualify. For example, Utah and New Mexico touch at only one corner and so do not count as neighbors for our purposes.

With the rules established, here are some questions with surprising answers. Suppose I printed out a large poster with a complicated map containing a few thousand regions. How long would it take you to determine whether the map could be colored with two colors? Three colors? Four colors? You don’t necessarily need to find a coloring scheme; just decide whether it exists for each number of colors. Curiously, although the task seems nearly identical for all the numbers, it requires radically different amounts of time to complete for each. Using the best-known methods:

Deciding whether two colors suffice would take about an hour. To do it, pick out any region and color it, say, red. This forces all the region’s neighbors into your second color, say, blue. In turn, all their neighbors become red and so on, propagating through the map. Eventually you either encounter a conflict where neighboring regions share a color, in which case no “two-coloring” exists, or the colors spread through the entire map conflict-free, in which case you’ve found a valid coloring. A back-of-the-envelope calculation with 3,000 regions at a rate of one second per coloring yields 50 minutes of time well spent.
Suppose the map can’t be filled with only two colors. Deciding whether three colors suffice would take longer. The afternoon would pass you by. The weeks would peel off the calendar as you furiously scribbled endless configurations, searching for one that works. To carry forth, you’d have to pass down the ongoing task to your children and they to their children. Generations of lives devoted to nothing other than finding a three-coloring of this map wouldn’t put a dent in the workload as the sun inevitably engulfed Earth in some billions of years and put an end to the silly endeavor, leaving us barely closer to an answer.

Determining whether an arbitrary map has a three-coloring is hard. Here “hard” is a technical term indicating that it falls into a class of computational problems renowned for their time-consuming difficulty, called NP-complete problems. For problems in this class, we don’t know any faster methods than more or less brute-force searching through every possible solution. That search space grows exponentially as the size of the problem increases. For a small map with only a few regions, we could afford to exhaustively look through every possible three-coloring until we find one that works (or conclude that there isn’t one). But the number of ways to assign three colors to maps with thousands of regions is so astronomical that it renders exhaustive searches hopeless.

And four colors? Well, that takes about one second or the time you need to say “yes” because every map can be colored with four colors. This outcome is the infamous and long-disputed four-color theorem.

When Guthrie first conjectured the four-color theorem in 1852, he noticed that he needed only four colors to properly fill the counties of England. He suspected this rule would generalize to any map, but although any kindergartner could understand the question, neither he nor his colleagues could prove it. It was clear that three colors wouldn’t always hack it, as evidenced by the circle diagram below, where every region borders every other one.

A diagram of a yellow circle wrapped by a ring divided into three sections. Each section of the ring is colored in blue, pink or purple. — June Kim

But nobody could find a map that required five colors. Stymied by the problem, mathematician Augustus De Morgan grew obsessed with it and concluded that a new axiom—which in math is a statement that is assumed to be true without proof, from which more complicated statements can be derived—must be added to the foundations of math to resolve Guthrie’s conjecture.

The fevered frustration ostensibly ended in 1879, when a proof emerged that four colors always suffice. This was underscored by a second, independent proof a year later. With the matter settled and accolades distributed, captivated mathematicians returned to their usual research pursuits—except for some. Eleven years after the publication of the first proof, both proofs were overturned, and the slippery four-color theorem reverted to the four-color conjecture. Percy John Heawood of Durham University in England, who exposed a hole in the original proof, made some progress, however, by proving that five colors always suffice for filling any map.

This left the math world in a rather embarrassing position. A problem so seemingly simple had one of two answers—four or five—but which was correct? It would stand this way for almost a century more.

Although nobody could find a map requiring five colors, no one could rule out the possibility of one, either. Because there are an infinite number of maps, one could never check each of them individually. A key step toward a solution involved reducing the problem to a finite set of cases that could be checked individually. The leap from infinite to finite seems vast, but the monstrous number of cases to check still far exceeded what any person could manually process.

So mathematicians Kenneth Appel and Wolfgang Haken, then both at the University of Illinois, turned to a daring idea: program a computer to process them instead. In 1976, after years of fine-tuning and more than 1,000 hours of computer time, their algorithm finished exhaustively checking every case, and the four-color theorem was established. It was the first major theorem to use a computer in its proof.

The math world lit ablaze with equal parts celebration and dismay. One of Appel and Haken’s colleagues, William Tutte of the University of Waterloo in Ontario, rejoiced that they “smote the kraken.” Others despised the thought of computers encroaching on human ingenuity. The affair also posed a philosophical problem in the math community. Does a proof that can’t be verified by humans count as a proof at all? Many expected the work to eventually be retracted like both the alleged proofs that preceded it. The New York Times even refused to report on the announcement at first because proofs of the four-color theorem “were all false anyway.”

Multiple attempts to refute the computer-assisted proof failed in the following decades. Mathematicians have since drastically simplified the proof and verified the computer code, but to this day no proof of the theorem derived without the aid of computers is known. And although the four-color theorem has become widely accepted as a fact, a yearning lingers over it. A computer program that systematically analyzes reams of configurations does not explain exactly why every map can be filled with four colors. Even though mathematicians now welcome computers as partners in discovery, they are still searching for a more illuminating proof of this colorful puzzle.