14 May 2024 ⋅ Back to blog index
Note (added December 2024): This post contains boxes in math formulas, which are currently not supported by Google Chrome and other Chromium-based browsers such as Microsoft Edge. Please use Firefox.
In computability, we love to compare the undecidability of problems using reductions: a Turing reduction from to is an oracle machine which can solve when the oracle answers queries. This directly gives rise to the central concept of Turing degrees.
In complexity, we also love Turing reductions, but they are often too strong. For example, when we get the answer from the oracle, we can easily negate it, so NP is not closed under polytime Turing reductions unless NP = co-NP. To compensate, we use weaker reductions where the algorithm that reduces to outputs an equivalent instance of when given some instance of — in other words, Turing reductions where there is only a single oracle invocation in “tail call” position. These are called many-to-one reductions.
Ever since learning about these reductions, I've vaguely wondered why they are code-named “many-to-one”. Why do we stress that such a reduction need not be one-to-one? What does this set-theoretic aspect of the reduction have to do with algorithms? Surely this means that one-to-one (injective) reductions are somehow interesting. Yet nobody ever told me what they could be useful for. Some way to define more fine-grained complexity classes? Some weird cryptography thing where we want to prevent collisions?
Recently, I discovered on Wikipedia what is possibly the only use of one-to-one reductions across all computer science, and, unexpectedly to me, it turns out to be in pure computability. It is a cute theorem which probably deserves to be better known, the Myhill isomorphism theorem. Roughly speaking, it is effective equivalent of the Cantor-Schröder-Bernstein theorem: if there is a one-to-one reduction from to and a one-to-one reduction from to , then and are isomorphic. Of course, we should define what we mean by isomorphic:
Definition: A redutin from to is a (total) function such that . A (many-to-one) reduction is a computable redutin. Two sets are computably isomorphic when there is a reduction between them which is a bijection of ℕ onto itself.
[I don't know a term for reductions which are not necessarily computable. By analogy with “rng”, I took “reduction” and removed the first two letters of “computable”. Sorry!]
It should be clear that all computability-theoretic notions make no difference between computably isomorphic sets. To justify the term “isomorphism”, we can view the subsets of ℕ as a category with many-to-one reductions as morphisms. The isomorphisms in this category are exactly as just defined.
Then the theorem reads:
Myhill isomorphism theorem: Let . If there are one-to-one reductions from to and from to , then and are computably isomorphic.
As for the proof, it is not very difficult, but quite beautiful. (Update (July 2nd 2024): I've put a short summary of this proof on the Wikipedia article.)
Most proofs of the Cantor-Schröder-Bernstein theorem revolve around the idea of analyzing “chains” which are formed by repeated applications of the two injections. We have sets and , injections and , we start from some , and we consider
Because and are injections, these chains do not “overlap”. One cannot have two chains starting from different element which eventually merge into a single chain. Each element of or is in a unique chain, which continues indefinitely on the right. What happens on the left, if we work backward? As long as we can find an inverse image of the current element, we can continue our travel along the chain. Then three things can happen: either we find an element which is elsewhere in the chain, so we have a loop; or we just continue indefinitely; or there is no more preimage at some point and we are forced to stop. That is, there are three types of chains:
Loops:
Two-sided chains:
One-sided chains:
or
We can then define a bijection between and by pairing elements along the chains. For loops, we just pick an element arbitrarily, pair it with its successor on the chain, and repeat with the next element, until we complete the circle.
For two-sided chains, we form a pair anywhere and propagate forward and backward until eternity:
For one-sided chains, we just have to be careful not to leave the very first element unpaired. We start by pairing it with its successor and then continue:
or
And we are done. On loops and two-sided chains, our bijection is the same as or — either of them works. The only subtlety is that on a one-sided chain, we have to choose whether to use or depending on where the first element lives (the blue elements are either paired all with their successor, or all with their predecessor).
Let us now try to prove the Myhill isomorphism theorem. Can we adapt this proof to make it effective? Note that if we pair a blue element with a green element on the same chain, then the mapping we build is a reduction, because we follow and , which are reductions. (All elements are now integers, but I am still using green and blue colors to mentally distinguish between elements we think of as in the domain of the bijection we construct and elements in the range.) So, we can reuse the idea of constructing a bijection between the blue elements and the green elements on each chain, and gluing these bijections together.
Certainly, if we are given an element, we can follow the chain in the forward direction, because and are now assumed computable. What about the backward direction? Given , we can enumerate integers and see if we find one such that . But what if there is no such ? We just loop until the cows come home. Not good! On one-sided chains, we cannot detect where the chain stops (since we absolutely must halt), so we cannot know the color of the first element.
Intuitively, we seem to only be able to build up the pairs by going forward, since we cannot go backward without stumbling on possible non-termination. If we assume that we have already constructed some pairs, we are happy when we encounter for example
since we can build a new pair:
but our problem is situations like this:
where we cannot go backward, and the next element forward in the chain is already in a pair. However, there is a nifty solution which should be almost obvious on this visualization. We simply move forward in the chain, until we find an unpaired element, and we pair it:
And that's basically it! The rest is routine bookkeeping. Our bijective reduction takes an integer and works by incrementally building pairs (thought of as a blue source and a green target) until it finds as the source of some pair, at which point it returns the target. In order not to miss any element in the domain or range of the bijection, the pairs are built by enumerating integers , and for each of them, making sure there is a pair with source and a pair with target . If we find that we do not have a pair with source , we follow a chain from , skipping already paired elements, until we find an available element, and we add a pair with as source and that element as target. Adding a pair with as target is symmetric. ∎
As an added bonus, we obtain that the theorem itself is effective: given algorithms for the reductions and , we can compute an algorithm for the isomorphism. Here is some Python code to illustrate this. Beware, as the saying goes, I have only proved it correct, not tested it.
def myhill_isomorphism(f, g):
def iso(n):
paired_sources = set()
paired_targets = set()
for k in range(n+1):
if k not in paired_sources:
paired_sources.add(k)
target = f(k)
while target in paired_targets:
target = g(f(target))
if k == n:
return target
paired_targets.add(target)
if k not in paired_targets:
paired_targets.add(k)
source = g(k)
while source in paired_sources:
source = f(g(source))
if source == n:
return k
paired_sources.add(source)
return iso
I also tried to formalize the theorem in Coq, before eventually giving up as the task was much more work than I anticipated. The biggest problem is that you need to formalize “skipping pairs until an unpaired element is found”, which is a while loop, and prove the termination of this loop. On looping chains, this is because the number of nodes is even, and for one-sided or two-sided chains, it is because there are only finitely many pairs already constructed. That “should” all be easy, but the Coq standard library seemed surprisingly lackluster and difficult to use — why are there three different definitions of an injection in unrelated modules and why does each of them prove a different subset of the basic facts about injections? Perhaps I should see if the grass is greener on the Lean and mathlib side.
So much for the proof. Does this theorem really demonstrate that one-to-one reductions are interesting? Clearly, it does not hold as-is for many-to-one reductions. If and are many-to-one-equivalent, they may not be computably isomorphic for cardinality reasons. For example, if and , then there cannot be a bijective reduction from to because has cardinal 1 and has cardinal 2. The same goes if and , because of the cardinals of the complements.
The assumption that the reductions are one-to-one prevents these cardinality problems, by the vanilla Cantor-Schröder-Bernstein theorem. But what if we just require that the cardinalities match? Can we find and which are many-to-one equivalent, and between which there is a bijective redutin but no bijective reduction? Can the theorem fail other than for silly cardinality reasons?
From another point of view: In the Myhill isomorphism theorem, we require that the same functions serve as reductions and one-to-one mappings. What if we relax this requirement?
Curiously, I did not find a word about this obvious generalization in any reference about this theorem. (I did find proofs which were essentially equivalent to the one above, but written in a very formal and hard-to-decipher style).
I went searching for a counter-example myself, and in the end I found one, but it is a tad contrived and I remain convinced there must be a simpler one. If you find it, let me know!
Let us first narrow down our search with simple remarks. If is finite and there is a bijective redutin from to (this means = , which for a bijection is equivalent to ), then is finite with the same cardinal as , and the two are trivially computably isomorphic (i.e., one can choose computable). The same goes for co-finite sets. On the other hand, if and are both infinite and co-infinite, then clearly there exists a bijective redutin from to (glue together any bijection between and with any bijection between the complement of and the complement of , all these sets being countably infinite).
Therefore, at this point, our objective is to find two infinite and co-infinite sets and which are many-to-one-equivalent, but not computably isomorphic. By the Myhill isomorphism theorem, demanding that they are not computably isomorphic is the same as demanding that there be no one-to-one reduction from one of them to the other, for example, no one-to-one reduction from to .
I will choose wisely, and then use for the set with its minimum element removed. This construction clearly ensures that and are many-to-one-equivalent; the reductions just need to special-case that minimum element and otherwise return the input.
The hope is that, because there is an extra element in , a one-to-one reduction from to is forced to map elements of to larger elements of (e.g., map each element to the next one), and we can try to make this transformation non-computable.
If we use for , say, the halting problem, we unfortunately do not get a counter-example. This is because a reduction from to can manage to be one-to-one by a “padding” trick. Intuitively, when we have two instances of which correspond to the same Turing machine and we want the reduction to map them to different Turing machines, we can artificially make them different, by adding useless isolated states in one of them.
The idea of my counter-example is that if we space out the elements of enormously, a one-to-one reduction from to needs to compute numbers so large that we are able to prevent any computable function from doing the job. This is reminiscent of the busy beaver function.
Assume that we have and as specified. Denote where (thus ).
Assume that is a one-to-one reduction from to . It is computed by some Turing machine with a certain number of states . By injectivity, the first elements of , namely , are sent to distinct elements of , therefore at least one of them is or larger. Thus, we have such that .
At last, we can see how to construct to make this impossible: pick arbitrarily, and iteratively choose larger than the maximum image of an for by a Turing machine with states. ∎
Leave a comment