May 14, 2024 ∙ Back to blog index
In computability, we love to compare the undecidability of problems using reductions: a Turing reduction from X to Y is an oracle machine which can solve X when the oracle answers Y queries. This directly gives rise to the central concept of Turing degrees.
In complexity, we also love Turing reductions, but they are often too strong. For example, when we get the answer from the oracle, we can easily negate it, so NP is not closed under polytime Turing reductions unless NP = co-NP. To compensate, we use weaker reductions where the algorithm that reduces X to Y outputs an equivalent instance of Y when given some instance of X — in other words, Turing reductions where there is only a single oracle invocation in “tail call” position. These are called many-to-one reductions.
Ever since learning about these reductions, I've vaguely wondered why they are code-named “many-to-one”. Why do we stress that such a reduction need not be one-to-one? What does this set-theoretic aspect of the reduction have to do with algorithms? Surely this means that one-to-one (injective) reductions are somehow interesting. Yet nobody ever told me what they could be useful for. Some way to define more fine-grained complexity classes? Some weird cryptography thing where we want to prevent collisions?
Recently, I discovered on Wikipedia what is possibly the only use of one-to-one reductions across all computer science, and, unexpectedly to me, it turns out to be in pure computability. It is a cute theorem which probably deserves to be better known, the Myhill isomorphism theorem. Roughly speaking, it is effective equivalent of the Cantor-Schröder-Bernstein theorem: if there is a one-to-one reduction from X to Y and a one-to-one reduction from Y to X, then X and Y are isomorphic. Of course, we should define what we mean by isomorphic:
Definition: A redutin from X ⊆ ℕ to Y ⊆ ℕ is a (total) function f : ℕ → ℕ such that ∀ n ∈ ℕ, n ∈ X ⟺ f(n) ∈ Y. A (many-to-one) reduction is a computable redutin. Two sets X, Y ⊆ ℕ are computably isomorphic when there is a reduction between them which is a bijection of ℕ onto itself.
[I don't know a term for reductions which are not necessarily computable. By analogy with “rng”, I took “reduction” and removed the first two letters of “computable”. Sorry!]
It should be clear that all computability-theoretic notions make no difference between computably isomorphic sets. To justify the term “isomorphism”, we can view the subsets of ℕ as a category with many-to-one reductions as morphisms. The isomorphisms in this category are exactly as just defined.
Then the theorem reads:
Myhill isomorphism theorem: Let X, Y ⊆ ℕ. If there are one-to-one reductions from X to Y and from Y to X, then X and Y are computably isomorphic.
As for the proof, it is not very difficult, but quite beautiful.
Most proofs of the Cantor-Schröder-Bernstein theorem revolve around the idea of analyzing “chains” which are formed by repeated applications of the two injections. We have sets X and Y, injections f : X → Y and g : Y → X, we start from some x ∈ X, and we consider
Because f and g are injections, these chains do not “overlap”. One cannot have two chains starting from different element which eventually merge into a single chain. Each element of X or Y is in a unique chain, which continues indefinitely on the right. What happens on the left, if we work backward? As long as we can find an inverse image of the current element, we can continue our travel along the chain. Then three things can happen: either we find an element which is elsewhere in the chain, so we have a loop; or we just continue indefinitely; or there is no more preimage at some point and we are forced to stop. That is, there are three types of chains:
Loops:
x → f(x) → g(f(x)) → f(g(f(x))) → … → x
Two-sided chains:
… → f^{-1}(g^{-1}(x)) → g^{-1}(x) → x → f(x) → g(f(x)) → f(g(f(x))) → …
One-sided chains:
x_{0} → f(x_{0}) → g(f(x_{0})) → f(g(f(x_{0}))) → …
or
y_{0} → g(y_{0}) → f(g(y_{0})) → g(f(g(y_{0}))) → …
We can then define a bijection between X and Y by pairing elements along the chains. For loops, we just pick an element arbitrarily, pair it with its successor on the chain, and repeat with the next element, until we complete the circle.
x → f(x) → g(f(x)) → f(g(f(x))) → … → x
For two-sided chains, we form a pair anywhere and propagate forward and backward until eternity:
… → f^{-1}(g^{-1}(x)) → g^{-1}(x) → x → f(x) → g(f(x)) → f(g(f(x))) → …
For one-sided chains, we just have to be careful not to leave the very first element unpaired. We start by pairing it with its successor and then continue:
x_{0} → f(x_{0}) → g(f(x_{0})) → f(g(f(x_{0}))) → …
or
y_{0} → g(y_{0}) → f(g(y_{0})) → g(f(g(y_{0}))) → …
And we are done. On loops and two-sided chains, our bijection is the same as f or g^{-1} — either of them works. The only subtlety is that on a one-sided chain, we have to choose whether to use f or g^{-1} depending on where the first element lives (the blue elements are either paired all with their successor, or all with their predecessor).
Let us now try to prove the Myhill isomorphism theorem. Can we adapt this proof to make it effective? Note that if we pair a blue element with a green element on the same chain, then the mapping we build is a reduction, because we follow f and g, which are reductions. (All elements are now integers, but I am still using green and blue colors to mentally distinguish between elements we think of as in the domain of the bijection we construct and elements in the range.) So, we can reuse the idea of constructing a bijection between the blue elements and the green elements on each chain, and gluing these bijections together.
Certainly, if we are given an element, we can follow the chain in the forward direction, because f and g are now assumed computable. What about the backward direction? Given x, we can enumerate integers n and see if we find one such that f(n) = x. But what if there is no such n? We just loop until the cows come home. Not good! On one-sided chains, we cannot detect where the chain stops (since we absolutely must halt), so we cannot know the color of the first element.
Intuitively, we seem to only be able to build up the pairs by going forward, since we cannot go backward without stumbling on possible non-termination. If we assume that we have already constructed some pairs, we are happy when we encounter for example
??? → x → f(x) → g(f(x)) → f(g(f(x))) → g(f(g(f(x)))) → f(g(f(g(f(x))))) → …
since we can build a new pair:
??? → x → f(x) → g(f(x)) → f(g(f(x))) → g(f(g(f(x)))) → f(g(f(g(f(x))))) → …
but our problem is situations like this:
??? → x → f(x) → g(f(x)) → f(g(f(x))) → g(f(g(f(x)))) → f(g(f(g(f(x))))) → …
where we cannot go backward, and the next element forward in the chain is already in a pair. However, there is a nifty solution which should be almost obvious on this visualization. We simply move forward in the chain, until we find an unpaired element, and we pair it:
??? → x → f(x) → g(f(x)) → f(g(f(x))) → g(f(g(f(x)))) → f(g(f(g(f(x))))) → …
And that's basically it! The rest is routine bookkeeping. Our bijective reduction takes an integer n and works by incrementally building pairs (thought of as a blue source and a green target) until it finds n as the source of some pair, at which point it returns the target. In order not to miss any element in the domain or range of the bijection, the pairs are built by enumerating integers k, and for each of them, making sure there is a pair with source k and a pair with target k. If we find that we do not have a pair with source k, we follow a chain from k, skipping already paired elements, until we find an available element, and we add a pair with k as source and that element as target. Adding a pair with k as target is symmetric. QED.
As an added bonus, we obtain that the theorem itself is effective: given algorithms for the reductions f and g, we can compute an algorithm for the isomorphism. Here is some Python code to illustrate this. Beware, as the saying goes, I have only proved it correct, not tested it.
def myhill_isomorphism(f, g):
def iso(n):
paired_sources = set()
paired_targets = set()
for k in range(n+1):
if k not in paired_sources:
paired_sources.add(k)
target = f(k)
while target in paired_targets:
target = g(f(target))
if k == n:
return target
paired_targets.add(target)
if k not in paired_targets:
paired_targets.add(k)
source = g(k)
while source in paired_sources:
source = f(g(source))
if source == n:
return k
paired_sources.add(source)
return iso
I also tried to formalize the theorem in Coq, before eventually giving up as the task was much more work than I anticipated. The biggest problem is that you need to formalize “skipping pairs until an unpaired element is found”, which is a while loop, and prove the termination of this loop. On looping chains, this is because the number of nodes is even, and for one-sided or two-sided chains, it is because there are only finitely many pairs already constructed. That “should” all be easy, but the Coq standard library seemed surprisingly lackluster and difficult to use — why are there three different definitions of an injection in unrelated modules and why does each of them prove a different subset of the basic facts about injections? Perhaps I should see if the grass is greener on the Lean and mathlib side.
So much for the proof. Does this theorem really demonstrate that one-to-one reductions are interesting? Clearly, it does not hold as-is for many-to-one reductions. If X and Y are many-to-one-equivalent, they may not be computably isomorphic for cardinality reasons. For example, if X = {0} and Y = {0, 1}, then there cannot be a bijective reduction from X to Y because X has cardinal 1 and Y has cardinal 2. The same goes if X = ℕ \ {0} and Y = ℕ \ {0, 1}, because of the cardinals of the complements.
The assumption that the reductions are one-to-one prevents these cardinality problems, by the vanilla Cantor-Schröder-Bernstein theorem. But what if we just require that the cardinalities match? Can we find X and Y which are many-to-one equivalent, and between which there is a bijective redutin but no bijective reduction? Can the theorem fail other than for silly cardinality reasons?
From another point of view: In the Myhill isomorphism theorem, we require that the same functions serve as reductions and one-to-one mappings. What if we relax this requirement?
Curiously, I did not find a word about this obvious generalization in any reference about this theorem. (I did find proofs which were essentially equivalent to the one above, but written in a very formal and hard-to-decipher style).
I went searching for a counter-example myself, and in the end I found one, but it is a tad contrived and I remain convinced there must be a simpler one. If you find it, let me know!
Let us first narrow down our search with simple remarks. If X is finite and there is a bijective redutin f from X to Y (this means X = f^{-1}(Y), which for a bijection is equivalent to f(X) = Y), then Y is finite with the same cardinal as X, and the two are trivially computably isomorphic (i.e., one can choose f computable). The same goes for co-finite sets. On the other hand, if X and Y are both infinite and co-infinite, then clearly there exists a bijective redutin from X to Y (glue together any bijection between X and Y with any bijection between the complement of X and the complement of Y, all these sets being countably infinite).
Therefore, at this point, our objective is to find two infinite and co-infinite sets X and Y which are many-to-one-equivalent, but not computably isomorphic. By the Myhill isomorphism theorem, demanding that they are not computably isomorphic is the same as demanding that there be no one-to-one reduction from one of them to the other, for example, no one-to-one reduction from X to Y.
I will choose X wisely, and then use for Y the set X with its minimum element removed. This construction clearly ensures that X and Y are many-to-one-equivalent; the reductions just need to special-case that minimum element and otherwise return the input.
The hope is that, because there is an extra element in X, a one-to-one reduction from X to Y is forced to map elements of X to larger elements of X (e.g., map each element to the next one), and we can try to make this transformation non-computable.
If we use for X, say, the halting problem, we unfortunately do not get a counter-example. This is because a reduction from X to Y can manage to be one-to-one by a “padding” trick. Intuitively, when we have two instances of X which correspond to the same Turing machine and we want the reduction to map them to different Turing machines, we can artificially make them different, by adding useless isolated states in one of them.
The idea of my counter-example is that if we space out the elements of X enormously, a one-to-one reduction from X to Y needs to compute numbers so large that we are able to prevent any computable function from doing the job. This is reminiscent of the busy beaver function.
Assume that we have X and Y as specified. Denote X = {x_{0}, x_{1}, x_{2}, …} where x_{0} ≤ x_{1} ≤ x_{2} ≤ … (thus Y = {x_{1}, x_{2}, …}).
Assume that f is a one-to-one reduction from X to Y. It is computed by some Turing machine with a certain number of states n ≥ 1. By injectivity, the n first elements of X, namely x_{0}, …, x_{n-1}, are sent to n distinct elements of Y, therefore at least one of them is x_{n} or larger. Thus, we have k < n such that f(x_{k}) ≥ x_{n}.
At last, we can see how to construct X to make this impossible: pick x_{0} arbitrarily, and iteratively choose x_{n} larger than the maximum image of an x_{k} for k < n by a Turing machine with n states. QED.