One-to-one reductions, and the Myhill isomorphism theorem

14 May 2024

Note (added December 2024): This post contains boxes in math formulas, which are currently not supported by Google Chrome and other Chromium-based browsers such as Microsoft Edge. Please use Firefox.

In computability, we love to compare the undecidability of problems using reductions: a Turing reduction from X to Y is an oracle machine which can solve X when the oracle answers queries about Y. This directly gives rise to the central concept of Turing degrees.

In complexity, we also love Turing reductions, but they are often too strong. For example, when we get the answer from the oracle, we can easily negate it, so NP is not closed under polytime Turing reductions unless NP = co-NP. To compensate, we use weaker reductions where the algorithm that reduces X to Y outputs an equivalent instance of Y when given some instance of X — in other words, Turing reductions where there is only a single oracle invocation in “tail call” position. These are called many-to-one reductions.
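To make the contrast concrete, here is a small Python sketch (the oracles are modeled as plain predicates, and all the names are mine): a Turing reduction may post-process the oracle's answer, for instance negate it, while a many-to-one reduction must return the answer to a single transformed query unchanged.

```python
def complement_via_turing(oracle):
    # A Turing reduction can do anything with the oracle's answer,
    # e.g. negate it -- so it solves the complement problem for free.
    return lambda instance: not oracle(instance)

def solve_via_many_one(reduce_instance, oracle):
    # A many-to-one reduction: a single oracle call in tail position,
    # whose answer is returned as-is.
    return lambda instance: oracle(reduce_instance(instance))

is_even = lambda n: n % 2 == 0
is_odd_1 = complement_via_turing(is_even)                # Turing-reduces "odd" to "even"
is_odd_2 = solve_via_many_one(lambda n: n + 1, is_even)  # many-to-one, via n ↦ n + 1
```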

Ever since learning about these reductions, I've vaguely wondered why they are code-named “many-to-one”. Why do we stress that such a reduction need not be one-to-one? What does this set-theoretic aspect of the reduction have to do with algorithms? Surely this means that one-to-one (injective) reductions are somehow interesting. Yet nobody ever told me what they could be useful for. Some way to define more fine-grained complexity classes? Some weird cryptography thing where we want to prevent collisions?

Recently, I discovered on Wikipedia what is possibly the only use of one-to-one reductions across all computer science, and, unexpectedly to me, it turns out to be in pure computability. It is a cute theorem which probably deserves to be better known, the Myhill isomorphism theorem. Roughly speaking, it is an effective equivalent of the Cantor-Schröder-Bernstein theorem: if there is a one-to-one reduction from X to Y and a one-to-one reduction from Y to X, then X and Y are isomorphic. Of course, we should define what we mean by isomorphic:

Definition: A redutin from X ⊆ ℕ to Y ⊆ ℕ is a (total) function f : ℕ → ℕ such that ∀ n ∈ ℕ, n ∈ X ⇔ f(n) ∈ Y. A (many-to-one) reduction is a computable redutin. Two sets X, Y ⊆ ℕ are computably isomorphic when there is a reduction between them which is a bijection of ℕ onto itself.
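As a toy illustration, with decidable sets so that everything is trivially computable: the map below is a many-to-one reduction from the set of even numbers to {0}, and it is about as far from being one-to-one as a function can be.

```python
def to_canonical(n):
    # many-to-one reduction from X = even numbers to Y = {0}:
    # n ∈ X ⇔ to_canonical(n) ∈ Y, yet every instance collapses to 0 or 1
    return 0 if n % 2 == 0 else 1

# the defining property of a redutin, checked on an initial segment
assert all((n % 2 == 0) == (to_canonical(n) in {0}) for n in range(100))
```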

[I don't know a term for reductions which are not necessarily computable. By analogy with “rng”, I took “reduction” and removed the first two letters of “computable”. Sorry!]

It should be clear that computability-theoretic notions do not distinguish between computably isomorphic sets. To justify the term “isomorphism”, we can view the subsets of ℕ as a category with many-to-one reductions as morphisms. The isomorphisms in this category are exactly the computable isomorphisms just defined.

Then the theorem reads:

Myhill isomorphism theorem: Let X, Y ⊆ ℕ. If there are one-to-one reductions from X to Y and from Y to X, then X and Y are computably isomorphic.

As for the proof, it is not very difficult, but quite beautiful. (Update (July 2nd 2024): I've put a short summary of this proof on the Wikipedia article.)

Most proofs of the Cantor-Schröder-Bernstein theorem revolve around the idea of analyzing “chains” which are formed by repeated applications of the two injections. We have sets \def\X#1{\textcolor{royalblue}{#1}} \X{X} and \def\Y#1{\textcolor{forestgreen}{#1}} \Y{Y}, injections f : \X{X} → \Y{Y} and g : \Y{Y} → \X{X}, we start from some \X{x ∈ X}, and we consider

\X{x} → \Y{f(x)} → \X{g(f(x))} → \Y{f(g(f(x)))} → …

Because f and g are injections, these chains do not “overlap”: one cannot have two chains starting from different elements which eventually merge into a single chain. Each element of X or Y is in a unique chain, which continues indefinitely on the right. What happens on the left, if we work backward? As long as we can find an inverse image of the current element, we can continue our travel along the chain. Then three things can happen: either we find an element which is elsewhere in the chain, so we have a loop; or we just continue indefinitely; or there is no more preimage at some point and we are forced to stop. That is, there are three types of chains: loops, two-sided infinite chains, and one-sided chains (whose first element may be in X or in Y).

We can then define a bijection between X and Y by pairing elements along the chains. For loops, we just pick an element arbitrarily, pair it with its successor on the chain, and repeat with the next element, until we complete the circle.

\boxed{\X{x} → \Y{f(x)}} → \boxed{\X{g(f(x))} → \Y{f(g(f(x)))}} → … → \X{x}

For two-sided chains, we form a pair anywhere and propagate forward and backward until eternity:

… → \boxed{\X{f^{-1}(g^{-1}(x))} → \Y{g^{-1}(x)}} → \boxed{\X{x} → \Y{f(x)}} → \boxed{\X{g(f(x))} → \Y{f(g(f(x)))}} → …

For one-sided chains, we just have to be careful not to leave the very first element unpaired. We start by pairing it with its successor and then continue:

\boxed{\X{x_0} → \Y{f(x_0)}} → \boxed{\X{g(f(x_0))} → \Y{f(g(f(x_0)))}} → …

or

\boxed{\Y{y_0} → \X{g(y_0)}} → \boxed{\Y{f(g(y_0))} → \X{g(f(g(y_0)))}} → …

And we are done. On loops and two-sided chains, our bijection is the same as f or g^{-1} — either of them works. The only subtlety is that on a one-sided chain, we have to choose whether to use f or g^{-1} depending on where the first element lives (the blue elements are either all paired with their successor, or all paired with their predecessor).
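For intuition, here is how that chain analysis can look in Python. This is only a sketch under a strong extra assumption, namely f(n) > n and g(n) > n, so that each backward step along a chain is a bounded search and the walk always terminates; as discussed below, this is precisely what fails in general. The function names are mine.

```python
def csb_bijection(f, g):
    # Chain-based Cantor-Schröder-Bernstein pairing, as a sketch.
    # f and g are injections on ℕ; we additionally assume f(n) > n and
    # g(n) > n, so that each backward step is a *bounded* search.
    def preimage(h, m):
        # bounded search for n with h(n) == m, valid because h(n) > n
        for n in range(m):
            if h(n) == m:
                return n
        return None

    def pair(x):
        # walk backward along x's chain: x, g⁻¹(x), f⁻¹(g⁻¹(x)), ...
        cur, on_x_side = x, True
        while True:
            p = preimage(g if on_x_side else f, cur)
            if p is None:
                # The chain starts at cur.  If it starts on the X side,
                # pair x forward with f; if on the Y side, pair x
                # backward with g⁻¹ (which exists in that case).
                return f(x) if on_x_side else preimage(g, x)
            cur, on_x_side = p, not on_x_side

    return pair
```

For instance, with f(n) = n + 3 and g(n) = n + 1, every chain eventually stops on the left, and the resulting pairing permutes each block {4k, …, 4k + 3} of ℕ.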

Let us now try to prove the Myhill isomorphism theorem. Can we adapt this proof to make it effective? Note that if we pair a blue element with a green element on the same chain, then the mapping we build is a reduction, because we follow f and g, which are reductions. (All elements are now integers, but I am still using green and blue colors to mentally distinguish between elements we think of as in the domain of the bijection we construct and elements in the range.) So, we can reuse the idea of constructing a bijection between the blue elements and the green elements on each chain, and gluing these bijections together.

Certainly, if we are given an element, we can follow the chain in the forward direction, because f and g are now assumed computable. What about the backward direction? Given x, we can enumerate integers n and see if we find one such that f(n) = x. But what if there is no such n? We just loop until the cows come home. Not good! On one-sided chains, we cannot detect where the chain stops (since we absolutely must halt), so we cannot know the color of the first element.
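In code, the problematic backward step looks like this (a hypothetical helper, not part of the construction): it works whenever a preimage exists, and diverges otherwise.

```python
def backward_step(f, x):
    # Search for some n with f(n) == x.  When x has no f-preimage, this
    # loop never ends, and there is no general way to bound the search.
    n = 0
    while True:
        if f(n) == x:
            return n
        n += 1
```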

Intuitively, we seem to only be able to build up the pairs by going forward, since we cannot go backward without stumbling on possible non-termination. If we assume that we have already constructed some pairs, we are happy when we encounter for example

\def\mysterious#1{\textcolor{dimgrey}{#1}} \mysterious{\text{???} →} \X{x} → \Y{f(x)} → \boxed{\X{g(f(x))} → \Y{f(g(f(x)))}} → \boxed{\X{g(f(g(f(x))))} → \Y{f(g(f(g(f(x)))))}} → …

since we can build a new pair:

\mysterious{\text{???} →} \boxed{\X{x} → \Y{f(x)}} → \boxed{\X{g(f(x))} → \Y{f(g(f(x)))}} → \boxed{\X{g(f(g(f(x))))} → \Y{f(g(f(g(f(x)))))}} → …

but our problem is situations like this:

\mysterious{\text{???} →} \X{x} → \boxed{\Y{f(x)} → \X{g(f(x))}} → \boxed{\Y{f(g(f(x)))} → \X{g(f(g(f(x))))}} → \Y{f(g(f(g(f(x)))))}

where we cannot go backward, and the next element forward in the chain is already in a pair. However, there is a nifty solution which should be almost obvious on this visualization. We simply move forward in the chain, until we find an unpaired element, and we pair it:

\mysterious{\text{???} →} \underbrace{\boxed{\X{x}} → \boxed{\Y{f(x)} → \X{g(f(x))}} → \boxed{\Y{f(g(f(x)))} → \X{g(f(g(f(x))))}} → \boxed{\Y{f(g(f(g(f(x)))))}}} → …

And that's basically it! The rest is routine bookkeeping. Our bijective reduction takes an integer n and works by incrementally building pairs (thought of as a blue source and a green target) until it finds n as the source of some pair, at which point it returns the target. In order not to miss any element in the domain or range of the bijection, the pairs are built by enumerating integers k, and for each of them, making sure there is a pair with source k and a pair with target k. If we find that we do not have a pair with source k, we follow a chain from k, skipping already paired elements, until we find an available element, and we add a pair with k as source and that element as target. Adding a pair with k as target is symmetric. ∎

As an added bonus, we obtain that the theorem itself is effective: given algorithms for the reductions f and g, we can compute an algorithm for the isomorphism. Here is some Python code to illustrate this. Beware, as the saying goes, I have only proved it correct, not tested it.

def myhill_isomorphism(f, g):
    def iso(n):
        paired_sources = set()
        paired_targets = set()
        for k in range(n+1):
            if k not in paired_sources:
                # pair k, as a source, with the first unpaired target on
                # its chain: the candidate targets are f(k), f(g(f(k))), ...
                paired_sources.add(k)
                target = f(k)
                while target in paired_targets:
                    target = f(g(target))
                if k == n:
                    return target
                paired_targets.add(target)
            if k not in paired_targets:
                # symmetrically, pair k, as a target, with an unpaired
                # source among g(k), g(f(g(k))), ...
                paired_targets.add(k)
                source = g(k)
                while source in paired_sources:
                    source = g(f(source))
                if source == n:
                    return k
                paired_sources.add(source)
    return iso

I also tried to formalize the theorem in Coq, before eventually giving up as the task was much more work than I anticipated. The biggest problem is that you need to formalize “skipping pairs until an unpaired element is found”, which is a while loop, and prove the termination of this loop. On looping chains, this is because the number of nodes is even, and for one-sided or two-sided chains, it is because there are only finitely many pairs already constructed. That “should” all be easy, but the Coq standard library seemed surprisingly lackluster and difficult to use — why are there three different definitions of an injection in unrelated modules and why does each of them prove a different subset of the basic facts about injections? Perhaps I should see if the grass is greener on the Lean and mathlib side.

So much for the proof. Does this theorem really demonstrate that one-to-one reductions are interesting? Clearly, it does not hold as-is for many-to-one reductions. If X and Y are many-to-one-equivalent, they may not be computably isomorphic for cardinality reasons. For example, if X = \{0\} and Y = \{0, 1\}, then there cannot be a bijective reduction from X to Y because X has cardinality 1 and Y has cardinality 2. The same goes if X = ℕ ∖ \{0\} and Y = ℕ ∖ \{0, 1\}, because of the cardinalities of the complements.

The assumption that the reductions are one-to-one prevents these cardinality problems, by the vanilla Cantor-Schröder-Bernstein theorem. But what if we just require that the cardinalities match? Can we find X and Y which are many-to-one-equivalent, and between which there is a bijective redutin but no bijective reduction? Can the theorem fail other than for silly cardinality reasons?

From another point of view: In the Myhill isomorphism theorem, we require that the same functions serve as reductions and one-to-one mappings. What if we relax this requirement?

Curiously, I did not find a word about this obvious generalization in any reference about this theorem. (I did find proofs which were essentially equivalent to the one above, but written in a very formal and hard-to-decipher style).

I went searching for a counter-example myself, and in the end I found one, but it is a tad contrived and I remain convinced there must be a simpler one. If you find it, let me know!

Let us first narrow down our search with simple remarks. If X is finite and there is a bijective redutin f from X to Y (this means X = f^{-1}(Y), which for a bijection is equivalent to f(X) = Y), then Y is finite with the same cardinality as X, and the two are trivially computably isomorphic (i.e., one can choose f computable). The same goes for co-finite sets. On the other hand, if X and Y are both infinite and co-infinite, then clearly there exists a bijective redutin from X to Y (glue together any bijection between X and Y with any bijection between the complement of X and the complement of Y, all these sets being countably infinite).
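When the two sets are moreover decidable, this gluing is even computable, and easy to write down (a sketch with names of my choosing; the membership tests are passed as predicates): send the k-th element of X to the k-th element of Y, and the k-th non-element to the k-th non-element.

```python
def glued_bijection(in_X, in_Y):
    # in_X, in_Y: membership tests for infinite, co-infinite sets X, Y.
    # Returns the bijective redutin gluing the order-preserving bijection
    # X → Y with the one between the complements.
    def h(n):
        b = in_X(n)
        # rank of n inside X (if b) or inside its complement (if not b)
        rank = sum(1 for m in range(n) if in_X(m) == b)
        # element of Y (or of its complement) with the same rank
        k, seen = 0, 0
        while True:
            if in_Y(k) == b:
                if seen == rank:
                    return k
                seen += 1
            k += 1
    return h
```

For instance, gluing X = multiples of 3 and Y = even numbers yields a computable bijection h of ℕ with n ∈ X ⇔ h(n) ∈ Y.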

Therefore, at this point, our objective is to find two infinite and co-infinite sets X and Y which are many-to-one-equivalent, but not computably isomorphic. By the Myhill isomorphism theorem, demanding that they are not computably isomorphic is the same as demanding that there be no one-to-one reduction from one of them to the other, for example, no one-to-one reduction from X to Y.

I will choose X wisely, and then use for Y the set X with its minimum element removed. This construction clearly ensures that X and Y are many-to-one-equivalent; the reductions just need to special-case that minimum element and otherwise return the input.

The hope is that, because there is an extra element in X, a one-to-one reduction from X to Y is forced to map elements of X to larger elements of X (e.g., map each element to the next one), and we can try to make this transformation non-computable.

If we use for X, say, the halting problem, we unfortunately do not get a counter-example. This is because a reduction from X to Y can manage to be one-to-one by a “padding” trick. Intuitively, when we have two instances of X which correspond to the same Turing machine and we want the reduction to map them to different Turing machines, we can artificially make them different, by adding useless isolated states in one of them.

The idea of my counter-example is that if we space out the elements of X enormously, a one-to-one reduction from X to Y needs to compute numbers so large that we are able to prevent any computable function from doing the job. This is reminiscent of the busy beaver function.

Assume that we have X and Y as specified. Denote X = \{x_0, x_1, x_2, …\} where x_0 < x_1 < x_2 < … (thus Y = \{x_1, x_2, …\}).

Assume that f is a one-to-one reduction from X to Y. It is computed by some Turing machine with a certain number of states n ≥ 1. By injectivity, the first n elements of X, namely x_0, …, x_{n-1}, are sent to n distinct elements of Y, therefore at least one of them is x_n or larger. Thus, there is k < n such that f(x_k) ≥ x_n.

At last, we can see how to construct X to make this impossible: pick x_0 arbitrarily, and iteratively choose x_n larger than the maximum image of an x_k for k < n by a Turing machine with n states. ∎
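Spelled out, the construction can be written as a recurrence (a sketch; M(x_k) denotes the output of machine M on input x_k, and the maximum ranges over the finitely many pairs (M, k) such that the n-state machine M halts on x_k, so it is well defined, though of course not computable):

```latex
x_n = 1 + \max \{\, M(x_k) \;:\; k < n,\ M \text{ a Turing machine with } n \text{ states that halts on } x_k \,\}
```

Any f computed by an n-state machine is total, so it satisfies f(x_k) < x_n for all k < n, contradicting the inequality f(x_k) ≥ x_n obtained above.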

