Creativity and Sorting: Rarity at the $n\log n$ Boundary

J. R. Landers

Abstract

A useful heuristic for “creativity” in mathematics and algorithms is: a low-probability, structure-preserving transformation of representation space that increases explanatory compression. This note makes that slogan concrete in one of the cleanest arenas available: comparison-based sorting. First, we present a decision-tree theorem that isolates the feasibility boundary at depth $\log_2(n!)$ and proves an explicit “rarity” bound under an uninformed prior. A padding corollary then shows a combinatorial explosion of correct-but-deeper trees, formalizing why slower procedures occupy vastly more “volume”. Second, we lift the same story to the permutohedron and its adjacency graph, where efficient sorting corresponds to geodesic-like behavior and near-geodesic paths form an exponentially thin subset of path space. Finally, we connect this discrete viewpoint to the continuous geometric contraction framework developed in Landers (2025), which recasts sorting as gradient flow on the permutohedron.

1. The creativity principle in a technical form

We will use “creativity” as a precise triad: (i) structure preservation, (ii) compression gain, and (iii) low prior probability. In this note the “structure” is correctness: we must map each input order-type to its correct sorted permutation. Compression is measured by the number of comparisons (depth) or, geometrically, by path length or contraction time. Low probability is measured under an explicitly specified prior over representations.

Creativity template (sorting version). A sorting method is “more creative” when it achieves correctness (structure preservation) with fewer comparisons (compression) and when such highly-compressed correct representations occupy small measure under a natural, uninformed prior (low probability).

Comment. “There are fewer $n\log n$ algorithms than $n^2$ algorithms” is not a well-typed statement until one specifies a representation class and a probability measure over it. The results below do exactly that.

2. Comparison decision trees

2.1. Model

Assume $n$ distinct keys. Each input instance has a unique order-type permutation $\sigma \in S_n$. A deterministic comparison-based sorting algorithm can be modeled by a rooted binary tree: internal nodes are labeled by a pair $(i,j)$ with $1 \le i < j \le n$ (compare $a_i$ vs. $a_j$), and each leaf is labeled by an output permutation $\pi \in S_n$. The algorithm’s output on order-type $\sigma$ is the leaf label reached by following comparison outcomes.

$$ \text{(comparison sort)} \quad \equiv \quad \text{binary decision tree whose leaves are labeled by } S_n. \tag{1} $$
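For concreteness, the correspondence (1) can be exercised directly. The following Python sketch (the tuple encoding and the names `run`, `tree3` are ours, not from any library) hard-codes a depth-3 comparison tree for $n=3$ and checks that every one of the $3!=6$ order-types reaches a leaf labeled with its correct sorted order:

```python
from itertools import permutations

def run(tree, a):
    """Follow comparison outcomes down a decision tree.

    Internal nodes: ("cmp", i, j, if_less, if_greater); leaves: ("leaf", pi),
    where pi lists the input positions in increasing key order."""
    while tree[0] == "cmp":
        _, i, j, lo, hi = tree
        tree = lo if a[i] < a[j] else hi
    return tree[1]

# A correct tree for n = 3 whose depth matches ceil(log2(3!)) = 3.
tree3 = ("cmp", 0, 1,
         ("cmp", 1, 2,
          ("leaf", (0, 1, 2)),
          ("cmp", 0, 2, ("leaf", (0, 2, 1)), ("leaf", (2, 0, 1)))),
         ("cmp", 0, 2,
          ("leaf", (1, 0, 2)),
          ("cmp", 1, 2, ("leaf", (1, 2, 0)), ("leaf", (2, 1, 0)))))

# Correctness on every order-type: reading the input at the leaf's
# positions must give the sorted sequence.
for a in permutations(range(3)):
    pi = run(tree3, a)
    assert [a[i] for i in pi] == sorted(a)
```

Each leaf here is labeled by the unique permutation that is correct for the order-types routed to it, exactly as the model requires.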

2.2. A boundary theorem and a rarity bound

Theorem 1 (boundary of feasibility + rarity under an uninformed prior). Fix $n\ge 2$. Consider deterministic comparison decision trees of worst-case depth $\le d$ that aim to sort all distinct inputs. Let $\mathsf{Sort}_n$ denote the event “the tree sorts correctly on every order-type $\sigma\in S_n$”.

(a) Feasibility boundary. If $d < \lceil \log_2(n!) \rceil$, then no such correct tree exists.

(b) Rarity bound. Fix any full binary tree shape of depth $d$. Put the following uninformed prior on labels: each internal node chooses its comparison label i.i.d. uniformly from the $\binom{n}{2}$ pairs, and each leaf chooses its output label i.i.d. uniformly from $S_n$. Then for every $d$, $$ \begin{aligned} \Pr(\mathsf{Sort}_n) \,\le\, \left(\frac{1}{n!}\right)^{n!}. \end{aligned} \tag{2} $$
Proof.

(a) A binary tree of depth $d$ has at most $2^d$ leaves. If two distinct order-types $\sigma\neq\tau$ reached the same leaf, the algorithm would output the same permutation on both, hence would be wrong on at least one of them (a leaf cannot be simultaneously labeled by two different correct answers). Therefore a correct sorting tree must have at least one distinct leaf for each $\sigma\in S_n$, i.e. at least $n!$ leaves. Thus $2^d \ge n!$, hence $d \ge \log_2(n!)$; if $d<\lceil \log_2(n!)\rceil$ no correct tree exists.

(b) Fix the random labeled tree under the stated prior. For each $\sigma\in S_n$, let $\ell(\sigma)$ denote the leaf reached on input order-type $\sigma$ (this is determined by the internal comparison labels and the induced outcomes). Let $y(\ell)\in S_n$ be the random output label at leaf $\ell$. Define events $$ A_\sigma := \{\, y(\ell(\sigma)) = \sigma \,\}. \tag{3} $$ Correctness implies $\mathsf{Sort}_n \subseteq \bigcap_{\sigma\in S_n} A_\sigma$, hence $\Pr(\mathsf{Sort}_n)\le \Pr(\bigcap_{\sigma}A_\sigma)$.

Condition on all internal-node labels; then the routing $\ell(\sigma)$ is fixed, and the leaf labels remain independent uniform draws from $S_n$. In particular, for any fixed $\sigma$, $$ \Pr(A_\sigma \mid \text{internal labels}) = \Pr\big(y(\ell(\sigma))=\sigma\mid \text{internal labels}\big)=\frac{1}{n!}. \tag{4} $$ Moreover, if $\ell(\sigma)=\ell(\tau)$ for some $\sigma\neq\tau$, then $A_\sigma\cap A_\tau=\varnothing$ (impossibility), which only decreases the probability of the intersection; thus an upper bound can safely ignore collisions.

Enumerate $S_n=\{\sigma_1,\dots,\sigma_{n!}\}$. By the chain rule, $$ \begin{aligned} \Pr\!\left(\bigcap_{k=1}^{n!} A_{\sigma_k} \,\middle|\, \text{internal labels}\right) &= \prod_{k=1}^{n!} \Pr\!\left( A_{\sigma_k} \,\middle|\, \text{internal labels}, A_{\sigma_1},\dots,A_{\sigma_{k-1}} \right). \end{aligned} \tag{5} $$ Each conditional factor is $\le 1/n!$. There are two cases for the leaf $\ell(\sigma_k)$:

(i) Fresh leaf. If no earlier event has constrained $y(\ell(\sigma_k))$, then (by independence and uniformity) $$ \begin{aligned} \Pr\!\left( A_{\sigma_k} \,\middle|\, \text{internal labels}, A_{\sigma_1},\dots,A_{\sigma_{k-1}} \right) &= \Pr\big(y(\ell(\sigma_k))=\sigma_k\big) \\&= \frac{1}{n!}. \end{aligned} $$
(ii) Previously exposed leaf. If some earlier event $A_{\sigma_j}$ with $j<k$ has already constrained $y(\ell(\sigma_k))$, then conditioning on $A_{\sigma_j}$ fixes $y(\ell(\sigma_k))=\sigma_j$. Since $\sigma_j\neq\sigma_k$, the conditional probability of $A_{\sigma_k}$ is $0$, which is certainly at most $1/n!$.

Therefore every factor in (5) is bounded by $1/n!$, so $$ \begin{aligned} \Pr\!\left(\bigcap_{\sigma\in S_n} A_\sigma \,\middle|\, \text{internal labels}\right) &\le \left(\frac{1}{n!}\right)^{n!}. \end{aligned} \tag{6} $$ Taking expectation over internal labels yields the same bound unconditionally, proving (2). $\square$

Comment. Part (b) is intentionally “uninformed”: it treats leaf labels as random. Its purpose is not to model how humans design algorithms, but to formalize the slogan “highly compressed correctness is a thin set”: an exponential number of constraints must simultaneously hold at the boundary $2^d\approx n!$.
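The bound (2) is in fact tight for $n=2$: a depth-1 tree has a single (forced) comparison and two leaf labels drawn uniformly from $S_2$, so $\Pr(\mathsf{Sort}_2)=(1/2)^2=1/4=(1/2!)^{2!}$. A minimal Monte Carlo sketch (our own naming; seeded for reproducibility) agrees:

```python
import random
from itertools import permutations

def random_tree_sorts(n, rng):
    """Draw the two leaf labels of a depth-1 tree (single comparison
    a0 vs a1) uniformly from S_n, then test correctness on all order-types."""
    perms = list(permutations(range(n)))
    left, right = rng.choice(perms), rng.choice(perms)
    for a in perms:
        out = left if a[0] < a[1] else right
        if [a[i] for i in out] != sorted(a):
            return False
    return True

rng = random.Random(0)
trials = 20000
hits = sum(random_tree_sorts(2, rng) for _ in range(trials))
print(hits / trials)  # close to (1/2!)^{2!} = 0.25
```

For larger $n$ the event is far too rare to sample, which is precisely the point of the theorem.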

2.3. A padding corollary: slow-correct trees vastly outnumber fast-correct trees

Corollary 2 (combinatorial explosion away from the boundary). Let $N_d$ be the number of correct comparison decision trees of worst-case depth $\le d$. Suppose $N_{d_\star} > 0$ for some $d_\star$. Then for every integer $k\ge 1$, $$ \begin{aligned} N_{d_\star+k} \;&\ge\; \left(\binom{n}{2}\right)^{k\,n!}\, N_{d_\star}. \end{aligned} \tag{7} $$ In particular, allowing even modest extra depth multiplies the count of correct trees by an enormous factor.
Proof.

Fix a correct tree $T$ of depth $\le d_\star$. By the feasibility argument above, $T$ has at least $n!$ leaves (one per order-type). Construct a new tree $T'$ by padding each leaf $\ell$ of $T$ with a chain of $k$ additional comparison nodes. Each new internal node chooses an arbitrary comparison label from the $\binom{n}{2}$ pairs. Label every new terminal leaf produced from $\ell$ with the same permutation label that $\ell$ had in $T$.

The worst-case depth increases by exactly $k$ (a path that exits its padding chain early gains fewer than $k$ comparisons), so $\mathrm{depth}(T')\le d_\star+k$. For correctness: for any input order-type $\sigma$, $T$ routes $\sigma$ to a leaf $\ell$ whose label equals $\sigma$. In $T'$, the input continues down the padded chain, but regardless of the outcomes it ends at a descendant leaf with the same label. Hence $T'$ outputs exactly what $T$ outputs on every input; thus $T'$ is correct.

Counting: each leaf chain has $k$ internal nodes, each with $\binom{n}{2}$ independent labeling choices, giving $(\binom{n}{2})^k$ distinct paddings per original leaf. Since $T$ has at least $n!$ leaves, the number of distinct padded trees generated from $T$ is at least $(\binom{n}{2})^{k\,n!}$. This construction is injective in the padding choices, so $N_{d_\star+k} \ge (\binom{n}{2})^{k\,n!} N_{d_\star}$. $\square$
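The padding construction is easy to mechanize. The sketch below (our own tuple encoding; for simplicity it pads with $k$ full binary levels whose children coincide, one of many valid padding shapes) pads a correct depth-3 tree for $n=3$ and checks that correctness is preserved while depth grows by exactly $k$:

```python
from itertools import permutations

def run(tree, a):
    while tree[0] == "cmp":
        _, i, j, lo, hi = tree
        tree = lo if a[i] < a[j] else hi
    return tree[1]

def depth(tree):
    if tree[0] == "leaf":
        return 0
    return 1 + max(depth(tree[3]), depth(tree[4]))

def pad(tree, k, pair):
    """Append k redundant comparison levels below every leaf; every new
    descendant leaf keeps the original leaf's output label."""
    if tree[0] == "leaf":
        node = tree
        for _ in range(k):
            node = ("cmp", pair[0], pair[1], node, node)
        return node
    _, i, j, lo, hi = tree
    return ("cmp", i, j, pad(lo, k, pair), pad(hi, k, pair))

# A correct depth-3 comparison tree for n = 3.
tree3 = ("cmp", 0, 1,
         ("cmp", 1, 2,
          ("leaf", (0, 1, 2)),
          ("cmp", 0, 2, ("leaf", (0, 2, 1)), ("leaf", (2, 0, 1)))),
         ("cmp", 0, 2,
          ("leaf", (1, 0, 2)),
          ("cmp", 1, 2, ("leaf", (1, 2, 0)), ("leaf", (2, 1, 0)))))

for k in (1, 2, 3):
    padded = pad(tree3, k, (0, 2))
    assert depth(padded) == depth(tree3) + k
    for a in permutations(range(3)):
        # Redundant comparisons change nothing: outputs are identical.
        assert run(padded, a) == run(tree3, a)
```

Varying the comparison label of each padding node independently is what produces the $(\binom{n}{2})^{k\,n!}$ multiplicity counted in the corollary.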

2.4. Creativity, restated

Theorem 1(a) identifies the compression boundary: to isolate one of $n!$ possibilities by binary tests requires depth at least $\log_2(n!)=\Theta(n\log n)$. Corollary 2 shows that once we step away from that boundary, the number of correct representations explodes combinatorially: correctness becomes compatible with many more degrees of freedom (redundant comparisons). This makes the “thinness” of near-optimal solutions precise.
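The boundary values are concrete; a minimal stdlib Python check (nothing assumed beyond the statement of Theorem 1(a)) tabulates $\lceil \log_2 n!\rceil$ for small $n$:

```python
import math

# Feasibility boundary of Theorem 1(a): minimum worst-case depth ceil(log2 n!).
for n in range(2, 11):
    print(n, math.ceil(math.log2(math.factorial(n))))
```

As a point of comparison recorded in Knuth [1], the merge-insertion (Ford–Johnson) procedure attains this information-theoretic bound for all $n\le 11$, while at $n=12$ the true minimum number of comparisons is $30$ against a bound of $\lceil\log_2 12!\rceil=29$.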

Creativity lens (decision-tree geometry). Correctness is the structure constraint. Depth is the compression budget. Near-optimal depth $d\approx \log_2(n!)$ forces a near-injective partition of $S_n$ into leaves; moving to larger depth creates vast families of redundant, padded correct trees. Efficient sorting is therefore “rare” in the sense that it occupies a thin boundary layer of the correct-tree space.

3. Extension: the permutohedron and path geometry

3.1. Permutohedron and the adjacent-swap graph

The permutohedron is the convex polytope $$ P_n := \mathrm{conv}\{\pi(1,2,\dots,n): \pi\in S_n\}. \tag{8} $$ Its vertices are in bijection with permutations; its 1-skeleton is the Cayley graph of $S_n$ generated by the adjacent transpositions $s_i=(i,i+1)$ for $i=1,\dots,n-1$. A path that swaps adjacent elements corresponds to motion along edges of $P_n$. This viewpoint (including comparisons as linear constraints slicing $P_n$) is developed in detail in Sorting as Gradient Flow on the Permutohedron [3].
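The 1-skeleton is easy to materialize for small $n$. The following sketch (the helper `adjacent_swaps` is our own) confirms that it has $n!$ vertices and is $(n-1)$-regular, one neighbor per generator $s_i$:

```python
from itertools import permutations

def adjacent_swaps(p):
    """Neighbors of vertex p in the 1-skeleton of P_n: one per generator s_i."""
    out = []
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        out.append(tuple(q))
    return out

n = 4
vertices = list(permutations(range(1, n + 1)))
assert len(vertices) == 24                                          # n! = 24 vertices
assert all(len(set(adjacent_swaps(v))) == n - 1 for v in vertices)  # (n-1)-regular
```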

3.2. Geodesics are reduced decompositions

Proposition 3 (inversion number is geodesic length). Let $G_n$ be the adjacent-swap graph (1-skeleton of $P_n$). For $\sigma\in S_n$, let $\mathrm{inv}(\sigma)$ be its inversion number. Then any path in $G_n$ from $\sigma$ to the identity has length at least $\mathrm{inv}(\sigma)$, and there exists a path of length exactly $\mathrm{inv}(\sigma)$.
Proof.

Each adjacent swap changes the inversion number by exactly $\pm 1$ (it swaps a neighboring pair, creating or removing exactly one inversion). The identity permutation has inversion number $0$. Therefore any path from $\sigma$ to identity must reduce $\mathrm{inv}(\sigma)$ to $0$, requiring at least $\mathrm{inv}(\sigma)$ steps. Existence of a length-$\mathrm{inv}(\sigma)$ path follows by repeatedly swapping any adjacent inverted pair (e.g. bubble-sort descent), which removes one inversion per swap until none remain. Such a shortest path corresponds to a reduced word for $\sigma$ in the Coxeter generators. $\square$
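Proposition 3 can be verified exhaustively for small $n$: breadth-first-search distance to the identity in the adjacent-swap graph should coincide with the inversion number (helper names are ours):

```python
from collections import deque
from itertools import permutations

def adjacent_swaps(p):
    out = []
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        out.append(tuple(q))
    return out

def inversions(p):
    n = len(p)
    return sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))

# BFS from the identity; graph distance should equal the inversion number.
n = 4
ident = tuple(range(1, n + 1))
dist = {ident: 0}
queue = deque([ident])
while queue:
    v = queue.popleft()
    for w in adjacent_swaps(v):
        if w not in dist:
            dist[w] = dist[v] + 1
            queue.append(w)

assert all(dist[p] == inversions(p) for p in permutations(range(1, n + 1)))
```

The reversal $(4,3,2,1)$, for instance, sits at distance $\binom{4}{2}=6$, its inversion number.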

3.3. A path-counting rarity theorem

Theorem 4 (geodesics are exponentially sparse among longer successful paths). Fix $\sigma\in S_n$ and let $I=\mathrm{inv}(\sigma)$. Let $\Gamma_L(\sigma)$ be the set of length-$L$ adjacent-swap paths in $G_n$ that start at $\sigma$ and end at the identity. Since each swap changes the inversion number by $\pm 1$, $\Gamma_L(\sigma)=\varnothing$ unless $L\equiv I \pmod 2$. For every integer $m\ge 0$, $$ \begin{aligned} |\Gamma_{I+2m}(\sigma)| \;&\ge\; (n-1)^{m}\,|\Gamma_I(\sigma)|. \end{aligned} \tag{9} $$ Consequently, $$ \begin{aligned} \frac{|\Gamma_I(\sigma)|}{|\Gamma_{I+2m}(\sigma)|} \;&\le\; (n-1)^{-m}. \end{aligned} \tag{10} $$
Proof.

Let $I=\mathrm{inv}(\sigma)$, so by Proposition 3 the geodesic length from $\sigma$ to the identity is $I$. Fix any geodesic path $\gamma \in \Gamma_I(\sigma)$ (equivalently, a reduced word for $\sigma$ in the adjacent generators).

We produce many longer, still-successful (endpoint-correct) paths by inserting backtracking pairs. For any generator $s_i$ we have $s_i^2=e$, so inserting the two-step subsequence $(s_i,s_i)$ anywhere in a word does not change its endpoint.

Write the slack as $L-I=2m$; by the parity observation, lengths of the wrong parity give $\Gamma_L(\sigma)=\varnothing$, so this is the only case of interest. Prepend $m$ backtracking pairs to $\gamma$: for each tuple $(i_1,\dots,i_m)\in\{1,\dots,n-1\}^m$, form the word $s_{i_1}s_{i_1}\cdots s_{i_m}s_{i_m}\,\gamma$. Each such word is a path of length $I+2m$ from $\sigma$ to the identity.

The map $(\gamma,(i_1,\dots,i_m))\mapsto s_{i_1}s_{i_1}\cdots s_{i_m}s_{i_m}\,\gamma$ is injective: the tuple can be read off the odd positions among the first $2m$ letters, and stripping those $2m$ letters recovers $\gamma$. Therefore $$ \begin{aligned} |\Gamma_{I+2m}(\sigma)| \;&\ge\; (n-1)^m\,|\Gamma_I(\sigma)|, \end{aligned} $$ which is the claimed exponential growth in the slack $m=(L-I)/2$; dividing gives the corresponding sparsity ratio $(n-1)^{-m}$. $\square$

Comment. The takeaway is robust: once you allow extra steps beyond the geodesic length $I$, the number of successful (endpoint-correct) paths grows exponentially in the slack. Thus geodesic-like behavior is “rare” in path space.
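Both the parity observation and the path-counting bound can be checked numerically. The dynamic program below (our own `count_paths`) counts length-$L$ adjacent-swap walks from $\sigma$ to the identity; for the reversal in $S_4$ (with $I=6$) it confirms that odd-slack counts vanish and that two extra steps multiply the count by at least $n-1=3$:

```python
from collections import defaultdict

def adjacent_swaps(p):
    out = []
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        out.append(tuple(q))
    return out

def count_paths(start, length, end):
    """Number of length-`length` adjacent-swap walks from `start` to `end`."""
    counts = {start: 1}
    for _ in range(length):
        nxt = defaultdict(int)
        for v, c in counts.items():
            for w in adjacent_swaps(v):
                nxt[w] += c
        counts = nxt
    return counts.get(end, 0)

n = 4
sigma = (4, 3, 2, 1)   # reversal: inv(sigma) = 6
ident = (1, 2, 3, 4)
I = 6
geodesics = count_paths(sigma, I, ident)   # reduced words for the longest element
assert geodesics > 0
assert count_paths(sigma, I + 1, ident) == 0                     # wrong parity: empty
assert count_paths(sigma, I + 2, ident) >= (n - 1) * geodesics   # Theorem 4
```

The geodesic count here is the number of reduced words for the longest element of $S_4$, known to be $16$.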

3.4. Connection to continuous contraction (gradient flow)

The permutohedron viewpoint becomes even sharper when lifted to continuous dynamics. In Landers (2025) [3], sorting is formulated as the gradient flow $\dot x = -\nabla V(x) = -(x - v_s)$ toward the sorted vertex $v_s=(1,2,\dots,n)$ under the potential $$ V(x)=\tfrac12\|x-v_s\|^2, \tag{11} $$ whose closed-form solution contracts exponentially: $$ \begin{aligned} \|x(t)-v_s\|^2 \;&=\; \|x(0)-v_s\|^2\, e^{-2t}. \end{aligned} \tag{12} $$ In the worst case (e.g. the reversal), $\|x(0)-v_s\|^2=\Theta(n^3)$, so reaching an $O(1)$ neighborhood takes time $t=\Theta(\log n)$. Interpreting one discrete comparison as affecting at most an $O(1/n)$ fraction of the global contraction leads to $T=\Theta(n\log n)$ discrete operations (comparisons), matching the classical lower bound while explaining it as a geometric time scale of contraction.
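A quick numerical check of (12), using the closed-form flow $x(t)=v_s+(x(0)-v_s)e^{-t}$ (the choice $n=8$ and the reversal start are ours): squared distance $1$ is reached at $t=\tfrac12\ln\|x(0)-v_s\|^2$, i.e. $\Theta(\log n)$ for the worst-case $\Theta(n^3)$ start.

```python
import math

n = 8
v_s = list(range(1, n + 1))        # sorted vertex (1, 2, ..., n)
x0 = v_s[::-1]                     # reversal: worst-case start
d0 = sum((a - b) ** 2 for a, b in zip(x0, v_s))
assert d0 == 168                   # sum of (n+1-2i)^2 = Theta(n^3)

# Closed-form solution of xdot = -(x - v_s): distances contract like e^{-t}.
t = 0.5 * math.log(d0)             # time at which the squared distance hits 1
x_t = [v + (x - v) * math.exp(-t) for x, v in zip(x0, v_s)]
d_t = sum((a - b) ** 2 for a, b in zip(x_t, v_s))
assert abs(d_t - 1.0) < 1e-9       # matches (12): d0 * e^{-2t} = 1
```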

Creativity lens (permutohedron geometry). Adjacent-swap sorting corresponds to long edge-walks (local motion), while efficient sorting corresponds to geodesic-like, globally contracting behavior: comparisons act as constraints that cut across the polytope rather than walking along edges. In this sense, $n\log n$ sorting is a low-probability “straightening” of the trajectory through permutation space.

4. Summary: a rigorous “rarity at compression” story

We can now state the slogan precisely in this domain:

$$ \begin{aligned} \textbf{Creativity in sorting} \;&=\; \underbrace{\text{correctness}}_{\text{structure}} \\ &\quad+\; \underbrace{\text{near-minimal comparisons / geodesic motion}}_{\text{compression}} \\ &\quad+\; \underbrace{\text{thin measure near the feasibility boundary}}_{\text{low probability}}. \end{aligned} \tag{13} $$

Theorem 1(a) fixes the boundary: $\log_2(n!)=\Theta(n\log n)$ comparisons are information-theoretically necessary. Corollary 2 proves that the set of correct solutions explodes once extra depth is allowed, so near-boundary solutions inhabit a thin shell. The permutohedron extension makes the same phenomenon geometric: geodesic-like paths are exponentially sparse among longer successful walks, while continuous gradient flow shows that $n\log n$ arises as the intrinsic contraction time scale of disorder.

References

[1] D. E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison–Wesley, 1973.

[2] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 4th ed. MIT Press, 2022.

[3] J. R. Landers, Sorting as Gradient Flow on the Permutohedron, 2025. (Author’s manuscript / arXiv preprint.)