  Heapsort

Recall, in a heap-implemented priority queue, one can insert keys in any order and pull them out in either descending order (if a max heap is used) or ascending order (if a min heap is used). Further, the keys internally are stored in an array. Doesn't this feel like this could lead to a way to a heap-inspired way to sort an array? If you said "yes", you are absolutely correct!

As a first pass, one might attempt something as straight-forward as the following:

import java.util.Arrays;

public class FirstAttempt {

public static <Item extends Comparable<Item>> void sort(Item[] a) {
int n = a.length;
MaxHeap<Item> heap = new MaxHeap<Item>();

for (int i = 0; i < n; i++)
heap.insert(a[i]);

for (int i = n-1; i >= 0; i--)
a[i] = heap.removeMax();
}

public static void main(String[] args) {
Character[] a = {'f','e','a','c','g','i','h','d','b'};
sort(a);
System.out.println(Arrays.toString(a));
}
}

For each element inserted, the comparison cost should be $O(\ln n)$, given the maximum depth through which an inserted item may "swim". So for inserting $n$ items, we have a comparison cost of $O(n\ln n)$.

However, there is one annoying issue with the above. The items to sort are in one array, but they are moved to another array (the one inside heap) as part of the sorting process. Consequently, the above sorting method is not "in-place". Is there, we naturally then wonder, an in-place version of this sort? Again, the answer is "yes"!

The trick is to put the array to be sorted itself in heap order. Then, similar to a heap-implemented priority queue, we can repeatedly remove its maximum elements to put the array elements in increasing order.

The key to doing this efficiently is to recognize that any complete tree whose root's left and right subtrees can be considered heaps can itself be put into heap order by attempting to sink its root node. (The attempt may not result in any change, of course, if the root is already larger than its children.) We focus initially on the collection of (necessarily complete) subtrees rooted at the second-to-last level. This collection includes only subtrees of height $0$ (i.e., single nodes -- which can be interpreted as heaps, since they are trivially both complete and in heap order) and subtrees of height $1$. The subtrees of height $1$ may not be in heap order -- but appealing to the previous observation, they can be turned into heaps by attempting to sink their root nodes.

We thus attempt to sink each subtree's root, thus ensuring all of the subtrees rooted at the second-to-last level can individually be thought of as small heaps.

Then, we turn our attention to the subtrees rooted just above these. They too can be turned into heaps by just attempting to sink their respective root elements.

In this way, we progress level by level -- making larger and larger subtree heaps until we reach the root of the overall tree. Attempting to sink this final root puts the entire tree in heap order -- making it ready for successive removals of its largest elements to complete the sorting of the array.

Consider the following example: Suppose we wish to sort the following array of letters: $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11\\\hline - & S & O & R & T & E & X & A & M & P & L & E\\\hline \end{array}$$ Starting with the the second-to-last level, we consider the subtrees rooted at $T$, $E$, $X$, and $A$. Noting that $A$ and $X$ have no children, they are one-node subtrees, and trivially heaps. However, the subtree rooted at $E$ is not currently in heap order as $E < L$. However, sinking $E$ rectifies the situation, which we accomplish with a single exchange with its greatest child $L \gt E$: In doing this and all of the exchanges to come, remember that given some item at index $n$, its left child is at index $2n$ and its right child is at index $2n+1$. (So here, we look at indices $5$ and $2\cdot5=10$.) $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11\\\hline - & S & O & R & T & \color{red}L & X & A & M & P & \color{red}E & E\\\hline \end{array}$$ The subtree rooted at $T$ is already in heap order as $T$ is greater than both its children, so the attempt to sink $T$ doesn't change anything. Now we move up a level, and deal with the subtrees rooted at $O$ and $R$. Attempting to sink $R$ exchanges it with its largest child $X$, which is greater than $R$. $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11\\\hline - & S & O & \color{red}X & T & L & \color{red}R & A & M & P & E & E\\\hline \end{array}$$ $O$ sinks deeper, first exchanging places with its largest child $T \gt O$, and then again with the largest child of that child, $P$, which is also greater than $O$. $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11\\\hline - & S & \color{red}T & X & \color{red}O & L & R & A & M & P & E & E\\\hline \end{array}$$ $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11\\\hline - & S & T & X & \color{red}P & L & R & A & M & \color{red}O & E & E\\\hline \end{array}$$ Moving up a level, we find ourselves at the root. Sinking $S$, we exchange it with its largest child $X \gt S$. $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11\\\hline - & \color{red}X & T & \color{red}S & P & L & R & A & M & O & E & E\\\hline \end{array}$$ This puts the entire array in heap order, leaving us ready to repeatedly remove the largest element from this heap, in a manner similar to dequeuement in a heap-implemented priority queue, to complete the sorting process.

Despite the complex logic behind this "bottom-up heapification" of our array, the implementation of this part of the sorting algorithm is almost trivial. Remember, all we are doing is attempting to sink every node starting at the second to last level and working our way up.

Given that the order in which we do this to the nodes in any one level doesn't matter, we choose a convenient one: Noting that when a tree has $n$ nodes, the right-most node on the second to last level is at index $\lfloor n \rfloor$, we start with this index and decrement it by one until we get to $1$. This takes us through all of the indices for the nodes on the second-to-last level, then through all of the nodes above that, then to the nodes above that, and so on.. until we end at the root node.

With this plan for the sequencing of how we address the nodes, we can now accomplish the aforementioned "heapification" with a simple for-loop:

for (int k = n / 2; k >= 1; k--) {
sink(k, n);
}

All that is left to do is the repeated removal of the maximum elements in the heap that results. Of course, we don't actually want to remove these elements from the array -- just the heap. As such, we plan to move these "removed" items to the end of the array.

This requires us to modify our sink() method slightly. In addition to supplying the index of the item to sink, we now need to also provide a parameter $n$ that describes the heap size. We limit the "sinking" to the first $n$ elements in the array, leaving the space to the right of these $n$ elements as a place to store the items removed from the heap. With every exchange of the root with the last element (the first step in dequeuement), we decrease the value of n passed to sink(). In this way, the elements removed from the heap never actually leave the array.

There are a couple of other minor modifications we'll want to make as well:

• We should add generic type support, as always.

• In consistency with the other sorting methods we have implemented, it will be nice to support our Sorter interface.

• As much out of courtesy to the client, as in an effort to hide implementation details, we need to realize that the arrays a client will likely want to sort won't always be null at index $0$. (Remember, this was a trick we pulled to make finding indices of parents and children easy to calculate.) Consequently, we need to modify our methods for comparing and exchanging items in the array to account for the use of indices from $0$ to $(n-1)$ instead of from $1$ to $n$. The additional calculations required to do this add a tiny, but negligible running time cost -- but are well worth their inclusion.

Here's a full heapsort implementation that incorporates all of the above:

public class HeapSorter<Item extends Comparable<Item>> implements Sorter<Item> {

Item[] a;

private boolean less(int v, int w) {
return (a[v-1]).compareTo(a[w-1]) < 0;
// the "one off" here accommodates sorting array elements from 0 to length - 1,
// while heap indices are normally 1 to length to make finding parent
// and children indices easy
}

private void exch(int i, int j) { // we see a "one off" modified version of exch()
Item tmp = a[i-1];             // for the same reason as above
a[i-1] = a[j-1];
a[j-1] = tmp;
}

// Importantly, the heap we use here is one where LARGER elements are higher up in the heap!
// So smaller things will sink to the bottom...

private void sink(int k, int n) {    // sinks element in position k in heap
// formed by only the first n elements in array
// (i.e., it ignores elements to the right
// of the first n elements)

while (2*k <= n) {                // while there is a left child in this n-sized heap
int j = 2*k;                  // let j be the left child
if ((j < n) && less(j,j+1))   // if there is a right child BIGGER than left child
j++;                      // make j the right child, so j is always the LARGEST child
if (less(j, k))               // if parent is bigger than biggest child, stop sinking
break;
exch(k, j);                   // otherwise, exchange sinking parent and biggest child
k = j;                        // reset k to be the new (now lower) parent and do it
}                                 // all over again...
}

public void sort(Item[] a) {
this.a = a;
int n = a.length;

for (int k = n / 2; k >= 1; k--) { // sink all elements in 2nd to bottom level.
sink(k, n);                    // then sink all elements in 3rd to bottom level.
}                                  // continue in this way until one reaches the top level.
// this puts array elements in heap order quickly

while (n > 1) {                    // sort from right to left
exch(1, n--);                  // i.e., pull max off top, exchange to end
sink(1, n);                    // then leave end untouched and do the same
}                                  // to smaller list. repeat until done.
}
}