Better Handling of Duplicates: 3-Way Quicksort

Consider sorting an array of $1000$ people by their birth month. There are only 12 possible months -- if these are uniformly distributed in our array, we can expect each month will show up many, many times in our array. We thus have an array with many duplicate values. Situations like this are common in real-world scenarios.

Suppose one uses quicksort to sort an array with many duplicate values. At some point the recursive calls are undoubtedly going to result in sub-arrays consisting entirely of identical elements. There is no need to sort such sub-arrays any further, of course -- but the typical quicksort algorithm is not able to recognize this is happening, and will mindlessly waste time recursively breaking things down into smaller and smaller sub-arrays.

Think about how partitioning works in the quicksort for a moment -- in particular, think about how elements equal to the pivot move during this process. In "ends-to-middle" partitioning, elements equal to the pivot can end up in both sub-arrays. In "left-to-right" partitioning, they always end up to the left of the pivot. However, over the course of a single partition there is no guarantee that elements equal to the pivot will end up adjacent to one another. (Take a minute and convince yourself of these 3 claims.).

Now think about the potential offered by modifying our partitioning technique to group array elements into three different groups: those less than the pivot, those equal to the pivot, and those greater than the pivot.

So for example, if there was some way to rearrange the array below with pivot $P$ $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{red}P & A & B & X & W & P & P & V & P & D & P & C & Y & Z\\\hline \end{array}$$ into something like the following, in the course of a single partition. $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{purple}C & \color{purple}D & \color{red}P & \color{red}P & \color{red}P & \color{red}P & \color{red}P & \color{green}V & \color{green}W & \color{green}Y & \color{green}Z & \color{green}X\\\hline \end{array}$$ If this hypothetical partitioning could also identify the positions where the group of elements equal to the pivot starts and stops, then we could recursively sort the sub-arrays of smaller and larger elements without doing any more unnecessary work related to the duplicate pivot values.

3-Way Partitioning

Thankfully, this type of partitioning -- called 3-way partitioning -- is indeed possible! Let's look at how to accomplish the above rearrangement by hand, before we discuss how to implement things in code:

Again, we start with an array in some random order -- identifying the left-most element as the pivot ($P$ in this case). Our intention is to let the single red $P$ above move incrementally to the right, and occasionally grow as more $P$'s are encountered.

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{red}P & A & B & X & W & P & P & V & P & D & P & C & Y & Z\\\hline \rightarrow & \leftarrow & & & & & & & & & & & & \\ \end{array}$$

Let us examine the letter positioned just to the right of our the red pivot $P$. Depending on whether this is less than, equal to, or greater than the pivot, we will move things appropriately. The first letter so found is $A$ -- this is less than the pivot and should consequently be to the left of all of the $P$'s. As such, we exchange it with the left-most $P$. Note how this moves the red $P$ incrementally to the right.

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{red}P & \color{black}B & \color{black}X & \color{black}W & \color{black}P & \color{black}P & \color{black}V & \color{black}P & \color{black}D & \color{black}P & \color{black}C & \color{black}Y & \color{black}Z \\\hline & \rightarrow & \leftarrow & & & & & & & & & & & \\ \end{array}$$ Looking again one right of the right-most red $P$, we find $B$ this time. Again this is less than the pivot, so we exchange it with the left-most $P$ to keep it left of all of the $P$'s to produce the following. Note again, how the $P$'s are slowly drifting to the right as more values are found less than this pivot value. $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{red}P & \color{black}X & \color{black}W & \color{black}P & \color{black}P & \color{black}V & \color{black}P & \color{black}D & \color{black}P & \color{black}C & \color{black}Y & \color{black}Z \\\hline & & & \rightarrow & & & & & & & & & & \leftarrow\\ \end{array}$$ This time when we look one right of the right-most red $P$, we find $X$, which is greater than the pivot. We know it must be to the right of all of the $P$'s, so we color it green and exchange it with the right-most unexamined (black) element. $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{red}P & \color{black}Z & \color{black}W & \color{black}P & \color{black}P & \color{black}V & \color{black}P & \color{black}D & \color{black}P & \color{black}C & \color{black}Y & \color{green}X \\\hline & & & \rightarrow & & & & & & & & & \leftarrow & \\ \end{array}$$ The same happens with $Z$, the next letter to the right of the right-most red $P$. It is greater than the pivot, so we color it green and exchange it with the rightmost black element, thus ensuring it is to the right of all of the $P$'s. This puts $Y$ to the right of the right-most red $P$, and so it gets colored green and "thrown right" as well, swapping positions with the $C$: $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{red}P & \color{black}Y & \color{black}W & \color{black}P & \color{black}P & \color{black}V & \color{black}P & \color{black}D & \color{black}P & \color{black}C & \color{green}Z & \color{green}X \\\hline & & & \rightarrow & & & & & & & & \leftarrow & & \\ \end{array}$$ $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{red}P & \color{black}C & \color{black}W & \color{black}P & \color{black}P & \color{black}V & \color{black}P & \color{black}D & \color{black}P & \color{green}Y & \color{green}Z & \color{green}X \\\hline & & \rightarrow & \leftarrow & & & & & & & & & & \\ \end{array}$$ $C$ is less than the pivot, so we swap it with the left-most red $P$ to ensure it is left of all of the $P$'s. Note again how the red $P$'s move incrementally to the right as a result. After that, $W$ is greater than the pivot, so we color it green and swap it with the last black letter (a $P$). $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{purple}C & \color{red}P & \color{black}W & \color{black}P & \color{black}P & \color{black}V & \color{black}P & \color{black}D & \color{black}P & \color{green}Y & \color{green}Z & \color{green}X \\\hline & & & & \rightarrow & & & & & & \leftarrow & & & \\ \end{array}$$ $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{purple}C & \color{red}P & \color{black}P & \color{black}P & \color{black}P & \color{black}V & \color{black}P & \color{black}D & \color{green}W & \color{green}Y & \color{green}Z & \color{green}X \\\hline & & & & * & & & & & & & & & \\ \end{array}$$ Finally, the letter to the right of the right-most red $P$ equals the pivot value! We want to keep the $P$ values together, so we preserve this adjacency by simply coloring this new $P$ red (no elements are swapped). This does mean that the our new $P$ is now the right-most red $P$, if only temporarily. Given that the next two letters past this new right-most red $P$ are also $P$'s, they each in turn become a newly-colored, right-most red $P$: $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{purple}C & \color{red}P & \color{red}P & \color{black}P & \color{black}P & \color{black}V & \color{black}P & \color{black}D & \color{green}W & \color{green}Y & \color{green}Z & \color{green}X \\\hline & & & & & * & & & & & & & & \\ \end{array}$$ $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{purple}C & \color{red}P & \color{red}P & \color{red}P & \color{black}P & \color{black}V & \color{black}P & \color{black}D & \color{green}W & \color{green}Y & \color{green}Z & \color{green}X \\\hline & & & & & & * & & & & & & & \\ \end{array}$$ $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{purple}C & \color{red}P & \color{red}P & \color{red}P & \color{red}P & \color{black}V & \color{black}P & \color{black}D & \color{green}W & \color{green}Y & \color{green}Z & \color{green}X \\\hline & & & & & & & \rightarrow & & \leftarrow & & & & \\ \end{array}$$ Finally, turning $V$ green and swapping it with the right-most black letter $D$, means that $D$ will then turn purple and be swapped with the left-most red $P$ to finally bring the group of red $P$'s adjacent to the last $P$. Turning this last $P$ red completes the partition. $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{purple}C & \color{red}P & \color{red}P & \color{red}P & \color{red}P & \color{black}D & \color{black}P & \color{green}V & \color{green}W & \color{green}Y & \color{green}Z & \color{green}X \\\hline & & & \rightarrow & & & & \leftarrow & & & & & & \\ \end{array}$$ $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{purple}C & \color{purple}D & \color{red}P & \color{red}P & \color{red}P & \color{red}P & \color{black}P & \color{green}V & \color{green}W & \color{green}Y & \color{green}Z & \color{green}X \\\hline & & & & & & & & * & & & & & \\ \end{array}$$ $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \color{purple}A & \color{purple}B & \color{purple}C & \color{purple}D & \color{red}P & \color{red}P & \color{red}P & \color{red}P & \color{red}P & \color{green}V & \color{green}W & \color{green}Y & \color{green}Z & \color{green}X \\\hline \end{array}$$