Another less obvious property we should worry about when sorting is the stability of the sort. Before we define what that means, consider the following example: Suppose we have an array of student records. Each record includes the student's last name, their first year of attendance, and other useful information (grades, contact information, etc.). Now suppose we wish to sort these to create alphabetical lists by last name for each year of attendance, in order. To do this, we first sort the records by last names. The below list gives a sample of how the records might now be organized. Note the order in which the 2019 students appeared (shown in yellow).
Last name | First year | Phone | $\cdots$ |
---|---|---|---|
Albert | 2018 | 231-343-5555 | $\cdots$ |
Creed | 2019 | 898-122-9643 | $\cdots$ |
Douglas | 2020 | 874-088-1212 | $\cdots$ |
Elmer | 2019 | 644-480-0023 | $\cdots$ |
Francis | 2019 | 762-555-3987 | $\cdots$ |
Harris | 2020 | 872-322-9768 | $\cdots$ |
King | 2019 | 766-093-9873 | $\cdots$ |
Smith | 2018 | 991-878-4944 | $\cdots$ |
Zhu | 2020 | 991-675-1001 | $\cdots$ |
Now, we sort the above list by the first year of attendance to obtain:
Last name | First year | Phone | $\cdots$ |
---|---|---|---|
Albert | 2018 | 231-343-5555 | $\cdots$ |
Smith | 2018 | 991-878-4944 | $\cdots$ |
Creed | 2019 | 898-122-9643 | $\cdots$ |
Elmer | 2019 | 644-480-0023 | $\cdots$ |
Francis | 2019 | 762-555-3987 | $\cdots$ |
King | 2019 | 766-093-9873 | $\cdots$ |
Douglas | 2020 | 874-088-1212 | $\cdots$ |
Harris | 2020 | 872-322-9768 | $\cdots$ |
Zhu | 2020 | 991-675-1001 | $\cdots$ |
Notice that the second sort in this case gave us exactly what we wanted -- students grouped by first year and alphabetized by last name. This worked out the way it did because the order of students with the same first year of attendance in our initial list was preserved when we sorted by first year of attendance. For example, in the first list, the 2019 students appear in the order "Creed, Elmer, Francis, King". After sorting by first year, this order was preserved. This motivates a definition -- when a sort preserves the relative order of elements of equal rank, we say that the sort is stable.
It could have gone another way though. What if the sorting by first year of attendance had produced the following?
Last name | First year | Phone | $\cdots$ |
---|---|---|---|
Smith | 2018 | 991-878-4944 | $\cdots$ |
Albert | 2018 | 231-343-5555 | $\cdots$ |
Francis | 2019 | 762-555-3987 | $\cdots$ |
Creed | 2019 | 898-122-9643 | $\cdots$ |
Elmer | 2019 | 644-480-0023 | $\cdots$ |
King | 2019 | 766-093-9873 | $\cdots$ |
Zhu | 2020 | 991-675-1001 | $\cdots$ |
Douglas | 2020 | 874-088-1212 | $\cdots$ |
Harris | 2020 | 872-322-9768 | $\cdots$ |
This time, while the 2019 students were initially in the order "Creed, Elmer, Francis, King", now they are in the order "Francis, Creed, Elmer, King". In other words, the sorting by first year undid some of the work the sorting by last name had accomplished. When a sorting method can change the relative order of elements of equal rank, we say the sort is not stable.
Clearly, when one wants to follow one sort with another, without undoing the work of the first sort, one should consider using a stable sort over an unstable one.
Wondering now which of the sorts we have examined are stable? Let's take a look:
Insertion sorts are stable. Think about it -- as we move smaller elements to the left by repeated swapping, they never move past any equal elements to their left.
Selection sorts are not stable. As one counter-example, consider the array $\fbox{$B_1|B_2|A$}$ (where $B_1 = B_2$). When this array is sorted with a selection sort, the result is $\fbox{$A|B_2|B_1$}$. Note that the $B$'s are now in reverse order.
Merge sorts are stable. Recall that during a merge sort, the only time elements move is when we are merging two sub-arrays together. As we consider elements (from left to right) in the two sub-arrays to be merged -- when the two elements compared are equal, we always move the element from the left sub-array to the merged group, never from the right-array. This preserves the relative order of equal elements. Here are the relevant lines in the code where this can be seen:
//merge... int i = lo; int j = mid + 1; for (int k = lo; k <= hi; k++) { if (i > mid) a[k] = aux[j++]; else if (j > hi) a[k] = aux[i++]; else if (less(aux[j],aux[i])) a[k] = aux[j++]; else a[k] = aux[i++]; //<-- when !less(aux[j],aux[i]), // aux[i] is copied, not aux[j] }
Quicksort is not stable. As one counter-example, consider the array $\fbox{$B|C_1|C_2|A$}$ (where $C_1 = C_2$). When this array is sorted with quicksort, the result is $\fbox{$A|B|C_2|C_1$}$. Note that the $C$'s are now in reverse order.