5  Heapsort

“Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.”
– Albert Einstein

In this chapter, we introduce yet another sorting algorithm, known as heap sort. Like merge sort, heap sort’s running time is O(n\lg{n}). However, heap sort is an in-place algorithm: in which only a constant number of array elements are stored outside the input array.

Space Complexity

Space complexity of an algorithm or a data structure is the amount of memory (equivalently array size) required to solve an instance of the computational problem as a function of the input size. It includes the size of the input and the auxiliary space required to solve the instance of the problem.

Like the running time of an algorithm, we will also express space complexity using big-O notation.

Recall that the insertion sort is in-place, but merge sort does not sort in place. Thus, heap sort combines the better properties of the two. The following table compares different sorting algorithm in terms of their complexity.

Sorting Algorithm Best-case Worst-case Extra Space
Selection O(n^2) O(n^2) O(1)
Bubble O(n^2) O(n^2) O(1)
Insertion O(n) O(n^2) O(1)
Merge O(n\lg{n}) O(n\lg{n}) \pmb{\Omega(n)}
Heap O(n\lg{n}) O(n\lg{n}) O(1)

5.1 (Binary) Heaps

The binary heap—or simply heap—is the very first data structure we formally introduce in this course. The heap is not really a new way of storing data, rather viewing an array as a special kind of tree whose each node corresponds to an element of the array. More specifically, the heap is a (nearly complete) binary tree.

We first define a binary tree.

Definition 5.1 (Binary Tree) A binary tree is a tree with:

  1. a designated root node usually drawn at the top level, and

  2. each node having a designated (possibly empty) LEFT child and a (possibly empty) RIGHT child. A node is called a leaf if it has no children, and an internal node otherwise.

Figure 5.1: In the Figure above, we see two binary trees: (a) the node 7 and 1 having only a left child, and (b) node 7 has a only right child. None of them, however, is a heap.

Exercise 5.1 In Figure 5.1, describe the root node, internal nodes and leaves.

Exercise 5.2 In Figure 5.1, are the two binary trees identical?

Definition 5.2 A heap is a nearly completely binary tree: filled on all levels except possibly the lowest, which is filled from the left up to a point.

Exercise 5.3 Which one(s) in Figure 5.1, is a heap? Can you turn them them into a heap?

As it turns out, the properties of a heap enables us to represent a heap by an array! The root of the heap is A[0], and all other elements of the heap are placed in the array sequentially as we traverse the tree first left-to-right on each level then top-to-bottom. This procedure establishes the equivalence of a heap and its representation as an array—we can always go back-and-forth. See the demo below.

Demo

Let us demonstrate how to create a heap (of certain heap_size) from an array and vice versa.

Exercise 5.4 Draw the heap corresponding to the array <0, 12, 5, 4, 7, 3, -9, 6, 13, 20, -4, -2, 5> for heap_size=10.

The height of a node in a heap measures how far up the node is above a leaf, that is, the number of edges between the node and a leaf.

Exercise 5.5 What’s the height of the node 13 in the above heap.

Implementing a Heap

We eventually implement the class Heap in Python, primarily with the attribute heap_size.

Exercise 5.6 Describe a good design strategy for implementing such a class.

Exercise 5.7 Start to implement the above strategy in Python.

Exercise 5.8 For a non-root node i in a heap, guess the index of its PARENT node.

Exercise 5.9 For a non-leaf node i in a heap, guess the index of its LEFT and RIGHT child.

Exercise 5.10 Add PARENT, LEFT, and RIGHT to the class Heap as lambda functions.

5.2 Max-Heaps

A heap without any special property is nearly useless. In order to make heaps useful in designing algorithms, we often impose a heap property.

In a max-heap A, the max-heap-property is that for each array index i: A[{\tt PARENT}(i)]\geq A[i], that is the value of a node is at most the value of its parent.

The other kind of heaps—as you might have already guessed—is the min-heap. The heapsort algorithm uses max-heaps.

Exercise 5.11 Is the heap corresponding to the array <33, 19, 20, 15, 13, 10, 21, 13, 16, 12> a max-heap for heap_size=5? Would you conclude the same if we had heap_size=7?

Exercise 5.12 Is an array sorted if the corresponding heap is a max-heap? Is the converse true?

Our goal is to develop patches of code to eventually turn a given input array into a max-heap.

MAX-HEAPIFY

You are probably wondering how we can create a max-heap by updating an input array. We start with a much simpler problem: turn the subtree of a given node i into a max-heap assuming that both children are already max-heaps.

\begin{algorithm} \caption{MaxHeapify} \begin{algorithmic} \Procedure{MaxHeapify}{$A[0:n-1], i$} \State{$l=\text{LEFT}(i)$} \State{$r=\text{RIGHT}(i)$} \If{$l < A$.heap\_size and $A[l]>A[i]$} \State{$largest=l$} \Else \State{$largest=i$} \EndIf \If{$r < A$.heap\_size and $A[r]>A[largest]$} \State{$largest=r$} \EndIf \If{$largest\neq i$} \State{exchange $A[i]$ and $A[largest]$} \State{\Call{MaxHeapify}{A$, largest$}} \EndIf \EndProcedure \end{algorithmic} \end{algorithm}

Demo

Exercise 5.13 Dry-run MAX-HEAPIFY on the heap <-12, 7, 8, 6, 4, 5, 0, 0, -7, -1, 0, 0, -1> with heap_size=10 for i=2.

Exercise 5.14 Implement MAX-HEAPIFY in Python inside class Heap.

Exercise 5.15 Write down the recursive equation for the running time of MAX-HEAPIFY.

Exercise 5.16 Analyze the running time of MAX-HEAPIFY using both a recursion tree and the Master Theorem.

BUILD-MAX-HEAP

We now turn any array into a max-heap by recursively invoking MAX-HEAPIFY. The procedure BUILD-MAX-HEAP processes the input array in a bottom-up manner.

We note in Homework 5 that all leaves of a heap with n elements have indices \lfloor\frac{n-1}{2}\rfloor+1,\ldots,n-1. And, they are all max-heaps as 1-element heaps.

  • The procedure starts with the last internal node i=\lfloor(n-1)/2\rfloor. Since its children are all max-heaps, the subtree rooted at i can be turned into a max-heap by calling MAX-HEAPIFY for i.

  • This way we max-heapify all subtrees rooted at that level and above by repeatedly decrementing i and calling MAX-HEAPIFY on it.

\begin{algorithm} \caption{BuildMaxHeap} \begin{algorithmic} \Procedure{BuildMaxHeap}{$A[0:n-1]$} \State{$A$.heap\_size = $n$} \For{$i=\lfloor\frac{n-1}{2}\rfloor$ downto $0$} \State{\Call{MaxHeapify}{$A, i$}} \EndFor \EndProcedure \end{algorithmic} \end{algorithm}

Exercise 5.17 Implement the BUILD-MAX-HEAP in class Heap.

Running-time

A rough upper bound is O(n\lg{n}).

However, it can be shown that the genuine upper bound for BUILD-MAX-HEAP is O(n). The idea is that the procedure MAX-HEAPIFY is called repeatedly, but on subtrees with various height.

In order to show desired upper bound, we assume, without any loss of generality, that n=2^k-1 for some k\geq1. In other words, the heap is a complete binary tree. We then note that:

  • the height h of a node in the heap can rage from 0 to k-1.

  • at any height h, there are at most 2^{k-1-h} nodes (Homework 5).

  • if h is the height of a subtree, calling MAX-HEAPIFY on its root takes O(h) or ch time for some constants c.

Then the total running time T(n) of BUILD-MAX-HEAP can be written as

\begin{align*} T(n) &=\sum_{h=0}^{k-1}2^{k-1-h}ch \\ &=c2^{k-1}\sum_{h=0}^{k-1}\frac{1}{2^h}h \\ &\leq c2^{k-1}\sum_{h=0}^{\infty}\frac{h}{2^h} \\ &=c2^{k-1}\frac{1/2}{(1-1/2)^2}. \end{align*} This is due to the infinite series Equation A.4.

So, T(n)\leq c2^k=cn. Therefore, a better upper-bound for BUILD-MAX-HEAP is O(n).

5.3 Heapsort

The heapsort procedure HEAPSORT works as follows:

  • it starts by calling BUILD-MAX-HEAP to build a max-heap on the input array A[0:n-1].

  • Since the maximum element of the array is stored at the root A[0], HEAPSORT can place it into its correct final position by swapping it with A[n-1].

  • the procedure then discards node n-1 from the heap by simply decrementing A.heap\_size.

  • By doing so, the children of the root remain max-heaps, but the new root element might violate the max-heap property. To restore the max-heap property, the procedure just calls MAX-HEAPIFY(A, 0), which leaves a max-heap in A[0:n-2].

  • The HEAPSORT procedure then repeats this process for the max-heap of size n-1 down to a heap of size 1.

\begin{algorithm} \caption{HeapSort} \begin{algorithmic} \Procedure{HeapSort}{$A[0:n-1]$} \State{\Call{BuildMaxHeap}{$A$}} \For{$i=n-1$ downto $1$} \State{exchange $A[0]$ with $A[i]$} \State{$A$.heap\_size = $A$.heap\_size$ - 1$} \State{\Call{MaxHeapify}{$A, 0$}} \EndFor \EndProcedure \end{algorithmic} \end{algorithm}

Demo

Exercise 5.18 Analyze the running time of HEAP-SORT.

5.4 Board Pictures