viewof n = Inputs.range([0, 36], {
value: 10,
step: 1,
label: tex`n`
});
A = d3.range(n).map((d) => d3.randomInt(-10, 10)());
{
const container = d3
.create("div")
.attr("width", width)
.style("min-height", '100px')
.style("margin-top", '30px')
container
.selectAll(".element")
.data(A)
.join("div")
.attr("class", "element")
.style("border", "3px solid lightgray")
.style("border-style", (d, id) => id < size ? "solid" : "dashed")
.style("width", "35px")
.style("height", "35px")
.style("padding", "1px")
.style("text-align", "center")
.style("display", "inline-block")
.style("font-size", "20px")
.style("line-height", "23px")
.text((d, id ) => d )
.append("div")
.style("color", "pink")
.style("position", "absolute")
.style("transform", "translate(50%, 40%)")
.text((d, id) => id < size ? id : "")
return container.node();
}
5 Heapsort
“Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.”
– Albert Einstein
In this chapter, we introduce yet another sorting algorithm, known as heap sort. Like merge sort, heap sort’s running time is O(n\lg{n}). However, heap sort is an in-place algorithm: in which only a constant number of array elements are stored outside the input array.
Space Complexity
Space complexity of an algorithm or a data structure is the amount of memory (equivalently array size) required to solve an instance of the computational problem as a function of the input size. It includes the size of the input and the auxiliary space required to solve the instance of the problem.
Like the running time of an algorithm, we will also express space complexity using big-O notation.
Recall that the insertion sort is in-place, but merge sort does not sort in place. Thus, heap sort combines the better properties of the two. The following table compares different sorting algorithm in terms of their complexity.
Sorting Algorithm | Best-case | Worst-case | Extra Space |
---|---|---|---|
Selection | O(n^2) | O(n^2) | O(1) |
Bubble | O(n^2) | O(n^2) | O(1) |
Insertion | O(n) | O(n^2) | O(1) |
Merge | O(n\lg{n}) | O(n\lg{n}) | \pmb{\Omega(n)} |
Heap | O(n\lg{n}) | O(n\lg{n}) | O(1) |
5.1 (Binary) Heaps
The binary heap—or simply heap—is the very first data structure we formally introduce in this course. The heap is not really a new way of storing data, rather viewing an array as a special kind of tree whose each node corresponds to an element of the array. More specifically, the heap is a (nearly complete) binary tree.
We first define a binary tree.
Definition 5.1 (Binary Tree) A binary tree is a tree with:
a designated root node usually drawn at the top level, and
each node having a designated (possibly empty)
LEFT
child and a (possibly empty)RIGHT
child. A node is called a leaf if it has no children, and an internal node otherwise.
Exercise 5.1 In Figure 5.1, describe the root node, internal nodes and leaves.
Solution
In both (a) and (b), 3 is the root node. The internal nodes are 3, 2, 7, 4, and 1. The leaves are 5 and 6.
Exercise 5.2 In Figure 5.1, are the two binary trees identical?
Solution
No, they are not identical as binary trees. For example, in (a), the node 7 has a left child (namely 5) but no right child. Whereas in (b), the node 7 has no left child but a right child (namely 5).
Definition 5.2 A heap is a nearly completely binary tree: filled on all levels except possibly the lowest, which is filled from the left up to a point.
Exercise 5.3 Which one(s) in Figure 5.1, is a heap? Can you turn them them into a heap?
Solution
None.
As it turns out, the properties of a heap enables us to represent a heap by an array! The root of the heap is A[0], and all other elements of the heap are placed in the array sequentially as we traverse the tree first left-to-right on each level then top-to-bottom. This procedure establishes the equivalence of a heap and its representation as an array—we can always go back-and-forth. See the demo below.
Demo
Let us demonstrate how to create a heap (of certain heap_size
) from an array and vice versa.
Exercise 5.4 Draw the heap corresponding to the array <0, 12, 5, 4, 7, 3, -9, 6, 13, 20, -4, -2, 5>
for heap_size=10
.
Solution
The height of a node in a heap measures how far up the node is above a leaf, that is, the number of edges between the node and a leaf.
Exercise 5.5 What’s the height of the node 13 in the above heap.
Solution
3.
Implementing a Heap
We eventually implement the class Heap
in Python, primarily with the attribute heap_size
.
Exercise 5.6 Describe a good design strategy for implementing such a class.
Solution
Since a heap is basically an array that is visualized differently, extending or sub-classing the list
class in Python seems to be a good option. The heap_size
attribute can potentially be different from the len
of the backing array. So, we just add a new attribute called heap_size
.
Exercise 5.7 Start to implement the above strategy in Python.
Solution
Exercise 5.8 For a non-root node i in a heap, guess the index of its PARENT
node.
Solution
\begin{algorithm} \caption{PARENT} \begin{algorithmic} \Procedure{Parent}{$i$} \Return{$\left\lfloor\frac{i-1}{2}\right\rfloor$} \EndProcedure \end{algorithmic} \end{algorithm}
Exercise 5.9 For a non-leaf node i in a heap, guess the index of its LEFT
and RIGHT
child.
Solution
\begin{algorithm} \caption{LEFT} \begin{algorithmic} \Procedure{Left}{$i$} \Return{$2 * i + 1$} \EndProcedure \end{algorithmic} \end{algorithm}
\begin{algorithm} \caption{RIGHT} \begin{algorithmic} \Procedure{Right}{$i$} \Return{$2 * i + 2$} \EndProcedure \end{algorithmic} \end{algorithm}
Exercise 5.10 Add PARENT
, LEFT
, and RIGHT
to the class Heap
as lambda functions.
5.2 Max-Heaps
A heap without any special property is nearly useless. In order to make heaps useful in designing algorithms, we often impose a heap property.
In a max-heap A, the max-heap-property is that for each array index i: A[{\tt PARENT}(i)]\geq A[i], that is the value of a node is at most the value of its parent.
The other kind of heaps—as you might have already guessed—is the min-heap. The heapsort algorithm uses max-heaps.
Exercise 5.11 Is the heap corresponding to the array <33, 19, 20, 15, 13, 10, 21, 13, 16, 12>
a max-heap for heap_size=5
? Would you conclude the same if we had heap_size=7
?
Solution
- Yes.
- No
Exercise 5.12 Is an array sorted if the corresponding heap is a max-heap? Is the converse true?
Solution
No; try the max-heap [12, 6, 7, 5, 4].
Yes if sorted in decreasing order.
Our goal is to develop patches of code to eventually turn a given input array into a max-heap.
MAX-HEAPIFY
You are probably wondering how we can create a max-heap by updating an input array. We start with a much simpler problem: turn the subtree of a given node i into a max-heap assuming that both children are already max-heaps.
\begin{algorithm} \caption{MaxHeapify} \begin{algorithmic} \Procedure{MaxHeapify}{$A[0:n-1], i$} \State{$l=\text{LEFT}(i)$} \State{$r=\text{RIGHT}(i)$} \If{$l < A$.heap\_size and $A[l]>A[i]$} \State{$largest=l$} \Else \State{$largest=i$} \EndIf \If{$r < A$.heap\_size and $A[r]>A[largest]$} \State{$largest=r$} \EndIf \If{$largest\neq i$} \State{exchange $A[i]$ and $A[largest]$} \State{\Call{MaxHeapify}{A$, largest$}} \EndIf \EndProcedure \end{algorithmic} \end{algorithm}
Demo
Exercise 5.13 Dry-run MAX-HEAPIFY
on the heap <-12, 7, 8, 6, 4, 5, 0, 0, -7, -1, 0, 0, -1>
with heap_size=10
for i=2.
Exercise 5.14 Implement MAX-HEAPIFY
in Python inside class Heap
.
Solution
class Heap(list):
def __init__(self, arr = []):
super().__init__(arr)
self.heap_size = len(self)
PARENT = lambda i : -1 if i == 0 else (i - 1) // 2 # index of parent
LEFT = lambda i : 2 * i + 1 # index of left child
RIGHT = lambda i : 2 * i + 2 # index of right child
def max_heapify(self, i = 0):
l, r = Heap.LEFT(i), Heap.RIGHT(i)
if l < self.heap_size and self[l] > self[i]:
largest = l
else:
largest = i
if r < self.heap_size and self[r] > self[largest]:
largest = r
if largest != i:
self[i], self[largest] = self[largest], self[i]
self.max_heapify(largest)
Exercise 5.15 Write down the recursive equation for the running time of MAX-HEAPIFY
.
Solution
Let T(n) denote the worst-case running time of MAX-HEAPIFY
, where the the subtree rooted at index i has at most n nodes.
Divide: Before the recursive call (line #14
), we compute LEFT
, RIGHT
, and largest
. Each of them takes a constant amount of time. So, D(n)=c_1. Here the base case is when n=1.
Conquer: We recursively consider the subtree rooted at the child index largest
. The subtree under consideration has at most 2n/3 nodes (Homework 5).
Combine: C(n)=c_2.
So, the running time T(n)=(c_1+c_2) + T(2n/3)=c + T(2n/3).
Exercise 5.16 Analyze the running time of MAX-HEAPIFY
using both a recursion tree and the Master Theorem.
Solution
Using the Master theorem, we get T(n)=O(\lg{n}).
BUILD-MAX-HEAP
We now turn any array into a max-heap by recursively invoking MAX-HEAPIFY
. The procedure BUILD-MAX-HEAP
processes the input array in a bottom-up manner.
We note in Homework 5 that all leaves of a heap with n elements have indices \lfloor\frac{n-1}{2}\rfloor+1,\ldots,n-1. And, they are all max-heaps as 1-element heaps.
The procedure starts with the last internal node i=\lfloor(n-1)/2\rfloor. Since its children are all max-heaps, the subtree rooted at i can be turned into a max-heap by calling
MAX-HEAPIFY
for i.This way we max-heapify all subtrees rooted at that level and above by repeatedly decrementing i and calling
MAX-HEAPIFY
on it.
\begin{algorithm} \caption{BuildMaxHeap} \begin{algorithmic} \Procedure{BuildMaxHeap}{$A[0:n-1]$} \State{$A$.heap\_size = $n$} \For{$i=\lfloor\frac{n-1}{2}\rfloor$ downto $0$} \State{\Call{MaxHeapify}{$A, i$}} \EndFor \EndProcedure \end{algorithmic} \end{algorithm}
Exercise 5.17 Implement the BUILD-MAX-HEAP
in class Heap
.
Solution
import math
class Heap(list):
def __init__(self, arr = [], heap_size = math.inf):
super().__init__(arr)
self.heap_size = min(heap_size, len(self))
PARENT = lambda i : (i - 1) // 2 # index of parent
LEFT = lambda i : 2 * i + 1 # index of left child
RIGHT = lambda i : 2 * i + 2 # index of right child
@staticmethod
def max_heapify(h, i = 0):
l, r = Heap.LEFT(i), Heap.RIGHT(i)
if l < h.heap_size and h[l] > h[i]:
largest = l
else:
largest = i
if r < h.heap_size and h[r] > h[largest]:
largest = r
if largest != i:
h[i], h[largest] = h[largest], h[i]
h.max_heapify(h, largest)
@staticmethod
def build_max_heap(arr):
h = Heap(arr) # this sets heap_size = len(arr)
n = h.heap_size
for i in range(n // 2 - 1, 0, -1):
h.max_heapify(h, i)
Running-time
A rough upper bound is O(n\lg{n}).
However, it can be shown that the genuine upper bound for BUILD-MAX-HEAP
is O(n). The idea is that the procedure MAX-HEAPIFY
is called repeatedly, but on subtrees with various height.
In order to show desired upper bound, we assume, without any loss of generality, that n=2^k-1 for some k\geq1. In other words, the heap is a complete binary tree. We then note that:
the height h of a node in the heap can rage from 0 to k-1.
at any height h, there are at most 2^{k-1-h} nodes (Homework 5).
if h is the height of a subtree, calling
MAX-HEAPIFY
on its root takes O(h) or ch time for some constants c.
Then the total running time T(n) of BUILD-MAX-HEAP
can be written as
\begin{align*} T(n) &=\sum_{h=0}^{k-1}2^{k-1-h}ch \\ &=c2^{k-1}\sum_{h=0}^{k-1}\frac{1}{2^h}h \\ &\leq c2^{k-1}\sum_{h=0}^{\infty}\frac{h}{2^h} \\ &=c2^{k-1}\frac{1/2}{(1-1/2)^2}. \end{align*} This is due to the infinite series Equation A.4.
So, T(n)\leq c2^k=cn. Therefore, a better upper-bound for BUILD-MAX-HEAP
is O(n).
5.3 Heapsort
The heapsort procedure HEAPSORT
works as follows:
it starts by calling
BUILD-MAX-HEAP
to build a max-heap on the input array A[0:n-1].Since the maximum element of the array is stored at the root A[0],
HEAPSORT
can place it into its correct final position by swapping it with A[n-1].the procedure then discards node n-1 from the heap by simply decrementing A.heap\_size.
By doing so, the children of the root remain max-heaps, but the new root element might violate the max-heap property. To restore the max-heap property, the procedure just calls
MAX-HEAPIFY(A, 0)
, which leaves a max-heap in A[0:n-2].The
HEAPSORT
procedure then repeats this process for the max-heap of size n-1 down to a heap of size 1.
\begin{algorithm} \caption{HeapSort} \begin{algorithmic} \Procedure{HeapSort}{$A[0:n-1]$} \State{\Call{BuildMaxHeap}{$A$}} \For{$i=n-1$ downto $1$} \State{exchange $A[0]$ with $A[i]$} \State{$A$.heap\_size = $A$.heap\_size$ - 1$} \State{\Call{MaxHeapify}{$A, 0$}} \EndFor \EndProcedure \end{algorithmic} \end{algorithm}
Demo
Exercise 5.18 Analyze the running time of HEAP-SORT
.
Solution
T(n)=O(n\lg{n}).
5.4 Board Pictures
Download the full Heap
class here: heap.py.