viewof n = Inputs.range([0, 36], {
value: 10,
step: 1,
label: tex`n`
});
A = d3.range(n).map((d) => d3.randomInt(-10, 10)());
{
const container = d3
.create("div")
.attr("width", width)
.style("min-height", '100px')
.style("margin-top", '30px')
container
.selectAll(".element")
.data(A)
.join("div")
.attr("class", "element")
.style("border", "3px solid lightgray")
.style("border-style", (d, id) => id < size ? "solid" : "dashed")
.style("width", "35px")
.style("height", "35px")
.style("padding", "1px")
.style("text-align", "center")
.style("display", "inline-block")
.style("font-size", "20px")
.style("line-height", "23px")
.text((d, id ) => d )
.append("div")
.style("color", "pink")
.style("position", "absolute")
.style("transform", "translate(50%, 40%)")
.text((d, id) => id < size ? id : "")
return container.node();
}
5 Heapsort
“Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.”
– Albert Einstein
In this chapter, we introduce yet another sorting algorithm, known as heap sort. Like merge sort, heap sort’s running time is O(n\lg{n}). However, heap sort is an in-place algorithm: in which only a constant number of array elements are stored outside the input array.
Space Complexity
Space complexity of an algorithm or a data structure is the amount of memory (equivalently array size) required to solve an instance of the computational problem as a function of the input size. It includes the size of the input and the auxiliary space required to solve the instance of the problem.
Like the running time of an algorithm, we will also express space complexity using big-O notation.
Recall that the insertion sort is in-place, but merge sort does not sort in place. Thus, heap sort combines the better properties of the two. The following table compares different sorting algorithm in terms of their complexity.
Sorting Algorithm | Best-case | Worst-case | Extra Space |
---|---|---|---|
Selection | O(n^2) | O(n^2) | O(1) |
Bubble | O(n^2) | O(n^2) | O(1) |
Insertion | O(n) | O(n^2) | O(1) |
Merge | O(n\lg{n}) | O(n\lg{n}) | \pmb{\Omega(n)} |
Heap | O(n\lg{n}) | O(n\lg{n}) | O(1) |
5.1 (Binary) Heaps
The binary heap—or simply heap—is the very first data structure we formally introduce in this course. The heap is not really a new way of storing data, rather viewing an array as a special kind of tree whose each node corresponds to an element of the array. More specifically, the heap is a (nearly complete) binary tree.
We first define a binary tree.
Definition 5.1 (Binary Tree) A binary tree is a tree with:
a designated root node usually drawn at the top level, and
each node having a designated (possibly empty)
LEFT
child and a (possibly empty)RIGHT
child. A node is called a leaf if it has no children, and an internal node otherwise.
Exercise 5.1 In Figure 5.1, describe the root node, internal nodes and leaves.
Exercise 5.2 In Figure 5.1, are the two binary trees identical?
Definition 5.2 A heap is a nearly completely binary tree: filled on all levels except possibly the lowest, which is filled from the left up to a point.
Exercise 5.3 Which one(s) in Figure 5.1, is a heap? Can you turn them them into a heap?
As it turns out, the properties of a heap enables us to represent a heap by an array! The root of the heap is A[0], and all other elements of the heap are placed in the array sequentially as we traverse the tree first left-to-right on each level then top-to-bottom. This procedure establishes the equivalence of a heap and its representation as an array—we can always go back-and-forth. See the demo below.
Demo
Let us demonstrate how to create a heap (of certain heap_size
) from an array and vice versa.
Exercise 5.4 Draw the heap corresponding to the array <0, 12, 5, 4, 7, 3, -9, 6, 13, 20, -4, -2, 5>
for heap_size=10
.
The height of a node in a heap measures how far up the node is above a leaf, that is, the number of edges between the node and a leaf.
Exercise 5.5 What’s the height of the node 13 in the above heap.
Implementing a Heap
We eventually implement the class Heap
in Python, primarily with the attribute heap_size
.
Exercise 5.6 Describe a good design strategy for implementing such a class.
Exercise 5.7 Start to implement the above strategy in Python.
Exercise 5.8 For a non-root node i in a heap, guess the index of its PARENT
node.
Exercise 5.9 For a non-leaf node i in a heap, guess the index of its LEFT
and RIGHT
child.
Exercise 5.10 Add PARENT
, LEFT
, and RIGHT
to the class Heap
as lambda functions.
5.2 Max-Heaps
A heap without any special property is nearly useless. In order to make heaps useful in designing algorithms, we often impose a heap property.
In a max-heap A, the max-heap-property is that for each array index i: A[{\tt PARENT}(i)]\geq A[i], that is the value of a node is at most the value of its parent.
The other kind of heaps—as you might have already guessed—is the min-heap. The heapsort algorithm uses max-heaps.
Exercise 5.11 Is the heap corresponding to the array <33, 19, 20, 15, 13, 10, 21, 13, 16, 12>
a max-heap for heap_size=5
? Would you conclude the same if we had heap_size=7
?
Exercise 5.12 Is an array sorted if the corresponding heap is a max-heap? Is the converse true?
Our goal is to develop patches of code to eventually turn a given input array into a max-heap.
MAX-HEAPIFY
You are probably wondering how we can create a max-heap by updating an input array. We start with a much simpler problem: turn the subtree of a given node i into a max-heap assuming that both children are already max-heaps.
\begin{algorithm} \caption{MaxHeapify} \begin{algorithmic} \Procedure{MaxHeapify}{$A[0:n-1], i$} \State{$l=\text{LEFT}(i)$} \State{$r=\text{RIGHT}(i)$} \If{$l < A$.heap\_size and $A[l]>A[i]$} \State{$largest=l$} \Else \State{$largest=i$} \EndIf \If{$r < A$.heap\_size and $A[r]>A[largest]$} \State{$largest=r$} \EndIf \If{$largest\neq i$} \State{exchange $A[i]$ and $A[largest]$} \State{\Call{MaxHeapify}{A$, largest$}} \EndIf \EndProcedure \end{algorithmic} \end{algorithm}
Demo
Exercise 5.13 Dry-run MAX-HEAPIFY
on the heap <-12, 7, 8, 6, 4, 5, 0, 0, -7, -1, 0, 0, -1>
with heap_size=10
for i=2.
Exercise 5.14 Implement MAX-HEAPIFY
in Python inside class Heap
.
Exercise 5.15 Write down the recursive equation for the running time of MAX-HEAPIFY
.
Exercise 5.16 Analyze the running time of MAX-HEAPIFY
using both a recursion tree and the Master Theorem.
BUILD-MAX-HEAP
We now turn any array into a max-heap by recursively invoking MAX-HEAPIFY
. The procedure BUILD-MAX-HEAP
processes the input array in a bottom-up manner.
We note in Homework 5 that all leaves of a heap with n elements have indices \lfloor\frac{n-1}{2}\rfloor+1,\ldots,n-1. And, they are all max-heaps as 1-element heaps.
The procedure starts with the last internal node i=\lfloor(n-1)/2\rfloor. Since its children are all max-heaps, the subtree rooted at i can be turned into a max-heap by calling
MAX-HEAPIFY
for i.This way we max-heapify all subtrees rooted at that level and above by repeatedly decrementing i and calling
MAX-HEAPIFY
on it.
\begin{algorithm} \caption{BuildMaxHeap} \begin{algorithmic} \Procedure{BuildMaxHeap}{$A[0:n-1]$} \State{$A$.heap\_size = $n$} \For{$i=\lfloor\frac{n-1}{2}\rfloor$ downto $0$} \State{\Call{MaxHeapify}{$A, i$}} \EndFor \EndProcedure \end{algorithmic} \end{algorithm}
Exercise 5.17 Implement the BUILD-MAX-HEAP
in class Heap
.
Running-time
A rough upper bound is O(n\lg{n}).
However, it can be shown that the genuine upper bound for BUILD-MAX-HEAP
is O(n). The idea is that the procedure MAX-HEAPIFY
is called repeatedly, but on subtrees with various height.
In order to show desired upper bound, we assume, without any loss of generality, that n=2^k-1 for some k\geq1. In other words, the heap is a complete binary tree. We then note that:
the height h of a node in the heap can rage from 0 to k-1.
at any height h, there are at most 2^{k-1-h} nodes (Homework 5).
if h is the height of a subtree, calling
MAX-HEAPIFY
on its root takes O(h) or ch time for some constants c.
Then the total running time T(n) of BUILD-MAX-HEAP
can be written as
\begin{align*} T(n) &=\sum_{h=0}^{k-1}2^{k-1-h}ch \\ &=c2^{k-1}\sum_{h=0}^{k-1}\frac{1}{2^h}h \\ &\leq c2^{k-1}\sum_{h=0}^{\infty}\frac{h}{2^h} \\ &=c2^{k-1}\frac{1/2}{(1-1/2)^2}. \end{align*} This is due to the infinite series Equation A.4.
So, T(n)\leq c2^k=cn. Therefore, a better upper-bound for BUILD-MAX-HEAP
is O(n).
5.3 Heapsort
The heapsort procedure HEAPSORT
works as follows:
it starts by calling
BUILD-MAX-HEAP
to build a max-heap on the input array A[0:n-1].Since the maximum element of the array is stored at the root A[0],
HEAPSORT
can place it into its correct final position by swapping it with A[n-1].the procedure then discards node n-1 from the heap by simply decrementing A.heap\_size.
By doing so, the children of the root remain max-heaps, but the new root element might violate the max-heap property. To restore the max-heap property, the procedure just calls
MAX-HEAPIFY(A, 0)
, which leaves a max-heap in A[0:n-2].The
HEAPSORT
procedure then repeats this process for the max-heap of size n-1 down to a heap of size 1.
\begin{algorithm} \caption{HeapSort} \begin{algorithmic} \Procedure{HeapSort}{$A[0:n-1]$} \State{\Call{BuildMaxHeap}{$A$}} \For{$i=n-1$ downto $1$} \State{exchange $A[0]$ with $A[i]$} \State{$A$.heap\_size = $A$.heap\_size$ - 1$} \State{\Call{MaxHeapify}{$A, 0$}} \EndFor \EndProcedure \end{algorithmic} \end{algorithm}
Demo
Exercise 5.18 Analyze the running time of HEAP-SORT
.