6 Basic Data Structures
“A computer would deserve to be called intelligent if it could deceive a human into believing that it was human.”
– Alan Turning
Most algorithms take their input as a collection of objects. Such a collection, also called data, is usually organized as a data-structure. In this chapter, we discuss some of commonly used data-structures: arrays, stacks, and queues.
Recall that in our running time analyses so far, we tacitly assumed that accessing an element A[i] at index i into a collection A took constant time. This is, in fact, the case if A is an array. However, if the algorithm was handed a different data-structure like a linked-list L, accessing L[i] would take \Omega(\text{n}) in the worst-case for an n-element list. The choice of the data-structure used in an algorithm is thus crucial.
6.1 Arrays
Array (or something equivalent) is a primitive data-structure in almost all programming languages. Most compilers store an array in the memory as a contiguous sequence of (a fixed number of) bytes. In each block of bytes, a data element is stored. In order to keep the size (or number of bytes) of each block fixed, compilers do not store data objects of different types in them.
In Python, arrays are implemented by the class list
. Although Python tricks us into believing that a list can contain objects of different types, only the references (or addresses) to those objects are kept in each element of the list.
Access
Given a list A
and an index i
into the list, the object at index i is accessed by A[i]
.
Complexity:
Time: O(1)
Extra Space: O(1)
Array indexing takes constant time because of two reasons. First, we can calculate the memory address of the i-th element using a + i\times s. Here a is the address of the first element in the array, and s is the space each element takes (this is why the elements in an array must have the same type, so that will be the same for every element).
Second, we can retrieve the element from the address in the equation above. Both calculating the address and retrieving the element take constant time. This is the key reason why array is one of the most important data structures in algorithm design.
In practice, the two steps are combined in one. Moreover, we do not have to know the a and s in the equation above as the complier keeps track of them. All we need to know is the index of the element we want to access.
Find Max (Min)
Let us now develop and compare strategies to find the maximum element of a given array.
Here is an incremental scheme taking O(n) in the worst-case:
Exercise 6.1 Explain why the for-loop in #3
iterates from i=1
, instead of i=0
.
This is alright, because the running max
is initialized with A[0].
So, it suffices to now visit the other elements A[1:n-1] of the array to check for a bigger element.
Exercise 6.2 What would happen if the inequality in line #4
was replaced with strict inequality?
Although the replacement update max
less frequently, it would not change the outcome of the algorithm. With a >
, we update the current max only when a genuinely larger element is encountered.
In that sense, the replacement would be a better coding practice.
Exercise 6.3 Identify at least three corner cases for Algorithm 1. Discuss how you would address them.
- The input is not a list or iterable.
- raise an
Exception
.
- raise an
- The input list is empty, i.e., len(A)=0.
- The mathematical convention is that the maximum of the empty set (\emptyset) is -\infty. So, I would return
-math.inf
in this case.
- The mathematical convention is that the maximum of the empty set (\emptyset) is -\infty. So, I would return
- The input is a non-empty array with non-numerical values.
- raise an
Exception
.
- raise an
Exercise 6.4 Implement Algorithm 1 in Python handling the above mentioned corner cases.
Exercise 6.5 Develop an algorithm for finding the maximum using divide-and-conquer. Make sure that the size of each subproblem is nearly half the original size.
Let A[p, r] is the array under consideration.
Divide: Split into two subarrays A[p:q] and A[q+1:r], where q is the midpoint of p and r.
Conquer: Recursively find max of A[p:q] and A[q+1:r]. The recursion reaches the base case—when p\geq r.
Combine: We use the recursive formula: \max(A[p,q])=\max\{\max(A[p,q]), \max(A[q+1,r])\}.
Exercise 6.6 Analyze the time-complexity and (extra) space-complexity of the above recursive scheme.
Here are the costs:
Divide + Combine: constant, say c_1.
Conquer: Two subproblems, each with half the size.
Therefore, the running time T(n)=c_1 + 2T(n/2). The Master theorem gives us: T(n)=O(n).
Extra space: O(1).
Exercise 6.7 Find a good lower bound on the time-complexity in big-\Omega notation.
Note that to find the maximum of a list, we must visit each element of the list at least one. Hence, a lower bound is \Omega(n).
Insert
A new element x
is inserted into a list A
by the list method A.append(x)
.
Exercise 6.8 Discuss the time-complexity of .append()
.
Note that insertion is usually O(1) if array is implemented to have a fixed length. Examples include C, C++, Java, etc.
In Python, lists are of variable length—similar to ArrayList
in Java. Such flexibility comes at a cost. When the list gets full, the .append(x)
doubles are size of the array incurring a cost of O(n). So, inserting a new element technically takes O(n).
Doubling of the array does not happen with each call of .append(x)
, but only when the array is full. For this reason, the amortized running time is O(1). Read to know more about amortized analysis.
Search
Finding membership of an element x in a list A is done by using the command x in A
. As we have already discussed, the operation takes O(n) in the worst-case.
Although we would discuss the dictionary data-structure in this course, they facilitate O(1)-time searching of a key x.
If D
is a Python dictionary and key
is a query item, then key in D
takes O(1).
Delete
Exercise 6.9 Discuss the time-complexity of deleting from an array the element at an index i.
6.2 Stacks (LIFO)
Simply put, a stack is a data-structure that follows LIFO (last-in, first-out) policy. A stack is often visualized as a stack of plates, where a plate is added (and removed) always from the top. The data-structure implements two operations: push and pop.
In Python, a list can be used as a stack. It implements insertion to a stack via append(x)
and removal of the last element via pop()
.
Complexity:
pop
: O(1)push
: O(n), but O(1) amortized
Call Stacks
Execution of recursive algorithms make use of stacks. Internally, a compiler (or interpreter) uses a call stack to store information about the active subroutines of a computer program.
In order to run a computer program, the compiler repeatedly evaluates the top of the corresponding call stack. If the main program depends on a subroutine, then the subroutine is pushed to the top of the call stack to be evaluated first.
Demo: recFact
Demo: recFib
Exercise 6.10 A sequence of parentheses in a string is valid if they occur in the correct open-close order. Assume that the input string can contain only (
and )
.
For example: (())
is valid, but (()
is not. Design an algorithm to check if an input string has valid parentheses.
Exercise 6.11 Implement in Python what you have designed.
Exercise 6.12 A sequence of parentheses in a string is valid if they occur in the correct open-close order. Assume that the input string can contain
{
, (
, [
, ]
, )
, and }
.
For example: ([])
is valid, but (][)
is not. Design an algorithm to check if an input string has valid parentheses.
To be added.
Exercise 6.13 Implement in Python what you have designed.
6.3 Queue (FIFO)
In this section we will introduce another data structure, named queue, which works in a somehow opposite way compared to stack. In simple words, queue is a structure where data are stored in a FIFO (first in first out) fashion. This is the key difference between queue and stack (which works in a LIFO way). You can think of the queue data-structure as the queue in real life. Most of the time, we wait in a queue (or a line) so that the customer who enters first will be served first (a.k.a., first come first serve). It would be chaotic if we waited in a stack instead!
There is another difference between stack and queue, on the programming level. As mentioned, we use the class list
to implement stack. However, we do not do so for queue. This is because all the operations for stack (pushing and popping) are conducted from the end of the list.
A queue must implement two methods: enQueue(x)
(to the end) and deQueue()
(from the front). Ideally, both must take O(1) time. There is a data-structure particularly designed for queue in Python, named deque
. Here are some simple examples showing how it works.
6.4 Priority Queue
In this section we will introduce another data structure, named priority queue, which generalizes both the stack and queue data structures.
In simple words, each object x
in a priority queue has a priority, stored in an attribute named key
. Then, objects are added and removed based on their priorities: x.key
. There are two types: max-priority queue and min-priority queue. Here we only discuss the former: the object with the most priority is the one to be popped first. To abuse notation, we simply call it priority queue.
We can now think of queue as a priority queue, where x.key
is the given by their waiting time in the queue. The longer an object waits, the more priority it earns for being popped; this is exactly the FIFO policy.
Similarly, we can now think of stack as a priority queue, where x.key
is the given by the negative of their waiting time in the queue. The longer an object waits, the less priority it earns for being popped; this is exactly the LIFO policy.
A priority queue must implement two methods: enQueue(x)
, deQueue()
, increaseKey(x)
.
Exercise 6.14 Design an implementation of priority queue, and analyze the time and space complexities of our scheme.
6.5 Resources
Read more about the complexities of data-structures implemented in Python.