7  Graphs

“To err is human — and to blame it on a computer is even more so.”
– Robert Orben

Learning Objectives

After completing this chapter, students are expected to understand:

  • the idea and implementation of graphs

  • how BFS and DFS work

  • how to find shortest distance paths between vertices in a graph

  • how to apply the two kinds of search algorithms to design optimal solutions

A graph is a collection of vertices (also called nodes) and edges between them. Sometimes, the vertices have vertex attributes and edges a direction and/or weights. More formally, a graph G is denoted by (V, E), where V=\{v_1,v_2,\ldots,v_n\} is the vertex set and E\subset V\times V is the collection of edges.

The above graph G has 6 vertices: V=\{0, 1, 2, 3, 4, 5\} and 5 edges: E=\{(0, 3), (1, 4), (1, 3), (2, 3), (3, 5)\}.

Note that the graph G above is an undirected graph, unlike a directed graph that has directed edges or arrows. We will consider only undirected graphs here.

Also, the edges of G are unweighted. In a weighted graph, each edge e\in E has an edge weight w_e.

Exercise 7.1 Draw the graph G=(V,E), where

  • V=\{v_0, v_1, v_2, v_3\}

  • E is the set of all possible edges.

Exercise 7.2 What is the minimum and maximum number of edges possible in a graph with n vertices.

Solution

The minimum is 0 when |E| is empty. The maximum number of edges depends on the following assumptions:

  • if self-loops (e.g, (u, u)\in E) are allowed:

    • if G is undirected, then all possible edges totalling {|V| \choose 2} + |V|

    • if G is directed, then all possible edges totalling 2{|V| \choose 2} + |V|

  • no self-loops:

    • if G is undirected, then all possible edges totalling {n \choose 2}.

    • if G is directed, then all possible edges totalling 2{|V| \choose 2}

Exercise 7.3 Implement a Vertex class to store the vertex attributes of a graph.

Exercise 7.4 Implement an Edge class to store the edge attributes of a graph.

7.1 Representing a Graph

There are two standard ways to represent a graph G=(V,E):

  • adjacency list, and

  • adjacency matrix.

Although both representations are equivalent, different graph algorithms prefer to take one over the other as input due to convenience and efficiently. Also, the preference is sometimes dictated by how sparse the graph is.

Adjacency Matrix

The adjacency-matrix representation of a graph assumes that the vertices are numbered 1, 2,\ldots, |V| in some arbitrary manner. Then the adjacency matrix of G is |V|\times|V| matrix A=(a_{ij}) such that a_{ij}=\begin{cases} 1 &\text{ if }(i,j)\in E, \\ 0 &\text{ otherwise } \end{cases}

Memory
  • The adjacency matrix requires \Theta(n^2) (meaning both O(n^2) and \Omega(n^2)) memory.

Exercise 7.5 Write down the adjacency matrix of G with V=\{1, 2, 3, 4, 5\} and E=\{(1, 2), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5)\}.

Exercise 7.6 What can you infer about the nature of a graph from its adjacency matrix?

Adjacency List

The adjacency-list representation of a graph G consists of a list Adj such that each entry is the adjacency list Adj[u] of a vertex of u\in V.

For each u\in V, the adjacency list Adj[u] contains all the vertices v such that there is an edge (u,v)\in E. This is, Adj[u] consists of all the vertices adjacent to u in G.

Memory
  • The adjacency list requires \Theta(|V| + |E|) memory.

Exercise 7.7 Give an adjacency-list representation for a complete binary tree on 7 vertices. Give an equivalent adjacency-matrix representation. Assume that the edges are undirected and that the vertices are numbered from 1 to 7 as in a binary heap.

Exercise 7.8 What is the worst-case time complexity of discovering all the edges?

Solution

Discovering all edges takes \Theta(|V|+|E|)-time—rather than just \Theta(|E|).

7.2 Implementing a Graph

Exercise 7.9 Implement the Graph class in Python. Add methods to add and delete vertices and edges.

Exercise 7.10 Implement a method in class Graph to return the adjacency matrix representation.

7.3 Graph Traversal

We now discuss the two most commonly used algorithms to search a graph. A graph can be searched for a variety of reasons:

  • checking the reachability of a query node from a source node

  • counting the number of connected components

  • computing the shortest distance of a query node from a source

  • extracting a spanning tree rooted at a query source

  • searching for nodes with a query property

Breadth-First Search (BFS)

Given a graph G=(V, E) and a source vertex s, the BFS algorithm explores each vertex reachable from s. The idea is to discover the reachable vertices by gradually expanding the frontier of the seen vertices along the breadth of the frontier.

Alongside exploring the reachable vertices from the source, BFS computes the shortest-path distance from s to each reachable vertex and a BFS tree with root s containing all reachable vertices.

The basic logic of BFS is the following:

  1. Color each vertex as

    • White (not visited)
    • Gray (currently visiting or on the frontier)
    • Black (already visited).
  2. In the beginning each node is colored White—except for the source node s being Gray.

  3. Maintain a queue Q, to hold the currently exploring Gray nodes.

  4. In each iteration, do the following repeatedly until Q is empty:

    1. deQueue the head u of Q and color it Black

    2. using the adjacency list Adj[u], discover the neighbors of u that are still White

    3. color these unexplored neighbors as Gray and enQueue them to Q

Pseudocode

Here is a pseudocode of BFS:

\begin{algorithm} \caption{BFS} \begin{algorithmic} \Procedure{BFS}{$G, s$} \For{each vertex $u\in G.V$ and $v\neq s$} \State{$u.color=\text{White}$} \State{$u.d=\infty$} \State{$u.\pi=\text{NIL}$} \EndFor \State{$s.color=\text{Gray}$} \State{$s.d=0$} \State{$s.\pi=\text{NIL}$} \State{Queue $Q=\emptyset$} \State{\call{EnQueue}{$Q,s$}} \While{$Q\neq\emptyset$} \State{$u=$\call{DeQueue}{$Q$}} \State{▷ search the neighbors of $u$} \For{each vertex $v\in G.Adj[u]$} \If{$v.color==\text{White}$} \Comment{is $v$ being discovered now?} \State{$v.color=\text{Gray}$} \State{$v.d=u.d + 1$} \State{$v.\pi=u$} \State{\call{EnQueue}{$Q,v$}}\Comment{$v$ is now on the frontier} \EndIf \EndFor \State{$u.color=\text{Black}$} \Comment{$u$ is now behind the frontier} \EndWhile \EndProcedure \end{algorithmic} \end{algorithm}

Demo

Shortest Paths

Define the shortest-path distance \delta(s, v) from s to v as the minimum number of edges in any path from vertex s to vertex v. A path of length \delta(s, v) is called a shortest path from s to v. If v is not reachable from s, then \delta(s, v)=\infty.

A remarkable byproduct of BFS is that it computes the shortest-path distances v.d of all reachable vertices v from s along the way; see line #18. The proof that v.d is, in fact, \delta(s, v) is rather technical. An interested student may call upon the textbook (Cormen et al. 2009, 558–60).

Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition. 3rd ed. The MIT Press.

BFS Tree

Another noteworthy byproduct of BFS is the construction of a breadth-first tree, a tree rooted at s containing all reachable nodes.

The tree corresponds to the \pi attributes in the BFS algorithm; the blue edges in the demo. More specifically, v.π denotes the parent in the tree of a reachable node v from s; see line #19.

Depth-first Search (DFS)

Given a graph G=(V, E), the DFS algorithm searches deeper in the graph to explore all vertices of G. Depth-first search explores edges out of the most recently discovered vertex v that still has unexplored edges leaving it. Once all of v’s edges have been explored, the search backtracks to explore edges leaving the vertex from which v was discovered. This process continues until all vertices that are reachable from the original source vertex have been discovered. If any undiscovered vertices remain, then the depth-first search selects one of them as a new source, repeating the search from that source.

Like BFS, the DFS algorithm keeps track of predecessor attribute v.\pi, producing a subgraph during the search. Since DFS chooses different sources, the resulting subgraph is often a forest, unlike a tree in the case of BFS.

The basic logic of DFS is the following:

  1. Color each vertex as

    • White (not visited)
    • Gray (currently visiting)
    • Black (already visited).
  2. In the beginning each node is colored White.

  3. Take a White vertex and run the subroutine DFS-VISIT on it to recursively explore all the White adjacent vertices.

  4. Take another unvisited vertex and repeat Step 3.

Pseudocode

Here is a pseudocode of DFS:

\begin{algorithm} \caption{DFS} \begin{algorithmic} \Procedure{DFS}{$G$} \For{each vertex $u\in G.V$} \State{$u.color=\text{White}$} \State{$u.\pi=\text{NIL}$} \EndFor \State{$time=0$} \For{each vertex $u\in G.V$} \If{$u.color==\text{White}$} \State{\call{DFS-VISIT}{$G, u$}} \EndIf \EndFor \EndProcedure \Procedure{DFS-VISIT}{$G, u$} \State{$time=time+1$} \State{$u.d=time$} \State{$u.color=\text{Gray}$} \For{each vertex $u\in G.Adj[u]$} \If{$u.color==\text{White}$} \State{$v.\pi=u$} \State{\call{DFS-VISIT}{$G, v$}} \EndIf \State{$time=time+1$} \State{$u.f=time$} \State{$u.color=\text{Black}$} \EndFor \EndProcedure \end{algorithmic} \end{algorithm}

Demo

7.4 Exercises

Exercise 7.11 Take a graph of your choice with at least 8 nodes and 10 edges. Dry-run DFS for this graph choosing any source of your choice.

Exercise 7.12 Explain why DFS terminates for any input graph.

Solution

A reachable vertex of G is enQueued and deQueued exactly once. Since there are only finitely many such vertices in G, the algorithm must terminate after finitely many steps.

Exercise 7.13 What would happen if we used a Stack instead of a Queue.

Solution

Scanning all the reachable vertices would still work, but the shortest distance would be miscalculated.

Exercise 7.14 What would happen if we removed line #23 in BFS.

Solution

The removal would not have any impact. The reason is we expand the frontier by enQueueing the incident White nodes. They would still be chosen in the correct order if we did not turn the Gray nodes Black after deQueued them.

Exercise 7.15 Describe the worst-case running time of BFS in terms of the number of vertices V and edges E of G.

Solution

The running-time of BFS is O(|V|+|E|). It is, in fact, \Theta(|V|+|E|)—both an upper and lower bound.

Exercise 7.16 A graph is called bipartite if each vertex can be labeled either 1 or 0 such that each edge connects to differently labeled vertices. Design an algorithm to check if a graph is bipartite.

Solution

Hint: Modify BFS.

Sather Tower, UC Berkeley. Image courtesy Public Domain Images