7  Graphs

“To err is human — and to blame it on a computer is even more so.”
– Robert Orben

Learning Objectives

After completing this chapter, students are expected to understand:

  • the idea and implementation of graphs

  • how BFS and DFS work

  • how to find shortest distance paths between vertices in a graph

  • how to apply the two kinds of search algorithms to design optimal solutions

A graph is a collection of vertices (also called nodes) and edges between them. Sometimes, the vertices have vertex attributes and edges a direction and/or weights. More formally, a graph G is denoted by (V, E), where V=\{v_1,v_2,\ldots,v_n\} is the vertex set and E\subset V\times V is the collection of edges.

The above graph G has 6 vertices: V=\{0, 1, 2, 3, 4, 5\} and 5 edges: E=\{(0, 3), (1, 4), (1, 3), (2, 3), (3, 5)\}.

Note that the graph G above is an undirected graph, unlike a directed graph that has directed edges or arrows. We will consider only undirected graphs here.

Also, the edges of G are unweighted. In a weighted graph, each edge e\in E has an edge weight w_e.

Exercise 7.1 Draw the graph G=(V,E), where

  • V=\{v_0, v_1, v_2, v_3\}

  • E is the set of all possible edges.

Exercise 7.2 What is the minimum and maximum number of edges possible in a graph with n vertices.

Exercise 7.3 Implement a Vertex class to store the vertex attributes of a graph.

Exercise 7.4 Implement an Edge class to store the edge attributes of a graph.

7.1 Representing a Graph

There are two standard ways to represent a graph G=(V,E):

  • adjacency list, and

  • adjacency matrix.

Although both representations are equivalent, different graph algorithms prefer to tak one over the other as input due to convenience and efficiently. Also, the preference is sometimes dictated by how sparse the graph is.

Adjacency Matrix

The adjacency-matrix representation of a graph assumes that the vertices are numbered 1, 2,\ldots, |V| is some arbitrary manner. Then the adjacency matrix of G is |V|\times|V| matrix A=(a_{ij}) such that a_{ij}=\begin{cases} 1 &\text{ if }(i,j)\in E, \\ 0 &\text{ otherwise } \end{cases}

Memory
  • The adjacency matrix requires \Theta(n^2) (meaning both O(n^2) and \Omega(n^2)) memory.

Exercise 7.5 Write down the adjacency matrix of G with V=\{1, 2, 3, 4, 5\} and E=\{(1, 2), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5)\}.

Exercise 7.6 What can you infer about the nature of a graph from its adjacency matrix?

Adjacency List

The adjacency-list representation of a graph G consists of a list Adj such that each entry is the adjacency list Adj[u] of a vertex of u\in V.

For each u\in V, the adjacency list Adj[u] contains all the vertices v such that there is an edge (u,v)\in E. This is, Adj[u] consists of all the vertices adjacent to u in G.

Memory
  • The adjacency list requires \Theta(|V| + |E|) memory.

Exercise 7.7 Give an adjacency-list representation for a complete binary tree on 7 vertices. Give an equivalent adjacency-matrix representation. Assume that the edge are undirected and that the vertices are numbered from 1 to 7 as in a binary heap.

Exercise 7.8 What is the worst-case time-complexity of discovering all the edges?

7.2 Implementing a Graph

Exercise 7.9 Implement the Graph class in Python. Add methods to add and delete vertices and edges.

Exercise 7.10 Implement a method in class Graph to return the adjacency matrix representation.

7.3 Graph Traversal

We now discuss two most commonly used algorithms to search a graph. A graph can searched for a variety of reasons:

  • checking reachability of a query node from a source node

  • counting the number of connected components

  • computing the shortest distance of a query node from a source

  • extracting a spanning-tree rooted at a query source

  • searching for nodes with a query property

Breadth-First Search (BFS)

Given a graph G=(V, E) and a source vertex s, the BFS algorithm explores each vertex reachable from s. The idea is to discover the reachable vertices by gradually expanding the frontier of the seen vertices along the breadth of the frontier.

Alongside exploring the reachable vertices from the source, BFS computes the shortest-path distance from s to each reachable vertex and a BFS tree with root s containing all reachable vertices.

The basic logic of BFS is the following:

  1. Color each vertex as

    • White (not visited)
    • Gray (currently visiting or on the frontier)
    • Black (already visited).
  2. In the beginning each node is colored White—except for the source node s being Gray.

  3. Maintain a queue Q, to hold the currently exploring Gray nodes.

  4. In each iteration, do the following repeatedly until Q is empty:

    1. deQueue the head u of Q and color it Black

    2. using the adjacency list Adj[u], discover the neighbors of u that are still White

    3. color these unexplored neighbors as Gray and enQueue them to Q

Pseudocode

Here is a pseudocode of BFS:

\begin{algorithm} \caption{BFS} \begin{algorithmic} \Procedure{BFS}{$G, s$} \For{each vertex $u\in G.V$ and $v\neq s$} \State{$u.color=\text{White}$} \State{$u.d=\infty$} \State{$u.\pi=\text{NIL}$} \EndFor \State{$s.color=\text{Gray}$} \State{$s.d=0$} \State{$s.\pi=\text{NIL}$} \State{Queue $Q=\emptyset$} \State{\call{EnQueue}{$Q,s$}} \While{$Q\neq\emptyset$} \State{$u=$\call{DeQueue}{$Q$}} \State{▷ search the neighbors of $u$} \For{each vertex $v\in G.Adj[u]$} \If{$v.color==\text{White}$} \Comment{is $v$ being discovered now?} \State{$v.color=\text{Gray}$} \State{$v.d=u.d + 1$} \State{$v.\pi=u$} \State{\call{EnQueue}{$Q,v$}}\Comment{$v$ is now on the frontier} \EndIf \EndFor \State{$u.color=\text{Black}$} \Comment{$u$ is now behind the frontier} \EndWhile \EndProcedure \end{algorithmic} \end{algorithm}

Demo

Shortest Paths

Define the shortest-path distance \delta(s, v) from s to v as the minimum number of edges in any path from vertex s to vertex v. A path of lenght \delta(s, v) is called a shortest path from s to v. If v is not reachable from s, then \delta(s, v)=\infty.

A remarkable byproduct of BFS is that it computes the shortest-path distances v.d of all reachable vertices v from s along the way; see line #18. The proof that v.d is, in fact, \delta(s, v) is rather technical. An interested student may call upon the textbook (Cormen et al. 2009, 558–60).

Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition. 3rd ed. The MIT Press.

BFS Tree

Another noteworthy byproduct of BFS is the construction of a breadth-first tree, a tree rooted at s containing all reachable nodes.

The tree corresponds to the \pi attributes in the BFS algorithm; the blue edges in the demo. More specifically, v.π denotes the parent in the tree of a reachable node v from s; see line #19.

Depth-first Search (DFS)

Given a graph G=(V, E), the DFS algorithm searches deeper in the graph to explore all vertices of G. Depth-first search explores edges out of the most recently discovered vertex v that still has unexplored edges leaving it. Once all of v’s edges have been explored, the search backtracks to explore edges leaving the vertex from which v was discovered. This process continues until all vertices that are reachable from the original source vertex have been discovered. If any undiscovered vertices remain, then depth-first search selects one of them as a new source, repeating the search from that source.

Like BFS, the DFS algorithm keeps track of predecessor attribute v.\pi, producing a subgraph during the search. Since DFS chooses different sources, the resulting subgraph is often a forest, unlike a tree in case of BFS.

The basic logic of DFS is the following:

  1. Color each vertex as

    • White (not visited)
    • Gray (currently visiting)
    • Black (already visited).
  2. In the beginning each node is colored White.

  3. Take a White vertex and run the subroutine DFS-VISIT on it to recursively explore all the White adjacent vertices.

  4. Take another unvisited vertex and repeat Step 3.

Pseudocode

Here is a pseudocode of DFS:

\begin{algorithm} \caption{DFS} \begin{algorithmic} \Procedure{DFS}{$G$} \For{each vertex $u\in G.V$} \State{$u.color=\text{White}$} \State{$u.\pi=\text{NIL}$} \EndFor \State{$time=0$} \For{each vertex $u\in G.V$} \If{$u.color==\text{White}$} \State{\call{DFS-VISIT}{$G, u$}} \EndIf \EndFor \EndProcedure \Procedure{DFS-VISIT}{$G, u$} \State{$time=time+1$} \State{$u.d=time$} \State{$u.color=\text{Gray}$} \For{each vertex $u\in G.Adj[u]$} \If{$u.color==\text{White}$} \State{$v.\pi=u$} \State{\call{DFS-VISIT}{$G, v$}} \EndIf \State{$time=time+1$} \State{$u.f=time$} \State{$u.color=\text{Black}$} \EndFor \EndProcedure \end{algorithmic} \end{algorithm}

Demo

7.4 Exercises

Exercise 7.11 Take a graph of your choice with at least 8 nodes and 10 edges. Dry-run DFS for on this graph choosing any source of your choice.

Exercise 7.12 Explain why DFS terminates for any input graph.

Exercise 7.13 What would happen if we used a Stack instead of a Queue.

Exercise 7.14 What would happen if we removed line #23 in BFS.

Exercise 7.15 Describe the worst-case running time of BFS in terms of the number of vertices V and edges E of G.

Exercise 7.16 A bipartite graph is a graph such that each vertex is colored either red or blue and each edge connects to differently colored vertices. Design a algorithm to check if a graph is bipartite.

Sather Tower, UC Berkeley. Image courtesy Public Domain Images