Graph Week

This week /r/DailyProgrammer posed a series of challenges related to graph theory.

The first challenge gives you the number of vertices of an undirected graph and the graph’s edges. It asks you to find the degree of each vertex. The bonus challenge asks you to construct the adjacency matrix of the graph. I solved both problems, combining the two solutions into a single program.

Each vertex i corresponds to a row in the adjacency matrix. For each column j, the matrix entry (i, j) is 1 if there is an edge between vertex i and vertex j, and 0 otherwise. So, to find the degree of a vertex in an undirected graph, you can construct the adjacency matrix and then sum all of the entries of the corresponding row.
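As a sketch of that combined approach (assuming vertices are numbered 0 through n - 1 and the graph has no self-loops; the function name is my own):

```python
def degrees_from_adjacency(num_vertices, edges):
    # Build the adjacency matrix: entry (i, j) is 1 if there is
    # an edge between vertex i and vertex j, and 0 otherwise.
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    # The degree of vertex i is the sum of row i.
    return [sum(row) for row in matrix]
```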

My solution to both challenges is up at my GitHub.

The next challenge gives you the edges of a directed graph. It asks you to find the radius and the diameter of the graph.

The eccentricity of a vertex is the greatest distance from that vertex to any other vertex. So, to find the radius and the diameter of a graph, you can find the eccentricity of each vertex. Then the radius is the smallest eccentricity, and the diameter is the largest eccentricity.

Finding the eccentricity of every vertex means finding the distances between all pairs of vertices in the graph. So, my first thought was to apply the Floyd-Warshall algorithm. However, one of the given graphs has a cycle, and I mistakenly believed that Floyd-Warshall applies only to directed acyclic graphs. So, instead, I designed a solution around breadth-first search. When I discovered that Floyd-Warshall fails only on graphs that have negative cycles, I also implemented a solution that employs Floyd-Warshall.
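A minimal sketch of the Floyd-Warshall approach, assuming unit edge weights and vertices numbered 0 through n - 1 (this is my own illustration, not the exact code in my repository):

```python
import math

def radius_and_diameter(n, edges):
    # All-pairs distances: 0 on the diagonal, 1 for each directed
    # edge, infinity otherwise.
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = 1
    # Floyd-Warshall: allow each vertex k in turn as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    # The eccentricity of i is the greatest distance from i to any vertex.
    eccentricities = [max(row) for row in dist]
    return min(eccentricities), max(eccentricities)
```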

Both my Floyd-Warshall solution and my breadth-first search solution are up at my GitHub.

The last challenge posed by /r/DailyProgrammer gives you the number of vertices in an undirected graph and the graph’s edges. It asks you to find a maximum clique in the graph.

Even though the challenge is advertised as hard, I found it to be easy. You can just recursively construct cliques and keep track of the largest one found. Although this is an exhaustive search, you can’t really expect to do much better, because the maximum clique problem is NP-hard (its decision version is NP-complete). Fortunately, the given problem instance is small enough for brute force to quickly yield a solution.
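A sketch of that recursive enumeration (again my own illustration, not the code in my repository):

```python
def maximum_clique(n, edges):
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    best = []

    def extend(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        for v in sorted(candidates):
            # Keep only candidates adjacent to v; the w > v filter
            # avoids generating the same clique in different orders.
            extend(clique + [v], {w for w in candidates if w > v and w in adj[v]})

    extend([], set(range(n)))
    return best
```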

My solution to the last challenge is up at my GitHub.

Magic Squares

I solved today’s /r/DailyProgrammer challenge. The challenge itself is easy, but the two optional bonus problems are significantly more difficult; I solved those as well.

The first bonus problem is to decide whether any given grid of integers forms a magic square. The second bonus problem is, given a grid whose bottom row is missing, to decide whether the given grid could possibly form a magic square.
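A sketch of the first bonus check, verifying only the sums (some definitions of a magic square additionally require the entries 1 through n² to each appear exactly once; this sketch does not check that):

```python
def is_magic(grid):
    n = len(grid)
    target = sum(grid[0])
    sums = [sum(row) for row in grid]                              # rows
    sums += [sum(grid[i][j] for i in range(n)) for j in range(n)]  # columns
    sums.append(sum(grid[i][i] for i in range(n)))                 # main diagonal
    sums.append(sum(grid[i][n - 1 - i] for i in range(n)))         # anti-diagonal
    return all(s == target for s in sums)
```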

I put my code up on my GitHub.

EDIT (4/11/2016):

I was reading John D. Cook’s blog, and I read that all 3 x 3 magic squares have an interesting property. Being both curious and skeptical, I wrote a little Python program to see for myself whether they really do. The code is up at my GitHub.

Dynamic Programming Problems

I have always liked dynamic programming. I went on a kick recently where I wrote solutions to several classic dynamic programming problems.

GeeksforGeeks published my implementation of a solution to the weighted interval scheduling problem. I wrote both a Python program and a Java program to solve it. When I was researching the problem, I noticed that the existing solution on GeeksforGeeks ran in O(n^2) time, but I knew that the problem can be solved in O(n log n) time. So, I submitted a modification, and GeeksforGeeks accepted it.
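The O(n log n) idea is to sort the jobs by finish time and binary-search for the latest compatible job. A sketch (my own, not the published GeeksforGeeks code):

```python
import bisect

def max_weight_schedule(jobs):
    # jobs: list of (start, finish, weight) tuples.
    jobs = sorted(jobs, key=lambda job: job[1])
    finishes = [finish for _, finish, _ in jobs]
    best = [0] * (len(jobs) + 1)  # best[i]: optimum over the first i jobs
    for i, (start, finish, weight) in enumerate(jobs, 1):
        # p = number of earlier jobs that finish by this job's start.
        p = bisect.bisect_right(finishes, start, 0, i - 1)
        best[i] = max(best[i - 1], best[p] + weight)
    return best[-1]
```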

Here are my implementations of several solutions to the discrete knapsack problem. I wrote a naive recursive solution, a memoized recursive solution, and a bottom-up dynamic programming solution.

I also implemented a naive recursive solution, a memoized recursive solution, and a bottom-up dynamic programming solution for the longest common subsequence problem.
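The bottom-up table for the longest common subsequence looks like this (a sketch):

```python
def lcs_length(a, b):
    # table[i][j]: LCS length of a[:i] and b[:j].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]
```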

Introduction to Algorithms, by Cormen, Leiserson, Rivest, and Stein, shows the optimal way to cut up a rod whose pieces you want to sell, and I implemented their solution in Python.
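The book’s recurrence, where the revenue for a rod of length n is the best price of a first piece plus the optimal revenue for the remainder, can be sketched as:

```python
def cut_rod(prices, n):
    # prices[i]: price of a piece of length i + 1.
    revenue = [0] * (n + 1)
    for length in range(1, n + 1):
        revenue[length] = max(prices[piece - 1] + revenue[length - piece]
                              for piece in range(1, length + 1))
    return revenue[n]
```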

I implemented a bottom-up dynamic programming solution to the longest increasing subsequence problem.
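A sketch of the O(n²) bottom-up recurrence:

```python
def lis_length(seq):
    # best[i]: length of the longest increasing subsequence ending at i.
    if not seq:
        return 0
    best = [1] * len(seq)
    for i in range(1, len(seq)):
        for j in range(i):
            if seq[j] < seq[i]:
                best[i] = max(best[i], best[j] + 1)
    return max(best)
```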

Here are my implementations of the Fibonacci sequence. Again I wrote a naive recursive solution, a memoized recursive solution, and a bottom-up dynamic programming solution.
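The memoized and bottom-up variants side by side (a sketch):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    # Top-down: each subproblem is computed once and cached.
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

def fib_bottom_up(n):
    # Bottom-up: only the last two values need to be kept.
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev
```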

Here are my implementations of counting combinations (binomial coefficients). I wrote a naive recursive solution, a memoized recursive solution, and a bottom-up dynamic programming solution.
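For C(n, k), Pascal’s rule C(n, k) = C(n - 1, k - 1) + C(n - 1, k) gives a bottom-up solution that needs only one row of Pascal’s triangle at a time (a sketch):

```python
def choose(n, k):
    if k < 0 or k > n:
        return 0
    row = [1] + [0] * k  # a row of Pascal's triangle, truncated at column k
    for i in range(1, n + 1):
        # Update right to left so each entry read still holds row i - 1.
        for j in range(min(i, k), 0, -1):
            row[j] += row[j - 1]
    return row[k]
```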

Further resources:

Bellman, Richard. Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.

http://cs.stackexchange.com/questions/28111/proof-of-an-optimal-substructure-in-dynammic-programming/28114#28114

http://programmers.stackexchange.com/questions/140233/how-to-prove-a-dynamic-programming-strategy-will-work-for-an-algorithm

https://www.cs.oberlin.edu/~asharp/cs280/2012fa/handouts/dp.pdf

Graham Scan

A polygon is convex if the line segment joining any two points of the polygon lies on its boundary or in its interior. The convex hull of a set of points is the smallest convex polygon that contains every point of the set, either on its boundary or in its interior; the vertices of the hull are a subset of the given points.

To find the convex hull, the Graham Scan first places into the hull a point that must belong in it. Then it incrementally adds new points to the hull.

The point with the lowest y-coordinate must belong in the hull. So, the Graham Scan first places into the hull the point with the lowest y-coordinate. If there are multiple points that all have the lowest y-coordinate, then the algorithm picks the one that has the lowest x-coordinate.

The remaining points are sorted by polar angle around this leftmost lowest point, and each is scanned in turn. If the new point makes a left (counterclockwise) turn relative to the line segment joining the second-to-last and last points of the hull, it is added to the hull. If it does not, the last point of the hull is removed, repeatedly, until the new point does make a left turn, and then it is added.
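A compact sketch of the scan, using the cross product as the left-turn test (assuming at least three distinct points, and discarding collinear boundary points; my actual implementation is in Java):

```python
import math

def cross(o, a, b):
    # Positive when the turn o -> a -> b is counterclockwise (a left turn).
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def graham_scan(points):
    # Start from the lowest point, leftmost among ties.
    pivot = min(points, key=lambda p: (p[1], p[0]))
    # Sort the rest by polar angle around the pivot, nearer points first.
    rest = sorted((p for p in points if p != pivot),
                  key=lambda p: (math.atan2(p[1] - pivot[1], p[0] - pivot[0]),
                                 (p[0] - pivot[0]) ** 2 + (p[1] - pivot[1]) ** 2))
    hull = [pivot]
    for p in rest:
        # Pop until p makes a left turn with the last two hull points.
        while len(hull) > 1 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull
```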

I implemented the Graham Scan in Java. My implementation is up at my GitHub.

Further references:

Graham, R. L. “An Efficient Algorithm for Determining the Convex Hull of a Finite Planar Set.” Information Processing Letters 1 (1972): 132-133. Amsterdam: North-Holland Publishing Company.

Skiena, Steven S., and Miguel A. Revilla. Programming Challenges: The Programming Contest Training Manual. New York: Springer-Verlag, 2003.

https://en.wikipedia.org/wiki/Graham_scan

http://courses.csail.mit.edu/6.006/spring11/rec/rec24.pdf

http://algs4.cs.princeton.edu/99hull/GrahamScan.java.html

http://algs4.cs.princeton.edu/99hull/Point2D.java.html

http://www.geeksforgeeks.org/convex-hull-set-2-graham-scan/

The Maximum Flow Problem

A flow network is a directed graph where each edge has a nonnegative capacity and two vertices are distinguished as a source and a sink. A flow in a flow network is a function that maps edges to real numbers and satisfies two requirements:

(1) For each edge, the flow through the edge is equal to or less than the capacity of the edge.

(2) For each vertex other than the source and the sink, the amount of flow going into the vertex is equal to the amount of flow going out of it. The requirement does not apply to the source or the sink: in the networks considered here, the source has no flow going into it, and the sink has no flow going out of it.

The value of a flow is the total amount of flow going into the sink. A maximum flow is a flow of maximum value. The maximum flow problem is to find a maximum flow, given a flow network.

For any two vertices u and v, if (u, v) is an edge in the flow network, then the residual capacity of u and v is the difference between the capacity of (u, v) and the flow through (u, v). If instead (v, u) is an edge in the flow network, then the residual capacity of u and v is the flow through (v, u). If neither (u, v) nor (v, u) is an edge in the flow network, then the residual capacity of u and v is zero.

A residual network is another directed graph. It has all of the vertices that the flow network has, but it includes an edge (u, v) if and only if the residual capacity of u and v is positive.

An augmenting path is a path from the source to the sink in the residual network. A flow in a flow network is maximum if and only if the corresponding residual network has no augmenting path. This fact yields a useful method. The Ford-Fulkerson method starts with an initial flow, and, while there is an augmenting path, it uses the augmenting path to increase the total flow.

The Edmonds-Karp algorithm is an instance of the Ford-Fulkerson method that uses breadth-first search to look for augmenting paths. The Edmonds-Karp algorithm runs in O(V * E^2) time.
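A sketch of Edmonds-Karp in Python (the graph representation and names here are my own; my actual implementation is in Java):

```python
from collections import deque

def edmonds_karp(graph, source, sink):
    # graph: {u: {v: capacity}}. Build the residual capacities,
    # adding reverse edges with capacity 0.
    residual = {u: dict(neighbors) for u, neighbors in graph.items()}
    for u, neighbors in graph.items():
        for v in neighbors:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual network.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path: the flow is maximum
        # Find the bottleneck residual capacity along the path.
        bottleneck = float('inf')
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Augment: decrease forward residuals, increase reverse ones.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```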

Many linear programming problems can be modeled as network-flow problems, and network-flow algorithms can solve these problems much faster than general-purpose linear programming methods. I implemented Edmonds-Karp in Java to solve an instance of the maximum bipartite matching problem.

My Java implementation of Edmonds-Karp is up at my GitHub.

References:

Cormen, Thomas, Charles Leiserson, Ronald Rivest, and Clifford Stein. Introduction to Algorithms (3rd ed.). Cambridge, MA: The MIT Press, 2009.

Ford, L. R., and D. R. Fulkerson. Flows in Networks. Princeton, NJ: Princeton University Press, 1962.

Hidden Password

Say that X is hidden in Y if the characters of X appear in Y in the same order that they appear in X. So, if X is “ABC” and Y is “HAPPYBIRTHDAYCACEY”, then X is hidden in Y. However, if X is “ABC” and Y is “TRAGICBIRTHDAYCACEY”, then X is not hidden in Y, because a “C” appears before the “B”; under this definition, a character of X occurring out of order breaks the hiding. Of course, if X is “ABC” and Y is “HAPPYBIRTHDAY”, then X is not hidden in Y, because “C” never appears in “HAPPYBIRTHDAY”.

Now, given two strings X and Y, decide whether X is hidden in Y.

In the ACM-ICPC’s 2015 Mid-Central USA Programming Contest, this problem appeared in the guise of deciding whether a given password is hidden inside a given message. My solution to the problem is recursive:

1) If X is empty, then it is vacuously true that X is hidden in Y.

2) If X is nonempty and Y is empty, then it is false that X is hidden in Y.

3) If the first character of X and the first character of Y are the same, then continue the procedure on the rest of X and the rest of Y.

4) If the first character of X and the first character of Y are different, but the first character of Y appears somewhere else in X, then it is false that X is hidden in Y.

5) If none of the above conditions obtains, then continue the procedure on X and the rest of Y.
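The five steps above translate almost directly into Python (this sketch is equivalent to, not copied from, my posted solutions):

```python
def is_hidden(x, y):
    # 1) An empty X is vacuously hidden in any Y.
    if not x:
        return True
    # 2) A nonempty X cannot be hidden in an empty Y.
    if not y:
        return False
    # 3) On a match, continue with the rest of both strings.
    if x[0] == y[0]:
        return is_hidden(x[1:], y[1:])
    # 4) A character of X appearing out of order breaks the hiding.
    if y[0] in x:
        return False
    # 5) Otherwise, skip the current character of Y.
    return is_hidden(x, y[1:])
```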

My Python code and my Scheme code are both up at my GitHub.

Simulated Annealing

You have a problem. You already have some solution to your problem, but you want the very best solution.

There is a set of local modifications that you can make to your solution. After you make a modification, you have a new solution that neighbors your old solution.

You try each modification. If none of the neighboring solutions is better than your solution, then you stay with your solution. If at least one of them is better than your solution, then you change your solution to the best of them. With this new solution now as your solution, you repeat the process. You continue like this until no neighboring solution is better than your solution.

This strategy is an instance of local search. Specifically, it is hill climbing.

The issue with this approach is that, although your solution improves as it iteratively changes into its best neighboring solution, the very best solution might never be in the neighborhood of your solution. So, you might never find the very best solution.

Simulated annealing combats this issue. At each iteration, a neighboring solution is drawn at random. Like in hill climbing, if it is better than the current solution, then it becomes the current solution. However, unlike in hill climbing, there is still a probability that it will become the current solution even if it is worse than the current solution. The hope is that by chance the solution will enter a neighborhood where the very best solution lives.
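A generic sketch, using the usual Metropolis acceptance rule exp(-Δ/T) and a geometric cooling schedule (all parameter values here are illustrative assumptions, not tuned settings):

```python
import math
import random

def anneal(initial, neighbor, cost, t_start=100.0, t_min=1e-3, alpha=0.95):
    # neighbor(s) draws a random neighboring solution of s;
    # cost(s) is the objective to be minimized.
    current = initial
    t = t_start
    while t > t_min:
        candidate = neighbor(current)
        delta = cost(candidate) - cost(current)
        # Always accept an improvement; accept a worse solution
        # with probability exp(-delta / t), which shrinks as t cools.
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate
        t *= alpha  # geometric cooling schedule
    return current
```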

In fact, if you model the sequence of solutions as a Markov chain, then you can show that simulated annealing converges toward the set of optimal solutions. However, it converges only in the limit: in any finite run, simulated annealing, like hill climbing, is a heuristic, and there is no guarantee that it will find a global optimum.

In the single machine total weighted tardiness problem, you want to process a set of jobs on a single machine. Each job has a processing time, a due date, and a weight that represents its importance. You can only process one job at a time. If a job finishes after its due date, then it is tardy. The importance of tardiness depends on the importance of the job, and you want to minimize the total weighted tardiness for all jobs.

I wrote a Java program that applies simulated annealing to the single machine total weighted tardiness problem. It is up at my GitHub.

Further references:

http://www.inf.ufpr.br/aurora/disciplinas/topicosia2/livros/search/SA.pdf
http://www.theprojectspot.com/tutorial-post/simulated-annealing-algorithm-for-beginners/6
http://katrinaeg.com/simulated-annealing.html