Backing Up Your Files

Recently I manually backed up files to a USB drive. During that extremely pleasant experience, I decided that I’m going to automate the process for the future.

Backing up files by hand is time-consuming, error-prone, tedious, and stressful. So, I created a little bash script that will do the work for me.

The Coin Change Problem

The problem

How many ways can you make change for a dollar?

You have four denominations available. You can use quarters, dimes, nickels, and pennies, and you have all the coins that you might need. So, you could make change for a dollar with 100 pennies. You could use four quarters. You could use three quarters, two dimes, and one nickel. You could use one quarter, two dimes, four nickels, and 35 pennies.

There are a lot of ways to make change for a dollar. How do you find out the total number of ways?

In general, given a list of denominations D and an amount of money M, how do you find the total number of ways to make change for M?

To answer that question, we’re going to apply dynamic programming. First we’ll take a top-down approach. Then we’ll take a bottom-up approach.

The top-down approach

This is the top-down approach in a nutshell: If the given problem instance is trivial, then solve it; if it’s not trivial, then cut it down to size.

Given an instance of the problem, we continually reduce the size of the problem instance until we reach subinstances that are trivial to solve. To produce a solution to the original problem instance, we combine all of the solutions to the subinstances.

Two parameters define the coin change problem: an amount of money (M) and a list of coin denominations (D). So, you can reduce a problem instance in two ways: decrease M or shorten D. We’ll do both.

We divide a nontrivial problem instance into these two subinstances:

(1) Find the number of ways to make change for M using the first denomination in D
(2) Find the number of ways to make change for M without using the first denomination in D

To illustrate the procedure, let’s say that M = 100 and D = [quarters, dimes, nickels, pennies].

If you use the first denomination in D, then you put in 25 cents. Now you only have to make change for \$0.75. So, you reduce the size of the problem instance because you reduce the size of its first parameter, M.

Let’s say that you don’t use the first denomination in D. So, you’ve effectively reduced D to [dimes, nickels, pennies]. So, you reduce the size of the given problem instance by reducing the second parameter of the problem.

You continue dividing problem instances until you reach trivial subinstances. The coin change problem is trivial in these instances:

-You must make change for no amount of money. There is only one way to solve that problem: You use no coins.
-You must make change for a negative amount of money. There are zero ways to do that.
-You must make change for some amount of money, but there are no denominations to choose coins from. There are no ways to do that, either.

The subproblems are the same type of problem that the original problem is, because the subproblems require you to find the number of ways to make change for a given amount of money, too. So, we apply the same problem-solving procedure to the subproblems that we do to the original problem. Consequently, it is natural to use a recursive algorithm, like it was when I implemented a solution to the Towers of Hanoi problem. So, I implemented the algorithm in Scheme.

The bottom-up approach

We just examined a top-down approach to the coin change problem. We applied a divide-and-conquer strategy. We started with a problem instance and divided it into smaller and smaller subinstances, working our way down until we arrived at problem instances that are trivial to solve.

We can also employ a bottom-up approach, where we start with the solutions to the smallest subinstances of the problem and continually build on them until we get up to a solution to the entire problem instance.

The top-down, recursive approach typically solves a lot of the same subinstances many times. The bottom-up approach does not. So, the bottom-up approach is typically a lot faster. In the specific case of the coin change problem, the bottom-up approach is O(mn), where m is the number of coin denominations and n is the amount of money (in cents). In contrast, if you don’t memoize the recursive algorithm, it runs in exponential time.

As I have in the past, I used dynamic programming to design just such a bottom-up strategy. Here is a Python program that implements it. The program also features a naive recursive solution and a memoized recursive solution.

In the past, I also implemented a top-down, recursive algorithm and then designed and implemented a bottom-up algorithm. The algorithms find the maximum independent set of a tree. Here is the Python program that implements the bottom-up algorithm. Here is the Python program that implements the top-down algorithm.

The algorithms that solve the maximum independent set problem and the coin change problem have a feature in common: The use of inclusion and non-inclusion to neatly divide a problem instance into just two subinstances.

The maximum independent set of a tree either includes the root of the tree or it doesn’t include the root of the tree. Exploiting that fact, you construct the independent set that includes the root and the independent set that doesn’t include the root, and then you take the larger of the two.

Likewise, when you count the total number of ways that you can make change for some amount of money, you count the number of ways that you can make change for it if you include the first denomination on the list and the number of ways if you don’t include that denomination.

I have found that many dynamic programming approaches feature this technique. For example, an approach to the discrete knapsack problem features it, too: To compute the maximum value that the knapsack can contain, you compute the value when you include the current item and when you don’t include the current item, and then you take the larger of the two. I have implemented that approach, too.

MIT professor Erik Demaine has said that an essential part of dynamic programming is guessing. In his opinion, dynamic programming is “careful brute force”. If you don’t know the solution to some problem, then try every possible solution and take a best one. And that is what is going on in the above procedures. The technique that all of these procedures deploy is just a clever way to partition the set of all possible solutions. Because of the law of excluded middle, a solution either includes an element or doesn’t include it. So, we only have to process two possibilities.

Ranked Pairs and “The Wire”

A few days ago, a redditor asked the people of /r/TheWire to rank the five seasons of the HBO television show “The Wire”, and several people posted their rankings. I wanted to take their individual rankings and uncover one aggregate ranking that captured the overall preferences of the entire group of posters. In other words, I wanted to implement a preferential voting scheme.

I was aware of Arrow’s theorem, which says that no voting scheme can satisfy a certain small set of reasonable criteria when voters are ranking three or more alternatives. However, I was also aware that a number of countries employ preferential voting systems. So, I didn’t give up, and I searched for a preferential voting system that would fit the bill.

I found one that worked: the ranked pairs voting scheme. In the ranked pairs voting system, you first count all the times that voters ranked Alternative X ahead of Alternative Y, for all pairs of alternatives (X, Y). If voters ranked Alternative X ahead of Alternative Y more often, then Alternative X is a majority over Alternative Y.

Next you sort the majorities. For example, say that voters ranked Alternative X ahead of Alternative Y three times and that they ranked Alternative A ahead of Alternative B five times. Then the latter majority comes before the former majority in the sort.

But what if the list of majorities creates a cycle? For example, say that we have [(A, B), (B, C), (C, A)]. Then A is a majority over B and B is a majority over C. But C is a majority over A. So, by transitivity, A is a majority over A. But an alternative X can be a majority over an alternative Y only if X was ranked ahead of Y, and A can’t be ranked ahead of itself.

So, here’s how you resolve that issue. You treat the list of majorities like a directed acyclic graph where the dag includes a directed edge from X to Y if X is a majority over Y. And you just go down the sorted list of majorities. For each majority, you try to add a corresponding edge to the dag. If adding that edge would produce a cycle, then you don’t add it. If it wouldn’t produce a cycle, then you do add it, and that majority gets locked in to the final list of majorities.

The winner of the election is the source of the dag, i.e., the vertex that doesn’t have any incoming edges. But I didn’t want to know just the winner. I wanted to know the aggregate ranking. So, I sorted the alternatives by the number of outgoing edges that their corresponding vertices have in the dag. The number of outgoing edges of a vertex corresponds to the number of alternatives that an alternative is a majority over.

I collected the data on the night of September 1. I simply scrolled down the list of comments and recorded the rankings that commenters reported. The only complication was that a couple of commenters expressed indifference between some seasons. So, for example, one person wrote, “4 > 1 > 3/2 > 5”. In such a case, I ranked seasons in the order that they were written. So, in this particular case, I placed Season 3 before Season 2.

Then I ran the program, and the result was 4 > 1 > 2 > 3. Season 5 didn’t even make the list.

You can find my program up at my GitHub.

Further references:

Swapping Cards

Earlier today I searched through the ACM-ICPC archive of problems that have been used in ACM-ICPC Regionals and World Finals, looking for some interesting problems to solve. I found one that reminded me of another ACM-ICPC problem that I solved. You can read the full description of it, but I’ll put it in a nutshell:

You’re given a stack of playing cards. You can swap adjacent cards. You want to find the minimum number of swaps it takes to sort the cards so that all the diamonds and hearts in the stack appear before all the spades and clubs.

I realized that you are searching for a certain state of the stack of cards. Because you want to find that state in the minimum number of swaps, you can perform a version of breadth-first search to solve the problem.

I remembered the Marbles in Three Baskets problem, another ACM-ICPC problem that I solved. I applied the same basic technique and philosophy of problem-solving to this problem that I outline in the post about the Marbles in Three Baskets problem.

You can find my solution to this problem at my GitHub.

Vertex Coloring and the Min-Conflicts Algorithm

In the vertex coloring problem, you are given a graph and a set of colors. You want to assign colors to the vertices of the graph in such a way that no adjacent vertices have the same color.

The vertex coloring problem appears in a lot of different forms in the real world. To take a rather frivolous example, just consider Sudoku. Each cell corresponds to a vertex, the numbers 1-9 correspond to colors that you can assign to the vertices, and two vertices are adjacent if the corresponding cells are in the same row, column, or subgrid.

The vertex coloring problem is a constraint satisfaction problem. In a constraint satisfaction problem, you are given a set of variables, a set of possible values for each variable, and a set of constraints. You want to find an assignment of values to the variables that satisfies the constraints.

In the vertex coloring problem, the vertices are the variables, the colors are the possible values for each variable, and the constraint is that no adjacent vertices can have the same color.

There are a number of ways to solve constraint satisfaction problems. One way is recursive backtracking. The n queens problem is another constraint satisfaction problem, and I have applied recursive backtracking to solve the n queens problem.

Today I want to talk about the min-conflicts algorithm. Like simulated annealing, which I have also talked about, the min-conflicts algorithm is a type of local search. You start with some assignment of values to variables. Unless you get really lucky, that initial assignment will produce conflicts. For example, if a vertex is colored green and two of its neighboring vertices are colored green, then its being colored green produces two conflicts. So, at each iteration, the min-conflicts algorithm picks a conflicting variable at random and assigns to it a value that produces the minimal number of conflicts. If there is more than one such value, then it chooses one at random. It then repeats this process again and again, until it either finds a complete assignment of values that satisfies all of the constraints or goes through the maximum number of iterations that it may go through.

I implemented the min-conflicts algorithm, and I applied it to the vertex covering problem. I also decided to go back and apply the min-conflicts algorithm to the n queens problem. You can find more of my programs at my GitHub.

Graph Week

This week /r/DailyProgrammer posed a series of challenges related to graph theory.

The first challenge gives you the number of vertices of an undirected graph and the graph’s edges. It asks you to find the degree of each vertex. The bonus challenge asks you to construct the adjacency matrix of the graph. I solved both problems, and I combined my solutions into one solution.

Each vertex i corresponds to a row in the adjacency matrix. For each column j in the matrix, if there is an edge between vertex i and vertex j, then the matrix entry i,j is equal to 1, and 0 otherwise. So, to find the number of edges for a vertex in an undirected graph, you can just construct the adjacency matrix and then sum all of the entries of the corresponding row.

The next challenge gives you the edges of a directed graph. It asks you to find the radius and the diameter of the graph.

The eccentricity of a vertex is the greatest distance from that vertex to any other vertex. So, to find the radius and the diameter of a graph, you can find the eccentricity of each vertex. Then the radius is the smallest eccentricity, and the diameter is the largest eccentricity.

Because I find the eccentricity of each vertex, I find the distances between all pairs of vertices in the graph. So, my first thought was to apply the Floyd-Warshall algorithm. However, one of the given graphs has a cycle, and I mistakenly believed that Floyd-Warshall only applies to directed acyclic graphs. So, instead of using Floyd-Warshall, I designed a solution around breadth-first search. When I discovered that Floyd-Warshall only fails on graphs that have negative cycles, I implemented a solution that employs Floyd-Warshall.

Here is my Floyd-Warshall solution. Here is my breadth-first search solution. Both are up at my GitHub.

The last challenge posed by /r/DailyProgrammer gives you the number of vertices in an undirected graph and the graph’s edges. It asks you to find a maximum clique in the graph.

Even though the challenge is advertised as hard, I found it to be very easy. You can just recursively construct cliques and determine whether they are maximum. Although this is an exhaustive search, you can’t really solve the problem any faster than that, because the maximum clique problem is NP-complete. Fortunately, the size of the given problem instance is small enough for brute force to quickly yield a solution.