Swapping Cards

Earlier today I searched through the ACM-ICPC archive of problems used in Regionals and World Finals, looking for something interesting to solve. I found one that reminded me of another ACM-ICPC problem I had solved. You can read the full description, but I’ll put it in a nutshell:

You’re given a stack of playing cards. You can swap adjacent cards. You want to find the minimum number of swaps it takes to sort the cards so that all the diamonds and hearts in the stack appear before all the spades and clubs.

I realized that you are searching for a certain state of the stack of cards. Because you want to find that state in the minimum number of swaps, you can perform a version of breadth-first search to solve the problem.
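Here is a minimal sketch of that idea in Python. It is an illustration rather than my actual submission; the card representation and the is_red predicate are assumptions.

    from collections import deque

    def min_swaps_to_sort(stack, is_red):
        # is_red(card) should return True for diamonds and hearts,
        # False for spades and clubs. The card representation is up to you.
        def is_goal(state):
            # Every red card must appear before every black card.
            seen_black = False
            for card in state:
                if is_red(card):
                    if seen_black:
                        return False
                else:
                    seen_black = True
            return True

        start = tuple(stack)
        if is_goal(start):
            return 0
        frontier = deque([(start, 0)])
        visited = {start}
        while frontier:
            state, swaps = frontier.popleft()
            for i in range(len(state) - 1):
                neighbor = list(state)
                neighbor[i], neighbor[i + 1] = neighbor[i + 1], neighbor[i]
                neighbor = tuple(neighbor)
                if neighbor in visited:
                    continue
                if is_goal(neighbor):
                    return swaps + 1
                visited.add(neighbor)
                frontier.append((neighbor, swaps + 1))

Because breadth-first search explores states in order of the number of swaps it took to reach them, the first sorted state it finds is reached in the minimum number of swaps.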

This reminded me of the Marbles in Three Baskets problem, another ACM-ICPC problem that I solved. I applied the same basic technique and problem-solving philosophy here that I outline in my post about that problem.

You can find my solution to this problem at my GitHub.


The Sieve of Eratosthenes

Given an integer N > 1, you want to find all of the prime numbers from 1 to N. For example, if N = 11, then you get {2, 3, 5, 7, 11}.

The Sieve of Eratosthenes is a simple way to solve this problem. The basic idea behind it is this: Construct the list of all integers from 2 to N, and remove the composite numbers. Whatever remains is prime.

It is easy to construct the list of all integers from 2 to N. So, the question is: How do you remove the composite numbers?

It is a fact of number theory that every composite number k has a prime factor that is less than or equal to sqrt(k). To see this, consider any factorization k = mn with 1 < m, n < k. If both m > sqrt(k) and n > sqrt(k), then mn > k, which contradicts the fact that k = mn. So one of the factors, say m, is at most sqrt(k), and any prime factor of m is at most sqrt(k) and also divides k. So, k must have a prime factor that is less than or equal to sqrt(k).

It follows that every composite number from 2 to N has a prime factor that is at most sqrt(N). So you only need to iterate over the primes from 2 to sqrt(N) and remove each of their multiples from the list. In that way you remove all of the composite numbers.
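For reference, here is a sketch of the sieve in Python. It follows the description above but is not necessarily identical to either of the implementations on my GitHub.

    def sieve(n):
        # is_prime[i] will be True exactly when i is prime.
        is_prime = [True] * (n + 1)
        is_prime[0] = is_prime[1] = False
        p = 2
        while p * p <= n:                        # only primes up to sqrt(n) matter
            if is_prime[p]:
                for multiple in range(p * p, n + 1, p):
                    is_prime[multiple] = False   # cross off the multiples of p
            p += 1
        return [i for i in range(2, n + 1) if is_prime[i]]

    print(sieve(11))   # [2, 3, 5, 7, 11]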

I implemented the Sieve of Eratosthenes in both C and Python. Both implementations are up at my GitHub.

References:

https://proofwiki.org/wiki/Sieve_of_Eratosthenes
http://pages.pacificcoast.net/~cazelais/222/sieve.pdf

Vertex Coloring and the Min-Conflicts Algorithm

In the vertex coloring problem, you are given a graph and a set of colors. You want to assign colors to the vertices of the graph in such a way that no adjacent vertices have the same color.

The vertex coloring problem appears in a lot of different forms in the real world. To take a rather frivolous example, just consider Sudoku. Each cell corresponds to a vertex, the numbers 1-9 correspond to colors that you can assign to the vertices, and two vertices are adjacent if the corresponding cells are in the same row, column, or subgrid.

The vertex coloring problem is a constraint satisfaction problem. In a constraint satisfaction problem, you are given a set of variables, a set of possible values for each variable, and a set of constraints. You want to find an assignment of values to the variables that satisfies the constraints.

In the vertex coloring problem, the vertices are the variables, the colors are the possible values for each variable, and the constraint is that no adjacent vertices can have the same color.

There are a number of ways to solve constraint satisfaction problems. One way is recursive backtracking; the n queens problem is another constraint satisfaction problem, and I have applied recursive backtracking to solve it.

Today I want to talk about the min-conflicts algorithm. Like simulated annealing, which I have also talked about, the min-conflicts algorithm is a type of local search. You start with some assignment of values to variables. Unless you get really lucky, that initial assignment will produce conflicts. For example, if a vertex is colored green and two of its neighbors are also green, then that vertex’s assignment produces two conflicts. So, at each iteration, the min-conflicts algorithm picks a conflicting variable at random and assigns to it a value that produces the minimal number of conflicts. If there is more than one such value, then it chooses one at random. It repeats this process until it either finds a complete assignment that satisfies all of the constraints or reaches the maximum number of iterations.
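Here is a sketch of the algorithm in Python, applied to vertex coloring. The generic min_conflicts function, the small example graph, and the conflict-counting function are illustrations rather than my GitHub code.

    import random

    def min_conflicts(variables, domains, conflicts, max_iters=100000):
        # Start with a random complete assignment of values to variables.
        assignment = {v: random.choice(domains[v]) for v in variables}
        for _ in range(max_iters):
            conflicted = [v for v in variables
                          if conflicts(v, assignment[v], assignment) > 0]
            if not conflicted:
                return assignment              # every constraint is satisfied
            var = random.choice(conflicted)    # pick a conflicting variable
            # Pick a value that minimizes conflicts; break ties at random.
            scores = {val: conflicts(var, val, assignment) for val in domains[var]}
            best = min(scores.values())
            assignment[var] = random.choice(
                [val for val, score in scores.items() if score == best])
        return None                            # gave up after max_iters

    # Example: 3-coloring a small graph given as an adjacency list.
    graph = {'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'a', 'b', 'd'}, 'd': {'c'}}
    colors = ['red', 'green', 'blue']

    def coloring_conflicts(vertex, color, assignment):
        # A conflict is a neighbor that currently has the same color.
        return sum(1 for u in graph[vertex] if assignment.get(u) == color)

    print(min_conflicts(list(graph), {v: colors for v in graph},
                        coloring_conflicts))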

I implemented the min-conflicts algorithm, and I applied it to the vertex coloring problem. I also decided to go back and apply the min-conflicts algorithm to the n queens problem. You can find more of my programs at my GitHub.

The ID3 Algorithm

You are given the voting records of a set of Democrats and the voting records of a set of Republicans. Now you are given the voting record of some unidentified politician, and you want to decide whether they are a Democrat or a Republican. How do you do that?

One way to do it is to take the voting records that you’re given and construct a decision tree. Each internal node in the decision tree corresponds to an attribute of a politician’s voting record, and each edge in the tree corresponds to a value of an attribute. To classify a given politician, you start at the root of the tree and determine the value of the corresponding attribute. Then you take the edge that corresponds to the value of that attribute. Now you’re at a new node, and you repeat the process. In this way you walk down the tree until you reach a leaf which designates the party that the politician probably belongs to.
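As a sketch of that walk, assuming each internal node is an (attribute, branches) pair and each leaf is a party label (the representation in my implementation may differ):

    def classify(tree, example):
        # Walk down the tree until we hit a leaf, which is a class label.
        # example is a dict mapping attribute names to values.
        while isinstance(tree, tuple):
            attribute, branches = tree
            tree = branches[example[attribute]]
        return tree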

The ID3 algorithm is a procedure to generate decision trees. ID3 is essentially a greedy search through the space of decision trees. You start with an empty tree and then iteratively construct the decision tree, starting at the root. At each iteration, you add the node whose corresponding attribute would produce the most gain in information if you were given its value.
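The gain in information is measured with Shannon entropy. Here is a sketch of that computation; the dict-based example representation and the 'party' label key are assumptions for illustration. ID3 splits on the attribute with the largest gain.

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy of a collection of class labels.
        total = len(labels)
        return -sum((count / total) * math.log2(count / total)
                    for count in Counter(labels).values())

    def information_gain(examples, attribute, label_key='party'):
        # Entropy before the split minus the weighted entropy after
        # splitting the examples on the given attribute.
        labels = [ex[label_key] for ex in examples]
        before = entropy(labels)
        after = 0.0
        for value in {ex[attribute] for ex in examples}:
            subset = [ex[label_key] for ex in examples if ex[attribute] == value]
            after += (len(subset) / len(examples)) * entropy(subset)
        return before - after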

I implemented the ID3 algorithm in Python. My implementation is up at my GitHub.

Each path from the root to a leaf in a decision tree corresponds to a rule. For example, a politician who voted against freezing physician fees and in favor of increasing spending on education is, as a rule, a Democrat. An entire decision tree corresponds to a set of rules. I wrote and included in my implementation a function that builds a ruleset out of a given decision tree.
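Here is a sketch of how such a function can collect one rule per root-to-leaf path, again assuming the (attribute, branches) node representation used in the sketches above:

    def tree_to_rules(tree, conditions=()):
        # Returns a list of (conditions, label) pairs, where conditions is a
        # tuple of (attribute, value) tests along the path to the leaf.
        if not isinstance(tree, tuple):          # leaf: tree is a class label
            return [(conditions, tree)]
        attribute, branches = tree
        rules = []
        for value, subtree in branches.items():
            rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
        return rules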

And I applied the ID3 algorithm to solve the problem that I gave above. The solution is also up at my GitHub.

Of Dijkstra and Priority Queues

Although I’ve implemented variants of Dijkstra’s algorithm in the past, I wanted to implement Dijkstra’s algorithm with my own implementation of the priority queue. So, today I bring to you a reinvention of the wheel.

Especially when the project is small, I tend to develop software in Python and then translate the code into another language. Python is a clean, easy-to-use language that has a REPL. Consequently, it is quick and fun to develop in Python. Oftentimes there’s a better tool for the job, but I like to use Python when I can.

In this case, I implemented Dijkstra’s algorithm and the priority queue in Python and then translated the code into Java. So, I have:

A Java implementation of the priority queue
A Java implementation of Dijkstra’s algorithm
A Python implementation of the priority queue
A Python implementation of Dijkstra’s algorithm

One of the many interesting things that I learned from my artificial intelligence class is that the big difference between breadth-first search, depth-first search, and Dijkstra’s algorithm is the data structure that stores vertices.

In breadth-first search, the data structure is a queue. The first item that you put into it is the first item that you remove from it.

In depth-first search, the data structure is a stack. The last item that you put into it is the first item that you remove from it.

In Dijkstra’s algorithm, the data structure is a priority queue. The item that has the highest priority is the item that gets removed.

You can actually think of stacks and queues as types of priority queues. A queue is a priority queue where elements have more precedence the earlier they enter the container. A stack is a priority queue where elements have more precedence the later they enter the container.

Priority queues are commonly implemented as binary heaps, and that is how I implemented my priority queue. A binary heap is a complete binary tree ordered so that, for any given node, neither of the node’s children has higher priority than it.
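Here is a sketch of such a priority queue in Python, storing the heap in a list so that the children of the node at index i sit at indices 2i + 1 and 2i + 2. It mirrors the idea behind my implementations rather than the exact code.

    class PriorityQueue:
        # Binary min-heap: the entry with the smallest priority value is
        # treated as having the highest priority and is removed first.

        def __init__(self):
            self._heap = []                    # list of (priority, item) pairs

        def __len__(self):
            return len(self._heap)

        def push(self, item, priority):
            self._heap.append((priority, item))
            self._sift_up(len(self._heap) - 1)

        def pop(self):
            # Remove and return the item with the smallest priority value.
            priority, item = self._heap[0]
            last = self._heap.pop()
            if self._heap:
                self._heap[0] = last
                self._sift_down(0)
            return item

        def _sift_up(self, i):
            while i > 0:
                parent = (i - 1) // 2
                if self._heap[i][0] < self._heap[parent][0]:
                    self._heap[i], self._heap[parent] = self._heap[parent], self._heap[i]
                    i = parent
                else:
                    break

        def _sift_down(self, i):
            n = len(self._heap)
            while True:
                left, right, smallest = 2 * i + 1, 2 * i + 2, i
                if left < n and self._heap[left][0] < self._heap[smallest][0]:
                    smallest = left
                if right < n and self._heap[right][0] < self._heap[smallest][0]:
                    smallest = right
                if smallest == i:
                    break
                self._heap[i], self._heap[smallest] = self._heap[smallest], self._heap[i]
                i = smallest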

Dijkstra’s algorithm solves the single-source shortest path problem. In the single-source shortest path problem, you are given a graph and some distinguished vertex, and you want to find the shortest paths between the distinguished vertex and every other vertex in the graph. So, for example, let’s say that you are in Atlanta, and you want to find the quickest routes to Dallas, Chicago, and New York. Represent the highway system as a graph, and then you can apply Dijkstra’s algorithm to solve your problem.
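With a priority queue like the one sketched above, Dijkstra’s algorithm looks roughly like this. The adjacency-list representation with (neighbor, weight) pairs is an assumption, not necessarily the format my code uses.

    def dijkstra(graph, source):
        # graph maps each vertex to a list of (neighbor, edge_weight) pairs.
        # Returns the distance and predecessor of every reachable vertex.
        dist = {source: 0}
        prev = {}
        finished = set()
        pq = PriorityQueue()                  # the binary heap from above
        pq.push(source, 0)
        while len(pq) > 0:
            u = pq.pop()
            if u in finished:
                continue                      # stale entry; u is already settled
            finished.add(u)
            for v, weight in graph.get(u, []):
                candidate = dist[u] + weight
                if v not in dist or candidate < dist[v]:
                    dist[v] = candidate
                    prev[v] = u
                    pq.push(v, candidate)     # lazily insert the better estimate
        return dist, prev

To recover an actual route, you follow the prev links backwards from the destination to the source.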

At this page, I found “a data file that represents a significant portion of the interstate highway system”. So, I decided to actually compute the shortest paths from Atlanta to Dallas, Chicago, and New York. In case you are dying to know, here are the results:

The shortest distance between Atlanta and New York is 897 miles. The route is Atlanta -> Spartanburg -> Charlotte -> Greensboro -> Petersburg -> Richmond -> Washington -> Baltimore -> Philadelphia -> Newark -> New York.

The shortest distance between Atlanta and Dallas is 791 miles. The route is Atlanta -> Birmingham -> Meridian -> Jackson -> Shreveport -> Dallas.

The shortest distance between Atlanta and Chicago is 729 miles. The route is Atlanta -> Chattanooga -> Wildwood -> Nashville -> Louisville -> Indianapolis -> Gary -> Chicago.

Of course, the quality of these results depends on the quality of the data and the quality of the program that operates on the data. However, these results are close to the results that Google turns up. So, both the data and my implementation of Dijkstra’s algorithm seem to be good.

For more of my programs, check out my GitHub.

Graph Week

This week /r/DailyProgrammer posed a series of challenges related to graph theory.

The first challenge gives you the number of vertices of an undirected graph and the graph’s edges. It asks you to find the degree of each vertex. The bonus challenge asks you to construct the adjacency matrix of the graph. I solved both problems and combined the solutions into a single program.

Each vertex i corresponds to a row in the adjacency matrix. The entry in row i, column j is 1 if there is an edge between vertex i and vertex j, and 0 otherwise. So, to find the degree of a vertex in an undirected graph, you can just construct the adjacency matrix and sum the entries of the corresponding row.
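A sketch of that combined solution, assuming the vertices are numbered 0 through n - 1 (the challenge input may be 1-based):

    def degrees_and_adjacency_matrix(n, edges):
        # edges is a list of (u, v) pairs describing an undirected graph
        # on vertices 0 .. n - 1.
        matrix = [[0] * n for _ in range(n)]
        for u, v in edges:
            matrix[u][v] = 1
            matrix[v][u] = 1                    # undirected, so symmetric
        degrees = [sum(row) for row in matrix]  # degree = sum of the row
        return degrees, matrix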

My solution to both challenges is up at my GitHub.

The next challenge gives you the edges of a directed graph. It asks you to find the radius and the diameter of the graph.

The eccentricity of a vertex is the greatest distance from that vertex to any other vertex. So, to find the radius and the diameter of a graph, you can find the eccentricity of each vertex. Then the radius is the smallest eccentricity, and the diameter is the largest eccentricity.

To find the eccentricity of each vertex, I need the distances between all pairs of vertices in the graph. So, my first thought was to apply the Floyd-Warshall algorithm. However, one of the given graphs has a cycle, and I mistakenly believed that Floyd-Warshall only applies to directed acyclic graphs. So, instead of using Floyd-Warshall, I designed a solution around breadth-first search. When I discovered that Floyd-Warshall only fails on graphs that have negative cycles, I implemented a solution that employs Floyd-Warshall.
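Here is a sketch of the Floyd-Warshall approach, assuming every edge has weight 1 as in the challenge’s unweighted graphs:

    INF = float('inf')

    def radius_and_diameter(vertices, edges):
        # edges is a list of (u, v) pairs describing a directed graph.
        dist = {u: {v: 0 if u == v else INF for v in vertices} for u in vertices}
        for u, v in edges:
            dist[u][v] = 1
        for k in vertices:                     # the classic triple loop
            for i in vertices:
                for j in vertices:
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
        # Eccentricity of v: the greatest distance from v to any other vertex.
        ecc = {v: max(dist[v][u] for u in vertices if u != v) for v in vertices}
        return min(ecc.values()), max(ecc.values())   # (radius, diameter)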

Here is my Floyd-Warshall solution. Here is my breadth-first search solution. Both are up at my GitHub.

The last challenge posed by /r/DailyProgrammer gives you the number of vertices in an undirected graph and the graph’s edges. It asks you to find a maximum clique in the graph.

Even though the challenge is advertised as hard, I found it to be very easy. You can just recursively construct cliques and keep track of the largest one found. Although this is an exhaustive search, you should not expect a polynomial-time algorithm, because the decision version of the maximum clique problem is NP-complete. Fortunately, the given problem instance is small enough for brute force to quickly yield a solution.
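A sketch of that exhaustive search; the adjacency-set representation is an assumption:

    def max_clique(vertices, neighbors):
        # neighbors maps each vertex to the set of vertices adjacent to it.
        best = []

        def extend(clique, candidates):
            nonlocal best
            if len(clique) > len(best):
                best = list(clique)
            for i, v in enumerate(candidates):
                # v can join the clique only if it is adjacent to every member.
                if all(v in neighbors[u] for u in clique):
                    extend(clique + [v], candidates[i + 1:])

        extend([], list(vertices))
        return best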

My solution to the last challenge is up at my GitHub.

Magic Squares

I solved today’s /r/DailyProgrammer challenge. The challenge itself is easy, but the two optional bonus problems are significantly more difficult; I solved both of them.

The first bonus problem is to decide whether a given grid of integers forms a magic square. The second bonus problem is, given a grid whose bottom row is missing, to decide whether the grid could be completed into a magic square.
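Here is a sketch of the check for the first bonus problem. Depending on the challenge’s exact definition, you may also need to verify that the entries are the distinct integers 1 through n squared.

    def is_magic(grid):
        # A grid is magic if every row, every column, and both diagonals
        # sum to the same value.
        n = len(grid)
        target = sum(grid[0])
        sums = [sum(row) for row in grid]                              # rows
        sums += [sum(grid[i][j] for i in range(n)) for j in range(n)]  # columns
        sums.append(sum(grid[i][i] for i in range(n)))                 # main diagonal
        sums.append(sum(grid[i][n - 1 - i] for i in range(n)))         # anti-diagonal
        return all(s == target for s in sums)

    print(is_magic([[8, 1, 6],
                    [3, 5, 7],
                    [4, 9, 2]]))   # True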

I put my code up on my GitHub.

EDIT (4/11/2016):

I was reading John D. Cook’s blog, and I read that all 3 x 3 magic squares have an interesting property. Being both curious and skeptical, I wrote a little Python program to see for myself whether they really do. The code is up at my GitHub.