You are given the voting records of a set of Democrats and the voting records of a set of Republicans. Now you are given the voting record of some unidentified politician, and you want to decide whether they are a Democrat or a Republican. How do you do that?
One way to do it is to take the voting records that you’re given and construct a decision tree. Each internal node in the decision tree corresponds to an attribute of a politician’s voting record, and each edge in the tree corresponds to a value of that attribute. To classify a given politician, you start at the root of the tree and determine the value of the corresponding attribute. Then you follow the edge that corresponds to that value. Now you’re at a new node, and you repeat the process. In this way you walk down the tree until you reach a leaf, which names the party that the politician most likely belongs to.
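The walk described above fits in a few lines. This is a minimal sketch, not the implementation from my repository. It assumes a hypothetical representation in which an internal node is a dict holding an `"attribute"` name and a `"children"` map from attribute values to subtrees, a leaf is just a party name, and votes are encoded as `"y"`/`"n"`. The function name `classify` and the toy tree are illustrative only; the tree is not learned from real data.

```python
from typing import Dict, Union

# Hypothetical representation (not from the original implementation):
# internal node = {"attribute": name, "children": {value: subtree}},
# leaf = a party name such as "democrat".
Tree = Union[str, dict]


def classify(tree: Tree, record: Dict[str, str]) -> str:
    """Walk from the root to a leaf, following the edge for each attribute's value."""
    node = tree
    while isinstance(node, dict):
        value = record[node["attribute"]]  # look up this node's attribute in the record
        node = node["children"][value]     # follow the edge labeled with that value
    return node  # reached a leaf: the predicted party


# Illustrative toy tree with made-up structure, not learned from the real data.
toy_tree = {
    "attribute": "physician-fee-freeze",
    "children": {
        "n": {
            "attribute": "education-spending",
            "children": {"y": "democrat", "n": "republican"},
        },
        "y": "republican",
    },
}

print(classify(toy_tree, {"physician-fee-freeze": "n", "education-spending": "y"}))
```

Note that a record with a value the tree has never seen at some node raises a `KeyError` here; a fuller version would fall back to, say, the majority party at that node.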
The ID3 algorithm is a procedure for generating decision trees. ID3 is essentially a greedy search through the space of decision trees. You start with an empty tree and build it top-down, starting at the root. At each node, you choose the attribute whose value would give you the most information gain about the party, computed over the training records that reach that node, and then repeat the process on each of the resulting branches.
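Here is a compact sketch of that greedy, top-down construction, using the standard ID3 definitions: entropy in bits, and information gain as the entropy of the labels minus the weighted entropy of the labels after splitting on an attribute. It is not the code from my repository. The function names `entropy`, `information_gain`, and `id3`, and the dict-based tree format (internal node = `{"attribute": ..., "children": {...}}`, leaf = party name), are assumptions made for illustration. The input is assumed to be a non-empty list of records, each a dict from attribute name to value, with a parallel list of party labels.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Record = Dict[str, str]   # hypothetical: attribute name -> value, e.g. "y" / "n"
Tree = Union[str, dict]   # leaf = party name; internal node = dict


def entropy(labels: List[str]) -> float:
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(records: List[Record], labels: List[str], attribute: str) -> float:
    """Expected reduction in label entropy from learning the value of `attribute`."""
    total = len(labels)
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [lab for r, lab in zip(records, labels) if r[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def id3(records: List[Record], labels: List[str], attributes: List[str]) -> Tree:
    """Greedily grow a tree, splitting each node on the highest-gain attribute."""
    # Stop when every record agrees, or no attributes are left: return the majority label.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(records, labels, a))
    remaining = [a for a in attributes if a != best]
    children = {}
    for value in {r[best] for r in records}:
        # Recurse on only the records that take this branch.
        idx = [i for i, r in enumerate(records) if r[best] == value]
        children[value] = id3([records[i] for i in idx], [labels[i] for i in idx], remaining)
    return {"attribute": best, "children": children}
```

Ties in information gain go to whichever attribute appears first in `attributes`, because `max` keeps the first maximum it sees.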
Each path from the root to a leaf in a decision tree corresponds to a rule. For example, a politician who voted against freezing physician fees and in favor of increasing spending on education is, as a rule, a Democrat. An entire decision tree therefore corresponds to a set of rules. My implementation also includes a function that builds a ruleset from a given decision tree.
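Turning a tree into rules is a depth-first traversal that records the attribute tests along each root-to-leaf path. The sketch below is hypothetical and is not the function from my implementation. It assumes the same dict-based tree format as above (internal node = `{"attribute": ..., "children": {...}}`, leaf = party name), and the names `tree_to_rules` and `format_rule` are illustrative.

```python
from typing import List, Optional, Tuple, Union

Tree = Union[str, dict]                    # hypothetical tree format
Condition = Tuple[str, str]                # (attribute, value) test on one edge
Rule = Tuple[List[Condition], str]         # (all tests on the path, predicted party)


def tree_to_rules(tree: Tree, path: Optional[List[Condition]] = None) -> List[Rule]:
    """Return one rule per root-to-leaf path in the tree."""
    path = path or []
    if not isinstance(tree, dict):         # leaf: the path so far is a complete rule
        return [(path, tree)]
    rules: List[Rule] = []
    for value, subtree in tree["children"].items():
        # Extend the path with this edge's test and recurse into the subtree.
        rules.extend(tree_to_rules(subtree, path + [(tree["attribute"], value)]))
    return rules


def format_rule(rule: Rule) -> str:
    """Render a rule as 'IF a = v AND ... THEN label'."""
    conditions, label = rule
    tests = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
    return f"IF {tests} THEN {label}"
```

For a tree whose root tests `physician-fee-freeze` and whose `"n"` branch tests `education-spending`, one of the resulting rules would print as `IF physician-fee-freeze = n AND education-spending = y THEN democrat`.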
I also applied the ID3 algorithm to the problem posed above. The solution is also up on my GitHub.