1R classification algorithm

1R is a simple classification learning algorithm that builds a set of rules from a single input attribute (hence "1" for the number of inputs and "R" for "rules"). For a given attribute, the algorithm produces a rule that maps each value of that attribute to the outcome most often seen with that value. This is repeated for every attribute, and 1R keeps the attribute whose rules make the most accurate predictions on the training data.
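The procedure can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it assumes categorical attributes, represents each training example as a dict, and breaks ties in favor of the class seen first. The function name `one_r` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Minimal 1R sketch for categorical attributes (hypothetical helper).

    rows: list of dicts mapping attribute name -> value.
    target: key of the class attribute.
    Returns (best_attribute, rule, accuracy), where rule maps value -> class.
    """
    best = None
    for attr in (a for a in rows[0] if a != target):
        # Count each class for every value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each value predicts its most frequent class; most_common() lists
        # equal counts in first-seen order, so ties go to the class seen first.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c[rule[value]] for value, c in counts.items())
        accuracy = correct / len(rows)
        # Keep the attribute whose rule is most accurate (the first one wins ties).
        if best is None or accuracy > best[2]:
            best = (attr, rule, accuracy)
    return best


# Hypothetical toy data, for illustration only.
toy = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
print(one_r(toy, "play"))
# ('outlook', {'sunny': 'no', 'rainy': 'yes', 'overcast': 'yes'}, 0.8)
```

Here "outlook" wins with 4 of 5 correct predictions, while "windy" manages only 3 of 5.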

1R classification example

Consider the following training dataset, which we will use to build a model that predicts whether a credit card customer will respond to a life insurance promotion:

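| Age | Gender | Income Range | Has Credit Card Insurance | Life Insurance Promotion |
|---|---|---|---|---|
| 45 | M | 40k-50k | no | no |
| 40 | F | 30k-40k | no | yes |
| 42 | M | 40k-50k | no | no |
| 43 | M | 30k-40k | yes | yes |
| 38 | F | 50k-60k | no | yes |
| 55 | F | 20k-30k | no | no |
| 35 | M | 30k-40k | yes | yes |
| 27 | M | 20k-30k | no | no |
| 43 | M | 30k-40k | no | no |
| 41 | F | 30k-40k | no | yes |
| 43 | F | 40k-50k | no | yes |
| 29 | M | 20k-30k | no | yes |
| 39 | F | 50k-60k | no | yes |
| 55 | M | 40k-50k | no | no |
| 19 | F | 20k-30k | yes | yes |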

Looking at Gender, most females (6 out of 7) responded to the life insurance promotion, while most males (5 out of 8) did not. This produces the following rules for Gender:

| Gender | Prediction |
|---|---|
| F | yes |
| M | no |

The accuracy of these rules is the number of correct predictions divided by the total number of predictions: $(6+5)/15 = 11/15 \approx 73\%$.
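The Gender arithmetic can be reproduced with a few lines of Python. This sketch assumes the 15 rows are listed in the same order as the training table; the variable names are illustrative.

```python
from collections import Counter

# Gender and promotion response for the 15 customers, in training-table order.
gender = "MFMMFFMMMFFMFMF"
response = ["no", "yes", "no", "yes", "yes", "no", "yes", "no",
            "no", "yes", "yes", "yes", "yes", "no", "yes"]

# Tally the responses separately for female and male customers.
females = Counter(r for g, r in zip(gender, response) if g == "F")
males = Counter(r for g, r in zip(gender, response) if g == "M")

# Rule: F -> yes, M -> no. A prediction is correct when it matches the response.
correct = females["yes"] + males["no"]
accuracy = correct / len(response)

print(females["yes"], sum(females.values()))  # 6 7
print(males["no"], sum(males.values()))       # 5 8
print(correct, round(accuracy, 2))            # 11 0.73
```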

Repeating this process for Has Credit Card Insurance and Income Range gives the following rules:

| Has Credit Card Insurance | Prediction |
|---|---|
| yes | yes |
| no | no |

| Income Range | Prediction |
|---|---|
| 20k-30k | no |
| 30k-40k | yes |
| 40k-50k | no |
| 50k-60k | yes |

Note that the Has Credit Card Insurance attribute is tied for the no value: of the 12 customers without credit card insurance, 6 responded yes and 6 responded no. We can break such a tie in whatever way seems most appropriate. Here, every customer with credit card insurance responded yes (3 out of 3, or $100\%$ accuracy), so it makes sense for no to predict no. The 20k-30k income range is also tied (2 yes, 2 no); it is assigned no, and either choice gives the same accuracy.
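One way to make tie-breaking explicit is to pass in a preferred class to use when counts are equal. In this sketch the helper `majority` and its `tie_break` argument are hypothetical; choosing the preferred class is still a judgment call made by whoever builds the rules.

```python
from collections import Counter

# Has Credit Card Insurance and promotion response, in training-table order.
has_cc = ["no", "no", "no", "yes", "no", "no", "yes", "no",
          "no", "no", "no", "no", "no", "no", "yes"]
response = ["no", "yes", "no", "yes", "yes", "no", "yes", "no",
            "no", "yes", "yes", "yes", "yes", "no", "yes"]


def majority(counter, tie_break):
    """Return the most frequent class, or `tie_break` if several classes tie."""
    top = max(counter.values())
    winners = [cls for cls, n in counter.items() if n == top]
    return winners[0] if len(winners) == 1 else tie_break


# Response counts for customers with (yes) and without (no) the insurance.
counts = {v: Counter(r for c, r in zip(has_cc, response) if c == v)
          for v in ("yes", "no")}
print(counts["no"]["yes"], counts["no"]["no"])  # 6 6, a tie

# Insured customers all said yes, so the hand-chosen preference for "no" is "no".
rule = {"yes": majority(counts["yes"], tie_break="yes"),
        "no": majority(counts["no"], tie_break="no")}
correct = sum(counts[v][rule[v]] for v in rule)
print(rule, correct)  # {'yes': 'yes', 'no': 'no'} 9
```

Nine correct predictions out of 15 corresponds to the 60% accuracy for this attribute.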

The accuracy of each attribute's rules is as follows:

| Rule | Accuracy |
|---|---|
| Gender | 73% |
| Has Credit Card Insurance | 60% |
| Income Range | 73% |
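The accuracy table can be recomputed with a short script. This sketch assumes the rows appear in training-table order, and `rule_accuracy` is a hypothetical helper. Because a tie changes which class is predicted but not how many predictions are correct, the script only needs each value's majority count.

```python
from collections import Counter, defaultdict

# Promotion response and the three categorical attributes, in training-table order.
response = ["no", "yes", "no", "yes", "yes", "no", "yes", "no",
            "no", "yes", "yes", "yes", "yes", "no", "yes"]
attributes = {
    "Gender": list("MFMMFFMMMFFMFMF"),
    "Has Credit Card Insurance": ["no", "no", "no", "yes", "no", "no", "yes", "no",
                                  "no", "no", "no", "no", "no", "no", "yes"],
    "Income Range": ["40k-50k", "30k-40k", "40k-50k", "30k-40k", "50k-60k",
                     "20k-30k", "30k-40k", "20k-30k", "30k-40k", "30k-40k",
                     "40k-50k", "20k-30k", "50k-60k", "40k-50k", "20k-30k"],
}


def rule_accuracy(values, labels):
    """Accuracy of the 1R rule that maps each attribute value to its majority label."""
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1
    # The majority count for each value equals the number of correct predictions
    # for that value, whichever label wins a tie.
    correct = sum(max(c.values()) for c in counts.values())
    return correct / len(labels)


for name, values in attributes.items():
    print(f"{name}: {rule_accuracy(values, response):.0%}")
# Gender: 73%
# Has Credit Card Insurance: 60%
# Income Range: 73%
```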

Age is treated differently because it is numeric. Numeric values are discretized into ranges of values, sometimes called sub-ranges, bins, or buckets. There are several ways to discretize; one of the simplest is to sort the data and then place the split wherever it produces the most accurate rule.
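A single split point can be chosen by brute force: sort the examples, try a cut between each pair of neighboring distinct values, let each side predict its majority class, and keep the most accurate cut. This is a minimal sketch of that one simple strategy, not the only way 1R implementations discretize; `best_binary_split` and the toy input are hypothetical.

```python
from collections import Counter


def best_binary_split(values, labels):
    """Find the single cut point that makes the most accurate two-bin rule.

    Returns (cut, left_label, right_label, accuracy), where values below `cut`
    predict left_label and values above it predict right_label, or None if all
    values are equal. Each side predicts its majority label; on a tie, the label
    that appears first after sorting wins.
    """
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values must stay in the same bin
        left_label, left_hits = Counter(y for _, y in pairs[:i]).most_common(1)[0]
        right_label, right_hits = Counter(y for _, y in pairs[i:]).most_common(1)[0]
        accuracy = (left_hits + right_hits) / len(pairs)
        # Keep the first cut that reaches the highest accuracy.
        if best is None or accuracy > best[3]:
            best = ((lo + hi) / 2, left_label, right_label, accuracy)
    return best


print(best_binary_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))
# (6.5, 'a', 'b', 1.0)
```

Trying every cut costs a count per candidate, which is fine for small tables; skipping cuts between equal values keeps identical inputs in the same bin.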

Sorting the sample data by age, alongside each customer's response to the life insurance promotion, gives:

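| Age | Life Insurance Promotion |
|---|---|
| 19 | yes |
| 27 | no |
| 29 | yes |
| 35 | yes |
| 38 | yes |
| 39 | yes |
| 40 | yes |
| 41 | yes |
| 42 | no |
| 43 | yes |
| 43 | yes |
| 43 | no |
| 45 | no |
| 55 | no |
| 55 | no |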

If we split the data roughly in half, between ages $40$ and $41$, the resulting rule (age ≤ 40 ➡ yes, age > 40 ➡ no) has an accuracy of $(6+5)/15 = 11/15 \approx 73\%$. However, if we split the data between $43$ and $45$ (age ≤ 43 ➡ yes, age > 43 ➡ no), the accuracy rises to $12/15 = 80\%$. A split between $41$ and $42$ also reaches $80\%$; the split between $43$ and $45$ is used here. Thus we end up with the rule:

| Age | Prediction |
|---|---|
| ≤ 43 | yes |
| > 43 | no |
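The split comparison can be checked with a short script, which assumes the rule direction used in the text (ages at or below the threshold ➡ yes). `split_accuracy` is a hypothetical helper; the threshold 40 corresponds to the split between 40 and 41, and 43 to the split between 43 and 45.

```python
# Ages and promotion responses, in training-table order.
ages = [45, 40, 42, 43, 38, 55, 35, 27, 43, 41, 43, 29, 39, 55, 19]
response = ["no", "yes", "no", "yes", "yes", "no", "yes", "no",
            "no", "yes", "yes", "yes", "yes", "no", "yes"]


def split_accuracy(threshold):
    """Accuracy of the rule: age <= threshold -> yes, age > threshold -> no."""
    correct = sum((age <= threshold) == (r == "yes")
                  for age, r in zip(ages, response))
    return correct / len(ages)


print(f"{split_accuracy(40):.0%}")  # 73%
print(f"{split_accuracy(43):.0%}")  # 80%

# Use every observed age as a threshold and list the ones that tie for best.
scores = {t: split_accuracy(t) for t in sorted(set(ages))}
best = max(scores.values())
print([t for t, s in scores.items() if s == best])  # [41, 43]
```

The last line confirms that thresholds 41 and 43 both give 12 correct predictions out of 15.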
