Extras din curs
Decision tree is a popular classifier that does not require any knowledge or parameter setting. The approach is supervised learning. Given a training data, we can induce a decision tree. From a decision tree we can easily create rules about the data. Using decision tree, we can easily predict the classification of unseen records. In this decision tree tutorial, you will learn how to use, and how to build a decision tree in a very simple explanation.
Decision tree is a hierarchical tree structure that used to classify classes based on a series of questions (or rules) about the attributes of the class. The attributes of the classes can be any type of variables from binary, nominal, ordinal, and quantitative values, while the classes must be qualitative type (categorical or binary, or ordinal). In short, given a data of attributes together with its classes, a decision tree produces a sequence of rules (or series of questions) that can be used to recognize the class.
Let us start with an example. Throughout this tutorial, we will use the following 10 training data. The training data is supposed to be a part of a transportation study regarding mode choice to select Bus, Car or Train among commuters along a major route in a city, gathered through a questionnaire study. The data have 4 attributes which I selected for the shake of clarity. Attribute gender is binary type, car ownership is quantitative integer (thus behave like nominal). Travel cost/km is quantitative of ratio type but in here I put into ordinal type (because quantitative data need to be split into qualitative data) and income level is also an ordinal type.
Attributes Classes
Gender Car ownership Travel Cost ($)/km Income Level Transportation mode
Male 0 Cheap Low Bus
Male 1 Cheap Medium Bus
Female 1 Cheap Medium Train
Female 0 Cheap Low Bus
Male 1 Cheap Medium Bus
Male 0 Standard Medium Train
Female 1 Standard Medium Train
Female 1 Expensive High Car
Male 2 Expensive Medium Car
Female 2 Expensive High Car
Based on above training data, we can induce a decision tree as the following:
Notice that attribute “income level” is not included in the decision tree because based on the given data attribute “travel cost per km” would produce better classification than “income level”. We will see later how the decision is generated. In the next section, I will discuss how to use a decision tree to predict unseen record.
Decision tree can be used to predict a pattern or to classify the class of a data. Suppose we have new unseen records of a person from the same location where the data sample was taken. The following data are called test data (in contrast to training data ) because we would like to examine the classes of these data.
Person name Gender Car ownership Travel Cost ($)/km Income Level Transportation Mode
Alex Male 1 Standard High ?
Buddy Male 0 Cheap Medium ?
Cherry Female 1 Cheap High ?
The question is what transportation mode would Alex, Buddy and Cheery use? Using the decision tree that we have generated in the previous section, we will use deductive approach to classify whether a person will use car, train or bus as his or her mode along a major route in that city, based on the given attributes.
Preview document
Conținut arhivă zip
- Decision Tree.doc