sklearn-Classification Algorithms-Decision Trees

Information entropy: multiply the probability of each category by the log of that probability, sum over all categories, and negate the result: H(D) = -sum(p_i * log p_i).

Information entropy measures uncertainty: the smaller the entropy, the smaller the uncertainty; the greater the entropy, the greater the uncertainty.
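The definition above can be sketched in a few lines of Python (assuming log base 2, the usual convention, so entropy is measured in bits):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # two equally likely classes: maximum uncertainty -> 1.0
print(entropy([0.9, 0.1]))  # a more lopsided split is more certain -> smaller entropy
```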

Information gain:

Represents how much the information entropy decreases once the value of feature A is known:

Gain(D, A) = initial information entropy of D - conditional information entropy of D given A

Example: an age feature splits 15 samples into Youth (5/15), Middle Age (5/15) and Old Age (5/15). The conditional entropy is the weighted sum of the per-group entropies (the minus sign already lives inside each H term):

Conditional entropy given age = 5/15 * H(Youth) + 5/15 * H(Middle Age) + 5/15 * H(Old Age)

If the Youth group contains 2 samples of one class and 3 of the other:

H(Youth) = -(2/5 * log 2/5 + 3/5 * log 3/5)
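The Youth-group entropy can be checked numerically (again assuming log base 2):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Youth group: 2 samples of one class, 3 of the other.
h_youth = entropy([2 / 5, 3 / 5])
print(round(h_youth, 3))  # 0.971
```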

Decision tree construction: features with larger information gain (i.e., those that reduce uncertainty the most) are placed closer to the root of the tree.
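Putting the pieces together, here is a sketch of computing the information gain of one candidate split; the per-group (positive, negative) counts below are made up for illustration:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_gain(parent_counts, splits):
    """Gain = H(parent) - weighted sum of subset entropies."""
    total = sum(parent_counts)
    h_parent = entropy([c / total for c in parent_counts])
    h_cond = sum(sum(s) / total * entropy([c / sum(s) for c in s])
                 for s in splits)
    return h_parent - h_cond

# Hypothetical dataset: 9 positive / 6 negative samples overall.
parent = [9, 6]
# A feature splits it into three groups with these (pos, neg) counts:
splits_a = [(2, 3), (3, 2), (4, 1)]
print(round(info_gain(parent, splits_a), 3))  # 0.083
```

When ranking features, you would compute this gain for every candidate feature and split on the one with the largest value.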

API:

from sklearn.tree import DecisionTreeClassifier
DecisionTreeClassifier(criterion='gini', max_depth=None, random_state=None)

criterion: defaults to 'gini' (Gini impurity); you can also choose 'entropy' to split on information gain.

max_depth: the maximum depth of the tree.
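A minimal usage sketch of this API, run here on sklearn's built-in iris dataset rather than the Titanic data used below:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Split on information gain and cap the depth to keep the tree small.
clf = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```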

Code

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split

def decision_tree():
    # Load the Titanic dataset ('titanic.csv' is a placeholder path;
    # point it at your local copy of the data)
    data = pd.read_csv('titanic.csv')
    data = data.drop(['row.names'], axis=1)

    # Fill in missing values
    data['age'] = data['age'].fillna(data['age'].mean())
    print(data.info())
    y = data['survived']
    x = data[['pclass', 'age', 'sex']]
    # String features, to one-hot encoding
    x = x.to_dict(orient='records')
    dv = DictVectorizer(sparse=False)
    x = dv.fit_transform(x)
    print(dv.get_feature_names_out())  # get_feature_names() on sklearn < 1.0
    print(x)
    x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8)
    # Train the model
    dtc = DecisionTreeClassifier(criterion='gini', max_depth=9)
    dtc.fit(x_train, y_train)
    dtc.predict(x_test)
    score = dtc.score(x_test, y_test)
    print(score)

decision_tree()

['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'sex=female', 'sex=male']
[[29.          1.          0.          0.          1.          0.        ]
[ 2.          1.          0.          0.          1.          0.        ]
[30.          1.          0.          0.          0.          1.        ]
...
[31.19418104  0.          0.          1.          0.          1.        ]
[31.19418104  0.          0.          1.          1.          0.        ]
[31.19418104  0.          0.          1.          0.          1.        ]]
0.8022813688212928
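Once a tree is trained, its learned rules can be printed with sklearn.tree.export_text; a small sketch on the built-in iris data (since the Titanic file is not bundled with sklearn):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow tree just to have something readable to print.
iris = load_iris()
clf = DecisionTreeClassifier(criterion='gini', max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one split threshold, and the leaves show the predicted class.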


Posted on Sun, 06 Oct 2019 14:22:40 -0700 by PcGeniusProductions