Machine Learning Notes 18: Analyzing House Feature Importance with Random Forest Models Using Three Methods
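This post is meant to be read together with Machine Learning Notes 16: Random Forest (RF) Feature Importance Evaluation Based on the Boruta Algorithm, for a comparative analysis.
1. Exploratory data analysis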
Input parameters: id, date, bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfront, view, condition, grade, sqft_above, sqft_basement, yr_built, yr_renovated, zipcode, lat, long, sqft_living15, sqft_lot15
Output parameter: price
Scatter plots show the relationships between the variables:
Heat map of the parameters (correlation analysis):
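The correlation analysis behind such a heat map can be sketched as follows. This is a minimal illustration on synthetic data, not the post's original code: the data frame is a hypothetical stand-in that mimics only a few of the housing columns, and the coefficients used to generate it are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Hypothetical stand-in for the housing table (synthetic values, not the real data).
sqft_living = rng.normal(2000, 500, n)
df = pd.DataFrame({
    "sqft_living": sqft_living,
    "grade": np.round(sqft_living / 400 + rng.normal(0, 1, n)),
    "yr_built": rng.integers(1900, 2015, n),
})
df["price"] = 200 * df["sqft_living"] + 20000 * df["grade"] + rng.normal(0, 50000, n)

# Pearson correlation matrix; this is the table a heat map would colour.
corr = df.corr()

# Correlation of every feature with the target, strongest first.
price_corr = corr["price"].drop("price").sort_values(ascending=False)
print(price_corr)
```

The `corr` matrix can be passed to a plotting library of choice to draw the heat map itself.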
1.1 Splitting the dataset (training set, test set)
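A split with these dimensions can be sketched as below. The 80/20 ratio (`test_size=0.2`) and the `random_state` value are assumptions inferred from the reported shapes, not taken from the original code, and the arrays are zero-filled placeholders. The 18 feature columns are presumably the 20 listed inputs with id and date dropped, which is also an inference.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the post's dimensions: 17290 + 4323 = 21613 rows, 18 features.
X = np.zeros((21613, 18))
y = np.zeros(21613)

# Assumed 80/20 split; random_state is an arbitrary choice for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

shapes = (X_train.shape, y_train.shape, X_test.shape, y_test.shape)
print(shapes)
```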
((17290, 18), (17290,), (4323, 18), (4323,))
1.2 Model fitting
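Fitting the regressor and computing metrics of this kind can be sketched as follows. The data is synthetic and the `evaluate` helper is hypothetical. In particular, treating "Accuracy" as 100 minus the mean absolute percentage error is an assumption about how the reported figure was computed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
# Synthetic positive target so that a percentage error is well defined.
y = 100.0 + 5.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)


def evaluate(model, X_test, y_test):
    """Hypothetical helper: MAE, 100 - MAPE ("accuracy", assumed) and R2."""
    pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    mape = 100.0 * np.mean(np.abs(pred - y_test) / y_test)
    return {"mae": mae, "accuracy": 100.0 - mape, "r2": r2_score(y_test, pred)}


metrics = evaluate(model, X_test, y_test)
print(metrics)
```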
Metrics for Random Forest Regressor
Average absolute error: 72704.15 degrees.
Improvement over baseline: 100.0 %.
Accuracy: 86.88 %.
R2 score: 0.8381720745711922
2. Feature importance comparison
2.1 Gini Importance
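In scikit-learn, Gini (impurity-based) importance is exposed as the `feature_importances_` attribute of a fitted forest. The sketch below uses synthetic data with hypothetical feature names `f0` to `f5`; only `f0` and `f1` actually drive the target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=500)
feature_names = [f"f{j}" for j in range(X.shape[1])]  # hypothetical names

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances are normalised so that they sum to 1.
gini_importance = forest.feature_importances_

# Rank features from most to least important.
ranked = sorted(zip(feature_names, gini_importance), key=lambda t: t[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value:.3f}")
```

These importances are computed from the training data only, which is one reason to cross-check them against a held-out measure such as permutation importance.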
2.2 Permutation Importance
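Permutation importance measures how much the model's score drops when one feature's values are shuffled. A minimal sketch with `sklearn.inspection.permutation_importance` on synthetic data follows. Evaluating on the held-out test set and using `n_repeats=10` are choices made for this illustration, not details confirmed by the post.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=600)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Shuffle each column n_repeats times and record the drop in R2 on the test set.
result = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=0
)

perm_mean = result.importances_mean  # average score drop per feature
perm_std = result.importances_std    # spread across the repeats
order = np.argsort(perm_mean)[::-1]  # feature indices, most important first
for j in order:
    print(f"feature {j}: {perm_mean[j]:.3f} +/- {perm_std[j]:.3f}")
```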
Compared with the Gini ranking, lat gains importance here.
2.3 Boruta
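The core idea behind Boruta is to compare each real feature against "shadow" features, which are shuffled copies that cannot carry any signal. The sketch below is a simplified single pass of that idea written with scikit-learn only. It is not the BorutaPy implementation that produced the log below, which repeats the comparison over many iterations and applies a statistical test. The data and forest settings are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 500, 6
X = rng.normal(size=(n, p))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Shadow features: each column shuffled independently, breaking its link to y.
X_shadow = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
X_aug = np.hstack([X, X_shadow])

forest = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=0)
forest.fit(X_aug, y)

real_imp = forest.feature_importances_[:p]
shadow_max = forest.feature_importances_[p:].max()

# A real feature scores a "hit" when it beats the best shadow feature.
hits = real_imp > shadow_max
print(hits)
```

In the full algorithm, features that keep scoring hits across iterations end up confirmed, features that rarely do are rejected, and the remainder stay tentative.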
Iteration: 1 / 50
Confirmed: 0
Tentative: 18
Rejected: 0
Iteration: 2 / 50
Confirmed: 0
Tentative: 18
Rejected: 0
Iteration: 3 / 50
Confirmed: 0
Tentative: 18
Rejected: 0
Iteration: 4 / 50
Confirmed: 0
Tentative: 18
Rejected: 0
Iteration: 5 / 50
Confirmed: 0
Tentative: 18
Rejected: 0
Iteration: 6 / 50
Confirmed: 0
Tentative: 18
Rejected: 0
Iteration: 7 / 50
Confirmed: 0
Tentative: 18
Rejected: 0
Iteration: 8 / 50
Confirmed: 13
Tentative: 0
Rejected: 5
BorutaPy finished running.
Iteration: 9 / 50
Confirmed: 13
Tentative: 0
Rejected: 5
837.3257942199707
Confirmed:
['bathrooms', 'sqft_living', 'sqft_lot', 'waterfront', 'view', 'grade', 'sqft_above', 'yr_built', 'zipcode', 'lat', 'long', 'sqft_living15', 'sqft_lot15']
Tentatives:
['sqft_basement']
Rejected:
['bedrooms', 'floors', 'condition', 'yr_renovated']
3. Feature comparison
3.1 Gini Importance
3.2 Permutation Importance
3.3 Boruta
4. Model comparison
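A comparison of this kind retrains the same regressor on each selected feature subset and scores it on the same test set. The sketch below shows the pattern on synthetic data. The number of features kept (`k = 2`), the `fit_and_score` helper and the forest settings are all hypothetical, and a Boruta-selected subset would be passed to the helper in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=600)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def fit_and_score(cols):
    """Hypothetical helper: retrain on the given columns, return test R2."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train[:, cols], y_train)
    return r2_score(y_test, model.predict(X_test[:, cols]))


# Reference model on all features, used to derive both rankings.
full = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
all_cols = list(range(X.shape[1]))
k = 2  # hypothetical number of features to keep

gini_cols = sorted(np.argsort(full.feature_importances_)[::-1][:k].tolist())
perm = permutation_importance(full, X_test, y_test, n_repeats=5, random_state=0)
perm_cols = sorted(np.argsort(perm.importances_mean)[::-1][:k].tolist())

scores = {
    "original": fit_and_score(all_cols),
    "gini": fit_and_score(gini_cols),
    "permutation": fit_and_score(perm_cols),
}
for name, r2 in scores.items():
    print(f"{name}: R2 = {r2:.4f}")
```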
******************* Original Model ***********************
Metrics for Random Forest Regressor
Average absolute error: 72704.15 degrees.
Improvement over baseline: 100.0 %.
Accuracy: 86.88 %.
R2 score: 0.8381720745711922
**** Feature selection based on Gini Importance ****
Metrics for Random Forest Regressor
Average absolute error: 81288.41 degrees.
Improvement over baseline: 100.0 %.
Accuracy: 85.56 %.
R2 score: 0.8052584664901095
**** Feature selection based on Permutation Importance *****
Metrics for Random Forest Regressor
Average absolute error: 72741.67 degrees.
Improvement over baseline: 100.0 %.
Accuracy: 86.77 %.
R2 score: 0.8477802122659206
*********** Feature selection based on Boruta **************
Metrics for Random Forest Regressor
Average absolute error: 73254.05 degrees.
Improvement over baseline: 100.0 %.
Accuracy: 86.75 %.
R2 score: 0.8388239891237698
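Judged by R2, the model built on features selected by Permutation Importance performs best (0.8478, against 0.8382 for the original model). Boruta-based selection stays on par with the original model (0.8388), while Gini-based selection lowers R2 to 0.8053. Permutation Importance and Boruta are therefore both good selection methods here.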