Machine Learning Notes 18: Analyzing the Importance of House Features with a Random Forest Model Using Three Methods

admin 2024-05-09


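Read this post together with Machine Learning Notes 16: Random Forest (RF) Feature Importance Evaluation Based on the Boruta Algorithm, and compare the two analyses.

1. Exploratory Data Analysis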

Input features: id, date, bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfront, view, condition, grade, sqft_above, sqft_basement, yr_built, yr_renovated, zipcode, lat, long, sqft_living15, sqft_lot15

Output (target): price

Scatter plots show how the variables relate to one another:

Heat map of the features (correlation analysis):
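The correlation analysis behind such a heat map can be sketched as follows. This is a minimal illustration on synthetic stand-in data, not the author's code: the column names come from the feature list above, while the values, the sample size, and the `plot_heatmap` helper are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing table (hypothetical values, real column names).
rng = np.random.default_rng(0)
n = 500
sqft_living = rng.normal(2000, 600, n)
df = pd.DataFrame({
    "sqft_living": sqft_living,
    "sqft_above": 0.8 * sqft_living + rng.normal(0, 150, n),
    "bedrooms": rng.integers(1, 6, n),  # unrelated to price in this toy data
    "price": 250 * sqft_living + rng.normal(0, 80_000, n),
})

# Pearson correlation between every pair of columns.
corr = df.corr()

# Features ranked by absolute correlation with the target.
ranked = corr["price"].drop("price").abs().sort_values(ascending=False)
print(ranked)


def plot_heatmap(matrix):
    """Draw a correlation matrix as a heat map (hypothetical helper, needs matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(5, 4))
    image = ax.imshow(matrix.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(matrix.columns)))
    ax.set_xticklabels(matrix.columns, rotation=90)
    ax.set_yticks(range(len(matrix.index)))
    ax.set_yticklabels(matrix.index)
    fig.colorbar(image, ax=ax)
    return fig
```

Calling `plot_heatmap(corr)` returns a matplotlib figure. Fixing the colour scale to the range -1 to 1 keeps positive and negative correlations visually comparable.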

1.1 Splitting the Dataset (Training and Test Sets)
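The shapes reported in this subsection are consistent with dropping id and date (leaving 18 features) and holding out 20% of the 21,613 rows. The sketch below checks that arithmetic with placeholder arrays. The `test_size=0.2` and `random_state=42` settings are assumptions, since the post does not show the actual call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# The 18 features left after dropping id and date from the input list.
feature_names = [
    "bedrooms", "bathrooms", "sqft_living", "sqft_lot", "floors", "waterfront",
    "view", "condition", "grade", "sqft_above", "sqft_basement", "yr_built",
    "yr_renovated", "zipcode", "lat", "long", "sqft_living15", "sqft_lot15",
]

# Placeholder arrays with the same number of rows as the dataset (17290 + 4323).
X = np.zeros((21613, len(feature_names)))
y = np.zeros(21613)

# Assumed settings: 80/20 split with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print((X_train.shape, y_train.shape, X_test.shape, y_test.shape))
# ((17290, 18), (17290,), (4323, 18), (4323,))
```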

((17290, 18), (17290,), (4323, 18), (4323,))

1.2 Model Fitting
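A minimal sketch of fitting a random forest regressor and computing metrics of the kind reported here. The data is synthetic, the hyperparameters are assumptions, and "accuracy" is assumed to mean 100 minus the mean absolute percentage error. The "improvement over baseline" line is left out because the post does not define the baseline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: five features, two of which drive the target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(600, 5))
y = 300_000 + 400_000 * X[:, 0] + 150_000 * X[:, 1] + rng.normal(0, 20_000, 600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed hyperparameters; the post does not list the ones it used.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)               # average absolute error
mape = 100 * np.mean(np.abs(pred - y_test) / y_test)  # mean absolute percentage error
accuracy = 100 - mape                                 # assumed definition of "accuracy"
r2 = r2_score(y_test, pred)

print(f"Average absolute error: {mae:.2f}")
print(f"Accuracy: {accuracy:.2f} %")
print(f"R2 score: {r2}")
```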

Metrics for Random Forest Regressor

Average absolute error: 72704.15 degrees.

Improvement over baseline: 100.0 %.

Accuracy: 86.88 %.

R2 score: 0.8381720745711922

2. Feature Importance Comparison

2.1 Gini Importance
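In scikit-learn, the impurity-based importance (commonly called Gini importance; for a regressor it measures variance reduction) is exposed as `feature_importances_` on the fitted forest. A minimal sketch on synthetic data, with hypothetical values under real column names:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: real column names, hypothetical values.
rng = np.random.default_rng(0)
n = 800
X = pd.DataFrame({
    "sqft_living": rng.uniform(500, 4000, n),
    "lat": rng.uniform(47.2, 47.8, n),
    "bedrooms": rng.integers(1, 7, n),  # has no effect on price in this toy data
})
price = (
    200 * X["sqft_living"]
    + 500_000 * (X["lat"] - 47.2)
    + rng.normal(0, 30_000, n)
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, price)

# feature_importances_ holds the impurity-based scores, normalized to sum to 1.
gini = pd.Series(model.feature_importances_, index=X.columns)
gini = gini.sort_values(ascending=False)
print(gini)
```

These scores are computed on the training data, so they can overstate features with many distinct values. That is one reason to cross-check them against the other two methods.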

2.2 Permutation Importance
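Permutation importance measures how much the test score drops when one feature's values are shuffled. The sketch below uses `sklearn.inspection.permutation_importance` on synthetic data. The `n_repeats` value, the split, and the data itself are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: real column names, hypothetical values.
rng = np.random.default_rng(0)
n = 800
X = pd.DataFrame({
    "sqft_living": rng.uniform(500, 4000, n),
    "lat": rng.uniform(47.2, 47.8, n),
    "bedrooms": rng.integers(1, 7, n),  # has no effect on price in this toy data
})
price = (
    200 * X["sqft_living"]
    + 500_000 * (X["lat"] - 47.2)
    + rng.normal(0, 30_000, n)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Shuffle each column 10 times on held-out data and record the mean drop in R2
# (R2 is the default score for a regressor).
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)
perm = pd.Series(result.importances_mean, index=X.columns)
perm = perm.sort_values(ascending=False)
print(perm)
```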

Here the importance of lat rises relative to its Gini ranking.

2.3 Boruta
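The idea behind Boruta is to compare each real feature against shuffled "shadow" copies of the features. The sketch below is a simplified stand-in that uses only scikit-learn. It counts how often each feature beats the best shadow feature and omits the statistical test that the real algorithm applies. It is not the BorutaPy implementation that produced the log in this section. The data, the number of rounds, and the forest size are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: real column names, hypothetical values.
rng = np.random.default_rng(0)
n = 400
names = ["sqft_living", "lat", "bedrooms"]
X = np.column_stack([
    rng.uniform(500, 4000, n),
    rng.uniform(47.2, 47.8, n),
    rng.integers(1, 7, n),  # has no effect on price in this toy data
])
price = 200 * X[:, 0] + 500_000 * (X[:, 1] - 47.2) + rng.normal(0, 30_000, n)

n_rounds = 10
hits = dict.fromkeys(names, 0)
for round_id in range(n_rounds):
    # Shadow features: each column shuffled on its own, which breaks its link to price.
    shadows = np.column_stack([rng.permutation(col) for col in X.T])
    model = RandomForestRegressor(n_estimators=50, random_state=round_id)
    model.fit(np.hstack([X, shadows]), price)

    real_scores = model.feature_importances_[: len(names)]
    best_shadow = model.feature_importances_[len(names):].max()

    # A feature scores a hit when it beats the best shadow feature.
    for name, score in zip(names, real_scores):
        hits[name] += int(score > best_shadow)

print(hits)
```

Features that win in nearly every round are candidates for "confirmed", those that rarely win for "rejected", and those in between stay "tentative". The full algorithm makes that decision with a statistical test over the hit counts.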

Iteration: 1 / 50

Confirmed: 0

Tentative: 18

Rejected: 0

Iteration: 2 / 50

Confirmed: 0

Tentative: 18

Rejected: 0

Iteration: 3 / 50

Confirmed: 0

Tentative: 18

Rejected: 0

Iteration: 4 / 50

Confirmed: 0

Tentative: 18

Rejected: 0

Iteration: 5 / 50

Confirmed: 0

Tentative: 18

Rejected: 0

Iteration: 6 / 50

Confirmed: 0

Tentative: 18

Rejected: 0

Iteration: 7 / 50

Confirmed: 0

Tentative: 18

Rejected: 0

Iteration: 8 / 50

Confirmed: 13

Tentative: 0

Rejected: 5

BorutaPy finished running.

Iteration: 9 / 50

Confirmed: 13

Tentative: 0

Rejected: 5

837.3257942199707

Confirmed:

['bathrooms', 'sqft_living', 'sqft_lot', 'waterfront', 'view', 'grade', 'sqft_above', 'yr_built', 'zipcode', 'lat', 'long', 'sqft_living15', 'sqft_lot15']

Tentatives:

['sqft_basement']

Rejected:

['bedrooms', 'floors', 'condition', 'yr_renovated']

3. Feature Comparison

3.1 Gini Importance

3.2 Permutation Importance

3.3 Boruta

4. Model Comparison
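A comparison of this kind can be sketched by retraining the same forest on each selected feature subset and scoring it on the same test set. The subsets, data, and hyperparameters below are hypothetical. They illustrate the procedure, not the selections made in this post.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: real column names, hypothetical values.
rng = np.random.default_rng(0)
n = 800
X = pd.DataFrame({
    "sqft_living": rng.uniform(500, 4000, n),
    "lat": rng.uniform(47.2, 47.8, n),
    "bedrooms": rng.integers(1, 7, n),  # has no effect on price in this toy data
})
price = (
    200 * X["sqft_living"]
    + 500_000 * (X["lat"] - 47.2)
    + rng.normal(0, 30_000, n)
)
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)


def r2_for(columns):
    """Retrain on the given columns and return R2 on the shared test set."""
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train[columns], y_train)
    return r2_score(y_test, model.predict(X_test[columns]))


# Hypothetical subsets standing in for the output of each selection method.
subsets = {
    "all features": list(X.columns),
    "without bedrooms": ["sqft_living", "lat"],
    "without lat": ["sqft_living", "bedrooms"],
}
scores = {name: r2_for(cols) for name, cols in subsets.items()}
for name, score in scores.items():
    print(f"{name}: R2 = {score:.4f}")
```

Keeping the split and the model settings fixed across subsets means that any difference in R2 comes from the feature selection alone.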

******************* Original Model ***********************

Metrics for Random Forest Regressor

Average absolute error: 72704.15 degrees.

Improvement over baseline: 100.0 %.

Accuracy: 86.88 %.

R2 score: 0.8381720745711922

**** Feature selection based on Gini Importance ****

Metrics for Random Forest Regressor

Average absolute error: 81288.41 degrees.

Improvement over baseline: 100.0 %.

Accuracy: 85.56 %.

R2 score: 0.8052584664901095

**** Feature selection based on Permutation Importance *****

Metrics for Random Forest Regressor

Average absolute error: 72741.67 degrees.

Improvement over baseline: 100.0 %.

Accuracy: 86.77 %.

R2 score: 0.8477802122659206

*********** Feature selection based on Boruta **************

Metrics for Random Forest Regressor

Average absolute error: 73254.05 degrees.

Improvement over baseline: 100.0 %.

Accuracy: 86.75 %.

R2 score: 0.8388239891237698

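Feature selection based on Permutation Importance gives the highest R2 score (0.8478, against 0.8382 for the original model), and Boruta stays level with the original model (0.8388), while Gini-based selection lowers R2 to 0.8053. Both Permutation Importance and Boruta are therefore sound feature-selection methods for this dataset.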
