Bagging regressor (the idea behind RF, random forests)
""" 一个原始数据的bagging回归 编辑代码思想的步骤: 1. 根据要实现的需求,导入数据处理和功能调用的包/模块 2. 创建数据 3. 创建变量n_tree:集成回归器棵数 4. 创建存储回归器的存储器 5. 循环1-n_tree的训练和预测: 训练 01:训练循环体中选用抽取方式并调用 训练 02:将x,y从数据表格中取出 训练 03:实例化回归器 训练 04:训练 训练 05:每循环一次回归器存储到存储器 预测 01:重新创建X,Y变量取出数据 预测 02:初始化回归器计算的总值total 预测 03:预测循环体中存储器每一次的predict() 预测 04:total/n_tree 求平均 预测 05:预测y 6. 打分 """
# Ensemble learning comes in 3 flavors: bagging (RF, random forest), boosting (AdaBoost and GBDT), and stacking
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

df_r = pd.DataFrame([[1, 10.56], [2, 27], [3, 39.1], [4, 40.4], [5, 58],
                     [6, 60.5], [7, 79], [8, 87], [9, 90], [10, 95]],
                    columns=['X', 'Y'])
print(df_r)

n_T = 10      # number of trees in the ensemble
Models = []   # container for the fitted regressors
for i in range(n_T):
    # bootstrap: draw len(df_r) rows with replacement
    df2 = df_r.sample(frac=1.0, replace=True)
    X = df2.iloc[:, :-1]   # fit on the bootstrap sample df2, not on df_r
    Y = df2.iloc[:, -1]
    model = DecisionTreeRegressor(max_depth=1)
    model.fit(X, Y)
    Models.append(model)

# prediction: average the trees' outputs on the original data
x = df_r.iloc[:, :-1]
y = df_r.iloc[:, -1]
total = np.zeros(df_r.shape[0])
for t in range(n_T):
    total += Models[t].predict(x)
y_hat = total / n_T
print('y_hat:', y_hat)

# score
print("R:", r2_score(y, y_hat))

# for comparison: a single regressor fit once
model02 = DecisionTreeRegressor(max_depth=1)
model02.fit(x, y)
y_hat02 = model02.predict(x)
print('#' * 100)
print("y_hat02:", y_hat02)
print("One R:", r2_score(y, y_hat02))
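scikit-learn packages this same loop as `BaggingRegressor`; a minimal sketch on the same data, passing the base tree positionally since the keyword was renamed from `base_estimator` to `estimator` in scikit-learn 1.2:

```python
import pandas as pd
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

df_r = pd.DataFrame([[1, 10.56], [2, 27], [3, 39.1], [4, 40.4], [5, 58],
                     [6, 60.5], [7, 79], [8, 87], [9, 90], [10, 95]],
                    columns=['X', 'Y'])
X = df_r.iloc[:, :-1]
y = df_r.iloc[:, -1]

# 10 depth-1 trees, each fit on its own bootstrap sample
bag = BaggingRegressor(DecisionTreeRegressor(max_depth=1),
                       n_estimators=10, random_state=0)
bag.fit(X, y)
print("R:", r2_score(y, bag.predict(X)))
```

Setting `random_state` pins the bootstrap draws, so the score is reproducible between runs, unlike the hand-rolled loop above.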
X Y
0 1 10.56
1 2 27.00
2 3 39.10
3 4 40.40
4 5 58.00
5 6 60.50
6 7 79.00
7 8 87.00
8 9 90.00
9 10 95.00
y_hat: [29.265 29.265 29.265 29.265 78.25 78.25 78.25 78.25 78.25 78.25 ]
R: 0.7622123252516444
####################################################################################################
y_hat02: [29.265 29.265 29.265 29.265 78.25 78.25 78.25 78.25 78.25 78.25 ]
One R: 0.7622123252516444

Note: the bagged result matches the single tree exactly in this run because the trees were fit on the full df_r rather than on the bootstrap sample df2. Once each tree is fit on its own bootstrap draw, y_hat and R change from run to run and generally differ from the single tree's.
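The score printed above is the coefficient of determination, R² = 1 − SS_res/SS_tot. A quick hand check of `r2_score` on tiny illustrative numbers:

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]  # mean = 2.0, SS_tot = (1-2)^2 + (2-2)^2 + (3-2)^2 = 2
y_pred = [1.0, 2.0, 4.0]  # SS_res = 0 + 0 + (4-3)^2 = 1
print(r2_score(y_true, y_pred))  # 1 - 1/2 = 0.5
```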