
Using KNN to Make Predictions on the Iris Dataset or a Custom Dataset

article 2025/4/2 17:44:32

Create a custom dataset:

import numpy as np

# Three groups of 2-D points, one list per class
point1=[[7.7,6.1],[3.1,5.9],[8.6,8.8],[9.5,7.3],[3.9,7.4],[5.0,5.3],[1.0,7.3]]
point2=[[0.2,2.2],[4.5,4.1],[0.5,1.1],[2.7,3.0],[4.7,0.2],[2.9,3.3],[7.3,7.9]]
point3=[[9.2,0.7],[9.2,2.1],[7.3,4.5],[8.9,2.9],[9.5,3.7],[7.7,3.7],[9.4,2.4]]
point_concat = np.concatenate((point1, point2, point3), axis=0)
# Labels: 0 for point1, 1 for point2, 2 for point3
point_concat_label = np.concatenate((np.zeros(len(point1)), np.ones(len(point2)), np.ones(len(point3)) + 1), axis=0)
print(point_concat_label)

Then make predictions on this dataset.
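
To make concrete what "prediction" means for KNN, here is a minimal from-scratch sketch of the nearest-neighbor vote: compute the Euclidean distance from a query point to every training point, take the k closest, and return the most common label among them. The helper name `knn_predict_one` and the toy points are hypothetical and exist only for illustration; this is not how scikit-learn implements the search internally.

```python
import numpy as np
from collections import Counter

def knn_predict_one(train_x, train_y, query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training point
    dists = np.sqrt(((train_x - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Most common label among those neighbors
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up toy data: two well-separated clusters
toy_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [9.0, 8.0], [8.0, 9.0]]
toy_y = [0, 0, 0, 1, 1, 1]

print(knn_predict_one(toy_x, toy_y, [0.5, 0.5]))  # query near the first cluster
print(knn_predict_one(toy_x, toy_y, [8.5, 8.5]))  # query near the second cluster
```

The scikit-learn classifier used in the rest of this article follows the same idea, with a KD-tree to speed up the neighbor search.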

Full code:

from sklearn.neighbors import KNeighborsClassifier
import numpy as np
import matplotlib.pyplot as plt

# Three groups of 2-D points, one list per class
point1=[[7.7,6.1],[3.1,5.9],[8.6,8.8],[9.5,7.3],[3.9,7.4],[5.0,5.3],[1.0,7.3]]
point2=[[0.2,2.2],[4.5,4.1],[0.5,1.1],[2.7,3.0],[4.7,0.2],[2.9,3.3],[7.3,7.9]]
point3=[[9.2,0.7],[9.2,2.1],[7.3,4.5],[8.9,2.9],[9.5,3.7],[7.7,3.7],[9.4,2.4]]
point_concat = np.concatenate((point1, point2, point3), axis=0)
# Labels: 0 for point1, 1 for point2, 2 for point3
point_concat_label = np.concatenate((np.zeros(len(point1)), np.ones(len(point2)), np.ones(len(point3)) + 1), axis=0)
print(point_concat_label)

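# 3 nearest neighbors, KD-tree search, p=2 (Euclidean distance)
n_neighbors = 3
knn = KNeighborsClassifier(n_neighbors=n_neighbors, algorithm='kd_tree', p=2)

# Fit the classifier on all 21 labeled points
knn.fit(point_concat, point_concat_label)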

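# Build a 100 x 100 grid covering the [0, 10] x [0, 10] square
x1 = np.linspace(0, 10, 100)
y1 = np.linspace(0, 10, 100)
x_axis, y_axis = np.meshgrid(x1, y1)

# Flatten the grid into (x, y) pairs and predict a class for every grid node
xy_axis=np.c_[x_axis.ravel(),y_axis.ravel()]
knn_predict_result=knn.predict(xy_axis)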

fig=plt.figure(figsize=(5,5))
ax=fig.add_subplot(111)
# Draw the class boundaries from the grid predictions computed above
ax.contour(x_axis,y_axis,knn_predict_result.reshape(x_axis.shape))

# Plot the training points, one color and marker per class
ax.scatter(point_concat[point_concat_label == 0, 0], point_concat[point_concat_label == 0, 1],color='r', marker='^')
ax.scatter(point_concat[point_concat_label == 1, 0], point_concat[point_concat_label == 1, 1],color='g', marker='*')
ax.scatter(point_concat[point_concat_label == 2, 0], point_concat[point_concat_label == 2, 1],color='b', marker='s')
plt.show()

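Output (a figure with the contour lines of the predicted classes and the three groups of points drawn as red triangles, green stars, and blue squares):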
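
The grid step in the listing above can be hard to follow at 100 x 100, so here is a minimal sketch of the same `meshgrid` / `ravel` / `np.c_` / `reshape` round trip on a tiny 3 x 2 grid. The names `grid_x`, `grid_y`, `grid_points`, and `fake_labels` are assumptions made for this illustration, and `fake_labels` merely stands in for the output of `knn.predict`.

```python
import numpy as np

# A small grid instead of the 100 x 100 grid used in the listing
xs = np.linspace(0, 2, 3)              # three x values: 0, 1, 2
ys = np.linspace(0, 1, 2)              # two y values: 0, 1
grid_x, grid_y = np.meshgrid(xs, ys)   # both arrays have shape (2, 3)

# Flatten both grids and pair them column-wise: one (x, y) row per grid node
grid_points = np.c_[grid_x.ravel(), grid_y.ravel()]   # shape (6, 2)

# Stand-in for knn.predict: any function that returns one value per row
fake_labels = (grid_points[:, 0] > 0.5).astype(int)

# Reshape back to the grid shape so that contour() can draw it
label_grid = fake_labels.reshape(grid_x.shape)

print(grid_points)
print(label_grid)
```

Because `ravel` and `reshape` use the same row-major order, each entry of `label_grid` lines up with the grid node at the same position in `grid_x` and `grid_y`.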

For the Iris dataset:

Full code:

from sklearn.datasets import load_iris
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler,MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier


# Load the Iris data into a DataFrame with readable column names
iris = load_iris()
iris_data1 = pd.DataFrame(data=iris['data'], columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
iris_data1['target']=iris['target']

def plot_iris(data,col1,col2):
    # Scatter plot of two features, colored by class (no regression line)
    sns.lmplot(x=col1,y=col2,data=data,hue='target',fit_reg=False)
    plt.title('data show')
    plt.xlabel(col1)
    plt.ylabel(col2)
    plt.show()

plot_iris(iris_data1,'Sepal_Length','Petal_Width')
# Hold out 30% of the samples as the test set
x_train,x_test,y_train,y_test=train_test_split(iris['data'],iris['target'],test_size=0.3,random_state=42)
print("Training set features:\n", x_train)
print("Test set features:\n", x_test)
print("Training set targets:\n", y_train)
print("Test set targets:\n", y_test)

print("Training set feature shape:\n", x_train.shape)
print("Test set feature shape:\n", x_test.shape)
print("Training set target shape:\n", y_train.shape)
print("Test set target shape:\n", y_test.shape)

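# Alternative scaler: min-max scaling to [0, 1] (created here but not used below)
transfer=MinMaxScaler(feature_range=(0,1))

# Standardize the features; fit the scaler on the training set only and reuse it for the test set.
# The output listed below comes from a run that called fit_transform on the test set as well,
# so the predictions and accuracy of this version may differ from it.
transfer1=StandardScaler()
ret_train_data=transfer1.fit_transform(x_train)
ret_test_data=transfer1.transform(x_test)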

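# Train a 5-nearest-neighbors classifier on the standardized training data
n_neighbors = 5
knn = KNeighborsClassifier(n_neighbors=n_neighbors)
knn.fit(ret_train_data, y_train)

# Predict the test set and compare with the true labels
y_pre=knn.predict(ret_test_data)
print('Predictions:\n',y_pre)
print("Predictions vs. true labels:\n",y_pre==y_test)
score=knn.score(ret_test_data,y_test)
print(f'Accuracy: {score}')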
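
A note on the scaling step: standardization is normally fitted on the training data and then applied unchanged to the test data, so that both sets are measured against the same mean and standard deviation. The sketch below uses made-up one-feature data (the arrays `train` and `test` are assumptions for illustration only) to show how `fit_transform` and `transform` differ on a test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up one-feature data, only to show the mechanics
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[2.0], [4.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns the mean and std of the training data
test_scaled = scaler.transform(test)        # reuses the training mean and std

# Re-fitting on the test set centers it on its own mean instead
test_refit = StandardScaler().fit_transform(test)

print(scaler.mean_)          # mean learned from the training data
print(test_scaled.ravel())   # test values on the training scale
print(test_refit.ravel())    # test values on their own scale
```

With `transform`, the test value 2.0 maps to 0 because it equals the training mean; after re-fitting, the same value maps to -1, so the classifier would see it at a different location than the training points it was fitted on.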

Results of the Iris script:

Training set features:
 [[5.5 2.4 3.7 1. ]
 [6.3 2.8 5.1 1.5]
 [6.4 3.1 5.5 1.8]
 [6.6 3.  4.4 1.4]
 [7.2 3.6 6.1 2.5]
 [5.7 2.9 4.2 1.3]
 [7.6 3.  6.6 2.1]
 [5.6 3.  4.5 1.5]
 [5.1 3.5 1.4 0.2]
 [7.7 2.8 6.7 2. ]
 [5.8 2.7 4.1 1. ]
 [5.2 3.4 1.4 0.2]
 [5.  3.5 1.3 0.3]
 [5.1 3.8 1.9 0.4]
 [5.  2.  3.5 1. ]
 [6.3 2.7 4.9 1.8]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.1 3.3 1.7 0.5]
 [5.6 2.7 4.2 1.3]
 [5.1 3.4 1.5 0.2]
 [5.7 3.  4.2 1.2]
 [7.7 3.8 6.7 2.2]
 [4.6 3.2 1.4 0.2]
 [6.2 2.9 4.3 1.3]
 [5.7 2.5 5.  2. ]
 [5.5 4.2 1.4 0.2]
 [6.  3.  4.8 1.8]
 [5.8 2.7 5.1 1.9]
 [6.  2.2 4.  1. ]
 [5.4 3.  4.5 1.5]
 [6.2 3.4 5.4 2.3]
 [5.5 2.3 4.  1.3]
 [5.4 3.9 1.7 0.4]
 [5.  2.3 3.3 1. ]
 [6.4 2.7 5.3 1.9]
 [5.  3.3 1.4 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 2.4 3.8 1.1]
 [6.7 3.  5.  1.7]
 [4.9 3.1 1.5 0.2]
 [5.8 2.8 5.1 2.4]
 [5.  3.4 1.5 0.2]
 [5.  3.5 1.6 0.6]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.9 3.2 5.7 2.3]
 [6.  2.7 5.1 1.6]
 [6.1 2.6 5.6 1.4]
 [7.7 3.  6.1 2.3]
 [5.5 2.5 4.  1.3]
 [4.4 2.9 1.4 0.2]
 [4.3 3.  1.1 0.1]
 [6.  2.2 5.  1.5]
 [7.2 3.2 6.  1.8]
 [4.6 3.1 1.5 0.2]
 [5.1 3.5 1.4 0.3]
 [4.4 3.  1.3 0.2]
 [6.3 2.5 4.9 1.5]
 [6.3 3.4 5.6 2.4]
 [4.6 3.4 1.4 0.3]
 [6.8 3.  5.5 2.1]
 [6.3 3.3 6.  2.5]
 [4.7 3.2 1.3 0.2]
 [6.1 2.9 4.7 1.4]
 [6.5 2.8 4.6 1.5]
 [6.2 2.8 4.8 1.8]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 5.3 2.3]
 [5.1 3.8 1.6 0.2]
 [6.9 3.1 5.4 2.1]
 [5.9 3.  4.2 1.5]
 [6.5 3.  5.2 2. ]
 [5.7 2.6 3.5 1. ]
 [5.2 2.7 3.9 1.4]
 [6.1 3.  4.6 1.4]
 [4.5 2.3 1.3 0.3]
 [6.6 2.9 4.6 1.3]
 [5.5 2.6 4.4 1.2]
 [5.3 3.7 1.5 0.2]
 [5.6 3.  4.1 1.3]
 [7.3 2.9 6.3 1.8]
 [6.7 3.3 5.7 2.1]
 [5.1 3.7 1.5 0.4]
 [4.9 2.4 3.3 1. ]
 [6.7 3.3 5.7 2.5]
 [7.2 3.  5.8 1.6]
 [4.9 3.6 1.4 0.1]
 [6.7 3.1 5.6 2.4]
 [4.9 3.  1.4 0.2]
 [6.9 3.1 4.9 1.5]
 [7.4 2.8 6.1 1.9]
 [6.3 2.9 5.6 1.8]
 [5.7 2.8 4.1 1.3]
 [6.5 3.  5.5 1.8]
 [6.3 2.3 4.4 1.3]
 [6.4 2.9 4.3 1.3]
 [5.6 2.8 4.9 2. ]
 [5.9 3.  5.1 1.8]
 [5.4 3.4 1.7 0.2]
 [6.1 2.8 4.  1.3]
 [4.9 2.5 4.5 1.7]
 [5.8 4.  1.2 0.2]
 [5.8 2.6 4.  1.2]
 [7.1 3.  5.9 2.1]]
Test set features:
 [[6.1 2.8 4.7 1.2]
 [5.7 3.8 1.7 0.3]
 [7.7 2.6 6.9 2.3]
 [6.  2.9 4.5 1.5]
 [6.8 2.8 4.8 1.4]
 [5.4 3.4 1.5 0.4]
 [5.6 2.9 3.6 1.3]
 [6.9 3.1 5.1 2.3]
 [6.2 2.2 4.5 1.5]
 [5.8 2.7 3.9 1.2]
 [6.5 3.2 5.1 2. ]
 [4.8 3.  1.4 0.1]
 [5.5 3.5 1.3 0.2]
 [4.9 3.1 1.5 0.1]
 [5.1 3.8 1.5 0.3]
 [6.3 3.3 4.7 1.6]
 [6.5 3.  5.8 2.2]
 [5.6 2.5 3.9 1.1]
 [5.7 2.8 4.5 1.3]
 [6.4 2.8 5.6 2.2]
 [4.7 3.2 1.6 0.2]
 [6.1 3.  4.9 1.8]
 [5.  3.4 1.6 0.4]
 [6.4 2.8 5.6 2.1]
 [7.9 3.8 6.4 2. ]
 [6.7 3.  5.2 2.3]
 [6.7 2.5 5.8 1.8]
 [6.8 3.2 5.9 2.3]
 [4.8 3.  1.4 0.3]
 [4.8 3.1 1.6 0.2]
 [4.6 3.6 1.  0.2]
 [5.7 4.4 1.5 0.4]
 [6.7 3.1 4.4 1.4]
 [4.8 3.4 1.6 0.2]
 [4.4 3.2 1.3 0.2]
 [6.3 2.5 5.  1.9]
 [6.4 3.2 4.5 1.5]
 [5.2 3.5 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.2 4.1 1.5 0.1]
 [5.8 2.7 5.1 1.9]
 [6.  3.4 4.5 1.6]
 [6.7 3.1 4.7 1.5]
 [5.4 3.9 1.3 0.4]
 [5.4 3.7 1.5 0.2]]
Training set targets:
 [1 2 2 1 2 1 2 1 0 2 1 0 0 0 1 2 0 0 0 1 0 1 2 0 1 2 0 2 2 1 1 2 1 0 1 2 0
 0 1 1 0 2 0 0 1 1 2 1 2 2 1 0 0 2 2 0 0 0 1 2 0 2 2 0 1 1 2 1 2 0 2 1 2 1
 1 1 0 1 1 0 1 2 2 0 1 2 2 0 2 0 1 2 2 1 2 1 1 2 2 0 1 2 0 1 2]
Test set targets:
 [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0]
Training set feature shape:
 (105, 4)
Test set feature shape:
 (45, 4)
Training set target shape:
 (105,)
Test set target shape:
 (45,)
Predictions:
 [1 0 2 2 2 0 1 2 1 1 2 0 0 0 0 2 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0]
Predictions vs. true labels:
 [ True  True  True False False  True  True  True  True  True  True  True
  True  True  True False  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True]
Accuracy: 0.9333333333333333
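
The accuracy figure can be checked by hand from the two label arrays printed above: it is simply the fraction of test samples whose prediction equals the true label. The short sketch below copies both arrays from the output and recomputes that fraction with NumPy; the variable name `matches` is introduced here for illustration.

```python
import numpy as np

# True test labels and predictions, copied from the output above
y_test = np.array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2, 0,
                   2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0, 0])
y_pre = np.array([1, 0, 2, 2, 2, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 2, 2, 1, 1, 2, 0, 2, 0,
                  2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0, 0])

matches = y_pre == y_test
print(matches.sum(), "correct out of", len(y_test))   # 42 correct out of 45
print(matches.mean())                                  # the same ratio that knn.score reports
print(np.flatnonzero(~matches))                        # positions of the misclassified samples
```

All three misclassified samples (positions 3, 4, and 15) have true class 1 and were predicted as class 2, which gives 42 / 45 ≈ 0.9333.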


http://www.kler.cn/a/516034.html