SVM Model in Practice (1)
Contents
- Preface
- Hands-on
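Preface
We have a dataset of handwritten letters to classify. Using grid search with 5-fold cross-validation, we tune both a LinearSVC and an SVC model, select the SVC model as the final choice, and compute its prediction accuracy on a held-out test set.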
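To make the grid search idea concrete, here is a minimal sketch of what it does under the hood: enumerate every parameter combination and score each one with cross-validation. It uses a small synthetic dataset from `make_classification` as a stand-in for the letter data, and the grid values are illustrative only; both are assumptions for demonstration, not the setup used in this post.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, cross_val_score

# Synthetic stand-in data (assumption: this is NOT the letter dataset)
X, y = make_classification(n_samples=200, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# Illustrative grid: 2 kernels x 3 C values = 6 candidate combinations
grid = ParameterGrid({'kernel': ['rbf', 'linear'], 'C': [0.1, 1, 5]})

results = []
for params in grid:
    # Mean accuracy over 5 cross-validation folds for this combination
    scores = cross_val_score(svm.SVC(**params), X, y, cv=5, scoring='accuracy')
    results.append((scores.mean(), params))

# Keep the combination with the highest mean cross-validated accuracy
best_score, best_params = max(results, key=lambda r: r[0])
print(len(grid), best_params, round(best_score, 4))
```

`GridSearchCV` wraps this same loop, which is why a grid of 4 kernels and 5 values of C with `cv=5` leads to 100 model fits.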
Hands-on
# Import third-party modules
from sklearn import svm
import pandas as pd
from sklearn import model_selection
from sklearn import metrics
# Read the external data
letters = pd.read_csv(r'letterdata.csv')
# Split the data into training and test sets (the first column, letter, is the label)
predictors = letters.columns[1:]
X_train,X_test,y_train,y_test = model_selection.train_test_split(letters[predictors], letters.letter, test_size = 0.25, random_state = 1234)
# Use grid search to select the best C value for the linear SVM class (LinearSVC)
#C=[0.05,0.1,0.5,1,2,5]
#parameters = {'C':C}
#grid_linear_svc = model_selection.GridSearchCV(estimator = svm.LinearSVC(),param_grid =parameters, scoring='accuracy',cv=5,verbose =1)
# Fit the model on the training set
#grid_linear_svc.fit(X_train,y_train)
# Return the best parameter values found by cross-validation
#print(grid_linear_svc.best_params_)
# Use grid search to select the best C value and kernel function for the nonlinear SVM class (SVC)
#kernel=['rbf','linear','poly','sigmoid']
#C=[0.1,0.5,1,2,5]
#parameters = {'kernel':kernel,'C':C}
#grid_svc = model_selection.GridSearchCV(estimator = svm.SVC(), param_grid =parameters,scoring='accuracy',cv=5,verbose =1)
# Fit the model on the training set
#grid_svc.fit(X_train,y_train)
# Return the best parameter values found by cross-validation
#print(grid_svc.best_params_)
#linearsvc =svm.LinearSVC(C=5.0)
#linearsvc.fit(X_train,y_train)
# Predict on the test set
#pred_linear_svc = linearsvc.predict(X_test)
# Prediction accuracy of the model
#print(metrics.accuracy_score(y_test, pred_linear_svc))
# Fit the SVC model with the best parameters found by the grid search
svc = svm.SVC(C=5.0,kernel='rbf')
svc.fit(X_train,y_train)
# Predict on the test set
pred_svc = svc.predict(X_test)
# Prediction accuracy of the model
print(metrics.accuracy_score(y_test, pred_svc))
The model we finally select is SVC, with best parameters {'C': 5, 'kernel': 'rbf'}.
Its accuracy on the test set is:
0.9596
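As a side note, `GridSearchCV` refits the best parameter combination on the whole training set by default (`refit=True`), so the tuned model is available through `best_estimator_` and does not have to be rebuilt by hand. A minimal sketch, again on synthetic stand-in data with an illustrative grid (both are assumptions, not the letter dataset or the grid used above):

```python
from sklearn import svm, metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: this is NOT the letter dataset)
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1234)

# Illustrative grid; refit=True (the default) retrains the best candidate
search = GridSearchCV(estimator=svm.SVC(),
                      param_grid={'kernel': ['rbf', 'linear'], 'C': [0.5, 1, 5]},
                      scoring='accuracy', cv=5)
search.fit(X_train, y_train)

# The refitted model can predict directly, without constructing a new SVC
best_svc = search.best_estimator_
pred = best_svc.predict(X_test)
acc = metrics.accuracy_score(y_test, pred)
print(search.best_params_, round(search.best_score_, 4), round(acc, 4))
```

Here `best_score_` is the mean cross-validated accuracy on the training folds, while `acc` is measured on the held-out test set, so the two numbers will generally differ.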