
Machine Learning: Naive Bayes

article 2025/3/9 10:30:07

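Naive Bayes:

Also known as Bayesian inference. It starts from a subjective judgment and keeps revising that judgment as new evidence comes in, which takes a lot of computation.
	1. Highly subjective
	2. Computationally heavy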
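Bayes' theorem: involves a prior probability and a posterior probability
	The difference:
	Suppose there are two causes of traffic jams when you head out: too many cars, and traffic accidents.
		Prior probability: the probability of a jam, estimated before any new information arrives, is the prior.
		Posterior probability is similar to a conditional probability: just before you leave, the radio reports an accident; the probability of a jam given that report is the posterior.
	The idea of Bayesian inference: first estimate a prior (the expected probability of a jam), then bring in the observed evidence and see whether it strengthens or weakens the prior. The result is a posterior that is closer to reality.
		The posterior is the updated estimate of how likely a jam actually is once the evidence is taken into account.
	The prior is the probability obtained from past experience and analysis.
	The posterior is the probability, computed after a "result" has been observed, of which event most likely produced it.
		A prior that lands close to the posterior means the accumulated experience was accurate. This is not the same thing as maximum likelihood estimation (MLE), which picks the parameters that maximize the likelihood of the observed data and ignores the prior; choosing the hypothesis with the largest posterior is called maximum a posteriori (MAP) estimation.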
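To make the traffic example concrete, here is Bayes' theorem, P(jam | accident) = P(accident | jam) · P(jam) / P(accident), evaluated in a small sketch. All three input probabilities are hypothetical numbers chosen only for illustration; the article does not give any.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' theorem for a binary hypothesis H given evidence E.

    prior             = P(H)
    likelihood        = P(E | H)
    likelihood_if_not = P(E | not H)
    Returns P(H | E).
    """
    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers (not from the article):
p_jam = 0.2                     # prior: P(jam)
p_accident_given_jam = 0.3      # P(accident reported | jam)
p_accident_given_no_jam = 0.05  # P(accident reported | no jam)

p = posterior(p_jam, p_accident_given_jam, p_accident_given_no_jam)
print(f"prior P(jam) = {p_jam:.2f}, posterior P(jam | accident) = {p:.2f}")
```

With these numbers P(accident) = 0.3 · 0.2 + 0.05 · 0.8 = 0.10, so the accident report raises the jam probability from 0.2 to 0.6: the evidence strengthens the prior.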
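In short, Bayesian reasoning starts from an estimate of how likely an event is, then keeps folding in observed results (for example, while training a model) so that the prior is updated step by step toward the true posterior.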
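The "keep adding results" loop can be sketched as repeated Bayesian updates in which each posterior becomes the next prior. This example assumes a hypothetical unknown daily jam rate θ restricted to five candidate values, a flat starting prior, and ten made-up days of observations.

```python
def update(prior, thetas, jammed):
    """One Bayesian update over a grid of candidate jam rates.

    prior:  list of P(theta) for each candidate theta
    thetas: candidate values of the daily jam probability
    jammed: True if a jam was observed today, else False
    """
    # Likelihood of today's observation under each candidate theta
    likelihood = [t if jammed else 1 - t for t in thetas]
    unnormalized = [lk * pr for lk, pr in zip(likelihood, prior)]
    total = sum(unnormalized)  # P(observation): the normalizing constant
    return [u / total for u in unnormalized]


thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
belief = [0.2] * 5  # hypothetical flat prior: every candidate rate equally likely

# Hypothetical observations over ten days (True = jam)
observations = [True, True, False, True, True, True, False, True, True, True]
for obs in observations:
    belief = update(belief, thetas, obs)  # yesterday's posterior is today's prior

for t, b in zip(thetas, belief):
    print(f"theta={t:.1f}  P={b:.3f}")
```

With 8 jams in 10 days, most of the probability mass moves to θ = 0.7 and θ = 0.9; more observations would sharpen the belief further.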
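Naive Bayes assumes the features are conditionally independent given the class; when that assumption is strongly violated, classification quality can suffer.
"Naive" Bayes is Bayes with this conditional-independence assumption, and it is commonly used for document classification.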
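The independence assumption is what makes document classification cheap: P(words | class) becomes a product of per-word probabilities. Below is a minimal from-scratch sketch of a multinomial naive Bayes text classifier, assuming a hypothetical four-document corpus and add-one (Laplace) smoothing; it illustrates the idea and is not scikit-learn's implementation.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (document, label)
train = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project meeting notes", "ham"),
]


def fit(docs):
    """Count documents per class and words per class."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in class_counts}
    for text, label in docs:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(text, class_counts, word_counts, vocab, alpha=1.0):
    n_docs = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        # log P(c): the prior, estimated from class frequencies
        score = math.log(class_counts[c] / n_docs)
        total = sum(word_counts[c].values())
        for w in text.split():
            # Naive assumption: words are independent given the class, so
            # log P(words | c) is a sum of per-word log probabilities.
            # Add-alpha smoothing avoids log(0) for words unseen in class c.
            p_w = (word_counts[c][w] + alpha) / (total + alpha * len(vocab))
            score += math.log(p_w)
        scores[c] = score
    # The class with the highest (unnormalized) log posterior wins
    return max(scores, key=scores.get), scores


model = fit(train)
label, scores = predict("buy cheap pills", *model)
print(label, scores)
```

Working in log space turns the product of many small probabilities into a sum, which avoids numerical underflow on long documents.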
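scikit-learn (sklearn) provides several naive Bayes models; the three most commonly used are:
	1. Gaussian model (GaussianNB), for continuous features
	2. Bernoulli model (BernoulliNB), for binary (present/absent) features
	3. Multinomial model (MultinomialNB), for count features such as word counts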
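A quick sketch of the three classes side by side, each fitted on the kind of data it is designed for. The tiny arrays are hypothetical and only meant to show the calling pattern, which is the same `fit`/`predict` interface for all three.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical labels shared by all three toy datasets
y = np.array([0, 0, 0, 1, 1, 1])

# Continuous measurements -> GaussianNB
X_continuous = np.array([[1.0], [1.2], [0.8], [3.0], [3.2], [2.9]])
# 0/1 presence flags -> BernoulliNB
X_binary = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]])
# Non-negative counts (e.g. word counts) -> MultinomialNB
X_counts = np.array([[3, 0], [2, 1], [4, 0], [0, 3], [1, 4], [0, 2]])

results = {}
for model, X in [(GaussianNB(), X_continuous),
                 (BernoulliNB(), X_binary),
                 (MultinomialNB(), X_counts)]:
    model.fit(X, y)
    results[type(model).__name__] = model.predict(X)
    print(type(model).__name__, results[type(model).__name__])
```

Picking the variant is mostly about matching the feature type: feeding real-valued measurements to MultinomialNB or raw counts to BernoulliNB usually throws information away.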

Bayes and the Gaussian distribution:

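Also called the normal distribution. Its density is f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)).
The formula has two parameters: μ, the mean, and σ, the standard deviation. The mean sits at the center (the peak) of the normal curve, and σ controls how wide the curve is.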

[Figure: the Gaussian distribution curve; original image by Jiaxxxxxx]
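The normal density is easy to evaluate directly. This sketch uses only the standard library; `gaussian_pdf` is a hypothetical helper name, not a library function.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


mu, sigma = 0.0, 1.0
for x in [-2, -1, 0, 1, 2]:
    print(f"x={x:+d}  f(x)={gaussian_pdf(x, mu, sigma):.4f}")
```

The values are symmetric around μ and largest at x = μ. Gaussian naive Bayes evaluates this density for every feature, using a separate μ and σ² per class estimated from the training data.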

Gaussian model:

The API to use is:
	from sklearn.naive_bayes import GaussianNB
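	Instantiating the model requires no arguments; GaussianNB is a lightweight, easy-to-use class. It has only two optional parameters, priors and var_smoothing, so there is little room for tuning; when its accuracy is not good enough, people usually switch to a different model.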

The code:

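from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
import sklearn.datasets as datasets
# Load the iris dataset
iris = datasets.load_iris()
print(iris)
feature = iris['data']
target = iris['target']
# Split features and labels into training and test sets
# (25% test by default; the split is random, so results vary between runs)
train_x, test_x, train_y, test_y = train_test_split(feature, target)
print(test_x)
# Create a Gaussian naive Bayes model instance
model = GaussianNB()
# Fit on the training set: estimates the mean μ and variance σ² of each feature for each class
model.fit(train_x, train_y)
# Predict the sixth test sample (index 5); reshape it into a 2-D array of shape (1, n_features)
x_pred = model.predict(test_x[5].reshape((1, -1)))
print(x_pred)
# Probability that the sample belongs to each class
print(model.predict_proba(test_x[5].reshape((1, -1))))
# Natural log of the same probabilities
print(model.predict_log_proba(test_x[5].reshape((1, -1))))

# Mean accuracy on the test set
score = model.score(test_x, test_y)
print(score)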

Output:

{'data': array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.2],
       [5. , 3.2, 1.2, 0.2],
       [5.5, 3.5, 1.3, 0.2],
       [4.9, 3.6, 1.4, 0.1],
       [4.4, 3. , 1.3, 0.2],
       [5.1, 3.4, 1.5, 0.2],
       [5. , 3.5, 1.3, 0.3],
       [4.5, 2.3, 1.3, 0.3],
       [4.4, 3.2, 1.3, 0.2],
       [5. , 3.5, 1.6, 0.6],
       [5.1, 3.8, 1.9, 0.4],
       [4.8, 3. , 1.4, 0.3],
       [5.1, 3.8, 1.6, 0.2],
       [4.6, 3.2, 1.4, 0.2],
       [5.3, 3.7, 1.5, 0.2],
       [5. , 3.3, 1.4, 0.2],
       [7. , 3.2, 4.7, 1.4],
       [6.4, 3.2, 4.5, 1.5],
       [6.9, 3.1, 4.9, 1.5],
       [5.5, 2.3, 4. , 1.3],
       [6.5, 2.8, 4.6, 1.5],
       [5.7, 2.8, 4.5, 1.3],
       [6.3, 3.3, 4.7, 1.6],
       [4.9, 2.4, 3.3, 1. ],
       [6.6, 2.9, 4.6, 1.3],
       [5.2, 2.7, 3.9, 1.4],
       [5. , 2. , 3.5, 1. ],
       [5.9, 3. , 4.2, 1.5],
       [6. , 2.2, 4. , 1. ],
       [6.1, 2.9, 4.7, 1.4],
       [5.6, 2.9, 3.6, 1.3],
       [6.7, 3.1, 4.4, 1.4],
       [5.6, 3. , 4.5, 1.5],
       [5.8, 2.7, 4.1, 1. ],
       [6.2, 2.2, 4.5, 1.5],
       [5.6, 2.5, 3.9, 1.1],
       [5.9, 3.2, 4.8, 1.8],
       [6.1, 2.8, 4. , 1.3],
       [6.3, 2.5, 4.9, 1.5],
       [6.1, 2.8, 4.7, 1.2],
       [6.4, 2.9, 4.3, 1.3],
       [6.6, 3. , 4.4, 1.4],
       [6.8, 2.8, 4.8, 1.4],
       [6.7, 3. , 5. , 1.7],
       [6. , 2.9, 4.5, 1.5],
       [5.7, 2.6, 3.5, 1. ],
       [5.5, 2.4, 3.8, 1.1],
       [5.5, 2.4, 3.7, 1. ],
       [5.8, 2.7, 3.9, 1.2],
       [6. , 2.7, 5.1, 1.6],
       [5.4, 3. , 4.5, 1.5],
       [6. , 3.4, 4.5, 1.6],
       [6.7, 3.1, 4.7, 1.5],
       [6.3, 2.3, 4.4, 1.3],
       [5.6, 3. , 4.1, 1.3],
       [5.5, 2.5, 4. , 1.3],
       [5.5, 2.6, 4.4, 1.2],
       [6.1, 3. , 4.6, 1.4],
       [5.8, 2.6, 4. , 1.2],
       [5. , 2.3, 3.3, 1. ],
       [5.6, 2.7, 4.2, 1.3],
       [5.7, 3. , 4.2, 1.2],
       [5.7, 2.9, 4.2, 1.3],
       [6.2, 2.9, 4.3, 1.3],
       [5.1, 2.5, 3. , 1.1],
       [5.7, 2.8, 4.1, 1.3],
       [6.3, 3.3, 6. , 2.5],
       [5.8, 2.7, 5.1, 1.9],
       [7.1, 3. , 5.9, 2.1],
       [6.3, 2.9, 5.6, 1.8],
       [6.5, 3. , 5.8, 2.2],
       [7.6, 3. , 6.6, 2.1],
       [4.9, 2.5, 4.5, 1.7],
       [7.3, 2.9, 6.3, 1.8],
       [6.7, 2.5, 5.8, 1.8],
       [7.2, 3.6, 6.1, 2.5],
       [6.5, 3.2, 5.1, 2. ],
       [6.4, 2.7, 5.3, 1.9],
       [6.8, 3. , 5.5, 2.1],
       [5.7, 2.5, 5. , 2. ],
       [5.8, 2.8, 5.1, 2.4],
       [6.4, 3.2, 5.3, 2.3],
       [6.5, 3. , 5.5, 1.8],
       [7.7, 3.8, 6.7, 2.2],
       [7.7, 2.6, 6.9, 2.3],
       [6. , 2.2, 5. , 1.5],
       [6.9, 3.2, 5.7, 2.3],
       [5.6, 2.8, 4.9, 2. ],
       [7.7, 2.8, 6.7, 2. ],
       [6.3, 2.7, 4.9, 1.8],
       [6.7, 3.3, 5.7, 2.1],
       [7.2, 3.2, 6. , 1.8],
       [6.2, 2.8, 4.8, 1.8],
       [6.1, 3. , 4.9, 1.8],
       [6.4, 2.8, 5.6, 2.1],
       [7.2, 3. , 5.8, 1.6],
       [7.4, 2.8, 6.1, 1.9],
       [7.9, 3.8, 6.4, 2. ],
       [6.4, 2.8, 5.6, 2.2],
       [6.3, 2.8, 5.1, 1.5],
       [6.1, 2.6, 5.6, 1.4],
       [7.7, 3. , 6.1, 2.3],
       [6.3, 3.4, 5.6, 2.4],
       [6.4, 3.1, 5.5, 1.8],
       [6. , 3. , 4.8, 1.8],
       [6.9, 3.1, 5.4, 2.1],
       [6.7, 3.1, 5.6, 2.4],
       [6.9, 3.1, 5.1, 2.3],
       [5.8, 2.7, 5.1, 1.9],
       [6.8, 3.2, 5.9, 2.3],
       [6.7, 3.3, 5.7, 2.5],
       [6.7, 3. , 5.2, 2.3],
       [6.3, 2.5, 5. , 1.9],
       [6.5, 3. , 5.2, 2. ],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]]), 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]), 'frame': None, 'target_names': array(['setosa', 'versicolor', 'virginica'], dtype='<U10'), 'DESCR': '.. _iris_dataset:\n\nIris plants dataset\n--------------------\n\n**Data Set Characteristics:**\n\n    :Number of Instances: 150 (50 in each of three classes)\n    :Number of Attributes: 4 numeric, predictive attributes and the class\n    :Attribute Information:\n        - sepal length in cm\n        - sepal width in cm\n        - petal length in cm\n        - petal width in cm\n        - class:\n                - Iris-Setosa\n                - Iris-Versicolour\n                - Iris-Virginica\n                \n    :Summary Statistics:\n\n    ============== ==== ==== ======= ===== ====================\n                    Min  Max   Mean    SD   Class Correlation\n    ============== ==== ==== ======= ===== ====================\n    sepal length:   4.3  7.9   5.84   0.83    0.7826\n    sepal width:    2.0  4.4   3.05   0.43   -0.4194\n    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)\n    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)\n    ============== ==== ==== ======= ===== ====================\n\n    :Missing Attribute Values: None\n    :Class Distribution: 33.3% for each of 3 classes.\n    :Creator: R.A. Fisher\n    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n    :Date: July, 1988\n\nThe famous Iris database, first used by Sir R.A. Fisher. The dataset is taken\nfrom Fisher\'s paper. Note that it\'s the same as in R, but not as in the UCI\nMachine Learning Repository, which has two wrong data points.\n\nThis is perhaps the best known database to be found in the\npattern recognition literature.  Fisher\'s paper is a classic in the field and\nis referenced frequently to this day.  (See Duda & Hart, for example.)  The\ndata set contains 3 classes of 50 instances each, where each class refers to a\ntype of iris plant.  
One class is linearly separable from the other 2; the\nlatter are NOT linearly separable from each other.\n\n.. topic:: References\n\n   - Fisher, R.A. "The use of multiple measurements in taxonomic problems"\n     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to\n     Mathematical Statistics" (John Wiley, NY, 1950).\n   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.\n     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.\n   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System\n     Structure and Classification Rule for Recognition in Partially Exposed\n     Environments".  IEEE Transactions on Pattern Analysis and Machine\n     Intelligence, Vol. PAMI-2, No. 1, 67-71.\n   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions\n     on Information Theory, May 1972, 431-433.\n   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II\n     conceptual clustering system finds 3 classes in the data.\n   - Many, many more ...', 'feature_names': ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'], 'filename': 'iris.csv', 'data_module': 'sklearn.datasets.data'}
[[6.7 3.  5.2 2.3]
 [6.2 2.9 4.3 1.3]
 [6.  2.9 4.5 1.5]
 [5.7 4.4 1.5 0.4]
 [6.3 3.4 5.6 2.4]
 [4.9 2.5 4.5 1.7]
 [6.7 3.3 5.7 2.5]
 [5.4 3.  4.5 1.5]
 [5.1 3.4 1.5 0.2]
 [6.4 3.2 5.3 2.3]
 [5.1 3.3 1.7 0.5]
 [5.5 2.6 4.4 1.2]
 [5.1 3.7 1.5 0.4]
 [5.8 2.7 5.1 1.9]
 [7.9 3.8 6.4 2. ]
 [6.4 2.9 4.3 1.3]
 [6.3 2.3 4.4 1.3]
 [6.5 3.  5.5 1.8]
 [6.9 3.1 5.4 2.1]
 [6.3 3.3 6.  2.5]
 [6.7 3.1 4.7 1.5]
 [6.7 2.5 5.8 1.8]
 [6.3 2.5 4.9 1.5]
 [5.  2.3 3.3 1. ]
 [6.  2.7 5.1 1.6]
 [7.2 3.6 6.1 2.5]
 [7.7 3.8 6.7 2.2]
 [6.9 3.1 5.1 2.3]
 [6.5 2.8 4.6 1.5]
 [5.5 2.3 4.  1.3]
 [7.4 2.8 6.1 1.9]
 [4.9 2.4 3.3 1. ]
 [4.7 3.2 1.3 0.2]
 [6.4 2.8 5.6 2.1]
 [5.1 3.5 1.4 0.3]
 [6.3 2.7 4.9 1.8]
 [6.1 2.9 4.7 1.4]
 [5.4 3.4 1.7 0.2]]
[1]
[[1.92430987e-112 9.67308348e-001 3.26916523e-002]]
[[-2.57234963e+02 -3.32379639e-02 -3.42063552e+00]]
0.9473684210526315
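Reading the last four output lines: the sixth test sample gets probability ≈ 0.967 for class 1 (versicolor), so `predict` returns [1]; the log-probabilities are natural logs of those values (ln 0.967 ≈ −0.033); and 0.947 is the fraction of test samples classified correctly (36 of 38). The sketch below checks these relationships on a reproducible split. `random_state=0` is an added assumption, so its numbers will differ from the run above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)
model = GaussianNB().fit(train_x, train_y)

sample = test_x[5].reshape(1, -1)
proba = model.predict_proba(sample)
log_proba = model.predict_log_proba(sample)

# predict_log_proba is the natural log of predict_proba
same_log = np.allclose(np.exp(log_proba), proba)
# predict picks the class with the largest probability
same_pred = model.predict(sample)[0] == model.classes_[np.argmax(proba)]
# score() is plain accuracy on the given data
same_score = model.score(test_x, test_y) == accuracy_score(test_y, model.predict(test_x))

print(same_log, same_pred, same_score)
```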
