A Method for Measuring the Confidence of Classification Results Based on Bagging and Outliers

Technology No.: 15691537 | Updated: 2017-06-24 04:49

The invention discloses a method for measuring the confidence of classification results based on Bagging and outliers. First, one of Logistic regression, support vector machine, and naive Bayes is used as the base classifier to classify the data whose confidence is to be measured, and the classification probability for each class is computed. This yields a classification result set and a classification probability set for that data, and the classification result is obtained from the classification result set. In the classification probability set, each class is treated as a point in space: the point corresponding to the classification result is taken as the outlier, and the points corresponding to the remaining classes form a cluster. Finally, Euclidean distance is used to compare each cluster point's distance to the cluster centroid with its distance to the outlier. If every point in the cluster is closer to the cluster centroid than to the outlier, the classification result is trusted; otherwise it is untrusted. The invention prevents untrusted classification results from affecting the trained model when the model is relearned.
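The distance test described in the abstract can be sketched in a few lines of Python. This is only one reading of the text, not the patentees' implementation: it assumes the classification probability set is a matrix with one row per bagged model and one column per class, so that each class becomes a point whose coordinates are the probabilities the individual models assign to it. The function name `is_trusted` and the sample numbers are hypothetical.

```python
import math

def is_trusted(prob_set, predicted):
    """prob_set[m][k] is the probability model m assigns to class k;
    predicted is the column index of the final classification result."""
    n_classes = len(prob_set[0])
    # One point per class: its probabilities under every bagged model.
    points = [[row[k] for row in prob_set] for k in range(n_classes)]
    outlier = points[predicted]
    cluster = [p for k, p in enumerate(points) if k != predicted]
    # Centroid of the cluster formed by the non-predicted classes.
    centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
    # Trusted only if every cluster point is closer to the centroid
    # than to the outlier (Euclidean distance).
    return all(math.dist(p, centroid) < math.dist(p, outlier) for p in cluster)

# Three models, three classes; class 0 is the voted result in both cases.
clear_cut = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.85, 0.05, 0.10]]
close_call = [[0.45, 0.44, 0.11], [0.46, 0.45, 0.09], [0.44, 0.46, 0.10]]
print(is_trusted(clear_cut, 0))   # True
print(is_trusted(close_call, 0))  # False
```

In the second case the runner-up class sits almost on top of the predicted class, so it is nearer to the outlier than to its own cluster centroid and the result is rejected. Under this reading, a two-class problem leaves a single-point cluster whose centroid distance is zero, so the test passes whenever the two class points differ.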

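【Summary of Technical Implementation Steps】
A Method for Measuring the Confidence of Classification Results Based on Bagging and Outliers
The present invention belongs to the field of classification result confidence measurement, and in particular relates to a method for measuring the confidence of classification results based on Bagging and outliers.
Background
Using the data to be measured to improve model accuracy is an important part of online learning, so keeping the learning data accurate becomes especially important. A classification result confidence measure judges, after each classification, whether the result is trusted or untrusted, which matters greatly for maintaining the training set and retraining the model. Traditionally, the classification results of models such as Logistic regression, SVM, and naive Bayes are not subjected to any confidence measurement, so when the model is relearned, the effect of learning from untrusted classification results cannot be avoided. Existing research by Yan Yunyang, Zhu Quanyin, and others includes:
Yan Yunyang, Wu Xiyin, Du Jing, Zhou Jingbo, Liu Yi'an. Video flame detection based on color and flicker frequency features. Journal of Frontiers of Computer Science and Technology, 2014, 08(10): 1271-1279;
S Gao, J Yang, Y Yan. A novel multiphase active contour model for inhomogeneous image segmentation. Multimedia Tools and Applications, 2014, 72(3): 2321-2337;
S Gao, J Yang, Y Yan. A local modified Chan–Vese model for segmenting inhomogeneous multiphase images. International Journal of Imaging Systems and Technology, 2012, 22(2): 103-113;
Liu Jinling, Yan Yunyang. A context-based text classification method for short messages. Computer Engineering, 2011, 37(10): 41-43;
Yan...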

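【Technical Protection Points】
A method for measuring the confidence of classification results based on Bagging and outliers, characterized by comprising the following steps:
Step 1: apply the Bagging ensemble learning method to the existing trusted data set, using one of Logistic regression, support vector machine, and naive Bayes as the base classifier, to obtain the set of classification models of the base classifier;
Step 2: using the set of classification models obtained in Step 1, classify the data whose confidence is to be measured and compute the classification probability for each class, obtaining the classification result set and the classification probability set of that data; then tally the classification result set to obtain the classification result of that data;
Step 3: using outlier analysis, measure the confidence of the classification result of that data, obtaining the trusted data and the untrusted data among the data to be measured, and add the data that satisfy the confidence condition to the existing trusted data set.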
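Steps 1 and 2 can be illustrated with a small, dependency-free Python sketch. Several things here are assumptions rather than details from the patent: the base classifier is a hand-written Gaussian naive Bayes (one of the three options named in the claim), the variance smoothing constant `1e-3` is arbitrary, "tallying" the result set is read as a majority vote, and all function names are hypothetical. In practice an existing ensemble implementation such as scikit-learn's `BaggingClassifier` would normally replace the hand-rolled parts.

```python
import math
import random
from collections import Counter

def fit_gaussian_nb(X, y, classes):
    """Fit per-class prior, feature means and variances (tiny Gaussian NB)."""
    model = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        if not rows:  # class absent from this bootstrap sample
            continue
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-3
                     for col, m in zip(zip(*rows), means)]
        model[c] = (len(rows) / len(X), means, variances)
    return model

def predict_proba(model, x, classes):
    """Posterior probability for each class (0.0 for classes the model never saw)."""
    log_post = {}
    for c, (prior, means, variances) in model.items():
        ll = math.log(prior)
        for v, m, var in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_post[c] = ll
    top = max(log_post.values())  # subtract the max for numerical stability
    weights = {c: math.exp(lp - top) for c, lp in log_post.items()}
    total = sum(weights.values())
    return [weights.get(c, 0.0) / total for c in classes]

def bagging_fit(X, y, n_models=10, seed=0):
    """Step 1: one base model per bootstrap resample of the trusted data set."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_gaussian_nb([X[i] for i in idx],
                                      [y[i] for i in idx], classes))
    return models, classes

def bagging_classify(models, classes, x):
    """Step 2: per-model probabilities and labels, then a majority vote."""
    prob_set = [predict_proba(m, x, classes) for m in models]
    result_set = [classes[row.index(max(row))] for row in prob_set]
    final = Counter(result_set).most_common(1)[0][0]
    return final, result_set, prob_set

# Toy trusted data set: three well-separated classes in two dimensions.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9],
     [4.9, 5.0], [0.1, 5.0], [0.0, 5.2], [0.2, 4.9]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
models, classes = bagging_fit(X, y)
final, result_set, prob_set = bagging_classify(models, classes, [5.0, 5.0])
```

Here `result_set` plays the role of the classification result set and `prob_set` the classification probability set, with one row per bagged model and one column per class in `classes`.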

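【Summary of Technical Features】
1. A method for measuring the confidence of classification results based on Bagging and outliers, characterized by comprising the following steps:
Step 1: apply the Bagging ensemble learning method to the existing trusted data set, using one of Logistic regression, support vector machine, and naive Bayes as the base classifier, to obtain the set of classification models of the base classifier;
Step 2: using the set of classification models obtained in Step 1, classify the data whose confidence is to be measured and compute the classification probability for each class, obtaining the classification result set and the classification probability set of that data; then tally the classification result set to obtain the classification result of that data;
Step 3: using outlier analysis, measure the confidence of the classification result of that data, obtaining the trusted data and the untrusted data among the data to be measured, and add the data that satisfy the confidence condition to the existing trusted data set.
2. The method for measuring the confidence of classification results based on Bagging and outliers according to claim 1, characterized in that the set of classification models of the base classifier in Step 1 is obtained as follows:
Step 1.1: define the features and the class attribute of the existing trusted data set;
Step 1.2: select one of Logistic regression, support vector machine, and naive Bayes as the base classifier Function;
Step 1.3: apply the Bagging ensemble learning method to the existing trusted data set defined in Step 1.1, with the Function selected in Step 1.2 as the base classifier, to obtain the set of classification models of Function.
3. The method for measuring the confidence of classification results based on Bagging and outliers according to claim 1, characterized in that the classification result of the data whose confidence is to be measured in Step 2 is obtained as follows:
Step 2.1: classify the data whose confidence is to be measured and compute the classification probability for each class, obtaining the classification...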
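The last part of Step 3 in claim 1, separating trusted from untrusted data and growing the trusted data set, might look like the sketch below. The classifier and the confidence test are passed in as callables because the claims shown here are truncated before they specify them in detail; the function names and the `(label, prob_set)` return convention are assumptions made purely for illustration.

```python
def split_by_confidence(candidates, classify, is_trusted):
    """Partition candidates into trusted (sample, label) pairs and untrusted samples.

    classify(x) is assumed to return (label, prob_set);
    is_trusted(prob_set, label) is assumed to return True or False.
    """
    trusted, untrusted = [], []
    for x in candidates:
        label, prob_set = classify(x)
        if is_trusted(prob_set, label):
            trusted.append((x, label))
        else:
            untrusted.append(x)
    return trusted, untrusted

def extend_trusted_set(X, y, trusted):
    """Return a new trusted data set with the trusted samples and labels appended."""
    return X + [x for x, _ in trusted], y + [label for _, label in trusted]
```

Only the samples that pass the confidence test are carried into the next round of training, which is how the method keeps untrusted predictions out of the relearned model.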

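【Patent Attributes】
Inventors: Yan Yunyang, Qu Xuexin, Zhu Quanyin, Yu Shimin, Zhao Yang, Tang Haibo, Pan Shuxin
Applicant (patentee): Huaiyin Institute of Technology
Type: Invention
Country/Province/City: Jiangsu, 32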
