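Compute the empirical conditional entropy. :param trainDataArr_DevFeature: array holding only the selected feature's column after slicing :param trainLabelArr: array of labels :return: the empirical conditional entropy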
(trainDataArr_DevFeature, trainLabelArr)
| 99 | return H_D |
| 100 | |
| 101 | def calcH_D_A(trainDataArr_DevFeature, trainLabelArr): |
| 102 | ''' |
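| 103 | Compute the empirical conditional entropy |
| 104 | :param trainDataArr_DevFeature: array holding only the selected feature's column after slicing |
| 105 | :param trainLabelArr: array of labels |
| 106 | :return: the empirical conditional entropy |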
| 107 | ''' |
| 108 | # Initialize the accumulator to 0 |
| 109 | H_D_A = 0 |
| 110 | # Put the feature column into a set so that it holds each distinct value the feature currently takes |
| 111 | trainDataSet = set(trainDataArr_DevFeature) |
| 112 | |
| 113 | # For each value of the feature, add the corresponding term of the empirical conditional entropy |
| 114 | for i in trainDataSet: |
| 115 | # Compute H(D|A) |
| 116 | # trainDataArr_DevFeature[trainDataArr_DevFeature == i].size / trainDataArr_DevFeature.size: |Di| / |D| |
| 117 | # calc_H_D(trainLabelArr[trainDataArr_DevFeature == i]): H(Di) |
| 118 | H_D_A += trainDataArr_DevFeature[trainDataArr_DevFeature == i].size / trainDataArr_DevFeature.size \ |
| 119 | * calc_H_D(trainLabelArr[trainDataArr_DevFeature == i]) |
| 120 | # Return the resulting empirical conditional entropy |
| 121 | return H_D_A |
| 122 | |
| 123 | def calcBestFeature(trainDataList, trainLabelList): |
| 124 | ''' |
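
The quantity computed by `calcH_D_A` in the listing above is H(D|A) = Σ_i (|Di| / |D|) · H(Di), where Di is the subset of samples whose feature takes its i-th value. The following is a minimal, dependency-free sketch of that formula on a four-sample toy dataset. It is not the repository's implementation: the names `entropy` and `conditional_entropy` are hypothetical, plain lists stand in for the NumPy arrays and boolean masks used in the listing, and the base-2 logarithm is an assumption, since the body of `calc_H_D` is not shown here.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Empirical entropy H(D) of a sequence of class labels (base 2 assumed)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def conditional_entropy(feature_values, labels):
    """Empirical conditional entropy H(D|A) = sum_i |Di|/|D| * H(Di)."""
    # Group the labels by the feature value of the sample they belong to.
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    # Weight each subset's entropy by the fraction of samples it contains.
    return sum(len(subset) / total * entropy(subset)
               for subset in groups.values())


# Toy data: one binary feature, four samples.
feature = [0, 0, 1, 1]
labels = [0, 1, 1, 1]

h_d = entropy(labels)                          # H(D)
h_d_a = conditional_entropy(feature, labels)   # H(D|A)
gain = h_d - h_d_a                             # information gain g(D, A)

print(round(h_d, 3))   # 0.811
print(h_d_a)           # 0.5
print(round(gain, 3))  # 0.311
```

Here the feature value 0 selects the labels `[0, 1]` (entropy 1 bit) and the value 1 selects `[1, 1]` (entropy 0), so H(D|A) = 0.5 · 1 + 0.5 · 0 = 0.5.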