Machine Learning in Action Notes 09: Tree Regression, Code Error Fixes

Reader submission 787 2022-08-23


from numpy import *

def loadDataSet(fileName):
    dataMat = []
    fr = open(fileName)
    for line in fr.readlines():
        curLine = line.strip().split('\t')
        fltLine = list(map(float, curLine))  # list() keeps this a list on Python 3 as well
        dataMat.append(fltLine)
    return dataMat

def binSplitDataSet(dataSet, feature, value):
    mat0 = dataSet[nonzero(dataSet[:,feature] > value)[0],:]   # first fix
    mat1 = dataSet[nonzero(dataSet[:,feature] <= value)[0],:]  # first fix
    return mat0, mat1

def regLeaf(dataSet):  # returns the value used for each leaf
    return mean(dataSet[:,-1])

def regErr(dataSet):
    return var(dataSet[:,-1]) * shape(dataSet)[0]

def chooseBestSplit(dataSet, leafType=regLeaf, errType=regErr, ops=(1,4)):
    tolS = ops[0]; tolN = ops[1]
    # if all the target variables are the same value: quit and return value
    if len(set(dataSet[:,-1].T.tolist()[0])) == 1:  # exit cond 1
        return None, leafType(dataSet)
    m,n = shape(dataSet)
    # the choice of the best feature is driven by reduction in RSS error from mean
    S = errType(dataSet)
    bestS = inf; bestIndex = 0; bestValue = 0
    for featIndex in range(n-1):
        # for splitVal in set(dataSet[:,featIndex]):
        for splitVal in set((dataSet[:, featIndex].T.A.tolist())[0]):  # second fix
            mat0, mat1 = binSplitDataSet(dataSet, featIndex, splitVal)
            if (shape(mat0)[0] < tolN) or (shape(mat1)[0] < tolN): continue
            newS = errType(mat0) + errType(mat1)
            if newS < bestS:
                bestIndex = featIndex
                bestValue = splitVal
                bestS = newS
    # if the decrease (S-bestS) is less than a threshold don't do the split
    if (S - bestS) < tolS:
        return None, leafType(dataSet)  # exit cond 2
    mat0, mat1 = binSplitDataSet(dataSet, bestIndex, bestValue)
    if (shape(mat0)[0] < tolN) or (shape(mat1)[0] < tolN):  # exit cond 3
        return None, leafType(dataSet)
    # returns the best feature to split on and the value used for that split
    return bestIndex, bestValue

def createTree(dataSet, leafType=regLeaf, errType=regErr, ops=(1,4)):
    feat, val = chooseBestSplit(dataSet, leafType, errType, ops)  # choose the best split
    if feat is None: return val  # if the splitting hit a stop condition return val
    retTree = {}
    retTree['spInd'] = feat
    retTree['spVal'] = val
    lSet, rSet = binSplitDataSet(dataSet, feat, val)
    retTree['left'] = createTree(lSet, leafType, errType, ops)
    retTree['right'] = createTree(rSet, leafType, errType, ops)
    return retTree

# testMat = mat(eye(4))
# mat0, mat1 = binSplitDataSet(testMat, 1, 0.5)
# print testMat
# print mat0
# print mat1

myDat = loadDataSet('ex00.txt')
myMat = mat(myDat)
createTree(myMat)
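The commented-out test near the end of the listing splits a 4x4 identity matrix on feature 1 at the value 0.5. Below is a minimal standalone sketch of that split, so the expected shapes can be checked without the data file. It is not the listing's exact code: it assumes a plain ndarray and a boolean mask instead of the matrix type and nonzero indexing, and the name bin_split is hypothetical.

```python
import numpy as np

def bin_split(data, feature, value):
    """Split the rows of a 2-D array on one feature (hypothetical helper)."""
    mask = data[:, feature] > value   # True for rows whose feature exceeds the value
    return data[mask], data[~mask]    # rows above the threshold, then all other rows

test = np.eye(4)
above, rest = bin_split(test, 1, 0.5)
print(above)  # the rows whose feature 1 is greater than 0.5
print(rest)   # the remaining rows
```

Both returned parts keep every matching row, which is what createTree relies on when it recurses into the left and right subsets.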

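I use Python 2.7, but typing in the book's source code exactly as printed always failed at runtime. I found two errors in the code; you can compare my code with the book's version. The corrected places are marked in the listing (regTrees.py).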
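The second marked fix can be reproduced in isolation. The sketch below uses a small made-up matrix (an assumption for illustration, not data from ex00.txt). Iterating over a matrix column yields 1x1 matrices, which are unhashable, so calling set() on the column raises TypeError. Converting the column to a flat Python list first avoids this.

```python
import numpy as np

# Made-up data for illustration only.
data = np.matrix([[1.0, 0.5],
                  [2.0, 0.7],
                  [1.0, 0.9]])
column = data[:, 0]  # still a 2-D matrix of shape (3, 1)

try:
    set(column)      # each element is a 1x1 matrix, which cannot be hashed
    raised = False
except TypeError:
    raised = True

# The form used in the corrected listing: transpose, convert to ndarray,
# convert to a nested list, then take the single inner list of floats.
split_values = set(column.T.A.tolist()[0])
print(raised)
print(sorted(split_values))
```

Plain Python floats are hashable, so the resulting set holds the distinct candidate split values for that feature.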
