python keras bert ChnSentiCorp sentiment analysis: hotel review classification

Submitted by a reader, 2022-08-23

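Today I helped a foreign friend build a Keras-based Chinese sentiment analysis model with Kashgari [1]. I did not strip punctuation, stop words and the like; interested readers can add that themselves. I'm sharing the code here.

Dataset: ChnSentiCorp. Environment: ubuntu 16.04, python 3.6, tensorflow-gpu 2.2.4.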
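For readers who want that preprocessing step, here is a minimal sketch. It assumes character-level input, which is what bert-base-chinese consumes. The stop-character set `STOP_CHARS` is tiny and hypothetical, and the function name `strip_punct_and_stopwords` is likewise made up for illustration. Punctuation is detected through Unicode general categories, so full-width Chinese marks such as ， and ！ are removed along with ASCII ones.

```python
import unicodedata

# Hypothetical, deliberately tiny stop-character set for illustration only;
# a real project would load a full Chinese stop-word list instead.
STOP_CHARS = {'的', '了'}


def strip_punct_and_stopwords(text, stop_chars=STOP_CHARS):
    """Drop punctuation (any Unicode 'P*' category) and stop characters."""
    kept = []
    for ch in text:
        if unicodedata.category(ch).startswith('P'):
            continue  # catches ASCII and full-width Chinese punctuation alike
        if ch in stop_chars:
            continue
        kept.append(ch)
    return ''.join(kept)


print(strip_punct_and_stopwords('酒店的位置很好，服务也不错！'))  # → 酒店位置很好服务也不错
```

Negation characters such as 不 are intentionally kept out of the stop list. Dropping them would flip the sentiment of reviews like 不好.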

pip install kashgari

Code

```python
import os

import pandas as pd
from kashgari.embeddings import BERTEmbedding
from kashgari.tasks.classification import CNNLSTMModel

root_path = './dataset/chnsenticorp'


def tokenize(text):
    # bert-base-chinese has a character-level vocabulary for Chinese, so each
    # review is split into single characters. Multi-character words (e.g. from
    # jieba) would mostly be missing from the vocab and map to [UNK].
    return list(text)


def load_split(file_name):
    """Read one TSV split (columns `label`, `text_a`) into token lists and labels."""
    df = pd.read_csv(os.path.join(root_path, file_name), sep='\t')
    df = df.dropna(subset=['text_a'])
    x = df['text_a'].astype(str).apply(tokenize).tolist()
    # String labels: 1 -> positive, 0 -> negative.
    y = df['label'].astype(int).map({1: 'positive', 0: 'negative'}).tolist()
    return x, y


train_x, train_y = load_split('train.tsv')
# As in the original setup, the test split also serves as validation data.
test_x, test_y = load_split('test.tsv')
print(train_x[0][:10], train_y[0])  # sanity check: first tokens and label

# sequence_length=30 fixes the input length: longer reviews are truncated,
# shorter ones padded.
bert_embedding = BERTEmbedding('bert-base-chinese', sequence_length=30)
model = CNNLSTMModel(bert_embedding)

model.fit(train_x, train_y, x_validate=test_x, y_validate=test_y, epochs=10)
# Prints a per-class precision/recall/F1 report on the test split.
model.evaluate(test_x, test_y)
```
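The original script ended with a commented-out accuracy printout (`scores[1]*100`). Here is a hedged sketch that computes accuracy directly from true and predicted labels, without assuming anything about what `evaluate` returns. The predicted labels are assumed to come from something like `model.predict(test_x)`, returning one label per review. The helper name `accuracy` is hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    if not y_true:
        raise ValueError('cannot compute accuracy on empty label lists')
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels for illustration; in practice these would be test_y and the
# predicted labels for test_x.
y_true = ['positive', 'negative', 'positive', 'positive']
y_pred = ['positive', 'negative', 'negative', 'positive']
print('Accuracy: %.2f%%' % (accuracy(y_true, y_pred) * 100))  # → Accuracy: 75.00%
```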

References

[1] Kashgari. https://github.com/BrikerMan/Kashgari
