Goodreads-books (a dataset about books)

User submission 806 2022-11-11

Original text:

Goodreads-books

A comprehensive list of all books listed on Goodreads

The primary reason for creating this dataset is the need for a good, clean dataset of books. Being a bookie myself (see what I did there?), I searched for book datasets on Kaggle itself, and I found that while most of them listed a good number of books, they had either a) major columns missing or b) grossly unclean data. I mean, you can't determine how good a book is just from a few text reviews, come on! What I needed were numbers: solid integers and floats that say how many people liked or hated the book, how much they liked it, and so on. Even the one good, well-cleaned dataset that I found had a number of interlinked files, which increased the hassle. This prompted me to use the Goodreads API to build a well-cleaned dataset with only the promising features (minus the redundant ones), and the result is the dataset you're looking at now.

