Jester Dataset - FinClip official website

User submission, 2022-10-09

Original text:

4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users, collected between April 1999 and May 2003.

Freely available for research use when acknowledged with the following reference:

Eigentaste: A Constant Time Collaborative Filtering Algorithm. Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Information Retrieval, 4(2), 133-151. July 2001.

(Aside: many papers, including ours, report Normalized Mean Absolute Error (NMAE) rates of approx 20%. How good is this compared with random guessing? In the Appendix to our paper, we show that if user ratings are uniformly distributed, random guessing yields NMAE = 33%.)
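The 33% baseline can be checked numerically. The sketch below assumes that NMAE is the mean absolute error divided by the rating range (20 for Jester), and that both the true ratings and the guesses are drawn independently and uniformly from [-10, +10]. The function name, sample size, and seed are illustrative choices, not taken from the paper.

```python
import random

R_MIN, R_MAX = -10.0, 10.0  # Jester rating range

def random_guess_nmae(n_samples=200_000, seed=0):
    """Estimate NMAE when uniform ratings are predicted by uniform random guesses.

    Assumption: NMAE = MAE / (R_MAX - R_MIN).
    """
    rng = random.Random(seed)
    total_abs_error = 0.0
    for _ in range(n_samples):
        true_rating = rng.uniform(R_MIN, R_MAX)
        guess = rng.uniform(R_MIN, R_MAX)
        total_abs_error += abs(true_rating - guess)
    mae = total_abs_error / n_samples
    return mae / (R_MAX - R_MIN)

nmae = random_guess_nmae()
print(f"estimated NMAE for random guessing: {nmae:.3f}")
```

For two independent uniform draws on an interval of width w, the expected absolute difference is w/3, so the estimate should land close to 1/3, consistent with the 33% figure quoted above.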

As a courtesy, if you use the data, I would appreciate knowing your name, what research group you are in, and the publications that may result.

The Jester Dataset (save to disk, then unzip to obtain Excel files):

jester-data-1.zip : (3.9MB) Data from 24,983 users who have rated 36 or more jokes, a matrix with dimensions 24,983 x 101.

jester-data-2.zip : (3.6MB) Data from 23,500 users who have rated 36 or more jokes, a matrix with dimensions 23,500 x 101.

jester-data-3.zip : (2.1MB) Data from 24,938 users who have rated between 15 and 35 jokes, a matrix with dimensions 24,938 x 101.
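As a quick sanity check on the figures above, the three user counts can be summed and compared with the 73,421 total, and the overall fill rate of the rating matrix can be estimated. This is only a sketch: "4.1 million" is a rounded number, so the density is approximate, and the variable names are illustrative.

```python
# User counts per file, as listed above.
users_per_file = {
    "jester-data-1.zip": 24_983,
    "jester-data-2.zip": 23_500,
    "jester-data-3.zip": 24_938,
}
N_JOKES = 100
APPROX_RATINGS = 4_100_000  # "4.1 million" is a rounded figure

total_users = sum(users_per_file.values())
total_cells = total_users * N_JOKES          # every user x every joke
approx_density = APPROX_RATINGS / total_cells

print(total_users)              # 73421
print(f"{approx_density:.0%}")  # 56%
```

So the three files together cover all 73,421 users, and a little over half of all possible user-joke pairs carry a rating.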

Format:

The 3 data files contain anonymous ratings data from 73,421 users.

Data files are in .zip format; when unzipped, they are in Excel (.xls) format.

Ratings are real values ranging from -10.00 to +10.00 (the value "99" corresponds to "null" = "not rated").

One row per user

The first column gives the number of jokes rated by that user. The next 100 columns give the ratings for jokes 01 - 100.

The sub-matrix including only columns {5, 7, 8, 13, 15, 16, 17, 18, 19, 20} is dense. Almost all users have rated those jokes (see discussion of "universal queries" in the above paper).
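The row layout described above can be handled as in the following minimal sketch. It assumes each row has already been read from the spreadsheet into a list of 101 numbers (reading the .xls file itself needs a spreadsheet library and is not shown), and it interprets the set {5, 7, 8, ...} as 1-based joke numbers rather than raw column positions. The function names and the example row are made up for illustration.

```python
NOT_RATED = 99  # sentinel used in the data files for "not rated"
DENSE_JOKES = [5, 7, 8, 13, 15, 16, 17, 18, 19, 20]  # assumed 1-based joke numbers

def parse_row(row):
    """Split one 101-value row into (rated_count, ratings), with None for unrated jokes."""
    if len(row) != 101:
        raise ValueError(f"expected 101 values, got {len(row)}")
    rated_count = int(row[0])
    ratings = [None if value == NOT_RATED else float(value) for value in row[1:]]
    return rated_count, ratings

def dense_submatrix(rows):
    """Keep only the ratings for the densely rated jokes, one list per user."""
    result = []
    for row in rows:
        _, ratings = parse_row(row)
        # Joke k sits at index k - 1 in the ratings list.
        result.append([ratings[joke - 1] for joke in DENSE_JOKES])
    return result

# Made-up example row: a user who rated only the ten dense jokes, 1.5 each.
example = [10] + [1.5 if j in DENSE_JOKES else 99 for j in range(1, 101)]
count, ratings = parse_row(example)
print(count, sum(r is not None for r in ratings))
print(dense_submatrix([example])[0])
```

Replacing the sentinel with None before doing any arithmetic avoids accidentally averaging 99 into a user's ratings, which would otherwise pull every mean far outside the -10.00 to +10.00 range.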

