Netflix Prize Data - FinClip official website

User submission 1034 2022-09-02

Original text:

Netflix Prize data

Dataset from Netflix's competition to improve their recommendation algorithm

Context

Netflix held the Netflix Prize open competition for the best algorithm to predict user ratings for films. The grand prize was $1,000,000 and was won by BellKor's Pragmatic Chaos team. This is the dataset that was used in that competition.

Content

This comes directly from the README:

TRAINING DATASET FILE DESCRIPTION

The file "training_set.tar" is a tar of a directory containing 17770 files, one per movie. The first line of each file contains the movie id followed by a colon. Each subsequent line in the file corresponds to a rating from a customer and its date in the following format:

CustomerID,Rating,Date

- MovieIDs range from 1 to 17770 sequentially.
- CustomerIDs range from 1 to 2649429, with gaps. There are 480189 users.
- Ratings are on a five star (integral) scale from 1 to 5.
- Dates have the format YYYY-MM-DD.
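
The per-movie layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the README: the names `Rating` and `parse_movie_file` are hypothetical, and the sample lines are invented for illustration rather than taken from the dataset.

```python
from datetime import date
from typing import Iterable, Iterator, NamedTuple


class Rating(NamedTuple):
    movie_id: int
    customer_id: int
    rating: int
    rated_on: date


def parse_movie_file(lines: Iterable[str]) -> Iterator[Rating]:
    """Yield one Rating per data line of a single per-movie training file."""
    it = iter(lines)
    header = next(it).strip()  # first line is "<MovieID>:"
    if not header.endswith(":"):
        raise ValueError(f"expected 'MovieID:' header, got {header!r}")
    movie_id = int(header[:-1])
    for raw in it:
        line = raw.strip()
        if not line:
            continue  # defensive: tolerate a trailing blank line
        customer_id, stars, day = line.split(",")
        rating = int(stars)
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range in line {line!r}")
        # Dates are YYYY-MM-DD, which date.fromisoformat parses directly.
        yield Rating(movie_id, int(customer_id), rating, date.fromisoformat(day))


# Invented sample content in the documented format (not real dataset rows).
sample = ["42:\n", "1001,4,2004-03-15\n", "2002,1,2005-11-30\n"]
ratings = list(parse_movie_file(sample))
```

To read a real file, pass an open file object instead of `sample`, since iterating over a text file yields its lines.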

MOVIES FILE DESCRIPTION

Movie information in "movie_titles.txt" is in the following format:

MovieID,YearOfRelease,Title

- MovieIDs do not correspond to actual Netflix movie ids or IMDB movie ids.
- YearOfRelease can range from 1890 to 2005 and may correspond to the release of the corresponding DVD, not necessarily its theatrical release.
- Title is the Netflix movie title and may not correspond to titles used on other sites. Titles are in English.

QUALIFYING AND PREDICTION DATASET FILE DESCRIPTION

The qualifying dataset for the Netflix Prize is contained in the text file "qualifying.txt". It consists of lines indicating a movie id, followed by a colon, and then customer ids and rating dates, one per line for that movie id. The movie and customer ids are contained in the training set. Of course the ratings are withheld. There are no empty lines in the file.

MovieID1:
CustomerID11,Date11
CustomerID12,Date12
…
MovieID2:
CustomerID21,Date21
CustomerID22,Date22

For the Netflix Prize, your program must predict all the ratings the customers gave the movies in the qualifying dataset based on the information in the training dataset.

The format of your submitted prediction file follows the movie and customer id, date order of the qualifying dataset. However, your predicted rating takes the place of the corresponding customer id (and date), one per line.

For example, if the qualifying dataset looked like:

111:
3245,2005-12-19
5666,2005-12-23
6789,2005-03-14
225:
1234,2005-05-26
3456,2005-11-07

then a prediction file should look something like:

111:
3.0
3.4
4.0
225:
1.0
2.0

which predicts that customer 3245 would have rated movie 111 3.0 stars on the 19th of December, 2005, that customer 5666 would have rated it slightly higher at 3.4 stars on the 23rd of December, 2005, etc.

You must make predictions for all customers for all movies in the qualifying dataset.
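
The mapping from qualifying lines to prediction lines can be sketched as below. This is an illustration under stated assumptions: `predict_lines` and the `Predictor` signature are hypothetical names, the constant 3.0 predictor is a placeholder rather than a real model, and the one-decimal output simply mirrors the example above.

```python
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical signature: (movie_id, customer_id, date_string) -> predicted stars.
Predictor = Callable[[int, int, str], float]


def predict_lines(qualifying: Iterable[str], predict: Predictor) -> Iterator[str]:
    """Echo each 'MovieID:' header and replace each pair line with a rating."""
    movie_id: Optional[int] = None
    for raw in qualifying:
        line = raw.strip()
        if not line:
            continue  # the README says there are none; skipped defensively
        if line.endswith(":"):
            movie_id = int(line[:-1])
            yield line  # headers are copied through unchanged
        else:
            if movie_id is None:
                raise ValueError("customer line appeared before any movie header")
            customer_id, day = line.split(",")
            yield f"{predict(movie_id, int(customer_id), day):.1f}"


# The qualifying example from the text, with a placeholder constant predictor.
qualifying = [
    "111:", "3245,2005-12-19", "5666,2005-12-23", "6789,2005-03-14",
    "225:", "1234,2005-05-26", "3456,2005-11-07",
]
lines_out = list(predict_lines(qualifying, lambda movie, customer, day: 3.0))
```

Because output is produced in input order, the result keeps the movie and customer ordering of the qualifying file, which is what the submission format requires.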

THE PROBE DATASET FILE DESCRIPTION

To allow you to test your system before you submit a prediction set based on the qualifying dataset, we have provided a probe dataset in the file "probe.txt". This text file contains lines indicating a movie id, followed by a colon, and then customer ids, one per line for that movie id.

MovieID1:
CustomerID11
CustomerID12
…
MovieID2:
CustomerID21
CustomerID22

Like the qualifying dataset, the movie and customer id pairs are contained in the training set. However, unlike the qualifying dataset, the ratings (and dates) for each pair are contained in the training dataset.

If you wish, you may calculate the RMSE of your predictions against those ratings and compare your RMSE against the Cinematch RMSE on the same data. See http://netflixprize.com/faq#probe for that value.
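
RMSE (root mean squared error) is the square root of the mean of the squared differences between predicted and actual ratings. The sketch below reads probe pairs and scores a set of predictions against known ratings. The names `probe_pairs` and `rmse` are hypothetical, and the ratings and predictions are invented for illustration. In practice the actual ratings would be looked up in the training set.

```python
import math
from typing import Iterable, Iterator, Optional, Sequence, Tuple


def probe_pairs(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (movie_id, customer_id) pairs from probe-format lines."""
    movie_id: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            movie_id = int(line[:-1])
        else:
            if movie_id is None:
                raise ValueError("customer line appeared before any movie header")
            yield movie_id, int(line)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Invented probe content and ratings, for illustration only.
probe = ["111:", "3245", "5666", "225:", "1234"]
known = {(111, 3245): 3, (111, 5666): 4, (225, 1234): 1}

pairs = list(probe_pairs(probe))
actual = [known[pair] for pair in pairs]
predicted = [3.0 for _ in pairs]  # placeholder constant prediction
score = rmse(predicted, actual)
```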
