Two Weeks of Machine Learning: Learning Results

Reader submission 886 2022-10-28

JQW: Learning Results

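1. KNN: principles and worked cases

1.1 Case 1: Iris species prediction

1.1.1 Dataset
1.1.2 Code
1.1.3 Results

1.2 Case 2: Predicting Facebook check-in locations

1.2.1 Dataset
1.2.2 Code
1.2.3 Results

2. Linear regression: principles and worked case

2.1 Case: Boston house price prediction

2.1.1 Dataset
2.1.2 Code
2.1.3 Results

3. Logistic regression: principles and worked case

3.1 Case: Cancer classification (benign vs. malignant breast tumors)

3.1.1 Dataset
3.1.2 Code
3.1.3 Results

4. Decision trees: principles and worked case

4.1 Case: Titanic passenger survival prediction

4.1.1 Dataset
4.1.2 Code
4.1.3 Results

5. Clustering: principles and worked case

5.1 Case: Segmenting users by product-category preference, with dimensionality reduction

5.1.1 Dataset
5.1.2 Code
5.1.3 Results

6. Other results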

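1. KNN: principles and worked cases

1.1 Case 1: Iris species prediction

1.1.1 Dataset

The Iris dataset is a widely used benchmark for classification experiments, compiled by Fisher in 1936. Also known as the iris flower dataset, it is a multivariate dataset. It contains 150 samples split evenly across three species (setosa, versicolor, virginica), each described by four features: sepal length, sepal width, petal length, and petal width.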
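The structure of the dataset can be checked directly from scikit-learn's bundled copy. The snippet below is a minimal sketch; the variable names are illustrative and not part of the original write-up.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Feature matrix: one row per flower, one column per measurement.
n_samples, n_features = iris.data.shape
print(n_samples, n_features)
print(iris.feature_names)

# Count how many samples belong to each species.
samples_per_class = {
    str(name): int((iris.target == index).sum())
    for index, name in enumerate(iris.target_names)
}
print(samples_per_class)
```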

1.1.2 Code

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# 1. Load the dataset
iris = load_iris()

# 2. Basic data handling
# x_train, x_test, y_train, y_test are the training features, test features,
# training targets, and test targets
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=22)

# 3. Feature engineering: standardization (fit on the training set, reuse on the test set)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

# 4. Machine learning (model training)
estimator = KNeighborsClassifier(n_neighbors=9)
estimator.fit(x_train, y_train)

# 5. Model evaluation
# Method 1: compare predictions with the true values
y_predict = estimator.predict(x_test)
print("Predictions:\n", y_predict)
print("Predictions vs. true values:\n", y_predict == y_test)

# Method 2: compute the accuracy directly
score = estimator.score(x_test, y_test)
print("Accuracy:\n", score)
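To make the idea behind KNeighborsClassifier concrete, here is a minimal from-scratch sketch of k-nearest-neighbor classification: measure the Euclidean distance from the query to every training point, keep the k closest, and take a majority vote. The function `knn_predict` and the toy points are hypothetical illustrations, not scikit-learn's implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, then keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Two well-separated toy clusters labelled "a" and "b".
train_points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
prediction_near_b = knn_predict(train_points, train_labels, (4.9, 5.1), k=3)
print(prediction_near_a, prediction_near_b)
```

Because the vote depends on raw distances, features on larger scales would dominate, which is why the case above standardizes the features before fitting.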

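1.1.3 Results

1.2 Case 2: Predicting Facebook check-in locations

1.2.1 Dataset

The goal of the competition is to predict which place a person would like to check in to. For the competition, Facebook created an artificial world of more than 100,000 places located in a 10 km by 10 km square. For a given set of coordinates, the task is to return a ranked list of the most likely places. The data was fabricated to resemble location signals from mobile devices, giving a flavor of what it takes to work with real data complicated by inaccurate and noisy values. Inconsistent and erroneous location data can disrupt the experience of services such as Facebook Check In. Data description: the business a user is checking in to is predicted from the user's location, the location accuracy, and the timestamp.

Files: train.csv, test.csv
row_id: ID of the check-in event
x, y: coordinates
accuracy: location accuracy
time: timestamp
place_id: ID of the business; this is the prediction target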
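The raw timestamp is not very informative for a distance-based model, so the case code derives day, hour, and weekday from it. The sketch below shows that step on a few hypothetical rows (the values are made up, not competition data) and assumes, as the case code does, that the timestamp is in seconds.

```python
import pandas as pd

# Hypothetical rows that mimic the columns described above.
sample = pd.DataFrame({
    "row_id": [0, 1, 2],
    "x": [2.1, 2.3, 2.4],
    "y": [2.2, 2.0, 2.4],
    "accuracy": [54, 13, 74],
    "time": [0, 90000, 1000000],  # interpreted as seconds since 1970-01-01
    "place_id": [101, 102, 101],
})

# Convert to datetimes, then pull out calendar components as new features.
timestamps = pd.DatetimeIndex(pd.to_datetime(sample["time"], unit="s"))
sample["day"] = timestamps.day
sample["hour"] = timestamps.hour
sample["weekday"] = timestamps.weekday  # Monday = 0

print(sample[["time", "day", "hour", "weekday"]])
```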

1.2.2 Code

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# 1. Load the dataset
facebook = pd.read_csv("./data/FBlocation/train.csv")

# 2. Basic data handling
# 2.1 Narrow the data range (copy so new columns are not written to a view)
facebook_data = facebook.query("x>2.0 & x<2.5 & y>2.0 & y<2.5").copy()

# 2.2 Derive time features
time = pd.to_datetime(facebook_data["time"], unit="s")
time = pd.DatetimeIndex(time)
facebook_data["day"] = time.day
facebook_data["hour"] = time.hour
facebook_data["weekday"] = time.weekday

# 2.3 Drop places with few check-ins
place_count = facebook_data.groupby("place_id").count()
place_count = place_count[place_count["row_id"] > 3]
facebook_data = facebook_data[facebook_data["place_id"].isin(place_count.index)]

# 2.4 Select the features and the target
x = facebook_data[["x", "y", "accuracy", "day", "hour", "weekday"]]
y = facebook_data["place_id"]

# 2.5 Split the dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)

# 3. Feature engineering: preprocessing (standardization)
# 3.1 Instantiate a transformer
transfer = StandardScaler()
# 3.2 Fit on the training set, then apply the same scaling to the test set
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

# 4. Machine learning: KNN + cross-validation
# 4.1 Instantiate an estimator
estimator = KNeighborsClassifier()
# 4.2 Wrap it in GridSearchCV
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}
estimator = GridSearchCV(estimator, param_grid=param_grid, cv=5)
# 4.3 Train the model
estimator.fit(x_train, y_train)

# 5. Model evaluation
# 5.1 Predictions
y_pre = estimator.predict(x_test)
print("Predictions:\n", y_pre)
# 5.2 Accuracy
score = estimator.score(x_test, y_test)
print("Accuracy:\n", score)
# 5.3 Other metrics
print("Best model:\n", estimator.best_estimator_)
print("Best cross-validation score:\n", estimator.best_score_)
print("Full cross-validation results:\n", estimator.cv_results_)
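One possible refinement of the grid search above, offered as a sketch rather than as the author's method: putting the scaler and the classifier in a scikit-learn Pipeline makes each cross-validation fold fit the scaler on its own training part only. The data here is synthetic (`make_classification`) and merely stands in for the six features used in the case; the step names "scale" and "knn" are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real features (x, y, accuracy, day, hour, weekday).
features, labels = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=22,
)
x_train, x_test, y_train, y_test = train_test_split(features, labels, random_state=22)

# Scaling happens inside the pipeline, so it is refit for every CV fold.
pipeline = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9]}
search = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
search.fit(x_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(x_test, y_test)
print(best_k, search.best_score_, test_accuracy)
```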

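1.2.3 Results

2. Linear regression: principles and worked case

2.1 Case: Boston house price prediction

2.1.1 Dataset

Attribute	Description	Type
CRIM	Per-capita crime rate by town	Continuous
ZN	Proportion of residential land zoned for lots over 25,000 sq. ft.	Continuous
INDUS	Proportion of non-retail business land	Continuous
CHAS	Whether the tract is next to the Charles River	Discrete: 1 = adjacent, 0 = not adjacent
NOX	Nitric oxide concentration	Continuous
RM	Average number of rooms per dwelling	Continuous
AGE	Proportion of owner-occupied units built before 1940	Continuous
DIS	Weighted distance to five Boston employment centers	Continuous
RAD	Index of accessibility to radial highways	Continuous
TAX	Full-value property tax rate	Continuous
PTRATIO	Pupil-teacher ratio	Continuous
B	1000(Bk - 0.63)^2, where Bk is the proportion of Black residents	Continuous
LSTAT	Proportion of lower-income population	Continuous
MEDV	Median value of homes	Continuous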

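2.1.2 Code

# Note: load_boston was removed in scikit-learn 1.2, so this code needs an older release.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_squared_error


def linear_model1():
    """
    Linear regression: normal equation
    :return: None
    """
    # 1. Load the data
    data = load_boston()

    # 2. Split the dataset
    x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, random_state=22)

    # 3. Feature engineering: standardization (fit on train, reuse on test)
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # 4. Machine learning: linear regression (normal equation)
    estimator = LinearRegression()
    estimator.fit(x_train, y_train)

    # 5. Model evaluation
    # 5.1 Predictions and fitted parameters
    y_predict = estimator.predict(x_test)
    print("Predictions:\n", y_predict)
    print("Model coefficients:\n", estimator.coef_)
    print("Model intercept:\n", estimator.intercept_)

    # 5.2 Metric: mean squared error
    error = mean_squared_error(y_test, y_predict)
    print("Mean squared error:\n", error)


def linear_model2():
    """
    Linear regression: gradient descent
    :return: None
    """
    # 1. Load the data
    data = load_boston()

    # 2. Split the dataset
    x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, random_state=22)

    # 3. Feature engineering: standardization (fit on train, reuse on test)
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # 4. Machine learning: linear regression (stochastic gradient descent)
    estimator = SGDRegressor(max_iter=1000)
    estimator.fit(x_train, y_train)

    # 5. Model evaluation
    # 5.1 Predictions and fitted parameters
    y_predict = estimator.predict(x_test)
    print("Predictions:\n", y_predict)
    print("Model coefficients:\n", estimator.coef_)
    print("Model intercept:\n", estimator.intercept_)

    # 5.2 Metric: mean squared error
    error = mean_squared_error(y_test, y_predict)
    print("Mean squared error:\n", error)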
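The "normal equation" named in the first function has a closed form, w = (XᵀX)⁻¹Xᵀy. The NumPy sketch below illustrates it on synthetic, noiseless data with made-up weights, so the recovered parameters can be compared with the known ones. It is an illustration of the formula, not a description of how LinearRegression is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(22)
features = rng.normal(size=(50, 3))
true_weights = np.array([2.0, -1.0, 0.5])
true_bias = 4.0
targets = features @ true_weights + true_bias  # noiseless, so the fit is exact

# Append a column of ones so the bias is learned as one more weight.
design = np.hstack([features, np.ones((features.shape[0], 1))])

# Normal equation: solve (X^T X) w = X^T y instead of forming the inverse explicitly.
solution = np.linalg.solve(design.T @ design, design.T @ targets)
weights, bias = solution[:3], solution[3]
print(weights, bias)
```

Gradient descent, as in the second function, reaches the same minimizer iteratively, which tends to matter when the feature matrix is too large to solve in closed form.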

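2.1.3 Results

3. Logistic regression: principles and worked case

3.1 Case: Cancer classification (benign vs. malignant breast tumors)

3.1.1 Dataset

(1) 699 samples with 11 columns: the first column is an ID used for lookup, the next 9 columns are medical features of the tumor, and the last column is a numeric code for the tumor type. (2) The data contains 16 missing values, marked with "?".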
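Because the missing values are stored as the string "?", they have to be turned into real NaN values before pandas can drop them. A minimal sketch on a hypothetical three-column miniature table (the values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature table; only the "?" convention comes from the dataset description.
toy = pd.DataFrame({
    "Sample code number": [1001, 1002, 1003, 1004],
    "Bare Nuclei": ["1", "?", "10", "3"],
    "Class": [2, 2, 4, 2],
})

# Replace the "?" marker with NaN, drop incomplete rows, then restore a numeric dtype.
cleaned = (
    toy.replace(to_replace="?", value=np.nan)
    .dropna()
    .astype({"Bare Nuclei": int})
)
print(cleaned)
```

With 16 missing values out of 699 rows, dropping the affected rows loses little data; imputing them would be an alternative.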

3.1.2 Code

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import ssl

# Skip HTTPS certificate verification when downloading the data
ssl._create_default_https_context = ssl._create_unverified_context

# 1. Load the data
names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
# The data URL was lost from the original post; set DATA_URL before running.
DATA_URL = ""
data = pd.read_csv(DATA_URL, names=names)
print(data.head())

# 2. Basic data handling
# 2.1 Missing values: turn "?" into NaN, then drop those rows
data = data.replace(to_replace="?", value=np.nan)
data = data.dropna()

# 2.2 Select the features and the target
x = data.iloc[:, 1:10]
print(x.head())
y = data["Class"]
print(y.head())

# 2.3 Split the data
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)

# 3. Feature engineering (standardization)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

# 4. Machine learning (logistic regression)
estimator = LogisticRegression()
estimator.fit(x_train, y_train)

# 5. Model evaluation
y_predict = estimator.predict(x_test)
print(y_predict)
print(estimator.score(x_test, y_test))
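What the fitted LogisticRegression does at prediction time can be reproduced by hand: compute the linear score, pass it through the sigmoid, and threshold at 0.5. The sketch below checks this on synthetic binary data from `make_classification` (a stand-in with 9 features, not the tumor data).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem with labels 0 and 1.
features, labels = make_classification(n_samples=200, n_features=9, random_state=22)
model = LogisticRegression().fit(features, labels)

# Linear score w.x + b, then the sigmoid turns it into P(class 1).
linear_score = features @ model.coef_[0] + model.intercept_[0]
manual_probability = 1.0 / (1.0 + np.exp(-linear_score))
library_probability = model.predict_proba(features)[:, 1]

# Thresholding the probability at 0.5 gives the predicted label.
manual_labels = (manual_probability > 0.5).astype(int)
print(np.allclose(manual_probability, library_probability))
```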

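3.1.3 Results

4. Decision trees: principles and worked case

4.1 Case: Titanic passenger survival prediction

4.1.1 Dataset

The titanic and titanic2 data frames describe the survival status of individual passengers on the Titanic. The dataset used here was started by a variety of researchers; it includes passenger lists created by many researchers and edited by Michael A. Findlay. The features extracted from the dataset are ticket class, survival, travel class, age, port of embarkation, home.dest, room, ticket, boat, and sex.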

4.1.2 Code

import pandas as pd
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# 1. Load the data
# The data URL was lost from the original post; set DATA_URL before running.
DATA_URL = ""
titan = pd.read_csv(DATA_URL)

# 2. Basic data handling
# 2.1 Select the features and the target (copy so the fill below does not touch a view)
x = titan[["pclass", "age", "sex"]].copy()
y = titan["survived"]

# 2.2 Missing values: fill missing ages with the mean age
x["age"] = x["age"].fillna(x["age"].mean())

# 2.3 Split the dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)

# 3. Feature engineering (dictionary feature extraction)
# Categorical features need one-hot encoding. DictVectorizer expects a list of
# dicts, so each DataFrame is converted with to_dict(orient="records"), e.g.
# [{"pclass": "1st", "age": 29.00, "sex": "female"}, {}]
transfer = DictVectorizer(sparse=False)
x_train = transfer.fit_transform(x_train.to_dict(orient="records"))
x_test = transfer.transform(x_test.to_dict(orient="records"))

# 4. Machine learning (decision tree)
# If max_depth is not set, the tree keeps splitting on the entropy criterion
# until it cannot split further; setting max_depth limits the size of the tree.
estimator = DecisionTreeClassifier(criterion="entropy", max_depth=5)
estimator.fit(x_train, y_train)

# 5. Model evaluation
ret = estimator.score(x_test, y_test)
print(ret)
y_pre = estimator.predict(x_test)
print(y_pre)
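The dictionary feature extraction step is easier to follow on a couple of records. In this small sketch (the passenger records are made up), DictVectorizer keeps numeric values as they are and expands each string-valued feature into one-hot columns; the test record is encoded with `transform` so that it reuses the columns learned from the training records.

```python
from sklearn.feature_extraction import DictVectorizer

# Made-up records in the same shape as x.to_dict(orient="records").
train_records = [
    {"pclass": "1st", "age": 29.0, "sex": "female"},
    {"pclass": "3rd", "age": 25.0, "sex": "male"},
]
test_records = [{"pclass": "3rd", "age": 40.0, "sex": "female"}]

transfer = DictVectorizer(sparse=False)
train_matrix = transfer.fit_transform(train_records)  # learns the column layout
test_matrix = transfer.transform(test_records)        # reuses that layout

print(transfer.feature_names_)
print(train_matrix)
print(test_matrix)
```

Calling `fit_transform` on the test records instead would rebuild the column layout from the test data alone, which can silently misalign the columns the tree was trained on.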

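4.1.3 Results

5. Clustering: principles and worked case

5.1 Case: Segmenting users by product-category preference, with dimensionality reduction

5.1.1 Dataset

order_products__prior.csv: order and product information

Fields: order_id, product_id, add_to_cart_order, reordered

products.csv: product information

Fields: product_id, product_name, aisle_id, department_id

orders.csv: users' order information

Fields: order_id, user_id, eval_set, order_number, …

aisles.csv: the specific category (aisle) each product belongs to

Fields: aisle_id, aisle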
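The four tables link through their shared ID columns, which is what lets the case build one row per user and one column per aisle. A minimal sketch with hypothetical miniature tables (all values are made up):

```python
import pandas as pd

# Hypothetical miniature versions of the four tables, keeping only the join keys.
order_product = pd.DataFrame({"order_id": [1, 1, 2, 3], "product_id": [10, 11, 10, 12]})
products = pd.DataFrame({"product_id": [10, 11, 12], "aisle_id": [100, 101, 100]})
orders = pd.DataFrame({"order_id": [1, 2, 3], "user_id": [7, 7, 8]})
aisles = pd.DataFrame({"aisle_id": [100, 101], "aisle": ["fresh fruits", "yogurt"]})

# Chain the joins: order line -> product -> order (user) -> aisle name.
merged = (
    order_product.merge(products, on="product_id")
    .merge(orders, on="order_id")
    .merge(aisles, on="aisle_id")
)

# Count how many items each user bought from each aisle.
user_by_aisle = pd.crosstab(merged["user_id"], merged["aisle"])
print(user_by_aisle)
```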

5.1.2 Code

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# 1. Load the data
order_product = pd.read_csv("./data/instacart/order_products__prior.csv")
products = pd.read_csv("./data/instacart/products.csv")
orders = pd.read_csv("./data/instacart/orders.csv")
aisles = pd.read_csv("./data/instacart/aisles.csv")

# 2. Basic data handling
# 2.1 Merge the tables
table1 = pd.merge(order_product, products, on="product_id")
table2 = pd.merge(table1, orders, on="order_id")
table = pd.merge(table2, aisles, on="aisle_id")

# 2.2 Build a user-by-aisle cross-tabulation
table = pd.crosstab(table["user_id"], table["aisle"])

# 2.3 Keep the first 1000 users
table = table[:1000]

# 3. Feature engineering: PCA
transfer = PCA(n_components=0.9)
data = transfer.fit_transform(table)

# 4. Machine learning (k-means)
estimator = KMeans(n_clusters=8, random_state=22)
y_predict = estimator.fit_predict(data)

# 5. Model evaluation
print(silhouette_score(data, y_predict))
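Two details of the code above are worth illustrating: a fractional `n_components` tells PCA to keep just enough components to retain that share of the variance, and the silhouette score can be used to compare different cluster counts rather than fixing 8 in advance. The sketch below does both on synthetic blobs from `make_blobs`, which are a stand-in for the user-by-aisle table; the range of cluster counts tried is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: 300 points in 10 dimensions around 4 centers.
points, _ = make_blobs(n_samples=300, n_features=10, centers=4, random_state=22)

# Keep as many principal components as needed to retain 90% of the variance.
reducer = PCA(n_components=0.9)
reduced = reducer.fit_transform(points)
retained_variance = reducer.explained_variance_ratio_.sum()

# Score several cluster counts; a higher silhouette means tighter, better-separated clusters.
scores = {}
for n_clusters in (2, 3, 4, 5, 6):
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=22).fit_predict(reduced)
    scores[n_clusters] = silhouette_score(reduced, labels)

best_n_clusters = max(scores, key=scores.get)
print(reduced.shape, retained_variance, best_n_clusters)
```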

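5.1.3 Results

6. Other results

Reading

"Machine Learning" (Zhou Zhihua), selected chapters
"Introduction to Machine Learning with Python" (Sarah Guido), selected chapters
Papers on recommender systems, deep learning, and traffic prediction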
