Python Web Scraping: Building a Targeted Stock Data Crawler, with Optimized Fetching and a Progress Display - FinClip Official Website

User submission 785 2022-11-20

Candidate sites: Sina Stock and Baidu Stock.

The plan:
1. Get the stock list.
2. Using the list, fetch each stock's details from Baidu Stock.
3. Store the results.

A dictionary is a natural data container for storing each stock's fields.
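As a small illustration of the dictionary-as-container idea, the sketch below pairs field names with field values and serializes the result one record per line, the same `str(dict) + '\n'` form used by the code later in this post. The helper names (`build_record`, `to_line`, `from_line`) and the sample values are made up for the example and are not part of the original crawler.

```python
import ast


def build_record(name, keys, values):
    """Collect one stock's fields into a dict, pairing keys with values."""
    record = {"name": name}
    # zip stops at the shorter list, so a missing value cannot raise IndexError
    for key, value in zip(keys, values):
        record[key.strip()] = value.strip()
    return record


def to_line(record):
    """Serialize the dict as one text line: str(dict) plus a newline."""
    return str(record) + "\n"


def from_line(line):
    """Parse a line written by to_line back into a dict."""
    return ast.literal_eval(line.strip())


# Hypothetical sample values, not real market data
record = build_record("ExampleCo", ["Open ", "High"], ["10.00", " 10.50"])
line = to_line(record)
restored = from_line(line)
```

Writing one dict per line keeps the output file easy to append to, and `ast.literal_eval` is one safe way to read such lines back; a format like JSON would work just as well.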

Firefox can display the page source; in IE (the browser with the blue icon) the source comes out garbled:

In Firefox:

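Because the page contains so many a tags, matching the right ones with a regular expression is awkward: most links do not contain a stock code.

A try/except handles this: attempt the match on every link and skip the ones that fail.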

In the pattern, [s] matches the letter s, [hz] matches either h or z, and \d{6} matches any six digits that follow.
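The pattern and the skip-on-failure approach can be tried without any network access. The sketch below runs the same regular expression over a few made-up link targets; the `example.com` URLs and the helper name `extract_codes` are assumptions for illustration only.

```python
import re

# Same pattern as in the crawler: "s", then "h" or "z", then six digits
STOCK_CODE = re.compile(r"[s][hz]\d{6}")


def extract_codes(hrefs):
    """Return the stock codes found in a list of href values."""
    codes = []
    for href in hrefs:
        try:
            codes.append(STOCK_CODE.findall(href)[0])
        except (IndexError, TypeError):
            # IndexError: the link holds no stock code
            # TypeError: the href value was None
            continue
    return codes


# Hypothetical link targets, standing in for the a tags of a real page
sample_hrefs = [
    "http://example.com/sh600000.html",
    "http://example.com/about.html",
    None,
    "http://example.com/sz000001.html",
]
codes = extract_codes(sample_hrefs)
print(codes)
```

Only the two links that contain a code survive; the unrelated link and the missing href are skipped silently.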

SH:

SZ:

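Optimization:

r.encoding is taken only from the HTTP response header; r.apparent_encoding is detected by analyzing the full response body, which is slower.

There are two optimizations: pass the known encoding directly in the code instead of detecting it for every page, and show the crawl progress.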
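Why the encoding matters can be shown without a network request. In requests, `r.text` is the response body decoded with `r.encoding`; the sketch below imitates that step with plain `bytes.decode`, using a hard-coded sample string in place of a real page. The function name `decode_page` is hypothetical.

```python
def decode_page(raw_bytes, code="utf-8"):
    """Decode page bytes with a given encoding, marking undecodable bytes."""
    return raw_bytes.decode(code, errors="replace")


# Bytes as a GB2312-encoded page would deliver them (sample text only)
raw = "股票列表".encode("gb2312")

wrong = decode_page(raw)            # default utf-8: garbled text
right = decode_page(raw, "gb2312")  # known encoding: readable text

print(repr(wrong))
print(right)
```

Supplying the encoding up front avoids both the garbled output and the cost of detecting the encoding from the body on every request, provided the encoding has been checked by hand first.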

Here is the code.

The initial version (it is long):

import requests
from bs4 import BeautifulSoup
import traceback
import re


def getHTMLText(url):
    # Fetch a page and return its text, or "" if the request fails
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""


def getStockList(lst, stockURL):
    # Collect every stock code found in the links of the list page
    html = getHTMLText(stockURL)
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except (KeyError, IndexError):
            # No href attribute, or no stock code in the link
            continue


def getStockInfo(lst, stockURL, fpath):
    # Fetch each stock's page and append its fields to the output file
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            # The key '股票名称' means "stock name"
            infoDict.update({'股票名称': name.text.split()[0]})
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
        except Exception:
            traceback.print_exc()
            continue


def main():
    stock_list_url = ''  # URL not given in the source
    stock_info_url = ''  # URL not given in the source
    # Raw string, so that \234 is not read as an octal escape
    output_file = r'D:\234.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)


if __name__ == '__main__':
    main()

Result of running the code:

The optimized code:

import requests
from bs4 import BeautifulSoup
import traceback
import re


def getHTMLText(url, code='utf-8'):  # the encoding defaults to utf-8
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = code  # assign the encoding directly, no detection
        return r.text
    except requests.RequestException:
        return ""


def getStockList(lst, stockURL):
    # The list page's encoding was checked by hand beforehand
    html = getHTMLText(stockURL, 'GB2312')
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except (KeyError, IndexError):
            # No href attribute, or no stock code in the link
            continue


def getStockInfo(lst, stockURL, fpath):
    count = 0  # number of stocks processed so far
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            # The key '股票名称' means "stock name"
            infoDict.update({'股票名称': name.text.split()[0]})
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
            count = count + 1
            # \r returns to the start of the line so the figure updates in place
            print('\rCurrent progress: {:.2f}%'.format(count * 100 / len(lst)), end=' ')
        except Exception:
            count = count + 1
            print('\rCurrent progress: {:.2f}%'.format(count * 100 / len(lst)), end=' ')
            traceback.print_exc()
            continue


def main():
    stock_list_url = ''  # URL not given in the source
    stock_info_url = ''  # URL not given in the source
    # Raw string, so that \234 is not read as an octal escape
    output_file = r'D:\234.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)


if __name__ == '__main__':
    main()

The optimized version specifies the encoding in advance and can show a progress bar. The part that specifies the encoding:

def getHTMLText(url, code='utf-8'):  # the encoding defaults to utf-8
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = code  # assign the encoding directly, no detection
        return r.text
    except requests.RequestException:
        return ""


def getStockList(lst, stockURL):
    # The list page's encoding was checked by hand beforehand
    html = getHTMLText(stockURL, 'GB2312')
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r"[s][hz]\d{6}", href)[0])
        except (KeyError, IndexError):
            continue

Screenshot:

(If a page is not utf-8, pass its actual encoding explicitly to replace the default.)

The part that shows the progress bar:

def getStockInfo(lst, stockURL, fpath):
    count = 0  # number of stocks processed so far
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            # The key '股票名称' means "stock name"
            infoDict.update({'股票名称': name.text.split()[0]})
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
            count = count + 1
            # \r returns to the start of the line so the figure updates in place
            print('\rCurrent progress: {:.2f}%'.format(count * 100 / len(lst)), end=' ')
        except Exception:
            count = count + 1
            print('\rCurrent progress: {:.2f}%'.format(count * 100 / len(lst)), end=' ')
            traceback.print_exc()
            continue

Screenshot:

Note, however, that the in-place progress display does not work in IDLE.
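The progress display relies on the carriage return `\r`, which moves the cursor back to the start of the line in a regular terminal so that each update overwrites the previous one. The sketch below isolates that technique in a small helper; the name `show_progress` and the `stream` parameter are assumptions added so the output can be captured and inspected.

```python
import io
import sys


def show_progress(done, total, stream=sys.stdout):
    """Write an in-place percentage such as 'Current progress: 25.00%'."""
    percent = done * 100 / total if total else 100.0
    # \r returns to the start of the line; no newline is written
    stream.write("\rCurrent progress: {:.2f}%".format(percent))
    stream.flush()  # make the update visible immediately


# Capture the output in memory instead of printing to a terminal
buffer = io.StringIO()
for i in range(1, 5):
    show_progress(i, 4, stream=buffer)
output = buffer.getvalue()
```

In a terminal, calling `show_progress(i, n)` inside the crawl loop shows a single line that counts up to 100%. A console that does not interpret `\r` shows the raw sequence of updates instead.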

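In the end, though, I did not manage to get the output file generated or the progress bar displayed. Never mind, time to go and eat.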
