Python scraping parsers: bs4, XPath, PyQuery

小秦哥 · 2025-3-23 17:18:42

0x00 bs4

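A parser analyzes an HTML page directly, so you can extract the contents of tags from a web page without resorting to regular expressions to pull the data out.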
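As a rough illustration of extracting tag contents without regular expressions, here is a minimal bs4 sketch. The sample markup and the helper names `extract_headline` and `extract_links` are made up for this example and are not from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, invented for illustration only.
SAMPLE_HTML = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1 class="headline">Hello parser</h1>
    <ul id="links">
      <li><a href="/a">First</a></li>
      <li><a href="/b">Second</a></li>
    </ul>
  </body>
</html>
"""


def extract_headline(html):
    """Return the text of the first <h1 class="headline">, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("h1", class_="headline")
    return tag.get_text(strip=True) if tag else None


def extract_links(html):
    """Return (text, href) pairs for every <a> inside the element with id="links"."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.select("#links a")]


if __name__ == "__main__":
    print(extract_headline(SAMPLE_HTML))
    print(extract_links(SAMPLE_HTML))
```

`find` returns the first matching tag, while `select` takes a CSS selector and returns every match, which is usually easier to read than an equivalent regular expression.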


Example: scraping images
https://haowallpaper.com/
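The post does not show the markup of this site, so the following is only a sketch of the general approach: collect the `src` of every `<img>` tag with bs4 and resolve it against the page URL. The sample HTML, the base URL, and the helper name `collect_image_urls` are all assumptions for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: the real structure of the wallpaper site is not shown in the post.
SAMPLE_HTML = """
<div class="gallery">
  <img src="/img/one.jpg" alt="one">
  <img src="https://cdn.example.com/two.png" alt="two">
  <img alt="no source">
</div>
"""


def collect_image_urls(html, base_url):
    """Return absolute URLs for every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        src = img.get("src")  # None when the attribute is missing
        if src:
            # urljoin turns relative paths into absolute URLs
            urls.append(urljoin(base_url, src))
    return urls


if __name__ == "__main__":
    for image_url in collect_image_urls(SAMPLE_HTML, "https://example.com/list/"):
        print(image_url)
```

Each collected URL could then be fetched with `requests.get(image_url, headers=headers).content` and written to a file opened in binary mode (`"wb"`).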


0x01 xpath

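Example: scraping phone prices and phone models from 什么值得买 (SMZDM). When the page source is too large, you can download it, delete the irrelevant markup, and then analyze what remains.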

import requests
from lxml import etree

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36"
}
url = 'https://www.smzdm.com/fenlei/zhinengshouji/'
# Pass the browser-like headers; the original defined them but never sent them.
req = requests.get(url, headers=headers)
html = etree.HTML(req.content)
# @class="..." is an exact string match, so the trailing space in "z-highlight " is kept as written.
price_list = html.xpath('//a[@class="z-highlight "]/text()')
title_list = html.xpath('//h5[@class="feed-block-title"]/a[1]/text()')
# a_list = html.xpath("//div[@class='z-feed-img']//img/@src")
# zip() pairs prices and titles by position and stops at the shorter list,
# which avoids an IndexError when the two lists differ in length.
with open("3.txt", "a+", encoding='gbk') as f:
    for price, title in zip(price_list, title_list):
        f.write(price.strip())
        f.write(title.strip() + '\n')
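The advice about trimming the page source can be tried offline. The sketch below parses a small, hand-trimmed snippet instead of the live page. The class names come from the XPath expressions in this post, but the surrounding structure, the `feed-row` class, the sample values, and the helper name `parse_items` are assumptions made for illustration. It also shows one possible variation: querying inside each row with a relative XPath, so a price stays attached to its own title.

```python
from lxml import etree

# Hypothetical, trimmed-down markup; only the two class names used in the post
# are taken from the source, everything else is invented.
SAMPLE_HTML = """
<ul>
  <li class="feed-row">
    <h5 class="feed-block-title"><a href="#">Phone A 8GB+256GB</a></h5>
    <a class="z-highlight ">1999元</a>
  </li>
  <li class="feed-row">
    <h5 class="feed-block-title"><a href="#">Phone B 12GB+512GB</a></h5>
    <a class="z-highlight ">2999元</a>
  </li>
</ul>
"""


def parse_items(html):
    """Return (title, price) tuples, one per row, from the snippet."""
    tree = etree.HTML(html)
    items = []
    for row in tree.xpath('//li[@class="feed-row"]'):
        # A leading "." makes the XPath relative to the current row.
        title = row.xpath('.//h5[@class="feed-block-title"]/a[1]/text()')
        # contains() tolerates stray spaces or extra class names.
        price = row.xpath('.//a[contains(@class, "z-highlight")]/text()')
        items.append((
            title[0].strip() if title else "",
            price[0].strip() if price else "",
        ))
    return items


if __name__ == "__main__":
    for title, price in parse_items(SAMPLE_HTML):
        print(price, title)
```

Working row by row means a missing price produces an empty string for that row instead of shifting every later price onto the wrong title.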


0x02 PyQuery

1. Basic PyQuery usage


Modifying the HTML of a page


2. Example: scraping short comments, reviews, rating, title, content summary, and author bio from Douban Books (豆瓣读书)

Note that the popular short comments on this page refresh: only 5 are read, and each run may return a somewhat different set.

import requests
from pyquery import PyQuery

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36"
}
url = 'https://book.douban.com/subject/4913064/'
rep = requests.get(url, headers=headers)
p = PyQuery(rep.text)
with open("1.txt", 'a+', encoding='utf-8') as f:
    # Title and rating
    title = p("title").text()
    score = p("div #interest_sectl div div strong").text()
    f.write(f"书名：{title}\n评分：{score}\n")
    # All <p> tags under the first element with class "intro" (content summary)
    content = p("div .intro").eq(0)("p").text()
    # All <p> tags under the second element with class "intro" (author bio)
    composer = p("div .intro").eq(1)("p").text()
    f.write(f"内容简介：{content}\n作者简介：{composer}")
    # Short comments
    comments = p("li p span ").items()
    f.write("\n短评：\n")
    for i in comments:
        comment = i.text()
        f.write(f"{comment}\n")
    # Reviews: title plus the collapsed preview text
    shupin_1 = p("div.main-bd ").items()
    f.write("书评：\n")
    for j in shupin_1:
        shupin_2 = j("h2 a").text()
        # Strip the spoiler warning and the "expand" link text that the page adds
        shupin_3 = j("div div .short-content").text().replace("这篇书评可能有关键情节透露", "").replace("... (展开)", "")
        f.write(f"{shupin_2+shupin_3}\n")
