Python: Web Scraping with Selenium (Using Selenium, Locating Elements, Hierarchical Locating)

Posted by 尚未崩坏 on 2025-1-9 00:52:40


1. Introduction and Installation

https://i-blog.csdnimg.cn/direct/6409ec3943f547a9ab03baaa4b40cdd8.jpeg
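Selenium is a web automation testing tool. It was originally developed for automated testing of websites and works much like the macro tools used in games (such as 按键精灵): it carries out operations automatically according to the commands it is given. The difference is that Selenium runs directly in the browser, and it supports all mainstream browsers.
Following our instructions, Selenium can have the browser load a page automatically, retrieve the data we need, take a screenshot of the page, or check whether certain actions have occurred on a site.
Selenium does not ship with a browser and does not provide browser functionality itself; it has to be used together with a third-party browser.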
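As a rough sketch of the workflow just described (load a page, read its data, take a screenshot), the helper below works with any WebDriver instance once Selenium is installed. The function name `capture_page` and the default file name are illustrative assumptions, not part of the original post; `get`, `page_source`, `save_screenshot`, and `quit` are standard WebDriver members.

```python
def capture_page(driver, url, screenshot_path="page.png"):
    """Load url, return (html, screenshot_saved), and always close the browser."""
    try:
        driver.get(url)                                   # the browser loads the page
        html = driver.page_source                         # HTML of the loaded page
        saved = driver.save_screenshot(screenshot_path)   # True if the PNG was written
        return html, saved
    finally:
        driver.quit()                                     # runs even if a step above raises


# Usage (needs Selenium and a browser, so it is left as a comment):
#   from selenium.webdriver import Edge
#   html, saved = capture_page(Edge(), "https://www.baidu.com/")
```

Putting `quit()` in a `finally` block is a design choice: the browser window is closed even when the page fails to load.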
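Selenium with Python reference documentation: http://selenium-python.readthedocs.io/index.html
pip install selenium -i https://pypi.tuna.tsinghua.edu.cn/simple
Note
To drive a browser, Selenium needs a driver.

[*] With Selenium 4.6.0 or later, the driver is downloaded automatically.
[*] With versions earlier than 4.6.0, the driver has to be downloaded manually.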
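To tell which of the two cases applies on a given machine, a small check like the following could be used. It is only a sketch: the function names are made up for illustration, and the version parsing is deliberately simple (it looks at the first three numeric parts only).

```python
from importlib import metadata
from itertools import takewhile


def parse_version(text):
    """Turn a version string such as '4.6.0' into a tuple like (4, 6, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))  # drop suffixes like 'rc1'
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:                                 # '4.6' becomes (4, 6, 0)
        parts.append(0)
    return tuple(parts)


def needs_manual_driver(selenium_version):
    """True when the version predates 4.6.0, so the driver must be installed by hand."""
    return parse_version(selenium_version) < (4, 6, 0)


def installed_selenium_version():
    """Return the installed Selenium version string, or None if it is not installed."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None
```

For example, `needs_manual_driver(installed_selenium_version())` can be called after confirming that the installed version is not `None`.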
A first Selenium program

from selenium.webdriver import Edge
# Create a browser instance
edge = Edge()
# Request the page
edge.get('https://www.baidu.com/')
# Get the HTML
page = edge.page_source
# Print it
print(page)
# Close the browser
edge.quit()

2. Controlling the Browser

2.1 Maximizing the window

edge.maximize_window()
https://i-blog.csdnimg.cn/direct/1ba0d38e81824abe83fce7a14f9532a7.png
from selenium.webdriver import Edge
import time
# Create a browser instance
edge = Edge()
# Request the page
edge.get('https://www.baidu.com/')
# Maximize the window
edge.maximize_window()
# Pause so the result can be seen clearly
time.sleep(5)
# Close the browser
edge.quit()
https://i-blog.csdnimg.cn/direct/2700b044c2b0425d9e12c88c7ce2eb59.png
2.2 Setting the width and height

edge.set_window_size(500,500)
https://i-blog.csdnimg.cn/direct/78115cf7ad5f4238a53b0f4d6d8048c0.png
https://i-blog.csdnimg.cn/direct/9b566a4590bc4db2bb2fa6a5b6825163.png
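The arguments of `set_window_size` are the width first and then the height. Browsers may enforce a minimum window size, so it can be worth reading the size back with `get_window_size()`. A minimal sketch, with `resize_window` being an illustrative name rather than anything from the original post:

```python
def resize_window(driver, width, height):
    """Resize the window and return the (width, height) the browser reports afterwards."""
    driver.set_window_size(width, height)   # width first, then height
    size = driver.get_window_size()          # dict with 'width' and 'height' keys
    return size["width"], size["height"]


# Usage (needs a real browser, so it is left as a comment):
#   from selenium.webdriver import Edge
#   edge = Edge()
#   print(resize_window(edge, 500, 500))
#   edge.quit()
```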
2.3 Browser forward and back

Forward

edge.forward()
https://i-blog.csdnimg.cn/direct/cbe4938e69eb475fad363aa7858cc4c5.png
Back

edge.back()
https://i-blog.csdnimg.cn/direct/b14ef636d64c4513b6cc6fb029b69adf.png
Forward and back in one demo

Design:
1. Go from one site to CSDN
2. Go back from CSDN to that site
3. Go forward from that site to CSDN again
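One way to check that each step lands where expected is to record `current_url` after every move. The helper below is a sketch under that idea; `trace_navigation` is a hypothetical name, and on real sites the recorded URLs may differ from the requested ones because of redirects.

```python
def trace_navigation(driver, first_url, second_url):
    """Visit two pages, go back, go forward; return current_url after each step."""
    visited = []
    driver.get(first_url)
    visited.append(driver.current_url)    # on the first site
    driver.get(second_url)
    visited.append(driver.current_url)    # on the second site
    driver.back()
    visited.append(driver.current_url)    # back on the first site
    driver.forward()
    visited.append(driver.current_url)    # forward to the second site again
    return visited
```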
from selenium.webdriver import Edge
import time
# Create a browser instance
edge = Edge()
# Request the first page
edge.get('https://www.baidu.com/')
# Pause
time.sleep(3)
# Request another URL
edge.get('https://www.csdn.net/?spm=1001.2014.3001.4476')
# Pause so the result can be seen clearly
time.sleep(3)
# Go back
edge.back()
# Pause
time.sleep(5)
# Go forward
edge.forward()
# Pause
time.sleep(5)
# Close the browser
edge.quit()

3. Locating Elements

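Locating objects is the core of automation. To operate on an object, you first have to identify it. An object is like a person: it has a number of characteristics (attributes). You can find a person by ID number, by name, or by street, floor, and door number, and Selenium locates elements in much the same way.
3.1 Locating objects

webdriver provides the following methods for locating objects:
        find_element(type,value)
        Finds and returns the first element on the page that matches the condition.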
https://i-blog.csdnimg.cn/direct/48825bc7031e41d3a04b6960ed259909.png
        find_elements(type,value)
        Finds all elements on the page that match the condition and returns them as a list.
https://i-blog.csdnimg.cn/direct/9fce6300097e477ead9a6df6b4e21397.png
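The two methods also differ when nothing matches: `find_element` raises `NoSuchElementException`, while `find_elements` returns an empty list. The sketch below uses that to look up an element without exception handling. The helper names are illustrative; the locator type is passed as a plain string such as `"id"`, which is the value of `By.ID`.

```python
def first_or_none(driver, by, value):
    """Return the first matching element, or None when nothing matches."""
    matches = driver.find_elements(by, value)   # empty list when nothing matches
    return matches[0] if matches else None


def count_matches(driver, by, value):
    """Return how many elements on the page match the locator."""
    return len(driver.find_elements(by, value))


# Usage (needs a real browser, so it is left as a comment):
#   box = first_or_none(edge, "id", "kw")
#   if box is None:
#       print("search box not found")
```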

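[*]type: the locating strategy (locator). Possible values:

[*]By.ID: locate by the element's ID.
[*]By.NAME: locate by the element's name attribute.
[*]By.CLASS_NAME: locate by the element's class name.
[*]By.TAG_NAME: locate by the element's tag name.
[*]By.LINK_TEXT: locate by the full link text.
[*]By.PARTIAL_LINK_TEXT: locate by part of the link text.
[*]By.XPATH: locate by an XPath expression.
[*]By.CSS_SELECTOR: locate by a CSS selector.

[*]value: the value to locate by; its form depends on the locator type above.
https://i-blog.csdnimg.cn/direct/c007ad5730d64a47b3a4ec8eeaea9588.png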
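Several of these locator types select the same elements as a simple CSS selector, which can help when deciding which one to use. The function below is an illustrative sketch of that correspondence, not Selenium's own code. It takes the string values of the `By` constants (`By.ID` is `"id"`, `By.CLASS_NAME` is `"class name"`, and so on). Link text and XPath have no CSS equivalent and are rejected.

```python
def to_css(by, value):
    """Return a CSS selector that matches the same elements as a simple locator."""
    if by == "id":
        return f'[id="{value}"]'       # attribute form also works for unusual ids
    if by == "name":
        return f'[name="{value}"]'
    if by == "class name":
        return "." + value             # a single class name, without spaces
    if by in ("tag name", "css selector"):
        return value                   # already valid as a CSS selector
    raise ValueError(f"no CSS equivalent for locator type {by!r}")
```

For instance, `to_css("id", "kw")` gives `[id="kw"]`, so `find_element(By.ID, "kw")` and `find_element(By.CSS_SELECTOR, '[id="kw"]')` should locate the same element.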
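3.2 Operating on elements

In Selenium, operating on elements is the core of automated testing. Typical operations include clicking, typing text, reading text, clearing text, and dragging.

[*] click: click the object
[*] send_keys: simulate keystrokes on the object
[*] clear: clear the object's content, if possible
1. Clicking an element
        Method: click()
        Purpose: simulates a user clicking an element (a button, a link, and so on).
https://i-blog.csdnimg.cn/direct/2026db7e53584045959513d8318eb782.png
2. Typing text
        Method: send_keys()
        Purpose: types text or characters into an input box.
https://i-blog.csdnimg.cn/direct/a19249a6ca95403ba8c4bc832f1fc890.png
3. Clearing an input box
        Method: clear()
        Purpose: clears the content of an input box or text area.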
https://i-blog.csdnimg.cn/direct/392447ad31b442f181a74be4c5fb7308.png
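`send_keys` appends to whatever an input already contains, so `clear` and `send_keys` are often used together. A minimal sketch, assuming the element is a text input; `replace_text` is an illustrative name:

```python
def replace_text(element, text):
    """Empty an input element, type new text, and return the value it now holds."""
    element.clear()                         # remove any existing content
    element.send_keys(text)                 # type the new text
    return element.get_attribute("value")   # read back the current value


# Usage (needs a real browser, so it is left as a comment):
#   box = edge.find_element(By.ID, "kw")
#   replace_text(box, "csdn")
```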
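3.3 Demo

1. Get the id of the input box (the input element)
https://i-blog.csdnimg.cn/direct/95c4b3453c264e2386b352874243ca6e.png
2. Type csdn into the input box
https://i-blog.csdnimg.cn/direct/7351a150aeb1415f90dec5e4c1d8fb7e.png
3. Click the search button so the page moves on to the results for csdn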
https://i-blog.csdnimg.cn/direct/78b66bc34df9485c91eb671e3208787e.png
https://i-blog.csdnimg.cn/direct/bd59027aca094a04811996dfa9f2e1a8.png
from selenium.webdriver.common.by import By
from selenium.webdriver import Edge
import time
# Create a browser instance
edge = Edge()
# Request the page
edge.get('https://www.baidu.com/')
# Pause
time.sleep(3)
# Find the input box by id and type csdn
edge.find_element(By.ID, "kw").send_keys('csdn')
time.sleep(1)
# Find the search button and click it
edge.find_element(By.ID, "su").click()
# Pause
time.sleep(3)
# Close the browser
edge.quit()

4. Locating a Dropdown Menu

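When scraping, a site sometimes has too much data and offers a filter, such as a select element or a dropdown menu. For data like this, we only need to locate the element and click it.
HTML page
https://i-blog.csdnimg.cn/direct/eb44f088d758437898833f59b9616ffe.png
Locating the element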
https://i-blog.csdnimg.cn/direct/aacefcc080894dfb841c6b67ceb6ba26.png
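The test page listed below contains two menus, and both have a link whose text is Action. A page-wide search by link text returns the first one, so reaching the second menu means searching from its container instead. The sketch below does that by calling `find_element` on an element rather than on the driver; `click_menu_item` is a hypothetical helper, and the class names are the ones used in the page below.

```python
def click_menu_item(driver, menu_index, link_text):
    """Open the menu at menu_index and click the link with the given text."""
    containers = driver.find_elements("class name", "dropdown")  # one per menu
    container = containers[menu_index]
    # Searching from the container only looks at its descendants.
    container.find_element("class name", "dropdown-toggle").click()
    item = container.find_element("link text", link_text)
    item.click()
    return item


# Usage (needs a real browser, so it is left as a comment):
#   click_menu_item(edge, 1, "Action")   # the Action link under Link2
```

If the menu takes a moment to open, a short pause between the two clicks may be needed.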
<html>
<head>
   <meta http-equiv="content-type" content="text/html;charset=utf-8" />
   <title>Level Locate</title>
   <script type="text/javascript" src="https://cdn.jsdelivr.net/npm/jquery@1.12.4/dist/jquery.min.js"></script>
   <link href="https://cdn.jsdelivr.net/npm/@bootcss/v3.bootcss.com@1.0.9/dist/css/bootstrap.min.css" rel="stylesheet" />
</head>
<body>
   <h3>Level locate</h3>
   <div class="span3 col-md-3">
         <div class="well">
            <div class="dropdown">
               <a class="dropdown-toggle" data-toggle="dropdown" href="#">Link1</a>
               <ul class="dropdown-menu" role="menu" aria-labelledby="dLabel" id="dropdown1" >
                     <li><a tabindex="-1" href="https://www.csdn.net/?spm=1001.2014.3001.4476">Action</a></li>
                     <li><a tabindex="-1" href="#">Another action</a></li>
                     <li><a tabindex="-1" href="#">Something else here</a></li>
                     <li class="divider"></li>
                     <li><a tabindex="-1" href="#">Separated link</a></li>
               </ul>
            </div>
         </div>
   </div>
   <div class="span3 col-md-3">
         <div class="well">
            <div class="dropdown">
               <a class="dropdown-toggle" data-toggle="dropdown" href="#">Link2</a>
               <ul class="dropdown-menu" role="menu" aria-labelledby="dLabel" >
                     <li><a tabindex="-1" href="#">Action</a></li>
                     <li><a tabindex="-1" href="#">Another action</a></li>
                     <li><a tabindex="-1" href="#">Something else here</a></li>
                     <li class="divider"></li>
                     <li><a tabindex="-1" href="#">Separated link</a></li>
               </ul>
            </div>
         </div>
   </div>
</body>
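<script src="https://cdn.jsdelivr.net/npm/@bootcss/v3.bootcss.com@1.0.9/dist/js/bootstrap.min.js"></script></html>
from pathlib import Path
from selenium.webdriver.common.by import By
from selenium.webdriver import Edge
import time
# Create a browser instance
edge = Edge()
# Open the local page. get() needs a full URL, so the path is turned into a file:// URI
# (this assumes test01.html is in the current working directory)
edge.get(Path('test01.html').resolve().as_uri())
# Pause
time.sleep(3)
# Find the menu toggle by its class attribute and click it
edge.find_element(By.CLASS_NAME, "dropdown-toggle").click()
time.sleep(1)
# Find the target menu item and click it
edge.find_element(By.LINK_TEXT, 'Action').click()
# Pause
time.sleep(3)
# Close the browser
edge.quit()

5. Hierarchical Locating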

https://i-blog.csdnimg.cn/direct/910120f272ed431bb951beb0e60edf5f.png
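Modern web applications often use frames (iframe) or new windows, which can make locating elements harder. Even with a correct locator, an element that sits inside a frame may not be found. When that happens, check whether the element is inside a frame.
Selenium WebDriver provides the switch_to.frame() method for switching into a given frame, which solves the problem.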
switch_to.frame()
https://i-blog.csdnimg.cn/direct/ff323feb8e684e09a1750dc4cec2b5f2.png
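`switch_to.frame()` accepts a frame's id or name, an index, or the frame element itself, and each switch is relative to the frame the driver is currently in. For nested frames, the switches therefore have to be made from the outermost frame inward. A minimal sketch, where `enter_frames` is an illustrative helper name:

```python
def enter_frames(driver, frame_ids):
    """Switch from the top-level page down through nested frames, outermost first."""
    driver.switch_to.default_content()      # start from the top-level document
    for frame_id in frame_ids:
        driver.switch_to.frame(frame_id)    # relative to the frame we are in now


# Usage (needs a real browser, so it is left as a comment):
#   enter_frames(edge, ["f1", "f2"])        # outer frame first, then the inner one
#   ...locate elements inside the inner frame...
#   edge.switch_to.default_content()        # return to the top-level page
```

Starting from `default_content()` makes the helper safe to call regardless of which frame the driver was in before.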
1. The pages
https://i-blog.csdnimg.cn/direct/531f7bd9682b4fe0aba4e2243b13d0a4.png
inner.html
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<title>inner</title>
</head>
<body>
<div class="row-fluid">
   <div class="span6 well">
         <h3>inner</h3><iframe id="f2" src="https://www.csdn.net/?spm=1001.2014.3001.4476" width="1400" height="1100"></iframe>
   </div>
</div>
</body>
</html>
outer.html
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<title>frame</title>
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/jquery@1.12.4/dist/jquery.min.js"></script>
<link href="http://netdna.bootstrapcdn.com/twitter-bootstrap/2.3.2/css/bootstrap-combined.min.css"
   rel="stylesheet" />
</head>

<body>
<div class="row-fluid">
   <div class="span10 well">
         <h3>frame</h3><iframe id="f1" src="inner.html" width="1500" height="1200"></iframe>
   </div>
</div>
</body>
<script src="https://cdn.jsdelivr.net/npm/@bootcss/v3.bootcss.com@1.0.8/dist/js/bootstrap.min.js"></script></html>
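from pathlib import Path
from selenium.webdriver.common.by import By
from selenium.webdriver import Edge
import time
# Create a browser instance
edge = Edge()
# Open the local page. get() needs a full URL, so the path is turned into a file:// URI
# (this assumes outer.html is in the current working directory)
edge.get(Path('outer.html').resolve().as_uri())
# Pause
time.sleep(3)
# Switch into the outer frame, then into the frame nested inside it
edge.switch_to.frame('f1')
edge.switch_to.frame('f2')
# Pause
time.sleep(3)
# Locate the search box and type the search text
edge.find_element(By.ID, 'toolbar-search-input').send_keys('蹦蹦跳跳真可爱589')
# Pause
time.sleep(3)
# Locate the search button and click it
edge.find_element(By.ID, 'toolbar-search-button').click()
# Pause
time.sleep(3)
# Close the browser
edge.quit()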
https://i-blog.csdnimg.cn/direct/62432b891c374c718fffc2893b050b42.png
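The scripts in this post pause with fixed `time.sleep()` calls, which either wait longer than needed or not long enough. Selenium ships `WebDriverWait` (in `selenium.webdriver.support.ui`) for waiting until a condition holds. The function below is only a plain-Python sketch of the same polling idea, not Selenium's implementation; `wait_until` and its parameters are illustrative names.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result                  # hand back whatever the condition found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)                   # wait a little before trying again


# Usage (needs a real browser, so it is left as a comment):
#   boxes = wait_until(lambda: edge.find_elements(By.ID, 'toolbar-search-input'))
#   boxes[0].send_keys('csdn')
```

Using `find_elements` in the condition works well here because it returns an empty (falsy) list until the element appears.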
