Python's Go-To Library for HTTP Requests: The requests Package Explained with Practical Examples


诸神缄默不语 – Index of My Technical Blog Posts and Videos

  
1. Introduction

When doing network programming or writing crawlers, we often need to send HTTP requests to a web page or server to fetch data. For this, the requests package is one of the most popular and easiest-to-use Python libraries.
Compared with the built-in urllib module, requests offers a far more human-friendly API and is much easier to pick up; it has practically become the de facto standard for HTTP requests in Python.
This article walks through the basics of requests, some more advanced usage, and common problems, with working code examples so you can get up to speed quickly.
https://httpbin.org/ is a simple site for simulating all kinds of HTTP requests; many of the code examples below use its endpoints.
Because the site is hosted overseas, you may run into network problems when accessing it; deploying it locally avoids this. See the official instructions, or this post: 五、接口测试 — Httpbin介绍(请求调试工具) - 知乎.
2. Installation

    pip install requests
3. Basic Usage

For the difference between GET and POST requests, see another post of mine: Web应用中的GET与POST请求详解.
1. Sending a GET request

    import requests

    response = requests.get('https://httpbin.org/get')
    print(response.status_code)      # status code
    print(response.text)             # response body as a string
    print(response.json())           # parse the body into a dict if it is JSON
2. Sending a POST request

    payload = {'username': 'test', 'password': '123456'}
    response = requests.post('https://httpbin.org/post', data=payload)
    print(response.json())
4. Common Parameters of a requests Call

1. URL

This is the first positional argument: the address (URL) of the target page or endpoint.
2. Request body: data

The data carried in the request body.
If the value is a string or a byte stream, requests does not set a Content-Type header by default.
If the value is a dict or a list of tuples, Content-Type defaults to application/x-www-form-urlencoded, i.e. HTML-form-style key-value pairs. (Content-Type is covered in more detail in the next subsection on headers.)
    import requests
    import json

    payload = {"key1": "value1", "key2": "value2"}

    # Plain string payload: no Content-Type header is set automatically
    r = requests.post("https://httpbin.org/post", data="a random sentence")
    print(r.json())
    print(r.json()["headers"].get("Content-Type", "None"))

    # JSON-formatted string payload: still no Content-Type header is set
    r = requests.post("https://httpbin.org/post", data=json.dumps(payload))
    print(r.json())
    print(r.json()["headers"].get("Content-Type", "None"))

    # JSON string payload with an explicit application/json Content-Type
    r = requests.post(
        "https://httpbin.org/post",
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
    )
    print(r.json())
    print(r.json()["headers"].get("Content-Type", "None"))

    # Dictionary payload: Content-Type defaults to application/x-www-form-urlencoded
    r = requests.post("https://httpbin.org/post", data=payload)
    print(r.json())
    print(r.json()["headers"].get("Content-Type", "None"))

    # List-of-tuples payload: also form-encoded
    payload_tuples = [("key1", "value1"), ("key2", "value2")]
    r = requests.post("https://httpbin.org/post", data=payload_tuples)
    print(r.json())
    print(r.json()["headers"].get("Content-Type", "None"))

    # Bytes payload: no Content-Type header is set automatically
    payload_bytes = "key1=value1&key2=value2".encode("utf-8")
    r = requests.post("https://httpbin.org/post", data=payload_bytes)
    print(r.json())
    print(r.json()["headers"].get("Content-Type", "None"))
3. Request headers: headers

Request headers typically carry the Content-Type, client/system information (device, accepted encodings, etc.), authentication data, timestamps, and so on.
    headers = {'User-Agent': 'MyUserAgent/1.0'}
    response = requests.get('https://httpbin.org/headers', headers=headers)
    print(response.json())
Common Content-Type values: (table image omitted; source: [1])
4. Query parameters: params

For a GET request this has the same effect as appending ?k=v directly to the URL.
    params = {'q': 'python'}
    response = requests.get('https://httpbin.org/get', params=params)
    print(response.url)  # the full URL that was actually requested
Output: https://httpbin.org/get?q=python
5. Timeout: timeout

    response = requests.get('https://httpbin.org/delay/3', timeout=2)
If no response arrives within 2 seconds, a requests.exceptions.Timeout exception is raised.
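A minimal sketch of catching this exception (the 2-second limit and the delay endpoint are only for illustration):

    import requests

    try:
        response = requests.get('https://httpbin.org/delay/3', timeout=2)
    except requests.exceptions.Timeout:
        print('Request timed out; consider retrying or increasing the timeout')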
6. File upload: files (uploading a plain-text file stream)

    files = {'file': open('test.txt', 'rb')}
    response = requests.post('https://httpbin.org/post', files=files)
    print(response.text)
↑ Note that although the files parameter does let you pass a file stream directly like this, I have rarely seen it done this way in practice.
Plain text is usually not sent via files; people generally just put it into data.
For non-text (binary) streams, the approach I see most often is to base64-encode the bytes and carry the encoded string in data; a minimal sketch follows below. For the base64 encoding itself, see another post of mine: 深入理解 Python 的 base64 模块.
(To be fair, passing a file stream via the files parameter apparently also goes through a base64 encode/decode step behind the scenes, but if everyone does it the other way there is presumably a reason.)
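As a rough sketch of the base64-in-data approach described above (the field name file_b64 is made up purely for illustration):

    import base64
    import requests

    # Read the binary file and base64-encode it so it can travel as plain text
    with open('image.png', 'rb') as f:
        encoded = base64.b64encode(f.read()).decode('ascii')

    # Carry the encoded string as an ordinary JSON field (field name is hypothetical)
    response = requests.post('https://httpbin.org/post', json={'file_b64': encoded})
    print(response.status_code)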
7. json

Passing a JSON object (a dict in Python 3) via the json parameter is equivalent to passing it via data as a serialized JSON string while explicitly setting Content-Type to application/json.
    payload = {'id': 1, 'name': 'chatgpt'}
    response = requests.post('https://httpbin.org/post', json=payload)
    print(response.json())
The request above is equivalent to the one below:
    import json

    response = requests.post(
        "https://httpbin.org/post",
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
    )
    print(response.json())
For comparison, here is what the other two parameter styles produce (note that the first form at least still returns matching data and json fields, whereas with the second form the object ends up under form, because it is parsed as form data):
    response = requests.post(
        "https://httpbin.org/post",
        data=json.dumps(payload)
    )
    print(response.json())

    response = requests.post(
        "https://httpbin.org/post",
        data=payload
    )
    print(response.json())
5. Response Attributes and Methods

1. Attributes: headers, cookies, encoding

    r = requests.get('https://httpbin.org/get')
    print(r.headers)
    print(r.cookies)
    print(r.encoding)
2. Error handling: raise_for_status()

Raises requests.exceptions.HTTPError if the response status code indicates an error (a 4xx or 5xx code).
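A minimal sketch (the 404 endpoint is only used to trigger the error):

    import requests

    response = requests.get('https://httpbin.org/status/404')
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        print('Request failed:', e)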
6. Session Objects (Keeping Login State)

requests.Session() keeps state such as cookies across requests, which is handy for sites that require login.
    s = requests.Session()
    # Ask httpbin to set a cookie; the session stores it and sends it with later requests
    s.get('https://httpbin.org/cookies/set', params={'cookie': 'value'})
    response = s.get('https://httpbin.org/cookies')
    print(response.text)
7. Advanced Usage

1. Uploading compressed data


  • Using gzip
    import requests
    import gzip
    import json

    data = json.dumps({'key': 'value'}).encode('utf-8')
    compressed_data = gzip.compress(data)
    headers = {'Content-Encoding': 'gzip'}
    response = requests.post('https://httpbin.dev/api', data=compressed_data, headers=headers)
    response.raise_for_status()
    print("Gzip Compressed Request Status:", response.status_code)
  • Using brotli
    import requests
    import brotli
    import json

    data = json.dumps({'key': 'value'}).encode('utf-8')
    compressed_data = brotli.compress(data)
    headers = {'Content-Encoding': 'br'}
    response = requests.post('https://httpbin.dev/api', data=compressed_data, headers=headers)
    response.raise_for_status()
    print("Brotli Compressed Request Status:", response.status_code)
2. Concurrency


  • Using httpx (adapted from Concurrency vs Parallelism)
    import asyncio
    import httpx
    import time

    # Asynchronous function to fetch the content of a URL
    async def fetch(url):
        async with httpx.AsyncClient(timeout=10.0) as client:
            response = await client.get(url)
            return response.text

    # Concurrently fetch multiple URLs using asyncio.gather
    async def concurrent_fetch(urls):
        tasks = [fetch(url) for url in urls]
        return await asyncio.gather(*tasks)

    # Synchronous version to demonstrate the performance difference
    def sync_fetch(urls):
        results = []
        for url in urls:
            response = httpx.get(url)
            results.append(response.text)
        return results

    def run_concurrent():
        urls = ["http://httpbin.org/delay/2"] * 100  # Use the same delay for simplicity
        start_time = time.time()
        # Running fetch requests concurrently
        asyncio.run(concurrent_fetch(urls))
        duration = time.time() - start_time
        print(f"Concurrent fetch completed in {duration:.2f} seconds")

    def run_sync():
        urls = ["http://httpbin.org/delay/2"] * 100  # Use the same delay for simplicity
        start_time = time.time()
        # Running fetch requests synchronously
        sync_fetch(urls)
        duration = time.time() - start_time
        print(f"Synchronous fetch completed in {duration:.2f} seconds")

    if __name__ == "__main__":
        print("Running concurrent version:")
        # Concurrent fetch completed in 2.05 seconds
        run_concurrent()
        print("Running synchronous version:")
        # Synchronous fetch completed in 200.15 seconds
        run_sync()
  • Using threading
    import threading
    import requests

    def post_data(data):
        requests.post('https://httpbin.dev/api', json=data)

    # Sample data list
    data_list = [{'name': 'User1'}, {'name': 'User2'}]

    threads = []
    for data in data_list:
        thread = threading.Thread(target=post_data, args=(data,))
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()
For more background on concurrency, see another post of mine: Python中的并发与并行.
8. Common Exceptions

1. requests.exceptions.JSONDecodeError

If the response body is not JSON and you call response.json() anyway, a requests.exceptions.JSONDecodeError is raised. The full error output looks roughly like this:
    Traceback (most recent call last):
      File "myenv_path\Lib\site-packages\requests\models.py", line 974, in json
        return complexjson.loads(self.text, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\json\__init__.py", line 346, in loads
        return _default_decoder.decode(s)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\json\decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\json\decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "tryrequests1.py", line 6, in <module>
        print(response.json())           # parse the body into a dict if it is JSON
              ^^^^^^^^^^^^^^^
      File "myenv_path\Lib\site-packages\requests\models.py", line 978, in json
        raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
    requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
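A minimal sketch of guarding against this error (the /html endpoint is just an example of a non-JSON response):

    import requests

    response = requests.get('https://httpbin.org/html')  # returns HTML, not JSON
    try:
        data = response.json()
    except requests.exceptions.JSONDecodeError:
        data = None
        print('Response is not JSON; first 80 characters:', response.text[:80])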
2. requests.exceptions.Timeout

The server did not return a response within the time set by the timeout parameter.
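For finer control, requests also accepts a (connect timeout, read timeout) tuple; a short sketch (the values are arbitrary):

    import requests

    try:
        # up to 3 seconds to establish the connection, up to 10 seconds to read the response
        response = requests.get('https://httpbin.org/delay/3', timeout=(3, 10))
    except requests.exceptions.Timeout:
        print('Connection or read timed out')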
3. requests.exceptions.ProxyError: HTTPSConnectionPool

The URL could not be reached.
Sometimes the network is only temporarily unstable and simply retrying a few times is enough. For retry strategies, see another post of mine: Python3:在访问不可靠服务时的重试计谋(持续更新ing…); a minimal retry sketch is also shown after the traceback below.
A typical full error output caused by transient network instability:
    Traceback (most recent call last):
      File "myenv_path\Lib\site-packages\urllib3\connectionpool.py", line 789, in urlopen
        response = self._make_request(
                   ^^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\site-packages\urllib3\connectionpool.py", line 536, in _make_request
        response = conn.getresponse()
                   ^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\site-packages\urllib3\connection.py", line 507, in getresponse
        httplib_response = super().getresponse()
                           ^^^^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\http\client.py", line 1374, in getresponse
        response.begin()
      File "myenv_path\Lib\http\client.py", line 318, in begin
        version, status, reason = self._read_status()
                                  ^^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\http\client.py", line 287, in _read_status
        raise RemoteDisconnected("Remote end closed connection without"
    http.client.RemoteDisconnected: Remote end closed connection without response

    The above exception was the direct cause of the following exception:

    urllib3.exceptions.ProxyError: ('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response'))

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "myenv_path\Lib\site-packages\requests\adapters.py", line 667, in send
        resp = conn.urlopen(
               ^^^^^^^^^^^^^
      File "myenv_path\Lib\site-packages\urllib3\connectionpool.py", line 843, in urlopen
        retries = retries.increment(
                  ^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\site-packages\urllib3\util\retry.py", line 519, in increment
        raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /cookies (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response')))

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "tryrequests1.py", line 5, in <module>
        response = s.get('https://httpbin.org/cookies')
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\site-packages\requests\sessions.py", line 602, in get
        return self.request("GET", url, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\site-packages\requests\sessions.py", line 589, in request
        resp = self.send(prep, **send_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\site-packages\requests\sessions.py", line 703, in send
        r = adapter.send(request, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "myenv_path\Lib\site-packages\requests\adapters.py", line 694, in send
        raise ProxyError(e, request=request)
    requests.exceptions.ProxyError: HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /cookies (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response')))
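As mentioned above, one simple retry strategy is to mount urllib3's Retry on the session via an HTTPAdapter; a minimal sketch (the retry count, backoff factor, and status list are arbitrary illustrations):

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    session = requests.Session()
    # Retry up to 3 times on connection errors and selected 5xx responses,
    # with exponential backoff between attempts
    retries = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
    adapter = HTTPAdapter(max_retries=retries)
    session.mount('https://', adapter)
    session.mount('http://', adapter)

    response = session.get('https://httpbin.org/get', timeout=5)
    print(response.status_code)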
9. Practical Example: Scraping the Douban Movie Top 250

    import requests
    from bs4 import BeautifulSoup

    headers = {'User-Agent': 'Mozilla/5.0'}
    for start in range(0, 250, 25):
        url = f'https://movie.douban.com/top250?start={start}'
        r = requests.get(url, headers=headers)
        soup = BeautifulSoup(r.text, 'html.parser')
        titles = soup.find_all('span', class_='title')
        for title in titles:
            print(title.text)
Other References Consulted for This Post


  • What is the difference between the ‘json’ and ‘data’ parameters in Requests? | WebScraping.AI
  • python requests.post() 请求中 json 和 data 的区别 - 小嘉欣 - 博客园
  • Python requests.post()方法中data和json参数的使用_requests.post中data和json是否可以同时设置-CSDN博客


   

  [1] Python requests POST
