
Python: using seleniumwire to get response data and request parameters

article 2024/11/15 13:39:11

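seleniumwire (Selenium Wire) is a library that extends Selenium WebDriver. It lets you capture and modify HTTP requests and responses while using Selenium for automated web testing or scraping. This is especially useful for automation tasks that need to analyze the data a page loads or handle more complex network interactions.
The steps below show how to use seleniumwire to get request parameters and response data: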

1. Install seleniumwire

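First, make sure Selenium is installed. Then install seleniumwire with pip. Note that the package name on PyPI is selenium-wire, while the name you import is seleniumwire: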

pip install selenium-wire
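To confirm that both packages are actually installed in the interpreter you run your script with, you can query the installed distributions with the standard library's importlib.metadata. This is a small sketch. It looks up distribution names (selenium, selenium-wire), not import names, and the helper installed_version is a hypothetical name used only for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # PyPI distribution names: selenium-wire is imported as `seleniumwire`
    for name in ("selenium", "selenium-wire"):
        print(name, installed_version(name) or "not installed")
```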

2. Write the code

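Using seleniumwire works much like using Selenium, except that you import webdriver from seleniumwire instead of selenium. Every request the browser makes is then recorded in driver.requests.
a. Get the request parameters of a specific endpoint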

from seleniumwire import webdriver  # import webdriver from seleniumwire, not from selenium
from selenium.webdriver.chrome.service import Service
import urllib.parse

# Path to ChromeDriver (when using Chrome)
# Note: adjust this path to match your system
executable_path = 'path/to/your/chromedriver'
service = Service(executable_path=executable_path)

# Initialize the WebDriver
driver = webdriver.Chrome(service=service)

# Open a page
driver.get('http://example.com')

# Look through the captured requests for the endpoint of interest
for request in driver.requests:
    if 'your-api-url' in request.url:  # e.g. if 'https://fp.tongdun.net/web3_8/profile.json?' in request.url:
        # Parse the query string into a dict (for a repeated key only the last value is kept)
        params_ = dict(
            urllib.parse.parse_qsl(urllib.parse.urlsplit(request.url).query))
        print(params_)

# Close the browser
driver.quit()
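The loop above reads only the query string, and dict(parse_qsl(...)) keeps just the last value when a key repeats. For POST endpoints the parameters usually travel in the request body instead, which Selenium Wire exposes as request.body (raw bytes). Here is a minimal sketch of two hypothetical helpers. It assumes the body is either form-encoded or JSON and UTF-8. Anything else is returned unchanged.

```python
import json
from urllib.parse import parse_qs, urlsplit


def query_params(url):
    """Parse a URL's query string into a dict of lists (repeated keys are kept)."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


def body_params(body, content_type):
    """Parse a request body (bytes) according to its Content-Type.

    Assumes UTF-8 and handles only form-encoded and JSON bodies;
    other content types are returned unchanged as bytes.
    """
    content_type = (content_type or "").lower()
    if "application/x-www-form-urlencoded" in content_type:
        return parse_qs(body.decode("utf-8"), keep_blank_values=True)
    if "application/json" in content_type:
        return json.loads(body.decode("utf-8"))
    return body
```

Inside the loop it would be called as body_params(request.body, request.headers['Content-Type']). parse_qs is used instead of parse_qsl so that repeated keys such as a=1&a=3 are not silently collapsed.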

b. Get the response data of a specific endpoint

Simple version:

from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service

# Path to ChromeDriver (when using Chrome)
# Note: adjust this path to match your system
executable_path = 'path/to/your/chromedriver'
service = Service(executable_path=executable_path)

# Initialize the WebDriver
driver = webdriver.Chrome(service=service)

# Open a page
driver.get('http://example.com')

# Inspect the responses of the matching requests
for request in driver.requests:
    if 'your-api-url' in request.url:  # e.g. if 'https://fp.tongdun.net/web3_8/profile.json?' in request.url:
        if request.response:  # skip requests that never received a response
            print(
                request.url,
                request.response.status_code,
                request.response.headers['Content-Encoding'],  # shows whether (and how) the body is compressed
                request.response.headers['Content-Type'],
                len(request.response.body)  # body is raw bytes; decoding it, e.g. .decode('utf-8'), may need the right charset
            )

# Close the browser
driver.quit()
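When a body is decoded to text, the charset to use is normally given by the Content-Type header, e.g. application/json; charset=GBK. This is a sketch, assuming that header is present and the body has already been decompressed. It uses two hypothetical helpers built on the standard library's email.message.Message, which knows how to parse header parameters.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value (lower-cased), or default."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def body_text(body, content_type):
    """Decode already-decompressed body bytes with the charset from Content-Type."""
    charset = charset_from_content_type(content_type)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:  # the header names a charset Python does not know
        return body.decode("utf-8", errors="replace")
```

With seleniumwire this would be body_text(body, request.response.headers['Content-Type']), where body is the response body after any Content-Encoding has been undone.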

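If you decode the body in the simple version above (for example with request.response.body.decode('utf-8'), as hinted in its comment), it sometimes fails with: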

Traceback (most recent call last):
  File "C:\limeixue\workspace\offical_crawl_hx\hx_offical_crawl_yidun\apps\test\tt.py", line 26, in <module>
    len(request.response.body.decode('utf-8'))#request.response.body.decode('utf-8') sometimes needs an explicit encoding
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

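In that case, check the value of request.response.headers['Content-Encoding'] to see whether the body is compressed. The error itself is a hint: gzip data always begins with the bytes 0x1f 0x8b, so 0x8b at position 1 points to a gzip-compressed body. If the body is compressed, say with gzip, decompress it before decoding.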
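The failure can be reproduced without a browser by compressing a payload locally and decoding it the same way. This sketch also shows a check of the first two bytes, which can help when the Content-Encoding header is missing or unreliable. The helper looks_gzipped is a hypothetical name for this example.

```python
import gzip


def looks_gzipped(body):
    """True if the bytes start with the gzip magic number 0x1f 0x8b."""
    return body[:2] == b"\x1f\x8b"


# A locally gzip-compressed JSON payload stands in for a captured response body
payload = gzip.compress('{"code": 0}'.encode("utf-8"))

error = None
try:
    payload.decode("utf-8")  # the same mistake as in the traceback
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; the name `exc` is cleared after the except block

print(error)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
print(looks_gzipped(payload), gzip.decompress(payload).decode("utf-8"))
```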

The modified code:

from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service

import gzip
import io

# Path to ChromeDriver (when using Chrome)
# Note: adjust this path to match your system
executable_path = 'path/to/your/chromedriver'
service = Service(executable_path=executable_path)

# Initialize the WebDriver
driver = webdriver.Chrome(service=service)

# Open a page
driver.get('http://example.com')

# Inspect the responses of the matching requests
for request in driver.requests:
    if 'https://fp.tongdun.net/web3_8/profile.json?' in request.url:
        if request.response:
            # Content-Encoding shows whether (and how) the body is compressed
            content_encoding = request.response.headers['Content-Encoding']
            content_type = request.response.headers['Content-Type']
            status_code = request.response.status_code
            print(request.url, status_code, content_encoding, content_type)
            if content_encoding == 'gzip':
                # Method 1: wrap the raw bytes in a file-like object and read through GzipFile
                compressed_stream = io.BytesIO(request.response.body)  # request.response.body is bytes
                with gzip.GzipFile(fileobj=compressed_stream) as gzipper:
                    decompressed_data = gzipper.read()

                # decompressed_data is bytes; decode it if the content is text
                # (errors='ignore' silently drops any undecodable bytes)
                result = decompressed_data.decode('utf-8', errors='ignore') if decompressed_data else ""
                print(result)

                # Method 2: gzip.decompress takes the bytes directly, no BytesIO wrapper needed
                decompressed_data = gzip.decompress(request.response.body)
                # Process the decompressed data
                data_as_text = decompressed_data.decode('utf-8')
                print(data_as_text)

# Close the browser
driver.quit()
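The modified code only handles gzip. This more general sketch also covers deflate and a missing header, and parses the result as JSON. JSON is an assumption suggested by the example endpoint profile.json, so check what your endpoint actually returns. decompress_body and response_json are hypothetical helper names. Brotli (br) needs the third-party brotli package and is deliberately not handled here. Selenium Wire's README also documents a ready-made helper, seleniumwire.utils.decode(body, content_encoding), which is worth checking against your installed version before writing your own.

```python
import gzip
import json
import zlib


def decompress_body(body, content_encoding):
    """Undo the HTTP Content-Encoding for codings the standard library supports."""
    coding = (content_encoding or "identity").strip().lower()
    if coding in ("identity", ""):
        return body
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if coding == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped deflate, as the HTTP spec intends
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)  # raw deflate sent by some servers
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")


def response_json(body, content_encoding, charset="utf-8"):
    """Decompress a response body and parse it as JSON (assumes a JSON payload)."""
    return json.loads(decompress_body(body, content_encoding).decode(charset))
```

Inside the loop this would be response_json(request.response.body, request.response.headers['Content-Encoding']). Raising on unknown codings, rather than returning the bytes untouched, makes an unexpected br or other coding visible instead of producing another UnicodeDecodeError later.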