
19 Web Scraping: Logging in to Chaojiying (超级鹰) with Playwright

article 2025/2/6 7:48:53

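This example solves two problems:

(1) How to take screenshots with Playwright, in particular of a CAPTCHA image

(2) How to check whether an element is visible in Playwright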

1. Screenshots

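We start with the first problem, screenshots. Readers who have used Selenium before should find screenshots in Playwright easy to pick up.

Playwright covers three screenshot cases: page screenshots, capturing into a buffer, and element screenshots.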

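A page screenshot captures the page and saves it to a file. The method is page.screenshot(path=<path and file name>). For example, page.screenshot(path='screenshot.png') saves the image as screenshot.png in the current working directory. By default only the visible viewport is captured; to capture the entire scrollable page, pass full_page=True, as in page.screenshot(path='screenshot.png', full_page=True).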

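Capturing into a buffer means using the method's return value instead of a file. In Python, page.screenshot() returns the image as a bytes object, so calling it without path keeps the image in memory, where it can be passed straight to other code such as an OCR library.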

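An element screenshot is like an ordinary region capture, and it is typically used to grab the image of a CAPTCHA or a slider. The method is page.locator(<locator expression>).screenshot(path=<path and file name>). The form screenshot({ path: ... }) is the JavaScript API syntax and does not apply in Python. For example, page.locator('xpath=//form/div/img').screenshot(path='screenshot.png') locates the CAPTCHA image and takes a screenshot of it.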
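The three cases described above can be sketched as small helper functions. This is a minimal sketch, not the author's code: the helper names and the demo() function are hypothetical, and the locator expression is the one quoted in the text. Playwright is imported inside demo() so that the helpers themselves only rely on the page object passed in.

```python
def viewport_screenshot(page, path):
    """Save the visible viewport to `path`; the PNG bytes are also returned."""
    return page.screenshot(path=path)


def full_page_screenshot(page, path):
    """Save the entire scrollable page to `path`."""
    return page.screenshot(path=path, full_page=True)


def screenshot_to_buffer(page):
    """Capture into a buffer: no file is written, only bytes are returned."""
    return page.screenshot()


def element_screenshot(page, selector, path=None):
    """Capture one element, e.g. a CAPTCHA image; `path` is optional."""
    return page.locator(selector).screenshot(path=path)


def demo():
    # Hypothetical demo; requires Playwright and a Chromium install.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto('https://www.chaojiying.com/user/login/')
        full_page_screenshot(page, 'page.png')
        png = element_screenshot(page, 'xpath=//form/div/img')
        print(len(png), 'bytes captured without writing a file')
        browser.close()
```

Calling demo() runs the helpers against the login page. Because element_screenshot returns bytes, the result could be handed directly to an OCR call instead of being written to disk and read back.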

2. Checking whether an element is visible in Playwright

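CAPTCHA solving is not fully reliable, and the program may misread the image. When a login with account, password, and CAPTCHA fails, the page usually shows a message such as wrong account, wrong password, or wrong CAPTCHA. By locating that message and checking whether it is visible on the page, we can tell whether the login succeeded.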

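To check whether an element on the page is visible, use the is_visible() method: page.locator(<locator expression>).is_visible(). It returns True or False immediately and does not wait for the element to appear.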
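Since is_visible() reports the state at the moment of the call, an error message that is rendered slightly after the click can be missed. One possible way around this is to poll for a short time. The helper below is a hypothetical sketch that relies only on the locator's is_visible() method; the timeout and polling interval are assumed values, not ones taken from the article.

```python
import time


def wait_until_visible(locator, timeout_s=3.0, poll_s=0.1):
    """Poll locator.is_visible() until it is True or the timeout expires.

    Returns True if the element became visible in time, otherwise False.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if locator.is_visible():
            return True
        time.sleep(poll_s)
    # One final check once the deadline has passed.
    return locator.is_visible()
```

With a real page this might be called as wait_until_visible(page.locator(<locator expression>)). Playwright locators also provide wait_for(state='visible'), which raises an error on timeout instead of returning False; the polling version is used here only because it returns a plain boolean.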

3. Complete program code

'''
    How to download a CAPTCHA image with playwright

    Mainly an exercise in taking screenshots with playwright

'''

from playwright.sync_api import sync_playwright
import ddddocr

def handle_code():
    '''
    Solve the CAPTCHA saved in screenshot.png with ddddocr
    '''
    with open('screenshot.png', 'rb') as f:
        ocr = ddddocr.DdddOcr(show_ad=False)
        code = ocr.classification(f.read())
    # code is a string
    return code

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()

    page.goto('https://www.chaojiying.com/user/login/')
    page.locator('xpath=/html/body/div[3]/div/div[3]/div[1]/form/div/img').screenshot(path='screenshot.png')

    # with open('screenshot.png', 'rb') as f:
    #     ocr = ddddocr.DdddOcr(show_ad=False)
    #     code = ocr.classification(f.read())
    # print(code,type(code))

    # Replace the placeholders with your own account and password
    page.locator('xpath=/html/body/div[3]/div/div[3]/div[1]/form/p[1]/input').type('your_account',delay=1000)
    page.locator('xpath=/html/body/div[3]/div/div[3]/div[1]/form/p[2]/input').type('your_password',delay=1000)
    page.locator('xpath=/html/body/div[3]/div/div[3]/div[1]/form/p[3]/input').type(handle_code(),delay=1000)
    page.locator('xpath=/html/body/div[3]/div/div[3]/div[1]/form/p[4]/input').click()

    # If the login fails, retry in a while loop: solve the CAPTCHA again, type it in, and check whether the "CAPTCHA error" message appears on the page
    flag = page.locator('xpath=/html/body/div[3]/div/div[1]/span/font').is_visible() # is the CAPTCHA error message shown? If flag is True, the login failed
    while flag:
        print('Verification failed, retrying')
        page.locator('xpath=/html/body/div[3]/div/div[3]/div[1]/form/div/img').screenshot(path='screenshot.png')
        page.locator('xpath=/html/body/div[3]/div/div[3]/div[1]/form/p[3]/input').type(handle_code(),delay=1000)
        page.locator('xpath=/html/body/div[3]/div/div[3]/div[1]/form/p[4]/input').click()
        flag = page.locator('xpath=/html/body/div[3]/div/div[1]/span/font').is_visible()
    print('Login succeeded')
    page.wait_for_timeout(1000)
    page.close()
    context.close()
    browser.close()
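
The while loop in the program keeps retrying for as long as the error message is visible, so it never ends if the OCR keeps misreading the CAPTCHA. A bounded variant could look like the sketch below. The function name, the two callables, and the limit of five attempts are assumptions made for illustration; they are not part of the original program.

```python
def login_with_retries(submit_captcha, login_failed, max_attempts=5):
    """Submit the CAPTCHA until login_failed() returns False.

    submit_captcha: callable that screenshots, solves, types, and clicks.
    login_failed:   callable returning True while the error message shows.
    Returns the number of attempts used; raises RuntimeError on giving up.
    """
    for attempt in range(1, max_attempts + 1):
        submit_captcha()
        if not login_failed():
            return attempt
    raise RuntimeError(f'login still failing after {max_attempts} attempts')
```

In the program above, submit_captcha would wrap the screenshot, handle_code(), type, and click steps, and login_failed would wrap the is_visible() check on the error message.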


http://www.kler.cn/a/533403.html