The Rose Turns to Mosquito Blood: An Automated Headless-Browser Comparison (Part 2)


Playwright can be driven with Python's native coroutine library: its built-in calls only need the await keyword in front of them, which is very convenient. Selenium, by contrast, has no native async mode and requires third-party extensions to get one.
The coolest feature is that Playwright can record a user's browser actions and convert them into the corresponding code. Run the following command in a terminal:
python -m playwright codegen --target python -o 'edge.py' -b chromium --channel=msedge
This starts a recording session in Edge (Chromium launched through the msedge channel) and writes every action into the file edge.py.
Playwright can also emulate mobile browsers, for example an iPhone:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    iphone_13 = p.devices['iPhone 13 Pro']
    browser = p.webkit.launch(headless=False)
    # pass the device descriptor so viewport/user agent match an iPhone 13 Pro
    page = browser.new_page(**iphone_13)
    page.goto('https://v3u.cn')
    page.screenshot(path='./v3u-iphone.png')
    browser.close()
This saves a screenshot of the site as rendered by the emulated iPhone browser.
Of course, beyond UI testing, we also want it to do some dirty work for us, namely scraping:
from playwright.sync_api import sync_playwright

def extract_data(entry):
    name = entry.locator("h3").inner_text().strip("\n").strip()
    capital = entry.locator("span.country-capital").inner_text()
    population = entry.locator("span.country-population").inner_text()
    area = entry.locator("span.country-area").inner_text()
    return {"name": name, "capital": capital, "population": population, "area (km sq)": area}

with sync_playwright() as p:
    # launch the browser instance and define a new context
    browser = p.chromium.launch()
    context = browser.new_context()
    # open a new tab and go to the website
    page = context.new_page()
    page.goto("https://www.scrapethissite.com/pages/simple/")
    page.wait_for_load_state("load")
    # get the countries
    countries = page.locator("div.country")
    n_countries = countries.count()
    # loop through the elements and scrape the data
    data = []
    for i in range(n_countries):
        entry = countries.nth(i)
        sample = extract_data(entry)
        data.append(sample)
    browser.close()
The data variable now holds the scraped records:
[{'name': 'Andorra', 'capital': 'Andorra la Vella', 'population': '84000', 'area (km sq)': '468.0'},
 {'name': 'United Arab Emirates', 'capital': 'Abu Dhabi', 'population': '4975593', 'area (km sq)': '82880.0'},
 {'name': 'Afghanistan', 'capital': 'Kabul', 'population': '29121286', 'area (km sq)': '647500.0'},
 {'name': 'Antigua and Barbuda', 'capital': "St. John's", 'population': '86754', 'area (km sq)': '443.0'},
 {'name': 'Anguilla', 'capital': 'The Valley', 'population': '13254', 'area (km sq)': '102.0'},
 ...]
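Every scraped value comes back as a string, so a small post-processing step is usually needed before doing any math. A sketch (the field names match the dict returned by extract_data above; the "density" key is an addition for illustration):

```python
def enrich(record: dict) -> dict:
    # the scraper returns every field as a string; cast before computing
    population = int(record["population"])
    area = float(record["area (km sq)"])
    return {
        **record,
        "population": population,
        "area (km sq)": area,
        # people per square kilometre, rounded to 2 decimals
        "density": round(population / area, 2) if area else None,
    }

sample = {"name": "Andorra", "capital": "Andorra la Vella",
          "population": "84000", "area (km sq)": "468.0"}
print(enrich(sample)["density"])  # 179.49
```

Mapping enrich over the data list yields records that can be sorted or filtered numerically.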
Basically, every feature you would expect is there; for more, please refer to the official Playwright documentation.
Selenium was once one of the most popular open-source headless-browser tools for web scraping and web automation. When scraping with Selenium, we can automate the browser, interact with UI elements, and imitate user actions on a web application. Selenium's core components include Selenium WebDriver, Selenium IDE, and Selenium Grid.
For the basics of Selenium, please see my earlier Python 3.7 scraping post on using Selenium to log in and simulate uploading a file through a form; I won't repeat that here.
As mentioned earlier, unlike Playwright, Selenium needs a third-party library to achieve asynchronous concurrent execution, and recording videos of a session also requires an external solution.
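Since Selenium has no asyncio support of its own, the usual workaround is thread-based concurrency: one WebDriver session per worker thread. A sketch of the pattern, with a placeholder scrape_one standing in for real WebDriver calls:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_one(url: str) -> dict:
    # placeholder: in real code this would create a webdriver.Chrome,
    # call driver.get(url), extract data, then driver.quit()
    return {"url": url, "status": "done"}

urls = [f"https://example.com/page/{i}" for i in range(5)]

# each worker thread runs one blocking Selenium session in parallel
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(scrape_one, urls))

print(len(results))  # 5
```

Because each thread owns its own driver, there is no shared mutable browser state to guard; the cost is one full browser process per worker.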
As we did with Playwright, let's build a simple scraper script with Selenium.
First import the necessary modules, configure the Chrome instance, and make sure headless mode is active by setting options.headless = True:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# web driver manager: https://github.com/SergeyPirogov/webdriver_manager
# will help us automatically download the web driver binaries
# then we can use `Service` to manage the web driver's state.
from webdriver_manager.chrome import ChromeDriverManager

def extract_data(row):
    name = row.find_element(By.TAG_NAME, "h3").text.strip("\n").strip()
    capital = row.find_element(By.CSS_SELECTOR, "span.country-capital").text
    population = row.find_element(By.CSS_SELECTOR, "span.country-population").text
    area = row.find_element(By.CSS_SELECTOR, "span.country-area").text
    return {"name": name, "capital": capital, "population": population, "area (km sq)": area}

options = webdriver.ChromeOptions()
options.headless = True

# this returns the path the web driver was downloaded to
chrome_path = ChromeDriverManager().install()

# define the chrome service and pass it to the driver instance
chrome_service = Service(chrome_path)
driver = webdriver.Chrome(service=chrome_service, options=options)

url = "https://www.scrapethissite.com/pages/simple"
driver.get(url)

# get the data divs
countries = driver.find_elements(By.CSS_SELECTOR, "div.country")

# extract the data
data = list(map(extract_data, countries))

driver.quit()