之前已经实现了:
【已解决】Mac中用Selenium自动操作浏览器实现百度搜索
现在继续换用:puppeteer去实现同样效果。
先去看如何搭建环境:
【已解决】Mac中初始化搭建Python版puppeteer的pyppeteer的开发环境
然后再去看,如何定位元素:
【已解决】pyppeteer中如何定位查找百度搜索输入框
搞懂基本操作逻辑后,后续即可方便的实现后续逻辑了。
接下来是,触发搜索
方式1:点击 百度一下 按钮
方式2:输入回车键
分别去试试
【已解决】pyppeteer中如何模拟输入回车键
以及试试:定位 百度一下 按钮,并点击
先去看看元素

<input type="submit" id="su" value="百度一下" class="bg s_btn">
去试试:
SearchButtonSelector = "input[id='su']"
searchButtonElem = await page.querySelector(SearchButtonSelector)
print("searchButtonElem=%s" % searchButtonElem)
searchButtonElem.click()但是没效果
看到警告:
/Users/crifan/dev/dev_root/python/puppeteerBaiduSearch/puppeteerBaiduSearch.py:50: RuntimeWarning: coroutine 'ElementHandle.click' was never awaited searchButtonElem.click()
所以要去加上await
await searchButtonElem.click()
结果:

就可以了。
目前代码是:
################################################################################
# Trigger search
################################################################################
# Method 1: press ENTER key
# await page.keyboard.press('Enter')
# Method 2: locator search button then click
SearchButtonSelector = "input[id='su']"
searchButtonElem = await page.querySelector(SearchButtonSelector)
print("searchButtonElem=%s" % searchButtonElem)
await searchButtonElem.click()即可。
另外看到
await (await page.$('input[type="text"]')).press('Enter'); // Enter Key去试试
await searchButtonElem.press("Enter")结果:

也是可以的。
即,点击某个按钮,可以用:
用该元素的click
await searchButtonElem.click()
也可以用:
用该元素去触发回车键:
await searchButtonElem.press("Enter")至此,已经可以触发搜索了。
接着再去看看,如何提取搜索结果标题和链接:
【已解决】pyppeteer中提取百度搜索结果中的信息
最终代码:
# Function: pyppeteer (python version puppeteer) do baidu search
# Author: Crifan Li
# Update: 20210330
import asyncio
from pyppeteer import launch
async def main():
browser = await launch(headless=False)
page = await browser.newPage()
await page.setJavaScriptEnabled(enabled=True)
baiduUrl = "https://www.baidu.com"
await page.goto(baiduUrl)
# await page.screenshot({'path': 'baidu.png'})
################################################################################
# Input text
################################################################################
searchStr = "crifan"
# SearchInputSelector = "input[id=kw]"
SearchInputSelector = "input[id='kw']"
# SearchInputXpath = "//input[@id='kw']"
# searchInputElem = page.xpath(SearchInputXpath)
# # Input method 1: selector + click + keyboard type
# searchInputElem = await page.querySelector(SearchInputSelector)
# print("searchInputElem=%s" % searchInputElem)
# await searchInputElem.click()
# await page.keyboard.type(searchStr)
# Input method 2: focus then type
# await page.focus(SearchInputSelector)
# await page.keyboard.type(searchStr)
# Input method 3: selector and input once using type
await page.type(SearchInputSelector, searchStr, delay=20)
################################################################################
# Trigger search
################################################################################
# Method 1: press ENTER key
await page.keyboard.press('Enter')
# # Method 2: locator search button then click
# SearchButtonSelector = "input[id='su']"
# searchButtonElem = await page.querySelector(SearchButtonSelector)
# print("searchButtonElem=%s" % searchButtonElem)
# await searchButtonElem.click()
# # await searchButtonElem.press("Enter")
################################################################################
# Wait page reload complete
################################################################################
SearchFoundWordsSelector = 'span.nums_text'
SearchFoundWordsXpath = "//span[@class='nums_text']"
# await page.waitForSelector(SearchFoundWordsSelector)
# await page.waitFor(SearchFoundWordsSelector)
# await page.waitForXPath(SearchFoundWordsXpath)
# Note: all above exception: 发生异常: ElementHandleError Evaluation failed: TypeError: MutationObserver is not a constructor
# so change to following
# # Method 1: just wait
# await page.waitFor(2000) # millisecond
# Method 2: wait element showing
SingleWaitSeconds = 1
while not await page.querySelector(SearchFoundWordsSelector):
print("Still not found %s, wait %s seconds" % (SearchFoundWordsSelector, SingleWaitSeconds))
await asyncio.sleep(SingleWaitSeconds)
# pass
################################################################################
# Extract result
################################################################################
resultASelector = "h3[class^='t'] a"
searchResultAList = await page.querySelectorAll(resultASelector)
# print("searchResultAList=%s" % searchResultAList)
searchResultANum = len(searchResultAList)
print("Found %s search result:" % searchResultANum)
for curIdx, aElem in enumerate(searchResultAList):
curNum = curIdx + 1
print("%s [%d] %s" % ("-"*20, curNum, "-"*20))
aTextJSHandle = await aElem.getProperty('textContent')
# print("type(aTextJSHandle)=%s" % type(aTextJSHandle))
# type(aTextJSHandle)=<class 'pyppeteer.execution_context.JSHandle'>
# print("aTextJSHandle=%s" % aTextJSHandle)
# aTextJSHandle=<pyppeteer.execution_context.JSHandle object at 0x10309c9b0>
title = await aTextJSHandle.jsonValue()
# print("type(title)=%s" % type(title))
# type(title)=<class 'str'>
print("title=%s" % title)
baiduLinkUrl = await (await aElem.getProperty("href")).jsonValue()
print("baiduLinkUrl=%s" % baiduLinkUrl)
await browser.close()
asyncio.get_event_loop().run_until_complete(main())输出:
Still not found span.nums_text, wait 1 seconds Found 10 search result: -------------------- [1] -------------------- title=在路上on the way - 走别人没走过的路,让别人有路可走 baiduLinkUrl=http://www.baidu.com/link?url=eGTzEXXlMw-hnvXYSFk8t4VSZPck1dougn7YhfCwBf3ZzGJEHdZYsoAQK-4GBJuP -------------------- [2] -------------------- title=crifan – 在路上 baiduLinkUrl=http://www.baidu.com/link?url=l6jXejlgARrWj34ODgKWZ9BeNKwyYZLRhLb5B8oDFVqNpHoco8a_qbAdD1m-t_cf -------------------- [3] -------------------- title=crifan简介_crifan的专栏-CSDN博客_crifan baiduLinkUrl=http://www.baidu.com/link?url=IIqPM5wuVE_QP7S357-1bJWGGU1kpFcAZ945BaXUQNpaDzXihf_98wAVi05Gk6-8Qu4aGLv2Rv65WJm6Qr5kk_ -------------------- [4] -------------------- title=crifan的微博_微博 baiduLinkUrl=http://www.baidu.com/link?url=NnqeMlu4Jr_Ld-zoui8pbQO4eRMMO9pLd_DHXagqcdZ46NF4CSuyEziKSTpqCNEi -------------------- [5] -------------------- title=Crifan的电子书大全 | crifan.github.io baiduLinkUrl=http://www.baidu.com/link?url=uOZ-AmgNBNr3mGdETezIjTvtedH_ueM6-LNOc2QxbjcNeS8LuVBY-kirwogX7qLl -------------------- [6] -------------------- title=GitHub - crifan/crifanLib: crifan's library baiduLinkUrl=http://www.baidu.com/link?url=t42I1rYfn32DGw9C6cw_5lB-z1worKzEuROOtWj-Jyf1l2IBNBcz-l85hSKv9s9T -------------------- [7] -------------------- title=在路上www.crifan.org - 网站排行榜 baiduLinkUrl=http://www.baidu.com/link?url=WwLwfXA72vK08Obyx2hwqA3-wmq8jAisi4VVSt2R0Ml3ccCy_yxeYfxD2xouAX-i5AyUU1U_2EghwVbJ2p-ipa -------------------- [8] -------------------- title=crifan的专栏_crifan_CSDN博客-crifan领域博主 baiduLinkUrl=http://www.baidu.com/link?url=Cmcn2mXwiZr87FBGQBq85Np0hgGTP_AK2yLUW6GDeA21r7Q5WvUOUjaKZo5Jhb0f -------------------- [9] -------------------- title=User crifan - Stack Overflow baiduLinkUrl=http://www.baidu.com/link?url=yGgsq1z2vNDAAeWY-5VDWbHv7e7zPILHI4GVFPZd6MaFrGjYHsb3Onir1Vi6vvZqD7QAGJrZehIYZpcBfh_Gq_ -------------------- [10] -------------------- title=crifan - Bing 词典 baiduLinkUrl=http://www.baidu.com/link?url=UatxhUBL3T_1ikPco5OazvJaWkVqCeCHh4eoA6AX_lP4t_Bx3GVHlMHZjgu3YAwE
效果:


附录:
官网资料:
- 谷歌的JS的Puppeteer
- Quick start | Tools for Web Developers | Google Developers
- Python的pyppeteer
- 旧的:miyakogi
- miyakogi/pyppeteer: Headless chrome/chromium automation library (unofficial port of puppeteer)
- Pyppeteer’s documentation — Pyppeteer 0.0.25 documentation
- API Reference — Pyppeteer 0.0.25 documentation
- 新的:pyppeteer
- pyppeteer/pyppeteer: Headless chrome/chromium automation library (unofficial port of puppeteer)
- Pyppeteer’s documentation — Pyppeteer 0.0.25 documentation
其他:
如何写Selector可参考:
转载请注明:在路上 » 【已解决】Mac中用puppeteer自动操作浏览器实现百度搜索