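Working on:
[Unsolved] Automating the browser with Playwright on Mac to run a Baidu search
Along the way, the Baidu search can already be triggered; the next step is to extract the results.
Once an element has been obtained,
various operations are possible on it,
including reading values: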
- * elementHandle.innerHTML()
- * elementHandle.innerText()
- * elementHandle.textContent()
- * jsHandle.getProperties()
- * jsHandle.jsonValue()
ElementHandle represents an in-page DOM element. ElementHandles can be created with the page.$(selector) method.
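So the docs say to use:
page.$(selector)
to select and obtain an element.
Let's try it.
But first, figure out how to select an element at all.
Searched for: playwright select element
Better to read the Core concepts first: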
- * Browser
- * Browser contexts
- * Pages and frames
- * Selectors
- * Auto-waiting
- * Execution contexts: Playwright and Browser
- * Evaluation Argument
page has:
- page.goto
- page.fill
- page.click
This shows how to locate and obtain an element:
// Get frame using any other selector
const frameElementHandle = await page.$('.frame-class');
That is:
page.$(someSelector)
Next, look at the Page docs.
Found:
- * page.$(selector)
- Page | Playwright
- returns a single element (the first match)
- * page.$$(selector)
- Page | Playwright
- returns all matching elements
- * page.$eval(selector, pageFunction[, arg])
- * page.$$eval(selector, pageFunction[, arg])
“page.$(selector)
* selector <string> A selector to query for. See working with selectors for more details.
* returns: <Promise<null|ElementHandle>>
The method finds an element matching the specified selector within the page. If no elements match the selector, the return value resolves to null.
Shortcut for main frame’s frame.$(selector).”
page.$ takes a selector and returns either null or an element handle (ElementHandle).
Let's try it:
resultASelector = "h3[class^='t'] a"
searchResultAList = page.$$(resultASelector)
Result:
a syntax error:
    searchResultAList = page.$$(resultASelector)
                             ^
SyntaxError: invalid syntax

So $$ is JavaScript syntax, not valid Python syntax here.
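The cause can be confirmed without launching a browser by asking Python to parse the expression. This is a minimal sketch using only the standard library; the helper name `parses_as_python` and the method name `some_method` are made up for illustration and are not part of Playwright.

```python
def parses_as_python(source: str) -> bool:
    """Return True if `source` can be parsed as a Python expression."""
    try:
        # compile() only parses here; nothing is executed, so `page` need not exist.
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


# `$` is not a legal character in a Python identifier, so this fails to parse.
js_style_ok = parses_as_python("page.$$(resultASelector)")

# A method name made of letters and underscores parses fine.
# `some_method` is a placeholder that is only parsed, never called.
snake_style_ok = parses_as_python("page.some_method(resultASelector)")

print("page.$$(...) parses:", js_style_ok)
print("page.some_method(...) parses:", snake_style_ok)
print("'$$' is an identifier:", "$$".isidentifier())
```

Because the failure happens at parse time, it shows up before any Playwright code runs, which matches the SyntaxError seen above.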
-> Need to find the Python Playwright equivalent of page.$$
in the Python version of the docs.
Found it:
href_element = page.query_selector("a")
href_element.click()
Clear enough: use page.query_selector
->
- * element_handle.query_selector(selector)
- * element_handle.query_selector_all(selector)
“element_handle.query_selector(selector)
* selector <str> A selector to query for. See working with selectors for more details.
* returns: <NoneType|ElementHandle>
The method finds an element matching the specified selector in the ElementHandle’s subtree. See Working with selectors for more details. If no elements match the selector, returns null.
element_handle.query_selector_all(selector)
* selector <str> A selector to query for. See working with selectors for more details.
* returns: <List[ElementHandle]>
The method finds all elements matching the specified selector in the ElementHandle’s subtree. See Working with selectors for more details. If no elements match the selector, returns empty array.”
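Note that these methods
only search among the descendants of the current element,
whereas here we want to search the whole page.
So look at the Page docs again.
Sure enough, they exist there too: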
- * page.query_selector(selector)
- * page.query_selector_all(selector)
“page.query_selector(selector)
* selector <str> A selector to query for. See working with selectors for more details.
* returns: <NoneType|ElementHandle>
The method finds an element matching the specified selector within the page. If no elements match the selector, the return value resolves to null.
Shortcut for main frame’s frame.query_selector(selector).
page.query_selector_all(selector)
* selector <str> A selector to query for. See working with selectors for more details.
* returns: <List[ElementHandle]>
The method finds all elements matching the specified selector within the page. If no elements match the selector, the return value resolves to [].
Shortcut for main frame’s frame.query_selector_all(selector).”
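The two return types quoted above (a single handle or nothing, versus a possibly empty list) call for slightly different handling. The following is a minimal sketch assuming the synchronous Playwright API, where `page` is something like a `playwright.sync_api.Page`; the helper names `first_text` and `all_texts` are hypothetical.

```python
from typing import Any, List, Optional


def first_text(page: Any, selector: str) -> Optional[str]:
    """Text of the first element matching `selector`, or None if nothing matches."""
    handle = page.query_selector(selector)  # None when there is no match
    if handle is None:
        return None
    return handle.text_content()


def all_texts(page: Any, selector: str) -> List[Optional[str]]:
    """Texts of every match; an empty list when nothing matches."""
    # query_selector_all returns a list, so no None check is needed before looping.
    return [handle.text_content() for handle in page.query_selector_all(selector)]
```

Checking for `None` before calling a method on the handle avoids an `AttributeError` when the selector matches nothing, while the list version degrades naturally to an empty result.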
This is exactly what we want:
a way to find the elements we need.
That is:
- element_handle
- element_handle.query_selector(selector)
- element_handle.query_selector_all(selector)
- page
- page.query_selector(selector)
- page.query_selector_all(selector)
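The difference in scope between the two groups can be sketched as follows: the page-level call searches the whole document, while the same method on an element handle searches only inside that element. This assumes the synchronous API; `links_per_heading` and its default `"h3"` selector are illustrative choices, not taken from the Playwright docs.

```python
from typing import Any, List, Tuple


def links_per_heading(page: Any, heading_selector: str = "h3") -> List[Tuple[str, int]]:
    """For each heading on the page, count the <a> elements inside that heading only."""
    rows: List[Tuple[str, int]] = []
    # Page-level query: looks through the whole page.
    for heading in page.query_selector_all(heading_selector):
        # Element-level query: limited to this heading's subtree.
        inner_links = heading.query_selector_all("a")
        rows.append((heading.text_content() or "", len(inner_links)))
    return rows
```

With a real page, `links_per_heading(page)` would return pairs such as a heading's text and the number of links nested in it; links elsewhere on the page are not counted.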
Writing the code:
resultASelector = "h3[class^='t'] a"
searchResultAList = page.query_selector_all(resultASelector)
Result:

Each item is of type:
<JSHandle preview=JSHandle@node>
searchResultAList=[<JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>]
Next, read the values from each element:
- * element_handle.get_attribute(name)
- * element_handle.inner_html()
- * element_handle.inner_text()
- * element_handle.text_content()
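These four getters return different views of the same element: `get_attribute` gives an attribute value (or `None` if the attribute is absent), `inner_html` the markup inside the element, `inner_text` the rendered text, and `text_content` the raw text of the node and its descendants. A small sketch that collects all four, assuming the synchronous API; the function name `read_values` and the dictionary keys are hypothetical.

```python
from typing import Any, Dict, Optional


def read_values(handle: Any, attr_name: str = "href") -> Dict[str, Optional[str]]:
    """Collect the common readable values from one element handle."""
    return {
        "attribute": handle.get_attribute(attr_name),  # None if the attribute is missing
        "inner_html": handle.inner_html(),             # markup inside the element
        "inner_text": handle.inner_text(),             # text as rendered
        "text_content": handle.text_content(),         # raw text, including hidden nodes
    }
```

For a search result link, `text_content` and `get_attribute("href")` are usually the two values of interest: the title and the target URL.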
When running in batch, a similar problem also showed up:
[Solved] Python Playwright: page.query_selector_all cannot find elements
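The fix from the linked post is not reproduced here. One common cause of this symptom, assumed for this sketch, is that the query runs before the results have rendered, in which case waiting for the selector first helps. The sketch uses the synchronous API's `page.wait_for_selector`; the helper name `query_all_after_wait` and the 10-second default are illustrative.

```python
from typing import Any, List


def query_all_after_wait(page: Any, selector: str, timeout_ms: float = 10000) -> List[Any]:
    """Wait until at least one match is present, then return all matches."""
    # wait_for_selector blocks until a matching element is visible (the default state),
    # and raises a timeout error if none appears within `timeout_ms` milliseconds.
    page.wait_for_selector(selector, timeout=timeout_ms)
    return page.query_selector_all(selector)
```

Calling `query_all_after_wait(page, "h3[class^='t'] a")` in place of a bare `page.query_selector_all(...)` trades a short wait for a result list that is less likely to be empty because of timing.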
Moving on.
[Summary]
The final code:
################################################################################
# Extract content
################################################################################
resultASelector = "h3[class^='t'] a"
searchResultAList = page.query_selector_all(resultASelector)
print("searchResultAList=%s" % searchResultAList)
# searchResultAList=[<JSHandle preview=JSHandle@<a target="_blank" href="http://www.baidu.com/link?…>在路上on the way - 走别人没走过的路,让别人有路可走</a>>, <JSHandle preview=JSHandle@node>, ..., <JSHandle preview=JSHandle@node>]
searchResultANum = len(searchResultAList)
print("Found %s search result:" % searchResultANum)
for curIdx, aElem in enumerate(searchResultAList):
    curNum = curIdx + 1
    print("%s [%d] %s" % ("-"*20, curNum, "-"*20))
    title = aElem.text_content()
    print("title=%s" % title)
    baiduLinkUrl = aElem.get_attribute("href")
    print("baiduLinkUrl=%s" % baiduLinkUrl)
This parses and extracts the content of the Baidu search results:
searchResultAList=[<JSHandle preview=JSHandle@<a target="_blank" href="http://www.baidu.com/link?…>在路上on the way - 走别人没走过的路,让别人有路可走</a>>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>, <JSHandle preview=JSHandle@node>]
Found 10 search result:
-------------------- [1] --------------------
title=在路上on the way - 走别人没走过的路,让别人有路可走
baiduLinkUrl=http://www.baidu.com/link?url=fB3F0xZmwig9r2M_1pK4BJG00xFPLjJ85X39GheP_fzEA_zJIjX-IleEH_ZL8pfo
-------------------- [2] --------------------
title=crifan – 在路上
baiduLinkUrl=http://www.baidu.com/link?url=kmvgD1PraoULnnjUvNPQmwHFQ9uUKkXg_HWy0NI3xI11cV7evpdxyA_4FkVf3zLH
-------------------- [3] --------------------
title=crifan简介_crifan的专栏-CSDN博客_crifan
baiduLinkUrl=http://www.baidu.com/link?url=CHLWAQKOMgb23GmzVCZRIVze9kBNu6DIVoSWQqe21bWq_qZk2zDu_V3pDC1o1i5WC8qXAbUhaBIN8UO9Sjzxfa
-------------------- [4] --------------------
title=crifan的微博_微博
baiduLinkUrl=http://www.baidu.com/link?url=-QwlZ5SEmZD1R2QqdsK7ByUhxmIdX_hiFCX79hg9RTbQ11j5wXaBaYXegXU9WDk3
-------------------- [5] --------------------
title=Crifan的电子书大全 | crifan.github.io
baiduLinkUrl=http://www.baidu.com/link?url=Sgrbyd_pBsm-BTANKwSmyveSWvWj2_IqOOZzYw-SkG8tQ_C6Ccz88zZxHf3Eh1JA
-------------------- [6] --------------------
title=GitHub - crifan/crifanLib: crifan's library
baiduLinkUrl=http://www.baidu.com/link?url=NSZ5IzQ2Qag3CpGLMAbJer3QaAqI7qZOp2Ythiw6o8inoDX-5LqlzOKWTrMzQK5G
-------------------- [7] --------------------
title=在路上www.crifan.com - 网站排行榜
baiduLinkUrl=http://www.baidu.com/link?url=Tc4cbETNKpQXj-kX1pwSOcPG8l9ijRRPqokRSMHgB59rSn6GoWSBzCPu3ky3dN6Cu1pb-4HBZ2_YaVyS7qdDS_
-------------------- [8] --------------------
title=crifan的专栏_crifan_CSDN博客-crifan领域博主
baiduLinkUrl=http://www.baidu.com/link?url=OLkrWu8q9SRZuBN-KzEMO56f82IpIfvbOp-sU3jdjbVBPP3GXBw_8StJgYG-_QrK
-------------------- [9] --------------------
title=User crifan - Stack Overflow
baiduLinkUrl=http://www.baidu.com/link?url=t1rc0EGg33A-uJUiZHKkUWA8ETf6B5P8pBKo0yNCH-VTWluW3xqUlYRHjMz8bQdiN2mJROMhfkX6bY0db_bB_a
-------------------- [10] --------------------
title=crifan - Bing 词典
baiduLinkUrl=http://www.baidu.com/link?url=8z-3hYeLAQ8T4efOf4848LtAdpGdR1Ect9au4JIUB32bm2z412RDsMelFW1R2aIk
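For reuse, the same extraction can be wrapped in a function that returns structured records instead of printing them. This is a sketch under stated assumptions, not the author's code: `extract_results` and `run_baidu_search` are hypothetical names, and the `#kw` (search box) and `#su` (search button) selectors are assumptions about Baidu's page that do not appear in this post. Only the result selector comes from the code above.

```python
from typing import Any, Dict, List, Optional

RESULT_LINK_SELECTOR = "h3[class^='t'] a"  # selector taken from the post's code


def extract_results(page: Any, selector: str = RESULT_LINK_SELECTOR) -> List[Dict[str, Optional[str]]]:
    """Return one {"title", "url"} record per matching result link."""
    results: List[Dict[str, Optional[str]]] = []
    for link in page.query_selector_all(selector):
        results.append({
            "title": link.text_content(),
            "url": link.get_attribute("href"),
        })
    return results


def run_baidu_search(keyword: str) -> List[Dict[str, Optional[str]]]:
    """Assumed wiring; requires `pip install playwright` and `playwright install`."""
    # Imported here so extract_results can be used and tested without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://www.baidu.com/")
        page.fill("#kw", keyword)   # assumption: id of the search input
        page.click("#su")           # assumption: id of the search button
        page.wait_for_selector(RESULT_LINK_SELECTOR)
        results = extract_results(page)
        browser.close()
        return results
```

Something like `run_baidu_search("crifan")` would then return a list of dictionaries that can be written to JSON or CSV. Since Baidu's markup changes over time, the selectors may need adjusting.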
The result:

Replied to the post.