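While working on:
【Unresolved】Crawling the English textbook e-book resources on mp.codeup.cn
I'm now trying to write code that simulates the requests made by mp.codeup.cn.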
The request to simulate is:
1. Request URL: https://biz.bookln.cn/ebookpageservices/queryAllPageByEbookId.do
2. Request Method: POST
3. Status Code: 200
4. Remote Address: 1xxx.210:443
5. Referrer Policy: no-referrer-when-downgrade

Request Headers:
1. :authority: biz.bookln.cn
2. :method: POST
3. :path: /ebookpageservices/queryAllPageByEbookId.do
4. content-type: application/x-www-form-urlencoded
5. accept: application/json, text/javascript, */*; q=0.01
6. content-length: 118

Form Data, view source:
ebookId=52365&_timestamp=1583157835&_nonce=491fd5fc-b046-4bd7-870b-ccae94ccc23b&_sign=47CBFDFACD3E0A0746E2391C7F78AD00

Form Data, parsed:
1. ebookId: 52365
2. _timestamp: 1583157835
3. _nonce: 491fd5fc-b046-4bd7-870b-ccae94ccc23b
4. _sign: 47CBFDFACD3E0A0746E2391C7F78AD00
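As a quick consistency check on the capture above, the parsed Form Data fields can be URL-encoded back into the raw "view source" string with the standard library, and the result is exactly 118 bytes, matching content-length. The field values are copied from the capture; the variable names are only for illustration.

```python
from urllib.parse import urlencode

# Form fields copied from the captured request, in the order DevTools shows them
captured_form = {
    "ebookId": 52365,
    "_timestamp": 1583157835,
    "_nonce": "491fd5fc-b046-4bd7-870b-ccae94ccc23b",
    "_sign": "47CBFDFACD3E0A0746E2391C7F78AD00",
}

# urlencode joins key=value pairs with '&'; these values contain only
# unreserved characters (digits, letters, '-', '_'), so nothing is percent-escaped
body = urlencode(captured_form)

print(body)
print(len(body.encode("ascii")))  # 118, same as the captured content-length
```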
At first glance, _timestamp, _nonce and _sign look like they may be somewhat tricky to reproduce.
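Before porting any JS, the captured values themselves hint at their formats. The stdlib sketch below decodes _timestamp as Unix seconds and parses _nonce as a UUID. That these are the intended meanings is an assumption drawn only from the values' shapes, not from the site's code. _sign is 32 uppercase hex characters (128 bits), which would fit an MD5-style digest, but how it is actually computed is still unknown.

```python
import re
import uuid
from datetime import datetime, timezone

captured = {
    "_timestamp": "1583157835",
    "_nonce": "491fd5fc-b046-4bd7-870b-ccae94ccc23b",
    "_sign": "47CBFDFACD3E0A0746E2391C7F78AD00",
}

# Assumption: _timestamp is seconds since the Unix epoch
ts = datetime.fromtimestamp(int(captured["_timestamp"]), tz=timezone.utc)

# Assumption: _nonce is a random UUID; uuid.UUID() raises ValueError if it is not UUID-shaped
nonce = uuid.UUID(captured["_nonce"])

# _sign: only its shape is checked here (32 uppercase hex chars = 128 bits)
sign_is_128bit_hex = re.fullmatch(r"[0-9A-F]{32}", captured["_sign"]) is not None

print(ts.isoformat())      # 2020-03-02T14:03:55+00:00
print(nonce.version)       # 4
print(sign_is_128bit_hex)  # True
```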
That said, if only these 2 books were needed, there would be no need to simulate anything: the saved JSON could be used directly.
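For the two already-captured books, reusing the saved responses could be as simple as the sketch below. The one-file-per-ebookId layout, the directory name and the helper name load_saved_pages are all hypothetical; the parsed JSON is returned as-is because its structure isn't shown here.

```python
import json
from pathlib import Path

def load_saved_pages(ebook_id, cache_dir="saved_json"):
    """Hypothetical helper: read a previously saved queryAllPageByEbookId.do
    response stored as <cache_dir>/<ebook_id>.json; return None if absent."""
    path = Path(cache_dir) / f"{ebook_id}.json"
    if not path.exists():
        return None  # not saved yet: this book would need the real request
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```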
However, to support more books, I'll still try to simulate the request.
It looks like I first need to figure out how to make requests send a POST whose data is application/x-www-form-urlencoded.
Searching for "requests application/x-www-form-urlencoded" turns up this example:
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
  "origin": "179.13.100.4",
  "files": {},
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  "url": "http://httpbin.org/post",
  "args": {},
  "headers": {
    "Content-Length": "23",
    "Accept-Encoding": "identity, deflate, compress, gzip",
    "Accept": "*/*",
    "User-Agent": "python-requests/0.8.0",
    "Host": "127.0.0.1:7077",
    "Content-Type": "application/x-www-form-urlencoded"
  },
  "data": ""
}
If you POST with data set to a dict, requests form-encodes it and by default sends:
"Content-Type": "application/x-www-form-urlencoded"
To send a JSON string instead:
import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))
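The difference between sending a dict and sending a JSON string comes down to the request body. The stdlib sketch below builds both bodies for the same payload: URL-encoding the dict is essentially what requests does with data=dict, while json.dumps produces the JSON text. Newer requests versions also accept json=payload, which serializes the dict and sets Content-Type: application/json; with data=json.dumps(payload), requests does not set a Content-Type, so you would add that header yourself.

```python
import json
from urllib.parse import urlencode

payload = {"some": "data"}

# Form encoding: the body sent with Content-Type: application/x-www-form-urlencoded
form_body = urlencode(payload)

# JSON encoding: a JSON text; the Content-Type should then be application/json
json_body = json.dumps(payload)

print(form_body)  # some=data
print(json_body)  # {"some": "data"}
```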
My code for this:
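import copy

import requests

# gBookIdList and gHeaders are defined elsewhere in the script
for eachBookId in gBookIdList:
    getAllPageUrl = "https://biz.bookln.cn/ebookpageservices/queryAllPageByEbookId.do"
    # copy the shared headers so per-request changes don't leak into gHeaders
    curHeaders = copy.deepcopy(gHeaders)
    curHeaders["Content-Type"] = "application/x-www-form-urlencoded"
    # only ebookId for now; _timestamp, _nonce and _sign are not sent yet
    postDict = {
        "ebookId": eachBookId
    }
    # send the per-request headers built above
    resp = requests.post(getAllPageUrl, headers=curHeaders, data=postDict)
    print("resp=%s" % resp)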
Let's run it in the debugger first and see. The response:

'{"msg":"服务器繁忙中,请稍后重试!","success":false}\n'
(msg means "Server busy, please try again later!")
Clearly the parameters being sent here are wrong.
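The rejection comes back as a JSON body with success set to false, so checking that field is a simple way to tell a rejected call from a real result. A minimal sketch: the field names msg and success are taken from the response above, the helper name parse_bookln_response is hypothetical, and it assumes (unverified) that a successful response carries "success": true.

```python
import json

def parse_bookln_response(text):
    """Hypothetical helper: return (ok, data), where ok mirrors the 'success' field."""
    data = json.loads(text)
    # The error response above carries "success": false plus a "msg" text;
    # a missing 'success' field is also treated as failure
    return bool(data.get("success")), data

ok, data = parse_bookln_response('{"msg":"服务器繁忙中,请稍后重试!","success":false}\n')
if not ok:
    print("request rejected:", data.get("msg"))
```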
I also added some other headers, though I suspect they don't matter:
curHeaders["Accept"] = "application/json, text/javascript, */*; q=0.01"
curHeaders["origin"] = "http://mp.codeup.cn"
curHeaders["referer"] = "http://mp.codeup.cn/book/sample2.htm?id=%s" % eachBookId
curHeaders["sec-fetch-dest"] = "empty"
curHeaders["sec-fetch-mode"] = "cors"
curHeaders["sec-fetch-site"] = "cross-site"
Result: the problem remains.
So it looks like I need to find a way to implement the sign:
【Unresolved】Analyzing the logic of the core parameters _timestamp, _nonce and _sign on mp.codeup.cn
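There, the JS source code for this logic has already been obtained.
For now I haven't bothered porting it to Python; I'll do that when it's actually needed.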
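Once the JS signing logic is ported, it could be plugged into a form builder like the sketch below. It assumes _timestamp is Unix seconds and _nonce a random UUID4 string (inferred only from the captured values), and it takes the signer as a parameter because the real algorithm is not yet known. build_form and sign_func are hypothetical names, and the dummy signer in the usage line is NOT the real algorithm.

```python
import time
import uuid
from urllib.parse import urlencode

def build_form(ebook_id, sign_func):
    """Assemble the POST form for queryAllPageByEbookId.do.

    sign_func stands in for the not-yet-ported JS logic: it receives the
    unsigned fields and must return the _sign string.
    """
    form = {
        "ebookId": ebook_id,
        "_timestamp": int(time.time()),  # assumption: seconds since the epoch
        "_nonce": str(uuid.uuid4()),     # assumption: random lowercase UUID4
    }
    form["_sign"] = sign_func(dict(form))  # pass a copy so the signer can't mutate it
    return form

# Usage with a dummy signer, just to show the shape of the result
form = build_form(52365, lambda fields: "0" * 32)
print(urlencode(form))
```

The resulting dict could then be passed as data= to requests.post, which sends it form-encoded.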
When reposting, please credit: 在路上 » 【Unresolved】Simulating the queryAllPageByEbookId.do call on mp.codeup.cn to get the returned JSON data