
[Unsolved] Simulating the queryAllPageByEbookId.do call on mp.codeup.cn to get the JSON response

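Category: Simulation | Author: crifan | 576 views | 0 comments
While working on:
[Unsolved] Crawling the English textbook e-book resources on mp.codeup.cn
I am now trying to write code that simulates the request made by mp.codeup.cn.
The request to simulate: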
1. Request URL: https://biz.bookln.cn/ebookpageservices/queryAllPageByEbookId.do
2. Request Method: POST
3. Status Code: 200
4. Remote Address: 1xxx.210:443
5. Referrer Policy: no-referrer-when-downgrade

Request Headers:
1. :authority: biz.bookln.cn
2. :method: POST
3. :path: /ebookpageservices/queryAllPageByEbookId.do
4. content-type: application/x-www-form-urlencoded
5. accept: application/json, text/javascript, */*; q=0.01
6. content-length: 118


Form Data:

view source:
ebookId=52365&_timestamp=1583157835&_nonce=491fd5fc-b046-4bd7-870b-ccae94ccc23b&_sign=47CBFDFACD3E0A0746E2391C7F78AD00

encoded:
1. ebookId: 52365
2. _timestamp: 1583157835
3. _nonce: 491fd5fc-b046-4bd7-870b-ccae94ccc23b
4. _sign: 47CBFDFACD3E0A0746E2391C7F78AD00
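At a glance, _timestamp, _nonce, and _sign are likely to be the tricky parts.
That said, if I only needed these 2 books, there would be no need to simulate anything: the saved JSON would do.
But to support more books, it is still worth trying to simulate the request.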
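To get a feel for the three extra fields, the captured form body can be decoded with the standard library. This is only a sketch: reading `_timestamp` as Unix seconds and `_nonce` as a UUID is a guess based on the captured values, not something the site documents.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs
import uuid

# Raw form body captured from the browser (the "view source" line above)
raw_body = (
    "ebookId=52365&_timestamp=1583157835"
    "&_nonce=491fd5fc-b046-4bd7-870b-ccae94ccc23b"
    "&_sign=47CBFDFACD3E0A0746E2391C7F78AD00"
)

# parse_qs returns a list of values per key; each key appears once here
fields = {key: values[0] for key, values in parse_qs(raw_body).items()}

# Assumption: _timestamp is Unix time in seconds
captured_at = datetime.fromtimestamp(int(fields["_timestamp"]), tz=timezone.utc)

# Assumption: _nonce is a UUID; uuid.UUID() raises ValueError if it is not
nonce = uuid.UUID(fields["_nonce"])

print(fields["ebookId"])
print(captured_at.isoformat())
print(nonce.version)
print(len(fields["_sign"]))
```

The `_sign` value is 32 hexadecimal characters, which is the length of an MD5 digest, but the length alone does not say how it is computed.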
Quickstart — Requests 2.18.1 documentation
What I need to work out:
how to make requests send a POST whose data is application/x-www-form-urlencoded.
Search: requests application/x-www-form-urlencoded
Sending a POST request with Content-Type application/x-www-form-urlencoded in Python – 梦雨情殇 – 博客园
Four common ways to submit POST data | JerryQu 的小站
Quickstart — Requests 0.8.2 documentation
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print r.content
{
  "origin": "179.13.100.4",
  "files": {},
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  "url": "http://httpbin.org/post",
  "args": {},
  "headers": {
    "Content-Length": "23",
    "Accept-Encoding": "identity, deflate, compress, gzip",
    "Accept": "*/*",
    "User-Agent": "python-requests/0.8.0",
    "Host": "127.0.0.1:7077",
    "Content-Type": "application/x-www-form-urlencoded"
  },
  "data": ""
}
So when you POST directly with a dict as data, the default is:
"Content-Type": "application/x-www-form-urlencoded"
To send a JSON string instead:
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

r = requests.post(url, data=json.dumps(payload))
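The difference between the two body formats can be seen without sending anything. The following is a small standard-library sketch; it only illustrates the two encodings and does not go through requests itself.

```python
import json
from urllib.parse import urlencode

payload = {"key1": "value1", "key2": "value2"}

# Form encoding: what an application/x-www-form-urlencoded body looks like
form_body = urlencode(payload)

# JSON encoding: what json.dumps(payload) puts in the body instead
json_body = json.dumps(payload)

print(form_body)  # key1=value1&key2=value2
print(json_body)  # {"key1": "value1", "key2": "value2"}
print(len(form_body))  # 23
```

The form-encoded body is 23 bytes long, which matches the Content-Length of 23 in the httpbin.org output quoted above.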
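For this code:
import copy
import requests

for eachBookId in gBookIdList:
    getAllPageUrl = "https://biz.bookln.cn/ebookpageservices/queryAllPageByEbookId.do"
    # copy the shared headers so per-request changes do not leak into gHeaders
    curHeaders = copy.deepcopy(gHeaders)
    curHeaders["Content-Type"] = "application/x-www-form-urlencoded"
    postDict = {
      "ebookId": eachBookId
    }
    # pass curHeaders (not gHeaders) so the headers set above are actually sent
    resp = requests.post(getAllPageUrl, headers=curHeaders, data=postDict)
    print("resp=%s" % resp)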
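Debug it first and see what comes back:
'{"msg":"服务器繁忙中,请稍后重试!","success":false}\n'
(The msg means "Server is busy, please try again later!") Clearly, the parameters sent here are not right.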
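Since the server answers with HTTP 200 even when it rejects the request, the JSON body has to be inspected to detect failure. A minimal sketch, assuming the `success` and `msg` fields seen in the response above are present in every reply; `parse_result` is a hypothetical helper name:

```python
import json

# Response text returned by the server in the attempt above
resp_text = '{"msg":"服务器繁忙中,请稍后重试!","success":false}\n'

def parse_result(text):
    """Return (ok, message) from the JSON body of a response."""
    body = json.loads(text)
    # Treat a missing "success" field as failure
    return bool(body.get("success")), body.get("msg", "")

ok, msg = parse_result(resp_text)
print(ok)
print(msg)
```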
I added some other headers, although they probably do not matter:
    curHeaders["Accept"] = "application/json, text/javascript, */*; q=0.01"
    curHeaders["origin"] = "http://mp.codeup.cn"
    curHeaders["referer"] = "http://mp.codeup.cn/book/sample2.htm?id=%s" % eachBookId
    curHeaders["sec-fetch-dest"] = "empty"
    curHeaders["sec-fetch-mode"] = "cors"
    curHeaders["sec-fetch-site"] = "cross-site"
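Result:
The problem remains.
It looks like I have to find a way to implement the sign:
[Unsolved] Analyzing the logic of the core parameters _timestamp, _nonce, and _sign on mp.codeup.cn
The JS source code has already been obtained there.
For now I have not bothered to port it to Python.
I will port it when the need arises.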
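Until the signing logic is ported, the rest of the form can at least be prepared. The following is only a sketch under assumptions: `_timestamp` is taken to be Unix seconds and `_nonce` a random UUID, both guessed from the captured request, and `build_form`, `sign_func`, and `placeholder_sign` are hypothetical names. The real `_sign` algorithm is not known here, so the placeholder will be rejected by the server.

```python
import time
import uuid

def build_form(ebook_id, sign_func):
    """Build the POST form; sign_func stands in for the site's JS signing logic."""
    form = {
        "ebookId": str(ebook_id),
        # Assumption: Unix time in seconds, as the captured value suggests
        "_timestamp": str(int(time.time())),
        # Assumption: a random UUID (version 4), matching the captured format
        "_nonce": str(uuid.uuid4()),
    }
    form["_sign"] = sign_func(form)
    return form

def placeholder_sign(form):
    # NOT the real algorithm: a dummy value with the captured length (32 characters)
    return "0" * 32

form = build_form(52365, placeholder_sign)
print(sorted(form))
```

Once a real signing function exists, the resulting dict could be passed as `data=` to `requests.post`, which form-encodes it as shown earlier.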
