
[Solved] How to send a POST request in PySpider with form data in application/x-www-form-urlencoded format

pyspider crifan

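While working on:

[Solved] Using PySpider to crawl videos from a certain website

I needed to figure out, in PySpider:

how to send a POST request that carries form data in application/x-www-form-urlencoded format

pyspider post with url-encoded

Sending POST requests with pyspider – CSDN blog

python – Using self.crawl in pyspider to POST a username and password to a server: how to fix the encoding error encountered? – SegmentFault

self.crawl – pyspider

self.crawl – pyspider Chinese documentation – pyspider Chinese site

pyspider sample code 6: passing parameters – microman – cnblogs

Then I tried this code: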

<code>    # Methods of the Handler(BaseHandler) class in a PySpider script
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Target markup: <ul class="list-user list-user-1" id="list-user-1">
        for each in response.doc('ul[id^="list-user"] li  a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

        # Request pages 1..maxPageNum of the show list via POST
        maxPageNum = 10
        for curPageIdx in range(maxPageNum):
            curPageNum = curPageIdx + 1
            print("curPageNum=%s" % curPageNum)
            getShowsUrl = "http://xxx/index.php?m=home&c=match_new&a=get_shows"
            headerDict = {
                "Content-Type": "application/x-www-form-urlencoded"
            }
            # Form fields, passed to self.crawl as a dict via data=
            dataDict = {
                "counter": curPageNum,
                "order": 1,
                "match_type": 2,
                "match_name": "",
                "act_id": 3
            }
            self.crawl(
                getShowsUrl,
                method="POST",
                headers=headerDict,
                data=dataDict,
                cookies=response.cookies,
                callback=self.parseGetShowsCallback
            )

    def parseGetShowsCallback(self, response):
        print("parseGetShowsCallback: self=%s, response=%s" % (self, response))
</code>
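For reference, here is a minimal sketch, using only the Python standard library, of what a dict like `dataDict` looks like once serialized as application/x-www-form-urlencoded. According to the PySpider documentation, form-encoding happens automatically when `data` is a dict, so this step is not needed in the handler itself. The helper name `encode_form` is hypothetical and only serves to make the wire format visible.

```python
from urllib.parse import urlencode, parse_qsl


def encode_form(fields):
    """Serialize a dict as an application/x-www-form-urlencoded body."""
    return urlencode(fields)


# Same fields as dataDict in the handler, for page 1
data_dict = {
    "counter": 1,
    "order": 1,
    "match_type": 2,
    "match_name": "",
    "act_id": 3,
}

body = encode_form(data_dict)
print(body)  # counter=1&order=1&match_type=2&match_name=&act_id=3

# Decoding the body gives back the fields (all values become strings)
decoded = dict(parse_qsl(body, keep_blank_values=True))
print(decoded["counter"])  # 1
```

Note that the empty `match_name` is still sent, as `match_name=` with nothing after the equals sign.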

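With this, the request does return a response.

But what I actually want is the corresponding JSON.

So I searched again:

pyspider response json

pyspider sample code 2: parsing JSON data – microman – cnblogs

Response – pyspider

Response – pyspider Chinese documentation – pyspider Chinese site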

<code>    def parseGetShowsCallback(self, response):
        print("parseGetShowsCallback: self=%s, response=%s" % (self, response))
        # response.json parses the response body as JSON
        respJson = response.json
        print("respJson=%s" % respJson)
</code>

This returns the JSON we need:

<code>parseGetShowsCallback: self=<x.x.x.Handler object at 0x10def6ac8>, response=<Response [200]>

respJson={'status': 1, 'data': [{'id': '3293', 'uid': '878964', 'show_id': '104728193', 'course_id': '43716', 'supports': '107', 'rewards': '0', 'shares': '2', 'scores': '65.00', 'status': '1', 'match_type': '2', 'create_time': '1513346405', 'act_id': '3', 'child_type': '1', 'show_score': '100', 'head_img': 'https://x.x.x/avatar_2018-06-02_1527913844_9082951.jpeg', 'cover_img': 'https://x.x.x/2017-02-23/58ae9dec28283.jpg', 'name': '徐欣蕊', 'href': '/index.php?m=home&amp;c=match_new&amp;a=video&amp;show_id=104728193'}, {'id': '489', 'uid': '5697525', 'show_id': '103129621', 'course_id': '17734', 'supports': '104', 'rewards': '0', 'shares': '2', 'scores': '63.20', 'status': '1', 'match_type': '2', 'create_time': '1512737780', 'act_id': '3', 'child_type': '1', 'show_score': '0', 'head_img': 'https://x.x.x/2018-06-23/5b2de55693ad9.jpg', 'cover_img': 'https://x.x.x/2018-06-04/5b14e22b8850a.jpg', 'name': '唐昕玥', 'href': '/index.php?m=home&amp;c=match_new&amp;a=video&amp;show_id=103129621'}, {'id': '9', 'uid': '3977349', 'show_id': '103000234', 'course_id': '41758', 'supports': '94', 'rewards': '0', 'shares': '2', 'scores': '57.20', 'status': '1', 'match_type': '2', 'create_time': '1512685717', 'act_id': '3', 'child_type': '1', 'show_score': '95', 'head_img': 'https://x.x.x/2017-09-11/59b6363a9e099.jpg', 'cover_img': 'https://x.x.x/2017-03-15/58c8abf7eafb6.jpg', 'name': '梁多', 'href': '/index.php?m=home&amp;c=match_new&amp;a=video&amp;show_id=103000234'}, {'id': '460', 'uid': '5697525', 'show_id': '103122827', 'course_id': '41758', 'supports': '93', 'rewards': '0', 'shares': '2', 'scores': '56.60', 'status': '1', 'match_type': '2', 'create_time': '1512737139', 'act_id': '3', 'child_type': '1', 'show_score': '78', 'head_img': 'https://x.x.x/2018-06-23/5b2de55693ad9.jpg', 'cover_img': 'https://x.x.x/2017-03-15/58c8abf7eafb6.jpg', 'name': '唐昕玥', 'href': '/index.php?m=home&amp;c=match_new&amp;a=video&amp;show_id=103122827'}, {'id': '4096', 'uid': '3896494', 'show_id': 
'105000309', 'course_id': '49023', 'supports': '77', 'rewards': '0', 'shares': '1', 'scores': '46.60', 'status': '1', 'match_type': '2', 'create_time': '1513434346', 'act_id': '3', 'child_type': '1', 'show_score': '0', 'head_img': 'http://q.qlogo.cn/qqapp/1104670989/DFC726007737AE2674A65E5BD4FFC3F5/100', 'cover_img': 'https://x.x.x/2017-11-01/59f9793b2bd2c.jpg', 'name': '彭怡', 'href': '/index.php?m=home&amp;c=match_new&amp;a=video&amp;show_id=105000309'}, {'id': '1194', 'uid': '4837277', 'show_id': '103429330', 'course_id': '41758', 'supports': '71', 'rewards': '0', 'shares': '0', 'scores': '42.60', 'status': '1', 'match_type': '2', 'create_time': '1512828159', 'act_id': '3', 'child_type': '1', 'show_score': '95', 'head_img': 'https://x.x.x/2017-10-20/59e9fe8d49cd7.jpg', 'cover_img': 'https://x.x.x/2017-03-15/58c8abf7eafb6.jpg', 'name': '朱思颖', 'href': '/index.php?m=home&amp;c=match_new&amp;a=video&amp;show_id=103429330'}, {'id': '27', 'uid': '1035103', 'show_id': '103008839', 'course_id': '46923', 'supports': '70', 'rewards': '0', 'shares': '1', 'scores': '42.40', 'status': '1', 'match_type': '2', 'create_time': '1512698148', 'act_id': '3', 'child_type': '1', 'show_score': '92', 'head_img': 'https://x.x.x/2016-05-22/5741045425b1f.jpg', 'cover_img': 'https://x.x.x/2017-06-13/14973432415241.jpg', 'name': '王陆睿祺', 'href': '/index.php?m=home&amp;c=match_new&amp;a=video&amp;show_id=103008839'}, {'id': '4570', 'uid': '248179', 'show_id': '105265776', 'course_id': '43716', 'supports': '66', 'rewards': '0', 'shares': '0', 'scores': '39.60', 'status': '1', 'match_type': '2', 'create_time': '1513519204', 'act_id': '3', 'child_type': '1', 'show_score': '0', 'head_img': 'https://x.x.x/2018-06-18/5b27b7b196047.jpg', 'cover_img': 'https://x.x.x/2017-02-23/58ae9dec28283.jpg', 'name': '介里', 'href': '/index.php?m=home&amp;c=match_new&amp;a=video&amp;show_id=105265776'}, {'id': '161', 'uid': '874998', 'show_id': '103036066', 'course_id': '43716', 'supports': '53', 'rewards': '0', 
'shares': '1', 'scores': '32.20', 'status': '1', 'match_type': '2', 'create_time': '1512724329', 'act_id': '3', 'child_type': '1', 'show_score': '0', 'head_img': 'https://x.x.x/2018-07-05/5b3e254832248.jpg', 'cover_img': 'https://x.x.x/2017-02-23/58ae9dec28283.jpg', 'name': '尤薇然', 'href': '/index.php?m=home&amp;c=match_new&amp;a=video&amp;show_id=103036066'}, {'id': '2872', 'uid': '3901045', 'show_id': '104542553', 'course_id': '43713', 'supports': '49', 'rewards': '0', 'shares': '1', 'scores': '29.80', 'status': '1', 'match_type': '2', 'create_time': '1513260014', 'act_id': '3', 'child_type': '1', 'show_score': '94', 'head_img': 'https://x.x.x/2017-10-23/59eda1c991187.jpg', 'cover_img': 'https://x.x.x/2017-02-23/58ae9e49a1353.jpg', 'name': '肖乐遥', 'href': '/index.php?m=home&amp;c=match_new&amp;a=video&amp;show_id=104542553'}]}
</code>
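The returned JSON is a dict with a `status` field and a `data` list, and each entry's `href` is relative to the site root. Below is a minimal sketch of pulling out the show IDs and absolute video page URLs. The function `extract_shows` and the `BASE_URL` value are hypothetical (the real host is elided as "xxx" in this post), treating `status == 1` as success is an assumption based on the output above, and `sample` is a trimmed copy of the first record.

```python
from urllib.parse import urljoin

# Hypothetical placeholder: the real host is elided in the post
BASE_URL = "http://example.com/"


def extract_shows(resp_json, base_url=BASE_URL):
    """Return (show_id, absolute_url) tuples from a get_shows reply."""
    # Assumption: status == 1 means the request succeeded
    if resp_json.get("status") != 1:
        return []
    shows = []
    for item in resp_json.get("data", []):
        shows.append((item["show_id"], urljoin(base_url, item["href"])))
    return shows


# Trimmed copy of the first record from the output above
sample = {
    "status": 1,
    "data": [
        {
            "show_id": "104728193",
            "href": "/index.php?m=home&c=match_new&a=video&show_id=104728193",
        }
    ],
}

result = extract_shows(sample)
for show_id, url in result:
    print(show_id, url)
```

Inside `parseGetShowsCallback`, the same loop could run over `response.json`, and each resulting URL could be handed to `self.crawl` with whatever callback handles the video page.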

[Summary]

Here, in PySpider, the following code:

<code>    # Methods of the Handler(BaseHandler) class in a PySpider script
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Target markup: <ul class="list-user list-user-1" id="list-user-1">
        for each in response.doc('ul[id^="list-user"] li  a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

        # Request pages 1..maxPageNum of the show list via POST
        maxPageNum = 10
        for curPageIdx in range(maxPageNum):
            curPageNum = curPageIdx + 1
            print("curPageNum=%s" % curPageNum)
            getShowsUrl = "http://xxx/index.php?m=home&c=match_new&a=get_shows"
            headerDict = {
                "Content-Type": "application/x-www-form-urlencoded"
            }
            # Form fields, passed to self.crawl as a dict via data=
            dataDict = {
                "counter": curPageNum,
                "order": 1,
                "match_type": 2,
                "match_name": "",
                "act_id": 3
            }
            self.crawl(
                getShowsUrl,
                method="POST",
                headers=headerDict,
                data=dataDict,
                cookies=response.cookies,
                callback=self.parseGetShowsCallback
            )

    def parseGetShowsCallback(self, response):
        print("parseGetShowsCallback: self=%s, response=%s" % (self, response))
        # response.json parses the response body as JSON
        respJson = response.json
        print("respJson=%s" % respJson)
</code>

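accomplishes the following:

  • Sending a POST request

    • Passing headers

      • "Content-Type": "application/x-www-form-urlencoded"

    • Passing data

      • A dict containing the corresponding keys and values

    • Passing cookies along as well

      • cookies=response.cookies

  • Getting the returned JSON

    • Use response.json in the callback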
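One caveat about the loop in `index_page`: all ten POSTs go to the same URL and differ only in `data`. As far as I know, PySpider by default derives a task's ID from the MD5 of its URL alone, so such requests may be treated as duplicates of one another. The PySpider documentation suggests overriding `get_taskid` in the handler so that the POST data is part of the ID. The sketch below illustrates the idea with plain `hashlib`; the function `taskid_for` is hypothetical and merely stands in for such an override.

```python
import hashlib
import json


def taskid_for(url, data=None):
    """Hash the URL together with the POST data so each page gets its own ID."""
    payload = url + json.dumps(data or "", sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


url = "http://xxx/index.php?m=home&c=match_new&a=get_shows"

# One ID per page when the data is included in the hash
ids = {taskid_for(url, {"counter": n, "order": 1}) for n in range(1, 11)}
print(len(ids))  # 10

# Hashing the URL alone gives the same ID for every page
url_only = {hashlib.md5(url.encode("utf-8")).hexdigest() for _ in range(10)}
print(len(url_only))  # 1
```

In a real handler, the equivalent would be a `get_taskid(self, task)` method that combines `task['url']` with `task['fetch'].get('data', '')`; the exact shape is worth checking against the documentation of the installed PySpider version.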
