[Mostly solved] PySpider gets a 304 when opening a page

While working on:

[Temporarily solved] Making PySpider use a circumvention proxy to open pages that are blocked by the firewall

Along the way, while debugging PySpider, opening the page produced many errors, among them a 304:

console: AT: [getOffer()] request failed [object Object]
console: AT: Rendering : failed target-global-mbox error timeout
[304] https://www.scholastic.com/teachers/bookwizard/ 20.146
[I 181010 15:46:44 tornado_fetcher:520] [304] ScholasticStorybook:34b1c45f09fa84805dd1697c1809e8c9 https://www.scholastic.com/teachers/bookwizard/ 20.15s

PySpider http 304

does “Force_update” and “itag” not work with HTTP status code 304? · Issue #573 · binux/pyspider

Added:

force_update=True,

Tried it, and the behavior is unchanged:

[I 181010 16:02:23 tornado_fetcher:520] [304] ScholasticStorybook:34b1c45f09fa84805dd1697c1809e8c9 https://www.scholastic.com/teachers/bookwizard/ 20.02s

The real problem is that no data is returned.

does “Force_update” and “itag” not work with HTTP status code 304? · Issue #573 · binux/pyspider

Tried:

crawl_config = {
  'force_update': True,
  'last_modified': False,
  'etag': False
}

The problem persists.

[Translation] Understanding HTTP/304 Responses – 紫云飞 – 博客园

Better to go to the official docs and read exactly what these parameters mean:

http://docs.pyspider.org/en/latest/apis/self.crawl/#etag

“etag

use HTTP Etag mechanism to pass the process if the content of the page is not changed. default: True”

“last_modified

use HTTP Last-Modified header mechanism to pass the process if the content of the page is not changed. default: True”

“force_update

force update task params even if the task is in ACTIVE status.”
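To make the etag and last_modified descriptions concrete, here is a minimal sketch of how a conditional request is usually built. It is not PySpider's actual code; the function name build_conditional_headers and the sample header values are made up for illustration. The idea: the client sends back the validators from the previous response, and the server answers 304 with an empty body if nothing changed.

```python
def build_conditional_headers(previous_headers, etag=True, last_modified=True):
    """Return the validator headers a client would send on a repeat request.

    previous_headers: response headers saved from the last successful fetch.
    etag / last_modified: switches that mirror the two options quoted above.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    previous = {key.lower(): value for key, value in previous_headers.items()}
    headers = {}
    if etag and previous.get("etag"):
        headers["If-None-Match"] = previous["etag"]
    if last_modified and previous.get("last-modified"):
        headers["If-Modified-Since"] = previous["last-modified"]
    return headers


# Hypothetical headers from an earlier fetch of the same page.
previous = {
    "ETag": '"abc123"',
    "Last-Modified": "Wed, 10 Oct 2018 07:00:00 GMT",
}

print(build_conditional_headers(previous))
print(build_conditional_headers(previous, etag=False, last_modified=False))
```

With both switches off, no validators are sent, so a 304 should no longer be triggered by these two headers. If a 304 still shows up, something else is probably adding the conditional headers or serving a cached result.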

pyspider 304

Python urllib2.HTTPError: HTTP Error 304: Not Modified – 程序园

“I must create a new project like before to run it again, then the 304 problem is gone.

If you don’t want this feature, set etag and last_modified to False in self.crawl.”
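The advice quoted above, setting the options on the request itself rather than in crawl_config, could look roughly like the sketch below. etag, last_modified and force_update are the self.crawl parameters from the docs; the callback name index_page, the returned fields, and the import fallback are assumptions for illustration only. Whether this actually gets rid of the 304 for this site has not been confirmed here.

```python
try:
    from pyspider.libs.base_handler import BaseHandler
except ImportError:
    # Fallback so the sketch can still be read and run without pyspider installed.
    BaseHandler = object

START_URL = "https://www.scholastic.com/teachers/bookwizard/"


class Handler(BaseHandler):
    def on_start(self):
        # Pass the cache-related options per request instead of via crawl_config.
        self.crawl(
            START_URL,
            callback=self.index_page,
            etag=False,           # do not use the ETag mechanism
            last_modified=False,  # do not use the Last-Modified mechanism
            force_update=True,    # update task params even if the task is ACTIVE
        )

    def index_page(self, response):
        # Hypothetical callback: just report what came back.
        return {"url": response.url, "length": len(response.text)}
```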

Yet even though this was set here:

    crawl_config = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
        # "proxy": "127.0.0.1:10870",
        # "proxy": "127.0.0.1:1087",
        # "proxy": "localhost:1087",

        'force_update': True,
        'last_modified': False,
        'etag': False
    }

the result, surprisingly, was still:

an empty response