While working on:
[Solved] Using the Python crawler framework PySpider to crawl the Baidu hot list
I went to start PySpider on the Mac:
pyspider
As a result, two problems I had already run into before showed up again:
xxx@xxx ~/dev/crifan/python/demo_spider pyspider
[W 200731 09:59:37 run:413] phantomjs not found, continue running without it.
[I 200731 09:59:39 result_worker:49] result_worker starting...
Process Process-4:
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 236, in fetcher
    Fetcher = load_cls(None, None, fetcher_cls)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 48, in load_cls
    return utils.load_object(value)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/libs/utils.py", line 369, in load_object
    module = __import__(module_name, globals(), locals(), [object_name])
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/fetcher/__init__.py", line 1, in <module>
    from .tornado_fetcher import Fetcher
[I 200731 09:59:39 processor:211] processor starting...
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/fetcher/tornado_fetcher.py", line 30, in <module>
    from tornado.curl_httpclient import CurlAsyncHTTPClient
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/tornado/curl_httpclient.py", line 24, in <module>
    import pycurl # type: ignore
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)
[I 200731 09:59:39 scheduler:647] scheduler starting...
[I 200731 09:59:39 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 200731 09:59:39 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.6.8/bin/pyspider", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1236, in invoke
    return Command.invoke(self, ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 165, in cli
    ctx.invoke(all)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 333, in webui
    app = load_cls(None, None, webui_instance)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 48, in load_cls
    return utils.load_object(value)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/libs/utils.py", line 369, in load_object
    module = __import__(module_name, globals(), locals(), [object_name])
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/__init__.py", line 8, in <module>
    from . import app, index, debug, task, result, login
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/app.py", line 17, in <module>
    from pyspider.fetcher import tornado_fetcher
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/fetcher/__init__.py", line 1, in <module>
    from .tornado_fetcher import Fetcher
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/fetcher/tornado_fetcher.py", line 30, in <module>
    from tornado.curl_httpclient import CurlAsyncHTTPClient
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/tornado/curl_httpclient.py", line 24, in <module>
    import pycurl # type: ignore
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)
Problem 1: phantomjs is missing and needs to be installed; this one is easy to fix.
Problem 2: the SSL backends are incompatible:
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)
This kind of problem is often hard to resolve cleanly.
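To see exactly which backends disagree, the error text itself can be picked apart. The following is only a minimal sketch: `parse_backend_mismatch` and `diagnose_pycurl` are hypothetical helper names introduced here, and the regular expression simply mirrors the wording of the message shown above.

```python
import re

# Mirrors the wording of the ImportError shown above.
_MISMATCH_RE = re.compile(
    r"libcurl link-time ssl backend \((?P<link>[^)]*)\) is different from "
    r"compile-time ssl backend \((?P<compiled>[^)]*)\)"
)


def parse_backend_mismatch(message):
    """Return (link_time, compile_time) backends, or None if the text does not match."""
    match = _MISMATCH_RE.search(message)
    if match is None:
        return None
    return match.group("link"), match.group("compiled")


def diagnose_pycurl():
    """Try to import pycurl; return (ok, detail). Hypothetical helper."""
    try:
        import pycurl
    except ImportError as exc:
        # Either pycurl is missing or its SSL backends disagree.
        return False, parse_backend_mismatch(str(exc)) or str(exc)
    # pycurl.version is a string describing pycurl, libcurl and the SSL library.
    return True, pycurl.version


if __name__ == "__main__":
    print(diagnose_pycurl())
```

For the message in this post the parser gives `("openssl", "none/other")`: libcurl is linked against OpenSSL, while pycurl was compiled without being told which SSL library to use.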
First, fix the first one:
[Solved] Installing phantomjs on Mac
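Whether PySpider will find phantomjs can be checked before starting it. This is a small sketch that assumes PySpider looks the binary up on `PATH`, which is what the "phantomjs not found" warning suggests; `find_executable` is a hypothetical helper name.

```python
import shutil


def find_executable(name="phantomjs"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    print(path if path else "phantomjs is not on PATH")
```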
Then move on to fixing:
    import pycurl # type: ignore
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)
Following the approach from a reference:
pip uninstall -y pycurl
export PYCURL_SSL_LIBRARY=openssl
export LDFLAGS=-L/usr/local/opt/openssl/lib
export CPPFLAGS=-I/usr/local/opt/openssl/include
pip install pycurl --compile --no-cache-dir
The last step failed with an error:
[Solved] pip install pycurl on Mac fails: fatal error openssl/ssl.h file not found
Then go back and run PySpider again:
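The environment used for the reinstall above, plus a quick check for the header that pip complained about, can be sketched as follows. The prefix `/usr/local/opt/openssl` is taken from the commands above; the helper names `pycurl_build_env` and `has_openssl_headers` are hypothetical, and whether this prefix is the right one depends on where OpenSSL is installed on the machine.

```python
import os


def pycurl_build_env(openssl_prefix="/usr/local/opt/openssl", base_env=None):
    """Return a copy of the environment with the variables used for the pycurl rebuild."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYCURL_SSL_LIBRARY"] = "openssl"
    env["LDFLAGS"] = "-L" + os.path.join(openssl_prefix, "lib")
    env["CPPFLAGS"] = "-I" + os.path.join(openssl_prefix, "include")
    return env


def has_openssl_headers(openssl_prefix="/usr/local/opt/openssl"):
    """Return True if <prefix>/include/openssl/ssl.h exists."""
    header = os.path.join(openssl_prefix, "include", "openssl", "ssl.h")
    return os.path.isfile(header)


if __name__ == "__main__":
    print("ssl.h found:", has_openssl_headers())
    print(pycurl_build_env(base_env={}))
```

If `has_openssl_headers()` returns False, the `CPPFLAGS` path does not contain `openssl/ssl.h`, which would be consistent with the "file not found" error. The returned dictionary could be passed as `env=` to `subprocess.run(["pip", "install", "pycurl", "--compile", "--no-cache-dir"], env=...)`.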
pyspider
Error: Could not create web server listening on port 25555
[I 200731 10:27:06 result_worker:49] result_worker starting...
[I 200731 10:27:07 processor:211] processor starting...
[I 200731 10:27:07 tornado_fetcher:638] fetcher starting...
[I 200731 10:27:07 scheduler:647] scheduler starting...
[I 200731 10:27:07 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 200731 10:27:07 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200731 10:27:07 app:84] webui exiting...
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.6.8/bin/pyspider", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1236, in invoke
    return Command.invoke(self, ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 165, in cli
    ctx.invoke(all)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 384, in webui
    app.run(host=host, port=port)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/app.py", line 59, in run
    from .webdav import dav_app
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/webdav.py", line 216, in <module>
    dav_app = WsgiDAVApp(config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/wsgidav/wsgidav_app.py", line 134, in __init__
    _check_config(config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/wsgidav/wsgidav_app.py", line 118, in _check_config
    raise ValueError("Invalid configuration:\n - " + "\n - ".join(errors))
ValueError: Invalid configuration:
 - Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead.
✘ xxx@xxx ~/dev/crifan/python/demo_spider
Error: Could not create web server listening on port 25555
It still fails, but the port error looks like it comes from the phantomjs process left over from the earlier run, so kill it:
✘ xxx@xxx ~/dev/crifan/python/demo_spider ps aux | grep 25555
xxx 35620 0.0 0.0 4277272 820 s002 R+ 10:27AM 0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn 25555
xxx 33983 0.0 0.4 6130968 34128 s002 S 10:17AM 0:30.45 phantomjs --ssl-protocol=any --disk-cache=true /Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/fetcher/phantomjs_fetcher.js 25555
xxx@xxx ~/dev/crifan/python/demo_spider kill -9 33983
Result:
The port problem is solved and that error is gone:
pyspider
phantomjs fetcher running on port 25555
[I 200731 10:28:35 result_worker:49] result_worker starting...
[I 200731 10:28:35 processor:211] processor starting...
[I 200731 10:28:35 tornado_fetcher:638] fetcher starting...
[I 200731 10:28:35 scheduler:647] scheduler starting...
[I 200731 10:28:35 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 200731 10:28:35 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200731 10:28:35 app:84] webui exiting...
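The `ps aux | grep 25555` search above works because phantomjs carries the port number in its command line. A more direct way to check whether a port is still occupied is to try connecting to it. A minimal sketch, assuming the service listens on the local machine; `port_in_use` is a hypothetical helper name:

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 25555 is the phantomjs fetcher port from the logs above.
    print("port 25555 in use:", port_in_use(25555))
```

This only says whether the port is taken, not by which process; finding the PID still needs `ps` or a similar tool.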
However, the other error from before is still there:
[I 200731 10:28:35 app:84] webui exiting...
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.6.8/bin/pyspider", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1236, in invoke
    return Command.invoke(self, ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 165, in cli
    ctx.invoke(all)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 384, in webui
    app.run(host=host, port=port)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/app.py", line 59, in run
    from .webdav import dav_app
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/webdav.py", line 216, in <module>
    dav_app = WsgiDAVApp(config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/wsgidav/wsgidav_app.py", line 134, in __init__
    _check_config(config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/wsgidav/wsgidav_app.py", line 118, in _check_config
    raise ValueError("Invalid configuration:\n - " + "\n - ".join(errors))
ValueError: Invalid configuration:
 - Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead.
pyspider Deprecated option ‘domaincontroller’: use ‘http_authenticator.domain_controller’ instead
pip install wsgidav==2.4.1
Log:
pip install wsgidav==2.4.1
Collecting wsgidav==2.4.1
  Downloading https://files.pythonhosted.org/packages/95/e8/88e25c17ff671f7fad21fe16cdc435c33c4befe35203bd47c05366af362a/WsgiDAV-2.4.1-py2.py3-none-any.whl (186kB)
    100% |████████████████████████████████| 194kB 1.5MB/s
Requirement already satisfied: PyYAML in /Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages (from wsgidav==2.4.1) (5.3.1)
Collecting jsmin (from wsgidav==2.4.1)
  Downloading https://files.pythonhosted.org/packages/17/73/615d1267a82ed26cd7c124108c3c61169d8e40c36d393883eaee3a561852/jsmin-2.2.2.tar.gz
Requirement already satisfied: defusedxml in /Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages (from wsgidav==2.4.1) (0.6.0)
Installing collected packages: jsmin, wsgidav
  Running setup.py install for jsmin ... done
  Found existing installation: WsgiDAV 3.0.3
    Uninstalling WsgiDAV-3.0.3:
      Successfully uninstalled WsgiDAV-3.0.3
Successfully installed jsmin-2.2.2 wsgidav-2.4.1
That solves the problem.
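Since the fix is a version pin, it can help to confirm which version is actually installed in the active interpreter. This is a rough sketch rather than anything PySpider provides: `installed_version` and `check_pins` are hypothetical helpers, and the pin in the usage example is the one from the command above.

```python
def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(package)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for older Pythons such as the 3.6.8 used here
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # An empty dict means the installed version matches the pin.
    print(check_pins({"wsgidav": "2.4.1"}))
```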
But then another problem appeared:
[I 200731 10:49:44 app:84] webui exiting...
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.6.8/bin/pyspider", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1236, in invoke
    return Command.invoke(self, ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 165, in cli
    ctx.invoke(all)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 384, in webui
    app.run(host=host, port=port)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/app.py", line 64, in run
    from werkzeug.wsgi import DispatcherMiddleware
ImportError: cannot import name 'DispatcherMiddleware'
pyspider ImportError: cannot import name ‘DispatcherMiddleware’
pip install werkzeug==0.16.1
Log:
pip install werkzeug==0.16.1
Collecting werkzeug==0.16.1
  Downloading Werkzeug-0.16.1-py2.py3-none-any.whl (327 kB)
     |████████████████████████████████| 327 kB 511 kB/s
Installing collected packages: werkzeug
  Attempting uninstall: werkzeug
    Found existing installation: Werkzeug 1.0.1
      Uninstalling Werkzeug-1.0.1:
        Successfully uninstalled Werkzeug-1.0.1
Successfully installed werkzeug-0.16.1
Result:
It finally works.
pyspider
phantomjs fetcher running on port 25555
[I 200731 10:52:00 result_worker:49] result_worker starting...
[I 200731 10:52:00 processor:211] processor starting...
[I 200731 10:52:00 tornado_fetcher:638] fetcher starting...
[I 200731 10:52:00 scheduler:647] scheduler starting...
[I 200731 10:52:00 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 200731 10:52:00 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200731 10:52:00 app:76] webui running on 0.0.0.0:5000
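On the `DispatcherMiddleware` error: pinning werkzeug to 0.16.1 is the fix used here. As far as I know, the class lives in `werkzeug.middleware.dispatcher` since Werkzeug 0.15, and the old `werkzeug.wsgi` location was removed in 1.0, which would explain why 1.0.1 failed. A version-tolerant import could look like the sketch below; `load_dispatcher_middleware` is a hypothetical helper, and this is an alternative to the pin, not what was done in this post.

```python
def load_dispatcher_middleware():
    """Return the DispatcherMiddleware class, or None if Werkzeug is not installed."""
    try:
        # Newer location (assumed: Werkzeug 0.15 and later).
        from werkzeug.middleware.dispatcher import DispatcherMiddleware
    except ImportError:
        try:
            # Older location, the one pyspider 0.3.10 imports from.
            from werkzeug.wsgi import DispatcherMiddleware
        except ImportError:
            return None
    return DispatcherMiddleware


if __name__ == "__main__":
    print(load_dispatcher_middleware())
```

Using it would mean editing PySpider's `webui/app.py` inside site-packages, so for a quick local setup the version pin is the less invasive choice.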

Open it in a browser:

and PySpider starts up normally.
When reposting, please credit: 在路上 » [Solved] Starting PySpider on Mac