
[Solved] Starting PySpider on Mac

Mac crifan
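Background:
[Solved] Using the Python crawler framework PySpider to crawl the Baidu hot-list content
While working on that, I went to start PySpider on the Mac:
pyspider
This produced two problems I had already run into before: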
 xxx@xxx  ~/dev/crifan/python/demo_spider  pyspider
[W 200731 09:59:37 run:413] phantomjs not found, continue running without it.
[I 200731 09:59:39 result_worker:49] result_worker starting...
Process Process-4:
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 236, in fetcher
    Fetcher = load_cls(None, None, fetcher_cls)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 48, in load_cls
    return utils.load_object(value)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/libs/utils.py", line 369, in load_object
    module = __import__(module_name, globals(), locals(), [object_name])
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/fetcher/__init__.py", line 1, in <module>
    from .tornado_fetcher import Fetcher
[I 200731 09:59:39 processor:211] processor starting...
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/fetcher/tornado_fetcher.py", line 30, in <module>
    from tornado.curl_httpclient import CurlAsyncHTTPClient
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/tornado/curl_httpclient.py", line 24, in <module>
    import pycurl  # type: ignore
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)
[I 200731 09:59:39 scheduler:647] scheduler starting...
[I 200731 09:59:39 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 200731 09:59:39 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.6.8/bin/pyspider", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1236, in invoke
    return Command.invoke(self, ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 165, in cli
    ctx.invoke(all)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 333, in webui
    app = load_cls(None, None, webui_instance)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 48, in load_cls
    return utils.load_object(value)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/libs/utils.py", line 369, in load_object
    module = __import__(module_name, globals(), locals(), [object_name])
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/__init__.py", line 8, in <module>
    from . import app, index, debug, task, result, login
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/app.py", line 17, in <module>
    from pyspider.fetcher import tornado_fetcher
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/fetcher/__init__.py", line 1, in <module>
    from .tornado_fetcher import Fetcher
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/fetcher/tornado_fetcher.py", line 30, in <module>
    from tornado.curl_httpclient import CurlAsyncHTTPClient
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/tornado/curl_httpclient.py", line 24, in <module>
    import pycurl  # type: ignore
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)
Problem 1: phantomjs is missing and needs to be installed. This one is easy.
Problem 2: SSL backend mismatch
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)
This kind of problem is often hard to fix cleanly.
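To make the message easier to read, here is a minimal sketch that pulls the two backend names out of it. The function name `parse_backend_mismatch` is my own, for illustration only. The link-time value is what the installed libcurl uses, and the compile-time value is what pycurl was built for, which is why the fix later sets PYCURL_SSL_LIBRARY to openssl.

```python
import re

# Matches the two parenthesised backend names in pycurl's ImportError text.
PATTERN = re.compile(
    r"link-time ssl backend \((?P<link>[^)]*)\) is different from "
    r"compile-time ssl backend \((?P<compile>[^)]*)\)"
)


def parse_backend_mismatch(message):
    """Return (link_time_backend, compile_time_backend), or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return match.group("link"), match.group("compile")


# The exact message from the traceback above.
message = (
    "ImportError: pycurl: libcurl link-time ssl backend (openssl) is "
    "different from compile-time ssl backend (none/other)"
)
link_time, compile_time = parse_backend_mismatch(message)
print("libcurl is linked against:", link_time)
print("pycurl was compiled for:", compile_time)
```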
First, fix the first one:
[Solved] Installing phantomjs on Mac
Then fix:
    import pycurl  # type: ignore
ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)
Following the earlier post:
[Solved] pyspider runtime error: ImportError pycurl libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)
I ran:
pip uninstall -y pycurl
export PYCURL_SSL_LIBRARY=openssl
export LDFLAGS=-L/usr/local/opt/openssl/lib
export CPPFLAGS=-I/usr/local/opt/openssl/include
pip install pycurl --compile --no-cache-dir
The last step then failed with an error:
[Solved] pip install pycurl fails on Mac: fatal error openssl/ssl.h file not found
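The linked post has the actual fix. As a rough sketch only, assuming the header error means the include directory given in CPPFLAGS does not contain openssl/ssl.h, the following checks a few candidate OpenSSL prefixes and prints matching export lines. The candidate list and both function names are my own assumptions: the first prefix is the one used above, the second is where Homebrew usually lives on Apple Silicon Macs.

```python
import os

# Assumed candidate prefixes; adjust to wherever OpenSSL is installed.
CANDIDATE_PREFIXES = ["/usr/local/opt/openssl", "/opt/homebrew/opt/openssl"]


def find_openssl_prefix(prefixes):
    """Return the first prefix that contains include/openssl/ssl.h, else None."""
    for prefix in prefixes:
        header = os.path.join(prefix, "include", "openssl", "ssl.h")
        if os.path.isfile(header):
            return prefix
    return None


def build_flags(prefix):
    """Build the same three variables that are exported by hand above."""
    return {
        "PYCURL_SSL_LIBRARY": "openssl",
        "LDFLAGS": "-L" + os.path.join(prefix, "lib"),
        "CPPFLAGS": "-I" + os.path.join(prefix, "include"),
    }


prefix = find_openssl_prefix(CANDIDATE_PREFIXES)
if prefix is None:
    print("openssl/ssl.h not found under any candidate prefix")
else:
    for name, value in build_flags(prefix).items():
        print("export %s=%s" % (name, value))
```

The script only prints the export lines; it does not run pip itself.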
With that fixed, I went back and ran PySpider again:
 pyspider            
Error: Could not create web server listening on port 25555
[I 200731 10:27:06 result_worker:49] result_worker starting...
[I 200731 10:27:07 processor:211] processor starting...
[I 200731 10:27:07 tornado_fetcher:638] fetcher starting...
[I 200731 10:27:07 scheduler:647] scheduler starting...
[I 200731 10:27:07 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 200731 10:27:07 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200731 10:27:07 app:84] webui exiting...
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.6.8/bin/pyspider", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1236, in invoke
    return Command.invoke(self, ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 165, in cli
    ctx.invoke(all)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 384, in webui
    app.run(host=host, port=port)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/app.py", line 59, in run
    from .webdav import dav_app
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/webdav.py", line 216, in <module>
    dav_app = WsgiDAVApp(config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/wsgidav/wsgidav_app.py", line 134, in __init__
    _check_config(config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/wsgidav/wsgidav_app.py", line 118, in _check_config
    raise ValueError("Invalid configuration:\n  - " + "\n  - ".join(errors))
ValueError: Invalid configuration:
  - Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead.
 ✘ xxx@xxx  ~/dev/crifan/python/demo_spider  Error: Could not create web server listening on port 25555
It still fails, but the port error looks like it comes from the phantomjs process left over from the earlier run, so I killed it:
 ✘ xxx@xxx  ~/dev/crifan/python/demo_spider  ps aux | grep 25555
xxx            35620   0.0  0.0  4277272    820 s002  R+   10:27上午   0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn 25555
xxx            33983   0.0  0.4  6130968  34128 s002  S    10:17上午   0:30.45 phantomjs --ssl-protocol=any --disk-cache=true /Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/fetcher/phantomjs_fetcher.js 25555
 xxx@xxx  ~/dev/crifan/python/demo_spider  kill -9 33983
Result:
The port problem is solved and that error is gone:
 pyspider           
phantomjs fetcher running on port 25555
[I 200731 10:28:35 result_worker:49] result_worker starting...
[I 200731 10:28:35 processor:211] processor starting...
[I 200731 10:28:35 tornado_fetcher:638] fetcher starting...
[I 200731 10:28:35 scheduler:647] scheduler starting...
[I 200731 10:28:35 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 200731 10:28:35 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200731 10:28:35 app:84] webui exiting...
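The port error above came from a stale phantomjs process still holding port 25555. A minimal sketch for checking this before starting pyspider, assuming the process listens on the local machine (the helper name `port_in_use` is mine):

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# 25555 is the phantomjs fetcher port seen in the log above.
if port_in_use(25555):
    print("port 25555 is busy, look for a leftover phantomjs process")
else:
    print("port 25555 is free")
```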
But the earlier error remains:
[I 200731 10:28:35 app:84] webui exiting...
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.6.8/bin/pyspider", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1236, in invoke
    return Command.invoke(self, ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 165, in cli
    ctx.invoke(all)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 384, in webui
    app.run(host=host, port=port)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/app.py", line 59, in run
    from .webdav import dav_app
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/webdav.py", line 216, in <module>
    dav_app = WsgiDAVApp(config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/wsgidav/wsgidav_app.py", line 134, in __init__
    _check_config(config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/wsgidav/wsgidav_app.py", line 118, in _check_config
    raise ValueError("Invalid configuration:\n  - " + "\n  - ".join(errors))
ValueError: Invalid configuration:
  - Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead.
Searching for: pyspider Deprecated option ‘domaincontroller’: use ‘http_authenticator.domain_controller’ instead
Found: Pitfalls when installing pyspider (python3.6), 盛夏88688's blog on CSDN
pip install wsgidav==2.4.1
Log:
 pip install wsgidav==2.4.1
Collecting wsgidav==2.4.1
  Downloading https://files.pythonhosted.org/packages/95/e8/88e25c17ff671f7fad21fe16cdc435c33c4befe35203bd47c05366af362a/WsgiDAV-2.4.1-py2.py3-none-any.whl (186kB)
    100% |████████████████████████████████| 194kB 1.5MB/s 
Requirement already satisfied: PyYAML in /Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages (from wsgidav==2.4.1) (5.3.1)
Collecting jsmin (from wsgidav==2.4.1)
  Downloading https://files.pythonhosted.org/packages/17/73/615d1267a82ed26cd7c124108c3c61169d8e40c36d393883eaee3a561852/jsmin-2.2.2.tar.gz
Requirement already satisfied: defusedxml in /Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages (from wsgidav==2.4.1) (0.6.0)
Installing collected packages: jsmin, wsgidav
  Running setup.py install for jsmin ... done
  Found existing installation: WsgiDAV 3.0.3
    Uninstalling WsgiDAV-3.0.3:
      Successfully uninstalled WsgiDAV-3.0.3
Successfully installed jsmin-2.2.2 wsgidav-2.4.1
That solves this problem.
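The pip log shows WsgiDAV 3.0.3 being replaced by 2.4.1. A small sketch for checking the installed version before starting pyspider follows. It assumes, based only on this one traceback, that every 3.x release rejects the old 'domaincontroller' option and that releases below 3 accept it. The function names are hypothetical.

```python
import re


def major_version(text):
    """Return the leading integer of a version string such as '3.0.3'."""
    match = re.match(r"\d+", text.strip())
    if match is None:
        raise ValueError("not a version string: %r" % text)
    return int(match.group())


def wsgidav_ok_for_pyspider(installed):
    """Assumption: pyspider 0.3.10 needs a WsgiDAV release below 3.0."""
    return major_version(installed) < 3


def installed_version(name):
    """Look up an installed package version, or None if unavailable."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None  # Python older than 3.8; not handled in this sketch
    try:
        return version(name)
    except PackageNotFoundError:
        return None


# The two versions that appear in the pip log above.
for label, ver in [("before", "3.0.3"), ("after", "2.4.1")]:
    verdict = "ok" if wsgidav_ok_for_pyspider(ver) else "too new"
    print(label, ver, verdict)

print("currently installed:", installed_version("wsgidav"))
```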
But then another problem appears:
[I 200731 10:49:44 app:84] webui exiting...
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.6.8/bin/pyspider", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1236, in invoke
    return Command.invoke(self, ctx)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 165, in cli
    ctx.invoke(all)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/run.py", line 384, in webui
    app.run(host=host, port=port)
  File "/Users/xxx/.pyenv/versions/3.6.8/lib/python3.6/site-packages/pyspider/webui/app.py", line 64, in run
    from werkzeug.wsgi import DispatcherMiddleware
ImportError: cannot import name 'DispatcherMiddleware'
Searching for: pyspider ImportError: cannot import name ‘DispatcherMiddleware’
Found: The pyspider all command fails with: ImportError: cannot import name DispatcherMiddleware from werkzeug.wsgi, lang_niu's column on CSDN
pip install werkzeug==0.16.1
Log:
 pip install werkzeug==0.16.1
Collecting werkzeug==0.16.1
  Downloading Werkzeug-0.16.1-py2.py3-none-any.whl (327 kB)
     |████████████████████████████████| 327 kB 511 kB/s 
Installing collected packages: werkzeug
  Attempting uninstall: werkzeug
    Found existing installation: Werkzeug 1.0.1
    Uninstalling Werkzeug-1.0.1:
      Successfully uninstalled Werkzeug-1.0.1
Successfully installed werkzeug-0.16.1
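Pinning werkzeug to 0.16.1 is what this post does. For background, Werkzeug 1.0 no longer exports DispatcherMiddleware from werkzeug.wsgi; the class lives in werkzeug.middleware.dispatcher. The sketch below only reports which of the two locations works in the current environment. The helper `first_importable` is my own name, and this is a diagnostic, not a patch to pyspider.

```python
import importlib

# Newest location first, then the path that pyspider 0.3.10 imports from.
CANDIDATES = [
    ("werkzeug.middleware.dispatcher", "DispatcherMiddleware"),
    ("werkzeug.wsgi", "DispatcherMiddleware"),
]


def first_importable(candidates):
    """Return (module_name, obj) for the first candidate that resolves."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module missing, try the next location
        if hasattr(module, attr):
            return module_name, getattr(module, attr)
    return None


found = first_importable(CANDIDATES)
if found is None:
    print("DispatcherMiddleware not found (is werkzeug installed?)")
else:
    print("DispatcherMiddleware is importable from", found[0])
```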
Result:
It finally works.
 pyspider           
phantomjs fetcher running on port 25555
[I 200731 10:52:00 result_worker:49] result_worker starting...
[I 200731 10:52:00 processor:211] processor starting...
[I 200731 10:52:00 tornado_fetcher:638] fetcher starting...
[I 200731 10:52:00 scheduler:647] scheduler starting...
[I 200731 10:52:00 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 200731 10:52:00 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 200731 10:52:00 app:76] webui running on 0.0.0.0:5000
Open in a browser:
http://0.0.0.0:5000
and the PySpider web UI comes up normally.
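In the log, 0.0.0.0 is the address the web UI binds to, meaning all interfaces. A minimal sketch for checking from a script that the web UI answers, assuming it is therefore also reachable at 127.0.0.1 (the helper name `webui_is_up` is mine):

```python
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy, since the target is the local machine.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def webui_is_up(url, timeout=3.0):
    """Return True if an HTTP server answers at url, whatever the status."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        return False


# Port 5000 is the web UI port from the log above.
print("webui reachable:", webui_is_up("http://127.0.0.1:5000"))
```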

Please credit when reposting: 在路上 » [Solved] Starting PySpider on Mac
