Tinkering:
Along the way, after the earlier error was fixed, a different one appeared:
<code>➜ AutocarData pyspider -c config.json result_worker
Traceback (most recent call last):
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/bin/pyspider", line 11, in <module>
    sys.exit(main())
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 299, in result_worker
    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
TypeError: __init__() got an unexpected keyword argument 'resultdb'
</code>
TypeError: __init__() got an unexpected keyword argument 'resultdb'
Error 500 in webUI result view when use mongodb as result db · Issue #251 · binux/pyspider
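PySpider: a powerful web crawler system written by a Chinese developer, with a powerful WebUI – Python Development – Comments | CTOLib
This seemed strange.
So I simply dropped the resultdb part (along with the +taskdb and +projectdb suffixes) from the database URL schemes, changing the config to: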
<code>{
  "taskdb": "mysql://root:crifan_mysql@127.0.0.1:3306/AutohomeTaskdb",
  "projectdb": "mysql://root:crifan_mysql@127.0.0.1:3306/AutohomeProjectdb",
  "resultdb": "mysql://root:crifan_mysql@127.0.0.1:3306/AutohomeResultdb",
  "result_worker": {
    "result_cls": "AutohomeResultWorker.AutohomeResultWorker"
  }
}
</code>
Result:
<code>➜ AutocarData pyspider -c config.json result_worker
Traceback (most recent call last):
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/bin/pyspider", line 11, in <module>
    sys.exit(main())
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 299, in result_worker
    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/utils.py", line 355, in __getattr__
    return ret.__get__(self, ObjectDict)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/utils.py", line 342, in __get__
    return self.getter()
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 43, in <lambda>
    return utils.Get(lambda: connect_database(value))
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/database/__init__.py", line 44, in connect_database
    db = _connect_database(url)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/database/__init__.py", line 54, in _connect_database
    raise Exception('wrong scheme format: %s' % parsed.scheme)
Exception: wrong scheme format: mysql
</code>
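Could it be that the
ConfigParser.py
I added earlier, in
【Solved】Error running result_worker in pyspider: ModuleNotFoundError No module named mysql
is causing a problem with argument parsing?
Let me take it out of the way by renaming it: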
<code>➜ AutocarData mv /Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/ConfigParser.py /Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/ConfigParser.py_backup
➜ AutocarData ll /Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/ConfigParse*
-rw-r--r-- 1 crifan staff 52K 5 5 22:31 /Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/ConfigParser.py_backup
</code>
Still the same:
<code>    raise Exception('wrong scheme format: %s' % parsed.scheme)
Exception: wrong scheme format: mysql
</code>
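The message suggests that a bare `mysql` scheme is rejected because the URL scheme is expected to carry a `+taskdb` / `+projectdb` / `+resultdb` style suffix. Below is a minimal sketch of that kind of check. It is modelled on the error message only and is not pyspider's actual code; `split_db_scheme` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def split_db_scheme(url):
    """Split an 'engine+dbtype://...' URL into (engine, dbtype).

    Hypothetical helper: it only imitates the behaviour implied by the
    'wrong scheme format' message, not pyspider's real implementation.
    """
    scheme = urlparse(url).scheme
    parts = scheme.split("+")
    if len(parts) == 1:
        # No '+dbtype' suffix, e.g. a plain 'mysql://...' URL
        raise ValueError("wrong scheme format: %s" % scheme)
    return parts[0], parts[-1]


# A URL with the suffix splits cleanly into engine and database type
print(split_db_scheme("mysql+resultdb://user:password@127.0.0.1:3306/AutohomeResultdb"))

# A URL without the suffix is rejected
try:
    split_db_scheme("mysql://user:password@127.0.0.1:3306/AutohomeResultdb")
except ValueError as err:
    print(err)
```

If this reading is right, ConfigParser.py has nothing to do with the error, and the fix is to put the suffixes back.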
So I changed it back to:
<code>{
  "taskdb": "mysql+taskdb://root:crifan_mysql@127.0.0.1:3306/AutohomeTaskdb",
  "projectdb": "mysql+projectdb://root:crifan_mysql@127.0.0.1:3306/AutohomeProjectdb",
  "resultdb": "mysql+resultdb://root:crifan_mysql@127.0.0.1:3306/AutohomeResultdb",
  "result_worker": {
    "result_cls": "AutohomeResultWorker.AutohomeResultWorker"
  }
}
</code>
Result:
Back to the earlier error again:
<code>➜ AutocarData pyspider -c config.json result_worker
Traceback (most recent call last):
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/bin/pyspider", line 11, in <module>
    sys.exit(main())
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 299, in result_worker
    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
TypeError: __init__() got an unexpected keyword argument 'resultdb'
</code>
TypeError __init__() got an unexpected keyword argument resultdb
Error 500 in webUI result view when use mongodb as result db · Issue #251 · binux/pyspider
pyspider/setup.py at master · binux/pyspider
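It seems that:
this resultdb argument should not be passed at all?
So I dropped the result_worker subcommand from the command line and ran: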
<code>➜ AutocarData pyspider -c config.json
phantomjs fetcher running on port 25555
Process Process-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 299, in result_worker
    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
TypeError: __init__() got an unexpected keyword argument 'resultdb'
[I 180508 20:38:19 tornado_fetcher:638] fetcher starting...
[I 180508 20:38:19 processor:211] processor starting...
[I 180508 20:38:19 scheduler:647] scheduler starting...
[I 180508 20:38:19 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 180508 20:38:19 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 180508 20:38:20 app:76] webui running on 0.0.0.0:5000
</code>
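With that, the other components started normally, although the result_worker process (Process-2 in the log) still failed with the same TypeError.
But it was unclear whether the MySQL resultdb
was actually being used internally.
I also noticed that all the projects had now disappeared:
Could it be that this line in the config: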
<code>"projectdb": "mysql+projectdb://root:crifan_mysql@127.0.0.1:3306/AutohomeProjectdb", </code>
had taken effect?
Because the projectdb here is empty:
So let me remove the projectdb setting (and taskdb with it), then run again:
<code>{
  "resultdb": "mysql+resultdb://root:crifan_mysql@127.0.0.1:3306/AutohomeResultdb",
  "result_worker": {
    "result_cls": "AutohomeResultWorker.AutohomeResultWorker"
  }
}
</code>
Result:
Note that the log still printed:
<code>Process Process-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 299, in result_worker
    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
TypeError: __init__() got an unexpected keyword argument 'resultdb'
</code>
But the projects did come back:
Time to look at the source code:
pyspider/run.py at master · binux/pyspider
<code>@cli.command()
@click.option('--result-cls', default='pyspider.result.ResultWorker', callback=load_cls,
              help='ResultWorker class to be used.')
@click.pass_context
def result_worker(ctx, result_cls, get_object=False):
    """
    Run result worker.
    """
    g = ctx.obj
    ResultWorker = load_cls(None, None, result_cls)

    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
    g.instances.append(result_worker)
    if g.get('testing_mode') or get_object:
        return result_worker

    result_worker.run()
</code>
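The `result_cls` value in config.json ("AutohomeResultWorker.AutohomeResultWorker") reads as a dotted `module.ClassName` path, and the code above then calls whatever class it gets with the keyword arguments `resultdb` and `inqueue`. A rough sketch of how a class can be loaded from such a string follows. `load_class` is a hypothetical stand-in written with the standard library, not pyspider's own `load_cls`.

```python
import importlib


def load_class(dotted_path):
    """Load a class from a 'package.module.ClassName' string.

    Hypothetical helper for illustration only; pyspider's load_cls may
    work differently.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the sketch is runnable
cls = load_class("collections.OrderedDict")
print(cls)
```

Whatever class is loaded this way has to accept the same constructor arguments as the default one, since the caller does not know which class it received.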
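So the problem seems to be the way my own class inherits from ResultWorker?
I went and looked at the source code:
sure enough, that was it.
So I changed my class to: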
<code>class AutohomeResultWorker(ResultWorker):
    mysqldb = None

    def __init__(self, resultdb, inqueue):
        """init mysql db"""
        print("AutohomeResultWorker init: resultdb=%, inqueue=%s" % (resultdb, inqueue))
        super.__init__(resultdb, inqueue)
        if self.mysqldb is None:
            self.mysqldb = MysqlDb()
        print("self.mysqldb=%s" % self.mysqldb)
</code>
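Result:
Along the way, I first worked through:
【Solved】How to override __init__ when inheriting from a parent class in Python to customise initialisation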
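The `super.__init__(resultdb, inqueue)` call in the attempt above is missing the parentheses after `super`, so it refers to the builtin `super` type rather than to the parent class. The sketch below contrasts the two usual ways of calling the parent initializer with that broken form. `BaseWorker` is a stand-in with the same `__init__` parameters as ResultWorker, so the example runs without pyspider installed.

```python
class BaseWorker(object):
    """Stand-in for pyspider's ResultWorker (same __init__ parameters)."""

    def __init__(self, resultdb, inqueue):
        self.resultdb = resultdb
        self.inqueue = inqueue


class WorkerViaSuper(BaseWorker):
    def __init__(self, resultdb, inqueue):
        # Python 3 style: note the parentheses in super()
        super().__init__(resultdb, inqueue)
        self.mysqldb = "connected"


class WorkerViaExplicitParent(BaseWorker):
    def __init__(self, resultdb, inqueue):
        # Explicit style: name the parent class and pass self by hand
        BaseWorker.__init__(self, resultdb, inqueue)
        self.mysqldb = "connected"


class WorkerBroken(BaseWorker):
    def __init__(self, resultdb, inqueue):
        # Bug: 'super' without parentheses is the builtin type itself
        super.__init__(resultdb, inqueue)


for cls in (WorkerViaSuper, WorkerViaExplicitParent):
    worker = cls("db", "queue")
    print(cls.__name__, worker.resultdb, worker.inqueue)

try:
    WorkerBroken("db", "queue")
except TypeError as err:
    print("WorkerBroken raised TypeError:", err)
```

Both working variants set `resultdb` and `inqueue` on the instance before adding their own attributes.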
After that, the following code appeared to work:
<code>import pymysql
import pymysql.cursors
from pyspider.result import ResultWorker

class AutohomeResultWorker(ResultWorker):
    # mysqldb = None

    def __init__(self, resultdb, inqueue):
        """init mysql db"""
        print("AutohomeResultWorker init")
        print("resultdb=%s, inqueue=%s" % (resultdb, inqueue))
        ResultWorker.__init__(self, resultdb, inqueue)
        # print("self.mysqldb=%s" % (self.mysqldb))
        # if self.mysqldb is None:
        self.mysqldb = MysqlDb()
        print("self.mysqldb=%s" % self.mysqldb)

    def on_result(self, task, result):
        """override pyspider on_result to save data into mysql"""
        # assert task['taskid']
        # assert task['project']
        # assert task['url']
        # assert result
        print("on_result: result=%s" % result)
        insertOk = self.mysqldb.insert(result)
        print("insertOk=%s" % insertOk)

class MysqlDb:
    ...
</code>
Is that all it takes?
At least it now runs normally, without errors:
<code>➜ AutocarData pyspider -c config.json all
phantomjs fetcher running on port 25555
AutohomeResultWorker init
resultdb=<pyspider.database.mysql.resultdb.ResultDB object at 0x1025b3c18>, inqueue=<pyspider.libs.multiprocessing_queue.MultiProcessingQueue object at 0x1025b3a20>
connect mysql ok, self.connection= <pymysql.connections.Connection object at 0x102763d30>
Connect mysql return True
self.mysqldb=<AutohomeResultWorker.MysqlDb object at 0x102763cc0>
[I 180508 21:13:12 result_worker:49] result_worker starting...
[I 180508 21:13:12 tornado_fetcher:638] fetcher starting...
[I 180508 21:13:12 processor:211] processor starting...
[I 180508 21:13:12 scheduler:647] scheduler starting...
[I 180508 21:13:12 scheduler:126] project autohomeBrandData updated, status:TODO, paused:False, 0 tasks
[I 180508 21:13:12 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 180508 21:13:12 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 180508 21:13:12 app:76] webui running on 0.0.0.0:5000
</code>
Next, I will keep debugging to see whether, in the end,
resultdb is actually used and execution reaches on_result in this AutohomeResultWorker
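One way to check the dispatch logic without running a full crawl is to feed a single (task, result) pair through a queue by hand. The sketch below does this with simplified stand-ins. `BaseResultWorker` and its `run_once` method are assumptions made for illustration and only imitate the idea of a worker that takes items from `inqueue` and passes them to `on_result`; they are not pyspider's real classes.

```python
import queue


class BaseResultWorker(object):
    """Simplified stand-in for a result worker, not pyspider's real class."""

    def __init__(self, resultdb, inqueue):
        self.resultdb = resultdb
        self.inqueue = inqueue

    def on_result(self, task, result):
        print("default on_result:", result)

    def run_once(self):
        """Take one (task, result) pair off the queue and dispatch it."""
        task, result = self.inqueue.get(timeout=1)
        return self.on_result(task, result)


class RecordingResultWorker(BaseResultWorker):
    """Subclass that overrides on_result, as AutohomeResultWorker does."""

    def __init__(self, resultdb, inqueue):
        BaseResultWorker.__init__(self, resultdb, inqueue)
        self.saved = []  # stands in for rows written to MySQL

    def on_result(self, task, result):
        print("on_result: result=%s" % result)
        self.saved.append(result)
        return True


inqueue = queue.Queue()
task = {"taskid": "t1", "project": "demo", "url": "http://example.com"}
inqueue.put((task, {"brand": "demo"}))

worker = RecordingResultWorker(resultdb=None, inqueue=inqueue)
worker.run_once()
print(worker.saved)
```

Because `run_once` is defined on the base class and calls `self.on_result`, the override in the subclass is the one that runs.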
【Summary】
The reason for this error:
result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
TypeError: __init__() got an unexpected keyword argument 'resultdb'
is that:
the __init__ of AutohomeResultWorker, which inherits from ResultWorker, had been written incorrectly.
I had written it as:
<code>def __init__(self): </code>
Later I referred to the source of:
pyspider/result/result_worker.py
which reads:
<code>class ResultWorker(object):

    def __init__(self, resultdb, inqueue):
        self.resultdb = resultdb
        self.inqueue = inqueue
        self._quit = False
</code>
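This shows that, besides self, __init__ takes two more parameters: resultdb and inqueue.
So my own class that inherits from it must accept these parameters as well.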
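The mismatch can be reproduced in isolation. In the sketch below, `BaseWorker` is a stand-in with the same parameters as ResultWorker, `BadWorker` repeats the `def __init__(self):` mistake, and `try_create` is a hypothetical helper that calls each class the same way run.py does, with keyword arguments.

```python
class BaseWorker(object):
    """Stand-in for ResultWorker: takes resultdb and inqueue."""

    def __init__(self, resultdb, inqueue):
        self.resultdb = resultdb
        self.inqueue = inqueue


class BadWorker(BaseWorker):
    def __init__(self):  # drops resultdb and inqueue
        self.mysqldb = None


class GoodWorker(BaseWorker):
    def __init__(self, resultdb, inqueue):  # same parameters as the parent
        BaseWorker.__init__(self, resultdb, inqueue)
        self.mysqldb = None


def try_create(cls):
    """Instantiate cls with keyword arguments and report the outcome."""
    try:
        cls(resultdb="db", inqueue="queue")
        return "ok"
    except TypeError as err:
        return str(err)


print("BadWorker :", try_create(BadWorker))
print("GoodWorker:", try_create(GoodWorker))
```

The exact wording of the TypeError varies slightly between Python versions, but it names `resultdb` as the unexpected keyword argument in each case.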
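Then, through:
【Solved】How to override __init__ when inheriting from a parent class in Python to customise initialisation
I worked out how to call the parent class's __init__,
and changed the code to the correct form: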
<code>class AutohomeResultWorker(ResultWorker):
    def __init__(self, resultdb, inqueue):
        """init mysql db"""
        print("AutohomeResultWorker init")
        print("resultdb=%s, inqueue=%s" % (resultdb, inqueue))
        ResultWorker.__init__(self, resultdb, inqueue)
        # print("self.mysqldb=%s" % (self.mysqldb))
        # if self.mysqldb is None:
        self.mysqldb = MysqlDb()
        print("self.mysqldb=%s" % self.mysqldb)
</code>
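Lessons learned:
Read the error message carefully and start from it: find the cause and the clues, then trace them back to the root of the problem. Only then can it be fixed.
Analysis:
Again, read the error message that is actually displayed:
TypeError: __init__() got an unexpected keyword argument 'resultdb'
It means:
__init__ received a keyword argument, resultdb, that it did not expect.
And the line of code that raised the error is:
result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
-> So, had I read the error message carefully at the time, I should have realised:
the problem had to be looked for around ResultWorker.
-> In the end I found that the __init__ of my own class inheriting from ResultWorker was written incorrectly.
-> Only after changing it to the correct initialisation was the problem solved.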
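When an error message points at a constructor like this, a quick check is to list the parameters that the class's __init__ actually accepts. Here is a small sketch using the standard library's `inspect.signature`. `init_parameters` is a hypothetical helper, and the two classes are stand-ins for ResultWorker and the faulty subclass.

```python
import inspect


def init_parameters(cls):
    """Return the parameter names of cls.__init__, excluding self."""
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != "self"]


class BaseWorker(object):
    """Stand-in for ResultWorker."""

    def __init__(self, resultdb, inqueue):
        self.resultdb = resultdb
        self.inqueue = inqueue


class BadWorker(BaseWorker):
    """Stand-in for the subclass with the faulty __init__."""

    def __init__(self):
        self.mysqldb = None


print("BaseWorker accepts:", init_parameters(BaseWorker))
print("BadWorker accepts: ", init_parameters(BadWorker))
```

Comparing the two lists makes it clear that the subclass no longer accepts `resultdb`, which matches the keyword named in the TypeError.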
Please credit the source when reposting: 在路上 » 【Solved】pyspider error: TypeError __init__() got an unexpected keyword argument resultdb