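While working on:
[Solved] Autohome car model and series data: scraping the detailed parameter configuration of car models
I had basically got the parameter configuration data extraction working.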
But there was one problem:
the parameter configuration page was crawled via:
# https://car.autohome.com.cn/config/spec/43593.html
print("carConfigSpecUrl=%s" % carConfigSpecUrl)
self.crawl(carConfigSpecUrl,
    fetch_type="js",
    callback=self.carConfigSpecCallback,
    save=carModelDict,
)
i.e. with the page's JavaScript executed (fetch_type="js").
Only then does the subsequent lookup
"""
<table class="tbcs" id="tab_0" style="width: 932px;">
<tbody>
<tr>
<th class="cstitle" show="1" pid="tab_0" id="nav_meto_0" colspan="5">
<h3><span>基本参数</span></h3>
</th>
</tr>
<tr data-pnid="1_-1" id="tr_0">
"""
tbodyDoc = response.doc("table[id='tab_0'] tbody")
print("tbodyDoc=%s" % tbodyDoc)
return a non-empty result.
That is, only the HTML produced after running the JS contains the table values, with the text already merged.
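But this brings its own problem:
every page needs the extra JS execution, which is slow,
and for batch crawling it is far too slow.
So the goal now is to speed things up by dropping the JS execution,
and to check whether, in the HTML returned without JS, these values can be read directly from the embedded JSON data.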
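A minimal sketch of that idea, assuming (as the `var config = ...` comment in the callback further down shows) that the non-JS HTML embeds the data as a `var config = {...};` statement. Rather than matching up to the first `;` with a regex, it only uses a regex to find where the object starts and lets the standard library's `json.JSONDecoder.raw_decode` decide where the object ends, so a `;` inside a string value would not truncate the JSON. `extract_config` is a hypothetical helper name, not part of the original spider.

```python
import json
import re

# Matches the start of the inline assignment, e.g. `var config = {"message": ...`
# (assumed page layout, based on the config pages inspected in this post).
CONFIG_START_RE = re.compile(r"var\s+config\s*=\s*")


def extract_config(page_html):
    """Return the parsed `config` object from raw (non-JS) page HTML, or None."""
    found = CONFIG_START_RE.search(page_html)
    if found is None:
        return None
    # raw_decode parses exactly one JSON value starting at the given index and
    # ignores whatever follows it (the trailing `;` and the rest of the <script>).
    config, _end_index = json.JSONDecoder().raw_decode(page_html, found.end())
    return config
```

Called on `response.text` inside the callback, this would give the same dict that the post later gets from `json.loads(configJson)`.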

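I had looked into this before: the config JSON contains the corresponding values.
Let's see how to extract them.
However, this series has too many car models, which makes the individual values hard to analyze,
so I switched to a series with only 2 models
to take a look.
Pretty-printed, the config looks like this: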
{
"message": "<span class='hs_kw29_configpl'></span>",
"result": {
"paramtypeitems": [{
"name": "基本参数",
"paramitems": [{
"id": 0,
"name": "车型<span class='hs_kw33_configpl'></span>",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "<span class='hs_kw9_configpl'></span>Q2L e-tron 2019款 Q2L e-tron 纯电智享型"
}, {
"specid": 42875,
"value": "<span class='hs_kw9_configpl'></span>Q2L e-tron 2019款 Q2L e-tron 纯电智酷型"
}]
}, {
"id": 0,
"name": "厂<span class='hs_kw15_configpl'></span><span class='hs_kw0_configpl'></span><span class='hs_kw57_configpl'></span><span class='hs_kw55_configpl'></span>(<span class='hs_kw14_configpl'></span>)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "23.73<span class='hs_kw1_configpl'></span>"
}, {
"specid": 42875,
"value": "22.68<span class='hs_kw1_configpl'></span>"
}]
}, {
"id": 52,
"name": "厂<span class='hs_kw15_configpl'></span>",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "<span class='hs_kw26_configpl'></span>-<span class='hs_kw47_configpl'></span><span class='hs_kw9_configpl'></span>"
}, {
"specid": 42875,
"value": "<span class='hs_kw26_configpl'></span>-<span class='hs_kw47_configpl'></span><span class='hs_kw9_configpl'></span>"
}]
}, {
"id": 53,
"name": "级别",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "<span class='hs_kw16_configpl'></span>"
}, {
"specid": 42875,
"value": "<span class='hs_kw16_configpl'></span>"
}]
}, {
"id": 1149,
"name": "能源类型",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "纯电动"
}, {
"specid": 42875,
"value": "纯电动"
}]
}, {
"id": 0,
"name": "上市<span class='hs_kw40_configpl'></span>",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "2019.11"
}, {
"specid": 42875,
"value": "2019.11"
}]
}, {
"id": 1291,
"name": "工信部纯电续航里程(km)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "265"
}, {
"specid": 42875,
"value": "265"
}]
}, {
"id": 1292,
"name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "0.6"
}, {
"specid": 42875,
"value": "0.6"
}]
}, {
"id": 0,
"name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "17"
}, {
"specid": 42875,
"value": "17"
}]
}, {
"id": 0,
"name": "<span class='hs_kw39_configpl'></span><span class='hs_kw11_configpl'></span>百分比",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "80"
}, {
"specid": 42875,
"value": "80"
}]
}, {
"id": 1185,
"name": "<span class='hs_kw8_configpl'></span><span class='hs_kw42_configpl'></span>(kW)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "100"
}, {
"specid": 42875,
"value": "100"
}]
}, {
"id": 1186,
"name": "<span class='hs_kw8_configpl'></span><span class='hs_kw2_configpl'></span>(N·m)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "290"
}, {
"specid": 42875,
"value": "290"
}]
}, {
"id": 0,
"name": "电动机(Ps)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "136"
}, {
"specid": 42875,
"value": "136"
}]
}, {
"id": 1148,
"name": "长*宽*高(mm)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "4237*1785*1548"
}, {
"specid": 42875,
"value": "4237*1785*1548"
}]
}, {
"id": 1147,
"name": "车身结构",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "5门5座SUV"
}, {
"specid": 42875,
"value": "5门5座SUV"
}]
}, {
"id": 1246,
"name": "最高车速(km/h)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "150"
}, {
"specid": 42875,
"value": "150"
}]
}, {
"id": 1250,
"name": "官方0-100km/h加速(s)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "-"
}, {
"specid": 42875,
"value": "-"
}]
}, {
"id": 1252,
"name": "<span class='hs_kw22_configpl'></span>0-100km/h加速(s)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "-"
}, {
"specid": 42875,
"value": "-"
}]
}, {
"id": 1253,
"name": "<span class='hs_kw22_configpl'></span>100-0km/h制动(m)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "-"
}, {
"specid": 42875,
"value": "-"
}]
}, {
"id": 0,
"name": "<span class='hs_kw22_configpl'></span>续航里程(km)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "-"
}, {
"specid": 42875,
"value": "-"
}]
}, {
"id": 0,
"name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "-"
}, {
"specid": 42875,
"value": "-"
}]
}, {
"id": 0,
"name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "-"
}, {
"specid": 42875,
"value": "-"
}]
}, {
"id": 1255,
"name": "整车<span class='hs_kw36_configpl'></span>",
"pnid": "1_-1",
"valueitems": [{
"specid": 39893,
"value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
}, {
"specid": 42875,
"value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
}]
}]
},
...
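Its contents are easy to extract, and they are the same as what you get after running the JS,
matching the values shown on the page.
So it is still easy to handle.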
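To get an overview of which rows exist and which values still carry the `hs_kw..._config...` placeholder spans (and so need post-processing, such as the whole-vehicle warranty), the structure can be walked generically. A sketch, assuming the `result` → `paramtypeitems` → `paramitems` → `valueitems` nesting shown in the JSON above; `iter_param_rows` and `has_placeholder` are hypothetical helpers, not part of the original spider.

```python
import re

# Obfuscation placeholders look like <span class='hs_kw16_configpl'></span> in the
# config JSON and <span class="hs_kw40_configxv"></span> in the JS-rendered HTML.
PLACEHOLDER_RE = re.compile(r"""<span class=['"]hs_kw\d+_config\w*['"]></span>""")


def iter_param_rows(config):
    """Yield (group_name, item_name, {specid: value}) for every parameter row."""
    for param_type in config["result"]["paramtypeitems"]:
        for param_item in param_type["paramitems"]:
            values_by_spec = {v["specid"]: v["value"] for v in param_item["valueitems"]}
            yield param_type["name"], param_item["name"], values_by_spec


def has_placeholder(text):
    """True if the text still contains at least one hs_kw placeholder span."""
    return PLACEHOLDER_RE.search(text) is not None
```

Keying the values by `specid` keeps every car model in the series, whereas the spider itself only ever reads the first entry of `valueitems`.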
So I took the earlier code that extracted the first value from the HTML:
def getItemFirstValue(self, rootDoc, trNumber, isRespDoc=False):
    """
    <tr data-pnid="1_-1" id="tr_2">
    <th>
    <div id="1149"><a href="https://car.autohome.com.cn/baike/detail_7_18_1149.html#pvareaid=2042252">能源类型</a>
    </div>
    </th>
    <td style="background:#F0F3F8;">
    <div>纯电动</div>
    </td>
    <tr data-pnid="1_-1" id="tr_3">
    <th>
    <div id="0">上市<span class="hs_kw40_configxv"></span></div>
    </th>
    <td style="background:#F0F3F8;">
    <div>2019.11</div>
    </td>
    <td>
    <div>2019.11</div>
    </td>
    <td>
    <div></div>
    </td>
    <td>
    <div></div>
    </td>
    </tr>
    """
    trQuery = "tr[id='tr_%s']" % trNumber
    # print("trQuery=%s" % trQuery)
    trDoc = rootDoc.find(trQuery)
    # print("trDoc=%s" % trDoc)
    tdDocGenerator = trDoc.items("td")
    # print("tdDocGenerator=%s" % tdDocGenerator)
    tdDocList = list(tdDocGenerator)
    # print("tdDocList=%s" % tdDocList)
    firstTdDoc = tdDocList[0]
    # print("firstTdDoc=%s" % firstTdDoc)
    firstTdDivDoc = firstTdDoc.find("div")
    print("firstTdDivDoc=%s" % firstTdDivDoc)
    if isRespDoc:
        respItem = firstTdDivDoc
    else:
        firstItemValue = firstTdDivDoc.text()
        respItem = firstItemValue
    print("respItem=%s" % respItem)
    return respItem
and changed it to extract the value from config instead,
so the JS no longer has to run.
The final code:
def getItemFirstValue(self, inputContent, itemIndex):
    print("in getItemFirstValue")
    # firstItemValue = self.extractTrFirstTdValue(inputContent, itemIndex)
    firstItemValue = self.extractDictListFirstValue(inputContent, itemIndex)
    return firstItemValue

def extractDictListFirstValue(self, paramItemDictList, itemIndex):
    """
    [
    ...,
    {
    "id": 1149,
    "name": "能源类型",
    "pnid": "1_-1",
    "valueitems": [{
    "specid": 39893,
    "value": "纯电动"
    }, {
    "specid": 42875,
    "value": "纯电动"
    }]
    }
    ...,
    {
    "id": 1292,
    "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
    "pnid": "1_-1",
    "valueitems": [{
    "specid": 39893,
    "value": "0.6"
    }, {
    "specid": 42875,
    "value": "0.6"
    }]
    },
    ...
    ,
    {
    "id": 1255,
    "name": "整车<span class='hs_kw36_configpl'></span>",
    "pnid": "1_-1",
    "valueitems": [{
    "specid": 39893,
    "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
    }, {
    "specid": 42875,
    "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
    }]
    }
    ]
    """
    paramItemDict = paramItemDictList[itemIndex]
    print("paramItemDict=%s" % paramItemDict)
    valueItemList = paramItemDict["valueitems"]
    print("valueItemList=%s" % valueItemList)
    firstItemDict = valueItemList[0]
    print("firstItemDict=%s" % firstItemDict)
    # firstItemDict={'specid': 43593, 'value': "<span class='hs_kw57_configxt'></span>-<span class='hs_kw21_configxt'></span><span class='hs_kw24_configxt'></span>"}
    firstItemValue = firstItemDict["value"]
    # firstItemValue=<span class='hs_kw57_configxt'></span>-<span class='hs_kw21_configxt'></span><span class='hs_kw24_configxt'></span>
    print("firstItemValue=%s" % firstItemValue)
    return firstItemValue

# def extractTrFirstTdValue(self, rootDoc, trNumber, isRespDoc=False):
def extractTrFirstTdValue(self, rootDoc, trNumber):
    """
    <tr data-pnid="1_-1" id="tr_2">
    <th>
    <div id="1149"><a href="https://car.autohome.com.cn/baike/detail_7_18_1149.html#pvareaid=2042252">能源类型</a>
    </div>
    </th>
    <td style="background:#F0F3F8;">
    <div>纯电动</div>
    </td>
    <tr data-pnid="1_-1" id="tr_3">
    <th>
    <div id="0">上市<span class="hs_kw40_configxv"></span></div>
    </th>
    <td style="background:#F0F3F8;">
    <div>2019.11</div>
    </td>
    <td>
    <div>2019.11</div>
    </td>
    <td>
    <div></div>
    </td>
    <td>
    <div></div>
    </td>
    </tr>
    <tr data-pnid="1_-1" id="tr_20" style="background: rgb(255, 255, 255);">
    <th>
    <div id="1255"><a href="https://car.autohome.com.cn/baike/detail_7_18_1255.html#pvareaid=2042252">整车<span
    class="hs_kw36_configaJ"></span></a></div>
    </th>
    <td style="background:#F0F3F8;">
    <div>三<span class="hs_kw7_configaJ"></span>10<span class="hs_kw1_configaJ"></span>公里</div>
    </td>
    <td>
    <div>三<span class="hs_kw7_configaJ"></span>10<span class="hs_kw1_configaJ"></span>公里</div>
    </td>
    <td>
    <div></div>
    </td>
    <td>
    <div></div>
    </td>
    </tr>
    """
    trQuery = "tr[id='tr_%s']" % trNumber
    # print("trQuery=%s" % trQuery)
    trDoc = rootDoc.find(trQuery)
    # print("trDoc=%s" % trDoc)
    tdDocGenerator = trDoc.items("td")
    # print("tdDocGenerator=%s" % tdDocGenerator)
    tdDocList = list(tdDocGenerator)
    # print("tdDocList=%s" % tdDocList)
    firstTdDoc = tdDocList[0]
    # print("firstTdDoc=%s" % firstTdDoc)
    firstTdDivDoc = firstTdDoc.find("div")
    print("firstTdDivDoc=%s" % firstTdDivDoc)
    # if isRespDoc:
    #     respItem = firstTdDivDoc
    # else:
    #     firstItemValue = firstTdDivDoc.text()
    #     respItem = firstItemValue
    # print("respItem=%s" % respItem)
    # return respItem
    respItemHtml = firstTdDivDoc.html()
    print("respItemHtml=%s" % respItemHtml)
    return respItemHtml
I also optimized the related functions, for example the whole-vehicle warranty (整车质保):
# def extractWholeWarranty(self, firstDivDoc):
def extractWholeWarranty(self, firstDivHtml):
    carModelWholeWarranty = ""
    # <div>三<span class="hs_kw7_configxv"></span>10<span class="hs_kw1_configxv"></span>公里</div>
    # print("firstDivDoc=%s" % firstDivDoc)
    # carModelWholeWarranty = firstDivDoc.text() # 三10公里
    # firstDivHtml = firstDivDoc.html()
    print("firstDivHtml=%s" % firstDivHtml)
    # 三<span class="hs_kw7_configCC"></span>10<span class="hs_kw1_configCC"></span>公里
    # carWholeQualityQuarantee = re.sub("[^<>]+(?P<firstSpan><span.+?></span>)[^<>]+(?P<secondSpan><span.+?></span>)[^<>]+", )
    foundYearDistance = re.search(r"(?P<warrantyYear>[^<>]+)<span.+?></span>(?P<distanceNumber>[^<>]+)<span.+?></span>(?P<distanceUnit>[^<>]+)", firstDivHtml)
    if foundYearDistance:
        warrantyYear = foundYearDistance.group("warrantyYear")
        distanceNumber = foundYearDistance.group("distanceNumber")
        distanceUnit = foundYearDistance.group("distanceUnit")
        carModelWholeWarranty = "%s年或%s万%s" % (warrantyYear, distanceNumber, distanceUnit)
    else:
        # special case: years only, no mileage limit
        # https://car.autohome.com.cn/config/spec/46700.html
        # <div>三<span class="hs_kw58_configWh"></span></div>
        # 三<span class="hs_kw58_configOf"></span>
        foundYearNotLimitDistance = re.search(r"(?P<warrantyYear>[^<>]+)<span.+?></span>", firstDivHtml)
        if foundYearNotLimitDistance:
            warrantyYear = foundYearNotLimitDistance.group("warrantyYear")
            carModelWholeWarranty = "%s年不限公里" % warrantyYear
    print("carModelWholeWarranty=%s" % carModelWholeWarranty)
    return carModelWholeWarranty

def getWholeWarranty(self, inputContent, itemIndex):
    # firstDivDoc = self.getItemFirstValue(inputContent, itemIndex, isRespDoc=True)
    # print("firstDivDoc=%s" % firstDivDoc)
    # carModelWholeWarranty = self.extractWholeWarranty(firstDivDoc)
    firstDivDocHtml = self.getItemFirstValue(inputContent, itemIndex)
    print("firstDivDocHtml=%s" % firstDivDocHtml)
    carModelWholeWarranty = self.extractWholeWarranty(firstDivDocHtml)
    return carModelWholeWarranty
I also consolidated the logic shared by the different energy types. The final version:
@catch_status_code_error
def carConfigSpecCallback(self, response):
    print("in carConfigSpecCallback")
    curCarModelDict = response.save
    print("curCarModelDict=%s" % curCarModelDict)
    carModelDict = copy.deepcopy(curCarModelDict)

    configSpecHtml = response.text
    # print("configSpecHtml=%s" % configSpecHtml)
    # print("")
    # # for debug
    # return

    # config json item index - spec table html item index = 2
    ItemIndexDiff = 2

    isUseSpecTableHtml = True
    isUseConfigJson = False
    valueContent = None
    energyTypeIdx = 2

    # # Method 1: after running js, extract item value from spec table html
    # """
    # <table class="tbcs" id="tab_0" style="width: 932px;">
    # <tbody>
    # <tr>
    # <th class="cstitle" show="1" pid="tab_0" id="nav_meto_0" colspan="5">
    # <h3><span>基本参数</span></h3>
    # </th>
    # </tr>
    # <tr data-pnid="1_-1" id="tr_0">
    # """
    # tbodyDoc = response.doc("table[id='tab_0'] tbody")
    # print("tbodyDoc=%s" % tbodyDoc)
    # valueContent = tbodyDoc
    # isUseSpecTableHtml = True
    # isUseConfigJson = False
    # energyTypeIdx = 2

    # Method 2: without running js, extract item value from config json
    # var config = {"message" ...... "returncode":"0","taskid":"8be676a3-e023-4fa9-826d-09cd42a1810c","time":"2020-08-27 20:56:17"};
    foundConfigJson = re.search(r"var\s*config\s*=\s*(?P<configJson>\{[^;]+\});", configSpecHtml)
    print("foundConfigJson=%s" % foundConfigJson)
    if foundConfigJson:
        configJson = foundConfigJson.group("configJson")
        print("configJson=%s" % configJson)
        # configDict = json.loads(configJson, encoding="utf-8")
        configDict = json.loads(configJson)
        print("configDict=%s" % configDict)
        # if "result" in configDict:
        configResultDict = configDict["result"]
        print("configResultDict=%s" % configResultDict)
        # if "paramtypeitems" in configResultDict:
        paramTypeItemDictList = configResultDict["paramtypeitems"]
        print("paramTypeItemDictList=%s" % paramTypeItemDictList)
        # paramTypeItemNum = len(paramTypeItemDictList)
        # print("paramTypeItemNum=%s" % paramTypeItemNum)
        basicParamDict = paramTypeItemDictList[0]
        print("basicParamDict=%s" % basicParamDict)
        basicItemDictList = basicParamDict["paramitems"]
        print("basicItemDictList=%s" % basicItemDictList)
        # print("type(basicItemDictList)=%s" % type(basicItemDictList))
        # basicItemNum = len(basicItemDictList)
        # print("basicItemNum=%s" % basicItemNum)
        valueContent = basicItemDictList
        isUseSpecTableHtml = False
        isUseConfigJson = True

    if isUseConfigJson:
        energyTypeIdx += ItemIndexDiff

    if valueContent:
        carEnergyType = self.getItemFirstValue(valueContent, energyTypeIdx)
        # 纯电动 (BEV) / 汽油 (gasoline) / 插电式混合动力 (PHEV) / 油电混合 (HEV)
        carModelDict["carEnergyType"] = carEnergyType
        if carEnergyType == "汽油":
            # https://car.autohome.com.cn/config/spec/43593.html
            # https://car.autohome.com.cn/config/spec/41572.html
            # self.processGasolineCar(valueContent, carModelDict)
            gasolineCarKeyIdxMapDict = {
                "carModelEnvStandard": 3,
                "carModelReleaseTime": 4,
                "carModelMaxPower": 5,
                "carModelMaxTorque": 6,
                "carModelEngine": 7,
                "carModelGearBox": 8,
                "carModelSize": 9,
                "carModelBodyStructure": 10,
                "carModelMaxSpeed": 11,
                "carModelOfficialSpeedupTime": 12,
                "carModelActualTestSpeedupTime": 13,
                "carModelActualTestBrakeDistance": 14,
                "carModelMiitCompositeFuelConsumption": 15,
                "carModelActualFuelConsumption": 16,
            }
            wholeWarrantyIdx = 17
            if isUseConfigJson:
                for eachKey in gasolineCarKeyIdxMapDict.keys():
                    gasolineCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                wholeWarrantyIdx += ItemIndexDiff
            self.processSingleEneryTypeCar(gasolineCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)
        elif carEnergyType == "纯电动":
            # https://car.autohome.com.cn/config/spec/42875.html
            # self.processPureElectricCar(valueContent, carModelDict)
            pureElectricCarKeyIdxMapDict = {
                "carModelReleaseTime": 3,
                "carModelMiitEnduranceMileagePureElectric": 4,
                "carModelQuickCharge": 5,
                "carModelSlowCharge": 6,
                "carModelQuickChargePercent": 7,
                "carModelMaxPower": 8,
                "carModelMaxTorque": 9,
                "carModelHorsePowerElectric": 10,
                "carModelSize": 11,
                "carModelBodyStructure": 12,
                "carModelMaxSpeed": 13,
                "carModelOfficialSpeedupTime": 14,
                "carModelActualTestSpeedupTime": 15,
                "carModelActualTestBrakeDistance": 16,
                "carModelActualTestEnduranceMileage": 17,
                "carModelActualTestQuickCharge": 18,
                "carModelActualTestSlowCharge": 19,
            }
            wholeWarrantyIdx = 20
            if isUseConfigJson:
                for eachKey in pureElectricCarKeyIdxMapDict.keys():
                    pureElectricCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                wholeWarrantyIdx += ItemIndexDiff
            self.processSingleEneryTypeCar(pureElectricCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)
        elif carEnergyType == "插电式混合动力":
            # https://car.autohome.com.cn/config/series/4460.html
            # self.processPhevCar(valueContent, carModelDict)
            phevCarKeyIdxMapDict = {
                "carModelEnvStandard": 3,
                "carModelReleaseTime": 4,
                "carModelMiitEnduranceMileagePureElectric": 5,
                "carModelQuickCharge": 6,
                "carModelSlowCharge": 7,
                "carModelQuickChargePercent": 8,
                "carModelMaxPower": 9,
                "carModelMaxTorque": 10,
                "carModelEngine": 11,
                "carModelHorsePowerElectric": 12,
                "carModelGearBox": 13,
                "carModelSize": 14,
                "carModelBodyStructure": 15,
                "carModelMaxSpeed": 16,
                "carModelOfficialSpeedupTime": 17,
                "carModelActualTestSpeedupTime": 18,
                "carModelActualTestBrakeDistance": 19,
                "carModelActualTestEnduranceMileage": 20,
                "carModelActualTestQuickCharge": 21,
                "carModelActualTestSlowCharge": 22,
                "carModelMiitCompositeFuelConsumption": 23,
                "carModelActualFuelConsumption": 24,
            }
            wholeWarrantyIdx = 25
            if isUseConfigJson:
                for eachKey in phevCarKeyIdxMapDict.keys():
                    phevCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                wholeWarrantyIdx += ItemIndexDiff
            self.processSingleEneryTypeCar(phevCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)
        elif carEnergyType == "油电混合":
            # https://car.autohome.com.cn/config/spec/35507.html
            # self.processHevCar(valueContent, carModelDict)
            hevCarKeyIdxMapDict = {
                "carModelEnvStandard": 3,
                "carModelReleaseTime": 4,
                "carModelMaxPower": 5,
                "carModelMaxTorque": 6,
                "carModelEngine": 7,
                "carModelHorsePowerElectric": 8,
                "carModelGearBox": 9,
                "carModelSize": 10,
                "carModelBodyStructure": 11,
                "carModelMaxSpeed": 12,
                "carModelOfficialSpeedupTime": 13,
                "carModelActualTestSpeedupTime": 14,
                "carModelActualTestBrakeDistance": 15,
                "carModelMiitCompositeFuelConsumption": 16,
                "carModelActualFuelConsumption": 17,
            }
            wholeWarrantyIdx = 18
            if isUseConfigJson:
                for eachKey in hevCarKeyIdxMapDict.keys():
                    hevCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                wholeWarrantyIdx += ItemIndexDiff
            self.processSingleEneryTypeCar(hevCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)
        else:
            errMsg = "TODO: add support %s!" % carEnergyType
            raise Exception(errMsg)
    else:
        self.saveSingleResult(carModelDict)
def processSingleEneryTypeCar(self, keyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict):
    for eachItemKey in keyIdxMapDict.keys():
        print("eachItemKey=%s" % eachItemKey)
        eachItemIndex = keyIdxMapDict[eachItemKey]
        print("eachItemIndex=%s" % eachItemIndex)
        eachItemValue = self.getItemFirstValue(valueContent, eachItemIndex)
        print("eachItemValue=%s" % eachItemValue)
        carModelDict[eachItemKey] = eachItemValue

    # Whole-vehicle warranty (整车质保)
    carModelWholeWarranty = self.getWholeWarranty(valueContent, wholeWarrantyIdx) # 三年或10万公里
    print("carModelWholeWarranty=%s" % carModelWholeWarranty)
    carModelDict["carModelWholeWarranty"] = carModelWholeWarranty
    self.saveSingleResult(carModelDict)
Then I commented out all the earlier per-energy-type functions:
# def processGasolineCar(self, valueContent, carModelDict):
#     # 汽油 (gasoline)
#     # https://car.autohome.com.cn/config/spec/43593.html
#     # https://car.autohome.com.cn/config/spec/41572.html
#     # Emission standard
#     carModelEnvStandard = self.getItemFirstValue(valueContent, 3) # 国VI
#     carModelDict["carModelEnvStandard"] = carModelEnvStandard
#     # Release date
#     carModelReleaseTime = self.getItemFirstValue(valueContent, 4) # 2020.04
#     carModelDict["carModelReleaseTime"] = carModelReleaseTime
#     # Max power (kW)
#     carModelMaxPower = self.getItemFirstValue(valueContent, 5) # 110
#     carModelDict["carModelMaxPower"] = carModelMaxPower
#     # Max torque (N·m)
#     carModelMaxTorque = self.getItemFirstValue(valueContent, 6) # 250
#     carModelDict["carModelMaxTorque"] = carModelMaxTorque
#     # Engine
#     carModelEngine = self.getItemFirstValue(valueContent, 7) # 1.4T 150马力 L4
#     carModelDict["carModelEngine"] = carModelEngine
#     # Gearbox
#     carModelGearBox = self.getItemFirstValue(valueContent, 8) # 7挡双离合
#     carModelDict["carModelGearBox"] = carModelGearBox
#     # Length*width*height (mm)
#     carModelSize = self.getItemFirstValue(valueContent, 9) # 4312*1785*1426
#     carModelDict["carModelSize"] = carModelSize
#     # Body structure
#     carModelBodyStructure = self.getItemFirstValue(valueContent, 10) # 5门5座两厢车
#     carModelDict["carModelBodyStructure"] = carModelBodyStructure
#     # Top speed (km/h)
#     carModelMaxSpeed = self.getItemFirstValue(valueContent, 11) # 200
#     carModelDict["carModelMaxSpeed"] = carModelMaxSpeed
#     # Official 0-100 km/h acceleration (s)
#     carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 12) # 8.4
#     carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime
#     # Measured 0-100 km/h acceleration (s)
#     carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 13) # -
#     carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime
#     # Measured 100-0 km/h braking distance (m)
#     carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 14) # -
#     carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance
#     # MIIT combined fuel consumption (L/100km)
#     carModelMiitCompositeFuelConsumption = self.getItemFirstValue(valueContent, 15) # 5.8
#     carModelDict["carModelMiitCompositeFuelConsumption"] = carModelMiitCompositeFuelConsumption
#     # Measured fuel consumption (L/100km)
#     carModelActualFuelConsumption = self.getItemFirstValue(valueContent, 16) # -
#     carModelDict["carModelActualFuelConsumption"] = carModelActualFuelConsumption
#     self.saveSingleResult(carModelDict)

# def processPureElectricCar(self, valueContent, carModelDict):
#     # 纯电动 (pure electric)
#     # https://car.autohome.com.cn/config/spec/42875.html
#     # Release date
#     carModelReleaseTime = self.getItemFirstValue(valueContent, 3) # 2019.11
#     carModelDict["carModelReleaseTime"] = carModelReleaseTime
#     # MIIT pure-electric range (km)
#     carModelMiitEnduranceMileagePureElectric = self.getItemFirstValue(valueContent, 4) # 265
#     carModelDict["carModelMiitEnduranceMileagePureElectric"] = carModelMiitEnduranceMileagePureElectric
#     # Fast-charge time (hours)
#     carModelQuickCharge = self.getItemFirstValue(valueContent, 5) # 0.6
#     carModelDict["carModelQuickCharge"] = carModelQuickCharge
#     # Slow-charge time (hours)
#     carModelSlowCharge = self.getItemFirstValue(valueContent, 6) # 17
#     carModelDict["carModelSlowCharge"] = carModelSlowCharge
#     # Fast-charge percentage
#     carModelQuickChargePercent = self.getItemFirstValue(valueContent, 7) # 80
#     carModelDict["carModelQuickChargePercent"] = carModelQuickChargePercent
#     # Max power (kW)
#     carModelMaxPower = self.getItemFirstValue(valueContent, 8) # 100
#     carModelDict["carModelMaxPower"] = carModelMaxPower
#     # Max torque (N·m)
#     carModelMaxTorque = self.getItemFirstValue(valueContent, 9) # 290
#     carModelDict["carModelMaxTorque"] = carModelMaxTorque
#     # Electric motor (Ps)
#     carModelHorsePowerElectric = self.getItemFirstValue(valueContent, 10) # 136
#     carModelDict["carModelHorsePowerElectric"] = carModelHorsePowerElectric
#     # Length*width*height (mm)
#     carModelSize = self.getItemFirstValue(valueContent, 11) # 4237*1785*1548
#     carModelDict["carModelSize"] = carModelSize
#     # Body structure
#     carModelBodyStructure = self.getItemFirstValue(valueContent, 12) # 5门5座SUV
#     carModelDict["carModelBodyStructure"] = carModelBodyStructure
#     # Top speed (km/h)
#     carModelMaxSpeed = self.getItemFirstValue(valueContent, 13) # 150
#     carModelDict["carModelMaxSpeed"] = carModelMaxSpeed
#     # Official 0-100 km/h acceleration (s)
#     carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 14) # -
#     carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime
#     # Measured 0-100 km/h acceleration (s)
#     carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 15) # -
#     carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime
#     # Measured 100-0 km/h braking distance (m)
#     carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 16) # -
#     carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance
#     # Measured range (km)
#     carModelActualTestEnduranceMileage = self.getItemFirstValue(valueContent, 17) # -
#     carModelDict["carModelActualTestEnduranceMileage"] = carModelActualTestEnduranceMileage
#     # Measured fast-charge time (hours)
#     carModelActualTestQuickCharge = self.getItemFirstValue(valueContent, 18) # -
#     carModelDict["carModelActualTestQuickCharge"] = carModelActualTestQuickCharge
#     # Measured slow-charge time (hours)
#     carModelActualTestSlowCharge = self.getItemFirstValue(valueContent, 19) # -
#     carModelDict["carModelActualTestSlowCharge"] = carModelActualTestSlowCharge
#     # Whole-vehicle warranty
#     carModelWholeWarranty = self.getWholeWarranty(valueContent, 20) # 三年或10万公里
#     carModelDict["carModelWholeWarranty"] = carModelWholeWarranty
#     self.saveSingleResult(carModelDict)

# def processPhevCar(self, valueContent, carModelDict):
#     # 插电式混合动力 = PHEV = Plug-in Hybrid Electric vehicle
#     # https://car.autohome.com.cn/config/series/4460.html
#     # Emission standard
#     carModelEnvStandard = self.getItemFirstValue(valueContent, 3) # 国V
#     carModelDict["carModelEnvStandard"] = carModelEnvStandard
#     # Release date
#     carModelReleaseTime = self.getItemFirstValue(valueContent, 4) # 2018.11
#     carModelDict["carModelReleaseTime"] = carModelReleaseTime
#     # MIIT pure-electric range (km)
#     carModelMiitEnduranceMileagePureElectric = self.getItemFirstValue(valueContent, 5) # 56
#     carModelDict["carModelMiitEnduranceMileagePureElectric"] = carModelMiitEnduranceMileagePureElectric
#     # Fast-charge time (hours)
#     carModelQuickCharge = self.getItemFirstValue(valueContent, 6) # 2.5
#     carModelDict["carModelQuickCharge"] = carModelQuickCharge
#     # Slow-charge time (hours)
#     carModelSlowCharge = self.getItemFirstValue(valueContent, 7) # 10.8
#     carModelDict["carModelSlowCharge"] = carModelSlowCharge
#     # Fast-charge percentage
#     carModelQuickChargePercent = self.getItemFirstValue(valueContent, 8) # -
#     carModelDict["carModelQuickChargePercent"] = carModelQuickChargePercent
#     # Max power (kW)
#     carModelMaxPower = self.getItemFirstValue(valueContent, 9) # 270
#     carModelDict["carModelMaxPower"] = carModelMaxPower
#     # Max torque (N·m)
#     carModelMaxTorque = self.getItemFirstValue(valueContent, 10) # 700
#     carModelDict["carModelMaxTorque"] = carModelMaxTorque
#     # Engine
#     carModelEngine = self.getItemFirstValue(valueContent, 11) # 2.0T 252马力 L4
#     carModelDict["carModelEngine"] = carModelEngine
#     # Electric motor (Ps)
#     carModelHorsePowerElectric = self.getItemFirstValue(valueContent, 12) # 128
#     carModelDict["carModelHorsePowerElectric"] = carModelHorsePowerElectric
#     # Gearbox
#     carModelGearBox = self.getItemFirstValue(valueContent, 13) # 8挡手自一体
#     carModelDict["carModelGearBox"] = carModelGearBox
#     # Length*width*height (mm)
#     carModelSize = self.getItemFirstValue(valueContent, 14) # 5071*1968*1716
#     carModelDict["carModelSize"] = carModelSize
#     # Body structure
#     carModelBodyStructure = self.getItemFirstValue(valueContent, 15) # 5门5座SUV
#     carModelDict["carModelBodyStructure"] = carModelBodyStructure
#     # Top speed (km/h)
#     carModelMaxSpeed = self.getItemFirstValue(valueContent, 16) # 228
#     carModelDict["carModelMaxSpeed"] = carModelMaxSpeed
#     # Official 0-100 km/h acceleration (s)
#     carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 17) # 5.9
#     carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime
#     # Measured 0-100 km/h acceleration (s)
#     carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 18) # -
#     carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime
#     # Measured 100-0 km/h braking distance (m)
#     carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 19) # -
#     carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance
#     # Measured range (km)
#     carModelActualTestEnduranceMileage = self.getItemFirstValue(valueContent, 20) # -
#     carModelDict["carModelActualTestEnduranceMileage"] = carModelActualTestEnduranceMileage
#     # Measured fast-charge time (hours)
#     carModelActualTestQuickCharge = self.getItemFirstValue(valueContent, 21) # -
#     carModelDict["carModelActualTestQuickCharge"] = carModelActualTestQuickCharge
#     # Measured slow-charge time (hours)
#     carModelActualTestSlowCharge = self.getItemFirstValue(valueContent, 22) # -
#     carModelDict["carModelActualTestSlowCharge"] = carModelActualTestSlowCharge
#     # MIIT combined fuel consumption (L/100km)
#     carModelMiitCompositeFuelConsumption = self.getItemFirstValue(valueContent, 23) # 2.4
#     carModelDict["carModelMiitCompositeFuelConsumption"] = carModelMiitCompositeFuelConsumption
#     # Measured fuel consumption (L/100km)
#     carModelActualFuelConsumption = self.getItemFirstValue(valueContent, 24) # -
#     carModelDict["carModelActualFuelConsumption"] = carModelActualFuelConsumption
#     # Whole-vehicle warranty
#     carModelWholeWarranty = self.getWholeWarranty(valueContent, 25) # 三年或10万公里
#     carModelDict["carModelWholeWarranty"] = carModelWholeWarranty
#     self.saveSingleResult(carModelDict)

# def processHevCar(self, valueContent, carModelDict):
#     # 混合电动汽车=HEV=Hybrid Electric Vehicle
#     # https://car.autohome.com.cn/config/spec/35507.html
#     # Emission standard
#     carModelEnvStandard = self.getItemFirstValue(valueContent, 3) # 国IV(国V)
#     carModelDict["carModelEnvStandard"] = carModelEnvStandard
#     # Release date
#     carModelReleaseTime = self.getItemFirstValue(valueContent, 4) # 2018.08
#     carModelDict["carModelReleaseTime"] = carModelReleaseTime
#     # Max power (kW)
#     carModelMaxPower = self.getItemFirstValue(valueContent, 5) # 100
#     carModelDict["carModelMaxPower"] = carModelMaxPower
#     # Max torque (N·m)
#     carModelMaxTorque = self.getItemFirstValue(valueContent, 6) # -
#     carModelDict["carModelMaxTorque"] = carModelMaxTorque
#     # Engine
#     carModelEngine = self.getItemFirstValue(valueContent, 7) # 1.8L 99马力 L4
#     carModelDict["carModelEngine"] = carModelEngine
#     # Electric motor (Ps)
#     carModelHorsePowerElectric = self.getItemFirstValue(valueContent, 8) # 82
#     carModelDict["carModelHorsePowerElectric"] = carModelHorsePowerElectric
#     # Gearbox
#     carModelGearBox = self.getItemFirstValue(valueContent, 9) # E-CVT无级变速
#     carModelDict["carModelGearBox"] = carModelGearBox
#     # Length*width*height (mm)
#     carModelSize = self.getItemFirstValue(valueContent, 10) # 4360*1765*1455
#     carModelDict["carModelSize"] = carModelSize
#     # Body structure
#     carModelBodyStructure = self.getItemFirstValue(valueContent, 11) # 5门5座SUV
#     carModelDict["carModelBodyStructure"] = carModelBodyStructure
#     # Top speed (km/h)
#     carModelMaxSpeed = self.getItemFirstValue(valueContent, 12) # -
#     carModelDict["carModelMaxSpeed"] = carModelMaxSpeed
#     # Official 0-100 km/h acceleration (s)
#     carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 13) # -
#     carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime
#     # Measured 0-100 km/h acceleration (s)
#     carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 14) # -
#     carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime
#     # Measured 100-0 km/h braking distance (m)
#     carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 15) # -
#     carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance
#     # MIIT combined fuel consumption (L/100km)
#     carModelMiitCompositeFuelConsumption = self.getItemFirstValue(valueContent, 16) # 4.6
#     carModelDict["carModelMiitCompositeFuelConsumption"] = carModelMiitCompositeFuelConsumption
#     # Measured fuel consumption (L/100km)
#     carModelActualFuelConsumption = self.getItemFirstValue(valueContent, 17) # -
#     carModelDict["carModelActualFuelConsumption"] = carModelActualFuelConsumption
#     # Whole-vehicle warranty
#     carModelWholeWarranty = self.getWholeWarranty(valueContent, 18) # 六年或15万公里
#     carModelDict["carModelWholeWarranty"] = carModelWholeWarranty
#     self.saveSingleResult(carModelDict)
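The four branches of carConfigSpecCallback differ only in their index map and warranty index, and each one shifts its indices by ItemIndexDiff in place. As a sketch of a possible further refactor (not what the post ended up using), the layouts could live in a single table and be shifted with a dict comprehension, which leaves the table itself untouched. Only the 油电混合 (HEV) layout from the callback is copied here; `ENERGY_TYPE_LAYOUTS` and `resolve_layout` are hypothetical names.

```python
# Offset between an item's index in the config JSON "paramitems" list and
# its tr_N index in the JS-rendered spec table (value taken from the post).
ITEM_INDEX_DIFF = 2

# energy type -> (field -> tr_N index in the rendered table, warranty index).
# Only the HEV layout is filled in; the other energy types would be added the same way.
ENERGY_TYPE_LAYOUTS = {
    "油电混合": ({
        "carModelEnvStandard": 3,
        "carModelReleaseTime": 4,
        "carModelMaxPower": 5,
        "carModelMaxTorque": 6,
        "carModelEngine": 7,
        "carModelHorsePowerElectric": 8,
        "carModelGearBox": 9,
        "carModelSize": 10,
        "carModelBodyStructure": 11,
        "carModelMaxSpeed": 12,
        "carModelOfficialSpeedupTime": 13,
        "carModelActualTestSpeedupTime": 14,
        "carModelActualTestBrakeDistance": 15,
        "carModelMiitCompositeFuelConsumption": 16,
        "carModelActualFuelConsumption": 17,
    }, 18),
}


def resolve_layout(energyType, isUseConfigJson, indexDiff=ITEM_INDEX_DIFF):
    """Return (keyIdxMapDict, wholeWarrantyIdx) for one energy type.

    Always returns a fresh dict, so ENERGY_TYPE_LAYOUTS is never mutated.
    """
    if energyType not in ENERGY_TYPE_LAYOUTS:
        raise ValueError("TODO: add support %s!" % energyType)
    keyIdxMapDict, wholeWarrantyIdx = ENERGY_TYPE_LAYOUTS[energyType]
    if isUseConfigJson:
        keyIdxMapDict = {key: idx + indexDiff for key, idx in keyIdxMapDict.items()}
        wholeWarrantyIdx += indexDiff
    else:
        keyIdxMapDict = dict(keyIdxMapDict)
    return keyIdxMapDict, wholeWarrantyIdx
```

In the callback, each branch would then shrink to `keyIdxMapDict, wholeWarrantyIdx = resolve_layout(carEnergyType, isUseConfigJson)` followed by the existing processSingleEneryTypeCar call.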
With these changes, the results can be obtained without running JS:
[
[
"autohome_20200827",
{
"carBrandId": "91",
"carBrandLogoUrl": "https://car3.autoimg.cn/cardfs/series/g26/M05/AE/94/100x100_f40_autohomecar__wKgHEVs9tm6ASWlTAAAUz_2mWTY720.png",
"carBrandName": "红旗",
"carEnergyType": "汽油",
"carMerchantName": "一汽红旗",
"carMerchantUrl": "https://car.autohome.com.cn/price/brand-91-190.html#pvareaid=2042363",
"carModelActualFuelConsumption": "",
"carModelActualTestBrakeDistance": "",
"carModelActualTestEnduranceMileage": "",
"carModelActualTestQuickCharge": "",
"carModelActualTestSlowCharge": "",
"carModelActualTestSpeedupTime": "",
"carModelBodyStructure": "4门5座三厢车",
"carModelDataSift2": "",
"carModelDataSift3": "",
"carModelDataSift4": "",
"carModelDriveType": "前置四驱",
"carModelEngine": "6.0L 408马力 V12",
"carModelEnvStandard": "未知",
"carModelGearBox": "6挡手自一体",
"carModelGroupName": "6.0升 自然吸气 408马力 未知",
"carModelHorsePowerElectric": "",
"carModelMaxPower": "300",
"carModelMaxSpeed": "",
"carModelMaxTorque": "550",
"carModelMiitCompositeFuelConsumption": "",
"carModelMiitEnduranceMileagePureElectric": "",
"carModelMsrp": "",
"carModelName": "2014款 6.0L 帜尊版",
"carModelOfficialSpeedupTime": "",
"carModelQuickCharge": "",
"carModelQuickChargePercent": "",
"carModelReleaseTime": "2014.03",
"carModelSize": "5555*2018*1578",
"carModelSlowCharge": "",
"carModelSpecId": "15822",
"carModelSpecUrl": "https://www.autohome.com.cn/spec/15822/#pvareaid=3454492",
"carModelWholeWarranty": "四年或10万公里",
"carModelYear": "2014款",
"carSeriesId": "3108",
"carSeriesLevelId": "6",
"carSeriesLevelName": "大型车",
"carSeriesMainImgUrl": "https://car3.autoimg.cn/cardfs/product/g24/M05/13/DB/380x285_0_q87_autohomecar__wKgHH1rdRjyAGXKwAAbriozDnBw527.jpg",
"carSeriesMaxPrice": "0.00万",
"carSeriesMinPrice": "0.00万",
"carSeriesMsrp": "",
"carSeriesMsrpUrl": "",
"carSeriesName": "红旗L5",
"carSeriesUrl": "https://www.autohome.com.cn/3108/#levelsource=000000000_0&pvareaid=101594"
},
"https://www.autohome.com.cn/spec/15822/#pvareaid=3454492"
]
]
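To check the two regexes in extractWholeWarranty outside the spider, they can be lifted into a plain function. This is a standalone copy of the post's logic under a hypothetical name, `format_whole_warranty`; like the original, it assumes the first hidden span stands for 年或 and the second for 万, which is how the post rebuilds values such as 四年或10万公里.

```python
import re

# <years><span/><mileage number><span/><unit>, e.g. 三<span ...></span>10<span ...></span>公里
YEAR_DISTANCE_RE = re.compile(
    r"(?P<warrantyYear>[^<>]+)<span.+?></span>"
    r"(?P<distanceNumber>[^<>]+)<span.+?></span>"
    r"(?P<distanceUnit>[^<>]+)")
# <years><span/> only, i.e. no mileage limit (seen on spec 46700)
YEAR_ONLY_RE = re.compile(r"(?P<warrantyYear>[^<>]+)<span.+?></span>")


def format_whole_warranty(firstValueHtml):
    """Rebuild the warranty text the same way extractWholeWarranty does; "" if nothing matches."""
    found = YEAR_DISTANCE_RE.search(firstValueHtml)
    if found:
        return "%s年或%s万%s" % (found.group("warrantyYear"),
                               found.group("distanceNumber"),
                               found.group("distanceUnit"))
    found = YEAR_ONLY_RE.search(firstValueHtml)
    if found:
        return "%s年不限公里" % found.group("warrantyYear")
    return ""
```

The same function works on both sources, since the JSON uses single-quoted class attributes and the rendered HTML double-quoted ones, and the regexes do not look at the attribute at all.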
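Postscript:
Please credit when reposting: 在路上 » [Solved] Autohome car model and series data: speeding up car model spec scraping by removing JS execution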