While working on:
[Unsolved] Re-crawl all videos of 少儿xxx
I ran into a problem with the API:

/course/getCourseList
Debugging it with the sign calculation rule I had used before gives an error:
getCourseListCallback respJson={'status': 401, 'msg': '认证错误'}
prevParaDict={'auth_token': 'MTU2OTkxNjU3ObCtxKuCe7rcr92Mcg', 'nature_id': '508', 'rows': 20, 'sign': '07041954a5b44a4251e4b5e84fc4251a', 'sort': 'new', 'start': 0, 'timestamp': 1568791827, 'uid': '37285135'}
[E 190918 xxx base_handler:203] 'data'
Traceback (most recent call last):
  ...
    ret = function(*arguments[:len(args) - 1])
  File "<qpyRecrawlCqpy_mac>", line 488, in getCourseListCallback
KeyError: 'data'

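The server answers with status 401 and msg '认证错误' (authentication error), so apparently the earlier sign calculation rule does not apply here?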
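The KeyError in the traceback only hides the real problem, which is the 401 in the response body. A minimal sketch of a guard for the callback, so the status and msg are reported directly. The names ApiError, STATUS_SUCCESS and extractData are hypothetical, and treating status 1 as success is an assumption based on the STATUS_SUCCESS constant in the decompiled FZResponse class:

```python
class ApiError(Exception):
    """Raised when the API reports a non-success status."""


# Assumption: 1 means success, as FZResponse.STATUS_SUCCESS in the decompiled app suggests
STATUS_SUCCESS = 1


def extractData(respJson):
    """Return respJson['data'], or raise ApiError carrying the server's status and msg."""
    status = respJson.get("status")
    if status != STATUS_SUCCESS or "data" not in respJson:
        raise ApiError("API call failed: status=%s msg=%s" % (status, respJson.get("msg")))
    return respJson["data"]
```

Called at the top of a callback such as getCourseListCallback, this would turn the failure above into an error message that shows the 401 and its msg instead of a bare KeyError.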
My guess: for a GET request, maybe the MD5 is taken over the keys and values of all the parameters?
Let's try it on a captured request:
https://childapi30.xxx.com/course/getCourseList?sign=8bc95b8f099c0b3c200938566d9417ee&timestamp=1568791216&uid=37285135&sort=new&start=0&auth_token=MTU2OTkxNjU3ObCtxKuCe7rcr92Mcg&rows=20&nature_id=508
sign 8bc95b8f099c0b3c200938566d9417ee
timestamp 1568791216
uid 37285135
sort new
start 0
auth_token MTU2OTkxNjU3ObCtxKuCe7rcr92Mcg
rows 20
nature_id 508
The process:
sorted (by key, without sign):
auth_token MTU2OTkxNjU3ObCtxKuCe7rcr92Mcg
nature_id 508
rows 20
sort new
start 0
timestamp 1568791216
uid 37285135
key+value all str:
auth_tokenMTU2OTkxNjU3ObCtxKuCe7rcr92Mcgnature_id508rows20sortnewstart0timestamp1568791216uid37285135
calc sign:
https://md5jiami.51240.com
54cb741421fe2638f7c69027f4bfd6f8
That is not right either:
54cb741421fe2638f7c69027f4bfd6f8
does not match the expected:
8bc95b8f099c0b3c200938566d9417ee
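The manual steps above (sort by key, concatenate key+value, MD5) can also be done locally instead of through an online MD5 page. A minimal sketch using Python's hashlib; the helper names concatSortedParams and md5Hex are mine, not from the app:

```python
import hashlib


def concatSortedParams(paraDict):
    """Concatenate key+value for every parameter, ordered by key; the sign itself is left out."""
    return "".join("%s%s" % (key, paraDict[key]) for key in sorted(paraDict) if key != "sign")


def md5Hex(text):
    """Hex MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Parameters of the captured request
captured = {
    "sign": "8bc95b8f099c0b3c200938566d9417ee",
    "timestamp": 1568791216,
    "uid": "37285135",
    "sort": "new",
    "start": 0,
    "auth_token": "MTU2OTkxNjU3ObCtxKuCe7rcr92Mcg",
    "rows": 20,
    "nature_id": "508",
}

toHash = concatSortedParams(captured)
print("toHash=%s" % toHash)
print("calculated=%s expected=%s" % (md5Hex(toHash), captured["sign"]))
```

This only automates the attempt; it hashes the same string as the manual test, so the calculated value still differs from the expected sign.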
Maybe only some core parameters go into the MD5?
But which ones exactly? No idea.
I can't be bothered to try every combination by hand.
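Trying the combinations does not have to be done by hand. A rough sketch that checks every subset of the visible parameters against the expected sign; findSignedSubset is a hypothetical helper, and it assumes the key-sorted key+value concatenation from the manual test. A search like this can only succeed if the sign depends on the visible parameters alone; if some hidden value is mixed in, it returns None:

```python
import hashlib
from itertools import combinations


def md5Hex(text):
    """Hex MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def findSignedSubset(paraDict, expectedSign):
    """Return the tuple of keys whose sorted key+value concatenation hashes to expectedSign, else None."""
    keys = sorted(key for key in paraDict if key != "sign")
    for size in range(1, len(keys) + 1):
        # combinations() keeps the input order, so every subset stays sorted by key
        for subset in combinations(keys, size):
            toHash = "".join("%s%s" % (key, paraDict[key]) for key in subset)
            if md5Hex(toHash) == expectedSign:
                return subset
    return None
```

With 7 parameters that is only 127 subsets, so the search finishes instantly.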
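Better to go back to the original code and see whether I can find the parameter calculation logic for requests to the
/course/getCourseList
API.
This really should not be happening:
in theory, the parameter calculation rules for the HTTP GET APIs should all be the same.
From what I remember of the code, everything goes through an Interceptor, so the logic should be unified,
and no single API (or group of APIs) should compute its sign value separately.
Search for:
course/getCourseList
Found several: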

Next, find out exactly how the sign value is calculated in there.
I also came across the API
basic/newCates
and:
square/courseNature
The API turns up in 3 places,
so I can't tell which one is the one we actually want.
They all use the same shared imports, though:
import com.fz.lib.net.bean.FZResponse;
import io.reactivex.Observable;
import java.util.List;
import java.util.Map;
import retrofit2.http.Body;
import retrofit2.http.GET;
import retrofit2.http.POST;
import retrofit2.http.Query;
import retrofit2.http.QueryMap;
Keep looking.
Read the code carefully:
sources/com/fz/childmodule/match/net/INetApi.java
@GET("course/getCourseList")
Observable<FZResponse<List<FZHomeWrapperCourse>>> j(@QueryMap Map<String, String> map);
sources/com/fz/childmodule/mclass/net/ClassApi.java
@GET("course/getCourseList")
Observable<FZResponse<List<FZTaskChooseCourse>>> l(@QueryMap Map<String, String> map);
sources/com/fz/childmodule/square/net/SquareApi.java
@GET("course/getCourseList")
Observable<FZResponse<List<Course>>> c(@Query("nature_id") String str, @Query("start") String str2, @Query("rows") String str3, @QueryMap Map<String, String> map);
The third one looks like the closest match,
because it has 3 parameters: nature_id, start, rows,
all of which I have to pass in when debugging here.
But compared with my own code:
commonParaDict = generateCommonPara()
paraDict = commonParaDict
paraDict["start"] = 0
paraDict["rows"] = 20
paraDict["nature_id"] = curNatureId
paraDict["sort"] = "new"
print("paraDict=%s" % paraDict)
self.crawl(ApiGetCourseList,
    callback=self.getCourseListCallback,
    params=commonParaDict,
    save=paraDict,
    validate_cert=False
)
the Java signature has no separate sort parameter.
So perhaps
sort is not included when the sign value is calculated?
Let's try.
Try: without the sort parameter
sorted:
auth_token MTU2OTkxNjU3ObCtxKuCe7rcr92Mcg
nature_id 508
rows 20
start 0
timestamp 1568791216
uid 37285135
key+value all str:
auth_tokenMTU2OTkxNjU3ObCtxKuCe7rcr92Mcgnature_id508rows20start0timestamp1568791216uid37285135
calc sign:
https://md5jiami.51240.com
43218b8639aa618223633c9d367417e5
Still wrong.
I need to find out from the original code exactly which
Observable<FZResponse
method is called; only then can I work out which parameters are actually used.
Look at the shared:
import com.fz.lib.net.bean.FZResponse;
sources/com/fz/lib/net/bean/FZResponse.java
public class FZResponse<T> implements Serializable {
    public static final int STATUS_403 = 403;
    public static final int STATUS_FAIL = 0;
    public static final int STATUS_NO_CLASS = 2;
    public static final int STATUS_OFFLINE = 404;
    public static final int STATUS_SUCCESS = 1;
    public T data;
    public List<MiniData> mini_toast;
    public String msg;
    public int status;
}
It is only a definition.
So look at SquareApi or Observable next?
Search for:
getCourseList
This finds more results.

Look at the code again.
After comparing them, it seems that:

sources/com/fz/childmodule/square/ui/search/result/video/SearchResultVideoFragment.java
is the most relevant one.
It contains:
Newest, Hottest, Filter (最新, 最热, 筛选),
which matches the page in the app:

So focus on that file.
Then, going by mSearchFilter in the code:

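It looks like the sort parameter was added later?
Or at least it is a default parameter.
So try without the sort parameter.
But that is exactly what I tried above, and it did not work.
Search for:
getCourseListData
Still these 3 files.
Reading the code more closely, these 3 files turn out to be in the same submodule.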

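So it makes no difference which one I look at.
Just pick any one of them and study the underlying call logic, to see exactly which parameters are passed into the sign calculation.
The list turns out to come from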
public void onSuccess(FZResponse<VideoSearch> fZResponse) {
    FZResponse<VideoSearch> fZResponse2 = fZResponse;
    T t = fZResponse2.data;
    List<Course> list2 = ((VideoSearch) t).course_list;
Look for where onSuccess is hooked up:
compositeDisposable.b(FZNetBaseSubscription.a(squareModel.b(str, sb3, sb4.toString(), this.mSearchFilterTag), new FZNetBaseSubscriber<FZResponse<VideoSearch>>() {
FZNetBaseSubscriber
VideoSearch
import com.fz.lib.net.base.FZNetBaseSubscriber;
import com.fz.lib.net.base.FZNetBaseSubscription;
import com.fz.lib.net.bean.FZResponse;
None of these contains the actual network call,
and in the end they all seem to point back to what I saw earlier:

import com.fz.lib.net.FZNetManager;
What actually matters is:
FZNetApiManager
Searching for:
FZNetApiManager
also leads to:
sources/com/fz/childdubbing/provider/AppNetProvider.java
import com.fz.lib.net.FZNetApiManager.Builder;
import com.fz.lib.net.base.FZINetConfig;

public <T> T createApi(Class<T> cls) {
    return new Builder(createNetConfig(cls, "https://childapi30.xxx.com")).a().b();
}

public OkHttpClient getOkHttpClient(boolean z) {
    return new Builder(createNetConfig(MainApi.class, "https://childapi30.xxx.com")).a().a(z).a();
}

public <T> T createApi(Class<T> cls, String str) {
    return new Builder(createNetConfig(cls, str)).a().b();
}

public <T> T createApi(Class<T> cls, FZINetConfig<T> fZINetConfig) {
    return new Builder(fZINetConfig).a().b();
}
So in the end it is still FZNetApiManager that sends the network requests,
which means I have to go back and analyze
sources/com/fz/lib/net/FZNetApiManager.java
In particular, this method:
private void a(Map<String, String> map) {
    StringBuilder sb = new StringBuilder();
    if (map != null) {
        long currentTimeMillis = (System.currentTimeMillis() / 1000) + this.a.a.a();
        StringBuilder sb2 = new StringBuilder();
        sb2.append(currentTimeMillis);
        sb2.append("");
        map.put("timestamp", sb2.toString());
        HashMap hashMap = new HashMap(map);
        hashMap.put("security_key", this.a.a.b());
        ArrayList arrayList = new ArrayList(hashMap.entrySet());
        Collections.sort(arrayList, new Comparator<Entry<String, String>>() {
            /* renamed from: a */
            public int compare(Entry<String, String> entry, Entry<String, String> entry2) {
                return ((String) entry.getKey()).compareTo((String) entry2.getKey());
            }
        });
        Iterator it = arrayList.iterator();
        while (it.hasNext()) {
            Entry entry = (Entry) it.next();
            sb.append((String) entry.getKey());
            sb.append((String) entry.getValue());
        }
        map.put("sign", FZNetUtils.a(sb.toString()));
    }
}
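Now I remember.
What is needed is:
take the hashMap that is passed in (it already holds some parameters),
additionally add:
timestamp
then add:
security_key
and only then calculate the sign.
My manual tests so far were all missing security_key,
so redo the manual test with security_key added and see whether it yields the correct sign value.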
Try: with security_key added
security_key qpy68c681cbdcd102363
sorted:
auth_token MTU2OTkxNjU3ObCtxKuCe7rcr92Mcg
nature_id 508
rows 20
security_key qpy68c681cbdcd102363
sort new
start 0
timestamp 1568791216
uid 37285135
key+value all str:
auth_tokenMTU2OTkxNjU3ObCtxKuCe7rcr92Mcgnature_id508rows20security_keyqpy68c681cbdcd102363sortnewstart0timestamp1568791216uid37285135
calc sign:
https://md5jiami.51240.com
8bc95b8f099c0b3c200938566d9417ee
That's it: this is the expected value, identical to the sign in the captured request.
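The same check can be scripted, so that any captured URL can be verified against the rule in one call. A minimal sketch, assuming the rule found above (add security_key, sort by key, concatenate key+value, MD5); calcSign and verifyCapturedUrl are hypothetical names:

```python
import hashlib
from urllib.parse import parse_qsl, urlsplit


def calcSign(paraDict, securityKey):
    """MD5 over key+value of all parameters plus security_key, ordered by key; sign is left out."""
    toSign = {key: value for key, value in paraDict.items() if key != "sign"}
    toSign["security_key"] = securityKey
    toHash = "".join("%s%s" % (key, toSign[key]) for key in sorted(toSign))
    return hashlib.md5(toHash.encode("utf-8")).hexdigest()


def verifyCapturedUrl(url, securityKey):
    """True if the sign in the URL's query string matches the recalculated one."""
    params = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    return params.get("sign") == calcSign(params, securityKey)
```

If the manual result above is right, calling verifyCapturedUrl with the captured getCourseList URL and the security_key from this post should return True, while a URL signed without the key should return False.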
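[Summary]
For the API endpoint
/course/getCourseList
the sign value is calculated the same way as before:
take all the existing request parameters,
first add timestamp, the current timestamp in seconds (10 digits),
then add security_key, which is fixed for the whole app: qpy68c681cbdcd102363 (it is only used for the calculation and is not sent with the request),
and then calculate the sign value: the MD5 of all key+value pairs concatenated in key order.
The updated code: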
import copy

gCommonParaDict = {
    # "auth_token": "MTU2OTgyOTUyNrCtxKuCe7rcr92Mcg",
    # "uid": "37285135",
    "auth_token": AUTH_TOKEN,
    "uid": USER_ID,
}

def calcParaSign(originParaDict):
    """
    Calc sign=md5 for input origin para dict
    Note: input dict should already have timestamp added
    """
    print("calcParaSign: originParaDict=%s" % originParaDict)
    toCalcMd5ParaDict = copy.deepcopy(originParaDict)
    # add security_key
    toCalcMd5ParaDict["security_key"] = SECURITY_KEY
    print("toCalcMd5ParaDict=%s" % toCalcMd5ParaDict)
    sortedParaDict = sortDictByKey(toCalcMd5ParaDict)
    print("sortedParaDict=%s" % sortedParaDict)
    sortedParaStr = ""
    for eachKey, eachValue in sortedParaDict.items():
        keyValueStr = "%s%s" % (eachKey, eachValue)
        sortedParaStr += keyValueStr
    signMd5Str = generateMd5(sortedParaStr)
    print("signMd5Str=%s" % signMd5Str)
    return signMd5Str

def addTimestampAndSign(originParaDict):
    """Add timestamp and sign for para/headers"""
    print("addTimestampAndSign: originParaDict=%s" % originParaDict)
    curTimestamp = getCurTimestamp() # 1568769723
    print("curTimestamp=%s" % curTimestamp)
    # add timestamp
    originParaDict["timestamp"] = curTimestamp
    originParaDict["sign"] = calcParaSign(originParaDict)
    return originParaDict
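The code above relies on three helpers that are not shown in this post: sortDictByKey, generateMd5 and getCurTimestamp. One possible way to write them, as a sketch only; these bodies are my assumption of what such helpers do, not the original implementations:

```python
import hashlib
import time
from collections import OrderedDict


def sortDictByKey(originDict):
    """Return a new ordered dict with the items sorted by key."""
    return OrderedDict(sorted(originDict.items(), key=lambda keyValue: keyValue[0]))


def generateMd5(inputStr):
    """Hex MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(inputStr.encode("utf-8")).hexdigest()


def getCurTimestamp():
    """Current Unix timestamp in seconds, as a 10-digit int."""
    return int(time.time())
```

Sorting plain Python strings by key gives the same order as the String.compareTo comparison in the Java code, at least for ASCII parameter names like the ones used here.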
The call sites are:
paraDict = addTimestampAndSign(gCommonParaDict)
print("paraDict=%s" % paraDict)
# https://childapi30.xxx.com/square/courseNature?sign=d62322c45840d874f40d43e1e292fa8c&timestamp=1568535821&uid=37285135&auth_token=MTU2OTgyOTUyNrCtxKuCe7rcr92Mcg
self.crawl(ApiCourseNature,
    callback=self.courseNatureCallback,
    params=paraDict,
    validate_cert=False
)
and:
# https://childapi30.xxx.com/course/getCourseList?sign=528dfbbe255c85fe7c6f9034d542bbf1&timestamp=1568791219&uid=37285135&sort=new&start=0&auth_token=MTU2OTkxNjU3ObCtxKuCe7rcr92Mcg&rows=20&nature_id=509
curNatureId = eachLevel2Nature["nature_id"]
paraDict = copy.deepcopy(gCommonParaDict)
paraDict["start"] = 0
paraDict["rows"] = 20
paraDict["nature_id"] = curNatureId
paraDict["sort"] = "new"
paraDictWithSign = addTimestampAndSign(paraDict)
print("paraDictWithSign=%s" % paraDictWithSign)
self.crawl(ApiGetCourseList,
    callback=self.getCourseListCallback,
    params=paraDictWithSign,
    save=paraDictWithSign,
    validate_cert=False
)
With this, the sign value is generated correctly:
{
    "auth_token": "MTU2OTkxNjU3ObCtxKuCe7rcr92Mcg",
    "nature_id": "508",
    "rows": 20,
    "sign": "a4e7bc6a860e6e5c53c35c362d8816fc",
    "sort": "new",
    "start": 0,
    "timestamp": 1568797753,
    "uid": "37285135"
},
and data comes back from the API:

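Postscript:
Later debugging turned up a bug: requests for the other pages still failed with an error, apparently because the parameter dict being reused already contained the timestamp and sign of the previous request, so the old sign went into the new calculation.
So I changed the function to: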
def addTimestampAndSign(originParaDict):
    """
    Add timestamp and sign for para/headers
    Note: make sure there is NO timestamp and sign left before calculating
    """
    print("addTimestampAndSign: originParaDict=%s" % originParaDict)
    # remove timestamp and sign if exist
    if "timestamp" in originParaDict:
        del originParaDict["timestamp"]
    if "sign" in originParaDict:
        del originParaDict["sign"]
    curTimestamp = getCurTimestamp() # 1568769723
    print("curTimestamp=%s" % curTimestamp)
    # add timestamp
    originParaDict["timestamp"] = curTimestamp
    originParaDict["sign"] = calcParaSign(originParaDict)
    return originParaDict
That fixed it.
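An alternative that avoids this kind of bug altogether is to never modify the dict that is passed in, and to build a fresh signed copy for every request. A sketch of that idea, not the code I actually run; buildSignedParams is a hypothetical name and the SECURITY_KEY value is a placeholder:

```python
import copy
import hashlib
import time

# Placeholder value; the real key is the app-wide security_key
SECURITY_KEY = "placeholder-security-key"


def buildSignedParams(baseParaDict, securityKey=SECURITY_KEY, timestamp=None):
    """Return a NEW dict with fresh timestamp and sign; baseParaDict is left untouched."""
    # Copy everything except a stale timestamp/sign left over from an earlier request
    signedParaDict = {
        key: copy.deepcopy(value)
        for key, value in baseParaDict.items()
        if key not in ("timestamp", "sign")
    }
    signedParaDict["timestamp"] = int(time.time()) if timestamp is None else timestamp
    # security_key only takes part in the hash, it is not returned as a request parameter
    toHashDict = dict(signedParaDict, security_key=securityKey)
    toHashStr = "".join("%s%s" % (key, toHashDict[key]) for key in sorted(toHashDict))
    signedParaDict["sign"] = hashlib.md5(toHashStr.encode("utf-8")).hexdigest()
    return signedParaDict
```

For the next page, something like buildSignedParams(dict(prevParaDict, start=20)) would then work whether or not prevParaDict still carries the old timestamp and sign.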