
[Solved] Using Python to check whether a URL is a valid Android APK download address

Category: URL · Author: crifan
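While working on:
[Unsolved] Writing the Python script generateTaskListFromChanDaShiSearchResult to turn the ChanDaShi (蝉大师) search results for Xianxia and Legend games into a task list
I needed to:
check whether a URL is a valid Android APK download address
If it is, preferably return the APK file size, in bytes
If it is not, return the reason for the failure (a description string)
But I found that entries such as: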
            # 'sh.lilith.dgame.mi/小冰冰传奇|http://app.mi.com/details?id=sh.lilith.dgame.mi'
            # 'com.thelabel.bumbo.bnn/损友传奇|http://app.mi.com/details?id=com.thelabel.bumbo.bnn'
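contain a URL like:
http://app.mi.com/details?id=sh.lilith.dgame.mi
which is the app's details page,
not the download address of the app's APK
-> I would have to write my own parsing code to get the APK download address from it
Along the way I also hit a special case: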
https://appdlc-drcn.hispace.hicloud.com/dl/appdl/application/apk/db/dbd3fbf4bb7c4e199e27169b83054afd/com.zsbf.rxsc.2010151906.rpk?sign=f9001091ej1001042000000000000100000000000500100101010@21BD93C47A224B178DE4FCDEAC296E3F&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100450321&hcrId=21BD93C47A224B178DE4FCDEAC296E3F&maple=0&distOpEntity=HWSW
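Perhaps a typo, with apk written as rpk?
Trying to download it also showed a problem:
the file is only 145KB, rather than many MB
Manually changing the extension to apk: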
https://appdlc-drcn.hispace.hicloud.com/dl/appdl/application/apk/db/dbd3fbf4bb7c4e199e27169b83054afd/com.zsbf.rxsc.2010151906.apk?sign=f9001091ej1001042000000000000100000000000500100101010@21BD93C47A224B178DE4FCDEAC296E3F&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100450321&hcrId=21BD93C47A224B178DE4FCDEAC296E3F&maple=0&distOpEntity=HWSW
Result:
403 Forbidden

openresty
-> which shows that the original address does exist.
It just really is not a valid APK address.
Later I also found many similar rpk files:
https://appdl-1-drcn.dbankcdn.com/dl/appdl/application/apk/4c/4c1fcf071c2a4d878310bde3382757d4/com.szyh.rzcq.module.2011101036.rpk?sign=f9001091ej1001022000000000000100000000000500100101010@ADB12C9483C3431B8B21305B582BD6BE&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C103204269&hcrId=ADB12C9483C3431B8B21305B582BD6BE&maple=0&distOpEntity=HWSW
->
com.szyh.rzcq.module.2011101036.rpk
176KB
None of them are valid APKs.
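The rpk-versus-apk difference above sits in the last path segment of the URL, in front of a long query string. A minimal sketch of pulling out just that extension with the standard library follows. The helper name getUrlFileExtension is hypothetical (it is not part of crifanLib), and the query string in the sample URL is shortened for readability:

```python
import posixpath
from urllib.parse import urlparse


def getUrlFileExtension(curUrl):
    """Return the lowercase extension of the last URL path segment, without the dot ('' if none)."""
    urlPath = urlparse(curUrl).path  # the ?query and #fragment parts are not included
    fileName = posixpath.basename(urlPath)
    _, dotSuffix = posixpath.splitext(fileName)
    return dotSuffix[1:].lower()


# Sample URL from above, with a shortened (made-up) query string
rpkUrl = "https://appdl-1-drcn.dbankcdn.com/dl/appdl/application/apk/4c/4c1fcf071c2a4d878310bde3382757d4/com.szyh.rzcq.module.2011101036.rpk?sign=abc&maple=0"
print(getUrlFileExtension(rpkUrl))
```

Because only the last path segment is inspected, the /apk/ directory in the middle of the Huawei URLs does not count as an apk extension.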
At this point the code was mostly written:
def isAndroidApkUrl(curApkUrl, proxies=None):
    """Check whether is android apk url


    Args:
        curApkUrl (str): current apk url
        proxies (dict): requests proxies
    Returns:
        (bool, int/str)
            True, apk file size
            False, error message
    Raises:
    Examples:
        input: https://gameapktxdl.vivo.com.cn/appstore/developer/soft/20201020/202010201805243ed5v.apk
        output: True, 154551625


        input: 'https://appdlc-drcn.hispace.hicloud.com/dl/appdl/application/apk/47/4795a70deeac4103a8e6182b257ec4a9/com.shenghe.wzcq.huawei.2012221953.apk?sign=f9001091ej1001032000000000000100000000000500100101010@CC0A6D3E117D430483B55B08162FB0F4&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100005003&hcrId=CC0A6D3E117D430483B55B08162FB0F4&maple=0&distOpEntity=HWSW'
        output: True, 249455788
    """
    isAllValid = False
    errMsg = "Unknown"
    apkFileSize = 0


    isValidApkUrl = False
    respHeaderDict = getRespHeadersFromUrl(curApkUrl, proxies=proxies)


    contentTypeStr = getContentTypeFromHeaders(respHeaderDict)
    if contentTypeStr:
        # contentTypeStr = contentTypeStr.lower()


        # ContentType_Android = 'application/vnd.android.package-archive'
        # isAndroidType = contentTypeStr == ContentType_Android
        # isValidApkUrl = "android" in contentTypeStr
        foundApplicationAndroid = re.search("application/.*android", contentTypeStr, re.I)
        isAndroidType = bool(foundApplicationAndroid)


        if isAndroidType:
            isValidApkUrl = True
            errMsg = ""
        else:
            errMsg = "Content type %s is NOT android for url %s" % (contentTypeStr, curApkUrl)
            # 'Content type text/html; charset=UTF-8 is NOT android for url http://app.mi.com/details?id=com.cqzzdlq.mi'


        # continue to check other possibility
        if not isValidApkUrl:
            # 'https://appdlc-drcn.hispace.hicloud.com/dl/appdl/application/apk/47/4795a70deeac4103a8e6182b257ec4a9/com.shenghe.wzcq.huawei.2012221953.apk?sign=f9001091ej1001032000000000000100000000000500100101010@CC0A6D3E117D430483B55B08162FB0F4&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100005003&hcrId=CC0A6D3E117D430483B55B08162FB0F4&maple=0&distOpEntity=HWSW'
            # "Content-Type": "application/octet-stream",
            isOctetStreamType = "octet-stream" in contentTypeStr # True
            if isOctetStreamType:
                foundApkInUrl = re.search(r"[^/]+\.apk", curApkUrl, re.I) # <re.Match object; span=(101, 142), match='com.tanwan.yscqlyzf.huawei.2012141704.apk'>
                isApkInUrl = bool(foundApkInUrl) # True
                if isApkInUrl:
                    isValidApkUrl = True
                    errMsg = ""
                else:
                    isValidApkUrl = False
                    errMsg = "Content Type is octet-stream but no .apk in url %s" % curApkUrl
                    # 'Content Type is octet-stream but no .apk in url https://appdlc-drcn.hispace.hicloud.com/dl/appdl/application/apk/db/dbd3fbf4bb7c4e199e27169b83054afd/com.zsbf.rxsc.2010151906.rpk?sign=f9001091ej1001042000000000000100000000000500100101010@21BD93C47A224B178DE4FCDEAC296E3F&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100450321&hcrId=21BD93C47A224B178DE4FCDEAC296E3F&maple=0&distOpEntity=HWSW'


        if not isValidApkUrl:
            # 'http://appstore.vivo.com.cn/appinfo/downloadApkFile?id=1676650&app_version=100.0'
            # redirect to -> http://apkgamedefbddl.vivo.com.cn/appstore/developer/soft/201612/201612061417371446252.apk
            pass # TODO: follow the redirect to the real apk url (handled in the final version)

    else:
        isValidApkUrl = False
        errMsg = "Failed to get content type for url %s" % curApkUrl
        # 


    if isValidApkUrl:
        gotApkFileSize = getFileSizeFromHeaders(respHeaderDict) # 190814345
        if gotApkFileSize:
            apkFileSize = gotApkFileSize
            isAllValid = True
        else:
            isAllValid = False
            errMsg = "Failed to get android apk file size from url %s" % curApkUrl
            # 
    else:
        isAllValid = False


    if isAllValid:
        return isAllValid, apkFileSize
    else:
        return isAllValid, errMsg
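Then I hit a new case:
http://appstore.vivo.com.cn/appinfo/downloadApkFile?id=1676650&app_version=100.0
Opening it automatically starts downloading the APK file:
http://apkgamedefbddl.vivo.com.cn/appstore/developer/soft/201612/201612061417371446252.apk
That is, it internally redirects to a different URL, the real APK download address
So I added: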
            # for debug
            if "appstore.vivo.com.cn" in curApkUrl:
                logging.info("respHeaderDict=%s", respHeaderDict)
to debug it
The headers here are:
{'Date': 'Fri, 25 Dec 2020 02:20:36 GMT', 'Content-Type': 'application/octet-stream', 'Content-Length': '164494719', 'Connection': 'keep-alive', 'Expires': 'Wed, 23 Jun 2021 02:20:36 GMT', 'Server': 'AliyunOSS', 'x-oss-request-id': '5F53193D3772E534345D494D', 'Accept-Ranges': 'bytes', 'ETag': '"9753D036FB074965531166CAA593935C"', 'Last-Modified': 'Thu, 18 May 2017 18:25:11 GMT', 'x-oss-object-type': 'Normal', 'x-oss-hash-crc64ecma': '1997588284273747113', 'x-oss-storage-class': 'Standard', 'Cache-Control': 'max-age=15552000', 'Content-MD5': 'l1PQNvsHSWVTEWbKpZOTXA==', 'x-oss-server-time': '49', 'X-Via': '1.1 PSjshasx4gz53:0 (Cdn Cache Server V2.0)[77 200 0], 1.1 PS-XUZ-01oMT26:5 (Cdn Cache Server V2.0)[109 200 2], 1.1 angtong75:13 (Cdn Cache Server V2.0)[293 200 2]', 'X-Ws-Request-Id': '5fe54c74_wt63_16725-32847', 'Access-Control-Allow-Origin': '*'}
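It looks like the headers do not contain the Location I was hoping for, i.e. the real URL (requests follows redirects by default, so these are the headers of the final response)
So it seems I need to:
implement a function that returns the real URL after the redirect
It turned out I had already implemented one: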
def get302RealUrl(originUrl):
    """get real url address after 302 move


    Args:
        originUrl (str): original url
    Returns:
        real url(str)
    Raises:
    Examples:
        input: 'http://dl.gamecenter.vivo.com.cn/clientRequest/gameDownload?id=57587&pkgName=com.jiuzun.mxsg.vivo&sourword=%E4%B8%89%E5%9B%BD&page_index=4&dlpos=1&channel=h5'
        output: 'https://gameapktxdl.vivo.com.cn/appstore/developer/soft/20180206/201802061851104837232.apk'
    """
    realUrl = ""
    resp = requests.get(originUrl, allow_redirects=False)
    if resp.status_code == 302:
        realUrl = resp.headers['Location']
    return realUrl
which gets the real URL after a 302 redirect
Now to debug it
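Two details of redirects are worth keeping in mind here: servers may answer with 301, 303, 307 or 308 instead of 302, and a Location header is allowed to be a relative path. The sketch below shows one way to handle both with the standard library only. The helper name resolveRedirectUrl and the relative path /soft/demo.apk are made up for illustration; the status code and Location value would come from a response fetched with allow_redirects=False, as in get302RealUrl:

```python
from urllib.parse import urljoin

# Redirect status codes that carry a Location header (302 is the only one checked above)
REDIRECT_STATUS_CODES = {301, 302, 303, 307, 308}


def resolveRedirectUrl(originUrl, statusCode, locationValue):
    """Return the absolute redirect target, or '' if the response is not a redirect."""
    if statusCode not in REDIRECT_STATUS_CODES or not locationValue:
        return ""
    # urljoin leaves an absolute Location unchanged and resolves a relative one against originUrl
    return urljoin(originUrl, locationValue)


vivoUrl = "http://appstore.vivo.com.cn/appinfo/downloadApkFile?id=1676650&app_version=100.0"
print(resolveRedirectUrl(vivoUrl, 302, "http://apkgamedefbddl.vivo.com.cn/appstore/developer/soft/201612/201612061417371446252.apk"))
print(resolveRedirectUrl(vivoUrl, 301, "/soft/demo.apk"))  # hypothetical relative Location
```

Returning '' for non-redirect responses mirrors the behaviour of get302RealUrl, so callers still have to test the result before using it.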
The final result:
[Summary]
Code:
import re
import requests
def isAndroidApkUrl(curApkUrl, proxies=None):
    """Check whether is android apk url


    Args:
        curApkUrl (str): current apk url
        proxies (dict): requests proxies
    Returns:
        (bool, int/str)
            True, apk file size
            False, error message
    Raises:
    Examples:
        input: https://gameapktxdl.vivo.com.cn/appstore/developer/soft/20201020/202010201805243ed5v.apk
        output: True, 154551625


        input: 'https://appdlc-drcn.hispace.hicloud.com/dl/appdl/application/apk/47/4795a70deeac4103a8e6182b257ec4a9/com.shenghe.wzcq.huawei.2012221953.apk?sign=f9001091ej1001032000000000000100000000000500100101010@CC0A6D3E117D430483B55B08162FB0F4&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100005003&hcrId=CC0A6D3E117D430483B55B08162FB0F4&maple=0&distOpEntity=HWSW'
        output: True, 249455788
    """
    isAllValid = False
    errMsg = "Unknown"
    apkFileSize = 0


    isValidApkUrl = False
    respHeaderDict = getRespHeadersFromUrl(curApkUrl, proxies=proxies)


    contentTypeStr = getContentTypeFromHeaders(respHeaderDict)
    if contentTypeStr:
        # contentTypeStr = contentTypeStr.lower()


        # ContentType_Android = 'application/vnd.android.package-archive'
        # isAndroidType = contentTypeStr == ContentType_Android
        # isValidApkUrl = "android" in contentTypeStr
        foundApplicationAndroid = re.search("application/.*android", contentTypeStr, re.I)
        isAndroidType = bool(foundApplicationAndroid)


        if isAndroidType:
            isValidApkUrl = True
            errMsg = ""
        else:
            errMsg = "Content type %s is NOT android for url %s" % (contentTypeStr, curApkUrl)
            # 'Content type text/html; charset=UTF-8 is NOT android for url http://app.mi.com/details?id=com.cqzzdlq.mi'


        # continue to check other possibility
        if not isValidApkUrl:
            # 'https://appdlc-drcn.hispace.hicloud.com/dl/appdl/application/apk/47/4795a70deeac4103a8e6182b257ec4a9/com.shenghe.wzcq.huawei.2012221953.apk?sign=f9001091ej1001032000000000000100000000000500100101010@CC0A6D3E117D430483B55B08162FB0F4&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100005003&hcrId=CC0A6D3E117D430483B55B08162FB0F4&maple=0&distOpEntity=HWSW'
            # "Content-Type": "application/octet-stream",
            isOctetStreamType = "octet-stream" in contentTypeStr # True
            if isOctetStreamType:
                foundApkInUrl = re.search(r"[^/]+\.apk", curApkUrl, re.I) # <re.Match object; span=(101, 142), match='com.tanwan.yscqlyzf.huawei.2012141704.apk'>
                isApkInUrl = bool(foundApkInUrl) # True
                if isApkInUrl:
                    isValidApkUrl = True
                    errMsg = ""
                else:
                    isValidApkUrl = False
                    errMsg = "Content Type is octet-stream but no .apk in url %s" % curApkUrl
                    # 'Content Type is octet-stream but no .apk in url https://appdlc-drcn.hispace.hicloud.com/dl/appdl/application/apk/db/dbd3fbf4bb7c4e199e27169b83054afd/com.zsbf.rxsc.2010151906.rpk?sign=f9001091ej1001042000000000000100000000000500100101010@21BD93C47A224B178DE4FCDEAC296E3F&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100450321&hcrId=21BD93C47A224B178DE4FCDEAC296E3F&maple=0&distOpEntity=HWSW'


                    # continue check for get redirected 302 real url
                    redirectedRealUrl = get302RealUrl(curApkUrl)
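                    # get302RealUrl returns '' when there is no 302 redirect, so skip that case
                    if redirectedRealUrl and (redirectedRealUrl != curApkUrl):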
                        # Special:
                        # 'https://appdlc-drcn.hispace.hicloud.com/dl/appdl/application/apk/db/dbd3fbf4bb7c4e199e27169b83054afd/com.zsbf.rxsc.2010151906.rpk?sign=f9001091ej1001042000000000000100000000000500100101010@21BD93C47A224B178DE4FCDEAC296E3F&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100450321&hcrId=21BD93C47A224B178DE4FCDEAC296E3F&maple=0&distOpEntity=HWSW'
                        # ->
                        # 'https://appdl-1-drcn.dbankcdn.com/dl/appdl/application/apk/db/dbd3fbf4bb7c4e199e27169b83054afd/com.zsbf.rxsc.2010151906.rpk?sign=f9001091ej1001042000000000000100000000000500100101010@21BD93C47A224B178DE4FCDEAC296E3F&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100450321&hcrId=21BD93C47A224B178DE4FCDEAC296E3F&maple=0&distOpEntity=HWSW'
                        # but still invalid
                        curApkUrl = redirectedRealUrl
                        # Normal Expected:
                        # (1) http://appstore.vivo.com.cn/appinfo/downloadApkFile?id=1676650&app_version=100.0
                        #     ->
                        #     http://apkgamedefbddl.vivo.com.cn/appstore/developer/soft/201612/201612061417371446252.apk
                        # (2) 'https://app.mi.com/download/610735?id=com.mobileuncle.toolhero&ref=appstore.mobile_download&nonce=-2797954111430111294%3A26814339&appClientId=288230376xxx45&appSignature=oxBvxJhrGBuUBck5cgFqasC7gI5rLez99KZ24VMiRpA'
                        #     ->
                        #     'https://fga1.market.xiaomi.com/download/AppStore/03367f59ffcbc4719185da0d550a3b407f50cfb62/com.mobileuncle.toolhero.apk'
                        foundApkInUrl = re.search(r"[^/]+\.apk", curApkUrl, re.I)
                        isApkInUrl = bool(foundApkInUrl) # True
                        if isApkInUrl:
                            isValidApkUrl = True
                            errMsg = ""
                        else:
                            isValidApkUrl = False
                            errMsg = "Content Type is octet-stream but no .apk in redirected url %s" % curApkUrl
                            # 'Content Type is octet-stream but no .apk in redirected url https://appdl-1-drcn.dbankcdn.com/dl/appdl/application/apk/db/dbd3fbf4bb7c4e199e27169b83054afd/com.zsbf.rxsc.2010151906.rpk?sign=f9001091ej1001042000000000000100000000000500100101010@21BD93C47A224B178DE4FCDEAC296E3F&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100450321&hcrId=21BD93C47A224B178DE4FCDEAC296E3F&maple=0&distOpEntity=HWSW'
                            # 
    else:
        isValidApkUrl = False
        errMsg = "Failed to get content type for url %s" % curApkUrl
        # 


    if isValidApkUrl:
        gotApkFileSize = getFileSizeFromHeaders(respHeaderDict) # 190814345
        if gotApkFileSize:
            apkFileSize = gotApkFileSize
            isAllValid = True
        else:
            isAllValid = False
            errMsg = "Failed to get android apk file size from url %s" % curApkUrl
            # 
    else:
        isAllValid = False


    if isAllValid:
        return isAllValid, apkFileSize
    else:
        return isAllValid, errMsg
Related functions:

def get302RealUrl(originUrl):
    """get real url address after 302 move


    Args:
        originUrl (str): original url
    Returns:
        real url(str)
    Raises:
    Examples:
        input: 'http://dl.gamecenter.vivo.com.cn/clientRequest/gameDownload?id=57587&pkgName=com.jiuzun.mxsg.vivo&sourword=%E4%B8%89%E5%9B%BD&page_index=4&dlpos=1&channel=h5'
        output: 'https://gameapktxdl.vivo.com.cn/appstore/developer/soft/20180206/201802061851104837232.apk'


        input: 'https://appdlc-drcn.hispace.hicloud.com/dl/appdl/application/apk/db/dbd3fbf4bb7c4e199e27169b83054afd/com.zsbf.rxsc.2010151906.rpk?sign=f9001091ej1001042000000000000100000000000500100101010@21BD93C47A224B178DE4FCDEAC296E3F&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100450321&hcrId=21BD93C47A224B178DE4FCDEAC296E3F&maple=0&distOpEntity=HWSW'
        output: 'https://appdl-1-drcn.dbankcdn.com/dl/appdl/application/apk/db/dbd3fbf4bb7c4e199e27169b83054afd/com.zsbf.rxsc.2010151906.rpk?sign=f9001091ej1001042000000000000100000000000500100101010@21BD93C47A224B178DE4FCDEAC296E3F&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100450321&hcrId=21BD93C47A224B178DE4FCDEAC296E3F&maple=0&distOpEntity=HWSW'
    """
    realUrl = ""
    resp = requests.get(originUrl, allow_redirects=False)


    if resp.status_code == 302:
        realUrl = resp.headers['Location']


        # for debug
        if resp.history:
            print("resp.history=%s" % resp.history)


    return realUrl


def getRespHeadersFromUrl(curUrl, proxies=None):
    """Get response headers from url


    Args:
        curUrl (str): current url
        proxies (dict): requests proxies
    Returns:
        headers(dict) or None
    Raises:
    Examples:
        1
            input: https://gameapktxdl.vivo.com.cn/appstore/developer/soft/20201020/202010201805243ed5v.apk
            output: {'Date': 'Thu, 10 Dec 2020 05:27:10 GMT', 'Content-Type': 'application/vnd.android.package-archive', 'Content-Length': '154551625', 'Connection': 'keep-alive', 'Server': 'NWS_TCloud_static_msoc1_xz', 'Cache-Control': 'max-age=600', 'Expires': 'Thu, 10 Dec 2020 05:37:09 GMT', 'Last-Modified': 'Thu, 09 Jan 2020 11:21:35 GMT', 'X-NWS-UUID-VERIFY': '94db2d14f135898d924fb249b13a0964', 'X-Verify-Code': '2871bd7acf67c7e298e9c8d8c865e27d', 'X-NWS-LOG-UUID': 'a83536f2-ab83-465d-ba09-0e19a15cc706', 'X-Cache-Lookup': 'Hit From Disktank3, Hit From Inner Cluster', 'Accept-Ranges': 'bytes', 'ETag': '"46C50A5CADB6BEE339236477BB6DDC14"', 'X-Daa-Tunnel': 'hop_count=2'}
        2
            output: {'Server': 'Tengine', 'Date': 'Fri, 11 Dec 2020 14:11:00 GMT', 'Content-Type': 'application/pdf', 'Content-Length': '24422168', 'Last-Modified': 'Fri, 18 Sep 2020 09:56:15 GMT', 'Connection': 'keep-alive', 'ETag': '"5f64843f-174a718"', 'Strict-Transport-Security': 'max-age=15768000', 'Accept-Ranges': 'bytes'}
        3
            output: {'Date': 'Thu, 24 Dec 2020 08:57:18 GMT', 'Content-Type': 'application/vnd.android.package-archive', 'Content-Length': '190814345', 'Connection': 'keep-alive', 'Server': 'openresty', 'Last-Modified': 'Mon, 14 Dec 2020 12:32:50 GMT', 'Expires': 'Mon, 14 Dec 2020 12:32:50 GMT', 'Content-Disposition': 'attachment; filename="com.tanwan.yscqlyzf.huawei.2012141704.apk"', 'Via': 'CHN-JSsuqian-CT3-CACHE7[8],CHN-JSsuqian-CT3-CACHE3[0,TCP_HIT,6],CHN-JSwuxi-GLOBAL2-CACHE63[5],CHN-JSwuxi-GLOBAL2-CACHE74[0,TCP_HIT,2],CHN-SH-GLOBAL1-CACHE92[589],CHN-SH-GLOBAL1-CACHE152[555,TCP_MISS,588],CHN-HElangfang-GLOBAL2-CACHE41[493],CHN-HElangfang-GLOBAL2-CACHE24[487,TCP_MISS,491]', 'X-Hcs-Proxy-Type': '1', 'X-Ccdn-Cachettl': '31536000', 'X-Ccdn-Expires': '30684021', 'Nginx-Hit': '1', 'Cache-Control': 'max-age=7200', 'Age': '851993', 'Lct-Pos-Percent': '0.19', 'Lct-Hot-Series': '1056964608', 'Accept-Ranges': 'bytes', 'dl-from': 'hwcdn'}
    """
    respHeaderDict = None


    try:
        resp = requests.get(curUrl, stream=True, proxies=proxies)
        respHeaderDict = resp.headers
        # {'Date': 'Thu, 10 Dec 2020 05:27:10 GMT', 'Content-Type': 'application/vnd.android.package-archive', 'Content-Length': '154551625', 'Connection': 'keep-alive', 'Server': 'NWS_TCloud_static_msoc1_xz', 'Cache-Control': 'max-age=600', 'Expires': 'Thu, 10 Dec 2020 05:37:09 GMT', 'Last-Modified': 'Thu, 09 Jan 2020 11:21:35 GMT', 'X-NWS-UUID-VERIFY': '94db2d14f135898d924fb249b13a0964', 'X-Verify-Code': '2871bd7acf67c7e298e9c8d8c865e27d', 'X-NWS-LOG-UUID': 'a83536f2-ab83-465d-ba09-0e19a15cc706', 'X-Cache-Lookup': 'Hit From Disktank3, Hit From Inner Cluster', 'Accept-Ranges': 'bytes', 'ETag': '"46C50A5CADB6BEE339236477BB6DDC14"', 'X-Daa-Tunnel': 'hop_count=2'}
        # {'Server': 'Tengine', 'Date': 'Fri, 11 Dec 2020 14:11:00 GMT', 'Content-Type': 'application/pdf', 'Content-Length': '24422168', 'Last-Modified': 'Fri, 18 Sep 2020 09:56:15 GMT', 'Connection': 'keep-alive', 'ETag': '"5f64843f-174a718"', 'Strict-Transport-Security': 'max-age=15768000', 'Accept-Ranges': 'bytes'}
        # {'Date': 'Thu, 24 Dec 2020 09:19:58 GMT', 'Content-Type': 'application/vnd.android.package-archive', 'Content-Length': '190814345', 'Connection': 'keep-alive', 'Server': 'openresty', 'Age': '859494', 'Cache-Control': 'max-age=7200', 'Content-Disposition': 'attachment; filename="com.tanwan.yscqlyzf.huawei.2012141704.apk"', 'Expires': 'Mon, 14 Dec 2020 12:32:50 GMT', 'Last-Modified': 'Mon, 14 Dec 2020 12:32:50 GMT', 'Lct-Hot-Series': '12582912', 'Lct-Pos-Percent': '0.25', 'Nginx-Hit': '1', 'Via': 'CHN-JSwuxi-AREACT1-CACHE33[4],CHN-JSwuxi-AREACT1-CACHE43[0,TCP_HIT,2],CHN-JSwuxi-GLOBAL2-CACHE110[2],CHN-JSwuxi-GLOBAL2-CACHE74[0,TCP_HIT,0],CHN-SH-GLOBAL1-CACHE92[589],CHN-SH-GLOBAL1-CACHE152[555,TCP_MISS,588],CHN-HElangfang-GLOBAL2-CACHE41[493],CHN-HElangfang-GLOBAL2-CACHE24[487,TCP_MISS,491]', 'X-Ccdn-Cachettl': '31536000', 'X-Ccdn-Expires': '30676539', 'X-Hcs-Proxy-Type': '1', 'Accept-Ranges': 'bytes', 'dl-from': 'hwcdn'}
        # {'Date': 'Thu, 24 Dec 2020 09:22:05 GMT', 'Content-Type': 'application/octet-stream', 'Content-Length': '249455788', 'Connection': 'keep-alive', 'Accept-Ranges': 'bytes', 'ETag': '"2a0205efc29db9ee555d8cd429a5d723"', 'Last-Modified': 'Tue, 22 Dec 2020 13:38:42 GMT', 'Ohc-Cache-HIT': 'czix102 [2]', 'Ohc-File-Size': '249455788', 'Ohc-Upstream-Trace': '58.216.2.102', 'Timing-Allow-Origin': '*', 'dl-from': 'bdcdn', 'x-obs-id-2': '32AAAQAAEAABAAAQAAEAABAAAQAAEAABCSteDob3rAnYCgC3AxdwUWM4S8xxD0WH', 'x-obs-request-id': '000001768AAD97CB980AA58ADE5C652D', 'Age': '122819', 'Via': 'HIT by 61.183.53.37, HIT by 180.97.190.116', 'Server': 'Tengine/2.2.3'}
    except requests.RequestException:
        respHeaderDict = None


    return respHeaderDict


def getFileSizeFromHeaders(respHeaderDict):
    """Get file size from url response headers


    Args:
        respHeaderDict (dict): requests response headers
    Returns:
        file size(int) or None if failed to get
    Raises:
    Examples:
        input: {'Date': 'Fri, 25 Dec 2020 01:18:18 GMT', 'Content-Type': 'application/octet-stream', 'Content-Length': '190814345', 'Connection': 'keep-alive', 'Server': 'openresty', 'Age': '915891', 'Last-Modified': 'Mon, 14 Dec 2020 10:32:43 GMT', 'Lct-Hot-Series': '1006632960', 'Lct-Pos-Percent': '0.12', 'Nginx-Hit': '1', 'Via': 'CHN-JSsuqian-CUCC2-CACHE3[21],CHN-JSsuqian-CUCC2-CACHE3[0,TCP_HIT,10],CHN-HElangfang-GLOBAL2-CACHE49[18],CHN-HElangfang-GLOBAL2-CACHE24[0,TCP_HIT,18]', 'X-Ccdn-Cachettl': '31536000', 'X-Ccdn-Expires': '30620162', 'X-Hcs-Proxy-Type': '1', 'X-Obs-Id-2': '32AAAQAAEAABAAAQAAEAABAAAQAAEAABCSiVxBzkAhQ9rf3Mu0HzMB2FV2QN61NS', 'X-Obs-Request-Id': '0000017660D50FAF940B2445365906B1', 'Accept-Ranges': 'bytes', 'dl-from': 'hwcdn'}
        output: 190814345
    """
    totalFileSize = None


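    if respHeaderDict:
        # Content-Length can be missing, e.g. for chunked responses
        contentLengthStr = respHeaderDict.get('Content-Length') # '154551625', '24422168', '190814345'
        if contentLengthStr and contentLengthStr.isdigit():
            contentLengthInt = int(contentLengthStr) # 154551625, 24422168, 190814345
            totalFileSize = contentLengthInt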


    return totalFileSize


def getFileSizeFromUrl(fileUrl, proxies=None):
    """Get file size from file url


    Args:
        fileUrl (str): file url
        proxies (dict): requests proxies
    Returns:
        file size(int) or None
    Raises:
    Examples:
        input: https://gameapktxdl.vivo.com.cn/appstore/developer/soft/20201020/202010201805243ed5v.apk
        output: 154551625
    """
    respHeaderDict = getRespHeadersFromUrl(fileUrl, proxies=proxies)
    totalFileSize = getFileSizeFromHeaders(respHeaderDict)
    return totalFileSize # 154551625


def getContentTypeFromHeaders(respHeaderDict):
    """Get content type from url response headers


    Args:
        respHeaderDict (dict): requests response headers
    Returns:
        content type(str) or None
    Raises:
    Examples:
        input: {'Date': 'Thu, 10 Dec 2020 05:27:10 GMT', 'Content-Type': 'application/vnd.android.package-archive', 'Content-Length': '154551625', 'Connection': 'keep-alive', 'Server': 'NWS_TCloud_static_msoc1_xz', 'Cache-Control': 'max-age=600', 'Expires': 'Thu, 10 Dec 2020 05:37:09 GMT', 'Last-Modified': 'Thu, 09 Jan 2020 11:21:35 GMT', 'X-NWS-UUID-VERIFY': '94db2d14f135898d924fb249b13a0964', 'X-Verify-Code': '2871bd7acf67c7e298e9c8d8c865e27d', 'X-NWS-LOG-UUID': 'a83536f2-ab83-465d-ba09-0e19a15cc706', 'X-Cache-Lookup': 'Hit From Disktank3, Hit From Inner Cluster', 'Accept-Ranges': 'bytes', 'ETag': '"46C50A5CADB6BEE339236477BB6DDC14"', 'X-Daa-Tunnel': 'hop_count=2'}
        output: 'application/vnd.android.package-archive'


        input: {'Date': 'Fri, 25 Dec 2020 01:47:31 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache', 'Content-Language': 'en-US', 'Expires': 'Thu, 01 Dec 1994 16:00:00 GMT', 'Set-Cookie': 'JSESSIONID=aaaIXrniUxnxU8Rh-8fzx; path=/', 'Content-Encoding': 'gzip'}
        output: 'text/html; charset=UTF-8'
    """
    contentTypeStr = None


    if respHeaderDict:
        contentTypeStr = respHeaderDict.get('Content-Type') # None if the header is missing
        # 'Content-Type': 'application/vnd.android.package-archive'
        # 'Content-Type': 'application/pdf'


    # 'application/vnd.android.package-archive'
    # 'application/pdf'
    return contentTypeStr


def getContentTypeFromUrl(curUrl, proxies=None):
    """Get content type from url


    Args:
        curUrl (str): current url
        proxies (dict): requests proxies
    Returns:
        content type(str) or None
    Raises:
    Examples:
        input: https://gameapktxdl.vivo.com.cn/appstore/developer/soft/20201020/202010201805243ed5v.apk
        output: 'application/vnd.android.package-archive'


        output: 'application/pdf'
    """
    respHeaderDict = getRespHeadersFromUrl(curUrl, proxies=proxies)
    contentTypeStr = getContentTypeFromHeaders(respHeaderDict)
    return contentTypeStr
Usage:
        isValidApkUrl, fileSizeOrErrMsg = isAndroidApkUrl(curApkUrl)
and that is all.
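Since the second element of the returned pair is either an int (file size) or a str (error message), the caller has to branch on the first element before using it. A small sketch of that, using a hypothetical formatting helper describeApkCheckResult and one of the sizes quoted in the docstring above:

```python
def describeApkCheckResult(isValidApkUrl, fileSizeOrErrMsg):
    """Format the (bool, int/str) pair returned by isAndroidApkUrl as one log line."""
    if isValidApkUrl:
        # here the second element is the file size in bytes
        sizeMb = fileSizeOrErrMsg / (1024 * 1024)
        return "valid apk, size=%d bytes (%.1f MB)" % (fileSizeOrErrMsg, sizeMb)
    # here the second element is the error message string
    return "invalid apk url: %s" % fileSizeOrErrMsg


print(describeApkCheckResult(True, 154551625))
print(describeApkCheckResult(False, "Failed to get content type for url http://app.mi.com/details?id=sh.lilith.dgame.mi"))
```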
[Postscript 20210120]
I found that for:
'https://api-game.meizu.com/games/public/download/redirect/url?auth_time=43200&package_name=com.llread.define.mz&source=0&timestamp=1611106471512&type=2&sign=2b5fd25e6b0f095cfbea096017e90295&fname=com.llread.define.mz_181'
the returned content type is:
'application/zip'
so I added support for it:
        foundApplicationAndroid = re.search("application/.*android", contentTypeStr, re.I)

        # 'application/zip'
        foundApplicationZip = re.search("application/zip", contentTypeStr, re.I)

        isAndroidType = bool(foundApplicationAndroid or foundApplicationZip)
Result:
eachAppInfoDict={'gameType': '塔防', 'appName': '植物大战僵尸2高清版', 'packageName': 'com.popcap.pvz2cthdamz', 'downloadCount': 11575579, 'apkUrl': 'https://api-game.meizu.com/games/public/download/redirect/url?auth_time=43200&package_name=com.popcap.pvz2cthdamz&source=0&timestamp=1611106470321&type=2&sign=e53a406706f7f7074469fd1816ce7209&fname=com.popcap.pvz2cthdamz_1040', 'apkFileSize': 409527195, 'sourceWebsite': 'ChanDaShi', 'sourceMarket': 'meizu'}
which is now correct.
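An application/zip content type is plausible for an APK, because an APK is a ZIP archive. Accepting it does mean any ZIP file would pass, so as an extra check one could look at the first bytes of the download: a non-empty ZIP starts with the signature PK\x03\x04. The sketch below only covers the byte comparison; the helper name startsWithZipMagic is hypothetical, and the in-memory archive merely stands in for the first bytes that would be read from the streamed response:

```python
import io
import zipfile

ZIP_LOCAL_FILE_HEADER_MAGIC = b"PK\x03\x04"


def startsWithZipMagic(firstBytes):
    """Return True if the given leading bytes look like the start of a ZIP archive."""
    return firstBytes[:4] == ZIP_LOCAL_FILE_HEADER_MAGIC


# Build a tiny in-memory ZIP to stand in for the head of a downloaded file
demoBuffer = io.BytesIO()
with zipfile.ZipFile(demoBuffer, "w") as demoZip:
    demoZip.writestr("AndroidManifest.xml", b"placeholder")
demoHead = demoBuffer.getvalue()[:4]

print(startsWithZipMagic(demoHead))
print(startsWithZipMagic(b"<htm"))  # e.g. the start of an HTML error page
```

This is a necessary condition rather than proof of a valid APK, but it would reject HTML pages served with a misleading content type.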
[Postscript 2]
Then I hit:
'Content type application/octet-stream is NOT android for url https://api-game.meizu.com/games/public/download/redirect/url?auth_time=43200&package_name=com.hirealgame.hswsw.mz&source=0&timestamp=1611106503304&type=2&sign=fce0aeff829a8b4c4f493f2e86c9e35b&fname=com.hirealgame.hswsw.mz_402'
So I took another look:
            # "Content-Type": "application/octet-stream",
            isOctetStreamType = "octet-stream" in contentTypeStr # True
            if isOctetStreamType:
                # 'https://appdlc-drcn.hispace.hicloud.com/dl/appdl/application/apk/47/4795a70deeac4103a8e6182b257ec4a9/com.shenghe.wzcq.huawei.2012221953.apk?sign=f9001091ej1001032000000000000100000000000500100101010@CC0A6D3E117D430483B55B08162FB0F4&extendStr=detail%3A1%3B&tabStatKey=A09000&relatedAppId=C100005003&hcrId=CC0A6D3E117D430483B55B08162FB0F4&maple=0&distOpEntity=HWSW'
                foundApkInUrl = re.search(r"[^/]+\.apk", curApkUrl, re.I) # <re.Match object; span=(101, 142), match='com.tanwan.yscqlyzf.huawei.2012141704.apk'>
                # isApkInUrl = bool(foundApkInUrl) # True
                # 'https://api-game.meizu.com/games/public/download/redirect/url?auth_time=43200&package_name=com.hirealgame.hswsw.mz&source=0&timestamp=1611106503304&type=2&sign=fce0aeff829a8b4c4f493f2e86c9e35b&fname=com.hirealgame.hswsw.mz_402'
                foundDownloadInUrl = re.search("download", curApkUrl, re.I) # 
                isApkInUrl = bool(foundApkInUrl or foundDownloadInUrl)
Result:
For the latest code, see:
https://github.com/crifan/crifanLibPython/blob/master/python3/crifanLib/thirdParty/crifanRequests.py
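Matching the word download in the URL is a fairly loose test. Some of the response headers shown earlier include a Content-Disposition header with the real file name, which could serve as a stricter signal when it is present. A rough sketch, assuming a hypothetical helper getFilenameFromContentDisposition that is not part of the code above, and handling only the plain filename= form (not the encoded filename*= variant):

```python
import re


def getFilenameFromContentDisposition(respHeaderDict):
    """Extract the filename from a Content-Disposition header, or None if absent."""
    if not respHeaderDict:
        return None
    # requests headers are case-insensitive; a plain dict needs this exact key
    dispositionStr = respHeaderDict.get("Content-Disposition", "")
    foundFilename = re.search(r'filename="?([^";]+)"?', dispositionStr, re.I)
    return foundFilename.group(1) if foundFilename else None


# Header value copied from the Huawei CDN response shown earlier
demoHeaders = {"Content-Disposition": 'attachment; filename="com.tanwan.yscqlyzf.huawei.2012141704.apk"'}
demoFilename = getFilenameFromContentDisposition(demoHeaders)
print(demoFilename)
print(demoFilename.lower().endswith(".apk"))
```

Not every server sends this header, so it can only complement the URL-based checks, not replace them.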
