
[Solved] Python: filter out and skip the first resource of an Evernote note that does not actually exist in the note content

Python crifan 382 views 0 comments
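While working on:
[Unsolved] Python: optimizing some details after publishing Evernote posts to WordPress
I kept optimizing and ran into a problem:
After syncing the Evernote note, even once all of its resources had been updated to empty, fetching the note's resources again still returned one resource.
Later I worked it out myself. My guess is that it is:
the image that the post's thumbnail is generated from
For example, in one post:
after the update the content no longer contains any <en-media>, yet fetching the note still returns a resource
The original code uploaded first, producing for example:
https://www.crifan.com/files/pic/uploads/2021/01/27e8b7d03fc54b058b1aab8c09423a96-4.jpg
Note: the file name ends in -4, which means the same image had been uploaded several times during debugging, so the name had already collided 4 times
The corresponding image
is clearly the full-size original of that note's thumbnail
But the code uploaded it every time and only afterwards found that the note contains no such en-media
So I decided to optimize it as follows:
before the upload, first check whether the matching en-media exists
If it does not exist, do not continue with the upload.
This avoids pointless uploads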
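The check-before-upload idea can be sketched in isolation as below. This is only an illustrative sketch, not the code from the author's library: `referenced_hashes`, `upload_referenced_images`, `fake_upload`, the `(name, body_hash)` tuples and the `example.invalid` URL are all hypothetical stand-ins, and it assumes each `<en-media>` node carries the resource's MD5 as a 32-character lowercase hex `hash` attribute.

```python
import re

# Assumption: every <en-media> node has a hash="<32 hex chars>" attribute.
EN_MEDIA_HASH_RE = re.compile(r'<en-media\b[^>]*\bhash="([0-9a-f]{32})"')


def referenced_hashes(enml):
    """Collect the hex hashes of all <en-media> nodes in the note content."""
    return set(EN_MEDIA_HASH_RE.findall(enml))


def upload_referenced_images(enml, resources, upload):
    """Upload only the resources that the note content actually references.

    resources: list of (name, body_hash_bytes) tuples (hypothetical stand-in
    for Evernote Resource objects); upload: callable(name) -> url.
    """
    wanted = referenced_hashes(enml)
    uploaded = {}
    for name, body_hash in resources:
        if body_hash.hex() not in wanted:
            # No <en-media> points at this resource: skip the useless upload
            continue
        uploaded[name] = upload(name)
    return uploaded


calls = []


def fake_upload(name):
    """Hypothetical uploader that only records which names were uploaded."""
    calls.append(name)
    return "https://example.invalid/" + name


note_enml = (
    '<en-note><en-media hash="aee147dbcd6816ca2b40494622fffaa3" '
    'type="image/png"/></en-note>'
)
resources = [
    ("thumbnail-source.jpg", bytes(16)),  # not referenced by the content
    ("inline.png", bytes.fromhex("aee147dbcd6816ca2b40494622fffaa3")),
]
result = upload_referenced_images(note_enml, resources, fake_upload)
```

Here only `inline.png` reaches the uploader, while the unreferenced resource is skipped before any network call would be made.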
The final code became:
libs/crifan/crifanEvernoteToWordpress.py

    def uploadNoteImageToWordpress(self, curNoteDetail, curResource, curResList=None):
        """Upload a single note image to WordPress, then sync it back to the note (replace <en-media> with <img>)

        Args:
            curNoteDetail (Note): Evernote Note, with detail content
            curResource (Resource): Evernote Note Resource
            curResList (list): Evernote Note Resource list
        Returns:
            uploaded image url (str); empty str if nothing was uploaded
        Raises:
        """
        if not curResList:
            curResList = curNoteDetail.resources

        uploadedImgUrl = ""

        isImg = self.evernote.isImageResource(curResource)
        if not isImg:
            logging.warning("Not upload resource: not an image: %s", crifanEvernote.genResourceInfoStr(curResource))
            return uploadedImgUrl

        # Skip a resource that no <en-media> node references (presumably the source image of the note thumbnail)
        foundResEnMediaSoup = crifanEvernote.findResourceSoup(curResource, curNoteDetail=curNoteDetail)
        if not foundResEnMediaSoup:
            logging.warning("No need to upload resource %s to wordpress: no related <en-media> node found", crifanEvernote.genResourceInfoStr(curResource))
            return uploadedImgUrl

        isUploadOk, respInfo = self.uploadImageToWordpress(curResource)
        if isUploadOk:
            # {'id': 70491, 'url': 'https://www.crifan.com/files/pic/uploads/2020/11/c8b16cafe6484131943d80267d390485.jpg', 'slug': 'c8b16cafe6484131943d80267d390485', 'link': 'https://www.crifan.com/c8b16cafe6484131943d80267d390485/', 'title': 'c8b16cafe6484131943d80267d390485'}
            uploadedImgUrl = respInfo["url"]
            logging.info("uploaded url %s", uploadedImgUrl)
            # "https://www.crifan.com/files/pic/uploads/2020/03/f6956c30ef0b475fa2b99c2f49622e35.png"
            # replace en-media with img
            respNote = self.syncNoteImage(curNoteDetail, curResource, uploadedImgUrl, curResList)
            # logging.info("Complete sync image %s to note %s", uploadedImgUrl, respNote.title)
        else:
            # log the short info str instead of the whole resource, which would dump its binary data
            logging.warning("Failed to upload image resource %s to wordpress", crifanEvernote.genResourceInfoStr(curResource))

        return uploadedImgUrl
and:
libs/crifan/crifanEvernote.py
    @staticmethod
    def findResourceSoup(curResource, soup=None, curNoteDetail=None):
        """Find the related <en-media> BeautifulSoup node for an Evernote Resource

        Args:
            curResource (Resource): Evernote Resource
            soup (Soup): BeautifulSoup soup of note content
            curNoteDetail (Note): Evernote note, with detail content; used only when soup is not given
        Returns:
            matching <en-media> soup node, or None if not found
        Raises:
        """
        if soup is None:
            soup = crifanEvernote.noteContentToSoup(curNoteDetail)

        curMime = curResource.mime # 'image/png'
        logging.debug("curMime=%s", curMime)
        # # method 1: calc again
        # curResBytes = curResource.data.body
        # curHashStr1 = utils.calcMd5(curResBytes) # 'dc355da030cafe976d816e99a32b6f51'

        # method 2: convert from body hash bytes
        curHashStr = utils.bytesToStr(curResource.data.bodyHash)
        logging.debug("curHashStr=%s", curHashStr)
        # b'\xae\xe1G\xdb\xcdh\x16\xca+@IF"\xff\xfa\xa3' -> 'aee147dbcd6816ca2b40494622fffaa3'

        # imgeTypeP = re.compile("image/\w+")
        # match on both mime type and hash, so only the node of this exact resource is found
        curResSoup = soup.find("en-media", attrs={"type": curMime, "hash": curHashStr})
        logging.debug("curResSoup=%s", curResSoup)
        # <en-media hash="aee147dbcd6816ca2b40494622fffaa3" type="image/png" width="370"></en-media>
        return curResSoup
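To make the hash matching above easier to try out without an Evernote account, here is a minimal standard-library sketch. It is not the author's implementation: it swaps BeautifulSoup for `xml.etree.ElementTree`, the function name `find_en_media` and the sample ENML string are hypothetical, and it assumes the note content is well-formed XML without undefined entities. The body hash bytes and the expected hex string are the ones shown in the code comments above.

```python
import xml.etree.ElementTree as ET


def find_en_media(enml, mime, body_hash):
    """Return the <en-media> element matching mime type and hash, or None.

    body_hash is the raw 16-byte MD5 digest; <en-media> stores it as hex.
    """
    hash_hex = body_hash.hex()
    root = ET.fromstring(enml)
    for node in root.iter("en-media"):
        if node.get("type") == mime and node.get("hash") == hash_hex:
            return node
    return None


# Hypothetical, simplified note content (real ENML also has a DOCTYPE)
SAMPLE_ENML = (
    "<en-note>"
    "<div>some text</div>"
    '<en-media hash="aee147dbcd6816ca2b40494622fffaa3" type="image/png" width="370"></en-media>'
    "</en-note>"
)

REFERENCED_HASH = b'\xae\xe1G\xdb\xcdh\x16\xca+@IF"\xff\xfa\xa3'
ORPHAN_HASH = bytes(16)  # stand-in for a resource the content never references

found = find_en_media(SAMPLE_ENML, "image/png", REFERENCED_HASH)
missing = find_en_media(SAMPLE_ENML, "image/png", ORPHAN_HASH)
```

With `ElementTree`, compare the result against `None` explicitly: an element without child nodes is falsy, so `if not found` would wrongly treat a matched empty `<en-media>` as missing.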
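In addition, every time a resource is printed while debugging, its entire binary data gets printed by default.
So I also added a helper that generates a short info str for a resource: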
libs/crifan/crifanEvernote.py
    @staticmethod
    def genResourceInfoStr(curResource):
        """Generate resource info str, used for debug print

        Args:
            curResource (Resource): Evernote Resource
        Returns:
            resource info (str)
        Raises:
        """
        # only name, mime and guid: keeps the (large) binary data out of the log
        resInfoStr = "Resource(name=%s,mime=%s,guid=%s)" % (curResource.attributes.fileName, curResource.mime, curResource.guid)
        return resInfoStr
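A slightly more defensive variant of such a summary helper might look like the sketch below. It is an assumption-laden illustration rather than part of the author's library: `resource_info_str` is a hypothetical name, the `SimpleNamespace` objects only imitate the shape of an Evernote Resource, and the `data.size` field is assumed to be present.

```python
from types import SimpleNamespace


def resource_info_str(resource):
    """Build a short summary of a resource without touching its binary body.

    Uses getattr with defaults so that a resource without attributes or data
    still yields a usable log line instead of raising AttributeError.
    """
    attrs = getattr(resource, "attributes", None)
    file_name = getattr(attrs, "fileName", None)
    data = getattr(resource, "data", None)
    size = getattr(data, "size", None)  # assumed field: body size in bytes
    return "Resource(name=%s,mime=%s,guid=%s,size=%s)" % (
        file_name, resource.mime, resource.guid, size)


# Hypothetical stand-in object shaped like an Evernote Resource
fake_resource = SimpleNamespace(
    guid="guid-123",
    mime="image/png",
    attributes=SimpleNamespace(fileName=None),
    data=SimpleNamespace(body=b"\x89PNG" * 1000, size=4000),
)
info = resource_info_str(fake_resource)
```

The summary stays one short line however large the image body is, which is the point of logging it instead of the resource itself.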
Then I debugged to check the result.
It works:
when no en-media is found, the image is not uploaded

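Please credit when reposting: 在路上 » [Solved] Python: filter out and skip the first resource of an Evernote note that does not actually exist in the note content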
