Working on:
[Unresolved] Python: optimizing some details after publishing Evernote (印象笔记) posts to WordPress
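While doing that, I kept optimizing and found a problem:
After syncing the Evernote note, even though all of its resources had been updated to empty, fetching the note's resources again still returned one.
Later I worked out the likely cause myself. My guess:
it is the image that the post's thumbnail belongs to.
For example, in this post:
after the update the content no longer contains any <en-media>, yet fetching the note still returns a resource.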
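The original code then:
uploaded first, for example:
Note: the address ends in -4, which means that after several debugging runs the image had been uploaded several times, so the name had already collided 4 times.
The corresponding image is:
which is clearly the original full-size image of this note's thumbnail.
But the code uploaded it every time, only to find afterwards that the note contains no such en-media.
So I decided to optimize this:
before uploading, first check whether this en-media exists.
If it does not, skip the upload,
to avoid useless uploads.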
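The check-before-upload idea can be sketched on its own with only the standard library. This is a minimal illustration, not the code used in the project: the names `has_en_media`, `upload_if_referenced` and `upload_func` are hypothetical, and it assumes the note content is simple well-formed XML (no undefined entities) in which each <en-media> carries the MD5 of the resource bytes in its hash attribute.

```python
import hashlib
import xml.etree.ElementTree as ET


def has_en_media(note_content, resource_bytes, mime):
    """Return True if the note body references this resource via <en-media>."""
    # Assumption: the hash attribute is the hex MD5 of the resource bytes
    hash_hex = hashlib.md5(resource_bytes).hexdigest()
    root = ET.fromstring(note_content)
    for node in root.iter("en-media"):
        if node.get("hash") == hash_hex and node.get("type") == mime:
            return True
    return False


def upload_if_referenced(note_content, resource_bytes, mime, upload_func):
    """Call upload_func only when a matching <en-media> exists.

    upload_func is a hypothetical callable taking the bytes and returning a URL.
    Returns the uploaded URL, or "" when the upload is skipped.
    """
    if not has_en_media(note_content, resource_bytes, mime):
        return ""
    return upload_func(resource_bytes)
```

With this shape, a resource that is only the thumbnail's source image (and so has no <en-media> in the content) never reaches the upload call.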
In the end the code was changed to:
libs/crifan/crifanEvernoteToWordpress.py
def uploadNoteImageToWordpress(self, curNoteDetail, curResource, curResList=None):
    """Upload a single note image to wordpress, and sync to note (replace en-media with img)

    Args:
        curNoteDetail (Note): evernote Note
        curResource (Resource): evernote Note Resource
        curResList (list): evernote Note Resource list
    Returns:
        upload image url(str)
    Raises:
    """
    if not curResList:
        curResList = curNoteDetail.resources

    uploadedImgUrl = ""

    # only image resources are uploaded
    isImg = self.evernote.isImageResource(curResource)
    if not isImg:
        logging.warning("Not upload resource for NOT image for %s", crifanEvernote.genResourceInfoStr(curResource))
        return uploadedImgUrl

    # skip the upload when the note content has no matching <en-media>
    foundResEnMediaSoup = crifanEvernote.findResourceSoup(curResource, curNoteDetail=curNoteDetail)
    if not foundResEnMediaSoup:
        logging.warning("Not need upload resource %s to wordpress for not found related <en-media> node", crifanEvernote.genResourceInfoStr(curResource))
        return uploadedImgUrl

    isUploadOk, respInfo = self.uploadImageToWordpress(curResource)
    if isUploadOk:
        # {'id': 70491, 'url': 'https://www.crifan.com/files/pic/uploads/2020/11/c8b16cafe6484131943d80267d390485.jpg', 'slug': 'c8b16cafe6484131943d80267d390485', 'link': 'https://www.crifan.com/c8b16cafe6484131943d80267d390485/', 'title': 'c8b16cafe6484131943d80267d390485'}
        uploadedImgUrl = respInfo["url"]
        logging.info("uploaded url %s", uploadedImgUrl)
        # "https://www.crifan.com/files/pic/uploads/2020/03/f6956c30ef0b475fa2b99c2f49622e35.png"

        # replace en-media with img
        respNote = self.syncNoteImage(curNoteDetail, curResource, uploadedImgUrl, curResList)
        # logging.info("Complete sync image %s to note %s", uploadedImgUrl, respNote.title)
    else:
        logging.warning("Failed to upload image resource %s to wordpress", curResource)

    return uploadedImgUrl
and:
libs/crifan/crifanEvernote.py
@staticmethod
def findResourceSoup(curResource, soup=None, curNoteDetail=None):
    """Find related <en-media> BeautifulSoup soup from Evernote Resource

    Args:
        curResource (Resource): Evernote Resource
        soup (Soup): BeautifulSoup soup of note content
        curNoteDetail (Note): Evernote note, with detail content
    Returns:
        soup node
    Raises:
    """
    if not soup:
        soup = crifanEvernote.noteContentToSoup(curNoteDetail)

    curMime = curResource.mime # 'image/png'
    logging.debug("curMime=%s", curMime)

    # # method 1: calc again
    # curResBytes = curResource.data.body
    # curHashStr1 = utils.calcMd5(curResBytes) # 'dc355da030cafe976d816e99a32b6f51'

    # method 2: convert from body hash bytes
    curHashStr = utils.bytesToStr(curResource.data.bodyHash)
    logging.debug("curHashStr=%s", curHashStr)
    # b'\xae\xe1G\xdb\xcdh\x16\xca+@IF"\xff\xfa\xa3' -> 'aee147dbcd6816ca2b40494622fffaa3'

    # imgeTypeP = re.compile("image/\w+")
    # match the <en-media> node by both mime type and hash
    curResSoup = soup.find("en-media", attrs={"type": curMime, "hash": curHashStr})
    logging.debug("curResSoup=%s", curResSoup)
    # <en-media hash="aee147dbcd6816ca2b40494622fffaa3" type="image/png" width="370"></en-media>
    return curResSoup
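The implementation of utils.bytesToStr is not shown here. Assuming it simply hex-encodes the raw hash bytes (which is what the example in the comment above suggests), the same conversion can be sketched with the standard library. The function name `body_hash_to_hex` is hypothetical.

```python
import hashlib


def body_hash_to_hex(body_hash):
    """Convert raw digest bytes to the lowercase hex string used in <en-media hash="...">."""
    return body_hash.hex()


# the bodyHash bytes quoted in the comment above
raw_hash = b'\xae\xe1G\xdb\xcdh\x16\xca+@IF"\xff\xfa\xa3'
print(body_hash_to_hex(raw_hash))  # aee147dbcd6816ca2b40494622fffaa3

# "method 1" (recalculate from the body) and "method 2" (convert the stored
# digest) agree, because hexdigest() is just the hex form of digest()
sample_body = b"sample resource body"
assert hashlib.md5(sample_body).hexdigest() == body_hash_to_hex(hashlib.md5(sample_body).digest())
```

Converting the stored bodyHash avoids hashing the whole image again, which is presumably why method 2 was kept.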
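Also, every time a resource is printed while debugging, its whole data gets printed by default:
So, while at it, I added a helper that generates a short info string for a resource: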
libs/crifan/crifanEvernote.py
@staticmethod
def genResourceInfoStr(curResource):
    """Generate resource info str, use for debug print

    Args:
        curResource (Resource): Evernote Resource
    Returns:
        resource info(str)
    Raises:
    """
    # only name, mime and guid: leave out the (large) binary data
    resInfoStr = "Resource(name=%s,mime=%s,guid=%s)" % (curResource.attributes.fileName, curResource.mime, curResource.guid)
    return resInfoStr
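To see the effect without an Evernote connection, the helper can be tried on a stand-in object. This is only a sketch: `fake_resource` and its field values are made up for illustration, and `gen_resource_info_str` is a standalone copy of the formatting logic above.

```python
from types import SimpleNamespace


def gen_resource_info_str(resource):
    """Short one-line description of a resource, without its binary data."""
    return "Resource(name=%s,mime=%s,guid=%s)" % (
        resource.attributes.fileName, resource.mime, resource.guid)


# hypothetical stand-in with the same attribute layout as an Evernote Resource
fake_resource = SimpleNamespace(
    attributes=SimpleNamespace(fileName="demo.png"),
    mime="image/png",
    guid="hypothetical-guid",
    data=SimpleNamespace(body=b"\x00" * 1024),  # large payload that should not be logged
)

print(gen_resource_info_str(fake_resource))
# Resource(name=demo.png,mime=image/png,guid=hypothetical-guid)
```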
Then I debugged to check the effect.
It works:
when no en-media is found, the image is not uploaded.