
[Solved] Updating the image attachment data of an Evernote note in Python

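Python crifan 421 views 0 comments
While working on:
[Unsolved] Processing Evernote notes with Python: resizing images and saving them back to Evernote
the images have already been compressed, so the image data has changed.
Now I want to save the new data back into the attachments of the original Evernote note.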
Thrift module: NoteStore
The likely candidates are:
* updateNote
* updateResource
Let's take a look:
Function: NoteStore.updateResource
i32 updateResource(string authenticationToken,
                   Types.Resource resource)
    throws Errors.EDAMUserException, Errors.EDAMSystemException, Errors.EDAMNotFoundException

Submit a set of changes to a resource to the service. This can be used to update the meta-data about the resource, but cannot be used to change the binary contents of the resource (including the length and hash). These cannot be changed directly without creating a new resource and removing the old one via updateNote.
@param  resource A Resource object containing the desired fields to be populated on the service. The service will attempt to update the resource with the following fields from the client:
* guid: must be provided to identify the resource
* mime
* width
* height
* duration
* attributes: optional. if present, the set of attributes will be replaced.
@return The Update Sequence Number of the resource after the changes have been applied.
@throws  EDAMUserException
* BAD_DATA_FORMAT "Resource.guid" - if the parameter is missing
* BAD_DATA_FORMAT "Resource.mime" - invalid resource MIME type
* BAD_DATA_FORMAT "ResourceAttributes.*" - bad resource string
* LIMIT_REACHED "ResourceAttribute.*" - attribute string too long
* PERMISSION_DENIED "Resource" - private resource, user doesn't own
@throws  EDAMNotFoundException
* "Resource.guid" - not found, by GUID
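In other words: updateResource cannot update the binary data of a resource.
So the only option is to delete the old resource, create a new one, and then update the guid?
Search for: create
No function related to creating a resource was found.
evernote create resource api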
Resources – Evernote Developers
“Resources
Dealing with note attachments
Overview
Resources represent files that are embedded within a note. Common resource types include images, audio clips, PDFs, and documents, but any type of file can be stored in a Resouce.
* Creating resources
* Resource types
* Downloading resources
Creating resources
You’ll notice that there’s no NoteStore.createResource API call. This is because Resources don’t exist outside of the context of a specific note. Instead, Resources are created implicitly when you include a new Resource object in a note as part of a call to NoteStore.createNote or NoteStore.updateNote.”
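Now it is clear: to update the resources, call NoteStore.updateNote (or NoteStore.createNote for a new note) with the new Resource objects included in the note.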
Getting Started with the Evernote API – Evernote Developers
Evernote API resources for developers – Evernote Help & Learning
Thrift module: Types
Struct: Resource
There is no create-resource call here, so how do I create a new Resource object instance?
python – evernote updating note resources – Stack Overflow
Attaching an existing Evernote resource to a new Evernote note – Stack Overflow
While debugging, I noticed that each resource looks like this:
Resource(guid='33a6d058-58fa-410c-acfb-d163a973a7bb', noteGuid='9bf6cecf-d91e-4391-a034-199c744424db', data=Data(bodyHash=b'\xdc5]\xa00\xca\xfe\x97m\x81n\x99\xa3+oQ', size=1284627, body=b'\xff\xd8\xff\xe1P:Exif\x00\x00MM\x00*\x00\x00\x00\x08\x00\r\x01\x00\x00\x03\x00\x00\x00\x01\x0f\xa0\x..........................2?\xff\xd9'), mime='image/jpeg', width=4000, height=3000, duration=None, active=True, recognition=None, attributes=ResourceAttributes(sourceURL=None, timestamp=1583759247000, latitude=None, longitude=None, altitude=None, cameraMake='MI 9', cameraModel=None, clientWillIndex=None, recoType=None, fileName='IMG_20200309_130727.jpg', attachment=False, applicationData=None), updateSequenceNum=4555832, alternateData=None)
So internally there is a Resource object.
Let's look at the code.
Search for:
Resource(
Found it:
/Users/crifan/dev/dev_root/python/EvernoteToWordpress/EvernoteToWordpress/libs/evernote-sdk-python3/lib/evernote/edam/type/ttypes.py
class Resource(object):
  """
  Every media file that is embedded or attached to a note is represented
  through a Resource entry.
  <dl>
  <dt>guid</dt>
...
libs/evernote-sdk-python3/sample/client/EDAMTest.py
data = Types.Data()
data.size = len(image)
data.bodyHash = hash
data.body = image

resource = Types.Resource()
resource.mime = 'image/png'
resource.data = data
That is how a resource is created.
So let's try it.
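The sample above uses a `hash` value without showing where it comes from. The following is a minimal sketch of the whole construction, assuming `bodyHash` is the raw MD5 digest of the body (which matches the 16-byte `bodyHash` seen while debugging). The helper name `build_resource` and the `types_module` parameter are hypothetical; the module is passed in only so the sketch can be tried without the SDK installed.

```python
import hashlib
from types import SimpleNamespace


def build_resource(types_module, body, mime):
    """Build a new Resource (with its Data) from raw bytes.

    types_module is expected to expose Data and Resource classes, for
    example evernote.edam.type.ttypes from the SDK (assumption: passed in
    as a parameter purely to keep this sketch SDK-independent).
    """
    data = types_module.Data()
    data.size = len(body)
    data.bodyHash = hashlib.md5(body).digest()  # raw 16-byte digest, not the hex string
    data.body = body

    resource = types_module.Resource()
    resource.mime = mime
    resource.data = data
    return resource


# Stand-in for the SDK's Types module (hypothetical, for a dry run only)
fake_types = SimpleNamespace(Data=SimpleNamespace, Resource=SimpleNamespace)
res = build_resource(fake_types, b"fake image bytes", "image/png")
print(res.mime, res.data.size, res.data.bodyHash.hex())
```

With the real SDK, the same helper would presumably be called with the `Types` module imported in EDAMTest.py instead of `fake_types`.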
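While debugging, I could also see that some of the original images do carry attributes values.
Along the way I needed:
[Solved] How to compute the hexdigest string from the binary md5 digest value in Python
So at first the only option was to recalculate the hexdigest string myself from the raw binary data.
After that was solved, I optimized it to: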
        # method 1: calc again
        curResBytes = curRes.data.body
        curHashStr1 = utils.calcMd5(curResBytes) # 'dc355da030cafe976d816e99a32b6f51'
        # method 2: convert from body hash bytes
        curHashStr = utils.bytesToStr(curRes.data.bodyHash)
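The two methods above should give the same string, because the hexdigest is just the hexadecimal form of the digest bytes. A small standard-library sketch to check this; the `body` value is a made-up stand-in for `curRes.data.body`:

```python
import binascii
import hashlib

body = b"example resource bytes"  # stand-in for curRes.data.body

digest_bytes = hashlib.md5(body).digest()  # what Data.bodyHash holds
hex_from_md5 = hashlib.md5(body).hexdigest()  # method 1: hash the body again
hex_from_hexlify = binascii.hexlify(digest_bytes).decode("ascii")  # method 2
hex_from_bytes = digest_bytes.hex()  # same conversion via bytes.hex()

print(hex_from_md5 == hex_from_hexlify == hex_from_bytes)  # → True

# The bodyHash seen while debugging converts the same way
sample_hash = b"\xdc5]\xa00\xca\xfe\x97m\x81n\x99\xa3+oQ"
print(sample_hash.hex())  # → dc355da030cafe976d816e99a32b6f51
```

Method 2 avoids hashing the full image body a second time, which matters for resources of a megabyte or more.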
Debugging showed that some of the original images also carry width and/or height attributes:
  <div>
    <en-media hash="b017fc1775dabb56603adc9cbe207765" type="image/jpeg" width="1080" />
  </div>
Note: these attributes must be preserved when the image is replaced with the new resource.
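A minimal sketch of such a replacement: only hash and type are rewritten, and every other attribute on the node is left alone. It uses the standard library's `xml.etree.ElementTree` on a small fragment rather than the parser used in the real code, and the function name `replace_media` and the new hash value are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

snippet = (
    '<div>'
    '<en-media hash="b017fc1775dabb56603adc9cbe207765" type="image/jpeg" width="1080" />'
    '</div>'
)


def replace_media(root, old_hash, new_hash, new_mime):
    """Point the matching en-media node at a new resource, leaving other attributes untouched."""
    for node in root.iter("en-media"):
        if node.get("hash") == old_hash:
            node.set("hash", new_hash)
            node.set("type", new_mime)
            return node
    return None  # no en-media node references old_hash


root = ET.fromstring(snippet)
new_hash = "3110e1e7994dc119ff92439c5758e465"  # hypothetical hash of the resized image
node = replace_media(root, "b017fc1775dabb56603adc9cbe207765", new_hash, "image/png")
print(node.attrib)
```

Because only two attributes are set, width="1080" survives the replacement.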
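At this point it is already possible to:
compress the images, and then put the list of compressed image resources into the note.
Next, consider how to upload the update to the Evernote note (i.e. the post):
[Solved] How to update the content and resource list of an Evernote note (post) in Python
Details such as image handling still need polishing, but that is for later.
[Summary]
The final code used here: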
/Users/crifan/dev/dev_root/python/EvernoteToWordpress/EvernoteToWordpress/libs/crifan/utils.py
from hashlib import md5 # only for python 3.x
import binascii

ImageFormatToMime = {
    "BMP": "image/bmp",
    "PNG": "image/png",
    "JPEG": "image/jpeg",
    "TIFF": "image/tiff",
}


def bytesToStr(inputBytes, encoding="UTF-8"):
    """convert binary bytes into str hexadecimal representation


    Args:
        inputBytes (bytes): bytes
    Returns:
        str
    Examples:
        input: b'\xdc5]\xa00\xca\xfe\x97m\x81n\x99\xa3+oQ'
        return: 'dc355da030cafe976d816e99a32b6f51'
    Raises:
    """
    inputHex = binascii.hexlify(inputBytes) # b'dc355da030cafe976d816e99a32b6f51'
    inputStr = inputHex.decode(encoding) # 'dc355da030cafe976d816e99a32b6f51'
    return inputStr


def calcMd5(inputContent, isRespBytes=False) :
    """generate md5 string from input content


    Args:
        inputContent (str/bytes): input content of string or bytes
        isRespBytes (bool): return bytes, otherwise return string
    Returns:
        md5 checksum
            str:
                eg: '3110e1e7994dc119ff92439c5758e465'
            bytes:
                eg: b'1\x10\xe1\xe7\x99M\xc1\x19\xff\x92C\x9cWX\xe4e'
    Raises:
    """
    md5Value = ""
    curMd5 = md5()

    inputBytes = b""
    if isinstance(inputContent, bytes):
        inputBytes = inputContent
    elif isinstance(inputContent, str):
        # str input is encoded as UTF-8 before hashing
        inputBytes = inputContent.encode("UTF-8")


    curMd5.update(inputBytes)
    if isRespBytes:
        md5Value = curMd5.digest()
    else:
        md5Value = curMd5.hexdigest()
    return md5Value
/Users/crifan/dev/dev_root/python/EvernoteToWordpress/EvernoteToWordpress/EvernoteToWordpress.py
import logging
import re

from bs4 import BeautifulSoup
import evernote.edam.type.ttypes as Types

# utils is the libs/crifan/utils.py module shown above;
# gNoteStore is the NoteStore client created elsewhere in this script
def updateNoteResouces(curNoteDetail, newResList):
    """Update note resources with new resource

    Args:
        curNoteDetail (Note): Evernote note with details
        newResList (list): new Resource list, in the same order as curNoteDetail.resources
    Returns:
        updated note detail
    Raises:
    """
    originResList = curNoteDetail.resources
    originContent = curNoteDetail.content
    soup = BeautifulSoup(originContent, 'html.parser')
    for curIdx, curRes in enumerate(originResList):
        curMime = curRes.mime


        # # method 1: calc again
        # curResBytes = curRes.data.body
        # curHashStr1 = utils.calcMd5(curResBytes) # 'dc355da030cafe976d816e99a32b6f51'
        # method 2: convert from body hash bytes
        curHashStr = utils.bytesToStr(curRes.data.bodyHash)

        # imgeTypeP = re.compile("image/\w+")
        foundMediaNode = soup.find("en-media", attrs={"type": curMime, "hash": curHashStr})
        if foundMediaNode:
            newRes = newResList[curIdx]
            newMime = newRes.mime
            # newHashBytes = newRes.data.bodyHash # b'\xb8\xe8\xbb\xcc\xca\xc1\xdf&J\xbeV\xe2`\xa6K\xb7'
            newResBytes = newRes.data.body
            newHashStr = utils.calcMd5(newResBytes)
            foundMediaNode["type"] = newMime
            foundMediaNode["hash"] = newHashStr
        else:
            logging.warning("Not found resource: type=%s, hash=%s", curMime, curHashStr)


    # newContent = soup.prettify()
    newContent = str(soup)
    curNoteDetail.content = newContent


    curNoteDetail.resources = newResList

    return curNoteDetail

def isImageResource(curRes):
    """check whether the resource is an image or not


    Args:
        curRes (Resource): Evernote Resource instance
    Returns:
        bool
    Raises:
    """
    isImage = False
    curResMime = curRes.mime # 'image/png' 'image/jpeg'
    # libs/evernote-sdk-python3/lib/evernote/edam/limits/constants.py
    matchImage = re.match("^image/", curResMime)
    logging.debug("matchImage=%s", matchImage)
    if matchImage:
        """
            image/gif
            image/jpeg
            image/png
        """
        isImage = True
    logging.info("curResMime=%s -> isImage=%s", curResMime, isImage)
    return isImage

def resizePostImage(curNoteDetail):
    """Resize each media image then update post

    Args:
        curNoteDetail (Note): Evernote note with details
    Returns:
        updated note detail
    Raises:
    """
    newResList = []
    originResList = curNoteDetail.resources
    for curIdx, eachResource in enumerate(originResList):
        if isImageResource(eachResource):
            imgFilename = eachResource.attributes.fileName
            logging.info("[%d] imgFilename=%s", curIdx, imgFilename)


            resBytes = eachResource.data.body
            resizedImgBytes, imgFormat = utils.resizeSingleImage(resBytes)


            resizedImgLen = len(resizedImgBytes) # 77935
            resizedImgMd5Bytes = utils.calcMd5(resizedImgBytes, isRespBytes=True) # b'1\x10\xe1\xe7\x99M\xc1\x19\xff\x92C\x9cWX\xe4e'
            newMime = utils.ImageFormatToMime[imgFormat] # 'image/jpeg'


            newData = Types.Data()
            # newData = ttypes.Data()
            newData.size = resizedImgLen
            newData.bodyHash = resizedImgMd5Bytes
            newData.body = resizedImgBytes

            newRes = Types.Resource()
            # newRes = ttypes.Resource()
            newRes.mime = newMime
            newRes.data = newData
            newRes.attributes = eachResource.attributes

            newResList.append(newRes)
        else:
            """
                audio/wav
                audio/mpeg
                audio/amr
                application/pdf
                ...
            """
            newResList.append(eachResource)

    curNoteDetail = updateNoteResouces(curNoteDetail, newResList)

    # upload/sync to evernote
    newNote = Types.Note()
    newNote.title = curNoteDetail.title
    newNote.guid = curNoteDetail.guid
    # updated following
    newNote.content = curNoteDetail.content
    newNote.resources = curNoteDetail.resources
    updatedNote = gNoteStore.updateNote(newNote)

    # return curNoteDetail
    return updatedNote
This achieves:
image compression:
20200310 10:25:05 EvernoteToWordpress.py:317  INFO    curResMime=image/jpeg -> isImage=True
20200310 10:25:05 EvernoteToWordpress.py:334  INFO    [0] imgFilename=IMG_20200309_130727.jpg
-> Compress ratio=7%, from [fmt=JPEG, size=(4000, 3000), len=987.7KB] to [fmt=JPEG, size=(1024, 768), len=76.1KB]
-> Compress ratio=89%, from [fmt=JPEG, size=(4000, 3000), len=987.7KB] to [fmt=PNG, size=(1024, 768), len=886.1KB]
20200310 10:25:08 EvernoteToWordpress.py:317  INFO    curResMime=image/jpeg -> isImage=True
20200310 10:25:08 EvernoteToWordpress.py:334  INFO    [1] imgFilename=1583732707632.jpg
-> Compress ratio=100%, from [fmt=JPEG, size=(978, 366), len=59.1KB] to [fmt=JPEG, size=(978, 366), len=59.1KB]
-> Compress ratio=860%, from [fmt=JPEG, size=(978, 366), len=59.1KB] to [fmt=PNG, size=(978, 366), len=508.8KB]
20200310 10:25:08 EvernoteToWordpress.py:317  INFO    curResMime=image/png -> isImage=True
20200310 10:25:08 EvernoteToWordpress.py:334  INFO    [2] imgFilename=None
-> Compress ratio=100%, from [fmt=PNG, size=(988, 596), len=568.8KB] to [fmt=PNG, size=(988, 596), len=568.8KB]
-> Compress ratio=10%, from [fmt=PNG, size=(988, 596), len=568.8KB] to [fmt=JPEG, size=(988, 596), len=57.7KB]
20200310 10:25:10 EvernoteToWordpress.py:317  INFO    curResMime=image/jpeg -> isImage=True
20200310 10:25:10 EvernoteToWordpress.py:334  INFO    [3] imgFilename=Screenshot_2020-03-09-12-57-10-382_com.android.mms
-> Compress ratio=17%, from [fmt=JPEG, size=(1080, 2340), len=93.9KB] to [fmt=JPEG, size=(360, 780), len=16.2KB]
-> Compress ratio=42%, from [fmt=JPEG, size=(1080, 2340), len=93.9KB] to [fmt=PNG, size=(360, 780), len=39.8KB]
as well as writing the type and hash values of the updated image resources into the content.
Finally, the update is uploaded to the Evernote note (i.e. the post).
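The "Compress ratio" figures in the log above appear to be the new size as a percentage of the original size, truncated to a whole number, so a value above 100% means the output got larger. `utils.resizeSingleImage` is not shown here, so the following is only a sketch of that arithmetic under this assumption, with a hypothetical function name:

```python
def compress_ratio(original_len, new_len):
    """Return the new size as a whole-number percentage of the original size (truncated)."""
    if original_len <= 0:
        raise ValueError("original_len must be positive")
    return int(new_len / original_len * 100)


# Sizes in KB taken from the log lines above
for old_kb, new_kb in [(987.7, 76.1), (59.1, 508.8), (568.8, 57.7)]:
    print("Compress ratio=%d%%" % compress_ratio(old_kb, new_kb))
```

The second pair shows why a format is not always worth converting: saving that small JPEG as PNG makes it more than eight times larger.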
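[Postscript]
I then updated the code logic further.
Regarding the width of the image en-media node: once the image has been shrunk, some attributes such as width are no longer needed, so they should be removed: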
            noteAttrs = foundMediaNode.attrs
            hasWidthAttr = "width" in noteAttrs
            hasHeightAttr = "height" in noteAttrs
            if hasWidthAttr or hasHeightAttr:
                newImg = utils.bytesToImage(newResBytes)
                # newImgWidth = newImg.width
                # newImgHeight = newImg.height
                newImgWidth, newImgHeight = newImg.size
                if hasWidthAttr:
                    curWidth = int(noteAttrs["width"])
                    if curWidth >= newImgWidth:
                        del foundMediaNode["width"]
                
                if hasHeightAttr:
                    curHeight = int(noteAttrs["height"])
                    if curHeight >= newImgHeight:
                        del foundMediaNode["height"]
Result:
the width attribute of the updated image is gone, so the image is displayed at its original size.
Next, continue adding other logic.
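The rule in the snippet above can be isolated as a small pure function, which makes it easy to check without Evernote or an image library. This is a sketch under the same rule (drop width or height when the stored value is greater than or equal to the resized image's real dimension); the function name and the sample sizes are illustrative assumptions.

```python
def attrs_to_drop(media_attrs, new_width, new_height):
    """Return the names of size attributes that are no smaller than the resized image."""
    names = []
    for name, new_size in (("width", new_width), ("height", new_height)):
        if name in media_attrs and int(media_attrs[name]) >= new_size:
            names.append(name)
    return names


attrs = {
    "hash": "b017fc1775dabb56603adc9cbe207765",
    "type": "image/jpeg",
    "width": "1080",
}
# Assume the image was resized to 360x780
print(attrs_to_drop(attrs, 360, 780))  # → ['width']
```

An attribute smaller than the new dimension is kept, since it still scales the image down on display.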

Please credit when reposting: 在路上 » [Solved] Updating the image attachment data of an Evernote note in Python
