[Solved] Converting between an Evernote (Yinxiang Biji) Note's content and a BeautifulSoup soup in Python

Python crifan
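While working on:
[Unsolved] Indentation lost when indented HTML is synced to WordPress with Python and the HTML gets altered
I also tidied up the following two conversions along the way:
(1) Evernote Note content to BeautifulSoup soup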
    @staticmethod
    def noteContentToSoup(curNote):
        """Convert Evernote Note content to BeautifulSoup Soup

        Args:
            curNote (Note): Evernote Note
        Returns:
            Soup
        Raises:
        """

        # Previous inline version, kept for reference: remove the first line, i.e. the DOCTYPE declaration
        #   <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
        # noteHtml = curNote.content
        # noteHtml = re.sub(r'<!DOCTYPE en-note SYSTEM "http://xml\.evernote\.com/pub/enml2\.dtd">\s+', "", noteHtml)


        noteHtml = crifanEvernote.getNoteContentHtml(curNote)


        soup = utils.htmlToSoup(noteHtml)
        # now top node is: html, not en-note


        # for debug
        # if soup.name != "html":
        if soup.name != "[document]":
            logging.info("soup.name=%s", soup.name)


        return soup
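The commented-out lines in noteContentToSoup hint at the first step of the conversion: dropping the ENML DOCTYPE line before the content is parsed. Below is a minimal standalone sketch of that step only. The function name `strip_enml_doctype` and the sample note body are assumptions for illustration; the actual `crifanEvernote.getNoteContentHtml` is not shown in this post and may do more.

```python
import re

# Raw string, so the regex escapes (\. and \s) reach the re module unchanged.
ENML_DOCTYPE_PATTERN = re.compile(
    r'<!DOCTYPE en-note SYSTEM "http://xml\.evernote\.com/pub/enml2\.dtd">\s*'
)


def strip_enml_doctype(note_content):
    """Remove the ENML DOCTYPE declaration and the whitespace after it, if present.

    Hypothetical helper, not part of crifanEvernote.
    """
    return ENML_DOCTYPE_PATTERN.sub("", note_content, count=1)


# Illustrative note content, not taken from a real note
sample = (
    '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\n'
    "<en-note><div>hello</div></en-note>"
)

print(strip_enml_doctype(sample))
```

Content that has no DOCTYPE line passes through unchanged, so the helper is safe to call more than once.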
(2) BeautifulSoup soup to Evernote Note content
    @staticmethod
    def soupToNoteContent(soup):
        """Convert BeautifulSoup Soup to Evernote Note content


        Args:
            soup (Soup): BeautifulSoup Soup
        Returns:
            Evernote Note content html(str)
        Raises:
        """


        # for debug
        # if soup.name != "html":
        if soup.name != "[document]":
            logging.info("soup.name=%s", soup.name)


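        # setting soup.name = "en-note" does not work, so the tag is renamed in the serialized HTML instead
        noteContentHtml = utils.soupToHtml(soup)


        # convert <html>...</html> back to <en-note>...</en-note>
        # raw strings keep \g<contentBody> from being treated as an invalid string escape
        noteContentHtml = re.sub(r'<html>(?P<contentBody>.+)</html>', r"<en-note>\g<contentBody></en-note>", noteContentHtml, flags=re.S)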


        noteContentHtml = crifanEvernote.convertToClosedEnMediaTag(noteContentHtml)


        # add first line
        # <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
        noteContentHtml = '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\n' + noteContentHtml


        return noteContentHtml
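The string-level part of soupToNoteContent (renaming the outer html tag and prepending the DOCTYPE) can be tried without Evernote or BeautifulSoup. The sketch below isolates just those two steps. The name `html_to_enml` and the sample input are assumptions for illustration, and the en-media fix-up step is left out here.

```python
import re

ENML_DOCTYPE = '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\n'


def html_to_enml(html_text):
    """Wrap the body of <html>...</html> in <en-note> and prepend the ENML DOCTYPE.

    Hypothetical helper that mirrors two steps of soupToNoteContent.
    """
    # re.S lets "." match newlines, so a multi-line body is captured as one group
    body = re.sub(
        r"<html>(?P<contentBody>.+)</html>",
        r"<en-note>\g<contentBody></en-note>",
        html_text,
        flags=re.S,
    )
    return ENML_DOCTYPE + body


print(html_to_enml("<html>\n<div>hi</div>\n</html>"))
```

Note that the pattern only matches a bare `<html>` tag. If the serialized soup carried attributes on that tag, the substitution would leave the text as it is and only the DOCTYPE would be added.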
soupToNoteContent calls this helper:
    @staticmethod
    def convertToClosedEnMediaTag(noteHtml):
        """Convert en-media elements written with a separate closing tag into self-closed tags.

        A separate </en-media> closing tag causes an error, so convert:
                <en-media hash="7c54d8d29cccfcfe2b48dd9f952b715b" type="image/png"></en-media>
            to the self-closed en-media tag:
                <en-media hash="7c54d8d29cccfcfe2b48dd9f952b715b" type="image/png" />
        Args:
            noteHtml (str): Note content html
        Returns:
            note content html with closed en-media tag (str)
        Raises:
        """
        # raw strings so that \s and \g<enMedia> are passed to re as written
        noteHtml = re.sub(r"(?P<enMedia><en-media\s+[^<>]+)>\s*</en-media>", r"\g<enMedia> />", noteHtml, flags=re.S)
        return noteHtml
That is all that is needed.
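To check what the regular expression in convertToClosedEnMediaTag does and does not touch, here is a small standalone sketch using the same pattern. The function name `close_en_media_tags` and the surrounding `<div>` are assumptions for illustration; the hash value is the one from the docstring.

```python
import re


def close_en_media_tags(note_html):
    """Rewrite <en-media ...></en-media> as <en-media ... />.

    Standalone copy of the pattern used in convertToClosedEnMediaTag.
    """
    return re.sub(
        r"(?P<enMedia><en-media\s+[^<>]+)>\s*</en-media>",
        r"\g<enMedia> />",
        note_html,
        flags=re.S,
    )


open_form = '<div><en-media hash="7c54d8d29cccfcfe2b48dd9f952b715b" type="image/png"></en-media></div>'
closed_form = '<div><en-media hash="7c54d8d29cccfcfe2b48dd9f952b715b" type="image/png" /></div>'

print(close_en_media_tags(open_form))
print(close_en_media_tags(closed_form))
```

Both calls print the self-closed form: the first input is rewritten, and the second is left untouched because the pattern requires a literal `</en-media>` after the opening tag.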
