
[Solved] How to insert a new node (soup) at a specific position with BeautifulSoup in Python

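Author: crifan
While working on:
[Unsolved] Python: after publishing an Evernote note to WordPress, add its title and link to the "Published" note
I wanted to do the following:
inside the first div of the note's en-note element, insert a new p node (as a soup object) before all the existing p elements.
In other words, inserting the new p right before the first p in the first div of en-note is enough.
So far I have this code: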
    # curNoteHtml = crifanEvernote.noteContentToHtml(publishedNoteDetail.content, isKeepTopHtml=False)
    curNoteSoup = crifanEvernote.noteContentToSoup(publishedNoteDetail)

    # <div style="font-size: 14px; margin: 0; padding: 0; width: 100%;">
    stypleP = re.compile(r"width:\s*100%;$")
    firstDivSoup = curNoteSoup.find("div", attrs={"style": stypleP})
    logging.info("firstDivSoup=%s", firstDivSoup)
This finds the target div node:
20201209 10:46:18 EvernoteToWordpress.py:886  INFO    firstDivSoup=<div style="font-size: 14px; margin: 0; padding: 0; width: 100%;"><p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/try_charger_and_charging_cable_charge_macbookpro" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【记录】试用充电器和充电线给MacBookPro充电</a></p>
<p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/bethesda_gan_gallium_nitride_65w_charger_unpacking_photo" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【记录】倍思GaN氮化镓65W充电器开箱照</a></p>
<p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【已解决】更换mac pro的充电器</a></p>
<hr style="line-height: 160%; box-sizing: content-box; border-top: 1px solid #eee; margin: 16px 0;"/>
<p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;"></a></p>
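Next, insert a soup node.
I needed to work out how to do the insertion. The candidate methods are:
insert()
insert_before() and insert_after()
I needed to see which one to use, and how.
My guess was to call insert on the div with index=0, so I tried that.
Code: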
    slugLink = "%s/%s" % (gCfg["wordpress"]["api"]["host"], noteSlug)
    newPSoupStr = """
        <p style="line-height: 160%%; box-sizing: content-box; margin: 10px 0; color: #333;"><a
            href="%s"
            style="line-height: 160%%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">%s</a>
        </p>
    """ % (slugLink , noteTitle)


    firstDivSoup.insert(0, newPSoupStr)
    logging.info("after insert: firstDivSoup=%s", firstDivSoup)
The insert works, but afterwards the div looks like this:
2020/12/09 11:15:48 EvernoteToWordpress.py:897  INFO    after insert: firstDivSoup=<div style="font-size: 14px; margin: 0; padding: 0; width: 100%;">
        &lt;p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"&gt;&lt;a
            href="https://www.crifan.com/replace_mac_pro_charger"
            style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;"&gt;【已解决】更换mac pro的充电器&lt;/a&gt;
        &lt;/p&gt;
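...
That is, what got inserted was just a plain string (with its markup escaped), not a p node with a nested a node.
So the nodes at each level have to be created and inserted separately.
That means using
soup.new_tag
and then insert, insert_before, or insert_after,
which should work.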
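To make the difference concrete, here is a minimal sketch (not the code from this post) that inserts the same markup once as a plain string and once as a Tag built with new_tag. The sample HTML and variable names are made up for illustration, and it assumes the built-in "html.parser" is used.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical sample document, only for illustration
soup = BeautifulSoup("<div><p>old</p></div>", "html.parser")
div = soup.find("div")

# Inserting a plain str: Beautiful Soup stores it as a text node,
# so the angle brackets are escaped when the tree is serialized.
div.insert(0, "<p>as text</p>")
is_text = isinstance(div.contents[0], NavigableString)

# Inserting a Tag created with new_tag(): it becomes a real child element.
new_p = soup.new_tag("p")
new_p.string = "as tag"
div.insert(0, new_p)
is_tag = isinstance(div.contents[0], Tag)

html = str(div)
print(html)
```

The string version shows up in the output as &lt;p&gt;...&lt;/p&gt;, which matches the escaped log output above, while the new_tag version is a normal p element.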
The final solution:
    curNoteSoup = crifanEvernote.noteContentToSoup(publishedNoteDetail)
gives the soup for the whole note.
Then
    # <div style="font-size: 14px; margin: 0; padding: 0; width: 100%;">
    stypleP = re.compile(r"width:\s*100%;$")
    firstDivSoup = curNoteSoup.find("div", attrs={"style": stypleP})
    logging.info("firstDivSoup=%s", firstDivSoup)
finds the first div.
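As a small standalone illustration of this lookup (a sketch with made-up HTML, not the real note content): when a compiled regular expression is passed as an attribute value, Beautiful Soup matches it against the attribute using the pattern's search(), so a pattern anchored with $ selects the div whose style ends in "width: 100%;".

```python
import re
from bs4 import BeautifulSoup

# Hypothetical note body with two divs; only the first has the full-width style
html = (
    "<en-note>"
    '<div style="font-size: 14px; margin: 0; padding: 0; width: 100%;"><p>first</p></div>'
    '<div style="color: red;"><p>second</p></div>'
    "</en-note>"
)
soup = BeautifulSoup(html, "html.parser")

# A raw string avoids Python's invalid-escape warning for \s
style_pattern = re.compile(r"width:\s*100%;$")
first_div = soup.find("div", attrs={"style": style_pattern})
text = first_div.p.get_text()
print(text)
```

If no div matches, find returns None, so real code may want to check for that before going further.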
For the new p, I first tried:
    # newPSoup = BeautifulSoup(newPSoupStr)
    # # <html><head></head><body><p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/replace_mac_pro_charger" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【已解决】更换mac pro的充电器</a></p>
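This did not work for me: as the output shows, the parser wrapped the fragment in top-level html, head, and body tags.
So I used the new_tag approach instead: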
    newPSoup = curNoteSoup.new_tag("p")
    # newPSoup=<p></p>
    PStyle = "line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"
    newPSoup.attrs["style"] = PStyle
    # newPSoup=<p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"></p>
This gives a p soup with the style attribute set.
Next, create the a that goes under the p:
    newASoup = curNoteSoup.new_tag("a")
    newASoup.string = noteTitle
    newASoup.attrs["href"] = slugLink
    AStyle = "line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;"
    newASoup.attrs["style"] = AStyle
    logging.info("newASoup=%s", newASoup)
    # newASoup=<a href="https://www.crifan.com/replace_mac_pro_charger" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【已解决】更换mac pro的充电器</a>
Now the key point:
simply append the a, and it becomes a child of the p:
    newPSoup.append(newASoup)
    logging.info("newPSoup=%s", newPSoup)
    # newPSoup=<p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/replace_mac_pro_charger" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【已解决】更换mac pro的充电器</a></p>
Then find the first p under the div:
    firstPSoup = firstDivSoup.find("p")
    logging.info("firstPSoup=%s", firstPSoup)
    # firstPSoup=<p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/try_charger_and_charging_cable_charge_macbookpro" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【记录】试用充电器和充电线给MacBookPro充电</a></p>
And the most important step:
use insert_before to put the new p before the first p in the div:
    firstPSoup.insert_before(newPSoup)
    logging.info("firstDivSoup=%s", firstDivSoup)
    # firstDivSoup=<div style="font-size: 14px; margin: 0; padding: 0; width: 100%;"><p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/replace_mac_pro_charger" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【已解决】更换mac pro的充电器</a></p><p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/try_charger_and_charging_cable_charge_macbookpro" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【记录】试用充电器和充电线给MacBookPro充电</a></p>
(After syncing to Evernote, the content becomes what we want.)
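The steps above can be condensed into a small self-contained sketch. It uses a made-up, simplified note body and example.com links instead of the real Evernote content, and the fallback to insert(0, ...) when the div has no p is my own addition, not something from the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified note body for illustration
html = (
    "<en-note>"
    '<div style="width: 100%;"><p><a href="https://example.com/old">old</a></p></div>'
    "</en-note>"
)
soup = BeautifulSoup(html, "html.parser")
first_div = soup.find("div")

# Build <p><a href="...">new</a></p> node by node
new_p = soup.new_tag("p")
new_a = soup.new_tag("a", href="https://example.com/new")
new_a.string = "new"
new_p.append(new_a)

first_p = first_div.find("p")
if first_p is not None:
    # Put the new p directly before the existing first p
    first_p.insert_before(new_p)
else:
    # Assumed fallback: no p yet, so make the new p the first child of the div
    first_div.insert(0, new_p)

result = str(first_div)
print(result)
```

One difference worth knowing: insert_before places the node right before the first p, while insert(0, ...) places it before everything in the div, including any leading whitespace text node.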
In the end the whole note content becomes:
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
 <div style="font-size: 14px; margin: 0; padding: 0; width: 100%;">
  <p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;">
   <a href="https://www.crifan.com/replace_mac_pro_charger" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">
    【已解决】更换mac pro的充电器
   </a>
  </p>
  <p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;">
   <a href="https://www.crifan.com/try_charger_and_charging_cable_charge_macbookpro" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">
    【记录】试用充电器和充电线给MacBookPro充电
   </a>
  </p>
...
[Summary]
The final complete code:
def updateNoteToPublished(noteTitle, noteSlug):
    global gCfg


    publishedNote = gCfg["evernote"]["afterUpload"]["HasPublised"]["note"]
    publishedNoteTitle = publishedNote["title"] # '已发布 20201206'
    publishedNoteGuid = publishedNote["guid"] # 'e242563a-4380-4a59-a6cf-f2e9ac019969'
    publishedNoteDetail = gEvernote.getNoteDetail(publishedNoteGuid)


    # for debug
    utils.dbgSaveContentToHtml(publishedNoteDetail)


    # curNoteHtml = crifanEvernote.noteContentToHtml(publishedNoteDetail.content, isKeepTopHtml=False)
    curNoteSoup = crifanEvernote.noteContentToSoup(publishedNoteDetail)


    # <div style="font-size: 14px; margin: 0; padding: 0; width: 100%;">
    stypleP = re.compile(r"width:\s*100%;$")
    firstDivSoup = curNoteSoup.find("div", attrs={"style": stypleP})
    logging.info("firstDivSoup=%s", firstDivSoup)


    slugLink = "%s/%s" % (gCfg["wordpress"]["api"]["host"], noteSlug)
    logging.info("slugLink=%s", slugLink)
    logging.info("noteTitle=%s", noteTitle)
    # 'https://www.crifan.com/replace_mac_pro_charger'
    newPSoupStr = """
        <p style="line-height: 160%%; box-sizing: content-box; margin: 10px 0; color: #333;"><a
            href="%s"
            style="line-height: 160%%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">%s</a>
        </p>
    """ % (slugLink , noteTitle)
    # firstDivSoup.insert(0, newPSoupStr)
    # logging.info("after insert: firstDivSoup=%s", firstDivSoup)


    # newPSoup = BeautifulSoup(newPSoupStr)
    # # <html><head></head><body><p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/replace_mac_pro_charger" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【已解决】更换mac pro的充电器</a></p>
    newPSoup = curNoteSoup.new_tag("p")
    # newPSoup=<p></p>
    PStyle = "line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"
    newPSoup.attrs["style"] = PStyle
    # newPSoup=<p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"></p>
    logging.info("newPSoup=%s", newPSoup)


    newASoup = curNoteSoup.new_tag("a")
    newASoup.string = noteTitle
    newASoup.attrs["href"] = slugLink
    AStyle = "line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;"
    newASoup.attrs["style"] = AStyle
    logging.info("newASoup=%s", newASoup)
    # newASoup=<a href="https://www.crifan.com/replace_mac_pro_charger" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【已解决】更换mac pro的充电器</a>


    newPSoup.append(newASoup)
    logging.info("newPSoup=%s", newPSoup)
    # newPSoup=<p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/replace_mac_pro_charger" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【已解决】更换mac pro的充电器</a></p>


    firstPSoup = firstDivSoup.find("p")
    logging.info("firstPSoup=%s", firstPSoup)
    # firstPSoup=<p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/try_charger_and_charging_cable_charge_macbookpro" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【记录】试用充电器和充电线给MacBookPro充电</a></p>
    firstPSoup.insert_before(newPSoup)
    logging.info("firstDivSoup=%s", firstDivSoup)
    # firstDivSoup=<div style="font-size: 14px; margin: 0; padding: 0; width: 100%;"><p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/replace_mac_pro_charger" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【已解决】更换mac pro的充电器</a></p><p style="line-height: 160%; box-sizing: content-box; margin: 10px 0; color: #333;"><a href="https://www.crifan.com/try_charger_and_charging_cable_charge_macbookpro" style="line-height: 160%; box-sizing: content-box; text-decoration: underline; color: #5286bc;">【记录】试用充电器和充电线给MacBookPro充电</a></p>


    # newTitleLinkStr = "[%s](%s)" % (noteTitle, slugLink)
    # newNoteHtml = "%s\n%s" % (newTitleLinkStr, curNoteHtml)
    # newNoteContent = crifanEvernote.htmlToNoteContent(newNoteHtml)


    newNoteHtml = utils.soupToHtml(curNoteSoup)
    newNoteContent = crifanEvernote.htmlToNoteContent(newNoteHtml)
    logging.info("newNoteContent=%s", newNoteContent)


    syncParamDict = {
        # mandatory
        "noteGuid": publishedNoteGuid,
        "noteTitle": publishedNoteTitle,
        # optional
        "newContent": newNoteContent,
    }
    respNote = gEvernote.syncNote(**syncParamDict)


    # for debug
    utils.dbgSaveContentToHtml(publishedNoteDetail)


    return
For reference.
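As a side note, a possible alternative that I did not use in the code above: the html/head/body wrapper comes from the parser that was picked, and Python's built-in "html.parser" does not add one. So a fragment string can be parsed with it and the resulting p Tag inserted directly. The sketch below uses made-up HTML and names, and is only meant to show the idea.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; in the post it would be built from slugLink and noteTitle
fragment_html = '<p style="color: #333;"><a href="https://example.com/new">new</a></p>'

# With "html.parser" the fragment is parsed as-is, without html/body wrappers
fragment = BeautifulSoup(fragment_html, "html.parser")
wrapper_added = fragment.find("html") is not None
new_p = fragment.p

doc = BeautifulSoup("<div><p>old</p></div>", "html.parser")
# insert_before detaches new_p from `fragment` and moves it into `doc`
doc.div.p.insert_before(new_p)

result = str(doc)
print(result)
```

When the title or link comes from user data, building nodes with new_tag and .string has the advantage that the text is escaped automatically, whereas formatting it into an HTML string does not escape anything.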
