Latest news: as of 20210917 the site has moved from crifan.com to crifan.org

【Solved】How to get the strings inside divs in BeautifulSoup while preserving spaces and indentation

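BeautifulSoup crifan 241 views 0 comments
While working on:
【Unsolved】Python: code blocks from Evernote (Yinxiang) notes lose their formatting after being published to WordPress
I tried to figure out how, given raw HTML like this: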
        <div>class Evernote(object):</div>
        <div>  """</div>
        <div>    Operate Evernote Yinxiang note via python</div>
        <div>
            <div><br /></div>
        </div>
        <div>      首页</div>
        <div>      http://sandbox.yinxiang.com</div>
        <div>
            <div><br /></div>
to get the inner string values from its soup while keeping the spaces.

First, look at the existing code in:
libs/crifan/utils.py
import logging

def getAllContents(curNode):
    """Get all contents of current and children nodes

    Args:
        curNode (soup node): current BeautifulSoup node
    Returns:
        str
    """
    # codeSnippetStr = curNode.prettify()
    # codeSnippetStr = curNode.string
    # codeSnippetStr = curNode.contents
    stringList = []
    # stripped_strings strips every string, which also removes the leading spaces (indentation)
    stringGenerator = curNode.stripped_strings
    # stringGenerator = curNode.strings
    for eachStr in stringGenerator:
        # logging.info("eachStr=%s", eachStr)
        stringList.append(eachStr)
    codeSnippetStr = "\n".join(stringList)
    logging.info("codeSnippetStr=%s", codeSnippetStr)
    return codeSnippetStr
Clearly, stripped_strings is what removes the spaces: it strips leading and trailing whitespace from every string.
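The snippet below reproduces the problem on a small sample. It is a minimal sketch. The HTML is a hypothetical, simplified stand-in for the note, with plain spaces as indentation. It shows that `.stripped_strings` drops the indentation and also skips text nodes that are pure whitespace, such as the newline between the two divs.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified sample of the note HTML (plain spaces as indentation)
html = '<div>class Evernote(object):</div>\n<div>    Operate Evernote</div>'
soup = BeautifulSoup(html, "html.parser")

# stripped_strings calls .strip() on every text node and skips nodes
# that become empty, such as the "\n" between the two divs
stripped = list(soup.stripped_strings)
print(stripped)
# ['class Evernote(object):', 'Operate Evernote']
```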
So how can the strings be extracted with their spaces intact? Searching for:
beautifulsoup get string with space
turned up:
python – beautifulsoup with get_text – handle spaces – Stack Overflow
Try get_text().
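The sketch below compares the forms of `get_text()` on a hypothetical two-line sample. By default `get_text()` concatenates every string unchanged. A separator can be passed as the first argument. `strip=True` strips each string the same way `.stripped_strings` does, so the indentation is lost again.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the second div is indented with two plain spaces
html = '<div>class Evernote(object):</div><div>  """</div>'
soup = BeautifulSoup(html, "html.parser")

joined = soup.get_text()                    # strings concatenated, whitespace untouched
lines = soup.get_text("\n")                 # "\n" between strings, still no stripping
stripped = soup.get_text("\n", strip=True)  # each string stripped: indentation lost

print(repr(joined))    # 'class Evernote(object):  """'
print(repr(lines))     # 'class Evernote(object):\n  """'
print(repr(stripped))  # 'class Evernote(object):\n"""'
```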
There are also .strings and .stripped_strings.
.strings looks like the better fit here.
Try it:
    if isStripped:
        stringGenerator = curNode.stripped_strings
    else:
        stringGenerator = curNode.strings
Result:
the spaces are preserved:
'\xa0\xa0"""'
The indentation comes through as non-breaking spaces (\xa0). This is what we want.
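The `'\xa0\xa0"""'` result suggests that the note stores its indentation as `&nbsp;` entities. That is an assumption inferred from the output, not checked against the real note. The sketch below reproduces the result on such a sample. BeautifulSoup decodes `&nbsp;` to `'\xa0'`, and `.strings` keeps it. `.strings` also yields the whitespace-only `"\n"` node that sits between the divs. `.stripped_strings` drops the `'\xa0'` characters too, because Python's `str.strip()` treats them as whitespace.

```python
from bs4 import BeautifulSoup

# Assumption: indentation in the Evernote HTML is encoded as &nbsp;
html = '<div>class Evernote(object):</div>\n<div>&nbsp;&nbsp;"""</div>'
soup = BeautifulSoup(html, "html.parser")

raw = list(soup.strings)               # every text node exactly as parsed
stripped = list(soup.stripped_strings) # \xa0 counts as whitespace for strip()

print(raw)       # ['class Evernote(object):', '\n', '\xa0\xa0"""']
print(stripped)  # ['class Evernote(object):', '"""']
```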
【Summary】
Here, changing:
stringGenerator = curNode.stripped_strings
to:
stringGenerator = curNode.strings
is enough to keep the spaces and blank lines inside the HTML nodes.
Full code of the related function:
import logging

def getAllContents(curNode, isStripped=False):
    """Get all contents of current and children nodes

    Args:
        curNode (soup node): current BeautifulSoup node
        isStripped (bool): return stripped strings or not
    Returns:
        str
    """
    # codeSnippetStr = curNode.prettify()
    # codeSnippetStr = curNode.string
    # codeSnippetStr = curNode.contents
    stringList = []
    if isStripped:
        # strips each string and skips whitespace-only strings
        stringGenerator = curNode.stripped_strings
    else:
        # yields every string as-is, so spaces and indentation survive
        stringGenerator = curNode.strings
    for eachStr in stringGenerator:
        logging.info("eachStr=%s", eachStr)
        stringList.append(eachStr)
    codeSnippetStr = "\n".join(stringList)
    logging.info("codeSnippetStr=%s", codeSnippetStr)
    return codeSnippetStr
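The usage sketch below lets the function run standalone. It uses a condensed copy of getAllContents without the logging, so the behavior is the same. It runs the function on a hypothetical sample with both values of isStripped.

```python
from bs4 import BeautifulSoup

def getAllContents(curNode, isStripped=False):
    """Condensed copy of getAllContents (logging omitted)."""
    stringGenerator = curNode.stripped_strings if isStripped else curNode.strings
    return "\n".join(stringGenerator)

# Hypothetical sample modeled on the note excerpt, with no whitespace between tags
html = ('<div>class Evernote(object):</div>'
        '<div>  """</div>'
        '<div>    Operate Evernote Yinxiang note via python</div>')
soup = BeautifulSoup(html, "html.parser")

kept = getAllContents(soup)                     # indentation preserved
flat = getAllContents(soup, isStripped=True)    # indentation removed
print(kept)
print(flat)
```

In real note HTML the tags are separated by newlines and indentation, as in the excerpt at the top. `.strings` also yields those whitespace-only text nodes, so the joined result can contain extra whitespace-only lines. The sample above has no whitespace between tags, so it avoids them.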
For reference.

When reposting, please credit: 在路上 » 【Solved】How to get the strings inside divs in BeautifulSoup while preserving spaces and indentation
