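While working on:
[Unresolved] Code blocks in Evernote (Yinxiang) notes lose their formatting after being processed with Python and published to WordPress
I needed to figure out how to take a note like this:

with the raw HTML: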
<div>class Evernote(object):</div> <div> """</div> <div> Operate Evernote Yinxiang note via python</div> <div> <div><br /></div> </div> <div> 首页</div> <div> http://sandbox.yinxiang.com</div> <div> <div><br /></div>
and extract the inner string values from its soup with the whitespace preserved.
Start by looking at the existing code:
libs/crifan/utils.py
import logging

def getAllContents(curNode):
    """Get all contents of current and children nodes

    Args:
        curNode (soup node): current Beautifulsoup node
    Returns:
        str
    Raises:
    """
    # codeSnippetStr = curNode.prettify()
    # codeSnippetStr = curNode.string
    # codeSnippetStr = curNode.contents
    codeSnippetStr = ""
    stringList = []
    stringGenerator = curNode.stripped_strings
    # stringGenerator = curNode.strings
    for eachStr in stringGenerator:
        # logging.info("eachStr=%s", eachStr)
        stringList.append(eachStr)
    codeSnippetStr = "\n".join(stringList)
    logging.info("codeSnippetStr=%s", codeSnippetStr)
    return codeSnippetStr

Clearly, stripped_strings is what removes the leading whitespace.
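So the next step is to find out how to keep the whitespace. Searched for:
beautifulsoup get string with space
One suggestion is to try get_text(),
but .strings looks like the closer match.
Trying it:
if isStripped:
    stringGenerator = curNode.stripped_strings
else:
    stringGenerator = curNode.strings
Result: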

The whitespace is preserved:
'\xa0\xa0"""'
which is exactly what we want.
[Summary]
Changing:
stringGenerator = curNode.stripped_strings
to:
stringGenerator = curNode.strings
preserves the spaces and blank lines inside the HTML nodes.
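To see the difference in isolation, here is a minimal sketch. The sample markup is only modelled on the note's HTML above, and it assumes the indentation is stored as non-breaking spaces (\xa0), as the logged result suggests. The variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the note's HTML; \xa0 (non-breaking space)
# is assumed to be how the indentation is stored.
html = (
    '<div>class Evernote(object):</div>'
    '<div>\xa0\xa0"""</div>'
    '<div>\xa0\xa0Operate Evernote Yinxiang note via python</div>'
)
soup = BeautifulSoup(html, "html.parser")

# .strings yields every text node unchanged, so the indentation survives.
kept = list(soup.strings)

# .stripped_strings strips each text node and skips whitespace-only ones.
stripped = list(soup.stripped_strings)

# get_text() with default arguments concatenates the same unstripped strings.
text = soup.get_text()

print(repr(kept[1]))      # '\xa0\xa0"""'
print(repr(stripped[1]))  # '"""'
```

Since get_text() with its default arguments also keeps the whitespace, either approach should work here. Iterating over .strings just makes it easier to choose the separator when joining.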
Complete code of the updated function:
import logging

def getAllContents(curNode, isStripped=False):
    """Get all contents of current and children nodes

    Args:
        curNode (soup node): current Beautifulsoup node
        isStripped (bool): return stripped string or not
    Returns:
        str
    Raises:
    """
    # codeSnippetStr = curNode.prettify()
    # codeSnippetStr = curNode.string
    # codeSnippetStr = curNode.contents
    codeSnippetStr = ""
    stringList = []
    if isStripped:
        # strip each string and skip whitespace-only ones
        stringGenerator = curNode.stripped_strings
    else:
        # keep every string as-is, including leading spaces
        stringGenerator = curNode.strings
    for eachStr in stringGenerator:
        logging.info("eachStr=%s", eachStr)
        stringList.append(eachStr)
    codeSnippetStr = "\n".join(stringList)
    logging.info("codeSnippetStr=%s", codeSnippetStr)
    return codeSnippetStr

Provided for reference.