用Python的openpyxl去处理excel文件,
遇到一个需求是:
用代码自动去判断,对于合并后的单元格中的内容:

(之前调试时发现的)自动检测出来后,对于后续的,单元格值是空:

但是属于同一系列的话:

那么后续单元格的值,在代码赋值时,也使用前面的值。
其他还有很多合并的cell:

所以问题就变成了:
看看openpyxl是否能检测出来excel的单元格,是否是被合并的,且知道,具体合并了多少行(多少列)
这样才有利于后续单元格值为空时,使用前面的同系列的值
openpyxl check merged cell
python – How to detect merged cells in excel with openpyxl – Stack Overflow
merged_cell_ranges就可以找到了合并了的cell了
Read merged cells in Excel with Python – Stack Overflow
自己写代码?不够方便
不过如果返回:
[‘A1:M1’, ‘B22:B27’]
自己去解析,貌似也不是很方便?
python – openpyxl: merged_cell_ranges and merged_cells are both empty – Stack Overflow
merged_cells
merged_cell_ranges
openpyxl / openpyxl / issues / #476 – Extract the value of merged cell — Bitbucket
-》
Simple usage — openpyxl 2.5.2 documentation
可以用:merge_cells
unmerge_cells
手动试试excel取消合并后,其他单元格是否有值:

很明显,只有第一个有值:

所以此处没法使用unmerge_cells
Working with styles — openpyxl 2.5.2 documentation
Working with Spreadsheets using Python (Part 2) – Hacker Noon
Simple usage — openpyxl 2.5.0 documentation
merged_cell_ranges openpyxl
如何在excep中使用openpyxl检测合并的单元格 Excel 中文网
openpyxl.worksheet.worksheet module — openpyxl 2.5.0 documentation
找到了文档:
“merged_cell_ranges
Return a copy of cell ranges
Note
Deprecated: Use ws.merged_cells.ranges”
让你用:
ws.merged_cells.ranges
merged_cells openpyxl
去试试
然后用代码:
<code>mergedCellsRangesOld = ws.merged_cell_ranges
logging.info("mergedCellsRangesOld=%s", mergedCellsRangesOld)
mergedCells = ws.merged_cells
logging.info("mergedCells=%s", mergedCells)
mergedCellsRangesNew = ws.merged_cells.ranges
logging.info("mergedCellsRangesNew=%s", mergedCellsRangesNew)
</code>就可以去调试,找到对应的这些合并的cell了:

其中:

以及可以看到对应的行和列:
min_col
max_col
min_row
max_row
的值了。
接着就是去如何写代码了。
其中想到的是:
对于一个cell,想要判断其是否在一个CellRange范围内
去看看 openpyxl的CellRange是否自带这个功能
CellRange openpyxl
python – How to specify a cell range with variables in openpyxl? – Stack Overflow
ws.iter_rows() or ws.iter_cols()
ws.iter_rows() or ws.rows or ws.columns
python – openpyxl: a better way to read a range of numbers to an array – Stack Overflow
excel – Python Openpyxl, copy and paste cell range – Stack Overflow
看来只能自己去写代码去检测了?
然后遇到个小问题:
【总结】
最后去写代码:
<code>from openpyxl import Workbook, load_workbook
def isInCellRange(cellToCheck, cellRange):
"""
to check a cell whether in a cell range
:param cellToCheck:
:param cellRange:
:return:
True : if cell in range
False: if cell not in range
"""
# logging.debug("cellToCheck=[%d:%d]", cellToCheck.row, cellToCheck.col_idx)
# logging.debug("cellRange: row=[%d:%d] col=[%d:%d]",
# cellRange.min_row, cellRange.max_row, cellRange.min_col, cellRange.max_col)
if (cellToCheck.row >= cellRange.min_row) and \
(cellToCheck.row <= cellRange.max_row) and \
(cellToCheck.col_idx >= cellRange.min_col) and \
(cellToCheck.col_idx <= cellRange.max_col):
logging.info("cell[%d:%d] with in cell range: row=[%d:%d] col=[%d:%d]",
cellToCheck.row, cellToCheck.col_idx,
cellRange.min_row, cellRange.max_row, cellRange.min_col, cellRange.max_col)
return True
else:
return False
def getCellRangeValue(cellRange):
"""
get cell range value -> the top left cell value
:param cellRange:
:return:
"""
topLeftCell = ws.cell(row=cellRange.min_row, column=cellRange.min_col)
topLeftCellValue = topLeftCell.value
return topLeftCellValue
def getRealCellValue(ws, curCell):
"""
for openpyxl, to get real value from row and column
expecially for merged cell, will get its (same) value from top-left cell value
:param row:
:param column:
:return:
"""
realCellValue = curCell.value
mergedCellsRangesList = ws.merged_cells.ranges
# logging.info("mergedCellsRangesList=%s", mergedCellsRangesList)
# Note:
# to efficiency , we only check cell in range or not when its value is None
# for all merged cell value is None
if not realCellValue:
for eachCellRange in mergedCellsRangesList:
if isInCellRange(curCell, eachCellRange):
cellRangeValue = getCellRangeValue(eachCellRange)
realCellValue = cellRangeValue
break
return realCellValue
# parse excel file
wb = load_workbook(ExcelFullFilename)
logging.info("wb=%s", wb)
# sheetNameList = wb.get_sheet_names()
# logging.info("sheetNameList=%s", sheetNameList)
ws = wb[StorybookSheetTitle]
logging.info("ws=%s", ws)
# process each row in excel
# for curRowNum in range(realContentRowStartNum, ws.max_row + 1):
for curRowNum, eachRow in enumerate(ws.iter_rows(min_row=realContentRowStartNum)):
curRowNum += realContentRowStartNum
logging.info("-"*30 + " row[%d] " + "-"*30, curRowNum)
...
# authorColNumCellValue = ws.cell(row=curRowNum, column=AuthorColNum).value
authorColNumCellValue = getRealCellValue(ws, eachRow[AuthorColNum - 1])
logging.info("col[%d] authorColNumCellValue=%s", AuthorColNum, authorColNumCellValue)
# contentAbstractColNumCellValue = ws.cell(row=curRowNum, column=ContentSimpleIntroColNum).value
contentAbstractColNumCellValue = getRealCellValue(ws, eachRow[ContentSimpleIntroColNum - 1])
logging.info("col[%d] contentAbstractColNumCellValue=%s", ContentSimpleIntroColNum, contentAbstractColNumCellValue)
# publisherColNumCellValue = ws.cell(row=curRowNum, column=PublisherColNum).value
publisherColNumCellValue = getRealCellValue(ws, eachRow[PublisherColNum - 1])
logging.info("col[%d] publisherColNumCellValue=%s", PublisherColNum, publisherColNumCellValue)
...
</code>可以检测到后续的,值是None的,处于被合并区域内的Cell单元格:
<code>2018/04/04 02:50:22 LINE 246 INFO cell[147:14] with in cell range: row=[146:163] col=[14:14] </code>

对应着147行,14列的:

然后就可以通过:
getCellRangeValue去得到合并区域的值==合并区域的最左上角(top left)的那个单元格的值:

了。
【后记】
后来已把相关 的函数整理到:
https://github.com/crifan/crifanLib/blob/master/python/crifanLib/crifanOpenpyxl.py
了,供参考。