折腾:
【未解决】用Python实现解析excel计算业绩统计并打包成可执行文件
期间,现有一个excel
想要读取其中数据,并做后续计算和处理
之前已经会用,知道可以用,openpyxl或xlsxreader之类工具去处理
现在希望用pandas,去试试
先去搭建虚拟环境,结果报错:
【已解决】Mac中pipenv安装报错:Could not find a version that matches pandas
去尝试用pandas去读取excel文件,并解析得到数据。
搞清楚如何用pandas读取excel
pandas read excel
pandas read excel用法
结果报错了:
【已解决】pandas的read_excel报错:ImportError Missing optional dependency xlrd
然后继续看看,如何获取每一列,或者每一行的数据:
pandas 基本用法
然后去搞清楚,行和列的总数
rowCount, columnCount = salePd.shape print("rowCount=%s, columnCount=%s" % (rowCount, columnCount))
rowCount=1017, columnCount=8
【总结】
此处,读取一个excel,并且循环读取每一行的值,相关代码是:
import os import pandas as pd ################################################################################ # Config ################################################################################ InputFolder = "input" saleMoneySumDict = {} saleDf = pd.read_excel(io=SaleFullPath, engine="openpyxl") print("saleDf=%s" % saleDf) saleRowCount, saleColumnCount = saleDf.shape print("saleRowCount=%s, saleColumnCount=%s" % (saleRowCount, saleColumnCount)) saleTitleList = saleDf.columns print("saleTitleList=%s" % saleTitleList) # sort by time print("before head=%s" % saleDf.head()) saleDf = saleDf.sort_values(by='时间') print("after head=%s" % saleDf.head()) firstTime = saleDf.iloc[0]["时间"] lastTime = saleDf.iloc[saleRowCount - 1]["时间"] print("firstTime=%s, lastTime=%s" % (firstTime, lastTime)) # for curIdx, curRow in saleDf.itertuples(): for rowIdx, rowValue in saleDf.iterrows(): print("%s %s %s" % ("-"*10, rowIdx, "-"*10)) print(rowIdx) print(rowValue) # curRowDict = saleDf.iloc[curIdx] # saleId = curRowDict["销售员编号"] # saleMoney = curRowDict["业绩金额"] saleId = rowValue["销售员编号"] saleMoney = rowValue["业绩金额"] print("saleId=%s, saleMoney=%s" % (saleId, saleMoney)) if saleId in saleMoneySumDict.keys(): saleMoneySumDict[saleId] = saleMoneySumDict[saleId] + saleMoney else: saleMoneySumDict[saleId] = saleMoney print("len(saleMoneySumDict.keys())=%s" % len(saleMoneySumDict.keys())) # len(saleMoneySumDict.keys())=463 print("saleMoneySumDict=%s" % saleMoneySumDict)
输出:
saleDf= 销售员编号 类型 下级账号 销售数量 业绩系数 单价 业绩金额 时间 0 adcandccc555 销售 NaN 1.000000e+07 8.666666 5.018457 4.349329e+08 2021-03-12 12:15:50 1 yang12345ccc 销售 NaN 4.000000e+04 8.666600 20.084027 6.962409e+06 2021-05-29 10:59:50 2 mo1212121212 销售 NaN 1.990000e+04 8.666666 20.079746 3.463087e+06 2021-05-01 01:00:06 3 xcp555555555 销售 NaN 1.710000e+04 8.666600 20.080157 2.975856e+06 2021-05-04 08:49:21 4 zengjunping1 销售 NaN 1.202000e+04 8.666666 20.078681 2.091663e+06 2021-04-19 20:40:00 ... ... .. ... ... ... ... ... ... 1215 wabjtam12345 销售 NaN 1.000000e+00 8.666666 0.041161 3.567306e-01 2021-02-27 21:17:00 1216 donglijun123 销售 NaN 5.600000e-03 8.666666 5.041192 2.446658e-01 2021-03-21 11:53:30 1217 zw1111111111 销售 NaN 5.000000e-01 8.666666 0.041138 1.782634e-01 2021-03-01 09:52:20 1218 aaasssdddfff 销售 NaN 2.000000e-01 8.666000 0.041173 7.136102e-02 2021-02-26 18:41:50 1219 ckj111222333 销售 NaN 1.000000e-01 8.666666 0.041172 3.568276e-02 2021-02-27 02:22:20 [1220 rows x 8 columns] saleRowCount=1220, saleColumnCount=8 saleTitleList=Index(['销售员编号', '类型', '下级账号', '销售数量', '业绩系数', '单价 ', '业绩金额', '时间'], dtype='object') before head= 销售员编号 类型 下级账号 销售数量 业绩系数 单价 业绩金额 时间 0 adcandccc555 销售 NaN 10000000.0 8.666666 5.018457 4.349329e+08 2021-03-12 12:15:50 1 yang12345ccc 销售 NaN 40000.0 8.666600 20.084027 6.962409e+06 2021-05-29 10:59:50 2 mo1212121212 销售 NaN 19900.0 8.666666 20.079746 3.463087e+06 2021-05-01 01:00:06 3 xcp555555555 销售 NaN 17100.0 8.666600 20.080157 2.975856e+06 2021-05-04 08:49:21 4 zengjunping1 销售 NaN 12020.0 8.666666 20.078681 2.091663e+06 2021-04-19 20:40:00 after head= 销售员编号 类型 下级账号 销售数量 业绩系数 单价 业绩金额 时间 1208 ancdefg12345 销售 NaN 1.0 8.666 0.041174 0.356811 2021-02-20 14:00:00 1212 ancdefg12345 销售 NaN 1.0 8.666 0.041174 0.356811 2021-02-20 14:00:00 1211 ancdefg12345 销售 NaN 1.0 8.666 0.041174 0.356811 2021-02-20 14:00:00 1210 ancdefg12345 销售 NaN 1.0 8.666 0.041174 0.356811 2021-02-20 14:00:00 1209 ancdefg12345 销售 NaN 1.0 8.666 0.041174 0.356811 2021-02-20 14:00:00 firstTime=2021-02-20 14:00:00, lastTime=2021-06-09 20:29:11 ---------- 1208 ---------- 1208 销售员编号 ancdefg12345 类型 销售 下级账号 NaN 销售数量 1.0 业绩系数 8.666 单价 0.041174 业绩金额 0.356811 时间 2021-02-20 14:00:00 Name: 1208, dtype: object saleId=ancdefg12345, saleMoney=0.35681122 ---------- 1212 ---------- 1212 销售员编号 ancdefg12345 类型 销售 下级账号 NaN 销售数量 1.0 业绩系数 8.666 单价 0.041174 业绩金额 0.356811 时间 2021-02-20 14:00:00 Name: 1212, dtype: object saleId=ancdefg12345, saleMoney=0.35681122 。。。
如此即可。