In Python, two libraries can read Word files: python-docx and pywin32.
Library       Pros                          Cons
python-docx   Cross-platform                Handles only .docx, not .doc
pywin32       Handles both .doc and .docx   Windows only
pywin32
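This library is powerful and can do far more than read Word files. Still, there are few articles online about using pywin32 to read .doc files, because it is honestly unpleasant to use.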
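Below is sample code for reading a .doc file with pywin32. Reading tables is a problem, though: the output was all empty, for reasons I never worked out, since I didn't plan to use this approach and didn't dig deeper. In addition, if a table contains vertically merged cells, Word raises an error: "Cannot access individual rows in this collection because the table has vertically merged cells."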
from win32com.client import Dispatch, DispatchEx

word = Dispatch('Word.Application')      # start (or attach to) the Word application
# word = DispatchEx('Word.Application')  # or start a separate Word process
word.Visible = 0         # run in the background without showing the window
word.DisplayAlerts = 0   # suppress alert dialogs
path = r'E:\abc\test.doc'
# Encoding takes an MsoEncoding code page number, not a string; 936 = Simplified Chinese (GBK)
doc = word.Documents.Open(FileName=path, Encoding=936)
for para in doc.Paragraphs:
    print(para.Range.Text)
for t in doc.Tables:
    for row in t.Rows:
        for cell in row.Cells:
            print(cell.Range.Text)
doc.Close()
word.Quit()  # Quit must be called with parentheses to actually shut Word down
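A workaround commonly suggested for the vertical-merge error is to walk the table through `Table.Range.Cells` instead of `Table.Rows`, reading each cell's `RowIndex` and `ColumnIndex`. The sketch below is a hypothetical helper (`read_doc_tables`), not code from this article and not tested against its files; it also strips the paragraph mark and end-of-cell marker (`'\r\x07'`) that Word appends to every cell's `Range.Text`. Whether it also cures the empty-output problem above is unverified.

```python
def clean_cell_text(raw: str) -> str:
    """Strip the trailing '\\r\\x07' (paragraph mark + end-of-cell marker) from Word cell text."""
    return raw.rstrip('\r\x07')


def read_doc_tables(path: str) -> list:
    """Return one {(row, column): text} dict per table in a .doc/.docx file.

    Hypothetical helper. Windows only: requires pywin32 and an installed copy of Word.
    """
    from win32com.client import Dispatch  # imported here so this module loads on any OS

    word = Dispatch('Word.Application')
    word.Visible = 0
    word.DisplayAlerts = 0
    doc = word.Documents.Open(path)
    try:
        tables = []
        for t in doc.Tables:
            grid = {}
            # Range.Cells enumerates every cell, vertically merged ones included,
            # so it sidesteps the row-access error raised by t.Rows.
            for cell in t.Range.Cells:
                grid[(cell.RowIndex, cell.ColumnIndex)] = clean_cell_text(cell.Range.Text)
            tables.append(grid)
        return tables
    finally:
        doc.Close(False)  # close without saving changes
        word.Quit()
```

A vertically merged cell should come back only once, keyed by its top row, so the grid may have gaps at the positions the merge covers.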
pywin32 has one more useful capability, though: it can save a .doc file as .docx, after which python-docx can process it.
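import os
import win32com.client

def doc2docx(path):
    """Convert a .doc file to .docx with Word, delete the original .doc, and return the new path."""
    path = os.path.abspath(path)  # Word does not resolve paths relative to Python's working directory
    w = win32com.client.Dispatch('Word.Application')
    w.Visible = 0
    w.DisplayAlerts = 0
    doc = w.Documents.Open(path)
    newpath = os.path.splitext(path)[0] + '.docx'
    # 12 = wdFormatXMLDocument (.docx); the remaining arguments are SaveAs's defaults, passed positionally
    doc.SaveAs(newpath, 12, False, "", True, "", False, False, False, False)
    doc.Close()
    w.Quit()
    os.remove(path)  # deletes the original .doc
    return newpath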
python-docx
import docx

fn = r'E:\abc\test.docx'
doc = docx.Document(fn)
for paragraph in doc.paragraphs:
    print(paragraph.text)
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
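python-docx also handles vertically merged cells gracefully: instead of raising an error, it returns the merged cell in every row it spans, so its text simply appears once per row.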
"Word could not fire the event" (Word 未能引发事件)
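My crawler converted every .doc file it downloaded to .docx using the method above. At first everything worked; I left it running overnight after work, and when I checked the next day it had failed with the error in the heading above.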
I ran doc2docx on its own against the file that failed, and it worked without error. Searching online for the error message turned up nothing useful.
After repeated tests I found that the error always came from the same web page, so the bug was reproducible; the question was where exactly it came from.
I deleted the code line by line until only what was needed to trigger the error remained:
import os
import re

import requests
from bs4 import BeautifulSoup

# download_dir and homepage are settings defined elsewhere in the crawler;
# doc2docx is the conversion function shown earlier.

def get_winningbid_detail(url, name):
    r = requests.get(url)
    r.encoding = 'utf-8'
    html = r.text
    soup = BeautifulSoup(html, 'lxml')
    ps = soup.find_all(string=re.compile('附件'))  # text nodes containing "附件" (attachment)
    if len(ps) > 0:
        os.makedirs(os.path.join(download_dir, name), exist_ok=True)
        for p in ps:
            a_tab = p.find_next_sibling('a')
            if a_tab is not None:
                link = homepage + a_tab['href']
                localfilename = os.path.join(download_dir, name, a_tab.text)
                # print(localfilename)
                with open(localfilename, 'wb+') as sw:
                    sw.write(requests.get(link).content)
                if localfilename.endswith('.doc'):
                    doc2docx(localfilename)
I read this code over and over and found nothing wrong with it.
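Because attachments on different pages sometimes share the same name (for example 公告.doc, "announcement.doc"), I save the downloaded files into a separate folder per page, named after the page title scraped from the overview page. That is why the function takes a name parameter. And if name is passed as an empty string, the error does not occur.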
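In hindsight, that already pointed to the bug, but it didn't occur to me. Only after a lot more trial and error did I find the cause: the file name, and therefore the full path, was too long.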
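On Windows, a single file name is limited to 255 characters, and a full path (such as E:\abc\test.doc) is limited to 260 characters. A Chinese character counts as 2 characters here, as it does in the GBK code page, where each Chinese character takes 2 bytes.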
The path also ends with a '\0' string terminator, which takes up one character, so the effective maximum length of a full path is 259.
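Given these limits, one way to avoid the error is to check the target path before saving and shorten the file name when it is too long. The sketch below is a hypothetical helper, `fit_path`, not part of the original crawler. It counts widths the conservative way described above (ASCII = 1, any other character = 2) and uses `ntpath` so it builds Windows-style paths on any OS.

```python
import ntpath

MAX_PATH_WIDTH = 259  # 260-character path limit minus the trailing '\0'
MAX_NAME_WIDTH = 255  # limit for a single file name


def path_width(s: str) -> int:
    """Conservative width: ASCII characters count 1, all others (e.g. Chinese) count 2."""
    return sum(1 if ord(ch) < 128 else 2 for ch in s)


def fit_path(directory: str, filename: str,
             max_path: int = MAX_PATH_WIDTH, max_name: int = MAX_NAME_WIDTH) -> str:
    """Join directory and filename, trimming the file-name stem (the extension is kept)
    until both the file name and the full path fit within the limits."""
    stem, ext = ntpath.splitext(filename)

    def too_long(candidate_stem: str) -> bool:
        name = candidate_stem + ext
        return (path_width(name) > max_name
                or path_width(ntpath.join(directory, name)) > max_path)

    while stem and too_long(stem):
        stem = stem[:-1]  # drop one character from the end of the stem
    if not stem or too_long(stem):
        raise ValueError(f"directory is too long for any file name: {directory!r}")
    return ntpath.join(directory, stem + ext)
```

In `get_winningbid_detail`, `localfilename` could then be built with `fit_path(os.path.join(download_dir, name), a_tab.text)` (on Windows, `os.path` is `ntpath`). Since `doc2docx` saves to a `.docx` path that is one character longer than the `.doc` path, passing `max_path=MAX_PATH_WIDTH - 1` for files that will be converted leaves room for that extra character.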