Python: Reading .doc and .docx Word Files, and the "Word could not fire the event" Error

Two Python libraries can read Word files: python-docx and pywin32.

Library        Pros                           Cons
python-docx    Cross-platform                 Handles .docx only, not .doc
pywin32        Handles both .doc and .docx    Windows only

鏈汉瀵逛簬Python瀛︿範鍒涘缓浜嗕竴涓皬灏忕殑瀛︿範鍦堝瓙锛屼负鍚勪綅鎻愪緵浜嗕竴涓钩鍙帮紝澶у涓�璧锋潵璁ㄨ瀛︿範Python銆傛杩庡悇浣嶅埌鏉�Python瀛︿範缇わ細960410445涓�璧疯璁鸿棰戝垎浜涔犮�侾ython鏄湭鏉ョ殑鍙戝睍鏂瑰悜锛屾鍦ㄦ寫鎴樻垜浠殑鍒嗘瀽鑳藉姏鍙婂涓栫晫鐨勮鐭ユ柟寮忥紝鍥犳锛屾垜浠笌鏃朵勘杩涳紝杩庢帴鍙樺寲锛屽苟涓嶆柇鐨勬垚闀匡紝鎺屾彙Python鏍稿績鎶�鏈紝鎵嶆槸鎺屾彙鐪熸鐨勪环鍊兼墍鍦ㄣ��

pywin32

This library is powerful and can do far more than read Word files, yet there are few articles online about using pywin32 to read .doc files, because it is genuinely awkward to use.

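Below is sample code that reads a .doc file with pywin32. Reading tables did not work for me: every cell printed as empty. I don't know why, and since I don't plan to use this approach I did not investigate further. In addition, if a table contains vertically merged cells, Word raises an error: "Cannot access individual rows in this collection because the table has vertically merged cells."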

from win32com.client import Dispatch, DispatchEx

word = Dispatch('Word.Application')      # open the Word application
# word = DispatchEx('Word.Application')  # or start a separate Word process
word.Visible = 0        # run in the background, do not show a window
word.DisplayAlerts = 0  # suppress alert dialogs

path = r'E:\abc\test.doc'
doc = word.Documents.Open(FileName=path, Encoding='gbk')

# Print the text of every paragraph
for para in doc.Paragraphs:
    print(para.Range.Text)

# Print the text of every table cell, row by row
for t in doc.Tables:
    for row in t.Rows:
        for cell in row.Cells:
            print(cell.Range.Text)

doc.Close()
word.Quit()  # close Word

pywin32 does have another useful capability, though: it can save a .doc file as .docx, after which we can process the file with python-docx.

import os

import win32com.client


def doc2docx(path):
    """Save a .doc file as .docx, delete the original, and return the new path."""
    w = win32com.client.Dispatch('Word.Application')
    w.Visible = 0
    w.DisplayAlerts = 0
    doc = w.Documents.Open(path)
    newpath = os.path.splitext(path)[0] + '.docx'
    # 12 is Word's file-format code for .docx (wdFormatXMLDocument)
    doc.SaveAs(newpath, 12, False, "", True, "", False, False, False, False)
    doc.Close()
    w.Quit()
    os.remove(path)  # remove the original .doc
    return newpath
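Because the converter later runs unattended inside a crawler, it may be worth making sure Word is always shut down, even when a step fails. The following is only a sketch under that assumption, not the code I actually ran. The names doc2docx_safe and target_docx_path are made up for illustration, and win32com is imported inside the function so the module can still be loaded on a machine without pywin32.

```python
import os

WD_FORMAT_XML_DOCUMENT = 12  # Word's file-format code for .docx, the same 12 used above


def target_docx_path(path):
    """Return the absolute .docx path that corresponds to a .doc path."""
    return os.path.splitext(os.path.abspath(path))[0] + ".docx"


def doc2docx_safe(path):
    """Convert a .doc file to .docx and quit Word even if a step fails."""
    # Imported here so this module still loads where pywin32 is not installed.
    import win32com.client

    src = os.path.abspath(path)
    dst = target_docx_path(path)
    word = win32com.client.DispatchEx("Word.Application")  # separate Word process
    try:
        word.Visible = 0
        word.DisplayAlerts = 0
        doc = word.Documents.Open(src)
        try:
            doc.SaveAs(dst, WD_FORMAT_XML_DOCUMENT)
        finally:
            doc.Close(False)  # close without saving further changes
    finally:
        word.Quit()
    os.remove(src)  # only reached when the conversion succeeded
    return dst
```

With this structure the original .doc is removed only after a successful save, and a failed conversion does not leave a hidden Word process behind.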

python-docx

import docx

fn = r'E:\abc\test.docx'
doc = docx.Document(fn)

# Print the text of every paragraph
for paragraph in doc.paragraphs:
    print(paragraph.text)

# Print the text of every table cell, row by row
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)

python-docx also handles vertically merged cells gracefully.
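As far as I can tell, python-docx reports a merged region once for every grid position it covers, so the loop above can print the same text several times. The helper below is a rough sketch of one way to skip those repeats. The name unique_cells is made up, and it relies on the private `_tc` attribute of a cell, which is an assumption about python-docx internals rather than documented API.

```python
def unique_cells(cells):
    """Yield each distinct cell once, skipping repeats caused by merged cells."""
    seen = []
    for cell in cells:
        # `_tc` is assumed to be the XML element behind a python-docx cell;
        # fall back to the cell object itself if the attribute is missing.
        key = getattr(cell, "_tc", cell)
        if any(key is earlier for earlier in seen):
            continue
        seen.append(key)
        yield cell


# Possible usage with a python-docx table (not executed here):
# all_cells = (cell for row in table.rows for cell in row.cells)
# for cell in unique_cells(all_cells):
#     print(cell.text)
```

Passing the cells of the whole table rather than one row at a time also filters out the repeats that come from vertical merges.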


Word could not fire the event

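My crawler converts each .doc file it downloads to .docx with the function above. Everything worked at first, so I left it running after work. The next morning I found it had stopped with the error "Word could not fire the event".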


I debugged doc2docx on its own with the file that had triggered the error, and it did not fail. Searching online for the error turned up nothing useful.

After repeated tests I found that the error always occurred on the same web page, so the bug was reproducible. The question was where exactly it was failing.

I deleted the code line by line until only what was needed to reach the error remained:

import os
import re

import requests
from bs4 import BeautifulSoup

# download_dir and homepage are module-level settings defined elsewhere in the crawler.


def get_winningbid_detail(url, name):
    r = requests.get(url)
    r.encoding = 'utf-8'
    html = r.text
    soup = BeautifulSoup(html, 'lxml')
    # Find text nodes that contain '附件' ("attachment")
    ps = soup.find_all(text=re.compile('附件'))
    if len(ps) > 0:
        # One folder per page, named after the page title
        os.makedirs(os.path.join(download_dir, name), exist_ok=True)
        for p in ps:
            a_tab = p.find_next_sibling('a')
            if a_tab is not None:
                link = homepage + a_tab['href']
                localfilename = os.path.join(download_dir, name, a_tab.text)
                # print(localfilename)
                with open(localfilename, 'wb+') as sw:
                    sw.write(requests.get(link).content)
                if localfilename.endswith('.doc'):
                    doc2docx(localfilename)

I read this code over and over and could not find anything wrong.

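Some pages have attachments with the same name, for example 公告.doc ("announcement.doc"), so I store the downloaded files in one folder per page, named after the page title (scraped from the listing page). That is why the function takes a name parameter. If name is passed as an empty string, the error does not occur.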

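That already points to the bug, but I did not see it at the time. Only after much more trial and error did I realize that the file name, meaning the full path including the title folder, was too long.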

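On Windows, a single file name is limited to 255 characters and a full path (such as E:\abc\test.doc) to 260. A Chinese character counts as 2 when the length is measured in a double-byte encoding such as GBK; the Unicode APIs count it as 1.

The path also ends with a string terminator '\0' that takes up one character, so the usable limit for a full path is 259.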
