[python-docx] Converting Word (docx) to Markdown with Python

Using python-docx

Description: use the docx library (python-docx) to extract all the headings from a Word document and convert them to Markdown format.

Still to be done: image extraction and table extraction.
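
For the pending table extraction, one possible starting point is sketched below. It is not part of the original script. It relies on python-docx's `Document.tables`, `table.rows`, `row.cells` and `cell.text`; the function names `rows_to_markdown` and `tables_to_markdown` are hypothetical, and treating the first row as the header is an assumption. Merged cells may show up repeated in `row.cells`, which this sketch does not handle.

```python
def rows_to_markdown(rows):
    """Render a list of rows (lists of cell strings) as a Markdown table."""
    if not rows:
        return ""

    def clean(cell):
        # Markdown table cells cannot hold raw newlines or unescaped pipes.
        return cell.replace("\n", " ").replace("|", "\\|").strip()

    lines = ["| " + " | ".join(clean(c) for c in row) + " |" for row in rows]
    # Assumption: the first row is the header, so the separator follows it.
    separator = "| " + " | ".join("---" for _ in rows[0]) + " |"
    lines.insert(1, separator)
    return "\n".join(lines)


def tables_to_markdown(path):
    """Return one Markdown table string per table in the .docx file."""
    # Imported here so rows_to_markdown can be used without python-docx.
    from docx import Document

    document = Document(path)
    return [
        rows_to_markdown([[cell.text for cell in row.cells] for row in table.rows])
        for table in document.tables
    ]
```

Keeping the rendering in `rows_to_markdown` separate from the file reading makes it possible to check the Markdown output without a Word file at hand.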

demo.py

from docx import Document

path = '2_test.docx'  # file path
wordfile = Document(path)  # read the file

paragraphs = wordfile.paragraphs

list_txt = []

title1_number = 0
title2_number = 0

for paragraph in paragraphs:
    print(paragraph.style.name)
    print(paragraph.text)

    if paragraph.style.name == 'Heading 1':
        title1_number += 1
        title1 = f'# {
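
The listing breaks off at the line that builds the first-level heading, so the rest of the original is not available. As a minimal sketch of the idea stated in the description (not the author's code), the following maps paragraphs whose style is named `Heading N` to Markdown `#` headings. The names `heading_to_markdown` and `docx_to_markdown` are hypothetical, the `title1_number` / `title2_number` counters are left out because their use is not shown, and the document is assumed to use the default English style names.

```python
import re

# python-docx reports built-in heading styles as "Heading 1" ... "Heading 9"
# (assumption: the document uses the default English style names).
_HEADING_RE = re.compile(r"^Heading ([1-9])$")


def heading_to_markdown(style_name, text):
    """Return the Markdown line for one paragraph (hypothetical helper)."""
    match = _HEADING_RE.match(style_name)
    if match:
        # Markdown only defines six heading levels, so deeper ones are clamped.
        level = min(int(match.group(1)), 6)
        return "#" * level + " " + text.strip()
    return text


def docx_to_markdown(path):
    """Convert the paragraphs of a .docx file to Markdown text (sketch)."""
    # Imported here so heading_to_markdown can be used without python-docx.
    from docx import Document

    lines = []
    for paragraph in Document(path).paragraphs:
        if not paragraph.text.strip():
            continue  # skip empty paragraphs
        lines.append(heading_to_markdown(paragraph.style.name, paragraph.text))
    return "\n\n".join(lines)
```

With the file from the listing, this could be called as `print(docx_to_markdown('2_test.docx'))`.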
     
