Core advantages of NPOI:

“Can't find the classes after installing from NuGet? 90% of beginners fall into this trap!”
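# Visual Studio Package Manager Console
# NPOI.XWPF.UserModel (.docx support) is a namespace inside the main NPOI package.
# Import it with a using directive; it is not a separate package to install.
Install-Package NPOI -Version 2.5.4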
Install-Package NPOI.HWPF -Version 2.5.4
Key points:
.doc (legacy Word format)
.docx (modern Word format)

“Wrote the wrong namespace in the button click handler? Look here!”
// Form1.cs
using System;
using System.Drawing;
using System.Windows.Forms;

public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
        // Create the button in code and wire up its click handler
        Button btnParse = new Button();
        btnParse.Text = "Parse Word";
        btnParse.Location = new Point(50, 50);
        btnParse.Click += BtnParse_Click;
        this.Controls.Add(btnParse);
    }

    private void BtnParse_Click(object sender, EventArgs e)
    {
        try
        {
            // Let the user pick a .doc or .docx file
            OpenFileDialog openFileDialog = new OpenFileDialog();
            openFileDialog.Filter = "Word documents|*.doc;*.docx";
            if (openFileDialog.ShowDialog() == DialogResult.OK)
            {
                string filePath = openFileDialog.FileName;
                ParseWordDocument(filePath);
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show($"Parsing failed: {ex.Message}");
        }
    }
}
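Debugging tips:
try-catch: keeps an unhandled exception from crashing the program
Filter property: restricts the dialog to Word file types

“Paragraph traversal stuck on empty lines? You need to know the hidden rule of RowNum!”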
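private void ParseWordDocument(string filePath)
{
    using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        // Word files need the Word classes: XSSFWorkbook and HSSFWorkbook only open Excel workbooks.
        if (filePath.EndsWith(".docx", StringComparison.OrdinalIgnoreCase))
        {
            XWPFDocument doc = new XWPFDocument(fs);
            foreach (XWPFParagraph paragraph in doc.Paragraphs)
            {
                // Skip empty paragraphs (blank lines in the document)
                if (string.IsNullOrWhiteSpace(paragraph.Text)) continue;
                Console.WriteLine($"Paragraph: {paragraph.Text}");
            }
        }
        else if (filePath.EndsWith(".doc", StringComparison.OrdinalIgnoreCase))
        {
            // Legacy .doc files go through HWPFDocument; XWPFDocument cannot read them.
            HWPFDocument doc = new HWPFDocument(fs);
            // Parsing logic for .doc
        }
    }
}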
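The extension checks above can be pulled into one case-insensitive helper, so that a file named REPORT.DOCX is not silently skipped. This is a minimal sketch using only the standard library; the names WordFormat and WordFormatDetector are made up for illustration and are not part of NPOI.

```csharp
using System;
using System.IO;

// Hypothetical enum for illustration; not an NPOI type.
public enum WordFormat { Unknown, Doc, Docx }

public static class WordFormatDetector
{
    // Maps a file path to a Word format by its extension, ignoring case.
    public static WordFormat FromPath(string filePath)
    {
        if (string.IsNullOrWhiteSpace(filePath)) return WordFormat.Unknown;

        string ext = Path.GetExtension(filePath);
        if (ext.Equals(".docx", StringComparison.OrdinalIgnoreCase)) return WordFormat.Docx;
        if (ext.Equals(".doc", StringComparison.OrdinalIgnoreCase)) return WordFormat.Doc;
        return WordFormat.Unknown;
    }
}
```

A caller could then switch on the result and pick XWPFDocument for Docx or HWPFDocument for Doc. The extension only reflects the file name, so the parser can still reject the content.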
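Advanced optimization:
cell.CellType: distinguishes numeric and date values (spreadsheet cells)
cell.CellStyle: gives access to font and color (spreadsheet cells)

“Table has one row too few? The truth about RowNum counting from 0!”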
// Requires: using System.Linq;
private void ExtractTableData(XWPFDocument doc)
{
    foreach (XWPFTable table in doc.Tables)
    {
        Console.WriteLine("Table found:");
        // Row and column indexes are zero-based
        for (int i = 0; i < table.Rows.Count; i++)
        {
            XWPFTableRow row = table.Rows[i];
            var cells = row.GetTableCells();
            for (int j = 0; j < cells.Count; j++)
            {
                XWPFTableCell cell = cells[j];
                // A cell can hold several paragraphs; join them with line breaks
                string cellText = string.Join("\n", cell.Paragraphs.Select(p => p.Text));
                Console.WriteLine($"Row {i}, column {j}: {cellText}");
            }
        }
    }
}
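Because a joined cell can contain line breaks, writing it out as one record per row needs some escaping. The sketch below assumes the extracted rows are exported as CSV, which the original code does not do; TableText, EscapeCsv and ToCsvLine are hypothetical names, and only the standard library is used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TableText
{
    // Quotes a field when it contains a comma, a double quote, or a line break,
    // and doubles any embedded double quotes (the usual CSV convention).
    public static string EscapeCsv(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0;
        string escaped = field.Replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Builds one CSV record from the cell texts of a single table row.
    public static string ToCsvLine(IEnumerable<string> cells)
    {
        return string.Join(",", cells.Select(c => EscapeCsv(c)));
    }
}
```

Passing the cellText values of one row to ToCsvLine keeps a multi-paragraph cell inside a single quoted field instead of spilling it across several records.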
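Key techniques:
string.Join: merges the paragraphs of a multi-paragraph cell
table.GetCTTbl().GetPos(): retrieves position information

“Image saved to the wrong path? You need an absolute path plus a hashed file name!”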
private void ExtractImages(XWPFDocument doc, string outputDir)
{
    if (!Directory.Exists(outputDir)) Directory.CreateDirectory(outputDir);
    int imageIndex = 0;
    foreach (XWPFPictureData picture in doc.AllPictures)
    {
        string ext = GetImageExtension(picture.GetPictureType());
        string imagePath = Path.Combine(outputDir, $"image_{imageIndex++}{ext}");
        using (FileStream fs = new FileStream(imagePath, FileMode.Create))
        {
            // Data holds the raw bytes of the embedded image
            byte[] data = picture.Data;
            fs.Write(data, 0, data.Length);
            Console.WriteLine($"Image saved: {imagePath}");
        }
    }
}

private string GetImageExtension(int pictureType)
{
    // Values follow the XWPF picture type constants (JPEG = 5, PNG = 6, GIF = 8)
    switch (pictureType)
    {
        case 5: return ".jpg";
        case 6: return ".png";
        case 8: return ".gif";
        default: return ".bin"; // unknown or unhandled type
    }
}
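The heading's "absolute path plus hashed file name" idea can be sketched as follows. This is one possible approach rather than the article's own code: ImageNaming and BuildImagePath are hypothetical names, and the byte array is assumed to be the raw image content taken from the picture data.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ImageNaming
{
    // Returns an absolute path whose file name is derived from a SHA-256 hash
    // of the image bytes, so identical images always map to the same file.
    public static string BuildImagePath(string outputDir, byte[] imageBytes, string extension)
    {
        if (imageBytes == null) throw new ArgumentNullException(nameof(imageBytes));

        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(imageBytes);
            string hex = BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
            // 16 hex characters keep names short; use the full hash if collisions are a concern
            string fileName = "image_" + hex.Substring(0, 16) + extension;
            return Path.Combine(Path.GetFullPath(outputDir), fileName);
        }
    }
}
```

Compared with a running index, a content hash gives stable names across runs and lets duplicate images overwrite each other instead of piling up.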
Notes:
PictureType: may return an unknown type
using statement: ensures the stream is closed properly

“Styles lost? You need to dig into every property of ICellStyle!”
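private void PreserveStyles(ICell cell)
{
    // ICellStyle belongs to NPOI's spreadsheet API; the font is looked up through the owning workbook.
    ICellStyle style = cell.CellStyle;
    IFont font = style.GetFont(cell.Sheet.Workbook);
    Console.WriteLine($"Font: {font.FontName}");
    Console.WriteLine($"Size: {font.FontHeightInPoints}");
    Console.WriteLine($"Fill color index: {style.FillForegroundColor}");
    Console.WriteLine($"Bold: {font.IsBold}");
}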
Extended applications:

“Messy database column names? Unify them with the [Column] attribute!”
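using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

[Table("WordData")]
public class WordDataEntity
{
    [Key]
    public int Id { get; set; }

    // Stored in the OriginalText column
    [Column("OriginalText")]
    public string Content { get; set; }

    // Stored in the ExtractTime column
    [Column("ExtractTime")]
    public DateTime ExtractedAt { get; set; }
}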
EF Core configuration:
Add-Migration InitialCreate
SaveChanges() vs BulkInsert

“Synchronous blocking freezes the UI? Asynchronous programming is the key!”
private async Task SaveToDatabaseAsync(List<WordDataEntity> data)
{
    using (var context = new WordDataContext())
    {
        context.WordData.AddRange(data);
        // Awaiting the save keeps the UI thread free while the database works
        await context.SaveChangesAsync();
    }
}
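When the list is large, one middle ground between a single SaveChanges call and a bulk-insert library is to save in fixed-size batches. The helper below is a generic sketch with hypothetical names (Batching, InBatches) and no EF Core dependency; the batch size is up to the caller.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits a sequence into consecutive lists of at most batchSize items.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Iterate(source, batchSize);
    }

    // Separate iterator so the argument checks above run immediately, not on first enumeration.
    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch; // final partial batch
    }
}
```

Inside SaveToDatabaseAsync, the loop would become: for each batch from Batching.InBatches(data, 500), call AddRange(batch) and await SaveChangesAsync(). The value 500 is only a placeholder to tune against the actual database.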
Performance optimization:
Pooling=true;Max Pool Size=200;

“Corrupted file makes parsing fail? You need robust exception handling!”
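private void SafeParse(string filePath)
{
    try
    {
        using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            // Compare extensions case-insensitively so .DOCX and .DOC are handled too
            if (filePath.EndsWith(".docx", StringComparison.OrdinalIgnoreCase))
            {
                var doc = new XWPFDocument(fs);
                // Parsing logic
            }
            else if (filePath.EndsWith(".doc", StringComparison.OrdinalIgnoreCase))
            {
                var doc = new HWPFDocument(fs);
                // Parsing logic
            }
        }
    }
    catch (IOException ex)
    {
        // File missing, locked by another process, or unreadable
        Console.WriteLine($"IO error: {ex.Message}");
    }
    catch (InvalidDataException ex)
    {
        Console.WriteLine($"Invalid file format: {ex.Message}");
    }
}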
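A renamed or truncated file can be caught before it reaches the parser by looking at its first bytes. A .docx file is a ZIP archive, which starts with the bytes 50 4B 03 04, and a legacy .doc file is an OLE2 compound file, which starts with D0 CF 11 E0 A1 B1 1A E1. The sketch below uses only the standard library; ContainerKind and WordSignature are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical enum for illustration; not an NPOI type.
public enum ContainerKind { Unknown, Zip, Ole2 }

public static class WordSignature
{
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };
    private static readonly byte[] Ole2Magic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Reads up to 8 bytes, classifies the container, and rewinds seekable streams.
    public static ContainerKind Sniff(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));

        byte[] header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream: file shorter than 8 bytes
            read += n;
        }
        if (stream.CanSeek) stream.Seek(-read, SeekOrigin.Current);

        if (StartsWith(header, read, ZipMagic)) return ContainerKind.Zip;
        if (StartsWith(header, read, Ole2Magic)) return ContainerKind.Ole2;
        return ContainerKind.Unknown;
    }

    private static bool StartsWith(byte[] data, int length, byte[] magic)
    {
        if (length < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }
}
```

The check only identifies the container: other ZIP-based formats such as .xlsx also report Zip, so the try-catch around the actual parse is still needed.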
Logging:
Log.Information("Parsing started")

“A 500 MB document blows up memory? Stream processing is the key!”
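private void StreamParse(string filePath)
{
    using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        if (filePath.EndsWith(".docx", StringComparison.OrdinalIgnoreCase))
        {
            // XWPFDocument still loads the whole document when it opens; handling one
            // paragraph at a time only avoids building extra copies of the text.
            using (var doc = new XWPFDocument(fs))
            {
                foreach (var para in doc.Paragraphs)
                {
                    ProcessParagraph(para);
                }
            }
        }
    }
}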
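One way to keep allocations down while walking the paragraphs is to reuse a single StringBuilder and hand text downstream in bounded chunks. This is a sketch under assumptions: ParagraphChunker and Chunk are hypothetical names, the input is the paragraph text as plain strings, and maxChars is a limit chosen by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ParagraphChunker
{
    // Groups paragraph texts into newline-joined chunks of at most maxChars characters.
    // A single paragraph longer than maxChars is emitted as its own oversized chunk.
    public static IEnumerable<string> Chunk(IEnumerable<string> paragraphs, int maxChars)
    {
        var buffer = new StringBuilder(); // one builder reused for every chunk
        foreach (string text in paragraphs)
        {
            if (string.IsNullOrWhiteSpace(text)) continue; // skip blank paragraphs

            // Flush first if adding this paragraph (plus a separator) would exceed the limit
            if (buffer.Length > 0 && buffer.Length + 1 + text.Length > maxChars)
            {
                yield return buffer.ToString();
                buffer.Clear();
            }
            if (buffer.Length > 0) buffer.Append('\n');
            buffer.Append(text);
        }
        if (buffer.Length > 0) yield return buffer.ToString();
    }
}
```

Because the method is a lazy iterator, each chunk can be processed or saved before the next one is built, so only one chunk of extracted text is held at a time.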
Key strategies:
Reuse objects such as StringBuilder

“Once you master NPOI, your productivity jumps tenfold!”
Core idea:
Advanced directions:
Ultimate goal: