Apache POI: extracting the content of various document types


POI version: 3.9

Before each operation, obtain a FileInputStream for the file:

 

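// file is the java.io.File to read; close fis once extraction is done
FileInputStream fis;
try {
    fis = new FileInputStream(file);
} catch (FileNotFoundException fnfe) {
    return; // the file does not exist or cannot be opened
}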
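Every section below starts from the same fis, so the caller must already know which format the file is in. An extension can be wrong; the leading bytes are more reliable. The sketch below is plain JDK code with a hypothetical class name (FormatSniffer). It reads up to 8 bytes inside try-with-resources, so the stream is closed even on error, and compares them with known signatures: OLE2 compound files (.doc, .xls, .ppt) begin with D0 CF 11 E0 A1 B1 1A E1, OOXML packages (.docx, .xlsx) are ZIP archives beginning with "PK\3\4", and PDFs begin with "%PDF". Treating every ZIP as OOXML is an assumption of this sketch.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: guesses the container format from a file's first bytes.
final class FormatSniffer {
    // OLE2 compound file header (.doc, .xls, .ppt)
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // ZIP local file header "PK\3\4"; assumed here to be an OOXML package (.docx, .xlsx)
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};
    // "%PDF"
    private static final byte[] PDF = {0x25, 0x50, 0x44, 0x46};

    static String detect(File file) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        // try-with-resources closes the stream even if a read fails
        try (InputStream in = new FileInputStream(file)) {
            int r;
            // read() may return fewer bytes than requested, so loop until 8 bytes or EOF
            while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
                n += r;
            }
        }
        return classify(head, n);
    }

    // Returns "OLE2", "OOXML", "PDF" or "UNKNOWN" for the first len bytes of head.
    static String classify(byte[] head, int len) {
        if (startsWith(head, len, OLE2)) return "OLE2";
        if (startsWith(head, len, ZIP)) return "OOXML";
        if (startsWith(head, len, PDF)) return "PDF";
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] head, int len, byte[] magic) {
        if (len < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) return false;
        }
        return true;
    }
}
```

An "OLE2" result points to sections 1, 3 or 5, "OOXML" to sections 2 or 4, and "PDF" to section 6.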

 

1. Get the content of Word 2003 and earlier (.doc) files.

 

// org.apache.poi.hwpf.extractor.WordExtractor reads binary .doc files
WordExtractor wordExtractor = new WordExtractor(fis);
String result = wordExtractor.getText();

 

2. Get the content of Word 2007 (.docx) files.

 

// org.apache.poi.xwpf.extractor.XWPFWordExtractor reads .docx files
XWPFWordExtractor xwpfWordExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
String result = xwpfWordExtractor.getText();
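XWPFWordExtractor hides the fact that a .docx file is a ZIP package whose body text lives in the part word/document.xml, as w:t elements grouped into w:p paragraphs. The sketch below reads that part directly with java.util.zip and the JDK's DOM parser. It is not POI and not a replacement for it: it skips headers, footers and footnotes, and text nested inside other paragraphs (for example in text boxes) may appear twice. It only illustrates where the extracted text comes from; the class name DocxPeek is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration (not POI): pulls paragraph text out of a .docx package.
final class DocxPeek {
    // Main WordprocessingML namespace used in word/document.xml
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one line per w:p paragraph; the caller owns and closes docx.
    static String text(InputStream docx) throws Exception {
        ZipInputStream zip = new ZipInputStream(docx);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (!entry.getName().equals("word/document.xml")) continue;
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            // refuse DOCTYPE declarations so a crafted file cannot trigger XXE
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // ZipInputStream reports end-of-stream at the end of the current entry
            Document doc = factory.newDocumentBuilder().parse(zip);
            StringBuilder sb = new StringBuilder();
            NodeList paragraphs = doc.getElementsByTagNameNS(W, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                // w:t elements hold the literal text of each run
                NodeList texts = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W, "t");
                for (int j = 0; j < texts.getLength(); j++) {
                    sb.append(texts.item(j).getTextContent());
                }
                sb.append('\n'); // one line per paragraph
            }
            return sb.toString();
        }
        throw new IOException("word/document.xml not found");
    }
}
```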

 

3. Get the content of Excel 2003 and earlier (.xls) files.

 

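HSSFWorkbook wb = new HSSFWorkbook(new POIFSFileSystem(fis));
StringBuilder sb = new StringBuilder();
for (int sheetNum = 0; sheetNum < wb.getNumberOfSheets(); sheetNum++) {
    HSSFSheet sheet = wb.getSheetAt(sheetNum);
    for (int sheetRow = 0; sheetRow <= sheet.getLastRowNum(); sheetRow++) { // last row index is inclusive
        HSSFRow row = sheet.getRow(sheetRow);
        if (row == null) continue; // rows never written are null
        for (int cellNum = 0; cellNum < row.getLastCellNum(); cellNum++) { // last cell number is exclusive
            HSSFCell cell = row.getCell(cellNum);
            if (cell != null) sb.append(cell.toString()).append(' ');
        }
        sb.append('\n');
    }
}
String result = sb.toString();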

4. Get the content of Excel 2007 (.xlsx) files.

 

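XSSFWorkbook wb = new XSSFWorkbook(fis);
StringBuilder sb = new StringBuilder();
for (int sheetNum = 0; sheetNum < wb.getNumberOfSheets(); sheetNum++) {
    XSSFSheet sheet = wb.getSheetAt(sheetNum);
    for (int sheetRow = 0; sheetRow <= sheet.getLastRowNum(); sheetRow++) { // last row index is inclusive
        XSSFRow row = sheet.getRow(sheetRow);
        if (row == null) continue; // rows never written are null
        for (int cellNum = 0; cellNum < row.getLastCellNum(); cellNum++) { // last cell number is exclusive
            XSSFCell cell = row.getCell(cellNum);
            if (cell != null) sb.append(cell.toString()).append(' ');
        }
        sb.append('\n');
    }
}
String result = sb.toString();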
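The Excel traversals in sections 3 and 4 rest on two POI conventions that are easy to get wrong. getLastRowNum() returns the 0-based index of the last row, so the row loop uses <=, while getLastCellNum() returns the last cell index plus one, so the cell loop uses <. And getRow/getCell return null for rows and cells that were never written. The sketch below replays that traversal over a hypothetical List&lt;List&lt;String&gt;&gt; stand-in for a sheet, where null models a missing row or cell, so it runs without POI; the class name SheetText is also hypothetical.

```java
import java.util.List;

// Hypothetical in-memory stand-in for a POI sheet: null = row/cell never written.
final class SheetText {
    // Tab-separated cells, one line per non-null row.
    static String flatten(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        int lastRowNum = rows.size() - 1;           // like getLastRowNum(): 0-based index
        for (int r = 0; r <= lastRowNum; r++) {     // inclusive bound
            List<String> row = rows.get(r);
            if (row == null) continue;              // missing row: skipped entirely
            int lastCellNum = row.size();           // like getLastCellNum(): last index + 1
            for (int c = 0; c < lastCellNum; c++) { // exclusive bound
                if (c > 0) sb.append('\t');
                String cell = row.get(c);
                if (cell != null) sb.append(cell);  // missing cell: column left empty
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Keeping an empty column for a missing cell preserves alignment between rows, whereas skipping null rows means blank rows vanish from the output; either choice is a matter of what the extracted text is used for.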

 

5. Get the content of PowerPoint (.ppt) files.

 

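StringBuilder sb = new StringBuilder();
SlideShow ss = new SlideShow(new HSLFSlideShow(fis));
Slide[] s = ss.getSlides();
for (Slide slide : s) {
    for (TextRun run : slide.getTextRuns()) { // text blocks on the slide
        sb.append(run.getText()).append('\n');
    }
}
String result = sb.toString();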

 

6. Get the content of PDF files (this uses Apache PDFBox rather than POI).

 

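// PDFBox 1.x API: org.apache.pdfbox.pdfparser.PDFParser, org.apache.pdfbox.util.PDFTextStripper
PDFParser parser = new PDFParser(fis);
parser.parse();
PDDocument pdDocument = parser.getPDDocument();
PDFTextStripper stripper = new PDFTextStripper();
String result = stripper.getText(pdDocument);
pdDocument.close(); // release the parsed document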

 

 

Reposted from: https://my.oschina.net/jiangli0502/blog/119623
