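This article covers the basics of working with XML in dom4j: creating an XML document, adding, modifying, and deleting nodes, formatted (pretty-printed) output, and the encoding of Chinese text. It can serve as an introduction to dom4j.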
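1 Download and installation
dom4j is an open-source project hosted on sourceforge.net, used mainly for parsing XML. The first version was released in July 2001 and several versions have followed; this article uses version 1.6.1.
dom4j is written specifically for Java. It is simple and intuitive to use and has been spreading quickly in the Java community.
The latest version can be downloaded from http://sourceforge.net/projects/dom4j.
The download is an archive named dom4j-1.6.1.zip. After extraction it contains dom4j-1.6.1.jar, the library that an application needs on its classpath. It also contains jaxen-1.1-beta-6.jar, which usually has to be added as well; otherwise execution may fail with java.lang.NoClassDefFoundError: org/jaxen/JaxenException. The other bundled jars are optional.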
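Before running the examples, it can help to confirm that both jars are actually visible to the program. The following is a minimal sketch that uses only the JDK; the class name ClasspathCheck and the helper isOnClasspath are made up for illustration, while the two probed class names are the ones mentioned above.

```java
// Hypothetical helper (not part of dom4j): checks whether a class can be
// located on the classpath without initializing it.
class ClasspathCheck {

    /** Returns true if the named class can be loaded by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Both lines should print "true" once dom4j and jaxen are on the classpath
        System.out.println("dom4j: " + isOnClasspath("org.dom4j.Document"));
        System.out.println("jaxen: " + isOnClasspath("org.jaxen.JaxenException"));
    }
}
```

If the jaxen line prints false, the XPath calls used later (selectNodes) are the ones likely to fail with the NoClassDefFoundError described above.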
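2 Sample XML
For convenience, an XML document is defined first. It may not be ideally designed; its only purpose is to demonstrate basic dom4j operations on an XML document.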
<?xml version="1.0" encoding="GBK"?>
<books>
  <!--This is a test for dom4j,the goal is just for learn the base use dom4j -->
  <book version="1.0">
    <title>Think In Java 0</title>
    <author>Jay Chang0
      <name pastUsedName="Jay"/>
    </author>
    <publisher>Tsing Hua</publisher>
  </book>
  <book version="1.0">
    <title>Think In Java 1</title>
    <author>Jay Chang1
      <name pastUsedName="Jay"/>
    </author>
    <publisher>Tsing Hua</publisher>
  </book>
  <book version="1.0">
    <title>Think In Java 2</title>
    <author>Jay Chang2
      <name pastUsedName="Jay"/>
    </author>
    <publisher>Tsing Hua</publisher>
  </book>
</books>
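The sample XML file stores information about several books. Each book has a title, an author (the author element has a child element name, whose pastUsedName attribute holds a name the author used previously), and a publisher. The book element has a version attribute, which is used later when the document is modified.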
3 Creating an XML document
/**
 * Creates an XML document and writes it to the file given by filePath
 * (for example "D:\\dom4j_example\\books.xml").
 * Requires java.io.FileOutputStream in addition to the dom4j imports.
 *
 * @param filePath path and name of the file to write
 * @return 0 on success, -1 on failure
 */
public static int createXMLDocument(String filePath) {
    Document xmlDocument = DocumentHelper.createDocument();
    // The encoding is set through OutputFormat below, so this call is not needed:
    // xmlDocument.setXMLEncoding("UTF-8");
    // addElement on a Document creates and registers the root element
    Element booksElement = xmlDocument.addElement("books");
    // Add a comment
    booksElement.addComment("This is a test for dom4j,the goal is just for learn the base use dom4j ");
    // Add three book elements to books
    for (int i = 0; i < 3; i++) {
        Element bookOne = booksElement.addElement("book");
        bookOne.addAttribute("version", "1.0");
        Element bookOneTitle = bookOne.addElement("title");
        bookOneTitle.setText("Think In Java " + i);
        Element bookOneAuthor = bookOne.addElement("author");
        bookOneAuthor.setText("Jay Chang" + i);
        Element bookOneAuthorName = bookOneAuthor.addElement("name");
        bookOneAuthorName.addAttribute("pastUsedName", "Jay");
        Element publish = bookOne.addElement("publisher");
        publish.setText("Tsing Hua");
    }
    int ret = -1;
    XMLWriter writer = null;
    try {
        File xmlFile = new File(filePath);
        // Create the target directory if it does not exist yet
        File directory = xmlFile.getParentFile();
        if (directory != null && !directory.exists()) {
            directory.mkdirs();
        }
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setEncoding("GBK");
        // Writing to an OutputStream lets XMLWriter encode the bytes with the
        // encoding set on the format; a FileWriter would use the platform default
        writer = new XMLWriter(new FileOutputStream(xmlFile), format);
        writer.write(xmlDocument);
        ret = 0;
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (writer != null) {
            try {
                writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return ret;
}
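Notes:
Document document = DocumentHelper.createDocument();
This statement creates an XML document object.
Element booksElement = document.addElement("books");
This statement creates an XML element. Called on the document, it adds the root element.
Element has several important methods:
- addComment: adds a comment
- addAttribute: adds an attribute
- addElement: adds a child element
Finally, XMLWriter writes the document to a physical file. Without an explicit format the generated XML is unindented and hard to read. The OutputFormat methods createCompactFormat() and createPrettyPrint() control the layout; the default output is compact, much like that of createCompactFormat().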
Default format:
<?xml version="1.0" encoding="UTF-8"?>
<books><!--This is a test for dom4j,the goal is just for learn the base use dom4j --><book version="1.0"><title>Think In Java 0</title><author>Jay Chang0<name pastUsedName="Jay"/></author><publisher>Tsing Hua</publisher></book><book version="1.0"><title>Think In Java 1</title><author>Jay Chang1<name pastUsedName="Jay"/></author><publisher>Tsing Hua</publisher></book><book version="1.0"><title>Think In Java 2</title><author>Jay Chang2<name pastUsedName="Jay"/></author><publisher>Tsing Hua</publisher></book></books>
With createPrettyPrint() the XML document is laid out much more readably:
OutputFormat format = OutputFormat.createPrettyPrint();
writer = new XMLWriter(new FileOutputStream(newXMLFile), format);
writer.write(xmlDocument);
The result is the layout shown in section 2, which is much easier on the eyes.
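On the Chinese-text issue: non-ASCII characters survive only if the bytes written to the file use the same encoding that the XML declaration announces. The sketch below uses only the JDK to show what goes wrong when the two differ. The class name EncodingMismatchDemo and the helper roundTrip are made up for illustration, and the sample string is written with Unicode escapes so the source file stays ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo (not part of dom4j): encodes text with one charset and
// decodes it with another, which is what a reader does when the XML
// declaration names a different encoding than the writer actually used.
class EncodingMismatchDemo {

    /** Encodes text as writeAs and decodes the resulting bytes as readAs. */
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String text = "\u4e2d\u6587"; // two Chinese characters

        // The same text occupies a different number of bytes in each encoding
        System.out.println(text.getBytes(gbk).length);                    // 4
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 6

        // Matching encodings round-trip cleanly
        System.out.println(text.equals(roundTrip(text, gbk, gbk)));       // true
        // Bytes written as UTF-8 but read as GBK come back garbled
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8, gbk))); // false
    }
}
```

This is why it matters how the XMLWriter is constructed: a Writer that encodes with the platform default can disagree with the encoding set on the OutputFormat, whereas an OutputStream leaves the encoding to XMLWriter.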
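4 Modifying an XML document
The method below demonstrates three operations on the XML document:
1) For every book whose version attribute is 1.0, change the version to 1.1
2) For every book whose publisher is Tsing Hua, change the publisher to Peking University (the code also adds a publishTime child element with the text 2010-03-22)
3) For the book whose title is Think In Java 1, delete the title element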
/**
 * Modifies an XML document: changes an attribute value, changes element text,
 * adds a child element, and removes an element.
 * Requires java.io.FileOutputStream in addition to the dom4j imports.
 *
 * @param filePath    path and name of the XML file to read
 * @param newFilePath path and name of the modified XML file to write
 *                    (for example "D:\\dom4j_example\\books_modified.xml")
 * @return 0 on success, -1 on failure
 */
public static int modifyXMLDocument(String filePath, String newFilePath) {
    SAXReader reader = new SAXReader();
    int ret = -1;
    XMLWriter writer = null;
    try {
        Document xmlDocument = reader.read(new File(filePath));
        // Change 1: set the version attribute of every book from 1.0 to 1.1
        List versionList = xmlDocument.selectNodes("books/book/@version");
        Iterator verAttIt = versionList.iterator();
        while (verAttIt.hasNext()) {
            Attribute attributeVersion = (Attribute) verAttIt.next();
            if ("1.0".equals(attributeVersion.getValue())) {
                attributeVersion.setValue("1.1");
            }
        }
        // Change 2: set the publisher to Peking University and add a
        // publishTime child element with the text 2010-03-22
        List publishList = xmlDocument.selectNodes("books/book/publisher");
        Iterator pubEleIt = publishList.iterator();
        while (pubEleIt.hasNext()) {
            Element pubElement = (Element) pubEleIt.next();
            if ("Tsing Hua".equals(pubElement.getText())) {
                pubElement.setText("Peking University");
                Element pubTime = pubElement.addElement("publishTime");
                pubTime.setText("2010-03-22");
            }
        }
        // Change 3: remove the title element whose text is Think In Java 1
        List bookList = xmlDocument.selectNodes("books/book");
        Iterator bookIt = bookList.iterator();
        while (bookIt.hasNext()) {
            Element bookElement = (Element) bookIt.next();
            Iterator titleIt = bookElement.elementIterator("title");
            while (titleIt.hasNext()) {
                Element titleElement = (Element) titleIt.next();
                if ("Think In Java 1".equals(titleElement.getText())) {
                    bookElement.remove(titleElement);
                }
            }
        }
        // Write the Document to a new XML file
        File newXMLFile = new File(newFilePath);
        File directory = newXMLFile.getParentFile();
        if (directory != null && !directory.exists()) {
            directory.mkdirs();
        }
        OutputFormat format = OutputFormat.createPrettyPrint();
        // An OutputStream lets XMLWriter apply the encoding set on the format
        writer = new XMLWriter(new FileOutputStream(newXMLFile), format);
        writer.write(xmlDocument);
        ret = 0;
    } catch (DocumentException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (writer != null) {
            try {
                writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return ret;
}
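Notes:
List list = document.selectNodes("/books/book/@version");
selects the version attributes of all book elements.
list = document.selectNodes("/books/book");
selects all book elements.
The code above locates the relevant content with XPath. The listing uses the relative form books/book, which selects the same nodes when evaluated on the document.
setValue() and setText() change the content of a node.
remove() deletes a node or an attribute.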
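To try the XPath expressions on their own, without dom4j or jaxen on the classpath, the sketch below evaluates them with the JDK's built-in javax.xml.xpath API. This is a stand-in for illustration, not dom4j's selectNodes. The class name XPathSketch, the helper count, and the SAMPLE constant (a trimmed copy of the document from section 2, without the author elements) are made up for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: counts the nodes an XPath expression selects,
// using the JDK XPath engine rather than dom4j.
class XPathSketch {

    // Trimmed copy of the sample document from section 2
    static final String SAMPLE =
          "<books>"
        + "<book version=\"1.0\"><title>Think In Java 0</title><publisher>Tsing Hua</publisher></book>"
        + "<book version=\"1.0\"><title>Think In Java 1</title><publisher>Tsing Hua</publisher></book>"
        + "<book version=\"1.0\"><title>Think In Java 2</title><publisher>Tsing Hua</publisher></book>"
        + "</books>";

    /** Returns how many nodes the expression selects in SAMPLE. */
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/books/book/@version"));  // 3 version attributes
        System.out.println(count("books/book"));            // 3, relative form selects the same books
        // A predicate narrows the selection to the one matching title element
        System.out.println(count("/books/book[title='Think In Java 1']/title")); // 1
    }
}
```

The last expression shows that the filtering done by hand in change 3 (looping over books and comparing title text) can also be expressed inside the XPath itself.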
5 Complete code
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;

public class Dom4jParse {

    /**
     * Creates an XML document and writes it to the file given by filePath
     * (for example "D:\\dom4j_example\\books.xml").
     *
     * @param filePath path and name of the file to write
     * @return 0 on success, -1 on failure
     */
    public static int createXMLDocument(String filePath) {
        Document xmlDocument = DocumentHelper.createDocument();
        // The encoding can be set through OutputFormat instead:
        // xmlDocument.setXMLEncoding("UTF-8");
        // addElement on a Document creates and registers the root element
        Element booksElement = xmlDocument.addElement("books");
        // Add a comment
        booksElement.addComment("This is a test for dom4j,the goal is just for learn the base use dom4j ");
        // Add three book elements to books
        for (int i = 0; i < 3; i++) {
            Element bookOne = booksElement.addElement("book");
            bookOne.addAttribute("version", "1.0");
            Element bookOneTitle = bookOne.addElement("title");
            bookOneTitle.setText("Think In Java " + i);
            Element bookOneAuthor = bookOne.addElement("author");
            bookOneAuthor.setText("Jay Chang" + i);
            Element bookOneAuthorName = bookOneAuthor.addElement("name");
            bookOneAuthorName.addAttribute("pastUsedName", "Jay");
            Element publish = bookOne.addElement("publisher");
            publish.setText("Tsing Hua");
        }
        int ret = -1;
        XMLWriter writer = null;
        try {
            File xmlFile = new File(filePath);
            // Create the target directory if it does not exist yet
            File directory = xmlFile.getParentFile();
            if (directory != null && !directory.exists()) {
                directory.mkdirs();
            }
            // Uncomment these two lines and pass format to XMLWriter to get
            // the pretty-printed GBK output shown in section 2:
            // OutputFormat format = OutputFormat.createPrettyPrint();
            // format.setEncoding("GBK");
            writer = new XMLWriter(new FileOutputStream(xmlFile));
            writer.write(xmlDocument);
            ret = 0;
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return ret;
    }

    /**
     * Modifies an XML document: changes an attribute value, changes element text,
     * adds a child element, and removes an element.
     *
     * @param filePath    path and name of the XML file to read
     * @param newFilePath path and name of the modified XML file to write
     *                    (for example "D:\\dom4j_example\\books_modified.xml")
     * @return 0 on success, -1 on failure
     */
    public static int modifyXMLDocument(String filePath, String newFilePath) {
        SAXReader reader = new SAXReader();
        int ret = -1;
        XMLWriter writer = null;
        try {
            Document xmlDocument = reader.read(new File(filePath));
            // Change 1: set the version attribute of every book from 1.0 to 1.1
            List versionList = xmlDocument.selectNodes("books/book/@version");
            Iterator verAttIt = versionList.iterator();
            while (verAttIt.hasNext()) {
                Attribute attributeVersion = (Attribute) verAttIt.next();
                if ("1.0".equals(attributeVersion.getValue())) {
                    attributeVersion.setValue("1.1");
                }
            }
            // Change 2: set the publisher to Peking University and add a
            // publishTime child element with the text 2010-03-22
            List publishList = xmlDocument.selectNodes("books/book/publisher");
            Iterator pubEleIt = publishList.iterator();
            while (pubEleIt.hasNext()) {
                Element pubElement = (Element) pubEleIt.next();
                if ("Tsing Hua".equals(pubElement.getText())) {
                    pubElement.setText("Peking University");
                    Element pubTime = pubElement.addElement("publishTime");
                    pubTime.setText("2010-03-22");
                }
            }
            // Change 3: remove the title element whose text is Think In Java 1
            List bookList = xmlDocument.selectNodes("books/book");
            Iterator bookIt = bookList.iterator();
            while (bookIt.hasNext()) {
                Element bookElement = (Element) bookIt.next();
                Iterator titleIt = bookElement.elementIterator("title");
                while (titleIt.hasNext()) {
                    Element titleElement = (Element) titleIt.next();
                    if ("Think In Java 1".equals(titleElement.getText())) {
                        bookElement.remove(titleElement);
                    }
                }
            }
            // Write the Document to a new XML file
            File newXMLFile = new File(newFilePath);
            File directory = newXMLFile.getParentFile();
            if (directory != null && !directory.exists()) {
                directory.mkdirs();
            }
            OutputFormat format = OutputFormat.createPrettyPrint();
            writer = new XMLWriter(new FileOutputStream(newXMLFile), format);
            writer.write(xmlDocument);
            ret = 0;
        } catch (DocumentException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(createXMLDocument("f:\\Dom4j_Learn\\example1\\books.xml"));
        System.out.println(modifyXMLDocument(
                "F:\\Dom4j_Learn\\example1\\books.xml",
                "F:\\Dom4j_Learn\\example1\\books_modified.xml"));
    }
}
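Both methods need the output directory to exist before the file is written. Deriving it by cutting the path string at the last File.separator fails when the path has no directory part or uses a different separator, so a safer route is to let java.io.File work out the parent. The following is a small JDK-only sketch of that step; the class name OutputPathSketch and the helper prepare are made up for illustration.

```java
import java.io.File;

// Hypothetical helper: resolves an output path and makes sure its parent
// directory exists, without splitting the path string by hand.
class OutputPathSketch {

    /** Returns the absolute target file after creating its parent directory if needed. */
    static File prepare(String filePath) {
        // Resolving to an absolute path gives bare names like "books.xml" a parent too
        File target = new File(filePath).getAbsoluteFile();
        File parent = target.getParentFile(); // null only for a filesystem root
        if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
            throw new IllegalStateException("cannot create directory " + parent);
        }
        return target;
    }

    public static void main(String[] args) {
        // A bare file name resolves against the current working directory
        System.out.println(prepare("books.xml").getName());
        // Nested directories under the temp folder are created on demand
        File nested = prepare(System.getProperty("java.io.tmpdir") + "/dom4j_sketch/example1/books.xml");
        System.out.println(nested.getParentFile().isDirectory());
    }
}
```

The returned File can be passed straight to new FileOutputStream(...); there is no need to call createNewFile() first, because opening the stream creates the file.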