There are many ways to parse an XML file; the most commonly used are Dom4j and JDOM. This post shows how to use Xalan to parse an XML file.
First, here is an XML file:
    <?xml version="1.0" encoding="UTF-8"?>
    <a:main xmlns="test0" xmlns:a="testt" xmlns:b="test2">
        <a:node>
            <a:nodeA1>A1</a:nodeA1>
            <a:nodeA2>
                <b:nodeB>B1</b:nodeB>
                <b:nodeB>B2</b:nodeB>
            </a:nodeA2>
        </a:node>
        <node>
            <nodeD/>
        </node>
    </a:main>
This is a simple XML file. What we want to do now is use XPath to look up "/a:main/node" in it.
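In practice your expression is usually much more complex; this XPath has already been simplified. You can do the simplification with code like the following: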
    // Perl5Matcher, Perl5Compiler, Perl5Substitution, Pattern, Util and
    // MalformedPatternException all come from org.apache.oro.text.regex.
    private static String simplifyXPathExpression(String xpathExpression) {
        Perl5Matcher matcher = new Perl5Matcher();
        Perl5Compiler compiler = new Perl5Compiler();

        Pattern pattern = null;
        try {
            // Matches "<before>/<step>/..<after>" (or "/parent" in place of "/..").
            pattern = compiler.compile("(.*)/\\s*\\w+\\s*(/(\\.\\.|parent))(.*)");
        } catch (MalformedPatternException e) {
            ExceptionHandler.process(e);
        }

        // Keep <before> and <after>; drop the step together with its parent reference.
        Perl5Substitution substitution = new Perl5Substitution("$1$4", Perl5Substitution.INTERPOLATE_ALL);

        int lengthOfPreviousXPath = 0;

        // Repeat until a pass no longer shortens the expression.
        do {
            lengthOfPreviousXPath = xpathExpression.length();
            if (matcher.matches(xpathExpression, pattern)) {
                xpathExpression = Util.substitute(matcher, pattern, substitution, xpathExpression, Util.SUBSTITUTE_ALL);
            }
        } while (xpathExpression.length() != lengthOfPreviousXPath);

        return xpathExpression;
    }
This uses the classes in the org.apache.oro.text.regex package.
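If you do not have the ORO library on your classpath, the same idea can be sketched with the JDK's own java.util.regex. This is only my illustration, not code from the Dengues project: the class name XPathSimplifier and the method name simplify are made up, and the pattern is copied from the listing above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XPathSimplifier {

    // Same pattern as the ORO version: "<before>/<step>/..<after>",
    // or "/parent" in place of "/..".
    private static final Pattern STEP_THEN_PARENT =
            Pattern.compile("(.*)/\\s*\\w+\\s*(/(\\.\\.|parent))(.*)");

    /** Drops one "step/.." pair per pass until the expression stops changing. */
    public static String simplify(String xpathExpression) {
        String previous;
        do {
            previous = xpathExpression;
            Matcher matcher = STEP_THEN_PARENT.matcher(previous);
            if (matcher.matches()) {
                // Group 1 is the text before the step, group 4 the text after "..".
                xpathExpression = matcher.group(1) + matcher.group(4);
            }
        } while (!xpathExpression.equals(previous));
        return xpathExpression;
    }

    public static void main(String[] args) {
        System.out.println(simplify("/a/b/../c"));
        System.out.println(simplify("/a/b/c/../../d"));
    }
}
```

Each pass removes a single step together with the parent reference that cancels it, so an expression with several ".." steps needs several passes.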
Now let's start parsing the XML file. The code lives in the class org.dengues.commons.jdk.xpath.ComplexXPathUsedXalan in the Dengues project.
     1  public ComplexXPathUsedXalan(String xmlFilename) {
     2      xmlInput = new File(xmlFilename);
     3      if (!xmlInput.exists()) {
     4          throw new RuntimeException("Specified file does not exist!");
     5      }
     6      try {
     7          DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
     8          docFactory.setNamespaceAware(true);
     9          DocumentBuilder builder = docFactory.newDocumentBuilder();
    10          document = builder.parse(xmlInput);
    11          initLastNodes(document.getDocumentElement());
    12          prefixToNamespace.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
    13      } catch (ParserConfigurationException e) {
    14          e.printStackTrace();
    15      } catch (IOException e) {
    16          e.printStackTrace();
    17      } catch (SAXException exception) {
    18          exception.printStackTrace();
    19      }
    20      XPathFactory factory = XPathFactory.newInstance();
    21      xPath = factory.newXPath();
    22      xPath.setNamespaceContext(getNamespaceContext());
    23  }
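This is the class constructor. A few points deserve attention here: 1. Line 8 makes the DocumentBuilderFactory namespace-aware by calling setNamespaceAware(true). This setting is required; without it you will not be able to find the nodes you need.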
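To see what that flag changes, here is a small sketch of my own (the class name NamespaceAwareDemo and the inline XML string are made up for the demo). It parses the same document twice and asks the root element for its namespace URI.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class NamespaceAwareDemo {

    // Hypothetical sample document, a cut-down version of the file above.
    static final String XML = "<a:main xmlns:a=\"testt\"><a:node/></a:main>";

    /** Parses XML and returns the namespace URI reported for the root element. */
    public static String rootNamespace(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return root.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespace-aware:     " + rootNamespace(true));
        System.out.println("not namespace-aware: " + rootNamespace(false));
    }
}
```

With the flag off, the parser builds elements that carry no namespace information, so an XPath step such as a:main has nothing to match against.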
2. Line 12 uses a Map, created earlier as a field, that stores the mapping from each prefix to its namespace:
    private final Map<String, String> prefixToNamespace = new HashMap<String, String>();
Line 12 then registers the namespace that belongs to the built-in xml prefix.
3. Line 11 collects all the prefixes and namespaces. The code is as follows:
    private void initLastNodes(Node node) {
        NodeList childNodes = node.getChildNodes();
        int length = childNodes.getLength();
        int type = node.getNodeType();
        if (type == Node.ELEMENT_NODE) {
            setPrefixToNamespace(node);
        }
        for (int i = 0; i < length; i++) {
            Node item = childNodes.item(i);
            // Recurse into every child element, including childless ones,
            // so that declarations on leaf elements are not missed.
            if (item.getNodeType() == Node.ELEMENT_NODE) {
                initLastNodes(item);
            }
        }
    }

    /**
     * Records every xmlns / xmlns:prefix declaration found on the given element.
     *
     * @param node the element whose attributes are inspected
     */
    private void setPrefixToNamespace(Node node) {
        NamedNodeMap nnm = node.getAttributes();
        for (int i = 0; i < nnm.getLength(); i++) {
            Node attr = nnm.item(i);
            String aname = attr.getNodeName();
            boolean isPrefix = aname.startsWith(XMLConstants.XMLNS_ATTRIBUTE + ":");
            if (isPrefix || aname.equals(XMLConstants.XMLNS_ATTRIBUTE)) {
                int index = aname.indexOf(':');
                // A plain xmlns="..." declaration is stored under the empty default prefix.
                String p = isPrefix ? aname.substring(index + 1) : XMLConstants.DEFAULT_NS_PREFIX;
                prefixToNamespace.put(p, attr.getNodeValue());
            }
        }
    }
This way we obtain every prefix declared on any node of the current XML document.
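As a standalone sketch of the same walk, the class below parses an XML string and returns the prefix map, so you can try it without the rest of the Dengues class. The class name PrefixCollector and the method collect are my own names, and the sample document in main is made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class PrefixCollector {

    /** Returns prefix -> namespace URI for every declaration in the document. */
    public static Map<String, String> collect(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Node root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, String> result = new HashMap<>();
            walk(root, result);
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
    }

    // Visits every element and records its xmlns / xmlns:prefix attributes.
    private static void walk(Node node, Map<String, String> out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        NamedNodeMap attrs = node.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            String name = attrs.item(i).getNodeName();
            if (name.equals("xmlns")) {
                out.put("", attrs.item(i).getNodeValue()); // default namespace
            } else if (name.startsWith("xmlns:")) {
                out.put(name.substring("xmlns:".length()), attrs.item(i).getNodeValue());
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), out);
        }
    }

    public static void main(String[] args) {
        String xml = "<a:main xmlns=\"test0\" xmlns:a=\"testt\">"
                + "<a:node><b:leaf xmlns:b=\"test2\"/></a:node></a:main>";
        System.out.println(collect(xml));
    }
}
```

Note that a flat map keeps only one URI per prefix, so a document that redeclares the same prefix with different URIs at different depths would need something more elaborate.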
4. There is actually another API for looking up prefixes and namespaces: com.sun.org.apache.xml.internal.utils.PrefixResolverDefault. You can use it like this:
    PrefixResolverDefault resolverDefault = new PrefixResolverDefault(document.getDocumentElement());
    String namespace = resolverDefault.getNamespaceForPrefix(prefix);
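You can use this approach, but I think it fails to solve the problem in most cases, for two reasons: (1) you only hand it the document's root node, so it can only see the namespaces declared on the root; (2) when the root node itself uses a namespace prefix (a:main), resolution runs into problems.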
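Reason (1) can be illustrated without touching the internal com.sun class. The sketch below swaps in the standard DOM method Node.lookupNamespaceURI, which, as far as I understand, has the same "start at this node and look upward" scope; it is not the PrefixResolverDefault API itself. The class name RootOnlyLookupDemo and the sample XML are made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class RootOnlyLookupDemo {

    // Hypothetical sample: prefix "b" is declared on a child, not on the root.
    static final String XML = "<a:main xmlns:a=\"testt\">"
            + "<a:node xmlns:b=\"test2\"><b:nodeB/></a:node></a:main>";

    /** Resolves a prefix starting from the root element only. */
    public static String lookupFromRoot(String prefix) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            // Searches this element and its ancestors, never its descendants.
            return root.lookupNamespaceURI(prefix);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("a -> " + lookupFromRoot("a"));
        System.out.println("b -> " + lookupFromRoot("b"));
    }
}
```

A prefix declared further down the tree is invisible from the root, which is why the class in this post walks the whole document instead.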
5. Line 22 of the constructor sets a NamespaceContext. The code is as follows:
    private NamespaceContext getNamespaceContext() {
        return new NamespaceContext() {

            // The only lookup this class implements: prefix -> namespace URI.
            public String getNamespaceURI(String prefix) {
                String namespaceForPrefix = getNamespaceForPrefix(prefix);
                return namespaceForPrefix;
            }

            // Reverse lookups are not implemented in this class.
            public java.util.Iterator getPrefixes(String val) {
                return null;
            }

            public String getPrefix(String uri) {
                return null;
            }
        };
    }

    /**
     * Looks up the namespace for a prefix, falling back to the default namespace.
     *
     * @param prefix the prefix used in the XPath expression
     * @return the mapped namespace, or the default namespace if the prefix is unknown
     */
    protected String getNamespaceForPrefix(String prefix) {
        String namespace = prefixToNamespace.get(prefix);
        if (namespace != null) {
            return namespace;
        }
        return getDefaultNamespace();
    }

    // Returns the value of the plain xmlns="..." attribute on the root element, if any.
    private String getDefaultNamespace() {
        Node parent = document.getDocumentElement();
        int type = parent.getNodeType();
        if (type == Node.ELEMENT_NODE) {
            NamedNodeMap nnm = parent.getAttributes();
            for (int i = 0; i < nnm.getLength(); i++) {
                Node attr = nnm.item(i);
                String aname = attr.getNodeName();
                if (aname.equals(XMLConstants.XMLNS_ATTRIBUTE)) {
                    return attr.getNodeValue();
                }
            }
        }
        return XMLConstants.NULL_NS_URI;
    }
Its main job is to look up the namespace value. When the prefix you pass in is not in the Map, it falls back to the default namespace, which is what getDefaultNamespace() returns.
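The lookup-with-fallback behaviour can also be packaged as a small reusable class. The following is a minimal sketch under my own assumptions: the name MapNamespaceContext and its constructor parameters are hypothetical, and unlike the listing above it also fills in the two reverse lookups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.NamespaceContext;

public class MapNamespaceContext implements NamespaceContext {

    private final Map<String, String> prefixToNamespace;
    private final String defaultNamespace;

    public MapNamespaceContext(Map<String, String> prefixToNamespace, String defaultNamespace) {
        this.prefixToNamespace = new HashMap<>(prefixToNamespace);
        this.defaultNamespace = defaultNamespace;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        // Unknown prefixes fall back to the default namespace.
        String namespace = prefixToNamespace.get(prefix);
        return namespace != null ? namespace : defaultNamespace;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        // Collect every prefix that is bound to the given URI.
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : prefixToNamespace.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                matches.add(entry.getKey());
            }
        }
        return matches.iterator();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", "testt");
        NamespaceContext context = new MapNamespaceContext(map, "test0");
        System.out.println(context.getNamespaceURI("a"));
        System.out.println(context.getNamespaceURI("unknown"));
    }
}
```

An instance of this class could be passed to XPath.setNamespaceContext in the same place where the constructor calls getNamespaceContext().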
Finally, we need to expose an API:
    // "name" selects the result type, for example XPathConstants.NODESET.
    public Object parseXPath(String expression, QName name) {
        try {
            expression = addDefaultPrefix(expression);
            XPathExpression xexpr = xPath.compile(expression);
            // getDocumnent() is spelled as in the original class.
            Object cd = xexpr.evaluate(getDocumnent(), name);
            return cd;
        } catch (XPathExpressionException e) {
            e.printStackTrace();
        }
        return null;
    }

    private String addDefaultPrefix(String xPathExpression) {
        if (XMLConstants.NULL_NS_URI.equals(getDefaultNamespace())) {
            // No default namespace: the expression can be used unchanged.
            return xPathExpression;
        } else {
            StringBuilder expr = new StringBuilder();
            String[] split = xPathExpression.split("/");
            for (String string : split) {
                // Steps that already contain ':' (prefixed) or '.' ("." and "..") are left alone.
                // DEFAULT_NS_PREFIX is the empty string, so a step such as node becomes :node.
                if (!string.equals("") && string.indexOf(':') == -1 && string.indexOf('.') == -1) {
                    expr.append(XMLConstants.DEFAULT_NS_PREFIX + ":");
                }
                expr.append(string + "/");
            }
            // Remove the trailing '/' appended after the last step.
            if (split.length > 0) {
                expr.deleteCharAt(expr.length() - 1);
            }
            return expr.toString();
        }
    }
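This is the interface provided for evaluating XPath expressions. You will notice that it first calls addDefaultPrefix(), whose job is to add a default prefix to the expression when a default namespace exists. That solves the problem.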
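A different way to reach elements in the default namespace, which some readers may find easier to follow, is to bind an explicit prefix of your own to the default namespace URI and write that prefix in the expression. The sketch below is my own alternative, not the approach used in the Dengues class: the prefix d and the class name DefaultNamespaceXPathDemo are made up, and the XML is a shortened copy of the sample file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DefaultNamespaceXPathDemo {

    static final String XML = "<a:main xmlns=\"test0\" xmlns:a=\"testt\" xmlns:b=\"test2\">"
            + "<a:node><a:nodeA1>A1</a:nodeA1></a:node>"
            + "<node><nodeD/></node>"
            + "</a:main>";

    /** Evaluates the expression against the sample XML and returns the match count. */
    public static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));

            final Map<String, String> bindings = new HashMap<>();
            bindings.put("a", "testt");
            bindings.put("b", "test2");
            bindings.put("d", "test0"); // hypothetical prefix for the default namespace

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    String uri = bindings.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }

                public String getPrefix(String uri) {
                    return null; // reverse lookup not needed for this demo
                }

                public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("/a:main/node   -> " + count("/a:main/node"));
        System.out.println("/a:main/d:node -> " + count("/a:main/d:node"));
    }
}
```

The unprefixed step finds nothing because an unprefixed name in XPath 1.0 means "no namespace", while the node element here belongs to test0. The cost of this variant is that callers have to know the extra prefix, whereas addDefaultPrefix() rewrites the expression for them.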
Now let's look at the test code:
    private static void testXPath() {
        ComplexXPathUsedXalan complexXPath = new ComplexXPathUsedXalan("c:/SimpleNamespace.xml");
        String expression = "/a:main/node";
        try {
            NodeList nodes = (NodeList) complexXPath.parseXPath(expression, XPathConstants.NODESET);
            System.out.println("length:" + nodes.getLength());
            for (int i = 0; i < nodes.getLength(); i++) {
                Node item = nodes.item(i);
                System.out.println("Name:" + item.getNodeName() + " value:" + item.getNodeValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
This prints the following result:
    length:1
    Name:node value:null
That's all. Corrections are welcome.
Dengues forum (http://groups.google.com/group/dengues/), a great community for Eclipse developers.