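XPath (XML Path Language) is a query language for navigating XML documents and selecting nodes from them. It was designed to make extracting specific information from an XML document straightforward, much as SQL does for a database. XPath is used throughout the XML technology stack, for example in XSLT, XQuery, and DOM-based APIs.
XPath's core strength is that a single compact expression can address nodes anywhere in the document tree, which makes it an essential skill for anyone who works with XML data.
Several versions of XPath exist: XPath 1.0 (1999), XPath 2.0 (2007), XPath 3.0 (2014), and XPath 3.1 (2017).
This tutorial focuses on XPath 1.0 because it is the most widely supported version, in particular by the javax.xml.xpath package in the Java standard library.
Before diving into XPath, let's briefly review how an XML document is put together: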
<bookstore>
  <book category="fiction">
    <title lang="en">Harry Potter</title>
    <author>J.K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="science">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
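An XML document is made up of the following parts:
Elements: a start tag, an end tag, and the content between them (<book>...</book>)
Attributes: name/value pairs on a start tag (category="fiction")
Text: the character data inside an element (Harry Potter)
Comments: begin with <!-- and end with -->
Processing instructions: begin with <? and end with ?>
XPath's job is to locate and select specific nodes within this kind of document structure.
XPath treats an XML document as a tree of nodes. XPath 1.0 defines seven node types:
Root node: the top of the tree, whose child is the document element
Element node: for example <book>
Attribute node: for example category="fiction"
Text node: for example Harry Potter
Comment node
Processing instruction node
Namespace node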
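XPath selects nodes with path expressions. The syntax resembles a file system path: a sequence of steps separated by slashes (/).
Basic syntax of a single step:
axisname::nodetest[predicate]
Axis name: the direction to move in relative to the current node (child, parent, ancestor, and so on)
Node test: the name or kind of node to select
Predicate: an optional condition enclosed in [] that filters the selected nodes further
In most cases the abbreviated syntax is enough: when the axis name is omitted, the child axis is used.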
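To make the step syntax concrete, the following sketch evaluates the same query in its unabbreviated and abbreviated forms and confirms that both select the same number of nodes. The class name StepSyntaxDemo and the inline sample XML are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class StepSyntaxDemo {
    // Hypothetical sample data modelled on the bookstore document.
    static final String XML =
        "<bookstore>"
        + "<book category=\"fiction\"><title>Harry Potter</title></book>"
        + "<book category=\"science\"><title>Learning XML</title></book>"
        + "</bookstore>";

    /** Returns how many nodes the expression selects in the sample document. */
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Unabbreviated form: every step names its axis explicitly.
        String full = "/child::bookstore/child::book[attribute::category='fiction']/child::title";
        // Abbreviated form: child:: is implied and @ stands for attribute::.
        String abbreviated = "/bookstore/book[@category='fiction']/title";
        System.out.println(count(full) == count(abbreviated));
    }
}
```

The abbreviated form is what most real-world expressions use; the explicit form becomes necessary only for axes that have no shorthand, such as ancestor or following-sibling.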
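XPath supports two kinds of paths: absolute and relative.
An absolute path begins with a forward slash (/) and starts from the document root:
/bookstore/book/title
This expression selects every title element that is a child of a book element that is itself a child of the root bookstore element.
A relative path does not begin with a slash and starts from the current context node:
book/title
Starting from the current context node, this selects the title elements under its book child elements.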
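The difference between the two kinds of path shows up as soon as an expression is evaluated against something other than the document itself. The sketch below (class name, helper names, and sample XML are illustrative assumptions) uses the second book element as the context node: the relative path title sees only that book, while the absolute path still starts at the document root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ContextNodeDemo {
    // Hypothetical sample data modelled on the bookstore document.
    static final String XML =
        "<bookstore>"
        + "<book><title>Harry Potter</title></book>"
        + "<book><title>Learning XML</title></book>"
        + "</bookstore>";

    /** Evaluates the expression with the book at the given index as context node. */
    static NodeList fromBook(int index, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList books = (NodeList) xpath.evaluate(
                "/bookstore/book", doc, XPathConstants.NODESET);
            Node context = books.item(index);
            return (NodeList) xpath.evaluate(expression, context, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Relative path: resolved against the second book only.
        NodeList relative = fromBook(1, "title");
        System.out.println(relative.getLength() + " " + relative.item(0).getTextContent());

        // Absolute path: the context node is ignored apart from identifying the document.
        NodeList absolute = fromBook(1, "/bookstore/book/title");
        System.out.println(absolute.getLength());
    }
}
```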
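XPath provides several path operators for building expressions:
/ : selects from the root node, or separates two steps
// : selects matching nodes at any depth below the current position
. : the current node
.. : the parent of the current node
@ : selects an attribute
Examples:
/bookstore/book # selects the book elements that are direct children of bookstore
//book # selects book elements anywhere in the document
./title # selects the title children of the current node
../@category # selects the category attribute of the parent node
/bookstore/book/@category # selects the category attribute of each book element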
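A predicate is an expression placed in square brackets [] that filters the selected nodes further:
/bookstore/book[1] # selects the first book element
/bookstore/book[last()] # selects the last book element
/bookstore/book[position()<3] # selects the first two book elements
/bookstore/book[@category="fiction"] # selects book elements whose category attribute is "fiction"
/bookstore/book[price>30] # selects book elements whose price child has a value greater than 30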
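The predicate forms above can be tried out with a few lines of Java. This is a minimal sketch: the class name PredicateDemo, the titles helper, and the third book in the sample data are assumptions added so that position-based predicates have something to distinguish.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PredicateDemo {
    // Hypothetical sample data; the third book is invented for this example.
    static final String XML =
        "<bookstore>"
        + "<book category=\"fiction\"><title>Harry Potter</title><price>29.99</price></book>"
        + "<book category=\"science\"><title>Learning XML</title><price>39.95</price></book>"
        + "<book category=\"web\"><title>Sample Web Book</title><price>49.99</price></book>"
        + "</bookstore>";

    /** Returns the titles of the books selected by the given book expression. */
    static List<String> titles(String bookExpression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(bookExpression + "/title", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + bookExpression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles("/bookstore/book[1]"));
        System.out.println(titles("/bookstore/book[last()]"));
        System.out.println(titles("/bookstore/book[position()<3]"));
        System.out.println(titles("/bookstore/book[@category='fiction']"));
        System.out.println(titles("/bookstore/book[price>30]"));
    }
}
```

Note that positions in XPath start at 1, not 0, so book[1] is the first book.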
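XPath provides wildcards for matching multiple nodes:
* : matches any element node
@* : matches any attribute node
node() : matches a node of any kind
Examples:
/bookstore/* # selects all child elements of bookstore
//* # selects every element in the document
//book/@* # selects all attributes of all book elements
//text() # selects all text nodes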
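As a quick check of what each wildcard matches, the sketch below counts the nodes selected in a small document. The class name WildcardDemo and the compact sample XML are assumptions; the XML is written without whitespace between tags so that //text() counts only the text inside leaf elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class WildcardDemo {
    // Hypothetical sample data: 1 bookstore, 2 books, 2 titles, 2 prices.
    static final String XML =
        "<bookstore>"
        + "<book category=\"fiction\"><title lang=\"en\">Harry Potter</title>"
        + "<price>29.99</price></book>"
        + "<book category=\"science\"><title lang=\"en\">Learning XML</title>"
        + "<price>39.95</price></book>"
        + "</bookstore>";

    /** Returns how many nodes the expression selects in the sample document. */
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/bookstore/*")); // the two book elements
        System.out.println(count("//*"));          // every element in the document
        System.out.println(count("//book/@*"));    // the category attributes
        System.out.println(count("//@*"));         // category and lang attributes
        System.out.println(count("//text()"));     // one text node per title and price
    }
}
```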
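An axis defines a set of nodes relative to the current node. XPath 1.0 defines thirteen axes, which make it possible to navigate an XML document flexibly.
self: the current node itself
self::node() # same as .
child: all children of the current node (the default axis)
child::book # same as book
parent: the parent of the current node
parent::node() # same as ..
ancestor: all ancestors of the current node (parent, grandparent, and so on)
ancestor::bookstore
ancestor-or-self: the current node and all of its ancestors
ancestor-or-self::node()
descendant: all descendants of the current node (children, grandchildren, and so on)
descendant::price
descendant-or-self: the current node and all of its descendants
descendant-or-self::node() # // is shorthand for /descendant-or-self::node()/
following-sibling: all siblings that come after the current node
following-sibling::book
preceding-sibling: all siblings that come before the current node
preceding-sibling::book
following: all nodes that start after the current node's end tag (descendants are excluded)
following::book
preceding: all nodes that end before the current node's start tag (ancestors are excluded)
preceding::book
attribute: all attributes of the current node
attribute::category # same as @category
namespace: all namespace nodes of the current node
namespace::*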
Suppose we have the following XML document:
<bookstore>
  <category name="fiction">
    <book id="b1">
      <title>Harry Potter</title>
      <author>J.K. Rowling</author>
    </book>
    <book id="b2">
      <title>The Lord of the Rings</title>
      <author>J.R.R. Tolkien</author>
    </book>
  </category>
  <category name="science">
    <book id="b3">
      <title>Learning XML</title>
      <author>Erik T. Ray</author>
    </book>
  </category>
</bookstore>
Some examples of axes in use:
//book[title='Harry Potter']/parent::* # selects the parent of the "Harry Potter" book (the fiction category element)
//book[title='Harry Potter']/ancestor::bookstore # selects the bookstore ancestor of the "Harry Potter" book
//book[title='Harry Potter']/following-sibling::book # selects all book siblings after "Harry Potter" (here b2)
//category[@name='fiction']/descendant::author # selects all author elements under the fiction category
//book[title='Harry Potter']/preceding::book # selects all book elements before "Harry Potter" (none here, because it is the first book)
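The axis examples can be verified against the nested document with a short program. This is a sketch under stated assumptions: the class name AxisDemo and the values helper are invented for illustration, and the sample XML omits the author elements to stay compact.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AxisDemo {
    // Hypothetical sample data modelled on the nested bookstore document.
    static final String XML =
        "<bookstore>"
        + "<category name=\"fiction\">"
        + "<book id=\"b1\"><title>Harry Potter</title></book>"
        + "<book id=\"b2\"><title>The Lord of the Rings</title></book>"
        + "</category>"
        + "<category name=\"science\">"
        + "<book id=\"b3\"><title>Learning XML</title></book>"
        + "</category>"
        + "</bookstore>";

    /** Returns the string values of all nodes selected by the expression. */
    static List<String> values(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // parent: the category that contains book b1
        System.out.println(values("//book[@id='b1']/parent::*/@name"));
        // following-sibling stays inside the same category
        System.out.println(values("//book[@id='b1']/following-sibling::book/@id"));
        // following crosses into later categories as well
        System.out.println(values("//book[@id='b1']/following::book/@id"));
        // preceding from the last book reaches both earlier books
        System.out.println(values("//book[@id='b3']/preceding::book/@id"));
    }
}
```

The contrast between following-sibling and following is the part that most often surprises: the former never leaves the parent element, the latter covers the rest of the document.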
Java ships with a standard API for processing XML and running XPath queries, provided mainly by the javax.xml.xpath package.
First, import the required classes:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
The following basic example uses XPath to query nodes in an XML document:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class XPathExample {
    public static void main(String[] args) {
        try {
            // Parse the XML document
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse("bookstore.xml");

            // Create the XPath object
            XPathFactory xPathFactory = XPathFactory.newInstance();
            XPath xpath = xPathFactory.newXPath();

            // Compile the XPath expression
            XPathExpression expr = xpath.compile("//book[price>30]/title");

            // Run the query and fetch the result
            NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);

            // Process the result
            System.out.println("Found " + nodes.getLength() + " book(s) priced above 30:");
            for (int i = 0; i < nodes.getLength(); i++) {
                System.out.println(nodes.item(i).getTextContent());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
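This example demonstrates how to parse an XML document into a DOM tree, create an XPath object, compile an expression, evaluate it against the document, and iterate over the resulting nodes.
An XPath expression can produce different types of result. In Java, the desired type is specified with a constant from the XPathConstants class:
XPathConstants.NODESET: a set of nodes, returned as org.w3c.dom.NodeList
XPathConstants.NODE: a single node, returned as org.w3c.dom.Node
XPathConstants.STRING: a string, returned as java.lang.String
XPathConstants.NUMBER: a number, returned as java.lang.Double
XPathConstants.BOOLEAN: a boolean, returned as java.lang.Boolean
Example:
// Get a node-set
NodeList bookList = (NodeList) xpath.evaluate("//book", document, XPathConstants.NODESET);
// Get a single node (requires import org.w3c.dom.Node)
Node firstBook = (Node) xpath.evaluate("//book[1]", document, XPathConstants.NODE);
// Get a string value
String title = (String) xpath.evaluate("//book[1]/title/text()", document, XPathConstants.STRING);
// Get a numeric value
Double price = (Double) xpath.evaluate("sum(//book/price)", document, XPathConstants.NUMBER);
// Get a boolean value
Boolean hasExpensiveBooks = (Boolean) xpath.evaluate("boolean(//book[price>50])", document, XPathConstants.BOOLEAN);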
The following simple utility class wraps the most common XPath operations:
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
public class XPathUtils {

    /**
     * Parses an XML string into a DOM document.
     */
    public static Document parseXmlString(String xmlString) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xmlString)));
    }

    /**
     * Returns the list of nodes that match the XPath expression.
     */
    public static NodeList getNodeList(Document document, String xpathExpression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.compile(xpathExpression).evaluate(document, XPathConstants.NODESET);
    }

    /**
     * Returns the first node (in document order) that matches the XPath expression.
     */
    public static Node getNode(Document document, String xpathExpression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Node) xpath.compile(xpathExpression).evaluate(document, XPathConstants.NODE);
    }

    /**
     * Returns the result of the XPath expression as a string.
     */
    public static String getString(Document document, String xpathExpression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (String) xpath.compile(xpathExpression).evaluate(document, XPathConstants.STRING);
    }

    /**
     * Returns the result of the XPath expression as a number.
     */
    public static Double getNumber(Document document, String xpathExpression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.compile(xpathExpression).evaluate(document, XPathConstants.NUMBER);
    }

    /**
     * Returns the result of the XPath expression as a boolean.
     */
    public static Boolean getBoolean(Document document, String xpathExpression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.compile(xpathExpression).evaluate(document, XPathConstants.BOOLEAN);
    }

    /**
     * Returns the text content of every matching node.
     */
    public static List<String> getTextList(Document document, String xpathExpression) throws Exception {
        NodeList nodes = getNodeList(document, xpathExpression);
        List<String> textList = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            textList.add(nodes.item(i).getTextContent());
        }
        return textList;
    }
}
Using the utility class:
import java.util.List;
import org.w3c.dom.Document;

public class XPathDemo {
    public static void main(String[] args) {
        try {
            String xml = "<bookstore>" +
                "  <book category=\"fiction\">" +
                "    <title>Harry Potter</title>" +
                "    <price>29.99</price>" +
                "  </book>" +
                "  <book category=\"science\">" +
                "    <title>Learning XML</title>" +
                "    <price>39.95</price>" +
                "  </book>" +
                "</bookstore>";
            Document document = XPathUtils.parseXmlString(xml);

            // Get all book titles
            List<String> titles = XPathUtils.getTextList(document, "//title");
            System.out.println("All titles: " + titles);

            // Get the price of the first book
            Double price = XPathUtils.getNumber(document, "//book[1]/price");
            System.out.println("Price of the first book: " + price);

            // Check whether there is a book in the science category
            Boolean hasScience = XPathUtils.getBoolean(document, "boolean(//book[@category='science'])");
            System.out.println("Has a science book: " + hasScience);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
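XPath provides many built-in functions for calculations and other operations. The main categories are listed below, starting with the node-set functions:
Function | Description | Example |
---|---|---|
count() | Returns the number of nodes in a node-set | count(//book) |
name() | Returns the qualified name of the first node in the argument (or of the context node) | name(/bookstore/book[1]) |
local-name() | Returns the local name of the node, without the namespace prefix | local-name(/bookstore/book[1]) |
namespace-uri() | Returns the namespace URI of the node | namespace-uri(/bookstore/book[1]) |
position() | Returns the position of the current node within the current context node-set | //book[position()=2] |
last() | Returns the size of the current context node-set, which is the position of its last node | //book[position()=last()] |
Java example:
// Count the books
Double bookCount = (Double) xpath.evaluate("count(//book)", document, XPathConstants.NUMBER);
System.out.println("Total number of books: " + bookCount.intValue());
// Get the element name of the first book
String bookName = (String) xpath.evaluate("name(/bookstore/book[1])", document, XPathConstants.STRING);
System.out.println("Element name of the first book: " + bookName);
// Get the title of the last book
String lastBookTitle = (String) xpath.evaluate("//book[last()]/title", document, XPathConstants.STRING);
System.out.println("Title of the last book: " + lastBookTitle);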
String functions:
Function | Description | Example |
---|---|---|
string() | Converts an object to a string | string(//price[1]) |
concat() | Concatenates two or more strings | concat(//author[1], ' - ', //title[1]) |
starts-with() | Tests whether a string starts with a given substring | starts-with(//title[1], 'H') |
contains() | Tests whether a string contains a given substring | contains(//title[1], 'Potter') |
substring() | Returns part of a string (positions start at 1) | substring(//title[1], 1, 5) |
substring-before() | Returns the part of the string before the first occurrence of the separator | substring-before('Harry Potter', ' ') |
substring-after() | Returns the part of the string after the first occurrence of the separator | substring-after('Harry Potter', ' ') |
string-length() | Returns the length of a string | string-length(//title[1]) |
normalize-space() | Strips leading and trailing whitespace and collapses runs of whitespace into a single space | normalize-space(' Hello World ') |
translate() | Replaces characters in a string one for one | translate('aabbcc', 'abc', 'ABC') |
Java example:
// Build a combined "title by author" string
String bookInfo = (String) xpath.evaluate("concat(//book[1]/title, ' by ', //book[1]/author)",
        document, XPathConstants.STRING);
System.out.println("Book info: " + bookInfo);
// Check whether the title contains a given text
Boolean containsPotter = (Boolean) xpath.evaluate("contains(//book[1]/title, 'Potter')",
        document, XPathConstants.BOOLEAN);
System.out.println("Title contains 'Potter': " + containsPotter);
// Get the length of the author's name
Double nameLength = (Double) xpath.evaluate("string-length(//book[1]/author)",
        document, XPathConstants.NUMBER);
System.out.println("Length of author name: " + nameLength.intValue());
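Numeric functions:
Function | Description | Example |
---|---|---|
number() | Converts an object to a number | number('42') |
sum() | Returns the sum of the numeric values of all nodes in a node-set | sum(//price) |
floor() | Returns the largest integer that is not greater than the argument | floor(10.6) |
ceiling() | Returns the smallest integer that is not less than the argument | ceiling(10.2) |
round() | Rounds to the nearest integer; a value exactly halfway rounds toward positive infinity | round(10.5) |
Java example:
// Compute the total price of all books
Double totalPrice = (Double) xpath.evaluate("sum(//book/price)", document, XPathConstants.NUMBER);
System.out.println("Total price of all books: " + totalPrice);
// Find books whose price has a fractional part.
// (The comparison price > round(price) would miss prices such as 29.99, which round up.)
NodeList fractionalPriceBooks = (NodeList) xpath.evaluate("//book[price != floor(price)]",
        document, XPathConstants.NODESET);
System.out.println("Books with a fractional price: " + fractionalPriceBooks.getLength());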
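Boolean functions:
Function | Description | Example |
---|---|---|
boolean() | Converts an object to a boolean | boolean(//book) |
not() | Returns the negation of its argument | not(//book[price>100]) |
true() | Returns the boolean value true | true() |
false() | Returns the boolean value false | false() |
lang() | Tests whether the language of the current node (from xml:lang) matches the given language | //title[lang('en')] |
Java example:
// Check whether any book costs more than 40
Boolean hasExpensiveBook = (Boolean) xpath.evaluate("boolean(//book[price>40])",
        document, XPathConstants.BOOLEAN);
System.out.println("Has a book priced above 40: " + hasExpensiveBook);
// Get all books that are not in the science category
NodeList nonScienceBooks = (NodeList) xpath.evaluate("//book[not(@category='science')]",
        document, XPathConstants.NODESET);
System.out.println("Number of non-science books: " + nonScienceBooks.getLength());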
A predicate can combine several conditions with the logical operators and and or:
//book[@category='fiction' and price>30] # selects book elements in the fiction category with a price above 30
//book[price>30 or @category='reference'] # selects book elements with a price above 30 or in the reference category
//book[not(@category='fiction')] # selects book elements that are not in the fiction category
Java example:
// Get fiction books priced above 25
NodeList expensiveFictionBooks = (NodeList) xpath.evaluate(
        "//book[@category='fiction' and price>25]", document, XPathConstants.NODESET);
System.out.println("Expensive fiction books: " + expensiveFictionBooks.getLength());
// Get books that are priced above 35 or are fiction
NodeList specialBooks = (NodeList) xpath.evaluate(
        "//book[price>35 or @category='fiction']", document, XPathConstants.NODESET);
System.out.println("Number of special books: " + specialBooks.getLength());
The union operator (|) combines the results of several path expressions:
//book/title | //book/author # selects the title and author children of all book elements
Java example:
// Get the titles and authors of all books
NodeList titleAndAuthors = (NodeList) xpath.evaluate(
        "//book/title | //book/author", document, XPathConstants.NODESET);
System.out.println("Total titles and authors: " + titleAndAuthors.getLength());
for (int i = 0; i < titleAndAuthors.getLength(); i++) {
    System.out.println(titleAndAuthors.item(i).getNodeName() + ": " +
            titleAndAuthors.item(i).getTextContent());
}
XPath 1.0 expressions can reference variables with the $name syntax, but the language itself has no way to declare them: the values must be supplied by the host environment. In Java, this is done through the XPathVariableResolver interface:
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathWithVariables {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse("bookstore.xml");
            XPathFactory xPathFactory = XPathFactory.newInstance();
            XPath xpath = xPathFactory.newXPath();

            // Install the variable resolver
            xpath.setXPathVariableResolver(new XPathVariableResolver() {
                @Override
                public Object resolveVariable(QName variableName) {
                    if (variableName.getLocalPart().equals("minPrice")) {
                        return 30.0;
                    } else if (variableName.getLocalPart().equals("category")) {
                        return "fiction";
                    }
                    return null; // unknown variable
                }
            });

            // XPath expression that uses the variables
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//book[price > $minPrice and @category=$category]/title",
                    document, XPathConstants.NODESET);
            System.out.println("Found " + nodes.getLength() + " matching book(s):");
            for (int i = 0; i < nodes.getLength(); i++) {
                System.out.println(nodes.item(i).getTextContent());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
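One practical reason to pass values as variables rather than concatenating them into the expression is that a value containing a quote character cannot break the expression. The sketch below illustrates this with a small map-backed resolver; MapVariableResolver, VariableDemo, and the sample XML are hypothetical names introduced for this example, not part of the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: resolves XPath variables from a plain map. */
final class MapVariableResolver implements XPathVariableResolver {
    private final Map<String, Object> values = new HashMap<>();

    MapVariableResolver set(String name, Object value) {
        values.put(name, value);
        return this;
    }

    @Override
    public Object resolveVariable(QName variableName) {
        // Returning null signals that no such variable exists.
        return values.get(variableName.getLocalPart());
    }
}

final class VariableDemo {
    // Hypothetical sample data modelled on the bookstore document.
    static final String XML =
        "<bookstore>"
        + "<book category=\"fiction\"><title>Harry Potter</title><price>29.99</price></book>"
        + "<book category=\"science\"><title>Learning XML</title><price>39.95</price></book>"
        + "</bookstore>";

    /** Returns titles of books in the category that cost more than minPrice. */
    static List<String> titles(String category, double minPrice) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setXPathVariableResolver(new MapVariableResolver()
                .set("category", category)
                .set("minPrice", minPrice));
            // The expression text is constant; only the bound values change.
            NodeList nodes = (NodeList) xpath.evaluate(
                "//book[@category = $category and price > $minPrice]/title",
                doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titles("fiction", 20.0));
        // A quote in the value is harmless because it never becomes part of the expression.
        System.out.println(titles("sci'ence", 0.0));
    }
}
```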
If an XML document uses namespaces, the XPath queries have to account for them. In Java, this is done with the NamespaceContext interface:
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathWithNamespaces {
    public static void main(String[] args) {
        try {
            String xmlWithNs =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
                "<bk:bookstore xmlns:bk=\"http://www.example.com/books\">\n" +
                "  <bk:book>\n" +
                "    <bk:title>Harry Potter</bk:title>\n" +
                "    <bk:author>J.K. Rowling</bk:author>\n" +
                "  </bk:book>\n" +
                "</bk:bookstore>";
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // Important: enable namespace support
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xmlWithNs)));
            XPathFactory xPathFactory = XPathFactory.newInstance();
            XPath xpath = xPathFactory.newXPath();

            // Install the namespace context
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    if ("bk".equals(prefix)) {
                        return "http://www.example.com/books";
                    }
                    return XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String namespaceURI) {
                    if ("http://www.example.com/books".equals(namespaceURI)) {
                        return "bk";
                    }
                    return null;
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return null; // simplified implementation
                }
            });

            // XPath expression that uses the namespace prefix
            NodeList titles = (NodeList) xpath.evaluate("//bk:book/bk:title",
                    document, XPathConstants.NODESET);
            System.out.println("Found " + titles.getLength() + " book(s):");
            for (int i = 0; i < titles.getLength(); i++) {
                System.out.println(titles.item(i).getTextContent());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
For more involved namespace handling, a more flexible NamespaceContext implementation can be created:
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

public class SimpleNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<>();
    private final Map<String, String> uriToPrefix = new HashMap<>();

    public void addNamespace(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        uriToPrefix.put(uri, prefix);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        return prefixToUri.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        return uriToPrefix.get(namespaceURI);
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = uriToPrefix.get(namespaceURI);
        if (prefix == null) {
            return Collections.emptyIterator();
        }
        return Collections.singletonList(prefix).iterator();
    }
}
It is then used like this:
SimpleNamespaceContext nsContext = new SimpleNamespaceContext();
nsContext.addNamespace("bk", "http://www.example.com/books");
nsContext.addNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
xpath.setNamespaceContext(nsContext);
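A frequent stumbling block is a document that declares a default namespace (xmlns="...") with no prefix. In XPath 1.0 an unprefixed name in an expression always means "no namespace", so such elements are matched only through a prefix bound in the NamespaceContext, or through local-name(). The following sketch shows the three cases; the class name DefaultNamespaceDemo, the prefix bk, and the sample XML are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DefaultNamespaceDemo {
    static final String NS = "http://www.example.com/books";
    // Hypothetical sample: every element is in the default namespace.
    static final String XML =
        "<bookstore xmlns=\"" + NS + "\">"
        + "<book><title>Harry Potter</title></book>"
        + "</bookstore>";

    /** Counts the nodes selected by the expression, with prefix bk bound to NS. */
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "bk".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "bk" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                        ? Collections.singletonList("bk").iterator()
                        : Collections.<String>emptyIterator();
                }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//book"));                    // unprefixed: no match
        System.out.println(count("//bk:book"));                 // bound prefix: match
        System.out.println(count("//*[local-name()='book']"));  // namespace-agnostic match
    }
}
```

The prefix used in the expression does not have to match any prefix in the document; only the namespace URI it is bound to matters.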
XPath is used in a wide range of real-world scenarios. The following sections cover some common ones with concrete examples.
XML configuration files are used by many systems and frameworks, such as Spring, Hibernate, and Maven. XPath makes it easy to read and modify them.
Suppose we have a Spring configuration file:
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/context
           http://www.springframework.org/schema/context/spring-context.xsd">
  <bean id="dataSource" class="org.apache.commons.dbcp2.BasicDataSource" destroy-method="close">
    <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
    <property name="url" value="jdbc:mysql://localhost:3306/testdb"/>
    <property name="username" value="admin"/>
    <property name="password" value="password123"/>
  </bean>
  <bean id="userService" class="com.example.service.UserServiceImpl">
    <property name="dataSource" ref="dataSource"/>
    <property name="maxRetries" value="3"/>
  </bean>
  <bean id="productService" class="com.example.service.ProductServiceImpl">
    <property name="dataSource" ref="dataSource"/>
    <property name="cacheEnabled" value="true"/>
  </bean>
</beans>
Processing this configuration file with XPath (the code reuses the SimpleNamespaceContext class defined earlier):
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class SpringConfigReader {
    public static void main(String[] args) {
        try {
            // Parse the XML document
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // Spring XML uses namespaces
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse("spring-config.xml");

            // Create the XPath object
            XPathFactory xPathFactory = XPathFactory.newInstance();
            XPath xpath = xPathFactory.newXPath();

            // The file's default namespace has no prefix, so bind one ("beans") for use in XPath
            SimpleNamespaceContext nsContext = new SimpleNamespaceContext();
            nsContext.addNamespace("beans", "http://www.springframework.org/schema/beans");
            xpath.setNamespaceContext(nsContext);

            // Get the IDs of all beans
            NodeList beans = (NodeList) xpath.evaluate("//beans:bean/@id",
                    document, XPathConstants.NODESET);
            System.out.println("Beans in the configuration:");
            for (int i = 0; i < beans.getLength(); i++) {
                System.out.println(" - " + beans.item(i).getNodeValue());
            }

            // Get the database connection settings
            String url = (String) xpath.evaluate(
                    "//beans:bean[@id='dataSource']/beans:property[@name='url']/@value",
                    document, XPathConstants.STRING);
            String username = (String) xpath.evaluate(
                    "//beans:bean[@id='dataSource']/beans:property[@name='username']/@value",
                    document, XPathConstants.STRING);
            System.out.println("\nDatabase connection settings:");
            System.out.println("URL: " + url);
            System.out.println("Username: " + username);

            // Find the services that use the data source
            NodeList servicesWithDataSource = (NodeList) xpath.evaluate(
                    "//beans:bean/beans:property[@name='dataSource' and @ref='dataSource']/..",
                    document, XPathConstants.NODESET);
            System.out.println("\nServices that use the data source:");
            for (int i = 0; i < servicesWithDataSource.getLength(); i++) {
                String id = servicesWithDataSource.item(i).getAttributes()
                        .getNamedItem("id").getNodeValue();
                String className = servicesWithDataSource.item(i).getAttributes()
                        .getNamedItem("class").getNodeValue();
                System.out.println(" - " + id + " (" + className + ")");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
XPath can also extract dependency information from a Maven pom.xml file:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class MavenPomReader {
    public static void main(String[] args) {
        try {
            // The factory is left non-namespace-aware on purpose, so the unprefixed
            // names below match even though pom.xml declares a default namespace.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse("pom.xml");
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Get the project coordinates
            String groupId = (String) xpath.evaluate("/project/groupId", document, XPathConstants.STRING);
            String artifactId = (String) xpath.evaluate("/project/artifactId", document, XPathConstants.STRING);
            String version = (String) xpath.evaluate("/project/version", document, XPathConstants.STRING);
            System.out.println("Project coordinates: " + groupId + ":" + artifactId + ":" + version);

            // Get all dependencies
            NodeList dependencies = (NodeList) xpath.evaluate("/project/dependencies/dependency",
                    document, XPathConstants.NODESET);
            System.out.println("\nProject dependencies:");
            for (int i = 0; i < dependencies.getLength(); i++) {
                // Relative paths are evaluated against each dependency element
                String depGroupId = (String) xpath.evaluate("groupId", dependencies.item(i),
                        XPathConstants.STRING);
                String depArtifactId = (String) xpath.evaluate("artifactId", dependencies.item(i),
                        XPathConstants.STRING);
                String depVersion = (String) xpath.evaluate("version", dependencies.item(i),
                        XPathConstants.STRING);
                String depScope = (String) xpath.evaluate("scope", dependencies.item(i),
                        XPathConstants.STRING);
                System.out.println(" - " + depGroupId + ":" + depArtifactId + ":" + depVersion +
                        (depScope.isEmpty() ? "" : " (" + depScope + ")"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
XPath is very useful in web scraping, where it can pinpoint elements in an HTML page. HTML does not always follow XML rules strictly, but many HTML parsing libraries support XPath selectors anyway.
Jsoup is a popular Java HTML parsing library. It can be combined with the JsoupXpath extension to use XPath:
<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.14.3</version>
</dependency>
<dependency>
  <groupId>cn.wanghaomiao</groupId>
  <artifactId>JsoupXpath</artifactId>
  <version>2.5.0</version>
</dependency>
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
// JsoupXpath 2.x package; older 0.x releases used cn.wanghaomiao.xpath.model and new JXDocument(doc)
import org.seimicrawler.xpath.JXDocument;
import org.seimicrawler.xpath.JXNode;
import java.util.List;

public class WebScraper {
    public static void main(String[] args) {
        try {
            // Fetch the page
            Document doc = Jsoup.connect("https://news.baidu.com/").get();
            JXDocument jxDocument = JXDocument.create(doc);

            // Extract the news headlines (the class name is specific to the page's current markup)
            List<JXNode> titles = jxDocument.selN("//h3[@class='news-title_1YtI1']/a/text()");
            System.out.println("Baidu News headlines:");
            for (JXNode title : titles) {
                System.out.println(" - " + title.toString());
            }

            // Extract the news links
            List<JXNode> links = jxDocument.selN("//h3[@class='news-title_1YtI1']/a/@href");
            System.out.println("\nBaidu News links:");
            for (JXNode link : links) {
                System.out.println(" - " + link.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
For dynamic pages or content rendered by JavaScript, Selenium can be combined with XPath:
<dependency>
  <groupId>org.seleniumhq.selenium</groupId>
  <artifactId>selenium-java</artifactId>
  <version>4.1.0</version>
</dependency>
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import java.util.List;
public class SeleniumScraper {
    public static void main(String[] args) {
        // Set the path to the Chrome driver
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
        WebDriver driver = new ChromeDriver();
        try {
            // Open the page
            driver.get("https://www.github.com/trending");

            // Locate the trending repository names with XPath
            List<WebElement> repoNames = driver.findElements(
                    By.xpath("//h1[@class='h3 lh-condensed']/a"));
            System.out.println("GitHub trending repositories:");
            for (WebElement repo : repoNames) {
                System.out.println(" - " + repo.getText() + " (link: " + repo.getAttribute("href") + ")");
            }

            // Locate the repository descriptions with XPath
            List<WebElement> descriptions = driver.findElements(
                    By.xpath("//p[@class='col-9 color-fg-muted my-1 pr-4']"));
            System.out.println("\nGitHub repository descriptions:");
            // Pairing by index assumes every repository has a description;
            // the bound guards against an IndexOutOfBoundsException when some do not.
            int pairs = Math.min(repoNames.size(), descriptions.size());
            for (int i = 0; i < pairs; i++) {
                System.out.println(" - " + repoNames.get(i).getText() + ": " +
                        descriptions.get(i).getText());
            }
        } finally {
            // Close the browser
            driver.quit();
        }
    }
}
XPath is commonly used when converting XML to JSON, CSV, or other formats.
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.json.JSONArray;
import org.json.JSONObject;
public class XmlToJsonConverter {
    public static void main(String[] args) {
        try {
            // Parse the XML document
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse("books.xml");
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Get all books
            NodeList books = (NodeList) xpath.evaluate("//book", document, XPathConstants.NODESET);

            // JSON array that collects the results
            JSONArray jsonBooks = new JSONArray();
            for (int i = 0; i < books.getLength(); i++) {
                JSONObject jsonBook = new JSONObject();

                // Read the basic fields of the book
                String title = (String) xpath.evaluate("title/text()", books.item(i),
                        XPathConstants.STRING);
                String author = (String) xpath.evaluate("author/text()", books.item(i),
                        XPathConstants.STRING);
                String category = (String) xpath.evaluate("@category", books.item(i),
                        XPathConstants.STRING);
                String year = (String) xpath.evaluate("year/text()", books.item(i),
                        XPathConstants.STRING);
                String price = (String) xpath.evaluate("price/text()", books.item(i),
                        XPathConstants.STRING);

                // Add them to the JSON object
                jsonBook.put("title", title);
                jsonBook.put("author", author);
                jsonBook.put("category", category);
                if (!year.isEmpty()) jsonBook.put("year", Integer.parseInt(year));
                if (!price.isEmpty()) jsonBook.put("price", Double.parseDouble(price));

                // Add the object to the JSON array
                jsonBooks.put(jsonBook);
            }

            // Print the JSON result
            JSONObject result = new JSONObject();
            result.put("books", jsonBooks);
            System.out.println(result.toString(2)); // pretty-print with a 2-space indent
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.FileWriter;
public class XmlToCsvConverter {
    public static void main(String[] args) {
        // try-with-resources closes the writer even if an exception is thrown
        try (FileWriter csvWriter = new FileWriter("books.csv")) {
            // Parse the XML document
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse("books.xml");
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Get all books
            NodeList books = (NodeList) xpath.evaluate("//book", document, XPathConstants.NODESET);

            // Write the CSV header
            csvWriter.append("Title,Author,Category,Year,Price\n");
            for (int i = 0; i < books.getLength(); i++) {
                // Read the fields of the book
                String title = (String) xpath.evaluate("title/text()", books.item(i),
                        XPathConstants.STRING);
                String author = (String) xpath.evaluate("author/text()", books.item(i),
                        XPathConstants.STRING);
                String category = (String) xpath.evaluate("@category", books.item(i),
                        XPathConstants.STRING);
                String year = (String) xpath.evaluate("year/text()", books.item(i),
                        XPathConstants.STRING);
                String price = (String) xpath.evaluate("price/text()", books.item(i),
                        XPathConstants.STRING);

                // Escape CSV special characters: quote the field and double any embedded quotes
                title = "\"" + title.replace("\"", "\"\"") + "\"";
                author = "\"" + author.replace("\"", "\"\"") + "\"";
                category = "\"" + category.replace("\"", "\"\"") + "\"";

                // Write the CSV row
                csvWriter.append(title).append(",")
                        .append(author).append(",")
                        .append(category).append(",")
                        .append(year).append(",")
                        .append(price).append("\n");
            }
            csvWriter.flush();
            System.out.println("CSV file written: books.csv");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
XPath plays an important role in test automation, UI testing in particular, because it can locate interface elements precisely.
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import java.time.Duration;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class LoginTest {
    private WebDriver driver;
    private WebDriverWait wait;

    @Before
    public void setup() {
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
        driver = new ChromeDriver();
        wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        driver.manage().window().maximize();
    }

    @Test
    public void testSuccessfulLogin() {
        // Open the login page
        driver.get("https://example.com/login");

        // Locate the username field (by tag name and placeholder attribute)
        WebElement usernameInput = driver.findElement(
                By.xpath("//input[@placeholder='Username' or @placeholder='用户名']"));
        usernameInput.sendKeys("testuser");

        // Locate the password field (by input type and name attribute)
        WebElement passwordInput = driver.findElement(
                By.xpath("//input[@type='password' and @name='password']"));
        passwordInput.sendKeys("password123");

        // Locate the login button (by its text)
        WebElement loginButton = driver.findElement(
                By.xpath("//button[contains(text(), 'Login') or contains(text(), '登录')]"));
        loginButton.click();

        // Wait for the welcome message to appear (confirms the login succeeded)
        WebElement welcomeMessage = wait.until(ExpectedConditions.visibilityOfElementLocated(
                By.xpath("//div[contains(@class, 'welcome-message') and contains(text(), 'Welcome')]")));

        // Verify that the welcome message contains the username
        assertTrue(welcomeMessage.getText().contains("testuser"));
    }

    @Test
    public void testFailedLogin() {
        // Open the login page
        driver.get("https://example.com/login");

        // Enter invalid credentials
        driver.findElement(By.xpath("//input[@placeholder='Username']"))
                .sendKeys("wronguser");
        driver.findElement(By.xpath("//input[@type='password']"))
                .sendKeys("wrongpass");
        driver.findElement(By.xpath("//button[contains(text(), 'Login')]"))
                .click();

        // Wait for the error message to appear
        WebElement errorMessage = wait.until(ExpectedConditions.visibilityOfElementLocated(
                By.xpath("//div[contains(@class, 'error-message')]")));

        // Verify the error message
        assertEquals("Invalid username or password", errorMessage.getText());
    }

    @After
    public void tearDown() {
        if (driver != null) {
            driver.quit();
        }
    }
}
Appium is a tool for testing mobile applications, and it also supports locating elements with XPath:
import io.appium.java_client.AppiumDriver;
import io.appium.java_client.MobileElement;
import io.appium.java_client.android.AndroidDriver;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.remote.DesiredCapabilities;
import java.net.URL;
import static org.junit.Assert.assertTrue;
public class AndroidAppTest {
    private AppiumDriver<MobileElement> driver;

    @Before
    public void setup() throws Exception {
        DesiredCapabilities caps = new DesiredCapabilities();
        caps.setCapability("platformName", "Android");
        caps.setCapability("deviceName", "Android Device");
        caps.setCapability("app", "/path/to/app.apk");
        driver = new AndroidDriver<>(new URL("http://127.0.0.1:4723/wd/hub"), caps);
    }

    @Test
    public void testLogin() {
        // Locate the username field (by resource ID)
        MobileElement usernameInput = driver.findElement(
                By.xpath("//android.widget.EditText[@resource-id='com.example.app:id/username']"));
        usernameInput.sendKeys("testuser");

        // Locate the password field (by class name and text)
        MobileElement passwordInput = driver.findElement(
                By.xpath("//android.widget.EditText[contains(@text, 'Password')]"));
        passwordInput.sendKeys("password123");

        // Locate the login button (by class name and text)
        MobileElement loginButton = driver.findElement(
                By.xpath("//android.widget.Button[@text='Login']"));
        loginButton.click();

        // Verify the welcome message shown after login
        MobileElement welcomeMessage = driver.findElement(
                By.xpath("//android.widget.TextView[contains(@text, 'Welcome')]"));
        assertTrue(welcomeMessage.isDisplayed());
    }

    @After
    public void tearDown() {
        if (driver != null) {
            driver.quit();
        }
    }
}
Many RESTful APIs return data as XML, and XPath can be used to process those responses.
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.net.HttpURLConnection;
import java.net.URL;
public class ApiClient {
public static void main(String[] args) {
try {
// 创建API请求
URL url = new URL("https://api.example.com/products");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setRequestProperty("Accept", "application/xml");
// 解析XML响应
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(connection.getInputStream());
XPath xpath = XPathFactory.newInstance().newXPath();
// 获取所有产品
NodeList products = (NodeList) xpath.evaluate("//product", document, XPathConstants.NODESET);
System.out.println("产品列表:");
for (int i = 0; i < products.getLength(); i++) {
String id = (String) xpath.evaluate("@id", products.item(i), XPathConstants.STRING);
String name = (String) xpath.evaluate("name/text()", products.item(i),
XPathConstants.STRING);
String price = (String) xpath.evaluate("price/text()", products.item(i),
XPathConstants.STRING);
System.out.println(" - " + id + ": " + name + " ($" + price + ")");
}
// 获取特定类别的产品数量
Double electronicsCount = (Double) xpath.evaluate("count(//product[@category='electronics'])",
document, XPathConstants.NUMBER);
System.out.println("\n电子产品数量: " + electronicsCount.intValue());
// 获取最贵的产品
String mostExpensiveProduct = (String) xpath.evaluate("//product[not(//product/price > price)]/name/text()",
document, XPathConstants.STRING);
System.out.println("最贵的产品: " + mostExpensiveProduct);
} catch (Exception e) {
e.printStackTrace();
}
}
}
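The client above needs a live endpoint, so it can be handy to try the same queries offline. The following is a minimal sketch under stated assumptions: the `ProductQueries` class, its method names, and the `SAMPLE` payload are hypothetical and only stand in for a real API response.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ProductQueries {
    // Hypothetical payload standing in for the API response.
    static final String SAMPLE =
        "<products>"
        + "<product id='p1' category='electronics'><name>Phone</name><price>499.00</price></product>"
        + "<product id='p2' category='electronics'><name>Laptop</name><price>1299.50</price></product>"
        + "<product id='p3' category='books'><name>Novel</name><price>12.99</price></product>"
        + "</products>";

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Name of the product for which no other product has a higher price.
    static String mostExpensive(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("//product[not(//product/price > price)]/name", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Number of products whose category attribute is 'electronics'.
    static int electronicsCount(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double count = (Double) xpath.evaluate(
                    "count(//product[@category='electronics'])", doc, XPathConstants.NUMBER);
            return count.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("Most expensive: " + mostExpensive(doc));
        System.out.println("Electronics: " + electronicsCount(doc));
    }
}
```

The predicate `not(//product/price > price)` keeps a product only when no price anywhere in the document is greater than its own, which is how XPath 1.0 expresses a maximum without a `max()` function.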
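Writing efficient, maintainable XPath expressions comes down to a handful of best practices:
Prefer relative paths to absolute paths
An absolute path depends on the complete document structure, so it tends to break as soon as that structure changes.
/html/body/div[1]/div[2]/table/tr[3]/td[2] # fragile: spells out every ancestor
//table//tr[3]/td[2] # more robust: anchored only on the table
Locate nodes by ID or another unique attribute
If an element has a unique identifier such as an ID, prefer it when locating the element.
//div[@id='content']
//input[@name='username']
Avoid relying too heavily on indexes
Index positions can shift whenever the document changes. Prefer more stable features such as attributes or text content.
//div[3]/p[2] # fragile: breaks when siblings are added or removed
//div[@class='content']/p[contains(text(), 'Important')] # more stable: matches on attribute and text
Use an appropriate axis
Choosing a suitable axis can reduce the number of nodes visited and make the query more efficient.
//title[text()='Harry Potter']/ancestor::book # finds the title, then walks back up to its book
//book[title='Harry Potter'] # on the sample document selects the same book, filtered directly by a predicate
Avoid starting an expression with //
An expression that starts with // searches the entire document, which can cause performance problems. If you know the approximate path, specify it.
//input[@type='text'] # considers every input in the document
//form[@id='login']//input[@type='text'] # only considers inputs inside the login form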
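In Java, another way to narrow a search is to evaluate an expression against a context node instead of the whole document. The sketch below is illustrative only: the `ScopedSearch` class and the `PAGE` markup are hypothetical. It also shows a common pitfall: an expression that starts with `//` still searches the entire document even when a context node is passed, whereas `.//` stays inside the context node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ScopedSearch {
    // Hypothetical page with two forms, used only for illustration.
    static final String PAGE =
        "<html><body>"
        + "<form id='search'><input type='text' name='q'/></form>"
        + "<form id='login'><input type='text' name='username'/>"
        + "<input type='password' name='password'/></form>"
        + "</body></html>";

    // Counts the nodes selected by the expression, using the login form as context node.
    static int countFromLoginForm(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node form = (Node) xpath.evaluate("//form[@id='login']", doc, XPathConstants.NODE);
            Double count = (Double) xpath.evaluate(
                    "count(" + expression + ")", form, XPathConstants.NUMBER);
            return count.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Relative to the form: only the username field matches.
        System.out.println(countFromLoginForm(PAGE, ".//input[@type='text']"));
        // Leading // ignores the context node and matches text inputs in both forms.
        System.out.println(countFromLoginForm(PAGE, "//input[@type='text']"));
    }
}
```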
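Use predicates judiciously
Predicates should be as specific as possible; avoid loose matches.
//a[contains(@href, 'example')] # loose: matches any href that merely contains "example"
//a[starts-with(@href, 'https://example.com/products/')] # specific: matches only links under that URL prefix
Comment complex XPath expressions
// Select every product that is not sold out
String xpath = "//div[contains(@class, 'product') and not(contains(@class, 'sold-out'))]";
Extract frequently used expressions into constants or variables
// Define the base XPath
private static final String PRODUCT_BASE_XPATH = "//div[contains(@class, 'product')]";
// Combine it at the point of use
String availableProductsXPath = PRODUCT_BASE_XPATH + "[not(contains(@class, 'sold-out'))]";
String featuredProductsXPath = PRODUCT_BASE_XPATH + "[@data-featured='true']";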
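A related option, when the same expression runs against many documents, is to compile it once with `XPath.compile` and reuse the resulting `XPathExpression`. The sketch below is a hedged illustration: the `CompiledQueries` class, the `countInEach` helper, and the two sample pages are hypothetical. Note that `XPathExpression` objects are not thread-safe, so a shared instance should not be used from several threads at once.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CompiledQueries {
    private static final String PRODUCT_BASE_XPATH = "//div[contains(@class, 'product')]";
    static final String AVAILABLE_XPATH =
        PRODUCT_BASE_XPATH + "[not(contains(@class, 'sold-out'))]";

    // Hypothetical pages used only for illustration.
    static final String PAGE_A =
        "<div>"
        + "<div class='product'>A</div>"
        + "<div class='product sold-out'>B</div>"
        + "<div class='product' data-featured='true'>C</div>"
        + "</div>";
    static final String PAGE_B = "<div><div class='product sold-out'>D</div></div>";

    // Compiles the expression once, then counts its matches in each page.
    static int[] countInEach(String expression, String... pages) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            XPathExpression compiled = xpath.compile(expression);
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            int[] counts = new int[pages.length];
            for (int i = 0; i < pages.length; i++) {
                Document doc = builder.parse(new InputSource(new StringReader(pages[i])));
                NodeList matches = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
                counts[i] = matches.getLength();
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        int[] counts = countInEach(AVAILABLE_XPATH, PAGE_A, PAGE_B);
        System.out.println("Available on page A: " + counts[0]);
        System.out.println("Available on page B: " + counts[1]);
    }
}
```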
Use the builder pattern for XPath in Java
Create a helper class that builds and combines XPath expressions:
public class XPathBuilder {
    private final StringBuilder xpath = new StringBuilder();

    public static XPathBuilder create(String base) {
        return new XPathBuilder(base);
    }

    private XPathBuilder(String base) {
        xpath.append(base);
    }

    // Note: values are wrapped in single quotes, so they must not contain a single quote themselves
    public XPathBuilder withAttribute(String name, String value) {
        xpath.append("[@").append(name).append("='").append(value).append("']");
        return this;
    }

    public XPathBuilder withClass(String className) {
        xpath.append("[contains(@class, '").append(className).append("')]");
        return this;
    }

    public XPathBuilder withText(String text) {
        xpath.append("[text()='").append(text).append("']");
        return this;
    }

    public XPathBuilder containsText(String text) {
        xpath.append("[contains(text(), '").append(text).append("')]");
        return this;
    }

    public XPathBuilder child(String element) {
        xpath.append("/").append(element);
        return this;
    }

    public XPathBuilder descendant(String element) {
        xpath.append("//").append(element);
        return this;
    }

    public String build() {
        return xpath.toString();
    }
}

// Usage example
String productXPath = XPathBuilder.create("//div")
        .withClass("product")
        .withAttribute("data-category", "electronics")
        .child("h3")
        .containsText("Smartphone")
        .build();
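A builder like this wraps values in single quotes, so a value that itself contains a quote produces an invalid expression. XPath 1.0 has no escape character inside string literals; a common workaround is to switch quote style or, when both quote kinds appear, to assemble the value with `concat()`. The `XPathLiterals` class and its `quote` helper below are hypothetical names for a minimal sketch of that workaround, not part of the builder above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathLiterals {
    // Turns arbitrary text into an XPath 1.0 string literal (hypothetical helper).
    static String quote(String value) {
        if (!value.contains("'")) {
            return "'" + value + "'";
        }
        if (!value.contains("\"")) {
            return "\"" + value + "\"";
        }
        // Both quote kinds present: split on ' and glue the pieces back with concat().
        StringBuilder literal = new StringBuilder("concat(");
        String[] parts = value.split("'", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                literal.append(", \"'\", ");
            }
            literal.append("'").append(parts[i]).append("'");
        }
        return literal.append(")").toString();
    }

    // Counts title elements whose string value equals the given text.
    static int countTitles(String xml, String title) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double count = (Double) xpath.evaluate(
                    "count(//title[. = " + quote(title) + "])", doc, XPathConstants.NUMBER);
            return count.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><title>O'Reilly \"Classics\"</title><title>Harry Potter</title></books>";
        System.out.println(quote("Harry Potter"));
        System.out.println(quote("O'Reilly \"Classics\""));
        System.out.println(countTitles(xml, "O'Reilly \"Classics\""));
    }
}
```

A builder method such as `withText` could call a helper of this kind instead of hard-coding the surrounding single quotes.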
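Debugging XPath expressions is a routine task; the following methods and tools help.
The developer tools in modern browsers can be used to test XPath expressions:
Chrome DevTools: use the $x() function in the console to test an XPath expression
$x("//div[@class='product']") // returns an array of matching elements
Firefox Developer Tools: use the $x() function or document.evaluate() in the console
$x("//div[@class='product']") // returns an array of matching elements
Online XPath testers and browser extensions are further options.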
Build and test the XPath expression step by step:
// Start with a simple expression
String simpleXPath = "//book";
NodeList nodes = (NodeList) xpath.evaluate(simpleXPath, document, XPathConstants.NODESET);
System.out.println("Found " + nodes.getLength() + " book(s)");
// Add conditions one at a time
String detailedXPath = "//book[@category='fiction']";
NodeList fictionBooks = (NodeList) xpath.evaluate(detailedXPath, document, XPathConstants.NODESET);
System.out.println("Found " + fictionBooks.getLength() + " fiction book(s)");
// Keep refining
String specificXPath = "//book[@category='fiction']/title[contains(text(), 'Potter')]";
NodeList potterBooks = (NodeList) xpath.evaluate(specificXPath, document, XPathConstants.NODESET);
System.out.println("Found " + potterBooks.getLength() + " title(s) containing 'Potter'");
Print node information for inspection:
// Requires org.w3c.dom.NamedNodeMap, org.w3c.dom.Node and org.w3c.dom.NodeList
NodeList nodes = (NodeList) xpath.evaluate("//book", document, XPathConstants.NODESET);
// Print the details of every node
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println("Node #" + (i + 1) + ":");
    System.out.println("  Name: " + nodes.item(i).getNodeName());
    // Print the attributes
    NamedNodeMap attributes = nodes.item(i).getAttributes();
    if (attributes != null) {
        for (int j = 0; j < attributes.getLength(); j++) {
            Node attr = attributes.item(j);
            System.out.println("  Attribute: " + attr.getNodeName() + " = " + attr.getNodeValue());
        }
    }
    // Print the child elements
    NodeList children = nodes.item(i).getChildNodes();
    for (int j = 0; j < children.getLength(); j++) {
        if (children.item(j).getNodeType() == Node.ELEMENT_NODE) {
            System.out.println("  Child: " + children.item(j).getNodeName() +
                    " = " + children.item(j).getTextContent().trim());
        }
    }
    System.out.println();
}
Create a debugging helper method:
public static void debugXPath(Document document, String xpathExpression) {
    try {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(xpathExpression, document, XPathConstants.NODESET);
        System.out.println("XPath: " + xpathExpression);
        System.out.println("Matching nodes: " + nodes.getLength());
        if (nodes.getLength() > 0) {
            System.out.println("First matching node:");
            System.out.println("  Name: " + nodes.item(0).getNodeName());
            System.out.println("  Text content: " + nodes.item(0).getTextContent().trim());
            // Print the attributes
            NamedNodeMap attributes = nodes.item(0).getAttributes();
            if (attributes != null && attributes.getLength() > 0) {
                System.out.println("  Attributes:");
                for (int i = 0; i < attributes.getLength(); i++) {
                    Node attr = attributes.item(i);
                    System.out.println("    " + attr.getNodeName() + " = " + attr.getNodeValue());
                }
            }
        }
        System.out.println("---------------------------------");
    } catch (Exception e) {
        System.out.println("XPath evaluation failed: " + e.getMessage());
    }
}

// Usage example
debugXPath(document, "//book[@category='fiction']");
debugXPath(document, "//book[price>30]");
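Various problems can come up when working with XPath. Here are some common errors and their fixes.
Mismatched brackets
//book[(@category='fiction'] # wrong: the "(" is never closed
Fix: make sure every bracket and quote is paired.
Incorrect quoting of attribute values
//book[@category=fiction] # wrong: without quotes, fiction is read as a child element name
//book[@category="fiction'] # wrong: the opening and closing quotes differ
Fix: wrap the attribute value in matching quotes, either single or double.
Wrong axis name or node test
//book/child:title # wrong: a single colon turns child into a namespace prefix
//book/childs::title # wrong: there is no axis named childs
Fix: make sure the axis name is correct and is followed by a double colon.
Namespace problems
//book # in a namespace-aware document this matches nothing when book belongs to a namespace
Fix: bind a prefix to the namespace and use it in the expression
// Register the namespace. SimpleNamespaceContext is assumed here to be a user-supplied
// NamespaceContext implementation with an addNamespace method; the JDK does not ship one.
xpath.setNamespaceContext(new SimpleNamespaceContext() {{
    addNamespace("ns", "http://www.example.com/ns");
}});
// Query with the namespace prefix
String result = xpath.evaluate("//ns:book", document);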
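Since the JDK only provides the `javax.xml.namespace.NamespaceContext` interface, a small implementation is needed to run the snippet above without extra libraries. The following is a minimal sketch under stated assumptions: `MapNamespaceContext`, its `bind` method, and `NamespaceDemo` are hypothetical names, and the implementation skips the reserved `xml` and `xmlns` prefixes for brevity. The parser must also be made namespace-aware, otherwise the prefixed query finds nothing.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical map-backed NamespaceContext built only on JDK classes.
class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> uriByPrefix = new HashMap<>();

    MapNamespaceContext bind(String prefix, String uri) {
        uriByPrefix.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = uriByPrefix.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                return entry.getKey();
            }
        }
        return null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singletonList(prefix).iterator();
    }
}

class NamespaceDemo {
    // Sample document whose elements live in a default namespace.
    static final String XML =
        "<bookstore xmlns='http://www.example.com/ns'>"
        + "<book><title>Learning XML</title></book></bookstore>";

    // Returns the first title, querying either with or without the ns prefix.
    static String firstTitle(String xml, boolean usePrefix) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace matching
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(
                    new MapNamespaceContext().bind("ns", "http://www.example.com/ns"));
            return xpath.evaluate(usePrefix ? "//ns:book/ns:title" : "//book/title", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("With prefix: " + firstTitle(XML, true));
        System.out.println("Without prefix: '" + firstTitle(XML, false) + "'");
    }
}
```

The prefix used in the expression (`ns`) does not have to match any prefix in the document; only the namespace URI has to agree.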
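Case sensitivity
//Book # wrong when the element is actually named book
Fix: XML is case-sensitive, so make sure element and attribute names use the correct case.
Whitespace problems
//book[title='Harry Potter'] # fails when the title text carries leading, trailing, or repeated whitespace
Fix: use normalize-space() or contains()
//book[normalize-space(title)='Harry Potter']
//book[contains(title, 'Harry')]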
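The difference is easy to see on a document whose title is pretty-printed across several lines. The sketch below is illustrative only; the `WhitespaceDemo` class and its sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WhitespaceDemo {
    // The title text has line breaks, indentation, and repeated spaces.
    static final String XML =
        "<bookstore><book><title>\n   Harry   Potter \n</title></book></bookstore>";

    // Counts the nodes selected by the expression in the sample document.
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double result = (Double) xpath.evaluate(
                    "count(" + expression + ")", doc, XPathConstants.NUMBER);
            return result.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Exact comparison fails because of the surrounding whitespace.
        System.out.println(count("//book[title='Harry Potter']"));
        // normalize-space() trims the ends and collapses inner runs of whitespace.
        System.out.println(count("//book[normalize-space(title)='Harry Potter']"));
        // contains() only needs a substring to match.
        System.out.println(count("//book[contains(title, 'Harry')]"));
    }
}
```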
Overly complex expressions
//div[contains(@class, 'container')]//table//tr//td[contains(text(), 'Product')]//ancestor::tr
Fix: simplify the expression and make it more specific
//div[contains(@class, 'container')]//td[contains(text(), 'Product')]/parent::tr
Overusing the // operator
//div//span//a
Fix: use // sparingly, or give a more specific path
//div[@id='content']//a
Mismatched XPath result type
// A node-set is requested, but the expression actually returns a number
NodeList nodes = (NodeList) xpath.evaluate("count(//book)", document, XPathConstants.NODESET);
Fix: make the requested return type match what the expression produces
Double count = (Double) xpath.evaluate("count(//book)", document, XPathConstants.NUMBER);
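Character encoding problems
// The XML contains non-ASCII characters, but the correct encoding is not specified when parsing
Document document = builder.parse(new File("books.xml"));
Fix: specify the correct character encoding
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Merges CDATA sections into adjacent text nodes (independent of the encoding fix)
factory.setCoalescing(true);
DocumentBuilder builder = factory.newDocumentBuilder();
// Set the encoding through an InputSource
InputSource source = new InputSource(new FileInputStream("books.xml"));
source.setEncoding("UTF-8");
Document document = builder.parse(source);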
DOM structure differs from what you expect
Fix: print the DOM tree and inspect it
public static void printDOMTree(Node node, String indent) {
    // getNodeValue() is null for element nodes, so elements print as "name: null"
    System.out.println(indent + node.getNodeName() + ": " + node.getNodeValue());
    NamedNodeMap attributes = node.getAttributes();
    if (attributes != null) {
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attr = attributes.item(i);
            System.out.println(indent + "  @" + attr.getNodeName() + ": " + attr.getNodeValue());
        }
    }
    // Recurse into the children with a deeper indent
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        printDOMTree(children.item(i), indent + "  ");
    }
}

// Usage example
printDOMTree(document, "");
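Expression | Description | Example |
---|---|---|
/ | Selects from the root node | /bookstore |
// | Selects matching nodes anywhere in the document, regardless of depth | //book |
. | Selects the current node | ./title |
.. | Selects the parent of the current node | ../price |
@ | Selects an attribute | @lang |
* | Matches any element node | /bookstore/* |
@* | Matches any attribute node | //@* |
node() | Matches a node of any type | //node() |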
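Axis | Description | Example |
---|---|---|
child:: | Selects all children of the current node (the default axis) | child::book |
descendant:: | Selects all descendants of the current node | descendant::price |
parent:: | Selects the parent of the current node | parent::node() |
ancestor:: | Selects all ancestors of the current node | ancestor::bookstore |
following-sibling:: | Selects all siblings after the current node | following-sibling::book |
preceding-sibling:: | Selects all siblings before the current node | preceding-sibling::book |
self:: | Selects the current node | self::node() |
descendant-or-self:: | Selects the current node and all of its descendants | descendant-or-self::node() |
ancestor-or-self:: | Selects the current node and all of its ancestors | ancestor-or-self::bookstore |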
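A few of the axes in the table can be tried against a condensed version of the bookstore sample. This is a minimal sketch; the `AxisDemo` class and its `eval` helper are hypothetical names, and the XML is a shortened copy of the tutorial's sample document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AxisDemo {
    // Shortened bookstore sample.
    static final String XML =
        "<bookstore>"
        + "<book category='fiction'><title lang='en'>Harry Potter</title><price>29.99</price></book>"
        + "<book category='science'><title lang='en'>Learning XML</title><price>39.95</price></book>"
        + "</bookstore>";

    // Evaluates the expression and returns its result converted to a string.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Sibling axes move sideways between the two book elements.
        System.out.println(eval("/bookstore/book[1]/following-sibling::book/title"));
        System.out.println(eval("/bookstore/book[2]/preceding-sibling::book/title"));
        // parent and ancestor move upward from a matched descendant.
        System.out.println(eval("name(//price[. > 30]/parent::*)"));
        System.out.println(eval("//title[. = 'Learning XML']/ancestor::book/@category"));
    }
}
```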
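Expression | Description | Example |
---|---|---|
[n] | Selects the n-th element | //book[1] |
[last()] | Selects the last element | //book[last()] |
[position() < n] | Selects the first n-1 elements | //book[position() < 3] |
[@attr] | Selects elements that have the given attribute | //title[@lang] |
[@attr='value'] | Selects elements whose attribute equals the given value | //book[@category='fiction'] |
[element] | Selects elements that contain the given child element | //book[author] |
[element>value] | Selects elements whose child element satisfies the comparison | //book[price>30] |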
Operator | Description | Example |
---|---|---|
and | Logical AND | //book[@lang='en' and @category='fiction'] |
or | Logical OR | //book[@lang='en' or @lang='fr'] |
not() | Logical NOT | //book[not(@lang='en')] |
\| | Union (combines two node-sets) | //book/title \| //book/author |
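Node-set functions
Function | Description | Example |
---|---|---|
count() | Returns the number of nodes in a node-set | count(//book) |
name() | Returns the name of a node | name(/bookstore/book[1]) |
position() | Returns the position of the current node | //book[position()=2] |
last() | Returns the position of the last node in the current context | //book[position()=last()] |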
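String functions
Function | Description | Example |
---|---|---|
string() | Converts its argument to a string | string(//price[1]) |
concat() | Concatenates strings | concat(//author[1], ' - ', //title[1]) |
contains() | Tests whether a string contains a substring | contains(//title[1], 'Potter') |
starts-with() | Tests whether a string starts with a substring | starts-with(//title[1], 'Harry') |
substring() | Extracts a substring (positions start at 1) | substring(//title[1], 1, 5) |
string-length() | Returns the length of a string | string-length(//title[1]) |
normalize-space() | Trims leading and trailing whitespace and collapses inner runs to one space | normalize-space(//description[1]) |
translate() | Replaces characters one for one | translate(//title[1], 'abcdefg', 'ABCDEFG') |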
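Numeric functions
Function | Description | Example |
---|---|---|
number() | Converts its argument to a number | number('42') |
sum() | Returns the sum of the node values | sum(//price) |
floor() | Returns the largest integer not greater than the argument | floor(10.6) |
ceiling() | Returns the smallest integer not less than the argument | ceiling(10.2) |
round() | Rounds to the nearest integer | round(10.5) |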
Boolean functions
Function | Description | Example |
---|---|---|
boolean() | Converts its argument to a boolean | boolean(//book) |
not() | Boolean negation | not(//book[price>100]) |
true() | Returns true | true() |
false() | Returns false | false() |
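Several of the functions listed above can be checked quickly from Java. The following is a minimal sketch, assuming a shortened copy of the bookstore sample; the `FunctionDemo` class and its `number`, `string`, and `bool` helpers are hypothetical names that simply request a different `XPathConstants` result type.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FunctionDemo {
    // Shortened bookstore sample.
    static final String XML =
        "<bookstore>"
        + "<book><title>Harry Potter</title><price>29.99</price></book>"
        + "<book><title>Learning XML</title><price>39.95</price></book>"
        + "</bookstore>";

    // Evaluates the expression against the sample and returns the requested result type.
    private static Object evaluate(String expression, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    static double number(String expression) {
        return (Double) evaluate(expression, XPathConstants.NUMBER);
    }

    static String string(String expression) {
        return (String) evaluate(expression, XPathConstants.STRING);
    }

    static boolean bool(String expression) {
        return (Boolean) evaluate(expression, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        System.out.println(number("floor(10.6)"));
        System.out.println(number("ceiling(10.2)"));
        System.out.println(number("round(10.5)"));
        System.out.println(number("count(//book)"));
        System.out.println(string("substring(//title[1], 1, 5)"));
        System.out.println(string("translate('abc', 'ab', 'AB')"));
        System.out.println(bool("not(//book[price>100])"));
    }
}
```

When a node-set such as `//title[1]` is passed to a string function, only the first node in document order is converted, which is why `substring` here works on the first title.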
### 9.6 Common Java XPath snippets
#### Basic XPath queries
```java
// Create the XPath object
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
// Get a node-set
NodeList nodes = (NodeList) xpath.evaluate("//book", document, XPathConstants.NODESET);
// Get a single node
Node node = (Node) xpath.evaluate("//book[1]", document, XPathConstants.NODE);
// Get a string value
String value = (String) xpath.evaluate("//book[1]/title", document, XPathConstants.STRING);
// Get a numeric value
Double number = (Double) xpath.evaluate("sum(//book/price)", document, XPathConstants.NUMBER);
// Get a boolean value
Boolean result = (Boolean) xpath.evaluate("boolean(//book[@category='fiction'])",
        document, XPathConstants.BOOLEAN);

// Create the namespace context. SimpleNamespaceContext is assumed to be a user-supplied
// NamespaceContext implementation with an addNamespace method; the JDK does not ship one.
SimpleNamespaceContext nsContext = new SimpleNamespaceContext();
nsContext.addNamespace("ns", "http://www.example.com/ns");
nsContext.addNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
// Set the namespace context
xpath.setNamespaceContext(nsContext);
// Query using the namespace prefix
NodeList nodes = (NodeList) xpath.evaluate("//ns:book", document, XPathConstants.NODESET);

// Create the variable resolver
xpath.setXPathVariableResolver(new XPathVariableResolver() {
    @Override
    public Object resolveVariable(QName variableName) {
        if (variableName.getLocalPart().equals("category")) {
            return "fiction";
        }
        return null;
    }
});
// Query using the variable
NodeList nodes = (NodeList) xpath.evaluate("//book[@category=$category]",
        document, XPathConstants.NODESET);