Markup Languages: XML

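I. Definition and Core Positioning of XML

XML (Extensible Markup Language) is a markup language published by the World Wide Web Consortium (W3C) as a Recommendation in February 1998. Its core design goal is to transport and store data rather than to display it, which is the essential difference from HTML.

XML is "extensible" in that it has no predefined tags: users define their own tags as needed, as long as they follow the syntax rules. This flexibility made it an important standard for cross-platform, cross-system data exchange, and it is widely used for configuration files, data transfer, and document storage.

Historically, XML derives from SGML (Standard Generalized Markup Language). SGML is powerful but too complex to spread widely on the Internet. XML simplified SGML's syntax while keeping its core ability to structure data, which lowered the barrier to entry and made XML an early benchmark for data exchange in the Internet era.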

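II. Basic Structure and Syntax Rules of XML

An XML document must follow a strict set of syntax rules. A document that breaks them is not well-formed, and a conforming parser will refuse to process it.

1. Basic Structure: The Tree Model

An XML document has a tree structure: all content is wrapped in a single root element, which is the only top-level node of the document, and every other element is a descendant of it. For example: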


<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price>15.99</price>
  </book>
  <book category="non-fiction">
    <title>Sapiens</title>
    <author>Yuval Noah Harari</author>
    <price>22.50</price>
  </book>
</bookstore>

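  • Root element: <bookstore> is the only top-level element and contains all other content.
  • Element hierarchy: <book> is a child of <bookstore>; <title>, <author>, and so on are children of <book>, forming a clear parent-child relationship.

2. Core Syntax Rules

XML syntax is strict: a parser rejects any document that violates the rules. This is the core difference from HTML, whose syntax is lenient (some tags may be left unclosed).

  • Tags must be closed: every element needs a start tag and an end tag, for example <title> pairs with </title>. An empty element (no content) can be abbreviated as <price/> (equivalent to <price></price>).
    ❌ Wrong: <title>The Great Gatsby (missing end tag)
    ✅ Correct: <title>The Great Gatsby</title>
    ✅ Correct: <price/> (empty element)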

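  • Tags must be properly nested: a child element must be fully contained in its parent; tags must not overlap.
    ❌ Wrong: <book><title>1984</book></title> (overlapping tags)
    ✅ Correct: <book><title>1984</title></book>

  • Attribute values must be quoted: an attribute value must be wrapped in single quotes (') or double quotes ("), using the same quote character at both ends.
    ❌ Wrong: <book category=fiction> (unquoted attribute value)
    ✅ Correct: <book category="fiction">

  • Case sensitivity: XML tags are case-sensitive, so <Title> and <title> are two different elements.
    ❌ Wrong: <Title>...</title> (start and end tags differ in case)
    ✅ Correct: <Title>...</Title> or <title>...</title>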

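  • Special characters: the characters < > & " ' have special meaning in XML. In element content, < and & must always be escaped; the others need escaping only in certain contexts (for example, a quote inside an attribute value delimited by the same quote character), but escaping them is always safe. Use entity references or a CDATA section:

    • Entity references: &lt; (<), &gt; (>), &amp; (&), &quot; ("), &apos; (').
      Example: <message>He said &quot;Hello&quot;</message> represents He said "Hello"
    • CDATA section: when text contains many special characters, wrap it in <![CDATA[ ... ]]>; the parser treats the content as plain text, so no escaping is needed.
      Example: <script><![CDATA[ if (a < b && c > d) { ... } ]]></script>
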
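
The syntax rules above can be checked mechanically, because a conforming parser reports a fatal error for any document that breaks them. The following is a minimal sketch, assuming the JAXP parser that ships with the JDK; the class name WellFormedCheck and the helper isWellFormed are hypothetical names chosen for illustration, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Hypothetical helper: returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the document: it is not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<title>The Great Gatsby",                   // missing end tag
            "<book><title>1984</book></title>",          // overlapping tags
            "<book category=fiction/>",                  // unquoted attribute value
            "<Title>1984</title>",                       // case mismatch
            "<note>a &lt; b &amp;&amp; c</note>",        // escaped with entity references
            "<note><![CDATA[a < b && c > d]]></note>"    // wrapped in a CDATA section
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

The first four samples correspond to the "wrong" cases in the list and are rejected; the last two show that entity references and CDATA sections both let special characters pass through the parser.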
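3. Elements vs. Attributes

In XML, data can be stored in child elements or in attributes. The two have distinct use cases:

  • Child elements: suited to core data; flexible in structure and able to nest other elements.
    Example: <author><first-name>Yuval</first-name><last-name>Harari</last-name></author>

  • Attributes: suited to metadata (additional information describing an element); simple in structure and unable to nest.
    Example: <book category="fiction"> (category is additional information describing the book)

Best practice: avoid overusing attributes. Attributes cannot hold complex structure and are less convenient than child elements when parsing (for example, line breaks in an attribute value are normalized to spaces, whereas a child element preserves multi-line text).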
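
To make the difference concrete, the sketch below stores the same two-line text once in an attribute and once in a child element, then reads both back with the JDK's DOM parser. The class AttributeVsElement, its helpers, and the note/summary names are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeVsElement {

    // Parses an XML string and returns its root element (checked exceptions are wrapped).
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Value of the named attribute on the root element.
    static String attributeValue(String xml, String name) {
        return parseRoot(xml).getAttribute(name);
    }

    // Text content of the first descendant element with the given tag name.
    static String childText(String xml, String tag) {
        return parseRoot(xml).getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<book note=\"line one\nline two\">"
                   + "<summary>line one\nline two</summary>"
                   + "</book>";
        // The parser normalizes the line break inside the attribute value to a space.
        System.out.println(attributeValue(xml, "note").contains("\n"));   // false
        // Element content keeps the line break.
        System.out.println(childText(xml, "summary").contains("\n"));     // true
    }
}
```

This is one reason multi-line or structured data is usually better placed in child elements.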

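III. Naming Rules and Comments

XML has clear rules for naming elements and attributes, and it supports comments to improve readability.

1. Naming Rules
  • Names may contain letters, digits, underscores (_), hyphens (-), periods (.), and colons (:). The colon is used for namespaces and should be avoided in ordinary names.
  • Names must start with a letter or an underscore, not with a digit or another punctuation mark (for example, <1book> is invalid).
  • Names cannot contain spaces (for example, <book title> is invalid; use <book-title> or <book_title> instead).
  • Names cannot start with xml (in any case combination, such as XML or Xml): such names are reserved by the W3C for standardized features (such as namespace declarations).
  • Names are case-sensitive: <Title> and <title> are two different elements.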
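
The naming rules above can be approximated with a small check. This is a simplified sketch, not the full XML name grammar: it assumes ASCII-only names (the XML specification also allows many non-ASCII letters), and it rejects colons entirely, following the recommendation above. NameCheck and isAcceptableName are hypothetical names.

```java
import java.util.regex.Pattern;

public class NameCheck {

    // ASCII-only approximation: first character is a letter or underscore,
    // later characters may also be digits, periods, or hyphens.
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_.-]*");

    static boolean isAcceptableName(String name) {
        if (!NAME.matcher(name).matches()) {
            return false;
        }
        // Names beginning with "xml" in any letter case are reserved.
        return !name.regionMatches(true, 0, "xml", 0, 3);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"book", "book-title", "_id", "1book", "book title", "XmlData"}) {
            System.out.println(name + " -> " + isAcceptableName(name));
        }
    }
}
```

For real validation it is safer to let an XML parser decide, since it implements the complete character ranges defined by the specification.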
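2. Comments

XML comments are written as <!-- comment text -->. Note that:

  • Comments cannot be nested (for example, <!-- outer <!-- inner --> comment --> is invalid); more generally, the string -- must not appear inside a comment.
  • Comments cannot appear before the XML declaration (<?xml version="1.0" encoding="UTF-8"?>); when present, the declaration must be the very first thing in the document.
  • Comments cannot be placed inside a tag (for example, <book <!-- comment --> category="fiction"> is invalid).

Example:

<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <!-- A comment may appear between elements -->
    <author>F. Scott Fitzgerald</author>
  </book>
</bookstore>
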
IV. Namespaces: Resolving Naming Conflicts

When XML documents from different sources are combined, elements with the same name but different meanings can collide (for example, two documents both have a <title> element, one meaning a book title and the other a web page title). Namespaces distinguish such elements by giving each vocabulary a unique identifier.

1. How Namespaces Work

A namespace is identified by a Uniform Resource Identifier (URI), usually a URL (such as https://example.com/books). The URL does not need to be reachable; it serves only as a unique identifier.

Namespaces are declared with the xmlns attribute, in one of two forms:

2. Default Namespace

Once a default namespace is declared, every unprefixed element in its scope belongs to that namespace.
Format: xmlns="URI"

Example:

<bookstore xmlns="https://example.com/books">
  <book>
    <title>The Great Gatsby</title> <!-- belongs to the https://example.com/books namespace -->
  </book>
</bookstore>

3. Prefixed Namespaces

When several namespaces are needed at once, prefixes tell them apart. Format: xmlns:prefix="URI". An element is referenced by putting "prefix:" in front of its name.

Example: a document contains both "bookstore" and "library" elements, and both have a <title>:

<!-- Declare two namespaces with the prefixes book and lib -->
<root xmlns:book="https://example.com/books"
      xmlns:lib="https://example.com/library">
  <book:title>The Great Gatsby</book:title> <!-- bookstore namespace -->
  <lib:title>New York Public Library</lib:title> <!-- library namespace -->
</root>

4. Namespace Scope

A namespace declaration applies to the element that carries it and to that element's descendants, unless a descendant overrides it with a new declaration.

Example:

<bookstore xmlns="https://example.com/books">
  <book>
    <title>1984</title> <!-- inherits the parent's namespace -->
  </book>
  <magazine xmlns="https://example.com/magazines">
    <title>Time</title> <!-- overridden by the new namespace -->
  </magazine>
</bookstore>

V. Constraining XML Documents: DTD and XML Schema

To ensure that an XML document has the expected structure (for example, that <book> must contain <title> and <author>), the rules are written in a schema language. The common options are DTD and XML Schema (XSD).

1. DTD (Document Type Definition)

DTD is the earliest XML constraint mechanism. Its syntax is simple, but its capabilities are limited and it has no data types.

(1) Declaring a DTD

  • Internal DTD: the rules are embedded in the XML document with <!DOCTYPE root-element [ ... ]>.
    Example:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE bookstore [
      <!-- bookstore: contains one or more book child elements -->
      <!ELEMENT bookstore (book+)>
      <!-- book: contains title, author, and price child elements, in that order -->
      <!ELEMENT book (title, author, price)>
      <!-- title and author are text elements -->
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT author (#PCDATA)>
      <!-- price is a text element -->
      <!ELEMENT price (#PCDATA)>
      <!-- the category attribute of book: type CDATA, required -->
      <!ATTLIST book category CDATA #REQUIRED>
    ]>
    <bookstore>
      <book category="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <price>15.99</price>
      </book>
    </bookstore>

  • External DTD: the rules live in a separate file that the XML document references by file name, which suits several documents sharing one set of constraints.
    Example:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Reference the external DTD file bookstore.dtd -->
    <!DOCTYPE bookstore SYSTEM "bookstore.dtd">
    <bookstore>
      <!-- content must satisfy the constraints in bookstore.dtd -->
    </bookstore>

(2) Core DTD Syntax

  • Element declaration: <!ELEMENT element-name (content-model)>
    The content model can contain:
    • #PCDATA: the element contains text (Parsed Character Data).
    • A list of child elements: (title, author) means the element must contain title followed by author, in that order.
    • Occurrence indicators: + (one or more), * (zero or more), ? (zero or one). The | operator expresses a choice: (title | name) means exactly one of the two.

  • Attribute declaration: <!ATTLIST element-name attribute-name type default>
    Types: CDATA (text), ID (unique identifier), IDREF (reference to another ID), an enumeration written directly as a list of allowed values such as (fiction|non-fiction), and others. There is no ENUM keyword.
    Defaults: #REQUIRED (must appear), #IMPLIED (optional), #FIXED "value" (fixed value), or a literal default value (such as "fiction").

  • Entity declaration: <!ENTITY entity-name "entity-value">, which defines a reusable piece of text (similar to a variable).
    Example:

    <!DOCTYPE bookstore [
      <!ENTITY publisher "Scribner"> <!-- internal entity -->
    ]>
    <bookstore>
      <book>
        <publisher>&publisher;</publisher> <!-- entity reference; equivalent to <publisher>Scribner</publisher> -->
      </book>
    </bookstore>

(3) Limitations of DTD

  • No data types: a DTD cannot require <price> to be a number; it can only declare it as text.
  • Non-XML syntax: DTD has its own syntax rather than XML syntax, which adds to the learning cost.
  • Security risk: external entities (such as <!ENTITY ext SYSTEM "file:///etc/passwd">) can enable XXE (XML External Entity) attacks that leak sensitive files from the server.

2. XML Schema (XSD): The Successor to DTD

XML Schema (XSD for short) is a W3C-recommended schema language written in XML syntax. It is more powerful than DTD and addresses many of its shortcomings.

(1) Advantages of XSD

  • Data types: elements and attributes can be constrained to integers, dates, decimals, and so on (for example, <price> must be of type decimal).
  • XML-based syntax: there is no separate syntax to learn, and a schema can be processed with an ordinary XML parser.
  • Namespace support: namespaces can be declared and referenced in a schema, which suits complex documents.
  • More flexible constraints: custom types, type inheritance, nested constraints, and more.

(2) Basic Structure of an XSD

An XSD document is itself an XML document. Its root element is usually <xsd:schema>, identified by the namespace http://www.w3.org/2001/XMLSchema.

Example: an XSD for the bookstore document (bookstore.xsd):

<?xml version="1.0" encoding="UTF-8"?>
<!-- targetNamespace: the namespace this schema describes -->
<!-- elementFormDefault="qualified": child elements must be namespace-qualified in instance documents -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="https://example.com/books"
            elementFormDefault="qualified">

  <!-- Declare the root element bookstore -->
  <xsd:element name="bookstore">
    <xsd:complexType>
      <xsd:sequence>
        <!-- bookstore contains one or more book elements -->
        <xsd:element name="book" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <!-- book must contain title, author, and price, in that order -->
              <xsd:element name="title" type="xsd:string"/>
              <xsd:element name="author" type="xsd:string"/>
              <!-- price must be a decimal greater than 0 -->
              <xsd:element name="price">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:decimal">
                    <xsd:minExclusive value="0"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
            </xsd:sequence>
            <!-- the category attribute of book must be one of the enumerated values -->
            <xsd:attribute name="category" use="required">
              <xsd:simpleType>
                <xsd:restriction base="xsd:string">
                  <xsd:enumeration value="fiction"/>
                  <xsd:enumeration value="non-fiction"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:attribute>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

(3) Referencing an XSD from an XML Document

An XML document references an XSD through the xsi:schemaLocation attribute, whose value is a whitespace-separated pair: the namespace URI followed by the location of the XSD file.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- The default namespace matches the XSD's targetNamespace; xsi:schemaLocation associates the XSD -->
<bookstore xmlns="https://example.com/books"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="https://example.com/books bookstore.xsd">
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <price>15.99</price> <!-- satisfies the decimal constraint in the XSD -->
  </book>
</bookstore>

VI. Parsing XML

Parsing is the core step of XML processing: reading an XML document and turning it into a data structure the program can work with. Common approaches are DOM, SAX, and StAX.

1. DOM (Document Object Model)

DOM loads the entire XML document into memory and builds a tree (the document tree) in which every node (element, attribute, text, and so on) is part of the tree.

  • Pros: any node can be accessed at random, and nodes can be added, removed, changed, and queried (for example, changing the value of <price> or deleting a <book> element), which suits documents that are modified frequently.
  • Cons: high memory usage; for very large XML documents (gigabytes in size) it can run out of memory.

Example (Java DOM parsing):

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DOMExample {
    public static void main(String[] args) throws Exception {
        // Load the XML document into memory and build a Document object
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse("bookstore.xml");
        // Get all book elements
        NodeList books = doc.getElementsByTagName("book");
        // Iterate over the book elements and print each title
        for (int i = 0; i < books.getLength(); i++) {
            // Look the title up by name: indexing into getChildNodes() is fragile,
            // because whitespace between tags also produces text nodes
            Element book = (Element) books.item(i);
            String title = book.getElementsByTagName("title").item(0).getTextContent();
            System.out.println(title);
        }
    }
}

2. SAX (Simple API for XML)

SAX is an event-driven parser: it reads the XML document sequentially as a stream and fires an event (such as startElement, endElement, or characters) whenever it meets a tag, text, or other content. The program handles the data by listening for these events.

  • Pros: low memory usage (the whole document is never loaded), which suits large XML documents.
  • Cons: read-only (the document cannot be modified), no random access to nodes (parsing is strictly sequential), and code for complex logic becomes verbose.

Example (Java SAX parsing):

<pre><code class="prism language-java"><span class="token keyword">import</span> <span class="token namespace">javax<span class="token punctuation">.</span>xml<span class="token punctuation">.</span>parsers<span class="token punctuation">.</span></span><span class="token class-name">SAXParserFactory</span><span class="token punctuation">;</span> <span class="token keyword">import</span> <span class="token namespace">org<span class="token punctuation">.</span>xml<span class="token punctuation">.</span>sax<span class="token punctuation">.</span>helpers<span class="token punctuation">.</span></span><span class="token class-name">DefaultHandler</span><span class="token punctuation">;</span> <span class="token keyword">import</span> <span class="token namespace">org<span class="token punctuation">.</span>xml<span class="token punctuation">.</span>sax<span class="token punctuation">.</span></span><span class="token class-name">Attributes</span><span class="token punctuation">;</span> <span class="token keyword">public</span> <span class="token keyword">class</span> <span class="token class-name">SAXExample</span> <span class="token keyword">extends</span> <span class="token class-name">DefaultHandler</span> <span class="token punctuation">{</span> <span class="token keyword">private</span> <span class="token keyword">boolean</span> isTitle <span class="token operator">=</span> <span class="token boolean">false</span><span class="token punctuation">;</span> <span class="token comment">// Triggered when a start tag is encountered</span> <span class="token annotation 
punctuation">@Override</span> <span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">startElement</span><span class="token punctuation">(</span><span class="token class-name">String</span> uri<span class="token punctuation">,</span> <span class="token class-name">String</span> localName<span class="token punctuation">,</span> <span class="token class-name">String</span> qName<span class="token punctuation">,</span> <span class="token class-name">Attributes</span> attributes<span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token keyword">if</span> <span class="token punctuation">(</span>qName<span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span><span class="token string">"title"</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> isTitle <span class="token operator">=</span> <span class="token boolean">true</span><span class="token punctuation">;</span> <span class="token punctuation">}</span> <span class="token punctuation">}</span> <span class="token comment">// 遇到文本内容时触发</span> <span class="token annotation punctuation">@Override</span> <span class="token keyword">public</span> <span class="token keyword">void</span> <span class="token function">characters</span><span class="token punctuation">(</span><span class="token keyword">char</span><span class="token punctuation">[</span><span class="token punctuation">]</span> ch<span class="token punctuation">,</span> <span class="token keyword">int</span> start<span class="token punctuation">,</span> <span class="token keyword">int</span> length<span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token keyword">if</span> <span class="token punctuation">(</span>isTitle<span class="token punctuation">)</span> <span class="token 
punctuation">{</span> <span class="token class-name">System</span><span class="token punctuation">.</span>out<span class="token punctuation">.</span><span class="token function">println</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">String</span><span class="token punctuation">(</span>ch<span class="token punctuation">,</span> start<span class="token punctuation">,</span> length<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// 打印title内容</span> isTitle <span class="token operator">=</span> <span class="token boolean">false</span><span class="token punctuation">;</span> <span class="token punctuation">}</span> <span class="token punctuation">}</span> <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span> args<span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span> <span class="token class-name">SAXParserFactory</span><span class="token punctuation">.</span><span class="token function">newInstance</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">newSAXParser</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span><span class="token string">"bookstore.xml"</span><span class="token punctuation">,</span> <span class="token keyword">new</span> <span class="token 
class-name">SAXExample</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token punctuation">}</span> <span class="token punctuation">}</span> </code></pre> <h6>3. StAX(Streaming API for XML)</h6> <p>StAX是介于DOM和SAX之间的解析方式,允许程序主动“拉取”事件(而非被动等待事件触发),兼具SAX的低内存占用和DOM的灵活性。</p> <ul> <li>优点:可控制解析过程(如暂停、继续),支持读写双向操作,适合需要生成XML文档的场景。</li> </ul> <h5>七、XML的应用场景与局限性</h5> <p>尽管JSON在轻量级数据交换中逐渐替代XML,但XML凭借其严格的结构和强大的约束能力,仍在诸多领域发挥重要作用。</p> <h6>1. 典型应用场景</h6> <ul> <li><strong>配置文件</strong>:Spring框架、AndroidManifest.xml、Maven的pom.xml等,利用XML的结构化特性定义程序行为。</li> <li><strong>办公文档</strong>:Office Open XML(.docx、.xlsx)、ODF(开放文档格式)本质是XML文件的压缩包,通过XML描述文档结构。</li> <li><strong>数据交换</strong>:早期Web服务(SOAP)基于XML传输数据;银行、物流等行业的系统间数据交换仍广泛使用XML(需严格验证格式)。</li> <li><strong>文档存储与发布</strong>:电子书(EPUB)、技术文档(DocBook)用XML存储内容,通过XSLT转换为HTML、PDF等格式发布。</li> <li><strong>订阅源</strong>:RSS(简易信息聚合)和Atom协议用XML定义内容更新,支持博客、新闻的订阅。</li> </ul> <h6>2. 局限性</h6> <ul> <li>冗余度高:标签成对出现(如 <code><title>...),相比JSON("title": "...")更占用空间。

  • 解析效率低:复杂的结构和约束验证导致解析速度慢于JSON。
  • 学习成本高:命名空间、Schema等概念较复杂,新手入门难度大。
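The StAX section earlier describes the pull model only in prose, and parsing cost is one of the limitations listed above, so a small illustration may help. The following is a minimal sketch using the standard `javax.xml.stream` API. The class name `StAXExample`, the helper `readTitles`, and the inline sample document are assumptions made for illustration and do not come from the original article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StAXExample {
    // Hypothetical helper: pulls events one at a time and collects the text
    // of every <title> element without building a tree in memory.
    static List<String> readTitles(String xml) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            // The caller drives the loop: next() is only invoked when we ask for it.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads the text up to the matching end tag
                    titles.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        // Sample document assumed for this sketch, modeled on the bookstore example
        String xml = "<bookstore>"
                + "<book category=\"fiction\"><title>The Great Gatsby</title></book>"
                + "<book category=\"non-fiction\"><title>Sapiens</title></book>"
                + "</bookstore>";
        for (String title : readTitles(xml)) {
            System.out.println(title);
        }
    }
}
```

Because the loop belongs to the caller, it can stop as soon as it has what it needs (for instance, by breaking after the first title), which is harder to express with SAX callbacks.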
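<h5>VIII. XML Compared with Other Formats</h5>
<table>
<thead>
<tr><th>Feature</th><th>XML</th><th>JSON</th><th>HTML</th></tr>
</thead>
<tbody>
<tr><td>Design goal</td><td>Transport and store data, with emphasis on structural constraints</td><td>Lightweight data exchange; concise and efficient</td><td>Display data and define web page structure</td></tr>
<tr><td>Tags</td><td>User-defined, must be closed, case-sensitive</td><td>No tags; uses key-value pairs; case-sensitive</td><td>Predefined tags; some closing tags may be omitted</td></tr>
<tr><td>Constraints</td><td>Strict constraints through DTD and XSD</td><td>None built in (additional tools required)</td><td>None (relies on browser error tolerance)</td></tr>
<tr><td>Extensibility</td><td>Strong (namespaces resolve naming conflicts)</td><td>Weak (no built-in conflict resolution)</td><td>Weak (fixed set of tags)</td></tr>
<tr><td>Typical use</td><td>Complex structures and scenarios that need strict validation</td><td>Lightweight API data exchange</td><td>Web page display</td></tr>
</tbody>
</table>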
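To make the redundancy point and the "tags versus key-value pairs" comparison concrete, the sketch below encodes the same record once as XML and once as JSON and measures the UTF-8 size of each. The class name `SizeComparison`, the helper `utf8Length`, and the two sample strings are assumptions chosen for illustration. The exact ratio depends on the data, so treat the result as indicative only.

```java
import java.nio.charset.StandardCharsets;

public class SizeComparison {
    // The same book record in both formats (sample data assumed for this sketch)
    static final String XML = "<book><title>Sapiens</title><price>22.50</price></book>";
    static final String JSON = "{\"title\":\"Sapiens\",\"price\":22.50}";

    // Number of bytes the text occupies when encoded as UTF-8
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("XML bytes:  " + utf8Length(XML));
        System.out.println("JSON bytes: " + utf8Length(JSON));
    }
}
```

The XML form repeats every element name in a closing tag, which is where most of the extra bytes come from. Compression narrows the gap considerably in practice, so size alone rarely decides the choice of format.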
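<h5>IX. Summary</h5>
<p>XML is a powerful extensible markup language built around strict syntax, a structured tree model, and strong constraint mechanisms (DTD and XSD). It is widely used for configuration files, data exchange, and document storage. JSON is more popular in lightweight scenarios, but XML remains hard to replace where complex structure and strict validation are required, such as enterprise systems and document standards. Understanding XML's syntax rules, namespaces, constraint mechanisms, and parsing approaches is essential both for working with existing systems (such as Spring configuration and Office documents) and for designing cross-platform data exchange.</p>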
