Java Web Crawlers (Part 1)

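Table of Contents

  • Preface
  • I. Web Crawlers
    • 1. Introduction
    • 2. The Robots Protocol
    • 3. Laws and Regulations
  • II. Background Knowledge
    • 1. HttpClient
    • 2. Jsoup
  • III. Worked Examples
    • 1. Example 1
    • 2. Example 2
  • IV. Summary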


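Preface

In the era of big data, collecting information is important work, and the amount of data on the internet is enormous. Gathering it by hand is slow, tedious, and expensive, so the real question is how to fetch internet data automatically and efficiently. Crawler technology exists to solve exactly this problem.

I. Web Crawlers

1. Introduction

A web crawler (also called a web spider or web robot) is an automated program that browses the internet and fetches information from it. It is one of the most widely used techniques for gathering information online.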

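The technique was first used by search engines, where it is one of the core technologies for acquiring data. As data resources have grown explosively, the use cases and business models built on web crawlers have become broader and more varied. Common examples include content aggregation and generation on news platforms, price comparison on e-commerce platforms, and weather forecast applications built on meteorological data.

A good crawler can process large volumes of data and saves people a great deal of time on this kind of work. As a practical tool for fetching data, the web crawler underpins the ideals of an open internet and shared information resources. Like a swarm of worker bees, crawlers keep pushing forward the construction and development of cyberspace.

How it works:

A traditional crawler starts from the URLs of one or more seed pages and visits and parses pages automatically by imitating a browser. It follows links from one page to the next and traverses the web level by level, fetching each page's HTML source and extracting useful information from it, such as text, images, and links.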
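
The traversal described above can be sketched as a breadth-first walk over a queue of pending URLs. This is a minimal sketch, not a production crawler: the class name `CrawlFrontierSketch`, the `maxPages` limit, and the in-memory `linkGraph` (a stand-in for "download the page and extract its links") are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class CrawlFrontierSketch {

    /**
     * Breadth-first traversal starting from the seed URLs.
     * linkGraph is a hypothetical stand-in for fetching a page and extracting its links.
     */
    static List<String> crawl(List<String> seeds, Map<String, List<String>> linkGraph, int maxPages) {
        Queue<String> frontier = new ArrayDeque<>(seeds); // URLs waiting to be visited
        Set<String> visited = new LinkedHashSet<>();      // URLs already visited, in visit order

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already visited: skip it so that link cycles do not loop forever
            }
            // A real crawler would download and parse the page here.
            for (String link : linkGraph.getOrDefault(url, List.of())) {
                if (!visited.contains(link)) {
                    frontier.add(link);
                }
            }
        }
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "/index", List.of("/a", "/b"),
                "/a", List.of("/b", "/c"),
                "/b", List.of("/index"),
                "/c", List.<String>of());
        System.out.println(crawl(List.of("/index"), graph, 10));
    }
}
```

The visited set is what keeps the crawler from looping when pages link back to each other, and the page limit keeps the sketch from running without bound.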

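Function and value:

Web crawling is an important tool for realizing the internet's spirit of openness and sharing. Allowing collectors to gather data with crawlers is an important part of opening up and sharing data. By aggregating information and providing links, crawlers can bring more traffic to the data owner's website. Well-intentioned, moderate crawling of this kind is consistent with what data owners expect when they open up and share their data.

Functionally, a crawler usually consists of three parts: data collection, processing, and storage.

Applications of crawlers:

  • Building and optimizing search engines
  • Obtaining more data sources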

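2. The Robots Protocol

Crawlers are powerful, but we cannot use them however we please. A crawler should follow the robots protocol, an ethical convention that is widely accepted across the internet industry.

The Robots protocol (also called the crawler protocol or robot protocol) is formally the Robots Exclusion Protocol. A website uses it to tell search engines which pages may be crawled and which may not. The protocol is only a convention and cannot by itself guarantee a site's privacy.

The Robots protocol is based on the following principles:

  1. Search technology should serve people, while respecting the wishes of information providers and protecting their privacy.

  2. Websites have a duty to protect their users' personal information and privacy from being violated.

When using a crawler, keep the following points in mind:

  1. Do not visit or crawl websites that carry harmful content.

  2. Respect copyright: do not use original content for other purposes, especially commercial ones, without permission.

  3. Strictly follow the site's robots.txt.

How to view a site's robots protocol:

Most websites provide their own robots.txt file, which describes the site's crawling rules. To view it, append /robots.txt to the domain name in the browser's address bar and press Enter.

For example, Baidu's robots protocol: https://www.baidu.com/robots.txt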

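  • User-agent: the crawler that the rules below apply to
  • Allow: paths that may be crawled
  • Disallow: paths that may not be crawled
  • Sitemap: the location of the site map, which lists the site's pages for crawlers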

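The file shows that for any crawler not listed by name, Baidu specifies:

User-agent: *
Disallow: /

which forbids those crawlers from accessing any part of the site.

For a crawler such as Baiduspider, it specifies:

User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/
Disallow: /bh

which means Baiduspider may not crawl paths beginning with /baidu, /s?, /ulink?, and so on.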
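
To make the rules above concrete, here is a rough sketch of how a crawler might check a path against Disallow rules before requesting it. It is deliberately simplified and only one possible approach: it handles User-agent and Disallow lines with plain prefix matching, and it ignores Allow, wildcards, and the precedence rules of the full specification. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class RobotsTxtSketch {

    /** Maps each lower-cased user-agent name to its list of Disallow path prefixes. */
    static Map<String, List<String>> parse(String robotsTxt) {
        Map<String, List<String>> rules = new HashMap<>();
        List<String> currentAgents = new ArrayList<>();
        boolean lastWasAgent = false;
        for (String rawLine : robotsTxt.split("\\R")) {
            String line = rawLine.trim();
            int colon = line.indexOf(':');
            if (line.isEmpty() || line.startsWith("#") || colon < 0) {
                continue; // skip blank lines, comments, and malformed lines
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastWasAgent) {
                    currentAgents = new ArrayList<>(); // a new group of rules starts here
                }
                String agent = value.toLowerCase(Locale.ROOT);
                currentAgents.add(agent);
                rules.putIfAbsent(agent, new ArrayList<>());
                lastWasAgent = true;
            } else {
                // An empty Disallow value means "nothing is disallowed", so it adds no prefix.
                if (field.equals("disallow") && !value.isEmpty()) {
                    for (String agent : currentAgents) {
                        rules.get(agent).add(value);
                    }
                }
                lastWasAgent = false;
            }
        }
        return rules;
    }

    /** Uses the group named after the agent if present, otherwise the "*" group. */
    static boolean isAllowed(Map<String, List<String>> rules, String userAgent, String path) {
        List<String> disallowed = rules.get(userAgent.toLowerCase(Locale.ROOT));
        if (disallowed == null) {
            disallowed = rules.getOrDefault("*", List.of());
        }
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robotsTxt = "User-agent: Baiduspider\n"
                + "Disallow: /baidu\n"
                + "Disallow: /s?\n"
                + "\n"
                + "User-agent: *\n"
                + "Disallow: /\n";
        Map<String, List<String>> rules = parse(robotsTxt);
        System.out.println(isAllowed(rules, "Baiduspider", "/s?wd=java"));
        System.out.println(isAllowed(rules, "Baiduspider", "/index.html"));
        System.out.println(isAllowed(rules, "SomeOtherBot", "/index.html"));
    }
}
```

For real crawling, a maintained robots.txt parser is a safer choice than a hand-written one like this.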


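3. Laws and Regulations

Why web crawlers need to be regulated:

  • (1) Malicious crawling harms the rights and business freedom of others. Accessing and collecting website data with crawlers already generates a considerable share of network traffic. According to some analyses, two thirds of this crawling is malicious, and the proportion is still rising. Malicious bots can plunder resources and weaken competitors. They are often abused to scrape content from one site and republish it on another without showing the source or a link. This helps illegal organizations set up fake websites, creates a risk of fraud, and enables theft of intellectual property and trade secrets.
  • (2) Malicious crawlers endanger network security. In terms of behavior, malicious crawlers can have the effect of a DDoS attack on the target site. When hundreds or thousands of bots interact with the same site, the site can no longer tell which traffic comes from real users and which from bots. If a platform bases marketing decisions on flawed data polluted by fake visits, it may lose a great deal of time and money. Although the robots protocol, as an internationally accepted industry convention, lets a site list in its robots.txt file what may not be crawled, it cannot fundamentally stop malicious bots, and the protocol itself offers no technical protection. Malicious crawling has already brought commercial and technical risks to internet platforms and affected their normal operation and business.
  • (3) Current legal regulation and its shortcomings. Improper access, collection, and interference by crawlers should be regulated by law. At present, China's legal rules on crawlers are concentrated in the Criminal Law provisions on crimes against computer information systems. In terms of protected legal interests, the Criminal Law covers crawling that seriously affects the target site and is socially harmful. A person who violates these provisions by using a crawler to access and collect data stored, processed, or transmitted by an ordinary website may commit the crime of illegally obtaining data from a computer information system. If illegal control is exercised in the course of crawling, it may constitute the crime of illegally controlling a computer information system. In addition, using a crawler in a way that interferes with the target site's functions, increasing its traffic, slowing its responses, and disrupting normal operation, may constitute the crime of damaging a computer information system.

Because criminal law is applied with restraint, it can only punish crawling that causes serious social harm and cannot be dealt with by means other than criminal penalties. It does little to regulate ordinary harmful behavior, such as crawlers that interfere with the normal operation of other websites or access and collect data excessively. China therefore needs administrative regulation outside the criminal law, and a complete system of criminal, administrative, and civil liability, to protect the lawful rights and interests of internet platforms and maintain order in cyberspace.

Suggestions for improving the regulation of web crawlers:

Cases involving crawlers show that users often offer plausible reasons for crawling that may be illegal. Typical defenses include: "I can do anything with publicly accessible data", "this is fair use", "this is similar to what search engines do", "I only used an automated script and did not use the data to build a website", "I complied with their robots protocol", "the site has no robots protocol", and "I only use the data for personal research, with no commercial purpose". Judging whether crawling is illegal by whether the conduct was malicious, or by other subjective factors, is therefore difficult. The goal of regulating crawlers is to strike a balance between open, shared data resources on one side and the business freedom of internet platforms and website security on the other. Following the principle of technological neutrality, regulation should be based on objective results, that is, whether the crawling interferes with the normal operation of a website or seriously harms the lawful rights and interests of others.

In the digital age, with data use at the center of the internet industry, rules for accessing and obtaining data urgently need to be established. Beyond technical and market measures, legal measures are needed to regulate the use of crawler technology and to set rules for specific data access scenarios. Using data security legislation to define when crawling seriously affects a website's normal operation, and regulating harmful crawling appropriately, reflects in the field of data governance China's basic principle of giving equal weight to security and development in internet governance. The aim is to find a balance among all parties involved in data activities, taking into account open data sharing, the business freedom and security of data owners, and the public interest, so that data flows freely, lawfully, and in an orderly way.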

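Use the technology with care:

  1. Limit how often your crawler sends requests so that you do not bring down the other party's server.
  2. Do not crawl information involving personal privacy.
  3. Do not break through a site's anti-crawling measures; the consequences can be serious.
  4. Do not use crawled data for unfair competition.
  5. Do not crawl paid content.
  6. Code that circumvents anti-crawling measures is best not uploaded to the internet.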
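
Point 1 can be enforced in code by spacing out requests to the same host. The following is a minimal sketch, assuming a fixed minimum interval per host; the class `PoliteDelaySketch`, its `reserve` method, and the 200 ms interval in `main` are illustrative choices, not values taken from any standard.

```java
import java.util.HashMap;
import java.util.Map;

class PoliteDelaySketch {
    private final long minIntervalMillis;
    // For each host, the earliest time (in milliseconds) at which the next request may start
    private final Map<String, Long> nextAllowedAt = new HashMap<>();

    PoliteDelaySketch(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Reserves the next request slot for the host and returns how many
     * milliseconds the caller should wait before sending the request.
     */
    long reserve(String host, long nowMillis) {
        long allowedAt = nextAllowedAt.getOrDefault(host, nowMillis);
        long startAt = Math.max(allowedAt, nowMillis);
        nextAllowedAt.put(host, startAt + minIntervalMillis);
        return startAt - nowMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        PoliteDelaySketch delay = new PoliteDelaySketch(200);
        for (int i = 0; i < 3; i++) {
            long wait = delay.reserve("www.example.com", System.currentTimeMillis());
            Thread.sleep(wait);
            System.out.println("request " + i + " may be sent after waiting " + wait + " ms");
        }
    }
}
```

Passing the current time in as a parameter keeps the scheduling logic easy to test; a multithreaded crawler would also need to synchronize access to the map.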

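II. Background Knowledge

1. HttpClient

Because a crawler imitates a browser, it has to send HTTP requests. In Java, Apache provides HttpClient, a client-side toolkit that supports the HTTP protocol, and we can use it to send requests.

For example, use HttpClient to request https://www.rgbku.com/chaxun.html (an RGB color lookup tool).

The code can be written like this: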

import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

public class HttpClientDemo {
    public static void main(String[] args) {
        // Create the HttpGet object and set the URL to request
        HttpGet httpGet = new HttpGet("https://www.rgbku.com/chaxun.html");
        // try-with-resources closes the response first and then the client,
        // even if an exception is thrown
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             // Send the request
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // Use the status code to check whether the request succeeded (usually 200)
            if (response.getStatusLine().getStatusCode() == 200) {
                // Read the response body as a UTF-8 string
                HttpEntity entity = response.getEntity();
                String html = EntityUtils.toString(entity, Consts.UTF_8);
                // Print the response body
                System.out.println(html);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

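The printed output shows that the page's HTML has been retrieved.

Besides GET, HttpClient can also send POST, PUT, DELETE, and other kinds of requests. It can carry parameters, tokens, and cookies and set the User-Agent, so it can closely imitate a user visiting a site in a browser.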
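
As a small illustration of carrying a User-Agent, a token, and a cookie on a request, the sketch below builds (but does not send) a request and prints its headers. Note that it uses the JDK's built-in `java.net.http` API (Java 11+) instead of Apache HttpClient, so that it runs without extra dependencies. The URL, the header values, and the class name are placeholders chosen for this example.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class RequestHeadersSketch {

    /** Builds a GET request carrying a User-Agent, a bearer token, and a cookie. */
    static HttpRequest buildRequest(String url, String userAgent, String token, String cookie) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .header("Authorization", "Bearer " + token)
                .header("Cookie", cookie)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest(
                "https://www.example.com/",
                "Mozilla/5.0 (compatible; DemoCrawler/1.0)", // placeholder User-Agent
                "demo-token",                                // placeholder token
                "sessionId=abc123");                         // placeholder cookie
        // Nothing is sent over the network; we only inspect the request that was built.
        request.headers().map().forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```

With Apache HttpClient, request headers are set on the request object in a similar way before it is executed.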

GET requests:

    /**
     * Send a GET request without parameters
     */
    public static void sendGet() throws Exception {
        // Create the HttpGet object and set the URL to request
        HttpGet httpGet = new HttpGet("https://www.xxxx.com");
        // try-with-resources closes the response and then the client
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // Process the response ...
        }
    }

    /**
     * Send a GET request with parameters
     * (URIBuilder is org.apache.http.client.utils.URIBuilder)
     */
    public static void sendGetHasParam() throws Exception {
        // Create the URIBuilder
        URIBuilder uriBuilder = new URIBuilder("https://www.xxxx.com");
        // Set the query parameters
        uriBuilder
                .setParameter("param1", "value1")
                .setParameter("param2", "value2");
        // Create the HttpGet object from the built URI
        HttpGet httpGet = new HttpGet(uriBuilder.build());
        // try-with-resources closes the response and then the client
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // Process the response ...
        }
    }

POST requests:

    /**
     * Send a POST request without parameters
     * (HttpPost is org.apache.http.client.methods.HttpPost)
     */
    public static void sendPost() throws Exception {
        // Create the HttpPost object and set the URL to request
        HttpPost httpPost = new HttpPost("https://www.xxxx.com");
        // try-with-resources closes the response and then the client
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpPost)) {
            // Process the response ...
        }
    }

    /**
     * Send a POST request with parameters
     * (needs org.apache.http.NameValuePair, org.apache.http.message.BasicNameValuePair,
     * and org.apache.http.client.entity.UrlEncodedFormEntity)
     */
    public static void sendPostHasParam() throws Exception {
        // Create the HttpPost object and set the URL to request
        HttpPost httpPost = new HttpPost("https://www.xxxx.com");
        // Collect the form parameters
        List<NameValuePair> params = new ArrayList<>();
        params.add(new BasicNameValuePair("param1", "value1"));
        params.add(new BasicNameValuePair("param2", "value2"));
        /*
         * Create the form entity
         *      parameters: the form data
         *      charset: the encoding
         */
        UrlEncodedFormEntity entity = new UrlEncodedFormEntity(params, Consts.UTF_8);
        // Attach the form entity to the POST request
        httpPost.setEntity(entity);
        // try-with-resources closes the response and then the client
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpPost)) {
            // Process the response ...
        }
    }
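
For intuition about what a form entity puts on the wire: the body of a form POST is the parameter list serialized as `application/x-www-form-urlencoded`. The sketch below produces that encoding with the JDK only. It is an approximation for illustration, not the library's implementation, and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FormEncodingSketch {

    /** Serializes the parameters as name=value pairs joined by '&', percent-encoding as needed. */
    static String encodeForm(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            body.add(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("param1", "value 1");
        params.put("param2", "a&b=c");
        System.out.println(encodeForm(params)); // prints param1=value+1&param2=a%26b%3Dc
    }
}
```

Characters that have a special meaning in a form body, such as spaces, `&`, and `=`, are escaped so that the server can split the pairs back apart unambiguously.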

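Connection pool:

If every request creates its own HttpClient, clients are created and destroyed over and over, which hurts performance. A connection pool solves this problem.

    /**
     * Connection pool
     * (PoolingHttpClientConnectionManager is in org.apache.http.impl.conn)
     */
    public static void poolManager() throws Exception {
        // Create the pooling connection manager
        PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
        // Set the maximum total number of connections
        connectionManager.setMaxTotal(100);
        // Set the maximum number of connections per host (route): a crawler may visit many hosts,
        // and without this limit the connections could be distributed unevenly
        connectionManager.setDefaultMaxPerRoute(10);
        // Send requests through the connection pool
        doGet(connectionManager);
        doGet(connectionManager);
    }

    /**
     * Send an HTTP request through the connection pool
     * @param connectionManager the connection pool
     */
    private static void doGet(PoolingHttpClientConnectionManager connectionManager) throws Exception {
        // Build an HttpClient that takes its connections from the pool
        CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(connectionManager).build();
        // Create the HttpGet object and set the URL to request
        HttpGet httpGet = new HttpGet("https://www.xxxx.com");
        CloseableHttpResponse response = httpClient.execute(httpGet);
        // Process the response ...

        // Release resources
        response.close();
        // Note: do not close httpClient here. The connections are managed by the pool,
        // and closing the client would also shut the pool down.
        // httpClient.close();
    }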

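Configuring the request:

Sometimes, because of the network or the target server, a request needs more time to complete, or it only works after the User-Agent is changed. In these cases the request settings have to be customized.

    /**
     * Configure the request
     * (RequestConfig is org.apache.http.client.config.RequestConfig)
     */
    private static void setRequestInfo() throws Exception {
        // Create the HttpGet object and set the URL to request
        HttpGet httpGet = new HttpGet("https://www.xxxx.com");

        // Build the request configuration
        RequestConfig requestConfig = RequestConfig.custom()
                // Maximum time to establish a connection, in milliseconds
                .setConnectTimeout(1000)
                // Maximum time to wait for a connection from the connection manager, in milliseconds
                .setConnectionRequestTimeout(500)
                // Maximum time to wait for data to arrive, in milliseconds
                .setSocketTimeout(10 * 1000)
                // Other settings can be added here ...
                .build();

        // Apply the configuration to the request
        httpGet.setConfig(requestConfig);

        // try-with-resources closes the response and then the client
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            // Process the response ...
        }
    }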

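HttpClient alone is enough to fetch data, but the response it returns is hard to parse. You would have to do a lot of string processing on the HTML and write regular expressions to match the information you want, so this approach is not normally used for analyzing the data.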
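
To show what that string processing looks like, here is a minimal sketch that pulls the page title out of raw HTML with a regular expression. The class name, the pattern, and the sample HTML are assumptions made for illustration. Even this simple case needs flags for letter case and line breaks, and a pattern like this breaks easily on real-world markup.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTitleSketch {
    // Matches <title ...>text</title>, ignoring case and allowing the text to span lines
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed text of the first title element, if there is one. */
    static Optional<String> extractTitle(String html) {
        Matcher matcher = TITLE.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><head><TITLE>\n  RGB color lookup\n</TITLE></head><body></body></html>";
        System.out.println(extractTitle(html).orElse("(no title)"));
    }
}
```

Every additional field to extract needs another hand-written pattern, which is why a real HTML parser is the more practical tool.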


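2. Jsoup

Jsoup is a Java HTML parser that can parse a URL or HTML text directly. It provides a very convenient API for extracting and manipulating data through DOM methods, CSS selectors, and jQuery-like operations.

Its main features are:

  1. Parse HTML from a URL, a file, or a string
  2. Find and extract data using DOM methods or CSS selectors
  3. Manipulate HTML elements, attributes, and text

Add the dependency: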


<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.15.3</version>
</dependency>

Example: using https://www.rgbku.com/chaxun.html (the RGB color lookup tool) again, get the content of the page's title.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.net.URL;

public class JsoupDemo {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the URL (the timeout is in milliseconds)
        Document document = Jsoup.parse(new URL("https://www.rgbku.com/chaxun.html"), 1000);
        // For example, to get the content of the <title> element in the HTML
        String title = document
                // Get all title elements
                .getElementsByTag("title")
                // Take the first one
                .first()
                // Get the text inside the element
                .text();
        // Print it
        System.out.println("title = " + title);
    }
}
  <p>Log output:</p> 
  <p><a href="http://img.e-com-net.com/image/info8/17b0e4b16709449d9cf95595ebeaa400.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/17b0e4b16709449d9cf95595ebeaa400.jpg" alt="Java-网络爬虫(一)_第6张图片" width="650" height="122" style="border:1px solid black;"></a></p> 
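  <p>Although <code>Jsoup</code> can replace <code>HttpClient</code> and fetch and parse pages directly, it is rarely used that way. Real projects need multithreading, connection pools, proxies, and so on, and <code>Jsoup</code> does not support these well, so <code>Jsoup</code> is generally used only as an <code>HTML</code> parsing tool.</p> 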
  <p><strong>(1) Loading a document:</strong></p> 
  <pre><code class="prism language-java">    <span class="token comment">/**
     * Load a document from a URL
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">loadingUrl</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token comment">/*
         * Parse a URL:
         *      url: the URL to fetch
         *      timeoutMillis: the timeout in milliseconds
         */</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">URL</span><span class="token punctuation">(</span><span class="token string">"https://www.rgbku.com/chaxun.html"</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token number">1000</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Process the document</span>

    <span class="token punctuation">}</span>

    <span class="token comment">/**
     * Load a document from a string
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">loadingString</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token class-name">String</span> html <span class="token operator">=</span> <span class="token string">"html-content"</span><span class="token punctuation">;</span>
        <span class="token comment">// Parse the string</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span>html<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Process the document</span>

    <span class="token punctuation">}</span>

    <span class="token comment">/**
     * Load a document from a file
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">loadingFile</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token comment">// The HTML file</span>
        <span class="token class-name">File</span> file <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">File</span><span class="token punctuation">(</span><span class="token string">"D:\\demo.html"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Parse the file</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span>file<span class="token punctuation">,</span> <span class="token string">"utf8"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Process the document</span>

    <span class="token punctuation">}</span>
</code></pre> 
  <p><strong>(2) Extracting data:</strong></p> 
  <p>Getting elements</p> 
  <ul> 
   <li>Option 1: use <code>DOM</code> methods to extract data from the document 
    <ul> 
     <li>getElementById(String id): get an element by <code>id</code></li> 
     <li>getElementsByTag(String tag): get elements by tag name</li> 
     <li>getElementsByClass(String className): get elements by <code>class</code></li> 
     <li>getElementsByAttribute(String key): get elements that have the given attribute</li> 
     <li>getElementsByAttributeValue(String key, String value): get elements by attribute name and value</li> 
     <li>…</li> 
    </ul> </li> 
  </ul> 
  <p>Example:</p> 
  <pre><code class="prism language-java">    <span class="token comment">/**
     * Get elements with DOM methods
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">getElementByDom</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token comment">// Load the document</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">URL</span><span class="token punctuation">(</span><span class="token string">"https://www.xxxx.com"</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token number">1000</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

        <span class="token comment">// Get an element by id</span>
        <span class="token class-name">Element</span> idElement <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementById</span><span class="token punctuation">(</span><span class="token string">"id"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Get elements by tag name</span>
        <span class="token class-name">Elements</span> tagElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementsByTag</span><span class="token punctuation">(</span><span class="token string">"tag_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Get elements by class</span>
        <span class="token class-name">Elements</span> classElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementsByClass</span><span class="token punctuation">(</span><span class="token string">"class_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Get elements that have the given attribute</span>
        <span class="token class-name">Elements</span> attributeElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementsByAttribute</span><span class="token punctuation">(</span><span class="token string">"attribute"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Get elements by attribute name and value</span>
        <span class="token class-name">Elements</span> attributeValueElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementsByAttributeValue</span><span class="token punctuation">(</span><span class="token string">"attribute"</span><span class="token punctuation">,</span> <span class="token string">"value"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
</code></pre> 
  <ul> 
   <li>Option 2: use selectors to get elements 
    <ul> 
     <li>select(String cssQuery): get elements with a CSS selector</li> 
    </ul> </li> 
  </ul> 
  <p>Example:</p> 
  <pre><code class="prism language-java">    <span class="token comment">/**
     * Get elements with selectors
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">getElementBySelector</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token comment">// Load the document</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">URL</span><span class="token punctuation">(</span><span class="token string">"https://www.xxxx.com"</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token number">1000</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

        <span class="token comment">// Find an element by id</span>
        <span class="token class-name">Element</span> idElement <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"#id"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">first</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Find elements by tag name</span>
        <span class="token class-name">Elements</span> tagElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"tag_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Find elements by class name</span>
        <span class="token class-name">Elements</span> classElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">".class_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Find elements that have the given attribute</span>
        <span class="token class-name">Elements</span> attributeElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"[attribute]"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Find elements by attribute name and value</span>
        <span class="token class-name">Elements</span> attributeValueElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"[attribute=value]"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

        <span class="token comment">/*
         * Selectors can be combined freely
         */</span>

        <span class="token comment">// tag#id: tag + id</span>
        <span class="token class-name">Elements</span> tagIdElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"tag_name#id"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// tag.class: tag + class</span>
        <span class="token class-name">Elements</span> tagClassElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"tag_name.class_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// tag[attribute]: tag + attribute name</span>
        <span class="token class-name">Elements</span> tagAttributeElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"tag_name[attribute]"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// tag[attribute].class: tag + attribute name + class</span>
        <span class="token class-name">Elements</span> tagAttributeClassElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"tag_name[attribute].class_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// ancestor child: find child elements anywhere beneath the ancestor element</span>
        <span class="token class-name">Elements</span> ancestorChildElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"ancestor child"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// parent > child: find direct child elements</span>
        <span class="token class-name">Elements</span> parentChildElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"parent > child"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// parent > *: find all direct children of parent</span>
        <span class="token class-name">Elements</span> allChildElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"parent > *"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
</code></pre> 
  <p>Working with element data</p> 
  <ul> 
   <li>attr(String key): get an attribute value</li> 
   <li>attr(String key, String value): set an attribute</li> 
   <li>attributes(): get all attributes</li> 
   <li>id(): get the id</li> 
   <li>className(): get the class name</li> 
   <li>classNames(): get the set of class names</li> 
   <li>text(): get the text content</li> 
   <li>text(String value): set the text content</li> 
   <li>html(): get the inner HTML</li> 
   <li>html(String value): set the inner HTML</li> 
   <li>outerHtml(): get the outer HTML</li> 
   <li>data(): get the data content (for example, of script and style tags)</li> 
   <li>tag(): get the tag</li> 
   <li>tagName(): get the tag name</li> 
   <li>…</li> 
  </ul> 
  <hr> 
  <h2>III. Worked Examples</h2> 
  <p> </p> 
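  <p>A crawler's workflow usually includes the following steps:</p> 
  <ol> 
   <li> <p>Choose the starting points: specify one or more seed <code>URL</code>s as the entry points for crawling.</p> </li> 
   <li> <p>Download pages: send requests to the server over <code>HTTP</code> or <code>HTTPS</code> and download each page's <code>HTML</code> source.</p> </li> 
   <li> <p>Parse pages: parse the <code>HTML</code> source and extract the information you need.</p> </li> 
   <li> <p>Process the data: process and clean the extracted data for later analysis and storage.</p> </li> 
   <li> <p>Follow links: extract all links from the current page and add them to the queue of <code>URL</code>s waiting to be crawled, so the traversal can continue.</p> </li> 
   <li> <p>Control the crawl rate: to avoid putting too much load on the server, crawlers usually limit their speed, including the interval between requests and the number of concurrent requests.</p> </li> 
   <li> <p>Store the data: save the extracted data to a database, a file, or another storage medium for later use and analysis.</p> </li> 
  </ol> 
  <p>In the examples below, the scraped data is stored in an <code>excel</code> file using <code>EasyExcel</code>. For how to use <code>EasyExcel</code>, see the blog post "Java-easyExcel入门教程" (an introductory EasyExcel tutorial).</p> 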
  <h3>1. Example 1</h3> 
  <p> </p> 
  <p>Scrape the data on the RGB color lookup page and save it to an <code>excel</code> spreadsheet.</p> 
  <p><a href="http://img.e-com-net.com/image/info8/7d87da4a106a4e1f90820559e6c85441.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/7d87da4a106a4e1f90820559e6c85441.jpg" alt="Java-网络爬虫(一)_第7张图片" width="650" height="343" style="border:1px solid black;"></a></p> 
  <p>Analysis:</p> 
  <p><a href="http://img.e-com-net.com/image/info8/45600732ada84e63a16804cf1a88d033.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/45600732ada84e63a16804cf1a88d033.jpg" alt="Java-网络爬虫(一)_第8张图片" width="650" height="540" style="border:1px solid black;"></a></p> 
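  <p>The page's <code>HTML</code> source shows that the RGB data comes from a <code>&lt;table&gt;</code>. The <code>&lt;tbody&gt;</code> element defines the table body that holds the data, and each cell's data sits in a <code>&lt;td&gt;</code> element. So we only need to take the text of the <code>&lt;td&gt;</code> elements and then drop the data that belongs to the table header.</p> 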
  <p>Implementation:</p> 
  <p><code>RgbEntity.java</code></p> 
  <pre><code class="prism language-java"><span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span>annotation<span class="token punctuation">.</span></span><span class="token class-name">ExcelProperty</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span>annotation<span class="token punctuation">.</span>write<span class="token punctuation">.</span>style<span class="token punctuation">.</span></span><span class="token operator">*</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span>enums<span class="token punctuation">.</span>poi<span class="token punctuation">.</span></span><span class="token class-name">BorderStyleEnum</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span>enums<span class="token punctuation">.</span>poi<span class="token punctuation">.</span></span><span class="token class-name">FillPatternTypeEnum</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span>enums<span class="token punctuation">.</span>poi<span class="token punctuation">.</span></span><span class="token class-name">HorizontalAlignmentEnum</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">io<span class="token punctuation">.</span>swagger<span class="token punctuation">.</span>annotations<span class="token punctuation">.</span></span><span class="token class-name">ApiModelProperty</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">AllArgsConstructor</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">Builder</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">Data</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">NoArgsConstructor</span></span><span class="token punctuation">;</span>

<span class="token comment">/**
 * RGB entity class
 */</span>
<span class="token annotation punctuation">@Data</span>
<span class="token annotation punctuation">@Builder</span>
<span class="token annotation punctuation">@AllArgsConstructor</span>
<span class="token annotation punctuation">@NoArgsConstructor</span>
<span class="token comment">// Header background and style settings</span>
<span class="token annotation punctuation">@HeadStyle</span><span class="token punctuation">(</span>fillPatternType <span class="token operator">=</span> <span class="token class-name">FillPatternTypeEnum</span><span class="token punctuation">.</span><span class="token constant">SOLID_FOREGROUND</span><span class="token punctuation">,</span> horizontalAlignment <span class="token operator">=</span> <span class="token class-name">HorizontalAlignmentEnum</span><span class="token punctuation">.</span><span class="token constant">CENTER</span><span class="token punctuation">,</span> borderLeft <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderTop <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderRight <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderBottom <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">)</span>
<span class="token comment">// Header row height</span>
<span class="token annotation punctuation">@HeadRowHeight</span><span class="token punctuation">(</span><span class="token number">40</span><span class="token punctuation">)</span>
<span class="token comment">// Content row height</span>
<span class="token annotation punctuation">@ContentRowHeight</span><span class="token punctuation">(</span><span class="token number">30</span><span class="token punctuation">)</span>
<span class="token comment">// Center the content and show the left, top, right, and bottom borders</span>
<span class="token annotation punctuation">@ContentStyle</span><span class="token punctuation">(</span>horizontalAlignment <span class="token operator">=</span> <span class="token class-name">HorizontalAlignmentEnum</span><span class="token punctuation">.</span><span class="token constant">CENTER</span><span class="token punctuation">,</span> borderLeft <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderTop <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderRight <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderBottom <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">)</span>
<span class="token keyword">public</span> <span class="token keyword">class</span> <span class="token class-name">RgbEntity</span> <span class="token punctuation">{</span>

    <span class="token annotation punctuation">@ApiModelProperty</span><span class="token punctuation">(</span>value <span class="token operator">=</span> <span class="token string">"英文代码"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ExcelProperty</span><span class="token punctuation">(</span><span class="token string">"英文代码"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ColumnWidth</span><span class="token punctuation">(</span><span class="token number">15</span><span class="token punctuation">)</span>
    <span class="token keyword">private</span> <span class="token class-name">String</span> engName<span class="token punctuation">;</span>

    <span class="token annotation punctuation">@ApiModelProperty</span><span class="token punctuation">(</span>value <span class="token operator">=</span> <span class="token string">"中文名"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ExcelProperty</span><span class="token punctuation">(</span><span class="token string">"中文名"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ColumnWidth</span><span class="token punctuation">(</span><span class="token number">15</span><span class="token punctuation">)</span>
    <span class="token keyword">private</span> <span class="token class-name">String</span> zhName<span class="token punctuation">;</span>

    <span class="token annotation punctuation">@ApiModelProperty</span><span class="token punctuation">(</span>value <span class="token operator">=</span> <span class="token string">"十六进制"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ExcelProperty</span><span class="token punctuation">(</span><span class="token string">"十六进制"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ColumnWidth</span><span class="token punctuation">(</span><span class="token number">15</span><span class="token punctuation">)</span>
    <span class="token keyword">private</span> <span class="token class-name">String</span> code<span class="token punctuation">;</span>

    <span class="token annotation punctuation">@ApiModelProperty</span><span class="token punctuation">(</span>value <span class="token operator">=</span> <span class="token string">"RGB颜色值"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ExcelProperty</span><span class="token punctuation">(</span><span class="token string">"RGB颜色值"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ColumnWidth</span><span class="token punctuation">(</span><span class="token number">15</span><span class="token punctuation">)</span>
    <span class="token keyword">private</span> <span class="token class-name">String</span> value<span class="token punctuation">;</span>
<span class="token punctuation">}</span>
</code></pre> 
  <p><code>ReptileDemo.class</code></p> 
  <pre><code class="prism language-java"><span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span></span><span class="token class-name">EasyExcel</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>mike<span class="token punctuation">.</span>server<span class="token punctuation">.</span>system<span class="token punctuation">.</span>domain<span class="token punctuation">.</span>excel<span class="token punctuation">.</span></span><span class="token class-name">RgbEntity</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">org<span class="token punctuation">.</span>jsoup<span class="token punctuation">.</span></span><span class="token class-name">Jsoup</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">org<span class="token punctuation">.</span>jsoup<span class="token punctuation">.</span>nodes<span class="token punctuation">.</span></span><span class="token class-name">Document</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">org<span class="token punctuation">.</span>jsoup<span class="token punctuation">.</span>nodes<span class="token punctuation">.</span></span><span class="token class-name">Element</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">org<span class="token punctuation">.</span>jsoup<span class="token punctuation">.</span>select<span class="token punctuation">.</span></span><span class="token class-name">Elements</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>io<span class="token punctuation">.</span></span><span class="token class-name">File</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>io<span class="token punctuation">.</span></span><span class="token class-name">FileOutputStream</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>io<span class="token punctuation">.</span></span><span class="token class-name">OutputStream</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>net<span class="token punctuation">.</span></span><span class="token class-name">URL</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>util<span class="token punctuation">.</span></span><span class="token class-name">ArrayList</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>util<span class="token punctuation">.</span></span><span class="token class-name">List</span></span><span class="token punctuation">;</span>

<span class="token keyword">public</span> <span class="token keyword">class</span> <span class="token class-name">ReptileDemo</span> <span class="token punctuation">{</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span> args<span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token function">demo01</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token comment">/**
     * Case 1: scrape the data from the RGB color lookup page and save it to an Excel spreadsheet
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">demo01</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token comment">// Fetch the page and parse it into a Document</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">URL</span><span class="token punctuation">(</span><span class="token string">"https://www.rgbku.com/chaxun.html"</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token number">1000</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Get the tbody element by id = color</span>
        <span class="token class-name">Element</span> tbodyElement <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementById</span><span class="token punctuation">(</span><span class="token string">"color"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
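        <span class="token comment">// Fail fast if the element is missing; a plain assert is skipped unless the JVM runs with -ea</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span>tbodyElement <span class="token operator">==</span> <span class="token keyword">null</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
            <span class="token keyword">throw</span> <span class="token keyword">new</span> <span class="token class-name">IllegalStateException</span><span class="token punctuation">(</span><span class="token string">"No element with id 'color' found on the page"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
        <span class="token comment">// Get all &lt;tr&gt; tags under tbody</span>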
        <span class="token class-name">Elements</span> childrenElements <span class="token operator">=</span> tbodyElement<span class="token punctuation">.</span><span class="token function">children</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Create a list of RgbEntity objects to hold the data</span>
        <span class="token class-name">List</span><span class="token generics"><span class="token punctuation"><</span><span class="token class-name">RgbEntity</span><span class="token punctuation">></span></span> list <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">ArrayList</span><span class="token generics"><span class="token punctuation"><</span><span class="token punctuation">></span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Iterate over the rows</span>
        <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token class-name">Element</span> childrenElement <span class="token operator">:</span> childrenElements<span class="token punctuation">)</span> <span class="token punctuation">{</span>
            <span class="token class-name">String</span> tagName <span class="token operator">=</span> childrenElement<span class="token punctuation">.</span><span class="token function">tagName</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token class-name">String</span> align <span class="token operator">=</span> childrenElement<span class="token punctuation">.</span><span class="token function">attr</span><span class="token punctuation">(</span><span class="token string">"align"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token comment">// Keep only the data rows and skip the table header</span>
            <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token string">"tr"</span><span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span>tagName<span class="token punctuation">)</span> <span class="token operator">&&</span> <span class="token operator">!</span><span class="token string">"center"</span><span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span>align<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
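                <span class="token comment">// Get the cells of this row</span>
                <span class="token class-name">Elements</span> tdElements <span class="token operator">=</span> childrenElement<span class="token punctuation">.</span><span class="token function">children</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token comment">/*
                 * From the HTML source:
                 *      (1) each &lt;tr&gt; tag contains 5 &lt;td&gt; tags
                 *      (2) the five &lt;td&gt; tags hold, in order: color, English code, Chinese name, hex code, RGB value
                 */</span>
                <span class="token comment">// Skip rows that do not have the expected 5 cells</span>
                <span class="token keyword">if</span> <span class="token punctuation">(</span>tdElements<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">&lt;</span> <span class="token number">5</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
                    <span class="token keyword">continue</span><span class="token punctuation">;</span>
                <span class="token punctuation">}</span>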
                <span class="token class-name">String</span> engName <span class="token operator">=</span> tdElements<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">text</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token class-name">String</span> zhName <span class="token operator">=</span> tdElements<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token number">2</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">text</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token class-name">String</span> code <span class="token operator">=</span> tdElements<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token number">3</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">text</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token class-name">String</span> value <span class="token operator">=</span> tdElements<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">text</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token comment">// Add the row to the list</span>
                list<span class="token punctuation">.</span><span class="token function">add</span><span class="token punctuation">(</span><span class="token class-name">RgbEntity</span><span class="token punctuation">.</span><span class="token function">builder</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
                        <span class="token punctuation">.</span><span class="token function">engName</span><span class="token punctuation">(</span>engName<span class="token punctuation">)</span>
                        <span class="token punctuation">.</span><span class="token function">zhName</span><span class="token punctuation">(</span>zhName<span class="token punctuation">)</span>
                        <span class="token punctuation">.</span><span class="token function">code</span><span class="token punctuation">(</span>code<span class="token punctuation">)</span>
                        <span class="token punctuation">.</span><span class="token function">value</span><span class="token punctuation">(</span>value<span class="token punctuation">)</span>
                        <span class="token punctuation">.</span><span class="token function">build</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
        <span class="token punctuation">}</span>

        <span class="token comment">// Write to the Excel file; try-with-resources closes the stream even if the write fails</span>
        <span class="token class-name">File</span> file <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">File</span><span class="token punctuation">(</span><span class="token string">"D:\\rgb.xlsx"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">try</span> <span class="token punctuation">(</span><span class="token class-name">OutputStream</span> os <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">FileOutputStream</span><span class="token punctuation">(</span>file<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
            <span class="token class-name">EasyExcel</span><span class="token punctuation">.</span><span class="token function">write</span><span class="token punctuation">(</span>os<span class="token punctuation">,</span> <span class="token class-name">RgbEntity</span><span class="token punctuation">.</span><span class="token keyword">class</span><span class="token punctuation">)</span>
                    <span class="token punctuation">.</span><span class="token function">sheet</span><span class="token punctuation">(</span><span class="token string">"Sheet1"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">doWrite</span><span class="token punctuation">(</span>list<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>
</code></pre> 
  <p>Running the code generates the <code>excel</code> file:</p> 
  <p><a href="http://img.e-com-net.com/image/info8/7feaaac1860e4c15935e8f4d8369a30e.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/7feaaac1860e4c15935e8f4d8369a30e.jpg" alt="Java-网络爬虫(一)_第9张图片" width="650" height="260" style="border:1px solid black;"></a></p> 
  <hr> 
  <h3>2. Case 2</h3> 
  <p> </p> 
  <p>Fetch the nutrition data for every food from a food nutrition lookup platform and persist it to an <code>excel</code> file.</p> 
  <p><a href="http://img.e-com-net.com/image/info8/ee43b4616406495cb4c4607d754530a7.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/ee43b4616406495cb4c4607d754530a7.jpg" alt="Java-网络爬虫(一)_第10张图片" width="650" height="390" style="border:1px solid black;"></a></p> 
  <p><a href="http://img.e-com-net.com/image/info8/c4687987c140416f85fc562309a9b29d.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/c4687987c140416f85fc562309a9b29d.jpg" alt="Java-网络爬虫(一)_第11张图片" width="650" height="500" style="border:1px solid black;"></a></p> 
  <p>The code for this case is not convenient to show here, so I will only describe the implementation logic briefly:</p> 
  <p><a href="http://img.e-com-net.com/image/info8/2eb2e79a345d4c8f808aa944a6c34a85.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/2eb2e79a345d4c8f808aa944a6c34a85.jpg" alt="Java-网络爬虫(一)_第12张图片" width="650" height="375" style="border:1px solid black;"></a></p> 
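  <p>Analyzing the <code>HTML</code> source shows a four-level structure. The category (level 1) list page provides each category's name and image, plus the <code>URL</code> of its type (level 2) list page. The type list page provides each type's name and the <code>URL</code> of its food list page. The food list page provides each food's name and the <code>URL</code> of its nutrition page, and the nutrition page contains all of the component data.</p> 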
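  <p>The traversal described above can be sketched as three nested loops. This is only an illustration under stated assumptions, not the code used for this case: the class <code>NestedCrawlSketch</code>, the <code>Link</code> type, the <code>linksOn</code> function, and the sample site are all hypothetical, and an in-memory map stands in for downloading and parsing real pages.</p> 

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class NestedCrawlSketch {

    /** A link found on a page: its display name and the URL it points to (hypothetical type). */
    static final class Link {
        final String name;
        final String url;

        Link(String name, String url) {
            this.name = name;
            this.url = url;
        }
    }

    /**
     * Walks category -> type -> food list pages and returns one
     * "category/type/food" path per food. The linksOn function stands in
     * for "download the page at this URL and extract its links".
     */
    static List<String> crawl(String startUrl, Function<String, List<Link>> linksOn) {
        List<String> paths = new ArrayList<>();
        for (Link category : linksOn.apply(startUrl)) {       // category (level 1) list page
            for (Link type : linksOn.apply(category.url)) {   // type (level 2) list page
                for (Link food : linksOn.apply(type.url)) {   // food list page
                    // A real crawler would now fetch food.url and read the component table.
                    paths.add(category.name + "/" + type.name + "/" + food.name);
                }
            }
        }
        return paths;
    }

    /** Made-up sample data: page URL -> links found on that page. */
    static Map<String, List<Link>> sampleSite() {
        Map<String, List<Link>> site = new HashMap<>();
        site.put("/index", Arrays.asList(new Link("Grains", "/c/1"), new Link("Fruits", "/c/2")));
        site.put("/c/1", Arrays.asList(new Link("Rice", "/t/1")));
        site.put("/c/2", Arrays.asList(new Link("Citrus", "/t/2")));
        site.put("/t/1", Arrays.asList(new Link("Brown rice", "/f/1"), new Link("White rice", "/f/2")));
        site.put("/t/2", Arrays.asList(new Link("Orange", "/f/3")));
        return site;
    }

    public static void main(String[] args) {
        Map<String, List<Link>> site = sampleSite();
        Function<String, List<Link>> linksOn =
                url -> site.getOrDefault(url, Collections.emptyList());
        crawl("/index", linksOn).forEach(System.out::println);
    }
}
```

  <p>In a real crawler, <code>linksOn</code> would presumably wrap a Jsoup fetch plus a selector for the links on that page; keeping it behind a function makes the traversal easy to test without touching the network.</p> 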
  <p>Entity class design:</p> 
  <pre><code class="prism language-java"><span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">AllArgsConstructor</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">Builder</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">Data</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">NoArgsConstructor</span></span><span class="token punctuation">;</span>

<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>util<span class="token punctuation">.</span></span><span class="token class-name">List</span></span><span class="token punctuation">;</span>

<span class="token annotation punctuation">@Data</span>
<span class="token annotation punctuation">@Builder</span>
<span class="token annotation punctuation">@NoArgsConstructor</span>
<span class="token annotation punctuation">@AllArgsConstructor</span>
<span class="token keyword">public</span> <span class="token keyword">class</span> <span class="token class-name">FoodInfo</span> <span class="token punctuation">{</span>

    <span class="token comment">/**
     * List of categories
     */</span>
    <span class="token keyword">private</span> <span class="token class-name">List</span><span class="token generics"><span class="token punctuation"><</span><span class="token class-name">Category</span><span class="token punctuation">></span></span> categoryList<span class="token punctuation">;</span>

    <span class="token comment">/**
     * Category (level 1)
     */</span>
    <span class="token annotation punctuation">@Data</span>
    <span class="token annotation punctuation">@Builder</span>
    <span class="token annotation punctuation">@NoArgsConstructor</span>
    <span class="token annotation punctuation">@AllArgsConstructor</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">class</span> <span class="token class-name">Category</span> <span class="token punctuation">{</span>
        <span class="token comment">// Category name</span>
        <span class="token keyword">private</span> <span class="token class-name">String</span> name<span class="token punctuation">;</span>
        <span class="token comment">// Category image</span>
        <span class="token keyword">private</span> <span class="token class-name">String</span> imageUrl<span class="token punctuation">;</span>
        <span class="token comment">// List of types</span>
        <span class="token keyword">private</span> <span class="token class-name">List</span><span class="token generics"><span class="token punctuation"><</span><span class="token class-name">Type</span><span class="token punctuation">></span></span> typeList<span class="token punctuation">;</span>

        <span class="token comment">/**
         * Type (level 2)
         */</span>
        <span class="token annotation punctuation">@Data</span>
        <span class="token annotation punctuation">@Builder</span>
        <span class="token annotation punctuation">@NoArgsConstructor</span>
        <span class="token annotation punctuation">@AllArgsConstructor</span>
        <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">class</span> <span class="token class-name">Type</span> <span class="token punctuation">{</span>
            <span class="token comment">// Type name</span>
            <span class="token keyword">private</span> <span class="token class-name">String</span> name<span class="token punctuation">;</span>
            <span class="token comment">// List of foods</span>
            <span class="token keyword">private</span> <span class="token class-name">List</span><span class="token generics"><span class="token punctuation"><</span><span class="token class-name">Food</span><span class="token punctuation">></span></span> foodList<span class="token punctuation">;</span>

            <span class="token comment">/**
             * Food
             */</span>
            <span class="token annotation punctuation">@Data</span>
            <span class="token annotation punctuation">@Builder</span>
            <span class="token annotation punctuation">@NoArgsConstructor</span>
            <span class="token annotation punctuation">@AllArgsConstructor</span>
            <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">class</span> <span class="token class-name">Food</span> <span class="token punctuation">{</span>
                <span class="token comment">// Food name</span>
                <span class="token keyword">private</span> <span class="token class-name">String</span> name<span class="token punctuation">;</span>
                <span class="token comment">// List of components</span>
                <span class="token keyword">private</span> <span class="token class-name">List</span><span class="token generics"><span class="token punctuation"><</span><span class="token class-name">Component</span><span class="token punctuation">></span></span> componentList<span class="token punctuation">;</span>

                <span class="token comment">/**
                 * Component
                 */</span>
                <span class="token annotation punctuation">@Data</span>
                <span class="token annotation punctuation">@Builder</span>
                <span class="token annotation punctuation">@NoArgsConstructor</span>
                <span class="token annotation punctuation">@AllArgsConstructor</span>
                <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">class</span> <span class="token class-name">Component</span> <span class="token punctuation">{</span>
                    <span class="token comment">// Nutrient type</span>
                    <span class="token keyword">private</span> <span class="token class-name">String</span> nutrientType<span class="token punctuation">;</span>
                    <span class="token comment">// Item</span>
                    <span class="token keyword">private</span> <span class="token class-name">String</span> itemName<span class="token punctuation">;</span>
                    <span class="token comment">// Amount</span>
                    <span class="token keyword">private</span> <span class="token class-name">String</span> value<span class="token punctuation">;</span>
                    <span class="token comment">// Rank within the same type</span>
                    <span class="token keyword">private</span> <span class="token class-name">String</span> sort<span class="token punctuation">;</span>
                    <span class="token comment">// Average within the same type</span>
                    <span class="token keyword">private</span> <span class="token class-name">String</span> avgValue<span class="token punctuation">;</span>
                <span class="token punctuation">}</span>
            <span class="token punctuation">}</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>
</code></pre> 
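  <p>Two points deserve attention here. First, each <code>URL</code> has to be assembled before it can be requested, because the links on the page are given as relative paths. Second, create a connection pool so that connections are reused rather than wasted, and set an interval between requests so that the crawler does not overload the target server.</p> 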
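  <p>The two points above can be sketched with the JDK alone. This is a minimal illustration, not the code used in the case study: the class name <code>PoliteLinkVisitor</code>, the <code>example.com</code> addresses, and the 500 ms delay are assumptions made for the example, and connection pooling is left out because it depends on the HTTP client in use. <code>URI.resolve</code> turns a relative link into an absolute one against the page it was found on, and the loop pauses after every visit.</p>

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class PoliteLinkVisitor {

    // Resolve a possibly relative href against the URL of the page it came from.
    public static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    // Resolve each href, hand the absolute URL to the visitor (where the real
    // HTTP request would go), then wait delayMillis before the next one.
    public static void visitAll(String pageUrl, List<String> hrefs,
                                long delayMillis, Consumer<String> visitor) {
        for (String href : hrefs) {
            visitor.accept(resolve(pageUrl, href));
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop crawling.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical page and links, used only for illustration.
        String page = "https://example.com/foods/list.html";
        List<String> hrefs = Arrays.asList("detail/1.html", "/about.html");
        visitAll(page, hrefs, 500L, System.out::println);
    }
}
```

  <p>Passing the visitor in as a parameter keeps the delay logic separate from the request itself, so the same loop could wrap whichever HTTP client the crawler uses. Hrefs containing characters that are illegal in a URI, such as spaces, would need to be encoded first, which this sketch does not handle.</p>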
  <hr> 
  <h2>IV. Summary</h2> 
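  <p>In practice, a crawler has to cope with a number of challenges and restrictions, such as dynamically rendered pages, anti-crawling mechanisms, logins, and CAPTCHAs. Dealing with them may require techniques such as proxies, User-Agent spoofing, and CAPTCHA recognition.</p> 
  <p>Note that although a crawler can fetch web pages automatically, its use must comply with laws and regulations and with each website's terms of use, so that it does not infringe on the rights of others or cause harmful consequences.</p> 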
  <hr> 
  <p><strong>References:</strong></p> 
  <p>Crawler protocol: https://www.dotcpp.com/course/317</p> 
  <p>Legal regulation of web crawlers: http://www.cac.gov.cn/2019-06/16/c_1124630015.htm?from=singlemessage</p> 
  <p>Java crawling with JSoup, a usage tutorial: https://my.oschina.net/suveng/blog/4796066</p> 
  <p>JSoup tutorial: https://www.yiibai.com/jsoup</p> 
 </div> 
</div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <div class="container">
        <div id="paradigm-article-related">
            <div class="recommend-post mb30">
                <ul class="widget-links">
                    <li><a href="/article/1950232820773351424.htm"
                           title="移动端城市区县二级联动选择功能实现包" target="_blank">移动端城市区县二级联动选择功能实现包</a>
                        <span class="text-muted">good2know</span>

                        <div>本文还有配套的精品资源,点击获取简介:本项目是一套为移动端设计的jQuery实现方案,用于简化用户在选择城市和区县时的流程。它包括所有必需文件:HTML、JavaScript、CSS及图片资源。通过动态更新下拉菜单选项,实现城市到区县的联动效果,支持数据异步加载。开发者可以轻松集成此功能到移动网站或应用,并可基于需求进行扩展和优化。1.jQuery移动端解决方案概述jQuery技术简介jQuery</div>
                    </li>
                    <li><a href="/article/1950228031117258752.htm"
                           title="深入解析JVM工作原理:从字节码到机器指令的全过程" target="_blank">深入解析JVM工作原理:从字节码到机器指令的全过程</a>
                        <span class="text-muted"></span>

                        <div>一、JVM概述Java虚拟机(JVM)是Java平台的核心组件,它实现了Java"一次编写,到处运行"的理念。JVM是一个抽象的计算机器,它有自己的指令集和运行时内存管理机制。JVM的主要职责:加载:读取.class文件并验证其正确性存储:管理内存分配和垃圾回收执行:解释或编译字节码为机器指令安全:提供沙箱环境限制恶意代码二、JVM架构详解JVM由三个主要子系统组成:1.类加载子系统类加载过程分为</div>
                    </li>
                    <li><a href="/article/1950226517397139456.htm"
                           title="JVM 内存模型深度解析:原子性、可见性与有序性的实现" target="_blank">JVM 内存模型深度解析:原子性、可见性与有序性的实现</a>
                        <span class="text-muted">练习时长两年半的程序员小胡</span>
<a class="tag" taget="_blank" href="/search/JVM/1.htm">JVM</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E5%89%96%E6%9E%90%EF%BC%9A%E4%BB%8E%E9%9D%A2%E8%AF%95%E8%80%83%E7%82%B9%E5%88%B0%E7%94%9F%E4%BA%A7%E5%AE%9E%E8%B7%B5/1.htm">深度剖析:从面试考点到生产实践</a><a class="tag" taget="_blank" href="/search/jvm/1.htm">jvm</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E5%86%85%E5%AD%98%E6%A8%A1%E5%9E%8B/1.htm">内存模型</a>
                        <div>在了解了JVM的基础架构和类加载机制后,我们需要进一步探索Java程序在多线程环境下的内存交互规则。JVM内存模型(JavaMemoryModel,JMM)定义了线程和主内存之间的抽象关系,它通过规范共享变量的访问方式,解决了多线程并发时的数据一致性问题。本文将从内存模型的核心目标出发,详解原子性、可见性、有序性的实现机制,以及volatile、synchronized等关键字在其中的作用。一、J</div>
                    </li>
                    <li><a href="/article/1950225785054883840.htm"
                           title="Java | 多线程经典问题 - 售票" target="_blank">Java | 多线程经典问题 - 售票</a>
                        <span class="text-muted">Ada54</span>

                        <div>一、售票需求1)同一个票池2)多个窗口卖票,不能出售同一张票二、售票问题代码实现(线程与进程小总结,请戳:Java|线程和进程,创建线程)step1:定义SaleWindow类实现Runnable接口,覆盖run方法step2:实例化SaleWindow对象,创建Thread对象,将SaleWindow作为参数传给Thread类的构造函数,然后通过Thread.start()方法启动线程step3</div>
                    </li>
                    <li><a href="/article/1950225381961297920.htm"
                           title="SpringMVC的执行流程" target="_blank">SpringMVC的执行流程</a>
                        <span class="text-muted"></span>

                        <div>1、什么是MVCMVC是一种设计模式。MVC的原理图如下所示M-Model模型(完成业务逻辑:有javaBean构成,service+dao+entity)V-View视图(做界面的展示jsp,html……)C-Controller控制器(接收请求—>调用模型—>根据结果派发页面2、SpringMVC是什么SpringMVC是一个MVC的开源框架,SpringMVC=Struts2+Spring,</div>
                    </li>
                    <li><a href="/article/1950224616647618560.htm"
                           title="JAVA接口机结构解析" target="_blank">JAVA接口机结构解析</a>
                        <span class="text-muted">秃狼</span>
<a class="tag" taget="_blank" href="/search/SpringBoot/1.htm">SpringBoot</a><a class="tag" taget="_blank" href="/search/%E5%85%AB%E8%82%A1%E6%96%87/1.htm">八股文</a><a class="tag" taget="_blank" href="/search/Java/1.htm">Java</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E5%AD%A6%E4%B9%A0/1.htm">学习</a>
                        <div>什么是接口机在Java项目中,接口机通常指用于与外部系统进行数据交互的中间层,负责处理请求和响应的转换、协议适配、数据格式转换等任务。接口机的结构我们的接口机的结构分为两个大部分,外部接口机和内部接口机,在业务的调度上也是通过mq来实现的,只要的目的就是为了解耦合和做差异化。在接口机中主要的方法就是定时任务,消息的发送和消费,其他平台调用接口机只能提供外部接口机的方法进行调用,外部接口机可以提供消</div>
                    </li>
                    <li><a href="/article/1950223497875746816.htm"
                           title="最新阿里四面面试真题46道:面试技巧+核心问题+面试心得" target="_blank">最新阿里四面面试真题46道:面试技巧+核心问题+面试心得</a>
                        <span class="text-muted">风平浪静如码</span>

                        <div>前言做技术的有一种资历,叫做通过了阿里的面试。这些阿里Java相关问题,都是之前通过不断优秀人才的铺垫总结的,先自己弄懂了再去阿里面试,不然就是去丢脸,被虐。希望对大家帮助,祝面试成功,有个更好的职业规划。一,阿里常见技术面1、微信红包怎么实现。2、海量数据分析。3、测试职位问的线程安全和非线程安全。4、HTTP2.0、thrift。5、面试电话沟通可能先让自我介绍。6、分布式事务一致性。7、ni</div>
                    </li>
                    <li><a href="/article/1950218946015719424.htm"
                           title="图论算法经典题目解析:DFS、BFS与拓扑排序实战" target="_blank">图论算法经典题目解析:DFS、BFS与拓扑排序实战</a>
                        <span class="text-muted">周童學</span>
<a class="tag" taget="_blank" href="/search/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84%E4%B8%8E%E7%AE%97%E6%B3%95/1.htm">数据结构与算法</a><a class="tag" taget="_blank" href="/search/%E6%B7%B1%E5%BA%A6%E4%BC%98%E5%85%88/1.htm">深度优先</a><a class="tag" taget="_blank" href="/search/%E7%AE%97%E6%B3%95/1.htm">算法</a><a class="tag" taget="_blank" href="/search/%E5%9B%BE%E8%AE%BA/1.htm">图论</a>
                        <div>图论算法经典题目解析:DFS、BFS与拓扑排序实战图论问题是算法面试中的高频考点,本博客将通过四道LeetCode经典题目(均来自"Top100Liked"题库),深入讲解图论的核心算法思想和实现技巧。涵盖DFS、BFS、拓扑排序和前缀树等知识点,每道题配有Java实现和易错点分析。1.岛屿数量(DFS遍历)问题描述给定一个由'1'(陆地)和'0'(水)组成的二维网格,计算岛屿的数量。岛屿由水平或</div>
                    </li>
                    <li><a href="/article/1950218818781507584.htm"
                           title="【异常】使用 LiteFlow 框架时,提示错误ChainDuplicateException: [chain name duplicate] chainName=categoryChallenge" target="_blank">【异常】使用 LiteFlow 框架时,提示错误ChainDuplicateException: [chain name duplicate] chainName=categoryChallenge</a>
                        <span class="text-muted">本本本添哥</span>
<a class="tag" taget="_blank" href="/search/002/1.htm">002</a><a class="tag" taget="_blank" href="/search/-/1.htm">-</a><a class="tag" taget="_blank" href="/search/%E8%BF%9B%E9%98%B6%E5%BC%80%E5%8F%91%E8%83%BD%E5%8A%9B/1.htm">进阶开发能力</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a>
                        <div>一、报错内容Causedby:com.yomahub.liteflow.exception.ChainDuplicateException:[chainnameduplicate]chainName=categoryChallengeatcom.yomahub.liteflow.parser.helper.ParserHelper.lambda$null$0(ParserHelper.java:1</div>
                    </li>
                    <li><a href="/article/1950218314064130048.htm"
                           title="Java并发核心:线程池使用技巧与最佳实践! | 多线程篇(五)" target="_blank">Java并发核心:线程池使用技巧与最佳实践! | 多线程篇(五)</a>
                        <span class="text-muted">bug菌¹</span>
<a class="tag" taget="_blank" href="/search/Java%E5%AE%9E%E6%88%98%28%E8%BF%9B%E9%98%B6%E7%89%88%29/1.htm">Java实战(进阶版)</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/Java%E9%9B%B6%E5%9F%BA%E7%A1%80%E5%85%A5%E9%97%A8/1.htm">Java零基础入门</a><a class="tag" taget="_blank" href="/search/Java%E5%B9%B6%E5%8F%91/1.htm">Java并发</a><a class="tag" taget="_blank" href="/search/%E7%BA%BF%E7%A8%8B%E6%B1%A0/1.htm">线程池</a><a class="tag" taget="_blank" href="/search/%E5%A4%9A%E7%BA%BF%E7%A8%8B%E7%AF%87/1.htm">多线程篇</a>
                        <div>本文收录于「Java进阶实战」专栏,专业攻坚指数级提升,希望能够助你一臂之力,帮你早日登顶实现财富自由;同时,欢迎大家关注&&收藏&&订阅!持续更新中,up!up!up!!环境说明:Windows10+IntelliJIDEA2021.3.2+Jdk1.8本文目录前言摘要正文何为线程池?为什么需要线程池?线程池的好处线程池使用场景如何创建线程池?线程池的常见配置源码解析案例分享案例代码演示案例运行</div>
                    </li>
                    <li><a href="/article/1950217936077647872.htm"
                           title="Java 队列" target="_blank">Java 队列</a>
                        <span class="text-muted">tryxr</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a><a class="tag" taget="_blank" href="/search/%E9%98%9F%E5%88%97/1.htm">队列</a>
                        <div>队列一般用什么哪种结构实现队列的特性数据入队列时一定是从尾部插入吗数据出队列时一定是从头部删除吗队列的基本运算有什么队列支持随机访问吗队列的英文表示什么是队列队列从哪进、从哪出队列的进出顺序队列是用哪种结构实现的Queue和Deque有什么区别Queue接口的方法Queue中的add与offer的区别offer、poll、peek的模拟实现如何利用链表实现队列如何利用顺序表实现队列什么叫做双端队列</div>
                    </li>
                    <li><a href="/article/1950215540215705600.htm"
                           title="JVM 内存分配与回收策略:从对象创建到内存释放的全流程" target="_blank">JVM 内存分配与回收策略:从对象创建到内存释放的全流程</a>
                        <span class="text-muted"></span>

                        <div>在JVM的运行机制中,内存分配与回收策略是连接对象生命周期与垃圾收集器的桥梁。它决定了对象在堆内存中的创建位置、存活过程中的区域迁移,以及最终被回收的时机。合理的内存分配策略能减少GC频率、降低停顿时间,是优化Java应用性能的核心环节。本文将系统解析JVM的内存分配规则、对象晋升机制,以及实战中的内存优化技巧。一、对象优先在Eden区分配:新生代的“临时缓冲区”大多数情况下,Java对象在新生代</div>
                    </li>
                    <li><a href="/article/1950214657335685120.htm"
                           title="代码随想录算法训练营第三十五天" target="_blank">代码随想录算法训练营第三十五天</a>
                        <span class="text-muted"></span>

                        <div>01背包问题二维题目链接01背包问题二维题解importjava.util.Scanner;publicclassMain{publicstaticvoidmain(String[]args){Scannersc=newScanner(System.in);intM=sc.nextInt();intN=sc.nextInt();int[]space=newint[M];int[]value=new</div>
                    </li>
                    <li><a href="/article/1950207097413103616.htm"
                           title="微信公众号回调java_处理微信公众号消息回调" target="_blank">微信公众号回调java_处理微信公众号消息回调</a>
                        <span class="text-muted">weixin_39607620</span>
<a class="tag" taget="_blank" href="/search/%E5%BE%AE%E4%BF%A1%E5%85%AC%E4%BC%97%E5%8F%B7%E5%9B%9E%E8%B0%83java/1.htm">微信公众号回调java</a>
                        <div>1、背景在上一节中,咱们知道如何接入微信公众号,可是以后公众号会与咱们进行交互,那么微信公众号如何通知到咱们本身的服务器呢?咱们知道咱们接入的时候提供的url是GET/mp/entry,那么公众号以后产生的事件将会以POST/mp/entry发送到咱们本身的服务器上。html2、代码实现,此处仍是使用weixin-java-mp这个框架实现一、引入weixin-java-mpcom.github.</div>
                    </li>
                    <li><a href="/article/1950200667587014656.htm"
                           title="学C++的五大惊人好处" target="_blank">学C++的五大惊人好处</a>
                        <span class="text-muted"></span>

                        <div>为什么要学c++学c++有什么用学习c++的好处有1.中考可以加分2.高考可能直接录取3.就业广且工资高4.在未来30--50年c++一定是一个很受欢迎的职业5.c++成功的例子deepsick等AI智能C++语言兼备编程效率和编译运行效率的语言C++语言是C语言功能增强版,在c语言的基础上添加了面向对象编程和泛型编程的支持既继承了C语言高效,简洁,快速和可移植的传统,又具备类似Java、Go等其</div>
                    </li>
                    <li><a href="/article/1950198522972270592.htm"
                           title="Java8 Stream流的sorted()的排序【正序、倒序、多字段排序】" target="_blank">Java8 Stream流的sorted()的排序【正序、倒序、多字段排序】</a>
                        <span class="text-muted">Tony666688888</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/windows/1.htm">windows</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a>
                        <div>针对集合排序,java8可以用Stream流的sorted()进行排序。示例Bean以下我们会使用这个Bean来做示例。publicclassOrder{privateStringweight;privateDoubleprice;privateStringdateStr;//忽略getter、setter、构造方法、toString}字段排序首先是比较器Comparator,形式如下:Compa</div>
                    </li>
                    <li><a href="/article/1950194742100815872.htm"
                           title="用代码生成艺术字:设计个性化海报的秘密" target="_blank">用代码生成艺术字:设计个性化海报的秘密</a>
                        <span class="text-muted"></span>

                        <div>本文围绕“用代码生成艺术字:设计个性化海报的秘密”展开,先概述代码生成艺术字在海报设计中的独特价值,接着介绍常用的代码工具(如HTML、CSS、JavaScript等),详细阐述从构思到实现的完整流程,包括字体样式设计、动态效果添加等,还分享了提升艺术字质感的技巧及实际案例。最后总结代码生成艺术字的优势,为设计师提供打造个性化海报的实用指南,助力提升海报设计的独特性与吸引力,符合搜索引擎SEO标准</div>
                    </li>
                    <li><a href="/article/1950194728943284224.htm"
                           title="java实习生40多天有感" target="_blank">java实习生40多天有感</a>
                        <span class="text-muted">别拿爱情当饭吃</span>

                        <div>从5月15日开始,我开始第一步步入社会,我今年大三,在一家上市互联网公司做一名实习生,主要做java后端开发。开始的时候,觉得公司的环境挺不错的,不过因为公司在CBD,所以隔壁的午饭和晚饭都要20+RMB,而且还吃不饱,这让我感觉挺郁闷的。一到下午,我就会犯困(因为饿)。因此,我又不得不买一些干粮在公司屯着。关于技术,有一个比较大的项目在需求调研当中,我们做实习生,就是辅助项目经理,测试功能,并且</div>
                    </li>
                    <li><a href="/article/1950183016382918656.htm"
                           title="大学生入门:初识方法及其易踩坑的点" target="_blank">大学生入门:初识方法及其易踩坑的点</a>
                        <span class="text-muted"></span>

                        <div>在java学习过程中,我们不难发现有很多重复使用的功能代码块,每次使用如果都要重新写一遍,岂不是很麻烦,就算是“cv”大法,感觉也不是很方便,那么,有什么办法可以解决这个问题呢?方法!java中,一段可重用的,用于执行特定功能的代码块叫做方法,它可以接收参数、返回结果,并且可以被多次使用。一、方法的基本结构[修饰符]返回值类型方法名([参数列表])[throws异常类型]{//方法体}[throw</div>
                    </li>
                    <li><a href="/article/1950181126731526144.htm"
                           title="[Ljava.lang.Object; cannot be cast to [Ljava.lang.String;" target="_blank">[Ljava.lang.Object; cannot be cast to [Ljava.lang.String;</a>
                        <span class="text-muted">这些不会的</span>

                        <div>解释:这个错误是很常见的错误,错误的提示已经很清楚了就是java的Object数组不能转换成为String[]数组,这就说明你要转换的数组它本身是Object类型的数组,但是你却非要把它转换为String类的数组,这当然是错误的。示例:[java]viewplaincopypackagecom.dada;importjava.util.ArrayList;importjava.util.List;</div>
                    </li>
                    <li><a href="/article/1950180118609588224.htm"
                           title="HikariCP调试日志深度解析:生产环境故障排查完全指南" target="_blank">HikariCP调试日志深度解析:生产环境故障排查完全指南</a>
                        <span class="text-muted"></span>

                        <div>HikariCP调试日志深度解析:生产环境故障排查完全指南更新时间:2025年7月4日|作者:资深架构师|适用版本:HikariCP5.x+|难度等级:中高级前言在生产环境中,数据库连接池往往是系统性能的关键瓶颈。HikariCP作为当前最流行的Java连接池,其调试日志包含了丰富的运行时信息,能够帮助我们快速定位和解决各种连接池相关问题。本文将深入解析HikariCP的日志体系,提供一套完整的故</div>
                    </li>
                    <li><a href="/article/1950179866523529216.htm"
                           title="大学社团管理系统(11831)" target="_blank">大学社团管理系统(11831)</a>
                        <span class="text-muted">codercode2022</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/spring/1.htm">spring</a><a class="tag" taget="_blank" href="/search/boot/1.htm">boot</a><a class="tag" taget="_blank" href="/search/spring/1.htm">spring</a><a class="tag" taget="_blank" href="/search/echarts/1.htm">echarts</a><a class="tag" taget="_blank" href="/search/spring/1.htm">spring</a><a class="tag" taget="_blank" href="/search/cloud/1.htm">cloud</a><a class="tag" taget="_blank" href="/search/sentinel/1.htm">sentinel</a><a class="tag" taget="_blank" href="/search/java-rocketmq/1.htm">java-rocketmq</a>
                        <div>有需要的同学,源代码和配套文档领取,加文章最下方的名片哦一、项目演示项目演示视频二、资料介绍完整源代码(前后端源代码+SQL脚本)配套文档(LW+PPT+开题报告)远程调试控屏包运行三、技术介绍Java语言SSM框架SpringBoot框架Vue框架JSP页面Mysql数据库IDEA/Eclipse开发四、项目截图有需要的同学,源代码和配套文档领取,加文章最下方的名片哦!</div>
                    </li>
                    <li><a href="/article/1950178809030438912.htm"
                           title="今年校招竞争真激烈" target="_blank">今年校招竞争真激烈</a>
                        <span class="text-muted">12_05</span>

                        <div>程序员满大街,都要找不到工作了。即使人工智能满大街,我也后悔当初没学机器学习,后悔当初没学Java。C++真难找工作。难道毕了业就失业吗?好担心!</div>
                    </li>
                    <li><a href="/article/1950177847956008960.htm"
                           title="【免费下载】 Aspose for Java:解锁无水印、无限制的文档处理能力" target="_blank">【免费下载】 Aspose for Java:解锁无水印、无限制的文档处理能力</a>
                        <span class="text-muted">房征劲Kendall</span>

                        <div>AsposeforJava:解锁无水印、无限制的文档处理能力【下载地址】AsposeforJava-去除水印和数量限制AsposeforJava-去除水印和数量限制Aspose是一个著名的文档处理库,专为Java应用程序设计,支持多种文档格式的操作,如Word、Excel、PDF等项目地址:https://gitcode.com/open-source-toolkit/56c82项目介绍在现代企业</div>
                    </li>
                    <li><a href="/article/1950177721669709824.htm"
                           title="微服务日志追踪,Skywalking接入TraceId功能" target="_blank">微服务日志追踪,Skywalking接入TraceId功能</a>
                        <span class="text-muted">Victor刘</span>
<a class="tag" taget="_blank" href="/search/%E5%BE%AE%E6%9C%8D%E5%8A%A1/1.htm">微服务</a><a class="tag" taget="_blank" href="/search/skywalking/1.htm">skywalking</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a>
                        <div>文章目录一、借助skywalking追加traceIdlogbacklog4j2效果二、让skywalking显示日志内容版本差异logback配置文件log4j2配置文件一、借助skywalking追加traceId背景:在微服务或多副本中难以观察一个链路的日志,需要通过唯一traceId标识来查找,下面介绍Skywalking-traceId在Java中的配置方法。介绍两种java日志的配置方</div>
                    </li>
                    <li><a href="/article/1950175452580605952.htm"
                           title="Gerapy爬虫管理框架深度解析:企业级分布式爬虫管控平台" target="_blank">Gerapy爬虫管理框架深度解析:企业级分布式爬虫管控平台</a>
                        <span class="text-muted">Python×CATIA工业智造</span>
<a class="tag" taget="_blank" href="/search/%E7%88%AC%E8%99%AB/1.htm">爬虫</a><a class="tag" taget="_blank" href="/search/%E5%88%86%E5%B8%83%E5%BC%8F/1.htm">分布式</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/pycharm/1.htm">pycharm</a>
                        <div>引言:爬虫工程化的必然选择随着企业数据采集需求指数级增长,传统单点爬虫管理模式面临三重困境:管理效率瓶颈:手动部署耗时占开发总时长的40%以上系统可靠性低:研究显示超过65%的爬虫故障源于部署或调度错误资源利用率差:平均爬虫服务器CPU利用率不足30%爬虫管理方案对比:┌───────────────┬─────────────┬───────────┬───────────┬──────────</div>
                    </li>
                    <li><a href="/article/1950169524384886784.htm"
                           title="【Java Web实战】从零到一打造企业级网上购书网站系统 | 完整开发实录(三)" target="_blank">【Java Web实战】从零到一打造企业级网上购书网站系统 | 完整开发实录(三)</a>
                        <span class="text-muted">笙囧同学</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E5%89%8D%E7%AB%AF/1.htm">前端</a><a class="tag" taget="_blank" href="/search/%E7%8A%B6%E6%80%81%E6%A8%A1%E5%BC%8F/1.htm">状态模式</a>
                        <div>核心功能设计用户管理系统用户管理是整个系统的基础,我设计了完整的用户生命周期管理:用户注册流程验证失败验证通过验证失败验证通过用户名已存在用户名可用失败成功用户访问注册页面填写注册信息前端表单验证显示错误提示提交到后端后端数据验证返回错误信息用户名唯一性检查提示用户名重复密码加密处理保存用户信息保存成功?显示系统错误注册成功跳转登录页面登录认证机制深度解析我实现了一套企业级的多层次安全认证机制:认</div>
                    </li>
                    <li><a href="/article/1950160194403102720.htm"
                           title="Java:数据结构-ArrayList和顺序表(2)" target="_blank">Java:数据结构-ArrayList和顺序表(2)</a>
                        <span class="text-muted">blammmp</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84/1.htm">数据结构</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a>
                        <div>一ArrayList的使用1.ArrayList的构造方法第一种(指定容量的构造方法)创建一个空的ArrayList,指定容量为initialCapacity。publicArrayList(intinitialCapacity){if(initialCapacity>0){this.elementData=newObject[initialCapacity];}elseif(initialCap</div>
                    </li>
                    <li><a href="/article/1950155533302427648.htm"
                           title="CMS垃圾回收器和G1垃圾回收器区别_g1cms垃圾回收器区别" target="_blank">CMS垃圾回收器和G1垃圾回收器区别_g1cms垃圾回收器区别</a>
                        <span class="text-muted">2401_89191885</span>
<a class="tag" taget="_blank" href="/search/jvm/1.htm">jvm</a>
                        <div>该类所有的实例都已经被回收,也就是Java堆中不存在该类的任何实例;加载该类的ClassLoader已经被回收;该类对应的java.lang.Class对象没有在任何地方被引用,无法在任何地方通过反射访问该类的方法。3.常见的垃圾回收算法1、Mark-Sweep(标记-清除算法):(1)思想:标记清除算法分为两个阶段,标记阶段和清除阶段。标记阶段任务是标记出所有需要回收的对象,清除阶段就是清除被标</div>
                    </li>
                    <li><a href="/article/1950154524572315648.htm"
                           title="每日面试题15:如何解决堆溢出?" target="_blank">每日面试题15:如何解决堆溢出?</a>
                        <span class="text-muted">℡余晖^</span>
<a class="tag" taget="_blank" href="/search/%E6%AF%8F%E6%97%A5%E9%9D%A2%E8%AF%95%E9%A2%98/1.htm">每日面试题</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a>
                        <div>在Java应用运行过程中,"java.lang.OutOfMemoryError:Javaheapspace"是最常见的错误之一。无论是高并发的电商大促场景,还是持续运行的后台服务,堆内存溢出都可能导致服务不可用、数据丢失,甚至引发系统崩溃。本文将结合实际排查经验,系统讲解堆溢出的底层逻辑、应急处理流程及长效预防策略。一、堆溢出的本质:内存分配的"收支失衡"Java堆是JVM管理的内存区域,用于存</div>
                    </li>
                                <li><a href="/article/77.htm"
                                       title="算法 单链的创建与删除" target="_blank">算法 单链的创建与删除</a>
                                    <span class="text-muted">换个号韩国红果果</span>
<a class="tag" taget="_blank" href="/search/c/1.htm">c</a><a class="tag" taget="_blank" href="/search/%E7%AE%97%E6%B3%95/1.htm">算法</a>
                                    <div>
先创建结构体
struct student {
	int data;
	//int tag;//标记这是第几个
	struct student *next;
};
//  addone 用于将一个数插入已从小到大排好序的链中
struct student *addone(struct student *h,int x){
		if(h==NULL)  //??????
			</div>
                                </li>
                                <li><a href="/article/204.htm"
                                       title="《大型网站系统与Java中间件实践》第2章读后感" target="_blank">《大型网站系统与Java中间件实践》第2章读后感</a>
                                    <span class="text-muted">白糖_</span>
<a class="tag" taget="_blank" href="/search/java%E4%B8%AD%E9%97%B4%E4%BB%B6/1.htm">java中间件</a>
                                    <div>       断断续续花了两天时间试读了《大型网站系统与Java中间件实践》的第2章,这章总述了从一个小型单机构建的网站发展到大型网站的演化过程---整个过程会遇到很多困难,但每一个屏障都会有解决方案,最终就是依靠这些个解决方案汇聚到一起组成了一个健壮稳定高效的大型系统。 
  
       看完整章内容,</div>
                                </li>
                                <li><a href="/article/331.htm"
                                       title="zeus持久层spring事务单元测试" target="_blank">zeus持久层spring事务单元测试</a>
                                    <span class="text-muted">deng520159</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/DAO/1.htm">DAO</a><a class="tag" taget="_blank" href="/search/spring/1.htm">spring</a><a class="tag" taget="_blank" href="/search/jdbc/1.htm">jdbc</a>
                                    <div>今天把zeus事务单元测试放出来,让大家指出他的毛病, 
1.ZeusTransactionTest.java 单元测试 
  
package com.dengliang.zeus.webdemo.test;

import java.util.ArrayList;
import java.util.List;

import org.junit.Test;
import </div>
                                </li>
                                <li><a href="/article/458.htm"
                                       title="Rss 订阅 开发" target="_blank">Rss 订阅 开发</a>
                                    <span class="text-muted">周凡杨</span>
<a class="tag" taget="_blank" href="/search/html/1.htm">html</a><a class="tag" taget="_blank" href="/search/xml/1.htm">xml</a><a class="tag" taget="_blank" href="/search/%E8%AE%A2%E9%98%85/1.htm">订阅</a><a class="tag" taget="_blank" href="/search/rss/1.htm">rss</a><a class="tag" taget="_blank" href="/search/%E8%A7%84%E8%8C%83/1.htm">规范</a>
                                    <div>  
              RSS是 Really Simple Syndication的缩写(对rss2.0而言,是这三个词的缩写,对rss1.0而言则是RDF Site Summary的缩写,1.0与2.0走的是两个体系)。 
  
RSS</div>
                                </li>
                                <li><a href="/article/585.htm"
                                       title="分页查询实现" target="_blank">分页查询实现</a>
                                    <span class="text-muted">g21121</span>
<a class="tag" taget="_blank" href="/search/%E5%88%86%E9%A1%B5%E6%9F%A5%E8%AF%A2/1.htm">分页查询</a>
                                    <div>在查询列表时我们常常会用到分页,分页的好处就是减少数据交换,每次查询一定数量减少数据库压力等等。 
按实现形式分前台分页和服务器分页: 
前台分页就是一次查询出所有记录,在页面中用js进行虚拟分页,这种形式在数据量较小时优势比较明显,一次加载就不必再访问服务器了,但当数据量较大时会对页面造成压力,传输速度也会大幅下降。 
服务器分页就是每次请求相同数量记录,按一定规则排序,每次取一定序号直接的数据</div>
                                </li>
                                <li><a href="/article/712.htm"
                                       title="spring jms异步消息处理" target="_blank">spring jms异步消息处理</a>
                                    <span class="text-muted">510888780</span>
<a class="tag" taget="_blank" href="/search/jms/1.htm">jms</a>
                                    <div>spring JMS对于异步消息处理基本上只需配置下就能进行高效的处理。其核心就是消息侦听器容器,常用的类就是DefaultMessageListenerContainer。该容器可配置侦听器的并发数量,以及配合MessageListenerAdapter使用消息驱动POJO进行消息处理。且消息驱动POJO是放入TaskExecutor中进行处理,进一步提高性能,减少侦听器的阻塞。具体配置如下: </div>
                                </li>
                                <li><a href="/article/839.htm"
                                       title="highCharts柱状图" target="_blank">highCharts柱状图</a>
                                    <span class="text-muted">布衣凌宇</span>
<a class="tag" taget="_blank" href="/search/hightCharts/1.htm">hightCharts</a><a class="tag" taget="_blank" href="/search/%E6%9F%B1%E5%9B%BE/1.htm">柱图</a>
                                    <div>第一步:导入 exporting.js,grid.js,highcharts.js;第二步:写controller 
  
@Controller@RequestMapping(value="${adminPath}/statistick")public class StatistickController {  private UserServi</div>
                                </li>
                                <li><a href="/article/966.htm"
                                       title="我的spring学习笔记2-IoC(反向控制 依赖注入)" target="_blank">我的spring学习笔记2-IoC(反向控制 依赖注入)</a>
                                    <span class="text-muted">aijuans</span>
<a class="tag" taget="_blank" href="/search/spring/1.htm">spring</a><a class="tag" taget="_blank" href="/search/mvc/1.htm">mvc</a><a class="tag" taget="_blank" href="/search/Spring+%E6%95%99%E7%A8%8B/1.htm">Spring 教程</a><a class="tag" taget="_blank" href="/search/spring3+%E6%95%99%E7%A8%8B/1.htm">spring3 教程</a><a class="tag" taget="_blank" href="/search/Spring+%E5%85%A5%E9%97%A8/1.htm">Spring 入门</a>
                                    <div>IoC(反向控制 依赖注入)这是Spring提出来了,这也是Spring一大特色。这里我不用多说,我们看Spring教程就可以了解。当然我们不用Spring也可以用IoC,下面我将介绍不用Spring的IoC。 
IoC不是框架,她是java的技术,如今大多数轻量级的容器都会用到IoC技术。这里我就用一个例子来说明: 
如:程序中有 Mysql.calss 、Oracle.class 、SqlSe</div>
                                </li>
                                <li><a href="/article/1093.htm"
                                       title="TLS java简单实现" target="_blank">TLS java简单实现</a>
                                    <span class="text-muted">antlove</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/ssl/1.htm">ssl</a><a class="tag" taget="_blank" href="/search/keystore/1.htm">keystore</a><a class="tag" taget="_blank" href="/search/tls/1.htm">tls</a><a class="tag" taget="_blank" href="/search/secure/1.htm">secure</a>
                                    <div>  
1. SSLServer.java 
package ssl;

import java.io.FileInputStream;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.security.KeyStore;
import </div>
                                </li>
                                <li><a href="/article/1220.htm"
                                       title="Zip解压压缩文件" target="_blank">Zip解压压缩文件</a>
                                    <span class="text-muted">百合不是茶</span>
<a class="tag" taget="_blank" href="/search/Zip%E6%A0%BC%E5%BC%8F%E8%A7%A3%E5%8E%8B/1.htm">Zip格式解压</a><a class="tag" taget="_blank" href="/search/Zip%E6%B5%81%E7%9A%84%E4%BD%BF%E7%94%A8/1.htm">Zip流的使用</a><a class="tag" taget="_blank" href="/search/%E6%96%87%E4%BB%B6%E8%A7%A3%E5%8E%8B/1.htm">文件解压</a>
                                    <div>  
 ZIP文件的解压缩实质上就是从输入流中读取数据。Java.util.zip包提供了类ZipInputStream来读取ZIP文件,下面的代码段创建了一个输入流来读取ZIP格式的文件; 
ZipInputStream in = new ZipInputStream(new FileInputStream(zipFileName)); 
  
  
&n</div>
                                </li>
                                <li><a href="/article/1347.htm"
                                       title="underscore.js 学习(一)" target="_blank">underscore.js 学习(一)</a>
                                    <span class="text-muted">bijian1013</span>
<a class="tag" taget="_blank" href="/search/JavaScript/1.htm">JavaScript</a><a class="tag" taget="_blank" href="/search/underscore/1.htm">underscore</a>
                                    <div>        工作中需要用到underscore.js,发现这是一个包括了很多基本功能函数的js库,里面有很多实用的函数。而且它没有扩展 javascript的原生对象。主要涉及对Collection、Object、Array、Function的操作。       学</div>
                                </li>
                </ul>
            </div>
        </div>
    </div>

<footer id="footer" class="mb30 mt30">
    <div class="container">
        <div class="footBglm">
            <a target="_blank" href="/">Home</a> -
            <a target="_blank" href="/custom/about.htm">About Us</a> -
            <a target="_blank" href="/search/Java/1.htm">Site Search</a> -
            <a target="_blank" href="/sitemap.txt">Sitemap</a> -
            <a target="_blank" href="/custom/delete.htm">Report Infringement</a>
        </div>
        <div class="copyright">Copyright IT知识库 © 2000-2050 E-COM-NET.COM, All Rights Reserved.
<!--            <a href="https://beian.miit.gov.cn/" rel="nofollow" target="_blank">京ICP备09083238号</a><br>-->
        </div>
    </div>
</footer>
<!-- Code highlighting -->
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shCore.js"></script>
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shLegacy.js"></script>
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shAutoloader.js"></script>
<link type="text/css" rel="stylesheet" href="/static/syntaxhighlighter/styles/shCoreDefault.css"/>
<script type="text/javascript" src="/static/syntaxhighlighter/src/my_start_1.js"></script>





</body>

</html>