7. HttpClient Exception Handling

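HttpClient can throw two types of exceptions:

1) java.io.IOException: thrown on an I/O failure, such as a socket connection timeout or a connection reset;

2) HttpException: signals an HTTP failure, such as a violation of the HTTP protocol. I/O errors are usually considered non-fatal and recoverable, whereas HTTP protocol errors are considered fatal and cannot be recovered from automatically. Note that HttpClient implementations re-throw HttpException as ClientProtocolException, which is a subclass of java.io.IOException. This lets users of HttpClient handle both I/O errors and protocol violations in a single catch clause.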
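The single-catch idea above can be sketched without HttpClient on the classpath. In the sketch below, ProtocolViolationException is a hypothetical stand-in for ClientProtocolException, and ExceptionClassifier and Call are illustrative names that do not come from the library. It only shows how a subclass of IOException can be caught either separately or together with ordinary I/O failures.

```java
import java.io.IOException;

// Hypothetical stand-in for ClientProtocolException: a protocol error
// that is also an IOException, so a single catch clause can cover both.
class ProtocolViolationException extends IOException {
    ProtocolViolationException(String message) {
        super(message);
    }
}

class ExceptionClassifier {
    // Illustrative functional interface for "something that performs a request".
    interface Call {
        void run() throws IOException;
    }

    // Returns "ok", "fatal" (protocol error) or "recoverable" (other I/O error).
    static String classify(Call call) {
        try {
            call.run();
            return "ok";
        } catch (ProtocolViolationException e) {
            // The more specific subclass must be caught first
            return "fatal";
        } catch (IOException e) {
            // Everything else, e.g. a timeout or a connection reset
            return "recoverable";
        }
    }
}
```

If the caller does not need to tell the two cases apart, the first catch clause can simply be dropped and the IOException clause handles both.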

1. HTTP transport safety

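It is important to understand that the HTTP protocol is not well suited to every type of application. HTTP is a simple request/response oriented protocol that was originally designed to support retrieval of static or dynamically generated content. It was never intended to support transactional operations. For instance, an HTTP server considers its part of the contract fulfilled once it has received and processed the request, generated a response and sent a status code back to the client. The server makes no attempt to roll back the transaction if the client fails to receive the response entity because of a read timeout, a request cancellation or a system crash. If the client decides to send the same request again, the server will inevitably execute the same transaction more than once. In some cases this leads to application data corruption or an inconsistent application state.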

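Although HTTP was never designed to support transactional processing, it can still serve as a transport protocol for mission-critical applications provided certain conditions are met. To ensure safety at the HTTP transport layer, the system must ensure the idempotency of HTTP methods at the application layer.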

 

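2. Idempotent methods

HTTP/1.1 defines idempotent methods as follows:

[Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request]

In other words, the application should be prepared to deal with the consequences of the same method being executed more than once. This can be achieved, for instance, by providing a unique transaction ID and by other means of avoiding repeated execution of the same logical operation.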
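The transaction-ID approach mentioned above can be sketched on the receiving side as follows. This is a minimal in-memory illustration, not part of HttpClient: IdempotentPaymentService, deposit and the ID strings are hypothetical names, and a real service would keep the processed IDs in durable storage.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical service that applies each transaction ID at most once.
class IdempotentPaymentService {
    // Maps a transaction ID to the balance recorded when it was first applied
    private final Map<String, Long> processed = new ConcurrentHashMap<>();
    private final AtomicLong balance = new AtomicLong();

    // Applies the deposit only the first time a given transaction ID is seen;
    // a repeated request with the same ID returns the previously recorded result.
    long deposit(String transactionId, long amount) {
        return processed.computeIfAbsent(transactionId, id -> balance.addAndGet(amount));
    }

    long balance() {
        return balance.get();
    }
}
```

With this in place, a client that retries a request with the same transaction ID after a timeout does not cause the deposit to be applied twice.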

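Note that this problem is not specific to HttpClient. Browser-based applications are subject to exactly the same issues with non-idempotent HTTP methods.

By default HttpClient assumes that only non-entity-enclosing methods such as GET and HEAD are idempotent, and that entity-enclosing methods such as POST and PUT are not.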

 

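3. Automatic exception recovery

By default HttpClient attempts to recover automatically from I/O exceptions. The default auto-recovery mechanism is limited to just a few exceptions that are known to be safe.

- HttpClient makes no attempt to recover from any logical or HTTP protocol errors (those derived from the HttpException class).

- HttpClient automatically retries those methods that are assumed to be idempotent.

- HttpClient automatically retries those methods that fail with a transport exception while the HTTP request is still being transmitted to the target server (that is, the request has not been fully transmitted to the server).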
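The rules above can be condensed into a single decision function. The sketch below is a simplified, stdlib-only illustration rather than the library's implementation: RetryRules, shouldRetry and MAX_RETRIES are hypothetical names, the limit of 3 is an assumption, and the list of non-retriable exception types follows the one given in the DefaultHttpRequestRetryHandler Javadoc quoted later in this article.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.ConnectException;
import java.net.UnknownHostException;
import javax.net.ssl.SSLException;

class RetryRules {
    // Assumed retry limit for this sketch
    static final int MAX_RETRIES = 3;

    static boolean shouldRetry(IOException failure, int executionCount,
                               boolean idempotent, boolean requestFullySent) {
        if (executionCount > MAX_RETRIES) {
            // Too many attempts already
            return false;
        }
        if (failure instanceof InterruptedIOException
                || failure instanceof UnknownHostException
                || failure instanceof ConnectException
                || failure instanceof SSLException) {
            // Failures that a retry is not expected to fix
            return false;
        }
        if (idempotent) {
            // Repeating an idempotent request is assumed to be safe
            return true;
        }
        // A non-idempotent request is retried only if it never fully reached the server
        return !requestFullySent;
    }
}
```

For example, a connection reset on a GET is retried, while the same failure on a POST that was already fully sent is not.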

 

4. Request retry handler

To enable a custom exception recovery mechanism, provide an implementation of the HttpRequestRetryHandler interface.

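import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.UnknownHostException;
import javax.net.ssl.SSLException;

import org.apache.http.HttpEntityEnclosingRequest;
import org.apache.http.HttpRequest;
import org.apache.http.client.HttpRequestRetryHandler;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.conn.ConnectTimeoutException;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

// The enclosing class name is illustrative only
public class RetryDemo {

    public void IODemo() {
        HttpRequestRetryHandler myRetryHandler = new HttpRequestRetryHandler() {
            @Override
            public boolean retryRequest(IOException exception, int executionCount, HttpContext context) {
                if (executionCount >= 5) {
                    // Give up once the maximum number of attempts is reached
                    return false;
                }
                // Note: this demo retries on the exception types below, whereas
                // DefaultHttpRequestRetryHandler treats InterruptedIOException,
                // UnknownHostException and SSLException as non-retriable.
                if (exception instanceof InterruptedIOException) {
                    // Timeout
                    return true;
                }
                if (exception instanceof UnknownHostException) {
                    // Unknown host
                    return true;
                }
                if (exception instanceof ConnectTimeoutException) {
                    // Connect timeout (already covered above: a subclass of InterruptedIOException)
                    return true;
                }
                if (exception instanceof SSLException) {
                    // SSL handshake failure
                    return true;
                }
                HttpClientContext clientContext = HttpClientContext.adapt(context);
                HttpRequest request = clientContext.getRequest();
                // Requests without an entity (GET, HEAD, ...) are treated as idempotent
                boolean idempotent = !(request instanceof HttpEntityEnclosingRequest);
                return idempotent;
            }
        };
        HttpGet httpGet = new HttpGet("http://www.baixxdu.com");
        // try-with-resources closes both the client and the response
        try (CloseableHttpClient httpClient = HttpClients.custom().setRetryHandler(myRetryHandler).build();
             CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {
            String entity = EntityUtils.toString(httpResponse.getEntity(), "utf-8");
            System.out.println(entity);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}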

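Note that you can use StandardHttpRequestRetryHandler instead of the handler used by default, so that the request methods defined as idempotent by RFC-2616 are treated as safe to retry automatically: GET, HEAD, PUT, DELETE, OPTIONS, and TRACE.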

========================================Supplement=====================================================

When building an HttpClient you can call setRetryHandler(HttpRequestRetryHandler). HttpRequestRetryHandler is the interface that decides whether a request is retried after it fails; for business scenarios with strict requirements this setting matters.
The known implementations of HttpRequestRetryHandler are DefaultHttpRequestRetryHandler and StandardHttpRequestRetryHandler, which extends DefaultHttpRequestRetryHandler.

DefaultHttpRequestRetryHandler

/*
 * ====================================================================
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation. For more
 * information on the Apache Software Foundation, please see
 * <http://www.apache.org/>.
 *
 */
package org.apache.http.impl.client;

import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.ConnectException;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import javax.net.ssl.SSLException;

import org.apache.http.HttpEntityEnclosingRequest;
import org.apache.http.HttpRequest;
import org.apache.http.annotation.Immutable;
import org.apache.http.client.HttpRequestRetryHandler;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.Args;

/**
 * The default {@link HttpRequestRetryHandler} used by request executors.
 *
 * @since 4.0
 */
@Immutable
public class DefaultHttpRequestRetryHandler implements HttpRequestRetryHandler {
    public static final DefaultHttpRequestRetryHandler INSTANCE = new DefaultHttpRequestRetryHandler();
    /**
     * the number of times a method will be retried
     */
    private final int retryCount;
    /**
     * Whether or not methods that have successfully sent their request will be retried
     */
    private final boolean requestSentRetryEnabled;
    private final Set<Class<? extends IOException>> nonRetriableClasses;

    /**
     * Create the request retry handler using the specified IOException classes
     *
     * @param retryCount              how many times to retry; 0 means no retries
     * @param requestSentRetryEnabled true if it's OK to retry requests that have been sent
     * @param clazzes                 the IOException types that should not be retried
     * @since 4.3
     */
    protected DefaultHttpRequestRetryHandler(final int retryCount, final boolean requestSentRetryEnabled, final Collection<Class<? extends IOException>> clazzes) {
        super();
        this.retryCount = retryCount;
        this.requestSentRetryEnabled = requestSentRetryEnabled;
        this.nonRetriableClasses = new HashSet<Class<? extends IOException>>();
        for (final Class<? extends IOException> clazz : clazzes) {
            this.nonRetriableClasses.add(clazz);
        }
    }

    /**
     * Create the request retry handler using the following list of
     * non-retriable IOException classes:
     * <ul>
     * <li>InterruptedIOException</li>
     * <li>UnknownHostException</li>
     * <li>ConnectException</li>
     * <li>SSLException</li>
     * </ul>
     *
     * @param retryCount              how many times to retry; 0 means no retries
     * @param requestSentRetryEnabled true if it's OK to retry non-idempotent requests that have been sent
     */
    @SuppressWarnings("unchecked")
    public DefaultHttpRequestRetryHandler(final int retryCount, final boolean requestSentRetryEnabled) {
        this(retryCount, requestSentRetryEnabled, Arrays.asList(
                InterruptedIOException.class,
                UnknownHostException.class,
                ConnectException.class,
                SSLException.class));
    }

    /**
     * Create the request retry handler with a retry count of 3, requestSentRetryEnabled false
     * and using the following list of non-retriable IOException classes:
     * <ul>
     * <li>InterruptedIOException</li>
     * <li>UnknownHostException</li>
     * <li>ConnectException</li>
     * <li>SSLException</li>
     * </ul>
     */
    public DefaultHttpRequestRetryHandler() {
        this(3, false);
    }

    /**
     * Used {@code retryCount} and {@code requestSentRetryEnabled} to determine
     * if the given method should be retried.
     */
    @Override
    public boolean retryRequest(final IOException exception, final int executionCount, final HttpContext context) {
        Args.notNull(exception, "Exception parameter");
        Args.notNull(context, "HTTP context");
        if (executionCount > this.retryCount) {
            // Do not retry if over max retry count
            return false;
        }
        if (this.nonRetriableClasses.contains(exception.getClass())) {
            return false;
        } else {
            for (final Class<? extends IOException> rejectException : this.nonRetriableClasses) {
                if (rejectException.isInstance(exception)) {
                    return false;
                }
            }
        }
        final HttpClientContext clientContext = HttpClientContext.adapt(context);
        final HttpRequest request = clientContext.getRequest();

        if (requestIsAborted(request)) {
            return false;
        }

        if (handleAsIdempotent(request)) {
            // Retry if the request is considered idempotent
            return true;
        }

        if (!clientContext.isRequestSent() || this.requestSentRetryEnabled) {
            // Retry if the request has not been sent fully or
            // if it's OK to retry methods that have been sent
            return true;
        }
        // otherwise do not retry
        return false;
    }

    /**
     * @return {@code true} if this handler will retry methods that have
     * successfully sent their request, {@code false} otherwise
     */
    public boolean isRequestSentRetryEnabled() {
        return requestSentRetryEnabled;
    }

    /**
     * @return the maximum number of times a method will be retried
     */
    public int getRetryCount() {
        return retryCount;
    }

    /**
     * @since 4.2
     */
    protected boolean handleAsIdempotent(final HttpRequest request) {
        return !(request instanceof HttpEntityEnclosingRequest);
    }

    /**
     * @since 4.2
     *
     * @deprecated (4.3)
     */
    @Deprecated
    protected boolean requestIsAborted(final HttpRequest request) {
        HttpRequest req = request;
        if (request instanceof RequestWrapper) { // does not forward request to original
            req = ((RequestWrapper) request).getOriginal();
        }
        return (req instanceof HttpUriRequest && ((HttpUriRequest) req).isAborted());
    }
}

The default constructor is

public DefaultHttpRequestRetryHandler() {
        this(3, false);
    }

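The requestSentRetryEnabled parameter controls whether a request is retried even after it has been fully sent. It is set to false here, and in my view it should generally stay false.
The main method the class implements is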

boolean retryRequest(IOException exception, int executionCount, HttpContext context);

StandardHttpRequestRetryHandler does not override this method

@Immutable
public class StandardHttpRequestRetryHandler extends DefaultHttpRequestRetryHandler {
    private final Map<String, Boolean> idempotentMethods;

    public StandardHttpRequestRetryHandler(final int retryCount, final boolean requestSentRetryEnabled) {
        super(retryCount, requestSentRetryEnabled);
        this.idempotentMethods = new ConcurrentHashMap<String, Boolean>();
        this.idempotentMethods.put("GET", Boolean.TRUE);
        this.idempotentMethods.put("HEAD", Boolean.TRUE);
        this.idempotentMethods.put("PUT", Boolean.TRUE);
        this.idempotentMethods.put("DELETE", Boolean.TRUE);
        this.idempotentMethods.put("OPTIONS", Boolean.TRUE);
        this.idempotentMethods.put("TRACE", Boolean.TRUE);
    }

    public StandardHttpRequestRetryHandler() {
        this(3, false);
    }

    @Override
    protected boolean handleAsIdempotent(final HttpRequest request) {
        final String method = request.getRequestLine().getMethod().toUpperCase(Locale.ROOT);
        final Boolean b = this.idempotentMethods.get(method);
        return b != null && b.booleanValue();
    }
}

It only overrides

protected boolean handleAsIdempotent(final HttpRequest request)

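With these two classes as a reference, we can readily implement our own HttpRequestRetryHandler.

Initializing HttpClient
In HttpClient 4.5 the way a client is initialized differs from earlier versions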

static CloseableHttpClient client = HttpClients.createDefault(); 
and
static CloseableHttpClient httpClient=HttpClients.custom().build(); 
With the second form you can add some network-related settings

You can use an anonymous class directly

HttpRequestRetryHandler handler = new HttpRequestRetryHandler() {
    @Override
    public boolean retryRequest(IOException arg0, int retryTimes, HttpContext arg2) {
        if (retryTimes > 5) {
            // Give up after more than five attempts
            return false;
        }
        // Note: because of the !(arg0 instanceof SSLException) clause, this condition
        // is true for every exception that is not an SSLException.
        if (arg0 instanceof UnknownHostException || arg0 instanceof ConnectTimeoutException || !(arg0 instanceof SSLException) || arg0 instanceof NoHttpResponseException) {
            return true;
        }
        HttpClientContext clientContext = HttpClientContext.adapt(arg2);
        HttpRequest request = clientContext.getRequest();
        boolean idempotent = !(request instanceof HttpEntityEnclosingRequest);
        if (idempotent) {
            // Retry if the request is considered idempotent, i.e. repeating it has no further side effects
            return true;
        }
        return false;
    }
};

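You can also set a route planner, that is, send requests through a proxy

HttpHost proxy = new HttpHost("127.0.0.1", 80);
// Route requests through the proxy
DefaultProxyRoutePlanner routePlanner = new DefaultProxyRoutePlanner(proxy);

// cookieStore is assumed to be a CookieStore created elsewhere (for example a BasicCookieStore)
CloseableHttpClient httpClient = HttpClients.custom()
        .setRoutePlanner(routePlanner)
        .setRetryHandler(handler)
        .setConnectionTimeToLive(1, TimeUnit.DAYS)
        .setDefaultCookieStore(cookieStore)
        .build();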

 
