[kettle] pentaho/data-integration error: org.apache.http.NoHttpResponseException: failed to respond

1. Version

Kettle version: 8.2.0.0-342
For how to turn on the debug log, see: [kettle] pentaho/data-integration: how to view debug logs

2. Error description

Core error message:
org.apache.http.NoHttpResponseException: xxx.com:80 failed to respond

The error is raised when one particular request is executed. The log at that point:

2024/01/18 15:24:06 - 获取json.0 - Connecting to [http://xxx.com/apis/query?id=123456] ...
2024/01/18 15:24:06 - 获取json.0 - Header parameter [Authorization]='Bearer ***'
15:24:06,404 DEBUG [BasicClientConnectionManager] Get connection for route {}->http://xxx.com:80
15:24:06,405 DEBUG [DefaultClientConnectionOperator] Connecting to xxx.com:80
15:24:06,407 DEBUG [RequestAddCookies] CookieSpec selected: default
15:24:06,407 DEBUG [RequestAuthCache] Auth cache not set in the context
15:24:06,407 DEBUG [RequestProxyAuthentication] Proxy auth state: UNCHALLENGED
15:24:06,407 DEBUG [DefaultHttpClient] Attempt 1 to execute request
15:24:06,408 DEBUG [DefaultClientConnection] Sending request: POST /apis/query?id=123456 HTTP/1.1
15:24:06,408 DEBUG [wire]  >> "POST /apis/query?id=123456 HTTP/1.1[\r][\n]"
15:24:06,408 DEBUG [wire]  >> "Authorization: Bearer ***[\r][\n]"
15:24:06,408 DEBUG [wire]  >> "Content-Type: application/json[\r][\n]"
15:24:06,408 DEBUG [wire]  >> "Content-Length: 0[\r][\n]"
15:24:06,408 DEBUG [wire]  >> "Host: xxx.com[\r][\n]"
15:24:06,408 DEBUG [wire]  >> "Connection: Keep-Alive[\r][\n]"
15:24:06,408 DEBUG [wire]  >> "User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)[\r][\n]"
15:24:06,408 DEBUG [wire]  >> "[\r][\n]"
15:24:06,408 DEBUG [headers] >> POST /apis/query?id=123456 HTTP/1.1
15:24:06,408 DEBUG [headers] >> Authorization: Bearer ***
15:24:06,408 DEBUG [headers] >> Content-Type: application/json
15:24:06,408 DEBUG [headers] >> Content-Length: 0
15:24:06,408 DEBUG [headers] >> Host: xxx.com
15:24:06,408 DEBUG [headers] >> Connection: Keep-Alive
15:24:06,408 DEBUG [headers] >> User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)
15:24:06,661 DEBUG [DefaultClientConnection] Connection 0.0.0.0:63956<->10.64.252.13:80 closed
15:24:06,661 DEBUG [DefaultHttpClient] Closing the connection.
15:24:06,666 DEBUG [DefaultClientConnection] Connection 0.0.0.0:63956<->10.64.252.13:80 closed
15:24:06,666 DEBUG [DefaultClientConnection] Connection 0.0.0.0:63956<->10.64.252.13:80 shut down
15:24:06,666 DEBUG [BasicClientConnectionManager] Releasing connection org.apache.http.impl.conn.ManagedClientConnectionImpl@7adaefad
2024/01/18 15:24:06 - 获取json.0 - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : Because of an error, this step can't continue:
2024/01/18 15:24:06 - 获取json.0 - Can not result from [http://xxx.com/apis/query?id=123456]
2024/01/18 15:24:06 - 获取json.0 - org.apache.http.NoHttpResponseException: xxx.com:80 failed to respond
2024/01/18 15:24:06 - 获取json.0 - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : org.pentaho.di.core.exception.KettleException:
2024/01/18 15:24:06 - 获取json.0 - Can not result from [http://xxx.com/apis/query?id=123456]
2024/01/18 15:24:06 - 获取json.0 - org.apache.http.NoHttpResponseException: xxx.com:80 failed to respond
2024/01/18 15:24:06 - 获取json.0 -
2024/01/18 15:24:06 - 获取json.0 -      at org.pentaho.di.trans.steps.rest.Rest.callRest(Rest.java:273)
2024/01/18 15:24:06 - 获取json.0 -      at org.pentaho.di.trans.steps.rest.Rest.processRow(Rest.java:470)
2024/01/18 15:24:06 - 获取json.0 -      at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2024/01/18 15:24:06 - 获取json.0 -      at java.lang.Thread.run(Thread.java:748)
2024/01/18 15:24:06 - 获取json.0 - Caused by: com.sun.jersey.api.client.ClientHandlerException: org.apache.http.NoHttpResponseException: xxx.com:80 failed to respond
2024/01/18 15:24:06 - 获取json.0 -      at com.sun.jersey.client.apache4.ApacheHttpClient4Handler.handle(ApacheHttpClient4Handler.java:187)
2024/01/18 15:24:06 - 获取json.0 -      at com.sun.jersey.api.client.Client.handle(Client.java:652)
2024/01/18 15:24:06 - 获取json.0 -      at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
2024/01/18 15:24:06 - 获取json.0 -      at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
2024/01/18 15:24:06 - 获取json.0 -      at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:570)
2024/01/18 15:24:06 - 获取json.0 -      at org.pentaho.di.trans.steps.rest.Rest.callRest(Rest.java:188)
2024/01/18 15:24:06 - 获取json.0 -      ... 3 more
2024/01/18 15:24:06 - 获取json.0 - Caused by: org.apache.http.NoHttpResponseException: xxx.com:80 failed to respond
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:281)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:257)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:207)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:684)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:835)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:118)
2024/01/18 15:24:06 - 获取json.0 -      at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
2024/01/18 15:24:06 - 获取json.0 -      at com.sun.jersey.client.apache4.ApacheHttpClient4Handler.handle(ApacheHttpClient4Handler.java:173)
2024/01/18 15:24:06 - 获取json.0 -      ... 8 more

For comparison, here is the log of a request that succeeds:

15:24:06,225 DEBUG [BasicClientConnectionManager] Get connection for route {}->http://xxx.com:80
15:24:06,225 DEBUG [DefaultClientConnectionOperator] Connecting to xxx.com:80
15:24:06,228 DEBUG [RequestAddCookies] CookieSpec selected: default
15:24:06,228 DEBUG [RequestAuthCache] Auth cache not set in the context
15:24:06,228 DEBUG [RequestProxyAuthentication] Proxy auth state: UNCHALLENGED
15:24:06,228 DEBUG [DefaultHttpClient] Attempt 1 to execute request
15:24:06,228 DEBUG [DefaultClientConnection] Sending request: POST /apis/query?id=123456 HTTP/1.1
15:24:06,228 DEBUG [wire]  >> "POST /apis/query?id=123456 HTTP/1.1[\r][\n]"
15:24:06,228 DEBUG [wire]  >> "Authorization: Bearer ***[\r][\n]"
15:24:06,228 DEBUG [wire]  >> "Content-Type: application/json[\r][\n]"
15:24:06,228 DEBUG [wire]  >> "Content-Length: 0[\r][\n]"
15:24:06,228 DEBUG [wire]  >> "Host: xxx.com[\r][\n]"
15:24:06,228 DEBUG [wire]  >> "Connection: Keep-Alive[\r][\n]"
15:24:06,228 DEBUG [wire]  >> "User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)[\r][\n]"
15:24:06,228 DEBUG [wire]  >> "[\r][\n]"
15:24:06,228 DEBUG [headers] >> POST /apis/query?id=123456 HTTP/1.1
15:24:06,228 DEBUG [headers] >> Authorization: Bearer ***
15:24:06,228 DEBUG [headers] >> Content-Type: application/json
15:24:06,228 DEBUG [headers] >> Content-Length: 0
15:24:06,228 DEBUG [headers] >> Host: xxx.com
15:24:06,229 DEBUG [headers] >> Connection: Keep-Alive
15:24:06,229 DEBUG [headers] >> User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)
15:24:06,343 DEBUG [wire]  << "HTTP/1.1 200 [\r][\n]"
15:24:06,343 DEBUG [wire]  << "Content-Type: application/json[\r][\n]"
15:24:06,347 DEBUG [wire]  << "Transfer-Encoding: chunked[\r][\n]"
15:24:06,347 DEBUG [wire]  << "Connection: keep-alive[\r][\n]"
15:24:06,347 DEBUG [wire]  << "Vary: Origin[\r][\n]"
15:24:06,347 DEBUG [wire]  << "X-Content-Type-Options: nosniff[\r][\n]"
15:24:06,347 DEBUG [wire]  << "X-XSS-Protection: 1; mode=block[\r][\n]"
15:24:06,347 DEBUG [wire]  << "Cache-Control: no-cache, no-store, max-age=0, must-revalidate[\r][\n]"
15:24:06,347 DEBUG [wire]  << "Pragma: no-cache[\r][\n]"
15:24:06,347 DEBUG [wire]  << "Expires: 0[\r][\n]"
15:24:06,347 DEBUG [wire]  << "Date: Thu, 18 Jan 2024 07:24:06 GMT[\r][\n]"
15:24:06,347 DEBUG [wire]  << "Access-Control-Allow-Origin: *[\r][\n]"
15:24:06,347 DEBUG [wire]  << "X-Kong-Upstream-Latency: 113[\r][\n]"
15:24:06,347 DEBUG [wire]  << "X-Kong-Proxy-Latency: 0[\r][\n]"
15:24:06,348 DEBUG [wire]  << "Via: kong/2.7.0[\r][\n]"
15:24:06,348 DEBUG [wire]  << "vary: Origin[\r][\n]"
15:24:06,348 DEBUG [wire]  << "[\r][\n]"
15:24:06,348 DEBUG [DefaultClientConnection] Receiving response: HTTP/1.1 200
15:24:06,348 DEBUG [headers] << HTTP/1.1 200
15:24:06,348 DEBUG [headers] << Content-Type: application/json
15:24:06,348 DEBUG [headers] << Transfer-Encoding: chunked
15:24:06,348 DEBUG [headers] << Connection: keep-alive
15:24:06,348 DEBUG [headers] << Vary: Origin
15:24:06,348 DEBUG [headers] << X-Content-Type-Options: nosniff
15:24:06,348 DEBUG [headers] << X-XSS-Protection: 1; mode=block
15:24:06,348 DEBUG [headers] << Cache-Control: no-cache, no-store, max-age=0, must-revalidate
15:24:06,348 DEBUG [headers] << Pragma: no-cache
15:24:06,348 DEBUG [headers] << Expires: 0
15:24:06,348 DEBUG [headers] << Date: Thu, 18 Jan 2024 07:24:06 GMT
15:24:06,348 DEBUG [headers] << Access-Control-Allow-Origin: *
15:24:06,348 DEBUG [headers] << X-Kong-Upstream-Latency: 113
15:24:06,348 DEBUG [headers] << X-Kong-Proxy-Latency: 0
15:24:06,348 DEBUG [headers] << Via: kong/2.7.0
15:24:06,348 DEBUG [headers] << vary: Origin
15:24:06,348 DEBUG [DefaultHttpClient] Connection can be kept alive indefinitely
15:24:06,349 DEBUG [wire]  << "38[\r][\n]"
15:24:06,349 DEBUG [wire]  << "隐藏内容}"
2024/01/18 15:24:06 - 获取json.0 - Response time (milliseconds): [125] for [http://xxx.com/apis/query?id=123456]
2024/01/18 15:24:06 - 获取json.0 - The response code is 200
15:24:06,349 DEBUG [wire]  << "[\r][\n]"
15:24:06,349 DEBUG [wire]  << "0[\r][\n]"
15:24:06,349 DEBUG [wire]  << "[\r][\n]"
15:24:06,349 DEBUG [BasicClientConnectionManager] Releasing connection org.apache.http.impl.conn.ManagedClientConnectionImpl@4bdd086a
15:24:06,349 DEBUG [BasicClientConnectionManager] Connection can be kept alive indefinitely

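3. Root cause analysis

According to the reference article "A record of a NoHttpResponseException: xxx failed to respond", Kettle uses Apache HttpClient as the connection tool for three of its components (they were listed in a screenshot in the original post).
That article attributes the problem to the keep-alive configuration.

To verify this, I set up a Spring Boot project by hand and added the following configuration:
a custom KeepAliveTimeout of 10 seconds, and the keep-alive connection is closed automatically after 5 requests.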

import org.apache.catalina.connector.Connector;
import org.apache.coyote.http11.Http11NioProtocol;
import org.springframework.boot.web.embedded.tomcat.TomcatConnectorCustomizer;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.ConfigurableWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WebServerConfiguration implements WebServerFactoryCustomizer<ConfigurableWebServerFactory> {
    @Override
    public void customize(ConfigurableWebServerFactory factory) {
        // Use the hook provided by the factory class to customize the Tomcat connector
        ((TomcatServletWebServerFactory) factory).addConnectorCustomizers(new TomcatConnectorCustomizer() {
            @Override
            public void customize(Connector connector) {
                Http11NioProtocol protocol = (Http11NioProtocol) connector.getProtocolHandler();
                // Custom KeepAliveTimeout: the server closes the keep-alive connection after 10 seconds without a request
                protocol.setKeepAliveTimeout(10000);
                // The server closes the keep-alive connection once the client has sent 5 requests on it
                protocol.setMaxKeepAliveRequests(5);
            }
        });
    }
}

Add a test class:
Automatic retries are disabled and keep-alive is forced to -1.

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.conn.ConnectionKeepAliveStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;
import org.junit.jupiter.api.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class HttpRequest {
    // Default keep-alive strategy: reads the Keep-Alive parameter from the response and applies it to the client
    private ConnectionKeepAliveStrategy udfKeepAliveStrategy = DefaultConnectionKeepAliveStrategy.INSTANCE;
    // Forced -1: always assume the server will never close the connection
    private ConnectionKeepAliveStrategy noneKeepAliveStrategy = (response, context) -> -1;

    // If no retry handler is configured, a default one with 3 retries is added; this one raises the retry count.
    private DefaultHttpRequestRetryHandler udfRetryHandler = new DefaultHttpRequestRetryHandler(30, false);
    private PoolingHttpClientConnectionManager manager;

    private String INCR_URL = "http://localhost:8080/api/v1/incr";

    public HttpRequest() {
        manager = new PoolingHttpClientConnectionManager();
        manager.setDefaultMaxPerRoute(100);
        manager.setMaxTotal(200);
        manager.setValidateAfterInactivity(10_000);
    }

    public CloseableHttpClient getClient() {
        HttpClientBuilder httpClientBuilder = HttpClientBuilder.create();
        httpClientBuilder
                .setConnectionManager(manager)
                // .setRetryHandler(udfRetryHandler)
                // .setKeepAliveStrategy(udfKeepAliveStrategy)
                
                .setKeepAliveStrategy(noneKeepAliveStrategy)
                .disableAutomaticRetries()
        ;
        CloseableHttpClient client = httpClientBuilder.build();
        return client;
    }

    @Test
    public void httpRequest() throws URISyntaxException, IOException {
        CloseableHttpClient client = getClient();
        URI uri = new URIBuilder(INCR_URL + "/info").build();
        for (int i = 0; i < 10; i++) {
            HttpPost post = new HttpPost(uri);
            // try-with-resources closes the response so the connection is always released
            try (CloseableHttpResponse response = client.execute(post)) {
                Map<String, String> headerMap = new HashMap<>();
                Arrays.stream(response.getAllHeaders()).forEach(f -> headerMap.put(f.getName(), f.getValue()));
                String responseStr = EntityUtils.toString(response.getEntity());
                String headersStr = headerMap.toString();
                System.out.println(String.format("content: %s, headers: %s", responseStr, headersStr));
            }
        }
    }
}

Test output:

16:01:57.448 [main] DEBUG org.apache.http.client.protocol.RequestAddCookies - CookieSpec selected: default
16:01:57.448 [main] DEBUG org.apache.http.client.protocol.RequestAuthCache - Auth cache not set in the context
16:01:57.448 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection request: [route: {}->http://localhost:8080][total kept alive: 1; route allocated: 1 of 100; total allocated: 1 of 200]
16:01:57.448 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection leased: [id: 1][route: {}->http://localhost:8080][total kept alive: 0; route allocated: 1 of 100; total allocated: 1 of 200]
16:01:57.448 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Executing request POST /api/v1/incr/info HTTP/1.1
16:01:57.448 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Target auth state: UNCHALLENGED
16:01:57.448 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Proxy auth state: UNCHALLENGED
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> POST /api/v1/incr/info HTTP/1.1
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> Content-Length: 0
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> Host: localhost:8080
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> Connection: Keep-Alive
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)
16:01:57.448 [main] DEBUG org.apache.http.headers - http-outgoing-1 >> Accept-Encoding: gzip,deflate
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "POST /api/v1/incr/info HTTP/1.1[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "Content-Length: 0[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "Host: localhost:8080[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "Connection: Keep-Alive[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "Accept-Encoding: gzip,deflate[\r][\n]"
16:01:57.448 [main] DEBUG org.apache.http.wire - http-outgoing-1 >> "[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "HTTP/1.1 200 [\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "Content-Type: application/json[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "Transfer-Encoding: chunked[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "Date: Wed, 17 Jan 2024 08:01:57 GMT[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "Connection: close[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "18[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "{"id":9,"name":"info-9"}[\r][\n]"
16:01:57.758 [main] DEBUG org.apache.http.headers - http-outgoing-1 << HTTP/1.1 200 
16:01:57.758 [main] DEBUG org.apache.http.headers - http-outgoing-1 << Content-Type: application/json
16:01:57.758 [main] DEBUG org.apache.http.headers - http-outgoing-1 << Transfer-Encoding: chunked
16:01:57.758 [main] DEBUG org.apache.http.headers - http-outgoing-1 << Date: Wed, 17 Jan 2024 08:01:57 GMT
16:01:57.758 [main] DEBUG org.apache.http.headers - http-outgoing-1 << Connection: close
16:01:57.759 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "0[\r][\n]"
16:01:57.759 [main] DEBUG org.apache.http.wire - http-outgoing-1 << "[\r][\n]"
16:01:57.759 [main] DEBUG org.apache.http.impl.conn.DefaultManagedHttpClientConnection - http-outgoing-1: Close connection
16:01:57.759 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Connection discarded
16:01:57.759 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection released: [id: 1][route: {}->http://localhost:8080][total kept alive: 0; route allocated: 0 of 100; total allocated: 0 of 200]

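Result:
The program does not fail at all. The debug output does show that the close-connection setting on the Spring Boot side takes effect: on every fifth request, the response contains Connection: close. Apache HttpClient honors that header and opens a new connection for the next request, so no error occurs. Note: all other responses contain Connection: keep-alive, which means the server has not closed the connection yet.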

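4. Solutions

The fix is to add retry configuration.

4.1 Preparation:

Install the jars from kettle/lib into the local Maven repository.
The required Kettle jars are all in the lib directory of the Kettle installation: KETTLE_HOME/lib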

mvn install:install-file "-DgroupId=pentaho-kettle" "-DartifactId=kettle-core" "-Dversion=8.2.0.0-342" "-Dpackaging=jar" "-Dfile=install/kettle-core-8.2.0.0-342.jar"
mvn install:install-file "-DgroupId=pentaho-kettle" "-DartifactId=kettle-engine" "-Dversion=8.2.0.0-342" "-Dpackaging=jar" "-Dfile=install/kettle-engine-8.2.0.0-342.jar"
mvn install:install-file "-DgroupId=pentaho" "-DartifactId=metastore" "-Dversion=8.2.0.0-342" "-Dpackaging=jar" "-Dfile=install/metastore-8.2.0.0-342.jar"

Create a new Maven project.

Add the following dependencies to pom.xml:

    <dependency>
      <groupId>pentaho-kettle</groupId>
      <artifactId>kettle-core</artifactId>
      <version>8.2.0.0-342</version>
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>pentaho-kettle</groupId>
      <artifactId>kettle-engine</artifactId>
      <version>8.2.0.0-342</version>
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>pentaho</groupId>
      <artifactId>metastore</artifactId>
      <version>8.2.0.0-342</version>
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.5.3</version>
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>commons-lang</groupId>
      <artifactId>commons-lang</artifactId>
      <version>2.6</version>
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>com.googlecode.json-simple</groupId>
      <artifactId>json-simple</artifactId>
      <version>1.1</version>
      <scope>provided</scope>
    </dependency>

    <dependency>
      <groupId>com.github.rholder</groupId>
      <artifactId>guava-retrying</artifactId>
      <version>2.0.0</version>
      <scope>provided</scope>
    </dependency>

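4.2 Solution 1:

Fixing the problem

Following the reference article mentioned earlier: in the classes org.pentaho.di.cluster.SlaveConnectionManager and org.pentaho.di.core.util.HttpClientManager,
modify every place that creates a client with HttpClients or HttpClientBuilder and add retry configuration.
Note: with DefaultHttpRequestRetryHandler the second argument (requestSentRetryEnabled) must be true. StandardHttpRequestRetryHandler can be used as well.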

    // private ConnectionKeepAliveStrategy udfKeepAliveStrategy = DefaultConnectionKeepAliveStrategy.INSTANCE;
    // private DefaultClientConnectionReuseStrategy udfReuseHandler = new DefaultClientConnectionReuseStrategy();
    private HttpRequestRetryHandler udfRetryHandler = new DefaultHttpRequestRetryHandler(5, true); // must be true!
    // private HttpRequestRetryHandler udfRetryHandler = new StandardHttpRequestRetryHandler(5, true); // POST is not on its idempotent list, so keep true for POST

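The KeepAlive and ReuseStrategy handlers are added by default, so they do not need to be configured. The retry strategy must be configured explicitly.
Compile, replace the corresponding .class files inside kettle-core-8.2.0.0-342.jar with the newly compiled ones, and then replace KETTLE_HOME/lib/kettle-core-8.2.0.0-342.jar with the patched jar.

Now call the self-written Spring Boot service from Kettle:
When Kettle's REST step calls my own Spring Boot endpoint, the debug log printed in the cmd window never shows Connection: close in the response. So the REST or HTTP step in Kettle creates a new Apache HttpClient object for every row and does not reuse connections at all. Every response from the server carries Connection: keep-alive, so this looks quite inefficient.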

2024/01/17 13:35:04 - 写日志.0 -
2024/01/17 13:35:04 - 写日志.0 - ------------> 行号 99------------------------------
2024/01/17 13:35:04 - 写日志.0 - res = {"id":1510,"name":"info-1510"}
2024/01/17 13:35:04 - 写日志.0 -
2024/01/17 13:35:04 - 写日志.0 - ====================
2024/01/17 13:35:04 - REST client.0 - Connecting to [http://localhost:8080/api/v1/incr/info] ...
2024/01/17 13:35:04 - REST client.0 - Connecting to [http://localhost:8080/api/v1/incr/info] ...
2024/01/17 13:35:04 - REST client.0 - Adding HTTP body value [1]
13:35:04,357 DEBUG [BasicClientConnectionManager] Get connection for route {}->http://localhost:8080
13:35:04,358 DEBUG [DefaultClientConnectionOperator] Connecting to localhost:8080
13:35:04,359 DEBUG [RequestAddCookies] CookieSpec selected: default
13:35:04,359 DEBUG [RequestAuthCache] Auth cache not set in the context
13:35:04,359 DEBUG [RequestTargetAuthentication] Target auth state: UNCHALLENGED
13:35:04,359 DEBUG [RequestProxyAuthentication] Proxy auth state: UNCHALLENGED
13:35:04,359 DEBUG [DefaultHttpClient] Attempt 1 to execute request
13:35:04,359 DEBUG [DefaultClientConnection] Sending request: POST /api/v1/incr/info HTTP/1.1
13:35:04,359 DEBUG [wire]  >> "POST /api/v1/incr/info HTTP/1.1[\r][\n]"
13:35:04,359 DEBUG [wire]  >> "Content-Type: application/json[\r][\n]"
13:35:04,359 DEBUG [wire]  >> "Transfer-Encoding: chunked[\r][\n]"
13:35:04,359 DEBUG [wire]  >> "Host: localhost:8080[\r][\n]"
13:35:04,359 DEBUG [wire]  >> "Connection: Keep-Alive[\r][\n]"
13:35:04,359 DEBUG [wire]  >> "User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)[\r][\n]"
13:35:04,360 DEBUG [wire]  >> "[\r][\n]"
13:35:04,360 DEBUG [headers] >> POST /api/v1/incr/info HTTP/1.1
13:35:04,360 DEBUG [headers] >> Content-Type: application/json
13:35:04,360 DEBUG [headers] >> Transfer-Encoding: chunked
13:35:04,360 DEBUG [headers] >> Host: localhost:8080
13:35:04,360 DEBUG [headers] >> Connection: Keep-Alive
13:35:04,360 DEBUG [headers] >> User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_261)
13:35:04,360 DEBUG [wire]  >> "1[\r][\n]"
13:35:04,360 DEBUG [wire]  >> "1"
13:35:04,360 DEBUG [wire]  >> "[\r][\n]"
13:35:04,360 DEBUG [wire]  >> "0[\r][\n]"
13:35:04,360 DEBUG [wire]  >> "[\r][\n]"
13:35:04,674 DEBUG [wire]  << "HTTP/1.1 200 [\r][\n]"
13:35:04,675 DEBUG [wire]  << "Content-Type: application/json[\r][\n]"
13:35:04,680 DEBUG [wire]  << "Transfer-Encoding: chunked[\r][\n]"
13:35:04,680 DEBUG [wire]  << "Date: Wed, 17 Jan 2024 05:35:04 GMT[\r][\n]"
13:35:04,680 DEBUG [wire]  << "Keep-Alive: timeout=10[\r][\n]"
13:35:04,680 DEBUG [wire]  << "Connection: keep-alive[\r][\n]"
13:35:04,680 DEBUG [wire]  << "[\r][\n]"
13:35:04,680 DEBUG [DefaultClientConnection] Receiving response: HTTP/1.1 200
13:35:04,680 DEBUG [headers] << HTTP/1.1 200
13:35:04,680 DEBUG [headers] << Content-Type: application/json
13:35:04,680 DEBUG [headers] << Transfer-Encoding: chunked
13:35:04,680 DEBUG [headers] << Date: Wed, 17 Jan 2024 05:35:04 GMT
13:35:04,680 DEBUG [headers] << Keep-Alive: timeout=10
13:35:04,680 DEBUG [headers] << Connection: keep-alive
13:35:04,681 DEBUG [DefaultHttpClient] Connection can be kept alive for 10000 MILLISECONDS
13:35:04,681 DEBUG [wire]  << "1e[\r][\n]"
13:35:04,681 DEBUG [wire]  << "{"id":1511,"name":"info-1511"}"
2024/01/17 13:35:04 - REST client.0 - Response time (milliseconds): [325] for [http://localhost:8080/api/v1/incr/info]
2024/01/17 13:35:04 - REST client.0 - The response code is 200
13:35:04,681 DEBUG [wire]  << "[\r][\n]"
13:35:04,681 DEBUG [wire]  << "0[\r][\n]"
13:35:04,682 DEBUG [wire]  << "[\r][\n]"
13:35:04,682 DEBUG [BasicClientConnectionManager] Releasing connection org.apache.http.impl.conn.ManagedClientConnectionImpl@2192ee11
13:35:04,682 DEBUG [BasicClientConnectionManager] Connection can be kept alive for 10000 MILLISECONDS

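At this point we can conclude that the problem is not the keep-alive configuration or connection reuse.

Running Kettle against the production service again, with retry added, the error no longer occurs.
My guess is that after the Apache HttpClient inside Kettle creates the connection and sends the GET/POST, it finds that the server side has already closed the connection. This is a server-side problem, but I do not have permission to fix the server.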

A brief look at the Apache HttpClient retry mechanism

See org.apache.http.impl.client.HttpClientBuilder for details.
If no retry handler is configured on the HttpClientBuilder, a default retry handler is set when build() is called.
org.apache.http.impl.execchain.RetryExec is the executor that performs retries. The if check on retryHandler.retryRequest is the key part of the retry loop.

Does setting the default org.apache.http.impl.client.DefaultHttpRequestRetryHandler mean the request will actually be retried? Not necessarily. For the default handler to retry, the following must hold:
the attempt is within the limit of 3 retries, and either the request is idempotent, the request has not been fully sent yet, or requestSentRetryEnabled=true.
With the default handler, the source code treats HttpPut and HttpPost as non-idempotent, so once such a request has been sent the default handler will not retry it.
StandardHttpRequestRetryHandler overrides the handleAsIdempotent check and treats GET, HEAD, PUT, DELETE, OPTIONS and TRACE as idempotent. POST is not on that list, so for POST requests requestSentRetryEnabled still has to be true.
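
The retry conditions described above can be pictured with a small library-free sketch. This is a simplified model written for illustration, not the HttpClient source: the class RetryDecisionSketch and its method names are hypothetical, a plain IOException stands in for NoHttpResponseException, and the list of non-retriable exception types is an assumption based on the handler's documented defaults.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.ConnectException;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.net.ssl.SSLException;

// Hypothetical, library-free model of the retry decision; not the HttpClient source.
public class RetryDecisionSketch {

    // Methods on the idempotent list of StandardHttpRequestRetryHandler.
    private static final Set<String> STANDARD_IDEMPOTENT =
            new HashSet<>(Arrays.asList("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"));

    // Models the default handler: a request that can carry a body is not treated as idempotent.
    public static boolean defaultIdempotent(String method) {
        String m = method.toUpperCase();
        return !(m.equals("POST") || m.equals("PUT") || m.equals("PATCH"));
    }

    // Models the standard handler: idempotency is decided by the method name.
    public static boolean standardIdempotent(String method) {
        return STANDARD_IDEMPOTENT.contains(method.toUpperCase());
    }

    // Returns true when another attempt should be made after the given error.
    public static boolean shouldRetry(IOException error, int executionCount, int maxRetries,
                                      boolean requestSent, boolean requestSentRetryEnabled,
                                      boolean idempotent) {
        if (executionCount > maxRetries) {
            return false; // retry budget used up
        }
        if (error instanceof InterruptedIOException || error instanceof UnknownHostException
                || error instanceof ConnectException || error instanceof SSLException) {
            return false; // error types assumed to be non-retriable
        }
        if (idempotent) {
            return true; // safe to repeat
        }
        // Non-idempotent: retry only if it was never sent, or if the flag allows it.
        return !requestSent || requestSentRetryEnabled;
    }

    public static void main(String[] args) {
        // Stand-in for NoHttpResponseException, which lives in the HttpClient library.
        IOException noResponse = new IOException("xxx.com:80 failed to respond");
        System.out.println(shouldRetry(noResponse, 1, 3, true, false, defaultIdempotent("POST")));
        System.out.println(shouldRetry(noResponse, 1, 3, true, true, defaultIdempotent("POST")));
        System.out.println(shouldRetry(noResponse, 1, 3, true, false, standardIdempotent("POST")));
    }
}
```

In this model a POST that has already been sent is retried only when requestSentRetryEnabled is true, which matches the behavior observed with Kettle: the error disappeared only after the flag was set to true.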

How ReuseStrategy and KeepAliveStrategy work:

Both work from the header information in the response.

org.apache.http.impl.client.DefaultClientConnectionReuseStrategy checks whether the Connection header of the response says close and returns a boolean.
org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy parses the timeout value of the Keep-Alive header from the response.
Reuse and keep-alive are applied in the main executor:
the org.apache.http.impl.execchain.MainClientExec#execute method shows how both are used.
That is, if the response says Connection: keep-alive, the timeout value of Keep-Alive is read and written back to the connection, so that the connection pool can check whether the connection is still valid before using it again.
org.apache.http.impl.client.HttpClientBuilder#build adds the reuse and keep-alive handlers by default.
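
As a rough illustration of what the two strategies compute, the following library-free sketch turns the header values seen in the logs into a reuse decision and a keep-alive duration. The class KeepAliveSketch and its methods are hypothetical and only approximate the behavior described above. They are not the HttpClient implementation.

```java
// Hypothetical, library-free approximation of the reuse and keep-alive decisions.
public class KeepAliveSketch {

    // Reuse is allowed unless the Connection header value is "close".
    public static boolean canReuse(String connectionHeader) {
        return connectionHeader == null || !connectionHeader.trim().equalsIgnoreCase("close");
    }

    // Parses a Keep-Alive header value such as "timeout=10" into milliseconds.
    // Returns -1 (keep alive indefinitely) when no usable timeout is present.
    public static long keepAliveMillis(String keepAliveHeader) {
        if (keepAliveHeader == null) {
            return -1;
        }
        for (String part : keepAliveHeader.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    return Long.parseLong(kv[1].trim()) * 1000;
                } catch (NumberFormatException ignore) {
                    // Malformed value: keep looking, then fall back to -1.
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(canReuse("keep-alive") + " " + keepAliveMillis("timeout=10"));
        System.out.println(canReuse("close") + " " + keepAliveMillis(null));
    }
}
```

This lines up with the Kettle logs: a response carrying Keep-Alive: timeout=10 is logged as "Connection can be kept alive for 10000 MILLISECONDS", while a response without that header is logged as "Connection can be kept alive indefinitely".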

4.3 Solution 2

Retry can also be implemented in an outer layer of your own.
The http and rest steps in Kettle use the following two classes:
org.pentaho.di.trans.steps.http.HTTP and org.pentaho.di.trans.steps.httppost.HTTPPOST
HTTP handles requests other than POST, and HTTPPOST handles POST.
Both classes are in kettle-engine-8.2.0.0-342.jar.

Copy the source of org.pentaho.di.trans.steps.http.HTTP and org.pentaho.di.trans.steps.httppost.HTTPPOST from the Kettle jar into your own project.
Create a RetryUtils class. guava-retrying is used here; spring-retry also works and offers richer policies.
The retry count and the wait time are hard-coded here; adjust them to your needs.

import com.github.rholder.retry.*;
import org.apache.http.NoHttpResponseException;
import org.apache.http.client.methods.CloseableHttpResponse;

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

public class RetryUtils {

    public static Retryer<CloseableHttpResponse> getHttpResponseRetryer() {
        Retryer<CloseableHttpResponse> retryer = RetryerBuilder.<CloseableHttpResponse>newBuilder()
                .retryIfExceptionOfType(NoHttpResponseException.class) // retry when this exception type is thrown
                .retryIfResult(res -> res == null)  // retry when the result is null
                .withWaitStrategy(WaitStrategies.fixedWait(2, TimeUnit.SECONDS)) // fixed wait between attempts
                .withStopStrategy(StopStrategies.stopAfterAttempt(999)) // maximum number of attempts
                .build();
        return retryer;
    }

    public static CloseableHttpResponse getResponseWithRetry(Callable<CloseableHttpResponse> supplier) throws ExecutionException, RetryException {
        Retryer<CloseableHttpResponse> retryer = getHttpResponseRetryer();
        CloseableHttpResponse res = retryer.call(supplier);
        return res;
    }
}

Modify the source:
In org.pentaho.di.trans.steps.http.HTTP and org.pentaho.di.trans.steps.httppost.HTTPPOST, wrap every httpClient.execute call in RetryUtils.getResponseWithRetry.
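
The wrapping idea itself can be shown without any third-party library. The sketch below is a minimal stand-in, not the guava-retrying implementation and not the patched Kettle code: SimpleRetrySketch and callWithRetry are hypothetical names, and the lambda in main simulates a call that fails twice with an IOException before succeeding, in place of a real httpClient.execute call.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical, library-free retry wrapper illustrating the same idea as RetryUtils.
public class SimpleRetrySketch {

    // Calls the action until it returns a non-null result, retrying only on the given
    // exception type, with a fixed wait between attempts.
    public static <T> T callWithRetry(Callable<T> action, Class<? extends Exception> retryOn,
                                      int maxAttempts, long waitMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = action.call();
                if (result != null) {
                    return result;
                }
                last = new IllegalStateException("null result");
            } catch (Exception e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // unexpected error type: do not retry
                }
                last = e;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(waitMillis);
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request: fails twice, then succeeds.
        String body = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("failed to respond");
            }
            return "ok";
        }, IOException.class, 5, 10);
        System.out.println(body + " after " + calls[0] + " attempts");
    }
}
```

Restricting the retry to one exception type keeps unrelated failures visible instead of hiding them behind repeated attempts.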

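Compile, then replace HTTP.class and HTTPPOST.class inside the original kettle-engine-8.2.0.0-342.jar with the newly compiled ones. Replace KETTLE_HOME/lib/kettle-engine-8.2.0.0-342.jar with the patched jar, and also copy guava-retrying-2.0.0.jar into KETTLE_HOME/lib.

Done. Start Kettle!

Caveat:
Re-sending a rest/http request is only suitable for requests that read data. If the operation triggered by the request is not idempotent, the retry mechanism will cause the server-side operation to run more than once. Keep this in mind!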
