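Troubleshooting a production issue: fixing the "dataSource has already close" exception when restarting a Spring Boot application that integrates Dubbo and is deployed in an external container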

1. The scenario

(1) During a production release, restarting the application produced the following error:

[Image: screenshot of the error]

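(2) Technology stack: springboot 1.5.7, dubbo 2.6.1, druid 1.1.0 …

2. Problem analysis

The error above says that the database connection that was obtained has already been closed. This usually happens when the application is still processing requests at shutdown time, and a resource those requests need, such as a bean they depend on, has already been closed by the Spring container.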

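3. Reproducing the error

The main steps to reproduce are:

(1) Start zk.

(2) Register a service provider with zk.

(3) Register a service consumer.

(4) Use jmeter to send requests from N threads and, at some point before the requests have finished, manually kill the Tomcat process by its pid.

(5) Watch the log output.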

[Image 1]

The DruidDataSource bean has already been closed by Spring while our requests are still being processed, which reproduces the production error.

[Image 2]

4. Locating the problem

In Spring Boot, deploying to an external container is done by extending SpringBootServletInitializer:

@Override
protected SpringApplicationBuilder configure(SpringApplicationBuilder builder) {
  return builder.sources(DubboConsumerApplication.class);
}

Tracing the source code shows that this setup registers a shutdown hook by default:

private void refreshContext(ConfigurableApplicationContext context) {
 refresh(context);
 if (this.registerShutdownHook) {
  try {
   context.registerShutdownHook();
  }
  catch (AccessControlException ex) {
   // Not allowed in some environments.
  }
 }
}

The hook registration itself, in Spring's AbstractApplicationContext:

@Override
public void registerShutdownHook() {
 if (this.shutdownHook == null) {
  // No shutdown hook registered yet.
  this.shutdownHook = new Thread() {
   @Override
   public void run() {
    synchronized (startupShutdownMonitor) {
     doClose();
    }
   }
  };
  Runtime.getRuntime().addShutdownHook(this.shutdownHook);
 }
}

Dubbo has its own hook as well. It is registered in a static initializer block of AbstractConfig:

static {
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
        @Override
        public void run() {
            if (logger.isInfoEnabled()) {
                logger.info("Run shutdown hook now.");
            }
            ProtocolConfig.destroyAll();
        }
    }, "DubboShutdownHook"));
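}

The JVM runs shutdown hooks through java.lang.ApplicationShutdownHooks. When the hooks run, all of them are started concurrently and there is no defined order between them. Because there is no ordering, hooks can race for resources: for example, a resource that one hook still needs may already have been closed by another hook.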

static void runHooks() {
    Collection<Thread> threads;
    synchronized(ApplicationShutdownHooks.class) {
        threads = hooks.keySet();
        hooks = null;
    }

    // All hook threads are started first, then joined
    for (Thread hook : threads) {
        hook.start();
    }
    for (Thread hook : threads) {
        while (true) {
            try {
                hook.join();
                break;
            } catch (InterruptedException ignored) {
            }
        }
    }
}
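
To see why "start all, then join all" gives no ordering, here is a minimal stdlib-only sketch that mirrors the shape of runHooks(). The names HookRunner, runAll and countOverlappingHooks are made up for illustration, and the code does not touch the real JVM hook registry. Two simulated hooks meet at a CyclicBarrier, which can only succeed if both are running at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper that mirrors the start-all-then-join-all shape of runHooks().
final class HookRunner {
    static void runAll(List<Thread> hooks) {
        for (Thread hook : hooks) {
            hook.start();                 // every hook is started before any is joined
        }
        for (Thread hook : hooks) {
            while (true) {
                try {
                    hook.join();
                    break;
                } catch (InterruptedException ignored) {
                    // keep waiting, as the JDK code does
                }
            }
        }
    }

    // Returns how many simulated hooks met at the barrier, i.e. ran at the same time.
    static int countOverlappingHooks() {
        CyclicBarrier barrier = new CyclicBarrier(2);
        AtomicInteger met = new AtomicInteger();
        List<Thread> hooks = new ArrayList<>();
        for (String name : new String[] {"SpringHook", "DubboHook"}) {
            hooks.add(new Thread(() -> {
                try {
                    // Trips only when both hooks are waiting here together
                    barrier.await(5, TimeUnit.SECONDS);
                    met.incrementAndGet();
                } catch (Exception e) {
                    // timeout or broken barrier: the hooks did not overlap
                }
            }, name));
        }
        runAll(hooks);
        return met.get();
    }

    public static void main(String[] args) {
        System.out.println("hooks that overlapped: " + countOverlappingHooks());
    }
}
```

If the hooks were run one after another, the first one would time out at the barrier. Because they overlap, neither hook can assume that a resource owned by the other is still open.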

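The current fix is: when Spring shuts down, first call the method that the Dubbo hook thread runs, and only then close the resources that Spring manages, such as beans.

When the Spring Boot hook thread is triggered, it runs the following code: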

protected void doClose() {
  // The entry check uses CAS, so calling this method more than once is harmless
 if (this.active.get() && this.closed.compareAndSet(false, true)) {
  if (logger.isInfoEnabled()) {
   logger.info("Closing " + this);
  }

  LiveBeansView.unregisterApplicationContext(this);

  try {
   // Publish shutdown event. This sends the context-closed event, which is where we can hook in
   publishEvent(new ContextClosedEvent(this));
  }
  catch (Throwable ex) {
   logger.warn("Exception thrown from ApplicationListener handling ContextClosedEvent", ex);
  }

  // Stop all Lifecycle beans, to avoid delays during individual destruction.
  if (this.lifecycleProcessor != null) {
   try {
    this.lifecycleProcessor.onClose();
   }
   catch (Throwable ex) {
    logger.warn("Exception thrown from LifecycleProcessor on context close", ex);
   }
  }

  // From here on the beans are destroyed

  // Destroy all cached singletons in the context's BeanFactory.
  destroyBeans();

  // Close the state of this context itself.
  closeBeanFactory();

  // Let subclasses do some final clean-up if they wish...
  onClose();

  this.active.set(false);
 }
}
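
What the fix relies on is the order inside doClose(): the close event is published before destroyBeans() runs, and with Spring's default event multicaster the listeners run on the publishing thread. The following stdlib-only simulation illustrates that ordering. MiniContext and its methods are invented stand-ins, not Spring classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Made-up stand-in for an application context; only the close ordering is modelled.
final class MiniContext {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final List<Runnable> closeListeners = new ArrayList<>();
    private final List<Runnable> beanDestroyers = new ArrayList<>();
    final List<String> trace = new ArrayList<>();

    void addCloseListener(Runnable listener) {
        closeListeners.add(listener);
    }

    void addBean(String name) {
        beanDestroyers.add(() -> trace.add("destroy " + name));
    }

    void close() {
        if (!closed.compareAndSet(false, true)) {
            return;                       // a second close() is a no-op
        }
        for (Runnable listener : closeListeners) {
            listener.run();               // step 1: close-event listeners, run synchronously
        }
        for (Runnable destroyer : beanDestroyers) {
            destroyer.run();              // step 2: beans are destroyed only afterwards
        }
    }

    public static void main(String[] args) {
        MiniContext ctx = new MiniContext();
        ctx.addBean("dataSource");
        ctx.addCloseListener(() -> ctx.trace.add("drain in-flight requests"));
        ctx.close();
        ctx.close();                      // ignored thanks to the CAS guard
        System.out.println(ctx.trace);
    }
}
```

Anything a listener does, including blocking, therefore happens while the data source bean is still open.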

First implement a listener for Spring's close event, then sleep inside it to delay the container's shutdown:

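import com.alibaba.dubbo.config.ProtocolConfig;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.BeansException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.context.ApplicationEvent;
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.stereotype.Component;

@Component
public class ContextClosedOrderHandler implements ApplicationListener<ApplicationEvent>, ApplicationContextAware {

  private static final Logger logger = LoggerFactory.getLogger(ContextClosedOrderHandler.class);

  private ApplicationContext context;

  @Override
  public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
    this.context = applicationContext;
  }

  @Override
  public void onApplicationEvent(ApplicationEvent event) {
    if (event instanceof ContextClosedEvent) {
      logger.info("开始关闭spring容器"); // "starting to close the Spring container"
      try {
        // Run Dubbo's graceful shutdown first (a no-op if the Dubbo hook has already started it)
        ProtocolConfig.destroyAll();
        logger.info("spring关闭事件,等20s"); // "Spring close event, waiting 20s"
        // Hold back bean destruction; 20 s is the wait visible in the log below
        Thread.sleep(20000);
        logger.info("spring关闭事件,开始销毁bean"); // "Spring close event, starting to destroy beans"
        // Any other custom thread pools in the application can also be shut down here
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        logger.warn("Interrupted while delaying the Spring shutdown", e);
      }
    }
  }
}

The code above delays the Spring container's shutdown so that Dubbo's graceful shutdown runs first. The test log is as follows: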

34590 [http-nio-8080-exec-1] INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=192.168.127.138:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@18fb7e3
34669 [http-nio-8080-exec-1] INFO  o.a.c.f.imps.CuratorFrameworkImpl - Default schema
34716 [http-nio-8080-exec-1-SendThread(192.168.127.138:2181)] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server 192.168.127.138/192.168.127.138:2181. Will not attempt to authenticate using SASL (unknown error)
34730 [http-nio-8080-exec-1-SendThread(192.168.127.138:2181)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established to 192.168.127.138/192.168.127.138:2181, initiating session
34767 [http-nio-8080-exec-1-SendThread(192.168.127.138:2181)] INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server 192.168.127.138/192.168.127.138:2181, sessionid = 0x16ad66caf160009, negotiated timeout = 40000
34794 [http-nio-8080-exec-1] INFO  c.a.d.r.zookeeper.ZookeeperRegistry -  [DUBBO] Register: consumer://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=consumers&check=false&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer&timestamp=1558378077302&version=1.0.0, dubbo version: 2.6.1, current host: 192.168.127.138
34820 [http-nio-8080-exec-1-EventThread] INFO  o.a.c.f.s.ConnectionStateManager - State change: CONNECTED
35029 [http-nio-8080-exec-1] INFO  c.a.d.r.zookeeper.ZookeeperRegistry -  [DUBBO] Subscribe: consumer://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=providers,configurators,routers&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer&timestamp=1558378077302&version=1.0.0, dubbo version: 2.6.1, current host: 192.168.127.138
35086 [http-nio-8080-exec-1] INFO  c.a.d.r.zookeeper.ZookeeperRegistry -  [DUBBO] Notify urls for subscribe url consumer://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=providers,configurators,routers&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer&timestamp=1558378077302&version=1.0.0, urls: [dubbo://192.168.10.17:20890/com.dubbo.api.UserService?anyhost=true&application=dubbo-provider&dubbo=2.6.1&generic=false&group=dubbo&interface=com.dubbo.api.UserService&methods=sayUser,sayHello,sayUserOutPutPOJO,sayHello2&pid=17664&revision=1.0.0&side=provider&timestamp=1558323464597&version=1.0.0, empty://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=configurators&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer&timestamp=1558378077302&version=1.0.0, empty://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=routers&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer&timestamp=1558378077302&version=1.0.0], dubbo version: 2.6.1, current host: 192.168.127.138
35532 [http-nio-8080-exec-1] INFO  c.a.d.r.transport.AbstractClient -  [DUBBO] Successed connect to server /192.168.10.17:20890 from NettyClient 192.168.127.138 using dubbo version 2.6.1, channel is NettyChannel [channel=[id: 0x9af80983, /192.168.127.138:46560 => /192.168.10.17:20890]], dubbo version: 2.6.1, current host: 192.168.127.138
35532 [http-nio-8080-exec-1] INFO  c.a.d.r.transport.AbstractClient -  [DUBBO] Start NettyClient /192.168.127.138 connect to the server /192.168.10.17:20890, dubbo version: 2.6.1, current host: 192.168.127.138
35774 [http-nio-8080-exec-1] INFO  c.a.dubbo.config.AbstractConfig -  [DUBBO] Refer dubbo service com.alibaba.dubbo.rpc.service.GenericService from url zookeeper://192.168.127.138:2181/com.alibaba.dubbo.registry.RegistryService?anyhost=true&application=dubbo-consumer&check=false&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&methods=sayUser,sayHello,sayUserOutPutPOJO,sayHello2&pid=10118&register.ip=192.168.127.138&remote.timestamp=1558323464597&revision=1.0.0&side=consumer&timestamp=1558378077302&version=1.0.0, dubbo version: 2.6.1, current host: 192.168.127.138
37102 [http-nio-8080-exec-1] INFO  c.a.druid.pool.DruidDataSource - {dataSource-1} inited
37784 [http-nio-8080-exec-1] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-1 准备睡眠1s,请执行shutdown
38791 [http-nio-8080-exec-1] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-1  ------ start get data
38963 [http-nio-8080-exec-1] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-1执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
50904 [http-nio-8080-exec-2] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-2 准备睡眠1s,请执行shutdown
50955 [http-nio-8080-exec-3] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-3 准备睡眠1s,请执行shutdown
51001 [http-nio-8080-exec-4] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-4 准备睡眠1s,请执行shutdown
51057 [http-nio-8080-exec-5] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-5 准备睡眠1s,请执行shutdown
51104 [http-nio-8080-exec-6] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-6 准备睡眠1s,请执行shutdown
51155 [http-nio-8080-exec-7] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-7 准备睡眠1s,请执行shutdown
51214 [http-nio-8080-exec-8] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-8 准备睡眠1s,请执行shutdown
51251 [http-nio-8080-exec-9] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-9 准备睡眠1s,请执行shutdown
51301 [http-nio-8080-exec-10] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-10 准备睡眠1s,请执行shutdown
51353 [http-nio-8080-exec-1] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-1 准备睡眠1s,请执行shutdown
51410 [http-nio-8080-exec-11] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-11 准备睡眠1s,请执行shutdown
51457 [http-nio-8080-exec-12] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-12 准备睡眠1s,请执行shutdown
51509 [http-nio-8080-exec-13] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-13 准备睡眠1s,请执行shutdown
51560 [http-nio-8080-exec-14] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-14 准备睡眠1s,请执行shutdown
51607 [http-nio-8080-exec-15] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-15 准备睡眠1s,请执行shutdown
51652 [http-nio-8080-exec-16] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-16 准备睡眠1s,请执行shutdown
51703 [http-nio-8080-exec-17] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-17 准备睡眠1s,请执行shutdown
51770 [http-nio-8080-exec-18] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-18 准备睡眠1s,请执行shutdown
51806 [http-nio-8080-exec-19] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-19 准备睡眠1s,请执行shutdown
51859 [http-nio-8080-exec-20] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-20 准备睡眠1s,请执行shutdown
51905 [http-nio-8080-exec-2] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-2  ------ start get data
51909 [http-nio-8080-exec-2] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-2执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
51956 [http-nio-8080-exec-3] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-3  ------ start get data
51962 [http-nio-8080-exec-3] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-3执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52002 [http-nio-8080-exec-4] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-4  ------ start get data
52006 [http-nio-8080-exec-4] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-4执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52027 [DubboShutdownHook] INFO  c.a.dubbo.config.AbstractConfig -  [DUBBO] Run shutdown hook now., dubbo version: 2.6.1, current host: 192.168.127.138
52031 [Thread-6] INFO  o.s.b.w.s.c.AnnotationConfigServletWebServerApplicationContext - Closing org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@14878f7: startup date [Mon May 20 11:47:25 PDT 2019]; root of context hierarchy
52034 [DubboShutdownHook] INFO  c.a.d.r.s.AbstractRegistryFactory -  [DUBBO] Close all registries [zookeeper://192.168.127.138:2181/com.alibaba.dubbo.registry.RegistryService?application=dubbo-consumer&check=false&client=curator&dubbo=2.6.1&interface=com.alibaba.dubbo.registry.RegistryService&logger=slf4j&pid=10118&timestamp=1558378077322], dubbo version: 2.6.1, current host: 192.168.127.138
20-May-2019 11:48:15.360 INFO [Thread-7] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8080"]
52036 [Thread-6] INFO  c.d.w.c.ContextClosedOrderHandler - 开始关闭spring容器
52038 [DubboShutdownHook] INFO  c.a.d.r.zookeeper.ZookeeperRegistry -  [DUBBO] Destroy registry:zookeeper://192.168.127.138:2181/com.alibaba.dubbo.registry.RegistryService?application=dubbo-consumer&check=false&client=curator&dubbo=2.6.1&interface=com.alibaba.dubbo.registry.RegistryService&logger=slf4j&pid=10118&timestamp=1558378077322, dubbo version: 2.6.1, current host: 192.168.127.138
52038 [Thread-6] INFO  c.d.w.c.ContextClosedOrderHandler - spring关闭事件,等20s
52039 [DubboShutdownHook] INFO  c.a.d.r.zookeeper.ZookeeperRegistry -  [DUBBO] Unregister: consumer://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=consumers&check=false&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer&timestamp=1558378077302&version=1.0.0, dubbo version: 2.6.1, current host: 192.168.127.138
20-May-2019 11:48:15.365 INFO [Thread-7] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["ajp-nio-8009"]
20-May-2019 11:48:15.366 INFO [Thread-7] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
52055 [DubboShutdownHook] INFO  c.a.d.r.zookeeper.ZookeeperRegistry -  [DUBBO] Destroy unregister url consumer://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=consumers&check=false&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer&timestamp=1558378077302&version=1.0.0, dubbo version: 2.6.1, current host: 192.168.127.138
52057 [DubboShutdownHook] INFO  c.a.d.r.zookeeper.ZookeeperRegistry -  [DUBBO] Unsubscribe: consumer://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=providers,configurators,routers&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer&timestamp=1558378077302&version=1.0.0, dubbo version: 2.6.1, current host: 192.168.127.138
52060 [http-nio-8080-exec-5] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-5  ------ start get data
52068 [DubboShutdownHook] INFO  c.a.d.r.zookeeper.ZookeeperRegistry -  [DUBBO] Destroy unsubscribe url consumer://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=providers,configurators,routers&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer&timestamp=1558378077302&version=1.0.0, dubbo version: 2.6.1, current host: 192.168.127.138
52071 [http-nio-8080-exec-5] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-5执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52076 [Curator-Framework-0] INFO  o.a.c.f.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
52107 [http-nio-8080-exec-6] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-6  ------ start get data
52113 [http-nio-8080-exec-6] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-6执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52137 [DubboShutdownHook] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x16ad66caf160009 closed
52138 [http-nio-8080-exec-1-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x16ad66caf160009
52155 [http-nio-8080-exec-7] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-7  ------ start get data
52162 [http-nio-8080-exec-7] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-7执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
20-May-2019 11:48:15.520 INFO [localhost-startStop-2] org.apache.catalina.core.StandardWrapper.unload Waiting for [14] instance(s) to be deallocated for Servlet [dispatcherServlet]
52216 [http-nio-8080-exec-8] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-8  ------ start get data
52221 [http-nio-8080-exec-8] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-8执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52253 [http-nio-8080-exec-9] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-9  ------ start get data
52259 [http-nio-8080-exec-9] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-9执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52302 [http-nio-8080-exec-10] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-10  ------ start get data
52307 [http-nio-8080-exec-10] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-10执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52354 [http-nio-8080-exec-1] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-1  ------ start get data
52360 [http-nio-8080-exec-1] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-1执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52412 [http-nio-8080-exec-11] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-11  ------ start get data
52416 [http-nio-8080-exec-11] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-11执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52459 [http-nio-8080-exec-12] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-12  ------ start get data
52465 [http-nio-8080-exec-12] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-12执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52510 [http-nio-8080-exec-13] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-13  ------ start get data
52515 [http-nio-8080-exec-13] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-13执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52562 [http-nio-8080-exec-14] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-14  ------ start get data
52566 [http-nio-8080-exec-14] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-14执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52608 [http-nio-8080-exec-15] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-15  ------ start get data
52619 [http-nio-8080-exec-15] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-15执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52653 [http-nio-8080-exec-16] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-16  ------ start get data
52668 [http-nio-8080-exec-16] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-16执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52704 [http-nio-8080-exec-17] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-17  ------ start get data
52710 [http-nio-8080-exec-17] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-17执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52771 [http-nio-8080-exec-18] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-18  ------ start get data
52776 [http-nio-8080-exec-18] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-18执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52808 [http-nio-8080-exec-19] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-19  ------ start get data
52812 [http-nio-8080-exec-19] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-19执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
52860 [http-nio-8080-exec-20] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-20  ------ start get data
52864 [http-nio-8080-exec-20] INFO  c.d.w.c.DemoConsumerController - 当前线程:http-nio-8080-exec-20执行结果返回:{"age":10,"id":1,"name":"tanjie"}[{"age":10,"id":1,"name":"tanjie"},{"age":20,"id":2,"name":"tanjie2"}]+===============+1_测试name_201_测试name_20
62143 [DubboShutdownHook] INFO  c.a.d.r.p.dubbo.DubboProtocol -  [DUBBO] Close dubbo connect: /192.168.127.138:46560-->/192.168.10.17:20890, dubbo version: 2.6.1, current host: 192.168.127.138
62144 [DubboShutdownHook] INFO  c.a.d.r.t.netty.NettyChannel -  [DUBBO] Close netty channel [id: 0x9af80983, /192.168.127.138:46560 => /192.168.10.17:20890], dubbo version: 2.6.1, current host: 192.168.127.138
62148 [DubboShutdownHook] INFO  c.a.d.r.p.dubbo.DubboProtocol -  [DUBBO] Close dubbo connect: 192.168.127.138:0-->192.168.10.17:20890, dubbo version: 2.6.1, current host: 192.168.127.138
62150 [DubboShutdownHook] INFO  c.a.d.r.p.dubbo.DubboProtocol -  [DUBBO] Destroy reference: dubbo://192.168.10.17:20890/com.dubbo.api.UserService?anyhost=true&application=dubbo-consumer&check=false&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&methods=sayUser,sayHello,sayUserOutPutPOJO,sayHello2&pid=10118&register.ip=192.168.127.138&remote.timestamp=1558323464597&revision=1.0.0&side=consumer&timestamp=1558378077302&version=1.0.0, dubbo version: 2.6.1, current host: 192.168.127.138
62150 [DubboSharedHandler-thread-1] INFO  c.a.d.r.p.dubbo.DubboProtocol -  [DUBBO] disconnected from /192.168.10.17:20890,url:dubbo://192.168.10.17:20890/com.dubbo.api.UserService?anyhost=true&application=dubbo-consumer&check=false&codec=dubbo&dubbo=2.6.1&generic=true&group=dubbo&heartbeat=60000&interface=com.dubbo.api.UserService&logger=slf4j&methods=sayUser,sayHello,sayUserOutPutPOJO,sayHello2&pid=10118&register.ip=192.168.127.138&remote.timestamp=1558323464597&revision=1.0.0&side=consumer&timestamp=1558378077302&version=1.0.0, dubbo version: 2.6.1, current host: 192.168.127.138
72039 [Thread-6] INFO  c.d.w.c.ContextClosedOrderHandler - spring关闭事件,开始销毁bean
72041 [Thread-6] INFO  o.s.j.e.a.AnnotationMBeanExporter - Unregistering JMX-exposed beans on shutdown
72042 [Thread-6] INFO  o.s.j.e.a.AnnotationMBeanExporter - Unregistering JMX-exposed beans
72059 [Thread-6] INFO  c.a.druid.pool.DruidDataSource - {dataSource-1} closed
72062 [localhost-startStop-2] INFO  c.d.web.config.TomcatClosedListener - tomcat 关闭..........................
20-May-2019 11:48:35.388 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesJdbc The web application [dubbo-web] registered the JDBC driver [com.alibaba.druid.proxy.DruidDriver] but failed to unregister it when the web application was stopped. To prevent a memory leak, the JDBC Driver has been forcibly unregistered.
20-May-2019 11:48:35.391 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesJdbc The web application [dubbo-web] registered the JDBC driver [com.mysql.jdbc.Driver] but failed to unregister it when the web application was stopped. To prevent a memory leak, the JDBC Driver has been forcibly unregistered.
20-May-2019 11:48:35.392 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [dubbo-web] appears to have started a thread named [DubboRegistryFailedRetryTimer-thread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 java.lang.Thread.run(Thread.java:748)
......
20-May-2019 11:48:35.506 INFO [Thread-7] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["http-nio-8080"]
20-May-2019 11:48:35.507 INFO [Thread-7] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["ajp-nio-8009"]
20-May-2019 11:48:35.509 INFO [Thread-7] org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler ["http-nio-8080"]
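20-May-2019 11:48:35.510 INFO [Thread-7] org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler ["ajp-nio-8009"]

Before turning to how Dubbo implements graceful shutdown, note what the log shows: as long as the delay is long enough for the requests that are still in flight to finish, the error does not occur.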
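
A fixed delay is either longer than necessary or, under load, too short. One possible refinement, which is not part of the original fix, is to count the requests in progress and wait until the count reaches zero or a deadline passes. The sketch below uses only the JDK. InFlightTracker and its methods are hypothetical names, and it assumes the counter would be incremented and decremented by something like a servlet filter around each request.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: tracks requests in progress and waits for them to finish.
final class InFlightTracker {
    private final AtomicInteger inFlight = new AtomicInteger();

    void begin() { inFlight.incrementAndGet(); }   // call when a request starts
    void end()   { inFlight.decrementAndGet(); }   // call when a request finishes

    // Polls until no request is in flight or the timeout elapses.
    // Returns true if everything drained in time, false on timeout or interrupt.
    boolean awaitDrained(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false;                      // gave up: requests are still running
            }
            try {
                Thread.sleep(10);                  // poll interval, chosen arbitrarily
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        InFlightTracker tracker = new InFlightTracker();
        tracker.begin();
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(100);                 // simulated request
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                tracker.end();
            }
        });
        worker.start();
        System.out.println("drained: " + tracker.awaitDrained(5, TimeUnit.SECONDS));
    }
}
```

In the close-event listener, the sleep could then be replaced by a call such as tracker.awaitDrained(20, TimeUnit.SECONDS), which returns as soon as the last request finishes and still caps the wait.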

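5. How Dubbo's graceful shutdown works

The Dubbo version is 2.6.1, the same as in production. For the principle behind graceful shutdown, see the official site: http://dubbo.apache.org/zh-cn/docs/user/demos/graceful-shutdown.html

5.1 Source code analysis of graceful shutdown

The entry point of Dubbo's graceful shutdown is AbstractConfig: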

static {
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
        @Override
        public void run() {
            if (logger.isInfoEnabled()) {
                logger.info("Run shutdown hook now.");
            }
            ProtocolConfig.destroyAll();
        }
    }, "DubboShutdownHook"));
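}

When the hook thread runs, it calls ProtocolConfig.destroyAll(). This is a public static method, so if for some reason your application does not trigger the hook thread, you can also call ProtocolConfig.destroyAll() yourself.

[Image 3: sequence diagram of ProtocolConfig.destroyAll()]

As the sequence diagram shows, the method has two main steps. The first calls AbstractRegistryFactory.destroyAll(). The second obtains the ExtensionLoader and calls protocol.destroy() on each loaded protocol in a loop.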

// TODO: 2017/8/30 to move this method somewhere else
public static void destroyAll() {
    // CAS check: if destruction has already started, do nothing
    if (!destroyed.compareAndSet(false, true)) {
        return;
    }

    // Destroy the registries: removes this application's registrations and subscriptions as provider and consumer
    AbstractRegistryFactory.destroyAll();

    // Destroy the protocols
    ExtensionLoader<Protocol> loader = ExtensionLoader.getExtensionLoader(Protocol.class);
    for (String protocolName : loader.getLoadedExtensions()) {
        try {
            Protocol protocol = loader.getLoadedExtension(protocolName);
            if (protocol != null) {
                protocol.destroy();
            }
        } catch (Throwable t) {
            logger.warn(t.getMessage(), t);
        }
    }
}
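
The compareAndSet guard at the top is what makes it safe to call destroyAll() from both the Dubbo hook and a Spring listener. A minimal sketch of the same pattern, with hypothetical names (OnceOnlyShutdown, cleanupRuns) and a counter standing in for the real cleanup work:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical component whose shutdown may be requested from several threads.
final class OnceOnlyShutdown {
    private final AtomicBoolean destroyed = new AtomicBoolean(false);
    private final AtomicInteger runs = new AtomicInteger();

    void destroyAll() {
        // Only the caller that flips false -> true proceeds; every other caller returns.
        if (!destroyed.compareAndSet(false, true)) {
            return;
        }
        runs.incrementAndGet();           // stands in for the real cleanup work
    }

    int cleanupRuns() {
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        OnceOnlyShutdown shutdown = new OnceOnlyShutdown();
        Thread hook = new Thread(shutdown::destroyAll, "ShutdownHook");
        Thread listener = new Thread(shutdown::destroyAll, "CloseListener");
        hook.start();
        listener.start();
        hook.join();
        listener.join();
        System.out.println("cleanup ran " + shutdown.cleanupRuns() + " time(s)");
    }
}
```

Note that the caller that loses the race returns at once; it does not wait for the winner's cleanup to finish. This appears to be what happens in the test log above, where all Dubbo destroy messages come from the DubboShutdownHook thread and the listener moves on to its 20 s wait within a few milliseconds.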

First, step one: destroying the registries.

// TODO: 2017/8/30 to move somewhere else better
public static void destroyAll() {
    if (LOGGER.isInfoEnabled()) {
        LOGGER.info("Close all registries " + getRegistries());
    }
    // Lock up the registry shutdown process
    LOCK.lock();
    try {
        for (Registry registry : getRegistries()) {
            try {
                // Destroy each registry in turn
                registry.destroy();
            } catch (Throwable e) {
                LOGGER.error(e.getMessage(), e);
            }
        }
        REGISTRIES.clear();
    } finally {
        // Release the lock
        LOCK.unlock();
    }
}
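
Two details of this method are worth noting: the whole loop runs under a lock, and each registry is destroyed inside its own try/catch so that one failure does not stop the others. A small self-contained sketch of that shape follows. ResourceRegistry and its members are invented for illustration, and resources are modelled as Runnable destroyers, so it catches RuntimeException where the Dubbo code catches Throwable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical holder of named resources; each resource is modelled as a Runnable destroyer.
final class ResourceRegistry {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<String, Runnable> resources = new LinkedHashMap<>();
    private final List<String> failed = new ArrayList<>();

    void register(String name, Runnable destroyer) {
        lock.lock();
        try {
            resources.put(name, destroyer);
        } finally {
            lock.unlock();
        }
    }

    // Destroys every resource under the lock; a failure is recorded and the loop continues.
    void destroyAll() {
        lock.lock();
        try {
            for (Map.Entry<String, Runnable> entry : resources.entrySet()) {
                try {
                    entry.getValue().run();
                } catch (RuntimeException e) {
                    failed.add(entry.getKey());
                }
            }
            resources.clear();
        } finally {
            lock.unlock();                // always released, even if something unexpected is thrown
        }
    }

    int size() { return resources.size(); }
    List<String> failedNames() { return failed; }

    public static void main(String[] args) {
        List<String> destroyed = new ArrayList<>();
        ResourceRegistry registry = new ResourceRegistry();
        registry.register("zk-1", () -> destroyed.add("zk-1"));
        registry.register("zk-2", () -> { throw new IllegalStateException("connection lost"); });
        registry.register("zk-3", () -> destroyed.add("zk-3"));
        registry.destroyAll();
        System.out.println("destroyed=" + destroyed + " failed=" + registry.failedNames()
                + " remaining=" + registry.size());
    }
}
```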

Here we use ZooKeeper as the registry for the analysis:

@Override
public void destroy() {
    // Call the parent implementation, which unregisters and unsubscribes
    super.destroy();
    try {
        // Close the zk client
        zkClient.close();
    } catch (Exception e) {
        logger.warn("Failed to close zookeeper client " + getUrl() + ", cause: " + e.getMessage(), e);
    }
}

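In the class hierarchy of ZookeeperRegistry, the top-level interface is Registry. Below it, an abstract class (AbstractRegistry) implements the shared unregister and unsubscribe operations against the registry, and the retry mechanism is provided by FailbackRegistry.

Now look at destroy() in AbstractRegistry: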

@Override
public void destroy() {
    // Both providers and consumers register with and subscribe to the Registry, so both have to be undone
    if (logger.isInfoEnabled()) {
        logger.info("Destroy registry:" + getUrl());
    }
    // Unregister: remove every URL this application registered on zk
    Set<URL> destroyRegistered = new HashSet<URL>(getRegistered());
    if (!destroyRegistered.isEmpty()) {
        for (URL url : new HashSet<URL>(getRegistered())) {
            if (url.getParameter(Constants.DYNAMIC_KEY, true)) {
                try {
                    unregister(url); // unregister
                    if (logger.isInfoEnabled()) {
                        logger.info("Destroy unregister url " + url);
                    }
                } catch (Throwable t) {
                    logger.warn("Failed to unregister url " + url + " to registry " + getUrl() + " on destroy, cause: " + t.getMessage(), t);
                }
            }
        }
    }
    // Unsubscribe: one URL can have several subscribers, i.e. one provider serves several consumers
    Map<URL, Set<NotifyListener>> destroySubscribed = new HashMap<URL, Set<NotifyListener>>(getSubscribed());
    if (!destroySubscribed.isEmpty()) {
        for (Map.Entry<URL, Set<NotifyListener>> entry : destroySubscribed.entrySet()) {
            URL url = entry.getKey();
            for (NotifyListener listener : entry.getValue()) {
                try {
                    unsubscribe(url, listener); // cancel each listener's subscription to this url
                    if (logger.isInfoEnabled()) {
                        logger.info("Destroy unsubscribe url " + url);
                    }
                } catch (Throwable t) {
                    logger.warn("Failed to unsubscribe url " + url + " to registry " + getUrl() + " on destroy, cause: " + t.getMessage(), t);
                }
            }
        }
    }
}
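
destroy() iterates over copies of the registered and subscribed collections rather than the live ones, while unregister() and unsubscribe() modify the live ones. The sketch below shows why a snapshot is needed when the live collection is a plain HashSet. The class and method names are hypothetical, and Dubbo's own collections may well be concurrent ones, in which case the copy mainly provides a stable view rather than preventing an exception.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry that keeps registered URLs (as strings) in a plain HashSet.
final class SnapshotUnregister {
    private final Set<String> registered = new HashSet<>();

    void register(String url) { registered.add(url); }
    void unregister(String url) { registered.remove(url); }   // mutates the live set
    Set<String> getRegistered() { return registered; }

    // Iterates over a copy, so unregister() is free to modify the live set.
    void destroy() {
        for (String url : new HashSet<>(registered)) {
            unregister(url);
        }
    }

    // For comparison: removing while iterating the live set.
    // Returns true if the iteration failed with ConcurrentModificationException.
    boolean destroyWithoutSnapshotFails() {
        try {
            for (String url : registered) {
                unregister(url);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        SnapshotUnregister live = new SnapshotUnregister();
        live.register("dubbo://a");
        live.register("dubbo://b");
        live.register("dubbo://c");
        System.out.println("iteration over the live set failed: " + live.destroyWithoutSnapshotFails());

        SnapshotUnregister copy = new SnapshotUnregister();
        copy.register("dubbo://a");
        copy.register("dubbo://b");
        copy.register("dubbo://c");
        copy.destroy();
        System.out.println("left after snapshot destroy: " + copy.getRegistered().size());
    }
}
```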

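To summarize step one: it removes every URL that this application registered on zk (those marked as dynamic, which is the default), and then cancels the application's subscriptions to the services it depends on, so it no longer listens for the corresponding watcher events.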

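Step two: destroying the protocols.

Some background first:

    The exchange layer encapsulates the request-response pattern. It is centered on Request and Response, and its extension interfaces are Exchanger, ExchangeChannel (the exchange channel), ExchangeClient (the exchange client) and ExchangeServer (the exchange server). In an RPC call, every Request is paired with its Response. The transport layer provides network transmission, and the exchange layer builds the Request-Response model on top of the transport layer's Message.

So, as the code below shows, all interaction with remote services goes through implementations of these exchange-layer interfaces.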
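
To make "a Request-Response model on top of Message" concrete, here is a deliberately small sketch of the idea: each request gets an id, the id travels with the message, and the response is matched back to a pending future by that id. MiniExchangeClient and everything in it are hypothetical and far simpler than Dubbo's real classes; the transport is just a callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

// Hypothetical miniature of an exchange layer on top of a one-way message transport.
final class MiniExchangeClient {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final BiConsumer<Long, String> transport;   // sends (requestId, payload) one way

    MiniExchangeClient(BiConsumer<Long, String> transport) {
        this.transport = transport;
    }

    // Sends a request and returns a future completed when the matching response arrives.
    CompletableFuture<String> request(String payload) {
        long id = nextId.incrementAndGet();
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(id, future);          // remember the request before it goes on the wire
        transport.accept(id, payload);
        return future;
    }

    // Called by the transport when a message tagged with a request id comes back.
    void onResponse(long id, String payload) {
        CompletableFuture<String> future = pending.remove(id);
        if (future != null) {
            future.complete(payload);
        }
    }

    // Requests that were sent but not yet answered.
    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        List<Long> sentIds = new ArrayList<>();
        MiniExchangeClient client = new MiniExchangeClient((id, payload) -> sentIds.add(id));
        CompletableFuture<String> first = client.request("sayHello");
        CompletableFuture<String> second = client.request("sayUser");
        client.onResponse(sentIds.get(1), "user");      // responses may arrive out of order
        System.out.println("second done: " + second.isDone() + ", first done: " + first.isDone()
                + ", pending: " + client.pendingCount());
    }
}
```

The pending map is the in-flight work that a graceful shutdown has to account for: closing a client with a timeout, as the destroy() code below does, gives outstanding requests a chance to complete first.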

这里我们依然以dubboProtocol来进行说明,

@Override
public void destroy() {
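    // In practice an application can be both a service provider and a service consumer,
    // so both the ExchangeServers and the ExchangeClients have to be closed.
    // Destroy every ExchangeServer: when this application is a provider, it is the server side for other applications
    for (String key : new ArrayList<String>(serverMap.keySet())) {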
        ExchangeServer server = serverMap.remove(key);
        if (server != null) {
            try {
                if (logger.isInfoEnabled()) {
                    logger.info("Close dubbo server: " + server.getLocalAddress());
                }
                server.close(ConfigUtils.getServerShutdownTimeout());
            } catch (Throwable t) {
                logger.warn(t.getMessage(), t);
            }
        }
    }
    // Destroy every ExchangeClient: when this application is a consumer, it is the client side relative to the provider
    for (String key : new ArrayList<String>(referenceClientMap.keySet())) {
        ExchangeClient client = referenceClientMap.remove(key);
        if (client != null) {
            try {
                if (logger.isInfoEnabled()) {
                    logger.info("Close dubbo connect: " + client.getLocalAddress() + "-->" + client.getRemoteAddress());
                }
                client.close(ConfigUtils.getServerShutdownTimeout());
            } catch (Throwable t) {
                logger.warn(t.getMessage(), t);
            }
        }
    }

    for (String key : new ArrayList<String>(ghostClientMap.keySet())) {
        ExchangeClient client = ghostClientMap.remove(key);
        if (client != null) {
            try {
                if (logger.isInfoEnabled()) {
                    logger.info("Close dubbo connect: " + client.getLocalAddress() + "-->" + client.getRemoteAddress());
                }
                client.close(ConfigUtils.getServerShutdownTimeout());
            } catch (Throwable t) {
                logger.warn(t.getMessage(), t);
            }
        }
    }
    stubServiceMethodsMap.clear();
    // Loop over and destroy every consumer-side Invoker (here, DubboInvoker) of this protocol (here, DubboProtocol)
    // Loop over and destroy every provider-side Exporter (here, DubboExporter) of this protocol (here, DubboProtocol)
    super.destroy();
}

First, look at server.close(ConfigUtils.getServerShutdownTimeout());

Sequence diagram
[Figure 4: sequence diagram]

HeaderExchangeServer.close

@Override
public void close(final int timeout) {
    startClose();
    if (timeout > 0) {
        final long max = (long) timeout;
        final long start = System.currentTimeMillis();
        // Send a READONLY event to every client (that is, every consumer subscribed to this service) to signal that this server should no longer be sent requests
        if (getUrl().getParameter(Constants.CHANNEL_SEND_READONLYEVENT_KEY, true)) {
            sendChannelReadOnlyEvent();
        }
        // Wait for in-flight requests to finish, up to the timeout
        while (HeaderExchangeServer.this.isRunning()
                && System.currentTimeMillis() - start < max) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                logger.warn(e.getMessage(), e);
            }
        }
    }
    // Stop the heartbeat timer
    doClose();
    // Close the underlying transport server (Netty, for example)
    server.close(timeout);
}

The last statement, server.close(timeout), ends up in AbstractServer.close(int timeout):

@Override
public void close(int timeout) {
    // Let the tasks still in the thread pool finish
    ExecutorUtil.gracefulShutdown(executor, timeout);
    close();
}

It first calls ExecutorUtil.gracefulShutdown(executor, timeout), which shuts the thread pool down gracefully so that the requests still in it can be processed:

public static void gracefulShutdown(Executor executor, int timeout) {
    if (!(executor instanceof ExecutorService) || isShutdown(executor)) {
        return;
    }
    final ExecutorService es = (ExecutorService) executor;
    try {
        // Stop accepting new tasks; tasks already submitted keep running (shutdown() itself does not block)
        es.shutdown(); // Disable new tasks from being submitted
    } catch (SecurityException ex2) {
        return;
    } catch (NullPointerException ex2) {
        return;
    }
    // Wait up to `timeout` ms for the submitted tasks to finish; on timeout, terminate them with shutdownNow()
    try {
        if (!es.awaitTermination(timeout, TimeUnit.MILLISECONDS)) {
            es.shutdownNow();
        }
    } catch (InterruptedException ex) {
        // If the wait itself is interrupted, terminate the tasks right away as well
        es.shutdownNow();
        Thread.currentThread().interrupt();
    }
    // If the pool still has not shut down, start a separate thread to carry out the close
    if (!isShutdown(es)) {
        newThreadToCloseExecutor(es);
    }
}

Next, close() runs. If the underlying transport is Netty, this ends up performing NettyServer's close logic:

@Override
public void close() {
    if (logger.isInfoEnabled()) {
        logger.info("Close " + getClass().getSimpleName() + " bind " + getBindAddress() + ", export " + getLocalAddress());
    }
    ExecutorUtil.shutdownNow(executor, 100);
    try {
        super.close();
    } catch (Throwable e) {
        logger.warn(e.getMessage(), e);
    }
    try {
        doClose();
    } catch (Throwable e) {
        logger.warn(e.getMessage(), e);
    }
}
protected abstract void doClose() throws Throwable;

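doClose() is an abstract method implemented by the concrete subclasses. We will not expand on it here, since it does not affect the main flow. As the code shows, the service provider does support graceful shutdown: it waits for the tasks in its thread pool to finish, and only terminates them if the timeout expires.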
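The provider-side behavior described above (stop taking new work, wait a bounded time, then force termination) can be reproduced with the plain `java.util.concurrent` API. The following is a minimal sketch under assumed names (`GracefulShutdownSketch`, `shutdownGracefully`, `runDemo` are all made up for illustration); it mirrors the shape of the Dubbo utility but is not a copy of it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; only standard JDK calls are used.
class GracefulShutdownSketch {

    // Returns true if every task finished within timeoutMs,
    // false if the remaining tasks had to be interrupted.
    static boolean shutdownGracefully(ExecutorService es, long timeoutMs) {
        es.shutdown(); // reject new tasks; queued and running tasks continue
        try {
            if (es.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
            es.shutdownNow(); // timeout: interrupt whatever is still running
            return false;
        } catch (InterruptedException e) {
            es.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    // Submits `tasks` jobs that each sleep for taskMs, shuts the pool down
    // with the given timeout, and returns how many jobs ran to completion.
    static int runDemo(int tasks, long taskMs, long timeoutMs) {
        ExecutorService es = Executors.newFixedThreadPool(2);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            es.execute(() -> {
                try {
                    Thread.sleep(taskMs);
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // task was cut short
                }
            });
        }
        shutdownGracefully(es, timeoutMs);
        return completed.get();
    }
}
```

With a generous timeout all submitted tasks complete; with a timeout much shorter than the task duration none do, which is the "terminate on timeout" branch. In a real service the timeout therefore has to be chosen against the longest request you are willing to wait for.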

Now look at client.close(ConfigUtils.getServerShutdownTimeout());

Sequence diagram
[Figure 5: sequence diagram]

First, look at HeaderExchangeClient.close

@Override
public void close(int timeout) {
    // Mark the client into the closure process
    startClose(); // just sets a "closing" flag
    doClose(); // stop the heartbeat thread
    channel.close(timeout); 
}

调用到 HeaderExchangeChannel.close

// graceful close
@Override
public void close(int timeout) {
    if (closed) {
        return;
    }
    closed = true;
    if (timeout > 0) {
        long start = System.currentTimeMillis();
        // This loop is the key to the consumer's graceful shutdown; it relies on DefaultFuture.hasFuture(channel)
        while (DefaultFuture.hasFuture(channel)
                && System.currentTimeMillis() - start < timeout) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                logger.warn(e.getMessage(), e);
            }
        }
    }
    close();
}

About DefaultFuture.hasFuture(channel): the DefaultFuture class tracks the mapping between each in-flight RPC request and its channel. Every request gets a globally incrementing id, and DefaultFuture keeps a Map from that id to the channel:

private static final Map<Long, Channel> CHANNELS = new ConcurrentHashMap<Long, Channel>();

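So DefaultFuture.hasFuture(channel) only has to check whether any request is still recorded for this channel. If one is, that request has not received its response yet, and the close waits for it to complete, up to the timeout.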

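The close() at the end of the method is what actually shuts down the Netty-related components.

At this point we have seen that both the service provider and the service consumer implement graceful shutdown.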
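The consumer-side wait can be illustrated with a simplified model of the id-to-channel bookkeeping. All names here (`PendingRequestsSketch`, `send`, `received`, `hasPending`, `awaitDrained`) are hypothetical, and a plain string stands in for a channel object; the sketch only assumes what the text above states, namely that pending requests are kept in a map keyed by request id and that the close polls until none remain for the channel or the timeout expires.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of pending-request tracking; not Dubbo's DefaultFuture.
class PendingRequestsSketch {
    private static final AtomicLong NEXT_ID = new AtomicLong();
    // request id -> channel name (a String stands in for a real channel object)
    private static final Map<Long, String> CHANNELS = new ConcurrentHashMap<Long, String>();

    // Records a new in-flight request on the channel and returns its id.
    static long send(String channel) {
        long id = NEXT_ID.incrementAndGet();
        CHANNELS.put(id, channel);
        return id;
    }

    // Called when the response for `id` arrives: the request is no longer pending.
    static void received(long id) {
        CHANNELS.remove(id);
    }

    // True while at least one request on this channel still awaits its response.
    static boolean hasPending(String channel) {
        return CHANNELS.containsValue(channel);
    }

    // Polls every 10 ms until the channel has no pending requests or timeoutMs
    // elapses; returns true if the channel drained in time.
    static boolean awaitDrained(String channel, long timeoutMs) {
        long start = System.currentTimeMillis();
        while (hasPending(channel) && System.currentTimeMillis() - start < timeoutMs) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop waiting if the closing thread is interrupted
            }
        }
        return !hasPending(channel);
    }
}
```

A caller would invoke `awaitDrained` before closing the connection and close regardless of the result, so a response that never arrives can delay shutdown by at most the timeout.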
