The c3p0-0.9.1.2 connection pool: how do you fix an occasional APPARENT DEADLOCK in production?

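Preface

This post follows on from the previous two, which were about tracking down connection leaks in a c3p0 database connection pool in production.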

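The second post showed that two parameters can be configured to find out which code borrowed a connection and never returned it. In my case, the borrowers' stack traces for the unreturned connections were indeed written to the log, but when I reproduced those scenarios locally, the connections were in fact returned. So I began to suspect that the business code was not the problem.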
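For reference, the two parameters are presumably c3p0's unreturnedConnectionTimeout and debugUnreturnedConnectionStackTraces. This post does not name them, so treat that as my assumption. A minimal sketch that builds them as c3p0-style properties, with an illustrative timeout value and a hypothetical class name:

```java
import java.util.Properties;

public class LeakDebugConfig {
    // Builds the two leak-debugging settings as c3p0-style properties.
    // Assumption: these are the "two parameters" meant by the previous post.
    public static Properties leakDebugProperties(int timeoutSeconds) {
        Properties p = new Properties();
        // Seconds a connection may stay checked out before c3p0 forcibly closes it.
        p.setProperty("c3p0.unreturnedConnectionTimeout", String.valueOf(timeoutSeconds));
        // Capture the borrower's stack trace at checkout so it can be logged on timeout.
        p.setProperty("c3p0.debugUnreturnedConnectionStackTraces", "true");
        return p;
    }

    public static void main(String[] args) {
        leakDebugProperties(120).list(System.out);
    }
}
```

The same keys could go into a c3p0.properties file instead; capturing stack traces on every checkout has a cost, so it is usually enabled only while debugging.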

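If it is not the business code, what could it be? Let's first look at how a connection is returned to the pool.

The actual type of the connection

Debugging locally, I found that a connection is obtained with the following code: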

com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource#getConnection()
public Connection getConnection() throws SQLException
{
    // javax.sql.PooledConnection; the actual type is com.mchange.v2.c3p0.impl.NewPooledConnection
    PooledConnection pc = getPoolManager().getPool().checkoutPooledConnection();
    return pc.getConnection();
}

To be honest, I had never noticed that the JDBC API includes the javax.sql.PooledConnection interface. Here, a com.mchange.v2.c3p0.impl.NewPooledConnection object is first obtained from the c3p0 pool and then handled as a javax.sql.PooledConnection.


Then, calling javax.sql.PooledConnection#getConnection returns an object whose actual type is com.mchange.v2.c3p0.impl.NewProxyConnection.

com.mchange.v2.c3p0.impl.NewPooledConnection#getConnection
public synchronized Connection getConnection() throws SQLException
{
    if ( exposedProxy == null )
    {
        exposedProxy = new NewProxyConnection( physicalConnection, this );
    }
    return exposedProxy;
}


The NewProxyConnection class mainly holds the following fields:


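inner: the actual underlying connection; in my case its type is oracle.jdbc.driver.T4CConnection
parentPooledConnection: the pooled connection (a NewPooledConnection, which implements javax.sql.PooledConnection)
cel: of type ConnectionEventListener, i.e. a listener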

The logic of connection.close

com.mchange.v2.c3p0.impl.NewProxyConnection
public synchronized void close() throws SQLException {
    // 0
    if (!this.isDetached()) {
        // 1 
        NewPooledConnection npc = this.parentPooledConnection;
        this.detach();
        // 2
        npc.markClosedProxyConnection(this, this.txn_known_resolved);
        this.inner = null;
    }  
}

At 0, it checks whether this object has already been detached from its underlying pooled connection:

boolean isDetached() {
    return this.parentPooledConnection == null;
}

At 1, it gets the pooled connection of type NewPooledConnection through parentPooledConnection and then detaches from it:

private void detach() {
    this.parentPooledConnection.removeConnectionEventListener(this.cel);
    this.parentPooledConnection = null;
}

At 2, it calls a method on the pooled connection to clean up:

void markClosedProxyConnection( NewProxyConnection npc, boolean txn_known_resolved ) 
{
    // 2.1
    List closeExceptions = new LinkedList();
    // 2.2
    cleanupResultSets( closeExceptions );
    cleanupUncachedStatements( closeExceptions );
    checkinAllCachedStatements( closeExceptions );
    // 2.3
    if ( closeExceptions.size() > 0 )
    {
        ...
        // log the exceptions
    }
    reset( txn_known_resolved );
    
    exposedProxy = null; //volatile
    // 2.4
    fireConnectionClosed(); 
}

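At 2.1, a list is created to collect any exceptions raised during cleanup.

At 2.2, the ResultSets, Statements, and so on are cleaned up.

At 2.3, the collected exceptions are logged.

At 2.4, the listeners are notified: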

private void fireConnectionClosed()
{
    ces.fireConnectionClosed(); 
}

That call then runs:

ConnectionEvent evt = new ConnectionEvent(source);
for (Iterator i = mlCopy.iterator(); i.hasNext();)
{
    ConnectionEventListener cl = (ConnectionEventListener) i.next();
    // 1 invoke the listener's method
    cl.connectionClosed(evt);
}

// com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.ConnectionEventListenerImpl#connectionClosed    
public void connectionClosed(final ConnectionEvent evt)
{
    doCheckinResource( evt );
}
The following method is then called:
private void doCheckinResource(ConnectionEvent evt)
{
    // rp: com.mchange.v2.resourcepool.BasicResourcePool
    rp.checkinResource( evt.getSource() ); 
}

Here rp is the resource pool, and this is the point where the connection is returned to it.
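The whole chain from close() to check-in can be modeled in a few lines. This is only a sketch: ProxyConn, PooledConn, Pool, and CloseListener are hypothetical stand-ins I made up for NewProxyConnection, NewPooledConnection, BasicResourcePool, and ConnectionEventListener, and the check-in here is synchronous for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckinChainSketch {
    // Hypothetical stand-in for javax.sql.ConnectionEventListener.
    interface CloseListener { void connectionClosed(PooledConn source); }

    // Hypothetical stand-in for the resource pool: it only records check-ins.
    static class Pool {
        final List<PooledConn> unused = new ArrayList<>();
        void checkinResource(PooledConn resc) { unused.add(0, resc); }
    }

    // Hypothetical stand-in for NewPooledConnection.
    static class PooledConn {
        private final List<CloseListener> listeners = new ArrayList<>();
        private ProxyConn exposedProxy;

        void addListener(CloseListener l) { listeners.add(l); }

        synchronized ProxyConn getConnection() {
            if (exposedProxy == null) exposedProxy = new ProxyConn(this);
            return exposedProxy;
        }

        void markClosedProxyConnection(ProxyConn proxy) {
            exposedProxy = null;
            // Iterate over a copy so the listener list can change during notification.
            for (CloseListener l : new ArrayList<>(listeners)) l.connectionClosed(this);
        }
    }

    // Hypothetical stand-in for NewProxyConnection.
    static class ProxyConn {
        private PooledConn parent;
        ProxyConn(PooledConn parent) { this.parent = parent; }

        boolean isDetached() { return parent == null; }

        synchronized void close() {
            if (!isDetached()) {               // a second close() does nothing
                PooledConn p = parent;
                parent = null;                 // detach from the pooled connection
                p.markClosedProxyConnection(this);
            }
        }
    }

    // Closes the same proxy twice and reports how many check-ins the pool saw.
    public static int demo() {
        Pool pool = new Pool();
        PooledConn pc = new PooledConn();
        pc.addListener(pool::checkinResource); // the listener's only job is to check in
        ProxyConn c = pc.getConnection();
        c.close();
        c.close();
        return pool.unused.size();
    }

    public static void main(String[] args) {
        System.out.println("check-ins: " + demo());
    }
}
```

Because the proxy detaches before notifying, closing it twice results in a single check-in, which mirrors the isDetached() guard shown earlier.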


Internally, the check-in method defines a local class RefurbishCheckinResourceTask that implements Runnable, creates an instance of it, and hands that instance to taskRunner, so the connection is returned asynchronously.

The logic of this task:

class RefurbishCheckinResourceTask implements Runnable
{
    public void run()
    {
        // 1 check whether the resource is still OK
        boolean resc_okay = attemptRefurbishResourceOnCheckin( resc );
        synchronized( BasicResourcePool.this )
        {
            PunchCard card = (PunchCard) managed.get( resc );
            // 2 if the resource is OK, return it to the unused (idle) list and update its card
            if ( resc_okay && card != null) 
            {
                // 2.1 return it to the unused list
                unused.add(0,  resc );
                // 2.2 set the check-in time to now and the checkout time to -1 (not checked out)
                card.last_checkin_time = System.currentTimeMillis();
                card.checkout_time = -1;
            }
            else
            {
                if (card != null)
                    card.checkout_time = -1; 
                // the connection is broken, so destroy it
                removeResource( resc );
                ensureMinResources();
            }

            BasicResourcePool.this.notifyAll();
        }
    }
}

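As we can see, returning a connection means creating a Runnable and handing it to the thread pool to run asynchronously. But asynchronous execution is not very robust: what happens if every thread in the pool is stuck and cannot pick up the task?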
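To make the concern concrete, here is a small experiment using a plain java.util.concurrent executor rather than c3p0's own runner (which is my substitution). Three workers are occupied by tasks that never finish on their own, and a check-in task submitted afterwards simply sits in the queue:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StuckPoolSketch {
    // Returns {check-in ran while workers were blocked, check-in ran after release}.
    public static boolean[] demo() {
        ExecutorService taskRunner = Executors.newFixedThreadPool(3);
        CountDownLatch allBusy = new CountDownLatch(3);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch checkinDone = new CountDownLatch(1);
        try {
            // Three tasks that hang, like a socket read that never returns.
            for (int i = 0; i < 3; i++) {
                taskRunner.execute(() -> {
                    allBusy.countDown();
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            allBusy.await(); // every worker thread is now occupied

            // The check-in task is queued behind them and cannot start.
            taskRunner.execute(checkinDone::countDown);
            boolean ranWhileBlocked = checkinDone.await(200, TimeUnit.MILLISECONDS);

            release.countDown(); // unblock the workers
            boolean ranAfterRelease = checkinDone.await(5, TimeUnit.SECONDS);
            return new boolean[] { ranWhileBlocked, ranAfterRelease };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            taskRunner.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("ran while blocked: " + r[0] + ", ran after release: " + r[1]);
    }
}
```

While the workers are blocked, the connection is neither in use by the application nor back in the pool, which from the outside looks exactly like a leak.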

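APPARENT DEADLOCK shows up in the production logs

Problem description

If you search for APPARENT DEADLOCK, you will find plenty of results, which suggests this problem has been troubling people for years.

For us, every occurrence of the connection leak seemed to be accompanied by this log, which looks roughly like this: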

06-08 17:00:30,119[Timer-5][][c.ThreadPoolAsynchronousRunner:608][WARN]-com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@3cf46c2 -- APPARENT DEADLOCK!!! Creating emergency threads for unassigned pending tasks!
06-08 17:00:30,121[Timer-5][][c.ThreadPoolAsynchronousRunner:624][WARN]-com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@3cf46c2 -- APPARENT DEADLOCK!!! Complete Status: 
	Managed Threads: 3
	Active Threads: 3
	Active Tasks: 
		com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@b451b27 (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0)
		com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@65f9a338 (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1)
		com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@684ae5d5 (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2)
	Pending Tasks: 
		com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@d373871
		com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@245a897e
		com.mchange.v2.resourcepool.BasicResourcePool$DestroyResourceTask@33f8c1d7
		com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask@107e24e9    
Pool thread stack traces:
	Thread[com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0,5,main]
		java.net.SocketInputStream.socketRead0(Native Method)
		java.net.SocketInputStream.read(SocketInputStream.java:152)
		java.net.SocketInputStream.read(SocketInputStream.java:122)
		oracle.net.ns.Packet.receive(Packet.java:300)
		oracle.net.ns.DataPacket.receive(DataPacket.java:106)
		oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
		oracle.net.ns.NetInputStream.read(NetInputStream.java:260)
		oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
		oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
		oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
		oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
		oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
		oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
		oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
		oracle.jdbc.driver.T4CTTIoauthenticate.doOSESSKEY(T4CTTIoauthenticate.java:404)
		oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:385)
		oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:546)
		oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:236)
		oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
		oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
		com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:134)
		com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:182)
		com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:171)
		com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:137)
		com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1014)
		com.mchange.v2.resourcepool.BasicResourcePool.access$800(BasicResourcePool.java:32)
		com.mchange.v2.resourcepool.BasicResourcePool$AcquireTask.run(BasicResourcePool.java:1810)
		com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:547)            

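Task types in the thread pool

As mentioned, many things are handed to the thread pool to run asynchronously. For example, when the main thread initializes the connections, it does not create them itself. Instead, it creates several tasks, submits them to the thread pool to run in parallel, and waits.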
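That hand-off can be sketched with wait/notifyAll on the pool's monitor, which is the same primitive the check-in task above uses. The class and method names below are hypothetical, and a fake string stands in for a real connection:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncAcquireSketch {
    private final List<String> unused = new ArrayList<>();

    // Called by a worker thread once a (fake) connection has been created.
    private synchronized void addResource(String resc) {
        unused.add(resc);
        notifyAll(); // wake up anyone waiting for the pool to fill
    }

    // Blocks until at least `wanted` resources exist or the timeout elapses.
    private synchronized int awaitResources(int wanted, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (unused.size() < wanted) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) break;
            wait(remaining);
        }
        return unused.size();
    }

    // The caller submits one acquire task per connection and then waits.
    public static int demo(int initialSize) {
        AsyncAcquireSketch pool = new AsyncAcquireSketch();
        ExecutorService taskRunner = Executors.newFixedThreadPool(3);
        try {
            for (int i = 0; i < initialSize; i++) {
                final int id = i;
                taskRunner.execute(() -> pool.addResource("conn-" + id));
            }
            return pool.awaitResources(initialSize, 5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            taskRunner.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("connections ready: " + demo(5));
    }
}
```

If the acquire tasks never complete, the waiting caller only gets out through the timeout, which is why a stalled worker thread matters so much.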

The main task types are:

  • com.mchange.v2.resourcepool.BasicResourcePool.AcquireTask

    Acquires a database connection by talking to the underlying DB driver, such as the MySQL or Oracle driver

  • com/mchange/v2/resourcepool/BasicResourcePool.java:959

    Inside this method a local class is defined; this DestroyResourceTask is what destroys the underlying connection

    private void destroyResource(final Object resc, boolean synchronous)
    {
        class DestroyResourceTask implements Runnable
        {
    
  • com.mchange.v2.resourcepool.BasicResourcePool#doCheckinManaged中的内部类:

    class RefurbishCheckinResourceTask implements Runnable
    

    This class is important. As covered above, this task is created and run asynchronously whenever a connection is returned

  • com.mchange.v2.resourcepool.BasicResourcePool.AsyncTestIdleResourceTask#AsyncTestIdleResourceTask

    This class tests resources that have been idle for too long to see whether they are still OK; if not, they are destroyed promptly

  • com.mchange.v2.resourcepool.BasicResourcePool.RemoveTask

    Needed when the pool shrinks. For example, with 20 connections and a configured minimum of 10, the 10 extra connections are destroyed

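Several of these have to communicate with the database, namely AcquireTask, DestroyResourceTask, and AsyncTestIdleResourceTask. Any such call can time out, and a long stall blocks the thread running the task. Next, let's see whether these threads really can be blocked.

How the thread pool executes tasks

The thread pool is created as follows: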

private ThreadPoolAsynchronousRunner( int num_threads, 
                    boolean daemon, 
                    int max_individual_task_time,
                    int deadlock_detector_interval, 
                    int interrupt_delay_after_apparent_deadlock,
                    Timer myTimer,
                    boolean should_cancel_timer )
    {
        this.num_threads = num_threads;
        this.daemon = daemon;
        this.max_individual_task_time = max_individual_task_time;
        this.deadlock_detector_interval = deadlock_detector_interval;
        this.interrupt_delay_after_apparent_deadlock = interrupt_delay_after_apparent_deadlock;
        this.myTimer = myTimer;
        this.should_cancel_timer = should_cancel_timer;
        // create the thread pool
        recreateThreadsAndTasks();

        myTimer.schedule( deadlockDetector, deadlock_detector_interval, deadlock_detector_interval );

    }

private void recreateThreadsAndTasks()
    {
        // if the thread pool already exists, stop the old threads first
        if ( this.managed != null)
        {
            Date aboutNow = new Date();
            for (Iterator ii = managed.iterator(); ii.hasNext(); )
            {
                PoolThread pt = (PoolThread) ii.next();
                pt.gentleStop();
                stoppedThreadsToStopDates.put( pt, aboutNow );
                ensureReplacedThreadsProcessing();
            }
        }
		
        // create the thread pool
        this.managed = new HashSet();
        this.available = new HashSet();
        this.pendingTasks = new LinkedList();
        for (int i = 0; i < num_threads; ++i)
        {
            // the thread type is com.mchange.v2.async.ThreadPoolAsynchronousRunner.PoolThread
            Thread t = new PoolThread(i, daemon);
            managed.add( t );
            available.add( t );
            t.start();
        }
    }
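The constructor also schedules a deadlockDetector on the timer. As far as I understand it, the detector's idea is to look at the pending task queue on every tick and raise the alarm when the queue is non-empty and identical to what it saw on the previous tick. The sketch below is my simplified model of that idea, not c3p0's source; the class and method names are hypothetical, and tick() stands in for what the timer would call.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class DeadlockDetectorSketch {
    private final LinkedList<String> pendingTasks = new LinkedList<>();
    private List<String> last = null; // snapshot taken on the previous tick

    public synchronized void submit(String task) { pendingTasks.add(task); }

    // Models a worker thread picking up the oldest pending task.
    public synchronized void taskStarted() {
        if (!pendingTasks.isEmpty()) pendingTasks.removeFirst();
    }

    // One detector tick; a java.util.Timer would call this at a fixed interval.
    // Returns true when the queue is non-empty and unchanged since the last tick.
    public synchronized boolean tick() {
        if (pendingTasks.isEmpty()) {
            last = null;
            return false;
        }
        List<String> current = new ArrayList<>(pendingTasks);
        boolean apparentDeadlock = current.equals(last);
        // After an alarm, start over; otherwise remember this snapshot.
        last = apparentDeadlock ? null : current;
        return apparentDeadlock;
    }

    public static void main(String[] args) {
        DeadlockDetectorSketch d = new DeadlockDetectorSketch();
        d.submit("AcquireTask");
        System.out.println("first tick:  " + d.tick());
        System.out.println("second tick: " + d.tick());
    }
}
```

Under this model, "deadlock" only means that nothing moved between two ticks, which fits the word APPARENT in the log message: threads blocked on a slow socket read would trigger it just as well as a real deadlock.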

The thread's execution logic:

// 1
boolean should_stop;
LinkedList pendingTasks;

thread_loop:
while (true)
{
    Runnable myTask;
    synchronized ( ThreadPoolAsynchronousRunner.this )
    {
        while ( !should_stop && pendingTasks.size() == 0 )
            ThreadPoolAsynchronousRunner.this.wait( POLL_FOR_STOP_INTERVAL );
        // 2
        if (should_stop) 
            break thread_loop;
		// 3
        myTask = (Runnable) pendingTasks.remove(0);
        currentTask = myTask;
    }
    try
    { 	// 4
        if (max_individual_task_time > 0)
            setMaxIndividualTaskTimeEnforcer();
        // 5
        myTask.run(); 
    }
    ...
}

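At 1, the thread defines a flag; when the thread sees that the flag is true, it stops running.

At 2, the flag is checked.

At 3, a task is taken from the task list.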
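The loop above can be reduced to a small runnable sketch. The names are hypothetical, the stop flag lives on the runner instead of on each thread, and the polling interval is an illustrative value:

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkerLoopSketch {
    private static final long POLL_FOR_STOP_INTERVAL = 100; // ms, illustrative
    private final LinkedList<Runnable> pendingTasks = new LinkedList<>();
    private boolean shouldStop = false; // guarded by this

    public synchronized void postRunnable(Runnable r) { pendingTasks.add(r); notifyAll(); }
    public synchronized void gentleStop() { shouldStop = true; notifyAll(); }

    // Body of one pool thread.
    public void runLoop() throws InterruptedException {
        while (true) {
            Runnable myTask;
            synchronized (this) {
                while (!shouldStop && pendingTasks.isEmpty())
                    wait(POLL_FOR_STOP_INTERVAL);
                if (shouldStop) return;
                myTask = pendingTasks.removeFirst();
            }
            // Runs outside the lock; if this call blocks, the thread is stuck here
            // and never comes back to look at the stop flag or the queue.
            myTask.run();
        }
    }

    // Runs three trivial tasks on one worker and returns how many completed.
    public static int demo() {
        WorkerLoopSketch runner = new WorkerLoopSketch();
        AtomicInteger done = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(3);
        Thread worker = new Thread(() -> {
            try {
                runner.runLoop();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            for (int i = 0; i < 3; i++)
                runner.postRunnable(() -> { done.incrementAndGet(); latch.countDown(); });
            latch.await(5, TimeUnit.SECONDS);
            runner.gentleStop();
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks completed: " + demo());
    }
}
```

Note that the stop flag is only consulted between tasks, so a gentle stop cannot rescue a thread that is already blocked inside myTask.run().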
