getContentProviderImpl占满binder导致应用anr的案例分析

背景:

bug# 813706
日志路径:Z:\stability_doc\case\binderfull_getcontentprovider

systemui,settings等一直anr导致系统无法正常运行。
通过查看trace确定systemui anr的直接原因是由于SS的binder线程被占满且都在getContentProviderImpl时wait,无法及时响应

----- pid 1863 at 2018-08-06 19:57:10 -----
Cmd line: com.android.systemui
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x72c33e38 self=0x78a64c2a00
  | sysTid=1863 nice=0 cgrp=default sched=0/0 handle=0x792b5f49a8
  | state=S schedstat=( 2587940558080 1667956760666 8922333 ) utm=201251 stm=57543 core=7 HZ=100
  | stack=0x7fd02c9000-0x7fd02cb000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/1863/stack)
  native: #00 pc 000000000007a2c4  /system/lib64/libc.so (__ioctl+4)
  native: #01 pc 0000000000027a34  /system/lib64/libc.so (ioctl+132)
  native: #02 pc 0000000000058458  /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+252)
  native: #03 pc 000000000005912c  /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+60)
  native: #04 pc 0000000000058f68  /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+216)
  native: #05 pc 000000000004f3ec  /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+72)
  native: #06 pc 0000000000127890  /system/lib64/libandroid_runtime.so (???)
  native: #07 pc 000000000093b1e4  /system/framework/arm64/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:769)
  at android.app.IActivityManager$Stub$Proxy.getCurrentUser(IActivityManager.java:7528)
  at android.app.ActivityManager.getCurrentUser(ActivityManager.java:4081)
  at com.android.systemui.statusbar.policy.Clock.getSmallTime(Clock.java:216)
  at com.android.systemui.statusbar.policy.Clock.updateClock(Clock.java:173)
  at com.android.systemui.statusbar.policy.Clock.onTimeTick(Clock.java:369)

----- pid 1391 at 2018-08-06 19:57:10 -----
Cmd line: system_server
"Binder:1391_1" prio=5 tid=7 TimedWaiting
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x136803e8 self=0x78a65a6800
  | sysTid=1401 nice=0 cgrp=default sched=0/0 handle=0x7889d814f0
  | state=S schedstat=( 629918502103 239344250787 1866819 ) utm=45619 stm=17372 core=6 HZ=100
  | stack=0x7889c87000-0x7889c89000 stackSize=1005KB
  | held mutexes=
  at java.lang.Object.wait(Native method)
  - waiting on <0x02fd645a> (a com.android.server.am.ContentProviderRecord)
  at java.lang.Object.wait(Object.java:422)
  at com.android.server.am.ActivityManagerService.getContentProviderImpl(ActivityManagerService.java:12588)
  - locked <0x02fd645a> (a com.android.server.am.ContentProviderRecord)
  at com.android.server.am.ActivityManagerService.getContentProvider(ActivityManagerService.java:12663)
  at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:439)
  at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:3132)
  at android.os.Binder.execTransact(Binder.java:702)

本文重点记录getContentProviderImpl wait被占满的调查过程:

08-06 19:57:10.041  1391  1405 I am_anr  : [0,1863,com.android.systemui,952647181, }]

最早发生systemui anr的时间是08-06 19:57:10,这56分的时候出现com.android.providers.calendar进程启动失败的日志。

// 启动com.android.providers.calendar进程且fork进程成功

08-06 19:56:23.777  8600  8600 I Zygote  : ForkAndSpecializeCommon: fork is finished, pid=0 
08-06 19:56:23.781  1391  1401 I am_proc_start: [0,8600,10007,com.android.providers.calendar,content provider,com.android.providers.calendar/.CalendarProvider2]

// com.android.providers.calendar 8600进程启动超时并kill改进程。

08-06 19:56:33.788  1391  1405 I am_process_start_timeout: [0,8600,10007,com.android.providers.calendar]
08-06 19:56:33.789  1391  1405 I am_kill : [0,8600,com.android.providers.calendar,-10000,start timeout]

//下面code说明ams必须收到calendar provider的attachApplicationLocked回调,才会remove
PROC_START_TIMEOUT_MSG,启动成功。
//从上面fork的日志来看calendar provider进程已经启动成功了,但是ams仍然打印timeout日志,说明ams并没有接收到8600的回调。继续调查原因。

public void handleMessage(Message msg) {
//...
            case PROC_START_TIMEOUT_MSG: {
                ProcessRecord app = (ProcessRecord)msg.obj;
                synchronized (ActivityManagerService.this) {
                    processStartTimedOutLocked(app);
                }
            } break;
//...
}
private final boolean attachApplicationLocked(IApplicationThread thread, int pid) {
        mHandler.removeMessages(PROC_START_TIMEOUT_MSG, app);
}

// getprovider的binder线程 发现launchapp为null返回,13379 2072 这些均为binder线程,熟了下一共32条日志刚好是ss的最大binder线程数。 这意味这什么呢?请接下来看代码解析

    Line 125379: 08-06 19:56:33.789  1391 13379 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125381: 08-06 19:56:33.789  1391  2072 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125385: 08-06 19:56:33.790  1391  2756 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125387: 08-06 19:56:33.790  1391  2804 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125388: 08-06 19:56:33.790  1391  2045 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125390: 08-06 19:56:33.790  1391  2757 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125393: 08-06 19:56:33.790  1391  2075 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125395: 08-06 19:56:33.790  1391 15409 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125396: 08-06 19:56:33.791  1391  2044 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125397: 08-06 19:56:33.791  1391  2803 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125398: 08-06 19:56:33.791  1391 13449 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125399: 08-06 19:56:33.791  1391  2129 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125401: 08-06 19:56:33.792  1391  2799 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125408: 08-06 19:56:33.792  1391  2805 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125410: 08-06 19:56:33.792  1391  2831 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125411: 08-06 19:56:33.792  1391 18489 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125412: 08-06 19:56:33.793  1391  2834 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125415: 08-06 19:56:33.793  1391  2824 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125418: 08-06 19:56:33.793  1391  5968 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125419: 08-06 19:56:33.793  1391  1809 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125420: 08-06 19:56:33.793  1391  2718 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125421: 08-06 19:56:33.793  1391  8866 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125422: 08-06 19:56:33.794  1391  2729 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125424: 08-06 19:56:33.794  1391 15408 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125425: 08-06 19:56:33.794  1391  1873 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125427: 08-06 19:56:33.794  1391  1401 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125429: 08-06 19:56:33.794  1391  2719 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125430: 08-06 19:56:33.794  1391  1402 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125431: 08-06 19:56:33.794  1391  2835 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125432: 08-06 19:56:33.795  1391  2832 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125433: 08-06 19:56:33.795  1391  6326 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
    Line 125434: 08-06 19:56:33.795  1391  2800 W ActivityManager: Unable to launch app com.android.providers.calendar/10007 for provider com.android.calendar: launching app became null
private ContentProviderHolder getContentProviderImpl(IApplicationThread caller,
            String name, IBinder token, boolean stable, int userId) {

    proc = startProcessLocked(cpi.processName,
                                    cpr.appInfo, false, 0, "content provider",
                                    new ComponentName(cpi.applicationInfo.packageName,
                                            cpi.name), false, false, false);
    while (cpr.provider == null) {
        if (cpr.launchingApp == null) {
                    Slog.w(TAG, "Unable to launch app "
                            + cpi.applicationInfo.packageName + "/"
                            + cpi.applicationInfo.uid + " for provider "
                            + name + ": launching app became null");
                    return null;
                }
        cpr.wait(waitTimeout);
    }
}

打印上面日志的地方在getContentProviderImpl方法里,cpr.launchingApp为null的时候
下面代码执行是又某应用进程(此处是calendar)发起,设想执行流程应该是这样的
1 第一个访问calendar provider的应用向ams发起getContentProvider的请求,ams发现proc 为空则去启动com.android.providers.calendar进程,且cpr wait ,该binder线程被hang住。
2 接着ams等待com.android.providers.calendar启动完成的attachApplicationLocked回调
3 然后ams 又收到31个calendar的binder请求均被 wait住了
4 calendar provider 的process启动完成回调发现所有binder被hang住,ss没有多余的binder线程可以分配.等待10秒后超时
5 08-06 19:56:33 ,10秒超时后回调cpr.notifyall被调用, 上述代码里的while循环被继续执行,发现cpr.launchingApp则返回。所有binder线程被释放,也就是上面的" Unable to launch app "的日志打印
6 接着又从一开始循环执行到第5步,至此发生了systemui等应用anr。

Line 15194: 08-06 16:58:42.550  1391  1405 I am_process_start_timeout: [0,14150,10007,com.android.providers.calendar]
    Line 87516: 08-06 19:56:33.788  1391  1405 I am_process_start_timeout: [0,8600,10007,com.android.providers.calendar]
    Line 87559: 08-06 19:56:43.810  1391  1405 I am_process_start_timeout: [0,8621,10007,com.android.providers.calendar]
    Line 87597: 08-06 19:56:53.854  1391  1405 I am_process_start_timeout: [0,8636,10007,com.android.providers.calendar]
    Line 87642: 08-06 19:57:03.877  1391  1405 I am_process_start_timeout: [0,8659,10007,com.android.providers.calendar]
    Line 87684: 08-06 19:57:13.894  1391  1405 I am_process_start_timeout: [0,8676,10007,com.android.providers.calendar]
    Line 87728: 08-06 19:57:26.282  1391  1405 I am_process_start_timeout: [0,8720,10007,com.android.providers.calendar]
    Line 87788: 08-06 19:57:38.406  1391  1405 I am_process_start_timeout: [0,8761,10007,com.android.providers.calendar]

你可能感兴趣的:(getContentProviderImpl占满binder导致应用anr的案例分析)