Qualcomm CamX HAL process: how CheckForRecovery works

[Follow me for more posts in this series. Thanks!]

Previous post:

        In this post, we look at how CheckForRecovery works.

Contents

1. Background

2. How CheckForRecovery works

    2.1 Analyzing the code

    2.2 ChiMulticameraBase::CheckForRecovery

    2.3 Feature2Wrapper::CheckForRecovery

    2.4 Pipeline::CheckForRecovery


1. Background

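Projects built on Qualcomm's CamX architecture often run into exceptions involving CheckForRecovery, and it is frequently unclear what actually caused the crash. This post explains how CheckForRecovery itself works. The root causes vary widely, so each issue still has to be analyzed case by case to find a fix; case studies will follow in later posts.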

2. How CheckForRecovery works

    2.1 Analyzing the code

The stock SM4350 code defines CheckForRecovery in three places:

  1. Feature2Wrapper::CheckForRecovery: checked in ExecuteProcessRequest().
  2. ChiMulticameraBase::CheckForRecovery: checked in ExecuteProcessRequest().
  3. Pipeline::CheckForRecovery: checked in CSLMessageHandler().
vendor/qcom/proprietary/chi-cdk/core/chifeature2/chifeature2wrapper.cpp, line 2387: CDKResult Feature2Wrapper::CheckForRecovery(
vendor/qcom/proprietary/chi-cdk/core/chiframework/chxmulticamerabase.cpp, line 3116: CDKResult ChiMulticameraBase::CheckForRecovery(
vendor/qcom/proprietary/camx/src/core/camxpipeline.cpp, line 4976: VOID Pipeline::CheckForRecovery(

    2.2 ChiMulticameraBase::CheckForRecovery

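ExecuteProcessRequest() calls CheckForRecovery, which uses IsRequestStuck to check whether an earlier request has been stuck for longer than its threshold time. If so, CheckForRecovery reports a system event with errorCode = CDKResultExceedThreshold through ProcessSystemEventMessage and returns CDKResultEFailed, and the caller sets isRecoveryTriggered = TRUE to record that recovery has been triggered.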

CDKResult Feature2Wrapper::ExecuteProcessRequest(
    camera3_capture_request_t* pRequest)
{
    // ... (excerpt: declaration of result and earlier processing omitted)
    if ((CDKResultSuccess == result) && (TRUE == ExtensionModule::GetInstance()->EnableChiRecovery()))
    {
        result = CheckForRecovery(pRequest->frame_number);
        if (CDKResultSuccess != result)
        {
            isRecoveryTriggered = TRUE;
        }
    }
    // ... (remaining processing omitted)
}

/// ChiMulticameraBase::CheckForRecovery

CDKResult ChiMulticameraBase::CheckForRecovery(
    UINT frameNumber)
{
    CDKResult                   result    = CDKResultSuccess;

    if (TRUE == static_cast<BOOL>(ChxUtils::AtomicLoadU32(&m_isFlushinProgress)))
    {
        CHX_LOG_WARN("Recovery cancelled because of ongoing flush");
        return result;
    }

    for (auto itr = m_usecaseRequestObjectMap.begin(); itr != m_usecaseRequestObjectMap.end(); itr++)
    {
        if (NULL != *itr)
        {
            ChiFeature2UsecaseRequestObject* pUsecaseRequestObj = *itr;

            if (pUsecaseRequestObj->GetAppFrameNumber() < frameNumber)
            {
                if ((ChiFeature2UsecaseRequestObjectState::InputConfigPending == pUsecaseRequestObj->GetRequestState()) ||
                    (ChiFeature2UsecaseRequestObjectState::OutputPending == pUsecaseRequestObj->GetRequestState()))
                {
                    // check if the request is stuck more than the threshold time
                    if (TRUE == IsRequestStuck(pUsecaseRequestObj))
                    {
                        CHX_LOG_ERROR("Lets do a Reset:MCX Chi frameNumber = %d", pUsecaseRequestObj->GetAppFrameNumber());
                        // Set recovery status to TRUE
                        CHIMESSAGEDESCRIPTOR messageDescriptor     = {};

                        messageDescriptor.messageType              = ChiMessageType::ChiMessageTypeSystemEvent;

                        m_pFeatureGraphManager->FillSystemEventInfo(pUsecaseRequestObj,
                            messageDescriptor.message.systemEventMessage.graphData,
                            messageDescriptor.message.systemEventMessage.featureData,
                            messageDescriptor.message.systemEventMessage.headerData);
                        FillUsecaseInfo(messageDescriptor.message.systemEventMessage.usecaseData);

                        messageDescriptor.message.systemEventMessage.headerData.errorCode = CDKResultExceedThreshold;

                        Usecase::ProcessSystemEventMessage(&messageDescriptor);
                        result =  CDKResultEFailed;
                        break;
                    }
                }
            }
        }
    }

    return result;
}


/// Feature2Wrapper::IsRequestStuck

BOOL Feature2Wrapper::IsRequestStuck(
    ChiFeature2UsecaseRequestObject* pUsecaseRequestObj)
{
    UINT64      nowTimeMs   = CdkUtils::GetCurrTimeInMs();
    UINT64      resetTimeMs = pUsecaseRequestObj->GetResetTime();
    BOOL        isReqStuck  = FALSE;
    UINT32      timeMs      = (nowTimeMs > resetTimeMs) ? static_cast<UINT32>(nowTimeMs - resetTimeMs) : 0;

    if ((0 < timeMs) && (timeMs >= pUsecaseRequestObj->GetThresholdTime()))
    {
        isReqStuck = TRUE;
    }

    return isReqStuck;
}

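The stuck check above boils down to comparing the time elapsed since the request's reset time (GetResetTime()) with its threshold (GetThresholdTime()). Below is a minimal standalone C++ sketch of that comparison, assuming plain millisecond timestamps; `RequestTiming` and `isRequestStuck` are hypothetical names for illustration, not CHI APIs, and where the reset time gets updated is not shown in the excerpt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the timing fields of a
// ChiFeature2UsecaseRequestObject (GetResetTime() / GetThresholdTime()).
struct RequestTiming {
    uint64_t resetTimeMs;      // when the request's timer was last reset
    uint32_t thresholdTimeMs;  // how long the request may stay pending
};

// Mirrors the logic of IsRequestStuck in the excerpt:
// elapsed = now - reset, clamped to 0 if the clock is not past the reset time;
// the request counts as stuck only if elapsed > 0 and elapsed >= threshold.
bool isRequestStuck(const RequestTiming& req, uint64_t nowTimeMs) {
    uint32_t elapsedMs = (nowTimeMs > req.resetTimeMs)
                             ? static_cast<uint32_t>(nowTimeMs - req.resetTimeMs)
                             : 0;
    return (elapsedMs > 0) && (elapsedMs >= req.thresholdTimeMs);
}
```

For example, with resetTimeMs = 1000 and thresholdTimeMs = 500, a check at 1400 ms returns false (400 ms elapsed) and a check at 1500 ms returns true. Because of the `0 < timeMs` condition, even a threshold of 0 does not fire when no time has elapsed.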
    2.3 Feature2Wrapper::CheckForRecovery

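Feature2Wrapper::CheckForRecovery works the same way: ExecuteProcessRequest() calls it, it uses IsRequestStuck to check whether an earlier request has been stuck for longer than its threshold time, and if so it returns CDKResultEFailed and the caller sets isRecoveryTriggered = TRUE to record that recovery has been triggered. The differences are that the flush check reads m_isFlush directly and that the requests are iterated from a map keyed by frame number.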


/// Feature2Wrapper::CheckForRecovery

CDKResult Feature2Wrapper::CheckForRecovery(
    UINT frameNumber)
{
    CDKResult                   result    = CDKResultSuccess;

    if (TRUE == m_isFlush)
    {
        CHX_LOG_WARN("Recovery cancelled because of ongoing flush");
        return result;
    }

    for (auto Mapiterator = m_usecaseRequestObjectMap.begin(); Mapiterator != m_usecaseRequestObjectMap.end(); Mapiterator++)
    {
        ChiFeature2UsecaseRequestObject* pUsecaseRequestObj = Mapiterator->second;
        UINT                             frameNum           = Mapiterator->first;

        if (frameNum < frameNumber)
        {
            if ((NULL != pUsecaseRequestObj) &&
                ((ChiFeature2UsecaseRequestObjectState::InputConfigPending == pUsecaseRequestObj->GetRequestState()) ||
                (ChiFeature2UsecaseRequestObjectState::OutputPending == pUsecaseRequestObj->GetRequestState())))
            {
                // check if the request is stuck more than the threshold time
                if (TRUE == IsRequestStuck(pUsecaseRequestObj))
                {
                    CHX_LOG_ERROR("Lets do a Reset: Chi frameNumber = %d", pUsecaseRequestObj->GetAppFrameNumber());
                    // Set recovery status to TRUE
                    CHIMESSAGEDESCRIPTOR messageDescriptor                      = {};
                    messageDescriptor.messageType                               = ChiMessageType::ChiMessageTypeSystemEvent;

                    m_pChiFeatureGraphManager->FillSystemEventInfo(pUsecaseRequestObj,
                        messageDescriptor.message.systemEventMessage.graphData,
                        messageDescriptor.message.systemEventMessage.featureData,
                        messageDescriptor.message.systemEventMessage.headerData);
                    m_pUsecaseBase->FillUsecaseInfo(messageDescriptor.message.systemEventMessage.usecaseData);

                    messageDescriptor.message.systemEventMessage.headerData.errorCode = CDKResultExceedThreshold;

                    m_pUsecaseBase->ProcessSystemEventMessage(&messageDescriptor);
                    result =  CDKResultEFailed;
                    break;
                }
            }
        }
    }

    return result;
}
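The scan in Feature2Wrapper::CheckForRecovery can be modeled as a walk over a `std::map` keyed by frame number that stops at the first older, still-pending request whose timer has expired. The sketch below assumes simplified types: `RequestState`, `PendingRequest` and `findStuckFrame` are hypothetical, and the system-event reporting (FillSystemEventInfo / ProcessSystemEventMessage) is reduced to returning the stuck frame number.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical subset of ChiFeature2UsecaseRequestObjectState.
enum class RequestState { InputConfigPending, OutputPending, Complete };

// Hypothetical simplified request object.
struct PendingRequest {
    RequestState state;
    uint64_t     resetTimeMs;
    uint32_t     thresholdTimeMs;
};

// Returns the frame number of the first stuck request older than currentFrame,
// or std::nullopt when recovery should not be triggered.
std::optional<uint32_t> findStuckFrame(const std::map<uint32_t, PendingRequest>& requests,
                                       uint32_t currentFrame,
                                       uint64_t nowTimeMs,
                                       bool     isFlushing) {
    if (isFlushing) {
        return std::nullopt;  // "Recovery cancelled because of ongoing flush"
    }
    for (const auto& [frameNum, req] : requests) {
        if (frameNum >= currentFrame) {
            continue;  // only requests older than the incoming frame are checked
        }
        bool pending = (req.state == RequestState::InputConfigPending) ||
                       (req.state == RequestState::OutputPending);
        uint64_t elapsedMs = (nowTimeMs > req.resetTimeMs) ? (nowTimeMs - req.resetTimeMs) : 0;
        if (pending && (elapsedMs > 0) && (elapsedMs >= req.thresholdTimeMs)) {
            return frameNum;  // the original reports a system event here and breaks
        }
    }
    return std::nullopt;
}
```

Since `std::map` iterates in ascending key order, the first stuck frame found is the oldest one, which matches the `break` after the first hit in the original loop.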

    2.4 Pipeline::CheckForRecovery

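Pipeline::CheckForRecovery works somewhat differently from the two functions above:

  1. The whole process is driven by m_invalidSOFCounter, the SOF watchdog counter.
  2. If the SOF carries a valid request ID, or the last completed request ID equals the last submitted request ID (nothing is pending), the counter is reset to 0.
  3. Recovery is only considered when the pipeline has activity (the session's result holder is not empty) and no flush is in progress.
  4. If the invalid-SOF counter exceeds twice the request queue depth (scaled by a frame-rate multiplier), the pipeline is considered stuck.
  5. The log then reports that the SOF threshold of consecutive frames with an invalid requestId has been hit and that watchdog recovery is being triggered for the pipeline.
  6. sendRecovery = TRUE: recovery is sent at this point, unless a slowdown (isSlowdownPresent) was detected for some request, in which case recovery is skipped this time.

The Pipeline member variables involved are declared as follows: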
    UINT                           m_invalidSOFCounter;                 ///< Keep track of consecutive SOF w/ invalid requestId
    UINT64                         m_lastCompletedRequestId;            ///< Last completed request Id
    UINT64                         m_lastInOrderCompletedRequestId;     ///< Last monotonically increasing request id, every
                                                                        ///< request id before it is guaranteed to be done,
                                                                        ///< NOT every request id after it is guaranteed to be
                                                                        ///< pending.
    UINT64                         m_lastMetaCompletedRequestId;        ///< Last Metadata completed request Id
    UINT64                         m_lastInOrderMetaCompletedRequestId; ///< Last monotonically increasing request id, every
                                                                        ///< request id before it is guaranteed to be done with
                                                                        ///< the metadata, NOT every request id after it is
                                                                        ///< guaranteed to be pending.
    UINT64                         m_lastRequestIdScanned;              ///< Last requestId that was updated while scanning
                                                                        ///  for last valid request
    BOOL                           m_isIPERealtime;                     ///< is IPE in RealTime
    UINT64                         m_lastSubmittedRequestId;            ///< Last request id before stream off
    UINT64                         m_submittedRequestCount;             ///< submitted request number
    UINT32                         m_lastSubmittedSequenceId;           ///< Last sequence id before stream off
    UINT64                         m_lastSubmittedShutterRequestId;     ///< Last shutter notification request id
    UINT64                         m_lastSubmittedSHDRSOFTimeRequestId; ///< Last submitted SHDR sof time request id
    CHIDEACTIVATEPIPELINEMODE      m_lastModeBitMask;                   ///< modeBitMask of stream off


/// Pipeline::CheckForRecovery

VOID Pipeline::CheckForRecovery(
    const CSLFrameMessage* pMessage)
{
    // TRUE when this SOF carries a valid requestId
    BOOL        isSequenceValid     = (CamxInvalidRequestId != pMessage->requestID);
    ResultsData errorData           = {};
    UINT32      requestQueueDepth   = m_pSession->GetCurrentRequestQueueDepth();
    BOOL        sendRecovery        = FALSE; // default
    BOOL        delayRecovery       = FALSE;

    const StaticSettings* pStaticSettings = m_pChiContext->GetStaticSettings();

    // Reset the SOF watchdog counter if
    // (1) SOF is serviced with valid request or
    // (2) there are no active request in pipeline
    if ((TRUE == isSequenceValid) || (m_lastCompletedRequestId == m_lastSubmittedRequestId))
    {   // valid requestId on this SOF, or nothing pending (last completed == last submitted): reset the counter
        m_invalidSOFCounter = 0;
    }
    else
    {
        // We only want to trigger recovery if there has been activity and we are not flushing
        if ((FALSE == GetFlushStatus()) && (FALSE == m_pSession->IsResultHolderEmpty()))
        {   // not flushing, and the session's result holder is not empty (work is in flight)
            // Calculate SOF threshold based on frame rate
            UINT32 frameRateMultiplier = ((m_pPipelineDescriptor->maxFPSValue / 30) / m_numMaxBatchedFrames);
            if (0 == frameRateMultiplier)
            {
                frameRateMultiplier = 1;
            }

            // If invalidSOF counter has hit twice the request queue depth, we know that we are stuck
            if (m_invalidSOFCounter > (requestQueueDepth * 2 * frameRateMultiplier))
            {
                for (UINT i = 0; i < MaxPerRequestInfo; i++)
                {
                    if (TRUE == m_perRequestInfo[i].isSlowdownPresent)
                    {
                        delayRecovery = TRUE;
                        // Slowdown detected for this request: don't trigger recovery
                        CAMX_LOG_INFO(CamxLogGroupCore, "Detected a slowdown for request:%llu, don't trigger recovery",
                                      m_perRequestInfo[i].request.requestId);
                        m_perRequestInfo[i].isSlowdownPresent = FALSE;
                    }
                }

                if ((m_lastCompletedRequestId != m_lastSubmittedRequestId) && (FALSE == delayRecovery))
                {   // Consecutive SOFs with an invalid requestId hit the threshold: trigger pipeline watchdog recovery
                    CAMX_LOG_ERROR(CamxLogGroupCore, "Hit SOF threshold of [%d] consecutive frames with invalid"
                                   "requestId; triggering watchdog recovery for pipeline %s",
                                   m_invalidSOFCounter, GetPipelineIdentifierString());

                    // Since we need to do recovery and we don't have way of knowing which handle to choose
                    // because we have gotten continuous SOF, we are attempting with an arbitrary handle
                    PerRequestInfo* pPerRequestInfo = &m_perRequestInfo[0];
                    sendRecovery = TRUE; // send recovery
                }
                m_invalidSOFCounter = 0; // reset once the threshold is hit, whether or not recovery was sent
            }
            else
            {
                m_invalidSOFCounter++; // increment the counter
            }
        }
        else // otherwise reset the counter
        {
            m_invalidSOFCounter = 0;
        }
    }

    if (TRUE == sendRecovery)
    {
        UINT32 fenceErrorCode;

        fenceErrorCode = GetFenceErrorCode();

        // Preference to fence error than SOF WatchDog Timeout
        if (UINT32_MAX != fenceErrorCode)
        {   // this path ends up calling ProcessSystemEventMessage
            PipelineTriggerSystemEvent(fenceErrorCode, FALSE, FALSE);
        }
        else
        {   // this path ends up calling CanTriggerSOFWatchDogTimeout
            PipelineTriggerSystemEvent(CamxResultExtCoreSOFWatchDogTimeout, FALSE, FALSE);
        }
    }
}
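To see exactly when the watchdog fires, the counter logic can be replayed outside the driver. This is a simplified C++ sketch under stated assumptions: `SofWatchdog`, `RecoveryEvent` and the `onSof` parameters are hypothetical, the per-request slowdown scan (isSlowdownPresent) is reduced to one flag, and the choice between the fence error and CamxResultExtCoreSOFWatchDogTimeout is represented by an enum instead of PipelineTriggerSystemEvent.

```cpp
#include <cassert>
#include <cstdint>

// Which event the sketch would raise (stands in for PipelineTriggerSystemEvent).
enum class RecoveryEvent { None, FenceError, SofWatchdogTimeout };

// Hypothetical model of the Pipeline SOF watchdog state.
struct SofWatchdog {
    uint32_t requestQueueDepth;
    uint32_t maxFpsValue;
    uint32_t numMaxBatchedFrames;
    uint32_t invalidSofCounter = 0;

    // Same formula as the excerpt: (maxFPS / 30) / batch size, clamped to at least 1.
    uint32_t threshold() const {
        uint32_t multiplier = (maxFpsValue / 30) / numMaxBatchedFrames;
        if (multiplier == 0) {
            multiplier = 1;
        }
        return requestQueueDepth * 2 * multiplier;
    }

    // Called once per SOF. fenceErrorCode == UINT32_MAX means "no fence error".
    RecoveryEvent onSof(bool requestIdValid, bool requestsPending, bool flushing,
                        bool resultHolderEmpty, bool slowdownDetected,
                        uint32_t fenceErrorCode) {
        bool sendRecovery = false;
        if (requestIdValid || !requestsPending) {
            invalidSofCounter = 0;                 // healthy SOF, or nothing in flight
        } else if (!flushing && !resultHolderEmpty) {
            if (invalidSofCounter > threshold()) {
                sendRecovery = !slowdownDetected;  // a slowdown skips recovery this time
                invalidSofCounter = 0;             // reset whether or not recovery is sent
            } else {
                invalidSofCounter++;
            }
        } else {
            invalidSofCounter = 0;
        }
        if (!sendRecovery) {
            return RecoveryEvent::None;
        }
        // A fence error takes precedence over the generic SOF watchdog timeout.
        return (fenceErrorCode != UINT32_MAX) ? RecoveryEvent::FenceError
                                              : RecoveryEvent::SofWatchdogTimeout;
    }
};
```

With maxFPSValue = 120, numMaxBatchedFrames = 4 and a queue depth of 8, the multiplier is (120 / 30) / 4 = 1 and the threshold is 16. Because the comparison is `>` and runs before the increment, recovery fires on the 18th consecutive invalid SOF, not the 16th.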


Next post:
