[Follow me for more posts in this series. Thanks!]
Contents
1. Background
2. How CheckForRecovery works
2.1 Locating the code
2.2 ChiMulticameraBase::CheckForRecovery
2.3 Feature2Wrapper::CheckForRecovery
2.4 Pipeline::CheckForRecovery
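Projects built on Qualcomm's CamX architecture frequently hit failures that involve CheckForRecovery, and it is often hard to tell what actually caused the crash. This post explains how CheckForRecovery itself works. The root causes vary widely and have to be analyzed case by case; case studies will follow in later posts.
The stock SM4350 code contains three definitions:
- Feature2Wrapper::CheckForRecovery: checked in ExecuteProcessRequest().
- ChiMulticameraBase::CheckForRecovery: checked in ExecuteProcessRequest().
- Pipeline::CheckForRecovery: checked in CSLMessageHandler().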
vendor/qcom/proprietary/chi-cdk/core/chifeature2/chifeature2wrapper.cpp, line 2387: CDKResult Feature2Wrapper::CheckForRecovery(
vendor/qcom/proprietary/chi-cdk/core/chiframework/chxmulticamerabase.cpp, line 3116: CDKResult ChiMulticameraBase::CheckForRecovery(
vendor/qcom/proprietary/camx/src/core/camxpipeline.cpp, line 4976: VOID Pipeline::CheckForRecovery(
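ExecuteProcessRequest() calls CheckForRecovery, which uses IsRequestStuck to check whether a request has been stuck for longer than its threshold. If it has, CheckForRecovery returns CDKResultEFailed and the caller sets isRecoveryTriggered = TRUE to record that recovery has been triggered.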
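CDKResult Feature2Wrapper::ExecuteProcessRequest(
    camera3_capture_request_t* pRequest)
{
    // ... (earlier part of the function omitted in this excerpt)
    // Only check for stuck requests when the previous steps succeeded and CHI recovery is enabled
    if ((CDKResultSuccess == result) && (TRUE == ExtensionModule::GetInstance()->EnableChiRecovery()))
    {
        result = CheckForRecovery(pRequest->frame_number);
        if (CDKResultSuccess != result)
        {
            // A stuck request was found: mark recovery as triggered
            isRecoveryTriggered = TRUE;
        }
    }
    // ... (rest of the function omitted in this excerpt)
}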
/// ChiMulticameraBase::CheckForRecovery
CDKResult ChiMulticameraBase::CheckForRecovery(
    UINT frameNumber)
{
    CDKResult result = CDKResultSuccess;
    if (TRUE == static_cast<BOOL>(ChxUtils::AtomicLoadU32(&m_isFlushinProgress)))
    {
        CHX_LOG_WARN("Recovery cancelled because of ongoing flush");
        return result;
    }
    for (auto itr = m_usecaseRequestObjectMap.begin(); itr != m_usecaseRequestObjectMap.end(); itr++)
    {
        if (NULL != *itr)
        {
            ChiFeature2UsecaseRequestObject* pUsecaseRequestObj = *itr;
            if (pUsecaseRequestObj->GetAppFrameNumber() < frameNumber)
            {
                if ((ChiFeature2UsecaseRequestObjectState::InputConfigPending == pUsecaseRequestObj->GetRequestState()) ||
                    (ChiFeature2UsecaseRequestObjectState::OutputPending == pUsecaseRequestObj->GetRequestState()))
                {
                    // check if the request is stuck more than the threshold time
                    if (TRUE == IsRequestStuck(pUsecaseRequestObj))
                    {
                        CHX_LOG_ERROR("Lets do a Reset:MCX Chi frameNumber = %d", pUsecaseRequestObj->GetAppFrameNumber());
                        // Set recovery status to TRUE
                        CHIMESSAGEDESCRIPTOR messageDescriptor = {};
                        messageDescriptor.messageType = ChiMessageType::ChiMessageTypeSystemEvent;
                        m_pFeatureGraphManager->FillSystemEventInfo(pUsecaseRequestObj,
                            messageDescriptor.message.systemEventMessage.graphData,
                            messageDescriptor.message.systemEventMessage.featureData,
                            messageDescriptor.message.systemEventMessage.headerData);
                        FillUsecaseInfo(messageDescriptor.message.systemEventMessage.usecaseData);
                        messageDescriptor.message.systemEventMessage.headerData.errorCode = CDKResultExceedThreshold;
                        Usecase::ProcessSystemEventMessage(&messageDescriptor);
                        result = CDKResultEFailed;
                        break;
                    }
                }
            }
        }
    }
    return result;
}
/// Feature2Wrapper::IsRequestStuck
BOOL Feature2Wrapper::IsRequestStuck(
    ChiFeature2UsecaseRequestObject* pUsecaseRequestObj)
{
    UINT64 nowTimeMs   = CdkUtils::GetCurrTimeInMs();
    UINT64 resetTimeMs = pUsecaseRequestObj->GetResetTime();
    BOOL   isReqStuck  = FALSE;
    // Elapsed time since the request's reset time; 0 if the clock has not moved past it
    UINT32 timeMs = (nowTimeMs > resetTimeMs) ? static_cast<UINT32>(nowTimeMs - resetTimeMs) : 0;
    if ((0 < timeMs) && (timeMs >= pUsecaseRequestObj->GetThresholdTime()))
    {
        isReqStuck = TRUE;
    }
    return isReqStuck;
}
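To make the timing rule easier to reason about, here is a minimal standalone sketch of the same comparison. The struct `RequestTiming` and the explicit `nowTimeMs` parameter are assumptions introduced for illustration (the real code reads these values from ChiFeature2UsecaseRequestObject and CdkUtils::GetCurrTimeInMs()); this is not the vendor implementation.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two values the check reads from the
// usecase request object.
struct RequestTiming
{
    std::uint64_t resetTimeMs;     // time at which the request's timer was last reset
    std::uint32_t thresholdTimeMs; // how long the request may stay pending
};

// Same comparison as IsRequestStuck, with the current time passed in so the
// function is deterministic and easy to test.
bool IsRequestStuck(const RequestTiming& request, std::uint64_t nowTimeMs)
{
    // If the clock has not advanced past the reset time, treat elapsed time
    // as 0 instead of letting the unsigned subtraction wrap around.
    const std::uint32_t elapsedMs = (nowTimeMs > request.resetTimeMs)
        ? static_cast<std::uint32_t>(nowTimeMs - request.resetTimeMs)
        : 0;

    // Stuck only if some time has elapsed and it reached the threshold.
    return (0 < elapsedMs) && (elapsedMs >= request.thresholdTimeMs);
}
```

With a reset time of 1000 ms and a threshold of 500 ms, the request counts as stuck from 1500 ms onward. Note that a request with zero elapsed time is never reported as stuck, even if its threshold is 0.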
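Feature2Wrapper follows the same pattern: ExecuteProcessRequest() uses IsRequestStuck to check whether a request has been stuck for longer than its threshold. If it has, CheckForRecovery returns CDKResultEFailed and isRecoveryTriggered = TRUE is set to record that recovery has been triggered.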
/// Feature2Wrapper::CheckForRecovery
CDKResult Feature2Wrapper::CheckForRecovery(
    UINT frameNumber)
{
    CDKResult result = CDKResultSuccess;
    if (TRUE == m_isFlush)
    {
        CHX_LOG_WARN("Recovery cancelled because of ongoing flush");
        return result;
    }
    for (auto Mapiterator = m_usecaseRequestObjectMap.begin(); Mapiterator != m_usecaseRequestObjectMap.end(); Mapiterator++)
    {
        ChiFeature2UsecaseRequestObject* pUsecaseRequestObj = Mapiterator->second;
        UINT frameNum = Mapiterator->first;
        if (frameNum < frameNumber)
        {
            if ((NULL != pUsecaseRequestObj) &&
                ((ChiFeature2UsecaseRequestObjectState::InputConfigPending == pUsecaseRequestObj->GetRequestState()) ||
                 (ChiFeature2UsecaseRequestObjectState::OutputPending == pUsecaseRequestObj->GetRequestState())))
            {
                // check if the request is stuck more than the threshold time
                if (TRUE == IsRequestStuck(pUsecaseRequestObj))
                {
                    CHX_LOG_ERROR("Lets do a Reset: Chi frameNumber = %d", pUsecaseRequestObj->GetAppFrameNumber());
                    // Set recovery status to TRUE
                    CHIMESSAGEDESCRIPTOR messageDescriptor = {};
                    messageDescriptor.messageType = ChiMessageType::ChiMessageTypeSystemEvent;
                    m_pChiFeatureGraphManager->FillSystemEventInfo(pUsecaseRequestObj,
                        messageDescriptor.message.systemEventMessage.graphData,
                        messageDescriptor.message.systemEventMessage.featureData,
                        messageDescriptor.message.systemEventMessage.headerData);
                    m_pUsecaseBase->FillUsecaseInfo(messageDescriptor.message.systemEventMessage.usecaseData);
                    messageDescriptor.message.systemEventMessage.headerData.errorCode = CDKResultExceedThreshold;
                    m_pUsecaseBase->ProcessSystemEventMessage(&messageDescriptor);
                    result = CDKResultEFailed;
                    break;
                }
            }
        }
    }
    return result;
}
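Both CHI-side functions share the same shape: bail out during flush, then scan the outstanding requests for one that is older than the incoming frame, still pending, and past its time threshold. The sketch below models only that decision. `RequestState`, `PendingRequest`, `ScanResult` and `ScanForStuckRequest` are hypothetical names invented for this illustration, a `std::map` stands in for m_usecaseRequestObjectMap (whose real container type is not shown in the excerpt), and the system event message is left out.

```cpp
#include <cstdint>
#include <map>

// Hypothetical, simplified request states; only the two pending states matter here.
enum class RequestState { Initialized, InputConfigPending, OutputPending, Complete };

struct PendingRequest
{
    RequestState  state;
    std::uint64_t resetTimeMs;     // time the request's timer was last reset
    std::uint32_t thresholdTimeMs; // allowed pending time
};

struct ScanResult
{
    bool          recoveryNeeded;
    std::uint32_t stuckFrameNumber; // meaningful only when recoveryNeeded is true
};

ScanResult ScanForStuckRequest(
    const std::map<std::uint32_t, PendingRequest>& requests,
    std::uint32_t                                  incomingFrameNumber,
    std::uint64_t                                  nowTimeMs,
    bool                                           flushInProgress)
{
    ScanResult result = { false, 0 };

    // Recovery is never triggered while a flush is ongoing.
    if (flushInProgress)
    {
        return result;
    }

    for (const auto& entry : requests)
    {
        const PendingRequest& request = entry.second;

        const bool isOlder   = entry.first < incomingFrameNumber;
        const bool isPending = (RequestState::InputConfigPending == request.state) ||
                               (RequestState::OutputPending == request.state);
        const bool isStuck   = (nowTimeMs > request.resetTimeMs) &&
                               ((nowTimeMs - request.resetTimeMs) >= request.thresholdTimeMs);

        if (isOlder && isPending && isStuck)
        {
            // The first stuck request is enough; the original code breaks here too.
            result.recoveryNeeded   = true;
            result.stuckFrameNumber = entry.first;
            break;
        }
    }
    return result;
}
```

When debugging a real recovery, the frame number printed by the "Lets do a Reset" log line plays the role of `stuckFrameNumber` here: it identifies the request that exceeded its threshold, not the request that was being submitted when the check ran.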
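Pipeline::CheckForRecovery works differently from the two above:
- The whole process is driven by m_invalidSOFCounter, the SOF watchdog counter.
- If the SOF carries a valid requestId, or the last completed requestId equals the last submitted requestId, the counter is reset to 0.
- Recovery is only triggered when the pipeline has had activity and is not flushing.
- Once the invalid-SOF counter exceeds twice the request queue depth (scaled by a frame-rate multiplier), the pipeline is considered stuck.
- The error log then reports that the SOF threshold of consecutive frames with an invalid requestId was hit and that watchdog recovery is being triggered for the pipeline.
- sendRecovery = TRUE; // recovery is sent at this point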
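In the Pipeline::CheckForRecovery listing further down, the "twice the queue depth" threshold is additionally multiplied by a factor derived from the frame rate and batch size. The small sketch below reproduces just that arithmetic so concrete values can be checked. The function name `SofWatchdogThreshold` is made up for this example, and the queue depth of 8 used in the note afterwards is an illustrative assumption, not a value taken from the source.

```cpp
#include <cstdint>

// Reproduces the threshold arithmetic from Pipeline::CheckForRecovery.
// Assumes numMaxBatchedFrames >= 1 (a zero value would divide by zero).
std::uint32_t SofWatchdogThreshold(
    std::uint32_t requestQueueDepth,
    std::uint32_t maxFps,
    std::uint32_t numMaxBatchedFrames)
{
    // Integer division, as in the original expression.
    std::uint32_t frameRateMultiplier = (maxFps / 30) / numMaxBatchedFrames;

    // Frame rates below 30 fps (or heavily batched streams) clamp to 1.
    if (0 == frameRateMultiplier)
    {
        frameRateMultiplier = 1;
    }

    return requestQueueDepth * 2 * frameRateMultiplier;
}
```

For an assumed queue depth of 8: 30 fps unbatched gives 16, 60 fps unbatched gives 32, and 120 fps with a batch size of 4 gives 16 again, because batching cancels out the higher frame rate.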
UINT m_invalidSOFCounter; ///< Keep track of consecutive SOF w/ invalid requestId
UINT64 m_lastCompletedRequestId; ///< Last completed request Id
UINT64 m_lastInOrderCompletedRequestId; ///< Last monotonically increasing request id, every
///< request id before it is guaranteed to be done,
///< NOT every request id after it is guaranteed to be
///< pending.
UINT64 m_lastMetaCompletedRequestId; ///< Last Metadata completed request Id
UINT64 m_lastInOrderMetaCompletedRequestId; ///< Last monotonically increasing request id, every
///< request id before it is guaranteed to be done with
///< the metadata, NOT every request id after it is
///< guaranteed to be pending.
UINT64 m_lastRequestIdScanned; ///< Last requestId that was updated while scanning
/// for last valid request
BOOL m_isIPERealtime; ///< is IPE in RealTime
UINT64 m_lastSubmittedRequestId; ///< Last request id before stream off
UINT64 m_submittedRequestCount; ///< submitted request number
UINT32 m_lastSubmittedSequenceId; ///< Last sequence id before stream off
UINT64 m_lastSubmittedShutterRequestId; ///< Last shutter notification request id
UINT64 m_lastSubmittedSHDRSOFTimeRequestId; ///< Last submitted SHDR sof time request id
CHIDEACTIVATEPIPELINEMODE m_lastModeBitMask; ///< modeBitMask of stream off
/// Pipeline::CheckForRecovery
VOID Pipeline::CheckForRecovery(
    const CSLFrameMessage* pMessage)
{
    // TRUE when this SOF carries a valid requestId
    BOOL isSequenceValid = (CamxInvalidRequestId != pMessage->requestID);
    ResultsData errorData = {};
    UINT32 requestQueueDepth = m_pSession->GetCurrentRequestQueueDepth();
    BOOL sendRecovery = FALSE; // default: do not send recovery
    BOOL delayRecovery = FALSE;
    const StaticSettings* pStaticSettings = m_pChiContext->GetStaticSettings();
    // Reset the SOF watchdog counter if
    // (1) SOF is serviced with valid request or
    // (2) there are no active request in pipeline
    if ((TRUE == isSequenceValid) || (m_lastCompletedRequestId == m_lastSubmittedRequestId))
    {
        // Valid requestId, or last completed requestId equals last submitted requestId: clear the counter
        m_invalidSOFCounter = 0;
    }
    else
    {
        // We only want to trigger recovery if there has been activity and we are not flushing
        if ((FALSE == GetFlushStatus()) && (FALSE == m_pSession->IsResultHolderEmpty()))
        {
            // Not flushing, and the session's result holder is not empty
            // Calculate SOF threshold based on frame rate
            UINT32 frameRateMultiplier = ((m_pPipelineDescriptor->maxFPSValue / 30) / m_numMaxBatchedFrames);
            if (0 == frameRateMultiplier)
            {
                frameRateMultiplier = 1;
            }
            // If invalidSOF counter has hit twice the request queue depth, we know that we are stuck
            if (m_invalidSOFCounter > (requestQueueDepth * 2 * frameRateMultiplier))
            {
                for (UINT i = 0; i < MaxPerRequestInfo; i++)
                {
                    if (TRUE == m_perRequestInfo[i].isSlowdownPresent)
                    {
                        // A slowdown was detected for this request, so recovery is not triggered
                        delayRecovery = TRUE;
                        CAMX_LOG_INFO(CamxLogGroupCore, "Detected a slowdown for request:%llu, don't trigger recovery",
                            m_perRequestInfo[i].request.requestId);
                        m_perRequestInfo[i].isSlowdownPresent = FALSE;
                    }
                }
                if ((m_lastCompletedRequestId != m_lastSubmittedRequestId) && (FALSE == delayRecovery))
                {
                    CAMX_LOG_ERROR(CamxLogGroupCore, "Hit SOF threshold of [%d] consecutive frames with invalid"
                        "requestId; triggering watchdog recovery for pipeline %s",
                        m_invalidSOFCounter, GetPipelineIdentifierString());
                    // Since we need to do recovery and we don't have way of knowing which handle to choose
                    // because we have gotten continuous SOF, we are attempting with an arbitrary handle
                    PerRequestInfo* pPerRequestInfo = &m_perRequestInfo[0];
                    sendRecovery = TRUE; // recovery will be sent below
                }
                m_invalidSOFCounter = 0; // cleared once the threshold has been handled, whether or not recovery is sent
            }
            else
            {
                m_invalidSOFCounter++; // one more consecutive SOF without a valid requestId
            }
        }
        else // flushing, or nothing pending: clear the counter
        {
            m_invalidSOFCounter = 0;
        }
    }
    if (TRUE == sendRecovery)
    {
        UINT32 fenceErrorCode;
        fenceErrorCode = GetFenceErrorCode();
        // Preference to fence error than SOF WatchDog Timeout
        if (UINT32_MAX != fenceErrorCode)
        {
            // ProcessSystemEventMessage is called on this path
            PipelineTriggerSystemEvent(fenceErrorCode, FALSE, FALSE);
        }
        else
        {
            // CanTriggerSOFWatchDogTimeout is called on this path
            PipelineTriggerSystemEvent(CamxResultExtCoreSOFWatchDogTimeout, FALSE, FALSE);
        }
    }
}
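The counter logic above is easier to follow as a small state machine that is stepped once per SOF. The sketch below is a simplified model under stated assumptions: `SofEvent` and `SofWatchdog` are hypothetical types written for this post, each boolean stands in for one condition read in the real function, the threshold is passed in precomputed, and the fence error handling is left out.

```cpp
#include <cstdint>

// One SOF notification, reduced to the conditions CheckForRecovery looks at.
struct SofEvent
{
    bool hasValidRequestId;     // CamxInvalidRequestId != requestID
    bool allSubmittedCompleted; // m_lastCompletedRequestId == m_lastSubmittedRequestId
    bool isFlushing;            // GetFlushStatus()
    bool resultHolderEmpty;     // m_pSession->IsResultHolderEmpty()
    bool slowdownDetected;      // any isSlowdownPresent flag was set
};

class SofWatchdog
{
public:
    explicit SofWatchdog(std::uint32_t threshold)
        : m_threshold(threshold), m_invalidSofCounter(0) {}

    // Returns true when this SOF should trigger recovery.
    bool OnSof(const SofEvent& sof)
    {
        bool sendRecovery = false;

        if (sof.hasValidRequestId || sof.allSubmittedCompleted)
        {
            // Healthy SOF, or nothing outstanding: reset the watchdog.
            m_invalidSofCounter = 0;
        }
        else if (!sof.isFlushing && !sof.resultHolderEmpty)
        {
            if (m_invalidSofCounter > m_threshold)
            {
                // Threshold exceeded: recover unless a slowdown explains the delay.
                sendRecovery        = !sof.slowdownDetected;
                m_invalidSofCounter = 0;
            }
            else
            {
                m_invalidSofCounter++;
            }
        }
        else
        {
            // Flushing or no pending results: reset the watchdog.
            m_invalidSofCounter = 0;
        }
        return sendRecovery;
    }

    std::uint32_t Counter() const { return m_invalidSofCounter; }

private:
    std::uint32_t m_threshold;
    std::uint32_t m_invalidSofCounter;
};
```

One detail this model makes visible: because the comparison is a strict greater-than and the counter is incremented only on SOFs that do not fire, recovery is sent on the (threshold + 2)-th consecutive invalid SOF, not exactly at the threshold. A single valid SOF anywhere in the run resets the count.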