Message ID | 20230202081312.404394-1-alan.previn.teres.alexis@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915/pxp: limit drm-errors or warnings on firmware API failures |
On 02/02/2023 08:13, Alan Previn wrote:
> MESA driver is creating protected context on every driver handle
> initialization to query caps bit for app. So when running CI tests,
> they are observing hundreds of drm_errors when enabling PXP
> in .config but using SOC or BIOS configuration that cannot support
> PXP sessions.
>
> Update error handling codes to be more selective on which errors
> are reported as drm_error vs drm_WARN_ONCE vs drm_debug.
> Don't completely remove all FW error replies (at least keep them
> but use drm_debug) or else customers that really need to know that
> content protection failed won't be aware of it when debugging.
>
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>

How does this relate to b762787bf767 ("drm/i915/pxp: Use drm_dbg if arb
session failed due to fw version") which I thought was already fixing
the drm_error spam caused by userspace probing?

Regards,

Tvrtko

> ---
>  .../i915/pxp/intel_pxp_cmd_interface_cmn.h    |  3 ++
>  drivers/gpu/drm/i915/pxp/intel_pxp_session.c  |  2 +-
>  drivers/gpu/drm/i915/pxp/intel_pxp_tee.c      | 52 ++++++++++++++-----
>  3 files changed, 44 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_cmn.h b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_cmn.h
> index ae9b151b7cb7..6f6541d5e49a 100644
> --- a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_cmn.h
> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_cmn.h
> @@ -18,6 +18,9 @@
>  enum pxp_status {
>  	PXP_STATUS_SUCCESS = 0x0,
>  	PXP_STATUS_ERROR_API_VERSION = 0x1002,
> +	PXP_STATUS_NOT_READY = 0x100e,
> +	PXP_STATUS_PLATFCONFIG_KF1_NOVERIF = 0x101a,
> +	PXP_STATUS_PLATFCONFIG_KF1_BAD = 0x101f,
>  	PXP_STATUS_OP_NOT_PERMITTED = 0x4013
>  };
>
> diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> index 448cacb0465d..7de849cb6c47 100644
> --- a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
> @@ -74,7 +74,7 @@ static int pxp_create_arb_session(struct intel_pxp *pxp)
>
>  	ret = pxp_wait_for_session_state(pxp, ARB_SESSION, true);
>  	if (ret) {
> -		drm_err(&gt->i915->drm, "arb session failed to go in play\n");
> +		drm_dbg(&gt->i915->drm, "arb session failed to go in play\n");
>  		return ret;
>  	}
>  	drm_dbg(&gt->i915->drm, "PXP ARB session is alive\n");
> diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c b/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
> index d9d248b48093..1c2e4a75a968 100644
> --- a/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
> +++ b/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
> @@ -19,6 +19,23 @@
>  #include "intel_pxp_tee.h"
>  #include "intel_pxp_types.h"
>
> +static const char *
> +pxp_fw_err_to_string(u32 type)
> +{
> +	switch (type) {
> +	case PXP_STATUS_ERROR_API_VERSION:
> +		return "ERR_API_VERSION";
> +	case PXP_STATUS_NOT_READY:
> +		return "ERR_NOT_READY";
> +	case PXP_STATUS_PLATFCONFIG_KF1_NOVERIF:
> +	case PXP_STATUS_PLATFCONFIG_KF1_BAD:
> +		return "ERR_PLATFORM_CONFIG";
> +	default:
> +		break;
> +	}
> +	return NULL;
> +}
> +
>  static int intel_pxp_tee_io_message(struct intel_pxp *pxp,
>  				    void *msg_in, u32 msg_in_size,
>  				    void *msg_out, u32 msg_out_max_size,
> @@ -307,15 +324,19 @@ int intel_pxp_tee_cmd_create_arb_session(struct intel_pxp *pxp,
>  				       &msg_out, sizeof(msg_out),
>  				       NULL);
>
> -	if (ret)
> -		drm_err(&i915->drm, "Failed to send tee msg ret=[%d]\n", ret);
> -	else if (msg_out.header.status == PXP_STATUS_ERROR_API_VERSION)
> -		drm_dbg(&i915->drm, "PXP firmware version unsupported, requested: "
> -			"CMD-ID-[0x%08x] on API-Ver-[0x%08x]\n",
> +	if (ret) {
> +		drm_err(&i915->drm, "Failed to send tee msg init arb session, ret=[%d]\n", ret);
> +	} else if (msg_out.header.status != 0) {
> +		if (msg_out.header.status == PXP_STATUS_PLATFCONFIG_KF1_NOVERIF ||
> +		    msg_out.header.status == PXP_STATUS_PLATFCONFIG_KF1_BAD)
> +			drm_WARN_ONCE(&i915->drm, true,
> +				      "Platform BIOS or Fusing won't allow PXP arb creation\n");
> +
> +		drm_dbg(&i915->drm, "PXP init arb session failed 0x%08x:%s:"
> +			"CMD-ID-[0x%08x]:API-Ver-[0x%08x]\n",
> +			msg_out.header.status, pxp_fw_err_to_string(msg_out.header.status),
>  			msg_in.header.command_id, msg_in.header.api_version);
> -	else if (msg_out.header.status != 0x0)
> -		drm_warn(&i915->drm, "PXP firmware failed arb session init request ret=[0x%08x]\n",
> -			 msg_out.header.status);
> +	}
>
>  	return ret;
>  }
> @@ -347,10 +368,17 @@ void intel_pxp_tee_end_arb_fw_session(struct intel_pxp *pxp, u32 session_id)
>  	if ((ret || msg_out.header.status != 0x0) && ++trials < 3)
>  		goto try_again;
>
> -	if (ret)
> +	if (ret) {
>  		drm_err(&i915->drm, "Failed to send tee msg for inv-stream-key-%d, ret=[%d]\n",
>  			session_id, ret);
> -	else if (msg_out.header.status != 0x0)
> -		drm_warn(&i915->drm, "PXP firmware failed inv-stream-key-%d with status 0x%08x\n",
> -			 session_id, msg_out.header.status);
> +	} else if (msg_out.header.status != 0) {
> +		if (msg_out.header.status == PXP_STATUS_PLATFCONFIG_KF1_NOVERIF ||
> +		    msg_out.header.status == PXP_STATUS_PLATFCONFIG_KF1_BAD)
> +			drm_WARN_ONCE(&i915->drm, true,
> +				      "Platform BIOS or Fusing won't allow PXP arb creation\n");
> +		drm_dbg(&i915->drm, "PXP inv-stream-key-%d failed 0x%08x:%st:\n"
> +			"CMD-ID-[0x%08x]:API-Ver-[0x%08x]\n", (int)session_id,
> +			msg_out.header.status, pxp_fw_err_to_string(msg_out.header.status),
> +			msg_in.header.command_id, msg_in.header.api_version);
> +	}
>  }
On Thu, 2023-02-02 at 08:43 +0000, Tvrtko Ursulin wrote:
>
> On 02/02/2023 08:13, Alan Previn wrote:
> > MESA driver is creating protected context on every driver handle
> > initialization to query caps bit for app. So when running CI tests,
> > they are observing hundreds of drm_errors when enabling PXP
> > in .config but using SOC or BIOS configuration that cannot support
> > PXP sessions.
> >
> > Update error handling codes to be more selective on which errors
> > are reported as drm_error vs drm_WARN_ONCE vs drm_debug.
> > Don't completely remove all FW error replies (at least keep them
> > but use drm_debug) or else customers that really need to know that
> > content protection failed won't be aware of it when debugging.
> >
> > Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
>
> How does this relate to b762787bf767 ("drm/i915/pxp: Use drm_dbg if arb
> session failed due to fw version") which I thought was already fixing
> the drm_error spam caused by userspace probing?
>

Good question. That previous error was specific to a board that was using
an outdated firmware version that really needed to be upgraded.
At that point I wasn't aware of the fact that MESA was seeing
a high frequency of this failure that is tied to platform issues
(BIOS configuration / SOC fusing). Also, I believe in the prior case
PXP was not enabled by default in the .config in all testing.

In this latest reported bug (I realized I forgot to include the bug no. for this
new patch - https://gitlab.freedesktop.org/drm/intel/-/issues/7706#note_1746952),
I was informed that PXP is being enabled by default and there
was DUT hardware that was not PXP-capable (SOC fusing / BIOS config).

So with this patch, I am trying to balance between issues that are critical
but are root-caused from HW/platform gaps (louder drm-warn - but just ONCE)
vs other cases where it could also come from the hw/sw state machine (which cannot
be a WARN_ONCE message since it can occur due to runtime operation events).

One thing to note: I am pushing-for / waiting-on our firmware team to get
blessing on more fw-error-code to error-string translations that can be allowed
upstream, which is why I added the "pxp_fw_err_to_string" and a single
"drm_dbg" so that in future we don't have to keep adding whole new lines of
code to multiple functions, and instead just add the new err-code-to-string
entry in a single location.

note: I will re-rev with the bug id.
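
The single-location translation idea described in this reply can be sketched as a lookup table. This is a minimal userspace illustration under stated assumptions, not the driver code: the status values are copied from the enum in the patch, while `fw_err_names`, `fw_err_name_lookup`, `fw_err_is_platform_config` and `fw_err_selftest` are hypothetical names. Returning "UNKNOWN" rather than NULL for unlisted codes is also an assumption made here so the result is always safe to print.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Status values copied from the enum pxp_status hunk in the patch. */
enum pxp_status {
	PXP_STATUS_SUCCESS = 0x0,
	PXP_STATUS_ERROR_API_VERSION = 0x1002,
	PXP_STATUS_NOT_READY = 0x100e,
	PXP_STATUS_PLATFCONFIG_KF1_NOVERIF = 0x101a,
	PXP_STATUS_PLATFCONFIG_KF1_BAD = 0x101f,
	PXP_STATUS_OP_NOT_PERMITTED = 0x4013
};

/* Hypothetical table row: one firmware status and its printable name. */
struct fw_err_name {
	uint32_t status;
	const char *name;
};

/* Adding a newly approved translation means adding one row here. */
static const struct fw_err_name fw_err_names[] = {
	{ PXP_STATUS_ERROR_API_VERSION,       "ERR_API_VERSION" },
	{ PXP_STATUS_NOT_READY,               "ERR_NOT_READY" },
	{ PXP_STATUS_PLATFCONFIG_KF1_NOVERIF, "ERR_PLATFORM_CONFIG" },
	{ PXP_STATUS_PLATFCONFIG_KF1_BAD,     "ERR_PLATFORM_CONFIG" },
};

/* Linear search; falls back to "UNKNOWN" (an assumption of this sketch). */
static const char *fw_err_name_lookup(uint32_t status)
{
	size_t i;

	for (i = 0; i < sizeof(fw_err_names) / sizeof(fw_err_names[0]); i++)
		if (fw_err_names[i].status == status)
			return fw_err_names[i].name;
	return "UNKNOWN";
}

/* True for the two statuses the patch attributes to BIOS or fusing. */
static int fw_err_is_platform_config(uint32_t status)
{
	return status == PXP_STATUS_PLATFCONFIG_KF1_NOVERIF ||
	       status == PXP_STATUS_PLATFCONFIG_KF1_BAD;
}

/* Usage example: every call site shares the same table. */
static void fw_err_selftest(void)
{
	assert(strcmp(fw_err_name_lookup(PXP_STATUS_NOT_READY), "ERR_NOT_READY") == 0);
	assert(strcmp(fw_err_name_lookup(PXP_STATUS_OP_NOT_PERMITTED), "UNKNOWN") == 0);
	assert(fw_err_is_platform_config(PXP_STATUS_PLATFCONFIG_KF1_BAD));
}
```

The design point is only that callers never grow a new branch per error code; whether a table or a switch is used is a matter of taste.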
On 02/02/2023 17:11, Teres Alexis, Alan Previn wrote:
> On Thu, 2023-02-02 at 08:43 +0000, Tvrtko Ursulin wrote:
>>
>> On 02/02/2023 08:13, Alan Previn wrote:
>>> MESA driver is creating protected context on every driver handle
>>> initialization to query caps bit for app. So when running CI tests,
>>> they are observing hundreds of drm_errors when enabling PXP
>>> in .config but using SOC or BIOS configuration that cannot support
>>> PXP sessions.
>>>
>>> Update error handling codes to be more selective on which errors
>>> are reported as drm_error vs drm_WARN_ONCE vs drm_debug.
>>> Don't completely remove all FW error replies (at least keep them
>>> but use drm_debug) or else customers that really need to know that
>>> content protection failed won't be aware of it when debugging.
>>>
>>> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
>>
>> How does this relate to b762787bf767 ("drm/i915/pxp: Use drm_dbg if arb
>> session failed due to fw version") which I thought was already fixing
>> the drm_error spam caused by userspace probing?
>>
> Good question. That previous error was specific to a board that was using
> an outdated firmware version that really needed to be upgraded.
> At that point I wasn't aware of the fact that MESA was seeing
> a high frequency of this failure that is tied to platform issues
> (BIOS configuration / SOC fusing). Also, I believe in the prior case
> PXP was not enabled by default in the .config in all testing.
>
> In this latest reported bug (I realized I forgot to include the bug no. for this
> new patch - https://gitlab.freedesktop.org/drm/intel/-/issues/7706#note_1746952),
> I was informed that PXP is being enabled by default and there
> was DUT hardware that was not PXP-capable (SOC fusing / BIOS config).
>
> So with this patch, I am trying to balance between issues that are critical
> but are root-caused from HW/platform gaps (louder drm-warn - but just ONCE)
> vs other cases where it could also come from the hw/sw state machine (which cannot
> be a WARN_ONCE message since it can occur due to runtime operation events).
>
> One thing to note: I am pushing-for / waiting-on our firmware team to get
> blessing on more fw-error-code to error-string translations that can be allowed
> upstream, which is why I added the "pxp_fw_err_to_string" and a single
> "drm_dbg" so that in future we don't have to keep adding whole new lines of
> code to multiple functions, and instead just add the new err-code-to-string
> entry in a single location.
>
> note: I will re-rev with the bug id.

Thanks for the details.

Yes, definitely avoid any drm_warn/err/WARN on invalid conditions/usage
that can be triggered from userspace.

And given the bug report is about TGL, probably try to add a Fixes: tag
with an appropriate target too, so that there are fewer bug re-reports
from the released kernels.

Regards,

Tvrtko
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_cmn.h b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_cmn.h
index ae9b151b7cb7..6f6541d5e49a 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_cmn.h
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_cmn.h
@@ -18,6 +18,9 @@
 enum pxp_status {
 	PXP_STATUS_SUCCESS = 0x0,
 	PXP_STATUS_ERROR_API_VERSION = 0x1002,
+	PXP_STATUS_NOT_READY = 0x100e,
+	PXP_STATUS_PLATFCONFIG_KF1_NOVERIF = 0x101a,
+	PXP_STATUS_PLATFCONFIG_KF1_BAD = 0x101f,
 	PXP_STATUS_OP_NOT_PERMITTED = 0x4013
 };
 
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
index 448cacb0465d..7de849cb6c47 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_session.c
@@ -74,7 +74,7 @@ static int pxp_create_arb_session(struct intel_pxp *pxp)
 
 	ret = pxp_wait_for_session_state(pxp, ARB_SESSION, true);
 	if (ret) {
-		drm_err(&gt->i915->drm, "arb session failed to go in play\n");
+		drm_dbg(&gt->i915->drm, "arb session failed to go in play\n");
 		return ret;
 	}
 	drm_dbg(&gt->i915->drm, "PXP ARB session is alive\n");
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c b/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
index d9d248b48093..1c2e4a75a968 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_tee.c
@@ -19,6 +19,23 @@
 #include "intel_pxp_tee.h"
 #include "intel_pxp_types.h"
 
+static const char *
+pxp_fw_err_to_string(u32 type)
+{
+	switch (type) {
+	case PXP_STATUS_ERROR_API_VERSION:
+		return "ERR_API_VERSION";
+	case PXP_STATUS_NOT_READY:
+		return "ERR_NOT_READY";
+	case PXP_STATUS_PLATFCONFIG_KF1_NOVERIF:
+	case PXP_STATUS_PLATFCONFIG_KF1_BAD:
+		return "ERR_PLATFORM_CONFIG";
+	default:
+		break;
+	}
+	return NULL;
+}
+
 static int intel_pxp_tee_io_message(struct intel_pxp *pxp,
 				    void *msg_in, u32 msg_in_size,
 				    void *msg_out, u32 msg_out_max_size,
@@ -307,15 +324,19 @@ int intel_pxp_tee_cmd_create_arb_session(struct intel_pxp *pxp,
 				       &msg_out, sizeof(msg_out),
 				       NULL);
 
-	if (ret)
-		drm_err(&i915->drm, "Failed to send tee msg ret=[%d]\n", ret);
-	else if (msg_out.header.status == PXP_STATUS_ERROR_API_VERSION)
-		drm_dbg(&i915->drm, "PXP firmware version unsupported, requested: "
-			"CMD-ID-[0x%08x] on API-Ver-[0x%08x]\n",
+	if (ret) {
+		drm_err(&i915->drm, "Failed to send tee msg init arb session, ret=[%d]\n", ret);
+	} else if (msg_out.header.status != 0) {
+		if (msg_out.header.status == PXP_STATUS_PLATFCONFIG_KF1_NOVERIF ||
+		    msg_out.header.status == PXP_STATUS_PLATFCONFIG_KF1_BAD)
+			drm_WARN_ONCE(&i915->drm, true,
+				      "Platform BIOS or Fusing won't allow PXP arb creation\n");
+
+		drm_dbg(&i915->drm, "PXP init arb session failed 0x%08x:%s:"
+			"CMD-ID-[0x%08x]:API-Ver-[0x%08x]\n",
+			msg_out.header.status, pxp_fw_err_to_string(msg_out.header.status),
 			msg_in.header.command_id, msg_in.header.api_version);
-	else if (msg_out.header.status != 0x0)
-		drm_warn(&i915->drm, "PXP firmware failed arb session init request ret=[0x%08x]\n",
-			 msg_out.header.status);
+	}
 
 	return ret;
 }
@@ -347,10 +368,17 @@ void intel_pxp_tee_end_arb_fw_session(struct intel_pxp *pxp, u32 session_id)
 	if ((ret || msg_out.header.status != 0x0) && ++trials < 3)
 		goto try_again;
 
-	if (ret)
+	if (ret) {
 		drm_err(&i915->drm, "Failed to send tee msg for inv-stream-key-%d, ret=[%d]\n",
 			session_id, ret);
-	else if (msg_out.header.status != 0x0)
-		drm_warn(&i915->drm, "PXP firmware failed inv-stream-key-%d with status 0x%08x\n",
-			 session_id, msg_out.header.status);
+	} else if (msg_out.header.status != 0) {
+		if (msg_out.header.status == PXP_STATUS_PLATFCONFIG_KF1_NOVERIF ||
+		    msg_out.header.status == PXP_STATUS_PLATFCONFIG_KF1_BAD)
+			drm_WARN_ONCE(&i915->drm, true,
+				      "Platform BIOS or Fusing won't allow PXP arb creation\n");
+		drm_dbg(&i915->drm, "PXP inv-stream-key-%d failed 0x%08x:%st:\n"
+			"CMD-ID-[0x%08x]:API-Ver-[0x%08x]\n", (int)session_id,
+			msg_out.header.status, pxp_fw_err_to_string(msg_out.header.status),
+			msg_in.header.command_id, msg_in.header.api_version);
+	}
 }
MESA driver is creating protected context on every driver handle
initialization to query caps bit for app. So when running CI tests,
they are observing hundreds of drm_errors when enabling PXP
in .config but using SOC or BIOS configuration that cannot support
PXP sessions.

Update error handling codes to be more selective on which errors
are reported as drm_error vs drm_WARN_ONCE vs drm_debug.
Don't completely remove all FW error replies (at least keep them
but use drm_debug) or else customers that really need to know that
content protection failed won't be aware of it when debugging.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 .../i915/pxp/intel_pxp_cmd_interface_cmn.h    |  3 ++
 drivers/gpu/drm/i915/pxp/intel_pxp_session.c  |  2 +-
 drivers/gpu/drm/i915/pxp/intel_pxp_tee.c      | 52 ++++++++++++++-----
 3 files changed, 44 insertions(+), 13 deletions(-)