[00/15] HuC loading for DG2

Message ID	20220609231955.3632596-1-daniele.ceraolospurio@intel.com (mailing list archive)
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> To: intel-gfx@lists.freedesktop.org Date: Thu, 9 Jun 2022 16:19:40 -0700 Message-Id: <20220609231955.3632596-1-daniele.ceraolospurio@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Intel-gfx] [PATCH 00/15] HuC loading for DG2 Precedence: list Cc: Alexander Usyskin <alexander.usyskin@intel.com>, alan.previn.teres.alexis@intel.com Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
Series	HuC loading for DG2 \| expand [00/15] HuC loading for DG2 [01/15] HAX: mei: GSC support for XeHP SDV and DG2 platform [02/15] mei: add support to GSC extended header [03/15] mei: bus: enable sending gsc commands [04/15] mei: bus: extend bus API to support command streamer API [05/15] mei: pxp: add command streamer API to the PXP driver [06/15] mei: pxp: support matching with a gfx discrete card [07/15] drm/i915/pxp: load the pxp module when we have a gsc-loaded huc [08/15] drm/i915/pxp: implement function for sending tee stream command [09/15] drm/i915/pxp: add huc authentication and loading command [10/15] drm/i915/dg2: setup HuC loading via GSC [11/15] drm/i915/huc: track delayed HuC load with a fence [12/15] drm/i915/huc: stall media submission until HuC is loaded [13/15] drm/i915/huc: report HuC as loaded even if load still in progress [14/15] drm/i915/huc: define gsc-compatible HuC fw for DG2 [15/15] HAX: drm/i915: force INTEL_MEI_GSC and INTEL_MEI_PXP on for CI

Daniele Ceraolo Spurio June 9, 2022, 11:19 p.m. UTC

On DG2, HuC loading is performed by the GSC, via a PXP command. The load
operation itself is relatively simple (just send a message to the GSC
with the physical address of the HuC in LMEM), but there are timing
changes that requires special attention. In particular, to send a PXP
command we need to first export the GSC driver and then wait for the
mei-gsc and mei-pxp modules to start, which means that HuC load will
complete after i915 load is complete. This means that there is a small
window of time after i915 is registered and before HuC is loaded
during which userspace could submit and/or checking the HuC load status,
although this is quite unlikely to happen (HuC is usually loaded before
kernel init/resume completes).
We've consulted with the media team in regards to how to handle this and
they've asked us to do the following:

1) Report HuC as loaded in the getparam IOCTL even if load is still in
progress. The media driver uses the IOCTL as a way to check if HuC is
enabled and then includes a secondary check in the batches to get the
actual status, so doing it this way allows userspace to keep working
without changes.

2) Stall all userspace VCS submission until HuC is loaded. Stalls are
expected to be very rare (if any), due to the fact that HuC is usually
loaded before kernel init/resume is completed.

Timeouts are in place to ensure all submissions are unlocked in case
something goes wrong. Since we need to monitor the status of the mei
driver to know what's happening and when to time out, a notifier has
been added so we get a callback when the status of the mei driver
changes.

Note that this series depends on the GSC support for DG2 [1], which has
been included squashed in a single patch.

[1]: https://patchwork.freedesktop.org/series/102339/

Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Tony Ye <tony.ye@intel.com>
Cc: Alexander Usyskin <alexander.usyskin@intel.com>

Daniele Ceraolo Spurio (8):
  HAX: mei: GSC support for XeHP SDV and DG2 platform
  drm/i915/pxp: load the pxp module when we have a gsc-loaded huc
  drm/i915/dg2: setup HuC loading via GSC
  drm/i915/huc: track delayed HuC load with a fence
  drm/i915/huc: stall media submission until HuC is loaded
  drm/i915/huc: report HuC as loaded even if load still in progress
  drm/i915/huc: define gsc-compatible HuC fw for DG2
  HAX: drm/i915: force INTEL_MEI_GSC and INTEL_MEI_PXP on for CI

Tomas Winkler (4):
  mei: add support to GSC extended header
  mei: bus: enable sending gsc commands
  mei: pxp: support matching with a gfx discrete card
  drm/i915/pxp: add huc authentication and loading command

Vitaly Lubart (3):
  mei: bus: extend bus API to support command streamer API
  mei: pxp: add command streamer API to the PXP driver
  drm/i915/pxp: implement function for sending tee stream command

 drivers/gpu/drm/i915/Kconfig.debug            |   2 +
 drivers/gpu/drm/i915/Makefile                 |  11 +-
 drivers/gpu/drm/i915/gt/intel_gsc.c           | 141 +++++++++-
 drivers/gpu/drm/i915/gt/intel_gsc.h           |   3 +
 drivers/gpu/drm/i915/gt/uc/intel_huc.c        | 244 ++++++++++++++++--
 drivers/gpu/drm/i915/gt/uc/intel_huc.h        |  27 ++
 drivers/gpu/drm/i915/gt/uc/intel_huc_fw.c     |  34 +++
 drivers/gpu/drm/i915/gt/uc/intel_huc_fw.h     |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |  64 +++--
 drivers/gpu/drm/i915/i915_request.c           |  24 ++
 drivers/gpu/drm/i915/pxp/intel_pxp.c          |  32 ++-
 drivers/gpu/drm/i915/pxp/intel_pxp.h          |  32 ---
 drivers/gpu/drm/i915/pxp/intel_pxp_huc.c      |  69 +++++
 drivers/gpu/drm/i915/pxp/intel_pxp_huc.h      |  15 ++
 drivers/gpu/drm/i915/pxp/intel_pxp_irq.h      |   8 +
 drivers/gpu/drm/i915/pxp/intel_pxp_session.c  |   8 +-
 drivers/gpu/drm/i915/pxp/intel_pxp_session.h  |  11 +-
 drivers/gpu/drm/i915/pxp/intel_pxp_tee.c      | 138 +++++++++-
 drivers/gpu/drm/i915/pxp/intel_pxp_tee.h      |   5 +
 .../drm/i915/pxp/intel_pxp_tee_interface.h    |  21 ++
 drivers/gpu/drm/i915/pxp/intel_pxp_types.h    |   6 +
 drivers/misc/mei/bus-fixup.c                  | 105 +++++---
 drivers/misc/mei/bus.c                        | 145 ++++++++++-
 drivers/misc/mei/client.c                     |  69 +++--
 drivers/misc/mei/debugfs.c                    |  17 ++
 drivers/misc/mei/gsc-me.c                     |  77 +++++-
 drivers/misc/mei/hbm.c                        |  25 +-
 drivers/misc/mei/hw-me-regs.h                 |   7 +
 drivers/misc/mei/hw-me.c                      | 123 +++++++--
 drivers/misc/mei/hw-me.h                      |  14 +-
 drivers/misc/mei/hw-txe.c                     |   2 +-
 drivers/misc/mei/hw.h                         |  62 +++++
 drivers/misc/mei/init.c                       |  21 +-
 drivers/misc/mei/interrupt.c                  |  47 +++-
 drivers/misc/mei/main.c                       |   2 +-
 drivers/misc/mei/mei_dev.h                    |  33 +++
 drivers/misc/mei/mkhi.h                       |  57 ++++
 drivers/misc/mei/pci-me.c                     |   2 +-
 drivers/misc/mei/pxp/mei_pxp.c                |  40 ++-
 include/drm/i915_pxp_tee_interface.h          |   5 +
 include/linux/mei_aux.h                       |   2 +
 include/linux/mei_cl_bus.h                    |   6 +
 42 files changed, 1537 insertions(+), 220 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_huc.c
 create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_huc.h
 create mode 100644 drivers/misc/mei/mkhi.h

Tvrtko Ursulin June 13, 2022, 8:16 a.m. UTC | #1

On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
> On DG2, HuC loading is performed by the GSC, via a PXP command. The load
> operation itself is relatively simple (just send a message to the GSC
> with the physical address of the HuC in LMEM), but there are timing
> changes that requires special attention. In particular, to send a PXP
> command we need to first export the GSC driver and then wait for the
> mei-gsc and mei-pxp modules to start, which means that HuC load will
> complete after i915 load is complete. This means that there is a small
> window of time after i915 is registered and before HuC is loaded
> during which userspace could submit and/or checking the HuC load status,
> although this is quite unlikely to happen (HuC is usually loaded before
> kernel init/resume completes).
> We've consulted with the media team in regards to how to handle this and
> they've asked us to do the following:
> 
> 1) Report HuC as loaded in the getparam IOCTL even if load is still in
> progress. The media driver uses the IOCTL as a way to check if HuC is
> enabled and then includes a secondary check in the batches to get the
> actual status, so doing it this way allows userspace to keep working
> without changes.
> 
> 2) Stall all userspace VCS submission until HuC is loaded. Stalls are
> expected to be very rare (if any), due to the fact that HuC is usually
> loaded before kernel init/resume is completed.

Motivation to add these complications into i915 are not clear to me 
here. I mean there is no HuC on DG2 _yet_ is the premise of the series, 
right? So no backwards compatibility concerns. In this case why jump 
through the hoops and not let userspace handle all of this by just 
leaving the getparam return the true status?

Regards,

Tvrtko

> Timeouts are in place to ensure all submissions are unlocked in case
> something goes wrong. Since we need to monitor the status of the mei
> driver to know what's happening and when to time out, a notifier has
> been added so we get a callback when the status of the mei driver
> changes.
> 
> Note that this series depends on the GSC support for DG2 [1], which has
> been included squashed in a single patch.
> 
> [1]: https://patchwork.freedesktop.org/series/102339/
> 
> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
> Cc: Tony Ye <tony.ye@intel.com>
> Cc: Alexander Usyskin <alexander.usyskin@intel.com>
> 
> Daniele Ceraolo Spurio (8):
>    HAX: mei: GSC support for XeHP SDV and DG2 platform
>    drm/i915/pxp: load the pxp module when we have a gsc-loaded huc
>    drm/i915/dg2: setup HuC loading via GSC
>    drm/i915/huc: track delayed HuC load with a fence
>    drm/i915/huc: stall media submission until HuC is loaded
>    drm/i915/huc: report HuC as loaded even if load still in progress
>    drm/i915/huc: define gsc-compatible HuC fw for DG2
>    HAX: drm/i915: force INTEL_MEI_GSC and INTEL_MEI_PXP on for CI
> 
> Tomas Winkler (4):
>    mei: add support to GSC extended header
>    mei: bus: enable sending gsc commands
>    mei: pxp: support matching with a gfx discrete card
>    drm/i915/pxp: add huc authentication and loading command
> 
> Vitaly Lubart (3):
>    mei: bus: extend bus API to support command streamer API
>    mei: pxp: add command streamer API to the PXP driver
>    drm/i915/pxp: implement function for sending tee stream command
> 
>   drivers/gpu/drm/i915/Kconfig.debug            |   2 +
>   drivers/gpu/drm/i915/Makefile                 |  11 +-
>   drivers/gpu/drm/i915/gt/intel_gsc.c           | 141 +++++++++-
>   drivers/gpu/drm/i915/gt/intel_gsc.h           |   3 +
>   drivers/gpu/drm/i915/gt/uc/intel_huc.c        | 244 ++++++++++++++++--
>   drivers/gpu/drm/i915/gt/uc/intel_huc.h        |  27 ++
>   drivers/gpu/drm/i915/gt/uc/intel_huc_fw.c     |  34 +++
>   drivers/gpu/drm/i915/gt/uc/intel_huc_fw.h     |   1 +
>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |  64 +++--
>   drivers/gpu/drm/i915/i915_request.c           |  24 ++
>   drivers/gpu/drm/i915/pxp/intel_pxp.c          |  32 ++-
>   drivers/gpu/drm/i915/pxp/intel_pxp.h          |  32 ---
>   drivers/gpu/drm/i915/pxp/intel_pxp_huc.c      |  69 +++++
>   drivers/gpu/drm/i915/pxp/intel_pxp_huc.h      |  15 ++
>   drivers/gpu/drm/i915/pxp/intel_pxp_irq.h      |   8 +
>   drivers/gpu/drm/i915/pxp/intel_pxp_session.c  |   8 +-
>   drivers/gpu/drm/i915/pxp/intel_pxp_session.h  |  11 +-
>   drivers/gpu/drm/i915/pxp/intel_pxp_tee.c      | 138 +++++++++-
>   drivers/gpu/drm/i915/pxp/intel_pxp_tee.h      |   5 +
>   .../drm/i915/pxp/intel_pxp_tee_interface.h    |  21 ++
>   drivers/gpu/drm/i915/pxp/intel_pxp_types.h    |   6 +
>   drivers/misc/mei/bus-fixup.c                  | 105 +++++---
>   drivers/misc/mei/bus.c                        | 145 ++++++++++-
>   drivers/misc/mei/client.c                     |  69 +++--
>   drivers/misc/mei/debugfs.c                    |  17 ++
>   drivers/misc/mei/gsc-me.c                     |  77 +++++-
>   drivers/misc/mei/hbm.c                        |  25 +-
>   drivers/misc/mei/hw-me-regs.h                 |   7 +
>   drivers/misc/mei/hw-me.c                      | 123 +++++++--
>   drivers/misc/mei/hw-me.h                      |  14 +-
>   drivers/misc/mei/hw-txe.c                     |   2 +-
>   drivers/misc/mei/hw.h                         |  62 +++++
>   drivers/misc/mei/init.c                       |  21 +-
>   drivers/misc/mei/interrupt.c                  |  47 +++-
>   drivers/misc/mei/main.c                       |   2 +-
>   drivers/misc/mei/mei_dev.h                    |  33 +++
>   drivers/misc/mei/mkhi.h                       |  57 ++++
>   drivers/misc/mei/pci-me.c                     |   2 +-
>   drivers/misc/mei/pxp/mei_pxp.c                |  40 ++-
>   include/drm/i915_pxp_tee_interface.h          |   5 +
>   include/linux/mei_aux.h                       |   2 +
>   include/linux/mei_cl_bus.h                    |   6 +
>   42 files changed, 1537 insertions(+), 220 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_huc.c
>   create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_huc.h
>   create mode 100644 drivers/misc/mei/mkhi.h
>

Daniele Ceraolo Spurio June 13, 2022, 3:39 p.m. UTC | #2

On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>
> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>> On DG2, HuC loading is performed by the GSC, via a PXP command. The load
>> operation itself is relatively simple (just send a message to the GSC
>> with the physical address of the HuC in LMEM), but there are timing
>> changes that requires special attention. In particular, to send a PXP
>> command we need to first export the GSC driver and then wait for the
>> mei-gsc and mei-pxp modules to start, which means that HuC load will
>> complete after i915 load is complete. This means that there is a small
>> window of time after i915 is registered and before HuC is loaded
>> during which userspace could submit and/or checking the HuC load status,
>> although this is quite unlikely to happen (HuC is usually loaded before
>> kernel init/resume completes).
>> We've consulted with the media team in regards to how to handle this and
>> they've asked us to do the following:
>>
>> 1) Report HuC as loaded in the getparam IOCTL even if load is still in
>> progress. The media driver uses the IOCTL as a way to check if HuC is
>> enabled and then includes a secondary check in the batches to get the
>> actual status, so doing it this way allows userspace to keep working
>> without changes.
>>
>> 2) Stall all userspace VCS submission until HuC is loaded. Stalls are
>> expected to be very rare (if any), due to the fact that HuC is usually
>> loaded before kernel init/resume is completed.
>
> Motivation to add these complications into i915 are not clear to me 
> here. I mean there is no HuC on DG2 _yet_ is the premise of the 
> series, right? So no backwards compatibility concerns. In this case 
> why jump through the hoops and not let userspace handle all of this by 
> just leaving the getparam return the true status?

The main areas impacted by the fact that we can't guarantee that HuC 
load is complete when i915 starts accepting submissions are boot and 
suspend/resume, with the latter being the main problem; GT reset is not 
a concern because HuC now survives it. A suspend/resume can be 
transparent to userspace and therefore the HuC status can temporarily 
flip from loaded to not without userspace knowledge, especially if we 
start going into deeper suspend states and start causing HuC resets when 
we go into runtime suspend. Note that this is different from what 
happens during GT reset for older platforms, because in that scenario we 
guarantee that HuC reload is complete before we restart the submission 
back-end, so userspace doesn't notice that the HuC status change. We had 
an internal discussion about this problem with both media and i915 archs 
and the conclusion was that the best option is for i915 to stall media 
submission while HuC (re-)load is in progress.

Implementation-wise, this turned out a bit more complicated than 
expected due to the required tracking of the mei-gsc status to enforce 
the timeouts. When I discussed this with the mei team they suggested 
using the notifier for that, which is what I went with. Another option I 
considered was to start a timer when the first media submission comes in 
while HuC load is in progress and stall for the maximum expected HuC 
load time; this would be more error prone, hence why I preferred the 
notifier option, but would be simpler. I'm open to other options here if 
you have any better ideas.

Thanks,
Daniele

>
> Regards,
>
> Tvrtko
>
>> Timeouts are in place to ensure all submissions are unlocked in case
>> something goes wrong. Since we need to monitor the status of the mei
>> driver to know what's happening and when to time out, a notifier has
>> been added so we get a callback when the status of the mei driver
>> changes.
>>
>> Note that this series depends on the GSC support for DG2 [1], which has
>> been included squashed in a single patch.
>>
>> [1]: https://patchwork.freedesktop.org/series/102339/
>>
>> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>> Cc: Tony Ye <tony.ye@intel.com>
>> Cc: Alexander Usyskin <alexander.usyskin@intel.com>
>>
>> Daniele Ceraolo Spurio (8):
>>    HAX: mei: GSC support for XeHP SDV and DG2 platform
>>    drm/i915/pxp: load the pxp module when we have a gsc-loaded huc
>>    drm/i915/dg2: setup HuC loading via GSC
>>    drm/i915/huc: track delayed HuC load with a fence
>>    drm/i915/huc: stall media submission until HuC is loaded
>>    drm/i915/huc: report HuC as loaded even if load still in progress
>>    drm/i915/huc: define gsc-compatible HuC fw for DG2
>>    HAX: drm/i915: force INTEL_MEI_GSC and INTEL_MEI_PXP on for CI
>>
>> Tomas Winkler (4):
>>    mei: add support to GSC extended header
>>    mei: bus: enable sending gsc commands
>>    mei: pxp: support matching with a gfx discrete card
>>    drm/i915/pxp: add huc authentication and loading command
>>
>> Vitaly Lubart (3):
>>    mei: bus: extend bus API to support command streamer API
>>    mei: pxp: add command streamer API to the PXP driver
>>    drm/i915/pxp: implement function for sending tee stream command
>>
>>   drivers/gpu/drm/i915/Kconfig.debug            |   2 +
>>   drivers/gpu/drm/i915/Makefile                 |  11 +-
>>   drivers/gpu/drm/i915/gt/intel_gsc.c           | 141 +++++++++-
>>   drivers/gpu/drm/i915/gt/intel_gsc.h           |   3 +
>>   drivers/gpu/drm/i915/gt/uc/intel_huc.c        | 244 ++++++++++++++++--
>>   drivers/gpu/drm/i915/gt/uc/intel_huc.h        |  27 ++
>>   drivers/gpu/drm/i915/gt/uc/intel_huc_fw.c     |  34 +++
>>   drivers/gpu/drm/i915/gt/uc/intel_huc_fw.h     |   1 +
>>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |  64 +++--
>>   drivers/gpu/drm/i915/i915_request.c           |  24 ++
>>   drivers/gpu/drm/i915/pxp/intel_pxp.c          |  32 ++-
>>   drivers/gpu/drm/i915/pxp/intel_pxp.h          |  32 ---
>>   drivers/gpu/drm/i915/pxp/intel_pxp_huc.c      |  69 +++++
>>   drivers/gpu/drm/i915/pxp/intel_pxp_huc.h      |  15 ++
>>   drivers/gpu/drm/i915/pxp/intel_pxp_irq.h      |   8 +
>>   drivers/gpu/drm/i915/pxp/intel_pxp_session.c  |   8 +-
>>   drivers/gpu/drm/i915/pxp/intel_pxp_session.h  |  11 +-
>>   drivers/gpu/drm/i915/pxp/intel_pxp_tee.c      | 138 +++++++++-
>>   drivers/gpu/drm/i915/pxp/intel_pxp_tee.h      |   5 +
>>   .../drm/i915/pxp/intel_pxp_tee_interface.h    |  21 ++
>>   drivers/gpu/drm/i915/pxp/intel_pxp_types.h    |   6 +
>>   drivers/misc/mei/bus-fixup.c                  | 105 +++++---
>>   drivers/misc/mei/bus.c                        | 145 ++++++++++-
>>   drivers/misc/mei/client.c                     |  69 +++--
>>   drivers/misc/mei/debugfs.c                    |  17 ++
>>   drivers/misc/mei/gsc-me.c                     |  77 +++++-
>>   drivers/misc/mei/hbm.c                        |  25 +-
>>   drivers/misc/mei/hw-me-regs.h                 |   7 +
>>   drivers/misc/mei/hw-me.c                      | 123 +++++++--
>>   drivers/misc/mei/hw-me.h                      |  14 +-
>>   drivers/misc/mei/hw-txe.c                     |   2 +-
>>   drivers/misc/mei/hw.h                         |  62 +++++
>>   drivers/misc/mei/init.c                       |  21 +-
>>   drivers/misc/mei/interrupt.c                  |  47 +++-
>>   drivers/misc/mei/main.c                       |   2 +-
>>   drivers/misc/mei/mei_dev.h                    |  33 +++
>>   drivers/misc/mei/mkhi.h                       |  57 ++++
>>   drivers/misc/mei/pci-me.c                     |   2 +-
>>   drivers/misc/mei/pxp/mei_pxp.c                |  40 ++-
>>   include/drm/i915_pxp_tee_interface.h          |   5 +
>>   include/linux/mei_aux.h                       |   2 +
>>   include/linux/mei_cl_bus.h                    |   6 +
>>   42 files changed, 1537 insertions(+), 220 deletions(-)
>>   create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_huc.c
>>   create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_huc.h
>>   create mode 100644 drivers/misc/mei/mkhi.h
>>

Tvrtko Ursulin June 13, 2022, 4:31 p.m. UTC | #3

On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>
>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>> On DG2, HuC loading is performed by the GSC, via a PXP command. The load
>>> operation itself is relatively simple (just send a message to the GSC
>>> with the physical address of the HuC in LMEM), but there are timing
>>> changes that requires special attention. In particular, to send a PXP
>>> command we need to first export the GSC driver and then wait for the
>>> mei-gsc and mei-pxp modules to start, which means that HuC load will
>>> complete after i915 load is complete. This means that there is a small
>>> window of time after i915 is registered and before HuC is loaded
>>> during which userspace could submit and/or checking the HuC load status,
>>> although this is quite unlikely to happen (HuC is usually loaded before
>>> kernel init/resume completes).
>>> We've consulted with the media team in regards to how to handle this and
>>> they've asked us to do the following:
>>>
>>> 1) Report HuC as loaded in the getparam IOCTL even if load is still in
>>> progress. The media driver uses the IOCTL as a way to check if HuC is
>>> enabled and then includes a secondary check in the batches to get the
>>> actual status, so doing it this way allows userspace to keep working
>>> without changes.
>>>
>>> 2) Stall all userspace VCS submission until HuC is loaded. Stalls are
>>> expected to be very rare (if any), due to the fact that HuC is usually
>>> loaded before kernel init/resume is completed.
>>
>> Motivation to add these complications into i915 are not clear to me 
>> here. I mean there is no HuC on DG2 _yet_ is the premise of the 
>> series, right? So no backwards compatibility concerns. In this case 
>> why jump through the hoops and not let userspace handle all of this by 
>> just leaving the getparam return the true status?
> 
> The main areas impacted by the fact that we can't guarantee that HuC 
> load is complete when i915 starts accepting submissions are boot and 
> suspend/resume, with the latter being the main problem; GT reset is not 
> a concern because HuC now survives it. A suspend/resume can be 
> transparent to userspace and therefore the HuC status can temporarily 
> flip from loaded to not without userspace knowledge, especially if we 
> start going into deeper suspend states and start causing HuC resets when 
> we go into runtime suspend. Note that this is different from what 
> happens during GT reset for older platforms, because in that scenario we 
> guarantee that HuC reload is complete before we restart the submission 
> back-end, so userspace doesn't notice that the HuC status change. We had 
> an internal discussion about this problem with both media and i915 archs 
> and the conclusion was that the best option is for i915 to stall media 
> submission while HuC (re-)load is in progress.

Resume is potentialy a good reason - I did not pick up on that from the 
cover letter. I read the statement about the unlikely and small window 
where HuC is not loaded during kernel init/resume and I guess did not 
pick up on the resume part.

Waiting for GSC to load HuC from i915 resume is not an option?

Will there be runtime suspend happening on the GSC device behind i915's 
back, or i915 and GSC will always be able to transition the states in 
tandem?

Regards,

Tvrtko

> Implementation-wise, this turned out a bit more complicated than 
> expected due to the required tracking of the mei-gsc status to enforce 
> the timeouts. When I discussed this with the mei team they suggested 
> using the notifier for that, which is what I went with. Another option I 
> considered was to start a timer when the first media submission comes in 
> while HuC load is in progress and stall for the maximum expected HuC 
> load time; this would be more error prone, hence why I preferred the 
> notifier option, but would be simpler. I'm open to other options here if 
> you have any better ideas.
> 
> Thanks,
> Daniele
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Timeouts are in place to ensure all submissions are unlocked in case
>>> something goes wrong. Since we need to monitor the status of the mei
>>> driver to know what's happening and when to time out, a notifier has
>>> been added so we get a callback when the status of the mei driver
>>> changes.
>>>
>>> Note that this series depends on the GSC support for DG2 [1], which has
>>> been included squashed in a single patch.
>>>
>>> [1]: https://patchwork.freedesktop.org/series/102339/
>>>
>>> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>>> Cc: Tony Ye <tony.ye@intel.com>
>>> Cc: Alexander Usyskin <alexander.usyskin@intel.com>
>>>
>>> Daniele Ceraolo Spurio (8):
>>>    HAX: mei: GSC support for XeHP SDV and DG2 platform
>>>    drm/i915/pxp: load the pxp module when we have a gsc-loaded huc
>>>    drm/i915/dg2: setup HuC loading via GSC
>>>    drm/i915/huc: track delayed HuC load with a fence
>>>    drm/i915/huc: stall media submission until HuC is loaded
>>>    drm/i915/huc: report HuC as loaded even if load still in progress
>>>    drm/i915/huc: define gsc-compatible HuC fw for DG2
>>>    HAX: drm/i915: force INTEL_MEI_GSC and INTEL_MEI_PXP on for CI
>>>
>>> Tomas Winkler (4):
>>>    mei: add support to GSC extended header
>>>    mei: bus: enable sending gsc commands
>>>    mei: pxp: support matching with a gfx discrete card
>>>    drm/i915/pxp: add huc authentication and loading command
>>>
>>> Vitaly Lubart (3):
>>>    mei: bus: extend bus API to support command streamer API
>>>    mei: pxp: add command streamer API to the PXP driver
>>>    drm/i915/pxp: implement function for sending tee stream command
>>>
>>>   drivers/gpu/drm/i915/Kconfig.debug            |   2 +
>>>   drivers/gpu/drm/i915/Makefile                 |  11 +-
>>>   drivers/gpu/drm/i915/gt/intel_gsc.c           | 141 +++++++++-
>>>   drivers/gpu/drm/i915/gt/intel_gsc.h           |   3 +
>>>   drivers/gpu/drm/i915/gt/uc/intel_huc.c        | 244 ++++++++++++++++--
>>>   drivers/gpu/drm/i915/gt/uc/intel_huc.h        |  27 ++
>>>   drivers/gpu/drm/i915/gt/uc/intel_huc_fw.c     |  34 +++
>>>   drivers/gpu/drm/i915/gt/uc/intel_huc_fw.h     |   1 +
>>>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |  64 +++--
>>>   drivers/gpu/drm/i915/i915_request.c           |  24 ++
>>>   drivers/gpu/drm/i915/pxp/intel_pxp.c          |  32 ++-
>>>   drivers/gpu/drm/i915/pxp/intel_pxp.h          |  32 ---
>>>   drivers/gpu/drm/i915/pxp/intel_pxp_huc.c      |  69 +++++
>>>   drivers/gpu/drm/i915/pxp/intel_pxp_huc.h      |  15 ++
>>>   drivers/gpu/drm/i915/pxp/intel_pxp_irq.h      |   8 +
>>>   drivers/gpu/drm/i915/pxp/intel_pxp_session.c  |   8 +-
>>>   drivers/gpu/drm/i915/pxp/intel_pxp_session.h  |  11 +-
>>>   drivers/gpu/drm/i915/pxp/intel_pxp_tee.c      | 138 +++++++++-
>>>   drivers/gpu/drm/i915/pxp/intel_pxp_tee.h      |   5 +
>>>   .../drm/i915/pxp/intel_pxp_tee_interface.h    |  21 ++
>>>   drivers/gpu/drm/i915/pxp/intel_pxp_types.h    |   6 +
>>>   drivers/misc/mei/bus-fixup.c                  | 105 +++++---
>>>   drivers/misc/mei/bus.c                        | 145 ++++++++++-
>>>   drivers/misc/mei/client.c                     |  69 +++--
>>>   drivers/misc/mei/debugfs.c                    |  17 ++
>>>   drivers/misc/mei/gsc-me.c                     |  77 +++++-
>>>   drivers/misc/mei/hbm.c                        |  25 +-
>>>   drivers/misc/mei/hw-me-regs.h                 |   7 +
>>>   drivers/misc/mei/hw-me.c                      | 123 +++++++--
>>>   drivers/misc/mei/hw-me.h                      |  14 +-
>>>   drivers/misc/mei/hw-txe.c                     |   2 +-
>>>   drivers/misc/mei/hw.h                         |  62 +++++
>>>   drivers/misc/mei/init.c                       |  21 +-
>>>   drivers/misc/mei/interrupt.c                  |  47 +++-
>>>   drivers/misc/mei/main.c                       |   2 +-
>>>   drivers/misc/mei/mei_dev.h                    |  33 +++
>>>   drivers/misc/mei/mkhi.h                       |  57 ++++
>>>   drivers/misc/mei/pci-me.c                     |   2 +-
>>>   drivers/misc/mei/pxp/mei_pxp.c                |  40 ++-
>>>   include/drm/i915_pxp_tee_interface.h          |   5 +
>>>   include/linux/mei_aux.h                       |   2 +
>>>   include/linux/mei_cl_bus.h                    |   6 +
>>>   42 files changed, 1537 insertions(+), 220 deletions(-)
>>>   create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_huc.c
>>>   create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_huc.h
>>>   create mode 100644 drivers/misc/mei/mkhi.h
>>>
>

Daniele Ceraolo Spurio June 13, 2022, 4:41 p.m. UTC | #4

On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>
> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>
>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>> On DG2, HuC loading is performed by the GSC, via a PXP command. The 
>>>> load
>>>> operation itself is relatively simple (just send a message to the GSC
>>>> with the physical address of the HuC in LMEM), but there are timing
>>>> changes that requires special attention. In particular, to send a PXP
>>>> command we need to first export the GSC driver and then wait for the
>>>> mei-gsc and mei-pxp modules to start, which means that HuC load will
>>>> complete after i915 load is complete. This means that there is a small
>>>> window of time after i915 is registered and before HuC is loaded
>>>> during which userspace could submit and/or checking the HuC load 
>>>> status,
>>>> although this is quite unlikely to happen (HuC is usually loaded 
>>>> before
>>>> kernel init/resume completes).
>>>> We've consulted with the media team in regards to how to handle 
>>>> this and
>>>> they've asked us to do the following:
>>>>
>>>> 1) Report HuC as loaded in the getparam IOCTL even if load is still in
>>>> progress. The media driver uses the IOCTL as a way to check if HuC is
>>>> enabled and then includes a secondary check in the batches to get the
>>>> actual status, so doing it this way allows userspace to keep working
>>>> without changes.
>>>>
>>>> 2) Stall all userspace VCS submission until HuC is loaded. Stalls are
>>>> expected to be very rare (if any), due to the fact that HuC is usually
>>>> loaded before kernel init/resume is completed.
>>>
>>> Motivation to add these complications into i915 are not clear to me 
>>> here. I mean there is no HuC on DG2 _yet_ is the premise of the 
>>> series, right? So no backwards compatibility concerns. In this case 
>>> why jump through the hoops and not let userspace handle all of this 
>>> by just leaving the getparam return the true status?
>>
>> The main areas impacted by the fact that we can't guarantee that HuC 
>> load is complete when i915 starts accepting submissions are boot and 
>> suspend/resume, with the latter being the main problem; GT reset is 
>> not a concern because HuC now survives it. A suspend/resume can be 
>> transparent to userspace and therefore the HuC status can temporarily 
>> flip from loaded to not without userspace knowledge, especially if we 
>> start going into deeper suspend states and start causing HuC resets 
>> when we go into runtime suspend. Note that this is different from 
>> what happens during GT reset for older platforms, because in that 
>> scenario we guarantee that HuC reload is complete before we restart 
>> the submission back-end, so userspace doesn't notice that the HuC 
>> status change. We had an internal discussion about this problem with 
>> both media and i915 archs and the conclusion was that the best option 
>> is for i915 to stall media submission while HuC (re-)load is in 
>> progress.
>
> Resume is potentialy a good reason - I did not pick up on that from 
> the cover letter. I read the statement about the unlikely and small 
> window where HuC is not loaded during kernel init/resume and I guess 
> did not pick up on the resume part.
>
> Waiting for GSC to load HuC from i915 resume is not an option?

GSC is an aux device exported by i915, so AFAIU GSC resume can't start 
until i915 resume completes.

>
> Will there be runtime suspend happening on the GSC device behind 
> i915's back, or i915 and GSC will always be able to transition the 
> states in tandem?

They're always in sync. The GSC is part of the same HW PCI device as the 
rest of the GPU, so they change HW state together.

Daniele

>
> Regards,
>
> Tvrtko
>
>> Implementation-wise, this turned out a bit more complicated than 
>> expected due to the required tracking of the mei-gsc status to 
>> enforce the timeouts. When I discussed this with the mei team they 
>> suggested using the notifier for that, which is what I went with. 
>> Another option I considered was to start a timer when the first media 
>> submission comes in while HuC load is in progress and stall for the 
>> maximum expected HuC load time; this would be more error prone, hence 
>> why I preferred the notifier option, but would be simpler. I'm open 
>> to other options here if you have any better ideas.
>>
>> Thanks,
>> Daniele
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Timeouts are in place to ensure all submissions are unlocked in case
>>>> something goes wrong. Since we need to monitor the status of the mei
>>>> driver to know what's happening and when to time out, a notifier has
>>>> been added so we get a callback when the status of the mei driver
>>>> changes.
>>>>
>>>> Note that this series depends on the GSC support for DG2 [1], which 
>>>> has
>>>> been included squashed in a single patch.
>>>>
>>>> [1]: https://patchwork.freedesktop.org/series/102339/
>>>>
>>>> Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
>>>> Cc: Tony Ye <tony.ye@intel.com>
>>>> Cc: Alexander Usyskin <alexander.usyskin@intel.com>
>>>>
>>>> Daniele Ceraolo Spurio (8):
>>>>    HAX: mei: GSC support for XeHP SDV and DG2 platform
>>>>    drm/i915/pxp: load the pxp module when we have a gsc-loaded huc
>>>>    drm/i915/dg2: setup HuC loading via GSC
>>>>    drm/i915/huc: track delayed HuC load with a fence
>>>>    drm/i915/huc: stall media submission until HuC is loaded
>>>>    drm/i915/huc: report HuC as loaded even if load still in progress
>>>>    drm/i915/huc: define gsc-compatible HuC fw for DG2
>>>>    HAX: drm/i915: force INTEL_MEI_GSC and INTEL_MEI_PXP on for CI
>>>>
>>>> Tomas Winkler (4):
>>>>    mei: add support to GSC extended header
>>>>    mei: bus: enable sending gsc commands
>>>>    mei: pxp: support matching with a gfx discrete card
>>>>    drm/i915/pxp: add huc authentication and loading command
>>>>
>>>> Vitaly Lubart (3):
>>>>    mei: bus: extend bus API to support command streamer API
>>>>    mei: pxp: add command streamer API to the PXP driver
>>>>    drm/i915/pxp: implement function for sending tee stream command
>>>>
>>>>   drivers/gpu/drm/i915/Kconfig.debug            |   2 +
>>>>   drivers/gpu/drm/i915/Makefile                 |  11 +-
>>>>   drivers/gpu/drm/i915/gt/intel_gsc.c           | 141 +++++++++-
>>>>   drivers/gpu/drm/i915/gt/intel_gsc.h           |   3 +
>>>>   drivers/gpu/drm/i915/gt/uc/intel_huc.c        | 244 
>>>> ++++++++++++++++--
>>>>   drivers/gpu/drm/i915/gt/uc/intel_huc.h        |  27 ++
>>>>   drivers/gpu/drm/i915/gt/uc/intel_huc_fw.c     |  34 +++
>>>>   drivers/gpu/drm/i915/gt/uc/intel_huc_fw.h     |   1 +
>>>>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |  64 +++--
>>>>   drivers/gpu/drm/i915/i915_request.c           |  24 ++
>>>>   drivers/gpu/drm/i915/pxp/intel_pxp.c          |  32 ++-
>>>>   drivers/gpu/drm/i915/pxp/intel_pxp.h          |  32 ---
>>>>   drivers/gpu/drm/i915/pxp/intel_pxp_huc.c      |  69 +++++
>>>>   drivers/gpu/drm/i915/pxp/intel_pxp_huc.h      |  15 ++
>>>>   drivers/gpu/drm/i915/pxp/intel_pxp_irq.h      |   8 +
>>>>   drivers/gpu/drm/i915/pxp/intel_pxp_session.c  |   8 +-
>>>>   drivers/gpu/drm/i915/pxp/intel_pxp_session.h  |  11 +-
>>>>   drivers/gpu/drm/i915/pxp/intel_pxp_tee.c      | 138 +++++++++-
>>>>   drivers/gpu/drm/i915/pxp/intel_pxp_tee.h      |   5 +
>>>>   .../drm/i915/pxp/intel_pxp_tee_interface.h    |  21 ++
>>>>   drivers/gpu/drm/i915/pxp/intel_pxp_types.h    |   6 +
>>>>   drivers/misc/mei/bus-fixup.c                  | 105 +++++---
>>>>   drivers/misc/mei/bus.c                        | 145 ++++++++++-
>>>>   drivers/misc/mei/client.c                     |  69 +++--
>>>>   drivers/misc/mei/debugfs.c                    |  17 ++
>>>>   drivers/misc/mei/gsc-me.c                     |  77 +++++-
>>>>   drivers/misc/mei/hbm.c                        |  25 +-
>>>>   drivers/misc/mei/hw-me-regs.h                 |   7 +
>>>>   drivers/misc/mei/hw-me.c                      | 123 +++++++--
>>>>   drivers/misc/mei/hw-me.h                      |  14 +-
>>>>   drivers/misc/mei/hw-txe.c                     |   2 +-
>>>>   drivers/misc/mei/hw.h                         |  62 +++++
>>>>   drivers/misc/mei/init.c                       |  21 +-
>>>>   drivers/misc/mei/interrupt.c                  |  47 +++-
>>>>   drivers/misc/mei/main.c                       |   2 +-
>>>>   drivers/misc/mei/mei_dev.h                    |  33 +++
>>>>   drivers/misc/mei/mkhi.h                       |  57 ++++
>>>>   drivers/misc/mei/pci-me.c                     |   2 +-
>>>>   drivers/misc/mei/pxp/mei_pxp.c                |  40 ++-
>>>>   include/drm/i915_pxp_tee_interface.h          |   5 +
>>>>   include/linux/mei_aux.h                       |   2 +
>>>>   include/linux/mei_cl_bus.h                    |   6 +
>>>>   42 files changed, 1537 insertions(+), 220 deletions(-)
>>>>   create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_huc.c
>>>>   create mode 100644 drivers/gpu/drm/i915/pxp/intel_pxp_huc.h
>>>>   create mode 100644 drivers/misc/mei/mkhi.h
>>>>
>>

Tvrtko Ursulin June 13, 2022, 4:56 p.m. UTC | #5

On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>
>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>
>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>> On DG2, HuC loading is performed by the GSC, via a PXP command. The 
>>>>> load
>>>>> operation itself is relatively simple (just send a message to the GSC
>>>>> with the physical address of the HuC in LMEM), but there are timing
>>>>> changes that requires special attention. In particular, to send a PXP
>>>>> command we need to first export the GSC driver and then wait for the
>>>>> mei-gsc and mei-pxp modules to start, which means that HuC load will
>>>>> complete after i915 load is complete. This means that there is a small
>>>>> window of time after i915 is registered and before HuC is loaded
>>>>> during which userspace could submit and/or checking the HuC load 
>>>>> status,
>>>>> although this is quite unlikely to happen (HuC is usually loaded 
>>>>> before
>>>>> kernel init/resume completes).
>>>>> We've consulted with the media team in regards to how to handle 
>>>>> this and
>>>>> they've asked us to do the following:
>>>>>
>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load is still in
>>>>> progress. The media driver uses the IOCTL as a way to check if HuC is
>>>>> enabled and then includes a secondary check in the batches to get the
>>>>> actual status, so doing it this way allows userspace to keep working
>>>>> without changes.
>>>>>
>>>>> 2) Stall all userspace VCS submission until HuC is loaded. Stalls are
>>>>> expected to be very rare (if any), due to the fact that HuC is usually
>>>>> loaded before kernel init/resume is completed.
>>>>
>>>> Motivation to add these complications into i915 are not clear to me 
>>>> here. I mean there is no HuC on DG2 _yet_ is the premise of the 
>>>> series, right? So no backwards compatibility concerns. In this case 
>>>> why jump through the hoops and not let userspace handle all of this 
>>>> by just leaving the getparam return the true status?
>>>
>>> The main areas impacted by the fact that we can't guarantee that HuC 
>>> load is complete when i915 starts accepting submissions are boot and 
>>> suspend/resume, with the latter being the main problem; GT reset is 
>>> not a concern because HuC now survives it. A suspend/resume can be 
>>> transparent to userspace and therefore the HuC status can temporarily 
>>> flip from loaded to not without userspace knowledge, especially if we 
>>> start going into deeper suspend states and start causing HuC resets 
>>> when we go into runtime suspend. Note that this is different from 
>>> what happens during GT reset for older platforms, because in that 
>>> scenario we guarantee that HuC reload is complete before we restart 
>>> the submission back-end, so userspace doesn't notice that the HuC 
>>> status change. We had an internal discussion about this problem with 
>>> both media and i915 archs and the conclusion was that the best option 
>>> is for i915 to stall media submission while HuC (re-)load is in 
>>> progress.
>>
>> Resume is potentialy a good reason - I did not pick up on that from 
>> the cover letter. I read the statement about the unlikely and small 
>> window where HuC is not loaded during kernel init/resume and I guess 
>> did not pick up on the resume part.
>>
>> Waiting for GSC to load HuC from i915 resume is not an option?
> 
> GSC is an aux device exported by i915, so AFAIU GSC resume can't start 
> until i915 resume completes.

I'll dig into this in the next few days since I want to understand how 
exactly it works. Or someone can help explain.

If in the end conclusion will be that i915 resume indeed cannot wait for 
GSC, then I think auto-blocking of queued up contexts on media engines 
indeed sounds unavoidable. Otherwise, as you explained, user experience 
post resume wouldn't be good.

However, do we really need to lie in the getparam? How about extend or 
add a new one to separate the loading vs loaded states? Since userspace 
does not support DG2 HuC yet this should be doable.

>> Will there be runtime suspend happening on the GSC device behind 
>> i915's back, or i915 and GSC will always be able to transition the 
>> states in tandem?
> 
> They're always in sync. The GSC is part of the same HW PCI device as the 
> rest of the GPU, so they change HW state together.

Okay thanks, I wasn't sure if it is the same or separate device.

Regards,

Tvrtko

Daniele Ceraolo Spurio June 13, 2022, 5:06 p.m. UTC | #6

On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>
> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>
>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP command. 
>>>>>> The load
>>>>>> operation itself is relatively simple (just send a message to the 
>>>>>> GSC
>>>>>> with the physical address of the HuC in LMEM), but there are timing
>>>>>> changes that requires special attention. In particular, to send a 
>>>>>> PXP
>>>>>> command we need to first export the GSC driver and then wait for the
>>>>>> mei-gsc and mei-pxp modules to start, which means that HuC load will
>>>>>> complete after i915 load is complete. This means that there is a 
>>>>>> small
>>>>>> window of time after i915 is registered and before HuC is loaded
>>>>>> during which userspace could submit and/or checking the HuC load 
>>>>>> status,
>>>>>> although this is quite unlikely to happen (HuC is usually loaded 
>>>>>> before
>>>>>> kernel init/resume completes).
>>>>>> We've consulted with the media team in regards to how to handle 
>>>>>> this and
>>>>>> they've asked us to do the following:
>>>>>>
>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load is 
>>>>>> still in
>>>>>> progress. The media driver uses the IOCTL as a way to check if 
>>>>>> HuC is
>>>>>> enabled and then includes a secondary check in the batches to get 
>>>>>> the
>>>>>> actual status, so doing it this way allows userspace to keep working
>>>>>> without changes.
>>>>>>
>>>>>> 2) Stall all userspace VCS submission until HuC is loaded. Stalls 
>>>>>> are
>>>>>> expected to be very rare (if any), due to the fact that HuC is 
>>>>>> usually
>>>>>> loaded before kernel init/resume is completed.
>>>>>
>>>>> Motivation to add these complications into i915 are not clear to 
>>>>> me here. I mean there is no HuC on DG2 _yet_ is the premise of the 
>>>>> series, right? So no backwards compatibility concerns. In this 
>>>>> case why jump through the hoops and not let userspace handle all 
>>>>> of this by just leaving the getparam return the true status?
>>>>
>>>> The main areas impacted by the fact that we can't guarantee that 
>>>> HuC load is complete when i915 starts accepting submissions are 
>>>> boot and suspend/resume, with the latter being the main problem; GT 
>>>> reset is not a concern because HuC now survives it. A 
>>>> suspend/resume can be transparent to userspace and therefore the 
>>>> HuC status can temporarily flip from loaded to not without 
>>>> userspace knowledge, especially if we start going into deeper 
>>>> suspend states and start causing HuC resets when we go into runtime 
>>>> suspend. Note that this is different from what happens during GT 
>>>> reset for older platforms, because in that scenario we guarantee 
>>>> that HuC reload is complete before we restart the submission 
>>>> back-end, so userspace doesn't notice that the HuC status change. 
>>>> We had an internal discussion about this problem with both media 
>>>> and i915 archs and the conclusion was that the best option is for 
>>>> i915 to stall media submission while HuC (re-)load is in progress.
>>>
>>> Resume is potentialy a good reason - I did not pick up on that from 
>>> the cover letter. I read the statement about the unlikely and small 
>>> window where HuC is not loaded during kernel init/resume and I guess 
>>> did not pick up on the resume part.
>>>
>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>
>> GSC is an aux device exported by i915, so AFAIU GSC resume can't 
>> start until i915 resume completes.
>
> I'll dig into this in the next few days since I want to understand how 
> exactly it works. Or someone can help explain.
>
> If in the end conclusion will be that i915 resume indeed cannot wait 
> for GSC, then I think auto-blocking of queued up contexts on media 
> engines indeed sounds unavoidable. Otherwise, as you explained, user 
> experience post resume wouldn't be good.

Even if we could implement a wait, I'm not sure we should. GSC resume 
and HuC reload takes ~300ms in most cases, I don't think we want to 
block within the i915 resume path for that long.

>
> However, do we really need to lie in the getparam? How about extend or 
> add a new one to separate the loading vs loaded states? Since 
> userspace does not support DG2 HuC yet this should be doable.

I don't really have a preference here. The media team asked us to do it 
this way because they wouldn't have a use for the different "in 
progress" and "done" states. If they're ok with having separate flags 
that's fine by me.
Tony, any feedback here?

Thanks,
Daniele

>
>>> Will there be runtime suspend happening on the GSC device behind 
>>> i915's back, or i915 and GSC will always be able to transition the 
>>> states in tandem?
>>
>> They're always in sync. The GSC is part of the same HW PCI device as 
>> the rest of the GPU, so they change HW state together.
>
> Okay thanks, I wasn't sure if it is the same or separate device.
>
> Regards,
>
> Tvrtko

Tvrtko Ursulin June 13, 2022, 5:39 p.m. UTC | #7

On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>
>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP command. 
>>>>>>> The load
>>>>>>> operation itself is relatively simple (just send a message to the 
>>>>>>> GSC
>>>>>>> with the physical address of the HuC in LMEM), but there are timing
>>>>>>> changes that requires special attention. In particular, to send a 
>>>>>>> PXP
>>>>>>> command we need to first export the GSC driver and then wait for the
>>>>>>> mei-gsc and mei-pxp modules to start, which means that HuC load will
>>>>>>> complete after i915 load is complete. This means that there is a 
>>>>>>> small
>>>>>>> window of time after i915 is registered and before HuC is loaded
>>>>>>> during which userspace could submit and/or checking the HuC load 
>>>>>>> status,
>>>>>>> although this is quite unlikely to happen (HuC is usually loaded 
>>>>>>> before
>>>>>>> kernel init/resume completes).
>>>>>>> We've consulted with the media team in regards to how to handle 
>>>>>>> this and
>>>>>>> they've asked us to do the following:
>>>>>>>
>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load is 
>>>>>>> still in
>>>>>>> progress. The media driver uses the IOCTL as a way to check if 
>>>>>>> HuC is
>>>>>>> enabled and then includes a secondary check in the batches to get 
>>>>>>> the
>>>>>>> actual status, so doing it this way allows userspace to keep working
>>>>>>> without changes.
>>>>>>>
>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded. Stalls 
>>>>>>> are
>>>>>>> expected to be very rare (if any), due to the fact that HuC is 
>>>>>>> usually
>>>>>>> loaded before kernel init/resume is completed.
>>>>>>
>>>>>> Motivation to add these complications into i915 are not clear to 
>>>>>> me here. I mean there is no HuC on DG2 _yet_ is the premise of the 
>>>>>> series, right? So no backwards compatibility concerns. In this 
>>>>>> case why jump through the hoops and not let userspace handle all 
>>>>>> of this by just leaving the getparam return the true status?
>>>>>
>>>>> The main areas impacted by the fact that we can't guarantee that 
>>>>> HuC load is complete when i915 starts accepting submissions are 
>>>>> boot and suspend/resume, with the latter being the main problem; GT 
>>>>> reset is not a concern because HuC now survives it. A 
>>>>> suspend/resume can be transparent to userspace and therefore the 
>>>>> HuC status can temporarily flip from loaded to not without 
>>>>> userspace knowledge, especially if we start going into deeper 
>>>>> suspend states and start causing HuC resets when we go into runtime 
>>>>> suspend. Note that this is different from what happens during GT 
>>>>> reset for older platforms, because in that scenario we guarantee 
>>>>> that HuC reload is complete before we restart the submission 
>>>>> back-end, so userspace doesn't notice that the HuC status change. 
>>>>> We had an internal discussion about this problem with both media 
>>>>> and i915 archs and the conclusion was that the best option is for 
>>>>> i915 to stall media submission while HuC (re-)load is in progress.
>>>>
>>>> Resume is potentialy a good reason - I did not pick up on that from 
>>>> the cover letter. I read the statement about the unlikely and small 
>>>> window where HuC is not loaded during kernel init/resume and I guess 
>>>> did not pick up on the resume part.
>>>>
>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>
>>> GSC is an aux device exported by i915, so AFAIU GSC resume can't 
>>> start until i915 resume completes.
>>
>> I'll dig into this in the next few days since I want to understand how 
>> exactly it works. Or someone can help explain.
>>
>> If in the end conclusion will be that i915 resume indeed cannot wait 
>> for GSC, then I think auto-blocking of queued up contexts on media 
>> engines indeed sounds unavoidable. Otherwise, as you explained, user 
>> experience post resume wouldn't be good.
> 
> Even if we could implement a wait, I'm not sure we should. GSC resume 
> and HuC reload takes ~300ms in most cases, I don't think we want to 
> block within the i915 resume path for that long.

Yeah maybe not. But entertaining the idea that it is technically 
possible to block - we could perhaps add uapi for userspace to mark 
contexts which want HuC access. Then track if there are any such 
contexts with outstanding submissions and only wait in resume if there 
are. If that would end up significantly less code on the i915 side to 
maintain is an open.

What would be the end result from users point of view in case where it 
suspended during video playback? The proposed solution from this series 
sees the video stuck after resume. Maybe compositor blocks as well since 
I am not sure how well they handle one window not providing new data. 
Probably depends on the compositor.

And then with a simpler solution definitely the whole resume would be 
delayed by 300ms.

With my ChromeOS hat the stalled media engines does sound like a better 
solution. But with the maintainer hat I'd like all options evaluated 
since there is attractiveness if a good enough solution can be achieved 
with significantly less kernel code.

You say 300ms is typical time for HuC load. How long it is on other 
platforms? If much faster then why is it so slow here?

>> However, do we really need to lie in the getparam? How about extend or 
>> add a new one to separate the loading vs loaded states? Since 
>> userspace does not support DG2 HuC yet this should be doable.
> 
> I don't really have a preference here. The media team asked us to do it 
> this way because they wouldn't have a use for the different "in 
> progress" and "done" states. If they're ok with having separate flags 
> that's fine by me.
> Tony, any feedback here?

We don't even have any docs in i915_drm.h in terms of what it means:

#define I915_PARAM_HUC_STATUS		 42

Seems to be a boolean. Status false vs true? Could you add some docs?

Regards,

Tvrtko

> 
> Thanks,
> Daniele
> 
>>
>>>> Will there be runtime suspend happening on the GSC device behind 
>>>> i915's back, or i915 and GSC will always be able to transition the 
>>>> states in tandem?
>>>
>>> They're always in sync. The GSC is part of the same HW PCI device as 
>>> the rest of the GPU, so they change HW state together.
>>
>> Okay thanks, I wasn't sure if it is the same or separate device.
>>
>> Regards,
>>
>> Tvrtko
>

Daniele Ceraolo Spurio June 13, 2022, 6:13 p.m. UTC | #8

On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>
>
> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP command. 
>>>>>>>> The load
>>>>>>>> operation itself is relatively simple (just send a message to 
>>>>>>>> the GSC
>>>>>>>> with the physical address of the HuC in LMEM), but there are 
>>>>>>>> timing
>>>>>>>> changes that requires special attention. In particular, to send 
>>>>>>>> a PXP
>>>>>>>> command we need to first export the GSC driver and then wait 
>>>>>>>> for the
>>>>>>>> mei-gsc and mei-pxp modules to start, which means that HuC load 
>>>>>>>> will
>>>>>>>> complete after i915 load is complete. This means that there is 
>>>>>>>> a small
>>>>>>>> window of time after i915 is registered and before HuC is loaded
>>>>>>>> during which userspace could submit and/or checking the HuC 
>>>>>>>> load status,
>>>>>>>> although this is quite unlikely to happen (HuC is usually 
>>>>>>>> loaded before
>>>>>>>> kernel init/resume completes).
>>>>>>>> We've consulted with the media team in regards to how to handle 
>>>>>>>> this and
>>>>>>>> they've asked us to do the following:
>>>>>>>>
>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load is 
>>>>>>>> still in
>>>>>>>> progress. The media driver uses the IOCTL as a way to check if 
>>>>>>>> HuC is
>>>>>>>> enabled and then includes a secondary check in the batches to 
>>>>>>>> get the
>>>>>>>> actual status, so doing it this way allows userspace to keep 
>>>>>>>> working
>>>>>>>> without changes.
>>>>>>>>
>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded. 
>>>>>>>> Stalls are
>>>>>>>> expected to be very rare (if any), due to the fact that HuC is 
>>>>>>>> usually
>>>>>>>> loaded before kernel init/resume is completed.
>>>>>>>
>>>>>>> Motivation to add these complications into i915 are not clear to 
>>>>>>> me here. I mean there is no HuC on DG2 _yet_ is the premise of 
>>>>>>> the series, right? So no backwards compatibility concerns. In 
>>>>>>> this case why jump through the hoops and not let userspace 
>>>>>>> handle all of this by just leaving the getparam return the true 
>>>>>>> status?
>>>>>>
>>>>>> The main areas impacted by the fact that we can't guarantee that 
>>>>>> HuC load is complete when i915 starts accepting submissions are 
>>>>>> boot and suspend/resume, with the latter being the main problem; 
>>>>>> GT reset is not a concern because HuC now survives it. A 
>>>>>> suspend/resume can be transparent to userspace and therefore the 
>>>>>> HuC status can temporarily flip from loaded to not without 
>>>>>> userspace knowledge, especially if we start going into deeper 
>>>>>> suspend states and start causing HuC resets when we go into 
>>>>>> runtime suspend. Note that this is different from what happens 
>>>>>> during GT reset for older platforms, because in that scenario we 
>>>>>> guarantee that HuC reload is complete before we restart the 
>>>>>> submission back-end, so userspace doesn't notice that the HuC 
>>>>>> status change. We had an internal discussion about this problem 
>>>>>> with both media and i915 archs and the conclusion was that the 
>>>>>> best option is for i915 to stall media submission while HuC 
>>>>>> (re-)load is in progress.
>>>>>
>>>>> Resume is potentialy a good reason - I did not pick up on that 
>>>>> from the cover letter. I read the statement about the unlikely and 
>>>>> small window where HuC is not loaded during kernel init/resume and 
>>>>> I guess did not pick up on the resume part.
>>>>>
>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>>
>>>> GSC is an aux device exported by i915, so AFAIU GSC resume can't 
>>>> start until i915 resume completes.
>>>
>>> I'll dig into this in the next few days since I want to understand 
>>> how exactly it works. Or someone can help explain.
>>>
>>> If in the end conclusion will be that i915 resume indeed cannot wait 
>>> for GSC, then I think auto-blocking of queued up contexts on media 
>>> engines indeed sounds unavoidable. Otherwise, as you explained, user 
>>> experience post resume wouldn't be good.
>>
>> Even if we could implement a wait, I'm not sure we should. GSC resume 
>> and HuC reload takes ~300ms in most cases, I don't think we want to 
>> block within the i915 resume path for that long.
>
> Yeah maybe not. But entertaining the idea that it is technically 
> possible to block - we could perhaps add uapi for userspace to mark 
> contexts which want HuC access. Then track if there are any such 
> contexts with outstanding submissions and only wait in resume if there 
> are. If that would end up significantly less code on the i915 side to 
> maintain is an open.
>
> What would be the end result from users point of view in case where it 
> suspended during video playback? The proposed solution from this 
> series sees the video stuck after resume. Maybe compositor blocks as 
> well since I am not sure how well they handle one window not providing 
> new data. Probably depends on the compositor.
>
> And then with a simpler solution definitely the whole resume would be 
> delayed by 300ms.
>
> With my ChromeOS hat the stalled media engines does sound like a 
> better solution. But with the maintainer hat I'd like all options 
> evaluated since there is attractiveness if a good enough solution can 
> be achieved with significantly less kernel code.
>
> You say 300ms is typical time for HuC load. How long it is on other 
> platforms? If much faster then why is it so slow here?

The GSC itself has to come out of suspend before it can perform the 
load, which takes a few tens of ms I believe. AFAIU the GSC is also 
slower in processing the HuC load and auth compared to the legacy path. 
The GSC FW team gave a 250ms limit for the time the GSC FW needs from 
start of the resume flow to HuC load complete, so I bumped that to 
~300ms to account for all other SW interactions, plus a bit of buffer. 
Note that a bit of the SW overhead is caused by the fact that we have 2 
mei modules in play here: mei-gsc, which manages the GSC device itself 
(including resume), and mei-pxp, which owns the pxp messaging, including 
HuC load.

>
>>> However, do we really need to lie in the getparam? How about extend 
>>> or add a new one to separate the loading vs loaded states? Since 
>>> userspace does not support DG2 HuC yet this should be doable.
>>
>> I don't really have a preference here. The media team asked us to do 
>> it this way because they wouldn't have a use for the different "in 
>> progress" and "done" states. If they're ok with having separate flags 
>> that's fine by me.
>> Tony, any feedback here?
>
> We don't even have any docs in i915_drm.h in terms of what it means:
>
> #define I915_PARAM_HUC_STATUS         42
>
> Seems to be a boolean. Status false vs true? Could you add some docs?

There is documentation above intel_huc_check_status(), which is also 
updated in this series. I can move that to i915_drm.h.

Daniele

>
> Regards,
>
> Tvrtko
>
>>
>> Thanks,
>> Daniele
>>
>>>
>>>>> Will there be runtime suspend happening on the GSC device behind 
>>>>> i915's back, or i915 and GSC will always be able to transition the 
>>>>> states in tandem?
>>>>
>>>> They're always in sync. The GSC is part of the same HW PCI device 
>>>> as the rest of the GPU, so they change HW state together.
>>>
>>> Okay thanks, I wasn't sure if it is the same or separate device.
>>>
>>> Regards,
>>>
>>> Tvrtko
>>

Tvrtko Ursulin June 14, 2022, 7:44 a.m. UTC | #9

On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP command. 
>>>>>>>>> The load
>>>>>>>>> operation itself is relatively simple (just send a message to 
>>>>>>>>> the GSC
>>>>>>>>> with the physical address of the HuC in LMEM), but there are 
>>>>>>>>> timing
>>>>>>>>> changes that requires special attention. In particular, to send 
>>>>>>>>> a PXP
>>>>>>>>> command we need to first export the GSC driver and then wait 
>>>>>>>>> for the
>>>>>>>>> mei-gsc and mei-pxp modules to start, which means that HuC load 
>>>>>>>>> will
>>>>>>>>> complete after i915 load is complete. This means that there is 
>>>>>>>>> a small
>>>>>>>>> window of time after i915 is registered and before HuC is loaded
>>>>>>>>> during which userspace could submit and/or checking the HuC 
>>>>>>>>> load status,
>>>>>>>>> although this is quite unlikely to happen (HuC is usually 
>>>>>>>>> loaded before
>>>>>>>>> kernel init/resume completes).
>>>>>>>>> We've consulted with the media team in regards to how to handle 
>>>>>>>>> this and
>>>>>>>>> they've asked us to do the following:
>>>>>>>>>
>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load is 
>>>>>>>>> still in
>>>>>>>>> progress. The media driver uses the IOCTL as a way to check if 
>>>>>>>>> HuC is
>>>>>>>>> enabled and then includes a secondary check in the batches to 
>>>>>>>>> get the
>>>>>>>>> actual status, so doing it this way allows userspace to keep 
>>>>>>>>> working
>>>>>>>>> without changes.
>>>>>>>>>
>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded. 
>>>>>>>>> Stalls are
>>>>>>>>> expected to be very rare (if any), due to the fact that HuC is 
>>>>>>>>> usually
>>>>>>>>> loaded before kernel init/resume is completed.
>>>>>>>>
>>>>>>>> Motivation to add these complications into i915 are not clear to 
>>>>>>>> me here. I mean there is no HuC on DG2 _yet_ is the premise of 
>>>>>>>> the series, right? So no backwards compatibility concerns. In 
>>>>>>>> this case why jump through the hoops and not let userspace 
>>>>>>>> handle all of this by just leaving the getparam return the true 
>>>>>>>> status?
>>>>>>>
>>>>>>> The main areas impacted by the fact that we can't guarantee that 
>>>>>>> HuC load is complete when i915 starts accepting submissions are 
>>>>>>> boot and suspend/resume, with the latter being the main problem; 
>>>>>>> GT reset is not a concern because HuC now survives it. A 
>>>>>>> suspend/resume can be transparent to userspace and therefore the 
>>>>>>> HuC status can temporarily flip from loaded to not without 
>>>>>>> userspace knowledge, especially if we start going into deeper 
>>>>>>> suspend states and start causing HuC resets when we go into 
>>>>>>> runtime suspend. Note that this is different from what happens 
>>>>>>> during GT reset for older platforms, because in that scenario we 
>>>>>>> guarantee that HuC reload is complete before we restart the 
>>>>>>> submission back-end, so userspace doesn't notice that the HuC 
>>>>>>> status change. We had an internal discussion about this problem 
>>>>>>> with both media and i915 archs and the conclusion was that the 
>>>>>>> best option is for i915 to stall media submission while HuC 
>>>>>>> (re-)load is in progress.
>>>>>>
>>>>>> Resume is potentialy a good reason - I did not pick up on that 
>>>>>> from the cover letter. I read the statement about the unlikely and 
>>>>>> small window where HuC is not loaded during kernel init/resume and 
>>>>>> I guess did not pick up on the resume part.
>>>>>>
>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>>>
>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume can't 
>>>>> start until i915 resume completes.
>>>>
>>>> I'll dig into this in the next few days since I want to understand 
>>>> how exactly it works. Or someone can help explain.
>>>>
>>>> If in the end conclusion will be that i915 resume indeed cannot wait 
>>>> for GSC, then I think auto-blocking of queued up contexts on media 
>>>> engines indeed sounds unavoidable. Otherwise, as you explained, user 
>>>> experience post resume wouldn't be good.
>>>
>>> Even if we could implement a wait, I'm not sure we should. GSC resume 
>>> and HuC reload takes ~300ms in most cases, I don't think we want to 
>>> block within the i915 resume path for that long.
>>
>> Yeah maybe not. But entertaining the idea that it is technically 
>> possible to block - we could perhaps add uapi for userspace to mark 
>> contexts which want HuC access. Then track if there are any such 
>> contexts with outstanding submissions and only wait in resume if there 
>> are. If that would end up significantly less code on the i915 side to 
>> maintain is an open.
>>
>> What would be the end result from users point of view in case where it 
>> suspended during video playback? The proposed solution from this 
>> series sees the video stuck after resume. Maybe compositor blocks as 
>> well since I am not sure how well they handle one window not providing 
>> new data. Probably depends on the compositor.
>>
>> And then with a simpler solution definitely the whole resume would be 
>> delayed by 300ms.
>>
>> With my ChromeOS hat the stalled media engines does sound like a 
>> better solution. But with the maintainer hat I'd like all options 
>> evaluated since there is attractiveness if a good enough solution can 
>> be achieved with significantly less kernel code.
>>
>> You say 300ms is typical time for HuC load. How long it is on other 
>> platforms? If much faster then why is it so slow here?
> 
> The GSC itself has to come out of suspend before it can perform the 
> load, which takes a few tens of ms I believe. AFAIU the GSC is also 
> slower in processing the HuC load and auth compared to the legacy path. 
> The GSC FW team gave a 250ms limit for the time the GSC FW needs from 
> start of the resume flow to HuC load complete, so I bumped that to 
> ~300ms to account for all other SW interactions, plus a bit of buffer. 
> Note that a bit of the SW overhead is caused by the fact that we have 2 
> mei modules in play here: mei-gsc, which manages the GSC device itself 
> (including resume), and mei-pxp, which owns the pxp messaging, including 
> HuC load.

And how long on other platforms (not DG2) do you know? Presumably there 
the wait is on the i915 resume path?

>>>> However, do we really need to lie in the getparam? How about extend 
>>>> or add a new one to separate the loading vs loaded states? Since 
>>>> userspace does not support DG2 HuC yet this should be doable.
>>>
>>> I don't really have a preference here. The media team asked us to do 
>>> it this way because they wouldn't have a use for the different "in 
>>> progress" and "done" states. If they're ok with having separate flags 
>>> that's fine by me.
>>> Tony, any feedback here?
>>
>> We don't even have any docs in i915_drm.h in terms of what it means:
>>
>> #define I915_PARAM_HUC_STATUS         42
>>
>> Seems to be a boolean. Status false vs true? Could you add some docs?
> 
> There is documentation above intel_huc_check_status(), which is also 
> updated in this series. I can move that to i915_drm.h.

That would be great, thanks.

And with so rich return codes already documented and exposed via uapi - 
would we really need to add anything new for DG2 apart for userspace to 
know that if zero is returned (not a negative error value) it should 
retry? I mean is there another negative error missing which would 
prevent zero transitioning to one?

Regards,

Tvrtko

> 
> Daniele
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>> Thanks,
>>> Daniele
>>>
>>>>
>>>>>> Will there be runtime suspend happening on the GSC device behind 
>>>>>> i915's back, or i915 and GSC will always be able to transition the 
>>>>>> states in tandem?
>>>>>
>>>>> They're always in sync. The GSC is part of the same HW PCI device 
>>>>> as the rest of the GPU, so they change HW state together.
>>>>
>>>> Okay thanks, I wasn't sure if it is the same or separate device.
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>
>

Daniele Ceraolo Spurio June 14, 2022, 3:30 p.m. UTC | #10

On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>
> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>
>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP 
>>>>>>>>>> command. The load
>>>>>>>>>> operation itself is relatively simple (just send a message to 
>>>>>>>>>> the GSC
>>>>>>>>>> with the physical address of the HuC in LMEM), but there are 
>>>>>>>>>> timing
>>>>>>>>>> changes that requires special attention. In particular, to 
>>>>>>>>>> send a PXP
>>>>>>>>>> command we need to first export the GSC driver and then wait 
>>>>>>>>>> for the
>>>>>>>>>> mei-gsc and mei-pxp modules to start, which means that HuC 
>>>>>>>>>> load will
>>>>>>>>>> complete after i915 load is complete. This means that there 
>>>>>>>>>> is a small
>>>>>>>>>> window of time after i915 is registered and before HuC is loaded
>>>>>>>>>> during which userspace could submit and/or checking the HuC 
>>>>>>>>>> load status,
>>>>>>>>>> although this is quite unlikely to happen (HuC is usually 
>>>>>>>>>> loaded before
>>>>>>>>>> kernel init/resume completes).
>>>>>>>>>> We've consulted with the media team in regards to how to 
>>>>>>>>>> handle this and
>>>>>>>>>> they've asked us to do the following:
>>>>>>>>>>
>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load is 
>>>>>>>>>> still in
>>>>>>>>>> progress. The media driver uses the IOCTL as a way to check 
>>>>>>>>>> if HuC is
>>>>>>>>>> enabled and then includes a secondary check in the batches to 
>>>>>>>>>> get the
>>>>>>>>>> actual status, so doing it this way allows userspace to keep 
>>>>>>>>>> working
>>>>>>>>>> without changes.
>>>>>>>>>>
>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded. 
>>>>>>>>>> Stalls are
>>>>>>>>>> expected to be very rare (if any), due to the fact that HuC 
>>>>>>>>>> is usually
>>>>>>>>>> loaded before kernel init/resume is completed.
>>>>>>>>>
>>>>>>>>> Motivation to add these complications into i915 are not clear 
>>>>>>>>> to me here. I mean there is no HuC on DG2 _yet_ is the premise 
>>>>>>>>> of the series, right? So no backwards compatibility concerns. 
>>>>>>>>> In this case why jump through the hoops and not let userspace 
>>>>>>>>> handle all of this by just leaving the getparam return the 
>>>>>>>>> true status?
>>>>>>>>
>>>>>>>> The main areas impacted by the fact that we can't guarantee 
>>>>>>>> that HuC load is complete when i915 starts accepting 
>>>>>>>> submissions are boot and suspend/resume, with the latter being 
>>>>>>>> the main problem; GT reset is not a concern because HuC now 
>>>>>>>> survives it. A suspend/resume can be transparent to userspace 
>>>>>>>> and therefore the HuC status can temporarily flip from loaded 
>>>>>>>> to not without userspace knowledge, especially if we start 
>>>>>>>> going into deeper suspend states and start causing HuC resets 
>>>>>>>> when we go into runtime suspend. Note that this is different 
>>>>>>>> from what happens during GT reset for older platforms, because 
>>>>>>>> in that scenario we guarantee that HuC reload is complete 
>>>>>>>> before we restart the submission back-end, so userspace doesn't 
>>>>>>>> notice that the HuC status change. We had an internal 
>>>>>>>> discussion about this problem with both media and i915 archs 
>>>>>>>> and the conclusion was that the best option is for i915 to 
>>>>>>>> stall media submission while HuC (re-)load is in progress.
>>>>>>>
>>>>>>> Resume is potentialy a good reason - I did not pick up on that 
>>>>>>> from the cover letter. I read the statement about the unlikely 
>>>>>>> and small window where HuC is not loaded during kernel 
>>>>>>> init/resume and I guess did not pick up on the resume part.
>>>>>>>
>>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>>>>
>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume can't 
>>>>>> start until i915 resume completes.
>>>>>
>>>>> I'll dig into this in the next few days since I want to understand 
>>>>> how exactly it works. Or someone can help explain.
>>>>>
>>>>> If in the end conclusion will be that i915 resume indeed cannot 
>>>>> wait for GSC, then I think auto-blocking of queued up contexts on 
>>>>> media engines indeed sounds unavoidable. Otherwise, as you 
>>>>> explained, user experience post resume wouldn't be good.
>>>>
>>>> Even if we could implement a wait, I'm not sure we should. GSC 
>>>> resume and HuC reload takes ~300ms in most cases, I don't think we 
>>>> want to block within the i915 resume path for that long.
>>>
>>> Yeah maybe not. But entertaining the idea that it is technically 
>>> possible to block - we could perhaps add uapi for userspace to mark 
>>> contexts which want HuC access. Then track if there are any such 
>>> contexts with outstanding submissions and only wait in resume if 
>>> there are. If that would end up significantly less code on the i915 
>>> side to maintain is an open.
>>>
>>> What would be the end result from users point of view in case where 
>>> it suspended during video playback? The proposed solution from this 
>>> series sees the video stuck after resume. Maybe compositor blocks as 
>>> well since I am not sure how well they handle one window not 
>>> providing new data. Probably depends on the compositor.
>>>
>>> And then with a simpler solution definitely the whole resume would 
>>> be delayed by 300ms.
>>>
>>> With my ChromeOS hat the stalled media engines does sound like a 
>>> better solution. But with the maintainer hat I'd like all options 
>>> evaluated since there is attractiveness if a good enough solution 
>>> can be achieved with significantly less kernel code.
>>>
>>> You say 300ms is typical time for HuC load. How long it is on other 
>>> platforms? If much faster then why is it so slow here?
>>
>> The GSC itself has to come out of suspend before it can perform the 
>> load, which takes a few tens of ms I believe. AFAIU the GSC is also 
>> slower in processing the HuC load and auth compared to the legacy 
>> path. The GSC FW team gave a 250ms limit for the time the GSC FW 
>> needs from start of the resume flow to HuC load complete, so I bumped 
>> that to ~300ms to account for all other SW interactions, plus a bit 
>> of buffer. Note that a bit of the SW overhead is caused by the fact 
>> that we have 2 mei modules in play here: mei-gsc, which manages the 
>> GSC device itself (including resume), and mei-pxp, which owns the pxp 
>> messaging, including HuC load.
>
> And how long on other platforms (not DG2) do you know? Presumably 
> there the wait is on the i915 resume path?

I don't have "official" expected load times at hand, but looking at the 
BAT boot logs for this series for DG1 I see it takes ~10 ms to load both 
GuC and HuC:

<7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC 
loads huc=no
<6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware 
i915/dg1_guc_70.1.1.bin version 70.1
<6>[    8.158634] i915 0000:03:00.0: [drm] HuC firmware 
i915/dg1_huc_7.9.3.bin version 7.9
<7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication 
[i915]] GuC communication enabled
<6>[    8.166111] i915 0000:03:00.0: [drm] HuC authenticated

Note that we increase the GT frequency all the way to the max before 
starting the FW load, which speeds things up.

>
>>>>> However, do we really need to lie in the getparam? How about 
>>>>> extend or add a new one to separate the loading vs loaded states? 
>>>>> Since userspace does not support DG2 HuC yet this should be doable.
>>>>
>>>> I don't really have a preference here. The media team asked us to 
>>>> do it this way because they wouldn't have a use for the different 
>>>> "in progress" and "done" states. If they're ok with having separate 
>>>> flags that's fine by me.
>>>> Tony, any feedback here?
>>>
>>> We don't even have any docs in i915_drm.h in terms of what it means:
>>>
>>> #define I915_PARAM_HUC_STATUS         42
>>>
>>> Seems to be a boolean. Status false vs true? Could you add some docs?
>>
>> There is documentation above intel_huc_check_status(), which is also 
>> updated in this series. I can move that to i915_drm.h.
>
> That would be great, thanks.
>
> And with so rich return codes already documented and exposed via uapi 
> - would we really need to add anything new for DG2 apart for userspace 
> to know that if zero is returned (not a negative error value) it 
> should retry? I mean is there another negative error missing which 
> would prevent zero transitioning to one?

I think if the auth fails we currently return 0, because the uc state in 
that case would be "TRANSFERRED", i.e. DMA complete but not fully 
enabled. I don't have anything against changing the FW state to "ERROR" 
in this scenario and leave the 0 to mean "not done yet", but I'd prefer 
the media team to comment on their needs for this IOCTL before 
committing to anything.

Daniele

>
> Regards,
>
> Tvrtko
>
>>
>> Daniele
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>>
>>>> Thanks,
>>>> Daniele
>>>>
>>>>>
>>>>>>> Will there be runtime suspend happening on the GSC device behind 
>>>>>>> i915's back, or i915 and GSC will always be able to transition 
>>>>>>> the states in tandem?
>>>>>>
>>>>>> They're always in sync. The GSC is part of the same HW PCI device 
>>>>>> as the rest of the GPU, so they change HW state together.
>>>>>
>>>>> Okay thanks, I wasn't sure if it is the same or separate device.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Tvrtko
>>>>
>>

Ye, Tony June 14, 2022, 11:15 p.m. UTC | #11

On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>
>
> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>
>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>
>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP 
>>>>>>>>>>> command. The load
>>>>>>>>>>> operation itself is relatively simple (just send a message 
>>>>>>>>>>> to the GSC
>>>>>>>>>>> with the physical address of the HuC in LMEM), but there are 
>>>>>>>>>>> timing
>>>>>>>>>>> changes that requires special attention. In particular, to 
>>>>>>>>>>> send a PXP
>>>>>>>>>>> command we need to first export the GSC driver and then wait 
>>>>>>>>>>> for the
>>>>>>>>>>> mei-gsc and mei-pxp modules to start, which means that HuC 
>>>>>>>>>>> load will
>>>>>>>>>>> complete after i915 load is complete. This means that there 
>>>>>>>>>>> is a small
>>>>>>>>>>> window of time after i915 is registered and before HuC is 
>>>>>>>>>>> loaded
>>>>>>>>>>> during which userspace could submit and/or checking the HuC 
>>>>>>>>>>> load status,
>>>>>>>>>>> although this is quite unlikely to happen (HuC is usually 
>>>>>>>>>>> loaded before
>>>>>>>>>>> kernel init/resume completes).
>>>>>>>>>>> We've consulted with the media team in regards to how to 
>>>>>>>>>>> handle this and
>>>>>>>>>>> they've asked us to do the following:
>>>>>>>>>>>
>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load 
>>>>>>>>>>> is still in
>>>>>>>>>>> progress. The media driver uses the IOCTL as a way to check 
>>>>>>>>>>> if HuC is
>>>>>>>>>>> enabled and then includes a secondary check in the batches 
>>>>>>>>>>> to get the
>>>>>>>>>>> actual status, so doing it this way allows userspace to keep 
>>>>>>>>>>> working
>>>>>>>>>>> without changes.
>>>>>>>>>>>
>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded. 
>>>>>>>>>>> Stalls are
>>>>>>>>>>> expected to be very rare (if any), due to the fact that HuC 
>>>>>>>>>>> is usually
>>>>>>>>>>> loaded before kernel init/resume is completed.
>>>>>>>>>>
>>>>>>>>>> Motivation to add these complications into i915 are not clear 
>>>>>>>>>> to me here. I mean there is no HuC on DG2 _yet_ is the 
>>>>>>>>>> premise of the series, right? So no backwards compatibility 
>>>>>>>>>> concerns. In this case why jump through the hoops and not let 
>>>>>>>>>> userspace handle all of this by just leaving the getparam 
>>>>>>>>>> return the true status?
>>>>>>>>>
>>>>>>>>> The main areas impacted by the fact that we can't guarantee 
>>>>>>>>> that HuC load is complete when i915 starts accepting 
>>>>>>>>> submissions are boot and suspend/resume, with the latter being 
>>>>>>>>> the main problem; GT reset is not a concern because HuC now 
>>>>>>>>> survives it. A suspend/resume can be transparent to userspace 
>>>>>>>>> and therefore the HuC status can temporarily flip from loaded 
>>>>>>>>> to not without userspace knowledge, especially if we start 
>>>>>>>>> going into deeper suspend states and start causing HuC resets 
>>>>>>>>> when we go into runtime suspend. Note that this is different 
>>>>>>>>> from what happens during GT reset for older platforms, because 
>>>>>>>>> in that scenario we guarantee that HuC reload is complete 
>>>>>>>>> before we restart the submission back-end, so userspace 
>>>>>>>>> doesn't notice that the HuC status change. We had an internal 
>>>>>>>>> discussion about this problem with both media and i915 archs 
>>>>>>>>> and the conclusion was that the best option is for i915 to 
>>>>>>>>> stall media submission while HuC (re-)load is in progress.
>>>>>>>>
>>>>>>>> Resume is potentialy a good reason - I did not pick up on that 
>>>>>>>> from the cover letter. I read the statement about the unlikely 
>>>>>>>> and small window where HuC is not loaded during kernel 
>>>>>>>> init/resume and I guess did not pick up on the resume part.
>>>>>>>>
>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>>>>>
>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume can't 
>>>>>>> start until i915 resume completes.
>>>>>>
>>>>>> I'll dig into this in the next few days since I want to 
>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>
>>>>>> If in the end conclusion will be that i915 resume indeed cannot 
>>>>>> wait for GSC, then I think auto-blocking of queued up contexts on 
>>>>>> media engines indeed sounds unavoidable. Otherwise, as you 
>>>>>> explained, user experience post resume wouldn't be good.
>>>>>
>>>>> Even if we could implement a wait, I'm not sure we should. GSC 
>>>>> resume and HuC reload takes ~300ms in most cases, I don't think we 
>>>>> want to block within the i915 resume path for that long.
>>>>
>>>> Yeah maybe not. But entertaining the idea that it is technically 
>>>> possible to block - we could perhaps add uapi for userspace to mark 
>>>> contexts which want HuC access. Then track if there are any such 
>>>> contexts with outstanding submissions and only wait in resume if 
>>>> there are. If that would end up significantly less code on the i915 
>>>> side to maintain is an open.
>>>>
>>>> What would be the end result from users point of view in case where 
>>>> it suspended during video playback? The proposed solution from this 
>>>> series sees the video stuck after resume. Maybe compositor blocks 
>>>> as well since I am not sure how well they handle one window not 
>>>> providing new data. Probably depends on the compositor.
>>>>
>>>> And then with a simpler solution definitely the whole resume would 
>>>> be delayed by 300ms.
>>>>
>>>> With my ChromeOS hat the stalled media engines does sound like a 
>>>> better solution. But with the maintainer hat I'd like all options 
>>>> evaluated since there is attractiveness if a good enough solution 
>>>> can be achieved with significantly less kernel code.
>>>>
>>>> You say 300ms is typical time for HuC load. How long it is on other 
>>>> platforms? If much faster then why is it so slow here?
>>>
>>> The GSC itself has to come out of suspend before it can perform the 
>>> load, which takes a few tens of ms I believe. AFAIU the GSC is also 
>>> slower in processing the HuC load and auth compared to the legacy 
>>> path. The GSC FW team gave a 250ms limit for the time the GSC FW 
>>> needs from start of the resume flow to HuC load complete, so I 
>>> bumped that to ~300ms to account for all other SW interactions, plus 
>>> a bit of buffer. Note that a bit of the SW overhead is caused by the 
>>> fact that we have 2 mei modules in play here: mei-gsc, which manages 
>>> the GSC device itself (including resume), and mei-pxp, which owns 
>>> the pxp messaging, including HuC load.
>>
>> And how long on other platforms (not DG2) do you know? Presumably 
>> there the wait is on the i915 resume path?
>
> I don't have "official" expected load times at hand, but looking at 
> the BAT boot logs for this series for DG1 I see it takes ~10 ms to 
> load both GuC and HuC:
>
> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC 
> loads huc=no
> <6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware 
> i915/dg1_guc_70.1.1.bin version 70.1
> <6>[    8.158634] i915 0000:03:00.0: [drm] HuC firmware 
> i915/dg1_huc_7.9.3.bin version 7.9
> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication 
> [i915]] GuC communication enabled
> <6>[    8.166111] i915 0000:03:00.0: [drm] HuC authenticated
>
> Note that we increase the GT frequency all the way to the max before 
> starting the FW load, which speeds things up.
>
>>
>>>>>> However, do we really need to lie in the getparam? How about 
>>>>>> extend or add a new one to separate the loading vs loaded states? 
>>>>>> Since userspace does not support DG2 HuC yet this should be doable.
>>>>>
>>>>> I don't really have a preference here. The media team asked us to 
>>>>> do it this way because they wouldn't have a use for the different 
>>>>> "in progress" and "done" states. If they're ok with having 
>>>>> separate flags that's fine by me.
>>>>> Tony, any feedback here?
>>>>
>>>> We don't even have any docs in i915_drm.h in terms of what it means:
>>>>
>>>> #define I915_PARAM_HUC_STATUS         42
>>>>
>>>> Seems to be a boolean. Status false vs true? Could you add some docs?
>>>
>>> There is documentation above intel_huc_check_status(), which is also 
>>> updated in this series. I can move that to i915_drm.h.
>>
>> That would be great, thanks.
>>
>> And with so rich return codes already documented and exposed via uapi 
>> - would we really need to add anything new for DG2 apart for 
>> userspace to know that if zero is returned (not a negative error 
>> value) it should retry? I mean is there another negative error 
>> missing which would prevent zero transitioning to one?
>
> I think if the auth fails we currently return 0, because the uc state 
> in that case would be "TRANSFERRED", i.e. DMA complete but not fully 
> enabled. I don't have anything against changing the FW state to 
> "ERROR" in this scenario and leave the 0 to mean "not done yet", but 
> I'd prefer the media team to comment on their needs for this IOCTL 
> before committing to anything.


Currently media doesn't differentiate "delayed loading is in progress" 
with "HuC is authenticated and running". If the HuC authentication 
eventually fails, the user needs to check the debugfs to know the 
reason. IMHO, it's not a big problem as this is what we do even when the 
IOCTL returns non-zero values. + Carl to comment.

Thanks,

Tony

>
> Daniele
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>> Daniele
>>>
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>>>
>>>>> Thanks,
>>>>> Daniele
>>>>>
>>>>>>
>>>>>>>> Will there be runtime suspend happening on the GSC device 
>>>>>>>> behind i915's back, or i915 and GSC will always be able to 
>>>>>>>> transition the states in tandem?
>>>>>>>
>>>>>>> They're always in sync. The GSC is part of the same HW PCI 
>>>>>>> device as the rest of the GPU, so they change HW state together.
>>>>>>
>>>>>> Okay thanks, I wasn't sure if it is the same or separate device.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Tvrtko
>>>>>
>>>
>

Tvrtko Ursulin June 15, 2022, 10:13 a.m. UTC | #12

On 15/06/2022 00:15, Ye, Tony wrote:
> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>
>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>
>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP 
>>>>>>>>>>>> command. The load
>>>>>>>>>>>> operation itself is relatively simple (just send a message 
>>>>>>>>>>>> to the GSC
>>>>>>>>>>>> with the physical address of the HuC in LMEM), but there are 
>>>>>>>>>>>> timing
>>>>>>>>>>>> changes that requires special attention. In particular, to 
>>>>>>>>>>>> send a PXP
>>>>>>>>>>>> command we need to first export the GSC driver and then wait 
>>>>>>>>>>>> for the
>>>>>>>>>>>> mei-gsc and mei-pxp modules to start, which means that HuC 
>>>>>>>>>>>> load will
>>>>>>>>>>>> complete after i915 load is complete. This means that there 
>>>>>>>>>>>> is a small
>>>>>>>>>>>> window of time after i915 is registered and before HuC is 
>>>>>>>>>>>> loaded
>>>>>>>>>>>> during which userspace could submit and/or checking the HuC 
>>>>>>>>>>>> load status,
>>>>>>>>>>>> although this is quite unlikely to happen (HuC is usually 
>>>>>>>>>>>> loaded before
>>>>>>>>>>>> kernel init/resume completes).
>>>>>>>>>>>> We've consulted with the media team in regards to how to 
>>>>>>>>>>>> handle this and
>>>>>>>>>>>> they've asked us to do the following:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load 
>>>>>>>>>>>> is still in
>>>>>>>>>>>> progress. The media driver uses the IOCTL as a way to check 
>>>>>>>>>>>> if HuC is
>>>>>>>>>>>> enabled and then includes a secondary check in the batches 
>>>>>>>>>>>> to get the
>>>>>>>>>>>> actual status, so doing it this way allows userspace to keep 
>>>>>>>>>>>> working
>>>>>>>>>>>> without changes.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded. 
>>>>>>>>>>>> Stalls are
>>>>>>>>>>>> expected to be very rare (if any), due to the fact that HuC 
>>>>>>>>>>>> is usually
>>>>>>>>>>>> loaded before kernel init/resume is completed.
>>>>>>>>>>>
>>>>>>>>>>> Motivation to add these complications into i915 are not clear 
>>>>>>>>>>> to me here. I mean there is no HuC on DG2 _yet_ is the 
>>>>>>>>>>> premise of the series, right? So no backwards compatibility 
>>>>>>>>>>> concerns. In this case why jump through the hoops and not let 
>>>>>>>>>>> userspace handle all of this by just leaving the getparam 
>>>>>>>>>>> return the true status?
>>>>>>>>>>
>>>>>>>>>> The main areas impacted by the fact that we can't guarantee 
>>>>>>>>>> that HuC load is complete when i915 starts accepting 
>>>>>>>>>> submissions are boot and suspend/resume, with the latter being 
>>>>>>>>>> the main problem; GT reset is not a concern because HuC now 
>>>>>>>>>> survives it. A suspend/resume can be transparent to userspace 
>>>>>>>>>> and therefore the HuC status can temporarily flip from loaded 
>>>>>>>>>> to not without userspace knowledge, especially if we start 
>>>>>>>>>> going into deeper suspend states and start causing HuC resets 
>>>>>>>>>> when we go into runtime suspend. Note that this is different 
>>>>>>>>>> from what happens during GT reset for older platforms, because 
>>>>>>>>>> in that scenario we guarantee that HuC reload is complete 
>>>>>>>>>> before we restart the submission back-end, so userspace 
>>>>>>>>>> doesn't notice that the HuC status change. We had an internal 
>>>>>>>>>> discussion about this problem with both media and i915 archs 
>>>>>>>>>> and the conclusion was that the best option is for i915 to 
>>>>>>>>>> stall media submission while HuC (re-)load is in progress.
>>>>>>>>>
>>>>>>>>> Resume is potentialy a good reason - I did not pick up on that 
>>>>>>>>> from the cover letter. I read the statement about the unlikely 
>>>>>>>>> and small window where HuC is not loaded during kernel 
>>>>>>>>> init/resume and I guess did not pick up on the resume part.
>>>>>>>>>
>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>>>>>>
>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume can't 
>>>>>>>> start until i915 resume completes.
>>>>>>>
>>>>>>> I'll dig into this in the next few days since I want to 
>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>
>>>>>>> If in the end conclusion will be that i915 resume indeed cannot 
>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts on 
>>>>>>> media engines indeed sounds unavoidable. Otherwise, as you 
>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>
>>>>>> Even if we could implement a wait, I'm not sure we should. GSC 
>>>>>> resume and HuC reload takes ~300ms in most cases, I don't think we 
>>>>>> want to block within the i915 resume path for that long.
>>>>>
>>>>> Yeah maybe not. But entertaining the idea that it is technically 
>>>>> possible to block - we could perhaps add uapi for userspace to mark 
>>>>> contexts which want HuC access. Then track if there are any such 
>>>>> contexts with outstanding submissions and only wait in resume if 
>>>>> there are. If that would end up significantly less code on the i915 
>>>>> side to maintain is an open.
>>>>>
>>>>> What would be the end result from users point of view in case where 
>>>>> it suspended during video playback? The proposed solution from this 
>>>>> series sees the video stuck after resume. Maybe compositor blocks 
>>>>> as well since I am not sure how well they handle one window not 
>>>>> providing new data. Probably depends on the compositor.
>>>>>
>>>>> And then with a simpler solution definitely the whole resume would 
>>>>> be delayed by 300ms.
>>>>>
>>>>> With my ChromeOS hat the stalled media engines does sound like a 
>>>>> better solution. But with the maintainer hat I'd like all options 
>>>>> evaluated since there is attractiveness if a good enough solution 
>>>>> can be achieved with significantly less kernel code.
>>>>>
>>>>> You say 300ms is typical time for HuC load. How long it is on other 
>>>>> platforms? If much faster then why is it so slow here?
>>>>
>>>> The GSC itself has to come out of suspend before it can perform the 
>>>> load, which takes a few tens of ms I believe. AFAIU the GSC is also 
>>>> slower in processing the HuC load and auth compared to the legacy 
>>>> path. The GSC FW team gave a 250ms limit for the time the GSC FW 
>>>> needs from start of the resume flow to HuC load complete, so I 
>>>> bumped that to ~300ms to account for all other SW interactions, plus 
>>>> a bit of buffer. Note that a bit of the SW overhead is caused by the 
>>>> fact that we have 2 mei modules in play here: mei-gsc, which manages 
>>>> the GSC device itself (including resume), and mei-pxp, which owns 
>>>> the pxp messaging, including HuC load.
>>>
>>> And how long on other platforms (not DG2) do you know? Presumably 
>>> there the wait is on the i915 resume path?
>>
>> I don't have "official" expected load times at hand, but looking at 
>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to 
>> load both GuC and HuC:
>>
>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC 
>> loads huc=no
>> <6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware 
>> i915/dg1_guc_70.1.1.bin version 70.1
>> <6>[    8.158634] i915 0000:03:00.0: [drm] HuC firmware 
>> i915/dg1_huc_7.9.3.bin version 7.9
>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication 
>> [i915]] GuC communication enabled
>> <6>[    8.166111] i915 0000:03:00.0: [drm] HuC authenticated
>>
>> Note that we increase the GT frequency all the way to the max before 
>> starting the FW load, which speeds things up.
>>
>>>
>>>>>>> However, do we really need to lie in the getparam? How about 
>>>>>>> extend or add a new one to separate the loading vs loaded states? 
>>>>>>> Since userspace does not support DG2 HuC yet this should be doable.
>>>>>>
>>>>>> I don't really have a preference here. The media team asked us to 
>>>>>> do it this way because they wouldn't have a use for the different 
>>>>>> "in progress" and "done" states. If they're ok with having 
>>>>>> separate flags that's fine by me.
>>>>>> Tony, any feedback here?
>>>>>
>>>>> We don't even have any docs in i915_drm.h in terms of what it means:
>>>>>
>>>>> #define I915_PARAM_HUC_STATUS         42
>>>>>
>>>>> Seems to be a boolean. Status false vs true? Could you add some docs?
>>>>
>>>> There is documentation above intel_huc_check_status(), which is also 
>>>> updated in this series. I can move that to i915_drm.h.
>>>
>>> That would be great, thanks.
>>>
>>> And with so rich return codes already documented and exposed via uapi 
>>> - would we really need to add anything new for DG2 apart for 
>>> userspace to know that if zero is returned (not a negative error 
>>> value) it should retry? I mean is there another negative error 
>>> missing which would prevent zero transitioning to one?
>>
>> I think if the auth fails we currently return 0, because the uc state 
>> in that case would be "TRANSFERRED", i.e. DMA complete but not fully 
>> enabled. I don't have anything against changing the FW state to 
>> "ERROR" in this scenario and leave the 0 to mean "not done yet", but 
>> I'd prefer the media team to comment on their needs for this IOCTL 
>> before committing to anything.
> 
> 
> Currently media doesn't differentiate "delayed loading is in progress" 
> with "HuC is authenticated and running". If the HuC authentication 
> eventually fails, the user needs to check the debugfs to know the 
> reason. IMHO, it's not a big problem as this is what we do even when the 
> IOCTL returns non-zero values. + Carl to comment.

(Side note - debugfs can be assumed to not exist so it is not interesting to users.)

There isn't currently a "delayed loading is in progress" state, that's the discussion in this thread, if and how to add it.

Getparam it currently documents these states:

  -ENODEV if HuC is not present on this platform,
  -EOPNOTSUPP if HuC firmware is disabled,
  -ENOPKG if HuC firmware was not installed,
  -ENOEXEC if HuC firmware is invalid or mismatched,
  0 if HuC firmware is not running,
  1 if HuC firmware is authenticated and running.

This patch proposed to change this to:

  1 if HuC firmware is authenticated and running or if delayed load is in progress,
  0 if HuC firmware is not running and delayed load is not in progress

Alternative idea is for DG2 (well in general) to add some more fine grained states, so that i915 does not have to use 1 for both running and loading. This may be adding a new error code for auth fails as Daniele mentioned. Then UMD can know that if 0 is returned and platform is DG2 it needs to query it again since it will transition to either 1 or error eventually. This would mean the non error states would be:

  0 not running (aka loading)
  1 running (and authenticated)

@Daniele - one more thing - can you make sure in the series (if you haven't already) that if HuC status was in any error before suspend reload is not re-tried on resume? My thinking is that the error is likely to persist and we don't want to impose long delay on every resume afterwards. Makes sense to you?

@Tony - one more question for the UMD. Or two.

How prevalent is usage of HuC on DG2 depending on what codecs need it? Do you know in advance, before creating a GEM context, that HuC commands will be sent to the engine or this changes at runtime?

Regards,

Tvrtko

Daniele Ceraolo Spurio June 15, 2022, 2:35 p.m. UTC | #13

On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
>
> On 15/06/2022 00:15, Ye, Tony wrote:
>> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>>
>>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>
>>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP 
>>>>>>>>>>>>> command. The load
>>>>>>>>>>>>> operation itself is relatively simple (just send a message 
>>>>>>>>>>>>> to the GSC
>>>>>>>>>>>>> with the physical address of the HuC in LMEM), but there 
>>>>>>>>>>>>> are timing
>>>>>>>>>>>>> changes that requires special attention. In particular, to 
>>>>>>>>>>>>> send a PXP
>>>>>>>>>>>>> command we need to first export the GSC driver and then 
>>>>>>>>>>>>> wait for the
>>>>>>>>>>>>> mei-gsc and mei-pxp modules to start, which means that HuC 
>>>>>>>>>>>>> load will
>>>>>>>>>>>>> complete after i915 load is complete. This means that 
>>>>>>>>>>>>> there is a small
>>>>>>>>>>>>> window of time after i915 is registered and before HuC is 
>>>>>>>>>>>>> loaded
>>>>>>>>>>>>> during which userspace could submit and/or checking the 
>>>>>>>>>>>>> HuC load status,
>>>>>>>>>>>>> although this is quite unlikely to happen (HuC is usually 
>>>>>>>>>>>>> loaded before
>>>>>>>>>>>>> kernel init/resume completes).
>>>>>>>>>>>>> We've consulted with the media team in regards to how to 
>>>>>>>>>>>>> handle this and
>>>>>>>>>>>>> they've asked us to do the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load 
>>>>>>>>>>>>> is still in
>>>>>>>>>>>>> progress. The media driver uses the IOCTL as a way to 
>>>>>>>>>>>>> check if HuC is
>>>>>>>>>>>>> enabled and then includes a secondary check in the batches 
>>>>>>>>>>>>> to get the
>>>>>>>>>>>>> actual status, so doing it this way allows userspace to 
>>>>>>>>>>>>> keep working
>>>>>>>>>>>>> without changes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded. 
>>>>>>>>>>>>> Stalls are
>>>>>>>>>>>>> expected to be very rare (if any), due to the fact that 
>>>>>>>>>>>>> HuC is usually
>>>>>>>>>>>>> loaded before kernel init/resume is completed.
>>>>>>>>>>>>
>>>>>>>>>>>> Motivation to add these complications into i915 are not 
>>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is 
>>>>>>>>>>>> the premise of the series, right? So no backwards 
>>>>>>>>>>>> compatibility concerns. In this case why jump through the 
>>>>>>>>>>>> hoops and not let userspace handle all of this by just 
>>>>>>>>>>>> leaving the getparam return the true status?
>>>>>>>>>>>
>>>>>>>>>>> The main areas impacted by the fact that we can't guarantee 
>>>>>>>>>>> that HuC load is complete when i915 starts accepting 
>>>>>>>>>>> submissions are boot and suspend/resume, with the latter 
>>>>>>>>>>> being the main problem; GT reset is not a concern because 
>>>>>>>>>>> HuC now survives it. A suspend/resume can be transparent to 
>>>>>>>>>>> userspace and therefore the HuC status can temporarily flip 
>>>>>>>>>>> from loaded to not without userspace knowledge, especially 
>>>>>>>>>>> if we start going into deeper suspend states and start 
>>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note 
>>>>>>>>>>> that this is different from what happens during GT reset for 
>>>>>>>>>>> older platforms, because in that scenario we guarantee that 
>>>>>>>>>>> HuC reload is complete before we restart the submission 
>>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status 
>>>>>>>>>>> change. We had an internal discussion about this problem 
>>>>>>>>>>> with both media and i915 archs and the conclusion was that 
>>>>>>>>>>> the best option is for i915 to stall media submission while 
>>>>>>>>>>> HuC (re-)load is in progress.
>>>>>>>>>>
>>>>>>>>>> Resume is potentialy a good reason - I did not pick up on 
>>>>>>>>>> that from the cover letter. I read the statement about the 
>>>>>>>>>> unlikely and small window where HuC is not loaded during 
>>>>>>>>>> kernel init/resume and I guess did not pick up on the resume 
>>>>>>>>>> part.
>>>>>>>>>>
>>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>>>>>>>
>>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume 
>>>>>>>>> can't start until i915 resume completes.
>>>>>>>>
>>>>>>>> I'll dig into this in the next few days since I want to 
>>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>>
>>>>>>>> If in the end conclusion will be that i915 resume indeed cannot 
>>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts 
>>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you 
>>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>>
>>>>>>> Even if we could implement a wait, I'm not sure we should. GSC 
>>>>>>> resume and HuC reload takes ~300ms in most cases, I don't think 
>>>>>>> we want to block within the i915 resume path for that long.
>>>>>>
>>>>>> Yeah maybe not. But entertaining the idea that it is technically 
>>>>>> possible to block - we could perhaps add uapi for userspace to 
>>>>>> mark contexts which want HuC access. Then track if there are any 
>>>>>> such contexts with outstanding submissions and only wait in 
>>>>>> resume if there are. If that would end up significantly less code 
>>>>>> on the i915 side to maintain is an open.
>>>>>>
>>>>>> What would be the end result from users point of view in case 
>>>>>> where it suspended during video playback? The proposed solution 
>>>>>> from this series sees the video stuck after resume. Maybe 
>>>>>> compositor blocks as well since I am not sure how well they 
>>>>>> handle one window not providing new data. Probably depends on the 
>>>>>> compositor.
>>>>>>
>>>>>> And then with a simpler solution definitely the whole resume 
>>>>>> would be delayed by 300ms.
>>>>>>
>>>>>> With my ChromeOS hat the stalled media engines does sound like a 
>>>>>> better solution. But with the maintainer hat I'd like all options 
>>>>>> evaluated since there is attractiveness if a good enough solution 
>>>>>> can be achieved with significantly less kernel code.
>>>>>>
>>>>>> You say 300ms is typical time for HuC load. How long it is on 
>>>>>> other platforms? If much faster then why is it so slow here?
>>>>>
>>>>> The GSC itself has to come out of suspend before it can perform 
>>>>> the load, which takes a few tens of ms I believe. AFAIU the GSC is 
>>>>> also slower in processing the HuC load and auth compared to the 
>>>>> legacy path. The GSC FW team gave a 250ms limit for the time the 
>>>>> GSC FW needs from start of the resume flow to HuC load complete, 
>>>>> so I bumped that to ~300ms to account for all other SW 
>>>>> interactions, plus a bit of buffer. Note that a bit of the SW 
>>>>> overhead is caused by the fact that we have 2 mei modules in play 
>>>>> here: mei-gsc, which manages the GSC device itself (including 
>>>>> resume), and mei-pxp, which owns the pxp messaging, including HuC 
>>>>> load.
>>>>
>>>> And how long on other platforms (not DG2) do you know? Presumably 
>>>> there the wait is on the i915 resume path?
>>>
>>> I don't have "official" expected load times at hand, but looking at 
>>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to 
>>> load both GuC and HuC:
>>>
>>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC 
>>> loads huc=no
>>> <6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware 
>>> i915/dg1_guc_70.1.1.bin version 70.1
>>> <6>[    8.158634] i915 0000:03:00.0: [drm] HuC firmware 
>>> i915/dg1_huc_7.9.3.bin version 7.9
>>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication 
>>> [i915]] GuC communication enabled
>>> <6>[    8.166111] i915 0000:03:00.0: [drm] HuC authenticated
>>>
>>> Note that we increase the GT frequency all the way to the max before 
>>> starting the FW load, which speeds things up.
>>>
>>>>
>>>>>>>> However, do we really need to lie in the getparam? How about 
>>>>>>>> extend or add a new one to separate the loading vs loaded 
>>>>>>>> states? Since userspace does not support DG2 HuC yet this 
>>>>>>>> should be doable.
>>>>>>>
>>>>>>> I don't really have a preference here. The media team asked us 
>>>>>>> to do it this way because they wouldn't have a use for the 
>>>>>>> different "in progress" and "done" states. If they're ok with 
>>>>>>> having separate flags that's fine by me.
>>>>>>> Tony, any feedback here?
>>>>>>
>>>>>> We don't even have any docs in i915_drm.h in terms of what it means:
>>>>>>
>>>>>> #define I915_PARAM_HUC_STATUS         42
>>>>>>
>>>>>> Seems to be a boolean. Status false vs true? Could you add some 
>>>>>> docs?
>>>>>
>>>>> There is documentation above intel_huc_check_status(), which is 
>>>>> also updated in this series. I can move that to i915_drm.h.
>>>>
>>>> That would be great, thanks.
>>>>
>>>> And with so rich return codes already documented and exposed via 
>>>> uapi - would we really need to add anything new for DG2 apart for 
>>>> userspace to know that if zero is returned (not a negative error 
>>>> value) it should retry? I mean is there another negative error 
>>>> missing which would prevent zero transitioning to one?
>>>
>>> I think if the auth fails we currently return 0, because the uc 
>>> state in that case would be "TRANSFERRED", i.e. DMA complete but not 
>>> fully enabled. I don't have anything against changing the FW state 
>>> to "ERROR" in this scenario and leave the 0 to mean "not done yet", 
>>> but I'd prefer the media team to comment on their needs for this 
>>> IOCTL before committing to anything.
>>
>>
>> Currently media doesn't differentiate "delayed loading is in 
>> progress" with "HuC is authenticated and running". If the HuC 
>> authentication eventually fails, the user needs to check the debugfs 
>> to know the reason. IMHO, it's not a big problem as this is what we 
>> do even when the IOCTL returns non-zero values. + Carl to comment.
>
> (Side note - debugfs can be assumed to not exist so it is not 
> interesting to users.)
>
> There isn't currently a "delayed loading is in progress" state, that's 
> the discussion in this thread, if and how to add it.
>
> Getparam it currently documents these states:
>
>  -ENODEV if HuC is not present on this platform,
>  -EOPNOTSUPP if HuC firmware is disabled,
>  -ENOPKG if HuC firmware was not installed,
>  -ENOEXEC if HuC firmware is invalid or mismatched,
>  0 if HuC firmware is not running,
>  1 if HuC firmware is authenticated and running.
>
> This patch proposed to change this to:
>
>  1 if HuC firmware is authenticated and running or if delayed load is 
> in progress,
>  0 if HuC firmware is not running and delayed load is not in progress
>
> Alternative idea is for DG2 (well in general) to add some more fine 
> grained states, so that i915 does not have to use 1 for both running 
> and loading. This may be adding a new error code for auth fails as 
> Daniele mentioned. Then UMD can know that if 0 is returned and 
> platform is DG2 it needs to query it again since it will transition to 
> either 1 or error eventually. This would mean the non error states 
> would be:
>
>  0 not running (aka loading)
>  1 running (and authenticated)
>
> @Daniele - one more thing - can you make sure in the series (if you 
> haven't already) that if HuC status was in any error before suspend 
> reload is not re-tried on resume? My thinking is that the error is 
> likely to persist and we don't want to impose long delay on every 
> resume afterwards. Makes sense to you?

This series does not stall any submissions on resume if there previously 
was an issue with the HuC load (the fence is left as completed), but 
we'll still attempt to re-load the HuC in the background if mei-gsc and 
mei-pxp are successful in their resume and call back into i915. Does 
that work for you?

Daniele

>
> @Tony - one more question for the UMD. Or two.
>
> How prevalent is usage of HuC on DG2 depending on what codecs need it? 
> Do you know in advance, before creating a GEM context, that HuC 
> commands will be sent to the engine or this changes at runtime?
>
> Regards,
>
> Tvrtko

Tvrtko Ursulin June 15, 2022, 2:53 p.m. UTC | #14

On 15/06/2022 15:35, Ceraolo Spurio, Daniele wrote:
> On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
>>
>> On 15/06/2022 00:15, Ye, Tony wrote:
>>> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>>>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP 
>>>>>>>>>>>>>> command. The load
>>>>>>>>>>>>>> operation itself is relatively simple (just send a message 
>>>>>>>>>>>>>> to the GSC
>>>>>>>>>>>>>> with the physical address of the HuC in LMEM), but there 
>>>>>>>>>>>>>> are timing
>>>>>>>>>>>>>> changes that requires special attention. In particular, to 
>>>>>>>>>>>>>> send a PXP
>>>>>>>>>>>>>> command we need to first export the GSC driver and then 
>>>>>>>>>>>>>> wait for the
>>>>>>>>>>>>>> mei-gsc and mei-pxp modules to start, which means that HuC 
>>>>>>>>>>>>>> load will
>>>>>>>>>>>>>> complete after i915 load is complete. This means that 
>>>>>>>>>>>>>> there is a small
>>>>>>>>>>>>>> window of time after i915 is registered and before HuC is 
>>>>>>>>>>>>>> loaded
>>>>>>>>>>>>>> during which userspace could submit and/or checking the 
>>>>>>>>>>>>>> HuC load status,
>>>>>>>>>>>>>> although this is quite unlikely to happen (HuC is usually 
>>>>>>>>>>>>>> loaded before
>>>>>>>>>>>>>> kernel init/resume completes).
>>>>>>>>>>>>>> We've consulted with the media team in regards to how to 
>>>>>>>>>>>>>> handle this and
>>>>>>>>>>>>>> they've asked us to do the following:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load 
>>>>>>>>>>>>>> is still in
>>>>>>>>>>>>>> progress. The media driver uses the IOCTL as a way to 
>>>>>>>>>>>>>> check if HuC is
>>>>>>>>>>>>>> enabled and then includes a secondary check in the batches 
>>>>>>>>>>>>>> to get the
>>>>>>>>>>>>>> actual status, so doing it this way allows userspace to 
>>>>>>>>>>>>>> keep working
>>>>>>>>>>>>>> without changes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded. 
>>>>>>>>>>>>>> Stalls are
>>>>>>>>>>>>>> expected to be very rare (if any), due to the fact that 
>>>>>>>>>>>>>> HuC is usually
>>>>>>>>>>>>>> loaded before kernel init/resume is completed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Motivation to add these complications into i915 are not 
>>>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is 
>>>>>>>>>>>>> the premise of the series, right? So no backwards 
>>>>>>>>>>>>> compatibility concerns. In this case why jump through the 
>>>>>>>>>>>>> hoops and not let userspace handle all of this by just 
>>>>>>>>>>>>> leaving the getparam return the true status?
>>>>>>>>>>>>
>>>>>>>>>>>> The main areas impacted by the fact that we can't guarantee 
>>>>>>>>>>>> that HuC load is complete when i915 starts accepting 
>>>>>>>>>>>> submissions are boot and suspend/resume, with the latter 
>>>>>>>>>>>> being the main problem; GT reset is not a concern because 
>>>>>>>>>>>> HuC now survives it. A suspend/resume can be transparent to 
>>>>>>>>>>>> userspace and therefore the HuC status can temporarily flip 
>>>>>>>>>>>> from loaded to not without userspace knowledge, especially 
>>>>>>>>>>>> if we start going into deeper suspend states and start 
>>>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note 
>>>>>>>>>>>> that this is different from what happens during GT reset for 
>>>>>>>>>>>> older platforms, because in that scenario we guarantee that 
>>>>>>>>>>>> HuC reload is complete before we restart the submission 
>>>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status 
>>>>>>>>>>>> change. We had an internal discussion about this problem 
>>>>>>>>>>>> with both media and i915 archs and the conclusion was that 
>>>>>>>>>>>> the best option is for i915 to stall media submission while 
>>>>>>>>>>>> HuC (re-)load is in progress.
>>>>>>>>>>>
>>>>>>>>>>> Resume is potentialy a good reason - I did not pick up on 
>>>>>>>>>>> that from the cover letter. I read the statement about the 
>>>>>>>>>>> unlikely and small window where HuC is not loaded during 
>>>>>>>>>>> kernel init/resume and I guess did not pick up on the resume 
>>>>>>>>>>> part.
>>>>>>>>>>>
>>>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>>>>>>>>
>>>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume 
>>>>>>>>>> can't start until i915 resume completes.
>>>>>>>>>
>>>>>>>>> I'll dig into this in the next few days since I want to 
>>>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>>>
>>>>>>>>> If in the end conclusion will be that i915 resume indeed cannot 
>>>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts 
>>>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you 
>>>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>>>
>>>>>>>> Even if we could implement a wait, I'm not sure we should. GSC 
>>>>>>>> resume and HuC reload takes ~300ms in most cases, I don't think 
>>>>>>>> we want to block within the i915 resume path for that long.
>>>>>>>
>>>>>>> Yeah maybe not. But entertaining the idea that it is technically 
>>>>>>> possible to block - we could perhaps add uapi for userspace to 
>>>>>>> mark contexts which want HuC access. Then track if there are any 
>>>>>>> such contexts with outstanding submissions and only wait in 
>>>>>>> resume if there are. If that would end up significantly less code 
>>>>>>> on the i915 side to maintain is an open.
>>>>>>>
>>>>>>> What would be the end result from users point of view in case 
>>>>>>> where it suspended during video playback? The proposed solution 
>>>>>>> from this series sees the video stuck after resume. Maybe 
>>>>>>> compositor blocks as well since I am not sure how well they 
>>>>>>> handle one window not providing new data. Probably depends on the 
>>>>>>> compositor.
>>>>>>>
>>>>>>> And then with a simpler solution definitely the whole resume 
>>>>>>> would be delayed by 300ms.
>>>>>>>
>>>>>>> With my ChromeOS hat the stalled media engines does sound like a 
>>>>>>> better solution. But with the maintainer hat I'd like all options 
>>>>>>> evaluated since there is attractiveness if a good enough solution 
>>>>>>> can be achieved with significantly less kernel code.
>>>>>>>
>>>>>>> You say 300ms is typical time for HuC load. How long it is on 
>>>>>>> other platforms? If much faster then why is it so slow here?
>>>>>>
>>>>>> The GSC itself has to come out of suspend before it can perform 
>>>>>> the load, which takes a few tens of ms I believe. AFAIU the GSC is 
>>>>>> also slower in processing the HuC load and auth compared to the 
>>>>>> legacy path. The GSC FW team gave a 250ms limit for the time the 
>>>>>> GSC FW needs from start of the resume flow to HuC load complete, 
>>>>>> so I bumped that to ~300ms to account for all other SW 
>>>>>> interactions, plus a bit of buffer. Note that a bit of the SW 
>>>>>> overhead is caused by the fact that we have 2 mei modules in play 
>>>>>> here: mei-gsc, which manages the GSC device itself (including 
>>>>>> resume), and mei-pxp, which owns the pxp messaging, including HuC 
>>>>>> load.
>>>>>
>>>>> And how long on other platforms (not DG2) do you know? Presumably 
>>>>> there the wait is on the i915 resume path?
>>>>
>>>> I don't have "official" expected load times at hand, but looking at 
>>>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to 
>>>> load both GuC and HuC:
>>>>
>>>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC 
>>>> loads huc=no
>>>> <6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware 
>>>> i915/dg1_guc_70.1.1.bin version 70.1
>>>> <6>[    8.158634] i915 0000:03:00.0: [drm] HuC firmware 
>>>> i915/dg1_huc_7.9.3.bin version 7.9
>>>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication 
>>>> [i915]] GuC communication enabled
>>>> <6>[    8.166111] i915 0000:03:00.0: [drm] HuC authenticated
>>>>
>>>> Note that we increase the GT frequency all the way to the max before 
>>>> starting the FW load, which speeds things up.
>>>>
>>>>>
>>>>>>>>> However, do we really need to lie in the getparam? How about 
>>>>>>>>> extend or add a new one to separate the loading vs loaded 
>>>>>>>>> states? Since userspace does not support DG2 HuC yet this 
>>>>>>>>> should be doable.
>>>>>>>>
>>>>>>>> I don't really have a preference here. The media team asked us 
>>>>>>>> to do it this way because they wouldn't have a use for the 
>>>>>>>> different "in progress" and "done" states. If they're ok with 
>>>>>>>> having separate flags that's fine by me.
>>>>>>>> Tony, any feedback here?
>>>>>>>
>>>>>>> We don't even have any docs in i915_drm.h in terms of what it means:
>>>>>>>
>>>>>>> #define I915_PARAM_HUC_STATUS         42
>>>>>>>
>>>>>>> Seems to be a boolean. Status false vs true? Could you add some 
>>>>>>> docs?
>>>>>>
>>>>>> There is documentation above intel_huc_check_status(), which is 
>>>>>> also updated in this series. I can move that to i915_drm.h.
>>>>>
>>>>> That would be great, thanks.
>>>>>
>>>>> And with so rich return codes already documented and exposed via 
>>>>> uapi - would we really need to add anything new for DG2 apart for 
>>>>> userspace to know that if zero is returned (not a negative error 
>>>>> value) it should retry? I mean is there another negative error 
>>>>> missing which would prevent zero transitioning to one?
>>>>
>>>> I think if the auth fails we currently return 0, because the uc 
>>>> state in that case would be "TRANSFERRED", i.e. DMA complete but not 
>>>> fully enabled. I don't have anything against changing the FW state 
>>>> to "ERROR" in this scenario and leave the 0 to mean "not done yet", 
>>>> but I'd prefer the media team to comment on their needs for this 
>>>> IOCTL before committing to anything.
>>>
>>>
>>> Currently media doesn't differentiate "delayed loading is in 
>>> progress" with "HuC is authenticated and running". If the HuC 
>>> authentication eventually fails, the user needs to check the debugfs 
>>> to know the reason. IMHO, it's not a big problem as this is what we 
>>> do even when the IOCTL returns non-zero values. + Carl to comment.
>>
>> (Side note - debugfs can be assumed to not exist so it is not 
>> interesting to users.)
>>
>> There isn't currently a "delayed loading is in progress" state, that's 
>> the discussion in this thread, if and how to add it.
>>
>> Getparam it currently documents these states:
>>
>>  -ENODEV if HuC is not present on this platform,
>>  -EOPNOTSUPP if HuC firmware is disabled,
>>  -ENOPKG if HuC firmware was not installed,
>>  -ENOEXEC if HuC firmware is invalid or mismatched,
>>  0 if HuC firmware is not running,
>>  1 if HuC firmware is authenticated and running.
>>
>> This patch proposed to change this to:
>>
>>  1 if HuC firmware is authenticated and running or if delayed load is 
>> in progress,
>>  0 if HuC firmware is not running and delayed load is not in progress
>>
>> Alternative idea is for DG2 (well in general) to add some more fine 
>> grained states, so that i915 does not have to use 1 for both running 
>> and loading. This may be adding a new error code for auth fails as 
>> Daniele mentioned. Then UMD can know that if 0 is returned and 
>> platform is DG2 it needs to query it again since it will transition to 
>> either 1 or error eventually. This would mean the non error states 
>> would be:
>>
>>  0 not running (aka loading)
>>  1 running (and authenticated)
>>
>> @Daniele - one more thing - can you make sure in the series (if you 
>> haven't already) that if HuC status was in any error before suspend 
>> reload is not re-tried on resume? My thinking is that the error is 
>> likely to persist and we don't want to impose long delay on every 
>> resume afterwards. Makes sense to you?
> 
> This series does not stall any submissions on resume if there previously 
> was an issue with the HuC load (the fence is left as completed), but 
> we'll still attempt to re-load the HuC in the background if mei-gsc and 
> mei-pxp are successful in their resume and call back into i915. Does 
> that work for you?

Yep, I think that's the best option.

Regards,

Tvrtko

> 
> Daniele
> 
>>
>> @Tony - one more question for the UMD. Or two.
>>
>> How prevalent is usage of HuC on DG2 depending on what codecs need it? 
>> Do you know in advance, before creating a GEM context, that HuC 
>> commands will be sent to the engine or this changes at runtime?
>>
>> Regards,
>>
>> Tvrtko
>

Ye, Tony June 15, 2022, 4:14 p.m. UTC | #15

On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
>
> On 15/06/2022 00:15, Ye, Tony wrote:
>> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>>
>>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>
>>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP 
>>>>>>>>>>>>> command. The load
>>>>>>>>>>>>> operation itself is relatively simple (just send a message 
>>>>>>>>>>>>> to the GSC
>>>>>>>>>>>>> with the physical address of the HuC in LMEM), but there 
>>>>>>>>>>>>> are timing
>>>>>>>>>>>>> changes that requires special attention. In particular, to 
>>>>>>>>>>>>> send a PXP
>>>>>>>>>>>>> command we need to first export the GSC driver and then 
>>>>>>>>>>>>> wait for the
>>>>>>>>>>>>> mei-gsc and mei-pxp modules to start, which means that HuC 
>>>>>>>>>>>>> load will
>>>>>>>>>>>>> complete after i915 load is complete. This means that 
>>>>>>>>>>>>> there is a small
>>>>>>>>>>>>> window of time after i915 is registered and before HuC is 
>>>>>>>>>>>>> loaded
>>>>>>>>>>>>> during which userspace could submit and/or checking the 
>>>>>>>>>>>>> HuC load status,
>>>>>>>>>>>>> although this is quite unlikely to happen (HuC is usually 
>>>>>>>>>>>>> loaded before
>>>>>>>>>>>>> kernel init/resume completes).
>>>>>>>>>>>>> We've consulted with the media team in regards to how to 
>>>>>>>>>>>>> handle this and
>>>>>>>>>>>>> they've asked us to do the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load 
>>>>>>>>>>>>> is still in
>>>>>>>>>>>>> progress. The media driver uses the IOCTL as a way to 
>>>>>>>>>>>>> check if HuC is
>>>>>>>>>>>>> enabled and then includes a secondary check in the batches 
>>>>>>>>>>>>> to get the
>>>>>>>>>>>>> actual status, so doing it this way allows userspace to 
>>>>>>>>>>>>> keep working
>>>>>>>>>>>>> without changes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded. 
>>>>>>>>>>>>> Stalls are
>>>>>>>>>>>>> expected to be very rare (if any), due to the fact that 
>>>>>>>>>>>>> HuC is usually
>>>>>>>>>>>>> loaded before kernel init/resume is completed.
>>>>>>>>>>>>
>>>>>>>>>>>> Motivation to add these complications into i915 are not 
>>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is 
>>>>>>>>>>>> the premise of the series, right? So no backwards 
>>>>>>>>>>>> compatibility concerns. In this case why jump through the 
>>>>>>>>>>>> hoops and not let userspace handle all of this by just 
>>>>>>>>>>>> leaving the getparam return the true status?
>>>>>>>>>>>
>>>>>>>>>>> The main areas impacted by the fact that we can't guarantee 
>>>>>>>>>>> that HuC load is complete when i915 starts accepting 
>>>>>>>>>>> submissions are boot and suspend/resume, with the latter 
>>>>>>>>>>> being the main problem; GT reset is not a concern because 
>>>>>>>>>>> HuC now survives it. A suspend/resume can be transparent to 
>>>>>>>>>>> userspace and therefore the HuC status can temporarily flip 
>>>>>>>>>>> from loaded to not without userspace knowledge, especially 
>>>>>>>>>>> if we start going into deeper suspend states and start 
>>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note 
>>>>>>>>>>> that this is different from what happens during GT reset for 
>>>>>>>>>>> older platforms, because in that scenario we guarantee that 
>>>>>>>>>>> HuC reload is complete before we restart the submission 
>>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status 
>>>>>>>>>>> change. We had an internal discussion about this problem 
>>>>>>>>>>> with both media and i915 archs and the conclusion was that 
>>>>>>>>>>> the best option is for i915 to stall media submission while 
>>>>>>>>>>> HuC (re-)load is in progress.
>>>>>>>>>>
>>>>>>>>>> Resume is potentialy a good reason - I did not pick up on 
>>>>>>>>>> that from the cover letter. I read the statement about the 
>>>>>>>>>> unlikely and small window where HuC is not loaded during 
>>>>>>>>>> kernel init/resume and I guess did not pick up on the resume 
>>>>>>>>>> part.
>>>>>>>>>>
>>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>>>>>>>
>>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume 
>>>>>>>>> can't start until i915 resume completes.
>>>>>>>>
>>>>>>>> I'll dig into this in the next few days since I want to 
>>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>>
>>>>>>>> If in the end conclusion will be that i915 resume indeed cannot 
>>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts 
>>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you 
>>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>>
>>>>>>> Even if we could implement a wait, I'm not sure we should. GSC 
>>>>>>> resume and HuC reload takes ~300ms in most cases, I don't think 
>>>>>>> we want to block within the i915 resume path for that long.
>>>>>>
>>>>>> Yeah maybe not. But entertaining the idea that it is technically 
>>>>>> possible to block - we could perhaps add uapi for userspace to 
>>>>>> mark contexts which want HuC access. Then track if there are any 
>>>>>> such contexts with outstanding submissions and only wait in 
>>>>>> resume if there are. If that would end up significantly less code 
>>>>>> on the i915 side to maintain is an open.
>>>>>>
>>>>>> What would be the end result from users point of view in case 
>>>>>> where it suspended during video playback? The proposed solution 
>>>>>> from this series sees the video stuck after resume. Maybe 
>>>>>> compositor blocks as well since I am not sure how well they 
>>>>>> handle one window not providing new data. Probably depends on the 
>>>>>> compositor.
>>>>>>
>>>>>> And then with a simpler solution definitely the whole resume 
>>>>>> would be delayed by 300ms.
>>>>>>
>>>>>> With my ChromeOS hat the stalled media engines does sound like a 
>>>>>> better solution. But with the maintainer hat I'd like all options 
>>>>>> evaluated since there is attractiveness if a good enough solution 
>>>>>> can be achieved with significantly less kernel code.
>>>>>>
>>>>>> You say 300ms is typical time for HuC load. How long it is on 
>>>>>> other platforms? If much faster then why is it so slow here?
>>>>>
>>>>> The GSC itself has to come out of suspend before it can perform 
>>>>> the load, which takes a few tens of ms I believe. AFAIU the GSC is 
>>>>> also slower in processing the HuC load and auth compared to the 
>>>>> legacy path. The GSC FW team gave a 250ms limit for the time the 
>>>>> GSC FW needs from start of the resume flow to HuC load complete, 
>>>>> so I bumped that to ~300ms to account for all other SW 
>>>>> interactions, plus a bit of buffer. Note that a bit of the SW 
>>>>> overhead is caused by the fact that we have 2 mei modules in play 
>>>>> here: mei-gsc, which manages the GSC device itself (including 
>>>>> resume), and mei-pxp, which owns the pxp messaging, including HuC 
>>>>> load.
>>>>
>>>> And how long on other platforms (not DG2) do you know? Presumably 
>>>> there the wait is on the i915 resume path?
>>>
>>> I don't have "official" expected load times at hand, but looking at 
>>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to 
>>> load both GuC and HuC:
>>>
>>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC 
>>> loads huc=no
>>> <6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware 
>>> i915/dg1_guc_70.1.1.bin version 70.1
>>> <6>[    8.158634] i915 0000:03:00.0: [drm] HuC firmware 
>>> i915/dg1_huc_7.9.3.bin version 7.9
>>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication 
>>> [i915]] GuC communication enabled
>>> <6>[    8.166111] i915 0000:03:00.0: [drm] HuC authenticated
>>>
>>> Note that we increase the GT frequency all the way to the max before 
>>> starting the FW load, which speeds things up.
>>>
>>>>
>>>>>>>> However, do we really need to lie in the getparam? How about 
>>>>>>>> extend or add a new one to separate the loading vs loaded 
>>>>>>>> states? Since userspace does not support DG2 HuC yet this 
>>>>>>>> should be doable.
>>>>>>>
>>>>>>> I don't really have a preference here. The media team asked us 
>>>>>>> to do it this way because they wouldn't have a use for the 
>>>>>>> different "in progress" and "done" states. If they're ok with 
>>>>>>> having separate flags that's fine by me.
>>>>>>> Tony, any feedback here?
>>>>>>
>>>>>> We don't even have any docs in i915_drm.h in terms of what it means:
>>>>>>
>>>>>> #define I915_PARAM_HUC_STATUS         42
>>>>>>
>>>>>> Seems to be a boolean. Status false vs true? Could you add some 
>>>>>> docs?
>>>>>
>>>>> There is documentation above intel_huc_check_status(), which is 
>>>>> also updated in this series. I can move that to i915_drm.h.
>>>>
>>>> That would be great, thanks.
>>>>
>>>> And with so rich return codes already documented and exposed via 
>>>> uapi - would we really need to add anything new for DG2 apart for 
>>>> userspace to know that if zero is returned (not a negative error 
>>>> value) it should retry? I mean is there another negative error 
>>>> missing which would prevent zero transitioning to one?
>>>
>>> I think if the auth fails we currently return 0, because the uc 
>>> state in that case would be "TRANSFERRED", i.e. DMA complete but not 
>>> fully enabled. I don't have anything against changing the FW state 
>>> to "ERROR" in this scenario and leave the 0 to mean "not done yet", 
>>> but I'd prefer the media team to comment on their needs for this 
>>> IOCTL before committing to anything.
>>
>>
>> Currently media doesn't differentiate "delayed loading is in 
>> progress" with "HuC is authenticated and running". If the HuC 
>> authentication eventually fails, the user needs to check the debugfs 
>> to know the reason. IMHO, it's not a big problem as this is what we 
>> do even when the IOCTL returns non-zero values. + Carl to comment.
>
> (Side note - debugfs can be assumed to not exist so it is not 
> interesting to users.)
>
> There isn't currently a "delayed loading is in progress" state, that's 
> the discussion in this thread, if and how to add it.
>
> Getparam it currently documents these states:
>
>  -ENODEV if HuC is not present on this platform,
>  -EOPNOTSUPP if HuC firmware is disabled,
>  -ENOPKG if HuC firmware was not installed,
>  -ENOEXEC if HuC firmware is invalid or mismatched,
>  0 if HuC firmware is not running,
>  1 if HuC firmware is authenticated and running.
>
> This patch proposed to change this to:
>
>  1 if HuC firmware is authenticated and running or if delayed load is 
> in progress,
>  0 if HuC firmware is not running and delayed load is not in progress
>
> Alternative idea is for DG2 (well in general) to add some more fine 
> grained states, so that i915 does not have to use 1 for both running 
> and loading. This may be adding a new error code for auth fails as 
> Daniele mentioned. Then UMD can know that if 0 is returned and
> platform is DG2 it needs to query it again since it will transition to 
> either 1 or error eventually. This would mean the non error states 
> would be:
>
>  0 not running (aka loading)
>  1 running (and authenticated)
>
> @Daniele - one more thing - can you make sure in the series (if you 
> haven't already) that if HuC status was in any error before suspend 
> reload is not re-tried on resume? My thinking is that the error is 
> likely to persist and we don't want to impose long delay on every 
> resume afterwards. Makes sense to you?
>
> @Tony - one more question for the UMD. Or two.
>
> How prevalent is usage of HuC on DG2 depending on what codecs need it? 
> Do you know in advance, before creating a GEM context, that HuC 
> commands will be sent to the engine or this changes at runtime?

HuC is needed for all codecs while HW bit rate control (CBR, VBR) is in 
use. It's also used by content protection. And UMD doesn't know if it 
will be used later at context creation time.

Thanks,

Tony

>
> Regards,
>
> Tvrtko

Zhang, Carl June 16, 2022, 2:28 a.m. UTC | #16

> On From: Ye, Tony <tony.ye@intel.com>
> Sent: Thursday, June 16, 2022 12:15 AM
> 
> 
> On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
> >
> > On 15/06/2022 00:15, Ye, Tony wrote:
> >> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
> >>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
> >>>>
> >>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
> >>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
> >>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
> >>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
> >>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
> >>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
> >>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
> >>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP
> >>>>>>>>>>>>> command. The load operation itself is relatively simple
> >>>>>>>>>>>>> (just send a message to the GSC with the physical address
> >>>>>>>>>>>>> of the HuC in LMEM), but there are timing changes that
> >>>>>>>>>>>>> requires special attention. In particular, to send a PXP
> >>>>>>>>>>>>> command we need to first export the GSC driver and then
> >>>>>>>>>>>>> wait for the mei-gsc and mei-pxp modules to start, which
> >>>>>>>>>>>>> means that HuC load will complete after i915 load is
> >>>>>>>>>>>>> complete. This means that there is a small window of time
> >>>>>>>>>>>>> after i915 is registered and before HuC is loaded during
> >>>>>>>>>>>>> which userspace could submit and/or checking the HuC load
> >>>>>>>>>>>>> status, although this is quite unlikely to happen (HuC is
> >>>>>>>>>>>>> usually loaded before kernel init/resume completes).
> >>>>>>>>>>>>> We've consulted with the media team in regards to how to
> >>>>>>>>>>>>> handle this and they've asked us to do the following:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load
> >>>>>>>>>>>>> is still in progress. The media driver uses the IOCTL as a
> >>>>>>>>>>>>> way to check if HuC is enabled and then includes a
> >>>>>>>>>>>>> secondary check in the batches to get the actual status,
> >>>>>>>>>>>>> so doing it this way allows userspace to keep working
> >>>>>>>>>>>>> without changes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded.
> >>>>>>>>>>>>> Stalls are
> >>>>>>>>>>>>> expected to be very rare (if any), due to the fact that
> >>>>>>>>>>>>> HuC is usually loaded before kernel init/resume is
> >>>>>>>>>>>>> completed.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Motivation to add these complications into i915 are not
> >>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is
> >>>>>>>>>>>> the premise of the series, right? So no backwards
> >>>>>>>>>>>> compatibility concerns. In this case why jump through the
> >>>>>>>>>>>> hoops and not let userspace handle all of this by just
> >>>>>>>>>>>> leaving the getparam return the true status?
> >>>>>>>>>>>
> >>>>>>>>>>> The main areas impacted by the fact that we can't guarantee
> >>>>>>>>>>> that HuC load is complete when i915 starts accepting
> >>>>>>>>>>> submissions are boot and suspend/resume, with the latter
> >>>>>>>>>>> being the main problem; GT reset is not a concern because
> >>>>>>>>>>> HuC now survives it. A suspend/resume can be transparent to
> >>>>>>>>>>> userspace and therefore the HuC status can temporarily flip
> >>>>>>>>>>> from loaded to not without userspace knowledge, especially
> >>>>>>>>>>> if we start going into deeper suspend states and start
> >>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note
> >>>>>>>>>>> that this is different from what happens during GT reset for
> >>>>>>>>>>> older platforms, because in that scenario we guarantee that
> >>>>>>>>>>> HuC reload is complete before we restart the submission
> >>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status
> >>>>>>>>>>> change. We had an internal discussion about this problem
> >>>>>>>>>>> with both media and i915 archs and the conclusion was that
> >>>>>>>>>>> the best option is for i915 to stall media submission while
> >>>>>>>>>>> HuC (re-)load is in progress.
> >>>>>>>>>>
> >>>>>>>>>> Resume is potentialy a good reason - I did not pick up on
> >>>>>>>>>> that from the cover letter. I read the statement about the
> >>>>>>>>>> unlikely and small window where HuC is not loaded during
> >>>>>>>>>> kernel init/resume and I guess did not pick up on the resume
> >>>>>>>>>> part.
> >>>>>>>>>>
> >>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
> >>>>>>>>>
> >>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume
> >>>>>>>>> can't start until i915 resume completes.
> >>>>>>>>
> >>>>>>>> I'll dig into this in the next few days since I want to
> >>>>>>>> understand how exactly it works. Or someone can help explain.
> >>>>>>>>
> >>>>>>>> If in the end conclusion will be that i915 resume indeed cannot
> >>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts
> >>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you
> >>>>>>>> explained, user experience post resume wouldn't be good.
> >>>>>>>
> >>>>>>> Even if we could implement a wait, I'm not sure we should. GSC
> >>>>>>> resume and HuC reload takes ~300ms in most cases, I don't think
> >>>>>>> we want to block within the i915 resume path for that long.
> >>>>>>
> >>>>>> Yeah maybe not. But entertaining the idea that it is technically
> >>>>>> possible to block - we could perhaps add uapi for userspace to
> >>>>>> mark contexts which want HuC access. Then track if there are any
> >>>>>> such contexts with outstanding submissions and only wait in
> >>>>>> resume if there are. If that would end up significantly less code
> >>>>>> on the i915 side to maintain is an open.
> >>>>>>
> >>>>>> What would be the end result from users point of view in case
> >>>>>> where it suspended during video playback? The proposed solution
> >>>>>> from this series sees the video stuck after resume. Maybe
> >>>>>> compositor blocks as well since I am not sure how well they
> >>>>>> handle one window not providing new data. Probably depends on
> the
> >>>>>> compositor.
> >>>>>>
> >>>>>> And then with a simpler solution definitely the whole resume
> >>>>>> would be delayed by 300ms.
> >>>>>>
> >>>>>> With my ChromeOS hat the stalled media engines does sound like a
> >>>>>> better solution. But with the maintainer hat I'd like all options
> >>>>>> evaluated since there is attractiveness if a good enough solution
> >>>>>> can be achieved with significantly less kernel code.
> >>>>>>
> >>>>>> You say 300ms is typical time for HuC load. How long it is on
> >>>>>> other platforms? If much faster then why is it so slow here?
> >>>>>
> >>>>> The GSC itself has to come out of suspend before it can perform
> >>>>> the load, which takes a few tens of ms I believe. AFAIU the GSC is
> >>>>> also slower in processing the HuC load and auth compared to the
> >>>>> legacy path. The GSC FW team gave a 250ms limit for the time the
> >>>>> GSC FW needs from start of the resume flow to HuC load complete,
> >>>>> so I bumped that to ~300ms to account for all other SW
> >>>>> interactions, plus a bit of buffer. Note that a bit of the SW
> >>>>> overhead is caused by the fact that we have 2 mei modules in play
> >>>>> here: mei-gsc, which manages the GSC device itself (including
> >>>>> resume), and mei-pxp, which owns the pxp messaging, including HuC
> >>>>> load.
> >>>>
> >>>> And how long on other platforms (not DG2) do you know? Presumably
> >>>> there the wait is on the i915 resume path?
> >>>
> >>> I don't have "official" expected load times at hand, but looking at
> >>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to
> >>> load both GuC and HuC:
> >>>
> >>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC
> >>> loads huc=no <6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware
> >>> i915/dg1_guc_70.1.1.bin version 70.1 <6>[    8.158634] i915
> >>> 0000:03:00.0: [drm] HuC firmware i915/dg1_huc_7.9.3.bin version 7.9
> >>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication
> >>> [i915]] GuC communication enabled <6>[    8.166111] i915
> >>> 0000:03:00.0: [drm] HuC authenticated
> >>>
> >>> Note that we increase the GT frequency all the way to the max before
> >>> starting the FW load, which speeds things up.
> >>>
> >>>>
> >>>>>>>> However, do we really need to lie in the getparam? How about
> >>>>>>>> extend or add a new one to separate the loading vs loaded
> >>>>>>>> states? Since userspace does not support DG2 HuC yet this
> >>>>>>>> should be doable.
> >>>>>>>
> >>>>>>> I don't really have a preference here. The media team asked us
> >>>>>>> to do it this way because they wouldn't have a use for the
> >>>>>>> different "in progress" and "done" states. If they're ok with
> >>>>>>> having separate flags that's fine by me.
> >>>>>>> Tony, any feedback here?
> >>>>>>
> >>>>>> We don't even have any docs in i915_drm.h in terms of what it
> means:
> >>>>>>
> >>>>>> #define I915_PARAM_HUC_STATUS         42
> >>>>>>
> >>>>>> Seems to be a boolean. Status false vs true? Could you add some
> >>>>>> docs?
> >>>>>
> >>>>> There is documentation above intel_huc_check_status(), which is
> >>>>> also updated in this series. I can move that to i915_drm.h.
> >>>>
> >>>> That would be great, thanks.
> >>>>
> >>>> And with so rich return codes already documented and exposed via
> >>>> uapi - would we really need to add anything new for DG2 apart for
> >>>> userspace to know that if zero is returned (not a negative error
> >>>> value) it should retry? I mean is there another negative error
> >>>> missing which would prevent zero transitioning to one?
> >>>
> >>> I think if the auth fails we currently return 0, because the uc
> >>> state in that case would be "TRANSFERRED", i.e. DMA complete but not
> >>> fully enabled. I don't have anything against changing the FW state
> >>> to "ERROR" in this scenario and leave the 0 to mean "not done yet",
> >>> but I'd prefer the media team to comment on their needs for this
> >>> IOCTL before committing to anything.
> >>
> >>
> >> Currently media doesn't differentiate "delayed loading is in
> >> progress" with "HuC is authenticated and running". If the HuC
> >> authentication eventually fails, the user needs to check the debugfs
> >> to know the reason. IMHO, it's not a big problem as this is what we
> >> do even when the IOCTL returns non-zero values. + Carl to comment.
> >
> > (Side note - debugfs can be assumed to not exist so it is not
> > interesting to users.)
> >
> > There isn't currently a "delayed loading is in progress" state, that's
> > the discussion in this thread, if and how to add it.
> >
> > Getparam it currently documents these states:
> >
> >  -ENODEV if HuC is not present on this platform,
> >  -EOPNOTSUPP if HuC firmware is disabled,
> >  -ENOPKG if HuC firmware was not installed,
> >  -ENOEXEC if HuC firmware is invalid or mismatched,
> >  0 if HuC firmware is not running,
> >  1 if HuC firmware is authenticated and running.
> >
> > This patch proposed to change this to:
> >
> >  1 if HuC firmware is authenticated and running or if delayed load is
> > in progress,
> >  0 if HuC firmware is not running and delayed load is not in progress
> >
> > Alternative idea is for DG2 (well in general) to add some more fine
> > grained states, so that i915 does not have to use 1 for both running
> > and loading. This may be adding a new error code for auth fails as
> > Daniele mentioned. Then UMD can know that if 0 is returned and
> > platform is DG2 it needs to query it again since it will transition to
> > either 1 or error eventually. This would mean the non error states
> > would be:
> >
> >  0 not running (aka loading)
> >  1 running (and authenticated)
> >
> > @Daniele - one more thing - can you make sure in the series (if you
> > haven't already) that if HuC status was in any error before suspend
> > reload is not re-tried on resume? My thinking is that the error is
> > likely to persist and we don't want to impose long delay on every
> > resume afterwards. Makes sense to you?
> >
> > @Tony - one more question for the UMD. Or two.
> >
> > How prevalent is usage of HuC on DG2 depending on what codecs need it?
> > Do you know in advance, before creating a GEM context, that HuC
> > commands will be sent to the engine or this changes at runtime?
> 
> HuC is needed for all codecs while HW bit rate control (CBR, VBR) is in use.
> It's also used by content protection. And UMD doesn't know if it will be used
> later at context creation time.
> 
from UMD perspective, We don’t care much on the normal initialization process
because, I could not image that a system is boot up, and user select a crypted content
to playback, and huc is still not ready.
of course, We are  also ok to query the huc status twice, and wait if the status is "0 not running"
to avoid potential issue.

I suppose the main possible issue will happen in the hibernation/awake process, it is transparent to UMD.
UMD will not call ioctrl  to query huc status in this process, and will continue to send command buffer to KMD.


> Thanks,
> 
> Tony
> 
> >
> > Regards,
> >
> > Tvrtko

Tvrtko Ursulin June 16, 2022, 7:10 a.m. UTC | #17

On 15/06/2022 17:14, Ye, Tony wrote:
> On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
>> On 15/06/2022 00:15, Ye, Tony wrote:
>>> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>>>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP 
>>>>>>>>>>>>>> command. The load
>>>>>>>>>>>>>> operation itself is relatively simple (just send a message 
>>>>>>>>>>>>>> to the GSC
>>>>>>>>>>>>>> with the physical address of the HuC in LMEM), but there 
>>>>>>>>>>>>>> are timing
>>>>>>>>>>>>>> changes that requires special attention. In particular, to 
>>>>>>>>>>>>>> send a PXP
>>>>>>>>>>>>>> command we need to first export the GSC driver and then 
>>>>>>>>>>>>>> wait for the
>>>>>>>>>>>>>> mei-gsc and mei-pxp modules to start, which means that HuC 
>>>>>>>>>>>>>> load will
>>>>>>>>>>>>>> complete after i915 load is complete. This means that 
>>>>>>>>>>>>>> there is a small
>>>>>>>>>>>>>> window of time after i915 is registered and before HuC is 
>>>>>>>>>>>>>> loaded
>>>>>>>>>>>>>> during which userspace could submit and/or checking the 
>>>>>>>>>>>>>> HuC load status,
>>>>>>>>>>>>>> although this is quite unlikely to happen (HuC is usually 
>>>>>>>>>>>>>> loaded before
>>>>>>>>>>>>>> kernel init/resume completes).
>>>>>>>>>>>>>> We've consulted with the media team in regards to how to 
>>>>>>>>>>>>>> handle this and
>>>>>>>>>>>>>> they've asked us to do the following:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load 
>>>>>>>>>>>>>> is still in
>>>>>>>>>>>>>> progress. The media driver uses the IOCTL as a way to 
>>>>>>>>>>>>>> check if HuC is
>>>>>>>>>>>>>> enabled and then includes a secondary check in the batches 
>>>>>>>>>>>>>> to get the
>>>>>>>>>>>>>> actual status, so doing it this way allows userspace to 
>>>>>>>>>>>>>> keep working
>>>>>>>>>>>>>> without changes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded. 
>>>>>>>>>>>>>> Stalls are
>>>>>>>>>>>>>> expected to be very rare (if any), due to the fact that 
>>>>>>>>>>>>>> HuC is usually
>>>>>>>>>>>>>> loaded before kernel init/resume is completed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Motivation to add these complications into i915 are not 
>>>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is 
>>>>>>>>>>>>> the premise of the series, right? So no backwards 
>>>>>>>>>>>>> compatibility concerns. In this case why jump through the 
>>>>>>>>>>>>> hoops and not let userspace handle all of this by just 
>>>>>>>>>>>>> leaving the getparam return the true status?
>>>>>>>>>>>>
>>>>>>>>>>>> The main areas impacted by the fact that we can't guarantee 
>>>>>>>>>>>> that HuC load is complete when i915 starts accepting 
>>>>>>>>>>>> submissions are boot and suspend/resume, with the latter 
>>>>>>>>>>>> being the main problem; GT reset is not a concern because 
>>>>>>>>>>>> HuC now survives it. A suspend/resume can be transparent to 
>>>>>>>>>>>> userspace and therefore the HuC status can temporarily flip 
>>>>>>>>>>>> from loaded to not without userspace knowledge, especially 
>>>>>>>>>>>> if we start going into deeper suspend states and start 
>>>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note 
>>>>>>>>>>>> that this is different from what happens during GT reset for 
>>>>>>>>>>>> older platforms, because in that scenario we guarantee that 
>>>>>>>>>>>> HuC reload is complete before we restart the submission 
>>>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status 
>>>>>>>>>>>> change. We had an internal discussion about this problem 
>>>>>>>>>>>> with both media and i915 archs and the conclusion was that 
>>>>>>>>>>>> the best option is for i915 to stall media submission while 
>>>>>>>>>>>> HuC (re-)load is in progress.
>>>>>>>>>>>
>>>>>>>>>>> Resume is potentialy a good reason - I did not pick up on 
>>>>>>>>>>> that from the cover letter. I read the statement about the 
>>>>>>>>>>> unlikely and small window where HuC is not loaded during 
>>>>>>>>>>> kernel init/resume and I guess did not pick up on the resume 
>>>>>>>>>>> part.
>>>>>>>>>>>
>>>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>>>>>>>>
>>>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume 
>>>>>>>>>> can't start until i915 resume completes.
>>>>>>>>>
>>>>>>>>> I'll dig into this in the next few days since I want to 
>>>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>>>
>>>>>>>>> If in the end conclusion will be that i915 resume indeed cannot 
>>>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts 
>>>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you 
>>>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>>>
>>>>>>>> Even if we could implement a wait, I'm not sure we should. GSC 
>>>>>>>> resume and HuC reload takes ~300ms in most cases, I don't think 
>>>>>>>> we want to block within the i915 resume path for that long.
>>>>>>>
>>>>>>> Yeah maybe not. But entertaining the idea that it is technically 
>>>>>>> possible to block - we could perhaps add uapi for userspace to 
>>>>>>> mark contexts which want HuC access. Then track if there are any 
>>>>>>> such contexts with outstanding submissions and only wait in 
>>>>>>> resume if there are. If that would end up significantly less code 
>>>>>>> on the i915 side to maintain is an open.
>>>>>>>
>>>>>>> What would be the end result from users point of view in case 
>>>>>>> where it suspended during video playback? The proposed solution 
>>>>>>> from this series sees the video stuck after resume. Maybe 
>>>>>>> compositor blocks as well since I am not sure how well they 
>>>>>>> handle one window not providing new data. Probably depends on the 
>>>>>>> compositor.
>>>>>>>
>>>>>>> And then with a simpler solution definitely the whole resume 
>>>>>>> would be delayed by 300ms.
>>>>>>>
>>>>>>> With my ChromeOS hat the stalled media engines does sound like a 
>>>>>>> better solution. But with the maintainer hat I'd like all options 
>>>>>>> evaluated since there is attractiveness if a good enough solution 
>>>>>>> can be achieved with significantly less kernel code.
>>>>>>>
>>>>>>> You say 300ms is typical time for HuC load. How long it is on 
>>>>>>> other platforms? If much faster then why is it so slow here?
>>>>>>
>>>>>> The GSC itself has to come out of suspend before it can perform 
>>>>>> the load, which takes a few tens of ms I believe. AFAIU the GSC is 
>>>>>> also slower in processing the HuC load and auth compared to the 
>>>>>> legacy path. The GSC FW team gave a 250ms limit for the time the 
>>>>>> GSC FW needs from start of the resume flow to HuC load complete, 
>>>>>> so I bumped that to ~300ms to account for all other SW 
>>>>>> interactions, plus a bit of buffer. Note that a bit of the SW 
>>>>>> overhead is caused by the fact that we have 2 mei modules in play 
>>>>>> here: mei-gsc, which manages the GSC device itself (including 
>>>>>> resume), and mei-pxp, which owns the pxp messaging, including HuC 
>>>>>> load.
>>>>>
>>>>> And how long on other platforms (not DG2) do you know? Presumably 
>>>>> there the wait is on the i915 resume path?
>>>>
>>>> I don't have "official" expected load times at hand, but looking at 
>>>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to 
>>>> load both GuC and HuC:
>>>>
>>>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC 
>>>> loads huc=no
>>>> <6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware 
>>>> i915/dg1_guc_70.1.1.bin version 70.1
>>>> <6>[    8.158634] i915 0000:03:00.0: [drm] HuC firmware 
>>>> i915/dg1_huc_7.9.3.bin version 7.9
>>>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication 
>>>> [i915]] GuC communication enabled
>>>> <6>[    8.166111] i915 0000:03:00.0: [drm] HuC authenticated
>>>>
>>>> Note that we increase the GT frequency all the way to the max before 
>>>> starting the FW load, which speeds things up.
>>>>
>>>>>
>>>>>>>>> However, do we really need to lie in the getparam? How about 
>>>>>>>>> extend or add a new one to separate the loading vs loaded 
>>>>>>>>> states? Since userspace does not support DG2 HuC yet this 
>>>>>>>>> should be doable.
>>>>>>>>
>>>>>>>> I don't really have a preference here. The media team asked us 
>>>>>>>> to do it this way because they wouldn't have a use for the 
>>>>>>>> different "in progress" and "done" states. If they're ok with 
>>>>>>>> having separate flags that's fine by me.
>>>>>>>> Tony, any feedback here?
>>>>>>>
>>>>>>> We don't even have any docs in i915_drm.h in terms of what it means:
>>>>>>>
>>>>>>> #define I915_PARAM_HUC_STATUS         42
>>>>>>>
>>>>>>> Seems to be a boolean. Status false vs true? Could you add some 
>>>>>>> docs?
>>>>>>
>>>>>> There is documentation above intel_huc_check_status(), which is 
>>>>>> also updated in this series. I can move that to i915_drm.h.
>>>>>
>>>>> That would be great, thanks.
>>>>>
>>>>> And with so rich return codes already documented and exposed via 
>>>>> uapi - would we really need to add anything new for DG2 apart for 
>>>>> userspace to know that if zero is returned (not a negative error 
>>>>> value) it should retry? I mean is there another negative error 
>>>>> missing which would prevent zero transitioning to one?
>>>>
>>>> I think if the auth fails we currently return 0, because the uc 
>>>> state in that case would be "TRANSFERRED", i.e. DMA complete but not 
>>>> fully enabled. I don't have anything against changing the FW state 
>>>> to "ERROR" in this scenario and leave the 0 to mean "not done yet", 
>>>> but I'd prefer the media team to comment on their needs for this 
>>>> IOCTL before committing to anything.
>>>
>>>
>>> Currently media doesn't differentiate "delayed loading is in 
>>> progress" with "HuC is authenticated and running". If the HuC 
>>> authentication eventually fails, the user needs to check the debugfs 
>>> to know the reason. IMHO, it's not a big problem as this is what we 
>>> do even when the IOCTL returns non-zero values. + Carl to comment.
>>
>> (Side note - debugfs can be assumed to not exist so it is not 
>> interesting to users.)
>>
>> There isn't currently a "delayed loading is in progress" state, that's 
>> the discussion in this thread, if and how to add it.
>>
>> Getparam it currently documents these states:
>>
>>  -ENODEV if HuC is not present on this platform,
>>  -EOPNOTSUPP if HuC firmware is disabled,
>>  -ENOPKG if HuC firmware was not installed,
>>  -ENOEXEC if HuC firmware is invalid or mismatched,
>>  0 if HuC firmware is not running,
>>  1 if HuC firmware is authenticated and running.
>>
>> This patch proposed to change this to:
>>
>>  1 if HuC firmware is authenticated and running or if delayed load is 
>> in progress,
>>  0 if HuC firmware is not running and delayed load is not in progress
>>
>> Alternative idea is for DG2 (well in general) to add some more fine 
>> grained states, so that i915 does not have to use 1 for both running 
>> and loading. This may be adding a new error code for auth fails as 
>> Daniele mentioned. Then UMD can know that if 0 is returned and
>> platform is DG2 it needs to query it again since it will transition to 
>> either 1 or error eventually. This would mean the non error states 
>> would be:
>>
>>  0 not running (aka loading)
>>  1 running (and authenticated)
>>
>> @Daniele - one more thing - can you make sure in the series (if you 
>> haven't already) that if HuC status was in any error before suspend 
>> reload is not re-tried on resume? My thinking is that the error is 
>> likely to persist and we don't want to impose long delay on every 
>> resume afterwards. Makes sense to you?
>>
>> @Tony - one more question for the UMD. Or two.
>>
>> How prevalent is usage of HuC on DG2 depending on what codecs need it? 
>> Do you know in advance, before creating a GEM context, that HuC 
>> commands will be sent to the engine or this changes at runtime?
> 
> HuC is needed for all codecs while HW bit rate control (CBR, VBR) is in 
> use. It's also used by content protection. And UMD doesn't know if it 
> will be used later at context creation time.

Bit rate control implies encode only? So encode and protected content 
decode?

Regards,

Tvrtko

Daniele Ceraolo Spurio July 5, 2022, 11:30 p.m. UTC | #18

On 6/15/2022 7:28 PM, Zhang, Carl wrote:
>> On From: Ye, Tony <tony.ye@intel.com>
>> Sent: Thursday, June 16, 2022 12:15 AM
>>
>>
>> On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
>>> On 15/06/2022 00:15, Ye, Tony wrote:
>>>> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>>>>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP
>>>>>>>>>>>>>>> command. The load operation itself is relatively simple
>>>>>>>>>>>>>>> (just send a message to the GSC with the physical address
>>>>>>>>>>>>>>> of the HuC in LMEM), but there are timing changes that
>>>>>>>>>>>>>>> requires special attention. In particular, to send a PXP
>>>>>>>>>>>>>>> command we need to first export the GSC driver and then
>>>>>>>>>>>>>>> wait for the mei-gsc and mei-pxp modules to start, which
>>>>>>>>>>>>>>> means that HuC load will complete after i915 load is
>>>>>>>>>>>>>>> complete. This means that there is a small window of time
>>>>>>>>>>>>>>> after i915 is registered and before HuC is loaded during
>>>>>>>>>>>>>>> which userspace could submit and/or checking the HuC load
>>>>>>>>>>>>>>> status, although this is quite unlikely to happen (HuC is
>>>>>>>>>>>>>>> usually loaded before kernel init/resume completes).
>>>>>>>>>>>>>>> We've consulted with the media team in regards to how to
>>>>>>>>>>>>>>> handle this and they've asked us to do the following:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load
>>>>>>>>>>>>>>> is still in progress. The media driver uses the IOCTL as a
>>>>>>>>>>>>>>> way to check if HuC is enabled and then includes a
>>>>>>>>>>>>>>> secondary check in the batches to get the actual status,
>>>>>>>>>>>>>>> so doing it this way allows userspace to keep working
>>>>>>>>>>>>>>> without changes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded.
>>>>>>>>>>>>>>> Stalls are
>>>>>>>>>>>>>>> expected to be very rare (if any), due to the fact that
>>>>>>>>>>>>>>> HuC is usually loaded before kernel init/resume is
>>>>>>>>>>>>>>> completed.
>>>>>>>>>>>>>> Motivation to add these complications into i915 are not
>>>>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is
>>>>>>>>>>>>>> the premise of the series, right? So no backwards
>>>>>>>>>>>>>> compatibility concerns. In this case why jump through the
>>>>>>>>>>>>>> hoops and not let userspace handle all of this by just
>>>>>>>>>>>>>> leaving the getparam return the true status?
>>>>>>>>>>>>> The main areas impacted by the fact that we can't guarantee
>>>>>>>>>>>>> that HuC load is complete when i915 starts accepting
>>>>>>>>>>>>> submissions are boot and suspend/resume, with the latter
>>>>>>>>>>>>> being the main problem; GT reset is not a concern because
>>>>>>>>>>>>> HuC now survives it. A suspend/resume can be transparent to
>>>>>>>>>>>>> userspace and therefore the HuC status can temporarily flip
>>>>>>>>>>>>> from loaded to not without userspace knowledge, especially
>>>>>>>>>>>>> if we start going into deeper suspend states and start
>>>>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note
>>>>>>>>>>>>> that this is different from what happens during GT reset for
>>>>>>>>>>>>> older platforms, because in that scenario we guarantee that
>>>>>>>>>>>>> HuC reload is complete before we restart the submission
>>>>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status
>>>>>>>>>>>>> change. We had an internal discussion about this problem
>>>>>>>>>>>>> with both media and i915 archs and the conclusion was that
>>>>>>>>>>>>> the best option is for i915 to stall media submission while
>>>>>>>>>>>>> HuC (re-)load is in progress.
>>>>>>>>>>>> Resume is potentialy a good reason - I did not pick up on
>>>>>>>>>>>> that from the cover letter. I read the statement about the
>>>>>>>>>>>> unlikely and small window where HuC is not loaded during
>>>>>>>>>>>> kernel init/resume and I guess did not pick up on the resume
>>>>>>>>>>>> part.
>>>>>>>>>>>>
>>>>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume
>>>>>>>>>>> can't start until i915 resume completes.
>>>>>>>>>> I'll dig into this in the next few days since I want to
>>>>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>>>>
>>>>>>>>>> If in the end conclusion will be that i915 resume indeed cannot
>>>>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts
>>>>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you
>>>>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>>>> Even if we could implement a wait, I'm not sure we should. GSC
>>>>>>>>> resume and HuC reload takes ~300ms in most cases, I don't think
>>>>>>>>> we want to block within the i915 resume path for that long.
>>>>>>>> Yeah maybe not. But entertaining the idea that it is technically
>>>>>>>> possible to block - we could perhaps add uapi for userspace to
>>>>>>>> mark contexts which want HuC access. Then track if there are any
>>>>>>>> such contexts with outstanding submissions and only wait in
>>>>>>>> resume if there are. If that would end up significantly less code
>>>>>>>> on the i915 side to maintain is an open.
>>>>>>>>
>>>>>>>> What would be the end result from users point of view in case
>>>>>>>> where it suspended during video playback? The proposed solution
>>>>>>>> from this series sees the video stuck after resume. Maybe
>>>>>>>> compositor blocks as well since I am not sure how well they
>>>>>>>> handle one window not providing new data. Probably depends on
>> the
>>>>>>>> compositor.
>>>>>>>>
>>>>>>>> And then with a simpler solution definitely the whole resume
>>>>>>>> would be delayed by 300ms.
>>>>>>>>
>>>>>>>> With my ChromeOS hat the stalled media engines does sound like a
>>>>>>>> better solution. But with the maintainer hat I'd like all options
>>>>>>>> evaluated since there is attractiveness if a good enough solution
>>>>>>>> can be achieved with significantly less kernel code.
>>>>>>>>
>>>>>>>> You say 300ms is typical time for HuC load. How long it is on
>>>>>>>> other platforms? If much faster then why is it so slow here?
>>>>>>> The GSC itself has to come out of suspend before it can perform
>>>>>>> the load, which takes a few tens of ms I believe. AFAIU the GSC is
>>>>>>> also slower in processing the HuC load and auth compared to the
>>>>>>> legacy path. The GSC FW team gave a 250ms limit for the time the
>>>>>>> GSC FW needs from start of the resume flow to HuC load complete,
>>>>>>> so I bumped that to ~300ms to account for all other SW
>>>>>>> interactions, plus a bit of buffer. Note that a bit of the SW
>>>>>>> overhead is caused by the fact that we have 2 mei modules in play
>>>>>>> here: mei-gsc, which manages the GSC device itself (including
>>>>>>> resume), and mei-pxp, which owns the pxp messaging, including HuC
>>>>>>> load.
>>>>>> And how long on other platforms (not DG2) do you know? Presumably
>>>>>> there the wait is on the i915 resume path?
>>>>> I don't have "official" expected load times at hand, but looking at
>>>>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to
>>>>> load both GuC and HuC:
>>>>>
>>>>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC
>>>>> loads huc=no <6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware
>>>>> i915/dg1_guc_70.1.1.bin version 70.1 <6>[    8.158634] i915
>>>>> 0000:03:00.0: [drm] HuC firmware i915/dg1_huc_7.9.3.bin version 7.9
>>>>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication
>>>>> [i915]] GuC communication enabled <6>[    8.166111] i915
>>>>> 0000:03:00.0: [drm] HuC authenticated
>>>>>
>>>>> Note that we increase the GT frequency all the way to the max before
>>>>> starting the FW load, which speeds things up.
>>>>>
>>>>>>>>>> However, do we really need to lie in the getparam? How about
>>>>>>>>>> extend or add a new one to separate the loading vs loaded
>>>>>>>>>> states? Since userspace does not support DG2 HuC yet this
>>>>>>>>>> should be doable.
>>>>>>>>> I don't really have a preference here. The media team asked us
>>>>>>>>> to do it this way because they wouldn't have a use for the
>>>>>>>>> different "in progress" and "done" states. If they're ok with
>>>>>>>>> having separate flags that's fine by me.
>>>>>>>>> Tony, any feedback here?
>>>>>>>> We don't even have any docs in i915_drm.h in terms of what it
>> means:
>>>>>>>> #define I915_PARAM_HUC_STATUS         42
>>>>>>>>
>>>>>>>> Seems to be a boolean. Status false vs true? Could you add some
>>>>>>>> docs?
>>>>>>> There is documentation above intel_huc_check_status(), which is
>>>>>>> also updated in this series. I can move that to i915_drm.h.
>>>>>> That would be great, thanks.
>>>>>>
>>>>>> And with so rich return codes already documented and exposed via
>>>>>> uapi - would we really need to add anything new for DG2 apart for
>>>>>> userspace to know that if zero is returned (not a negative error
>>>>>> value) it should retry? I mean is there another negative error
>>>>>> missing which would prevent zero transitioning to one?
>>>>> I think if the auth fails we currently return 0, because the uc
>>>>> state in that case would be "TRANSFERRED", i.e. DMA complete but not
>>>>> fully enabled. I don't have anything against changing the FW state
>>>>> to "ERROR" in this scenario and leave the 0 to mean "not done yet",
>>>>> but I'd prefer the media team to comment on their needs for this
>>>>> IOCTL before committing to anything.
>>>>
>>>> Currently media doesn't differentiate "delayed loading is in
>>>> progress" with "HuC is authenticated and running". If the HuC
>>>> authentication eventually fails, the user needs to check the debugfs
>>>> to know the reason. IMHO, it's not a big problem as this is what we
>>>> do even when the IOCTL returns non-zero values. + Carl to comment.
>>> (Side note - debugfs can be assumed to not exist so it is not
>>> interesting to users.)
>>>
>>> There isn't currently a "delayed loading is in progress" state, that's
>>> the discussion in this thread, if and how to add it.
>>>
>>> Getparam it currently documents these states:
>>>
>>>   -ENODEV if HuC is not present on this platform,
>>>   -EOPNOTSUPP if HuC firmware is disabled,
>>>   -ENOPKG if HuC firmware was not installed,
>>>   -ENOEXEC if HuC firmware is invalid or mismatched,
>>>   0 if HuC firmware is not running,
>>>   1 if HuC firmware is authenticated and running.
>>>
>>> This patch proposed to change this to:
>>>
>>>   1 if HuC firmware is authenticated and running or if delayed load is
>>> in progress,
>>>   0 if HuC firmware is not running and delayed load is not in progress
>>>
>>> Alternative idea is for DG2 (well in general) to add some more fine
>>> grained states, so that i915 does not have to use 1 for both running
>>> and loading. This may be adding a new error code for auth fails as
>>> Daniele mentioned. Then UMD can know that if 0 is returned and
>>> platform is DG2 it needs to query it again since it will transition to
>>> either 1 or error eventually. This would mean the non error states
>>> would be:
>>>
>>>   0 not running (aka loading)
>>>   1 running (and authenticated)
>>>
>>> @Daniele - one more thing - can you make sure in the series (if you
>>> haven't already) that if HuC status was in any error before suspend
>>> reload is not re-tried on resume? My thinking is that the error is
>>> likely to persist and we don't want to impose long delay on every
>>> resume afterwards. Makes sense to you?
>>>
>>> @Tony - one more question for the UMD. Or two.
>>>
>>> How prevalent is usage of HuC on DG2 depending on what codecs need it?
>>> Do you know in advance, before creating a GEM context, that HuC
>>> commands will be sent to the engine or this changes at runtime?
>> HuC is needed for all codecs while HW bit rate control (CBR, VBR) is in use.
>> It's also used by content protection. And UMD doesn't know if it will be used
>> later at context creation time.
>>
> from UMD perspective, We don’t care much on the normal initialization process
> because, I could not image that a system is boot up, and user select a crypted content
> to playback, and huc is still not ready.
> of course, We are  also ok to query the huc status twice, and wait if the status is "0 not running"
> to avoid potential issue.
>
> I suppose the main possible issue will happen in the hibernation/awake process, it is transparent to UMD.
> UMD will not call ioctrl  to query huc status in this process, and will continue to send command buffer to KMD.

I think there is an agreement that it is ok to return 0 to mark the load 
still in progress and 1 for load & auth complete. However, double 
checking the code it turns out that we currently return 0 on load 
failure, even if that's not particularly clear from the comment. I can 
easily change that to be an error code, but not sure if that's 
considered an API breakage considering it's not a well documented 
behavior. I believe that on pre-DG2 userspace considers 1 as ok and 
everything else as failure, so changing the ioctl to return an error 
code on failure and 0 for load pending (with the latter being a 
DG2-esclusive code for now) should be safe, but I'd like confirmation 
that I'm not breaking API before sending the relevant code.

Thanks,
Daniele

>
>> Thanks,
>>
>> Tony
>>
>>> Regards,
>>>
>>> Tvrtko

Ye, Tony July 6, 2022, 5:26 p.m. UTC | #19

On 7/5/2022 4:30 PM, Ceraolo Spurio, Daniele wrote:
>
>
> On 6/15/2022 7:28 PM, Zhang, Carl wrote:
>>> On From: Ye, Tony <tony.ye@intel.com>
>>> Sent: Thursday, June 16, 2022 12:15 AM
>>>
>>>
>>> On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
>>>> On 15/06/2022 00:15, Ye, Tony wrote:
>>>>> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>>>>>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>>>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>>>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP
>>>>>>>>>>>>>>>> command. The load operation itself is relatively simple
>>>>>>>>>>>>>>>> (just send a message to the GSC with the physical address
>>>>>>>>>>>>>>>> of the HuC in LMEM), but there are timing changes that
>>>>>>>>>>>>>>>> requires special attention. In particular, to send a PXP
>>>>>>>>>>>>>>>> command we need to first export the GSC driver and then
>>>>>>>>>>>>>>>> wait for the mei-gsc and mei-pxp modules to start, which
>>>>>>>>>>>>>>>> means that HuC load will complete after i915 load is
>>>>>>>>>>>>>>>> complete. This means that there is a small window of time
>>>>>>>>>>>>>>>> after i915 is registered and before HuC is loaded during
>>>>>>>>>>>>>>>> which userspace could submit and/or checking the HuC load
>>>>>>>>>>>>>>>> status, although this is quite unlikely to happen (HuC is
>>>>>>>>>>>>>>>> usually loaded before kernel init/resume completes).
>>>>>>>>>>>>>>>> We've consulted with the media team in regards to how to
>>>>>>>>>>>>>>>> handle this and they've asked us to do the following:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load
>>>>>>>>>>>>>>>> is still in progress. The media driver uses the IOCTL as a
>>>>>>>>>>>>>>>> way to check if HuC is enabled and then includes a
>>>>>>>>>>>>>>>> secondary check in the batches to get the actual status,
>>>>>>>>>>>>>>>> so doing it this way allows userspace to keep working
>>>>>>>>>>>>>>>> without changes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded.
>>>>>>>>>>>>>>>> Stalls are
>>>>>>>>>>>>>>>> expected to be very rare (if any), due to the fact that
>>>>>>>>>>>>>>>> HuC is usually loaded before kernel init/resume is
>>>>>>>>>>>>>>>> completed.
>>>>>>>>>>>>>>> Motivation to add these complications into i915 are not
>>>>>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is
>>>>>>>>>>>>>>> the premise of the series, right? So no backwards
>>>>>>>>>>>>>>> compatibility concerns. In this case why jump through the
>>>>>>>>>>>>>>> hoops and not let userspace handle all of this by just
>>>>>>>>>>>>>>> leaving the getparam return the true status?
>>>>>>>>>>>>>> The main areas impacted by the fact that we can't guarantee
>>>>>>>>>>>>>> that HuC load is complete when i915 starts accepting
>>>>>>>>>>>>>> submissions are boot and suspend/resume, with the latter
>>>>>>>>>>>>>> being the main problem; GT reset is not a concern because
>>>>>>>>>>>>>> HuC now survives it. A suspend/resume can be transparent to
>>>>>>>>>>>>>> userspace and therefore the HuC status can temporarily flip
>>>>>>>>>>>>>> from loaded to not without userspace knowledge, especially
>>>>>>>>>>>>>> if we start going into deeper suspend states and start
>>>>>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note
>>>>>>>>>>>>>> that this is different from what happens during GT reset for
>>>>>>>>>>>>>> older platforms, because in that scenario we guarantee that
>>>>>>>>>>>>>> HuC reload is complete before we restart the submission
>>>>>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status
>>>>>>>>>>>>>> change. We had an internal discussion about this problem
>>>>>>>>>>>>>> with both media and i915 archs and the conclusion was that
>>>>>>>>>>>>>> the best option is for i915 to stall media submission while
>>>>>>>>>>>>>> HuC (re-)load is in progress.
>>>>>>>>>>>>> Resume is potentialy a good reason - I did not pick up on
>>>>>>>>>>>>> that from the cover letter. I read the statement about the
>>>>>>>>>>>>> unlikely and small window where HuC is not loaded during
>>>>>>>>>>>>> kernel init/resume and I guess did not pick up on the resume
>>>>>>>>>>>>> part.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an 
>>>>>>>>>>>>> option?
>>>>>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume
>>>>>>>>>>>> can't start until i915 resume completes.
>>>>>>>>>>> I'll dig into this in the next few days since I want to
>>>>>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>>>>>
>>>>>>>>>>> If in the end conclusion will be that i915 resume indeed cannot
>>>>>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts
>>>>>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you
>>>>>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>>>>> Even if we could implement a wait, I'm not sure we should. GSC
>>>>>>>>>> resume and HuC reload takes ~300ms in most cases, I don't think
>>>>>>>>>> we want to block within the i915 resume path for that long.
>>>>>>>>> Yeah maybe not. But entertaining the idea that it is technically
>>>>>>>>> possible to block - we could perhaps add uapi for userspace to
>>>>>>>>> mark contexts which want HuC access. Then track if there are any
>>>>>>>>> such contexts with outstanding submissions and only wait in
>>>>>>>>> resume if there are. If that would end up significantly less code
>>>>>>>>> on the i915 side to maintain is an open.
>>>>>>>>>
>>>>>>>>> What would be the end result from users point of view in case
>>>>>>>>> where it suspended during video playback? The proposed solution
>>>>>>>>> from this series sees the video stuck after resume. Maybe
>>>>>>>>> compositor blocks as well since I am not sure how well they
>>>>>>>>> handle one window not providing new data. Probably depends on
>>> the
>>>>>>>>> compositor.
>>>>>>>>>
>>>>>>>>> And then with a simpler solution definitely the whole resume
>>>>>>>>> would be delayed by 300ms.
>>>>>>>>>
>>>>>>>>> With my ChromeOS hat the stalled media engines does sound like a
>>>>>>>>> better solution. But with the maintainer hat I'd like all options
>>>>>>>>> evaluated since there is attractiveness if a good enough solution
>>>>>>>>> can be achieved with significantly less kernel code.
>>>>>>>>>
>>>>>>>>> You say 300ms is typical time for HuC load. How long it is on
>>>>>>>>> other platforms? If much faster then why is it so slow here?
>>>>>>>> The GSC itself has to come out of suspend before it can perform
>>>>>>>> the load, which takes a few tens of ms I believe. AFAIU the GSC is
>>>>>>>> also slower in processing the HuC load and auth compared to the
>>>>>>>> legacy path. The GSC FW team gave a 250ms limit for the time the
>>>>>>>> GSC FW needs from start of the resume flow to HuC load complete,
>>>>>>>> so I bumped that to ~300ms to account for all other SW
>>>>>>>> interactions, plus a bit of buffer. Note that a bit of the SW
>>>>>>>> overhead is caused by the fact that we have 2 mei modules in play
>>>>>>>> here: mei-gsc, which manages the GSC device itself (including
>>>>>>>> resume), and mei-pxp, which owns the pxp messaging, including HuC
>>>>>>>> load.
>>>>>>> And how long on other platforms (not DG2) do you know? Presumably
>>>>>>> there the wait is on the i915 resume path?
>>>>>> I don't have "official" expected load times at hand, but looking at
>>>>>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to
>>>>>> load both GuC and HuC:
>>>>>>
>>>>>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC
>>>>>> loads huc=no <6>[    8.158632] i915 0000:03:00.0: [drm] GuC firmware
>>>>>> i915/dg1_guc_70.1.1.bin version 70.1 <6>[ 8.158634] i915
>>>>>> 0000:03:00.0: [drm] HuC firmware i915/dg1_huc_7.9.3.bin version 7.9
>>>>>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication
>>>>>> [i915]] GuC communication enabled <6>[ 8.166111] i915
>>>>>> 0000:03:00.0: [drm] HuC authenticated
>>>>>>
>>>>>> Note that we increase the GT frequency all the way to the max before
>>>>>> starting the FW load, which speeds things up.
>>>>>>
>>>>>>>>>>> However, do we really need to lie in the getparam? How about
>>>>>>>>>>> extend or add a new one to separate the loading vs loaded
>>>>>>>>>>> states? Since userspace does not support DG2 HuC yet this
>>>>>>>>>>> should be doable.
>>>>>>>>>> I don't really have a preference here. The media team asked us
>>>>>>>>>> to do it this way because they wouldn't have a use for the
>>>>>>>>>> different "in progress" and "done" states. If they're ok with
>>>>>>>>>> having separate flags that's fine by me.
>>>>>>>>>> Tony, any feedback here?
>>>>>>>>> We don't even have any docs in i915_drm.h in terms of what it
>>> means:
>>>>>>>>> #define I915_PARAM_HUC_STATUS         42
>>>>>>>>>
>>>>>>>>> Seems to be a boolean. Status false vs true? Could you add some
>>>>>>>>> docs?
>>>>>>>> There is documentation above intel_huc_check_status(), which is
>>>>>>>> also updated in this series. I can move that to i915_drm.h.
>>>>>>> That would be great, thanks.
>>>>>>>
>>>>>>> And with so rich return codes already documented and exposed via
>>>>>>> uapi - would we really need to add anything new for DG2 apart for
>>>>>>> userspace to know that if zero is returned (not a negative error
>>>>>>> value) it should retry? I mean is there another negative error
>>>>>>> missing which would prevent zero transitioning to one?
>>>>>> I think if the auth fails we currently return 0, because the uc
>>>>>> state in that case would be "TRANSFERRED", i.e. DMA complete but not
>>>>>> fully enabled. I don't have anything against changing the FW state
>>>>>> to "ERROR" in this scenario and leave the 0 to mean "not done yet",
>>>>>> but I'd prefer the media team to comment on their needs for this
>>>>>> IOCTL before committing to anything.
>>>>>
>>>>> Currently media doesn't differentiate "delayed loading is in
>>>>> progress" with "HuC is authenticated and running". If the HuC
>>>>> authentication eventually fails, the user needs to check the debugfs
>>>>> to know the reason. IMHO, it's not a big problem as this is what we
>>>>> do even when the IOCTL returns non-zero values. + Carl to comment.
>>>> (Side note - debugfs can be assumed to not exist so it is not
>>>> interesting to users.)
>>>>
>>>> There isn't currently a "delayed loading is in progress" state, that's
>>>> the discussion in this thread, if and how to add it.
>>>>
>>>> Getparam it currently documents these states:
>>>>
>>>>   -ENODEV if HuC is not present on this platform,
>>>>   -EOPNOTSUPP if HuC firmware is disabled,
>>>>   -ENOPKG if HuC firmware was not installed,
>>>>   -ENOEXEC if HuC firmware is invalid or mismatched,
>>>>   0 if HuC firmware is not running,
>>>>   1 if HuC firmware is authenticated and running.
>>>>
>>>> This patch proposed to change this to:
>>>>
>>>>   1 if HuC firmware is authenticated and running or if delayed load is
>>>> in progress,
>>>>   0 if HuC firmware is not running and delayed load is not in progress
>>>>
>>>> Alternative idea is for DG2 (well in general) to add some more fine
>>>> grained states, so that i915 does not have to use 1 for both running
>>>> and loading. This may be adding a new error code for auth fails as
>>>> Daniele mentioned. Then UMD can know that if 0 is returned and
>>>> platform is DG2 it needs to query it again since it will transition to
>>>> either 1 or error eventually. This would mean the non error states
>>>> would be:
>>>>
>>>>   0 not running (aka loading)
>>>>   1 running (and authenticated)
>>>>
>>>> @Daniele - one more thing - can you make sure in the series (if you
>>>> haven't already) that if HuC status was in any error before suspend
>>>> reload is not re-tried on resume? My thinking is that the error is
>>>> likely to persist and we don't want to impose long delay on every
>>>> resume afterwards. Makes sense to you?
>>>>
>>>> @Tony - one more question for the UMD. Or two.
>>>>
>>>> How prevalent is usage of HuC on DG2 depending on what codecs need it?
>>>> Do you know in advance, before creating a GEM context, that HuC
>>>> commands will be sent to the engine or this changes at runtime?
>>> HuC is needed for all codecs while HW bit rate control (CBR, VBR) is 
>>> in use.
>>> It's also used by content protection. And UMD doesn't know if it 
>>> will be used
>>> later at context creation time.
>>>
>> from UMD perspective, We don’t care much on the normal initialization 
>> process
>> because, I could not image that a system is boot up, and user select 
>> a crypted content
>> to playback, and huc is still not ready.
>> of course, We are  also ok to query the huc status twice, and wait if 
>> the status is "0 not running"
>> to avoid potential issue.
>>
>> I suppose the main possible issue will happen in the 
>> hibernation/awake process, it is transparent to UMD.
>> UMD will not call ioctrl  to query huc status in this process, and 
>> will continue to send command buffer to KMD.
>
> I think there is an agreement that it is ok to return 0 to mark the 
> load still in progress and 1 for load & auth complete. However, double 
> checking the code it turns out that we currently return 0 on load 
> failure, even if that's not particularly clear from the comment. I can 
> easily change that to be an error code, but not sure if that's 
> considered an API breakage considering it's not a well documented 
> behavior. I believe that on pre-DG2 userspace considers 1 as ok and 
> everything else as failure, so changing the ioctl to return an error 
> code on failure and 0 for load pending (with the latter being a 
> DG2-esclusive code for now) should be safe, but I'd like confirmation 
> that I'm not breaking API before sending the relevant code.

The UMD code is like this:

     struct drm_i915_getparam gp;
     int32_t value;
     gp.param = I915_PARAM_HUC_STATUS;
     gp.value = &value;
     ret = ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
     if (ret != 0)
         hasHuC = 0
     else
         if (value == 0)
             hasHuC = 0;
         else
             hasHuC = 1;

Currently the behavior of i915 is:

     if there is an error, ioctl returns -1, and set errno as 
ENODEV/EOPNOTSUPP/ENOPKG/ENOEXEC;

     otherwise, set *(gp.value) as 0 if HuC is not running, or 1 if HuC 
is authenticated.

Hi Daniele, which value are you going to change - the "ret" or the "value"?


Thanks,

Tony

>
> Thanks,
> Daniele
>
>>
>>> Thanks,
>>>
>>> Tony
>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>

Daniele Ceraolo Spurio July 6, 2022, 7:29 p.m. UTC | #20

On 7/6/2022 10:26 AM, Ye, Tony wrote:
>
> On 7/5/2022 4:30 PM, Ceraolo Spurio, Daniele wrote:
>>
>>
>> On 6/15/2022 7:28 PM, Zhang, Carl wrote:
>>>> On From: Ye, Tony <tony.ye@intel.com>
>>>> Sent: Thursday, June 16, 2022 12:15 AM
>>>>
>>>>
>>>> On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
>>>>> On 15/06/2022 00:15, Ye, Tony wrote:
>>>>>> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>>>>>>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>>>>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>>>>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>>>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP
>>>>>>>>>>>>>>>>> command. The load operation itself is relatively simple
>>>>>>>>>>>>>>>>> (just send a message to the GSC with the physical address
>>>>>>>>>>>>>>>>> of the HuC in LMEM), but there are timing changes that
>>>>>>>>>>>>>>>>> requires special attention. In particular, to send a PXP
>>>>>>>>>>>>>>>>> command we need to first export the GSC driver and then
>>>>>>>>>>>>>>>>> wait for the mei-gsc and mei-pxp modules to start, which
>>>>>>>>>>>>>>>>> means that HuC load will complete after i915 load is
>>>>>>>>>>>>>>>>> complete. This means that there is a small window of time
>>>>>>>>>>>>>>>>> after i915 is registered and before HuC is loaded during
>>>>>>>>>>>>>>>>> which userspace could submit and/or checking the HuC load
>>>>>>>>>>>>>>>>> status, although this is quite unlikely to happen (HuC is
>>>>>>>>>>>>>>>>> usually loaded before kernel init/resume completes).
>>>>>>>>>>>>>>>>> We've consulted with the media team in regards to how to
>>>>>>>>>>>>>>>>> handle this and they've asked us to do the following:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if 
>>>>>>>>>>>>>>>>> load
>>>>>>>>>>>>>>>>> is still in progress. The media driver uses the IOCTL 
>>>>>>>>>>>>>>>>> as a
>>>>>>>>>>>>>>>>> way to check if HuC is enabled and then includes a
>>>>>>>>>>>>>>>>> secondary check in the batches to get the actual status,
>>>>>>>>>>>>>>>>> so doing it this way allows userspace to keep working
>>>>>>>>>>>>>>>>> without changes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is 
>>>>>>>>>>>>>>>>> loaded.
>>>>>>>>>>>>>>>>> Stalls are
>>>>>>>>>>>>>>>>> expected to be very rare (if any), due to the fact that
>>>>>>>>>>>>>>>>> HuC is usually loaded before kernel init/resume is
>>>>>>>>>>>>>>>>> completed.
>>>>>>>>>>>>>>>> Motivation to add these complications into i915 are not
>>>>>>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is
>>>>>>>>>>>>>>>> the premise of the series, right? So no backwards
>>>>>>>>>>>>>>>> compatibility concerns. In this case why jump through the
>>>>>>>>>>>>>>>> hoops and not let userspace handle all of this by just
>>>>>>>>>>>>>>>> leaving the getparam return the true status?
>>>>>>>>>>>>>>> The main areas impacted by the fact that we can't guarantee
>>>>>>>>>>>>>>> that HuC load is complete when i915 starts accepting
>>>>>>>>>>>>>>> submissions are boot and suspend/resume, with the latter
>>>>>>>>>>>>>>> being the main problem; GT reset is not a concern because
>>>>>>>>>>>>>>> HuC now survives it. A suspend/resume can be transparent to
>>>>>>>>>>>>>>> userspace and therefore the HuC status can temporarily flip
>>>>>>>>>>>>>>> from loaded to not without userspace knowledge, especially
>>>>>>>>>>>>>>> if we start going into deeper suspend states and start
>>>>>>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note
>>>>>>>>>>>>>>> that this is different from what happens during GT reset 
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> older platforms, because in that scenario we guarantee that
>>>>>>>>>>>>>>> HuC reload is complete before we restart the submission
>>>>>>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status
>>>>>>>>>>>>>>> change. We had an internal discussion about this problem
>>>>>>>>>>>>>>> with both media and i915 archs and the conclusion was that
>>>>>>>>>>>>>>> the best option is for i915 to stall media submission while
>>>>>>>>>>>>>>> HuC (re-)load is in progress.
>>>>>>>>>>>>>> Resume is potentialy a good reason - I did not pick up on
>>>>>>>>>>>>>> that from the cover letter. I read the statement about the
>>>>>>>>>>>>>> unlikely and small window where HuC is not loaded during
>>>>>>>>>>>>>> kernel init/resume and I guess did not pick up on the resume
>>>>>>>>>>>>>> part.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an 
>>>>>>>>>>>>>> option?
>>>>>>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume
>>>>>>>>>>>>> can't start until i915 resume completes.
>>>>>>>>>>>> I'll dig into this in the next few days since I want to
>>>>>>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>>>>>>
>>>>>>>>>>>> If in the end conclusion will be that i915 resume indeed 
>>>>>>>>>>>> cannot
>>>>>>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts
>>>>>>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you
>>>>>>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>>>>>> Even if we could implement a wait, I'm not sure we should. GSC
>>>>>>>>>>> resume and HuC reload takes ~300ms in most cases, I don't think
>>>>>>>>>>> we want to block within the i915 resume path for that long.
>>>>>>>>>> Yeah maybe not. But entertaining the idea that it is technically
>>>>>>>>>> possible to block - we could perhaps add uapi for userspace to
>>>>>>>>>> mark contexts which want HuC access. Then track if there are any
>>>>>>>>>> such contexts with outstanding submissions and only wait in
>>>>>>>>>> resume if there are. If that would end up significantly less 
>>>>>>>>>> code
>>>>>>>>>> on the i915 side to maintain is an open.
>>>>>>>>>>
>>>>>>>>>> What would be the end result from users point of view in case
>>>>>>>>>> where it suspended during video playback? The proposed solution
>>>>>>>>>> from this series sees the video stuck after resume. Maybe
>>>>>>>>>> compositor blocks as well since I am not sure how well they
>>>>>>>>>> handle one window not providing new data. Probably depends on
>>>> the
>>>>>>>>>> compositor.
>>>>>>>>>>
>>>>>>>>>> And then with a simpler solution definitely the whole resume
>>>>>>>>>> would be delayed by 300ms.
>>>>>>>>>>
>>>>>>>>>> With my ChromeOS hat the stalled media engines does sound like a
>>>>>>>>>> better solution. But with the maintainer hat I'd like all 
>>>>>>>>>> options
>>>>>>>>>> evaluated since there is attractiveness if a good enough 
>>>>>>>>>> solution
>>>>>>>>>> can be achieved with significantly less kernel code.
>>>>>>>>>>
>>>>>>>>>> You say 300ms is typical time for HuC load. How long it is on
>>>>>>>>>> other platforms? If much faster then why is it so slow here?
>>>>>>>>> The GSC itself has to come out of suspend before it can perform
>>>>>>>>> the load, which takes a few tens of ms I believe. AFAIU the 
>>>>>>>>> GSC is
>>>>>>>>> also slower in processing the HuC load and auth compared to the
>>>>>>>>> legacy path. The GSC FW team gave a 250ms limit for the time the
>>>>>>>>> GSC FW needs from start of the resume flow to HuC load complete,
>>>>>>>>> so I bumped that to ~300ms to account for all other SW
>>>>>>>>> interactions, plus a bit of buffer. Note that a bit of the SW
>>>>>>>>> overhead is caused by the fact that we have 2 mei modules in play
>>>>>>>>> here: mei-gsc, which manages the GSC device itself (including
>>>>>>>>> resume), and mei-pxp, which owns the pxp messaging, including HuC
>>>>>>>>> load.
>>>>>>>> And how long on other platforms (not DG2) do you know? Presumably
>>>>>>>> there the wait is on the i915 resume path?
>>>>>>> I don't have "official" expected load times at hand, but looking at
>>>>>>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to
>>>>>>> load both GuC and HuC:
>>>>>>>
>>>>>>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] 
>>>>>>> GSC
>>>>>>> loads huc=no <6>[    8.158632] i915 0000:03:00.0: [drm] GuC 
>>>>>>> firmware
>>>>>>> i915/dg1_guc_70.1.1.bin version 70.1 <6>[ 8.158634] i915
>>>>>>> 0000:03:00.0: [drm] HuC firmware i915/dg1_huc_7.9.3.bin version 7.9
>>>>>>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication
>>>>>>> [i915]] GuC communication enabled <6>[ 8.166111] i915
>>>>>>> 0000:03:00.0: [drm] HuC authenticated
>>>>>>>
>>>>>>> Note that we increase the GT frequency all the way to the max 
>>>>>>> before
>>>>>>> starting the FW load, which speeds things up.
>>>>>>>
>>>>>>>>>>>> However, do we really need to lie in the getparam? How about
>>>>>>>>>>>> extend or add a new one to separate the loading vs loaded
>>>>>>>>>>>> states? Since userspace does not support DG2 HuC yet this
>>>>>>>>>>>> should be doable.
>>>>>>>>>>> I don't really have a preference here. The media team asked us
>>>>>>>>>>> to do it this way because they wouldn't have a use for the
>>>>>>>>>>> different "in progress" and "done" states. If they're ok with
>>>>>>>>>>> having separate flags that's fine by me.
>>>>>>>>>>> Tony, any feedback here?
>>>>>>>>>> We don't even have any docs in i915_drm.h in terms of what it
>>>> means:
>>>>>>>>>> #define I915_PARAM_HUC_STATUS         42
>>>>>>>>>>
>>>>>>>>>> Seems to be a boolean. Status false vs true? Could you add some
>>>>>>>>>> docs?
>>>>>>>>> There is documentation above intel_huc_check_status(), which is
>>>>>>>>> also updated in this series. I can move that to i915_drm.h.
>>>>>>>> That would be great, thanks.
>>>>>>>>
>>>>>>>> And with so rich return codes already documented and exposed via
>>>>>>>> uapi - would we really need to add anything new for DG2 apart for
>>>>>>>> userspace to know that if zero is returned (not a negative error
>>>>>>>> value) it should retry? I mean is there another negative error
>>>>>>>> missing which would prevent zero transitioning to one?
>>>>>>> I think if the auth fails we currently return 0, because the uc
>>>>>>> state in that case would be "TRANSFERRED", i.e. DMA complete but 
>>>>>>> not
>>>>>>> fully enabled. I don't have anything against changing the FW state
>>>>>>> to "ERROR" in this scenario and leave the 0 to mean "not done yet",
>>>>>>> but I'd prefer the media team to comment on their needs for this
>>>>>>> IOCTL before committing to anything.
>>>>>>
>>>>>> Currently media doesn't differentiate "delayed loading is in
>>>>>> progress" with "HuC is authenticated and running". If the HuC
>>>>>> authentication eventually fails, the user needs to check the debugfs
>>>>>> to know the reason. IMHO, it's not a big problem as this is what we
>>>>>> do even when the IOCTL returns non-zero values. + Carl to comment.
>>>>> (Side note - debugfs can be assumed to not exist so it is not
>>>>> interesting to users.)
>>>>>
>>>>> There isn't currently a "delayed loading is in progress" state, 
>>>>> that's
>>>>> the discussion in this thread, if and how to add it.
>>>>>
>>>>> Getparam it currently documents these states:
>>>>>
>>>>>   -ENODEV if HuC is not present on this platform,
>>>>>   -EOPNOTSUPP if HuC firmware is disabled,
>>>>>   -ENOPKG if HuC firmware was not installed,
>>>>>   -ENOEXEC if HuC firmware is invalid or mismatched,
>>>>>   0 if HuC firmware is not running,
>>>>>   1 if HuC firmware is authenticated and running.
>>>>>
>>>>> This patch proposed to change this to:
>>>>>
>>>>>   1 if HuC firmware is authenticated and running or if delayed 
>>>>> load is
>>>>> in progress,
>>>>>   0 if HuC firmware is not running and delayed load is not in 
>>>>> progress
>>>>>
>>>>> Alternative idea is for DG2 (well in general) to add some more fine
>>>>> grained states, so that i915 does not have to use 1 for both running
>>>>> and loading. This may be adding a new error code for auth fails as
>>>>> Daniele mentioned. Then UMD can know that if 0 is returned and
>>>>> platform is DG2 it needs to query it again since it will 
>>>>> transition to
>>>>> either 1 or error eventually. This would mean the non error states
>>>>> would be:
>>>>>
>>>>>   0 not running (aka loading)
>>>>>   1 running (and authenticated)
>>>>>
>>>>> @Daniele - one more thing - can you make sure in the series (if you
>>>>> haven't already) that if HuC status was in any error before suspend
>>>>> reload is not re-tried on resume? My thinking is that the error is
>>>>> likely to persist and we don't want to impose long delay on every
>>>>> resume afterwards. Makes sense to you?
>>>>>
>>>>> @Tony - one more question for the UMD. Or two.
>>>>>
>>>>> How prevalent is usage of HuC on DG2 depending on what codecs need 
>>>>> it?
>>>>> Do you know in advance, before creating a GEM context, that HuC
>>>>> commands will be sent to the engine or this changes at runtime?
>>>> HuC is needed for all codecs while HW bit rate control (CBR, VBR) 
>>>> is in use.
>>>> It's also used by content protection. And UMD doesn't know if it 
>>>> will be used
>>>> later at context creation time.
>>>>
>>> from UMD perspective, We don’t care much on the normal 
>>> initialization process
>>> because, I could not image that a system is boot up, and user select 
>>> a crypted content
>>> to playback, and huc is still not ready.
>>> of course, We are  also ok to query the huc status twice, and wait 
>>> if the status is "0 not running"
>>> to avoid potential issue.
>>>
>>> I suppose the main possible issue will happen in the 
>>> hibernation/awake process, it is transparent to UMD.
>>> UMD will not call ioctrl  to query huc status in this process, and 
>>> will continue to send command buffer to KMD.
>>
>> I think there is an agreement that it is ok to return 0 to mark the 
>> load still in progress and 1 for load & auth complete. However, 
>> double checking the code it turns out that we currently return 0 on 
>> load failure, even if that's not particularly clear from the comment. 
>> I can easily change that to be an error code, but not sure if that's 
>> considered an API breakage considering it's not a well documented 
>> behavior. I believe that on pre-DG2 userspace considers 1 as ok and 
>> everything else as failure, so changing the ioctl to return an error 
>> code on failure and 0 for load pending (with the latter being a 
>> DG2-esclusive code for now) should be safe, but I'd like confirmation 
>> that I'm not breaking API before sending the relevant code.
>
> The UMD code is like this:
>
>     struct drm_i915_getparam gp;
>     int32_t value;
>     gp.param = I915_PARAM_HUC_STATUS;
>     gp.value = &value;
>     ret = ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
>     if (ret != 0)
>         hasHuC = 0
>     else
>         if (value == 0)
>             hasHuC = 0;
>         else
>             hasHuC = 1;
>
> Currently the behavior of i915 is:
>
>     if there is an error, ioctl returns -1, and set errno as 
> ENODEV/EOPNOTSUPP/ENOPKG/ENOEXEC;

Not exactly. The current map of FW states to ioctl behavior is:

FIRMWARE_NOT_SUPPORTED -> return -ENODEV;
FIRMWARE_DISABLED -> return -EOPNOTSUPP;
FIRMWARE_MISSING ->  return -ENOPKG;
FIRMWARE_ERROR -> return -ENOEXEC
FIRMWARE_INIT_FAIL -> return 0, value 0
FIRMWARE_LOAD_FAIL -> return 0, value 0
FIRMWARE_RUNNING -> return 0, value 1

So we do have 2 error state in which the ioctl succeeds.

>
>     otherwise, set *(gp.value) as 0 if HuC is not running, or 1 if HuC 
> is authenticated.
>
> Hi Daniele, which value are you going to change - the "ret" or the 
> "value"?

The idea is to change the 2 FAIL states above to return an error 
(probably -EIO) instead of setting value to 0. This would be compatible 
with your code snippet, because it'll hit the ret != 0 condition. Value 
0 can then be re-purposed for DG2 to indicate "load pending", which 
would not be compatible with your current code, but being a new addition 
for a new platform does not necessarily need to be.

This said, I'm not sure if changing the return behavior of INIT_FAIL and 
LOAD_FAIL is API breakage or not, given that it won't impact userspace 
expectations. Tvrtko, any feedback here?

Thanks,
Daniele

>
>
> Thanks,
>
> Tony
>
>>
>> Thanks,
>> Daniele
>>
>>>
>>>> Thanks,
>>>>
>>>> Tony
>>>>
>>>>> Regards,
>>>>>
>>>>> Tvrtko
>>

Ye, Tony July 6, 2022, 8:11 p.m. UTC | #21

On 7/6/2022 12:29 PM, Ceraolo Spurio, Daniele wrote:
>
>
> On 7/6/2022 10:26 AM, Ye, Tony wrote:
>>
>> On 7/5/2022 4:30 PM, Ceraolo Spurio, Daniele wrote:
>>>
>>>
>>> On 6/15/2022 7:28 PM, Zhang, Carl wrote:
>>>>> On From: Ye, Tony <tony.ye@intel.com>
>>>>> Sent: Thursday, June 16, 2022 12:15 AM
>>>>>
>>>>>
>>>>> On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
>>>>>> On 15/06/2022 00:15, Ye, Tony wrote:
>>>>>>> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>>>>>>>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>>>>>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP
>>>>>>>>>>>>>>>>>> command. The load operation itself is relatively simple
>>>>>>>>>>>>>>>>>> (just send a message to the GSC with the physical 
>>>>>>>>>>>>>>>>>> address
>>>>>>>>>>>>>>>>>> of the HuC in LMEM), but there are timing changes that
>>>>>>>>>>>>>>>>>> requires special attention. In particular, to send a PXP
>>>>>>>>>>>>>>>>>> command we need to first export the GSC driver and then
>>>>>>>>>>>>>>>>>> wait for the mei-gsc and mei-pxp modules to start, which
>>>>>>>>>>>>>>>>>> means that HuC load will complete after i915 load is
>>>>>>>>>>>>>>>>>> complete. This means that there is a small window of 
>>>>>>>>>>>>>>>>>> time
>>>>>>>>>>>>>>>>>> after i915 is registered and before HuC is loaded during
>>>>>>>>>>>>>>>>>> which userspace could submit and/or checking the HuC 
>>>>>>>>>>>>>>>>>> load
>>>>>>>>>>>>>>>>>> status, although this is quite unlikely to happen 
>>>>>>>>>>>>>>>>>> (HuC is
>>>>>>>>>>>>>>>>>> usually loaded before kernel init/resume completes).
>>>>>>>>>>>>>>>>>> We've consulted with the media team in regards to how to
>>>>>>>>>>>>>>>>>> handle this and they've asked us to do the following:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if 
>>>>>>>>>>>>>>>>>> load
>>>>>>>>>>>>>>>>>> is still in progress. The media driver uses the IOCTL 
>>>>>>>>>>>>>>>>>> as a
>>>>>>>>>>>>>>>>>> way to check if HuC is enabled and then includes a
>>>>>>>>>>>>>>>>>> secondary check in the batches to get the actual status,
>>>>>>>>>>>>>>>>>> so doing it this way allows userspace to keep working
>>>>>>>>>>>>>>>>>> without changes.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is 
>>>>>>>>>>>>>>>>>> loaded.
>>>>>>>>>>>>>>>>>> Stalls are
>>>>>>>>>>>>>>>>>> expected to be very rare (if any), due to the fact that
>>>>>>>>>>>>>>>>>> HuC is usually loaded before kernel init/resume is
>>>>>>>>>>>>>>>>>> completed.
>>>>>>>>>>>>>>>>> Motivation to add these complications into i915 are not
>>>>>>>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is
>>>>>>>>>>>>>>>>> the premise of the series, right? So no backwards
>>>>>>>>>>>>>>>>> compatibility concerns. In this case why jump through the
>>>>>>>>>>>>>>>>> hoops and not let userspace handle all of this by just
>>>>>>>>>>>>>>>>> leaving the getparam return the true status?
>>>>>>>>>>>>>>>> The main areas impacted by the fact that we can't 
>>>>>>>>>>>>>>>> guarantee
>>>>>>>>>>>>>>>> that HuC load is complete when i915 starts accepting
>>>>>>>>>>>>>>>> submissions are boot and suspend/resume, with the latter
>>>>>>>>>>>>>>>> being the main problem; GT reset is not a concern because
>>>>>>>>>>>>>>>> HuC now survives it. A suspend/resume can be 
>>>>>>>>>>>>>>>> transparent to
>>>>>>>>>>>>>>>> userspace and therefore the HuC status can temporarily 
>>>>>>>>>>>>>>>> flip
>>>>>>>>>>>>>>>> from loaded to not without userspace knowledge, especially
>>>>>>>>>>>>>>>> if we start going into deeper suspend states and start
>>>>>>>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note
>>>>>>>>>>>>>>>> that this is different from what happens during GT 
>>>>>>>>>>>>>>>> reset for
>>>>>>>>>>>>>>>> older platforms, because in that scenario we guarantee 
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> HuC reload is complete before we restart the submission
>>>>>>>>>>>>>>>> back-end, so userspace doesn't notice that the HuC status
>>>>>>>>>>>>>>>> change. We had an internal discussion about this problem
>>>>>>>>>>>>>>>> with both media and i915 archs and the conclusion was that
>>>>>>>>>>>>>>>> the best option is for i915 to stall media submission 
>>>>>>>>>>>>>>>> while
>>>>>>>>>>>>>>>> HuC (re-)load is in progress.
>>>>>>>>>>>>>>> Resume is potentialy a good reason - I did not pick up on
>>>>>>>>>>>>>>> that from the cover letter. I read the statement about the
>>>>>>>>>>>>>>> unlikely and small window where HuC is not loaded during
>>>>>>>>>>>>>>> kernel init/resume and I guess did not pick up on the 
>>>>>>>>>>>>>>> resume
>>>>>>>>>>>>>>> part.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an 
>>>>>>>>>>>>>>> option?
>>>>>>>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume
>>>>>>>>>>>>>> can't start until i915 resume completes.
>>>>>>>>>>>>> I'll dig into this in the next few days since I want to
>>>>>>>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If in the end conclusion will be that i915 resume indeed 
>>>>>>>>>>>>> cannot
>>>>>>>>>>>>> wait for GSC, then I think auto-blocking of queued up 
>>>>>>>>>>>>> contexts
>>>>>>>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you
>>>>>>>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>>>>>>> Even if we could implement a wait, I'm not sure we should. GSC
>>>>>>>>>>>> resume and HuC reload takes ~300ms in most cases, I don't 
>>>>>>>>>>>> think
>>>>>>>>>>>> we want to block within the i915 resume path for that long.
>>>>>>>>>>> Yeah maybe not. But entertaining the idea that it is 
>>>>>>>>>>> technically
>>>>>>>>>>> possible to block - we could perhaps add uapi for userspace to
>>>>>>>>>>> mark contexts which want HuC access. Then track if there are 
>>>>>>>>>>> any
>>>>>>>>>>> such contexts with outstanding submissions and only wait in
>>>>>>>>>>> resume if there are. If that would end up significantly less 
>>>>>>>>>>> code
>>>>>>>>>>> on the i915 side to maintain is an open.
>>>>>>>>>>>
>>>>>>>>>>> What would be the end result from users point of view in case
>>>>>>>>>>> where it suspended during video playback? The proposed solution
>>>>>>>>>>> from this series sees the video stuck after resume. Maybe
>>>>>>>>>>> compositor blocks as well since I am not sure how well they
>>>>>>>>>>> handle one window not providing new data. Probably depends on
>>>>> the
>>>>>>>>>>> compositor.
>>>>>>>>>>>
>>>>>>>>>>> And then with a simpler solution definitely the whole resume
>>>>>>>>>>> would be delayed by 300ms.
>>>>>>>>>>>
>>>>>>>>>>> With my ChromeOS hat the stalled media engines does sound 
>>>>>>>>>>> like a
>>>>>>>>>>> better solution. But with the maintainer hat I'd like all 
>>>>>>>>>>> options
>>>>>>>>>>> evaluated since there is attractiveness if a good enough 
>>>>>>>>>>> solution
>>>>>>>>>>> can be achieved with significantly less kernel code.
>>>>>>>>>>>
>>>>>>>>>>> You say 300ms is typical time for HuC load. How long it is on
>>>>>>>>>>> other platforms? If much faster then why is it so slow here?
>>>>>>>>>> The GSC itself has to come out of suspend before it can perform
>>>>>>>>>> the load, which takes a few tens of ms I believe. AFAIU the 
>>>>>>>>>> GSC is
>>>>>>>>>> also slower in processing the HuC load and auth compared to the
>>>>>>>>>> legacy path. The GSC FW team gave a 250ms limit for the time the
>>>>>>>>>> GSC FW needs from start of the resume flow to HuC load complete,
>>>>>>>>>> so I bumped that to ~300ms to account for all other SW
>>>>>>>>>> interactions, plus a bit of buffer. Note that a bit of the SW
>>>>>>>>>> overhead is caused by the fact that we have 2 mei modules in 
>>>>>>>>>> play
>>>>>>>>>> here: mei-gsc, which manages the GSC device itself (including
>>>>>>>>>> resume), and mei-pxp, which owns the pxp messaging, including 
>>>>>>>>>> HuC
>>>>>>>>>> load.
>>>>>>>>> And how long on other platforms (not DG2) do you know? Presumably
>>>>>>>>> there the wait is on the i915 resume path?
>>>>>>>> I don't have "official" expected load times at hand, but 
>>>>>>>> looking at
>>>>>>>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to
>>>>>>>> load both GuC and HuC:
>>>>>>>>
>>>>>>>> <7>[    8.157838] i915 0000:03:00.0: [drm:intel_huc_init 
>>>>>>>> [i915]] GSC
>>>>>>>> loads huc=no <6>[    8.158632] i915 0000:03:00.0: [drm] GuC 
>>>>>>>> firmware
>>>>>>>> i915/dg1_guc_70.1.1.bin version 70.1 <6>[ 8.158634] i915
>>>>>>>> 0000:03:00.0: [drm] HuC firmware i915/dg1_huc_7.9.3.bin version 
>>>>>>>> 7.9
>>>>>>>> <7>[    8.164255] i915 0000:03:00.0: [drm:guc_enable_communication
>>>>>>>> [i915]] GuC communication enabled <6>[ 8.166111] i915
>>>>>>>> 0000:03:00.0: [drm] HuC authenticated
>>>>>>>>
>>>>>>>> Note that we increase the GT frequency all the way to the max 
>>>>>>>> before
>>>>>>>> starting the FW load, which speeds things up.
>>>>>>>>
>>>>>>>>>>>>> However, do we really need to lie in the getparam? How about
>>>>>>>>>>>>> extend or add a new one to separate the loading vs loaded
>>>>>>>>>>>>> states? Since userspace does not support DG2 HuC yet this
>>>>>>>>>>>>> should be doable.
>>>>>>>>>>>> I don't really have a preference here. The media team asked us
>>>>>>>>>>>> to do it this way because they wouldn't have a use for the
>>>>>>>>>>>> different "in progress" and "done" states. If they're ok with
>>>>>>>>>>>> having separate flags that's fine by me.
>>>>>>>>>>>> Tony, any feedback here?
>>>>>>>>>>> We don't even have any docs in i915_drm.h in terms of what it
>>>>> means:
>>>>>>>>>>> #define I915_PARAM_HUC_STATUS         42
>>>>>>>>>>>
>>>>>>>>>>> Seems to be a boolean. Status false vs true? Could you add some
>>>>>>>>>>> docs?
>>>>>>>>>> There is documentation above intel_huc_check_status(), which is
>>>>>>>>>> also updated in this series. I can move that to i915_drm.h.
>>>>>>>>> That would be great, thanks.
>>>>>>>>>
>>>>>>>>> And with so rich return codes already documented and exposed via
>>>>>>>>> uapi - would we really need to add anything new for DG2 apart for
>>>>>>>>> userspace to know that if zero is returned (not a negative error
>>>>>>>>> value) it should retry? I mean is there another negative error
>>>>>>>>> missing which would prevent zero transitioning to one?
>>>>>>>> I think if the auth fails we currently return 0, because the uc
>>>>>>>> state in that case would be "TRANSFERRED", i.e. DMA complete 
>>>>>>>> but not
>>>>>>>> fully enabled. I don't have anything against changing the FW state
>>>>>>>> to "ERROR" in this scenario and leave the 0 to mean "not done 
>>>>>>>> yet",
>>>>>>>> but I'd prefer the media team to comment on their needs for this
>>>>>>>> IOCTL before committing to anything.
>>>>>>>
>>>>>>> Currently media doesn't differentiate "delayed loading is in
>>>>>>> progress" with "HuC is authenticated and running". If the HuC
>>>>>>> authentication eventually fails, the user needs to check the 
>>>>>>> debugfs
>>>>>>> to know the reason. IMHO, it's not a big problem as this is what we
>>>>>>> do even when the IOCTL returns non-zero values. + Carl to comment.
>>>>>> (Side note - debugfs can be assumed to not exist so it is not
>>>>>> interesting to users.)
>>>>>>
>>>>>> There isn't currently a "delayed loading is in progress" state, 
>>>>>> that's
>>>>>> the discussion in this thread, if and how to add it.
>>>>>>
>>>>>> Getparam it currently documents these states:
>>>>>>
>>>>>>   -ENODEV if HuC is not present on this platform,
>>>>>>   -EOPNOTSUPP if HuC firmware is disabled,
>>>>>>   -ENOPKG if HuC firmware was not installed,
>>>>>>   -ENOEXEC if HuC firmware is invalid or mismatched,
>>>>>>   0 if HuC firmware is not running,
>>>>>>   1 if HuC firmware is authenticated and running.
>>>>>>
>>>>>> This patch proposed to change this to:
>>>>>>
>>>>>>   1 if HuC firmware is authenticated and running or if delayed 
>>>>>> load is
>>>>>> in progress,
>>>>>>   0 if HuC firmware is not running and delayed load is not in 
>>>>>> progress
>>>>>>
>>>>>> Alternative idea is for DG2 (well in general) to add some more fine
>>>>>> grained states, so that i915 does not have to use 1 for both running
>>>>>> and loading. This may be adding a new error code for auth fails as
>>>>>> Daniele mentioned. Then UMD can know that if 0 is returned and
>>>>>> platform is DG2 it needs to query it again since it will 
>>>>>> transition to
>>>>>> either 1 or error eventually. This would mean the non error states
>>>>>> would be:
>>>>>>
>>>>>>   0 not running (aka loading)
>>>>>>   1 running (and authenticated)
>>>>>>
>>>>>> @Daniele - one more thing - can you make sure in the series (if you
>>>>>> haven't already) that if HuC status was in any error before suspend
>>>>>> reload is not re-tried on resume? My thinking is that the error is
>>>>>> likely to persist and we don't want to impose long delay on every
>>>>>> resume afterwards. Makes sense to you?
>>>>>>
>>>>>> @Tony - one more question for the UMD. Or two.
>>>>>>
>>>>>> How prevalent is usage of HuC on DG2 depending on what codecs 
>>>>>> need it?
>>>>>> Do you know in advance, before creating a GEM context, that HuC
>>>>>> commands will be sent to the engine or this changes at runtime?
>>>>> HuC is needed for all codecs while HW bit rate control (CBR, VBR) 
>>>>> is in use.
>>>>> It's also used by content protection. And UMD doesn't know if it 
>>>>> will be used
>>>>> later at context creation time.
>>>>>
>>>> from UMD perspective, We don’t care much on the normal 
>>>> initialization process
>>>> because, I could not image that a system is boot up, and user 
>>>> select a crypted content
>>>> to playback, and huc is still not ready.
>>>> of course, We are  also ok to query the huc status twice, and wait 
>>>> if the status is "0 not running"
>>>> to avoid potential issue.
>>>>
>>>> I suppose the main possible issue will happen in the 
>>>> hibernation/awake process, it is transparent to UMD.
>>>> UMD will not call ioctrl  to query huc status in this process, and 
>>>> will continue to send command buffer to KMD.
>>>
>>> I think there is an agreement that it is ok to return 0 to mark the 
>>> load still in progress and 1 for load & auth complete. However, 
>>> double checking the code it turns out that we currently return 0 on 
>>> load failure, even if that's not particularly clear from the 
>>> comment. I can easily change that to be an error code, but not sure 
>>> if that's considered an API breakage considering it's not a well 
>>> documented behavior. I believe that on pre-DG2 userspace considers 1 
>>> as ok and everything else as failure, so changing the ioctl to 
>>> return an error code on failure and 0 for load pending (with the 
>>> latter being a DG2-esclusive code for now) should be safe, but I'd 
>>> like confirmation that I'm not breaking API before sending the 
>>> relevant code.
>>
>> The UMD code is like this:
>>
>>     struct drm_i915_getparam gp;
>>     int32_t value;
>>     gp.param = I915_PARAM_HUC_STATUS;
>>     gp.value = &value;
>>     ret = ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
>>     if (ret != 0)
>>         hasHuC = 0
>>     else
>>         if (value == 0)
>>             hasHuC = 0;
>>         else
>>             hasHuC = 1;
>>
>> Currently the behavior of i915 is:
>>
>>     if there is an error, ioctl returns -1, and set errno as 
>> ENODEV/EOPNOTSUPP/ENOPKG/ENOEXEC;
>
> Not exactly. The current map of FW states to ioctl behavior is:
>
> FIRMWARE_NOT_SUPPORTED -> return -ENODEV;
> FIRMWARE_DISABLED -> return -EOPNOTSUPP;
> FIRMWARE_MISSING ->  return -ENOPKG;
> FIRMWARE_ERROR -> return -ENOEXEC

For these 4 errors, ioctl is returning -1 and setting the errno as one 
of the error values above.

ioctl is not returning the error values. It only returns -1.

This is exactly the behavior of i915 observed from the user space. You 
can verify it with the code snippet.

The igt huc_copy test is also checking the errno instead of the ret value:

https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/i915/gem_huc_copy.c#L66

> FIRMWARE_INIT_FAIL -> return 0, value 0
> FIRMWARE_LOAD_FAIL -> return 0, value 0

If you change these two conditions to return -EIO, then what the user 
space actually gets is going to be:

     ioctl returns -1, and errno==EIO.

It's not going to break the current UMD as you said below.

Thanks,

Tony


> FIRMWARE_RUNNING -> return 0, value 1
>
> So we do have 2 error state in which the ioctl succeeds.
>
>>
>>     otherwise, set *(gp.value) as 0 if HuC is not running, or 1 if 
>> HuC is authenticated.
>>
>> Hi Daniele, which value are you going to change - the "ret" or the 
>> "value"?
>
> The idea is to change the 2 FAIL states above to return an error 
> (probably -EIO) instead of setting value to 0. This would be 
> compatible with your code snippet, because it'll hit the ret != 0 
> condition. Value 0 can then be re-purposed for DG2 to indicate "load 
> pending", which would not be compatible with your current code, but 
> being a new addition for a new platform does not necessarily need to be.
>
> This said, I'm not sure if changing the return behavior of INIT_FAIL 
> and LOAD_FAIL is API breakage or not, given that it won't impact 
> userspace expectations. Tvrtko, any feedback here?
>
> Thanks,
> Daniele
>
>>
>>
>> Thanks,
>>
>> Tony
>>
>>>
>>> Thanks,
>>> Daniele
>>>
>>>>
>>>>> Thanks,
>>>>>
>>>>> Tony
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Tvrtko
>>>
>

[00/15] HuC loading for DG2

Message

Comments